Add description of file structure

2024-10-05 20:39:32 +00:00 · 2023-09-14 16:36:15 +02:00
commit b45e60a778
parent 3d0262f005
1 changed file with 127 additions and 0 deletions
--- a/structure.md
+++ b/structure.md
@@ -0,0 +1,127 @@
 # `main()`
 - Start timer
 - Parse command line
 - Build config object - `build_config()`
 - Setup logging
 - Open or create database
 - Select nitter instance to use
 - Get the soup of the whole page and the timeline (a list of soups, one per item) - `get_timeline()`
 - Iterate timeline to generate list of dicts with content of each tweet:
    - Extract tweet ID
    - Extract timestamp
    - Skip if timestamp is not within acceptable range - `is_time_valid()`
    - Skip if it is a retweet and retweets are excluded
    - Check database if tweet already exists and skip if it does
    - Extract author name
    - Extract twitter user name of author
    - Extract full status page URL
    - Add prefix if tweet is a reply
    - Add prefix if tweet is a retweet
    - Process media body - `process_media_body()`
    - Add link to quoted page ("card")
    - Extract image(s) from card - `process_card()`
    - Process video and image attachments - `process_attachments()`
    - Add custom footer
    - Add "Original tweet" footer
    - Add optional timestamp to footer
    - If no media, look for image in linked URL
    - Get filename of downloaded video
 - Update user profile if necessary - `update_profile()`
 - Log in to Mastodon instance - `login()`
 - Check toot character limit
 - Iterate list of tweets
    - Check that the toot cap has not been reached
    - Upload video if applicable (previously downloaded)
    - If there is no video and a picture applies, download and upload the picture
    - Look up in the database the toot ID of `replied_to_tweet`
    - Post toot + insert in database
 - Clean up downloaded video files
 - Delete excess records in database
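The database steps of `main()` (skip tweets that are already recorded, look up the toot ID of the tweet being replied to, record each new toot, prune old records) can be sketched with the standard `sqlite3` module. This is a minimal sketch under assumptions: the `toots` table, its columns and the `max_records` cap are hypothetical and may not match the real schema.

```python
import sqlite3

# Hypothetical schema: one row per tweet that has been posted as a toot
SCHEMA = """CREATE TABLE IF NOT EXISTS toots (
    twitter_account TEXT,
    tweet_id TEXT,
    toot_id TEXT,
    posted_at INTEGER)"""


def open_db(path):
    """Open (or create) the database and make sure the table exists."""
    db = sqlite3.connect(path)
    db.execute(SCHEMA)
    return db


def already_posted(db, account, tweet_id):
    """True if the tweet is already in the database, so it must be skipped."""
    row = db.execute(
        "SELECT 1 FROM toots WHERE twitter_account = ? AND tweet_id = ?",
        (account, tweet_id),
    ).fetchone()
    return row is not None


def toot_id_of(db, account, replied_to_tweet):
    """Toot ID of the tweet being replied to, or None if it was never posted."""
    row = db.execute(
        "SELECT toot_id FROM toots WHERE twitter_account = ? AND tweet_id = ?",
        (account, replied_to_tweet),
    ).fetchone()
    return row[0] if row else None


def record_toot(db, account, tweet_id, toot_id, posted_at):
    """Insert the freshly posted toot in the database."""
    db.execute(
        "INSERT INTO toots VALUES (?, ?, ?, ?)",
        (account, tweet_id, toot_id, posted_at),
    )
    db.commit()


def delete_excess_records(db, account, max_records):
    """Keep only the `max_records` most recent rows for this account."""
    db.execute(
        """DELETE FROM toots WHERE twitter_account = ? AND rowid NOT IN (
               SELECT rowid FROM toots WHERE twitter_account = ?
               ORDER BY posted_at DESC LIMIT ?)""",
        (account, account, max_records),
    )
    db.commit()
```

Looking up `toot_id_of()` before posting is what lets a reply to an earlier tweet be threaded under the matching toot.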
 # `build_config()`
 - Instantiate `global TOML` struct
 - Populate TOML with default values
 - Load config file and (over)write all valid keys with values read from the file
 - If there is no config file, (over)write all valid keys with values read from the command line
 - Verify that a minimum valid config is present
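The layering in `build_config()` (defaults first, then either the config file or the command line, then a minimum-config check) can be sketched as a pure function. The option names, default values and the three required keys below are hypothetical placeholders. On Python 3.11+, `tomllib.load()` on the file opened in binary mode would produce the `file_values` dict.

```python
import copy

# Hypothetical defaults; the real option names and values may differ
DEFAULTS = {
    "config": {
        "twitter_account": None,
        "mastodon_instance": None,
        "mastodon_user": None,
        "tweet_max_age": 1,
        "tweet_delay": 0,
        "toot_cap": 0,
    }
}

REQUIRED = ("twitter_account", "mastodon_instance", "mastodon_user")


def build_config(file_values=None, cli_values=None):
    """Start from defaults, then overwrite from the file or, if none, the CLI."""
    toml = copy.deepcopy(DEFAULTS)  # never mutate the shared defaults
    if file_values:
        source = file_values.get("config", {})
    else:
        source = cli_values or {}
    for key, value in source.items():
        # Only keys present in the defaults are valid; None means "not given"
        if key in toml["config"] and value is not None:
            toml["config"][key] = value
    # Minimum valid config
    for key in REQUIRED:
        if not toml["config"][key]:
            raise ValueError(f"missing required option: {key}")
    return toml
```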
 # `get_timeline()`
 - Initiate `requests` session
 - Populate HTTP headers
 - Download nitter page of user
 - Make soup
 - Build a list with the soup of each timeline item
 - Iterate list
    - If individual tweet, add to final list
    - If first tweet of a thread, get the thread from the tweet page - `_get_rest_of_thread()`
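The final loop of `get_timeline()` dispatches each timeline item either straight into the result or through the thread fetcher. The sketch below isolates that loop; it assumes each item is a dict with a hypothetical `thread_start` flag standing in for the soup test, and a callable standing in for `_get_rest_of_thread()` that returns the thread's tweets in the order they should be added. The download itself (a `requests.Session()` and `session.get(url, headers=...)`) is left out.

```python
def flatten_timeline(items, get_rest_of_thread):
    """Build the final list of tweets from the timeline items.

    `items`: hypothetical dicts standing in for the soup of each item.
    `get_rest_of_thread`: stand-in for `_get_rest_of_thread()`, assumed to
    return the list of tweets to add for a thread.
    """
    timeline = []
    for item in items:
        if item.get("thread_start"):
            # First tweet of a thread: take the thread from the tweet page
            timeline.extend(get_rest_of_thread(item))
        else:
            # Individual tweet: keep as is
            timeline.append(item)
    return timeline
```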
 # `_get_rest_of_thread()`
 - Download page
 - Make soup
 - Get all items in thread after main tweet
 - Build list in which each item references the previous tweet
 - Reverse timeline order
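The last two steps of `_get_rest_of_thread()` (link each item to the tweet before it, then reverse) can be sketched on bare tweet IDs. Assumptions: `later_ids` is in page order, oldest first, the first of them follows the main tweet, and the reversed result is meant to match the newest-first order of the user timeline. The dict keys are hypothetical.

```python
def link_and_reverse(main_id, later_ids):
    """Attach to each thread item the ID of the tweet it follows, newest first."""
    linked = []
    previous = main_id  # the first item after the main tweet replies to it
    for tweet_id in later_ids:
        linked.append({"tweet_id": tweet_id, "replied_to": previous})
        previous = tweet_id
    # Reverse so the thread is in the same (assumed newest-first) order
    linked.reverse()
    return linked
```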
 # `is_time_valid()`
 - Compare the tweet timestamp to `tweet_delay` and `tweet_max_age`
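A minimal sketch of the comparison in `is_time_valid()`: a tweet qualifies when it is at least `tweet_delay` old and at most `tweet_max_age` old. The units are an assumption (`tweet_max_age` in days, `tweet_delay` in minutes, timestamps in seconds since the epoch); `now` is an extra parameter added only to make the function testable.

```python
import time


def is_time_valid(timestamp, tweet_max_age, tweet_delay, now=None):
    """True if the tweet is old enough to post but not too old.

    Assumed units: tweet_max_age in days, tweet_delay in minutes,
    timestamp and now in seconds since the epoch.
    """
    if now is None:
        now = time.time()
    age = now - timestamp
    return tweet_delay * 60 <= age <= tweet_max_age * 86400
```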
 # `process_media_body()`
 - Copy plain text
 - Convert links starting with @ and # to plain text
 - Remove redirection from links - `deredir_url()`
 - Substitute source from links - `substitute_source()`
 - Remove trackers from query and fragments - `clean_url()`
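The flow of `process_media_body()` can be sketched without HTML parsing: plain text is copied, `@mention` and `#hashtag` links become their plain text, and every other link is passed through the URL filters in order. The segment representation (strings, or `(link_text, href)` tuples for `<a>` tags) is hypothetical, and `url_filters` stands in for `deredir_url()`, `substitute_source()` and `clean_url()`.

```python
def process_media_body(segments, url_filters):
    """Flatten a tweet body into plain text.

    `segments`: hypothetical stand-in for the tweet HTML, made of plain
    strings and (link_text, href) tuples for links.
    `url_filters`: callables applied in order to each remaining URL.
    """
    parts = []
    for seg in segments:
        if isinstance(seg, str):
            parts.append(seg)  # plain text is copied as is
            continue
        text, href = seg
        if text.startswith(("@", "#")):
            parts.append(text)  # mentions and hashtags become plain text
        else:
            for url_filter in url_filters:
                href = url_filter(href)
            parts.append(href)
    return "".join(parts)
```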
 # `process_card()`
 - Get list of image URLs in card tag
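Collecting the image URLs of a card can be sketched with the standard `html.parser` module instead of BeautifulSoup. The `base_url` prefix for relative `src` values (such as nitter's `/pic/...` paths) is an assumption for illustration.

```python
from html.parser import HTMLParser


class _ImgCollector(HTMLParser):
    """Record the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def process_card(card_html, base_url=""):
    """Return the image URLs found in a card, making relative ones absolute."""
    parser = _ImgCollector()
    parser.feed(card_html)
    parser.close()
    return [base_url + u if u.startswith("/") else u for u in parser.urls]
```

Self-closing `<img ... />` tags are handled too, because `HTMLParser.handle_startendtag()` calls `handle_starttag()` by default.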
 # `process_attachments()`
 - Collect URLs of images
 - Download nitter video (converted animated GIF) and save it in output directory
 - Download twitter video by calling `youtube_dl` and save it in output directory
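Saving a nitter video into the output directory amounts to copying a binary stream to a file. The sketch below assumes `stream` is any binary file-like object, for instance the response of `urllib.request.urlopen()`; the helper name `save_video` is hypothetical. Twitter-hosted videos go through `youtube_dl` instead, whose `YoutubeDL(options).download([url])` call writes the file itself, so that path is not reproduced here.

```python
import os
import shutil


def save_video(stream, output_dir, filename):
    """Copy a downloaded video stream into output_dir and return its path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, filename)
    with open(path, "wb") as out:
        # Copies in chunks, so the whole video is never held in memory
        shutil.copyfileobj(stream, out)
    return path
```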
 # `update_profile()`
 - Extract banner and avatar picture addresses from soup
 - Get the banner and avatar picture addresses from database
 - If user record not found in db, create a new one
 - If they have changed
    - Download banner and avatar pictures
    - Log in to Mastodon - `login()`
    - Update credentials
    - Record image URLs in database
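The database side of `update_profile()` (create the user record if missing, detect changed banner or avatar URLs, store the new ones) can be sketched with `sqlite3`. The `profiles` table and its columns are hypothetical; downloading the pictures and updating the Mastodon credentials are left out.

```python
import sqlite3


def profile_changed(db, account, avatar_url, banner_url):
    """True if the avatar or banner URL differs from the stored one.

    Creates the (hypothetical) profiles table and the user record if missing.
    """
    db.execute(
        "CREATE TABLE IF NOT EXISTS profiles "
        "(twitter_account TEXT PRIMARY KEY, avatar_url TEXT, banner_url TEXT)"
    )
    row = db.execute(
        "SELECT avatar_url, banner_url FROM profiles WHERE twitter_account = ?",
        (account,),
    ).fetchone()
    if row is None:
        # New user: empty record, so the images count as changed
        db.execute("INSERT INTO profiles VALUES (?, NULL, NULL)", (account,))
        row = (None, None)
    return row != (avatar_url, banner_url)


def record_profile(db, account, avatar_url, banner_url):
    """Store the image URLs once the Mastodon profile has been updated."""
    db.execute(
        "UPDATE profiles SET avatar_url = ?, banner_url = ? WHERE twitter_account = ?",
        (avatar_url, banner_url, account),
    )
    db.commit()
```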
 # `login()`
 - Create Mastodon application
 - Log in with password if provided
 - Log in with token
 # `deredir_url()`
 - Populate HTTP headers
 - Download the page, following redirections, to obtain the final URL
 # `substitute_source()`
 - Parse URL
 - Substitute domain values from config
 - Unparse URL
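The parse / substitute / unparse sequence of `substitute_source()` maps onto `urllib.parse`. The `SUBSTITUTIONS` mapping below is a hypothetical stand-in for the domain values read from the config, and treating `www.` as equivalent to the bare domain is an assumption.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical config values: original domain -> replacement domain
SUBSTITUTIONS = {
    "twitter.com": "nitter.example.net",
    "youtube.com": "invidious.example.org",
}


def substitute_source(url, substitutions=SUBSTITUTIONS):
    """Replace the domain of url if it appears in the substitution table."""
    parsed = urlparse(url)
    host = parsed.netloc
    # Assumption: "www.example.com" is substituted like "example.com"
    bare = host[4:] if host.startswith("www.") else host
    if bare in substitutions:
        parsed = parsed._replace(netloc=substitutions[bare])
    return urlunparse(parsed)
```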
 # `clean_url()`
 - Parse URL
 - Remove UTM parameters from query and fragments
 - Unparse URL
 # `_remove_trackers_query(url_parsed.query)`
 # `_remove_trackers_fragment(url_parsed.fragment)`
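`clean_url()` and its two helpers can be sketched with `urllib.parse`. This assumes "UTM parameters" means keys starting with `utm_`, and that a fragment is only filtered when it looks like `key=value` pairs, so ordinary anchors survive. Re-encoding with `urlencode()` may change how surviving values are escaped (for example a `%20` becomes `+`).

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def _remove_trackers_query(query):
    """Drop utm_* parameters from a query string."""
    kept = [(k, v) for k, v in parse_qsl(query, keep_blank_values=True)
            if not k.startswith("utm_")]
    return urlencode(kept)


def _remove_trackers_fragment(fragment):
    """Drop utm_* parameters when the fragment is made of key=value pairs."""
    if "=" not in fragment:
        return fragment  # an ordinary anchor such as "section-2"
    return _remove_trackers_query(fragment)


def clean_url(url):
    """Parse, strip tracking parameters from query and fragment, unparse."""
    parsed = urlparse(url)
    parsed = parsed._replace(
        query=_remove_trackers_query(parsed.query),
        fragment=_remove_trackers_fragment(parsed.fragment),
    )
    return urlunparse(parsed)
```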