diff --git a/structure.md b/structure.md
new file mode 100644
index 0000000..84fe216
--- /dev/null
+++ b/structure.md
@@ -0,0 +1,168 @@
+# `main()`
+
+- Start timer
+- Parse command line
+- Build config object - `build_config()`
+- Setup logging
+- Open or create database
+- Select nitter instance to use
+- Get soup of whole page and timeline (list of soup of items) - `get_timeline()`
+- Iterate timeline to generate list of dicts with content of each tweet:
+  - Extract tweet ID
+  - Extract timestamp
+  - Skip if timestamp is not within acceptable range - `is_time_valid()`
+  - Skip if it is a retweet and retweets are excluded
+  - Skip if tweet already exists in database
+  - Extract author name
+  - Extract twitter user name of author
+  - Extract full status page URL
+  - Add prefix if tweet is reply-to
+  - Add prefix if tweet is retweet
+  - Process media body - `process_media_body()`
+  - Add link to quoted page ("card")
+  - Extract image(s) from card - `process_card()`
+  - Process video and image attachments - `process_attachments()`
+  - Add custom footer
+  - Add "Original tweet" footer
+  - Add optional timestamp to footer
+  - If no media, look for image in linked URL
+  - Get filename of downloaded video
+- Update user profile if necessary - `update_profile()`
+- Login to Mastodon instance - `login()`
+- Check toot character limit
+- Iterate list of tweets:
+  - Check that toot cap is not reached
+  - Upload video if applicable (previously downloaded)
+  - If no video and applicable, download and upload picture
+  - Find toot ID of replied-to tweet in database
+  - Post toot + insert in database
+- Clean up downloaded video files
+- Delete excess records in database
+
+# `build_config()`
+
+- Instantiate global `TOML` struct
+- Populate TOML with default values
+- Load config file and (over)write all valid keys with values read from file
+- If no config file, (over)write all valid keys with values read from the command line
+- Verify that a minimum valid config is present
+
+# `get_timeline()`
+
+- Initiate `requests` session
+- Populate headers
+- Download nitter page of user
+- Make soup
+- Build a list with soup of each timeline item
+- Iterate list:
+  - If individual tweet, add to final list
+  - If first tweet of thread, get the thread from tweet page - `_get_rest_of_thread()`
+
+# `_get_rest_of_thread()`
+
+- Download page
+- Make soup
+- Get all items in thread after main tweet
+- Build list with references to previous tweet
+- Reverse timeline order
+
+# `is_time_valid()`
+
+- Compare timestamp to `tweet_delay` and `tweet_max_age`
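+
+A minimal sketch of this check, assuming `tweet_delay` and `tweet_max_age` are both expressed in days and that the tweet timestamp is a Unix epoch value (units and types are assumptions, not confirmed by the outline):
+
+```python
+import time
+
+def is_time_valid(timestamp: float, tweet_delay: float, tweet_max_age: float) -> bool:
+    """Keep a tweet only if it is at least tweet_delay days old and at most
+    tweet_max_age days old (units and timestamp format assumed for this sketch)."""
+    age_days = (time.time() - timestamp) / 86400  # seconds in a day
+    return tweet_delay <= age_days <= tweet_max_age
+```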
+
+# `process_media_body()`
+
+- Copy plain text
+- Convert links starting with @ and # to plain text
+- Remove redirection from links - `deredir_url()`
+- Substitute source from links - `substitute_source()`
+- Remove trackers from query and fragments - `clean_url()`
+
+# `process_card()`
+
+- Get list of image URLs in card tag
+
+# `process_attachments()`
+
+- Collect URLs of images
+- Download nitter video (converted animated GIF) and save it in output directory
+- Download twitter video by calling `youtube_dl` and save it in output directory
+
+# `update_profile()`
+
+- Extract banner and avatar picture addresses from soup
+- Get the banner and avatar picture addresses from database
+- If user record not found in db, create a new one
+- If they have changed:
+  - Download banner and avatar pictures
+  - Login to Mastodon - `login()`
+  - Update credentials
+  - Record image URLs in database
+
+# `login()`
+
+- Create Mastodon application
+- Login with password if provided
+- Otherwise login with token
+
+# `deredir_url()`
+
+- Populate HTTP headers
+- Download the page to resolve the final (de-redirected) URL
+
+# `substitute_source()`
+
+- Parse URL
+- Substitute domain values from config
+- Unparse URL
+
+# `clean_url()`
+
+- Parse URL
+- Remove UTM parameters from query and fragments - `_remove_trackers_query()`, `_remove_trackers_fragment()`
+- Unparse URL
+
+# `_remove_trackers_query(url_parsed.query)`
+
+# `_remove_trackers_fragment(url_parsed.fragment)`
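+
+A minimal sketch of these two helpers and how `clean_url()` ties them together, assuming a small illustrative set of `utm_*` tracker parameters (the list used by the real code may be longer):
+
+```python
+from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
+
+# Illustrative tracker list; an assumption made for this sketch.
+TRACKERS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}
+
+def _remove_trackers_query(query: str) -> str:
+    """Drop known tracker parameters from a query string."""
+    params = parse_qsl(query, keep_blank_values=True)
+    return urlencode([(k, v) for k, v in params if k.lower() not in TRACKERS])
+
+def _remove_trackers_fragment(fragment: str) -> str:
+    """Drop tracker tokens from a query-shaped fragment such as
+    '#utm_source=...'; leave plain anchors untouched."""
+    return _remove_trackers_query(fragment) if "=" in fragment else fragment
+
+def clean_url(url: str) -> str:
+    """Parse the URL, strip trackers from query and fragment, unparse."""
+    parsed = urlparse(url)
+    return urlunparse(parsed._replace(
+        query=_remove_trackers_query(parsed.query),
+        fragment=_remove_trackers_fragment(parsed.fragment),
+    ))
+```
+
+For example, `clean_url("https://example.com/page?id=1&utm_source=feed#utm_campaign=x")` returns `https://example.com/page?id=1`.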