Add description of file structure

jeancf 2023-09-14 16:36:15 +02:00
parent 3d0262f005
commit b45e60a778

structure.md Normal file

@ -0,0 +1,127 @@
# `main()`
- Start timer
- Parse command line
- Build config object - `build_config()`
- Setup logging
- Open or create database
- Select nitter instance to use
- Get soup of whole page and timeline (list of soup of items) - `get_timeline()`
- Iterate timeline to generate list of dicts with content of each tweet:
    - Extract tweet ID
    - Extract timestamp
    - Skip if timestamp is not within acceptable range - `is_time_valid()`
    - Skip if it is a retweet and retweets are excluded
    - Check database if tweet already exists and skip if it does
    - Extract author name
    - Extract twitter user name of author
    - Extract full status page URL
    - Add prefix if tweet is reply-to
    - Add prefix if tweet is retweet
    - Process media body - `process_media_body()`
    - Add link to quoted page ("card")
    - Extract image(s) from card - `process_card()`
    - Process video and image attachments - `process_attachments()`
    - Add custom footer
    - Add "Original tweet" footer
    - Add optional timestamp to footer
    - If no media, look for image in linked URL
    - Get filename of downloaded video
- Update user profile if necessary - `update_profile()`
- Login to Mastodon instance - `login()`
- Check toot character limit
- Iterate list of tweets
    - Check if toot cap not reached
    - Upload video if applicable (previously downloaded)
    - If no video and applicable, download and upload pic
    - Find in database toot id of replied_to_tweet
    - Post toot + insert in database
- Clean up downloaded video files
- Delete excess records in database
# `build_config()`
- Instantiate `global TOML` struct
- Populate TOML with default values
- Load config file and (over)write all valid keys with values read from file
- If no config file, (over)write all valid keys with values read from the command line
- Verify that a minimum valid config is present
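
The steps above amount to a "defaults, then overrides, then validation" pattern. The following is a minimal sketch of that pattern, not the actual implementation: the function names `merge_valid_keys` and `is_minimal_config`, the section names, and the keys `twitter_account` and `mastodon_instance` are assumptions made for illustration (only `tweet_delay` and `tweet_max_age` are named in this document).

```python
import copy

# Hypothetical defaults. Section and key names are assumptions,
# except tweet_delay and tweet_max_age, which this document mentions.
DEFAULTS = {
    "config": {"twitter_account": "", "mastodon_instance": ""},
    "options": {"tweet_max_age": 1.0, "tweet_delay": 0.0},
}


def merge_valid_keys(defaults, overrides):
    """Return a copy of `defaults` where only already-known keys are overwritten."""
    merged = copy.deepcopy(defaults)
    for section, values in overrides.items():
        # Ignore unknown sections and malformed (non-table) values
        if section not in merged or not isinstance(values, dict):
            continue
        for key, value in values.items():
            if key in merged[section]:  # "valid" key = key that has a default
                merged[section][key] = value
    return merged


def is_minimal_config(cfg):
    """Check that the (assumed) mandatory keys are non-empty."""
    required = ("twitter_account", "mastodon_instance")
    return all(cfg["config"][key] for key in required)
```

The `overrides` dict can come from a parsed config file or, when no file is given, from command-line arguments; unknown keys are silently dropped in both cases.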
# `get_timeline()`
- Initiate requests session
- Populate headers
- Download nitter page of user
- Make soup
- Build a list with soup of each timeline item
- Iterate list
    - If individual tweet, add to final list
    - If first tweet of thread, get the thread from tweet page - `_get_rest_of_thread()`
# `_get_rest_of_thread()`
- Download page
- Make soup
- Get all items in thread after main tweet
- Build list with references to the previous tweet
- Reverse timeline order
# `is_time_valid()`
- Compare timestamp to `tweet_delay` and `tweet_max_age`
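
As an illustration, the comparison could look like the sketch below. The units are an assumption (`tweet_delay` in minutes, `tweet_max_age` in days), as are the parameter names and the optional `now` argument, which is there only to make the function easy to test.

```python
import time


def is_time_valid(timestamp, tweet_delay_min, tweet_max_age_days, now=None):
    """Return True if the tweet is old enough and not too old.

    timestamp: tweet time in seconds since the epoch
    tweet_delay_min: minimum age in minutes (assumed unit)
    tweet_max_age_days: maximum age in days (assumed unit)
    """
    now = time.time() if now is None else now
    age_in_s = now - timestamp
    min_age_in_s = tweet_delay_min * 60
    max_age_in_s = tweet_max_age_days * 86400
    return min_age_in_s <= age_in_s <= max_age_in_s
```

With a delay of 30 minutes and a maximum age of 1 day, a tweet posted one hour ago passes, while one posted a minute ago or two days ago is skipped.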
# `process_media_body()`
- Copy plain text
- Convert links starting with @ and # to plain text
- Remove redirection from links - `deredir_url()`
- Substitute source from links - `substitute_source()`
- Remove trackers from fragments - `clean_url()`
# `process_card()`
- Get list of image URLs in card tag
# `process_attachments()`
- Collect URLs of images
- Download nitter video (converted animated GIF) and save it in output directory
- Download twitter video by calling `youtube_dl` and save it in output directory
# `update_profile()`
- Extract banner and avatar picture addresses from soup
- Get the banner and avatar picture addresses from database
- If user record not found in db, create a new one
- If they have changed
    - Download banner and avatar pictures
    - Login to Mastodon - `login()`
    - Update credentials
    - Record image URLs in database
# `login()`
- Create Mastodon application
- Login with password if provided
- Login with token
# `deredir_url()`
- Populate HTTP headers
- Download the page
# `substitute_source()`
- Parse URL
- Substitute domain values from config
- Unparse URL
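
The parse / substitute / unparse sequence can be sketched with the standard library as follows. The `SUBSTITUTIONS` mapping and its host names are hypothetical placeholders for whatever the config provides, and the signature is an assumption.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical mapping of original domain -> replacement domain,
# standing in for the values read from the config.
SUBSTITUTIONS = {
    "twitter.com": "nitter.example.org",
    "www.youtube.com": "invidious.example.org",
}


def substitute_source(url, substitutions=SUBSTITUTIONS):
    """Replace the domain of `url` if a substitute is configured for it."""
    parsed = urlparse(url)
    replacement = substitutions.get(parsed.netloc.lower())
    if replacement is None:
        return url  # no substitution configured: leave the URL untouched
    return urlunparse(parsed._replace(netloc=replacement))
```

Only the network location is swapped; path, query and fragment are carried over unchanged.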
# `clean_url()`
- Parse URL
- Remove UTM parameters from query and fragments
- Unparse URL
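
A minimal sketch of the same idea, assuming that "UTM parameters" means any `key=value` pair whose key starts with `utm_`. The helper `_strip_utm` is hypothetical and applied to both the query and the fragment here; it is not meant to describe how the helpers listed below are written.

```python
from urllib.parse import urlparse, urlunparse


def _strip_utm(component):
    """Drop '&'-separated parts whose key starts with 'utm_' (hypothetical helper)."""
    parts = [part for part in component.split("&") if part]
    kept = [
        part for part in parts
        if not part.split("=", 1)[0].lower().startswith("utm_")
    ]
    return "&".join(kept)


def clean_url(url):
    """Return `url` without UTM tracking parameters in query or fragment."""
    parsed = urlparse(url)
    cleaned = parsed._replace(
        query=_strip_utm(parsed.query),
        fragment=_strip_utm(parsed.fragment),
    )
    return urlunparse(cleaned)
```

Filtering the raw `&`-separated parts, rather than decoding and re-encoding them, leaves ordinary fragments such as `#section-2` and the remaining parameters byte-for-byte intact.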
# `_remove_trackers_query(url_parsed.query)`
# `_remove_trackers_fragment(url_parsed.fragment)`