# `main()`

- Start timer
- Parse command line
- Build config object - `build_config()`
- Setup logging
- Open or create database
- Select nitter instance to use
- Get soup of whole page and timeline (list of soup of items) - `get_timeline()`
- Iterate timeline to generate list of dicts with content of each tweet:
  - Extract tweet ID
  - Extract timestamp
  - Skip if timestamp is not within acceptable range - `is_time_valid()`
  - Skip if it is a retweet and retweets are excluded
  - Check database if tweet already exists and skip if it does
  - Extract author name
  - Extract twitter user name of author
  - Extract full status page URL
  - Add prefix if tweet is reply-to
  - Add prefix if tweet is retweet
  - Process media body - `process_media_body()`
  - Add link to quoted page ("card")
  - Extract image(s) from card - `process_card()`
  - Process video and image attachments - `process_attachments()`
  - Add custom footer
  - Add "Original tweet" footer
  - Add optional timestamp to footer
  - If no media, look for image in linked URL
  - Get filename of downloaded video
- Update user profile if necessary - `update_profile()`
- Login to Mastodon instance - `login()`
- Check toot character limit
- Iterate list of tweets:
  - Check that toot cap is not reached
  - Upload video if applicable (previously downloaded)
  - If no video and applicable, download and upload pic
  - Find in database toot id of `replied_to_tweet`
  - Post toot + insert in database
- Clean up downloaded video files
- Delete excess records in database
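The "skip if tweet already exists" step can be sketched with an in-memory SQLite database. The table and column names below are hypothetical stand-ins, not twoot's actual schema:

```python
import sqlite3

# In-memory database standing in for twoot's on-disk one (schema is illustrative)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE toots (tweet_id TEXT PRIMARY KEY, toot_id TEXT)")
db.execute("INSERT INTO toots VALUES ('111', 'aaa')")

def already_posted(db, tweet_id):
    """Return True if this tweet ID was already crossposted."""
    row = db.execute(
        "SELECT 1 FROM toots WHERE tweet_id = ?", (tweet_id,)
    ).fetchone()
    return row is not None

print(already_posted(db, "111"))  # True: skip this tweet
print(already_posted(db, "222"))  # False: process it
```

Keying on the tweet ID makes the crossposter idempotent across runs: re-fetching the same timeline never posts a duplicate toot.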
# `build_config()`

- Instantiate global `TOML` struct
- Populate TOML with default values
- Load config file and (over)write all valid keys with values read from file
- If no config file, (over)write all valid keys with values read from the command line
- Verify that a minimum valid config is present
# `get_timeline()`

- Initiate requests session
- Populate headers
- Download nitter page of user
- Make soup
- Build a list with soup of each timeline item
- Iterate list:
  - If individual tweet, add to final list
  - If first tweet of thread, get the thread from tweet page - `_get_rest_of_thread()`
# `_get_rest_of_thread()`

- Download page
- Make soup
- Get all items in thread after main tweet
- Build list with references to the previous tweet
- Reverse timeline order
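The last two steps can be sketched with plain data; the dicts below are simplified stand-ins for the soup items:

```python
def link_and_reverse(thread):
    """Attach to each item a reference to the tweet it replies to,
    then reverse so the newest item comes first, as in a timeline."""
    linked = []
    previous_id = None
    for item in thread:  # thread is in chronological order
        linked.append({**item, "replied_to": previous_id})
        previous_id = item["id"]
    linked.reverse()
    return linked

thread = [{"id": "1"}, {"id": "2"}, {"id": "3"}]
timeline = link_and_reverse(thread)
```

Recording the previous tweet's ID while walking forward is what later lets `main()` look up the toot ID of the replied-to tweet and post the thread as Mastodon replies.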
# `is_time_valid()`

- Compare timestamp to `tweet_delay` and `tweet_max_age`
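A minimal sketch of this check; the units (hours for the delay, days for the max age) and the exact comparison are assumptions for illustration:

```python
from datetime import datetime, timedelta, timezone

def is_time_valid(timestamp, tweet_delay_hours, tweet_max_age_days):
    """A tweet qualifies only if it is older than the posting delay
    and younger than the maximum age."""
    age = datetime.now(timezone.utc) - timestamp
    return timedelta(hours=tweet_delay_hours) <= age <= timedelta(days=tweet_max_age_days)

now = datetime.now(timezone.utc)
```

The lower bound (`tweet_delay`) lets the author delete a fresh tweet before it is mirrored; the upper bound keeps old tweets from flooding a new Mastodon account.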
# `process_media_body()`

- Copy plain text
- Convert links starting with @ and # to plain text
- Remove redirection from links - `deredir_url()`
- Substitute source from links - `substitute_source()`
- Remove trackers from fragments - `clean_url()`
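Nitter renders mentions and hashtags as `<a>` links; the second step replaces them with their plain text. A regex-based sketch (the real code works on a BeautifulSoup tree, not raw HTML, and the markup below is illustrative):

```python
import re

# Match an anchor whose text starts with @ or # and keep only the text
MENTION_OR_HASHTAG = re.compile(r'<a href="[^"]*">([@#][^<]+)</a>')

def links_to_plain_text(html):
    return MENTION_OR_HASHTAG.sub(r"\1", html)

body = 'Hello <a href="/jeancf">@jeancf</a> about <a href="/search?q=%23python">#python</a>'
```

Dropping the nitter-relative hrefs matters because they would be dead links once the text is posted on a Mastodon instance.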
# `process_card()`

- Get list of image URLs in card tag
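twoot does this with BeautifulSoup; as a self-contained sketch, the same extraction with the stdlib parser (the card markup below is illustrative, not nitter's exact HTML):

```python
from html.parser import HTMLParser

class CardImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in the card markup."""
    def __init__(self):
        super().__init__()
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

card_html = '<a class="card-container"><img src="/pic/card1.jpg"></a>'
collector = CardImageCollector()
collector.feed(card_html)
```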
# `process_attachments()`

- Collect URLs of images
- Download nitter video (converted animated GIF) and save it in output directory
- Download twitter video by calling `youtube_dl` and save it in output directory
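The choice between the two download paths can be sketched as a dispatch on the attachment URL. The URL pattern used here is an assumption for illustration, not nitter's actual markup:

```python
def pick_video_downloader(attachment_url):
    """Decide how a video attachment would be fetched (hypothetical heuristic)."""
    if attachment_url.endswith(".mp4"):
        # nitter serves animated GIFs converted to plain .mp4 files
        return "direct"
    # anything else is a real twitter video: let youtube_dl resolve the stream
    return "youtube_dl"
```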
# `update_profile()`

- Extract banner and avatar picture addresses from soup
- Get the banner and avatar picture addresses from database
- If user record not found in db, create a new one
- If they have changed:
  - Download banner and avatar pictures
  - Login to Mastodon - `login()`
  - Update credentials
  - Record image URLs in database
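The change-detection logic can be sketched with an in-memory SQLite table; the schema is hypothetical, and recording the new URLs is left out for brevity:

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: one row per followed twitter account
db.execute("CREATE TABLE profiles (account TEXT PRIMARY KEY, avatar TEXT, banner TEXT)")

def profile_changed(db, account, avatar, banner):
    """Create the record if missing; report whether the images changed."""
    row = db.execute(
        "SELECT avatar, banner FROM profiles WHERE account = ?", (account,)
    ).fetchone()
    if row is None:
        db.execute("INSERT INTO profiles VALUES (?, ?, ?)", (account, avatar, banner))
        return True
    return row != (avatar, banner)

first = profile_changed(db, "jeancf", "a1.jpg", "b1.jpg")   # new record: update
second = profile_changed(db, "jeancf", "a1.jpg", "b1.jpg")  # unchanged: skip
third = profile_changed(db, "jeancf", "a2.jpg", "b1.jpg")   # avatar changed: update
```

Comparing stored URLs avoids logging in and re-uploading the images on every run when nothing changed.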
# `login()`

- Create Mastodon application
- Login with password if provided
- Login with token
# `deredir_url()`

- Populate HTTP headers
- Download the page
# `substitute_source()`

- Parse URL
- Substitute domain values from config
- Unparse URL
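The parse/substitute/unparse round trip can be sketched with `urllib.parse`. The twitter-to-nitter pair is an example mapping as it might come from the config file, not a fixed rule:

```python
from urllib.parse import urlparse, urlunparse

# Example mapping of original domains to replacements (would come from config)
SUBSTITUTIONS = {"twitter.com": "nitter.net"}

def substitute_source(url):
    parts = urlparse(url)
    # Swap the domain if it is in the mapping; leave everything else untouched
    netloc = SUBSTITUTIONS.get(parts.netloc, parts.netloc)
    return urlunparse(parts._replace(netloc=netloc))
```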
# `clean_url()`

- Parse URL
- Remove UTM parameters from query and fragments
- Unparse URL
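A minimal sketch of the query-string half of this cleanup (the real function also cleans the fragment, via the helper below; that part is omitted here for brevity):

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def clean_url(url):
    """Drop utm_* tracking parameters from the query string."""
    parts = urlparse(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    return urlunparse(parts._replace(query=urlencode(kept)))
```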
# `_remove_trackers_query(url_parsed.query)`

# `_remove_trackers_fragment(url_parsed.fragment)`