mirror of
https://gitlab.com/jeancf/twoot.git
synced 2024-11-23 20:11:11 +00:00
Add description of file structure
parent
3d0262f005
commit
b45e60a778
127
structure.md
Normal file
@@ -0,0 +1,127 @@
# `main()`

- Start timer
- Parse command line
- Build config object - `build_config()`
- Set up logging
- Open or create database
- Select nitter instance to use
- Get soup of whole page and timeline (list of soup of items) - `get_timeline()`
- Iterate timeline to generate list of dicts with content of each tweet:
    - Extract tweet ID
    - Extract timestamp
    - Skip if timestamp is not within acceptable range - `is_time_valid()`
    - Skip if it is a retweet and retweets are excluded
    - Check database for the tweet and skip it if it already exists
    - Extract author name
    - Extract twitter user name of author
    - Extract full status page URL
    - Add prefix if tweet is reply-to
    - Add prefix if tweet is retweet
    - Process media body - `process_media_body()`
    - Add link to quoted page ("card")
    - Extract image(s) from card - `process_card()`
    - Process video and image attachments - `process_attachments()`
    - Add custom footer
    - Add "Original tweet" footer
    - Add optional timestamp to footer
    - If no media, look for image in linked URL
    - Get filename of downloaded video
- Update user profile if necessary - `update_profile()`
- Log in to Mastodon instance - `login()`
- Check toot character limit
- Iterate list of tweets:
    - Check that toot cap is not reached
    - Upload video if applicable (previously downloaded)
    - If no video and applicable, download and upload pic
    - Find toot ID of replied_to_tweet in database
    - Post toot + insert in database
- Clean up downloaded video files
- Delete excess records in database
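
The database steps in the outline above (skip tweets already present, insert after posting, delete excess records) could look roughly like the sketch below. The table layout, the function names and the `keep` parameter are assumptions made for illustration, not the schema the project actually uses.

```python
import sqlite3


def open_db(path=":memory:"):
    """Open or create the database. The two-column table is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS toots (tweet_id TEXT PRIMARY KEY, toot_id TEXT)"
    )
    return db


def already_posted(db, tweet_id):
    """Return True if the tweet has been recorded before."""
    row = db.execute(
        "SELECT 1 FROM toots WHERE tweet_id = ?", (tweet_id,)
    ).fetchone()
    return row is not None


def record_toot(db, tweet_id, toot_id):
    """Remember which toot was posted for a given tweet."""
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT INTO toots (tweet_id, toot_id) VALUES (?, ?)",
            (tweet_id, toot_id),
        )


def prune(db, keep):
    """Delete all but the `keep` most recently inserted records."""
    with db:
        db.execute(
            "DELETE FROM toots WHERE rowid NOT IN "
            "(SELECT rowid FROM toots ORDER BY rowid DESC LIMIT ?)",
            (keep,),
        )
```

Keeping the stored toot ID next to the tweet ID is what would make the "find toot ID of replied_to_tweet" step a single lookup.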

# `build_config()`

- Instantiate `global TOML` struct
- Populate TOML with default values
- Load config file and overwrite all valid keys with values read from file
- If no config file, overwrite all valid keys with values read from the command line
- Verify that a minimum valid config is present
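
The layering described above (defaults first, then either the config file or the command line, then a validity check) can be sketched as follows. The section names, the keys in `DEFAULTS` and the rule for what counts as a minimum valid config are all assumptions. The sketch also returns a dict instead of filling a global, and takes already-parsed values so that file and argument parsing stay out of the picture.

```python
from copy import deepcopy

# Hypothetical defaults; the real option names and sections may differ.
DEFAULTS = {
    "config": {"twitter_account": "", "mastodon_instance": "", "mastodon_user": ""},
    "options": {"tweet_max_age": 1.0, "tweet_delay": 0.0, "toot_cap": 0},
}


def build_config(file_values=None, cli_values=None):
    """Layer values over the defaults: config file if given, else command line."""
    toml = deepcopy(DEFAULTS)
    source = file_values if file_values is not None else (cli_values or {})
    for section, values in source.items():
        if section not in toml:
            continue  # ignore unknown sections
        for key, value in values.items():
            if key in toml[section]:  # only overwrite valid (known) keys
                toml[section][key] = value
    # Assumed rule: every key in the "config" section must be set.
    missing = [key for key, value in toml["config"].items() if not value]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    return toml
```

Copying the defaults before overwriting keeps `DEFAULTS` itself unchanged, so repeated calls start from the same baseline.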

# `get_timeline()`

- Initiate requests session
- Populate headers
- Download nitter page of user
- Make soup
- Build a list with soup of each timeline item
- Iterate list:
    - If individual tweet, add to final list
    - If first tweet of thread, get the thread from tweet page - `_get_rest_of_thread()`

# `_get_rest_of_thread()`

- Download page
- Make soup
- Get all items in thread after main tweet
- Build list with references to previous tweet
- Reverse timeline order

# `is_time_valid()`

- Compare timestamp to `tweet_delay` and `tweet_max_age`
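
A minimal sketch of that comparison, assuming the timestamp is in seconds since the epoch, `tweet_delay` is a minimum age in minutes and `tweet_max_age` is a maximum age in days. The units and the extra `now` parameter (added so the check can be tested deterministically) are assumptions, not taken from the outline.

```python
from datetime import datetime, timedelta, timezone


def is_time_valid(timestamp, tweet_delay, tweet_max_age, now=None):
    """Return True if a tweet is old enough to post but not too old.

    timestamp     -- tweet time in seconds since the epoch (assumed)
    tweet_delay   -- minimum age in minutes (assumed unit)
    tweet_max_age -- maximum age in days (assumed unit)
    """
    now = now or datetime.now(timezone.utc)
    age = now - datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return timedelta(minutes=tweet_delay) <= age <= timedelta(days=tweet_max_age)
```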

# `process_media_body()`

- Copy plain text
- Convert links starting with @ and # to plain text
- Remove redirection from links - `deredir_url()`
- Substitute source from links - `substitute_source()`
- Remove trackers from query and fragments - `clean_url()`

# `process_card()`

- Get list of image URLs in card tag

# `process_attachments()`

- Collect URLs of images
- Download nitter video (converted animated GIF) and save it in output directory
- Download twitter video by calling `youtube_dl` and save it in output directory

# `update_profile()`

- Extract banner and avatar picture addresses from soup
- Get the banner and avatar picture addresses from database
- If user record not found in db, create a new one
- If the addresses have changed:
    - Download banner and avatar pictures
    - Log in to Mastodon - `login()`
    - Update credentials
    - Record image URLs in database

# `login()`

- Create Mastodon application
- Log in with password if provided
- Log in with token

# `deredir_url()`

- Populate HTTP headers
- Download the page

# `substitute_source()`

- Parse URL
- Substitute domain values from config
- Unparse URL
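
The parse, substitute, unparse sequence above maps naturally onto `urllib.parse`. In this sketch the `SUBSTITUTIONS` mapping and its example domains are hypothetical stand-ins for whatever the config actually provides.

```python
from urllib.parse import urlparse, urlunparse

# Hypothetical mapping from an original domain to a replacement front end.
SUBSTITUTIONS = {
    "twitter.com": "nitter.example.org",
    "youtube.com": "invidious.example.org",
}


def substitute_source(url, substitutions=SUBSTITUTIONS):
    """Swap the domain of `url` if a replacement is configured for it."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www.example.com like example.com
    replacement = substitutions.get(host)
    if replacement is None:
        return url  # no substitution configured, leave the URL untouched
    return urlunparse(parsed._replace(netloc=replacement))
```

Only the network location is replaced, so the path, query and fragment of the original link survive unchanged.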

# `clean_url()`

- Parse URL
- Remove UTM parameters from query and fragments
- Unparse URL
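
One possible shape for these steps is sketched below. It uses a single shared helper, `_remove_trackers()`, for both the query and the fragment; that name is hypothetical and the helper is a simplification, not a reproduction of the separate query and fragment helpers the file lists.

```python
from urllib.parse import urlparse, urlunparse


def _remove_trackers(component):
    """Drop '&'-separated parts whose key starts with 'utm_'."""
    kept = [part for part in component.split("&")
            if not part.lower().startswith("utm_")]
    return "&".join(kept)


def clean_url(url):
    """Return `url` without UTM parameters in its query or fragment."""
    parsed = urlparse(url)
    cleaned = parsed._replace(
        query=_remove_trackers(parsed.query),
        fragment=_remove_trackers(parsed.fragment),
    )
    return urlunparse(cleaned)
```

Splitting on `&` instead of decoding and re-encoding the query leaves the remaining parameters byte-for-byte as they were, and ordinary fragments such as `#section-2` pass through untouched.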

# `_remove_trackers_query(url_parsed.query)`

# `_remove_trackers_fragment(url_parsed.fragment)`