Mirror of https://gitlab.com/jeancf/twoot.git
main()
- Start timer
- Parse command line
- Build config object - build_config()
- Setup logging
- Open or create database
- Select nitter instance to use
- Get soup of whole page and timeline (list of soup of items) - get_timeline()
- Iterate timeline to generate list of dicts with content of each tweet:
- Extract tweet ID
- Extract timestamp
- Skip if timestamp is not within acceptable range - is_time_valid()
- Skip if it is a retweet and retweets are excluded
- Check database if tweet already exists and skip if it does
- Extract author name
- Extract twitter user name of author
- Extract full status page URL
- Add prefix if tweet is reply-to
- Add prefix if tweet is retweet
- Process media body - process_media_body()
- Add link to quoted page ("card")
- Extract image(s) from card - process_card()
- Process video and image attachments - process_attachments()
- Add custom footer
- Add "Original tweet" footer
- Add optional timestamp to footer
- If no media, look for image in linked URL
- Get filename of downloaded video
- Update user profile if necessary - update_profile()
- Login to Mastodon instance - login()
- Check toot character limit
- Iterate list of tweets
- Check that the toot cap has not been reached
- Upload video if applicable (previously downloaded)
- If no video and applicable, download and upload pic
- Find in database toot id of replied_to_tweet
- Post toot + insert in database
- Clean up downloaded video files
- Delete excess records in database
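The "check database if tweet already exists" step in the loop above can be sketched with the standard sqlite3 module. The table name "toots" and its columns are illustrative assumptions, not twoot's actual schema:

```python
import sqlite3

def tweet_seen(db, tweet_id):
    # Look up the tweet ID in the crossposting history table.
    # Schema ("toots" table with tweet_id/toot_id columns) is assumed.
    cur = db.execute("SELECT 1 FROM toots WHERE tweet_id = ?", (tweet_id,))
    return cur.fetchone() is not None

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE toots (tweet_id TEXT, toot_id TEXT)")
db.execute("INSERT INTO toots VALUES ('1594128393289227264', '109384998273')")
print(tweet_seen(db, "1594128393289227264"))  # True
print(tweet_seen(db, "0"))                    # False
```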
build_config()
- Instantiate global TOML struct and populate it with default values
- Load config file and overwrite all valid keys with values read from it
- If there is no config file, overwrite all valid keys with values read from the command line
- Verify that a minimum valid config is present
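The precedence described above (defaults, then config file, with the command line used only when there is no config file) can be sketched as a dictionary merge. The key names and the validity check below are invented for illustration:

```python
def build_config(defaults, file_values=None, cli_values=None):
    """Merge config sources per the outline: file values win, CLI is fallback."""
    config = dict(defaults)
    # Only keys already present in the defaults count as valid keys.
    source = file_values if file_values is not None else (cli_values or {})
    for key, value in source.items():
        if key in config and value is not None:
            config[key] = value
    # Minimal validity check; the real one covers more settings.
    if not config.get("twitter_account"):
        raise ValueError("twitter_account is required")
    return config

defaults = {"twitter_account": "", "toot_visibility": "public"}
cfg = build_config(defaults, file_values={"twitter_account": "jack"})
print(cfg["twitter_account"])  # jack
```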
get_timeline()
- Initiate requests session
- Populate headers
- Download nitter page of user
- Make soup
- Build a list with soup of each timeline item
- Iterate list
- If individual tweet, add to final list
- If first tweet of a thread, get the thread from the tweet page - _get_rest_of_thread()
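The "build a list with soup of each timeline item" step can be sketched with BeautifulSoup. The "timeline-item" CSS class is an assumption about nitter's markup; verify it against the instance you use:

```python
from bs4 import BeautifulSoup

def extract_timeline_items(html):
    # Each tweet on a nitter page is assumed to carry a "timeline-item" class.
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(class_="timeline-item")

html = (
    '<div class="timeline">'
    '<div class="timeline-item">tweet 1</div>'
    '<div class="timeline-item">tweet 2</div>'
    "</div>"
)
items = extract_timeline_items(html)
print(len(items))  # 2
```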
_get_rest_of_thread()
- Download page
- Make soup
- Get all items in thread after main tweet
- Build list with references to the previous tweet
- Reverse timeline order
is_time_valid()
- Compare timestamp to tweet_delay and tweet_max_age
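A minimal sketch of that comparison, treating both settings as seconds for simplicity (the real config may express them in other units):

```python
def is_time_valid(timestamp, now, tweet_delay, tweet_max_age):
    """True if the tweet is at least tweet_delay old but no older than tweet_max_age."""
    age = now - timestamp
    return tweet_delay <= age <= tweet_max_age

now = 1_000_000
print(is_time_valid(now - 3600, now, tweet_delay=600, tweet_max_age=86400))  # True
print(is_time_valid(now - 60, now, tweet_delay=600, tweet_max_age=86400))    # False: too recent
```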
process_media_body()
- Copy plain text
- Convert links starting with @ and # to plain text
- Remove redirection from links - deredir_url()
- Substitute source from links - substitute_source()
- Remove trackers from fragments - clean_url()
process_card()
- Get list of image URLs in card tag
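Collecting the image URLs from the card can be sketched as below; the assumption that card thumbnails are plain `<img>` tags is illustrative, not a statement about nitter's actual markup:

```python
from bs4 import BeautifulSoup

def process_card(card_html):
    # Gather src attributes of all <img> tags inside the card fragment.
    soup = BeautifulSoup(card_html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]

urls = process_card('<div class="card"><img src="/pic/card.jpg"><a href="x">link</a></div>')
print(urls)  # ['/pic/card.jpg']
```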
process_attachments()
- Collect URLs of images
- Download nitter video (converted animated GIF) and save it in output directory
- Download twitter video by calling youtube_dl and save it in output directory
update_profile()
- Extract banner and avatar picture addresses from soup
- Get the banner and avatar picture addresses from database
- If user record not found in db, create a new one
- If they have changed
- Download banner and avatar pictures
- Login to Mastodon - login()
- Update credentials
- Record image URLs in database
login()
- Create Mastodon application
- Login with password if provided
- Login with token
deredir_url()
- Populate HTTP headers
- Download the page
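Removing a redirection amounts to fetching the link and keeping the final URL. The sketch below uses the standard library rather than whatever HTTP client twoot actually uses, and the User-Agent string is a placeholder; the injectable `opener` lets the example run offline:

```python
from urllib.request import Request, urlopen

def deredir_url(url, opener=urlopen):
    """Follow redirects and return the final URL; keep the original on failure."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})  # placeholder UA
    try:
        with opener(req) as resp:
            return resp.geturl()
    except Exception:
        return url

# Offline demonstration with a stand-in opener:
class FakeResponse:
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        return False
    def geturl(self):
        return "https://example.com/article"

print(deredir_url("https://t.co/abc", opener=lambda req: FakeResponse()))
```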
substitute_source()
- Parse URL
- Substitute domain values from config
- Unparse URL
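The parse/substitute/unparse sequence maps directly onto urllib.parse; the substitution table below is illustrative, whereas twoot reads its domain mappings from the TOML config:

```python
from urllib.parse import urlparse, urlunparse

def substitute_source(url, substitutions):
    """Replace the URL's domain according to a mapping of old -> new domains."""
    parts = urlparse(url)
    replacement = substitutions.get(parts.netloc)
    if replacement:
        parts = parts._replace(netloc=replacement)
    return urlunparse(parts)

print(substitute_source("https://twitter.com/jack/status/20",
                        {"twitter.com": "nitter.net"}))
# https://nitter.net/jack/status/20
```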
clean_url()
- Parse URL
- Remove UTM parameters from query and fragments
- Unparse URL
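A minimal sketch of that cleanup, dropping UTM parameters from the query string and a UTM-style fragment; the exact set of trackers twoot strips may be broader:

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

def clean_url(url):
    """Strip utm_* tracking parameters from the query string and fragment."""
    parts = urlparse(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_")]
    fragment = "" if parts.fragment.startswith("utm_") else parts.fragment
    return urlunparse(parts._replace(query=urlencode(query), fragment=fragment))

print(clean_url("https://example.com/post?id=7&utm_source=tw&utm_medium=social"))
# https://example.com/post?id=7
```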