twoot/structure.md
2023-09-14 16:36:15 +02:00

main()

  • Start timer
  • Parse command line
  • Build config object - build_config()
  • Setup logging
  • Open or create database
  • Select nitter instance to use
  • Get soup of whole page and timeline (list of soup of items) - get_timeline()
  • Iterate timeline to generate list of dicts with content of each tweet:
    • Extract tweet ID
    • Extract timestamp
    • Skip if timestamp is not within acceptable range - is_time_valid()
    • Skip if it is a retweet and retweets are excluded
    • Check database if tweet already exists and skip if it does
    • Extract author name
    • Extract twitter user name of author
    • Extract full status page URL
    • Add prefix if tweet is reply-to
    • Add prefix if tweet is retweet
    • Process media body - process_media_body()
    • Add link to quoted page ("card")
    • Extract image(s) from card - process_card()
    • Process video and image attachments - process_attachments()
    • Add custom footer
    • Add "Original tweet" footer
    • Add optional timestamp to footer
    • If no media, look for image in linked URL
    • Get filename of downloaded video
  • Update user profile if necessary - update_profile()
  • Login to Mastodon instance - login()
  • Check toot character limit
  • Iterate list of tweets
    • Check that the toot cap has not been reached
    • Upload video if applicable (previously downloaded)
    • If no video and applicable, download and upload pic
    • Look up in database the toot id of the replied-to tweet
    • Post toot + insert in database
  • Clean up downloaded video files
  • Delete excess records in database

build_config()

  • Instantiate global TOML struct
  • Populate TOML with default values
  • Load config file and (Over)write all valid keys with values read from file
  • If no config file, (Over)write all valid keys with values read from the command line
  • Verify that a minimum valid config is present
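
The steps above can be sketched roughly like this; the key names, the minimum-config check, and the use of `tomllib` are assumptions standing in for twoot's actual schema:

```python
def build_config(path=None, cli_args=None):
    """Sketch of build_config(); twoot's real keys and checks differ."""
    # Hypothetical defaults standing in for the global TOML struct
    config = {
        "twitter_account": "",   # assumption: placeholder key names
        "toot_cap": 0,
        "remove_trackers_from_urls": False,
    }
    if path is not None:
        import tomllib  # stdlib in Python 3.11+; older code may use `toml`
        with open(path, "rb") as f:
            loaded = tomllib.load(f)
        # (Over)write only valid keys, silently ignore unknown ones
        config.update({k: v for k, v in loaded.items() if k in config})
    elif cli_args:
        config.update({k: v for k, v in cli_args.items() if k in config})
    # Verify that a minimum valid config is present
    if not config["twitter_account"]:
        raise ValueError("a twitter account to mirror is required")
    return config
```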

get_timeline()

  • Initiate requests session
  • Populate headers
  • Download nitter page of user
  • Make soup
  • Build a list with soup of each timeline item
  • Iterate list
    • If individual tweet, add to final list
    • If first tweet of thread, get the thread from the tweet page - _get_rest_of_thread()
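
A minimal sketch of the download-and-soup part, assuming nitter marks each entry with a `timeline-item` CSS class (the real markup and the headers twoot sends may differ):

```python
import requests
from bs4 import BeautifulSoup

def parse_timeline(html):
    # Build a list with the soup of each timeline item
    soup = BeautifulSoup(html, "html.parser")
    return soup.find_all(class_="timeline-item")  # class name is an assumption

def get_timeline(url):
    # Initiate requests session and populate headers
    session = requests.Session()
    session.headers.update({"User-Agent": "twoot-sketch/0.1"})  # placeholder UA
    # Download nitter page of user and make soup
    resp = session.get(url, timeout=30)
    resp.raise_for_status()
    return parse_timeline(resp.text)
```

The real get_timeline() then walks this list, keeping individual tweets and expanding the first tweet of a thread via _get_rest_of_thread().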

_get_rest_of_thread()

  • Download page
  • Make soup
  • Get all items in thread after main tweet
  • Build list in which each tweet carries a reference to the previous one
  • Reverse timeline order
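
One way to read the last two steps, sketched with a hypothetical item structure (a list of dicts with an `id` key, oldest first; twoot's real data structure is an assumption here):

```python
def link_thread(items):
    """Attach to each thread item a reference to the previous tweet,
    then reverse the timeline order (sketch)."""
    linked = []
    prev_id = None
    for item in items:
        linked.append({"id": item["id"], "replied_to": prev_id})
        prev_id = item["id"]
    linked.reverse()
    return linked
```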

is_time_valid()

  • Compare timestamp to tweet_delay and tweet_max_age
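
In code this might look like the following; treating tweet_delay as hours and tweet_max_age as days is an assumption:

```python
from datetime import datetime, timedelta, timezone

def is_time_valid(timestamp, tweet_delay=0, tweet_max_age=1):
    """Return True if the tweet is older than tweet_delay (hours) but
    younger than tweet_max_age (days). Units are assumptions."""
    now = datetime.now(timezone.utc)
    return (timestamp <= now - timedelta(hours=tweet_delay)
            and timestamp >= now - timedelta(days=tweet_max_age))
```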

process_media_body()

  • Copy plain text
  • Convert links starting with @ and # to plain text
  • Remove redirection from links - deredir_url()
  • Substitute source domains in links - substitute_source()
  • Remove trackers from queries and fragments - clean_url()
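
A sketch of the plain-text conversion only (the de-redirection and cleaning steps are left out); it assumes the tweet body is a flat run of text and `<a>` tags:

```python
from bs4 import BeautifulSoup, NavigableString

def process_media_body(html):
    """Flatten a tweet body to plain text: @… and #… links become their
    text, other links are replaced by their href (sketch)."""
    parts = []
    for elem in BeautifulSoup(html, "html.parser").children:
        if isinstance(elem, NavigableString):
            parts.append(str(elem))
        elif elem.name == "a":
            text = elem.get_text()
            if text.startswith(("@", "#")):
                parts.append(text)               # mention/hashtag -> plain text
            else:
                parts.append(elem.get("href", text))  # keep URL for cleaning
    return "".join(parts)
```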

process_card()

  • Get list of image URL in card tag
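
Sketched under the assumption that card images are plain `<img>` tags in the card markup:

```python
from bs4 import BeautifulSoup

def process_card(card_html):
    """Collect the image URLs found inside a quoted-link card (sketch;
    nitter's actual card markup is an assumption)."""
    soup = BeautifulSoup(card_html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]
```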

process_attachments()

  • Collect URLs of images
  • Download nitter video (converted animated GIF) and save it in output directory
  • Download twitter video by calling youtube_dl and save it in output directory

update_profile()

  • Extract banner and avatar picture addresses from soup
  • Get the banner and avatar picture addresses from database
  • If user record not found in db, create a new one
  • If they have changed
    • Download banner and avatar pictures
    • Login to Mastodon - login()
    • Update credentials
    • Record image URLs in database

login()

  • Create Mastodon application
  • Login with password if provided
  • Login with token

deredir_url()

  • Populate HTTP headers
  • Download the page

substitute_source()

  • Parse URL
  • Substitute domain values from config
  • Unparse URL
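
A sketch using the standard library, with a hypothetical substitution mapping (twoot's config keys for this are an assumption):

```python
from urllib.parse import urlparse, urlunparse

def substitute_source(url, substitutions):
    """Replace the domain of a URL according to a mapping from config,
    e.g. {"twitter.com": "nitter.net"} (sketch)."""
    parsed = urlparse(url)
    new_netloc = substitutions.get(parsed.netloc, parsed.netloc)
    return urlunparse(parsed._replace(netloc=new_netloc))
```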

clean_url()

  • Parse URL
  • Remove UTM parameters from query - _remove_trackers_query()
  • Remove UTM parameters from fragment - _remove_trackers_fragment()
  • Unparse URL
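
The query-side cleaning can be sketched like this; the tracker list is a small assumption-based sample, and _remove_trackers_fragment() applies the same idea to url_parsed.fragment:

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def _remove_trackers_query(query):
    # Drop utm_* and a couple of other well-known trackers (sample list)
    kept = [(k, v) for k, v in parse_qsl(query)
            if not k.startswith("utm_") and k not in ("gclid", "fbclid")]
    return urlencode(kept)

def clean_url(url):
    # Parse URL, remove trackers from the query, unparse URL
    parsed = urlparse(url)
    return urlunparse(parsed._replace(query=_remove_trackers_query(parsed.query)))
```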