Merge branch 'susbt'

2025-05-09 04:53:56 +00:00 · 2022-12-11 11:15:50 +01:00 · 2022-12-11 11:15:50 +01:00 · daaf15a6c3
commit daaf15a6c3
parent 8dbce09530
4 changed files with 447 additions and 181 deletions
--- a/.gitignore
+++ b/.gitignore
@ -7,4 +7,5 @@ venv/
 *.png
 *.xcf
 twoot.db
-__pycache__
+*.toml
 !default.toml
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -1,3 +1,5 @@
 # Changelog
 **23 NOV 2022** VERSION 2.5 Added command-line option (`-l`) to remove
 redirection from links included in tweets. Obfuscated links are replaced
 by the URL that the resource is directly downloaded from. Also improved
--- a/README.md
+++ b/README.md
@ -1,12 +1,20 @@
 # Twoot
-**Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
+Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
-It is simple to set-up on a local machine, configurable and feature-rich.**
+It is simple to set-up on a local machine, configurable and feature-rich.
-**UPDATE 23 NOV 2022** VERSION 2.5 Added command-line option (`-l`) to remove redirection
+**UPDATE 11 DEC 2022** VERSION 3.0 brings some important changes and new features:
-from links included in tweets. Obfuscated links are replaced by the URL that the resource
+
-is directly downloaded from. Also improved tracker removal by cleaning URL fragments as well
+* Only potentially breaking change: **If you are using a version of python < 3.11 you need to install the `tomli` module**
-(contrib: mathdatech, thanks!).
+* Twoot can be configured with a config file in [TOML](https://toml.io/) format. Check `default.toml` for details
 * Domain susbtitution can be configured in the config file to replace links to Twitter, Youtube and
  Reddit domains with alternatives (e.g. [Nitter](https://github.com/zedeus/nitter/wiki/Instances),
  [Invidious](https://redirect.invidious.io/) and [teddit](https://teddit.net/) respectively)
 * A footer line can be specified in the config file that gets added to all toots (with e.g. tags)
 * Added option to not add reference to "Original tweet" at the bottom of toots
 * A password must be provided with `-p` on the command-line for the first run only. After that it is no longer required.
 * The verbosity of logging messages can be set in the config file with `log_level=`.
 * Config file option `log_days =` specifies how long to keep log messages in file. Older messages are deleted.
 > Previous updates can be found in CHANGELOG.
@ -26,27 +34,29 @@ is directly downloaded from. Also improved tracker removal by cleaning URL fragm
 ## Usage
-```
+```sh
-twoot.py [-h] -t <twitter account> -i <mastodon instance> -m <mastodon account>
+twoot.py [-h] [-f <.toml config file>] [-t <twitter account>] [-i <mastodon instance>]
-                -p <mastodon password> [-r] [-s] [-u] [-v] [-a <max age in days)>]
+         [-m <mastodon account>] [-p <mastodon password>] [-r] [-s] [-l] [-u] [-v] [-o]
-                [-d <min delay (in mins)>] [-c <max # of toots to post>]
+         [-a <max age in days)>] [-d <min delay (in mins>] [-c <max # of toots to post>]
 ```
 ## Arguments
 Assuming that the Twitter handle is @SuperDuperBot and the Mastodon account
-is @superduperbot@botsin.space
+is sd@example.com on instance masto.space:
-|Switch |Description                                       | Example            | Req |
+|Switch |Description                                       | Example            | Required           |
-|-------|--------------------------------------------------|--------------------|-----|
+|-------|--------------------------------------------------|--------------------|--------------------|
-| -t    | twitter account name without '@'                 | `SuperDuper`       | Yes |
+| -f    | path of `.toml` file with configuration          | `SuperDuper.toml`  | No                 |
-| -i    | Mastodon instance domain name                    | `botsin.space`     | Yes |
+| -t    | twitter account name without '@'                 | `SuperDuper`       | If no config file  |
-| -m    | Mastodon username                                | `sd@example.com`   | Yes |
+| -i    | Mastodon instance domain name                    | `masto.space`      | If no config file  |
-| -p    | Mastodon password                                | `my_Sup3r-S4f3*pw` | Yes |
+| -m    | Mastodon username                                | `sd@example.com`   | If no config file  |
-| -v    | upload videos to Mastodon                        | *N/A*              | No  |
+| -p    | Mastodon password                                | `my_Sup3r-S4f3*pw` | Once at first run  |
 | -v    | Upload videos to Mastodon                        | *N/A*              | No                 |
 | -o    | Do not add "Original tweet" line                 | *N/A*              | No                 |
 | -r    | Post reply-to tweets (ignored by default)        | *N/A*              | No                 |
 | -s    | Skip retweets (posted by default)                | *N/A*              | No                 |
-| -l    | Remove link redirection                          | *N/A*              | No  |
+| -l    | Remove link redirections                         | *N/A*              | No                 |
 | -u    | Remove trackers from URLs                        | *N/A*              | No                 |
 | -a    | Max. age of tweet to post (in days)              | `5`                | No                 |
 | -d    | Min. age before posting new tweet (in minutes)   | `15`               | No                 |
@ -54,20 +64,37 @@ is @superduperbot@botsin.space
 ## Notes
 ### Password
 A password must be provided for the first run only. Once twoot has connected successfully to the
 Mastodon host, an access token is saved in a `.secret` file named after the mastodon account,
 and a password is no longer necessary (command-line switch `-p` is not longer required).
 ### Config file
 A `default.toml` file is provided to be used as template. If `-f` is used to specify a config file
 to use, all the other command-line parameters are ignored, except `-p` (password) if provided.
 ### Removing redirected links
 `-l` will follow every link included in the tweet and replace them with the url that the
 resource is directly dowmnloaded from (if applicable). e.g. bit.ly/xxyyyzz -> example.com
 Every link visit can take up to 5 sec (timeout) therefore this option will slow down
 tweet processing.
-If you are interested by tracker removal (`-u`) you should also select redirection removal(`-l`)
+If you are interested by tracker removal (`-u`) you should also select redirection removal
 as trackers are often hidden behind the redirection of a short URL.
 ### Uploading videos
 When using the `-v` switch consider:
 * whether the copyright of the content that you want to cross-post allows it
 * the storage / transfer limitations of the Mastodon instance that you are posting to
 * the upstream bandwidth that you may consume on your internet connection
 ### Rate control
 Default max age is 1 day. Decimal values are OK.
 Default min delay is 0 minutes.
@ -78,7 +105,8 @@ No limitation is applied to the number of toots uploaded if `-c` is not specifie
 Make sure python3 is installed.
-Twoot depends on `beautifulsoup4` and `Mastodon.py` python modules.
+Twoot depends on `beautifulsoup4` and `Mastodon.py` python modules. Additionally, if you are using
 a version of python < 3.11 you also need to install the `tomli` module.
 **Only If you plan to download videos** with the `-v` switch, are the additional dependencies required:
@ -96,17 +124,15 @@ Add command line to crontab. For example, to run every 15 minutes starting at mi
 and process the tweets posted in the last 5 days but at least 15 minutes
 ago:
-```
+```crontab
-1-59/15 * * * * /path/to/twoot.py -t SuperDuperBot -i botsin.space -m superduperbot -p my_Sup3r-S4f3*pw -a 5 -d 15
+1-59/15 * * * * /path/to/twoot.py -t SuperDuper -i masto.space -m sd@example.com -p my_Sup3r-S4f3*pw -a 5 -d 15
 ```
 ## Examples
 Twoot is known to be used for the following feeds (older first):
-* [@internetofshit@botsin.space](https://botsin.space/@internetofshit)
+* [@todayilearned@botsin.space](https://noc.social/@todayilearned)
 * [@hackaday@botsin.space](https://botsin.space/@hackaday)
 * [@todayilearned@botsin.space](https://botsin.space/@todayilearned)
 * [@moznews@noc.social](https://noc.social/@moznews)
 * [@hackster_io@noc.social](https://noc.social/@hackster_io)
 * [@cnxsoft@noc.social](https://noc.social/@cnxsoft)
@ -116,6 +142,6 @@ Twoot is known to be used for the following feeds (older first):
 ## Background
-I started twoot when [tootbot](https://github.com/cquest/tootbot)
+I started twoot when [tootbot](https://github.com/cquest/tootbot)stopped working.
-stopped working. Tootbot relied on RSS feeds from https://twitrss.me
+Tootbot relied on RSS feeds from [https://twitrss.me](https://twitrss.me)that broke when Twitter
-that broke when Twitter refreshed their web UI in July 2019.
+refreshed their web UI in July 2019.
--- a/twoot.py
+++ b/twoot.py
@ -19,9 +19,10 @@
 """
 import argparse
-import datetime
+from datetime import datetime, timedelta
 import logging
 import os
 import shutil
 import random
 import re
 import shutil
@ -38,21 +39,17 @@ from mastodon import Mastodon, MastodonError, MastodonAPIError, MastodonIllegalA
 # Number of records to keep in db table for each twitter account
 MAX_REC_COUNT = 50
 # Set the desired verbosity of logging
 # One of logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL
 LOGGING_LEVEL = logging.INFO
 # How many seconds to wait before giving up on a download (except video download)
 HTTPS_REQ_TIMEOUT = 10
 NITTER_URLS = [
    'https://nitter.lacontrevoie.fr',
-    'https://nitter.pussthecat.org',
+    'https://nitter.privacydev.net',
    'https://nitter.fdn.fr',
    'https://nitter.eu',
    'https://nitter.namazso.eu',
    'https://twitter.beparanoid.de',
    'https://n.l5.ca',
-    'https://nitter.bus-hit.me',
+#    'https://nitter.hu',
 ]
 # Update from https://www.whatismybrowser.com/guides/the-latest-user-agent/
@ -67,12 +64,115 @@ USER_AGENTS = [
 ]
 def build_config(args):
    """
    Receives the arguments passed on the command line
    populates the TOML global dict with default values for all 'options' keys
    if a config file is provided, load the keys from the config file
    if no config file is provided, use command-line args
    verify that a valid config is available (all keys in 'config' present)
    :param args: list of command line arguments
    """
    # Create global struct containing configuration
    global TOML
    # Default options
    options = {
        'upload_videos': False,
        'post_reply_to': False,
        'skip_retweets': False,
        'remove_link_redirections': False,
        'remove_trackers_from_urls': False,
        'footer': '',
        'remove_original_tweet_ref': False,
        'tweet_max_age': float(1),
        'tweet_delay': float(0),
        'toot_cap': int(0),
        'subst_twitter': [],
        'subst_youtube': [],
        'subst_reddit': [],
        'log_level': "WARNING",
        'log_days': 3,
    }
    # Create default config object
    TOML = {'config': {},'options': options}
    # Load config file if it was provided
    toml_file = args['f']
    if toml_file is not None:
        try: # Included in python from version 3.11
            import tomllib
        except ModuleNotFoundError:
            # for python < 3.11, tomli module must be installed
            import tomli as tomllib
        loaded_toml = None
        # Load toml file
        try:
            with open(toml_file, 'rb') as config_file:
                loaded_toml = tomllib.load(config_file)
        except FileNotFoundError:
            print('config file not found')
            terminate(-1)
        except tomllib.TOMLDecodeError:
            print('Malformed config file')
            terminate(-1)
        TOML['config'] = loaded_toml['config']
        for k in TOML['options'].keys():
            try:  # Go through all valid keys
                TOML['options'][k] = loaded_toml['options'][k]
            except KeyError:  # Key was not found in file
                pass
    else:
        # Override config parameters with command-line values provided
        if args['t'] is not None:
            TOML['config']['twitter_account'] = args['t']
        if args['i'] is not None:
            TOML['config']['mastodon_instance'] = args['i']
        if args['m'] is not None:
            TOML['config']['mastodon_user'] = args['m']
        if args['v'] is True:
            TOML['options']['upload_videos'] = args['v']
        if args['r'] is True:
            TOML['options']['post_reply_to'] = args['r']
        if args['s'] is True:
            TOML['options']['skip_retweets'] = args['s']
        if args['l'] is True:
            TOML['options']['remove_link_redirections'] = args['l']
        if args['u'] is True:
            TOML['options']['remove_trackers_from_urls'] = args['u']
        if args['o'] is True:
            TOML['options']['remove_original_tweet_ref'] = args['o']
        if args['a'] is not None:
            TOML['options']['tweet_max_age'] = float(args['a'])
        if args['d'] is not None:
            TOML['options']['tweet_delay'] = float(args['d'])
        if args['c'] is not None:
            TOML['options']['toot_cap'] = int(args['c'])
    # Verify that we have a minimum config to run
    if 'twitter_account' not in TOML['config'].keys() or TOML['config']['twitter_account'] == "":
        print('CRITICAL: Missing Twitter account')
        terminate(-1)
    if 'mastodon_instance' not in TOML['config'].keys() or TOML['config']['mastodon_instance'] == "":
        print('CRITICAL: Missing Mastodon instance')
        terminate(-1)
    if 'mastodon_user' not in TOML['config'].keys() or TOML['config']['mastodon_user'] == "":
        print('CRITICAL: Missing Mastodon user')
        terminate(-1)
 def deredir_url(url):
    """
    Given a URL, return the URL that the page really downloads from
    :param url: url to be de-redirected
    :return: direct url
    """
    # Check if we need to do anyting
    if TOML['options']['remove_link_redirections'] is False:
        return url
    # Get a copy of the default headers that requests would use
    headers = requests.utils.default_headers()
@ -87,7 +187,7 @@ def deredir_url(url):
    ret = None
    try:
        # Download the page
-        ret = requests.get(url, headers=headers, timeout=5)
+        ret = requests.head(url, headers=headers, timeout=5)
    except:
        # If anything goes wrong keep the URL intact
        return url
@ -135,7 +235,6 @@ def _remove_trackers_fragment(fragment_str):
    :param query_str: fragment to be cleaned
    :return: cleaned fragment
    """
    params_to_remove = {
        "Echobox",
    }
@ -147,7 +246,50 @@ def _remove_trackers_fragment(fragment_str):
    return fragment_str
-def clean_url(dirty_url):
+def substitute_source(orig_url):
    """
    param orig_url: url to check for substitutes
    :return: url with replaced domains
    """
    parsed_url = urlparse(orig_url)
    domain = parsed_url.netloc
    logging.debug("Checking domain %s for substitution ", domain)
    # Handle twitter
    twitter_subst = TOML["options"]["subst_twitter"]
    # Do not substitiute if subdomain is present (e.g. i.twitter.com)
    if (domain == 'twitter.com' or domain == 'www.twitter.com')  and twitter_subst != []:
        domain = twitter_subst[random.randint(0, len(twitter_subst) - 1)]
        logging.debug("Replaced twitter.com by " + domain)
    # Handle youtube
    youtube_subst = TOML["options"]["subst_youtube"]
    # Do not substitiute if subdomain is present (e.g. i.youtube.com)
    if (domain == 'youtube.com' or domain == 'wwww.youtube.com')  and youtube_subst != []:
        domain = youtube_subst[random.randint(0, len(youtube_subst) - 1)]
        logging.debug("Replaced youtube.com by " + domain)
    # Handle reddit
    reddit_subst = TOML["options"]["subst_reddit"]
    # Do not substitiute if subdomain is present (e.g. i.reddit.com)
    if (domain == 'reddit.com' or domain == 'www.reddit.com')  and reddit_subst != []:
        domain = reddit_subst[random.randint(0, len(reddit_subst) - 1)]
        logging.debug("Replaced reddit.com by " + domain)
    dest_url = urlunparse([
        parsed_url.scheme,
        domain,
        parsed_url.path,
        parsed_url.params,
        parsed_url.query,
        parsed_url.fragment
    ])
    return dest_url
 def clean_url(orig_url):
    """
    Given a URL, return it with the UTM parameters removed from query and fragment
    :param dirty_url: url to be cleaned
@ -155,10 +297,13 @@ def clean_url(dirty_url):
    >>> clean_url('https://example.com/video/this-aerial-ropeway?utm_source=Twitter&utm_medium=video&utm_campaign=organic&utm_content=Nov13&a=aaa&b=1#mkt_tok=tik&mkt_tik=tok')
    'https://example.com/video/this-aerial-ropeway?a=aaa&b=1#mkt_tik=tok'
    """
    # Check if we have to do anything
    if TOML['options']['remove_trackers_from_urls'] is False:
        return orig_url
-    url_parsed = urlparse(dirty_url)
+    url_parsed = urlparse(orig_url)
-    cleaned_url = urlunparse([
+    dest_url = urlunparse([
        url_parsed.scheme,
        url_parsed.netloc,
        url_parsed.path,
@ -166,22 +311,20 @@ def clean_url(dirty_url):
        _remove_trackers_query(url_parsed.query),
        _remove_trackers_fragment(url_parsed.fragment)
    ])
    if dest_url != orig_url:
        logging.debug('Cleaned URL from: ' + orig_url + ' to: ' + dest_url)
-    if cleaned_url != dirty_url:
+    return dest_url
        logging.debug('Cleaned URL from: ' + dirty_url + ' to: ' + cleaned_url)
    return cleaned_url
-def process_media_body(tt_iter, remove_redir, remove_trackers):
+def process_media_body(tt_iter):
    """
    Receives an iterator over all the elements contained in the tweet-text container.
    Processes them to make them suitable for posting on Mastodon
    :param tt_iter: iterator over the HTML elements in the text of the tweet
    :param remove_redir: bool to indicate if redirections should be removed
    :param remove_trackers: bool to indicate if trackers should be removed
    :return:        cleaned up text of the tweet
    """
    tweet_text = ''
    # Iterate elements
    for tag in tt_iter:
@ -200,14 +343,10 @@ def process_media_body(tt_iter, remove_redir, remove_trackers):
                tweet_text += tag_text
            else:
                # This is a real link
                if remove_redir:
                url = deredir_url(tag.get('href'))
-                else:
+                url = substitute_source(url)
-                    url = tag.get('href')
+                url = clean_url(url)
                if remove_trackers:
                    tweet_text += clean_url(url)
                else:
                tweet_text += url
        else:
            logging.warning("No handler for tag in twitter text: " + tag.prettify())
@ -232,12 +371,11 @@ def process_card(nitter_url, card_container):
    return list
-def process_attachments(nitter_url, attachments_container, get_vids, twit_account, status_id, author_account):
+def process_attachments(nitter_url, attachments_container, status_id, author_account):
    """
    Extract images or video from attachments. Videos are downloaded on the file system.
    :param nitter_url: url of nitter mirror
    :param attachments_container: soup of 'div' tag containing attachments markup
    :param get_vids: whether to download videos or not
    :param twit_account: name of twitter account
    :param status_id: id of tweet being processed
    :param author_account: author of tweet with video attachment
@ -249,14 +387,14 @@ def process_attachments(nitter_url, attachments_container, get_vids, twit_accoun
    for image in images:
        pics.append(nitter_url + image.get('href'))
-    logging.debug('collected ' + str(len(pics)) + ' images from attachments')
+    logging.debug('collected ' + str(len(pics)) + ' image(s) from attachments')
    # Download nitter video (converted animated GIF)
    gif_class = attachments_container.find('video', class_='gif')
    if gif_class is not None:
        gif_video_file = nitter_url + gif_class.source.get('src')
-        video_path = os.path.join('output', twit_account, status_id, author_account, status_id)
+        video_path = os.path.join('output', TOML['config']['twitter_account'], status_id, author_account, status_id)
        os.makedirs(video_path, exist_ok=True)
        # Open directory for writing file
@ -283,12 +421,12 @@ def process_attachments(nitter_url, attachments_container, get_vids, twit_accoun
    vid_in_tweet = False
    vid_class = attachments_container.find('div', class_='video-container')
    if vid_class is not None:
-        if get_vids:
+        if TOML['options']['upload_videos']:
            import youtube_dl
            video_file = os.path.join('https://twitter.com', author_account, 'status', status_id)
            ydl_opts = {
-                'outtmpl': "output/" + twit_account + "/" + status_id + "/%(id)s.%(ext)s",
+                'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
                'format': "best[width<=500]",
                'socket_timeout': 60,
                'quiet': True,
@ -298,7 +436,7 @@ def process_attachments(nitter_url, attachments_container, get_vids, twit_accoun
                try:
                    ydl.download([video_file])
                except Exception as e:
-                    logging.warn('Error downloading twitter video: ' + str(e))
+                    logging.warning('Error downloading twitter video: ' + str(e))
                    vid_in_tweet = True
                else:
                    logging.debug('downloaded twitter video from attachments')
@ -320,12 +458,12 @@ def contains_class(body_classes, some_class):
    return found
-def is_time_valid(timestamp, max_age, min_delay):
+def is_time_valid(timestamp):
    ret = True
    # Check that the tweet is not too young (might be deleted) or too old
    age_in_hours = (time.time() - float(timestamp)) / 3600.0
-    min_delay_in_hours = min_delay / 60.0
+    min_delay_in_hours = TOML['options']['tweet_delay'] / 60.0
-    max_age_in_hours = max_age * 24.0
+    max_age_in_hours = TOML['options']['tweet_max_age'] * 24.0
    if age_in_hours < min_delay_in_hours or age_in_hours > max_age_in_hours:
        ret = False
@ -333,110 +471,205 @@ def is_time_valid(timestamp, max_age, min_delay):
    return ret
-def login(instance, account, password):
+def login(password):
-    # Create Mastodon application if it does not exist yet
+    """
-    if not os.path.isfile(instance + '.secret'):
+    Login to Mastodon account and return mastodon object used to post content
    :param password: Password associated to account. None if not provided
    :return: mastodon object
    """
 # Create Mastodon application if it does not exist yet
    if not os.path.isfile(TOML['config']['mastodon_instance'] + '.secret'):
        try:
            Mastodon.create_app(
-                'twoot',
+                'feedtoot',
-                api_base_url='https://' + instance,
+                api_base_url='https://' + TOML['config']['mastodon_instance'],
-                to_file=instance + '.secret'
+                to_file=TOML['config']['mastodon_instance'] + '.secret'
            )
        except MastodonError as me:
-            logging.fatal('failed to create app on ' + instance)
+            logging.fatal('failed to create app on ' + TOML['config']['mastodon_instance'])
            logging.fatal(me)
-            sys.exit(-1)
+            terminate(-1)
-    # Log in to Mastodon instance
+    mastodon = None
    # Log in to Mastodon instance with password
    if password is not None:
        try:
            mastodon = Mastodon(
-            client_id=instance + '.secret',
+                client_id=TOML['config']['mastodon_instance'] + '.secret',
-            api_base_url='https://' + instance
+                api_base_url='https://' + TOML['config']['mastodon_instance']
            )
            mastodon.log_in(
-            username=account,
+                username=TOML['config']['mastodon_user'],
                password=password,
-            to_file=account + ".secret"
+                to_file=TOML['config']['mastodon_user'] + ".secret"
            )
-        logging.info('Logging in to ' + instance)
+            logging.info('Logging in to ' + TOML['config']['mastodon_instance'])
        except MastodonError as me:
-        logging.fatal('ERROR: Login to ' + instance + ' Failed')
+            logging.fatal('Login to ' + TOML['config']['mastodon_instance'] + ' Failed\n')
            logging.fatal(me)
-        sys.exit(-1)
+            terminate(-1)
-    # Check ratelimit status
+        if os.path.isfile(TOML['config']['mastodon_user'] + '.secret'):
-    logging.debug('Ratelimit allowed requests: ' + str(mastodon.ratelimit_limit))
+            logging.warning('You successfully logged in using a password and an access token \
-    logging.debug('Ratelimit remaining requests: ' + str(mastodon.ratelimit_remaining))
+                            has been saved. The password can therefore be omitted from the \
-    logging.debug('Ratelimit reset time: ' + time.asctime(time.localtime(mastodon.ratelimit_reset)))
+                            command-line in future invocations')
-    logging.debug('Ratelimit last call: ' + time.asctime(time.localtime(mastodon.ratelimit_lastcall)))
+    else: # No password provided, login with token
        # Using token in existing .secret file
        if os.path.isfile(TOML['config']['mastodon_user'] + '.secret'):
            try:
                mastodon = Mastodon(
                    access_token=TOML['config']['mastodon_user'] + '.secret',
                    api_base_url='https://' + TOML['config']['mastodon_instance']
            )
            except MastodonError as me:
                logging.fatal('Login to ' + TOML['config']['mastodon_instance'] + ' Failed\n')
                logging.fatal(me)
                terminate(-1)
        else:
            logging.fatal('No .secret file found. Password required to log in')
            terminate(-1)
    return mastodon
 def terminate(exit_code):
    """
    Cleanly stop execution with a message on execution duration
    Remove log messages older that duration specified in config from log file
    :param exit_code: return value to pass to shell when exiting
    """
    logging.info('Run time : {t:2.1f} seconds.'.format(t=time.time() - START_TIME))
    logging.info('_____________________________________________________________________________________')
    # Close logger and log file
    logging.shutdown()
    # Remove older log messages
    # Max allowed age of log message
    max_delta = timedelta(TOML['options']['log_days'])
    # Open log file
    log_file_name = TOML['config']['twitter_account'] + '.log'
    new_log_file_name = TOML['config']['twitter_account'] + '.log.new'
    try:
        log_file = open(log_file_name, 'r')
    except FileNotFoundError:
        # Nothing to do if there is no log file
        exit(exit_code)
    # Check each line
    pos = log_file.tell()
    while log_file:
        line = log_file.readline()
        try:
            # Extract date on log line
            date = datetime.strptime(line[:10], '%Y-%m-%d')
        except ValueError:
            # date was not found on this line, try next one
            continue
        # Time difference between log message and now
        log_delta = datetime.now() - date
        # Only keep the number of days of the difference
        log_delta = timedelta(days=log_delta.days)
        if log_delta < max_delta:
            # Reset file pointer to position before reading last line
            log_file.seek(pos)
            remainder = log_file.read()
            output_file = open(new_log_file_name, 'w')
            output_file.write(remainder)
            output_file.close()
            # replace log file by new one
            shutil.move(new_log_file_name, log_file_name)
            break  # Exit while loop
        # Update read pointer position
        pos = log_file.tell()
    exit(exit_code)
 def main(argv):
    # Start stopwatch
-    start_time = time.time()
+    global START_TIME
    START_TIME = time.time()
    # Build parser for command line arguments
    parser = argparse.ArgumentParser(description='toot tweets.')
-    parser.add_argument('-t', metavar='<twitter account>', action='store', required=True)
+    parser.add_argument('-f', metavar='<.toml config file>', action='store')
-    parser.add_argument('-i', metavar='<mastodon instance>', action='store', required=True)
+    parser.add_argument('-t', metavar='<twitter account>', action='store')
-    parser.add_argument('-m', metavar='<mastodon account>', action='store', required=True)
+    parser.add_argument('-i', metavar='<mastodon instance>', action='store')
-    parser.add_argument('-p', metavar='<mastodon password>', action='store', required=True)
+    parser.add_argument('-m', metavar='<mastodon account>', action='store')
    parser.add_argument('-p', metavar='<mastodon password>', action='store')
    parser.add_argument('-r', action='store_true', help='Also post replies to other tweets')
    parser.add_argument('-s', action='store_true', help='Suppress retweets')
    parser.add_argument('-l', action='store_true', help='Remove link redirection')
    parser.add_argument('-u', action='store_true', help='Remove trackers from URLs')
    parser.add_argument('-v', action='store_true', help='Ingest twitter videos and upload to Mastodon instance')
-    parser.add_argument('-a', metavar='<max age (in days)>', action='store', type=float, default=1)
+    parser.add_argument('-o', action='store_true', help='Do not add reference to Original tweet')
-    parser.add_argument('-d', metavar='<min delay (in mins)>', action='store', type=float, default=0)
+    parser.add_argument('-a', metavar='<max age (in days)>', action='store', type=float)
-    parser.add_argument('-c', metavar='<max # of toots to post>', action='store', type=int, default=0)
+    parser.add_argument('-d', metavar='<min delay (in mins)>', action='store', type=float)
    parser.add_argument('-c', metavar='<max # of toots to post>', action='store', type=int)
    # Parse command line
    args = vars(parser.parse_args())
-    twit_account = args['t']
+    build_config(args)
    mast_instance = args['i']
    mast_account = args['m']
    mast_password = args['p']
    tweets_and_replies = args['r']
    suppress_retweets = args['s']
    remove_redir = args['l']
    remove_trackers = args['u']
    get_vids = args['v']
    max_age = float(args['a'])
    min_delay = float(args['d'])
    cap = int(args['c'])
-    # Remove previous log file
+    mast_password = args['p']
    # try:
    #    os.remove(twit_account + '.log')
    # except FileNotFoundError:
    #    pass
    # Setup logging to file
    logging.basicConfig(
-        filename=twit_account + '.log',
+        filename=TOML['config']['twitter_account'] + '.log',
        level=LOGGING_LEVEL,
        format='%(asctime)s %(levelname)-8s %(message)s',
        datefmt='%Y-%m-%d %H:%M:%S',
    )
-    logging.info('Running with the following parameters:')
+    # Set level of logging
-    logging.info('    -t ' + twit_account)
+    log_level = logging.WARNING
-    logging.info('    -i ' + mast_instance)
+    match TOML['options']['log_level'].upper():
-    logging.info('    -m ' + mast_account)
+        case 'DEBUG':
-    logging.info('    -r ' + str(tweets_and_replies))
+            log_level = logging.DEBUG
-    logging.info('    -s ' + str(suppress_retweets))
+        case 'INFO':
-    logging.info('    -l ' + str(remove_redir))
+            log_level = logging.INFO
-    logging.info('    -u ' + str(remove_trackers))
+        case 'WARNING':
-    logging.info('    -v ' + str(get_vids))
+            log_level = logging.WARNING
-    logging.info('    -a ' + str(max_age))
+        case 'ERROR':
-    logging.info('    -d ' + str(min_delay))
+            log_level = logging.ERROR
-    logging.info('    -c ' + str(cap))
+        case 'CRITICAL':
            log_level = logging.CRITICAL
        case 'OFF':
            # Disable all logging
            logging.disable(logging.CRITICAL)
        case _:
            logging.error('Invalid log_level %s in config file. Using WARNING.', str(TOML['options']['log_level']))
    logger = logging.getLogger()
    logger.setLevel(log_level)
    logging.info('Running with the following configuration:')
    logging.info('  Config File              : ' + str(args['f']))
    logging.info('  twitter_account          : ' + TOML['config']['twitter_account'])
    logging.info('  mastodon_instance        : ' + TOML['config']['mastodon_instance'])
    logging.info('  mastodon_user            : ' + TOML['config']['mastodon_user'])
    logging.info('  upload_videos            : ' + str(TOML['options']['upload_videos']))
    logging.info('  post_reply_to            : ' + str(TOML['options']['post_reply_to']))
    logging.info('  skip_retweets            : ' + str(TOML['options']['skip_retweets']))
    logging.info('  remove_link_redirections : ' + str(TOML['options']['remove_link_redirections']))
    logging.info('  remove_trackers_from_urls: ' + str(TOML['options']['remove_trackers_from_urls']))
    logging.info('  footer                   : ' + TOML['options']['footer'])
    logging.info('  remove_original_tweet_ref: ' + str(TOML['options']['remove_original_tweet_ref']))
    logging.info('  tweet_max_age            : ' + str(TOML['options']['tweet_max_age']))
    logging.info('  tweet_delay              : ' + str(TOML['options']['tweet_delay']))
    logging.info('  toot_cap                 : ' + str(TOML['options']['toot_cap']))
    logging.info('  subst_twitter            : ' + str(TOML['options']['subst_twitter']))
    logging.info('  subst_twitter            : ' + str(TOML['options']['subst_youtube']))
    logging.info('  subst_twitter            : ' + str(TOML['options']['subst_reddit']))
    logging.info('  log_level                : ' + str(TOML['options']['log_level']))
    logging.info('  log_days                 : ' + str(TOML['options']['log_days']))
    # Try to open database. If it does not exist, create it
    sql = sqlite3.connect('twoot.db')
@ -470,9 +703,9 @@ def main(argv):
        }
    )
-    url = nitter_url + '/' + twit_account
+    url = nitter_url + '/' + TOML['config']['twitter_account']
    # Use different page if we need to handle replies
-    if tweets_and_replies:
+    if TOML['options']['post_reply_to']:
        url += '/with_replies'
    # Download twitter page of user
@ -480,21 +713,21 @@ def main(argv):
        twit_account_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
    except requests.exceptions.ConnectionError:
        logging.fatal('Host did not respond when trying to download ' + url)
-        exit(-1)
+        terminate(-1)
    except requests.exceptions.Timeout:
        logging.fatal(nitter_url + ' took too long to respond')
-        exit(-1)
+        terminate(-1)
    # Verify that download worked
    if twit_account_page.status_code != 200:
        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(
            twit_account_page.status_code) + '). Aborting')
-        exit(-1)
+        terminate(-1)
-    logging.info('Nitter page downloaded successfully from ' + url)
+    logging.debug('Nitter page downloaded successfully from ' + url)
    # DEBUG: Save page to file
-    # of = open(twit_account + '.html', 'w')
+    # of = open(toml['config']['twitter_account'] + '.html', 'w')
    # of.write(twit_account_page.text)
    # of.close()
@ -505,7 +738,7 @@ def main(argv):
    ta = soup.find('meta', property='og:title').get('content')
    ta_match = re.search(r'\(@(.+)\)', ta)
    if ta_match is not None:
-        twit_account = ta_match.group(1)
+        TOML['config']['twitter_account'] = ta_match.group(1)
    # Extract twitter timeline
    timeline = soup.find_all('div', class_='timeline-item')
@ -528,19 +761,19 @@ def main(argv):
        # Extract time stamp
        time_string = status.find('span', class_='tweet-date').a.get('title')
        try:
-            timestamp = datetime.datetime.strptime(time_string, '%d/%m/%Y, %H:%M:%S').timestamp()
+            timestamp = datetime.strptime(time_string, '%d/%m/%Y, %H:%M:%S').timestamp()
        except:
            # Dec 21, 2021 · 12:00 PM UTC
-            timestamp = datetime.datetime.strptime(time_string, '%b %d, %Y · %I:%M %p %Z').timestamp()
+            timestamp = datetime.strptime(time_string, '%b %d, %Y · %I:%M %p %Z').timestamp()
        # Check if time is within acceptable range
-        if not is_time_valid(timestamp, max_age, min_delay):
+        if not is_time_valid(timestamp):
            out_date_cnt += 1
            logging.debug("Tweet outside valid time range, skipping")
            continue
        # Check if retweets must be skipped
-        if suppress_retweets:
+        if TOML['options']['skip_retweets']:
            # Check if this tweet is a retweet
            if len(status.select("div.tweet-body > div > div.retweet-header")) != 0:
                logging.debug("Retweet ignored per command-line configuration")
@ -549,7 +782,7 @@ def main(argv):
        # Check in database if tweet has already been posted
        db.execute(
            "SELECT * FROM toots WHERE twitter_account=? AND mastodon_instance=? AND mastodon_account=? AND tweet_id=?",
-            (twit_account, mast_instance, mast_account, tweet_id))
+            (TOML['config']['twitter_account'], TOML['config']['mastodon_instance'], TOML['config']['mastodon_user'], tweet_id))
        tweet_in_db = db.fetchone()
        if tweet_in_db is not None:
@ -588,12 +821,12 @@ def main(argv):
        tt_iter = status.find('div', class_='tweet-content media-body').children
        # Process text of tweet
-        tweet_text += process_media_body(tt_iter, remove_redir, remove_trackers)
+        tweet_text += process_media_body(tt_iter)
        # Process quote: append link to tweet_text
        quote_div = status.find('a', class_='quote-link')
        if quote_div is not None:
-            tweet_text += '\n\nhttps://twitter.com' + quote_div.get('href').strip('#m')
+            tweet_text += substitute_source('\n\nhttps://twitter.com' + quote_div.get('href').strip('#m'))
        # Process card : extract image if necessary
        card_class = status.find('a', class_='card-container')
@ -603,14 +836,24 @@ def main(argv):
        # Process attachment: capture image or .mp4 url or download twitter video
        attachments_class = status.find('div', class_='attachments')
        if attachments_class is not None:
-            pics, vid_in_tweet = process_attachments(nitter_url, attachments_class, get_vids, twit_account, status_id,
+            pics, vid_in_tweet = process_attachments(nitter_url,
-                                                     author_account)
+                                                     attachments_class,
                                                     status_id, author_account
            )
            photos.extend(pics)
            if vid_in_tweet:
                tweet_text += '\n\n[Video embedded in original tweet]'
        # Add custom footer from config file
        if TOML['options']['footer'] != '':
            tweet_text += '\n\n' + TOML['options']['footer']
        # Add footer with link to original tweet
-        tweet_text += '\n\nOriginal tweet : ' + full_status_url
+        if TOML['options']['remove_original_tweet_ref'] == False:
            if TOML['options']['footer'] != '':
                tweet_text += '\nOriginal tweet : ' + substitute_source(full_status_url)
            else:
                tweet_text += '\n\nOriginal tweet : ' + substitute_source(full_status_url)
        # If no media was specifically added in the tweet, try to get the first picture
        # with "twitter:image" meta tag in first linked page in tweet text
@ -640,7 +883,7 @@ def main(argv):
        # Check if video was downloaded
        video_file = None
-        video_path = Path('./output') / twit_account / status_id
+        video_path = Path('./output') / TOML['config']['twitter_account'] / status_id
        if video_path.exists():
            # list video files
            video_file_list = list(video_path.glob('*.mp4'))
@ -666,14 +909,10 @@ def main(argv):
    logging.info(str(out_date_cnt) + ' tweets outside of valid time range')
    logging.info(str(in_db_cnt) + ' tweets already in database')
    # DEBUG: Print extracted tweets
    # for t in tweets:
    # print(t)
    # Login to account on maston instance
    mastodon = None
    if len(tweets) != 0:
-        mastodon = login(mast_instance, mast_account, mast_password)
+        mastodon = login(mast_password)
    # **********************************************************
    # Iterate tweets in list.
@ -683,8 +922,8 @@ def main(argv):
    posted_cnt = 0
    for tweet in reversed(tweets):
        # Check if we have reached the cap on the number of toots to post
-        if cap != 0 and posted_cnt >= cap:
+        if TOML['options']['toot_cap'] != 0 and posted_cnt >= TOML['options']['toot_cap']:
-            logging.info('%d toots not posted due to configured cap', len(tweets) - cap)
+            logging.info('%d toots not posted due to configured cap', len(tweets) - TOML['options']['toot_cap'])
            break
        logging.debug('Uploading Tweet %s', tweet["tweet_id"])
@ -727,8 +966,8 @@ def main(argv):
        toot = {}
        try:
            mastodon = Mastodon(
-                access_token=mast_account + '.secret',
+                access_token=TOML['config']['mastodon_user'] + '.secret',
-                api_base_url='https://' + mast_instance
+                api_base_url='https://' + TOML['config']['mastodon_instance']
            )
            if len(media_ids) == 0:
@ -737,31 +976,31 @@ def main(argv):
                toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids, visibility='public')
        except MastodonError as me:
-            logging.error('posting ' + tweet['tweet_text'] + ' to ' + mast_instance + ' Failed')
+            logging.error('posting ' + tweet['tweet_text'] + ' to ' + TOML['config']['mastodon_instance'] + ' Failed')
            logging.error(me)
        else:
            posted_cnt += 1
-            logging.debug('Tweet %s posted on %s', tweet['tweet_id'], mast_account)
+            logging.debug('Tweet %s posted on %s', tweet['tweet_id'], TOML['config']['mastodon_user'])
        # Insert toot id into database
        if 'id' in toot:
            db.execute("INSERT INTO toots VALUES ( ? , ? , ? , ? , ? )",
-                       (twit_account, mast_instance, mast_account, tweet['tweet_id'], toot['id']))
+                       (TOML['config']['twitter_account'], TOML['config']['mastodon_instance'], TOML['config']['mastodon_user'], tweet['tweet_id'], toot['id']))
            sql.commit()
    logging.info(str(posted_cnt) + ' tweets posted to Mastodon')
    # Cleanup downloaded video files
    try:
-        shutil.rmtree('./output/' + twit_account)
+        shutil.rmtree('./output/' + TOML['config']['twitter_account'])
    except FileNotFoundError:  # The directory does not exist
        pass
    # Evaluate excess records in database
    excess_count = 0
-    db.execute('SELECT count(*) FROM toots WHERE twitter_account=?', (twit_account,))
+    db.execute('SELECT count(*) FROM toots WHERE twitter_account=?', (TOML['config']['twitter_account'],))
    db_count = db.fetchone()
    if db_count is not None:
        excess_count = db_count[0] - MAX_REC_COUNT
@ -777,14 +1016,12 @@ def main(argv):
            LIMIT ?
            )
            DELETE from toots
-            WHERE tweet_id IN excess''', (twit_account, excess_count))
+            WHERE tweet_id IN excess''', (TOML['config']['twitter_account'], excess_count))
        sql.commit()
        logging.info('Deleted ' + str(excess_count) + ' old records from database.')
-    logging.info('Run time : %2.1f seconds' % (time.time() - start_time))
+    terminate(0)
    logging.info('_____________________________________________________________________________________')
 if __name__ == "__main__":
    main(sys.argv)