Merge branch 'replies'

jeancf 2023-07-17 21:01:35 +02:00
commit e512838a0e
3 changed files with 109 additions and 56 deletions


@@ -1,5 +1,18 @@
 # Changelog
+**14 JUL 2023** VERSION 4.2
+Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
+displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
+*When several toots are posted in the same run of twoot it is possible that these toots do not appear in
+chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
+your config file to slow down the rate at which toots are uploaded.*
+A list of nitter instances to use can now be specified in the config file
+e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
+If none is specified, the built-in list of 2-3 known good instances is used as before.
 **12 JUL 2023** VERSION 4.1
 **Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle this modification yet therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning message is added to the log file instead.
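The instance selection described in the 4.2 entry (use `nitter_instances` from the config file when present, otherwise fall back to a built-in list) can be sketched as below. This is a minimal illustration, not twoot's code: the `options` dict stands in for the parsed `[options]` table of the TOML file, and `FALLBACK_INSTANCES` is a hypothetical placeholder for the built-in list, whose real contents are not shown in this commit. `random.choice` does the same job as the `random.randint` indexing used in `main()`.

```python
import random
from typing import Optional

# Hypothetical placeholder; twoot ships its own list of 2-3 known good instances.
FALLBACK_INSTANCES = ["nitter.example.org", "nitter.example.net"]


def pick_nitter_url(options: dict, rng: Optional[random.Random] = None) -> str:
    """Return the base URL of one randomly chosen nitter instance."""
    rng = rng or random.Random()
    # A missing or empty `nitter_instances` entry falls back to the built-in list.
    instances = options.get("nitter_instances") or FALLBACK_INSTANCES
    return "https://" + rng.choice(instances)


print(pick_nitter_url({"nitter_instances": ["nitter.nl", "nitter.fdn.fr"]}))
```

Passing a seeded `random.Random` as `rng` makes the choice reproducible, which is convenient when testing.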


@@ -3,18 +3,11 @@
 Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
 It is simple to set-up on a local machine, configurable and feature-rich.
-**14 JUL 2023** VERSION 4.2
-Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a threads are
-displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
-*When several toots are posted in the same run of toot it is possible that these toots do not appear in
-chronological order on the timeline. If it is the case, try setting `upload_pause` to 3-5 seconds in
-your config file to slow down the rate at which toots are uploaded.*
-A list of nitter instances to use can now be specified in the config file
-e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
-If none is specified, the built-in list of 2-3 known good instances is used as before.
+**17 JUL 2023** VERSION 4.3
+* Twitter threads are replicated on Mastodon: each follow-up message in a thread is posted
+as a reply to its predecessor.
+* An issue with downloading videos has been fixed ("ERROR: Sorry, you are not authorized to see this status").
 > Previous updates can be found in CHANGELOG.
@@ -22,6 +15,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as b
 * Fetch timeline of given user from twitter.com (through nitter instance)
 * Scrape html and format tweets for post on mastodon
+* Threads (series of replies to own messages) are replicated
 * Emojis supported
 * Upload images from tweet to Mastodon
 * Optionally upload videos from tweet to Mastodon
@@ -41,7 +35,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as b
 ## Usage
-```sh
+```
 usage: twoot.py [-h] [-f <.toml config file>] [-t <twitter account>] [-i <mastodon instance>]
 [-m <mastodon account>] [-p <mastodon password>] [-r] [-s] [-l] [-u] [-v] [-o] [-q]
 [-a <max age (in days)>] [-d <min delay (in mins)>] [-c <max # of toots to post>]
@@ -85,18 +79,19 @@ to use, all the other command-line parameters are ignored, except `-p` (password
 ### Removing redirected links
-`-l` (or `remove_link_redirections = true` in toml file) will follow every link included in the
-tweet and replace them with the url that the resource is directly dowmnloaded from (if applicable).
-e.g. bit.ly/xxyyyzz -> example.com
+`remove_link_redirections = true` in toml file (or `-l` on the command line) will follow every link
+included in the tweet and replace them with the url that the resource is directly downloaded from
+(if applicable). e.g. bit.ly/xxyyyzz -> example.com
 Every link visit can take up to 5 sec (timeout) depending on the responsiveness of the source
 therefore this option will slow down tweet processing.
-If you are interested by tracker removal (`-u`, `remove_trackers_from_urls = true`) you should
+If you are interested in tracker removal (`remove_trackers_from_urls = true`, `-u`) you should
 also select redirection removal as trackers are often hidden behind the redirection of a short URL.
 ### Uploading videos
-When using the `-v` (`upload_videos = true`) switch consider:
+When using the `upload_videos = true` (`-v`) switch consider:
 * whether the copyright of the content that you want to cross-post allows it
 * the storage / transfer limitations of the Mastodon instance that you are posting to
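The two link-cleaning options described in this hunk can be approximated with the standard library alone. This is a sketch under stated assumptions, not twoot's implementation: `resolve_redirects` lets `urllib.request.urlopen` follow HTTP redirects with the same 5-second timeout mentioned above and reports the final URL, and `remove_trackers` drops query parameters from a hypothetical list (`utm_*`, `fbclid`, `gclid`); the list twoot actually uses may differ. Running the redirect step first matters because a short link such as bit.ly hides its tracking parameters until it is expanded.

```python
import urllib.request
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical tracker list, for illustration only.
TRACKER_PREFIXES = ("utm_",)
TRACKER_NAMES = {"fbclid", "gclid"}


def resolve_redirects(url: str, timeout: float = 5.0) -> str:
    """Follow HTTP redirects and return the URL the resource is finally served from.

    Needs network access; errors propagate to the caller.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def remove_trackers(url: str) -> str:
    """Drop tracking parameters from the query string, keeping everything else."""
    parts = urlparse(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(TRACKER_PREFIXES) and key not in TRACKER_NAMES
    ]
    return urlunparse(parts._replace(query=urlencode(kept)))


print(remove_trackers("https://example.com/page?id=7&utm_source=tw&fbclid=abc"))
# prints https://example.com/page?id=7
```

In a cleaning pass the two calls would be chained, e.g. `remove_trackers(resolve_redirects(link))`.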
@@ -104,7 +99,7 @@ When using the `-v` (`upload_videos = true`) switch consider:
 ### Updating profile
-If `-q` (`update_profile = true`) is specified, twoot will check if the avatar and banner pictures
+If `update_profile = true` (`-q`) is specified, twoot will check if the avatar and banner pictures
 have changed on the twitter page. This check compares the name of files used by twitter with the names
 of the files that have been uploaded on Mastodon and if they differ both files are downloaded from
 twitter and uploaded on Mastodon. The check is very fast if there is no update.
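The avatar/banner check described above (compare file names, download only when they differ) can be sketched against the `profiles` table that `main()` creates further down in this diff. A minimal illustration, assuming the stored `avatar_url` and `banner_url` are full URLs and that the "file name" is the last path segment; the function name `profile_needs_update` and the sample values are hypothetical.

```python
import sqlite3
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


def _file_name(url: Optional[str]) -> Optional[str]:
    # Last path segment of the URL, e.g. ".../a/av1.jpg" -> "av1.jpg".
    return PurePosixPath(urlparse(url).path).name if url else None


def profile_needs_update(conn: sqlite3.Connection, instance: str, account: str,
                         avatar_url: str, banner_url: str) -> bool:
    """True if no profile is recorded yet or either picture's file name changed."""
    row = conn.execute(
        "SELECT avatar_url, banner_url FROM profiles "
        "WHERE mastodon_instance=? AND mastodon_account=?",
        (instance, account),
    ).fetchone()
    if row is None:
        return True
    return (_file_name(row[0]) != _file_name(avatar_url)
            or _file_name(row[1]) != _file_name(banner_url))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (mastodon_instance TEXT, mastodon_account TEXT, "
             "avatar_url TEXT, banner_url TEXT)")
conn.execute("INSERT INTO profiles VALUES (?, ?, ?, ?)",
             ("masto.space", "sd", "https://pbs.example/a/av1.jpg", "https://pbs.example/b/bn1"))
print(profile_needs_update(conn, "masto.space", "sd",
                           "https://pbs.example/a/av1.jpg", "https://pbs.example/b/bn2"))
# prints True
```

Comparing names rather than downloading and hashing the images is what keeps the no-change case cheap: it costs one indexed query.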
@@ -119,10 +114,9 @@ e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
 An empty or missing `tweet_time_format` disables the display of the timestamp.
-By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
-use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
-zone database (<https://en.wikipedia.org/wiki/Tz_database>)
-e.g. `tweet_timezone = "Europe/Paris"`
+By default, dates are specified in the local timezone of the machine running the script. To display the timestamp in another time zone, use the `tweet_timezone` option in the configuration file. Valid time zone names are those of the Olson time
+zone database (<https://en.wikipedia.org/wiki/Tz_database>).
+e.g. `tweet_timezone = "Europe/Paris"` or `tweet_timezone = "UTC"`
 ### Rate control
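A sketch of how `tweet_time_format` and `tweet_timezone` could combine, under these assumptions: the tweet time is available as a timezone-aware `datetime`, an unset timezone means the local zone of the machine (as the new README text states), and the standard-library `zoneinfo` module (Python 3.9+) is used in place of the `pytz` package listed in the installation instructions. `"UTC"` is mapped to `datetime.timezone.utc` so that case does not depend on the system time zone database. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def format_tweet_time(when: datetime, time_format: Optional[str],
                      tz_name: Optional[str] = None) -> str:
    """Render an aware datetime with `tweet_time_format` in `tweet_timezone`."""
    if not time_format:
        # An empty or missing format disables the timestamp.
        return ""
    if not tz_name:
        local = when.astimezone()  # no argument: convert to the machine's local zone
    elif tz_name == "UTC":
        local = when.astimezone(timezone.utc)
    else:
        local = when.astimezone(ZoneInfo(tz_name))  # Olson name, e.g. "Europe/Paris"
    return local.strftime(time_format)


posted = datetime(2023, 7, 17, 19, 1, 35, tzinfo=timezone.utc)
print(format_tweet_time(posted, "(%d %b %Y %H:%M %Z)", "UTC"))
# prints (17 Jul 2023 19:01 UTC)
```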
@@ -132,6 +126,9 @@ Default min delay is 0 minutes.
 No limitation is applied to the number of toots uploaded if `-c` is not specified.
+If messages in a thread that are uploaded simultaneously appear in the wrong order, try setting
+the `upload_pause` configuration variable in the configuration file to a few seconds (start with 3-5).
 ## Installation
 Make sure python3 is installed.
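The effect of `upload_pause` can be pictured as a fixed sleep between consecutive uploads in one run, giving the instance time to register each toot before the next one (possibly its reply) arrives. A minimal sketch, assuming a `post` callable stands in for the actual Mastodon upload (hypothetical, so the example runs offline); it is not twoot's posting loop.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def post_with_pause(texts: Sequence[str], post: Callable[[str], T],
                    upload_pause: float = 0.0) -> List[T]:
    """Post texts in order, sleeping `upload_pause` seconds between uploads."""
    results: List[T] = []
    for index, text in enumerate(texts):
        if index > 0 and upload_pause > 0:
            time.sleep(upload_pause)  # no pause before the first upload
        results.append(post(text))
    return results


ids = post_with_pause(["first", "second", "third"], post=len, upload_pause=0.1)
print(ids)  # prints [5, 6, 5]
```

With the suggested 3-5 seconds, a run of N toots takes roughly (N - 1) times the pause longer, which is usually negligible for a job started from cron every 15 minutes.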
@@ -151,6 +148,9 @@ pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
 In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
 to clone repo with twoot.py script.
+If you want to use a config file to specify options (recommended), copy `default.toml` to
+`[your_preferred_name].toml` and edit it to your preferences.
 Add command line to crontab. For example, to run every 15 minutes starting at minute 1 of every hour
 and process the tweets posted in the last 5 days but at least 15 minutes
 ago:
@@ -159,6 +159,8 @@ ago:
 1-59/15 * * * * /path/to/twoot.py -t SuperDuper -i masto.space -m sd@example.com -p my_Sup3r-S4f3*pw -a 5 -d 15
``` ```
+After the first successful run, you no longer need to specify the password and you can remove the `-p` switch.
 ## Featured Accounts
 Twoot is known to be used for the following feeds (older first):


@@ -169,12 +169,17 @@ Dowload page with full thread of tweets and extract all replied to tweet referen
 Only used by `get_timeline()`.
 :param session: Existing HTTP session with Nitter instance
 :param headers: HTTP headers to use
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+:param nitter url: url of the nitter instance to use
+:param thread_url: url of the first tweet in thread
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
-def _get_rest_of_thread(session, headers, url):
+def _get_rest_of_thread(session, headers, nitter_url, thread_url, first_item):
+# Add first item to timeline
+timeline = [(None, first_item)]
 logging.debug("Downloading tweets in thread from separate page")
 # Download page with thread
+url = nitter_url + thread_url
 try:
 thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
 except requests.exceptions.ConnectionError:
@@ -201,14 +206,29 @@ def _get_rest_of_thread(session, headers, url):
 # Get all items in thread after main tweet
 after_tweet = soup.find('div', 'after-tweet')
-timeline = after_tweet.find_all('div', class_='timeline-item')
+list = after_tweet.find_all('div', class_='timeline-item')
+# Build timeline of tuples
+previous_tweet_url = thread_url
+for item in list:
+timeline.append((previous_tweet_url, item))
+# Get the url of the tweet
+tweet_link_tag = item.find('a', class_='tweet-link')
+if tweet_link_tag is not None:
+previous_tweet_url = tweet_link_tag.get('href').strip('#m')
+else:
+previous_tweet_url = None
+logging.error('Thread tweet is missing link tag')
+# return timeline in reverse chronological order
+timeline.reverse()
 return timeline
 """
-Dowload page with full thread of tweets. Only used by `get_timeline()`.
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+Download timeline of twitter account
+:param url: url of the account page to download
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
 def get_timeline(nitter_url):
 # Define url to use
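The tuple timeline built by the new `_get_rest_of_thread()` can be illustrated without HTML parsing. In this sketch each thread item is represented only by its `tweet-link` href (a simplifying assumption; the real code keeps the BeautifulSoup element), a missing link is `None`, and `build_thread_timeline` is a hypothetical name. It uses `str.removesuffix("#m")` (Python 3.9+), which removes exactly the `#m` suffix that the code strips from nitter links, whereas `strip('#m')` removes any run of `#` and `m` characters from both ends; the two agree for tweet links, which end in a numeric id.

```python
from typing import List, Optional, Sequence, Tuple

Entry = Tuple[Optional[str], Optional[str]]  # (replied-to tweet url, tweet url)


def clean_link(href: Optional[str]) -> Optional[str]:
    # Remove only the literal "#m" suffix; None (missing link tag) stays None.
    return href.removesuffix("#m") if href is not None else None


def build_thread_timeline(first_href: str,
                          follow_up_hrefs: Sequence[Optional[str]]) -> List[Entry]:
    """Pair every thread item with the url of the item it replies to."""
    first = clean_link(first_href)
    timeline: List[Entry] = [(None, first)]  # the thread start replies to nothing
    previous = first
    for href in follow_up_hrefs:
        current = clean_link(href)
        timeline.append((previous, current))
        previous = current  # the next item replies to this one
    timeline.reverse()  # reverse chronological order, like the rest of the timeline
    return timeline


for entry in build_thread_timeline("/jack/status/1#m", ["/jack/status/2#m", "/jack/status/3#m"]):
    print(entry)
```

As in the diff, a follow-up without a link breaks the chain: the item after it gets `None` as its replied-to url and would be posted as a stand-alone toot.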
@@ -268,17 +288,18 @@ def get_timeline(nitter_url):
 for item in list:
 classes = item['class']
 if 'timeline-item' in classes: # Individual tweet
-timeline.append(item)
+timeline.append((None, item))
 elif 'thread-line' in classes: # First tweet of a thread
 # Get the first item of thread
 first_item = item.find('div', class_='timeline-item')
-timeline.append(first_item)
-# Get the rest of the items of the thread
+# Get the url of the tweet
 thread_link_tag = item.find('a', class_='tweet-link')
 if thread_link_tag is not None:
-thread_url = thread_link_tag.get('href')
-timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
+thread_url = thread_link_tag.get('href').strip('#m')
+# Get the rest of the items of the thread
+timeline.extend(_get_rest_of_thread(session, headers, nitter_url, thread_url, first_item))
 else:
 # Ignore other classes
 continue
@@ -647,17 +668,18 @@ def process_attachments(nitter_url, attachments_container, status_id, author_acc
 # Download twitter video
 vid_in_tweet = False
-vid_class = attachments_container.find('div', class_='video-container')
-if vid_class is not None:
+vid_container = attachments_container.find('div', class_='video-container')
+if vid_container is not None:
 if TOML['options']['upload_videos']:
 logging.debug("downloading video from twitter")
 import youtube_dl
-video_path = f"{author_account}/status/{status_id}"
-video_file = urljoin('https://twitter.com', video_path)
+video_path = vid_container.source['src']
+if video_path is not None:
+video_file = urljoin(nitter_url, video_path)
 ydl_opts = {
 'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
-'format': "best[width<=500]",
+# 'format': "best[width<=500]",
 'socket_timeout': 60,
 'quiet': True,
 }
@@ -670,6 +692,8 @@ def process_attachments(nitter_url, attachments_container, status_id, author_acc
 vid_in_tweet = True
 else:
 logging.debug('downloaded twitter video from attachments')
+else:
+vid_in_tweet = True
 return pics, vid_in_tweet
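The video change reads the stream address from the `<source>` element inside nitter's `video-container` and resolves it against the nitter instance URL, instead of building a twitter.com status URL. The sketch below shows that lookup with the standard-library `html.parser` rather than BeautifulSoup, and returns `None` when the container has no `<source>` element or no `src` attribute. The helper names are hypothetical, and the HTML in the usage example is a made-up fragment, not captured nitter output.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _VideoSourceFinder(HTMLParser):
    """Records the src of the first <source> tag inside a div with class video-container."""

    def __init__(self) -> None:
        super().__init__()
        self._depth = 0  # open <div>s since entering the video container (0 = outside)
        self.src: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if self._depth:
            if tag == "div":
                self._depth += 1
            elif tag == "source" and self.src is None:
                self.src = attributes.get("src")
        elif tag == "div" and "video-container" in (attributes.get("class") or "").split():
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1


def find_video_url(nitter_url: str, attachments_html: str) -> Optional[str]:
    """Absolute URL of the tweet's video on the nitter instance, or None."""
    finder = _VideoSourceFinder()
    finder.feed(attachments_html)
    finder.close()
    return urljoin(nitter_url, finder.src) if finder.src else None


html = ('<div class="attachments"><div class="video-container">'
        '<video><source src="/video/abc.mp4" type="video/mp4"></video></div></div>')
print(find_video_url("https://nitter.example", html))
# prints https://nitter.example/video/abc.mp4
```

Returning `None` keeps a tweet whose player markup lacks a `<source>` element from raising an exception.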
@@ -923,8 +947,9 @@ def main(argv):
 mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)''')
 db.execute('''CREATE INDEX IF NOT EXISTS main_index ON toots (twitter_account,
 mastodon_instance, mastodon_account, tweet_id)''')
+db.execute('''CREATE INDEX IF NOT EXISTS tweet_id_index ON toots (tweet_id)''')
 db.execute('''CREATE TABLE IF NOT EXISTS profiles (mastodon_instance TEXT, mastodon_account TEXT, avatar_url TEXT, banner_url TEXT)''')
-db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
+db.execute('''CREATE INDEX IF NOT EXISTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
 # Select random nitter instance to fetch updates from
 nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
@@ -942,7 +967,7 @@ def main(argv):
 tweets = []
 out_date_cnt = 0
 in_db_cnt = 0
-for status in timeline:
+for replied_to_tweet, status in timeline:
 # Extract tweet ID and status ID
 tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
 status_id = tweet_id.split('/')[3]
@@ -1105,6 +1130,7 @@ def main(argv):
 "tweet_text": tweet_text,
 "video": video_file,
 "photos": photos,
+"replied_to_tweet": replied_to_tweet,
 }
 tweets.append(tweet)
@@ -1172,13 +1198,25 @@ def main(argv):
 TypeError): # Media cannot be uploaded (invalid format, dead link, etc.)
 pass
+# Find in database toot id of replied_to_tweet
+replied_to_toot = None
+if tweet['replied_to_tweet'] is not None:
+logging.debug("Searching db for toot corresponding to replied-to-tweet " + tweet['replied_to_tweet'])
+db.execute("SELECT toot_id FROM toots WHERE tweet_id=?", [tweet['replied_to_tweet']])
+replied_to_toot = db.fetchone()
+if replied_to_toot is None:
+logging.warning('Replied-to tweet %s not found in database', tweet['replied_to_tweet'])
+else:
+logging.debug("toot %s found", replied_to_toot)
 # Post toot
 toot = {}
 try:
 if len(media_ids) == 0:
-toot = mastodon.status_post(tweet['tweet_text'])
+toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot)
 else:
-toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
+toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot, media_ids=media_ids)
 except MastodonAPIError:
 # Assuming this is an:
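The reply mechanism added to `main()` (look up the toot id of the replied-to tweet, then pass it to `status_post()` as its second positional argument, which is `in_reply_to_id` in Mastodon.py) can be exercised offline with an in-memory SQLite database. In this sketch `fake_status_post` is a hypothetical stand-in for `mastodon.status_post`, the account values are placeholders borrowed from the README's cron example, and recording each new toot with an `INSERT` is an assumption about the part of `main()` not shown in this diff. It takes the first column of the row returned by `fetchone()`, since that method returns a tuple rather than the bare id. Tweets of a thread must be posted oldest first so that each parent is already in the database when its reply is processed.

```python
import sqlite3
from typing import Callable, Optional


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS toots (twitter_account TEXT, mastodon_instance TEXT,
                    mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)""")
    # Reply lookups filter on tweet_id alone; main_index cannot serve them
    # efficiently because tweet_id is not its leading column.
    conn.execute("CREATE INDEX IF NOT EXISTS tweet_id_index ON toots (tweet_id)")
    return conn


def find_replied_to_toot(conn: sqlite3.Connection, replied_to_tweet: Optional[str]) -> Optional[str]:
    if replied_to_tweet is None:
        return None
    row = conn.execute("SELECT toot_id FROM toots WHERE tweet_id=?", (replied_to_tweet,)).fetchone()
    return row[0] if row is not None else None  # fetchone() gives a tuple or None


def post_tweet(conn: sqlite3.Connection, tweet: dict, post_status: Callable[..., dict]) -> dict:
    in_reply_to_id = find_replied_to_toot(conn, tweet["replied_to_tweet"])
    toot = post_status(tweet["tweet_text"], in_reply_to_id)
    conn.execute("INSERT INTO toots VALUES (?, ?, ?, ?, ?)",
                 ("SuperDuper", "masto.space", "sd@example.com", tweet["tweet_id"], toot["id"]))
    return toot


calls = []


def fake_status_post(text, in_reply_to_id=None):
    calls.append((text, in_reply_to_id))
    return {"id": str(100 + len(calls))}


conn = open_db()
thread = [  # oldest first
    {"tweet_id": "/SuperDuper/status/1", "tweet_text": "1/2", "replied_to_tweet": None},
    {"tweet_id": "/SuperDuper/status/2", "tweet_text": "2/2", "replied_to_tweet": "/SuperDuper/status/1"},
]
for tweet in thread:
    post_tweet(conn, tweet, fake_status_post)
print(calls)
# prints [('1/2', None), ('2/2', '101')]
```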