Merge branch 'replies'

This commit is contained in:
jeancf 2023-07-17 21:01:35 +02:00
commit e512838a0e
3 changed files with 109 additions and 56 deletions


@@ -1,5 +1,18 @@
# Changelog
**14 JUL 2023** VERSION 4.2
Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
*When several toots are posted in the same run of twoot, these toots may not appear in
chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
your config file to slow down the rate at which toots are uploaded.*
A list of nitter instances to use can now be specified in the config file
e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
If none is specified, the built-in list of 2-3 known good instances is used as before.
**12 JUL 2023** VERSION 4.1
**Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle this modification yet, so TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning message is written to the log file instead.


@@ -3,18 +3,11 @@
Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
It is simple to set up on a local machine, configurable and feature-rich.
**14 JUL 2023** VERSION 4.2
**17 JUL 2023** VERSION 4.3
Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
*When several toots are posted in the same run of twoot, these toots may not appear in
chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
your config file to slow down the rate at which toots are uploaded.*
A list of nitter instances to use can now be specified in the config file
e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
If none is specified, the built-in list of 2-3 known good instances is used as before.
* Twitter threads are replicated on Mastodon: each follow-up message in a thread is posted
as a reply to its predecessor.
* An issue with downloading videos has been fixed ("ERROR: Sorry, you are not authorized to see this status").
> Previous updates can be found in CHANGELOG.
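The `nitter_instances` and `upload_pause` options described above live in the toml config file. A hypothetical excerpt (key names are taken from this README; values and table placement are illustrative and may differ from `default.toml`):

```toml
# Hypothetical excerpt of a twoot config file.
# Instances tried instead of the built-in list:
nitter_instances = ["nitter.nl", "nitter.fdn.fr"]
# Seconds to wait between consecutive toot uploads, to keep
# thread replies in chronological order on the timeline:
upload_pause = 5
```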
@@ -22,6 +15,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as b
* Fetch timeline of given user from twitter.com (through nitter instance)
* Scrape html and format tweets for posting on mastodon
* Threads (series of replies to own messages) are replicated
* Emojis supported
* Upload images from tweet to Mastodon
* Optionally upload videos from tweet to Mastodon
@@ -41,7 +35,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as b
## Usage
```sh
```
usage: twoot.py [-h] [-f <.toml config file>] [-t <twitter account>] [-i <mastodon instance>]
[-m <mastodon account>] [-p <mastodon password>] [-r] [-s] [-l] [-u] [-v] [-o] [-q]
[-a <max age (in days)>] [-d <min delay (in mins)>] [-c <max # of toots to post>]
@@ -85,18 +79,19 @@ to use, all the other command-line parameters are ignored, except `-p` (password
### Removing redirected links
`-l` (or `remove_link_redirections = true` in toml file) will follow every link included in the
tweet and replace it with the URL that the resource is directly downloaded from (if applicable),
e.g. bit.ly/xxyyyzz -> example.com
`remove_link_redirections = true` in the toml file (or `-l` on the command line) will follow every link
included in the tweet and replace it with the URL that the resource is directly downloaded from
(if applicable), e.g. bit.ly/xxyyyzz -> example.com
Every link visit can take up to 5 seconds (timeout) depending on the responsiveness of the source,
so this option will slow down tweet processing.
If you are interested in tracker removal (`-u`, `remove_trackers_from_urls = true`) you should
If you are interested in tracker removal (`remove_trackers_from_urls = true`, `-u`) you should
also select redirection removal, as trackers are often hidden behind the redirection of a short URL.
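Following a link to its final destination, as described above, can be sketched with the standard library (a sketch under stated assumptions, not twoot's actual implementation; the 5-second timeout mirrors the text):

```python
import urllib.error
import urllib.request


def resolve_redirect(url, timeout=5):
    """Follow HTTP redirections and return the final URL.

    Falls back to the original URL on any network error,
    so an unreachable link is left untouched in the toot.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # geturl() reflects the URL after all redirections
            return resp.geturl()
    except (urllib.error.URLError, ValueError):
        return url
```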
### Uploading videos
When using the `-v` (`upload_videos = true`) switch consider:
When using the `upload_videos = true` (`-v`) switch consider:
* whether the copyright of the content that you want to cross-post allows it
* the storage / transfer limitations of the Mastodon instance that you are posting to
@@ -104,7 +99,7 @@ When using the `-v` (`upload_videos = true`) switch consider:
### Updating profile
If `-q` (`update_profile = true`) is specified, twoot will check if the avatar and banner pictures
If `update_profile = true` (`-q`) is specified, twoot will check if the avatar and banner pictures
have changed on the twitter page. This check compares the names of the files used by twitter with the names
of the files that have been uploaded on Mastodon; if they differ, both files are downloaded from
twitter and uploaded on Mastodon. The check is very fast if there is no update.
@@ -119,10 +114,9 @@ e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
An empty or missing `tweet_time_format` disables the display of the timestamp.
By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
zone database (<https://en.wikipedia.org/wiki/Tz_database>)
e.g. `tweet_timezone = "Europe/Paris"`
By default, dates are displayed in the local timezone of the machine running the script. To display the
timestamp in another time zone, use the `tweet_timezone` option in the configuration file. Valid time zone
names are those of the Olson time zone database (<https://en.wikipedia.org/wiki/Tz_database>),
e.g. `tweet_timezone = "Europe/Paris"` or `tweet_timezone = "UTC"`
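The conversion and formatting described above can be sketched with the standard library `zoneinfo` module (`pytz`, which twoot installs, works the same way). The variable names and the UTC starting point are illustrative; twoot's internals may differ:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Hypothetical values mirroring the config options above
tweet_time_format = "(%d %b %Y %H:%M %Z)"
tweet_timezone = "Europe/Paris"

# A tweet timestamp, assumed here to be held as an aware UTC datetime
ts = datetime(2023, 7, 14, 12, 30, tzinfo=timezone.utc)

# Convert to the configured time zone and format for display
local = ts.astimezone(ZoneInfo(tweet_timezone))
print(local.strftime(tweet_time_format))  # (14 Jul 2023 14:30 CEST)
```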
### Rate control
@@ -132,6 +126,9 @@ Default min delay is 0 minutes.
No limitation is applied to the number of toots uploaded if `-c` is not specified.
If messages in a thread that are uploaded simultaneously appear in the wrong order, try setting
the `upload_pause` configuration variable in the configuration file to a few seconds (start with 3-5).
## Installation
Make sure python3 is installed.
@@ -151,6 +148,9 @@ pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
to clone the repo with the twoot.py script.
If you want to use a config file to specify options (recommended), copy `default.toml` to
`[your_preferred_name].toml` and edit it to your preferences.
Add command line to crontab. For example, to run every 15 minutes starting at minute 1 of every hour
and process the tweets posted in the last 5 days but at least 15 minutes
ago:
@@ -159,6 +159,8 @@ ago:
1-59/15 * * * * /path/to/twoot.py -t SuperDuper -i masto.space -m sd@example.com -p my_Sup3r-S4f3*pw -a 5 -d 15
```
After the first successful run, you no longer need to specify the password and you can remove the `-p` switch.
## Featured Accounts
Twoot is known to be used for the following feeds (oldest first):

twoot.py

@@ -169,12 +169,17 @@ Download page with full thread of tweets and extract all replied-to tweet referen
Only used by `get_timeline()`.
:param session: Existing HTTP session with Nitter instance
:param headers: HTTP headers to use
:param url: url of the thread page to download
:return: List of tweets from the thread
:param nitter_url: url of the nitter instance to use
:param thread_url: url of the first tweet in thread
:return: list of tuples with url of tweet replied-to (or None) and content of tweet
"""
def _get_rest_of_thread(session, headers, url):
def _get_rest_of_thread(session, headers, nitter_url, thread_url, first_item):
# Add first item to timeline
timeline = [(None, first_item)]
logging.debug("Downloading tweets in thread from separate page")
# Download page with thread
url = nitter_url + thread_url
try:
thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
except requests.exceptions.ConnectionError:
@@ -201,14 +206,29 @@ def _get_rest_of_thread(session, headers, url):
# Get all items in thread after main tweet
after_tweet = soup.find('div', 'after-tweet')
list = after_tweet.find_all('div', class_='timeline-item')
timeline = after_tweet.find_all('div', class_='timeline-item')
# Build timeline of tuples
previous_tweet_url = thread_url
for item in list:
timeline.append((previous_tweet_url, item))
# Get the url of the tweet
tweet_link_tag = item.find('a', class_='tweet-link')
if tweet_link_tag is not None:
previous_tweet_url = tweet_link_tag.get('href').strip('#m')
else:
previous_tweet_url = None
logging.error('Thread tweet is missing link tag')
# return timeline in reverse chronological order
timeline.reverse()
return timeline
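A note on the `.strip('#m')` calls this commit introduces: `str.strip` removes any run of characters from the set `{'#', 'm'}` at both ends of the string, not the literal suffix `#m`. That is safe here because Nitter hrefs start with `/` and status IDs are numeric, so only the trailing `#m` fragment is removed. A quick check with an illustrative href:

```python
# strip('#m') removes any leading/trailing '#' or 'm' characters,
# which here amounts to dropping the "#m" fragment Nitter appends.
href = '/jeancf/status/1680931842#m'
tweet_id = href.strip('#m')
print(tweet_id)  # /jeancf/status/1680931842
```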
"""
Download page with full thread of tweets. Only used by `get_timeline()`.
:param url: url of the thread page to download
:return: List of tweets from the thread
Download timeline of twitter account
:param url: url of the account page to download
:return: list of tuples with url of tweet replied-to (or None) and content of tweet
"""
def get_timeline(nitter_url):
# Define url to use
@@ -268,17 +288,18 @@ def get_timeline(nitter_url):
for item in list:
classes = item['class']
if 'timeline-item' in classes: # Individual tweet
timeline.append(item)
timeline.append((None, item))
elif 'thread-line' in classes: # First tweet of a thread
# Get the first item of thread
first_item = item.find('div', class_='timeline-item')
timeline.append(first_item)
# Get the rest of the items of the thread
# Get the url of the tweet
thread_link_tag = item.find('a', class_='tweet-link')
if thread_link_tag is not None:
thread_url = thread_link_tag.get('href')
timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
thread_url = thread_link_tag.get('href').strip('#m')
# Get the rest of the items of the thread
timeline.extend(_get_rest_of_thread(session, headers, nitter_url, thread_url, first_item))
else:
# Ignore other classes
continue
@@ -647,29 +668,32 @@ def process_attachments(nitter_url, attachments_container, status_id, author_acc
# Download twitter video
vid_in_tweet = False
vid_class = attachments_container.find('div', class_='video-container')
if vid_class is not None:
vid_container = attachments_container.find('div', class_='video-container')
if vid_container is not None:
if TOML['options']['upload_videos']:
logging.debug("downloading video from twitter")
import youtube_dl
video_path = f"{author_account}/status/{status_id}"
video_file = urljoin('https://twitter.com', video_path)
ydl_opts = {
'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
'format': "best[width<=500]",
'socket_timeout': 60,
'quiet': True,
}
video_path = vid_container.source['src']
if video_path is not None:
video_file = urljoin(nitter_url, video_path)
ydl_opts = {
'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
# 'format': "best[width<=500]",
'socket_timeout': 60,
'quiet': True,
}
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
try:
ydl.download([video_file])
except Exception as e:
logging.warning('Error downloading twitter video: ' + str(e))
vid_in_tweet = True
else:
logging.debug('downloaded twitter video from attachments')
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
try:
ydl.download([video_file])
except Exception as e:
logging.warning('Error downloading twitter video: ' + str(e))
vid_in_tweet = True
else:
logging.debug('downloaded twitter video from attachments')
else:
vid_in_tweet = True
return pics, vid_in_tweet
@@ -923,8 +947,9 @@ def main(argv):
mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)''')
db.execute('''CREATE INDEX IF NOT EXISTS main_index ON toots (twitter_account,
mastodon_instance, mastodon_account, tweet_id)''')
db.execute('''CREATE INDEX IF NOT EXISTS tweet_id_index ON toots (tweet_id)''')
db.execute('''CREATE TABLE IF NOT EXISTS profiles (mastodon_instance TEXT, mastodon_account TEXT, avatar_url TEXT, banner_url TEXT)''')
db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
db.execute('''CREATE INDEX IF NOT EXISTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
# Select random nitter instance to fetch updates from
nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
@@ -942,7 +967,7 @@ def main(argv):
tweets = []
out_date_cnt = 0
in_db_cnt = 0
for status in timeline:
for replied_to_tweet, status in timeline:
# Extract tweet ID and status ID
tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
status_id = tweet_id.split('/')[3]
@@ -1105,6 +1130,7 @@ def main(argv):
"tweet_text": tweet_text,
"video": video_file,
"photos": photos,
"replied_to_tweet": replied_to_tweet,
}
tweets.append(tweet)
@@ -1172,13 +1198,25 @@ TypeError): # Media cannot be uploaded (invalid format, dead link, etc.)
TypeError): # Media cannot be uploaded (invalid format, dead link, etc.)
pass
# Find in database toot id of replied_to_tweet
replied_to_toot = None
if tweet['replied_to_tweet'] is not None:
logging.debug("Searching db for toot corresponding to replied-to-tweet " + tweet['replied_to_tweet'])
db.execute("SELECT toot_id FROM toots WHERE tweet_id=?", [tweet['replied_to_tweet']])
replied_to_toot = db.fetchone()
if replied_to_toot is None:
logging.warning('Replied-to tweet %s not found in database', tweet['replied_to_tweet'])
else:
logging.debug("toot %s found", replied_to_toot)
# Post toot
toot = {}
try:
if len(media_ids) == 0:
toot = mastodon.status_post(tweet['tweet_text'])
toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot)
else:
toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot, media_ids=media_ids)
except MastodonAPIError:
# Assuming this is an: