Merge branch 'replies'
commit e512838a0e

CHANGELOG.md (13 lines changed)

@@ -1,5 +1,18 @@
 # Changelog
 
+**14 JUL 2023** VERSION 4.2
+
+Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
+displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
+
+*When several toots are posted in the same run of twoot, it is possible that they do not appear in
+chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
+your config file to slow down the rate at which toots are uploaded.*
+
+A list of nitter instances to use can now be specified in the config file,
+e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
+If none is specified, the built-in list of 2-3 known good instances is used as before.
+
 **12 JUL 2023** VERSION 4.1
 
 **Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle this modification yet, therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning message is added to the log file instead.
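The instance selection described above boils down to picking a random entry from the configured list, with the built-in defaults as a fallback. A minimal sketch of that behaviour (the fallback list shown here is illustrative, not twoot's actual built-in list; only `nitter_instances` is the documented setting):

```python
import random

# Illustrative stand-in for twoot's built-in list of known good instances
DEFAULT_NITTER_INSTANCES = ["nitter.net", "nitter.privacydev.net"]

def pick_nitter_instance(options):
    """Return the base URL of a randomly chosen Nitter instance.

    `options` is the `[options]` table of the TOML config file; when the
    user supplies no `nitter_instances` list, the built-in list is used.
    """
    instances = options.get("nitter_instances") or DEFAULT_NITTER_INSTANCES
    return "https://" + random.choice(instances)

# e.g. pick_nitter_instance({"nitter_instances": ["nitter.nl", "nitter.fdn.fr"]})
```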
README.md (46 lines changed)

@@ -3,18 +3,11 @@
 Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
 It is simple to set-up on a local machine, configurable and feature-rich.
 
-**14 JUL 2023** VERSION 4.2
+**17 JUL 2023** VERSION 4.3
 
-Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
-displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
-
-*When several toots are posted in the same run of twoot, it is possible that they do not appear in
-chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
-your config file to slow down the rate at which toots are uploaded.*
-
-A list of nitter instances to use can now be specified in the config file,
-e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
-If none is specified, the built-in list of 2-3 known good instances is used as before.
+* Twitter threads are replicated on Mastodon: each follow-up message in a thread is posted
+  as a reply to its predecessor.
+* An issue with downloading videos has been fixed ("ERROR: Sorry, you are not authorized to see this status").
 
 > Previous updates can be found in CHANGELOG.
 
@@ -22,6 +15,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 * Fetch timeline of given user from twitter.com (through nitter instance)
 * Scrape html and format tweets for post on mastodon
+* Threads (series of replies to own messages) are replicated
 * Emojis supported
 * Upload images from tweet to Mastodon
 * Optionally upload videos from tweet to Mastodon
@@ -41,7 +35,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 ## Usage
 
-```sh
+```
 usage: twoot.py [-h] [-f <.toml config file>] [-t <twitter account>] [-i <mastodon instance>]
                 [-m <mastodon account>] [-p <mastodon password>] [-r] [-s] [-l] [-u] [-v] [-o] [-q]
                 [-a <max age (in days)>] [-d <min delay (in mins)>] [-c <max # of toots to post>]
@@ -85,18 +79,19 @@ to use, all the other command-line parameters are ignored, except `-p` (password).
 
 ### Removing redirected links
 
-`-l` (or `remove_link_redirections = true` in toml file) will follow every link included in the
-tweet and replace it with the url that the resource is directly downloaded from (if applicable),
-e.g. bit.ly/xxyyyzz -> example.com
+`remove_link_redirections = true` in the toml file (or `-l` on the command line) will follow every link
+included in the tweet and replace it with the url that the resource is directly downloaded from
+(if applicable), e.g. bit.ly/xxyyyzz -> example.com
 
 Every link visit can take up to 5 sec (timeout) depending on the responsiveness of the source,
 therefore this option will slow down tweet processing.
 
-If you are interested in tracker removal (`-u`, `remove_trackers_from_urls = true`) you should
+If you are interested in tracker removal (`remove_trackers_from_urls = true`, `-u`) you should
 also select redirection removal as trackers are often hidden behind the redirection of a short URL.
 
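The redirection removal described above amounts to following each link until it stops redirecting and keeping the final URL. A minimal sketch of the idea, not twoot's actual implementation (the 5-second timeout mirrors the figure quoted above):

```python
import requests

def resolve_redirections(url, timeout=5):
    """Follow HTTP redirections and return the final URL.

    Returns the original URL unchanged if the link cannot be resolved
    within the timeout.
    """
    try:
        # A HEAD request is usually enough to discover the final location
        response = requests.head(url, allow_redirects=True, timeout=timeout)
        return response.url
    except requests.RequestException:
        return url

# e.g. resolve_redirections("https://bit.ly/xxyyyzz") -> "https://example.com/..."
```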
 ### Uploading videos
 
-When using the `-v` (`upload_videos = true`) switch, consider:
+When using the `upload_videos = true` (`-v`) switch, consider:
 
 * whether the copyright of the content that you want to cross-post allows it
 * the storage / transfer limitations of the Mastodon instance that you are posting to
@@ -104,7 +99,7 @@ When using the `upload_videos = true` (`-v`) switch, consider:
 
 ### Updating profile
 
-If `-q` (`update_profile = true`) is specified, twoot will check if the avatar and banner pictures
+If `update_profile = true` (`-q`) is specified, twoot will check if the avatar and banner pictures
 have changed on the twitter page. This check compares the names of the files used by twitter with the names
 of the files that have been uploaded on Mastodon and if they differ both files are downloaded from
 twitter and uploaded on Mastodon. The check is very fast if there is no update.
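The filename comparison can be pictured roughly as follows. This is a sketch only: the function name is made up, but the column layout follows the `profiles` table created in twoot.py and `db` is a sqlite3 cursor as used there:

```python
def profile_needs_update(db, mastodon_instance, mastodon_account, avatar_url, banner_url):
    """Return True if the avatar or banner filename seen on twitter differs
    from the one recorded when the profile was last uploaded to Mastodon."""
    db.execute("SELECT avatar_url, banner_url FROM profiles "
               "WHERE mastodon_instance=? AND mastodon_account=?",
               (mastodon_instance, mastodon_account))
    row = db.fetchone()
    # No record yet, or either filename changed -> profile must be updated
    return row is None or row != (avatar_url, banner_url)
```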
@@ -119,10 +114,9 @@ e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
 
 An empty or missing `tweet_time_format` disables the display of the timestamp.
 
-By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
-use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
-zone database (<https://en.wikipedia.org/wiki/Tz_database>)
-e.g. `tweet_timezone = "Europe/Paris"`
+By default, dates are displayed in the local timezone of the machine running the script. To display the
+timestamp in another time zone, use the `tweet_timezone` option in the configuration file. Valid time zone
+names are those of the Olson time zone database (<https://en.wikipedia.org/wiki/Tz_database>).
+e.g. `tweet_timezone = "Europe/Paris"` or `tweet_timezone = "UTC"`
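In code terms, the timestamp handling described above corresponds to something like this sketch (not the exact twoot function; pytz is the library listed in the installation instructions, the function and parameter names are illustrative):

```python
from datetime import datetime, timezone
import pytz

def format_tweet_time(timestamp, time_format, tz_name=None):
    """Format a tweet timestamp according to `tweet_time_format`,
    converted to `tweet_timezone` if one is set (local time otherwise).
    An empty format string disables the timestamp display."""
    if not time_format:
        return ""
    target_tz = pytz.timezone(tz_name) if tz_name else None
    return timestamp.astimezone(target_tz).strftime(time_format)  # None -> local time

# e.g. format_tweet_time(datetime.now(timezone.utc), "(%d %b %Y %H:%M %Z)", "Europe/Paris")
```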
 
 ### Rate control
 
@@ -132,6 +126,9 @@ Default min delay is 0 minutes.
 
 No limitation is applied to the number of toots uploaded if `-c` is not specified.
 
+If messages in a thread that are uploaded simultaneously appear in the wrong order, try setting
+the `upload_pause` configuration variable in the configuration file to a few seconds (start with 3-5).
+
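The effect of `upload_pause` can be sketched as a simple sleep between consecutive uploads. Illustrative code only, not twoot's actual posting loop; `mastodon` is assumed to be a logged-in Mastodon.py client:

```python
import time

def post_with_pause(mastodon, texts, upload_pause=0):
    """Post a list of prepared status texts, sleeping `upload_pause`
    seconds between uploads so that replies land after their parent."""
    posted = []
    for text in texts:
        posted.append(mastodon.status_post(text))
        if upload_pause:
            time.sleep(upload_pause)
    return posted
```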
 ## Installation
 
 Make sure python3 is installed.
@@ -151,6 +148,9 @@ pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
 In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
 to clone repo with twoot.py script.
 
+If you want to use a config file to specify options (recommended), copy `default.toml` to
+`[your_preferred_name].toml` and edit it to your preferences.
+
 Add command line to crontab. For example, to run every 15 minutes starting at minute 1 of every hour
 and process the tweets posted in the last 5 days but at least 15 minutes
 ago:
@@ -159,6 +159,8 @@ ago:
 1-59/15 * * * * /path/to/twoot.py -t SuperDuper -i masto.space -m sd@example.com -p my_Sup3r-S4f3*pw -a 5 -d 15
 ```
 
+After the first successful run, you no longer need to specify the password and you can remove the `-p` switch.
+
 ## Featured Accounts
 
 Twoot is known to be used for the following feeds (older first):
twoot.py (106 lines changed)

@@ -169,12 +169,17 @@ Dowload page with full thread of tweets and extract all replied to tweet references.
 Only used by `get_timeline()`.
 :param session: Existing HTTP session with Nitter instance
 :param headers: HTTP headers to use
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+:param nitter_url: url of the nitter instance to use
+:param thread_url: url of the first tweet in thread
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
-def _get_rest_of_thread(session, headers, url):
+def _get_rest_of_thread(session, headers, nitter_url, thread_url, first_item):
+    # Add first item to timeline
+    timeline = [(None, first_item)]
+
     logging.debug("Downloading tweets in thread from separate page")
     # Download page with thread
+    url = nitter_url + thread_url
     try:
         thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
     except requests.exceptions.ConnectionError:
@@ -201,14 +206,29 @@ def _get_rest_of_thread(session, headers, url):
 
     # Get all items in thread after main tweet
     after_tweet = soup.find('div', 'after-tweet')
-    timeline = after_tweet.find_all('div', class_='timeline-item')
+    list = after_tweet.find_all('div', class_='timeline-item')
+
+    # Build timeline of tuples
+    previous_tweet_url = thread_url
+    for item in list:
+        timeline.append((previous_tweet_url, item))
+        # Get the url of the tweet
+        tweet_link_tag = item.find('a', class_='tweet-link')
+        if tweet_link_tag is not None:
+            previous_tweet_url = tweet_link_tag.get('href').strip('#m')
+        else:
+            previous_tweet_url = None
+            logging.error('Thread tweet is missing link tag')
+
+    # return timeline in reverse chronological order
+    timeline.reverse()
     return timeline
 
 
 """
-Dowload page with full thread of tweets. Only used by `get_timeline()`.
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+Download timeline of twitter account
+:param url: url of the account page to download
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
 def get_timeline(nitter_url):
     # Define url to use
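For clarity, the list of (replied-to url, tweet) tuples built by `_get_rest_of_thread()` has roughly this shape. The URLs below are made up and the real second elements are BeautifulSoup `timeline-item` tags, shown here as strings:

```python
# Reverse chronological order, as produced by timeline.reverse():
thread_timeline = [
    ("/jeancf/status/1002", "<timeline-item for tweet 1003>"),  # reply to 1002
    ("/jeancf/status/1001", "<timeline-item for tweet 1002>"),  # reply to 1001
    (None, "<timeline-item for tweet 1001>"),                   # first tweet of the thread
]
```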
@@ -268,17 +288,18 @@ def get_timeline(nitter_url):
     for item in list:
         classes = item['class']
         if 'timeline-item' in classes:  # Individual tweet
-            timeline.append(item)
+            timeline.append((None, item))
         elif 'thread-line' in classes:  # First tweet of a thread
             # Get the first item of thread
             first_item = item.find('div', class_='timeline-item')
-            timeline.append(first_item)
 
-            # Get the rest of the items of the thread
+            # Get the url of the tweet
             thread_link_tag = item.find('a', class_='tweet-link')
             if thread_link_tag is not None:
-                thread_url = thread_link_tag.get('href')
-                timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
+                thread_url = thread_link_tag.get('href').strip('#m')
+
+                # Get the rest of the items of the thread
+                timeline.extend(_get_rest_of_thread(session, headers, nitter_url, thread_url, first_item))
         else:
             # Ignore other classes
             continue
@@ -647,29 +668,32 @@ def process_attachments(nitter_url, attachments_container, status_id, author_account):
 
     # Download twitter video
     vid_in_tweet = False
-    vid_class = attachments_container.find('div', class_='video-container')
-    if vid_class is not None:
+    vid_container = attachments_container.find('div', class_='video-container')
+    if vid_container is not None:
         if TOML['options']['upload_videos']:
             logging.debug("downloading video from twitter")
             import youtube_dl
 
-            video_path = f"{author_account}/status/{status_id}"
-            video_file = urljoin('https://twitter.com', video_path)
-            ydl_opts = {
-                'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
-                'format': "best[width<=500]",
-                'socket_timeout': 60,
-                'quiet': True,
-            }
+            video_path = vid_container.source['src']
+            if video_path is not None:
+                video_file = urljoin(nitter_url, video_path)
+                ydl_opts = {
+                    'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
+                    # 'format': "best[width<=500]",
+                    'socket_timeout': 60,
+                    'quiet': True,
+                }
 
                 with youtube_dl.YoutubeDL(ydl_opts) as ydl:
                     try:
                         ydl.download([video_file])
                     except Exception as e:
                         logging.warning('Error downloading twitter video: ' + str(e))
                         vid_in_tweet = True
                     else:
                         logging.debug('downloaded twitter video from attachments')
+            else:
+                vid_in_tweet = True
 
     return pics, vid_in_tweet
 
@@ -923,8 +947,9 @@ def main(argv):
                   mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)''')
     db.execute('''CREATE INDEX IF NOT EXISTS main_index ON toots (twitter_account,
                   mastodon_instance, mastodon_account, tweet_id)''')
+    db.execute('''CREATE INDEX IF NOT EXISTS tweet_id_index ON toots (tweet_id)''')
     db.execute('''CREATE TABLE IF NOT EXISTS profiles (mastodon_instance TEXT, mastodon_account TEXT, avatar_url TEXT, banner_url TEXT)''')
-    db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
+    db.execute('''CREATE INDEX IF NOT EXISTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
 
     # Select random nitter instance to fetch updates from
     nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
@@ -942,7 +967,7 @@ def main(argv):
     tweets = []
     out_date_cnt = 0
     in_db_cnt = 0
-    for status in timeline:
+    for replied_to_tweet, status in timeline:
         # Extract tweet ID and status ID
         tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
         status_id = tweet_id.split('/')[3]
@@ -1105,6 +1130,7 @@ def main(argv):
             "tweet_text": tweet_text,
             "video": video_file,
             "photos": photos,
+            "replied_to_tweet": replied_to_tweet,
         }
         tweets.append(tweet)
 
@@ -1172,13 +1198,25 @@ def main(argv):
                 TypeError):  # Media cannot be uploaded (invalid format, dead link, etc.)
             pass
 
+        # Find in database toot id of replied_to_tweet
+        replied_to_toot = None
+        if tweet['replied_to_tweet'] is not None:
+            logging.debug("Searching db for toot corresponding to replied-to-tweet " + tweet['replied_to_tweet'])
+            db.execute("SELECT toot_id FROM toots WHERE tweet_id=?", [tweet['replied_to_tweet']])
+            replied_to_toot = db.fetchone()
+
+            if replied_to_toot is None:
+                logging.warning('Replied-to tweet %s not found in database', tweet['replied_to_tweet'])
+            else:
+                logging.debug("toot %s found", replied_to_toot)
+
         # Post toot
         toot = {}
         try:
             if len(media_ids) == 0:
-                toot = mastodon.status_post(tweet['tweet_text'])
+                toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot)
             else:
-                toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
+                toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot, media_ids=media_ids)
 
         except MastodonAPIError:
             # Assuming this is an: