Mirror of https://gitlab.com/jeancf/twoot.git, synced 2024-11-23 20:11:11 +00:00

Merge branch 'replies'

Commit e512838a0e

CHANGELOG.md (13 lines changed)
@@ -1,5 +1,18 @@
 # Changelog
 
+**14 JUL 2023** VERSION 4.2
+
+Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
+displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
+
+*When several toots are posted in the same run of twoot it is possible that these toots do not appear in
+chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
+your config file to slow down the rate at which toots are uploaded.*
+
+A list of nitter instances to use can now be specified in the config file,
+e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
+If none is specified, the built-in list of 2-3 known good instances is used as before.
+
 **12 JUL 2023** VERSION 4.1
 
 **Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle this modification yet, therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning message is added to the log file instead.
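As a rough illustration of how a configured instance list can be consumed (a sketch only: the option name `nitter_instances` comes from the changelog above, while the fallback list and helper function are made up):

```python
import random

# Hypothetical built-in fallback, standing in for twoot's own list of known good instances
DEFAULT_NITTER_INSTANCES = ["nitter.net", "nitter.privacydev.net"]

def pick_nitter_instance(config_instances=None):
    """Return the base url of one nitter instance, chosen at random."""
    instances = config_instances or DEFAULT_NITTER_INSTANCES
    return "https://" + random.choice(instances)

# With a config file entry like nitter_instances = ["nitter.nl", "nitter.fdn.fr"]
print(pick_nitter_instance(["nitter.nl", "nitter.fdn.fr"]))
# With no entry, the built-in list is used
print(pick_nitter_instance())
```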
README.md (46 lines changed)
@@ -3,18 +3,11 @@
 Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
 It is simple to set up on a local machine, configurable and feature-rich.
 
-**14 JUL 2023** VERSION 4.2
-
-Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
-displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
-
-*When several toots are posted in the same run of twoot it is possible that these toots do not appear in
-chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
-your config file to slow down the rate at which toots are uploaded.*
-
-A list of nitter instances to use can now be specified in the config file,
-e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
-If none is specified, the built-in list of 2-3 known good instances is used as before.
+**17 JUL 2023** VERSION 4.3
+
+* Twitter threads are replicated on Mastodon: each follow-up message in a thread is posted
+  as a reply to its predecessor.
+* An issue with downloading videos has been fixed ("ERROR: Sorry, you are not authorized to see this status").
 
 > Previous updates can be found in CHANGELOG.
@@ -22,6 +15,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 * Fetch timeline of given user from twitter.com (through nitter instance)
 * Scrape html and format tweets for post on mastodon
+* Threads (series of replies to own messages) are replicated
 * Emojis supported
 * Upload images from tweet to Mastodon
 * Optionally upload videos from tweet to Mastodon
@@ -41,7 +35,7 @@ If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 ## Usage
 
-```sh
+```
 usage: twoot.py [-h] [-f <.toml config file>] [-t <twitter account>] [-i <mastodon instance>]
                 [-m <mastodon account>] [-p <mastodon password>] [-r] [-s] [-l] [-u] [-v] [-o] [-q]
                 [-a <max age (in days)>] [-d <min delay (in mins)>] [-c <max # of toots to post>]
@@ -85,18 +79,19 @@ to use, all the other command-line parameters are ignored, except `-p` (password)
 
 ### Removing redirected links
 
-`-l` (or `remove_link_redirections = true` in toml file) will follow every link included in the
-tweet and replace them with the url that the resource is directly downloaded from (if applicable).
-e.g. bit.ly/xxyyyzz -> example.com
+`remove_link_redirections = true` in toml file (or `-l` on the command line) will follow every link
+included in the tweet and replace it with the url that the resource is directly downloaded from
+(if applicable), e.g. bit.ly/xxyyyzz -> example.com
 
 Every link visit can take up to 5 sec (timeout) depending on the responsiveness of the source,
 therefore this option will slow down tweet processing.
 
-If you are interested by tracker removal (`-u`, `remove_trackers_from_urls = true`) you should
+If you are interested in tracker removal (`remove_trackers_from_urls = true`, `-u`) you should
 also select redirection removal as trackers are often hidden behind the redirection of a short URL.
 
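For illustration, resolving a shortened link down to its final location can be done with a single `requests` call. This is a sketch only, not twoot's actual implementation, and the example URL is a placeholder:

```python
import requests

def resolve_redirections(url, timeout=5):
    """Follow HTTP redirections and return the final url (or the original on failure)."""
    try:
        # HEAD keeps the transfer small; allow_redirects follows the whole chain
        reply = requests.head(url, allow_redirects=True, timeout=timeout)
        return reply.url
    except requests.RequestException:
        return url

print(resolve_redirections("https://bit.ly/xxyyyzz"))  # placeholder short link
```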
 ### Uploading videos
 
-When using the `-v` (`upload_videos = true`) switch consider:
+When using the `upload_videos = true` (`-v`) switch consider:
 
 * whether the copyright of the content that you want to cross-post allows it
 * the storage / transfer limitations of the Mastodon instance that you are posting to
@@ -104,7 +99,7 @@ When using the `-v` (`upload_videos = true`) switch consider:
 
 ### Updating profile
 
-If `-q` (`update_profile = true`) is specified, twoot will check if the avatar and banner pictures
+If `update_profile = true` (`-q`) is specified, twoot will check if the avatar and banner pictures
 have changed on the twitter page. This check compares the names of the files used by twitter with the
 names of the files that have been uploaded on Mastodon and, if they differ, both files are downloaded
 from twitter and uploaded on Mastodon. The check is very fast if there is no update.
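A minimal sketch of that comparison, assuming the current picture names have already been scraped from the twitter page and the previously uploaded names read back from twoot's `profiles` table (all names and values below are made up):

```python
def profile_needs_update(current, stored):
    """Compare (avatar, banner) name pairs; any difference triggers a re-upload."""
    return current != stored

# Hypothetical values: what the twitter page serves now vs. what was uploaded last time
current_pics = ("avatar_2023_07.jpg", "banner_2023_07.jpg")
stored_pics = ("avatar_2023_01.jpg", "banner_2023_07.jpg")

if profile_needs_update(current_pics, stored_pics):
    print("avatar or banner changed, re-uploading both")  # download from twitter, upload to Mastodon
else:
    print("profile pictures unchanged, nothing to do")
```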
@@ -119,10 +114,9 @@ e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
 
 An empty or missing `tweet_time_format` disables the display of the timestamp.
 
-By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
-use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
-zone database (<https://en.wikipedia.org/wiki/Tz_database>)
-e.g. `tweet_timezone = "Europe/Paris"`
+By default, dates are specified in the local timezone of the machine running the script. To display the timestamp in another time zone, use the `tweet_timezone` option in the configuration file. Valid time zone names are those of the Olson time
+zone database (<https://en.wikipedia.org/wiki/Tz_database>).
+e.g. `tweet_timezone = "Europe/Paris"` or `tweet_timezone = "UTC"`
 
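A short sketch of what that conversion amounts to, using only the standard library (the format string mirrors the `tweet_time_format` example above; `zoneinfo` is used here for illustration, while twoot itself lists `pytz` among its dependencies):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tweet_time_format = "(%d %b %Y %H:%M %Z)"
tweet_timezone = "Europe/Paris"          # any Olson time zone name works

# A tweet timestamp, taken here as an aware UTC datetime for the example
timestamp = datetime(2023, 7, 14, 12, 30, tzinfo=timezone.utc)

local = timestamp.astimezone(ZoneInfo(tweet_timezone))
print(local.strftime(tweet_time_format))   # (14 Jul 2023 14:30 CEST)
```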
 ### Rate control
 
@@ -132,6 +126,9 @@ Default min delay is 0 minutes.
 
 No limitation is applied to the number of toots uploaded if `-c` is not specified.
 
+If messages in a thread that are uploaded simultaneously appear in the wrong order, try setting
+the `upload_pause` configuration variable in the configuration file to a few seconds (start with 3-5).
+
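The effect of `upload_pause` is simply a delay between consecutive uploads, as in this sketch (illustrative only; the toot list and the posting step are placeholders, not twoot's code):

```python
import time

upload_pause = 4          # seconds, as read from the config file
toots_to_post = ["first toot of the thread", "reply 1", "reply 2"]  # placeholder content

for text in toots_to_post:
    print("posting:", text)        # stands in for the actual Mastodon API call
    time.sleep(upload_pause)       # give the instance time to index the previous toot
```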
 ## Installation
 
 Make sure python3 is installed.
@@ -151,6 +148,9 @@ pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
 In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
 to clone the repo with the twoot.py script.
 
 If you want to use a config file to specify options (recommended), copy `default.toml` to
 `[your_preferred_name].toml` and edit it to your preferences.
 
 Add the command line to crontab. For example, to run every 15 minutes starting at minute 1 of every hour
 and process the tweets posted in the last 5 days but at least 15 minutes
 ago:
@@ -159,6 +159,8 @@ ago:
 1-59/15 * * * * /path/to/twoot.py -t SuperDuper -i masto.space -m sd@example.com -p my_Sup3r-S4f3*pw -a 5 -d 15
 ```
 
+After the first successful run, you no longer need to specify the password and you can remove the `-p` switch.
+
 ## Featured Accounts
 
 Twoot is known to be used for the following feeds (older first):
twoot.py (106 lines changed)
@@ -169,12 +169,17 @@ Download page with full thread of tweets and extract all replied-to tweet references.
 Only used by `get_timeline()`.
 :param session: Existing HTTP session with Nitter instance
 :param headers: HTTP headers to use
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+:param nitter_url: url of the nitter instance to use
+:param thread_url: url of the first tweet in thread
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
-def _get_rest_of_thread(session, headers, url):
+def _get_rest_of_thread(session, headers, nitter_url, thread_url, first_item):
+    # Add first item to timeline
+    timeline = [(None, first_item)]
+
     logging.debug("Downloading tweets in thread from separate page")
     # Download page with thread
+    url = nitter_url + thread_url
     try:
         thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
     except requests.exceptions.ConnectionError:
@@ -201,14 +206,29 @@ def _get_rest_of_thread(session, headers, url):
 
     # Get all items in thread after main tweet
     after_tweet = soup.find('div', 'after-tweet')
 
-    timeline = after_tweet.find_all('div', class_='timeline-item')
+    list = after_tweet.find_all('div', class_='timeline-item')
+
+    # Build timeline of tuples
+    previous_tweet_url = thread_url
+    for item in list:
+        timeline.append((previous_tweet_url, item))
+        # Get the url of the tweet
+        tweet_link_tag = item.find('a', class_='tweet-link')
+        if tweet_link_tag is not None:
+            previous_tweet_url = tweet_link_tag.get('href').strip('#m')
+        else:
+            previous_tweet_url = None
+            logging.error('Thread tweet is missing link tag')
+
+    # return timeline in reverse chronological order
+    timeline.reverse()
     return timeline
 
 
 """
-Download page with full thread of tweets. Only used by `get_timeline()`.
-:param url: url of the thread page to download
-:return: List of tweets from the thread
+Download timeline of twitter account
+:param url: url of the account page to download
+:return: list of tuples with url of tweet replied-to (or None) and content of tweet
 """
 def get_timeline(nitter_url):
     # Define url to use
@@ -268,17 +288,18 @@ def get_timeline(nitter_url):
     for item in list:
         classes = item['class']
         if 'timeline-item' in classes:  # Individual tweet
-            timeline.append(item)
+            timeline.append((None, item))
         elif 'thread-line' in classes:  # First tweet of a thread
             # Get the first item of thread
             first_item = item.find('div', class_='timeline-item')
-            timeline.append(first_item)
 
-            # Get the rest of the items of the thread
+            # Get the url of the tweet
             thread_link_tag = item.find('a', class_='tweet-link')
             if thread_link_tag is not None:
-                thread_url = thread_link_tag.get('href')
-                timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
+                thread_url = thread_link_tag.get('href').strip('#m')
+
+                # Get the rest of the items of the thread
+                timeline.extend(_get_rest_of_thread(session, headers, nitter_url, thread_url, first_item))
         else:
             # Ignore other classes
             continue
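To make the data structure concrete: the timeline returned by these functions is a list of `(replied_to_url, tweet_item)` tuples. Below is a small self-contained sketch of the same pairing logic run on a toy HTML fragment shaped like a nitter thread page (the markup and urls are invented for the example):

```python
from bs4 import BeautifulSoup

html = """
<div class="after-tweet">
  <div class="timeline-item"><a class="tweet-link" href="/user/status/2#m"></a>second tweet</div>
  <div class="timeline-item"><a class="tweet-link" href="/user/status/3#m"></a>third tweet</div>
</div>
"""

thread_url = '/user/status/1'                 # url of the first tweet of the thread
timeline = [(None, 'first tweet')]            # the first tweet replies to nothing

soup = BeautifulSoup(html, 'html.parser')
previous_tweet_url = thread_url
for item in soup.find('div', 'after-tweet').find_all('div', class_='timeline-item'):
    # pair each follow-up tweet with the url of the tweet it replies to
    timeline.append((previous_tweet_url, item.get_text()))
    link = item.find('a', class_='tweet-link')
    previous_tweet_url = link.get('href').strip('#m') if link else None

timeline.reverse()                            # reverse chronological, like _get_rest_of_thread()
print(timeline)
# [('/user/status/2', 'third tweet'), ('/user/status/1', 'second tweet'), (None, 'first tweet')]
```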
@@ -647,29 +668,32 @@ def process_attachments(nitter_url, attachments_container, status_id, author_account):
 
     # Download twitter video
     vid_in_tweet = False
-    vid_class = attachments_container.find('div', class_='video-container')
-    if vid_class is not None:
+    vid_container = attachments_container.find('div', class_='video-container')
+    if vid_container is not None:
         if TOML['options']['upload_videos']:
             logging.debug("downloading video from twitter")
             import youtube_dl
 
-            video_path = f"{author_account}/status/{status_id}"
-            video_file = urljoin('https://twitter.com', video_path)
-            ydl_opts = {
-                'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
-                'format': "best[width<=500]",
-                'socket_timeout': 60,
-                'quiet': True,
-            }
+            video_path = vid_container.source['src']
+            if video_path is not None:
+                video_file = urljoin(nitter_url, video_path)
+                ydl_opts = {
+                    'outtmpl': "output/" + TOML['config']['twitter_account'] + "/" + status_id + "/%(id)s.%(ext)s",
+                    # 'format': "best[width<=500]",
+                    'socket_timeout': 60,
+                    'quiet': True,
+                }
 
-            with youtube_dl.YoutubeDL(ydl_opts) as ydl:
-                try:
-                    ydl.download([video_file])
-                except Exception as e:
-                    logging.warning('Error downloading twitter video: ' + str(e))
-                    vid_in_tweet = True
-                else:
-                    logging.debug('downloaded twitter video from attachments')
+                with youtube_dl.YoutubeDL(ydl_opts) as ydl:
+                    try:
+                        ydl.download([video_file])
+                    except Exception as e:
+                        logging.warning('Error downloading twitter video: ' + str(e))
+                        vid_in_tweet = True
+                    else:
+                        logging.debug('downloaded twitter video from attachments')
+            else:
+                vid_in_tweet = True
 
     return pics, vid_in_tweet
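The key change above is where the video url comes from: instead of building a twitter.com status url, the `src` of the page's `<source>` tag is joined onto the nitter base url. A tiny sketch of that join (both values are made up):

```python
from urllib.parse import urljoin

nitter_url = 'https://nitter.net'                 # hypothetical instance
video_path = '/video/1681234567890/video.mp4'     # hypothetical vid_container.source['src'] value

video_file = urljoin(nitter_url, video_path)
print(video_file)   # https://nitter.net/video/1681234567890/video.mp4
```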
@@ -923,8 +947,9 @@ def main(argv):
         mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)''')
     db.execute('''CREATE INDEX IF NOT EXISTS main_index ON toots (twitter_account,
         mastodon_instance, mastodon_account, tweet_id)''')
+    db.execute('''CREATE INDEX IF NOT EXISTS tweet_id_index ON toots (tweet_id)''')
     db.execute('''CREATE TABLE IF NOT EXISTS profiles (mastodon_instance TEXT, mastodon_account TEXT, avatar_url TEXT, banner_url TEXT)''')
-    db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
+    db.execute('''CREATE INDEX IF NOT EXISTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
 
     # Select random nitter instance to fetch updates from
     nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
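The new `tweet_id_index` exists to make the reply lookup further down (a SELECT on `tweet_id` alone) cheap. A self-contained illustration with an in-memory database (schema copied from above, sample row and toot id invented; the account names reuse the README's cron example):

```python
import sqlite3

sql = sqlite3.connect(':memory:')
db = sql.cursor()
db.execute('''CREATE TABLE toots (twitter_account TEXT, mastodon_instance TEXT,
              mastodon_account TEXT, tweet_id TEXT, toot_id TEXT)''')
db.execute('''CREATE INDEX tweet_id_index ON toots (tweet_id)''')

# Pretend an earlier run already posted the first tweet of a thread
db.execute("INSERT INTO toots VALUES (?, ?, ?, ?, ?)",
           ('SuperDuper', 'masto.space', 'sd@example.com', '/SuperDuper/status/1', '110000000000000001'))
sql.commit()

# The lookup performed for a replied-to tweet, served by the index
db.execute("SELECT toot_id FROM toots WHERE tweet_id=?", ['/SuperDuper/status/1'])
print(db.fetchone())   # ('110000000000000001',)
```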
@@ -942,7 +967,7 @@ def main(argv):
     tweets = []
     out_date_cnt = 0
     in_db_cnt = 0
-    for status in timeline:
+    for replied_to_tweet, status in timeline:
         # Extract tweet ID and status ID
         tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
         status_id = tweet_id.split('/')[3]
@@ -1105,6 +1130,7 @@ def main(argv):
             "tweet_text": tweet_text,
             "video": video_file,
             "photos": photos,
+            "replied_to_tweet": replied_to_tweet,
         }
         tweets.append(tweet)
@@ -1172,13 +1198,25 @@ def main(argv):
                     TypeError):  # Media cannot be uploaded (invalid format, dead link, etc.)
                 pass
 
+        # Find in database toot id of replied_to_tweet
+        replied_to_toot = None
+        if tweet['replied_to_tweet'] is not None:
+            logging.debug("Searching db for toot corresponding to replied-to-tweet " + tweet['replied_to_tweet'])
+            db.execute("SELECT toot_id FROM toots WHERE tweet_id=?", [tweet['replied_to_tweet']])
+            replied_to_toot = db.fetchone()
+
+            if replied_to_toot is None:
+                logging.warning('Replied-to tweet %s not found in database', tweet['replied_to_tweet'])
+            else:
+                logging.debug("toot %s found", replied_to_toot)
+
         # Post toot
         toot = {}
         try:
             if len(media_ids) == 0:
-                toot = mastodon.status_post(tweet['tweet_text'])
+                toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot)
             else:
-                toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
+                toot = mastodon.status_post(tweet['tweet_text'], replied_to_toot, media_ids=media_ids)
 
         except MastodonAPIError:
             # Assuming this is an:
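For reference, the reply mechanism relied on here is the `in_reply_to_id` parameter of Mastodon.py's `status_post()`. A minimal standalone sketch (instance url and access token are placeholders):

```python
from mastodon import Mastodon

# Placeholders: use your own instance and access token
mastodon = Mastodon(api_base_url='https://masto.space', access_token='YOUR_ACCESS_TOKEN')

first = mastodon.status_post("First tweet of the thread")
# Each follow-up toot replies to the one posted just before it
reply = mastodon.status_post("Second tweet of the thread", in_reply_to_id=first['id'])
```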