Compare commits

17 Commits

Author  SHA1        Message                                Date
jeancf  b690f83ed9  Last rewording                         2023-07-14 20:42:06 +02:00
jeancf  1e81e16788  Update logging info messages           2023-07-14 20:29:25 +02:00
jeancf  cb0fc55c8b  Remove unused function                 2023-07-14 20:21:59 +02:00
jeancf  e53cef4274  Improve README                         2023-07-14 20:18:56 +02:00
jeancf  943dfffeb1  Update README and CHANGELOG            2023-07-14 20:14:16 +02:00
jeancf  1d2ce1fc94  Remove space                           2023-07-14 19:55:48 +02:00
jeancf  fb8d83800e  add upload pause to config             2023-07-14 13:21:12 +02:00
jeancf  d6ed64d6fc  Move nitter instances to config file   2023-07-14 13:12:25 +02:00
jeancf  cdc1fb03f7  Add comments                           2023-07-14 13:11:20 +02:00
jeancf  b10a8392c8  5 seconds pause between toots          2023-07-14 11:12:30 +02:00
jeancf  d7bfab4cd3  Remove https:// from NITTER_URLs      2023-07-13 15:51:54 +02:00
jeancf  a4f3934d86  Fix indentation                        2023-07-13 15:44:37 +02:00
jeancf  29c7457644  Add some log messages                  2023-07-13 13:32:38 +02:00
jeancf  cdbb1bb8f2  Fine tune thread download              2023-07-13 11:53:07 +02:00
jeancf  5939484160  Complete get_timeline()                2023-07-13 11:36:04 +02:00
jeancf  f8bd948b9c  Hit a bump                             2023-07-12 22:19:04 +02:00
jeancf  b842f6d471  Created get_timeline function          2023-07-12 22:02:06 +02:00
4 changed files with 237 additions and 163 deletions

CHANGELOG

@@ -1,5 +1,15 @@
 # Changelog
 
+**12 JUL 2023** VERSION 4.1
+
+**Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle
+this modification yet therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning
+message is added to the log file instead.
+
+**A new dependency to python module `pytz` has been added**. Please run `pip install pytz` in your environment to install it.
+
+* Added option to display timestamp of the original tweet in toot
+* Tweaked list of nitter instances
 
 **28 JUN 2023** VERSION 4.0
 
 * Added option to update avatar and banner pictures on profile if changed on Twitter

README.md

@@ -3,18 +3,18 @@
 Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
 It is simple to set-up on a local machine, configurable and feature-rich.
 
-**12 JUL 2023** VERSION 4.1
-
-**Nitter has recently added a change that highlights tweets that are part of a thread. Twoot
-cannot handle this modification yet therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY
-IGNORED.** A warning message is added to the log file instead.
-An update is being worked on. Stay tuned.
-
-**A new dependency to python module `pytz` has been added**. Please run `pip install pytz`
-in your environment to install it.
-
-* Added option to display timestamp of the original tweet in toot
-* Tweaked list of nitter instances
+**14 JUL 2023** VERSION 4.2
+
+Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
+displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
+
+*When several toots are posted in the same run of twoot, it is possible that they do not appear in
+chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
+your config file to slow down the rate at which toots are uploaded.*
+
+A list of nitter instances to use can now be specified in the config file,
+e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
+If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 > Previous updates can be found in CHANGELOG.

@@ -114,14 +114,14 @@ twitter and uploaded on Mastodon. The check is very fast if there is no update.
 Use `tweet_time_format` option in configuration file to specify the datetime format to display the date
 at which the tweet was published next to the "Original tweet" link. Valid format specifiers are
 the same as those used to format datetimes in python
-(https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
+(<https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>).
 
 e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
 
 An empty or missing `tweet_time_format` disables the display of the timestamp.
 
 By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
 use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
-zone database (https://en.wikipedia.org/wiki/Tz_database)
+zone database (<https://en.wikipedia.org/wiki/Tz_database>)
 
 e.g. `tweet_timezone = "Europe/Paris"`
 
 ### Rate control
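
A quick illustration of how these two options work together — a minimal sketch with example values, not code taken from the repository:

```python
# Minimal sketch: render a UTC tweet timestamp with tweet_time_format
# in the zone named by tweet_timezone. Values below are examples only.
from datetime import datetime, timezone
import pytz

tweet_time_format = "(%d %b %Y %H:%M %Z)"   # strftime format string
tweet_timezone = "Europe/Paris"             # Olson time zone name

# A tweet timestamp, which Nitter reports in UTC
timestamp = datetime(2023, 7, 14, 18, 30, tzinfo=timezone.utc)

# Convert to the configured zone and render with the configured format
local = timestamp.astimezone(pytz.timezone(tweet_timezone))
print(local.strftime(tweet_time_format))    # (14 Jul 2023 20:30 CEST)
```
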
@@ -136,16 +136,16 @@ No limitation is applied to the number of toots uploaded if `-c` is not specified.
 Make sure python3 is installed.
 
-Twoot depends on `beautifulsoup4` and `Mastodon.py` python modules. Additionally, if you are using
-a version of python < 3.11 you also need to install the `tomli` module.
+Twoot depends on `requests`, `beautifulsoup4`, `Mastodon.py` and `pytz` python modules.
+Additionally, if you are using a version of python < 3.11 you also need to install the `tomli` module.
 
-**Only If you plan to download videos** with the `-v` switch, are the additional dependencies required:
+**Only if you plan to download videos** with the `-v` switch are additional dependencies required:
 
 * Python module `youtube-dl2`
 * [ffmpeg](https://ffmpeg.org/download.html) (installed with the package manager of your distribution)
 
 ```sh
-pip install beautifulsoup4 Mastodon.py youtube-dl2
+pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
 ```
 
 In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
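
The `tomli` caveat exists because the standard-library TOML parser only arrived in python 3.11; a typical version guard (illustrative, not taken from twoot.py) looks like:

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib            # TOML parser in the standard library since 3.11
else:
    import tomli as tomllib   # drop-in replacement: pip install tomli
```
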

Config file template (TOML)

@@ -9,88 +9,110 @@ mastodon_instance = ""
 mastodon_user = ""
 
 [options]
 
+# List of nitter instances from which to pick at random to download tweets.
+# Specify only the address without leading `https://` and without trailing `/`
+# By default a built-in list of 2-3 known good instances is used
+#
+#nitter_instances = ["nitter.nl", "nitter.fdn.fr"]
+
 # Download videos from twitter and upload them on Mastodon
 # Default is false
-upload_videos = false
+#
+#upload_videos = true
 
 # Also post the "reply-to" tweets from twitter account
 # Default is false
-post_reply_to = false
+#
+#post_reply_to = true
 
 # Do not post the retweets of other twitter accounts
 # Default is false
-skip_retweets = false
+#
+#skip_retweets = true
 
 # Replace redirected links in tweets with direct URLs
 # Default is false
-remove_link_redirections = false
+#
+#remove_link_redirections = true
 
 # Clean up URLs in tweets to remove trackers
 # Default is false
-remove_trackers_from_urls = false
+#
+#remove_trackers_from_urls = true
 
 # Footer line added at bottom of toots
-# e.g. "#twitter #bot"
 # Default is ""
-footer = ""
+#
+#footer = "#twitter #bot"
 
-# If specified, also diplay a timestamp on the "Original Tweet" line
-# in the given format e.g. "%d %b %Y %H:%M %Z"
+# If specified, also display a timestamp on the "Original Tweet" line
+# in the given format.
 # see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
 # Default is "" (tweet timestamp is not displayed)
-tweet_time_format = ""
+#
+#tweet_time_format = "%d %b %Y %H:%M %Z"
 
 # Specify the timezone that the timestamp on the tweet should be displayed in
-# Use the `tz_identifier`from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
-# example "Europe/Brussels"
+# Use `tz_identifier` from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
 # default is using the local timezone of the machine running the script
-tweet_timezone = ""
+#tweet_timezone = "Europe/Brussels"
 
 # Do not add reference to "Original tweet" on toots
 # default is false
-remove_original_tweet_ref = false
+#remove_original_tweet_ref = true
 
 # Check if profile avatar or banner pictures were changed and update
 # the Mastodon account if necessary
 # Default is false
-update_profile = false
+#update_profile = true
 
 # Maximum age of tweet to post (in days, decimal values accepted)
 # Default is 1
-tweet_max_age = 1
+#
+#tweet_max_age = 0.5
 
 # Minimum age of tweet to post (in minutes)
 # Default is 0 (post tweet as soon as possible)
-tweet_delay = 0
+#
+#tweet_delay = 15
 
+# How many seconds to pause between successive uploads of toots.
+# Increase this value if successive tweets appear in the wrong order.
+# Default is 0 (no pause)
+#
+#upload_pause = 5
+
 # Maximum number of toots to post in each run
 # Default is 0 (which means unlimited)
-toot_cap = 0
+#
+#toot_cap = 2
 
 # Replace twitter.com in links by random alternative out of this list
 # List of nitter instances
-# e.g. subst_twitter = ["nitter.net", ]
 # Default is []
-subst_twitter = []
+#
+#subst_twitter = ["nitter.net", ]
 
 # Replace youtube.com in links by random alternative out of this list
 # List of Invidious or Piped instances
-# e.g. subst_youtube = ["piped.kavin.rocks", "invidious.flokinet.to", ]
 # Default is []
-subst_youtube = []
+#
+#subst_youtube = ["piped.kavin.rocks", "invidious.flokinet.to", ]
 
 # Replace reddit.com in links by random alternative out of this list
 # List of Teddit instances
-# e.g. subst_reddit = ["teddit.net", ]
 # Default is []
-subst_reddit = []
+#
+#subst_reddit = ["teddit.net", ]
 
 # Verbosity of log messages
 # One of DEBUG, INFO, WARNING, ERROR, CRITICAL, OFF
 # Default is "WARNING"
-log_level = "WARNING"
+#
+#log_level = "INFO"
 
 # How many days to keep log messages for
 # Log messages older than log_days will be deleted
 # Default is 3
-log_days = 3
+#
+#log_days = 1
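
Every option in this template is now commented out; twoot falls back to the defaults declared in `build_config()` (see the twoot.py diff below). A sketch of that fall-back behaviour, with a hypothetical parsed `[options]` table — the merge one-liner is illustrative, not twoot's exact code:

```python
# Sketch: commented-out options simply keep their built-in defaults.
DEFAULTS = {
    'upload_videos': False,  # "# Default is false"
    'upload_pause': 0.0,     # "# Default is 0 (no pause)"
    'log_level': 'WARNING',  # '# Default is "WARNING"'
}

# `loaded` stands for the [options] table parsed from the TOML file;
# here it only sets one key, everything else keeps its default.
loaded = {'upload_pause': 5}

options = {**DEFAULTS, **loaded}
print(options['upload_pause'], options['log_level'])  # 5 WARNING
```
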

twoot.py

@@ -34,6 +34,7 @@ from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse, urljoin, unquote
 import requests
 from bs4 import BeautifulSoup, element
 from mastodon import Mastodon, MastodonError, MastodonAPIError, MastodonIllegalArgumentError
+import pytz
 
 # Number of records to keep in db table for each twitter account
 MAX_REC_COUNT = 50

@@ -41,25 +42,6 @@ MAX_REC_COUNT = 50
 # How many seconds to wait before giving up on a download (except video download)
 HTTPS_REQ_TIMEOUT = 10
 
-NITTER_URLS = [
-    'https://nitter.lacontrevoie.fr',
-    # 'https://nitter.cutelab.space',       # 404 on 12/07/2023
-    'https://nitter.weiler.rocks',          # added 15/06/2023
-    'https://nitter.nl',                    # added 16/06/2023
-    # 'https://n.l5.ca',                    # Not working 11/07/2023
-    # 'https://nitter.fly.dev',             # gone 11/07/2023
-    # 'https://notabird.site',              # gone 11/07/2023
-    # 'https://nitter.sethforprivacy.com',  # too slow, removed 16/06/2023
-    # 'https://nitter.it',                  # different pic naming scheme
-    # 'https://twitter.femboy.hu',          # 404 on 06/05/2023
-    # 'https://nitter.grimneko.de',         # 404 on 01/06/2023
-    # 'https://nitter.namazso.eu',          # lots of 403 27/02/2023
-    # 'https://twitter.beparanoid.de',      # moved 27/022023
-    # 'https://nitter.fdn.fr',              # not updated, rate limited, removed 06/02/2023
-    # 'https://nitter.hu',
-    # 'https://nitter.privacydev.net',      # USA, added 06/02/2023, removed 15/02/2023 too slow
-]
 
 # Update from https://www.whatismybrowser.com/guides/the-latest-user-agent/
 USER_AGENTS = [
     'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
@@ -70,22 +52,6 @@ USER_AGENTS = [
     'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Vivaldi/6.1.3035.84',
 ]
 
-"""
-Temporary mitigation for unability to parse threads. Skip tweets that are part of a thread
-"""
-def has_class_timeline_item_but_not_thread(tag):
-    if tag.has_attr('class'):
-        classes = tag['class']
-        if 'timeline-item' in classes and 'thread' not in classes:
-            return True
-        elif 'timeline-item' in classes and 'thread' in classes:
-            logging.warning('Tweet is part of a thread which are a new nitter feature that is not handled yet. Skipping')
-            return False
-        else:
-            return False
-    else:
-        return False
 
 def build_config(args):
     """
@@ -101,6 +67,11 @@ def build_config(args):
     # Default options
     options = {
+        'nitter_instances': [
+            'nitter.lacontrevoie.fr',
+            'nitter.weiler.rocks',  # added 15/06/2023
+            'nitter.nl',            # added 16/06/2023
+        ],
         'upload_videos': False,
         'post_reply_to': False,
         'skip_retweets': False,

@@ -112,6 +83,7 @@ def build_config(args):
         'remove_original_tweet_ref': False,
         'tweet_max_age': float(1),
         'tweet_delay': float(0),
+        'upload_pause': float(0),
         'toot_cap': int(0),
         'subst_twitter': [],
         'subst_youtube': [],
@@ -192,6 +164,127 @@ def build_config(args):
         exit(-1)
 
+"""
+Download the page containing the full thread of tweets and extract the items
+that follow the main tweet. Only used by `get_timeline()`.
+:param session: Existing HTTP session with Nitter instance
+:param headers: HTTP headers to use
+:param url: url of the thread page to download
+:return: List of tweets from the thread
+"""
+def _get_rest_of_thread(session, headers, url):
+    logging.debug("Downloading tweets in thread from separate page")
+    # Download page with thread
+    try:
+        thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
+    except requests.exceptions.ConnectionError:
+        logging.fatal('Host did not respond when trying to download ' + url)
+        shutdown(-1)
+    except requests.exceptions.Timeout:
+        logging.fatal(url + ' took too long to respond')
+        shutdown(-1)
+
+    # Verify that download worked
+    if thread_page.status_code != 200:
+        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(thread_page.status_code) + '). Aborting')
+        shutdown(-1)
+
+    logging.debug('Nitter page downloaded successfully from ' + url)
+
+    # DEBUG: Save page to file
+    # of = open('thread_page_debug.html', 'w')
+    # of.write(thread_page.text)
+    # of.close()
+
+    # Make soup
+    soup = BeautifulSoup(thread_page.text, 'html.parser')
+
+    # Get all items in thread after main tweet
+    after_tweet = soup.find('div', 'after-tweet')
+    timeline = after_tweet.find_all('div', class_='timeline-item')
+
+    return timeline
+
+
+"""
+Download the timeline of a twitter user from a Nitter instance, including
+the tweets that are part of threads.
+:param nitter_url: url of the Nitter instance to use
+:return: soup of the page and list of tweets from the timeline
+"""
+def get_timeline(nitter_url):
+    # Define url to use
+    url = nitter_url + '/' + TOML['config']['twitter_account']
+
+    # Use different page if we need to handle replies
+    if TOML['options']['post_reply_to']:
+        url += '/with_replies'
+
+    # Initiate session
+    session = requests.Session()
+
+    # Get a copy of the default headers that requests would use
+    headers = requests.utils.default_headers()
+
+    # Update default headers with randomly selected user agent
+    headers.update(
+        {
+            'User-Agent': USER_AGENTS[random.randint(0, len(USER_AGENTS) - 1)],
+            'Cookie': 'replaceTwitter=; replaceYouTube=; hlsPlayback=on; proxyVideos=',
+        }
+    )
+
+    # Download twitter page of user
+    try:
+        twit_account_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
+    except requests.exceptions.ConnectionError:
+        logging.fatal('Host did not respond when trying to download ' + url)
+        shutdown(-1)
+    except requests.exceptions.Timeout:
+        logging.fatal(url + ' took too long to respond')
+        shutdown(-1)
+
+    # Verify that download worked
+    if twit_account_page.status_code != 200:
+        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(
+            twit_account_page.status_code) + '). Aborting')
+        shutdown(-1)
+
+    logging.debug('Nitter page downloaded successfully from ' + url)
+
+    # DEBUG: Save page to file
+    # of = open('user_page_debug.html', 'w')
+    # of.write(twit_account_page.text)
+    # of.close()
+
+    # Make soup
+    soup = BeautifulSoup(twit_account_page.text, 'html.parser')
+
+    # Get the div containing tweets
+    tl = soup.find('div', class_='timeline')
+
+    # Get the list of direct children of timeline
+    list = tl.find_all('div', recursive=False)
+
+    timeline = []
+    for item in list:
+        classes = item['class']
+        if 'timeline-item' in classes:  # Individual tweet
+            timeline.append(item)
+        elif 'thread-line' in classes:  # First tweet of a thread
+            # Get the first item of thread
+            first_item = item.find('div', class_='timeline-item')
+            timeline.append(first_item)
+
+            # Get the rest of the items of the thread
+            thread_link_tag = item.find('a', class_='tweet-link')
+            if thread_link_tag is not None:
+                thread_url = thread_link_tag.get('href')
+                timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
+        else:
+            # Ignore other classes
+            continue
+
+    return soup, timeline
+
 
 def update_profile(nitter_url, soup, sql, mast_password):
     """
     Update profile on Mastodon
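
The branching on `timeline-item` vs `thread-line` in `get_timeline()` above can be demonstrated on a toy snippet — fabricated HTML, far simpler than real Nitter markup:

```python
# Toy illustration of the branching in get_timeline(); the HTML is fabricated.
from bs4 import BeautifulSoup

html = """
<div class="timeline">
  <div class="timeline-item"><a class="tweet-link" href="/u/status/1#m">t</a></div>
  <div class="thread-line">
    <div class="timeline-item"><a class="tweet-link" href="/u/status/2#m">t</a></div>
  </div>
</div>
"""

tl = BeautifulSoup(html, 'html.parser').find('div', class_='timeline')
for item in tl.find_all('div', recursive=False):
    if 'timeline-item' in item['class']:
        print('single tweet:', item.find('a', class_='tweet-link')['href'])
    elif 'thread-line' in item['class']:
        first = item.find('div', class_='timeline-item')
        print('thread starts at:', first.find('a', class_='tweet-link')['href'])
        # the real code then fetches the rest of the thread via _get_rest_of_thread()
```
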
@@ -557,6 +650,7 @@ def process_attachments(nitter_url, attachments_container, status_id, author_account):
     vid_class = attachments_container.find('div', class_='video-container')
     if vid_class is not None:
         if TOML['options']['upload_videos']:
+            logging.debug("downloading video from twitter")
             import youtube_dl
 
             video_path = f"{author_account}/status/{status_id}"
@@ -783,6 +877,7 @@ def main(argv):
         log_level = logging.INFO
     elif ll_str == "WARNING":
         log_level = logging.WARNING
+        print('log level warning set')
     elif ll_str == "ERROR":
         log_level = logging.ERROR
     elif ll_str == "CRITICAL":
@@ -806,19 +901,20 @@ def main(argv):
     logging.info('    post_reply_to : ' + str(TOML['options']['post_reply_to']))
     logging.info('    skip_retweets : ' + str(TOML['options']['skip_retweets']))
     logging.info('    remove_link_redirections : ' + str(TOML['options']['remove_link_redirections']))
-    logging.info('    remove_trackers_from_urls: ' + str(TOML['options']['remove_trackers_from_urls']))
+    logging.info('    remove_trackers_from_urls : ' + str(TOML['options']['remove_trackers_from_urls']))
     logging.info('    footer : ' + TOML['options']['footer'])
     logging.info('    tweet_time_format : ' + TOML['options']['tweet_time_format'])
     logging.info('    tweet_timezone : ' + TOML['options']['tweet_timezone'])
-    logging.info('    remove_original_tweet_ref: ' + str(TOML['options']['remove_original_tweet_ref']))
+    logging.info('    remove_original_tweet_ref : ' + str(TOML['options']['remove_original_tweet_ref']))
     logging.info('    update_profile : ' + str(TOML['options']['update_profile']))
     logging.info('    tweet_max_age : ' + str(TOML['options']['tweet_max_age']))
     logging.info('    tweet_delay : ' + str(TOML['options']['tweet_delay']))
+    logging.info('    upload_pause : ' + str(TOML['options']['upload_pause']))
     logging.info('    toot_cap : ' + str(TOML['options']['toot_cap']))
     logging.info('    subst_twitter : ' + str(TOML['options']['subst_twitter']))
     logging.info('    subst_youtube : ' + str(TOML['options']['subst_youtube']))
     logging.info('    subst_reddit : ' + str(TOML['options']['subst_reddit']))
-    logging.info('    log_level : ' + str(TOML['options']['log_level']))
+    logging.info('    log_level : ' + TOML['options']['log_level'])
     logging.info('    log_days : ' + str(TOML['options']['log_days']))
 
     # Try to open database. If it does not exist, create it
@@ -832,80 +928,25 @@ def main(argv):
     db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
 
     # Select random nitter instance to fetch updates from
-    nitter_url = NITTER_URLS[random.randint(0, len(NITTER_URLS) - 1)]
+    nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
 
-    # **********************************************************
-    # Load twitter page of user. Process all tweets and generate
-    # list of dictionaries ready to be posted on Mastodon
-    # **********************************************************
-    # To store content of all tweets from this user
-    tweets = []
-
-    # Initiate session
-    session = requests.Session()
-
-    # Get a copy of the default headers that requests would use
-    headers = requests.utils.default_headers()
-
-    # Update default headers with randomly selected user agent
-    headers.update(
-        {
-            'User-Agent': USER_AGENTS[random.randint(0, len(USER_AGENTS) - 1)],
-            'Cookie': 'replaceTwitter=; replaceYouTube=; hlsPlayback=on; proxyVideos=',
-        }
-    )
-
-    url = nitter_url + '/' + TOML['config']['twitter_account']
-
-    # Use different page if we need to handle replies
-    if TOML['options']['post_reply_to']:
-        url += '/with_replies'
-
-    # Download twitter page of user
-    try:
-        twit_account_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
-    except requests.exceptions.ConnectionError:
-        logging.fatal('Host did not respond when trying to download ' + url)
-        shutdown(-1)
-    except requests.exceptions.Timeout:
-        logging.fatal(nitter_url + ' took too long to respond')
-        shutdown(-1)
-
-    # Verify that download worked
-    if twit_account_page.status_code != 200:
-        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(
-            twit_account_page.status_code) + '). Aborting')
-        shutdown(-1)
-
-    logging.debug('Nitter page downloaded successfully from ' + url)
-
-    # DEBUG: Save page to file
-    # of = open(TOML['config']['twitter_account'] + '.html', 'w')
-    # of.write(twit_account_page.text)
-    # of.close()
-
-    # Make soup
-    soup = BeautifulSoup(twit_account_page.text, 'html.parser')
-
-    # Extract twitter timeline
-    timeline = soup.find_all(has_class_timeline_item_but_not_thread)
+    # Load twitter page of user
+    soup, timeline = get_timeline(nitter_url)
 
     logging.info('Processing ' + str(len(timeline)) + ' tweets found in timeline')
 
     # **********************************************************
-    # Process each tweets and generate dictionary
+    # Process each tweet and generate an array of dictionaries
     # with data ready to be posted on Mastodon
     # **********************************************************
+    tweets = []
     out_date_cnt = 0
     in_db_cnt = 0
     for status in timeline:
         # Extract tweet ID and status ID
-        try:
-            tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
-            status_id = tweet_id.split('/')[3]
-        except Exception as e:
-            logging.critical('Malformed timeline downloaded from nitter instance')
-            logging.debug(e)
-            shutdown(-1)
+        tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
+        status_id = tweet_id.split('/')[3]
 
         logging.debug('processing tweet %s', tweet_id)
@@ -1009,7 +1050,6 @@ def main(argv):
         if TOML['options']['tweet_time_format'] != "":
             timestamp_display = timestamp
 
             # Adjust timezone
-            import pytz
             if TOML['options']['tweet_timezone'] != "":
                 timezone_display = pytz.timezone(TOML['options']['tweet_timezone'])
             else:  # Use local timezone by default
@@ -1144,9 +1184,9 @@ def main(argv):
         except MastodonAPIError:
             # Assuming this is an:
             # ERROR ('Mastodon API returned error', 422, 'Unprocessable Entity', 'Cannot attach files that have not finished processing. Try again in a moment!')
-            logging.warning('Mastodon API Error 422: Cannot attach files that have not finished processing. Waiting 60 seconds and retrying.')
-            # Wait 60 seconds
-            time.sleep(60)
+            logging.warning('Mastodon API Error 422: Cannot attach files that have not finished processing. Waiting 30 seconds and retrying.')
+            # Wait 30 seconds
+            time.sleep(30)
             # retry posting
             try:
                 toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
@@ -1163,6 +1203,8 @@ def main(argv):
         else:
             posted_cnt += 1
             logging.debug('Tweet %s posted on %s', tweet['tweet_id'], TOML['config']['mastodon_user'])
+            # Test to find out if slowing down successive posting helps with ordering of threads
+            time.sleep(TOML['options']['upload_pause'])
 
         # Insert toot id into database
         if 'id' in toot: