Mirror of https://gitlab.com/jeancf/twoot.git, synced 2025-02-25 01:18:41 +00:00

Compare commits: 17 commits, 15663af09d ... b690f83ed9
Commits in this range: b690f83ed9, 1e81e16788, cb0fc55c8b, e53cef4274, 943dfffeb1, 1d2ce1fc94, fb8d83800e, d6ed64d6fc, cdc1fb03f7, b10a8392c8, d7bfab4cd3, a4f3934d86, 29c7457644, cdbb1bb8f2, 5939484160, f8bd948b9c, b842f6d471
CHANGELOG.md (10 lines changed)
```diff
@@ -1,5 +1,15 @@
 # Changelog
 
+**12 JUL 2023** VERSION 4.1
+
+**Nitter has recently added a change that highlights tweets that are part of a thread. Twoot cannot handle this modification yet therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY IGNORED.** A warning message is added to the log file instead.
+
+**A new dependency to python module `pytz` has been added**. Please run `pip install pytz`
+in your environment to install it.
+
+* Added option to display timestamp of the original tweet in toot
+* Tweaked list of nitter instances
+
 **28 JUN 2023** VERSION 4.0
 
 * Added option to update avatar and banner pictures on profile if changed on Twitter
```
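As a quick way to confirm the new `pytz` dependency is present before running twoot, a minimal check (this snippet is illustrative only, not part of twoot):

```python
# Sketch: verify that pytz is importable, as required since version 4.1
try:
    import pytz
except ImportError:
    raise SystemExit('pytz is missing - run: pip install pytz')
print('pytz', pytz.__version__, 'is installed')
```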
README.md (30 lines changed)
```diff
@@ -3,18 +3,18 @@
 Twoot is a python script that mirrors tweets from a twitter account to a Mastodon account.
 It is simple to set-up on a local machine, configurable and feature-rich.
 
-**12 JUL 2023** VERSION 4.1
+**14 JUL 2023** VERSION 4.2
 
-**Nitter has recently added a change that highlights tweets that are part of a thread. Twoot
-cannot handle this modification yet therefore TWEETS THAT ARE PART OF A THREAD ARE CURRENTLY
-IGNORED.** A warning message is added to the log file instead.
-An update is being worked on. Stay tuned.
+Twoot can now handle threads. All tweets can again be uploaded on Mastodon. Tweets in a thread are
+displayed in reverse chronological order in the main timeline (first tweet on top) to improve readability.
 
-**A new dependency to python module `pytz` has been added**. Please run `pip install pytz`
-in your environment to install it.
+*When several toots are posted in the same run of twoot, these toots may not appear in
+chronological order on the timeline. If that is the case, try setting `upload_pause` to 3-5 seconds in
+your config file to slow down the rate at which toots are uploaded.*
 
-* Added option to display timestamp of the original tweet in toot
-* Tweaked list of nitter instances
+A list of nitter instances to use can now be specified in the config file,
+e.g. `nitter_instances = ["nitter.nl", "nitter.fdn.fr"]`.
+If none is specified, the built-in list of 2-3 known good instances is used as before.
 
 > Previous updates can be found in CHANGELOG.
```
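The `upload_pause` advice above boils down to sleeping between successive posts so the server indexes them in order. A minimal sketch of that pacing loop, with a stand-in for the real posting call (the actual upload in twoot goes through `mastodon.status_post`):

```python
import time

def post_to_mastodon(text):
    # Stand-in for the real Mastodon API call used by twoot
    print('posted:', text)

upload_pause = 4  # seconds; 3-5 s is the range suggested above
for text in ['tweet 1 of thread', 'tweet 2 of thread', 'tweet 3 of thread']:
    post_to_mastodon(text)
    time.sleep(upload_pause)  # give the server time to index each toot in order
```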
```diff
@@ -114,14 +114,14 @@ twitter and uploaded on Mastodon. The check is very fast if there is no update.
 Use `tweet_time_format` option in configuration file to specify the datetime format to display the date
 at which the tweet was published next to the "Original tweet" link. Valid format specifiers are
 the same as those used to format datetimes in python
-(https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior).
+(<https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior>).
 e.g. `tweet_time_format = "(%d %b %Y %H:%M %Z)"`
 
 An empty or missing `tweet_time_format` disables the display of the timestamp.
 
 By default, dates are specified in UTC time zone. To convert the timestamp to another time zone,
 use the `tweet_timezone` option in configuration file. Valid time zone names are those of the Olson time
-zone database (https://en.wikipedia.org/wiki/Tz_database)
+zone database (<https://en.wikipedia.org/wiki/Tz_database>)
 e.g. `tweet_timezone = "Europe/Paris"`
 
 ### Rate control
```
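A minimal sketch of how `tweet_time_format` and `tweet_timezone` combine (the timestamp value below is invented; twoot extracts the real one from the tweet):

```python
from datetime import datetime, timezone
import pytz

tweet_time_format = "(%d %b %Y %H:%M %Z)"   # strftime format, as in the README example
tweet_timezone = "Europe/Paris"             # Olson time zone name

# Tweet timestamps are in UTC; convert to the configured zone, then format
timestamp = datetime(2023, 7, 14, 12, 30, tzinfo=timezone.utc)
local = timestamp.astimezone(pytz.timezone(tweet_timezone))
print(local.strftime(tweet_time_format))    # (14 Jul 2023 14:30 CEST)
```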
````diff
@@ -136,16 +136,16 @@ No limitation is applied to the number of toots uploaded if `-c` is not specifie
 
 Make sure python3 is installed.
 
-Twoot depends on `beautifulsoup4` and `Mastodon.py` python modules. Additionally, if you are using
-a version of python < 3.11 you also need to install the `tomli` module.
+Twoot depends on `requests`, `beautifulsoup4`, `Mastodon.py` and `pytz` python modules.
+Additionally, if you are using a version of python < 3.11 you also need to install the `tomli` module.
 
-**Only if you plan to download videos** with the `-v` switch are the additional dependencies required:
+**Only if you plan to download videos** with the `-v` switch are additional dependencies required:
 
 * Python module `youtube-dl2`
 * [ffmpeg](https://ffmpeg.org/download.html) (installed with the package manager of your distribution)
 
 ```sh
-pip install beautifulsoup4 Mastodon.py youtube-dl2
+pip install beautifulsoup4 Mastodon.py youtube-dl2 pytz
 ```
 
 In your user folder, execute `git clone https://gitlab.com/jeancf/twoot.git`
````
default.toml (72 lines changed)
@ -9,88 +9,110 @@ mastodon_instance = ""
|
|||||||
mastodon_user = ""
|
mastodon_user = ""
|
||||||
|
|
||||||
[options]
|
[options]
|
||||||
|
# List of nitter instances from which to pick at random to download tweets.
|
||||||
|
# Specify only the address without leading `https://` and without trailing `/`
|
||||||
|
# By default a built-in list of 2-3 known good instances is used
|
||||||
|
#
|
||||||
|
#nitter_instances = ["nitter.nl", "nitter.fdn.fr"]
|
||||||
|
|
||||||
# Download videos from twitter and upload them on Mastodon
|
# Download videos from twitter and upload them on Mastodon
|
||||||
# Default is false
|
# Default is false
|
||||||
upload_videos = false
|
#
|
||||||
|
#upload_videos = true
|
||||||
|
|
||||||
# Also post the "reply-to" tweets from twitter account
|
# Also post the "reply-to" tweets from twitter account
|
||||||
# Default is false
|
# Default is false
|
||||||
post_reply_to = false
|
#
|
||||||
|
#post_reply_to = true
|
||||||
|
|
||||||
# Do not post the retweets of other twitter accounts
|
# Do not post the retweets of other twitter accounts
|
||||||
# Default is false
|
# Default is false
|
||||||
skip_retweets = false
|
#
|
||||||
|
#skip_retweets = true
|
||||||
|
|
||||||
# Replace redirected links in tweets with direct URLs
|
# Replace redirected links in tweets with direct URLs
|
||||||
# Default is false
|
# Default is false
|
||||||
remove_link_redirections = false
|
#
|
||||||
|
#remove_link_redirections = true
|
||||||
|
|
||||||
# Clean up URLs in tweets to remove trackers
|
# Clean up URLs in tweets to remove trackers
|
||||||
# Default is false
|
# Default is false
|
||||||
remove_trackers_from_urls = false
|
#
|
||||||
|
#remove_trackers_from_urls = true
|
||||||
|
|
||||||
# Footer line added at bottom of toots
|
# Footer line added at bottom of toots
|
||||||
# e.g. "#twitter #bot"
|
|
||||||
# Default is ""
|
# Default is ""
|
||||||
footer = ""
|
#
|
||||||
|
#footer = "#twitter #bot"
|
||||||
|
|
||||||
# If specified, also diplay a timestamp on the "Original Tweet" line
|
# If specified, also diplay a timestamp on the "Original Tweet" line
|
||||||
# in the given format e.g. "%d %b %Y %H:%M %Z"
|
# in the given format.
|
||||||
# see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
|
# see https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
|
||||||
# Default is "" (tweet timestamp is not displayed)
|
# Default is "" (tweet timestamp is not displayed)
|
||||||
tweet_time_format = ""
|
#
|
||||||
|
#tweet_time_format = "%d %b %Y %H:%M %Z"
|
||||||
|
|
||||||
# Specify the timezone that the timestamp on the tweet should be displayed in
|
# Specify the timezone that the timestamp on the tweet should be displayed in
|
||||||
# Use the `tz_identifier`from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
# Use `tz_identifier`from https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
|
||||||
# example "Europe/Brussels"
|
|
||||||
# default is using the local timezone of the machine running the script
|
# default is using the local timezone of the machine running the script
|
||||||
tweet_timezone = ""
|
#tweet_timezone = "Europe/Brussels"
|
||||||
|
|
||||||
# Do not add reference to "Original tweet" on toots
|
# Do not add reference to "Original tweet" on toots
|
||||||
# default is false
|
# default is false
|
||||||
remove_original_tweet_ref = false
|
#remove_original_tweet_ref = true
|
||||||
|
|
||||||
# Check if profile avatar or banner pictures were changed and update
|
# Check if profile avatar or banner pictures were changed and update
|
||||||
# the Mastodon account if necessary
|
# the Mastodon account if necessary
|
||||||
# Default is false
|
# Default is false
|
||||||
update_profile = false
|
#update_profile = true
|
||||||
|
|
||||||
# Maximum age of tweet to post (in days, decimal values accepted)
|
# Maximum age of tweet to post (in days, decimal values accepted)
|
||||||
# Default is 1
|
# Default is 1
|
||||||
tweet_max_age = 1
|
#
|
||||||
|
#tweet_max_age = 0.5
|
||||||
|
|
||||||
# Minimum age of tweet to post (in minutes)
|
# Minimum age of tweet to post (in minutes)
|
||||||
# Default is 0 (post tweet as soon as possible)
|
# Default is 0 (post tweet as soon as possible)
|
||||||
tweet_delay = 0
|
#
|
||||||
|
#tweet_delay = 15
|
||||||
|
|
||||||
|
# How many seconds to pause between successive uploads of toots.
|
||||||
|
# Increase this value if successive tweets appear in the wrong order.
|
||||||
|
# Default is 0 (no pause)
|
||||||
|
#
|
||||||
|
#upload_pause = 5
|
||||||
|
|
||||||
# Maximum number of toots to post in each run
|
# Maximum number of toots to post in each run
|
||||||
# Default is 0 (which means unlimited)
|
# Default is 0 (which means unlimited)
|
||||||
toot_cap = 0
|
#
|
||||||
|
#toot_cap = 2
|
||||||
|
|
||||||
# Replace twitter.com in links by random alternative out of this list
|
# Replace twitter.com in links by random alternative out of this list
|
||||||
# List of nitter instances
|
# List of nitter instances
|
||||||
# e.g. subst_twitter = ["nitter.net", ]
|
|
||||||
# Default is []
|
# Default is []
|
||||||
subst_twitter = []
|
#
|
||||||
|
#subst_twitter = ["nitter.net", ]
|
||||||
|
|
||||||
# Replace youtube.com in links by random alternative out of this list
|
# Replace youtube.com in links by random alternative out of this list
|
||||||
# List of Invidious or Piped instances
|
# List of Invidious or Piped instances
|
||||||
# e.g. subst_youtube = ["piped.kavin.rocks", "invidious.flokinet.to", ]
|
|
||||||
# Default is []
|
# Default is []
|
||||||
subst_youtube = []
|
#
|
||||||
|
#subst_youtube = ["piped.kavin.rocks", "invidious.flokinet.to", ]
|
||||||
|
|
||||||
# Replace reddit.com in links by random alternative out of this list
|
# Replace reddit.com in links by random alternative out of this list
|
||||||
# List of Teddit instances
|
# List of Teddit instances
|
||||||
# e.g. subst_reddit = ["teddit.net", ]
|
|
||||||
# Default is []
|
# Default is []
|
||||||
subst_reddit = []
|
#
|
||||||
|
#subst_reddit = ["teddit.net", ]
|
||||||
|
|
||||||
# Verbosity of log messages
|
# Verbosity of log messages
|
||||||
# One of DEBUG, INFO, WARNING, ERROR, CRITICAL, OFF
|
# One of DEBUG, INFO, WARNING, ERROR, CRITICAL, OFF
|
||||||
# Default is "WARNING"
|
# Default is "WARNING"
|
||||||
log_level = "WARNING"
|
#
|
||||||
|
#log_level = "INFO"
|
||||||
|
|
||||||
# How many days to keep log messages for
|
# How many days to keep log messages for
|
||||||
# Log messages older than log_days will be deleted
|
# Log messages older than log_days will be deleted
|
||||||
# Default is 3
|
# Default is 3
|
||||||
log_days = 3
|
#
|
||||||
|
#log_days = 1
|
||||||
|
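Since every option in the new default.toml ships commented out, the values that actually apply are the defaults hard-coded in `build_config()`. A sketch of that fallback logic under stated assumptions (the `defaults` dict below is abridged; the real one lists every option):

```python
import sys

# Python 3.11+ ships tomllib; older versions need the tomli package (see README)
if sys.version_info >= (3, 11):
    import tomllib
else:
    import tomli as tomllib

# Abridged stand-in for the defaults in build_config()
defaults = {'upload_videos': False, 'tweet_max_age': 1.0, 'upload_pause': 0.0}

with open('default.toml', 'rb') as f:
    toml = tomllib.load(f)

# Keys present in [options] override the defaults; commented-out keys fall through
options = {**defaults, **toml.get('options', {})}
print(options['upload_pause'])  # 0.0 unless `upload_pause = 5` is uncommented
```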
twoot.py (246 lines changed)
```diff
@@ -34,6 +34,7 @@ from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse, urljoin, un
 import requests
 from bs4 import BeautifulSoup, element
 from mastodon import Mastodon, MastodonError, MastodonAPIError, MastodonIllegalArgumentError
+import pytz
 
 # Number of records to keep in db table for each twitter account
 MAX_REC_COUNT = 50
```
```diff
@@ -41,25 +42,6 @@ MAX_REC_COUNT = 50
 # How many seconds to wait before giving up on a download (except video download)
 HTTPS_REQ_TIMEOUT = 10
 
-NITTER_URLS = [
-    'https://nitter.lacontrevoie.fr',
-    # 'https://nitter.cutelab.space',  # 404 on 12/07/2023
-    'https://nitter.weiler.rocks',  # added 15/06/2023
-    'https://nitter.nl',  # added 16/06/2023
-    # 'https://n.l5.ca',  # Not working 11/07/2023
-    # 'https://nitter.fly.dev',  # gone 11/07/2023
-    # 'https://notabird.site',  # gone 11/07/2023
-    # 'https://nitter.sethforprivacy.com',  # too slow, removed 16/06/2023
-    # 'https://nitter.it',  # different pic naming scheme
-    # 'https://twitter.femboy.hu',  # 404 on 06/05/2023
-    # 'https://nitter.grimneko.de',  # 404 on 01/06/2023
-    # 'https://nitter.namazso.eu',  # lots of 403 27/02/2023
-    # 'https://twitter.beparanoid.de',  # moved 27/02/2023
-    # 'https://nitter.fdn.fr',  # not updated, rate limited, removed 06/02/2023
-    # 'https://nitter.hu',
-    # 'https://nitter.privacydev.net',  # USA, added 06/02/2023, removed 15/02/2023 too slow
-]
-
 # Update from https://www.whatismybrowser.com/guides/the-latest-user-agent/
 USER_AGENTS = [
     'Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36',
```
@ -70,22 +52,6 @@ USER_AGENTS = [
|
|||||||
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Vivaldi/6.1.3035.84',
|
'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36 Vivaldi/6.1.3035.84',
|
||||||
]
|
]
|
||||||
|
|
||||||
"""
|
|
||||||
Temporary mitigation for unability to parse threads. Skip tweets that are part of a thread
|
|
||||||
"""
|
|
||||||
def has_class_timeline_item_but_not_thread(tag):
|
|
||||||
if tag.has_attr('class'):
|
|
||||||
classes = tag['class']
|
|
||||||
if 'timeline-item' in classes and 'thread' not in classes:
|
|
||||||
return True
|
|
||||||
elif 'timeline-item' in classes and 'thread' in classes:
|
|
||||||
logging.warning('Tweet is part of a thread which are a new nitter feature that is not handled yet. Skipping')
|
|
||||||
return False
|
|
||||||
else:
|
|
||||||
return False
|
|
||||||
else:
|
|
||||||
return False
|
|
||||||
|
|
||||||
|
|
||||||
def build_config(args):
|
def build_config(args):
|
||||||
"""
|
"""
|
||||||
```diff
@@ -101,6 +67,11 @@ def build_config(args):
 
     # Default options
     options = {
+        'nitter_instances': [
+            'nitter.lacontrevoie.fr',
+            'nitter.weiler.rocks',  # added 15/06/2023
+            'nitter.nl',  # added 16/06/2023
+        ],
         'upload_videos': False,
         'post_reply_to': False,
         'skip_retweets': False,
```
```diff
@@ -112,6 +83,7 @@ def build_config(args):
         'remove_original_tweet_ref': False,
         'tweet_max_age': float(1),
         'tweet_delay': float(0),
+        'upload_pause': float(0),
         'toot_cap': int(0),
         'subst_twitter': [],
         'subst_youtube': [],
```
```diff
@@ -192,6 +164,127 @@ def build_config(args):
         exit(-1)
 
 
+"""
+Download page with full thread of tweets and extract all replied-to tweets referenced by url.
+Only used by `get_timeline()`.
+:param session: Existing HTTP session with Nitter instance
+:param headers: HTTP headers to use
+:param url: url of the thread page to download
+:return: List of tweets from the thread
+"""
+def _get_rest_of_thread(session, headers, url):
+    logging.debug("Downloading tweets in thread from separate page")
+    # Download page with thread
+    try:
+        thread_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
+    except requests.exceptions.ConnectionError:
+        logging.fatal('Host did not respond when trying to download ' + url)
+        shutdown(-1)
+    except requests.exceptions.Timeout:
+        logging.fatal(url + ' took too long to respond')
+        shutdown(-1)
+
+    # Verify that download worked
+    if thread_page.status_code != 200:
+        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(thread_page.status_code) + '). Aborting')
+        shutdown(-1)
+
+    logging.debug('Nitter page downloaded successfully from ' + url)
+
+    # DEBUG: Save page to file
+    # of = open('thread_page_debug.html', 'w')
+    # of.write(twit_account_page.text)
+    # of.close()
+
+    # Make soup
+    soup = BeautifulSoup(thread_page.text, 'html.parser')
+
+    # Get all items in thread after main tweet
+    after_tweet = soup.find('div', 'after-tweet')
+
+    timeline = after_tweet.find_all('div', class_='timeline-item')
+    return timeline
+
+
+"""
+Download the twitter page of the user and extract the tweets from the timeline,
+including the tweets of threads, which are downloaded from their own pages.
+:param nitter_url: url of the nitter instance to use
+:return: soup of the page and list of tweets from the timeline
+"""
+def get_timeline(nitter_url):
+    # Define url to use
+    url = nitter_url + '/' + TOML['config']['twitter_account']
+
+    # Use different page if we need to handle replies
+    if TOML['options']['post_reply_to']:
+        url += '/with_replies'
+
+    # Initiate session
+    session = requests.Session()
+
+    # Get a copy of the default headers that requests would use
+    headers = requests.utils.default_headers()
+
+    # Update default headers with randomly selected user agent
+    headers.update(
+        {
+            'User-Agent': USER_AGENTS[random.randint(0, len(USER_AGENTS) - 1)],
+            'Cookie': 'replaceTwitter=; replaceYouTube=; hlsPlayback=on; proxyVideos=',
+        }
+    )
+
+    # Download twitter page of user
+    try:
+        twit_account_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
+    except requests.exceptions.ConnectionError:
+        logging.fatal('Host did not respond when trying to download ' + url)
+        shutdown(-1)
+    except requests.exceptions.Timeout:
+        logging.fatal(url + ' took too long to respond')
+        shutdown(-1)
+
+    # Verify that download worked
+    if twit_account_page.status_code != 200:
+        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(
+            twit_account_page.status_code) + '). Aborting')
+        shutdown(-1)
+
+    logging.debug('Nitter page downloaded successfully from ' + url)
+
+    # DEBUG: Save page to file
+    # of = open('user_page_debug.html', 'w')
+    # of.write(twit_account_page.text)
+    # of.close()
+
+    # Make soup
+    soup = BeautifulSoup(twit_account_page.text, 'html.parser')
+
+    # Get the div containing tweets
+    tl = soup.find('div', class_='timeline')
+
+    # Get the list of direct children of timeline
+    list = tl.find_all('div', recursive=False)
+
+    timeline = []
+    for item in list:
+        classes = item['class']
+        if 'timeline-item' in classes:  # Individual tweet
+            timeline.append(item)
+        elif 'thread-line' in classes:  # First tweet of a thread
+            # Get the first item of thread
+            first_item = item.find('div', class_='timeline-item')
+            timeline.append(first_item)
+
+            # Get the rest of the items of the thread
+            thread_link_tag = item.find('a', class_='tweet-link')
+            if thread_link_tag is not None:
+                thread_url = thread_link_tag.get('href')
+                timeline.extend(_get_rest_of_thread(session, headers, nitter_url + thread_url))
+        else:
+            # Ignore other classes
+            continue
+    return soup, timeline
+
+
 def update_profile(nitter_url, soup, sql, mast_password):
     """
     Update profile on Mastodon
```
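To illustrate the timeline walk in `get_timeline()`, here is a self-contained sketch on invented markup (the class names match the Nitter HTML the code targets; the tweet text and href are made up):

```python
from bs4 import BeautifulSoup

# Toy markup mimicking the structure get_timeline() walks
html = '''
<div class="timeline">
  <div class="timeline-item">a plain tweet</div>
  <div class="thread-line">
    <div class="timeline-item">first tweet of a thread</div>
    <a class="tweet-link" href="/user/status/1#m"></a>
  </div>
</div>
'''

tl = BeautifulSoup(html, 'html.parser').find('div', class_='timeline')
for item in tl.find_all('div', recursive=False):
    if 'timeline-item' in item['class']:
        print('plain tweet:', item.get_text(strip=True))
    elif 'thread-line' in item['class']:
        first = item.find('div', class_='timeline-item')
        print('thread starts with:', first.get_text(strip=True))
        # The rest of the thread lives on the page behind this link
        print('rest of thread at:', item.find('a', class_='tweet-link')['href'])
```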
```diff
@@ -557,6 +650,7 @@ def process_attachments(nitter_url, attachments_container, status_id, author_acc
     vid_class = attachments_container.find('div', class_='video-container')
     if vid_class is not None:
         if TOML['options']['upload_videos']:
+            logging.debug("downloading video from twitter")
             import youtube_dl
 
             video_path = f"{author_account}/status/{status_id}"
```
```diff
@@ -783,6 +877,7 @@ def main(argv):
         log_level = logging.INFO
     elif ll_str == "WARNING":
         log_level = logging.WARNING
+        print('log level warning set')
     elif ll_str == "ERROR":
         log_level = logging.ERROR
     elif ll_str == "CRITICAL":
```
```diff
@@ -806,19 +901,20 @@ def main(argv):
     logging.info('    post_reply_to : ' + str(TOML['options']['post_reply_to']))
     logging.info('    skip_retweets : ' + str(TOML['options']['skip_retweets']))
     logging.info('    remove_link_redirections : ' + str(TOML['options']['remove_link_redirections']))
-    logging.info('    remove_trackers_from_urls: ' + str(TOML['options']['remove_trackers_from_urls']))
+    logging.info('    remove_trackers_from_urls : ' + str(TOML['options']['remove_trackers_from_urls']))
     logging.info('    footer : ' + TOML['options']['footer'])
     logging.info('    tweet_time_format : ' + TOML['options']['tweet_time_format'])
     logging.info('    tweet_timezone : ' + TOML['options']['tweet_timezone'])
-    logging.info('    remove_original_tweet_ref: ' + str(TOML['options']['remove_original_tweet_ref']))
+    logging.info('    remove_original_tweet_ref : ' + str(TOML['options']['remove_original_tweet_ref']))
     logging.info('    update_profile : ' + str(TOML['options']['update_profile']))
     logging.info('    tweet_max_age : ' + str(TOML['options']['tweet_max_age']))
     logging.info('    tweet_delay : ' + str(TOML['options']['tweet_delay']))
+    logging.info('    upload_pause : ' + str(TOML['options']['upload_pause']))
     logging.info('    toot_cap : ' + str(TOML['options']['toot_cap']))
     logging.info('    subst_twitter : ' + str(TOML['options']['subst_twitter']))
     logging.info('    subst_youtube : ' + str(TOML['options']['subst_youtube']))
     logging.info('    subst_reddit : ' + str(TOML['options']['subst_reddit']))
-    logging.info('    log_level : ' + str(TOML['options']['log_level']))
+    logging.info('    log_level : ' + TOML['options']['log_level'])
     logging.info('    log_days : ' + str(TOML['options']['log_days']))
 
     # Try to open database. If it does not exist, create it
```
```diff
@@ -832,80 +928,25 @@ def main(argv):
     db.execute('''CREATE INDEX IF NOT EXIsTS profile_index ON profiles (mastodon_instance, mastodon_account)''')
 
     # Select random nitter instance to fetch updates from
-    nitter_url = NITTER_URLS[random.randint(0, len(NITTER_URLS) - 1)]
-
-    # **********************************************************
-    # Load twitter page of user. Process all tweets and generate
-    # list of dictionaries ready to be posted on Mastodon
-    # **********************************************************
-    # To store content of all tweets from this user
-    tweets = []
-
-    # Initiate session
-    session = requests.Session()
-
-    # Get a copy of the default headers that requests would use
-    headers = requests.utils.default_headers()
-
-    # Update default headers with randomly selected user agent
-    headers.update(
-        {
-            'User-Agent': USER_AGENTS[random.randint(0, len(USER_AGENTS) - 1)],
-            'Cookie': 'replaceTwitter=; replaceYouTube=; hlsPlayback=on; proxyVideos=',
-        }
-    )
-
-    url = nitter_url + '/' + TOML['config']['twitter_account']
-    # Use different page if we need to handle replies
-    if TOML['options']['post_reply_to']:
-        url += '/with_replies'
-
-    # Download twitter page of user
-    try:
-        twit_account_page = session.get(url, headers=headers, timeout=HTTPS_REQ_TIMEOUT)
-    except requests.exceptions.ConnectionError:
-        logging.fatal('Host did not respond when trying to download ' + url)
-        shutdown(-1)
-    except requests.exceptions.Timeout:
-        logging.fatal(nitter_url + ' took too long to respond')
-        shutdown(-1)
-
-    # Verify that download worked
-    if twit_account_page.status_code != 200:
-        logging.fatal('The Nitter page did not download correctly from ' + url + ' (' + str(
-            twit_account_page.status_code) + '). Aborting')
-        shutdown(-1)
-
-    logging.debug('Nitter page downloaded successfully from ' + url)
-
-    # DEBUG: Save page to file
-    # of = open(TOML['config']['twitter_account'] + '.html', 'w')
-    # of.write(twit_account_page.text)
-    # of.close()
-
-    # Make soup
-    soup = BeautifulSoup(twit_account_page.text, 'html.parser')
-
-    # Extract twitter timeline
-    timeline = soup.find_all(has_class_timeline_item_but_not_thread)
+    nitter_url = 'https://' + TOML['options']['nitter_instances'][random.randint(0, len(TOML['options']['nitter_instances']) - 1)]
+
+    # Load twitter page of user
+    soup, timeline = get_timeline(nitter_url)
 
     logging.info('Processing ' + str(len(timeline)) + ' tweets found in timeline')
 
     # **********************************************************
-    # Process each tweet and generate dictionary
+    # Process each tweet and generate an array of dictionaries
     # with data ready to be posted on Mastodon
     # **********************************************************
+    tweets = []
     out_date_cnt = 0
     in_db_cnt = 0
     for status in timeline:
         # Extract tweet ID and status ID
-        try:
-            tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
-            status_id = tweet_id.split('/')[3]
-        except Exception as e:
-            logging.critical('Malformed timeline downloaded from nitter instance')
-            logging.debug(e)
-            shutdown(-1)
+        tweet_id = status.find('a', class_='tweet-link').get('href').strip('#m')
+        status_id = tweet_id.split('/')[3]
 
         logging.debug('processing tweet %s', tweet_id)
 
```
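The instance selection above is plain random indexing; `random.choice` expresses the same thing more idiomatically, as in this stand-alone sketch (the options dict is a stand-in for `TOML['options']`):

```python
import random

options = {'nitter_instances': ['nitter.lacontrevoie.fr', 'nitter.weiler.rocks', 'nitter.nl']}

# Equivalent to indexing with random.randint(0, len(...) - 1)
nitter_url = 'https://' + random.choice(options['nitter_instances'])
print(nitter_url)
```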
```diff
@@ -1009,7 +1050,6 @@ def main(argv):
     if TOML['options']['tweet_time_format'] != "":
         timestamp_display = timestamp
         # Adjust timezone
-        import pytz
         if TOML['options']['tweet_timezone'] != "":
             timezone_display = pytz.timezone(TOML['options']['tweet_timezone'])
         else:  # Use local timezone by default
```
```diff
@@ -1144,9 +1184,9 @@ def main(argv):
         except MastodonAPIError:
             # Assuming this is an:
             # ERROR ('Mastodon API returned error', 422, 'Unprocessable Entity', 'Cannot attach files that have not finished processing. Try again in a moment!')
-            logging.warning('Mastodon API Error 422: Cannot attach files that have not finished processing. Waiting 60 seconds and retrying.')
-            # Wait 60 seconds
-            time.sleep(60)
+            logging.warning('Mastodon API Error 422: Cannot attach files that have not finished processing. Waiting 30 seconds and retrying.')
+            # Wait 30 seconds
+            time.sleep(30)
             # retry posting
             try:
                 toot = mastodon.status_post(tweet['tweet_text'], media_ids=media_ids)
```
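The wait-and-retry pattern above can be factored into a small helper; a hedged sketch (the `post_with_retry` helper is not part of twoot, and only `status_post` and `MastodonAPIError` come from Mastodon.py):

```python
import time
from mastodon import MastodonAPIError

def post_with_retry(mastodon, text, media_ids, wait=30):
    """On a 422 'still processing' error, wait and try exactly once more."""
    try:
        return mastodon.status_post(text, media_ids=media_ids)
    except MastodonAPIError:
        time.sleep(wait)  # give the server time to finish processing attachments
        return mastodon.status_post(text, media_ids=media_ids)
```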
```diff
@@ -1163,6 +1203,8 @@ def main(argv):
         else:
             posted_cnt += 1
             logging.debug('Tweet %s posted on %s', tweet['tweet_id'], TOML['config']['mastodon_user'])
+            # Test to find out if slowing down successive posting helps with ordering of threads
+            time.sleep(TOML['options']['upload_pause'])
 
         # Insert toot id into database
         if 'id' in toot:
```