Commit Graph

320 Commits

Author SHA1 Message Date
jeancf
2a736de0c7 Replaced poor performing nitter instances 2022-11-18 12:17:34 +01:00
BuildTools
e2eff0445c Changed mode of twoot.py 2022-11-18 12:07:02 +01:00
jeancf
26b0619880 added command-line option 2022-11-18 11:55:06 +01:00
jeancf
dc8c89243c Updated user agents 2022-11-17 20:56:21 +01:00
jeancf
a2c9deb250 Removed duplicate tracker tags and added 'xtor' 2022-11-17 20:53:03 +01:00
jeancf
6a20c257e5 Merged contribution from mathdatech 2022-11-17 20:18:42 +01:00
jeancf
b04b7dc195 Removed temp debug 2022-11-14 12:40:56 +01:00
jeancf
f96d8fa93c Added missing logging 2022-11-14 12:36:06 +01:00
jeancf
514a1b3304 Added some temp debug code 2022-11-14 12:26:55 +01:00
jeancf
608bc7519f Corrected condition on retweet tag 2022-11-13 22:35:46 +01:00
jeancf
84b94a38b9 Implemented retweet suppression 2022-11-13 22:17:43 +01:00
jeancf
506c4a05b7 Merge branch timeout into vid_dl 2022-11-06 12:05:23 +01:00
BuildTools
bd7860bb43 Keep log file history 2022-11-06 11:56:29 +01:00
jeancf
11b88e729a Added timeout to all downloads 2022-11-06 11:50:08 +01:00
jeancf
e8c03ab50b youtube-dl set in quiet mode 2022-11-06 11:24:57 +01:00
jeancf
4d1fec306f using youtube-dl as a class 2022-11-03 22:10:23 +01:00
jeancf
10a329fdb1 Replaced twitterdl.py by youtube-dl 2022-11-03 16:53:17 +01:00
jeancf
9c2438382e Added timeout to get request 2022-11-02 18:38:23 +01:00
jeancf
ebf32cebc9 Initialized variable referenced later 2022-10-08 10:25:04 +02:00
BuildTools
216da5519f Removed n.actionsack.com 2022-09-24 13:26:08 +02:00
jeancf
cfd1232f35 Merge remote-tracking branch 'gitlab/master' into cleandb
# Conflicts:
#	twoot.py
2022-09-15 20:35:27 +02:00
jeancf
3273b21608 Fixed bug in query 2022-09-15 20:12:20 +02:00
jeancf
dada20d0b9 Added database cleanup (untested) 2022-09-15 19:58:17 +02:00
jeancf
7f462a5a6e Minor improvement to logging 2022-09-14 16:54:47 +02:00
jeancf
5e0fb1a9c3 Corrected typo 2022-09-14 16:35:10 +02:00
jeancf
bfbe9704f7 Cosmetic changes 2022-09-14 16:28:48 +02:00
jeancf
4ccce6aac1 asctime() instead 2022-09-08 10:19:23 +02:00
jeancf
392b0bafd0 more str conversion 2022-09-08 10:17:14 +02:00
jeancf
357e45844d convert int to str 2022-09-08 10:15:14 +02:00
jeancf
2b21a626d4 Less stupid 2022-09-08 10:11:37 +02:00
jeancf
ffdce1ad12 updated url 2022-09-08 10:05:19 +02:00
jeancf
63a7a578a4 epoch to local time 2022-09-08 09:37:30 +02:00
jeancf
a7b63f569f Changed logging to info 2022-09-08 09:35:02 +02:00
jeancf
4704890ddf check rate limit 2022-09-08 09:28:28 +02:00
jeancf
7ffa81ffbd No longer try creating unique index 2022-08-22 14:50:03 +02:00
jeancf
65b880f5be Bug removed 2022-08-22 14:27:18 +02:00
jeancf
29cf330699 Improved error message and removed nitter mirror 2022-08-22 14:09:43 +02:00
jeancf
fe145525ab Added index on sqlite database 2022-08-22 14:00:28 +02:00
jeancf
98ed69e232 Correct mirror URL 2022-08-22 13:34:56 +02:00
jeancf
94d1fc4e22 Fixed the fix of the fix 2022-08-22 09:33:27 +02:00
jeancf
82a9430160 Fixed the fix 2022-08-22 09:30:52 +02:00
jeancf
3c847e4f06 Fixed false positive on search for "replying-to" 2022-08-22 08:54:17 +02:00
jeancf
c4abee2835 Updated Nitter URLs 2022-08-19 11:15:49 +02:00
jeancf
e6854106eb Updated user agents 2022-08-19 10:48:33 +02:00
jeancf
00f374896d Flexibility in timestamp 2022-01-03 18:11:40 +01:00
jeancf
65d91bf025 Clarified info and updated nitter sites 2022-01-03 18:03:56 +01:00
BuildTools
2a63371336 Adjusted nitter sites 2022-01-03 17:44:37 +01:00
BuildTools
735503c1b1 Merge branch 'master' of https://gitlab.com/jeancf/twoot
Merging master
2021-10-16 19:29:28 +02:00
BuildTools
204f1e5c9f Updated nitter site list 2021-10-16 19:27:49 +02:00
jeancf
a463ce335b Catching connection exception to nitter site 2021-10-16 19:26:02 +02:00
jeancf
200837c336 Improved logging message of cap limit 2021-06-03 09:35:34 +02:00
jeancf
0637c8ccda Corrected basicConfig parameter 2021-06-01 16:12:05 +02:00
jeancf
c688035fd0 Implemented timestamps in logs 2021-06-01 15:49:11 +02:00
BuildTools
29629e2785 Logging improvement 2021-06-01 14:57:43 +02:00
jeancf
71acd65ba0 Implemented cap 2021-06-01 11:54:08 +02:00
jeancf
3148180e9a Some cleanup
Rebased
2021-06-01 11:27:22 +02:00
BuildTools
3963b102b9 Modified active nitter hosts 2021-06-01 11:05:33 +02:00
jeancf
588e6003ca Set logging to WARNING 2021-03-07 21:29:20 +01:00
jeancf
56b87e4756 Merge branch 'master' of https://gitlab.com/jeancf/twoot 2021-03-07 21:26:58 +01:00
jeancf
cf856bee08 Login only when there is something to upload 2021-03-07 21:26:52 +01:00
BuildTools
b9842db677 Added 300s timeout to twitter video download 2021-03-05 17:13:59 +01:00
jeancf
807dad3480 Random selection of nitter mirror to use 2021-03-02 22:08:52 +01:00
jeancf
8e4f13c26a placed nitter url in const 2021-02-11 19:03:12 +01:00
jeancf
a9109884a4 More debug messages 2020-12-19 10:59:23 +01:00
jeancf
1d40071b27 Added log of twitter:image download 2020-12-19 10:53:11 +01:00
jeancf
40185ef817 Improved last logging syntax 2020-12-19 10:48:46 +01:00
jeancf
5df11dbe4b Fixed last logging syntax 2020-12-19 10:36:59 +01:00
jeancf
3c7693fe66 Updated README
Improved decimal format in log
2020-12-19 10:30:19 +01:00
jeancf
dc6c16ae16 Keep logs for now 2020-12-19 10:09:03 +01:00
jeancf
43d63b1e5a Added logging run time 2020-12-19 09:21:39 +01:00
jeancf
bb52e54c0d Logging set to debug 2020-12-18 22:43:50 +01:00
jeancf
066f737a61 quote is an 'a' tag 2020-12-18 22:41:57 +01:00
jeancf
60f7054fac Separate logging for exceptions 2020-12-18 22:16:27 +01:00
jeancf
1525955c52 Added info log messages 2020-12-18 22:09:34 +01:00
jeancf
33342cdfb7 Cards can have no pic 2020-12-18 21:32:26 +01:00
jeancf
986d902ccd Fixed video download url 2020-12-18 21:06:05 +01:00
jeancf
62ba2f505e Issues with video download 2020-12-18 17:55:12 +01:00
jeancf
a0ce29f4c5 Fine tuning 2020-12-18 17:35:50 +01:00
jeancf
67bf87213d Correct url in image downloads 2020-12-18 17:21:41 +01:00
jeancf
822215fefe download more images. Improved logging 2020-12-18 17:06:09 +01:00
jeancf
3a88438ec2 Some easy bugs squashed 2020-12-18 14:57:22 +01:00
jeancf
f229976861 Improved logging. "OMG, it's full of bugs!" 2020-12-18 14:39:13 +01:00
jeancf
551c47d576 Implemented process attachment 2020-12-18 14:28:17 +01:00
jeancf
efa84f85d3 Download nitter video 2020-12-18 13:26:26 +01:00
jeancf
b4a596eff2 Downloaded pics attachments 2020-12-18 11:45:43 +01:00
jeancf
14c24fe847 started process_attachments() 2020-12-17 22:59:21 +01:00
jeancf
8079914282 Reworked process_media_body 2020-12-17 22:08:43 +01:00
jeancf
711ec9677a Added a bunch of TODO 2020-12-17 21:44:32 +01:00
jeancf
992f91537f TODO done 2020-12-17 18:59:02 +01:00
jeancf
fbec4004f9 Handled reply-to 2020-12-17 17:56:12 +01:00
jeancf
557ef6deb9 Handling reply-to 2020-12-17 17:50:10 +01:00
jeancf
0787669a3a Moved time check to beginning of process 2020-12-17 17:31:43 +01:00
jeancf
d92bcea2a7 Added cookie to preserve twitter and youtube addresses 2020-12-17 10:44:30 +01:00
jeancf
3a2c8093a3 Improved logging in cleanup_tweet_text 2020-12-17 10:15:46 +01:00
jeancf
857a7f9b9e Extracted full_status_url 2020-12-16 22:46:01 +01:00
jeancf
e6e24cbfd5 Extracted author, author_account, time_string, timestamp 2020-12-16 22:15:27 +01:00
jeancf
19d988dfcb Removed extracting avatar 2020-12-16 22:03:09 +01:00
jeancf
4e6a97d765 Removed downloading of status page with uncensored pics 2020-12-16 21:58:24 +01:00
jeancf
e87599d40b Removed downloading of full status page of the tweet 2020-12-16 21:57:03 +01:00
jeancf
7cc076053f Extracted tweet_id and status_id 2020-12-16 21:55:13 +01:00
jeancf
c25e36b498 Extracted timeline 2020-12-16 20:55:26 +01:00
jeancf
910b7a8b13 Safer implementation 2020-12-16 20:48:00 +01:00
jeancf
e2841535f6 Extracted twit_account 2020-12-16 20:42:44 +01:00
jeancf
894c13d551 Download page from nitter.net 2020-12-16 19:43:17 +01:00
jeancf
9fc76b9981 Updated user agents 2020-12-16 18:47:27 +01:00
BuildTools
c4bf95c1a7 Commented out printing of extracted tweets 2020-12-13 21:04:33 +01:00
jeancf
010f5fdeec Merge remote-tracking branch 'gitlab/logging' into logging 2020-12-13 18:30:57 +01:00
jeancf
b7175067c0 Added timeout to execution of twitterdl.py 2020-12-13 18:25:27 +01:00
jeancf
267d4cb551 TODO is done 2020-12-13 10:44:07 +01:00
jeancf
4f326ee3cd Added more debug messages 2020-11-09 15:55:42 +01:00
jeancf
1781eb5653 Basic logging setup 2020-10-14 21:51:00 +02:00
jeancf
67fdbba510 Stop trying to regex a string into linked picture file 2020-09-10 13:09:51 +02:00
JC Francois
a95006fae6 Added tolerance for malformed URL in picture download 2020-04-09 18:17:13 +02:00
JC Francois
092f2ab371 Cleanup and README.md update for release 2020-04-05 10:37:54 +02:00
JC Francois
e32620d79b Implemented proper naming of downloaded videos 2020-03-29 17:16:54 +02:00
JC Francois
965317f5b2 Added details on optional dependencies to README.md 2020-03-29 13:57:18 +02:00
JC Francois
6fa2019618 Calling twitterdl.py as subprocess 2020-03-29 13:41:49 +02:00
jeancf
2090d214b6 Trying to stop debug messages 2020-03-28 14:11:06 +01:00
jeancf
9c56ad57c8 Added TODOs to improve management of locations of video download 2020-03-28 14:07:00 +01:00
jeancf
df4eaa0dd7 Set debug=0 on call to download to avoid mail spam 2020-03-28 13:55:43 +01:00
jeancf
ba3da6ab7c Handled exception of video download directory absent when trying to delete it 2020-03-28 11:21:28 +01:00
jeancf
dd1d54d2a4 Check if tweet in db before ingest to speed up processing of feed 2020-03-28 11:08:09 +01:00
jeancf
2fe06c0bbc Use correct capitalization of twitter account name for deleting video directory 2020-03-27 17:45:40 +01:00
jeancf
0231f224a3 Improved naming of downloaded videos and implemented cleanup 2020-03-27 17:26:04 +01:00
JC Francois
9a8cd0ef65 TODO's and FIXME's 2020-03-26 20:50:59 +01:00
JC Francois
04c95f3ad3 Added command-line option to download video from tweet and upload to Mastodon 2020-03-26 19:58:17 +01:00
JC Francois
99ffa52eb6 Added upload of video to Mastodon instance 2020-03-26 19:03:21 +01:00
jeancf
909183eb0a Added comments and TODOs 2020-03-26 15:04:36 +01:00
jeancf
b768561662 Added video file path to dictionary with content of tweet 2020-03-26 14:50:03 +01:00
jeancf
ab7c68dff3 Added video download to cleanup_tweet_text() 2020-03-26 13:37:50 +01:00
jeancf
0f38bf283e Adding twitter-video-downloader module 2020-03-25 17:40:07 +01:00
jeancf
305c3d05c7 Added .gitignore 2020-03-22 11:58:15 +01:00
jeancf
3201cc8169 Added option to include reply-to tweets (False by default) 2020-03-17 15:13:22 +01:00
jeancf
7b22838d35 Forgot to comment out one debug output 2020-03-06 18:02:21 +01:00
jeancf
eda84e78e7 Commented out debug prints 2020-03-06 17:44:10 +01:00
jeancf
6fb6a38732 Used session to manage cookies automatically 2020-03-06 17:40:13 +01:00
jeancf
fd9130c053 NOT WORKING: trying to get uncensored pic redirects to no_js page 2020-03-04 17:49:57 +01:00
JC François
3cb60ad963 Better spacing around RT mention 2020-02-17 11:40:02 +00:00
JC Francois
1acc33f7e7 Removed excess / in original tweet link 2020-02-15 17:06:56 +01:00
jeancf
6a397cc089 Removed debug print 2020-02-15 16:03:05 +01:00
jeancf
a5fde58615 Rewrite complete 2020-02-15 15:39:01 +01:00
jeancf
0b15b93d37 Loading full_status_page (headers not working) 2020-02-14 18:01:12 +01:00
jeancf
296d124c35 Text and links are fixed 2020-02-14 16:37:54 +01:00
JC Francois
9dbf40bb5d WIP twitter changes 2020-02-14 07:58:39 +01:00
JC Francois
446f39f173 Added handling of no_js landing page 2020-02-13 18:01:45 +01:00
JC Francois
fdab0a0836 Add tolerance for TooManyRedirects in media search in linked page 2020-01-04 11:39:53 +01:00
JC Francois
bb226e54a9 Added handler (doing nothing) for tweet-poi-geo-text 2019-12-21 12:05:54 +01:00
JC Francois
56a89ec4f3 Added tolerance for invalid media links 2019-11-07 20:06:15 +01:00
BuildTools
51f7ca9694 Added tolerance for missing mimetype in content 2019-10-24 19:35:24 +02:00
JC Francois
55f188b9f9 Added tolerance for encoding exception 2019-10-19 09:53:48 +02:00
jeancf
0222eb175b Remove HTML-safe encoding from URL if any 2019-10-11 14:28:24 +02:00
jeancf
203a147fad Added timeout to request of linked page 2019-09-27 11:44:11 +02:00
JC François
3cc1f8ec3f Updated user agents 2019-09-26 21:55:00 +02:00
JC François
a24f43dbfa Fixed bug when tweet text is empty 2019-09-17 07:18:55 +00:00
JC Francois
3387e363d5 Try to not specify mime_type to fix crash when mime_type is NoneType 2019-09-08 20:01:26 +02:00
JC Francois
1d4270cb00 Another iteration on fix for crash when mime_type is NoneType 2019-09-08 19:11:35 +02:00
JC Francois
0eac3065d5 Added fix for crash when mime_type is NoneType 2019-09-08 19:03:43 +02:00
JC Francois
806b57f763 Added tolerance for MastodonIllegalArgumentError when visiting linked page to extract picture 2019-09-07 13:08:17 +02:00
JC Francois
1ec03c7d81 Added tolerance for ConnectionError when visiting linked page to extract picture 2019-08-31 20:30:40 +02:00
JC Francois
1b1faf2d59 Added tolerance for mastodonAPIError when uploading invalid media 2019-08-26 21:02:19 +02:00
jeancf
5b23c66b6b Added option to scrape linked page if no pic is provided in tweet 2019-08-16 15:27:55 +02:00
BuildTools
32c5e0510c Deactivated saving of html page 2019-08-05 23:08:36 +02:00
JC Francois
4b4e73f69e Fixed crash when timestamp is missing 2019-08-05 20:18:59 +02:00
JC Francois
eaa144c39a Added dependency in README and commented out debug code 2019-08-01 17:17:47 +02:00
JC Francois
18f43de98f Resolved issue of emoji in unicode 2019-08-01 15:18:40 +02:00
JC Francois
32f3eccc70 Implemented command line parsing 2019-08-01 14:58:41 +02:00
JC Francois
9b8b748b5a Implemented random user agent 2019-08-01 12:31:26 +02:00
JC Francois
d11e5d123f Fixed interactions with Mastodon instance (.secret files bug) 2019-08-01 12:02:27 +02:00
JC Francois
e062b55af7 Added retweet mention + original tweet mention 2019-07-31 23:37:47 +02:00
JC Francois
db50e7b3f0 Initial commit 2019-07-31 22:42:38 +02:00