Twint is a Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter's API.
Twint uses Twitter's search operators to let you scrape Tweets from specific users, Tweets on specific topics, hashtags and trends, or pick out sensitive information from Tweets such as e-mail addresses and phone numbers.
Twint also makes special queries to Twitter that allow you to scrape a Twitter user's followers, the Tweets a user has liked, and who they follow, all without any API, Selenium, or browser emulation.
Some of the benefits of using Twint over the Twitter API:
- Can fetch almost all Tweets (the Twitter API limits you to the last 3200 Tweets);
- Fast initial setup;
- Can be used anonymously, without signing up for Twitter;
- No rate limits.
Twitter's limitations
Twitter limits how far back you can scroll through a user's timeline. This means that with .Profile or .Favorites you will only be able to get roughly the last ~3200 Tweets.
For more details, see the twint project on GitHub.
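The .Profile and .Favorites mentioned above refer to Twint's Python module interface. A minimal sketch of what those calls look like, with attribute and function names assumed from Twint's module documentation (the account name is a placeholder):

```python
import twint

# Configure the scrape; "someuser" is a placeholder account name.
c = twint.Config()
c.Username = "someuser"

# Timeline Tweets and liked Tweets -- both are capped at roughly
# the last 3200 Tweets because of the Twitter limit described above.
twint.run.Profile(c)
twint.run.Favorites(c)
```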
Installation:
git+pip3:
git clone https://github.com/twintproject/twint.git
cd twint
pip3 install -r requirements.txt
pip3 install twint
or pip3+pipenv:
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
pipenv install -e git+https://github.com/twintproject/twint.git#egg=twint
You may get a "module not found" error when you try to run twint after installation. On Ubuntu, add ~/.local/bin to your PATH with:
export PATH=$PATH:~/.local/bin
You can edit the ~/.bashrc file to permanently add ~/.local/bin to your PATH.
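For example, one way to make this permanent (assuming a Bash shell) is to append the export line shown above to ~/.bashrc and reload it:
echo 'export PATH=$PATH:~/.local/bin' >> ~/.bashrc
source ~/.bashrc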
Usage:
Running the twint command with arguments will give you results. A few simple examples to help you understand the basics:
twint -u username
- Scrape all the Tweets from the user's timeline.
twint -u username -s pineapple
- Scrape all Tweets from the user's timeline containing pineapple.
twint -s pineapple
- Collect every Tweet containing pineapple from everyone's Tweets.
twint -u username --year 2014
- Collect Tweets that were tweeted before 2014.
twint -u username --since "2015-12-20 20:30:15"
- Collect Tweets that were tweeted since 2015-12-20 20:30:15.
twint -u username --since 2015-12-20
- Collect Tweets that were tweeted since 2015-12-20 00:00:00.
twint -u username -o file.txt
- Scrape Tweets and save to file.txt.
twint -u username -o file.csv --csv
- Scrape Tweets and save as a csv file.
twint -u username --email --phone
- Show Tweets that might have phone numbers or email addresses.
twint -s "Donald Trump" --verified
- Display Tweets by verified users that Tweeted about Donald Trump.
twint -g="48.880048,2.385939,1km" -o file.csv --csv
- Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
twint -u username -es localhost:9200
- Output Tweets to Elasticsearch.
twint -u username -o file.json --json
- Scrape Tweets and save as a json file.
twint -u username --database tweets.db
- Save Tweets to a SQLite database.
twint -u username --followers
- Scrape a Twitter user's followers.
twint -u username --following
- Scrape who a Twitter user follows.
twint -u username --favorites
- Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
twint -u username --following --user-full
- Collect full user information of the people a user follows.
twint -u username --profile-full
- Use a slow, but effective method to gather Tweets from a user's profile (gathers ~3200 Tweets, including retweets).
twint -u username --retweets
- Use a quick method to gather the last 900 Tweets (including retweets) from a user's profile.
twint -u username --resume resume_file.txt
- Resume a search starting from the last saved scroll-id.
More details about the commands and options can be found in the wiki.
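Twint can also be used as a Python module instead of the CLI. As a rough sketch, the module equivalent of twint -u username -s pineapple -o file.csv --csv might look like this (configuration attribute names are assumed from Twint's module documentation; the account name is a placeholder):

```python
import twint

c = twint.Config()
c.Username = "username"   # placeholder account name
c.Search = "pineapple"    # only Tweets containing "pineapple"
c.Store_csv = True        # equivalent of --csv
c.Output = "file.csv"     # equivalent of -o file.csv
twint.run.Search(c)
```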