
Twint Installation and Usage

Source: Internet  Collected by: 自由互联  Published: 2021-06-25

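Twint is a Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter's API.

Twint uses Twitter's search operators to let you scrape Tweets from specific users, Tweets about particular topics, hashtags, and trends, or to pick out sensitive information from Tweets, such as email addresses and phone numbers.

Twint also makes special queries to Twitter that let you scrape a user's followers, the Tweets a user has liked, and the accounts they follow, all without an API, Selenium, or browser emulation.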

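Some benefits of using Twint instead of the Twitter API:

  1. Can fetch almost all Tweets (the Twitter API limits you to the last 3200 Tweets);
  2. Fast initial setup;
  3. Can be used anonymously, with no Twitter sign-up;
  4. No rate limits.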

Limits imposed by Twitter

Twitter limits how far you can scroll through a user's timeline. This means that with .Profile or .Favorites you can only retrieve about 3200 Tweets.
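
The .Profile and .Favorites entry points belong to Twint's Python module interface. The following is a minimal sketch of how that interface might be used, assuming twint is installed and that the Config attributes Username, Limit, Store_csv, and Output behave as described in the project wiki. The helper names configure and scrape_profile are hypothetical and not part of twint.

```python
from types import SimpleNamespace


def configure(config, username, limit=None, csv_path=None):
    """Set twint-style attributes on a Config-like object and return it."""
    config.Username = username
    if limit is not None:
        config.Limit = limit        # requested number of Tweets
    if csv_path is not None:
        config.Store_csv = True     # write results as CSV ...
        config.Output = csv_path    # ... to this file
    return config


def scrape_profile(username, csv_path="profile.csv"):
    """Scrape a user's timeline (subject to the ~3200-Tweet limit)."""
    import twint  # imported lazily so configure() works without twint
    twint.run.Profile(configure(twint.Config(), username, csv_path=csv_path))


# Dry run without twint: any object with settable attributes works.
preview = configure(SimpleNamespace(), "username", limit=100)
```

Swapping twint.run.Profile for twint.run.Favorites in scrape_profile would collect the user's liked Tweets instead, under the same limit.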

For more details, see the twint project on GitHub.

Installation:

git+pip3:

git clone https://github.com/twintproject/twint.git
cd twint
pip3 install -r requirements.txt
pip3 install .

or with pip3 or pipenv (use one of the two):

pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
pipenv install -e git+https://github.com/twintproject/twint.git#egg=twint

You may get a "not found" error when you try to run twint after installation, because pip3 install --user places the twint script in ~/.local/bin. On Ubuntu, add ~/.local/bin to your PATH with:

export PATH=$PATH:~/.local/bin

To add ~/.local/bin to your PATH permanently, put the same line in your ~/.bashrc file.
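
To confirm that the PATH change took effect, a small check along these lines may help. It is only a sketch using the Python standard library; the function names find_command and path_contains are made up for illustration.

```python
import os
import shutil


def find_command(name="twint", extra_dir="~/.local/bin"):
    """Return the path to `name`, also trying `extra_dir` if PATH misses it."""
    found = shutil.which(name)
    if found:
        return found
    # Fall back to the per-user bin directory used by `pip3 install --user`.
    return shutil.which(name, path=os.path.expanduser(extra_dir))


def path_contains(directory="~/.local/bin"):
    """Report whether `directory` is already listed in PATH."""
    target = os.path.realpath(os.path.expanduser(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(
        os.path.realpath(os.path.expanduser(entry)) == target
        for entry in entries
        if entry
    )
```

If find_command() returns a path but path_contains() returns False, twint is installed but the shell will not find it until PATH is updated.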

Usage:

Run the twint command with arguments to scrape and filter Tweets. A few simple examples to help you understand the basics:

  • twint -u username - Scrape all the Tweets from the user's timeline.
  • twint -u username -s pineapple - Scrape all Tweets from the user's timeline containing pineapple.
  • twint -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets.
  • twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
  • twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
  • twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
  • twint -u username -o file.txt - Scrape Tweets and save to file.txt.
  • twint -u username -o file.csv --csv - Scrape Tweets and save as a csv file.
  • twint -u username --email --phone - Show Tweets that might have phone numbers or email addresses.
  • twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
  • twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1km around a place in Paris and export them to a csv file.
  • twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
  • twint -u username -o file.json --json - Scrape Tweets and save as a json file.
  • twint -u username --database tweets.db - Save Tweets to a SQLite database.
  • twint -u username --followers - Scrape a Twitter user's followers.
  • twint -u username --following - Scrape who a Twitter user follows.
  • twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
  • twint -u username --following --user-full - Collect full user information for the accounts a person follows.
  • twint -u username --profile-full - Use a slow, but effective method to gather Tweets from a user's profile (gathers ~3200 Tweets, including Retweets).
  • twint -u username --retweets - Use a quick method to gather the last 900 Tweets (that includes retweets) from a user‘s profile.
  • twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.
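
When these commands are launched from a script, it can help to assemble the argument list programmatically. The sketch below uses only flags that appear in the list above (-u, -s, --since, -o, --csv, --json); the helper names build_twint_command and run_twint are hypothetical, and run_twint assumes twint is installed and on your PATH.

```python
import shlex
import subprocess


def build_twint_command(username=None, search=None, since=None,
                        output=None, fmt=None):
    """Build an argv list for the twint CLI from a few common options."""
    argv = ["twint"]
    if username:
        argv += ["-u", username]
    if search:
        argv += ["-s", search]
    if since:
        argv += ["--since", since]
    if output:
        argv += ["-o", output]
    if fmt in ("csv", "json"):
        argv.append("--" + fmt)
    elif fmt is not None:
        raise ValueError("fmt must be 'csv', 'json', or None")
    return argv


def run_twint(**options):
    """Run twint as a subprocess and raise if it exits with an error."""
    argv = build_twint_command(**options)
    print("Running:", shlex.join(argv))
    return subprocess.run(argv, check=True)
```

For example, build_twint_command(username="username", search="pineapple") returns ["twint", "-u", "username", "-s", "pineapple"]. Passing a list to subprocess.run avoids shell quoting problems with values such as "2015-12-20 20:30:15".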

More details about the commands and options can be found in the project wiki.
