Python snscrape: How to scrape tweet URL/link using snscrape?

Question:

What parameter should we use to include the URL/link of each tweet? So far I have the date, username, and content. Also, how can we convert the date in the dataframe to GMT+8? The timezone is currently UTC. Please see the code below for reference:

import snscrape.modules.twitter as sntwitter
import pandas as pd

query = "(from:elonmusk) until:2023-01-28 since:2023-01-27"
tweets = []
limit = 100000


for tweet in sntwitter.TwitterSearchScraper(query).get_items():
    
    if len(tweets) == limit:
        break
    else:
        tweets.append([tweet.date, tweet.username, tweet.content])
        
df = pd.DataFrame(tweets, columns=['Date', 'Username', 'Tweet'])

#Save to csv
df.to_csv('tweets.csv')
df
Asked By: jislesplr


Answers:

get_items() yields the search results one at a time, each as a class instance (a Tweet object).
So the number of tweets has to be counted inside the for loop.

The code below works.

Fetching 100K tweets is possible but takes too long, so I reduced the limit to 1K tweets.

import snscrape.modules.twitter as sntwitter
import pandas as pd

query = 'from:elonmusk since:2022-08-01 until:2023-01-28'
limit = 1000

tweets = sntwitter.TwitterSearchScraper(query).get_items()

index = 0
df = pd.DataFrame(columns=['Date','URL' ,'Tweet'])

for tweet in tweets:
    if index == limit:
        break
    URL = "https://twitter.com/{0}/status/{1}".format(tweet.user.username,tweet.id)
    df2 = {'Date': tweet.date, 'URL': URL, 'Tweet': tweet.rawContent}
    df = pd.concat([df, pd.DataFrame.from_records([df2])])
    index = index + 1

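# Convert the time zone from UTC to GMT+8 (UTC+8).
# Note: tz database names of the form 'Etc/GMT+N' use an inverted (POSIX) sign,
# so 'Etc/GMT+8' means UTC-8; 'Etc/GMT-8' is the zone 8 hours ahead of UTC.
df['Date'] = df['Date'].dt.tz_convert('Etc/GMT-8')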
print(df)

df.to_csv('tweets.csv')
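
The manual index counter in the loop above can also be expressed with itertools.islice, which stops consuming a generator after a fixed number of items. This is only a minimal sketch: fake_items is a hypothetical stand-in that imitates the lazy behaviour of get_items() and does not call snscrape.

```python
from itertools import islice


def fake_items():
    """Hypothetical stand-in for TwitterSearchScraper(query).get_items()."""
    n = 0
    while True:  # endless, like a very long search result
        yield {"id": n}
        n += 1


limit = 3
# islice stops after `limit` items, so no manual counter or break is needed
rows = [item["id"] for item in islice(fake_items(), limit)]
print(rows)
```

With snscrape, the same pattern would presumably wrap the get_items() call from the code above in islice(..., limit). Collecting the rows in a list and building the DataFrame once at the end also avoids calling pd.concat on every iteration.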

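Each item yielded by get_items() is a single Tweet object, so extract only the fields you need:

tweet.date -> Date

https://twitter.com/{tweet.user.username}/status/{tweet.id} -> URL (the same link is also available directly as tweet.url, as the "url" key in the dump below shows)

tweet.rawContent -> Tweet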
{
  "_type": "snscrape.modules.twitter.Tweet",
  "url": "https://twitter.com/elonmusk/status/1619164489710178307",
  "date": "2023-01-28T02:44:31+00:00",
  "rawContent": "@tn_daki @ShitpostGate Yup",
  "renderedContent": "@tn_daki @ShitpostGate Yup",
  "id": 1619164489710178307,
  "user": {
    "_type": "snscrape.modules.twitter.User",
    "username": "elonmusk",
    "id": 44196397,
    "displayname": "Mr. Tweet",
    "rawDescription": "",
    "renderedDescription": "",
    "descriptionLinks": null,
    "verified": true,
    "created": "2009-06-02T20:12:29+00:00",
    "followersCount": 127536699,
    "friendsCount": 176,
    "statusesCount": 22411,
    "favouritesCount": 17500,
    "listedCount": 113687,
    "mediaCount": 1367,
    "location": "",
    "protected": false,
    "link": null,
    "profileImageUrl": "https://pbs.twimg.com/profile_images/1590968738358079488/IY9Gx6Ok_normal.jpg",
    "profileBannerUrl": "https://pbs.twimg.com/profile_banners/44196397/1576183471",
    "label": null,
    "url": "https://twitter.com/elonmusk"
  }
    ... cut off
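
The field mapping above can be sketched without touching the network. The tweet object here is a stand-in built with SimpleNamespace from the values in the dump, and to_row is a hypothetical helper name, not part of snscrape. In a real Tweet object, date is a datetime rather than a string.

```python
from types import SimpleNamespace

# Stand-in carrying only fields shown in the dump above;
# a real snscrape Tweet has many more attributes.
tweet = SimpleNamespace(
    id=1619164489710178307,
    date="2023-01-28T02:44:31+00:00",
    rawContent="@tn_daki @ShitpostGate Yup",
    url="https://twitter.com/elonmusk/status/1619164489710178307",
    user=SimpleNamespace(username="elonmusk"),
)


def to_row(tweet):
    """Hypothetical helper: keep only the Date, URL and Tweet fields."""
    url = "https://twitter.com/{0}/status/{1}".format(tweet.user.username, tweet.id)
    return {"Date": tweet.date, "URL": url, "Tweet": tweet.rawContent}


row = to_row(tweet)
print(row["URL"])
# In this dump the rebuilt link is identical to the object's own url field
print(row["URL"] == tweet.url)
```

Since the two links agree here, reading tweet.url directly looks like the simpler option when the attribute is available.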

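Result

Note: this output was captured with tz_convert('Etc/GMT+8'), which denotes UTC-8 under the tz database's inverted sign convention, hence the -08:00 offsets. With 'Etc/GMT-8' the same instants would be displayed with a +08:00 offset.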

>python get-data.py
                        Date                                                URL                                              Tweet
0  2023-01-27 15:29:36-08:00  https://twitter.com/elonmusk/status/1619115435...                                  @farzyness No way
0  2023-01-27 15:14:05-08:00  https://twitter.com/elonmusk/status/1619111533...  @mtaibbi Please correct your bs @PolitiFact &a...
0  2023-01-27 14:52:55-08:00  https://twitter.com/elonmusk/status/1619106207...  @WallStreetSilv A quarter of all taxes just to...
0  2023-01-27 13:28:26-08:00  https://twitter.com/elonmusk/status/1619084945...           @nudubabba @mikeduncan Yeah, whole thing
0  2023-01-27 13:12:16-08:00  https://twitter.com/elonmusk/status/1619080876...  @TaraBull808 That’s way more monkeys than the ...
..                       ...                                                ...                                                ...
0  2022-12-14 11:14:53-08:00  https://twitter.com/elonmusk/status/1603106271...  @Jason Advertising revenue next year will be l...
0  2022-12-14 04:08:43-08:00  https://twitter.com/elonmusk/status/1602999020...                        @Balyx_ He would be welcome
0  2022-12-14 03:42:47-08:00  https://twitter.com/elonmusk/status/1602992493...  @NorwayMFA @TwitterSupport @jonasgahrstore @AH...
0  2022-12-14 03:35:14-08:00  https://twitter.com/elonmusk/status/1602990594...                                    @AvidHalaby Wow
0  2022-12-14 03:35:03-08:00  https://twitter.com/elonmusk/status/1602990549...                     @AvidHalaby Live & learn …

[1000 rows x 3 columns]
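
As a cross-check on the offset arithmetic, the conversion from UTC to UTC+8 can be reproduced with the standard library alone. This sketch uses a fixed offset rather than a named zone, so no sign convention is involved. The timestamp is the "date" value from the JSON dump earlier in this answer.

```python
from datetime import datetime, timedelta, timezone

# Timestamp taken from the JSON dump (UTC)
utc_time = datetime.fromisoformat("2023-01-28T02:44:31+00:00")

# A fixed UTC+8 offset; unambiguous, unlike the 'Etc/GMT+N' style names
utc_plus_8 = timezone(timedelta(hours=8))
local_time = utc_time.astimezone(utc_plus_8)

print(local_time.isoformat())  # 2023-01-28T10:44:31+08:00
```

As far as I know, pandas' tz_convert also accepts a tzinfo object like utc_plus_8 in place of a zone name, but treat that as an assumption and check it against your pandas version.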

Reference

Converting time zone pandas dataframe

Tweet URL format

Detailed information is here

Example:

URL = "https://twitter.com/elonmusk/status/1619111533216403456"

It is saved to the CSV file as:

0,2023-01-27 15:14:05-08:00,https://twitter.com/elonmusk/status/1619111533216403456,@mtaibbi Please correct your bs @PolitiFact & @snopes

This matches the tweet content in the DataFrame's Tweet column.

You can also add more columns: the user-level followersCount, friendsCount, statusesCount, favouritesCount, listedCount, and mediaCount (all visible under "user" in the dump above), and the tweet-level replyCount, retweetCount, likeCount, and viewCount.
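
A minimal sketch of collecting such extra columns follows. The stand-in object holds only user counters copied from the dump above. The tweet-level counters are not in the truncated dump, so they are left off the stand-in and read back as None. The constant names USER_FIELDS and TWEET_FIELDS are hypothetical.

```python
from types import SimpleNamespace

# Stand-in with user counters copied from the dump above; tweet-level
# counters are omitted because the dump is cut off before them.
tweet = SimpleNamespace(
    user=SimpleNamespace(followersCount=127536699, friendsCount=176, statusesCount=22411),
)

USER_FIELDS = ["followersCount", "friendsCount", "statusesCount"]
TWEET_FIELDS = ["replyCount", "retweetCount", "likeCount", "viewCount"]

row = {}
for name in USER_FIELDS:
    row[name] = getattr(tweet.user, name, None)
for name in TWEET_FIELDS:
    # None when the attribute is missing (as in this stand-in)
    row[name] = getattr(tweet, name, None)

print(row)
```

Using getattr with a default keeps the loop from failing if a given snscrape version does not expose one of the counters.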


Answered By: Bench Vue