Scraping one tweet per user using snscrape

Question:

I’m using the snscrape.modules.twitter.TwitterSearchScraper() function to scrape tweets for a specific location and time interval. The code is the following:

import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []

# enumerate() supplies the counter i, so the loop stops after 100 tweets.
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i == 100:
        break
    tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])

My question is whether there is a way to get only one tweet per user, because running the code above returns several tweets from some users.

Asked By: AlejandroDGR


Answers:

You could check whether the tweet.user.id has already been collected before adding the tweet to your list.

Here, I added a new list (called tweets_user_ids) to store the values of tweet.user.id. A tweet is added to the tweets_list variable only if its tweet.user.id does not already exist in the new list.

Code:

import snscrape.modules.twitter as sntwitter

loc = '40.4165, -3.70256, 10km'
query = 'geocode:"{}" since:2020-03-15 until:2020-05-01'.format(loc)
tweets_list = []
max_amount_of_tweets = 100
tweets_user_ids = []  # User ids already collected; used to skip duplicates.
i = 0  # Number of tweets added to tweets_list so far.

for tweet in sntwitter.TwitterSearchScraper(query).get_items():
  # Add the tweet only if this user has not been seen yet.
  # (Pre-registering the first id before this check would silently drop that user's tweet.)
  if tweet.user.id not in tweets_user_ids:
    tweets_user_ids.append(tweet.user.id)
    tweets_list.append([tweet.date, tweet.user.username, tweet.user.id, tweet.coordinates, tweet.rawContent])
    i += 1

  # Break the loop when the max amount of tweets is reached.
  if i == max_amount_of_tweets:
    break

print(tweets_list)
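
As a variation on the same idea, the duplicate check can be factored into a small helper that keeps the seen user ids in a set, which gives constant-time membership checks on average instead of scanning a list. This is only a sketch: the name first_tweet_per_user and its limit parameter are hypothetical, and the demo uses stand-in objects built with SimpleNamespace rather than real snscrape tweets, so it runs without network access.

```python
from itertools import islice
from types import SimpleNamespace


def first_tweet_per_user(tweets, limit=None):
    """Return an iterator over the first tweet seen for each tweet.user.id.

    `tweets` is any iterable of objects exposing `.user.id`.
    `limit` caps how many unique-user tweets are produced (None = no cap).
    """
    seen_user_ids = set()  # user ids that already contributed a tweet

    def unique():
        for tweet in tweets:
            if tweet.user.id in seen_user_ids:
                continue  # this user already has a tweet; skip it
            seen_user_ids.add(tweet.user.id)
            yield tweet

    # islice stops after `limit` items without pulling more from the source.
    return islice(unique(), limit)


# Stand-in data (hypothetical) shaped like the attributes used above.
fake_tweets = [
    SimpleNamespace(user=SimpleNamespace(id=1, username="ana"), rawContent="a1"),
    SimpleNamespace(user=SimpleNamespace(id=1, username="ana"), rawContent="a2"),
    SimpleNamespace(user=SimpleNamespace(id=2, username="luis"), rawContent="b1"),
    SimpleNamespace(user=SimpleNamespace(id=3, username="eva"), rawContent="c1"),
]

unique_tweets = list(first_tweet_per_user(fake_tweets, limit=2))
print([t.rawContent for t in unique_tweets])  # ['a1', 'b1']
```

With snscrape, the same helper should accept sntwitter.TwitterSearchScraper(query).get_items() as the tweets argument, with limit=100 to match the original cap.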