How to scrape Twitter users based on a certain query using snscrape

Question:

I am using snscrape to scrape users who have a certain keyword in their bio.
The algorithm I am currently using is the following:

  1. Search for tweets that contain certain words.
  2. Extract the user who posted each tweet.
  3. Filter the extracted user's bio against the wanted keywords (if the bio includes the keywords, append that user to a data frame; if not, discard the user).
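The three steps above can be sketched in plain Python. This is a minimal sketch, assuming snscrape tweet objects expose the author as `tweet.user` with a `username` and a bio attribute (`rawDescription` in newer versions, `description` in older ones). `get_bio` and `users_matching_bio` are hypothetical helper names, and the demo uses stand-in objects with made-up data instead of real scraped tweets.

```python
from types import SimpleNamespace


def get_bio(user) -> str:
    """Return a user's bio, trying the attribute names snscrape has used for it."""
    # Assumption: newer snscrape versions expose `rawDescription`, older ones `description`.
    return getattr(user, "rawDescription", None) or getattr(user, "description", None) or ""


def users_matching_bio(tweets, keywords, limit=None):
    """Steps 2 and 3: take each tweet's author and keep unique users whose bio contains any keyword."""
    wanted = [k.lower() for k in keywords]
    seen = set()  # usernames already checked, so each author is tested only once
    rows = []
    for i, tweet in enumerate(tweets):
        if limit is not None and i >= limit:  # stop after `limit` tweets
            break
        user = tweet.user
        if user.username in seen:
            continue
        seen.add(user.username)
        bio = get_bio(user)
        if any(k in bio.lower() for k in wanted):  # case-insensitive substring match
            rows.append({"username": user.username, "bio": bio})
    return rows


# Demo with stand-in objects shaped like snscrape tweets (hypothetical data, no network access).
fake_tweets = [
    SimpleNamespace(user=SimpleNamespace(username="alice", rawDescription="Co-founder & CTO at Acme")),
    SimpleNamespace(user=SimpleNamespace(username="bob", rawDescription="Gardener")),
    SimpleNamespace(user=SimpleNamespace(username="alice", rawDescription="Co-founder & CTO at Acme")),
]
print(users_matching_bio(fake_tweets, ["CEO", "CTO"]))
# [{'username': 'alice', 'bio': 'Co-founder & CTO at Acme'}]
```

With snscrape installed, you would pass `sntwt.TwitterSearchScraper(query).get_items()` as `tweets` (with `limit=5000`), and `pandas.DataFrame(rows)` would turn the result into a data frame. Deduplicating by username avoids re-checking an author who tweets many times.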

Now, what I want to know is: is there a method that searches for users directly based on their bio (i.e., one that simulates the advanced search feature of the Twitter web page), instead of what I am doing right now?
I looked at the snscrape docs, but all the classes that deal with users appear to handle only a specific user, not a search for users based on a query.

Here is the code I am currently running:

import snscrape.modules.twitter as sntwt

query = "co founder (CEO OR Congrees OR CTO) lang:en"
tweets = []
limit = 5000
# instead of searching for tweets I want to search for users
for i, tweet in enumerate(sntwt.TwitterSearchScraper(query).get_items()):
    if i >= limit:  # stop after `limit` tweets
        break
    print(vars(tweet))
    print('\n\n\n\n')  # blank lines between tweets
    # some code that filters the users

Finally, here is a screenshot of Twitter's advanced search showing the behavior I want.


Asked By: Adel Moustafa


Answers:

Check out https://github.com/JustAnotherArchivist/snscrape/issues/263. As of this writing, this is still an open issue, but JustAnotherArchivist (the repository owner) appears to have committed an update a few weeks ago that allows this functionality (it might not be documented yet, or might not be reliable).

I think this requires the development version of snscrape, so install or upgrade to it if you don’t have it yet (as described in a Medium article):

$ pip3 install git+https://github.com/JustAnotherArchivist/snscrape.git

This should then allow the "--user" flag to work (I’m using snscrape from the command line; I’m not sure about the Python wrapper). For example:

$ snscrape --jsonl --max-results 10 twitter-search --user "go bananas since:2022-12-31" > out_file.json

This seems to search for the query string "go bananas" anywhere in the user object. For example, it returns a user object with ‘username’: ‘gobananagoband’ and ‘displayname’: ‘Go Banana Go!’. It also returns a user object with ‘description’: "When games go BANANAS, we’ve got you covered in bunches. Tips? @ reply or bananasalert at gmail." (As far as I can tell, ‘description’, ‘rawDescription’, and ‘renderedDescription’ all hold the user bio.)

I am not sure whether you can restrict the match to the "description" field alone; I have not experimented much yet.
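One workaround, sketched here under the assumption that each JSONL line is a user object carrying the bio in `rawDescription` or `description` as described above, is to run the same `--user` search and then discard results whose bio does not contain the phrase. `run_user_search` and `filter_by_bio` are hypothetical helper names, not part of snscrape; the CLI flags are exactly the ones from the command above, and the demo runs on made-up lines rather than real output.

```python
import json
import subprocess


def run_user_search(query, max_results=10):
    """Call the snscrape CLI in user-search mode (flags from the answer) and return its JSONL lines."""
    # Requires the development version of snscrape on PATH; not exercised in the demo below.
    cmd = ["snscrape", "--jsonl", "--max-results", str(max_results),
           "twitter-search", "--user", query]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()


def filter_by_bio(jsonl_lines, phrase):
    """Keep only user objects whose bio (not username or display name) contains `phrase`."""
    phrase = phrase.lower()
    matches = []
    for line in jsonl_lines:
        if not line.strip():
            continue  # skip blank lines
        user = json.loads(line)
        # Assumption: the bio is in `rawDescription`, falling back to `description`.
        bio = user.get("rawDescription") or user.get("description") or ""
        if phrase in bio.lower():  # case-insensitive substring match
            matches.append(user)
    return matches


# Demo on made-up JSONL lines shaped like the user objects described above.
sample = [
    '{"username": "gobananagoband", "displayname": "Go Banana Go!", "rawDescription": "A band"}',
    '{"username": "example_user", "rawDescription": "When games go BANANAS, we cover it"}',
]
print([u["username"] for u in filter_by_bio(sample, "go bananas")])
# ['example_user']
```

In real use you would call `filter_by_bio(run_user_search("go bananas since:2022-12-31"), "go bananas")`. Filtering client-side trades extra downloaded results for a precise bio-only match.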

This does support some of the other search operators/qualifiers. For example, geolocation (from the operator list; here, within 100 km of Twitter HQ):

$ snscrape --jsonl --max-results 10 twitter-search --user "elephant geocode:37.7,-122.4,100km lang:eng since:2022-12-31" > out_file.json
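The query string is just the keywords followed by space-separated operators. As a minimal sketch, a hypothetical helper (`build_user_query`, not part of snscrape) can assemble it from parts; the values below simply reproduce the command above, and `shlex.join` shows the equivalent shell command with the query correctly quoted.

```python
import shlex


def build_user_query(keywords, geocode=None, lang=None, since=None):
    """Hypothetical helper: append Twitter search operators to the keywords."""
    parts = [keywords]
    if geocode is not None:
        lat, lon, radius = geocode  # e.g. (37.7, -122.4, "100km")
        parts.append(f"geocode:{lat},{lon},{radius}")
    if lang is not None:
        parts.append(f"lang:{lang}")
    if since is not None:
        parts.append(f"since:{since}")
    return " ".join(parts)


query = build_user_query("elephant", geocode=(37.7, -122.4, "100km"),
                         lang="eng", since="2022-12-31")
print(query)
# elephant geocode:37.7,-122.4,100km lang:eng since:2022-12-31
print(shlex.join(["snscrape", "--jsonl", "--max-results", "10",
                  "twitter-search", "--user", query]))
# snscrape --jsonl --max-results 10 twitter-search --user 'elephant geocode:37.7,-122.4,100km lang:eng since:2022-12-31'
```

Passing the query as a single list element (as with `subprocess.run`) avoids shell-quoting mistakes entirely.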
Answered By: J Prestone