Find all hashtags

Question:

I have a class Tweet and a list that contains several Tweet objects. Each tweet also has a user, a number of retweets and an age, which are not relevant to my question. Only the content matters.

  • tweet1 = Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303)

  • tweet2 = Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500)

  • tweet3 = Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200)

  • tweets = [tweet1, tweet2, tweet3]

I need to get a list of all the hashtags, but I only get the one from the 1st tweet with my code.

for x in tweets:
    return re.findall(r'#\w+', x.content)
Asked By: babygroot


Answers:

You are returning after the first iteration of the loop, so only the first tweet is ever examined. You need to go through all the tweets and add the hashtags to a list:

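import re

def get_hashtags(tweets):
    result = []
    for x in tweets:
        # findall returns a list of matches; extend adds them one by one
        result.extend(re.findall(r'#\w+', x.content))
    # return only after every tweet has been processed
    return result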
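
To try this out end to end, here is a minimal sketch. The Tweet dataclass below is a hypothetical stand-in for the asker's class; the field names and their order (user, content, age, retweets) are assumptions based on the question and on the x.content and x.retweets attributes used in this answer.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in; field names and order are assumed.
    user: str
    content: str
    age: float
    retweets: int


def get_hashtags(tweets):
    """Collect the hashtags of every tweet, duplicates included."""
    result = []
    for tweet in tweets:
        result.extend(re.findall(r"#\w+", tweet.content))
    return result


tweets = [
    Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303),
    Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500),
    Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200),
]

print(get_hashtags(tweets))  # ['#bigsmart', '#bigsmart', '#heart']
```

If each hashtag should appear only once, list(dict.fromkeys(get_hashtags(tweets))) removes the duplicates while keeping the order of first appearance.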

For sorting, you can use a defaultdict to add up the retweets for each hashtag. Then, sort by that total.

import re
from collections import defaultdict

def get_hashtags_sorted(tweets):
    result = defaultdict(int)
    for x in tweets:
        for hashtag in re.findall(r'#\w+', x.content):
            result[hashtag] += x.retweets
    # sort the (hashtag, total retweets) pairs by total, ascending
    return sorted(result.items(), key=lambda item: item[1])
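
If the most retweeted hashtag should come first, one alternative is collections.Counter, whose most_common() method returns the pairs in descending order. This is a sketch of a swapped-in technique, not part of the function above. The Tweet dataclass and the name hashtags_by_retweets are hypothetical, and the field order (user, content, age, retweets) is an assumption.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical stand-in; field names and order are assumed.
    user: str
    content: str
    age: float
    retweets: int


def hashtags_by_retweets(tweets):
    """Return (hashtag, total retweets) pairs, highest total first."""
    totals = Counter()
    for tweet in tweets:
        for hashtag in re.findall(r"#\w+", tweet.content):
            totals[hashtag] += tweet.retweets
    return totals.most_common()


tweets = [
    Tweet("@realDonaldTrump", "Despite the negative press covfefe #bigsmart", 1249, 54303),
    Tweet("@elonmusk", "Technically, alcohol is a solution #bigsmart", 366.4, 166500),
    Tweet("@CIA", "We can neither confirm nor deny that this is our first tweet. #heart", 2192, 284200),
]

print(hashtags_by_retweets(tweets))  # [('#heart', 284200), ('#bigsmart', 220803)]
```

The same descending order can be had from the defaultdict version by passing reverse=True to sorted.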
Answered By: jprebys