How to extract all YouTube comments using YouTube API? (Python)

Question:

Let’s say I have a video_id having 8487 comments.
This code returns only 4309 comments.

def get_comments(youtube, video_id, comments=[], token=''):

  video_response=youtube.commentThreads().list(part='snippet',
                                               videoId=video_id,
                                               pageToken=token).execute()
  for item in video_response['items']:
        comment = item['snippet']['topLevelComment']
        text = comment['snippet']['textDisplay']
        comments.append(text)
  if "nextPageToken" in video_response: 
    return get_comments(youtube, video_id, comments, video_response['nextPageToken'])
  else:
    return comments

from googleapiclient.discovery import build

youtube = build('youtube', 'v3', developerKey=api_key)
comment_threads = get_comments(youtube,video_id)
print(len(comment_threads))

> 4309

How can I extract all the 8487 comments?

Asked By: Subham

||

Answers:

For commentThreads, you have to add replies to the part parameter in order to retrieve the replies that the comments might have.

So, your request should look like this:

video_response=youtube.commentThreads().list(part='id,snippet,replies',
                                               videoId=video_id,
                                               pageToken=token).execute()

Then modify your code accordingly to read the replies to each comment.

In this example, which I made using the try-it feature available in the documentation, you can check that the response contains both the top-level comment and its replies.
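
As a rough illustration of reading the replies out of each returned thread, here is a minimal sketch. The helper name texts_from_thread and the sample item are made up for this example. It assumes the request used part='id,snippet,replies'. As far as I know, the inline replies list can hold only a subset of the replies for busy threads, so treat it as a starting point rather than a complete solution.

```python
def texts_from_thread(item):
    """Return the top-level comment text followed by any inline reply texts."""
    top = item['snippet']['topLevelComment']
    texts = [top['snippet']['textDisplay']]
    # The 'replies' key is only present when the thread has replies
    # and the request asked for the 'replies' part.
    for reply in item.get('replies', {}).get('comments', []):
        texts.append(reply['snippet']['textDisplay'])
    return texts


# Hypothetical thread item, trimmed to the fields used above.
sample_item = {
    'snippet': {
        'topLevelComment': {'id': 'abc', 'snippet': {'textDisplay': 'Great video'}},
        'totalReplyCount': 2,
    },
    'replies': {
        'comments': [
            {'snippet': {'textDisplay': 'Agreed'}},
            {'snippet': {'textDisplay': 'Thanks'}},
        ]
    },
}
print(texts_from_thread(sample_item))
```

In get_comments, the loop body could then call comments.extend(texts_from_thread(item)) instead of appending only the top-level text.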


Edit (08/04/2022):

Create a new variable that holds the totalReplyCount of each top-level comment.

Something like:

def get_comments(youtube, video_id, comments=None, token=''):
  # Use None instead of a mutable default so separate calls don't share one list.
  if comments is None:
    comments = []

  video_response = youtube.commentThreads().list(part='snippet',
                                                 videoId=video_id,
                                                 pageToken=token).execute()
  for item in video_response['items']:
    comment = item['snippet']['topLevelComment']
    text = comment['snippet']['textDisplay']
    comments.append(text)

    # Total number of replies this top-level comment has.
    totalReplyCount = item['snippet']['totalReplyCount']

    # If the total reply count is greater than zero, call the new function
    # "getAllTopLevelCommentReplies(topCommentId, replies, token)" with a fresh
    # list and extend "comments" with the replies it returns.
    if totalReplyCount > 0:
      comments.extend(getAllTopLevelCommentReplies(comment['id'], [], None))

  if "nextPageToken" in video_response:
    return get_comments(youtube, video_id, comments, video_response['nextPageToken'])
  else:
    return comments

Then, if the value of totalReplyCount is greater than zero, make another call using comments.list to fetch the replies that the top-level comment has.
For this new call, you have to pass the id of the top-level comment as parentId.

Example (untested):

# Returns all replies that a top-level comment has.
# topCommentId = the id of the top-level comment whose replies you want to retrieve.
# replies = list that accumulates the replies returned by this function.
# token = comments.list might have more than 100 replies to return; if so, use the nextPageToken to retrieve the next batch of results.
# Note: this uses the module-level "youtube" client created with build().
def getAllTopLevelCommentReplies(topCommentId, replies, token):
  replies_response = youtube.comments().list(part='snippet',
                                             maxResults=100,
                                             parentId=topCommentId,
                                             pageToken=token).execute()

  for item in replies_response['items']:
    # Append the reply's text to the list of replies.
    replies.append(item['snippet']['textDisplay'])

  if "nextPageToken" in replies_response:
    return getAllTopLevelCommentReplies(topCommentId, replies, replies_response['nextPageToken'])
  else:
    return replies
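
The two recursive functions above could also be written with loops, which avoids deep recursion on videos with many pages. This is only a sketch under a few assumptions: the names iter_pages, fetch_replies and fetch_all_comments are made up here, youtube is the client returned by build(), and maxResults=100 is used for both calls to cut down the number of requests. I have not run it against the live API.

```python
def iter_pages(list_method, **params):
    """Yield every page of a list call, following nextPageToken."""
    token = None
    while True:
        if token:
            # Only send pageToken once the API has handed one back.
            params['pageToken'] = token
        response = list_method(**params).execute()
        yield response
        token = response.get('nextPageToken')
        if not token:
            break


def fetch_replies(youtube, parent_id):
    """Return the text of every reply to one top-level comment."""
    replies = []
    for page in iter_pages(youtube.comments().list, part='snippet',
                           parentId=parent_id, maxResults=100):
        for item in page.get('items', []):
            replies.append(item['snippet']['textDisplay'])
    return replies


def fetch_all_comments(youtube, video_id):
    """Return the text of every top-level comment, each followed by its replies."""
    texts = []
    for page in iter_pages(youtube.commentThreads().list, part='snippet',
                           videoId=video_id, maxResults=100):
        for item in page.get('items', []):
            top = item['snippet']['topLevelComment']
            texts.append(top['snippet']['textDisplay'])
            # Only spend an extra request when the thread has replies.
            if item['snippet']['totalReplyCount'] > 0:
                texts.extend(fetch_replies(youtube, top['id']))
    return texts
```

Usage would be along the lines of len(fetch_all_comments(youtube, video_id)). Each thread with replies costs at least one extra request, so quota usage grows with the number of threads that have replies.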

Edit (11/04/2022):

I’ve added a Google Colab example that I modified based on your code. It works with my video example (ouf0ozwnU84), returning all 130 of its comments, but with your video example (BaGgScV4NN8) I got 3300 of 3359 comments.

The difference might be because some comments are held for approval or moderation, because some comments are too old and additional filters are needed, because the API is buggy (see here for some other questions about pagination problems with the API), or because of something else I’m missing. I suggest you check this tutorial, which shows code that you can adapt.
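
To see how far off a run is, one option is to compare the number of comments collected with the count the API itself reports for the video. The sketch below is an assumption-laden example: reported_comment_count is a made-up helper name, youtube is the client returned by build(), and I am assuming statistics.commentCount is the figure closest to the number shown on the video page. The field comes back as a string and may be missing, for instance when comments are disabled.

```python
def reported_comment_count(youtube, video_id):
    """Return the comment count reported for the video, or None if unavailable."""
    response = youtube.videos().list(part='statistics', id=video_id).execute()
    items = response.get('items', [])
    if not items:
        # Unknown or inaccessible video id.
        return None
    count = items[0]['statistics'].get('commentCount')
    # The API returns the count as a string, so convert it for comparison.
    return int(count) if count is not None else None
```

For example, comparing reported_comment_count(youtube, video_id) with len(comment_threads) shows how many comments the pagination did not return.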
