How to get the most repeated elements in a dataframe/array
Question:
I compiled a list of the top artists for each year across 14 years, and I want to find the top 7 for those years combined. My idea was to gather them all in a dataframe and then pick the most repeated artists, but it didn’t work out.
# Collecting the top 7 artists across the 14 years
artists = []
year = 2020
while year >= 2006:
    TAChart = billboard.ChartData('Top-Artists', year=year)
    artists.append(str(TAChart))
    year -= 1
len(artists)
Artists = pd.DataFrame(artists)
n = 7
Artists.value_counts().index.tolist()[:n]
Answers:
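You’re very close. You need a list of artist names for each year, flattened into a single list, and then call value_counts:
# assumes each element of artists is a list of artist names, not str(TAChart)
artists_flat = [a for lst in artists for a in lst]
pd.Series(artists_flat).value_counts().head(n)
Your current code appends str(TAChart), so it counts each year's entire chart as one string rather than counting individual artists. Flattening a list of strings would only split them into single characters, so the loop has to append the names themselves (for example artists.append([entry.artist for entry in TAChart]), assuming the chart's entries expose an artist attribute).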
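Also, note that I used head(n) rather than slicing the index, which keeps each artist's count next to the name. Both return exactly n rows, so if ties for the nth place matter, value_counts().nlargest(n, keep='all') includes them.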
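To make the flatten-then-count step concrete, here is a minimal sketch on made-up data. The nested list charts_by_year and the placeholder names in it are hypothetical stand-ins for the per-year results, not real chart data.

```python
import pandas as pd

# Hypothetical stand-in for the per-year results: one list of names per year.
charts_by_year = [
    ["Artist A", "Artist B", "Artist C"],
    ["Artist B", "Artist A", "Artist D"],
    ["Artist A", "Artist D", "Artist E"],
]

# Flatten the list of lists into one flat list of names.
artists_flat = [name for chart in charts_by_year for name in chart]

# Count how often each name appears, most frequent first.
counts = pd.Series(artists_flat).value_counts()

n = 2
top_n = counts.head(n)                            # always exactly n rows
top_n_with_ties = counts.nlargest(n, keep="all")  # also keeps names tied with the nth
```

Here Artist B and Artist D both appear twice, so head(2) keeps only one of them while nlargest(2, keep="all") returns three rows.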
You can try something like this:
import pandas as pd

# Create a pandas DataFrame from the list
List = ['AB', 'B', 'B', 'A', 'A', 'D', 'C', 'B']
df = pd.DataFrame({'Artist': List})
# Create a new DataFrame to store the counts
# with an appropriate column name;
# value_counts() returns the count of each unique value.
# Renaming the Series first names the column 'Count' regardless of pandas version.
df1 = df['Artist'].value_counts().rename('Count').to_frame()
# Out[df1]:
#     Count
# B       3
# A       2
# AB      1
# D       1
# C       1
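If only the top names are needed and not a DataFrame, the same count can be sketched with the standard library. This swaps pandas for collections.Counter, which is not what the answer above uses; the list reuses the sample values from that answer.

```python
from collections import Counter

# Same sample values as in the answer above.
names = ['AB', 'B', 'B', 'A', 'A', 'D', 'C', 'B']

# most_common(k) returns the k most frequent (value, count) pairs,
# ordered from the highest count down.
top_two = Counter(names).most_common(2)
print(top_two)  # [('B', 3), ('A', 2)]
```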