Get the most common word in a MySQL table using Python
Question:
I have a table full of movie genres, like this:
id | genre
---+----------------------------
1 | Drama, Romance, War
2 | Drama, Musical, Romance
3 | Adventure, Biography, Drama
I'm looking for a way to get the most common word in the whole genre column and store it in a variable for a later step in Python.
I'm new to Python, so I really don't know how to do it. Currently I have these lines to connect to the database, but I don't know how to get the most common word described above.
conn = mysql.connect()
cursor = conn.cursor()
most_common_word = cursor.execute()
cursor.close()
conn.close()
Answers:
First you need to get the list of words in each genre value, i.e. create another table like this:
genre_words(genre_id bigint, word varchar(50))
For hints on how to do that, see this question:
SQL split values to multiple rows
You can do that with a temporary table if you wish, or do it inside a transaction and roll back afterwards. Which one to choose depends on the size of your data and on the machine the database runs on.
After that, the query is really simple:
select count(*) as c, word from genre_words group by word order by count(*) desc limit 1;
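The split-then-count idea above can be sketched end to end as follows. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL so that it runs without a server, it does the splitting in Python rather than in SQL, and the function name most_common_word_sql is hypothetical. With a MySQL driver the placeholder style is usually %s rather than ?.

```python
import sqlite3

# Sample rows copied from the question: (id, genre)
movies = [
    (1, "Drama, Romance, War"),
    (2, "Drama, Musical, Romance"),
    (3, "Adventure, Biography, Drama"),
]


def most_common_word_sql(rows):
    """Fill genre_words with one row per word, then count in SQL.

    Returns a (count, word) tuple, or None if there are no words.
    """
    # Assumption: in-memory SQLite stands in for the real MySQL connection.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE genre_words (genre_id INTEGER, word VARCHAR(50))"
        )
        # Split each genre string on commas and strip surrounding spaces.
        pairs = [
            (genre_id, word.strip())
            for genre_id, genre in rows
            for word in genre.split(",")
            if word.strip()
        ]
        conn.executemany(
            "INSERT INTO genre_words (genre_id, word) VALUES (?, ?)", pairs
        )
        # Same query as in the answer; the extra "word" sort key only
        # makes the result deterministic when two words are tied.
        return conn.execute(
            "SELECT COUNT(*) AS c, word FROM genre_words "
            "GROUP BY word ORDER BY COUNT(*) DESC, word LIMIT 1"
        ).fetchone()
    finally:
        conn.close()


count, most_common_word = most_common_word_sql(movies)
print(most_common_word, count)
```

For the three sample rows, Drama appears in every row, so it comes out as the most common word.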
You can also do it in Python, although then it is no longer really a MySQL question. Read the table and build a simple mapping from word to counter: if a word is new, add it; if it already exists, increase its counter.
from collections import Counter

# Connect to the database and get the rows from the table
rows = ...

# Create a list to hold all of the genres
genres = []

# Loop through each row and split the genre string on commas,
# stripping the space that follows each comma
for row in rows:
    genre_list = [g.strip() for g in row['genre'].split(',')]
    genres.extend(genre_list)

# Use a Counter to count the number of occurrences of each genre
genre_counts = Counter(genres)

# most_common(1) returns a list holding one (genre, count) tuple
most_common_genre, count = genre_counts.most_common(1)[0]

# Print the most common genre
print(most_common_genre)
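To tie the Counter approach back to the cursor code in the question, here is a minimal sketch of how the rows might be fetched. Several things are assumptions rather than facts from the question: the table name movies is hypothetical, the cursor is assumed to return plain tuples (the default for most DB-API drivers, whereas a dict cursor would need row['genre'] instead), and sqlite3 is used only as a stand-in so the example runs without a MySQL server.

```python
import sqlite3
from collections import Counter


def most_common_genre(cursor):
    """Return the most frequent word in the genre column, or None."""
    # Assumption: the table is called "movies"; adjust to the real name.
    cursor.execute("SELECT genre FROM movies")
    counts = Counter()
    for (genre,) in cursor.fetchall():
        if genre:  # skip NULL or empty genre values
            counts.update(
                word.strip() for word in genre.split(",") if word.strip()
            )
    if not counts:
        return None
    # most_common(1) returns [(word, count)]; keep only the word.
    return counts.most_common(1)[0][0]


# Stand-in database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute("CREATE TABLE movies (id INTEGER, genre TEXT)")
cursor.executemany(
    "INSERT INTO movies (id, genre) VALUES (?, ?)",
    [
        (1, "Drama, Romance, War"),
        (2, "Drama, Musical, Romance"),
        (3, "Adventure, Biography, Drama"),
    ],
)

most_common_word = most_common_genre(cursor)
cursor.close()
conn.close()
print(most_common_word)
```

Counting in Python keeps the schema untouched, at the cost of pulling every genre value over the connection; for a very large table the SQL approach may be the better fit.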