KeyError: "['title'] not in index"
Question:
I am using the Netflix dataset to learn Python. I am trying to split the director names on commas and create a new DataFrame, but I get an error saying that title is not in the index. Please help.
show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Action & Adventure","To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war."
s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series."
df =pd.read_csv("/content/sample_data/netflix_titles.csv",index_col=0)
splitters = df['director'].apply(lambda x: str(x).split(',')).tolist()
df_director = pd.DataFrame(splitters,index=df['title']).stack()
df_director3 = pd.DataFrame(df_director)
df_director3.reset_index()
df_directors = df_director3[['title',0]]
df_directors
KeyError: "['title'] not in index"
Answers:
I assume that you want to split the director names on commas and create a new DataFrame with the title and each individual director name.
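In your code the result of reset_index() is discarded: the method returns a new DataFrame rather than modifying df_director3 in place, so title is still part of the index when you try to select it as a column. Call reset_index() right after stack() and keep the result, which turns the index levels back into regular columns, and then use rename() to give the new columns names: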
# Split the director names on commas
splitters = df['director'].apply(lambda x: str(x).split(',')).tolist()
# Create a new DataFrame with stacked directors
df_director = pd.DataFrame(splitters, index=df['title']).stack().reset_index()
df_director = df_director.rename(columns={'level_1': 'director_index', 0: 'director'})
# Select the title and director columns
df_directors = df_director[['title', 'director']]
Here, we’re renaming the column level_1 (the unnamed inner index level created by stack(), which reset_index() labels level_1) to director_index, and the column 0 to director. Finally, we’re selecting only the columns we want (title and director) in df_directors.
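To see why the original code raised the KeyError, here is a minimal sketch on a hypothetical two-row frame standing in for the Netflix CSV (the variable names stacked and flat are made up for illustration). It only shows that reset_index() leaves the original object untouched unless the result is assigned:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the Netflix CSV.
df = pd.DataFrame({
    "title": ["Stranger Things", "The Crown"],
    "director": ["Matt Duffer, Ross Duffer", "Peter Morgan"],
})

splitters = df["director"].apply(lambda x: str(x).split(",")).tolist()
stacked = pd.DataFrame(pd.DataFrame(splitters, index=df["title"]).stack())

# reset_index() returns a new DataFrame; `stacked` itself is unchanged,
# so "title" is still an index level here, not a column.
stacked.reset_index()
title_is_column_before = "title" in stacked.columns

# Assigning the result keeps the change: "title" is now a regular column.
flat = stacked.reset_index()
title_is_column_after = "title" in flat.columns

print(title_is_column_before, title_is_column_after)
print(list(flat.columns))
```

Selecting stacked[['title', 0]] would fail in the same way as in the question, while flat[['title', 0]] works.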
Assuming the input is:
{'title': ['Stranger Things', 'The Crown', 'House of Cards'],
'director': ['Matt Duffer, Ross Duffer', 'Peter Morgan', 'Beau Willimon, David Fincher, Joel Schumacher']}
The output should be something like this:
             title         director
0  Stranger Things      Matt Duffer
1  Stranger Things      Ross Duffer
2        The Crown     Peter Morgan
3   House of Cards    Beau Willimon
4   House of Cards    David Fincher
5   House of Cards  Joel Schumacher
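As an alternative sketch (not the approach used above), the same result can probably be reached with Series.str.split() and DataFrame.explode(), assuming a pandas version that has explode() (0.25 or later). Two details are handled here that the stack() version leaves alone: split(',') keeps the space after each comma, and str(x) turns a missing director into the string 'nan'. Whether rows without a director should be dropped, as done below, is an assumption about what you want.

```python
import pandas as pd

# The sample input from above, built as a DataFrame.
df = pd.DataFrame({
    "title": ["Stranger Things", "The Crown", "House of Cards"],
    "director": [
        "Matt Duffer, Ross Duffer",
        "Peter Morgan",
        "Beau Willimon, David Fincher, Joel Schumacher",
    ],
})

df_directors = (
    df[["title", "director"]]
    .dropna(subset=["director"])                             # assumption: skip rows with no director
    .assign(director=lambda d: d["director"].str.split(","))  # each cell becomes a list of names
    .explode("director")                                     # one row per list element
    .assign(director=lambda d: d["director"].str.strip())    # remove the space left after each comma
    .reset_index(drop=True)
)

print(df_directors)
```

On the real file you would start from the df returned by read_csv instead of the hand-built frame.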
BTW, try to avoid posting data as images.