KeyError: "['title'] not in index"


I am using the Netflix dataset (netflix_titles.csv) to learn Python. I am trying to split the director names on commas and create a new DataFrame, but I get an error saying that title is not in the index. Please help.


s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable."
s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth."
s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Action & Adventure","To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war."
s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series."

import pandas as pd

df = pd.read_csv("/content/sample_data/netflix_titles.csv", index_col=0)

splitters = df['director'].apply(lambda x: str(x).split(',')).tolist()

df_director = pd.DataFrame(splitters,index=df['title']).stack()

df_director3 = pd.DataFrame(df_director)


df_directors = df_director3[['title',0]]


KeyError: "['title'] not in index"


Asked By: logeeks



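I assume that you want to split the director names on commas and create a new DataFrame with the title and each individual director name.

The error occurs because, after stack(), title is a level of the index rather than a column, so df_director3[['title', 0]] cannot find it. Call reset_index() after stack() to turn the index levels back into regular columns, then use rename() to give the new columns names: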

# Split the director names on commas
splitters = df['director'].apply(lambda x: str(x).split(',')).tolist()

# Create a new DataFrame with stacked directors
df_director = pd.DataFrame(splitters, index=df['title']).stack().reset_index()
df_director = df_director.rename(columns={'level_1': 'director_index', 0: 'director'})

# Select the title and director columns
df_directors = df_director[['title', 'director']]

Here, we’re renaming the column level_1 (which was created by stack()) to director_index, and the column 0 to director. Finally, we’re selecting only the columns we want (title and director) in df_directors.
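To see where level_1 and 0 come from, here is a minimal sketch that inspects the intermediate objects. It uses a small made-up frame in place of netflix_titles.csv, and the names stacked, index_names, flat and flat_columns are only for illustration:

```python
import pandas as pd

# Small stand-in for the Netflix data (an assumption, not the real file)
df = pd.DataFrame({
    'title': ['Stranger Things', 'The Crown'],
    'director': ['Matt Duffer, Ross Duffer', 'Peter Morgan'],
})

splitters = df['director'].apply(lambda x: str(x).split(',')).tolist()
stacked = pd.DataFrame(splitters, index=df['title']).stack()

# After stack(), 'title' is an index level, not a column
index_names = list(stacked.index.names)
print(index_names)

# reset_index() moves both index levels into columns; the unnamed
# level becomes 'level_1' and the unnamed values become column 0
flat = stacked.reset_index()
flat_columns = list(flat.columns)
print(flat_columns)
```

The printed column list is what the rename() call in the answer maps to friendlier names.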

Assuming the input is:

{'title': ['Stranger Things', 'The Crown', 'House of Cards'],
    'director': ['Matt Duffer, Ross Duffer', 'Peter Morgan', 'Beau Willimon, David Fincher, Joel Schumacher']}

The output should be something like this:

                title            director
0  Stranger Things         Matt Duffer
1  Stranger Things         Ross Duffer
2        The Crown        Peter Morgan
3   House of Cards       Beau Willimon
4   House of Cards       David Fincher
5   House of Cards     Joel Schumacher
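As an alternative to the stack() approach (not the method used above), the same table can probably be built with Series.str.split() and DataFrame.explode(). This is a sketch under a few assumptions: it runs on the example input rather than the real CSV, it drops rows with no director, and it strips the space that is left after each comma.

```python
import pandas as pd

# Example input from above, used in place of the real dataset
df = pd.DataFrame({
    'title': ['Stranger Things', 'The Crown', 'House of Cards'],
    'director': ['Matt Duffer, Ross Duffer', 'Peter Morgan',
                 'Beau Willimon, David Fincher, Joel Schumacher'],
})

df_directors = (
    df[['title', 'director']]
    .dropna(subset=['director'])        # skip titles with no director listed
    .assign(director=lambda d: d['director'].str.split(','))
    .explode('director')                # one row per (title, director) pair
    .reset_index(drop=True)
    .assign(director=lambda d: d['director'].str.strip())  # remove leading spaces
)
print(df_directors)
```

Dropping missing directors first avoids the literal string 'nan' that str(x) would otherwise produce for empty cells.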

BTW, try to avoid posting data as images.

Answered By: JCTL