How do I split a single-column DataFrame into multiple columns by index?
Question:
I’ve browsed a few answers but haven’t found the exact thing I’m looking for yet.
I have a pandas DataFrame with a single column structured as follows (example):
0 alex
1 7
2 female
3 nora
4 3
5 female
...
999 fred
1000 15
1001 male
I want to split that single column into three columns holding name, age, and gender, so that it looks something like this:
name age gender
0 alex 7 female
1 nora 3 female
...
333 fred 15 male
Is there a way to do this? I was thinking about using the index, but I’m not sure how to actually do it.
Answers:
Not the most efficient solution perhaps, but you can use pd.concat() to put them all next to each other, if they’re always in order:
import pandas as pd
df = pd.DataFrame({'Value': ['alex', 7, 'female', 'nora', 3, 'female', 'fred', 15, 'male']})
# x = 0, 1, 2 selects index positions 0, 2, 1 (mod 3), i.e. name, gender, age
df2 = pd.concat([df[(df.index + x) % 3 == 0].reset_index(drop=True) for x in range(3)], axis=1)
df2.columns = ["name", "gender", "age"]
Returns:
name gender age
0 alex female 7
1 nora female 3
2 fred male 15
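The result above has gender before age, which is not quite the layout asked for in the question. A minimal sketch of a follow-up step that reorders the columns and makes age numeric, assuming the column is called Value as in this answer and that every age is a whole number:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['alex', 7, 'female', 'nora', 3, 'female', 'fred', 15, 'male']})

# Same concat approach: offset x keeps the rows where (index + x) % 3 == 0
df2 = pd.concat(
    [df[(df.index + x) % 3 == 0].reset_index(drop=True) for x in range(3)],
    axis=1,
)
df2.columns = ["name", "gender", "age"]

# Reorder to name, age, gender and convert the object-typed age column to integers
df2 = df2[["name", "age", "gender"]].astype({"age": int})
print(df2)
```

The astype call will raise if any age value cannot be read as an integer, which can be a useful early warning that the rows are not in the expected order.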
Consider unstack:
import pandas as pd
df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])
people = range(len(df) // 3)
attributes = ["name", "age", "gender"]
multi_index = pd.MultiIndex.from_product([people, attributes])
df.set_index(multi_index).unstack(level=1).droplevel(level=0, axis=1).reindex(columns=attributes)
Output:
name age gender
0 alex 7 female
1 nora 3 female
2 fred 15 male
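Assuming 0 is your column name, you can reshape the values with NumPy:
import numpy as np
list_a = list(df[0])
# dtype=object keeps the ages as integers; without it NumPy converts every value to a string
a = np.array(list_a, dtype=object).reshape(-1, 3).tolist()
df2 = pd.DataFrame(a, columns=["name", "age", "gender"])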
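One thing to watch with the NumPy route: building an array from a mixed list of strings and integers without an explicit dtype gives a string array, so the ages come back as text. A minimal self-contained sketch that keeps the original Python objects instead, assuming the column is labelled 0 and the number of rows is a multiple of 3:

```python
import pandas as pd

df = pd.DataFrame(["alex", 7, "female", "nora", 3, "female", "fred", 15, "male"])

# The column has object dtype, so to_numpy() keeps the integers as integers
values = df[0].to_numpy()

# reshape(-1, 3) raises ValueError if the length is not a multiple of 3
df2 = pd.DataFrame(values.reshape(-1, 3), columns=["name", "age", "gender"])

# All three columns are still object dtype, so convert age explicitly
df2["age"] = df2["age"].astype(int)
print(df2)
```

Going through to_numpy() skips the intermediate Python lists, and the ValueError from reshape doubles as a check that no record is incomplete.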
Here is one way to do it:
# step through the DataFrame and get the values for name, age and gender as arrays
# they start at positions 0, 1 and 2 respectively, taking every third row
name = df['Value'][::3].values
age = df['Value'][1::3].values
gender = df['Value'][2::3].values
# create a DataFrame from the values
out = pd.DataFrame({'name': name,
                    'age': age,
                    'gender': gender})
out
name age gender
0 alex 7 female
1 nora 3 female
2 fred 15 male
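The question mentions using the index directly. That idea can be sketched with integer division and modulo on the index followed by a pivot. This is only an illustration, assuming a default 0..N-1 index and a column called Value; the helper columns person and field are names made up for this example:

```python
import pandas as pd

df = pd.DataFrame({'Value': ['alex', 7, 'female', 'nora', 3, 'female', 'fred', 15, 'male']})

fields = ["name", "age", "gender"]
n = len(fields)

# index // n identifies the person, index % n identifies which field the row holds
out = (
    df.assign(person=df.index // n, field=df.index % n)
      .pivot(index="person", columns="field", values="Value")
)

# The pivoted columns are 0, 1, 2 in order; give them readable names
out.columns = fields
out.index.name = None
print(out)
```

Unlike the slicing version, this relies on the index values rather than row positions, so it would need a reset_index(drop=True) first if the DataFrame had been filtered or re-sorted.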