Python dataframe separate cell values containing lists
Question:
I have a dataframe df:
0 1 2
Mon ['x','y','z'] ['a','b','c'] ['a','b','c']
Tue ['a','b','c'] ['a','b','c'] ['x','y','z']
Wed ['a','b','c'] ['a','b','c'] ['a','b','c']
The lists may all differ from one another (or some may be identical), and I wish to convert the dataframe to the form:
0 1 2
Mon x a a
Mon y b b
Mon z c c
Tue a a x
Tue b b y
Tue c c z
Wed a a a
Wed b b b
Wed c c c
I have tried the solutions from two previous SO questions, "Explode lists with different lengths in Pandas" and "Split (explode) pandas dataframe string entry to separate rows", but I am unable to get the desired output. How can I achieve this?
s1 = df[0]
s2 = df[1]
s3 = df[2]
i1 = np.arange(len(df)).repeat(s1.str.len())
i2 = np.arange(len(df)).repeat(s2.str.len())
i3 = np.arange(len(df)).repeat(s3.str.len())
df.iloc[i1, :-1].assign(**{'Shared Codes': np.concatenate(s1.values)})
df.iloc[i2, :-1].assign(**{'Shared Codes': np.concatenate(s2.values)})
df.iloc[i3, :-1].assign(**{'Shared Codes': np.concatenate(s3.values)})
Also, this doesn’t seem like a very reasonable way to do it, especially if I have even more columns. I am using Python 2.7.
Answers:
This is one way, using itertools.chain and numpy.repeat:
import pandas as pd, numpy as np
from itertools import chain
df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])
# flatten each column's lists; repeat each index label once per list element
res = pd.DataFrame({k: list(chain.from_iterable(df[k])) for k in df},
                   index=np.repeat(df.index, list(map(len, df[0]))))
print(res)
# 0 1 2
# Mon x a a
# Mon y b b
# Mon z c c
# Tue a a x
# Tue b b y
# Tue c c z
# Wed a a a
# Wed b b b
# Wed c c c
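The chain/repeat approach above takes the repeat counts from column 0 only, so it silently assumes that all lists in a given row have the same length. Here is a minimal sketch that wraps the same idea in a function and checks that assumption first. The name explode_all and the ValueError are my own choices for illustration, not part of pandas.

```python
from itertools import chain

import numpy as np
import pandas as pd


def explode_all(df):
    """Flatten every list-valued column of df into one row per element.

    Hypothetical helper: assumes every cell holds a list and that all
    lists within one row share the same length.
    """
    # Per-cell list lengths, same shape as df.
    lengths = df.apply(lambda col: col.map(len))
    # Compare every column's lengths against the first column, row by row.
    if not lengths.eq(lengths.iloc[:, 0], axis=0).all().all():
        raise ValueError("lists within a row must have equal lengths")
    return pd.DataFrame(
        {k: list(chain.from_iterable(df[k])) for k in df},
        index=np.repeat(df.index, lengths.iloc[:, 0].to_numpy()),
    )


# Rows may differ in length from each other, as long as each row is consistent.
df = pd.DataFrame({0: [['x', 'y'], ['a', 'b', 'c']],
                   1: [['p', 'q'], ['r', 's', 't']]},
                  index=['Mon', 'Tue'])
res = explode_all(df)
print(res.index.tolist())  # ['Mon', 'Mon', 'Tue', 'Tue', 'Tue']
```

Because the check runs before anything is flattened, a frame with mismatched lists in one row fails loudly instead of producing columns of unequal length.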
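A simple iteration might help if every list has exactly 3 elements, i.e.:
# result_type='expand' (pandas 0.23+) turns each returned list into a row of columns;
# a stable sort keeps the elements of each day in their original order
ndf = pd.concat([df.apply(lambda row: [cell[j] for cell in row], axis=1, result_type='expand') for j in range(3)]).sort_index(kind='mergesort')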
0 1 2
Mon x a a
Mon y b b
Mon z c c
Tue a a x
Tue b b y
Tue c c z
Wed a a a
Wed b b b
Wed c c c
I’d do it this way:
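dfs = []
for day in df.index:
    # transpose so there is one row per list element and one column per original column
    part = pd.DataFrame(df.loc[day].tolist()).T
    part.columns = df.columns
    # repeat the day label once per element (not once per column)
    part.index = np.repeat(day, len(part))
    dfs.append(part)
result = pd.concat(dfs)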
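The per-row transpose in the loop above does the same job as Python's built-in zip(*row). As a rough sketch of that idea (the variable names labels and records are just for illustration), you can collect plain tuples and build the frame once at the end:

```python
import pandas as pd

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])

labels = []   # one index label per output row
records = []  # one tuple of cell values per output row
for day, row in df.iterrows():
    # zip(*row) pairs up the i-th element of every list in this row
    for values in zip(*row):
        labels.append(day)
        records.append(values)

res = pd.DataFrame(records, index=labels, columns=df.columns)
print(res)
```

Note that zip stops at the shortest list, so a row with lists of unequal length is truncated silently rather than raising an error.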
With pandas 1.3 or later, DataFrame.explode accepts a list of columns:
import pandas as pd
df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])
print(df)
"""
0 1 2
Mon [x, y, z] [a, b, c] [a, b, c]
Tue [a, b, c] [a, b, c] [x, y, z]
Wed [a, b, c] [a, b, c] [a, b, c]
"""
# repeat each label once per element of its lists (lengths taken from column 0)
idx = df.index.repeat(df[0].str.len())
print(idx)
"""
Index(['Mon', 'Mon', 'Mon', 'Tue', 'Tue', 'Tue', 'Wed', 'Wed', 'Wed'], dtype='object')
"""
# exploding several columns at once already repeats the index labels
res = df.explode([0, 1, 2])
print(res)
"""
0 1 2
Mon x a a
Mon y b b
Mon z c c
Tue a a x
Tue b b y
Tue c c z
Wed a a a
Wed b b b
Wed c c c
"""
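Passing a list of columns to DataFrame.explode needs pandas 1.3 or newer. On older versions that already have Series.explode (added in pandas 0.25), one possible fallback is to explode each column separately with apply. This is only a sketch, and it relies on the lists within each row having equal lengths so that the exploded columns share the same index.

```python
import pandas as pd

df = pd.DataFrame({0: [['x', 'y', 'z'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   1: [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']],
                   2: [['a', 'b', 'c'], ['x', 'y', 'z'], ['a', 'b', 'c']]},
                  index=['Mon', 'Tue', 'Wed'])

# Explode every column on its own; the results line up because each
# row's lists have the same length, so all columns get the same index.
res = df.apply(pd.Series.explode)
print(res)
```

For the example frame this gives the same nine rows as the multi-column explode call.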