Column of lists, convert list to string as a new column
Question:
I have a dataframe with a column of lists which can be created with:
import pandas as pd
lists={1:[[1,2,12,6,'ABC']],2:[[1000,4,'z','a']]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
The dataframe df looks like:
lists
1 [1, 2, 12, 6, ABC]
2 [1000, 4, z, a]
I need to create a new column called 'liststring' which takes every element of each list in lists and creates a string with each element separated by commas. The elements of each list can be int, float, or string. So the result would be:
lists liststring
1 [1, 2, 12, 6, ABC] 1,2,12,6,ABC
2 [1000, 4, z, a] 1000,4,z,a
I have tried various things, including from How do I convert a list in a Pandas DF into a string?:
df['liststring']=df.lists.apply(lambda x: ', '.join(str(x)))
but unfortunately the result takes every character and separates them by commas:
lists liststring
1 [1, 2, 12, 6, ABC] [, 1, ,, , 2, ,, , 1, 2, ,, , 6, ,, , ', A...
2 [1000, 4, z, a] [, 1, 0, 0, 0, ,, , 4, ,, , ', z, ', ,, , '...
Answers:
List Comprehension
If performance is important, I strongly recommend this solution and I can explain why.
df['liststring'] = [','.join(map(str, l)) for l in df['lists']]
df
lists liststring
0 [1, 2, 12, 6, ABC] 1,2,12,6,ABC
1 [1000, 4, z, a] 1000,4,z,a
You can extend this to more complicated use cases using a function.
import numpy as np
def try_join(l):
    # Fall back to NaN for values that are not iterable (e.g. missing entries)
    try:
        return ','.join(map(str, l))
    except TypeError:
        return np.nan
df['liststring'] = [try_join(l) for l in df['lists']]
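To illustrate the "more complicated use cases" mentioned above, here is a minimal sketch of how try_join behaves when one row holds a missing value instead of a list. The frame df_missing and its contents are hypothetical, invented only for this example.

```python
import numpy as np
import pandas as pd

def try_join(values):
    """Join the items of an iterable with commas; return NaN if it is not iterable."""
    try:
        return ','.join(map(str, values))
    except TypeError:
        return np.nan

# Hypothetical data: the middle row holds NaN instead of a list.
df_missing = pd.DataFrame({'lists': [[1, 2, 'ABC'], np.nan, [1000, 4.5]]})

# map(str, nan) raises TypeError because a float is not iterable,
# so the middle row falls through to the except branch.
df_missing['liststring'] = [try_join(l) for l in df_missing['lists']]
print(df_missing)
```

The plain comprehension without the try/except would raise on the middle row, so the wrapper is mainly useful when the column is not guaranteed to contain only lists.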
Series.apply/Series.agg with ','.join
You need to convert your list items to strings first; that's where map comes in handy.
df['liststring'] = df['lists'].apply(lambda x: ','.join(map(str, x)))
Or,
df['liststring'] = df['lists'].agg(lambda x: ','.join(map(str, x)))
df
lists liststring
0 [1, 2, 12, 6, ABC] 1,2,12,6,ABC
1 [1000, 4, z, a] 1000,4,z,a
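The reason the attempt in the question split every character can be sketched on a single list, without pandas. The variable names below are illustrative only.

```python
row = [1, 2, 12, 6, 'ABC']

# str(row) is one string, "[1, 2, 12, 6, 'ABC']", so join iterates over
# its characters, brackets and quotes included.
per_character = ', '.join(str(row))

# Joining the raw list fails because str.join accepts only strings.
try:
    ','.join(row)
    raw_join_error = None
except TypeError as exc:
    raw_join_error = exc

# Converting each element first gives the intended result.
joined = ','.join(map(str, row))

print(per_character)
print(repr(raw_join_error))
print(joined)
```

So str has to be applied to each element, not to the list as a whole, which is what map(str, x) does.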
pd.DataFrame constructor with DataFrame.agg
A non-loopy/non-lambda solution.
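# Pass index=df.index so the result aligns with df when its index is not 0..n-1
df['liststring'] = (pd.DataFrame(df['lists'].tolist(), index=df.index)
                      .fillna('')
                      .astype(str)
                      .agg(','.join, axis=1)
                      .str.strip(','))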
df
lists liststring
0 [1, 2, 12, 6, ABC] 1,2,12,6,ABC
1 [1000, 4, z, a] 1000,4,z,a
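One caveat of the constructor approach is worth a sketch. When the lists have unequal lengths, the shorter rows are padded with NaN, and an all-numeric column containing NaN is stored as float, so integers in that column can come out as "3.0". The frame df_ragged below is hypothetical, chosen only to trigger this.

```python
import pandas as pd

# Hypothetical frame with lists of unequal length and a non-default index.
df_ragged = pd.DataFrame({'lists': [[1, 2, 3], [4, 5]]}, index=[10, 20])

# Expanding to one column per position pads the short row with NaN,
# which turns the third column into float64 (3 becomes 3.0).
wide = pd.DataFrame(df_ragged['lists'].tolist(), index=df_ragged.index)
via_constructor = (wide.fillna('')
                       .astype(str)
                       .agg(','.join, axis=1)
                       .str.strip(','))

# The list comprehension works on the original Python objects instead.
via_comprehension = pd.Series(
    [','.join(map(str, l)) for l in df_ragged['lists']],
    index=df_ragged.index,
)

print(via_constructor)
print(via_comprehension)
```

If the lists are ragged and contain integers, the comprehension or apply variants are probably the safer choice.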
One way you could do it is to use a list comprehension, str, and join:
df['liststring'] = df.lists.apply(lambda x: ', '.join([str(i) for i in x]))
Output:
lists liststring
1 [1, 2, 12, 6, ABC] 1, 2, 12, 6, ABC
2 [1000, 4, z, a] 1000, 4, z, a
The previous explanations are good and quite straightforward. But suppose you want to convert multiple columns to a separator-delimited string format. Instead of handling each column individually, you can apply the following function to the whole dataframe; any value that is a list is converted to a string, and everything else is left unchanged.
def list2Str(lst):
    if isinstance(lst, list):  # convert only list values
        # map(str, ...) lets join handle non-string elements such as ints
        return ";".join(map(str, lst))
    else:
        return lst
df.apply(lambda x: [list2Str(i) for i in x])
Of course, if you want to apply this only to certain columns, you can select the subset of columns as follows:
df[['col1',...,'col2']].apply(lambda x: [list2Str(i) for i in x])
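As a self-contained sketch of the multi-column idea, the example below uses a hypothetical frame df_multi with two list columns and one plain column. It applies the helper with Series.map per column, which is a swapped-in variant of the list comprehension used above, and the helper name list_to_str and its sep parameter are assumptions made for this illustration.

```python
import pandas as pd

def list_to_str(value, sep=';'):
    """Join list values with sep; return anything else unchanged."""
    if isinstance(value, list):
        return sep.join(map(str, value))
    return value

# Hypothetical frame: two list columns and one scalar column.
df_multi = pd.DataFrame({
    'tags': [['a', 'b'], ['c']],
    'scores': [[1, 2.5], [3]],
    'name': ['first', 'second'],
})

# Apply the helper element by element within every column;
# non-list values such as the names pass through untouched.
converted = df_multi.apply(lambda col: col.map(list_to_str))
print(converted)
```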
None of these worked for me (I was dealing with text data); what worked for me is this:
df['liststring'] = df['lists'].apply(lambda x: x[1:-1])
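The slice above presumably works because each value is already a string such as "[1, 2]", so x[1:-1] removes the brackets. That reading is an assumption, since the answer does not show its data. Under that assumption, a possible alternative is to parse the text back into real lists with ast.literal_eval and then join as in the other answers. The frame df_text is hypothetical.

```python
import ast
import pandas as pd

# Hypothetical frame whose lists arrived as text, as can happen after a CSV round trip.
df_text = pd.DataFrame({'lists': ["[1, 2, 12, 6, 'ABC']", "[1000, 4, 'z', 'a']"]})

# Slicing only removes the brackets; quotes and spaces remain.
sliced = df_text['lists'].str[1:-1]

# Parsing first recovers real lists, which can then be joined as usual.
parsed = df_text['lists'].apply(ast.literal_eval)
joined = [','.join(map(str, l)) for l in parsed]

print(sliced.tolist())
print(joined)
```

Note that on a column of real Python lists, x[1:-1] would instead drop the first and last elements.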
Pipe:
import pandas as pd
lists={1:[[1,2,12,6,'ABC']],2:[[1000,4,'z','a']]}
#create test dataframe
(
    pd.DataFrame.from_dict(lists, orient='index', columns=['lists'])
    .assign(liststring=lambda x: x.lists.astype(str).str[1:-1])
)
Output:
lists liststring
1 [1, 2, 12, 6, ABC] 1, 2, 12, 6, 'ABC'
2 [1000, 4, z, a] 1000, 4, 'z', 'a'
Since we're returning a series of the same length as our input and only using one series as input, Series.transform immediately came to mind. This worked for me:
df['liststring'] = (
    df['lists']
    .transform(
        lambda x: ",".join(map(str, x))
    )
)
This returns
lists liststring
1 [1, 2, 12, 6, ABC] 1,2,12,6,ABC
2 [1000, 4, z, a] 1000,4,z,a
Many thanks to others for the map() fix on the join. Others can cite the performance benefits better than I can; I believe transform is in general more performant than apply(), but I have not measured it, and I'm not sure how it compares with the list comprehension.
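Since the relative performance is left open here, a rough way to check it on your own machine is sketched below. The benchmark frame df_bench and the helper join_items are hypothetical, and the timings will vary with pandas version, data size, and hardware, so no figures are quoted.

```python
import timeit
import pandas as pd

# Hypothetical benchmark data: 10,000 short mixed-type lists.
df_bench = pd.DataFrame({'lists': [[i, i + 0.5, 'x'] for i in range(10_000)]})

def join_items(l):
    return ','.join(map(str, l))

# The three approaches discussed in this thread, wrapped as zero-argument callables.
candidates = {
    'list comprehension': lambda: [join_items(l) for l in df_bench['lists']],
    'Series.apply': lambda: df_bench['lists'].apply(join_items),
    'Series.transform': lambda: df_bench['lists'].transform(join_items),
}

# Take the best of 3 repeats, each running the approach 5 times.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3))
    for name, fn in candidates.items()
}
for name, seconds in timings.items():
    print(f'{name}: {seconds:.4f} s for 5 runs')
```

All three produce the same strings, so the choice comes down to measured speed and readability.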