Pandas: How to add a character to the front of each item in a list?
Question:
I have a pandas DataFrame with a different character in each row of column x, and I want to add that character to the front of each item in the row's corresponding list in column y.
Here is my pandas df:
import pandas as pd
df_1 = pd.DataFrame({'x': ['a', 'b', 'c'], 'y': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]})
x y
a [1,2,3,4]
b [5,6,7,8]
c [9,10,11,12]
This is what I would like the expected output to look like:
x y
a [a 1,a 2,a 3,a 4]
b [b 5,b 6,b 7,b 8]
c [c 9,c 10,c 11,c 12]
How do I loop through the data frame and add the character in the x column to the front of each item in the corresponding list in column y?
Thanks!
Answers:
Just use apply with axis=1 and a lambda containing a list comprehension to build the new values. You need to typecast each item in the list in column y to a string, or use an f-string.
df_1['y'] = df_1.apply(lambda x: [f"{x['x']} {i}" for i in x['y']], axis=1)
OUTPUT:
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]
solution 1 (strings)
You can explode 'y' to convert it to many rows, combine 'x' and 'y' using assign, and reshape back to single rows using groupby + apply:
(df_1.explode('y')
.assign(y=lambda d: d['x']+' '+d['y'].astype(str))
.groupby('x')['y']
.apply(list)
.reset_index()
)
output:
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]
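One caveat with solution 1: grouping by 'x' assumes each value in 'x' appears in only one row. The sketch below uses a hypothetical frame, df_dup, with a repeated 'a' to show the difference between grouping by 'x' and grouping by the original row index (level=0); it is an illustration under that assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical frame in which the prefix 'a' appears in two different rows.
df_dup = pd.DataFrame({
    'x': ['a', 'b', 'a'],
    'y': [[1, 2], [3, 4], [5, 6]],
})

# Explode the lists and build the prefixed strings, as in solution 1.
combined = (df_dup.explode('y')
                  .assign(y=lambda d: d['x'] + ' ' + d['y'].astype(str)))

# Grouping by 'x' merges rows 0 and 2 because they share the prefix 'a'.
by_value = combined.groupby('x')['y'].apply(list)

# Grouping by the original row index keeps one list per original row.
by_row = combined.groupby(level=0)['y'].apply(list)

# The result is indexed by row label, so it aligns with the original frame.
result = df_dup.assign(y=by_row)
print(by_value)
print(result)
```

If 'x' is guaranteed to be unique, both groupings give one list per row and solution 1 works as written.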
solution 2 (combined lists)
from itertools import chain
(df_1.explode('y')
.apply(list, axis=1)
.groupby(level=0)
.apply(lambda x: list(chain(*x)))
)
output:
0 [a, 1, a, 2, a, 3, a, 4]
1 [b, 5, b, 6, b, 7, b, 8]
2 [c, 9, c, 10, c, 11, c, 12]
Try (the chained calls must be wrapped in parentheses to span several lines):
df['y'] = (df.explode('y')
             .apply(lambda r: f"{r['x']} {r['y']}", axis=1)
             .groupby(level=0)
             .apply(list))
Output:
>>> df
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]
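Try with apply (using column labels rather than x[0] and x[1], since positional integer lookup on a label-indexed row is deprecated in recent pandas versions):
>>> df_1['y'] = df_1.apply(lambda x: [*map(x['x'].__add__(' ').__add__, map(str, x['y']))], axis=1)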
>>> df_1
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]
Or if you don't need the space, try:
>>> df_1['y'] = df_1.apply(lambda x: [*map(x['x'].__add__, map(str, x['y']))], axis=1)
>>> df_1
x y
0 a [a1, a2, a3, a4]
1 b [b5, b6, b7, b8]
2 c [c9, c10, c11, c12]
List comprehension with f-strings should be very fast:
df_1['y'] = [[f'{x} {i}' for i in y] for x, y in df_1[['x', 'y']].to_numpy()]
print(df_1)
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]
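A variant of the same idea, sketched here as an alternative rather than as part of the original answer, iterates over the two columns with zip instead of converting the frame to a 2-D object array first. The name prefixed is only illustrative.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'x': ['a', 'b', 'c'],
    'y': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
})

# Pair each prefix with its list directly; no intermediate array is built.
prefixed = [[f'{x} {i}' for i in y] for x, y in zip(df_1['x'], df_1['y'])]

# assign returns a new frame and leaves the original 'y' column untouched.
df_out = df_1.assign(y=prefixed)
print(df_out)
```

Whether this is faster or slower than the to_numpy version was not measured here; it would need to be timed on your own data.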
Performance for 30k rows:
df_1 = pd.DataFrame({'x' : ['a', 'b', 'c'], 'y' : [[1, 2, 3, 4],[5, 6, 7, 8],[9, 10, 11, 12]]})
df_1 = pd.concat([df_1] * 10000, ignore_index=True)
%timeit df_1.explode('y').apply(lambda r: f"{r['x']} {r['y']}", axis=1).groupby(level=0).apply(list)
2.84 s ± 823 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df_1.apply(lambda x: [f"{x['x']} {i}" for i in x['y']], axis=1)
730 ms ± 5.46 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df_1.apply(lambda x: [*map(x[0].__add__(' ').__add__, map(str, x[1]))], axis=1)
376 ms ± 27.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit df_1.explode('y').assign(y=lambda d: d['x']+' '+d['y'].astype(str)).groupby('x')['y'].apply(list)
# failed with KeyError: ' y', no idea why :(
%timeit [[f'{x} {i}' for i in y] for x, y in df_1[['x','y']].to_numpy()]
76.3 ms ± 1.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
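The timings above come from IPython's %timeit. To repeat the comparison in a plain script, something like the following sketch could be used. The helper names (make_frame, with_apply, with_explode, with_comprehension) are made up for this example, and the frame is kept smaller than 30k rows so it runs quickly. It first checks that the approaches agree and then times each one with the standard timeit module.

```python
import timeit

import pandas as pd


def make_frame(repeats):
    """Build the sample frame from the question, repeated `repeats` times."""
    base = pd.DataFrame({
        'x': ['a', 'b', 'c'],
        'y': [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]],
    })
    return pd.concat([base] * repeats, ignore_index=True)


def with_apply(df):
    # Row-wise apply with a list comprehension.
    return df.apply(lambda r: [f"{r['x']} {i}" for i in r['y']], axis=1).tolist()


def with_explode(df):
    # Explode, format each row, then regroup by the original row index.
    return (df.explode('y')
              .apply(lambda r: f"{r['x']} {r['y']}", axis=1)
              .groupby(level=0)
              .apply(list)
              .tolist())


def with_comprehension(df):
    # Nested list comprehension over the underlying values.
    return [[f'{x} {i}' for i in y] for x, y in df[['x', 'y']].to_numpy()]


df_small = make_frame(100)  # 300 rows; increase for a closer match to 30k

# All three approaches should produce identical lists.
expected = with_comprehension(df_small)
assert with_apply(df_small) == expected
assert with_explode(df_small) == expected

runs = 3
for fn in (with_explode, with_apply, with_comprehension):
    seconds = timeit.timeit(lambda: fn(df_small), number=runs)
    print(f'{fn.__name__}: {seconds / runs:.4f} s per call')
```

Absolute numbers will differ by machine and pandas version, so only the relative ordering is worth comparing.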
ss2=df_1.explode('y').apply(lambda ss:"{} {}".format(ss.x,ss.y),axis=1).groupby(level=0).agg(list)
df_1.assign(y=ss2)
out
x y
0 a [a 1, a 2, a 3, a 4]
1 b [b 5, b 6, b 7, b 8]
2 c [c 9, c 10, c 11, c 12]