Manipulate lists in a pandas data frame column (e.g. divide by another column)

Question:

I have a pandas data frame with one column containing lists. I wish to divide each list element in each row by a scalar value in another column. In the following example, I wish to divide each element in a by b:

              a   b
0  [11, 22, 33]  11
1  [12, 24, 36]   2
2  [33, 66, 99]   3

Thus yielding the following result:

              a   b                   c
0  [11, 22, 33]  11     [1.0, 2.0, 3.0]
1  [12, 24, 36]   2   [6.0, 12.0, 18.0]
2  [33, 66, 99]   3  [11.0, 22.0, 33.0]

I can achieve this by the following code:

import pandas as pd

df = pd.DataFrame({"a":[[11,22,33],[12,24,36],[33,66,99]], "b" : [11,2,3]})

result = {"c":[]}
for _, row in df.iterrows():
    result["c"].append([x / row["b"] for x in row["a"]])

df_c = pd.DataFrame(result)
df = pd.concat([df,df_c], axis="columns")

But explicitly iterating over the rows, collecting the results in a dictionary, converting that to a data frame and then concatenating it with the original data frame seems very inefficient and inelegant.

Does anyone have a better solution?

Thanks in advance and cheers!


PS: In case you are wondering why I would store lists in a column: they are the resulting amplitudes of a Fourier transform.

Why don't I use one column for each frequency?

  1. Creating a new column for each frequency is horribly slow
  2. With different sampling rates and FFT-window sizes in my project, there are multiple sets of frequencies.
Asked By: bolla

||

Answers:

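Zip the two columns, then divide each entry in column a by its corresponding entry in column b using a combination of product and starmap, and convert the resulting iterator back into a list. Note that floordiv performs floor division, so the results here are integers; operator.truediv gives floats as in the question.

from itertools import product, starmap
from operator import floordiv

# product(num, [denom]) pairs each list element with the row's scalar;
# starmap unpacks each pair into floordiv(element, denom)
df['c'] = [list(starmap(floordiv, product(num, [denom])))
           for num, denom in zip(df.a, df.b)]

              a   b             c
0  [11, 22, 33]  11     [1, 2, 3]
1  [12, 24, 36]   2   [6, 12, 18]
2  [33, 66, 99]   3  [11, 22, 33]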
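
The example above uses floordiv, which yields integers, whereas the question asks for float results. A minimal sketch of the same product/starmap idea with operator.truediv swapped in (the function form of `/`), using the data frame from the question:

```python
from itertools import product, starmap
from operator import truediv

import pandas as pd

df = pd.DataFrame({"a": [[11, 22, 33], [12, 24, 36], [33, 66, 99]],
                   "b": [11, 2, 3]})

# Same pairing as before, but truediv performs true (float) division
df["c"] = [list(starmap(truediv, product(num, [denom])))
           for num, denom in zip(df["a"], df["b"])]

print(df)
```

This matters for real FFT amplitudes, which are usually non-integer and would be rounded down by floor division.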

Alternatively, you could just use a NumPy array within the iteration:

import numpy as np

df['c'] = [list(np.array(num)/denom) for num, denom in zip(df.a,df.b)]
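
For reference, here is a self-contained sketch of the NumPy variant on the question's data. Using `.tolist()` instead of `list(...)` is my own substitution: it converts the NumPy scalars back to plain Python floats, which may or may not matter for your use case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [[11, 22, 33], [12, 24, 36], [33, 66, 99]],
                   "b": [11, 2, 3]})

# Dividing an array by a scalar broadcasts over all elements;
# .tolist() turns the result back into a list of Python floats
df["c"] = [(np.asarray(num) / denom).tolist()
           for num, denom in zip(df["a"], df["b"])]

print(df)
```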

Thanks to @jezrael for the suggestion: all of this might be unnecessary, as SciPy already provides FFT routines.

Answered By: sammywemmy

I would convert the lists to numpy arrays:

import numpy as np

df['c'] = df['a'].apply(np.array) / df['b']

You will get NumPy arrays in column c. If you really need lists, you will have to convert them back:

df['c'] = df['c'].apply(list)
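
Put together, the two steps might look like the sketch below. The helper name `divide_list_column` and its parameters are hypothetical, introduced only for illustration, and `.tolist()` is used in place of `list` so the elements come back as plain Python floats.

```python
import numpy as np
import pandas as pd


def divide_list_column(frame, list_col, scalar_col):
    """Divide each list in list_col by the scalar in scalar_col, row by row."""
    # Convert each list to an array so that "/" applies to every element
    arrays = frame[list_col].apply(np.array)
    quotients = arrays / frame[scalar_col]
    # Convert the arrays back to plain Python lists
    return quotients.apply(lambda arr: arr.tolist())


df = pd.DataFrame({"a": [[11, 22, 33], [12, 24, 36], [33, 66, 99]],
                   "b": [11, 2, 3]})
df["c"] = divide_list_column(df, "a", "b")

print(df)
```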
Answered By: Serge Ballesta