Can a list comprehension be divided in two lists?

Question:

I think I’ve grasped the idea of the one-line for loop (list comprehension), but now I have a problem. I know I can use one to define a dataframe column, like this:

df = pd.DataFrame(columns=["columnA"])

list = [0, 1, 2, 3, 4]

df["columnA"] = [i for i in list]

Now my question is: Is it possible to define 2 columns in a one-line for loop?

I’ve tried this:

df["columnA"], df["columnB"] = [i, i**2 for i in list]
df["columnA"], df["columnB"] = [[i, i**2] for i in list]

Neither of these worked. I’m using Python 3.10.

Asked By: darioeu


Answers:

You have to zip your output (lst here is your list, renamed so it does not shadow the built-in list):

df['A'], df['B'] = zip(*[(i, i**2) for i in lst])
print(df)

# Output
   A   B
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
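
To see why this works, here is a minimal sketch of what zip(*...) does to the list of pairs, using plain Python and no dataframe. The names pairs, col_a and col_b are only illustrative; they are not part of the code above.

```python
lst = [0, 1, 2, 3, 4]

# The comprehension builds one (value, square) tuple per element.
pairs = [(i, i**2) for i in lst]

# zip(*pairs) unpacks the pairs as separate arguments and regroups them
# by position, turning five 2-tuples into two 5-tuples (a transpose).
col_a, col_b = zip(*pairs)

print(col_a)  # (0, 1, 2, 3, 4)
print(col_b)  # (0, 1, 4, 9, 16)
```

Each resulting tuple has one entry per row, which is the shape a single dataframe column expects.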

You can also use np.array (after import numpy as np):

df[['A', 'B']] = np.array([(i, i**2) for i in lst])
Answered By: Corralien

Right now your code is overwriting what’s in Column A.

df["columnB"], df['columnA'] = [i**2 for i in list], [i for i in list]

The above answer is much better than mine. Learned something new today.

Answered By: MichaelB

Here is my solution to your problem:

1: Column creation

Create the columns together with the dataframe; it is much faster than adding them later.

list = [0, 1, 2, 3, 4]
df = pd.DataFrame({
    "columnA":list,
    "columnB":[i**2 for i in list]
})

By testing it with %%timeit we obtain:

161 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Now, let’s check your version:

df = pd.DataFrame(columns=["columnA"])

list = [0, 1, 2, 3, 4]

df["columnA"] = [i for i in list]
df["columnB"] = [i**2 for i in list]

1.58 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Your version is roughly 10x slower (1.58 ms versus 161 µs).
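
If you are not working in a notebook with %%timeit, the comparison can be reproduced with the standard timeit module. This is only a rough sketch: the function names build_at_once and build_then_add are made up for illustration, values stands in for the list from the question, and the absolute numbers will differ between machines and pandas versions.

```python
import timeit

import pandas as pd

values = [0, 1, 2, 3, 4]


def build_at_once():
    # Both columns are passed to the constructor.
    return pd.DataFrame({
        "columnA": values,
        "columnB": [i**2 for i in values],
    })


def build_then_add():
    # Empty frame first, columns assigned one by one afterwards.
    df = pd.DataFrame(columns=["columnA"])
    df["columnA"] = [i for i in values]
    df["columnB"] = [i**2 for i in values]
    return df


runs = 200
for fn in (build_at_once, build_then_add):
    seconds = timeit.timeit(fn, number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e6:.0f} microseconds per call")
```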

2: Using .assign

If you cannot create all columns when the dataframe is created, you can create multiple columns with a single method by using .assign:

df = pd.DataFrame({
    "columnA" :[i for i in list]
}).assign(
    columnB = [i**2 for i in list],
    columnC = [i**3 for i in list]
)
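
As a variation, .assign also accepts callables, which receive the dataframe and can compute a new column from an existing one without a second comprehension. The sketch below assumes the new columns depend only on columnA; values is an illustrative name used instead of list.

```python
import pandas as pd

values = [0, 1, 2, 3, 4]

df = pd.DataFrame({"columnA": values}).assign(
    # Each lambda is called with the dataframe being built.
    columnB=lambda d: d["columnA"] ** 2,
    columnC=lambda d: d["columnA"] ** 3,
)
print(df)
```

This keeps the arithmetic vectorized in pandas rather than looping in Python, which tends to matter more as the data grows.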

3: Single for

If you really want to use a single for, you can build the data first and the dataframe later:

data = [
    {
        "columnA":i,
        "columnB":i**2
    } for i in list
]
df = pd.DataFrame(data)
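
A slightly more compact variant of the same idea, closer to what the question attempted, is to build one tuple per row and name the columns when constructing the dataframe. This is a sketch rather than a recommendation; rows and values are illustrative names.

```python
import pandas as pd

values = [0, 1, 2, 3, 4]

# One (columnA, columnB) tuple per row, produced by a single for.
rows = [(i, i**2) for i in values]

# The columns argument labels the tuple positions.
df = pd.DataFrame(rows, columns=["columnA", "columnB"])
print(df)
```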

Finally, list is already the name of a Python built-in (it is not a keyword, which is why the assignment is allowed), so you should avoid overwriting it. Once you do, you lose access to the actual function and type, so these won’t work:

list(iter([1,2,3])) (converts an iterable into a list)

isinstance([1,2,3],list) (checks that the variable is of the list type)
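
The effect can be reproduced safely inside a function, where the shadowing stays local. In this small sketch, shadowing_demo is a hypothetical helper written only to collect the error messages.

```python
def shadowing_demo():
    list = [0, 1, 2]  # the local name now hides the built-in list type
    errors = []
    try:
        list(iter([1, 2, 3]))  # calls the list object, not the type
    except TypeError as exc:
        errors.append(str(exc))
    try:
        isinstance([1, 2, 3], list)  # second argument is no longer a type
    except TypeError as exc:
        errors.append(str(exc))
    return errors


errors = shadowing_demo()
for message in errors:
    print(message)

# With a different variable name, the built-in keeps working.
values = [0, 1, 2]
print(list(iter(values)))
print(isinstance(values, list))
```

Both calls inside the function raise TypeError, while the same calls outside it behave normally.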

Answered By: Nilo Araujo