Can a list comprehension be divided in two lists?
Question:
I think I’ve caught the idea of one-line for loop, but now I have a problem. I know I can define a dataframe column using this like:
df = pd.DataFrame(columns=["columnA"])
list = [0, 1, 2, 3, 4]
df["columnA"] = [i for i in list]
Now my question is: Is it possible to define 2 columns in a one-line for loop?
I’ve tried this:
df["columnA"], df["columnB"] = [i, i**2 for i in list]
df["columnA"], df["columnB"] = [[i, i**2] for i in list]
None of this worked. I’m using Python 3.10
Answers:
You have to zip your output (lst here is the input list, renamed so that it does not shadow the built-in list):
df['A'], df['B'] = zip(*[(i, i**2) for i in lst])
print(df)
# Output
   A   B
0  0   0
1  1   1
2  2   4
3  3   9
4  4  16
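To see what zip(*...) is doing without pandas in the picture, here is a minimal plain-Python sketch. The names pairs, col_a and col_b are illustrative and not part of the answer above: the comprehension produces one (i, i**2) tuple per row, and unpacking that list into zip regroups the rows into one tuple per column.

```python
lst = [0, 1, 2, 3, 4]

# One tuple per row: (value, value squared)
pairs = [(i, i**2) for i in lst]

# zip(*pairs) transposes the rows into columns
col_a, col_b = zip(*pairs)

print(col_a)  # (0, 1, 2, 3, 4)
print(col_b)  # (0, 1, 4, 9, 16)
```

The number of names on the left must match the number of elements in each tuple, otherwise the unpacking raises a ValueError.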
You can also use np.array (this needs import numpy as np):
df[['A', 'B']] = np.array([(i, i**2) for i in lst])
Right now your code is overwriting what’s in Column A.
df["columnB"], df['columnA'] = [i**2 for i in list], [i for i in list]
The above answer is much better than mine. Learned something new today.
Here is my solution to your problem:
1: Column creation
Create the columns together with the dataframe; it is much faster than adding them later:
list = [0, 1, 2, 3, 4]
df = pd.DataFrame({
    "columnA": list,
    "columnB": [i**2 for i in list]
})
Testing it with %%timeit, we obtain:
161 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Now, let's check your version:
df = pd.DataFrame(columns=["columnA"])
list = [0, 1, 2, 3, 4]
df["columnA"] = [i for i in list]
df["columnB"] = [i**2 for i in list]
1.58 ms ± 72.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Your version is roughly 10x slower (1.58 ms versus 161 µs).
2: Using .assign
If you cannot create all columns when the dataframe is created, you can create multiple columns with a single method by using .assign:
df = pd.DataFrame({
    "columnA": [i for i in list]
}).assign(
    columnB=[i**2 for i in list],
    columnC=[i**3 for i in list]
)
3: Single for
If you really want to use a single for, you can build the data first and the dataframe later:
data = [
    {
        "columnA": i,
        "columnB": i**2
    } for i in list
]
df = pd.DataFrame(data)
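As a rough illustration of what that list of dicts holds before pandas sees it, the plain-Python sketch below builds the same records with a single for and then regroups them by key. The names values and columns are assumptions made for this example; pd.DataFrame(data) does the equivalent regrouping for you, turning each dict key into a column.

```python
values = [0, 1, 2, 3, 4]

# A single comprehension yields one dict per row
data = [{"columnA": i, "columnB": i**2} for i in values]

# Regroup by key to view the same data column by column
columns = {key: [row[key] for row in data] for key in data[0]}

print(columns)
# {'columnA': [0, 1, 2, 3, 4], 'columnB': [0, 1, 4, 9, 16]}
```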
Finally, list is the name of a Python built-in (not a keyword, which is why the assignment is allowed), so you should avoid overwriting it. You will lose access to the actual function and type, so these won't work:
list(iter([1,2,3]))
(converts an iterable into a list)
isinstance([1,2,3],list)
(checks that the variable is of the list type)
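A small sketch of the failure, kept inside functions so the shadowing stays local. The function names and the variable values are made up for this example, and the behaviour shown is plain Python, independent of pandas.

```python
import builtins


def call_shadowed_list():
    list = [0, 1, 2, 3, 4]  # shadows the built-in inside this function
    try:
        return list(iter([1, 2, 3]))  # the name now refers to a list object
    except TypeError as exc:
        return f"TypeError: {exc}"


def call_with_safe_name():
    values = [0, 1, 2, 3, 4]  # a different name leaves the built-in intact
    return list(iter(values)), isinstance(values, list)


print(call_shadowed_list())   # TypeError: 'list' object is not callable
print(call_with_safe_name())  # ([0, 1, 2, 3, 4], True)
print(builtins.list("ab"))    # the original is still reachable via builtins
```

Renaming the variable (for example to values or lst) avoids the problem entirely; reaching through builtins is only a fallback.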