How to fill a polars dataframe from a numpy array in python
Question:
I am currently working on a function that assigns values from a 2-dimensional numpy array to a given column of a dataframe, using the polars library in Python.
I have a dataframe df with the following columns: ['HZ', 'FL', 'Q']. The column 'HZ' takes values in [0, EC + H - 1] and the column 'FL' takes values in [1, F].
I also have a numpy array q of shape (EC + H, F), and I want to assign its values to the column 'Q' in this way:
if df['HZ'] >= EC, then df['Q'] = q[df['HZ']][df['FL'] - 1].
You can find below the pandas instruction that does exactly what I want to do.
df.loc[df['HZ'] >= EC, 'Q'] = q[df.loc[df['HZ'] >= EC, 'HZ'], df.loc[df['HZ'] >= EC, 'FL'] - 1]
Now I want to do it using polars, and I tried to do it this way:
df = df.with_columns(pl.when(pl.col('HZ') >= EC).then(q[pl.col('HZ')][pl.col('FL') - 1]).otherwise(pl.col('Q')).alias('Q'))
And I get the following error :
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
I understand that I am not passing numpy indices in a format it accepts, but I don't know what to use instead to get the desired behavior.
Thanks in advance
Answers:
By test case/example I meant something like:
import numpy as np
import polars as pl

df = pl.DataFrame({
    "HZ": [0, 0, 1, 1],
    "FL": [0, 1, 2, 3],
    "Q": [0, 0, 0, 0]
})
q = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
EC = 1
>>> df
shape: (4, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 0   │
│ 0   ┆ 1   ┆ 0   │
│ 1   ┆ 2   ┆ 0   │
│ 1   ┆ 3   ┆ 0   │
└─────┴─────┴─────┘
The problem with your attempted approach is that q[pl.col('HZ')] is evaluated by Python before .with_columns executes, and numpy does not understand pl.col('HZ'): it is an expression object, not an array of integer values.
One way to use the actual values to index the numpy array from inside an expression is pl.map (renamed pl.map_batches in later polars releases):
df.with_columns(Q =
    pl.when(pl.col("HZ") >= EC)
      .then(
          pl.map(
              ["HZ", pl.col("FL") - 1],
              lambda cols: q[cols[0], cols[1]])
          .flatten())
      .otherwise(pl.col("Q")))
shape: (4, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 0   │
│ 0   ┆ 1   ┆ 0   │
│ 1   ┆ 2   ┆ 6   │
│ 1   ┆ 3   ┆ 7   │
└─────┴─────┴─────┘
This is slightly awkward to do. It would probably be better to have the array data in a format that polars handles natively, e.g. another dataframe:
df_q = pl.DataFrame(
    ((row, col, value) for (row, col), value in np.ndenumerate(q)),
    schema=["HZ", "FL", "Q"]
)
>>> df_q
shape: (8, 3)
┌─────┬─────┬─────┐
│ HZ  ┆ FL  ┆ Q   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 0   ┆ 1   │
│ 0   ┆ 1   ┆ 2   │
│ 0   ┆ 2   ┆ 3   │
│ 0   ┆ 3   ┆ 4   │
│ 1   ┆ 0   ┆ 5   │
│ 1   ┆ 1   ┆ 6   │
│ 1   ┆ 2   ┆ 7   │
│ 1   ┆ 3   ┆ 8   │
└─────┴─────┴─────┘
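This would allow you to match values with a more conventional approach, such as a .join (FL is shifted by 1 because the dataframe's 'FL' is 1-based while the array's column index is 0-based):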
df.join(df_q.with_columns(pl.col("FL") + 1), on=["HZ", "FL"], how="left")
shape: (4, 4)
┌─────┬─────┬─────┬─────────┐
│ HZ  ┆ FL  ┆ Q   ┆ Q_right │
│ --- ┆ --- ┆ --- ┆ ---     │
│ i64 ┆ i64 ┆ i64 ┆ i64     │
╞═════╪═════╪═════╪═════════╡
│ 0   ┆ 0   ┆ 0   ┆ null    │
│ 0   ┆ 1   ┆ 0   ┆ 1       │
│ 1   ┆ 2   ┆ 0   ┆ 6       │
│ 1   ┆ 3   ┆ 0   ┆ 7       │
└─────┴─────┴─────┴─────────┘