python-polars

Reading a csv in polars

Question: What is the difference between polars.read_csv, polars.read_csv_batched, and polars.scan_csv? polars.read_csv looks equivalent to pandas.read_csv, since they share a name. Which one should be used in which scenario, and how are they similar to or different from pandas.read_csv? Asked By: Goku – stands with Palestine. Answers: polars.read_csv_batched is pretty …

Total answers: 2

Run group_by_dynamic in polars but only on timestamp

Question: I have some dummy data like this: datetime,duration_in_traffic_s 2023-12-20T10:50:43.063641000,221.0 2023-12-20T10:59:09.884939000,219.0 2023-12-20T11:09:56.003331000,206.0 … more rows with different dates … Assume this data is stored in a file mwe.csv. Using polars, I now want to compute averages over the second column, grouped in one-hour chunks. I want …

Total answers: 1

How do I concatenate columns values (all but one) to a list and add it as a column with polars?

Question: I have the input in this format: import polars as pl data = {"Name": ['Name_A', 'Name_B', 'Name_C'], "val_1": ['a', None, 'a'], "val_2": [None, None, 'b'], "val_3": [None, 'c', None], "val_4": ['c', None, 'g'], "val_5": [None, None, 'i']} df = pl.DataFrame(data) print(df) shape: (3, 6) …

Total answers: 1

How to use with_columns in LazyGroupBy object in polars?

Question: I am trying to calculate the difference of a lagged variable, grouped by an id variable. However, when I tried to run the following code: ad.v2.groupby('id').with_columns( diff = pl.col('Movement_Time_clear') - pl.col('Movement_Time_clear').diff() ) an error was raised: Traceback (most recent call last): File "<stdin>", line 1, in <module> …

Total answers: 1

How to convert column to list in expressions in Polars?

Question: I was previously able to convert a column to a list, but this no longer works after the latest version update. import polars as pl df = pl.DataFrame( { "A": [1,4,4,7,7,10,10,13,16], "B": [2,5,5,8,18,11,11,14,17], "C": [3,6,6,9,9,12,12,15,18] } ) I have also referred to polars_list_link but below …

Total answers: 1

How to filter duplicates based on multiple columns in Polars?

Question: I was previously able to filter duplicates based on multiple columns using df.filter(pl.col(['A','C']).is_duplicated()), but after the latest version update this no longer works. import polars as pl df = pl.DataFrame( { "A": [1,4,4,7,7,10,10,13,16], "B": [2,5,5,8,18,11,11,14,17], "C": [3,6,6,9,9,12,12,15,18] } ) df.filter(pl.col(['A','C']).is_duplicated()) gives an error. df.filter(df.select( pl.col(['A','C']).is_duplicated() …

Total answers: 1

Why is Polars running my "then" function even if the "when" condition is false?

Question: Given this DataFrame: d1 = pl.DataFrame({ 'x': ['a', None, 'b'], 'y': [1.1, 2.2, 3.3] }) print(d1) shape: (3, 2) ┌──────┬─────┐ │ x ┆ y │ │ --- ┆ --- │ │ str ┆ f64 │ ╞══════╪═════╡ │ a ┆ 1.1 …

Total answers: 1

Split a parquet file by groups

Question: I have a large-ish dataframe in a Parquet file and I want to split it into multiple files to leverage Hive partitioning with pyarrow. Preferably without loading all data into memory. (This question has been asked before, but I have not found a solution that is both fast …

Total answers: 3

Convert column to Numpy within a select statement in Polars

Question: I am trying to transform a date column to the next business day after each date (if the date is already a business day, it remains unchanged). To do this I am using a NumPy function called busday_offset which takes a …

Total answers: 1