Polars: looping through the rows in a dataset

Question:

I am trying to loop through the rows of a Polars DataFrame using the following code:


import polars as pl

mydf = pl.DataFrame(
    {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
     "Name": ["John", "Joe", "James"]})

print(mydf)

┌────────────┬───────┐
│ start_date ┆ Name  │
│ ---        ┆ ---   │
│ str        ┆ str   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ John  │
│ 2020-01-03 ┆ Joe   │
│ 2020-01-04 ┆ James │
└────────────┴───────┘

for row in mydf.rows():
    print(row)

('2020-01-02', 'John')
('2020-01-03', 'Joe')
('2020-01-04', 'James')

Is there a way to reference 'Name' by its column name rather than by its position in the tuple? In pandas this would look something like:

import pandas as pd

mydf = pd.DataFrame(
    {"start_date": ["2020-01-02", "2020-01-03", "2020-01-04"],
     "Name": ["John", "Joe", "James"]})

for index, row in mydf.iterrows():
    print(row['Name'])

John
Joe
James
Asked By: John Smith


Answers:

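You can select the column by name and iterate over its values. Note that iterating over a DataFrame (including the result of select) yields its columns as Series, not its rows, so take the Series itself:

for name in mydf["Name"]:
    print(name)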
Answered By: Kien Truong

You can specify that you want the rows to be named:

for row in mydf.rows(named=True):
    print(row)

It will give you a dict:

{'start_date': '2020-01-02', 'Name': 'John'}
{'start_date': '2020-01-03', 'Name': 'Joe'}
{'start_date': '2020-01-04', 'Name': 'James'}

You can then access row['Name'].

Note that:

  • previous versions returned a namedtuple instead of a dict.
  • iter_rows is less memory intensive, because it does not materialise every row up front the way rows() does.
  • overall, iterating through the data this way is not recommended.

Row iteration is not optimal as the underlying data is stored in columnar form; where possible, prefer export via one of the dedicated export/output methods.

Answered By: 0x26res