Iterate through chunks of a pandas DataFrame
Question:
I have a pandas.DataFrame that looks like the following:
Week | Monday | Tuesday | Wednesday | Thursday | Friday
---|---|---|---|---|---
City A | 100 | 300 | x | z | w
City B | 200 | 400 | y | q | p
None | None | None | None | None | None
Week | Monday | Tuesday | Wednesday | Thursday | Friday
City A | 150 | 320 | a | c | e
City B | 210 | 470 | z | t | q
City C | 260 | 446 | b | d | f
None | None | None | None | None | None
This repeats until all weeks in a year are covered (it’s basically a weekly calendar with data in it).
I wish to loop through the DataFrame in chunks, and do some operations with the data within those chunks.
The chunks should be basically "Week-to-Week"-high and "Week-to-Friday"-wide, if that makes sense. However, as you can see, the chunks are not equally large, so I can't hard-code the size to be 4×6, for example. They do, however, always run from one "Week" row to the next and extend as far right as "Friday".
Is there any intuitive way I can iterate through my DataFrame? Any help is appreciated.
Answers:
You can try:
df['week_index'] = df.isna().all(axis='columns').astype(int).cumsum()

for _, df_chunk in df.groupby('week_index'):
    # do something
To make each chunk start at a "Week" header row instead (week-to-week), shift the marker down by one row so each all-NaN separator stays with the block it closes:

df['week_index'] = df.isna().all(axis='columns').astype(int).shift(1, fill_value=0).cumsum()

for _, df_chunk in df.groupby('week_index'):
    # process each chunk
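For instance, here is a minimal, self-contained sketch of the week-to-week variant, using a small hand-built frame that stands in for your data, to show what each chunk contains:

import pandas as pd
import numpy as np

# Small stand-in for the weekly calendar layout described in the question.
rows = [
    ["Week", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    ["City A", "100", "300", "x", "z", "w"],
    ["City B", "200", "400", "y", "q", "p"],
    [np.nan] * 6,
    ["Week", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    ["City A", "150", "320", "a", "c", "e"],
    ["City B", "210", "470", "z", "t", "q"],
    ["City C", "260", "446", "b", "d", "f"],
    [np.nan] * 6,
]
df = pd.DataFrame(rows)

# All-NaN separator rows mark the end of a week; shift(1) keeps each
# separator with the block it closes, so every chunk starts at a "Week" row.
df['week_index'] = (
    df.isna().all(axis='columns').astype(int).shift(1, fill_value=0).cumsum()
)

for week_index, df_chunk in df.groupby('week_index'):
    # Drop the separator row and the helper column before using the chunk.
    df_chunk = df_chunk.dropna(subset=[0]).drop(columns='week_index')
    print(f"Chunk {week_index}:")
    print(df_chunk, "\n")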
Reproducing your data with the following CSV file,
# data.csv
Week,Monday,Tuesday,Wednesday,Thursday,Friday
City A,100,300,x,z,w
City B,200,400,y,q,p
None,None,None,None,None,None
Week,Monday,Tuesday,Wednesday,Thursday,Friday
City A,150,320,a,c,e
City B,210,470,z,t,q
City C,260,446,b,d,f
None,None,None,None,None,None
you can do the following to clean the original dataset and turn it into something more useful for grouping and aggregation:
import pandas as pd

# Reproduce your data, then drop the all-NaN separator rows.
df = pd.read_csv("data.csv", header=None)
df = df.dropna()
print(df, "\n")

# Label rows by week number, and use this label as index.
df['WeekNumber'] = df[df[0] == "Week"].all(axis=1).cumsum().astype('category')
df = df.ffill()
df = df.set_index("WeekNumber")
print(df, "\n")

# Regroup the dataset by week number and reuse the header row in each group.
header = list(df.iloc[0])
df = (df.groupby("WeekNumber", observed=True, as_index=False)
        .apply(lambda x: x[1:])
        .reset_index(level=0, drop=True))
df.columns = header
print(df, "\n")

# The name "Week" in the original dataset is somewhat inaccurate, so
# rename the corresponding column.
df = df.rename({"Week": "City"}, axis=1)
print(df, "\n")

# Example aggregation.
print(df.groupby("WeekNumber", observed=True).agg({"Monday": "sum"}))
gives
                Monday
WeekNumber
1               100200
2            150210260
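Note that the weekday columns were read in as strings, so "sum" concatenates them ("100" + "200" gives "100200") rather than adding. If you want arithmetic totals instead, a minimal follow-up sketch (assuming Monday and Tuesday are the columns that actually hold numbers) is to coerce them with pd.to_numeric first:

# Assumption: only Monday and Tuesday contain numeric data in this example.
numeric_cols = ["Monday", "Tuesday"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(df.groupby("WeekNumber", observed=True).agg({"Monday": "sum"}))
# Now yields 300 and 620 for weeks 1 and 2 instead of concatenated strings.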