How to aggregate multiple tasks into a single Python function?

Question:

I’m working on a DataFrame that I have been able to clean by running the following code in separate cells in a Jupyter notebook. However, I need to run these same tasks on several DataFrames that are organized exactly the same way. How can I write a function that executes tasks 2 through 4 below?

For reference, the data I’m working with is located here.

[1]: df1 = pd.read_csv('202110-divvy-tripdata.csv')

[2]: df1.drop(columns=['start_station_name','start_station_id','end_station_name','end_station_id','start_lat','start_lng','end_lat','end_lng'],inplace=True)

[3]: df1['ride_length'] = pd.to_datetime(df1.ended_at) - pd.to_datetime(df1.started_at)

[4]: df1['day_of_week'] = pd.to_datetime(df1.started_at).dt.day_name()

Asked By: BannyM


Answers:

You can define a function in a cell in Jupyter, run this cell and then call the function:

def process_df(df):
    # Operate on the argument df, not on the global df1
    df['ride_length'] = pd.to_datetime(df.ended_at) - pd.to_datetime(df.started_at)
    df['day_of_week'] = pd.to_datetime(df.started_at).dt.day_name()

Call the function with each DataFrame:

df1 = pd.read_csv('data1.csv')
df2 = pd.read_csv('data2.csv')

process_df(df1)
process_df(df2)

According to this answer, both DataFrames are altered in place, so there’s no need to return a new object from the function.
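
The function above covers steps [3] and [4] but not the column drop in step [2]. As a rough sketch of one way to bundle all three steps, the version below returns a new DataFrame instead of modifying its input. The names clean_trips, UNUSED_COLUMNS and the small sample frame are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Columns dropped in step [2] of the question.
UNUSED_COLUMNS = [
    'start_station_name', 'start_station_id',
    'end_station_name', 'end_station_id',
    'start_lat', 'start_lng', 'end_lat', 'end_lng',
]


def clean_trips(df):
    """Apply steps [2] to [4] and return a new DataFrame (input is left untouched)."""
    # errors='ignore' skips any listed column that is missing from df.
    out = df.drop(columns=UNUSED_COLUMNS, errors='ignore')
    # Parse the timestamps once and reuse them for both derived columns.
    started = pd.to_datetime(out['started_at'])
    ended = pd.to_datetime(out['ended_at'])
    out['ride_length'] = ended - started
    out['day_of_week'] = started.dt.day_name()
    return out


# Tiny made-up frame standing in for one of the CSV files.
sample = pd.DataFrame({
    'started_at': ['2021-10-01 08:00:00', '2021-10-02 09:30:00'],
    'ended_at': ['2021-10-01 08:15:00', '2021-10-02 10:00:00'],
    'start_lat': [41.9, 41.8],
})
cleaned = clean_trips(sample)
```

With real files, the same function could be applied to each one in turn, for example cleaned_frames = [clean_trips(pd.read_csv(p)) for p in paths], where paths is a hypothetical list of CSV file names. Returning a new object is a design choice here; the in-place approach from the answer works as well.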

Answered By: dor132