Changing column values in Python

Question:

I have a dataframe that looks like this:

Col1 Col2
0.3 1
0.22 0
0.89 0
0.12 1
0.54 0
0.11 1

Assume that this dataset is sorted by time, so df.iloc[1] comes before df.iloc[2]. Also assume that Col2 is binary.
What I would like to do is change each Col2 value as follows:
df.iloc[i]['Col2'] becomes 1 if either of the next 2 samples in the dataframe is 1, else 0. The last 2 rows of the dataframe are left unchanged.
For example, the result here would be:

Col1 Col2
0.3 0
0.22 1
0.89 1
0.12 1
0.54 1
0.11 1

What I have done so far:

col = df.columns.get_loc('Col2')
for i in range(df.shape[0] - 2):
   # look at the next two rows (i+1 and i+2), not row i itself
   df.iloc[i, col] = df.iloc[i + 1:i + 3, col].max()

I think the code works correctly, but since my dataset is very large, it takes too long to run. Is there a more elegant and computationally efficient solution?

Asked By: Los

||

Answers:

Yes, there is a more efficient way to achieve the same result using the rolling and max functions in pandas. Here’s an example:

import pandas as pd

# Create the sample dataframe
data = {'Col1': [0.3, 0.22, 0.89, 0.12, 0.54, 0.11], 'Col2': [1, 0, 0, 1, 0, 1]}
df = pd.DataFrame(data)

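# Use rolling and max to update Col2
# rolling(2) shifted up by 2 covers rows i+1 and i+2; row i itself is excluded
df['Col2'] = df['Col2'].rolling(2).max().shift(-2).fillna(df['Col2']).astype(int)

print(df)

This code creates a rolling window of size 2 on column Col2, takes the maximum within each window, and shifts the resulting series up by 2 rows, so that row i receives the maximum of rows i+1 and i+2. A window of size 3 would also include row i itself, which would leave the first row at 1 instead of the 0 you expect. The last two rows become NaN after the shift and are filled with their original values by fillna; astype(int) converts the column back from float to integer. Because the last two rows are left unchanged, row 4 stays 0 here, although your example output shows 1 for it.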
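The same look-ahead rule can also be spelled out with explicit shifts, which may be easier to verify by eye than a rolling window. The following is only a minimal sketch: the helper name flag_if_next_k_has_one and its parameter k are made up for illustration, and it assumes Col2 is binary with no missing values.

```python
import pandas as pd

def flag_if_next_k_has_one(col: pd.Series, k: int = 2) -> pd.Series:
    """Return 1 where any of the next k values is 1; keep the last k rows as-is.

    Hypothetical helper for illustration; assumes `col` has no missing values.
    """
    # shift(-j) aligns the value j rows ahead with the current row
    ahead = pd.DataFrame({j: col.shift(-j) for j in range(1, k + 1)})
    # Row-wise maximum over the look-ahead columns (NaN is skipped by default)
    result = ahead.max(axis=1)
    # The last k rows lack a full look-ahead window, so restore their originals
    result.iloc[-k:] = col.iloc[-k:].to_numpy()
    return result.astype(col.dtype)

df = pd.DataFrame({
    'Col1': [0.3, 0.22, 0.89, 0.12, 0.54, 0.11],
    'Col2': [1, 0, 0, 1, 0, 1],
})
out = flag_if_next_k_has_one(df['Col2'], k=2)
print(out.tolist())
```

Writing the look-ahead as a parameter makes it straightforward to try a different horizon (for example k=3) without reworking window sizes and shift offsets by hand.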

This approach should be much faster than a for loop on large datasets, because the work is done in vectorized pandas operations rather than row by row in Python.
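Before relying on a vectorized version for a large dataset, it may be worth checking that it agrees with a plain loop on random input. The sketch below does that for one possible vectorized form, a size-2 rolling maximum shifted up by two rows. The function names loop_reference and vectorized, the seed, and the series length are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

def loop_reference(values):
    """Plain-Python version of the rule: 1 if either of the next two values is 1."""
    out = list(values)
    for i in range(len(values) - 2):
        # Reads only rows ahead of i, taken from the unmodified input
        out[i] = max(values[i + 1], values[i + 2])
    return out

def vectorized(col: pd.Series) -> pd.Series:
    # rolling(2) ending at row i+2 covers rows i+1 and i+2; shift(-2) moves it to row i.
    # The last two rows become NaN after the shift and fall back to the originals.
    return col.rolling(2).max().shift(-2).fillna(col).astype(col.dtype)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
col = pd.Series(rng.integers(0, 2, size=10_000), name='Col2')

fast = vectorized(col)
slow = loop_reference(col.tolist())
print(fast.tolist() == slow)
```

If the two disagree, the first differing index is usually enough to show whether the window size or the shift offset is off by one.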

Answered By: Ahmad Akel Omar