Pandas labeling in a for loop

Question

Here's the thing: I want to add a new column that labels a selection of rows.

When `failure` is 1, I want to select the 2 rows before it and the 1 row after it, then add a label column to those rows. Here is my attempt:

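```python
import pandas as pd

n = 0  # label counter, not initialised in the original snippet
df_new = pd.DataFrame()

for i in range(0, len(df)):
    if df.iloc[i]['failure'] == 1:
        n += 1
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df_new = pd.concat([df_new, df.iloc[i-2:i+2]])   # 2 rows before, the failure, 1 after
        df_new = pd.concat([df_new, pd.DataFrame([{'label': n}])], ignore_index=True)
```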

The result of that:

```
   var_1  var_2  failure  label
0   75.0   55.0      0.0    NaN
1   45.0   19.0      0.0    NaN
2   76.0   46.0      1.0    NaN
3   18.0   63.0      0.0    NaN
4    NaN    NaN      NaN    1.0
```

But what I want is:

```
   var_1  var_2  failure  label
0   75.0   55.0      0.0      1
1   45.0   19.0      0.0      1
2   76.0   46.0      1.0      1
3   18.0   63.0      0.0      1
```
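A side issue in the loop above, independent of the NaN labels: when a failure falls in the first two rows, `i - 2` is negative, and `iloc` treats a negative start as an offset from the end of the frame, so the window comes back empty instead of truncated. A small sketch on a hypothetical five-row frame, clamping the start with `max(i - 2, 0)`:

```python
import pandas as pd

# hypothetical toy frame with a failure in the very first row
df = pd.DataFrame({'failure': [1, 0, 0, 0, 0]})
i = 0

# i - 2 == -2, so iloc starts at row 3 and stops before row 2: an empty slice
naive = df.iloc[i - 2:i + 2]
# clamping the start at 0 keeps the rows that do exist (rows 0 and 1)
clamped = df.iloc[max(i - 2, 0):i + 2]

print(len(naive), len(clamped))  # 0 2
```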

Asked By: Control Solution


Answer 1

Instead of a for loop, a more idiomatic pandas approach is to first compute a rolling sum of `failure` as a Series, then use it to build the label column with a condition.

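For example, `signal = df['failure'].rolling(window=4).sum().shift(-2)`. The rolling sum at row j counts failures in rows j-3 to j, so shifting it by -2 makes row k count failures in rows k-1 to k+2, which are exactly the failures whose window (2 rows before, 1 after) contains row k. (Double-check the offset against your own window definition; note that the first row and the last two rows come out as NaN.)
Then you can create `df['label'] = np.where(signal >= 1, 1, 0)`. Using `>= 1` rather than `== 1` keeps rows that fall inside two overlapping windows, and the NaN rows at the edges become 0.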
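The windowed-sum idea above can be checked on a small frame. This sketch writes the same 4-row window as an explicit sum of shifted copies of `failure` rather than `rolling`, using `Series.shift(..., fill_value=0)` so that rows at either end are not left as NaN (which `rolling(...).sum().shift(...)` does). The toy data is hypothetical, and like the answer it produces a 0/1 flag, not a separate number per failure.

```python
import numpy as np
import pandas as pd

# hypothetical toy data: failures at rows 2 and 7
df = pd.DataFrame({'failure': [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]})
f = df['failure']

# row k belongs to the window of a failure at row i when i-2 <= k <= i+1,
# i.e. when there is a failure at row k-1, k, k+1 or k+2.
# fill_value=0 treats rows past either end of the frame as "no failure".
signal = (
    f.shift(1, fill_value=0)     # failure one row above
    + f                          # failure on this row
    + f.shift(-1, fill_value=0)  # failure one row below
    + f.shift(-2, fill_value=0)  # failure two rows below
)

# >= 1 so rows covered by two overlapping windows are still labelled
df['label'] = np.where(signal >= 1, 1, 0)
print(df['label'].tolist())  # [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
```

Rows 0 to 3 surround the failure at row 2 and rows 5 to 8 surround the one at row 7, so only rows 4 and 9 stay unlabelled.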

Does that fit what you need?

Answered By: user5002062

Answer 2

For this dataset: 10,000 rows, six `var_*` columns of random integers from 0 to 99, and a `failure` column that is 1 with probability 0.1 and 0 otherwise:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(10000, 6)), columns=['var_' + str(i) for i in range(1, 7)])
# failure is 1 for roughly 10% of rows
df['failure'] = np.random.binomial(1, 0.1, size=10000)
```

When `failure` is 1, select the 2 rows before it and the 1 row after it, then add a label column:

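```python
n = 0
df_new = pd.DataFrame()

for i in range(len(df)):
    if df.iloc[i]['failure'] == 1:
        n += 1
        # rows i-2 .. i+1; max() stops a negative start wrapping to the end of df
        df_new = pd.concat([df_new, df.iloc[max(i - 2, 0):i + 2]])
        # marker row that carries only this window's label
        df_new = pd.concat([df_new, pd.DataFrame({'label': [n]})], ignore_index=True)

# copy each marker's label back onto the rows of its window...
df_new['label'] = df_new['label'].bfill()
# ...then drop the marker rows, whose data columns are all NaN
df_new = df_new.dropna().astype({'label': int})
```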
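A minimal sketch of the same loop without the marker rows and the backfill step, assuming the goal is this answer's output (one numbered label per failure, with overlapping windows producing duplicated rows): attach the label to each slice with `DataFrame.assign` and concatenate once at the end. The toy frame is hypothetical.

```python
import numpy as np
import pandas as pd

# hypothetical toy data standing in for the 10,000-row frame
df = pd.DataFrame({
    'var_1': [75, 45, 76, 18, 30, 12, 90, 5],
    'failure': [0, 0, 1, 0, 0, 0, 1, 0],
})

windows = []
n = 0
# positions of the failure rows
for i in np.flatnonzero(df['failure'].to_numpy() == 1):
    n += 1
    # rows i-2 .. i+1 with the label attached to the slice itself,
    # so no separate label-only row is ever created
    windows.append(df.iloc[max(i - 2, 0):i + 2].assign(label=n))

# one concat at the end instead of growing a DataFrame inside the loop
df_new = pd.concat(windows)
print(df_new)
```

The original index is kept so each labelled row can be traced back to `df`. If `df` has no failures, `windows` is empty and `pd.concat` raises a `ValueError`, so that case needs a guard in real use.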

Answered By: Control Solution
