python pandas dataframe, add column and tag an adjusted and inserted row

Question:

I have the following data-frame

import pandas as pd
df = pd.DataFrame()
df['number'] = (651,651,651,4267,4267,4267,4267,4267,4267,4267,8806,8806,8806,6841,6841,6841,6841)
df['name']=('Alex','Alex','Alex','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Ankit','Abhishek','Abhishek','Abhishek','Blake','Blake','Blake','Blake')
df['hours']=(8.25,7.5,7.5,7.5,14,12,15,11,6.5,14,15,15,13.5,8,8,8,8)
df['loc']=('Nar','SCC','RSL','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNIT-C','UNI','UNI','UNI','UNKING','UNKING','UNKING','UNKING')
print(df)

If the running balance of an individuals hours reach 38 an adjustment to the cell that reached the 38th hour is made, a duplicate row is inserted and the balance of hours is added to the following row. The following code performs this and the difference in output of original data to adjusted data can be seen.

s = df.groupby('number')['hours'].cumsum()
m = s.gt(38)
idx = m.groupby(df['number']).idxmax()
delta = s.groupby(df['number']).shift().rsub(38).fillna(s)
out = df.loc[df.index.repeat((df.index.isin(idx)&m)+1)]
out.loc[out.index.duplicated(keep='last'), 'hours'] = delta
out.loc[out.index.duplicated(), 'hours'] -= delta
print(out)

For the row that got adjusted and the row that got inserted I need to tag them via inserting another column and adding a character such as an ‘x’ to highlight the adjusted and inserted row

Asked By: ds882

||

Answers:

As you duplicate index, you can use out.index.duplicated as boolean mask:

# or out['mod'] = np.where(out.index.duplicated(keep=False), 'x', '-')
out.loc[out.index.duplicated(keep=False), 'mod'] = 'x'
print(out)

# Output
    number      name  hours     loc  mod
0      651      Alex   8.25     Nar  NaN
1      651      Alex   7.50     SCC  NaN
2      651      Alex   7.50     RSL  NaN
3     4267     Ankit   7.50  UNIT-C  NaN
4     4267     Ankit  14.00  UNIT-C  NaN
5     4267     Ankit  12.00  UNIT-C  NaN
6     4267     Ankit   4.50  UNIT-C    x  # index 6
6     4267     Ankit  10.50  UNIT-C    x  # twice
7     4267     Ankit  11.00  UNIT-C  NaN
8     4267     Ankit   6.50  UNIT-C  NaN
9     4267     Ankit  14.00  UNIT-C  NaN
10    8806  Abhishek  15.00     UNI  NaN
11    8806  Abhishek  15.00     UNI  NaN
12    8806  Abhishek   8.00     UNI    x  # index 12
12    8806  Abhishek   5.50     UNI    x  # twice
13    6841     Blake   8.00  UNKING  NaN
14    6841     Blake   8.00  UNKING  NaN
15    6841     Blake   8.00  UNKING  NaN
16    6841     Blake   8.00  UNKING  NaN
Answered By: Corralien
Categories: questions Tags: , ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.