Count up if value changes in other dataframe columns
Question:
My input is:
d = {'x': [1479903, 1479879, 1479926, 1479736, 1479760, 1479784, 1479808, 1479831, 1480331, 1480355, 1480949, 1480973, 1480997, 1481021, 1481877, 1481901, 1481924, 1481948, 1481996, 1482020], 'y': [7195293, 7195293, 7195293, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7193920, 7193920], 'group':[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]}
df = pd.DataFrame(data=d)
df = df.sort_values(by = ['group', 'y','x'], ascending = [True, False, True])
The dataframe is already sorted. The desired output is to create a column called ‘subgroup’ that starts at 1 and then only counts up when the y-value changes and starts over at 1 when the group number changes. Also create another column called ‘count’ that counts up until the ‘subgroup’ changes, when ‘subgroup’ changes then the ‘count’ starts over at 1. The desired output should look like:
group
subgroup
count
x
y
1
1
1
1479879
7195293
1
1
2
1479903
7195293
1
1
3
1479926
7195293
1
2
1
1479736
7194829
1
2
2
1479760
7194829
1
2
3
1479784
7194829
1
2
4
1479808
7194829
1
2
5
1479831
7194829
2
1
1
1480331
7194829
2
1
2
1480355
7194829
2
2
1
1480949
7194384
2
2
2
1480973
7194384
2
2
3
1480997
7194384
2
2
4
1481021
7194384
3
1
1
1481877
7194384
3
1
2
1481901
7194384
3
1
3
1481924
7194384
3
1
4
1481948
7194384
3
2
1
1481996
7193920
3
2
2
1482020
7193920
Thanks in advance!
Answers:
Try:
df['subgroup new'] = df.groupby('group').apply(lambda x: (x['y'] != x['y'].shift()).cumsum()).values
df['count new'] = df.groupby(['group', 'subgroup new']).cumcount() + 1
print(df)
Prints:
group subgroup count x y subgroup new count new
0 1 1 1 1479879 7195293 1 1
1 1 1 2 1479903 7195293 1 2
2 1 1 3 1479926 7195293 1 3
3 1 2 1 1479736 7194829 2 1
4 1 2 2 1479760 7194829 2 2
5 1 2 3 1479784 7194829 2 3
6 1 2 4 1479808 7194829 2 4
7 1 2 5 1479831 7194829 2 5
8 2 1 1 1480331 7194829 1 1
9 2 1 2 1480355 7194829 1 2
10 2 2 1 1480949 7194384 2 1
11 2 2 2 1480973 7194384 2 2
12 2 2 3 1480997 7194384 2 3
13 2 2 4 1481021 7194384 2 4
14 3 1 1 1481877 7194384 1 1
15 3 1 2 1481901 7194384 1 2
16 3 1 3 1481924 7194384 1 3
17 3 1 4 1481948 7194384 1 4
18 3 2 1 1481996 7193920 2 1
19 3 2 2 1482020 7193920 2 2
My input is:
d = {'x': [1479903, 1479879, 1479926, 1479736, 1479760, 1479784, 1479808, 1479831, 1480331, 1480355, 1480949, 1480973, 1480997, 1481021, 1481877, 1481901, 1481924, 1481948, 1481996, 1482020], 'y': [7195293, 7195293, 7195293, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7193920, 7193920], 'group':[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]}
df = pd.DataFrame(data=d)
df = df.sort_values(by = ['group', 'y','x'], ascending = [True, False, True])
The dataframe is already sorted. The desired output is to create a column called ‘subgroup’ that starts at 1 and then only counts up when the y-value changes and starts over at 1 when the group number changes. Also create another column called ‘count’ that counts up until the ‘subgroup’ changes, when ‘subgroup’ changes then the ‘count’ starts over at 1. The desired output should look like:
group | subgroup | count | x | y |
---|---|---|---|---|
1 | 1 | 1 | 1479879 | 7195293 |
1 | 1 | 2 | 1479903 | 7195293 |
1 | 1 | 3 | 1479926 | 7195293 |
1 | 2 | 1 | 1479736 | 7194829 |
1 | 2 | 2 | 1479760 | 7194829 |
1 | 2 | 3 | 1479784 | 7194829 |
1 | 2 | 4 | 1479808 | 7194829 |
1 | 2 | 5 | 1479831 | 7194829 |
2 | 1 | 1 | 1480331 | 7194829 |
2 | 1 | 2 | 1480355 | 7194829 |
2 | 2 | 1 | 1480949 | 7194384 |
2 | 2 | 2 | 1480973 | 7194384 |
2 | 2 | 3 | 1480997 | 7194384 |
2 | 2 | 4 | 1481021 | 7194384 |
3 | 1 | 1 | 1481877 | 7194384 |
3 | 1 | 2 | 1481901 | 7194384 |
3 | 1 | 3 | 1481924 | 7194384 |
3 | 1 | 4 | 1481948 | 7194384 |
3 | 2 | 1 | 1481996 | 7193920 |
3 | 2 | 2 | 1482020 | 7193920 |
Thanks in advance!
Try:
df['subgroup new'] = df.groupby('group').apply(lambda x: (x['y'] != x['y'].shift()).cumsum()).values
df['count new'] = df.groupby(['group', 'subgroup new']).cumcount() + 1
print(df)
Prints:
group subgroup count x y subgroup new count new
0 1 1 1 1479879 7195293 1 1
1 1 1 2 1479903 7195293 1 2
2 1 1 3 1479926 7195293 1 3
3 1 2 1 1479736 7194829 2 1
4 1 2 2 1479760 7194829 2 2
5 1 2 3 1479784 7194829 2 3
6 1 2 4 1479808 7194829 2 4
7 1 2 5 1479831 7194829 2 5
8 2 1 1 1480331 7194829 1 1
9 2 1 2 1480355 7194829 1 2
10 2 2 1 1480949 7194384 2 1
11 2 2 2 1480973 7194384 2 2
12 2 2 3 1480997 7194384 2 3
13 2 2 4 1481021 7194384 2 4
14 3 1 1 1481877 7194384 1 1
15 3 1 2 1481901 7194384 1 2
16 3 1 3 1481924 7194384 1 3
17 3 1 4 1481948 7194384 1 4
18 3 2 1 1481996 7193920 2 1
19 3 2 2 1482020 7193920 2 2