Determinate Consecutive Values (Invoices) Pandas
Question:
I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.
For example:
df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
Supplier Invoice
0 ABC 101
1 ABC 102
2 ABC 105
3 ABC 109
4 ABC 110
5 ABC 114
6 ABC 120
7 ABC 121
8 ABC 122
9 ABC 130
10 ABC 131
And what I want is a third column like this:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
EDIT
Thanks for your answers, this options works great, but when I tried it in a real database I realized that I need to compare the suppliers, if the next row is a new supplier the consecutive must be 0.
For example:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0
Thanks in advance!
Answers:
Make sure your ‘Invoice’ column is of a numeric type first. Then you can use:
df1['Consecutive'] = (df1.Invoice - df1.Invoice.shift(1) == 1).astype(int)
prints:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
using npwhere and diff
# take difference b/w consecutive rows using diff
# update to 0 when difference is not 1
df1['Consecutive']=(df1.groupby('Supplier')['Invoice']
.transform(lambda x: np.where(
x.astype(int).diff() !=1, 0, 1)) )
df1
df1
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0
df1['Consecutive']=df1.Invoice.astype(int).diff().eq(1).astype(int)
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
Thanks to all for your answers, here is the code for the case we have multiple suppliers.
df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','GHI']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131','132','133','134']
df1['Consecutive']= np.where((df1['Invoice'].astype('int64').diff() ==1)
& (df1.Supplier.eq(df1.Supplier.shift())), 1, 0)
Result:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0
I have a dataset with supplier and its invoices and I need to determinate which of the invoices are consecutives marking it with a 1 or a 0.
For example:
df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131']
Supplier Invoice
0 ABC 101
1 ABC 102
2 ABC 105
3 ABC 109
4 ABC 110
5 ABC 114
6 ABC 120
7 ABC 121
8 ABC 122
9 ABC 130
10 ABC 131
And what I want is a third column like this:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
EDIT
Thanks for your answers, this options works great, but when I tried it in a real database I realized that I need to compare the suppliers, if the next row is a new supplier the consecutive must be 0.
For example:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0
Thanks in advance!
Make sure your ‘Invoice’ column is of a numeric type first. Then you can use:
df1['Consecutive'] = (df1.Invoice - df1.Invoice.shift(1) == 1).astype(int)
prints:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
using npwhere and diff
# take difference b/w consecutive rows using diff
# update to 0 when difference is not 1
df1['Consecutive']=(df1.groupby('Supplier')['Invoice']
.transform(lambda x: np.where(
x.astype(int).diff() !=1, 0, 1)) )
df1
df1
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0
df1['Consecutive']=df1.Invoice.astype(int).diff().eq(1).astype(int)
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
Thanks to all for your answers, here is the code for the case we have multiple suppliers.
df1 = pd.DataFrame()
df1['Supplier'] = ['ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','ABC','DEF','DEF','GHI']
df1['Invoice'] = ['101','102','105','109','110','114','120','121','122','130','131','132','133','134']
df1['Consecutive']= np.where((df1['Invoice'].astype('int64').diff() ==1)
& (df1.Supplier.eq(df1.Supplier.shift())), 1, 0)
Result:
Supplier Invoice Consecutive
0 ABC 101 0
1 ABC 102 1
2 ABC 105 0
3 ABC 109 0
4 ABC 110 1
5 ABC 114 0
6 ABC 120 0
7 ABC 121 1
8 ABC 122 1
9 ABC 130 0
10 ABC 131 1
11 DEF 132 0
12 DEF 133 1
13 GHI 134 0