Unique count of the combined values in 2 or more rows in pandas
Question:
Is it possible to get a unique count of the values across 2 (or more) rows of a dataframe? I was able to get the per-row unique count of the first 6 columns with df['count'] = df.iloc[:, 0:6].nunique(axis=1). However, I can't figure out (or find) how to get the unique count of those 6 columns across both (or multiple) rows combined.
original df:
there are 3 unique values in each row: 7,4,2 and 8,5,6
╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦═══════╗
║ hf0 ║ hf1 ║ hf2 ║ hf3 ║ hf4 ║ hf5 ║ sample ║ count ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬═══════╣
║ 7 ║ 4 ║ 2 ║ 2 ║ 7 ║ 2 ║ 7yr ║ 3 ║
║ 8 ║ 5 ║ 5 ║ 6 ║ 5 ║ 6 ║ 7yr ║ 3 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩═══════╝
df trying to get:
there are 6 unique values for both rows: 7,4,2,8,5,6
╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦════════╦════════════╗
║ hf0 ║ hf1 ║ hf2 ║ hf3 ║ hf4 ║ hf5 ║ sample ║ count ║ count2rows ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬════════╬════════════╣
║ 7 ║ 4 ║ 2 ║ 2 ║ 7 ║ 2 ║ 7yr ║ 3 ║ 6 ║
║ 8 ║ 5 ║ 5 ║ 6 ║ 5 ║ 6 ║ 7yr ║ 3 ║ 6 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩════════╩════════════╝
code for sample df:
import pandas as pd
import numpy as np
data = {'hf0':[7,8],'hf1':[4,5], 'hf2':[2,5],'hf3':[2,6],'hf4':[7,5],'hf5':[2,6],'sample':['7yr','7yr']}
df = pd.DataFrame(data)
df['count'] = df.iloc[:, 0:6].nunique(axis=1)
df
Thanks in advance
Answers:
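You could use NumPy to get that done: np.unique flattens the selected columns and returns their distinct values, so its length is the combined unique count.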
import numpy as np
df['count2rows'] = len(np.unique(df.filter(like='hf').values))
print(df)
Result
hf0 hf1 hf2 hf3 hf4 hf5 sample count count2rows
0 7 4 2 2 7 2 7yr 3 6
1 8 5 5 6 5 6 7yr 3 6
df
###
hf0 hf1 hf2 hf3 hf4 hf5 sample
0 7 4 2 2 7 2 7yr
1 8 5 5 6 5 6 7yr
2 9 6 8 10 7 10 7yr
3 10 7 11 14 5 14 7yr
4 11 8 14 18 7 18 7yr
Rolling window rolling=2
rolling = 2
ar = df.loc[:,'hf0':'hf5'].values
length = ar.shape[1]
head_arrs = np.zeros((rolling-1, rolling*length))
cubic = np.lib.stride_tricks.sliding_window_view(ar, (rolling,length)).astype(float)
plane = cubic.reshape(-1,rolling*length)
# Build the leading partial windows (fewer than `rolling` rows), left-padded with NaN
for i in range(rolling-1,0,-1):
    head_arr_l = plane[0,:i*length]
    head_arr_l = np.pad(head_arr_l.astype(float), (0,length*(rolling-i)), 'constant', constant_values=np.nan)
    head_arr_l = np.roll(head_arr_l, length*(rolling-i))
    head_arrs[i-1,:] = head_arr_l
plane = np.insert(plane, 0, head_arrs, axis=0)
df['rolling_nunique'] = pd.DataFrame(plane).nunique(axis=1)
df
###
hf0 hf1 hf2 hf3 hf4 hf5 sample rolling_nunique
0 7 4 2 2 7 2 7yr 3
1 8 5 5 6 5 6 7yr 6
2 9 6 8 10 7 10 7yr 6
3 10 7 11 14 5 14 7yr 8
4 11 8 14 18 7 18 7yr 7
Rolling window rolling=3
hf0 hf1 hf2 hf3 hf4 hf5 sample rolling_nunique
0 7 4 2 2 7 2 7yr 3
1 8 5 5 6 5 6 7yr 6
2 9 6 8 10 7 10 7yr 8
3 10 7 11 14 5 14 7yr 8
4 11 8 14 18 7 18 7yr 9
If the rolling window covers all rows of df (in this case, df.shape[0] = 5), each row's count is cumulative over every row up to and including it:
Rolling window rolling=5
hf0 hf1 hf2 hf3 hf4 hf5 sample rolling_nunique
0 7 4 2 2 7 2 7yr 3
1 8 5 5 6 5 6 7yr 6
2 9 6 8 10 7 10 7yr 8
3 10 7 11 14 5 14 7yr 10
4 11 8 14 18 7 18 7yr 11
You could use stack(), which flattens the selected columns into a single Series, and then call nunique() on it:
df['count_2 rows'] = df.iloc[:, 0:6].stack().nunique()
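As a rough check, the sketch below runs the stack() approach and the np.unique approach side by side on the sample frame from the question. The second half is a hypothetical variation that is not part of the original question: one cell is set to NaN to show that, as far as I know, nunique() skips NaN by default while np.unique reports it as a value of its own.

```python
import numpy as np
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    'hf0': [7, 8], 'hf1': [4, 5], 'hf2': [2, 5],
    'hf3': [2, 6], 'hf4': [7, 5], 'hf5': [2, 6],
    'sample': ['7yr', '7yr'],
})
hf = df.filter(like='hf')  # keeps hf0..hf5, drops 'sample'

via_stack = hf.stack().nunique()           # flatten to one Series, count distinct
via_numpy = len(np.unique(hf.to_numpy()))  # flatten the array, count distinct
print(via_stack, via_numpy)  # 6 6

# Hypothetical variation: blank out the only 4 to see how NaN is treated.
hf_nan = hf.astype(float)
hf_nan.iloc[0, 1] = np.nan
nan_stack = hf_nan.stack().nunique()           # NaN is not counted
nan_numpy = len(np.unique(hf_nan.to_numpy()))  # NaN is counted as a value
print(nan_stack, nan_numpy)  # 5 6
```

On data without missing values the two give the same number, so the choice is mostly a matter of taste.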
If you would like to use rolling functionality you could use rolling() and iterate over the output.
n = 3
df['count_2 rows'] = [i.stack().nunique() for i in df.iloc[:,:6].rolling(n)]
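Iterating over rolling() needs a pandas version that supports it. A minimal sketch of the same idea with plain iloc slicing is given below; the helper name rolling_nunique is made up for illustration, and the data is the five-row frame from the NumPy rolling answer above.

```python
import pandas as pd

def rolling_nunique(frame, window):
    """Count distinct values over each row plus up to window-1 rows before it."""
    counts = []
    for end in range(1, len(frame) + 1):
        start = max(0, end - window)  # the first windows are shorter
        counts.append(frame.iloc[start:end].stack().nunique())
    return counts

# Five-row frame from the earlier answer.
df = pd.DataFrame({
    'hf0': [7, 8, 9, 10, 11],
    'hf1': [4, 5, 6, 7, 8],
    'hf2': [2, 5, 8, 11, 14],
    'hf3': [2, 6, 10, 14, 18],
    'hf4': [7, 5, 7, 5, 7],
    'hf5': [2, 6, 10, 14, 18],
    'sample': ['7yr'] * 5,
})
hf = df.loc[:, 'hf0':'hf5']

df['rolling_nunique'] = rolling_nunique(hf, 2)
print(df['rolling_nunique'].tolist())  # [3, 6, 6, 8, 7]
print(rolling_nunique(hf, 3))          # [3, 6, 8, 8, 9]
```

This loops in Python, so it will likely be slower than the sliding_window_view version on large frames, but it is easier to read and reproduces the same counts.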