unique count for accumulative values in 2 or more rows pandas

Question:

Is it possible to get a unique count across the values of 2 (or more) rows in a dataframe? I was able to get the unique count of the first 6 columns of each individual row with df['count'] = df.iloc[:, 0:6].nunique(axis=1). However, I can't figure out (or find) how to get the unique count of the 6 columns across both (or multiple) rows combined.

original df:
there are 3 unique values in each row: 7,4,2 and 8,5,6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦═══════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬═══════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║     3 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║     3 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩═══════╝

df I am trying to get:
there are 6 unique values across both rows: 7,4,2,8,5,6

╔═════╦══════╦══════╦══════╦═════╦═════════╦════════╦════════╦════════════╗
║ hf0 ║ hf1  ║ hf2  ║ hf3  ║ hf4 ║ hf5     ║ sample ║ count  ║ count2rows ║
╠═════╬══════╬══════╬══════╬═════╬═════════╬════════╬════════╬════════════╣
║   7 ║    4 ║    2 ║    2 ║   7 ║       2 ║ 7yr    ║      3 ║          6 ║
║   8 ║    5 ║    5 ║    6 ║   5 ║       6 ║ 7yr    ║      3 ║          6 ║
╚═════╩══════╩══════╩══════╩═════╩═════════╩════════╩════════╩════════════╝

code for sample df:

import pandas as pd
import numpy as np
data = {'hf0':[7,8],'hf1':[4,5], 'hf2':[2,5],'hf3':[2,6],'hf4':[7,5],'hf5':[2,6],'sample':['7yr','7yr']}
df = pd.DataFrame(data)

df['count'] = df.iloc[:, 0:6].nunique(axis=1)
df

Thanks in advance

Asked By: ManOnTheMoon


Answers:

You could use NumPy's np.unique, which flattens the selected columns and returns their distinct values.

import numpy as np

df['count2rows'] = len(np.unique(df.filter(like='hf').values))
print(df)

Result

   hf0  hf1  hf2  hf3  hf4  hf5 sample  count  count2rows
0    7    4    2    2    7    2    7yr      3           6
1    8    5    5    6    5    6    7yr      3           6
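
For reference, here is a minimal self-contained sketch of the same idea, rebuilt on the sample frame from the question. The intermediate names `values` and `unique_values` are only there for illustration; they show that np.unique works on the flattened array, so a single number comes back and is assigned to every row.

```python
import numpy as np
import pandas as pd

data = {'hf0': [7, 8], 'hf1': [4, 5], 'hf2': [2, 5], 'hf3': [2, 6],
        'hf4': [7, 5], 'hf5': [2, 6], 'sample': ['7yr', '7yr']}
df = pd.DataFrame(data)

# Keep only the columns whose name contains 'hf' (this skips 'sample').
values = df.filter(like='hf').to_numpy()

# With no axis argument, np.unique flattens the 2-D array and
# returns the sorted distinct values.
unique_values = np.unique(values)

# A scalar assigned to a column is repeated for every row.
df['count2rows'] = len(unique_values)
print(df)
```

Because the result is one scalar for the whole frame, this covers the "all rows at once" case only; it does not give a per-window count.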
Answered By: jch

df
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample
0    7    4    2    2    7    2    7yr
1    8    5    5    6    5    6    7yr
2    9    6    8   10    7   10    7yr
3   10    7   11   14    5   14    7yr
4   11    8   14   18    7   18    7yr

Rolling window rolling=2

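import numpy as np
import pandas as pd

rolling = 2  # window size in rows

ar = df.loc[:, 'hf0':'hf5'].values
length = ar.shape[1]  # number of hf columns per row

# Every full window of `rolling` consecutive rows, flattened into one row of `plane`.
# sliding_window_view requires NumPy 1.20 or later.
cubic = np.lib.stride_tricks.sliding_window_view(ar, (rolling, length)).astype(float)
plane = cubic.reshape(-1, rolling * length)

# The first rolling-1 rows have no full window, so build partial windows
# left-padded with NaN; nunique() ignores NaN by default.
head_arrs = np.zeros((rolling - 1, rolling * length))
for i in range(rolling - 1, 0, -1):
    head_arr_l = plane[0, :i * length]
    head_arr_l = np.pad(head_arr_l.astype(float), (0, length * (rolling - i)), 'constant', constant_values=np.nan)
    head_arr_l = np.roll(head_arr_l, length * (rolling - i))
    head_arrs[i - 1, :] = head_arr_l

# Put the partial windows in front so that plane has one row per row of df.
plane = np.insert(plane, 0, head_arrs, axis=0)
df['rolling_nunique'] = pd.DataFrame(plane).nunique(axis=1)
df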
###
   hf0  hf1  hf2  hf3  hf4  hf5 sample  rolling_nunique
0    7    4    2    2    7    2    7yr                3
1    8    5    5    6    5    6    7yr                6
2    9    6    8   10    7   10    7yr                6
3   10    7   11   14    5   14    7yr                8
4   11    8   14   18    7   18    7yr                7

Rolling window rolling=3

   hf0  hf1  hf2  hf3  hf4  hf5 sample  rolling_nunique
0    7    4    2    2    7    2    7yr                3
1    8    5    5    6    5    6    7yr                6
2    9    6    8   10    7   10    7yr                8
3   10    7   11   14    5   14    7yr                8
4   11    8   14   18    7   18    7yr                9

If the rolling window covers all rows of df (in this case, df.shape[0] = 5):
Rolling window rolling=5

   hf0  hf1  hf2  hf3  hf4  hf5 sample  rolling_nunique
0    7    4    2    2    7    2    7yr                3
1    8    5    5    6    5    6    7yr                6
2    9    6    8   10    7   10    7yr                8
3   10    7   11   14    5   14    7yr               10
4   11    8   14   18    7   18    7yr               11
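
For comparison, the same rolling counts can be reproduced with a plain loop over row slices. This is a minimal sketch, not the strided method above: the helper name `rolling_nunique` is hypothetical, and it trades speed for readability by calling np.unique once per row.

```python
import numpy as np
import pandas as pd

data = {'hf0': [7, 8, 9, 10, 11], 'hf1': [4, 5, 6, 7, 8],
        'hf2': [2, 5, 8, 11, 14], 'hf3': [2, 6, 10, 14, 18],
        'hf4': [7, 5, 7, 5, 7], 'hf5': [2, 6, 10, 14, 18],
        'sample': ['7yr'] * 5}
df = pd.DataFrame(data)


def rolling_nunique(frame, window):
    """Count distinct values in each trailing block of `window` rows.

    Hypothetical helper: the first window-1 rows use a shorter block,
    which matches the partial windows in the tables above.
    """
    ar = frame.to_numpy()
    counts = []
    for i in range(len(ar)):
        start = max(0, i - window + 1)       # clip at the top of the frame
        block = ar[start:i + 1]              # rows start..i inclusive
        counts.append(len(np.unique(block))) # np.unique flattens the block
    return counts


hf = df.loc[:, 'hf0':'hf5']
counts_2 = rolling_nunique(hf, 2)
counts_3 = rolling_nunique(hf, 3)
counts_5 = rolling_nunique(hf, 5)
df['rolling_nunique'] = counts_2
print(df)
```

The three lists should agree with the rolling_nunique columns shown for rolling=2, rolling=3 and rolling=5.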
Answered By: Baron Legendre

You could use stack(), which turns the selected columns into a single Series before counting:

df['count_2 rows'] = df.iloc[:, 0:6].stack().nunique()

If you would like to use rolling functionality you could use rolling() and iterate over the output.

n = 3
df['count_2 rows'] = [i.stack().nunique() for i in df.iloc[:,:6].rolling(n)]
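
A runnable sketch of the rolling variant on a 5-row frame, to show what the iteration yields. It assumes a pandas version in which Rolling objects can be iterated (added in pandas 1.1, as far as I know). The names `windows`, `window_lengths` and the column `count_rolling` are made up for this illustration.

```python
import pandas as pd

data = {'hf0': [7, 8, 9, 10, 11], 'hf1': [4, 5, 6, 7, 8],
        'hf2': [2, 5, 8, 11, 14], 'hf3': [2, 6, 10, 14, 18],
        'hf4': [7, 5, 7, 5, 7], 'hf5': [2, 6, 10, 14, 18],
        'sample': ['7yr'] * 5}
df = pd.DataFrame(data)

n = 3

# Iterating over a Rolling object yields one sub-DataFrame per row.
# The first n-1 windows are shorter because there are not yet n rows.
windows = list(df.iloc[:, :6].rolling(n))
window_lengths = [len(w) for w in windows]

# stack() flattens each window into a Series, nunique() counts distinct values.
counts = [w.stack().nunique() for w in windows]

df['count_rolling'] = counts
print(window_lengths)
print(df)
```

Since the early windows are partial, the first rows get counts based on fewer than n rows rather than NaN.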
Answered By: rhug123