In Python create a table with categories and ranges based on a list

Question

I have data in a table that looks like this:

import pandas as pd

input_data = pd.DataFrame({'cat':['A','A','A','A','A','A','A','A','B','B','B','A','A','A','B','B','B','B','B','B','B'],
              'num':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
             })

It’s just an example; I am not required to use pandas. The point is that the input data will come in a table format.

I need two separate results:

(1) The First Table:

Cat	MIN	MAX
A	1	8
A	12	14
B	9	11
B	15	21

(2) The Second Table:

Cat	ranges
A	1-8; 12-14
B	9-11; 15-21

So far I have tried to do this with pandas, but then I read that iterating over a DataFrame might not be a good idea. The data above is just an example; the actual DataFrame will have from 1K to 10K+ rows.

Asked By: jarsonX

Answer 1

First, aggregate min/max with GroupBy.agg, grouping by a helper Series g that labels each run of consecutive cat values, then sort by cat with DataFrame.sort_values:

g = input_data['cat'].ne(input_data['cat'].shift()).cumsum()
df = (input_data.groupby([g, 'cat'], as_index=False)
                .agg(MIN=('num','min'), MAX=('num','max'))
                # kind='stable' keeps runs of the same cat in their original order
                .sort_values('cat', kind='stable', ignore_index=True))
print(df)
  cat  MIN  MAX
0   A    1    8
1   A   12   14
2   B    9   11
3   B   15   21
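
To see why the helper Series g separates the runs, here is a minimal sketch on a shortened, made-up cat column (the five values below are an illustration, not the question's data). Comparing each row with the previous one flags the start of every run, and the cumulative sum of those flags gives each run its own id:

```python
import pandas as pd

# Hypothetical shortened column, used only to illustrate the trick.
cat = pd.Series(['A', 'A', 'B', 'B', 'A'])

# True wherever the value differs from the previous row. The first row is
# always True because shift() leaves a missing value there.
changed = cat.ne(cat.shift())

# The running total of the flags labels every consecutive run with its own id.
g = changed.cumsum()

print(changed.tolist())  # [True, False, True, False, True]
print(g.tolist())        # [1, 1, 2, 2, 3]
```

Grouping by g together with cat therefore treats the two separate runs of A as different groups, even though they share the same category.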

For the second output, use Series.str.cat and then aggregate with join:

df1 = (df['MIN'].astype(str).str.cat(df['MAX'].astype(str), sep='-')
                .groupby(df['cat']).agg('; '.join)
                .reset_index(name='ranges'))
print(df1)
  cat       ranges
0   A   1-8; 12-14
1   B  9-11; 15-21

Answered By: jezrael

Answer 2

For the first DataFrame, you can start a new group each time the cat value differs from the previous row, then apply aggregate functions. For the second one, concatenate the MIN and MAX columns, group by Cat, and join the resulting strings:

# Group by a run id built from input_data itself (a new id each time cat changes)
df1 = (input_data.groupby(input_data['cat'].ne(input_data['cat'].shift()).cumsum(), as_index=False)
         .agg(Cat=('cat', 'first'), MIN=('num', 'min'), MAX=('num', 'max')))

df2 = (df1.assign(ranges=df1['MIN'].astype(str) + '-' + df1['MAX'].astype(str))
          .groupby('Cat', as_index=False)['ranges'].apply('; '.join))

Output:

>>> df1
  Cat  MIN  MAX
0   A    1    8
1   B    9   11
2   A   12   14
3   B   15   21

>>> df2
  Cat       ranges
0   A   1-8; 12-14
1   B  9-11; 15-21

Answered By: Corralien
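
Since the question says pandas is not required, the same idea can probably be sketched with the standard library alone. This is not taken from either answer. It assumes the table arrives as two parallel lists (the names cats, nums, runs and second are made up for the example) and relies on itertools.groupby, which groups consecutive equal keys:

```python
from itertools import groupby

# Same data as in the question, written as two parallel lists (an assumption
# about how the table is loaded).
cats = ['A'] * 8 + ['B'] * 3 + ['A'] * 3 + ['B'] * 7
nums = list(range(1, 22))

# First table: one (cat, min, max) row per run of consecutive equal categories.
runs = []
for cat, rows in groupby(zip(cats, nums), key=lambda row: row[0]):
    values = [num for _, num in rows]
    runs.append((cat, min(values), max(values)))

# Second table: collect the "min-max" strings per category. sorted() is
# stable, so the runs of one category keep their original order.
ranges = {}
for cat, lo, hi in sorted(runs, key=lambda run: run[0]):
    ranges.setdefault(cat, []).append(f"{lo}-{hi}")
second = {cat: "; ".join(parts) for cat, parts in ranges.items()}

print(runs)    # [('A', 1, 8), ('B', 9, 11), ('A', 12, 14), ('B', 15, 21)]
print(second)  # {'A': '1-8; 12-14', 'B': '9-11; 15-21'}
```

This makes a single pass over the rows, so it should be fine for the 1K to 10K+ rows mentioned in the question, although the pandas versions above are likely more convenient if the data is already in a DataFrame.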
