In Python, create a table with categories and ranges based on a list

Question:

I have data in a table that looks like this:

import pandas as pd

input_data = pd.DataFrame({'cat':['A','A','A','A','A','A','A','A','B','B','B','A','A','A','B','B','B','B','B','B','B'],
              'num':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
             })

It’s just an example; I am not required to use pandas. The point is that the input data will come in a table format.

I need two separate results:

(1) The First Table:

Cat MIN MAX
A 1 8
A 12 14
B 9 11
B 15 21

(2) The Second Table:

Cat ranges
A 1-8; 12-14
B 9-11; 15-21

So far I have tried to do this with pandas, but then I read that iterating over a DataFrame row by row might not be a good idea. The data above is just an example; the actual DataFrame will have anywhere from 1K to 10K+ rows.

Asked By: jarsonX


Answers:

First, aggregate min/max with GroupBy.agg, grouping by a helper Series g that labels each run of consecutive cat values; then sort by cat with DataFrame.sort_values:

g = input_data['cat'].ne(input_data['cat'].shift()).cumsum()
df = (input_data.groupby([g, 'cat'], as_index=False)
                .agg(MIN=('num','min'), MAX=('num','max'))
                .sort_values('cat', kind='stable', ignore_index=True))  # stable sort keeps runs of the same cat in their original order
print(df)
  cat  MIN  MAX
0   A    1    8
1   A   12   14
2   B    9   11
3   B   15   21
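To see what the helper Series g contains, here is a small sketch that rebuilds the question's sample data (written more compactly with list() and range(), an editorial shorthand) and prints g. Comparing each value with the shifted previous value marks where a new run starts, and the cumulative sum turns those markers into run labels:

```python
import pandas as pd

# Same sample data as in the question, written compactly
input_data = pd.DataFrame({
    'cat': list('AAAAAAAABBBAAABBBBBBB'),
    'num': range(1, 22),
})

s = input_data['cat']
# True wherever the category differs from the row above; the first row is
# compared with NaN (from shift), so it is always True
changed = s.ne(s.shift())
# Cumulative sum of the booleans gives every run of equal values its own label
g = changed.cumsum()

print(g.tolist())
# [1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4]
```

Grouping by both g and cat then yields one row per run, which is why the two runs of A (1-8 and 12-14) stay separate.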

For the second output, use Series.str.cat, then group by cat and aggregate with join:

df1 = (df['MIN'].astype(str).str.cat(df['MAX'].astype(str), sep='-')
                .groupby(df['cat']).agg('; '.join)
                .reset_index(name='ranges'))
print(df1)
  cat       ranges
0   A   1-8; 12-14
1   B  9-11; 15-21
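A minimal illustration of the two building blocks used above, on hypothetical MIN/MAX values for the two runs of category A: str.cat joins two string Series element-wise with a separator, and '; '.join is the function that agg applies to each group's strings:

```python
import pandas as pd

# Hypothetical MIN/MAX values for the two runs of category A
mins = pd.Series([1, 12])
maxs = pd.Series([8, 14])

# Element-wise concatenation: '1' + '-' + '8', '12' + '-' + '14'
pairs = mins.astype(str).str.cat(maxs.astype(str), sep='-')
print(pairs.tolist())    # ['1-8', '12-14']

# '; '.join is what GroupBy.agg calls on each group's strings
print('; '.join(pairs))  # 1-8; 12-14
```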
Answered By: jezrael

For the first DataFrame, you can start a new group each time the cat value differs from the one in the previous row, then apply the aggregate functions. For the second one, concatenate the MIN and MAX columns, group by Cat, and join the strings:

df1 = (input_data.groupby(input_data['cat'].ne(input_data['cat'].shift()).cumsum())
         .agg(Cat=('cat', 'first'), MIN=('num', 'min'), MAX=('num', 'max'))
         .reset_index(drop=True))  # drop the run labels, keep a 0..n-1 index

df2 = (df1.assign(ranges=df1['MIN'].astype(str) + '-' + df1['MAX'].astype(str))
          .groupby('Cat', as_index=False)['ranges'].agg('; '.join))

Output:

>>> df1
  Cat  MIN  MAX
0   A    1    8
1   B    9   11
2   A   12   14
3   B   15   21

>>> df2
  Cat       ranges
0   A   1-8; 12-14
1   B  9-11; 15-21
Answered By: Corralien
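Since the question says pandas is not required, here is a minimal plain-Python sketch of the same consecutive-run idea using itertools.groupby from the standard library. It assumes the input can be read as two parallel lists (the names cats and nums are hypothetical). itertools.groupby only groups consecutive equal keys, which is exactly the run detection needed, and a single linear pass over plain lists of 1K to 10K+ items should be fast:

```python
from itertools import groupby

# Sample data from the question as two plain lists (hypothetical names)
cats = list('AAAAAAAABBBAAABBBBBBB')
nums = list(range(1, 22))

# Table 1: one (cat, MIN, MAX) row per run of consecutive equal categories
runs = []
for cat, group in groupby(zip(cats, nums), key=lambda pair: pair[0]):
    values = [num for _, num in group]
    runs.append((cat, min(values), max(values)))

# Sort by category; Python's sort is stable, so runs of the same
# category keep their original order
table1 = sorted(runs, key=lambda row: row[0])

# Table 2: collect "MIN-MAX" strings per category, then join them
ranges = {}
for cat, lo, hi in table1:
    ranges.setdefault(cat, []).append(f'{lo}-{hi}')
table2 = [(cat, '; '.join(parts)) for cat, parts in ranges.items()]

print(table1)  # [('A', 1, 8), ('A', 12, 14), ('B', 9, 11), ('B', 15, 21)]
print(table2)  # [('A', '1-8; 12-14'), ('B', '9-11; 15-21')]
```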