Python | Get range based on value

Question

How can I get below –
Input –
Input

and Output –
Output

I tried to use shift but I am not getting the result

df1['Value_shift_down'] = df1['Value'].shift(-1)
df1['Value_shift_up'] = df1['Value'].shift(1)

Asked By: Shubhendu Shubham

||

Source

Answer 1

Determinging the lowest and higest value based on a string is not possible (because "10" is smaller then "2"). Therefore you need to replace your strings by integers.

So one solution might be to remove the unit names and convert the unit name numbers to integers. Then you can determine the min and max values by a pivot table.

df['Unit'] = df['Unit'].str.replace('Unit ', '', regex=False).astype(int)
df.pivot_table(index='Value', values='Unit', aggfunc=['min', 'max'])

If neccessary, you can convert afterwards the values back to string and add the unit name.

Answered By: Jens

Answer 2

You were on the right track with pandas .shift(). Instead of converting back and forth between strings and integers, you can leverage shift to compare each value with the previous, then use a cumulative sum (.cumsum()) to detect when the value changes. The final element is the DataFrame .groupby() method, which allows us to collect groups of same-Value rows together and get the start and end Units for that group. The code to do this goes something like this:

result_groups = []
for _, group in df1.groupby((df1["Value"] != df1["Value"].shift()).cumsum()):
    start_unit = group.loc[group.index.min(), "Unit"]
    end_unit = group.loc[group.index.max(), "Unit"]
    value = group["Value"].unique()[0]
    result_groups.append({"Start_Unit": start_unit, "End_Unit": end_unit, "Value": value})
results = pd.DataFrame(result_groups)

The results DataFrame looks like this, as desired:

	Start_Unit	End_Unit	Value
0	Unit 1	Unit 1	2
1	Unit 2	Unit 2	4
2	Unit 3	Unit 3	1
3	Unit 4	Unit 5	5
4	Unit 6	Unit 6	7
5	Unit 7	Unit 7	6
6	Unit 8	Unit 8	9
7	Unit 9	Unit 20	0

Note: This solution does not require uniqueness of the values in the 'Value' column on a group-by-group basis (i.e., if there was another sequence of 5s later, it would correctly recognize it as a new group). If you know ahead of time that this will never happen, you can take a simpler approach, using just df1.groupby("Value") instead and doing something similar inside the loop.

Answered By: L0tad

Python | Get range based on value

Question:

Answers: