Python | Get range based on value

Question:

How can I get below –
Input –
Input

and Output –
Output

I tried to use shift but I am not getting the result

df1['Value_shift_down'] = df1['Value'].shift(-1)
df1['Value_shift_up'] = df1['Value'].shift(1)
Asked By: Shubhendu Shubham

||

Answers:

Determinging the lowest and higest value based on a string is not possible (because "10" is smaller then "2"). Therefore you need to replace your strings by integers.

So one solution might be to remove the unit names and convert the unit name numbers to integers. Then you can determine the min and max values by a pivot table.

df['Unit'] = df['Unit'].str.replace('Unit ', '', regex=False).astype(int)
df.pivot_table(index='Value', values='Unit', aggfunc=['min', 'max'])

If neccessary, you can convert afterwards the values back to string and add the unit name.

Answered By: Jens

You were on the right track with pandas .shift(). Instead of converting back and forth between strings and integers, you can leverage shift to compare each value with the previous, then use a cumulative sum (.cumsum()) to detect when the value changes. The final element is the DataFrame .groupby() method, which allows us to collect groups of same-Value rows together and get the start and end Units for that group. The code to do this goes something like this:

result_groups = []
for _, group in df1.groupby((df1["Value"] != df1["Value"].shift()).cumsum()):
    start_unit = group.loc[group.index.min(), "Unit"]
    end_unit = group.loc[group.index.max(), "Unit"]
    value = group["Value"].unique()[0]
    result_groups.append({"Start_Unit": start_unit, "End_Unit": end_unit, "Value": value})
results = pd.DataFrame(result_groups)

The results DataFrame looks like this, as desired:

Start_Unit End_Unit Value
0 Unit 1 Unit 1 2
1 Unit 2 Unit 2 4
2 Unit 3 Unit 3 1
3 Unit 4 Unit 5 5
4 Unit 6 Unit 6 7
5 Unit 7 Unit 7 6
6 Unit 8 Unit 8 9
7 Unit 9 Unit 20 0

Note: This solution does not require uniqueness of the values in the 'Value' column on a group-by-group basis (i.e., if there was another sequence of 5s later, it would correctly recognize it as a new group). If you know ahead of time that this will never happen, you can take a simpler approach, using just df1.groupby("Value") instead and doing something similar inside the loop.

Answered By: L0tad
Categories: questions Tags: ,
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.