How to automatically classify a list of numbers

Question:

Context: I have a list of wind speeds, say 100 measurements between 0 and 50 km/h, loaded from a CSV file. I want to automatically split them into a list of lists, one per 5 km/h interval: the values from 0 to 5, the values from 5 to 10, and so on.

Here is the code:

wind = pd.read_csv("wind.csv")
df = pd.DataFrame(wind)
x = df["Value"]
d = sorted(pd.Series(x))
lst = [[] for i in range(0,(int(x.max())+1),5)]

This gives me a list of empty lists; for example, if the wind speeds range from 0 to 54 km/h, it creates 11 empty lists.

To classify the values, I then tried this:

for i in range(0, len(lst), 1):
    for e in range(0, 55, 5):
        for n in d:
            if n > e and n < (e + 5):
                lst[i].append(n)
            else:
                continue

My goal is that, once the loop reaches a number greater than 5, it moves on to the next level: it adds 5 to the interval limits (e) and advances to the next i, so that the second empty list in lst starts being filled. I tried several variations, because I assume the loops must be nested in a specific order to give the right result. This code is just one of those attempts, and they all gave similar results: either every list was filled with all the numbers, or only the first list was filled with all the numbers.

Asked By: enrique perez


Answers:

Your title mentions classifying the numbers. Are you looking for a categorical output like calm | gentle breeze | strong breeze | moderate gale | etc.? If so, take a look at the second example in the pd.qcut docs.
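
As a rough sketch of what such a categorical output could look like, pd.cut also accepts a labels argument. The sample speeds, the bin edges and the pairing of edges to names below are made up for illustration; they are not official wind-scale thresholds.

```python
import pandas as pd

# Made-up wind speeds in km/h (stand-in for the "Value" column)
speeds = pd.Series([3.0, 15.5, 31.0, 50.0])

# Hypothetical bin edges and labels, for illustration only
edges = [0, 12, 30, 45, 60]
labels = ["calm", "gentle breeze", "strong breeze", "moderate gale"]

# right=False makes each bin left-closed: [0, 12), [12, 30), ...
categories = pd.cut(speeds, bins=edges, labels=labels, right=False)
print(categories.tolist())
```

Each value is replaced by the name of the interval it falls into, so the result can be used directly as a grouping key.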

Since you’re already using pandas, use pd.cut with an IntervalIndex (constructed with the pd.interval_range function) to get a Series of bins, and then groupby on that.

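import pandas as pd
from math import floor

BIN_WIDTH = 5  # km/h

wind_velocity = pd.read_csv("wind.csv")["Value"].sort_values()

# Use the first bin edge strictly above the maximum, so that a maximum that
# is an exact multiple of BIN_WIDTH still falls inside the last left-closed bin.
upper_bin_lim = BIN_WIDTH * (floor(wind_velocity.max() / BIN_WIDTH) + 1)
bins = pd.interval_range(
    start=0,
    end=upper_bin_lim,
    periods=upper_bin_lim // BIN_WIDTH,
    closed='left')
# Assign each measurement to its interval, e.g. [0, 5), [5, 10), ...
velocity_bins = pd.cut(wind_velocity, bins)
# observed=False keeps intervals that contain no measurements
groups = wind_velocity.groupby(velocity_bins, observed=False)

for name, group in groups:
    # TODO: use `groups` to do stuff; printing each bin is only a placeholder
    print(name, group.tolist())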
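
If the goal is literally the list of lists from the question, one possible sketch skips the interval search altogether and uses floor division to compute the index of each value's 5 km/h interval. The sample values below are made up and stand in for the "Value" column of wind.csv; lst mirrors the name used in the question.

```python
import pandas as pd

BIN_WIDTH = 5  # km/h, as in the question

# Made-up sample values standing in for the "Value" column of wind.csv
wind_velocity = pd.Series([0.0, 3.2, 4.9, 5.0, 7.5, 12.1, 49.9, 50.0, 54.0])

# Floor division maps each speed to the index of its interval:
# 0 <= v < 5 -> 0, 5 <= v < 10 -> 1, and so on.
bin_index = (wind_velocity // BIN_WIDTH).astype(int)

# One empty list per interval, up to the interval holding the maximum
n_bins = int(bin_index.max()) + 1
lst = [[] for _ in range(n_bins)]

# A single pass over the data: each value goes straight into its own list
for idx, value in zip(bin_index, wind_velocity):
    lst[idx].append(value)

print(len(lst))
print(lst[0])
print(lst[1])
```

Because every value determines its own target list, no nested loops are needed, and a value sitting exactly on a boundary (such as 5.0) goes into the interval that starts there.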
Answered By: Joshua Voskamp