How to add a dataframe to another on python

Question:

So I have 3 columns. ETA (eta/km*100) (a number), Climate, and month.

My purpose is to drop the values higher that 0.95 quartile and lower than 0.05 (the extreme cases on this dataset) for each subset of 3 months and Climate, and the reagroup the dataset on a single dataset.

The issue I’m having here is that even tho inside the "for" statement it does the job, when I print the resulting data frame, it only has the last subset (Hurricane, last 3 months) without dropping the extreme data.

I’ve tried concat, add and append. Not sure what I’m doing wrong here.

Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']
newDf = df_Cl
newDf.iloc[0:0]


for cl in Climate:
    print (cl)
    for num in range(4, 14, 3):
        print (num)
        df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
        bajo = df_Temp['eta/km*100'].quantile(0.05)
        alto = df_Temp['eta/km*100'].quantile(0.95)
        df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
        df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
        newDf.add(df_Temp)

I’ve also tried:

newDf += df_Temp

But all the values become "NaN"

Asked By: Alberto

||

Answers:

Use:

Climate = ['Sunny', 'Cloudy', 'Foggy', 'Rain', 'Storm', 'Hurricane']

#filter only rows by Climate list
df1 = df[df['Climate'].isin(Climate)]

#create groups per Climate and each 3 months
g = df1.groupby(['Climate', df['month'].sub(1).floordiv(3)])['eta/km*100']

#filter between 0.05 and 0.95 quantile
out = df1[df1['eta/km*100'].between(g.quantile(0.05),
                                    g.quantile.quantile(0.95), inclusive='neither')]

Your solution working if append final df_Temp to lsit of DataFrames and last use concat for join together:

L = []
for cl in Climate:
    print (cl)
    for num in range(4, 14, 3):
        print (num)
        df_Temp = df.loc[(df['Climate'] == cl) & (df['month'] < num)]
        bajo = df_Temp['eta/km*100'].quantile(0.05)
        alto = df_Temp['eta/km*100'].quantile(0.95)
        df_Temp = df_Temp[df_Temp['eta/km*100'] > bajo]
        df_Temp = df_Temp[df_Temp['eta/km*100'] < alto]
        L.append(df_Temp)
        
out = pd.concat(L)
Answered By: jezrael