Get a specific number of rows for each value of a column in pandas

Question:

To prevent my machine learning algorithm from being biased toward certain values, I want to reduce the frequency differences in my dataset, which is a pandas DataFrame.

For example, in column X:

  • A appears 1500 times
  • B appears 3000 times
  • C appears 1300 times

Is there a way to keep exactly 1250 rows of each value?

Asked By: Emre Oz


Answers:

Can you try this:

import pandas as pd
df2 = pd.concat([df[df['X'] == 'A'].head(1250), df[df['X'] == 'B'].head(1250), df[df['X'] == 'C'].head(1250)])
Answered By: Clegane
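The same result can be obtained without listing each value by hand. Below is a minimal sketch, assuming a DataFrame `df` with a column `X` as in the question; the toy data and the name `n_per_value` are illustrative. `groupby(...).head(n)` keeps the first `n` rows of every group, and any group with fewer than `n` rows is kept whole.

```python
import pandas as pd

# Illustrative data matching the counts in the question
df = pd.DataFrame({"X": 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"]})

n_per_value = 1250  # hypothetical target count per value

# Keep the first n_per_value rows of each value in column X.
# Rows keep their original index and their original relative order.
df2 = df.groupby("X").head(n_per_value)

# Each of A, B and C now appears n_per_value times
print(df2["X"].value_counts())
```

Unlike concatenating per-value slices, `head` leaves the rows in their original order instead of regrouping them by value.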

A solution that works for any number of unique values, even if they are not known in advance:

import pandas as pd

# Create a pandas DataFrame with the given number of rows per value
d = {'X': 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"]}
df = pd.DataFrame(data=d)

# Create a dictionary containing one DataFrame for each unique value
dfDict = dict(iter(df.groupby('X')))

# Keep only the first n rows for each value
for unique_val in dfDict:
    dfDict[unique_val] = dfDict[unique_val][:1250]

# Combine the truncated groups into a single filtered DataFrame
filtered = pd.concat(dfDict, ignore_index=True)
Answered By: Anton B
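Both answers keep the *first* 1250 rows of each value, so the result depends on the row order of the original table. If the data is sorted (for example by time), a random sample per value may suit training better. Below is a minimal sketch, assuming the same column `X`; the seed, `n_per_value`, and the extra value "D" are illustrative. It shuffles the whole DataFrame once with `sample(frac=1)`, then keeps at most `n` rows per value with `groupby(...).head(n)`. A value with fewer than `n` rows is kept whole, whereas `groupby(...).sample(n=...)` requires `n` to be no larger than the smallest group unless `replace=True`.

```python
import pandas as pd

# Illustrative data; "D" is a hypothetical rare value with fewer rows than the target
df = pd.DataFrame({"X": 1500 * ["A"] + 3000 * ["B"] + 1300 * ["C"] + 800 * ["D"]})

n_per_value = 1250  # hypothetical target count per value

# Shuffle all rows reproducibly, then keep at most n_per_value rows per value.
# Groups smaller than n_per_value (here "D") are kept in full.
balanced = (
    df.sample(frac=1, random_state=42)
      .groupby("X")
      .head(n_per_value)
      .reset_index(drop=True)
)

print(balanced["X"].value_counts())
```

Because `random_state` is fixed, the same rows are selected on every run. Omit it to draw a different sample each time.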