How to display different columns and remove them using pandas
Question:
I have a CSV file with 10,000 different parameters (columns). Some of the parameters are empty, and some contain only combinations of 0 and 1. I want to display the parameters that contain only 0 and 1, remove the empty parameters from the table, and then display the table without NA, NaN, or empty values.
Any help will be appreciated.
Answers:
You can first drop the columns (parameters) that are empty, and then select the columns that contain only 1 or 0 values.
To get the names of the columns in which every value is null:
df.columns[df.isna().all()]
Next, drop the all-null columns:
df.dropna(how='all', axis=1, inplace=True)
Then select the columns whose values are all 0 or 1:
df.loc[:, ((df==0) | (df==1)).all()]
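The steps above can be tried end to end on a small frame. This is only a sketch: the column names and values are made up for illustration and stand in for the real CSV.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real CSV.
df = pd.DataFrame({
    "flag_a": [0, 1, 1, 0],            # only 0/1 values
    "flag_b": [1, 1, 0, 0],            # only 0/1 values
    "empty": [np.nan] * 4,             # entirely empty
    "reading": [3.5, 0.0, 1.0, 7.2],   # other numeric values
    "partial": [1, np.nan, 0, 1],      # 0/1 with one missing value
})

# Names of the columns in which every value is null.
empty_cols = df.columns[df.isna().all()]
print(list(empty_cols))

# Drop the all-null columns (reassigning instead of using inplace=True).
df = df.dropna(how="all", axis=1)

# Keep only the columns whose values are all 0 or 1.
binary = df.loc[:, ((df == 0) | (df == 1)).all()]
print(list(binary.columns))
```

Note that a column such as "partial", which mixes 0/1 with a missing value, is not selected, because NaN compares unequal to both 0 and 1. The result therefore contains no NaN values.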
Putting Together the Dataframe
To get started, let’s put together a sample dataframe that you can use throughout the rest of the tutorial. Take a look at the code below to put together the dataframe:
import pandas as pd
df = pd.DataFrame({'Name': ['Nik', 'Jim', 'Alice', 'Jane', 'Matt', 'Kate'],
'Score': [100, 120, 96, 75, 68, 123],
'Height': [178, 180, 160, 165, 185, 187],
'Weight': [180, 175, 143, 155, 167, 189]})
print(df.head())
By calling the df.head() method, you can see what the dataframe's first five rows look like:
    Name  Score  Height  Weight
0    Nik    100     178     180
1    Jim    120     180     175
2  Alice     96     160     143
3   Jane     75     165     155
4   Matt     68     185     167
To remove columns or rows by label, you can use DataFrame.drop, which has the following signature:
DataFrame.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
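As a rough illustration of how the parameters in that signature are used, the sketch below drops columns and a row from the sample dataframe. The result variable names and the "Missing" label are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Nik", "Jim", "Alice", "Jane", "Matt", "Kate"],
    "Score": [100, 120, 96, 75, 68, 123],
    "Height": [178, 180, 160, 165, 185, 187],
    "Weight": [180, 175, 143, 155, 167, 189],
})

# Drop one column by name; df itself is unchanged because inplace=False.
no_weight = df.drop(columns="Weight")

# Equivalent spelling using labels together with axis=1.
no_weight_axis = df.drop(labels="Weight", axis=1)

# Drop several columns; errors="ignore" skips labels that do not exist
# ("Missing" is a deliberately nonexistent label).
name_score = df.drop(columns=["Height", "Weight", "Missing"], errors="ignore")

# Drop a row by its index label.
without_first = df.drop(index=0)

print(list(no_weight.columns))
print(list(name_score.columns))
print(len(without_first))
```

With the default errors='raise', asking to drop a label that is not present raises a KeyError instead of being skipped.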
- Load your CSV:
For this, use the pandas library.
import pandas as pd
df = pd.read_csv(path_to_your_file)
- With pandas.DataFrame.isin():
Called on a DataFrame, isin() returns a boolean DataFrame showing whether each element matches one of the passed values exactly. Indexing df with that mask turns every non-matching cell into NaN, and dropna(axis=1) then drops every column that contains a NaN. You can do this in one line:
df[df.isin([0, 1])].dropna(axis=1)
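To see what each part of that one-liner does, it can be split into steps. This is a minimal sketch on made-up data; the column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one 0/1 column, one mixed column, one empty column.
df = pd.DataFrame({
    "flag_a": [0, 1, 1, 0],
    "reading": [3.5, 0.0, 1.0, 7.2],
    "empty": [np.nan] * 4,
})

# Boolean DataFrame: True where the cell is exactly 0 or 1.
mask = df.isin([0, 1])

# Boolean indexing keeps matching cells and sets the others to NaN.
masked = df[mask]

# Drop every column that contains at least one NaN.
result = masked.dropna(axis=1)
print(list(result.columns))
```

Because empty cells never match 0 or 1, the all-null columns are removed by the same dropna call, so no separate step is needed for them.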
I think this will be faster than the first answer's condition with .all(). A timing comparison on your own dataset may be helpful.
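One way to run such a comparison is with the standard-library timeit module. The sketch below uses synthetic data; the frame size, the column mix, and the function names with_all and with_isin are assumptions for illustration, and the timings will vary with the machine and the real dataset.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows, n_cols = 200, 300  # hypothetical sizes, much smaller than the real file

# Build a frame with a mix of 0/1 columns, other numeric columns, and empty columns.
data = {}
for i in range(n_cols):
    if i % 3 == 0:
        data[f"p{i}"] = rng.integers(0, 2, n_rows)   # only 0 and 1
    elif i % 3 == 1:
        data[f"p{i}"] = rng.normal(size=n_rows)      # other values
    else:
        data[f"p{i}"] = np.full(n_rows, np.nan)      # entirely empty
df = pd.DataFrame(data)


def with_all(frame):
    """First approach: drop empty columns, then test every column with .all()."""
    frame = frame.dropna(how="all", axis=1)
    return frame.loc[:, ((frame == 0) | (frame == 1)).all()]


def with_isin(frame):
    """Second approach: mask with isin(), then drop columns containing NaN."""
    return frame[frame.isin([0, 1])].dropna(axis=1)


# Check that both approaches select the same columns before timing them.
same_columns = list(with_all(df).columns) == list(with_isin(df).columns)
print("same columns:", same_columns)

t_all = timeit.timeit(lambda: with_all(df), number=20)
t_isin = timeit.timeit(lambda: with_isin(df), number=20)
print(f".all() approach: {t_all:.3f}s, isin() approach: {t_isin:.3f}s")
```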