Pandas DataFrame Created from Dictionary vs Created from List
Question:
Is there a line or two of code that would make the DataFrame created from lists behave like the one created from a dictionary?
#DataFrame created from dictionary, this works:
import pandas as pd
data= {'Salary': [30000, 40000, 50000, 85000, 75000],
'Exp': [1, 3, 5, 10, 25],
'Gender': ['M','F', 'M', 'F', 'M']}
df = pd.DataFrame(data)
print(df), print()
new_df1 = df[df['Salary'] >= 50000]
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False])
print(new_df2)
#This doesn't work with the df.functions, sort and conditionals
data = [['Salary', 'Exp', 'Gender'],[30000, 1, 'M'],
[40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
df = pd.DataFrame(data)
print(df), print()
new_df1 = df[df['Salary'] >= 50000] #doesn't work
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False]) #ditto
print(new_df2)
Answers:
In your second snippet, the first sublist is being treated as a data row rather than as the column names.
Pass the first sublist as the columns parameter of the DataFrame constructor instead, and the remaining sublists as the data:
df = pd.DataFrame(data[1:], columns=data[0])
Output:
Salary Exp Gender
0 30000 1 M
1 40000 3 F
2 50000 5 M
3 85000 10 F
4 75000 25 M
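Why your code failed
Your code was incorrectly treating the first sublist as data, so the columns got the default integer labels 0, 1, 2 and there was no column named 'Salary' or 'Exp':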
pd.DataFrame(data)
0 1 2 # incorrect header
0 Salary Exp Gender # this shouldn't be a data row
1 30000 1 M
2 40000 3 F
3 50000 5 M
4 85000 10 F
5 75000 25 M
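To make the failure concrete, here is a minimal sketch (using a shortened copy of the question's list) that shows which labels the columns actually get and what happens on lookup. The variable name `raised` is only there for illustration.

```python
import pandas as pd

# Shortened copy of the list from the question: header row first, then data rows
data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'], [40000, 3, 'F']]

df = pd.DataFrame(data)

# Without a columns argument, pandas labels the columns with integers
print(list(df.columns))

# Looking up a column by the intended name therefore fails
raised = False
try:
    df['Salary']
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```

The same KeyError is what breaks both the boolean filter on 'Salary' and sort_values(['Exp']).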
full code:
df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()
new_df1 = df[df['Salary'] >= 50000] #now works
print(new_df1), print()
new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False]) #now works too
print(new_df2)
Output:
Salary Exp Gender
0 30000 1 M
1 40000 3 F
2 50000 5 M
3 85000 10 F
4 75000 25 M
Salary Exp Gender
2 50000 5 M
3 85000 10 F
4 75000 25 M
Salary Exp Gender
4 75000 25 M
3 85000 10 F
2 50000 5 M
1 40000 3 F
0 30000 1 M
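If the DataFrame has already been built with pd.DataFrame(data) and you cannot easily rebuild it, the header row can also be promoted afterwards. This is only a sketch of one possible approach, not part of the original answer: the name `fixed` is illustrative, and the call to infer_objects() is there because every column starts out with object dtype when the header strings are mixed in with the numbers.

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# The problematic frame: header row ended up as row 0
df = pd.DataFrame(data)

# Drop the first row, renumber the index, and use the old first row as labels
fixed = df.iloc[1:].reset_index(drop=True)
fixed.columns = df.iloc[0].tolist()

# Columns are still object dtype at this point; let pandas re-infer them
fixed = fixed.infer_objects()

print(fixed[fixed['Salary'] >= 50000])
print(fixed.sort_values(['Exp'], ascending=[False]))
```

Building the frame with columns=data[0] in the first place is simpler and avoids the dtype clean-up, so this is mainly useful when the frame comes from somewhere you do not control.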
You need to create the DataFrame from all the sublists except the first, and pass the first sublist as the columns parameter:
#The list-based data; the first sublist holds the column names
data = [['Salary', 'Exp', 'Gender'],[30000, 1, 'M'],
[40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]
df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()
Salary Exp Gender
0 30000 1 M
1 40000 3 F
2 50000 5 M
3 85000 10 F
4 75000 25 M
new_df1 = df[df['Salary'] >= 50000] #working well
print(new_df1), print()
Salary Exp Gender
2 50000 5 M
3 85000 10 F
4 75000 25 M
new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False]) #ditto
print(new_df2)
Salary Exp Gender
4 75000 25 M
3 85000 10 F
2 50000 5 M
1 40000 3 F
0 30000 1 M
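Another way to read the question is to convert the list of rows into the same dictionary shape as the first example. The following is a rough sketch under that assumption; `header`, `rows` and `as_dict` are names chosen here for illustration, and it relies on every row having the same length as the header.

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# Split the header from the data rows
header, *rows = data

# zip(*rows) transposes the rows into columns; pair each column with its name
as_dict = {name: list(col) for name, col in zip(header, zip(*rows))}

df = pd.DataFrame(as_dict)
print(df[df['Salary'] >= 50000])
print(df.sort_values(['Exp'], ascending=[False]))
```

The result is equivalent to pd.DataFrame(data[1:], columns=data[0]); the dictionary route is only worth it if you also need the column-oriented dictionary elsewhere.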