Pandas DataFrame Created from Dictionary vs Created from List

Question:

Is there a line or two of code that would make the DataFrame created from lists behave like the one created from a dictionary?

#DataFrame created from dictionary, this works:
import pandas as pd
data= {'Salary': [30000, 40000, 50000, 85000, 75000],            
        'Exp': [1, 3, 5, 10, 25],          
        'Gender': ['M','F', 'M', 'F', 'M']} 
df = pd.DataFrame(data)
print(df), print()

new_df1 = df[df['Salary'] >= 50000]
print(new_df1), print()

new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False])
print(new_df2)


#This doesn't work with the df.functions, sort and conditionals    
data = [['Salary', 'Exp', 'Gender'],[30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

df = pd.DataFrame(data)
print(df), print()

new_df1 = df[df['Salary'] >= 50000]  #doesn't work
print(new_df1), print()

new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False])  #ditto
print(new_df2)
Asked By: gerald

||

Answers:

In your second snippet, the first sublist is being treated as a data row rather than as the column names.

Instead, pass the first sublist as the columns parameter of the DataFrame constructor, and the remaining sublists as the data:

df = pd.DataFrame(data[1:], columns=data[0])

Output:

   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M
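
As a quick check that the list-built frame now matches the dictionary-built one, here is a minimal sketch that builds both and compares them with `DataFrame.equals`. The variable names (`dict_data`, `list_data`, `df_from_dict`, `df_from_list`) are made up for this illustration; the values are copied from the question.

```python
import pandas as pd

# Same values as in the question, once as a dictionary and once as a list of lists
dict_data = {'Salary': [30000, 40000, 50000, 85000, 75000],
             'Exp': [1, 3, 5, 10, 25],
             'Gender': ['M', 'F', 'M', 'F', 'M']}
list_data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
             [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

df_from_dict = pd.DataFrame(dict_data)

# First sublist becomes the header, the remaining sublists become the rows
df_from_list = pd.DataFrame(list_data[1:], columns=list_data[0])

# equals() compares labels, values and dtypes
same = df_from_dict.equals(df_from_list)
print(same)
```

If the two frames compare equal, the filtering and sorting calls from the question should behave the same on either one.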

Why your code failed

Your code was incorrectly treating the first sublist as data:

pd.DataFrame(data)

        0    1       2   # incorrect header
0  Salary  Exp  Gender   # this shouldn't be a data row
1   30000    1       M
2   40000    3       F
3   50000    5       M
4   85000   10       F
5   75000   25       M
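
To see why the lookups break, the sketch below inspects the frame built without a columns argument. The names `df_bad`, `column_labels`, `first_row` and `lookup_failed` are illustrative only; the data is the list from the question.

```python
import pandas as pd

data = [['Salary', 'Exp', 'Gender'], [30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

# No columns argument: every sublist, including the header, is a data row
df_bad = pd.DataFrame(data)

# The column labels are the default integers, not the intended names
column_labels = list(df_bad.columns)

# The header strings ended up in the first data row
first_row = df_bad.iloc[0].tolist()

# There is no column labelled 'Salary', so the lookup raises KeyError
try:
    df_bad['Salary']
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(column_labels, first_row, lookup_failed)
```

Because the header string shares a column with the numbers, the would-be Salary and Exp columns also hold mixed types, so numeric comparisons and sorting are unreliable even when addressed by position.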

full code:
df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()

new_df1 = df[df['Salary'] >= 50000]  #works now
print(new_df1), print()

new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False])  #works too
print(new_df2)

Output:

   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M

   Salary  Exp Gender
2   50000    5      M
3   85000   10      F
4   75000   25      M

   Salary  Exp Gender
4   75000   25      M
3   85000   10      F
2   50000    5      M
1   40000    3      F
0   30000    1      M
Answered By: mozway

You need to create the DataFrame from all of the sublists except the first, and pass the first sublist as the columns parameter:

#The first sublist holds the column names, the remaining sublists are the rows
data = [['Salary', 'Exp', 'Gender'],[30000, 1, 'M'],
        [40000, 3, 'F'], [50000, 5, 'M'], [85000, 10, 'F'], [75000, 25, 'M']]

df = pd.DataFrame(data[1:], columns=data[0])
print(df), print()
   Salary  Exp Gender
0   30000    1      M
1   40000    3      F
2   50000    5      M
3   85000   10      F
4   75000   25      M

new_df1 = df[df['Salary'] >= 50000]  #works
print(new_df1), print()
   Salary  Exp Gender
2   50000    5      M
3   85000   10      F
4   75000   25      M

new_df2 = df.sort_values(['Exp'], axis = 0, ascending=[False])  #ditto
print(new_df2)

   Salary  Exp Gender
4   75000   25      M
3   85000   10      F
2   50000    5      M
1   40000    3      F
0   30000    1      M
Answered By: jezrael