I was trying to replace the header with the first row in a pandas DataFrame, but I get an error: AttributeError: 'list' object has no attribute 'iloc'
Question:
I’m reading a table from a website using df = pd.read_html('website link'):
df = pd.read_html('https://www.w3schools.com/python/python_ml_decision_tree.asp')
df[0]
It reads the table successfully, but I want to use the first row as the header.
I’m using this code:
df.columns = df.iloc[0]
df = df[1:]
df.head()
but it gave me an error that said:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-11-f9b2cba2eb0b> in <module>
----> 1 df.columns = df.iloc[0] #grab the first row for the header
      2 df = df[1:] #take the data less the header row
      3 df
AttributeError: 'list' object has no attribute 'iloc'
Answers:
pd.read_html returns a list of DataFrames, not a single DataFrame. You would want to combine them with pd.concat first:
dfs = pd.read_html(url)
df = pd.concat(dfs)
And finally replace headers with first row:
df = df.rename(columns=df.iloc[0]).drop(df.index[0])
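To try the two steps above without hitting the website, here is a minimal sketch. The `tables` list is a hand-built, hypothetical stand-in for what pd.read_html returns (one table whose real header ended up in row 0), and it holds only the first two data rows from the question:

```python
import pandas as pd

# Hypothetical stand-in for the list returned by pd.read_html:
# a single table whose real header sits in row 0.
tables = [
    pd.DataFrame(
        [
            ["Age", "Experience", "Rank", "Nationality", "Go"],
            ["36", "10", "9", "UK", "NO"],
            ["42", "12", "4", "USA", "NO"],
        ]
    )
]

# Combine every table in the list into one DataFrame.
df = pd.concat(tables)

# Relabel the columns using the first row, then drop that row by its index label.
df = df.rename(columns=df.iloc[0]).drop(df.index[0])

print(df)
```

One caveat: if the list holds several tables, pd.concat stacks them and the index labels repeat, so drop(df.index[0]) would remove every row labelled 0. With a single table, as here, that is not an issue.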
Try this:
df = pd.read_html('https://www.w3schools.com/python/python_ml_decision_tree.asp')
df[0].columns = df[0].iloc[0]
df = df[0][1:]
In your code df = pd.read_html('website link'), the function pd.read_html() returns a list, so the variable name df is not suitable and could be confusing.
Here’s how I would do it; I hope it’s clear:
import pandas as pd
lis = pd.read_html('https://www.w3schools.com/python/python_ml_decision_tree.asp')
df = pd.DataFrame(lis[0]) #lis has only 1 element
df.columns = df.iloc[0] #grab the first row for the header
df = df[1:] #take the data less the header row
print(df)
0   Age  Experience  Rank  Nationality   Go
1    36          10     9           UK   NO
2    42          12     4          USA   NO
3    23           4     6            N   NO
4    52           4     4          USA   NO
5    43          21     8          USA  YES
6    44          14     5           UK   NO
7    66           3     7            N  YES
8    35          14     9           UK  YES
9    52          13     7            N  YES
10   35           5     9            N  YES
11   24           3     5          USA   NO
12   18           3     7           UK  YES
13   45           9     9           UK  YES
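In the printed result the index starts at 1, the columns carry the leftover name 0, and every value is still a string because the header row was read as data. If that bothers you, an optional tidy-up could look like the sketch below. The `raw` frame is a hypothetical stand-in for lis[0] with two data rows, and the choice of which columns to convert is my assumption, not part of the original answer:

```python
import pandas as pd

# Hypothetical stand-in for lis[0]: the header is stored as the first data row.
raw = pd.DataFrame(
    [
        ["Age", "Experience", "Rank", "Nationality", "Go"],
        ["36", "10", "9", "UK", "NO"],
        ["42", "12", "4", "USA", "NO"],
    ]
)

df = raw.copy()
df.columns = df.iloc[0]  # grab the first row for the header
df = df[1:]              # take the data less the header row

# Optional tidy-up (assumed, not from the answer above):
df = df.reset_index(drop=True)  # renumber the rows from 0
df.columns.name = None          # drop the leftover "0" label on the columns
for col in ["Age", "Experience", "Rank"]:
    df[col] = pd.to_numeric(df[col])  # convert digit strings to numbers

print(df.dtypes)
```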
Use:
df = pd.read_html('https://www.w3schools.com/python/python_ml_decision_tree.asp')
According to the documentation, pd.read_html will:
"Read HTML tables into a list of DataFrame objects."
So:
type(df)
returns:
list
and:
len(df)
returns:
1
So,
df[0]
returns:
      0           1     2            3    4
0   Age  Experience  Rank  Nationality   Go
1    36          10     9           UK   NO
2    42          12     4          USA   NO
3    23           4     6            N   NO
4    52           4     4          USA   NO
5    43          21     8          USA  YES
6    44          14     5           UK   NO
7    66           3     7            N  YES
8    35          14     9           UK  YES
9    52          13     7            N  YES
10   35           5     9            N  YES
11   24           3     5          USA   NO
12   18           3     7           UK  YES
13   45           9     9           UK  YES
This is a DataFrame, so you can use iloc on it.
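The same checks can be reproduced offline. In this rough sketch, `tables` is a hypothetical, hand-built stand-in for the list that pd.read_html returns, so the data is made up to match the first rows of the table above:

```python
import pandas as pd

# Hypothetical stand-in for the result of pd.read_html(...): a list with one table.
tables = [pd.DataFrame([["Age", "Go"], ["36", "NO"], ["42", "NO"]])]

print(type(tables))             # the container is a plain Python list
print(len(tables))              # number of tables found
print(hasattr(tables, "iloc"))  # False: lists have no .iloc, hence the AttributeError

first = tables[0]               # index into the list to get the DataFrame
print(type(first))
header = first.iloc[0]          # .iloc works here because first is a DataFrame
print(header.tolist())
```

So the fix is simply to index into the list before calling iloc.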