Can't use DataFrame methods on a data frame converted from an HTML file

Question:

I have an HTML file that contains a table, and I load that table into a pandas DataFrame like this:

from bs4 import BeautifulSoup
import pandas as pd
table = BeautifulSoup(open('/home/lenovo/Downloads/F4311.html','r').read()).find('table')

# You are passing a <class 'bs4.element.Tag'> element into pandas read_html. You need to convert it to a string.
df = pd.read_html(str(table)) 

It worked, and I could print df too. Then I tried to list its column names.

cols_df=df.columns.tolist()

It threw an error

AttributeError: 'list' object has no attribute 'columns'

Then I tried to export to csv file.

df.to_csv("data.csv")

It threw another error

AttributeError: 'list' object has no attribute 'to_csv'

Please help me fix these errors.

Asked By: Awesome

||

Answers:

If you have a look at the documentation for pd.read_html, you will find that it returns not a dataframe, but "[a] list of DataFrames". This explains the error:

AttributeError: 'list' object has no attribute 'columns'

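In other words, the name df refers to a list, and your actual pd.DataFrame is the first item in that list. You access it with df[0]. The to_csv error has the same cause: a list has no to_csv method, so call df[0].to_csv("data.csv") instead.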
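As a minimal sketch of the fix, the snippet below parses a small inline table, takes the first DataFrame out of the returned list, and then calls the two methods from the question. The HTML string and the names tables and csv_text are made up for illustration and stand in for the contents of F4311.html. The string is wrapped in StringIO because recent pandas versions warn when read_html is given a literal HTML string. It assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the table stored in F4311.html.
html = """
<table>
  <tr><th>name</th><th>score</th></tr>
  <tr><td>a</td><td>1</td></tr>
  <tr><td>b</td><td>2</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))

# Pick the first (here, the only) DataFrame out of the list.
df = tables[0]

# DataFrame methods now work as expected.
cols_df = df.columns.tolist()

# Without a path, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)

print(type(tables), type(df))
print(cols_df)
print(csv_text)
```

If the file holds several tables, len(tables) tells you how many were found, and you can index the one you need.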

Answered By: ouroboros1