Pandas: transform a dbf Table into a dataframe

Question

I want to read a dbf file of an ArcGIS shapefile and dump it into a pandas dataframe. I am currently using the dbf package.

I have apparently been able to load the dbf file as a Table, but have not been able to figure out how to parse it and turn it into a pandas dataframe. What is the way to do it?

This is where I am stuck:

import dbf
# raw string, so the backslashes in the Windows path are not read as escape sequences
thisTable = dbf.Table(r'C:\Users\myfolder\project\myfile.dbf')
thisTable.open(mode='read-only')

Python returns this statement as output, which I frankly don’t know what to make of:

dbf.ver_2.Table('C:\Users\myfolder\project\myfile.dbf', status='read-only')

EDIT

Sample of my original dbf:

FID   Shape    E              N
0     Point    90089.518711   -201738.245555
1     Point    93961.324059   -200676.766517
2     Point    97836.321204   -199614.270439
...   ...      ...            ...

Asked By: FaCoffee

Answer 1

You should have a look at simpledbf:

import pandas as pd
from simpledbf import Dbf5

dbf = Dbf5('test.dbf')
df = dbf.to_dataframe()

This works for me with a little sample .dbf file.

Answered By: Fabio Lamanna

Answer 2

You might want to look at geopandas. It lets you do most of the important GIS operations.

http://geopandas.org/data_structures.html

Answered By: mmann1123

Answer 3

How about using dbfpy? Here’s an example that shows how to load a dbf with 3 columns into a dataframe:

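from dbfpy import dbf
import pandas as pd

# empty dataframe with the three column names of test.dbf
df = pd.DataFrame(columns=('tileno', 'grid_code', 'area'))
db = dbf.Dbf('test.dbf')
for rec in db:
    # copy every field value of this record into a list
    data = []
    for i in range(len(rec.fieldData)):
        data.append(rec[i])
    # append the list as a new row at the end of the dataframe
    df.loc[len(df.index)] = data
db.close()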

If necessary, you could find out the column names from db.fieldNames.

Answered By: Dobedani

Answer 4

Performance can be an issue. I tested a few of the libraries suggested above and elsewhere. For my test, I used a small dbf file with 17 columns and 23 records (7 KB).

The simpledbf package has a straightforward method, to_dataframe(). The practical thing about the DBF table object of dbfread is that you can simply pass it to Python's built-in iter() function and use the result directly to initialise a dataframe. For pysal, I used the function dbf2DF as described here. For the other libraries (dbfpy, dbf and arcpy), I added the data to the dataframe with the row-by-row method shown above, but only after retrieving the field names so that I could first initialise the dataframe with the right column names: from fieldNames, _meta.keys and the function ListFields, respectively.

Adding records one by one is probably not the fastest way to fill a dataframe, so dbfpy, dbf and arcpy would likely show more favourable figures if a smarter way of adding the data were chosen. All the same, I hope the following table (times in seconds) is useful:

package     time (s)
simpledbf   0.0030
dbfread     0.0060
dbfpy       0.0140
pysal       0.0160
dbf         0.0210
arcpy       2.7770
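One smarter way is to collect all records first and build the dataframe with a single constructor call. Below is a minimal sketch comparing both approaches; it uses hypothetical stand-in records (values copied from the question's sample and repeated) instead of a real DBF reader, and with dbfpy the column names would come from db.fieldNames rather than the hard-coded field_names list:

```python
import time

import pandas as pd

# Hypothetical stand-in for what a DBF reader returns: field names plus one
# tuple per record (values copied from the question's sample, repeated).
field_names = ["FID", "Shape", "E", "N"]
records = [
    (0, "Point", 90089.518711, -201738.245555),
    (1, "Point", 93961.324059, -200676.766517),
    (2, "Point", 97836.321204, -199614.270439),
] * 200

# Row-by-row approach: enlarge the dataframe with .loc once per record.
start = time.perf_counter()
df_slow = pd.DataFrame(columns=field_names)
for rec in records:
    df_slow.loc[len(df_slow.index)] = list(rec)
slow_seconds = time.perf_counter() - start

# Collect-then-build approach: one constructor call for all records.
start = time.perf_counter()
df_fast = pd.DataFrame(records, columns=field_names)
fast_seconds = time.perf_counter() - start

print(f"row by row: {slow_seconds:.4f} s, single call: {fast_seconds:.4f} s")
```

The printed timings depend on your machine and pandas version, so no figures are claimed here; the point is that the single call avoids enlarging the dataframe once per record.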

Answered By: Dobedani

Answer 5

As mmann1123 stated, you can use geopandas to read your dbf file. geopandas reads it whether or not it contains geospatial data.

If your data is purely tabular (no geographic coordinates) and you want to read it into a format the pandas library can work with, I would still suggest geopandas.

Here is an example:

import geopandas as gpd
import pandas as pd

My_file_path_name = r'C:\Users\...\file_dbf.dbf'

Table = gpd.read_file(My_file_path_name)

# drop the empty geometry column (if present) so only the attribute table is left
Pandas_Table = pd.DataFrame(Table.drop(columns='geometry', errors='ignore'))

Keys = list(Pandas_Table.columns)
for key in ('ID_1', 'ID_2', 'Date'):  # remove the ID attributes, and a date attribute you want to preserve, from the keys list
    Keys.remove(key)

DS = pd.melt(Pandas_Table,
             id_vars=['ID_1', 'ID_2', 'Date'],  # accepts multiple filter/ID columns, which are kept unchanged
             var_name='class_fito',  # name of the new column that holds the former column names of the Table
             value_name='biomass (mg.L-1)',  # name of the new column that holds the values
             value_vars=Keys)  # columns of the Table to unpivot into rows

# checking your DataFrame:

type(DS)  # should be pandas.core.frame.DataFrame
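To see what the melt step does, here is a minimal sketch on a small made-up table. The column names ID_1, ID_2, Date, class_fito and biomass (mg.L-1) follow the answer above; the two phytoplankton class columns and all values are invented for illustration:

```python
import pandas as pd

# Made-up wide table: one row per sample, one column per phytoplankton class.
wide_df = pd.DataFrame({
    "ID_1": [1, 2],
    "ID_2": ["a", "b"],
    "Date": ["2017-01-01", "2017-01-02"],
    "diatoms": [0.5, 0.7],
    "cyanobacteria": [0.1, 0.3],
})

# Every column that is not an identifier gets unpivoted.
value_columns = [c for c in wide_df.columns if c not in ("ID_1", "ID_2", "Date")]

# Unpivot: each (sample, class) pair becomes its own row.
long_df = pd.melt(
    wide_df,
    id_vars=["ID_1", "ID_2", "Date"],
    value_vars=value_columns,
    var_name="class_fito",
    value_name="biomass (mg.L-1)",
)
print(long_df)
```

The two sample rows become four rows: the identifier columns are repeated, class_fito holds the former column name and biomass (mg.L-1) holds its value.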

Answered By: Philipe Riskalla Leal

Answer 6

I used the dbf package from PyPI (version 0.99.1), which works great.

import dbf
import pandas as pd

# filepath is the path to your .dbf file
table = dbf.Table(filename=filepath)
table.open(dbf.READ_ONLY)
df = pd.DataFrame(table)
table.close()

print(df)

Answered By: JohanV

Answer 7

This worked for me:

import geopandas as gpd

df = gpd.read_file('some_file.dbf').drop("geometry",axis=1)

Answered By: r_a_d_u

Answer 8

How to load the contents of a DBF file into a pandas DataFrame:

The call to iter() is required because pandas doesn't detect that the DBF
object is iterable.

#import
from dbfread import DBF
import pandas as pd


dbf = DBF('people.dbf')
dataResult = pd.DataFrame(iter(dbf))

print(dataResult)
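A minimal sketch of the same iter() pattern without a real DBF file: FakeDBF is a hypothetical stand-in, not part of dbfread, whose __iter__ is a generator yielding one dict per record (dbfread's records are dict-like). The field values are copied from the question's sample:

```python
import pandas as pd


class FakeDBF:
    """Hypothetical stand-in for dbfread.DBF: iterating yields one dict per record."""

    def __init__(self, records):
        self._records = records

    def __iter__(self):
        # Generator method, so iter(FakeDBF(...)) returns a generator object.
        for record in self._records:
            yield record


table = FakeDBF([
    {"FID": 0, "Shape": "Point", "E": 90089.518711, "N": -201738.245555},
    {"FID": 1, "Shape": "Point", "E": 93961.324059, "N": -200676.766517},
])

# iter() hands pandas a generator of dicts; the dict keys become the column names.
df = pd.DataFrame(iter(table))
print(df)
```

Each dict becomes one row, so the result has two rows and the columns FID, Shape, E and N.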

Answered By: timimi
