How to convert pandas data types into BQ schema

Question

I am trying to construct a BigQuery schema from the pandas data types of a DataFrame.
The schema should be in JSON format.

I initially started with the code below, but I am not able to construct the base dictionary.

My code:

import pandas as pd
df = pd.DataFrame({'A': [1, 2], 
                   'B': [1., 2.], 
                   'C': ['a', 'b'], 
                   'D': [True, False]})
dict1=df.dtypes.apply(lambda x: x.name).to_dict()
new_dict={}
for k,v in dict1.items():
    new_dict["name"]=k.lower()
    if v == 'bool':
        new_dict["dtype"]="BOOL"
    elif v == 'object':
        new_dict["dtype"]="STRING"
    elif v=='int64':
        new_dict["dtype"]="INTEGER"
        
    new_dict["mode"]="NULLABLE"

With the above loop, I am only getting the last record in new_dict.
The expected output is:

[
    {
        "name": "col1",
        "mode": "NULLABLE",
        "type": "STRING"
    },
    {
        "name": "col2",
        "mode": "NULLABLE",
        "type": "INTEGER"
    }
]

Please suggest.

Asked By: raj S

Answer 1

Here is the code snippet that achieves my goal. It creates a new dictionary for each column and appends it to a list:

# dict1 maps column name -> pandas dtype name, as built in the question
json_list = []
for col_name, datatype in dict1.items():
    # A fresh dict per column, so earlier entries are not overwritten.
    # The key is "type" (not "dtype") to match the expected output.
    new_dict = {"name": col_name.lower(), "mode": "NULLABLE", "type": datatype}

    if datatype == 'bool':
        new_dict["type"] = "BOOL"
    elif datatype == 'object':
        new_dict["type"] = "STRING"
    elif datatype == 'int64':
        new_dict["type"] = "INTEGER"
    elif datatype == 'float64':
        new_dict["type"] = "FLOAT"

    json_list.append(new_dict)
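
The same idea can also be written with a lookup table instead of an if/elif chain, and then dumped as JSON since the question asks for that format. This is only a sketch: the helper name dtypes_to_bq_schema, the PANDAS_TO_BQ table, and the fallback to STRING for unlisted dtypes are assumptions for illustration, and dict1 is hard-coded here so the snippet runs without pandas.

```python
import json

# Assumed lookup table from pandas dtype names to BigQuery type names.
PANDAS_TO_BQ = {
    "bool": "BOOL",
    "object": "STRING",
    "int64": "INTEGER",
    "float64": "FLOAT",
}


def dtypes_to_bq_schema(dtype_names, default_type="STRING"):
    """Build a list of BigQuery field dicts from {column name: dtype name}."""
    return [
        {
            "name": col_name.lower(),
            "mode": "NULLABLE",
            # Hypothetical choice: unlisted dtypes fall back to default_type.
            "type": PANDAS_TO_BQ.get(dtype_name, default_type),
        }
        for col_name, dtype_name in dtype_names.items()
    ]


# Same shape as dict1 in the question: column name -> pandas dtype name.
dict1 = {"A": "int64", "B": "float64", "C": "object", "D": "bool"}
schema = dtypes_to_bq_schema(dict1)

# Serialize the list of field dicts as a JSON string.
print(json.dumps(schema, indent=4))
```

With a real DataFrame, dict1 would come from df.dtypes.apply(lambda x: x.name).to_dict() as in the question.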

Answered By: raj S

Answer 2

The pandas_gbq library supports this.

import pandas as pd
import pandas_gbq
import pprint

df = pd.DataFrame({'A': [1, 2], 
                   'B': [1., 2.], 
                   'C': ['a', 'b'], 
                   'D': [True, False]})

schema = pandas_gbq.schema.generate_bq_schema(df, default_type="STRING")['fields']

pprint.pprint(schema)

This gives the output:

[{'name': 'A', 'type': 'INTEGER'},
 {'name': 'B', 'type': 'FLOAT'},
 {'name': 'C', 'type': 'STRING'},
 {'name': 'D', 'type': 'BOOLEAN'}]

You can then add the mode to each field manually.
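
A minimal sketch of that last step, assuming every column should be NULLABLE. The fields list is copied by hand from the output above so the snippet runs without pandas_gbq installed; the variable names are illustrative only.

```python
import json

# Field list copied from the output shown above (no "mode" key yet).
fields = [
    {"name": "A", "type": "INTEGER"},
    {"name": "B", "type": "FLOAT"},
    {"name": "C", "type": "STRING"},
    {"name": "D", "type": "BOOLEAN"},
]

# Build new dicts with the extra key, leaving the original list untouched.
schema_with_mode = [{**field, "mode": "NULLABLE"} for field in fields]

# Serialize as JSON, the format requested in the question.
print(json.dumps(schema_with_mode, indent=4))
```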

Answered By: Shardul Frey
