Converting Google Datastore query result to pandas DataFrame in Python

Question:

I need to convert a Google Cloud Datastore query result to a dataframe, to create a chart from the retrieved data. The query:

def fetch_times(limit):
    start_date = '2019-10-08'
    end_date = '2019-10-19'
    query = datastore_client.query(kind='ParticleEvent')
    query.add_filter(
        'published_at', '>', start_date)
    query.add_filter(
        'published_at', '<', end_date)
    query.order = ['-published_at']
    times = query.fetch(limit=limit)
    return times

Printing each entity returned by the query gives a JSON-like string:

  • <Entity('ParticleEvent', 5942717456580608) {'gc_pub_sub_id': '438169950283983', 'data': '605', 'event': 'light intensity', 'published_at': '2019-10-11T14:37:45.407Z', 'device_id': 'e00fce6847be7713698287a1'}>

I thought I had found a function that would translate the results to JSON, which I could then convert to a dataframe, but it fails with an error saying the properties attribute does not exist:

def to_json(gql_object):
    result = []
    for item in gql_object:
        result.append(dict([(p, getattr(item, p)) for p in item.properties()]))
    return json.dumps(result, cls=JSONEncoder)

Is there a way to iterate through the query results and get them into a dataframe, either directly or by converting to JSON first?

Asked By: kdot

||

Answers:

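You can use pd.read_json to read JSON output into a dataframe. The string you shared is not strict JSON, though (it uses single quotes and has an Entity prefix), so it needs a different treatment.

Assuming the output is the string that you have shared above, the following approach can work.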

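import pandas as pd

# `line` holds the string form of one entity, as shown in the question.
# Find where the dictionary begins
startPos = line.find("{")

# Drop the trailing ">" and evaluate the remaining dictionary literal
df = pd.DataFrame([eval(line[startPos:-1])])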

Output looks like :

     gc_pub_sub_id data            event              published_at  
0  438169950283983  605  light intensity  2019-10-11T14:37:45.407Z   

                  device_id  
0  e00fce6847be7713698287a1 

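Here, line[startPos:-1] is the dictionary portion of the input string (the slice drops the trailing ">"). Using eval, we can convert it into an actual dictionary, which is then easily converted into a dataframe object. Since eval executes whatever it is given, ast.literal_eval is a safer choice for strings you do not control.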

Answered By: Roshan Santhosh

Original poster found a workaround, which is to convert each item in the query result object to string, and then manually parse the string to extract the needed data into a list.

Answered By: sllopis
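
The string-parsing workaround described in the two answers above can be sketched with ast.literal_eval instead of eval. This is only a sketch: the helper name parse_entity_repr is made up for illustration, and it assumes every property value prints as a plain Python literal (strings, numbers), as in the entity shown in the question. Values such as datetime objects would not parse this way.

```python
import ast


def parse_entity_repr(line):
    """Extract the property dictionary from an entity's string form."""
    start = line.find("{")      # the dictionary starts at the first "{"
    end = line.rfind("}") + 1   # and ends just after the last "}"
    if start == -1 or end == 0:
        raise ValueError("no dictionary found in entity string")
    # literal_eval only accepts Python literals, so it cannot run arbitrary code
    return ast.literal_eval(line[start:end])


# The entity string from the question
line = (
    "<Entity('ParticleEvent', 5942717456580608) "
    "{'gc_pub_sub_id': '438169950283983', 'data': '605', "
    "'event': 'light intensity', "
    "'published_at': '2019-10-11T14:37:45.407Z', "
    "'device_id': 'e00fce6847be7713698287a1'}>"
)

record = parse_entity_repr(line)
```

A list of such dictionaries, one per entity, can then be handed to pd.DataFrame. The other answers avoid the string round trip entirely, which is usually simpler.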

Datastore entities can be treated as plain Python dictionaries (the Entity class subclasses dict)! So you should be able to do something as simple as…

df = pd.DataFrame(datastore_entities)

…and pandas will do all the rest.

If you need the entity key, or any of its attributes, as a column as well, you can pack them into the dictionary separately:

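# entities must be a list (e.g. list(query.fetch())) so it can be iterated twice
for e in entities:
    e['entity_key'] = e.key
    e['entity_key_name'] = e.key.name  # for example; None when the key has a numeric id

df = pd.DataFrame(entities)

Answered By: bkitej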
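
A variant of the approach above copies each entity into a new dictionary instead of writing the key fields back into the entity, so nothing extra ends up on the entity if it is later saved. The sketch below uses FakeKey and FakeEntity, which are hypothetical stand-ins so that it runs without a Datastore connection. The only assumption about real entities is that they are dicts with a .key attribute exposing kind and id_or_name.

```python
class FakeKey:
    """Hypothetical stand-in for a Datastore key (kind plus numeric id or name)."""

    def __init__(self, kind, id_or_name):
        self.kind = kind
        self.id_or_name = id_or_name


class FakeEntity(dict):
    """Hypothetical stand-in for a Datastore entity: a dict with a .key attribute."""

    def __init__(self, key, **properties):
        super().__init__(**properties)
        self.key = key


def entities_to_records(entities):
    """Copy each entity into a plain dict and add key fields as extra items."""
    records = []
    for entity in entities:
        record = dict(entity)  # shallow copy; the entity itself is left untouched
        record["entity_kind"] = entity.key.kind
        record["entity_id_or_name"] = entity.key.id_or_name
        records.append(record)
    return records


entities = [
    FakeEntity(
        FakeKey("ParticleEvent", 5942717456580608),
        data="605",
        event="light intensity",
    )
]
records = entities_to_records(entities)
```

With real query results, pd.DataFrame(entities_to_records(query.fetch())) would then build the frame in a single pass over the iterator.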

The return value of the fetch function is a google.cloud.datastore.query.Iterator, which yields dict-like Entity objects, so the output of fetch can be passed directly into pd.DataFrame.

import pandas as pd

df = pd.DataFrame(fetch_times(10))

This is similar to @bkitej's answer, but uses the original poster's function.

Answered By: ddrscott
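
For the charting goal in the question, note that every property of the entity shown arrives as a string, so the columns likely need converting before plotting. A minimal sketch, assuming the records look like the one in the question (the second row is made-up sample data, added only so there is something to sort):

```python
import pandas as pd

records = [
    {"data": "605", "event": "light intensity",
     "published_at": "2019-10-11T14:37:45.407Z"},
    # Made-up second reading, for illustration only
    {"data": "598", "event": "light intensity",
     "published_at": "2019-10-11T14:36:45.107Z"},
]

df = pd.DataFrame(records)

# Convert the string columns to numeric and datetime types
df["data"] = pd.to_numeric(df["data"])
df["published_at"] = pd.to_datetime(df["published_at"])

# Order chronologically and index by timestamp, ready for a time-series chart
df = df.sort_values("published_at").set_index("published_at")
```

From here, df["data"].plot() would draw the readings over time, provided matplotlib is installed.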