Printing the response of a RethinkDB query in a reasonable way
Question:
I am participating in the Yelp Dataset Challenge and I’m using RethinkDB to store the JSON documents for each of the different datasets.
I have the following script:
import rethinkdb as r
import json, os
RDB_HOST = os.environ.get('RDB_HOST') or 'localhost'
RDB_PORT = os.environ.get('RDB_PORT') or 28015
DB = 'test'
connection = r.connect(host=RDB_HOST, port=RDB_PORT, db=DB)
query = r.table('yelp_user').filter({"name":"Arthur"}).run(connection)
print(query)
But when I run it at the terminal in a virtualenv I get this as an example response:
<rethinkdb.net.DefaultCursor object at 0x102c22250> (streaming):
[{'yelping_since': '2014-03', 'votes': {'cool': 1, 'useful': 2, 'funny': 1}, 'review_count': 5, 'id': '08eb0b0d-2633-4ec4-93fe-817a496d4b52', 'user_id': 'ZuDUSyT4bE6sx-1MzYd2Kg', 'compliments': {}, 'friends': [], 'average_stars': 5, 'type': 'user', 'elite': [], 'name': 'Arthur', 'fans': 0}, ...]
I know I can use pprint to pretty-print the output, but the bigger issue I don’t understand how to resolve is printing the results in full, rather than having the output end in “…”.
Any suggestions?
Answers:
run() returns an iterable cursor. Iterate over it to get all the rows:
query = r.table('yelp_user').filter({"name":"Arthur"})
for row in query.run(connection):
    print(row)
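Since the question mentions pprint, the loop above can be combined with it so that each row is formatted in full. The following is only a minimal sketch: fake_cursor is a hypothetical stand-in for the cursor that run() returns (so the snippet works without a database), and format_rows is an illustrative helper name, not part of the RethinkDB driver.

```python
from pprint import pformat


def fake_cursor():
    """Hypothetical stand-in for the cursor returned by run(); yields dict rows."""
    yield {'name': 'Arthur', 'review_count': 5,
           'votes': {'cool': 1, 'useful': 2, 'funny': 1}}
    yield {'name': 'Arthur', 'review_count': 12,
           'votes': {'cool': 0, 'useful': 3, 'funny': 0}}


def format_rows(cursor):
    """Drain the cursor and return one pretty-printed string per row."""
    return [pformat(row) for row in cursor]


rows = format_rows(fake_cursor())
for text in rows:
    print(text)
```

With a real connection, format_rows(query.run(connection)) should behave the same way, assuming the result is small enough to hold in memory.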
Another way is to convert the rethinkdb.net.DefaultCursor (or Cursor) into a pandas DataFrame.
As described in the documentation (https://rethinkdb.com/api/python/to_array), the cursor can be turned into a list, and then into a DataFrame, by calling:
import pandas as pd
pd.DataFrame(list(r.db('YOUR-DB').table('YOUR-TABLE').run(connection)))
Although this goes against some of the schemaless logic of a NoSQL database, since pandas is built around structured, tabular data, it is still a good way to visualize the data.
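One place where the mismatch with tabular data shows up is nested fields such as votes, which would end up as a single column of dicts. A possible workaround is to flatten each document before tabulating it. The sketch below uses only the standard library; flatten is a hypothetical helper written for illustration, and the dotted column names are an arbitrary choice. (pandas also offers pd.json_normalize for a similar purpose.)

```python
def flatten(doc, parent_key='', sep='.'):
    """Return a copy of doc with nested dicts expanded into dotted keys."""
    flat = {}
    for key, value in doc.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict) and value:
            # Non-empty nested dict: recurse and merge its flattened keys.
            flat.update(flatten(value, name, sep))
        else:
            # Scalars, lists and empty dicts are kept as-is.
            flat[name] = value
    return flat


doc = {'name': 'Arthur',
       'votes': {'cool': 1, 'useful': 2, 'funny': 1},
       'compliments': {},
       'friends': []}
flat = flatten(doc)
print(flat)
```

Each flattened row can then be collected into a list and passed to pd.DataFrame as before.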