pymongo.errors.CursorNotFound: cursor id '…' not valid at server
Question:
I am trying to fetch some ids that exist in a mongo database with the following code:
client = MongoClient('xx.xx.xx.xx', xxx)
db = client.test_database
db = client['...']
collection = db.test_collection
collection = db["..."]
for cursor in collection.find({ "$and" : [{ "followers" : { "$gt" : 2000 } }, { "followers" : { "$lt" : 3000 } }, { "list_followers" : { "$exists" : False } }] }):
    print(cursor['screenname'])
    print(cursor['_id']['uid'])
    id = cursor['_id']['uid']
However, after a short while, I receive this error:
pymongo.errors.CursorNotFound: cursor id '…' not valid at server.
I found an article that refers to this problem, but it is not clear to me which solution to take. Is it possible to use find().batch_size(30)? What exactly does that command do? Can I fetch all the database ids using batch_size?
Answers:
You’re getting this error because the cursor is timing out on the server (after 10 minutes of inactivity).
From the pymongo documentation:
Cursors in MongoDB can timeout on the server if they've been open for a long time without any operations being performed on them. This can lead to a CursorNotFound exception being raised when attempting to iterate the cursor.
When you call the collection.find method, it queries a collection and returns a cursor to the documents. To get the documents, you iterate the cursor. While you iterate, the driver makes requests to the MongoDB server to fetch more data. The amount of data returned in each request is set by the batch_size() method.
From the documentation:
Limits the number of documents returned in one batch. Each batch
requires a round trip to the server. It can be adjusted to optimize
performance and limit data transfer.
Setting batch_size to a lower value will help you with the timeout errors, but it will increase the number of round trips to the MongoDB server needed to get all the documents.
The default batch size:
For most queries, the first batch returns 101 documents or just enough
documents to exceed 1 megabyte. Batch size will not exceed the maximum BSON document size (16 MB).
There is no universal "right" batch size. You should test different values and see which one is appropriate for your use case, i.e. how many documents you can process within the 10-minute window.
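The question "What exactly does batch_size do?" can be illustrated without a live server. The generator below is a made-up stand-in for the driver (batched_cursor, stats, and the fake documents are invented for illustration, not pymongo API): each slice is one round trip, and documents are handed to your loop one at a time.

```python
def batched_cursor(docs, batch_size, stats):
    """Stand-in for a driver cursor: fetch one batch (one round trip)
    at a time from the 'server', then yield documents one by one."""
    for start in range(0, len(docs), batch_size):
        stats["round_trips"] += 1              # one request to the server
        for doc in docs[start:start + batch_size]:
            yield doc

# 10 fake documents fetched 3 at a time -> 4 round trips (ceil of 10 / 3)
fake_docs = [{"_id": i} for i in range(10)]
stats = {"round_trips": 0}
ids = [d["_id"] for d in batched_cursor(fake_docs, 3, stats)]
```

A smaller batch_size means each batch is processed well inside the timeout window, at the cost of more round trips, which is exactly the trade-off described above.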
The last resort is to set no_cursor_timeout=True. But then you must be sure the cursor is closed after you finish processing the data.
How to avoid it without try/except:
cursor = collection.find(
    {"x": 1},
    no_cursor_timeout=True
)
for doc in cursor:
    ...  # do something with doc
cursor.close()
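One caveat about closing the cursor manually: if the loop body raises, cursor.close() is never reached and the no-timeout cursor stays open on the server, so a try/finally is still worth having. The FakeCursor class below is a made-up stand-in (not pymongo) so the pattern can be demonstrated without a server:

```python
class FakeCursor:
    """Made-up stand-in for a pymongo cursor, to demonstrate the pattern."""
    def __init__(self, docs):
        self._docs = iter(docs)
        self.closed = False

    def __iter__(self):
        return self._docs

    def close(self):
        self.closed = True

def process(cursor):
    try:
        for doc in cursor:
            if not isinstance(doc["x"], int):
                raise ValueError("unexpected document")
    finally:
        cursor.close()          # runs even though processing raised

cursor = FakeCursor([{"x": 1}, {"x": "bad"}])
try:
    process(cursor)
except ValueError:
    pass                        # the error propagated, but close() already ran
```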
You can prevent the cursor from timing out by passing no_cursor_timeout=True, like this:
cursor = db.images.find({}, {'id': 1, 'image_path': 1, '_id': 0}, no_cursor_timeout=True)
for i in cursor:
    ...  # process the document
cursor.close()  # close it, or the cursor keeps waiting and your server resources stay tied up
Earlier this option was called timeout, which has since been replaced, as per the docs.
For more on which methods support no_cursor_timeout, see the search results in the pymongo docs.
You were using the cursor for longer than the timeout (about 10 minutes), so it no longer exists on the server.
You should choose a low batch_size value to fix the issue (with PyMongo, for example):
col.find({}).batch_size(10)
or
disable the timeout with col.find(no_cursor_timeout=True) (in old PyMongo 2.x this was col.find(timeout=False)), and don't forget to close the cursor at the end.
Set batch_size in the find method to a smaller number. That number is the quantity of records returned per batch, and those records have to be processed in under 10 minutes (the default server-side cursor timeout); otherwise the cursor is closed on the server.
A suitable batch_size value can be passed like this:
collection.find({...}, batch_size=20)
This is a timeout issue; by default the cursor timeout is 10 minutes in MongoDB.
I prefer to solve it by logging into mongo and running an admin command:
use admin
db.runCommand({setParameter:1, cursorTimeoutMillis: 1800000})
where 1800000 is equivalent to 30 minutes, which is enough for my use case.
Or from the terminal (10800000 == 3 hours):
sudo mongod --setParameter cursorTimeoutMillis=10800000
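If you would rather persist the setting than pass it on the command line at every start, the same parameter can, as far as I know from the mongod configuration file format, go in mongod.conf:

```yaml
# mongod.conf (YAML) -- persists the cursor timeout across restarts
setParameter:
  cursorTimeoutMillis: 10800000   # 3 hours, same value as the command line above
```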
You can convert the cursor object into a list and then work with that: you won't be making calls through the cursor any more, everything will come from a local list. Copying the cursor into a list takes far less time than running your processing through the live cursor, so the chance of timing out during the copy is low. The server-side cursor still times out eventually, but by then you are no longer referring to it; you are using your own list.
cursor = collection.find({ "$and" : [{ "followers" : { "$gt" : 2000 } }, { "followers" : { "$lt" : 3000 } }, { "list_followers" : { "$exists" : False } }] })
docs = list(cursor)
Now do anything with this list; you have fetched all the records into it. For example:
for doc in docs:
    print(doc['screenname'])
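The trade-off with this approach is memory: list() pulls every matching document into RAM at once, so it only suits result sets that fit in memory. The mechanics are plain iterator materialisation; a generator stands in for the cursor here (fake_cursor and its documents are invented for illustration):

```python
def fake_cursor():
    """Stand-in for a pymongo cursor: a one-shot iterator of documents."""
    for i in range(3):
        yield {"screenname": f"user{i}", "_id": {"uid": i}}

docs = list(fake_cursor())                 # one pass drains the 'cursor' into memory
names = [d["screenname"] for d in docs]    # re-iterating the local list is free
uids = [d["_id"]["uid"] for d in docs]     # ...and does not touch the server again
```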