Querying MongoDB (via pymongo) in case insensitive efficiently

Question:

I’m currently creating a website in python (pyramid) which requires users to sign up and log in. The system allows for users to choose a username which can be a mixture of capital letters, lowercase letters, and numbers.

The problem arises when making sure that two users don’t accidentally share the same username, i.e. in my system ‘randomUser’ should be the same as ‘RandomUser’ or ‘randomuser’.

Unfortunately (in this case) because Mongo stores strings as case sensitive, there could potentially be a number of users with the ‘same’ username.

I am aware of the method of querying mongo for case insensitive strings:

db.stuff.find_one({"foo": /bar/i});

However, this does not seem to work in my query method using pymongo:

username = '/' + str(username) + '/i'
response = request.db['user'].find_one({"username":username},{"username":1})

Is this the correct way of structuring the query for pymongo (I’m assuming not)?

This query will be used whenever a user account is created or logged in to (as it has to check if the username exists in the system). I know it’s not the most efficient query, so should it matter if it’s only used on log ins or account creation? Is it more desirable to instead do something like forcing users to choose only lowercase usernames (negating the need for case-insensitive queries altogether)?

Asked By: johneth

||

Answers:

PyMongo uses native python regular expressions, in the same way as the mongo shell uses native javascript regular expressions. To write the equivalent query of what you had written in the shell above, you would use:

db.stuff.find_one({'name': re.compile(username, re.IGNORECASE)})

Note that this will avoid using any index that may exist on the name field, however. A common pattern for case-insensitive searching or sorting is to have a second field in your document, for instance name_lower, which is always set whenever name changes (to a lower-cased version of name, in this case). You would then query for such a document like:

db.stuff.find_one({'name_lower': username.lower()})
Answered By: dcrosta

Accepted answer is dangerous, it will match any string containing the username! Safe option is to match the exact string:

import re
db.stuff.find_one({'name': re.compile('^' + username + '$', re.IGNORECASE)})

Even safer, escape the variable of any special characters which might affect the regex match:

import re
db.stuff.find_one({'name': re.compile('^' + re.escape(username) + '$', re.IGNORECASE)}) 
Answered By: Mikko

Case sensitive:

db.stuff.find_one({'name': {'$regex': f'^{username}$'}})

Case insensitive:

db.stuff.find_one({'name': {'$regex': f'^{username}$', "$options": '-i'}})
Answered By: Manu Krishna