How to query DynamoDB filtering by value in a list
Question:
There are three items in database:
[
{
"year": 2013,
"info": {
"genres": ["Action", "Biography"]
}
},
{
"year": 2013,
"info": {
"genres": ["Crime", "Drama", "Thriller"]
}
},
{
"year": 2013,
"info": {
"genres": ["Action", "Adventure", "Sci-Fi", "Thriller"]
}
}
]
With the year attribute being the table’s Primary Key I can go ahead and use the FilterExpression to match the exact list value ["Action", "Biography"]:
var params = {
TableName : TABLE_NAME,
KeyConditionExpression: "#yr = :yyyy",
FilterExpression: "info.genres = :genres",
ExpressionAttributeNames:{
"#yr": "year"
},
ExpressionAttributeValues: {
":yyyy": 2013,
":genres": ["Action", "Biography"]
}
};
var AWS = require("aws-sdk");
var docClient = new AWS.DynamoDB.DocumentClient();
let promise = docClient.query(params).promise();
promise.then(res => {
console.log("res:", res);
})
Instead of matching an entire list ["Action", "Biography"], I would rather make a query that returns only those table items that contain the string "Biography" in the list stored in the item’s info.genres field. Is this possible using the DynamoDB query API?
Edited later.
Working solution (thanks to Balu) is to use the contains function in the FilterExpression:
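var params = {
  TableName: TABLE_NAME,
  Limit: 20, // caps the items read per request, before the filter is applied
  KeyConditionExpression: "#yr = :yyyy",
  // contains() on a list matches when any element equals the operand
  FilterExpression: "contains(info.genres, :qqqq)",
  ExpressionAttributeNames: {
    "#yr": "year" // year is a DynamoDB reserved word
  },
  // DocumentClient takes plain JavaScript values, not { S: ... } / { N: ... }
  ExpressionAttributeValues: {
    ":qqqq": "Biography",
    ":yyyy": 2013
  }
};
let promise = docClient.query(params).promise();
promise.then(res => {
  console.log("res:", res);
});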
Answers:
Short answer: no. DynamoDB stores key-value pairs, so the attribute you want to query on should be a top-level attribute.
Long answer: yes, but it uses a scan. Honestly, I don’t see much difference between query and scan as far as RCU consumption is concerned. You can use the Limit parameter to cap the RCUs consumed in a single network call.
If that is acceptable, you can use document paths in your filter expression to achieve what you’re trying to do. See this Stack Overflow post and this GitHub example.
However, note that this is a Scan operation, not a query, and it might turn out to be very expensive as it will not use any indices and will iterate over every document in your table.
It would be best to pull these attributes out into the top-level document, and query accordingly with a secondary index.
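One way to read this advice, as a rough sketch: index key attributes have to be top-level scalars (string, number, or binary), so a list like info.genres cannot be an index key directly. A common workaround is to write one item per genre and put a secondary index on that attribute. The names below (toGenreRows, genreQueryParams, movieId, and the "genre-index" index) are hypothetical and not from the question.

```javascript
// Hypothetical helper: expand one movie into one row per genre, so that
// "genre" becomes a top-level scalar attribute that an index can use as a key.
function toGenreRows(movie) {
  return movie.info.genres.map((genre) => ({
    movieId: movie.movieId, // assumed unique id; the question's items have none
    year: movie.year,
    genre: genre,
  }));
}

// Hypothetical DocumentClient query parameters against an assumed global
// secondary index named "genre-index" whose partition key is "genre".
function genreQueryParams(tableName, genre) {
  return {
    TableName: tableName,
    IndexName: "genre-index",
    KeyConditionExpression: "genre = :g",
    ExpressionAttributeValues: { ":g": genre },
  };
}

const rows = toGenreRows({
  movieId: "movie-1",
  year: 2013,
  info: { genres: ["Action", "Biography"] },
});
console.log(rows.length); // 2, one row per genre
```

With this layout the genre lookup is a key condition rather than a filter, so only matching items are read. The cost is extra writes and storage for the duplicated rows.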
We can use contains in filter expressions instead of =.
So, "info.genres = :genres" can be changed to contains(info.genres, :gnOne).
DynamoDB still queries on the partition key and reads up to 1 MB of data per request before applying the filter, so we are charged the same RCUs with or without the filter expression. The amount of data returned to the client is smaller, though, so the filter is still useful.
const AWS = require("aws-sdk");
const dynamodb = new AWS.DynamoDB();
dynamodb.query(
  {
    TableName: "my-test-table",
    Limit: 20,
    KeyConditionExpression: "id = :yyyy",
    FilterExpression: "contains(info.genres, :gnOne)",
    // The low-level client needs typed attribute values
    ExpressionAttributeValues: {
      ":gnOne": { S: "Biography" },
      ":yyyy": { S: "2020" },
    },
  },
  function (err, data) {
    if (err) console.error(err);
    else console.log("dynamodb query succeeded:", JSON.stringify(data, null, 2));
  }
);
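Because Limit and the 1 MB cap apply before the filter, a single call can return few or no matches and still include a LastEvaluatedKey, meaning there is more data to read. A minimal sketch of following that key until the partition is exhausted is below. It assumes an AWS SDK v2 DocumentClient (or any object with the same query(params).promise() shape), and queryAllMatching is a hypothetical helper name.

```javascript
// Sketch: collect every matching item by following LastEvaluatedKey.
// `client` is assumed to expose query(params).promise() like the SDK v2
// DocumentClient; queryAllMatching is a hypothetical helper.
async function queryAllMatching(client, baseParams) {
  const items = [];
  let startKey; // undefined for the first request
  do {
    const params = { ...baseParams };
    if (startKey) params.ExclusiveStartKey = startKey;
    const page = await client.query(params).promise();
    // A page may be empty after filtering and still not be the last one.
    items.push(...(page.Items || []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
}

// Possible usage (needs AWS credentials and a real table, so left commented):
// const AWS = require("aws-sdk");
// const docClient = new AWS.DynamoDB.DocumentClient();
// queryAllMatching(docClient, {
//   TableName: "my-test-table",
//   KeyConditionExpression: "id = :yyyy",
//   FilterExpression: "contains(info.genres, :gnOne)",
//   ExpressionAttributeValues: { ":gnOne": "Biography", ":yyyy": "2020" },
// }).then((items) => console.log(items));
```

Each request is still billed for everything it reads, so this gives complete results without reducing the RCU cost.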
Maybe I’m crazy, but lately for this kind of thing, if there is no security issue, I just ship a single JSON file mapping ids to the fields I want to search and do all the filtering client-side. I’ve only had lists under 20,000 entries and have seen no performance issues so far, though this wouldn’t scale to huge lists.
DynamoDB is so cheap when you actually know your key and can avoid scans and queries with lots of results that it’s kind of irresistible for small projects. I know this is hacky, but it’s a pretty "free" solution and could add as little as 1-2 MB to your client’s download.
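A rough sketch of what that client-side filtering could look like. The file shape (id mapped to searchable fields), the sample ids, and the helper name idsWithGenre are all assumptions for illustration.

```javascript
// Hypothetical contents of the shipped JSON file: id -> searchable fields.
const genreIndex = {
  "movie-1": { year: 2013, genres: ["Action", "Biography"] },
  "movie-2": { year: 2013, genres: ["Crime", "Drama", "Thriller"] },
  "movie-3": { year: 2013, genres: ["Action", "Adventure", "Sci-Fi", "Thriller"] },
};

// Return the ids whose genre list contains the requested genre.
function idsWithGenre(index, genre) {
  return Object.keys(index).filter((id) => index[id].genres.includes(genre));
}

console.log(idsWithGenre(genreIndex, "Biography")); // ["movie-1"]
```

The matching ids can then be fetched from DynamoDB by key, which avoids filtered queries and scans entirely.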