DynamoDB Query vs Scan: What's the Difference?
Query vs Scan in DynamoDB — how they differ, why Scan is usually a mistake at scale, the cost difference, and how to design so you Query instead of Scan.
Query and Scan are the two ways to read multiple items from a DynamoDB table, and confusing them is one of the most expensive mistakes you can make. A Query reads a targeted slice of one partition; a Scan reads the entire table. They look similar in the SDK but behave nothing alike at scale.
The core difference
A Query uses the primary key. You must supply a partition key value, and DynamoDB jumps straight to that partition and returns only the matching items. You can narrow further with a sort-key condition.
A Scan reads every item in the table (or index), one page at a time, and only then applies any filter you supplied. There is no key targeting — it walks the whole dataset.
| | Query | Scan |
|---|---|---|
| Reads | One partition (key-targeted) | Entire table/index |
| Requires | Partition key value | Nothing |
| Cost grows with | Items matched | Items in the table |
| Latency | Predictable | Grows with table size |
| Typical use | Hot-path reads | Exports, admin, tiny tables |
The key insight: Query cost scales with the data you want; Scan cost scales with the data you have. That difference is invisible on a 100-item table and ruinous on a 100-million-item one.
How Query works
A Query always specifies a KeyConditionExpression. The partition key must use equality; the sort key may use a range:
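import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-table")  # hypothetical table name

resp = table.query(
    KeyConditionExpression=Key("PK").eq("CUSTOMER#42")
    & Key("SK").begins_with("ORDER#"),
    ScanIndexForward=False,  # newest first
    Limit=20,
)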
DynamoDB locates the partition, reads matching items in sort-key order, and returns them. You’re billed for the read capacity of the items the query matches (before any filter), not the whole table. Because items in a partition are stored sorted by the sort key, begins_with, between, and >/< conditions are efficient range scans within that one partition.
You can Query the base table or any secondary index — a GSI exists precisely to give you a partition key for an access pattern the base table can’t serve.
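As a rough illustration of querying an index instead of the base table, the sketch below passes IndexName to the same query call. The index name GSI1 and key attribute GSI1PK are assumptions made for this example, the key condition is written in string form with placeholders, and `table` is expected to be a boto3 Table resource.

```python
def query_index(table, partition_value, index_name="GSI1", pk_attr="GSI1PK", limit=20):
    """Query a secondary index by its partition key.

    index_name and pk_attr are hypothetical defaults; substitute your own.
    """
    resp = table.query(
        IndexName=index_name,                        # target the index, not the base table
        KeyConditionExpression="#pk = :pk",          # partition key must use equality
        ExpressionAttributeNames={"#pk": pk_attr},
        ExpressionAttributeValues={":pk": partition_value},
        Limit=limit,
    )
    return resp["Items"]
```

A call such as `query_index(table, "STATUS#SHIPPED")` would return only the first page of results; larger result sets still need pagination.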
How Scan works
A Scan has no key condition. It reads up to 1 MB of data per request, applies any FilterExpression, and returns what’s left:
from boto3.dynamodb.conditions import Attr

# `table` is a boto3 Table resource
resp = table.scan(
    FilterExpression=Attr("status").eq("SHIPPED")
)
This reads every item, charges read capacity for every item, and only then drops the non-matching ones. If 1% of items match, you still pay for 100%. On a large table a single logical “scan” is many paginated requests, each capped at 1 MB.
The trap: FilterExpression looks like a WHERE clause
The most common misconception is that a FilterExpression makes a Scan cheap, like a SQL WHERE. It doesn’t. Filters run after the read:
- You pay read capacity for every item examined.
- The 1 MB page limit applies to data read before filtering, so a filtered page can return zero items and still cost a full MB of reads.
A filter is a convenience for trimming results you’ve already paid for — never a substitute for a key condition or an index. See expressions for how filter, condition, and key-condition expressions differ.
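The read-then-filter order can be modeled with a toy simulation. This is not DynamoDB's implementation, only a sketch of the behavior described above: the per-item "size" field and the round 1,000,000-byte page limit are assumptions chosen to keep the numbers easy to follow.

```python
def simulate_scan_page(items, predicate, page_limit_bytes=1_000_000):
    """Toy model of one Scan page: read up to the page limit, then filter.

    Each item is a dict with a hypothetical "size" field in bytes.
    Returns (matching_items, bytes_read, items_examined).
    """
    bytes_read = 0
    examined = []
    for item in items:
        if bytes_read + item["size"] > page_limit_bytes:
            break  # page is full; the real API would return LastEvaluatedKey
        bytes_read += item["size"]
        examined.append(item)
    # The filter runs only after the read has already been paid for.
    matching = [item for item in examined if predicate(item)]
    return matching, bytes_read, len(examined)


# 2,000 items of 1,000 bytes each, none of them SHIPPED.
table_items = [{"status": "PENDING", "size": 1_000} for _ in range(2_000)]
matching, bytes_read, examined = simulate_scan_page(
    table_items, lambda item: item["status"] == "SHIPPED"
)
print(len(matching), bytes_read, examined)
```

In this toy run the page returns no items even though a full page of data was read, which is the situation the second bullet describes.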
Cost comparison
Suppose a table has 10 million items averaging 1 KB, and you want the 50 with status = SHIPPED.
- Scan with filter: reads all 10 GB. At 4 KB per read capacity unit, with eventually consistent reads costing 0.5 RCU per 4 KB, that's roughly 1.25 million RCUs, spread across about 10,000 sequential 1 MB pages.
- Query on a GSI keyed by STATUS#SHIPPED: reads only the 50 matching items, which costs a handful of RCUs and single-digit milliseconds.
With these numbers the Scan costs on the order of a hundred thousand times more for the same result, and that ratio only widens as the table grows. For pricing mechanics, see capacity and pricing.
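The arithmetic behind this comparison can be checked with a short sketch. It is an estimate under stated assumptions: read sizes are rounded up to 4 KB once per operation (per-page rounding is ignored), reads are eventually consistent, and the GSI is assumed to project the full 1 KB item.

```python
import math


def read_units(total_kb, strongly_consistent=False):
    """Approximate RCUs for reading total_kb of data in one operation.

    Simplification: rounds up once over the whole read and ignores
    per-page rounding, so treat the result as an estimate.
    """
    units = math.ceil(total_kb / 4)  # one unit per 4 KB read
    return units if strongly_consistent else units * 0.5


scan_cost = read_units(10_000_000 * 1)  # 10 million items x 1 KB each
query_cost = read_units(50 * 1)         # 50 matching items x 1 KB each
ratio = scan_cost / query_cost

print(scan_cost, query_cost, round(ratio))
```

Under these assumptions the Scan comes to 1.25 million RCUs and the Query to 6.5, a ratio of roughly 190,000 to 1.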
Design so you Query, not Scan
Almost every “I need a Scan” is really “I haven’t modeled this access pattern.” The fix is design, not a bigger filter:
- Add a GSI whose partition key is the attribute you're filtering on. status becomes GSI1PK = STATUS#SHIPPED, and the Scan-with-filter becomes a targeted Query.
- Overload keys so related items share a partition and a single Query returns them in sort order.
- Use sparse indexes to “list all items where X exists” — items lacking the GSI key attribute simply aren’t indexed, so the index is the filtered set.
Start from your access patterns and give each one a key path. If you find yourself reaching for Scan on a hot path, that’s the signal to revisit the model.
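One possible shape for the first and third techniques is sketched below. The first function only builds the keyword arguments for an UpdateTable call that adds an index; the second sets the index key on an item only when a status is present, which is what keeps the index sparse. The names GSI1 and GSI1PK and the STATUS# prefix follow the example above; the on-demand billing assumption is mine.

```python
def gsi_create_request(table_name, index_name="GSI1", pk_attr="GSI1PK"):
    """Build keyword arguments for the DynamoDB UpdateTable API call.

    Assumes an on-demand table; a provisioned table would also need
    ProvisionedThroughput inside "Create".
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": pk_attr, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [{"AttributeName": pk_attr, "KeyType": "HASH"}],
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


def with_status_key(item, pk_attr="GSI1PK"):
    """Return a copy of item with the index key set only when status exists.

    Items without the key attribute stay out of the index (sparse index).
    """
    indexed = dict(item)
    if item.get("status"):
        indexed[pk_attr] = f"STATUS#{item['status']}"
    return indexed
```

The request could then be sent with `boto3.client("dynamodb").update_table(**gsi_create_request("orders"))`, where `orders` is a placeholder table name, and items would pass through `with_status_key` before being written.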
When Scan is fine
Scan isn’t forbidden — it’s just the wrong tool for high-traffic targeted reads. It’s perfectly reasonable for:
- Small tables (config, lookup data) where reading everything is cheap.
- One-off admin or migration tasks run off the hot path.
- Full exports, where you genuinely want every item. Even then, prefer a parallel Scan (using Segment/TotalSegments) or a DynamoDB-to-S3 export to spread the load.
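A parallel Scan for the export case might look like the sketch below, where each worker reads one segment to completion. Here `scan` is any callable that accepts the Scan API's keyword arguments; the thread-pool approach and the default of four segments are assumptions for illustration, not a tuned recommendation.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_segment(scan, segment, total_segments, **kwargs):
    """Read every page of one segment using the given scan callable."""
    items = []
    params = dict(kwargs, Segment=segment, TotalSegments=total_segments)
    while True:
        resp = scan(**params)
        items.extend(resp.get("Items", []))
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            return items  # no key means this segment is exhausted
        params["ExclusiveStartKey"] = last_key


def parallel_scan(scan, total_segments=4, **kwargs):
    """Scan all segments concurrently and concatenate the results."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        futures = [
            pool.submit(scan_segment, scan, segment, total_segments, **kwargs)
            for segment in range(total_segments)
        ]
        return [item for future in futures for item in future.result()]
```

Since boto3 documents its low-level clients, rather than resource objects, as safe to share across threads, one option is to pass something like `functools.partial(boto3.client("dynamodb").scan, TableName="orders")`, with `orders` again a placeholder name. A parallel Scan still reads and bills for every item; it only finishes sooner.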
Paginating both
Neither operation returns unlimited data in one call. Both cap a single response at 1 MB and return a LastEvaluatedKey when more results exist; you pass it back as ExclusiveStartKey to fetch the next page. Don’t mistake an empty page for the end of results — only the absence of LastEvaluatedKey means you’re done. See pagination for the full loop.
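A minimal version of that loop, written as a generator, could look like this. The `operation` argument is assumed to be a bound `table.query` or `table.scan` method (or anything with the same keyword interface); the function name is hypothetical.

```python
def iter_items(operation, **kwargs):
    """Yield items from every page of a Query or Scan.

    Stops only when a response has no LastEvaluatedKey; an empty page
    with a key present is followed, not treated as the end.
    """
    params = dict(kwargs)
    while True:
        resp = operation(**params)
        yield from resp.get("Items", [])
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            break
        params["ExclusiveStartKey"] = last_key
```

For example, `list(iter_items(table.scan))` would collect a whole table, which is only sensible for the small-table and export cases described earlier.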
Seeing the difference
It helps to watch consumed capacity live: run a filtered Scan and a GSI Query for the same result and compare the RCUs. Tablyne, a native DynamoDB GUI, shows consumed capacity and item counts per operation, which makes the gap between “reads the table” and “reads the slice” immediately obvious — and is a quick way to catch an accidental Scan before it ships.
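The same comparison can be made from code by asking DynamoDB to report consumed capacity. The sketch below sets ReturnConsumedCapacity to TOTAL and sums the figures across pages; the helper name and the shape of the returned dictionary are my own choices, and `operation` is assumed to be a bound `table.query` or `table.scan`.

```python
def measure_read(operation, **kwargs):
    """Run a Query or Scan to completion and total up what it consumed."""
    params = dict(kwargs, ReturnConsumedCapacity="TOTAL")
    totals = {"capacity_units": 0.0, "returned": 0, "examined": 0, "requests": 0}
    while True:
        resp = operation(**params)
        consumed = resp.get("ConsumedCapacity", {})
        totals["capacity_units"] += consumed.get("CapacityUnits", 0.0)
        totals["returned"] += resp.get("Count", 0)          # items after filtering
        totals["examined"] += resp.get("ScannedCount", 0)   # items read before filtering
        totals["requests"] += 1
        last_key = resp.get("LastEvaluatedKey")
        if last_key is None:
            return totals
        params["ExclusiveStartKey"] = last_key
```

Running it once against a filtered Scan and once against the equivalent index Query should show a large gap between `examined` and `returned` for the Scan, and a correspondingly larger `capacity_units` total.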
The rule is simple: design your keys and indexes so the data you want has a direct path, then Query it. Reserve Scan for the rare cases where reading everything is actually the point.
Frequently asked questions
Is Scan ever acceptable in DynamoDB?
Yes, for small tables, one-off admin tasks, or full exports where you genuinely need every item. It's a problem when it runs on a hot path against a large table, because cost and latency grow with table size, not result size.
Does a FilterExpression reduce the cost of a Query or Scan?
No. Filters are applied after items are read from the table, so you pay read capacity for every item examined, not just the ones returned. Use a key condition or an index to narrow the read instead.
Why does my Scan return fewer items than expected?
DynamoDB reads at most 1 MB per request before filtering. If more data exists, the response includes LastEvaluatedKey and you must paginate with ExclusiveStartKey to get the rest.