DynamoDB Best Practices for Production
Battle-tested DynamoDB best practices — key design, avoiding hot partitions, capacity, indexes, error handling, cost control and observability.
DynamoDB rewards good design and punishes bad design more sharply than most databases — a key mistake that a relational engine would absorb with an index can throttle your whole table at scale. These are the practices that hold up in production, grouped by the area that bites teams hardest.
Design keys for distribution, not just lookup
The single most important decision is the partition key. DynamoDB hashes it to assign items to physical partitions, and each partition caps out at roughly 3,000 RCU and 1,000 WCU. If your traffic concentrates on one key value, you get a hot partition and throttling — even when the table’s total provisioned capacity is nowhere near exhausted.
Rules that keep load spread:
- Choose high-cardinality partition keys. userId, orderId, and tenantId#userId distribute well. status, country, or a literal "GLOBAL" bucket do not.
- Never use monotonically increasing keys for hot writes. A timestamp or sequence as the partition key sends all current traffic to one partition.
- Shard write-heavy keys when you must. Append a suffix (USER#123#0..USER#123#9) computed from a hash and scatter writes across the shards, then read all shards in parallel. Use this only when a single key genuinely exceeds partition limits.
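The sharding rule above can be sketched in a few lines. This is a minimal illustration, not a prescribed scheme: the shard count of 10, the helper names, and the choice of SHA-256 over a caller-supplied discriminator (for example an event ID) are all assumptions made for the example.

```python
import hashlib
from typing import List

SHARD_COUNT = 10  # assumed shard count, matching the #0..#9 suffixes


def sharded_key(base_key: str, discriminator: str, shards: int = SHARD_COUNT) -> str:
    """Return base_key with a stable shard suffix derived from discriminator."""
    digest = hashlib.sha256(discriminator.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shards
    return f"{base_key}#{shard}"


def all_shard_keys(base_key: str, shards: int = SHARD_COUNT) -> List[str]:
    """Return every shard key, for fanning a read out across all shards."""
    return [f"{base_key}#{n}" for n in range(shards)]
```

Writes call sharded_key to pick a partition, and reads issue one Query per entry in all_shard_keys and merge the results. Hashing a discriminator keeps the shard choice repeatable, which helps if you ever need to find a specific item again without reading every shard.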
See primary keys for how PK and SK selection drives everything else, and single-table design for co-locating related items.
Model around access patterns
Write down every query your application makes before you design keys. DynamoDB has no joins and ad-hoc queries are expensive, so the schema exists to serve a known set of access patterns. If you find yourself reaching for Scan or FilterExpression to answer a core query, the model is wrong — add a sort-key pattern or a GSI instead. The full process is in data modeling.
Query, never Scan, on the hot path
Scan reads the entire table and bills you for all of it, applying filters only afterward. It’s fine for occasional batch jobs and analytics, never for user-facing reads. Always answer with Query against a partition key plus a key condition. If you keep needing a Scan, you’re missing an index. The cost difference is covered in query vs scan.
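from boto3.dynamodb.conditions import Attr, Key

# table is assumed to be a boto3 Table resource.
# Bad: reads the whole table; the filter runs only after the read is billed
table.scan(FilterExpression=Attr("status").eq("SHIPPED"))

# Good: targeted query against a GSI keyed on status
table.query(
    IndexName="GSI1",
    KeyConditionExpression=Key("GSI1PK").eq("STATUS#SHIPPED"),
)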
Use indexes deliberately
Global Secondary Indexes are powerful but not free:
- Every write that modifies an indexed attribute costs extra WCUs on the index. Three GSIs can quadruple write cost.
- Project only the attributes you read from the index (KEYS_ONLY or INCLUDE) rather than ALL, to shrink index storage and write cost.
- GSIs are eventually consistent: you cannot do a strongly consistent read on a GSI. Don’t use one where read-your-writes correctness matters.
- Prefer overloading one GSI across access patterns to spawning many narrow ones. Secondary indexes covers GSI vs LSI tradeoffs.
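To make the write-amplification point concrete, here is a rough estimator. It is a sketch under simplifying assumptions: standard writes are billed at 1 WCU per 1 KB rounded up, each affected GSI costs one additional write sized by what it projects, and index key values do not change (a changed key costs more on the index). The function names are hypothetical.

```python
import math
from typing import List


def write_units(size_bytes: int) -> int:
    """WCUs for one standard write: 1 WCU per 1 KB, rounded up."""
    return max(1, math.ceil(size_bytes / 1024))


def estimated_write_cost(item_size_bytes: int, projected_sizes: List[int]) -> int:
    """Base-table WCUs plus one index write per affected GSI.

    projected_sizes holds the size in bytes of the item as projected
    into each GSI that the write touches.
    """
    return write_units(item_size_bytes) + sum(write_units(s) for s in projected_sizes)
```

For a 2,500-byte item, three GSIs projecting ALL give 3 + 9 = 12 WCUs, four times the base write. The same three GSIs projecting about 100 bytes each give 3 + 3 = 6.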
Keep items small
Item size drives both throughput cost and latency. The max item size is 400 KB, but you should stay far below it.
- Store large blobs (images, documents) in S3 and keep only a pointer in DynamoDB.
- Use short attribute names — they count toward item size on every read and write.
- Split rarely-accessed attributes into a separate item under the same partition key so common reads stay cheap.
Pick the right consistency and capacity mode
- Default to eventually consistent reads — they’re half the cost and fine for most reads. Reserve strongly consistent reads for cases that truly need read-your-writes. See consistency.
- Start new or spiky workloads on on-demand capacity, then move steady high-utilization tables to provisioned with auto scaling. The break-even math is in capacity and pricing.
- Use transactions (TransactWriteItems/TransactGetItems) for multi-item atomicity, but know they cost double the capacity and have a 100-item limit. See transactions.
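The cost relationships in this list reduce to simple arithmetic. The sketch below assumes the standard billing units (reads per 4 KB, writes per 1 KB, both rounded up) and ignores batching and index effects. The function names are hypothetical.

```python
import math


def read_units(item_size_bytes: int, mode: str = "eventual") -> float:
    """RCUs to read one item; mode is 'eventual', 'strong', or 'transactional'."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))  # reads are billed per 4 KB
    factor = {"eventual": 0.5, "strong": 1.0, "transactional": 2.0}[mode]
    return blocks * factor


def write_capacity_units(item_size_bytes: int, transactional: bool = False) -> int:
    """WCUs to write one item; transactional writes cost double."""
    blocks = max(1, math.ceil(item_size_bytes / 1024))  # writes are billed per 1 KB
    return blocks * (2 if transactional else 1)
```

A 6,000-byte item costs 1 RCU eventually consistent, 2 RCUs strongly consistent, and 4 RCUs inside a transaction, which is why the default should be the cheapest mode that is still correct for the read.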
Write defensively
DynamoDB surfaces transient conditions as exceptions you’re expected to handle:
- Retry with exponential backoff and jitter. The AWS SDKs do this by default for throttling and 5xx errors — keep it on.
- Use condition expressions for safe writes. attribute_not_exists(PK) prevents overwriting an existing item; a version check enables optimistic locking. See expressions.
- Handle BatchWriteItem/BatchGetItem partial failures. These return UnprocessedItems/UnprocessedKeys; you must resubmit them, ideally with backoff. See batch operations.
- Paginate fully. A Query or Scan returns at most 1 MB per call and a LastEvaluatedKey if more remains; loop until it’s absent. See pagination.
from boto3.dynamodb.conditions import Key

# table is assumed to be a boto3 Table resource; process is your own handler.
last_key = None
while True:
    kwargs = {"KeyConditionExpression": Key("PK").eq("USER#123")}
    if last_key:
        kwargs["ExclusiveStartKey"] = last_key
    resp = table.query(**kwargs)
    process(resp["Items"])
    last_key = resp.get("LastEvaluatedKey")
    if not last_key:
        break
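The partial-failure handling for batch writes follows the same loop-until-done shape. The sketch below assumes a low-level boto3 DynamoDB client is passed in and uses its batch_write_item call. The helper name, attempt limit, and delay values are illustrative choices, not recommendations.

```python
import random
import time


def batch_write_all(client, request_items, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call BatchWriteItem until UnprocessedItems is empty or attempts run out.

    Returns whatever is still unprocessed (an empty dict on full success).
    """
    pending = request_items
    for attempt in range(max_attempts):
        if attempt:
            # Full jitter: wait a random time up to an exponentially growing cap.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
        response = client.batch_write_item(RequestItems=pending)
        pending = response.get("UnprocessedItems") or {}
        if not pending:
            return {}
    return pending
```

Returning the leftovers rather than raising lets the caller decide whether to queue them, log them, or fail the request. The sleep parameter is there only so the delay can be replaced in tests.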
Control cost and clean up data
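- Set TTL on ephemeral items (sessions, OTPs, events). TTL deletes consume no write capacity and reduce storage. Expired items are removed in the background, so they can linger for a while after their expiry time.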
- Drop GSIs you no longer query — they cost storage and write amplification indefinitely.
- Watch item-size creep; verbose new attributes raise the cost of every existing read path.
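Setting TTL comes down to writing a Number attribute that holds the expiry time in epoch seconds. The sketch below assumes the table's TTL attribute is named expiresAt and that sessions live for 30 minutes. Both are hypothetical choices for the example, and the attribute name must match whatever you configure on the table.

```python
import time
from typing import Optional

SESSION_TTL_SECONDS = 30 * 60  # assumed 30-minute session lifetime


def with_ttl(item: dict, ttl_seconds: int, now: Optional[float] = None) -> dict:
    """Return a copy of item with an epoch-seconds expiry in 'expiresAt'."""
    current = time.time() if now is None else now
    return {**item, "expiresAt": int(current) + ttl_seconds}
```

Because deletion is not instantaneous, reads that must never see expired data should still compare expiresAt against the current time.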
Observe what’s actually happening
You can’t tune what you can’t see. In production:
- Alarm on the ThrottledRequests, ReadThrottleEvents, and WriteThrottleEvents CloudWatch metrics. Throttling is your earliest signal of a hot partition or under-provisioning.
- Enable CloudWatch Contributor Insights to find the specific partition keys driving traffic.
- Pass ReturnConsumedCapacity=INDEXES during development so you see the real cost of each call, including index amplification.
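With ReturnConsumedCapacity=INDEXES, a single-table call such as PutItem or Query returns a ConsumedCapacity map that separates table units from per-index units. The helper below is a small sketch for flattening that map while you develop. Its name is hypothetical, and it assumes the single-object shape, not the list that batch and transaction calls return.

```python
def capacity_breakdown(response: dict) -> dict:
    """Split a response's ConsumedCapacity into table, per-index, and total units."""
    consumed = response.get("ConsumedCapacity") or {}
    breakdown = {"table": consumed.get("Table", {}).get("CapacityUnits", 0.0)}
    for section in ("GlobalSecondaryIndexes", "LocalSecondaryIndexes"):
        for name, usage in (consumed.get(section) or {}).items():
            breakdown[name] = usage.get("CapacityUnits", 0.0)
    breakdown["total"] = consumed.get("CapacityUnits", 0.0)
    return breakdown
```

Logging this next to each call during development makes index amplification visible early: a write whose total is several times its table figure is paying mostly for GSIs.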
When you’re modeling and debugging locally, seeing items grouped by partition makes hot-key risks obvious before they reach production. A native DynamoDB GUI like Tablyne shows real items laid out by partition key and reports consumed capacity per query, which surfaces low-cardinality keys and accidental full-table scans early. It’s a faster feedback loop than reading CloudWatch graphs after the fact.
The throughline across all of this: design for your access patterns, keep load spread across partitions, and measure consumed capacity. Get those three right and DynamoDB scales predictably. For the modeling foundation, start with single-table design and data modeling.
Frequently asked questions
What causes a hot partition in DynamoDB?
A hot partition happens when too much read or write traffic concentrates on a single partition key value, exceeding the per-partition limit of roughly 3,000 RCU / 1,000 WCU. It's usually caused by low-cardinality keys like a status flag or a single 'global' bucket. Spread load with a higher-cardinality key or write sharding.
Should I retry throttled DynamoDB requests?
Yes. ProvisionedThroughputExceededException and throttling are expected transient conditions. Retry with exponential backoff and jitter — the AWS SDKs do this automatically by default, so don't disable it. Persistent throttling means a capacity or key-design problem, not a retry problem.
How many GSIs should a DynamoDB table have?
There's no fixed number, but each GSI adds storage and write amplification — every write touching an indexed attribute costs extra WCUs. Most well-modeled tables need one to four GSIs. Add one only when a real access pattern can't be served otherwise.