DynamoDB Data Modeling: Designing Around Access Patterns
A step-by-step approach to DynamoDB data modeling — list access patterns, choose keys, denormalize, and validate — the right way to design a NoSQL schema.
On this page
- Why data modeling is different here
- Step 1 — Enumerate every access pattern
- Step 2 — Identify entities and relationships
- Step 3 — Choose primary keys
- Step 4 — Add indexes for the leftover patterns
- Step 5 — Denormalize deliberately
- Step 6 — Validate against the list
- Watch your partition design
- Iterate visually
DynamoDB data modeling is the process of turning your application’s questions — its access patterns — into a key schema that answers each one with a single, cheap request. It is the opposite of relational modeling: you start from how you read data, not from how the data is structured.
This guide walks through a repeatable process: list access patterns, identify entities, choose keys, denormalize, and validate.
Why data modeling is different here
A relational schema is built for flexibility. You normalize into third normal form and let the query planner join tables at read time. DynamoDB has no joins and no query planner. Every read is either a GetItem (one item by primary key), a Query (items sharing a partition key), or a Scan (read everything — almost always a mistake at scale).
That constraint flips the design order. Instead of “what does my data look like?” you ask “what queries will my application run, and how do I make each one a single direct lookup?” If you skip this step, you discover too late that an important query has no efficient path and you’re stuck filtering a full table scan.
Step 1 — Enumerate every access pattern
Before touching keys, write down every read and write your application performs. Be specific about cardinality, sort order, and filters. A useful format is a table:
| # | Access pattern | Type | Sort / filter |
|---|---|---|---|
| 1 | Get a customer by id | Read one | — |
| 2 | List a customer’s orders, newest first | Read many | by date desc |
| 3 | Get an order with its line items | Read many | — |
| 4 | List all orders in a status | Read many | by date |
| 5 | Create / update an order | Write | — |
This list is the single most important artifact in the whole process. If a pattern isn’t on it, your schema won’t serve it well. Add new patterns here before you add new code.
Step 2 — Identify entities and relationships
Now name your entities (Customer, Order, OrderItem) and the relationships between them: one-to-many (a customer has many orders), many-to-many (a product appears in many orders, an order has many products). You’re not creating one table per entity — you’re cataloging what needs to be co-located.
The rule of thumb: items that are read together should be stored together, under the same partition key. That’s what lets a single Query return a customer’s orders, or an order’s items, without a join.
Step 3 — Choose primary keys
Pick a partition key (PK) and sort key (SK) that satisfy your one-item and one-partition reads. With related entities, give the keys generic names and overload them with typed, prefixed values — the core of single-table design:
PK = CUSTOMER#42 SK = PROFILE
PK = CUSTOMER#42 SK = ORDER#2026-06-01#A1
PK = CUSTOMER#42 SK = ORDER#2026-05-20#B7
PK = ORDER#A1 SK = ITEM#sku-9
This satisfies patterns 1–3 directly:
- Get customer:
GetItem(PK=CUSTOMER#42, SK=PROFILE) - List orders newest first:
Query(PK=CUSTOMER#42, begins_with(SK,"ORDER#"), ScanIndexForward=false) - Get order items:
Query(PK=ORDER#A1, begins_with(SK,"ITEM#"))
The orders sort by date for free because the timestamp is the first segment of the sort key. Choosing keys deliberately is the heart of the work — see primary keys for how partition and sort keys behave.
Step 4 — Add indexes for the leftover patterns
Pattern 4 (“list all orders in a status”) can’t be answered by the base table because status isn’t part of the key. Add a Global Secondary Index with overloaded keys:
GSI1PK = STATUS#SHIPPED GSI1SK = ORDER#2026-06-01#A1
Now Query(GSI1PK="STATUS#SHIPPED") returns shipped orders by date. A single GSI usually serves several patterns at once by storing different values per entity type. Resist the urge to add an index per query — see secondary indexes for GSI vs LSI tradeoffs and projection choices.
For many-to-many relationships, use an adjacency list: store relationship items and add an inverted GSI (GSI1PK = SK, GSI1SK = PK) so you can traverse the relationship from either side.
Step 5 — Denormalize deliberately
Normalization is a relational virtue and a DynamoDB anti-pattern. Because there are no joins, you duplicate data so reads stay single-shot. If an order list needs to show the customer’s name, store that name on each order item — don’t fetch the customer separately.
The cost is write-time consistency: when a customer renames themselves, you must update the duplicated copies. Two common tools:
- DynamoDB Streams + a small function to fan out updates to denormalized copies.
- Transactions (
TransactWriteItems) when several items must change atomically.
Duplicate the attributes that are read together and rarely change. Don’t duplicate volatile data you’d have to rewrite constantly.
Step 6 — Validate against the list
Go back to your access-pattern table and prove each row maps to exactly one GetItem or Query — never a Scan. A quick validation table makes gaps obvious:
| # | Pattern | Operation | Index |
|---|---|---|---|
| 1 | Get customer | GetItem | base |
| 2 | List orders | Query begins_with | base |
| 3 | Order items | Query begins_with | base |
| 4 | Orders by status | Query | GSI1 |
| 5 | Write order | PutItem / TransactWriteItems | base |
If any pattern falls back to Scan or a FilterExpression over a large set, your model is incomplete — adjust keys or add an index. (Filters run after reads, so they don’t save capacity; see query vs scan.)
Watch your partition design
Good keys also spread load. A partition key with low cardinality or a single hot value (everything under PK = TENANT#1) concentrates traffic and can throttle even when the table has plenty of provisioned capacity. Prefer keys with high cardinality and even access. If a popular item is read constantly, consider write-sharding the key or fronting it with DynamoDB Accelerator (DAX).
Iterate visually
Data modeling is easier when you can see how overloaded keys lay out across entity types and watch a query return exactly the items you expect. Tablyne, a native DynamoDB GUI, groups items by partition and lets you run access patterns against real data, which turns the validation step into something concrete instead of a thought experiment.
Once the list, the keys, and the indexes all line up, the implementation is mechanical. The thinking lives in steps 1 through 6 — get those right and your reads stay fast and cheap as the table grows. For broader guidance, see best practices.
Frequently asked questions
Should I normalize data in DynamoDB like I do in SQL?
No. DynamoDB has no joins, so normalization forces extra round trips at read time. You denormalize and duplicate data so each access pattern is served by a single Query or GetItem, then keep copies in sync at write time.
How do I model many-to-many relationships in DynamoDB?
Use an adjacency-list pattern: store the relationship as items whose partition key is one side and sort key is the other, then add a GSI that inverts the keys so you can query the relationship from either direction.
What's the biggest mistake in DynamoDB data modeling?
Modeling entities first, the way you would in a relational database. DynamoDB schemas are derived from access patterns; if you design tables around objects you'll end up relying on Scan, which is slow and expensive.