Normalized Caching

In GraphQL, as its name suggests, we create schemas that express the relational nature of our data. When we query against the Query type we walk a graph, starting at the root Query type and traversing through related types. Rather than querying for normalized data, in GraphQL our queries request a specific shape of denormalized data, a view into our relational data that can be re-normalized automatically.

As the GraphQL API walks our query documents it may read from a relational database, copying entities and scalar values into a JSON document that matches our query document. However, the type information of our entities isn't lost. A query document may still ask the GraphQL API which entity it's dealing with using the __typename field, which dynamically introspects an entity's type. This means that GraphQL clients can automatically re-normalize data as results come back from the API, using the __typename field together with keyable fields like an id or _id field, which are already common conventions in GraphQL schemas. In other words, normalized caches can build up a relational database of tables in-memory for our application.

For our apps, normalized caches enable more sophisticated use-cases, where one API request can update data used in other parts of the app, with the cache updating automatically as we query our GraphQL API. Normalized caches can essentially keep the UI of our applications up-to-date whenever relational data is shared across multiple queries, mutations, or subscriptions.

Normalizing Relational Data

As previously mentioned, a GraphQL schema describes a graph of types, where our application's data always starts from the Query root type and is modified by data coming in from selections on the Mutation or Subscription types. All data that we query from the Query type will contain relations between "entities", hierarchical JSON objects.

A normalized cache seeks to turn this denormalized JSON blob back into a relational data structure that stores all entities by a key, so they can be looked up directly. Since GraphQL documents give the API a strict specification of how to traverse the schema, the JSON data that the cache receives from the API will always match the GraphQL query document that was used to request it. A common misconception is that normalized caches in GraphQL somehow store data by query document. In reality, the only thing a normalized cache cares about is that it can use our GraphQL query documents to walk the structure of the JSON data it received from the API.

{
  __typename
  todo(id: 1) {
    __typename
    id
    title
    author {
      __typename
      id
      name
    }
  }
}

{
  "__typename": "Query",
  "todo": {
    "__typename": "Todo",
    "id": 1,
    "title": "implement graphcache",
    "author": {
      "__typename": "Author",
      "id": 1,
      "name": "urql-team"
    }
  }
}

Above, we see an example of a GraphQL query document and a corresponding JSON result from a GraphQL API. In GraphQL, we never lose access to the underlying types of the data. Normalized caches can add the __typename field to selection sets automatically and use it to find out which type each JSON object corresponds to.

Generally, a normalized cache must do one of two things with a query document like the above:

  • It must be able to walk the query document and JSON data of the result and cache the data, normalizing it in the process and storing it in relational tables.
  • It must later be able to walk the query document and recreate this JSON data purely from its cache, by reading entries from its in-memory relational tables.

While the normalized cache can't know the exact type of each field, thanks to the GraphQL query language it can make a couple of assumptions as it walks the query document. Each field that has no selection set (like title in the above example) must be a "record", a field that may only be set to a scalar. Each field that does have a selection set must be another "entity" or a list of "entities". These fields with selection sets are our relations between entities, like foreign keys in relational databases. Furthermore, the normalized cache can then read the __typename field on related entities. This is called "Type Name Introspection" and is how it finds out about the type of each entity. From the above document we can assume the following relations (a sketch of this traversal follows the list below):

  • Query.todo(id: 1) → Todo
  • Todo.author → Author
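
The distinction between records and relations can be derived purely from the query document's structure. As a rough sketch (not Graphcache's actual code), assuming the graphql package's parser, such a traversal could look like this:

import { parse } from 'graphql';

// Walk a query document and classify its fields: fields without a selection
// set are "records", fields with a selection set are relations to entities.
const document = parse(`
  {
    todo(id: 1) {
      id
      title
      author {
        id
        name
      }
    }
  }
`);

const visitSelections = (selectionSet, path = []) => {
  for (const field of selectionSet.selections) {
    const fieldPath = [...path, field.name.value];
    console.log(fieldPath.join('.'), field.selectionSet ? 'relation' : 'record');
    if (field.selectionSet) visitSelections(field.selectionSet, fieldPath);
  }
};

visitSelections(document.definitions[0].selectionSet);
// todo relation
// todo.id record
// todo.title record
// todo.author relation
// todo.author.id record
// todo.author.name record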

However, this isn't quite enough yet to store the relations from GraphQL results. The normalized cache must also generate primary keys for each entity so that it can store them in table-like data structures. This is, for instance, why Relay enforces that each entity must have an id field, which lets it assume that there's an obvious primary key for every entity it may query. In contrast, urql's Graphcache and Apollo assume that there may be an id or _id field in a given selection set. If Graphcache can't find either of these fields it'll issue a warning, but a custom keys configuration may be used to generate custom keys for a given type. With this logic the normalized cache will actually create the following "links" between its relational data:

  • "Query", .todo(id: 1)"Todo:1"
  • "Todo:1", .author"Author:1"

As we can see, the Query root type itself has a constant key of "Query". All relational data originates here, since the GraphQL schema is a graph and, like a tree, all selections on a GraphQL query document originate from it. Internally, the normalized cache now stores field values on entities by their primary keys. The above can also be written as:

  • The Query entity's todo field with {"id": 1} arguments points to the Todo:1 entity.
  • The Todo:1 entity's author field points to the Author:1 entity.

In Graphcache, these "links" are stored in a nested structure per-entity. "Records" are kept separate from this relational data.

Normalization is based on types, keys, and relations. This information can all be inferred from the query document.

Storing Normalized Data

At its core, normalizing data means that we take individual fields and store them in a table. In our case we store every field's value in a dictionary, indexed by the entity's primary key (generated from its type name and an ID or other key field) and by the field's name and arguments, if it has any.

Primary Key | Field | Value
Type name and ID (key) | Field name (not alias) and optionally arguments | Scalar value or relation

To reiterate, we have three pieces of information that are stored in these tables:

  • The entity's key can be derived from its type name via the __typename field and a keyable field. By default Graphcache will check the id and _id fields, however this is configurable.
  • The field's name (like todo) and optional arguments. If the field has any arguments we can normalize them by JSON-stringifying the arguments, making sure the resulting key is stable by sorting the argument object's keys (see the sketch after this list).
  • Lastly, we may store relations as either null, a primary key that refers to another entity, or a list of such keys. "Records", the scalar values, are stored in a separate table.
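
As a rough illustration of these rules (not Graphcache's actual implementation), entity and field keys could be generated like this, assuming the default id/_id conventions:

// A minimal sketch of entity and field key generation, assuming the default
// `id`/`_id` conventions described above; not Graphcache's real code.
const keyOfEntity = data => {
  const id = data.id != null ? data.id : data._id;
  return id != null ? `${data.__typename}:${id}` : null;
};

// Field keys combine a field's name with its stably stringified arguments,
// sorting the argument keys so that equal arguments always yield the same key.
const keyOfField = (fieldName, args) => {
  if (!args || Object.keys(args).length === 0) return fieldName;
  const sorted = {};
  for (const key of Object.keys(args).sort()) sorted[key] = args[key];
  return `${fieldName}(${JSON.stringify(sorted)})`;
};

keyOfEntity({ __typename: 'Todo', id: 1 }); // 'Todo:1'
keyOfField('todo', { id: 1 });              // 'todo({"id":1})'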

In Graphcache the data structure for these tables looks a little like the following, where each entity has a record mapping field keys to other entity keys:

{
  links: Map {
    'Query': Record {
      'todo({"id":1})': 'Todo:1'
    },
    'Todo:1': Record {
      'author': 'Author:1'
    },
    'Author:1': Record { },
  }
}

We can see how the normalized cache is now able to traverse a GraphQL query by starting on the Query entity and retrieving relations for other fields. To retrieve "records", which are all fields with scalar values and no selection sets, Graphcache keeps a second table around with an identical structure. This table only contains scalar values, which keeps our non-relational data separate from our "links":

{
  records: Map {
    'Query': Record {
      '__typename': 'Query'
    },
    'Todo:1': Record {
      '__typename': 'Todo',
      'id': 1,
      'title': 'implement graphcache'
    },
    'Author:1': Record {
      '__typename': 'Author',
      'id': 1,
      'name': 'urql-team'
    },
  }
}

This is very similar to how we'd go about creating a state management store manually, except that Graphcache can use the GraphQL document to perform this normalization automatically.

What we gain from this normalization is a data structure that we can both write to and read from to reproduce the API results for GraphQL query documents. Any mutation or subscription can also be written to this data structure. Once Graphcache finds a keyable entity in their results it's written to its relational tables, which may update other queries in our application. Similarly, queries may share data with one another, which means that they effectively share entities using this approach and can update each other. In other words, once we have a primary key like "Todo:1" we may find this primary key again in other entities in other GraphQL results.
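
To make reading concrete, here is a hypothetical, simplified read against the two tables above. This is not Graphcache's API, just a sketch of the lookups involved:

// Simplified in-memory tables mirroring the examples above.
const links = new Map([
  ['Query', { 'todo({"id":1})': 'Todo:1' }],
  ['Todo:1', { author: 'Author:1' }],
  ['Author:1', {}],
]);
const records = new Map([
  ['Query', { __typename: 'Query' }],
  ['Todo:1', { __typename: 'Todo', id: 1, title: 'implement graphcache' }],
  ['Author:1', { __typename: 'Author', id: 1, name: 'urql-team' }],
]);

// Rebuilding the original query result purely from the cache:
const todoKey = links.get('Query')['todo({"id":1})']; // 'Todo:1'
const authorKey = links.get(todoKey).author;          // 'Author:1'
const result = {
  __typename: records.get('Query').__typename,
  todo: {
    ...records.get(todoKey),               // __typename, id, title
    author: { ...records.get(authorKey) }, // __typename, id, name
  },
};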

Custom Keys and Non-Keyable Entities

In the above introduction we've learned that, while Graphcache doesn't enforce id fields on each entity, it checks for the id and _id fields by default. There are many situations in which entities may either not have a key field at all or may use a different one.

As Graphcache traverses JSON data and a GraphQL query document to write data to the cache, you may see a warning from it along the lines of "Invalid key: [...] No key could be generated for the data at this field." Graphcache has many warnings like this that attempt to detect undesirable behaviour and help us update our configuration or queries accordingly.

In the simplest case, we may have forgotten to add the id field to the selection set of our GraphQL query document. But what if the field is instead called uuid and our query looks accordingly different?

{
  item {
    uuid
  }
}

In the above selection set we have an item field that has a uuid field rather than an id field. This means that Graphcache won't automatically be able to generate a primary key for this entity. Instead, we have to help it generate a key by passing it a custom keys config:

cacheExchange({
  keys: {
    Item: data => data.uuid,
  },
});

We may add a function as an entry to the keys configuration. The property here, "Item", must be the typename of the entity for which we're generating a key, and the function may return an arbitrarily generated key. So for our item field, which in our example schema gives us an Item entity, we create a keys configuration entry that derives a key from the uuid field rather than the id field.
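
Since a keys function simply returns a string, it can also combine several fields into a single key. The Product type and its fields below are hypothetical, just to illustrate the idea:

cacheExchange({
  keys: {
    // Hypothetical type: no single ID field, but the combination of two
    // fields uniquely identifies an entity.
    Product: data => `${data.shopId}-${data.sku}`,
  },
});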

This also raises a question: what does Graphcache do with unkeyable data by default? And what if my data has no key?
This special case is what we call "embedded data". Not all types in a GraphQL schema have keyable fields, and some types may just wrap data without themselves being relational. They may be "edges", entities with a field pointing to other entities that simply connect two entities, or data types like a GeoJson or Image type.

In these cases, where the normalized cache encounters unkeyable types, it creates an embedded key by combining the parent's primary key with the field key. This means that "embedded entities" are only reachable from a specific field on their parent entity. They're still globally unique, but they aren't, strictly speaking, relational data.

{
  __typename
  todo(id: 1) {
    id
    image {
      url
      width
      height
    }
  }
}

In the above example we're querying an Image type on a Todo. This imaginary Image type has no key because the image is embedded data and will only ever be associated with this Todo. In other words, the API's schema doesn't consider it necessary to have a primary key field for this type; maybe it doesn't even have an ID in our backend's database. We could assign this type an imaginary key (maybe based on the url), but if it's not shared data it wouldn't make much sense to do so.

When Graphcache attempts to store this entity it will issue the previously mentioned warning. Internally, it'll then generate an embedded key for this entity based on the parent entity. If the parent entity's key is Todo:1 then the embedded key for our Image will become Todo:1.image. This is also how this entity will be stored internally by Graphcache:

{
  records: Map {
    'Todo:1.image': Record {
      '__typename': 'Image',
      'url': '...',
      'width': 1024,
      'height': 768
    },
  }
}

This doesn't however mute the warning that Graphcache outputs, since it believes we may have made a mistake. The warning itself gives us advice on how to mute it:

If this is intentional, create a keys config for Image that always returns null.

Meaning, we can add an entry to our keys config for our non-keyable type that explicitly returns null, which tells Graphcache that the entity has no key:

cacheExchange({
  keys: {
    Image: () => null,
  },
});

Non-Automatic Relations and Updates

While Graphcache is able to store and update our entities in an in-memory relational data structure, which keeps each entity in a single, unique location, a GraphQL API may make a lot of implicit changes to the relations of data as it runs, or have trivial relations that the cache could resolve without asking the API. As with the keys config, we have two more configuration options to deal with this: resolvers and updates.

Manually resolving entities

Some fields in our schema can be resolved without asking the GraphQL API for their relations. The resolvers config allows us to create client-side resolvers with which we can read from the cache directly as Graphcache creates a local GraphQL result from its cached data.

{
  todo(id: 1) {
    id
  }
}

Previously we've looked at the above query to illustrate how data from a GraphQL API may be written to Graphcache's relational data structure, storing the links and entities of a result against this GraphQL query document. However, it's possible that another query has already written this Todo entity to the cache. So, how do we resolve a relation manually?

In such a case, Graphcache may have seen and stored the Todo entity but isn't aware of the relation between Query.todo({"id":1}) and the Todo:1 entity. However, we can tell Graphcache which entity it should look for when it accesses the Query.todo field by creating a resolver for it:

cacheExchange({
  resolvers: {
    Query: {
      todo(parent, args, cache, info) {
        return { __typename: 'Todo', id: args.id };
      },
    },
  },
});

A resolver is a function that's similar to GraphQL.js' resolvers on the server-side. It receives the parent data, the field's arguments, access to Graphcache's cached data, and an info object. The full function signature and more explanations can be found in the API docs. Since the resolver can access the field's arguments from the GraphQL query document, we can return a partial Todo entity. As long as this object is keyable, it tells Graphcache what the key of the returned entity is. In other words, we've told it how to get to a Todo from the Query.todo field.

This mechanism is far more powerful than this example suggests. Here are two other use-cases that resolvers cover:

  • Resolvers can be applied to fields with records, which means they can be used to change or transform scalar values. For instance, we can transform a string or parse a Date right inside a resolver (see the sketch after this list).
  • Resolvers can return deeply nested results, which are layered on top of Graphcache's in-memory relational data, allowing them to emulate infinite pagination and other complex behaviour.
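
For instance, a record resolver can parse a scalar into a Date as it's read from the cache. The Todo.updatedAt field here is an assumption for the sake of the example:

cacheExchange({
  resolvers: {
    Todo: {
      // Hypothetical field: turn the cached string value into a Date object
      // whenever this field is read from the cache.
      updatedAt: parent => new Date(parent.updatedAt),
    },
  },
});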

Read more about resolvers on the following page about "Local Resolvers".

Manual cache updates

While resolvers, as shown above, run while Graphcache is reading from its in-memory cache, updates is a configuration option whose functions run while Graphcache is writing to its cached data. Specifically, these functions can be used to add more changes on top of what a Mutation or Subscription updates automatically.

As stated before, a GraphQL schema's data may undergo a lot of implicit changes when we send it a Mutation or Subscription. A new item that we create may for instance manipulate a completely different item or even a list. Often mutations and subscriptions alter relations that their selection sets wouldn't necessarily see. Since mutations and subscriptions operate on a different root type, rather than the Query root type, we often need to update links in the rest of our data when a mutation is executed.

query TodosList {
  todos {
    id
    title
  }
}

mutation AddTodo($title: String!) {
  addTodo(title: $title) {
    id
    title
  }
}

In a simple example, like the one above, we have a list of todos in a query and create a new todo using the Mutation.addTodo mutation field. When the mutation is executed and we get the result back, Graphcache already writes the Todo item to its normalized cache. However, we also want to add the new Todo item to the list on Query.todos:

import { gql } from '@urql/core';

cacheExchange({
  updates: {
    Mutation: {
      addTodo(result, args, cache, info) {
        const query = gql`
          {
            todos {
              id
            }
          }
        `;

        cache.updateQuery({ query }, data => {
          data.todos.push(result.addTodo);
          return data;
        });
      },
    },
  },
});

In this code example we can first see that the signature of an updates entry is very similar to that of resolvers. However, we're seeing the cache in use for the first time. The cache object (as documented in the API docs) gives us direct access to Graphcache's mechanisms. Not only can we resolve data with it, we can also start sub-queries or sub-writes manually; these are full normalized cache runs inside other runs. In this case we're calling cache.updateQuery on a list of Todo items while the Mutation that added the Todo is being written to the cache.

As we can see, we may perform manual changes inside of updates functions, which can be used to affect other parts of the cache (like Query.todos here) beyond the automatic updates that a normalized cache is expected to perform.
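
The cache object also exposes other write methods, for instance cache.writeFragment, which updates a single entity instead of a whole query. The updateTodo mutation and its arguments below are assumptions for illustration:

import { gql } from '@urql/core';

cacheExchange({
  updates: {
    Mutation: {
      // Hypothetical mutation: patch the cached Todo's title directly.
      updateTodo(result, args, cache, info) {
        cache.writeFragment(
          gql`
            fragment _ on Todo {
              id
              title
            }
          `,
          { id: args.id, title: args.title }
        );
      },
    },
  },
});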

Read more about writing cache updates on the "Cache Updates" page.

Deterministic Cache Updates

Above, in the "Storing Normalized Data" section, we've talked about how Graphcache is able to store normalized data. However, apart from storing this data there are a couple of caveats that many applications simply ignore, skip, or simplify when they implement a store to cache their data in.

Amongst features like Optimistic Updates and Offline Support, Graphcache supports several features that tolerate unreliable API results. Essentially, we don't expect API results to always come back in order or on time, but we do expect Graphcache to prevent us from making "indeterministic cache updates", meaning it should gracefully handle API results that come back delayed or in a random order.

In terms of the "Manual Cache Updates" we've talked about above and Optimistic Updates, the limitations are pretty simple at first, and if we use Graphcache as usual we may not even notice them:

  • When we make an optimistic change, we define what a mutation's result may look like once the API responds in the future and apply this temporary result immediately (see the sketch after this list). We store this temporary data in a separate "layer". Once the real result comes back, this layer is deleted and the real API result is applied as usual.
  • When multiple optimistic updates are made at the same time, we never allow these layers to be deleted separately. Instead Graphcache waits for all mutations to complete before deleting the optimistic layers and applying the real API result. This means that a mutation update cannot accidentally commit optimistic data to the cache permanently.
  • While an optimistic update is applied, Graphcache pauses refetching any queries that contain this optimistic data so that the cache doesn't "flip back" to its non-optimistic state while the optimistic update is still pending. Otherwise we'd see a "flicker" in the UI.
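
Optimistic results are defined with the optimistic configuration option. The addTodo mutation here mirrors the earlier example; the temporary id is an assumption for illustration:

cacheExchange({
  optimistic: {
    // A temporary result that's applied immediately and kept on a separate
    // layer until the real addTodo result arrives from the API.
    addTodo(variables, cache, info) {
      return {
        __typename: 'Todo',
        id: 'temp-id', // hypothetical placeholder until the API assigns one
        title: variables.title,
      };
    },
  },
});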

These three principles are the basic mechanisms we can expect from Graphcache. The summary is: Graphcache groups optimistic mutations and pauses queries so that optimistic updates look as expected, which is an implementation detail we can mostly ignore when using it.

However, one implementation detail we cannot ignore is the last mechanism in Graphcache, which is called "Commutativity". As we can tell, "optimistic updates" need to store their normalized results on a separate layer. This means that the data structure we've seen previously in Graphcache is actually more like a list of layers, each with its own tables of links and entities.

Each layer may contain optimistic results and has an order of precedence. However, this order also applies to queries. Queries are run in one order, but their API results can come back to us in a very different order. Without further care, accessing enough pages in a random order would make the cache's contents depend on when each result happened to arrive, and on a slow network connection we'd see the application's data vary accordingly.

Commutativity means that API results can be applied in any order without changing the outcome; Graphcache achieves this by storing data in separate, ordered layers.

To achieve this, Graphcache actually uses layers for any API result it receives. If an API result arrives out of order, Graphcache sorts these layers by precedence, or rather by when their operations were requested. Overall, we don't have to worry about this; Graphcache has mechanisms that keep our updates safe.

Reading on

This concludes the introduction to Graphcache, with a short overview of how it works, what it supports, and some of its hidden mechanisms and internals. Next, we may want to learn more about how to use it and its other features: