This post is an introduction to a concept we’ll call Prefetching or Prefetcher Pattern[1].
In many circumstances, prefetching helps to easily achieve low latencies while enabling high availability for your services.
Added on Apr.14th'22: this was also presented at ZIO World'22, see here for the slides and presentation.
Background
Once upon a time, I was working for a company that was running (and expanding!) a pretty big smartgrid[2]. Here I could cue notions of IoT sensor data in humongous quantities, but let’s skip to the juicy bits.
One of the main constraints we operated under was being able to control the total power usage of our network of devices with short delays, typically under one second.
Another (obvious) constraint was to maintain high availability of the system.
So, we were dealing with the usual "things should be quick and work even if the world ends" requirements.
The Setting
Each device had a set of preferences stored in a database, and these preferences needed to be accessed fairly often. That would be upon:
- Device connection
- Regular (but random) events on the device side
Whenever these happened, some business logic needed to be run. And latency was, generally speaking, really important, lest we be penalized by the electrical grid operator for failing to implement their every desire.
Troubles With Caching
Caching was obviously an approach that we considered (and which we used in certain contexts). However, it had some drawbacks, which boiled down to non-deterministic (or at least hard to predict) latencies:
- Upon cache misses, you spend a lot of additional time sending out a query to another service
- Said query may take a while to complete, especially in cases where more work is required
- Worse, the query may fail because of a network issue or your database deciding to take the day off.
Basically, you really want to avoid issuing requests over the network for your computations when the target latency is zero. Thus, caching was out.
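To make the problem concrete, here is a minimal sketch of a read-through cache in plain Scala. It is only an illustration: `ReadThroughCache` and `fetch` are hypothetical names, and `fetch` stands in for whatever network or database call a real system would make.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical read-through cache, for illustration only.
// `fetch` stands in for a query sent to another service or database.
final class ReadThroughCache[K, V](fetch: K => V) {
  private val entries = TrieMap.empty[K, V]

  // Hit: a plain in-memory lookup.
  // Miss: the caller waits for `fetch`, which may be slow or may throw,
  // in which case nothing is cached and the failure reaches the caller.
  def get(key: K): V = entries.getOrElseUpdate(key, fetch(key))
}
```

The latency of `get` depends on whether the key happens to be cached, which is exactly the unpredictability described above.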
Enter Prefetching
If you are willing to trade away a little consistency and correctness, the issues above are easily resolved by doing some prefetching.
Essentially, instead of waiting until certain events happen to fetch some settings from a database, you load them all into memory beforehand. This gives you the double advantage of:
- Low (and predictable) latencies: directly look up something that sits in RAM
- Decoupling: if your database is offline, normal operations can continue.
Of course, you may elect to refresh the prefetched content as often as you’d like.
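The idea can be sketched without any particular library. What follows is a rough illustration using only the JDK, not the implementation we ran in production nor the ZIO one mentioned further down: `SimplePrefetcher`, `loadAll` and `start` are hypothetical names, and `loadAll` stands in for the query that loads every setting at once.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicReference
import scala.util.control.NonFatal

// Hypothetical sketch: `loadAll` stands in for the full database query.
final class SimplePrefetcher[K, V](initial: Map[K, V], loadAll: () => Map[K, V]) {
  // The whole data set lives behind one reference to an immutable Map.
  private val snapshot = new AtomicReference[Map[K, V]](initial)

  // Hot path: a plain in-memory lookup, no network involved.
  def get(key: K): Option[V] = snapshot.get().get(key)

  // Reload everything. If the load fails, the previous snapshot is kept,
  // so readers are still served while the database is unavailable.
  def refresh(): Unit =
    try snapshot.set(loadAll())
    catch { case NonFatal(_) => () }

  // Refresh in the background at a fixed rate. The caller owns the returned
  // executor and should shut it down when the prefetcher is no longer needed.
  def start(intervalMillis: Long): ScheduledExecutorService = {
    val executor = Executors.newSingleThreadScheduledExecutor()
    executor.scheduleAtFixedRate(() => refresh(), 0L, intervalMillis, TimeUnit.MILLISECONDS)
    executor
  }
}
```

Swallowing load failures silently is a simplification for the sketch; a real service would probably want to log them and track how stale the snapshot has become.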
When To Use It
As you might expect, this approach won’t work in every case, but it works well when:
- Things fit into RAM (and when not, sharding comes to the rescue)
- Eventual consistency is sufficient, or
- Settings change sufficiently rarely that you can do a full re-load when it happens.
In my smartgrid days, it really worked wonders: for all critical operations, the vast majority of network IO occurred between our devices and the server they were currently connected to, thus relaxing the constraints on the rest of our infrastructure.
And when the prefetched data did not contain something we needed, we applied the following piece of (questionable) wisdom:
If it’s not in RAM, it doesn’t exist.
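In code, that rule could look something like the sketch below. The types and the handling of the missing case are assumptions made for the example (`DeviceSettings`, `Lookup` and the returned messages are all hypothetical); the only point is that a miss never triggers a fallback query.

```scala
// Hypothetical types for illustration; not taken from the original system.
final case class DeviceSettings(maxPowerWatts: Int)

object Lookup {
  // A miss is final: no fallback query is sent to the database.
  def settingsFor(snapshot: Map[String, DeviceSettings], deviceId: String): Option[DeviceSettings] =
    snapshot.get(deviceId)

  // The caller decides what "does not exist" means, e.g. ignoring the event.
  def handleEvent(snapshot: Map[String, DeviceSettings], deviceId: String): String =
    settingsFor(snapshot, deviceId) match {
      case Some(s) => s"apply limit of ${s.maxPowerWatts} W"
      case None    => "unknown device, event ignored"
    }
}
```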
How To Use It
This concept is sufficiently simple that anyone can very quickly cobble together an implementation.
There are a few subtleties that need to be kept in mind though (mostly around concurrency), the impact of which depends on the final use case.
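One example of such a subtlety, offered as an illustration rather than an exhaustive list: if the prefetched value is swapped atomically in the background, a computation that reads the reference twice may see two different versions. The sketch below assumes the data sits in an `AtomicReference` to an immutable `Map`; `SnapshotReads` and its methods are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicReference

object SnapshotReads {
  // Assumed layout: an immutable Map swapped atomically by a background refresh.
  val ref = new AtomicReference[Map[String, Int]](Map("a" -> 1, "b" -> 2))

  // Risky: a refresh landing between the two reads mixes two versions of the data.
  def totalFromTwoReads(): Int = ref.get()("a") + ref.get()("b")

  // Safer: read the reference once, then work on that immutable snapshot.
  def totalFromOneSnapshot(): Int = {
    val snap = ref.get()
    snap("a") + snap("b")
  }
}
```

Whether the first variant is actually a problem depends on how strongly the values for different keys are related in your use case.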
As part of learning ZIO for some upcoming projects, I built and published a pure ZIO prefetcher implementation[3], and hope to showcase it in an example or two reasonably soon.
In the meantime, the gist of it is:
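type PrefetchedVal = Map[UserId, UserSettings]
// Effect that loads the full settings map (implementation elided)
def supplier(): ZIO[PrefetchedVal, Throwable, PrefetchedVal] = ...
for {
  // Start from an empty map; the supplier is re-run at the given interval
  prefetcher <- PrefetchingSupplier.withInitialValue(
    initialValue = Map(),
    supplier = supplier(),
    updateInterval = 1.second
  )
  ...
  // Reading the current value is a plain in-memory access
  instantAccess <- prefetcher.currentValueRef.get
  settings = instantAccess(someUserId)
  ...
} yield ...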
Feedback in any form is most welcome!
[1] Not that I invented it, but googling around for similar wording mostly yields things related to hardware optimisation (pre-fetching of instructions and memory registers). It’s probably one of these things that is too simple to deserve a name at a higher level.
[2] We used to joke that it was the biggest in the world: I believe this was true at the time, or at least true for the EU area. More details over here.
[3] Which can be found at: https://github.com/Shastick/zio-prefetcher/