Javascript Memoization
When working on any sizable project, you’ll inevitably run into a situation where you need to improve the performance of the application. Oftentimes, you’ll notice multiple redundant calls being made to a database or external API, loading down that external resource and causing unnecessary delays. One way to solve this problem is through caching.
There are varying degrees of caching and many strategies for it. Because I come from the Ruby on Rails world, I think of memoization as the lightest form of caching and the first defense against redundant expensive operations. If you’re familiar with any Rails projects, you’ll recognize the following memoize pattern:
def whatever
  @whatever ||= do_some_expensive_operation
end
The first call to the whatever method sets the @whatever instance variable and returns it. Subsequent calls return the already-set variable rather than recalculating it. Performance is saved and the world rejoices. Another important note here is that the memoized variable resets with each request, which means you don’t usually have to invalidate the “cached” values yourself. (Sometimes you do want to invalidate memoized data this way. You can read more about that here if you like.)
Javascript does not have such a readily available memoization/caching feature (at least not one that I’m aware of). I know there are many caching options out there, but I want something quick and lightweight without any dependencies. Because memoization is just a specific kind of caching, I’m going to use the latter term for simplicity.
Standalone Node
Let’s imagine we have an array of objects that have references to other objects stored in a database. Our array does not have duplicate objects in it but the objects may refer to the same relational data that we need to acquire. Here is a program that simulates the idea (just pretend the objects in the array aren’t identical):
// main.js
let dataProvider = require('./dataProvider');
function demo(projectId) {
  return dataProvider.getProjectData(projectId);
}
// let's track how long the program takes to execute, just for fun
let start = Date.now();
let promises = ['a', 'a', 'b', 'a', 'b'].map(projectId => demo(projectId));
Promise.all(promises).then((result) => {
  console.log(`result: ${result.join(`\n  `)}`);
  console.log(`It took ${Date.now() - start} ms to run`);
});
No Cache
The original implementation of the dataProvider is as follows:
// dataProvider.js
module.exports = {
  getProjectData: function(projectId) {
    return new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    });
  }
}
The setTimeout function simulates querying a database, a third-party API, or whatever. The output from running the above code looks like this:
$ node main.js
performing query for: a, timeout: 331
performing query for: a, timeout: 1249
performing query for: b, timeout: 1960
performing query for: a, timeout: 830
performing query for: b, timeout: 1324
result: done with query for: a, timeout: 331
done with query for: a, timeout: 1249
done with query for: b, timeout: 1960
done with query for: a, timeout: 830
done with query for: b, timeout: 1324
It took 1966 ms to run
Notice that the queries for the same project ID are run multiple times, each taking its own random amount of time. The program has to wait for the longest one to finish, and we’ve either put an unnecessary load on the database or made an unnecessary number of calls to an API. Here is where caching comes in.
Almost Cache
We need to create some sort of quick cache mechanism and we want it to be as simple as possible. Here is the first attempt:
// dataProvider.js
let cache = require('./cache.js');
module.exports = {
  getProjectData: function(projectId) {
    return cache.get(`project-${projectId}`, new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    }));
  }
}
This is basically the same function we had before, but we wrapped the promise in a call to cache.get with a unique key. We want the least amount of boilerplate code we can get, and we have that here. This is what the cache looks like:
// cache.js
module.exports = {
  get: function(key, value) {
    if(!this[key])
      this[key] = value;
    return this[key];
  }
}
This is also fairly simple. If the property doesn’t already exist in the cache, we create it and set it to the given value. Then we return the property (which will be the promise passed in by the dataProvider). So far so good. When we run it this time, we get the following:
$ node main.js
performing query for: a, timeout: 389
performing query for: a, timeout: 409
performing query for: b, timeout: 1523
performing query for: a, timeout: 870
performing query for: b, timeout: 1171
result: done with query for: a, timeout: 389
done with query for: a, timeout: 389
done with query for: b, timeout: 1523
done with query for: a, timeout: 389
done with query for: b, timeout: 1523
It took 1528 ms to run
The result is interesting. We’re still performing every query; however, we only process the result of the first occurrence of each project ID. The random timeout value acts as a unique identifier that makes this behavior visible. It’s a step in the right direction, but we’re still making needless queries. The reason this happens is that a promise begins executing the moment it is created.
The executor function is executed immediately by the Promise implementation, passing resolve and reject functions (the executor is called before the Promise constructor even returns the created object).
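You can see this for yourself with a tiny standalone snippet (not part of the demo project). The executor’s log line prints before the then callback ever runs:
// standalone snippet, not part of the demo project
let promise = new Promise(resolve => {
  console.log('executor runs now, before .then is ever attached');
  resolve('some value');
});
console.log('the Promise constructor has already returned');
promise.then(value => console.log(`then sees: ${value}`));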
Working Cache
To fix this problem, we have to check if the promise is already cached. If it isn’t, then we can create and cache it.
// dataProvider.js
let cache = require('./cache.js');
module.exports = {
  getProjectData: function(projectId) {
    let key = `project-${projectId}`;
    return cache.get(key) || cache.set(key, new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    }));
  }
}
We unfortunately have to move the conditional from the cache to the dataProvider, but it isn’t terrible thanks to the short-circuiting behavior of Javascript’s || operator. Here is what the cache looks like now:
// cache.js
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value) {
    this[key] = value;
    return this[key];
  }
}
Here is the result of running this code:
$ node main.js
performing query for: a, timeout: 1253
performing query for: b, timeout: 1863
result: done with query for: a, timeout: 1253
done with query for: a, timeout: 1253
done with query for: b, timeout: 1863
done with query for: a, timeout: 1253
done with query for: b, timeout: 1863
It took 1867 ms to run
This is great: we only perform the query once for each unique project ID, yet we still get a result for every element to operate on at our leisure.
You might be thinking that we could have kept the earlier implementation if we just passed a function that builds the promise to the cache, then created the promise from within the cache only when it is needed. That does work, but there are two problems: the generic cache becomes a promise cache, and oftentimes the promise is created for you by another module, so you can’t (or don’t want to) do the caching at that low a level.
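For reference, that factory-style variant might look something like this (an illustrative sketch, not code from this project; the createValue name is made up):
// cache.js (hypothetical factory-based variant)
module.exports = {
  get: function(key, createValue) {
    if(!this[key])
      this[key] = createValue(); // the factory only runs on a cache miss
    return this[key];
  }
}
The dataProvider would then call something like cache.get(key, () => db.query(projectId)), and the cache decides when the promise gets created.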
If the promise is being created elsewhere, our dataProvider ends up looking like the following, which is not bad:
// dataProvider.js
let cache = require('./cache.js');
let db = require('db');
module.exports = {
  getProjectData: function(projectId) {
    let key = `project-${projectId}`;
    return cache.get(key) || cache.set(key, db.query(projectId));
  }
}
Express
If you put the code above inside a controller action of an Express application, you’ll notice that the first request will fetch the data once as expected, but all future requests will load the data from the cache. To prevent this behavior we have to invalidate the data at some point.
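To be concrete, a controller action using the dataProvider might look roughly like this (the route and port are made up for illustration):
// app.js (illustrative Express usage; route and port are made up)
let express = require('express');
let dataProvider = require('./dataProvider');
let app = express();
app.get('/projects/:projectId', (req, res, next) => {
  dataProvider.getProjectData(req.params.projectId)
    .then(data => res.send(data))
    .catch(next);
});
app.listen(3000);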
Another challenge is that if we simply invalidate but do not remove the data (say, by using a time property and checking the retrieval time against that property), we will introduce a memory leak. That is because the cache will be loaded once and then held in memory for as long as the referencing object is held (which seems to be forever for a controller or any object the controller references). Any data created in the cache will also stick around forever.
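To make that concern concrete, a timestamp-checking cache along those lines might look like this hypothetical sketch (the entry- prefix and maxAge are my own names). Stale entries are treated as misses, but nothing ever deletes them:
// cache.js (hypothetical timestamp-based variant -- note it never frees entries)
module.exports = {
  maxAge: 1000,
  get: function(key) {
    let entry = this[`entry-${key}`];
    // stale entries are treated as misses, but they are never removed,
    // so every key ever cached stays in memory for the life of the process
    if(entry && Date.now() - entry.storedAt < this.maxAge)
      return entry.value;
  },
  set: function(key, value) {
    this[`entry-${key}`] = { value: value, storedAt: Date.now() };
    return value;
  }
}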
Invalidating Cache
Because of the two concerns outlined above, we need to add a single line of code to the cache, with the following result:
// cache.js
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value) {
    this[key] = value;
    setTimeout(() => delete this[key], 1000);
    return this[key];
  }
}
Now we get the expected behavior. Each time a page is loaded, the data is retrieved exactly once for each unique project ID. I set the invalidation time to 1 second (1000 milliseconds) because I only want the cache to survive a single request. Remember, the goal is to get something similar to Rails memoization, not a complete caching system. Of course, you can set the time to whatever is appropriate for your needs.
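If you do want a different lifetime per entry, one possible tweak (not part of the original cache) is to accept an optional time-to-live argument:
// cache.js (illustrative variant with a configurable time-to-live)
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value, ttl = 1000) {
    this[key] = value;
    setTimeout(() => delete this[key], ttl);
    return this[key];
  }
}
The dataProvider could then call, for example, cache.set(key, promise, 5000) to keep an entry around for five seconds.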
Conclusion
Caching can create just as many problems as it solves, so these strategies should be used with caution and care. That said, it is a great way to boost the performance of your app, and lots of tools, including databases, browsers, and servers all over the world, use caching to speed things up.
I’ve shown a quick and dirty implementation of a Javascript memoization/caching technique that gets the job done. There are several caching systems out there dedicated to solving this problem in a more feature-rich and involved way (think Memcached or Redis). If you’re not looking for a solution that heavy, I hope this has helped.