Javascript Memoization
When working on any sizable project, you’ll inevitably run into a situation where you need to improve the performance of the application. Oftentimes, you’ll notice multiple redundant calls being made to a database or external API, loading down that external resource and causing unnecessary delays. One way to solve this problem is through caching.
There are varying degrees of caching and many strategies for it. Because I come from the Ruby on Rails world, I think of memoization as the lightest form of caching and the first defense against redundant expensive operations. If you’re familiar with any Rails projects, you’ll recognize the following memoize pattern:
def whatever
  @whatever ||= do_some_expensive_operation
end
The first call to the whatever method sets the @whatever instance variable and returns it. Subsequent calls return the already-set variable rather than recalculating it. Performance is saved and the world rejoices. Another important note here is that the memoized variable resets with each request, which means you don’t usually have to invalidate the “cached” values yourself. (Sometimes you do want to invalidate memoized data this way. You can read more about that here if you like.)
Javascript does not have such a readily available memoization/caching feature (at least not one that I’m aware of). I know there are many caching options out there, but I want something quick and lightweight without any dependencies. Because memoization is just a specific kind of caching, I’m going to use the latter term for simplicity.
Standalone Node
Let’s imagine we have an array of objects that have references to other objects stored in a database. Our array does not have duplicate objects in it but the objects may refer to the same relational data that we need to acquire. Here is a program that simulates the idea (just pretend the objects in the array aren’t identical):
// main.js
let dataProvider = require('./dataProvider');
function demo(projectId) {
  return dataProvider.getProjectData(projectId);
}
// let's track how long the program takes to execute, just for fun
let start = Date.now();
let promises = ['a', 'a', 'b', 'a', 'b'].map(projectId => demo(projectId));
Promise.all(promises).then((result) => {
  console.log(`result: ${result.join(`\n  `)}`);
  console.log(`It took ${Date.now() - start} ms to run`);
});
No Cache
The original implementation of the dataProvider is as follows:
// dataProvider.js
module.exports = {
  getProjectData: function(projectId) {
    return new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    });
  }
}
The setTimeout function simulates querying a database, a third-party API, or whatever. The output from running the above code looks like this:
$ node main.js
performing query for: a, timeout: 331
performing query for: a, timeout: 1249
performing query for: b, timeout: 1960
performing query for: a, timeout: 830
performing query for: b, timeout: 1324
result: done with query for: a, timeout: 331
done with query for: a, timeout: 1249
done with query for: b, timeout: 1960
done with query for: a, timeout: 830
done with query for: b, timeout: 1324
It took 1966 ms to run
Notice that the queries for the same project ID are run multiple times, each taking its own random amount of time. The program has to wait for the longest one to finish, and we’ve either put an unnecessary load on the database or made an unnecessary number of calls to an API. Here is where caching comes in.
Almost Cache
We need to create some sort of quick cache mechanism and we want it to be as simple as possible. Here is the first attempt:
// dataProvider.js
let cache = require('./cache.js');
module.exports = {
  getProjectData: function(projectId) {
    return cache.get(`project-${projectId}`, new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    }));
  }
}
This is basically the same function we had before, but we wrapped the promise in a call to cache.get with a unique key. We want the least amount of boilerplate code we can get, and we have that here. This is what the cache looks like:
// cache.js
module.exports = {
  get: function(key, value) {
    if(!this[key])
      this[key] = value;
    return this[key];
  }
}
This is also fairly simple. If the property doesn’t already exist in the cache, we create it and set it to the given value. Then we return the property (which will be the promise passed in by the dataProvider). So far so good. When we run it this time, we get the following:
$ node main.js
performing query for: a, timeout: 389
performing query for: a, timeout: 409
performing query for: b, timeout: 1523
performing query for: a, timeout: 870
performing query for: b, timeout: 1171
result: done with query for: a, timeout: 389
done with query for: a, timeout: 389
done with query for: b, timeout: 1523
done with query for: a, timeout: 389
done with query for: b, timeout: 1523
It took 1528 ms to run
The result is interesting. We’re still performing every query; however, we only process the result of the first occurrence of each project ID. The random timeout value acts as a unique identifier that makes this behavior visible. It’s a step in the right direction, but we’re still making needless queries. The reason this happens is that a promise begins executing the moment it is created.
The executor function is executed immediately by the Promise implementation, passing resolve and reject functions (the executor is called before the Promise constructor even returns the created object).
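You can see this for yourself with a tiny standalone snippet (not part of the demo project). The executor’s log line prints before the then callback ever runs:
// standalone snippet, not part of the demo project
let promise = new Promise(resolve => {
  console.log('executor runs now, before .then is ever attached');
  resolve('some value');
});
console.log('the Promise constructor has already returned');
promise.then(value => console.log(`then sees: ${value}`));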
Working Cache
To fix this problem, we have to check if the promise is already cached. If it isn’t, then we can create and cache it.
// dataProvider.js
let cache = require('./cache.js');
module.exports = {
  getProjectData: function(projectId) {
    let key = `project-${projectId}`;
    return cache.get(key) || cache.set(key, new Promise(resolve => {
      let timeout = Math.floor(Math.random() * 2000);
      console.log(`performing query for: ${projectId}, timeout: ${timeout}`);
      setTimeout(resolve, timeout, `done with query for: ${projectId}, timeout: ${timeout}`);
    }));
  }
}
We unfortunately have to move the conditional from the cache to the dataProvider, but it isn’t terrible thanks to the short-circuiting behavior of Javascript’s || operator. Here is what the cache looks like now:
// cache.js
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value) {
    this[key] = value;
    return this[key];
  }
}
Here is the result of running this code:
$ node main.js
performing query for: a, timeout: 1253
performing query for: b, timeout: 1863
result: done with query for: a, timeout: 1253
done with query for: a, timeout: 1253
done with query for: b, timeout: 1863
done with query for: a, timeout: 1253
done with query for: b, timeout: 1863
It took 1867 ms to run
This is great: we only perform the query once for each unique project ID, yet we still get a result for every element to operate on at our leisure.
You might be thinking that we could have kept the earlier implementation if we just passed a function that builds the promise to the cache, then created the promise from within the cache only when it is needed. That does work, but there are two problems: the generic cache becomes a promise cache, and oftentimes the promise is created for you by another module, so you can’t (or don’t want to) do the caching at that low a level.
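For reference, that factory-style variant might look something like this (an illustrative sketch, not code from this project; the createValue name is made up):
// cache.js (hypothetical factory-based variant)
module.exports = {
  get: function(key, createValue) {
    if(!this[key])
      this[key] = createValue(); // the factory only runs on a cache miss
    return this[key];
  }
}
The dataProvider would then call something like cache.get(key, () => db.query(projectId)), and the cache decides when the promise gets created.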
If the promise is being created elsewhere, our dataProvider ends up looking like the following, which is not bad:
// dataProvider.js
let cache = require('./cache.js');
let db = require('db');
module.exports = {
  getProjectData: function(projectId) {
    let key = `project-${projectId}`;
    return cache.get(key) || cache.set(key, db.query(projectId));
  }
}
Express
If you put the code above inside a controller action of an Express application, you’ll notice that the first request will fetch the data once as expected, but all future requests will load the data from the cache. To prevent this behavior we have to invalidate the data at some point.
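To be concrete, a controller action using the dataProvider might look roughly like this (the route and port are made up for illustration):
// app.js (illustrative Express usage; route and port are made up)
let express = require('express');
let dataProvider = require('./dataProvider');
let app = express();
app.get('/projects/:projectId', (req, res, next) => {
  dataProvider.getProjectData(req.params.projectId)
    .then(data => res.send(data))
    .catch(next);
});
app.listen(3000);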
Another challenge is that if we simply invalidate but do not remove the data (say, by using a time property and checking the retrieval time against that property), we will introduce a memory leak. That is because the cache will be loaded once and then held in memory for as long as the referencing object is held (which seems to be forever for a controller or any object the controller references). Any data created in the cache will also stick around forever.
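To make that concern concrete, a timestamp-checking cache along those lines might look like this hypothetical sketch (the entry- prefix and maxAge are my own names). Stale entries are treated as misses, but nothing ever deletes them:
// cache.js (hypothetical timestamp-based variant -- note it never frees entries)
module.exports = {
  maxAge: 1000,
  get: function(key) {
    let entry = this[`entry-${key}`];
    // stale entries are treated as misses, but they are never removed,
    // so every key ever cached stays in memory for the life of the process
    if(entry && Date.now() - entry.storedAt < this.maxAge)
      return entry.value;
  },
  set: function(key, value) {
    this[`entry-${key}`] = { value: value, storedAt: Date.now() };
    return value;
  }
}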
Invalidating Cache
Because of the two concerns outlined above, we need to add a single line of code to the cache, with the following result:
// cache.js
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value) {
    this[key] = value;
    setTimeout(() => delete this[key], 1000);
    return this[key];
  }
}
Now we get the expected behavior. Each time a page is loaded, the data is retrieved exactly once for each unique project ID. I set the invalidation time to 1 second (1000 milliseconds) because I only want the cache to survive a single request. Remember, the goal is to get something similar to Rails memoization, not a complete caching system. Of course, you can set the time to whatever is appropriate for your needs.
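If you do want a different lifetime per entry, one possible tweak (not part of the original cache) is to accept an optional time-to-live argument:
// cache.js (illustrative variant with a configurable time-to-live)
module.exports = {
  get: function(key) {
    return this[key];
  },
  set: function(key, value, ttl = 1000) {
    this[key] = value;
    setTimeout(() => delete this[key], ttl);
    return this[key];
  }
}
The dataProvider could then call, for example, cache.set(key, promise, 5000) to keep an entry around for five seconds.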
Conclusion
Caching can create just as many problems as it solves, so these strategies should be used with caution and care. That said, it is a great way to boost the performance of your app, and lots of tools, including databases, browsers, and servers all over the world, use caching to speed things up.
I’ve shown a quick and dirty implementation of a Javascript memoization/caching technique that gets the job done. There are several caching systems out there dedicated to solving this problem in a more feature-rich and involved way (think Memcached or Redis). If you’re not looking for a solution that heavy, I hope this has helped.