More on Microservice Architectures and Performance

By John Maenpaa posted Aug 21, 2018 06:23 AM


In last week's article, I wrote a little about microservice architectures and how that trend can impact performance. Now, I want to dive a bit deeper into an example where an application design that used to be efficient can become overly expensive.

Code/Reference Tables

I'm sure every one of us has a variation of code/reference data tables in our databases. These contain the valid values for code fields and description text for each code. Some designs use a single table for all codes; others use a separate table for each code field. The multiple table design might look something like this:

    create table STATE_CD
         ( STATE_CD      char(02)      not null
         , LABEL_TXT     varchar(30)   not null
         , DESC_TXT      varchar(2000) not null
         , START_DT      date          not null
         , END_DT        date          not null
         , primary key (STATE_CD)
         );

Thirty years ago, we stored many tables like this in the same tablespace. That tablespace was assigned to a bufferpool that was sized to be able to keep all of the data in memory. Applications running in CICS on the same LPAR as the Db2 for z/OS subsystem could perform validation (and get the short LABEL_TXT description column) with a simple query.

    select LABEL_TXT
      from STATE_CD
     where STATE_CD = :statecode
       and CURRENT DATE between START_DT and END_DT;

Since we cached the data in memory in the database address space, there was no need to worry about caching it within the application address space (CICS in this case). Changes in the database became effective immediately, though the frequency of change for this particular data doesn't usually require it. We could spend hours arguing about whether this was the most efficient solution, but it performs well enough. The cross-memory access from the application address space to the database address space is quick. For simplicity, let's say the whole operation takes 1 millisecond. If there were 50 codes that needed to be validated and displayed, that'd be a total of 50 milliseconds for the reference data queries. That is certainly fast enough for the customary sub-second response time requirement.

Migration to Client/Server and Three-Tier GUI

With a two-tier client/server configuration, the application running on a desktop accessed the database directly. The codes and their labels were often downloaded from the database during initialization and cached locally in memory. This made it easy to build drop-down lists of the valid code values so that the user would be able to select the one they were looking for without having to look it up elsewhere. Here, our retrieval might look like:

    select STATE_CD
         , LABEL_TXT
      from STATE_CD
     where CURRENT DATE between START_DT and END_DT;

In this case, the query would take a little longer, but since it was only done on application startup, it would still be acceptable. There are two differences here. First, the number of rows coming back was larger since we retrieved all of the valid values. Second, the network communication between the application program on the client and the database server added overhead. For simplicity, let's say the retrieval of data from our code table took 10 milliseconds. With 50 code fields, we'd be looking at 500 milliseconds total, hardly noticeable.

With a three-tier server configuration, the application running on a web server accessed the database on behalf of the browser running on the user's desktop. The web server wanted to be stateless, so usually kept as little as possible around between client invocations. The same query used by the two-tier application could have been used to retrieve the valid values and build drop-down fields into the HTML page that would then be sent to the user. The addition of 500 milliseconds waiting for the database server would have been noticed. Assuming a Java-based application server was implemented and a 10-to-1 performance cost for using Java (which is about the average I’ve seen when compared with COBOL or C implementations), it was probably taking 5 seconds to retrieve and format the code values. If the application switched to Hibernate (or another similar data access mechanism) then the codes could have been cached in memory on the application server, bringing performance back down to acceptable timeframes of 10 milliseconds per code set and 500 milliseconds for all 50 code fields.

With modern Java frameworks, the reference data can be cached on the client machine. The local cache ensures good performance for the user. The application server would continue to maintain its local cache in order to validate changes passed in from the client-side programs.

The key to good performance: cache the reference data in memory on the machine where the application is running.
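As a minimal sketch of that idea, the following Python loads the valid codes once at startup and answers every later validation from local memory. It is an illustration under assumptions, not the original implementation: SQLite's in-memory database stands in for Db2 (so `CURRENT_DATE` replaces Db2's `CURRENT DATE`), `DESC_TXT` is omitted for brevity, and the names `load_state_codes`, `STATE_CODES`, and `is_valid_state` are hypothetical.

```python
import sqlite3
from typing import Dict


def load_state_codes(conn: sqlite3.Connection) -> Dict[str, str]:
    """Run the 'all valid values' query once and return {code: label}."""
    rows = conn.execute(
        "select STATE_CD, LABEL_TXT "
        "from STATE_CD "
        "where CURRENT_DATE between START_DT and END_DT"
    ).fetchall()
    return {code: label for code, label in rows}


# Assumption: an in-memory SQLite database stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table STATE_CD ("
    " STATE_CD char(2) not null primary key,"
    " LABEL_TXT varchar(30) not null,"
    " START_DT date not null,"
    " END_DT date not null)"
)
conn.executemany(
    "insert into STATE_CD values (?, ?, ?, ?)",
    [
        ("AK", "Alaska", "1900-01-01", "9999-12-31"),
        ("AL", "Alabama", "1900-01-01", "9999-12-31"),
        ("XX", "Expired sample code", "1900-01-01", "2000-12-31"),
    ],
)

# One database round trip at startup; the result lives in application memory.
STATE_CODES = load_state_codes(conn)
conn.close()


def is_valid_state(code: str) -> bool:
    """Validate against the local cache; no network hop, no SQL."""
    return code in STATE_CODES


print(is_valid_state("AK"), is_valid_state("XX"))
```

The trade-off is the one described earlier: changes in the database are not visible until the cache is reloaded, which is usually acceptable for data that changes this rarely.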

Moving on to Microservice Architectures

Since we want to keep the function of each component small, a microservice architecture would start by defining a reference data lookup API. The team building this API might expect to support two different calls:

  1. Validate Code Value 'xx' for Code Field 'STATE_CD'
  2. Return List of Code Values for Code Field 'STATE_CD'

That first call is essentially the same as the call we used in CICS to validate code values. The first iteration of development might issue that same query each time the service receives a validation request. That means that each request has two network hops (from the client to the server and from the server to the database), plus the database overhead for executing the SQL statement. The team may consider optimizing their queries and using a caching solution. But this is a microservice, so they would not want to cache the data locally, which would violate the idea of only doing one thing. The solution is to use a product like Redis that provides an in-memory key-value cache. The key-value lookup will likely be somewhat faster than the database query because the caching product doesn't have to do anything complex. If the cache doesn't contain the desired value, the application still retrieves it from the database (and caches it at that point). Now we have two network hops for a previously cached code lookup (client to service, then service to cache), but four hops for a cache miss (client to service, service to cache, service to database, and service to cache again to store the result).
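The per-code caching flow described here can be sketched in Python to make the hop counts concrete. This is a toy model under stated assumptions: a plain dictionary stands in for the remote key-value cache, a callable stands in for the single-row SQL query, and the class name `CodeLookupService` and helper `hops_for` are hypothetical. Each "hop" is simply counted rather than performed.

```python
from typing import Callable, Dict, Optional


class CodeLookupService:
    """Hypothetical per-code cache-aside lookup that counts network hops."""

    def __init__(self, query_db: Callable[[str], Optional[str]]) -> None:
        self._query_db = query_db         # stand-in for the single-row SQL query
        self._cache: Dict[str, str] = {}  # stand-in for a remote key-value cache
        self.hops = 0                     # hops used by the most recent request

    def validate(self, code: str) -> bool:
        self.hops = 1                     # client -> service
        self.hops += 1                    # service -> cache (lookup)
        if code in self._cache:
            return True
        self.hops += 1                    # service -> database (cache miss)
        label = self._query_db(code)
        if label is None:
            return False                  # invalid code: nothing to cache
        self.hops += 1                    # service -> cache (store the result)
        self._cache[code] = label
        return True


STATES = {"AK": "Alaska", "AL": "Alabama", "AR": "Arkansas", "AZ": "Arizona"}
service = CodeLookupService(STATES.get)


def hops_for(code: str) -> int:
    """Validate one code and report how many hops that request used."""
    service.validate(code)
    return service.hops


first_valid = hops_for("AK")   # cache miss for a valid code
second_valid = hops_for("AK")  # cache hit
invalid = hops_for("ZZ")       # never cached, so the database is always queried
print(first_valid, second_valid, invalid)  # prints: 4 2 3
```

Note that an invalid code never reaches the cache in this design, so every repeat of it pays for the database query again.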

Wait a second. There's a problem with that caching solution and a code validation lookup. All invalid codes would result in a cache miss, followed by a database query. Now we're doing 3 network hops for every invalid code that gets passed to the API for validation. Why? Because Redis can evict data from its cache, and a lookup of a single code cannot prove that the code is invalid when the cache may not hold the complete set. How do we fix this? The application team simply creates their cache as a JSON document that contains the valid set of codes after retrieving those codes from the database. Now, the validation query is never used. Instead, the query that retrieves all values from the database is used and that result is cached. Like so:

    select STATE_CD
         , LABEL_TXT
      from STATE_CD
     where CURRENT DATE between START_DT and END_DT;

The results get packaged up in JSON that might look like:

    {
      "STATE_CD": {
        "AK": "Alaska",
        "AL": "Alabama",
        "AR": "Arkansas",
        "AZ": "Arizona",
        ...
      }
    }

Now, when there is a cache hit on the STATE_CD lookup, the program only has to parse the JSON and check the values for validation. Even better, this gives them a single query that provides the data for their second API call too.
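A rough Python sketch of this full-set approach follows. It assumes a dictionary as a stand-in for the key-value cache and a callable as a stand-in for the "all valid values" query; the names `ReferenceDataService`, `validate`, and `list_codes` are hypothetical and only illustrate the two API calls discussed above.

```python
import json
from typing import Callable, Dict


class ReferenceDataService:
    """Hypothetical service that caches one JSON document per code field."""

    def __init__(self, load_all: Callable[[str], Dict[str, str]]) -> None:
        self._load_all = load_all         # stand-in for the full-list SQL query
        self._cache: Dict[str, str] = {}  # code field -> JSON text (cache stand-in)
        self.db_queries = 0               # how often the database was needed

    def _codes(self, field: str) -> Dict[str, str]:
        document = self._cache.get(field)
        if document is None:
            # Cache miss: fetch the complete set once and store it as JSON.
            self.db_queries += 1
            codes = self._load_all(field)
            self._cache[field] = json.dumps({field: codes})
            return codes
        # Cache hit: parse the JSON and use the complete set of codes.
        return json.loads(document)[field]

    def validate(self, field: str, code: str) -> bool:
        """API call 1: is this code valid for the given code field?"""
        return code in self._codes(field)

    def list_codes(self, field: str) -> Dict[str, str]:
        """API call 2: return all valid codes and labels for the field."""
        return dict(self._codes(field))


TABLES = {"STATE_CD": {"AK": "Alaska", "AL": "Alabama", "AR": "Arkansas", "AZ": "Arizona"}}
service = ReferenceDataService(lambda field: TABLES[field])

print(service.validate("STATE_CD", "AK"))  # valid code
print(service.validate("STATE_CD", "ZZ"))  # invalid code, answered from the cache
print(service.db_queries)                  # the database was queried only once
```

Because the cached document is the complete set, a code that is absent from it is provably invalid, so invalid codes no longer fall through to the database.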

But what does the performance look like? We've used 10 milliseconds for the database retrieval of the full list of values for a code. The cache retrieval is expected to be less, but it won't be as fast as the 1 millisecond read we were doing from CICS to Db2. Your mileage will vary a lot, so we'll say this cache lookup takes 5 milliseconds (2 milliseconds for the request, 1 millisecond to look up in cache, 2 milliseconds for the response) plus another 1 millisecond to parse the JSON content, for a total of 6 milliseconds. That's better than the 10 milliseconds for the database lookup and when we multiply it by 50 codes we're at only 300 milliseconds.

Wait a second. That's not quite right. We have 6 milliseconds for the reference validation call to fulfill its backend requests. We also have to look at the 2 milliseconds on the network for our request to the service and 2 milliseconds for its response to come back. Now, we're back at 10 milliseconds. Plus, we have to call the API 50 times, once for each code. That's additional overhead for the program needing to do the code validation. The additional cache product made the API team feel good, but it didn't impact the bottom line for our user. And, we're now looking at 500 milliseconds overhead for every invocation of the application process.
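The arithmetic above can be written out as a short Python calculation. The figures are the article's illustrative round numbers, not measurements, and the variable names are made up for this sketch.

```python
# Illustrative round numbers from the text, in milliseconds.
CODES = 50               # code fields validated per screen
CICS_LOOKUP_MS = 1       # cross-memory lookup from CICS to Db2
NETWORK_ONE_WAY_MS = 2   # one network hop, in one direction
CACHE_LOOKUP_MS = 1      # key-value lookup inside the cache product
JSON_PARSE_MS = 1        # parsing the cached JSON document

# Original design: 50 local lookups.
cics_total_ms = CODES * CICS_LOOKUP_MS

# Backend work inside the reference data service for one call:
# request to the cache, lookup, response from the cache, then JSON parsing.
backend_ms = NETWORK_ONE_WAY_MS + CACHE_LOOKUP_MS + NETWORK_ONE_WAY_MS + JSON_PARSE_MS

# What the calling program sees: its own request and response hops
# wrapped around the service's backend work.
api_call_ms = NETWORK_ONE_WAY_MS + backend_ms + NETWORK_ONE_WAY_MS

microservice_total_ms = CODES * api_call_ms

print(f"CICS to Db2:       {cics_total_ms} ms for {CODES} codes")
print(f"Service backend:   {backend_ms} ms per call")
print(f"API call (caller): {api_call_ms} ms per call")
print(f"Microservice:      {microservice_total_ms} ms for {CODES} codes")
```

With these numbers the caller sees 10 milliseconds per call and 500 milliseconds for 50 codes, ten times the 50 milliseconds of the original design.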


In a microservice architecture, it is considered normal for each application to own its own data and likely store that data in a repository separate from other applications.  This supports the ability of teams to choose the right products for their needs and to implement without needing to coordinate technology with other teams.

Consumers of multiple microservices now become responsible for joining data across subject areas. For example, a purchase order application might have to use separate services for customers, providers, parts, addresses, tax calculations, etc. The more data domains involved in a microservice application, the more network hops involved in bringing the data together and using it.

Compare the two approaches: within a single database, the joins occur in microseconds, whereas the same data is now retrieved from multiple services that each take 5-10 milliseconds (plus the backend database accesses).


From an end-to-end performance perspective, microservices add at least 2 milliseconds for each network hop required. Pulling microservices together into a cohesive application may involve the use of many services. Building components that keep it simple and focused makes sense from a development and support perspective. Deploying those components to different servers may not be the optimal solution.
