Residue Cache: a Low-Energy Low-Area L2 Cache Architecture via Compression and Partial Hits (Summary)
A “residue cache” is a low-energy, low-area L2 cache architecture based on horizontal cache partitioning. In this scheme, the L2 cache line size is halved from 64B to 32B. Alongside the L2 cache sits the residue cache, which stores “overflow” data that does not fit in the halved lines. Together, the two caches yield a 53% area reduction and 40% lower dynamic energy consumption. L2 caches are increasingly being adopted in embedded systems, and the authors propose this architecture to reduce energy consumption while maintaining good performance.
Because small memory values in programs have upper bytes consisting of all ones or all zeros, the authors are able to compress these values using hardware shifters. Information about how the data was compressed is stored in an “encoding cache” and is used when the data is later decompressed. The compressed values are stored in the L2 cache.
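The compression idea above can be illustrated with a minimal sketch. It assumes 32-bit signed words and a per-word byte count as the encoding; the word size, the encoding format, and the names min_bytes, compress_line, and decompress_line are all hypothetical and are not taken from the paper, whose hardware uses shifters rather than software loops.

```python
WORD_BYTES = 4  # assumed word granularity; the paper's may differ

def min_bytes(word: int) -> int:
    """Smallest byte count from which a signed 32-bit word can be sign-extended."""
    for n in (1, 2, 3):
        lo, hi = -(1 << (8 * n - 1)), (1 << (8 * n - 1)) - 1
        if lo <= word <= hi:
            return n
    return WORD_BYTES

def compress_line(words):
    """Return (encoding, payload): per-word byte counts and the packed low bytes."""
    encoding, payload = [], bytearray()
    for w in words:
        n = min_bytes(w)
        encoding.append(n)  # what a hypothetical encoding cache would hold
        # Drop upper bytes that are pure sign extension (all zeros or all ones).
        payload += w.to_bytes(WORD_BYTES, "little", signed=True)[:n]
    return encoding, bytes(payload)

def decompress_line(encoding, payload):
    """Rebuild the original words by sign-extending each stored chunk."""
    words, pos = [], 0
    for n in encoding:
        words.append(int.from_bytes(payload[pos:pos + n], "little", signed=True))
        pos += n
    return words

# A 64B line of sixteen mostly small words.
line = [0, 1, -1, 127, -128, 128, -129, 70000] + [5] * 8
encoding, payload = compress_line(line)
```

In this example the compressed payload is short enough to fit in a 32B line, and the encoding list plays the role of the information needed for decompression.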
The residue cache is needed because not all data compresses well enough to fit in 32B L2 cache lines. When the compressed data exceeds 32B, the excess is stored in the residue cache. However, since the residue cache is at most 1/4 the size of the L2 cache, not all excess data may fit in it. This creates “partial lines”, which are L2 lines whose excess data is missing. Accessing a partial line may result in a “partial hit”, because not all of the requested data is found in the caches. The missing data then needs to be fetched from memory to build the full L2 cache line.
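As a rough illustration of how partial lines and partial hits can arise, the toy model below splits each compressed line at 32B and keeps the overflow in a small residue store. The class name ResidueCacheModel, the dictionary-based storage, and the oldest-first residue eviction are assumptions made for this sketch; the summary does not describe the paper's actual replacement policy or lookup hardware.

```python
L2_LINE_BYTES = 32  # halved L2 line size from the summary

def split_line(compressed: bytes):
    """Split a compressed line into the L2-resident part and the overflow."""
    return compressed[:L2_LINE_BYTES], compressed[L2_LINE_BYTES:]

class ResidueCacheModel:
    """Toy model: dicts stand in for tag arrays (hypothetical, not the paper's design)."""

    def __init__(self, residue_entries: int):
        self.l2 = {}       # address -> (head bytes, needs_residue flag)
        self.residue = {}  # address -> overflow bytes, in insertion order
        self.residue_entries = residue_entries

    def fill(self, addr: int, compressed: bytes) -> None:
        head, tail = split_line(compressed)
        self.residue.pop(addr, None)  # drop any stale overflow for this line
        self.l2[addr] = (head, bool(tail))
        if tail:
            if len(self.residue) >= self.residue_entries:
                # Assumed policy: evict the oldest residue entry.
                # Its L2 line stays resident and becomes a partial line.
                self.residue.pop(next(iter(self.residue)))
            self.residue[addr] = tail

    def lookup(self, addr: int) -> str:
        if addr not in self.l2:
            return "miss"
        _, needs_residue = self.l2[addr]
        if needs_residue and addr not in self.residue:
            return "partial"  # rest of the line must be fetched from memory
        return "hit"

model = ResidueCacheModel(residue_entries=1)
model.fill(0x00, bytes(20))  # fits entirely in a 32B L2 line
model.fill(0x40, bytes(48))  # 16B of overflow goes to the residue store
model.fill(0x80, bytes(40))  # displaces the overflow belonging to 0x40
```

After these fills, the line at 0x40 is still present in the L2 but its overflow is gone, so a lookup reports a partial hit rather than a full hit or a miss.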
Compared to a conventional L2 cache architecture, the residue cache architecture decreases dynamic energy consumption by 40%, leakage power by 43%, and area by 53%, with at most 0.5% performance degradation. The authors evaluated their architecture using the SimpleScalar toolset, with SPEC2000 benchmarks running on a 2-way superscalar processor and SPEC2006 benchmarks running on a 4-way superscalar processor. They conclude that the residue cache architecture is general enough to be adopted by high-performance as well as embedded systems.