Java caching design for 100M+ keys?

I need to cache over 100 million string keys (~100 characters each) for a standalone Java application.

Required cache properties:

  • Persistent.
  • Fetching keys from the cache should take on the order of tens of milliseconds.
  • Allows invalidation and expiry.
  • An independent caching server, to allow multi-threaded access.

Preferably, I don't want to use an enterprise database, as the 100M keys could scale to 500M, which would consume a lot of memory and system resources with sluggish throughput.

Try Guava Cache. It meets all of your requirements.
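A minimal sketch of the in-process Cache API, showing expiry and invalidation; the size cap, TTL, and key are illustrative, not tuned for 100M keys, and persistence or a standalone server are not covered by Guava itself:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

public class GuavaCacheSketch {
    public static void main(String[] args) {
        // In-memory cache with size-based eviction and a write TTL; both values are illustrative.
        Cache<String, String> cache = CacheBuilder.newBuilder()
                .maximumSize(1_000_000)
                .expireAfterWrite(30, TimeUnit.MINUTES)
                .build();

        cache.put("some-100-char-key", "value");

        // getIfPresent returns null if the key is absent, expired, or evicted.
        String value = cache.getIfPresent("some-100-char-key");
        System.out.println(value);

        // Explicit invalidation of a single entry.
        cache.invalidate("some-100-char-key");
    }
}
```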

Links:

  • Guava Cache Explained
  • guava-cache
  • Persistence: Guava cache

Edit: another option, though I have not used it myself: Ehcache.

For a distributed cache, you can try Hazelcast.

It can be scaled as needed and has backups and synchronization out of the box. It is also a JSR-107 provider and offers many other helpful tools. However, if you want persistence, you will need to handle it yourself or buy their enterprise version.
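A minimal embedded-member sketch, assuming a recent Hazelcast version; the map name, key, value, and TTL are illustrative:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.map.IMap;
import java.util.concurrent.TimeUnit;

public class HazelcastCacheSketch {
    public static void main(String[] args) {
        // Start (or join) a Hazelcast cluster member embedded in this JVM.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A distributed map, partitioned across the cluster with backups on other members.
        IMap<String, String> cache = hz.getMap("keyCache");

        // put with a per-entry TTL gives expiry; invalidation is an explicit evict/remove.
        cache.put("some-100-char-key", "value", 30, TimeUnit.MINUTES);
        System.out.println(cache.get("some-100-char-key"));

        cache.evict("some-100-char-key");
        hz.shutdown();
    }
}
```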

Finally, here is how I resolved this big-data problem with the existing cache solutions available:

  • I broke the cache into two levels.
  • I grouped ~100K keys into one Java collection and associated them with a common property; in my case the keys carried a timestamp, so that timestamp slot became the key for each second-level cache block of ~100K entries.
  • Each time-slot key is stored in a persistent Java cache with the compressed Java collection as its value.
  • The reason I managed to get good throughput with two-level caching, despite the compression and decompression overhead, is that my key lookups were range-bound: once a cache block was matched, most subsequent lookups were served by the in-memory Java collection from the previous lookup. A sketch of the scheme follows this list.
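A minimal, hypothetical sketch of the two-level scheme under my own assumptions: the class name, the one-hour slot width, GZIP-compressed Java serialization, and the in-memory map standing in for the persistent level are all illustrative, not the original implementation.

```java
import java.io.*;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class TimeSlotCache {
    // Level 2: "persistent" store keyed by time slot, value = compressed block of ~100K entries.
    // A real persistent cache would be plugged in here instead of a HashMap.
    private final Map<Long, byte[]> persistentStore = new HashMap<>();

    // Level 1: the decompressed block from the most recent lookup, kept in memory because
    // lookups are range-bound and tend to hit the same slot repeatedly.
    private Long currentSlot;
    private Map<String, String> currentBlock = Map.of();

    // Group keys into one-hour slots (illustrative slot width).
    private static long slotOf(long timestampMillis) {
        return timestampMillis / 3_600_000L;
    }

    public String get(String key, long timestampMillis) throws IOException, ClassNotFoundException {
        long slot = slotOf(timestampMillis);
        if (currentSlot == null || currentSlot != slot) {    // level-1 miss
            byte[] compressed = persistentStore.get(slot);   // level-2 lookup
            currentBlock = compressed == null ? Map.of() : decompress(compressed);
            currentSlot = slot;
        }
        return currentBlock.get(key);                        // level-1 hit path
    }

    public void putBlock(long slot, Map<String, String> block) throws IOException {
        persistentStore.put(slot, compress(block));
    }

    private static byte[] compress(Map<String, String> block) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(new HashMap<>(block));
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    private static Map<String, String> decompress(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (Map<String, String>) in.readObject();
        }
    }
}
```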

To conclude: identify a common attribute in your keys that lets you group them and break them into a multi-level cache; otherwise you will need hefty hardware and an enterprise cache to support this big-data problem.