Persisting and retrieving ~20 million images

Problem statement:

  1. Store 20 million images, where each image is ~1 MB in size
  2. Each image will be associated with metadata
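For scale, the raw image data alone comes to roughly 20 TB. This is a back-of-the-envelope estimate that assumes decimal units (1 MB = 10^6 bytes) and leaves out metadata and filesystem overhead:

```latex
20 \times 10^{6}\ \text{images} \times 10^{6}\ \text{bytes/image} = 2 \times 10^{13}\ \text{bytes} \approx 20\ \text{TB}
```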

Questions:

  1. What is the best storage solution?
  2. What is the best mechanism to retrieve any given image? (Assumption: each image is uniquely identifiable by a key, namely the combination of imaging conditions stored in its metadata.)
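Because the key is a combination of imaging conditions, one way to model it in Java is a record. A record gets value-based `equals()` and `hashCode()` automatically, so it can serve directly as a map key. The field names (`sampleId`, `magnification`, `exposureMs`, `channel`) and the file path below are hypothetical placeholders, because the question does not list the actual conditions. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical composite key. The real imaging conditions are not given
 * in the question, so these four fields are placeholders.
 */
record ImageKey(String sampleId, int magnification, int exposureMs, String channel) {
    // Records generate equals() and hashCode() from all components,
    // so two keys built from the same imaging conditions are equal.
}

class ImageKeyDemo {
    public static void main(String[] args) {
        Map<ImageKey, String> index = new HashMap<>();
        // Hypothetical location of one image on the filesystem.
        index.put(new ImageKey("S001", 40, 200, "DAPI"), "/data/images/S001_40x_200ms_DAPI.png");

        // A freshly constructed key with the same values finds the same entry.
        String path = index.get(new ImageKey("S001", 40, 200, "DAPI"));
        System.out.println(path); // prints /data/images/S001_40x_200ms_DAPI.png
    }
}
```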

My initial idea was to store the 20M images on a filesystem and the metadata for each image in a separate .txt file (one text file per image). For retrieval, I would then build an index over these images: a map whose keys are the metadata values and whose values are the locations of the images on the filesystem.

I tried a Java Map for the index. As a test, I wrote sample code that only stores random strings as keys (with no duplicates) and some location string as each value. That alone took more than 30 seconds. I am using the fork/join framework, expecting that running the tasks in parallel would improve performance, but the problem persists.
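The sidecar-file index described above could be sketched as follows. This is a minimal, single-threaded sketch, not the code from the question. It rests on several hypothetical assumptions: each sidecar `.txt` holds `key=value` lines readable by `java.util.Properties`; the image shares the sidecar's base name with a `.png` extension; and `sampleId`, `magnification` and `channel` are the imaging conditions that form the key. The class name `SidecarIndex` and its methods are illustrative as well.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.stream.Stream;

class SidecarIndex {
    // Hypothetical names of the imaging conditions that together form the unique key.
    static final List<String> KEY_FIELDS = List.of("sampleId", "magnification", "channel");

    /** Joins the key fields of one sidecar into a string like "S1|40|DAPI". */
    static String keyOf(Properties meta) {
        StringBuilder sb = new StringBuilder();
        for (String field : KEY_FIELDS) {
            if (sb.length() > 0) sb.append('|');
            sb.append(meta.getProperty(field, ""));
        }
        return sb.toString();
    }

    /**
     * Walks rootDir, reads every regular *.txt sidecar as Properties, and maps
     * its composite key to the sibling image with the same base name and a
     * (hypothetical) ".png" extension.
     */
    static Map<String, Path> buildIndex(Path rootDir, int expectedEntries) throws IOException {
        // Pre-size for the default load factor of 0.75 so the map does not
        // resize while loading expectedEntries entries.
        Map<String, Path> index = new HashMap<>((int) (expectedEntries / 0.75f) + 1);
        try (Stream<Path> files = Files.walk(rootDir)) {
            Iterator<Path> it = files
                    .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(".txt"))
                    .iterator();
            while (it.hasNext()) {
                Path txt = it.next();
                Properties meta = new Properties();
                try (Reader r = Files.newBufferedReader(txt)) {
                    meta.load(r);
                }
                String base = txt.getFileName().toString().replaceFirst("\\.txt$", "");
                Path image = txt.resolveSibling(base + ".png");
                // The question assumes keys are unique, so report duplicates instead of overwriting.
                if (index.putIfAbsent(keyOf(meta), image) != null) {
                    throw new IllegalStateException("Duplicate key in " + txt);
                }
            }
        }
        return index;
    }
}
```

Passing an initial capacity of `expectedEntries / 0.75 + 1` means the `HashMap` should not have to rehash while loading that many entries under its default load factor. Iterating the stream lazily avoids materializing a list of every sidecar path up front.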