Michael, muttering

Exploring Storage Architectures

I started contributing to the Thanos project a while back, and one of my first tasks was to learn about the different ways we could store files. I knew about file storage, S3, and a couple of other object storage implementations, but I never really thought about how they relate to each other, or why I'd use one over another.

Learning about other architectures turned out to be a fun rabbit hole with a whole lot of new things to learn - depending on how deep you want to go of course. Here are some of the things I learnt.

File Storage

Almost all computer users are familiar with file storage. It stores files in folders, which are in turn stored in other folders in some sort of hierarchy. File storage shines when you have data that can be easily organized. It especially makes sense when you have a mix of structured and unstructured data, e.g. on a web host that is both serving web pages and storing some amount of user-generated media. Data can be easily shared: users/servers just need access to the same hard drive or network-attached storage.

At a larger scale, the appeal of file storage shrinks. This is because navigating the hierarchy of directories and sub-directories becomes harder as the number of files grows. Also, drives eventually need to be replaced with larger and faster ones, both as storage runs out and to keep I/O latency down.

Object Storage

Object storage stores data as “objects”, where an object is a combination of the data itself and accompanying metadata set by the developer/administrator. It is typically used to store large, unstructured data and static files that don't change frequently (think compacted logs, video files, images, etc.) in a scalable way.

Objects are identified by unique keys. Depending on the system, the key is either chosen by the client (as in S3) or derived from the object's data/metadata (as in content-addressed stores). The metadata here is more descriptive than that of file storage, as it can be customized to add more context beyond just the filename and creation timestamp. While object storage isn’t as performant as other storage types (like file and block storage), it works fine, especially at scales where those begin to struggle.
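To make the "data plus metadata under a key" idea a bit more concrete, here is a minimal in-memory sketch. Everything in it (ToyObjectStore, StoredObject, the key names, the metadata fields) is made up for illustration and is not any real client library. It also assumes keys are supplied by the caller, which is only one of the ways a store might assign them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StoredObject:
    data: bytes
    # Free-form, user-defined metadata travels with the object.
    metadata: Dict[str, str] = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory stand-in for a bucket; not a real client."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}

    def put(self, key: str, data: bytes, **metadata: str) -> None:
        # Whole-object write: anything already stored under this key is replaced.
        self._objects[key] = StoredObject(data, dict(metadata))

    def get(self, key: str) -> StoredObject:
        return self._objects[key]

    def list_keys(self, prefix: str = "") -> List[str]:
        # The namespace is flat; "folders" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2020/app.log.gz", b"compressed-bytes",
          content_type="application/gzip", team="infra")
store.put("media/cat.png", b"\x89PNG", content_type="image/png")
```

Note that there is no real directory tree here: listing by the prefix "logs/" only looks like opening a folder.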

In distributed systems, you'd often expect eventual consistency from object storage. This happens because many implementations favour “availability” and “partition tolerance” over “consistency” (in CAP theorem terms). This isn't always the case though. For instance, Google provides strong consistency for Google Cloud Storage except when granting or revoking access to resources. Amazon’s S3, on the other hand, historically guaranteed strong consistency only when creating new objects (overwrites and deletes were eventually consistent), although since December 2020 it has provided strong read-after-write consistency for object reads, writes, and listings.
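The difference between the two consistency models is easier to see in a toy model. The sketch below is purely illustrative: ToyReplicatedStore and its method names are invented here, and real systems replicate asynchronously over a network rather than through an explicit replicate() call.

```python
class ToyReplicatedStore:
    """Hypothetical two-copy store used only to illustrate stale reads."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = []  # writes accepted but not yet applied to the replica

    def put(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_strong(self, key):
        # Strongly consistent read: always served from the up-to-date copy.
        return self.primary.get(key)

    def read_eventual(self, key):
        # Eventually consistent read: served from a replica that may lag.
        return self.replica.get(key)

    def replicate(self):
        # Stand-in for background replication catching up.
        for key, value in self._pending:
            self.replica[key] = value
        self._pending.clear()


s = ToyReplicatedStore()
s.put("report.pdf", "v1")
stale = s.read_eventual("report.pdf")   # replica has not caught up yet
s.replicate()
fresh = s.read_eventual("report.pdf")   # now matches the primary
```

Between put() and replicate(), the eventual read misses the write while the strong read sees it; after replication both agree.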

Some examples of object storage implementations include Amazon’s S3 (its API is almost like a standard at this point), OpenStack Swift, Google Cloud Storage, Ceph (using the RADOS Gateway, which is compatible with both S3 and OpenStack Swift), etc.

Key-Value (KV) Stores

This was kind of tricky, as I couldn’t really tell how it differs from regular object storage at first. It's similar to object storage since they both use keys/unique identifiers to pick data out of a sea of data. Also, both keys and data (or values) can be arbitrary strings of arbitrary sizes. Unlike object storage though, KV stores don't usually store extra metadata alongside their values. Also, by design, KV stores expect values to be small relative to the data in object stores. It also makes sense to expect stronger consistency from KV stores, though the guarantees vary. For instance, Redis offers the “WAIT” command, which blocks until a write has been acknowledged by a given number of replicas (this reduces the chance of losing writes but does not by itself make Redis strongly consistent), and DynamoDB lets you pick between strong and eventual consistency for reads but defaults to the latter.
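A rough way to picture the contrast with object storage: a KV store maps a key straight to a small value, with no metadata envelope around it. The sketch below is a made-up toy (ToyKVStore and its 1 KiB limit are arbitrary choices for this example, not the limits of any real product).

```python
class ToyKVStore:
    """Hypothetical in-memory KV store; values are bare bytes, no metadata."""

    MAX_VALUE_BYTES = 1024  # arbitrary cap chosen for this sketch

    def __init__(self):
        self._data = {}

    def set(self, key: bytes, value: bytes) -> None:
        # KV stores are designed around small values, so reject large ones.
        if len(value) > self.MAX_VALUE_BYTES:
            raise ValueError("value too large for this toy KV store")
        self._data[key] = value

    def get(self, key: bytes, default=None):
        return self._data.get(key, default)


kv = ToyKVStore()
kv.set(b"session:42", b"user=michael")
```

A multi-megabyte video would be rejected here, which is the kind of payload you would hand to an object store instead.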

Block Storage

This operates close to the hardware level and is at a much lower level of abstraction than the others. Being lower level means it has lower I/O latency, and it is typically the base for building other storage types. You could even layer the other storage types on each other! For example, file storage on top of object storage.

The efficiency makes it perfect for workloads like databases and boot volumes. It stores data files on storage area networks by breaking them into blocks, each with its own Logical Block Number (LBN); the size of the blocks depends on the filesystem. During reads, the system reassembles the file from its blocks based on their LBNs and presents it.
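The split-and-reassemble idea can be sketched in a few lines. This is a simplification under stated assumptions: the 4096-byte block size is just a common filesystem default picked for the example, the function names are invented, and a real block device would track free blocks and pad the final block rather than storing a short one.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> dict:
    # Map each Logical Block Number to its slice of the file.
    return {
        lbn: data[offset:offset + block_size]
        for lbn, offset in enumerate(range(0, len(data), block_size))
    }


def assemble(blocks: dict) -> bytes:
    # Reads put the blocks back together in LBN order.
    return b"".join(blocks[lbn] for lbn in sorted(blocks))


payload = bytes(range(256)) * 40          # 10240 bytes of sample data
blocks = split_into_blocks(payload)       # LBNs 0, 1 and 2
restored = assemble(blocks)
```

With these numbers the file occupies two full blocks and one partial block, and assembling them by LBN gives back the original bytes.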


While this barely scratches the surface of each of these data storage architectures, it was pretty interesting having to think about what those things are and where one might want to use them. I’m also hoping to learn some more about object storage in particular in the coming weeks, as my current feature for the Thanos project will use it quite heavily.