Google has disclosed that it still relies on hard disk drives (HDDs) for most of its storage, while achieving significant performance gains through an in-house automated data tiering system. In a recent announcement, the company detailed its “Colossus” universal storage platform, which supports services such as YouTube, Gmail, and Google Cloud Storage.
The Colossus platform features large filesystems, some exceeding 10 exabytes of storage, capable of read throughput above 50 TB/s and write throughput of 25 TB/s. According to the announcement, the busiest cluster frequently handles more than 600 million input/output operations per second (IOPS), reads and writes combined.
Google noted in 2021 that it uses a combination of flash and disk storage, keeping frequently accessed data on SSDs to improve efficiency and reduce latency. Even so, striking the right balance between fast SSDs and cost-effective HDDs remains a challenge.
The automated caching system, known as “L4,” plays a critical role in determining which data is best suited for SSD storage. It creates an index that helps identify whether data is available in cache or on HDD, thus optimizing data access speeds.
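The index-based lookup described above can be illustrated with a minimal sketch. It is not Google's implementation: the `TieredStore` class, its method names, and the least-recently-used eviction rule are all assumptions made for illustration, since the announcement does not describe how L4 chooses what to keep on SSD.

```python
from __future__ import annotations

from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small SSD cache in front of an HDD tier.

    Hypothetical illustration only; names and the LRU policy are assumptions.
    """

    def __init__(self, ssd_capacity: int) -> None:
        self.ssd_capacity = ssd_capacity
        # Every block lives on the HDD tier.
        self.hdd: dict[str, bytes] = {}
        # The cache doubles as the index: a key present here is on SSD.
        # Entries are ordered from least to most recently used.
        self.ssd: OrderedDict[str, bytes] = OrderedDict()

    def write(self, key: str, data: bytes) -> None:
        # Writes land on HDD; any cached copy is now stale, so drop it.
        self.hdd[key] = data
        self.ssd.pop(key, None)

    def read(self, key: str) -> tuple[bytes, str]:
        # Index lookup: serve from SSD when the block is cached.
        if key in self.ssd:
            self.ssd.move_to_end(key)  # mark as most recently used
            return self.ssd[key], "ssd"
        # Cache miss: fall back to HDD (raises KeyError for unknown keys).
        data = self.hdd[key]
        self._admit(key, data)
        return data, "hdd"

    def _admit(self, key: str, data: bytes) -> None:
        # Copy the block to SSD and evict the least recently used entry
        # if the cache is over capacity.
        self.ssd[key] = data
        if len(self.ssd) > self.ssd_capacity:
            self.ssd.popitem(last=False)


store = TieredStore(ssd_capacity=2)
store.write("video-chunk-1", b"frame data")
first = store.read("video-chunk-1")   # served from HDD, then cached
second = store.read("video-chunk-1")  # served from the SSD cache
print(first[1], second[1])            # prints "hdd ssd"
```

In this sketch the first read of a block pays the HDD cost and later reads are served from SSD until the block is evicted, which mirrors the general idea of keeping frequently accessed data on the faster tier.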
Although L4 has improved IOPS and throughput for frequently accessed data, some types of data, such as files that are written and then deleted shortly afterward, are poorly suited to HDD storage, prompting Google to consider placing such data directly on SSD.
As Google continues to search for the optimal combination of HDD and SSD storage, the company will share more about its storage systems at the Google Cloud Next conference in April. Storage tech lead Larry Greenfield and storage software engineer Seth Pollen recommend that attendees join sessions on new features and on optimizing storage infrastructure.