Scaling Discord: How Discord Engineering Tackled Trillions of Messages

Adarsh gupta
3 min read · Apr 19, 2024

The engineering team was god level

Discord has approximately 150 million monthly active users who send millions of messages every hour, and the platform now stores trillions of messages in total.

Data at this scale needs a storage system that stays up without interruption. In this post, we look at how Discord’s engineers handled that volume of data and the solutions they used to overcome the challenges.

Here is the full video version of this blog

From MongoDB to Cassandra: The Evolution of Discord’s Data Handling

Initially, Discord embarked on its journey with MongoDB, a common NoSQL database. However, as the platform expanded, they transitioned to Cassandra, seeking scalability, fault tolerance, and low-maintenance data management.

Nonetheless, as Discord’s user base surged from 10 million in 2017 to 150 million by 2024, their Cassandra cluster struggled to cope with the escalating load.

The Cassandra Conundrum: Challenges and Limitations

By early 2022, Discord operated 177 Cassandra nodes housing trillions of messages. However, maintaining this infrastructure proved arduous, with frequent database issues demanding urgent attention.

The unpredictability of performance problems coupled with escalating costs compelled Discord to reevaluate their database solution.

Discord’s Database Structure

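Discord stores messages in partitions keyed by channel and by a fixed time window called a bucket, so all messages sent in one channel during one window live together. This is efficient for storage and replication, but partition size and traffic vary enormously: a quiet server with a handful of friends produces tiny partitions, while a large, continuously active server produces huge ones that receive far more traffic.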
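To make the channel-plus-bucket layout concrete, here is a minimal Python sketch of how a message could be mapped to its partition. It assumes message IDs are Discord snowflakes, which carry a millisecond timestamp (counted from Discord’s 2015 epoch) in their upper bits, and it assumes a 10-day bucket window. The function names, the helper `snowflake_from_ms`, and the window size are illustrative assumptions, not details taken from this post.

```python
# Milliseconds in one bucket. The 10-day window is an assumption for this sketch.
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10


def make_bucket(snowflake: int) -> int:
    """Return the index of the time bucket a snowflake ID falls into."""
    # The upper bits of a snowflake hold milliseconds since Discord's epoch.
    ms_since_epoch = snowflake >> 22
    return ms_since_epoch // BUCKET_SIZE_MS


def partition_key(channel_id: int, message_id: int) -> tuple:
    """All messages sharing this (channel, bucket) pair live in one partition."""
    return (channel_id, make_bucket(message_id))


def snowflake_from_ms(ms_since_epoch: int) -> int:
    """Hypothetical helper: build a bare snowflake from a timestamp, for demos."""
    return ms_since_epoch << 22


if __name__ == "__main__":
    early = snowflake_from_ms(5)
    later = snowflake_from_ms(BUCKET_SIZE_MS * 3 + 5)
    print(partition_key(42, early))  # same channel, first window
    print(partition_key(42, later))  # same channel, a later window
```

Under this scheme a busy channel sends every read and write for the current window to the same partition, which is how a single popular channel can turn into a hot spot.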

Hot Partitions and Performance Delays

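Cassandra handles writes cheaply: a write is appended to a commit log and an in-memory table. Reads cost more, because a read may have to consult that in-memory table and several on-disk files. When many users read the same channel at once, that cost concentrates on a single partition.

The result was hot partitions: a few overloaded partitions raised latency on the nodes that held them and dragged down overall database performance, and the cluster needed regular maintenance, such as compaction, to keep reads fast.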

The ScyllaDB Solution: A New Hope for Discord

Faced with Cassandra’s limitations, Discord turned to ScyllaDB, attracted by its promises of enhanced speed and reliability.

Unlike Cassandra, which runs on the JVM, ScyllaDB is written in C++ and has no garbage collector. That removes the garbage-collection pauses that caused latency spikes on Cassandra and gives a smoother messaging experience.

The Migration Saga

Migrating trillions of messages to ScyllaDB was a monumental task, requiring meticulous planning and execution.

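Discord also wrote intermediary data services in Rust that sit between the API and the database. These services improved database performance and mitigated hot partitions, for example by coalescing concurrent requests for the same data into a single database query.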
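One way an intermediary service can take pressure off a hot partition is request coalescing: if many callers ask for the same data at the same moment, only one query goes to the database and everyone shares the result. Discord’s services were written in Rust; the sketch below uses Python’s asyncio purely to illustrate the idea, and the `Coalescer` class, the `slow_db_read` function, and the key format are hypothetical names invented for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple


class Coalescer:
    """Share one in-flight fetch among all concurrent callers of the same key."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch
        self._in_flight: Dict[str, asyncio.Task] = {}

    async def get(self, key: str) -> str:
        task = self._in_flight.get(key)
        if task is None:
            # First caller for this key: start the real fetch and register it.
            task = asyncio.ensure_future(self._run(key))
            self._in_flight[key] = task
        # Later callers wait on the same task. shield() keeps one cancelled
        # caller from cancelling the fetch the other callers depend on.
        return await asyncio.shield(task)

    async def _run(self, key: str) -> str:
        try:
            return await self._fetch(key)
        finally:
            # Once the fetch finishes, the next request triggers a fresh one.
            self._in_flight.pop(key, None)


async def demo() -> Tuple[int, List[str]]:
    calls = 0

    async def slow_db_read(key: str) -> str:
        # Stand-in for a real database query.
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)
        return f"row-for-{key}"

    coalescer = Coalescer(slow_db_read)
    results = await asyncio.gather(
        *(coalescer.get("channel-1") for _ in range(100))
    )
    return calls, list(results)


if __name__ == "__main__":
    db_calls, rows = asyncio.run(demo())
    print(db_calls, len(rows))
```

In the demo, one hundred simultaneous requests for the same key result in a single call to the stand-in database function. This sketch does no caching: once a fetch completes, the next request goes to the database again.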

The Big Migration: Bridging Cassandra to ScyllaDB

Discord’s migration strategy involved booting a ScyllaDB cluster, migrating newer data first, and gradually transitioning older data.

Despite the complexity, Discord’s Rust-based migration tool facilitated a swift transition, completing the process in just nine days.
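A migration of this size has to be resumable, so a crash does not mean starting over. The sketch below shows one common approach: split the key space into ranges, copy one range at a time, and record each finished range in a checkpoint table. It is not Discord’s tool (which was written in Rust). The dict-based source and target stores, the SQLite checkpoint table, and all function names are assumptions made for illustration.

```python
import sqlite3
from typing import Dict, List, Tuple


def split_ranges(lo: int, hi: int, step: int) -> List[Tuple[int, int]]:
    """Split [lo, hi) into consecutive half-open ranges of at most `step` keys."""
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def migrate(
    source: Dict[int, str],
    target: Dict[int, str],
    ranges: List[Tuple[int, int]],
    checkpoints: sqlite3.Connection,
) -> int:
    """Copy rows range by range, skipping ranges already marked as done."""
    checkpoints.execute(
        "CREATE TABLE IF NOT EXISTS done "
        "(lo INTEGER, hi INTEGER, PRIMARY KEY (lo, hi))"
    )
    copied = 0
    for lo, hi in ranges:
        already_done = checkpoints.execute(
            "SELECT 1 FROM done WHERE lo = ? AND hi = ?", (lo, hi)
        ).fetchone()
        if already_done:
            continue  # finished in an earlier run
        # A real tool would issue a range query; here we scan the dict.
        for token, payload in source.items():
            if lo <= token < hi:
                target[token] = payload
                copied += 1
        # Record the range only after every row in it has been written.
        checkpoints.execute("INSERT INTO done VALUES (?, ?)", (lo, hi))
        checkpoints.commit()
    return copied


if __name__ == "__main__":
    old_store = {i: f"message-{i}" for i in range(100)}
    new_store: Dict[int, str] = {}
    conn = sqlite3.connect(":memory:")
    ranges = split_ranges(0, 100, 25)
    print(migrate(old_store, new_store, ranges, conn))  # first run copies rows
    print(migrate(old_store, new_store, ranges, conn))  # rerun skips done ranges
```

Because a range is marked done only after its rows are written, an interrupted run repeats at most one range, and rewriting the same rows is harmless.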

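Following the migration, Discord observed significant improvements in system stability and performance. According to Discord’s engineering blog, the cluster shrank from 177 Cassandra nodes to 72 ScyllaDB nodes, p99 latency for fetching historical messages fell from 40-125 ms to about 15 ms, and p99 message insertion time went from 5-70 ms to a steady 5 ms. That headroom paved the way for new product features and a better user experience.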

Discord’s journey from MongoDB to Cassandra and finally to ScyllaDB exemplifies their commitment to providing a seamless user experience amidst exponential growth. By embracing innovative solutions and leveraging scalable technologies, Discord has cemented its position as a leading platform for communication and community building.


Thanks and keep learning

