Initially, the system relied heavily on a MySQL database to manage and store denormalized contact data for a CRM system. Cached in a table referred to as the "contacts table," this data supported contact information displayed on the CRM's main page. Frequent updates to the table, triggered by any additions, updates, or deletions, placed a significant load on the database. As a result, the system struggled to maintain consistency and scalability as it scaled.
The complexity of the MySQL setup became evident as both the volume of data and the frequency of operations grew. This strain manifested as performance degradation, with deadlocks becoming a recurring issue. Multiple processes attempting concurrent writes to the cache table caused operational bottlenecks, hampering efficiency. Additionally, frequent updates put immense pressure on the CPU, especially within the AWS Aurora cluster, further driving up operational costs.
At a certain point, the limitations of SQL became too pronounced to ignore. While SQL databases are robust and versatile, they are not always the optimal choice for every query or storage requirement—especially when dealing with vast amounts of denormalized data requiring fast, flexible indexing. It was under these circumstances that Elasticsearch (ES) emerged as a potential solution to address the performance and scalability challenges.
The goal of the transition was clear: improve system performance by migrating the denormalized data from the MySQL cached table to Elasticsearch. Renowned for its full-text search capabilities and scalability, Elasticsearch was an attractive option. Its architecture, designed for handling large datasets efficiently, promised to enhance query performance while reducing strain on the existing MySQL infrastructure.
To achieve this, the team planned to set up an Elasticsearch cluster via a cloud-based Elasticsearch service. The migration strategy involved transferring all denormalized data from MySQL to Elasticsearch while maintaining synchronization between the two systems. This required a robust data synchronization process to ensure any changes in the MySQL database would be promptly reflected in the Elasticsearch index. Maintaining data consistency between systems was paramount, particularly given the CRM's reliance on real-time contact data.
Key objectives of the transition included:
Two synchronization options were evaluated for keeping Elasticsearch and MySQL in sync:
Ultimately, the backend-triggered method was chosen, aligning with the system's architecture and allowing for a smoother transition.
In the previous architecture, all contact data was stored and queried directly from MySQL’s cache table, which, while functional at the outset, became increasingly inefficient as the system scaled. The shift to Elasticsearch allowed the team to capitalize on its powerful search and indexing features, which improved query speed and offloaded significant workloads from MySQL. This reduced operational costs and enhanced overall system efficiency.
Old Design
New Design
The migration to Elasticsearch yielded several measurable benefits:
Despite the success of the transition, several challenges were encountered. Adapting to Elasticsearch’s architecture, particularly for team members more familiar with SQL, required an initial learning curve. Concepts such as aggregations, filters, and fuzzy search were foreign, but training and hands-on experience helped bridge the knowledge gap.
The initial data synchronization process from MySQL to Elasticsearch also presented difficulties. Migrating large volumes of data efficiently was no small feat. To address this, the team implemented a Kafka-driven synchronization pipeline, ensuring consistent updates to Elasticsearch. Additionally, an on-demand sync feature was added to handle any potential discrepancies during synchronization.
The shift from MySQL to Elasticsearch delivered substantial improvements in performance, scalability, and cost efficiency. By recognizing SQL’s limitations and making a calculated move to Elasticsearch, the team effectively enhanced the CRM system’s capabilities. This transition not only resolved existing bottlenecks but also set the stage for future growth and scalability.