Rethinking How Data Is Stored and Processed Brings Scale and Speed to Modern Data-Intensive Applications


Key-value databases are at the forefront of many modern data-intensive business applications – and are widely adopted across industry verticals including e-commerce, online gaming, content delivery networks, social networking, and messaging services. IT spending for these revenue-generating applications is often tied to a percentage of the organization's top-line revenue, which underscores how much importance is placed on designing such applications for superior performance and an exceptional customer experience.

Key-value databases provide a simple, highly flexible interface for building caching layers, distributed storage, file systems, and database systems. Enterprises need to select key-value databases (and their architecture setup) that are appropriate for their business applications. The choice of server, storage, and storage engine should let applications focus on solving business problems without worrying about whether the application can scale and perform to meet peak demand.
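To make that interface concrete, below is a minimal sketch in Python of the get/put/delete surface most key-value stores expose in some form. The class and method names here are illustrative assumptions, not any particular product's API.

```python
from typing import Optional


class KeyValueStore:
    """Illustrative key-value interface; real systems add TTLs,
    batching, iterators, transactions, and durability options."""

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put(b"user:42:cart", b'{"items": 3}')
print(store.get(b"user:42:cart"))  # b'{"items": 3}'
```

The same three byte-oriented operations underpin caches, session stores, and distributed storage layers alike; products differ mainly in how they handle durability, distribution, and performance.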

While businesses rely heavily on key-value databases, they continue to encounter challenges. These challenges typically fall into two categories:

  1. System design and storage 
  2. Employing multiple key-value databases, resulting in duplicated development efforts

System Design & Storage Challenges

  • High Memory Usage: Some key-value databases keep entire datasets in memory, which makes data processing memory-intensive. When the application’s working set of indexes and most frequently accessed data exceeds available memory, disk performance quickly becomes the limiting factor for throughput. At the same time, it can be challenging to ensure that performance scales with user demand without outpacing cost projections.
  • Cost Problems at Scale: For persistent key-value databases with large-scale deployments, the cost of storage can become significant. Efficiently optimizing storage without compromising performance becomes a concern, especially in cloud environments where costs are directly tied to storage consumption.
  • Data Compression: Key-value databases might need to employ compression techniques to reduce storage overhead. However, the choice of compression algorithm can cause CPU spikes and performance degradation, especially when frequent decompression is required (the sketch after this list illustrates the trade-off).
  • Data Fragmentation: As data is modified, updated, or deleted, storage becomes fragmented over time. This leads to space amplification through inefficient disk space usage, and to increased I/O operations during reads, resulting in read amplification.
  • Scalability Issues: Even though key-value stores are designed for high scalability, challenges can arise with increased user concurrency and when scaling out, such as managing data consistency across distributed systems or handling partitioning and replication.
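To illustrate the compression trade-off flagged above, the following sketch uses Python's standard zlib library to compare compression levels on a synthetic payload. The payload and the chosen levels are assumptions for demonstration; production stores typically use algorithms such as Snappy, LZ4, or Zstandard instead.

```python
import time
import zlib

# Synthetic, repetitive payload standing in for stored values.
payload = b"user_event:click:page=/product/123;" * 2000

for level in (1, 6, 9):  # fastest, default, and maximum compression
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    elapsed_ms = (time.perf_counter() - start) * 1000
    ratio = len(payload) / len(compressed)
    print(f"level={level}: ratio={ratio:.1f}x, compress={elapsed_ms:.2f} ms")

# Read-heavy workloads pay the decompression cost on every get.
start = time.perf_counter()
zlib.decompress(compressed)
print(f"decompress={(time.perf_counter() - start) * 1000:.2f} ms")
```

Higher levels shrink storage further but burn more CPU per operation – exactly the tension a hardware offload aims to remove.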

Employing Multiple Key-Value Stores

Organizations often run many separate key-value systems owned by different teams, each with different API feature sets, optimization parameters, and techniques. This results in duplicated development efforts, high operational overhead and incident counts, and confusion among the engineering teams’ customers. For example, storefront applications store user logins, clickthroughs, and preferences in a key-value store. Gaming and learning applications use key-value stores for leaderboards. Real-time recommendation and ad-tech applications use KV stores. Caching applications use KV stores to hold data that changes infrequently, with lifetimes ranging from seconds to a few days (a minimal sketch of such a cache follows).
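As a concrete illustration of that caching pattern, here is a minimal TTL (time-to-live) cache sketch in Python. The class and key names are hypothetical, and real deployments would typically use a networked store such as Redis or Memcached rather than an in-process dictionary.

```python
import time
from typing import Optional


class TTLCache:
    """Minimal in-process cache whose entries expire after a TTL."""

    def __init__(self) -> None:
        self._data: dict[str, tuple[float, bytes]] = {}

    def put(self, key: str, value: bytes, ttl_seconds: float) -> None:
        self._data[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() > expires_at:
            del self._data[key]  # lazy eviction on read
            return None
        return value


cache = TTLCache()
cache.put("leaderboard:top10", b"[...]", ttl_seconds=30)       # refreshed often
cache.put("catalog:categories", b"[...]", ttl_seconds=86_400)  # once a day
```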

While SQL provides a standard query language for relational databases, no universally accepted equivalent exists for key-value stores. This lack of standardization can make switching between key-value databases a significant challenge.

The answer? Special-purpose hardware accelerators for key-value databases. 

All of this has combined to create an immediate need for a new generation of hardware-accelerated data processing and storage management technology. From GPUs and TPUs built for the most demanding AI and GenAI models to DPUs that enable a workload-optimized approach, dedicated processors for data-intensive tasks are not a new concept. What is needed now are dedicated key-value accelerators that combine hardware and software components to address ever-growing performance, scalability, and data growth demands. The ultimate objective is to accelerate application performance, accommodate data growth with built-in compression, optimize data flow to increase SSD endurance and prolong usable life, and bring down the cost of scaling performance and capacity. Such solutions should enable key-value database applications to exploit the full potential of modern SSD storage performance, and they should address data growth challenges with efficient compression algorithms that offload compression work from the CPU.

The architecture should facilitate scaling to several hundred terabytes up to petabytes of storage, supporting the management of tens of billions to trillions of key-value pairs. These accelerators should employ an open, standards-based approach – take RocksDB, for example. RocksDB serves as the foundation for numerous applications, such as MyRocks, Kafka Streams, Spark Structured Streaming, TiKV, ArangoDB, and the Redis-compatible KVRocks, among many others. Enterprises should be able to migrate to and from the platform easily, with no vendor lock-in. The needed architecture would embrace adaptability for future innovations. Ultimately, key-value accelerator systems should empower enterprises to focus on what matters most – application and business growth – and free them from concerns about storage management and performance.
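Because RocksDB exposes the same byte-oriented put/get surface sketched earlier, application code written against it stays portable across RocksDB-based stores. Assuming the third-party python-rocksdb bindings are installed (the database path here is a placeholder), a minimal usage sketch might look like this:

```python
import rocksdb  # third-party bindings; installed separately

# Open (or create) a local RocksDB instance.
opts = rocksdb.Options(create_if_missing=True)
db = rocksdb.DB("app_state.db", opts)

# The familiar byte-oriented key-value surface.
db.put(b"session:42", b'{"user": "alice"}')
print(db.get(b"session:42"))  # b'{"user": "alice"}'
```

Standardizing on one such widely used engine is what allows migration to and from an accelerated platform without rewriting application code.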

About the Author

Prasad Venkatachar is Sr. Director – Products & Solutions at Pliops. Prasad is an experienced IT professional with 20 years of combined experience in product strategy and management, marketing, solution architecture, and IT services. Over those 20 years he has launched multiple industry-leading database, data warehouse, data lake, and AI/ML products and solutions, collaborating with Microsoft, IBM, Oracle, Google, MongoDB, Cloudera, and ISV partners. He has served Fortune 500 enterprise customers as an SME, delivering business-value outcomes through technical and financial benefits for data center and cloud deployments. He has also served on the Microsoft Data and AI Partner Advisory Council and as a member of Lenovo Technology Innovation.
