In-memory big data management

Issue 1, 2014


Enable real-time analysis of large data sets with Terracotta BigMemory and Intel®

By Theo Hildyard, Product Marketing, IBO and Big Data, Software AG
Manish Devgan, Director, Product Management, Software AG

 
In-memory data management platforms, a natural progression of the big data revolution, are pushing data management and analytics into the realm of real time. Terracotta BigMemory Max, running on servers equipped with the new Intel® Xeon® E7 v2 processor, has made that leap, giving enterprises high-performance, highly available in-memory data access and management with predictable speeds at any scale. This article provides an overview of the detailed white paper “Terracotta and Intel: Breaking Down Barriers to In-memory Big Data Management,” available here.
 

What’s holding back real-time big data analysis?

Enterprises that have embraced big data solutions are now grappling with the challenge of managing massive amounts of hard-disk-based data across large server clusters, and with the latency delays that such storage introduces. While hard disk speeds have improved over the years, hard disk performance still pales in comparison to DRAM or even SSD speeds. Even with an aggressively optimized cluster, such as Apache Hadoop®, real-time analysis can be difficult given the performance constraints of traditional storage.
 

How BigMemory works

Terracotta BigMemory Max provides a high-performance alternative to disk-based storage. It overcomes the weaknesses of traditional batch-loaded data analysis by keeping data in the server’s machine memory. Enterprises can store terabytes of data in this in-memory space and access it orders of magnitude faster than data held on disk-based storage. Storing data in memory provides low, predictable latency and eliminates the delays associated with disk access.
 
BigMemory Max configures unused memory as storage space, as shown in Figure 1. By combining multiple servers into arrays, BigMemory Max can create large pools of high-speed memory that applications can use to store, retrieve and analyze extremely large data sets.

Figure 1: Terracotta BigMemory Max creates a data store from unused RAM, which is accessible by applications in microseconds.
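As a concrete illustration of this arrangement, a minimal programmatic Ehcache configuration along the following lines reserves an off-heap BigMemory area and points the cache at a Terracotta server array. This is only a sketch: the cache name, tier sizes and server addresses are hypothetical, and the off-heap store assumes the BigMemory jar and license key are in place.

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.MemoryUnit;
import net.sf.ehcache.config.TerracottaClientConfiguration;
import net.sf.ehcache.config.TerracottaConfiguration;

public class BigMemoryConfigSketch {
    public static void main(String[] args) {
        Configuration config = new Configuration()
            // Connect to a Terracotta server array (hypothetical host:port pairs).
            .terracotta(new TerracottaClientConfiguration()
                .url("tc-server-1:9510,tc-server-2:9510"))
            .cache(new CacheConfiguration()
                .name("marketData")                               // hypothetical cache name
                .maxBytesLocalHeap(512, MemoryUnit.MEGABYTES)     // small on-heap tier
                .maxBytesLocalOffHeap(100, MemoryUnit.GIGABYTES)  // BigMemory off-heap tier
                .terracotta(new TerracottaConfiguration()));      // distribute via the server array
        CacheManager manager = CacheManager.create(config);
        // ... applications read and write through manager.getCache("marketData") ...
        manager.shutdown();
    }
}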
 
BigMemory Max also protects data and can deliver 99.999 percent uptime by mirroring data across multiple servers. If a server outage occurs, no data is lost, because each paired server keeps a complete mirrored copy of the other’s data.
 
Applications can access BigMemory Max storage using the industry-standard Java Ehcache API, or through other common interfaces such as C# and C++ client libraries. Software engineers can also reach BigMemory Max over common protocols such as MOM, HTTP, REST and SOAP.
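For example, a minimal sketch of the Java Ehcache route might look like the following. The cache name and values are illustrative only; the sketch uses a small on-heap cache so it runs without the BigMemory add-on, and swapping in maxBytesLocalOffHeap(...) would move the data into BigMemory’s off-heap store.

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.MemoryUnit;

public class EhcachePutGet {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create(new Configuration()
            .cache(new CacheConfiguration()
                .name("quotes") // hypothetical cache name
                .maxBytesLocalHeap(64, MemoryUnit.MEGABYTES)));
        Cache quotes = manager.getCache("quotes");

        quotes.put(new Element("AAPL", 524.75));  // store a value
        Element hit = quotes.get("AAPL");         // retrieve it back
        System.out.println("AAPL = " + hit.getObjectValue());

        manager.shutdown();
    }
}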
 

The performance advantage of Intel technology

Software AG and Intel recently showcased performance tests that demonstrate just how well Terracotta BigMemory Max and Intel technologies work together to enable real-time analysis of large data sets. BigMemory Max was run on the new Intel Xeon E7 v2 processor, with data volumes scaled gradually up to 6 TB in memory.
 
 
Figure 2: BigMemory Max provides predictable latency and throughput as data volume increases.
 
The results, shown in Figure 2, demonstrate that latency and throughput remained stable even as the volume of data increased at each memory configuration point. Latency held fairly constant at 4.2 to 4.3 milliseconds, while transactions per second (TPS) stayed between 45,230 and 46,915 for the given workload and data access use case.
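Those figures come from Software AG and Intel’s own multi-terabyte test rig. Purely as an illustration of how such numbers are typically gathered, a single-threaded read-measurement loop against an Ehcache store might look like the sketch below; the key space, value size and read count are arbitrary, and a real test would run many concurrent clients against a far larger off-heap store.

import java.util.concurrent.ThreadLocalRandom;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;
import net.sf.ehcache.config.CacheConfiguration;
import net.sf.ehcache.config.Configuration;
import net.sf.ehcache.config.MemoryUnit;

public class ThroughputSketch {
    public static void main(String[] args) {
        CacheManager manager = CacheManager.create(new Configuration()
            .cache(new CacheConfiguration()
                .name("bench") // hypothetical cache name
                .maxBytesLocalHeap(256, MemoryUnit.MEGABYTES)));
        Cache cache = manager.getCache("bench");

        // Load a synthetic data set: 100,000 keys with 1 KB values (arbitrary sizes).
        int keySpace = 100_000;
        for (int i = 0; i < keySpace; i++) {
            cache.put(new Element(i, new byte[1024]));
        }

        // Time a burst of random reads, then derive throughput and mean latency.
        int reads = 1_000_000;
        long start = System.nanoTime();
        for (int i = 0; i < reads; i++) {
            cache.get(ThreadLocalRandom.current().nextInt(keySpace));
        }
        long elapsed = System.nanoTime() - start;

        System.out.printf("throughput: %.0f reads/s, mean latency: %.1f us%n",
                reads / (elapsed / 1e9), (elapsed / 1e3) / reads);

        manager.shutdown();
    }
}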
 

Benefits

Enterprises can take the next step in the big data evolution with confidence. The consistent, predictable throughput and low-millisecond latency of BigMemory with the Intel Xeon processor open the door for enterprises to pursue numerous in-memory data management and analysis opportunities.

 

Analyze faster
By moving data off disk and into memory, you can analyze larger data sets faster and make your applications more responsive.

 
Scale out
While BigMemory can support 12 TB of RAM on a single server using an eight-socket configuration, it can also scale out across a cluster of hundreds of servers, allowing applications to easily support petabytes of data in memory.
 
Rely on high availability
We have years of experience with mission-critical deployments and have engineered this solution to support 99.999 percent availability.
 
Lower TCO
With BigMemory, your total cost of ownership (TCO) is lower thanks to reduced operating costs (fewer servers), lower power consumption and reduced cooling costs in the data center.
 

Conclusion

Analysts are already suggesting that this new offering could be the tipping point for in-memory adoption, as the Intel Xeon E7 v2 family boasts triple the memory capacity of the previous generation. BigMemory Max running on these servers gives enterprises a path to scale both up and out, opening new avenues to reap the benefits of real-time big data analysis.
 
To learn more, check out the complete white paper with all the details here.
 
For more information about Terracotta BigMemory Max, visit www.terracotta.org/products/bigmemorymax.  
 
 
 
 
