Published 09. Jun. 2016
Will Hadoop Work for Medium-Sized Data?
The Need for a Regular Calculation of Credits
The case company is one of the biggest banks in Germany. It needs to calculate the value of its credits on a regular basis, but it was dissatisfied with the performance of its existing C++ and SQL implementation. The calculation consists of statistical formulas together with joins, aggregations, and searches.
Data Transfer to Hadoop
The solution was to set up a Hadoop cluster on the case company's premises and transfer the necessary 10 GB of data from its database to the cluster. The customer's specification was then re-implemented in Apache Spark/Python (statistical calculations) and Impala (joins and aggregations). Finally, the configuration and execution order were tuned to make full use of all available resources.
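The customer's actual specification is not disclosed, but a minimal Python sketch can illustrate the shape of the work: a per-credit statistical formula followed by a group-by aggregation. The expected-loss formula (EL = PD × LGD × EAD) and all field and branch names here are hypothetical stand-ins, not the bank's formulas; in the real solution the formula ran in Spark and the aggregation in Impala.

```python
# Hypothetical sketch of the kind of computation involved. The actual
# specification is confidential; the standard credit-risk expected-loss
# formula EL = PD * LGD * EAD serves as a stand-in here.
from dataclasses import dataclass

@dataclass
class Credit:
    branch: str                 # grouping key for the aggregation step
    exposure: float             # EAD: exposure at default, in EUR
    default_prob: float         # PD: probability of default
    loss_given_default: float   # LGD: fraction of exposure lost on default

def expected_loss(c: Credit) -> float:
    """Per-credit statistical formula (run in Spark in the case study)."""
    return c.default_prob * c.loss_given_default * c.exposure

def aggregate_by_branch(credits):
    """Group-by/aggregation step (handled by Impala in the case study)."""
    totals: dict[str, float] = {}
    for c in credits:
        totals[c.branch] = totals.get(c.branch, 0.0) + expected_loss(c)
    return totals

# Toy input data; the real job processed 10 GB of credit records.
credits = [
    Credit("Berlin", 100_000.0, 0.02, 0.45),
    Credit("Berlin", 250_000.0, 0.01, 0.40),
    Credit("Munich", 500_000.0, 0.005, 0.50),
]
totals = aggregate_by_branch(credits)
```

Splitting the pipeline this way is what made the two engines complementary: Spark handled the formula-heavy per-record pass, while Impala's SQL engine took over the relational joins and aggregations.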
Performance Upgrade and Big Savings
The result was a remarkable improvement in runtime from two hours and 20 minutes to just 37 minutes, a nearly fourfold speedup. Not only was performance upgraded, but the value for money was also high: an estimated hardware cost of 3,000 euros and zero software cost. Beyond the clear benefits in numbers, the experience of using Hadoop and its related technologies has also armed the customer with the knowledge that Hadoop works even for medium-sized data.