Published 9 June 2016

Will Hadoop Work for Medium-Sized Data?

Analytics


The Need for Regular Credit Valuation

The case company is one of the largest banks in Germany. It needs to calculate the value of its credits on a regular basis, but it was dissatisfied with the performance of its existing C++ and SQL implementation. The calculation combines statistical formulas with joins, aggregations, and searches.

Data Transfer to Hadoop

The solution was to set up a Hadoop cluster on the case company's premises and transfer the necessary 10 GB of data from its database to the cluster. The specification given by the customer was then re-implemented in Apache Spark/Python (statistical calculations) and Impala (joins and aggregations). Finally, the configuration and execution order were optimized to make full use of all available resources.
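
To illustrate the shape of such a pipeline, here is a minimal PySpark sketch. All table, column, and path names (credits, market_rates, exposure, pd, lgd, maturity_years) are hypothetical, and the expected-loss-style valuation formula is a stand-in; the project's actual schema and formulas are not published. In the project itself the joins and aggregations ran in Impala; Spark SQL is used for everything here only to keep the sketch self-contained.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("credit-valuation").getOrCreate()

    # Load the credit positions and market rates transferred from the
    # bank's database (hypothetical HDFS paths).
    credits = spark.read.parquet("hdfs:///data/credits")
    rates = spark.read.parquet("hdfs:///data/market_rates")

    # Join each credit to its market rate. In the actual project this
    # kind of join/aggregation step ran in Impala.
    joined = credits.join(rates, on="currency", how="left")

    # Apply a per-credit statistical formula. This expected-loss-style
    # valuation is purely illustrative, not the bank's formula.
    valued = joined.withColumn(
        "value",
        F.col("exposure") * (1 - F.col("pd") * F.col("lgd"))
        / F.pow(1 + F.col("rate"), F.col("maturity_years")),
    )

    # Aggregate to portfolio level and write the result back to HDFS.
    portfolio = valued.groupBy("portfolio_id").agg(
        F.sum("value").alias("portfolio_value")
    )
    portfolio.write.mode("overwrite").parquet("hdfs:///data/portfolio_values")

Splitting the work this way plays to each engine's strengths: Spark for iterative, formula-heavy computation over the full dataset, and Impala for fast SQL-style joins and aggregations.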

Performance Upgrade and Big Savings

The result was a remarkable improvement in runtime, from two hours and 20 minutes to just 37 minutes, an almost fourfold speedup. Not only was performance improved, but the solution was also highly cost-effective, with an estimated hardware cost of 3,000 euros and zero software cost. Beyond the numbers, the experience of working with Hadoop and its related technologies has also armed the customer with the knowledge that Hadoop works even for medium-sized data.


ADASTRA GmbH will be attending our Business Analytics, Germany event on the 6th of April 2017 as a solution provider. For all upcoming events, check out our Event Calendar »