With the advent of Big Data technologies, companies are keen to leverage them to get ahead of their competitors. Financial companies are known for holding a wealth of risk data. These massive data sets tend to live in large, old-school databases such as Oracle or Sybase, so the first hurdle in building a big data ecosystem is shifting data out of these legacy databases and into that ecosystem. We will look at some options for doing this at scale, including the pros and cons of off-the-shelf tools such as Alteryx, Talend and Informatica. We will then have a go at building our own generic ETL tool using Apache Spark, use it to shift some data into HDFS, and take a sneak peek at how easy and efficient it is to run complex queries over that data using Apache Hive.
About Zinat Wali
I am a Senior Developer at Scott Logic. I like exploring technology and have a keen interest in Big Data and Machine Learning.