Dan S is a data engineer who occasionally teaches advanced data engineering workshops using Spark as the big data framework.
Q. You’re speaking at Voxxed Days Bucharest in March. Tell us a bit about your session.
This is the same 100% hands-on Spark workshop that I have been leading in Bucharest, in one form or another and for a variety of audiences, for the past three years.
Q. Why is the subject matter important?
Nowadays streaming data is everywhere, and there is an increasing push towards using the same platform for both stream ingestion and machine learning at scale. Apache Spark is an interesting case study in how to build a modern streaming data pipeline, because it makes it very straightforward to productionize the work of data scientists.
Q. Who should attend your session?
This workshop is going to be very interesting for any data engineer or data scientist who is not yet familiar with Spark. It will be especially useful for people currently running Hadoop clusters who are evaluating a transition to Apache Spark.
Q. What are the key things attendees will take away from your session?
First, we’ll take a quick look at the small subset of Scala that is absolutely necessary to understand before writing a Spark big data application. Using Spark, we’ll then work our way through a few publicly available datasets and gradually extract increasingly useful insights from them. Towards the end, we’ll examine a relatively complex Kafka-Spark-Cassandra streaming pipeline that more closely mimics a real-life high-load production setting.
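To give a flavour of that "small subset of Scala", the sketch below uses only plain Scala (no Spark dependency): case classes, immutable collections, lambdas, and `groupBy`/`map`. These are the idioms that transfer almost directly to Spark Dataset code. The `Measurement` type and the sample values are illustrative, not taken from the workshop material.

```scala
// Case classes give you immutable records with structural equality "for free" --
// the same shape Spark infers an encoder for when you build a typed Dataset.
case class Measurement(sensor: String, value: Double)

object ScalaWarmup {
  val readings = List(
    Measurement("a", 1.5),
    Measurement("a", 2.5),
    Measurement("b", 4.0)
  )

  // groupBy + map over a local collection mirrors what
  // groupByKey + agg would do on a distributed Spark Dataset.
  val totalsBySensor: Map[String, Double] =
    readings
      .groupBy(_.sensor)
      .map { case (sensor, ms) => sensor -> ms.map(_.value).sum }

  def main(args: Array[String]): Unit =
    totalsBySensor.toSeq.sortBy(_._1).foreach {
      case (sensor, total) => println(s"$sensor: $total")
    }
}
```

The point of the warm-up is that once `map`, `filter`, and pattern matching feel natural on local collections, the Spark APIs read as the same operations applied to partitioned data.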
Q. Aside from speaking at Voxxed Days Bucharest, what else are you excited about for 2017?
Compared to last year, it’s very exciting to finally see organizations express interest in streaming architectures in 2017.