Dan S is a data engineer who occasionally teaches advanced data engineering workshops using Spark as the big data framework.
Interested in learning the practical applications of a modern, streaming data analytics pipeline? Meet Apache Spark, the big data framework that helps reduce data interaction complexity, increase processing speed and enhance data-intensive, near-real-time applications with deep intelligence.
This 2-hour, intensely hands-on workshop introduces Apache Spark, the open-source cluster computing framework whose in-memory processing and streaming capabilities can make analytics applications up to 100 times faster than Hadoop MapReduce. The workshop is aimed at seasoned developers with an interest in understanding the streaming data pipelines that power today’s real-time analytics engines.

Agenda:
Interactive Data Analytics Overview
Creating Spark DataFrames From Publicly Available Datasets
Spark Streaming Overview
Time Series Analytics Overview
Graph Analytics With Spark GraphX

All the tools we use during the workshop will run inside one Docker container per attendee on a cloud server. Because the environment is packaged as a Docker container, attendees can continue experimenting at home on their own laptops.