Apache Spark – crash course

The course is aimed at people with no previous Spark experience. Its goal is to give an overview of the most important Spark features, so that attendees learn enough to start building their first Spark applications.

The course is an introduction to Spark led by a hands-on practitioner who gained his experience solving real-life Big Data problems for many clients. It emphasizes the practical aspects of Spark, along with the common problems and misconceptions that typically come up when helping clients.

Programme overview:

  1. Introduction to Spark

  • What is Spark?

  • Spark vs Hadoop

  • Spark with HDFS : quick overview

  • Spark on YARN : quick overview

  2. Basic building blocks in Spark

  • Introduction to Resilient Distributed Datasets (RDDs)

  • Spark shell

  • Overview of RDD operations

  • Key-Value Pair RDDs

  • Aggregating Data with pair RDDs

Hands-on exercises:

  • Word count

  3. Writing and deploying Spark applications

  • Spark context

  • Building Spark applications

  • Submitting a Spark application to a cluster

  • Spark Web UI

  • Spark Config: important options

  • Logging, YARN log aggregation

Hands-on exercises:

  • Joining RDDs

  4. Spark on a cluster

  • RDD partitions : on HDFS, on local filesystem, after shuffle

  • Data Locality

  • Execution model overview : Stages, Tasks, Executors

  • RDD persistence

  • Fault tolerance

Hands-on exercises:

  • Spark SQL aggregations

  5. Spark use cases

  • Data analysis

  • Machine learning

  • Iterative algorithms

Hands-on exercises:

  • PageRank

  6. Spark performance tips

  • Controlling parallelism

  • Dealing with skewed data

  • Broadcast variables

Hands-on exercises:

  • Performance tuning challenge!

Prerequisites

  • Ability to understand simple programs written in Scala or Java.
  • Familiarity with the Linux command line.

Target Audience

Software developers with no previous Spark experience


Venue : Agile Actors HQ
Date : Saturday, May 20th
Time : 09:30 – 17:30

Price : €240 + 24% VAT
