Gareth Rogers

Conference Session

Putting the Spark in Functional Fashion Tech Analytics

Metail is a fashion tech startup whose goal is to reduce the cost and improve the efficiency of a retailer's garment photography process, and to give consumers confidence in the clothes they buy online. By allowing customers to try on clothes online using their own body shape, we've been able to collect a unique data set of customers' clothes shopping habits along with their body shape data.

Metail's analytics platform, now four years old, drives our data science products as well as internal and external dashboards that give a summarised view of key business metrics. The pipeline is based on the ideas in Nathan Marz's Lambda Architecture and uses the Snowplow analytics pipeline as the foundation for our event tracking, collection and first-pass processing. From the start, the pipeline has been implemented in Clojure: we use it to connect our pipeline stages, and its big data libraries are the workhorses of our raw event processing and aggregation.

This talk will show how we've used Clojure as a solid platform for connecting and managing our AWS-hosted analytics pipeline, and the pitfalls we encountered along the way. I'll also talk about some of the difficulties we're currently experiencing and how we're resolving them.

Some of the technical topics I'll cover:

  • Using Spark jobs, implemented in Clojure and running on AWS Elastic MapReduce (EMR), to transform and aggregate the data that feeds Redshift and drives our dashboards and data products.
  • Using Redshift Spectrum to reduce our Redshift costs and improve our compute-to-storage ratio.
  • Building an event-driven pipeline on S3 event notifications and SQS queues.

I'll explain how we've used Clojure for event handling and job submission, keeping everything stateless and making steps idempotent wherever possible. I also aim to convince you that Clojure is a strong choice, and that its functional paradigm and open source libraries make your job easier and more fulfilling.

About Gareth Rogers

I'm a Data Engineer at Metail, where I've worked for six years. Over the last four years I've been part of the team that first built Metail's data analytics pipeline and now keeps it up to date and able to meet our changing demands. This has meant deciding where to keep up with a rapidly changing field and where to enjoy some stability. I came to Metail after completing a PhD in high energy physics on the LHCb experiment at CERN. There I spent too much time working on the control system and monitoring software, but I still managed to code up and version-control my analysis. I haven't really seen a hill since leaving Geneva, and I'm hoping to find time to attempt a run up from the river to the Clifton Suspension Bridge.