Andy Petrella is an entrepreneur with a background in mathematics and distributed data.
Andy is an early evangelist of Apache Spark in the data community and the creator of the Spark Notebook. He is also an O'Reilly author (“What is Data Observability”, “What is Data Governance”) and trainer (“Distributed Data Science”, “Data Lineage Essentials”, “Machine Learning Model Monitoring”).
Andy is also the founder and CEO of Kensu, a data observability solution implementing the Data Observability Driven Development (DODD) method.
In this talk, I’ll review the challenges data teams face when managing data in production, especially those caused by data quality issues. I’ll highlight why classical data quality approaches are no longer sufficient or suited to this era of data, in which data teams are rapidly growing to sizes never seen before. IT has already encountered similar management and operations challenges, which led to the development of the DevOps culture, where observability plays a major role alongside automation and decentralization. So, I’ll introduce observability with a laser focus on data, its dependencies on other areas, and how it can be introduced into data teams' (DataOps) culture to help them, at a minimum, detect, resolve, and prevent data issues.