SLAs, SLIs, SLOs… oh my! What do all these terms mean, and why should you need them? Even the best-designed API is useless if it is unreliable, and your customers will go elsewhere.
But you should not overdo it either: failure has to be accepted and allowed for, because 100% availability is a myth.
How do you define your reliability level? How many nines should your uptime have? How do you know if you're monitoring the right metrics?
Do you know what an error budget is and how to use it?
If reliability is a feature, how should it be prioritised against other features?
In this talk, we will discuss site reliability engineering, a current trend even though the topic is not that new.
I will also talk about how to implement this methodology and how to improve the reliability of our Conversational AI services at Swisscom, which serve as the main customer entry point.
Luca Simone is a software engineer with more than ten years of experience in developing, implementing, and managing complex projects with a range of technologies. After several years as a Java developer, he now focuses on building web applications, both frontend and backend. He likes speaking at conferences, loves contributing to open-source projects, and expands his knowledge by solving challenging problems and staying on the bleeding edge. He holds a Bachelor's degree in IT engineering (SUPSI), where in his final year he also won an award from IDSIA, one of the top 10 artificial intelligence institutes in the world.