As developers, we manipulate data every day. But one kind of data is notoriously hard to process and structure: human language. It gets even harder in a country like Switzerland, which has four official languages! Fortunately, computer science has devoted decades of effort to exactly this problem, in the field of Natural Language Processing (NLP).
In this talk, we cover many of the components available in a modern NLP pipeline. These range from basic tasks, like tokenization and lemmatization, to more advanced techniques such as Named Entity Recognition (NER), coreference resolution, and dependency parsing. We also show where and how LLMs, like GPT, can be plugged into a pipeline to (possibly) enhance it.
To provide a real-world (and Swiss) context, our target dataset will be the Swiss Commercial Registry. This complex, multilingual public database is central to an extensive interdisciplinary research project in economics and political science, for which we are building the software engineering backbone using state-of-the-art NLP technology.
Andrea Mocci
CodeLounge, Università della Svizzera italiana
I am currently a Junior Group Leader at CodeLounge, an R&D group (part of the Software Institute) headed by Dr. Marco D’Ambros and Prof. Dr. Michele Lanza. My main responsibilities include being the tech lead for CodeLounge’s team and projects, and doing a lot of hands-on development, mostly on the backend side, including machine learning, large language models, and natural language processing.
I am passionate about functional programming in its many flavors, from languages like Scala to reactive technologies like Akka. Apart from R&D, I do a lot of teaching. I have been the lead developer of Tako, an extension for Visual Studio Code that records many aspects of your programming activity and provides a digest that supports self-reflection on what you have done, how you spent your time, and which entities you worked on. I have been a member of many academic conference Program Committees, such as ICSME, MSR, and ICPC, and a reviewer for journals in the same area. As for developer conferences, I served on the PC of Voxxed Days Ticino in 2023 and 2024.
In the past, I was a postdoctoral researcher at USI Lugano and at MIT. I received my B.Sc., M.Sc., and PhD from Politecnico di Milano, where I was advised by Prof. Carlo Ghezzi.
Jesper Findahl
CodeLounge, Università della Svizzera italiana
Jesper Findahl is a Senior R&D Software Engineer at CodeLounge, a center for software research and development (R&D) at the Università della Svizzera italiana (USI) in Switzerland. He joined CodeLounge in 2018 after earning a Master's degree in Informatics at USI.
At CodeLounge, Jesper's role has been diverse, encompassing tasks such as frontend and backend engineering, data visualization, continuous integration and delivery (CI/CD), software analysis, and, more recently, data analytics and machine learning.
Jesper is deeply passionate about software design and productivity and is always eager to explore new technologies and further his knowledge in the field.