SESSION: BUILDING DATABASES FROM UNSTRUCTURED TEXT WITH NLP

Dan is an independent Big Data Technical Architect. He has led the development of a Hadoop-as-a-Service offering and built software streaming frameworks long before the rise of Apache Spark and Storm. He is well versed in the knowledge domain, building applications on RDF and SPARQL that collapse many systems of information into 360-degree views of a customer from both structured and unstructured data.

Can you tell us a little bit about yourself?

I’m one of Bristol’s own, having lived here for the best part of 15 years. They call it Silicon Gorge for a reason, and my work has included the first wave of autonomous vehicles, handheld radios, city mapping, Big Data-scale processing and, more recently, knowledge management. Outside of software I’m a father of one and can be found cycling in the wonderful Somerset countryside.

How did you get into software?

A youth spent mostly indoors when the sun was shining. My childhood was spent on the Amiga 500: James Pond, Lemmings, Bomberman. Games most of the guys and girls I work with these days haven’t heard of. I guess learning how to create the thing you enjoy is what drives most of us. I don’t get much time for gaming these days though!

In your session, you’re talking about using NLP to build databases from unstructured text. Why this topic?

As engineers we are often focussed on the how and not the why, and the talk title reflects this. The real topic of the talk is the proliferation of knowledge in prose form: blogs, reviews, wiki pages, articles and so on. When searching for my favourite actor, I don’t want to see the facts and opinions from just a single source but the aggregated view across many. Companies want this same view of their customers: how do they combine their internal structured databases with what a customer may be saying about their products or services on social media? How do we make this information available for sensible querying without having to read it in its entirety?

Some developers might feel the barriers to entry are pretty high – can anyone get in on the act?

When someone first mentioned entity extraction with NLP to me a couple of years ago, my initial impression was that it was going to be complex and a lot of work. Combine that with the numerous talks I’ve been to where NLP experts have made the subject impenetrable, and I’d almost given up. I’m just a run-of-the-mill software developer. The reality is that the libraries out there make the task easy. Not only is it easy to extract the entities (people, places, things) you are interested in, it is just as easy to extract the relationships between those things that exist in the text. Come to my talk to find out how!
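
To give a flavour of the idea, here is a minimal sketch of turning prose into a queryable table of relationships. It is not the approach from the talk: a simple regular expression stands in for a real NLP library, and the sample sentences, the `relation` table and the helper names `extract_triples` and `build_database` are all hypothetical, chosen only for illustration.

```python
import re
import sqlite3

# Hypothetical sample text; a real pipeline would read blogs, reviews, wiki pages, etc.
TEXT = (
    "Alice Smith lives in Bristol. "
    "Alice Smith works with Apache Spark. "
    "Bob Jones lives in Bath."
)

# Toy stand-in for an NLP library (assumption): an "entity" is a run of
# capitalised words, and a "relationship" is one of two known verb phrases.
NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"
PATTERN = re.compile(rf"({NAME}) (lives in|works with) ({NAME})")


def extract_triples(text):
    """Return (subject, predicate, object) triples found in the text."""
    return [(m.group(1), m.group(2), m.group(3)) for m in PATTERN.finditer(text)]


def build_database(triples):
    """Load the triples into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE relation (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO relation VALUES (?, ?, ?)", triples)
    conn.commit()
    return conn


triples = extract_triples(TEXT)
conn = build_database(triples)

# Query the extracted facts without re-reading the original prose.
rows = conn.execute(
    "SELECT subject FROM relation WHERE predicate = ? AND object = ?",
    ("lives in", "Bristol"),
).fetchall()
print(rows)
```

A proper NLP library would replace the regular expression with trained entity and relation extraction, but the overall shape (text in, triples out, triples loaded into something you can query) would stay much the same.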

Can you recommend any local Big Data communities for developers to get involved in?

Bristol has thriving Big Data and Data Science scenes, and many companies are now embarking on data-based projects. If you’re looking for talks or just want to chat to fellow data professionals over a beer, the following groups are big hitters on meetup.com: Big Data Bristol, South West Data and Bristol Data Scientists.

Twitter: @dancookdev

Linkedin: www.linkedin.com

