With companies increasingly turning to AI and data to grow their business, the need for ‘real-life’ data-science expertise grows. Paul Groth leads the team at the Intelligent Data Engineering Lab (INDELab). “We know how to work with you,” he says. “Our team has experience with industry, really strong experience looking at real-world issues and being inspired by them.”

Can you give us an introduction to what you do?

I’m a professor of data science, and I wear several hats. I lead the INDELab, where we apply AI to the problems of how people work with data. Concretely that means things like: how do we automatically build high-quality datasets from various sources, including text and structured data, or even images and videos? Another focus area is data management for machine learning: researching how we can help data scientists or AI engineers make their machine learning better. Things like: how do we check the quality of data? How do we help people debug their data science or AI pipelines? With another hat on, I’m co-scientific director of two ICAI labs [Innovation Center for Artificial Intelligence]: the AIRLab, in partnership with Ahold Delhaize, and the Discovery Lab with Elsevier.

I’m also scientific director of the University of Amsterdam’s data science centre. That’s a really exciting thing – it’s designed to help us use data science or AI techniques to accelerate our research in all the faculties. Whether you’re in the humanities or law or life sciences, we ask: how do we make our research better, faster, higher quality, by injecting AI and data throughout all of it? I also teach in our information studies programme.

How do you see the synergy between your different roles?

My role as an educator is very important. My involvement with startups here at Amsterdam Science Park, and my involvement with big companies, helps me see what we need to teach our students for their future careers. All of the interactions, whether with other faculties or with companies, give us inspiration for the kind of research we do. On a practical level, we’re working with Ahold, and they have data quality issues. How do we devise new techniques to help them with those issues? Likewise, we work with social scientists; in one particular project they are interested in how standards get developed, such as how 5G gets created: what are the processes behind that? That’s something that’s hard, and we take it and say: OK, what new techniques do we need to create in computer science to address these problems?

So there’s also a sort of pipeline from theoretical knowledge to practical application?

Correct. If you talk to people doing things on the ground, they always come up with really hard problems, and you always need research to address them. Many times they’re facing challenges that haven’t been addressed in the literature, or else they’ve been addressed by making assumptions and those assumptions don’t hold in practice. So then we need to do more fundamental research. One PhD student worked on the problem of: how do researchers actually search for data? And one of the things we found is that in practice, when people are trying to understand data, they’re always trying to connect information to context. That led to further research on how we can automatically provide extra context information.

And does your group’s work on these problems also translate into future collaborations?

Absolutely! That work on data search was part of a Dutch Research Council (NWO) project with Elsevier, and the interaction has led to further, larger collaborations in the ICAI lab around problems of data integration. We have lots of data coming in from different places, different background sources; how do we integrate that automatically using AI? That’s a story of starting out with small projects with one or two PhD students, and then expanding to a bigger relationship with other parties.

How do you see these collaborations between the private sector, the university and the independent research sector developing, now that you’re all in the new LAB42 building together?

It’s super exciting. We’ve worked a number of times with Zeta Alpha, a startup company two floors down from us here at LAB42. They were in the Startup Village, then moved away from the Science Park, and now they’ve come back, because they were missing the community interaction. And now that we’re all at LAB42, it’s great, we can see that whole startup team and it makes for a lot less friction when collaborating. LAB42 has that environment of being a place that’s happening! You get the buzz of the students, you get the buzz of the researchers, and it’s important not to underestimate the benefits of that physical proximity.

What can you offer partners that is difficult to find elsewhere?

We are one of the leading research groups on how to use data management for machine learning. If you’re interested in improving machine-learning processes through the management of the data and the models around them, we’re one of the few groups in the world who are really focusing on that kind of research. And really looking at what I call the empirical side of data science and using that to drive our research – other groups do that, but we do it really well. If you’re a partner with us, we know how to work with you. Our team has experience with industry. I was at Elsevier, one of my colleagues was at Amazon, another colleague is part-time at IBM. We have very strong experience looking at real-world issues and being inspired by them. Our contributions to open source and to the development of new things are also strong.

So you always aim to turn interactions between researchers and companies into a win-win?

Absolutely! The way AI and computer science in general are now, a lot of super interesting research problems are happening within organisations, and the cycle times for us to develop or trial state-of-the-art ideas and transfer them back are much shorter. And often the experiments you can run and the things you can do, research-wise, within an organisation or partnering with an organisation, are not even possible if you try to do them alone, because of the scale of the data, or because of the interactions with users. Or just because of what real-world data looks like.

Are there partners you’d still like to work with?

I’m very interested in the finance space. The university has a finance and technology track, which includes machine learning and AI in finance. Finance has super interesting datasets, and we’re exploring what we can do with them. I’m also interested in seeing more life sciences work. In the past, I’ve worked with pharma companies, which is a very exciting space with a lot of opportunities to apply AI or computer science to make a really big impact. We have the Swammerdam Institute for Life Sciences, a great biology programme and great medical imaging in this department and also in the data space. Finance and life sciences are two industries I’d be excited to work with.

Final question: is there anything you’re missing at the Science Park?

I’m a coffee snob, so more good coffee places where it’s natural for people to hang out! And I’d love to see students actually living on the campus. If you bike past here on the weekend it’s a bit dead. I’m very much looking forward to the rooftop bar in Matrix ONE. I think that kind of thing is great for the Science Park.


