CS&E Alumnus Justin Levandoski Earns IEEE ICDE 2023 Ten-Year Influential Paper Award

We connected with Levandoski to discuss the award and his current projects.

Congrats on winning the IEEE ICDE 2023 Ten-Year Influential Paper Award! Can you give us an overview of that work?

That paper was part of a larger project called Deuteronomy that we had at Microsoft Research in the database group, which I joined after getting a Ph.D. under Mohamed Mokbel at the University of Minnesota. It was a project looking at how you can evolve database systems, especially the core of database systems, to meet the new hardware ecosystem that was becoming prevalent. We ended up creating a database indexing method called the “Bw-Tree,” which is also the title of the paper. It solved several problems in the new hardware space. First was concurrency, the problem of allowing multiple workers on a single machine to access or update the index at the same time. Our update technique was lock-free, meaning if one worker wanted to come in and read or update the index, there wasn’t another worker blocking it. The second thing we did, to play nicely with solid-state drives (SSDs), was to make the storage layer completely log-structured. This means that when you write to disk, you only ever append to a log instead of writing randomly into the disk, which helped exploit the write characteristics of SSDs.

This project was quite fun, because we really rethought a fundamental technology that had been in databases for roughly 40 years up to that point. It really revolutionized the B-Tree indexing structure. From that paper came a lot of follow-on work. It was done at the right time when these hardware changes were taking off and becoming mature, so timing was everything.
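For readers unfamiliar with log-structured storage, the append-only idea described above can be illustrated with a minimal Python sketch. It is not the Bw-Tree's actual design and leaves out the lock-free update technique entirely. The class name `AppendOnlyLog` and its methods are hypothetical, and an in-memory list stands in for the log on an SSD.

```python
class AppendOnlyLog:
    """Toy log-structured store (hypothetical, for illustration only).

    Writes only ever append a new record to the end of the log;
    existing records are never overwritten in place.
    """

    def __init__(self):
        self.log = []      # append-only sequence of (key, value) records
        self.latest = {}   # key -> position of that key's newest record

    def put(self, key, value):
        # An update is a new record at the tail, not an in-place write.
        self.log.append((key, value))
        self.latest[key] = len(self.log) - 1

    def get(self, key):
        # Look up the newest record for the key, if any.
        pos = self.latest.get(key)
        return None if pos is None else self.log[pos][1]


store = AppendOnlyLog()
store.put("a", 1)
store.put("a", 2)   # the old record for "a" stays in the log
```

After the two writes, the log holds both records for `"a"`, while `get("a")` returns the newer value. Under these assumptions, the example only shows why writes become sequential appends. A real system would also need to reclaim space taken by stale records.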

What does it mean to you to have your work recognized 10 years later? Where do you think this line of research is going in the future?

It’s an honor and it’s surprising. When you do the work, you think it’s very good, but you have no idea where it will go. It’s an honor to have people take your work forward and have it recognized 10 years later as foundational and fundamental. It’s surprising in that it takes off in directions that you never would have thought, especially for a fundamental thing like a B-Tree.

I think the natural extension of this work could go multiple places. I think now in this era of artificial intelligence (AI) and machine learning (ML), we haven’t yet thought about all the optimizations that can be done with indexing and hardware, especially around “vector databases” that one could update, which are fundamental to these systems. You could take some of these techniques or derivatives of these techniques forward in that space. I also think there is always a need to reconsider hardware in general every decade or so: take a look at where you are, see if assumptions about the hardware have changed, and take the same work forward in the general storage and indexing area.

Tell us more about your role as the Director of Engineering at Google. What are your main responsibilities?

At Google, I work on a product called BigQuery. It’s a scalable cloud data warehouse. So, customers come to BigQuery with massive amounts of data, and query that data at scale to derive insights. I’m an engineering director, but I started as an individual contributor and ended up founding and building a team, which is what we call the Lake Analytics team. I run a fairly large team of close to 80 people in BigQuery that consists of teams extending BigQuery to the “data lake” space – including unstructured data – along with running the multi-cloud infrastructure team in BigQuery Omni to ship BigQuery running on other clouds.

My team takes forward pieces of BigQuery and extends its data reach into the “data lake” space. This space is vibrant, as it has brought about new analytic scenarios, especially around unstructured data and AI/ML. That whole space is where all these new pipelines and new analytics workloads started because it was further away from the more “traditional” structured data warehousing workloads. When I started at Google a little more than three years ago, all of this started converging. What customers want is control of their data assets no matter where they are stored, whether in a data warehouse or a data lake. But if you think about large customers and people who would buy Cloud products and so forth, you really care about things like governance, security, and making sure all data is managed in a similar way. This is what I took forward in founding the Lake Analytics team, which is extending pieces of BigQuery to data lakes.

What are the biggest lessons you have learned in this position?

This current position is one of my first jobs where I built a team from scratch, managed it, and grew it from zero to a large team. In this role, I learned a lot about how to frame paradigm shifts and how to sell ideas, which I first learned through getting a Ph.D. That’s helped me throughout my career, not only at Google, but at Amazon Web Services and Microsoft as well. After that, I think the ability to be a mentor was a big lesson. After leaving Minnesota, I had to decide between the industry side and the academic side. I chose to go more towards the industry side, but there’s still that sense of being a professor and being able to teach others what you know. Essentially, I’m able to scale that through running a team of this size.

What originally brought you to the University of Minnesota?

I graduated from Carleton College and I was from the Midwest originally. I was deciding whether to go to graduate school for economics or computer science. I looked around at many schools and Minnesota was great in both. At a certain point, I knew I wanted to do computer science and databases, and the database group was just starting at Minnesota. I was one of the early members of Mohamed’s team.

How has your Ph.D. work influenced your career or personal development?

Foundationally, getting a Ph.D. means you have to be both deep in an area and broad enough to provide context about the work you’re doing. In a Ph.D. program, you learn the field, in my case databases, and it gives you a very good background in the novel work that’s been done and how to stay up to date on this work. Part of research is being able to read into an area and stay up to date about how things are evolving. That’s something I still do today. Being able to do that, and relying on the foundation you built getting a Ph.D. in the domain, makes you an expert. You’re able to lean on that in your professional career a lot, especially if you stay in the same general area, which I have.

What getting a Ph.D. taught me was how to frame and sell your ideas. Part of research is marketing ideas and being able to convince people that your ideas are good and novel. Ultimately, this is a peer-reviewed field and taking the field forward is getting buy-in for your ideas.

What are your future goals?

I think the next natural progression is the interplay of databases, analytics, and the AI/ML side. You can’t have AI/ML without data. Having started the project that brought unstructured data into BigQuery, I think there’s a whole agenda that’s yet to be written on how unstructured data integrates with a data warehouse and how the AI and ML pieces come out of that. Also, just data in general.

I think one thing ChatGPT and OpenAI brought forward for us is that it’s gotten notice from enterprise customers. One of the things that I hear about day in and day out is how enterprises are going to leverage AI/ML, especially generative AI. I think customers are going to start thinking about core data management challenges, like what data is used to train certain models, how to mix traditional analytics with AI/ML, etc. They’re all interesting questions and I’m excited for the next few years.

Anything else you would like to add?

Getting a Ph.D. is a fairly involved choice because you’re dedicating four or five (maybe six!) years of your life to doing something. It’s not for everybody. But there are a lot of lessons you can learn from getting a Ph.D. I think I’m one example. I know I used lessons from my Ph.D. in various places, whether it was core research or building a team in industry. As with the Ten-Year Influential Paper Award, you don’t know if what you’re doing is going to pan out and be successful or not. You just don’t know because you’re working on early ideas that can take years to come to fruition. You need to commit to it and trust that what you are doing is good.