We are super excited to receive an ERC grant (funded by SERI this year for Swiss institutes) to support our research around Ease.ML. Along with this great honor comes the responsibility to conduct, as stated by ERC, "groundbreaking, high-gain/high-risk research" and "to set a clear and inspirational target for frontier research across Europe." So, what are we going to build? Furthermore, what are we hoping to change and why do we think this can make our world a better place?

Along with the great potential of machine learning and artificial intelligence to revolutionize our society, economy, and way of life come risks related to its quality, robustness, and fairness, and the great challenge of making the technology trustworthy. As people start to realize and address these risks, timely regulatory efforts will also emerge and will soon govern the industrial practice of AI/ML.

If the previous decade of ML/AI system research was driven by scalability, efficiency, and automation, we believe among the defining challenges of ML/AI systems for the coming decade is the need to manage, facilitate, and enforce trustworthiness.

As system researchers, we often draw inspiration from "practitioners' struggles": What are the most pressing problems that our friends in industry face when adopting new, emerging technology? Five years ago, when training an ImageNet classifier could take weeks, and when people were struggling with hyper-parameter tuning and model selection and were lost about how to make progress on their ML/AI journey, we dedicated a lot of effort toward scalability (see Zip.ML), automation, and setting up "software engineering principles" for ML/AI, i.e., MLOps (see Ease.ML). Today, these are still great challenges and call for continued investment and effort from researchers. However, as technologies in scalability, automation, and MLOps mature, we are also seeing practitioners' struggles start to shift: If the previous decade of ML/AI system research was driven by scalability, efficiency, and automation, we believe that among the defining challenges of ML/AI systems for the coming decade is the need to manage, facilitate, and enforce trustworthiness.

Next Generation MLSys: Manage, Facilitate, and Enforce Trustworthiness

Trustworthy AI is a topic that has attracted intensive interest in recent years. Today, practitioners looking to build trustworthy AI applications are clearly not short of papers to read. Ironically, the challenge of building trustworthy AI today is often not that we do not know what to do, but that there are so many potential things we could do. In our opinion, this is quite similar to the situation practitioners faced when trying to build their first AI models five years ago: their arsenal may be full of amazing methods developed by researchers, but without a principled guideline provided by ML systems, they are lost as to what to do.

Take one of our friends at a leading Swiss bank as an example. Her dream is to enable interpretable and explainable ML within her industry. Despite being quite familiar with ML/AI, she felt confused: "There are so many different ways people are proposing to explain an ML model, so which one should we use?" This is quite a valid concern. Take graph neural networks as an example: at least 20 such methods have been published in the last two years alone. How should we compare them? How should we benchmark them? How should we explain and communicate their differences to practitioners? How should we tell practitioners which one to use in which scenarios? Which one should we trust when they disagree? These are very hard questions to answer, and many of them, arguably, are system questions.

Our friend's struggle to make her model more explainable is not unique; it is a reflection of today's trustworthy AI landscape. Take another friend of ours, at one of the best medical schools in the world, as an example. He worries about the fairness and generalizability of his ML models: "If I know that my model is not fair or does not generalize well to another hospital, what should I do?" One of the main challenges here is that there are so many things we can do: acquire more data from the same or different hospitals, remove bad data, rebalance the data, clean the data, write more weakly supervised rules, or apply active learning to fix some weak labels. Out of this endless pool of opportunities, only a few represent the most efficient ways to make the model more fair. But how can we systematically identify the most important actions to take?
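To give a flavor of what we mean by systematic guidance, below is a minimal, purely illustrative sketch (synthetic data, a toy logistic-regression model, and hypothetical helper functions; nothing here is an Ease.ML API). It scores a handful of candidate data interventions by how much each one reduces a simple group-fairness gap on a held-out validation set.

```python
# Purely illustrative sketch: rank candidate data interventions by their effect
# on a simple fairness metric. Data, model, and helpers are all hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, bias):
    # Synthetic data with a sensitive group attribute; "bias" shifts the features
    # of the minority group, mimicking a skewed data collection process.
    g = (rng.random(n) < 0.3).astype(int)
    x = rng.normal(size=(n, 5)) + bias * g[:, None]
    y = ((x[:, 0] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
    return x, y, g

def dp_gap(model, x, g):
    # Demographic parity gap: |P(yhat=1 | g=0) - P(yhat=1 | g=1)|; smaller is fairer.
    pred = model.predict(x)
    return abs(pred[g == 0].mean() - pred[g == 1].mean())

x_tr, y_tr, g_tr = make_data(2000, bias=0.8)   # biased training data
x_va, _, g_va = make_data(1000, bias=0.0)      # unbiased validation data

def score(x, y):
    # Retrain the model on a candidate version of the data and report its fairness gap.
    model = LogisticRegression(max_iter=1000).fit(x, y)
    return dp_gap(model, x_va, g_va)

def rebalance(x, y, g):
    # Up-sample the minority group so both groups are equally represented.
    idx0, idx1 = np.flatnonzero(g == 0), np.flatnonzero(g == 1)
    small, large = (idx0, idx1) if len(idx0) < len(idx1) else (idx1, idx0)
    keep = np.concatenate([large, small, rng.choice(small, len(large) - len(small))])
    return x[keep], y[keep]

def drop_outliers(x, y, g):
    # A crude stand-in for "remove bad data": drop the 5% of points farthest from the mean.
    dist = np.linalg.norm(x - x.mean(axis=0), axis=1)
    keep = dist < np.quantile(dist, 0.95)
    return x[keep], y[keep]

def acquire(x, y, g):
    # Pretend we can acquire 500 additional examples from an unbiased source.
    x2, y2, _ = make_data(500, bias=0.0)
    return np.vstack([x, x2]), np.concatenate([y, y2])

print(f"baseline fairness gap: {score(x_tr, y_tr):.3f}")
for name, action in [("rebalance", rebalance), ("drop outliers", drop_outliers), ("acquire data", acquire)]:
    print(f"{name:>13}: fairness gap = {score(*action(x_tr, y_tr, g_tr)):.3f}")
```

The interesting system question, of course, is how to produce such a ranking without brute-force retraining for every candidate action, and at the scale of real pipelines rather than a toy example.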

The goal of our research in the next five years … is to make the process of building trustworthy AI applications as "boring" (systematic) as possible.

The goal of our research in the next five years is to build a new type of ML system, one that can provide systematic guidelines to anyone whose job is to build trustworthy ML applications. As system researchers, our dream is to make the process of building trustworthy AI applications as "boring" (systematic) as possible: any practitioner should be able to set up a trustworthiness objective, and the system should be able to give its users a concrete list of actions to take in order to reach that objective.

How can we trust our ML/AI models? The past decade has seen amazing work on trustworthy AI, in particular with regard to robustness, fairness, and explainability, and has provided some very useful guidelines to address this fundamental question. However, ML/AI applications in the real world are highly complex, and the ML model is often just a small component embedded in an ocean of data-centric components. How can we trust these end-to-end ML applications we see in the real world? With the support of this ERC grant (funded by SERI), we will focus on bringing trustworthy AI to these real-world scenarios. (The picture in the lower left corner is taken from Sculley et al.)

One Challenge: Bridging ML with Data

This ERC grant will support our research in understanding, and hopefully solving, one of the largest challenges that we face in building this system.

The dominant component in many ML applications is often not the ML ... ML model is just a small island embedded in an ocean of data-centric components.

While most of today's trustworthy AI studies (around robustness, fairness, and explainability) are largely centered on understanding a single ML model, in reality most real-world ML/AI applications are more complex. In fact, the dominant component in many ML applications is often not the ML at all. The picture above shows Sculley et al.'s quite influential depiction of the structure of an ML/AI application, in which the ML model is just a small island embedded in an ocean of data-centric components. Being able to trust an ML model does not mean we can trust this end-to-end, data-centric pipeline. The goal of this project is to extend our understanding of trustworthy AI from a single ML/AI model to an end-to-end application.

Specifically, this ERC support will aid our exploration of two key questions, among many others that we will pursue in parallel, as illustrated in the figure below. How should we reason about data influence (i.e., which data examples are most important if we want to improve the fairness of our ML model?) in the presence of feature extraction pipelines? And how should we reason about robustness (i.e., what is the largest output perturbation of our model given an input perturbation?) in the presence of post-processing pipelines? We hypothesize that if we can provide efficient algorithms for these questions, we should be able to significantly broaden the applicability of today's trustworthy AI methods to real-world applications.
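To make the first question concrete, here is a minimal sketch, under purely illustrative assumptions (a scikit-learn-style pipeline and plain validation accuracy as the metric of interest), of the naive leave-one-out baseline that efficient pipeline-aware influence algorithms would have to replace:

```python
# Illustrative sketch of naive leave-one-out data influence measured through a
# feature-extraction pipeline; the pipeline and metric are assumptions for the example.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data: 200 training and 100 validation examples.
x, y = make_classification(n_samples=300, n_features=20, random_state=0)
x_tr, y_tr, x_va, y_va = x[:200], y[:200], x[200:], y[200:]

def validation_accuracy(xs, ys):
    # Re-fit the *entire* pipeline (feature extraction + model), not just the model.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=5),
                         LogisticRegression(max_iter=1000))
    return pipe.fit(xs, ys).score(x_va, y_va)

baseline = validation_accuracy(x_tr, y_tr)

# Naive leave-one-out influence: how much does removing example i change the metric?
# Each removal also changes the scaler and the PCA projection, not only the model;
# this coupling between data transformations and the model is what makes
# pipeline-aware influence expensive, and what we would like to approximate efficiently.
influence = np.array([
    baseline - validation_accuracy(np.delete(x_tr, i, axis=0), np.delete(y_tr, i))
    for i in range(len(x_tr))
])

top = np.argsort(influence)[-5:][::-1]
print("training examples whose removal hurts validation accuracy the most:", top)
```

The analogous brute-force picture holds for the robustness question: an input perturbation has to be propagated through both the model and any post-processing steps before its effect on the final output can be bounded, which is exactly why we need algorithms that treat the pipeline as a whole.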

This won't be an easy journey. The data management community has been studying data transformations for decades, and the machine learning community has been studying data influence and the robustness of ML models for just as long. A joint analysis of an end-to-end application requires us to bridge the many results coming out of these two communities in a way that is theoretically principled yet practically feasible. While there is still a long way to go, we are nevertheless optimistic, and the support of this ERC grant will definitely accelerate our work. If you are curious about some of our current thinking on these questions, here is a technical talk on some of our current results.

Larger-scale Community Efforts

Many of the research objectives that we are exploring are closely related to the recent trend of "Data-centric AI", whereby it is often the case that the best way to improve an ML model is to improve its data. At the end of this blog, we would like to provide pointers to several community efforts that we are privileged to participate in, together with many of our collaborators.

Data Centric AI
The goal of this workshop is to bring together a new community of researchers, practitioners, organizations and individuals, and catalyze interest in the emerging discipline of Data-Centric AI.
Benchmark
We provide a collection of concrete tasks inspired by various pain points we have observed in real-world ML development workflows. Many of these pain points require systematic solutions, which we evaluate in this benchmark.
DataPerf
DataPerf is a benchmark suite for datasets and data-centric algorithms.