We are super excited to receive an ERC grant (funded by SERI this year for Swiss institutes) to support our research around Ease.ML. Along with this great honor comes the responsibility to conduct, as stated by ERC, "groundbreaking, high-gain/high-risk research" and "to set a clear and inspirational target for frontier research across Europe." So, what are we going to build? Furthermore, what are we hoping to change and why do we think this can make our world a better place?
Along with the great potential of machine learning and artificial intelligence to revolutionize our society, economy, and way of life come risks related to its quality, robustness, and fairness, and the great challenge of making the technology trustworthy. As people start to realize and address these risks, timely regulatory efforts will also emerge and will soon govern the industrial practice of AI/ML.
If the previous decade of ML/AI system research was driven by scalability, efficiency, and automation, we believe among the defining challenges of ML/AI systems for the coming decade is the need to manage, facilitate, and enforce trustworthiness.
As system researchers, we are often inspired by "practitioners' struggles": What are the most pressing problems that our friends in industry face when adopting an emerging technology? Five years ago, when training an ImageNet classifier could take weeks, and when people were struggling with hyper-parameter tuning and model selection, unsure how to make progress on their ML/AI journey, we dedicated a lot of effort to scalability (see Zip.ML), automation, and setting up "software engineering principles" for ML/AI, i.e., MLOps (see Ease.ML). Today, these are still great challenges and call for continued investment and effort from researchers. However, as technologies for scalability, automation, and MLOps mature, we are also seeing practitioners' struggles starting to shift: If the previous decade of ML/AI system research was driven by scalability, efficiency, and automation, we believe that among the defining challenges of ML/AI systems for the coming decade is the need to manage, facilitate, and enforce trustworthiness.
Next Generation MLSys: Manage, Facilitate, and Enforce Trustworthiness
Trustworthy AI is a topic that has attracted intensive interest in recent years. Today, practitioners looking to build trustworthy AI applications are clearly not short of papers to read. Ironically, the challenge of building trustworthy AI today is often not that we do not know what to do, but that there are so many potential things that we could do. In our opinion, this is quite similar to the situation practitioners faced when trying to build their first AI models five years ago: Their arsenal may be full of amazing methods developed by researchers, but without a principled guideline provided by ML systems, they are lost as to what to do.
Take one of our friends at a leading Swiss bank as an example. Her dream is to enable interpretable and explainable ML within her industry. Despite being quite familiar with ML/AI, she felt confused: "There are so many different ways people are proposing to explain an ML model, which one should we use?" This is quite a valid concern: Take graph neural networks as an example; more than 20 explanation methods have been published in the last two years alone — How should we compare them? How should we benchmark them? How should we explain and communicate their differences to practitioners? How should we tell practitioners which one to use in which scenario? Which one should we trust when they disagree? These are super hard questions to answer, and many of them, arguably, are system questions.
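To make the disagreement question concrete, here is a minimal sketch (the toy model and both attribution methods are illustrative stand-ins, not anything from Ease.ML) that compares two generic explanation methods, finite-difference saliency and feature occlusion, and measures how often they agree on which feature matters most:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy non-linear function standing in for a trained model (hypothetical).
def model(x):
    return x[0] * x[1] + x[2]

def saliency(x, eps=1e-5):
    """Finite-difference gradient magnitudes as feature attributions."""
    base = model(x)
    grads = np.zeros_like(x)
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        grads[i] = (model(xp) - base) / eps
    return np.abs(grads)

def occlusion(x):
    """Attribution = output change when a single feature is zeroed out."""
    base = model(x)
    return np.array([abs(base - model(np.where(np.arange(len(x)) == i, 0.0, x)))
                     for i in range(len(x))])

# How often do the two methods agree on the single most important feature?
agree, n = 0, 200
for _ in range(n):
    x = rng.normal(size=3)
    agree += int(np.argmax(saliency(x)) == np.argmax(occlusion(x)))
print(f"top-1 agreement: {agree / n:.2f}")
```

Even on a three-feature toy function the two methods can rank features differently; benchmarking and communicating such disagreement at scale is exactly the kind of system question raised above.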
Our friend's struggle to make her model more explainable is not unique; it reflects today's trustworthy AI landscape. Take another friend of ours, at one of the best medical schools in the world, as an example. He worries about the fairness and generalizability of his ML models: "If I know that my model is not fair or does not generalize well to another hospital, what should I do?" One of the main challenges is that there are so many things we could do: acquire more data from the same or different hospitals, remove bad data, rebalance the data, clean the data, write more weakly-supervised rules, or apply active learning to fix some weak labels. Out of this endless pool of opportunities, only a few represent the most efficient ways to make the model fairer. But how can we systematically identify the most important actions to take?
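One way to approach this question systematically is to score each candidate action by its estimated effect on the fairness metric. The sketch below does this under strong simplifying assumptions (synthetic data, a toy nearest-centroid model, demographic parity as the metric, and "remove training example i" as the only action type; none of this is Ease.ML's actual algorithm) via leave-one-out retraining:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic training data (all hypothetical): features x, sensitive group g, label y.
n = 60
g = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, 2))
x[:, 0] += 0.8 * g                      # group membership leaks into a feature
y = (x[:, 0] + rng.normal(scale=0.5, size=n) > 0).astype(int)

g_eval = rng.integers(0, 2, size=200)
x_eval = rng.normal(size=(200, 2))
x_eval[:, 0] += 0.8 * g_eval

def fit_centroids(xs, ys):
    """Toy 'model': one centroid per class; predict by nearest centroid."""
    return {c: xs[ys == c].mean(axis=0) for c in (0, 1)}

def predict(centroids, xs):
    d0 = np.linalg.norm(xs - centroids[0], axis=1)
    d1 = np.linalg.norm(xs - centroids[1], axis=1)
    return (d1 < d0).astype(int)

def dp_gap(centroids):
    """Demographic-parity gap: |P(yhat=1 | g=0) - P(yhat=1 | g=1)|."""
    p = predict(centroids, x_eval)
    return abs(p[g_eval == 0].mean() - p[g_eval == 1].mean())

base_gap = dp_gap(fit_centroids(x, y))

# Leave-one-out influence of each training point on the fairness metric:
# how much does the gap shrink if we drop that point and refit?
mask = np.ones(n, dtype=bool)
influence = np.empty(n)
for i in range(n):
    mask[i] = False
    influence[i] = base_gap - dp_gap(fit_centroids(x[mask], y[mask]))
    mask[i] = True

top = np.argsort(influence)[::-1][:5]
print("baseline DP gap:", round(base_gap, 3))
print("top-5 points to inspect:", top.tolist())
```

In a real system the retraining loop would have to be replaced by cheaper approximations (e.g., influence functions or Shapley-style data valuation), since refitting per candidate action does not scale; but the principle of ranking actions by their effect on a trustworthiness objective is the same.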
The goal of our research in the next five years … is to make the process of building trustworthy AI applications as "boring" (systematic) as possible.
The goal of our research in the next five years is to build a new type of ML system, one that can provide systematic guidelines to anyone whose job is to build trustworthy ML applications. Our dream as system researchers is to make the process of building trustworthy AI applications as "boring" (systematic) as possible — Any practitioner should be able to set up a trustworthiness objective, and the system should be able to give its users a concrete list of actions to take in order to reach this objective.
One Challenge: Bridging ML with Data
This ERC grant will support our research in understanding, and hopefully solving, one of the largest challenges that we are facing in building this system.
The dominant component in many ML applications is often not the ML ... ML model is just a small island embedded in an ocean of data-centric components.
While most of today's trustworthy AI studies — around robustness, fairness, and explainability — are largely centered on understanding a single ML model, in reality, most real-world ML/AI applications are more complex. In fact, the dominant component in ML applications is often not the ML itself. In the above picture we can see Sculley et al.'s quite influential depiction of the structure of an ML/AI application, where it can be seen that the ML model is just a small island embedded in an ocean of data-centric components. Being able to trust an ML model does not mean trusting this end-to-end data-centric pipeline. The goal of this project is to extend our understanding of trustworthy AI from a single ML/AI model to an end-to-end application.
Specifically, this ERC support will aid our explorations of two key questions, among many others that we will pursue in parallel, as illustrated in the figure below: How should we consider data influence (i.e., which data examples are most important if we want to improve the fairness of our ML model?) in the presence of feature extraction pipelines? And how should we consider robustness (i.e., what is the largest output perturbation of our models given an input perturbation?) in the presence of post-processing pipelines? We hypothesize that, if we are able to provide efficient algorithms for these questions, we should be able to significantly broaden the applications of today's trustworthy AI methods to real-world applications.
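For the robustness question, one classical building block is Lipschitz composition: if the model is L_m-Lipschitz and the post-processing step is L_p-Lipschitz, an input perturbation of norm at most ε can move the post-processed output by at most L_m · L_p · ε. The sketch below (a hypothetical linear "model" with a clipping post-processor, chosen only to make the bound computable; this is not our actual method) checks the bound numerically:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical "model": a linear layer. Its Lipschitz constant w.r.t. the
# L2 norm is the largest singular value of W.
W = rng.normal(size=(4, 8))
L_model = np.linalg.svd(W, compute_uv=False)[0]

def model(x):
    return W @ x

# Post-processing step: clipping outputs to a fixed range. Clipping is
# 1-Lipschitz, so composing it cannot amplify perturbations.
L_post = 1.0
def post_process(z):
    return np.clip(z, -1.0, 1.0)

# Certified bound: input perturbations of norm <= eps move the
# post-processed output by at most L_model * L_post * eps.
eps = 0.1
bound = L_model * L_post * eps

# Empirical check against random perturbations of norm exactly eps.
x = rng.normal(size=8)
worst = 0.0
for _ in range(1000):
    delta = rng.normal(size=8)
    delta *= eps / np.linalg.norm(delta)
    worst = max(worst, np.linalg.norm(
        post_process(model(x + delta)) - post_process(model(x))))
print(f"certified bound: {bound:.3f}, worst observed: {worst:.3f}")
```

Real post-processing pipelines are rarely this well-behaved — they may contain branches, joins, or non-Lipschitz steps — which is precisely why analyzing robustness end-to-end, rather than for the model alone, is an open system question.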
This won't be an easy journey. The data management community has been studying data transformations for decades, while the machine learning community has been studying data influence and the robustness of ML models for just as long. A joint analysis of an end-to-end application requires us to bridge the many results coming out of these two communities in a theoretically principled, yet practically feasible, way. While there is still a long way to go, we are nevertheless optimistic, and the support of this ERC grant will definitely accelerate our work. If you are curious about some of our current thinking about these questions, here is a technical talk on some of our current results.
Larger-scale Community Efforts
Many of the research objectives that we are exploring are closely related to the recent trend of "Data-centric AI", whose premise is that the best way to improve an ML model is often to improve the data. At the end of this blog, we hope to provide some pointers to several community efforts that we are privileged to participate in, together with many of our collaborators.