Stanford RAIN (Research on Artificial Intelligence and INcentives) Seminar
RAIN is a seminar on the theory and practice of AI in strategic and societal settings. Supported by Stanford’s Society & Algorithms Lab (SOAL), it serves as a hub for talks and discussion at the intersection of AI, incentives, and society.
- Our talks are on Tuesdays from 4:30-5:30 PM PT in Spilker 232!
- To receive updates on upcoming talks, please join our email list!
Upcoming Talks
→ Abstract and Bio
We study the bidding problem in repeated uniform price multi-unit auctions from the perspective of a value-maximizing buyer. The buyer aims to maximize their cumulative value over T rounds while adhering to per-round return-on-investment (RoI) constraints in a strategic (or adversarial) environment. Using an m-uniform bidding format, the buyer submits m bid-quantity pairs (bᵢ, qᵢ) to demand qᵢ units at bid bᵢ, with m ≪ M in practice, where M denotes the buyer's maximum demand. We introduce the notion of safe bidding strategies as those that satisfy the RoI constraints irrespective of competing bids. Despite the stringent requirement, we show that these strategies satisfy a mild no-overbidding condition, depend only on the bidder's valuation curve, and the bidder can focus on a finite subset of them without loss of generality. Though the subset size is exponential in m, we design a polynomial-time learning algorithm that achieves sublinear regret, both in full-information and bandit settings, relative to the hindsight-optimal safe strategy. We assess the robustness of safe strategies against the hindsight-optimal strategy from a richer class. We define the richness ratio α ∈ (0, 1] as the minimum ratio of the value of the optimal safe strategy to that of the optimal strategy from the richer class and construct hard instances showing the tightness of α. Our algorithm achieves α-approximate sublinear regret against these stronger benchmarks. Simulations on semi-synthetic auction data show that empirical richness ratios significantly outperform the theoretical worst-case bounds. The proposed safe strategies and learning algorithm extend naturally to more nuanced buyer and competitor models.
Bio: Negin Golrezaei is the Theresa Seley Associate Professor of Management Science and an Associate Professor of Operations Management at the MIT Sloan School of Management. Her research focuses on advancing digital marketplaces—such as e-commerce, online advertising, and emissions trading systems—through data-driven strategies and algorithmic innovations. She aims to create more resilient, equitable, and sustainable digital ecosystems. In addition to her academic role, Negin has served as a visiting scholar at Google Research and Meta, where she collaborated with research and product teams to design and test new mechanisms for online marketplaces. Before joining MIT, she was a postdoctoral fellow at Google Research in New York, working with the Market Algorithms team. She holds a BSc (2007) and MSc (2009) in Electrical Engineering from Sharif University of Technology, Iran, and a PhD (2017) in Operations Research from the University of Southern California.
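For readers unfamiliar with the bidding format, the sketch below is a minimal, hypothetical illustration (not the speaker's algorithm): a buyer submits bid-quantity pairs, the auction clears at a uniform price, and we check the per-round RoI constraint; a safe strategy is one for which this check passes for every possible profile of competing bids. The clearing rule (price = highest rejected bid) and the value curve are simplifying assumptions for exposition.

```python
# Toy model of one round of a uniform-price multi-unit auction with an m-uniform
# bid. The clearing rule and the value curve are illustrative assumptions only.

def clear_auction(my_bids, competing_bids, supply):
    """my_bids and competing_bids are lists of (bid, quantity) pairs.
    Returns (units won by the buyer, uniform clearing price)."""
    units = [(b, "me") for b, q in my_bids for _ in range(q)]
    units += [(b, "other") for b, q in competing_bids for _ in range(q)]
    units.sort(key=lambda u: u[0], reverse=True)
    winners = units[:supply]
    price = units[supply][0] if len(units) > supply else 0.0  # highest rejected bid
    won = sum(1 for _, owner in winners if owner == "me")
    return won, price

def satisfies_roi(my_bids, marginal_values, competing_bids, supply, gamma=1.0):
    """Per-round RoI constraint: cumulative value of won units >= gamma * payment."""
    won, price = clear_auction(my_bids, competing_bids, supply)
    return sum(marginal_values[:won]) >= gamma * won * price

# Decreasing marginal values for up to 3 units, and a 2-uniform bid.
marginal_values = [10, 8, 5]
my_bids = [(6, 2), (3, 1)]
print(satisfies_roi(my_bids, marginal_values, [(7, 1), (4, 2)], supply=3))  # True
```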
→ Abstract and Bio
Machine learning can significantly improve average performance for decision-making under uncertainty in a wide range of domains. However, ensuring robust, risk-aware decisions—a critical need in high-stakes settings—requires well-calibrated uncertainty estimates; yet in high-dimensional settings, there can be many valid uncertainty estimates, each with its own performance profile. That is, not all uncertainty is equally valuable for downstream decision-making. In this talk, I will discuss recent work developing an end-to-end learning framework to train machine learning models while enforcing uncertainty calibration and risk constraints through conformal prediction-based methods. Our proposed approach enables provable guarantees on calibration and risk control while providing consistent improvements over existing, two-stage baselines in applications spanning energy systems and medical image classification.
Bio: Nico Christianson is a Stanford Energy Postdoctoral Fellow and an incoming Assistant Professor of Computer Science at Johns Hopkins University (starting Fall 2026). His research lies broadly at the intersection of algorithms, machine learning, and optimization, with a specific emphasis on the development of new, theoretically grounded algorithms and AI/ML methods for reliable decision-making under uncertainty. Much of his work is motivated by modern energy and sustainability challenges, with applications ranging from energy resource operation to sustainable computing systems. Nico received his PhD in computing and mathematical sciences from Caltech in 2025, where he was supported by an NSF Graduate Research Fellowship and a PIMCO Data Science Fellowship. His PhD dissertation won Caltech's Ben P.C. Chou Doctoral Prize in Information Science and Technology and the Demetriades-Tsafka-Kokkalis Prize in Renewable Energy. Before Caltech, Nico received an AB in applied mathematics from Harvard College.
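As background on the conformal-prediction machinery mentioned in the abstract, here is a minimal sketch of standard split conformal prediction for regression; it is shown only to fix ideas and is not the end-to-end framework from the talk. The toy model and data are made up for illustration.

```python
import numpy as np

# Minimal split conformal prediction for regression: calibrate a residual
# quantile on held-out data to get intervals with approximate coverage 1 - alpha.

def conformal_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Return (lower, upper) prediction intervals with ~(1 - alpha) coverage,
    assuming calibration and test points are exchangeable."""
    residuals = np.abs(y_cal - predict(X_cal))       # nonconformity scores
    n = len(residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n),
                    method="higher")
    preds = predict(X_test)
    return preds - q, preds + q

# Toy usage: a "model" that just returns the first feature (the true signal here).
rng = np.random.default_rng(0)
X_cal = rng.normal(size=(200, 1)); y_cal = X_cal[:, 0] + rng.normal(scale=0.3, size=200)
X_test = rng.normal(size=(5, 1))
predict = lambda X: X[:, 0]
lo, hi = conformal_interval(predict, X_cal, y_cal, X_test)
print(np.c_[lo, hi])
```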
Previous Talks This Year
→ Abstract and Bio
Artificial intelligence (AI) has been growing at an unprecedented pace. Many of us have experienced a "ChatGPT moment" — a realization that AI will profoundly transform our lives. While numerous challenges and calls for improvement remain, there is little doubt that AI agents will play a central role in shaping our future. We argue, however, that the prevailing perspective on AI agent design is insufficient for achieving desirable social welfare, not merely due to computational or regulatory constraints. While it is understood that AI agents should be orchestrated in order to be used by an organization, and that system-level outcomes depend on more than the design of individual agents, the far more intricate reality is that the combination of misaligned incentives and incompatible technological designs may lead to poor social outcomes. Our argument is not merely conceptual but constitutes a concrete call to action: to establish a systematic research agenda on Artificial Social Intelligence, tackling multi-agent alignment among incentive-wise and technology-wise diverse AI agents. We illustrate this vision through four complementary research directions: (i) understanding multi-agent alignment in information retrieval (search, RAG, attribution) ecosystems, (ii) analyzing model selection in language-based economics as a strategic choice, (iii) rethinking fairness and regulation through the lens of multi-agent ethics, and (iv) designing hybrid social laws for human-AI coexistence. Together, these directions outline a roadmap toward welfare-maximizing AI societies — an essential step toward socially aligned intelligence.
Bio: Moshe Tennenholtz is a professor at the Technion -- Israel Institute of Technology, which he joined in 1993, and holds the Sondheimer Technion Academic Chair. In 2008 Moshe founded the Microsoft Research activity in Israel and served as its leader until 2014, with significant contributions and distinguished ROI. He was also the founder of the Technion-Microsoft research center, serving as its scientific director, and has served as co-founder and chief scientist of several startups. In joint work with colleagues and students he has introduced several contributions to the interplay between artificial intelligence and game theory/economics, such as the study of artificial social systems, co-learning, non-cooperative computing, distributed games, the axiomatic approach to ranking, reputation, recommendation and trust systems, competitive safety analysis, program equilibrium, mediated equilibrium, and learning equilibrium, as well as the first near-optimal algorithm for reinforcement learning in adversarial contexts.
→ Abstract and Bio
In domains where AI systems have achieved superhuman performance, there is an opportunity to study the similarities and contrasts between human and AI behaviors at the level of granular decisions, not just aggregate performance. This type of analysis can yield several potential sources of insight. First, by studying expert-level human decisions through the lens of systems that far surpass this performance, we can try to characterize the settings in which human errors are most likely to occur. Second, we can try to design systems whose decisions match human ones as closely as possible. And finally, we can ask whether it is possible to adapt superhuman AI so that its decisions can be usefully interleaved with human decisions, making them compatible in a way that allows collaboration. We pursue these goals in a domain with a long history in AI: chess. For our purposes, chess provides a setting with many different levels of human expertise; like other domains where people acquire expertise and mastery, it is a context in which people train over many years, drawing on more than a century of scholarship in the area, and acquire levels of skill far beyond what most practitioners can ever hope to achieve. And yet if we construe the goal of chess to be the winning of chess games, then algorithms have long since surpassed human beings, and by an increasingly enormous margin, allowing us to study what happens when powerful algorithms are introduced into a domain like this. We'll discuss a line of work that predicts human decisions in chess at a move-by-move level much more accurately than existing chess engines, and in a way that is tunable to fine-grained differences in human skill; then we'll talk about extensions that use this framework to create AI chess agents that are simultaneously superhuman but also more compatible with human decision-making. We'll use these results to reflect on what we can learn from chess as a setting that simultaneously exhibits both very high levels of human skill and AI that has progressed far into superhuman levels of ability. The talk is based on joint work with Ashton Anderson, Solon Barocas, Karim Hamade, Difan Jiao, Reid McIlroy-Young, Sendhil Mullainathan, Siddhartha Sen, Zhenwei Tang, Russell Wang, and Eric Xue.
Bio: Jon Kleinberg is the Tisch University Professor in the Departments of Computer Science and Information Science at Cornell University. His research focuses on the interaction of algorithms and networks, the roles they play in large-scale social and information systems, and their broader societal implications. He is a member of the National Academy of Sciences, the National Academy of Engineering, the American Academy of Arts and Sciences, and the American Philosophical Society, and he has served on advisory groups including the National AI Advisory Committee (NAIAC) and the National Research Council's Computer Science and Telecommunications Board (CSTB) and Committee on Science, Technology, and Law (CSTL). He has received MacArthur, Packard, Simons, Sloan, and Vannevar Bush research fellowships, as well as awards including the Nevanlinna Prize, the World Laureates Association Prize, the ACM/AAAI Allen Newell Award, and the ACM Prize in Computing.
→ Abstract and Bio
I will review the progress of large language models for mathematics over the last 3 years, from barely solving high-school-level mathematics to solving some minor open problems in convex optimization, combinatorics, and probability theory. The emphasis will be on trying to identify the shape of the current capability frontier, as it stands today, finding out both where these models are helpful and where they still fall short as research assistants.
Bio: Sebastien Bubeck is currently a research lead at OpenAI. Previously he served as VP AI and Distinguished Scientist at Microsoft, spending 10 years in Microsoft Research, and before that he was an assistant professor at Princeton University. His work on machine learning, convex optimization, and online algorithms won several best paper awards, and more recently his work on Large (and Small) Language Models, including their applications to science, has been featured in mainstream media such as the New York Times and Wired.
→ Abstract and Bio
In this talk, I present a mathematical model for the spread of an epidemic from one community to another via travel. Here each community is modeled by a random network (for simplicity, we assume it is an Erdos-Renyi random graph), with the epidemic spread inside the community given by the SIR model on this graph. Travel is modeled by individuals moving from one community to the other at some rate η_T, and returning home at another rate η_H. We assume that the return rate is of the same order as the recovery rate of the epidemic, while η_T is much smaller. Under this assumption, we rigorously prove that if an epidemic starts in the first community, and the second community enacts a travel ban at the moment the epidemic is large enough to be detectable, such a travel ban is ineffective in preventing a large outbreak in the second community. By contrast, other mitigation measures like masks or vaccinations (modeled by reducing the rate of infections in the second community) are effective.
Bio: Christian Borgs is a professor in the Berkeley AI Research Group (BAIR) in the EECS department at Berkeley, and faculty director of the Bakar Institute of Digital Materials for the Planet. Borgs is a Fellow of the American Mathematical Society and the American Association for the Advancement of Science. Borgs' current research focuses on both AI for science and the science of networks, including mathematical foundations, particularly the theory of graph limits, a.k.a. graphons (which he co-invented about 15 years ago), graph processes, graph algorithms, and applications of graph theory from economics to systems biology and epidemics.
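The sketch below is a crude agent-based caricature of this setting (two Erdos-Renyi communities, SIR dynamics, rare travel at rate η_T and return at rate η_H); it is meant only to make the parameters concrete and is not the rigorous model analyzed in the talk. All parameter values are arbitrary illustrative choices.

```python
import random

# Crude agent-based caricature of the two-community model: SIR dynamics on two
# Erdos-Renyi graphs, with individuals occasionally visiting the other community
# (rate eta_T) and returning home (rate eta_H).

def er_graph(n, p, rng):
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def simulate(n=300, p=0.03, beta=0.1, recover=0.2, eta_T=0.001, eta_H=0.2,
             steps=200, seed=0):
    rng = random.Random(seed)
    graphs = [er_graph(n, p, rng), er_graph(n, p, rng)]
    state = [["S"] * n, ["S"] * n]        # S / I / R, indexed by home community
    away = [[False] * n, [False] * n]     # is the individual currently visiting?
    state[0][0] = "I"                     # epidemic seeded in community 0
    for _ in range(steps):
        for c in (0, 1):                  # travel and return
            for i in range(n):
                if not away[c][i] and rng.random() < eta_T:
                    away[c][i] = True
                elif away[c][i] and rng.random() < eta_H:
                    away[c][i] = False
        new_state = [row[:] for row in state]
        for c in (0, 1):
            for i in range(n):
                if state[c][i] != "I":
                    continue
                host = 1 - c if away[c][i] else c
                # Toy simplification: a visitor mixes as if it were node i of the
                # host community's graph.
                for j in graphs[host][i]:
                    if state[host][j] == "S" and not away[host][j] \
                            and rng.random() < beta:
                        new_state[host][j] = "I"
                if rng.random() < recover:
                    new_state[c][i] = "R"
        state = new_state
    return [sum(s != "S" for s in comm) for comm in state]

print(simulate())  # [ever infected in community 0, ever infected in community 1]
```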
→ Abstract and Bio
Large language models (LLMs) sometimes generate statements that are plausible but factually incorrect—a phenomenon commonly called "hallucination." We argue that these errors are not mysterious failures of architecture or reasoning, but rather predictable consequences of standard training and evaluation incentives. We show (i) that hallucinations can be viewed as classification errors: when pretrained models cannot reliably distinguish a false statement from a true one, they may produce the false option rather than saying "I don't know"; (ii) that optimization of benchmark performance encourages guessing rather than abstaining, since most evaluation metrics penalize expressing uncertainty; and (iii) that a possible mitigation path lies in revising existing benchmarks to reward calibrated abstention, thus realigning incentives in model development. Joint work with Santosh Vempala (Georgia Tech) and Ofir Nachum & Edwin Zhang (OpenAI).
Bio: Adam Tauman Kalai is a Research Scientist at OpenAI, specializing in AI Safety and Ethics. His research interests also include algorithms, AI theory, and game theory. Adam earned his BA from Harvard University and his PhD from Carnegie Mellon University, after which he served as an Assistant Professor at TTIC and Georgia Tech and a Senior Principal Researcher at Microsoft Research New England. He is also a member of Project CETI's science team.
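A back-of-the-envelope calculation makes point (ii) concrete: under plain accuracy, a model that is only 30% confident still maximizes its expected score by guessing, while a hypothetical abstention-aware metric flips that incentive. The specific penalty values below are illustrative assumptions, not the benchmark revisions proposed in the work.

```python
# Toy illustration of the benchmark-incentive argument: under plain accuracy,
# guessing dominates abstaining at any nonzero confidence, while a scoring rule
# that rewards calibrated abstention can flip the incentive.

def expected_score(p_correct, action, wrong_penalty=0.0, abstain_credit=0.0):
    """Expected benchmark score on one question; action is 'guess' or 'abstain'."""
    if action == "abstain":
        return abstain_credit
    return p_correct * 1.0 + (1 - p_correct) * wrong_penalty

p = 0.3  # the model's (calibrated) probability that its best guess is right

# Plain accuracy: wrong answers and abstentions both score 0 -> guessing wins.
print(expected_score(p, "guess"), expected_score(p, "abstain"))          # 0.3 vs 0.0

# Abstention-aware metric: wrong answers cost -1, abstention scores 0.
# Now guessing is only worthwhile when p > 0.5.
print(expected_score(p, "guess", wrong_penalty=-1.0),
      expected_score(p, "abstain"))                                      # -0.4 vs 0.0
```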
→ Abstract and Bio
As AI becomes increasingly integrated into both the private and public sectors, challenges around AI safety and policy have arisen. There is a growing, compelling body of work around the legal and societal challenges that come with AI, but there is a gap in our rigorous understanding of these problems. In this talk, I dive deep into a few topics in AI safety and policy. We will discuss AI supply chains (the increasingly complex ecosystem of AI actors and components that contribute to AI products) and study how AI supply chains complicate machine learning objectives. We'll then shift our discussion to AI audits and evidentiary burdens in cases involving AI. Using Pareto frontiers as a tool for assessing performance-fairness tradeoffs, we will show how a closed-form expression for performance-fairness Pareto frontiers can help plaintiffs (or auditors) overcome evidentiary burdens or a lack of access in AI contexts. I'll conclude with a longitudinal study of LLMs during the 2024 US election season. If time permits, we may touch on formal notions of trustworthiness.
Bio: Sarah Cen is a postdoc at Stanford University and incoming Assistant Professor at Carnegie Mellon University's Departments of ECE & EPP. At Stanford, Sarah works with Prof. Percy Liang in Computer Science and Prof. Daniel Ho in the Stanford Law School. Her research is interdisciplinary and inspired by works in machine learning, economics, law, and policy. She has ongoing work on algorithmic auditing, AI supply chains, due process for AI determinations, risk under the EU AI Act, and formalizing trustworthy algorithms. Previously, Sarah received her BSE in Mechanical Engineering from Princeton University and Master's in Engineering Science (Robotics) from Oxford University, where she worked on autonomous vehicles.
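As a generic illustration of the Pareto-frontier tool mentioned above (not the closed-form expression derived in the work), the sketch below extracts an empirical performance-fairness frontier from a handful of hypothetical model configurations.

```python
# Generic empirical Pareto-frontier extraction over candidate models, each
# summarized by (performance, fairness) where higher is better on both axes.

def pareto_frontier(points):
    """Return the subset of points not dominated by any other point."""
    frontier = []
    for p in points:
        dominated = any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)
        if not dominated:
            frontier.append(p)
    return sorted(frontier)

# (accuracy, fairness score) for a few hypothetical model configurations.
models = [(0.91, 0.62), (0.89, 0.71), (0.85, 0.80), (0.84, 0.79), (0.90, 0.60)]
print(pareto_frontier(models))   # [(0.85, 0.80), (0.89, 0.71), (0.91, 0.62)]
```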
→ Abstract and Bio
As agents move from the lab into real-world settings, designers have a limited ability to anticipate the agent's context and design explicit safeguards. In this talk, I will outline challenges that this raises from the perspective of designing flexible, robust, and aligned agent behaviors. The key to the approach is to design agents that can model and respond to appropriate uncertainty about a user's intended goal and the normative environment they are deployed into. I will begin with a survey of current alignment techniques and AI agents, then outline the theoretical motivation for this approach. Next, I will describe recent work from my lab that attempts to address this problem by 1) designing flexible goal inference mechanisms that can track the set of plausible user goals reliably from context; and 2) integrating these inference tools with efficient agent designs that leverage POMDP solvers in order to train agents that implement belief-constrained behaviors. I will conclude with a discussion of recent work that evaluates collaborative agents and discuss the implications for the design of aligned systems that augment and integrate with human users and intent.
Bio: Dylan Hadfield-Menell is an Associate Professor of EECS at MIT. His research develops methods to ensure that AI systems' behavior aligns with the goals and values of their human users and society as a whole, a concept known as 'AI alignment'. His goal is to enable the safe, beneficial, and trustworthy deployment of AI in real-world settings.
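To make the goal-inference idea concrete, here is a minimal, hypothetical sketch of Bayesian goal inference under a noisy-rational (softmax) action model; it is a generic textbook-style construction, not the specific mechanism developed in the speaker's lab, and the goals, actions, and Q-values are made up.

```python
import math

# Minimal Bayesian goal inference: maintain a belief over candidate user goals
# and update it from observed actions under a softmax-rational action model.

def softmax_action_prob(action, goal, q_value, beta=2.0):
    """P(action | goal) when the user acts softmax-rationally w.r.t. q_value."""
    actions = {a for g, a in q_value if g == goal}
    z = sum(math.exp(beta * q_value[(goal, a)]) for a in actions)
    return math.exp(beta * q_value[(goal, action)]) / z

def update_belief(belief, action, q_value):
    """One Bayesian update of the belief over goals after observing an action."""
    posterior = {g: p * softmax_action_prob(action, g, q_value)
                 for g, p in belief.items()}
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()}

# Two candidate goals, two actions; Q-values encode how useful each action is.
q_value = {("make_coffee", "go_kitchen"): 1.0, ("make_coffee", "go_desk"): 0.0,
           ("fetch_laptop", "go_kitchen"): 0.0, ("fetch_laptop", "go_desk"): 1.0}
belief = {"make_coffee": 0.5, "fetch_laptop": 0.5}
belief = update_belief(belief, "go_kitchen", q_value)
print(belief)   # belief shifts toward "make_coffee"
```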
→ Abstract and Bio
From healthcare delivery to resilient power grid management, optimization has the potential to improve decision-making for some of today's most pressing problems, but its use is often limited by the mathematical expertise required to model and solve complex problems. This talk will showcase the potential of generative AI to lower this barrier and democratize access to advanced optimization tools. Motivated by a collaboration with Microsoft Outlook, the first part of the talk will present a novel framework for interactive decision support for non-expert users that leverages large language models (LLMs) to translate user requests into an underlying constraint programming model. We investigate this framework through the lens of meeting scheduling, and showcase its potential via a user study with a prototype system. In the second part of the talk, we demonstrate how LLMs can be used to automatically generate problem-specific optimization solver configurations, a challenging task for even expert optimization users. Our approach achieves up to 70% speed-ups over default solver settings with little-to-no additional compute. We will conclude by discussing broader opportunities for integrating natural language and optimization, moving toward a future where powerful decision-making tools are as accessible for managers at a local food bank as they are for applied scientists at Amazon.
Bio: Connor Lawless is a Postdoctoral Fellow at the Stanford Institute for Human-Centered Artificial Intelligence advised by Ellen Vitercik and Madeleine Udell. His research blends tools from optimization, machine learning, and human-computer interaction to make advanced analytics tools more accessible and trustworthy. He received his PhD in Operations Research from Cornell University where he was advised by Oktay Gunluk, and previously spent time at Microsoft Research, IBM Research, and the Royal Bank of Canada.
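To give a flavor of the kind of constraint programming model such an LLM translation layer might target, here is a toy meeting-scheduling model written with OR-Tools CP-SAT. The meetings, slot grid, and availabilities are made-up illustrative data; this is not the Outlook prototype from the talk.

```python
from ortools.sat.python import cp_model

# Toy meeting-scheduling model of the kind an LLM translation layer might emit
# from a natural-language request ("fit a standup and a design review into my
# free afternoon slots").

slots = range(8)                                   # half-hour slots in one afternoon
meetings = {"standup": 1, "design_review": 2}      # meeting -> length in slots
busy = {"standup": {0, 1}, "design_review": {5, 6, 7}}   # blocked slots per meeting

model = cp_model.CpModel()
start = {m: model.NewIntVar(0, len(slots) - length, m)
         for m, length in meetings.items()}
intervals = [model.NewIntervalVar(start[m], length, start[m] + length, m + "_iv")
             for m, length in meetings.items()]
model.AddNoOverlap(intervals)                      # the same person attends both

for m, blocked in busy.items():                    # respect availability
    length = meetings[m]
    for s in blocked:
        for offset in range(length):               # forbid covering a blocked slot
            v = s - offset
            if 0 <= v <= len(slots) - length:
                model.Add(start[m] != v)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for m in meetings:
        print(m, "starts at slot", solver.Value(start[m]))
```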
→ Abstract and Bio
Traditional evaluation methods for large language models (LLMs)—often centered on accuracy in static multiple-choice or short-answer questions—fail to capture real-world complexities. As LLMs increasingly serve users in dynamic, multicultural contexts, we must redefine meaningful evaluation. This talk presents our recent research advancing LLM evaluation through culturally aware, socially grounded, and customizable benchmarks. We assess factual consistency across languages, everyday knowledge in underrepresented cultures, and cultural inclusivity. We highlight that biases become evident in generation tasks, reflecting actual LLM use. Central to our approach is BenchHub, a unified benchmark suite categorizing over 300,000 questions across diverse domains and cultures, enabling tailored evaluations. BenchHub underscores domain-specific variations and the critical role benchmark composition plays in LLM performance rankings. These insights demonstrate that accuracy alone is insufficient; comprehensive LLM evaluation must consider culture, context, and customization. This talk advocates a broader evaluation agenda, presenting foundational steps toward robust, inclusive assessments.
Bio: Alice Oh is a Professor in the School of Computing at KAIST. Her major research area is at the intersection of natural language processing (NLP) and computational social science, with a recent focus on multilingual and multicultural aspects of LLMs. She collaborates with scholars in humanities and social sciences such as political science, education, and history. She has served as Program Chair for ICLR 2021 and NeurIPS 2022, General Chair for ACM FAccT 2022 and NeurIPS 2023, and DEI Chair for COLM 2024. She is the current President of SIGDAT, which oversees EMNLP.
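A tiny numeric example illustrates the composition point: the same two models swap ranks when the benchmark's domain mix changes. The per-domain accuracies and mixes below are made-up numbers, not BenchHub data.

```python
# Small illustration of how benchmark composition can change model rankings.

scores = {                     # per-domain accuracy of two hypothetical models
    "model_A": {"STEM": 0.90, "culture": 0.55},
    "model_B": {"STEM": 0.75, "culture": 0.80},
}

def benchmark_score(model, mix):
    """Weighted accuracy under a given domain mix (weights sum to 1)."""
    return sum(mix[d] * scores[model][d] for d in mix)

stem_heavy = {"STEM": 0.8, "culture": 0.2}
culture_heavy = {"STEM": 0.3, "culture": 0.7}

for mix in (stem_heavy, culture_heavy):
    ranking = sorted(scores, key=lambda m: benchmark_score(m, mix), reverse=True)
    print(mix, "->", ranking)
# STEM-heavy mix ranks model_A first; culture-heavy mix ranks model_B first.
```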
→ Abstract and Bio
After pre-training, large language models are aligned with human preferences based on crowdsourced pairwise comparisons. State-of-the-art alignment methods (such as PPO-based RLHF and DPO) are built on the assumption of aligning with a single preference model, despite being deployed in settings where users have diverse preferences. As a result, it is not even clear that these alignment methods produce models that satisfy users in any meaningful way. In this work, we ask a deceptively simple yet foundational question: Do state-of-the-art alignment methods actually produce models that satisfy users on average in the presence of heterogeneous preferences? Drawing on social choice theory, and modeling each user's comparisons via an individual Bradley–Terry (BT) model, we introduce the distortion of an alignment method: the worst-case ratio between the optimal achievable average utility and the average utility of the learned policy. This notion yields concrete insights into alignment with heterogeneous preferences. In particular, we establish an impossibility result for aligning to average user utility — counter to the conventional wisdom that ML methods, even if imperfect for every individual, at least perform well on average. Distortion also highlights sharp differences between alignment methods: we show that widely used approaches such as RLHF and DPO can have exponentially large — or even unbounded — distortion, whereas a constant minimax-optimal distortion is achievable via a method inspired by social choice theory, known as maximal lotteries, or Nash Learning from Human Feedback.
Bio: Nika Haghtalab is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at UC Berkeley. She works on a broad and versatile set of problems related to machine learning, algorithms, economics, and society. Her work contributes to an emerging mathematical foundation for learning and decision-making systems in the presence of economic and societal forces. Her work has been recognized by a Sloan fellowship (2024), Schmidt Sciences AI2050 award, NSF CAREER (2022), Google Research Scholar award (2023), NeurIPS and ICAPS best paper awards, EC exemplary track paper awards, and several other industry awards and fellowships.
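The following toy example (made-up utilities, not an instance from the paper) illustrates the quantity being bounded: with heterogeneous users, aligning to the majority preference can pick a response whose average utility is several times worse than optimal; distortion is the worst case of this ratio over instances.

```python
# Toy illustration of the ratio underlying "distortion" with heterogeneous users:
# compare the average utility of the response a majority-aligned method picks
# against the best achievable average utility. Distortion is the worst case of
# this ratio over all instances; here we compute it on a single made-up instance.

utilities = {                      # user -> {response: utility}
    "u1": {"a": 1.0, "b": 0.0},
    "u2": {"a": 1.0, "b": 0.0},
    "u3": {"a": 0.0, "b": 10.0},
}

def avg_utility(response):
    return sum(u[response] for u in utilities.values()) / len(utilities)

# A method that aligns to the majority preference (u1 and u2 prefer "a") picks "a".
majority_pick = "a"
best = max(("a", "b"), key=avg_utility)

ratio = avg_utility(best) / avg_utility(majority_pick)
print(best, round(ratio, 2))   # "b" is optimal on average; the ratio is 5.0
```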
→ Abstract and Bio
Training data is now recognized as a key driver of the performance of AI systems. Indeed, AI companies are signing multi-million-dollar deals for training data acquisition, raising the question: how "should" this training data be priced? Understanding how to value training data requires us to understand the downstream impact of this data on model behavior—which is made challenging by the complex, uninterpretable nature of large-scale ML models. In the first part of this talk, we present some recent work on tracing model performance back to training data—improving on a long line of prior work in machine learning, our method can optimally (in a natural sense) predict the impact of training data on model performance. In the second part of the talk, we propose a framework for studying data pricing theoretically, inspired by our experimental results in the first part of the talk. We conclude with some open questions and directions.
Bio: Andrew is an incoming Assistant Professor at CMU. Previously, he was a Stein Fellow at Stanford and a PhD student at MIT, where he was supported by an Open Philanthropy AI Fellowship. His interests are currently in understanding and predicting the effects of design choices on downstream machine learning systems.
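As a point of reference for the attribution problem, here is a brute-force leave-one-out sketch on a tiny ridge-regression model; this naive baseline is shown only to fix ideas and is not the (far more scalable) predictive method discussed in the talk. The data, model, and regularization strength are illustrative choices.

```python
import numpy as np

# Naive leave-one-out data attribution on a small ridge-regression model:
# score each training point by how much removing it changes test loss.

def ridge_fit(X, y, lam=1e-2):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def test_loss(w, X_test, y_test):
    return float(np.mean((X_test @ w - y_test) ** 2))

def loo_scores(X, y, X_test, y_test):
    """Positive score = removing the point hurts test performance (valuable data)."""
    base = test_loss(ridge_fit(X, y), X_test, y_test)
    scores = []
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w = ridge_fit(X[mask], y[mask])
        scores.append(test_loss(w, X_test, y_test) - base)
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3)); w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=50)
y[0] += 5.0                                   # corrupt one label
X_test = rng.normal(size=(200, 3)); y_test = X_test @ w_true
print(loo_scores(X, y, X_test, y_test).argmin())  # the corrupted point scores lowest
```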
Previous talks can be found here.
About The Seminar
Seminar Organizers: Amin Saberi, Nikil Selvam, Xizhi Tan, Ellen Vitercik.
Faculty Involved: Itai Ashlagi, Ashish Goel, Ramesh Johari, Amin Saberi, Aaron Sidford, Johan Ugander, Irene Lo, Ellen Vitercik.
Note for Speakers: The talk is 55 minutes including questions (as we often start a couple of minutes late). If you are giving a talk at RAIN, please plan for a 45-50 minute talk, since the audience usually asks a lot of questions. Also, the audience is fairly knowledgeable, so speakers should not feel obligated to provide basic game-theoretic, algorithmic, societal, industrial, probabilistic, or statistical background.
Website template from the Stanford MLSys Seminar Series.
