The RAIN seminar is held on Wednesdays from 4:305:30pm via Zoom this year. You can subscribe to the seminar mailing list by visiting here. To view any recorded talks, you need to use your Stanford SUNet ID and password.
RAIN schedule for Spring 2021
Date  Speaker  Topic  Comment 

May 5

Mukund Sundararajan  Using Attribution to Understand Deep Neural Networks


Apr. 28

Lihua Lei  Hierarchical Community Detection for Heterogeneous and Multiscaled Networks


Apr. 21

Shipra Agrawal  Dynamic Pricing and Learning under the Bass Model


Apr. 14

Paul Milgrom  Investment Incentives in NearOptimal Mechanisms


Apr. 7

Michael Leung  Network ClusterRobust Inference

RAIN schedule for Winter 2021
Date  Speaker  Topic  Comment 

Mar. 3

Arun Chanrashaker  Identifying the latent space geometry of network models through analysis of curvature


Feb. 24

Annie Liang  Data and Incentives


Feb. 17

Jean PougetAbadie  Design and Analysis of Bipartite (Market) Experiments

RAIN schedule for Fall 2020
Date  Speaker  Topic  Comment 

Dec. 2

José Correa  The Value of Observability in Dynamic Pricing


Nov. 18

Guillaume Basse  Displacement Effects in a Hot Spot Policing Intervention in Medellin: Inference and Pitfalls

Time: 8:309:30AM PT (RAIN/OR Joint Seminar) 
Nov. 4

Katy Craig  Gradient Flows in the Wasserstein Metric: From Discrete to Continuum via Regularization

(OR Seminar) 
Oct. 28

Christopher Musco  Optimal Stochastic Trace Estimation

(OR Seminar) 
Oct. 21

Karl Rohe  Vintage Factor Analysis with Varimax Performs Statistical Inference


Oct. 14

Ruta Mehta  Nash Social Welfare Approximation for Strategic Agents


Oct. 7

Carrie Wu  Developing Data Efficient Algorithms for AI

(SOAL Seminar) 
Sep. 30

Betsy Ogburn  Social network dependence, the replication crisis, and (in)valid inference

Google Calendar for RAIN
Previous year's talks
Archived talks can be accessed here.
Talk Abstracts
Using Attribution to Understand Deep Neural NetworksMukund Sundararajan
Abstract:
Predicting cancer from XRays seemed great
Until we discovered the true reason.
The model, in its glory, did fixate
On radiologist markings  treason!
We found the issue with attribution:
By blaming pixels for the prediction (1,2,3,4,5,6).
A complement'ry way to attribute,
is to pay training data, a tribute (1).
If you are int'rested in FTC,
counterfactual theory, SGD
Or Shapley values and fine kernel tricks,
Please come attend, unless you have conflicts
Should you build deep models down the road,
Use attributions. Takes ten lines of code!
There once was an RS called MS,
He studied models that are a mess,
A director at Google.
Accurate and frugal,
Explanations are what he liked best.
Hierarchical Community Detection for Heterogeneous and Multiscaled Networks
Lihua Lei
Abstract: Realworld networks are often hierarchical, heterogeneous, and multiscaled, while the idealized stochastic block models that are extensively studied in the literature tend to be oversimplified. In a line of work, we propose several topdown recursive partitioning algorithms which start with the entire network and divide the nodes into two communities by certain spectral clustering methods repeatedly, until a stopping rule indicates no further community structures. For these algorithms, the number of communities does not need to be known a priori or estimated consistently. On a broad class of hierarchical network models, our algorithms are proved to achieve the exact recovery for sparse networks with expected node degrees logarithmic in the network size, and are computationally more efficient than nonhierarchical spectral clustering algorithms. More interestingly, we identify regimes where no algorithm can recover all communities simultaneously while our algorithm can still recover the megacommunities (unions of communities defined by the hierarchy) consistently without recovering the finest structure. Our theoretical results are based on the newly developed twotoinfinity eigenspace perturbation theory for binary random matrices with independent or dependent entries.
Bio: Lihua Lei is a postdoctoral researcher in Statistics at Stanford University, advised by Emmanuel Candès. Prior to joining Stanford, he obtained his Ph.D. in statistics at UC Berkeley, advised by Peter Bickel and Michael Jordan. His research areas include network analysis, causal inference, conformal inference, multiple hypothesis testing, and stochastic optimization.Dynamic Pricing and Learning under the Bass Model
Shipra Agrawal
Abstract: We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters (α, β) that are linked to the socalled "innovation" and "imitation" effects. Unlike the more commonly used i.i.d. demand models, in this model the price posted not only affects the demand and the revenue in the current round but also the evolution of demand, and hence the fraction of market potential that can be captured, in future rounds. Finding a revenuemaximizing dynamic pricing policy in this model is nontrivial even when model parameters are known, and requires solving for the optimal nonstationary policy of a continuoustime, continuousstate MDP. In this paper, we consider the problem of dynamic pricing is used in conjunction with learning the model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon. Our main contribution is an algorithm with a regret guarantee of O (m^2/3), where m is mnemonic for the (known) market size. Moreover, we show that no algorithm can incur smaller order of loss by deriving a matching lower bound. We observe that in this problem the market size m, and not the time horizon T, is the fundamental driver of the complexity; our lower bound in fact indicates that for any fixed α,β, most nontrivial instances of the problem have constant T and large m. This insight sets the problem setting considered here uniquely apart from the MAB type formulations typically considered in the learning to price literature.
Bio: Shipra Agrawal is Cyrus Derman Assistant Professor of the Department of Industrial Engineering and Operations Research. She is also affiliated with the Department of Computer Science and the Data Science Institute, at Columbia University. She received her Ph.D. from Stanford University in June 2011 under the guidance of Prof. Yinyu Ye and was a researcher at Microsoft Research India from 2011 to 2015. Her research spans several areas of optimization and machine learning, including online optimization under uncertainty, multiarmed bandits, online learning, and reinforcement learning. Shipra serves as an associate editor for Management Science, Mathematics of Operations Research, and INFORMS Journal on Optimization. Her research is supported by a Google Faculty Research Award, an Amazon research award, and an NSF CAREER Award.Investment Incentives in NearOptimal Mechanisms
Paul Milgrom
Abstract: In many realworld resource allocation problems, optimization is computationally intractable, so any practical allocation mechanism must be based on an approximation algorithm. We study investment incentives in strategyproof mechanisms that use such approximations. In sharp contrast with the VickreyClarkGroves mechanism, for which individual returns on investments are aligned with social welfare, we find that some algorithms that approximate efficient allocation arbitrarily well can nevertheless create misaligned investment incentives that lead to arbitrarily bad overall outcomes. However, if a nearefficient algorithm "excludes bossy negative externalities," then its outcomes remain nearefficient even after accounting for investments. A weakening of this "XBONE" condition is necessary and sufficient for the result.
Bio: Paul Milgrom is the Shirley and Leonard Ely professor of Humanities and Sciences in the Department of Economics at Stanford University and professor, by courtesy, at both the Department of Management Science and Engineering and the Graduate School of Business. He is the world's leading auction designer, known for his work on auction theory and innovative resource allocation methods, particularly in radio spectrum. He is the corecipient of the 2020 Nobel Prize in Economic Sciences, together with Robert Wilson, "for improvements to auction theory and inventions of new auction formats."Network ClusterRobust Inference
Michael Leung
Abstract: Since network data commonly consists of observations on a single large network, researchers often partition the network into clusters in order to apply clusterrobust inference methods. All existing such methods require clusters to be asymptotically independent. We prove that for this requirement to hold, under certain conditions, it is necessary and sufficient for clusters to have low conductance, the ratio of edge boundary size to volume, which yields a measure of cluster quality. We show in simulations that, for important classes of networks lacking lowconductance clusters, clusterrobust methods can exhibit substantial size distortion, whereas for networks with such clusters, they outperform HAC estimators. To assess the existence of lowconductance clusters and construct them, we draw on results in spectral graph theory showing a close connection between conductance and the spectrum of the graph Laplacian. Based on these results, we propose to use the spectrum to compute the number of lowconductance clusters and spectral clustering to compute the clusters. We illustrate our results and proposed methods in simulations and empirical applications.
Bio: Michael Leung is an economist at USC whose work focuses on developing econometric methods for network data.Identifying the latent space geometry of network models through analysis of curvature
Arun Chandrasekhar
Abstract: Statistically modeling networks, across numerous disciplines and contexts, is fundamentally challenging because of (often highorder) dependence between connections. A common approach assigns each person in the graph to a position on a lowdimensional manifold. Distance between individuals in this (latent) space is inversely proportional to the likelihood of forming a connection. The choice of the latent geometry (the manifold class, dimension, and curvature) has consequential impacts on the substantive conclusions of the model. More positive curvature in the manifold, for example, encourages more and tighter communities; negative curvature induces repulsion among nodes. Currently, however, the choice of the latent geometry is an a priori modeling assumption and there is limited guidance about how to make these choices in a datadriven way. In this work, we present a method to consistently estimate the manifold type, dimension, and curvature from an empirically relevant class of latent spaces: simply connected, complete Riemannian manifolds of constant curvature. Our core insight comes by representing the graph as a noisy distance matrix based on the ties between cliques. Leveraging results from statistical geometry, we develop hypothesis tests to determine whether the observed distances could plausibly be embedded isometrically in each of the candidate geometries. We explore the accuracy of our approach with simulations and then apply our approach to datasets from economics and sociology as well as neuroscience. Paper.
Bio: Arun Chandrasekhar is an Associate Professor of Economics at Stanford University. His work focuses on development economics. He studies the role that social networks play in developing countries. In particular, he is interested in how the economics of networks can help us understand information aggregation failures and breakdown of cooperation in the developing world.
Data and Incentives
Annie Liang
Abstract: Markets for lending and insurance incentivize good behavior by forecasting risk on the basis of past outcomes. As "big data" expands the set of covariates used to predict risk, how will these incentives change? We show that "attribute" data which is informative about consumer quality tends to decrease effort, while "circumstance" data which predicts idiosyncratic shocks to outcomes tends to increase it. When covariates are independent, this effect is uniform across all consumers. Under more general forms of correlation, this effect continues to hold on average, but observation of a new covariate may lead to disparate impactincreasing effort for some consumer groups and decreasing it for others. A regulator can improve social welfare by restricting the use of either attribute or circumstance data, and by limiting the use of covariates with substantial disparate impact. Paper.
Bio: Annie Liang is an Assistant Professor of Economics and an Assistant Professor of Computer Science at Northwestern University. Her work focuses on economic theory and the application of machine learning techniques in the social sciences. She has studied the dynamics of strategic information acquisition, as well as the use of machine learning to evaluate and improve economic models.
Design and Analysis of Bipartite (Market) Experiments
Jean PougetAbadie
Abstract: Bipartite experiments are randomized experiments where treatment is applied to one set of units, while outcomes are measured on a different set of units. The interactions between the treated units and the "outcome" units can be captured by a bipartite graph. Bipartite experiments are a recent object of study in causal inference, whereby treatment is applied to one set of units and outcomes of interest are measured on a different set of units. These experiments are particularly useful in settings where strong interference effects occur between units of a bipartite graph. In market experiments for example, assigning treatment at the sellerlevel and measuring outcomes at the buyerlevel (or viceversa) may lead to causal models that better account for the interference that naturally occurs between buyers and sellers. In this talk, we will cover the motivation for and formal setting of bipartite experiments. Furthermore, we will explore possible design choices for such experiments, namely clustered randomized designs, as well as various unbiased inference methods. Paper (a). Paper (b).
Bio: Jean is a research scientist at Google NY, on the Algorithms and Optimization team, led by Vahab Mirrokni. Before Google, Jean was a PhD student in computer science at Harvard University, advised by Edoardo Airoldi and Salil Vadhan. Prior to that, he was an undergraduate at Ecole Polytechnique in Paris. His recent research interests focus on causal inference, particularly when interactions between units are present. For more information, see website .
The Value of Observability in Dynamic Pricing
José Correa
Abstract: We consider a dynamic pricing problem where a firm sells one item to a single buyer in order to maximize expected revenues. The firm commits to a price function over an infinite horizon. The buyer arrives at some random time with a private value for the item. He is more impatient than the seller and strategizes the time of his purchase in order to maximize his expected utility, which implies either buying immediately or waiting to benefit from a lower price. We study how important is to observe the buyer arrival time in terms of the seller's expected revenue. When the seller can observe the arrival of the buyer, she can make the price function contingent on his arrival time. On the contrary, when the seller cannot observe the arrival, her price function is fixed at time zero for the whole horizon. The value of observability (VO) is defined as the worst case ratio between the expected revenue of the seller when she observes the buyer's arrival and that when she does not. Our main result establishes that in a very general setting about valuation and arrival time distributions, the value of observability is bounded by a small constant. To obtain this bound we fully characterize the observable arrival setting and use this solution to construct a random and periodic price function for the unobservable case. This is joint work with Dana Pizarro and Gustavo Vulcano
Bio: José Correa is a full professor in the Department of Industrial Engineering at Universidad de Chile. José obtained a mathematical engineering degree from Universidad the Chile in 1999 and a PhD in Operations Research from MIT in 2004. His research, focusing in algorithmic game theory and mechanism design, has received numerous awards including an ACM SIGecom best paper award, an INFORMS Transportation Science and Logistics best paper awards, a Tucker prize finalist, and research awards from Amazon and Google. José has given keynote talks at several institutions and conferences and has been in the program committee of international computer science conferences. He also serves and has served in the editorial board of some of the leading journals of his field: Mathematical Programming B, Mathematics of Operations Research (as Game Theory Area Editor), and Operations Research.
Displacement Effects in a Hot Spot Policing Intervention in Medellin: Inference and Pitfalls
Guillaume Basse
Abstract: In hot policing, resources are targeted at specific locations predicted to be at high risk of crime; socalled ``hot spots.'' Rather than reduce overall crime, however, there is a concern that these interventions simply displace crime from the targeted locations to nearby nonhot spots. We address this question in the context of a largescale randomized experiment in Medellin, Colombia, in which police were randomly assigned to increase patrols at a subset of possible hotspots. Estimating the displacement effects on control locations is difficult because the probability that a nearby hotspot is treated is a complex function of the underlying geography. While existing methods developed for this ``general interference'' setting, especially HorvitzThompson (HT) estimators, have attractive theoretical properties, they can perform poorly in practice and mislead practitioners. In this talk, I explore the key pitfalls that practitioners should watch out for when conducting this type of analysis, and propose some ways to partially remedy them.
Bio: Guillaume Basse is an Assistant Professor in the MS&E and Statistics departments at Stanford. His research focuses broadly on Causal Inference and Design of Experiments in complex settings. He is particularly interested in complications arising when interventions spillover across space and / or time.
Gradient Flows in the Wasserstein Metric: From Discrete to Continuum via Regularization
Katy Craig
Abstract: Over the past ten years, optimal transport has become a fundamental tool in statistics and machine learning: the Wasserstein metric provides a new notion of distance for classifying distributions and a rich geometry for interpolating between them. In parallel, optimal transport has led to new theoretical results on the stability and long time behavior of partial differential equations through the theory of Wasserstein gradient flows. These two lines of research recently intersected in a series of works that characterized the dynamics of training neural networks with a single hidden layer as a Wasserstein gradient flow. In this talk, I will briefly introduce the mathematical theory of Wasserstein gradient flows and describe recent results on discrete to continuum limits. In particular, I will show how passing from the discrete to continuum limit by introducing an appropriate regularization can lead to faster rates of convergence, as well as novel, deterministic particle methods for diffusive processes.
Bio: Katy Craig is an assistant professor in the department of mathematics at the University of California, Santa Barbara. Prior to coming to UCSB, Katy received her Ph.D. from Rutgers University, held an NSF postdoc at UCLA, and held a UC President’s Postdoctoral Fellowship at UCSB.
Optimal Stochastic Trace Estimation
Christopher Musco
Abstract: I will discuss algorithms for an important computational primitive in linear algebra: approximately computing the trace of an implicit matrix A that can only be accessed through matrixvector multiplications. Trace approximation finds applications across machine learning and scientific computing, where it is used to compute matrix norms, spectral densities, logdeterminants, triangle counts in graphs, and much more. In 1990, Hutchinson introduced an elegant randomized algorithm for the trace approximation problem that has become ubiquitous in practice. I will introduce a simple modified version of this algorithm that provides the same theoretical guarantees, but requires quadratically fewer matrixvector multiplications, and performs far better in experiments. We pair this result with matching lower bounds based on reductions to communication complexity and hypothesis testing for spikedcovariance matrices. Our lower bounds fall into a broader research agenda of better understanding the computational complexity of basic linear algebra problems in the restricted "matrixvector query" model of computation, which generalizes common algorithmic frameworks like linear sketching and Krylov subspace methods. Joint work with Raphael A. Meyer, Cameron Musco, and David P. Woodruff. Paper.
Vintage Factor Analysis with Varimax Performs Statistical Inference
Karl Rohe
Abstract: Vintage Factor Analysis is nearly a century old and remains popular today with practitioners. A key step, the factor rotation, is historically controversial because it appears to be unidentifiable. This controversy goes back as far as Charles Spearman. The unidentifiability is still reported in all modern multivariate textbooks. This talk will overturn this controversy and provide a positive theory for PCA with a varimax rotation. Just as sparsity helps to find a solution in p>n regression, we show that sparsity resolves the rotational invariance of factor analysis. PCA + varimax is fast to compute and provides a unified spectral estimation strategy for Stochastic Blockmodels, topic models (LDA), and nonnegative matrix factorization. Moreover, the estimator is consistent for an even broader class of models and the old factor analysis diagnostics (which have been used for nearly a century) assess the identifiability. Paper.
Bio: Karl Rohe is an Associate Professor of Statistics at the University of WisconsinMadison, with courtesy appointments in Journalism, Educational Psychology, and Electrical & Computer Engineering. He is an AE at JRSSB and JASA. His PhD is from Berkeley in 2011, working with Bin Yu.
Nash Social Welfare Approximation for Strategic Agents
Ruta Mehta
Abstract: A central goal in the long literature on fair division is the design of mechanisms that implement fair outcomes, despite the participants' strategic behavior. We study this question by measuring the fairness of an allocation using the geometric mean of the agents' values, known as the Nash social welfare (NSW). This objective is maximized by widely known concepts such as the Nash bargaining solution, proportional fairness, and the competitive equilibrium with equal incomes; we focus on (approximately) implementing this objective and analyze the Trading Post mechanism. We consider allocating goods that are substitutes or complements and show that this mechanism achieves an approximation of 2 for concave utility functions, and becomes essentially optimal for complements, where it can reach (1+e) for any e > 0. Moreover, we show that the Nash equilibria of this mechanism are pure and provide individual fairness in the sense of proportionality. (Joint work with Simina Branzei and Vasilis Gkatzelis. To appear in Operations Research.)
Bio: Ruta Mehta is an assistant professor in CS at UIUC, working on algorithmic, complexity, strategic, fairness, and learning aspects of various gametheoretic and economic problems. Prior to this, she was a postdoctoral fellow at the Simons Institute, UC Berkeley (Aug'15  Dec'15), and at Georgia Tech (Sept'12  July'15). She did her Ph.D. from IITBombay. Her Ph.D. thesis, titled "Nash Equilibrium Computation in Various Games" won the ACM India Doctoral Dissertation Award, 2012. Other awards she received include Best Postdoctoral Fellow Award, 2014 at Georgia Tech, and the NSF CAREER Award, 2018.
Developing Data Efficient Algorithms for AI
Carrie Wu
Abstract: Our increasingly ambitious goals in artificial intelligence motivate several key algorithmic challenges, such as: how do we design algorithms that make the best use of the data that is available, and how do we design algorithms that are empirically and theoretically effective on the kinds of data that we often see in practice, for example, data with temporal dependencies and data that follow distributions that are hard to describe. In this talk, I will give examples of algorithmic solutions that addresses some of these challenges. I will first present a theoretical analysis of rates of convergence for SGD with experience replay, which is a technique used in Reinforcement Learning to break temporal differences in data. I will then present an algorithm that solves Markov Decision Processes with nearly optimal sample and runtime guarantees. Lastly, I will present an algorithmic solution for estimating local density for an arbitrary dataset.
Social network dependence, the replication crisis, and (in)valid inference
Betsy Ogburn
Abstract: In the first part of this talk, we show that social network dependence can result in /spurious associations due to network dependence/, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are wellknown, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we test for network dependence and for possible spurious associations in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure. But data with network dependence abounds, and in many settings researchers are explicitly interested in learning about social network dynamics. Therefore, there is high demand for methods for causal and statistical inference with social network data. The second part of the talk describes recent work on causal inference for observational data from a single social network, focusing on (1) new types of causal estimands that are of interest in social network settings, and (2) conditions under which central limit theorems hold and inference based on approximate normality is licensed.
Bio: Dr. Elizabeth (Betsy) Ogburn is currently an Associate Professor in the Department of Biostatistics at Johns Hopkins University. She is also the founder of the COVID19 Collaboration Platform. She received her Ph.D. in Biostatistics from Harvard University, and her research interests include causal inference and epidemiological methods.
Social network dependence, the replication crisis, and (in)valid inference
Betsy Ogburn
Abstract: In the first part of this talk, we show that social network dependence can result in /spurious associations due to network dependence/, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are wellknown, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we test for network dependence and for possible spurious associations in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure. But data with network dependence abounds, and in many settings researchers are explicitly interested in learning about social network dynamics. Therefore, there is high demand for methods for causal and statistical inference with social network data. The second part of the talk describes recent work on causal inference for observational data from a single social network, focusing on (1) new types of causal estimands that are of interest in social network settings, and (2) conditions under which central limit theorems hold and inference based on approximate normality is licensed.
Bio: Dr. Elizabeth (Betsy) Ogburn is currently an Associate Professor in the Department of Biostatistics at Johns Hopkins University. She is also the founder of the COVID19 Collaboration Platform. She received her Ph.D. in Biostatistics from Harvard University, and her research interests include causal inference and epidemiological methods.