Skip to content Skip to navigation

The RAIN seminar is held on Wednesdays from 12:00-1:00pm in Y2E2 101 . And yes, lunch is provided!

RAIN schedule for Spring 2019

Date Speaker Topic
April 10
Daniela Saban Sequential Procurement with Contractual and Experimental Learning
April 17
David Shmoys A Practical 4-Approximation Algorithm for Coflow Scheduling
May 8
Inbal Talgam-Cohen Contract Theory: An Algorithmic Approach
May 15
Arvind Narayanan Identification of anonymous DNA using genealogical triangulation
May 22
Leman Akoglu A Quest for Structure: Joint Graph Structure & Semi-Supervised Inference
May 29
Vahideh Manshadi Information Inundation on Platforms and Implications

RAIN schedule for Winter 2019

Date Speaker Topic
January 16
Vasilis Gkatzelis Cost-Sharing Methods for Scheduling Games under Uncertainty
January 23
Ehud Kalai A Rudimentary Index of Strategic Stability: Deterring Defectors, Sustaining Loyalists and Forming Equilibrium
January 30
Justin Grimmer The Effect of Identifying Constituents on Representative-Constituent Communication
February 20
Rediet Abebe Computational Interventions to Improve Access to Opportunity for Disadvantaged Populations
February 27
Itai Gurvich On the Taylor Expansion of Value Functions
March 6
Irene Lo Dynamic Matching in School Choice: Efficient Seat Reassignment after Late Cancellations
March 13
Aaron Clauset The role of academic environment on scientific productivity and prominence

RAIN schedule for Fall 2018

Date Speaker Topic
October 24
Omer Reingold A Complexity-Theoretic Perspective on Algorithmic Fairness
December 5
Yiling Chen Surrogate Scoring Rules and A Dominant Truth Serum

Google Calendar for RAIN

Previous year's talks

Archived talks can be accessed here.

Talk Abstracts

A Complexity-Theoretic Perspective on Algorithmic Fairness
Omer Reingold, Stanford

Abstract: Machine learning and data analysis have enjoyed tremendous success in a broad range of domains. These advances hold the promise of great benefits to individuals, organizations, and society as a whole. Undeniably, algorithms are informing decisions that reach ever more deeply into our lives, from news article recommendations to criminal sentencing decisions to healthcare diagnostics. This progress, however, raises (and is impeded by) a host of concerns regarding the societal impact of computation. A prominent concern is that left to their own devices, algorithms will propagate - even amplify - existing biases. Often, definitions of fairness are group-based, typically requiring that a given statistic be equal across a few demographic groups, socially identified as deserving protection. Other definitions aim at individual-level protections. Broad-stroke statistical guarantees tend to be much easier to satisfy (information and complexity theoretically) but tend to be much weaker in the protections they provide. We will discuss recent research towards bridging the gap between statistical and individual protections. These works provide efficient learning algorithms that ensure protection (according to some fairness notion) to every sub-population within some rich class of sets. Our approach to defining the class of sets to protect is complexity-theoretic. We aim at obtaining the strongest fairness guarantees that can be obtained with the available computational resources. Based on joint works with Ursula Hebert-Johnson, Michael P. Kim and Guy Rothblum.

Surrogate Scoring Rules and A Dominant Truth Seru
Yiling Chen, Harvard

Abstract: Strictly proper scoring rules (SPSR) are widely used when designing incentive mechanisms to elicit private information from strategic agents using realized ground truth signals, and they can help quantify the value of elicited information. In this paper, we extend such scoring rules to settings where a mechanism designer does not have access to ground truth. We consider two such settings: (i) a setting when the mechanism designer has access to a noisy proxy version of the ground truth, with known biases; and (ii) the standard peer prediction setting where agents' reports are the only source of information that the mechanism designer has. We introduce surrogate scoring rules (SSR) for the first setting, which use the noisy ground truth to evaluate quality of elicited information. We show that SSR preserves the strict properness of SPSR. Using SSR, we then develop a multi-task scoring mechanism -- called dominant truth serum (DTS) -- to achieve strict properness when the mechanism designer only has access to agents' reports. In comparison to standard equilibrium concepts in peer prediction, we show that DTS can achieve truthfulness in a multi-task dominant strategy. A salient feature of SSR and DTS is that they quantify the quality of information despite lack of ground truth, just as proper scoring rules do for the with verification setting. Our method is verified both theoretically and empirically using data collected from real human participants. This talk is based on joint work with Yang Liu.

Cost-Sharing Methods for Scheduling Games under Uncertainty
Vasilis Gkatzelis, Drexel

Abstract: We study the performance of cost-sharing protocols in a selfish scheduling setting with load-dependent cost functions. Previous work on selfish scheduling protocols has focused on two extreme models: omnipotent protocols that are aware of every machine and every job that is active at any given time, and oblivious protocols that are aware of nothing beyond the machine they control. The main focus of this talk is on a well-motivated middle-ground model of "resource-aware" protocols, which are aware of the set of machines that the system comprises, but unaware of what jobs are active at any given time. Apart from considering budget-balanced protocols, to which previous work was restricted, we augment the design space by also studying the extent to which overcharging can lead to improved performance.

A Rudimentary Index of Strategic Stability: Deterring Defectors, Sustaining Loyalists and Forming Equilibrium
Ehud Kalai, Northwestern

Abstract: A rudimentary index explains Nash-equilibrium choices observed in behavioral economics. A rudimentary index explains Nash-equilibrium choices observed in behavioral economics. The index assigns stability levels φ(π) = 0; 1; :::; n, to strategy profiles π of n-person games: Level 0 is assigned to profiles that are not Nash equilibrium, levels 1,2,...n-1 are assigned to Nash equilibria in increasing levels of stability, and level n is assigned to dominant-strategy equilibria. The index measures players' confidence in the optimality of their choices; and expands Nash's question of "whether a strategy profile deters individual defectors" to "what size defector groups does a strategy profile deter".

The Effect of Identifying Constituents on Representative-Constituent Communication
Justin Grimmer, Stanford

Abstract: Social media sites present an opportunity to connect constituents with elected officials, but platforms rarely enable elected officials to identify constituents or for constituents to identify themselves. We show how providing this information with a ``constituent badge" affects elected officials' and users' behavior. Randomizing users' access to the badge we find that elected officials are no more responsive to users with access to the badge. And we show this null result is unlikely to be because elected officials failed to understand the constituent badge. Access to the badge, however, affects how users interact with elected officials. The badge causes individuals to write higher quality comments that are more focused on politics and less deferential to the elected official's original post. And the badge deepens participation among individuals who are otherwise less likely to participate. Our results demonstrate how the mere opportunity to formally adopt a role can alter behavior.

Computational Interventions to Improve Access to Opportunity for Disadvantaged Populations
Rediet Abebe, Cornell

Abstract: Poverty and economic hardship are highly complex and dynamic phenomena. Due to the multi-faceted nature of economic welfare, assistance programs aimed at improving access to opportunity for disadvantaged populations face challenges, as they often rely on simpler measures of economic hardship such as income or wealth that fail the full complexity of economic welfare. In this presentation, we explore on important dimension -- susceptibility to income shocks. We introduce and analyze a model of economic welfare that incorporates income, wealth, and external shocks and poses the question of how to allocate subsidies in this setting. We find that we can give optimal allocation mechanisms under natural assumptions on the agent's wealth and shock distributions, as well as approximation schemes in the general setting. We conclude with a discussion on how algorithmic, computational, and mechanism design techniques can help inform interventions to improve access to opportunity in relevant domains and the Mechanism Design for Social Good initiative, which facilitates research at this interface.

On the Taylor Expansion of Value Functions
Itai Gurvich, Cornell

Abstract: Abstract: We introduce a framework for approximate dynamic programming that we apply to discrete time chains on $\mathbb{Z}_+^d$ with countable action sets. The framework is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time where the transition matrix is reduced to its first and second moments. We develop bounds on the optimality gap---the sub-optimality introduced by using the control produced by the ``Taylored'' equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. Computationally, our framework leads to an ``aggregation'' approach with performance guarantees. While the guarantees are grounded in PDE theory, the practical use of this approach requires no knowledge of that theory.

The Effect of Identifying Constituents on Representative-Constituent Communication
Irene Lo, Stanford

Abstract: In the school choice market, where scarce public school seats are assigned to students, a key operational issue is how to reassign seats that are vacated after an initial round of centralized assignment. Practical solutions to the reassignment problem must be simple to implement, truthful, efficient and fair while also alleviating costly student movement between schools. In this talk, I will propose and axiomatically justify a class of reassignment mechanisms, the Permuted Lottery Deferred Acceptance (PLDA) mechanisms. Our mechanisms generalize the commonly used Deferred Acceptance (DA) school choice mechanism to a two-round setting and retain its desirable incentive, fairness and efficiency properties. School choice systems typically run Deferred Acceptance with a lottery number assigned to each student to break ties in school priorities. I will show that under natural conditions on demand, correlating the tie-breaking lotteries across rounds preserves allocative welfare, and reversing the first-round lottery order minimizes reassignment among all PLDA mechanisms. Empirical investigations based on data from NYC high school admissions support our theoretical findings. This is based on joint work with Itai Feigenbaum, Yash Kanoria and Jay Sethuraman.

The role of academic environment on scientific productivity and prominence
Aaron Clauset, UC Boulder

Abstract: Although meritocracy is a key organizing principle of academia, the social mechanisms that explain individual and institutional differences in scholarly outputs are poorly understood. For instance, which is a better predictor of early-career productivity and prominence, where a scientist was trained or where they currently work? In this talk, I'll describe a causal investigation of the factors that drive scholarly productivity and prominence, which exploits a comprehensive dataset of over 200,000 publications, 7.4 million citations, and job placement information for nearly 2500 early-career Computer Science faculty in the U.S. and Canada. Using a matched-pairs experimental design around initial faculty job placement, I'll show that the prestige of an individual's working environment is a causal driver of their productivity and prominence, not the prestige of their doctoral training. Moreover, this effect cannot be explained by the selection or retention of relatively more productive faculty by departments. I'll then argue that the wide differences faculty productivity and prominence across different departments is best explained by differences in their environmental characteristics, e.g., the number of faculty, of doctoral students, etc. As a result, the scholarly output of early-career faculty is driven by where they work, not by where they trained, and their current productivity and prominence cannot be separated from their place in the academic system. I'll close with a brief discussion of the implications of these results for the emerging field of the science of science, and for academic policy. Joint work with Samuel F. Way, Allison C. Morgan, and Daniel B. Larremore.

Sequential Procurement with Contractual and Experimental Learning
Daniela Saban, Stanford

Abstract: In this paper, we study the design of sequential procurement strategies that integrate stochastic and strategic information. We consider a buyer who repeatedly demands one unit of the same good and is unable to commit to long-term contracts. In each time period, the buyer makes a price offer to a seller who has private, persistent information regarding his cost and quality of provision. If the offer is accepted, the seller provides the good with a stochastic quality that is not contractible; otherwise, the buyer sources the good from a known outside option. In our model, the buyer can learn from observations of the (strategic) acceptance decisions taken by the seller, and from evaluations of the (stochastic) quality delivered whenever a purchase occurs. We show that the buyer's equilibrium strategy corresponds to a dynamic sequence of thresholds on the beliefs on the seller's type. In equilibrium, the buyer offers high prices that facilitate quality experimentation at early stages of the interaction with the seller, and after gathering enough information (and if beliefs are sufficiently low), she advances to offering low prices that may partially separate seller types. To highlight the interplay between the two forms of information, we contrast the buyer's strategy with two benchmark strategies designed to learn from each form of information in isolation. We establish that, paradoxically, the ability to observe delivered quality is not always beneficial for the buyer. (joint work with Y Gur and G Macnamara)

A Practical 4-Approximation Algorithm for Coflow Scheduling
David Shmoys, Cornell

Abstract: Coflows are an emerging network abstraction introduced to model traffic patterns within a datacenter. This abstraction naturally leads to several scheduling optimization problems, and we consider, in particular, optimizing for the sum of weighted completion times. There has been recent work from both the theory and systems communities on this scheduling problem, leading to either practical solutions that perform well in practice and are easy to implement, or theoretical solutions with bounded performance guarantees, albeit not both. In this work, we bridge this gap, and present Sincronia, a network design that matches the best known approximation guarantee, while empirically outperforming the existing state-of-the-art network designs. Sincronia achieves this using a key technical result — we show that given a “right” ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within a factor of 4 of the optimal as long as (co)flows are prioritized with respect to the ordering. We obtain this right ordering of coflows using a primal-dual algorithm. This is joint work with Saksham Agarwal, Shijin Rajakrishnan, Akshay Narayan, Rachit Agarwal, and Amin Vahdat.

Contract Theory: An Algorithmic Approach
Inbal Talgam-Cohen, Technion

Abstract: We consider the classic principal-agent model of contract theory [Holmstrom'79], in which a principal designs an outcome-dependent compensation scheme to incentivize an agent to take a costly and unobservable action. When all of the model parameters-including the full distribution over principal rewards resulting from each agent action-are available to the designer in extensive form, an optimal contract can be computed by linear programming. This talk will examine contract theory through the algorithmic lens, presenting two main results: First, if the principal knows only the first moment of each action's reward distribution, we prove that linear contracts are guaranteed to be worst-case optimal, ranging over all reward distributions consistent with the given moments. This provides a new explanation for the prevalence of this class of simple contracts. Second, we consider a natural class of succinct principal-agent settings, capturing scenarios like a marketing agent whose actions lead to different probabilities of sales of various items. We show how to utilize the succinct structure by an Ellipsoid-based algorithm to obtain a near-optimal contract. We conclude by discussing the possible role of algorithms in future research on contract theory and its applications. Joint work with Paul Duetting and Tim Roughgarden.

Identification of anonymous DNA using genealogical triangulation
Arvind Narayanan, Princeton

Abstract: Consumer genetics databases hold dense genotypes of millions of people, and the number is growing quickly. In 2018, law enforcement agencies began using such databases to identify anonymous DNA via long-range familial searches. We show that this technique is far more powerful if combined with a genealogical database of the type collected by online ancestry services. We present a "genealogical triangulation" algorithm and study its effectiveness on simulated datasets. We show that for over 50% of targets, their anonymous DNA can be identified (matched to the correct individual or same-sex sibling) when the genetic database includes just 1% of the population. We also show the effectiveness of "snowball identification" in which a successful identification adds to the genetic genealogical database, increasing the identification accuracy for future instances. We discuss our technique's potential to enhance law enforcement capabilities as well as its privacy risks.

A Quest for Structure: Joint Graph Structure & Semi-Supervised Inference
Leman Akoglu, CMU

Abstract: Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that the nearby points on the data manifold are likely to share a label. Graph-based SSL constructs a graph from point-cloud data as an approximation to the underlying manifold, followed by label inference. It is no surprise that the quality of the constructed graph in capturing the essential structure of the data is critical to the accuracy of the subsequent inference step. How should one construct a graph from the input point-cloud data for graph-based SSL? In this talk I will introduce a Parallel Graph Learning framework (called PG-Learn) that learns the graph structure and infers the labels jointly. PG-Learn has two main ingredients: (1) a gradient-based optimization of the edge weights (specifically, different kernel bandwidths in each dimension) based on a validation loss objective, and (2) a parallel hyper-parameter search algorithm with an adaptive resource allocation scheme. The latter early-terminates searches based on relative performance, in order to dynamically allocate resources (time and processors) to those with promising configurations. Put together, the solution is a hybrid of iterative local and parallel random search that strategically navigates the hyper-parameter space. Importantly, PG-learn is scalable; both in terms of runtime (linear in dimensionality d and log-linear in number of samples n) as well as memory (linear in both n and d).

Information Inundation on Platforms and Implications
Vahideh Manshadi, Yale

Abstract: We study a model of information consumption where consumers sequentially interact with a platform that offers a menu of signals (posts) about an underlying state of the world (fact). At each time, incapable of consuming all posts, consumers screen the posts and only select (and consume) one from the offered menu. We show that in the presence of uncertainty about the accuracy of these posts, and as the number of posts increases, adverse effects such as slow learning and polarization arise. Specifically, we establish that, in this setting, bias emerges as a consequence of the consumer's screening process. Namely, consumers, in their quest to choose the post that reduces their uncertainty about the state of the world, choose to consume the post that is closest to their own beliefs. We study the evolution of beliefs and we show that such a screening bias slows down the learning process, and the speed of learning decreases with the menu size. Further, we show that the society becomes polarized during the prolonged learning process even in situations where the societys belief distribution was not a priori polarized. (Joint work with Gad Allon and Kimon Drakopoulos)