
Archive of RAIN talks

RAIN schedule for Spring 2021

Date Speaker Topic Comment
June 2
Elena Manresa An Adversarial Approach to Structural Estimation
Time: 12-1pm PT
May 26
Mark Braverman Optimization-friendly generic mechanisms without money
May 5
Mukund Sundararajan Using Attribution to Understand Deep Neural Networks
Apr. 28
Lihua Lei Hierarchical Community Detection for Heterogeneous and Multi-scaled Networks
Apr. 21
Shipra Agrawal Dynamic Pricing and Learning under the Bass Model
Apr. 14
Paul Milgrom Investment Incentives in Near-Optimal Mechanisms
Apr. 7
Michael Leung Network Cluster-Robust Inference

RAIN schedule for Winter 2021

Date Speaker Topic Comment
Mar. 3
Arun Chandrasekhar Identifying the latent space geometry of network models through analysis of curvature
Feb. 24
Annie Liang Data and Incentives
Feb. 17
Jean Pouget-Abadie Design and Analysis of Bipartite (Market) Experiments

RAIN schedule for Fall 2020

Date Speaker Topic Comment
Dec. 2
José Correa The Value of Observability in Dynamic Pricing
Nov. 18
Guillaume Basse Displacement Effects in a Hot Spot Policing Intervention in Medellin: Inference and Pitfalls
Time: 8:30-9:30AM PT
(RAIN/OR Joint Seminar)
Nov. 4
Katy Craig Gradient Flows in the Wasserstein Metric: From Discrete to Continuum via Regularization
(OR Seminar)
Oct. 28
Christopher Musco Optimal Stochastic Trace Estimation
(OR Seminar)
Oct. 21
Karl Rohe Vintage Factor Analysis with Varimax Performs Statistical Inference
Oct. 14
Ruta Mehta Nash Social Welfare Approximation for Strategic Agents
Oct. 7
Carrie Wu Developing Data Efficient Algorithms for AI
(SOAL Seminar)
Sep. 30
Betsy Ogburn Social network dependence, the replication crisis, and (in)valid inference

RAIN schedule for Spring 2020

Date Speaker Topic Comment
May 20
Chris Musco TBA
(OR Seminar)
Apr. 8
Sandra Gonzalez-Bailon TBA

RAIN schedule for Winter 2020

Date Speaker Topic Comment
Mar. 11
Kent Quanrud TBA
(OR Seminar)
Mar. 4
Feb. 26
Faidra Monachou Discrimination in Online Markets: Effects of Social Bias on Learning from Reviews and Policy Design
second half
Feb. 26
Benjamin Plaut Beyond the First Welfare Theorem: Counteracting Inequality in Markets
first half
Feb. 19
Edward McFowland III Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations
Feb. 12
Avi Mandelbaum "Theompirical" Research of Service Systems
(OR Seminar)
Feb. 5
Haris Aziz Fair and Efficient Allocation of Indivisible Goods and Chores
Jan. 29
Warren Powell From Reinforcement Learning to Stochastic Optimization: A Universal Framework for Sequential Decision Analytics
(OR Seminar)
Jan. 22
Sham Kakade Representation, Modeling, and Optimization in Reinforcement Learning
(OR Seminar)
Jan. 15
Matt Weinberg (a biased selection of) Recent Developments in Combinatorial Auctions

RAIN schedule for Autumn 2019

Date Speaker Topic Comment
Dec. 4
Peng Shi Optimal Priority-Based Allocation Mechanisms
Nov. 20
Ali Aouad Click-Based MNL: Algorithmic Frameworks for Modeling Click Data in Assortment Optimization
(OR Seminar)
Nov. 13
Aleksandra Korolova Societal Concerns in Targeted Advertising
Oct. 30
Rupert Freeman Truthful Aggregation of Budget Proposals
Oct. 16
Nikhil Garg Driver Surge Pricing
second half
Oct. 16
Suleyman Kerimov Scrip Systems with Minimal Availability
first half
Oct. 2
Shane Henderson Under the Hood of Bike Sharing
(OR Seminar)

RAIN schedule for Spring 2018-2019

Date Speaker Topic
April 10
Daniela Saban Sequential Procurement with Contractual and Experimental Learning
April 17
David Shmoys A Practical 4-Approximation Algorithm for Coflow Scheduling
May 8
Inbal Talgam-Cohen Contract Theory: An Algorithmic Approach
May 15
Arvind Narayanan Identification of anonymous DNA using genealogical triangulation
May 22
Leman Akoglu A Quest for Structure: Joint Graph Structure & Semi-Supervised Inference
May 29
Vahideh Manshadi Information Inundation on Platforms and Implications

RAIN schedule for Winter 2018-2019

Date Speaker Topic
January 16
Vasilis Gkatzelis Cost-Sharing Methods for Scheduling Games under Uncertainty
January 23
Ehud Kalai A Rudimentary Index of Strategic Stability: Deterring Defectors, Sustaining Loyalists and Forming Equilibrium
January 30
Justin Grimmer The Effect of Identifying Constituents on Representative-Constituent Communication
February 20
Rediet Abebe Computational Interventions to Improve Access to Opportunity for Disadvantaged Populations
February 27
Itai Gurvich On the Taylor Expansion of Value Functions
March 6
Irene Lo Dynamic Matching in School Choice: Efficient Seat Reassignment after Late Cancellations
March 13
Aaron Clauset The role of academic environment on scientific productivity and prominence

RAIN schedule for Fall 2018-2019

Date Speaker Topic
October 24
Omer Reingold A Complexity-Theoretic Perspective on Algorithmic Fairness
December 5
Yiling Chen Surrogate Scoring Rules and A Dominant Truth Serum

RAIN schedule for Spring Quarter 2017-18

Date Speaker Topic
April 18
Sid Banerjee Allocating Resources, in the Future
April 25
Avi Feller The role of the propensity score in the synthetic control method
May 9
C. Seshadhri Algorithmically exploiting structure or: how I learned to stop worrying and enjoy real-world graphs
May 23
Miklos Racz High-dimensional random geometric graphs
June 6
Richard Cole Dynamics of Distributed Updating in Fisher Markets

RAIN schedule for Winter Quarter 2017-18

Date Speaker Topic
Jan 17
Dean Eckles Randomization inference for spillovers in networks
Jan 31
Alex Peysakhovich Building a cooperator
Feb 14
Mohammad Akbarpours Diffusion, Seeding, and the Value of Network Information
Feb 28
Rachel Cummings Differential Privacy for Growing Databases
Mar 14
Nikhil Devanur Lagrangian duality in mechanism design, a crash course.

RAIN schedule for Autumn Quarter 2017-18

Date Speaker Topic
Oct 11
Yang Cai Learning Multi-item Auctions with (or without) Samples
Oct 18
Michal Feldman Prophet Inequalities Made Easy: Stochastic Optimization by Pricing Non-Stochastic Inputs
Nov 1
Moritz Hardt Biases Beyond Observation
Nov 8
Jon Kleinberg Joint seminar with GSB Econ Theory
Time: 3:45 - 5 PM, Venue: GSB C105
Nov 15
Mihai Manea Bottleneck links, essential intermediaries, and competing paths of diffusion in networks.
Nov 29
Manuel Gomez Rodriguez On Fairness in Supervised Learning

RAIN schedule for Winter Quarter 2016-17

Date Speaker Topic
January 25
Devavrat Shah Blind Regression, Recommendation System and Collaborative Filtering
February 8
Udi Weinsberg Data-Science-Driven Products at Facebook
February 22
Anima Anandkumar Large-scale Machine Learning: Theory and Practice
March 8
Éva Tardos Learning and Efficiency of Outcomes in Games

RAIN schedule for Spring Quarter 2016-17

Date Speaker Topic
April 12
Alireza Tahbaz-Salehi Supply Chain Disruptions: Evidence from the Great East Japan Earthquake
April 26
Aaron Roth Quantifying Tradeoffs between Fairness and Accuracy in Online Learning
May 10
Mohsen Bayati Exploiting the Natural Exploration in Online Decision-Making
May 24
Herve Moulin Competitive Division of a Mixed Manna
May 31
John Dickerson Using Optimization to Balance Fairness and Efficiency in Barter Markets

Autumn 2016-17

Date Speaker Topic
October 12
Philippe Rigollet Online Learning in Repeated Auctions
(at OIT seminar, room M109 at GSB)
November 2
Dennis Feehan Network reporting methods for sample surveys
November 16
Mohammad Akbarpour Timing "versus" Matching: Dynamic Matching on Stochastic Networks
November 30
Ariel Procaccia Computational Social Choice: For the People

Spring 2015-16

Date Speaker Topic
April 13
Augustin Chaintreau Scaling up the Web Transparency: Algorithms and Challenges
April 27
David Rothschild Prediction Market Trading Strategies: manipulation and forecasting
May 11
Ali Jadbabaie Collective Phenomena in Complex Networks: Coordination and Social Learning
May 25
Sharique Hasan Conversational Peers and Idea Generation: Evidence from a Field Experiment

Winter 13-14

Date Speaker Topic
January 21 Edith Cohen Scalable mining of massive networks: Distance-based Centrality, Similarity, and Influence
February 11 Glen Weyl Quadratic Voting
February 18 Paul Milgrom Auctions and Decentralization in Complex Networks

Autumn 13-14

Date Speaker Topic
September 24 Balasubramanian Sivan Towards optimal algorithms for prediction with expert advice
October 8 Vivek Farias Online A-B Testing
October 29 Fil Menczer The spread of information in social media
November 19 Sergei Vassilvitskii Inverting a Steady-State

Spring 13-14

Date Speaker Topic
April 2 Omer Tamuz Strategic Learning and the Topology of Social Networks
April 16 Kostas Bimpikis Information Provision in Dynamic Innovation Tournaments
May 14 Jonathan Levin Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
May 28 Steve Tadelis Quality Externalities and the Limits of Reputation in Two-Sided Markets
June 4 Sinan Aral The Dynamics of Online Reputation

Winter 13-14

Date Speaker Topic
January 29 Derek Ruths Control Profiles of Complex Networks
February 5 Amin Karbasi From Small-World Networks to User-Driven Content Search
February 19 Arun Chandrasekhar Savings Monitors
February 26 Ariel Procaccia Computational Fair Division: From Cake Cutting to Cloud Computing
March 12 Niki Kittur Big Thinking: Augmenting human cognition with crowds and computation
March 19 Ricardo Baeza-Yates OSNs: The Wisdom of a Few and The Value of Users

Autumn 13-14

Date Speaker Topic
September 25 Tuomas Sandholm Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
October 23 Sharad Goel "Going Viral" and the Structure of Online Diffusion
November 06 Itay Fainmesser The Value of Network Information
December 04 Jennifer Neville Prediction in complex networks: The impact of structure on learning and prediction

Spring 12-13

Date Speaker Topic
April 03 Anirban Dasgupta Eliciting and aggregating information from the crowd
April 10 Noam Nisan Should auctions be complex?
April 17 David Pennock The extent of price misalignment in prediction markets
April 24 Yaron Singer Adaptive Seeding in Social Networks
May 08 Cristian Danescu-Niculescu-Mizil Language and Social Dynamics in Online Communities
May 15 Francis Bloch Pricing in Social Networks

Winter 12-13

Date Speaker Topic
Jan 09 Itai Ashlagi Current challenges in kidney exchange - need for long chains and dynamic matching
Jan 23 Katrina Ligett Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Jan 30 Al Roth Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Feb 7-8 - DIMACS Workshop on Economic Aspects of Information Sharing
Feb 20 Santo Fortunato Citation networks: Evaluating the impact of science
Mar 06 Michael Bernstein Crowd-Powered Systems

Autumn 12-13

Date Speaker Topic
Oct 03 Devavrat Shah Rumor in a network: Who's the culprit?
Oct 10 Moshe Babaioff On Bitcoin and Red Balloons
Oct 24 David Kempe Information Asymmetries in first-price common-value auctions, or The Value of Information
Nov 14 Lirong Xia Random Utility Models for Social Choice
Nov 28 Susan Athey Advertiser Behavior in Sponsored Search Auctions

Spring 11-12

Date Speaker Topic
April 04 Nello Cristianini Automatic Discovery of Patterns in Media Content
April 11 Kamesh Munagala Auctions and Allocations with Positive Network Externalities
April 18 Dan McFarland Adolescent Societies - Their Form, Evolution, and Variation
April 25 Jason Hartline The Simple Economics of Approximately Optimal Auctions
May 2 Khalid El-Arini Beyond Keyword Search: Taming Information Overload with Rich User Interactions
May 9 Kostas Bimpikis Optimal targeting strategies over social networks
May 16 Katharine Anderson Individual Heterogeneity and the Formation of Collaboration Networks
May 23 Dan Iancu Disturbance Feedback Policies in Dynamic Robust Optimization, with Applications to Supply Chain Design and Revenue Management
May 30 Kamal Jain Online Concave Matching

Winter 11-12

Date Speaker Topic
January 18 Michael Wellman Price Prediction Strategies for Simultaneous One-Shot Auctions
January 25 George Varghese Lattice Games and the Economics of Aggregators
February 1 Michael Kearns Competitive Contagion in Networks
February 8 Stephen Barley Building an Institutional Field to Corral a Government
February 15 Siddharth Suri Cooperation and Assortativity with Endogenous Partner Selection
February 22 - -
February 29 - -
March 6 Sergiu Hart Two(!) Good To Be True
March 7 Dean Eckles Identifying peer effects in online communication behaviors

Autumn 11-12

Date Speaker Topic
October 5 Jure Leskovec Networks, communities and the ground-truth
October 12 Vincent Conitzer Computing Game-Theoretic Solutions for Security
October 19 Costis Daskalakis Revenue-optimal Auctions
October 26 David Parkes A Random Graph Model of Multi-Hospital Kidney Exchanges
November 2 - -
November 9 Arnout van de Rijt A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
November 16 Markus Mobius Treasure Hunt
November 23 - Thanksgiving
November 30 Yiling Chen Task Routing for Prediction Tasks

Spring 10-11

Date Speaker Topic
April 6 Eric Friedman Bargaining Theory in the Cloud
April 13 - Organizational meeting
April 20 Matthew Jackson Network Patterns of Favor Exchange
April 27 Tim Roughgarden Simple Auctions with Near-Optimal Equilibria
May 4 Kevin Lang Efficient Online Ad Serving in a Display Advertising Exchange
May 11 Sergei Vassilvitskii Designing Algorithms for MapReduce
May 18 Arpita Ghosh Incentivizing High-quality User-Generated Content
May 25 Lada Adamic Coevolution of Network Structure and Content
June 1 Damon Centola -

Winter 10-11

Date Speaker Topic
January 12 Martin Suchara BGP Safety with Spurious Updates - The Conditions of BGP Convergence
January 19 No seminar -
January 26 Gabor Szabo Predicting the Popularity of Online Content
February 2 No seminar -
February 9 No seminar -
February 16 Elisa Celis Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
February 23 Yashodhan Kanoria Social Learning and the Dynamic Cavity Method
March 2 Matthias Mnich The Complexity Ecology of Consensus Ranking

Autumn 10-11

Date Speaker Topic
September 22 Elliot Anshelevich Contribution Games in Social Networks
September 29 Pranav Dandekar Liquidity in Credit Networks: A Little Trust Goes a Long Way
October 6 Sugato Basu User Browsing Models: Relevance versus Examination
October 13 Daria Sorokina Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
October 20 Marc Najork Reflections on Link-based Ranking
October 27
November 3 Panagiotis Papadimitriou Output URL Bidding
November 10 Roberto Bayardo A Technical Overview of Search Ads Quality at Google
November 17 Georgios Pierrakos On the Competitive Ratio of Online Sampling Auctions
November 24
December 1 Paolo Parigi The importance of ties

Spring 09-10

Date Speaker Topic
April 7 Gueorgi Kossinets How opinions are received by online communities: a case study on helpfulness votes
April 14 Grant Schoenebeck Reaching Consensus on Social Networks
April 21 Nilesh Dalvi Web-scale Information Extraction Using Wrappers
April 28
No meeting -- WWW 2010
May 5 Andrew Tomkins Characterization and Modeling of Online Browsing Activity
May 12 Xiaolin Shi User grouping behavior in online forums
May 19 Matthew Harding Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing
May 26 Neel Sundaresan Deconstructing A Long Tail Commerce Network
June 2 Monica Rogati Nod vs. High-Five: LinkedIn Connection Strength Models and Applications

Winter 09-10

Date Speaker Topic
Jan 13 Jure Leskovec Network of positive and negative edges
Jan 20 Esteban Arcaute Social Networks and Stable Matchings in the Job Market
Jan 27 Christina Aperjis Temporal User Behavior in Yahoo Answers
Feb 3 Ashish Goel Some open problems motivated by social networks
Feb 10 Yehuda Koren The Netflix Prize: Quest for $1,000,000
Feb 17 DJ Patil Data Jujitsu - the art of treating data as a product
Feb 24 Panayiotis Tsaparas Query-log analysis for improving web search
Mar 3 Arash Asadpour Stochastic Submodular Optimization
Mar 10 Ravi Kumar Compressibility of Behavioral Graphs
Mar 17 Sep Kamvar Aardvark

Autumn 09-10

Date Speaker Topic
Sep 30 Arvind Narayanan De-anonymizing Social Networks
Oct 7 Nicolas Lambert Eliciting Information on the Distribution of Future Outcomes
Oct 14 Lars Backstrom Optimizing Web Traffic via the Media Scheduling Problem
Oct 21 Michael Kearns Behavioral Experiments in Strategic Networks
Note special location: Terman Engineering Center, Room 453
Oct 28 Matthew Jackson How Social Network Structure affects Diffusion and Learning
Nov 4 Paul Valiant How to Design Profitable Auctions
Nov 11 Mohammad Mahdian Sort me if you can (or, Algorithms on Dynamic Data)
Nov 18 DJ Patil
Nov 25
No meeting -- Happy Thanksgiving!
Dec 2 Abdur Chowdhury Discovery & Emergence
Note special location: Terman Engineering Center, Room 453

Spring 08-09

Date Speaker Topic
Apr 3 Mayur Datar
Internet search advertising: Challenges in targeting and quality considerations
Apr 10
No meeting this week
Apr 17 D Sivakumar
Affiliation Networks
Apr 24
No meeting -- WWW 2009
May 1
No meeting -- BAGT
May 8 Sandeep Pandey
Yahoo! Research
Using Bandit Models in Search Engines
May 14 Richard Chatwin
VP Research, Adchemy
Research Challenges in Online Advertising
May 22 Robert Johnson
Scaling Social Data
May 29 Yury Lifshits
Yahoo! Research
Data Cloud: Integrating Structured Information into Web Search
June 5 Robbie Yan Manipulation Robustness of Collaborative Filtering Systems

Winter 08-09

Date Speaker Topic
Jan 16 - Organizational meeting
Jan 23 G.D. Ramkumar
SnapTell, Inc.
Scalable image recognition algorithms for camera phone applications
Jan 30 Luca de Alfaro Content-Driven Reputation and Trust for Collaborative Content
Feb 6 Ioannis Antonellis Tagging with Queries: How and Why?
(To appear in WSDM 2009)
Feb 13 Scott Brave
Baynote, Inc.
Content Targeting and Recommendation Systems
Feb 20 Ravi Kumar Online Social Networks: Modeling and Mining
Feb 27 Michael Mahoney Community Structure in Large Social and Information Networks
Mar 6 Kostas Kollias An Incentive-Based Architecture for Social Recommendations
Mar 13 Ali Dasdan Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction

Autumn 08-09

Date Speaker Topic
Oct 3 - Talk by Jure Leskovec and organizational meeting
Oct 10 Hady Lauw Trusts among Users of Online Communities -- an Epinions Case Study
Oct 17 Canceled -
Oct 24 Bahman Bahmani Online Budgeted Matching in Random Input Models with Applications to Adwords
Oct 31 Sergei Vassilvitskii Top-k Aggregation Using Intersection of Ranked Inputs
Nov 7 - No meeting - Bay Area Theory Seminar (BATS)
Nov 14 Mukund Sundararajan Utility-Maximizing Privacy Mechanisms
Nov 21 Hamid Nazerzadeh Online advertising
Nov 28 - No meeting - Happy Thanksgiving!
Dec 5 Tim Roughgarden From Bayesian to Worst-Case Optimal Auction Design

Spring 07-08

Date Speaker Topic
Apr 11 BAGT
Apr 18 Robbie Yan Reputation Markets
Apr 25 Cancelled for WWW
May 2 Bobji Mungamuru Click Fraud
May 9 Ying Xu Oral Examination
May 16 Esteban Arcaute Network Formation Games
May 23 Christina Aperjis Reputation Systems
May 30 Evimaria Terzi Inferring Links and Initiators
June 6 Yury Lifshits Social Design

Winter 07-08

Date Speaker Topic
Jan 25 Alon Altman An Axiomatic Approach to Personalized Ranking Systems
Feb 1 Ashish Goel Open Problems
Feb 8 Cancelled
Feb 15 Brian Babcock Adchemy
Feb 22 Rajat Bhattacharjee Oral Examination
Feb 29 Panagiotis Papadimitriou Web Graph Similarity for Anomaly Detection
Mar 7 Brad Null Ratio Index for Budgeted Learning, with Applications
Mar 14 Mukund Sundararajan Optimal Marketing Strategies over Social Networks

Autumn 07-08

Date Speaker Topic
Oct 15 Michael Schapira Interdomain Routing and Games
Oct 22 Esteban Arcaute Decentralized Search of Random Graphs
Oct 29 Orkut Buyukkokten Social Network Revolution
Nov 5 Aranyak Mehta Random input models for Adwords
Nov 12 Phil Long Boosting the area under the ROC curve
Nov 19 Thanksgiving
Nov 26 Cancelled
Dec 3 Arash Asadpour Ad auctions with reserved price

Spring 06-07

Date Speaker Topic
Apr 13 Ying Xu A discussion on WWW'07 papers
Apr 20 BAGT
Apr 27 Anish Das Sarma Detecting Near-Duplicates for Web Crawling
May 4 Mayur Datar Google News Personalization
May 11 Adam Meyerson Pricing Partially Compatible Products
May 18 Sergei Vassilvitskii Click Fraud Resistant Methods for Learning Click-Through Rates
May 25 Dilys Thomas Streaming Algorithms and Worm Fingerprinting
June 1 Neil Daswani Anatomy of Clickbot.A

Winter 06-07

Date Speaker Topic
Jan 26 Nicolas Lambert Asymptotically Optimal Repeated Auctions
Feb 2 Hamid Nazerzadeh Mechanism design based on CPA
Feb 9 Ashish Goel Reusable and multiple goods
Feb 16 Ying Xu Estimating Search Engine Index and Web size
Feb 23 Mukund Sundararajan Is efficiency expensive?
March 2 Paul Heymann Collaborative tagging systems
March 9 Mihaela Enachescu Multi-path Routing
March 16 Aleksandra Korolova / Dilys Thomas Clickthrough of and Bidding on keywords

Autumn 06-07

Date Speaker Topic
Oct 25 Ying Xu The Netflix Challenge
Nov 1 Ashish/Alexandra Social Networks
Nov 8 Arash/Shubha PageRank
Nov 15 TBA TBA
Nov 22 - Netflix follow-up
Nov 29 Zoltan Gyongyi TBA
Dec 6 Sreenivas Gollapudi TBA

Spring 05-06

Date Speaker Topic
Apr 7 Ashish Goel Axiomatic PageRank
Apr 14 - Organizational Meeting
Apr 21 Sergei Vassilvitskii Relevance Feedback and Web Search
Apr 28 Hamid Nazerzadeh AdWords with Unreliable Estimates
May 5 Mihaela Enachescu Compact Routing
May 12 Ying Xu Streaming Algorithms for Graph Problems
May 19 - -
May 26 Udi Manber New Problems in Search
Jun 2 Rajat Bhattacharjee Incentive Based Ranking Mechanisms

Winter 05-06

Date Speaker Topic
Jan 20 - Organizational Meeting
Jan 27 Brad Null Multiarmed Bandit Problem
Feb 3 Edith Cohen Spatially-Decaying Aggregation
Feb 10 Andrew Tomkins Routing and Growth Models for Social Networks
Feb 17 - BAGT
Feb 24 Amin Saberi Open Problems
Mar 3 Mayur Datar, Ashutosh Garg Recommendation Systems
Mar 10 Amruta Joshi Googlebase

Fall 05-06

Date Speaker Topic
Oct 7 Roee Engelberg Equilibria in Online Games
Oct 14 Ying Xu, Sergei Vassilvitskii Models for the Web Graph
Oct 21 Gagan Aggarwal AdWords
Oct 28 Omar Benjelloun SERF
Nov 4 Zoltan Gyongyi Link Spam
Nov 11 Mihaela Enachescu Query Incentive Networks
Nov 18 - BATS
Dec 2 Rajat Bhattacharjee Survey of Recommendation Systems

Spring 04-05

Date Speaker Topic
Apr 4 Shubha Nabar Impact of Search Engines on Evolution of Page Popularity
Apr 11 Svetlozar Nestorov Mining Structure from Semistructured Data
Apr 18 Sepandar Kamvar Reputation Management in P2P Networks
Apr 25 Rina Panigrahy PageRank Axioms
May 2 Amin Saberi Spread of Viruses on the Internet
May 9 Ramesh Johari Stable Policy Routing with Provider Independence
May 16 Rajat Bhattacharjee Reputation Systems
May 23 Mehran Sahami Matching Text on the Web
Jun 6 Amruta Joshi Modeling Documents

Winter 04-05

Date Speaker Topic
Feb 2 Ying Xu Recommendation Systems/Collaborative Filtering
Feb 9 Ashish Goel Spam and Page Rank
Feb 16 Gagan Aggarwal AdWords
Feb 23 David Arthur Gossip and File Sharing
Mar 2 Anil Kamath AdWords
Mar 10 Sergei Vassilvitskii
Mar 16 Dilys Thomas Social Networks


An Adversarial Approach to Structural Estimation
Elena Manresa

Abstract: We propose a new simulation-based estimation method, adversarial estimation, for structural models. The estimator is formulated as the solution to a minimax problem between a generator (which generates synthetic observations using the structural model) and a discriminator (which classifies whether an observation is synthetic). The discriminator maximizes the accuracy of its classification while the generator minimizes it. We show that, with a sufficiently rich discriminator, the adversarial estimator attains parametric efficiency under correct specification and the parametric rate under misspecification. We advocate the use of a neural network as a discriminator that can exploit adaptivity properties and attain fast rates of convergence. We apply our method to a model of the elderly's saving decisions and show that including gender and health profiles in the discriminator uncovers the bequest motive as an important source of saving across the wealth distribution, not only for the rich.
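The generator-discriminator minimax described above can be sketched in a toy setting. This is an illustrative mock-up, not the authors' estimator: the structural "model" here is a pure location shift, the discriminator family is one-dimensional threshold rules rather than a neural network, and the minimization is a grid search.

```python
import random

def simulate(theta, n, rng):
    # toy structural model: y = theta + standard-normal noise
    return [theta + rng.gauss(0.0, 1.0) for _ in range(n)]

def best_discriminator_accuracy(real, synth):
    # discriminator family: threshold rules "synthetic if y > t" (either sign);
    # return the highest classification accuracy any such rule achieves
    best = 0.5
    total = len(real) + len(synth)
    for t in sorted(real + synth):
        acc = (sum(y <= t for y in real) + sum(y > t for y in synth)) / total
        best = max(best, acc, 1.0 - acc)
    return best

def adversarial_estimate(real, grid, rng):
    # generator side of the minimax: pick the theta whose synthetic sample
    # the best discriminator distinguishes from the data least well
    return min(grid,
               key=lambda th: best_discriminator_accuracy(
                   real, simulate(th, len(real), rng)))

rng = random.Random(0)
data = simulate(2.0, 500, rng)          # "observed" data, true theta = 2
grid = [i / 10 for i in range(41)]      # candidate thetas 0.0 .. 4.0
theta_hat = adversarial_estimate(data, grid, rng)
```

With a correctly specified generator, the discriminator's best accuracy dips toward 1/2 near the true parameter, which is what the minimization exploits.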

Optimization-friendly generic mechanisms without money
Mark Braverman

Abstract: Our goal is to develop a generic framework for converting modern gradient-descent based optimization algorithms into mechanisms where inputs come from self-interested agents. We focus on aggregating preferences from n players in a context without money. Special cases of this setting include voting, allocation of items by lottery, and matching. Our key technical contribution is a new meta-algorithm we call APEX (Adaptive Pricing Equalizing Externalities). The framework is sufficiently general to be combined with any optimization algorithm that is based on local search. In the talk I'll outline the algorithm, and the open problems and research directions that it raises, with a particular focus on mechanism design + ML. If time permits, I will discuss a special case of applying the framework to the problem of one-sided allocation with lotteries. In this case, we obtain a strengthening of the 1979 result by Hylland and Zeckhauser on allocation via a competitive equilibrium from equal incomes (CEEI). The [HZ79] result posits that there is a (fractional) allocation and a set of item prices such that the allocation is a competitive equilibrium given prices. We further show that there is always a reweighting of the players' utility values such that running the standard unit-demand VCG with reweighted utilities leads to HZ-equilibrium prices. Interestingly, not all HZ competitive equilibria come from VCG prices.

Bio: Mark Braverman is a professor of Computer Science at Princeton University. He works primarily on building new connections between theoretical computer science and other disciplines, including information theory, algorithmic mechanism design, dynamical systems, analysis, and geometry. He received a 2013 Packard Fellowship, and a 2019 NSF Alan T. Waterman award.

Using Attribution to Understand Deep Neural Networks
Mukund Sundararajan

Predicting cancer from XRays seemed great
Until we discovered the true reason.
The model, in its glory, did fixate
On radiologist markings - treason!

We found the issue with attribution:
By blaming pixels for the prediction (1,2,3,4,5,6).
A complement'ry way to attribute,
is to pay training data, a tribute (1).

If you are int'rested in FTC,
counterfactual theory, SGD
Or Shapley values and fine kernel tricks,
Please come attend, unless you have conflicts

Should you build deep models down the road,
Use attributions. Takes ten lines of code!

There once was an RS called MS,
He studied models that are a mess,
A director at Google.
Accurate and frugal,
Explanations are what he liked best.

Hierarchical Community Detection for Heterogeneous and Multi-scaled Networks
Lihua Lei

Abstract: Real-world networks are often hierarchical, heterogeneous, and multi-scaled, while the idealized stochastic block models that are extensively studied in the literature tend to be over-simplified. In a line of work, we propose several top-down recursive partitioning algorithms which start with the entire network and divide the nodes into two communities by certain spectral clustering methods repeatedly, until a stopping rule indicates no further community structures. For these algorithms, the number of communities does not need to be known a priori or estimated consistently. On a broad class of hierarchical network models, our algorithms are proved to achieve exact recovery for sparse networks with expected node degrees logarithmic in the network size, and are computationally more efficient than non-hierarchical spectral clustering algorithms. More interestingly, we identify regimes where no algorithm can recover all communities simultaneously while our algorithm can still recover the mega-communities (unions of communities defined by the hierarchy) consistently without recovering the finest structure. Our theoretical results are based on the newly developed two-to-infinity eigenspace perturbation theory for binary random matrices with independent or dependent entries.
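The core step of top-down recursive partitioning — split a network in two via spectral clustering — can be illustrated on a toy graph. A minimal sketch, not the paper's algorithm: it splits by the sign of an approximate Fiedler vector (second-smallest Laplacian eigenvector) obtained by power iteration, and omits the stopping rules and recursion that the talk studies.

```python
import math

def fiedler_sign_split(nodes, edges):
    # build adjacency matrix and degrees
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    adj = [[0.0] * n for _ in range(n)]
    deg = [0.0] * n
    for u, v in edges:
        i, j = idx[u], idx[v]
        adj[i][j] = adj[j][i] = 1.0
        deg[i] += 1.0
        deg[j] += 1.0
    # power iteration on M = c*I - L (L = D - A), deflating the all-ones
    # eigenvector each step, converges to the Fiedler vector of L
    c = 2.0 * max(deg) + 1.0
    x = [math.sin(i + 1.0) for i in range(n)]   # arbitrary generic start
    for _ in range(500):
        mean = sum(x) / n
        x = [xi - mean for xi in x]             # project out the constant vector
        y = [(c - deg[i]) * x[i] + sum(adj[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(yi * yi for yi in y)) or 1.0
        x = [yi / norm for yi in y]
    # split the nodes by the sign of the Fiedler vector
    left = [v for v in nodes if x[idx[v]] < 0.0]
    right = [v for v in nodes if x[idx[v]] >= 0.0]
    return left, right

# two 4-cliques joined by a single bridge edge (3,4)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7),
         (3, 4)]
left, right = fiedler_sign_split(list(range(8)), edges)
```

On this graph the sign split recovers the two cliques; a recursive version would then re-apply the split within each side until a stopping rule fires.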

Bio: Lihua Lei is a postdoctoral researcher in Statistics at Stanford University, advised by Emmanuel Candès. Prior to joining Stanford, he obtained his Ph.D. in statistics at UC Berkeley, advised by Peter Bickel and Michael Jordan. His research areas include network analysis, causal inference, conformal inference, multiple hypothesis testing, and stochastic optimization.

Dynamic Pricing and Learning under the Bass Model
Shipra Agrawal

Abstract: We consider a novel formulation of the dynamic pricing and demand learning problem, where the evolution of demand in response to posted prices is governed by a stochastic variant of the popular Bass model with parameters (α, β) that are linked to the so-called "innovation" and "imitation" effects. Unlike the more commonly used i.i.d. demand models, in this model the price posted not only affects the demand and the revenue in the current round but also the evolution of demand, and hence the fraction of market potential that can be captured, in future rounds. Finding a revenue-maximizing dynamic pricing policy in this model is non-trivial even when model parameters are known, and requires solving for the optimal non-stationary policy of a continuous-time, continuous-state MDP. In this paper, we consider the problem where dynamic pricing is used in conjunction with learning the model parameters, with the objective of optimizing the cumulative revenues over a given selling horizon. Our main contribution is an algorithm with a regret guarantee of O(m^{2/3}), where m is mnemonic for the (known) market size. Moreover, we show that no algorithm can incur smaller order of loss by deriving a matching lower bound. We observe that in this problem the market size m, and not the time horizon T, is the fundamental driver of the complexity; our lower bound in fact indicates that for any fixed α,β, most non-trivial instances of the problem have constant T and large m. This insight sets the problem setting considered here uniquely apart from the MAB type formulations typically considered in the learning to price literature.
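The key demand dynamic the abstract describes — today's price affects both current sales and the remaining market potential in future rounds — can be illustrated with a small discrete-time simulator. A rough sketch under assumptions not in the talk: time is discretized, and the price enters the per-period adoption hazard through an exponential factor with a made-up sensitivity parameter gamma.

```python
import math
import random

def simulate_bass_sales(alpha, beta, m, gamma, prices, seed=0):
    # discrete-time stochastic Bass-style demand: each not-yet-adopted customer
    # adopts in period t with hazard (alpha + beta * adopted/m) * exp(-gamma*p_t),
    # where alpha is the "innovation" effect, beta the "imitation" effect, and
    # m the market size; the exponential price factor and the discretization
    # are illustrative assumptions, not the model from the talk
    rng = random.Random(seed)
    adopted = 0
    sales = []
    for p in prices:
        hazard = min(1.0, (alpha + beta * adopted / m) * math.exp(-gamma * p))
        new = sum(rng.random() < hazard for _ in range(m - adopted))
        adopted += new
        sales.append(new)
    return sales

# same market, two constant-price policies: a low price accelerates adoption,
# which in turn raises the imitation term and future demand
low = simulate_bass_sales(0.05, 0.4, 200, 1.0, [0.5] * 20)
high = simulate_bass_sales(0.05, 0.4, 200, 1.0, [2.0] * 20)
```

Comparing the two runs shows the intertemporal effect: the low-price policy not only sells more per period, it also moves the system to a state (high adopted fraction) where demand itself is stronger.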

Bio: Shipra Agrawal is the Cyrus Derman Assistant Professor in the Department of Industrial Engineering and Operations Research. She is also affiliated with the Department of Computer Science and the Data Science Institute, at Columbia University. She received her Ph.D. from Stanford University in June 2011 under the guidance of Prof. Yinyu Ye and was a researcher at Microsoft Research India from 2011 to 2015. Her research spans several areas of optimization and machine learning, including online optimization under uncertainty, multi-armed bandits, online learning, and reinforcement learning. Shipra serves as an associate editor for Management Science, Mathematics of Operations Research, and INFORMS Journal on Optimization. Her research is supported by a Google Faculty Research Award, an Amazon research award, and an NSF CAREER Award.

Investment Incentives in Near-Optimal Mechanisms
Paul Milgrom

Abstract: In many real-world resource allocation problems, optimization is computationally intractable, so any practical allocation mechanism must be based on an approximation algorithm. We study investment incentives in strategy-proof mechanisms that use such approximations. In sharp contrast with the Vickrey-Clarke-Groves mechanism, for which individual returns on investments are aligned with social welfare, we find that some algorithms that approximate efficient allocation arbitrarily well can nevertheless create misaligned investment incentives that lead to arbitrarily bad overall outcomes. However, if a near-efficient algorithm "excludes bossy negative externalities," then its outcomes remain near-efficient even after accounting for investments. A weakening of this "XBONE" condition is necessary and sufficient for the result.

Bio: Paul Milgrom is the Shirley and Leonard Ely Professor of Humanities and Sciences in the Department of Economics at Stanford University and professor, by courtesy, at both the Department of Management Science and Engineering and the Graduate School of Business. He is the world's leading auction designer, known for his work on auction theory and innovative resource allocation methods, particularly in radio spectrum. He is the co-recipient of the 2020 Nobel Prize in Economic Sciences, together with Robert Wilson, "for improvements to auction theory and inventions of new auction formats."

Network Cluster-Robust Inference
Michael Leung

Abstract: Since network data commonly consists of observations on a single large network, researchers often partition the network into clusters in order to apply cluster-robust inference methods. All existing such methods require clusters to be asymptotically independent. We prove that for this requirement to hold, under certain conditions, it is necessary and sufficient for clusters to have low conductance, the ratio of edge boundary size to volume, which yields a measure of cluster quality. We show in simulations that, for important classes of networks lacking low-conductance clusters, cluster-robust methods can exhibit substantial size distortion, whereas for networks with such clusters, they outperform HAC estimators. To assess the existence of low-conductance clusters and construct them, we draw on results in spectral graph theory showing a close connection between conductance and the spectrum of the graph Laplacian. Based on these results, we propose to use the spectrum to compute the number of low-conductance clusters and spectral clustering to compute the clusters. We illustrate our results and proposed methods in simulations and empirical applications.
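To make the key quantity concrete, here is a minimal sketch of computing the conductance of a candidate cluster directly from an adjacency matrix; the toy graph and cluster below are illustrative, not taken from the paper.

```python
import numpy as np

def conductance(A, S):
    """Conductance of node set S in an undirected graph with adjacency
    matrix A: edge-boundary size divided by volume (the sum of degrees
    of nodes in S). Low conductance indicates a high-quality cluster."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[np.asarray(S)] = True
    cut = A[mask][:, ~mask].sum()   # edges leaving S
    vol = A[mask].sum()             # sum of degrees within S
    return cut / vol

# Two triangles joined by a single bridge edge: nodes {0,1,2} and {3,4,5}.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
phi = conductance(A, [0, 1, 2])  # 1 boundary edge / volume 7
```

In the paper's setting, clusters like the two triangles (few boundary edges relative to volume) are the ones for which cluster-robust inference behaves well.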

Bio: Michael Leung is an economist at USC whose work focuses on developing econometric methods for network data.

Identifying the latent space geometry of network models through analysis of curvature
Arun Chandrasekhar

Abstract: Statistically modeling networks, across numerous disciplines and contexts, is fundamentally challenging because of (often high-order) dependence between connections. A common approach assigns each person in the graph to a position on a low-dimensional manifold. Distance between individuals in this (latent) space is inversely proportional to the likelihood of forming a connection. The choice of the latent geometry (the manifold class, dimension, and curvature) has consequential impacts on the substantive conclusions of the model. More positive curvature in the manifold, for example, encourages more and tighter communities; negative curvature induces repulsion among nodes. Currently, however, the choice of the latent geometry is an a priori modeling assumption and there is limited guidance about how to make these choices in a data-driven way. In this work, we present a method to consistently estimate the manifold type, dimension, and curvature from an empirically relevant class of latent spaces: simply connected, complete Riemannian manifolds of constant curvature. Our core insight comes from representing the graph as a noisy distance matrix based on the ties between cliques. Leveraging results from statistical geometry, we develop hypothesis tests to determine whether the observed distances could plausibly be embedded isometrically in each of the candidate geometries. We explore the accuracy of our approach with simulations and then apply our approach to datasets from economics and sociology as well as neuroscience. Paper.

Bio: Arun Chandrasekhar is an Associate Professor of Economics at Stanford University. His work focuses on development economics. He studies the role that social networks play in developing countries. In particular, he is interested in how the economics of networks can help us understand information aggregation failures and breakdown of cooperation in the developing world.

Data and Incentives
Annie Liang

Abstract: Markets for lending and insurance incentivize good behavior by forecasting risk on the basis of past outcomes. As "big data" expands the set of covariates used to predict risk, how will these incentives change? We show that "attribute" data which is informative about consumer quality tends to decrease effort, while "circumstance" data which predicts idiosyncratic shocks to outcomes tends to increase it. When covariates are independent, this effect is uniform across all consumers. Under more general forms of correlation, this effect continues to hold on average, but observation of a new covariate may lead to disparate impact---increasing effort for some consumer groups and decreasing it for others. A regulator can improve social welfare by restricting the use of either attribute or circumstance data, and by limiting the use of covariates with substantial disparate impact. Paper.

Bio: Annie Liang is an Assistant Professor of Economics and an Assistant Professor of Computer Science at Northwestern University. Her work focuses on economic theory and the application of machine learning techniques in the social sciences. She has studied the dynamics of strategic information acquisition, as well as the use of machine learning to evaluate and improve economic models.

Design and Analysis of Bipartite (Market) Experiments
Jean Pouget-Abadie

Abstract: Bipartite experiments are a recent object of study in causal inference: randomized experiments in which treatment is applied to one set of units while outcomes of interest are measured on a different set of units, with the interactions between the treated units and the "outcome" units captured by a bipartite graph. These experiments are particularly useful in settings where strong interference effects occur between units of a bipartite graph. In market experiments, for example, assigning treatment at the seller level and measuring outcomes at the buyer level (or vice versa) may lead to causal models that better account for the interference that naturally occurs between buyers and sellers. In this talk, we will cover the motivation for and formal setting of bipartite experiments. Furthermore, we will explore possible design choices for such experiments, namely clustered randomized designs, as well as various unbiased inference methods. Paper (a). Paper (b).

Bio: Jean is a research scientist at Google NY, on the Algorithms and Optimization team, led by Vahab Mirrokni. Before Google, Jean was a PhD student in computer science at Harvard University, advised by Edoardo Airoldi and Salil Vadhan. Prior to that, he was an undergraduate at Ecole Polytechnique in Paris. His recent research interests focus on causal inference, particularly when interactions between units are present. For more information, see his website.

The Value of Observability in Dynamic Pricing
José Correa

Abstract: We consider a dynamic pricing problem where a firm sells one item to a single buyer in order to maximize expected revenues. The firm commits to a price function over an infinite horizon. The buyer arrives at some random time with a private value for the item. He is more impatient than the seller and strategizes the time of his purchase in order to maximize his expected utility, which implies either buying immediately or waiting to benefit from a lower price. We study how important it is to observe the buyer's arrival time in terms of the seller's expected revenue. When the seller can observe the arrival of the buyer, she can make the price function contingent on his arrival time. By contrast, when the seller cannot observe the arrival, her price function is fixed at time zero for the whole horizon. The value of observability (VO) is defined as the worst-case ratio between the expected revenue of the seller when she observes the buyer's arrival and that when she does not. Our main result establishes that in a very general setting about valuation and arrival time distributions, the value of observability is bounded by a small constant. To obtain this bound we fully characterize the observable arrival setting and use this solution to construct a random and periodic price function for the unobservable case. This is joint work with Dana Pizarro and Gustavo Vulcano.

Bio: José Correa is a full professor in the Department of Industrial Engineering at Universidad de Chile. José obtained a mathematical engineering degree from Universidad de Chile in 1999 and a PhD in Operations Research from MIT in 2004. His research, focusing on algorithmic game theory and mechanism design, has received numerous awards, including an ACM SIGecom best paper award, an INFORMS Transportation Science and Logistics best paper award, being a finalist for the Tucker Prize, and research awards from Amazon and Google. José has given keynote talks at several institutions and conferences and has served on the program committees of international computer science conferences. He also serves, and has served, on the editorial boards of some of the leading journals in his field: Mathematical Programming B, Mathematics of Operations Research (as Game Theory Area Editor), and Operations Research.

Displacement Effects in a Hot Spot Policing Intervention in Medellin: Inference and Pitfalls
Guillaume Basse

Abstract: In hot spot policing, resources are targeted at specific locations predicted to be at high risk of crime, so-called "hot spots." Rather than reducing overall crime, however, there is a concern that these interventions simply displace crime from the targeted locations to nearby non-hot spots. We address this question in the context of a large-scale randomized experiment in Medellin, Colombia, in which police were randomly assigned to increase patrols at a subset of possible hot spots. Estimating the displacement effects on control locations is difficult because the probability that a nearby hot spot is treated is a complex function of the underlying geography. While existing methods developed for this "general interference" setting, especially Horvitz-Thompson (HT) estimators, have attractive theoretical properties, they can perform poorly in practice and mislead practitioners. In this talk, I explore the key pitfalls that practitioners should watch out for when conducting this type of analysis, and propose some ways to partially remedy them.
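As background on the estimator class discussed (a generic sketch, not the talk's specific analysis), a Horvitz-Thompson estimator keeps each unit's outcome only if the unit received the exposure of interest and reweights it by the inclusion probability of that exposure. The toy example below illustrates both the unbiasedness and the weight blow-up from small probabilities, one of the practical pitfalls such estimators face.

```python
import numpy as np

def ht_mean(Y, exposed, pi):
    """Horvitz-Thompson estimate of the mean outcome under a given
    exposure: outcome kept only for units with that exposure, each
    reweighted by its (known) inclusion probability pi."""
    Y, exposed, pi = map(np.asarray, (Y, exposed, pi))
    return float(np.sum(Y * exposed / pi) / len(Y))

# Toy check: with known inclusion probabilities the estimator is
# unbiased, but small probabilities produce large weights and hence
# high variance.
rng = np.random.default_rng(0)
n = 1000
pi = rng.uniform(0.05, 0.5, size=n)   # heterogeneous probabilities
exposed = rng.random(n) < pi
Y = np.ones(n)                        # constant outcomes
estimate = ht_mean(Y, exposed, pi)    # should be near 1, noisily
```

In the talk's setting, pi is a complicated function of geography (how likely a nearby hot spot is to be treated), which is exactly where these weights become unstable.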

Bio: Guillaume Basse is an Assistant Professor in the MS&E and Statistics departments at Stanford. His research focuses broadly on Causal Inference and Design of Experiments in complex settings. He is particularly interested in complications arising when interventions spill over across space and/or time.

Gradient Flows in the Wasserstein Metric: From Discrete to Continuum via Regularization
Katy Craig

Abstract: Over the past ten years, optimal transport has become a fundamental tool in statistics and machine learning: the Wasserstein metric provides a new notion of distance for classifying distributions and a rich geometry for interpolating between them. In parallel, optimal transport has led to new theoretical results on the stability and long time behavior of partial differential equations through the theory of Wasserstein gradient flows. These two lines of research recently intersected in a series of works that characterized the dynamics of training neural networks with a single hidden layer as a Wasserstein gradient flow. In this talk, I will briefly introduce the mathematical theory of Wasserstein gradient flows and describe recent results on discrete to continuum limits. In particular, I will show how passing from the discrete to continuum limit by introducing an appropriate regularization can lead to faster rates of convergence, as well as novel, deterministic particle methods for diffusive processes.

Bio: Katy Craig is an assistant professor in the department of mathematics at the University of California, Santa Barbara. Prior to coming to UCSB, Katy received her Ph.D. from Rutgers University, held an NSF postdoc at UCLA, and held a UC President’s Postdoctoral Fellowship at UCSB.

Optimal Stochastic Trace Estimation
Christopher Musco

Abstract: I will discuss algorithms for an important computational primitive in linear algebra: approximately computing the trace of an implicit matrix A that can only be accessed through matrix-vector multiplications. Trace approximation finds applications across machine learning and scientific computing, where it is used to compute matrix norms, spectral densities, log-determinants, triangle counts in graphs, and much more. In 1990, Hutchinson introduced an elegant randomized algorithm for the trace approximation problem that has become ubiquitous in practice. I will introduce a simple modified version of this algorithm that provides the same theoretical guarantees, but requires quadratically fewer matrix-vector multiplications, and performs far better in experiments. We pair this result with matching lower bounds based on reductions to communication complexity and hypothesis testing for spiked-covariance matrices. Our lower bounds fall into a broader research agenda of better understanding the computational complexity of basic linear algebra problems in the restricted "matrix-vector query" model of computation, which generalizes common algorithmic frameworks like linear sketching and Krylov subspace methods. Joint work with Raphael A. Meyer, Cameron Musco, and David P. Woodruff. Paper.
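For context, here is a minimal sketch of the classic Hutchinson estimator that the talk takes as its baseline: tr(A) is approximated by averaging quadratic forms g^T A g over random sign vectors, using only matrix-vector products with A. The matrix and sample count below are illustrative.

```python
import numpy as np

def hutchinson_trace(matvec, n, k, seed=0):
    """Hutchinson's estimator: tr(A) ~ (1/k) * sum_i g_i^T A g_i for
    i.i.d. Rademacher (random +/-1) vectors g_i, accessing A only
    through matrix-vector products."""
    rng = np.random.default_rng(seed)
    est = 0.0
    for _ in range(k):
        g = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        est += g @ matvec(g)                 # one matvec per sample
    return est / k

# Example: a random PSD matrix accessed only through matvecs.
n = 200
B = np.random.default_rng(1).standard_normal((n, n))
A = B @ B.T
exact = np.trace(A)
approx = hutchinson_trace(lambda v: A @ v, n, k=100)
```

Each probe is unbiased since E[g g^T] = I, so E[g^T A g] = tr(A); the talk's contribution is a variant achieving the same accuracy with quadratically fewer matvecs.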

Vintage Factor Analysis with Varimax Performs Statistical Inference
Karl Rohe

Abstract: Vintage Factor Analysis is nearly a century old and remains popular today with practitioners. A key step, the factor rotation, is historically controversial because it appears to be unidentifiable. This controversy goes back as far as Charles Spearman. The unidentifiability is still reported in all modern multivariate textbooks. This talk will overturn this controversy and provide a positive theory for PCA with a varimax rotation. Just as sparsity helps to find a solution in p>n regression, we show that sparsity resolves the rotational invariance of factor analysis. PCA + varimax is fast to compute and provides a unified spectral estimation strategy for Stochastic Blockmodels, topic models (LDA), and nonnegative matrix factorization. Moreover, the estimator is consistent for an even broader class of models and the old factor analysis diagnostics (which have been used for nearly a century) assess the identifiability. Paper.
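As a concrete reference point, here is a minimal sketch of PCA followed by a textbook varimax rotation (Kaiser's criterion, via the standard SVD-based iteration). This is a generic implementation of the procedure the talk analyzes, not the paper's code, and the loading convention (scaled left singular vectors) is one of several in use.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=500):
    """Varimax rotation: find an orthogonal R maximizing the variance
    of squared loadings of L @ R, via the classic SVD iteration."""
    n, k = L.shape
    R = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr * (Lr**2).sum(axis=0) / n))
        R = u @ vt
        new_var = s.sum()
        if new_var - var < tol:
            break
        var = new_var
    return L @ R

def pca_varimax(X, k):
    """PCA of the centered data followed by a varimax rotation of the
    top-k scaled left singular vectors."""
    Xc = X - X.mean(axis=0)
    u, s, vt = np.linalg.svd(Xc, full_matrices=False)
    return varimax(u[:, :k] * s[:k])

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
L = pca_varimax(X, k=3)  # 100 x 3 rotated loadings
```

The rotation is orthogonal, so it preserves the fit of the factorization; the talk's result is that under sparsity it also pins down which rotation is the right one.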

Bio: Karl Rohe is an Associate Professor of Statistics at the University of Wisconsin-Madison, with courtesy appointments in Journalism, Educational Psychology, and Electrical & Computer Engineering. He is an AE at JRSS-B and JASA. His PhD is from Berkeley in 2011, working with Bin Yu.

Nash Social Welfare Approximation for Strategic Agents
Ruta Mehta

Abstract: A central goal in the long literature on fair division is the design of mechanisms that implement fair outcomes, despite the participants' strategic behavior. We study this question by measuring the fairness of an allocation using the geometric mean of the agents' values, known as the Nash social welfare (NSW). This objective is maximized by widely known concepts such as the Nash bargaining solution, proportional fairness, and the competitive equilibrium with equal incomes; we focus on (approximately) implementing this objective and analyze the Trading Post mechanism. We consider allocating goods that are substitutes or complements and show that this mechanism achieves an approximation of 2 for concave utility functions, and becomes essentially optimal for complements, where it can reach (1+ε) for any ε > 0. Moreover, we show that the Nash equilibria of this mechanism are pure and provide individual fairness in the sense of proportionality. (Joint work with Simina Branzei and Vasilis Gkatzelis. To appear in Operations Research.)
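For readers unfamiliar with the objective, here is a quick sketch of the Nash social welfare (the geometric mean of utilities) and why it favors balanced allocations; the numbers are purely illustrative.

```python
import numpy as np

def nash_social_welfare(utilities):
    """Geometric mean of agents' utilities, computed in log space for
    numerical stability (assumes strictly positive utilities)."""
    u = np.asarray(utilities, dtype=float)
    return float(np.exp(np.log(u).mean()))

# NSW rewards balance: (4, 4) beats (7, 1) even though both profiles
# have the same utilitarian sum of 8.
balanced = nash_social_welfare([4, 4])   # geometric mean 4
skewed = nash_social_welfare([7, 1])     # geometric mean sqrt(7) ~ 2.65
```

This multiplicative structure is what connects NSW to Nash bargaining and proportional fairness, and it is the benchmark the Trading Post mechanism is shown to approximate.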

Bio: Ruta Mehta is an assistant professor in CS at UIUC, working on algorithmic, complexity, strategic, fairness, and learning aspects of various game-theoretic and economic problems. Prior to this, she was a postdoctoral fellow at the Simons Institute, UC Berkeley (Aug'15 - Dec'15), and at Georgia Tech (Sept'12 - July'15). She did her Ph.D. from IIT-Bombay. Her Ph.D. thesis, titled "Nash Equilibrium Computation in Various Games" won the ACM India Doctoral Dissertation Award, 2012. Other awards she received include Best Postdoctoral Fellow Award, 2014 at Georgia Tech, and the NSF CAREER Award, 2018.

Developing Data Efficient Algorithms for AI
Carrie Wu

Abstract: Our increasingly ambitious goals in artificial intelligence motivate several key algorithmic challenges: how do we design algorithms that make the best use of the data that is available, and how do we design algorithms that are empirically and theoretically effective on the kinds of data that we often see in practice, for example, data with temporal dependencies and data that follow distributions that are hard to describe? In this talk, I will give examples of algorithmic solutions that address some of these challenges. I will first present a theoretical analysis of rates of convergence for SGD with experience replay, a technique used in reinforcement learning to break temporal dependencies in data. I will then present an algorithm that solves Markov Decision Processes with nearly optimal sample and runtime guarantees. Lastly, I will present an algorithmic solution for estimating local density for an arbitrary dataset.

Social network dependence, the replication crisis, and (in)valid inference
Betsy Ogburn

Abstract: In the first part of this talk, we show that social network dependence can result in spurious associations due to network dependence, potentially contributing to replication crises across the health and social sciences. Researchers in these fields frequently sample subjects from one or a small number of communities, schools, hospitals, etc., and while many of the limitations of such convenience samples are well-known, the issue of statistical dependence due to social network ties has not previously been addressed. A paradigmatic example of this is the Framingham Heart Study (FHS). Using a statistic that we adapted to measure network dependence, we test for network dependence and for possible spurious associations in several of the thousands of influential papers published using FHS data. Results suggest that some of the many decades of research on coronary heart disease, other health outcomes, and peer influence using FHS data may suffer from spurious estimates of association and anticonservative uncertainty quantification due to unacknowledged network structure. But data with network dependence abounds, and in many settings researchers are explicitly interested in learning about social network dynamics. Therefore, there is high demand for methods for causal and statistical inference with social network data. The second part of the talk describes recent work on causal inference for observational data from a single social network, focusing on (1) new types of causal estimands that are of interest in social network settings, and (2) conditions under which central limit theorems hold and inference based on approximate normality is licensed.

Bio: Dr. Elizabeth (Betsy) Ogburn is currently an Associate Professor in the Department of Biostatistics at Johns Hopkins University. She is also the founder of the COVID-19 Collaboration Platform. She received her Ph.D. in Biostatistics from Harvard University, and her research interests include causal inference and epidemiological methods.

Discrimination in Online Markets: Effects of Social Bias on Learning from Reviews and Policy Design
Faidra Monachou

Abstract: The increasing popularity of online two-sided markets, such as ride-sharing, accommodation, and freelance labor platforms, goes hand in hand with new socioeconomic challenges. One major issue remains the existence of bias and discrimination against certain social groups. We study this problem using a two-sided large market model with employers and workers mediated by a platform. Employers who seek to hire workers face uncertainty about a candidate worker’s skill level. Therefore, they base their hiring decision on learning from past reviews about an individual worker as well as on their (possibly misspecified) prior beliefs about the ability level of the social group the worker belongs to. Drawing upon the social learning literature with bounded rationality and limited information, we show that uncertainty combined with social bias leads to unequal hiring opportunities between workers of different social groups. Although the effect of social bias decreases as the number of reviews increases (consistent with empirical findings), minority workers still receive lower expected payoffs. Finally, we consider a simple directed matching policy (DM), which combines learning and matching to make better matching decisions for minority workers. Under this policy, there exists a steady-state equilibrium, in which DM reduces the discrimination gap.
Based on joint work with Itai Ashlagi.

Beyond the First Welfare Theorem: Counteracting Inequality in Markets
Benjamin Plaut

Abstract: We study market mechanisms for allocating divisible goods to competing agents with quasilinear utilities. For linear pricing (i.e., the cost of a good is proportional to the quantity purchased), the First Welfare Theorem states that market equilibria maximize the sum of agent valuations. This ensures efficiency, but can lead to extreme inequality across individuals. It is known that any Pareto optimum can be a market equilibrium, but only if either personalized pricing or an arbitrary redistribution of wealth is allowed, both of which may be unacceptable to participants and/or logistically impractical.
In this talk, we propose a novel nonlinear pricing rule which allows us to implement a large family of CES welfare functions as equilibria. Like the First Welfare Theorem, our pricing rule is simple, intuitive, and anonymous, but unlike the First Welfare Theorem, we can precisely control the tradeoff between equality and efficiency (by our choice of which CES welfare function to implement). Our result holds for any valuations that are homogeneous, differentiable, and concave.
The talk is based on joint work with Ashish Goel.

Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations
Edward McFowland III

Abstract: Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node’s network partners being informative about the node’s attributes and therefore its behavior. We show that if the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ties. Moreover, these estimates are informative enough that controlling for them allows for unbiased and consistent estimation of social-influence effects in additive models. In addition, we provide bounds on the potential bias in estimates of social-influence for finite samples. These are the first results on the consistent estimation of social-influence effects in the presence of latent homophily, and we discuss the prospects for generalizing them.

Bio: Dr. Edward McFowland III is an Assistant Professor of Information and Decision Sciences in the Carlson School of Management, at the University of Minnesota; he received his Ph.D. in Information Systems and Management from Carnegie Mellon University. Dr. McFowland’s research interests—which lie at the intersection of Information Systems, Machine Learning, and Public Policy—include the development of computationally efficient algorithms for large-scale statistical machine learning and “big data” analytics. More specifically, his research seeks to demonstrate that many real-world problems faced by organizations, and society more broadly, can be reduced to the tasks of anomalous pattern detection and discovery. As a data and computational social scientist, Dr. McFowland’s broad research goal is bridging the gap between machine learning and the social sciences (e.g., economics, public policy, and management) both through the application of machine learning methods to social science problems and through the integration of machine learning and econometric methodologies.
Dr. McFowland’s research has been supported by Adobe, Facebook, AT&T Research Labs, and the National Science Foundation; his work has been published in leading Management, Machine Learning, and Statistics journals and conferences.

"Theompirical" Research of Service Systems
Avishai Mandelbaum, Technion

Abstract: I shall describe my personal research journey through service systems (e.g. hospitals, telephone and chat centers, banks, courts). I view these systems through MS/IE/OM lenses, often more specifically as a queueing scientist (e.g. "enjoying" congestion and flows), and sometimes using operational characteristics as surrogates for other performance indicators (e.g. clinical, psychological, financial). The goal of the research is to create principles and tools that support the above viewpoints; and the means for achieving this goal is the marriage of theory with data.
To be more concrete, I am modeling complex service systems as relatively simple (robust) processing networks. My theoretical framework is (asymptotic) queueing theory, specifically parsimonious fluid models and their diffusion refinements: queueing theory is ideally suited to capturing the operational tradeoff that is at the core of any service, namely quality vs. efficiency (possibly augmented with fairness or profitability); and asymptotic analysis accommodates complex service characteristics that are otherwise mathematically intractable, for example transience, scale and scope. My data/empirical framework builds on an extensive data repository of service event-logs, at the level of the individual customer-server transactions (e.g. patient-physician or customer-agent).
Marrying theory with data, as I see it, will culminate in the creation of models directly from data - automatically and in real-time - and consequently the validation of the models' value against actual service systems. (This is in contrast to still-prevalent OR/MS/IE/OM practice, where models are too often too remote from data, and approximations are validated merely for accuracy relative to their originating models.) More specifically, data-based models - simulation, empirical, statistical and mathematical - will be created via process-mining of their primitives, structure and protocols. Simulation models will then serve as a validation ground for the other models, and all will be universally accessible for applications by researchers, students and, in the longer run, practitioners.
The above research agenda has been advanced, over 15 years or so, at the Technion SEE Laboratory (SEE = Service Enterprise Engineering); SEELab data will hence be used, throughout my lecture, to make it concrete.

Bio: Avishai Mandelbaum is the Benjamin & Florence Free Professor at the Faculty of Industrial Engineering and Management (IE&M), Technion, Israel. He has a B.Sc. in Mathematics and Computer Science and an M.A. in Statistics, both summa cum laude from Tel Aviv University. His Ph.D. is in Operations Research, from Cornell University. After graduation, in 1983, he joined the Graduate School of Business at Stanford University. He then returned to Israel, in 1991, to assume a position at the Technion. He served as IE&M Dean during 2015-2018.
Prof. Mandelbaum is a fellow of INFORMS and MSOM. He was an associate editor of the leading journals in his field, and his research and teaching have enjoyed various prizes, within Technion and beyond. His research covers stochastic models (analysis, asymptotics, control) and statistics, with applications to queueing theory/science and service systems (e.g. tele‐services, hospitals).
Prof. Mandelbaum is a founder and the director of the Technion SEE Laboratory. Since its inception in 2007, this lab has been collecting and maintaining a unique rich repository of data from ample service operations. Data granularity is at the level of the individual customer‐server transactions (event logs). And through its data, the SEELab has been supporting worldwide research and teaching of Service Science, Engineering and Management.

Fair and Efficient Allocation of Indivisible Goods and Chores
Haris Aziz, UNSW Sydney

Abstract: I consider the problem of fairly and efficiently dividing a set of items. Much of the fair division literature assumes that the items are "goods" i.e., they yield positive utility for the agents. There is also some work where the items are "chores" that yield negative utility for the agents. I consider a more general setting where an agent may have negative or positive utility for each item. This framework captures, e.g., fair task assignment, where agents can have both positive and negative utilities for each task. I discuss several new algorithms for finding fair and efficient allocations in this general setting. The presentation will be based on some recent publications (written with I. Caragiannis, A. Igarashi, T. Walsh, H. Moulin, F. Sandomirskiy, and S. Rey).

Bio: Haris Aziz is a Scientia fellow and senior lecturer at UNSW Sydney, team leader of the Algorithmic Decision Theory group, and director of the Sydney EconCS network. His research interests lie at the intersection of artificial intelligence, theoretical computer science and the mathematical social sciences, especially computational social choice and algorithmic game theory.
Haris is a recipient of the Scientia Fellowship (2018 - ), CORE Chris Wallace Research Excellence Award (2017) and the Julius Career Award (2016 - 2018). In 2015, he was selected by the Institute of Electrical and Electronics Engineers (IEEE) for the AI 10 to Watch List. He was also selected for the Early Career Spotlight list at IJCAI 2016. He is on the board of directors of IFAAMAS and is an associate editor of JAIR.
Haris completed his PhD from University of Warwick in 2009, MSc from Oxford University and his BSc (Honours) from Lahore University of Management Sciences. He undertook postdoctoral research at the Ludwig Maximilian University of Munich and TU Munich in Germany.

From Reinforcement Learning to Stochastic Optimization: A Universal Framework for Sequential Decision Analytics
Warren Powell, Princeton

Abstract: Reinforcement learning has attracted considerable attention with its successes in mastering advanced games such as chess and Go. This attention has ignored major successes such as landing SpaceX rockets using the tools of optimal control, or optimizing large fleets of trucks and trains using tools from operations research and approximate dynamic programming. In fact, there are over 15 different communities all contributing to the vast range of sequential decision problems that arise in energy, finance, transportation, health, engineering and the sciences. As each community has evolved to address a broader range of problems, there has been a consistent pattern of rediscovery of tools that sometimes differ in name only, or modest implementation details.
I will represent all of these communities using a single, canonical framework that mirrors the widely used modeling style from deterministic math programming. The key difference when introducing uncertainty is the need to optimize over policies. I will show that all the solution strategies (that is, policies) suggested by the research literature, in addition to some that are widely used in practice, can be organized into four fundamental classes. One of these classes, which we call “parametric cost function approximations,” is widely used in practice, but has been largely overlooked by the academic community, with the notable exception of the bandit community. These ideas will be illustrated using applications drawn from transportation, energy, emergency response and materials science.

Bio: Warren B. Powell is a professor in the Department of Operations Research and Financial Engineering at Princeton University, where he has taught since 1981. He is the founder and director of CASTLE Labs, which spans contributions to models and algorithms in stochastic optimization, with applications to energy systems, transportation, health, e-commerce, and the laboratory sciences. He has pioneered the use of approximate dynamic programming for high-dimensional applications, and the knowledge gradient for active learning problems. His recent work has focused on developing a unified framework for sequential decision problems under uncertainty.

Representation, Modeling, and Optimization in Reinforcement Learning
Sham Kakade, University of Washington, Seattle

Abstract: Reinforcement learning is now the dominant paradigm for how an agent learns to interact with the world. The approach has led to successes ranging across numerous domains, including game playing and robotics, and it holds much promise in new domains, from self-driving cars to interactive medical applications. Some of the central challenges are:
- Representational learning: does having a good representation of the environment permit efficient reinforcement learning?
- Modeling: should we explicitly build a model of our environment or, alternatively, should we directly learn how to act?
- Optimization: in practice, deployed algorithms often use local search heuristics. Can we provably understand when these approaches are effective and provide faster and more robust alternatives?
This talk will survey a number of results on these basic questions. Throughout, we will highlight the interplay of theory, algorithm design, and practice.

Bio: Sham Kakade is a Washington Research Foundation Data Science Chair, with a joint appointment in both the Allen School and Department of Statistics at the University of Washington. He works on the theoretical foundations of machine learning, focusing on designing (and implementing) statistically and computationally efficient algorithms. Amongst his contributions with a diverse set of collaborators are: establishing principled approaches in reinforcement learning (including the natural policy gradient, conservative policy iteration, and the PAC-MDP framework); optimal algorithms in stochastic and non-stochastic multi-armed bandit problems (including the linear bandit and the Gaussian process bandit); computationally and statistically efficient spectral algorithms for estimation of latent variable models (including estimation of mixtures of Gaussians, latent Dirichlet allocation, hidden Markov models, and overlapping communities in social networks); and faster algorithms for large-scale convex and nonconvex optimization. He is the recipient of the IBM Goldberg best paper award (in 2007) for contributions to fast nearest neighbor search and the INFORMS Revenue Management and Pricing Section Best Paper Prize (2014). He was program chair for COLT 2011.

(a biased selection of) Recent Developments in Combinatorial Auctions
Matt Weinberg, Princeton

Abstract: In a combinatorial auction there are m items, and each of n players has a valuation function v_i which maps sets of items to non-negative reals. A designer wishes to partition the items into S_1,...,S_n to maximize the welfare (\sum_i v_i(S_i) ), perhaps assuming that all v_i lie in some class V (such as submodular, subadditive, etc.).
Within Algorithmic Game Theory, this problem serves as a lens through which to examine the interplay between computation and incentives. For example: is it the case that whenever a poly-time/poly-communication algorithm for honest players can achieve an approximation guarantee of c when all valuations lie in V, a poly-time/poly-communication truthful mechanism for strategic players can achieve an approximation guarantee of c when all valuations lie in V as well?
In this talk, I’ll give a brief history, then survey three recent results on this topic which:
- provide the first separation between achievable guarantees of poly-communication algorithms and poly-communication truthful mechanisms for any V (joint works with Mark Braverman and Jieming Mao, and with Sepehr Assadi, Hrishikesh Khandeparkar, and Raghuvansh Saxena).
- resolve the communication complexity of combinatorial auctions for two subadditive players (joint work with Tomer Ezra, Michal Feldman, Eric Neyman, and Inbal Talgam-Cohen).
- revisit existing separations between poly-time algorithms and poly-time truthful mechanisms via a new solution concept “Implementation in Advised Strategies” (joint work with Linda Cai and Clayton Thomas).
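The welfare objective in the abstract above can be illustrated with a brute-force sketch (exponential in the number of items, so only for tiny instances; the function name and the example valuations here are hypothetical, chosen for illustration):

```python
from itertools import product

def max_welfare(valuations, m):
    """Brute-force the welfare-maximizing partition of m items among
    n players, where valuations[i] maps a frozenset of item indices
    to player i's (non-negative) value for that bundle."""
    n = len(valuations)
    best = 0.0
    # Each assignment maps every item to exactly one of the n players.
    for assign in product(range(n), repeat=m):
        bundles = [frozenset(j for j in range(m) if assign[j] == i)
                   for i in range(n)]
        best = max(best, sum(v(b) for v, b in zip(valuations, bundles)))
    return best

# Two players, two items: a unit-demand valuation and an additive one
# (both lie in the submodular class mentioned in the abstract).
v0 = lambda S: 3.0 if S else 0.0   # wants any single item
v1 = lambda S: 2.0 * len(S)        # additive: 2 per item
print(max_welfare([v0, v1], 2))    # giving one item to each yields 3 + 2 = 5
```

The interesting question in the talk is, of course, not this exponential search but whether truthful mechanisms can match what such (unconstrained) algorithms achieve with polynomial communication.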

Bio: Matt Weinberg is an assistant professor of computer science at Princeton University. Before that, he was a postdoc at Princeton CS, and a research fellow at the Simons Institute (Algorithmic Game Theory in 2015, and Algorithms and Uncertainty in 2016). He received his PhD from MIT in 2014, advised by Costis Daskalakis.

Optimal Priority-Based Allocation Mechanisms
Peng Shi, USC Marshall School of Business

Abstract: This paper develops a tractable methodology for designing an optimal priority system for assigning agents to heterogeneous items while accounting for agents' choice behavior. The space of mechanisms being optimized includes deferred acceptance and top trading cycles as special cases. In contrast to previous literature, I treat the inputs to these mechanisms, namely the priority distribution of agents and quotas of items, as parameters to be optimized. The methodology is based on analyzing large market models of one-sided matching using techniques of network revenue management, and solving a certain assortment planning problem whose objective is social welfare. I apply the methodology to school choice and show that restricting choices may be beneficial to student welfare. Moreover, I derive practical polynomial-time algorithms for solving the problem under a variety of utility models and constraints, and apply them to compute optimized choice sets and priorities for elementary school choice in Boston.

Bio: Peng is an assistant professor in the Department of Data Sciences and Operations at the USC Marshall School of Business. He is interested in developing quantitative methodologies for the betterment of society. His current research focuses on optimization in matching markets, with applications in school choice, public housing, and online marketplaces. Prior to joining USC, he completed a PhD in operations research at MIT, and was a post-doctoral researcher at Microsoft Research New England.

Click-Based MNL: Algorithmic Frameworks for Modeling Click Data in Assortment Optimization
Ali Aouad, London Business School

Abstract: In this paper, we introduce the click-based MNL choice model, a novel framework for modeling customer purchasing decisions in e-commerce settings. Our main modeling idea is to assume that the click behavior within product recommendation or search results pages provides an exact signal regarding the alternatives considered by each customer. We study the resulting assortment optimization problem, where the objective is to select a subset of products, made available for purchase, to maximize the expected revenue. Our main algorithmic contribution comes in the form of a polynomial-time approximation scheme (PTAS) for this problem, showing that the optimal expected revenue can be efficiently approached within any degree of accuracy. In the course of establishing this result, we develop novel technical ideas, including enumeration schemes and stochastic inequalities, which may be of broader interest for related stochastic knapsack problems. Experiments on data acquired in collaboration with the retailer Alibaba demonstrate the practical significance of the proposed choice model. We generate realistic assortment optimization instances that mirror Alibaba's display customization problem, and implement practical variants of our approximation scheme to compute assortment recommendations in these settings. We find that the recommended assortments have the potential to be at least 9% more profitable than those resulting from a standard MNL model.

Joint work with Jacob Feldman, Danny Segev, and Dennis Zhang.
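For readers unfamiliar with the underlying model, the standard MNL assortment problem (not the click-based variant introduced in the talk) can be sketched in a few lines; the function names, revenues, and preference weights below are hypothetical:

```python
from itertools import chain, combinations

def mnl_revenue(S, r, v):
    """Expected revenue of offering assortment S under a standard MNL
    model: product i in S is purchased with probability
    v[i] / (1 + sum of v[j] for j in S); the '1' is the no-purchase option."""
    denom = 1.0 + sum(v[i] for i in S)
    return sum(r[i] * v[i] for i in S) / denom

def best_assortment(r, v):
    """Enumerate all subsets (fine for small n) and return the
    revenue-maximizing assortment. Under plain MNL, the optimum is
    known to be 'revenue-ordered': a prefix of the highest-revenue products."""
    n = len(r)
    subsets = chain.from_iterable(combinations(range(n), k) for k in range(n + 1))
    return max(subsets, key=lambda S: mnl_revenue(S, r, v))

r = [10.0, 6.0, 4.0]   # per-product revenues (hypothetical)
v = [0.5, 1.0, 1.5]    # MNL preference weights (hypothetical)
S = best_assortment(r, v)
print(S, mnl_revenue(S, r, v))   # the two highest-revenue products win here
```

The click-based model of the talk replaces the fixed consideration of all offered products with consideration sets inferred from clicks, which is what makes the optimization problem hard enough to require a PTAS.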

Bio: Ali Aouad is an assistant professor at the London Business School (UK). Ali's research focuses on the design and performance analysis of algorithms with applications to operations management. He is broadly interested in resource allocation problems in connection with choice modeling, revenue management and matching markets. Ali was awarded the Best Student Paper Prize by the Operations Research Center at MIT in 2015, and was a finalist in the INFORMS Nicholson Prize competition in 2016. He received a PhD in Operations Research from Massachusetts Institute of Technology (MIT). Before MIT, he earned an MS in Applied Mathematics from the Ecole Polytechnique, Paris. He was born in Meknes, Morocco.

Societal Concerns in Targeted Advertising
Aleksandra Korolova, USC

Abstract: The enormous financial success of online advertising platforms is in large part due to the advanced targeting features they offer. I will discuss recent findings showing how implementations of targeted advertising create new societal concerns related to privacy, manipulation of the vulnerable, and discrimination. Furthermore, I will demonstrate that the ad delivery optimization algorithms run by the platforms can lead to skew in delivery along gender and racial lines, even when such skew was not intended by the advertiser. I will conclude by introducing a new fairness notion, preference-informed fairness, that could serve as a novel step towards formally studying fairness in scenarios such as targeted advertising, where individuals have complex and diverse preferences over possible outcomes.

Bio: Aleksandra Korolova is a WiSE Gabilan Assistant Professor of Computer Science at USC, where she researches algorithms and technologies that enable data-driven innovations while preserving privacy and fairness. Prior to joining USC, Aleksandra was a research scientist at Google. Aleksandra received her PhD in Computer Science from Stanford University. Her PhD thesis, which focused on protecting privacy when mining and sharing user data, has been recognized by the Arthur L. Samuel Thesis Award 2011-2012, for the best PhD thesis in the Computer Science Department at Stanford. Aleksandra is also a co-winner of the 2011 PET Award for exposing privacy violations of microtargeted advertising and a runner-up for the 2015 PET Award for RAPPOR, the first commercial deployment of differential privacy. Aleksandra's most recent work, on discrimination in ad delivery, has been cited in Facebook's Civil Rights Audit Report and invited for a briefing for Members of the House Financial Services Committee.

Truthful Aggregation of Budget Proposals
Rupert Freeman, Microsoft Research New York City

Abstract: We consider a participatory budgeting problem in which each voter submits a proposal for how to divide a single divisible resource (such as money or time) among several possible alternatives (such as public projects or activities), and these proposals must be aggregated into a single consensus division. Under L1 preferences - for which a voter's disutility is given by the L1 distance between the consensus division and the division he or she most prefers - the social welfare-maximizing mechanism, which minimizes the average L1 distance between the outcome and each voter's proposal, is incentive compatible (Goel et al. 2016). However, it fails to satisfy the natural fairness notion of proportionality, placing too much weight on majority preferences. Leveraging a connection between market prices and the generalized median rules of Moulin (1980), we introduce the independent markets mechanism, which is both incentive compatible and proportional. We unify the social welfare-maximizing mechanism and the independent markets mechanism by defining a broad class of moving phantom mechanisms that includes both. We show that every moving phantom mechanism is incentive compatible. Finally, we characterize the social welfare-maximizing mechanism as the unique Pareto-optimal mechanism in this class, suggesting an inherent tradeoff between Pareto optimality and proportionality.
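The generalized median rules of Moulin (1980) that the abstract leans on are easy to illustrate in one dimension: add fixed "phantom" peaks to the voters' reported peaks and take the median. The code below is a sketch with hypothetical names and values, not the paper's multi-alternative mechanism:

```python
import statistics

def generalized_median(peaks, phantoms):
    """Moulin-style rule for single-peaked preferences on a line:
    take the median of the voters' reported peaks together with fixed
    'phantom' peaks. With n voters and n-1 phantoms the combined list
    has odd length, so the median is well defined, and no voter can
    pull the outcome toward their peak by misreporting."""
    return statistics.median(sorted(peaks + phantoms))

peaks = [0.1, 0.5, 0.9]    # three voters' ideal points (hypothetical)
phantoms = [0.25, 0.75]    # n-1 fixed phantoms, a design choice
print(generalized_median(peaks, phantoms))   # median of 5 values: 0.5
```

Note that the voter with peak 0.9 gains nothing by exaggerating to 1.0: the median of the combined list is still 0.5, which is the intuition behind incentive compatibility of the whole moving-phantom class.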

Bio: Rupert Freeman is a postdoc at Microsoft Research New York City. Previously, he received his Ph.D. from Duke University under the supervision of Vincent Conitzer. His research focuses on the intersection of artificial intelligence and economics, particularly in topics such as resource allocation, voting, and information elicitation. He is the recipient of a Facebook Ph.D. Fellowship and a Duke Computer Science outstanding dissertation award.

Driver Surge Pricing
Nikhil Garg, Stanford

Abstract: Uber and Lyft ride-hailing marketplaces use dynamic pricing, often called surge, to balance the supply of available drivers with the demand for rides. We study pricing mechanisms for such marketplaces from the perspective of drivers, presenting the theoretical foundation that has informed the design of Uber's new additive driver surge mechanism. We present a dynamic stochastic model to capture the impact of surge pricing on driver earnings and their strategies to maximize such earnings. In this setting, some time periods (surge) are more valuable than others (non-surge), and so trips of different time lengths vary in the opportunity cost they impose on drivers. First, we show that multiplicative surge, historically the standard on ride-hailing platforms, is not incentive compatible in a dynamic setting. We then propose a structured, incentive-compatible pricing mechanism. This closed-form mechanism has a simple form and is well-approximated by Uber's new additive surge mechanism. Finally, through both numerical analysis and real data from a ride-hailing marketplace, we show that additive surge is closer to incentive compatible in practice than multiplicative surge, providing more stable earnings to drivers.
Joint work with Hamid Nazerzadeh.

Scrip Systems with Minimal Availability
Suleyman Kerimov, Stanford

Abstract: In economies without monetary transfers, scrip systems serve as an alternative to sustain cooperation, improve efficiency and mitigate free riding. This paper considers a marketplace in which, at each time period, one agent requests a service, one agent provides the service, and a unit of artificial currency is used to pay for service provision. We ask whether agents can sustain cooperation when the market is thin, in the sense that only few agents are available to provide the requested service. To study this problem, we analyze the stability of the scrip distribution assuming that among the available agents, the one with the minimum amount of scrips is selected to provide service. When exactly one random agent is available to provide service, the scrip distribution is unstable, since the number of scrips each agent has behaves like a simple random walk in one dimension. However, already when just two random agents are available to provide service, the scrip distribution is stable, in the sense that agents do not deviate much from their initial endowment, with high probability. This suggests that even with minimal liquidity in the market, cooperation can be sustained by balancing service provisions among agents. Our theory builds on the literature on the power of two choices paradigm and load balancing problems. Finally, our results suggest that scrip systems can lead to efficient outcomes in kidney exchange platforms by sustaining cooperation between hospitals.
Joint work with Itai Ashlagi.
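The one-versus-two-choices contrast in the abstract is easy to see in simulation. The sketch below is a toy version of the dynamics (names, parameters, and the simplification that balances may go negative are all mine, not the paper's):

```python
import random

def simulate(n_agents, n_steps, n_available, seed=0):
    """Toy scrip-system dynamics: each period a random requester pays
    one scrip, and among n_available randomly drawn candidate providers
    the one holding the fewest scrips earns it. Returns the maximum
    deviation from the initial endowment seen over the run, plus the
    (conserved) total number of scrips as a sanity check."""
    rng = random.Random(seed)
    scrips = [10] * n_agents           # equal initial endowments
    max_dev = 0
    for _ in range(n_steps):
        requester = rng.randrange(n_agents)
        candidates = [rng.randrange(n_agents) for _ in range(n_available)]
        provider = min(candidates, key=lambda i: scrips[i])  # min-scrip rule
        scrips[requester] -= 1
        scrips[provider] += 1
        max_dev = max(max_dev, max(abs(s - 10) for s in scrips))
    return max_dev, sum(scrips)

# With one candidate, balances random-walk away from the endowment;
# with two, the "power of two choices" keeps them close to it.
print(simulate(50, 20000, 1))
print(simulate(50, 20000, 2))
```

Running this typically shows a much smaller maximum deviation with two available providers than with one, mirroring the stability result in the abstract.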

Under the Hood of Bike Sharing
Shane Henderson, Cornell University

Abstract: Cornell's work on bike sharing with Citi Bike and its parent company Motivate relies on a combination of data analysis, stochastic modeling and optimization to help inform both the design and operation of the largest bike-sharing operations in North America. I'll discuss our work and its impact, but focus on some of the inner workings of the stochastic modeling. This includes the use of (one of) the Poisson equation(s) in the computation of a central performance measure, a proof that the resulting objective function has important structural properties, and a heuristic underlying a simulation-optimization principle that is likely useful in many other contexts.

Bio: Shane G. Henderson is professor and former Director of the School of Operations Research and Information Engineering at Cornell University. He has held positions in the Department of Industrial and Operations Engineering at the University of Michigan and the Department of Engineering Science at the University of Auckland. He is the editor in chief of Stochastic Systems. He has served as chair of the INFORMS Applied Probability Society, and as simulation area editor for Operations Research. He is an INFORMS Fellow. His research interests include discrete-event simulation, simulation optimization, and emergency services planning. He likes cats, climbing walls, biking, soccer, Harry Potter and being a Dad.

A Complexity-Theoretic Perspective on Algorithmic Fairness
Omer Reingold, Stanford

Abstract: Machine learning and data analysis have enjoyed tremendous success in a broad range of domains. These advances hold the promise of great benefits to individuals, organizations, and society as a whole. Undeniably, algorithms are informing decisions that reach ever more deeply into our lives, from news article recommendations to criminal sentencing decisions to healthcare diagnostics. This progress, however, raises (and is impeded by) a host of concerns regarding the societal impact of computation. A prominent concern is that left to their own devices, algorithms will propagate - even amplify - existing biases. Often, definitions of fairness are group-based, typically requiring that a given statistic be equal across a few demographic groups, socially identified as deserving protection. Other definitions aim at individual-level protections. Broad-stroke statistical guarantees tend to be much easier to satisfy (information and complexity theoretically) but tend to be much weaker in the protections they provide. We will discuss recent research towards bridging the gap between statistical and individual protections. These works provide efficient learning algorithms that ensure protection (according to some fairness notion) to every sub-population within some rich class of sets. Our approach to defining the class of sets to protect is complexity-theoretic. We aim at obtaining the strongest fairness guarantees that can be obtained with the available computational resources. Based on joint works with Ursula Hebert-Johnson, Michael P. Kim and Guy Rothblum.

Surrogate Scoring Rules and a Dominant Truth Serum
Yiling Chen, Harvard

Abstract: Strictly proper scoring rules (SPSR) are widely used when designing incentive mechanisms to elicit private information from strategic agents using realized ground truth signals, and they can help quantify the value of elicited information. In this paper, we extend such scoring rules to settings where a mechanism designer does not have access to ground truth. We consider two such settings: (i) a setting where the mechanism designer has access to a noisy proxy version of the ground truth, with known biases; and (ii) the standard peer prediction setting where agents' reports are the only source of information that the mechanism designer has. We introduce surrogate scoring rules (SSR) for the first setting, which use the noisy ground truth to evaluate the quality of elicited information. We show that SSR preserve the strict properness of SPSR. Using SSR, we then develop a multi-task scoring mechanism -- called dominant truth serum (DTS) -- to achieve strict properness when the mechanism designer only has access to agents' reports. In comparison to standard equilibrium concepts in peer prediction, we show that DTS can achieve truthfulness in a multi-task dominant strategy. A salient feature of SSR and DTS is that they quantify the quality of information despite the lack of ground truth, just as proper scoring rules do in the with-verification setting. Our method is verified both theoretically and empirically using data collected from real human participants. This talk is based on joint work with Yang Liu.
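Strict properness, the property the abstract extends beyond the ground-truth setting, is easy to check numerically for a classic example, the quadratic (Brier) rule. The code below is an illustration of SPSR in general, not of the paper's surrogate rules; all names are mine:

```python
def brier_score(report, outcome):
    """Quadratic (Brier) scoring rule for a binary event, oriented so
    higher is better: reward 1 - (outcome - report)^2, where outcome
    is 1 if the event occurred and 0 otherwise."""
    return 1.0 - (outcome - report) ** 2

def expected_score(report, belief):
    """Expected score of reporting `report` when the agent's true
    probability that the event occurs is `belief`."""
    return belief * brier_score(report, 1) + (1 - belief) * brier_score(report, 0)

belief = 0.7
reports = [i / 100 for i in range(101)]           # candidate reports on a grid
best = max(reports, key=lambda q: expected_score(q, belief))
print(best)   # strict properness: the truthful report uniquely maximizes the score
```

Differentiating the expected score confirms the grid search: it is maximized exactly at report = belief, which is what "strictly proper" means.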

Cost-Sharing Methods for Scheduling Games under Uncertainty
Vasilis Gkatzelis, Drexel

Abstract: We study the performance of cost-sharing protocols in a selfish scheduling setting with load-dependent cost functions. Previous work on selfish scheduling protocols has focused on two extreme models: omnipotent protocols that are aware of every machine and every job that is active at any given time, and oblivious protocols that are aware of nothing beyond the machine they control. The main focus of this talk is on a well-motivated middle-ground model of "resource-aware" protocols, which are aware of the set of machines that the system comprises, but unaware of what jobs are active at any given time. Apart from considering budget-balanced protocols, to which previous work was restricted, we augment the design space by also studying the extent to which overcharging can lead to improved performance.

A Rudimentary Index of Strategic Stability: Deterring Defectors, Sustaining Loyalists and Forming Equilibrium
Ehud Kalai, Northwestern

Abstract: A rudimentary index explains Nash-equilibrium choices observed in behavioral economics. The index assigns stability levels φ(π) = 0, 1, ..., n to strategy profiles π of n-person games: level 0 is assigned to profiles that are not Nash equilibria, levels 1, 2, ..., n-1 are assigned to Nash equilibria in increasing levels of stability, and level n is assigned to dominant-strategy equilibria. The index measures players' confidence in the optimality of their choices, and expands Nash's question of "whether a strategy profile deters individual defectors" to "what size defector groups does a strategy profile deter".

The Effect of Identifying Constituents on Representative-Constituent Communication
Justin Grimmer, Stanford

Abstract: Social media sites present an opportunity to connect constituents with elected officials, but platforms rarely enable elected officials to identify constituents or for constituents to identify themselves. We show how providing this information with a "constituent badge" affects elected officials' and users' behavior. Randomizing users' access to the badge, we find that elected officials are no more responsive to users with access to the badge. And we show this null result is unlikely to be because elected officials failed to understand the constituent badge. Access to the badge, however, affects how users interact with elected officials. The badge causes individuals to write higher quality comments that are more focused on politics and less deferential to the elected official's original post. And the badge deepens participation among individuals who are otherwise less likely to participate. Our results demonstrate how the mere opportunity to formally adopt a role can alter behavior.

Computational Interventions to Improve Access to Opportunity for Disadvantaged Populations
Rediet Abebe, Cornell

Abstract: Poverty and economic hardship are highly complex and dynamic phenomena. Due to the multi-faceted nature of economic welfare, assistance programs aimed at improving access to opportunity for disadvantaged populations face challenges, as they often rely on simpler measures of economic hardship, such as income or wealth, that fail to capture the full complexity of economic welfare. In this presentation, we explore one important dimension -- susceptibility to income shocks. We introduce and analyze a model of economic welfare that incorporates income, wealth, and external shocks, and we pose the question of how to allocate subsidies in this setting. We give optimal allocation mechanisms under natural assumptions on the agents' wealth and shock distributions, as well as approximation schemes in the general setting. We conclude with a discussion of how algorithmic, computational, and mechanism design techniques can help inform interventions to improve access to opportunity in relevant domains, and of the Mechanism Design for Social Good initiative, which facilitates research at this interface.

On the Taylor Expansion of Value Functions
Itai Gurvich, Cornell

Abstract: We introduce a framework for approximate dynamic programming that we apply to discrete-time chains on $\mathbb{Z}_+^d$ with countable action sets. The framework is grounded in the approximation of the (controlled) chain's generator by that of another Markov process. In simple terms, our approach stipulates applying a second-order Taylor expansion to the value function, replacing the Bellman equation with one in continuous space and time where the transition matrix is reduced to its first and second moments. We develop bounds on the optimality gap -- the sub-optimality introduced by using the control produced by the "Taylored" equation. These bounds can be viewed as a conceptual underpinning, analytical rather than relying on weak convergence arguments, for the good performance of controls derived from Brownian approximations. Computationally, our framework leads to an "aggregation" approach with performance guarantees. While the guarantees are grounded in PDE theory, the practical use of this approach requires no knowledge of that theory.
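Schematically, the expansion the abstract refers to can be sketched as follows (my notation; discounting and regularity conditions are suppressed, and $\mu_a$, $\Sigma_a$ denote the first and second moments of the controlled increment):

```latex
% Second-order Taylor expansion of the value function V around x,
% for a transition increment D_a under action a:
V(x + D_a) \;\approx\; V(x) + D_a^{\top} \nabla V(x)
    + \tfrac{1}{2}\, D_a^{\top} \nabla^2 V(x)\, D_a .
% Taking expectations inside the Bellman equation
%   V(x) = \min_a \{\, c(x,a) + \mathbb{E}[V(x + D_a)] \,\}
% and keeping only the first two moments of D_a yields a
% continuous-space equation in V:
0 \;=\; \min_a \Big\{\, c(x,a) + \mu_a(x)^{\top} \nabla V(x)
    + \tfrac{1}{2} \operatorname{tr}\!\big( \Sigma_a(x)\, \nabla^2 V(x) \big) \Big\} .
```

This is only a heuristic derivation of the "Taylored" equation's form; the talk's contribution is bounding the optimality gap of controls extracted from it.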

Dynamic Matching in School Choice: Efficient Seat Reassignment after Late Cancellations
Irene Lo, Stanford

Abstract: In the school choice market, where scarce public school seats are assigned to students, a key operational issue is how to reassign seats that are vacated after an initial round of centralized assignment. Practical solutions to the reassignment problem must be simple to implement, truthful, efficient and fair while also alleviating costly student movement between schools. In this talk, I will propose and axiomatically justify a class of reassignment mechanisms, the Permuted Lottery Deferred Acceptance (PLDA) mechanisms. Our mechanisms generalize the commonly used Deferred Acceptance (DA) school choice mechanism to a two-round setting and retain its desirable incentive, fairness and efficiency properties. School choice systems typically run Deferred Acceptance with a lottery number assigned to each student to break ties in school priorities. I will show that under natural conditions on demand, correlating the tie-breaking lotteries across rounds preserves allocative welfare, and reversing the first-round lottery order minimizes reassignment among all PLDA mechanisms. Empirical investigations based on data from NYC high school admissions support our theoretical findings. This is based on joint work with Itai Feigenbaum, Yash Kanoria and Jay Sethuraman.

The role of academic environment on scientific productivity and prominence
Aaron Clauset, UC Boulder

Abstract: Although meritocracy is a key organizing principle of academia, the social mechanisms that explain individual and institutional differences in scholarly outputs are poorly understood. For instance, which is a better predictor of early-career productivity and prominence, where a scientist was trained or where they currently work? In this talk, I'll describe a causal investigation of the factors that drive scholarly productivity and prominence, which exploits a comprehensive dataset of over 200,000 publications, 7.4 million citations, and job placement information for nearly 2500 early-career Computer Science faculty in the U.S. and Canada. Using a matched-pairs experimental design around initial faculty job placement, I'll show that the prestige of an individual's working environment is a causal driver of their productivity and prominence, not the prestige of their doctoral training. Moreover, this effect cannot be explained by the selection or retention of relatively more productive faculty by departments. I'll then argue that the wide differences in faculty productivity and prominence across departments are best explained by differences in their environmental characteristics, e.g., the number of faculty, of doctoral students, etc. As a result, the scholarly output of early-career faculty is driven by where they work, not by where they trained, and their current productivity and prominence cannot be separated from their place in the academic system. I'll close with a brief discussion of the implications of these results for the emerging field of the science of science, and for academic policy. Joint work with Samuel F. Way, Allison C. Morgan, and Daniel B. Larremore.

Sequential Procurement with Contractual and Experimental Learning
Daniela Saban, Stanford

Abstract: In this paper, we study the design of sequential procurement strategies that integrate stochastic and strategic information. We consider a buyer who repeatedly demands one unit of the same good and is unable to commit to long-term contracts. In each time period, the buyer makes a price offer to a seller who has private, persistent information regarding his cost and quality of provision. If the offer is accepted, the seller provides the good with a stochastic quality that is not contractible; otherwise, the buyer sources the good from a known outside option. In our model, the buyer can learn from observations of the (strategic) acceptance decisions taken by the seller, and from evaluations of the (stochastic) quality delivered whenever a purchase occurs. We show that the buyer's equilibrium strategy corresponds to a dynamic sequence of thresholds on the beliefs on the seller's type. In equilibrium, the buyer offers high prices that facilitate quality experimentation at early stages of the interaction with the seller, and after gathering enough information (and if beliefs are sufficiently low), she advances to offering low prices that may partially separate seller types. To highlight the interplay between the two forms of information, we contrast the buyer's strategy with two benchmark strategies designed to learn from each form of information in isolation. We establish that, paradoxically, the ability to observe delivered quality is not always beneficial for the buyer. (joint work with Y Gur and G Macnamara)

A Practical 4-Approximation Algorithm for Coflow Scheduling
David Shmoys, Cornell

Abstract: Coflows are an emerging network abstraction introduced to model traffic patterns within a datacenter. This abstraction naturally leads to several scheduling optimization problems, and we consider, in particular, minimizing the sum of weighted coflow completion times. There has been recent work from both the theory and systems communities on this scheduling problem, leading either to practical solutions that perform well and are easy to implement, or to theoretical solutions with bounded performance guarantees, but not both. In this work, we bridge this gap and present Sincronia, a network design that matches the best known approximation guarantee while empirically outperforming the existing state-of-the-art network designs. Sincronia achieves this using a key technical result: we show that, given a “right” ordering of coflows, any per-flow rate allocation mechanism achieves average coflow completion time within a factor of 4 of optimal, as long as (co)flows are prioritized with respect to that ordering. We obtain this right ordering of coflows using a primal-dual algorithm. This is joint work with Saksham Agarwal, Shijin Rajakrishnan, Akshay Narayan, Rachit Agarwal, and Amin Vahdat.
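
The ordering-based viewpoint has a classical single-machine analogue, Smith's rule, sketched here as a toy illustration (this is not Sincronia's primal-dual algorithm, which orders whole coflows across many ports):

```python
def smith_order(jobs):
    """Smith's rule: schedule jobs in nondecreasing size/weight order.
    This minimizes the sum of weighted completion times on one machine,
    and is the flavor of priority ordering that coflow scheduling needs."""
    return sorted(jobs, key=lambda j: j[0] / j[1])  # j = (size, weight)

def total_weighted_completion(order):
    """Sum of weight * completion time when jobs run back to back."""
    t = total = 0
    for size, weight in order:
        t += size
        total += weight * t
    return total

jobs = [(3, 1), (1, 2), (2, 2)]
ordered = smith_order(jobs)
cost = total_weighted_completion(ordered)
```

The factor-4 result says that once a good coflow ordering is fixed, the per-flow rate allocation within it barely matters.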

Contract Theory: An Algorithmic Approach
Inbal Talgam-Cohen, Technion

Abstract: We consider the classic principal-agent model of contract theory [Holmstrom'79], in which a principal designs an outcome-dependent compensation scheme to incentivize an agent to take a costly and unobservable action. When all of the model parameters, including the full distribution over principal rewards resulting from each agent action, are available to the designer in extensive form, an optimal contract can be computed by linear programming. This talk will examine contract theory through the algorithmic lens, presenting two main results. First, if the principal knows only the first moment of each action's reward distribution, we prove that linear contracts are guaranteed to be worst-case optimal, ranging over all reward distributions consistent with the given moments. This provides a new explanation for the prevalence of this class of simple contracts. Second, we consider a natural class of succinct principal-agent settings, capturing scenarios like a marketing agent whose actions lead to different probabilities of sales of various items. We show how to exploit the succinct structure with an ellipsoid-based algorithm to obtain a near-optimal contract. We conclude by discussing the possible role of algorithms in future research on contract theory and its applications. Joint work with Paul Duetting and Tim Roughgarden.
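
As a toy illustration of why linear contracts are analytically friendly (my own sketch with made-up expected-reward/cost pairs, not an example from the talk): under a linear contract the agent keeps a share alpha of the reward, the agent's best response is piecewise constant in alpha, so only breakpoint shares need to be checked.

```python
def best_linear_contract(actions):
    """actions: list of (expected_reward, cost). Under share alpha the
    agent picks the action maximizing alpha*R - c and the principal
    earns (1 - alpha)*R. The agent's best response only changes at
    indifference points, so it suffices to check those shares."""
    candidates = {0.0, 1.0}
    for (R1, c1) in actions:
        for (R2, c2) in actions:
            if R1 != R2:
                a = (c1 - c2) / (R1 - R2)
                if 0 < a < 1:
                    candidates.add(a)
    best = (0.0, 0.0)
    for a in sorted(candidates):
        # Standard assumption: the agent breaks ties in the principal's favor.
        R, c = max(actions, key=lambda rc: (a * rc[0] - rc[1], rc[0]))
        best = max(best, ((1 - a) * R, a), key=lambda t: t[0])
    return best  # (principal's expected payoff, share alpha)

# Three actions: do nothing, low effort, high effort (illustrative numbers).
actions = [(0.0, 0.0), (1.0, 0.2), (3.0, 1.5)]
payoff, alpha = best_linear_contract(actions)
```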

Identification of anonymous DNA using genealogical triangulation
Arvind Narayanan, Princeton

Abstract: Consumer genetics databases hold dense genotypes of millions of people, and the number is growing quickly. In 2018, law enforcement agencies began using such databases to identify anonymous DNA via long-range familial searches. We show that this technique is far more powerful if combined with a genealogical database of the type collected by online ancestry services. We present a "genealogical triangulation" algorithm and study its effectiveness on simulated datasets. We show that for over 50% of targets, their anonymous DNA can be identified (matched to the correct individual or same-sex sibling) when the genetic database includes just 1% of the population. We also show the effectiveness of "snowball identification" in which a successful identification adds to the genetic genealogical database, increasing the identification accuracy for future instances. We discuss our technique's potential to enhance law enforcement capabilities as well as its privacy risks.

A Quest for Structure: Joint Graph Structure & Semi-Supervised Inference
Leman Akoglu, CMU

Abstract: Semi-supervised learning (SSL) is effectively used for numerous classification problems, thanks to its ability to make use of abundant unlabeled data. The main assumption of various SSL algorithms is that nearby points on the data manifold are likely to share a label. Graph-based SSL constructs a graph from point-cloud data as an approximation to the underlying manifold, followed by label inference. It is no surprise that the quality of the constructed graph in capturing the essential structure of the data is critical to the accuracy of the subsequent inference step. How should one construct a graph from the input point-cloud data for graph-based SSL? In this talk I will introduce a Parallel Graph Learning framework (called PG-Learn) that learns the graph structure and infers the labels jointly. PG-Learn has two main ingredients: (1) a gradient-based optimization of the edge weights (specifically, different kernel bandwidths in each dimension) based on a validation loss objective, and (2) a parallel hyper-parameter search algorithm with an adaptive resource allocation scheme. The latter early-terminates searches based on relative performance, in order to dynamically allocate resources (time and processors) to those with promising configurations. Put together, the solution is a hybrid of iterative local and parallel random search that strategically navigates the hyper-parameter space. Importantly, PG-Learn is scalable, both in runtime (linear in dimensionality d and log-linear in the number of samples n) and in memory (linear in both n and d).
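
The adaptive resource-allocation ingredient can be pictured as a successive-halving loop (a generic sketch, not PG-Learn's actual scheduler; `score` is a stand-in for validation performance under a given budget):

```python
import random

def successive_halving(configs, score, budget=1):
    """Evaluate all configurations, keep the better half, double the
    budget for the survivors, and repeat until one remains."""
    survivors = list(configs)
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda c: score(c, budget), reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        budget *= 2  # surviving configurations earn more resources
    return survivors[0]

# Toy search over kernel bandwidths: the "validation score" (a stand-in
# for PG-Learn's validation objective) peaks at bandwidth 1.0.
random.seed(0)
configs = [10 ** random.uniform(-2, 2) for _ in range(16)]
best = successive_halving(configs, lambda c, budget: -abs(c - 1.0))
```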

Information Inundation on Platforms and Implications
Vahideh Manshadi, Yale

Abstract: We study a model of information consumption where consumers sequentially interact with a platform that offers a menu of signals (posts) about an underlying state of the world (fact). At each time, incapable of consuming all posts, consumers screen the posts and only select (and consume) one from the offered menu. We show that in the presence of uncertainty about the accuracy of these posts, and as the number of posts increases, adverse effects such as slow learning and polarization arise. Specifically, we establish that, in this setting, bias emerges as a consequence of the consumer's screening process. Namely, consumers, in their quest to choose the post that reduces their uncertainty about the state of the world, choose to consume the post that is closest to their own beliefs. We study the evolution of beliefs and show that such a screening bias slows down the learning process, and the speed of learning decreases with the menu size. Further, we show that society becomes polarized during the prolonged learning process even in situations where the society's belief distribution was not a priori polarized. (Joint work with Gad Allon and Kimon Drakopoulos)

Allocating Resources, in the Future
Sid Banerjee, Cornell

Abstract: The online allocation of scarce resources is a canonical paradigm of decision-making under uncertainty, and is widely studied in many fields of engineering under a variety of titles (dynamic allocation/revenue management/prophet inequalities/etc.). In this talk, I will re-examine the basic online allocation problem, with the aim of building bridges to the ever-improving predictive capabilities enabled by modern machine-learning methods. To this end, I will present a new Bayesian-learning inspired algorithm for online allocation problems, and show that it achieves the first horizon-independent and budget-independent regret bounds for online packing with stochastic inputs. Surprisingly, the result stems from elementary tools: LP sensitivity analysis and basic concentration of measure.

Bio: Sid Banerjee is an assistant professor in the School of Operations Research and Information Engineering (ORIE) at Cornell. His research is on stochastic modeling, and the design of algorithms and incentives for large-scale systems. He got his PhD in ECE from UT Austin, following which he was a postdoctoral researcher in the Social Algorithms Lab at Stanford, as well as a technical consultant at Lyft.

The role of the propensity score in the synthetic control method
Avi Feller, UC Berkeley

Abstract: The synthetic control method is a popular approach for estimating the impact of a treatment on (typically one) treated unit in settings with many control units and observed pre-treatment outcomes for all units. The key idea is to construct a weighted average of control units, known as a synthetic control (SC), that minimizes imbalance of pre-treatment outcomes between the treated unit and synthetic control. Our main result is that synthetic control weights are numerically equivalent to inverse propensity score weights (IPW) with pre-treatment outcomes as covariates and heavy regularization of the propensity score model. Leveraging a primal-dual connection, we map features of the propensity score model to choices about the specification of the original SC optimization problem. In particular, the original SC method, which balances the L2 norm of pre-treatment outcomes, is identical to IPW with an L2 penalty on the propensity score model and with the least feasible regularization; other choices of balance criteria (e.g., L-infinity norm) correspond to other forms of regularization (e.g., Lasso). We then use this numerical equivalence to make several recommendations for practice. First, as in other settings with high-dimensional covariates, we argue that the estimate can be quite sensitive to the type and level of regularization. Where the researcher has knowledge of the data generating process, there are large returns to choosing an appropriate regularization that emphasizes balance on key covariates; in general, there is little reason to choose the default regularization implied by the original SC approach. We explore several alternative approaches for setting the regularization, including cross-validation. Second, we directly apply existing results from the literature on estimating causal effects with high-dimensional covariates.
For instance, once we view SC as an IPW estimator, it is natural to consider an Augmented IPW estimator, combining the SC weights with an outcome model. Third, we conduct extensive simulation studies to assess the performance of these different estimators in practice. Finally, we use these ideas to assess the impact of the 2012 Kansas tax cuts on economic growth, finding persistent negative effects.
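
To make the SC optimization concrete, here is a minimal pure-Python sketch (my illustration, not the paper's code) of the simplex-constrained least-squares problem for the weights, solved by exponentiated gradient:

```python
import math

def sc_weights(X0, x1, steps=5000, eta=0.05):
    """Synthetic-control weights: minimize ||x1 - X0 w||^2 over the
    simplex (w >= 0, sum w = 1) by exponentiated gradient descent."""
    T, J = len(X0), len(X0[0])
    w = [1.0 / J] * J
    for _ in range(steps):
        fit = [sum(X0[t][j] * w[j] for j in range(J)) for t in range(T)]
        grad = [2 * sum((fit[t] - x1[t]) * X0[t][j] for t in range(T))
                for j in range(J)]
        # Multiplicative update followed by renormalization onto the simplex.
        w = [wj * math.exp(-eta * g) for wj, g in zip(w, grad)]
        z = sum(w)
        w = [wj / z for wj in w]
    return w

# The treated unit's pre-treatment outcomes equal a 1:3 mix of the two
# controls, so the weights should recover roughly (0.25, 0.75).
X0 = [[1.0, 3.0], [2.0, 4.0], [0.0, 2.0]]  # rows: periods; cols: controls
x1 = [2.5, 3.5, 1.5]                        # treated unit
w = sc_weights(X0, x1)
```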

Bio: Avi Feller is an assistant professor at the Goldman School, where he works at the intersection of public policy, data science, and statistics. His methodological research centers on learning more from social policy evaluations, especially randomized experiments. His applied research focuses on working with governments on using data to design, implement, and evaluate policies. Prior to his doctoral studies, Feller served as Special Assistant to the Director at the White House Office of Management and Budget and worked at the Center on Budget and Policy Priorities. Feller received a Ph.D. in Statistics from Harvard University, an M.Sc. in Applied Statistics as a Rhodes Scholar at the University of Oxford, and a B.A. in Political Science and Applied Mathematics from Yale University.

Algorithmically exploiting structure or: how I learned to stop worrying and enjoy real-world graphs
C. Seshadhri, UC Santa Cruz

Abstract: As an algorithms researcher, I am bothered to no end by heuristics that solve classically hard computational graph problems (like finding cliques) on real-world graphs. What is it about (say) social networks that makes classically hard problems easy? Or, how do algorithms circumvent various lower bounds when it comes to real-world graphs?
This question has no simple answer, and I will present a tale of two stories on this theme. In the first, we show how to practically and provably count cliques in social networks, by exploiting the local density of these graphs. A curious feature of the mathematics is the use of Turan's theorem in extremal combinatorics, to prove correctness of the algorithm. In the second, we show how to estimate the degree distribution of a graph by sampling a tiny portion of it, by exploiting the heavy tailed structure of the degree distribution. This result uses some recent theoretical techniques in sublinear algorithms to simulate uniform random edge queries through uniform random vertices. In each of these results, there is an interesting interplay of combinatorics, randomized algorithms, and social network analysis.
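
The primitive of simulating uniform random edge queries from uniform random vertex queries can be sketched as rejection sampling (illustrative only; it assumes a known degree bound `dmax`, and the talk's sublinear algorithms are more refined):

```python
import random

def random_edge(adj, dmax, rng):
    """Sample a uniform random (directed) edge using only uniform vertex
    draws and neighbor queries: accept vertex u with probability
    deg(u)/dmax, then take a uniform neighbor."""
    vertices = list(adj)
    while True:
        u = rng.choice(vertices)
        if adj[u] and rng.random() < len(adj[u]) / dmax:
            return (u, rng.choice(adj[u]))

# Path graph 0-1-2: the two undirected edges (0,1) and (1,2) should
# each appear about half the time.
adj = {0: [1], 1: [0, 2], 2: [1]}
rng = random.Random(0)
draws = [tuple(sorted(random_edge(adj, 2, rng))) for _ in range(4000)]
frac01 = draws.count((0, 1)) / len(draws)
```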

Bio: C. Seshadhri is an assistant professor of Computer Science at the University of California, Santa Cruz. Prior to joining UCSC, he was a researcher at Sandia National Labs, Livermore in the Information Security Sciences department, during 2010-2014. His primary interest is in mathematical foundations of big data, especially modeling and algorithms. By and large, he works at the boundary of theoretical computer science and data mining. His work spans many areas: sublinear algorithms, graph algorithms, graph modeling, and scalable computation for large data sets.

High-dimensional random geometric graphs
Miklos Racz, Princeton

Abstract: I will talk about two natural random geometric graph models, where connections between vertices depend on distances between latent d-dimensional labels. We are particularly interested in the high-dimensional case when d is large. We study a basic hypothesis testing problem: can we distinguish a random geometric graph from an Erdos-Renyi random graph (which has no geometry)? We show that there exists a computationally efficient procedure which is almost optimal (in an information-theoretic sense). The proofs will highlight new graph statistics as well as connections to random matrices. This is based on joint work with Sebastien Bubeck, Jian Ding, Ronen Eldan, and Jacob Richey.
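
One test statistic from this line of work is the signed triangle count, which picks up the excess transitivity that latent geometry induces; a small sketch (my normalization):

```python
from itertools import combinations

def signed_triangles(A, p):
    """Signed-triangle statistic: sum over vertex triples of
    (A_ij - p)(A_jk - p)(A_ik - p). Values inflated above what an
    Erdos-Renyi G(n, p) graph would give suggest latent geometry,
    since geometric neighbors-of-neighbors tend to be neighbors."""
    n = len(A)
    return sum(
        (A[i][j] - p) * (A[j][k] - p) * (A[i][k] - p)
        for i, j, k in combinations(range(n), 3)
    )

# On the complete graph K4 with p = 0.5, each of the 4 triples
# contributes 0.5^3 = 0.125, so the statistic is 0.5.
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
stat = signed_triangles(K4, 0.5)
```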

Bio: Miklos is an Assistant Professor in the ORFE Department at Princeton University. His research focuses on probability, statistics, and their applications. Before Princeton, he spent two years as a postdoc in the Theory Group at Microsoft Research, Redmond. He received his PhD in Statistics from UC Berkeley in 2015, where he was advised by Elchanan Mossel. He also obtained an MS in Computer Science from Berkeley. Previously, he received an MS in Mathematics from the Budapest University of Technology and Economics, under the supervision of Marton Balazs and Balint Toth.

Dynamics of Distributed Updating in Fisher Markets
Richard Cole, NYU

Abstract: A major goal in Algorithmic Game Theory is to justify equilibrium concepts from an algorithmic and complexity perspective. One appealing approach is to identify natural distributed algorithms that converge quickly to an equilibrium. This work establishes new convergence results for two generalizations of proportional response in Fisher markets. These results stem from a correspondence with mirror descent algorithms for convex optimization. Several of our new results are a consequence of new notions of strong Bregman convexity and of strong Bregman convex-concave functions, and associated linear rates of convergence, which may be of independent interest. Among other results, we analyze a damped generalized proportional response and show a linear rate of convergence in a Fisher market with buyers whose utility functions cover the full spectrum of CES utilities aside from the extremes of linear and Leontief utilities; when these utilities are included, we obtain an empirical O(1/T) rate of convergence. Joint work with Yun Kuen Cheung and Yixin Tao.
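
For reference, a minimal (undamped) proportional response loop for a linear Fisher market, with illustrative budgets and utilities of my choosing:

```python
def proportional_response(budgets, u, rounds=200):
    """Each buyer splits next round's budget across goods in proportion
    to the utility each good contributed this round; a good's price is
    the total bid placed on it."""
    n, m = len(u), len(u[0])
    b = [[budgets[i] / m] * m for i in range(n)]  # uniform initial bids
    for _ in range(rounds):
        prices = [sum(b[i][j] for i in range(n)) for j in range(m)]
        x = [[b[i][j] / prices[j] for j in range(m)] for i in range(n)]
        b = [[budgets[i] * u[i][j] * x[i][j]
              / sum(u[i][k] * x[i][k] for k in range(m))
              for j in range(m)] for i in range(n)]
    prices = [sum(b[i][j] for i in range(n)) for j in range(m)]
    return prices, b

# Two unit-budget buyers, each preferring a different good; at the
# market equilibrium each buyer spends everything on their favorite
# good and both prices equal 1.
prices, b = proportional_response([1.0, 1.0], [[3.0, 1.0], [1.0, 3.0]])
```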

Bio: Richard Cole is a Silver Professor of Computer Science at NYU. His main interest is the design and analysis of algorithms. He currently concentrates on the following area: algorithmic economic market theory and game theory. He has previously worked on string and pattern matching, amortization, parallelism, network and routing problems. He has also been interested in the use of visualization for algorithm explanation and teaching. He served as department chair from 1994-2000, and subsequently, as deputy director of the Courant Institute from 2003-2013 (with one semester as acting director). He was named a Guggenheim Fellow for the 1988-89 academic year, an ACM Fellow in 1998, and a Silver Professor of Computer Science in 2011.

Randomization inference for spillovers in networks
Dean Eckles, MIT

Abstract: Social and behavioral scientists are interested in testing hypotheses about spillovers (i.e., interference, exogenous peer effects) in social networks; similar questions arise in other settings (e.g., biological and computer networks). However, when there is a single network, this is complicated by the lack of independent observations. We explore Fisherian randomization inference as an approach to exact finite-population inference, where the main problem is that the relevant hypotheses are non-sharp null hypotheses. Fisherian randomization inference can be used to test these hypotheses either by (a) making the hypotheses sharp by assuming a model for direct effects or (b) conducting conditional randomization inference such that the hypotheses are sharp. I present both of these approaches, the latter of which is developed in Aronow (2012) and our paper (Athey, Eckles & Imbens, 2017). This usually involves selecting some vertices to be "focal" and conditioning on their treatment assignment and/or the assignment of some or all of their network neighbors. The selection of this set can present interesting algorithmic questions; we, for example, make use of greedy methods for finding maximal independent sets. I illustrate these methods with an application to a large voter turnout experiment on Facebook (Jones et al., 2017).
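
The focal-unit selection step can be sketched with a greedy independent-set heuristic (illustrative; the paper's selection procedures involve further considerations):

```python
def greedy_independent_set(adj):
    """Greedily pick focal vertices, no two of which are adjacent, so
    each focal unit's network exposure depends only on non-focal
    neighbors. Low-degree-first ordering tends to yield a larger set."""
    focal, blocked = [], set()
    for v in sorted(adj, key=lambda v: len(adj[v])):
        if v not in blocked:
            focal.append(v)
            blocked.add(v)
            blocked.update(adj[v])
    return focal

# 4-cycle 0-1-2-3-0: a maximal independent set is {0, 2} (or {1, 3}).
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
focal = greedy_independent_set(adj)
```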

Bio: Dean Eckles is a social scientist, statistician, and faculty at the Massachusetts Institute of Technology (MIT). Dean is the KDD Career Development Professor in Communications and Technology, an assistant professor in the MIT Sloan School of Management, and affiliated faculty at the MIT Institute for Data, Systems & Society. He was previously a member of the Core Data Science team at Facebook. Much of his work examines how interactive technologies affect human behavior by mediating, amplifying, and directing social influence - and statistical methods to study these processes. Dean's empirical work uses large field experiments and observational studies. His research appears in the Proceedings of the National Academy of Sciences and other peer-reviewed journals and proceedings in statistics, computer science, and marketing. Dean holds degrees from Stanford University in philosophy (BA), cognitive science (BS, MS), statistics (MS), and communication (PhD).

Building a cooperator
Alex Peysakhovich, Facebook

Abstract: Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Building artificially intelligent agents that achieve good outcomes in these situations is important because many real world interactions include a tension between selfish interests and the welfare of others. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice (begin by cooperating), provokable (try to avoid being exploited), and forgiving (try to return to mutual cooperation). We show both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas. Our construction does not require training methods beyond a modification of self-play; thus, if an environment is such that good strategies can be constructed in the zero-sum case (e.g., Atari), then we can construct agents that solve social dilemmas in this environment.
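
The nice/provokable/forgiving recipe is classically embodied by tit-for-tat in the iterated prisoner's dilemma. A toy sketch (the talk's agents are trained by reinforcement learning in Markov games, which this does not attempt):

```python
# Payoffs for (my_move, their_move): C = cooperate, D = defect.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def play(strategy_a, strategy_b, rounds=10):
    """Run an iterated prisoner's dilemma; each strategy sees only the
    opponent's history. Returns the two total payoffs."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strategy_a(hist_b), strategy_b(hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b

# Tit-for-tat: nice (opens with C), provokable (answers D with D),
# forgiving (returns to C as soon as the opponent does).
tft = lambda opp: "C" if not opp or opp[-1] == "C" else "D"
always_defect = lambda opp: "D"

both_tft = play(tft, tft)            # sustained mutual cooperation
tft_vs_dd = play(tft, always_defect)  # exploited once, then retaliates
```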

Bio: Alex is a Research Scientist at Facebook Artificial Intelligence Research working on both human and machine decision-making. He got his PhD in Behavioral Economics from Harvard University and was a post-doc with David Rand at the Human Cooperation Lab. He spent several years working with the Facebook News Feed team on combining survey and behavioral data, natural language processing for detecting clickbait, and on building tools for advanced experimentation such as heterogeneous treatment effect detection.

Diffusion, Seeding, and the Value of Network Information
Mohammad Akbarpour, Stanford

Abstract: When communicating information to individuals is costly, policymakers try to identify the best 'seeds' to prompt a cascade of information within a social network. Numerous studies have proposed various network-centrality based heuristics to choose initial seeds in a way that is likely to boost diffusion. Here we show that, for a frequently studied diffusion process, randomly seeding s + x individuals can prompt a larger cascade than optimally seeding the best s individuals, for a small x. This suggests that the returns to collecting and analyzing network information to identify the optimal seeds may not be economically significant. Given these findings, practitioners interested in communicating a message to a large number of people may wish to compare the cost of identifying optimal seeds with the cost of informing a few additional people. Moreover, researchers studying network-based seeding heuristics may consider reporting the 'extra seeds required by random seeding to achieve the same diffusion' as an easily interpretable measure of the magnitude of their results.

Bio: Mohammad Akbarpour is an Assistant Professor of Economics at Stanford Graduate School of Business. His research bridges between economics and computer science, and is focused on market design, networked markets, and the economics of organ markets. Mohammad received his PhD from Stanford University Economics department in 2015 and his B.Sc in Electrical Engineering from Sharif University of Technology in Iran. As a consulting economist at Auctionomics, he has been involved in designing multiple spectrum auction markets around the globe. He has also been an instructor at Khan Academy Farsi, teaching hundreds of video-lectures in high school physics and calculus, and game theory.

Differential Privacy for Growing Databases
Rachel Cummings, Georgia Tech

Abstract: We study the design of differentially private algorithms for adaptive analysis of dynamically growing databases, where a database accumulates new data entries while the analysis is ongoing. We provide a collection of tools for machine learning and other types of data analysis that guarantee differential privacy and accuracy as the underlying databases grow arbitrarily large. We give both a general technique and a specific algorithm for adaptive analysis of dynamically growing databases. Our general technique is illustrated by two algorithms that schedule black box access to some algorithm that operates on a fixed database, to generically transform private and accurate algorithms for static databases into private and accurate algorithms for dynamically growing databases. These results show that almost any private and accurate algorithm can be rerun at appropriate points of data growth with minimal loss of accuracy, even when data growth is unbounded. Our specific algorithm directly adapts the private multiplicative weights algorithm to the dynamic setting, maintaining the accuracy guarantee of the static setting through unbounded data growth. Along the way, we develop extensions of several other differentially private algorithms to the dynamic setting, which may be of independent interest for future work on the design of differentially private algorithms for growing databases.
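
The general technique of rerunning a static private algorithm at appropriate points of data growth can be sketched for a running count (a simplified toy: a doubling release schedule, Laplace noise drawn as a difference of exponentials, and no attempt to reproduce the paper's privacy accounting across releases):

```python
import random

def private_count_stream(stream, epsilon, rng):
    """Release a noisy count only when the database doubles in size.
    Each release is a fresh Laplace-mechanism query (sensitivity 1,
    scale 1/epsilon), so only O(log n) releases are ever made; the
    total privacy cost composes across releases."""
    releases, next_release, n = [], 1, 0
    for _ in stream:
        n += 1
        if n == next_release:
            # Laplace(1/epsilon) noise as a difference of exponentials.
            noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
            releases.append((n, n + noise))
            next_release *= 2
    return releases

rng = random.Random(0)
releases = private_count_stream(range(1000), epsilon=1.0, rng=rng)
sizes = [s for s, _ in releases]
```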

Bio: Dr. Rachel Cummings is an Assistant Professor in the H. Milton Stewart School of Industrial & Systems Engineering at Georgia Tech. Her research interests lie primarily in data privacy, with connections to machine learning, algorithmic economics, optimization, statistics, and information theory. Her work has focused on problems such as strategic aspects of data generation, incentivizing truthful reporting of data, privacy-preserving algorithm design, impacts of privacy policy, and human decision-making. Dr. Cummings received her Ph.D. in Computing and Mathematical Sciences from the California Institute of Technology, her M.S. in Computer Science from Northwestern University, and her B.A. in Mathematics and Economics from the University of Southern California. She is a recipient of the Amori Doctoral Prize in Computing and Mathematical Sciences, a Simons Award for Graduate Students in Theoretical Computer Science, and the Best Paper Award at the 2014 International Symposium on Distributed Computing. Dr. Cummings also serves on the ACM U.S. Public Policy Council's Privacy Committee.

Lagrangian duality in mechanism design, a crash course.
Nikhil Devanur, Microsoft Research

Abstract: Designing optimal (revenue maximizing) auctions in multi-parameter settings has been among the most active areas in algorithmic mechanism design in the last few years. We have discovered that Lagrangian duality is a very useful and versatile tool for this purpose. I will show how to use it to do (a subset of) the following:
1. Derive that the optimal auction is a virtual welfare maximizer.
2. Obtain a fast algorithm for approximating the optimal auction.
3. Show how simple auctions are approximately optimal.
4. Extend the above to handle marginal costs.
5. Characterize optimal auctions for structured environments.
6. Get bounds on the menu-size complexity of optimal auctions.
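
For item 1, a hedged single-item sketch of the standard derivation (my summary, not taken verbatim from the talk): relaxing the incentive constraints with multipliers lambda(v', v) >= 0 that satisfy a flow-conservation condition cancels the payment terms and bounds revenue by virtual welfare.

```latex
% Partial Lagrangian over IC constraints; x(v) = allocation rule,
% f = type distribution, \lambda(v', v) = multiplier on the constraint
% "type v' does not gain by reporting v". When \lambda routes a unit of
% flow into each type, payments cancel, leaving a virtual value.
\mathrm{Rev}
  \;\le\; \max_{x}\; \sum_{v} f(v)\, \Phi_\lambda(v)\, x(v),
\qquad
\Phi_\lambda(v) \;=\; v \;-\; \frac{1}{f(v)} \sum_{v'} \lambda(v', v)\,(v' - v).
```

Optimizing over such flows recovers Myerson's virtual value v - (1 - F(v))/f(v) in the single-parameter case.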

Bio: Nikhil Devanur is a senior researcher and manager of the Algorithms group in Microsoft Research, Redmond. He is interested in what he calls Automated Economics, which studies the question of how technology can be used to improve the efficiency of economic systems. His other interest is in algorithms: he is interested in designing algorithms that are faster, simpler, work online or in a distributed fashion, for some of the basic combinatorial optimization problems.

Learning Multi-item Auctions with (or without) Samples
Yang Cai, McGill

Abstract: We provide algorithms that learn simple auctions whose revenue is approximately optimal in multi-item multi-bidder settings. We obtain our learning results in two settings. The first is the commonly studied setting where sample access to the bidders' distributions over valuations is given. Here, our algorithms require polynomially many samples in the number of items and bidders. The second is a more general max-min learning setting that we introduce, where we are given "approximate distributions", and we seek to compute a mechanism whose revenue is approximately optimal simultaneously for all "true distributions" that are close to the ones we were given. These results are more general in that they imply the sample-based results, and are also applicable in settings where we have no sample access to the underlying distributions but have estimated them indirectly via market research or by observation of bidder behavior in previously run, potentially non-truthful auctions. Our results are enabled by new uniform convergence bounds for hypothesis classes under product measures. Our bounds result in exponential savings in sample complexity compared to bounds derived by bounding the VC dimension, and are of independent interest.
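
In the simplest single-bidder, single-item case, sample-based learning reduces to optimizing against the empirical distribution, e.g., picking the empirically best reserve price (my toy illustration, far simpler than the multi-item mechanisms in the talk):

```python
def empirical_reserve(samples):
    """Best single reserve price against the empirical distribution:
    maximize r * (number of sampled values >= r) over candidate
    reserves r drawn from the sample itself."""
    return max(samples, key=lambda r: r * sum(v >= r for v in samples))

# Values 1..5: reserve 3 sells to {3, 4, 5}, for the best revenue
# tradeoff between price and sale probability.
r = empirical_reserve([1, 2, 3, 4, 5])
```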

Bio: Yang Cai is a William Dawson Assistant Professor of Computer Science at McGill University. He received his Ph.D. in computer science from MIT, advised by Costis Daskalakis. He was a postdoctoral researcher at UC Berkeley. Yang's research interests include economics and computation, learning, statistics and probability, as well as online algorithms.

Prophet Inequalities Made Easy: Stochastic Optimization by Pricing Non-Stochastic Inputs
Michal Feldman, Tel-Aviv University

Abstract: We present a general framework for stochastic online maximization problems with combinatorial feasibility constraints. The framework establishes prophet inequalities by constructing price-based online approximation algorithms, a natural extension of threshold algorithms for settings beyond binary selection. Our analysis takes the form of an extension theorem: we derive sufficient conditions on prices when all weights are known in advance, then prove that the resulting approximation guarantees extend directly to stochastic settings. Our framework unifies and simplifies much of the existing literature on prophet inequalities and posted price mechanisms, and is used to derive new and improved results for combinatorial markets (with and without complements), multi-dimensional matroids, and sparse packing problems. Finally, we highlight a surprising connection between the smoothness framework for bounding the price of anarchy of mechanisms and our framework, and show that many smooth mechanisms can be recast as posted price mechanisms with comparable performance guarantees.
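
The single-item case illustrates the price-based approach: posting half the expected maximum as a take-it-or-leave-it price secures at least half the prophet's value in expectation. A simulation sketch with i.i.d. uniform values (my toy; the framework in the talk handles far richer feasibility constraints):

```python
import random

def posted_price_run(values, price):
    """The item goes to the first buyer whose value meets the price."""
    return next((v for v in values if v >= price), 0.0)

rng = random.Random(0)
n, trials = 5, 20000
exp_max = n / (n + 1)   # E[max of n i.i.d. Uniform(0,1)] = n/(n+1)
price = exp_max / 2     # the classic half-of-expected-max threshold
picked = best = 0.0
for _ in range(trials):
    vals = [rng.random() for _ in range(n)]
    picked += posted_price_run(vals, price)
    best += max(vals)    # the prophet, who sees all values
picked /= trials
best /= trials
```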

Bio: Michal Feldman is a Professor of computer science in the Blavatnik School of Computer Science at Tel Aviv University and a researcher at Microsoft Research (MSR) Herzliya. Her research focuses on the intersection of computer science, game theory and microeconomics. She received her Ph.D. from the University of California at Berkeley in 2005, and did her postdoc at the Hebrew University (2005-07). She was a faculty member in the School of Business Administration and the Center for the study of rationality at the Hebrew University (2007-13), and a visiting professor at Harvard University and Microsoft Research New England (2011-13). She serves on the editorial board of various journals, including GEB, MOR, JCSS and ACM TEAC. She is the vice chair of ACM SIGEcom, and served as the PC chair of ACM Conference on Economics and Computation 2015. She is the recipient of various grants and fellowships, including ERC (European Research Council), Marie Curie IOF, Alon, and ISF. She is a member of the Israeli Young Academy and an alumna of the Global Young Academy.

Biases Beyond Observation
Moritz Hardt, University of California, Berkeley

Abstract: Most proposed fairness measures for machine learning are observational: They depend only on the joint distribution of the features, predictor, and outcome. I will highlight a few useful observational criteria, before arguing why observational criteria in general are unable to resolve questions of fairness conclusively. Moving beyond observational criteria, I will outline a causal framework for reasoning about discrimination based on sensitive characteristics.

Bio: Moritz Hardt is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. After obtaining a PhD in Computer Science from Princeton University in 2011, he was a postdoctoral scholar and research staff member at IBM Research Almaden, followed by two years as a research scientist at Google Research and Google Brain. Hardt's research aims to make the practice of machine learning more robust, reliable, and aligned with societal values.

Bottleneck links, essential intermediaries, and competing paths of diffusion in networks.
Mihai Manea, MIT

Abstract: We investigate how information goods are priced and diffused over links in a network. Buyers have idiosyncratic consumption values for information and, after acquiring it, can replicate it and resell copies to uninformed neighbors. A partition of the network captures the effects of network architecture and locations of information sellers on player profits and the structure of competing diffusion paths. Sellers indirectly appropriate profits over intermediation chains from buyers in their block of the partition. Links within blocks are critical for connecting the network and constitute bottlenecks for information diffusion. Links bridging distinct blocks are redundant for diffusion and impose negative externalities on sellers. Information enters each block not containing a seller via a single node - the dealer of the block. Dealers can receive information over redundant links from multiple neighbors and benefit from competitive pricing. Every non-dealer buyer can acquire information from a single neighbor via a bottleneck link and is subject to a monopoly. In dense networks, competition limits the scope of indirect appropriability, and intellectual property rights foster innovation.

Bio: Mihai Manea is a game theorist focusing on social and economic networks. He graduated from Princeton University in 2005 and received a PhD in economics from Harvard University in 2009. He has taught at MIT and Yale University and is currently visiting the economics department at Stanford University.

On fairness in supervised learning
Manuel Gomez Rodriguez, Max Planck Institute for Software Systems

Abstract: Algorithmic decision making is increasingly being used to assist or replace humans in many online as well as offline settings. These systems often rely on historical decisions, often taken by humans, to optimize functionality, satisfaction of the end user and profitability. However, there is a growing concern that these automated decisions can lead, even in the absence of intent, to a lack of fairness, i.e., their outcomes can disproportionately hurt (or benefit) particular groups of people sharing one or more sensitive attributes (e.g., race, sex). In this talk, I will first discuss several notions of fairness in the context of supervised learning and then introduce a flexible mechanism to design fair boundary-based classifiers by leveraging an intuitive measure of decision boundary (un)fairness. I will then show that this mechanism can be easily incorporated into the formulation of several well-known margin-based classifiers as convex or convex-concave constraints, and that it allows fine-grained control over the degree of fairness, often at a small cost in terms of accuracy.

Bio: Manuel Gomez Rodriguez is a tenure-track faculty member at the Max Planck Institute for Software Systems. Manuel develops machine learning and large-scale data mining methods for the analysis, modeling and control of large online social and information systems. He is particularly interested in the creation, acquisition and/or dissemination of reliable knowledge and information, which is ubiquitous on the Web and social media, and has received several recognitions for his research, including an Outstanding Paper Award at NIPS 2013 and Best Research Paper Honorable Mentions at KDD 2010 and WWW 2017. Manuel holds a BS in Electrical Engineering from Carlos III University in Madrid (Spain), an MS and PhD in Electrical Engineering from Stanford University, and received postdoctoral training at the Max Planck Institute for Intelligent Systems.

Blind Regression, Recommendation System and Collaborative Filtering
Devavrat Shah, MIT

We discuss the framework of Blind Regression (also known as the Latent Variable Model), motivated by the problem of Matrix Completion for recommendation systems: given a collection of users and movies, the goal is to predict the unknown rating of a user for a movie using the known observations, i.e., complete the partially observed matrix. We posit that each user and movie is associated with a latent feature, and that the rating of a user for a movie is a noisy version of a latent function applied to the associated latent features. Therefore, completing the matrix boils down to predicting the latent function value for user-movie pairs whose ratings are unknown, just as in the classical regression setting. However, unlike the setting of regression, the features are not observed here -- hence "Blind" Regression. Such a model arises as a canonical characterization due to the multi-dimensional exchangeability property a la Aldous and Hoover (early 1980s).

In this talk, taking inspiration from the classical Taylor expansion for differentiable functions, we shall propose a prediction algorithm that is consistent for all Lipschitz continuous functions. We provide a finite-sample analysis suggesting that the algorithm produces accurate predictions even when observing only a vanishing fraction of the matrix. We discuss the relationship with spectral algorithms for matrix completion and with collaborative filtering.

The talk is based on joint works with Christina Lee, Yihua Li and Dogyoon Song (MIT).
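To make the nearest-neighbor flavor of such estimators concrete, here is a toy sketch (illustrative only, not the talk's actual algorithm; the ratings matrix is made up): predict a missing entry by borrowing the rating of the user whose observed overlap looks most similar.

```python
# Toy user-user collaborative filtering on a partially observed
# ratings matrix (None = unobserved). A crude sketch of the
# nearest-neighbor idea only; the talk's estimator is more refined.

def predict(ratings, u, m):
    """Predict ratings[u][m] from the user whose observed overlap
    with user u has the smallest mean squared difference."""
    best, best_dist = None, float("inf")
    for v, row in enumerate(ratings):
        if v == u or row[m] is None:
            continue
        overlap = [(a, b) for a, b in zip(ratings[u], row)
                   if a is not None and b is not None]
        if not overlap:
            continue
        dist = sum((a - b) ** 2 for a, b in overlap) / len(overlap)
        if dist < best_dist:
            best, best_dist = row[m], dist
    return best

ratings = [
    [5, 4, None, 1],
    [5, 4, 2, 1],        # nearly identical taste to user 0
    [1, 2, 5, None],
]
print(predict(ratings, 0, 2))  # borrows user 1's rating: 2
```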

Bio: Devavrat Shah is a Professor in the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology. His current research interests are at the interface of statistical inference and social data processing. His work has been recognized through prize paper awards in machine learning, operations research and computer science, as well as career prizes including the 2010 Erlang Prize from the INFORMS Applied Probability Society and the 2008 ACM SIGMETRICS Rising Star Award. He is a distinguished young alumnus of his alma mater, IIT Bombay.

Data-Science-Driven Products at Facebook
Udi Weinsberg, Facebook

This talk will give an overview of a range of data-driven products that the Core Data Science group helped build, mostly by applying machine learning and statistical methods to large-scale data. We'll talk about analysis of cascades and product adoption, identifying trends in real time, fighting scams, understanding the true meanings behind emoji, and figuring out how people laugh online.

Bio: Udi Weinsberg leads the Algorithms group in Core Data Science at Facebook. The group helps product teams across Facebook tackle difficult product problems and deliver new features by leveraging vast amounts of data together with a range of machine learning techniques. Before Facebook, Udi was a senior researcher at Technicolor, working on privacy in machine learning using cryptographic methods.

Large-scale Machine Learning: Theory and Practice
Anima Anandkumar, Amazon and Caltech

Large-scale machine learning requires blending computational thinking with statistical frameworks. Designing fast, efficient and distributed learning algorithms with statistical guarantees is an outstanding grand challenge. I will present perspectives from theory and practice. I will demonstrate how spectral optimization can reach the globally optimal solution for many learning problems despite being non-convex. This includes unsupervised learning of latent variable models, training neural networks, and reinforcement learning of partially observable Markov decision processes. In practice, tensor methods yield enormous gains in both running time and learning accuracy over traditional methods such as variational inference. I will then talk about recent advances in large-scale deep learning methods. Our lab at AWS is actively innovating on the MXNet package, a developer-friendly open-source deep learning framework designed for both efficiency and flexibility, built on the distributed parameter-server framework. I will demonstrate how to use preconfigured Deep Learning AMIs and CloudFormation Templates on AWS to help speed up deep learning research and development. I will conclude with outstanding challenges: how we can bridge the gaps between theory and practice, and how we can design and analyze large-scale learning algorithms.

Bio: Anima Anandkumar is currently a principal scientist at Amazon Web Services. She will be joining the Caltech CMS department in summer 2017 as a Bren endowed chair. Her research interests are in the areas of large-scale machine learning, non-convex optimization and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. She is the recipient of several awards, including the Alfred P. Sloan Fellowship, the Microsoft Faculty Fellowship, a Google Research Award, ARO and AFOSR Young Investigator Awards, an NSF CAREER Award, the Best Thesis Award from the ACM SIGMETRICS society, the IBM Fran Allen PhD Fellowship, and several best paper awards. She has been featured in a number of forums such as the Quora ML session, Huffington Post, Forbes, O'Reilly Media, and so on. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, an assistant professor at U.C. Irvine between 2010 and 2016, and a visiting researcher at Microsoft Research New England in 2012 and 2014.

Learning and Efficiency of Outcomes in Games
Eva Tardos, Department of Computer Science, Cornell University

Selfish behavior can often lead to suboptimal outcomes for all participants, a phenomenon illustrated by many classical examples in game theory. Over the last decade we have developed a good understanding of how to quantify the impact of strategic user behavior on overall performance in many games (including traffic routing as well as online auctions). In this talk we will focus on games where players use a form of learning that helps them adapt to the environment, and consider two closely related questions: What are broad classes of learning behaviors that guarantee that game outcomes converge to the quality guaranteed by the price of anarchy, and how fast is this convergence? More broadly: what kind of learning guarantees high social welfare in games when the game or the population of players is dynamically changing?

Bio: Eva Tardos is the Jacob Gould Schurman Professor of Computer Science at Cornell University, where she was Computer Science department chair from 2006 to 2010. She received her BA and PhD from Eotvos University in Budapest and joined the faculty at Cornell in 1989. Tardos's research interests are in algorithms and algorithmic game theory. She is best known for her work on network-flow algorithms, approximation algorithms, and quantifying the efficiency of selfish routing. She has been elected to the National Academy of Engineering, the National Academy of Sciences, and the American Academy of Arts and Sciences, and is an external member of the Hungarian Academy of Sciences. She is the recipient of a number of fellowships and awards, including the Packard Fellowship, the Goedel Prize, the Dantzig Prize, the Fulkerson Prize, and the IEEE Technical Achievement Award. She is editor-in-chief of the Journal of the ACM, has served as an editor of several other journals including the SIAM Journal on Computing and Combinatorica, has served on the program committees of many conferences, and was program committee chair for SODA'96, FOCS'05, and EC'13.

Supply Chain Disruptions: Evidence from the Great East Japan Earthquake
Alireza Tahbaz-Salehi, Columbia

Exploiting the exogenous and regional nature of the Great East Japan Earthquake of 2011, this paper provides a systematic quantification of the role of input-output linkages as a mechanism for the propagation and amplification of shocks. We document that the disruption caused by the earthquake and its aftermath propagated upstream and downstream along supply chains, affecting the direct and indirect suppliers and customers of disaster-stricken firms. We then use our empirical findings to obtain an estimate of the overall macroeconomic impact of the shock by taking these propagation effects into account. We find that the propagation of the shock over input-output linkages can account for a 1.2 percentage point decline in Japan's gross output in the year following the earthquake. We interpret these findings in the context of a general equilibrium model that explicitly takes firm-to-firm linkages into account.

Quantifying Tradeoffs between Fairness and Accuracy in Online Learning
Aaron Roth, Department of Computer Science, University of Pennsylvania

In this talk, I will discuss our recent efforts to formalize a particular notion of "fairness" in online decision making problems, and study its costs on the achievable learning rate of the algorithm. Our focus for most of the talk will be on the "contextual bandit" problem, which models the following scenario. Every day, applicants from different populations submit loan applications to a lender, who must select a subset of them to give loans to. Each population has a (potentially different) mapping from applications to credit-worthiness that is initially unknown to the lender. The fairness constraint we impose roughly translates to: "Less credit worthy individuals should never be preferentially favored over more credit worthy individuals, regardless of group membership". Despite the fact that this constraint seems consistent with the profit motivation of the bank, we show that imposing a fairness constraint provably slows down learning --- sometimes only mildly, but sometimes substantially, depending on the structure of the problem. Time permitting, we will mention recent extensions to the reinforcement learning setting in which the actions of the learner can affect its environment, and to economic settings in which the learner must be incentivized by a principal to act fairly.

Joint Work with Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Seth Neel

Bio: Aaron Roth is the class of 1940 Bicentennial Term associate professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and co-director of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book "The Algorithmic Foundations of Differential Privacy."

Exploiting the Natural Exploration in Online Decision-Making
Mohsen Bayati, Graduate School of Business, Stanford University

Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individual-specific covariates (contexts). Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that balance an exploration-exploitation tradeoff, since greedy policies that exploit current estimates without any exploration may be sub-optimal in general. However, exploration-free greedy policies are desirable in many practical settings where experimentation may be prohibitively costly or unethical (e.g. clinical trials).

In this talk we show that, for a general class of context distributions, the greedy policy benefits from a natural exploration obtained from the varying contexts and becomes asymptotically optimal under some assumptions on problem parameters. Motivated by these results, we introduce Greedy-First, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy policy or to explore. We prove that this algorithm is asymptotically optimal without any additional assumptions. Through simulations we demonstrate that Greedy-First successfully reduces experimentation and outperforms existing (exploration-based) algorithms.

Joint work with Hamsa Bastani and Khashayar Khosravi
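A minimal sketch of the "natural exploration" phenomenon (a toy simulation with made-up parameters, not the paper's Greedy-First algorithm): with noiseless linear rewards and varying scalar contexts, a purely greedy least-squares policy still recovers each arm's parameter, because context diversity alone generates informative observations.

```python
import random

def run_greedy(theta, contexts, n_init=2):
    """Exploration-free policy: after a few forced pulls per arm,
    always play the arm whose least-squares estimate looks best."""
    sums = [[0.0, 0.0] for _ in theta]   # per-arm [sum x*r, sum x*x]
    est = [0.0 for _ in theta]
    picks = []
    for t, x in enumerate(contexts):
        if t < n_init * len(theta):
            k = t % len(theta)           # forced initial pulls
        else:
            k = max(range(len(theta)), key=lambda j: est[j] * x)
        r = theta[k] * x                 # noiseless toy reward
        sums[k][0] += x * r
        sums[k][1] += x * x
        est[k] = sums[k][0] / sums[k][1]
        picks.append(k)
    return est, picks

random.seed(0)
contexts = [random.uniform(-1, 1) for _ in range(200)]
est, picks = run_greedy([1.0, 2.0], contexts)
print(est)  # estimates converge to the true thetas [1.0, 2.0]
```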

Competitive Division of a Mixed Manna
Herve Moulin, University of Glasgow

A mixed manna contains goods (that everyone likes), bads (that everyone dislikes), as well as items that are goods to some agents, but bads or satiated to others. If all items are goods and utility functions are homogeneous of degree one, concave (and monotone), the competitive division maximizes the Nash product of utilities (Gale-Eisenberg): hence it is welfarist (determined by the set of feasible utility profiles), unique, continuous and easy to compute. We show that the competitive division of a mixed manna is still welfarist. If the zero utility profile is Pareto dominated, the competitive profile is strictly positive and still uniquely maximizes the product of utilities. If the zero profile is unfeasible (for instance if all items are bads), the competitive profiles are strictly negative, and are the critical points of the product of disutilities on the efficiency frontier. The latter allows for multiple competitive utility profiles, from which no single-valued selection can be continuous or Resource Monotonic. Thus the implementation of competitive fairness under linear preferences in interactive platforms like SPLIDDIT will be more difficult when the manna contains bads that overwhelm the goods.
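The Gale-Eisenberg fact cited above can be checked numerically in a tiny all-goods example (a brute-force toy with made-up values, not how competitive equilibria are computed in practice): with linear utilities, the division maximizing the Nash product gives each agent the good they value more.

```python
# Grid search for the Nash-product-maximizing division of two
# divisible goods between two agents with linear utilities.

def nash_product(alloc, values):
    # alloc[g] = share of good g given to agent 0
    prod = 1.0
    for i in (0, 1):
        u = sum((alloc[g] if i == 0 else 1 - alloc[g]) * values[i][g]
                for g in range(len(alloc)))
        prod *= u
    return prod

values = [[3.0, 1.0],   # agent 0's values for goods A, B
          [1.0, 3.0]]   # agent 1's values
grid = [i / 100 for i in range(101)]
best = max(((a, b) for a in grid for b in grid),
           key=lambda ab: nash_product(ab, values))
print(best)  # agent 0 gets all of good A, agent 1 all of good B: (1.0, 0.0)
```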

Bio: Herve Moulin graduated in 1971 from the Ecole Normale Superieure in Paris, and received his Ph.D. in mathematics from the Universite de Paris in 1975. He has taught at the Ecole Nationale de la Statistique et Administration Economique, the University of Paris at Dauphine, Virginia Polytechnic Institute and State University, Duke University and Rice University, and currently teaches at the University of Glasgow. He has been a Fellow of the Econometric Society since 1983. His work has contributed to redefining the field of 'normative economics', which aims to invent new resource allocation mechanisms - or refine existing ones - in a variety of contexts: voting rules; the fair division of assets (as in a divorce or inheritance); rationing of over-demanded commodities (such as organs for transplant or seats for a popular event); the exploitation of a "commons"; the assignment of tasks between workers; the scheduling of jobs in a queue; sharing the cost and pricing the traffic of a communication network; etc.

Using Optimization to Balance Fairness and Efficiency in Barter Markets
John Dickerson, University of Maryland

The exchange of indivisible goods without money addresses a variety of constrained economic settings where a medium of exchange - such as money - is considered inappropriate. Participants are either matched directly with another participant or, in more complex domains, in barter cycles and chains with other participants before exchanging their endowed goods. We show that techniques from computer science and operations research, combined with the recent availability of massive data and inexpensive computing, can guide the design of such matching markets and enable the markets by running them in the real world. A key application domain for our work is kidney exchange, an organized market where patients with end-stage renal failure swap willing but incompatible donors. We present new models that address three fundamental dimensions of kidney exchange: (i) uncertainty over the existence of possible trades, (ii) balancing efficiency and fairness, and (iii) inherent dynamism. Next, we combine these dimensions, along with high-level human-provided guidance, into a unified framework for learning to match in a general dynamic setting. This framework, which we coin FutureMatch, takes as input a high-level objective (e.g., "maximize graft survival of transplants over time") decided on by experts, then automatically learns based on data how to make this objective concrete and learns the "means" to accomplish this goal - a task that, in our experience, humans handle poorly. Throughout, we draw on insights from our work with the United Network for Organ Sharing (UNOS) US-wide exchange and experiments on data from the National Health Service UK-wide exchange.
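As a toy illustration of the optimization at the core of such exchanges (restricted here to pairwise swaps with a made-up compatibility relation; real clearing engines use integer programming over longer cycles and chains):

```python
from itertools import combinations

# donor of pair i is compatible with patient of pair j  =>  (i, j)
compat = {(0, 1), (1, 0), (1, 2), (2, 1), (0, 3), (3, 0), (2, 3)}
pairs = range(4)
# a 2-cycle (swap) needs compatibility in both directions
swaps = [(i, j) for i, j in combinations(pairs, 2)
         if (i, j) in compat and (j, i) in compat]

def best_matching(swaps):
    """Largest set of disjoint swaps, by brute force over subsets."""
    for r in range(len(swaps), 0, -1):
        for subset in combinations(swaps, r):
            used = [p for s in subset for p in s]
            if len(used) == len(set(used)):   # pairwise disjoint
                return list(subset)
    return []

print(best_matching(swaps))  # two disjoint swaps: [(0, 3), (1, 2)]
```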

Bio: John P. Dickerson is an Assistant Professor of Computer Science at the University of Maryland, and a recent CS PhD graduate of Carnegie Mellon University. His research centers on solving practical economic problems using techniques from computer science, stochastic optimization, and machine learning. He has worked extensively on theoretical and empirical approaches to kidney exchange, where his work has set policy at the UNOS nationwide exchange; game-theoretic approaches to counter-terrorism and negotiation, where his models have been deployed; and computational advertising through Optimized Markets, a CMU spin-off company. He created FutureMatch, a general framework for learning to match subject to human value judgments; that framework won a 2014 HPCWire Supercomputing Award. Prior to his Ph.D., he worked at IBM and in R&D at a defense agency. He is an NDSEG Fellow, Facebook Fellow, and Siebel Scholar.

Computational Social Choice: For the People
Ariel Procaccia, CMU

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: a website that offers provably fair solutions to everyday problems, and a service that provides optimization-driven voting methods. Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

Bio: Ariel Procaccia is an Assistant Professor in the Computer Science Department at Carnegie Mellon University, and an Affiliated Faculty in the Machine Learning Department. He usually works on problems at the interface of computer science and economics. His distinctions include the IJCAI Computers and Thought Award (2015), the Sloan Research Fellowship (2015), the NSF Faculty Early Career Development Award (2014), and the IFAAMAS Victor Lesser Distinguished Dissertation Award (2009); as well as multiple paper awards including Best Paper (2016) and Best Student Paper (2014) at the ACM Conference on Economics and Computation (EC). He is co-editor of the Handbook of Computational Social Choice (Cambridge University Press, 2016).

Timing "versus" Matching: Dynamic Matching on Stochastic Networks
Mohammad Akbarpour, Stanford

We introduce a dynamic analogue of the Erdos-Renyi random graph model and study the problem of finding the "maximum matching" in such a dynamic graph. In our model, agents arrive and depart stochastically, and the composition of the trade graph depends endogenously on the matching algorithm. We show that if the planner can identify agents who are about to depart, then waiting to thicken the market is highly valuable, and if the planner cannot identify such agents, then matching agents greedily is close to optimal. The planner's decision problem in our model involves a combinatorially complex state space. However, we show that simple local algorithms that choose the right time to match agents, but do not exploit the global graph structure, can perform close to complex optimal algorithms. Finally, we consider a setting where agents have private information about their departure times, and design a continuous-time dynamic mechanism to elicit this information.

Joint work with Shengwu Li and Shayan Oveis Gharan
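The greedy-versus-waiting contrast can be mimicked in a few lines (a toy simulation with made-up parameters, not the paper's model or analysis; here the planner either matches each agent on arrival or waits until agents become critical, i.e. about to depart):

```python
import random

def simulate(n_agents, p, d, patient, seed=7):
    """Match agents arriving over time; each stays at most d periods."""
    rng = random.Random(seed)
    pool = {}                        # agent id -> arrival time
    edges = set()                    # compatibility edges within pool
    matched = 0
    for t in range(n_agents):
        pool[t] = t                  # one arrival per period
        for other in list(pool):     # draw compatibility links
            if other != t and rng.random() < p:
                edges.add((min(t, other), max(t, other)))
        # greedy: try to match the newcomer now;
        # patient: only match agents about to depart ("critical")
        movers = ([a for a in pool if t - pool[a] >= d - 1]
                  if patient else [t])
        for a in movers:
            if a not in pool:
                continue
            partners = [b for b in pool if b != a and
                        (min(a, b), max(a, b)) in edges]
            if partners:
                b = partners[0]
                matched += 2
                del pool[a], pool[b]
        for a in [a for a in pool if t - pool[a] >= d - 1]:
            del pool[a]              # unmatched critical agents depart
    return matched

greedy = simulate(2000, 0.05, 10, patient=False)
waiting = simulate(2000, 0.05, 10, patient=True)
print(greedy, waiting)  # matched counts under each policy
```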

Bio: Mohammad Akbarpour is an Assistant Professor of Economics at Stanford GSB. He completed his PhD at Stanford University Economics department in 2015, then spent the 2015-2016 academic year at the University of Chicago Economics department, before coming back to Stanford in September 2016.

Network reporting methods for sample surveys
Dennis Feehan, UC Berkeley

Surveys have traditionally been based on the idea that researchers can estimate characteristics of a population by obtaining a sample of individuals and asking them to report about themselves. Network reporting surveys generalize this traditional approach by asking survey respondents to report about members of their personal networks. Network reporting surveys can be used to study many important rare and hidden populations for which traditional survey methods are inadequate; for example, the approach has been used to estimate the size of epidemiologically important groups like sex workers, drug injectors, and men who have sex with men. I will introduce a framework for developing estimators from network reporting surveys and then I will present some results from a nationally-representative survey experiment that my colleagues and I conducted in Rwanda.

Bio: Dennis Feehan is an Assistant Professor in the Department of Demography at UC Berkeley. He completed his PhD at Princeton University in 2015 and spent the fall/winter of 2015 as a postdoc at Facebook before starting at Berkeley last January.

Online Learning in Repeated Auctions
Philippe Rigollet, MIT

Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical. Comparing our performance against that of the best fixed bid in hindsight, we show that sublinear regret is also achievable in this case. For both the stochastic and adversarial models, we prove matching minimax lower bounds showing our strategies to be optimal up to lower-order terms. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.

Joint work with Jonathan Weed and Vianney Perchet.
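A toy version of the stochastic setting (illustrative only; the paper's strategies and regret analysis are more subtle): bid a running estimate of the good's value and update it only on wins, since the value is observed only when the good is purchased.

```python
import random

rng = random.Random(3)
v = 0.7                        # true value, unknown to the bidder
est, n_wins = 1.0, 0           # start with an optimistic estimate
for t in range(5000):
    rival = rng.random()       # highest competing bid this round
    if est >= rival:           # our bid wins the second-price auction
        obs = v + rng.uniform(-0.1, 0.1)   # noisy value, seen only on wins
        n_wins += 1
        est += (obs - est) / n_wins        # running mean of observations
print(round(est, 2), n_wins)   # estimate settles near the true value 0.7
```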

Bio: Philippe Rigollet

Near Feasible Stable Matchings
Rakesh Vohra, University of Pennsylvania

Stable matchings need not exist when there are complementarities in the preferences of one side. One instance where this might matter is the National Resident Matching Program (NRMP). With the participation of couples, who have preferences over pairs of hospital slots, stable matchings need not exist. Indeed, the problem of determining whether a stable matching exists in the presence of couples is NP-complete. For any student preferences, we show that each instance of a matching problem has a 'nearby' instance with a stable matching. The nearby instance is obtained by perturbing the capacities of the hospitals. Specifically, the capacity of each hospital changes (up or down) by no more than 4, and the total capacity of all hospitals does not decrease and does not increase by more than 9. These 'perturbation' bounds are independent of the size of the instance. We think the technique used to arrive at this result is of independent interest and should be useful elsewhere. It involves a combination of Scarf's lemma and randomized rounding.

Joint work with Thanh Nguyen

Bio: Professor Vohra is known for his application of mathematical programming techniques to the study of game theory and mechanism design. He formerly taught at Northwestern University, where he was the John L. and Helen Kellogg Professor of Managerial Economics and Decision Sciences in the Kellogg School of Management, with additional appointments in the Department of Economics and the Department of Electrical Engineering and Computer Science. He came to Penn as part of the Penn Integrates Knowledge program that President Amy Gutmann established in 2005 as a University-wide initiative to recruit exceptional faculty members whose research and teaching exemplify the integration of knowledge across disciplines. His appointment is shared between the Department of Economics in the School of Arts and Sciences and the Department of Electrical and Systems Engineering in the School of Engineering and Applied Science.

Optimization in Online Content Recommendation Services: Beyond Click-Through Rates
Yonatan Gur, Stanford Graduate School of Business

A new class of online services allows internet media sites to direct users from articles they are currently reading to other content they may be interested in. This process creates a “browsing path” along which there is potential for repeated interaction between the user and the provider, giving rise to a dynamic optimization problem. A key metric that often underlies this recommendation process is the click-through rate (CTR) of candidate articles. While CTR is a measure of instantaneous click likelihood, we analyze the performance improvement that one may achieve by some lookahead that accounts for the potential future path of users. To that end, using a large data set of user path history at major media sites, we introduce and derive a representation of content along two key dimensions: clickability, the likelihood to click to an article when it is recommended; and engageability, the likelihood to click from an article when it hosts a recommendation. We then propose a class of heuristics that leverage both clickability and engageability, and provide theoretical support for favoring such path-focused heuristics over myopic heuristics that focus only on clickability (no lookahead). We conduct a live pilot experiment that measures the performance of a practical proxy of our proposed class, when integrated into the operating system of a worldwide leading provider of content recommendations. We estimate the aggregate improvement in clicks-per-visit relative to the CTR-driven current practice. The documented improvement highlights the importance and practicality of efficiently accounting for the future path of users in real time.

Joint work with Omar Besbes and Assaf Zeevi.
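The clickability/engageability distinction can be illustrated with a toy one-step lookahead score (made-up numbers and a hypothetical avg_next_click parameter; not the paper's estimator): a myopic ranking uses clickability alone, while a path-aware score also credits an article for the clicks its own recommendation slot may generate next.

```python
def myopic_score(article):
    return article["click"]              # clickability only

def lookahead_score(article, avg_next_click):
    # also credit the clicks the article's own recommendation
    # slot may generate on the next step of the browsing path
    return article["click"] + article["engage"] * avg_next_click

articles = [
    {"name": "A", "click": 0.30, "engage": 0.05},  # clicky dead end
    {"name": "B", "click": 0.25, "engage": 0.60},  # engaging hub
]
avg_next_click = 0.25                    # hypothetical downstream CTR
myopic_pick = max(articles, key=myopic_score)
path_pick = max(articles, key=lambda a: lookahead_score(a, avg_next_click))
print(myopic_pick["name"], path_pick["name"])  # myopic picks A; lookahead picks B
```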

Bio: Yonatan Gur is an Assistant Professor of Operations, Information, and Technology at Stanford University’s Graduate School of Business. His research interests include stochastic modeling and optimization, data-driven decision-making, machine learning, and game theory with applications to revenue management, pricing, operations management, and service systems, and with a particular emphasis on dynamic, online environments. Yonatan received his PhD in Decision, Risk, and Operations from Columbia Business School. He also holds a B.Sc. from the school of Physics and Astronomy and an M.Sc. from the School of Mathematical Sciences, Tel Aviv University.

Modeling of Personal Mobility Behavioral Patterns
Irad Ben-Gal, Tel Aviv University

In recent years, personal location data has been continuously captured by mobile devices, GPS chips and other sensors. Such data provides a unique opportunity to learn about individuals’ mobility behavior, which can be used for various applications in transportation, marketing, homeland security, smart cities and more. Nonetheless, modeling such data poses new challenges related to data volume, diversity and inhomogeneity, as well as the required granularity level. In this talk, we will address a real ‘smart city’ use case and cover some of the associated opportunities and challenges. We will present a new set of mobility-behavior models that generalize Markov Chains and Variable-Order Bayesian Networks. We will discuss how they can be used in different applications such as pattern recognition, anomaly detection, clustering and classification.


The Impacts of Neighborhoods on Intergenerational Mobility
Raj Chetty, Stanford University

How can we improve economic opportunities for low-income children in the United States? We first characterize how rates of upward mobility vary across counties using data from tax records covering the U.S. population. We then study families who moved across areas to test whether the geographic variation is driven by the causal effects of place or by heterogeneity in the types of people who live in each place. We find that every year of exposure to a better environment improves a given child’s chances of success, both in a national quasi-experimental study of five million families and in a re-analysis of the Moving to Opportunity experiment. Finally, we present estimates of the causal effect of each county in America on upward mobility and characterize the features of high-mobility areas.

Bio: Raj Chetty joined Stanford as a Professor of Economics in 2015. Chetty's research combines empirical evidence and economic theory to help design more effective government policies. His current research focuses on equality of opportunity: how can we give children from disadvantaged backgrounds better chances of succeeding? Chetty's most recent honors include a MacArthur Fellowship and the John Bates Clark medal.

Orchestrating Collective Intelligence to Modernize Statistical Data Collection
Joe Reisinger, Premise Data

Transparency and access to information are critical to high-functioning economies. However, as the world becomes more interconnected, complex, and volatile, our capacity to perform robust statistical measurement is becoming increasingly fragmented. Premise is building machinery for performing evidence-based surveys at a global scale, leveraging a network of 30,000 data contributors across 34 countries to capture structured information using mobile technology. This talk will focus on the call for modernizing data collection and why the new frontier of mobile technology is the conduit for the future of business and economics. On the platform side, several open areas under active research will be highlighted: optimizing task allocation for quality and budget; designing for both exploration and exploitation; and combating fraud and collusion on a network.

Bio: Joe Reisinger is the CTO and co-founder of Premise, a mobile technology platform orchestrating and crowdsourcing the collection of micro-data for precise and robust economic measurement. He holds a PhD in Computer Science from the University of Texas and spent his academic career building natural language understanding systems at Google Research and IBM T.J. Watson. Prior to co-founding Premise, he was Chief Scientist at Metamarkets.

Scaling up the Web Transparency: Algorithms and Challenges
Augustin Chaintreau, Columbia University

“What are they doing with our data?” is what transparency ought to answer. The web giants – e.g., Google, Amazon, and Facebook, but also many other companies – leverage user data for personalization (recommendations, targeted advertisements, and, increasingly often, adjusted prices). Users currently have little insight, and at best coarse information, to monitor how and for which purposes their data are being used. What if instead one could tell you exactly which item - whether an email you wrote, a search you made, or a webpage you visited - has been used to decide on a targeted ad or a recommended product? But can we track our data at scale, and in environments we do not control? In this talk, we argue - unfortunately through real examples - that without web transparency, the exciting world opened up by the large-scale collection of personal data compromises our longstanding commitment to fairness and equal opportunities. Fortunately, as we will show, web transparency may be (at least partially) restored. We built the first fine-grained tracking system for personal data that scales, and proved it successful in multiple services (Gmail, Amazon, YouTube). As we characterize its performance, new hopes but also new challenges emerge for reconciling big data with accountability and societal progress. (Based on joint work with Mathias Lécuyer, Riley Spahn, Guillaume Ducoffe, and Roxana Geambasu.)

Bio: Augustin Chaintreau has been an Assistant Professor of Computer Science at Columbia University since 2010, where he directs the Mobile Social Lab. He designs algorithms that leverage social mobile behaviors and incentives to reconcile the risk and value of personal data networking. His research, addressing transparency in personalization, fairness in personal data markets, efficiency in social information sharing, cross-domain data linkage, and human mobility, has led to 25 papers in tier-1 conferences (five receiving best or best student paper awards at ACM CoNEXT, SIGMETRICS, USENIX IMC, IEEE MASS, and Algotel, some with large media coverage, such as the NYT blog). A former student of the École Normale Supérieure in Paris, he earned a Ph.D. in mathematics and computer science in 2006, the NSF CAREER Award in 2013, and the ACM SIGMETRICS Rising Star Award in 2013. He has been an active member of the network research community, serving on the program committees of ACM SIGMETRICS, SIGCOMM, WWW, CoNEXT (as chair), MobiCom, MobiHoc, IMC, WSDM, COSN, AAAI ICWSM, and IEEE Infocom, and as area editor for IEEE TMC and ACM SIGCOMM CCR.

Prediction Market Trading Strategies: Manipulation and Forecasting
David Rothschild, Microsoft Research

Prediction markets offer the promise of high-time-granularity studies of subjective expectations over major and minor events. Further, they help us design the future of markets and non-probability polling. In this paper, we illustrate the lessons from order books of real-money, election-related prediction markets that show persistent arbitrage opportunities. First, examining a randomized field trial of actual trades, we demonstrate that arbitrage opportunities are significantly larger than the order book indicates. Second, examining transaction-level data for all trades (nearly 300,000 by over 6,300 users) in one of the exchanges, we present evidence suggestive of market manipulation by a single large trader. Third, we demonstrate four additional price misalignments resulting from biased trading and implicitly asymmetric trading costs. Finally, we explain how markets are still accurate, detail how to improve the information flow in prediction and financial markets, and tie those lessons to the growing field of non-probability polling.
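
The simplest form of the arbitrage the abstract alludes to is worth making explicit: in a binary market paying $1 per winning contract, if the cheapest YES ask plus the cheapest NO ask sum to less than $1, buying one of each locks in a riskless profit. This is a generic sketch of that check, not code from the paper; the price values are made up.

```python
def arbitrage_profit(yes_asks, no_asks, payout=1.0):
    """Best ask prices for YES and NO contracts on the same event.
    If the cheapest YES plus the cheapest NO costs less than the payout,
    buying one of each guarantees a profit regardless of the outcome."""
    cost = min(yes_asks) + min(no_asks)
    return max(0.0, payout - cost)

# Order book with a 5-cent gap: YES at 0.62, NO at 0.33.
profit = arbitrage_profit([0.62, 0.65], [0.33, 0.40])
```

The paper's point is subtler: actual trades reveal larger opportunities than such order-book snapshots indicate, and transaction costs can make apparent gaps unexploitable.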

Bio: David Rothschild is an economist at Microsoft Research. He has a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania. His primary body of work is on forecasting, and understanding public interest and sentiment. Related work examines how the public absorbs information. He has written extensively, in both the academic and popular press, on polling, prediction markets, social media and online data, and predictions of upcoming events; most of his popular work has focused on predicting elections and an economist's take on public policy. He correctly predicted 50 of 51 Electoral College outcomes in February of 2012, an average of 20 of 24 Oscars from 2013–16, and 15 of 15 knockout games in the 2014 World Cup.

Collective Phenomena in Complex Networks: Coordination and Social Learning
Ali Jadbabaie, University of Pennsylvania

In this talk, I will present a high-level overview of my research in the past decade on collective behavior in networked, social, and engineering systems. These collective phenomena include social aggregation in animals, such as schooling, herding, and flocking, and, more broadly, the emergence of consensus, swarming, and synchronization in complex networks of interacting dynamic systems. A common underlying theme in this line of study is to understand how a desired global behavior, such as consensus, synchronization, or a particular formation, can emerge from purely local interactions. The evolution of these ideas into social systems has led to a new theory of collective decision making among strategic agents. Examples include participation decisions in uprisings, social cascades, and investment decisions in infrastructure. I will investigate distributed strategies for information aggregation, social learning, and detection problems in networked systems, where heterogeneous agents with different observations (of varying quality and precision) coordinate to learn a true state (e.g., finding aggregate statistics or detecting faults and failure modes in spatially distributed wireless sensor networks, or deciding the suitability of a political candidate, judging the quality of a product, and forming opinions on social issues of the day in social networks) using a stream of private observations and interaction with neighboring agents. I will end the talk with a description of contagion phenomena in networked systems and a new vision for graduate education at the interface of information and decision systems, data science, and the social sciences.
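
The "global behavior from purely local interactions" theme can be illustrated with the classic averaging dynamics (in the spirit of DeGroot-style models, not the speaker's specific results): each agent repeatedly replaces its value with the mean of its own and its neighbors' values, and a connected network converges to a common consensus value. The graph and initial opinions below are arbitrary examples.

```python
def consensus_step(values, neighbors):
    """One round of local averaging: each agent moves to the mean of
    its own value and its neighbors' values (a lazy random-walk update)."""
    return [
        (values[i] + sum(values[j] for j in neighbors[i])) / (1 + len(neighbors[i]))
        for i in range(len(values))
    ]

# Path graph 0-1-2-3 with heterogeneous initial opinions.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [0.0, 1.0, 2.0, 9.0]
for _ in range(200):
    x = consensus_step(x, neighbors)
# All agents converge to a common value: a degree-weighted average
# of the initial opinions (here (0*2 + 1*3 + 2*3 + 9*2) / 10 = 2.7).
```

The limit is a weighted average with weights proportional to 1 + degree, which is why the consensus value here is 2.7 rather than the simple mean 3.0.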

Bio: Ali Jadbabaie is the Alfred Fitler Moore Professor of Network Science in the department of Electrical and Systems Engineering at the University of Pennsylvania (Penn). He holds secondary appointments in the departments of Computer and Information Science (CIS) in Penn Engineering, and Operations, Information, and Decisions (OID) in the Wharton School. A faculty member in Penn’s (GRASP) Lab, he is also the co-founder and director of the Raj and Neera Singh Program in Networked & Social Systems (NETS) at Penn Engineering. NETS is a new undergraduate interdisciplinary degree program focused on network science and engineering, operations research, social phenomena and social and technological networks. From January to October 2015, Ali was on sabbatical at LIDS at MIT. From October 2015 to January 2016, he took a leave from Penn to lead the Sociotechnical Systems Research Center at MIT as its interim director. During this period, he served as the Associate Director of MIT’s newly formed Institute for Data, Systems and Society (IDSS), where he helped create a newly-approved PhD program in Social and Engineering Systems and shaped the intellectual agenda of the newly formed Institute. Ali received his B.S. with High Honors from Sharif University of Technology, his M.S. in Electrical and Computer Engineering from the University of New Mexico in Albuquerque, and his Ph.D. in Control and Dynamical Systems from the California Institute of Technology (Caltech). He was a postdoctoral scholar at Yale University for a year before joining the faculty of the University of Pennsylvania (Penn) in July 2002. Aside from GRASP Lab and the NETS program, Ali is also a faculty member of The Warren Center for Network & Data Sciences at Penn and a faculty affiliate of the Center for Technology, Innovation and Competition at Penn Law School. 
Ali is the inaugural editor-in-chief of IEEE Transactions on Network Science and Engineering, a new interdisciplinary journal sponsored by 5 IEEE Societies, an Associate Editor of the INFORMS journal Operations Research, and a former Associate Editor of IEEE Transactions on Control of Network Systems. He is a recipient of a Career Award from NSF, an ONR Young Investigator Award, the O. Hugo Schuck Best Paper Award of the American Automatic Control Council, and the George S. Axelby Best Paper Award of the IEEE Control Systems Society. He is also an IEEE Fellow. He has graduated 10 PhD students, and has served as mentor of 5 postdoctoral scholars who have become research leaders both in industry and academia. His current research interests include decision and control theory with a focus on distributed optimization and control, collective behavior, network science, and the study of collective behavior in engineering systems and social and economic networks.

Conversational Peers and Idea Generation: Evidence from a Field Experiment
Sharique Hasan, Stanford Graduate School of Business

Abstract: Social interaction is thought to affect individual and organizational innovation. We argue that individuals and teams are better able to generate high-quality ideas if they converse with peers who provide them with new insight, perspective, and information. Further, we argue that not all individuals can equally capitalize on this new information. Specifically, extroverted peers, because they are more willing to share and transmit their experiences, facilitate idea generation. Moreover, innovators who are open to experience absorb this new information better and can more effectively convert it to better ideas. We test our claims using novel data from a randomized field experiment at an entrepreneurship bootcamp where individuals were randomized to both teams and conversational peers. We find that conversations with extroverted peers help individuals generate higher quality ideas, more open individuals benefit more from such peers, and teams with more cohesion can convert these raw ideas into better performance.

Bio: Sharique Hasan is an Associate Professor of Organizational Behavior in the Graduate School of Business at Stanford University. His research focuses on understanding how social interaction affects processes such as learning, network dynamics, creativity and entrepreneurship. To study these processes he conducts field experiments.

Crowdsourcing Societal Tradeoffs
Vincent Conitzer, Duke University

It would be desirable if, as a society, we could reduce the amount of landfill trash we create, the amount of carbon dioxide we emit, the amount of forest we clear, etc. Since we cannot (or are in any case not willing to) simultaneously pursue all these objectives to their maximum extent, we must prioritize among them. Currently, this is done mostly in an ad-hoc manner, with people, companies, local governments, and other entities deciding on an individual basis which of these objectives to pursue, and to what extent.

A more systematic approach would be to set, at a global level, exact numerical tradeoffs: using one gallon of gasoline is as bad as creating x bags of landfill trash. Having such tradeoffs available would greatly facilitate decision making, and reduce inefficiencies resulting from inconsistent decisions across agents. But how could we arrive at a reasonable value for x? I will discuss this from the viewpoint of computational social choice.

Joint work with Rupert Freeman, Markus Brill, and Yuqian Li

Bio: Vincent Conitzer is the Kimberly J. Jenkins University Professor of New Technologies and Professor of Computer Science, Professor of Economics, and Professor of Philosophy at Duke University. He received Ph.D. (2006) and M.S. (2003) degrees in Computer Science from Carnegie Mellon University, and an A.B. (2001) degree in Applied Mathematics from Harvard University. His research focuses on computational aspects of microeconomics, in particular game theory, mechanism design, voting/social choice, and auctions. This work uses techniques from, and includes applications to, artificial intelligence and multiagent systems. Conitzer has received the Social Choice and Welfare Prize (2014), a Presidential Early Career Award for Scientists and Engineers (PECASE), the IJCAI Computers and Thought Award, an NSF CAREER award, the inaugural Victor Lesser dissertation award, an honorable mention for the ACM dissertation award, and several awards for papers and service at the AAAI and AAMAS conferences. He has also been named a Guggenheim Fellow, a Kavli Fellow, a Bass Fellow, a Sloan Fellow, and one of AI's Ten to Watch. Conitzer and Preston McAfee are the founding Editors-in-Chief of the ACM Transactions on Economics and Computation (TEAC).

Design and Analysis for Experiments in Networks
Johan Ugander, Stanford University

A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the Average Treatment Effect (ATE) of a new feature or condition by exposing a sample of the overall population to it. A drawback of A/B testing is that it is poorly suited to experiments involving social interference, where the treatment of individuals spills over to neighboring individuals along an underlying social network. This talk will describe a novel experimental framework, graph cluster randomization, for identifying average treatment effects under social interference. Given this framework, we analyze the variance of the average treatment effect as a property of the graph cluster design, and bias/variance trade-offs under exposure model misspecifications. This talk will discuss joint work with Lars Backstrom, Dean Eckles, Brian Karrer, Jon Kleinberg, and Joel Nishimura.
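
The core idea of graph cluster randomization can be sketched in a few lines: randomize treatment at the cluster level rather than per user, then estimate the ATE from "fully exposed" nodes, i.e., nodes whose entire neighborhood shares their assignment. This toy (with a seed chosen so both arms are populated, and a simple difference-in-means rather than the Horvitz–Thompson estimator analyzed in the paper) is an illustration, not the authors' implementation.

```python
import random

def cluster_randomized_ate(adj, clusters, outcome, seed=1):
    """Assign whole clusters to treatment/control, then average outcomes
    over fully exposed nodes -- nodes whose entire neighborhood shares
    their assignment -- and take the difference in means."""
    rng = random.Random(seed)
    assign_c = {c: rng.random() < 0.5 for c in sorted(set(clusters.values()))}
    z = {v: assign_c[clusters[v]] for v in adj}
    treated, control = [], []
    for v in adj:
        if all(z[u] == z[v] for u in adj[v]):  # full neighborhood exposure
            (treated if z[v] else control).append(outcome(v, z[v]))
    return sum(treated) / len(treated) - sum(control) / len(control)

# Two cliques used as clusters; outcome is 1.0 under treatment, 0.0 otherwise.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
ate = cluster_randomized_ate(adj, clusters, lambda v, t: 1.0 if t else 0.0)
```

With per-user randomization almost no node would be fully exposed; clustering along community boundaries is what makes the exposure condition attainable.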

Bio: Johan Ugander is an Assistant Professor of Management Science & Engineering at Stanford University. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other large-scale social data. Prior to joining Stanford he was a post-doctoral researcher at Microsoft Research, Redmond from 2014-2015, held an affiliation with the Facebook Data Science team from 2010-2014, and obtained his Ph.D. in Applied Mathematics from Cornell University in 2014.

Scalable Large Near-Clique Detection in Large-Scale Networks
Charalampos E. Tsourakakis, Harvard School of Engineering and Applied Sciences

Finding large near-(bi)cliques in massive networks is a notoriously hard problem of great importance to many applications, including anomaly detection and security, and community detection in social networks and the Web graph. But can we exploit idiosyncrasies of real-world networks to attack this NP-hard problem successfully? In this talk I will answer this question in the affirmative. Specifically, I will present a significant advance in routines with rigorous theoretical guarantees for the scalable extraction of large near-cliques from large-scale networks: the k-clique densest subgraph problem. I will present exact, approximation, and MapReduce algorithms that allow us to scale our computations to large-scale networks. At the heart of our scalable algorithm lies a densest-subgraph sparsifier theorem. Our approach allows us to extract large near-(bi)cliques from a diverse collection of networks with millions of edges within a few seconds. Finally, I will present interesting graph mining findings, such as anomaly detection in real-world networks. I will conclude my talk with some interesting open problems.
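
The k-clique densest subgraph problem generalizes the classic (edge-)densest subgraph problem, for which Charikar's greedy peeling gives a 1/2-approximation in linear time. The sketch below implements that classic baseline, not the talk's k-clique algorithms, as a concrete reference point.

```python
def densest_subgraph_peel(adj):
    """Charikar's greedy peeling: repeatedly delete a minimum-degree node,
    tracking the node set of highest average degree density |E|/|V|.
    Guarantees at least half the optimum density."""
    adj = {v: set(ns) for v, ns in adj.items()}
    m = sum(len(ns) for ns in adj.values()) // 2
    best_density, best_nodes = 0.0, set(adj)
    while adj:
        density = m / len(adj)
        if density > best_density:
            best_density, best_nodes = density, set(adj)
        v = min(adj, key=lambda u: len(adj[u]))  # peel a min-degree node
        m -= len(adj[v])
        for u in adj.pop(v):
            adj[u].discard(v)
    return best_nodes, best_density

# A 4-clique with a pendant node: the clique is the densest part (6/4 = 1.5).
g = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4], 4: [3]}
nodes, density = densest_subgraph_peel(g)
```

Replacing edge counts with k-clique counts in the density objective is what biases the optimum toward large near-cliques, which is the direction the talk pursues.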

Bio: Dr. Charalampos Tsourakakis is currently a CRCS Postdoctoral Fellow in the School of Engineering and Applied Sciences (SEAS) at Harvard University. He received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University (CMU). He also holds a Master of Science from the Machine Learning Department at CMU. He did his undergraduate studies in the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA). He is the recipient of a best paper award in IEEE Data Mining and has designed two graph mining libraries for tera-scale graphs. The former has been officially included in Windows Azure, and the latter was a research highlight of Microsoft Research. His main research interest lies in designing scalable and data-aware algorithms for massive networks that recover hidden structure and solve complex data-driven problems.

Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment
Michael Luca, Harvard Business School

Online marketplaces increasingly choose to reduce the anonymity of buyers and sellers in order to facilitate trust. We demonstrate that this common market design choice results in an important unintended consequence: racial discrimination. In a large-scale field experiment on Airbnb, we find that requests from guests with distinctively African-American names are roughly 16% less likely to be accepted than identical guests with distinctively White names. The difference persists whether the host is African American or White, male or female. The difference also persists whether the host shares the property with the guest or not, and whether the property is cheap or expensive. Discrimination is costly for hosts who indulge in it: hosts who reject African-American guests are able to find a replacement guest only 35% of the time. On the whole, our analysis suggests a need for caution: while information can facilitate transactions, it also facilitates discrimination. Joint work with Benjamin Edelman and Dan Svirsky.

Bio: Michael Luca is an assistant professor of business administration at Harvard Business School. Professor Luca’s research applies econometric methods to field data in order to study the impact of information in market settings. He works closely with governments and companies to identify, design, implement, and experiment with different approaches to information disclosure, taking into account the behavioral foundations of how people make decisions. He has done work on rankings, expert reviews, online consumer reviews, and quality disclosure laws, among other types of information provision. Professor Luca has ongoing field research with Facebook, Yelp, several colleges, the UK government, and several U.S. cities, in addition to other partners. His current work focuses on crowdsourced reviews, analyzing a variety of companies including Yelp, Amazon, and ZocDoc. His findings have been written and blogged about in such media outlets as The Wall Street Journal, The New York Times, The Washington Post, The Huffington Post, Chicago Tribune, Harvard Business Review, PC World Magazine, and Salon. Professor Luca received his Ph.D. in economics from Boston University and a bachelor’s degree in economics and mathematics from SUNY Albany.

Incentivizing Exploration
Bobby Kleinberg, Cornell

An important recent theme in the development of on-line social systems is the potential of crowdsourced effort to solve large problems — defining tasks in which many people can each contribute a small amount of time to the overall goal. In some cases the arrangement is based on a direct compensation scheme, in which a (low) rate is paid for each unit of work performed. But in many settings one only has access to a crowd "in the wild", as they go about their everyday activities. Examples include product recommendations, social news readers, and scientific activities ranging from crowdsourced "citizen science" to the funding of research by national agencies.

In all of these domains, there is a problem of misaligned incentives: the designer's goal is to carry out exploration (of the space of news stories, products, bird habitats, etc.) as efficiently as possible, but for reasons of scale they must implement the exploration via a crowd composed of members who each derive their own utility from participating in the exploration. We model this as a multi-armed bandit problem in which selfish, myopic agents pull arms with publicly observable outcomes, and a principal seeking to maximize the cumulative time-discounted reward may influence the agents by offering monetary (or other) rewards contingent on choosing particular actions. Our main result is a full characterization of the trade-off between the expected payments the principal must make and the total reward that can be achieved.

Joint work with Peter Frazier, David Kempe, and Jon Kleinberg.
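
A tiny simulation conveys the incentive problem and the role of payments (this is a toy in the spirit of the model, not the paper's characterization; the arm means, bonus size, and "subsidize the under-explored arm" policy are assumptions). Myopic agents herd on whichever arm looks best from public history; a small payment on the under-explored arm breaks the herd.

```python
import random

def run_market(n_agents, bonus, seed=0):
    """Myopic agents pick the arm maximizing (empirical mean + offered bonus).
    The principal pays a bonus on the less-explored arm to induce exploration;
    outcomes are public, so later agents inherit the learning."""
    rng = random.Random(seed)
    true_means = [0.2, 0.7]           # arm 1 is actually much better
    pulls, wins = [1, 0], [1, 0]      # arm 0 starts with one lucky public pull
    for _ in range(n_agents):
        offered = [0.0, 0.0]
        if pulls[1] < pulls[0]:       # subsidize the under-explored arm
            offered[1] = bonus
        means = [wins[a] / pulls[a] if pulls[a] else 0.0 for a in range(2)]
        a = max(range(2), key=lambda i: means[i] + offered[i])
        r = 1 if rng.random() < true_means[a] else 0
        pulls[a] += 1
        wins[a] += r
    return pulls

no_pay = run_market(200, bonus=0.0)   # agents herd on arm 0's lucky start
with_pay = run_market(200, bonus=0.6) # payments break the herd
```

Without payments the better arm is never tried, exactly the misalignment the talk formalizes; the paper's contribution is characterizing the optimal trade-off between total payments and achievable reward.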

Bio: Bobby Kleinberg is an Associate Professor of Computer Science at Cornell University. His research studies the design and analysis of algorithms, and their relations to economics, learning theory, and networks. Prior to receiving his doctorate from MIT in 2005, he spent three years at Akamai Technologies, where he assisted in designing the world's largest Internet Content Delivery Network. He is the recipient of a Microsoft Research Faculty Fellowship, an Alfred P. Sloan Foundation Fellowship, and an NSF CAREER Award.

Distributed Algorithms for Big Graph Analytics
Alex Dimakis, UT Austin

Many datasets from the web, social networks, bioinformatics, and neuroscience are naturally graph-structured. Graph algorithms are difficult to implement in distributed computation frameworks like Hadoop MapReduce, and for this reason several in-memory graph engines like Pregel, GraphLab, and GraphX are currently being developed. Our goal is to design parallelizable algorithms that discover graph structural properties and to theoretically analyze their performance. We give an overview of the emerging problems in designing distributed algorithms over such graph engines and discuss future directions.
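
Engines like Pregel popularized the vertex-centric, bulk-synchronous model: in each superstep, every vertex reads messages from its neighbors and updates its state, and the computation halts when nothing changes. A minimal single-machine sketch of that pattern (not any particular engine's API) is connected components via min-label propagation:

```python
def connected_components(adj):
    """Pregel-style bulk-synchronous min-label propagation: every superstep,
    each vertex adopts the smallest label seen in its neighborhood; the
    algorithm halts when no label changes. Each component ends up labeled
    by its smallest vertex id."""
    label = {v: v for v in adj}
    changed = True
    while changed:
        changed = False
        # Superstep: gather all messages from the current labels ...
        incoming = {v: [label[u] for u in adj[v]] for v in adj}
        # ... then apply every update synchronously.
        for v in adj:
            best = min(incoming[v] + [label[v]])
            if best < label[v]:
                label[v] = best
                changed = True
    return label

# Two components: a triangle {0, 1, 2} and an edge {3, 4}.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
labels = connected_components(adj)
```

The distributed versions partition the vertex set across machines and exchange the `incoming` messages over the network, which is where the analysis challenges the talk discusses arise.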

Bio: Alex Dimakis is an Assistant Professor at the Electrical and Computer Engineering department, University of Texas at Austin. From 2009 until 2012 he was with the Viterbi School of Engineering, University of Southern California. He received his Ph.D. in 2008 and M.S. degree in 2005 in electrical engineering and computer sciences from UC Berkeley and the Diploma degree from the National Technical University of Athens in 2003. During 2009 he was a CMI postdoctoral scholar at Caltech. He received an NSF Career award in 2011, a Google faculty research award in 2012 and the Eli Jury dissertation award in 2008. He is the co-recipient of several best paper awards including the joint Information Theory and Communications Society Best Paper Award in 2012. He is currently serving as an associate editor for IEEE Signal Processing letters. His research interests include information theory, coding theory, signal processing, and networking, with a current focus on distributed storage and machine learning.

Estimating the Causal Impact of Recommendation Systems from Observational Data
Jake Hofman, MSR NYC

Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world's most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users over a 9-month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation click-throughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion about the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.
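
The instrumental-variable logic can be made concrete with the textbook Wald estimator for a binary instrument (here, whether a product experienced a direct-traffic shock): divide the instrument's effect on the outcome by its effect on the exposure. The toy data below is fabricated for illustration and is much simpler than the paper's shock-based design.

```python
def wald_iv(z, d, y):
    """Wald/IV estimate with a binary instrument z: the effect of exposure d
    on outcome y is the ratio of z's effect on y to z's effect on d.
    Valid only if z moves y solely through d (exclusion restriction)."""
    def mean(xs):
        return sum(xs) / len(xs)
    y1 = mean([yi for zi, yi in zip(z, y) if zi == 1])
    y0 = mean([yi for zi, yi in zip(z, y) if zi == 0])
    d1 = mean([di for zi, di in zip(z, d) if zi == 1])
    d0 = mean([di for zi, di in zip(z, d) if zi == 0])
    return (y1 - y0) / (d1 - d0)

# Toy data: the shock z raises exposure d by 0.25 on average,
# and each unit of d adds 2.0 to the outcome y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 1, 0, 1, 1, 1, 0, 1]
y = [0, 2, 0, 2, 2, 2, 0, 2]
effect = wald_iv(z, d, y)
```

The key identifying assumption in the paper is exactly the exclusion restriction hedged in the docstring: the shock must move recommendation click-throughs only via direct traffic to the shocked product.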

Bio: Jake Hofman is a Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to large-scale social data. Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.

Competitive Algorithms from Competitive Equilibria
Kamesh Munagala, Duke University

Scheduling jobs is a fundamental problem that arises in numerous forms and in various situations. In a typical scheduling problem, jobs of unknown duration arrive in an online fashion. The rate at which jobs get processed at any time instant depends on the resources allocated by the scheduler to them. We introduce and study a general scheduling problem that we term the Packing Scheduling problem (PSP), where these rates are subject to arbitrary packing constraints. The PSP framework captures a variety of scheduling problems that arise in data center computing and multi-core architectures. More concretely, PSP models multidimensional resource requirements and parallelizability, as well as network bandwidth requirements arising in scheduling jobs on shared clusters. It also captures many classical scheduling problems such as heterogeneous machine scheduling, broadcast scheduling, and scheduling jobs of different parallelizability.

For the PSP problem we show a constant competitive algorithm for minimizing total weighted completion time, when job arrivals and durations are unknown and adversarially chosen. Surprisingly, we achieve this result by applying the well-known Proportional Fairness algorithm from the economics literature to allocate resources to jobs at each time instant. Proportional Fairness has been extensively studied in the context of maximizing fairness in resource allocation; however, we present one of the first analyses in the context of scheduling. We also show that this algorithm has good theoretical guarantees for minimizing total delay (latency or flow time) in several special cases. Our results showcase the advantage of viewing complex scheduling problems as sequential resource allocation problems, and applying ideas from economics for both algorithm design and analysis.

Joint work with Sungjin Im (UC Merced) and Janardhan Kulkarni (Duke).
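
For the simplest packing constraint, a single shared capacity, Proportional Fairness has a closed form worth seeing: maximizing the weighted sum of log rates allocates capacity in proportion to the weights (from the KKT condition w_i / x_i = λ). This is a sketch of that special case only; general PSP constraints require solving the convex program.

```python
import math

def proportional_fairness(weights, capacity):
    """Proportional Fairness under a single shared capacity: maximizing
    sum_i w_i * log(x_i) subject to sum_i x_i <= capacity yields the
    closed form x_i = w_i * capacity / sum(w)."""
    total = sum(weights)
    return [w * capacity / total for w in weights]

def pf_objective(weights, rates):
    """The Nash-bargaining-style objective PF maximizes."""
    return sum(w * math.log(x) for w, x in zip(weights, rates))

# Three jobs sharing 12 units of processing rate.
rates = proportional_fairness([1.0, 2.0, 3.0], 12.0)  # proportional split
```

One can check numerically that the proportional split beats any other feasible allocation, e.g., the equal split [4, 4, 4], on the log-utility objective.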

Bio: Kamesh Munagala is an Associate Professor of Computer Science at Duke University. He obtained his Ph.D. from Stanford University in 2003 and B.Tech. from IIT Bombay in 1998. He is primarily interested in approximation algorithms, sequential decision theory, and computational economics, as well as applications of these techniques to e-commerce, databases, and social networks. He is a recipient of the NSF CAREER Award and the Alfred P. Sloan Research Fellowship, and is a co-author of the best paper at WWW 2009. He was a Visiting Research Professor at Twitter in 2012, and currently serves as the Director of Graduate Studies for the Duke CS department.

Physics-inspired Algorithms and Phase Transitions in Community Detection
Cris Moore, Santa Fe Institute

Detecting communities, and labeling nodes according to which community they belong to, is an essential problem in the study of social and biological networks. One of the best and fastest algorithms for this problem is Belief Propagation, which updates probability distributions of node labels until they reach a fixed point. I'll give a friendly introduction to Belief Propagation, and the analogy with the cavity method of statistical physics. Don't worry, no physics is required!

In addition to being practically useful, our Belief Propagation algorithm can be studied analytically, revealing phase transitions in the ability of any algorithm to find communities. It also gives us principled ways to tell when communities are statistically significant, unlike algorithms that "overfit" and find illusory structure even in random networks. It can be extended to dynamic networks where nodes switch communities over time. Finally, I'll argue that the consensus of many good solutions is often better than the "best" single solution — and that this is true for many real-world optimization problems.

This is joint work with Aurelien Decelle, Florent Krzakala, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborova, and Pan Zhang.
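
A toy version of Belief Propagation for a two-group stochastic block model can be written in a few dozen lines. This sketch treats every node pair as a factor (edges pull endpoints together when p_in > p_out, non-edges push them apart), which only scales to tiny graphs; the practical algorithm studied by the speakers handles non-edges with a mean-field term instead. All parameters below are illustrative assumptions.

```python
import random

def bp_communities(n, edges, p_in=0.9, p_out=0.1, iters=100, seed=3):
    """Loopy belief propagation for a two-group stochastic block model on a
    tiny graph. Messages live on ordered node pairs, are damped to aid
    convergence, and marginals are products of incoming messages."""
    edge_set = {frozenset(e) for e in edges}

    def psi(i, j, same):
        p = p_in if same else p_out
        return p if frozenset((i, j)) in edge_set else 1.0 - p

    rng = random.Random(seed)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    msg = {}
    for pair in pairs:
        a = 0.5 + 0.1 * (rng.random() - 0.5)  # random init breaks symmetry
        msg[pair] = [a, 1.0 - a]
    for _ in range(iters):
        new = {}
        for (i, j) in pairs:
            out = []
            for sj in (0, 1):
                total = 0.0
                for si in (0, 1):
                    prod = psi(i, j, si == sj)
                    for k in range(n):
                        if k != i and k != j:
                            prod *= msg[(k, i)][si]
                    total += prod
                out.append(total)
            z = out[0] + out[1]
            # Damping: average the old message with the updated one.
            new[(i, j)] = [0.5 * msg[(i, j)][s] + 0.5 * out[s] / z for s in (0, 1)]
        msg = new
    labels = []
    for i in range(n):
        b = [1.0, 1.0]
        for k in range(n):
            if k != i:
                b = [b[s] * msg[(k, i)][s] for s in (0, 1)]
        labels.append(0 if b[0] >= b[1] else 1)
    return labels

# Two triangles joined by a single edge: a planted two-community graph.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
labels = bp_communities(6, edges)
```

On this graph the fixed point separates the two triangles (up to a global label swap), and the near-deterministic marginals are what licenses calling the communities statistically significant, in the sense the talk describes.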

Bio: Cristopher Moore received his B.A. in Physics, Mathematics, and Integrated Science from Northwestern University, and his Ph.D. in Physics from Cornell. He has published over 120 papers at the boundary between physics and computer science, ranging from quantum computing, to phase transitions in NP-complete problems, to the theory of social networks and efficient algorithms for analyzing their structure. With Stephan Mertens, he is the author of The Nature of Computation, published by Oxford University Press. He is a Professor at the Santa Fe Institute.

In 2014, Cris Moore was named a Fellow of the American Physical Society for fundamental contributions at the interface of nonlinear physics, statistical physics, and computer science, including complex network analysis, phase transitions in NP-complete problems, and the computational complexity of physical simulation.

Optimal Contracts for Intermediaries in Online Advertising
Ozan Candogan, Duke Fuqua Business School

In online display advertising, the prevalent method advertisers employ to acquire impressions is to contract with an intermediary. These contracts involve upfront payments made by the advertisers to the intermediary, in exchange for running campaigns on their behalf. This paper studies the optimal contract offered by the intermediary in a setting where advertisers’ budgets and targeting criteria are private. This problem can naturally be formulated as a multi-dimensional dynamic mechanism design problem, which in general is hard to solve. We tackle this problem by employing a novel performance space characterization technique, which relies on delineating the expected cost and value achievable by any feasible (dynamic) bidding policy. This technique provides a convex optimization formulation of the optimal contract design problem. Using this formulation, we obtain a closed-form characterization of the optimal contract when advertisers have identical value distributions. When advertisers are heterogeneous, by contrast, we provide a duality-based approach, which reduces the optimal contract design problem to a simpler convex optimization problem. The intermediary’s bid in the optimal contract is obtained by first using the optimal dual solution to compute a weighted average of the values associated with different types (to guarantee that the advertiser reports her type truthfully), and then shading this quantity (to account for budget constraints). Our results indicate that an intermediary can profitably provide bidding service to a budget-constrained advertiser, and at the same time increase the overall market efficiency.

Joint work with Santiago Balseiro (Duke University).

Bio: Ozan Candogan is an assistant professor at the Fuqua School of Business, where he is a member of the Decision Sciences area. Prior to joining Fuqua, he received his Ph.D. and M.S. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. His research interests are in mechanism design, decision making in social and economic networks, and analysis of dynamic strategic interactions. His most recent work focuses on the design of novel iterative auction mechanisms for efficient allocation of resources, and he was a finalist in the 2013 George Nicholson Student Paper Competition. Potential applications of his research include novel auction formats that can be employed for procurement, pricing and advertising mechanisms that optimally utilize available social network information, and pricing mechanisms for selling cloud computing services. His research has appeared in journals such as Management Science, Operations Research, Mathematics of Operations Research, and Games and Economic Behavior. He is also a recipient of the 2012 Microsoft Research Ph.D. fellowship and the 2009 Siebel Scholarship.

Randomized Field Experiments in Mobile Marketing
Anindya Ghose, NYU Stern

The explosive growth of smartphones and location-based services (LBS) has contributed to the rise of mobile advertising. In this talk, we present results from multiple randomized field experiments that are designed to measure the effectiveness of mobile marketing and promotions. In the first experiment, using data from a location-based app for smartphones, we measure the effectiveness of mobile coupons. The aim is to analyze the causal impact of the geographical distance between a user and a retail store, the display rank, and coupon discounts on consumers’ response to mobile coupons. In a second large-scale field study, we examine the role of contextual crowdedness on the redemption rates of mobile coupons. We find that people become increasingly engaged with their mobile devices as trains get more crowded, and in turn become more likely to respond to targeted mobile messages. The change in behavior can be accounted for by the phenomenon of “mobile immersion”: to psychologically cope with the loss of personal space in crowded trains and to avoid accidental gazes, commuters can escape into their personal mobile space. In turn, they become more involved with targeted mobile messages they receive, and, consequently, are more likely to make a purchase in crowded trains. These studies causally show that mobile advertisements based on real-time geographical location and contextual information can significantly increase consumers’ likelihood of redeeming a geo-targeted mobile coupon. However, beyond the location and contextual information, the overall mobile trajectory of each individual consumer can provide even richer information about consumer preferences. In a third study, we propose a new mobile advertising strategy that leverages full information on consumers’ offline moving trajectories.
To examine the effectiveness of this new mobile trajectory-based advertising strategy, we designed a large-scale randomized field experiment in one of the largest shopping malls in the world. We find that mobile trajectory-based advertising can lead to the highest redemption probability, the fastest redemption, and the highest customer satisfaction at the focal advertising store. Various practical implications for mobile marketing are discussed.

Bio: Anindya Ghose is a Professor of IT and a Professor of Marketing at New York University's Leonard N. Stern School of Business. He is the co-Director of the Center for Business Analytics at NYU Stern. He is the Robert L. & Dale Atkins Rosen Faculty Fellow and a Daniel P. Paduano Fellow of Business Ethics at NYU Stern. He has been a Visiting Associate Professor at the Wharton School of Business. He also serves as the Scientific Advisor to 3TI China. He was recently named by Business Week as one of the "Top 40 Professors Under 40 Worldwide" and by Analytics Week as one of the "Top 200 Thought Leaders in Big Data and Business Analytics". His research has been recognized by 10 best paper awards and nominations. He has published more than 75 papers in premier scientific journals and peer reviewed conferences, and has given more than 200 talks internationally. His research has been profiled and he has been interviewed numerous times in the BBC, Bloomberg TV, CNBC, China Daily, Financial Times, Fox News, Forbes, Knowledge@Wharton, Korean Broadcasting News Company, Los Angeles Times, MSNBC, NBC, New York Times, New York Daily, National Public Radio, NHK Japan Broadcasting, Reuters, Time Magazine, Washington Post, Wall Street Journal, Xinhua, and elsewhere.

Scalable mining of massive networks: Distance-based Centrality, Similarity, and Influence
Edith Cohen, Tel-Aviv University

Graphs are ubiquitous in representing interactions between entities: communication, social relations, clicks, likes, follows, views, hyperlinks, transactions, and more. Structural properties of a corresponding graph successfully model fundamental properties of entities and their relations with applications to ranking, recommendations, attribute completion, and clustering.

The shortest-path and reachability relations are the basis of natural measures of centrality, influence, and similarity. The measures can be defined over a static set of edges or, for robustness, over a distribution derived from the set of interactions.

In the talk, I will first present and motivate a unified view of distance-based measures. I will then overview algorithmic and estimation tools from my own work which support scalable computation of these measures on massive networks. The approach is based on computing sample-based sketches which capture the relation of each node to all others. The sketches, with appropriate carefully derived estimators, are then used to provide quick high-confidence approximate answers for similarity, centrality, and influence queries.
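One ingredient of such sample-based sketches is the bottom-k (min-hash) summary. As a rough illustration rather than the talk's actual algorithms, the following hypothetical sketch shows the cardinality estimator such summaries rely on: assign each node an independent uniform random rank, keep only the k smallest ranks seen, and estimate the number of distinct items from the k-th smallest rank.

```python
import random

def bottom_k_sketch(items, rank, k):
    """Keep the k smallest random ranks observed among the items."""
    return sorted(rank[x] for x in set(items))[:k]

def estimate_size(sketch, k):
    """Estimate the number of distinct items from a bottom-k sketch.

    If the sketch is not full, it holds every distinct rank exactly;
    otherwise (k - 1) / r_k, where r_k is the k-th smallest rank,
    is the standard unbiased cardinality estimate.
    """
    if len(sketch) < k:
        return float(len(sketch))
    return (k - 1) / sketch[k - 1]

random.seed(7)
rank = {x: random.random() for x in range(10_000)}  # one rank per node
sketch = bottom_k_sketch(range(10_000), rank, k=512)
assert abs(estimate_size(sketch, 512) - 10_000) < 2_000
```

In the full technique, a sketch of this kind is maintained per node over the nodes it can reach (or reach within a given distance), built by pruned Dijkstra-style searches that merge summaries; the snippet above only illustrates the estimator at the core.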

An advantage of distance-based measures over those based on the full path ensemble (random walks/PageRank, Katz, resistance) is the combination of high scalability and applicability to asymmetric (directed) interactions, which are often harder to work with.

Some of the material is based on joint work performed at the MSR Silicon Valley Lab, with Daniel Delling, Andrew Goldberg, Moises Goldszmidt, Fabian Fuchs, Thomas Pajor, and Renato Werneck.

Bio: Edith Cohen is a (visiting) full professor at Tel Aviv University. Until 2014 she was a Principal Researcher at Microsoft Research (Silicon Valley), and before that, between 1991 and 2012, she was at AT&T Labs (initially AT&T Bell Laboratories). She was a visiting professor at UC Berkeley in 1997. She received a Ph.D. in Computer Science from Stanford University in 1991. Her research interests include algorithms, mining and analysis of massive data, optimization, and computer networking. She is a winner of the IEEE ComSoc 1997 Bennett prize, and an author of 20+ patents and 100+ publications.

Quadratic Voting
Glen Weyl, University of Chicago/MSR

While the one-person-one-vote rule often leads to the tyranny of the majority, alternatives proposed by economists have been complex and fragile. By contrast, we argue that a simple mechanism, Quadratic Voting (QV), is robustly very efficient. Voters making a binary decision purchase votes from a clearinghouse paying the square of the number of votes purchased. If individuals take the chance of a marginal vote being pivotal as given, like a market price, QV is the unique pricing rule that is always efficient. In an independent private values environment, any type-symmetric Bayes-Nash equilibrium converges towards this efficient limiting outcome as the population grows large, with inefficiency decaying as 1/N. We use approximate calculations, which match our theorems in this case, to illustrate the robustness of QV, in contrast to existing mechanisms. We discuss applications in both (near-term) commercial and (long-term) social contexts.
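As a toy sketch of the price-taking logic described above (my illustration, not the authors' code): a voter with net value v who takes the per-vote pivot probability p as given maximizes v·p·n − n² over the number of votes n, so she buys n* = v·p/2 votes (negative n meaning votes against). Purchases are linear in values, so the sign of the vote total matches the sign of the welfare total.

```python
import random

def optimal_votes(value, pivot_prob):
    # A price-taking voter maximizes value * pivot_prob * n - n**2;
    # the first-order condition gives n* = value * pivot_prob / 2,
    # i.e. votes bought are proportional to the voter's value.
    return value * pivot_prob / 2.0

def qv_outcome(values, pivot_prob=0.05):
    # Because each purchase is linear in value, the net vote has the
    # same sign as the sum of values: the price-taking QV outcome is
    # utilitarian-efficient.
    net = sum(optimal_votes(v, pivot_prob) for v in values)
    return 1 if net > 0 else -1

random.seed(1)
values = [random.gauss(0.1, 1.0) for _ in range(1001)]
assert qv_outcome(values) == (1 if sum(values) > 0 else -1)
```

The equilibrium analysis in the talk is about what happens when voters account for how their purchases move the pivot probability; this sketch only captures the limiting price-taking benchmark.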

Bio: Glen Weyl is a Researcher at Microsoft Research New England, an Assistant Professor of Economics and Law at the University of Chicago and an Alfred P. Sloan Research Fellow. He was Valedictorian of Princeton University as an undergraduate in 2007, received his PhD also from Princeton in 2008 and then spent three years as a Junior Fellow at the Harvard Society of Fellows before joining the faculty at Chicago. His research focuses on pure and applied price theory, especially applications to industrial and tax policy, as well as the intersection between economics and related fields such as law and philosophy. Outside his academic work, Glen co-founded a start-up commercializing Quadratic Voting and consults for platform start-ups through Applico Inc.

Auctions and Decentralization in Complex Networks
Paul Milgrom, Stanford

Economists' analyses are most often based on simplifying assumptions that enable the analyst to focus on the roles of prices, substitution, and decentralized decision making. Computer scientists most often take a sharply different approach, focusing on worst cases and general algorithmic treatments, in which simple economic structures are mostly hidden from view. We combine elements from these two perspectives in an attempt to develop solutions for high-stakes auction market design problems, especially the reallocation of radio spectrum in North America.

Bio: Paul Milgrom is the Shirley and Leonard Ely professor of Humanities and Sciences in the Department of Economics at Stanford University and professor, by courtesy, at the Stanford Graduate School of Business. He is a member of both the National Academy of Sciences and the American Academy of Arts and Sciences and a winner of the 2008 Nemmers Prize in Economics and the 2012 BBVA Frontiers of Knowledge award.

He is best known for his contributions to microeconomic theory, his pioneering innovations in the practical design of multi-item auctions, and the extraordinary successes of his students and academic advisees. According to his BBVA Award citation: “Paul Milgrom has made seminal contributions to an unusually wide range of fields of economics including auctions, market design, contracts and incentives, industrial economics, economics of organizations, finance, and game theory.” According to a count by Google Scholar, Milgrom’s books and articles have received more than 60,000 citations.

Towards optimal algorithms for prediction with expert advice
Balasubramanian Sivan, Microsoft Research

We study the classic problem of prediction with expert advice. Focusing on settings with a constant number of experts, we develop optimal algorithms and obtain precisely optimal regret values for the case of 2 and 3 experts. Further, we develop regret lower bounds for the multiplicative weights algorithm that exactly match the known upper bounds for an arbitrary number of experts k. This establishes a constant factor separation between the regrets achieved by the optimal algorithm and the multiplicative weights algorithm for a constant number of experts.

The optimal algorithms have a crisp interpretation. For instance, in the case of 2 experts, the optimal algorithm follows the advice of a particular expert with exactly the probability that the expert will turn out to be the best expert.

Our main tool is the minimax principle, which lets us analyze the optimal adversary to compute optimal regret values. While analyzing the optimal adversary, we establish deep connections with non-trivial aspects of random walks. We further use this connection to develop an improved regret bound for the case of 4 experts.
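For context, the multiplicative weights (Hedge) baseline that the regret lower bounds refer to can be sketched in a few lines. This is an illustrative implementation under standard assumptions (losses in [0, 1], a fixed learning rate), not the talk's optimal algorithm:

```python
import math

def multiplicative_weights(loss_rounds, eta=0.5):
    """Hedge / multiplicative weights over k experts.

    loss_rounds: per-round loss vectors in [0, 1]^k.
    Returns (algorithm_loss, best_expert_loss); their difference is
    the regret the talk's lower bounds concern.
    """
    k = len(loss_rounds[0])
    weights = [1.0] * k
    alg_loss, cum = 0.0, [0.0] * k
    for losses in loss_rounds:
        total = sum(weights)
        # Follow expert i with probability weights[i] / total.
        alg_loss += sum(w * l for w, l in zip(weights, losses)) / total
        for i, l in enumerate(losses):
            cum[i] += l
            weights[i] *= math.exp(-eta * l)
    return alg_loss, min(cum)

# Alternating losses leave both experts equally good, yet MW keeps
# paying extra each round -- the source of its constant-factor gap.
alg, best = multiplicative_weights([[0.0, 1.0], [1.0, 0.0]] * 10)
assert best == 10.0 and alg >= best
```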

Joint work with Nick Gravin and Yuval Peres

Bio: Balasubramanian Sivan is a postdoctoral researcher at Microsoft Research, Redmond. He completed his PhD in Computer Science at the University of Wisconsin-Madison in 2013. His PhD thesis on “Prior Robust Optimization” was awarded the 2014 ACM SIGecom doctoral dissertation award and the University of Wisconsin-Madison Computer Science department’s outstanding graduate student researcher award. His research interests are in Algorithmic Game Theory, online and approximation algorithms, and expert learning.

Online A-B Testing
Vivek Farias, MIT

We consider the problem of A-B testing when the impact of a treatment is marred by a large number of covariates. This is the situation in a number of modern applications of A-B testing (such as in adTech and e-commerce) as well as in more traditional applications (such as clinical trials). Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the efficiency of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting where subjects arrive, and must be assigned, sequentially. We also characterize the value of optimization and show that it can be expected to grow large with the number of covariates. Finally, using a real-world impression dataset, we show that our algorithms can be expected to yield substantial improvements to efficiency in practice.

Joint work with Nikhil Bhat and Ciamac Moallemi (Columbia GSB)

Bio: Vivek Farias is interested in the development of new methodologies for large scale dynamic optimization and applications in revenue management, finance, marketing and healthcare. He received his Ph.D. in Electrical Engineering from Stanford University in 2007 and has been at MIT since, where he is the Robert N. Noyce Professor of Management. Outside academia, Vivek has contributed to ventures in finance and technology in the role of a researcher, consultant or founder. Vivek is a recipient of an IEEE Region 6 Undergraduate Student Paper Prize (2002), an INFORMS MSOM Student Paper Prize (2006), an MIT Solomon Buchsbaum Award (2008), an INFORMS JFIG paper prize twice (2009, 2011), the NSF CAREER award (2011), MIT Sloan’s Outstanding Teacher award (2013), and was a finalist for the 2011 INFORMS Pierskalla award in healthcare.

The spread of information in social media
Fil Menczer, Yahoo Labs/Indiana University

This talk overviews ongoing work on information diffusion in social media, focusing in particular on mining, modeling, and prediction tasks on data from the Twitter network. I present machine learning efforts that leverage the structure of meme diffusion networks and many other features to detect misinformation campaigns, such as astroturf and social bots. We employ modeling techniques to understand the formation of communities, the creation of social ties, and the competition for attention. We investigate how different forms of structural and topical diversity in the network can be leveraged to predict which memes will go viral. Finally, if time permits, I will review a few crowdsourcing projects exploring the computation of unbiased scholarly impact metrics, the anonymous collection of sensitive data, and social good.

Joint work with many members of the Center for Complex Networks and Systems Research at Indiana University. This research is supported by the National Science Foundation, McDonnell Foundation, and DARPA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these funding agencies.

Bio: Filippo Menczer is a visiting scientist at Yahoo Labs in Sunnyvale during 2014-15 while on sabbatical from Indiana University, where he is a professor of informatics and computer science, adjunct professor of physics, and a member of the cognitive science program. He holds a Laurea in Physics from the University of Rome and a Ph.D. in Computer Science and Cognitive Science from the University of California, San Diego. Dr. Menczer has been the recipient of Fulbright, Rotary Foundation, and NATO fellowships, and a Career Award from the National Science Foundation. He serves as director of the Center for Complex Networks and Systems Research and is a Fellow of the Institute for Scientific Interchange Foundation in Torino, Italy, a Senior Research Fellow of The Kinsey Institute, and an ACM Distinguished Scientist. He previously served as division chair in the IUB School of Informatics and Computing, and was Fellow-at-large of the Santa Fe Institute. His research focuses on Web science, social networks, social media, social computation, Web mining, distributed and intelligent Web applications, and modeling of complex information networks. His work has been covered in The New York Times, Wall Street Journal, NPR, CNN, The Economist, Nature, Science, The Washington Post, Wired, The Atlantic, BBC, and many other US and international news sources.

Inverting a Steady-State
Sergei Vassilvitskii, Google Research

We consider the problem of inferring choices made by users from data that only contains the relative popularity of each item. We propose a framework that models the problem as that of inferring a Markov chain given its stationary distribution. Formally, given a graph and a distribution on its nodes, the goal is to assign scores to nodes such that the stationary distribution of the following Markov chain will equal the given distribution on nodes: the transition probability from a node to its neighbor is proportional to the score of the neighbor. We prove sufficient conditions under which this problem is feasible, and, for the feasible instances, obtain a simple algorithm for a generic version of the problem. This iterative algorithm provably finds the unique solution to this problem and has a polynomial rate of convergence. In practice we find that the algorithm converges after fewer than 10 iterations. We then apply this framework to choice problems in online settings and show that our algorithm is able to explain the observed data and predict user choice much better than competing baselines across a variety of diverse datasets.
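To make the setup concrete, here is one plausible fixed-point iteration consistent with the balance condition described above: with P(u → v) = s[v] / Z[u], where Z[u] sums the scores of u's neighbors, stationarity requires pi[v] = Σ_u pi[u]·s[v]/Z[u], which can be solved for s[v] and iterated. This is my sketch of the idea, not necessarily the paper's exact algorithm or feasibility analysis:

```python
def invert_steady_state(neighbors, pi, iters=100):
    """Find node scores s so that the chain with transition probability
    P(u -> v) = s[v] / sum(s[w] for w in neighbors[u]) has stationary
    distribution pi.  Iterates the stationarity (balance) condition
    pi[v] = sum over neighbors u of pi[u] * s[v] / Z[u]."""
    n = len(pi)
    s = [1.0] * n
    for _ in range(iters):
        Z = [sum(s[w] for w in neighbors[u]) for u in range(n)]
        s = [pi[v] / sum(pi[u] / Z[u] for u in neighbors[v])
             for v in range(n)]
    return s

def stationarity_residual(neighbors, pi, s):
    """Maximum violation of the stationarity equations for pi under s."""
    n = len(pi)
    Z = [sum(s[w] for w in neighbors[u]) for u in range(n)]
    return max(abs(pi[v] - sum(pi[u] * s[v] / Z[u] for u in neighbors[v]))
               for v in range(n))

# Triangle graph with a feasible target distribution.
neighbors = [[1, 2], [0, 2], [0, 1]]
pi = [0.40, 0.35, 0.25]
s = invert_steady_state(neighbors, pi)
assert stationarity_residual(neighbors, pi, s) < 1e-6
```

Note that the scores are only determined up to a common scale factor, since multiplying all of s by a constant leaves every transition probability unchanged.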

Joint work with Ravi Kumar, Andrew Tomkins, and Erik Vee.

Bio: Sergei Vassilvitskii is currently a Research Scientist at Google and a Fellow at the Applied Statistics Center at Columbia University. Previously, he was a Research Scientist at Yahoo! Research and an Adjunct Assistant Professor at Columbia University. He completed his PhD at Stanford University under the supervision of Rajeev Motwani.

Strategic Learning and the Topology of Social Networks
Omer Tamuz, Microsoft Research/Caltech

We consider a group of people who each have to make a repeated choice between two lifestyle options, and who observe the choices of their friends in a social network. We assume that one of the two options is universally preferable, that each person's choice affects only his or her own health, and that any benefits are only observed in the distant future.

We model this by a Bayesian game of pure informational externalities, in which agents start with conditionally independent private signals. We show that the question of whether or not the agents learn the preferable choice depends on the topology of the social network. In particular, we identify a geometric "egalitarianism" condition on the social network graph that guarantees learning in infinite networks, or learning with high probability in large finite networks, in any equilibrium of the game.

Joint work with Elchanan Mossel and Allan Sly

Bio: Omer Tamuz is a Schramm Postdoctoral Fellow at Microsoft Research New England / MIT, and an Assistant Professor (on leave) in the Division of the Humanities and Social Sciences at Caltech. He received his PhD in mathematics from the Weizmann Institute, where he studied learning in Bayesian games, as well as other topics in machine learning, probability theory, and ergodic theory.

Information Provision in Dynamic Innovation Tournaments
Kostas Bimpikis, Stanford

Innovation tournaments have emerged as a viable alternative to the standard research and development process. They are particularly suited for settings that feature a high degree of uncertainty about the feasibility of the innovation goal. Information about the progress of participants in these settings has an interesting dual role. On one hand, participants who have experienced no progress start becoming pessimistic about the underlying environment and whether the innovation goal is even feasible, so learning about their competitors' gradual progress provides a positive signal about the attainability of the goal and an incentive for participants to continue exerting effort. On the other hand, information about the status of competition may adversely affect effort provision from the laggards, who start getting discouraged about their chances of winning the tournament. Because the tournament's success crucially depends on the effort decisions of the participants, the tournament's information provision mechanism is a central feature of its design. This paper explores these issues and suggests a number of design guidelines -- with a focus on the role of intermediate awards as information provision devices -- that help implement the objective of maximizing the probability and minimizing the time of completing the end goal.

Bio: Kostas Bimpikis is an Assistant Professor of Operations, Information and Technology at Stanford University's Graduate School of Business. Prior to joining Stanford, he spent a year as a postdoctoral research fellow at the Microsoft Research New England Lab. He received a PhD in Operations Research from the Massachusetts Institute of Technology in 2010, an MS in Computer Science from the University of California, San Diego and a BS degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece.

Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
Jonathan Levin, Stanford

Consumer auctions were very popular in the early days of internet commerce, but today online sellers mostly use posted prices. Compositional shifts in the items being sold and the sellers offering these items cannot account for this evolution. Instead, the returns to sellers using auctions have diminished. We consider two hypotheses: a shift in buyer preferences away from auctions, and an increase in competition or search efficiency that favors posted prices. We use a model and data from eBay to tease these apart, finding that the former is more important, especially for more idiosyncratic products. We also discuss why auctions and posted prices can coexist in online markets.

Bio: Jonathan Levin is Professor of Economics at Stanford University. His research interests include industrial organization and microeconomic theory.

Quality Externalities and the Limits of Reputation in Two-Sided Markets
Steve Tadelis, Berkeley

Buyers in two-sided marketplace platforms may draw conclusions about the quality of the platform from any single transaction. This induces an externality across sellers that reputation mechanisms will not alleviate. Furthermore, buyers who abandon the platform without leaving feedback will cause seller reputations to be biased. Using data from eBay, we document this externality and argue that platforms can mitigate it by actively screening sellers and promoting the prominence of better quality sellers. Exploiting the bias in feedback, we create a measure of seller quality and demonstrate the benefits of our approach through a controlled experiment that prioritizes better quality sellers to a random subset of buyers. We thus highlight the importance of externalities in two-sided markets and chart an agenda that aims to create more realistic models of two-sided markets.

Bio: Steve Tadelis is a professor of economics at Berkeley's Haas School of Business. From 2011 to 2013 he spent a two-year leave at eBay Labs, where he put together and led a group of economists focused on the economics of e-commerce, with particular attention to creating better matches of buyers and sellers, reducing market frictions by increasing trust and safety in eBay's marketplace, understanding the underlying value of different advertising and marketing strategies, and exploring the market benefits of different pricing structures. Aside from the economics of e-commerce, Steve’s main fields of interest are the economics of incentives, industrial organization and microeconomics. His research has focused on a firm's reputation as a valuable, tradable asset; the effects of contract design and organizational form on firm behavior with applications to outsourcing and privatization; public and private sector procurement and award mechanisms; and the determinants of trust.

The Dynamics of Online Reputation
Sinan Aral, MIT

Identity and reputation drive some of the most important decisions we make online: Who to follow or link to, whose information to trust, whose opinion to rely on when choosing a product or service, whose content to consume and share. Yet, we know very little about the dynamics of online reputation and how it affects our decision making. Sinan will describe a series of randomized experiments that explore the population level behavioral dynamics catalyzed by identity and reputation online. He will explore some of the implications for bias in online ratings, the foundations of social advertising and the ability to generate cascades of behavior through peer to peer social influence. The coming decades will likely see an emphasis on verified identities online. Sinan will argue that a new science of online identity could help guide our business, platform design and social policy decisions in light of the rising importance of online reputation.

Bio: Sinan Aral is a Scientist and an Entrepreneur: He is the David Austin Professor of Management at MIT, where he holds joint Associate Professorships in the IT and Marketing groups and co-leads the Initiative on the Digital Economy. He was the Chief Scientist at SocialAmp, one of the earliest social commerce analytics companies (until its sale in 2012); and is currently the Chief Scientist at Humin, a social navigation startup developing the “Google Maps” for your social relationships.

Sinan is the Scholar-in-Residence at the New York Times R&D Lab and has worked closely with Facebook, Yahoo, Microsoft, Nike, IBM, Intel, Cisco, Oracle, SAP and many other leading Fortune 500 firms on realizing business value from social media and IT investments. His research has won numerous awards including the Microsoft Faculty Fellowship, the PopTech Science Fellowship, an NSF CAREER Award and a Fulbright Scholarship. He was also recently named one of the “World’s Top 40 Business School Professors Under 40” by Poets & Quants.

In his spare time, he cooks, skis, and tells jokes about his own cooking and skiing. His most recent hobby is learning from his one-year-old son.

You can find Sinan on Twitter @sinanaral.

Control Profiles of Complex Networks
Derek Ruths, McGill University

Studying the control properties of complex networks provides insight into how designers and engineers can influence these systems to achieve a desired behavior. The topology of a network has been shown to strongly correlate with certain control properties; in this talk I will discuss the fundamental structures that explain the basis of this correlation. I'll then use these fundamental structures to construct a new statistic, the control profile, that quantifies the different proportions of control-inducing structures present in a network. Using the control profile, I'll show that standard random network models do not reproduce the kinds of control structures that are observed in real world networks. Furthermore, the profiles of real networks form three well-defined clusters that provide insight into the high-level organization and function of complex systems.

From Small-World Networks to User-Driven Content Search
Amin Karbasi, ETH Zurich

In this talk, we will describe a new way of navigating through a database of similar objects using comparisons. In short, at each step, the database presents two objects to the user, who then selects among the pair the object closest to the target that she has in mind. This process continues until, based on the user’s answers, the database can uniquely identify the target she has in mind. This kind of interactive navigation (also known as exploratory search) and its variants have numerous real-life applications in content-based image/multimedia retrieval.

We will argue that the above problem is strongly related to the small-world network design: given a graph embedded in a metric space, how should one augment this graph by adding edges in order to minimize the expected cost of greedy forwarding. We will first show that the small-world network design problem is NP-hard. Given this negative result, we propose a novel mechanism for small-world design and provide an upper bound on its performance. This mechanism has a natural equivalent in the context of content search through comparisons, and we establish both (order optimum) upper and lower bounds for its performance. These bounds are intuitively appealing as they provide performance guarantees in terms of the bias of the distribution of targets, captured by the entropy, as well as the topology of their embedding, captured by the doubling dimension.
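The greedy forwarding objective above can be made concrete in a few lines. This is an illustrative sketch of greedy routing on an augmented ring (the names are mine), not the talk's design mechanism:

```python
def greedy_forwarding(neighbors, dist, source, target):
    """Greedy routing: repeatedly hop to the neighbor closest to the
    target; return the path, or None if stuck at a local minimum."""
    path, cur = [source], source
    while cur != target:
        nxt = min(neighbors[cur], key=lambda v: dist(v, target))
        if dist(nxt, target) >= dist(cur, target):
            return None  # no neighbor is closer: greedy is stuck
        cur = nxt
        path.append(cur)
    return path

# A ring of n nodes with only local edges; adding long-range edges on
# top of this base graph, so as to minimize the expected length of the
# greedy path, is exactly what the small-world design problem asks for.
n = 64
neighbors = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
ring_dist = lambda a, b: min((a - b) % n, (b - a) % n)
path = greedy_forwarding(neighbors, ring_dist, 0, 20)
assert path == list(range(21))
```

In the comparison-based search analogue, "hop to the closest neighbor" corresponds to the user repeatedly picking, from a presented pair, the object closer to her target.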

Savings Monitors
Arun G. Chandrasekhar, Stanford University

We conduct a field experiment in India to explore two interventions to help individuals to increase their savings balances. First, we design a financial product based on the popular business correspondent model, which includes frequent reminders, assistance in account opening, and the setting of a six-month savings goal. Second, we measure the effectiveness of adding a peer monitoring component to this basic bundle and test whether the local social network can help to increase the penetration of the formal banking system. We ask whether having a monitor substitutes for a formal commitment device, whether individuals choose the most effective monitors, and moreover, whether some community members are better than others at encouraging financial capability.

Computational Fair Division: From Cake Cutting to Cloud Computing
Ariel Procaccia, Carnegie Mellon University

I will present an exciting new interaction between computer science and fair division theory. On the one hand, I will show how computational thinking provides a novel perspective on classic problems in fair division, such as cake cutting and estate division. On the other hand, I will explain how fair division theory is informing the allocation of computational resources. I will also talk about the integration of some of these theoretical ideas into a not-for-profit fair division website that aims to make the world just a bit fairer.

Big Thinking: Augmenting human cognition with crowds and computation
Niki Kittur, Carnegie Mellon University

Despite the growing availability of information online, individual human cognition is fundamentally limited in the speed and amount of information it can process at once. In this talk I will discuss ways of augmenting individual cognition by taking advantage of crowds and computation. Our approaches combine the flexibility of many human minds working together with the raw power of computational systems to accelerate learning, innovation, knowledge production, and scientific discovery. I will discuss several novel crowdsourcing, machine learning and visualization systems that augment the uniquely human abilities of analogy, schema induction, and problem solving. Examples include augmenting human thinking in product innovation (e.g., Quirky); knowledge production (Wikipedia); creative tasks such as writing, journalism and poetry; and making sense of multivariate and graph data.

OSNs: The Wisdom of a Few and The Value of Users
Ricardo Baeza-Yates, Yahoo Labs

One of the main differences between traditional Web analysis and the study of Online Social Networks (OSNs) is that in the first case the information is organized around content, while in the second it is organized around people. While search engines have done a good job finding relevant content across billions of pages, we do not yet have an equivalent tool for finding relevant people in OSNs. Even though an impressive amount of research has been done in this direction, many gaps remain. Although our first intuition could be (and was!) to search for popular people, previous research has shown that users' in-degree (e.g., number of friends or followers) is important but not sufficient to represent the importance and reputation of a person. Another approach is to study the content of the messages exchanged between users, trying to identify topical experts. However, the computational cost of such an approach - including language diversity - is a big limitation. In our work we take a content-agnostic approach, focusing on the frequency, type, and time properties of user actions rather than content, mixing their static characteristics (social graph) and their activities (dynamic graphs). Our goal is to understand the role of popular users in OSNs, and also to find hidden important users: Do popular users create new trends and cascades? Do they add value to the network? And, if they don't, who does? Who generates the content of the OSN? Our research provides preliminary answers to these questions.

This is joint work with Diego Saez-Trumper and colleagues in Brazil and the USA.

Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
Tuomas Sandholm, Carnegie Mellon University

Incomplete-information games - such as most auctions, negotiations, and future (cyber)security settings - cannot be solved using minimax search even in principle. Completely different algorithms were needed. A dramatic scalability leap has occurred in our ability to solve such games over the last seven years, fueled largely by the Annual Computer Poker Competition. I will discuss the key domain-independent techniques that enabled this leap, including automated abstraction techniques and approaches for mitigating the issues that they raise, new equilibrium-finding algorithms, safe opponent exploitation methods, techniques that use qualitative knowledge as an extra input, and endgame solving techniques. I will finish by benchmarking poker programs against the best human poker professionals and by discussing what humans can learn from the programs.

"Going Viral" and the Structure of Online Diffusion
Sharad Goel, Microsoft Research NYC

New products, ideas, norms and behaviors are often thought to propagate through a person-to-person diffusion process analogous to the spread of an infectious disease. Until recently, however, it has been prohibitively difficult to directly observe this process, and thus to rigorously quantify or characterize the structure of information cascades. In one of the largest studies to date, we describe the diffusion structure of billions of events across several domains. We find that the vast majority of cascades are small, and are characterized by a handful of simple tree structures that terminate within one degree of an initial adopting "seed." While large cascades are extremely rare, the scale of our data allows us to investigate even the one-in-a-million events. To study these rare, large cascades, we develop a formal measure of what we label "structural virality" that interpolates between two extremes: content that gains its popularity through a single, large broadcast, and that which grows via a multi-generational cascade where any one individual is directly responsible for only a fraction of the total adoption. We find that the very largest observed events nearly always exhibit high structural virality, providing some of the first direct evidence that many of the most popular products and ideas grow through person-to-person diffusion. However, medium-sized events -- having thousands of adopters -- exhibit surprising structural diversity, and are seen to grow both through broadcast and viral means. Finally, we show that our empirical results are largely consistent with an SIR model of contagion on a scale-free network, reminiscent of previous work on the long-term persistence of computer viruses.
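The structural virality measure described above is, concretely, the average distance between all pairs of nodes in the diffusion tree (the tree's Wiener index divided by the number of ordered pairs). The following is a minimal illustrative sketch; the edge-list input format and example trees are hypothetical:

```python
from collections import deque

def structural_virality(edges):
    """Average pairwise distance in a diffusion tree (Wiener index / n*(n-1)).

    `edges` is a list of (parent, child) pairs; distances are computed on the
    undirected tree via BFS from every node (O(n^2) overall).
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    nodes = list(adj)
    n = len(nodes)
    total = 0
    for s in nodes:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        total += sum(dist.values())
    return total / (n * (n - 1))

# A pure broadcast (star) has low structural virality; a chain of
# person-to-person reshares (path) has high structural virality.
star = [(0, i) for i in range(1, 6)]
path = [(i, i + 1) for i in range(5)]
print(structural_virality(star) < structural_virality(path))  # True
```

The two toy trees illustrate the interpolation the abstract describes: broadcast-dominated cascades sit at one end of the scale, multi-generational cascades at the other.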

The Value of Network Information
Itay Fainmesser, Brown University

Motivated by the proliferation of companies, such as MySpace and Twitter, whose business models rely, at least in part, on monetizing information about the interactions and influences of their users, we evaluate how much a monopolist can increase its profit by exploiting individual-level data on "who influences whom" when designing its selling strategy. We also analyze the effect that the use of such information has on consumers' welfare. Our framework incorporates a rich set of market products, including goods characterized by global and local network effects. It also has the novel feature of distinguishing between the level of influence of a consumer and her susceptibility to influence, a distinction that has been shown to be important empirically.

Joint work with Andrea Galeotti.

Prediction in complex networks: The impact of structure on learning and prediction
Jennifer Neville, Purdue University

The recent popularity of online social networks and social media has increased the amount of information available about users' behavior--including current activities and interactions among friends and family. This rich relational information can be exploited to predict user interests and preferences even when individual data is sparse, as the relationships are a critical source of information that identifies potential statistical dependencies among people. Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporating relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models--which arise through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially-observable social network and social media datasets.

Eliciting and aggregating information from the crowd
Anirban Dasgupta, Yahoo Labs

Crowdsourcing is now widely used to replace judgement or evaluation by an expert authority with an aggregate evaluation from a number of non-experts, in applications ranging from rating and categorizing online content all the way to evaluation of student assignments in massively open online courses (MOOCs) via peer grading. Two key issues in these settings are: i) how to provide incentives such that agents in the crowd put in their 'full effort' in making good evaluations, as well as truthfully report them, and ii) how to aggregate these evaluations, thereby giving an accurate estimate of the ground truth, given that the agents are of varying, unknown expertise.

In this talk, we look at both of these problems. For the first, we present an information elicitation mechanism for multiple tasks when agents have endogenous proficiencies, with the following property: exerting maximum effort followed by truthful reporting of observations is a Nash equilibrium with maximum payoff.

For the aggregation problem, we consider a model where each task is binary and each agent has an unknown, fixed reliability that determines the agent's error rate in performing tasks. The problem is to determine the truth values of the tasks solely based on the agent evaluations. We present algorithms whose error guarantees depend on the expansion properties of the agent-task graph; these algorithms also perform well empirically.
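To make the aggregation problem concrete, here is a hypothetical EM-style sketch (an illustration of the problem setup, not the expansion-based algorithms from the talk) that alternates weighted majority voting with re-estimating each agent's reliability:

```python
import math

def aggregate(answers, iters=10):
    """Infer {-1, +1} truth values from noisy agent answers.

    `answers[a]` maps task -> label in {-1, +1}. Alternates between
    (i) a weighted majority vote per task and (ii) re-estimating each
    agent's reliability from agreement with the current estimates,
    mapping it to a log-odds weight.
    """
    tasks = {t for a in answers for t in answers[a]}
    weight = {a: 1.0 for a in answers}
    truth = {}
    for _ in range(iters):
        # E-step: weighted majority vote on each task
        for t in tasks:
            s = sum(weight[a] * answers[a][t] for a in answers if t in answers[a])
            truth[t] = 1 if s >= 0 else -1
        # M-step: smoothed agreement rate -> log-odds weight
        for a in answers:
            agree = sum(answers[a][t] == truth[t] for t in answers[a])
            p = (agree + 1) / (len(answers[a]) + 2)
            weight[a] = math.log(p / (1 - p))
    return truth

# Two reliable agents and one adversarial agent on five tasks whose truth is +1;
# the adversary ends up with negative weight and is voted down.
answers = {'A': {t: 1 for t in range(5)},
           'B': {t: 1 for t in range(5)},
           'C': {t: -1 for t in range(5)}}
print(all(v == 1 for v in aggregate(answers).values()))  # True
```

Note how an always-wrong agent is still informative: its weight becomes negative, so its reports are flipped rather than discarded.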

Joint work with Nilesh Dalvi, Arpita Ghosh, Vibhor Rastogi and Ravi Kumar.

Should auctions be complex?
Noam Nisan, Hebrew University of Jerusalem

Do simple auctions suffice for obtaining high revenue, or do we need to design complex ones? For the case of selling one item, Myerson shows that the answer is "yes": the revenue-maximizing auction is deterministic with a single reserve price. When selling two (or more) items, we show that the answer is "no": complex auctions can yield infinitely more revenue than simple ones, even when there is only a single bidder with an additive valuation. "Complex" may be defined in various ways here, including allowing randomization, as well as by our chosen measure of an auction's menu size. However, when the bidder's values for the items are independently distributed, the answer is "approximately yes": selling each of two items simply and separately yields at least half the revenue of any auction.
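For intuition on the single-item case: Myerson's reserve price for a regular value distribution F is the zero of the virtual value phi(v) = v - (1 - F(v))/f(v). A small sketch, using a hypothetical bisection solver:

```python
def myerson_reserve(F, f, lo=0.0, hi=1.0, tol=1e-9):
    """Reserve price for a regular distribution on [lo, hi]: the r solving
    phi(r) = r - (1 - F(r)) / f(r) = 0, found by bisection (phi is
    increasing for regular F).
    """
    phi = lambda v: v - (1 - F(v)) / f(v)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if phi(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Uniform[0,1]: F(v) = v, f(v) = 1, so phi(v) = 2v - 1 and the reserve is 0.5
r = myerson_reserve(lambda v: v, lambda v: 1.0)
print(round(r, 6))  # 0.5
```

This is exactly the "deterministic with a single reserve price" structure the abstract contrasts with the multi-item case, where no such simple characterization holds.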

Joint work with Sergiu Hart.

The extent of price misalignment in prediction markets
David Pennock, Microsoft Research, NYC

We examine misaligned prices for logically related contracts in prediction markets. First, we uncover persistent arbitrages between identical contracts on different exchanges; price support beyond what is in the published order book multiplies the size of these arbitrages. Second, misalignment between logically related contracts listed on the same exchange occurs when contracts systemically shut down or fail to respond efficiently around moments of high information flow. Third, other examples of bounded rationality include: consistent asymmetry between buying and selling, and persistent price lags between exchanges. Fourth, we propose solutions to these problems by fixing logical contradictions within the exchange and providing flexible trading interfaces, freeing traders to focus on providing information in the form they find most natural. Joint work with David Rothschild.

Adaptive Seeding in Social Networks
Yaron Singer, Google and Harvard University

The rapid adoption of social networking technologies throughout the past decade is bringing special attention to algorithmic and data mining techniques designed for maximizing information cascades in social networks. Despite the immense progress made in the past decade, due to limited data access and the structure of social networks, the application of state-of-the-art techniques often results in poor performance.

In this talk we will introduce a new framework we call adaptive seeding. The framework is a two-stage model which leverages a phenomenon known as the "friendship paradox" in social networks in order to dramatically increase the spread of information cascades. Our main result shows that constant-factor approximations are achievable for the most well-studied models of information spreading in social networks. The result follows from new techniques and concepts that may be of independent interest to those working in stochastic optimization and machine learning.
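The "friendship paradox" that adaptive seeding leverages is easy to check empirically: a random neighbor of a random node has, in expectation, higher degree than a random node. A toy illustration (of the paradox only, not of the two-stage algorithm itself; the example graph is hypothetical):

```python
import random

def friendship_paradox(adj, samples=10000, seed=1):
    """Compare the mean degree of a random node against the empirical mean
    degree of a random neighbor of a random node."""
    rng = random.Random(seed)
    nodes = list(adj)
    mean_deg = sum(len(adj[v]) for v in nodes) / len(nodes)
    acc = 0
    for _ in range(samples):
        v = rng.choice(nodes)       # sample a node uniformly
        u = rng.choice(adj[v])      # then one of its neighbors uniformly
        acc += len(adj[u])
    return mean_deg, acc / samples

# Star graph: mean degree is under 2, yet almost every sampled neighbor
# is the high-degree hub.
adj = {0: list(range(1, 11)), **{i: [0] for i in range(1, 11)}}
mean_deg, mean_nbr_deg = friendship_paradox(adj)
print(mean_nbr_deg > mean_deg)  # True
```

This gap is what a two-stage strategy can exploit: seeding the neighbors of accessible users reaches higher-degree nodes than seeding accessible users directly.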

Joint work with Lior Seeman.

Pricing in Social Networks
Francis Bloch, Ecole Polytechnique, France.

We analyze the problem of optimal monopoly pricing in social networks where agents care about consumption or prices of their neighbors. We characterize the relation between optimal prices and consumers' centrality in the social network. This relation depends on the market structure (monopoly vs. oligopoly) and on the type of externalities (consumption versus price). We identify two situations where the monopolist does not discriminate across nodes in the network (linear monopoly with consumption externalities and local monopolies with price externalities). We also analyze the robustness of the analysis with respect to changes in demand, and the introduction of bargaining between the monopolist and the consumer.

(joint with Nicolas Querou)

Language and Social Dynamics in Online Communities
Cristian Danescu-Niculescu-Mizil, Stanford University and Max Planck Institute SWS.

Much of online social activity takes the form of natural language, from product reviews to conversations on social-media platforms. I will show how analyzing these interactions from the perspective of language use can provide a new understanding of social dynamics in online communities.

I will describe two of my efforts in this direction. The first project leverages insights from psycholinguistics to build a novel computational framework that shows how key aspects of social relations between individuals are embedded in (and can be inferred from) their conversational behavior. In particular, I will discuss how power differentials between interlocutors are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. The second project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a well-defined lifecycle with respect to their susceptibility to adopting new community norms, and show how this insight can be harnessed to predict how long a user will stay active in the community.

This talk includes joint work with Susan Dumais, Michael Gamon, Dan Jurafsky, Jon Kleinberg, Jure Leskovec, Lillian Lee, Bo Pang, Christopher Potts and Robert West.

References: WWW 2013, WWW 2012, WWW 2011.

Current challenges in kidney exchange - need for long chains and dynamic matching
Itai Ashlagi, MIT Sloan

A kidney exchange network establishes a pool of patient-donor pairs, non-directed donors, and patients on the deceased donor waiting lists, and seeks to arrange transplants among this pool.
It has been previously shown that for sufficiently large pools of patient-donor pairs, (almost) efficient kidney exchange can be achieved by using at most 3-way cycles, i.e. by using cycles among no more than 3 patient-donor pairs.
However, as kidney exchange has grown in practice, cycles among n > 3 pairs have proved useful, and long chains initiated by non-directed, altruistic donors have proven to be very effective. We explore why this is the case, both empirically and theoretically. We provide an analytical model of exchange when there are many highly sensitized patients, and show that large cycles of exchange or long chains can significantly increase efficiency when the opportunities for exchange are sparse. As very large cycles of exchange cannot be used in practice, long non-simultaneous chains initiated by non-directed donors significantly increase efficiency in patient pools of the size and composition that presently exist. Most importantly, long chains benefit highly sensitized patients without harming low-sensitized patients.
One way to make kidney exchange networks thicker is by waiting for pairs to arrive—waiting longer allows the pool to become larger, but imposes costs on those already in the pool. We study empirically and theoretically the impact of the waiting periods on the number of matches.
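To make the cycle constraint concrete, here is a toy sketch (a hypothetical illustration, not a clearinghouse algorithm) that enumerates the feasible 2- and 3-way exchange cycles in a small compatibility graph:

```python
def short_cycles(compat, max_len=3):
    """Enumerate simple directed cycles of length <= max_len.

    `compat` maps pair u to the pairs whose patient u's donor can give to;
    a directed cycle is a feasible simultaneous exchange. Cycles are
    canonicalized by rotating the smallest node to the front.
    """
    cycles = set()
    def extend(path):
        u = path[-1]
        for v in compat.get(u, ()):
            if v == path[0] and len(path) >= 2:
                i = path.index(min(path))
                cycles.add(tuple(path[i:] + path[:i]))
            elif v not in path and len(path) < max_len:
                extend(path + [v])
    for s in compat:
        extend([s])
    return cycles

# Three pairs: a 2-way swap between 1 and 2, and a 3-way cycle 1 -> 2 -> 3 -> 1.
compat = {1: [2], 2: [1, 3], 3: [1]}
print(sorted(short_cycles(compat)))  # [(1, 2), (1, 2, 3)]
```

Selecting a revenue- or welfare-maximizing disjoint set of such cycles (and chains) is the hard optimization behind the clearinghouse; bounding cycle length is what makes simultaneous surgery logistics feasible.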

Based on joint work with David Gamarnik, Alvin Roth, Mike Rees, Patrick Jaillet and Vahideh Manshadi

Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Katrina Ligett, Caltech

In this talk, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed that in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and make them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by.

Joint work with Aaron Roth.


Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Al Roth, Stanford

How should you allocate a scarce resource whose scarcity/availability might be sensitive to how you allocate it? I'll talk about some work I'm just beginning involving the donation and allocation of deceased donor organs for transplantation. This presents a different set of problems from our (more advanced) work on live organ donation and kidney exchange that has received a lot of recent publicity (and was the subject of Itai Ashlagi's recent talk). However the evidence suggests that even for the case of kidney transplantation the current scarcity cannot be solved by live donation alone, and for most organs live donation is not feasible. I'll discuss the new Israeli organ transplant law, and some experiments that will help us understand it better as field data become available. I'll also discuss an experiment in how organ donors are solicited in the United States.

DIMACS Workshop on Economic Aspects of Information Sharing

The DIMACS Workshop on Economic Aspects of Information Sharing is being held at Stanford Feb 7-8, 2013. It is being co-hosted and co-sponsored by RAIN.

In recent years, the collection and analysis of user data over the Internet has experienced a surge. Online services and social networking sites have facilitated and encouraged the online disclosure of a wide array of personal information, while web-tracking platforms have enabled the consolidation of user data from disparate sources. The reach of these services has been significantly extended by the integration of mobile devices with the cloud. Companies increasingly view data analytics as an integral part of their operations, and there is an increased need for new means of monetizing data beyond advertising. In parallel, regulatory bodies, consumer advocacy groups, and Internet companies themselves have struggled to find a balance between the financial benefits of user data monetization and the privacy concerns that it raises.

Motivated by these developments, this workshop aims to study economic aspects of information sharing from several different angles, including online data markets, targeted advertising, user incentive mechanisms, and privacy-aware mechanism design.

Citation networks: Evaluating the impact of science
Santo Fortunato, Aalto University, Finland

Citation statistics are regularly adopted to evaluate different actors in the academic and scientific arena, from scholars to journals, departments, universities, national institutions and whole countries. The scores play a crucial role in deciding which grants are awarded, the rankings of applicants for a job, and even the fate of scientific institutions. For these reasons citation networks have been carefully investigated over the past decades. I will discuss some recent findings in this area. Citation distributions are universal across many different disciplines if suitably normalized indicators are adopted as measures of performance, thus avoiding the bias of field dependence. Furthermore, citation dynamics is characterized by bursts, usually occurring within a few years of the publication of a paper, and burst sizes span several orders of magnitude. This bursty dynamics can be described by a linear preferential attachment model with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
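A toy simulation of the linear preferential-attachment mechanism with constant initial attractiveness A (the model in the talk makes A time-dependent to capture bursts; all parameter values here are illustrative):

```python
import random

def citations_pa(n_papers, m=3, A=5.0, seed=0):
    """Simulate citation counts under linear preferential attachment with
    initial attractiveness: each new paper cites up to m distinct existing
    papers, choosing paper i with probability proportional to (c_i + A),
    where c_i is i's current citation count.
    """
    rng = random.Random(seed)
    cites = [0] * n_papers
    for t in range(1, n_papers):
        pool = list(range(t))
        weights = [cites[i] + A for i in pool]
        targets = set()
        while len(targets) < min(m, t):
            (i,) = rng.choices(pool, weights=weights)
            targets.add(i)
        for i in targets:
            cites[i] += 1
    return cites

cites = citations_pa(2000)
top, mean = max(cites), sum(cites) / len(cites)
# Heavy tail: the most-cited papers accumulate far more than the mean.
```

The (c_i + A) kernel is what produces a broad, power-law-like citation distribution: A sets how quickly uncited papers can start accumulating citations.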

Crowd-Powered Systems
Michael Bernstein, Stanford

Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.

Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.

In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture the exact right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in realtime. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.

Rumor in a network: Who's the culprit?
Devavrat Shah, MIT

Suppose a malicious "worm" has spread in a network (e.g., Stuxnet) and we wish to find its source (so that any of several actions can be taken: report the news to the world, give a reward, or take punitive action). Motivated by such scenarios, in which "information" diffuses through the network from an unknown source and the goal is to identify that source from the observed spread, we discuss the problem of finding the source of a rumor in a network. We will design a computationally efficient source estimator for the popular Susceptible-Infected (SI) information spreading model with generic spreading times. In establishing the effectiveness of the estimator, we shall explain the role played by the underlying network structure (a form of expansion property) in determining the ability to detect the source. Connections to the generalized Polya's urn, network centrality, and the number 1 - ln 2 will be explained.
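On trees, Shah and Zaman's estimator reduces to maximizing "rumor centrality": R(v) = n! / prod_u T_u^v, where T_u^v is the size of u's subtree when the tree is rooted at v. A small log-space sketch (recomputing from scratch per candidate root for clarity, O(n^2)):

```python
import math

def rumor_centrality(adj):
    """Log rumor centrality of every node of a tree given as an adjacency
    dict; the argmax is the maximum-likelihood source estimate on regular
    trees under SI spreading."""
    n = len(adj)
    log_nf = math.lgamma(n + 1)  # log n!
    def subtree_log_prod(root):
        # iterative post-order DFS summing log(subtree size) over all nodes
        size, total = {}, 0.0
        stack = [(root, None, False)]
        while stack:
            u, parent, done = stack.pop()
            if not done:
                stack.append((u, parent, True))
                for w in adj[u]:
                    if w != parent:
                        stack.append((w, u, False))
            else:
                size[u] = 1 + sum(size[w] for w in adj[u] if w != parent)
                total += math.log(size[u])
        return total
    return {v: log_nf - subtree_log_prod(v) for v in adj}

# Star with center 0: the center is the most likely source.
adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
scores = rumor_centrality(adj)
print(max(scores, key=scores.get))  # 0
```

Intuitively, R(v) counts the number of infection orderings consistent with v being the source, so central nodes that can "explain" the observed spread in many ways score highest.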

This is based on joint work with Tauhid Zaman (MIT).

On Bitcoin and Red Balloons
Moshe Babaioff, Microsoft Research

Many large decentralized systems rely on information propagation to ensure their proper function. We examine a common scenario in which only participants that are aware of the information can compete for some reward, and thus informed participants have an incentive not to propagate information to others. One recent example in which such tension arises is the 2009 DARPA Network Challenge (finding red balloons). We focus on another prominent example: Bitcoin, a decentralized electronic currency system.

Bitcoin represents a radical new approach to monetary systems. It has been getting a large amount of public attention over the last year, both in policy discussions and in the popular press. Its cryptographic fundamentals have largely held up even as its usage has become increasingly widespread. We find, however, that it exhibits a fundamental problem of a different nature, based on how its incentives are structured. We propose a modification to the protocol that can eliminate this problem. Bitcoin relies on a peer-to-peer network to track transactions that are performed with the currency. For this purpose, every transaction a node learns about should be transmitted to its neighbors in the network. As the protocol is currently defined and implemented, it does not provide an incentive for nodes to broadcast transactions they are aware of. In fact, it provides an incentive not to do so. Our solution is to augment the protocol with a scheme that rewards information propagation. Since clones are easy to create in the Bitcoin system, an important feature of our scheme is Sybil-proofness. We show that our proposed scheme succeeds in setting the correct incentives, that it is Sybil-proof, and that it requires only a small payment overhead, all of which is achieved under iterated elimination of dominated strategies. We complement this result by showing that there is no reward scheme in which information propagation together with no self-cloning is a dominant strategy.

Joint with Shahar Dobzinski, Sigal Oren, and Aviv Zohar.
A short description appeared at ACM SIGecom Exchanges.

Information Asymmetries in first-price common-value auctions, or The Value of Information
David Kempe, USC

We study the role of information asymmetries in first-price common value auctions, and how different information structures provide revenue opportunities to third-party information brokers. One of our motivations is Internet ad auctions, where ads for specific site visitors have a common-value nature, in that displaying an ad to a bot has no value, and users are likely to carry the same (or similar) value to different advertisers. Cookies significantly improve the estimate of this value; the fact that different advertisers use cookies to different extents (and use cookies of different qualities) thus leads to significant information asymmetries. These asymmetries can be mitigated or exacerbated when the advertisers obtain cookie information from third-party information brokers.

We focus on a first-price common value auction in which bidders have access to independent discrete signals about the value of an item, which is either 0 or 1. Our first main result is that with two bidders, this auction format has a unique Nash equilibrium; we give a linear-time algorithm for computing this equilibrium and the resulting value of the auction for each player. We then use this equilibrium calculation to understand the value of additional information to the bidders. Third-party information has the interesting property that it can (in principle) be allocated to multiple bidders, and that it carries significant externalities.

We find that the value of additional information is subtle, and depends on the prior and the information already available. We exhibit several surprising possibilities including (1) Additional information about the item does not always result in higher value for a bidder. (2) A bidder may prefer his competitor (but not himself) to receive the broker's signal; this applies both to the more and less informed bidder.

[Joint work with Vasilis Syrgkanis and Eva Tardos]

Random Utility Models for Social Choice
Lirong Xia, Harvard

Social choice studies ordinal preference aggregation, with wide applications ranging from high-stakes political elections to low-stakes movie rating websites. In many cases, we want to design a social choice mechanism that reveals the ground truth via MLE inference. Despite its importance, progress toward this objective has been largely hindered by the lack of natural probabilistic models and efficient inference algorithms.

In this talk, I will focus on a wide class of probabilistic models called Random Utility Models (RUMs), whose MLE inference was previously believed intractable in general. I will introduce a fast MC-EM algorithm for a very general and natural subclass of RUMs, and discuss its applications and impact on designing better social choice mechanisms. Extensions of this algorithm also provide a computational basis for improving models in many applications in economics as well as machine learning, especially learning to rank.
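As one concrete instance: the Plackett-Luce model is the RUM with Gumbel noise, and its MLE is tractable via a simple MM iteration (Hunter's algorithm); the talk's MC-EM generalizes beyond this special case. A minimal sketch on toy data:

```python
def plackett_luce_mle(rankings, iters=200):
    """MM estimate of Plackett-Luce item scores from full rankings
    (best-first). Scores are normalized to sum to 1; a higher score means
    the item tends to be ranked higher.
    """
    items = sorted({i for r in rankings for i in r})
    gamma = {i: 1.0 for i in items}
    wins = {i: 0 for i in items}
    for r in rankings:
        for i in r[:-1]:          # every non-last position is a "choice"
            wins[i] += 1
    for _ in range(iters):
        denom = {i: 0.0 for i in items}
        for r in rankings:
            suffix = sum(gamma[i] for i in r)   # mass of remaining choice set
            for j in range(len(r) - 1):
                for i in r[j:]:
                    denom[i] += 1.0 / suffix
                suffix -= gamma[r[j]]
        gamma = {i: wins[i] / denom[i] if denom[i] else gamma[i] for i in items}
        z = sum(gamma.values())
        gamma = {i: g / z for i, g in gamma.items()}
    return gamma

# 'a' beats 'b' beats 'c' in most rankings, so scores order a > b > c.
data = [['a', 'b', 'c']] * 4 + [['b', 'a', 'c']]
g = plackett_luce_mle(data)
print(g['a'] > g['b'] > g['c'])  # True
```

Each MM step replaces an item's score by its choice count divided by the expected "exposure" it received in the observed choice sets; the same likelihood structure is what the MC-EM algorithm approximates for RUMs with non-Gumbel noise.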

Based on joint work with Hossein Azari Soufiani and David C. Parkes.

Advertiser Behavior in Sponsored Search Auctions
Susan Athey, Stanford

Economists often build "structural models," in which they posit a specific model of individual behavior and then use data to estimate the parameters of the model. Although such models require strong assumptions, they have the advantage that they can make principled predictions about what would happen if the environment changed in a way that has not been observed in the past. I will describe the application of these techniques to advertiser behavior in sponsored search advertising auctions, focusing on how the models can be used for marketplace design and management. I will discuss economists' focus on causal inference in statistical models as well as the ways in which experiments can be used to estimate and test structural models. I will also make some suggestions about research directions at the intersection of economics and machine learning.

Automatic Discovery of Patterns in Media Content
Nello Cristianini, University of Bristol

What can we learn about the world (and the media system) by analysing millions of news articles or tweets? Media content analysis has historically been the domain of the social sciences, but recently we have witnessed a strong trend towards the automation of many of its tasks, paving the way for a new - computational - approach to social science and even the humanities. In this talk we will survey the results obtained over the past 5 years at the Intelligent Systems Laboratory in Bristol in the area of automating the analysis of news media content. By combining techniques from machine translation, pattern recognition, statistical learning, and information retrieval, we will analyse patterns connected to the US Presidential Elections, to UK public opinion, and to EU cultural biases.

Auctions and Allocations with Positive Network Externalities
Kamesh Munagala, Duke University

With the advent of social networks such as Facebook, Twitter, and LinkedIn, and of online offers/deals websites, network externalities raise the possibility of marketing and advertising to users based on the influence they derive from their neighbors in such networks. Indeed, a user's valuation for a product changes with his knowledge of which of his neighbors like or own the product. Designing allocation schemes and auctions in the presence of network externalities is not well understood, and poses many challenges that are hard to address with traditional techniques. We show that even in very simple settings, optimal auction design becomes APX-hard, and pricing schemes may not admit Nash equilibria in buyer behavior. On the positive side, we present approximation algorithms for welfare maximization and optimal auctions in both single-item and multi-item settings via interesting linear programming relaxations.
Joint work with Anand Bhalgat and Sreenivas Gollapudi (working paper); and with Nima Haghpanah, Nicole Immorlica, and Vahab Mirrokni (EC 2011).

Adolescent Societies - Their Form, Evolution, and Variation
Daniel McFarland, Stanford University

Adolescent societies - whether arising in weak, short-term classroom friendships or close, long-term friendships - exhibit tendencies toward segregation, hierarchy and clustering. It is difficult, however, to explain the macro-variation we observe in these social networks by drawing only on tie-formation micro-mechanisms. Some adolescent societies are rank-ordered caste systems and others are flat, cliquish worlds. What accounts for this variation? We propose and test an ecological theory of social network formation where features of organizational environments moderate tie-formation processes to account for the macro-structural variation across settings. We develop this argument using longitudinal friendship data on schools (Add Health study) and classrooms (Classroom engagement study), and by extending exponential random graph models to the study of multiple societies over time.
Joint work with James Moody, David Diehl, Jeff Smith, R. Jack Thomas

The Simple Economics of Approximately Optimal Auctions
Jason Hartline, Northwestern University

Systems wherein strategic agents compete for limited resources are ubiquitous: the economy, computer networks, social networks, congestion networks, etc. Auction theory governs the design and analysis of these systems. I will describe a method for using approximation in auction theory to identify simple, practical, and robust auction mechanisms that perform nearly as well as the complex optimal ones. A main goal of this approach is to understand the extent to which important intuition from optimal auctions for ideal models extends to approximately optimal auctions for more realistic models.
Auction theory is well understood only in ideal models where agents have single-dimensional, linear preferences, i.e., each agent has a value for receiving a service and her utility is the difference between her value and her payment. For these models, optimal auctions can be interpreted as "marginal revenue" maximizers (Myerson, 1981; Bulow and Roberts, 1989). In more realistic models, e.g., where bidders have multi-dimensional preferences over different outcomes or non-linear utility, similar intuitive characterizations are unknown. Understanding good auctions for these environments is considered the main challenge in auction theory. In these more realistic environments, maximizing marginal revenue may not be optimal, and furthermore, there is sometimes no direct way to implement the marginal revenue maximization mechanism. I will present two results: I will give procedures for implementing marginal revenue maximization in general, and I will show that marginal revenue maximization is approximately optimal. The approximation factor smoothly degrades in a term that quantifies how far the environment is from an ideal one (i.e., one where marginal revenue maximization is optimal).
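As a concrete reminder of the ideal single-dimensional case cited above (an illustration of the cited Myerson 1981 result, not part of the talk): for bidders with i.i.d. values uniform on [0,1], the optimal auction reduces to a second-price auction with reserve 1/2, the zero of the virtual value phi(v) = 2v - 1. A minimal sketch (function names are ours):

```python
def optimal_auction_uniform(bids, reserve=0.5):
    """Myerson-optimal auction for i.i.d. U[0,1] values: a second-price
    auction with reserve 0.5, since the virtual value phi(v) = 2v - 1
    crosses zero at v = 0.5.

    Returns (winner_index, price), or (None, 0.0) if no bid meets the reserve.
    """
    eligible = sorted(((b, i) for i, b in enumerate(bids) if b >= reserve),
                      reverse=True)
    if not eligible:
        return None, 0.0
    winner = eligible[0][1]
    # The winner pays the smallest bid that would still have won.
    price = eligible[1][0] if len(eligible) > 1 else reserve
    return winner, price
```

With bids [0.9, 0.7, 0.3], bidder 0 wins and pays the second-highest eligible bid, 0.7; a lone bid of 0.6 wins at the reserve price 0.5.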
Joint work with Saeed Alaei, Hu Fu, Nima Haghpanah, and Azarakhsh Malekian.

Beyond Keyword Search: Taming Information Overload with Rich User Interactions
Khalid El-Arini, Carnegie Mellon University

It has been almost two decades since the first search engine appeared on the Web. In that time, much of our lives has drifted online, from the news we read to the products we buy to our relationships with friends, family and colleagues. As such, information overload has become a ubiquitous problem that affects nearly all members of society, as we try to sift through millions of articles and status updates to get an understanding of our world. However, while our information needs have grown more complex and varied over time, the traditional information retrieval paradigm has scarcely changed: users still submit strings of keywords, and receive a ranked list of hyperlinks in return.
In this talk, I argue the need for new models of user interaction that are better suited to modern retrieval tasks. Some tasks, such as reading the news, can be inherently queryless, while others, such as conducting scientific research, require complex queries. In both cases, a user's personal tastes and beliefs can color the results he or she expects to see. Here, I present a general information retrieval framework that leverages ideas from submodular optimization and probabilistic graphical models to unify both settings. I show results in the news and science domains demonstrating that our approach outperforms state-of-the-art alternatives from Google and Yahoo! on real users.
Finally, as personalization itself has become ubiquitous on the Web, it is important to provide users with an interaction mechanism that allows them to inspect--and potentially correct--inferred user profile information. I present recent work that learns such transparent user profiles from user behavior in a large-scale social network.

Optimal targeting strategies over social networks
Kostas Bimpikis, Stanford University

Recent advances in information technology have allowed firms to gather vast amounts of data regarding consumers' preferences and the structure and intensity of their social interactions. The question that arises naturally is whether firms can intelligently use the available data to improve their business strategies. In this presentation, we take a step in this direction by discussing our results on optimal price discrimination and targeted advertising over networks. In particular, in the first part of the talk, we study the pricing problem of a monopolist selling a divisible good (service) to consumers who are embedded in a social network. We derive a characterization of the optimal price for each consumer as a function of her network position and illustrate the value of knowing the underlying network structure by comparing the profits of a monopolist who does not take the network into account to those of a monopolist who uses this information optimally.
The second part of the talk examines a game-theoretic model of competition between firms, which can target their marketing budgets to individuals. We provide a sharp characterization of the optimal targeted marketing strategies and highlight their dependence on the consumers' preferences as well as on the underlying social network structure. In particular, firms find it optimal to allocate their marketing budgets to consumers in proportion to their "network centrality", a measure of social influence. Moreover, we identify network structures for which targeted advertising is more beneficial for the firms and, finally, we show how the difference in the firms' initial budgets affects the outcome of the marketing competition between them.

Individual Heterogeneity and the Formation of Collaboration Networks
Katharine Anderson, Carnegie Mellon University

Collaboration drives innovation in a variety of contexts, including product development, entrepreneurship, policy-making, and academic research. Collaboration networks--in which two people are connected if they work together on a problem--provide a valuable tool for understanding the structure of these collaborative communities. Empirically, individuals in collaboration networks play a wide variety of roles--some individuals have many links, while others have few links, some bridge between communities while others play a more peripheral role. From a social scientific perspective, we would like to believe that the heterogeneity we observe in social network position reflects some kind of underlying heterogeneity in individuals--that individuals occupy different positions in the social network, not because they were lucky, but because of some identifiable underlying characteristics. However, our current models of network formation use homogeneous agents, and thus cannot be used to answer questions about individual heterogeneity and network structure.
Here, I introduce a model of network formation in which individuals with heterogeneous skill sets choose their collaborators in order to gain access to skills that are complementary to their own. Together, these collaborative connections comprise a collaboration network. This class of models is interesting because it connects individual heterogeneity to network heterogeneity, and thus can be used to answer questions about 1) how an individual's skill set affects her position in the social network and 2) how the distribution of skills in the population affects the overall structure of the network. Using this model, I show that an individual's degree in the collaboration network is a supermodular function of her set of skills. As a result, the degree distribution of the network may be highly skewed, even when the distribution of skills in the population is not. This skew becomes more pronounced as individuals in the network have fewer of the skills required to solve their problems--that is, as problems get more difficult. I then use an extension of this model to make predictions about the network position of specialists--individuals whose skills lie within a single subject area--and generalists--individuals whose skills bridge different subject areas. I show that while specialists tend to have more links than generalists, generalists are more likely to bridge between communities. I also show that as disciplinary boundaries fade, the degree distribution of the network becomes more skewed towards a few high-degree superstars.

Disturbance-Feedback Policies in Dynamic Robust Optimization, with Applications to Supply Chain Design and Revenue Management
Dan Iancu, Stanford University

Robust optimization has recently emerged as a computationally tractable and scalable paradigm for modeling complex decision problems under uncertainty. Its applications span a wide range of challenging problems in finance, pricing, supply chain management, and network design. In the context of dynamic robust decision models, a typical approach is to look for control policies (or decision rules) that are directly parameterized in the uncertainties affecting the model. This has the advantage of leading to simple convex optimization problems for finding particular classes of rules (e.g., affine). However, the approach typically yields suboptimal policies and is hard to analyze. In the first part of the talk, we seek to bridge this paradigm with the classical Dynamic Programming (DP) framework for solving dynamic decision problems. We provide a set of unifying conditions (based on the interplay between the convexity and supermodularity of the DP value functions, and the lattice structure of the uncertainty sets) that, taken together, guarantee the optimality of the class of affine decision rules, and furthermore allow such rules to be found very efficiently. Our results suggest new modeling paradigms for dynamic robust optimization, and our proofs bring together ideas from three areas of optimization typically studied separately: robust, combinatorial (lattice programming and supermodularity), and global (the theory of concave envelopes). We exemplify the approach in a model for negotiating flexible supply chain contracts between several parties, where optimal capacity, contracting, and ordering policies can be found by solving a simple linear program. In the second part of the talk, we consider a class of models arising in supply chain networks and multi-product pricing, for which we suggest a new class of polynomial decision rules, found by solving compact semidefinite programs and typically yielding drastic improvements compared with the status quo.
This is joint work with Dimitris Bertsimas, Pablo Parrilo, Mayank Sharma, and Maxim Sviridenko.

Online Concave Matching
Kamal Jain, Microsoft Research

Online matching was introduced by Karp, Vazirani, and Vazirani [KVV] in 1990 and has over time become a cornerstone of online algorithms. In the matching problem of KVV, the edges are unweighted. Mehta, Saberi, Vazirani, and Vazirani [MSVV] and Buchbinder, Naor, and Jain developed algorithms for the weighted version of online matching, but at the expense of finding a fractional solution. In many large-scale instances, such as matching ads to consumers, fractionality is not a major compromise.
The major compromise is that the existing work allows only hard budget constraint, e.g., so many matches per node or a linear utility function bounded above by a budget constraint, which essentially gives you a very specific concave utility function. In practice people have concave utility functions.
In this work we obtain an online algorithm for the concave matching problem. The performance of our algorithm is optimal. We use convex programming duality. Using variational calculus, we show how the worst possible example is dual to the best possible algorithm, thereby proving the optimality of our algorithm's performance guarantee. In hindsight, one can reconstruct the algorithm by studying the worst possible example itself.
Credits: This is a joint work with Nikhil Devanur.
Acknowledgement: This is work I have been doing for the last 7 years. Mark Braverman, Alexander Holroyd, Yuval Peres, and Oded Schramm are some of the people who have guided us.

Price Prediction Strategies for Simultaneous One-Shot Auctions
Michael Wellman, University of Michigan

Bidding in simultaneous auctions is challenging because an agent's value for a given good may depend on the outcome of other auctions. Given the lack of known solutions to general simultaneous auction games, previous works have addressed this classic exposure problem with heuristic strategies employing probabilistic price predictions. We introduce a concept of self-confirming prices, and show that within an independent private value model, bidding optimally with respect to self-confirming price predictions is w.l.o.g. in equilibrium. We exhibit practical procedures to compute approximate equilibrium strategies, and demonstrate via empirical game-theoretic analysis that these strategies are effective in simultaneous auction environments with both complementary and substitutable preference structures.
** This is joint work with Amy Greenwald and Eric Sodomka of Brown University.

Lattice Games and the Economics of Aggregators
George Varghese, UCSD and Yahoo! Research

We model the strategic decisions of web sites in content markets, where sites may reduce user search cost by aggregating content. Example aggregations include political news, technology, and other niche-topic websites. We model this market scenario as an extensive-form game of complete information, where sites choose a set of content to aggregate and users associate with sites that are nearest to their interests. Thus, our scenario is a location game in which sites choose to aggregate content at a certain point in user-preference space, and our choice of distance metric, Jaccard distance, induces a lattice structure on the game. We consider two variants of this scenario: one where users associate with the first site to enter among sites at equal distance, and a second where users choose uniformly between sites at equal distances. We show that Subgame Perfect Nash Equilibria exist for both games. While it appears to be computationally hard to compute equilibria in both games, we show a polynomial-time satisficing strategy called Frontier Descent for the first game. A satisficing strategy is not a best response but ensures that earlier sites will have positive profits, assuming all subsequent sites also have positive profits. By contrast, we show that the second game has no satisficing solution.
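For reference, the Jaccard distance driving the location game is the standard set dissimilarity; a minimal sketch of the distance and the user-association rule (our own illustration, not code from the paper):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A intersect B| / |A union B|.
    Identical sets are at distance 0; disjoint non-empty sets at distance 1."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def nearest_site(user_interests, site_contents):
    """Associate a user with the site whose aggregated content set is
    closest to the user's interests (ties broken by lower site index)."""
    return min(range(len(site_contents)),
               key=lambda s: jaccard_distance(user_interests, site_contents[s]))
```

For example, the distance between {politics, tech} and {tech, sports} is 1 - 1/3 = 2/3.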
**Joint work with P. Jordan, U. Nadav, K. Punera and A. Skrzypacz at Yahoo

Competitive Contagion in Networks
Michael Kearns, University of Pennsylvania

We examine a game-theoretic model of "competitive contagion" in networks, where two competing companies or other entities have limited budgets to seed initial infections in an underlying social network, which then mediates stochastic contagion. The payoff to each party is their final number of infections, which at equilibrium may come at the expense of the other party. In this model we provide characterizations of the Price of Anarchy and a new notion called the Price of Budgets. These characterizations depend on properties of the local stochastic dynamics of contagion, and in some cases exhibit sharp threshold behavior.
**Joint research with Sanjeev Goyal of the University of Cambridge.

Building an Institutional Field to Corral a Government
Stephen Barley, Stanford University

This talk will describe how the American business community either founded or revitalized eight distinct populations of organizations during the late 1970s and 1980s that, intentionally and unintentionally, formed an institutional field (a network of organizations) designed to influence the federal government and create a more business-friendly political environment.

Cooperation and Assortativity with Endogenous Partner Selection
Siddharth Suri, Yahoo! Research

The natural tendency for humans to choose with whom to form new relationships and with whom to end established relationships is thought to facilitate the emergence of cooperation. Helping cooperators to mix assortatively is believed to reinforce the rewards accruing to mutual cooperation while simultaneously excluding defectors. However, the relationship between endogenous partner selection, assortativity, and cooperation has been largely unexplored experimentally. Here we report on a series of human-subjects experiments in which groups of 24 participants played a multi-player prisoner's dilemma game where, critically, they were also allowed to propose and delete links to players of their own choosing at some variable rate. Over a wide variety of parameter settings and initial conditions, we found that endogenous partner selection significantly increased the level of cooperation, the average payoffs to players, and the assortativity between cooperators. Even relatively slow update rates were sufficient to produce large effects, resulting in cooperation levels over 80%. Subsequent increases to the update rate still had a positive, although smaller, effect. For standard prisoner's dilemma payoffs, we also found that assortativity resulted predominantly from cooperators avoiding defectors rather than from cooperators severing ties with defecting partners, and that cooperation correspondingly suffered. Finally, by modifying the payoffs to satisfy two novel conditions, we found that cooperators did punish defectors by severing ties, leading to levels of cooperation approaching 100% that persisted for longer.
**Joint work with: Jing Wang (NYU) and Duncan Watts (Yahoo! Research)

Two(!) Good To Be True
Sergiu Hart, The Hebrew University of Jerusalem

How to sell goods optimally? While the mechanism design literature has solved this problem neatly when there is only one good, the multiple goods case turns out to be extremely difficult, mathematically and conceptually. Much of what is true for one good does not extend to multiple goods. We will try to explain the difficulties, show what can go wrong, and then present some universal approximation results. The talk is essentially self-contained; no background in mechanism design is necessary.
**Joint work with Noam Nisan and Phil Reny

Identifying peer effects in online communication behaviors
Dean Eckles, Facebook

Peer effects can produce clustering of behavior in social networks, but so can homophily and common external causes. For observational studies, adjustment and matching estimators for peer effects require often implausible assumptions, but it is only rarely possible to conduct appropriate direct experiments to study peer influence in situ. We describe research designs in which individuals are randomly encouraged to perform a focal behavior, which can subsequently influence their peers. Ubiquitous product optimization experiments on Internet services can be used for these analyses. This approach is illustrated with an analysis of peer effects in expressions of gratitude via Facebook on Thanksgiving Day 2010, with implications for the micro-foundations of culture.

Networks, communities and the ground-truth
Jure Leskovec

The Web, society, information, cells and brain can all be represented and studied as complex networks of interactions. Nodes in such networks tend to organize into clusters and communities, which represent the fundamental structures for understanding the organization of complex systems. Even though detection of network communities is of significant importance for computer science, sociology and biology, our understanding of the community structure of large networks remains limited.

We study a set of more than 200 large networks with the goal of understanding and identifying communities in networks. We challenge the conventional view of network community structure and show that it is not exhibited by large real-world networks. We then present a new conceptual model of network community structure, which reliably captures the overall structure of networks and accurately identifies the overlapping nature of network communities.

This is joint work with Jaewon Yang.

Computing Game-Theoretic Solutions for Security
Vincent Conitzer

Algorithms for computing game-theoretic solutions are now deployed in real-world security domains, notably air travel. These applications raise some hard questions. How do we deal with the equilibrium selection problem? How is the temporal and informational structure of the game best modeled? What assumptions can we reasonably make about the utility functions of the attacker and the defender? And, last but not least, can we make all these modeling decisions in a way that allows us to scale to realistic instances? I will present our ongoing work on answering these questions.

Joint work with Dmytro Korzhyk, Joshua Letchford, Kamesh Munagala, Ronald Parr (Duke); Manish Jain, Zhengyu Yin, Milind Tambe (USC); Christopher Kiekintveld (UT El Paso); Ondrej Vanek, Michal Pechoucek (Czech Technical University); and Tuomas Sandholm (CMU).

Brief bio: Vincent Conitzer is the Sally Dalton Robinson Professor of Computer Science and Professor of Economics at Duke University. His research focuses on computational aspects of microeconomic theory, in particular game theory, mechanism design, voting/social choice, and auctions. He recently received the IJCAI Computers and Thought Award, which is awarded to outstanding young scientists in artificial intelligence.

Revenue-optimal Auctions
Costis Daskalakis

In his seminal paper, Myerson [1981] provides a revenue-optimal auction for a seller looking to sell a single item to multiple bidders. Unfortunately, Myerson's auction generalizes only to a limited range of domains, called "single-parameter" domains, where each bidder's preference over the auction's outcomes is specified by a single parameter. Indeed, extending this auction to "multi-parameter domains", such as selling multiple heterogeneous items to bidders, has been one of the most central problems in mathematical economics. We solve this problem in bidder- and item-symmetric settings. For a single bidder, we solve the general problem.

(Based on joint work with Yang Cai and Matt Weinberg)

Bio: Constantinos (or Costis) Daskalakis is an Assistant Professor of EECS at MIT. He completed his undergraduate studies in Greece, at the National Technical University of Athens, and obtained a PhD in Computer Science from UC Berkeley. After Berkeley he was a postdoctoral researcher in Microsoft Research New England, and since 2009 he has been at the faculty of MIT. His research interests lie in Algorithmic Game Theory and Applied Probability, in particular computational aspects of markets and the Internet, social networks, and computational problems in Biology. Costis has been honored with a 2007 Microsoft Graduate Research Fellowship, the 2008 Game Theory and Computer Science Prize from the Game Theory Society, the 2008 ACM Doctoral Dissertation Award, a NSF Career Award, a 2010 Sloan Foundation Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, and the MIT Ruth and Joel Spira Award for Distinguished Teaching.

A Random Graph Model of Multi-Hospital Kidney Exchanges
David Parkes

Multi-hospital kidney exchanges, where hospitals share local lists of donor-patient pairs with a centralized exchange, promise to facilitate larger matches, and thus additional transplants. But a central challenge is to align incentives so that hospitals will choose to share all pairs with the exchange. We present a random-graph model of feasible transplants and use the model to (a) explain early experimental results on 2-way and 3-way exchanges analytically, (b) derive a "square-root law" for welfare gains from centralized pools, and (c) design a matching mechanism that is Bayes-Nash incentive compatible, efficient, and individually rational under reasonable assumptions. The robustness of the mechanism is established through a computational study, validating incentive alignment and estimating an efficiency loss due to incentive alignment of 1-4% for 2-way exchanges and 5-19% for 3-way exchanges (decreasing with larger pools).

Joint work with Panos Toulis

A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
Arnout van de Rijt

Contemporary scholars of fame argue that public attention to persons is short-lived. We investigate this "fifteen minutes" hypothesis in a unique data source containing daily records of references to millions of names in 2,500 English-language newspapers, on TV station websites, and on blogs. Longitudinal analysis shows that fame instead tends to carry over from one year to the next, except below some threshold level of marginal public attention, where fame does appear to be fleeting. This pattern of above-threshold annual stability is found even in newspaper entertainment sections, celebrity-oriented tabloids, and blogs, where fame is thought to be most ephemeral. Analysis of archival newspaper data reveals that while coverage of small names follows a pattern of rapid decay, big names traverse a career-like trajectory spanning many years. We take this evidence to suggest that fame is created through an interaction between static social hierarchies and dynamic reinforcement processes, whereby a population is pre-stratified in terms of the chances that a cascade of public celebration will latch on to one of its members.

Treasure Hunt
Markus Mobius

We seed a large real-world social network with binary information and analyze subsequent social learning. A unique feature of our field experiment is that we measure both the pre-existing social networks and the actual conversation network. Our experiment allows us to test how rational agents behave when processing information that originates within their social network. We find that information decays quickly with social distance and that agents mainly incorporate information within social distance 2. Conversations through common friends do not increase the weight that a subject places on signals from direct friends but linearly increases the weight on signals from indirect friends. This suggests that agents are able to avoid double-counting information from indirect friends. We propose a simple “streams model” of social learning that is consistent with the evidence from our experiment.

Task Routing for Prediction Tasks
Yiling Chen

We describe methods for routing a prediction task on a network where each participant can contribute information and route the task onwards. We introduce routing scoring rules that bring truthful contribution of information and optimal routing of the task into a Perfect Bayesian Equilibrium under common knowledge about the amount of information available to each agent on the network. Relaxing the common knowledge assumption, we address the challenge of routing in situations where states of information differ among agents. We introduce a family of local routing rules, which isolate routing decisions that depend only on local information in equilibrium, while still promoting effective routing. Local routing rules are the only routing scoring rules that induce truthful equilibria in which agents' routing decisions rely solely on local information. Simulation results show that local routing decisions can lead to effective task routing.

Joint work with Haoqi Zhang, Eric Horvitz, and David Parkes.

Bargaining Theory in the Cloud
Eric Friedman

The axiomatic theory of bargaining solutions was initiated by John Nash with his seminal paper in 1950 and has a long and mostly mathematical history. Surprisingly, it arises naturally in a variety of allocation problems arising in cloud computing. For example, the second most famous bargaining solution, the Kalai-Smorodinsky solution, is the outcome of a simple water filling algorithm used in the Mesos Platform and has many strong properties in that setting, including incentive compatibility and fairness. In this talk, I will explore these connections for a variety of cloud computing problems and show how axiomatic bargaining theory can be used to analyze allocation problems in the cloud and conversely how cloud computing sheds new light on axiomatic bargaining theory.
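The water-filling idea mentioned above can be illustrated on a single divisible resource; the sketch below is our own toy version (much simpler than what a platform like Mesos actually implements): raise every user's allocation at a common rate, freeze a user once its demand is met, and stop when the capacity runs out.

```python
def water_fill(capacity, demands):
    """Progressive filling of one divisible resource: grow every active
    user's allocation at the same rate; a user freezes once its demand
    is satisfied; stop when capacity is exhausted.
    Returns a dict {user: allocation}."""
    alloc = {u: 0.0 for u in demands}
    active = set(demands)
    remaining = capacity
    while active and remaining > 1e-12:
        n = len(active)
        # Raise everyone equally, stopping at the nearest demand cap.
        step = min(remaining / n, min(demands[u] - alloc[u] for u in active))
        for u in list(active):
            alloc[u] += step
            if demands[u] - alloc[u] <= 1e-12:
                active.remove(u)
        remaining -= step * n
    return alloc
```

With capacity 10 and demands {a: 2, b: 8, c: 8}, user a is capped at 2 and the remaining 8 units are split equally, giving b and c four units each.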

This talk is based on joint work with Ali Ghodsi, Scott Shenker and Ion Stoica.

Network Patterns of Favor Exchange
Matthew Jackson

We examine the informal exchange of favors among the members of a society. We analyze a game where the fear of losing multiple relationships can provide incentives for favor provision. We characterize the network patterns of exchange that are robust in the sense that deleted relationships result only in a local loss of favor exchange. Such robustness necessitates networks in which all links are "supported": any pair of individuals exchanging favors must have a common friend. We show that favor exchange networks in 75 villages in rural India exhibit a frequency of this sort of support that is significantly higher than a standard network 'clustering' measure.
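The support notion above has a direct computational counterpart: a link is supported when its endpoints share at least one common neighbor. A minimal sketch (ours, for illustration) that measures the fraction of supported links in an undirected network:

```python
from collections import defaultdict

def support_fraction(edges):
    """Fraction of links (u, v) whose endpoints share at least one
    common neighbor, i.e. 'supported' links in the sense of this talk.
    `edges` is a list of undirected (u, v) pairs."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # u is never its own neighbor, so the intersection automatically
    # excludes the two endpoints themselves.
    supported = sum(1 for u, v in edges if nbrs[u] & nbrs[v])
    return supported / len(edges)
```

In a triangle every link is supported (fraction 1.0); in a path no link is (fraction 0.0), which is what distinguishes support from a node-level clustering coefficient.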

Co-authors Tomas Rodriguez-Barraquer and Xu Tan

Simple Auctions with Near-Optimal Equilibria
Tim Roughgarden

In practice, auction designers often exchange incentive-compatibility for simplicity (e.g., in sponsored search auctions or combinatorial auctions). How much welfare loss does this design choice cause? I'll talk about some techniques for answering this question.

Based largely on a SODA '11 paper with Kshipra Bhawalkar.

Efficient Online Ad Serving in a Display Advertising Exchange
Kevin Lang

We begin with a tutorial overview of the basic concepts of Non-Guaranteed (i.e. spot-market) display advertising and advertising exchanges, then pose and solve a constrained path optimization problem that lies at the heart of the real-time ad serving task in Yahoo!'s NGD Ad Exchange.

In Yahoo!'s Exchange, the ad server's task for each display opportunity is to compute, with low latency, an optimal valid path through a directed graph representing the business arrangements between the numerous business entities that belong to the Exchange. These entities include not only publishers and advertisers, but also intermediate entities called "ad networks" which have delegated their ad serving responsibilities to the Exchange.

Path validity is determined by business constraints which focus on the following three issues: 1) suitability of the display opportunity's web page and its publisher 2) suitability of the user who is currently viewing that web page, and 3) suitability of a candidate ad and its advertiser.

Path optimality is determined by money reaching the publisher, and is affected not only by an advertiser's bid, but also by the revenue-sharing agreements between the entities in the candidate path.

We describe two different algorithms that have both been used in the actual Yahoo! non-guaranteed ad serving system, focusing on typical case rather than worst case performance, and on the optimality-preserving speedup heuristics that allow latency targets to be met.

The talk is based on a WSDM 2011 paper that was co-written with Joaquin Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar, Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael Binderberger-Ortega, Sunil Nagaraj, and Raymie Stata.

Designing Algorithms for MapReduce
Sergei Vassilvitskii

The amount of data available today has led to analysis problems whose size would have been unimaginable just 10 years ago. With datasets routinely measured in gigabytes and terabytes, one challenge that has emerged is scaling algorithms to handle this deluge of information. Just as chip designers have sustained Moore's law by turning to parallelism, so have algorithmicists turned to parallelism to speed up algorithms and improve performance. In this talk, we will focus on the MapReduce model of parallel computing, highlight some of the recent algorithmic advances, and present current challenges.
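As a reminder of the programming model under discussion, the map-shuffle-reduce pipeline can be sketched as a toy single-machine version (real systems distribute each phase across many machines); word count is the canonical example:

```python
from collections import defaultdict
from itertools import chain

def map_reduce(inputs, mapper, reducer):
    """Toy in-memory MapReduce: map each input record to (key, value)
    pairs, shuffle the pairs by key, then reduce each key's value list."""
    pairs = chain.from_iterable(mapper(x) for x in inputs)   # map phase
    groups = defaultdict(list)
    for key, value in pairs:                                  # shuffle phase
        groups[key].append(value)
    return {key: reducer(key, values)                         # reduce phase
            for key, values in groups.items()}

# Word count: emit (word, 1) per occurrence, sum counts per word.
counts = map_reduce(["to be or not to be"],
                    mapper=lambda doc: [(w, 1) for w in doc.split()],
                    reducer=lambda key, values: sum(values))
```

The design challenge the talk addresses is exactly what this toy hides: minimizing the number of such rounds and the per-machine memory while each mapper and reducer sees only a sliver of the data.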

Based on joint work with Howard Karloff, Silvio Lattanzi, Ben Moseley and Siddharth Suri.

Incentivizing High-quality User-Generated Content
Arpita Ghosh

We model the economics of incentivizing high-quality user generated content (UGC), motivated by settings such as online review forums, question-answer sites, and comments on news articles and blogs. We provide a game-theoretic model within which to study the problem of incentivizing high quality UGC, in which contributors are strategic and motivated by exposure. Our model has the feature that both the quality of contributions as well as the extent of participation is determined endogenously in a free-entry Nash equilibrium. The model predicts, as observed in practice, that if exposure is independent of quality, there will be a flood of low quality contributions in equilibrium. An ideal mechanism in this context would elicit both high quality and high participation in equilibrium, with near-optimal quality as the available attention diverges, and should be easily implementable in practice. We consider a very simple elimination mechanism, which subjects each contribution to rating by some number A of viewers, and eliminates any contributions that are not uniformly rated positively. We construct and analyze free-entry Nash equilibria for this mechanism, and show that A can be chosen to achieve quality that tends to optimal, along with diverging participation, as the number of viewers diverges.

Joint work with Preston McAfee.

Coevolution of Network Structure and Content
Lada Adamic

As individuals communicate, their exchanges form a dynamic network. A time series analysis of both standard and novel network metrics can be used to characterize a network's evolution. We demonstrate that network structure alone can be highly revealing of the diversity and novelty of the information being communicated in the network. Networks with a higher conductance exhibit higher information entropy, while unexpected network configurations are indicative of higher information novelty.

This is joint work with Liuling Gong, Edwin Teng, and Avishay Livne.

Damon Centola


BGP Safety with Spurious Updates - The Conditions of BGP Convergence
Martin Suchara

We explore BGP safety, the question of whether a BGP system converges to a stable routing, in light of several BGP implementation features that have not been fully included in previous theoretical analyses. We show that Route Flap Damping, MRAI timers, and other intra-router features can cause a router to briefly send "spurious" announcements of less-preferred routes. We demonstrate that, even in simple configurations, this short-term spurious behavior may cause long-term divergence in global routing. We then present DPVP, a general model that unifies these sources of spurious announcements in order to examine their impact on BGP safety. In this new, more robust model of BGP behavior, we derive a necessary and sufficient condition for safety, which furthermore admits an efficient algorithm for checking BGP safety in most practical circumstances -- two complementary results that have been elusive in the past decade's worth of classical studies of BGP convergence in simpler models. We also consider the implications of spurious updates for well-known results on dispute wheels and safety under filtering. Joint work with Alex Fabrikant and Jennifer Rexford.

Predicting the Popularity of Online Content
Gabor Szabo

Online services that collect, categorize, and disseminate user-generated content seemingly have very different focuses, user communities, and content types. We present a study on the popularity of content in three online services: YouTube, Twitter, and Digg. While videos on YouTube, short messages on Twitter, and interesting articles on Digg do not necessarily have much in common, it turns out that the patterns of access to content on any of these services are very similar across the sites. In particular, we present predictions for the long-term popularity of content based on early observations, and moreover discuss what role the social network plays in determining the success of a submission.
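One way to make such predictions concrete, sketched here with made-up numbers rather than the talk's data, is to fit a log-linear relationship between early and long-term view counts and extrapolate from an early observation:

```python
import math

# Hypothetical (early, long-term) view counts for a few items.
history = [(120, 3000), (40, 900), (300, 8100), (10, 260)]

# Least-squares fit of log(long_term) = a * log(early) + b.
xs = [math.log(early) for early, _ in history]
ys = [math.log(late) for _, late in history]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

def predict(early_views):
    # Extrapolate long-term popularity from an early observation.
    return math.exp(a * math.log(early_views) + b)
```

The fit works because, empirically, content that attracts attention early keeps attracting it at a roughly proportional rate; the talk examines how well this holds across sites.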

Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
Elisa Celis

I will first discuss bid data from Microsoft's AdECN platform, which allows advertisers to target their bids. We find that the distribution of valuations is irregular, making the commonly used Vickrey auction poorly suited to extracting revenue; Myerson's optimal auction would address this, but it is often unsuitable in practice.

I will present and discuss the benefits of a simple auction mechanism (BIN-TAC) which extends the second-price auction with reserve. This mechanism is particularly effective in private value environments where the distribution of valuations is irregular, and it is truthful in expectation. We use the AdECN data in counterfactual experiments that suggest our BIN-TAC mechanism would improve revenue by 24% relative to the mechanism currently in place.

Social Learning and the Dynamic Cavity Method
Yashodhan Kanoria

Social learning constitutes learning of behavior from interaction with friends/neighbors on a network. This talk will focus on models of repeated interaction, with agents 'voting' in a series of rounds on some issue of interest. Votes in the initial round are based on 'private signals', whereas votes in future rounds incorporate knowledge of previous votes cast by friends.

We look at two different models of iterative learning. A very simple model is 'majority dynamics', where agents choose their vote based on the majority of their neighbors' votes in the previous round. We analyze this model on regular trees. At the other extreme is iterative Bayesian learning: a fully rational model introduced by Gale and Kariv (2003). We introduce new algorithms for this model, challenging the widespread belief that learning is computationally intractable in this setting. We develop a novel technique, the 'dynamic cavity method', that serves as a key tool for both models.
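The majority-dynamics model is simple enough to simulate in a few lines. The sketch below (an illustrative toy on a small cycle, not the paper's tree analysis) updates each agent to the majority vote among its neighbors, with ties keeping the current vote:

```python
def majority_step(votes, neighbors):
    # One synchronous round: each agent adopts the majority vote (+1/-1)
    # among its neighbors; a tie keeps the agent's current vote.
    new = {}
    for v, vote in votes.items():
        s = sum(votes[u] for u in neighbors[v])
        new[v] = 1 if s > 0 else (-1 if s < 0 else vote)
    return new

# Tiny example: a 4-cycle with one dissenting agent.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
votes = {0: 1, 1: 1, 2: -1, 3: 1}
for _ in range(5):
    votes = majority_step(votes, neighbors)
```

On this instance the dissenter is flipped in one round and the dynamics reach consensus; the interesting questions in the talk concern when and how fast this happens on larger structured graphs.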

Based on joint work with Andrea Montanari (to appear in Ann. App. Prob.) and Omer Tamuz (submitted).

The Complexity Ecology of Consensus Ranking
Matthias Mnich

Preference lists are extensively used in social science surveys and voting systems to capture information about choice. In many such scenarios there arises the need to combine the data represented by many lists into a single list which reflects the opinion of the surveyed group as much as possible. This yields a class of ranking aggregation problems, where we are given a set of permutations (votes) over a set of alternatives (candidates), and are asked for a permutation of the set of candidates such that some objective is minimized. Examples include Kemeny ranking and other consensus methods, which find important applications in web search engines, e-mail spam detection, and user recommendation systems.
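For a handful of candidates, a Kemeny consensus can be computed by brute force: minimize the total Kendall tau distance (number of pairwise disagreements) to all votes over every permutation. The sketch below is illustrative only; the problem is NP-hard in general, which is what motivates the parameterized analysis:

```python
from itertools import permutations, combinations

def kendall_tau(p, q):
    # Number of candidate pairs ordered differently by rankings p and q.
    pos_p = {c: i for i, c in enumerate(p)}
    pos_q = {c: i for i, c in enumerate(q)}
    return sum(1 for a, b in combinations(p, 2)
               if (pos_p[a] - pos_p[b]) * (pos_q[a] - pos_q[b]) < 0)

def kemeny(votes):
    # Exhaustive search over all permutations: fine for a handful of
    # candidates, exponential in general.
    candidates = votes[0]
    return min(permutations(candidates),
               key=lambda r: sum(kendall_tau(r, v) for v in votes))

votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
ranking = kemeny(votes)
```

Here a majority prefers a over b, a over c, and b over c, so the consensus is (a, b, c) with total distance 2.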

We present results on this topic from a multivariate complexity and algorithms analysis, based on parameterized complexity.

(Joint work with Henning Fernau, Fedor Fomin, Daniel Lokshtanov, Geevarghese Philip and Saket Saurabh)

Contribution Games in Social Networks
Elliot Anshelevich

We consider network contribution games, where each agent in a social network has a budget of effort that it can contribute to different collaborative projects or relationships. Depending on the contributions of the involved agents, a relationship will flourish or drown, and to measure the success we use a reward function for each relationship. Every agent is trying to maximize the reward from all relationships that it is involved in. We consider pairwise equilibria of this game, and characterize the existence, computational complexity, and quality of equilibrium based on the types of reward functions involved. For example, when all reward functions are concave, we prove that the price of anarchy is at most 2. For convex functions the same holds only under some special but very natural conditions. A special focus of the talk is minimum effort games, where the reward of a relationship depends only on the minimum effort of any of the participants. Finally, we show tight bounds for approximate equilibria and convergence of dynamics in these games.

Liquidity in Credit Networks: A Little Trust Goes a Long Way
Pranav Dandekar

Credit networks represent a way of modeling trust between entities in a network. Nodes in the network print their own currency and trust each other for a certain amount of each other's currency. This allows the network to serve as a decentralized payment infrastructure---arbitrary payments can be routed through the network by passing IOUs between trusting nodes in their respective currencies---and obviates the need for a common currency. Nodes can repeatedly transact with each other and pay for the transaction using trusted currency. A natural question to ask in this setting is: how long can the network sustain liquidity, i.e. how long can the network support the routing of payments before credit dries up? We answer this question in terms of the long term success probability of transactions for various network topologies and credit values.

We show that the success probability of transactions is independent of the path used to route flow between nodes. For symmetric transaction rates, we show analytically and via simulations that the success probability for complete graphs, Erdos-Renyi graphs and preferential attachment graphs goes to one as the size, the density or the credit capacity of the network increases. Finally, we compare liquidity in credit networks to that in a centralized payment infrastructure and show that credit networks are at most a constant factor worse; thus we do not lose much liquidity in return for their robustness and decentralized properties.
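The routing mechanism can be illustrated with a toy simulation (an assumption-laden sketch, not the authors' model or code): a payment follows any path with sufficient residual credit, and routing it decreases capacity along the path while increasing it in the reverse direction, which is why credit can dry up:

```python
from collections import deque

def route_payment(credit, payer, payee, amount=1):
    # BFS for a path from payer to payee along edges with enough residual
    # credit; routing the payment shifts credit: each hop's forward
    # capacity drops and the reverse capacity grows by `amount`.
    parent = {payer: None}
    queue = deque([payer])
    while queue:
        u = queue.popleft()
        if u == payee:
            break
        for v in credit.get(u, {}):
            if v not in parent and credit[u][v] >= amount:
                parent[v] = u
                queue.append(v)
    if payee not in parent:
        return False  # credit along every route has dried up
    v = payee
    while parent[v] is not None:
        u = parent[v]
        credit[u][v] -= amount
        credit.setdefault(v, {})[u] = credit.get(v, {}).get(u, 0) + amount
        v = u
    return True

# Line network 0 - 1 - 2: node 1 extends only one unit of credit toward 2.
credit = {0: {1: 2}, 1: {2: 1}}
first = route_payment(credit, 0, 2)   # succeeds
second = route_payment(credit, 0, 2)  # fails: the 1->2 credit is exhausted
```

Note how the first payment leaves reverse credit behind, so a payment in the opposite direction would still succeed; this restorative effect underlies the long-term liquidity results.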

Joint work with Ashish Goel, Ramesh Govindan and Ian Post.

User Browsing Models: Relevance versus Examination
Sugato Basu

There has been considerable work on user browsing models for search engine results, both organic and sponsored. The click-through rate (CTR) of a result is the product of the probability of examination (will the user look at the result) times the perceived relevance of the result (probability of a click given examination). Past papers have assumed that when the CTR of a result varies based on the pattern of clicks in prior positions, this variation is solely due to changes in the probability of examination.
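In symbols, this factorization says CTR = P(examine) × P(click | examine). A toy illustration (hypothetical numbers, not data from the talk) of how the examination hypothesis explains position effects without any change in relevance:

```python
def ctr(p_examination, relevance):
    # Click-through rate as the product of the examination probability
    # and the perceived relevance (click probability given examination).
    return p_examination * relevance

# Hypothetical position-dependent examination probabilities:
p_exam = [1.0, 0.6, 0.4]
relevance = [0.10, 0.10, 0.10]  # identical relevance in every position
observed_ctr = [ctr(e, r) for e, r in zip(p_exam, relevance)]
# CTR falls with position even though relevance is constant; the talk
# argues that, conditioned on prior clicks, relevance itself also shifts.
```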

We show that, for sponsored search results, a substantial portion of the change in CTR when conditioned on prior clicks is in fact due to a change in the relevance of results for that query instance, not just due to a change in the probability of examination. We then propose three new user browsing models, which attribute CTR changes solely to changes in relevance, solely to changes in examination (with an enhanced model of user behavior), or to both changes in relevance and examination. The model that attributes all the CTR change to relevance yields substantially better predictors of CTR than models that attribute all the change to examination, and does only slightly worse than the model that attributes CTR change to both relevance and examination. For predicting relevance, the model that attributes all the CTR change to relevance again does better than the model that attributes the change to examination. Surprisingly, we also find that one model might do better than another in predicting CTR, but worse in predicting relevance. Thus it is essential to evaluate user browsing models with respect to accuracy in predicting relevance, not just CTR.

Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
Daria Sorokina

Ensembles of regression trees have long been known as powerful models suitable for a large variety of tasks. The most well-known ensembles are Random Forests (averaging independent large, highly non-linear trees) and Gradient Boosting (an additive model built from small trees). At the beginning of the talk I will describe an ensemble that combines the benefits of both: Additive Groves is based on regression trees, additive models and bagging, and is capable of both fitting the additive structure of the problem and modelling its highly nonlinear components with very large trees at the same time. The combination of these properties makes Additive Groves superior in performance to other existing tree ensemble methods.

Models with best performance often act as black boxes: they make good predictions, but do not provide much insight into the decision making process. This is not satisfactory for many data mining problems. Therefore separate post-processing techniques are needed to answer the questions about important features, their effects and interactions.

The term statistical interaction is used to describe the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Thus, detecting statistical interactions can be critical for an understanding of processes by domain researchers. In this talk I will describe an approach to interaction detection based on comparing the performance of unrestricted and restricted prediction models, where restricted models are prevented from modeling an interaction in question. I will show that Additive Groves has the required properties and can be used within this framework.

I will also talk about several applications of these techniques to real-world data: tracking environmental change based on bird abundance, and Salmonella risk prediction in meat-processing establishments. Apart from that, Additive Groves has been successful in several recent competitions: KDD Cup'09, ICDM'09 and the Yahoo Learning to Rank Challenge. The latter result is especially interesting, because Additive Groves was the only winning solution that did not optimize for any specific ranking metric.

The algorithms presented in the talk are implemented as part of an open source C++ package.

At the end of the talk I will make a short presentation about Yandex and its California office. We are hiring!

Reflections on Link-based Ranking
Marc Najork

This talk focuses on using hyperlinks in the ranking of web search results. We give a brief overview of the vast body of work in the area; we provide a quantitative comparison of the different features; we sketch how link-based ranking features can be implemented in large-scale search engines; and we identify promising avenues for future research.
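The best-known link-based ranking feature is PageRank. The compact power-iteration sketch below is the standard textbook formulation, not the talk's implementation:

```python
def pagerank(links, damping=0.85, iterations=50):
    # Power iteration on the random-surfer Markov chain: with probability
    # `damping` follow an outlink, otherwise jump to a uniform random page.
    nodes = list(links)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            out = links[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:
                # Dangling node: spread its mass uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
rank = pagerank(links)
```

On this three-page graph, "c" collects links from both "a" and "b" and ends up ranked above "b"; at web scale the same iteration is implemented as a distributed sparse matrix-vector product.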

Output URL Bidding
Panagiotis Papadimitriou

Output URL bidding is a new bidding mechanism for sponsored search, where advertisers bid on search result URLs, as opposed to keywords in the input query. For example, an advertiser may want his ad to appear whenever the search result includes the sites and, instead of bidding on keywords that lead to these sites, e.g., movie titles, actor names, and so on. In this paper we study the tradeoff between the simplicity and the specification power of output bids and we explore their utility for advertisers. We first present a model to derive output bids from existing keyword bids. Then, we use the derived bids to experimentally study output bids and contrast them to input query bids. Our main results are the following: (1) Compact output bids that mix both URLs and hosts have the same specification power as more lengthy input bids; (2) Output bidding can increase the recall of relevant queries; and (3) Output and input bidding can be combined into a hybrid mechanism that combines the benefits of both.

A Technical Overview of Search Ads Quality at Google
Roberto Bayardo

Abstract: Ads quality involves optimizing the oftentimes competing interests of Google, the users of its search engine, and the advertisers who pay to have their ads displayed on search results pages. In this talk I describe the primary problems faced by the Google ads quality team, and some of the general approaches we take to address them. My goal in this talk is to introduce others to problems of interest beyond the better-studied problems of click-through estimation and mechanism design. The topics covered will include keyword targeting & bid specification, budget allocation, user-based ads quality, ad formats, when to show an ad and where, and assessing the quality of ad creatives, their landing pages, and the entire "post click" experience.

Bio: Roberto Bayardo is a principal research scientist at Google, where he has been working within the ads quality and ads serving teams for the past 5 years. Roberto currently leads a team working on quality modeling, with the broad goal of ensuring that ads displayed alongside Google search provide a good user experience.

On the Competitive Ratio of Online Sampling Auctions
Georgios Pierrakos

We study online profit-maximizing auctions for digital goods with adversarial bid selection and uniformly random arrivals. Our goal is to design auctions that are constant competitive with F^(2); in this sense our model lies at the intersection of prior-free mechanism design and secretary problems. We first give a generic reduction that transforms any offline auction to an online one, with only a loss of a factor of 2 in the competitive ratio; we then present some natural auctions, both randomized and deterministic, and study their competitive ratio; our analysis reveals some interesting connections of one of these auctions with RSOP, which we further investigate in our final section.

The importance of ties
Paolo Parigi

For sociologists, the ties that join individuals into relationships are the result of social processes rather than being entities themselves. A social network is therefore a representation for something that happens between individuals and that brings people together. Sometimes, however, social processes sever ties and produce an increase in the distance that separates individuals, ultimately generating isolation. In my talk I will first provide a few examples of social processes that generate ties and then I will advance a theoretical scenario for studying processes that instead decrease the number of ties. The positive examples of tie generation come from my work in the fields of political sociology and organization theory. The negative scenario of increasing isolation comes from my work on social network analysis. In presenting these examples, my goal is to stimulate the conversation between social scientists and computer scientists in the field of network analysis.

How opinions are received by online communities: a case study on helpfulness votes
Gueorgi Kossinets

There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like "26 of 32 people found the following review helpful." Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, "What did Y think of X?", we are asking, "What did Z think of Y's opinion of X?" Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also, in subtle ways, on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review "plagiarism" to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from different countries.

Reaching Consensus on Social Networks
Grant Schoenebeck

Research in sociology studies the effectiveness of social networks in achieving computational tasks. Typically the agents who are supposed to achieve a task are unaware of the underlying social network beyond their immediate friends. They have limited memory, communication, and coordination. These limitations result in computational obstacles to solving otherwise trivial problems. One of the simplest problems studied in the social sciences involves reaching a consensus among players between two alternatives which are otherwise indistinguishable. In this paper we formalize the computational model of social networks. We then analyze the consensus problem as well as the problem of reaching a consensus which is identical to the majority of the original signals. In both models we seek to minimize the time it takes players to reach a consensus.

Web-scale Information Extraction Using Wrappers.
Nilesh Dalvi

A significant portion of web pages embed interesting and valuable semantic content suitable for structured representation. The traditional Information Extraction techniques, however, fall short of achieving high quality extraction at Web scale. In this talk, I will outline some of the work going on at Yahoo! Research on addressing the challenges of Information Extraction on a Web scale. I will focus on wrapper-based techniques, which exploit the HTML structure of websites to extract the information of interest. I will address two problems: (i) making wrappers more robust to changes in websites, and (ii) enabling learning of wrappers from automatically obtained noisy training data.

Characterization and Modeling of Online Browsing Activity.
Andrew Tomkins

I will describe a set of studies of online user behavior based on search and toolbar logs. In these studies, we propose a new "CCS" taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (web search, item search, multimedia search). We show that roughly half of all pageviews online are content, 1/3 are communications, and the remaining 1/6 are search. We study the interarrival distribution of visits to a page by a single user and in aggregate, and show some evidence that "bursty" behavior in which a URL attains significant transient popularity for a few days is not a significant contributor to interarrival times. We also study the nature and frequency of search queries in more detail, differentiating between web search, item search, and multimedia search, and characterizing the prevalence of references to structured objects in web search queries. Navigation patterns in this data are not well characterized as trails, but should instead be viewed as trees due to the presence of multi-tab or multi-window browsing. We present a model of tabbed browsing that represents a hybrid between a Markov process capturing the graph of web pages, and a branching process capturing the creation, splitting, and dying of tabs. We present a mathematical criterion to characterize whether the process has a steady state independent of initial conditions, and we show how to characterize the limiting behavior in both cases. We perform a series of experiments to compare our tabbed browsing model with the simpler random surfer model. Joint work with Ravi Kumar and Flavio Chierichetti.

User grouping behavior in online forums.
Xiaolin Shi

Online forums represent one type of social media that is particularly rich for studying human behavior in information seeking and diffusion. The way users join communities is a reflection of the changing and expanding of their interests toward information. In this talk, I present work studying the patterns of user participation behavior, and the feature factors that influence such behavior, on different forum datasets. In this work, we find that, despite the relative randomness and lesser commitment of structural relationships in online forums, users' community joining behaviors display some strong regularities. One particularly interesting observation is that the very weak relationships between users defined by online replies have diffusion curves similar to those of real friendships or co-authorships. We build social selection models, Bipartite Markov Random Fields (BiMRF), to quantitatively evaluate the prediction performance of those feature factors and their relationships. Using these models, we show that some features carry supplementary information, and the effectiveness of different features varies in different types of forums. Moreover, the results of BiMRF with two-star configurations suggest that the feature of user similarity defined by frequency of communication or number of common friends is inadequate to predict grouping behavior, but adding node-level features can improve the fit of the model.

Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing.
Matthew Harding

In this paper we introduce a new Poisson mixture model for count panel data where the underlying Poisson process intensity is determined endogenously by consumer latent utility maximization over a set of choice alternatives. This formulation accommodates the choice and count in a single random utility framework with desirable theoretical properties. Individual heterogeneity is introduced through a random coefficient scheme with a flexible semiparametric distribution. We deal with the analytical intractability of the resulting mixture by recasting the model as an embedding of infinite sequences of scaled moments of the mixing distribution, and newly derive their cumulant representations along with bounds on their rate of numerical convergence. We further develop an efficient recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling scheme, and show posterior consistency. We apply our model to a recent household panel of supermarket visit counts. We estimate the nonparametric density of three key variables of interest - price, driving distance, and their interaction - while controlling for a range of consumer demographic characteristics. We use this econometric framework to assess the opportunity cost of time and analyze the interaction between store choice, trip frequency, search intensity, and household and store characteristics. We also conduct a counterfactual welfare experiment and compute the compensating variation for a 10% to 30% increase in Walmart prices.

Deconstructing A Long Tail Commerce Network.
Neel Sundaresan

In this talk we discuss modeling a complex long tail commerce network in the form of the eBay Marketplace graph. The asymmetric and preferential nature of buying and selling in terms of structure, interactions, trust and reputation measures, and the evolution of these, has tremendous value in identifying business opportunities and building effective user applications. The discussion focuses on the analysis of the macroscopic shape of the networks across the platform and also in specific marketplace categories. The results suggest significant diversity across these individual marketplaces, from collector networks to retail networks. We also discuss motif profiling and preferential connections. Means for defining trust measures based on network properties will also be addressed.

Nod vs. High-Five: LinkedIn Connection Strength Models and Applications.
Monica Rogati

The low cost of link formation on social and professional networking sites leads to connections with heterogeneous relationship strengths: a LinkedIn user might be connected to both professional acquaintances and close collaborators. Modeling this relationship as non-binary is more realistic (and more useful in personalizing a user's experience) than the binary links that have been the focus of previous work in the social networking space. We present an unsupervised, link-based latent variable model to estimate relationship strength from interaction activity (e.g., communication, browsing, comments) and profile content similarity. We also discuss practical applications of modeling connection strength when building data-driven products that enhance the LinkedIn user experience and engagement. (This is joint work with Rongjing Xiang during her internship at LinkedIn.)

Network of positive and negative edges
Jure Leskovec

Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. We study how the interplay between positive and negative relationships affects the structure of on-line social networks, whereas in contrast the bulk of research on social networks to date has focused almost exclusively on positive interpretations of links between people. We connect our analyses to theories of signed networks from social psychology. In doing so we find that the classical theory of structural balance tends to capture certain common patterns of interaction, but that it is also at odds with some of the fundamental phenomena we observe --- particularly related to the evolving, directed nature of these on-line networks. We then develop an alternate theory of status that better explains the observed edge signs and provides insights into the underlying social mechanisms. Our work provides one of the first large-scale evaluations of theories of signed networks using on-line datasets, as well as providing a perspective for reasoning about social media sites.
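The classical balance criterion mentioned above is easy to state: a signed triangle is balanced exactly when the product of its edge signs is positive ("the friend of my friend is my friend", "the enemy of my enemy is my friend"). A minimal illustration:

```python
def triangle_balanced(s_ab, s_bc, s_ac):
    # Structural balance: a signed triangle (+1 friendly, -1 antagonistic)
    # is balanced iff the product of its three edge signs is positive.
    return s_ab * s_bc * s_ac > 0

all_friends = triangle_balanced(+1, +1, +1)   # balanced
mutual_enemy = triangle_balanced(+1, -1, -1)  # two friends vs. a common enemy: balanced
one_negative = triangle_balanced(+1, +1, -1)  # unbalanced
all_enemies = triangle_balanced(-1, -1, -1)   # unbalanced
```

The talk's status theory departs from this criterion by treating edges as directed and interpreting a positive link as an assertion about relative standing rather than friendship.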

Joint work with Daniel Huttenlocher and Jon Kleinberg.

The paper can be accessed here.

Social Networks and Stable Matchings in the Job Market
Esteban Arcaute

In this talk we introduce and study a model that considers the job market as a two-sided matching market, and accounts for the importance of social contacts in finding a new job. We assume that workers learn about positions in firms only through social contacts. Given that information structure, we study both static properties of what we call locally stable matchings, a solution concept derived from stable matchings, and dynamic properties through a reinterpretation of Gale-Shapley's algorithm as myopic best response dynamics. We prove that, in general, the set of locally stable matchings strictly contains that of stable matchings, and that it is in fact NP-complete to determine if they are identical. We also show that the lattice structure of stable matchings is in general absent. Finally, we focus on myopic best response dynamics inspired by the Gale-Shapley algorithm. We study the efficiency loss due to the informational constraints, providing both lower and upper bounds.

This is joint work with Sergei Vassilvitskii, Ramesh Johari and David Liben-Nowell. Some of the results presented appeared in WINE 2009.

Temporal User Behavior in Yahoo Answers
Christina Aperjis

When searching for an answer to a question, people generally prefer to get both high quality and speedy answers. The fact that it usually takes longer to find a better answer creates a speed-accuracy tradeoff which is inherent in information seeking. A natural setting to study this tradeoff is Yahoo Answers, a community-driven question-and-answer site. We identify and quantify how question authors trade off the number of answers they receive against the cost of waiting. We find that users are willing to wait longer to obtain an additional answer when they have only received a small number of answers; this implies decreasing marginal returns in the amount of collected information. We also estimate the user's utility function from the data.

This is joint work with Bernardo A. Huberman and Fang Wu.

The Netflix Prize: Quest for $1,000,000
Yehuda Koren

The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. Its significant economic implications have given collaborative filtering techniques an important role at well-known e-tailers such as Amazon and Netflix. The field has enjoyed a surge of interest since October 2006, when the Netflix Prize competition commenced. Netflix released a dataset containing 100 million anonymous movie ratings and challenged the research community to develop algorithms that could beat the accuracy of its recommendation system, Cinematch. In this talk I will survey the competition together with some of the principles and algorithms that led us to winning the Grand Prize.
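A workhorse of the leading Netflix Prize approaches was matrix factorization trained by stochastic gradient descent. The sketch below is a bare-bones illustration on toy ratings, not the prize-winning system (which combined many such models with biases and temporal effects):

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.05, epochs=500, seed=0):
    # SGD on the squared error of the low-rank model r_ui ~ p_u . q_i,
    # the basic building block of matrix-factorization recommenders.
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)  # regularized gradient step
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

# Toy ratings: (user, item, rating); both users like item 0, dislike item 1.
ratings = [(0, 0, 5.0), (0, 1, 1.0), (1, 0, 4.0), (1, 1, 1.0)]
P, Q = factorize(ratings, n_users=2, n_items=2)
pred_liked = sum(P[0][f] * Q[0][f] for f in range(2))     # true rating 5.0
pred_disliked = sum(P[0][f] * Q[1][f] for f in range(2))  # true rating 1.0
```

After training, the learned factors reproduce the preference gap between the liked and disliked items, which is the signal a recommender ranks by.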

Data Jujitsu - the art of treating data as a product
DJ Patil

LinkedIn is the premier professional social network, with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is its unique data. While most organizations treat data as a service function, LinkedIn considers data a cornerstone of its product portfolio. This emphasis on data as a product led LinkedIn to be the first social network site to deploy a People You May Know service. Other data products LinkedIn has built include Who Viewed My Profile, People Who Viewed This Profile Also Viewed..., the backend standardization algorithms that power search, and matching algorithms. To rapidly develop these products, LinkedIn leverages a number of technologies, including open source, third-party solutions, and some we've had to invent along the way. In this presentation, we will talk about our methodology for turning data into user-facing products and some of the major challenges we see in the data space.

Query-log analysis for improving web search
Panayiotis Tsaparas

Search engines log the activity of users when posing queries and interacting with the results. Query click logs maintain information about what queries were asked, what results were shown, and which ones were clicked as a result of the query. This information offers valuable insight into the intent of the user and the need behind the query, and it has been the focus of intense research activity. In this talk, we will describe algorithms for query-log analysis. First, we will talk about an algorithm that uses query logs to generate advertising keywords and understand the category of a query. Time permitting, we will then describe an algorithm for generating training data for learning a ranking function. In both cases we model the query log as a graph, and we formulate our problem as an optimization problem over these graphs.

Stochastic Submodular Optimization
Arash Asadpour

We study the stochastic submodular maximization problem under cardinality and matroid constraints. Our model can capture the effect of uncertainty in problems such as cascade effects in social networks, capital budgeting, and sensor placement. We study non-adaptive and adaptive policies and give "optimal" constant-factor approximation algorithms for both cases. We also show that the adaptivity gap in this setting is 1.59.
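
As a point of reference for the deterministic case, the classic greedy algorithm for monotone submodular maximization under a cardinality constraint can be sketched as below; the coverage instance is our own illustration, not from the talk, and greedy is known to give a (1 - 1/e)-approximation in this deterministic setting.

```python
def greedy(ground, k, f):
    """Pick k elements, each time adding the one with the largest
    marginal gain under the submodular objective f."""
    chosen = set()
    for _ in range(k):
        best = max((x for x in ground if x not in chosen),
                   key=lambda x: f(chosen | {x}) - f(chosen))
        chosen.add(best)
    return chosen

# Example objective: maximum coverage, a monotone submodular function
sets = {'a': {1, 2, 3}, 'b': {3, 4}, 'c': {4, 5, 6}}
def cover(S):
    return len(set().union(*(sets[s] for s in S))) if S else 0
```

The talk's adaptive policies additionally observe random realizations as elements are picked; the 1.59 adaptivity gap bounds how much that observation can help.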

Compressibility of Behavioral Graphs
Ravi Kumar

Graphs resulting from human behavior (the web graph, social networks, etc.) have hitherto been viewed as a monolithic class with similar characteristics. Our focus here is on the compressibility of such graphs. It has been empirically shown that Web graphs can be compressed down to three bits of storage per edge. In light of this, we address two basic questions: are social networks as compressible as Web graphs and are there tractable and realistic models that yield highly compressible graphs?

De-anonymizing Social Networks
Arvind Narayanan

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.

Arvind Narayanan is a post-doctoral researcher working with Dan Boneh. He recently finished his Ph.D. at the University of Texas at Austin, advised by Vitaly Shmatikov. His research is on the privacy and anonymity issues involved in publishing large-scale datasets about people. His thesis, in a sentence, is that the level of anonymity that society expects -- and companies claim to provide -- in published databases is fundamentally unrealizable. He blogs about his anonymity-breaking efforts and other research at

Eliciting Information on the Distribution of Future Outcomes
Nicolas Lambert

This paper studies the problem of inducing a presumably knowledgeable expert to reveal true information regarding the probability distribution of a future random outcome. The information being considered is quite general, and includes in particular statistics of the probability distribution (such as the mean, median, or variance) and categorical information (such as the most correlated pair of variables). I examine two types of incentive schemes: those that reward the expert for being truthful and, for the case of numerical information, those that reward the expert increasingly with the accuracy of the prediction. For both cases, I establish simple criteria to determine the information that can be elicited with such schemes, and offer a complete characterization of the associated fee schedules.

Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom

Website traffic varies through time in consistent and predictable ways, with highest traffic in the middle of the day. When providing media content to visitors, it is important to present repeat visitors with new content so that they keep coming back. In this paper we present an algorithm to balance the need to keep a website fresh with new content with the desire to present the best content to the most visitors at times of peak traffic. We formulate this as the media scheduling problem, where we attempt to maximize total clicks, given the overall traffic pattern and the time-varying clickthrough rates of available media content. We present an efficient algorithm to perform this scheduling under certain conditions and apply this algorithm to real data obtained from server logs, showing evidence of significant improvements in traffic from our algorithmic schedules. Finally, we analyze the click data, presenting models for why and how the clickthrough rate for new content declines as it ages.
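
To make the objective concrete, here is a toy sketch (ours, not the paper's algorithm): in each time slot, show the item whose current expected clicks are highest, where an item's clickthrough rate decays each time it is featured. This myopic rule ignores the paper's key tension of saving fresh content for peak-traffic hours.

```python
def schedule(traffic, ctr0, decay):
    """Greedy schedule: traffic[t] visitors arrive in slot t; item i
    starts at clickthrough rate ctr0[i], which decays by a constant
    factor each time i is shown (a stand-in for aging content)."""
    age = {item: 0 for item in ctr0}
    plan, total = [], 0.0
    for t in traffic:
        best = max(ctr0, key=lambda i: ctr0[i] * decay ** age[i])
        plan.append(best)
        total += t * ctr0[best] * decay ** age[best]
        age[best] += 1
    return plan, total
```

With made-up numbers, schedule([100, 100, 100], {'x': 0.10, 'y': 0.08}, 0.5) alternates between the two items as their clickthrough rates decay.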

Behavioral Experiments in Strategic Networks
Michael Kearns

For four years now, we have been conducting "medium-scale" experiments in how human subjects behave in strategic and economic settings mediated by an underlying network structure. We have explored a wide range of networks inspired by generative models from the literature, and a diverse set of collective strategic problems, including biased voting, graph coloring, consensus, and networked trading. These experiments have yielded a wealth of both specific findings and emerging general themes about how populations of human subjects interact in strategic networks. I will review these findings and themes, with an emphasis on the many more questions they raise than answer.

How Social Network Structure affects Diffusion and Learning
Matthew Jackson

We examine how diffusion and learning processes operating through social networks are affected by homophily (the tendency of individuals to associate with others similar to themselves) and other network characteristics. Homophily has no effect in pure diffusion processes where only shortest paths matter; only connection density matters. In contrast, homophily substantially slows learning based on repeated averaging of neighbors' information, while connection density has no effect.
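
The repeated-averaging learning process referred to above is the DeGroot-style update in which each agent replaces its belief with the average of its neighbors' beliefs; the toy network in the test is our own illustration.

```python
def average_step(adj, belief):
    """One round of naive learning: each node adopts the average of
    its neighbors' current beliefs (adj: node -> list of neighbors)."""
    return {u: sum(belief[v] for v in adj[u]) / len(adj[u]) for u in adj}
```

Iterating this map drives beliefs toward consensus; the talk's result is that homophily slows this convergence, while connection density does not affect it.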

How to Design Profitable Auctions
Paul Valiant

This talk is a high-level overview of results from two years of work with Silvio Micali and Jing Chen on a question that may best be phrased as: what is the best way to sell many items to many players? The standard formalizations of this question were fleshed out in the 1960s and led to answers that are unsatisfactory in practice; today's eBay still sells items by an auction method known to the ancient Greeks. In this talk we present new formulations of the question, elicited by fundamental notions from computer science: competitive analysis, collusion, and "knowledge". We present several concrete (and near-optimal) auctions in these models, along with many open questions.

Sort me if you can (or, Algorithms on Dynamic Data)
Mohammad Mahdian

We formulate and study a new computational model for dynamic data. In this model the data changes gradually, and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it has only limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms whose performance is close to optimal, both in expectation and with high probability. If time permits, we will discuss new results on graph algorithms in our framework.

This is based on joint work with Aris Anagnostopoulos, Ravi Kumar, and Eli Upfal.
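
A minimal illustration of the model (our sketch, not the paper's algorithm): the algorithm is allowed one comparison probe per step, here spent on a random adjacent pair, which it swaps if out of order.

```python
import random

def maintain(estimate, compare, steps, rng=random):
    """Each step: probe one random adjacent pair; swap if out of order."""
    for _ in range(steps):
        i = rng.randrange(len(estimate) - 1)
        if compare(estimate[i], estimate[i + 1]) > 0:   # the one probe
            estimate[i], estimate[i + 1] = estimate[i + 1], estimate[i]
    return estimate
```

Each swap removes exactly one inversion, so the estimate never gets worse; when the true order drifts slowly, such cheap local repairs can keep the maintained order approximately correct.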

Discovery & Emergence
Abdur Chowdhury

Often, as computer scientists, we focus on faster algorithms, such as approximating solutions in linear time over large data sets. Rather than focus on algorithms, in this talk we ask: what possibilities emerge from surfacing the world's conversations to others? Specifically, we explore Twitter Trends as a discovery tool and show how awareness of the thoughts of others can cause new behaviors to emerge.

Dr. Abdur Chowdhury serves as Twitter's Chief Scientist. Prior to that, he co-founded Summize, a real-time search engine sold to Twitter in 2008. Dr. Chowdhury has held positions at AOL as Chief Architect for Search, in Georgetown's Computer Science Department, and at the University of Maryland's Institute for Systems Research. His research interests lie in information retrieval, focusing on making information accessible.

Internet search advertising: Challenges in targeting and quality considerations
Mayur Datar

In this informal talk, I will touch upon two topics related to search advertising:

Targeting: How do you match advertiser keywords to user queries? What are the underlying challenges in bridging this semantic gap?

Quality: How do you measure the quality of search ads? Is it in the best interest of all parties involved (users, advertisers, and the search engine) to always show ads for queries? What are the costs and benefits of showing ads, and how do you decide when it is appropriate to show them?

Look forward to more questions than answers! Hopefully these will lead to good research topics.

Mayur Datar works as a Research Scientist at Google Inc. His research interests are in data mining, algorithms, databases, and computer science theory. Prior to joining Google, Mayur obtained his doctorate in computer science from Stanford University and a Bachelor of Technology degree from I.I.T. Bombay. He was awarded the President of India Gold Medal for being the most outstanding student of his graduating batch at I.I.T. Bombay. He has published several papers in renowned conferences such as SIGMOD, VLDB, KDD, FOCS, SODA, and WWW.

Affiliation Networks
D. Sivakumar

In the last decade, structural properties of several naturally arising networks (the Internet, social networks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work, Leskovec, Kleinberg, and Faloutsos identify two new and surprising properties of the evolution of many real-world networks: densification (the ratio of edges to vertices grows over time), and shrinking diameter (the diameter reduces over time to a constant). These properties run counter to conventional wisdom, and are certainly inconsistent with graph models prior to their work.

In this work, we present the first simple, realistic, and mathematically tractable generative model that intrinsically explains all the well-known properties of social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the social sciences, primarily the groundbreaking work of Breiger (1973) on bipartite models of social networks that capture the affiliation of agents to societies.

We also present algorithms that harness the structural consequences of our model. Specifically, we show how to overcome the bottleneck of densification in computing shortest paths between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished subset of vertices. This is a rare example of an algorithmic benefit derived from a realistic graph model.

The talk will review the history of relevant random graph models, and attempt to illustrate the co-evolution of theoretical modeling and empirical insights in this fascinating slice of graph theory.

The talk reports on work done jointly with Silvio Lattanzi of Sapienza University, Rome, Italy; the paper is to appear in STOC 2009.

Using Bandit Models in Search Engines
Sandeep Pandey

Web search engines perform two basic tasks: they acquire Web pages via crawling, and they present lists of pages and advertisements in response to user search queries. Given the gigantic size of the Web, these tasks can be viewed as constrained optimization problems. On the acquisition side the constraint is on crawling resources, while on the presentation side the constraint is on user attention: users cannot be expected to sift through large amounts of content. Furthermore, some parameters needed to solve these optimization problems may or may not be known, leading to offline and online versions of the problems. In this talk we will look at these optimization problems.

In particular, we will focus on the online presentation of advertisements where the unknown parameters are the ad quality values. I will present a method of simultaneously estimating unknown parameters and exploiting current estimates. Our method extends and adapts well-known "Bandit" models from Machine Learning.
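
As a baseline for the "estimate while exploiting" idea, the textbook UCB1 rule can be sketched as follows; the talk's method extends such bandit models, and the click probabilities below are made up.

```python
import math
import random

def ucb1_choose(clicks, shows):
    """Pick the ad maximizing (empirical CTR + exploration bonus)."""
    total = sum(shows.values())
    def score(ad):
        if shows[ad] == 0:
            return float('inf')            # try every ad at least once
        return clicks[ad] / shows[ad] + math.sqrt(
            2 * math.log(total) / shows[ad])
    return max(shows, key=score)

# Tiny simulation with hypothetical ad qualities
random.seed(0)
true_ctr = {'a': 0.50, 'b': 0.10}
clicks = {ad: 0 for ad in true_ctr}
shows = {ad: 0 for ad in true_ctr}
for _ in range(2000):
    ad = ucb1_choose(clicks, shows)
    shows[ad] += 1
    clicks[ad] += random.random() < true_ctr[ad]
```

The exploration bonus shrinks as an ad accumulates impressions, so the rule converges to showing the genuinely better ad while still occasionally re-checking the others.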

Sandeep Pandey is a Research Scientist in Search and Advertising group at Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon and his B.Tech. from IIT Bombay. His research interests include Web crawling and advertising. He also likes to work on online learning problems such as the multi-armed bandit problem. In his spare time Sandeep enjoys watching cricket and playing tennis.

Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy

Adchemy is leading a revolution in online performance marketing, both as a top-25 U.S. online advertiser and through the development of the Adchemy Digital Marketing Platform, an integrated suite of products that allows online marketers to maximize their customers' end-to-end experience. I will discuss some of the research challenges that we are tackling in developing our technology solutions, including problems related to online matching, multi-armed bandits, and the optimizer's curse.

Richard has been VP Research at Adchemy since its inception in late 2004. He has over nine years of experience in the online advertising industry. He received his PhD from the Operations Research department at Stanford, where he focused on airline revenue management. He spent several years in the consulting world pioneering the development and implementation of real option valuation methodologies for asset valuation.

Scaling Social Data
Robert Johnson
Director of Engineering, Facebook

Facebook serves billions of pages a day based on hundreds of terabytes of dynamic data, which presents many scaling challenges for our storage and caching systems. In particular the social nature of the data gives it some interesting properties - it is not easily clustered or partitioned by users or applications, it updates frequently, and many queries require massive network fan-in. I'll be discussing the problems that arise in scaling such a system and some of the solutions we've used to get to our current scale, as well as our plans for infrastructure to solve these problems at even larger scale and in a more general way for future web 2.0 applications with similar properties.

Robert Johnson is Director of Engineering at Facebook, where he leads the software development efforts to cost-effectively scale Facebook's infrastructure and optimize performance for its many millions of users. During his time with the company, the number of users has expanded by more than twenty-fold and Facebook now handles billions of page views a day. Robert was previously at ActiveVideo Networks, where he led the distributed systems and set-top software development teams. He has worked in a wide variety of engineering roles, from robotics to embedded systems to web software. He received a B.S. in Engineering and Applied Science from Caltech.

Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits

In this talk we address two questions: (1) How can structured data be used in web search? (2) How can structured data be gathered? For the first question, we identify valuable classes of data, present query classes that can benefit from structured data, and describe an architecture that combines keyword search with structured search. For the second question, we present Data Cloud, an ecosystem of data publishers, a search engine (the data cloud), and data consumers. We connect the Data Cloud strategy to a classic notion in economics: network effects in two-sided markets. At the end of the talk, an early demo implementation will be presented.

Yury Lifshits obtained his PhD from the Steklov Institute of Mathematics at St. Petersburg (2007). He spent a year as a postdoc at Caltech before joining Yahoo! Research in 2008. He won two gold medals at the International Mathematical Olympiad, received the Best Paper Award in the Application Track of CSR'07, and received the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is the maintainer of "The Homepage of Nearest Neighbors":

Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan

While the expanding universe of products available via Internet commerce provides consumers with valuable options, sifting through the numerous alternatives to identify desirable choices can be challenging. Collaborative filtering systems aid this process by recommending to users products desired by similar individuals. Because purchase decisions are influenced by these systems, however, they have become targets of manipulation by unscrupulous vendors. In this talk, we will discuss our findings on the robustness of collaborative filtering systems to manipulation. We provide theoretical and empirical results demonstrating that while nearest-neighbor algorithms, which are widely used in commercial systems, can be highly susceptible to manipulation, two classes of collaborative filtering algorithms, which we refer to as linear and asymptotically linear, are relatively robust. In particular, we find that as a user rates an increasing number of products, the average accuracy of predictions made by these algorithms becomes insensitive to manipulated data. Our results provide guidance for the design of future collaborative filtering systems.

Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.

SnapTell has created technology that allows a mobile phone user to snap a picture of a product and instantly receive reviews, prices, and information. The service has been scaled to support more than ten million products, including all books, DVDs, CDs, video games, and other product categories. We present an outline of the algorithms used to power the service, emphasizing scaling and locality-sensitive indexing. SnapTell's popular iPhone application has been downloaded by over half a million users in the last few weeks.

The technology can also be used to make print advertisements in a magazine, newspaper, poster, or billboard interactive without adding barcodes or otherwise altering the content. Consumers snap a picture of the object and send it to a shortcode to enter into a contest, access a mobile website, or download ringtones, wallpaper, video clips, or other information. SnapTell is the leading vendor in this space and offers a solution with unprecedented scale and robustness.

We present a synopsis of the image matching algorithms used to power the service and discuss potential research topics in the space.

Content-Driven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)

Open collaboration is one of the most effective ways to create content and aggregate information, as witnessed by Wikipedia. In this talk, we propose a reputation system for the authors of collaborative content, and a trust system for the aggregated information.

The goal of the reputation system is to provide an incentive for collaboration, and to provide information about the quality of authors. We propose a content-driven notion of reputation, where authors of long-lasting contributions gain reputation, while authors of contributions that are not preserved lose reputation. We show that, on the Wikipedia, the content-driven reputation of authors is a good predictor of the quality of their future contributions. We then describe a trust system for content, where the trust in a portion of content is computed according to the reputations of the authors who created and revised it. Again, we show that on the Wikipedia content trust is a good predictor of future content longevity. Finally, we describe how the reputation and trust systems can be made robust to Sybil attacks, in which people use multiple identities to try to boost their reputation and raise the trust of content of their choice.
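
The core content-driven update can be caricatured as follows; the formula is our own toy illustration, not WikiTrust's actual algorithm.

```python
def update_reputation(rep, author, survived_fraction, lr=0.1):
    """survived_fraction: share of the author's contribution still
    present after subsequent revisions (1.0 = fully preserved).
    Preserved text raises reputation; reverted text lowers it."""
    reward = survived_fraction - 0.5
    rep[author] = max(0.0, rep.get(author, 1.0) + lr * reward)
    return rep
```

Because reputation moves only when later editors keep or discard one's text, it is costly to inflate with Sybil identities that contribute nothing that survives.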

The reputation and trust systems have been implemented in the WikiTrust system.

Luca de Alfaro received his Ph.D. degree from Stanford University, where he worked on system specification and verification under the supervision of Zohar Manna. After graduation, he was for three years at UC Berkeley as a postdoctoral researcher, working with Thomas A. Henzinger on verification and on game theory. In 2001, Luca de Alfaro joined the faculty of UC Santa Cruz. His research interests involve system specification and verification, embedded software, game theory, incentive systems for on-line collaboration, and trust and reputation on the web. Luca de Alfaro is currently at Google as a visiting scientist, on leave from UC Santa Cruz.

Luca de Alfaro has long been an enthusiastic user of on-line collaborative tools, including wikis. One evening in 2005, after working on the wiki he uses to facilitate information sharing among members of his research group, he decided to start a wiki on cooking, which is one of his passions. The wiki was soon peppered with spam and less-than-useful contributions. Rather than cleaning up, a most tedious job, Luca de Alfaro shut off the wiki temporarily and started pondering how to provide incentives for collaboration and compute quality metrics for content. As he found the job of rating authors or contributions by hand most tedious, he was interested in methods for computing reputation and quality metrics purely from content analysis. The research led to the WikiTrust system. The wiki on cooking has been brought back to life, even though it receives little traffic. You are welcome to register, add recipes, and see WikiTrust in action.

Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis

Web search queries capture the information need of search engine users. Search engines store these queries in their logs and analyze them to guide their search results.

In this work, we argue that not only the search engine but also web users can benefit from the data stored in these logs. We first show how clickthrough logs can be collected in a distributed fashion using the http referer field in web server access logs. We then perform a set of experiments to study the information value of search engine queries when treated as "tags" or "labels" for the web pages that appear as results and that the user actually clicks on. We ask how much extra information these query tags provide for web pages by comparing them to tags from the bookmarking site and to the pagetext. We find that, for a large fraction of the Web, query tags provide substantially many tags (on average 250 per URL), many of them new (on average 125 tags per URL are not present in the pagetext).

Joint work with Hector Garcia-Molina and Jawed Karim
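
The basic construction of query tags can be sketched as follows; the log format is an assumption of ours.

```python
from collections import defaultdict

def query_tags(click_log):
    """click_log: iterable of (query, clicked_url) pairs; the words of
    each query become tags for the URL the user clicked."""
    tags = defaultdict(set)
    for query, url in click_log:
        tags[url].update(query.lower().split())
    return tags
```

In the distributed variant the paper describes, each web server can recover the (query, clicked_url) pairs for its own pages from the http referer field in its access logs, with no cooperation from the search engine.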

Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)

Baynote delivers on-demand recommendation technology and social search for over 200 websites, including Motorola, NASA, Intuit, and Expedia. By tapping into the wisdom of the online community and automatically displaying the best content and products, Baynote's Collective Intelligence Platform (CIP) improves the user experience while increasing online revenue, leads and impressions. In this talk, I will present Baynote's unified approach to content targeting and discuss some of the opportunities and challenges in building next-generation recommendation systems.

Scott Brave is a co-founder and CTO at Baynote, Inc. Prior to Baynote, he was a postdoctoral scholar in the Department of Communication at Stanford University where he served as lab manager for the CHIMe (Communication between Humans and Interactive Media) Lab. Scott received his Ph.D. in Human-Computer Interaction (Communication Dept.), and B.S. in Computer Systems Engineering (AI focus) from Stanford University, and his M.S. from the MIT Media Lab. Scott is also a former Associate Editor of the "International Journal of Human-Computer Studies" and co-author of Wired for speech: How voice activates and advances the human-computer relationship.

Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)

Online social networks have become major and driving phenomena on the web. In this talk we will address key modeling and algorithmic questions related to large online social networks. From the modeling perspective, we raise the question of whether there is a generative model for network evolution. The availability of time-stamped data makes it possible to study this question at an extremely fine granularity. We exhibit a simple, natural model that leads to synthetic networks with properties similar to the online ones. From an algorithmic viewpoint, we focus on data mining challenges posed by the magnitude of data in these networks. In particular, we examine topics related to influence and correlation in user activities in such networks.

Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. He obtained his PhD in Computer Science from Cornell University in December 1997. His primary interests are web algorithms, algorithms for large data sets, and theory of computation.

Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)

The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful communities in large real-world networks, we have performed a large-scale analysis of a wide range of social and information networks. Our main methodology uses local spectral methods and involves computing isoperimetric properties of the networks at various size scales -- a novel application of ideas from statistics and scientific computation to internet data analysis.

Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, while at larger size scales the best possible communities gradually "blend in" with the rest of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, it is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.
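
The isoperimetric quantity at the heart of this methodology is conductance, sketched below for an undirected graph given as adjacency lists; lower conductance means a more community-like set. The toy graph in the example is ours.

```python
def conductance(adj, S):
    """Edges leaving S, divided by the smaller side's edge volume.
    adj: dict mapping node -> list of neighbors (undirected graph)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    return cut / min(vol, vol_rest)
```

Plotting the best achievable conductance as a function of community size yields the profile whose upward trend at larger size scales is the striking finding described above.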

Michael Mahoney is currently a research scientist in the math department at Stanford University. He is currently working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. His research interests also include randomized algorithms for matrices and regression problems, applications to DNA single nucleotide polymorphism data, and large-scale statistical data analysis more generally. He has been a faculty member at Yale University and a researcher at Yahoo.

An Incentive-Based Architecture for Social Recommendations
Kostas Kollias

Recommender systems and social networking services are important Internet monetization tools and their popularity greatly facilitates web-based commercial activity. In this talk we will focus on how these tools can be coupled in order to form an architecture that incentivizes users to provide honest recommendations in a social network. The architecture consists of one distinct reputation system for each user, a recommendation language which allows users to place their reviews, a product ranking algorithm, and a revenue sharing mechanism. We will discuss the details of the architecture, the properties that allow users to profit by placing honest recommendations and how fast these recommendations lead to an equilibrium state.

Joint work with Rajat Bhattacharjee and Ashish Goel

Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan

Web search is an indispensable resource for users searching for information on the Web. Web search is also an important service for publishers and advertisers to present their content to users. Thus, user satisfaction is key and must be quantified. In this talk, our goal is to give a practical review of web search metrics from a user-satisfaction point of view. This viewpoint is intuitive enough for a typical user to express his or her web search experience with it. The metrics we plan to cover fall into the following areas: relevance, coverage, comprehensiveness, diversity, discovery freshness, content freshness, and presentation. We will also describe how these metrics can be mapped to proxy metrics for the stages of a generic search engine pipeline. Our hope is that practitioners can apply these metrics readily and that researchers can find problems to work on, especially in formalizing and refining metrics.

This short talk is based on a half-day tutorial that the presenter, together with Kostas Tsioutsiouliklis and Emre Velipasaoglu, will present at the WWW'09 conference.

Ali Dasdan is a director managing the metrics, analysis, and algorithms group of Yahoo! Web Search. His group is responsible for defining and implementing white-box metrics for production web search systems, production web search data, and the Web. His group is also responsible for monitoring the metrics and diagnosing the causes of detected anomalies. His research interests have been in web search and advertising since 2006. He obtained his PhD in Computer Science from the University of Illinois at Urbana-Champaign in 1999.

Trusts among Users of Online Communities -- an Epinions Case Study
Hady Lauw

Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very sparse which is often the case in an online community. In this paper, we therefore propose a classification approach to address the trust prediction problem. We develop a taxonomy to obtain an extensive set of relevant features derived from user attributes and user interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical results show that the trust among users can be effectively predicted using pre-trained classifiers.


Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSF-equivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.

Top-k Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: one has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches.

We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain ``instance optimal,'' a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.

Joint work with Ravi Kumar, Kunal Punera and Torsten Suel
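The classic TA algorithm that the talk generalizes can be illustrated in miniature. This is a minimal sketch of plain TA (without the pre-aggregated intersection lists that are the talk's contribution), with random access simulated by dictionaries:

```python
import heapq

def threshold_algorithm(lists, k):
    """Fagin's Threshold Algorithm (TA) for top-k sum aggregation.

    `lists` is a list of m rankings, each a list of (object, score)
    pairs sorted by descending score.  Sorted access proceeds in
    round-robin; random access is simulated with per-list dicts.
    """
    score_of = [dict(lst) for lst in lists]  # random-access structures
    top = {}                                 # object -> aggregate score
    depth = 0
    max_depth = max(len(lst) for lst in lists)
    while depth < max_depth:
        # One round of sorted access across all lists.
        for lst in lists:
            if depth < len(lst):
                obj, _ = lst[depth]
                if obj not in top:
                    # Random-access every list for this object's score.
                    top[obj] = sum(d.get(obj, 0.0) for d in score_of)
        depth += 1
        # Threshold: best possible aggregate of any still-unseen object.
        threshold = sum(lst[min(depth, len(lst)) - 1][1] for lst in lists)
        best_k = heapq.nlargest(k, top.values())
        if len(best_k) >= k and best_k[-1] >= threshold:
            break  # instance-optimal early termination
    return heapq.nlargest(k, top.items(), key=lambda kv: kv[1])
```

The early-termination test is exactly what makes TA instance optimal: no unseen object can beat the current k-th best once the threshold falls below it.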

Utility-Maximizing Privacy Mechanisms
Mukund Sundararajan

A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Informally, publishing fully accurate information maximizes utility while minimizing privacy, whereas publishing random noise accomplishes the opposite. Formally, privacy can be rigorously quantified using the framework of {\em differential privacy}, which requires that a mechanism's output distribution is nearly the same (in a strong sense) whether a given database row is included or excluded. Thus far, (dis)utility has typically been quantified via the expected error of a (randomly perturbed) response to a query.

We pursue much stronger and more general utility guarantees. Just as differential privacy guarantees protection against every potential attacker, independent of its side information, we seek a mechanism that guarantees near-optimal utility to every potential user, independent of its side information. Formally, we model the side information of a potential user as a prior distribution over query results. An interaction between such a user and a mechanism induces a posterior distribution -- the utility of the mechanism for this user is defined as the accuracy of this posterior as quantified via a user-specific loss function. A differentially private mechanism $M$ is (near-)optimal for a given user $u$ if $u$ derives (almost) as much utility from $M$ as from any other differentially private mechanism.

We prove that for each fixed counting query and differential privacy level, there is a {\em geometric mechanism} $M^*$ --- a discrete variant of the well-studied Laplace mechanism --- that is {\em simultaneously utility-maximizing} for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: {\em every} potential user $u$, no matter what its side information and loss function, derives as much utility from $M^*$ as from interacting with a differentially private mechanism $M_u$ that is optimally tailored to $u$. The first part of our proof characterizes the utility-maximizing differentially private mechanism for a fixed but arbitrary user in terms of a basic feasible solution to a linear program with constraints that encode differential privacy. The second part shows that all of the relevant vertices of this polytope (ranging over all possible users) are derivable from the geometric mechanism via suitable remappings of its range.

Joint work with Arpita Ghosh and Tim Roughgarden.
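The geometric mechanism itself is easy to sample: the two-sided geometric distribution with parameter alpha = exp(-epsilon) is the difference of two i.i.d. one-sided geometric variables. A minimal sketch (an illustration of the mechanism, not the paper's code):

```python
import math
import random

def geometric_mechanism(true_count, epsilon, rng=random):
    """Release an integer count with epsilon-differential privacy by
    adding two-sided geometric noise: Pr[Z = z] is proportional to
    alpha^|z| with alpha = exp(-epsilon)."""
    alpha = math.exp(-epsilon)
    p = 1.0 - alpha  # success probability of each one-sided geometric

    def geometric():
        # Number of failures before the first success.
        n = 0
        while rng.random() >= p:
            n += 1
        return n

    # The difference of two i.i.d. geometrics has the two-sided law.
    return true_count + geometric() - geometric()
```

Because the noise is integer-valued, the output stays in the natural range of a counting query, which is what allows the remapping argument over all user-specific posteriors.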

Online Advertising
Hamid Nazerzadeh

Online advertising is a multi-billion dollar growing market with very interesting properties. I briefly explain some of these properties, and talk about ad allocation problems in the context of sponsored search and display advertising.


  • Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising, with Amin Saberi, and Rakesh Vohra.
    Proceedings of the 17th International World Wide Web Conference (WWW), 179-188, 2008.
  • A Combinatorial Allocation Mechanism with Penalties For Banner Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
    Proceedings of the 17th International World Wide Web Conference (WWW), 169-178, 2008.
  • Advertisement Allocation for Generalized Second Pricing Schemes, with Ashish Goel, Mohammad Mahdian, and Amin Saberi
    Fourth Workshop on Ad Auctions, 2008
  • Allocating Online Advertisement Space with Unreliable Estimates, with Mohammad Mahdian and Amin Saberi.
    Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288-294, 2007.

From Bayesian to Worst-Case Optimal Auction Design
Tim Roughgarden

We define a general template for auction design that explicitly connects Bayesian optimal mechanism design, the dominant paradigm in economics, with worst-case analysis. In particular, we establish a general and principled way to identify appropriate performance benchmarks for prior-free optimal auction design. This framework enables, for the first time, meaningful competitive analysis for auction problems with allocation asymmetries, bidder asymmetries, and costly payments.

The above is joint work with Jason Hartline (from STOC 2008 and ongoing). Time permitting, we will also briefly discuss another recent application of Bayesian optimal auction design: rigorous efficiency-revenue approximation trade-offs (joint work with Shaddin Dughmi and Mukund Sundararajan).

Social Design
Yury Lifshits
, Caltech

Social Design is the art and science of setting and managing incentives in web applications. So far, machine learning is the dominant approach to all central algorithmic problems in the Web: search, advertising, recommendation systems. Social design can become an important counterpart to ML if we find ways to incentivize users to give us more and better input data, and even the answers themselves.
This talk is an outline of a new research field of Social Design. We start with several stories, including an Internet press conference of President Putin, a reputation crash in the largest Russian social network, and the Yahoo! SearchMonkey project. Then we attempt to extract, characterize and classify the commonly used patterns in social design. Finally, we set up the research agenda in the area with a focus on social aspects of semantic web and advertising technologies.
Bio: Yury Lifshits obtained his PhD from the Steklov Institute of Mathematics at St. Petersburg (2007). He is currently a postdoc at Caltech. He won two gold medals at International Mathematical Olympiads, received the Best Paper Award in the Application Track of CSR'07, and received the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is the maintainer of "The Homepage of Nearest Neighbors".

Inferring Links and Initiators
Evimaria Terzi

Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i, j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix we study the problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators. I will formally define this problem and propose an MCMC framework for estimating the links and the initiators given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect; instead of considering a single observation matrix M we consider a sequence of observation matrices M1, ..., Mt over time. Finally I will show that this problem is a generalization of several problems studied in the field of social-network analysis.
Bio: Evimaria Terzi is a Research Staff Member at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue University (USA) in 2002 and BSc from the University of Thessaloniki (Greece) in 2000.

Reputation Mechanisms for Online Markets
Christina Aperjis

In online marketplaces, potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller's reputation score.

Two-Stage Myopic Dynamics in Network Formation Games
Esteban Arcaute

We consider two-stage myopic dynamics for network formation games, where utility to a node is based on a function of the distance to all other nodes. Since our network formation game is a generalization of Myerson's announcement game, we use pairwise Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is unbounded (in the number of nodes in the network), our dynamics converge to a subset of pairwise Nash stable networks with a constant efficiency ratio. For some specific utility functions, we prove that our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.

Measuring and Modeling the Web
Ying Xu

The last couple of decades have witnessed phenomenal growth in the World Wide Web. The Web has become a ubiquitous channel for information sharing and dissemination. This talk describes several research contributions in an endeavor towards a better understanding of the Web.
One of the basic questions about the Web is its size, i.e., how many webpages there are on the Web. A closely related problem is the size of search engine indexes, as a search engine's index defines an important subset of the Web that is most easily accessible to users. In the last ten years both the scientific literature and the popular press have dealt at length with methodologies and estimates for the sizes of the Web and of public search engines. In the first part of the talk, we propose algorithms for measuring the size of the Web, covering both empirical and theoretical results.
The Web is so large that even estimating its size is a hard problem. To better understand the Web, people study random web graph models: stochastic processes that generate random graphs which with high probability resemble the Web graph. Random web graph models are useful theoretical tools for providing insight into the evolution of the Web graph, predicting future behavior, and generating synthetic data sets for testing and simulation. In the second half of the talk, we briefly present several results in analyzing web graph models and applying them to real-world applications.

Should ad networks bother fighting click fraud?
Bobji Mungamuru

Suppose an ad network decides that a given click-through is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that click-through. Therefore, arguably, the ad network is "taking money out of its own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".

Reputation systems
Robbie Yan

Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim of providing such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.

Optimal Marketing Strategies over Social Networks
Mukund Sundararajan

We study revenue maximization in social networks. We identify a family of strategies called 'influence and exploit' strategies that are easy to implement, easy to optimize over, and approximately optimal.
Based on a paper that will appear in WWW 08. Joint work with Jason Hartline (Northwestern University) and Vahab Mirrokni (MSR Redmond).

Ratio Index for Budgeted Learning, with Applications
Brad Null

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. We present an index for this problem, which we call the ratio index, which also guarantees a constant factor approximation. Index based policies have the advantage that a single number (i.e. the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem.
The ratio index has several interesting structural properties. First, it can be computed in strongly polynomial time; the techniques we develop may be useful for deriving strongly polynomial algorithms for other Markov Decision Problems. Second, with the appropriate discount factor, the Gittins index and the ratio index are constant factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, the ratio index can be used to obtain a "discount oblivious" solution to the multi-armed bandit problem, which in turn gives constant factor approximations to a wide class of problems that we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna

Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou

Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.
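The vector-style schemes can be illustrated in miniature: represent each web graph snapshot by a sparse feature vector (for example, per-host page or edge counts) and compare consecutive snapshots with cosine similarity, flagging a sharp drop as a potential anomaly. The feature choice here is an assumption for illustration, not the paper's exact scheme:

```python
import math

def cosine_similarity(features_a, features_b):
    """Cosine similarity of two sparse feature vectors (dicts mapping
    feature name to weight), e.g. per-host page counts of two
    consecutive web graph snapshots."""
    dot = sum(v * features_b.get(k, 0.0) for k, v in features_a.items())
    norm_a = math.sqrt(sum(v * v for v in features_a.values()))
    norm_b = math.sqrt(sum(v * v for v in features_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A monitoring pipeline would compute this score for each pair of consecutive snapshots and alert when it falls below a calibrated baseline.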

Brian Babcock

I'll talk about a number of computational problems that arise in online advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one manage a portfolio of millions of keywords with minimal human involvement? Challenges include determining how much to bid for each keyword and what ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will lead to the greatest number of purchases or clicks? The problem is made more difficult by the fact that different classes of users respond well to different marketing messages, so it's best to use different content for different user segments.
* Consider an auction where the items to be sold are not known in advance, but rather arrive online, and buyers are willing to pay different amounts for each item (revealed as the items arrive). Each buyer has a maximum limit on the total number of items that they will buy. How should the seller assign items to buyers so as to maximize the seller's revenue?

An Axiomatic Approach to Personalized Ranking Systems
Alon Altman

Personalized ranking systems and trust systems are an essential tool for collaboration in a multi-agent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of well-known and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.

Ad Auctions with Reserved Price
Arash Asadpour

I will talk about ad auctions in which the auctioneer has a relevant ad of her own that she can show in one of the slots if she wishes. The setting is to some extent similar to ad auctions with a reserve price. I will explain a randomized auction that is truthful in expectation for both the bidders and the auctioneer; moreover, our mechanism is individually rational and budget balanced. I will discuss the revenue of this auction and also talk about some other settings in which the technique used to design the auction can be helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham (CMU).

Boosting the area under ROC curve
Phil Long
, Google

We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. This boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
(This is joint work with Rocco Servedio.)
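For concreteness, the area under the ROC curve equals the probability that a uniformly random positive example is scored above a uniformly random negative one (ties counting one half), which is why random guessing achieves 1/2. A direct sketch of this pairwise definition:

```python
def auc(scores, labels):
    """Area under the ROC curve, computed directly as the probability
    that a random positive example outranks a random negative one
    (ties count as 1/2).  Labels are 1 (positive) or 0 (negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A weak ranker in the talk's sense is simply one whose `auc` is bounded slightly above 0.5 on every distribution it is given.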

Random input models for Adwords
Aranyak Mehta
, Google

I will talk about a new result on Random Input models for the Adwords Problem: What are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst case input. How does the natural "highest-bidder-wins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for Ad auctions.
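The "highest-bidder-wins" greedy rule in question can be sketched as follows, with each bid capped at the advertiser's remaining budget; the data layout is a hypothetical simplification for illustration:

```python
def greedy_adwords(queries, budgets, bids):
    """Greedy allocation for the Adwords problem: each arriving query
    goes to the advertiser with the highest effective bid, where the
    effective bid is capped at the advertiser's remaining budget.

    `bids[a][q]` is advertiser a's bid on query type q.
    Returns total revenue and the per-query allocation."""
    remaining = dict(budgets)
    revenue = 0.0
    allocation = []
    for q in queries:
        best, best_bid = None, 0.0
        for a in remaining:
            effective = min(bids[a].get(q, 0.0), remaining[a])
            if effective > best_bid:
                best, best_bid = a, effective
        allocation.append(best)  # None if nobody bids
        if best is not None:
            remaining[best] -= best_bid
            revenue += best_bid
    return revenue, allocation
```

In the worst case this greedy rule only guarantees half the optimal revenue; the interest of the random-input models is that its performance (and that of the classic online-matching generalizations) improves when the query stream is stochastic.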

Who do you know: The Social Network Revolution
Orkut Buyukkokten
, Google

Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence on our lives, and it is now easier to cross paths than ever as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of Orkut, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users, and reflect on the lessons we learned.

Deterministic Decentralized Search in Random Graphs
Esteban Arcaute

We study a general framework for decentralized search in random graphs. Our main focus is on deterministic memoryless search algorithms that use only local information to reach their destination in a bounded number of steps in expectation. This class includes (with small modifications) the search algorithms used in Kleinberg's pioneering work on long-range percolation graphs and hierarchical network models. We give a characterization of searchable graphs in this model, and use this characterization to prove a monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David Liben-Nowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying Xu.

Interdomain Routing and Games
Michael Schapira
, Hebrew University of Jerusalem

We present a game-theoretic model that captures many of the intricacies of interdomain routing in today's Internet. In this model, the strategic agents are source nodes located on a network, who aim to send traffic to a unique destination node. The model enables complex dynamic interactions between the agents (asynchronous, sequential, and based on partial information).
We provide realistic conditions that guarantee that executing the Border Gateway Protocol (BGP), the only protocol currently used for interdomain routing, is in the best interest of each of the players. Moreover, we show that even coalitions of players of any size cannot improve their routing outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design, these results do not require any monetary transfers (to or by the agents).
Based on joint work with Hagay Levin and Aviv Zohar.

The Anatomy of Clickbot.A
Neil Daswani
, Google

In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet that attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using an HTTP-based botmaster. Google identified all clicks on its ads exhibiting Clickbot.A-like patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.

Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas

We will first go over a few network worms and contemporary algorithms for detecting them. We will then cover some recent techniques for detecting these worms using "vulnerability signatures". This leads us to review two important problems in data stream computation: distinct value estimation and frequent element computation. We show how these algorithms can be used for automatic worm fingerprinting.
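The frequent-element problem mentioned above is classically solved in one pass with the Misra-Gries sketch, which in a worm-fingerprinting setting can surface payload substrings that repeat suspiciously often. A minimal sketch of the technique, not necessarily the talk's exact algorithm:

```python
def misra_gries(stream, k):
    """Misra-Gries frequent-elements sketch: one pass over the stream
    using at most k-1 counters.  Any element occurring more than
    len(stream)/k times is guaranteed to survive as a candidate
    (a second pass is needed to verify exact counts)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # Decrement all counters; drop those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The memory bound (k-1 counters regardless of stream length) is what makes this family of algorithms suitable for line-rate network monitoring.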

Click Fraud Resistant Methods for Learning Click-Through Rates
Sergei Vassilvitskii

Presenting the paper "Click Fraud Resistant Methods for Learning Click-Through Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.

Pricing Partially Compatible Products
Adam Meyerson

Suppose two competing companies provide roughly equivalent suites of software products. Each version of a product has a possibly distinct price and quality. Customers would like to purchase a low-price, high-quality set of products. The issue is that products made by different companies need not be fully compatible. We reduce the customer's decision process to a minimum cut, and provide approximation and hardness results for the problem of computing the company's best-response strategy in several different models. We also introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic Commerce (EC) 2007, and are joint work with David Kempe, Nainesh Solanki, and Ramnath Chellappa.

Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar
, Google

Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable to other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
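MinHash clustering rests on the identity that the probability two sets agree on their minimum hash value equals their Jaccard similarity, so signature agreement estimates how similar two users' click histories are. A minimal sketch; the salted-hash family used here is an illustrative assumption, not the paper's implementation:

```python
import random

def minhash_signature(items, num_hashes, seed=0):
    """MinHash signature of a set: for each of num_hashes (salted)
    hash functions, keep the minimum hash value over the items."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(32) for _ in range(num_hashes)]
    return [min(hash((salt, x)) for x in items) for salt in salts]

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature positions; an unbiased estimate
    of the Jaccard similarity of the underlying sets."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Users whose signatures agree on many positions can then be clustered together and used as each other's recommendation sources.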

Detecting Near-Duplicates for Web Crawling
Anish Das Sarma

This is joint work with Gurmeet Singh Manku and Arvind Jain at Google.
Near-duplicate web documents are abundant. Two such documents may differ only in a very small portion, for example one that displays advertisements. Such differences are irrelevant for web search, so the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page.
In my talk I will address the problem of efficiently detecting near-duplicates in the web crawl setting. First I shall review Charikar's fingerprinting technique used in our approach. Then, I shall formally define the problem and present algorithmic solutions for two versions of the problem: online queries (single new document) and batch queries (multiple new documents). Finally, I shall briefly talk about our experimental analysis of the techniques.
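Charikar's fingerprinting technique can be sketched as follows: every feature of a document (for example, each word) votes on every bit of a fixed-width fingerprint, and near-duplicate documents land at small Hamming distance. Using md5 as the per-feature hash is an implementation choice here, not necessarily the paper's:

```python
import hashlib

def simhash(features, bits=64):
    """Charikar's simhash: each feature votes +1/-1 on every bit of
    its hash; the sign of each accumulated bit gives the fingerprint."""
    votes = [0] * bits
    for f in features:
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming(a, b):
    """Hamming distance between two integer fingerprints."""
    return bin(a ^ b).count("1")
```

The crawl-time problem in the talk is then: given a new fingerprint, find any stored fingerprint within a small Hamming distance, for single new documents (online) or batches of them.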

Clickthrough of and Bidding on keywords
Aleksandra Korolova / Dilys Thomas

Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".

Multi-path Routing
Mihaela Enachescu

I will present an approach for enabling multi-path routing in large networks. This is based on a project at NEC Labs, and is joint work with Ravi Kokku. Our approach is to create multiple paths between various source-destination pairs by selecting a special subset of relay nodes. We give hardness results and provide some approximation algorithms.
This work is motivated by the observation that, given two end-hosts in the Internet, there potentially exist many independent or partially overlapping paths between them. However, only one of these is usually used by the underlay routing, leading to many inefficiencies (single-path routing is usually slow to adapt in the presence of fluctuating network conditions). Multi-path routing provides a more adaptive framework, and it also raises a variety of (still open) problems for our enjoyment.

Collaborative tagging systems
Paul Heymann

Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keyword-based systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.

Is Efficiency Expensive?
Mukund Sundararajan

We study the simultaneous optimization of efficiency and revenue in pay-per-click keyword auctions in a Bayesian setting. Our main result is that the efficient keyword auction yields near-optimal revenue even under modest competition. In the process, we build on classical results in auction theory to prove that increasing the number of bidders by the number of slots outweighs the benefit of employing the optimal reserve price.
This is joint work with Tim Roughgarden.

Estimating Search Engine Index and Web Sizes
Ying Xu

I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".
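One classical idea in this space is capture-recapture: draw two independent uniform samples of documents and scale their sizes by the overlap. This is a simplified illustration of the style of estimator, not necessarily the one in the cited papers:

```python
def capture_recapture_estimate(sample_a, sample_b):
    """Lincoln-Petersen estimator of corpus size: if two independent
    uniform samples A and B of the corpus overlap in r documents,
    then |corpus| is approximately |A| * |B| / r."""
    set_a, set_b = set(sample_a), set(sample_b)
    overlap = len(set_a & set_b)
    if overlap == 0:
        raise ValueError("no overlap: samples too small to estimate")
    return len(set_a) * len(set_b) / overlap
```

The hard part in the search-engine setting, which the cited papers address, is obtaining anything close to uniform samples when the only access to the index is through queries.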

Reusable and multiple goods
Ashish Goel

The talk is based on
Online auctions with re-usable goods by Mohammad Hajiaghayi, Robert Kleinberg, Mohammad Mahdian, and David Parkes, EC'05
The Sequential Auction Problem on eBay: An Empirical Analysis and a Solution by Adam I. Juda and David C. Parkes, EC'06

Mechanism design based on Cost-Per-Acquisition (CPA)
Hamid Nazerzadeh

Internet advertising has changed dramatically by moving from the Cost-Per-Impression charging mechanism to Cost-Per-Click (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and about some of our results on this problem.

Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert

We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.) We then look at the one-shot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the full-information stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal. The distributional assumptions under which our theorems hold are quite general, and arguably hold in all real-world situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.