Archive of RAIN talks
RAIN schedule for Spring Quarter 201718
RAIN schedule for Winter Quarter 201718
RAIN schedule for Autumn Quarter 201718
Date  Speaker  Topic 

Oct 11

Yang Cai 
Learning Multiitem Auctions with (or without) Samples

Oct 18

Michal Feldman 
Prophet Inequalities Made Easy: Stochastic Optimization by Pricing NonStochastic Inputs

Nov 1

Moritz Hardt 
Biases Beyond Observation

Nov 8

Jon Kleinberg 
Joint seminar with GSB Econ Theory
Time: 3:45  5 PM, Venue: GSB C105 
Nov 15

Mihai Manea 
Bottleneck links, essential intermediaries, and competing paths of diffusion in networks.

Nov 29

Manuel Gomez Rodriguez 
On Fairness in Supervised Learning

RAIN schedule for Winter Quarter 201617
Date  Speaker  Topic 

January 25

Devavrat Shah 
Blind Regression, Recommendation System and Collaborative Filtering

February 8

Udi Weinsberg 
DataScienceDriven Products at Facebook

February 22

Anima Anandkumar 
Largescale Machine Learning: Theory and Practice

March 8

Éva Tardos 
Learning and Efficiency of Outcomes in Games

RAIN schedule for Spring Quarter 201617
Autumn 201617
Date  Speaker  Topic 

October 12

Philippe Rigollet at OIT seminar, room M109 at GSB 
Online Learning in Repeated Auctions

November 2

Dennis Feehan 
Network reporting methods for sample surveys

November 16

Mohammad Akbarpour 
Timing "versus" Matching: Dynamic Matching on Stochastic Networks

November 30

Ariel Procaccia 
Computational Social Choice: For the People

Spring 201516
Winter 1314
Date  Speaker  Topic 

January 21  Edith Cohen  Scalable mining of massive networks: Distancebased Centrality, Similarity, and Influence 
February 11  Glen Weyl  Quadratic Voting 
February 18  Paul Milgrom  Auctions and Decentralization in Complex Networks 
Autumn 1314
Date  Speaker  Topic 

September 24  Balasubramanian Sivan  Towards optimal algorithms for prediction with expert advice 
October 8  Vivek Farias  Online AB Testing 
October 29  Fil Menczer  The spread of information in social media 
November 19  Sergei Vassilvitskii  Inverting a SteadyState 
Spring 1314
Winter 1314
Date  Speaker  Topic 

January 29  Derek Ruths  Control Profiles of Complex Networks 
February 5  Amin Karbasi  From SmallWorld Networks to UserDriven Content Search 
February 19  Arun Chandrasekhar  Savings Monitors 
February 26  Ariel Procaccia  Computational Fair Division: From Cake Cutting to Cloud Computing 
March 12  Niki Kittur  Big Thinking: Augmenting human cognition with crowds and computation 
March 19  Ricardo BaezaYates  OSNs: The Wisdom of a Few and The Value of Users 
Autumn 1314
Spring 1213
Winter 1213
Autumn 1213
Spring 1112
Winter 1112
Date  Speaker  Topic 

January 18  Michael Wellman  Price Prediction Strategies for Simultaneous OneShot Auctions 
January 25  George Varghese  Lattice Games and the Economics of Aggregators 
February 1  Michael Kearns  Competitive Contagion in Networks 
February 8  Stephen Barley  Building an Institutional Field to Corral a Government 
February 15  Siddharth Suri  Cooperation and Assortativity with Endogenous Partner Selection 
February 22     
February 29     
March 6  Sergiu Hart  Two(!) Good To Be True 
March 7  Dean Eckles  Identifying peer effects in online communication behaviors 
Autumn 1112
Date  Speaker  Topic 

October 5  Jure Leskovec  Networks, communities and the groundtruth 
October 12  Vincent Conitzer  Computing GameTheoretic Solutions for Security 
October 19  Costis Daskalakis  Revenueoptimal Auctions 
October 26  David Parkes  A Random Graph Model of MultiHospital Kidney Exchanges 
November 2     
November 9  Arnout van de Rijt  A Test of the "Fifteen Minutes" Hypothesis using LargeScale Data from Newspapers and Blogs 
November 16  Markus Mobius  Treasure Hunt 
November 23    Thanksgiving 
November 30  Yiling Chen  Task Routing for Prediction Tasks 
Spring 1011
Winter 1011
Date  Speaker  Topic 

January 12  Martin Suchara  BGP Safety with Spurious Updates  The Conditions of BGP Convergence 
January 19  No seminar   
January 26  Gabor Szabo  Predicting the Popularity of Online Content 
February 2  No seminar   
February 9  No seminar   
February 16  Elisa Celis  BuyitNow or TakeaChance: A simple randomized auction mechanism 
February 23  Yashodhan Kanoria  Social Learning and the Dynamic Cavity Method 
March 2  Matthias Mnich  The Complexity Ecology of Consensus Ranking 
Autumn 1011
Date  Speaker  Topic 

September 22  Elliot Anshelevich  Contribution Games in Social Networks 
September 29  Pranav Dandekar  Liquidity in Credit Networks: A Little Trust Goes a Long Way 
October 6  Sugato Basu  User Browsing Models: Relevance versus Examination 
October 13  Daria Sorokina  Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank 
October 20  Marc Najork  Reflections on Linkbased Ranking 
October 27   
 
November 3  Panagiotis Papadimitriou  Output URL Bidding 
November 10  Roberto Bayardo  A Technical Overview of Search Ads Quality at Google 
November 17  Georgios Pierrakos  On the Competitive Ration of Online Sampling Auctions 
November 24   
 
December 1  Paolo Parigi  The importance of ties 
Spring 0910
Winter 0910
Date  Speaker  Topic 

Jan 13  Jure Leskovec  Network of positive and negative edges 
Jan 20  Esteban Arcaute  
Jan 27  Christina Aperjis  Temporal User Behavior in Yahoo Answers 
Feb 3  Ashish Goel  Some open problems motivated by social networks 
Feb 10  Yehuda Koren  The Netflix Prize: Quest for $1,000,000 
Feb 17  DJ Patil  Data Jujitsu  the art of treating data as a product 
Feb 24  Panayiotis Tsaparas  Querylog analysis for improving web search 
Mar 3  Arash Asadpour  Stochastic Submodular Optimization 
Mar 10  Ravi Kumar  Compressibility of Behavioral Graphs 
Mar 17  Sep Kamvar  Aardvark 
Autumn 0910
Date  Speaker  Topic 

Sep 30  Arvind Narayanan  Deanonymizing Social Networks 
Oct 7  Nicolas Lambert  Eliciting Information on the Distribution of Future Outcomes 
Oct 14  Lars Backstrom  Optimizing Web Traffic via the Media Scheduling Problem 
Oct 21  Michael Kearns  Behavioral Experiments in Strategic
Networks Note special location: Terman Engineering Center, Room 453 
Oct 28  Matthew Jackson  How Social Network Structure affects Diffusion and Learning 
Nov 4  Paul Valiant  How to Design Profitable Auctions 
Nov 11  Mohammad Mahdian  Sort me if you can (or, Algorithms on Dynamic Data) 
Nov 18  DJ Patil  Canceled! 
Nov 25   
No meeting  Happy Thanksgiving! 
Dec 2  Abdur Chowdhury  Discovery & Emergence Note special location: Terman Engineering Center, Room 453 
Spring 0809
Date  Speaker  Topic 

Apr 3  Mayur Datar 
Internet search advertising: Challenges in targeting and quality considerations 
Apr 10   
No meeting this week 
Apr 17  D Sivakumar 
Affiliation Networks 
Apr 24   
No meeting  WWW 2009 
May 1   
No meeting  BAGT 
May 8  Sandeep Pandey Yahoo! Research 
Using Bandit Models in Search Engines 
May 14  Richard Chatwin VP Research, Adchemy 
Research Challenges in Online Advertising 
May 22  Robert Johnson 
Scaling Social Data 
May 29  Yury Lifshits Yahoo! Research 
Data Cloud: Integrating Structured Information into Web Search 
June 5  Robbie Yan  Manipulation Robustness of Collaborative Filtering Systems 
Winter 0809
Date  Speaker  Topic 

Jan 16    Organizational meeting 
Jan 23  G.D. Ramkumar SnapTell, Inc. 
Scalable image recognition algorithms for camera phone applications 
Jan 30  Luca de Alfaro  ContentDriven Reputation and Trust for Collaborative Content 
Feb 6  Ioannis Antonellis  Tagging with Queries: How and Why? (To appear in WSDM 2009) 
Feb 13  Scott Brave Baynote, Inc. 
Content Targeting and Recommendation Systems 
Feb 20  Ravi Kumar  Online Social Networks: Modeling and Mining 
Feb 27  Michael Mahoney  Community Structure in Large Social and Information Networks 
Mar 6  Kostas Kollias  An IncentiveBased Architecture for Social Recommendations 
Mar 13  Ali Dasdan  Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction 
Autumn 0809
Date  Speaker  Topic 

Oct 3    Talk by Jure Leskovec and organizational meeting 
Oct 10  Hady Lauw  Trusts among Users of Online Communities  an Epinions Case Study 
Oct 17  Canceled   
Oct 24  Bahman Bahmani  Online Budgeted Matching in Random Input Models with Applications to Adwords 
Oct 31  Sergei Vassilvitskii  Topk Aggregation Using Intersection of Ranked Inputs 
Nov 7    No meeting  Bay Area Theory Seminar (BATS) 
Nov 14  Mukund Sundararajan  UtilityMaximizing Privacy Mechanisms 
Nov 21  Hamid Nazerzadeh  Online advertising 
Nov 28    No meeting  Happy Thanksgiving! 
Dec 5  Tim Roughgarden  From Bayesian to WorstCase Optimal Auction Design 
Spring 0708
Date  Speaker  Topic 

Apr 11  BAGT  
Apr 18  Robbie Yan  Reputation Markets 
Apr 25  Cancelled for WWW  
May 2  Bobji Mungamuru  Click Fraud 
May 9  Ying Xu  Oral Examination 
May 16  Esteban Arcaute  Network Formation Games 
Mar 23  Christina Aperjis  Reputation Systems 
May 30  Evimaria Terzi  Inferring Links and Initiators 
June 6  Yury Lifshits  Social Design 
Winter 0708
Date  Speaker  Topic 

Jan 25  Alon Altman  An Axiomatic Approach to Personalized Ranking Systems 
Feb 1  Ashish Goel  Open Problems 
Feb 8  Cancelled  
Feb 15  Brian Babcock  Adchemy 
Feb 22  Rajat Bhattacharjee  Oral Examination 
Feb 29  Panagiotis Papadimitriou  Web Graph Similarity for Anomaly Detection 
Mar 7  Brad Null  Ratio Index for Budgeted Learning, with Applications 
Mar 14  Mukund Sundararajan  Optimal Marketing Strategies over Social Networks 
Autumn 0708
Date  Speaker  Topic 

Oct 15  Michael Schapira  Interdomain Routing and Games 
Oct 22  Esteban Arcaute  Decentralized Search of Random Graphs 
Oct 29  Orkut Buyukkokten  Social Network Revolution 
Nov 5  Aranyak Mehta  Random input models for Adwords 
Nov 12  Phil Long  Boosting the area under the ROC curve 
Nov 19  Thanksgiving  
Nov 26  cancelled  
Dec 3  Arash Asadpour  Ad auctions with reserved price 
Spring 0607
Date  Speaker  Topic 

Apr 13  Ying Xu  a discussion on WWW'07 papers 
Apr 20  BAGT  
Apr 27  Anish Das Sarma  Detecting NearDuplicates for Web Crawling 
May 4  Mayur Datar  Google News Personalization 
May 11  Adam Meyerson  Pricing Partially Compatible Products 
May 18  Sergei Vassilvitskii  Click Fraud Resistant Methods for Learning ClickThrough Rates 
May 25  Dilys Thomas  Streaming Algorithms and Worm Fingerprinting 
June 1  Neil Daswani  Anatomy of Clickbot.A 
Winter 0607
Date  Speaker  Topic 

Jan 26  Nicolas Lambert  Asymptotically Optimal Repeated Auctions 
Feb 2  Hamid Nazerzadeh  Mechanism design based on CPA 
Feb 9  Ashish Goel  Reusable and multiple goods 
Feb 16  Ying Xu  Estimating Search Engine Index and Web size 
Feb 23  Mukund Sundararajan  Is efficiency expensive? 
March 2  Paul Heymann  Collaborative tagging systems 
March 9  Mihaela Enachescu  Multipath Routing 
March 16  Aleksandra Korolova / Dilys Thomas  Clickthrough of and Bidding on keywords 
Autumn 0607
Date  Speaker  Topic 

Oct 25  Ying Xu  The Netflix Challenge 
Nov 1  Ashish/Alexandra  Social Networks 
Nov 8  Arash/Shubha  PageRank 
Nov 15  TBA  TBA 
Nov 22    Netflix followup 
Nov 29  Zoltan Gyongyi  TBA 
Dec 6  Sreenivas Gollapudi  TBA 
Spring 0506
Date  Speaker  Topic 

Apr 7  Ashish Goel  Axiomatic PageRank 
Apr 14    Organizational Meeting 
Apr 21  Sergei Vassilvitskii  Relevance Feedback and Web Search 
Apr 28  Hamid Nazerzadeh  AdWords with Unreliable Estimates 
May 5  Mihaela Enachescu  Compact Routing 
May 12  Ying Xu  Streaming Algorithms for Graph Problems 
May 19     
May 26  Udi Manber  New Problems in Search 
Jun 2  Rajat Bhattacharjee  Incentive Based Ranking Mechanisms 
Winter 0506
Date  Speaker  Topic 

Jan 20    Organizational Meeting 
Jan 27  Brad Null  Multiarmed Bandit Problem 
Feb 3  Edith Cohen  SpatiallyDecaying Aggregation 
Feb 10  Andrew Tomkins  Routing and Growth Models for Social Networks 
Feb 17    BAGT 
Feb 24  Amin Saberi  Open Problems 
Mar 3  Mayur Datar, Ashutosh Garg  Recommendation Systems 
Mar 10  Amruta Joshi  Googlebase 
Fall 0506
Date  Speaker  Topic 

Oct 7  Roee Engelberg  Equilibria in Online Games 
Oct 14  Ying Xu, Sergei Vassilvitskii  Models for the Web Graph 
Oct 21  Gagan Aggarwal  AdWords 
Oct 28  Omar Benjelloun  SERF 
Nov 4  Zoltan Gyongyi  Link Spam 
Nov 11  Mihaela Enachescu  Query Incentive Networks 
Nov 18    BATS 
Dec 2  Rajat Bhattacharjee  Survey of Recommendation Systems 
Spring 0405
Date  Speaker  Topic 

Apr 4  Shubha Nabar  Impact of Search Engines on Evolution of Page Popularity 
Apr 11  Svetlozar Nestorov  Mining Structure from Semistructured Data 
Apr 18  Sepandar Kamvar  Reputation Management in P2P Networks 
Apr 25  Rina Panigrahy  PageRank Axioms 
May 2  Amin Saberi  Spread of Viruses on the Internet 
May 9  Ramesh Johari  Stable Policy Routing with Provider Independence 
May 16  Rajat Bhattacharjee  Reputation Systems 
May 23  Mehran Sahami  Matching Text on the Web 
Jun 6  Amruta Joshi  Modeling Documents 
Winter 0405
Date  Speaker  Topic 

Feb 2  Ying Xu  Recommendation Systems/Collaborative Filtering 
Feb 9  Ashish Goel  Spam and Page Rank 
Feb 16  Gagan Aggarwal  AdWords 
Feb 23  David Arthur  Gossip and File Sharing 
Mar 2  Anil Kamath  AdWords 
Mar 10  Sergei Vassilvitskii  
Mar 16  Dilys Thomas  Social Networks 
Abstracts
Allocating Resources, in the FutureSid Banerjee, Cornell
Abstract: The online allocation of scarce resources is a canonical paradigm of decisionmaking under uncertainty, and is widely studied in many fields of engineering under a variety of titles (dynamic allocation/revenue management/prophet inequalities/etc.). In this talk, I will reexamine the basic online allocation problem, with the aim of building bridges to the everimproving predictive capabilities enabled by modern machinelearning methods. To this end, I will present a new Bayesianlearning inspired algorithm for online allocation problems, and show that it achieves the first horizonindependent and budgetindependent regret bounds for online packing with stochastic inputs. Surprisingly, the result stems from elementary tools  LP sensitivity and basic concentration of measures.
Bio: Sid Banerjee is an assistant professor in the School of Operations Research and Information Engineering (ORIE) at Cornell. His research is on stochastic modeling, and the design of algorithms and incentives for largescale systems. He got his PhD in ECE from UT Austin, following which he was a postdoctoral researcher in the Social Algorithms Lab at Stanford, as well as a technical consultant at Lyft.
The role of the propensity score in the synthetic control method
Avi Feller, UC Berkeley
Abstract:The synthetic control method is a popular approach for estimating the impact of a treatment on (typically one) treated unit in settings with many control units and observed pretreatment outcomes for all units. The key idea is to construct a weighted average of control units, known as a synthetic control (SC), that minimizes imbalance of pretreatment outcomes between the treated unit and synthetic control. Our main result is that synthetic control weights are numerically equivalent to inverse propensity score weights (IPW) with pretreatment outcomes as covariates and heavy regularization of the propensity score model. Leveraging a primaldual connection, we map features of the propensity score model to choices about the specification of the original SC optimization problem. In particular, the original SC method, which balances the L2 norm of pretreatment outcomes, is identical to IPW with an L2 penalty on the propensity score model and with the least feasible regularization; other choices of balance criteria (e.g., Linfinity norm) correspond to other forms of regularization (e.g., Lasso). We then use this numerical equivalence to make several recommendations for practice. First, as in other settings with highdimensional covariates, we argue that the estimate can be quite sensitive to the type and level of regularization. Where the researcher has knowledge of the data generating process, there are large returns to choosing an appropriate regularization that emphasizes balance on key covariates  in general, there is little reason to choose the default regularization implied by the original SC approach. We explore several alternative approaches for setting the regularization, including crossvalidation. Second, we directly apply existing results from the literature on estimating causal effects with highdimensional covariates. For instance, once we view SC as an IPW estimator, it is natural to consider an Augmented IPW estimator, combining the SC weights with an outcome model. Third, we conduct extensive simulation studies to assess the performance of these different estimators in practice. Finally, we use these ideas to assess the impact of the 2012 Kansas tax cuts on economic growth, finding persistent negative effects.
Bio: Avi Feller is an assistant professor at the Goldman School, where he works at the intersection of public policy, data science, and statistics. His methodological research centers on learning more from social policy evaluations, especially randomized experiments. His applied research focuses on working with governments on using data to design, implement, and evaluate policies. Prior to his doctoral studies, Feller served as Special Assistant to the Director at the White House Office of Management and Budget and worked at the Center on Budget and Policy Priorities. Feller received a Ph.D. in Statistics from Harvard University, an M.Sc. in Applied Statistics as a Rhodes Scholar at the University of Oxford, and a B.A. in Political Science and Applied Mathematics from Yale University.
Algorithmically exploiting structure or: how I learned to stop worrying and enjoy realworld graphs
C. Seshadhri, UC Santa Cruz
Abstract:As an algorithms researcher, the existence of heuristics for various computational graph problems (like finding cliques) on realworld graphs bothers me to no end. What is it about (say) social networks that make classically hard problems easy? Or, how do algorithms circumvent various lower bounds when it comes to realworld graphs?
This question has no simple answer, and I will present a tale of two stories on this theme. In the first, we show how to practically and provably count cliques in social networks, by exploiting the local density of these graphs. A curious feature of the mathematics is the use of Turan's theorem in extremal combinatorics, to prove correctness of the algorithm. In the second, we show how to estimate the degree distribution of a graph by sampling a tiny portion of it, by exploiting the heavy tailed structure of the degree distribution. This result uses some recent theoretical techniques in sublinear algorithms to simulate uniform random edge queries through uniform random vertices. In each of these results, there is an interesting interplay of combinatorics, randomized algorithms, and social network analysis.
Bio: C. Seshadhri is an assistant professor of Computer Science at the University of California, Santa Cruz. Prior to joining UCSC, he was a researcher at Sandia National Labs, Livermore in the Information Security Sciences department, during 20102014. His primary interest is in mathematical foundations of big data, especially modeling and algorithms. By and large, he works at the boundary of theoretical computer science and data mining. His work spans many areas: sublinear algorithms, graph algorithms, graph modeling, and scalable computation for large data sets.
Highdimensional random geometric graphs
Miklos Racz, Princeton
Abstract:I will talk about two natural random geometric graph models, where connections between vertices depend on distances between latent ddimensional labels. We are particularly interested in the highdimensional case when d is large. We study a basic hypothesis testing problem: can we distinguish a random geometric graph from an ErdosRenyi random graph (which has no geometry)? We show that there exists a computationally efficient procedure which is almost optimal (in an informationtheoretic sense). The proofs will highlight new graph statistics as well as connections to random matrices. This is based on joint work with Sebastien Bubeck, Jian Ding, Ronen Eldan, and Jacob Richey.
Bio: Miklos is an Assistant Professor in the ORFE Department at Princeton University. His research focuses on probability, statistics, and its applications. Before Princeton, he spent two years as a postdoc in the Theory Group at Microsoft Research, Redmond. He received his PhD in Statistics from UC Berkeley in 2015, where he was advised by Elchanan Mossel. He also obtained an MS in Computer Science from Berkeley. Previously, he received an MS in Mathematics from the Budapest University of Technology and Economics, under the supervision of Marton Balazs and Balint Toth.
Dynamics of Distributed Updating in Fisher Markets
Richard Cole, NYU
Abstract:A major goal in Algorithmic Game Theory is to justify equilibrium concepts from an algorithmic and complexity
perspective. One appealing approach is to identify natural distributed algorithms that converge quickly to an
equilibrium. This paper established new convergence results for two generalizations of proportional response
in Fisher markets. These results stem from a correspondence with mirror descent algorithms for convex
optimization. Several of our new results are a consequence of new notions of strong Bregman convexity and
of strong Bregman convexconcave functions, and associated linear rates of convergence, which may be of
independent interest.
Among other results, we analyze a damped generalized proportional response and show a linear rate of
convergence in a Fisher market with buyers whose utility functions cover the full spectrum of CES utilities
aside the extremes of linear and Leontief utilities; when these utilities are included, we obtain an empirical
O(1/T) rate of convergence.
Joint work with Yun Kuen Cheung and Yixin Tao
Bio: Richard Cole is a Silver Professor of Computer Science at NYU. His main interest is the design and analysis of algorithms. He currently concentrates on the following area: algorithmic economic market theory and game theory. He has previously worked on string and pattern matching, amortization, parallelism, network and routing problems. He has also been interested in the use of visualization for algorithm explanation and teaching.
He served as department chair from 19942000, and subsequently, as deputy director of the Courant Institute from 20032013 (with one semester as acting director). He was named a Guggenheim Fellow for the 198889 academic year, an ACM Fellow in 1998, and a Silver Professor of Computer Science in 2011.
Randomization inference for spillovers in networks
Dean Eckles, MIT
Abstract: Social and behavioral scientists are interested in testing of hypotheses about spillovers (i.e. interference, exogenous peer effects) in social networks; and similar questions may arise in other settings (e.g., biological and computer networks). However, when there is a single network, this is complicated by lack of independent observations. We explore Fisherian randomization inference as an approach to exact finitepopulation inference, where the main problem is that the relevant hypotheses are nonsharp null hypotheses. Fisherian randomization inference can be used to test these hypotheses either by (a) making the hypotheses sharp by assuming a model for direct effects or (b) conducting conditional randomization inference such that the hypotheses are sharp. I present both of these approaches, the latter of which is developed in Aronow (2012) and our paper (Athey, Eckles & Imbens, 2017). This usually involves selecting some vertices to be "focal" and conditioning on their treatment assignment and/or the assignment of some of all of their network neighbors. The selection of this set can present interesting algorithmic questions; we, for example, make use of greedy methods for finding maximal independent sets. I illustrate these methods with application to a large voter turnout experiment on Facebook (Jones et al., 2017).
Bio: Dean Eckles is a social scientist, statistician, and faculty at the Massachusetts Institute of Technology (MIT). Dean is the KDD Career Development Professor in Communications and Technology, an assistant professor in the MIT Sloan School of Management, and affiliated faculty at the MIT Institute for Data, Systems & Society. He was previously a member of the Core Data Science team at Facebook. Much of his work examines how interactive technologies affect human behavior by mediating, amplifying, and directing social influence  and statistical methods to study these processes. Dean's empirical work uses large field experiments and observational studies. His research appears in the Proceedings of the National Academy of Sciences and other peerreviewed journals and proceedings in statistics, computer science, and marketing. Dean holds degrees from Stanford University in philosophy (BA), cognitive science (BS, MS), statistics (MS), and communication (PhD).
Building a cooperator
Alex Peysakhovich, Facebook
Abstract: Social dilemmas are situations where individuals face a temptation to increase their payoffs at a cost to total welfare. Building artificially intelligent agents that achieve good outcomes in these situations is important because many real world interactions include a tension between selfish interests and the welfare of others. We show how to modify modern reinforcement learning methods to construct agents that act in ways that are simple to understand, nice (begin by cooperating), provokable (try to avoid being exploited), and forgiving (try to return to mutual cooperation). We show both theoretically and experimentally that such agents can maintain cooperation in Markov social dilemmas. Our construction does not require training methods beyond a modification of selfplay, thus if an environment is such that good strategies can be constructed in the zerosum case (eg. Atari) then we can construct agents that solve social dilemmas in this environment.
Bio: Alex is a Research Scientist at Facebook Artificial Intelligence Research working on both human and machine decisionmaking. He got his PhD in Behavioral Economics from Harvard University and was a postdoc with David Rand at the Human Cooperation Lab. He spent several years working with the Facebook News Feed team on combining survey and behavioral data, natural language processing for detecting clickbait, and on building tools for advanced experimentation such as heterogeneous treatment effect detection.
Diffusion, Seeding, and the Value of Network Information
Mohammad Akbarpour, Stanford
Abstract: When communicating information to individuals is costly, policymakers try to
identify the best 'seeds' to prompt a cascade of information within a social network.
Numerous studies have proposed various networkcentrality based heuristics
to choose initial seeds in a way that is likely to boost diffusion. Here we show that,
for a frequently studied diffusion process, randomly seeding s + x individuals can
prompt a larger cascade than optimally seeding the best s individuals, for a small
x. This suggests that the returns to collecting and analyzing network information
to identify the optimal seeds may not be economically significant. Given these
findings, practitioners interested in communicating a message to a large number of
people may wish to compare the cost of identifying optimal seeds with the cost of
informing a few additional people. Moreover, researchers studying networkbased
seeding heuristics may consider reporting 'extra seeds required by random seeding
to achieve the same diffusion' as an easily interpretable information about the
magnitude of their results.
Bio: Mohammad Akbarpour is an Assistant Professor of Economics at Stanford Graduate School of Business. His research bridges between economics and computer science, and is focused on market design, networked markets, and the economics of organ markets.
Mohammad received his PhD from Stanford University Economics department in 2015 and his B.Sc in Electrical Engineering from Sharif University of Technology in Iran. As a consulting economist at Auctionomics, he has been involved in designing multiple spectrum auction markets around the globe. He has also been an instructor at Khan Academy Farsi, teaching hundreds of videolectures in high school physics and calculus, and game theory.
Differential Privacy for Growing Databases
Rachel Cummings, Georgia Tech
Abstract: We study the design of differentially private algorithms for adaptive analysis of dynamically growing databases, where a database accumulates new data entries while the analysis is ongoing. We provide a collection of tools for machine learning and other types of data analysis that guarantee differential privacy and accuracy as the underlying databases grow arbitrarily large. We give both a general technique and a specific algorithm for adaptive analysis of dynamically growing databases. Our general technique is illustrated by two algorithms that schedule black box access to some algorithm that operates on a fixed database, to generically transform private and accurate algorithms for static databases into private and accurate algorithms for dynamically growing databases. These results show that almost any private and accurate algorithm can be rerun at appropriate points of data growth with minimal loss of accuracy, even when data growth is unbounded. Our specific algorithm directly adapts the private multiplicative weights algorithm to the dynamic setting, maintaining the accuracy guarantee of the static setting through unbounded data growth. Along the way, we develop extensions of several other differentially private algorithms to the dynamic setting, which may be of independent interest for future work on the design of differentially private algorithms for growing databases.
Bio: Dr. Rachel Cummings is an Assistant Professor in the H. Milton Stewart School of Industrial & Systems Engineering at Georgia Tech. Her research interests lie primarily in data privacy, with connections to machine learning, algorithmic economics, optimization, statistics, and information theory. Her work has focused on problems such as strategic aspects of data generation, incentivizing truthful reporting of data, privacypreserving algorithm design, impacts of privacy policy, and human decisionmaking.
Dr. Cummings received her Ph.D. in Computing and Mathematical Sciences from the California Institute of Technology, her M.S. in Computer Science from Northwestern University, and her B.A. in Mathematics and Economics from the University of Southern California. She is a recipient of the Amori Doctoral Prize in Computing and Mathematical Sciences, a Simons Award for Graduate Students in Theoretical Computer Science, and the Best Paper Award at the 2014 International Symposium on Distributed Computing. Dr. Cummings also serves on the ACM U.S. Public Policy Council's Privacy Committee.
Lagrangian duality in mechanism design, a crash course.
Nikhil Devanur Microsoft Research
Abstract: Designing optimal (revenue maximizing) auctions in multiparameter settings has been among the most active areas in algorithmic mechanism design in the last few years. We have discovered that Lagrangian duality is a very useful and versatile tool for this purpose. I will show how to use it to do (a subset of) the following.
1. Derive that the optimal auction is a virtual welfare maximizer.
2. Obtain a fast algorithm for approximating the optimal auction.
3. Show how simple auctions are approximately optimal.
4. Extend the above to handle marginal costs.
5. Characterize optimal auctions for structured environments.
6. Get bounds on the menusize complexity of optimal auctions.
Bio: Nikhil Devanur is a senior researcher and manager of the Algorithms group in Microsoft Research, Redmond. He is interested in what he calls Automated Economics, which studies the question of how technology can be used to improve the efficiency of economic systems. His other interest is in algorithms: he is interested in designing algorithms that are faster, simpler, work online or in a distributed fashion, for some of the basic combinatorial optimization problems.
Learning Multiitem Auctions with (or without) Samples
Yang Cai, McGill
Abstract: We provide algorithms that learn simple auctions whose revenue is approximately optimal in multiitem multibidder settings. We obtain our learning results in two settings. The first is the commonly studied setting where sample access to the bidders' distributions over valuations is given. Here, our algorithms require polynomially many samples in the number of items and bidders. The second is a more general maxmin learning setting that we introduce, where we are given ``approximate distributions'', and we seek to compute a mechanism whose revenue is approximately optimal simultaneously for all ``true distributions'' that are close to the ones we were given. These results are more general in that they imply the samplebased results, and are also applicable in settings where we have no sample access to the underlying distributions but have estimated them indirectly via market research or by observation of bidder behavior in previously run, potentially nontruthful auctions.
Our results are enabled by new uniform convergence bounds for hypotheses classes under product measures. Our bounds result in exponential savings in sample complexity compared to bounds derived by bounding the VC dimension and are of independent interest.
Here's a link to the paper
Bio: Yang Cai is a William Dawson Assistant Professor of Computer Science at McGill University. He received his Ph.D. in computer science from MIT, advised by Costis Daskalakis. He was a postdoctoral researcher at UC Berkeley. Yang's research interests include economics and computation, learning, statistics and probability, as well as online algorithms.
Prophet Inequalities Made Easy: Stochastic Optimization by Pricing NonStochastic Inputs
Michal Feldman, TelAviv University
Abstract: We present a general framework for stochastic online maximization problems with combinatorial feasibility constraints. The framework establishes prophet inequalities by constructing pricebased online approximation algorithms, a natural extension of threshold algorithms for settings beyond binary selection. Our analysis takes the form of an extension theorem: we derive sufficient conditions on prices when all weights are known in advance, then prove that the resulting approximation guarantees extend directly to stochastic settings. Our framework unifies and simplifies much of the existing literature on prophet inequalities and posted price mechanisms, and is used to derive new and improved results for combinatorial markets (with and without complements), multidimensional matroids, and sparse packing problems. Finally, we highlight a surprising connection between the smoothness framework for bounding the price of anarchy of mechanisms and our framework, and show that many smooth mechanisms can be recast as posted price mechanisms with comparable performance guarantees.
Here's a link to the paper
Bio: Michal Feldman is a Professor of computer science in the Blavatnik School of Computer Science at Tel Aviv University and a researcher at Microsoft Research (MSR) Herzliya. Her research focuses on the intersection of computer science, game theory and microeconomics. She received her Ph.D. from the University of California at Berkeley in 2005, and did her postdoc at the Hebrew University (200507). She was a faculty member in the School of Business Administration and the Center for the study of rationality at the Hebrew University (200713), and a visiting professor at Harvard University and Microsoft Research New England (201113). She serves on the editorial board of various journals, including GEB, MOR, JCSS and ACM TEAC. She is the vice chair of ACM SIGEcom, and served as the PC chair of ACM Conference on Economics and Computation 2015. She is the recipient of various grants and fellowships, including ERC (European Research Council), Marie Curie IOF, Alon, and ISF. She is a member of the Israeli Young Academy and an alumna of the Global Young Academy.
Biases Beyond Observation
Moritz Hardt, University of California, Berkeley
Abstract: Most proposed fairness measures for machine learning are
observational: They depend only on the joint distribution of the
features, predictor, and outcome. I will highlight a few useful
observational criteria, before arguing why observational criteria in
general are unable to resolve questions of fairness conclusively.
Moving beyond observational criteria, I will outline a causal
framework for reasoning about discrimination based on sensitive
characteristics.
Bio: Moritz Hardt is an Assistant Professor in the Department of Electrical
Engineering and Computer Sciences at the University of California,
Berkeley. After obtaining a PhD in Computer Science from Princeton
University in 2011, he was a postdoctoral scholar and research staff
member at IBM Research Almaden, followed by two years as a research
scientist at Google Research and Google Brain. Hardt's research aims
to make the practice of machine learning more robust, reliable, and
aligned with societal values.
Bottleneck links, essential intermediaries, and competing paths of diffusion in networks.
Mihai Manea, MIT
Abstract: We investigate how information goods are priced and diffused over links in a network. Buyers have idiosyncratic consumption values for information and, after acquiring it, can replicate it and resell copies to uninformed neighbors. A partition of the network captures the effects of network architecture and locations of information sellers on player profits and the structure of competing diffusion paths. Sellers indirectly appropriate profits over intermediation chains from buyers in their block of the partition. Links within blocks are critical for connecting the network and constitute bottlenecks for information diffusion. Links bridging distinct blocks are redundant for diffusion and impose negative externalities on sellers. Information enters each block not containing a seller via a single node  the dealer of the block. Dealers can receive information over redundant links from multiple neighbors and benefit from competitive pricing. Every nondealer buyer can acquire information from a single neighbor via a bottleneck link and is subject to a monopoly. In dense networks, competition limits the scope of indirect appropriability, and intellectual property rights foster innovation.
Bio: Mihai Manea is a game theorist focusing on social and economic networks. He graduated from Princeton University in 2005 and received a PhD in economics from Harvard University in 2009. He has taught at MIT and Yale University and is currently visiting the economics department at Stanford University.
On fairness in supervised learning
Manuel Gomez Rodriguez, Max Planck Institute for Software Systems
Abstract: Algorithmic decision making is increasingly being used to assist or replace humans in
many online as well as offline settings. These systems often rely on historical decisions, often taken
by humans, to optimize functionality, satisfaction of the end user and profitability. However, there is a
growing concern that these automated decisions can lead, even in the absence of intent, to a lack of
fairness, i.e., their outcomes can disproportionally hurt (or benefit) particular groups of people sharing
one or more sensitive attributes (e.g., race, sex). In this talk, I will first discuss several notions of fairness
in the context of supervised learning and then introduce a flexible mechanism to design fair boundarybased
classifiers by leveraging an intuitive measure of decision boundary (un)fairness. I will then show that this
mechanism can be easily incorporated into the formulation of several wellknown margin based classifiers
as convex or convexconcave constraints and it allows for a finegrained control on the degree of fairness,
often at a small cost in terms of accuracy.
Bio:Manuel Gomez Rodriguez is a tenuretrack faculty at Max Planck Institute for Software Systems.
Manuel develops machine learning and largescale data mining methods for the analysis, modeling and
control of large social and information online systems. He is particularly interested in the creation, acquisition
and/or dissemination of reliable knowledge and information, which is ubiquitous in the Web and social media,
and has received several recognitions for his research, including an Outstanding Paper Award at NIPS 2013 and
a Best Research Paper Honorable Mention at KDD 2010 and WWW 2017. Manuel holds a BS in Electrical
Engineering from Carlos III University in Madrid (Spain), a MS and PhD in Electrical Engineering from Stanford
University, and has received postdoctoral training at the Max Planck Institute for Intelligent Systems. You can
find more about him at http://learning.mpisws.org .
Blind Regression, Recommendation System and Collaborative Filtering
Devavrat Shah, MIT
We discuss the framework of Blind Regression (also known as Latent Variable Model) motivated by the problem of Matrix Completion for recommendation systems: given a collection of users and movies, the goal is to predict the unknown rating of a user for a movie using the known observations, i.e. complete the partially observed matrix. We posit that each user and movie is associated with their latent feature, and the rating of a user for a movie equals the noisy version of latent function applied to the associated latent features. Therefore, completing the matrix boils down to predicting the latent function value for usermovie pairs for which ratings are unknown, just like the classical regression setting. However, unlike the setting of regression, features are not observed here  hence "Blind" Regression. Such a model arises as a canonical characterization due to multidimensional exchangeability property a la Aldous and Hoover (early 1980s).
In this talk, using inspiration from the classical Taylor's expansion for differentiable functions, we shall propose a prediction algorithm that is consistent for all Lipschitz continuous functions. We provide finite sample analysis that suggests that even when observing a vanishing fraction of the matrix, the algorithm produces accurate predictions. We discuss relationship with spectral algorithm for matrix completion, and the collaborative filtering.
The talk is based on joint works with Christina Lee, Yihua Li and Dogyoon Song (MIT).
Bio: Devavrat Shah is a Professor with the department of Electrical Engineering and Computer Science at Massachusetts Institute of Technology. His current research interests are at the interface of Statistical Inference and Social Data Processing. His work has been recognized through prize paper awards in Machine Learning, Operations Research and Computer Science, as well as career prizes including 2010 Erlang prize from the INFORMS Applied Probability Society and 2008 ACM Sigmetrics Rising Star Award. He is a distinguished young alumni of his alma mater IIT Bombay.
DataScienceDriven Products at Facebook
Udi Weinsberg, Facebook
This talk will give an overview of a range of datadriven products that the Core Data Science group helped building, mostly by applying machine learning and statistical methods on largescale data. We'll talk about analysis of cascades and product adoption, identifying trends in realtime, fighting scams, understanding the true meanings behind emoji, and figuring out how people laugh online.
Bio: Udi Weinsberg leads the Algorithms group in Core Data Science in Facebook. The group helps product teams across Facebook to tackle difficult product problems and deliver new features by leveraging vast amounts of data together with a range of machine learning techniques. Before Facebook, Udi was a senior researcher at Technicolor, working on privacy in machine learning using cryptographic methods.
Largescale Machine Learning: Theory and Practice
Anima Anandkumar, Amazon and Caltech
Largescale machine learning requires blending computational thinking
with statistical frameworks. Designing fast, efficient and distributed
learning algorithms with statistical guarantees is an outstanding
grand challenge. I will present perspectives from theory and practice.
I will demonstrate how spectral optimization can reach the globally
optimal solution for many learning problems despite being nonconvex.
This includes unsupervised learning of latent variable models,
training neural networks and reinforcement learning of partially
observable Markov decision processes. In practice, tensor methods
yield enormous gains both in running times and learning accuracy over
traditional methods such as variational inference.
I will then talk about the recent advances in largescale deep
learning methods. Our lab at AWS is actively innovating on the MXNet
package. It is a highly flexible and developerfriendly opensource
deep learning framework designed for both efficiency and flexibility.
It is based on the distributed parameterserver framework. I will
demonstrate how to use preconfigured Deep Learning AMIs and
CloudFormation Templates on AWS to help speed up deep learning
research and development.
I will conclude on outstanding challenges on how we can bridge the
gaps between theory and practice, and how we can design and analyze
largescale learning algorithms.
Bio: Anima Anandkumar is currently a principal scientist at Amazon Web
Services. She will be joining Caltech CMS department in summer 2017 as
a Bren endowed chair. Her research interests are in the areas of
largescale machine learning, nonconvex optimization and
highdimensional statistics. In particular, she has been spearheading
the development and analysis of tensor algorithms. She is the
recipient of several awards such as the Alfred. P. Sloan Fellowship,
Microsoft Faculty Fellowship, Google research award, ARO and AFOSR
Young Investigator Awards, NSF Career Award, Best Thesis Award from
the ACM Sigmetrics society, IBM Fran Allen PhD fellowship, and several
best paper awards. She has been featured in a number of forums such as
the Quora ML session, Huffington post, Forbes, O'Reilly media, and so
on. She received her B.Tech in Electrical Engineering from IIT Madras
in 2004 and her PhD from Cornell University in 2009. She was a
postdoctoral researcher at MIT from 2009 to 2010, an assistant
professor at U.C. Irvine between 2010 and 2016, and a visiting
researcher at Microsoft Research New England in 2012 and 2014.
Learning and Efficiency of Outcomes in Games
Eva Tardos, Department of Computer Science, Cornell University
Selfish behavior can often lead to suboptimal outcome for all participants, a phenomenon illustrated by many classical examples in game theory. Over the last decade we developed good understanding on how to quantify the impact of strategic user behavior on the overall performance in many games (including traffic routing as well as online auctions). In this talk we will focus on games where players use a form of learning that helps them adapt to the environment, and consider two closely related questions: What are broad classes of learning behaviors that guarantee that game outcomes converge to the quality guaranteed by the price of anarchy, and how fast is this convergence. Or asking these questions more broadly: what learning guarantees high social welfare in games, when the game or the population of players is dynamically changing.
Bio: Eva Tardos is a Jacob Gould Schurman Professor of Computer Science at Cornell University, was Computer Science department chair 20062010. She received her BA and PhD from Eotvos University in Budapest. She joined the faculty at Cornell in 1989. Tardos's research interest is algorithms and algorithmic game theory. She is most known for her work on networkflow algorithms, approximation algorithms, and quantifying the efficiency of selfish routing. She has been elected to the National Academy of Engineering, the National Academy of Sciences, the American Academy of Arts and Sciences, and is an external member of the Hungarian Academy of Sciences. She is the recipient of a number of fellowships and awards including the Packard Fellowship, the Goedel Prize, Dantzig Prize, Fulkerson Prize, and the IEEE Technical Achievement Award. She is editor editorinChief of the Journal of the ACM, and was editor in the past of several other journals including the SIAM Journal of Computing, and Combinatorica, served as problem committee member for many conferences, and was program committee chair for SODA'96, FOCS'05, and EC'13.
Supply Chain Disruptions: Evidence from the Great East Japan Earthquake
Alireza TahbazSalehi, Columbia
Exploiting the exogenous and regional nature of the Great East Japan Earthquake of 2011, this paper provides a systematic quantification of the role of inputoutput linkages as a mechanism for the propagation and amplification of shocks. We document that the disruption caused by the earthquake and its aftermaths propagated upstream and downstream supply chains, affecting the direct and indirect suppliers and customers of disasterstricken firms. We then use our empirical findings to obtain an estimate for the overall macroeconomic impact of the shock by taking these propagation effects into account. We find that the propagation of the shock over inputoutput linkages can account for a 1.2 percentage point decline in Japan's gross output in the year following the earthquake. We interpret these findings in the context of a general equilibrium model that takes the firmtofirm linkages into account explicitly. Link
Quantifying Tradeoffs between Fairness and Accuracy in Online Learning
Aaron Roth, Department of Computer Science, University of Pennsylvania
In this talk, I will discuss our recent efforts to formalize a particular notion of "fairness" in online decision making problems, and study its costs on the achievable learning rate of the algorithm. Our focus for most of the talk will be on the "contextual bandit" problem, which models the following scenario. Every day, applicants from different populations submit loan applications to a lender, who must select a subset of them to give loans to. Each population has a (potentially different) mapping from applications to creditworthiness that is initially unknown to the lender. The fairness constraint we impose roughly translates to: "Less credit worthy individuals should never be preferentially favored over more credit worthy individuals, regardless of group membership". Despite the fact that this constraint seems consistent with the profit motivation of the bank, we show that imposing a fairness constraint provably slows down learning  sometimes only mildly, but sometimes substantially, depending on the structure of the problem. Time permitting, we will mention recent extensions to the reinforcement learning setting in which the actions of the learner can affect its environment, and to economic settings in which the learner must be incentivized by a principal to act fairly.
Joint Work with Matthew Joseph, Michael Kearns, Jamie Morgenstern, and Seth Neel
Bio: Aaron Roth is the class of 1940 Bicentennial Term associate professor of Computer and Information Sciences at the University of Pennsylvania, affiliated with the Warren Center for Network and Data Science, and codirector of the Networked and Social Systems Engineering (NETS) program. Previously, he received his PhD from Carnegie Mellon University and spent a year as a postdoctoral researcher at Microsoft Research New England. He is the recipient of a Presidential Early Career Award for Scientists and Engineers (PECASE) awarded by President Obama in 2016, an Alfred P. Sloan Research Fellowship, an NSF CAREER award, and a Yahoo! ACE award. His research focuses on the algorithmic foundations of data privacy, algorithmic fairness, game theory and mechanism design, learning theory, and the intersections of these topics. Together with Cynthia Dwork, he is the author of the book "The Algorithmic Foundations of Differential Privacy."
Exploiting the Natural Exploration in Online DecisionMaking
Mohsen Bayati, Graduate School of Business, Stanford University
Growing availability of data has enabled practitioners to tailor decisions at the individual level. This involves learning a model of decision outcomes conditional on individualspecific covariates (contexts). Recently, contextual bandits have been introduced as a framework to study these online and sequential decision making problems. This literature predominantly focuses on algorithms that balance an explorationexploitation tradeoff, since greedy policies that exploit current estimates without any exploration may be suboptimal in general. However, explorationfree greedy policies are desirable in many practical settings where experimentation may be prohibitively costly or unethical (e.g. clinical trials).
In this talk we show that, for a general class of context distributions, the greedy policy benefits from a natural exploration obtained from the varying contexts and becomes asymptotically optimal under some assumptions on problem parameters. Motivated by these results, we introduce GreedyFirst, a new algorithm that uses only observed contexts and rewards to determine whether to follow a greedy policy or to explore. We prove that this algorithm is asymptotically optimal without any additional assumptions. Through simulations we demonstrate that GreedyFirst successfully reduces experimentation and outperforms existing (explorationbased) algorithms.
Joint work with Hamsa Bastani and Khashayar Khosravi
Competitive Division of a Mixed Manna
Herve Moulin, University of Glasgow
A mixed manna contains goods (that everyone likes), bads (that everyone dislikes), as well as items that are goods to some agents, but bads or satiated to others.
If all items are goods and utility functions are homogeneous of degree one, concave (and monotone), the competitive division maximizes the Nash product of utilities (GaleEisenberg): hence it is welfarist (determined by the set of feasible utility profiles), unique, continuous and easy to compute.
We show that the competitive division of a mixed manna is still welfarist. If the zero utility profile is Pareto dominated, the competitive profile is strictly positive and still uniquely maximizes the product of utilities. If the zero profile is unfeasible (for instance if all items are bads), the competitive profiles are strictly negative, and are the critical points of the product of disutilities on the efficiency frontier. The latter allows for multiple competitive utility profiles, from which no singlevalued selection can be continuous or Resource Monotonic.
Thus the implementation of competitive fairness under linear preferences in interactive platforms like SPLIDDIT will be more difficult when the manna contains bads that overwhelm the goods.
Bio: Herve Moulin graduated in 1971 from the Ecole Normale Superieure in Paris, and received his Ph.D. in mathematics from the Universite de Paris in 1975. He has taught at the Ecole Nationale de la Statistique et Administration Economique, University of Paris at Dauphine, Virginia Polytechnic Institute and State University, Duke University and Rice University, and currently at the University of Glasgow. He is a Fellow of the Econometric Society since 1983.
His work has contributed to redefining the field of 'normative economics', poised to invent new resource allocation mechanisms  or refine existing ones  in a variety of contexts: voting rules; the fair division of assets (as in a divorce or inheritance); rationing of overdemanded commodities (such as organs for transplant or seats for a popular event); the exploitation of a "commons"; the assignment of tasks between workers; the scheduling of jobs in a queue; sharing the cost and pricing the traffic of a communication network, etc.
Using Optimization to Balance Fairness and Efficiency in Barter Markets
John Dickerson, University of Maryland
The exchange of indivisible goods without money addresses a variety of constrained economic settings where a medium of exchange  such as money  is considered inappropriate. Participants are either matched directly with another participant or, in more complex domains, in barter cycles and chains with other participants before exchanging their endowed goods. We show that techniques from computer science and operations research, combined with the recent availability of massive data and inexpensive computing, can guide the design of such matching markets and enable the markets by running them in the real world.
A key application domain for our work is kidney exchange, an organized market where patients with endstage renal failure swap willing but incompatible donors. We present new models that address three fundamental dimensions of kidney exchange: (i) uncertainty over the existence of possible trades, (ii) balancing efficiency and fairness, and (iii) inherent dynamism. Next, we combine these dimensions, along with highlevel humanprovided guidance, into a unified framework for learning to match in a general dynamic setting. This framework, which we coin FutureMatch, takes as input a highlevel objective (e.g., "maximize graft survival of transplants over time") decided on by experts, then automatically learns based on data how to make this objective concrete and learns the "means" to accomplish this goal  a task that, in our experience, humans handle poorly. Throughout, we draw on insights from our work with the United Network for Organ Sharing (UNOS) USwide exchange and experiments on data from the National Health Service UKwide exchange.
Bio: John P. Dickerson is an Assistant Professor of Computer Science at the University of Maryland, and a recent CS PhD graduate of Carnegie Mellon University. His research centers on solving practical economic problems using techniques from computer science, stochastic optimization, and machine learning. He has worked extensively on theoretical and empirical approaches to kidney exchange, where his work has set policy at the UNOS nationwide exchange; gametheoretic approaches to counterterrorism and negotiation, where his models have been deployed; and computational advertising through Optimized Markets, a CMU spinoff company. He created FutureMatch, a general framework for learning to match subject to human value judgments; that framework won a 2014 HPCWire Supercomputing Award. Prior to his Ph.D., he worked at IBM and in R&D at a defense agency. He is an NDSEG Fellow, Facebook Fellow, and Siebel Scholar.
Computational Social Choice: For the People
Ariel Procaccia, CMU
Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people  not software agents  make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimizationdriven voting methods. Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.
Bio: Ariel Procaccia is an Assistant Professor in the Computer Science Department at Carnegie Mellon University, and an Affiliated Faculty in the Machine Learning Department. He usually works on problems at the interface of computer science and economics. His distinctions include the IJCAI Computers and Thought Award (2015), the Sloan Research Fellowship (2015), the NSF Faculty Early Career Development Award (2014), and the IFAAMAS Victor Lesser Distinguished Dissertation Award (2009); as well as multiple paper awards including Best Paper (2016) and Best Student Paper (2014) at the ACM Conference on Economics and Computation (EC). He is coeditor of the Handbook of Computational Social Choice (Cambridge University Press, 2016).
Timing "versus" Matching: Dynamic Matching on Stochastic Networks
Mohammad Akbarpour, Stanford
We introduce a dynamic analogue of ErdosRenyi random graph model and study the problem of finding the "maximum matching" in such dynamic graph. In our model, agents arrive and depart stochastically, and the composition of the trade graph depends endogenously on the matching algorithm. We show that if the planner can identify agents who are about to depart, then waiting to thicken the market is highly valuable, and if the planner cannot identify such agents, then matching agents greedily is close to optimal. The planner's decision problem in our model involves a combinatorially complex state space. However, we show that simple local algorithms that choose the right time to match agents, but do not exploit the global graph structure, can perform close to complex optimal algorithms. Finally, we consider a setting where agents have private information about their departure times, and design a continuoustime dynamic mechanism to elicit this information.
Joint work with Shengwu Li and Shayan Oveis Gharan
Bio: Mohammad Akbarpour is an Assistant Professor of Economics at Stanford GSB. He completed his PhD at Stanford University Economics department in 2015, then spent the 20152016 academic year at the University of Chicago Economics department, before coming back to Stanford in September 2016.
Network reporting methods for sample surveys
Dennis Feehan, UC Berkeley
Surveys have traditionally been based on the idea that researchers can estimate characteristics of a population by obtaining a sample of individuals and asking them to report about themselves. Network reporting surveys generalize this traditional approach by asking survey respondents to report about members of their personal networks. Network reporting surveys can be used to study many important rare and hidden populations for which traditional survey methods are inadequate; for example, the approach has been used to estimate the size of epidemiologically important groups like sex workers, drug injectors, and men who have sex with men. I will introduce a framework for developing estimators from network reporting surveys and then I will present some results from a nationallyrepresentative survey experiment that my colleagues and I conducted in Rwanda.
Bio: Dennis Feehan is an Assistant Professor in the Department of Demography at UC Berkeley. He completed his PhD at Princeton University in 2015 and spent the fall/winter of 2015 as a postdoc at Facebook before starting at Berkeley last January.
Online Learning in Repeated Auctions
Philippe Rigollet, MIT
Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of
unknown value are sold sequentially and bidders only learn (potentially noisy) information about
a good's value once it is purchased. We adopt an online learning approach with bandit feedback
to model this problem and derive bidding strategies for two models: stochastic and adversarial.
In the stochastic model, the observed values of the goods are random variables centered around
the true value of the good. In this case, logarithmic regret is achievable when competing against
well behaved adversaries. In the adversarial model, the goods need not be identical. Comparing
our performance against that of the best fixed bid in hindsight, we show that sublinear regret is
also achievable in this case. For both the stochastic and adversarial models, we prove matching
minimax lower bounds showing our strategies to be optimal up to lowerorder terms. To our
knowledge, this is the first complete set of strategies for bidders participating in auctions of this
type
Joint with Jonathan Weed, and Vianney Perchet.
Bio: Philippe Rigollet
Near Feasible Stable Matchings
Rakesh Vohra, University of Pennsylvania
Stable matchings when there are complementarities in the preferences of one side need not exist. One instance of where this might matter is in the National Resident Matching Program (NRMP). With the participation of couples, who have preferences over pairs of hospital slots, stable matchings need not exist. Indeed, the problem of determining whether a stable matching exists in the presence of couples is NPcomplete. For any student preferences, we show that each instance of a matching problem has a `nearby' instance with a stable matching. The nearby instance is obtained by perturbing the capacities of the hospitals. Specifically, the capacity of each hospital will change (up or down) by no more than 4 and the total capacity of all hospitals will not go down and not down or increase by more than 9. These `perturbation’ bounds are independent of the size of the instance. We think the technique used to arrive at this result of independent interest and should be useful elsewhere. It involves a combination of Scarf’s lemma and randomized rounding.
Joint work with Thanh Nguyen
Bio: Professor Vohra is known for his application of mathematical programming techniques to the study of Game Theory and Mechanism Design.
He formerly taught at Northwestern University, where he was the John L. and Helen Kellogg Professor of Managerial Economics and Decision Sciences in the Kellogg School of Management, with additional appointments in the Department of Economics and the Department of Electrical Engineering and Computer Science. He came to Penn as part as of the Penn Integrates Knowledge program that President Amy Guttmann established in 2005 as a Universitywide initiative to recruit exceptional faculty members whose research and teaching exemplify the integration of knowledge across disciplines. His appointment is shared between the Department of Economics in the School of Arts and Sciences and the Department of Electrical and Systems Engineering in the School of Engineering and Applied Science.
Optimization in Online Content Recommendation Services: Beyond ClickThrough Rates
Yonatan Gur, Stanford Graduate School of Business
A new class of online services allows internet media sites to direct users from articles they are currently reading to other content they may be interested in. This process creates a “browsing path” along which there is potential for repeated interaction between the user and the provider, giving rise to a dynamic optimization problem. A key metric that often underlies this recommendation process is the clickthrough rate (CTR) of candidate articles. While CTR is a measure of instantaneous click likelihood, we analyze the performance improvement that one may achieve by some lookahead that accounts for the potential future path of users. To that end, using a large data set of user path history at major media sites, we introduce and derive a representation of content along two key dimensions: clickability, the likelihood to click to an article when it is recommended; and engageability, the likelihood to click from an article when it hosts a recommendation. We then propose a class of heuristics that leverage both clickability and engageability, and provide theoretical support for favoring such pathfocused heuristics over myopic heuristics that focus only on clickability (no lookahead). We conduct a live pilot experiment that measures the performance of a practical proxy of our proposed class, when integrated into the operating system of a worldwide leading provider of content recommendations. We estimate the aggregate improvement in clickspervisit relative to the CTRdriven current practice. The documented improvement highlights the importance and the practicality of efficiently incorporating for the future path of users in real time.
Joint work with Omar Besbes and Assaf Zeevi.
Bio: Yonatan Gur is an Assistant Professor of Operations, Information, and Technology at Stanford University’s Graduate School of Business. His research interests include stochastic modeling and optimization, datadriven decisionmaking, machine learning, and game theory with applications to revenue management, pricing, operations management, and service systems, and with a particular emphasis on dynamic, online environments. Yonatan received his PhD in Decision, Risk, and Operations from Columbia Business School. He also holds a B.Sc. from the school of Physics and Astronomy and an M.Sc. from the School of Mathematical Sciences, Tel Aviv University.
Modeling of Personal Mobility Behavioral Patterns
Irad BenGal, Tel Aviv University
In recent years, personal location data is continuously captured by mobile devices, GPS chips and other sensors. Such data provides a unique learning opportunity on individuals’ mobility behavior that can be used for various applications in transportation, marketing, homeland security, smart cities and more. Nonetheless, modeling such data poses new challenges related to data volume, diversity, inhomogeneity as well as the required granularity level. In this talk, we will address a real ‘smart city’ usecase and cover some of the associated opportunities and challenges. We will present a new set of mobilitybehavior models that generalizes Markov Chains and VariableOrder Bayesian Networks. We will discuss how they can be used in different applications such as pattern recognition, anomaly detection, clustering and classification.
Bio:
The Impacts of Neighborhoods on Intergenerational Mobility
Raj Chetty, Stanford University
How can we improve economic opportunities for lowincome
children in the United States? We first characterize how rates of
upward mobility vary across counties using data from tax records
covering the U.S. population. We then study families who moved across
areas to identify test whether the geographic variation is driven by
the causal effects of place or heterogeneity in the types of people
who live in each place. We find that every year of exposure to a
better environment improves a given child’s chances of success, both
in a national quasiexperimental study of five million families and in
a reanalysis of the Moving to Opportunity Experiment. Finally, we
present estimates of the causal effect of each county in America on
upward mobility and characterize the features of highmobility areas.
Papers available at: www.equalityofopportunity.org
Bio: Raj Chetty joined Stanford as a Professor of Economics in
2015. Chetty's research combines empirical evidence and economic
theory to help design more effective government policies. His current
research focuses on equality of opportunity: how can we give children
from disadvantaged backgrounds better chances of succeeding? Chetty's
most recent honors include a MacArthur Fellowship and the John Bates
Clark medal.
Orchestrating Collective Intelligence to Modernize Statistical Data Collection
Joe Reisinger, Premise Data
Transparency and access to information is critical to high functioning economies. However, as the world becomes more interconnected, complex and volatile, our capacity to perform robust statistical measurement is becoming increasingly fragmented. Premise is building machinery for performing evidencebased surveys at a global scale, leveraging a network of 30,000 data contributors across 34 countries to capture structured information using mobile technology. This talk will focus on the call for modernizing data collection and why the new frontier of mobile technology is the conduit for the future of business and economics. On the platformside, several open areas under active research will be highlighted: optimizing task allocation for quality and budget; designing for both exploration and exploitation; and combatting fraud and collusion on a network.
Bio: Joe Reisinger is the CTO and cofounder of Premise, a mobile technology platform orchestrating and crowdsourcing the collection of microdata for precise and robust economic measurement. He holds a PhD in Computer Science from the University of Texas and spent his academic career building natural language understanding systems at Google Research and IBM T.J. Watson. Prior to cofounding Premise, he was Chief Scientist at Metamarkets.
Scaling up the Web Transparency: Algorithms and Challenges
Augustin Chaintreau, Columbia University
What are they doing with our data?’’ is what transparency ought to
answer. The web giants – e.g. Google, Amazon, and Facebook, but also
many other companies  leverage user data for personalization
(recommendations, targeting advertisements, and increasingly often
adjusting prices). Users currently have little insight, and at best
coarse information, to monitor how and for which purposes their data
are being used. What if instead one could tell you exactly which item
 whether an email you wrote, a search you made, or a webpage you
visit  has been used to decide on a targeted ad or a recommended
product? But can we track our data at scale, and in environments we do
not control?
In this talk, we argue  unfortunately through real examples  that
without web transparency the exciting world open with large collection
of personal data compromises our longstanding commitment to fairness
and equal opportunities. Fortunately, as we will show, web
transparency may be (at least partially) restored. We built the first
finegrained tracking system for personal data that scales, and prove
it successful in multiple services (Gmail, Amazon, Youtube). As we
will characterize its performance, new hopes but also new challenges
emerge to reconcile big data with accountability and societal
progress.
(based on joint works with Mathias Lécuyer, Riley Spahn, Guillaume
Ducoffe and Roxana Geambasu)
Bio: Augustin is an Assistant Professor of Computer Science at Columbia
University since 2010, where he directs the Mobile Social Lab. He
designs algorithms leveraging social mobile behaviors or incentives,
to reconcile the risk and value of personal data networking. His
research addressing transparency in personalization, fairness in
personal data markets, efficiency in social information sharing, cross
domain data linkage, and human mobility lead to 25 papers in tier1
conferences (five receiving best or best student paper awards at ACM
CoNEXT, SIGMETRICS, USENIX IMC, IEEE MASS, Algotel, some with large
media coverage such as the NYT blog). An ex student of the Ecole
Normale Supérieure in Paris, he earned a Ph.D in mathematics and
computer science in 2006, the NSF CAREER Award in 2013 and the ACM
SIGMETRICS Rising star award in 2013. He has been an active member of
the network research community, serving in the program committees of
ACM SIGMETRICS, SIGCOMM, WWW, CoNEXT (as chair), MobiCom, MobiHoc,
IMC, WSDM, WWW, COSN, AAAI ICWSM, and IEEE Infocom, and as area editor
for IEEE TMC and ACM SIGCOMM CCR.
Prediction Market Trading Strategies: manipulation and forecasting
David Rothschild, Microsoft Research
Prediction markets offer the promise of high time granularity studies of subjective expectations over major and minor events. Further, they help us design the future of markets and nonprobability polling. In this paper we illustrate the lessons from order books of realmoney, election related prediction markets that show persistent arbitrage opportunities. First, examining a randomized field trial of actual trades we demonstrate that arbitrage opportunities are significantly larger than the order book indicates. Second, examining transactionlevel data for all trades (nearly 300,000 over 6,300 users) in one of the exchanges we present evidence suggestive of market manipulation by a single large trader. Third, we demonstrate four additional price misalignments resulting from biased trading and implicitly asymmetric trading costs. Finally, we explain how markets are still accurate, but also detail how to improve the information flow in prediction and financial markets. And tie those lessons to the growing field of nonprobability polling.
Bio: David Rothschild is an economist at Microsoft Research. He has a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania. His primary body of work is on forecasting, and understanding public interest and sentiment. Related work examines how the public absorbs information. He has written extensively, in both the academic and popular press, on polling, prediction markets, social media and online data, and predictions of upcoming events; most of his popular work has focused on predicting elections and an economist take on public policy. He correctly predicted 50 of 51 Electoral College outcomes in February of 2012, average of 20 of 24 Oscars from 20136, and 15 of 15 knockout games in the 2014 World Cup.
Collective Phenomena in Complex Networks: Coordination and Social Learning
Ali Jadbabaie, University of Pennsylvania
In this talk, I will present a highlevel overview of my research in the past decade on collective behavior in networked, social, and engineering systems. these collective phenomena include social aggregation in animals such as schooling, herding, and flocking, and more broadly emergence of consensus, swarming, and synchronization in complex network of interacting dynamic systems. A common underlying theme in this line of study is to understand how a desired global behavior such as consensus, synchronization or a particular formation can emerge from purely local interactions. The evolution of these ideas into social systems has lead to a new theory of collective decision making among strategic agents. Examples include participation decisions in uprisings, social cascades and investment decisions in infrastructure. I will investigate distributed strategies for information aggregation, social learning and detection problems in networked systems where heterogeneous agents with different observations (with varying quality and precision) coordinate to learn a true state (e.g., finding aggregate statistics or detecting faults and failure modes in spatially distributed wireless sensor networks, or deciding suitability of a political candidate, quality of a product,and forming opinions on social issues of the day in social networks) using a stream of private observations and interaction with neighboring agents. I will end the talk with a description of contagion phenomena in networked systems and a new vision for graduate education at the interface of information and decision systems, data science and social sciences.
Bio: Ali Jadbabaie is the Alfred Fitler Moore Professor of Network Science in the department of Electrical and Systems Engineering at the University of Pennsylvania (Penn). He holds secondary appointments in the departments of Computer and Information Science (CIS) in Penn Engineering, and Operations, Information, and Decisions (OID) in the Wharton School. A faculty member in Penn’s (GRASP) Lab, he is also the cofounder and director of the Raj and Neera Singh Program in Networked & Social Systems (NETS) at Penn Engineering. NETS is a new undergraduate interdisciplinary degree program focused on network science and engineering, operations research, social phenomena and social and technological networks. From January to October 2015, Ali was on sabbatical at LIDS at MIT. From October 2015 to January 2016, he took a leave from Penn to lead the Sociotechnical Systems Research Center at MIT as its interim director. During this period, he served as the Associate Director of MIT’s newly formed Institute for Data, Systems and Society (IDSS), where he helped create a newlyapproved PhD program in Social and Engineering Systems and shaped the intellectual agenda of the newly formed Institute. Ali received his B.S. with High Honors from Sharif University of Technology, his M.S. in Electrical and Computer Engineering from the University of New Mexico in Albuquerque, and his Ph.D. in Control and Dynamical Systems from the California Institute of Technology (Caltech). He was a postdoctoral scholar at Yale University for a year before joining the faculty of the University of Pennsylvania (Penn) in July 2002. Aside from GRASP Lab and the NETS program, Ali is also a faculty member of The Warren Center for Network & Data Sciences at Penn and a faculty affiliate of the Center for Technology, Innovation and Competition at Penn Law School. Ali is the inaugural editorinchief of IEEE Transactions on Network Science and Engineering, a new interdisciplinary journal sponsored by 5 IEEE Societies, an Associate Editor of the INFORMS journal Operations Research and a former Associate Editor of IEEE Transactions on Control of Network Systems. He is a recipient of a Career Award from NSF, an ONR Young Investigator Award, the O. Hugo Schuck Best Paper Award of the American Automatic Control Council, and the George S. Axelby Best Paper Award of the IEEE Control Systems Society. He is also an IEEE Fellow. He has graduated 10 PhD students, and has served as mentor of 5 postdoctoral scholars who have become research leaders both in idustry and academia. His current research interests include decision and control theory with a focus on distributed optimization and control, collective behavior, network science, and the study of collective behavior in engineering systems and social and economic networks.
Conversational Peers and Idea Generation: Evidence from a Field Experiment
Sharique Hasan, Stanford Graduate School of Business
Abstract: Social interaction is thought to affect individual and organizational innovation. We argue that individuals and teams are better able to generate high quality ideas if they converse with peers who provide them with new insight, perspective and information. Further, we argue that not all individuals can equally capitalize on this new information. Specifically, extroverted peers, because they are more willing to share and transmit their experiences facilitate idea generation. Moreover, innovators who are open to experience absorb this new information better and can more effectively convert it to better ideas. We test our claims using novel data from a randomized field experiment at an entrepreneurship bootcamp where individuals were randomized to both teams and conversational peers. We find that conversations with extroverted peers help individuals generate higher quality ideas, more open individuals benefit more from such peers, and teams with more cohesion can convert these raw ideas into better performance.
Bio: Sharique Hasan is an Associate Professor of Organizational Behavior in the Graduate School of Business at Stanford University. His research focuses on understanding how social interaction affects processes such as learning, network dynamics, creativity and entrepreneurship. To study these processes he conducts field experiments.
Crowdsourcing Societal Tradeoffs
Vincent Conitzer, Duke University
It would be desirable if, as a society, we could reduce the amount of
landfill trash we create, the amount of carbon dioxide we emit, the
amount of forest we clear, etc. Since we cannot (or are in any case
not willing to) simultaneously pursue all these objectives to their
maximum extent, we must prioritize among them. Currently, this is
done mostly in an adhoc manner, with people, companies, local
governments, and other entities deciding on an individual basis which
of these objectives to pursue, and to what extent.
A more systematic approach would be to set, at a global level, exact
numerical tradeoffs: using one gallon of gasoline is as bad as
creating x bags of landfill trash. Having such tradeoffs available
would greatly facilitate decision making, and reduce inefficiencies
resulting from inconsistent decisions across agents. But how could we
arrive at a reasonable value for x? I will discuss this from the
viewpoint of computational social choice.
Joint work with Rupert Freeman, Markus Brill, and Yuqian Li
Bio: Vincent Conitzer is the Kimberly J. Jenkins University Professor of
New Technologies and Professor of Computer Science, Professor of
Economics, and Professor of Philosophy at Duke University. He received
Ph.D. (2006) and M.S. (2003) degrees in Computer Science from Carnegie
Mellon University, and an A.B. (2001) degree in Applied Mathematics
from Harvard University. His research focuses on computational aspects
of microeconomics, in particular game theory, mechanism design,
voting/social choice, and auctions. This work uses techniques from,
and includes applications to, artificial intelligence and multiagent
systems. Conitzer has received the Social Choice and Welfare Prize
(2014), a Presidential Early Career Award for Scientists and Engineers
(PECASE), the IJCAI Computers and Thought Award, an NSF CAREER award,
the inaugural Victor Lesser dissertation award, an honorable mention
for the ACM dissertation award, and several awards for papers and
service at the AAAI and AAMAS conferences. He has also been named a
Guggenheim Fellow, a Kavli Fellow, a Bass Fellow, a Sloan Fellow, and
one of AI's Ten to Watch. Conitzer and Preston McAfee are the founding
EditorsinChief of the ACM Transactions on Economics and Computation
(TEAC).
Design and Analysis for Experiments in Networks
Johan Ugander, Stanford University
A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the Average Treatment Effect (ATE) of a new feature or condition by exposing a sample of the overall population to it. A drawback with A/B testing is that it is poorly suited for experiments involving social interference, when the treatment of individuals spills over to neighboring individuals along an underlying social network. This talk will describe a novel experimental framework, graph cluster randomization, for identifying average treatment effects under social interference. Given this framework, we analyze the variance of the average treatment effect as a property of the graph cluster design, and bias/variance tradeoffs under exposure model misspecifications. This talk will discuss joint work with Lars Backstrom, Dean Eckles, Brian Karrer, Jon Kleinberg, and Joel Nishimura.
Bio: Johan Ugander is an Assistant Professor of Management Science &ame; Engineering at Stanford University. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other largescale social data. Prior to joining Stanford he was a postdoctoral researcher at Microsoft Research, Redmond from 20142015, held an affiliation with the Facebook Data Science team from 20102014, and obtained his Ph.D. in Applied Mathematics from Cornell University in 2014.
Scalable Large NearClique Detection in LargeScale Networks
Charalampos E. Tsourakakis, Harvard School of Engineering and Applied Sciences
Finding large near(bi)cliques in massive networks is a notoriously hard
problem of great importance to many applications, including anomaly
detection and security, and community detection in social networks, and the Web graph.
But can we exploit idiosyncrasies of realworld networks to attack
this NPhard problem successfully?
In this talk I will answer this question in the affirmative. Specifically,
I will present a significant advance in routines
with rigorous theoretical guarantees for scalable extraction
of large nearcliques from largescale networks, the kclique densest subgraph problem.
I will present exact, approximation, and MapReduce algorithms that allow us
to scale our computations to largescale networks. At the heart of our scalable
algorithm lies a densest subgraph sparsifier theorem.
Our approach allows us to extract large near(bi)cliques from a diverse collection of networks with
millions of edges within few seconds. Finally, I will present interesting graph mining findings
such as anomaly detection in realworld networks. I will conclude my talk with some interesting open problems.
Bio: Dr. Charalampos Tsourakakis is currently a CRCS Postdoctoral Fellow in the School of Engineering and Applied Sciences (SEAS) at Harvard University. He received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University (CMU). He also holds a Master of Science from the Machine Learning Department at CMU. He did his undergraduate studies in the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA). He is the recipient of a best paper award in IEEE Data Mining and has designed two graph mining libraries for terascale graphs. The former has been officially included in Windows Azure, and the latter was a research highlight of Microsoft Research. His main research interest lies in designing scalable and dataaware algorithms for massive networks that recover hidden structure and solve complex datadriven problems.
Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment
Michael Luca, Harvard Business School
Online marketplaces increasingly choose to reduce the anonymity of buyers and sellers in order to facilitate trust. We demonstrate that this common market design choice results in an important unintended consequence: racial discrimination. In a largescale field experiment on Airbnb, we find that requests from guests with distinctively AfricanAmerican names are roughly 16% less likely to be accepted than identical guests with distinctively White names. The difference persists whether the host is African American or White, male or female. The difference also persists whether the host shares the property with the guest or not, and whether the property is cheap or expensive. Discrimination is costly for hosts who indulge in it: hosts who reject AfricanAmerican guests are able to find a replacement guest only 35% of the time. On the whole, our analysis suggests a need for caution: while information can facilitate transactions, it also facilitates discrimination
Joint work with Benjamin Edelman and Dan Svirsky.
Bio: Michael Luca is an assistant professor of business administration at Harvard Business School. Professor Luca’s research applies econometric methods to field data in order to study the impact of information in market settings. He works closely with governments and companies to identify, design, implement, and experiment with different approaches to information disclosure, taking into account the behavioral foundations of how people make decisions. He has done work on rankings, expert reviews, online consumer reviews, and quality disclosure laws, among other types of information provision. Professor Luca has ongoing field research with Facebook, Yelp, several colleges, the UK government, and several U.S. cities, in addition to other partners.
His current work focuses on crowdsourced reviews, analyzing a variety of companies including Yelp, Amazon, and ZocDoc. His findings have been written and blogged about in such media outlets as The Wall Street Journal, The New York Times, The Washington Post, The Huffington Post, Chicago Tribune, Harvard Business Review, PC World Magazine, and Salon.
Professor Luca received his Ph.D. in economics from Boston University and a bachelor’s degree in economics and mathematics from SUNY Albany.
Incentivizing Exploration
Bobby Kleinberg, Cornell
An important recent theme in the development of online social systems is the potential of crowdsourced effort to solve large problems — defining tasks in which many people can each contribute a small amount of time to the overall goal. In some cases the arrangement is based on a direct compensation scheme, in which a (low) rate is paid for each unit of work performed. But in many settings one only has access to a crowd "in the wild", as they go about their everyday activities. Examples include product recommendations, social news readers, and scientific activities ranging from crowdsourced "citizen science" to the funding of research by national agencies.
In all of these domains, there is a problem of misaligned incentives: the designer's goal is to carry out exploration (of the space of news stories, products, bird habitats, etc.) as efficiently as possible, but for reasons of scale they must implement the exploration via a crowd composed of members who each derive their own utility from participating in the exploration. We model this as a multiarmed bandit problem in which selfish, myopic agents pull arms with publicly observable outcomes, and a principal seeking to maximize the cumulative timediscounted reward may influence the agents by offering monetary (or other) rewards contingent on choosing particular actions. Our main result is a full characterization of the tradeoff between the expected payments the principal must make and the total reward that can be achieved.
Joint work with Peter Frazier, David Kempe, and Jon Kleinberg.
Bio: Bobby Kleinberg is an Associate Professor of Computer Science at Cornell University. His research studies the design and analysis of algorithms, and their relations to economics, learning theory, and networks. Prior to receiving his doctorate from MIT in 2005, he spent three years at Akamai Technologies, where he assisted in designing the world's largest Internet Content Delivery Network. He is the recipient of a Microsoft Research Faculty Fellowship, an Alfred P. Sloan Foundation Fellowship, and an NSF CAREER Award.
Distributed Algorithms for Big Graph Analytics
Alex Dimakis, UT Austin
Many datasets from the web, social networks, bioinformatics and neuroscience are naturally graphstructured. Graph algorithms are difficult to implement in distributed computation frameworks like Hadoop MapReduce and for this reason several inmemory graph engines like Pregel, GraphLab and GraphX are being currently developed. Our goal is to design parallelizable algorithms that discover graph structural properties and theoretically analyze their performance. We give an overview of the emerging problems in designing distributed algorithms over such graph engines and discuss future directions.
Bio: Alex Dimakis is an Assistant Professor at the Electrical and Computer Engineering department, University of Texas at Austin. From 2009 until 2012 he was with the Viterbi School of Engineering, University of Southern California. He received his Ph.D. in 2008 and M.S. degree in 2005 in electrical engineering and computer sciences from UC Berkeley and the Diploma degree from the National Technical University of Athens in 2003. During 2009 he was a CMI postdoctoral scholar at Caltech. He received an NSF Career award in 2011, a Google faculty research award in 2012 and the Eli Jury dissertation award in 2008. He is the corecipient of several best paper awards including the joint Information Theory and Communications Society Best Paper Award in 2012. He is currently serving as an associate editor for IEEE Signal Processing letters. His research interests include information theory, coding theory, signal processing, and networking, with a current focus on distributed storage and machine learning.
Estimating the Causal Impact of Recommendation Systems from Observational Data
Jake Hofman, MSR NYC
Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world's most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users on Amazon.com over a 9 month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation clickthroughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion about the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.
Bio: Jake Hofman is a Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to largescale social data. Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.
Competitive Algorithms from Competitive Equilibria
Kamesh Munagala, Duke University
Scheduling jobs is a fundamental problem that arises in numerous forms
and in various situations. In a typical scheduling problem, jobs of
unknown duration arrive in an online fashion. The rate at which jobs
get processed at any time instant depends on the resources allocated
by the scheduler to them. We introduce and study a general scheduling
problem that we term the Packing Scheduling problem (PSP), where these
rates are subject to arbitrary packing constraints. The PSP framework
captures a variety of scheduling problems that arise in data center
computing and multicore architectures. More concretely, PSP models
multidimensional resource requirements and parallelizability, as well
as network bandwidth requirements arising in scheduling jobs on shared
clusters. It also captures many classical scheduling problems such as
heterogeneous machine scheduling, broadcast scheduling, and scheduling
jobs of different parallelizability.
For the PSP problem we show a constant competitive algorithm for
minimizing total weighted completion time, when job arrivals and
durations are unknown and adversarially chosen. Surprisingly, we
achieve this result by applying the wellknown Proportional Fairness
algorithm from economics literature to allocate resources to jobs each
time instant. Proportional Fairness has been extensively studied in
the context of maximizing fairness in resource allocation; however, we
present one of the first analyses in the context of scheduling. We
also show that this algorithm has good theoretical guarantees for
minimizing total delay (latency or flow time) in several special
cases. Our results showcase the advantage of viewing complex
scheduling problems as sequential resource allocation problems, and
applying ideas from economics for both algorithm design and analysis.
Joint work with Sungjin Im (UC Merced) and Janardhan Kulkarni (Duke).
Bio: Kamesh Munagala is an Associate Professor of Computer Science at Duke University. He obtained his Ph.D. from Stanford University in 2003 and B.Tech. from IIT Bombay in 1998. He is primarily interested in approximation algorithms, sequential decision theory, and computational economics, as well as applications of these techniques to ecommerce, databases, and social networks. He is a recipient of
the NSF CAREER Award and the Alfred P. Sloan Research Fellowship, and
is a coauthor of the best paper at WWW 2009. He was a Visiting
Research Professor at Twitter in 2012, and currently serves as the
Director of Graduate Studies for the Duke CS department.
Physicsinspired Algorithms and Phase Transitions in Community Detection
Cris Moore, Santa Fe Institute
Detecting communities, and labeling nodes according to which community they belong to, is an essential problem in the study of social and biological networks. One of the best and fastest algorithms for this problem is Belief Propagation, which updates probability distributions of node labels until they reach a fixed point. I'll give a friendly introduction to Belief Propagation, and the analogy with the cavity method of statistical physics. Don't worry, no physics is required!
In addition to being practically useful, our Belief Propagation algorithm can be studied analytically, revealing phase transitions in the ability of any algorithm to find communities. It also gives us principled ways to tell when communities are statistically significant, unlike algorithms that "overfit" and find illusory structure even in random networks. It can be extended to dynamic networks where nodes switch communities over time. Finally, I'll argue that the consensus of many good solutions is often better than the "best" single solution — and that this is true for many realworld optimization problems.
This is joint work with Aurelien Decelle, Florent Krzakala, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborova, and Pan Zhang.
Bio: Cristopher Moore received his B.A. in Physics, Mathematics, and Integrated Science from Northwestern University, and his Ph.D. in Physics from Cornell. He has published over 120 papers at the boundary between physics and computer science, ranging from quantum computing, to phase transitions in NPcomplete problems, to the theory of social networks and efficient algorithms for analyzing their structure. With Stephan Mertens, he is the author of The Nature of Computation, published by Oxford University Press. He is a Professor at the Santa Fe Institute.
In 2014, Cris Moore was named a fellow of the American Physical Society for fundamental contributions to the at the interface between nonlinear physics, statistical physics, and computer science, including complex network analysis, phase transitions in NPcomplete problems, and the computational complexity of physical simulation.
Optimal Contracts for Intermediaries in Online Advertising
Ozan Candogan, Duke Fuqua Business School
In online display advertising, the prevalent method advertisers employ to acquire impressions is to contract with an intermediary. These contracts involve upfront payments made by the advertisers to the intermediary, in exchange for running campaigns on their behalf. This paper studies the optimal contract offered by the intermediary in a setting where advertisers’ budgets and targeting criteria are private. This problem can naturally be formulated as a multidimensional dynamic mechanism design problem, which in general is hard to solve. We tackle this problem by employing a novel performance space characterization technique, which relies on delineating the expected cost and value achievable by any feasible (dynamic) bidding policy. This technique provides a convex optimization formulation of the optimal contract design problem. Using this formulation, we obtain a closedform characterization of the optimal contract, when advertisers have identical value distributions. Conversely, when advertisers are heterogeneous, we provide a dualitybased approach, which reduces the optimal contract design problem to a simpler convex optimization problem. The intermediary’s bid in the optimal contract is obtained by first using the optimal dual solution to compute a weighted average of the values associated with different types (to guarantee that the advertiser reports her type truthfully), and then shading this quantity (to account for budget constraints). Our results indicate that an intermediary can profitably provide bidding service to a budgetconstrained advertiser, and at the same time increase the overall market efficiency.
Joint work with Santiago Balseiro (Duke University).
Bio: Ozan Candogan is an assistant professor at the Fuqua School of Business, where he is a member of the Decision Sciences area. Prior to joining Fuqua, he received his Ph.D. and M.S. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology. His research interests are in mechanism design, decision making in social and economic networks, and analysis of dynamic strategic interactions. His most recent work focuses on the design of novel iterative auction mechanisms for efficient allocation of resources, and he was a finalist in the 2013 George Nicholson Student Paper Competition. Potential applications of his research include novel auction formats that can be employed for procurement, pricing and advertising mechanisms that optimally utilize available social network information, and pricing mechanisms for selling cloud computing service. His research has appeared in journals such as Management Science, Operations Research, Mathematics of Operations Research, and Games and Economic Behavior. He is also a recipient of the 2012 Microsoft Research Ph.D. fellowship, and 2009 Siebel Scholarship.
Randomized Field Experiments in Mobile Marketing
Anindya Ghose, NYU Stern
The explosive growth of smartphones and locationbased services (LBS) has contributed to the rise of mobile advertising. In this talk, we present results from multiple randomized field experiments that are designed to measure the effectiveness of mobile marketing and promotions. In the first experiment, using data from a location based app for smartphones, we measure the effectiveness of mobile coupons. The aim is to analyze the causal impact of geographical distance between a user and retail store, the display rank, and coupon discounts on consumers’ response to mobile coupons. In a second large scale field study we examine the role of contextual crowdedness on the redemption rates of mobile coupons. We find that people become increasingly engaged with their mobile devices as trains get more crowded, and in turn become more likely to respond to targeted mobile messages. The change in behavior can be accounted for by the phenomenon of “mobile immersion”: to psychologically cope with the loss of personal space in crowded trains and to avoid accidental gazes, commuters can escape into their personal mobile space. In turn, they become more involved with targeted mobile messages they receive, and, consequently, are more likely to make a purchase in crowded trains. These studies causally show that mobile advertisements based on realtime static geographical location and contextual information can significantly increase consumers’ likelihood of redeeming a geotargeted mobile coupon. However, beyond the location and contextual information, the overall mobile trajectory of each individual consumer can provide even richer information about consumer preferences. In a third study, we propose a new mobile advertising strategy that leverages full information on consumers’ offline moving trajectories. To examine the effectiveness of this new mobile trajectorybased advertising strategy, we designed a largescale randomized field experiment in one of the largest shopping malls in the world. We find that mobile trajectorybased advertising can lead to highest redemption probability, fastest redemption behavior, and highest satisfaction rate from customers at the focal advertising store. Various practical implications for mobile marketing are discussed.
Bio: Anindya Ghose is a Professor of IT and a Professor of Marketing at New York University's Leonard N. Stern School of Business. He is the coDirector of the Center for Business Analytics at NYU Stern. He is the Robert L.& Dale Atkins Rosen Faculty Fellow and a Daniel P. Paduano Fellow of Business Ethics at NYU Stern. He has been a Visiting Associate Professor at the Wharton School of Business. He also serves as the Scientific Advisor to 3TI China. He was recently named by Business Week as one of the "Top 40 Professors Under 40 Worldwide" and by Analytics Week as one the "Top 200 Thought Leaders in Big Data and Business Analytics". His research has been recognized by 10 best paper awards and nominations. He has published more than 75 papers in premier scientific journals and peer reviewed conferences, and has given more than 200 talks internationally. His research has been profiled and he has been interviewed numerous times in the BBC, Bloomberg TV, CNBC, China Daily, Financial Times, Fox News, Forbes, Knowledge@Wharton, Korean Broadcasting News Company, Los Angeles Times, MSNBC, NBC, New York Times, New York Daily, National Public Radio, NHK Japan Broadcasting, Reuters, Time Magazine, Washington Post, Wall Street Journal, Xinhua, and elsewhere.
Scalable mining of massive networks: Distancebased Centrality, Similarity, and Influence
Edith Cohen, TelAviv University
Graphs are ubiquitous in representing interactions between entities: communication, social relations, clicks, likes, follows, views, hyperlinks, transactions, and more. Structural properties of a corresponding graph successfully model fundamental properties of entities and their relations with applications to ranking, recommendations, attribute completion, and clustering.
The shortestpath and reachability relation are the basis of natural measures of centrality, influence, and similarity. The measures can be defined over a static set of edges or, for robustness, over a distribution derived from the set of interactions.
In the talk, I will first present and motivate a unified view of distancebased measures. I will then overview algorithmic and estimation tools from my own work which support scalable computation of these measures on massive networks. The approach is based on computing samplebased sketches which capture the relation of each node to all others. The sketches, with appropriate carefully derived estimators, are then used to provide quick highconfidence approximate answers for similarity, centrality, and influence queries.
An advantages of distancebased measures over others based on the paths ensemble (randomwalk/pagerank, Katz, resistance) is the combination of high scalability and application to asymmetric (directed) interactions, which are often harder to work with.
Some of the material is based on joint work performed at the MSR Silicon Valley Lab, with Daniel Delling, Andrew Goldberg, Moises Goldzmidt, Fabian Fuchs, Thomas Pajor, and Renato Werneck.
Bio: Edith Cohen is (visiting) full professor at Tel Aviv University. Until 2014 she was a Principal Researcher at Microsoft Research (Silicon Valley) and before that between 1991 and 2012 she was at AT&T Labs (initially AT&T Bell Laboratories). She was a visiting professor at UC Berkeley in 1997. She received a Ph.D in Computer Science from Stanford University in 1991. Her research interests include algorithms, mining and analysis of massive data, optimization, and computer networking. She is a winner of the IEEE ComSoc 1997 Bennett prize, and an author of 20+ patents and 100+ publications.
Quadratic Voting
Glen Weyl, University of Chicago/MSR
While the onepersononevote rule often leads to the tyranny of the majority, alternatives proposed by economists have been complex and fragile. By contrast, we argue that a simple mechanism, Quadratic Voting (QV), is robustly very efficient. Voters making a binary decision purchase votes from a clearinghouse paying the square of the number of votes purchased. If individuals take the chance of a marginal vote being pivotal as given, like a market price, QV is the unique pricing rule that is always efficient. In an independent private values environment, any typesymmetric BayesNash equilibrium converges towards this efficient limiting outcome as the population grows large, with inefficiency decaying as 1/N. We use approximate calculations, which match our theorems in this case, to illustrate the robustness of QV, in contrast to existing mechanisms. We discuss applications in both (nearterm) commercial and (longterm) social contexts.
Bio: Glen Weyl is a Researcher at Microsoft Research New England, an Assistant Professor of Economics and Law at the University of Chicago and an Alfred P. Sloan Research Fellow. He was Valedictorian of Princeton University as an undergraduate in 2007, received his PhD also from Princeton in 2008 and then spent three years as a Junior Fellow at the Harvard Society of Fellows before joining the faculty at Chicago. His research focuses on pure and applied price theory, especially applications to industrial and tax policy, as well as the intersection between economics and related fields such as law and philosophy. Outside his academic work, Glen cofounded a startup commercializing Quadratic Voting and consults for platform startups through Applico Inc.
Auctions and Decentralization in Complex Networks
Paul Milgrom, Stanford
Economists' analyses are most often based on simplifying assumptions that enable the analyst to focus on the roles of prices, substitution, and decentralized decision making. Computer scientists most often take a sharply different approach, focusing on worst cases and general algorithmic treatments, in which simple economic structures are mostly hidden from view. We combine elements from these two perspectives in an attempt to develop solutions for highstakes auction market design problems, especially the reallocation of radio spectrum in North America.
Bio: Paul Milgrom is the Shirley and Leonard Ely professor of Humanities and Sciences in the Department of Economics at Stanford University and professor, by courtesy, at the Stanford Graduate School of Business. He is a member of both the National Academy of Sciences and the American Academy of Arts and Sciences and a winner of the 2008 Nemmers Prize in Economics and the 2012 BBVA Frontiers of Knowledge award.
He is best known for his contributions to the microeconomic theory, his pioneering innovations in the practical design of multiitem auctions, and the extraordinary successes of his students and academic advisees. According to his BBVA Award citation: “Paul Milgrom has made seminal contributions to an unusually wide range of fields of economics including auctions, market design, contracts and incentives, industrial economics, economics of organizations, finance, and game theory.” According to a count by Google Scholar, Milgrom’s books and articles have received more than 60,000 citations.
Towards optimal algorithms for prediction with expert advice
Balasubramanian Sivan, Microsoft Research
We study the classic problem of prediction with expert advice. Focusing on settings with a constant number of experts, we develop optimal algorithms and obtain precisely optimal regret values for the case of 2 and 3 experts. Further, we develop regret lower bounds for the multiplicative weights algorithm that exactly match the known upper bounds for an arbitrary number of experts k. This establishes a constant factor separation between the regrets achieved by the optimal algorithm and the multiplicative weights algorithm for a constant number of experts.
The optimal algorithms have a crisp interpretation. For instance, in the case of 2 experts, the optimal algorithm follows the advice of a particular expert with exactly the probability that the expert will turn out to be the best expert.
Our main tool is the minimax principle which lets us analyze the optimal adversary to compute optimal regret values. While analyzing the optimal adversary, we establish deep connections with nontrivial aspects of random walk. We further use this connection to develop an improved regret bound for the case of 4 experts.
Joint work with Nick Gravin and Yuval Peres
Bio: Balasubramanian Sivan is a postdoctoral researcher at Microsoft Research, Redmond. He completed his PhD in Computer Science at the University of WisconsinMadison in 2013. His PhD thesis on “Prior Robust Optimization” was awarded the 2014 ACM SIGecom doctoral dissertation award and the University of WisconsinMadison Computer Science department’s outstanding graduate student researcher award. His research interests are in Algorithmic Game Theory, online and approximation algorithms, expert learning. More details can be found here: http://research.microsoft.com/enus/um/people/bsivan/
Online AB Testing
Vivek Farias, MIT
We consider the problem of AB testing when the impact of a treatment is marred by a large number of covariates. This is the situation in a number of modern applications of AB testing (such as in adTech and ecommerce) as well as in more traditional applications (such as clinical trials). Randomization can be highly inefficient in such settings, and thus we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the efficiency of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting where subjects arrive, and must be assigned, sequentially. We also characterize the value of optimization and show that it can be expected to grow large with the number of covariates. Finally, using a realworld impression dataset, we show that our algorithms can be expected to yield substantial improvements to efficiency in practice.
Joint work with Nikhil Bhat and Ciamac Moallemi (Columbia GSB)
Bio: Vivek Farias is interested in the development of new methodologies for large scale dynamic optimization and applications in revenue management, finance, marketing and healthcare. He received his Ph.D. in Electrical Engineering from Stanford University in 2007 and has been at MIT since, where he is the Robert N. Noyce Professor of Management. Outside academia, Vivek has contributed to ventures in finance and technology in the role of a researcher, consultant or founder. Vivek is a recipient of an IEEE Region 6 Undergraduate Student Paper Prize (2002), an INFORMS MSOM Student Paper Prize (2006), an MIT Solomon Buchsbaum Award (2008), an INFORMS JFIG paper prize twice (2009, 2011), the NSF CAREER award (2011), MIT Sloan’s Outstanding Teacher award (2013), and was a finalist for the 2011 INFORMS Pierskalla award in healthcare.
The spread of information in social media
Fil Menczer, Yahoo Labs/Indiana University
This talk overviews ongoing work on information diffusion in
social media, focusing in particular on mining, modeling, and
prediction tasks on data from the Twitter network. I present machine
learning efforts that leverage the structure of meme diffusion
networks and many other features to detect misinformation campaigns,
such as astroturf and social bots. We employ modeling techniques to
understand the formation of communities, the creation of social ties,
and the competition for attention. We investigate how different forms
of structural and topical diversity in the network can be leveraged to
predict which memes will go viral. Finally, it time permits, I will
review a few crowdsourcing projects exploring the computation of
unbiased scholarly impact metrics, the anonymous collection of
sensitive data, and social good.
Joint work with many members of the Center for Complex Networks and Systems Research at Indiana University. This
research is supported by the National Science Foundation, McDonnell
Foundation, and DARPA. Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the authors
and do not necessarily reflect the views of these funding agencies.
Bio: Filippo Menczer is a visiting scientist at Yahoo Labs in Sunnyvale during 201415 while on sabbatical from Indiana University, where he is a professor of informatics and computer science, adjunct professor of physics, and a member of the cognitive science program. He holds a Laurea in Physics from the
University of Rome and a Ph.D. in Computer Science and Cognitive
Science from the University of California, San Diego. Dr. Menczer has
been the recipient of Fulbright, Rotary Foundation, and NATO fellowships, and a Career Award from the National Science Foundation. He serves as director of the Center for Complex Networks and Systems Research and is a Fellow of the Institute for Scientific Interchange Foundation in Torino, Italy, a Senior Research Fellow of The Kinsey Institute, and an ACM Distinguished Scientist. He previously served as division chair in the IUB School of Informatics and Computing, and was Fellowatlarge of the Santa Fe Institute. His research focuses on Web science, social networks, social media, social computation, Web
mining, distributed and intelligent Web applications, and modeling of
complex information networks. His work has been covered in The New
York Times, Wall Street Journal, NPR, CNN, The Economist, Nature,
Science, The Washington Post, Wired, The Atlantic, BBC, and many other
US and international news sources.
Inverting a SteadyState
Sergei Vassilvitskii, Google Research
We consider the problem of inferring choices made by users from data that only contains the relative popularity of each item. We propose a framework that models the problem as that of inferring a Markov chain given its stationary distribution. Formally, given a graph and a distribution on its nodes, the goal is to assign scores to nodes such that the stationary distribution of the following Markov chain will equal the given distribution on nodes: the transition probability from a node to its neighbor is proportional to the score of the neighbor. We prove sufficient conditions under which this problem is feasible, and, for the feasible instances, obtain a simple algorithm for a generic version of the problem. This iterative algorithm provably finds the unique solution to this problem and has a polynomial rate of convergence. In practice we find that the algorithm converges after fewer than 10 iterations. We then apply this framework to choice problems in online settings and show that our algorithm is able to explain the observed data and predict the user choice, much better than other competing baselines across a variety of diverse datasets.
Joint work with Ravi Kumar, Andrew Tomkins, and Erik Vee.
Bio: Sergei Vassilvitskii is a currently a Research Scientist at Google and a Fellow at the Applied Statistics Center at Columbia University. Previously, he was a Research Scientist at Yahoo! Research and an Adjunct Assistant Professor at Columbia University. He completed his PhD at Stanford University under the supervision of Rajeev Motwani.
Strategic Learning and the Topology of Social Networks
Omer Tamuz, Microsoft Research/Caltech
We consider a group of people who each have to make a repeated choice between two lifestyle options, and who observe the choices of their friends in a social network. We assume that one of the two options is universally preferable, that each person's choice affects only his or her own health, and that any benefits are only observed in the distant future.
We model this by a Bayesian game of pure informational externalities, in which agents start with conditionally independent private signals. We show that the question of whether or not the agents learn the preferable choice depends on the topology of the social network. In particular, we identify a geometric "egalitarianism" condition on the social network graph that guarantees learning in infinite networks, or learning with high probability in large finite networks, in any equilibrium of the game.
Joint work with Elchanan Mossel and Allan Sly
Bio: Omer Tamuz is a Schramm Postdoctoral Fellow at Microsoft Research New England / MIT, and an Assistant Professor (on leave) at the faculty of the Humanities and Social Sciences division at Caltech. He received his PhD in mathematics from the Weizmann Institute, where he studied learning in Bayesian games, as well as other topics in machine learning, probability theory and ergodic theory.
Information Provision in Dynamic Innovation Tournaments
Kostas Bimpikis, Stanford
Innovation tournaments have emerged as a viable alternative to the standard research and development process. They are particularly suited for settings that feature a high degree of uncertainty about the feasibility of the innovation goal. Information about the progress of participants in these settings has an interesting dual role. On one hand, participants who have
experienced no progress start becoming pessimistic about the underlying environment and whether the innovation goal is even feasible, so learning about their competitors' gradual progress provides a positive signal about the attainability of the goal and provides an incentive for participants to continue putting effort. On the other hand, information about the status of competition may adversely affect effort provision from the laggards, who start getting discouraged about their chances of winning the tournament. Because the tournament's success crucially depends on the effort decision of the participants, the tournament's information provision mechanism is a central feature of its design. This paper explores these issues and suggests a number of design guidelines  with a focus on the role of intermediate awards as information provision devices  that help implement the objective of maximizing the probability and minimizing the time of completing the end goal.
Bio: Kostas Bimpikis is an Assistant Professor of Operations, Information and Technology at Stanford University's Graduate School of Business. Prior to joining Stanford, he spent a year as a postdoctoral research fellow at the Microsoft Research New England Lab. He received a PhD in Operations Research from the Massachusetts Institute of Technology in 2010, an MS in Computer Science from the University of California, San Diego and a BS degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece.
Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
Jonathan Levin, Stanford
Consumer auctions were very popular in the early days of internet commerce, but today online sellers mostly use posted prices. Compositional shifts in the items being sold and the sellers offering these items cannot account for this evolution. Instead, the returns to sellers using auctions have diminished. We consider two hypotheses: a shift in buyer preferences away from auctions, and an increase in competition or search efficiency that favors favors posted prices. We use a model and data from eBay to tease these apart, finding that the former is more important, especially for more idiosyncratic products. We also discuss why auctions and posted prices can coexist in online markets.
Bio: Jonathan Levin is Professor of Economics at Stanford University. His research interests include industrial organization and microeconomic theory.
Quality Externalities and the Limits of Reputation in TwoSided Markets
Steve Tadelis, Berkeley
Buyers in twosided marketplace platforms may draw conclusions about the quality of the platform from any single transaction. This induces an externality across sellers that reputation mechanisms will not alleviate. Furthermore, buyers who abandon the platform without leaving feedback will cause seller reputations to be biased. Using data from eBay, we document this externality and argue that platforms can mitigate it by actively screening sellers and promoting the prominence of better quality sellers. Exploiting the bias in feedback, we create a measure of seller quality and demonstrate the benefits of our approach through a controlled experiment that prioritizes better quality sellers to a random subset of buyers. We thus highlight the importance of externalities in twosided markets and chart an agenda that aims to create more realistic models of twosided markets.
Bio: Steve Tadelis is a professor of economics at Berkeley's Haas School of Business. From 2011 to 2013 he spent a two year leave at ebay Labs, where he put together and led a group of Economists who focus on the economics of ecommerce, with particular attention to creating better matches of buyers and sellers, reducing market frictions by increasing trust and safety in eBay's marketplace, understanding the underlying value of different advertising and marketing strategies, and exploring the market benefits of different pricing structures. Aside from the economics of ecommerce, Steve’s main fields of interest are the economics of incentives, industrial organization and microeconomics. His research has focused on a firm's reputation as a valuable, tradable asset; the effects of contract design and organizational form on firm behavior with applications to outsourcing and privatization; public and private sector procurement and award mechanisms; and the determinants of trust.
The Dynamics of Online Reputation
Sinan Aral, MIT
Identity and reputation drive some of the most important decisions we make online: Who to follow or link to, whose information to trust, whose opinion to rely on when choosing a product or service, whose content to consume and share. Yet, we know very little about the dynamics of online reputation and how it affects our decision making. Sinan will describe a series of randomized experiments that explore the population level behavioral dynamics catalyzed by identity and reputation online. He will explore some of the implications for bias in online ratings, the foundations of social advertising and the ability to generate cascades of behavior through peer to peer social influence. The coming decades will likely see an emphasis on verified identities online. Sinan will argue that a new science of online identity could help guide our business, platform design and social policy decisions in light of the rising importance of online reputation.
Bio: Sinan Aral is a Scientist and an Entrepreneur: He is the David Austin Professor of Management at MIT, where he holds joint Associate Professorships in the IT and Marketing groups and coleads the Initiative on the Digital Economy. He was the Chief Scientist at SocialAmp, one of the earliest social commerce analytics companies (until its sale in 2012); and is currently the Chief Scientist at Humin, a social navigation startup developing the “Google Maps” for your social relationships.
Sinan is the ScholarinResidence at the New York Times R&D Lab and has worked closely with Facebook, Yahoo, Microsoft, Nike, IBM, Intel, Cisco, Oracle, SAP and many other leading Fortune 500 firms on realizing business value from social media and IT investments. His research has won numerous awards including the Microsoft Faculty Fellowship, the PopTech Science Fellowship, an NSF CAREER Award and a Fulbright Scholarship. He was also recently named one of the “World’s Top 40 Business School Professors Under 40” by Poets & Quants.
In his spare time, he cooks, skis and tell jokes about his own cooking and skiing. His most recent hobby is learning from his one year old son.
You can find Sinan on Twitter @sinanaral.
Control Profiles of Complex Networks
Derek Ruths, McGill University
Studying the control properties of complex networks provides insight into how designers and engineers can influence these systems to achieve a desired behavior. Topology of a network has been shown to strongly correlate with certain control properties; in this talk I will discuss the fundamental structures that explain the basis of this correlation. I'll then use these fundamental structures to construct a new statistic, the control profile, that quantifies the different proportions of controlinducing structures present in a network. Using the control profile, I'll show that standard random network models do not reproduce the kinds of control structures that are observed in real world networks. Furthermore, the profiles of real networks form three welldefined clusters that provide insight into the highlevel organization and function of complex systems.
From SmallWorld Networks to UserDriven Content Search
Amin Karbasi, ETH Zurich
In this talk, we will describe a new way of navigating through a database of similar objects using comparisons. In short, at each step, the database presents two objects to the user, who then selects among the pair the object closest to the target that she has in mind. This process continues until, based on the user’s answers, the database can uniquely identify the target she has in mind. This kind of interactive navigation (also known as exploratory search) and its variants have numerous reallife applications in contentbased image/multimedia retrieval.
We will argue that the above problem is strongly related to the smallworld network design: given a graph embedded in a metric space, how should one augment this graph by adding edges in order to minimize the expected cost of greedy forwarding. We will first show that the smallworld network design problem is NPhard. Given this negative result, we propose a novel mechanism for smallworld design and provide an upper bound on its performance. This mechanism has a natural equivalent in the context of content search through comparisons, and we establish both (order optimum) upper and lower bounds for its performance. These bounds are intuitively appealing as they provide performance guarantees in terms of the bias of the distribution of targets, captured by the entropy, as well as the topology of their embedding, captured by the doubling dimension.
Savings Monitors
Arun G. Chandrasekhar, Stanford University
We conduct a field experiment in India to explore two interventions to help individuals to increase their savings balances. First, we design a financial product based on the popular business correspondent model, which includes frequent reminders, assistance in account opening, and the setting of a sixmonth savings goal. Second, we measure the effectiveness of adding a peer monitoring component to this basic bundle and test whether the local social network can help to increase the penetration of the formal banking system. We ask whether having a monitor substitutes for a formal commitment device, whether individuals choose the most effective monitors, and moreover, whether some community members are better than others at encouraging financial capability.
Computational Fair Division: From Cake Cutting to Cloud Computing
Ariel Procaccia, Carnegie Mellon University
I will present an exciting new interaction between computer science and fair division theory. On the one hand, I will show how computational thinking provides a novel perspective on classic problems in fair division, such as cake cutting and estate division. On the other hand, I will explain how fair division theory is informing the allocation of computational resources. I will also talk about the integration of some of these theoretical ideas into a notforprofit fair division website that aims to make the world just a bit fairer.
Big Thinking: Augmenting human cognition with crowds and computation
Niki Kittur, Carnegie Mellon University
Despite the growing availability of information online, individual human cognition is fundamentally limited in the speed and amount of information it can process at once. In this talk I will discuss ways of augmenting individual cognition by taking advantage of crowds and computation. Our approaches combine the flexibility of many human minds working together with the raw power of computational systems to accelerate learning, innovation, knowledge production, and scientific discovery. I will discuss several novel crowdsourcing, machine learning and visualization systems that augment the uniquely human abilities of analogy, schema induction, and problem solving. Examples include augmenting human thinking in product innovation (e.g., Quirky); knowledge production (Wikipedia); creative tasks such as writing, journalism and poetry; and making sense of multivariate and graph data.
OSNs: The Wisdom of a Few and The Value of Users
Ricardo BaezaYates, Yahoo Labs
One of the main differences between traditional Web analysis and Online Social Networks (OSNs) studies, is that in the first case the information is organized around content, while in the second case it is organized around people. While search engines have done a good job finding relevant content across billions of pages, nowadays we do not have an equivalent tool to find relevant people in OSNs. Even though an impressive amount of research has been done in this direction, there are still a lot of gaps to cover. Although our first intuition could be (and was!) search for popular people, previous research have shown that users' indegree (e.g. number of friends or followers) is important but not enough to represent the importance and reputation of a person. Another approach is to study the content of the messages exchanged between users, trying to identify topical experts. However the computational cost of such approach  including language diversity  is a big limitation. In our work we take a contentagnostic approach, focusing in frequency, type, and time properties of user actions rather than content, mixing their static characteristics (social graph) and their activities (dynamic graphs). Our goal is to understand the role of popular users in OSNs, and also find hidden important users: Do popular users create new trends and cascades? Do they add value to the network? And, if they don't, who does it? Who generates the content of the OSN? Our research provides preliminary answers for these questions.
This is joint work with Diego SaezTrumper and colleagues in Brazil and USA.
Algorithms for Creating GameTheoretic Strategies for Large IncompleteInformation Games
Tuomas Sandholm, Carnegie Mellon University
Incompleteinformation games  such as most auctions, negotiations, and future (cyber)security settings  cannot be solved using minimax search even in principle. Completely different algorithms were needed. A dramatic scalability leap has occurred in our ability to solve such games over the last seven years, fueled largely by the Annual Computer Poker Competition. I will discuss the key domainindependent techniques that enabled this leap, including automated abstraction techniques and approaches for mitigating the issues that they raise, new equilibriumfinding algorithms, safe opponent exploitation methods, techniques that use qualitative knowledge as an extra input, and endgame solving techniques. I will finish by benchmarking poker programs against the best human poker professionals and by discussing what humans can learn from the programs.
"Going Viral" and the Structure of Online Diffusion
Sharad Goel, Microsoft Research NYC
New products, ideas, norms and behaviors are often thought to propagate through a persontoperson diffusion process analogous to the spread of an infectious disease. Until recently, however, it has been prohibitively difficult to directly observe this process, and thus to rigorously quantify or characterize the structure of information cascades. In one of the largest studies to date, we describe the diffusion structure of billions of events across several domains. We find that the vast majority of cascades are small, and are characterized by a handful of simple tree structures that terminate within one degree of an initial adopting "seed." While large cascades are extremely rare, the scale of our data allows us to investigate even the oneinamillion events. To study these rare, large cascades, we develop a formal measure of what we label "structural virality" that interpolates between two extremes: content that gains its popularity through a single, large broadcast, and that which grows via a multigenerational cascade where any one individual is directly responsible for only a fraction of the total adoption. We find that the very largest observed events nearly always exhibit high structural virality, providing some of the first direct evidence that many of the most popular products and ideas grow through persontoperson diffusion. However, mediumsized events  having thousands of adopters  exhibit surprising structural diversity, and are seen to grow both through broadcast and viral means. Finally, we show that our empirical results are largely consistent with an SIR model of contagion on a scalefree network, reminiscent of previous work on the longterm persistence of computer viruses.
The Value of Network Information
Itay Fainmesser, Brown University
Motivated by the proliferation of companies, such as Facebook.com, MySpace and Twitter, whose business models rely, at least in part, in monetizing the information on the interactions and influences of their users, we evaluate how much a monopoly can increase its profit by exploiting individual level data on "who influences whom" when designing his selling strategy. We also analyze the effect that the use of such information has on consumers' welfare. Our framework incorporates a rich set of market products, including goods characterized by global and local network effects. It also has the novel feature to distinguish between the level of influence of a consumer and her susceptibility to influence, a distinction that has been shown to be important empirically.
Joint work with Andrea Galeotti.
Prediction in complex networks: The impact of structure on learning and prediction
Jennifer Neville, Purdue University
The recent popularity of online social networks and social media has increased the amount of information available about users' behaviorincluding current activities and interactions among friends and family. This rich relational information can be exploited to predict user interests and preferences even when individual data is sparse, as the relationships are a critical source of information that identify potential statistical dependencies among people. Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporate relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model rollout on the properties of the resulting statistical modelswhich occurs through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partiallyobservable social network and social media datasets.
Eliciting and aggregating information from the crowd
Anirban Dasgupta, Yahoo Labs!
Crowdsourcing is now widely used to replace judgement or evaluation by an expert authority with an aggregate evaluation from a number of nonexperts, in applications ranging from rating and categorizing online content all the way to evaluation of student assignments in massively open online courses (MOOCs) via peer grading. Two key issues in these settings are: i) how to provide incentives such that agents in the crowd put in their 'full effort' in making good evaluations, as well as truthfully report them and ii)
how to aggregate these evaluations, thereby giving an accurate estimate of the ground truth, given that the agents are of varying, unknown, expertise.
In this talk, we look at both of these problems. For the first, we present an information elicitation mechanism for multiple tasks when agents have endogenous proficiencies, with the following property: exerting maximum effort followed by truthful reporting of observations is a Nash equilibrium with maximum payoff.
For the aggregation problem, we consider a model where each task is binary and each agent has an unknown, fixed, reliability that determines the agent's error rate in performing tasks. The problem is to determine the truth values of the tasks solely based on the agent evaluations. We present algorithms whose error guarantees depend on the expansion properties of the agenttask graph these algorithms perform well empirically.
Joint work with Nilesh Dalvi, Arpita Ghosh, Vibhor Rastogi and Ravi Kumar.
Should auctions be complex?
Noam Nisan, Hebrew University of Jerusalem
Do simple auctions suffice for obtaining high revenue or do we need to
design complex ones? For the case of selling one item, Myerson shows
that the answer is "yes": the revenuemaximizing auction is
deterministic with a single reserve price. When selling two (or more)
items, we show that the answer is "no": complex auctions can yield
infinitely more revenue than simple ones, even when there is only a
single bidder with an additive valuation. "Complex" may be defined in
various ways here including allowing randomization as well as our
choice of measuring the auction's menu size. However, when the
bidder’s values for the items are independently distributed, the
answer is “approximately yes”: selling each of two items simply and
separately yields at least half the revenue of any auction.
Joint work with Sergiu Hart.
The extent of price misalignment in prediction markets
David Pennock, Microsoft Research, NYC
We examine misaligned prices for logically related contracts in prediction markets. First, we uncover persistent arbitrages between identical contracts on different exchanges; price support beyond what is in the published order book multiplies the size of these arbitrages. Second, misalignment between logically related contracts listed on the same exchange occurs when contracts systemically shut down or fail to respond efficiently around moments of high information flow. Third, other examples of bounded rationality include: consistent asymmetry between buying and selling, and persistent price lags between exchanges. Fourth, we propose solutions to these problems by fixing logical contradictions within the exchange and providing flexible trading interfaces, freeing traders to focus on providing information in the form they find most natural. Joint work with David Rothschild.
Adaptive Seeding in Social Networks
Yaron Singer, Google and Harvard University
The rapid adoption of social networking technologies throughout the past decade is bringing special attention to algorithmic and data mining techniques designed for maximizing information cascades in social networks. Despite the immense progress which has been made in the past decade, due to limited data access and the structure of social networks the application of stateoftheart techniques often results in poor performance.
In this talk we will introduce a new framework we call adaptive seeding. The framework is a twostage model which leverages a phenomenon known as the "friendship paradox" in social networks in order to dramatically increase the spread of information cascades. Our main result shows constant factor approximations are achievable for the most wellstudied models of information spreading in social networks. The result follows from new techniques and concepts that may be of independent interest for those interested in stochastic optimization and machine learning.
Joint work with Lior Seeman.
Pricing in Social Networks
Francis Bloch, Ecole Polytechnique, France.
We analyze the problem of optimal monopoly pricing in social networks where agents care about consumption or prices of their neighbors. We characterize the relation between optimal prices and consumers' centrality in the social network. This relation depends on the market structure (monopoly vs. oligopoly) and on the type of externalities (consumption versus price). We identify two situations where the monopolist does not discriminate across nodes in the network (linear monopoly with consumption externalities and local monopolies with price externalities). We also analyze the robustness of the analysis with respect to changes in demand, and the introduction of bargaining between the monopolist and the consumer.
(joint with Nicolas Querou)
Language and Social Dynamics in Online Communities
Cristian DanescuNiculescuMizil, Stanford University and Max Planck Institute SWS.
Much of online social activity takes the form of natural language, from product reviews to conversations on socialmedia platforms. I will show how analyzing these interactions from the perspective of language use can provide a new understanding of social dynamics in online communities.
I will describe two of my efforts in this direction. The first project leverages insights from psycholinguistics to build a novel computational framework that shows how key aspects of social relations between individuals are embedded in (and can be inferred from) their conversational behavior. In particular, I will discuss how power differentials between interlocutors are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. The second project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a determined lifecycle with respect to their susceptibility to adopt new community norms, and how this insight can be harnessed to predict how long a user will stay active in the community.
This talk includes joint work with Susan Dumais, Michael Gamon, Dan Jurafsky, Jon Kleinberg, Jure Leskovec, Lillian Lee, Bo Pang, Christopher Potts and Robert West.
References: WWW 2013, WWW 2012, WWW 2011.
Current challenges in kidney exchange  need for long chains and dynamic matching
Itai Ashlagi, MIT Sloan
A kidney exchange network establishes a pool of patientdonor pairs, nondirected donors, and patients on the deceased donor waiting lists, and seeks to arrange transplants among this pool.
It has been previously shown that for sufficiently large pools of patientdonor pairs,
(almost) efficient kidney exchange can be achieved by using at most 3way cycles, i.e. by using cycles among no more than 3 patientdonor pairs.
However, as kidney exchange has grown in practice, cycles among n > 3 pairs have proved useful, and long chains initiated by nondirected, altruistic donors have proven to be very effective. We explore why this is the case, both empirically and theoretically. We provide an analytical model of exchange when there are many highly sensitized patients, and show that large cycles of exchange or long chains can significantly increase efficiency when the opportunities for exchange are sparse. As very large cycles of exchange cannot be used in practice, long nonsimultaneous chains initiated by nondirected donors significantly increase efficiency in patient pools of the size and com position that presently exist. Most importantly, long chains benefit highly sensitized patients without harming lowsensitized patients.
One way to make kidney exchange networks thicker is by waiting for pairs to arrive—waiting longer allows the pool to become larger, but imposes costs on those already in the pool. We study empirically and theoretically the impact of the waiting periods on the number of matches.
Based on joint work with David Gamarnik, Alvin Roth, Mike Rees, Patrick Jaillet and Vahideh Manshadi
Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Katrina Ligett, Caltech
In this talk, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any nontrivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and offer them a takeitorleaveit offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passersby.
Joint work with Aaron Roth.
paper: http://users.cms.caltech.edu/~katrina/papers/buyingprivacy.pdf
Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Al Roth, Stanford
How should you allocate a scarce resource whose scarcity/availability might be sensitive to how you allocate it? I'll talk about some work I'm just beginning involving the donation and allocation of deceased donor organs for transplantation. This presents a different set of problems from our (more advanced) work on live organ donation and kidney exchange that has received a lot of recent publicity (and was the subject of Itai Ashlagi's recent talk). However the evidence suggests that even for the case of kidney transplantation the current scarcity cannot be solved by live donation alone, and for most organs live donation is not feasible. I'll discuss the new Israeli organ transplant law, and some experiments that will help us understand it better as field data become available. I'll also discuss an experiment in how organ donors are solicited in the United States.
DIMACS Workshop on Economic Aspects of Information Sharing
The DIMACS Workshop on Economic Aspects of Information Sharing is being held at Stanford Feb 78, 2013. It is being cohosted and cosponsored by RAIN.
In recent years, the collection and analysis of user data over the Internet has experienced a surge. Online services and social networking sites have facilitated and encouraged the online disclosure of a wide array of personal information, while webtracking platforms have enabled the consolidation of user data from disperse sources. The reach of these services has been significantly extended by the integration of mobile devices with the cloud. Companies increasingly view data analytics as an integral part of their operations, and there is an increased need of new means of monetizing data beyond advertising. In parallel, regulatory bodies, consumer advocacy groups as well as Internet companies themselves have struggled to find a balance between the financial benefits of user data monetization and the privacy concerns that it raises.
Motivated by these developments, this workshop aims at studying economic aspects of information sharing from several different aspects, including online data markets, targeted advertising, user incentive mechanisms, and privacyaware mechanism design.
Citation networks: Evaluating the impact of science
Santo Fortunato, Aalto University, Finland
Citation statistics are regularly adopted to evaluate different actors in the academic and scientific arena, from scholars, to journals, departments, universities, national institutions and whole countries. The scores play a crucial role to decide which grants are awarded, the rankings of applicants for a job, even the fate of scientific institutions. For these reasons citation networks have been carefully investigated over the past decades. I will discuss some recent findings in this area. Citation distributions are universal across many different disciplines, if suitable normalized indicators are adopted as measures of performance, so avoiding the bias of field dependence. Furthermore, citation dynamics is characterized by bursts, usually occurring within a few years since the publication of a paper, and the burst size spans several orders of magnitude. This bursty dynamics can be described by a linear preferential attachment model with time dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
CrowdPowered Systems
Michael Bernstein, Stanford
Crowdpowered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.
Unfortunately, crowd work is errorprone and slow, making it difficult to incorporate crowds as firstorder building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.
In this talk, I will present two crowdpowered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid microcontributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture the exact right moment for a photo. It finds the best smile and catches subjects in midair jumps, all in realtime. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.
Rumor in a network: Who's the culprit?
Devavrat Shah, MIT
Suppose a malicious "worm" has spread in a network (e.g. Stuxnet) and interest is in finding who is the source (so that many of the following possible actions be taken: report news to the world, reward, take punitive action). Motivated by such scenarios where "information" diffuses in the network starting from an unknown source and interest is in identifying the "source" based on the observed spread, we discuss the problem of finding the source of a rumor in a network. We will design a computationally efficient source estimator for the popular SusceptibleInfected (SI) information spreading model with generic spreading time. In establishing the effectiveness of the estimator, we shall explain the role played by the underlying network structure (a form of expansion property) in determining ability of detection. Connections to generalized Polya's urn, network centrality and number 1  ln 2 will be explained.
This is based on joint work with Tauhid Zaman (MIT).
On Bitcoin and Red Balloons
Moshe Babaioff, Microsoft Research
Many large decentralized systems rely on information propagation to ensure their proper function. We examine a common scenario in which only participants that are aware of the information can compete for some reward, and thus informed participants have an incentive {\em not} to propagate information to others. One recent example in which such tension arises is the 2009 DARPA Network Challenge (finding red balloons). We focus on another prominent example: Bitcoin, a decentralized electronic currency system.
Bitcoin represents a radical new approach to monetary systems. It has been getting a large amount of public attention over the last year, both in policy discussions and in the popular press \cite{NY11,technologyreview}. Its cryptographic fundamentals have largely held up even as its usage has become increasingly widespread. We find, however, that it exhibits a fundamental problem of a different nature, based on how its incentives are structured. We propose a modification to the protocol that can eliminate this problem. Bitcoin relies on a peertopeer network to track transactions that are performed with the currency. For this purpose, every transaction a node learns about should be transmitted to its neighbors in the network. As the protocol is currently defined and implemented, it does not provide an incentive for nodes to broadcast transactions they are aware of. In fact, it provides an incentive not to do so. Our solution is to augment the protocol with a scheme that rewards information propagation. Since clones are easy to create in the Bitcoin system, an important feature of our scheme is Sybilproofness. We show that our proposed scheme succeeds in setting the correct incentives, that it is Sybilproof, and that it requires only a small payment overhead, all this is achieved with iterated elimination of dominated strategies. We complement this result by showing that there are no reward schemes in which information propagation and no selfcloning is a dominant strategy.
Joint with Shahar Dobzinski, Sigal Oren, and Aviv Zohar.

You can find the paper here and here.
A short description appeared at ACM SIGecom Exchanges.
Information Asymmetries in firstprice commonvalue auctions, or The Value of Information
David Kempe, USC
We study the role of information asymmetries in firstprice common value
auctions, and how different information structures provide revenue opportunities
to thirdparty information brokers. One of our motivations is Internet ad
auctions, where ads for specific site visitors have a commonvalue nature, in
that displaying an ad to a bot has no value, and users are likely to carry the
sameÂ (or similar) value to different advertisers. Cookies significantly improve
the estimate of this value; the fact that different advertisers use cookies to
different extents (and use cookies of different qualities) thus leads to
significant information asymmetries. These asymmetries can be mitigated or
exacerbated when the advertisers obtain cookie information from thirdparty
information brokers.
We focus on a firstprice common value auction in which bidders have access to
independent discrete signals about the value of an
item, which is either 0 or 1. Our first main result is that with two bidders,
this auction format has a unique Nash equilibrium; we give a lineartime
algorithm for computing this equilibrium and the resulting value of the
auction for each player. We then use this equilibrium calculation to
understand the value of additional information to the bidders.
Thirdparty information has the interesting property that it can (in
principle) be allocated to multiple bidders, and that it carries
significant externalities.
We find that the value of additional information is subtle, and depends on the
prior and the information already available. We exhibit several surprising
possibilities including (1) Additional information about the item does not
always result in higher value for a bidder. (2) A bidder may prefer his
competitor (but not himself) to receive the broker's signal; this applies both
to the more and less informed bidder.
[Joint work with Vasilis Syrgkanis and Eva Tardos]
Random Utility Models for Social Choice
Lirong Xia, Harvard
Social choice studies ordinal preference aggregation with
wide applications ranging from highstakes political elections to
lowstakes movie rating websites. In many cases, we want to design a
social choice mechanism that reveals the ground truth via MLE
inference. Despite its importance, this objective has been largely
hindered by the lack of natural probabilistic models and efficient
inference algorithms.
In this talk, I will focus on a wide class of probabilistic models
called Random Utility Models (RUMs), whose MLE inference was
previously believed intractable in general. I will introduce a fast
MCEM algorithm for a very general and natural subclass of RUMs, and
discuss its applications and impact on designing better social choice
mechanisms. Extension of this algorithm also provides a computational
basis for improving models in many applications in economics as well
as machine learning, especially learning to rank.
Based on joint work with Hossein Azari Soufiani and David C. Parkes.
Advertiser Behavior in Sponsored Search Auctions
Susan Athey, Stanford
Economists often build "structural models," where they specify a specific model of individual behavior and then use data to estimate the parameters of the model. Although such models require strong assumptions, they have the advantage that they can make principled predictions about what would happen if the environment changed in a way that has not been observed in the past. I will describe the application of these techniques to advertiser behavior in sponsored search advertising auctions, focusing on how the models can be used for marketplace design and management. I will discuss economists' focus on causal inference in statistical models as well as the ways in which experiments can be used to estimate and test structural models. I will also make some suggestions about research directions at the intersection of economics and machine learning.
Automatic Discovery of Patterns in Media Content
Nello Cristianini, University of Bristol
What can we learn about the world (and the media system) by analysing millions of news articles or tweets ? Media content analysis has historically been the domain of the social sciences, but recently we are witnessing a strong trend towards the automation of many tasks, paving the way for a new  computational  approach to social science and even the humanities.
In this talk we will survey the results obtained over the past 5 years at the Intelligent Systems Laboratory of Bristol, in the area of automating the analysis of news media content. By combining techniques from machine translation, pattern recognition, statistical learning, information retrieval, we will analyse patterns connected to the US Presidential Elections, to UK public opinion, and to EU cultural biases.
Auctions and Allocations with Positive Network Externalities
Kamesh Munagala, Duke University
With the advent of social networks such as Facebook, Twitter, and LinkedIn, and online offers/deals web sites, network externalties raise the possibility of marketing and advertising to users based on influence they derive from their neighbors in such networks. Indeed, a user's valuation for a product is changed by his knowledge of which of his neighbors likes or owns the product. Designing allocation schemes and auctions in the presence of network externalities is not very well understood, and poses many challenges that are hard to address with traditional techniques. We show that even in very simple settings, optimal auction design becomes APXHard, and pricing schemes may not admit to Nash equilibria in buyer behavior. On the positive side, we present approximation algorithms for welfare maximization and optimal auctions in both single item and multiitem settings via interesting linear programming relaxations.
Joint work with Anand Bhalgat and Sreenivas Gollapudi (working paper); and with Nima Haghpanah, Nicole Immorlica, and Vahab Mirrokni (EC 2011).
Adolescent Societies  Their Form, Evolution, and Variation
Daniel McFarland, Stanford University
Adolescent societies  whether arising in weak, shortterm classroom friendships or close, longterm friendships  exhibit tendencies toward segregation, hierarchy and clustering. It is difficult, however, to explain the macrovariation we observe in these social networks drawing only on tie formation micromechanisms. Some adolescent societies are rankordered caste systems and others are flat, cliquish worlds. What accounts for this variation? We propose and test an ecological theory of social network formation where features of organizational environments moderate tie formation processes to account for the macrostructural variation across settings. We develop this argument using longitudinal friendship data on schools (Add Health study) and classrooms (Classroom engagement study), and by extending exponential random graph models to the study of multiple societies over time.
Joint work with James Moody, David Diehl, Jeff Smith, R. Jack Thomas
The Simple Economics of Approximately Optimal Auctions
Jason Hartline, Northwestern University
Systems wherein strategic agents compete for limited resources are ubiquitous: the economy, computer networks, social networks, congestion networks, etc. Auction theory governs the design and
analysis of these systems. I will describe a method for using approximation in auction theory to identify simple, practical, and robust auction mechanisms that perform nearly as well as the complex optimal ones. A main goal of this approach is to understand the extent to which important intuition from optimal auctions for ideal models extends to approximately optimal auctions
for more realistic models.
Auction theory is well understood only in ideal models where agents have singledimensional, linear preferences. I.e., each agent has a value for receiving a service and her utility is the difference
between her value and her payment. For these models optimal auctions can be interpreted as "marginal revenue" maximizers (Myerson, 1981; Bulow and Roberts 1989). In more realistic models, i.e., where bidders have multidimensional preferences for different outcomes or nonlinear utility, similar intuitive characterizations are unknown. Understanding good auctions for these environments is considered to be the main challenge in auction theory. In these more realistic environments maximizing marginal revenue may not be optimal, and furthermore, there is sometimes no direct way to
implementing the marginal revenue maximization mechanism. I will present two results: I will give procedures for implementing marginal revenue maximization in general, and I will show that marginal revenue maximization is approximately optimal. The approximation factor smoothly degrades in a term that quantifies how far the environment is from an ideal one (i.e., where marginal revenue
maximization is optimal).
Joint work with Saeed Alaei, Hu Fu, Nima Haghpanah, and Azarakhsh Malekian.
Beyond Keyword Search: Taming Information Overload with Rich User Interactions
Khalid ElArini, Carnegie Mellon University
It has been almost two decades since the first search engine appeared on the Web. In that time, much of our lives has drifted online, from the news we read to the products we buy to our relationships with friends, family and colleagues. As such, information overload has become a ubiquitous problem that affects nearly all members of society, as we try to sift through millions of articles and status updates to get an understanding of our world. However, while our information needs have grown more complex and varied over time, the traditional information retrieval paradigm has scarcely changed: users still submit strings of keywords, and receive a ranked list of hyperlinks in return.
In this talk, I argue the need for new models of user interaction that are better suited to modern retrieval tasks. Some tasks, such as reading the news, can be inherently queryless, while others, such as conducting scientific research, require complex queries. In both cases, a user's personal tastes and beliefs can color the results he or she expects to see. Here, I present a general information retrieval framework that leverages ideas from submodular optimization and probabilistic graphical models to unify both settings. I show results in the news and science domains demonstrating that our approach outperforms stateoftheart alternatives from Google and Yahoo! on real users.
Finally, as personalization itself has become ubiquitous on the Web, it is important to provide users with an interaction mechanism that allows them to inspectand potentially correctinferred user profile information. I present recent work that learns such transparent user profiles from user behavior in a largescale social network.
Optimal targeting strategies over social networks
Kostas Bimpikis, Stanford University
Recent advances in information technology have allowed ^Lrms to gather vast amounts of data regarding consumers' preferences and the structure and intensity of their social interactions. The question that arises naturally is whether ^Lfirms can intelligently use the available data to improve their business strategies. In this presentation, we take a step towards this direction by discussing our results on optimal price discrimination and targeted advertising over networks. In particular, in the ^Lrst part of the talk, we study the pricing problem of a monopolist selling a divisible good (service) to consumers who are embedded in a social network. We derive a characterization of the optimal price for each consumer as a function of her network position and illustrate the value of knowing the underlying network structure by comparing the pro^Lts of a monopolist who does not take the network into account to those of a monopolist who uses this information optimally.
The second part of the talk examines a gametheoretic model of competition between ^Lfirms, which can target their marketing budgets to individuals. We provide a sharp characterization of the optimal targeted marketing strategies and highlight their dependence on the consumers' references as well as on the underlying social network structure. In particular, firms find it optimal to allocate their marketing budgets to consumers in proportion to their \network centrality", a measure of social influence. Moreover, we identify network structures for which targeted advertising is more beneficial for the ^Lfirms and, ^Lnally, we show how the difference in the initial budgets affect the outcome of the marketing competition between them.
Individual Heterogeneity and the Formation of Collaboration Networks
Katharine Anderson, Carnegie Mellon University
Collaboration drives innovation in a variety of contexts, including product development, entrepreneurship, policymaking, and academic research. Collaboration networksin which two people are connected if they work together on a problemprovide a valuable tool for understanding the structure of these collaborative communities. Empirically, individuals in collaboration networks play a wide variety of rolessome individuals have many links, while others have few links, some bridge between communities while others play a more peripheral role. From a social scientific perspective, we would like to believe that the heterogeneity we observe in social network position reflects some kind of underlying heterogeneity in individualsthat individuals occupy different positions in the social network, not because they were lucky, but because of some identifiable underlying characteristics. However, our current models of network formation use homogeneous agents, and thus cannot be used to answer questions about individual heterogeneity and network structure.
Here, I introduce a model of network formation in which individuals with heterogeneous skill sets choose their collaborators in order to gain access to skills that are complementary to their own. Together, these collaborative connections comprise a collaboration network. This class of models is interesting, because it connects individual heterogeneity to network heterogeneity, and thus can be used to answer questions about 1) how an individual's skill set affects her position on the social network and 2) how the distribution of skills in the population affects the overall structure of the network. Using this model, I show that an individual's degree on the collaboration network is a supermodular function of her set of skills. As a result, the degree distribution of the network may be highly skewed, even when the distribution of skills in the population is not. This skew becomes more pronounced as individuals in the network have fewer of the skills required to solve their problemsthat is, as problems get more difficult. I then use an extension of this model to make predictions about the network position of specialistsindividuals whose skills lie within a single subject areaand generalistsindividuals whose skill bridge different subject areas. I show that while specialists tend to have more links than generalists, generalists are more likely to bridge between communities. I also show that as disciplinary boundaries fade, the degree distribution of the network becomes more skewed towards a few, highdegree superstars.
DisturbanceFeedback Policies in Dynamic Robust Optimization, with Applications toSupply Chain Design and Revenue Management.
Dan Iancu, Stanford University
Robust optimization has recently emerged as a computationallytractable and scalable paradigm for modeling complex decision problems underuncertainty. Its applications span a wide range of challenging problems infinance, pricing, supply chain management or network design. In the context of dynamicrobust decision models, a typical approach is to look for control policies (ordecision rules) that are directly parameterized in the uncertaintiesaffecting the model. This has the advantage of leading to simple convexoptimization problems for finding particular classes of rules (e.g., affine).However, the approach typically yields suboptimal policies, and is hard toanalyze. In the first part of the talk, we seek to bridge this paradigm withthe classical Dynamic Programming (DP) framework for solving dynamic decision problems. We provide a set of unifying conditions (based onthe interplay between the convexity and supermodularity of the DP valuefunctions, and the lattice structure of the uncertainty sets) that, takentogether, guarantee the optimality of the class of affine decision rules, and furthermore allow suchrules to be found very efficiently. Ourresults suggest new modeling paradigms for dynamic robust optimization, and ourproofs bring together ideas from three areas of optimization typically studied separately: robust,combinatorial (lattice programming andsupermodularity), and global (the theory of concave envelopes). Weexemplify the approach in a model for negotiating flexiblesupply chain contracts between several parties, where optimal capacity,contracting, and ordering policies can be found by solving a simple linearprogram. In the second part of the talk, we consider a class of models arisingin supply chain networks and multiproduct pricing, for which we suggest a newclass of polynomial decision rules, found by solving compact semidefinite programs,and typically yielding drastic improvement compared with the status quo. This is joint work with Dimitris Bertsimas, PabloParrilo, Mayank Sharma and Maxim Sviridenko.
Online Concave Matching
Kamal Jain, Microsoft Research
Online matching was introduced by Karp, Vazirani, Vazirani [KVV] in 1990, which over time became a corner stone of online algorithms. In the matching problem of KVV, the edges are unweighted. Mehta, Saberi, Vazirani, and Vazirani [MSVV] and Buchbinder, Naor, and Jain, developed algorithms for the weighted version of online matching but at the expense of finding a fractional solution. In many large scale instances, such as matching ads to consumers, fractionality is not a major compromise.
The major compromise is that the existing work allows only hard budget constraint, e.g., so many matches per node or a linear utility function bounded above by a budget constraint, which essentially gives you a very specific concave utility function. In practice people have concave utility functions.
In this work we obtain an online algorithm for concave matching problem. The performance of our algorithm is optimal. We use a convex programming duality. Using variational calculus we show how the worst possible example is dual to the best possible algorithm, thereby proving the optimality of the performance guarantee of our algorithm. In hind sight, one can reconstruct the algorithm by studying the worst possible example itself.
Credits: This is a joint work with Nikhil Devanur.
Acknowledgement: This is a work, I had been doing for the last 7 years. Mark Braverman, Alexander Holroyd, Yuval Peres, and Oded Scramm are some of the people who have guided us.
Price Prediction Strategies for Simultaneous OneShot Auctions
Michael Wellman, University of Michigan
Bidding in simultaneous auctions is challenging because an agent's value for a given good may depend on the outcome of other auctions. Given the lack of known solutions to general simultaneous auction games, previous works have addressed this classic exposure problem with heuristic strategies employing probabilistic price predictions. We introduce a concept of selfconfirming prices, and show that within an independent private value model, bidding optimally with respect to selfconfirming price predictions is w.l.o.g. in equilibrium. We exhibit practical procedures to compute approximate equilibrium strategies, and demonstrate via empirical gametheoretic analysis that these strategies are effective in simultaneous auction environments with both complementary and substitutable preference structures.
** This is joint work with Amy Greenwald and Eric Sodomka of Brown University.
Lattice Games and the Economics of Aggregators
George Varghese, UCSD and Yahoo! Research
We model the strategic decisions of web sites in content markets,
where sites may reduce user search cost by aggregating content.
Example aggregations include political news, technology, and other
nichetopic websites. We model this market scenario as an extensive
form game of complete information, where sites choose a set
of content to aggregate and users associate with sites that are nearest
to their interests.
Thus, our scenario is a location game in which sites choose to
aggregate content at a certain point in userpreference space, and
our choice of distance metric, Jacquard distance, induces a lattice
structure on the game. We provide two variants of this scenario:
one where users associate with the first site to enter amongst sites
of equal distances, and a second where users choose uniformly between
sites at equal distances. We show that Subgame Perfect Nash
Equilibria exist for both games. While it appears to be computationally
hard to compute equilibria in both games, we show a
polynomialtime satisficing strategy called Frontier Descent for the
first game. A satisficing strategy is not a best response but ensures
that earlier sites will have positive profits, assuming all subsequent
sites also have positive profits. By contrast, we show that the second
game has no satisficing solution.
**Joint work with P. Jordan, U. Nadav, K. Punera and A. Skrzypacz at Yahoo
Competitive Contagion in Networks
Michael Kearns, University of Pennsylvania
We examine a gametheoretic model of "competitive contagion" in networks,
where two competing companies or other entities have limited budgets to seed
initial infections in an underlying social network, which then mediates stochastic
contagion. The payoff to each party is their final number of infections, which at
equilibrium may come at the expense of the other party. In this model we provide
characterizations of the Price of Anarchy and a new notion called the Price of Budgets.
These characterizations depend on properties of the local stochastic dynamics of
contagion, and in some cases exhibit sharp threshold behavior.
**Joint research with Sanjeev Goyal of the University of Cambridge.
Building an Institutional Field to Corral a Government
Stephen Barley, Stanford University
This talk will describe how the American business community either founded or revitalized eight distinct populations of organizations during the late 1970s and the 1980s which intentionally and unintentionally formed an institutional field (a network of organizations) designed to influence the federal government and create a more business friendly political environment.
Cooperation and Assortativity with Endogenous Partner Selection
Siddharth Suri, Yahoo! Research
The natural tendency for humans to choose with whom to form new relationships and with whom to end established relationships is thought to facilitate the emergence of cooperation. Helping cooperators to mix assortatively is believed to reinforce the rewards accruing to mutual cooperation while simultaneously excluding defectors. However, the relationship between endogenous partner selection, assortativity, and cooperation has been largely unexplored experimentally. Here we report on a series of human subjects experiments in which groups of 24 participants played a multiplayer prisoner's dilemma game where, critically, they were also allowed to propose and delete links to players of their own choosing at some variable rate. Over a wide variety of parameter settings and initial conditions, we found that endogenous partner selection significantly increased the level of cooperation, the average payoffs to players, and the assortativity between cooperators. Even relatively slow update rates were sufficient to produce large effects resulting in cooperation levels over 80%. Subsequent increases to the update rate still had a positive, although smaller, effect. For standard prisoner's dilemma payoffs, we also found that assortativity resulted predominantly from cooperators avoiding defectors, not by severing ties with defecting partners, and that cooperation correspondingly suffered. Finally, by modifying the payoffs to satisfy two novel conditions, we found that cooperators did punish defectors by severing ties, leading to levels of cooperation approaching 100% which persisted for longer.
**Joint work with: Jing Wang (NYU) and Duncan Watts (Yahoo! Research)
Two(!) Good To Be True
Sergiu Hart, The Hebrew University of Jerusalem
How to sell goods optimally? While the mechanism design literature
has solved this problem neatly when there is only one good, the
multiple goods case turns out to be extremely difficult,
mathematically and conceptually. Much of what is true for one good
does not extend to multiple goods. We will try to explain the
difficulties, show what can go wrong, and then present some universal
approximation results. The talk is essentially selfcontained; no
background in mechanism design is necessary.
**Joint work with Noam Nisan and Phil Reny
Identifying peer effects in online communication behaviors
Dean Eckles, Facebook
Peer effects can produce clustering of behavior in social networks,
but so can homophily and common external causes. For observational
studies, adjustment and matching estimators for peer effects require
often implausible assumptions, but it is only rarely possible to
conduct appropriate direct experiments to study peer influence in
situ. We describe research designs in which individuals are randomly
encouraged to perform a focal behavior, which can subsequently
influence their peers. Ubiquitous product optimization experiments on
Internet services can be used for these analyses. This approach is
illustrated with an analysis of peer effects in expressions of
gratitude via Facebook on Thanksgiving Day 2010, with implications for
the microfoundations of culture.
Networks, communities and the groundtruth
Jure Leskovec
The Web, society, information, cells and brain can all be represented
and studied as complex networks of interactions. Nodes in such
networks tend to organize into clusters and communities, which
represent the fundamental structures for understanding the
organization of complex systems. Even though detection of network
communities is of significant importance for computer science,
sociology and biology, our understanding of the community structure of
large networks remains limited.
We study a set of more than 200 large networks with the goal to
understand and identify communities in networks. We challenge the
conventional view of network community structure and show that it is
not exhibited by the large realworld networks. We then present a new
conceptual model of network community structure, which reliably
captures the overall structure of networks and accurately identifies
the overlapping nature of network communities.
This is joint work with Jaewon Yang.
Computing GameTheoretic Solutions for Security
Vincent Conitzer
Algorithms for computing gametheoretic solutions are now deployed in
realworld security domains, notably air travel. These applications
raise some hard questions. How do we deal with the equilibrium
selection problem? How is the temporal and informational structure of
the game best modeled? What assumptions can we reasonably make about
the utility functions of the attacker and the defender? And, last but
not least, can we make all these modeling decisions in a way that
allows us to scale to realistic instances? I will present our ongoing
work on answering these questions.
Joint work with Dmytro Korzhyk, Joshua Letchford, Kamesh Munagala,
Ronald Parr (Duke); Manish Jain, Zhengyu Yin, Milind Tambe (USC);
Christopher Kiekintveld (UT El Paso); Ondrej Vanek, Michal Pechoucek
(Czech Technical University); and Tuomas Sandholm (CMU).
Brief bio: Vincent Conitzer is the Sally Dalton Robinson Professor of
Computer Science and Professor of Economics at Duke University. His
research focuses on computational aspects of microeconomic theory, in
particular game theory, mechanism design, voting/social choice, and
auctions. He recently received the IJCAI Computers and Thought Award,
which is awarded to outstanding young scientists in artificial
intelligence.
Revenueoptimal Auctions
Costis Daskalakis
In his seminal paper, Myerson [1981] provides a revenueoptimal auction for
a seller who is looking to sell a single item to multiple bidders.
Unfortunately, Myerson's auction generalizes to a limited range of domains,
called "singleparameter", where each bidder's preference over the auction’s
outcomes is specified by a single parameter. Indeed, extending this auction
to "multiparameter domains", such as selling multiple heterogeneous items
to bidders, has been one of the most central problems in Mathematical Economics.
We solve this problem in bidder and item symmetric settings. For a single
bidder, we solve the general problem.
(Based on joint work with Yang Cai and Matt Weinberg)
Bio: Constantinos (or Costis) Daskalakis is an Assistant Professor of EECS at
MIT. He completed his undergraduate studies in Greece, at the National Technical
University of Athens, and obtained a PhD in Computer Science from UC Berkeley.
After Berkeley he was a postdoctoral researcher in Microsoft Research New
England, and since 2009 he has been at the faculty of MIT. His research
interests lie in Algorithmic Game Theory and Applied Probability, in particular
computational aspects of markets and the Internet, social networks, and
computational problems in Biology. Costis has been honored with a 2007 Microsoft
Graduate Research Fellowship, the 2008 Game Theory and Computer Science Prize
from the Game Theory Society, the 2008 ACM Doctoral Dissertation Award, a NSF
Career Award, a 2010 Sloan Foundation Fellowship in Computer Science, the 2011
SIAM Outstanding Paper Prize, and the MIT Ruth and Joel Spira Award for
Distinguished Teaching.
A Random Graph Model of MultiHospital Kidney Exchanges
David Parkes
Multihospital kidney exchanges, where hospitals share local lists of
donorpatient pairs with a centralized exchange, promise to faciliate
larger matches, and thus additional transplants. But a central challenge
is to align incentives, so that hospitals will choose to share all pairs
with the exchange. We present a randomgraph model of feasible
transplants, and use the model to (a) explain early experimental results
on 2way and 3way exchanges anayltically, (b) derive a "squareroot law"
for welfare gains from centralized pools, and (c) design a matching
mechanism that is BayesNash incentive compatible, efficient and
individual rational under reasonable assumptions. The robustness of the
mechanism is established through a computational study, validating
incentive alignment, and estimating an efficiency loss due to incentive
alignment of 14% for 2way exchanges and 519% for 3way exchanges (and
decreasing with larger pools.)
Joint work with Panos Toulis
A Test of the "Fifteen Minutes" Hypothesis using LargeScale Data from Newspapers and Blogs
Arnout van de Rijt
Contemporary scholars of fame argue that public attention to persons is shortlived. We
investigate this "fifteen minutes" hypothesis in a unique data source containing daily records of references to millions of names in 2,500 Englishlanguage newspapers, on TV station websites, and on blogs. Longitudinal analysis shows that instead fame tends to carry
over from one year to the next, except below some threshold level of marginal public attention where fame does appear to be fleeting. This pattern of abovethreshold annual stability is found back even in newspaper entertainment sections, celebrityoriented
tabloids, and blogs, where fame is thought to be most ephemeral. Analysis of archival newspaper data reveals that while coverage of small names follows a pattern of rapid decay, big names traverse a careerlike trajectory spanning many years. We take this evidence
to suggest that fame is created through an interaction between static social hierarchies and dynamic reinforcement processes, whereby a population is prestratified in terms of the chances that a cascade of public celebration will latch on to one of its members.
Treasure Hunt
Markus Mobius
We seed a large realworld social network with binary information and analyze
subsequent social learning. A unique feature of our field experiment is that
we measure both the preexisting social networks and the actual conversation
network. Our experiment allows us to test how rational agents behave when
processing information that originates within their social network. We find
that information decays quickly with social distance and that agents mainly
incorporate information within social distance 2. Conversations through common
friends do not increase the weight that a subject places on signals from
direct friends but linearly increases the weight on signals from indirect friends.
This suggests that agents are able to avoid doublecounting information from
indirect friends. We propose a simple “streams model” of social learning that
is consistent with the evidence from our experiment.
Task Routing for Prediction Tasks
Yiling Chen
We describe methods for routing a prediction task on a network where each participant
can contribute information and route the task onwards. We introduce routing scoring
rules that bring truthful contribution of information and optimal routing of the task into
a Perfect Bayesian Equilibrium under common knowledge about the amount of information
available to each agent on the network. Relaxing the common knowledge assumption, we
address the challenge of routing in situations where states of information differ among
agents. We introduce a family of local routing rules, which isolate routing decisions that
depend only on local information in equilibrium, while still promoting effective routing.
Local routing rules are the only routing scoring rules that induce truthful equilibria in
which agents' routing decisions rely solely on local information. Simulation results show
that local routing decisions can lead to effective task routing.
Joint work with Haoqi Zhang, Eric Horvitz, and David Parkes.
Bargaining Theory in the Cloud
Eric Friedman
The axiomatic theory of bargaining solutions was initiated
by John Nash with his seminal paper in 1950 and has a long
and mostly mathematical history. Surprisingly, it arises
naturally in a variety of allocation problems arising in
cloud computing. For example, the second most famous
bargaining solution, the KalaiSmorodinsky solution, is the
outcome of a simple water filling algorithm used in the
Mesos Platform and has many strong properties in that
setting, including incentive compatibility and fairness.
In this talk, I will explore these connections for a
variety of cloud computing problems and show how axiomatic
bargaining theory can be used to analyze allocation
problems in the cloud and conversely how cloud computing
sheds new light on axiomatic bargaining theory.
This talk is based on joint work with Ali Ghodsi, Scott
Shenker and Ion Stoica.
Network Patterns of Favor Exchange
Matthew Jackson
Abstract: We examine the informal exchange of favors among
the members of a society. We analyze a game where the fear of
losing multiple relationships can provide incentives for favor
provision. We characterize the network patterns of exchange
that are robust in a sense that deleted relationships only
result in a local loss of favor exchange. Such robustness
necessitates networks such that all links are "supported":
any pair of individuals exchanging favors must have a common
friend. We show that favor exchange networks in 75 villages in
rural India exhibit a frequency of this sort of support that
is significantly higher than a standard network 'clustering'
measure.
Coauthors Tomas RodriguezBarraquer and Xu Tan
Simple Auctions with NearOptimal Equilibria
Tim Roughgarden
In practice, auction designers often exchange
incentivecompatibility for simplicity (e.g., in sponsored
search auctions or combinatorial auctions). How much welfare
loss does this design choice cause? I'll talk about some
techniques for answering this question.
Based largely on a SODA '11 paper with Kshipra Bhawalkar.
Efficient Online Ad Serving in a Display Advertising Exchange
Kevin Lang
We begin with a tutorial overview of the basic concepts of NonGuaranteed
(i.e. spotmarket) display advertising and advertising exchanges, then pose
and solve a constrained path optimization problem that lies at the heart of
the realtime ad serving task in Yahoo!'s NGD Ad Exchange.
In Yahoo!'s Exchange, the ad server's task for each display opportunity is
to compute, with low latency, an optimal valid path through a directed graph
representing the business arrangements between the numerous business
entities that belong to the Exchange. These entities include not only
publishers and advertisers, but also intermediate entities called ``ad
networks'' which have delegated their ad serving responsibilities to the
Exchange.
Path validity is determined by business constraints which focus on the
following three issues: 1) suitability of the display opportunity's web page
and its publisher 2) suitability of the user who is currently viewing that
web page, and 3) suitability of a candidate ad and its advertiser.
Path optimality is determined by money reaching the publisher, and is
affected not only by an advertiser's bid, but also by the revenuesharing
agreements between the entities in the candidate path.
We describe two different algorithms that have both been used in the actual
Yahoo! nonguaranteed ad serving system, focusing on typical case rather
than worst case performance, and on the optimalitypreserving speedup
heuristics that allow latency targets to be met.
The talk is based on a WSDM 2011 paper that was cowritten with Joaquin
Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar,
Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael BinderbergerOrtega,
Sunil Nagaraj, and Raymie Stata.
Designing Algorithms for MapReduce
Sergei Vassilvitskii
The amount of data available today has lead to analysis problems whose size
would be unimaginable just 10 years ago. With datasets routinely measured in
gigabytes and terabytes, one challenge that has emerged is scaling algorithms
to handle this deluge of information. With chip designers sustaining Moore’s
law by turning to parallelism, so have algorithmicists turned to parallelism
to speed up algorithms and improve performance. In this talk, we will focus
on the MapReduce model of parallel computing, highlight some of the recent
algorithmic advances and present current challenges.
Based on joint work with Howard Karloff, Silvio Lattanzi, Ben Moseley and Siddharth Suri.
Incentivizing Highquality UserGenerated Content
Arpita Ghosh
We model the economics of incentivizing highquality user generated content
(UGC), motivated by settings such as online review forums, questionanswer
sites, and comments on news articles and blogs. We provide a gametheoretic
model within which to study the problem of incentivizing high quality UGC, in
which contributors are strategic and motivated by exposure. Our model has
the feature that both the quality of contributions {\em as well as} the extent
of participation is determined endogenously in a freeentry Nash equilibrium.
The model predicts, as observed in practice, that if exposure is independent
of quality, there will be a flood of low quality contributions in equilibrium.
An ideal mechanism in this context would elicit both high quality and high
participation in equilibrium, with nearoptimal quality as the available attention
diverges, and should be easily implementable in practice. We consider a very
simple elimination mechanism, which subjects each contribution to rating by
some number $A$ of viewers, and eliminates any contributions that are not
uniformly rated positively. We construct and analyze freeentry Nash equilibria
for this mechanism, and show that $A$ can be chosen to achieve quality that
tends to optimal, along with diverging participation, as the number of viewers
diverges.
Joint work with Preston McAfee.
Coevolution of Network Structure and Content
Lada Adamic
As individuals communicate, their exchanges form a dynamic network. A
time series analysis of both standard and novel network metrics can be
used to characterize a network's evolution. We demonstrate that network
structure alone can be highly revealing of the diversity and novelty of
the information being communicated in the network. Networks with a
higher conductance exhibit higher information entropy, while unexpected
network configurations are indicative of higher information novelty.
This is joint work with Liuling Gong, Edwin Teng, and Avishay Livne.
TBA
Damon Centola
TBA
BGP Safety with Spurious Updates  The Conditions of BGP Convergence
Martin Suchara
We explore BGP safety, the question of whether a BGP system converges to a stable routing, in
light of several BGP implementation features that have not been fully included in the previous
theoretical analyses. We show that Route Flap Damping, MRAI timers, and other intrarouter features
can cause a router to briefly send ?spurious? announcements of lesspreferred routes. We demonstrate
that, even in simple configurations, this shortterm spurious behavior may cause longterm divergence
in global routing. We then present DPVP, a general model that unifies these sources of spurious
announcements in order to examine their impact on BGP safety. In this new, more robust model of BGP
behavior, we derive a necessary and sufficient condition for safety, which furthermore admits an efficient
algorithm for checking BGP safety in most practical circumstances  two complementary results that have
been elusive in the past decade's worth of classical studies of BGP convergence in more simple models. We
also consider the implications of spurious updates for wellknown results on dispute wheels and safety under
filtering.
Joint work with Alex Fabrikant and Jennifer Rexford.
Predicting the Popularity of Online Content
Gabor Szabo
Online services that collect, categorize, and disseminate usergenerated
content have seemingly very different focus, user communities, and
content types. We present a study on the popularity of content in three
online services, Youtube, Twitter, and Digg. While videos on Youtube,
short messages in Twitter, and interesting articles on Digg do not
necessarily have much in common, it turns out that the patterns of
access to content on any of these services are very similar across the
sites. In particular, we present predictions for the longterm
popularity of content based on early observations, and moreover discuss
what role the social network plays in determining the success of a
submission.
BuyitNow or TakeaChance: A simple randomized auction mechanism
Elisa Celis
I will first discuss bid data from Microsoft’s AdECN platform, which
allows advertisers to target their bids. We find that the valuations
are irregular, making the commonly used Vickrey auction unsuitable for
extracting revenue. However, Myerson's optimal auction mechanism is
often unsuitable in practice.
I will present and discuss the benefits of a simple auction mechanism
(BINTAC) which extends the secondprice auction with reserve. This
mechanism is particularly effective in private value environments
where the distribution of valuations are irregular, and is truthful in
expectation. We use the AdECN data in counterfactual experiments that
suggest our BINTAC mechanism would improve revenue by 24% relative
to the mechanism currently in place.
Social Learning and the Dynamic Cavity Method
Yashodhan Kanoria
Social learning constitutes learning of behavior from interaction with
friends/neighbors on a network. This talk will focus on models of repeated
interaction, with agents 'voting' in a series of rounds on some issue of
interest. Votes in the initial round are based on 'private signals',
whereas votes in future rounds incorporate knowledge of previous votes
cast by friends.
We look at two different models of iterative learning. A very simple model
is 'majority dynamics' where agents choose their vote based on the majority
of neighbors' votes in the previous round. We analyze this model on regular
trees. At the other extreme is a iterative Bayesian learning: a fully
rational model introduced by Gale and Kariv (2003). We introduce new
algorithms for this model, challenging the widespread belief that learning
is computationally intractable in this setting. We develop a novel
technique  the 'dynamic cavity method', that serves as a key tool for
both models.
Based on joint work with Andrea Montanari (to appear in Ann. App. Prob.) and
Omer Tamuz (submitted).
The Complexity Ecology of Consensus Ranking
Matthias Mnich
Preference lists are extensively used in social science surveys and voting
systems to capture information about choice. In many such scenarios there
arises the need to combine the data represented by many lists into a single
list which reflects the opinion of the surveyed group as much as possible.
This yields a class of ranking aggregation problems, where we are given a
set of permutations (votes) over a set of alternatives (candidates), and
are asked for a permutation of the set candidates such that some objective
is minimized.Examples include Kemeny ranking and other consensus methods,
that find important applications in web search engines, email spam detection,
and user recommendation systems.
We present results on this topic from a multivariate complexity and algorithms
analysis, based on parameterized complexity.
(Joint work with Henning Fernau, Fedor Fomin, Daniel Lokshtanov, Geevarghese Philip
and Saket Saurabh)
Contribution Games in Social Networks
Elliot Anshelevich
We consider network contribution games, where each agent in a social
network has a budget of effort that it can contribute to different
collaborative projects or relationships. Depending on the contribution of
the involved agents a relationship will flourish or drown, and to measure
the success we use a reward function for each relationship. Every agent is
trying to maximize the reward from all relationships that it is involved
in. We consider pairwise equilibria of this game, and characterize the
existence, computational complexity, and quality of equilibrium based on
the types of reward functions involved. For example, when all reward
functions are concave, we prove that the price of anarchy is at most 2.
For convex functions the same only holds under some special but very
natural conditions. A special focus of the talk are minimum effort games,
where the reward of a relationship depends only on the minimum effort of
any of the participants. Finally, we show tight bounds for approximate
equilibria and convergence of dynamics in these games.
Liquidity in Credit Networks: A Little Trust Goes a Long Way
Pranav Dandekar
Credit networks represent a way of modeling trust between entities in
a network. Nodes in the network print their own currency and trust
each other for a certain amount of each other's currency. This allows
the network to serve as a decentralized payment infrastructure
arbitrary payments can be routed through the network by passing IOUs
between trusting nodes in their respective currenciesand obviates
the need for a common currency. Nodes can repeatedly transact with
each other and pay for the transaction using trusted currency. A
natural question to ask in this setting is: how long can the network
sustain liquidity, i.e. how long can the network support the routing
of payments before credit dries up? We answer this question in terms
of the long term success probability of transactions for various
network topologies and credit values.
We show that the success probability of transactions is independent of
the path used to route .ow between nodes. For symmetric
transaction rates, we show analytically and via simulations that the
success probability for complete graphs, ErdosRenyi graphs and
preferential attachment graphs goes to one as the size, the density or
the credit capacity of the network increases. Finally, we compare
liquidity in credit networks to that in a centralized payment
infrastructure and show that credit networks are at most a constant
factor worse; thus we do not lose much liquidity in return for their
robustness and decentralized properties.
Joint work with Ashish Goel, Ramesh Govindan and Ian Post.
User Browsing Models: Relevance versus Examination
Sugato Basu
There has been considerable work on user browsing models for search
engine results, both organic and sponsored. The clickthrough rate
(CTR) of a result is the product of the probability of examination
(will the user look at the result) times the perceived relevance of
the result (probability of a click given examination). Past papers
have assumed that when the CTR of a result varies based on the
pattern of clicks in prior positions, this variation is solely due
to changes in the probability of examination.
We show that, for sponsored search results, a substantial portion
of the change in CTR when conditioned on prior clicks is in fact
due to a change in the relevance of results for that query instance,
not just due to a change in the probability of examination. We then
propose three new user browsing models, which attribute CTR changes
solely to changes in relevance, solely to changes in examination
(with an enhanced model of user behavior), or to both changes in
relevance and examination. The model that attributes all the CTR
change to relevance yields substantially better predictors of CTR
than models that attribute all the change to examination, and does
only slightly worse than the model that attributes CTR change to
both relevance and examination. For predicting relevance, the model
that attributes all the CTR change to relevance again does better
than the model that attributes the change to examination. Surprisingly,
we also find that one model might do better than another in predicting
CTR, but worse in predicting relevance. Thus it is essential to
evaluate user browsing models with respect to accuracy in predicting
relevance, not just CTR.
Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
Daria Sorokina
Ensembles of regression trees are long known as powerful models suitable
for a large variety of tasks. The most wellknown ensembles are Random
Forests (averaging independent large, highly nonlinear trees) and Gradient
Boosting (additive model built from small trees). In the beginning of the talk
I will describe an ensemble that combines the benefits of both: Additive Groves
is based on regression trees, additive models and bagging and is capable of both
fitting additive structure of the problem and modelling its highly nonlinear
components with very large trees at the same time. Combination of these
properties makes Additive Groves superior in performance to other existing tree
ensemble methods.
Models with best performance often act as black boxes: they make good predictions,
but do not provide much insight into the decision making process. This is not
satisfactory for many data mining problems. Therefore separate postprocessing
techniques are needed to answer the questions about important features, their
effects and interactions.
The term statistical interaction is used to describe the presence of nonadditive
effects among two or more variables in a function. When variables interact, their
effects must be modeled and interpreted simultaneously. Thus, detecting statistical
interactions can be critical for an understanding of processes by domain
researchers. In this talk I will describe an approach to interaction detection
based on comparing the performance of unrestricted and restricted prediction models,
where restricted models are prevented from modeling an interaction in question. I
will show that Additive Groves has the required properties and can be used within
this framework.
I will also talk about several applications of these techniques to realworld data:
tracking environment change based on birds abundance and Salmonella risk prediction
on meatprocessing establishments. Apart from that, Additive Groves has been
successfull in several recent competitions: KDD Cup'09, ICDM'09 and Yahoo Learning
to Rank Challenge. The latter result is especially interesting, because Additive
Groves was the only winning solution that did not optimize for any specific ranking
metrics.
Algorithms presented in the talk are implemented as a part of an open source C++
package available at http://additivegroves.net
At the end of the talk I will make a short presentation about Yandex and its
California office (http://yandexlabs.com). We are hiring!
Reflections on Linkbased Ranking
Marc Najork
This talk focuses on using hyperlinks in the ranking of web search results. We give
a brief overview of the vast body of work in the area; we provide a quantitative
comparison of the different features; we sketch how linkbased ranking features can
be implemented in largescale search engines; and we identify promising avenues for
future research.
Output URL Bidding
Panagiotis Papadimitriou
Output URL bidding is a new bidding mechanism for sponsored search,
where advertisers bid on search result URLs, as opposed to keywords
in the input query. For example, an advertiser may want his ad to
appear whenever the search result includes the sites www.imdb.com
and en.wikipedia.org, instead of bidding on keywords that lead to
these sites, e.g., movie titles, actor names, and so on. In this
paper we study the tradeoff between the simplicity and the
specification power of output bids and we explore their utility for
advertisers. We first present a model to derive output bids from
existing keyword bids. Then, we use the derived bids to experimentally
study output bids and contrast them to input query bids. Our main
results are the following: (1) Compact output bids that mix both URLs
and hosts have the same specification power as more lengthy input bids;
(2) Output bidding can increase the recall of relevant queries; and
(3) Output and input biding can be combined into a hybrid mechanism
that combines the benefits of both.
A Technical Overview of Search Ads Quality at Google
Roberto Bayardo
Abstract: Ads quality involves optimizing the oftentimes competing
interests of Google, the users of its search engine, and advertisers
who pay to have their ads displayed on search results pages. In this
talk I describe the primary problems faced by the Google ads quality
team, and some of the general approaches we take to address them. My
goal of the talk is to introduce others to problems of interest beyond
the better studied problems of click through estimation and mechanism
design. The topics covered will include keyword targeting & bid
specification, budget allocation, user based ads quality, ad formats,
when to show an ad and where, and assessing quality of ad creatives,
their landing pages, and the entire "post click" experience.
Bio: Roberto Bayardo is a principal research scientist at Google,
where he has been working within the ads quality and ads serving teams
for the past 5 years. Roberto currently leads a team working on
quality modeling, with the broad goal of ensuring that ads displayed
alongside Google search provide a good user experience.
On the Competitive Ration of Online Sampling Auctions
Georgios Pierrakos
We study online profitmaximizing auctions for digital goods with adversarial
bid selection and uniformly random arrivals. Our goal is to design auctions
that are constant competitive with F^(2); in this sense our model lies at the
intersection of priorfree mechanism design and secretary problems. We first
give a generic reduction that transforms any offline auction to an online one,
with only a loss of a factor of 2 in the competitive ratio; we then present
some natural auctions, both randomized and deterministic, and study their
competitive ratio; our analysis reveals some interesting connections of one
of these auctions with RSOP, which we further investigate in our final section.
The importance of ties
Paolo Parigi
For sociologists, the ties that join individuals into relationships are the
result of social processes rather than being entities themselves. A social
network is therefore a representation for something that happens between
individuals and that brings people together. Sometimes, however, social
processes sever ties and produce an increase in the distance that separates
individuals, ultimately generating isolation. In my talk I will first
provide a few examples of social processes that generate ties and then I
will advance a theoretical scenario for studying processes that instead
decrease the number of ties. The positive examples of tie generation come
from my work in the fields of political sociology and organization theory.
The negative scenario of increasing isolation comes from my work on social
network analysis. In presenting these examples, my goal is to stimulate the
conversation between social scientists and computer scientists in the field
of network analysis.
How opinions are received by online communities: a case study on amazon.com helpfulness votes
Gueorgi Kossinets
There are many online settings in which users publicly express opinions. A
number of these offer mechanisms for other users to evaluate these opinions;
a canonical example is Amazon.com, where reviews come with annotations like
"26 of 32 people found the following review helpful." Opinion evaluation
appears in many offline settings as well, including market research and
political campaigns. Reasoning about the evaluation of an opinion is
fundamentally different from reasoning about the opinion itself: rather than
asking, "What did Y think of X?", we are asking, "What did Z think of Y's
opinion of X?" Here we develop a framework for analyzing and modeling opinion
evaluation, using a largescale collection of Amazon book reviews as a
dataset. We find that the perceived helpfulness of a review depends not just
on its content but also but also in subtle ways on how the expressed
evaluation relates to other evaluations of the same product. As part of our
approach, we develop novel methods that take advantage of the phenomenon of
review "plagiarism" to control for the effects of text in opinion evaluation,
and we provide a simple and natural mathematical model consistent with our
findings. Our analysis also allows us to distinguish among the predictions of
competing theories from sociology and social psychology, and to discover
unexpected differences in the collective opinionevaluation behavior of user
populations from ifferent countries.
Reaching Consensus on Social Networks
Grant Schoenebeck
Research in sociology studies the effectiveness of social networks in achieving computational tasks.
Typically the agents who are supposed to achieve a task are unaware of the underlying social network
except their immediate friends. They have limited memory, communication, and coordination. These
limitations result in computational obstacles in achieving otherwise trivial computational problems.
One of the simplest problems studied in the social sciences involves reaching a consensus among players
between two alternatives which are otherwise indistinguishable.
In this paper we formalize the computational model of social networks. We then analyze the consensus
problem as well as the problem of reaching a consensus which is identical to the majority of the
original signals. In both models we seek to minimize the time it takes players to reach a consensus.
Webscale Information Extraction Using Wrappers.
Nilesh Dalvi
A significant portion of web pages embed interesting and valuable semantic
content suitable for structured representation. The traditional Information
Extraction techniques, however, fall short of achieving high quality
extraction at Web scale. In this talk, I will outline some of the work going
on at Yahoo! Research on addressing the challenges of Information Extraction
on a Web scale. I will focus on {\em wrapper} based techniques, which
exploit the HTML structure of websites to extract the information of
interest. I will address two problems : (i) making wrappers more robust to
changes in websites, and (ii) enabling learning of wrappers from
automatically obtained noisy training data.s
Characterization and Modeling of Online Browsing Activity.
Andrew Tomkins
I will describe a set of studies of online user behavior based on search and
toolbar logs. In these studies, we propose a new "CCS" taxonomy of pageviews
consisting of Content (news, portals, games, verticals, multimedia),
Communication (email, social networking, forums, blogs, chat), and
Search (web search, item search, multimedia search). We show that roughly
half of all pageviews online are content, 1/3 are communications, and the
remaining 1/6 are search. We study the interarrival distribution of visits
to a page by a single user and in aggregate, and show some evidence that "bursty"
behavior in which a URL attains significant transient popularity for a few days
is not a significant contributor to interarrival times. We also study the nature
and frequency of search queries in more detail, differentiating between web search,
item search, and multimedia search, and characterizing the prevalence of references
to structured objects in web search queries.
Navigation patterns in this data are not well characterized as trails,
but should instead be viewed as trees due to the presence of multitab
or multiwindow browsing. We present a model of tabbed browsing that
represents a hybrid between a markov process capturing the graph of
web pages, and a branching process capturing the creation, splitting,
and dying of tabs. We present a mathematical criterion to characterize
whether the process has a steady state independent of initial conditions,
and we show how to characterize the limiting behavior in both cases.
We perform a series of experiments to compare our tabbed browsing model
with the simpler random surfer model.
Joint work with Ravi Kumar and Flavio Chierichetti.
User grouping behavior in online forums.
Xiaolin Shi
Online forums represent one type of social media that is particularly rich for
studying human behavior in information seeking and diffusing. The way users join
communities is a reflection of the changing and expanding of their interests
toward information. In this talk, I present the work studying the patterns of
user participation behavior, and the feature factors that influence such behavior
on different forum datasets. In this work, we find that, despite the relative
randomness and lesser commitment of structural relationships in online forums,
users' community joining behaviors display some strong regularities. One
particularly interesting observation is that the very weak relationships
between users defined by online replies have similar diffusion curves as those
of real friendships or coauthorships. We build social selection models,
Bipartite Markov Random Field (BiMRF), to quantitatively evaluate the prediction
performance of those feature factors and their relationships. Using these models,
we show that some features carry supplementary information, and the effectiveness
of different features vary in different types of forums. Moreover, the results of
BiMRF with twostar configurations suggest that the feature of user similarity defined
by frequency of communication or number of common friends is inadequate to predict
grouping behavior, but adding nodelevel features can improve the fit of the model.
Understanding Choice Intensity: A Poisson Mixture Model with Logitbased Random Utility Selective Mixing.
Matthew Harding
In this paper we introduce a new Poisson mixture model for count panel data where the underlying
Poisson process intensity is determined endogenously by consumer latent utility maximization over
a set of choice alternatives. This formulation accommodates the choice and count in a single random
utility framework with desirable theoretical properties. Individual heterogeneity is introduced
through a random coefficient scheme with a flexible semiparametric distribution. We deal with the
analytical intractability of the resulting mixture by recasting the model as an embedding of infinite
sequences of scaled moments of the mixing distribution, and newly derive their cumulant representations
along with bounds on their rate of numerical convergence. We further develop an efficient
recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling
scheme, and show posterior consistency. We apply our model to a recent household panel of supermarket
visit counts. We estimate the nonparametric density of three key variables of interest
 price, driving distance, and their interaction  while controlling for a range of consumer demographic
characteristics. We use this econometric framework to assess the opportunity cost of time
and analyze the interaction between store choice, trip frequency, search intensity, and household
and store characteristics. We also conduct a counterfactual welfare experiment and compute the
compensating variation for a 10% to 30% increase in Walmart prices.
Deconstructing A Long Tail Commerce Network.
Neel Sundaresan
In this talk we discuss modeling a complex long tail commerce network in the form of the eBay Marketplace graph.
The asymmetric and preferential nature of buying and selling in terms of structure, interactions, trust and
reputation measures and evolution of these has tremendous value in identifying business opportunities and
building effective user applications.The discussion focuses on the analysis of the macroscopic shape
of the networks across the platform and also
in specific marketplace categories. The results suggest significant diversity across these individual
marketplaces from collectors network to retail network. We also discuss motif profiling and preferential
connections. Means for defining trust measures based on network properties will also be addressed.
Nod vs. HighFive: LinkedIn Connection Strength Models and Applications.
Monica Rogati
The low cost of link formation on social and professional networking sites leads to connections with heterogeneous
relationship strengths, a LinkedIn user might be connected to both professional acquaintances and close collaborators.
Modeling this relationship as nonbinary is more realistic (and more useful in personalizing a user's experience)
than the binary links that have been the focus of previous work in the social networking space. We present an unsupervised,
linkbased latent variable model to estimate relationship strength from interaction
activity (e.g., communication, browsing, comments) and profile content similarity. We also discuss practical
applications of modeling connection strength when building data driven products that enhance the LinkedIn user
experience and engagement. (This is joint work with Rongjing Xiang during her internship at LinkedIn.)
Network of positive and negative edges
Jure Leskovec
Relations between users on social media sites often reflect a mixture
of positive (friendly) and negative (antagonistic) interactions. We
study how the interplay between positive and negative
relationships affects the structure of online social networks,
whereas in contrast the bulk of research on social networks to date
has focused almost exclusively on positive interpretations of links
between people. We connect our analyses to theories of signed
networks from social psychology. In doing so we find that the
classical theory of structural balance tends to capture certain common
patterns of interaction, but that it is also at odds with some of the
fundamental phenomena we observe  particularly related to the
evolving, directed nature of these online networks. We then develop
an alternate theory of status that better explains the observed edge
signs and provides insights into the underlying social mechanisms. Our
work provides one of the first largescale evaluations of theories of
signed networks using online datasets, as well as providing a
perspective for reasoning about social media sites.
Joint work with Daniel Huttenlocher and Jon Kleinberg.
The paper can be accessed here.
Social Networks and Stable Matchings in the Job Market
Esteban Arcaute
In this talk we introduce and study a model that considers the job market as a twosided matching market,
and accounts for the importance of social contacts in finding a new job. We assume that workers learn only
about positions in firms through social contacts. Given that information structure, we study both static
properties of what we call locally stable matchings, a solution concept derived from stable matchings,
and dynamic properties through a reinterpretation of GaleShapley's algorithm as myopic best response dynamics.
We prove that, in general, the set of locally stable matching strictly contains that of stable matchings
and it is in fact NPcomplete to determine if they are identical. We also show that the lattice structure
of stable matchings is in general absent. Finally, we focus on myopic best response dynamics inspired by
the GaleShapley algorithm. We study the efficiency loss due to the informational constraints, providing
both lower and upper bounds.
This is joint work with Sergei Vassilvitskii, Ramesh Johari and David LibenNowell.
Some of the results presented appeared in WINE 2009.
Temporal User Behavior in Yahoo Answers
Christina Aperjis
When searching for an answer to a question, people generally prefer to get
both high quality and speedy answers. The fact that it usually takes longer
to find a better answer creates a speedaccuracy tradeoff which is inherent
in information seeking. A natural setting to study this tradeoff is Yahoo
Answers, a communitydriven questionandanswer site. We identify and
quantify how question authors trade off the number of answers they receive
and the cost of waiting. We find that users are willing to wait more to
obtain an additional answer when they have only received a small number of
answers; this implies decreasing marginal returns in the amount of collected
information. We also estimate the user's utility function from the
data.
This is joint work with Bernardo A. Huberman and Fang Wu.
The Netflix Prize: Quest for $1,000,000
Yehuda Koren
The collaborative filtering approach to recommender systems predicts
user preferences for products or services by learning past useritem
relationships. Their significant economic implications made
collaborative filtering techniques play an important role at known
etailers such as Amazon and Netflix. This field enjoyed a surge of
interest since October 2006, when the Netflix Prize competition was
commenced. Netflix released a dataset containing 100 million anonymous
movie ratings and challenged the research community to develop
algorithms that could beat the accuracy of its recommendation system,
Cinematch. In this talk I will survey the competition together with some
of the principles and algorithms, which have led us to winning the Grand
Prize in the competition.
Data Jujitsu  the art of treating data as a product
DJ Patil
LinkedIn is the premiere professional social network with over 60 million users
and a new user joining every second. One of LinkedIn's strategic advantages
is their unique data. While most organizations consider data as a service function,
LinkedIn considers data a cornerstone of their product portfolio. This emphasis on
data as a product led LinkedIn to be the first social network site to deploy a
People You May Know service. Other data products that LinkedIn has built include
Who Viewed My Profile, People Who Viewed This Profile Also Viewed..., the backend
standardization algorithms that power search, and matching algorithms. To rapidly
develop these products LinkedIn leverages a number of technologies including open
source, 3rd party solutions, and some we've had to invent along the way.
In this presentation, we will talk about our methodology for turning data into user
facing products and some of the major challenges we see in the data space.
Querylog analysis for improving web search
Panayiotis Tsaparas
Search engines log the activity of users when posing queries and interacting with the results.
Query click logs maintain information about what queries were asked, what results were
shown and which ones were clicked as a result of the query. This information offers a
valuable insight in the intent of the user and the need behind the query, and it has
been the focus of intense research activity.
In this talk, we will describe algorithms for querylog analysis.
First we will talk about an algorithm that uses query logs for generating advertising keywords
and understanding the category of a query. Time permitting, we will then describe
an algorithm for generating training data for learning a ranking function.
In both cases we model the query log as a graph, and we formulate our problem as an optimization
problem over these graphs.
Stochastic Submodular Optimization
Arash Asadpour
We study stochastic submodular maximization problem with respect to cardinality
and matroid constraints. Our model can capture the effect of uncertainty in
different problems, such as cascade effects in social networks, capital budgeting,
sensor placement, etc. We study nonadaptive and adaptive policies and give
"optimal" constant approximation algorithms for both cases. We also show that
the adaptivity gap in this setting is 1.59.
Compressibility of Behavioral Graphs
Ravi Kumar
Graphs resulting from human behavior (the web graph, social networks,
etc.) have hitherto been viewed as a monolithic class with similar
characteristics. Our focus here is on the compressibility of such
graphs. It has been empirically shown that Web graphs can be compressed
down to three bits of storage per edge. In light of this, we address
two basic questions: are social networks as compressible as Web graphs
and are there tractable and realistic models that yield highly
compressible graphs?
Deanonymizing Social Networks
Arvind Narayanan
Operators of online social networks are increasingly sharing potentially
sensitive information about users and their relationships with advertisers,
application developers, and datamining researchers. Privacy is typically
protected by anonymization, i.e., removing names, addresses, etc.
We present a framework for analyzing privacy and anonymity in social networks and develop a new reidentification
algorithm targeting anonymized socialnetwork graphs. To demonstrate its effectiveness on realworld networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and
Flickr, an online photosharing site, can be reidentified in the anonymous Twitter graph with only a 12% error rate.
Our deanonymization algorithm is based purely on the network topology, does not require creation of a large
number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.
Bio:

Arvind Narayanan is a postdoctoral researcher working with Dan Boneh. He recently finished his Ph.D at the University of Texas at Austin, advised by Vitaly Shmatikov. His research is on the privacy and anonymity issues involved in publishing largescale datasets about people. His thesis, in a sentence, is that the level of anonymity that society expects  and companies claim to provide  in published databases is fundamentally unrealizable. He blogs about his anonymitybreaking efforts and other research at http://33bits.org/.
Eliciting Information on the Distribution of Future Outcomes
Nicolas Lambert
This paper studies the problem of inducing a presumably knowledgeable expert to reveal true information regarding the probability distribution of a future random outcome. The information being considered is quite general, and includes
in particular the statistics of the probability distribution (such as mean, median, variance), and categorical information (such as the most correlated pair of variables). I examine two types of incentive schemes: Those that reward
the expert for being truthful, and, for the case of numerical information, those that reward the expert increasingly with the accuracy of the prediction. For both cases, I establish simple criteria to determine the information that
can be elicited with such schemes, and offer a complete characterization of the associated schedule fee.
Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom
Website traffic varies through time in consistent and predictable ways, with highest traffic in the middle of the day. When providing media content to visitors, it is important to present repeat visitors with new content so that they
keep coming back. In this paper we present an algorithm to balance the need to keep a website fresh with new content with the desire to present the best content to the most visitors at times of peak traffic. We formulate this as the
me dia scheduling problem, where we attempt to maximize total clicks, given the overall traffic pattern and the time varying clickthrough rates of available media content. We present an efficient algorithm to perform this scheduling
under certain conditions and apply this algorithm to real data obtained from server logs, showing evidence of significant improvements in traffic from our algorithmic sched ules. Finally, we analyze the click data, presenting models
for why and how the clickthrough rate for new content declines as it ages.
Behavioral Experiments in Strategic Networks
Michael Kearns
For four years now, we have been conducting "mediumscale" experiments in how human subjects behave in strategic and economic settings mediated by an underlying network structure. We have explored a wide range of networks inspired
by generative models from the literature, and a diverse set of collective strategic problems, including biased voting, graph coloring, consensus, and networked trading. These experiments have yielded a wealth of both specific findings
and emerging general themes about how populations of human subjects interact in strategic networks. I will review these findings and themes, with an emphasis on the many more questions they raise than answer.
How Social Network Structure affects Diffusion and Learning
Matthew Jackson
We examine how diffusion and learning processes operating through social networks are affected by homophily (the tendency of individuals to associate with others similar to themselves) and other network characteristics. Homophily
has no effect in pure diffusion processes where only shortest paths matter; only connection density matters. In contrast, homophily substantially slows learning based on repeated averaging of neighbors' information, while connection
density has no effect.
How to Design Profitable Auctions
Paul Valiant
This talk is a highlevel overview of results from two years of work with Silvio Micali and Jing Chen on a question that may best be phrased as: what is the best way to sell many items to many players? The standard formalizations
of this question were fleshed out in the 1960's and led to answers that are unsatisfactory in practice, leading to today's Ebay that sells items by the auction method known to the ancient Greeks. In this talk we present new formulations
of this question, elicited by fundamental notions from computer science: competitive analysis, collusion, and "knowledge". We present several concrete (and nearoptimal) auctions in these models, along with many open questions.
Sort me if you can (or, Algorithms on Dynamic Data)
Mohammad Mahdian
We formulate and study a new computational model for dynamic data. In this model the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint
that it only has a limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in
algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance
close to the optimal in expectation and with high probability. If time permits, we will discuss new results on graph algorithms in our framework.
This is based on joint work with Aris Anagnostopoulos, Ravi Kumar, and Eli Upfal.
Discovery & Emergence
Abdur Chowdhury
Often as computer scientists we focus on faster algorithms, such as approximations of solutions in linear time over large data sets or similar problems. Rather than focus on algorithms in this talk, we ask the question "What possibilities
emerge from surfacing the world's conversations to others". Specifically we explore Twitter Trends as a discovery tool and show how awareness of the thoughts of others can cause the emergence of new behaviors.
Bio:

Dr. Abdur Chowdhury serves as Twitter's Chief Scientist. Prior to that Dr. Chowdhury cofounded Summize a realtime search engine sold to Twitter in 2008. Dr Chowdhury has held positions at AOL as their Chief Architect for Search,
Georgetown's Computer Science Department and University of Maryland's Institute for Systems Research. His research interest lay in Information Retrieval focusing on making information accessible.
Internet search advertising: Challenges in targeting and quality considerations
Mayur Datar
In this informal talk, I will touch upon two topics related to search advertising:
Targeting: How do you match advertiser keywords to user queries? What are the underlying challenges in bridging this semantic gap?
How do you measure the quality of search ads? Is it in the best interest of all parties
involved, users, advertisers and the search engine, to always show ads for queries? What are the costs and benefits to showing ads and how do you decide on when is it appropriate to show ads?
Look forward to more questions
than answers! Hopefully these will lead to good research topics.
Bio:

Mayur Datar works as a Research Scientist with Google Inc. His research interests are in datamining, algorithms, databases and computer science theory. Prior to joining Google, Mayur obtained his doctorate degree in computer science
from Stanford university and a Bachelor of Technology degree from I.I.T. Bombay. He was awarded the President of India, Gold Medal for being the most outstanding student of his graduating batch from I.I.T. Bombay. He has published
several papers in renowned conferences like SIGMOD, VLDB, KDD, FOCS, SODA, WWW.
Affiliation Networks
D. Sivakumar
In the last decade, structural properties of several naturally arising networks (the Internet, socialnetworks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work,
Leskovec, Kleinberg, and Faloutsos identify two new and surprising properties of the evolution of many realworld networks: densification (the ratio of edges to vertices grows over time), and shrinking diameter (the diameter reduces
over time to a constant). These properties run counter to conventional wisdom, and are certainly inconsistent with graph models prior to their work.
In this work, we present the first model that provides a simple, realistic,
and mathematically tractable generative model that intrinsically explains all the wellknown properties of the social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the
social sciences, primarily on the groundbreaking work of Breiger (1973) on bipartite models of social networksthat capture the affiliation of agents to societies.
We also present algorithms that harness the structural consequences
of our model. Specifically, we show how to overcome the bottleneck of densification in computing shortest paths between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished
subset of vertices. This is a rare example of an algorithmic benefit derived from a realistic graph model.
The talk will review the history of relevant random graph models, and attempt to illustrate the coevolution of theoretical
modeling and empirical insights in this fascinating slice of graph theory.
The talk reports work done jointly with Silvio Lattanzi of Sapienza University, Roma, Italy, and is to appear in STOC 2009.
Using Bandit Models in Search Engines
Sandeep Pandey
Web search engines perform two basic tasks: they acquire Web pages via crawling, and present lists of pages and advertisements in response to user search queries. Given the gigantic size of the Web these tasks can be viewed as constrained
optimization problems. On the acquisition side the constraint is on the crawling resources, while on the presentation side, the constraint is on user attention: users cannot be expected to sift through large amounts of content. Furthermore,
some parameters needed to solve these optimization problems may or may not be known leading to the offline and online versions of the problems. In this talk we will look at these optimization problems.
In particular, we will
focus on the online presentation of advertisements where the unknown parameters are the ad quality values. I will present a method of simultaneously estimating unknown parameters and exploiting current estimates. Our method extends
and adapts wellknown "Bandit" models from Machine Learning.
Bio:

Sandeep Pandey is a Research Scientist in Search and Advertising group at Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon and his B.Tech. from IIT Bombay. His research interests include Web crawling
and advertising. He also likes to work on online learning problems such as the multiarmed bandit problem. In his spare time Sandeep enjoys watching cricket and playing tennis.
Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy
Adchemy is leading a revolution in online performance marketing, both as a top25 U.S. online advertiser and through the development of the Adchemy Digital Marketing Platform, an integrated suite of products that allows online marketers
to maximize their customers' endtoend experience. I will discuss some of the research challenges that we are tackling in developing our technology solutions, including problems related to online matching, multiarmed bandits, and
the optimizer's curse.
Bio:

Richard has been VP Research at Adchemy since its inception in late 2004. He has over nine years of experience in the online advertising industry. He received his PhD from the Operations Research department at Stanford, where
he focused on airline revenue management. He spent several years in the consulting world pioneering the development and implementation of real option valuation methodologies for asset valuation.
Scaling Social Data
Robert Johnson
Director of Engineering, Facebook
Facebook serves billions of pages a day based on hundreds of terabytes of dynamic data, which presents many scaling challenges for our storage and caching systems. In particular the social nature of the data gives it some interesting
properties  it is not easily clustered or partitioned by users or applications, it updates frequently, and many queries require massive network fanin. I'll be discussing the problems that arise in scaling such a system and some of
the solutions we've used to get to our current scale, as well as our plans for infrastructure to solve these problems at even larger scale and in a more general way for future web 2.0 applications with similar properties.
Bio:

Robert Johnson is Director of Engineering at Facebook, where he leads the software development efforts to costeffectively scale Facebook's infrastructure and optimize performance for its many millions of users. During his time
with the company, the number of users has expanded by more than twentyfold and Facebook now handles billions of page views a day. Robert was previously at ActiveVideo Networks where he led the distributed systems and settop software
development teams. He has worked in a wide variety of engineering roles from robotics to embedded systems to web software. He received a B.S. In Engineering and Applied Science from Caltech.
Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits
In this talk we address two questions: (1) How to use structured data in web search? (2) How to gather structured data? For the first question we identify valuable classes of data, present query classes that can benefit from structured
data and describe architecture that combines keyword search with structured search. For the second question we present Data Cloud: An ecosystem of data publishers, search engine (data cloud) and data consumers.
We show connection
from Data Cloud strategy to classic notion in economics: network effect in twosided markets. At the end of the talk an early demo implementation will be presented.
Bio:

Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at St.Petersburg (2007). He spent a year as a postdoc at Caltech before joining Yahoo! Research in 2008. He won two gold medals at International Mathematical
Olympiads, received the Best Paper Award in Application Track of CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors":
http://simsearch.yury.name
Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan
While the expanding universe of products available via Internet commerce provides consumers with valuable options, sifting through the numerous alternatives to identify desirable choices can be challenging. Collaborative filtering
systems aid this process by recommending to users products desired by similar individuals. Because purchase decisions are influenced by these systems, however, they have become targets of manipulation by unscrupulous vendors. In this
talk, we will discuss our findings on the robustness of collaborative filtering systems to manipulation. In particular, we provide theoretical and empirical results demonstrating that while nearest neighbor algorithms, which are widely
used in commercial systems, can be highly susceptible to manipulation, two classes of collaborative filtering algorithms which we refer to as linear and asymptotically linear are relatively robust. In particular, we find that as a
user rates an increasing number of products, the average accuracy of predictions made by these algorithms becomes insensitive to manipulated data. Our results provide guidance for the design of future collaborative filtering systems.
Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.
SnapTell has created technology that allows a mobile phone user to snap a picture of a product and instantly receive reviews, prices, and information. The service has been scaled to support more than ten million products including
all books, DVDs, CDs, video games, and other product categories. We present an outline of the algorithms used to power the service emphasizing scaling and locality sensitive indexing. SnapTell's popular iPhone application has been
downloaded by over half a million users over the last few weeks.
The technology can also be used to make print advertisements in a magazine, newspaper, poster, or billboard interactive without adding barcodes or otherwise altering
the content. Consumers snap a picture of the object and send it to a shortcode to enter into a contest, access a mobile website, or download ringtones, wallpaper, video clips, or other information. SnapTell is the leading vendor in
this space and offers a solution with unprecedented scale and robustness.
We present a synopsis of the image matching algorithms used to power the service and discuss potential research topics in the space.
ContentDriven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)
Open collaboration is one of the most effective ways to create content and aggregate information, as witnessed by the Wikipedia. In this talk, we propose a reputation systems for the authors of collaborative information, and a trust
systems for the aggregated information.
The goal of the reputation system is to provide an incentive to collaboration, and to provide information about the quality of authors. We propose a contentdriven notion of reputation,
where authors of longlasting contributions gain reputation, while authors of contributions that are not preserved lose reputation. We show that, on the Wikipedia, the contentdriven reputation of authors is a good predictor of the
quality of their future contributions. We then describe a trust system for content, where the trust in a portion of content is computed according to the reputations of the authors who created and revised it. Again, we show that on
the Wikipedia content trust is a good predictor of future content longevity. Finally, we describe how the reputation and trust systems can be made robust to Sybil attacks, in which people use multiple identities to try boost their
reputation and raise the trust of content of their choice.
The reputation and trust systems have been implemented in the
WikiTrust system/.
Bio:

Luca de Alfaro received his Ph.D. degree from Stanford University, where he worked on system specification and verification under the supervision of Zohar Manna. After graduation, he was for three years at UC Berkeley as a postdoctoral
researcher, working with Thomas A. Henzinger on verification and on game theory. In 2001, Luca de Alfaro joined the faculty of UC Santa Cruz. His research interests involve system specification and verification, embedded software,
game theory, incentive systems for online collaboration, and trust and reputation on the web. Luca de Alfaro is currently at Google as a visiting scientist, on leave from UC Santa Cruz.
Luca de Alfaro has long been an enthusiastic
user of online collaborative tools, including wikis. One evening in 2005, after working at the wiki he uses to facilitate information sharing among members of his research group, he decided to start a wiki on cooking, which is one
of his passions. The wiki was soon peppered with spam and less than useful contributions. Rather than cleaning up, a most tedious job, Luca de Alfaro shut off the wiki temporarily, and he started pondering how to provide incentives
to collaboration, and compute quality metrics for content. As he found the job of rating authors or contributions by hand most tedious, he was interested in methods for computing reputation and quality metrics purely from content analysis.
The research led to the WikiTrust system/. The wiki on cooking has been brought back to life (http://www.cookiwiki.org), even though it receives little
traffic. You are welcome to register, add recipes, and see WikiTrust in action.
Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis
Web search queries capture the information need of search engine users. Search engines store these queries in their logs and analyze them to guide their search results.
In this work, we argue that not only a search engine
can benefit from data stored in these logs, but also the web users. We first show how clickthrough logs can be collected in a distributed fashion using the http referer field in web server access logs. We then perform a set of experiments
to study the information value of search engine queries when treated as "tags" or "labels" for the web pages that both appear as a result and the user actually clicks on. We ask how much extra information these query tags provide for
web pages by comparing them to tags from the del.icio.us bookmarking site and to the pagetext. We find that query tags can provide substantially many (on average 250 tags per URL), new tags (on average 125 tags per URL are not present
in the pagetext) for a large fraction of the Web.
Joint work with Hector GarciaMolina and Jawed Karim
Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)
Baynote delivers ondemand recommendation technology and social search for over 200 websites, including Motorola, NASA, Intuit, and Expedia. By tapping into the wisdom of the online community and automatically displaying the best
content and products, Baynote's Collective Intelligence Platform (CIP) improves the user experience while increasing online revenue, leads and impressions. In this talk, I will present Baynote's unified approach to content targeting
and discuss some of the opportunities and challenges in building nextgeneration recommendation systems.
Bio:

Scott Brave is a cofounder and CTO at Baynote, Inc. Prior to Baynote, he was a postdoctoral scholar in the Department of Communication at Stanford University where he served as lab manager
for the CHIMe (Communication between Humans and Interactive Media) Lab. Scott received his Ph.D. in HumanComputer Interaction (Communication Dept.), and B.S. in Computer Systems Engineering (AI focus) from Stanford University, and
his M.S. from the MIT Media Lab. Scott is also a former Associate Editor of the "International Journal of HumanComputer Studies" and coauthor of
Wired for speech: How voice activates and advances the
humancomputer relationship.
Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)
Online social networks have become major and driving phenomena on the web. In this talk we will address key modeling and algorithmic questions related to large online social networks. From the modeling perspective, we raise the question
of whether there is a generative model for network evolution. The availability of timestamped data makes it possible to study this question at an extremely fine granularity. We exhibit a simple, natural model that leads to synthetic
networks with properties similar to the online ones. From an algorithmic viewpoint, we focus on data mining challenges posed by the magnitude of data in these networks. In particular, we examine topics related to influence and correlation
in user activities in such networks.
Bio:

Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. He obtained his PhD in Computer Science
from Cornell University in December 1997. His primary interests are web algorithms, algorithms for large data sets, and theory of computation.
Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)
The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics,
a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful
communities in large realworld networks, we have performed a large scale analysis of a wide range of social and information networks. Our main methodology uses local spectral methods and involves computing isoperimetric properties
of the networks at various size scales  a novel application of ideas from statistics and scientific computation to internet data analysis. Our empirical results suggest a significantly more refined picture of community structure
than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible
communities gradually ``blend in'' with the rest of the network and thus become less ``communitylike.'' This behavior is not explained, even at a qualitative level, by any of the commonlyused network generation models. Moreover,
this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are wellembeddable in a lowdimensional structure, and from small social networks that have
served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis
in modern large social and information networks.
Bio:

Michael Mahoney is currently a research scientist in the math department at Stanford University. He is currently working on geometric network analysis; developing approximate computation and regularization methods for large informatics
graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. His research interests also include randomized algorithms for matrices and regression problems, applications
to DNA single nucleotide polymorphism data, and largescale statistical data analysis more generally. He has been a faculty member at Yale University and a researcher at Yahoo.
An IncentiveBased Architecture for Social Recommendations
Kostas Kollias
Recommender systems and social networking services are important Internet monetization tools and their popularity greatly facilitates webbased commercial activity. In this talk we will focus on how these tools can be coupled in order
to form an architecture that incentivizes users to provide honest recommendations in a social network. The architecture consists of one distinct reputation system for each user, a recommendation language which allows users to place
their reviews, a product ranking algorithm, and a revenue sharing mechanism. We will discuss the details of the architecture, the properties that allow users to profit by placing honest recommendations and how fast these recommendations
lead to an equilibrium state.
Joint work with Rajat Bhattacharjee and Ashish Goel
Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan
Web search is an indispensable resource for users to search for information on the Web. Web search is also an important service for publishers and advertisers to present their content to users. Thus, user satisfaction is the key and
must be quantified. In this talk, our goal is to give a practical review of web search metrics from a user satisfaction point of view. This viewpoint is intuitive enough for a typical user to express his or her web search experience
using it. The metrics that we are planning to cover fall into the following areas: relevance, coverage, comprehensiveness, diversity, discovery freshness, content freshness, and presentation. We will also describe how these metrics
can be mapped to proxy metrics for the stages of a generic search engine pipeline. Our hope is that practitioners can apply these metrics readily and that researchers can get problems to work on, especially in formalizing and refining
metrics.
This short talk is based on a halfday tutorial that the presenter with Kostas Tsioutsiouliklis and Emre Velipasaoglu will present at the WWW'09 conference.
Bio:

Ali Dasdan is a director managing the metrics, analysis, and algorithms group of Yahoo! Web Search. His group is responsible for defining and implementing whitebox metrics for production web search systems, production web search
data, and the Web. His group is also responsible for monitoring the metrics and diagnosing the causes for anomalies detected. His research interests are in web search and advertising since 2006. He obtained his PhD in Computer Science
from the University of Illinois at UrbanaChampaign in 1999.
Trusts among Users of Online Communities  an Epinions Case Study
Hady Lauw
Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In
this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known
web of trust and can be quite poor when the connectivity is very sparse which is often the case in an online community. In this paper, we therefore propose a classification approach to address the trust prediction problem. We develop
a taxonomy to obtain an extensive set of relevant features derived from user attributes and user interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community
that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical results show that the trust among users can be effectively predicted using pretrained classifiers.
Bio
Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSFequivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.Topk Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii
There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: one
has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly wellstudied approaches to achieve efficiency in topk aggregation
include earlytermination algorithms (e.g., TA and NRA) and preaggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches.
We generalize the TA and NRA
algorithms to the case when preaggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain ``instance optimal,'' a very strong optimality notion that is
a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and realworld search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical
benefits of intersection lists can be fully realized only with an earlytermination algorithm.
Joint work with Ravi Kumar, Kunal Punera and Torsten Suel
UtilityMaximizing Privacy Mechanisms
Mukund Sundararajan
A mechanism for releasing information about a statistical database with sensitive data must resolve a tradeoff between utility and privacy. Informally, publishing fully accurate information maximizes utility while minimizing privacy,
while publishing random noise accomplishes the opposite. Formally, privacy can be rigorously quantified using the framework of {\em differential privacy}, which requires that a mechanism's output distribution is nearly the same (in
a strong sense) whether or not a given database row is included or excluded. Thus far, (dis)utility has typically been quantified via the expected error of a (randomly perturbed) response to a query.
We pursue much stronger
and more general utility guarantees. Just as differential privacy guarantees protection against every potential attacker, independent of its side information, we seek a mechanism that guarantees nearoptimal utility to every potential
user, independent of its side information. Formally, we model the side information of a potential user as a prior distribution over query results. An interaction between such a user and a mechanism induces a posterior distribution
 the utility of the mechanism for this user is defined as the accuracy of this posterior as quantified via a userspecific loss function. A differentially private mechanism $M$ is (near)optimal for a given user $u$ if $u$ derives
(almost) as much utility from $M$ as from any other differentially private mechanism.
We prove that for each fixed counting query and differential privacy level, there is a {\em geometric mechanism} $M^*$  a discrete variant
of the wellstudied Laplace mechanism  that is {\em simultaneously utilitymaximizing} for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: {\em every} potential
user $u$, no matter what its side information and loss function, derives as much utility from $M^*$ as from interacting with a differentially private mechanism $M_u$ that is optimally tailored to $u$. The first part of our proof characterizes
the utilitymaximizing differentially private mechanism for a fixed but arbitrary user in terms of a basic feasible solution to a linear program with constraints that encode differential privacy. The second part shows that all of the
relevant vertices of this polytope (ranging over all possible users) are derivable from the geometric mechanism via suitable remappings of its range.
Joint work with Arpita Ghosh and Tim Roughgarden.
Online Advertising
Hamid Nazerzadeh
Online advertising is a multibillion dollar growing market with very interesting properties. I briefly explain some of these properties, and talk about ad allocation problems in the context of sponsored search and display advertising.
References:
 Dynamic CostPerAction Mechanisms and Applications to Online Advertising, with Amin Saberi, and Rakesh Vohra.
Proceedings of the 17th International World Wide Web Conference (WWW), 179188, 2008.  A Combinatorial Allocation Mechanism with Penalties For Banner Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
Proceedings of the 17th International World Wide Web Conference (WWW), 169178, 2008.  Advertisement Allocation for Generalized Second Pricing Schemes, with Ashish Goel, Mohammad Mahdian, and Amin Saberi
Fourth Workshop on Ad Auctions, 2008  Allocating Online Advertisement Space with Unreliable Estimates, with Mohammad Mahdian and Amin Saberi.
Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288294, 2007.
From Bayesian to WorstCase Optimal Auction Design
Tim Roughgarden
We define a general template for auction design that explicitly connects Bayesian optimal mechanism design, the dominant paradigm in economics, with worstcase analysis. In particular, we establish a general and principled way to
identify appropriate performance benchmarks for priorfree optimal auction design. This framework enables, for the first time, meaningful competitive analysis for auction problems with allocation asymmetries, bidder asymmetries, and
costly payments.
The above is joint work with Jason Hartline (from STOC 2008 and ongoing). Time permitting, we will also briefly discuss another recent application of Bayesian optimal auction design: rigorous efficiencyrevenue
approximation tradeoffs (joint work with Shaddin Dughmi and Mukund Sundararajan).
Yury Lifshits, Caltech, http://yury.name
Social Design is an art and science of setting and managing incentives in web applications. So far, machine learning is the dominant approach to all central algorithmic problems in the Web: search, advertising, recommendation systems. Social design can
become an important counterpart to ML, if we find the ways to incentivise users to give us more and better input data and even the answers themselves.
This talk is an outline of a new research field of Social Design. We start with several stories including Internet press conference of president Putin, reputation crash in largest Russian social network, and Yahoo! SearchMonkey
project. Then we attempt to extract, characterize and classify the commonly used patterns in social design. Finaly, we set up the research agenda in the area with focus on social aspects of semantic web and advertising technologies.
Slides available at http://yury.name/talks.html
Bio: Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at St.Petersburg (2007). He is currently a postdoc at Caltech. He won two gold medals at International Mathematical Olympiads, received the Best
Paper Award in Application Track of CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors": http://simsearch.yury.name
Inferring Links and Initiators
Evimaria Terzi,IBM
Consider a 01 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i; j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix we study the
problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators. I will formally define this problem and propose an MCMC framework for estimating the links and the initiators
given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect; instead of considering a single observation matrix M we consider a sequence of observation matrices M1;
: : : ;Mt over time. Finally I will show that this problem is a generalization of several problems studied in the field of socialnetwork analysis.
Bio: Evimaria Terzi is a Research Staff Member
at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue University (USA) in 2002 and BSc from the University of Thessaloniki (Greece) in 2000.
Reputation Mechanisms for Online Markets
Christina Aperjis
In online marketplaces potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller?s reputation score.
TwoStage Myopic Dynamics in Network Formation Games
Esteban Arcaute
We consider twostage myopic dynamics for network formation games, where utility to a node is based on a function of the distance to all other nodes. Since our network formation game is a generalization of Myerson's announcement game, we use pairwise
Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is unbounded (in the number of nodes in the network), our dynamics converge to a subset of pairwise Nash stable networks with a constant efficiency
ratio. For some specific utility functions, we prove that our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.
Measuring and Modeling the Web
Ying Xu
The last couple of decades have witnessed a phenomenal growth in the World Wide Web. The Web has become an ubiquitous channel for information sharing and dissemination. This talk describes several research contributions in an endeavor towards a better
understanding of the Web.
One of the basic questions about the Web is its size, i.e., how many webpages there are on the Web. A closely related
problem is the sizes of search engine indexes, as search engine index defines an important subset of the Web that is most easily accessible to users. In the last ten years both the scientific literature
and the popular press dealt at length with methodologies and estimates for the sizes of the Web and public search engines. In the first part of the talk, we propose algorithms for measuring the Web size,
including both empirical and theoretical results.
The Web is so large that even estimating its size is a hard problem. To better understand the Web, people study random web graph models  a stochastic process that generates random graphs that
with high probability resemble the Web graph. Random web graph models are useful theoretical tools to provide insights to the evolution of the Web graph, to predict future behaviors, and to generate
synthetic data sets for testing and simulation. In the second half of the talk, we briefly present several results in analyzing web graph models and applying the models to real world applications.
Should ad networks bother fighting click fraud?
Bobji Mungamuru
Suppose an ad network decides that a given clickthrough is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that clickthrough. Therefore, arguably, the ad network is "taking money out of his own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".
Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim to provide such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.
Optimal Marketing Strategies over Social Networks
Mukund Sundararajan
Abstract: We study revenue maximization in social networks. We identify a family of strategies called 'influence and exploit' strategies that are easy to implement, easy to optimize over and approximately optimal.
Based on a paper that will appear in WWW 08. Jointly with Jason Hartline (Northwestern University) and Vahab Mirrokni (MSR Redmond).i
Ratio Index for Budgeted Learning, with Applications
Brad Null
In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms for this problem
were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. We present an index for this problem, which we call the
ratio index, which also guarantees a constant factor approximation. Index based policies have the advantage that a single number (i.e. the index) can be computed for each
alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for
the discounted multiarmed bandit problem.
The ratio index has several interesting structural properties. First, it can be computed in strongly polynomial time; the techniques we develop may be useful for
deriving strongly polynomial algorithms for other Markov Decision Problems. Second, with the appropriate discount factor, the Gittins index and the ratio index are constant
factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, the ratio index
can be used to obtain a "discount oblivious" solution to the multiarmed bandit problem, which in turn gives constant factor approximations to a wide class of problems
that we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna
Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an errorprone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from wellknown document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.
I'll talk about a number of computational problems that arise in online advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one manage a portfolio of millions of keywords with minimal human involvement? Challenges
include determining how much to bid for each keyword and what ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will lead to the greatest number of purchases or clicks? The problem is made more
difficult by the fact that different classes of users respond well to different marketing messages, so it's best to use different content for different
user segments.
* Consider an auction where the items to be sold are not known in advance, but rather arrive online, and buyers are willing to pay different amount
for each item (revealed as the items arrive). Each buyer has a maximum limit on the total number of items that they will buy. How should the seller assign
items to buyers so as to maximize the seller's revenue?
An Axiomatic Approach to Personalized Ranking Systems
Alon Altman
Personalized ranking systems and trust systems are an essential tool for collaboration in a multiagent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of wellknown and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.
Ad Auctions with Reserved Price
Arash Asadpour
I will talk about the Ad Auctions when the auctioneer enjoys a relevant ad that she can show in one of the slots if she wants to. The setting is to some extent similar to the Ad Auctions with reserved price. I will explain a randomized auction which is
truthful in expectation for both the bidders and the auctioneer. Also our mechanism is individual rational and budget balanced. I will
discus the revenue of this auction and also will talk about some other settings in which the technique used to design the auction can be
helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham (CMU).
Boosting the area under ROC curve
Phil Long, Google
We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. This boosting can be
performed even in the presence of independent misclassification noise, given access to a noisetolerant weak ranker.
(This is joint work with Rocco Servedio.)
Random input models for Adwords
Aranyak Mehta, Google
I will talk about a new result on Random Input models for the Adwords Problem: What are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst case input. How does the natural "highestbidderwins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for Ad auctions.
Who do you know: The Social Network Revolution
Orkut Buyukkokten, Google
Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence in our lives. Now it's easier to cross paths than ever as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of orkut.com, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users and reflect on the lessons we learned.
Deterministic Decentralized Search in Random Graphs
Esteban Arcaute
We study a general framework for decentralized search in random graphs. Our main focus is on deterministic memoryless search algorithms that use only local information to reach their destination in a bounded number of steps in expectation. This class
includes (with small modifications) the search algorithms used in Kleinberg's pioneering work on longrange
percolation graphs and hierarchical network models. We give a characterization of searchable graphs in
this model, and use this characterization to prove a monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David LibenNowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying
Xu.
Interdomain Routing and Games
Michael Schapira, Hebrew University of Jerusalem ( http://www.cs.huji.ac.il/~mikesch/)
We present a gametheoretic model that captures many of the intricacies of interdomain routing in today's Internet. In this model, the strategic agents are sourcenodes located on a network, who aim to send traffic to a unique destination node . The model
enables complex dynamic interactions between the agents (asynchronous, sequential, and based on
partialinformation).
We provide realistic conditions that guarantee that executing the Border gateway Protocol (BGP),
the only protocol currently used for interdomain routing, is in the best interest of each of the
players. Moreover, we show that even coalitions of players of any size cannot improve their routing
outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design,
these results do not require any monetary transfers (to or by the agents).
Based on joint
work with Hagay Levin and Aviv Zohar.
The Anatomy of Clickbot.A
Neil Daswani, Google
In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet that attempted a lownoise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using a HTTPbased botmaster. Google identified all clicks on its ads exhibiting Clickbot.Alike patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.
Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas
We will first go over a few network worms and contemporary algorithms to detect these network worms. We will then go over some recent techniques to detect these worms using "vunlerability signatures". We will then use this to review two important problems in data stream computation  distinct value estimation and frequent element computation. We show how these algorithms can be used for automatic worm fingerprinting.
Click Fraud Resistant Methods for Learning ClickThrough Rates
Sergei Vassilvitskii
Presenting the paper "Click Fraud Resistant Methods for Learning ClickThrough Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.
Pricing Partially Compatible Products
Adam Meyerson, UCLA
Suppose two competing companies provide roughly equivalent suites of software products. Each version of a product has a possibly distinct price and quality. Customers would like to purchase a low price, highquality set of products. The issue is that
products made by different companies need not be fully compatible.
We reduce the customer's decision process to a minimum cut, and provide
approximation and hardness results for the problem of computing the
company's bestresponse strategy in several different models. We also
introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic Commerce
(EC) 2007, and are joint work with David Kempe, Nainesh Solanki, and
Ramnath Chellappa.
Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar, Google
Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptible for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
Detecting NearDuplicates for Web Crawling
Anish Das Sarma
This is joint work with Gurmeet Singh Manku and Arvind Jain in Google.
Nearduplicate web documents are abundant. Two such
documents differ from each other in a very small portion
that displays advertisements, for example. Such differences
are irrelevant for web search. So the quality of a
web crawler increases if it can assess whether a newly
crawled web page is a nearduplicate of a previously
crawled web page or not.
In my talk I will address
the problem of efficiently detecting nearduplicates
in the web crawl setting. First I shall review Charikar's
fingerprinting technique used in our approach. Then,
I shall formally define the problem and present algorithmic
solutions for two versions of the problem: online queries
(single new document) and batch queries (multiple new
documents). Finally, I shall briefly talk about our
experimental analysis of the techniques.
Clickthrough of and Bidding on keywords
Aleksandra Korolova / Dilys Thomas
Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".
Multipath Routing
Mihaela Enachescu
I will present an approach for enabling multipath routing in large networks. This is based on a project at NEC Labs, and is joint work with Ravi Kokku. Our approach is to create multiple paths between various sourcedestination pairs by selecting a special
subset of relay nodes. We give hardness
results and provide some approximation
algorithms.
This work is motivated by the
observation that given two endhost
in the Internet there exist potentially
many independent or partially overlapping
paths between them. However, only one
of these is usually used by the underlay
routing, leading to many inefficiencies
(the single path routings are usually
slow to adapt in the presence of fluctuating
network conditions). Multipath routing
provides a more adaptive framework,
and it also raises a variety of (still
open) problems for our enjoyment.
Collaborative tagging systems
Paul Heymann
Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keywordbased systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.
Is Efficiency Expensive?
Mukund Sundararajan
We study the simultaneous optimization of efficiency and revenue in payperclick keyword auctions in a Bayesian setting. Our main result is that the efficient keyword auction yields nearoptimal revenue even under modest competition. In the process,
we build on classical
results in auction
theory to prove that
increasing the number
of bidders by the number
of slots outweighs
the benefit of employing
the optimal reserve
price.
This is joint
work with Tim Roughgarden.
Estimating Search Engine Index and Web Sizes
Ying Xu
I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".
Reusable and multiple goods
Ashish Goel
The talk is based on
Online
auctions
with
reusable
goods
by
Mohammad
Hajiaghayi,
Robert
Kleinberg,
Mohammad
Mahdian,
and
David
Parkes,
EC'05
The
Sequential
Auction
Problem
on
eBay:
An
Empirical
Analysis
and
a Solution
by
Adam
I.
Juda
and
David
C.
Parkes,
EC'06
Mechanism design based on CostPerAcquisition (CPA)
Hamid Nazerzadeh
The Internet advertising has changed dramatically by moving from CostPerImpression charging mechanism to CostPerClick (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and some of our results about this problem.
Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert
We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.) We then look at the oneshot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the fullinformation stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal. The distributional assumptions under which our theorems hold are quite general, and arguably hold in all realworld situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.