Skip to content Skip to navigation

Archive of RAIN talks

Spring 13-14

Date Speaker Topic
April 2 Omer Tamuz Strategic Learning and the Topology of Social Networks
April 16 Kostas Bimpikis Information Provision in Dynamic Innovation Tournaments
May 14 Jonathan Levin Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
May 28 Steve Tadelis Quality Externalities and the Limits of Reputation in Two-Sided Markets
June 4 Sinan Aral The Dynamics of Online Reputation

Winter 13-14

Date Speaker Topic
January 29 Derek Ruths Control Profiles of Complex Networks
February 5 Amin Karbasi From Small-World Networks to User-Driven Content Search
February 19 Arun Chandrasekhar Savings Monitors
February 26 Ariel Procaccia Computational Fair Division: From Cake Cutting to Cloud Computing
March 12 Niki Kittur Big Thinking: Augmenting human cognition with crowds and computation
March 19 Ricardo Baeza-Yates OSNs: The Wisdom of a Few and The Value of Users

Autumn 13-14

Date Speaker Topic
September 25 Tuomas Sandholm Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
October 23 Sharad Goel "Going Viral" and the Structure of Online Diffusion
November 06 Itay Fainmesser The Value of Network Information
December 04 Jennifer Neville Prediction in complex networks: The impact of structure on learning and prediction

Spring 12-13

Date Speaker Topic
April 03 Anirban Dasgupta Eliciting and aggregating information from the crowd
April 10 Noam Nisan Should auctions be complex?
April 17 David Pennock The extent of price misalignment in prediction markets
April 24 Yaron Singer Adaptive Seeding in Social Networks
May 08 Cristian Danescu-Niculescu-Mizil Language and Social Dynamics in Online Communities
May 15 Francis Bloch Pricing in Social Networks

Winter 12-13

Date Speaker Topic
Jan 09 Itai Ashlagi Current challenges in kidney exchange - need for long chains and dynamic matc hing
Jan 23 Katrina Ligett Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Jan 30 Al Roth Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Feb 7-8 - DIMACS Workshop on Economic Aspects of Information Sharing
Feb 20 Santo Fortunato Citation networks: Evaluating the impact of science
Mar 06 Michael Bernstein Crowd-Powered Systems

Autumn 12-13

Date Speaker Topic
Oct 03 Devavrat Shah Rumor in a network: Who's the culprit?
Oct 10 Moshe Babaioff On Bitcoin and Red Balloons
Oct 24David Kempe Information Asymmetries in first-price common-value auctions, or The Value of Information
Nov 14 Lirong Xia Random Utility Models for Social Choice
Nov 28 Susan Athey Advertiser Behavior in Sponsored Search Auctions

Spring 11-12

Date Speaker Topic
April 04 Nello Cristianini Automatic Discovery of Patterns in Media Content
April 11 Kamesh Munagala Auctions and Allocations with Positive Network Externalities
April 18 Dan McFarland Adolescent Societies - Their Form, Evolution, and Variation
April 25 Jason Hartline The Simple Economics of Approximately Optimal Auctions
May 2 Khalid El-Arini Beyond Keyword Search: Taming Information Overload with Rich User Interactions
May 9 Kostas Bimpikis Optimal targeting strategies over social networks
May 16 Katharine Anderson Individual Heterogeneity and the Formation of Collaboration Networks
May 23 Dan Iancu DisturbanceFeedback Policies in Dynamic Robust Optimization, with Applications toSupply Chain Design and Revenue Management.
May 30 Kamal Jain Online Concave Matching

Winter 11-12

Date Speaker Topic
January 18 Michael Wellman Price Prediction Strategies for Simultaneous One-Shot Auctions
January 25 George Varghese Lattice Games and the Economics of Aggregators
February 1 Michael Kearns Competitive Contagion in Networks
February 8 Stephen Barley Building an Institutional Field to Corral a Government
February 15 Siddharth Suri Cooperation and Assortativity with Endogenous Partner Selection
February 22 - -
February 29 - -
March 6 Sergiu Hart Two(!) Good To Be True
March 7 Dean Eckles Identifying peer effects in online communication behaviors

Autumn 11-12

Date Speaker Topic
October 5 Jure Leskovec Networks, communities and the ground-truth
October 12 Vincent Conitzer Computing Game-Theoretic Solutions for Security
October 19 Costis Daskalakis Revenue-optimal Auctions
October 26 David Parkes A Random Graph Model of Multi-Hospital Kidney Exchanges
November 2 - -
November 9 Arnout van de Rijt A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
November 16 Markus Mobius Treasure Hunt
November 23 - Thanksgiving
November 30 Yiling Chen Task Routing for Prediction Tasks

Spring 10-11

Date Speaker Topic
April 6 Eric Friedman Bargaining Theory in the Cloud
April 13 Organizational Organizational
April 20 Matthew Jackson Network Patterns of Favor Exchange
April 27 Tim Roughgarden Simple Auctions with Near-Optimal Equilibria
May 4 Kevin Lang Efficient Online Ad Serving in a Display Advertising Exchange
May 11 Sergei Vassilvitskii Designing Algorithms for MapReduce
May 18 Arpita Ghosh Incentivizing High-quality User-Generated Content
May 25 Lada Adamic Coevolution of Network Structure and Content
June 1 Damon Centola -

Winter 10-11

Date Speaker Topic
January 12 Martin Suchara BGP Safety with Spurious Updates - The Conditions of BGP Convergence
January 19 No seminar -
January 26 Gabor Szabo Predicting the Popularity of Online Content
February 2 No seminar -
February 9 No seminar -
February 16 Elisa Celis Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
February 23 Yashodhan Kanoria Social Learning and the Dynamic Cavity Method
March 2 Matthias Mnich The Complexity Ecology of Consensus Ranking

Autumn 10-11

Date Speaker Topic
September 22 Elliot Anshelevich Contribution Games in Social Networks
September 29 Pranav Dandekar Liquidity in Credit Networks: A Little Trust Goes a Long Way
October 6 Sugato Basu User Browsing Models: Relevance versus Examination
October 13 Daria Sorokina Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
October 20 Marc Najork Reflections on Link-based Ranking
October 27
November 3 Panagiotis Papadimitriou Output URL Bidding
November 10 Roberto Bayardo A Technical Overview of Search Ads Quality at Google
November 17 Georgios Pierrakos On the Competitive Ration of Online Sampling Auctions
November 24
December 1 Paolo Parigi The importance of ties

Spring 09-10

Date Speaker Topic
April 7 Gueorgi Kossinets How opinions are received by online communities: a case study on helpfulness votes
April 14 Grant Schoenebeck Reaching Consensus on Social Networks
April 21 Nilesh Dalvi Web-scale Information Extraction Using Wrappers
April 28
No meeting -- WWW 2010
May 5 Andrew Tomkins Characterization and Modeling of Online Browsing Activity
May 12 Xiaolin Shi User grouping behavior in online forums
May 19 Matthew Harding Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing
May 26 Neel Sundaresan Deconstructing A Long Tail Commerce Network
June 2 Monica Rogati Nod vs. High-Five: LinkedIn Connection Strength Models and Applications

Winter 09-10

Date Speaker Topic
Jan 13 Jure Leskovec Network of positive and negative edges
Jan 20 Esteban Arcaute
Social Networks and Stable Matchings in the Job Market
Jan 27 Christina Aperjis Temporal User Behavior in Yahoo Answers
Feb 3 Ashish Goel Some open problems motivated by social networks
Feb 10 Yehuda Koren The Netflix Prize: Quest for $1,000,000
Feb 17 DJ Patil Data Jujitsu - the art of treating data as a product
Feb 24 Panayiotis Tsaparas Query-log analysis for improving web search
Mar 3 Arash Asadpour Stochastic Submodular Optimization
Mar 10 Ravi Kumar Compressibility of Behavioral Graphs
Mar 17 Sep Kamvar Aardvark

Autumn 09-10

Date Speaker Topic
Sep 30 Arvind Narayanan De-anonymizing Social Networks
Oct 7 Nicolas Lambert Eliciting Information on the Distribution of Future Outcomes
Oct 14 Lars Backstrom Optimizing Web Traffic via the Media Scheduling Problem
Oct 21 Michael Kearns Behavioral Experiments in Strategic Networks
Note special location: Terman Engineering Center, Room 453
Oct 28 Matthew Jackson How Social Network Structure affects Diffusion and Learning
Nov 4 Paul Valiant How to Design Profitable Auctions
Nov 11 Mohammad Mahdian Sort me if you can (or, Algorithms on Dynamic Data)
Nov 18 DJ Patil
Nov 25
No meeting -- Happy Thanksgiving!
Dec 2 Abdur Chowdhury Discovery & Emergence
Note special location: Terman Engineering Center, Room 453

Spring 08-09

Date Speaker Topic
Apr 3 Mayur Datar
Internet search advertising: Challenges in targeting and quality considerations
Apr 10
No meeting this week
Apr 17 D Sivakumar
Affiliation Networks
Apr 24
No meeting -- WWW 2009
May 1
No meeting -- BAGT
May 8 Sandeep Pandey
Yahoo! Research
Using Bandit Models in Search Engines
May 14 Richard Chatwin
VP Research, Adchemy
Research Challenges in Online Advertising
May 22 Robert Johnson
Scaling Social Data
May 29 Yury Lifshits
Yahoo! Research
Data Cloud: Integrating Structured Information into Web Search
June 5 Robbie Yan Manipulation Robustness of Collaborative Filtering Systems

Winter 08-09

Date Speaker Topic
Jan 16 - Organizational meeting
Jan 23 G.D. Ramkumar
SnapTell, Inc.
Scalable image recognition algorithms for camera phone applications
Jan 30 Luca de Alfaro Content-Driven Reputation and Trust for Collaborative Content
Feb 6 Ioannis Antonellis Tagging with Queries: How and Why?
(To appear in WSDM 2009)
Feb 13 Scott Brave
Baynote, Inc.
Content Targeting and Recommendation Systems
Feb 20 Ravi Kumar Online Social Networks: Modeling and Mining
Feb 27 Michael Mahoney Community Structure in Large Social and Information Networks
Mar 6 Kostas Kollias An Incentive-Based Architecture for Social Recommendations
Mar 13 Ali Dasdan Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction

Autumn 08-09

Date Speaker Topic
Oct 3 - Talk by Jure Leskovec and organizational meeting
Oct 10 Hady Lauw Trusts among Users of Online Communities -- an Epinions Case Study
Oct 17 Canceled -
Oct 24 Bahman Bahmani Online Budgeted Matching in Random Input Models with
Applications to Adwords
Oct 31 Sergei Vassilvitskii Top-k Aggregation Using Intersection of Ranked Inputs
Nov 7 - No meeting - Bay Area Theory Seminar (BATS)
Nov 14 Mukund Sundararajan Utility-Maximizing Privacy Mechanisms
Nov 21 Hamid Nazerzadeh Online advertising
Nov 28 - No meeting - Happy Thanksgiving!
Dec 5 Tim Roughgarden From Bayesian to Worst-Case Optimal Auction Design

Spring 07-08

Date Speaker Topic
Apr 11BAGT
Apr 18Robbie Yan Reputation Markets
Apr 25Cancelled for WWW
May 2Bobji Mungamuru Click Fraud
May 9Ying Xu Oral Examination
May 16 Esteban Arcaute Network Formation Games
Mar 23Christina Aperjis Reputation Systems
May 30Evimaria Terzi Inferring Links and Initiators
June 6Yury Lifshits Social Design

Winter 07-08

Date Speaker Topic
Jan 25Alon Altman An Axiomatic Approach to Personalized Ranking Systems
Feb 1Ashish Goel Open Problems
Feb 8Cancelled
Feb 15Brian Babcock Adchemy
Feb 22Rajat Bhattacharjee Oral Examination
Feb 29Panagiotis Papadimitriou Web Graph Similarity for Anomaly Detection
Mar 7Brad Null Ratio Index for Budgeted Learning, with Applications
Mar 14Mukund Sundararajan Optimal Marketing Strategies over Social Networks

Autumn 07-08

Date Speaker Topic
Oct 15Michael Schapira Interdomain Routing and Games
Oct 22Esteban Arcaute Decentralized Search of Random Graphs
Oct 29Orkut Buyukkokten Social Network Revolution
Nov 5Aranyak Mehta Random input models for Adwords
Nov 12Phil Long Boosting the area under the ROC curve
Nov 19Thanks-giving
Nov 26cancelled
Dec 3Arash Asadpour Ad auctions with reserved price

Spring 06-07

Date Speaker Topic
Apr 13Ying Xu a discussion on WWW'07 papers
Apr 20BAGT
Apr 27Anish Das Sarma Detecting Near-Duplicates for Web Crawling
May 4Mayur Datar Google News Personalization
May 11Adam Meyerson Pricing Partially Compatible Products
May 18Sergei Vassilvitskii Click Fraud Resistant Methods for Learning Click-Through Rates
May 25Dilys Thomas Streaming Algorithms and Worm Fingerprinting
June 1Neil Daswani Anatomy of Clickbot.A

Winter 06-07

Date Speaker Topic
Jan 26Nicolas Lambert Asymptotically Optimal Repeated Auctions
Feb 2Hamid Nazerzadeh Mechanism design based on CPA
Feb 9Ashish Goel Reusable and multiple goods
Feb 16Ying Xu Estimating Search Engine Index and Web size
Feb 23Mukund Sundararajan Is efficiency expensive?
March 2Paul Heymann Collaborative tagging systems
March 9Mihaela Enachescu Multi-path Routing
March 16Aleksandra Korolova / Dilys Thomas Clickthrough of and Bidding on keywords

Autumn 06-07

Date Speaker Topic
Oct 25 Ying Xu The Netflix Challenge
Nov 1 Ashish/Alexandra Social Networks
Nov 8 Arash/Shubha PageRank
Nov 15 TBA TBA
Nov 22 - Netflix follow-up
Nov 29 Zoltan Gyongyi TBA
Dec 6 Sreenivas Gollapudi TBA

Spring 05-06

Date Speaker Topic
Apr 7 Ashish Goel Axiomatic PageRank
Apr 14 - Organizational Meeting
Apr 21 Sergei Vassilvitskii Relevance Feedback and Web Search
Apr 28 Hamid Nazerzadeh AdWords with Unreliable Estimates
May 5 Mihaela Enachescu Compact Routing
May 12 Ying Xu Streaming Algorithms for Graph Problems
May 19 - -
May 26 Udi Manber New Problems in Search
Jun 2 Rajat Bhattacharjee Incentive Based Ranking Mechanisms

Winter 05-06

Date Speaker Topic
Jan 20 - Organizational Meeting
Jan 27 Brad Null Multiarmed Bandit Problem
Feb 3 Edith Cohen Spatially-Decaying Aggregation
Feb 10 Andrew Tomkins Routing and Growth Models for Social Networks
Feb 17 - BAGT
Feb 24 Amin Saberi Open Problems
Mar 3 Mayur Datar, Ashutosh Garg Recommendation Systems
Mar 10 Amruta Joshi Googlebase

Fall 05-06

Date Speaker Topic
Oct 7 Roee Engelberg Equilibria in Online Games
Oct 14 Ying Xu, Sergei Vassilvitskii Models for the Web Graph
Oct 21 Gagan Aggarwal AdWords
Oct 28 Omar Benjelloun SERF
Nov 4 Zoltan Gyongyi Link Spam
Nov 11 Mihaela Enachescu Query Incentive Networks
Nov 18 - BATS
Dec 2 Rajat Bhattacharjee Survey of Recommendation Systems

Spring 04-05

Date Speaker Topic
Apr 4 Shubha Nabar Impact of Search Engines on Evolution of Page Popularity
Apr 11 Svetlozar Nestorov Mining Structure from Semistructured Data
Apr 18 Sepandar Kamvar Reputation Management in P2P Networks
Apr 25 Rina Panigrahy PageRank Axioms
May 2 Amin Saberi Spread of Viruses on the Internet
May 9 Ramesh Johari Stable Policy Routing with Provider Independence
May 16 Rajat Bhattacharjee Reputation Systems
May 23 Mehran Sahami Matching Text on the Web
Jun 6 Amruta Joshi Modeling Documents

Winter 04-05

Date Speaker Topic
Feb 2 Ying Xu Recommendation Systems/Collaborative Filtering
Feb 9 Ashish Goel Spam and Page Rank
Feb 16 Gagan Aggarwal AdWords
Feb 23 David Arthur Gossip and File Sharing
Mar 2 Anil Kamath AdWords
Mar 10 Sergei Vassilvitskii
Mar 16 Dilys Thomas Social Networks


Strategic Learning and the Topology of Social Networks
Omer Tamuz, Microsoft Research/Caltech

We consider a group of people who each have to make a repeated choice between two lifestyle options, and who observe the choices of their friends in a social network. We assume that one of the two options is universally preferable, that each person's choice affects only his or her own health, and that any benefits are only observed in the distant future.

We model this by a Bayesian game of pure informational externalities, in which agents start with conditionally independent private signals. We show that the question of whether or not the agents learn the preferable choice depends on the topology of the social network. In particular, we identify a geometric "egalitarianism" condition on the social network graph that guarantees learning in infinite networks, or learning with high probability in large finite networks, in any equilibrium of the game.

Joint work with Elchanan Mossel and Allan Sly

Bio: Omer Tamuz is a Schramm Postdoctoral Fellow at Microsoft Research New England / MIT, and an Assistant Professor (on leave) at the faculty of the Humanities and Social Sciences division at Caltech. He received his PhD in mathematics from the Weizmann Institute, where he studied learning in Bayesian games, as well as other topics in machine learning, probability theory and ergodic theory.

Information Provision in Dynamic Innovation Tournaments
Kostas Bimpikis, Stanford

Innovation tournaments have emerged as a viable alternative to the standard research and development process. They are particularly suited for settings that feature a high degree of uncertainty about the feasibility of the innovation goal. Information about the progress of participants in these settings has an interesting dual role. On one hand, participants who have experienced no progress start becoming pessimistic about the underlying environment and whether the innovation goal is even feasible, so learning about their competitors' gradual progress provides a positive signal about the attainability of the goal and provides an incentive for participants to continue putting effort. On the other hand, information about the status of competition may adversely affect effort provision from the laggards, who start getting discouraged about their chances of winning the tournament. Because the tournament's success crucially depends on the effort decision of the participants, the tournament's information provision mechanism is a central feature of its design. This paper explores these issues and suggests a number of design guidelines -- with a focus on the role of intermediate awards as information provision devices -- that help implement the objective of maximizing the probability and minimizing the time of completing the end goal.

Bio: Kostas Bimpikis is an Assistant Professor of Operations, Information and Technology at Stanford University's Graduate School of Business. Prior to joining Stanford, he spent a year as a postdoctoral research fellow at the Microsoft Research New England Lab. He received a PhD in Operations Research from the Massachusetts Institute of Technology in 2010, an MS in Computer Science from the University of California, San Diego and a BS degree in Electrical and Computer Engineering from the National Technical University of Athens, Greece.

Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
Jonathan Levin, Stanford

Consumer auctions were very popular in the early days of internet commerce, but today online sellers mostly use posted prices. Compositional shifts in the items being sold and the sellers offering these items cannot account for this evolution. Instead, the returns to sellers using auctions have diminished. We consider two hypotheses: a shift in buyer preferences away from auctions, and an increase in competition or search efficiency that favors favors posted prices. We use a model and data from eBay to tease these apart, finding that the former is more important, especially for more idiosyncratic products. We also discuss why auctions and posted prices can coexist in online markets.

Bio: Jonathan Levin is Professor of Economics at Stanford University. His research interests include industrial organization and microeconomic theory.

Quality Externalities and the Limits of Reputation in Two-Sided Markets
Steve Tadelis, Berkeley

Buyers in two-sided marketplace platforms may draw conclusions about the quality of the platform from any single transaction. This induces an externality across sellers that reputation mechanisms will not alleviate. Furthermore, buyers who abandon the platform without leaving feedback will cause seller reputations to be biased. Using data from eBay, we document this externality and argue that platforms can mitigate it by actively screening sellers and promoting the prominence of better quality sellers. Exploiting the bias in feedback, we create a measure of seller quality and demonstrate the benefits of our approach through a controlled experiment that prioritizes better quality sellers to a random subset of buyers. We thus highlight the importance of externalities in two-sided markets and chart an agenda that aims to create more realistic models of two-sided markets.

Bio: Steve Tadelis is a professor of economics at Berkeley's Haas School of Business. From 2011 to 2013 he spent a two year leave at ebay Labs, where he put together and led a group of Economists who focus on the economics of e-commerce, with particular attention to creating better matches of buyers and sellers, reducing market frictions by increasing trust and safety in eBay's marketplace, understanding the underlying value of different advertising and marketing strategies, and exploring the market benefits of different pricing structures. Aside from the economics of e-commerce, Steve’s main fields of interest are the economics of incentives, industrial organization and microeconomics. His research has focused on a firm's reputation as a valuable, tradable asset; the effects of contract design and organizational form on firm behavior with applications to outsourcing and privatization; public and private sector procurement and award mechanisms; and the determinants of trust.

The Dynamics of Online Reputation
Sinan Aral, MIT

Identity and reputation drive some of the most important decisions we make online: Who to follow or link to, whose information to trust, whose opinion to rely on when choosing a product or service, whose content to consume and share. Yet, we know very little about the dynamics of online reputation and how it affects our decision making. Sinan will describe a series of randomized experiments that explore the population level behavioral dynamics catalyzed by identity and reputation online. He will explore some of the implications for bias in online ratings, the foundations of social advertising and the ability to generate cascades of behavior through peer to peer social influence. The coming decades will likely see an emphasis on verified identities online. Sinan will argue that a new science of online identity could help guide our business, platform design and social policy decisions in light of the rising importance of online reputation.

Bio: Sinan Aral is a Scientist and an Entrepreneur: He is the David Austin Professor of Management at MIT, where he holds joint Associate Professorships in the IT and Marketing groups and co-leads the Initiative on the Digital Economy. He was the Chief Scientist at SocialAmp, one of the earliest social commerce analytics companies (until its sale in 2012); and is currently the Chief Scientist at Humin, a social navigation startup developing the “Google Maps” for your social relationships.

Sinan is the Scholar-in-Residence at the New York Times R&D Lab and has worked closely with Facebook, Yahoo, Microsoft, Nike, IBM, Intel, Cisco, Oracle, SAP and many other leading Fortune 500 firms on realizing business value from social media and IT investments. His research has won numerous awards including the Microsoft Faculty Fellowship, the PopTech Science Fellowship, an NSF CAREER Award and a Fulbright Scholarship. He was also recently named one of the “World’s Top 40 Business School Professors Under 40” by Poets & Quants.

In his spare time, he cooks, skis and tell jokes about his own cooking and skiing. His most recent hobby is learning from his one year old son.

You can find Sinan on Twitter @sinanaral.

Control Profiles of Complex Networks
Derek Ruths, McGill University

Studying the control properties of complex networks provides insight into how designers and engineers can influence these systems to achieve a desired behavior. Topology of a network has been shown to strongly correlate with certain control properties; in this talk I will discuss the fundamental structures that explain the basis of this correlation. I'll then use these fundamental structures to construct a new statistic, the control profile, that quantifies the different proportions of control-inducing structures present in a network. Using the control profile, I'll show that standard random network models do not reproduce the kinds of control structures that are observed in real world networks. Furthermore, the profiles of real networks form three well-defined clusters that provide insight into the high-level organization and function of complex systems.

From Small-World Networks to User-Driven Content Search
Amin Karbasi, ETH Zurich

In this talk, we will describe a new way of navigating through a database of similar objects using comparisons. In short, at each step, the database presents two objects to the user, who then selects among the pair the object closest to the target that she has in mind. This process continues until, based on the user’s answers, the database can uniquely identify the target she has in mind. This kind of interactive navigation (also known as exploratory search) and its variants have numerous real-life applications in content-based image/multimedia retrieval.

We will argue that the above problem is strongly related to the small-world network design: given a graph embedded in a metric space, how should one augment this graph by adding edges in order to minimize the expected cost of greedy forwarding. We will first show that the small-world network design problem is NP-hard. Given this negative result, we propose a novel mechanism for small-world design and provide an upper bound on its performance. This mechanism has a natural equivalent in the context of content search through comparisons, and we establish both (order optimum) upper and lower bounds for its performance. These bounds are intuitively appealing as they provide performance guarantees in terms of the bias of the distribution of targets, captured by the entropy, as well as the topology of their embedding, captured by the doubling dimension.

Savings Monitors
Arun G. Chandrasekhar, Stanford University

We conduct a field experiment in India to explore two interventions to help individuals to increase their savings balances. First, we design a financial product based on the popular business correspondent model, which includes frequent reminders, assistance in account opening, and the setting of a six-month savings goal. Second, we measure the effectiveness of adding a peer monitoring component to this basic bundle and test whether the local social network can help to increase the penetration of the formal banking system. We ask whether having a monitor substitutes for a formal commitment device, whether individuals choose the most effective monitors, and moreover, whether some community members are better than others at encouraging financial capability.

Computational Fair Division: From Cake Cutting to Cloud Computing
Ariel Procaccia, Carnegie Mellon University

I will present an exciting new interaction between computer science and fair division theory. On the one hand, I will show how computational thinking provides a novel perspective on classic problems in fair division, such as cake cutting and estate division. On the other hand, I will explain how fair division theory is informing the allocation of computational resources. I will also talk about the integration of some of these theoretical ideas into a not-for-profit fair division website that aims to make the world just a bit fairer.

Big Thinking: Augmenting human cognition with crowds and computation
Niki Kittur, Carnegie Mellon University

Despite the growing availability of information online, individual human cognition is fundamentally limited in the speed and amount of information it can process at once. In this talk I will discuss ways of augmenting individual cognition by taking advantage of crowds and computation. Our approaches combine the flexibility of many human minds working together with the raw power of computational systems to accelerate learning, innovation, knowledge production, and scientific discovery. I will discuss several novel crowdsourcing, machine learning and visualization systems that augment the uniquely human abilities of analogy, schema induction, and problem solving. Examples include augmenting human thinking in product innovation (e.g., Quirky); knowledge production (Wikipedia); creative tasks such as writing, journalism and poetry; and making sense of multivariate and graph data.

OSNs: The Wisdom of a Few and The Value of Users
Ricardo Baeza-Yates, Yahoo Labs

One of the main differences between traditional Web analysis and Online Social Networks (OSNs) studies, is that in the first case the information is organized around content, while in the second case it is organized around people. While search engines have done a good job finding relevant content across billions of pages, nowadays we do not have an equivalent tool to find relevant people in OSNs. Even though an impressive amount of research has been done in this direction, there are still a lot of gaps to cover. Although our first intuition could be (and was!) search for popular people, previous research have shown that users' in-degree (e.g. number of friends or followers) is important but not enough to represent the importance and reputation of a person. Another approach is to study the content of the messages exchanged between users, trying to identify topical experts. However the computational cost of such approach - including language diversity - is a big limitation. In our work we take a content-agnostic approach, focusing in frequency, type, and time properties of user actions rather than content, mixing their static characteristics (social graph) and their activities (dynamic graphs). Our goal is to understand the role of popular users in OSNs, and also find hidden important users: Do popular users create new trends and cascades? Do they add value to the network? And, if they don't, who does it? Who generates the content of the OSN? Our research provides preliminary answers for these questions.

This is joint work with Diego Saez-Trumper and colleagues in Brazil and USA.

Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
Tuomas Sandholm, Carnegie Mellon University

Incomplete-information games - such as most auctions, negotiations, and future (cyber)security settings - cannot be solved using minimax search even in principle. Completely different algorithms were needed. A dramatic scalability leap has occurred in our ability to solve such games over the last seven years, fueled largely by the Annual Computer Poker Competition. I will discuss the key domain-independent techniques that enabled this leap, including automated abstraction techniques and approaches for mitigating the issues that they raise, new equilibrium-finding algorithms, safe opponent exploitation methods, techniques that use qualitative knowledge as an extra input, and endgame solving techniques. I will finish by benchmarking poker programs against the best human poker professionals and by discussing what humans can learn from the programs.

"Going Viral" and the Structure of Online Diffusion
Sharad Goel, Microsoft Research NYC

New products, ideas, norms and behaviors are often thought to propagate through a person-to-person diffusion process analogous to the spread of an infectious disease. Until recently, however, it has been prohibitively difficult to directly observe this process, and thus to rigorously quantify or characterize the structure of information cascades. In one of the largest studies to date, we describe the diffusion structure of billions of events across several domains. We find that the vast majority of cascades are small, and are characterized by a handful of simple tree structures that terminate within one degree of an initial adopting "seed." While large cascades are extremely rare, the scale of our data allows us to investigate even the one-in-a-million events. To study these rare, large cascades, we develop a formal measure of what we label "structural virality" that interpolates between two extremes: content that gains its popularity through a single, large broadcast, and that which grows via a multi-generational cascade where any one individual is directly responsible for only a fraction of the total adoption. We find that the very largest observed events nearly always exhibit high structural virality, providing some of the first direct evidence that many of the most popular products and ideas grow through person-to-person diffusion. However, medium-sized events -- having thousands of adopters -- exhibit surprising structural diversity, and are seen to grow both through broadcast and viral means. Finally, we show that our empirical results are largely consistent with an SIR model of contagion on a scale-free network, reminiscent of previous work on the long-term persistence of computer viruses.

The Value of Network Information
Itay Fainmesser, Brown University

Motivated by the proliferation of companies, such as, MySpace and Twitter, whose business models rely, at least in part, in monetizing the information on the interactions and influences of their users, we evaluate how much a monopoly can increase its profit by exploiting individual level data on "who influences whom" when designing his selling strategy. We also analyze the effect that the use of such information has on consumers' welfare. Our framework incorporates a rich set of market products, including goods characterized by global and local network effects. It also has the novel feature to distinguish between the level of influence of a consumer and her susceptibility to influence, a distinction that has been shown to be important empirically.

Joint work with Andrea Galeotti.

Prediction in complex networks: The impact of structure on learning and prediction
Jennifer Neville, Purdue University

The recent popularity of online social networks and social media has increased the amount of information available about users' behavior--including current activities and interactions among friends and family. This rich relational information can be exploited to predict user interests and preferences even when individual data is sparse, as the relationships are a critical source of information that identify potential statistical dependencies among people. Although network data offer several opportunities to improve prediction, the characteristics of real world datasets present a number of challenges to accurately incorporate relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models--which occurs through a complex interaction between local model properties, global network structure, and the availability of observed attributes. By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially-observable social network and social media datasets.

Eliciting and aggregating information from the crowd
Anirban Dasgupta, Yahoo Labs!

Crowdsourcing is now widely used to replace judgement or evaluation by an expert authority with an aggregate evaluation from a number of non-experts, in applications ranging from rating and categorizing online content all the way to evaluation of student assignments in massively open online courses (MOOCs) via peer grading. Two key issues in these settings are: i) how to provide incentives such that agents in the crowd put in their 'full effort' in making good evaluations, as well as truthfully report them and ii) how to aggregate these evaluations, thereby giving an accurate estimate of the ground truth, given that the agents are of varying, unknown, expertise.

In this talk, we look at both of these problems. For the first, we present an information elicitation mechanism for multiple tasks when agents have endogenous proficiencies, with the following property: exerting maximum effort followed by truthful reporting of observations is a Nash equilibrium with maximum payoff.

For the aggregation problem, we consider a model where each task is binary and each agent has an unknown, fixed, reliability that determines the agent's error rate in performing tasks. The problem is to determine the truth values of the tasks solely based on the agent evaluations. We present algorithms whose error guarantees depend on the expansion properties of the agent-task graph--- these algorithms perform well empirically.

Joint work with Nilesh Dalvi, Arpita Ghosh, Vibhor Rastogi and Ravi Kumar.

Should auctions be complex?
Noam Nisan, Hebrew University of Jerusalem

Do simple auctions suffice for obtaining high revenue or do we need to design complex ones? For the case of selling one item, Myerson shows that the answer is "yes": the revenue-maximizing auction is deterministic with a single reserve price. When selling two (or more) items, we show that the answer is "no": complex auctions can yield infinitely more revenue than simple ones, even when there is only a single bidder with an additive valuation. "Complex" may be defined in various ways here including allowing randomization as well as our choice of measuring the auction's menu size. However, when the bidder’s values for the items are independently distributed, the answer is “approximately yes”: selling each of two items simply and separately yields at least half the revenue of any auction.

Joint work with Sergiu Hart.

The extent of price misalignment in prediction markets
David Pennock, Microsoft Research, NYC

We examine misaligned prices for logically related contracts in prediction markets. First, we uncover persistent arbitrages between identical contracts on different exchanges; price support beyond what is in the published order book multiplies the size of these arbitrages. Second, misalignment between logically related contracts listed on the same exchange occurs when contracts systemically shut down or fail to respond efficiently around moments of high information flow. Third, other examples of bounded rationality include: consistent asymmetry between buying and selling, and persistent price lags between exchanges. Fourth, we propose solutions to these problems by fixing logical contradictions within the exchange and providing flexible trading interfaces, freeing traders to focus on providing information in the form they find most natural. Joint work with David Rothschild.

Adaptive Seeding in Social Networks
Yaron Singer, Google and Harvard University

The rapid adoption of social networking technologies throughout the past decade is bringing special attention to algorithmic and data mining techniques designed for maximizing information cascades in social networks. Despite the immense progress which has been made in the past decade, due to limited data access and the structure of social networks the application of state-of-the-art techniques often results in poor performance.

In this talk we will introduce a new framework we call adaptive seeding. The framework is a two-stage model which leverages a phenomenon known as the "friendship paradox" in social networks in order to dramatically increase the spread of information cascades. Our main result shows constant factor approximations are achievable for the most well-studied models of information spreading in social networks. The result follows from new techniques and concepts that may be of independent interest for those interested in stochastic optimization and machine learning.

Joint work with Lior Seeman.

Pricing in Social Networks
Francis Bloch, Ecole Polytechnique, France.

We analyze the problem of optimal monopoly pricing in social networks where agents care about consumption or prices of their neighbors. We characterize the relation between optimal prices and consumers' centrality in the social network. This relation depends on the market structure (monopoly vs. oligopoly) and on the type of externalities (consumption versus price). We identify two situations where the monopolist does not discriminate across nodes in the network (linear monopoly with consumption externalities and local monopolies with price externalities). We also analyze the robustness of the analysis with respect to changes in demand, and the introduction of bargaining between the monopolist and the consumer.

(joint with Nicolas Querou)

Language and Social Dynamics in Online Communities
Cristian Danescu-Niculescu-Mizil, Stanford University and Max Planck Institute SWS.

Much of online social activity takes the form of natural language, from product reviews to conversations on social-media platforms. I will show how analyzing these interactions from the perspective of language use can provide a new understanding of social dynamics in online communities.

I will describe two of my efforts in this direction. The first project leverages insights from psycholinguistics to build a novel computational framework that shows how key aspects of social relations between individuals are embedded in (and can be inferred from) their conversational behavior. In particular, I will discuss how power differentials between interlocutors are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. The second project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a determined lifecycle with respect to their susceptibility to adopt new community norms, and how this insight can be harnessed to predict how long a user will stay active in the community.

This talk includes joint work with Susan Dumais, Michael Gamon, Dan Jurafsky, Jon Kleinberg, Jure Leskovec, Lillian Lee, Bo Pang, Christopher Potts and Robert West.

References: WWW 2013, WWW 2012, WWW 2011.

Current challenges in kidney exchange - need for long chains and dynamic matching
Itai Ashlagi, MIT Sloan

A kidney exchange network establishes a pool of patient-donor pairs, non-directed donors, and patients on the deceased donor waiting lists, and seeks to arrange transplants among this pool.
It has been previously shown that for sufficiently large pools of patient-donor pairs, (almost) efficient kidney exchange can be achieved by using at most 3-way cycles, i.e. by using cycles among no more than 3 patient-donor pairs.
However, as kidney exchange has grown in practice, cycles among n > 3 pairs have proved useful, and long chains initiated by non-directed, altruistic donors have proven to be very effective. We explore why this is the case, both empirically and theoretically. We provide an analytical model of exchange when there are many highly sensitized patients, and show that large cycles of exchange or long chains can significantly increase efficiency when the opportunities for exchange are sparse. As very large cycles of exchange cannot be used in practice, long non-simultaneous chains initiated by non-directed donors significantly increase efficiency in patient pools of the size and com- position that presently exist. Most importantly, long chains benefit highly sensitized patients without harming low-sensitized patients.
One way to make kidney exchange networks thicker is by waiting for pairs to arrive—waiting longer allows the pool to become larger, but imposes costs on those already in the pool. We study empirically and theoretically the impact of the waiting periods on the number of matches.

Based on joint work with David Gamarnik, Alvin Roth, Mike Rees, Patrick Jaillet and Vahideh Manshadi

Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Katrina Ligett, Caltech

In this talk, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and offer them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by.

Joint work with Aaron Roth.


Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Al Roth, Stanford

How should you allocate a scarce resource whose scarcity/availability might be sensitive to how you allocate it? I'll talk about some work I'm just beginning involving the donation and allocation of deceased donor organs for transplantation. This presents a different set of problems from our (more advanced) work on live organ donation and kidney exchange that has received a lot of recent publicity (and was the subject of Itai Ashlagi's recent talk). However the evidence suggests that even for the case of kidney transplantation the current scarcity cannot be solved by live donation alone, and for most organs live donation is not feasible. I'll discuss the new Israeli organ transplant law, and some experiments that will help us understand it better as field data become available. I'll also discuss an experiment in how organ donors are solicited in the United States.

DIMACS Workshop on Economic Aspects of Information Sharing

The DIMACS Workshop on Economic Aspects of Information Sharing is being held at Stanford Feb 7-8, 2013. It is being co-hosted and co-sponsored by RAIN.

In recent years, the collection and analysis of user data over the Internet has experienced a surge. Online services and social networking sites have facilitated and encouraged the online disclosure of a wide array of personal information, while webtracking platforms have enabled the consolidation of user data from disperse sources. The reach of these services has been significantly extended by the integration of mobile devices with the cloud. Companies increasingly view data analytics as an integral part of their operations, and there is an increased need of new means of monetizing data beyond advertising. In parallel, regulatory bodies, consumer advocacy groups as well as Internet companies themselves have struggled to find a balance between the financial benefits of user data monetization and the privacy concerns that it raises.

Motivated by these developments, this workshop aims at studying economic aspects of information sharing from several different aspects, including online data markets, targeted advertising, user incentive mechanisms, and privacy-aware mechanism design.

Citation networks: Evaluating the impact of science
Santo Fortunato, Aalto University, Finland

Citation statistics are regularly adopted to evaluate different actors in the academic and scientific arena, from scholars, to journals, departments, universities, national institutions and whole countries. The scores play a crucial role to decide which grants are awarded, the rankings of applicants for a job, even the fate of scientific institutions. For these reasons citation networks have been carefully investigated over the past decades. I will discuss some recent findings in this area. Citation distributions are universal across many different disciplines, if suitable normalized indicators are adopted as measures of performance, so avoiding the bias of field dependence. Furthermore, citation dynamics is characterized by bursts, usually occurring within a few years since the publication of a paper, and the burst size spans several orders of magnitude. This bursty dynamics can be described by a linear preferential attachment model with time dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.

Crowd-Powered Systems
Michael Bernstein, Stanford

Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.

Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.

In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture the exact right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in realtime. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.

Rumor in a network: Who's the culprit?
Devavrat Shah, MIT

Suppose a malicious "worm" has spread in a network (e.g. Stuxnet) and interest is in finding who is the source (so that many of the following possible actions be taken: report news to the world, reward, take punitive action). Motivated by such scenarios where "information" diffuses in the network starting from an unknown source and interest is in identifying the "source" based on the observed spread, we discuss the problem of finding the source of a rumor in a network. We will design a computationally efficient source estimator for the popular Susceptible-Infected (SI) information spreading model with generic spreading time. In establishing the effectiveness of the estimator, we shall explain the role played by the underlying network structure (a form of expansion property) in determining ability of detection. Connections to generalized Polya's urn, network centrality and number 1 - ln 2 will be explained.

This is based on joint work with Tauhid Zaman (MIT).

On Bitcoin and Red Balloons
Moshe Babaioff, Microsoft Research

Many large decentralized systems rely on information propagation to ensure their proper function. We examine a common scenario in which only participants that are aware of the information can compete for some reward, and thus informed participants have an incentive {\em not} to propagate information to others. One recent example in which such tension arises is the 2009 DARPA Network Challenge (finding red balloons). We focus on another prominent example: Bitcoin, a decentralized electronic currency system.

Bitcoin represents a radical new approach to monetary systems. It has been getting a large amount of public attention over the last year, both in policy discussions and in the popular press \cite{NY11,technology-review}. Its cryptographic fundamentals have largely held up even as its usage has become increasingly widespread. We find, however, that it exhibits a fundamental problem of a different nature, based on how its incentives are structured. We propose a modification to the protocol that can eliminate this problem. Bitcoin relies on a peer-to-peer network to track transactions that are performed with the currency. For this purpose, every transaction a node learns about should be transmitted to its neighbors in the network. As the protocol is currently defined and implemented, it does not provide an incentive for nodes to broadcast transactions they are aware of. In fact, it provides an incentive not to do so. Our solution is to augment the protocol with a scheme that rewards information propagation. Since clones are easy to create in the Bitcoin system, an important feature of our scheme is Sybil-proofness. We show that our proposed scheme succeeds in setting the correct incentives, that it is Sybil-proof, and that it requires only a small payment overhead, all this is achieved with iterated elimination of dominated strategies. We complement this result by showing that there are no reward schemes in which information propagation and no self-cloning is a dominant strategy.

Joint with Shahar Dobzinski, Sigal Oren, and Aviv Zohar.
You can find the paper here and here.
A short description appeared at ACM SIGecom Exchanges.

Information Asymmetries in first-price common-value auctions, or The Value of Information
David Kempe, USC

We study the role of information asymmetries in first-price common value auctions, and how different information structures provide revenue opportunities to third-party information brokers. One of our motivations is Internet ad auctions, where ads for specific site visitors have a common-value nature, in that displaying an ad to a bot has no value, and users are likely to carry the same (or similar) value to different advertisers. Cookies significantly improve the estimate of this value; the fact that different advertisers use cookies to different extents (and use cookies of different qualities) thus leads to significant information asymmetries. These asymmetries can be mitigated or exacerbated when the advertisers obtain cookie information from third-party information brokers.

We focus on a first-price common value auction in which bidders have access to independent discrete signals about the value of an item, which is either 0 or 1. Our first main result is that with two bidders, this auction format has a unique Nash equilibrium; we give a linear-time algorithm for computing this equilibrium and the resulting value of the auction for each player. We then use this equilibrium calculation to understand the value of additional information to the bidders. Third-party information has the interesting property that it can (in principle) be allocated to multiple bidders, and that it carries significant externalities.

We find that the value of additional information is subtle, and depends on the prior and the information already available. We exhibit several surprising possibilities including (1) Additional information about the item does not always result in higher value for a bidder. (2) A bidder may prefer his competitor (but not himself) to receive the broker's signal; this applies both to the more and less informed bidder.

[Joint work with Vasilis Syrgkanis and Eva Tardos]

Random Utility Models for Social Choice
Lirong Xia, Harvard

Social choice studies ordinal preference aggregation with wide applications ranging from high-stakes political elections to low-stakes movie rating websites. In many cases, we want to design a social choice mechanism that reveals the ground truth via MLE inference. Despite its importance, this objective has been largely hindered by the lack of natural probabilistic models and efficient inference algorithms.

In this talk, I will focus on a wide class of probabilistic models called Random Utility Models (RUMs), whose MLE inference was previously believed intractable in general. I will introduce a fast MC-EM algorithm for a very general and natural subclass of RUMs, and discuss its applications and impact on designing better social choice mechanisms. Extension of this algorithm also provides a computational basis for improving models in many applications in economics as well as machine learning, especially learning to rank.

Based on joint work with Hossein Azari Soufiani and David C. Parkes.

Advertiser Behavior in Sponsored Search Auctions
Susan Athey, Stanford

Economists often build "structural models," where they specify a specific model of individual behavior and then use data to estimate the parameters of the model. Although such models require strong assumptions, they have the advantage that they can make principled predictions about what would happen if the environment changed in a way that has not been observed in the past. I will describe the application of these techniques to advertiser behavior in sponsored search advertising auctions, focusing on how the models can be used for marketplace design and management. I will discuss economists' focus on causal inference in statistical models as well as the ways in which experiments can be used to estimate and test structural models. I will also make some suggestions about research directions at the intersection of economics and machine learning.

Automatic Discovery of Patterns in Media Content
Nello Cristianini, University of Bristol

What can we learn about the world (and the media system) by analysing millions of news articles or tweets ? Media content analysis has historically been the domain of the social sciences, but recently we are witnessing a strong trend towards the automation of many tasks, paving the way for a new - computational - approach to social science and even the humanities. In this talk we will survey the results obtained over the past 5 years at the Intelligent Systems Laboratory of Bristol, in the area of automating the analysis of news media content. By combining techniques from machine translation, pattern recognition, statistical learning, information retrieval, we will analyse patterns connected to the US Presidential Elections, to UK public opinion, and to EU cultural biases.

Auctions and Allocations with Positive Network Externalities
Kamesh Munagala, Duke University

With the advent of social networks such as Facebook, Twitter, and LinkedIn, and online offers/deals web sites, network externalties raise the possibility of marketing and advertising to users based on influence they derive from their neighbors in such networks. Indeed, a user's valuation for a product is changed by his knowledge of which of his neighbors likes or owns the product. Designing allocation schemes and auctions in the presence of network externalities is not very well understood, and poses many challenges that are hard to address with traditional techniques. We show that even in very simple settings, optimal auction design becomes APX-Hard, and pricing schemes may not admit to Nash equilibria in buyer behavior. On the positive side, we present approximation algorithms for welfare maximization and optimal auctions in both single item and multi-item settings via interesting linear programming relaxations.
Joint work with Anand Bhalgat and Sreenivas Gollapudi (working paper); and with Nima Haghpanah, Nicole Immorlica, and Vahab Mirrokni (EC 2011).

Adolescent Societies - Their Form, Evolution, and Variation
Daniel McFarland, Stanford University

Adolescent societies - whether arising in weak, short-term classroom friendships or close, long-term friendships - exhibit tendencies toward segregation, hierarchy and clustering. It is difficult, however, to explain the macro-variation we observe in these social networks drawing only on tie formation micro-mechanisms. Some adolescent societies are rank-ordered caste systems and others are flat, cliquish worlds. What accounts for this variation? We propose and test an ecological theory of social network formation where features of organizational environments moderate tie formation processes to account for the macro-structural variation across settings. We develop this argument using longitudinal friendship data on schools (Add Health study) and classrooms (Classroom engagement study), and by extending exponential random graph models to the study of multiple societies over time.
Joint work with James Moody, David Diehl, Jeff Smith, R. Jack Thomas

The Simple Economics of Approximately Optimal Auctions
Jason Hartline, Northwestern University

Systems wherein strategic agents compete for limited resources are ubiquitous: the economy, computer networks, social networks, congestion networks, etc. Auction theory governs the design and analysis of these systems. I will describe a method for using approximation in auction theory to identify simple, practical, and robust auction mechanisms that perform nearly as well as the complex optimal ones. A main goal of this approach is to understand the extent to which important intuition from optimal auctions for ideal models extends to approximately optimal auctions for more realistic models.
Auction theory is well understood only in ideal models where agents have single-dimensional, linear preferences. I.e., each agent has a value for receiving a service and her utility is the difference between her value and her payment. For these models optimal auctions can be interpreted as "marginal revenue" maximizers (Myerson, 1981; Bulow and Roberts 1989). In more realistic models, i.e., where bidders have multi-dimensional preferences for different outcomes or non-linear utility, similar intuitive characterizations are unknown. Understanding good auctions for these environments is considered to be the main challenge in auction theory. In these more realistic environments maximizing marginal revenue may not be optimal, and furthermore, there is sometimes no direct way to implementing the marginal revenue maximization mechanism. I will present two results: I will give procedures for implementing marginal revenue maximization in general, and I will show that marginal revenue maximization is approximately optimal. The approximation factor smoothly degrades in a term that quantifies how far the environment is from an ideal one (i.e., where marginal revenue maximization is optimal).
Joint work with Saeed Alaei, Hu Fu, Nima Haghpanah, and Azarakhsh Malekian.

Beyond Keyword Search: Taming Information Overload with Rich User Interactions
Khalid El-Arini, Carnegie Mellon University

It has been almost two decades since the first search engine appeared on the Web. In that time, much of our lives has drifted online, from the news we read to the products we buy to our relationships with friends, family and colleagues. As such, information overload has become a ubiquitous problem that affects nearly all members of society, as we try to sift through millions of articles and status updates to get an understanding of our world. However, while our information needs have grown more complex and varied over time, the traditional information retrieval paradigm has scarcely changed: users still submit strings of keywords, and receive a ranked list of hyperlinks in return.
In this talk, I argue the need for new models of user interaction that are better suited to modern retrieval tasks. Some tasks, such as reading the news, can be inherently queryless, while others, such as conducting scientific research, require complex queries. In both cases, a user's personal tastes and beliefs can color the results he or she expects to see. Here, I present a general information retrieval framework that leverages ideas from submodular optimization and probabilistic graphical models to unify both settings. I show results in the news and science domains demonstrating that our approach outperforms state-of-the-art alternatives from Google and Yahoo! on real users.
Finally, as personalization itself has become ubiquitous on the Web, it is important to provide users with an interaction mechanism that allows them to inspect--and potentially correct--inferred user profile information. I present recent work that learns such transparent user profiles from user behavior in a large-scale social network.

Optimal targeting strategies over social networks
Kostas Bimpikis, Stanford University

Recent advances in information technology have allowed ^Lrms to gather vast amounts of data regarding consumers' preferences and the structure and intensity of their social interactions. The question that arises naturally is whether ^Lfirms can intelligently use the available data to improve their business strategies. In this presentation, we take a step towards this direction by discussing our results on optimal price discrimination and targeted advertising over networks. In particular, in the ^Lrst part of the talk, we study the pricing problem of a monopolist selling a divisible good (service) to consumers who are embedded in a social network. We derive a characterization of the optimal price for each consumer as a function of her network position and illustrate the value of knowing the underlying network structure by comparing the pro^Lts of a monopolist who does not take the network into account to those of a monopolist who uses this information optimally.
The second part of the talk examines a game-theoretic model of competition between ^Lfirms, which can target their marketing budgets to individuals. We provide a sharp characterization of the optimal targeted marketing strategies and highlight their dependence on the consumers' references as well as on the underlying social network structure. In particular, firms find it optimal to allocate their marketing budgets to consumers in proportion to their \network centrality", a measure of social influence. Moreover, we identify network structures for which targeted advertising is more beneficial for the ^Lfirms and, ^Lnally, we show how the difference in the initial budgets affect the outcome of the marketing competition between them.

Individual Heterogeneity and the Formation of Collaboration Networks
Katharine Anderson, Carnegie Mellon University

Collaboration drives innovation in a variety of contexts, including product development, entrepreneurship, policy-making, and academic research. Collaboration networks--in which two people are connected if they work together on a problem--provide a valuable tool for understanding the structure of these collaborative communities. Empirically, individuals in collaboration networks play a wide variety of roles--some individuals have many links, while others have few links, some bridge between communities while others play a more peripheral role. From a social scientific perspective, we would like to believe that the heterogeneity we observe in social network position reflects some kind of underlying heterogeneity in individuals--that individuals occupy different positions in the social network, not because they were lucky, but because of some identifiable underlying characteristics. However, our current models of network formation use homogeneous agents, and thus cannot be used to answer questions about individual heterogeneity and network structure.
Here, I introduce a model of network formation in which individuals with heterogeneous skill sets choose their collaborators in order to gain access to skills that are complementary to their own. Together, these collaborative connections comprise a collaboration network. This class of models is interesting, because it connects individual heterogeneity to network heterogeneity, and thus can be used to answer questions about 1) how an individual's skill set affects her position on the social network and 2) how the distribution of skills in the population affects the overall structure of the network. Using this model, I show that an individual's degree on the collaboration network is a supermodular function of her set of skills. As a result, the degree distribution of the network may be highly skewed, even when the distribution of skills in the population is not. This skew becomes more pronounced as individuals in the network have fewer of the skills required to solve their problems--that is, as problems get more difficult. I then use an extension of this model to make predictions about the network position of specialists--individuals whose skills lie within a single subject area--and generalists--individuals whose skill bridge different subject areas. I show that while specialists tend to have more links than generalists, generalists are more likely to bridge between communities. I also show that as disciplinary boundaries fade, the degree distribution of the network becomes more skewed towards a few, high-degree superstars.

DisturbanceFeedback Policies in Dynamic Robust Optimization, with Applications toSupply Chain Design and Revenue Management.
Dan Iancu, Stanford University

Robust optimization has recently emerged as a computationallytractable and scalable paradigm for modeling complex decision problems underuncertainty. Its applications span a wide range of challenging problems infinance, pricing, supply chain management or network design. In the context of dynamicrobust decision models, a typical approach is to look for control policies (ordecision rules) that are directly parameterized in the uncertaintiesaffecting the model. This has the advantage of leading to simple convexoptimization problems for finding particular classes of rules (e.g., affine).However, the approach typically yields suboptimal policies, and is hard toanalyze. In the first part of the talk, we seek to bridge this paradigm withthe classical Dynamic Programming (DP) framework for solving dynamic decision problems. We provide a set of unifying conditions (based onthe interplay between the convexity and supermodularity of the DP valuefunctions, and the lattice structure of the uncertainty sets) that, takentogether, guarantee the optimality of the class of affine decision rules, and furthermore allow suchrules to be found very efficiently. Ourresults suggest new modeling paradigms for dynamic robust optimization, and ourproofs bring together ideas from three areas of optimization typically studied separately: robust,combinatorial (lattice programming andsupermodularity), and global (the theory of concave envelopes). Weexemplify the approach in a model for negotiating flexiblesupply chain contracts between several parties, where optimal capacity,contracting, and ordering policies can be found by solving a simple linearprogram. In the second part of the talk, we consider a class of models arisingin supply chain networks and multi-product pricing, for which we suggest a newclass of polynomial decision rules, found by solving compact semidefinite programs,and typically yielding drastic improvement compared with the status quo. This is joint work with Dimitris Bertsimas, PabloParrilo, Mayank Sharma and Maxim Sviridenko.

Online Concave Matching
Kamal Jain, Microsoft Research

Online matching was introduced by Karp, Vazirani, Vazirani [KVV] in 1990, which over time became a corner stone of online algorithms. In the matching problem of KVV, the edges are unweighted. Mehta, Saberi, Vazirani, and Vazirani [MSVV] and Buchbinder, Naor, and Jain, developed algorithms for the weighted version of online matching but at the expense of finding a fractional solution. In many large scale instances, such as matching ads to consumers, fractionality is not a major compromise.
The major compromise is that the existing work allows only hard budget constraint, e.g., so many matches per node or a linear utility function bounded above by a budget constraint, which essentially gives you a very specific concave utility function. In practice people have concave utility functions.
In this work we obtain an online algorithm for concave matching problem. The performance of our algorithm is optimal. We use a convex programming duality. Using variational calculus we show how the worst possible example is dual to the best possible algorithm, thereby proving the optimality of the performance guarantee of our algorithm. In hind sight, one can reconstruct the algorithm by studying the worst possible example itself.
Credits: This is a joint work with Nikhil Devanur.
Acknowledgement: This is a work, I had been doing for the last 7 years. Mark Braverman, Alexander Holroyd, Yuval Peres, and Oded Scramm are some of the people who have guided us.

Price Prediction Strategies for Simultaneous One-Shot Auctions
Michael Wellman, University of Michigan

Bidding in simultaneous auctions is challenging because an agent's value for a given good may depend on the outcome of other auctions. Given the lack of known solutions to general simultaneous auction games, previous works have addressed this classic exposure problem with heuristic strategies employing probabilistic price predictions. We introduce a concept of self-confirming prices, and show that within an independent private value model, bidding optimally with respect to self-confirming price predictions is w.l.o.g. in equilibrium. We exhibit practical procedures to compute approximate equilibrium strategies, and demonstrate via empirical game-theoretic analysis that these strategies are effective in simultaneous auction environments with both complementary and substitutable preference structures.
** This is joint work with Amy Greenwald and Eric Sodomka of Brown University.

Lattice Games and the Economics of Aggregators
George Varghese, UCSD and Yahoo! Research

We model the strategic decisions of web sites in content markets, where sites may reduce user search cost by aggregating content. Example aggregations include political news, technology, and other niche-topic websites. We model this market scenario as an extensive form game of complete information, where sites choose a set of content to aggregate and users associate with sites that are nearest to their interests. Thus, our scenario is a location game in which sites choose to aggregate content at a certain point in user-preference space, and our choice of distance metric, Jacquard distance, induces a lattice structure on the game. We provide two variants of this scenario: one where users associate with the first site to enter amongst sites of equal distances, and a second where users choose uniformly between sites at equal distances. We show that Subgame Perfect Nash Equilibria exist for both games. While it appears to be computationally hard to compute equilibria in both games, we show a polynomial-time satisficing strategy called Frontier Descent for the first game. A satisficing strategy is not a best response but ensures that earlier sites will have positive profits, assuming all subsequent sites also have positive profits. By contrast, we show that the second game has no satisficing solution.
**Joint work with P. Jordan, U. Nadav, K. Punera and A. Skrzypacz at Yahoo

Competitive Contagion in Networks
Michael Kearns, University of Pennsylvania

We examine a game-theoretic model of "competitive contagion" in networks, where two competing companies or other entities have limited budgets to seed initial infections in an underlying social network, which then mediates stochastic contagion. The payoff to each party is their final number of infections, which at equilibrium may come at the expense of the other party. In this model we provide characterizations of the Price of Anarchy and a new notion called the Price of Budgets. These characterizations depend on properties of the local stochastic dynamics of contagion, and in some cases exhibit sharp threshold behavior.
**Joint research with Sanjeev Goyal of the University of Cambridge.

Building an Institutional Field to Corral a Government
Stephen Barley, Stanford University

This talk will describe how the American business community either founded or revitalized eight distinct populations of organizations during the late 1970s and the 1980s which intentionally and unintentionally formed an institutional field (a network of organizations) designed to influence the federal government and create a more business friendly political environment.

Cooperation and Assortativity with Endogenous Partner Selection
Siddharth Suri, Yahoo! Research

The natural tendency for humans to choose with whom to form new relationships and with whom to end established relationships is thought to facilitate the emergence of cooperation. Helping cooperators to mix assortatively is believed to reinforce the rewards accruing to mutual cooperation while simultaneously excluding defectors. However, the relationship between endogenous partner selection, assortativity, and cooperation has been largely unexplored experimentally. Here we report on a series of human subjects experiments in which groups of 24 participants played a multi-player prisoner's dilemma game where, critically, they were also allowed to propose and delete links to players of their own choosing at some variable rate. Over a wide variety of parameter settings and initial conditions, we found that endogenous partner selection significantly increased the level of cooperation, the average payoffs to players, and the assortativity between cooperators. Even relatively slow update rates were sufficient to produce large effects resulting in cooperation levels over 80%. Subsequent increases to the update rate still had a positive, although smaller, effect. For standard prisoner's dilemma payoffs, we also found that assortativity resulted predominantly from cooperators avoiding defectors, not by severing ties with defecting partners, and that cooperation correspondingly suffered. Finally, by modifying the payoffs to satisfy two novel conditions, we found that cooperators did punish defectors by severing ties, leading to levels of cooperation approaching 100% which persisted for longer.
**Joint work with: Jing Wang (NYU) and Duncan Watts (Yahoo! Research)

Two(!) Good To Be True
Sergiu Hart, The Hebrew University of Jerusalem

How to sell goods optimally? While the mechanism design literature has solved this problem neatly when there is only one good, the multiple goods case turns out to be extremely difficult, mathematically and conceptually. Much of what is true for one good does not extend to multiple goods. We will try to explain the difficulties, show what can go wrong, and then present some universal approximation results. The talk is essentially self-contained; no background in mechanism design is necessary.
**Joint work with Noam Nisan and Phil Reny

Identifying peer effects in online communication behaviors
Dean Eckles, Facebook

Peer effects can produce clustering of behavior in social networks, but so can homophily and common external causes. For observational studies, adjustment and matching estimators for peer effects require often implausible assumptions, but it is only rarely possible to conduct appropriate direct experiments to study peer influence in situ. We describe research designs in which individuals are randomly encouraged to perform a focal behavior, which can subsequently influence their peers. Ubiquitous product optimization experiments on Internet services can be used for these analyses. This approach is illustrated with an analysis of peer effects in expressions of gratitude via Facebook on Thanksgiving Day 2010, with implications for the micro-foundations of culture.

Networks, communities and the ground-truth
Jure Leskovec

The Web, society, information, cells and brain can all be represented and studied as complex networks of interactions. Nodes in such networks tend to organize into clusters and communities, which represent the fundamental structures for understanding the organization of complex systems. Even though detection of network communities is of significant importance for computer science, sociology and biology, our understanding of the community structure of large networks remains limited.

We study a set of more than 200 large networks with the goal to understand and identify communities in networks. We challenge the conventional view of network community structure and show that it is not exhibited by the large real-world networks. We then present a new conceptual model of network community structure, which reliably captures the overall structure of networks and accurately identifies the overlapping nature of network communities.

This is joint work with Jaewon Yang.

Computing Game-Theoretic Solutions for Security
Vincent Conitzer

Algorithms for computing game-theoretic solutions are now deployed in real-world security domains, notably air travel. These applications raise some hard questions. How do we deal with the equilibrium selection problem? How is the temporal and informational structure of the game best modeled? What assumptions can we reasonably make about the utility functions of the attacker and the defender? And, last but not least, can we make all these modeling decisions in a way that allows us to scale to realistic instances? I will present our ongoing work on answering these questions.

Joint work with Dmytro Korzhyk, Joshua Letchford, Kamesh Munagala, Ronald Parr (Duke); Manish Jain, Zhengyu Yin, Milind Tambe (USC); Christopher Kiekintveld (UT El Paso); Ondrej Vanek, Michal Pechoucek (Czech Technical University); and Tuomas Sandholm (CMU).

Brief bio: Vincent Conitzer is the Sally Dalton Robinson Professor of Computer Science and Professor of Economics at Duke University. His research focuses on computational aspects of microeconomic theory, in particular game theory, mechanism design, voting/social choice, and auctions. He recently received the IJCAI Computers and Thought Award, which is awarded to outstanding young scientists in artificial intelligence.

Revenue-optimal Auctions
Costis Daskalakis

In his seminal paper, Myerson [1981] provides a revenue-optimal auction for a seller who is looking to sell a single item to multiple bidders. Unfortunately, Myerson's auction generalizes to a limited range of domains, called "single-parameter", where each bidder's preference over the auction’s outcomes is specified by a single parameter. Indeed, extending this auction to "multi-parameter domains", such as selling multiple heterogeneous items to bidders, has been one of the most central problems in Mathematical Economics. We solve this problem in bidder- and item- symmetric settings. For a single bidder, we solve the general problem.

(Based on joint work with Yang Cai and Matt Weinberg)

Bio: Constantinos (or Costis) Daskalakis is an Assistant Professor of EECS at MIT. He completed his undergraduate studies in Greece, at the National Technical University of Athens, and obtained a PhD in Computer Science from UC Berkeley. After Berkeley he was a postdoctoral researcher in Microsoft Research New England, and since 2009 he has been at the faculty of MIT. His research interests lie in Algorithmic Game Theory and Applied Probability, in particular computational aspects of markets and the Internet, social networks, and computational problems in Biology. Costis has been honored with a 2007 Microsoft Graduate Research Fellowship, the 2008 Game Theory and Computer Science Prize from the Game Theory Society, the 2008 ACM Doctoral Dissertation Award, a NSF Career Award, a 2010 Sloan Foundation Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, and the MIT Ruth and Joel Spira Award for Distinguished Teaching.

A Random Graph Model of Multi-Hospital Kidney Exchanges
David Parkes

Multi-hospital kidney exchanges, where hospitals share local lists of donor-patient pairs with a centralized exchange, promise to faciliate larger matches, and thus additional transplants. But a central challenge is to align incentives, so that hospitals will choose to share all pairs with the exchange. We present a random-graph model of feasible transplants, and use the model to (a) explain early experimental results on 2-way and 3-way exchanges anayltically, (b) derive a "square-root law" for welfare gains from centralized pools, and (c) design a matching mechanism that is Bayes-Nash incentive compatible, efficient and individual rational under reasonable assumptions. The robustness of the mechanism is established through a computational study, validating incentive alignment, and estimating an efficiency loss due to incentive alignment of 1-4% for 2-way exchanges and 5-19% for 3-way exchanges (and decreasing with larger pools.)

Joint work with Panos Toulis

A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
Arnout van de Rijt

Contemporary scholars of fame argue that public attention to persons is short-lived. We investigate this "fifteen minutes" hypothesis in a unique data source containing daily records of references to millions of names in 2,500 English-language newspapers, on TV station websites, and on blogs. Longitudinal analysis shows that instead fame tends to carry over from one year to the next, except below some threshold level of marginal public attention where fame does appear to be fleeting. This pattern of above-threshold annual stability is found back even in newspaper entertainment sections, celebrity-oriented tabloids, and blogs, where fame is thought to be most ephemeral. Analysis of archival newspaper data reveals that while coverage of small names follows a pattern of rapid decay, big names traverse a career-like trajectory spanning many years. We take this evidence to suggest that fame is created through an interaction between static social hierarchies and dynamic reinforcement processes, whereby a population is pre-stratified in terms of the chances that a cascade of public celebration will latch on to one of its members.

Treasure Hunt
Markus Mobius

We seed a large real-world social network with binary information and analyze subsequent social learning. A unique feature of our field experiment is that we measure both the pre-existing social networks and the actual conversation network. Our experiment allows us to test how rational agents behave when processing information that originates within their social network. We find that information decays quickly with social distance and that agents mainly incorporate information within social distance 2. Conversations through common friends do not increase the weight that a subject places on signals from direct friends but linearly increases the weight on signals from indirect friends. This suggests that agents are able to avoid double-counting information from indirect friends. We propose a simple “streams model” of social learning that is consistent with the evidence from our experiment.

Task Routing for Prediction Tasks
Yiling Chen

We describe methods for routing a prediction task on a network where each participant can contribute information and route the task onwards. We introduce routing scoring rules that bring truthful contribution of information and optimal routing of the task into a Perfect Bayesian Equilibrium under common knowledge about the amount of information available to each agent on the network. Relaxing the common knowledge assumption, we address the challenge of routing in situations where states of information differ among agents. We introduce a family of local routing rules, which isolate routing decisions that depend only on local information in equilibrium, while still promoting effective routing. Local routing rules are the only routing scoring rules that induce truthful equilibria in which agents' routing decisions rely solely on local information. Simulation results show that local routing decisions can lead to effective task routing.

Joint work with Haoqi Zhang, Eric Horvitz, and David Parkes.

Bargaining Theory in the Cloud
Eric Friedman

The axiomatic theory of bargaining solutions was initiated by John Nash with his seminal paper in 1950 and has a long and mostly mathematical history. Surprisingly, it arises naturally in a variety of allocation problems arising in cloud computing. For example, the second most famous bargaining solution, the Kalai-Smorodinsky solution, is the outcome of a simple water filling algorithm used in the Mesos Platform and has many strong properties in that setting, including incentive compatibility and fairness. In this talk, I will explore these connections for a variety of cloud computing problems and show how axiomatic bargaining theory can be used to analyze allocation problems in the cloud and conversely how cloud computing sheds new light on axiomatic bargaining theory.

This talk is based on joint work with Ali Ghodsi, Scott Shenker and Ion Stoica.

Network Patterns of Favor Exchange
Matthew Jackson

Abstract: We examine the informal exchange of favors among the members of a society. We analyze a game where the fear of losing multiple relationships can provide incentives for favor provision. We characterize the network patterns of exchange that are robust in a sense that deleted relationships only result in a local loss of favor exchange. Such robustness necessitates networks such that all links are "supported": any pair of individuals exchanging favors must have a common friend. We show that favor exchange networks in 75 villages in rural India exhibit a frequency of this sort of support that is significantly higher than a standard network 'clustering' measure.

Co-authors Tomas Rodriguez-Barraquer and Xu Tan

Simple Auctions with Near-Optimal Equilibria
Tim Roughgarden

In practice, auction designers often exchange incentive-compatibility for simplicity (e.g., in sponsored search auctions or combinatorial auctions). How much welfare loss does this design choice cause? I'll talk about some techniques for answering this question.

Based largely on a SODA '11 paper with Kshipra Bhawalkar.

Efficient Online Ad Serving in a Display Advertising Exchange
Kevin Lang

We begin with a tutorial overview of the basic concepts of Non-Guaranteed (i.e. spot-market) display advertising and advertising exchanges, then pose and solve a constrained path optimization problem that lies at the heart of the real-time ad serving task in Yahoo!'s NGD Ad Exchange.

In Yahoo!'s Exchange, the ad server's task for each display opportunity is to compute, with low latency, an optimal valid path through a directed graph representing the business arrangements between the numerous business entities that belong to the Exchange. These entities include not only publishers and advertisers, but also intermediate entities called ``ad networks'' which have delegated their ad serving responsibilities to the Exchange.

Path validity is determined by business constraints which focus on the following three issues: 1) suitability of the display opportunity's web page and its publisher 2) suitability of the user who is currently viewing that web page, and 3) suitability of a candidate ad and its advertiser.

Path optimality is determined by money reaching the publisher, and is affected not only by an advertiser's bid, but also by the revenue-sharing agreements between the entities in the candidate path.

We describe two different algorithms that have both been used in the actual Yahoo! non-guaranteed ad serving system, focusing on typical case rather than worst case performance, and on the optimality-preserving speedup heuristics that allow latency targets to be met.

The talk is based on a WSDM 2011 paper that was co-written with Joaquin Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar, Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael Binderberger-Ortega, Sunil Nagaraj, and Raymie Stata.

Designing Algorithms for MapReduce
Sergei Vassilvitskii

The amount of data available today has lead to analysis problems whose size would be unimaginable just 10 years ago. With datasets routinely measured in gigabytes and terabytes, one challenge that has emerged is scaling algorithms to handle this deluge of information. With chip designers sustaining Moore’s law by turning to parallelism, so have algorithmicists turned to parallelism to speed up algorithms and improve performance. In this talk, we will focus on the MapReduce model of parallel computing, highlight some of the recent algorithmic advances and present current challenges.

Based on joint work with Howard Karloff, Silvio Lattanzi, Ben Moseley and Siddharth Suri.

Incentivizing High-quality User-Generated Content
Arpita Ghosh

We model the economics of incentivizing high-quality user generated content (UGC), motivated by settings such as online review forums, question-answer sites, and comments on news articles and blogs. We provide a game-theoretic model within which to study the problem of incentivizing high quality UGC, in which contributors are strategic and motivated by exposure. Our model has the feature that both the quality of contributions {\em as well as} the extent of participation is determined endogenously in a free-entry Nash equilibrium. The model predicts, as observed in practice, that if exposure is independent of quality, there will be a flood of low quality contributions in equilibrium. An ideal mechanism in this context would elicit both high quality and high participation in equilibrium, with near-optimal quality as the available attention diverges, and should be easily implementable in practice. We consider a very simple elimination mechanism, which subjects each contribution to rating by some number $A$ of viewers, and eliminates any contributions that are not uniformly rated positively. We construct and analyze free-entry Nash equilibria for this mechanism, and show that $A$ can be chosen to achieve quality that tends to optimal, along with diverging participation, as the number of viewers diverges.

Joint work with Preston McAfee.

Coevolution of Network Structure and Content
Lada Adamic

As individuals communicate, their exchanges form a dynamic network. A time series analysis of both standard and novel network metrics can be used to characterize a network's evolution. We demonstrate that network structure alone can be highly revealing of the diversity and novelty of the information being communicated in the network. Networks with a higher conductance exhibit higher information entropy, while unexpected network configurations are indicative of higher information novelty.

This is joint work with Liuling Gong, Edwin Teng, and Avishay Livne.

Damon Centola


BGP Safety with Spurious Updates - The Conditions of BGP Convergence
Martin Suchara

We explore BGP safety, the question of whether a BGP system converges to a stable routing, in light of several BGP implementation features that have not been fully included in the previous theoretical analyses. We show that Route Flap Damping, MRAI timers, and other intra-router features can cause a router to briefly send ?spurious? announcements of less-preferred routes. We demonstrate that, even in simple configurations, this short-term spurious behavior may cause long-term divergence in global routing. We then present DPVP, a general model that unifies these sources of spurious announcements in order to examine their impact on BGP safety. In this new, more robust model of BGP behavior, we derive a necessary and sufficient condition for safety, which furthermore admits an efficient algorithm for checking BGP safety in most practical circumstances -- two complementary results that have been elusive in the past decade's worth of classical studies of BGP convergence in more simple models. We also consider the implications of spurious updates for well-known results on dispute wheels and safety under filtering. Joint work with Alex Fabrikant and Jennifer Rexford.

Predicting the Popularity of Online Content
Gabor Szabo

Online services that collect, categorize, and disseminate user-generated content have seemingly very different focus, user communities, and content types. We present a study on the popularity of content in three online services, Youtube, Twitter, and Digg. While videos on Youtube, short messages in Twitter, and interesting articles on Digg do not necessarily have much in common, it turns out that the patterns of access to content on any of these services are very similar across the sites. In particular, we present predictions for the long-term popularity of content based on early observations, and moreover discuss what role the social network plays in determining the success of a submission.

Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
Elisa Celis

I will first discuss bid data from Microsoft’s AdECN platform, which allows advertisers to target their bids. We find that the valuations are irregular, making the commonly used Vickrey auction unsuitable for extracting revenue. However, Myerson's optimal auction mechanism is often unsuitable in practice.

I will present and discuss the benefits of a simple auction mechanism (BIN-TAC) which extends the second-price auction with reserve. This mechanism is particularly effective in private value environments where the distribution of valuations are irregular, and is truthful in expectation. We use the AdECN data in counterfactual experiments that suggest our BIN-TAC mechanism would improve revenue by 24% relative to the mechanism currently in place.

Social Learning and the Dynamic Cavity Method
Yashodhan Kanoria

Social learning constitutes learning of behavior from interaction with friends/neighbors on a network. This talk will focus on models of repeated interaction, with agents 'voting' in a series of rounds on some issue of interest. Votes in the initial round are based on 'private signals', whereas votes in future rounds incorporate knowledge of previous votes cast by friends.

We look at two different models of iterative learning. A very simple model is 'majority dynamics' where agents choose their vote based on the majority of neighbors' votes in the previous round. We analyze this model on regular trees. At the other extreme is a iterative Bayesian learning: a fully rational model introduced by Gale and Kariv (2003). We introduce new algorithms for this model, challenging the widespread belief that learning is computationally intractable in this setting. We develop a novel technique -- the 'dynamic cavity method', that serves as a key tool for both models.

Based on joint work with Andrea Montanari (to appear in Ann. App. Prob.) and Omer Tamuz (submitted).

The Complexity Ecology of Consensus Ranking
Matthias Mnich

Preference lists are extensively used in social science surveys and voting systems to capture information about choice. In many such scenarios there arises the need to combine the data represented by many lists into a single list which reflects the opinion of the surveyed group as much as possible. This yields a class of ranking aggregation problems, where we are given a set of permutations (votes) over a set of alternatives (candidates), and are asked for a permutation of the set candidates such that some objective is minimized.Examples include Kemeny ranking and other consensus methods, that find important applications in web search engines, e-mail spam detection, and user recommendation systems.

We present results on this topic from a multivariate complexity and algorithms analysis, based on parameterized complexity.

(Joint work with Henning Fernau, Fedor Fomin, Daniel Lokshtanov, Geevarghese Philip and Saket Saurabh)

Contribution Games in Social Networks
Elliot Anshelevich

We consider network contribution games, where each agent in a social network has a budget of effort that it can contribute to different collaborative projects or relationships. Depending on the contribution of the involved agents a relationship will flourish or drown, and to measure the success we use a reward function for each relationship. Every agent is trying to maximize the reward from all relationships that it is involved in. We consider pairwise equilibria of this game, and characterize the existence, computational complexity, and quality of equilibrium based on the types of reward functions involved. For example, when all reward functions are concave, we prove that the price of anarchy is at most 2. For convex functions the same only holds under some special but very natural conditions. A special focus of the talk are minimum effort games, where the reward of a relationship depends only on the minimum effort of any of the participants. Finally, we show tight bounds for approximate equilibria and convergence of dynamics in these games.

Liquidity in Credit Networks: A Little Trust Goes a Long Way
Pranav Dandekar

Credit networks represent a way of modeling trust between entities in a network. Nodes in the network print their own currency and trust each other for a certain amount of each other's currency. This allows the network to serve as a decentralized payment infrastructure--- arbitrary payments can be routed through the network by passing IOUs between trusting nodes in their respective currencies---and obviates the need for a common currency. Nodes can repeatedly transact with each other and pay for the transaction using trusted currency. A natural question to ask in this setting is: how long can the network sustain liquidity, i.e. how long can the network support the routing of payments before credit dries up? We answer this question in terms of the long term success probability of transactions for various network topologies and credit values.

We show that the success probability of transactions is independent of the path used to route .ow between nodes. For symmetric transaction rates, we show analytically and via simulations that the success probability for complete graphs, Erdos-Renyi graphs and preferential attachment graphs goes to one as the size, the density or the credit capacity of the network increases. Finally, we compare liquidity in credit networks to that in a centralized payment infrastructure and show that credit networks are at most a constant factor worse; thus we do not lose much liquidity in return for their robustness and decentralized properties.

Joint work with Ashish Goel, Ramesh Govindan and Ian Post.

User Browsing Models: Relevance versus Examination
Sugato Basu

There has been considerable work on user browsing models for search engine results, both organic and sponsored. The click-through rate (CTR) of a result is the product of the probability of examination (will the user look at the result) times the perceived relevance of the result (probability of a click given examination). Past papers have assumed that when the CTR of a result varies based on the pattern of clicks in prior positions, this variation is solely due to changes in the probability of examination.

We show that, for sponsored search results, a substantial portion of the change in CTR when conditioned on prior clicks is in fact due to a change in the relevance of results for that query instance, not just due to a change in the probability of examination. We then propose three new user browsing models, which attribute CTR changes solely to changes in relevance, solely to changes in examination (with an enhanced model of user behavior), or to both changes in relevance and examination. The model that attributes all the CTR change to relevance yields substantially better predictors of CTR than models that attribute all the change to examination, and does only slightly worse than the model that attributes CTR change to both relevance and examination. For predicting relevance, the model that attributes all the CTR change to relevance again does better than the model that attributes the change to examination. Surprisingly, we also find that one model might do better than another in predicting CTR, but worse in predicting relevance. Thus it is essential to evaluate user browsing models with respect to accuracy in predicting relevance, not just CTR.

Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
Daria Sorokina

Ensembles of regression trees are long known as powerful models suitable for a large variety of tasks. The most well-known ensembles are Random Forests (averaging independent large, highly non-linear trees) and Gradient Boosting (additive model built from small trees). In the beginning of the talk I will describe an ensemble that combines the benefits of both: Additive Groves is based on regression trees, additive models and bagging and is capable of both fitting additive structure of the problem and modelling its highly nonlinear components with very large trees at the same time. Combination of these properties makes Additive Groves superior in performance to other existing tree ensemble methods.

Models with best performance often act as black boxes: they make good predictions, but do not provide much insight into the decision making process. This is not satisfactory for many data mining problems. Therefore separate post-processing techniques are needed to answer the questions about important features, their effects and interactions.

The term statistical interaction is used to describe the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Thus, detecting statistical interactions can be critical for an understanding of processes by domain researchers. In this talk I will describe an approach to interaction detection based on comparing the performance of unrestricted and restricted prediction models, where restricted models are prevented from modeling an interaction in question. I will show that Additive Groves has the required properties and can be used within this framework.

I will also talk about several applications of these techniques to real-world data: tracking environment change based on birds abundance and Salmonella risk prediction on meat-processing establishments. Apart from that, Additive Groves has been successfull in several recent competitions: KDD Cup'09, ICDM'09 and Yahoo Learning to Rank Challenge. The latter result is especially interesting, because Additive Groves was the only winning solution that did not optimize for any specific ranking metrics.

Algorithms presented in the talk are implemented as a part of an open source C++ package available at

At the end of the talk I will make a short presentation about Yandex and its California office ( We are hiring!

Reflections on Link-based Ranking
Marc Najork

This talk focuses on using hyperlinks in the ranking of web search results. We give a brief overview of the vast body of work in the area; we provide a quantitative comparison of the different features; we sketch how link-based ranking features can be implemented in large-scale search engines; and we identify promising avenues for future research.

Output URL Bidding
Panagiotis Papadimitriou

Output URL bidding is a new bidding mechanism for sponsored search, where advertisers bid on search result URLs, as opposed to keywords in the input query. For example, an advertiser may want his ad to appear whenever the search result includes the sites and, instead of bidding on keywords that lead to these sites, e.g., movie titles, actor names, and so on. In this paper we study the tradeoff between the simplicity and the specification power of output bids and we explore their utility for advertisers. We first present a model to derive output bids from existing keyword bids. Then, we use the derived bids to experimentally study output bids and contrast them to input query bids. Our main results are the following: (1) Compact output bids that mix both URLs and hosts have the same specification power as more lengthy input bids; (2) Output bidding can increase the recall of relevant queries; and (3) Output and input biding can be combined into a hybrid mechanism that combines the benefits of both.

A Technical Overview of Search Ads Quality at Google
Roberto Bayardo

Abstract: Ads quality involves optimizing the oftentimes competing interests of Google, the users of its search engine, and advertisers who pay to have their ads displayed on search results pages. In this talk I describe the primary problems faced by the Google ads quality team, and some of the general approaches we take to address them. My goal of the talk is to introduce others to problems of interest beyond the better studied problems of click through estimation and mechanism design. The topics covered will include keyword targeting & bid specification, budget allocation, user based ads quality, ad formats, when to show an ad and where, and assessing quality of ad creatives, their landing pages, and the entire "post click" experience.

Bio: Roberto Bayardo is a principal research scientist at Google, where he has been working within the ads quality and ads serving teams for the past 5 years. Roberto currently leads a team working on quality modeling, with the broad goal of ensuring that ads displayed alongside Google search provide a good user experience.

On the Competitive Ration of Online Sampling Auctions
Georgios Pierrakos

We study online profit-maximizing auctions for digital goods with adversarial bid selection and uniformly random arrivals. Our goal is to design auctions that are constant competitive with F^(2); in this sense our model lies at the intersection of prior-free mechanism design and secretary problems. We first give a generic reduction that transforms any offline auction to an online one, with only a loss of a factor of 2 in the competitive ratio; we then present some natural auctions, both randomized and deterministic, and study their competitive ratio; our analysis reveals some interesting connections of one of these auctions with RSOP, which we further investigate in our final section.

The importance of ties
Paolo Parigi

For sociologists, the ties that join individuals into relationships are the result of social processes rather than being entities themselves. A social network is therefore a representation for something that happens between individuals and that brings people together. Sometimes, however, social processes sever ties and produce an increase in the distance that separates individuals, ultimately generating isolation. In my talk I will first provide a few examples of social processes that generate ties and then I will advance a theoretical scenario for studying processes that instead decrease the number of ties. The positive examples of tie generation come from my work in the fields of political sociology and organization theory. The negative scenario of increasing isolation comes from my work on social network analysis. In presenting these examples, my goal is to stimulate the conversation between social scientists and computer scientists in the field of network analysis.

How opinions are received by online communities: a case study on helpfulness votes
Gueorgi Kossinets

There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is, where reviews come with annotations like "26 of 32 people found the following review helpful." Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, "What did Y think of X?", we are asking, "What did Z think of Y's opinion of X?" Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also but also in subtle ways on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review "plagiarism" to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from ifferent countries.

Reaching Consensus on Social Networks
Grant Schoenebeck

Research in sociology studies the effectiveness of social networks in achieving computational tasks. Typically the agents who are supposed to achieve a task are unaware of the underlying social network except their immediate friends. They have limited memory, communication, and coordination. These limitations result in computational obstacles in achieving otherwise trivial computational problems. One of the simplest problems studied in the social sciences involves reaching a consensus among players between two alternatives which are otherwise indistinguishable. In this paper we formalize the computational model of social networks. We then analyze the consensus problem as well as the problem of reaching a consensus which is identical to the majority of the original signals. In both models we seek to minimize the time it takes players to reach a consensus.

Web-scale Information Extraction Using Wrappers.
Nilesh Dalvi

A significant portion of web pages embed interesting and valuable semantic content suitable for structured representation. The traditional Information Extraction techniques, however, fall short of achieving high quality extraction at Web scale. In this talk, I will outline some of the work going on at Yahoo! Research on addressing the challenges of Information Extraction on a Web scale. I will focus on {\em wrapper} based techniques, which exploit the HTML structure of websites to extract the information of interest. I will address two problems : (i) making wrappers more robust to changes in websites, and (ii) enabling learning of wrappers from automatically obtained noisy training data.s

Characterization and Modeling of Online Browsing Activity.
Andrew Tomkins

I will describe a set of studies of online user behavior based on search and toolbar logs. In these studies, we propose a new "CCS" taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (web search, item search, multimedia search). We show that roughly half of all pageviews online are content, 1/3 are communications, and the remaining 1/6 are search. We study the interarrival distribution of visits to a page by a single user and in aggregate, and show some evidence that "bursty" behavior in which a URL attains significant transient popularity for a few days is not a significant contributor to interarrival times. We also study the nature and frequency of search queries in more detail, differentiating between web search, item search, and multimedia search, and characterizing the prevalence of references to structured objects in web search queries. Navigation patterns in this data are not well characterized as trails, but should instead be viewed as trees due to the presence of multi-tab or multi-window browsing.  We present a model of tabbed browsing that  represents a hybrid between a markov process capturing the graph of web pages, and a branching process capturing the creation, splitting, and dying of tabs.  We present a mathematical criterion to characterize whether the process has a steady state independent of initial conditions, and we show how to characterize the limiting behavior in both cases.   We perform a series of experiments to compare our tabbed browsing model with the simpler random surfer model. Joint work with Ravi Kumar and Flavio Chierichetti.

User grouping behavior in online forums.
Xiaolin Shi

Online forums represent one type of social media that is particularly rich for studying human behavior in information seeking and diffusing. The way users join communities is a reflection of the changing and expanding of their interests toward information. In this talk, I present the work studying the patterns of user participation behavior, and the feature factors that influence such behavior on different forum datasets. In this work, we find that, despite the relative randomness and lesser commitment of structural relationships in online forums, users' community joining behaviors display some strong regularities. One particularly interesting observation is that the very weak relationships between users defined by online replies have similar diffusion curves as those of real friendships or co-authorships. We build social selection models, Bipartite Markov Random Field (BiMRF), to quantitatively evaluate the prediction performance of those feature factors and their relationships. Using these models, we show that some features carry supplementary information, and the effectiveness of different features vary in different types of forums. Moreover, the results of BiMRF with two-star configurations suggest that the feature of user similarity defined by frequency of communication or number of common friends is inadequate to predict grouping behavior, but adding node-level features can improve the fit of the model.

Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing.
Matthew Harding

In this paper we introduce a new Poisson mixture model for count panel data where the underlying Poisson process intensity is determined endogenously by consumer latent utility maximization over a set of choice alternatives. This formulation accommodates the choice and count in a single random utility framework with desirable theoretical properties. Individual heterogeneity is introduced through a random coefficient scheme with a flexible semiparametric distribution. We deal with the analytical intractability of the resulting mixture by recasting the model as an embedding of infinite sequences of scaled moments of the mixing distribution, and newly derive their cumulant representations along with bounds on their rate of numerical convergence. We further develop an efficient recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling scheme, and show posterior consistency. We apply our model to a recent household panel of supermarket visit counts. We estimate the nonparametric density of three key variables of interest - price, driving distance, and their interaction - while controlling for a range of consumer demographic characteristics. We use this econometric framework to assess the opportunity cost of time and analyze the interaction between store choice, trip frequency, search intensity, and household and store characteristics. We also conduct a counterfactual welfare experiment and compute the compensating variation for a 10% to 30% increase in Walmart prices.

Deconstructing A Long Tail Commerce Network.
Neel Sundaresan

In this talk we discuss modeling a complex long tail commerce network in the form of the eBay Marketplace graph. The asymmetric and preferential nature of buying and selling in terms of structure, interactions, trust and reputation measures and evolution of these has tremendous value in identifying business opportunities and building effective user applications.The discussion focuses on the analysis of the macroscopic shape of the networks across the platform and also in specific marketplace categories. The results suggest significant diversity across these individual marketplaces from collectors network to retail network. We also discuss motif profiling and preferential connections. Means for defining trust measures based on network properties will also be addressed.

Nod vs. High-Five: LinkedIn Connection Strength Models and Applications.
Monica Rogati

The low cost of link formation on social and professional networking sites leads to connections with heterogeneous relationship strengths, a LinkedIn user might be connected to both professional acquaintances and close collaborators. Modeling this relationship as non-binary is more realistic (and more useful in personalizing a user's experience) than the binary links that have been the focus of previous work in the social networking space. We present an unsupervised, link-based latent variable model to estimate relationship strength from interaction activity (e.g., communication, browsing, comments) and profile content similarity. We also discuss practical applications of modeling connection strength when building data driven products that enhance the LinkedIn user experience and engagement. (This is joint work with Rongjing Xiang during her internship at LinkedIn.)

Network of positive and negative edges
Jure Leskovec

Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. We study how the interplay between positive and negative relationships affects the structure of on-line social networks, whereas in contrast the bulk of research on social networks to date has focused almost exclusively on positive interpretations of links between people. We connect our analyses to theories of signed networks from social psychology. In doing so we find that the classical theory of structural balance tends to capture certain common patterns of interaction, but that it is also at odds with some of the fundamental phenomena we observe --- particularly related to the evolving, directed nature of these on-line networks. We then develop an alternate theory of status that better explains the observed edge signs and provides insights into the underlying social mechanisms. Our work provides one of the first large-scale evaluations of theories of signed networks using on-line datasets, as well as providing a perspective for reasoning about social media sites.

Joint work with Daniel Huttenlocher and Jon Kleinberg.

The paper can be accessed here.

Social Networks and Stable Matchings in the Job Market
Esteban Arcaute

In this talk we introduce and study a model that considers the job market as a two-sided matching market, and accounts for the importance of social contacts in finding a new job. We assume that workers learn only about positions in firms through social contacts. Given that information structure, we study both static properties of what we call locally stable matchings, a solution concept derived from stable matchings, and dynamic properties through a reinterpretation of Gale-Shapley's algorithm as myopic best response dynamics. We prove that, in general, the set of locally stable matching strictly contains that of stable matchings and it is in fact NP-complete to determine if they are identical. We also show that the lattice structure of stable matchings is in general absent. Finally, we focus on myopic best response dynamics inspired by the Gale-Shapley algorithm. We study the efficiency loss due to the informational constraints, providing both lower and upper bounds.

This is joint work with Sergei Vassilvitskii, Ramesh Johari and David Liben-Nowell. Some of the results presented appeared in WINE 2009.

Temporal User Behavior in Yahoo Answers
Christina Aperjis

When searching for an answer to a question, people generally prefer to get both high quality and speedy answers. The fact that it usually takes longer to find a better answer creates a speed-accuracy tradeoff which is inherent in information seeking. A natural setting to study this tradeoff is Yahoo Answers, a community-driven question-and-answer site. We identify and quantify how question authors trade off the number of answers they receive and the cost of waiting. We find that users are willing to wait more to obtain an additional answer when they have only received a small number of answers; this implies decreasing marginal returns in the amount of collected information. We also estimate the user's utility function from the data.

This is joint work with Bernardo A. Huberman and Fang Wu.

The Netflix Prize: Quest for $1,000,000
Yehuda Koren

The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. Their significant economic implications made collaborative filtering techniques play an important role at known e-tailers such as Amazon and Netflix. This field enjoyed a surge of interest since October 2006, when the Netflix Prize competition was commenced. Netflix released a dataset containing 100 million anonymous movie ratings and challenged the research community to develop algorithms that could beat the accuracy of its recommendation system, Cinematch. In this talk I will survey the competition together with some of the principles and algorithms, which have led us to winning the Grand Prize in the competition.

Data Jujitsu - the art of treating data as a product
DJ Patil

LinkedIn is the premiere professional social network with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is their unique data. While most organizations consider data as a service function, LinkedIn considers data a cornerstone of their product portfolio. This emphasis on data as a product led LinkedIn to be the first social network site to deploy a People You May Know service. Other data products that LinkedIn has built include Who Viewed My Profile, People Who Viewed This Profile Also Viewed..., the backend standardization algorithms that power search, and matching algorithms. To rapidly develop these products LinkedIn leverages a number of technologies including open source, 3rd party solutions, and some we've had to invent along the way. In this presentation, we will talk about our methodology for turning data into user facing products and some of the major challenges we see in the data space.

Query-log analysis for improving web search
Panayiotis Tsaparas

Search engines log the activity of users when posing queries and interacting with the results. Query click logs maintain information about what queries were asked, what results were shown and which ones were clicked as a result of the query. This information offers a valuable insight in the intent of the user and the need behind the query, and it has been the focus of intense research activity. In this talk, we will describe algorithms for query-log analysis. First we will talk about an algorithm that uses query logs for generating advertising keywords and understanding the category of a query. Time permitting, we will then describe an algorithm for generating training data for learning a ranking function. In both cases we model the query log as a graph, and we formulate our problem as an optimization problem over these graphs.

Stochastic Submodular Optimization
Arash Asadpour

We study stochastic submodular maximization problem with respect to cardinality and matroid constraints. Our model can capture the effect of uncertainty in different problems, such as cascade effects in social networks, capital budgeting, sensor placement, etc. We study non-adaptive and adaptive policies and give "optimal" constant approximation algorithms for both cases. We also show that the adaptivity gap in this setting is 1.59.

Compressibility of Behavioral Graphs
Ravi Kumar

Graphs resulting from human behavior (the web graph, social networks, etc.) have hitherto been viewed as a monolithic class with similar characteristics. Our focus here is on the compressibility of such graphs. It has been empirically shown that Web graphs can be compressed down to three bits of storage per edge. In light of this, we address two basic questions: are social networks as compressible as Web graphs and are there tractable and realistic models that yield highly compressible graphs?

De-anonymizing Social Networks
Arvind Narayanan

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.

Arvind Narayanan is a post-doctoral researcher working with Dan Boneh. He recently finished his Ph.D at the University of Texas at Austin, advised by Vitaly Shmatikov. His research is on the privacy and anonymity issues involved in publishing large-scale datasets about people. His thesis, in a sentence, is that the level of anonymity that society expects -- and companies claim to provide -- in published databases is fundamentally unrealizable. He blogs about his anonymity-breaking efforts and other research at

Eliciting Information on the Distribution of Future Outcomes
Nicolas Lambert

This paper studies the problem of inducing a presumably knowledgeable expert to reveal true information regarding the probability distribution of a future random outcome. The information being considered is quite general, and includes in particular the statistics of the probability distribution (such as mean, median, variance), and categorical information (such as the most correlated pair of variables). I examine two types of incentive schemes: Those that reward the expert for being truthful, and, for the case of numerical information, those that reward the expert increasingly with the accuracy of the prediction. For both cases, I establish simple criteria to determine the information that can be elicited with such schemes, and offer a complete characterization of the associated schedule fee.

Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom

Website traffic varies through time in consistent and predictable ways, with highest traffic in the middle of the day. When providing media content to visitors, it is important to present repeat visitors with new content so that they keep coming back. In this paper we present an algorithm to balance the need to keep a website fresh with new content with the desire to present the best content to the most visitors at times of peak traffic. We formulate this as the me- dia scheduling problem, where we attempt to maximize total clicks, given the overall traffic pattern and the time varying clickthrough rates of available media content. We present an efficient algorithm to perform this scheduling under certain conditions and apply this algorithm to real data obtained from server logs, showing evidence of significant improvements in traffic from our algorithmic sched- ules. Finally, we analyze the click data, presenting models for why and how the clickthrough rate for new content declines as it ages.

Behavioral Experiments in Strategic Networks
Michael Kearns

For four years now, we have been conducting "medium-scale" experiments in how human subjects behave in strategic and economic settings mediated by an underlying network structure. We have explored a wide range of networks inspired by generative models from the literature, and a diverse set of collective strategic problems, including biased voting, graph coloring, consensus, and networked trading. These experiments have yielded a wealth of both specific findings and emerging general themes about how populations of human subjects interact in strategic networks. I will review these findings and themes, with an emphasis on the many more questions they raise than answer.

How Social Network Structure affects Diffusion and Learning
Matthew Jackson

We examine how diffusion and learning processes operating through social networks are affected by homophily (the tendency of individuals to associate with others similar to themselves) and other network characteristics. Homophily has no effect in pure diffusion processes where only shortest paths matter; only connection density matters. In contrast, homophily substantially slows learning based on repeated averaging of neighbors' information, while connection density has no effect.

How to Design Profitable Auctions
Paul Valiant

This talk is a high-level overview of results from two years of work with Silvio Micali and Jing Chen on a question that may best be phrased as: what is the best way to sell many items to many players? The standard formalizations of this question were fleshed out in the 1960's and led to answers that are unsatisfactory in practice, leading to today's Ebay that sells items by the auction method known to the ancient Greeks. In this talk we present new formulations of this question, elicited by fundamental notions from computer science: competitive analysis, collusion, and "knowledge". We present several concrete (and near-optimal) auctions in these models, along with many open questions.

Sort me if you can (or, Algorithms on Dynamic Data)
Mohammad Mahdian

We formulate and study a new computational model for dynamic data. In this model the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has a limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability. If time permits, we will discuss new results on graph algorithms in our framework.

This is based on joint work with Aris Anagnostopoulos, Ravi Kumar, and Eli Upfal.

Discovery & Emergence
Abdur Chowdhury

Often as computer scientists we focus on faster algorithms, such as approximations of solutions in linear time over large data sets or similar problems. Rather than focus on algorithms in this talk, we ask the question "What possibilities emerge from surfacing the world's conversations to others". Specifically we explore Twitter Trends as a discovery tool and show how awareness of the thoughts of others can cause the emergence of new behaviors.

Dr. Abdur Chowdhury serves as Twitter's Chief Scientist. Prior to that Dr. Chowdhury co-founded Summize a real-time search engine sold to Twitter in 2008. Dr Chowdhury has held positions at AOL as their Chief Architect for Search, Georgetown's Computer Science Department and University of Maryland's Institute for Systems Research. His research interest lay in Information Retrieval focusing on making information accessible.

Internet search advertising: Challenges in targeting and quality considerations
Mayur Datar

In this informal talk, I will touch upon two topics related to search advertising:

Targeting: How do you match advertiser keywords to user queries? What are the underlying challenges in bridging this semantic gap?

How do you measure the quality of search ads? Is it in the best interest of all parties involved, users, advertisers and the search engine, to always show ads for queries? What are the costs and benefits to showing ads and how do you decide on when is it appropriate to show ads?

Look forward to more questions than answers! Hopefully these will lead to good research topics.

Mayur Datar works as a Research Scientist with Google Inc. His research interests are in datamining, algorithms, databases and computer science theory. Prior to joining Google, Mayur obtained his doctorate degree in computer science from Stanford university and a Bachelor of Technology degree from I.I.T. Bombay. He was awarded the President of India, Gold Medal for being the most outstanding student of his graduating batch from I.I.T. Bombay. He has published several papers in renowned conferences like SIGMOD, VLDB, KDD, FOCS, SODA, WWW.

Affiliation Networks
D. Sivakumar

In the last decade, structural properties of several naturally arising networks (the Internet, socialnetworks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work, Leskovec, Kleinberg, and Faloutsos identify two new and surprising properties of the evolution of many real-world networks: densification (the ratio of edges to vertices grows over time), and shrinking diameter (the diameter reduces over time to a constant). These properties run counter to conventional wisdom, and are certainly inconsistent with graph models prior to their work.

In this work, we present the first model that provides a simple, realistic, and mathematically tractable generative model that intrinsically explains all the well-known properties of the social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the social sciences, primarily on the groundbreaking work of Breiger (1973) on bipartite models of social networksthat capture the affiliation of agents to societies.

We also present algorithms that harness the structural consequences of our model. Specifically, we show how to overcome the bottleneck of densification in computing shortest paths between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished subset of vertices. This is a rare example of an algorithmic benefit derived from a realistic graph model.

The talk will review the history of relevant random graph models, and attempt to illustrate the co-evolution of theoretical modeling and empirical insights in this fascinating slice of graph theory.

The talk reports work done jointly with Silvio Lattanzi of Sapienza University, Roma, Italy, and is to appear in STOC 2009.

Using Bandit Models in Search Engines
Sandeep Pandey

Web search engines perform two basic tasks: they acquire Web pages via crawling, and present lists of pages and advertisements in response to user search queries. Given the gigantic size of the Web these tasks can be viewed as constrained optimization problems. On the acquisition side the constraint is on the crawling resources, while on the presentation side, the constraint is on user attention: users cannot be expected to sift through large amounts of content. Furthermore, some parameters needed to solve these optimization problems may or may not be known leading to the offline and online versions of the problems. In this talk we will look at these optimization problems.

In particular, we will focus on the online presentation of advertisements where the unknown parameters are the ad quality values. I will present a method of simultaneously estimating unknown parameters and exploiting current estimates. Our method extends and adapts well-known "Bandit" models from Machine Learning.

Sandeep Pandey is a Research Scientist in Search and Advertising group at Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon and his B.Tech. from IIT Bombay. His research interests include Web crawling and advertising. He also likes to work on online learning problems such as the multi-armed bandit problem. In his spare time Sandeep enjoys watching cricket and playing tennis.

Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy

Adchemy is leading a revolution in online performance marketing, both as a top-25 U.S. online advertiser and through the development of the Adchemy Digital Marketing Platform, an integrated suite of products that allows online marketers to maximize their customers' end-to-end experience. I will discuss some of the research challenges that we are tackling in developing our technology solutions, including problems related to online matching, multi-armed bandits, and the optimizer's curse.

Richard has been VP Research at Adchemy since its inception in late 2004. He has over nine years of experience in the online advertising industry. He received his PhD from the Operations Research department at Stanford, where he focused on airline revenue management. He spent several years in the consulting world pioneering the development and implementation of real option valuation methodologies for asset valuation.

Scaling Social Data
Robert Johnson
Director of Engineering, Facebook

Facebook serves billions of pages a day based on hundreds of terabytes of dynamic data, which presents many scaling challenges for our storage and caching systems. In particular the social nature of the data gives it some interesting properties - it is not easily clustered or partitioned by users or applications, it updates frequently, and many queries require massive network fan-in. I'll be discussing the problems that arise in scaling such a system and some of the solutions we've used to get to our current scale, as well as our plans for infrastructure to solve these problems at even larger scale and in a more general way for future web 2.0 applications with similar properties.

Robert Johnson is Director of Engineering at Facebook, where he leads the software development efforts to cost-effectively scale Facebook's infrastructure and optimize performance for its many millions of users. During his time with the company, the number of users has expanded by more than twenty-fold and Facebook now handles billions of page views a day. Robert was previously at ActiveVideo Networks where he led the distributed systems and set-top software development teams. He has worked in a wide variety of engineering roles from robotics to embedded systems to web software. He received a B.S. In Engineering and Applied Science from Caltech.

Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits

In this talk we address two questions: (1) How to use structured data in web search? (2) How to gather structured data? For the first question we identify valuable classes of data, present query classes that can benefit from structured data and describe architecture that combines keyword search with structured search. For the second question we present Data Cloud: An ecosystem of data publishers, search engine (data cloud) and data consumers.
We show connection from Data Cloud strategy to classic notion in economics: network effect in two-sided markets. At the end of the talk an early demo implementation will be presented.

Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at St.Petersburg (2007). He spent a year as a postdoc at Caltech before joining Yahoo! Research in 2008. He won two gold medals at International Mathematical Olympiads, received the Best Paper Award in Application Track of CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors":

Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan

While the expanding universe of products available via Internet commerce provides consumers with valuable options, sifting through the numerous alternatives to identify desirable choices can be challenging. Collaborative filtering systems aid this process by recommending to users products desired by similar individuals. Because purchase decisions are influenced by these systems, however, they have become targets of manipulation by unscrupulous vendors. In this talk, we will discuss our findings on the robustness of collaborative filtering systems to manipulation. In particular, we provide theoretical and empirical results demonstrating that while nearest neighbor algorithms, which are widely used in commercial systems, can be highly susceptible to manipulation, two classes of collaborative filtering algorithms which we refer to as linear and asymptotically linear are relatively robust. In particular, we find that as a user rates an increasing number of products, the average accuracy of predictions made by these algorithms becomes insensitive to manipulated data. Our results provide guidance for the design of future collaborative filtering systems.

Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.

SnapTell has created technology that allows a mobile phone user to snap a picture of a product and instantly receive reviews, prices, and information. The service has been scaled to support more than ten million products including all books, DVDs, CDs, video games, and other product categories. We present an outline of the algorithms used to power the service emphasizing scaling and locality sensitive indexing. SnapTell's popular iPhone application has been downloaded by over half a million users over the last few weeks.

The technology can also be used to make print advertisements in a magazine, newspaper, poster, or billboard interactive without adding barcodes or otherwise altering the content. Consumers snap a picture of the object and send it to a shortcode to enter into a contest, access a mobile website, or download ringtones, wallpaper, video clips, or other information. SnapTell is the leading vendor in this space and offers a solution with unprecedented scale and robustness.

We present a synopsis of the image matching algorithms used to power the service and discuss potential research topics in the space.

Content-Driven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)

Open collaboration is one of the most effective ways to create content and aggregate information, as witnessed by the Wikipedia. In this talk, we propose a reputation systems for the authors of collaborative information, and a trust systems for the aggregated information.

The goal of the reputation system is to provide an incentive to collaboration, and to provide information about the quality of authors. We propose a content-driven notion of reputation, where authors of long-lasting contributions gain reputation, while authors of contributions that are not preserved lose reputation. We show that, on the Wikipedia, the content-driven reputation of authors is a good predictor of the quality of their future contributions. We then describe a trust system for content, where the trust in a portion of content is computed according to the reputations of the authors who created and revised it. Again, we show that on the Wikipedia content trust is a good predictor of future content longevity. Finally, we describe how the reputation and trust systems can be made robust to Sybil attacks, in which people use multiple identities to try boost their reputation and raise the trust of content of their choice.

The reputation and trust systems have been implemented in the WikiTrust system/.

Luca de Alfaro received his Ph.D. degree from Stanford University, where he worked on system specification and verification under the supervision of Zohar Manna. After graduation, he was for three years at UC Berkeley as a postdoctoral researcher, working with Thomas A. Henzinger on verification and on game theory. In 2001, Luca de Alfaro joined the faculty of UC Santa Cruz. His research interests involve system specification and verification, embedded software, game theory, incentive systems for on-line collaboration, and trust and reputation on the web. Luca de Alfaro is currently at Google as a visiting scientist, on leave from UC Santa Cruz.

Luca de Alfaro has long been an enthusiastic user of on-line collaborative tools, including wikis. One evening in 2005, after working at the wiki he uses to facilitate information sharing among members of his research group, he decided to start a wiki on cooking, which is one of his passions. The wiki was soon peppered with spam and less than useful contributions. Rather than cleaning up, a most tedious job, Luca de Alfaro shut off the wiki temporarily, and he started pondering how to provide incentives to collaboration, and compute quality metrics for content. As he found the job of rating authors or contributions by hand most tedious, he was interested in methods for computing reputation and quality metrics purely from content analysis. The research led to the WikiTrust system/. The wiki on cooking has been brought back to life (, even though it receives little traffic. You are welcome to register, add recipes, and see WikiTrust in action.

Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis

Web search queries capture the information need of search engine users. Search engines store these queries in their logs and analyze them to guide their search results.

In this work, we argue that not only a search engine can benefit from data stored in these logs, but also the web users. We first show how clickthrough logs can be collected in a distributed fashion using the http referer field in web server access logs. We then perform a set of experiments to study the information value of search engine queries when treated as "tags" or "labels" for the web pages that both appear as a result and the user actually clicks on. We ask how much extra information these query tags provide for web pages by comparing them to tags from the bookmarking site and to the pagetext. We find that query tags can provide substantially many (on average 250 tags per URL), new tags (on average 125 tags per URL are not present in the pagetext) for a large fraction of the Web.

Joint work with Hector Garcia-Molina and Jawed Karim

Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)

Baynote delivers on-demand recommendation technology and social search for over 200 websites, including Motorola, NASA, Intuit, and Expedia. By tapping into the wisdom of the online community and automatically displaying the best content and products, Baynote's Collective Intelligence Platform (CIP) improves the user experience while increasing online revenue, leads and impressions. In this talk, I will present Baynote's unified approach to content targeting and discuss some of the opportunities and challenges in building next-generation recommendation systems.

Scott Brave is a co-founder and CTO at Baynote, Inc. Prior to Baynote, he was a postdoctoral scholar in the Department of Communication at Stanford University where he served as lab manager for the CHIMe (Communication between Humans and Interactive Media) Lab. Scott received his Ph.D. in Human-Computer Interaction (Communication Dept.), and B.S. in Computer Systems Engineering (AI focus) from Stanford University, and his M.S. from the MIT Media Lab. Scott is also a former Associate Editor of the "International Journal of Human-Computer Studies" and co-author of Wired for speech: How voice activates and advances the human-computer relationship.

Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)

Online social networks have become major and driving phenomena on the web. In this talk we will address key modeling and algorithmic questions related to large online social networks. From the modeling perspective, we raise the question of whether there is a generative model for network evolution. The availability of time-stamped data makes it possible to study this question at an extremely fine granularity. We exhibit a simple, natural model that leads to synthetic networks with properties similar to the online ones. From an algorithmic viewpoint, we focus on data mining challenges posed by the magnitude of data in these networks. In particular, we examine topics related to influence and correlation in user activities in such networks.

Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. He obtained his PhD in Computer Science from Cornell University in December 1997. His primary interests are web algorithms, algorithms for large data sets, and theory of computation.

Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)

The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful communities in large real-world networks, we have performed a large scale analysis of a wide range of social and information networks. Our main methodology uses local spectral methods and involves computing isoperimetric properties of the networks at various size scales -- a novel application of ideas from statistics and scientific computation to internet data analysis. Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible communities gradually ``blend in'' with the rest of the network and thus become less ``community-like.'' This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.

Michael Mahoney is currently a research scientist in the math department at Stanford University. He is currently working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. His research interests also include randomized algorithms for matrices and regression problems, applications to DNA single nucleotide polymorphism data, and large-scale statistical data analysis more generally. He has been a faculty member at Yale University and a researcher at Yahoo.

An Incentive-Based Architecture for Social Recommendations
Kostas Kollias

Recommender systems and social networking services are important Internet monetization tools and their popularity greatly facilitates web-based commercial activity. In this talk we will focus on how these tools can be coupled in order to form an architecture that incentivizes users to provide honest recommendations in a social network. The architecture consists of one distinct reputation system for each user, a recommendation language which allows users to place their reviews, a product ranking algorithm, and a revenue sharing mechanism. We will discuss the details of the architecture, the properties that allow users to profit by placing honest recommendations and how fast these recommendations lead to an equilibrium state.

Joint work with Rajat Bhattacharjee and Ashish Goel

Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan

Web search is an indispensable resource for users to search for information on the Web. Web search is also an important service for publishers and advertisers to present their content to users. Thus, user satisfaction is the key and must be quantified. In this talk, our goal is to give a practical review of web search metrics from a user satisfaction point of view. This viewpoint is intuitive enough for a typical user to express his or her web search experience using it. The metrics that we are planning to cover fall into the following areas: relevance, coverage, comprehensiveness, diversity, discovery freshness, content freshness, and presentation. We will also describe how these metrics can be mapped to proxy metrics for the stages of a generic search engine pipeline. Our hope is that practitioners can apply these metrics readily and that researchers can get problems to work on, especially in formalizing and refining metrics.

This short talk is based on a half-day tutorial that the presenter with Kostas Tsioutsiouliklis and Emre Velipasaoglu will present at the WWW'09 conference.

Ali Dasdan is a director managing the metrics, analysis, and algorithms group of Yahoo! Web Search. His group is responsible for defining and implementing white-box metrics for production web search systems, production web search data, and the Web. His group is also responsible for monitoring the metrics and diagnosing the causes for anomalies detected. His research interests are in web search and advertising since 2006. He obtained his PhD in Computer Science from the University of Illinois at Urbana-Champaign in 1999.

Trusts among Users of Online Communities -- an Epinions Case Study
Hady Lauw

Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very sparse which is often the case in an online community. In this paper, we therefore propose a classification approach to address the trust prediction problem. We develop a taxonomy to obtain an extensive set of relevant features derived from user attributes and user interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical results show that the trust among users can be effectively predicted using pre-trained classifiers.


Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSF-equivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.

Top-k Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: one has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches.

We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA continue to remain ``instance optimal,'' a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.

Joint work with Ravi Kumar, Kunal Punera and Torsten Suel

Utility-Maximizing Privacy Mechanisms
Mukund Sundararajan

A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Informally, publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Formally, privacy can be rigorously quantified using the framework of {\em differential privacy}, which requires that a mechanism's output distribution is nearly the same (in a strong sense) whether or not a given database row is included or excluded. Thus far, (dis)utility has typically been quantified via the expected error of a (randomly perturbed) response to a query.

We pursue much stronger and more general utility guarantees. Just as differential privacy guarantees protection against every potential attacker, independent of its side information, we seek a mechanism that guarantees near-optimal utility to every potential user, independent of its side information. Formally, we model the side information of a potential user as a prior distribution over query results. An interaction between such a user and a mechanism induces a posterior distribution -- the utility of the mechanism for this user is defined as the accuracy of this posterior as quantified via a user-specific loss function. A differentially private mechanism $M$ is (near-)optimal for a given user $u$ if $u$ derives (almost) as much utility from $M$ as from any other differentially private mechanism.

We prove that for each fixed counting query and differential privacy level, there is a {\em geometric mechanism} $M^*$ --- a discrete variant of the well-studied Laplace mechanism --- that is {\em simultaneously utility-maximizing} for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: {\em every} potential user $u$, no matter what its side information and loss function, derives as much utility from $M^*$ as from interacting with a differentially private mechanism $M_u$ that is optimally tailored to $u$. The first part of our proof characterizes the utility-maximizing differentially private mechanism for a fixed but arbitrary user in terms of a basic feasible solution to a linear program with constraints that encode differential privacy. The second part shows that all of the relevant vertices of this polytope (ranging over all possible users) are derivable from the geometric mechanism via suitable remappings of its range.

Joint work with Arpita Ghosh and Tim Roughgarden.

Online Advertising
Hamid Nazerzadeh

Online advertising is a multi-billion dollar growing market with very interesting properties. I briefly explain some of these properties, and talk about ad allocation problems in the context of sponsored search and display advertising.


  • Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising, with Amin Saberi, and Rakesh Vohra.
    Proceedings of the 17th International World Wide Web Conference (WWW), 179-188, 2008.
  • A Combinatorial Allocation Mechanism with Penalties For Banner Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
    Proceedings of the 17th International World Wide Web Conference (WWW), 169-178, 2008.
  • Advertisement Allocation for Generalized Second Pricing Schemes, with Ashish Goel, Mohammad Mahdian, and Amin Saberi
    Fourth Workshop on Ad Auctions, 2008
  • Allocating Online Advertisement Space with Unreliable Estimates, with Mohammad Mahdian and Amin Saberi.
    Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288-294, 2007.

From Bayesian to Worst-Case Optimal Auction Design
Tim Roughgarden

We define a general template for auction design that explicitly connects Bayesian optimal mechanism design, the dominant paradigm in economics, with worst-case analysis. In particular, we establish a general and principled way to identify appropriate performance benchmarks for prior-free optimal auction design. This framework enables, for the first time, meaningful competitive analysis for auction problems with allocation asymmetries, bidder asymmetries, and costly payments.

The above is joint work with Jason Hartline (from STOC 2008 and ongoing). Time permitting, we will also briefly discuss another recent application of Bayesian optimal auction design: rigorous efficiency-revenue approximation trade-offs (joint work with Shaddin Dughmi and Mukund Sundararajan).

Social Design
Yury Lifshits
, Caltech,

Social Design is an art and science of setting and managing incentives in web applications. So far, machine learning is the dominant approach to all central algorithmic problems in the Web: search, advertising, recommendation systems. Social design can become an important counterpart to ML, if we find the ways to incentivise users to give us more and better input data and even the answers themselves.
This talk is an outline of a new research field of Social Design. We start with several stories including Internet press conference of president Putin, reputation crash in largest Russian social network, and Yahoo! SearchMonkey project. Then we attempt to extract, characterize and classify the commonly used patterns in social design. Finaly, we set up the research agenda in the area with focus on social aspects of semantic web and advertising technologies.
Slides available at
Bio: Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics at St.Petersburg (2007). He is currently a postdoc at Caltech. He won two gold medals at International Mathematical Olympiads, received the Best Paper Award in Application Track of CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors":

Inferring Links and Initiators
Evimaria Terzi

Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i; j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix we study the problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators. I will formally define this problem and propose an MCMC framework for estimating the links and the initiators given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect; instead of considering a single observation matrix M we consider a sequence of observation matrices M1; : : : ;Mt over time. Finally I will show that this problem is a generalization of several problems studied in the field of social-network analysis.
Bio: Evimaria Terzi is a Research Staff Member at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue University (USA) in 2002 and BSc from the University of Thessaloniki (Greece) in 2000.

Reputation Mechanisms for Online Markets
Christina Aperjis

In online marketplaces potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller?s reputation score.

Two-Stage Myopic Dynamics in Network Formation Games
Esteban Arcaute

We consider two-stage myopic dynamics for network formation games, where utility to a node is based on a function of the distance to all other nodes. Since our network formation game is a generalization of Myerson's announcement game, we use pairwise Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is unbounded (in the number of nodes in the network), our dynamics converge to a subset of pairwise Nash stable networks with a constant efficiency ratio. For some specific utility functions, we prove that our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.

Measuring and Modeling the Web
Ying Xu

The last couple of decades have witnessed a phenomenal growth in the World Wide Web. The Web has become an ubiquitous channel for information sharing and dissemination. This talk describes several research contributions in an endeavor towards a better understanding of the Web.
One of the basic questions about the Web is its size, i.e., how many webpages there are on the Web. A closely related problem is the sizes of search engine indexes, as search engine index defines an important subset of the Web that is most easily accessible to users. In the last ten years both the scientific literature and the popular press dealt at length with methodologies and estimates for the sizes of the Web and public search engines. In the first part of the talk, we propose algorithms for measuring the Web size, including both empirical and theoretical results.
The Web is so large that even estimating its size is a hard problem. To better understand the Web, people study random web graph models -- a stochastic process that generates random graphs that with high probability resemble the Web graph. Random web graph models are useful theoretical tools to provide insights to the evolution of the Web graph, to predict future behaviors, and to generate synthetic data sets for testing and simulation. In the second half of the talk, we briefly present several results in analyzing web graph models and applying the models to real world applications.

Should ad networks bother fighting click fraud?
Bobji Mungamuru

Suppose an ad network decides that a given click-through is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that click-through. Therefore, arguably, the ad network is "taking money out of his own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".

Reputation systems
Robbie Yan

Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim to provide such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.

Optimal Marketing Strategies over Social Networks
Mukund Sundararajan

Abstract: We study revenue maximization in social networks. We identify a family of strategies called 'influence and exploit' strategies that are easy to implement, easy to optimize over and approximately optimal.
Based on a paper that will appear in WWW 08. Jointly with Jason Hartline (Northwestern University) and Vahab Mirrokni (MSR Redmond).i

Ratio Index for Budgeted Learning, with Applications
Brad Null

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. We present an index for this problem, which we call the ratio index, which also guarantees a constant factor approximation. Index based policies have the advantage that a single number (i.e. the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem.
The ratio index has several interesting structural properties. First, it can be computed in strongly polynomial time; the techniques we develop may be useful for deriving strongly polynomial algorithms for other Markov Decision Problems. Second, with the appropriate discount factor, the Gittins index and the ratio index are constant factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, the ratio index can be used to obtain a "discount oblivious" solution to the multi-armed bandit problem, which in turn gives constant factor approximations to a wide class of problems that we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna

Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou

Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.

Brian Babcock

I'll talk about a number of computational problems that arise in online advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one manage a portfolio of millions of keywords with minimal human involvement? Challenges include determining how much to bid for each keyword and what ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will lead to the greatest number of purchases or clicks? The problem is made more difficult by the fact that different classes of users respond well to different marketing messages, so it's best to use different content for different user segments.
* Consider an auction where the items to be sold are not known in advance, but rather arrive online, and buyers are willing to pay different amount for each item (revealed as the items arrive). Each buyer has a maximum limit on the total number of items that they will buy. How should the seller assign items to buyers so as to maximize the seller's revenue?

An Axiomatic Approach to Personalized Ranking Systems
Alon Altman

Personalized ranking systems and trust systems are an essential tool for collaboration in a multi-agent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of well-known and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.

Ad Auctions with Reserved Price
Arash Asadpour

I will talk about the Ad Auctions when the auctioneer enjoys a relevant ad that she can show in one of the slots if she wants to. The setting is to some extent similar to the Ad Auctions with reserved price. I will explain a randomized auction which is truthful in expectation for both the bidders and the auctioneer. Also our mechanism is individual rational and budget balanced. I will discus the revenue of this auction and also will talk about some other settings in which the technique used to design the auction can be helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham (CMU).

Boosting the area under ROC curve
Phil Long
, Google

We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. This boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
(This is joint work with Rocco Servedio.)

Random input models for Adwords
Aranyak Mehta
, Google

I will talk about a new result on Random Input models for the Adwords Problem: What are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst case input. How does the natural "highest-bidder-wins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for Ad auctions.

Who do you know: The Social Network Revolution
Orkut Buyukkokten
, Google

Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence in our lives. Now it's easier to cross paths than ever as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users and reflect on the lessons we learned.

Deterministic Decentralized Search in Random Graphs
Esteban Arcaute

We study a general framework for decentralized search in random graphs. Our main focus is on deterministic memoryless search algorithms that use only local information to reach their destination in a bounded number of steps in expectation. This class includes (with small modifications) the search algorithms used in Kleinberg's pioneering work on long-range percolation graphs and hierarchical network models. We give a characterization of searchable graphs in this model, and use this characterization to prove a monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David Liben-Nowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying Xu.

Interdomain Routing and Games
Michael Schapira
, Hebrew University of Jerusalem (

We present a game-theoretic model that captures many of the intricacies of interdomain routing in today's Internet. In this model, the strategic agents are source-nodes located on a network, who aim to send traffic to a unique destination node . The model enables complex dynamic interactions between the agents (asynchronous, sequential, and based on partial-information).
We provide realistic conditions that guarantee that executing the Border gateway Protocol (BGP), the only protocol currently used for interdomain routing, is in the best interest of each of the players. Moreover, we show that even coalitions of players of any size cannot improve their routing outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design, these results do not require any monetary transfers (to or by the agents).
Based on joint work with Hagay Levin and Aviv Zohar. The Anatomy of Clickbot.A
Neil Daswani
, Google

In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet that attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using a HTTP-based botmaster. Google identified all clicks on its ads exhibiting Clickbot.A-like patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.

Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas

We will first go over a few network worms and contemporary algorithms to detect these network worms. We will then go over some recent techniques to detect these worms using "vunlerability signatures". We will then use this to review two important problems in data stream computation -- distinct value estimation and frequent element computation. We show how these algorithms can be used for automatic worm fingerprinting.

Click Fraud Resistant Methods for Learning Click-Through Rates
Sergei Vassilvitskii

Presenting the paper "Click Fraud Resistant Methods for Learning Click-Through Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.

Pricing Partially Compatible Products
Adam Meyerson

Suppose two competing companies provide roughly equivalent suites of software products. Each version of a product has a possibly distinct price and quality. Customers would like to purchase a low- price, high-quality set of products. The issue is that products made by different companies need not be fully compatible. We reduce the customer's decision process to a minimum cut, and provide approximation and hardness results for the problem of computing the company's best-response strategy in several different models. We also introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic Commerce (EC) 2007, and are joint work with David Kempe, Nainesh Solanki, and Ramnath Chellappa.

Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar
, Google

Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptible for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.

Detecting Near-Duplicates for Web Crawling
Anish Das Sarma

This is joint work with Gurmeet Singh Manku and Arvind Jain in Google.
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrelevant for web search. So the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page or not.
In my talk I will address the problem of efficiently detecting near-duplicates in the web crawl setting. First I shall review Charikar's fingerprinting technique used in our approach. Then, I shall formally define the problem and present algorithmic solutions for two versions of the problem: online queries (single new document) and batch queries (multiple new documents). Finally, I shall briefly talk about our experimental analysis of the techniques.

Clickthrough of and Bidding on keywords
Aleksandra Korolova / Dilys Thomas

Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".

Multi-path Routing
Mihaela Enachescu

I will present an approach for enabling multi-path routing in large networks. This is based on a project at NEC Labs, and is joint work with Ravi Kokku. Our approach is to create multiple paths between various source-destination pairs by selecting a special subset of relay nodes. We give hardness results and provide some approximation algorithms.
This work is motivated by the observation that given two end-host in the Internet there exist potentially many independent or partially overlapping paths between them. However, only one of these is usually used by the underlay routing, leading to many inefficiencies (the single path routings are usually slow to adapt in the presence of fluctuating network conditions). Multi-path routing provides a more adaptive framework, and it also raises a variety of (still open) problems for our enjoyment.

Collaborative tagging systems
Paul Heymann

Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keyword-based systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.

Is Efficiency Expensive?
Mukund Sundararajan

We study the simultaneous optimization of efficiency and revenue in pay-per-click keyword auctions in a Bayesian setting. Our main result is that the efficient keyword auction yields near-optimal revenue even under modest competition. In the process, we build on classical results in auction theory to prove that increasing the number of bidders by the number of slots outweighs the benefit of employing the optimal reserve price.
This is joint work with Tim Roughgarden.

Estimating Search Engine Index and Web Sizes
Ying Xu

I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".

Reusable and multiple goods
Ashish Goel

The talk is based on
Online auctions with re-usable goods by Mohammad Hajiaghayi, Robert Kleinberg, Mohammad Mahdian, and David Parkes, EC'05
The Sequential Auction Problem on eBay: An Empirical Analysis and a Solution by Adam I. Juda and David C. Parkes, EC'06

Mechanism design based on Cost-Per-Acquisition (CPA)
Hamid Nazerzadeh

The Internet advertising has changed dramatically by moving from Cost-Per-Impression charging mechanism to Cost-Per-Click (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and some of our results about this problem.

Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert

We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.) We then look at the one-shot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the full-information stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal. The distributional assumptions under which our theorems hold are quite general, and arguably hold in all real-world situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.