# Archive of RAIN talks

## Autumn 2016-17

Date Speaker Topic
October 12 Philippe Rigollet Online Learning in Repeated Auctions (at the OIT seminar, room M109 at GSB)
November 2 Dennis Feehan Network reporting methods for sample surveys
November 16 Mohammad Akbarpour Timing "versus" Matching: Dynamic Matching on Stochastic Networks
November 30 Ariel Procaccia Computational Social Choice: For the People

## Spring 2015-16

Date Speaker Topic
April 13 Augustin Chaintreau Scaling up the Web Transparency: Algorithms and Challenges
April 27 David Rothschild Prediction Market Trading Strategies: manipulation and forecasting
May 11 Ali Jadbabaie Collective Phenomena in Complex Networks: Coordination and Social Learning
May 25 Sharique Hasan Conversational Peers and Idea Generation: Evidence from a Field Experiment

## Winter 14-15

Date Speaker Topic
January 21 Edith Cohen Scalable mining of massive networks: Distance-based Centrality, Similarity, and Influence
February 11 Glen Weyl Quadratic Voting
February 18 Paul Milgrom Auctions and Decentralization in Complex Networks

## Autumn 14-15

Date Speaker Topic
September 24 Balasubramanian Sivan Towards optimal algorithms for prediction with expert advice
October 8 Vivek Farias Online A-B Testing
October 29 Fil Menczer The spread of information in social media
November 19 Sergei Vassilvitskii Inverting a Steady-State

## Spring 13-14

Date Speaker Topic
April 2 Omer Tamuz Strategic Learning and the Topology of Social Networks
April 16 Kostas Bimpikis Information Provision in Dynamic Innovation Tournaments
May 14 Jonathan Levin Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
May 28 Steve Tadelis Quality Externalities and the Limits of Reputation in Two-Sided Markets
June 4 Sinan Aral The Dynamics of Online Reputation

## Winter 13-14

Date Speaker Topic
January 29 Derek Ruths Control Profiles of Complex Networks
February 5 Amin Karbasi From Small-World Networks to User-Driven Content Search
February 19 Arun Chandrasekhar Savings Monitors
February 26 Ariel Procaccia Computational Fair Division: From Cake Cutting to Cloud Computing
March 12 Niki Kittur Big Thinking: Augmenting human cognition with crowds and computation
March 19 Ricardo Baeza-Yates OSNs: The Wisdom of a Few and The Value of Users

## Autumn 13-14

Date Speaker Topic
September 25 Tuomas Sandholm Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
October 23 Sharad Goel "Going Viral" and the Structure of Online Diffusion
November 06 Itay Fainmesser The Value of Network Information
December 04 Jennifer Neville Prediction in complex networks: The impact of structure on learning and prediction

## Spring 12-13

Date Speaker Topic
April 03 Anirban Dasgupta Eliciting and aggregating information from the crowd
April 10 Noam Nisan Should auctions be complex?
April 17 David Pennock The extent of price misalignment in prediction markets
April 24 Yaron Singer Adaptive Seeding in Social Networks
May 08 Cristian Danescu-Niculescu-Mizil Language and Social Dynamics in Online Communities
May 15 Francis Bloch Pricing in Social Networks

## Autumn 12-13

Date Speaker Topic
Oct 03 Devavrat Shah Rumor in a network: Who's the culprit?
Oct 10 Moshe Babaioff On Bitcoin and Red Balloons
Oct 24 David Kempe Information Asymmetries in first-price common-value auctions, or The Value of Information
Nov 14 Lirong Xia Random Utility Models for Social Choice
Nov 28 Susan Athey Advertiser Behavior in Sponsored Search Auctions

## Winter 11-12

Date Speaker Topic
January 18 Michael Wellman Price Prediction Strategies for Simultaneous One-Shot Auctions
January 25 George Varghese Lattice Games and the Economics of Aggregators
February 1 Michael Kearns Competitive Contagion in Networks
February 8 Stephen Barley Building an Institutional Field to Corral a Government
February 15 Siddharth Suri Cooperation and Assortativity with Endogenous Partner Selection
February 22 - -
February 29 - -
March 6 Sergiu Hart Two(!) Good To Be True
March 7 Dean Eckles Identifying peer effects in online communication behaviors

## Autumn 11-12

Date Speaker Topic
October 5 Jure Leskovec Networks, communities and the ground-truth
October 12 Vincent Conitzer Computing Game-Theoretic Solutions for Security
October 19 Costis Daskalakis Revenue-optimal Auctions
October 26 David Parkes A Random Graph Model of Multi-Hospital Kidney Exchanges
November 2 - -
November 9 Arnout van de Rijt A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
November 16 Markus Mobius Treasure Hunt
November 23 - Thanksgiving
November 30 Yiling Chen Task Routing for Prediction Tasks

## Spring 10-11

Date Speaker Topic
April 6 Eric Friedman Bargaining Theory in the Cloud
April 13 - Organizational meeting
April 20 Matthew Jackson Network Patterns of Favor Exchange
April 27 Tim Roughgarden Simple Auctions with Near-Optimal Equilibria
May 4 Kevin Lang Efficient Online Ad Serving in a Display Advertising Exchange
May 11 Sergei Vassilvitskii Designing Algorithms for MapReduce
May 18 Arpita Ghosh Incentivizing High-quality User-Generated Content
May 25 Lada Adamic Coevolution of Network Structure and Content
June 1 Damon Centola -

## Winter 10-11

Date Speaker Topic
January 12 Martin Suchara BGP Safety with Spurious Updates - The Conditions of BGP Convergence
January 19 No seminar -
January 26 Gabor Szabo Predicting the Popularity of Online Content
February 2 No seminar -
February 9 No seminar -
February 16 Elisa Celis Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
February 23 Yashodhan Kanoria Social Learning and the Dynamic Cavity Method
March 2 Matthias Mnich The Complexity Ecology of Consensus Ranking

## Autumn 10-11

Date Speaker Topic
September 22 Elliot Anshelevich Contribution Games in Social Networks
September 29 Pranav Dandekar Liquidity in Credit Networks: A Little Trust Goes a Long Way
October 6 Sugato Basu User Browsing Models: Relevance versus Examination
October 13 Daria Sorokina Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
October 20 Marc Najork Reflections on Link-based Ranking
October 27 - -
November 3 Panagiotis Papadimitriou Output URL Bidding
November 10 Roberto Bayardo A Technical Overview of Search Ads Quality at Google
November 17 Georgios Pierrakos On the Competitive Ratio of Online Sampling Auctions
November 24 - -
December 1 Paolo Parigi The importance of ties

## Spring 09-10

Date Speaker Topic
April 7 Gueorgi Kossinets How opinions are received by online communities: a case study on amazon.com helpfulness votes
April 14 Grant Schoenebeck Reaching Consensus on Social Networks
April 21 Nilesh Dalvi Web-scale Information Extraction Using Wrappers
April 28 - No meeting -- WWW 2010
May 5 Andrew Tomkins Characterization and Modeling of Online Browsing Activity
May 12 Xiaolin Shi User grouping behavior in online forums
May 19 Matthew Harding Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing
May 26 Neel Sundaresan Deconstructing A Long Tail Commerce Network
June 2 Monica Rogati Nod vs. High-Five: LinkedIn Connection Strength Models and Applications

## Winter 09-10

Date Speaker Topic
Jan 13 Jure Leskovec Network of positive and negative edges
Jan 20 Esteban Arcaute Social Networks and Stable Matchings in the Job Market
Jan 27 Christina Aperjis Temporal User Behavior in Yahoo Answers
Feb 3 Ashish Goel Some open problems motivated by social networks
Feb 10 Yehuda Koren The Netflix Prize: Quest for $1,000,000
Feb 17 DJ Patil Data Jujitsu - the art of treating data as a product
Feb 24 Panayiotis Tsaparas Query-log analysis for improving web search
Mar 3 Arash Asadpour Stochastic Submodular Optimization
Mar 10 Ravi Kumar Compressibility of Behavioral Graphs
Mar 17 Sep Kamvar Aardvark

## Autumn 09-10

Date Speaker Topic
Sep 30 Arvind Narayanan De-anonymizing Social Networks
Oct 7 Nicolas Lambert Eliciting Information on the Distribution of Future Outcomes
Oct 14 Lars Backstrom Optimizing Web Traffic via the Media Scheduling Problem
Oct 21 Michael Kearns Behavioral Experiments in Strategic Networks (note special location: Terman Engineering Center, Room 453)
Oct 28 Matthew Jackson How Social Network Structure affects Diffusion and Learning
Nov 4 Paul Valiant How to Design Profitable Auctions
Nov 11 Mohammad Mahdian Sort me if you can (or, Algorithms on Dynamic Data)
Nov 18 DJ Patil Canceled!
Nov 25 - No meeting -- Happy Thanksgiving!
Dec 2 Abdur Chowdhury Discovery & Emergence (note special location: Terman Engineering Center, Room 453)

## Spring 08-09

Date Speaker Topic
Apr 3 Mayur Datar (Google) Internet search advertising: Challenges in targeting and quality considerations
Apr 10 - No meeting this week
Apr 17 D Sivakumar (Google) Affiliation Networks
Apr 24 - No meeting -- WWW 2009
May 1 - No meeting -- BAGT
May 8 Sandeep Pandey (Yahoo! Research) Using Bandit Models in Search Engines
May 14 Richard Chatwin (VP Research, Adchemy) Research Challenges in Online Advertising
May 22 Robert Johnson (Facebook) Scaling Social Data
May 29 Yury Lifshits (Yahoo! Research) Data Cloud: Integrating Structured Information into Web Search
June 5 Robbie Yan Manipulation Robustness of Collaborative Filtering Systems

## Winter 08-09

Date Speaker Topic
Jan 16 - Organizational meeting
Jan 23 G.D. Ramkumar (SnapTell, Inc.) Scalable image recognition algorithms for camera phone applications
Jan 30 Luca de Alfaro Content-Driven Reputation and Trust for Collaborative Content
Feb 6 Ioannis Antonellis Tagging with Queries: How and Why? (To appear in WSDM 2009)
Feb 13 Scott Brave (Baynote, Inc.) Content Targeting and Recommendation Systems
Feb 20 Ravi Kumar Online Social Networks: Modeling and Mining
Feb 27 Michael Mahoney Community Structure in Large Social and Information Networks
Mar 6 Kostas Kollias An Incentive-Based Architecture for Social Recommendations
Mar 13 Ali Dasdan Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction

## Autumn 08-09

Date Speaker Topic
Oct 3 - Talk by Jure Leskovec and organizational meeting
Oct 10 Hady Lauw Trusts among Users of Online Communities -- an Epinions Case Study
Oct 17 Canceled -
Oct 24 Bahman Bahmani Online Budgeted Matching in Random Input Models with Applications to Adwords
Oct 31 Sergei Vassilvitskii Top-k Aggregation Using Intersection of Ranked Inputs
Nov 7 - No meeting -- Bay Area Theory Seminar (BATS)
Nov 14 Mukund Sundararajan Utility-Maximizing Privacy Mechanisms
Nov 21 Hamid Nazerzadeh Online advertising
Nov 28 - No meeting -- Happy Thanksgiving!
Dec 5 Tim Roughgarden From Bayesian to Worst-Case Optimal Auction Design

## Spring 07-08

Date Speaker Topic
Apr 11 - BAGT
Apr 18 Robbie Yan Reputation Markets
Apr 25 - Cancelled for WWW
May 2 Bobji Mungamuru Click Fraud
May 9 Ying Xu Oral Examination
May 16 Esteban Arcaute Network Formation Games
May 23 Christina Aperjis Reputation Systems
May 30 Evimaria Terzi Inferring Links and Initiators
June 6 Yury Lifshits Social Design

## Winter 07-08

Date Speaker Topic
Jan 25 Alon Altman An Axiomatic Approach to Personalized Ranking Systems
Feb 1 Ashish Goel Open Problems
Feb 8 - Cancelled
Feb 15 Brian Babcock Adchemy
Feb 22 Rajat Bhattacharjee Oral Examination
Feb 29 Panagiotis Papadimitriou Web Graph Similarity for Anomaly Detection
Mar 7 Brad Null Ratio Index for Budgeted Learning, with Applications
Mar 14 Mukund Sundararajan Optimal Marketing Strategies over Social Networks

## Autumn 07-08

Date Speaker Topic
Oct 15 Michael Schapira Interdomain Routing and Games
Oct 22 Esteban Arcaute Decentralized Search of Random Graphs
Oct 29 Orkut Buyukkokten Social Network Revolution
Nov 5 Aranyak Mehta Random input models for Adwords
Nov 12 Phil Long Boosting the area under the ROC curve
Nov 19 - Thanksgiving
Nov 26 - Cancelled
Dec 3 Arash Asadpour Ad auctions with reserved price

## Spring 06-07

Date Speaker Topic
Apr 13 Ying Xu A discussion on WWW'07 papers
Apr 20 - BAGT
Apr 27 Anish Das Sarma Detecting Near-Duplicates for Web Crawling
May 4 Mayur Datar Google News Personalization
May 11 Adam Meyerson Pricing Partially Compatible Products
May 18 Sergei Vassilvitskii Click Fraud Resistant Methods for Learning Click-Through Rates
May 25 Dilys Thomas Streaming Algorithms and Worm Fingerprinting
June 1 Neil Daswani Anatomy of Clickbot.A

## Winter 06-07

Date Speaker Topic
Jan 26 Nicolas Lambert Asymptotically Optimal Repeated Auctions
Feb 2 Hamid Nazerzadeh Mechanism design based on CPA
Feb 9 Ashish Goel Reusable and multiple goods
Feb 16 Ying Xu Estimating Search Engine Index and Web size
Feb 23 Mukund Sundararajan Is efficiency expensive?
March 2 Paul Heymann Collaborative tagging systems
March 9 Mihaela Enachescu Multi-path Routing
March 16 Aleksandra Korolova / Dilys Thomas Clickthrough of and Bidding on keywords

## Autumn 06-07

Date Speaker Topic
Oct 25 Ying Xu The Netflix Challenge
Nov 1 Ashish/Alexandra Social Networks
Nov 8 Arash/Shubha PageRank
Nov 15 TBA TBA
Nov 22 - Netflix follow-up
Nov 29 Zoltan Gyongyi TBA
Dec 6 Sreenivas Gollapudi TBA

## Spring 05-06

Date Speaker Topic
Apr 7 Ashish Goel Axiomatic PageRank
Apr 14 - Organizational Meeting
Apr 21 Sergei Vassilvitskii Relevance Feedback and Web Search
Apr 28 Hamid Nazerzadeh AdWords with Unreliable Estimates
May 5 Mihaela Enachescu Compact Routing
May 12 Ying Xu Streaming Algorithms for Graph Problems
May 19 - -
May 26 Udi Manber New Problems in Search
Jun 2 Rajat Bhattacharjee Incentive Based Ranking Mechanisms

## Winter 05-06

Date Speaker Topic
Jan 20 - Organizational Meeting
Jan 27 Brad Null Multiarmed Bandit Problem
Feb 3 Edith Cohen Spatially-Decaying Aggregation
Feb 10 Andrew Tomkins Routing and Growth Models for Social Networks
Feb 17 - BAGT
Feb 24 Amin Saberi Open Problems
Mar 3 Mayur Datar, Ashutosh Garg Recommendation Systems
Mar 10 Amruta Joshi Googlebase

## Fall 05-06

Date Speaker Topic
Oct 7 Roee Engelberg Equilibria in Online Games
Oct 14 Ying Xu, Sergei Vassilvitskii Models for the Web Graph
Oct 21 Gagan Aggarwal AdWords
Oct 28 Omar Benjelloun SERF
Nov 4 Zoltan Gyongyi Link Spam
Nov 11 Mihaela Enachescu Query Incentive Networks
Nov 18 - BATS
Dec 2 Rajat Bhattacharjee Survey of Recommendation Systems

## Spring 04-05

Date Speaker Topic
Apr 4 Shubha Nabar Impact of Search Engines on Evolution of Page Popularity
Apr 11 Svetlozar Nestorov Mining Structure from Semistructured Data
Apr 18 Sepandar Kamvar Reputation Management in P2P Networks
Apr 25 Rina Panigrahy PageRank Axioms
May 2 Amin Saberi Spread of Viruses on the Internet
May 9 Ramesh Johari Stable Policy Routing with Provider Independence
May 16 Rajat Bhattacharjee Reputation Systems
May 23 Mehran Sahami Matching Text on the Web
Jun 6 Amruta Joshi Modeling Documents

## Winter 04-05

Date Speaker Topic
Feb 2 Ying Xu Recommendation Systems/Collaborative Filtering
Feb 9 Ashish Goel Spam and Page Rank
Feb 16 Gagan Aggarwal AdWords
Feb 23 David Arthur Gossip and File Sharing
Mar 2 Anil Kamath AdWords
Mar 10 Sergei Vassilvitskii -
Mar 16 Dilys Thomas Social Networks

# Abstracts

## Computational Social Choice: For the People

Ariel Procaccia, CMU

Computational social choice deals with algorithms for aggregating individual preferences or opinions towards collective decisions. AI researchers (including myself) have long argued that such algorithms could play a crucial role in the design and implementation of multiagent systems. However, in the last few years I have come to realize that the "killer app" of computational social choice is helping people -- not software agents -- make joint decisions. I will illustrate this theme through two recent endeavors: Spliddit.org, a website that offers provably fair solutions to everyday problems; and Robovote.org, which provides optimization-driven voting methods. Throughout the talk, I will devote special attention to the theoretical foundations and results that make these services possible.

Bio: Ariel Procaccia is an Assistant Professor in the Computer Science Department at Carnegie Mellon University, and an Affiliated Faculty in the Machine Learning Department. He usually works on problems at the interface of computer science and economics. His distinctions include the IJCAI Computers and Thought Award (2015), the Sloan Research Fellowship (2015), the NSF Faculty Early Career Development Award (2014), and the IFAAMAS Victor Lesser Distinguished Dissertation Award (2009), as well as multiple paper awards including Best Paper (2016) and Best Student Paper (2014) at the ACM Conference on Economics and Computation (EC).
He is co-editor of the Handbook of Computational Social Choice (Cambridge University Press, 2016).

## Timing "versus" Matching: Dynamic Matching on Stochastic Networks

Mohammad Akbarpour, Stanford

We introduce a dynamic analogue of the Erdos-Renyi random graph model and study the problem of finding the "maximum matching" in such a dynamic graph. In our model, agents arrive and depart stochastically, and the composition of the trade graph depends endogenously on the matching algorithm. We show that if the planner can identify agents who are about to depart, then waiting to thicken the market is highly valuable, whereas if the planner cannot identify such agents, then matching agents greedily is close to optimal. The planner's decision problem in our model involves a combinatorially complex state space. However, we show that simple local algorithms that choose the right time to match agents, but do not exploit the global graph structure, can perform close to complex optimal algorithms. Finally, we consider a setting where agents have private information about their departure times, and design a continuous-time dynamic mechanism to elicit this information.

Joint work with Shengwu Li and Shayan Oveis Gharan.

Bio: Mohammad Akbarpour is an Assistant Professor of Economics at Stanford GSB. He completed his PhD in the Stanford University Economics department in 2015, then spent the 2015-2016 academic year at the University of Chicago Economics department before coming back to Stanford in September 2016.

## Network reporting methods for sample surveys

Dennis Feehan, UC Berkeley

Surveys have traditionally been based on the idea that researchers can estimate characteristics of a population by obtaining a sample of individuals and asking them to report about themselves. Network reporting surveys generalize this traditional approach by asking survey respondents to report about members of their personal networks. Network reporting surveys can be used to study many important rare and hidden populations for which traditional survey methods are inadequate; for example, the approach has been used to estimate the size of epidemiologically important groups like sex workers, drug injectors, and men who have sex with men. I will introduce a framework for developing estimators from network reporting surveys, and then I will present some results from a nationally representative survey experiment that my colleagues and I conducted in Rwanda.

Bio: Dennis Feehan is an Assistant Professor in the Department of Demography at UC Berkeley. He completed his PhD at Princeton University in 2015 and spent the fall/winter of 2015 as a postdoc at Facebook before starting at Berkeley last January.

## Online Learning in Repeated Auctions

Philippe Rigollet, MIT

Motivated by online advertising auctions, we consider repeated Vickrey auctions where goods of unknown value are sold sequentially and bidders only learn (potentially noisy) information about a good's value once it is purchased. We adopt an online learning approach with bandit feedback to model this problem and derive bidding strategies for two models: stochastic and adversarial. In the stochastic model, the observed values of the goods are random variables centered around the true value of the good. In this case, logarithmic regret is achievable when competing against well-behaved adversaries. In the adversarial model, the goods need not be identical. Comparing our performance against that of the best fixed bid in hindsight, we show that sublinear regret is also achievable in this case. For both the stochastic and adversarial models, we prove matching minimax lower bounds showing our strategies to be optimal up to lower-order terms. To our knowledge, this is the first complete set of strategies for bidders participating in auctions of this type.

Joint work with Jonathan Weed and Vianney Perchet.
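
The stochastic model lends itself to a simple optimism-based sketch. The toy below is illustrative only, not the authors' algorithm: a bidder in repeated second-price auctions bids a running mean of its noisy value observations plus a shrinking exploration bonus, so it keeps winning often enough to learn the value. All names and parameters here are invented for illustration.

```python
import random

def repeated_vickrey(true_value, competing_bids, noise, horizon, bonus=2.0):
    """Optimistic bidding sketch for repeated second-price auctions.

    After each win the bidder sees one noisy sample of the good's value;
    bidding mean + bonus/sqrt(n) forces enough wins to keep learning,
    and the bid concentrates on the true value as samples accumulate.
    """
    samples = []   # noisy value observations, one per win
    bids = []
    for t in range(horizon):
        if not samples:
            bid = float("inf")  # must win once to see any signal at all
        else:
            mean = sum(samples) / len(samples)
            bid = mean + bonus / len(samples) ** 0.5  # optimism bonus
        bids.append(min(bid, 1e9))
        if bid >= competing_bids[t]:  # win the auction, observe a sample
            samples.append(true_value + noise())
    return bids

random.seed(0)
prices = [random.uniform(0.4, 0.6) for _ in range(2000)]
bids = repeated_vickrey(0.5, prices, lambda: random.gauss(0, 0.1), 2000)
print(round(bids[-1], 2))  # late bids hover near the true value 0.5
```

In a Vickrey auction, bidding one's value estimate is the natural target, which is why learning the value is the whole problem; the bonus schedule here is a heuristic stand-in for the paper's regret-optimal strategies.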

Bio: Philippe Rigollet

## Near Feasible Stable Matchings

Rakesh Vohra, University of Pennsylvania

Stable matchings need not exist when there are complementarities in the preferences of one side. One instance where this might matter is the National Resident Matching Program (NRMP): with the participation of couples, who have preferences over pairs of hospital slots, stable matchings need not exist. Indeed, the problem of determining whether a stable matching exists in the presence of couples is NP-complete. For any student preferences, we show that each instance of a matching problem has a 'nearby' instance with a stable matching. The nearby instance is obtained by perturbing the capacities of the hospitals. Specifically, the capacity of each hospital will change (up or down) by no more than 4, and the total capacity of all hospitals will not decrease and will not increase by more than 9. These perturbation bounds are independent of the size of the instance. We think the technique used to arrive at this result is of independent interest and should be useful elsewhere. It involves a combination of Scarf's lemma and randomized rounding.

Joint work with Thanh Nguyen.

Bio: Professor Vohra is known for his application of mathematical programming techniques to the study of game theory and mechanism design. He formerly taught at Northwestern University, where he was the John L. and Helen Kellogg Professor of Managerial Economics and Decision Sciences in the Kellogg School of Management, with additional appointments in the Department of Economics and the Department of Electrical Engineering and Computer Science. He came to Penn as part of the Penn Integrates Knowledge program that President Amy Gutmann established in 2005 as a University-wide initiative to recruit exceptional faculty members whose research and teaching exemplify the integration of knowledge across disciplines. His appointment is shared between the Department of Economics in the School of Arts and Sciences and the Department of Electrical and Systems Engineering in the School of Engineering and Applied Science.

## Optimization in Online Content Recommendation Services: Beyond Click-Through Rates

Yonatan Gur, Stanford Graduate School of Business

A new class of online services allows internet media sites to direct users from articles they are currently reading to other content they may be interested in. This process creates a "browsing path" along which there is potential for repeated interaction between the user and the provider, giving rise to a dynamic optimization problem. A key metric that often underlies this recommendation process is the click-through rate (CTR) of candidate articles. While CTR is a measure of instantaneous click likelihood, we analyze the performance improvement that one may achieve with lookahead that accounts for the potential future path of users. To that end, using a large data set of user path history at major media sites, we introduce and derive a representation of content along two key dimensions: clickability, the likelihood to click to an article when it is recommended; and engageability, the likelihood to click from an article when it hosts a recommendation. We then propose a class of heuristics that leverage both clickability and engageability, and provide theoretical support for favoring such path-focused heuristics over myopic heuristics that focus only on clickability (no lookahead). We conduct a live pilot experiment that measures the performance of a practical proxy of our proposed class, when integrated into the operating system of a worldwide leading provider of content recommendations. We estimate the aggregate improvement in clicks-per-visit relative to the CTR-driven current practice. The documented improvement highlights the importance and the practicality of efficiently accounting for the future path of users in real time.
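
The clickability/engageability split can be made concrete with a toy one-step lookahead. The fields `p_click` and `p_cont` and the scoring rule below are hypothetical stand-ins, not the paper's estimator; the only point is that a myopic (depth-0) ranking and a lookahead ranking can disagree.

```python
def lookahead_score(articles, depth=1):
    """Score candidates by immediate click probability plus the clicks a
    continued browsing path could add (hypothetical illustration of the
    clickability/engageability distinction; all numbers are made up).

    clickability  p_click: chance the user clicks *to* this article.
    engageability p_cont:  chance the user clicks *from* it onward.
    """
    def value(a, d):
        if d == 0:
            return a["p_click"]  # myopic: CTR only
        # expected clicks = this click + continuation times best next value
        best_next = max(value(b, d - 1) for b in articles)
        return a["p_click"] * (1 + a["p_cont"] * best_next)
    return {a["id"]: round(value(a, depth), 3) for a in articles}

articles = [
    {"id": "sticky", "p_click": 0.11, "p_cont": 0.95},  # engaging article
    {"id": "flashy", "p_click": 0.12, "p_cont": 0.01},  # clickbait dead end
]
scores = lookahead_score(articles)
print(scores)
```

At depth 0 the clickbait article wins on raw CTR; with one step of lookahead the engaging article is ranked higher, which is the path-focused versus myopic contrast the abstract describes.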

Joint work with Omar Besbes and Assaf Zeevi.

Bio: Yonatan Gur is an Assistant Professor of Operations, Information, and Technology at Stanford University's Graduate School of Business. His research interests include stochastic modeling and optimization, data-driven decision-making, machine learning, and game theory, with applications to revenue management, pricing, operations management, and service systems, and with a particular emphasis on dynamic, online environments. Yonatan received his PhD in Decision, Risk, and Operations from Columbia Business School. He also holds a B.Sc. from the School of Physics and Astronomy and an M.Sc. from the School of Mathematical Sciences, Tel Aviv University.

## Modeling of Personal Mobility Behavioral Patterns

Irad Ben-Gal, Tel Aviv University

In recent years, personal location data is continuously captured by mobile devices, GPS chips, and other sensors. Such data provides a unique opportunity to learn about individuals' mobility behavior and can be used for various applications in transportation, marketing, homeland security, smart cities, and more. Nonetheless, modeling such data poses new challenges related to data volume, diversity, and inhomogeneity, as well as the required granularity level. In this talk, we will address a real 'smart city' use case and cover some of the associated opportunities and challenges. We will present a new set of mobility-behavior models that generalizes Markov Chains and Variable-Order Bayesian Networks. We will discuss how they can be used in different applications such as pattern recognition, anomaly detection, clustering, and classification.

## The Impacts of Neighborhoods on Intergenerational Mobility

Raj Chetty, Stanford University

How can we improve economic opportunities for low-income children in the United States? We first characterize how rates of upward mobility vary across counties using data from tax records covering the U.S. population. We then study families who moved across areas to test whether the geographic variation is driven by the causal effects of place or by heterogeneity in the types of people who live in each place. We find that every year of exposure to a better environment improves a given child's chances of success, both in a national quasi-experimental study of five million families and in a re-analysis of the Moving to Opportunity experiment. Finally, we present estimates of the causal effect of each county in America on upward mobility and characterize the features of high-mobility areas.

Papers available at: www.equality-of-opportunity.org

Bio: Raj Chetty joined Stanford as a Professor of Economics in 2015. Chetty's research combines empirical evidence and economic theory to help design more effective government policies. His current research focuses on equality of opportunity: how can we give children from disadvantaged backgrounds better chances of succeeding? Chetty's most recent honors include a MacArthur Fellowship and the John Bates Clark medal.

## Orchestrating Collective Intelligence to Modernize Statistical Data Collection

Joe Reisinger, Premise Data

Transparency and access to information are critical to high-functioning economies. However, as the world becomes more interconnected, complex, and volatile, our capacity to perform robust statistical measurement is becoming increasingly fragmented. Premise is building machinery for performing evidence-based surveys at a global scale, leveraging a network of 30,000 data contributors across 34 countries to capture structured information using mobile technology. This talk will focus on the call for modernizing data collection and why the new frontier of mobile technology is the conduit for the future of business and economics.
On the platform side, several open areas under active research will be highlighted: optimizing task allocation for quality and budget; designing for both exploration and exploitation; and combatting fraud and collusion on a network.

Bio: Joe Reisinger is the CTO and co-founder of Premise, a mobile technology platform orchestrating and crowdsourcing the collection of micro-data for precise and robust economic measurement. He holds a PhD in Computer Science from the University of Texas and spent his academic career building natural language understanding systems at Google Research and IBM T.J. Watson. Prior to co-founding Premise, he was Chief Scientist at Metamarkets.

## Scaling up the Web Transparency: Algorithms and Challenges

Augustin Chaintreau, Columbia University

"What are they doing with our data?" is the question transparency ought to answer. The web giants -- e.g. Google, Amazon, and Facebook, but also many other companies -- leverage user data for personalization (recommendations, targeted advertisements, and increasingly often adjusted prices). Users currently have little insight, and at best coarse information, with which to monitor how and for which purposes their data are being used. What if instead one could tell you exactly which item -- whether an email you wrote, a search you made, or a webpage you visited -- has been used to decide on a targeted ad or a recommended product? But can we track our data at scale, and in environments we do not control? In this talk, we argue -- unfortunately through real examples -- that without web transparency, the exciting world opened by large collections of personal data compromises our longstanding commitment to fairness and equal opportunity. Fortunately, as we will show, web transparency may be (at least partially) restored. We built the first fine-grained tracking system for personal data that scales, and proved it successful in multiple services (Gmail, Amazon, YouTube). As we characterize its performance, new hopes but also new challenges emerge in reconciling big data with accountability and societal progress.

(Based on joint work with Mathias Lécuyer, Riley Spahn, Guillaume Ducoffe, and Roxana Geambasu.)

Bio: Augustin has been an Assistant Professor of Computer Science at Columbia University since 2010, where he directs the Mobile Social Lab. He designs algorithms that leverage social mobile behaviors and incentives to reconcile the risk and value of personal data networking. His research, addressing transparency in personalization, fairness in personal data markets, efficiency in social information sharing, cross-domain data linkage, and human mobility, has led to 25 papers in tier-1 conferences (five receiving best or best student paper awards at ACM CoNEXT, SIGMETRICS, USENIX IMC, IEEE MASS, and Algotel, some with large media coverage such as the NYT blog). A former student of the Ecole Normale Supérieure in Paris, he earned a Ph.D. in mathematics and computer science in 2006, the NSF CAREER Award in 2013, and the ACM SIGMETRICS Rising Star award in 2013. He has been an active member of the network research community, serving on the program committees of ACM SIGMETRICS, SIGCOMM, WWW, CoNEXT (as chair), MobiCom, MobiHoc, IMC, WSDM, COSN, AAAI ICWSM, and IEEE Infocom, and as area editor for IEEE TMC and ACM SIGCOMM CCR.

## Prediction Market Trading Strategies: manipulation and forecasting

David Rothschild, Microsoft Research

Prediction markets offer the promise of high-time-granularity studies of subjective expectations over major and minor events. Further, they help us design the future of markets and non-probability polling. In this paper we illustrate lessons from the order books of real-money, election-related prediction markets that show persistent arbitrage opportunities. First, examining a randomized field trial of actual trades, we demonstrate that arbitrage opportunities are significantly larger than the order book indicates.
Second, examining transaction-level data for all trades (nearly 300,000 trades across 6,300 users) in one of the exchanges, we present evidence suggestive of market manipulation by a single large trader. Third, we demonstrate four additional price misalignments resulting from biased trading and implicitly asymmetric trading costs. Finally, we explain how the markets are nevertheless still accurate, detail how to improve the information flow in prediction and financial markets, and tie those lessons to the growing field of non-probability polling.

Bio: David Rothschild is an economist at Microsoft Research. He has a Ph.D. in applied economics from the Wharton School of Business at the University of Pennsylvania. His primary body of work is on forecasting and understanding public interest and sentiment. Related work examines how the public absorbs information. He has written extensively, in both the academic and popular press, on polling, prediction markets, social media and online data, and predictions of upcoming events; most of his popular work has focused on predicting elections and an economist's take on public policy. He correctly predicted 50 of 51 Electoral College outcomes in February of 2012, an average of 20 of 24 Oscars from 2013-6, and 15 of 15 knockout games in the 2014 World Cup.

## Collective Phenomena in Complex Networks: Coordination and Social Learning

Ali Jadbabaie, University of Pennsylvania

In this talk, I will present a high-level overview of my research over the past decade on collective behavior in networked, social, and engineering systems. These collective phenomena include social aggregation in animals, such as schooling, herding, and flocking, and more broadly the emergence of consensus, swarming, and synchronization in complex networks of interacting dynamic systems. A common underlying theme in this line of study is to understand how a desired global behavior, such as consensus, synchronization, or a particular formation, can emerge from purely local interactions.
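
The classic DeGroot model is a minimal illustration of consensus emerging from purely local interactions (the example network and weights below are mine, not from the talk): each agent repeatedly averages its neighbors' opinions, and with a doubly stochastic weight matrix all opinions converge to the average of the initial ones.

```python
def degroot_step(weights, x):
    """One round of DeGroot averaging: each agent replaces its opinion
    with a weighted average of its neighbors' opinions (rows sum to 1)."""
    n = len(x)
    return [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]

# 4-agent ring: half weight on self, a quarter on each ring neighbor.
# The matrix is doubly stochastic, so consensus lands on the average.
W = [[0.50, 0.25, 0.00, 0.25],
     [0.25, 0.50, 0.25, 0.00],
     [0.00, 0.25, 0.50, 0.25],
     [0.25, 0.00, 0.25, 0.50]]

x = [1.0, 0.0, 0.0, 0.0]  # only agent 0 starts with a nonzero opinion
for _ in range(100):
    x = degroot_step(W, x)
print([round(v, 3) for v in x])  # → [0.25, 0.25, 0.25, 0.25]
```

No agent ever sees the whole network, yet the global outcome (agreement on the average) emerges from the local averaging rule alone, which is the basic phenomenon behind the consensus and social-learning results in the talk.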
The evolution of these ideas into social systems has led to a new theory of collective decision making among strategic agents. Examples include participation decisions in uprisings, social cascades, and investment decisions in infrastructure. I will investigate distributed strategies for information aggregation, social learning, and detection problems in networked systems where heterogeneous agents with different observations (of varying quality and precision) coordinate to learn a true state (e.g., finding aggregate statistics or detecting faults and failure modes in spatially distributed wireless sensor networks, or deciding the suitability of a political candidate, judging the quality of a product, and forming opinions on social issues of the day in social networks) using a stream of private observations and interaction with neighboring agents. I will end the talk with a description of contagion phenomena in networked systems and a new vision for graduate education at the interface of information and decision systems, data science, and social sciences.

Bio: Ali Jadbabaie is the Alfred Fitler Moore Professor of Network Science in the department of Electrical and Systems Engineering at the University of Pennsylvania (Penn). He holds secondary appointments in the departments of Computer and Information Science (CIS) in Penn Engineering, and Operations, Information, and Decisions (OID) in the Wharton School. A faculty member in Penn's GRASP Lab, he is also the co-founder and director of the Raj and Neera Singh Program in Networked & Social Systems (NETS) at Penn Engineering. NETS is a new undergraduate interdisciplinary degree program focused on network science and engineering, operations research, social phenomena, and social and technological networks. From January to October 2015, Ali was on sabbatical at LIDS at MIT. From October 2015 to January 2016, he took a leave from Penn to lead the Sociotechnical Systems Research Center at MIT as its interim director.
During this period, he served as the Associate Director of MIT's newly formed Institute for Data, Systems and Society (IDSS), where he helped create a newly approved PhD program in Social and Engineering Systems and shaped the intellectual agenda of the Institute. Ali received his B.S. with High Honors from Sharif University of Technology, his M.S. in Electrical and Computer Engineering from the University of New Mexico in Albuquerque, and his Ph.D. in Control and Dynamical Systems from the California Institute of Technology (Caltech). He was a postdoctoral scholar at Yale University for a year before joining the faculty of the University of Pennsylvania in July 2002. Aside from the GRASP Lab and the NETS program, Ali is also a faculty member of The Warren Center for Network & Data Sciences at Penn and a faculty affiliate of the Center for Technology, Innovation and Competition at Penn Law School. Ali is the inaugural editor-in-chief of IEEE Transactions on Network Science and Engineering, a new interdisciplinary journal sponsored by five IEEE societies, an Associate Editor of the INFORMS journal Operations Research, and a former Associate Editor of IEEE Transactions on Control of Network Systems. He is a recipient of a CAREER Award from the NSF, an ONR Young Investigator Award, the O. Hugo Schuck Best Paper Award of the American Automatic Control Council, and the George S. Axelby Best Paper Award of the IEEE Control Systems Society. He is also an IEEE Fellow. He has graduated 10 PhD students and has served as mentor to 5 postdoctoral scholars who have become research leaders in both industry and academia. His current research interests include decision and control theory with a focus on distributed optimization and control, collective behavior, network science, and the study of collective behavior in engineering systems and social and economic networks.
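The emergence of consensus from purely local interactions, described in the abstract above, can be illustrated with a minimal DeGroot-style averaging sketch (a standard textbook model, not the speaker's exact setting): each node repeatedly averages over its neighborhood, and all values converge to a common limit.

```python
# Minimal DeGroot-style consensus: each node repeatedly replaces its value
# with the average over its neighborhood (here including itself).
def consensus(values, neighbors, rounds=200):
    x = list(values)
    for _ in range(rounds):
        x = [sum(x[j] for j in neighbors[i]) / len(neighbors[i])
             for i in range(len(x))]
    return x

# A 4-node path graph; each neighborhood includes the node itself.
nbrs = [[0, 1], [0, 1, 2], [1, 2, 3], [2, 3]]
print(consensus([0.0, 0.0, 1.0, 1.0], nbrs))  # all entries converge to 0.5
```

No node ever sees the whole network, yet the repeated local averaging drives every value to the same limit, which is the simplest instance of the local-to-global theme in the abstract.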
Conversational Peers and Idea Generation: Evidence from a Field Experiment
Sharique Hasan, Stanford Graduate School of Business

Abstract: Social interaction is thought to affect individual and organizational innovation. We argue that individuals and teams are better able to generate high-quality ideas if they converse with peers who provide them with new insight, perspective, and information. Further, we argue that not all individuals can equally capitalize on this new information. Specifically, extroverted peers, because they are more willing to share and transmit their experiences, facilitate idea generation. Moreover, innovators who are open to experience absorb this new information better and can more effectively convert it into better ideas. We test our claims using novel data from a randomized field experiment at an entrepreneurship bootcamp where individuals were randomized to both teams and conversational peers. We find that conversations with extroverted peers help individuals generate higher-quality ideas, that more open individuals benefit more from such peers, and that teams with more cohesion can convert these raw ideas into better performance.

Bio: Sharique Hasan is an Associate Professor of Organizational Behavior in the Graduate School of Business at Stanford University. His research focuses on understanding how social interaction affects processes such as learning, network dynamics, creativity, and entrepreneurship. To study these processes, he conducts field experiments.

Crowdsourcing Societal Tradeoffs
Vincent Conitzer, Duke University

It would be desirable if, as a society, we could reduce the amount of landfill trash we create, the amount of carbon dioxide we emit, the amount of forest we clear, etc. Since we cannot (or are in any case not willing to) simultaneously pursue all these objectives to their maximum extent, we must prioritize among them.
Currently, this is done mostly in an ad hoc manner, with people, companies, local governments, and other entities deciding on an individual basis which of these objectives to pursue, and to what extent. A more systematic approach would be to set, at a global level, exact numerical tradeoffs: using one gallon of gasoline is as bad as creating x bags of landfill trash. Having such tradeoffs available would greatly facilitate decision making, and reduce inefficiencies resulting from inconsistent decisions across agents. But how could we arrive at a reasonable value for x? I will discuss this from the viewpoint of computational social choice. Joint work with Rupert Freeman, Markus Brill, and Yuqian Li.

Bio: Vincent Conitzer is the Kimberly J. Jenkins University Professor of New Technologies and Professor of Computer Science, Professor of Economics, and Professor of Philosophy at Duke University. He received Ph.D. (2006) and M.S. (2003) degrees in Computer Science from Carnegie Mellon University, and an A.B. (2001) degree in Applied Mathematics from Harvard University. His research focuses on computational aspects of microeconomics, in particular game theory, mechanism design, voting/social choice, and auctions. This work uses techniques from, and includes applications to, artificial intelligence and multiagent systems. Conitzer has received the Social Choice and Welfare Prize (2014), a Presidential Early Career Award for Scientists and Engineers (PECASE), the IJCAI Computers and Thought Award, an NSF CAREER award, the inaugural Victor Lesser dissertation award, an honorable mention for the ACM dissertation award, and several awards for papers and service at the AAAI and AAMAS conferences. He has also been named a Guggenheim Fellow, a Kavli Fellow, a Bass Fellow, a Sloan Fellow, and one of AI's Ten to Watch. Conitzer and Preston McAfee are the founding Editors-in-Chief of the ACM Transactions on Economics and Computation (TEAC).
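One simple way to aggregate crowdsourced answers to the question in the abstract above (an illustrative rule, not necessarily the mechanism the talk develops) is the geometric mean, which treats a reported ratio of "x bags per gallon" and its reciprocal "1/x gallons per bag" symmetrically.

```python
import math

# Each participant reports x: how many bags of landfill trash they judge
# to be as bad as one gallon of gasoline (hypothetical numbers).
def aggregate_tradeoff(reports):
    """Geometric mean of positive tradeoff ratios."""
    return math.exp(sum(math.log(r) for r in reports) / len(reports))

print(aggregate_tradeoff([2.0, 8.0, 4.0]))  # geometric mean, about 4.0
```

The arithmetic mean of the same reports would be about 4.67 and, unlike the geometric mean, would give an inconsistent answer depending on whether participants report bags-per-gallon or gallons-per-bag.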
Design and Analysis for Experiments in Networks
Johan Ugander, Stanford University

A/B testing is a standard approach for evaluating the effect of online experiments; the goal is to estimate the Average Treatment Effect (ATE) of a new feature or condition by exposing a sample of the overall population to it. A drawback of A/B testing is that it is poorly suited for experiments involving social interference, when the treatment of individuals spills over to neighboring individuals along an underlying social network. This talk will describe a novel experimental framework, graph cluster randomization, for identifying average treatment effects under social interference. Given this framework, we analyze the variance of the average treatment effect as a property of the graph cluster design, and bias/variance trade-offs under exposure model misspecifications. This talk will discuss joint work with Lars Backstrom, Dean Eckles, Brian Karrer, Jon Kleinberg, and Joel Nishimura.

Bio: Johan Ugander is an Assistant Professor of Management Science & Engineering at Stanford University. His research develops algorithmic and statistical frameworks for analyzing social networks, social systems, and other large-scale social data. Prior to joining Stanford he was a post-doctoral researcher at Microsoft Research, Redmond from 2014 to 2015, held an affiliation with the Facebook Data Science team from 2010 to 2014, and obtained his Ph.D. in Applied Mathematics from Cornell University in 2014.

Scalable Large Near-Clique Detection in Large-Scale Networks
Charalampos E. Tsourakakis, Harvard School of Engineering and Applied Sciences

Finding large near-(bi)cliques in massive networks is a notoriously hard problem of great importance to many applications, including anomaly detection, security, and community detection in social networks and the Web graph. But can we exploit idiosyncrasies of real-world networks to attack this NP-hard problem successfully?
In this talk I will answer this question in the affirmative. Specifically, I will present a significant advance in routines with rigorous theoretical guarantees for the scalable extraction of large near-cliques from large-scale networks, via the k-clique densest subgraph problem. I will present exact, approximation, and MapReduce algorithms that allow us to scale our computations to large-scale networks. At the heart of our scalable algorithm lies a densest subgraph sparsifier theorem. Our approach allows us to extract large near-(bi)cliques from a diverse collection of networks with millions of edges within a few seconds. Finally, I will present interesting graph mining findings, such as anomaly detection in real-world networks. I will conclude my talk with some interesting open problems.

Bio: Dr. Charalampos Tsourakakis is currently a CRCS Postdoctoral Fellow in the School of Engineering and Applied Sciences (SEAS) at Harvard University. He received his Ph.D. from the Algorithms, Combinatorics and Optimization (ACO) program at Carnegie Mellon University (CMU). He also holds a Master of Science from the Machine Learning Department at CMU. He did his undergraduate studies in the School of Electrical and Computer Engineering (ECE) at the National Technical University of Athens (NTUA). He is the recipient of a best paper award at IEEE Data Mining and has designed two graph mining libraries for tera-scale graphs. The former has been officially included in Windows Azure, and the latter was a research highlight of Microsoft Research. His main research interest lies in designing scalable and data-aware algorithms for massive networks that recover hidden structure and solve complex data-driven problems.

Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment
Michael Luca, Harvard Business School

Online marketplaces increasingly choose to reduce the anonymity of buyers and sellers in order to facilitate trust.
We demonstrate that this common market design choice results in an important unintended consequence: racial discrimination. In a large-scale field experiment on Airbnb, we find that requests from guests with distinctively African-American names are roughly 16% less likely to be accepted than those from identical guests with distinctively White names. The difference persists whether the host is African-American or White, male or female. The difference also persists whether the host shares the property with the guest or not, and whether the property is cheap or expensive. Discrimination is costly for hosts who indulge in it: hosts who reject African-American guests are able to find a replacement guest only 35% of the time. On the whole, our analysis suggests a need for caution: while information can facilitate transactions, it also facilitates discrimination. Joint work with Benjamin Edelman and Dan Svirsky.

Bio: Michael Luca is an assistant professor of business administration at Harvard Business School. Professor Luca's research applies econometric methods to field data in order to study the impact of information in market settings. He works closely with governments and companies to identify, design, implement, and experiment with different approaches to information disclosure, taking into account the behavioral foundations of how people make decisions. He has done work on rankings, expert reviews, online consumer reviews, and quality disclosure laws, among other types of information provision. Professor Luca has ongoing field research with Facebook, Yelp, several colleges, the UK government, and several U.S. cities, in addition to other partners. His current work focuses on crowdsourced reviews, analyzing a variety of companies including Yelp, Amazon, and ZocDoc.
His findings have been written and blogged about in media outlets such as The Wall Street Journal, The New York Times, The Washington Post, The Huffington Post, Chicago Tribune, Harvard Business Review, PC World Magazine, and Salon. Professor Luca received his Ph.D. in economics from Boston University and a bachelor's degree in economics and mathematics from SUNY Albany.

Incentivizing Exploration
Bobby Kleinberg, Cornell

An important recent theme in the development of online social systems is the potential of crowdsourced effort to solve large problems — defining tasks in which many people can each contribute a small amount of time to the overall goal. In some cases the arrangement is based on a direct compensation scheme, in which a (low) rate is paid for each unit of work performed. But in many settings one only has access to a crowd "in the wild", as they go about their everyday activities. Examples include product recommendations, social news readers, and scientific activities ranging from crowdsourced "citizen science" to the funding of research by national agencies. In all of these domains, there is a problem of misaligned incentives: the designer's goal is to carry out exploration (of the space of news stories, products, bird habitats, etc.) as efficiently as possible, but for reasons of scale they must implement the exploration via a crowd composed of members who each derive their own utility from participating in the exploration. We model this as a multi-armed bandit problem in which selfish, myopic agents pull arms with publicly observable outcomes, and a principal seeking to maximize the cumulative time-discounted reward may influence the agents by offering monetary (or other) rewards contingent on choosing particular actions. Our main result is a full characterization of the trade-off between the expected payments the principal must make and the total reward that can be achieved. Joint work with Peter Frazier, David Kempe, and Jon Kleinberg.
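The incentive misalignment in this model can be seen in a toy two-arm example (an illustrative sketch with hypothetical numbers, not the paper's characterization): a myopic agent picks the arm with the best observed average, so the principal must offer a payment at least as large as the empirical gap to induce exploration of the other arm.

```python
# A myopic agent chooses the arm maximizing empirical mean plus any
# bonus the principal attaches to that arm.
def chosen_arm(empirical_means, bonuses):
    scores = [m + b for m, b in zip(empirical_means, bonuses)]
    return max(range(len(scores)), key=lambda a: scores[a])

means = [0.6, 0.5]  # arm 1 looks worse so far, but may be under-explored
print(chosen_arm(means, bonuses=[0.0, 0.0]))   # 0: the agent exploits
print(chosen_arm(means, bonuses=[0.0, 0.15]))  # 1: the payment induces exploration
```

The paper's result concerns exactly this lever: how large the principal's expected payments must be, in aggregate, to buy a given amount of exploration (and hence total reward) from a crowd of such myopic agents.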
Bio: Bobby Kleinberg is an Associate Professor of Computer Science at Cornell University. His research studies the design and analysis of algorithms, and their relations to economics, learning theory, and networks. Prior to receiving his doctorate from MIT in 2005, he spent three years at Akamai Technologies, where he assisted in designing the world's largest Internet content delivery network. He is the recipient of a Microsoft Research Faculty Fellowship, an Alfred P. Sloan Foundation Fellowship, and an NSF CAREER Award.

Distributed Algorithms for Big Graph Analytics
Alex Dimakis, UT Austin

Many datasets from the web, social networks, bioinformatics, and neuroscience are naturally graph-structured. Graph algorithms are difficult to implement in distributed computation frameworks like Hadoop MapReduce, and for this reason several in-memory graph engines like Pregel, GraphLab, and GraphX are currently being developed. Our goal is to design parallelizable algorithms that discover graph structural properties and to theoretically analyze their performance. We give an overview of the emerging problems in designing distributed algorithms over such graph engines and discuss future directions.

Bio: Alex Dimakis is an Assistant Professor in the Electrical and Computer Engineering department, University of Texas at Austin. From 2009 until 2012 he was with the Viterbi School of Engineering, University of Southern California. He received his Ph.D. in 2008 and M.S. degree in 2005 in electrical engineering and computer sciences from UC Berkeley, and the Diploma degree from the National Technical University of Athens in 2003. During 2009 he was a CMI postdoctoral scholar at Caltech. He received an NSF CAREER award in 2011, a Google faculty research award in 2012, and the Eli Jury dissertation award in 2008. He is the co-recipient of several best paper awards, including the joint Information Theory and Communications Society Best Paper Award in 2012.
He is currently serving as an associate editor for IEEE Signal Processing Letters. His research interests include information theory, coding theory, signal processing, and networking, with a current focus on distributed storage and machine learning.

Estimating the Causal Impact of Recommendation Systems from Observational Data
Jake Hofman, MSR NYC

Recommendation systems are an increasingly prominent part of the web, accounting for up to a third of all traffic on several of the world's most popular sites. Nevertheless, little is known about how much activity such systems actually cause over and above activity that would have occurred via other means (e.g., search) if recommendations were absent. Although the ideal way to estimate the causal impact of recommendations is via randomized experiments, such experiments are costly and may inconvenience users. In this paper, therefore, we present a method for estimating causal effects from purely observational data. Specifically, we show that causal identification through an instrumental variable is possible when a product experiences an instantaneous shock in direct traffic and the products recommended next to it do not. We then apply our method to browsing logs containing anonymized activity for 2.1 million users on Amazon.com over a 9-month period and analyze over 4,000 unique products that experience such shocks. We find that although recommendation click-throughs do account for a large fraction of traffic among these products, at least 75% of this activity would likely occur in the absence of recommendations. We conclude with a discussion of the assumptions under which the method is appropriate and caveats around extrapolating results to other products, sites, or settings.

Bio: Jake Hofman is a Researcher at Microsoft Research in New York City, where his work in computational social science involves applications of statistics and machine learning to large-scale social data.
Prior to joining Microsoft, he was a member of the Microeconomics and Social Systems group at Yahoo! Research. Jake is also an Adjunct Assistant Professor of Applied Mathematics at Columbia University, where he has designed and taught classes on a number of topics ranging from biological physics to applied machine learning. He holds a B.S. in Electrical Engineering from Boston University and a Ph.D. in Physics from Columbia University.

Competitive Algorithms from Competitive Equilibria
Kamesh Munagala, Duke University

Scheduling jobs is a fundamental problem that arises in numerous forms and in various situations. In a typical scheduling problem, jobs of unknown duration arrive in an online fashion. The rate at which jobs get processed at any time instant depends on the resources allocated to them by the scheduler. We introduce and study a general scheduling problem that we term the Packing Scheduling Problem (PSP), where these rates are subject to arbitrary packing constraints. The PSP framework captures a variety of scheduling problems that arise in data center computing and multi-core architectures. More concretely, PSP models multidimensional resource requirements and parallelizability, as well as network bandwidth requirements arising in scheduling jobs on shared clusters. It also captures many classical scheduling problems, such as heterogeneous machine scheduling, broadcast scheduling, and scheduling jobs of different parallelizability. For the PSP problem we show a constant-competitive algorithm for minimizing total weighted completion time, when job arrivals and durations are unknown and adversarially chosen. Surprisingly, we achieve this result by applying the well-known Proportional Fairness algorithm from the economics literature to allocate resources to jobs at each time instant.
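In the single-resource special case, the Proportional Fairness rule has a simple closed form (a classical sketch, not the paper's general PSP algorithm): maximizing the weighted sum of the logarithms of the rates under one capacity constraint gives each job a rate proportional to its weight.

```python
# Single-resource Proportional Fairness: maximizing sum_i w_i * log(r_i)
# subject to sum_i r_i <= capacity yields r_i = (w_i / sum(w)) * capacity.
def proportional_fair_rates(weights, capacity):
    total = sum(weights)
    return [w / total * capacity for w in weights]

print(proportional_fair_rates([1.0, 1.0, 2.0], capacity=8.0))  # [2.0, 2.0, 4.0]
```

The talk's setting is far richer (arbitrary packing constraints rather than a single capacity), but this special case shows the rule's character: the logarithm makes every job's rate strictly positive, so no job is starved.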
Proportional Fairness has been extensively studied in the context of maximizing fairness in resource allocation; however, we present one of the first analyses of it in the context of scheduling. We also show that this algorithm has good theoretical guarantees for minimizing total delay (latency or flow time) in several special cases. Our results showcase the advantage of viewing complex scheduling problems as sequential resource allocation problems, and of applying ideas from economics to both algorithm design and analysis. Joint work with Sungjin Im (UC Merced) and Janardhan Kulkarni (Duke).

Bio: Kamesh Munagala is an Associate Professor of Computer Science at Duke University. He obtained his Ph.D. from Stanford University in 2003 and B.Tech. from IIT Bombay in 1998. He is primarily interested in approximation algorithms, sequential decision theory, and computational economics, as well as applications of these techniques to e-commerce, databases, and social networks. He is a recipient of the NSF CAREER Award and the Alfred P. Sloan Research Fellowship, and is a co-author of the best paper at WWW 2009. He was a Visiting Research Professor at Twitter in 2012, and currently serves as the Director of Graduate Studies for the Duke CS department.

Physics-inspired Algorithms and Phase Transitions in Community Detection
Cris Moore, Santa Fe Institute

Detecting communities, and labeling nodes according to which community they belong to, is an essential problem in the study of social and biological networks. One of the best and fastest algorithms for this problem is Belief Propagation, which updates probability distributions of node labels until they reach a fixed point. I'll give a friendly introduction to Belief Propagation, and the analogy with the cavity method of statistical physics. Don't worry, no physics is required!
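To give a flavor of the fixed-point idea (a heavily simplified caricature that folds messages into node beliefs; real Belief Propagation keeps a separate message per edge): with two assortative communities, each node's belief over the two labels is re-weighted each round by how well each label agrees with its neighbors' current beliefs.

```python
# Caricature of BP-style updates for a two-community assortative model:
# `affinity` is the assumed probability that linked nodes share a label,
# and b[i] is node i's current belief that it carries label A.
def update_beliefs(beliefs, neighbors, affinity=0.9, rounds=50):
    b = list(beliefs)
    for _ in range(rounds):
        new = []
        for i, nbrs_i in enumerate(neighbors):
            p_a = b[i]         # current belief acts as the prior
            p_b = 1.0 - b[i]
            for j in nbrs_i:
                p_a *= affinity * b[j] + (1 - affinity) * (1 - b[j])
                p_b *= affinity * (1 - b[j]) + (1 - affinity) * b[j]
            new.append(p_a / (p_a + p_b))
        b = new
    return b

# Two triangles joined by a single edge; one weakly seeded node per triangle.
nbrs = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
result = update_beliefs([0.9, 0.5, 0.5, 0.5, 0.5, 0.1], nbrs)
```

Iterating drives the beliefs to a fixed point in which one triangle is confidently labeled A and the other B, which is the qualitative behavior the talk analyzes (including the regimes where no such informative fixed point exists).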
In addition to being practically useful, our Belief Propagation algorithm can be studied analytically, revealing phase transitions in the ability of any algorithm to find communities. It also gives us principled ways to tell when communities are statistically significant, unlike algorithms that "overfit" and find illusory structure even in random networks. It can be extended to dynamic networks where nodes switch communities over time. Finally, I'll argue that the consensus of many good solutions is often better than the "best" single solution — and that this is true for many real-world optimization problems. This is joint work with Aurelien Decelle, Florent Krzakala, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborova, and Pan Zhang.

Bio: Cristopher Moore received his B.A. in Physics, Mathematics, and Integrated Science from Northwestern University, and his Ph.D. in Physics from Cornell. He has published over 120 papers at the boundary between physics and computer science, ranging from quantum computing, to phase transitions in NP-complete problems, to the theory of social networks and efficient algorithms for analyzing their structure. With Stephan Mertens, he is the author of The Nature of Computation, published by Oxford University Press. He is a Professor at the Santa Fe Institute. In 2014, Cris Moore was named a fellow of the American Physical Society for fundamental contributions at the interface between nonlinear physics, statistical physics, and computer science, including complex network analysis, phase transitions in NP-complete problems, and the computational complexity of physical simulation.

Optimal Contracts for Intermediaries in Online Advertising
Ozan Candogan, Duke Fuqua Business School

In online display advertising, the prevalent method advertisers employ to acquire impressions is to contract with an intermediary.
These contracts involve upfront payments made by the advertisers to the intermediary, in exchange for running campaigns on their behalf. This paper studies the optimal contract offered by the intermediary in a setting where advertisers' budgets and targeting criteria are private. This problem can naturally be formulated as a multi-dimensional dynamic mechanism design problem, which in general is hard to solve. We tackle this problem by employing a novel performance space characterization technique, which relies on delineating the expected cost and value achievable by any feasible (dynamic) bidding policy. This technique provides a convex optimization formulation of the optimal contract design problem. Using this formulation, we obtain a closed-form characterization of the optimal contract when advertisers have identical value distributions. When advertisers are heterogeneous, by contrast, we provide a duality-based approach, which reduces the optimal contract design problem to a simpler convex optimization problem. The intermediary's bid in the optimal contract is obtained by first using the optimal dual solution to compute a weighted average of the values associated with different types (to guarantee that the advertiser reports her type truthfully), and then shading this quantity (to account for budget constraints). Our results indicate that an intermediary can profitably provide a bidding service to a budget-constrained advertiser, and at the same time increase the overall market efficiency. Joint work with Santiago Balseiro (Duke University).

Bio: Ozan Candogan is an assistant professor at the Fuqua School of Business, where he is a member of the Decision Sciences area. Prior to joining Fuqua, he received his Ph.D. and M.S. degrees in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology.
His research interests are in mechanism design, decision making in social and economic networks, and the analysis of dynamic strategic interactions. His most recent work focuses on the design of novel iterative auction mechanisms for the efficient allocation of resources, and he was a finalist in the 2013 George Nicholson Student Paper Competition. Potential applications of his research include novel auction formats that can be employed for procurement, pricing and advertising mechanisms that optimally utilize available social network information, and pricing mechanisms for selling cloud computing service. His research has appeared in journals such as Management Science, Operations Research, Mathematics of Operations Research, and Games and Economic Behavior. He is also a recipient of the 2012 Microsoft Research Ph.D. Fellowship and the 2009 Siebel Scholarship.

Randomized Field Experiments in Mobile Marketing
Anindya Ghose, NYU Stern

The explosive growth of smartphones and location-based services (LBS) has contributed to the rise of mobile advertising. In this talk, we present results from multiple randomized field experiments that are designed to measure the effectiveness of mobile marketing and promotions. In the first experiment, using data from a location-based app for smartphones, we measure the effectiveness of mobile coupons. The aim is to analyze the causal impact of the geographical distance between a user and a retail store, the display rank, and coupon discounts on consumers' response to mobile coupons. In a second large-scale field study, we examine the role of contextual crowdedness in the redemption rates of mobile coupons. We find that people become increasingly engaged with their mobile devices as trains get more crowded, and in turn become more likely to respond to targeted mobile messages.
The change in behavior can be accounted for by the phenomenon of "mobile immersion": to psychologically cope with the loss of personal space in crowded trains and to avoid accidental gazes, commuters can escape into their personal mobile space. In turn, they become more involved with the targeted mobile messages they receive and, consequently, are more likely to make a purchase in crowded trains. These studies causally show that mobile advertisements based on real-time static geographical location and contextual information can significantly increase consumers' likelihood of redeeming a geo-targeted mobile coupon. However, beyond location and contextual information, the overall mobile trajectory of each individual consumer can provide even richer information about consumer preferences. In a third study, we propose a new mobile advertising strategy that leverages full information on consumers' offline moving trajectories. To examine the effectiveness of this new mobile trajectory-based advertising strategy, we designed a large-scale randomized field experiment in one of the largest shopping malls in the world. We find that mobile trajectory-based advertising can lead to the highest redemption probability, the fastest redemption behavior, and the highest satisfaction rate from customers at the focal advertising store. Various practical implications for mobile marketing are discussed.

Bio: Anindya Ghose is a Professor of IT and a Professor of Marketing at New York University's Leonard N. Stern School of Business. He is the co-Director of the Center for Business Analytics at NYU Stern. He is the Robert L. & Dale Atkins Rosen Faculty Fellow and a Daniel P. Paduano Fellow of Business Ethics at NYU Stern. He has been a Visiting Associate Professor at the Wharton School of Business. He also serves as the Scientific Advisor to 3TI China.
He was recently named by Business Week as one of the "Top 40 Professors Under 40 Worldwide" and by Analytics Week as one of the "Top 200 Thought Leaders in Big Data and Business Analytics". His research has been recognized by 10 best paper awards and nominations. He has published more than 75 papers in premier scientific journals and peer-reviewed conferences, and has given more than 200 talks internationally. His research has been profiled, and he has been interviewed, numerous times in the BBC, Bloomberg TV, CNBC, China Daily, Financial Times, Fox News, Forbes, Knowledge@Wharton, Korean Broadcasting News Company, Los Angeles Times, MSNBC, NBC, New York Times, New York Daily, National Public Radio, NHK Japan Broadcasting, Reuters, Time Magazine, Washington Post, Wall Street Journal, Xinhua, and elsewhere.

Scalable mining of massive networks: Distance-based Centrality, Similarity, and Influence
Edith Cohen, Tel-Aviv University

Graphs are ubiquitous in representing interactions between entities: communication, social relations, clicks, likes, follows, views, hyperlinks, transactions, and more. Structural properties of the corresponding graph successfully model fundamental properties of entities and their relations, with applications to ranking, recommendations, attribute completion, and clustering. The shortest-path and reachability relations are the basis of natural measures of centrality, influence, and similarity. These measures can be defined over a static set of edges or, for robustness, over a distribution derived from the set of interactions. In the talk, I will first present and motivate a unified view of distance-based measures. I will then overview algorithmic and estimation tools from my own work which support scalable computation of these measures on massive networks. The approach is based on computing sample-based sketches which capture the relation of each node to all others.
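A much-simplified flavor of the sample-based idea (plain node sampling with BFS, far cruder than the all-distances sketches the talk covers): estimate a node's closeness centrality from its distances to a small random sample of targets rather than to every node in the graph.

```python
import random
from collections import deque

def bfs_distances(graph, source):
    """Hop distances from `source` in an unweighted adjacency-list graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def estimated_closeness(graph, node, sample_size, seed=0):
    """Closeness estimate: sample size divided by summed sampled distances."""
    rng = random.Random(seed)
    sample = rng.sample([v for v in graph if v != node], sample_size)
    dist = bfs_distances(graph, node)
    return sample_size / sum(dist[v] for v in sample)

# A 6-node cycle: by symmetry, every node has the same closeness (5/9).
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(estimated_closeness(cycle, 0, sample_size=3))
```

With the full sample this recovers exact closeness; the scalability in the talk comes from much sharper sketch constructions that avoid a BFS per query node entirely.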
The sketches, together with carefully derived estimators, are then used to provide quick, high-confidence approximate answers to similarity, centrality, and influence queries. An advantage of distance-based measures over measures based on the paths ensemble (random-walk/PageRank, Katz, resistance) is that they combine high scalability with applicability to asymmetric (directed) interactions, which are often harder to work with. Some of the material is based on joint work performed at the MSR Silicon Valley Lab with Daniel Delling, Andrew Goldberg, Moises Goldszmidt, Fabian Fuchs, Thomas Pajor, and Renato Werneck.

Bio: Edith Cohen is a (visiting) full professor at Tel Aviv University. Until 2014 she was a Principal Researcher at Microsoft Research (Silicon Valley), and before that, between 1991 and 2012, she was at AT&T Labs (initially AT&T Bell Laboratories). She was a visiting professor at UC Berkeley in 1997. She received a Ph.D. in Computer Science from Stanford University in 1991. Her research interests include algorithms, mining and analysis of massive data, optimization, and computer networking. She is a winner of the 1997 IEEE ComSoc Bennett prize and an author of 20+ patents and 100+ publications.

Quadratic Voting
Glen Weyl, University of Chicago/MSR

While the one-person-one-vote rule often leads to the tyranny of the majority, alternatives proposed by economists have been complex and fragile. By contrast, we argue that a simple mechanism, Quadratic Voting (QV), is robustly very efficient. Voters making a binary decision purchase votes from a clearinghouse, paying the square of the number of votes purchased. If individuals take the chance of a marginal vote being pivotal as given, like a market price, QV is the unique pricing rule that is always efficient. In an independent private values environment, any type-symmetric Bayes-Nash equilibrium converges to this efficient limiting outcome as the population grows large, with inefficiency decaying as 1/N.
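The price-taking logic behind QV can be sketched in a few lines. This is a hypothetical illustration, not the authors' model: the pivotality probability and the voter values below are invented. A voter with value u who takes the chance p of a marginal vote being pivotal as given maximizes p·u·v − v², so she buys v* = p·u/2 votes, making vote totals proportional to total value:

```python
def qv_votes(value, p_pivotal):
    # Maximize p_pivotal * value * v - v**2 over the number of votes v.
    # First-order condition: p_pivotal * value - 2*v = 0.
    return p_pivotal * value / 2.0

# Illustrative values: one strong supporter, two mild opponents.
values = [9.0, -2.0, -2.0]
votes = [qv_votes(u, p_pivotal=0.1) for u in values]

# One-person-one-vote: the two opponents win (tyranny of the majority).
# QV: the vote total is proportional to the sum of values, so the
# welfare-maximizing side (total value 9 - 2 - 2 = 5 > 0) wins.
```

Because every voter scales her value by the same factor p/2, summing votes is equivalent to summing values, which is the source of QV's efficiency in the price-taking limit.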
We use approximate calculations, which match our theorems in this case, to illustrate the robustness of QV in contrast to existing mechanisms. We discuss applications in both (near-term) commercial and (long-term) social contexts.

Bio: Glen Weyl is a Researcher at Microsoft Research New England, an Assistant Professor of Economics and Law at the University of Chicago, and an Alfred P. Sloan Research Fellow. He was Valedictorian of Princeton University as an undergraduate in 2007, received his PhD from Princeton in 2008, and then spent three years as a Junior Fellow at the Harvard Society of Fellows before joining the faculty at Chicago. His research focuses on pure and applied price theory, especially applications to industrial and tax policy, as well as the intersection of economics with related fields such as law and philosophy. Outside his academic work, Glen co-founded a start-up commercializing Quadratic Voting and consults for platform start-ups through Applico Inc.

Auctions and Decentralization in Complex Networks
Paul Milgrom, Stanford

Economists' analyses are most often based on simplifying assumptions that enable the analyst to focus on the roles of prices, substitution, and decentralized decision making. Computer scientists most often take a sharply different approach, focusing on worst cases and general algorithmic treatments, in which simple economic structures are mostly hidden from view. We combine elements from these two perspectives in an attempt to develop solutions for high-stakes auction market design problems, especially the reallocation of radio spectrum in North America.

Bio: Paul Milgrom is the Shirley and Leonard Ely Professor of Humanities and Sciences in the Department of Economics at Stanford University and professor, by courtesy, at the Stanford Graduate School of Business.
He is a member of both the National Academy of Sciences and the American Academy of Arts and Sciences and a winner of the 2008 Nemmers Prize in Economics and the 2012 BBVA Frontiers of Knowledge award. He is best known for his contributions to microeconomic theory, his pioneering innovations in the practical design of multi-item auctions, and the extraordinary successes of his students and academic advisees. According to his BBVA Award citation: “Paul Milgrom has made seminal contributions to an unusually wide range of fields of economics including auctions, market design, contracts and incentives, industrial economics, economics of organizations, finance, and game theory.” According to a count by Google Scholar, Milgrom’s books and articles have received more than 60,000 citations.

Towards optimal algorithms for prediction with expert advice
Balasubramanian Sivan, Microsoft Research

We study the classic problem of prediction with expert advice. Focusing on settings with a constant number of experts, we develop optimal algorithms and obtain precisely optimal regret values for the cases of 2 and 3 experts. Further, we develop regret lower bounds for the multiplicative weights algorithm that exactly match the known upper bounds for an arbitrary number of experts k. This establishes a constant-factor separation between the regrets achieved by the optimal algorithm and by the multiplicative weights algorithm for a constant number of experts. The optimal algorithms have a crisp interpretation. For instance, in the case of 2 experts, the optimal algorithm follows the advice of a particular expert with exactly the probability that this expert will turn out to be the best one. Our main tool is the minimax principle, which lets us analyze the optimal adversary to compute optimal regret values. While analyzing the optimal adversary, we establish deep connections with non-trivial aspects of random walks.
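For reference, the multiplicative weights baseline discussed above fits in a few lines. This is a textbook sketch with invented loss data, not the talk's optimal algorithm: each expert's weight shrinks multiplicatively with its loss, and the algorithm's expected loss per round is the weight-averaged expert loss.

```python
def multiplicative_weights_loss(losses, eta):
    """Run multiplicative weights on a loss sequence.

    losses: list of rounds, each a list of per-expert losses in [0, 1].
    eta: learning-rate parameter in (0, 1).
    Returns the algorithm's total expected loss.
    """
    k = len(losses[0])
    weights = [1.0] * k
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        # Expected loss when following expert i with probability w_i / z.
        total += sum(w / z * l for w, l in zip(weights, round_losses))
        # Penalize each expert multiplicatively by its loss.
        weights = [w * (1.0 - eta) ** l for w, l in zip(weights, round_losses)]
    return total

# Two experts, 100 rounds: expert 0 is always right, expert 1 always wrong.
# The algorithm's total loss stays bounded, so its regret to expert 0 is O(1) here.
rounds = [[0.0, 1.0]] * 100
total = multiplicative_weights_loss(rounds, eta=0.5)
```

On this easy sequence the bad expert's weight decays geometrically, so the total expected loss is below 2 regardless of the horizon.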
We further use this connection to develop an improved regret bound for the case of 4 experts. Joint work with Nick Gravin and Yuval Peres.

Bio: Balasubramanian Sivan is a postdoctoral researcher at Microsoft Research, Redmond. He completed his PhD in Computer Science at the University of Wisconsin-Madison in 2013. His PhD thesis on “Prior Robust Optimization” was awarded the 2014 ACM SIGecom doctoral dissertation award and the University of Wisconsin-Madison Computer Science department’s outstanding graduate student researcher award. His research interests are in Algorithmic Game Theory, online and approximation algorithms, and expert learning. More details can be found here: http://research.microsoft.com/en-us/um/people/bsivan/

Online A-B Testing
Vivek Farias, MIT

We consider the problem of A-B testing when the impact of a treatment is marred by a large number of covariates. This is the situation in a number of modern applications of A-B testing (such as adTech and e-commerce) as well as in more traditional applications (such as clinical trials). Randomization can be highly inefficient in such settings, so we consider the problem of optimally allocating test subjects to either treatment with a view to maximizing the efficiency of our estimate of the treatment effect. Our main contribution is a tractable algorithm for this problem in the online setting, where subjects arrive, and must be assigned, sequentially. We also characterize the value of optimization and show that it can be expected to grow large with the number of covariates. Finally, using a real-world impression dataset, we show that our algorithms can be expected to yield substantial improvements to efficiency in practice. Joint work with Nikhil Bhat and Ciamac Moallemi (Columbia GSB).

Bio: Vivek Farias is interested in the development of new methodologies for large-scale dynamic optimization and applications in revenue management, finance, marketing, and healthcare. He received his Ph.D.
in Electrical Engineering from Stanford University in 2007 and has been at MIT since, where he is the Robert N. Noyce Professor of Management. Outside academia, Vivek has contributed to ventures in finance and technology in the role of researcher, consultant, or founder. Vivek is a recipient of an IEEE Region 6 Undergraduate Student Paper Prize (2002), an INFORMS MSOM Student Paper Prize (2006), an MIT Solomon Buchsbaum Award (2008), the INFORMS JFIG paper prize twice (2009, 2011), the NSF CAREER award (2011), and MIT Sloan’s Outstanding Teacher award (2013), and was a finalist for the 2011 INFORMS Pierskalla award in healthcare.

The spread of information in social media
Fil Menczer, Yahoo Labs/Indiana University

This talk overviews ongoing work on information diffusion in social media, focusing in particular on mining, modeling, and prediction tasks on data from the Twitter network. I present machine learning efforts that leverage the structure of meme diffusion networks, and many other features, to detect misinformation campaigns such as astroturf and social bots. We employ modeling techniques to understand the formation of communities, the creation of social ties, and the competition for attention. We investigate how different forms of structural and topical diversity in the network can be leveraged to predict which memes will go viral. Finally, if time permits, I will review a few crowdsourcing projects exploring the computation of unbiased scholarly impact metrics, the anonymous collection of sensitive data, and social good. Joint work with many members of the Center for Complex Networks and Systems Research at Indiana University. This research is supported by the National Science Foundation, the McDonnell Foundation, and DARPA. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of these funding agencies.
Bio: Filippo Menczer is a visiting scientist at Yahoo Labs in Sunnyvale during 2014-15, while on sabbatical from Indiana University, where he is a professor of informatics and computer science, adjunct professor of physics, and a member of the cognitive science program. He holds a Laurea in Physics from the University of Rome and a Ph.D. in Computer Science and Cognitive Science from the University of California, San Diego. Dr. Menczer has been the recipient of Fulbright, Rotary Foundation, and NATO fellowships, and a Career Award from the National Science Foundation. He serves as director of the Center for Complex Networks and Systems Research and is a Fellow of the Institute for Scientific Interchange Foundation in Torino, Italy, a Senior Research Fellow of The Kinsey Institute, and an ACM Distinguished Scientist. He previously served as division chair in the IUB School of Informatics and Computing, and was Fellow-at-large of the Santa Fe Institute. His research focuses on Web science, social networks, social media, social computation, Web mining, distributed and intelligent Web applications, and modeling of complex information networks. His work has been covered in The New York Times, Wall Street Journal, NPR, CNN, The Economist, Nature, Science, The Washington Post, Wired, The Atlantic, BBC, and many other US and international news sources.

Inverting a Steady-State
Sergei Vassilvitskii, Google Research

We consider the problem of inferring choices made by users from data that contains only the relative popularity of each item. We propose a framework that models the problem as inferring a Markov chain given its stationary distribution. Formally, given a graph and a distribution on its nodes, the goal is to assign scores to nodes such that the stationary distribution of the following Markov chain equals the given distribution: the transition probability from a node to a neighbor is proportional to the score of the neighbor.
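The stationarity condition pins down a natural fixed-point iteration: it implies π_u = s_u · Σ_{v∈N(u)} π_v / S_v, where S_v is the sum of scores over v's neighbors, so one can repeatedly solve this relation for s given the current S. The sketch below is my own illustration of that condition, not necessarily the authors' exact algorithm; it plants scores on a small graph, observes the resulting stationary distribution, and recovers scores that reproduce it.

```python
import numpy as np

def transition_matrix(adj, scores):
    # P[u, v] is proportional to the score of neighbor v.
    weighted = adj * scores[None, :]
    return weighted / weighted.sum(axis=1, keepdims=True)

def stationary(P, iters=2000):
    # Power iteration for the stationary distribution of an ergodic chain.
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi

def invert_steady_state(adj, pi, iters=500):
    # Stationarity gives pi_u = s_u * sum_{v in N(u)} pi_v / S_v,
    # where S_v = sum_{w in N(v)} s_w.  Iterate that relation as a fixed point.
    s = np.ones(len(pi))
    for _ in range(iters):
        S = adj @ s                  # neighborhood score sums S_v
        s = pi / (adj @ (pi / S))    # solve the stationarity relation for s
        s /= s.sum()                 # scores matter only up to scale
    return s

# Demo on a triangle graph with planted scores [1, 2, 3].
adj = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
pi_target = stationary(transition_matrix(adj, np.array([1.0, 2.0, 3.0])))
recovered = invert_steady_state(adj, pi_target)
```

On this instance the planted scores are a fixed point of the update, and the iteration converges to them from a uniform start, consistent with the fast convergence reported in the talk.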
We prove sufficient conditions under which this problem is feasible and, for the feasible instances, obtain a simple algorithm for a generic version of the problem. This iterative algorithm provably finds the unique solution and has a polynomial rate of convergence; in practice we find that it converges after fewer than 10 iterations. We then apply this framework to choice problems in online settings and show that our algorithm explains the observed data and predicts user choices much better than competing baselines across a variety of diverse datasets. Joint work with Ravi Kumar, Andrew Tomkins, and Erik Vee.

Bio: Sergei Vassilvitskii is currently a Research Scientist at Google and a Fellow at the Applied Statistics Center at Columbia University. Previously, he was a Research Scientist at Yahoo! Research and an Adjunct Assistant Professor at Columbia University. He completed his PhD at Stanford University under the supervision of Rajeev Motwani.

Strategic Learning and the Topology of Social Networks
Omer Tamuz, Microsoft Research/Caltech

We consider a group of people who each have to make a repeated choice between two lifestyle options, and who observe the choices of their friends in a social network. We assume that one of the two options is universally preferable, that each person's choice affects only his or her own health, and that any benefits are observed only in the distant future. We model this by a Bayesian game of pure informational externalities, in which agents start with conditionally independent private signals. We show that whether or not the agents learn the preferable choice depends on the topology of the social network. In particular, we identify a geometric "egalitarianism" condition on the social network graph that guarantees learning in infinite networks, or learning with high probability in large finite networks, in any equilibrium of the game.
Joint work with Elchanan Mossel and Allan Sly.

Bio: Omer Tamuz is a Schramm Postdoctoral Fellow at Microsoft Research New England / MIT, and an Assistant Professor (on leave) in the Humanities and Social Sciences division at Caltech. He received his PhD in mathematics from the Weizmann Institute, where he studied learning in Bayesian games, as well as other topics in machine learning, probability theory, and ergodic theory.

Information Provision in Dynamic Innovation Tournaments
Kostas Bimpikis, Stanford

Innovation tournaments have emerged as a viable alternative to the standard research and development process. They are particularly suited to settings that feature a high degree of uncertainty about the feasibility of the innovation goal. Information about the progress of participants in these settings has an interesting dual role. On one hand, participants who have experienced no progress start becoming pessimistic about the underlying environment and whether the innovation goal is even feasible, so learning about their competitors' gradual progress provides a positive signal about the attainability of the goal and an incentive to continue exerting effort. On the other hand, information about the status of the competition may adversely affect effort provision by the laggards, who start getting discouraged about their chances of winning the tournament. Because the tournament's success depends crucially on the effort decisions of the participants, the tournament's information provision mechanism is a central feature of its design. This paper explores these issues and suggests a number of design guidelines -- with a focus on the role of intermediate awards as information provision devices -- that help maximize the probability, and minimize the time, of completing the end goal.
Bio: Kostas Bimpikis is an Assistant Professor of Operations, Information and Technology at Stanford University's Graduate School of Business. Prior to joining Stanford, he spent a year as a postdoctoral research fellow at the Microsoft Research New England Lab. He received a PhD in Operations Research from the Massachusetts Institute of Technology in 2010, an MS in Computer Science from the University of California, San Diego, and a BS in Electrical and Computer Engineering from the National Technical University of Athens, Greece.

Sales Mechanisms in Online Markets: What Happened to Internet Auctions?
Jonathan Levin, Stanford

Consumer auctions were very popular in the early days of internet commerce, but today online sellers mostly use posted prices. Compositional shifts in the items being sold, or in the sellers offering these items, cannot account for this evolution; instead, the returns to sellers using auctions have diminished. We consider two hypotheses: a shift in buyer preferences away from auctions, and an increase in competition or search efficiency that favors posted prices. We use a model and data from eBay to tease these apart, finding that the former is more important, especially for more idiosyncratic products. We also discuss why auctions and posted prices can coexist in online markets.

Bio: Jonathan Levin is Professor of Economics at Stanford University. His research interests include industrial organization and microeconomic theory.

Quality Externalities and the Limits of Reputation in Two-Sided Markets
Steve Tadelis, Berkeley

Buyers in two-sided marketplace platforms may draw conclusions about the quality of the platform from any single transaction. This induces an externality across sellers that reputation mechanisms will not alleviate. Furthermore, buyers who abandon the platform without leaving feedback will cause seller reputations to be biased.
Using data from eBay, we document this externality and argue that platforms can mitigate it by actively screening sellers and promoting the prominence of better-quality sellers. Exploiting the bias in feedback, we create a measure of seller quality and demonstrate the benefits of our approach through a controlled experiment that prioritizes better-quality sellers for a random subset of buyers. We thus highlight the importance of externalities in two-sided markets and chart an agenda that aims to create more realistic models of two-sided markets.

Bio: Steve Tadelis is a professor of economics at Berkeley's Haas School of Business. From 2011 to 2013 he spent a two-year leave at eBay Labs, where he put together and led a group of economists focused on the economics of e-commerce, with particular attention to creating better matches of buyers and sellers, reducing market frictions by increasing trust and safety in eBay's marketplace, understanding the underlying value of different advertising and marketing strategies, and exploring the market benefits of different pricing structures. Aside from the economics of e-commerce, Steve’s main fields of interest are the economics of incentives, industrial organization, and microeconomics. His research has focused on a firm's reputation as a valuable, tradable asset; the effects of contract design and organizational form on firm behavior, with applications to outsourcing and privatization; public and private sector procurement and award mechanisms; and the determinants of trust.

The Dynamics of Online Reputation
Sinan Aral, MIT

Identity and reputation drive some of the most important decisions we make online: whom to follow or link to, whose information to trust, whose opinion to rely on when choosing a product or service, whose content to consume and share. Yet we know very little about the dynamics of online reputation and how it affects our decision making.
Sinan will describe a series of randomized experiments that explore the population-level behavioral dynamics catalyzed by identity and reputation online. He will explore some of the implications for bias in online ratings, the foundations of social advertising, and the ability to generate cascades of behavior through peer-to-peer social influence. The coming decades will likely see an emphasis on verified identities online. Sinan will argue that a new science of online identity could help guide our business, platform design, and social policy decisions in light of the rising importance of online reputation.

Bio: Sinan Aral is a scientist and an entrepreneur. He is the David Austin Professor of Management at MIT, where he holds joint Associate Professorships in the IT and Marketing groups and co-leads the Initiative on the Digital Economy. He was the Chief Scientist at SocialAmp, one of the earliest social commerce analytics companies (until its sale in 2012), and is currently the Chief Scientist at Humin, a social navigation startup developing the “Google Maps” for your social relationships. Sinan is the Scholar-in-Residence at the New York Times R&D Lab and has worked closely with Facebook, Yahoo, Microsoft, Nike, IBM, Intel, Cisco, Oracle, SAP, and many other leading Fortune 500 firms on realizing business value from social media and IT investments. His research has won numerous awards, including the Microsoft Faculty Fellowship, the PopTech Science Fellowship, an NSF CAREER Award, and a Fulbright Scholarship. He was also recently named one of the “World’s Top 40 Business School Professors Under 40” by Poets & Quants. In his spare time, he cooks, skis, and tells jokes about his own cooking and skiing. His most recent hobby is learning from his one-year-old son. You can find Sinan on Twitter @sinanaral.
Control Profiles of Complex Networks
Derek Ruths, McGill University

Studying the control properties of complex networks provides insight into how designers and engineers can influence these systems to achieve a desired behavior. The topology of a network has been shown to correlate strongly with certain control properties; in this talk I will discuss the fundamental structures that explain the basis of this correlation. I'll then use these fundamental structures to construct a new statistic, the control profile, that quantifies the proportions of the different control-inducing structures present in a network. Using the control profile, I'll show that standard random network models do not reproduce the kinds of control structures observed in real-world networks. Furthermore, the profiles of real networks form three well-defined clusters that provide insight into the high-level organization and function of complex systems.

From Small-World Networks to User-Driven Content Search
Amin Karbasi, ETH Zurich

In this talk, we will describe a new way of navigating through a database of similar objects using comparisons. In short, at each step the database presents two objects to the user, who then selects, among the pair, the object closest to the target that she has in mind. This process continues until, based on the user’s answers, the database can uniquely identify the target. This kind of interactive navigation (also known as exploratory search) and its variants have numerous real-life applications in content-based image and multimedia retrieval. We will argue that the above problem is closely related to small-world network design: given a graph embedded in a metric space, how should one augment it by adding edges in order to minimize the expected cost of greedy forwarding? We will first show that the small-world network design problem is NP-hard.
Given this negative result, we propose a novel mechanism for small-world design and provide an upper bound on its performance. This mechanism has a natural equivalent in the context of content search through comparisons, and we establish both (order-optimal) upper and lower bounds for its performance. These bounds are intuitively appealing, as they provide performance guarantees in terms of the bias of the distribution of targets, captured by the entropy, as well as the topology of their embedding, captured by the doubling dimension.

Savings Monitors
Arun G. Chandrasekhar, Stanford University

We conduct a field experiment in India to explore two interventions to help individuals increase their savings balances. First, we design a financial product based on the popular business correspondent model, which includes frequent reminders, assistance in account opening, and the setting of a six-month savings goal. Second, we measure the effectiveness of adding a peer monitoring component to this basic bundle and test whether the local social network can help increase the penetration of the formal banking system. We ask whether having a monitor substitutes for a formal commitment device, whether individuals choose the most effective monitors, and, moreover, whether some community members are better than others at encouraging financial capability.

Computational Fair Division: From Cake Cutting to Cloud Computing
Ariel Procaccia, Carnegie Mellon University

I will present an exciting new interaction between computer science and fair division theory. On the one hand, I will show how computational thinking provides a novel perspective on classic problems in fair division, such as cake cutting and estate division. On the other hand, I will explain how fair division theory is informing the allocation of computational resources.
I will also talk about the integration of some of these theoretical ideas into a not-for-profit fair division website that aims to make the world just a bit fairer.

Big Thinking: Augmenting human cognition with crowds and computation
Niki Kittur, Carnegie Mellon University

Despite the growing availability of information online, individual human cognition is fundamentally limited in the speed and amount of information it can process at once. In this talk I will discuss ways of augmenting individual cognition by taking advantage of crowds and computation. Our approaches combine the flexibility of many human minds working together with the raw power of computational systems to accelerate learning, innovation, knowledge production, and scientific discovery. I will discuss several novel crowdsourcing, machine learning, and visualization systems that augment the uniquely human abilities of analogy, schema induction, and problem solving. Examples include augmenting human thinking in product innovation (e.g., Quirky); knowledge production (Wikipedia); creative tasks such as writing, journalism, and poetry; and making sense of multivariate and graph data.

OSNs: The Wisdom of a Few and The Value of Users
Ricardo Baeza-Yates, Yahoo Labs

One of the main differences between traditional Web analysis and the study of Online Social Networks (OSNs) is that in the first case the information is organized around content, while in the second it is organized around people. While search engines have done a good job finding relevant content across billions of pages, we do not yet have an equivalent tool for finding relevant people in OSNs. Even though an impressive amount of research has been done in this direction, many gaps remain. Although our first intuition could be (and was!) to search for popular people, previous research has shown that a user's in-degree (e.g.
the number of friends or followers) is important but not sufficient to represent the importance and reputation of a person. Another approach is to study the content of the messages exchanged between users, trying to identify topical experts; however, the computational cost of such an approach -- including language diversity -- is a significant limitation. In our work we take a content-agnostic approach, focusing on the frequency, type, and time properties of user actions rather than content, and mixing their static characteristics (the social graph) with their activities (dynamic graphs). Our goal is to understand the role of popular users in OSNs and also to find hidden important users: Do popular users create new trends and cascades? Do they add value to the network? And, if they don't, who does? Who generates the content of the OSN? Our research provides preliminary answers to these questions. This is joint work with Diego Saez-Trumper and colleagues in Brazil and the USA.

Algorithms for Creating Game-Theoretic Strategies for Large Incomplete-Information Games
Tuomas Sandholm, Carnegie Mellon University

Incomplete-information games -- such as most auctions, negotiations, and future (cyber)security settings -- cannot be solved using minimax search even in principle; completely different algorithms were needed. A dramatic scalability leap has occurred in our ability to solve such games over the last seven years, fueled largely by the Annual Computer Poker Competition. I will discuss the key domain-independent techniques that enabled this leap, including automated abstraction techniques and approaches for mitigating the issues they raise, new equilibrium-finding algorithms, safe opponent exploitation methods, techniques that use qualitative knowledge as an extra input, and endgame solving techniques. I will finish by benchmarking poker programs against the best human poker professionals and by discussing what humans can learn from the programs.
"Going Viral" and the Structure of Online Diffusion Sharad Goel, Microsoft Research NYC New products, ideas, norms and behaviors are often thought to propagate through a person-to-person diffusion process analogous to the spread of an infectious disease. Until recently, however, it has been prohibitively difficult to directly observe this process, and thus to rigorously quantify or characterize the structure of information cascades. In one of the largest studies to date, we describe the diffusion structure of billions of events across several domains. We find that the vast majority of cascades are small, and are characterized by a handful of simple tree structures that terminate within one degree of an initial adopting "seed." While large cascades are extremely rare, the scale of our data allows us to investigate even the one-in-a-million events. To study these rare, large cascades, we develop a formal measure of what we label "structural virality" that interpolates between two extremes: content that gains its popularity through a single, large broadcast, and that which grows via a multi-generational cascade where any one individual is directly responsible for only a fraction of the total adoption. We find that the very largest observed events nearly always exhibit high structural virality, providing some of the first direct evidence that many of the most popular products and ideas grow through person-to-person diffusion. However, medium-sized events -- having thousands of adopters -- exhibit surprising structural diversity, and are seen to grow both through broadcast and viral means. Finally, we show that our empirical results are largely consistent with an SIR model of contagion on a scale-free network, reminiscent of previous work on the long-term persistence of computer viruses. 
The Value of Network Information
Itay Fainmesser, Brown University

Motivated by the proliferation of companies, such as Facebook.com, MySpace, and Twitter, whose business models rely, at least in part, on monetizing information about the interactions and influence of their users, we evaluate how much a monopolist can increase its profit by exploiting individual-level data on "who influences whom" when designing its selling strategy. We also analyze the effect that the use of such information has on consumer welfare. Our framework incorporates a rich set of market products, including goods characterized by global and local network effects. It also has the novel feature of distinguishing between a consumer's level of influence and her susceptibility to influence, a distinction that has been shown to be important empirically. Joint work with Andrea Galeotti.

Prediction in complex networks: The impact of structure on learning and prediction
Jennifer Neville, Purdue University

The recent popularity of online social networks and social media has increased the amount of information available about users' behavior -- including current activities and interactions among friends and family. This rich relational information can be exploited to predict user interests and preferences even when individual data is sparse, because the relationships are a critical source of information that identifies potential statistical dependencies among people. Although network data offer several opportunities to improve prediction, the characteristics of real-world datasets present a number of challenges to accurately incorporating relational information into machine learning algorithms. In this talk, I will discuss the effects of sampling, parameter tying, and model roll-out on the properties of the resulting statistical models -- effects that arise through a complex interaction between local model properties, global network structure, and the availability of observed attributes.
By understanding the impact of these interactions on algorithm performance (e.g., learning, inference, and evaluation), we can develop more accurate and efficient analysis methods for large, partially observable social network and social media datasets. Eliciting and aggregating information from the crowd Anirban Dasgupta, Yahoo! Labs Crowdsourcing is now widely used to replace judgment or evaluation by an expert authority with an aggregate evaluation from a number of non-experts, in applications ranging from rating and categorizing online content all the way to evaluating student assignments in massive open online courses (MOOCs) via peer grading. Two key issues in these settings are: (i) how to provide incentives such that agents in the crowd put in their 'full effort' in making good evaluations, as well as truthfully report them, and (ii) how to aggregate these evaluations, thereby giving an accurate estimate of the ground truth, given that the agents are of varying, unknown expertise. In this talk, we look at both of these problems. For the first, we present an information elicitation mechanism for multiple tasks when agents have endogenous proficiencies, with the following property: exerting maximum effort followed by truthful reporting of observations is a Nash equilibrium with maximum payoff. For the aggregation problem, we consider a model where each task is binary and each agent has an unknown, fixed reliability that determines the agent's error rate in performing tasks. The problem is to determine the truth values of the tasks solely based on the agent evaluations. We present algorithms whose error guarantees depend on the expansion properties of the agent-task graph; these algorithms perform well empirically. Joint work with Nilesh Dalvi, Arpita Ghosh, Vibhor Rastogi and Ravi Kumar. Should auctions be complex? Noam Nisan, Hebrew University of Jerusalem Do simple auctions suffice for obtaining high revenue, or do we need to design complex ones? 
For the case of selling one item, Myerson shows that the answer is "yes": the revenue-maximizing auction is deterministic with a single reserve price. When selling two (or more) items, we show that the answer is "no": complex auctions can yield infinitely more revenue than simple ones, even when there is only a single bidder with an additive valuation. "Complex" may be defined in various ways here, including allowing randomization, as well as by our chosen measure, the auction's menu size. However, when the bidder's values for the items are independently distributed, the answer is "approximately yes": selling each of two items simply and separately yields at least half the revenue of any auction. Joint work with Sergiu Hart. The extent of price misalignment in prediction markets David Pennock, Microsoft Research, NYC We examine misaligned prices for logically related contracts in prediction markets. First, we uncover persistent arbitrages between identical contracts on different exchanges; price support beyond what is in the published order book multiplies the size of these arbitrages. Second, misalignment between logically related contracts listed on the same exchange occurs when contracts systemically shut down or fail to respond efficiently around moments of high information flow. Third, other examples of bounded rationality include a consistent asymmetry between buying and selling, and persistent price lags between exchanges. Fourth, we propose solutions to these problems by fixing logical contradictions within the exchange and providing flexible trading interfaces, freeing traders to focus on providing information in the form they find most natural. Joint work with David Rothschild. 
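The cross-exchange arbitrage in Pennock and Rothschild's first finding can be checked mechanically. A minimal sketch with hypothetical quotes; it ignores order-book depth, fees, and the "price support beyond the published book" effect noted in the abstract:

```python
def arbitrage_profit(quotes):
    """Per-share risk-free profit from buying the same contract at the
    lowest ask and selling it at the highest bid across exchanges.
    Returns 0.0 when the quotes are aligned (no arbitrage)."""
    best_bid = max(bid for bid, _ask in quotes)
    best_ask = min(ask for _bid, ask in quotes)
    return max(0.0, best_bid - best_ask)

# Hypothetical (bid, ask) quotes for one binary contract on two exchanges:
misaligned = [(0.52, 0.55), (0.58, 0.61)]  # exchange B bids above A's ask
aligned = [(0.52, 0.55), (0.53, 0.56)]     # no crossing; no free money
```

Here `arbitrage_profit(misaligned)` is 0.03 per share, while the aligned quotes yield zero; the abstract's point is that such crossed quotes persist in practice rather than being competed away.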
Adaptive Seeding in Social Networks Yaron Singer, Google and Harvard University The rapid adoption of social networking technologies throughout the past decade is bringing special attention to algorithmic and data mining techniques designed to maximize information cascades in social networks. Despite the immense progress made in the past decade, limited data access and the structure of social networks mean that applying state-of-the-art techniques often results in poor performance. In this talk we will introduce a new framework we call adaptive seeding. The framework is a two-stage model which leverages a phenomenon known as the "friendship paradox" in social networks in order to dramatically increase the spread of information cascades. Our main result shows that constant-factor approximations are achievable for the most well-studied models of information spreading in social networks. The result follows from new techniques and concepts that may be of independent interest to those interested in stochastic optimization and machine learning. Joint work with Lior Seeman. Pricing in Social Networks Francis Bloch, Ecole Polytechnique, France We analyze the problem of optimal monopoly pricing in social networks where agents care about the consumption or prices of their neighbors. We characterize the relation between optimal prices and consumers' centrality in the social network. This relation depends on the market structure (monopoly vs. oligopoly) and on the type of externalities (consumption versus price). We identify two situations where the monopolist does not discriminate across nodes in the network (linear monopoly with consumption externalities and local monopolies with price externalities). We also analyze the robustness of the analysis with respect to changes in demand, and the introduction of bargaining between the monopolist and the consumer. 
(joint with Nicolas Querou) Language and Social Dynamics in Online Communities Cristian Danescu-Niculescu-Mizil, Stanford University and Max Planck Institute SWS. Much of online social activity takes the form of natural language, from product reviews to conversations on social-media platforms. I will show how analyzing these interactions from the perspective of language use can provide a new understanding of social dynamics in online communities. I will describe two of my efforts in this direction. The first project leverages insights from psycholinguistics to build a novel computational framework that shows how key aspects of social relations between individuals are embedded in (and can be inferred from) their conversational behavior. In particular, I will discuss how power differentials between interlocutors are subtly revealed by how much one individual immediately echoes the linguistic style of the person they are responding to. The second project explores the relation between users and their community, as revealed by patterns of linguistic change. I will show that users follow a determined lifecycle with respect to their susceptibility to adopt new community norms, and how this insight can be harnessed to predict how long a user will stay active in the community. This talk includes joint work with Susan Dumais, Michael Gamon, Dan Jurafsky, Jon Kleinberg, Jure Leskovec, Lillian Lee, Bo Pang, Christopher Potts and Robert West. References: WWW 2013, WWW 2012, WWW 2011. Current challenges in kidney exchange - need for long chains and dynamic matching Itai Ashlagi, MIT Sloan A kidney exchange network establishes a pool of patient-donor pairs, non-directed donors, and patients on the deceased donor waiting lists, and seeks to arrange transplants among this pool. It has been previously shown that for sufficiently large pools of patient-donor pairs, (almost) efficient kidney exchange can be achieved by using at most 3-way cycles, i.e. 
by using cycles among no more than 3 patient-donor pairs. However, as kidney exchange has grown in practice, cycles among n > 3 pairs have proved useful, and long chains initiated by non-directed, altruistic donors have proven to be very effective. We explore why this is the case, both empirically and theoretically. We provide an analytical model of exchange when there are many highly sensitized patients, and show that large cycles of exchange or long chains can significantly increase efficiency when the opportunities for exchange are sparse. As very large cycles of exchange cannot be used in practice, long non-simultaneous chains initiated by non-directed donors significantly increase efficiency in patient pools of the size and composition that presently exist. Most importantly, long chains benefit highly sensitized patients without harming low-sensitized patients. One way to make kidney exchange networks thicker is by waiting for pairs to arrive—waiting longer allows the pool to become larger, but imposes costs on those already in the pool. We study empirically and theoretically the impact of the waiting periods on the number of matches. Based on joint work with David Gamarnik, Alvin Roth, Mike Rees, Patrick Jaillet and Vahideh Manshadi. Take it or Leave it: Running a Survey when Privacy Comes at a Cost Katrina Ligett, Caltech In this talk, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. 
Ghosh and Roth recently showed that in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy then no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and make them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by. Joint work with Aaron Roth. Paper: http://users.cms.caltech.edu/~katrina/papers/buyingprivacy.pdf Deceased Organ Donation and Allocation: 3 Experiments in Market Design Al Roth, Stanford How should you allocate a scarce resource whose scarcity/availability might be sensitive to how you allocate it? I'll talk about some work I'm just beginning involving the donation and allocation of deceased-donor organs for transplantation. This presents a different set of problems from our (more advanced) work on live organ donation and kidney exchange that has received a lot of recent publicity (and was the subject of Itai Ashlagi's recent talk). However, the evidence suggests that even for the case of kidney transplantation the current scarcity cannot be solved by live donation alone, and for most organs live donation is not feasible. I'll discuss the new Israeli organ transplant law, and some experiments that will help us understand it better as field data become available. I'll also discuss an experiment in how organ donors are solicited in the United States. DIMACS Workshop on Economic Aspects of Information Sharing The DIMACS Workshop on Economic Aspects of Information Sharing is being held at Stanford Feb 7-8, 2013. 
It is being co-hosted and co-sponsored by RAIN. In recent years, the collection and analysis of user data over the Internet has experienced a surge. Online services and social networking sites have facilitated and encouraged the online disclosure of a wide array of personal information, while web-tracking platforms have enabled the consolidation of user data from disparate sources. The reach of these services has been significantly extended by the integration of mobile devices with the cloud. Companies increasingly view data analytics as an integral part of their operations, and there is an increased need for new means of monetizing data beyond advertising. In parallel, regulatory bodies, consumer advocacy groups, and Internet companies themselves have struggled to find a balance between the financial benefits of user data monetization and the privacy concerns that it raises. Motivated by these developments, this workshop aims at studying economic aspects of information sharing from several different angles, including online data markets, targeted advertising, user incentive mechanisms, and privacy-aware mechanism design. Citation networks: Evaluating the impact of science Santo Fortunato, Aalto University, Finland Citation statistics are regularly adopted to evaluate different actors in the academic and scientific arena, from scholars to journals, departments, universities, national institutions and whole countries. The scores play a crucial role in deciding which grants are awarded, the rankings of applicants for a job, and even the fate of scientific institutions. For these reasons citation networks have been carefully investigated over the past decades. I will discuss some recent findings in this area. Citation distributions are universal across many different disciplines, if suitably normalized indicators are adopted as measures of performance, thus avoiding the bias of field dependence. 
Furthermore, citation dynamics is characterized by bursts, usually occurring within a few years of the publication of a paper, and the burst size spans several orders of magnitude. This bursty dynamics can be described by a linear preferential attachment model with time-dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well. Crowd-Powered Systems Michael Bernstein, Stanford Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone. Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems. In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture the exact right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in real time. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation. Rumor in a network: Who's the culprit? Devavrat Shah, MIT Suppose a malicious "worm" has spread in a network (e.g. 
Stuxnet) and interest is in finding its source (so that any of several possible actions can be taken: reporting the news to the world, rewarding the finder, or taking punitive action). Motivated by such scenarios, where "information" diffuses in the network starting from an unknown source and interest is in identifying the "source" based on the observed spread, we discuss the problem of finding the source of a rumor in a network. We will design a computationally efficient source estimator for the popular Susceptible-Infected (SI) information spreading model with generic spreading time. In establishing the effectiveness of the estimator, we shall explain the role played by the underlying network structure (a form of expansion property) in determining the ability to detect the source. Connections to the generalized Polya urn, network centrality and the number 1 - ln 2 will be explained. This is based on joint work with Tauhid Zaman (MIT). On Bitcoin and Red Balloons Moshe Babaioff, Microsoft Research Many large decentralized systems rely on information propagation to ensure their proper function. We examine a common scenario in which only participants that are aware of the information can compete for some reward, and thus informed participants have an incentive *not* to propagate information to others. One recent example in which such tension arises is the 2009 DARPA Network Challenge (finding red balloons). We focus on another prominent example: Bitcoin, a decentralized electronic currency system. Bitcoin represents a radical new approach to monetary systems. It has been getting a large amount of public attention over the last year, both in policy discussions and in the popular press. Its cryptographic fundamentals have largely held up even as its usage has become increasingly widespread. We find, however, that it exhibits a fundamental problem of a different nature, based on how its incentives are structured. 
We propose a modification to the protocol that can eliminate this problem. Bitcoin relies on a peer-to-peer network to track transactions that are performed with the currency. For this purpose, every transaction a node learns about should be transmitted to its neighbors in the network. As the protocol is currently defined and implemented, it does not provide an incentive for nodes to broadcast transactions they are aware of. In fact, it provides an incentive not to do so. Our solution is to augment the protocol with a scheme that rewards information propagation. Since clones are easy to create in the Bitcoin system, an important feature of our scheme is Sybil-proofness. We show that our proposed scheme succeeds in setting the correct incentives, that it is Sybil-proof, and that it requires only a small payment overhead; all of this is achieved in iterated elimination of dominated strategies. We complement this result by showing that there are no reward schemes in which information propagation and no self-cloning is a dominant strategy. Joint with Shahar Dobzinski, Sigal Oren, and Aviv Zohar. A short description appeared at ACM SIGecom Exchanges. Information Asymmetries in first-price common-value auctions, or The Value of Information David Kempe, USC We study the role of information asymmetries in first-price common-value auctions, and how different information structures provide revenue opportunities to third-party information brokers. One of our motivations is Internet ad auctions, where ads for specific site visitors have a common-value nature, in that displaying an ad to a bot has no value, and users are likely to carry the same (or similar) value to different advertisers. Cookies significantly improve the estimate of this value; the fact that different advertisers use cookies to different extents (and use cookies of different qualities) thus leads to significant information asymmetries. 
These asymmetries can be mitigated or exacerbated when the advertisers obtain cookie information from third-party information brokers. We focus on a first-price common value auction in which bidders have access to independent discrete signals about the value of an item, which is either 0 or 1. Our first main result is that with two bidders, this auction format has a unique Nash equilibrium; we give a linear-time algorithm for computing this equilibrium and the resulting value of the auction for each player. We then use this equilibrium calculation to understand the value of additional information to the bidders. Third-party information has the interesting property that it can (in principle) be allocated to multiple bidders, and that it carries significant externalities. We find that the value of additional information is subtle, and depends on the prior and the information already available. We exhibit several surprising possibilities including (1) Additional information about the item does not always result in higher value for a bidder. (2) A bidder may prefer his competitor (but not himself) to receive the broker's signal; this applies both to the more and less informed bidder. [Joint work with Vasilis Syrgkanis and Eva Tardos] Random Utility Models for Social Choice Lirong Xia, Harvard Social choice studies ordinal preference aggregation with wide applications ranging from high-stakes political elections to low-stakes movie rating websites. In many cases, we want to design a social choice mechanism that reveals the ground truth via MLE inference. Despite its importance, this objective has been largely hindered by the lack of natural probabilistic models and efficient inference algorithms. In this talk, I will focus on a wide class of probabilistic models called Random Utility Models (RUMs), whose MLE inference was previously believed intractable in general. 
I will introduce a fast MC-EM algorithm for a very general and natural subclass of RUMs, and discuss its applications and impact on designing better social choice mechanisms. An extension of this algorithm also provides a computational basis for improving models in many applications in economics as well as machine learning, especially learning to rank. Based on joint work with Hossein Azari Soufiani and David C. Parkes. Advertiser Behavior in Sponsored Search Auctions Susan Athey, Stanford Economists often build "structural models," where they specify a particular model of individual behavior and then use data to estimate the parameters of the model. Although such models require strong assumptions, they have the advantage that they can make principled predictions about what would happen if the environment changed in a way that has not been observed in the past. I will describe the application of these techniques to advertiser behavior in sponsored search advertising auctions, focusing on how the models can be used for marketplace design and management. I will discuss economists' focus on causal inference in statistical models as well as the ways in which experiments can be used to estimate and test structural models. I will also make some suggestions about research directions at the intersection of economics and machine learning. Automatic Discovery of Patterns in Media Content Nello Cristianini, University of Bristol What can we learn about the world (and the media system) by analysing millions of news articles or tweets? Media content analysis has historically been the domain of the social sciences, but recently we are witnessing a strong trend towards the automation of many tasks, paving the way for a new - computational - approach to social science and even the humanities. In this talk we will survey the results obtained over the past 5 years at the Intelligent Systems Laboratory of Bristol, in the area of automating the analysis of news media content. 
By combining techniques from machine translation, pattern recognition, statistical learning, information retrieval, we will analyse patterns connected to the US Presidential Elections, to UK public opinion, and to EU cultural biases. Auctions and Allocations with Positive Network Externalities Kamesh Munagala, Duke University With the advent of social networks such as Facebook, Twitter, and LinkedIn, and online offers/deals web sites, network externalties raise the possibility of marketing and advertising to users based on influence they derive from their neighbors in such networks. Indeed, a user's valuation for a product is changed by his knowledge of which of his neighbors likes or owns the product. Designing allocation schemes and auctions in the presence of network externalities is not very well understood, and poses many challenges that are hard to address with traditional techniques. We show that even in very simple settings, optimal auction design becomes APX-Hard, and pricing schemes may not admit to Nash equilibria in buyer behavior. On the positive side, we present approximation algorithms for welfare maximization and optimal auctions in both single item and multi-item settings via interesting linear programming relaxations. Joint work with Anand Bhalgat and Sreenivas Gollapudi (working paper); and with Nima Haghpanah, Nicole Immorlica, and Vahab Mirrokni (EC 2011). Adolescent Societies - Their Form, Evolution, and Variation Daniel McFarland, Stanford University Adolescent societies - whether arising in weak, short-term classroom friendships or close, long-term friendships - exhibit tendencies toward segregation, hierarchy and clustering. It is difficult, however, to explain the macro-variation we observe in these social networks drawing only on tie formation micro-mechanisms. Some adolescent societies are rank-ordered caste systems and others are flat, cliquish worlds. What accounts for this variation? 
We propose and test an ecological theory of social network formation where features of organizational environments moderate tie formation processes to account for the macro-structural variation across settings. We develop this argument using longitudinal friendship data on schools (Add Health study) and classrooms (Classroom engagement study), and by extending exponential random graph models to the study of multiple societies over time. Joint work with James Moody, David Diehl, Jeff Smith, and R. Jack Thomas. The Simple Economics of Approximately Optimal Auctions Jason Hartline, Northwestern University Systems wherein strategic agents compete for limited resources are ubiquitous: the economy, computer networks, social networks, congestion networks, etc. Auction theory governs the design and analysis of these systems. I will describe a method for using approximation in auction theory to identify simple, practical, and robust auction mechanisms that perform nearly as well as the complex optimal ones. A main goal of this approach is to understand the extent to which important intuition from optimal auctions for ideal models extends to approximately optimal auctions for more realistic models. Auction theory is well understood only in ideal models where agents have single-dimensional, linear preferences, i.e., each agent has a value for receiving a service and her utility is the difference between her value and her payment. For these models optimal auctions can be interpreted as "marginal revenue" maximizers (Myerson, 1981; Bulow and Roberts, 1989). In more realistic models, i.e., where bidders have multi-dimensional preferences for different outcomes or non-linear utility, similar intuitive characterizations are unknown. Understanding good auctions for these environments is considered to be the main challenge in auction theory. 
In these more realistic environments maximizing marginal revenue may not be optimal, and furthermore, there is sometimes no direct way to implement the marginal revenue maximization mechanism. I will present two results: I will give procedures for implementing marginal revenue maximization in general, and I will show that marginal revenue maximization is approximately optimal. The approximation factor smoothly degrades in a term that quantifies how far the environment is from an ideal one (i.e., where marginal revenue maximization is optimal). Joint work with Saeed Alaei, Hu Fu, Nima Haghpanah, and Azarakhsh Malekian. Beyond Keyword Search: Taming Information Overload with Rich User Interactions Khalid El-Arini, Carnegie Mellon University It has been almost two decades since the first search engine appeared on the Web. In that time, much of our lives has drifted online, from the news we read to the products we buy to our relationships with friends, family and colleagues. As such, information overload has become a ubiquitous problem that affects nearly all members of society, as we try to sift through millions of articles and status updates to get an understanding of our world. However, while our information needs have grown more complex and varied over time, the traditional information retrieval paradigm has scarcely changed: users still submit strings of keywords, and receive a ranked list of hyperlinks in return. In this talk, I argue the need for new models of user interaction that are better suited to modern retrieval tasks. Some tasks, such as reading the news, can be inherently queryless, while others, such as conducting scientific research, require complex queries. In both cases, a user's personal tastes and beliefs can color the results he or she expects to see. Here, I present a general information retrieval framework that leverages ideas from submodular optimization and probabilistic graphical models to unify both settings. 
I show results in the news and science domains demonstrating that our approach outperforms state-of-the-art alternatives from Google and Yahoo! on real users. Finally, as personalization itself has become ubiquitous on the Web, it is important to provide users with an interaction mechanism that allows them to inspect--and potentially correct--inferred user profile information. I present recent work that learns such transparent user profiles from user behavior in a large-scale social network. Optimal targeting strategies over social networks Kostas Bimpikis, Stanford University Recent advances in information technology have allowed firms to gather vast amounts of data regarding consumers' preferences and the structure and intensity of their social interactions. The question that arises naturally is whether firms can intelligently use the available data to improve their business strategies. In this presentation, we take a step in this direction by discussing our results on optimal price discrimination and targeted advertising over networks. In particular, in the first part of the talk, we study the pricing problem of a monopolist selling a divisible good (service) to consumers who are embedded in a social network. We derive a characterization of the optimal price for each consumer as a function of her network position and illustrate the value of knowing the underlying network structure by comparing the profits of a monopolist who does not take the network into account to those of a monopolist who uses this information optimally. The second part of the talk examines a game-theoretic model of competition between firms, which can target their marketing budgets to individuals. We provide a sharp characterization of the optimal targeted marketing strategies and highlight their dependence on the consumers' preferences as well as on the underlying social network structure. 
In particular, firms find it optimal to allocate their marketing budgets to consumers in proportion to their "network centrality", a measure of social influence. Moreover, we identify network structures for which targeted advertising is more beneficial for the firms and, finally, we show how the difference in the initial budgets affects the outcome of the marketing competition between them. Individual Heterogeneity and the Formation of Collaboration Networks Katharine Anderson, Carnegie Mellon University Collaboration drives innovation in a variety of contexts, including product development, entrepreneurship, policy-making, and academic research. Collaboration networks--in which two people are connected if they work together on a problem--provide a valuable tool for understanding the structure of these collaborative communities. Empirically, individuals in collaboration networks play a wide variety of roles--some individuals have many links, while others have few links; some bridge between communities while others play a more peripheral role. From a social scientific perspective, we would like to believe that the heterogeneity we observe in social network position reflects some kind of underlying heterogeneity in individuals--that individuals occupy different positions in the social network, not because they were lucky, but because of some identifiable underlying characteristics. However, our current models of network formation use homogeneous agents, and thus cannot be used to answer questions about individual heterogeneity and network structure. Here, I introduce a model of network formation in which individuals with heterogeneous skill sets choose their collaborators in order to gain access to skills that are complementary to their own. Together, these collaborative connections comprise a collaboration network. 
This class of models is interesting because it connects individual heterogeneity to network heterogeneity, and thus can be used to answer questions about 1) how an individual's skill set affects her position in the social network and 2) how the distribution of skills in the population affects the overall structure of the network. Using this model, I show that an individual's degree in the collaboration network is a supermodular function of her set of skills. As a result, the degree distribution of the network may be highly skewed, even when the distribution of skills in the population is not. This skew becomes more pronounced as individuals in the network have fewer of the skills required to solve their problems--that is, as problems get more difficult. I then use an extension of this model to make predictions about the network position of specialists--individuals whose skills lie within a single subject area--and generalists--individuals whose skills bridge different subject areas. I show that while specialists tend to have more links than generalists, generalists are more likely to bridge between communities. I also show that as disciplinary boundaries fade, the degree distribution of the network becomes more skewed towards a few high-degree superstars.

Disturbance Feedback Policies in Dynamic Robust Optimization, with Applications to Supply Chain Design and Revenue Management
Dan Iancu, Stanford University
Robust optimization has recently emerged as a computationally tractable and scalable paradigm for modeling complex decision problems under uncertainty. Its applications span a wide range of challenging problems in finance, pricing, supply chain management, and network design. In the context of dynamic robust decision models, a typical approach is to look for control policies (or decision rules) that are directly parameterized in the uncertainties affecting the model.
This has the advantage of leading to simple convex optimization problems for finding particular classes of rules (e.g., affine). However, the approach typically yields suboptimal policies and is hard to analyze. In the first part of the talk, we seek to bridge this paradigm with the classical Dynamic Programming (DP) framework for solving dynamic decision problems. We provide a set of unifying conditions (based on the interplay between the convexity and supermodularity of the DP value functions, and the lattice structure of the uncertainty sets) that, taken together, guarantee the optimality of the class of affine decision rules, and furthermore allow such rules to be found very efficiently. Our results suggest new modeling paradigms for dynamic robust optimization, and our proofs bring together ideas from three areas of optimization typically studied separately: robust, combinatorial (lattice programming and supermodularity), and global (the theory of concave envelopes). We exemplify the approach in a model for negotiating flexible supply chain contracts between several parties, where optimal capacity, contracting, and ordering policies can be found by solving a simple linear program. In the second part of the talk, we consider a class of models arising in supply chain networks and multi-product pricing, for which we suggest a new class of polynomial decision rules, found by solving compact semidefinite programs and typically yielding drastic improvements compared with the status quo. This is joint work with Dimitris Bertsimas, Pablo Parrilo, Mayank Sharma and Maxim Sviridenko.

Online Concave Matching
Kamal Jain, Microsoft Research
Online matching was introduced by Karp, Vazirani, and Vazirani [KVV] in 1990 and over time became a cornerstone of online algorithms. In the matching problem of KVV, the edges are unweighted.
Mehta, Saberi, Vazirani, and Vazirani [MSVV] and Buchbinder, Naor, and Jain developed algorithms for the weighted version of online matching, but at the expense of finding a fractional solution. In many large-scale instances, such as matching ads to consumers, fractionality is not a major compromise. The major compromise is that the existing work allows only a hard budget constraint, e.g., so many matches per node, or a linear utility function bounded above by a budget constraint, which essentially gives a very specific concave utility function. In practice, people have concave utility functions. In this work we obtain an online algorithm for the concave matching problem. The performance of our algorithm is optimal. We use convex programming duality. Using variational calculus, we show how the worst possible example is dual to the best possible algorithm, thereby proving the optimality of the performance guarantee of our algorithm. In hindsight, one can reconstruct the algorithm by studying the worst possible example itself. Credits: This is joint work with Nikhil Devanur. Acknowledgement: This is work I have been doing for the last 7 years. Mark Braverman, Alexander Holroyd, Yuval Peres, and Oded Schramm are some of the people who have guided us.

Price Prediction Strategies for Simultaneous One-Shot Auctions
Michael Wellman, University of Michigan
Bidding in simultaneous auctions is challenging because an agent's value for a given good may depend on the outcome of other auctions. Given the lack of known solutions to general simultaneous auction games, previous work has addressed this classic exposure problem with heuristic strategies employing probabilistic price predictions. We introduce a concept of self-confirming prices, and show that within an independent private value model, bidding optimally with respect to self-confirming price predictions is w.l.o.g. in equilibrium.
We exhibit practical procedures to compute approximate equilibrium strategies, and demonstrate via empirical game-theoretic analysis that these strategies are effective in simultaneous auction environments with both complementary and substitutable preference structures. **This is joint work with Amy Greenwald and Eric Sodomka of Brown University.

Lattice Games and the Economics of Aggregators
George Varghese, UCSD and Yahoo! Research
We model the strategic decisions of web sites in content markets, where sites may reduce user search cost by aggregating content. Example aggregations include political news, technology, and other niche-topic websites. We model this market scenario as an extensive-form game of complete information, where sites choose a set of content to aggregate and users associate with sites that are nearest to their interests. Thus, our scenario is a location game in which sites choose to aggregate content at a certain point in user-preference space, and our choice of distance metric, Jaccard distance, induces a lattice structure on the game. We provide two variants of this scenario: one where users associate with the first site to enter amongst sites of equal distance, and a second where users choose uniformly between sites at equal distances. We show that Subgame Perfect Nash Equilibria exist for both games. While it appears to be computationally hard to compute equilibria in both games, we show a polynomial-time satisficing strategy called Frontier Descent for the first game. A satisficing strategy is not a best response but ensures that earlier sites will have positive profits, assuming all subsequent sites also have positive profits. By contrast, we show that the second game has no satisficing solution. **Joint work with P. Jordan, U. Nadav, K. Punera and A.
Skrzypacz at Yahoo.

Competitive Contagion in Networks
Michael Kearns, University of Pennsylvania
We examine a game-theoretic model of "competitive contagion" in networks, where two competing companies or other entities have limited budgets to seed initial infections in an underlying social network, which then mediates stochastic contagion. The payoff to each party is its final number of infections, which at equilibrium may come at the expense of the other party. In this model we provide characterizations of the Price of Anarchy and of a new notion called the Price of Budgets. These characterizations depend on properties of the local stochastic dynamics of contagion, and in some cases exhibit sharp threshold behavior. **Joint research with Sanjeev Goyal of the University of Cambridge.

Building an Institutional Field to Corral a Government
Stephen Barley, Stanford University
This talk will describe how the American business community either founded or revitalized eight distinct populations of organizations during the late 1970s and the 1980s, which intentionally and unintentionally formed an institutional field (a network of organizations) designed to influence the federal government and create a more business-friendly political environment.

Cooperation and Assortativity with Endogenous Partner Selection
Siddharth Suri, Yahoo! Research
The natural tendency for humans to choose with whom to form new relationships and with whom to end established relationships is thought to facilitate the emergence of cooperation. Helping cooperators to mix assortatively is believed to reinforce the rewards accruing to mutual cooperation while simultaneously excluding defectors. However, the relationship between endogenous partner selection, assortativity, and cooperation has been largely unexplored experimentally.
Here we report on a series of human-subjects experiments in which groups of 24 participants played a multi-player prisoner's dilemma game where, critically, they were also allowed to propose and delete links to players of their own choosing at some variable rate. Over a wide variety of parameter settings and initial conditions, we found that endogenous partner selection significantly increased the level of cooperation, the average payoffs to players, and the assortativity between cooperators. Even relatively slow update rates were sufficient to produce large effects, resulting in cooperation levels over 80%. Subsequent increases to the update rate still had a positive, although smaller, effect. For standard prisoner's dilemma payoffs, we also found that assortativity resulted predominantly from cooperators avoiding defectors, not from severing ties with defecting partners, and that cooperation correspondingly suffered. Finally, by modifying the payoffs to satisfy two novel conditions, we found that cooperators did punish defectors by severing ties, leading to levels of cooperation approaching 100% that persisted for longer. **Joint work with: Jing Wang (NYU) and Duncan Watts (Yahoo! Research)

Two(!) Good To Be True
Sergiu Hart, The Hebrew University of Jerusalem
How should one sell goods optimally? While the mechanism design literature has solved this problem neatly when there is only one good, the multiple-goods case turns out to be extremely difficult, mathematically and conceptually. Much of what is true for one good does not extend to multiple goods. We will try to explain the difficulties, show what can go wrong, and then present some universal approximation results. The talk is essentially self-contained; no background in mechanism design is necessary.
**Joint work with Noam Nisan and Phil Reny

Identifying peer effects in online communication behaviors
Dean Eckles, Facebook
Peer effects can produce clustering of behavior in social networks, but so can homophily and common external causes. For observational studies, adjustment and matching estimators for peer effects require often-implausible assumptions, but it is only rarely possible to conduct appropriate direct experiments to study peer influence in situ. We describe research designs in which individuals are randomly encouraged to perform a focal behavior, which can subsequently influence their peers. Ubiquitous product optimization experiments on Internet services can be used for these analyses. This approach is illustrated with an analysis of peer effects in expressions of gratitude via Facebook on Thanksgiving Day 2010, with implications for the micro-foundations of culture.

Networks, communities and the ground-truth
Jure Leskovec
The Web, society, information, cells and the brain can all be represented and studied as complex networks of interactions. Nodes in such networks tend to organize into clusters and communities, which represent the fundamental structures for understanding the organization of complex systems. Even though detection of network communities is of significant importance for computer science, sociology and biology, our understanding of the community structure of large networks remains limited. We study a set of more than 200 large networks with the goal of understanding and identifying communities in networks. We challenge the conventional view of network community structure and show that it is not exhibited by large real-world networks. We then present a new conceptual model of network community structure, which reliably captures the overall structure of networks and accurately identifies the overlapping nature of network communities. This is joint work with Jaewon Yang.
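One community-quality score widely used in this line of work is conductance: the fraction of a candidate community's edge volume that crosses its boundary (lower is better). A minimal sketch, with an illustrative graph and community:

```python
def conductance(edges, community):
    """Conductance of node set `community` in an undirected graph given as
    a list of (u, v) edges: cut edges divided by the smaller side's total
    degree (edge volume)."""
    community = set(community)
    cut = sum(1 for u, v in edges if (u in community) != (v in community))
    vol_in = sum((u in community) + (v in community) for u, v in edges)
    vol_out = sum((u not in community) + (v not in community) for u, v in edges)
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0

# Two triangles joined by a single bridge edge: a clear community pair.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(conductance(edges, {0, 1, 2}))  # low: 1 cut edge over volume 7
```

A well-separated set like one of the triangles scores 1/7 here, while an arbitrary set cutting across the bridge scores much worse.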
Computing Game-Theoretic Solutions for Security
Vincent Conitzer
Algorithms for computing game-theoretic solutions are now deployed in real-world security domains, notably air travel. These applications raise some hard questions. How do we deal with the equilibrium selection problem? How is the temporal and informational structure of the game best modeled? What assumptions can we reasonably make about the utility functions of the attacker and the defender? And, last but not least, can we make all these modeling decisions in a way that allows us to scale to realistic instances? I will present our ongoing work on answering these questions. Joint work with Dmytro Korzhyk, Joshua Letchford, Kamesh Munagala, Ronald Parr (Duke); Manish Jain, Zhengyu Yin, Milind Tambe (USC); Christopher Kiekintveld (UT El Paso); Ondrej Vanek, Michal Pechoucek (Czech Technical University); and Tuomas Sandholm (CMU). Brief bio: Vincent Conitzer is the Sally Dalton Robinson Professor of Computer Science and Professor of Economics at Duke University. His research focuses on computational aspects of microeconomic theory, in particular game theory, mechanism design, voting/social choice, and auctions. He recently received the IJCAI Computers and Thought Award, which is awarded to outstanding young scientists in artificial intelligence.

Revenue-optimal Auctions
Costis Daskalakis
In his seminal paper, Myerson [1981] provides a revenue-optimal auction for a seller who is looking to sell a single item to multiple bidders. Unfortunately, Myerson's auction generalizes only to a limited range of domains, called "single-parameter" domains, where each bidder's preference over the auction's outcomes is specified by a single parameter. Indeed, extending this auction to "multi-parameter domains", such as selling multiple heterogeneous items to bidders, has been one of the most central problems in Mathematical Economics. We solve this problem in bidder- and item-symmetric settings.
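For background, Myerson's single-item construction can be made concrete with a toy computation: the optimal auction is a second-price auction with a reserve price at the zero of the bidder's virtual value function phi(v) = v - (1 - F(v)) / f(v). The uniform value distribution below is an illustrative assumption:

```python
# Toy sketch of Myerson's reserve price, assuming Uniform[0,1] values
# (a regular distribution), where F(v) = v and f(v) = 1.

def virtual_value(v):
    # phi(v) = v - (1 - F(v)) / f(v) = 2v - 1 for Uniform[0,1].
    return v - (1.0 - v) / 1.0

def myerson_reserve(lo=0.0, hi=1.0, tol=1e-9):
    """Bisection for the root of the (monotone) virtual value function."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if virtual_value(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(myerson_reserve())  # ~0.5: never sell below half the maximum value
```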
For a single bidder, we solve the general problem. (Based on joint work with Yang Cai and Matt Weinberg) Bio: Constantinos (or Costis) Daskalakis is an Assistant Professor of EECS at MIT. He completed his undergraduate studies in Greece, at the National Technical University of Athens, and obtained a PhD in Computer Science from UC Berkeley. After Berkeley he was a postdoctoral researcher at Microsoft Research New England, and since 2009 he has been on the faculty of MIT. His research interests lie in Algorithmic Game Theory and Applied Probability, in particular computational aspects of markets and the Internet, social networks, and computational problems in Biology. Costis has been honored with a 2007 Microsoft Graduate Research Fellowship, the 2008 Game Theory and Computer Science Prize from the Game Theory Society, the 2008 ACM Doctoral Dissertation Award, an NSF CAREER Award, a 2010 Sloan Foundation Fellowship in Computer Science, the 2011 SIAM Outstanding Paper Prize, and the MIT Ruth and Joel Spira Award for Distinguished Teaching.

A Random Graph Model of Multi-Hospital Kidney Exchanges
David Parkes
Multi-hospital kidney exchanges, where hospitals share local lists of donor-patient pairs with a centralized exchange, promise to facilitate larger matchings, and thus additional transplants. But a central challenge is to align incentives so that hospitals will choose to share all pairs with the exchange. We present a random-graph model of feasible transplants, and use the model to (a) explain early experimental results on 2-way and 3-way exchanges analytically, (b) derive a "square-root law" for welfare gains from centralized pools, and (c) design a matching mechanism that is Bayes-Nash incentive compatible, efficient, and individually rational under reasonable assumptions.
The robustness of the mechanism is established through a computational study, validating incentive alignment and estimating an efficiency loss due to incentive alignment of 1-4% for 2-way exchanges and 5-19% for 3-way exchanges (and decreasing with larger pools). Joint work with Panos Toulis

A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
Arnout van de Rijt
Contemporary scholars of fame argue that public attention to persons is short-lived. We investigate this "fifteen minutes" hypothesis in a unique data source containing daily records of references to millions of names in 2,500 English-language newspapers, on TV station websites, and on blogs. Longitudinal analysis shows that, instead, fame tends to carry over from one year to the next, except below some threshold level of marginal public attention, where fame does appear to be fleeting. This pattern of above-threshold annual stability is found even in newspaper entertainment sections, celebrity-oriented tabloids, and blogs, where fame is thought to be most ephemeral. Analysis of archival newspaper data reveals that while coverage of small names follows a pattern of rapid decay, big names traverse a career-like trajectory spanning many years. We take this evidence to suggest that fame is created through an interaction between static social hierarchies and dynamic reinforcement processes, whereby a population is pre-stratified in terms of the chances that a cascade of public celebration will latch on to one of its members.

Treasure Hunt
Markus Mobius
We seed a large real-world social network with binary information and analyze subsequent social learning. A unique feature of our field experiment is that we measure both the pre-existing social networks and the actual conversation network. Our experiment allows us to test how rational agents behave when processing information that originates within their social network.
We find that information decays quickly with social distance and that agents mainly incorporate information within social distance 2. Conversations through common friends do not increase the weight that a subject places on signals from direct friends but linearly increase the weight on signals from indirect friends. This suggests that agents are able to avoid double-counting information from indirect friends. We propose a simple "streams model" of social learning that is consistent with the evidence from our experiment.

Task Routing for Prediction Tasks
Yiling Chen
We describe methods for routing a prediction task on a network where each participant can contribute information and route the task onwards. We introduce routing scoring rules that bring truthful contribution of information and optimal routing of the task into a Perfect Bayesian Equilibrium under common knowledge about the amount of information available to each agent on the network. Relaxing the common knowledge assumption, we address the challenge of routing in situations where states of information differ among agents. We introduce a family of local routing rules, which isolate routing decisions that depend only on local information in equilibrium, while still promoting effective routing. Local routing rules are the only routing scoring rules that induce truthful equilibria in which agents' routing decisions rely solely on local information. Simulation results show that local routing decisions can lead to effective task routing. Joint work with Haoqi Zhang, Eric Horvitz, and David Parkes.

Bargaining Theory in the Cloud
Eric Friedman
The axiomatic theory of bargaining solutions was initiated by John Nash with his seminal paper in 1950 and has a long and mostly mathematical history. Surprisingly, it arises naturally in a variety of allocation problems in cloud computing.
For example, the second most famous bargaining solution, the Kalai-Smorodinsky solution, is the outcome of a simple water-filling algorithm used in the Mesos platform, and it has many strong properties in that setting, including incentive compatibility and fairness. In this talk, I will explore these connections for a variety of cloud computing problems and show how axiomatic bargaining theory can be used to analyze allocation problems in the cloud, and conversely how cloud computing sheds new light on axiomatic bargaining theory. This talk is based on joint work with Ali Ghodsi, Scott Shenker and Ion Stoica.

Network Patterns of Favor Exchange
Matthew Jackson
We examine the informal exchange of favors among the members of a society. We analyze a game where the fear of losing multiple relationships can provide incentives for favor provision. We characterize the network patterns of exchange that are robust in the sense that deleted relationships result only in a local loss of favor exchange. Such robustness necessitates networks in which all links are "supported": any pair of individuals exchanging favors must have a common friend. We show that favor exchange networks in 75 villages in rural India exhibit a frequency of this sort of support that is significantly higher than a standard network 'clustering' measure. Co-authors: Tomas Rodriguez-Barraquer and Xu Tan

Simple Auctions with Near-Optimal Equilibria
Tim Roughgarden
In practice, auction designers often exchange incentive compatibility for simplicity (e.g., in sponsored search auctions or combinatorial auctions). How much welfare loss does this design choice cause? I'll talk about some techniques for answering this question. Based largely on a SODA '11 paper with Kshipra Bhawalkar.

Efficient Online Ad Serving in a Display Advertising Exchange
Kevin Lang
We begin with a tutorial overview of the basic concepts of Non-Guaranteed (i.e.
spot-market) display advertising and advertising exchanges, then pose and solve a constrained path optimization problem that lies at the heart of the real-time ad serving task in Yahoo!'s NGD Ad Exchange. In Yahoo!'s Exchange, the ad server's task for each display opportunity is to compute, with low latency, an optimal valid path through a directed graph representing the business arrangements between the numerous business entities that belong to the Exchange. These entities include not only publishers and advertisers, but also intermediate entities called "ad networks", which have delegated their ad serving responsibilities to the Exchange. Path validity is determined by business constraints which focus on the following three issues: 1) suitability of the display opportunity's web page and its publisher; 2) suitability of the user who is currently viewing that web page; and 3) suitability of a candidate ad and its advertiser. Path optimality is determined by the money reaching the publisher, and is affected not only by an advertiser's bid, but also by the revenue-sharing agreements between the entities in the candidate path. We describe two different algorithms that have both been used in the actual Yahoo! non-guaranteed ad serving system, focusing on typical-case rather than worst-case performance, and on the optimality-preserving speedup heuristics that allow latency targets to be met. The talk is based on a WSDM 2011 paper that was co-written with Joaquin Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar, Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael Binderberger-Ortega, Sunil Nagaraj, and Raymie Stata.

Designing Algorithms for MapReduce
Sergei Vassilvitskii
The amount of data available today has led to analysis problems whose size would have been unimaginable just 10 years ago. With datasets routinely measured in gigabytes and terabytes, one challenge that has emerged is scaling algorithms to handle this deluge of information.
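The MapReduce programming pattern itself can be sketched in a few lines. The in-process word count below is a minimal illustration of the map, shuffle, and reduce phases, not a distributed implementation:

```python
# Minimal sketch of the MapReduce pattern (word count), simulated
# in-process; a real deployment distributes these phases across machines.
from collections import defaultdict

def map_phase(doc):
    """Map: emit (key, value) pairs for one input record."""
    for word in doc.split():
        yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)

docs = ["big data big networks", "data deluge"]
pairs = [kv for doc in docs for kv in map_phase(doc)]
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())
print(counts)  # {'big': 2, 'data': 2, 'networks': 1, 'deluge': 1}
```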
As chip designers have sustained Moore's law by turning to parallelism, so too have algorithmicists turned to parallelism to speed up algorithms and improve performance. In this talk, we will focus on the MapReduce model of parallel computing, highlight some of the recent algorithmic advances, and present current challenges. Based on joint work with Howard Karloff, Silvio Lattanzi, Ben Moseley and Siddharth Suri.

Incentivizing High-quality User-Generated Content
Arpita Ghosh
We model the economics of incentivizing high-quality user-generated content (UGC), motivated by settings such as online review forums, question-answer sites, and comments on news articles and blogs. We provide a game-theoretic model within which to study the problem of incentivizing high-quality UGC, in which contributors are strategic and motivated by exposure. Our model has the feature that both the quality of contributions as well as the extent of participation is determined endogenously in a free-entry Nash equilibrium. The model predicts, as observed in practice, that if exposure is independent of quality, there will be a flood of low-quality contributions in equilibrium. An ideal mechanism in this context would elicit both high quality and high participation in equilibrium, with near-optimal quality as the available attention diverges, and should be easily implementable in practice. We consider a very simple elimination mechanism, which subjects each contribution to rating by some number A of viewers, and eliminates any contribution that is not uniformly rated positively. We construct and analyze free-entry Nash equilibria for this mechanism, and show that A can be chosen to achieve quality that tends to optimal, along with diverging participation, as the number of viewers diverges. Joint work with Preston McAfee.

Coevolution of Network Structure and Content
Lada Adamic
As individuals communicate, their exchanges form a dynamic network.
A time series analysis of both standard and novel network metrics can be used to characterize a network's evolution. We demonstrate that network structure alone can be highly revealing of the diversity and novelty of the information being communicated in the network. Networks with higher conductance exhibit higher information entropy, while unexpected network configurations are indicative of higher information novelty. This is joint work with Liuling Gong, Edwin Teng, and Avishay Livne.

TBA
Damon Centola
TBA

BGP Safety with Spurious Updates - The Conditions of BGP Convergence
Martin Suchara
We explore BGP safety, the question of whether a BGP system converges to a stable routing, in light of several BGP implementation features that have not been fully included in previous theoretical analyses. We show that Route Flap Damping, MRAI timers, and other intra-router features can cause a router to briefly send "spurious" announcements of less-preferred routes. We demonstrate that, even in simple configurations, this short-term spurious behavior may cause long-term divergence in global routing. We then present DPVP, a general model that unifies these sources of spurious announcements in order to examine their impact on BGP safety. In this new, more robust model of BGP behavior, we derive a necessary and sufficient condition for safety, which furthermore admits an efficient algorithm for checking BGP safety in most practical circumstances -- two complementary results that have been elusive in the past decade's worth of classical studies of BGP convergence in simpler models. We also consider the implications of spurious updates for well-known results on dispute wheels and safety under filtering. Joint work with Alex Fabrikant and Jennifer Rexford.

Predicting the Popularity of Online Content
Gabor Szabo
Online services that collect, categorize, and disseminate user-generated content have seemingly very different focuses, user communities, and content types.
We present a study of the popularity of content in three online services: YouTube, Twitter, and Digg. While videos on YouTube, short messages on Twitter, and interesting articles on Digg do not necessarily have much in common, it turns out that the patterns of access to content are very similar across the three sites. In particular, we present predictions for the long-term popularity of content based on early observations, and moreover discuss what role the social network plays in determining the success of a submission.

Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
Elisa Celis
I will first discuss bid data from Microsoft's AdECN platform, which allows advertisers to target their bids. We find that the valuations are irregular, making the commonly used Vickrey auction unsuitable for extracting revenue; however, Myerson's optimal auction mechanism is often unsuitable in practice. I will present and discuss the benefits of a simple auction mechanism (BIN-TAC) which extends the second-price auction with reserve. This mechanism is particularly effective in private-value environments where the distribution of valuations is irregular, and it is truthful in expectation. We use the AdECN data in counterfactual experiments that suggest our BIN-TAC mechanism would improve revenue by 24% relative to the mechanism currently in place.

Social Learning and the Dynamic Cavity Method
Yashodhan Kanoria
Social learning is the learning of behavior from interaction with friends and neighbors on a network. This talk will focus on models of repeated interaction, with agents 'voting' in a series of rounds on some issue of interest. Votes in the initial round are based on 'private signals', whereas votes in future rounds incorporate knowledge of previous votes cast by friends. We look at two different models of iterative learning.
A very simple model is 'majority dynamics', where agents choose their vote based on the majority of their neighbors' votes in the previous round. We analyze this model on regular trees. At the other extreme is iterative Bayesian learning: a fully rational model introduced by Gale and Kariv (2003). We introduce new algorithms for this model, challenging the widespread belief that learning is computationally intractable in this setting. We develop a novel technique -- the 'dynamic cavity method' -- that serves as a key tool for both models. Based on joint work with Andrea Montanari (to appear in Ann. App. Prob.) and Omer Tamuz (submitted).

The Complexity Ecology of Consensus Ranking
Matthias Mnich
Preference lists are extensively used in social science surveys and voting systems to capture information about choice. In many such scenarios there arises the need to combine the data represented by many lists into a single list which reflects the opinion of the surveyed group as much as possible. This yields a class of ranking aggregation problems, where we are given a set of permutations (votes) over a set of alternatives (candidates), and are asked for a permutation of the set of candidates such that some objective is minimized. Examples include Kemeny ranking and other consensus methods, which find important applications in web search engines, e-mail spam detection, and user recommendation systems. We present results on this topic from a multivariate complexity and algorithms analysis, based on parameterized complexity. (Joint work with Henning Fernau, Fedor Fomin, Daniel Lokshtanov, Geevarghese Philip and Saket Saurabh)

Contribution Games in Social Networks
Elliot Anshelevich
We consider network contribution games, where each agent in a social network has a budget of effort that it can contribute to different collaborative projects or relationships.
Depending on the contributions of the involved agents, a relationship will flourish or drown, and to measure the success we use a reward function for each relationship. Every agent tries to maximize the reward from all relationships that it is involved in. We consider pairwise equilibria of this game, and characterize the existence, computational complexity, and quality of equilibrium based on the types of reward functions involved. For example, when all reward functions are concave, we prove that the price of anarchy is at most 2. For convex functions the same holds only under some special but very natural conditions. A special focus of the talk is minimum-effort games, where the reward of a relationship depends only on the minimum effort of any of the participants. Finally, we show tight bounds for approximate equilibria and convergence of dynamics in these games.

Liquidity in Credit Networks: A Little Trust Goes a Long Way
Pranav Dandekar

Credit networks represent a way of modeling trust between entities in a network. Nodes in the network print their own currency and trust each other for a certain amount of each other's currency. This allows the network to serve as a decentralized payment infrastructure---arbitrary payments can be routed through the network by passing IOUs between trusting nodes in their respective currencies---and obviates the need for a common currency. Nodes can repeatedly transact with each other and pay for transactions using trusted currency. A natural question to ask in this setting is: how long can the network sustain liquidity, i.e., how long can the network support the routing of payments before credit dries up? We answer this question in terms of the long-term success probability of transactions for various network topologies and credit values. We show that the success probability of transactions is independent of the path used to route flow between nodes.
For symmetric transaction rates, we show analytically and via simulations that the success probability for complete graphs, Erdos-Renyi graphs and preferential attachment graphs goes to one as the size, the density or the credit capacity of the network increases. Finally, we compare liquidity in credit networks to that in a centralized payment infrastructure and show that credit networks are at most a constant factor worse; thus we do not lose much liquidity in return for their robustness and decentralized properties. Joint work with Ashish Goel, Ramesh Govindan and Ian Post.

User Browsing Models: Relevance versus Examination
Sugato Basu

There has been considerable work on user browsing models for search engine results, both organic and sponsored. The click-through rate (CTR) of a result is the product of the probability of examination (will the user look at the result) and the perceived relevance of the result (the probability of a click given examination). Past papers have assumed that when the CTR of a result varies based on the pattern of clicks in prior positions, this variation is solely due to changes in the probability of examination. We show that, for sponsored search results, a substantial portion of the change in CTR when conditioned on prior clicks is in fact due to a change in the relevance of results for that query instance, not just due to a change in the probability of examination. We then propose three new user browsing models, which attribute CTR changes solely to changes in relevance, solely to changes in examination (with an enhanced model of user behavior), or to both changes in relevance and examination. The model that attributes all the CTR change to relevance yields substantially better predictors of CTR than models that attribute all the change to examination, and does only slightly worse than the model that attributes CTR change to both relevance and examination.
For predicting relevance, the model that attributes all the CTR change to relevance again does better than the model that attributes the change to examination. Surprisingly, we also find that one model might do better than another in predicting CTR, but worse in predicting relevance. Thus it is essential to evaluate user browsing models with respect to accuracy in predicting relevance, not just CTR.

Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
Daria Sorokina

Ensembles of regression trees have long been known as powerful models suitable for a large variety of tasks. The best-known ensembles are Random Forests (averaging independent, large, highly non-linear trees) and Gradient Boosting (an additive model built from small trees). At the beginning of the talk I will describe an ensemble that combines the benefits of both: Additive Groves is based on regression trees, additive models and bagging, and is capable of both fitting the additive structure of the problem and modelling its highly nonlinear components with very large trees at the same time. The combination of these properties makes Additive Groves superior in performance to other existing tree ensemble methods. Models with the best performance often act as black boxes: they make good predictions, but do not provide much insight into the decision-making process. This is not satisfactory for many data mining problems, so separate post-processing techniques are needed to answer questions about important features, their effects and interactions. The term statistical interaction is used to describe the presence of non-additive effects among two or more variables in a function. When variables interact, their effects must be modeled and interpreted simultaneously. Thus, detecting statistical interactions can be critical for an understanding of processes by domain researchers.
In this talk I will describe an approach to interaction detection based on comparing the performance of unrestricted and restricted prediction models, where restricted models are prevented from modeling the interaction in question. I will show that Additive Groves has the required properties and can be used within this framework. I will also talk about several applications of these techniques to real-world data: tracking environmental change based on bird abundance, and Salmonella risk prediction at meat-processing establishments. Apart from that, Additive Groves has been successful in several recent competitions: KDD Cup'09, ICDM'09 and the Yahoo Learning to Rank Challenge. The latter result is especially interesting, because Additive Groves was the only winning solution that did not optimize for any specific ranking metrics. Algorithms presented in the talk are implemented as part of an open source C++ package available at http://additivegroves.net. At the end of the talk I will make a short presentation about Yandex and its California office (http://yandexlabs.com). We are hiring!

Reflections on Link-based Ranking
Marc Najork

This talk focuses on using hyperlinks in the ranking of web search results. We give a brief overview of the vast body of work in the area; we provide a quantitative comparison of the different features; we sketch how link-based ranking features can be implemented in large-scale search engines; and we identify promising avenues for future research.

Output URL Bidding
Panagiotis Papadimitriou

Output URL bidding is a new bidding mechanism for sponsored search, where advertisers bid on search result URLs, as opposed to keywords in the input query. For example, an advertiser may want his ad to appear whenever the search results include the sites www.imdb.com and en.wikipedia.org, instead of bidding on keywords that lead to these sites, e.g., movie titles, actor names, and so on.
In this paper we study the tradeoff between the simplicity and the specification power of output bids, and we explore their utility for advertisers. We first present a model to derive output bids from existing keyword bids. Then, we use the derived bids to experimentally study output bids and contrast them with input query bids. Our main results are the following: (1) compact output bids that mix both URLs and hosts have the same specification power as lengthier input bids; (2) output bidding can increase the recall of relevant queries; and (3) output and input bidding can be combined into a hybrid mechanism that offers the benefits of both.

A Technical Overview of Search Ads Quality at Google
Roberto Bayardo

Abstract: Ads quality involves optimizing the oftentimes competing interests of Google, the users of its search engine, and the advertisers who pay to have their ads displayed on search results pages. In this talk I describe the primary problems faced by the Google ads quality team, and some of the general approaches we take to address them. My goal in this talk is to introduce others to problems of interest beyond the better-studied problems of click-through estimation and mechanism design. The topics covered will include keyword targeting & bid specification, budget allocation, user-based ads quality, ad formats, when to show an ad and where, and assessing the quality of ad creatives, their landing pages, and the entire "post click" experience.

Bio: Roberto Bayardo is a principal research scientist at Google, where he has been working within the ads quality and ads serving teams for the past 5 years. Roberto currently leads a team working on quality modeling, with the broad goal of ensuring that ads displayed alongside Google search provide a good user experience.

On the Competitive Ratio of Online Sampling Auctions
Georgios Pierrakos

We study online profit-maximizing auctions for digital goods with adversarial bid selection and uniformly random arrivals.
Our goal is to design auctions that are constant-competitive with F^(2); in this sense our model lies at the intersection of prior-free mechanism design and secretary problems. We first give a generic reduction that transforms any offline auction into an online one, with a loss of only a factor of 2 in the competitive ratio. We then present some natural auctions, both randomized and deterministic, and study their competitive ratios; our analysis reveals some interesting connections between one of these auctions and RSOP, which we investigate further in our final section.

The importance of ties
Paolo Parigi

For sociologists, the ties that join individuals into relationships are the result of social processes rather than being entities in themselves. A social network is therefore a representation of something that happens between individuals and that brings people together. Sometimes, however, social processes sever ties and produce an increase in the distance that separates individuals, ultimately generating isolation. In my talk I will first provide a few examples of social processes that generate ties, and then I will advance a theoretical scenario for studying processes that instead decrease the number of ties. The positive examples of tie generation come from my work in the fields of political sociology and organization theory. The negative scenario of increasing isolation comes from my work on social network analysis. In presenting these examples, my goal is to stimulate the conversation between social scientists and computer scientists in the field of network analysis.

How opinions are received by online communities: a case study on amazon.com helpfulness votes
Gueorgi Kossinets

There are many on-line settings in which users publicly express opinions. A number of these offer mechanisms for other users to evaluate these opinions; a canonical example is Amazon.com, where reviews come with annotations like "26 of 32 people found the following review helpful."
Opinion evaluation appears in many off-line settings as well, including market research and political campaigns. Reasoning about the evaluation of an opinion is fundamentally different from reasoning about the opinion itself: rather than asking, "What did Y think of X?", we are asking, "What did Z think of Y's opinion of X?" Here we develop a framework for analyzing and modeling opinion evaluation, using a large-scale collection of Amazon book reviews as a dataset. We find that the perceived helpfulness of a review depends not just on its content but also, in subtle ways, on how the expressed evaluation relates to other evaluations of the same product. As part of our approach, we develop novel methods that take advantage of the phenomenon of review "plagiarism" to control for the effects of text in opinion evaluation, and we provide a simple and natural mathematical model consistent with our findings. Our analysis also allows us to distinguish among the predictions of competing theories from sociology and social psychology, and to discover unexpected differences in the collective opinion-evaluation behavior of user populations from different countries.

Reaching Consensus on Social Networks
Grant Schoenebeck

Research in sociology studies the effectiveness of social networks in achieving computational tasks. Typically the agents who are supposed to achieve a task are unaware of the underlying social network beyond their immediate friends. They have limited memory, communication, and coordination. These limitations result in computational obstacles to achieving otherwise trivial computational problems. One of the simplest problems studied in the social sciences involves reaching a consensus among players between two alternatives which are otherwise indistinguishable. In this paper we formalize the computational model of social networks.
We then analyze the consensus problem as well as the problem of reaching a consensus that is identical to the majority of the original signals. In both models we seek to minimize the time it takes the players to reach a consensus.

Web-scale Information Extraction Using Wrappers
Nilesh Dalvi

A significant portion of web pages embed interesting and valuable semantic content suitable for structured representation. Traditional Information Extraction techniques, however, fall short of achieving high-quality extraction at Web scale. In this talk, I will outline some of the work going on at Yahoo! Research on addressing the challenges of Information Extraction at Web scale. I will focus on wrapper-based techniques, which exploit the HTML structure of websites to extract the information of interest. I will address two problems: (i) making wrappers more robust to changes in websites, and (ii) enabling the learning of wrappers from automatically obtained noisy training data.

Characterization and Modeling of Online Browsing Activity
Andrew Tomkins

I will describe a set of studies of online user behavior based on search and toolbar logs. In these studies, we propose a new "CCS" taxonomy of pageviews consisting of Content (news, portals, games, verticals, multimedia), Communication (email, social networking, forums, blogs, chat), and Search (web search, item search, multimedia search). We show that roughly half of all pageviews online are content, 1/3 are communications, and the remaining 1/6 are search. We study the interarrival distribution of visits to a page by a single user and in aggregate, and show some evidence that "bursty" behavior, in which a URL attains significant transient popularity for a few days, is not a significant contributor to interarrival times.
We also study the nature and frequency of search queries in more detail, differentiating between web search, item search, and multimedia search, and characterizing the prevalence of references to structured objects in web search queries. Navigation patterns in this data are not well characterized as trails, but should instead be viewed as trees due to the presence of multi-tab or multi-window browsing. We present a model of tabbed browsing that is a hybrid between a Markov process capturing the graph of web pages, and a branching process capturing the creation, splitting, and dying of tabs. We present a mathematical criterion to characterize whether the process has a steady state independent of initial conditions, and we show how to characterize the limiting behavior in both cases. We perform a series of experiments to compare our tabbed browsing model with the simpler random surfer model. Joint work with Ravi Kumar and Flavio Chierichetti.

User grouping behavior in online forums
Xiaolin Shi

Online forums are one type of social media that is particularly rich for studying human behavior in information seeking and diffusion. The way users join communities reflects the changing and expanding of their interests in information. In this talk, I present work studying the patterns of user participation behavior, and the feature factors that influence such behavior, on different forum datasets. In this work, we find that, despite the relative randomness and lesser commitment of structural relationships in online forums, users' community-joining behaviors display some strong regularities. One particularly interesting observation is that the very weak relationships between users defined by online replies have diffusion curves similar to those of real friendships or co-authorships.
We build social selection models, Bipartite Markov Random Fields (BiMRF), to quantitatively evaluate the prediction performance of these feature factors and their relationships. Using these models, we show that some features carry supplementary information, and that the effectiveness of different features varies across different types of forums. Moreover, the results of BiMRF with two-star configurations suggest that user similarity defined by frequency of communication or number of common friends is inadequate to predict grouping behavior, but that adding node-level features can improve the fit of the model.

Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing
Matthew Harding

In this paper we introduce a new Poisson mixture model for count panel data where the underlying Poisson process intensity is determined endogenously by consumer latent-utility maximization over a set of choice alternatives. This formulation accommodates the choice and the count in a single random-utility framework with desirable theoretical properties. Individual heterogeneity is introduced through a random coefficient scheme with a flexible semiparametric distribution. We deal with the analytical intractability of the resulting mixture by recasting the model as an embedding of infinite sequences of scaled moments of the mixing distribution, and newly derive their cumulant representations along with bounds on their rate of numerical convergence. We further develop an efficient recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling scheme, and show posterior consistency. We apply our model to a recent household panel of supermarket visit counts. We estimate the nonparametric density of three key variables of interest - price, driving distance, and their interaction - while controlling for a range of consumer demographic characteristics.
We use this econometric framework to assess the opportunity cost of time and analyze the interaction between store choice, trip frequency, search intensity, and household and store characteristics. We also conduct a counterfactual welfare experiment and compute the compensating variation for a 10% to 30% increase in Walmart prices.

Deconstructing A Long Tail Commerce Network
Neel Sundaresan

In this talk we discuss modeling a complex long-tail commerce network in the form of the eBay Marketplace graph. The asymmetric and preferential nature of buying and selling, in terms of structure, interactions, trust and reputation measures, and the evolution of these, has tremendous value in identifying business opportunities and building effective user applications. The discussion focuses on the analysis of the macroscopic shape of the networks across the platform and also in specific marketplace categories. The results suggest significant diversity across these individual marketplaces, from collectors' networks to retail networks. We also discuss motif profiling and preferential connections. Means for defining trust measures based on network properties will also be addressed.

Nod vs. High-Five: LinkedIn Connection Strength Models and Applications
Monica Rogati

The low cost of link formation on social and professional networking sites leads to connections with heterogeneous relationship strengths: a LinkedIn user might be connected to both professional acquaintances and close collaborators. Modeling this relationship as non-binary is more realistic (and more useful in personalizing a user's experience) than the binary links that have been the focus of previous work in the social networking space. We present an unsupervised, link-based latent variable model to estimate relationship strength from interaction activity (e.g., communication, browsing, comments) and profile content similarity.
We also discuss practical applications of modeling connection strength when building data-driven products that enhance LinkedIn user experience and engagement. (This is joint work with Rongjing Xiang during her internship at LinkedIn.)

Network of positive and negative edges
Jure Leskovec

Relations between users on social media sites often reflect a mixture of positive (friendly) and negative (antagonistic) interactions. We study how the interplay between positive and negative relationships affects the structure of on-line social networks; in contrast, the bulk of research on social networks to date has focused almost exclusively on positive interpretations of links between people. We connect our analyses to theories of signed networks from social psychology. In doing so we find that the classical theory of structural balance tends to capture certain common patterns of interaction, but that it is also at odds with some of the fundamental phenomena we observe --- particularly those related to the evolving, directed nature of these on-line networks. We then develop an alternate theory of status that better explains the observed edge signs and provides insights into the underlying social mechanisms. Our work provides one of the first large-scale evaluations of theories of signed networks using on-line datasets, as well as a perspective for reasoning about social media sites. Joint work with Daniel Huttenlocher and Jon Kleinberg.

Social Networks and Stable Matchings in the Job Market
Esteban Arcaute

In this talk we introduce and study a model that treats the job market as a two-sided matching market and accounts for the importance of social contacts in finding a new job. We assume that workers learn about positions in firms only through social contacts.
Given that information structure, we study both static properties of what we call locally stable matchings, a solution concept derived from stable matchings, and dynamic properties through a reinterpretation of Gale-Shapley's algorithm as myopic best-response dynamics. We prove that, in general, the set of locally stable matchings strictly contains that of stable matchings, and that it is in fact NP-complete to determine whether they are identical. We also show that the lattice structure of stable matchings is in general absent. Finally, we focus on myopic best-response dynamics inspired by the Gale-Shapley algorithm. We study the efficiency loss due to the informational constraints, providing both lower and upper bounds. This is joint work with Sergei Vassilvitskii, Ramesh Johari and David Liben-Nowell. Some of the results presented appeared in WINE 2009.

Temporal User Behavior in Yahoo Answers
Christina Aperjis

When searching for an answer to a question, people generally prefer to get both high-quality and speedy answers. The fact that it usually takes longer to find a better answer creates a speed-accuracy tradeoff which is inherent in information seeking. A natural setting to study this tradeoff is Yahoo Answers, a community-driven question-and-answer site. We identify and quantify how question authors trade off the number of answers they receive against the cost of waiting. We find that users are willing to wait longer to obtain an additional answer when they have received only a small number of answers; this implies decreasing marginal returns in the amount of collected information. We also estimate the user's utility function from the data. This is joint work with Bernardo A. Huberman and Fang Wu.

The Netflix Prize: Quest for $1,000,000
Yehuda Koren

The collaborative filtering approach to recommender systems predicts user preferences for products or services by learning past user-item relationships. Its significant economic implications have made collaborative filtering techniques play an important role at well-known e-tailers such as Amazon and Netflix. The field has enjoyed a surge of interest since October 2006, when the Netflix Prize competition commenced. Netflix released a dataset containing 100 million anonymous movie ratings and challenged the research community to develop algorithms that could beat the accuracy of its recommendation system, Cinematch. In this talk I will survey the competition together with some of the principles and algorithms that led us to winning the Grand Prize in the competition.
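
The latent-factor models at the heart of many Prize entries can be sketched in a few lines of stochastic gradient descent. The toy implementation below is illustrative only: plain SGD on a small ratings dictionary, without the bias terms, temporal effects, and ensembling that the winning solutions relied on.

```python
import random

def factorize(ratings, n_users, n_items, k=2, steps=500, lr=0.01, reg=0.02):
    """Approximate rating r_ui by the dot product p_u . q_i, learned by
    SGD on squared error with L2 regularization."""
    random.seed(0)
    p = [[random.uniform(0.01, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[random.uniform(0.01, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for (u, i), r in ratings.items():
            pred = sum(p[u][f] * q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return p, q

def rmse(ratings, p, q):
    """Root mean squared error of the factor model on the given ratings."""
    se = sum((r - sum(pu * qi for pu, qi in zip(p[u], q[i]))) ** 2
             for (u, i), r in ratings.items())
    return (se / len(ratings)) ** 0.5
```

Calling `factorize` on a dictionary like `{(user, item): rating}` returns the two factor matrices; unknown ratings are then predicted by the same dot product.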

Data Jujitsu - the art of treating data as a product
DJ Patil

LinkedIn is the premier professional social network, with over 60 million users and a new user joining every second. One of LinkedIn's strategic advantages is its unique data. While most organizations consider data a service function, LinkedIn considers data a cornerstone of its product portfolio. This emphasis on data as a product led LinkedIn to be the first social network site to deploy a People You May Know service. Other data products that LinkedIn has built include Who Viewed My Profile, People Who Viewed This Profile Also Viewed..., the backend standardization algorithms that power search, and matching algorithms. To rapidly develop these products LinkedIn leverages a number of technologies, including open source, 3rd-party solutions, and some we've had to invent along the way. In this presentation, we will talk about our methodology for turning data into user-facing products and some of the major challenges we see in the data space.

Query-log analysis for improving web search
Panayiotis Tsaparas

Search engines log the activity of users when posing queries and interacting with the results. Query click logs maintain information about what queries were asked, what results were shown, and which ones were clicked as a result of the query. This information offers valuable insight into the intent of the user and the need behind the query, and it has been the focus of intense research activity. In this talk, we will describe algorithms for query-log analysis. First we will talk about an algorithm that uses query logs for generating advertising keywords and understanding the category of a query. Time permitting, we will then describe an algorithm for generating training data for learning a ranking function. In both cases we model the query log as a graph, and we formulate our problem as an optimization problem over these graphs.
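
One way to make the "query log as a graph" idea concrete: build the bipartite query-URL click graph and propagate category labels from a few seed queries across shared clicked URLs. This is a generic label-propagation sketch, not the algorithm from the talk; the queries, URLs, and the "travel" category are made-up examples.

```python
from collections import defaultdict

def propagate_labels(clicks, seeds, iters=10):
    """clicks: list of (query, url) pairs; seeds: {query: category}.
    Spread category scores back and forth across the bipartite click
    graph, normalizing by degree; seed labels stay clamped."""
    q2u, u2q = defaultdict(set), defaultdict(set)
    for q, u in clicks:
        q2u[q].add(u)
        u2q[u].add(q)
    scores = {q: {c: 1.0} for q, c in seeds.items()}
    for _ in range(iters):
        url_scores = {}
        for u, qs in u2q.items():          # queries -> URLs
            agg = defaultdict(float)
            for q in qs:
                for c, s in scores.get(q, {}).items():
                    agg[c] += s / len(q2u[q])
            url_scores[u] = agg
        new = {}
        for q in q2u:                      # URLs -> queries
            if q in seeds:
                new[q] = {seeds[q]: 1.0}   # clamp seed labels
                continue
            agg = defaultdict(float)
            for u in q2u[q]:
                for c, s in url_scores[u].items():
                    agg[c] += s / len(u2q[u])
            if agg:
                new[q] = dict(agg)
        scores = new
    return scores
```

An unlabeled query two hops away from a seed (via a shared clicked URL) picks up that seed's category after a couple of iterations.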

Stochastic Submodular Optimization
Arash Asadpour

We study the stochastic submodular maximization problem subject to cardinality and matroid constraints. Our model can capture the effect of uncertainty in different problems, such as cascade effects in social networks, capital budgeting, sensor placement, etc. We study non-adaptive and adaptive policies and give "optimal" constant-factor approximation algorithms for both cases. We also show that the adaptivity gap in this setting is 1.59.
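
For intuition, here is the classical greedy rule on the deterministic special case of maximum coverage, a canonical submodular objective; under a cardinality constraint it achieves the familiar (1 - 1/e) guarantee, and the stochastic and adaptive policies studied in the talk build on this baseline.

```python
def greedy_max_coverage(sets, k):
    """Pick up to k sets, each time taking the one with the largest
    marginal coverage gain (the greedy rule for submodular maximization)."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max(sets, key=lambda s: len(sets[s] - covered))
        if not sets[best] - covered:
            break                      # no set adds new elements
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

For example, with `{"a": {1,2,3}, "b": {3,4}, "c": {4,5,6,7}}` and k = 2, greedy first takes "c" (four new elements), then "a" (three more), covering all seven elements.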

Compressibility of Behavioral Graphs
Ravi Kumar

Graphs resulting from human behavior (the web graph, social networks, etc.) have hitherto been viewed as a monolithic class with similar characteristics. Our focus here is on the compressibility of such graphs. It has been empirically shown that Web graphs can be compressed down to three bits of storage per edge. In light of this, we address two basic questions: are social networks as compressible as Web graphs and are there tractable and realistic models that yield highly compressible graphs?
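
A taste of why web graphs compress so well: in a sorted adjacency list the gaps between consecutive neighbor ids are small (thanks to link locality), and small gaps fit in few bits. The sketch below uses gap encoding plus a byte-aligned varint; production schemes such as the Boldi-Vigna codes are far more refined, so treat this as illustration only.

```python
def encode_adjacency(neighbors):
    """Delta-encode a sorted adjacency list, then varint-encode each gap:
    7 payload bits per byte, high bit set on continuation bytes."""
    out = bytearray()
    prev = 0
    for v in sorted(neighbors):
        gap = v - prev
        prev = v
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def decode_adjacency(data):
    """Invert encode_adjacency: accumulate varints, then prefix-sum gaps."""
    vals, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            vals.append(prev)
            cur, shift = 0, 0
    return vals
```

The list `[3, 10, 1000, 1003]` becomes the gaps `3, 7, 990, 3` and occupies five bytes instead of four multi-byte integers.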

De-anonymizing Social Networks
Arvind Narayanan

Operators of online social networks are increasingly sharing potentially sensitive information about users and their relationships with advertisers, application developers, and data-mining researchers. Privacy is typically protected by anonymization, i.e., removing names, addresses, etc.

We present a framework for analyzing privacy and anonymity in social networks and develop a new re-identification algorithm targeting anonymized social-network graphs. To demonstrate its effectiveness on real-world networks, we show that a third of the users who can be verified to have accounts on both Twitter, a popular microblogging service, and Flickr, an online photo-sharing site, can be re-identified in the anonymous Twitter graph with only a 12% error rate.

Our de-anonymization algorithm is based purely on the network topology, does not require creation of a large number of dummy "sybil" nodes, is robust to noise and all existing defenses, and works even when the overlap between the target network and the adversary's auxiliary information is small.
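
The flavor of topology-only re-identification can be conveyed by a drastically simplified seed-and-extend matcher: given a few known correspondences, repeatedly map any node whose already-matched neighbors vote unambiguously for a single candidate. This toy version is my own illustration and omits the scoring, eccentricity checks, and reverse-match verification of the actual algorithm.

```python
from collections import defaultdict

def extend_mapping(g1, g2, seeds):
    """g1, g2: adjacency dicts {node: set_of_neighbors}; seeds: known
    g1 -> g2 correspondences. Greedily extend the mapping by neighbor votes."""
    mapping = dict(seeds)
    used = set(mapping.values())
    changed = True
    while changed:
        changed = False
        for u in g1:
            if u in mapping:
                continue
            votes = defaultdict(int)
            for n in g1[u]:
                if n in mapping:
                    for cand in g2[mapping[n]]:
                        if cand not in used:
                            votes[cand] += 1
            if votes:
                best = max(votes, key=votes.get)
                # commit only when there is an unambiguous winner
                if list(votes.values()).count(votes[best]) == 1:
                    mapping[u] = best
                    used.add(best)
                    changed = True
    return mapping
```

On two copies of the same small graph, two seed nodes are enough for the matcher to recover the rest of the correspondence.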

Bio:
--------------
Arvind Narayanan is a post-doctoral researcher working with Dan Boneh. He recently finished his Ph.D. at the University of Texas at Austin, advised by Vitaly Shmatikov. His research is on the privacy and anonymity issues involved in publishing large-scale datasets about people. His thesis, in a sentence, is that the level of anonymity that society expects -- and companies claim to provide -- in published databases is fundamentally unrealizable. He blogs about his anonymity-breaking efforts and other research at http://33bits.org/.

Eliciting Information on the Distribution of Future Outcomes
Nicolas Lambert

This paper studies the problem of inducing a presumably knowledgeable expert to reveal true information regarding the probability distribution of a future random outcome. The information considered is quite general, and includes in particular statistics of the probability distribution (such as the mean, median, or variance) and categorical information (such as the most correlated pair of variables). I examine two types of incentive schemes: those that reward the expert for being truthful, and, for the case of numerical information, those that reward the expert increasingly with the accuracy of the prediction. For both cases, I establish simple criteria to determine the information that can be elicited with such schemes, and offer a complete characterization of the associated fee schedules.
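
As a standard textbook example of the first kind of scheme (not specific to this paper), the quadratic scoring rule elicits the mean truthfully:

```latex
% Pay the expert S(r, x) after she reports r and outcome x is observed:
S(r, x) = -(r - x)^2 .
% Under the true distribution F with mean \mu, her expected payoff is
\mathbb{E}_{X \sim F}\left[ S(r, X) \right] = -(r - \mu)^2 - \operatorname{Var}(X),
% which is uniquely maximized at r = \mu, so a risk-neutral expert
% reports the true mean.
```

Characterizing which statistics admit such a payment scheme, and what all such schemes look like, is precisely the question the talk addresses.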

Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom

Website traffic varies through time in consistent and predictable ways, with the highest traffic in the middle of the day. When providing media content to visitors, it is important to present repeat visitors with new content so that they keep coming back. In this paper we present an algorithm to balance the need to keep a website fresh with new content against the desire to present the best content to the most visitors at times of peak traffic. We formulate this as the media scheduling problem, where we attempt to maximize total clicks, given the overall traffic pattern and the time-varying clickthrough rates of available media content. We present an efficient algorithm to perform this scheduling under certain conditions and apply this algorithm to real data obtained from server logs, showing evidence of significant improvements in traffic from our algorithmic schedules. Finally, we analyze the click data, presenting models for why and how the clickthrough rate for new content declines as it ages.
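
Ignoring the aging dynamics, the static core of the problem is a matching between traffic and clickthrough rates: expected clicks are the sum of traffic[slot] x ctr[item] over the assignment, and the rearrangement inequality says this is maximized by pairing the highest-traffic slots with the highest-CTR items. A minimal sketch of that static simplification (the slot and item numbers are made up):

```python
def static_schedule(traffic, ctrs):
    """Static simplification of media scheduling: one item per slot, no
    aging. Sort slots by traffic and items by CTR, then pair them off;
    by the rearrangement inequality this maximizes expected clicks."""
    slots = sorted(range(len(traffic)), key=lambda j: -traffic[j])
    items = sorted(range(len(ctrs)), key=lambda i: -ctrs[i])
    assign = dict(zip(slots, items))
    clicks = sum(traffic[s] * ctrs[i] for s, i in assign.items())
    return assign, clicks
```

With traffic `[100, 300, 200]` and CTRs `[0.02, 0.05, 0.01]`, the peak slot gets the best item and expected clicks are 300(0.05) + 200(0.02) + 100(0.01) = 20. The paper's harder problem adds the decay of each item's CTR with age, which this sketch deliberately leaves out.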

Behavioral Experiments in Strategic Networks
Michael Kearns

For four years now, we have been conducting "medium-scale" experiments in how human subjects behave in strategic and economic settings mediated by an underlying network structure. We have explored a wide range of networks inspired by generative models from the literature, and a diverse set of collective strategic problems, including biased voting, graph coloring, consensus, and networked trading. These experiments have yielded a wealth of both specific findings and emerging general themes about how populations of human subjects interact in strategic networks. I will review these findings and themes, with an emphasis on the many more questions they raise than answer.

How Social Network Structure affects Diffusion and Learning
Matthew Jackson

We examine how diffusion and learning processes operating through social networks are affected by homophily (the tendency of individuals to associate with others similar to themselves) and other network characteristics. Homophily has no effect in pure diffusion processes where only shortest paths matter; only connection density matters. In contrast, homophily substantially slows learning based on repeated averaging of neighbors' information, while connection density has no effect.
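
The averaging dynamic can be simulated directly (a DeGroot-style sketch; the two-group graph, weights, and signals are illustrative assumptions):

```python
def steps_to_consensus(weights, signals, tol=1e-3, max_steps=10_000):
    """weights[i][j] is the weight node i places on node j (zero on itself).
    Each step, every node's belief becomes the weighted average of its
    neighbors' beliefs. Returns steps until beliefs are within tol."""
    x = list(map(float, signals))
    for step in range(max_steps):
        if max(x) - min(x) < tol:
            return step
        x = [sum(w * xj for w, xj in zip(row, x)) / sum(row)
             for row in weights]
    return max_steps

def two_group_graph(n=4, cross_weight=1.0):
    """Two groups of n nodes, all pairs connected; cross-group ties get
    weight cross_weight < 1 to model homophily."""
    size = 2 * n
    return [[0.0 if i == j
             else (1.0 if (i < n) == (j < n) else cross_weight)
             for j in range(size)]
            for i in range(size)]

signals = [1.0, 0.8, 1.2, 0.9, -1.0, -0.7, -1.1, -1.2]  # group-correlated
fast = steps_to_consensus(two_group_graph(cross_weight=1.0), signals)
slow = steps_to_consensus(two_group_graph(cross_weight=0.1), signals)
print(fast, slow)   # weakening cross-group ties slows convergence
```

Note that connection density alone does not change the picture here: what matters is how much weight flows between the groups.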

How to Design Profitable Auctions
Paul Valiant

This talk is a high-level overview of results from two years of work with Silvio Micali and Jing Chen on a question that may best be phrased as: what is the best way to sell many items to many players? The standard formalizations of this question were fleshed out in the 1960s and led to answers that are unsatisfactory in practice; today's eBay still sells items by an auction method known to the ancient Greeks. In this talk we present new formulations of this question, informed by fundamental notions from computer science: competitive analysis, collusion, and "knowledge". We present several concrete (and near-optimal) auctions in these models, along with many open questions.

Sort me if you can (or, Algorithms on Dynamic Data)
Mohammad Mahdian

We formulate and study a new computational model for dynamic data. In this model the data changes gradually and the goal of an algorithm is to compute the solution to some problem on the data at each time step, under the constraint that it only has a limited access to the data each time. As the data is constantly changing and the algorithm might be unaware of these changes, it cannot be expected to always output the exact right solution; we are interested in algorithms that guarantee to output an approximate solution. In particular, we focus on the fundamental problems of sorting and selection, where the true ordering of the elements changes slowly. We provide algorithms with performance close to the optimal in expectation and with high probability. If time permits, we will discuss new results on graph algorithms in our framework.

This is based on joint work with Aris Anagnostopoulos, Ravi Kumar, and Eli Upfal.
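
A minimal simulation of the model (the probe budget, drift rate, and random adjacent-swap strategy are illustrative assumptions, not the paper's algorithms):

```python
import random

def simulate_dynamic_sorting(n=50, probes_per_step=5, steps=2000, seed=0):
    """At each time step the true order drifts by one random adjacent swap,
    while the algorithm may only probe (compare) a few pairs per step and
    maintains its own estimated ordering. Returns the number of inversions
    between the estimate and the true order at the end."""
    rng = random.Random(seed)
    true_order = list(range(n))                  # item ids, truly sorted
    rank = {x: p for p, x in enumerate(true_order)}
    estimate = list(range(n))                    # algorithm's ordering
    for _ in range(steps):
        # The data changes: one adjacent pair swaps in the true order.
        i = rng.randrange(n - 1)
        true_order[i], true_order[i + 1] = true_order[i + 1], true_order[i]
        rank[true_order[i]], rank[true_order[i + 1]] = i, i + 1
        # The algorithm spends its probe budget on random adjacent
        # compare-and-swap operations (a randomized bubble-sort pass).
        for _ in range(probes_per_step):
            j = rng.randrange(n - 1)
            if rank[estimate[j]] > rank[estimate[j + 1]]:
                estimate[j], estimate[j + 1] = estimate[j + 1], estimate[j]
    return sum(1 for p in range(n) for q in range(p + 1, n)
               if rank[estimate[p]] > rank[estimate[q]])
```

With a handful of probes per step against one change per step, the estimate stays far closer to sorted than a random order would be (about n(n-1)/4 inversions for n = 50).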

Discovery & Emergence
Abdur Chowdhury

Often, as computer scientists, we focus on faster algorithms, such as approximations of solutions in linear time over large data sets or similar problems. Rather than focus on algorithms in this talk, we ask the question: "What possibilities emerge from surfacing the world's conversations to others?" Specifically, we explore Twitter Trends as a discovery tool and show how awareness of the thoughts of others can cause the emergence of new behaviors.

Bio:
--------------
Dr. Abdur Chowdhury serves as Twitter's Chief Scientist. Prior to that, Dr. Chowdhury co-founded Summize, a real-time search engine sold to Twitter in 2008. Dr. Chowdhury has held positions at AOL as their Chief Architect for Search, in Georgetown's Computer Science Department, and at the University of Maryland's Institute for Systems Research. His research interests lie in Information Retrieval, focusing on making information accessible.

Internet search advertising: Challenges in targeting and quality considerations
Mayur Datar

In this informal talk, I will touch upon two topics related to search advertising:

Targeting: How do you match advertiser keywords to user queries? What are the underlying challenges in bridging this semantic gap?

Quality: How do you measure the quality of search ads? Is it in the best interest of all parties involved (users, advertisers, and the search engine) to always show ads for queries? What are the costs and benefits of showing ads, and how do you decide when it is appropriate to show them?

Look forward to more questions than answers! Hopefully these will lead to good research topics.

Bio:
--------------
Mayur Datar works as a Research Scientist with Google Inc. His research interests are in data mining, algorithms, databases, and computer science theory. Prior to joining Google, Mayur obtained his doctorate degree in computer science from Stanford University and a Bachelor of Technology degree from I.I.T. Bombay. He was awarded the President of India Gold Medal for being the most outstanding student of his graduating batch at I.I.T. Bombay. He has published several papers in renowned conferences such as SIGMOD, VLDB, KDD, FOCS, SODA, and WWW.

Affiliation Networks
D. Sivakumar

In the last decade, structural properties of several naturally arising networks (the Internet, social networks, the web graph, etc.) have been studied intensively with a view to understanding their evolution. In recent empirical work, Leskovec, Kleinberg, and Faloutsos identify two new and surprising properties of the evolution of many real-world networks: densification (the ratio of edges to vertices grows over time), and shrinking diameter (the diameter reduces over time to a constant). These properties run counter to conventional wisdom, and are certainly inconsistent with graph models prior to their work.

In this work, we present the first simple, realistic, and mathematically tractable generative model that intrinsically explains all the well-known properties of social networks, as well as densification and shrinking diameter. Our model is based on ideas studied empirically in the social sciences, primarily the groundbreaking work of Breiger (1973) on bipartite models of social networks that capture the affiliation of agents to societies.

We also present algorithms that harness the structural consequences of our model. Specifically, we show how to overcome the bottleneck of densification in computing shortest paths between vertices by producing sparse subgraphs that preserve or approximate shortest distances to all or a distinguished subset of vertices. This is a rare example of an algorithmic benefit derived from a realistic graph model.

The talk will review the history of relevant random graph models, and attempt to illustrate the co-evolution of theoretical modeling and empirical insights in this fascinating slice of graph theory.

The talk reports work done jointly with Silvio Lattanzi of Sapienza University, Roma, Italy, and is to appear in STOC 2009.

Using Bandit Models in Search Engines
Sandeep Pandey

Web search engines perform two basic tasks: they acquire Web pages via crawling, and present lists of pages and advertisements in response to user search queries. Given the gigantic size of the Web, these tasks can be viewed as constrained optimization problems. On the acquisition side the constraint is on the crawling resources, while on the presentation side the constraint is on user attention: users cannot be expected to sift through large amounts of content. Furthermore, some parameters needed to solve these optimization problems may or may not be known, leading to offline and online versions of the problems. In this talk we will look at these optimization problems.

In particular, we will focus on the online presentation of advertisements where the unknown parameters are the ad quality values. I will present a method of simultaneously estimating unknown parameters and exploiting current estimates. Our method extends and adapts well-known "Bandit" models from Machine Learning.
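
For illustration, here is a textbook UCB1 bandit applied to choosing among ads with unknown clickthrough rates (a standard algorithm with made-up CTRs, not necessarily the method of the talk):

```python
import math
import random

def ucb1_ad_selection(true_ctrs, rounds=20_000, seed=0):
    """Show one ad per round: pick the ad with the highest upper confidence
    bound on its estimated CTR, observe a (simulated) click, and update.
    Returns how many times each ad was shown."""
    rng = random.Random(seed)
    n = len(true_ctrs)
    shows = [0] * n
    clicks = [0] * n
    for t in range(1, rounds + 1):
        if t <= n:
            ad = t - 1                            # show every ad once first
        else:
            ad = max(range(n),
                     key=lambda i: clicks[i] / shows[i]
                                   + math.sqrt(2 * math.log(t) / shows[i]))
        shows[ad] += 1
        clicks[ad] += rng.random() < true_ctrs[ad]
    return shows

shows = ucb1_ad_selection([0.01, 0.05, 0.20])
print(shows)   # the highest-CTR ad receives most of the traffic
```

The confidence term shrinks as an ad accumulates impressions, so exploration of low-performing ads tapers off automatically.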

Bio:
--------------
Sandeep Pandey is a Research Scientist in the Search and Advertising group at Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon and his B.Tech. from IIT Bombay. His research interests include Web crawling and advertising. He also likes to work on online learning problems such as the multi-armed bandit problem. In his spare time Sandeep enjoys watching cricket and playing tennis.

Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy

Adchemy is leading a revolution in online performance marketing, both as a top-25 U.S. online advertiser and through the development of the Adchemy Digital Marketing Platform, an integrated suite of products that allows online marketers to maximize their customers' end-to-end experience. I will discuss some of the research challenges that we are tackling in developing our technology solutions, including problems related to online matching, multi-armed bandits, and the optimizer's curse.

Bio:
--------------
Richard has been VP Research at Adchemy since its inception in late 2004. He has over nine years of experience in the online advertising industry. He received his PhD from the Operations Research department at Stanford, where he focused on airline revenue management. He spent several years in the consulting world pioneering the development and implementation of real option valuation methodologies for asset valuation.

Scaling Social Data
Robert Johnson
Director of Engineering, Facebook

Facebook serves billions of pages a day based on hundreds of terabytes of dynamic data, which presents many scaling challenges for our storage and caching systems. In particular the social nature of the data gives it some interesting properties - it is not easily clustered or partitioned by users or applications, it updates frequently, and many queries require massive network fan-in. I'll be discussing the problems that arise in scaling such a system and some of the solutions we've used to get to our current scale, as well as our plans for infrastructure to solve these problems at even larger scale and in a more general way for future web 2.0 applications with similar properties.

Bio:
--------------
Robert Johnson is Director of Engineering at Facebook, where he leads the software development efforts to cost-effectively scale Facebook's infrastructure and optimize performance for its many millions of users. During his time with the company, the number of users has expanded by more than twenty-fold and Facebook now handles billions of page views a day. Robert was previously at ActiveVideo Networks, where he led the distributed systems and set-top software development teams. He has worked in a wide variety of engineering roles, from robotics to embedded systems to web software. He received a B.S. in Engineering and Applied Science from Caltech.

Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits

In this talk we address two questions: (1) How can structured data be used in web search? (2) How can structured data be gathered? For the first question, we identify valuable classes of data, present query classes that can benefit from structured data, and describe an architecture that combines keyword search with structured search. For the second question, we present Data Cloud: an ecosystem of data publishers, a search engine (the data cloud), and data consumers.
We relate the Data Cloud strategy to a classic notion in economics: network effects in two-sided markets. At the end of the talk, an early demo implementation will be presented.

Bio:
--------------
Yury Lifshits obtained his PhD degree from the Steklov Institute of Mathematics at St. Petersburg (2007). He spent a year as a postdoc at Caltech before joining Yahoo! Research in 2008. He won two gold medals at International Mathematical Olympiads, received the Best Paper Award in the Application Track of CSR'07, and received the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury is the maintainer of "The Homepage of Nearest Neighbors": http://simsearch.yury.name

Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan

While the expanding universe of products available via Internet commerce provides consumers with valuable options, sifting through the numerous alternatives to identify desirable choices can be challenging. Collaborative filtering systems aid this process by recommending to users products desired by similar individuals. Because purchase decisions are influenced by these systems, however, they have become targets of manipulation by unscrupulous vendors. In this talk, we will discuss our findings on the robustness of collaborative filtering systems to manipulation. We provide theoretical and empirical results demonstrating that while nearest neighbor algorithms, which are widely used in commercial systems, can be highly susceptible to manipulation, two classes of collaborative filtering algorithms, which we refer to as linear and asymptotically linear, are relatively robust. Specifically, we find that as a user rates an increasing number of products, the average accuracy of predictions made by these algorithms becomes insensitive to manipulated data. Our results provide guidance for the design of future collaborative filtering systems.

Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.

SnapTell has created technology that allows a mobile phone user to snap a picture of a product and instantly receive reviews, prices, and information. The service has been scaled to support more than ten million products including all books, DVDs, CDs, video games, and other product categories. We present an outline of the algorithms used to power the service emphasizing scaling and locality sensitive indexing. SnapTell's popular iPhone application has been downloaded by over half a million users over the last few weeks.

The technology can also be used to make print advertisements in a magazine, newspaper, poster, or billboard interactive without adding barcodes or otherwise altering the content. Consumers snap a picture of the object and send it to a shortcode to enter into a contest, access a mobile website, or download ringtones, wallpaper, video clips, or other information. SnapTell is the leading vendor in this space and offers a solution with unprecedented scale and robustness.

We present a synopsis of the image matching algorithms used to power the service and discuss potential research topics in the space.

Content-Driven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)

Open collaboration is one of the most effective ways to create content and aggregate information, as witnessed by Wikipedia. In this talk, we propose a reputation system for the authors of collaborative information, and a trust system for the aggregated information.

The goal of the reputation system is to provide an incentive for collaboration, and to provide information about the quality of authors. We propose a content-driven notion of reputation, where authors of long-lasting contributions gain reputation, while authors of contributions that are not preserved lose reputation. We show that, on Wikipedia, the content-driven reputation of authors is a good predictor of the quality of their future contributions. We then describe a trust system for content, where the trust in a portion of content is computed according to the reputations of the authors who created and revised it. Again, we show that on Wikipedia content trust is a good predictor of future content longevity. Finally, we describe how the reputation and trust systems can be made robust to Sybil attacks, in which people use multiple identities to try to boost their reputation and raise the trust of content of their choice.

The reputation and trust systems have been implemented in the WikiTrust system.

Bio:
--------------
Luca de Alfaro received his Ph.D. degree from Stanford University, where he worked on system specification and verification under the supervision of Zohar Manna. After graduation, he was for three years at UC Berkeley as a postdoctoral researcher, working with Thomas A. Henzinger on verification and on game theory. In 2001, Luca de Alfaro joined the faculty of UC Santa Cruz. His research interests involve system specification and verification, embedded software, game theory, incentive systems for on-line collaboration, and trust and reputation on the web. Luca de Alfaro is currently at Google as a visiting scientist, on leave from UC Santa Cruz.

Luca de Alfaro has long been an enthusiastic user of on-line collaborative tools, including wikis. One evening in 2005, after working on the wiki he uses to facilitate information sharing among members of his research group, he decided to start a wiki on cooking, one of his passions. The wiki was soon peppered with spam and less-than-useful contributions. Rather than cleaning up, a most tedious job, he shut off the wiki temporarily and started pondering how to provide incentives for collaboration and compute quality metrics for content. As he found the job of rating authors or contributions by hand most tedious, he was interested in methods for computing reputation and quality metrics purely from content analysis. The research led to the WikiTrust system. The wiki on cooking has been brought back to life (http://www.cookiwiki.org), even though it receives little traffic. You are welcome to register, add recipes, and see WikiTrust in action.

Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis

Web search queries capture the information need of search engine users. Search engines store these queries in their logs and analyze them to guide their search results.

In this work, we argue that not only the search engine but also web users can benefit from the data stored in these logs. We first show how clickthrough logs can be collected in a distributed fashion using the http referer field in web server access logs. We then perform a set of experiments to study the information value of search engine queries when treated as "tags" or "labels" for the web pages that both appear as a result and that the user actually clicks on. We ask how much extra information these query tags provide for web pages by comparing them to tags from the del.icio.us bookmarking site and to the pagetext. We find that query tags provide a substantial number of tags (on average 250 per URL), many of them new (on average 125 tags per URL are not present in the pagetext), for a large fraction of the Web.

Joint work with Hector Garcia-Molina and Jawed Karim

Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)

Baynote delivers on-demand recommendation technology and social search for over 200 websites, including Motorola, NASA, Intuit, and Expedia. By tapping into the wisdom of the online community and automatically displaying the best content and products, Baynote's Collective Intelligence Platform (CIP) improves the user experience while increasing online revenue, leads and impressions. In this talk, I will present Baynote's unified approach to content targeting and discuss some of the opportunities and challenges in building next-generation recommendation systems.

Bio:
--------------
Scott Brave is a co-founder and CTO of Baynote, Inc. Prior to Baynote, he was a postdoctoral scholar in the Department of Communication at Stanford University, where he served as lab manager for the CHIMe (Communication between Humans and Interactive Media) Lab. Scott received his Ph.D. in Human-Computer Interaction (Communication Dept.) and B.S. in Computer Systems Engineering (AI focus) from Stanford University, and his M.S. from the MIT Media Lab. Scott is also a former Associate Editor of the "International Journal of Human-Computer Studies" and co-author of "Wired for Speech: How Voice Activates and Advances the Human-Computer Relationship".

Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)

Online social networks have become major and driving phenomena on the web. In this talk we will address key modeling and algorithmic questions related to large online social networks. From the modeling perspective, we raise the question of whether there is a generative model for network evolution. The availability of time-stamped data makes it possible to study this question at an extremely fine granularity. We exhibit a simple, natural model that leads to synthetic networks with properties similar to the online ones. From an algorithmic viewpoint, we focus on data mining challenges posed by the magnitude of data in these networks. In particular, we examine topics related to influence and correlation in user activities in such networks.

Bio:
--------------
Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a research staff member at the IBM Almaden Research Center in the Computer Science Principles and Methodologies group. He obtained his PhD in Computer Science from Cornell University in December 1997. His primary interests are web algorithms, algorithms for large data sets, and theory of computation.

Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)

The concept of a community is central to social network analysis, and thus a large body of work has been devoted to identifying community structure. For example, a community may be thought of as a set of web pages on related topics, a set of people who share common interests, or more generally as a set of nodes in a network more similar amongst themselves than with the remainder of the network. Motivated by difficulties we experienced at actually finding meaningful communities in large real-world networks, we have performed a large scale analysis of a wide range of social and information networks. Our main methodology uses local spectral methods and involves computing isoperimetric properties of the networks at various size scales -- a novel application of ideas from statistics and scientific computation to internet data analysis. Our empirical results suggest a significantly more refined picture of community structure than has been appreciated previously. Our most striking finding is that in nearly every network dataset we examined, we observe tight but almost trivial communities at very small size scales, and at larger size scales, the best possible communities gradually "blend in" with the rest of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly-used network generation models. Moreover, this behavior is exactly the opposite of what one would expect based on experience with and intuition from expander graphs, from graphs that are well-embeddable in a low-dimensional structure, and from small social networks that have served as testbeds of community detection algorithms. Possible mechanisms for reproducing our empirical observations will be discussed, as will implications of these findings for clustering, classification, and more general data analysis in modern large social and information networks.
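
The isoperimetric quantity underlying this methodology is conductance; a minimal sketch (the dictionary-of-sets graph representation is just for illustration):

```python
def conductance(adj, S):
    """Conductance of node set S in an undirected graph, where adj maps each
    node to its set of neighbors: the number of edges crossing the cut,
    divided by the smaller of the two sides' edge volumes.
    Lower conductance means more community-like."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    return cut / min(vol_S, vol_rest)

# Two triangles joined by a single edge: each triangle is a good community.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(conductance(adj, {0, 1, 2}))   # 1/7, much lower than other cuts
```

Sweeping this quantity over candidate sets of increasing size is, in essence, how the "best community at each size scale" profile in the talk is computed.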

Bio:
--------------
Michael Mahoney is a research scientist in the mathematics department at Stanford University. He is currently working on geometric network analysis; developing approximate computation and regularization methods for large informatics graphs; and applications to community detection, clustering, and information dynamics in large social and information networks. His research interests also include randomized algorithms for matrices and regression problems, applications to DNA single nucleotide polymorphism data, and large-scale statistical data analysis more generally. He has been a faculty member at Yale University and a researcher at Yahoo.

An Incentive-Based Architecture for Social Recommendations
Kostas Kollias

Recommender systems and social networking services are important Internet monetization tools and their popularity greatly facilitates web-based commercial activity. In this talk we will focus on how these tools can be coupled in order to form an architecture that incentivizes users to provide honest recommendations in a social network. The architecture consists of one distinct reputation system for each user, a recommendation language which allows users to place their reviews, a product ranking algorithm, and a revenue sharing mechanism. We will discuss the details of the architecture, the properties that allow users to profit by placing honest recommendations and how fast these recommendations lead to an equilibrium state.

Joint work with Rajat Bhattacharjee and Ashish Goel

Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan

Web search is an indispensable resource for users to search for information on the Web. Web search is also an important service for publishers and advertisers to present their content to users. Thus, user satisfaction is the key and must be quantified. In this talk, our goal is to give a practical review of web search metrics from a user satisfaction point of view. This viewpoint is intuitive enough for a typical user to express his or her web search experience using it. The metrics that we are planning to cover fall into the following areas: relevance, coverage, comprehensiveness, diversity, discovery freshness, content freshness, and presentation. We will also describe how these metrics can be mapped to proxy metrics for the stages of a generic search engine pipeline. Our hope is that practitioners can apply these metrics readily and that researchers can get problems to work on, especially in formalizing and refining metrics.

This short talk is based on a half-day tutorial that the presenter with Kostas Tsioutsiouliklis and Emre Velipasaoglu will present at the WWW'09 conference.

Bio:
--------------
Ali Dasdan is a director managing the metrics, analysis, and algorithms group of Yahoo! Web Search. His group is responsible for defining and implementing white-box metrics for production web search systems, production web search data, and the Web. His group is also responsible for monitoring these metrics and diagnosing the causes of detected anomalies. He has worked on web search and advertising since 2006. He obtained his PhD in Computer Science from the University of Illinois at Urbana-Champaign in 1999.

Trusts among Users of Online Communities -- an Epinions Case Study
Hady Lauw

Trust between a pair of users is an important piece of information in an online community (such as an electronic commerce or product review website), where users may rely on trust information to make decisions. In this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very sparse, which is often the case in an online community. We therefore propose a classification approach to the trust prediction problem. We develop a taxonomy to obtain an extensive set of relevant features derived from user attributes and user interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical results show that trust among users can be effectively predicted using pre-trained classifiers.

Bio

Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSF-equivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.

Top-k Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii

There has been considerable past work on efficiently computing top k objects by aggregating information from multiple ranked lists of these objects. An important instance of this problem is query processing in search engines: one has to combine information from several different posting lists (rankings) of web pages (objects) to obtain the top k web pages to answer user queries. Two particularly well-studied approaches to achieve efficiency in top-k aggregation include early-termination algorithms (e.g., TA and NRA) and pre-aggregation of some of the input lists. However, there has been little work on a rigorous treatment of combining these approaches.

We generalize the TA and NRA algorithms to the case when pre-aggregated intersection lists are available in addition to the original lists. We show that our versions of TA and NRA remain "instance optimal," a very strong optimality notion that is a highlight of the original TA and NRA algorithms. Using an index of millions of web pages and real-world search engine queries, we empirically characterize the performance gains offered by our new algorithms. We show that the practical benefits of intersection lists can be fully realized only with an early-termination algorithm.

Joint work with Ravi Kumar, Kunal Punera and Torsten Suel
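
For reference, a minimal sketch of the basic TA, without the intersection lists that are the subject of this work (the data layout is an illustrative assumption):

```python
def threshold_algorithm(lists, k):
    """Fagin's Threshold Algorithm. Each input is a list of (score, obj)
    pairs sorted by descending score; an object's aggregate score is the
    sum of its per-list scores. Random access is simulated with dicts."""
    score = [{obj: s for s, obj in lst} for lst in lists]
    seen = {}                                    # obj -> aggregate score
    for depth in range(max(len(lst) for lst in lists)):
        threshold = 0.0
        for lst in lists:
            if depth >= len(lst):
                continue
            s, obj = lst[depth]
            threshold += s                       # sum of scores at this depth
            if obj not in seen:                  # random-access its full score
                seen[obj] = sum(d.get(obj, 0.0) for d in score)
        top = sorted(seen.items(), key=lambda kv: -kv[1])[:k]
        # Early termination: no unseen object can beat the current top k.
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return sorted(seen.items(), key=lambda kv: -kv[1])[:k]

lists = [[(0.9, "a"), (0.8, "b"), (0.1, "c")],
         [(0.9, "b"), (0.5, "a"), (0.2, "c")]]
print(threshold_algorithm(lists, k=1))
```

In the example, TA stops after the second sorted access to each list: object "c" is never random-accessed because the threshold already rules it out.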

Utility-Maximizing Privacy Mechanisms
Mukund Sundararajan

A mechanism for releasing information about a statistical database with sensitive data must resolve a trade-off between utility and privacy. Informally, publishing fully accurate information maximizes utility while minimizing privacy, while publishing random noise accomplishes the opposite. Formally, privacy can be rigorously quantified using the framework of differential privacy, which requires that a mechanism's output distribution is nearly the same (in a strong sense) whether a given database row is included or excluded. Thus far, (dis)utility has typically been quantified via the expected error of a (randomly perturbed) response to a query.

We pursue much stronger and more general utility guarantees. Just as differential privacy guarantees protection against every potential attacker, independent of its side information, we seek a mechanism that guarantees near-optimal utility to every potential user, independent of its side information. Formally, we model the side information of a potential user as a prior distribution over query results. An interaction between such a user and a mechanism induces a posterior distribution -- the utility of the mechanism for this user is defined as the accuracy of this posterior as quantified via a user-specific loss function. A differentially private mechanism M is (near-)optimal for a given user u if u derives (almost) as much utility from M as from any other differentially private mechanism.

We prove that for each fixed counting query and differential privacy level, there is a geometric mechanism M* (a discrete variant of the well-studied Laplace mechanism) that is simultaneously utility-maximizing for every possible user, subject to the differential privacy constraint. This is an extremely strong utility guarantee: every potential user u, no matter what its side information and loss function, derives as much utility from M* as from interacting with a differentially private mechanism M_u that is optimally tailored to u. The first part of our proof characterizes the utility-maximizing differentially private mechanism for a fixed but arbitrary user in terms of a basic feasible solution to a linear program with constraints that encode differential privacy. The second part shows that all of the relevant vertices of this polytope (ranging over all possible users) are derivable from the geometric mechanism via suitable remappings of its range.

Joint work with Arpita Ghosh and Tim Roughgarden.
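A minimal sketch of the geometric mechanism itself (the function name and sampling routine below are my own illustration, not code from the paper): with alpha = exp(-epsilon), adding noise k with probability proportional to alpha^|k| — the discrete analogue of Laplace noise — gives epsilon-differential privacy for a sensitivity-1 counting query.

```python
import math
import random

def geometric_mechanism(true_count, epsilon):
    """Release a counting-query answer with two-sided geometric noise.

    P(noise = k) = ((1 - alpha) / (1 + alpha)) * alpha^|k|, alpha = exp(-epsilon),
    the discrete analogue of the Laplace distribution.
    """
    alpha = math.exp(-epsilon)
    if random.random() < (1 - alpha) / (1 + alpha):
        noise = 0
    else:
        # magnitude >= 1: geometric tail, then a uniform random sign
        magnitude = 1
        while random.random() < alpha:
            magnitude += 1
        noise = magnitude if random.random() < 0.5 else -magnitude
    return true_count + noise
```

Note the output is always an integer, which is one reason the discrete variant is preferred over Laplace noise for counting queries.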

Online Advertising
Hamid Nazerzadeh

Online advertising is a growing, multi-billion-dollar market with very interesting properties. I will briefly explain some of these properties, and talk about ad allocation problems in the context of sponsored search and display advertising.

References:

• Dynamic Cost-Per-Action Mechanisms and Applications to Online Advertising, with Amin Saberi, and Rakesh Vohra.
Proceedings of the 17th International World Wide Web Conference (WWW), 179-188, 2008.
• A Combinatorial Allocation Mechanism with Penalties For Banner Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
Proceedings of the 17th International World Wide Web Conference (WWW), 169-178, 2008.
• Advertisement Allocation for Generalized Second Pricing Schemes, with Ashish Goel, Mohammad Mahdian, and Amin Saberi.
Fourth Workshop on Ad Auctions, 2008.
• Allocating Online Advertisement Space with Unreliable Estimates, with Mohammad Mahdian and Amin Saberi.
Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288-294, 2007.

From Bayesian to Worst-Case Optimal Auction Design
Tim Roughgarden

We define a general template for auction design that explicitly connects Bayesian optimal mechanism design, the dominant paradigm in economics, with worst-case analysis. In particular, we establish a general and principled way to identify appropriate performance benchmarks for prior-free optimal auction design. This framework enables, for the first time, meaningful competitive analysis for auction problems with allocation asymmetries, bidder asymmetries, and costly payments.

The above is joint work with Jason Hartline (from STOC 2008 and ongoing). Time permitting, we will also briefly discuss another recent application of Bayesian optimal auction design: rigorous efficiency-revenue approximation trade-offs (joint work with Shaddin Dughmi and Mukund Sundararajan).

Social Design
Yury Lifshits
, Caltech, http://yury.name

Social Design is the art and science of setting and managing incentives in web applications. So far, machine learning has been the dominant approach to all central algorithmic problems on the Web: search, advertising, recommendation systems. Social design can become an important counterpart to ML if we find ways to incentivize users to give us more and better input data, and even the answers themselves.
This talk outlines the new research field of Social Design. We start with several stories, including the Internet press conference of President Putin, a reputation crash in the largest Russian social network, and the Yahoo! SearchMonkey project. Then we attempt to extract, characterize, and classify commonly used patterns in social design. Finally, we set out a research agenda for the area, with a focus on social aspects of the semantic web and advertising technologies.
Slides available at http://yury.name/talks.html
Bio: Yury Lifshits obtained his PhD from the Steklov Institute of Mathematics in St. Petersburg (2007). He is currently a postdoc at Caltech. He won two gold medals at the International Mathematical Olympiad, received the Best Paper Award in the Application Track of CSR'07, and received the Yandex Teaching Award (2006) for his course "Algorithms for Internet". Yury maintains "The Homepage of Nearest Neighbors": http://simsearch.yury.name

Inferring Links and Initiators
Evimaria Terzi
, IBM

Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value of 1 (or 0) in cell (i, j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix, we study the problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators. I will formally define this problem and propose an MCMC framework for estimating the links and the initiators given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect: instead of considering a single observation matrix M, we consider a sequence of observation matrices M1, ..., Mt over time. Finally, I will show that this problem is a generalization of several problems studied in the field of social-network analysis.
Bio: Evimaria Terzi is a Research Staff Member at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue University (USA) in 2002 and BSc from the University of Thessaloniki (Greece) in 2000.

Reputation Mechanisms for Online Markets
Christina Aperjis

In online marketplaces, potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller's reputation score.

Two-Stage Myopic Dynamics in Network Formation Games
Esteban Arcaute

We consider two-stage myopic dynamics for network formation games, where utility to a node is based on a function of the distance to all other nodes. Since our network formation game is a generalization of Myerson's announcement game, we use pairwise Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is unbounded (in the number of nodes in the network), our dynamics converge to a subset of pairwise Nash stable networks with a constant efficiency ratio. For some specific utility functions, we prove that our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.

Measuring and Modeling the Web
Ying Xu

The last couple of decades have witnessed phenomenal growth in the World Wide Web. The Web has become a ubiquitous channel for information sharing and dissemination. This talk describes several research contributions in an endeavor towards a better understanding of the Web.
One of the basic questions about the Web is its size, i.e., how many webpages there are on the Web. A closely related problem is the size of search engine indexes, since a search engine's index defines an important subset of the Web that is most easily accessible to users. In the last ten years, both the scientific literature and the popular press have dealt at length with methodologies and estimates for the sizes of the Web and public search engines. In the first part of the talk, we propose algorithms for measuring the size of the Web, including both empirical and theoretical results.
The Web is so large that even estimating its size is a hard problem. To better understand the Web, people study random web graph models -- a stochastic process that generates random graphs that with high probability resemble the Web graph. Random web graph models are useful theoretical tools to provide insights to the evolution of the Web graph, to predict future behaviors, and to generate synthetic data sets for testing and simulation. In the second half of the talk, we briefly present several results in analyzing web graph models and applying the models to real world applications.

Should ad networks bother fighting click fraud?
Bobji Mungamuru

Suppose an ad network decides that a given click-through is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that click-through. Therefore, arguably, the ad network is "taking money out of its own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".

Reputation systems
Robbie Yan

Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim of providing such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.

Optimal Marketing Strategies over Social Networks
Mukund Sundararajan

Abstract: We study revenue maximization in social networks. We identify a family of strategies called 'influence and exploit' strategies that are easy to implement, easy to optimize over and approximately optimal.
Based on a paper that will appear in WWW 08. Joint with Jason Hartline (Northwestern University) and Vahab Mirrokni (MSR Redmond).

Ratio Index for Budgeted Learning, with Applications
Brad Null

In the budgeted learning problem, we are allowed to experiment on a set of alternatives (given a fixed experimentation budget) with the goal of picking a single alternative with the largest possible expected payoff. Approximation algorithms for this problem were developed by Guha and Munagala by rounding a linear program that couples the various alternatives together. We present an index for this problem, called the ratio index, which also guarantees a constant factor approximation. Index based policies have the advantage that a single number (i.e. the index) can be computed for each alternative irrespective of all other alternatives, and the alternative with the highest index is experimented upon. This is analogous to the famous Gittins index for the discounted multi-armed bandit problem.
The ratio index has several interesting structural properties. First, it can be computed in strongly polynomial time; the techniques we develop may be useful for deriving strongly polynomial algorithms for other Markov Decision Problems. Second, with the appropriate discount factor, the Gittins index and the ratio index are constant factor approximations of each other, and hence the Gittins index also gives a constant factor approximation to the budgeted learning problem. Finally, the ratio index can be used to obtain a "discount oblivious" solution to the multi-armed bandit problem, which in turn gives constant factor approximations to a wide class of problems that we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna

Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou

Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.

Adchemy
Brian Babcock
, Adchemy

I'll talk about a number of computational problems that arise in online advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one manage a portfolio of millions of keywords with minimal human involvement? Challenges include determining how much to bid for each keyword and what ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will lead to the greatest number of purchases or clicks? The problem is made more difficult by the fact that different classes of users respond well to different marketing messages, so it's best to use different content for different user segments.
* Consider an auction where the items to be sold are not known in advance, but rather arrive online, and buyers are willing to pay a different amount for each item (revealed as the items arrive). Each buyer has a maximum limit on the total number of items that they will buy. How should the seller assign items to buyers so as to maximize the seller's revenue?

An Axiomatic Approach to Personalized Ranking Systems
Alon Altman

Personalized ranking systems and trust systems are an essential tool for collaboration in a multi-agent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of well-known and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.

Ad Auctions with Reserved Price
Arash Asadpour

I will talk about ad auctions in which the auctioneer has a relevant ad of her own that she can show in one of the slots if she wants to. The setting is to some extent similar to ad auctions with a reserve price. I will explain a randomized auction that is truthful in expectation for both the bidders and the auctioneer. Our mechanism is also individually rational and budget balanced. I will discuss the revenue of this auction, and will also talk about some other settings in which the technique used to design the auction can be helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham (CMU).

Boosting the area under ROC curve
Phil Long
, Google

We show that any weak ranker that can achieve an area under the ROC curve slightly better than 1/2 (which can be achieved by random guessing) can be efficiently boosted to achieve an area under the ROC curve arbitrarily close to 1. This boosting can be performed even in the presence of independent misclassification noise, given access to a noise-tolerant weak ranker.
(This is joint work with Rocco Servedio.)
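For reference, the area under the ROC curve that the weak ranker must beat is equivalent to the Wilcoxon-Mann-Whitney statistic: the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties counting 1/2). A minimal sketch, with function and variable names of my own choosing:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Wilcoxon-Mann-Whitney statistic:
    the fraction of (positive, negative) pairs where the positive is
    scored higher, counting ties as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A ranker that scores all examples identically achieves exactly 1/2, matching the random-guessing baseline in the abstract.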

Random input models for Adwords
Aranyak Mehta
, Google

I will talk about a new result on random input models for the Adwords problem: what are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst-case input? How does the natural "highest-bidder-wins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for ad auctions.
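The "highest-bidder-wins" greedy rule can be sketched in a few lines. This is an illustrative simplification of my own (data layout and names are assumptions, and for simplicity an advertiser only wins a query if their remaining budget covers the full bid):

```python
def greedy_adwords(queries, bids, budgets):
    """Allocate each arriving query to the highest-bidding advertiser
    whose remaining budget covers the bid ("highest-bidder-wins" greedy).

    queries: list of query types arriving online
    bids:    dict advertiser -> dict query type -> bid
    budgets: dict advertiser -> total budget
    Returns (total revenue, list of winners per query).
    """
    remaining = dict(budgets)
    revenue = 0.0
    allocation = []
    for q in queries:
        best, best_bid = None, 0.0
        for adv, table in bids.items():
            b = table.get(q, 0.0)
            if b > best_bid and remaining[adv] >= b:
                best, best_bid = adv, b
        allocation.append(best)
        if best is not None:
            remaining[best] -= best_bid
            revenue += best_bid
    return revenue, allocation
```

On worst-case inputs, greedy rules of this kind are known to lose a constant factor against the offline optimum; the random-input question in the abstract asks how much better they do when the query stream is stochastic.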

Who do you know: The Social Network Revolution
Orkut Buyukkokten
, Google

Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence on our lives, and now it's easier to cross paths than ever, as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of orkut.com, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users, and reflect on the lessons we learned.

Deterministic Decentralized Search in Random Graphs
Esteban Arcaute

We study a general framework for decentralized search in random graphs. Our main focus is on deterministic memoryless search algorithms that use only local information to reach their destination in a bounded number of steps in expectation. This class includes (with small modifications) the search algorithms used in Kleinberg's pioneering work on long-range percolation graphs and hierarchical network models. We give a characterization of searchable graphs in this model, and use this characterization to prove a monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David Liben-Nowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying Xu.

Interdomain Routing and Games
Michael Schapira
, Hebrew University of Jerusalem ( http://www.cs.huji.ac.il/~mikesch/)

We present a game-theoretic model that captures many of the intricacies of interdomain routing in today's Internet. In this model, the strategic agents are source nodes located on a network, who aim to send traffic to a unique destination node. The model enables complex dynamic interactions between the agents (asynchronous, sequential, and based on partial information).
We provide realistic conditions that guarantee that executing the Border Gateway Protocol (BGP), the only protocol currently used for interdomain routing, is in the best interest of each of the players. Moreover, we show that even coalitions of players of any size cannot improve their routing outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design, these results do not require any monetary transfers (to or by the agents).
Based on joint work with Hagay Levin and Aviv Zohar.

The Anatomy of Clickbot.A
Neil Daswani
, Google

In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet, which attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using an HTTP-based botmaster. Google identified all clicks on its ads exhibiting Clickbot.A-like patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.

Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas

We will first go over a few network worms and contemporary algorithms for detecting them, including some recent techniques based on "vulnerability signatures". We will then use this setting to review two important problems in data stream computation: distinct value estimation and frequent element computation. We show how these streaming algorithms can be used for automatic worm fingerprinting.
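As one concrete instance of the frequent-element problem mentioned above, the classic Misra-Gries summary (a standard streaming algorithm; the talk may well use a different one) finds every element occurring more than n/k times in a single pass using at most k-1 counters:

```python
def misra_gries(stream, k):
    """Misra-Gries summary: one pass, at most k-1 counters. Any element
    occurring more than n/k times in a stream of length n is guaranteed
    to survive as a key; the stored counts are underestimates."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # stream element has no counter and all slots are full:
            # decrement every counter and drop those that reach zero
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The guarantee matters for fingerprinting: a byte substring that recurs in a large fraction of suspicious flows cannot be evicted from the summary.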

Click Fraud Resistant Methods for Learning Click-Through Rates
Sergei Vassilvitskii

Presenting the paper "Click Fraud Resistant Methods for Learning Click-Through Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.

Pricing Partially Compatible Products
Adam Meyerson
, UCLA

Suppose two competing companies provide roughly equivalent suites of software products. Each version of a product has a possibly distinct price and quality. Customers would like to purchase a low-price, high-quality set of products. The issue is that products made by different companies need not be fully compatible. We reduce the customer's decision process to a minimum cut, and provide approximation and hardness results for the problem of computing the company's best-response strategy in several different models. We also introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic Commerce (EC) 2007, and are joint work with David Kempe, Nainesh Solanki, and Ramnath Chellappa.

Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar
, Google

Several approaches to collaborative filtering have been studied, but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptable for other applications and languages with minimal effort. The talk will describe our algorithms and system setup in detail, and report results of running the recommendation engine on Google News.
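The MinHash clustering step relies on a standard property: for a fixed random hash function, the probability that two sets have the same minimum hash equals their Jaccard similarity. A small illustrative sketch of my own (not the production system; the function names and use of md5 as the hash family are assumptions), where a user's "set" could be the news stories they clicked:

```python
import hashlib

def minhash_signature(items, num_hashes=64):
    """MinHash signature of a set: for each of num_hashes seeded hash
    functions, keep the minimum hash value over the set. The fraction
    of agreeing coordinates between two signatures is an unbiased
    estimate of the Jaccard similarity of the underlying sets."""
    sig = []
    for seed in range(num_hashes):
        sig.append(min(
            int(hashlib.md5(f"{seed}:{x}".encode()).hexdigest(), 16)
            for x in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    agree = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return agree / len(sig_a)
```

Clustering then groups users whose signatures agree on enough coordinates, which is far cheaper than comparing raw click histories of millions of users pairwise.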

Detecting Near-Duplicates for Web Crawling
Anish Das Sarma

This is joint work with Gurmeet Singh Manku and Arvind Jain at Google.
Near-duplicate web documents are abundant: two such documents may differ from each other only in a small portion, for example the part that displays advertisements. Such differences are irrelevant for web search, so the quality of a web crawler increases if it can assess whether a newly crawled web page is a near-duplicate of a previously crawled web page.
In my talk I will address the problem of efficiently detecting near-duplicates in the web crawl setting. First I shall review Charikar's fingerprinting technique used in our approach. Then, I shall formally define the problem and present algorithmic solutions for two versions of the problem: online queries (single new document) and batch queries (multiple new documents). Finally, I shall briefly talk about our experimental analysis of the techniques.
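Charikar's fingerprinting technique can be sketched as follows. In this illustrative simplification of mine (with md5 standing in for the hash family), each feature of a document votes +1 or -1 on every bit position according to its hash, and the sign of each column sum becomes one fingerprint bit; near-duplicate pages then correspond to fingerprints at small Hamming distance:

```python
import hashlib

def simhash(features, bits=64):
    """Charikar-style fingerprint of a document's feature set.
    Similar feature sets yield fingerprints that differ in few bits."""
    v = [0] * bits
    for f in features:
        h = int(hashlib.md5(f.encode()).hexdigest(), 16)
        for i in range(bits):
            # feature votes +1 or -1 on bit i, according to its hash
            v[i] += 1 if (h >> i) & 1 else -1
    fp = 0
    for i in range(bits):
        if v[i] > 0:
            fp |= 1 << i
    return fp

def hamming(a, b):
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")
```

The online and batch query problems in the talk are then about finding, among billions of stored fingerprints, those within a small Hamming distance of a new one.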

Clickthrough of and Bidding on Keywords
Aleksandra Korolova / Dilys Thomas

Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".

Multi-path Routing
Mihaela Enachescu

I will present an approach for enabling multi-path routing in large networks. This is based on a project at NEC Labs, and is joint work with Ravi Kokku. Our approach is to create multiple paths between various source-destination pairs by selecting a special subset of relay nodes. We give hardness results and provide some approximation algorithms.
This work is motivated by the observation that, given two end-hosts in the Internet, there exist potentially many independent or partially overlapping paths between them. However, only one of these is usually used by the underlay routing, leading to many inefficiencies (single-path routing is usually slow to adapt in the presence of fluctuating network conditions). Multi-path routing provides a more adaptive framework, and it also raises a variety of (still open) problems for our enjoyment.

Collaborative tagging systems
Paul Heymann

Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keyword-based systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.

Is Efficiency Expensive?
Mukund Sundararajan

We study the simultaneous optimization of efficiency and revenue in pay-per-click keyword auctions in a Bayesian setting. Our main result is that the efficient keyword auction yields near-optimal revenue even under modest competition. In the process, we build on classical results in auction theory to prove that increasing the number of bidders by the number of slots outweighs the benefit of employing the optimal reserve price.
This is joint work with Tim Roughgarden.
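As a toy illustration of the reserve-price benefit that the talk weighs against adding bidders (my own worked example, not from the paper): for a single bidder with value drawn uniformly from [0,1], posting a reserve r earns expected revenue r(1-r), maximized at the Myerson reserve r* = 1/2, which a simple grid search recovers.

```python
def optimal_reserve_uniform(steps=10000):
    """Grid search for the revenue-maximizing posted reserve against a
    single bidder with value ~ Uniform[0,1]: expected revenue is
    r * P(value >= r) = r * (1 - r), peaking at r = 1/2."""
    best_r, best_rev = 0.0, 0.0
    for i in range(steps + 1):
        r = i / steps
        rev = r * (1 - r)
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r
```

The talk's main result says that, under modest competition, forgoing this optimization and instead attracting a few more bidders costs little revenue.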

Estimating Search Engine Index and Web Sizes
Ying Xu

I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".

Reusable and multiple goods
Ashish Goel

The talk is based on:
• Online auctions with re-usable goods, by Mohammad Hajiaghayi, Robert Kleinberg, Mohammad Mahdian, and David Parkes, EC'05.
• The Sequential Auction Problem on eBay: An Empirical Analysis and a Solution, by Adam I. Juda and David C. Parkes, EC'06.

Mechanism design based on Cost-Per-Acquisition (CPA)
Hamid Nazerzadeh

Internet advertising has changed dramatically by moving from the Cost-Per-Impression charging mechanism to Cost-Per-Click (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and about some of our results on this problem.

Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert

We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.)

We then look at the one-shot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the full-information stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal.

The distributional assumptions under which our theorems hold are quite general, and arguably hold in all real-world situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.