Archive of RAIN talks
Winter 12-13
Autumn 12-13
Spring 11-12
Winter 11-12
| Date | Speaker | Topic |
|---|---|---|
| January 18 | Michael Wellman | Price Prediction Strategies for Simultaneous One-Shot Auctions |
| January 25 | George Varghese | Lattice Games and the Economics of Aggregators |
| February 1 | Michael Kearns | Competitive Contagion in Networks |
| February 8 | Stephen Barley | Building an Institutional Field to Corral a Government |
| February 15 | Siddharth Suri | Cooperation and Assortativity with Endogenous Partner Selection |
| February 22 | - | - |
| February 29 | - | - |
| March 6 | Sergiu Hart | Two(!) Good To Be True |
| March 7 | Dean Eckles | Identifying peer effects in online communication behaviors |
Autumn 11-12
| Date | Speaker | Topic |
|---|---|---|
| October 5 | Jure Leskovec | Networks, communities and the ground-truth |
| October 12 | Vincent Conitzer | Computing Game-Theoretic Solutions for Security |
| October 19 | Costis Daskalakis | Revenue-optimal Auctions |
| October 26 | David Parkes | A Random Graph Model of Multi-Hospital Kidney Exchanges |
| November 2 | - | - |
| November 9 | Arnout van de Rijt | A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs |
| November 16 | Markus Mobius | Treasure Hunt |
| November 23 | - | Thanksgiving |
| November 30 | Yiling Chen | Task Routing for Prediction Tasks |
Spring 10-11
Winter 10-11
| Date | Speaker | Topic |
|---|---|---|
| January 12 | Martin Suchara | BGP Safety with Spurious Updates - The Conditions of BGP Convergence |
| January 19 | No seminar | - |
| January 26 | Gabor Szabo | Predicting the Popularity of Online Content |
| February 2 | No seminar | - |
| February 9 | No seminar | - |
| February 16 | Elisa Celis | Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism |
| February 23 | Yashodhan Kanoria | Social Learning and the Dynamic Cavity Method |
| March 2 | Matthias Mnich | The Complexity Ecology of Consensus Ranking |
Autumn 10-11
| Date | Speaker | Topic |
|---|---|---|
| September 22 | Elliot Anshelevich | Contribution Games in Social Networks |
| September 29 | Pranav Dandekar | Liquidity in Credit Networks: A Little Trust Goes a Long Way |
| October 6 | Sugato Basu | User Browsing Models: Relevance versus Examination |
| October 13 | Daria Sorokina | Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank |
| October 20 | Marc Najork | Reflections on Link-based Ranking |
| October 27 | - |
- |
| November 3 | Panagiotis Papadimitriou | Output URL Bidding |
| November 10 | Roberto Bayardo | A Technical Overview of Search Ads Quality at Google |
| November 17 | Georgios Pierrakos | On the Competitive Ration of Online Sampling Auctions |
| November 24 | - |
- |
| December 1 | Paolo Parigi | The importance of ties |
Spring 09-10
Winter 09-10
| Date | Speaker | Topic |
|---|---|---|
| Jan 13 | Jure Leskovec | Network of positive and negative edges |
| Jan 20 | Esteban Arcaute | |
| Jan 27 | Christina Aperjis | Temporal User Behavior in Yahoo Answers |
| Feb 3 | Ashish Goel | Some open problems motivated by social networks |
| Feb 10 | Yehuda Koren | The Netflix Prize: Quest for $1,000,000 |
| Feb 17 | DJ Patil | Data Jujitsu - the art of treating data as a product |
| Feb 24 | Panayiotis Tsaparas | Query-log analysis for improving web search |
| Mar 3 | Arash Asadpour | Stochastic Submodular Optimization |
| Mar 10 | Ravi Kumar | Compressibility of Behavioral Graphs |
| Mar 17 | Sep Kamvar | Aardvark |
Autumn 09-10
| Date | Speaker | Topic |
|---|---|---|
| Sep 30 | Arvind Narayanan | De-anonymizing Social Networks |
| Oct 7 | Nicolas Lambert | Eliciting Information on the Distribution of Future Outcomes |
| Oct 14 | Lars Backstrom | Optimizing Web Traffic via the Media Scheduling Problem |
| Oct 21 | Michael Kearns | Behavioral Experiments in Strategic
Networks Note special location: Terman Engineering Center, Room 453 |
| Oct 28 | Matthew Jackson | How Social Network Structure affects Diffusion and Learning |
| Nov 4 | Paul Valiant | How to Design Profitable Auctions |
| Nov 11 | Mohammad Mahdian | Sort me if you can (or, Algorithms on Dynamic Data) |
| Nov 18 | DJ Patil | Canceled! |
| Nov 25 | - |
No meeting -- Happy Thanksgiving! |
| Dec 2 | Abdur Chowdhury | Discovery & Emergence Note special location: Terman Engineering Center, Room 453 |
Spring 08-09
| Date | Speaker | Topic |
|---|---|---|
| Apr 3 | Mayur Datar |
Internet search advertising: Challenges in targeting and quality considerations |
| Apr 10 | - |
No meeting this week |
| Apr 17 | D Sivakumar |
Affiliation Networks |
| Apr 24 | - |
No meeting -- WWW 2009 |
| May 1 | - |
No meeting -- BAGT |
| May 8 | Sandeep Pandey Yahoo! Research |
Using Bandit Models in Search Engines |
| May 14 | Richard Chatwin VP Research, Adchemy |
Research Challenges in Online Advertising |
| May 22 | Robert Johnson |
Scaling Social Data |
| May 29 | Yury Lifshits Yahoo! Research |
Data Cloud: Integrating Structured Information into Web Search |
| June 5 | Robbie Yan | Manipulation Robustness of Collaborative Filtering Systems |
Winter 08-09
| Date | Speaker | Topic |
|---|---|---|
| Jan 16 | - | Organizational meeting |
| Jan 23 | G.D. Ramkumar SnapTell, Inc. |
Scalable image recognition algorithms for camera phone applications |
| Jan 30 | Luca de Alfaro | Content-Driven Reputation and Trust for Collaborative Content |
| Feb 6 | Ioannis Antonellis | Tagging with Queries: How and Why? (To appear in WSDM 2009) |
| Feb 13 | Scott Brave Baynote, Inc. |
Content Targeting and Recommendation Systems |
| Feb 20 | Ravi Kumar | Online Social Networks: Modeling and Mining |
| Feb 27 | Michael Mahoney | Community Structure in Large Social and Information Networks |
| Mar 6 | Kostas Kollias | An Incentive-Based Architecture for Social Recommendations |
| Mar 13 | Ali Dasdan | Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction |
Autumn 08-09
| Date | Speaker | Topic |
|---|---|---|
| Oct 3 | - | Talk by Jure Leskovec and organizational meeting |
| Oct 10 | Hady Lauw | Trusts among Users of Online Communities -- an Epinions Case Study |
| Oct 17 | Canceled | - |
| Oct 24 | Bahman Bahmani | Online Budgeted Matching in Random Input Models with Applications to Adwords |
| Oct 31 | Sergei Vassilvitskii | Top-k Aggregation Using Intersection of Ranked Inputs |
| Nov 7 | - | No meeting - Bay Area Theory Seminar (BATS) |
| Nov 14 | Mukund Sundararajan | Utility-Maximizing Privacy Mechanisms |
| Nov 21 | Hamid Nazerzadeh | Online advertising |
| Nov 28 | - | No meeting - Happy Thanksgiving! |
| Dec 5 | Tim Roughgarden | From Bayesian to Worst-Case Optimal Auction Design |
Spring 07-08
| Date | Speaker | Topic |
|---|---|---|
| Apr 11 | BAGT | |
| Apr 18 | Robbie Yan | Reputation Markets |
| Apr 25 | Cancelled for WWW | |
| May 2 | Bobji Mungamuru | Click Fraud |
| May 9 | Ying Xu | Oral Examination |
| May 16 | Esteban Arcaute | Network Formation Games |
| Mar 23 | Christina Aperjis | Reputation Systems |
| May 30 | Evimaria Terzi | Inferring Links and Initiators |
| June 6 | Yury Lifshits | Social Design |
Winter 07-08
| Date | Speaker | Topic |
|---|---|---|
| Jan 25 | Alon Altman | An Axiomatic Approach to Personalized Ranking Systems |
| Feb 1 | Ashish Goel | Open Problems |
| Feb 8 | Cancelled | |
| Feb 15 | Brian Babcock | Adchemy |
| Feb 22 | Rajat Bhattacharjee | Oral Examination |
| Feb 29 | Panagiotis Papadimitriou | Web Graph Similarity for Anomaly Detection |
| Mar 7 | Brad Null | Ratio Index for Budgeted Learning, with Applications |
| Mar 14 | Mukund Sundararajan | Optimal Marketing Strategies over Social Networks |
Autumn 07-08
| Date | Speaker | Topic |
|---|---|---|
| Oct 15 | Michael Schapira | Interdomain Routing and Games |
| Oct 22 | Esteban Arcaute | Decentralized Search of Random Graphs |
| Oct 29 | Orkut Buyukkokten | Social Network Revolution |
| Nov 5 | Aranyak Mehta | Random input models for Adwords |
| Nov 12 | Phil Long | Boosting the area under the ROC curve |
| Nov 19 | Thanks-giving | |
| Nov 26 | cancelled | |
| Dec 3 | Arash Asadpour | Ad auctions with reserved price |
Spring 06-07
| Date | Speaker | Topic |
|---|---|---|
| Apr 13 | Ying Xu | a discussion on WWW'07 papers |
| Apr 20 | BAGT | |
| Apr 27 | Anish Das Sarma | Detecting Near-Duplicates for Web Crawling |
| May 4 | Mayur Datar | Google News Personalization |
| May 11 | Adam Meyerson | Pricing Partially Compatible Products |
| May 18 | Sergei Vassilvitskii | Click Fraud Resistant Methods for Learning Click-Through Rates |
| May 25 | Dilys Thomas | Streaming Algorithms and Worm Fingerprinting |
| June 1 | Neil Daswani | Anatomy of Clickbot.A |
Winter 06-07
| Date | Speaker | Topic |
|---|---|---|
| Jan 26 | Nicolas Lambert | Asymptotically Optimal Repeated Auctions |
| Feb 2 | Hamid Nazerzadeh | Mechanism design based on CPA |
| Feb 9 | Ashish Goel | Reusable and multiple goods |
| Feb 16 | Ying Xu | Estimating Search Engine Index and Web size |
| Feb 23 | Mukund Sundararajan | Is efficiency expensive? |
| March 2 | Paul Heymann | Collaborative tagging systems |
| March 9 | Mihaela Enachescu | Multi-path Routing |
| March 16 | Aleksandra Korolova / Dilys Thomas | Clickthrough of and Bidding on keywords |
Autumn 06-07
| Date | Speaker | Topic |
|---|---|---|
| Oct 25 | Ying Xu | The Netflix Challenge |
| Nov 1 | Ashish/Alexandra | Social Networks |
| Nov 8 | Arash/Shubha | PageRank |
| Nov 15 | TBA | TBA |
| Nov 22 | - | Netflix follow-up |
| Nov 29 | Zoltan Gyongyi | TBA |
| Dec 6 | Sreenivas Gollapudi | TBA |
Spring 05-06
| Date | Speaker | Topic |
|---|---|---|
| Apr 7 | Ashish Goel | Axiomatic PageRank |
| Apr 14 | - | Organizational Meeting |
| Apr 21 | Sergei Vassilvitskii | Relevance Feedback and Web Search |
| Apr 28 | Hamid Nazerzadeh | AdWords with Unreliable Estimates |
| May 5 | Mihaela Enachescu | Compact Routing |
| May 12 | Ying Xu | Streaming Algorithms for Graph Problems |
| May 19 | - | - |
| May 26 | Udi Manber | New Problems in Search |
| Jun 2 | Rajat Bhattacharjee | Incentive Based Ranking Mechanisms |
Winter 05-06
| Date | Speaker | Topic |
|---|---|---|
| Jan 20 | - | Organizational Meeting |
| Jan 27 | Brad Null | Multiarmed Bandit Problem |
| Feb 3 | Edith Cohen | Spatially-Decaying Aggregation |
| Feb 10 | Andrew Tomkins | Routing and Growth Models for Social Networks |
| Feb 17 | - | BAGT |
| Feb 24 | Amin Saberi | Open Problems |
| Mar 3 | Mayur Datar, Ashutosh Garg | Recommendation Systems |
| Mar 10 | Amruta Joshi | Googlebase |
Fall 05-06
| Date | Speaker | Topic |
|---|---|---|
| Oct 7 | Roee Engelberg | Equilibria in Online Games |
| Oct 14 | Ying Xu, Sergei Vassilvitskii | Models for the Web Graph |
| Oct 21 | Gagan Aggarwal | AdWords |
| Oct 28 | Omar Benjelloun | SERF |
| Nov 4 | Zoltan Gyongyi | Link Spam |
| Nov 11 | Mihaela Enachescu | Query Incentive Networks |
| Nov 18 | - | BATS |
| Dec 2 | Rajat Bhattacharjee | Survey of Recommendation Systems |
Spring 04-05
| Date | Speaker | Topic |
|---|---|---|
| Apr 4 | Shubha Nabar | Impact of Search Engines on Evolution of Page Popularity |
| Apr 11 | Svetlozar Nestorov | Mining Structure from Semistructured Data |
| Apr 18 | Sepandar Kamvar | Reputation Management in P2P Networks |
| Apr 25 | Rina Panigrahy | PageRank Axioms |
| May 2 | Amin Saberi | Spread of Viruses on the Internet |
| May 9 | Ramesh Johari | Stable Policy Routing with Provider Independence |
| May 16 | Rajat Bhattacharjee | Reputation Systems |
| May 23 | Mehran Sahami | Matching Text on the Web |
| Jun 6 | Amruta Joshi | Modeling Documents |
Winter 04-05
| Date | Speaker | Topic |
|---|---|---|
| Feb 2 | Ying Xu | Recommendation Systems/Collaborative Filtering |
| Feb 9 | Ashish Goel | Spam and Page Rank |
| Feb 16 | Gagan Aggarwal | AdWords |
| Feb 23 | David Arthur | Gossip and File Sharing |
| Mar 2 | Anil Kamath | AdWords |
| Mar 10 | Sergei Vassilvitskii | |
| Mar 16 | Dilys Thomas | Social Networks |
Abstracts
Current challenges in kidney exchange - need for long chains and dynamic matchingItai Ashlagi, MIT Sloan
A kidney exchange network establishes a pool of patient-donor pairs, non-directed donors, and patients on the deceased donor waiting lists, and seeks to arrange transplants among this pool.
It has been previously shown that for sufficiently large pools of patient-donor pairs,
(almost) efficient kidney exchange can be achieved by using at most 3-way cycles, i.e. by using cycles among no more than 3 patient-donor pairs.
However, as kidney exchange has grown in practice, cycles among n > 3 pairs have proved useful, and long chains initiated by non-directed, altruistic donors have proven to be very effective. We explore why this is the case, both empirically and theoretically. We provide an analytical model of exchange when there are many highly sensitized patients, and show that large cycles of exchange or long chains can significantly increase efficiency when the opportunities for exchange are sparse. As very large cycles of exchange cannot be used in practice, long non-simultaneous chains initiated by non-directed donors significantly increase efficiency in patient pools of the size and com- position that presently exist. Most importantly, long chains benefit highly sensitized patients without harming low-sensitized patients.
One way to make kidney exchange networks thicker is by waiting for pairs to arrive—waiting longer allows the pool to become larger, but imposes costs on those already in the pool. We study empirically and theoretically the impact of the waiting periods on the number of matches.
Based on joint work with David Gamarnik, Alvin Roth, Mike Rees, Patrick Jaillet and Vahideh Manshadi
Take it or Leave it: Running a Survey when Privacy Comes at a Cost
Katrina Ligett, Caltech
In this talk, we consider the problem of estimating a potentially sensitive (individually stigmatizing) statistic on a population. In our model, individuals are concerned about their privacy, and experience some cost as a function of their privacy loss. Nevertheless, they would be willing to participate in the survey if they were compensated for their privacy cost. These cost functions are not publicly known, however, nor do we make Bayesian assumptions about their form or distribution. Individuals are rational and will misreport their costs for privacy if doing so is in their best interest. Ghosh and Roth recently showed in this setting, when costs for privacy loss may be correlated with private types, if individuals value differential privacy, no individually rational direct revelation mechanism can compute any non-trivial estimate of the population statistic. In this paper, we circumvent this impossibility result by proposing a modified notion of how individuals experience cost as a function of their privacy loss, and by giving a mechanism which does not operate by direct revelation. Instead, our mechanism has the ability to randomly approach individuals from a population and offer them a take-it-or-leave-it offer. This is intended to model the abilities of a surveyor who may stand on a street corner and approach passers-by.
Joint work with Aaron Roth.
paper: http://users.cms.caltech.edu/~katrina/papers/buyingprivacy.pdf
Deceased Organ Donation and Allocation: 3 Experiments in Market Design
Al Roth, Stanford
How should you allocate a scarce resource whose scarcity/availability might be sensitive to how you allocate it? I'll talk about some work I'm just beginning involving the donation and allocation of deceased donor organs for transplantation. This presents a different set of problems from our (more advanced) work on live organ donation and kidney exchange that has received a lot of recent publicity (and was the subject of Itai Ashlagi's recent talk). However the evidence suggests that even for the case of kidney transplantation the current scarcity cannot be solved by live donation alone, and for most organs live donation is not feasible. I'll discuss the new Israeli organ transplant law, and some experiments that will help us understand it better as field data become available. I'll also discuss an experiment in how organ donors are solicited in the United States.
DIMACS Workshop on Economic Aspects of Information Sharing
The DIMACS Workshop on Economic Aspects of Information Sharing is being held at Stanford Feb 7-8, 2013. It is being co-hosted and co-sponsored by RAIN.
In recent years, the collection and analysis of user data over the Internet has experienced a surge. Online services and social networking sites have facilitated and encouraged the online disclosure of a wide array of personal information, while webtracking platforms have enabled the consolidation of user data from disperse sources. The reach of these services has been significantly extended by the integration of mobile devices with the cloud. Companies increasingly view data analytics as an integral part of their operations, and there is an increased need of new means of monetizing data beyond advertising. In parallel, regulatory bodies, consumer advocacy groups as well as Internet companies themselves have struggled to find a balance between the financial benefits of user data monetization and the privacy concerns that it raises.
Motivated by these developments, this workshop aims at studying economic aspects of information sharing from several different aspects, including online data markets, targeted advertising, user incentive mechanisms, and privacy-aware mechanism design.
Citation networks: Evaluating the impact of science
Santo Fortunato, Aalto University, Finland
Citation statistics are regularly adopted to evaluate different actors in the academic and scientific arena, from scholars, to journals, departments, universities, national institutions and whole countries. The scores play a crucial role to decide which grants are awarded, the rankings of applicants for a job, even the fate of scientific institutions. For these reasons citation networks have been carefully investigated over the past decades. I will discuss some recent findings in this area. Citation distributions are universal across many different disciplines, if suitable normalized indicators are adopted as measures of performance, so avoiding the bias of field dependence. Furthermore, citation dynamics is characterized by bursts, usually occurring within a few years since the publication of a paper, and the burst size spans several orders of magnitude. This bursty dynamics can be described by a linear preferential attachment model with time dependent initial attractiveness. The model successfully reproduces the empirical citation distributions and accounts for the presence of citation bursts as well.
Crowd-Powered Systems
Michael Bernstein, Stanford
Crowd-powered systems combine computation with human intelligence, drawn from large groups of people connecting and coordinating online. These hybrid systems enable applications and experiences that neither crowds nor computation could support alone.
Unfortunately, crowd work is error-prone and slow, making it difficult to incorporate crowds as first-order building blocks in software systems. I introduce computational techniques that decompose complex tasks into simpler, verifiable steps to improve quality, and optimize work to return results in seconds. These techniques advance crowdsourcing into a platform that is reliable and responsive to the point where crowds can be used in interactive systems.
In this talk, I will present two crowd-powered systems to illustrate these ideas. The first, Soylent, is a word processor that uses paid micro-contributions to aid writing tasks such as text shortening and proofreading. Using Soylent is like having access to an entire editorial staff as you write. The second system, Adrenaline, is a camera that uses crowds to help amateur photographers capture the exact right moment for a photo. It finds the best smile and catches subjects in mid-air jumps, all in realtime. These systems point to a future where social and crowd intelligence are central elements of interaction, software, and computation.
Rumor in a network: Who's the culprit?
Devavrat Shah, MIT
Suppose a malicious "worm" has spread in a network (e.g. Stuxnet) and interest is in finding who is the source (so that many of the following possible actions be taken: report news to the world, reward, take punitive action). Motivated by such scenarios where "information" diffuses in the network starting from an unknown source and interest is in identifying the "source" based on the observed spread, we discuss the problem of finding the source of a rumor in a network. We will design a computationally efficient source estimator for the popular Susceptible-Infected (SI) information spreading model with generic spreading time. In establishing the effectiveness of the estimator, we shall explain the role played by the underlying network structure (a form of expansion property) in determining ability of detection. Connections to generalized Polya's urn, network centrality and number 1 - ln 2 will be explained.
This is based on joint work with Tauhid Zaman (MIT).
On Bitcoin and Red Balloons
Moshe Babaioff, Microsoft Research
Many large decentralized systems rely on information propagation to ensure their proper function. We examine a common scenario in which only participants that are aware of the information can compete for some reward, and thus informed participants have an incentive {\em not} to propagate information to others. One recent example in which such tension arises is the 2009 DARPA Network Challenge (finding red balloons). We focus on another prominent example: Bitcoin, a decentralized electronic currency system.
Bitcoin represents a radical new approach to monetary systems. It has been getting a large amount of public attention over the last year, both in policy discussions and in the popular press \cite{NY11,technology-review}. Its cryptographic fundamentals have largely held up even as its usage has become increasingly widespread. We find, however, that it exhibits a fundamental problem of a different nature, based on how its incentives are structured. We propose a modification to the protocol that can eliminate this problem. Bitcoin relies on a peer-to-peer network to track transactions that are performed with the currency. For this purpose, every transaction a node learns about should be transmitted to its neighbors in the network. As the protocol is currently defined and implemented, it does not provide an incentive for nodes to broadcast transactions they are aware of. In fact, it provides an incentive not to do so. Our solution is to augment the protocol with a scheme that rewards information propagation. Since clones are easy to create in the Bitcoin system, an important feature of our scheme is Sybil-proofness. We show that our proposed scheme succeeds in setting the correct incentives, that it is Sybil-proof, and that it requires only a small payment overhead, all this is achieved with iterated elimination of dominated strategies. We complement this result by showing that there are no reward schemes in which information propagation and no self-cloning is a dominant strategy.
Joint with Shahar Dobzinski, Sigal Oren, and Aviv Zohar.
-------------------------------
You can find the paper here and here.
A short description appeared at ACM SIGecom Exchanges.
Information Asymmetries in first-price common-value auctions, or The Value of Information
David Kempe, USC
We study the role of information asymmetries in first-price common value
auctions, and how different information structures provide revenue opportunities
to third-party information brokers. One of our motivations is Internet ad
auctions, where ads for specific site visitors have a common-value nature, in
that displaying an ad to a bot has no value, and users are likely to carry the
same (or similar) value to different advertisers. Cookies significantly improve
the estimate of this value; the fact that different advertisers use cookies to
different extents (and use cookies of different qualities) thus leads to
significant information asymmetries. These asymmetries can be mitigated or
exacerbated when the advertisers obtain cookie information from third-party
information brokers.
We focus on a first-price common value auction in which bidders have access to
independent discrete signals about the value of an
item, which is either 0 or 1. Our first main result is that with two bidders,
this auction format has a unique Nash equilibrium; we give a linear-time
algorithm for computing this equilibrium and the resulting value of the
auction for each player. We then use this equilibrium calculation to
understand the value of additional information to the bidders.
Third-party information has the interesting property that it can (in
principle) be allocated to multiple bidders, and that it carries
significant externalities.
We find that the value of additional information is subtle, and depends on the
prior and the information already available. We exhibit several surprising
possibilities including (1) Additional information about the item does not
always result in higher value for a bidder. (2) A bidder may prefer his
competitor (but not himself) to receive the broker's signal; this applies both
to the more and less informed bidder.
[Joint work with Vasilis Syrgkanis and Eva Tardos]
Random Utility Models for Social Choice
Lirong Xia, Harvard
Social choice studies ordinal preference aggregation with
wide applications ranging from high-stakes political elections to
low-stakes movie rating websites. In many cases, we want to design a
social choice mechanism that reveals the ground truth via MLE
inference. Despite its importance, this objective has been largely
hindered by the lack of natural probabilistic models and efficient
inference algorithms.
In this talk, I will focus on a wide class of probabilistic models
called Random Utility Models (RUMs), whose MLE inference was
previously believed intractable in general. I will introduce a fast
MC-EM algorithm for a very general and natural subclass of RUMs, and
discuss its applications and impact on designing better social choice
mechanisms. Extension of this algorithm also provides a computational
basis for improving models in many applications in economics as well
as machine learning, especially learning to rank.
Based on joint work with Hossein Azari Soufiani and David C. Parkes.
Advertiser Behavior in Sponsored Search Auctions
Susan Athey, Stanford
Economists often build "structural models," where they specify a specific model of individual behavior and then use data to estimate the parameters of the model. Although such models require strong assumptions, they have the advantage that they can make principled predictions about what would happen if the environment changed in a way that has not been observed in the past. I will describe the application of these techniques to advertiser behavior in sponsored search advertising auctions, focusing on how the models can be used for marketplace design and management. I will discuss economists' focus on causal inference in statistical models as well as the ways in which experiments can be used to estimate and test structural models. I will also make some suggestions about research directions at the intersection of economics and machine learning.
Automatic Discovery of Patterns in Media Content
Nello Cristianini, University of Bristol
What can we learn about the world (and the media system) by analysing millions of news articles or tweets ? Media content analysis has historically been the domain of the social sciences, but recently we are witnessing a strong trend towards the automation of many tasks, paving the way for a new - computational - approach to social science and even the humanities.
In this talk we will survey the results obtained over the past 5 years at the Intelligent Systems Laboratory of Bristol, in the area of automating the analysis of news media content. By combining techniques from machine translation, pattern recognition, statistical learning, information retrieval, we will analyse patterns connected to the US Presidential Elections, to UK public opinion, and to EU cultural biases.
Auctions and Allocations with Positive Network Externalities
Kamesh Munagala, Duke University
With the advent of social networks such as Facebook, Twitter, and LinkedIn, and online offers/deals web sites, network externalties raise the possibility of marketing and advertising to users based on influence they derive from their neighbors in such networks. Indeed, a user's valuation for a product is changed by his knowledge of which of his neighbors likes or owns the product. Designing allocation schemes and auctions in the presence of network externalities is not very well understood, and poses many challenges that are hard to address with traditional techniques. We show that even in very simple settings, optimal auction design becomes APX-Hard, and pricing schemes may not admit to Nash equilibria in buyer behavior. On the positive side, we present approximation algorithms for welfare maximization and optimal auctions in both single item and multi-item settings via interesting linear programming relaxations.
Joint work with Anand Bhalgat and Sreenivas Gollapudi (working paper); and with Nima Haghpanah, Nicole Immorlica, and Vahab Mirrokni (EC 2011).
Adolescent Societies - Their Form, Evolution, and Variation
Daniel McFarland, Stanford University
Adolescent societies - whether arising in weak, short-term classroom friendships or close, long-term friendships - exhibit tendencies toward segregation, hierarchy and clustering. It is difficult, however, to explain the macro-variation we observe in these social networks drawing only on tie formation micro-mechanisms. Some adolescent societies are rank-ordered caste systems and others are flat, cliquish worlds. What accounts for this variation? We propose and test an ecological theory of social network formation where features of organizational environments moderate tie formation processes to account for the macro-structural variation across settings. We develop this argument using longitudinal friendship data on schools (Add Health study) and classrooms (Classroom engagement study), and by extending exponential random graph models to the study of multiple societies over time.
Joint work with James Moody, David Diehl, Jeff Smith, R. Jack Thomas
The Simple Economics of Approximately Optimal Auctions
Jason Hartline, Northwestern University
Systems wherein strategic agents compete for limited resources are ubiquitous: the economy, computer networks, social networks, congestion networks, etc. Auction theory governs the design and
analysis of these systems. I will describe a method for using approximation in auction theory to identify simple, practical, and robust auction mechanisms that perform nearly as well as the complex optimal ones. A main goal of this approach is to understand the extent to which important intuition from optimal auctions for ideal models extends to approximately optimal auctions
for more realistic models.
Auction theory is well understood only in ideal models where agents have single-dimensional, linear preferences. I.e., each agent has a value for receiving a service and her utility is the difference
between her value and her payment. For these models optimal auctions can be interpreted as "marginal revenue" maximizers (Myerson, 1981; Bulow and Roberts 1989). In more realistic models, i.e., where bidders have multi-dimensional preferences for different outcomes or non-linear utility, similar intuitive characterizations are unknown. Understanding good auctions for these environments is considered to be the main challenge in auction theory. In these more realistic environments maximizing marginal revenue may not be optimal, and furthermore, there is sometimes no direct way to
implementing the marginal revenue maximization mechanism. I will present two results: I will give procedures for implementing marginal revenue maximization in general, and I will show that marginal revenue maximization is approximately optimal. The approximation factor smoothly degrades in a term that quantifies how far the environment is from an ideal one (i.e., where marginal revenue
maximization is optimal).
Joint work with Saeed Alaei, Hu Fu, Nima Haghpanah, and Azarakhsh Malekian.
Beyond Keyword Search: Taming Information Overload with Rich User Interactions
Khalid El-Arini, Carnegie Mellon University
It has been almost two decades since the first search engine appeared on the Web. In that time, much of our lives has drifted online, from the news we read to the products we buy to our relationships with friends, family and colleagues. As such, information overload has become a ubiquitous problem that affects nearly all members of society, as we try to sift through millions of articles and status updates to get an understanding of our world. However, while our information needs have grown more complex and varied over time, the traditional information retrieval paradigm has scarcely changed: users still submit strings of keywords, and receive a ranked list of hyperlinks in return.
In this talk, I argue the need for new models of user interaction that are better suited to modern retrieval tasks. Some tasks, such as reading the news, can be inherently queryless, while others, such as conducting scientific research, require complex queries. In both cases, a user's personal tastes and beliefs can color the results he or she expects to see. Here, I present a general information retrieval framework that leverages ideas from submodular optimization and probabilistic graphical models to unify both settings. I show results in the news and science domains demonstrating that our approach outperforms state-of-the-art alternatives from Google and Yahoo! on real users.
Finally, as personalization itself has become ubiquitous on the Web, it is important to provide users with an interaction mechanism that allows them to inspect--and potentially correct--inferred user profile information. I present recent work that learns such transparent user profiles from user behavior in a large-scale social network.
Optimal targeting strategies over social networks
Kostas Bimpikis, Stanford University
Recent advances in information technology have allowed ^Lrms to gather vast amounts of data regarding consumers' preferences and the structure and intensity of their social interactions. The question that arises naturally is whether ^Lfirms can intelligently use the available data to improve their business strategies. In this presentation, we take a step towards this direction by discussing our results on optimal price discrimination and targeted advertising over networks. In particular, in the ^Lrst part of the talk, we study the pricing problem of a monopolist selling a divisible good (service) to consumers who are embedded in a social network. We derive a characterization of the optimal price for each consumer as a function of her network position and illustrate the value of knowing the underlying network structure by comparing the pro^Lts of a monopolist who does not take the network into account to those of a monopolist who uses this information optimally.
The second part of the talk examines a game-theoretic model of competition between ^Lfirms, which can target their marketing budgets to individuals. We provide a sharp characterization of the optimal targeted marketing strategies and highlight their dependence on the consumers' references as well as on the underlying social network structure. In particular, firms find it optimal to allocate their marketing budgets to consumers in proportion to their \network centrality", a measure of social influence. Moreover, we identify network structures for which targeted advertising is more beneficial for the ^Lfirms and, ^Lnally, we show how the difference in the initial budgets affect the outcome of the marketing competition between them.
Individual Heterogeneity and the Formation of Collaboration Networks
Katharine Anderson, Carnegie Mellon University
Collaboration drives innovation in a variety of contexts, including product development, entrepreneurship, policy-making, and academic research. Collaboration networks--in which two people are connected if they work together on a problem--provide a valuable tool for understanding the structure of these collaborative communities. Empirically, individuals in collaboration networks play a wide variety of roles--some individuals have many links, while others have few links, some bridge between communities while others play a more peripheral role. From a social scientific perspective, we would like to believe that the heterogeneity we observe in social network position reflects some kind of underlying heterogeneity in individuals--that individuals occupy different positions in the social network, not because they were lucky, but because of some identifiable underlying characteristics. However, our current models of network formation use homogeneous agents, and thus cannot be used to answer questions about individual heterogeneity and network structure.
Here, I introduce a model of network formation in which individuals with heterogeneous skill sets choose their collaborators in order to gain access to skills that are complementary to their own. Together, these collaborative connections comprise a collaboration network. This class of models is interesting, because it connects individual heterogeneity to network heterogeneity, and thus can be used to answer questions about 1) how an individual's skill set affects her position on the social network and 2) how the distribution of skills in the population affects the overall structure of the network. Using this model, I show that an individual's degree on the collaboration network is a supermodular function of her set of skills. As a result, the degree distribution of the network may be highly skewed, even when the distribution of skills in the population is not. This skew becomes more pronounced as individuals in the network have fewer of the skills required to solve their problems--that is, as problems get more difficult. I then use an extension of this model to make predictions about the network position of specialists--individuals whose skills lie within a single subject area--and generalists--individuals whose skill bridge different subject areas. I show that while specialists tend to have more links than generalists, generalists are more likely to bridge between communities. I also show that as disciplinary boundaries fade, the degree distribution of the network becomes more skewed towards a few, high-degree superstars.
DisturbanceFeedback Policies in Dynamic Robust Optimization, with Applications toSupply Chain Design and Revenue Management.
Dan Iancu, Stanford University
Robust optimization has recently emerged as a computationallytractable and scalable paradigm for modeling complex decision problems underuncertainty. Its applications span a wide range of challenging problems infinance, pricing, supply chain management or network design. In the context of dynamicrobust decision models, a typical approach is to look for control policies (ordecision rules) that are directly parameterized in the uncertaintiesaffecting the model. This has the advantage of leading to simple convexoptimization problems for finding particular classes of rules (e.g., affine).However, the approach typically yields suboptimal policies, and is hard toanalyze. In the first part of the talk, we seek to bridge this paradigm withthe classical Dynamic Programming (DP) framework for solving dynamic decision problems. We provide a set of unifying conditions (based onthe interplay between the convexity and supermodularity of the DP valuefunctions, and the lattice structure of the uncertainty sets) that, takentogether, guarantee the optimality of the class of affine decision rules, and furthermore allow suchrules to be found very efficiently. Ourresults suggest new modeling paradigms for dynamic robust optimization, and ourproofs bring together ideas from three areas of optimization typically studied separately: robust,combinatorial (lattice programming andsupermodularity), and global (the theory of concave envelopes). Weexemplify the approach in a model for negotiating flexiblesupply chain contracts between several parties, where optimal capacity,contracting, and ordering policies can be found by solving a simple linearprogram. In the second part of the talk, we consider a class of models arisingin supply chain networks and multi-product pricing, for which we suggest a newclass of polynomial decision rules, found by solving compact semidefinite programs,and typically yielding drastic improvement compared with the status quo. This is joint work with Dimitris Bertsimas, PabloParrilo, Mayank Sharma and Maxim Sviridenko.
Online Concave Matching
Kamal Jain, Microsoft Research
Online matching was introduced by Karp, Vazirani, Vazirani [KVV] in 1990, which over time became a corner stone of online algorithms. In the matching problem of KVV, the edges are unweighted. Mehta, Saberi, Vazirani, and Vazirani [MSVV] and Buchbinder, Naor, and Jain, developed algorithms for the weighted version of online matching but at the expense of finding a fractional solution. In many large scale instances, such as matching ads to consumers, fractionality is not a major compromise.
The major compromise is that the existing work allows only hard budget constraint, e.g., so many matches per node or a linear utility function bounded above by a budget constraint, which essentially gives you a very specific concave utility function. In practice people have concave utility functions.
In this work we obtain an online algorithm for concave matching problem. The performance of our algorithm is optimal. We use a convex programming duality. Using variational calculus we show how the worst possible example is dual to the best possible algorithm, thereby proving the optimality of the performance guarantee of our algorithm. In hind sight, one can reconstruct the algorithm by studying the worst possible example itself.
Credits: This is a joint work with Nikhil Devanur.
Acknowledgement: This is a work, I had been doing for the last 7 years. Mark Braverman, Alexander Holroyd, Yuval Peres, and Oded Scramm are some of the people who have guided us.
Price Prediction Strategies for Simultaneous One-Shot Auctions
Michael Wellman, University of Michigan
Bidding in simultaneous auctions is challenging because an agent's value for a given good may depend on the outcome of other auctions. Given the lack of known solutions to general simultaneous auction games, previous works have addressed this classic exposure problem with heuristic strategies employing probabilistic price predictions. We introduce a concept of self-confirming prices, and show that within an independent private value model, bidding optimally with respect to self-confirming price predictions is w.l.o.g. in equilibrium. We exhibit practical procedures to compute approximate equilibrium strategies, and demonstrate via empirical game-theoretic analysis that these strategies are effective in simultaneous auction environments with both complementary and substitutable preference structures.
** This is joint work with Amy Greenwald and Eric Sodomka of Brown University.
Lattice Games and the Economics of Aggregators
George Varghese, UCSD and Yahoo! Research
We model the strategic decisions of web sites in content markets,
where sites may reduce user search cost by aggregating content.
Example aggregations include political news, technology, and other
niche-topic websites. We model this market scenario as an extensive
form game of complete information, where sites choose a set
of content to aggregate and users associate with sites that are nearest
to their interests.
Thus, our scenario is a location game in which sites choose to
aggregate content at a certain point in user-preference space, and
our choice of distance metric, Jacquard distance, induces a lattice
structure on the game. We provide two variants of this scenario:
one where users associate with the first site to enter amongst sites
of equal distances, and a second where users choose uniformly between
sites at equal distances. We show that Subgame Perfect Nash
Equilibria exist for both games. While it appears to be computationally
hard to compute equilibria in both games, we show a
polynomial-time satisficing strategy called Frontier Descent for the
first game. A satisficing strategy is not a best response but ensures
that earlier sites will have positive profits, assuming all subsequent
sites also have positive profits. By contrast, we show that the second
game has no satisficing solution.
**Joint work with P. Jordan, U. Nadav, K. Punera and A. Skrzypacz at Yahoo
Competitive Contagion in Networks
Michael Kearns, University of Pennsylvania
We examine a game-theoretic model of "competitive contagion" in networks,
where two competing companies or other entities have limited budgets to seed
initial infections in an underlying social network, which then mediates stochastic
contagion. The payoff to each party is their final number of infections, which at
equilibrium may come at the expense of the other party. In this model we provide
characterizations of the Price of Anarchy and a new notion called the Price of Budgets.
These characterizations depend on properties of the local stochastic dynamics of
contagion, and in some cases exhibit sharp threshold behavior.
**Joint research with Sanjeev Goyal of the University of Cambridge.
Building an Institutional Field to Corral a Government
Stephen Barley, Stanford University
This talk will describe how the American business community either founded or revitalized eight distinct populations of organizations during the late 1970s and the 1980s which intentionally and unintentionally formed an institutional field (a network of organizations) designed to influence the federal government and create a more business friendly political environment.
Cooperation and Assortativity with Endogenous Partner Selection
Siddharth Suri, Yahoo! Research
The natural tendency for humans to choose with whom to form new relationships and with whom to end established relationships is thought to facilitate the emergence of cooperation. Helping cooperators to mix assortatively is believed to reinforce the rewards accruing to mutual cooperation while simultaneously excluding defectors. However, the relationship between endogenous partner selection, assortativity, and cooperation has been largely unexplored experimentally. Here we report on a series of human subjects experiments in which groups of 24 participants played a multi-player prisoner's dilemma game where, critically, they were also allowed to propose and delete links to players of their own choosing at some variable rate. Over a wide variety of parameter settings and initial conditions, we found that endogenous partner selection significantly increased the level of cooperation, the average payoffs to players, and the assortativity between cooperators. Even relatively slow update rates were sufficient to produce large effects resulting in cooperation levels over 80%. Subsequent increases to the update rate still had a positive, although smaller, effect. For standard prisoner's dilemma payoffs, we also found that assortativity resulted predominantly from cooperators avoiding defectors, not by severing ties with defecting partners, and that cooperation correspondingly suffered. Finally, by modifying the payoffs to satisfy two novel conditions, we found that cooperators did punish defectors by severing ties, leading to levels of cooperation approaching 100% which persisted for longer.
**Joint work with: Jing Wang (NYU) and Duncan Watts (Yahoo! Research)
Two(!) Good To Be True
Sergiu Hart, The Hebrew University of Jerusalem
How to sell goods optimally? While the mechanism design literature
has solved this problem neatly when there is only one good, the
multiple goods case turns out to be extremely difficult,
mathematically and conceptually. Much of what is true for one good
does not extend to multiple goods. We will try to explain the
difficulties, show what can go wrong, and then present some universal
approximation results. The talk is essentially self-contained; no
background in mechanism design is necessary.
**Joint work with Noam Nisan and Phil Reny
Identifying peer effects in online communication behaviors
Dean Eckles, Facebook
Peer effects can produce clustering of behavior in social networks,
but so can homophily and common external causes. For observational
studies, adjustment and matching estimators for peer effects require
often implausible assumptions, but it is only rarely possible to
conduct appropriate direct experiments to study peer influence in
situ. We describe research designs in which individuals are randomly
encouraged to perform a focal behavior, which can subsequently
influence their peers. Ubiquitous product optimization experiments on
Internet services can be used for these analyses. This approach is
illustrated with an analysis of peer effects in expressions of
gratitude via Facebook on Thanksgiving Day 2010, with implications for
the micro-foundations of culture.
Networks, communities and the ground-truth
Jure Leskovec
The Web, society, information, cells and brain can all be represented
and studied as complex networks of interactions. Nodes in such
networks tend to organize into clusters and communities, which
represent the fundamental structures for understanding the
organization of complex systems. Even though detection of network
communities is of significant importance for computer science,
sociology and biology, our understanding of the community structure of
large networks remains limited.
We study a set of more than 200 large networks with the goal to
understand and identify communities in networks. We challenge the
conventional view of network community structure and show that it is
not exhibited by the large real-world networks. We then present a new
conceptual model of network community structure, which reliably
captures the overall structure of networks and accurately identifies
the overlapping nature of network communities.
This is joint work with Jaewon Yang.
Computing Game-Theoretic Solutions for Security
Vincent Conitzer
Algorithms for computing game-theoretic solutions are now deployed in
real-world security domains, notably air travel. These applications
raise some hard questions. How do we deal with the equilibrium
selection problem? How is the temporal and informational structure of
the game best modeled? What assumptions can we reasonably make about
the utility functions of the attacker and the defender? And, last but
not least, can we make all these modeling decisions in a way that
allows us to scale to realistic instances? I will present our ongoing
work on answering these questions.
Joint work with Dmytro Korzhyk, Joshua Letchford, Kamesh Munagala,
Ronald Parr (Duke); Manish Jain, Zhengyu Yin, Milind Tambe (USC);
Christopher Kiekintveld (UT El Paso); Ondrej Vanek, Michal Pechoucek
(Czech Technical University); and Tuomas Sandholm (CMU).
Brief bio: Vincent Conitzer is the Sally Dalton Robinson Professor of
Computer Science and Professor of Economics at Duke University. His
research focuses on computational aspects of microeconomic theory, in
particular game theory, mechanism design, voting/social choice, and
auctions. He recently received the IJCAI Computers and Thought Award,
which is awarded to outstanding young scientists in artificial
intelligence.
Revenue-optimal Auctions
Costis Daskalakis
In his seminal paper, Myerson [1981] provides a revenue-optimal auction for
a seller who is looking to sell a single item to multiple bidders.
Unfortunately, Myerson's auction generalizes to a limited range of domains,
called "single-parameter", where each bidder's preference over the auction’s
outcomes is specified by a single parameter. Indeed, extending this auction
to "multi-parameter domains", such as selling multiple heterogeneous items
to bidders, has been one of the most central problems in Mathematical Economics.
We solve this problem in bidder- and item- symmetric settings. For a single
bidder, we solve the general problem.
(Based on joint work with Yang Cai and Matt Weinberg)
Bio: Constantinos (or Costis) Daskalakis is an Assistant Professor of EECS at
MIT. He completed his undergraduate studies in Greece, at the National Technical
University of Athens, and obtained a PhD in Computer Science from UC Berkeley.
After Berkeley he was a postdoctoral researcher in Microsoft Research New
England, and since 2009 he has been at the faculty of MIT. His research
interests lie in Algorithmic Game Theory and Applied Probability, in particular
computational aspects of markets and the Internet, social networks, and
computational problems in Biology. Costis has been honored with a 2007 Microsoft
Graduate Research Fellowship, the 2008 Game Theory and Computer Science Prize
from the Game Theory Society, the 2008 ACM Doctoral Dissertation Award, a NSF
Career Award, a 2010 Sloan Foundation Fellowship in Computer Science, the 2011
SIAM Outstanding Paper Prize, and the MIT Ruth and Joel Spira Award for
Distinguished Teaching.
A Random Graph Model of Multi-Hospital Kidney Exchanges
David Parkes
Multi-hospital kidney exchanges, where hospitals share local lists of
donor-patient pairs with a centralized exchange, promise to faciliate
larger matches, and thus additional transplants. But a central challenge
is to align incentives, so that hospitals will choose to share all pairs
with the exchange. We present a random-graph model of feasible
transplants, and use the model to (a) explain early experimental results
on 2-way and 3-way exchanges anayltically, (b) derive a "square-root law"
for welfare gains from centralized pools, and (c) design a matching
mechanism that is Bayes-Nash incentive compatible, efficient and
individual rational under reasonable assumptions. The robustness of the
mechanism is established through a computational study, validating
incentive alignment, and estimating an efficiency loss due to incentive
alignment of 1-4% for 2-way exchanges and 5-19% for 3-way exchanges (and
decreasing with larger pools.)
Joint work with Panos Toulis
A Test of the "Fifteen Minutes" Hypothesis using Large-Scale Data from Newspapers and Blogs
Arnout van de Rijt
Contemporary scholars of fame argue that public attention to persons is short-lived. We
investigate this "fifteen minutes" hypothesis in a unique data source containing daily records of references to millions of names in 2,500 English-language newspapers, on TV station websites, and on blogs. Longitudinal analysis shows that instead fame tends to carry
over from one year to the next, except below some threshold level of marginal public attention where fame does appear to be fleeting. This pattern of above-threshold annual stability is found back even in newspaper entertainment sections, celebrity-oriented
tabloids, and blogs, where fame is thought to be most ephemeral. Analysis of archival newspaper data reveals that while coverage of small names follows a pattern of rapid decay, big names traverse a career-like trajectory spanning many years. We take this evidence
to suggest that fame is created through an interaction between static social hierarchies and dynamic reinforcement processes, whereby a population is pre-stratified in terms of the chances that a cascade of public celebration will latch on to one of its members.
Treasure Hunt
Markus Mobius
We seed a large real-world social network with binary information and analyze
subsequent social learning. A unique feature of our field experiment is that
we measure both the pre-existing social networks and the actual conversation
network. Our experiment allows us to test how rational agents behave when
processing information that originates within their social network. We find
that information decays quickly with social distance and that agents mainly
incorporate information within social distance 2. Conversations through common
friends do not increase the weight that a subject places on signals from
direct friends but linearly increases the weight on signals from indirect friends.
This suggests that agents are able to avoid double-counting information from
indirect friends. We propose a simple “streams model” of social learning that
is consistent with the evidence from our experiment.
Task Routing for Prediction Tasks
Yiling Chen
We describe methods for routing a prediction task on a network where each participant
can contribute information and route the task onwards. We introduce routing scoring
rules that bring truthful contribution of information and optimal routing of the task into
a Perfect Bayesian Equilibrium under common knowledge about the amount of information
available to each agent on the network. Relaxing the common knowledge assumption, we
address the challenge of routing in situations where states of information differ among
agents. We introduce a family of local routing rules, which isolate routing decisions that
depend only on local information in equilibrium, while still promoting effective routing.
Local routing rules are the only routing scoring rules that induce truthful equilibria in
which agents' routing decisions rely solely on local information. Simulation results show
that local routing decisions can lead to effective task routing.
Joint work with Haoqi Zhang, Eric Horvitz, and David Parkes.
Bargaining Theory in the Cloud
Eric Friedman
The axiomatic theory of bargaining solutions was initiated
by John Nash with his seminal paper in 1950 and has a long
and mostly mathematical history. Surprisingly, it arises
naturally in a variety of allocation problems arising in
cloud computing. For example, the second most famous
bargaining solution, the Kalai-Smorodinsky solution, is the
outcome of a simple water filling algorithm used in the
Mesos Platform and has many strong properties in that
setting, including incentive compatibility and fairness.
In this talk, I will explore these connections for a
variety of cloud computing problems and show how axiomatic
bargaining theory can be used to analyze allocation
problems in the cloud and conversely how cloud computing
sheds new light on axiomatic bargaining theory.
This talk is based on joint work with Ali Ghodsi, Scott
Shenker and Ion Stoica.
Network Patterns of Favor Exchange
Matthew Jackson
Abstract: We examine the informal exchange of favors among
the members of a society. We analyze a game where the fear of
losing multiple relationships can provide incentives for favor
provision. We characterize the network patterns of exchange
that are robust in a sense that deleted relationships only
result in a local loss of favor exchange. Such robustness
necessitates networks such that all links are "supported":
any pair of individuals exchanging favors must have a common
friend. We show that favor exchange networks in 75 villages in
rural India exhibit a frequency of this sort of support that
is significantly higher than a standard network 'clustering'
measure.
Co-authors Tomas Rodriguez-Barraquer and Xu Tan
Simple Auctions with Near-Optimal Equilibria
Tim Roughgarden
In practice, auction designers often exchange
incentive-compatibility for simplicity (e.g., in sponsored
search auctions or combinatorial auctions). How much welfare
loss does this design choice cause? I'll talk about some
techniques for answering this question.
Based largely on a SODA '11 paper with Kshipra Bhawalkar.
Efficient Online Ad Serving in a Display Advertising Exchange
Kevin Lang
We begin with a tutorial overview of the basic concepts of Non-Guaranteed
(i.e. spot-market) display advertising and advertising exchanges, then pose
and solve a constrained path optimization problem that lies at the heart of
the real-time ad serving task in Yahoo!'s NGD Ad Exchange.
In Yahoo!'s Exchange, the ad server's task for each display opportunity is
to compute, with low latency, an optimal valid path through a directed graph
representing the business arrangements between the numerous business
entities that belong to the Exchange. These entities include not only
publishers and advertisers, but also intermediate entities called ``ad
networks'' which have delegated their ad serving responsibilities to the
Exchange.
Path validity is determined by business constraints which focus on the
following three issues: 1) suitability of the display opportunity's web page
and its publisher 2) suitability of the user who is currently viewing that
web page, and 3) suitability of a candidate ad and its advertiser.
Path optimality is determined by money reaching the publisher, and is
affected not only by an advertiser's bid, but also by the revenue-sharing
agreements between the entities in the candidate path.
We describe two different algorithms that have both been used in the actual
Yahoo! non-guaranteed ad serving system, focusing on typical case rather
than worst case performance, and on the optimality-preserving speedup
heuristics that allow latency targets to be met.
The talk is based on a WSDM 2011 paper that was co-written with Joaquin
Delgado, Dongming Jiang, Bhaskar Ghosh, Shirshanka Das, Amita Gajewar,
Swaroop Jagadish, Arathi Seshan, Chavdar Botev, Michael Binderberger-Ortega,
Sunil Nagaraj, and Raymie Stata.
Designing Algorithms for MapReduce
Sergei Vassilvitskii
The amount of data available today has lead to analysis problems whose size
would be unimaginable just 10 years ago. With datasets routinely measured in
gigabytes and terabytes, one challenge that has emerged is scaling algorithms
to handle this deluge of information. With chip designers sustaining Moore’s
law by turning to parallelism, so have algorithmicists turned to parallelism
to speed up algorithms and improve performance. In this talk, we will focus
on the MapReduce model of parallel computing, highlight some of the recent
algorithmic advances and present current challenges.
Based on joint work with Howard Karloff, Silvio Lattanzi, Ben Moseley and Siddharth Suri.
Incentivizing High-quality User-Generated Content
Arpita Ghosh
We model the economics of incentivizing high-quality user generated content
(UGC), motivated by settings such as online review forums, question-answer
sites, and comments on news articles and blogs. We provide a game-theoretic
model within which to study the problem of incentivizing high quality UGC, in
which contributors are strategic and motivated by exposure. Our model has
the feature that both the quality of contributions {\em as well as} the extent
of participation is determined endogenously in a free-entry Nash equilibrium.
The model predicts, as observed in practice, that if exposure is independent
of quality, there will be a flood of low quality contributions in equilibrium.
An ideal mechanism in this context would elicit both high quality and high
participation in equilibrium, with near-optimal quality as the available attention
diverges, and should be easily implementable in practice. We consider a very
simple elimination mechanism, which subjects each contribution to rating by
some number $A$ of viewers, and eliminates any contributions that are not
uniformly rated positively. We construct and analyze free-entry Nash equilibria
for this mechanism, and show that $A$ can be chosen to achieve quality that
tends to optimal, along with diverging participation, as the number of viewers
diverges.
Joint work with Preston McAfee.
Coevolution of Network Structure and Content
Lada Adamic
As individuals communicate, their exchanges form a dynamic network. A
time series analysis of both standard and novel network metrics can be
used to characterize a network's evolution. We demonstrate that network
structure alone can be highly revealing of the diversity and novelty of
the information being communicated in the network. Networks with a
higher conductance exhibit higher information entropy, while unexpected
network configurations are indicative of higher information novelty.
This is joint work with Liuling Gong, Edwin Teng, and Avishay Livne.
TBA
Damon Centola
TBA
BGP Safety with Spurious Updates - The Conditions of BGP Convergence
Martin Suchara
We explore BGP safety, the question of whether a BGP system converges to a stable routing, in
light of several BGP implementation features that have not been fully included in the previous
theoretical analyses. We show that Route Flap Damping, MRAI timers, and other intra-router features
can cause a router to briefly send ?spurious? announcements of less-preferred routes. We demonstrate
that, even in simple configurations, this short-term spurious behavior may cause long-term divergence
in global routing. We then present DPVP, a general model that unifies these sources of spurious
announcements in order to examine their impact on BGP safety. In this new, more robust model of BGP
behavior, we derive a necessary and sufficient condition for safety, which furthermore admits an efficient
algorithm for checking BGP safety in most practical circumstances -- two complementary results that have
been elusive in the past decade's worth of classical studies of BGP convergence in more simple models. We
also consider the implications of spurious updates for well-known results on dispute wheels and safety under
filtering.
Joint work with Alex Fabrikant and Jennifer Rexford.
Predicting the Popularity of Online Content
Gabor Szabo
Online services that collect, categorize, and disseminate user-generated
content have seemingly very different focus, user communities, and
content types. We present a study on the popularity of content in three
online services, Youtube, Twitter, and Digg. While videos on Youtube,
short messages in Twitter, and interesting articles on Digg do not
necessarily have much in common, it turns out that the patterns of
access to content on any of these services are very similar across the
sites. In particular, we present predictions for the long-term
popularity of content based on early observations, and moreover discuss
what role the social network plays in determining the success of a
submission.
Buy-it-Now or Take-a-Chance: A simple randomized auction mechanism
Elisa Celis
I will first discuss bid data from Microsoft’s AdECN platform, which
allows advertisers to target their bids. We find that the valuations
are irregular, making the commonly used Vickrey auction unsuitable for
extracting revenue. However, Myerson's optimal auction mechanism is
often unsuitable in practice.
I will present and discuss the benefits of a simple auction mechanism
(BIN-TAC) which extends the second-price auction with reserve. This
mechanism is particularly effective in private value environments
where the distribution of valuations are irregular, and is truthful in
expectation. We use the AdECN data in counterfactual experiments that
suggest our BIN-TAC mechanism would improve revenue by 24% relative
to the mechanism currently in place.
Social Learning and the Dynamic Cavity Method
Yashodhan Kanoria
Social learning constitutes learning of behavior from interaction with
friends/neighbors on a network. This talk will focus on models of repeated
interaction, with agents 'voting' in a series of rounds on some issue of
interest. Votes in the initial round are based on 'private signals',
whereas votes in future rounds incorporate knowledge of previous votes
cast by friends.
We look at two different models of iterative learning. A very simple model
is 'majority dynamics' where agents choose their vote based on the majority
of neighbors' votes in the previous round. We analyze this model on regular
trees. At the other extreme is a iterative Bayesian learning: a fully
rational model introduced by Gale and Kariv (2003). We introduce new
algorithms for this model, challenging the widespread belief that learning
is computationally intractable in this setting. We develop a novel
technique -- the 'dynamic cavity method', that serves as a key tool for
both models.
Based on joint work with Andrea Montanari (to appear in Ann. App. Prob.) and
Omer Tamuz (submitted).
The Complexity Ecology of Consensus Ranking
Matthias Mnich
Preference lists are extensively used in social science surveys and voting
systems to capture information about choice. In many such scenarios there
arises the need to combine the data represented by many lists into a single
list which reflects the opinion of the surveyed group as much as possible.
This yields a class of ranking aggregation problems, where we are given a
set of permutations (votes) over a set of alternatives (candidates), and
are asked for a permutation of the set candidates such that some objective
is minimized.Examples include Kemeny ranking and other consensus methods,
that find important applications in web search engines, e-mail spam detection,
and user recommendation systems.
We present results on this topic from a multivariate complexity and algorithms
analysis, based on parameterized complexity.
(Joint work with Henning Fernau, Fedor Fomin, Daniel Lokshtanov, Geevarghese Philip
and Saket Saurabh)
Contribution Games in Social Networks
Elliot Anshelevich
We consider network contribution games, where each agent in a social
network has a budget of effort that it can contribute to different
collaborative projects or relationships. Depending on the contribution of
the involved agents a relationship will flourish or drown, and to measure
the success we use a reward function for each relationship. Every agent is
trying to maximize the reward from all relationships that it is involved
in. We consider pairwise equilibria of this game, and characterize the
existence, computational complexity, and quality of equilibrium based on
the types of reward functions involved. For example, when all reward
functions are concave, we prove that the price of anarchy is at most 2.
For convex functions the same only holds under some special but very
natural conditions. A special focus of the talk are minimum effort games,
where the reward of a relationship depends only on the minimum effort of
any of the participants. Finally, we show tight bounds for approximate
equilibria and convergence of dynamics in these games.
Liquidity in Credit Networks: A Little Trust Goes a Long Way
Pranav Dandekar
Credit networks represent a way of modeling trust between entities in
a network. Nodes in the network print their own currency and trust
each other for a certain amount of each other's currency. This allows
the network to serve as a decentralized payment infrastructure---
arbitrary payments can be routed through the network by passing IOUs
between trusting nodes in their respective currencies---and obviates
the need for a common currency. Nodes can repeatedly transact with
each other and pay for the transaction using trusted currency. A
natural question to ask in this setting is: how long can the network
sustain liquidity, i.e. how long can the network support the routing
of payments before credit dries up? We answer this question in terms
of the long term success probability of transactions for various
network topologies and credit values.
We show that the success probability of transactions is independent of
the path used to route .ow between nodes. For symmetric
transaction rates, we show analytically and via simulations that the
success probability for complete graphs, Erdos-Renyi graphs and
preferential attachment graphs goes to one as the size, the density or
the credit capacity of the network increases. Finally, we compare
liquidity in credit networks to that in a centralized payment
infrastructure and show that credit networks are at most a constant
factor worse; thus we do not lose much liquidity in return for their
robustness and decentralized properties.
Joint work with Ashish Goel, Ramesh Govindan and Ian Post.
User Browsing Models: Relevance versus Examination
Sugato Basu
There has been considerable work on user browsing models for search
engine results, both organic and sponsored. The click-through rate
(CTR) of a result is the product of the probability of examination
(will the user look at the result) times the perceived relevance of
the result (probability of a click given examination). Past papers
have assumed that when the CTR of a result varies based on the
pattern of clicks in prior positions, this variation is solely due
to changes in the probability of examination.
We show that, for sponsored search results, a substantial portion
of the change in CTR when conditioned on prior clicks is in fact
due to a change in the relevance of results for that query instance,
not just due to a change in the probability of examination. We then
propose three new user browsing models, which attribute CTR changes
solely to changes in relevance, solely to changes in examination
(with an enhanced model of user behavior), or to both changes in
relevance and examination. The model that attributes all the CTR
change to relevance yields substantially better predictors of CTR
than models that attribute all the change to examination, and does
only slightly worse than the model that attributes CTR change to
both relevance and examination. For predicting relevance, the model
that attributes all the CTR change to relevance again does better
than the model that attributes the change to examination. Surprisingly,
we also find that one model might do better than another in predicting
CTR, but worse in predicting relevance. Thus it is essential to
evaluate user browsing models with respect to accuracy in predicting
relevance, not just CTR.
Additive Groves: an Ensemble for Regression, Interaction Detection and Learning to Rank
Daria Sorokina
Ensembles of regression trees are long known as powerful models suitable
for a large variety of tasks. The most well-known ensembles are Random
Forests (averaging independent large, highly non-linear trees) and Gradient
Boosting (additive model built from small trees). In the beginning of the talk
I will describe an ensemble that combines the benefits of both: Additive Groves
is based on regression trees, additive models and bagging and is capable of both
fitting additive structure of the problem and modelling its highly nonlinear
components with very large trees at the same time. Combination of these
properties makes Additive Groves superior in performance to other existing tree
ensemble methods.
Models with best performance often act as black boxes: they make good predictions,
but do not provide much insight into the decision making process. This is not
satisfactory for many data mining problems. Therefore separate post-processing
techniques are needed to answer the questions about important features, their
effects and interactions.
The term statistical interaction is used to describe the presence of non-additive
effects among two or more variables in a function. When variables interact, their
effects must be modeled and interpreted simultaneously. Thus, detecting statistical
interactions can be critical for an understanding of processes by domain
researchers. In this talk I will describe an approach to interaction detection
based on comparing the performance of unrestricted and restricted prediction models,
where restricted models are prevented from modeling an interaction in question. I
will show that Additive Groves has the required properties and can be used within
this framework.
I will also talk about several applications of these techniques to real-world data:
tracking environment change based on birds abundance and Salmonella risk prediction
on meat-processing establishments. Apart from that, Additive Groves has been
successfull in several recent competitions: KDD Cup'09, ICDM'09 and Yahoo Learning
to Rank Challenge. The latter result is especially interesting, because Additive
Groves was the only winning solution that did not optimize for any specific ranking
metrics.
Algorithms presented in the talk are implemented as a part of an open source C++
package available at http://additivegroves.net
At the end of the talk I will make a short presentation about Yandex and its
California office (http://yandexlabs.com). We are hiring!
Reflections on Link-based Ranking
Marc Najork
This talk focuses on using hyperlinks in the ranking of web search results. We give
a brief overview of the vast body of work in the area; we provide a quantitative
comparison of the different features; we sketch how link-based ranking features can
be implemented in large-scale search engines; and we identify promising avenues for
future research.
Output URL Bidding
Panagiotis Papadimitriou
Output URL bidding is a new bidding mechanism for sponsored search,
where advertisers bid on search result URLs, as opposed to keywords
in the input query. For example, an advertiser may want his ad to
appear whenever the search result includes the sites www.imdb.com
and en.wikipedia.org, instead of bidding on keywords that lead to
these sites, e.g., movie titles, actor names, and so on. In this
paper we study the tradeoff between the simplicity and the
specification power of output bids and we explore their utility for
advertisers. We first present a model to derive output bids from
existing keyword bids. Then, we use the derived bids to experimentally
study output bids and contrast them to input query bids. Our main
results are the following: (1) Compact output bids that mix both URLs
and hosts have the same specification power as more lengthy input bids;
(2) Output bidding can increase the recall of relevant queries; and
(3) Output and input biding can be combined into a hybrid mechanism
that combines the benefits of both.
A Technical Overview of Search Ads Quality at Google
Roberto Bayardo
Abstract: Ads quality involves optimizing the oftentimes competing
interests of Google, the users of its search engine, and advertisers
who pay to have their ads displayed on search results pages. In this
talk I describe the primary problems faced by the Google ads quality
team, and some of the general approaches we take to address them. My
goal of the talk is to introduce others to problems of interest beyond
the better studied problems of click through estimation and mechanism
design. The topics covered will include keyword targeting & bid
specification, budget allocation, user based ads quality, ad formats,
when to show an ad and where, and assessing quality of ad creatives,
their landing pages, and the entire "post click" experience.
Bio: Roberto Bayardo is a principal research scientist at Google,
where he has been working within the ads quality and ads serving teams
for the past 5 years. Roberto currently leads a team working on
quality modeling, with the broad goal of ensuring that ads displayed
alongside Google search provide a good user experience.
On the Competitive Ration of Online Sampling Auctions
Georgios Pierrakos
We study online profit-maximizing auctions for digital goods with adversarial
bid selection and uniformly random arrivals. Our goal is to design auctions
that are constant competitive with F^(2); in this sense our model lies at the
intersection of prior-free mechanism design and secretary problems. We first
give a generic reduction that transforms any offline auction to an online one,
with only a loss of a factor of 2 in the competitive ratio; we then present
some natural auctions, both randomized and deterministic, and study their
competitive ratio; our analysis reveals some interesting connections of one
of these auctions with RSOP, which we further investigate in our final section.
The importance of ties
Paolo Parigi
For sociologists, the ties that join individuals into relationships are the
result of social processes rather than being entities themselves. A social
network is therefore a representation for something that happens between
individuals and that brings people together. Sometimes, however, social
processes sever ties and produce an increase in the distance that separates
individuals, ultimately generating isolation. In my talk I will first
provide a few examples of social processes that generate ties and then I
will advance a theoretical scenario for studying processes that instead
decrease the number of ties. The positive examples of tie generation come
from my work in the fields of political sociology and organization theory.
The negative scenario of increasing isolation comes from my work on social
network analysis. In presenting these examples, my goal is to stimulate the
conversation between social scientists and computer scientists in the field
of network analysis.
How opinions are received by online communities: a case study on amazon.com helpfulness votes
Gueorgi Kossinets
There are many on-line settings in which users publicly express opinions. A
number of these offer mechanisms for other users to evaluate these opinions;
a canonical example is Amazon.com, where reviews come with annotations like
"26 of 32 people found the following review helpful." Opinion evaluation
appears in many off-line settings as well, including market research and
political campaigns. Reasoning about the evaluation of an opinion is
fundamentally different from reasoning about the opinion itself: rather than
asking, "What did Y think of X?", we are asking, "What did Z think of Y's
opinion of X?" Here we develop a framework for analyzing and modeling opinion
evaluation, using a large-scale collection of Amazon book reviews as a
dataset. We find that the perceived helpfulness of a review depends not just
on its content but also but also in subtle ways on how the expressed
evaluation relates to other evaluations of the same product. As part of our
approach, we develop novel methods that take advantage of the phenomenon of
review "plagiarism" to control for the effects of text in opinion evaluation,
and we provide a simple and natural mathematical model consistent with our
findings. Our analysis also allows us to distinguish among the predictions of
competing theories from sociology and social psychology, and to discover
unexpected differences in the collective opinion-evaluation behavior of user
populations from ifferent countries.
Reaching Consensus on Social Networks
Grant Schoenebeck
Research in sociology studies the effectiveness of social networks in achieving computational tasks.
Typically the agents who are supposed to achieve a task are unaware of the underlying social network
except their immediate friends. They have limited memory, communication, and coordination. These
limitations result in computational obstacles in achieving otherwise trivial computational problems.
One of the simplest problems studied in the social sciences involves reaching a consensus among players
between two alternatives which are otherwise indistinguishable.
In this paper we formalize the computational model of social networks. We then analyze the consensus
problem as well as the problem of reaching a consensus which is identical to the majority of the
original signals. In both models we seek to minimize the time it takes players to reach a consensus.
Web-scale Information Extraction Using Wrappers.
Nilesh Dalvi
A significant portion of web pages embed interesting and valuable semantic
content suitable for structured representation. The traditional Information
Extraction techniques, however, fall short of achieving high quality
extraction at Web scale. In this talk, I will outline some of the work going
on at Yahoo! Research on addressing the challenges of Information Extraction
on a Web scale. I will focus on {\em wrapper} based techniques, which
exploit the HTML structure of websites to extract the information of
interest. I will address two problems : (i) making wrappers more robust to
changes in websites, and (ii) enabling learning of wrappers from
automatically obtained noisy training data.s
Characterization and Modeling of Online Browsing Activity.
Andrew Tomkins
I will describe a set of studies of online user behavior based on search and
toolbar logs. In these studies, we propose a new "CCS" taxonomy of pageviews
consisting of Content (news, portals, games, verticals, multimedia),
Communication (email, social networking, forums, blogs, chat), and
Search (web search, item search, multimedia search). We show that roughly
half of all pageviews online are content, 1/3 are communications, and the
remaining 1/6 are search. We study the interarrival distribution of visits
to a page by a single user and in aggregate, and show some evidence that "bursty"
behavior in which a URL attains significant transient popularity for a few days
is not a significant contributor to interarrival times. We also study the nature
and frequency of search queries in more detail, differentiating between web search,
item search, and multimedia search, and characterizing the prevalence of references
to structured objects in web search queries.
Navigation patterns in this data are not well characterized as trails,
but should instead be viewed as trees due to the presence of multi-tab
or multi-window browsing. We present a model of tabbed browsing that
represents a hybrid between a markov process capturing the graph of
web pages, and a branching process capturing the creation, splitting,
and dying of tabs. We present a mathematical criterion to characterize
whether the process has a steady state independent of initial conditions,
and we show how to characterize the limiting behavior in both cases.
We perform a series of experiments to compare our tabbed browsing model
with the simpler random surfer model.
Joint work with Ravi Kumar and Flavio Chierichetti.
User grouping behavior in online forums.
Xiaolin Shi
Online forums represent one type of social media that is particularly rich for
studying human behavior in information seeking and diffusing. The way users join
communities is a reflection of the changing and expanding of their interests
toward information. In this talk, I present the work studying the patterns of
user participation behavior, and the feature factors that influence such behavior
on different forum datasets. In this work, we find that, despite the relative
randomness and lesser commitment of structural relationships in online forums,
users' community joining behaviors display some strong regularities. One
particularly interesting observation is that the very weak relationships
between users defined by online replies have similar diffusion curves as those
of real friendships or co-authorships. We build social selection models,
Bipartite Markov Random Field (BiMRF), to quantitatively evaluate the prediction
performance of those feature factors and their relationships. Using these models,
we show that some features carry supplementary information, and the effectiveness
of different features vary in different types of forums. Moreover, the results of
BiMRF with two-star configurations suggest that the feature of user similarity defined
by frequency of communication or number of common friends is inadequate to predict
grouping behavior, but adding node-level features can improve the fit of the model.
Understanding Choice Intensity: A Poisson Mixture Model with Logit-based Random Utility Selective Mixing.
Matthew Harding
In this paper we introduce a new Poisson mixture model for count panel data where the underlying
Poisson process intensity is determined endogenously by consumer latent utility maximization over
a set of choice alternatives. This formulation accommodates the choice and count in a single random
utility framework with desirable theoretical properties. Individual heterogeneity is introduced
through a random coefficient scheme with a flexible semiparametric distribution. We deal with the
analytical intractability of the resulting mixture by recasting the model as an embedding of infinite
sequences of scaled moments of the mixing distribution, and newly derive their cumulant representations
along with bounds on their rate of numerical convergence. We further develop an efficient
recursive algorithm for fast evaluation of the model likelihood within a Bayesian Gibbs sampling
scheme, and show posterior consistency. We apply our model to a recent household panel of supermarket
visit counts. We estimate the nonparametric density of three key variables of interest
- price, driving distance, and their interaction - while controlling for a range of consumer demographic
characteristics. We use this econometric framework to assess the opportunity cost of time
and analyze the interaction between store choice, trip frequency, search intensity, and household
and store characteristics. We also conduct a counterfactual welfare experiment and compute the
compensating variation for a 10% to 30% increase in Walmart prices.
Deconstructing A Long Tail Commerce Network.
Neel Sundaresan
In this talk we discuss modeling a complex long tail commerce network in the form of the eBay Marketplace graph.
The asymmetric and preferential nature of buying and selling in terms of structure, interactions, trust and
reputation measures and evolution of these has tremendous value in identifying business opportunities and
building effective user applications.The discussion focuses on the analysis of the macroscopic shape
of the networks across the platform and also
in specific marketplace categories. The results suggest significant diversity across these individual
marketplaces from collectors network to retail network. We also discuss motif profiling and preferential
connections. Means for defining trust measures based on network properties will also be addressed.
Nod vs. High-Five: LinkedIn Connection Strength Models and Applications.
Monica Rogati
The low cost of link formation on social and professional networking sites leads to connections with heterogeneous
relationship strengths, a LinkedIn user might be connected to both professional acquaintances and close collaborators.
Modeling this relationship as non-binary is more realistic (and more useful in personalizing a user's experience)
than the binary links that have been the focus of previous work in the social networking space. We present an unsupervised,
link-based latent variable model to estimate relationship strength from interaction
activity (e.g., communication, browsing, comments) and profile content similarity. We also discuss practical
applications of modeling connection strength when building data driven products that enhance the LinkedIn user
experience and engagement. (This is joint work with Rongjing Xiang during her internship at LinkedIn.)
Network of positive and negative edges
Jure Leskovec
Relations between users on social media sites often reflect a mixture
of positive (friendly) and negative (antagonistic) interactions. We
study how the interplay between positive and negative
relationships affects the structure of on-line social networks,
whereas in contrast the bulk of research on social networks to date
has focused almost exclusively on positive interpretations of links
between people. We connect our analyses to theories of signed
networks from social psychology. In doing so we find that the
classical theory of structural balance tends to capture certain common
patterns of interaction, but that it is also at odds with some of the
fundamental phenomena we observe --- particularly related to the
evolving, directed nature of these on-line networks. We then develop
an alternate theory of status that better explains the observed edge
signs and provides insights into the underlying social mechanisms. Our
work provides one of the first large-scale evaluations of theories of
signed networks using on-line datasets, as well as providing a
perspective for reasoning about social media sites.
Joint work with Daniel Huttenlocher and Jon Kleinberg.
The paper can be accessed here.
Social Networks and Stable Matchings in the Job Market
Esteban Arcaute
In this talk we introduce and study a model that considers the job market as a two-sided matching market,
and accounts for the importance of social contacts in finding a new job. We assume that workers learn only
about positions in firms through social contacts. Given that information structure, we study both static
properties of what we call locally stable matchings, a solution concept derived from stable matchings,
and dynamic properties through a reinterpretation of Gale-Shapley's algorithm as myopic best response dynamics.
We prove that, in general, the set of locally stable matching strictly contains that of stable matchings
and it is in fact NP-complete to determine if they are identical. We also show that the lattice structure
of stable matchings is in general absent. Finally, we focus on myopic best response dynamics inspired by
the Gale-Shapley algorithm. We study the efficiency loss due to the informational constraints, providing
both lower and upper bounds.
This is joint work with Sergei Vassilvitskii, Ramesh Johari and David Liben-Nowell.
Some of the results presented appeared in WINE 2009.
Temporal User Behavior in Yahoo Answers
Christina Aperjis
When searching for an answer to a question, people generally prefer to get
both high quality and speedy answers. The fact that it usually takes longer
to find a better answer creates a speed-accuracy tradeoff which is inherent
in information seeking. A natural setting to study this tradeoff is Yahoo
Answers, a community-driven question-and-answer site. We identify and
quantify how question authors trade off the number of answers they receive
and the cost of waiting. We find that users are willing to wait more to
obtain an additional answer when they have only received a small number of
answers; this implies decreasing marginal returns in the amount of collected
information. We also estimate the user's utility function from the
data.
This is joint work with Bernardo A. Huberman and Fang Wu.
The Netflix Prize: Quest for $1,000,000
Yehuda Koren
The collaborative filtering approach to recommender systems predicts
user preferences for products or services by learning past user-item
relationships. Their significant economic implications made
collaborative filtering techniques play an important role at known
e-tailers such as Amazon and Netflix. This field enjoyed a surge of
interest since October 2006, when the Netflix Prize competition was
commenced. Netflix released a dataset containing 100 million anonymous
movie ratings and challenged the research community to develop
algorithms that could beat the accuracy of its recommendation system,
Cinematch. In this talk I will survey the competition together with some
of the principles and algorithms, which have led us to winning the Grand
Prize in the competition.
Data Jujitsu - the art of treating data as a product
DJ Patil
LinkedIn is the premiere professional social network with over 60 million users
and a new user joining every second. One of LinkedIn's strategic advantages
is their unique data. While most organizations consider data as a service function,
LinkedIn considers data a cornerstone of their product portfolio. This emphasis on
data as a product led LinkedIn to be the first social network site to deploy a
People You May Know service. Other data products that LinkedIn has built include
Who Viewed My Profile, People Who Viewed This Profile Also Viewed..., the backend
standardization algorithms that power search, and matching algorithms. To rapidly
develop these products LinkedIn leverages a number of technologies including open
source, 3rd party solutions, and some we've had to invent along the way.
In this presentation, we will talk about our methodology for turning data into user
facing products and some of the major challenges we see in the data space.
Query-log analysis for improving web search
Panayiotis Tsaparas
Search engines log the activity of users when posing queries and interacting with the results.
Query click logs maintain information about what queries were asked, what results were
shown and which ones were clicked as a result of the query. This information offers a
valuable insight in the intent of the user and the need behind the query, and it has
been the focus of intense research activity.
In this talk, we will describe algorithms for query-log analysis.
First we will talk about an algorithm that uses query logs for generating advertising keywords
and understanding the category of a query. Time permitting, we will then describe
an algorithm for generating training data for learning a ranking function.
In both cases we model the query log as a graph, and we formulate our problem as an optimization
problem over these graphs.
Stochastic Submodular Optimization
Arash Asadpour
We study stochastic submodular maximization problem with respect to cardinality
and matroid constraints. Our model can capture the effect of uncertainty in
different problems, such as cascade effects in social networks, capital budgeting,
sensor placement, etc. We study non-adaptive and adaptive policies and give
"optimal" constant approximation algorithms for both cases. We also show that
the adaptivity gap in this setting is 1.59.
Compressibility of Behavioral Graphs
Ravi Kumar
Graphs resulting from human behavior (the web graph, social networks,
etc.) have hitherto been viewed as a monolithic class with similar
characteristics. Our focus here is on the compressibility of such
graphs. It has been empirically shown that Web graphs can be compressed
down to three bits of storage per edge. In light of this, we address
two basic questions: are social networks as compressible as Web graphs
and are there tractable and realistic models that yield highly
compressible graphs?
De-anonymizing Social Networks
Arvind Narayanan
Operators of online social networks are increasingly sharing potentially
sensitive information about users and their relationships with advertisers,
application developers, and data-mining researchers. Privacy is typically
protected by anonymization, i.e., removing names,
addresses, etc.
We present a framework for analyzing privacy and anonymity in social networks
and develop a new re-identification algorithm targeting anonymized
social-network graphs. To demonstrate its effectiveness on real-world
networks, we show that a third of the users who can be verified to have
accounts on both Twitter, a popular microblogging service, and Flickr, an
online photo-sharing site, can be re-identified in the anonymous Twitter
graph with only a 12% error rate.
Our de-anonymization algorithm is based purely on the network topology, does
not require creation of a large number of dummy "sybil" nodes, is robust to
noise and all existing defenses, and works even when the overlap between the
target network and the adversary's auxiliary information is small.
Bio:
--------------
Arvind Narayanan is a post-doctoral researcher working with Dan Boneh. He recently finished his Ph.D at the University of Texas at Austin, advised by Vitaly Shmatikov. His research is on the privacy and anonymity issues involved in publishing large-scale datasets about people. His thesis, in a sentence, is that the level of anonymity that society expects -- and companies claim to provide -- in published databases is fundamentally unrealizable. He blogs about his anonymity-breaking efforts and other research at http://33bits.org/.
Eliciting Information on the Distribution of Future Outcomes
Nicolas Lambert
This paper studies the problem of inducing a presumably knowledgeable
expert to reveal true information regarding the probability
distribution of a future random outcome. The information being
considered is quite general, and includes in particular the statistics
of the probability distribution (such as mean, median, variance), and
categorical information (such as the most correlated pair of
variables). I examine two types of incentive schemes: Those that
reward the expert for being truthful, and, for the case of numerical
information, those that reward the expert increasingly with the
accuracy of the prediction. For both cases, I establish simple
criteria to determine the information that can be elicited with such
schemes, and offer a complete characterization of the associated
schedule fee.
Optimizing Web Traffic via the Media Scheduling Problem
Lars Backstrom
Website traffic varies through time in consistent and predictable
ways, with highest traffic in the middle of the day. When providing
media content to visitors, it is important to present repeat visitors
with new content so that they keep coming back. In this paper we
present an algorithm to balance the need to keep a website fresh with
new content with the desire to present the best content to the most
visitors at times of peak traffic. We formulate this as the me- dia
scheduling problem, where we attempt to maximize total clicks, given
the overall traffic pattern and the time varying clickthrough rates of
available media content. We present an efficient algorithm to perform
this scheduling under certain conditions and apply this algorithm to
real data obtained from server logs, showing evidence of significant
improvements in traffic from our algorithmic sched- ules. Finally, we
analyze the click data, presenting models for why and how the
clickthrough rate for new content declines as it ages.
Behavioral Experiments in Strategic Networks
Michael Kearns
For four years now, we have been conducting "medium-scale" experiments
in how human subjects behave in strategic and economic settings
mediated by an underlying network structure. We have explored a
wide range of networks inspired by generative models from the
literature, and a diverse set of collective strategic problems,
including biased voting, graph coloring, consensus, and networked
trading. These experiments have yielded a wealth of both specific
findings and emerging general themes about how populations of human
subjects interact in strategic networks. I will review these findings
and themes, with an emphasis on the many more questions they raise
than answer.
How Social Network Structure affects Diffusion and Learning
Matthew Jackson
We examine how diffusion and learning processes operating through social
networks are affected by homophily (the tendency of individuals to associate
with others similar to themselves) and other network
characteristics. Homophily has no effect in pure diffusion processes where
only shortest paths matter; only connection density matters. In contrast,
homophily substantially slows learning based on repeated averaging of
neighbors' information, while connection density has no effect.
How to Design Profitable Auctions
Paul Valiant
This talk is a high-level overview of results from two years of work with
Silvio Micali and Jing Chen on a question that may best be phrased as: what
is the best way to sell many items to many players? The standard
formalizations of this question were fleshed out in the 1960's and led to
answers that are unsatisfactory in practice, leading to today's Ebay that
sells items by the auction method known to the ancient Greeks. In this talk
we present new formulations of this question, elicited by fundamental notions
from computer science: competitive analysis, collusion, and "knowledge". We
present several concrete (and near-optimal) auctions in these models, along
with many open questions.
Sort me if you can (or, Algorithms on Dynamic Data)
Mohammad Mahdian
We formulate and study a new computational model for dynamic data. In
this model the data changes gradually and the goal of an algorithm is
to compute the solution to some problem on the data at each time step,
under the constraint that it only has a limited access to the data
each time. As the data is constantly changing and the algorithm might
be unaware of these changes, it cannot be expected to always output the
exact right solution; we are interested in algorithms that guarantee to output
an approximate solution. In particular, we focus on the fundamental problems
of sorting and selection, where the true ordering of the elements changes
slowly. We provide algorithms with performance close to the optimal in
expectation and with high probability. If time permits, we will discuss
new results on graph algorithms in our framework.
This is based on joint work with Aris Anagnostopoulos, Ravi Kumar, and Eli
Upfal.
Discovery & Emergence
Abdur Chowdhury
Often as computer scientists we focus on faster algorithms, such as
approximations of solutions in linear time over large data sets or similar
problems. Rather than focus on algorithms in this talk, we ask the question
"What possibilities emerge from surfacing the world's conversations to
others". Specifically we explore Twitter Trends as a discovery tool and show
how awareness of the thoughts of others can cause the emergence of new
behaviors.
Bio:
--------------
Dr. Abdur Chowdhury serves as Twitter's Chief Scientist. Prior to that
Dr. Chowdhury co-founded Summize a real-time search engine sold to Twitter in
2008. Dr Chowdhury has held positions at AOL as their Chief Architect for
Search, Georgetown's Computer Science Department and University of Maryland's
Institute for Systems Research. His research interest lay in Information
Retrieval focusing on making information accessible.
Internet search advertising: Challenges in targeting and quality considerations
Mayur Datar
In this informal talk, I will touch upon two topics related to search
advertising:
Targeting: How do you match advertiser keywords to user queries? What are the
underlying challenges in bridging this semantic gap?
How do you measure the quality of search ads? Is it in the best interest of
all parties involved, users, advertisers and the search engine, to always
show ads for queries? What are the costs and benefits to showing ads and how
do you decide on when is it appropriate to show ads?
Look forward to more questions than answers! Hopefully these will lead to
good research topics.
Bio:
--------------
Mayur Datar works as a Research Scientist with Google Inc. His research
interests are in datamining, algorithms, databases and computer science
theory. Prior to joining Google, Mayur obtained his doctorate degree in
computer science from Stanford university and a Bachelor of Technology degree
from I.I.T. Bombay. He was awarded the President of India, Gold Medal for
being the most outstanding student of his graduating batch from
I.I.T. Bombay. He has published several papers in renowned conferences like
SIGMOD, VLDB, KDD, FOCS, SODA, WWW.
Affiliation Networks
D. Sivakumar
In the last decade, structural properties of several naturally arising
networks (the Internet, socialnetworks, the web graph, etc.) have been
studied intensively with a view to understanding their evolution. In recent
empirical work, Leskovec, Kleinberg, and Faloutsos identify two new and
surprising properties of the evolution of many real-world networks:
densification (the ratio of edges to vertices grows over time), and shrinking
diameter (the diameter reduces over time to a constant). These properties run
counter to conventional wisdom, and are certainly inconsistent with graph
models prior to their work.
In this work, we present the first model that provides a simple, realistic,
and mathematically tractable generative model that intrinsically explains all
the well-known properties of the social networks, as well as densification
and shrinking diameter. Our model is based on ideas studied empirically in
the social sciences, primarily on the groundbreaking work of Breiger (1973)
on bipartite models of social networksthat capture the affiliation of agents
to societies.
We also present algorithms that harness the structural consequences of our
model. Specifically, we show how to overcome the bottleneck of densification
in computing shortest paths between vertices by producing sparse subgraphs
that preserve or approximate shortest distances to all or a distinguished
subset of vertices. This is a rare example of an algorithmic benefit derived
from a realistic graph model.
The talk will review the history of relevant random graph models, and attempt
to illustrate the co-evolution of theoretical modeling and empirical insights
in this fascinating slice of graph theory.
The talk reports work done jointly with Silvio Lattanzi of Sapienza
University, Roma, Italy, and is to appear in STOC 2009.
Using Bandit Models in Search Engines
Sandeep Pandey
Web search engines perform two basic tasks: they acquire Web pages via
crawling, and present lists of pages and advertisements in response to
user search queries. Given the gigantic size of the Web these tasks
can be viewed as constrained optimization problems. On the acquisition
side the constraint is on the crawling resources, while
on the presentation side, the constraint is on user attention: users
cannot be expected to sift through large amounts of content.
Furthermore, some parameters needed to solve these optimization
problems may or may not be known leading to the offline and online versions of
the problems. In this talk we will look at these optimization problems.
In particular, we will focus on the online presentation of advertisements where
the unknown parameters are the ad quality values. I will present a method of
simultaneously estimating unknown parameters and exploiting current estimates.
Our method extends and adapts well-known "Bandit" models from Machine
Learning.
Bio:
--------------
Sandeep Pandey is a Research Scientist in Search and Advertising group at
Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon
and his B.Tech. from IIT Bombay. His research interests include Web crawling and
advertising. He also likes to work on online learning problems such as the multi-armed
bandit problem. In his spare time Sandeep enjoys watching cricket and playing
tennis.
Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy
Adchemy is leading a revolution in online performance marketing, both as a
top-25 U.S. online advertiser and through the development of the Adchemy
Digital Marketing Platform, an integrated suite of products that allows
online marketers to maximize their customers' end-to-end
experience. I will discuss some of the research challenges that we are
tackling in developing our technology solutions, including problems related
to online matching, multi-armed bandits, and the optimizer's curse.
Bio:
--------------
Richard has been VP Research at Adchemy since its inception in late 2004. He
has over nine years of experience in the online advertising industry. He
received his PhD from the Operations Research department at Stanford, where
he focused on airline revenue management. He spent several years in the
consulting world pioneering the development and implementation of real option
valuation methodologies for asset valuation.
Scaling Social Data
Robert Johnson
Director of Engineering, Facebook
Facebook serves billions of pages a day based on hundreds of terabytes of
dynamic data, which presents many scaling challenges for our storage and
caching systems. In particular the social nature of the data gives it some
interesting properties - it is not easily clustered or partitioned by users
or applications, it updates frequently, and many queries require massive
network fan-in. I'll be discussing the problems that arise in scaling such a
system and some of the solutions we've used to get to our current scale, as
well as our plans for infrastructure to solve these problems at even larger
scale and in a more general way for future web 2.0 applications with similar
properties.
Bio:
--------------
Robert Johnson is Director of Engineering at Facebook, where he leads the
software development efforts to cost-effectively scale Facebook's
infrastructure and optimize performance for its many millions of
users. During his time with the company, the number of users has expanded by
more than twenty-fold and Facebook now handles billions of page views a day.
Robert was previously at ActiveVideo Networks where he led the distributed
systems and set-top software development teams. He has worked in a wide
variety of engineering roles from robotics to embedded systems to web
software. He received a B.S. In Engineering and Applied Science from
Caltech.
Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits
In this talk we address two questions: (1) How to use structured data in web
search? (2) How to gather structured data? For the first question we identify
valuable classes of data, present query classes that can benefit from
structured data and describe architecture that combines keyword search with
structured search. For the second question we present Data Cloud: An
ecosystem of data publishers, search engine (data cloud) and data consumers.
We show connection from Data Cloud strategy to classic notion in economics:
network effect in two-sided markets. At the end of the talk an early demo
implementation will be presented.
Bio:
--------------
Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics
at St.Petersburg (2007). He spent a year as a postdoc at Caltech before
joining Yahoo! Research in 2008. He won two gold medals at International
Mathematical Olympiads, received the Best Paper Award in Application Track of
CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for
Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors":
http://simsearch.yury.name
Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan
While the expanding universe of products available via Internet
commerce provides consumers with valuable options, sifting through
the numerous alternatives to identify desirable choices can be
challenging. Collaborative filtering systems aid this process by
recommending to users products desired by similar individuals.
Because purchase decisions are influenced by these systems, however,
they have become targets of manipulation by unscrupulous vendors. In
this talk, we will discuss our findings on the robustness of
collaborative filtering systems to manipulation. In particular, we
provide theoretical and empirical results demonstrating that while
nearest neighbor algorithms, which are widely used in commercial
systems, can be highly susceptible to manipulation, two classes of
collaborative filtering algorithms which we refer to as linear and
asymptotically linear are relatively robust. In particular, we find
that as a user rates an increasing number of products, the average
accuracy of predictions made by these algorithms becomes insensitive
to manipulated data. Our results provide guidance for the design of
future collaborative filtering systems.
Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.
SnapTell has created technology that allows a mobile phone user to snap a
picture of a product and instantly receive reviews, prices, and information.
The service has been scaled to support more than ten million products including
all books, DVDs, CDs, video games, and other product categories. We present
an outline of the algorithms used to power the service emphasizing scaling
and locality sensitive indexing. SnapTell's popular iPhone application has been
downloaded by over half a million users over the last few weeks.
The technology can also be used to make print advertisements in a magazine,
newspaper, poster, or billboard interactive without adding barcodes or otherwise
altering the content. Consumers snap a picture of the object and send it to a
shortcode to enter into a contest, access a mobile website, or download ringtones,
wallpaper, video clips, or other information. SnapTell is the leading vendor in
this space and offers a solution with unprecedented scale and robustness.
We present a synopsis of the image matching algorithms used to power the
service and discuss potential research topics in the space.
Content-Driven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)
Open collaboration is one of the most effective ways to create content
and aggregate information, as witnessed by the Wikipedia. In this
talk, we propose a reputation systems for the authors of collaborative
information, and a trust systems for the aggregated information.
The goal of the reputation system is to provide an incentive to
collaboration, and to provide information about the quality of
authors. We propose a content-driven notion of reputation, where
authors of long-lasting contributions gain reputation, while authors
of contributions that are not preserved lose reputation. We show
that, on the Wikipedia, the content-driven reputation of authors is a
good predictor of the quality of their future contributions.
We then describe a trust system for content, where the trust in a
portion of content is computed according to the reputations of the
authors who created and revised it. Again, we show that on the
Wikipedia content trust is a good predictor of future content
longevity. Finally, we describe how the reputation and trust systems
can be made robust to Sybil attacks, in which people use multiple
identities to try boost their reputation and raise the trust of
content of their choice.
The reputation and trust systems have been implemented in the
WikiTrust system/.
Bio:
--------------
Luca de Alfaro received his Ph.D. degree from Stanford University,
where he worked on system specification and verification under the
supervision of Zohar Manna. After graduation, he was for three years
at UC Berkeley as a postdoctoral researcher, working with Thomas A.
Henzinger on verification and on game theory. In 2001, Luca de Alfaro
joined the faculty of UC Santa Cruz. His research interests involve
system specification and verification, embedded software, game theory,
incentive systems for on-line collaboration, and trust and reputation
on the web. Luca de Alfaro is currently at Google as a visiting
scientist, on leave from UC Santa Cruz.
Luca de Alfaro has long been an enthusiastic user of on-line
collaborative tools, including wikis. One evening in 2005, after
working at the wiki he uses to facilitate information sharing among
members of his research group, he decided to start a wiki on cooking,
which is one of his passions. The wiki was soon peppered with spam
and less than useful contributions. Rather than cleaning up, a most
tedious job, Luca de Alfaro shut off the wiki temporarily, and he
started pondering how to provide incentives to collaboration, and
compute quality metrics for content. As he found the job of rating
authors or contributions by hand most tedious, he was interested in
methods for computing reputation and quality metrics purely from
content analysis. The research led to the WikiTrust system/.
The wiki on cooking has been
brought back to life (http://www.cookiwiki.org), even though it
receives little traffic. You are welcome to register, add recipes, and see
WikiTrust in action.
Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis
Web search queries capture the information need of search engine
users. Search engines store these queries in their logs and analyze
them to guide their search results.
In this work, we argue that not only a search engine can benefit from
data stored in these logs, but also the web users. We first show how
clickthrough logs can be collected in a distributed fashion using the
http referer field in web server access logs. We then perform a set of
experiments to study the information value of search engine queries
when treated as "tags" or "labels" for the web pages that both appear
as a result and the user actually clicks on. We ask how much extra
information these query tags provide for web pages by comparing them
to tags from the del.icio.us bookmarking site and to the pagetext. We
find that query tags can provide substantially many (on average 250
tags per URL), new tags (on average 125 tags per URL are not present
in the pagetext) for a large fraction of the Web.
Joint work with Hector Garcia-Molina and Jawed Karim
Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)
Baynote delivers on-demand recommendation technology and social search
for over 200 websites, including Motorola, NASA, Intuit, and Expedia.
By tapping into the wisdom of the online community and automatically
displaying the best content and products, Baynote's Collective
Intelligence Platform (CIP) improves the user experience while
increasing online revenue, leads and impressions. In this talk, I
will present Baynote's unified approach to content targeting and
discuss some of the opportunities and challenges in building
next-generation recommendation systems.
Bio:
--------------
Scott Brave is a co-founder and CTO at Baynote, Inc.
Prior to Baynote, he was a postdoctoral
scholar in the Department of Communication at Stanford University
where he served as lab manager for the CHIMe (Communication between
Humans and Interactive Media) Lab. Scott received his Ph.D. in
Human-Computer Interaction (Communication Dept.), and B.S. in Computer
Systems Engineering (AI focus) from Stanford University, and his M.S.
from the MIT Media Lab. Scott is also a former Associate Editor of
the "International Journal of Human-Computer Studies" and co-author of
Wired for speech: How voice activates and advances the
human-computer relationship.
Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)
Online social networks have become major and driving phenomena on the web. In
this talk we will address key modeling and algorithmic questions related to
large online social networks. From the modeling perspective, we raise the
question of whether there is a generative model for network evolution. The
availability of time-stamped data makes it possible to study this question at
an extremely fine granularity. We exhibit a simple, natural model that leads
to synthetic networks with properties similar to the online ones. From an
algorithmic viewpoint, we focus on data mining challenges posed by the
magnitude of data in these networks. In particular, we examine topics related
to influence and correlation in user activities in such networks.
Bio:
--------------
Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a
research staff member at the IBM Almaden Research Center in the Computer
Science Principles and Methodologies group. He obtained his PhD in Computer
Science from Cornell University in December 1997. His primary interests are
web algorithms, algorithms for large data sets, and theory of computation.
Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)
The concept of a community is central to social network analysis, and thus a
large body of work has been devoted to identifying community structure. For
example, a community may be thought of as a set of web pages on related
topics, a set of people who share common interests, or more generally as a
set of nodes in a network more similar amongst themselves than with the
remainder of the network. Motivated by difficulties we experienced at
actually finding meaningful communities in large real-world networks, we have
performed a large scale analysis of a wide range of social and information
networks. Our main methodology uses local spectral methods and involves
computing isoperimetric properties of the networks at various size scales --
a novel application of ideas from statistics and scientific computation to
internet data analysis. Our empirical results suggest a significantly more
refined picture of community structure than has been appreciated
previously. Our most striking finding is that in nearly every network dataset
we examined, we observe tight but almost trivial communities at very small
size scales, and at larger size scales, the best possible communities
gradually ``blend in'' with the rest of the network and thus become less
``community-like.'' This behavior is not explained, even at a qualitative
level, by any of the commonly-used network generation models. Moreover, this
behavior is exactly the opposite of what one would expect based on experience
with and intuition from expander graphs, from graphs that are well-embeddable
in a low-dimensional structure, and from small social networks that have
served as testbeds of community detection algorithms. Possible mechanisms for
reproducing our empirical observations will be discussed, as will
implications of these findings for clustering, classification, and more
general data analysis in modern large social and information networks.
Bio:
--------------
Michael Mahoney is currently a research scientist in the math department at
Stanford University. He is currently working on geometric network analysis;
developing approximate computation and regularization methods for large
informatics graphs; and applications to community detection, clustering, and
information dynamics in large social and information networks. His research
interests also include randomized algorithms for matrices and regression
problems, applications to DNA single nucleotide polymorphism data, and
large-scale statistical data analysis more generally. He has been a faculty
member at Yale University and a researcher at Yahoo.
An Incentive-Based Architecture for Social Recommendations
Kostas Kollias
Recommender systems and social networking services are important Internet
monetization tools and their popularity greatly facilitates web-based commercial
activity. In this talk we will focus on how these tools can be coupled in order
to form an architecture that incentivizes users to provide honest recommendations
in a social network. The architecture consists of one distinct reputation system
for each user, a recommendation language which allows users to place their
reviews, a product ranking algorithm, and a revenue sharing mechanism.
We will discuss the details of the architecture, the properties that allow users
to profit by placing honest recommendations and how fast these recommendations
lead to an equilibrium state.
Joint work with Rajat Bhattacharjee and Ashish Goel
Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan
Web search is an indispensable resource for users to search for information
on the Web. Web search is also an important service for publishers and
advertisers to present their content to users. Thus, user satisfaction is the
key and must be quantified. In this talk, our goal is to give a practical
review of web search metrics from a user satisfaction point of view. This
viewpoint is intuitive enough for a typical user to express his or her web
search experience using it. The metrics that we are planning to cover fall
into the following areas: relevance, coverage, comprehensiveness, diversity,
discovery freshness, content freshness, and presentation. We will also
describe how these metrics can be mapped to proxy metrics for the stages of a
generic search engine pipeline. Our hope is that practitioners can apply
these metrics readily and that researchers can get problems to work on,
especially in formalizing and refining metrics.
This short talk is based on a half-day tutorial that the
presenter with Kostas Tsioutsiouliklis and Emre Velipasaoglu will present at
the WWW'09 conference.
Bio:
--------------
Ali Dasdan is a director managing the metrics, analysis, and algorithms group
of Yahoo! Web Search. His group is responsible for defining and implementing
white-box metrics for production web search systems, production web search
data, and the Web. His group is also responsible for monitoring the metrics
and diagnosing the causes for anomalies detected. His research interests are
in web search and advertising since 2006. He obtained his PhD in Computer
Science from the University of Illinois at Urbana-Champaign in 1999.
Trusts among Users of Online Communities -- an Epinions Case Study
Hady Lauw
Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In this paper, we address the problem
of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very
sparse which is often the case in an online community. In this paper, we therefore propose a classification approach to address the trust prediction problem. We develop a taxonomy to obtain an extensive set of relevant features derived from user attributes and user
interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical
results show that the trust among users can be effectively predicted using pre-trained classifiers.
Bio
Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSF-equivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.Top-k Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii
There has been considerable past work on efficiently computing top k
objects by aggregating information from multiple ranked lists of these objects.
An important instance of this problem is query processing in search engines:
one has to combine information from several different posting lists
(rankings) of web pages (objects) to obtain the top k web pages to answer
user queries. Two particularly well-studied approaches to achieve
efficiency in top-k aggregation include early-termination algorithms
(e.g., TA and NRA) and pre-aggregation of some of the input lists.
However, there has been little work on a rigorous treatment of
combining these approaches.
We generalize the TA and NRA algorithms to the case when
pre-aggregated intersection lists are available in addition to the
original lists. We show that our versions of TA and NRA continue to
remain ``instance optimal,'' a very strong optimality notion that is a
highlight of the original TA and NRA algorithms. Using an index of
millions of web pages and real-world search engine queries, we
empirically characterize the performance gains offered by our new
algorithms. We show that the practical benefits of intersection lists
can be fully realized only with an early-termination algorithm.
Joint work with Ravi Kumar, Kunal Punera and Torsten Suel
Utility-Maximizing Privacy Mechanisms
Mukund Sundararajan
A mechanism for releasing information about a statistical database
with sensitive data must resolve a trade-off between utility and
privacy. Informally, publishing fully accurate information maximizes utility
while minimizing privacy, while publishing random noise accomplishes
the opposite. Formally, privacy can be rigorously quantified using the
framework of {\em differential privacy}, which requires that a
mechanism's output distribution is nearly the same (in a strong sense)
whether or not a given database row is included or excluded. Thus
far, (dis)utility has typically been quantified via the expected error
of a (randomly perturbed) response to a query.
We pursue much stronger and more general utility
guarantees. Just as differential privacy guarantees protection
against every potential attacker, independent of its side information,
we seek a mechanism that guarantees near-optimal utility to every
potential user, independent of its side information. Formally,
we model the side information of a potential user as a prior
distribution over query results. An interaction between such a user
and a mechanism induces a posterior distribution -- the utility of the
mechanism for this user is defined as the accuracy of this
posterior as quantified via a user-specific loss function. A
differentially private mechanism $M$ is (near-)optimal for a given
user $u$ if $u$ derives (almost) as much utility from $M$ as from any
other differentially private mechanism.
We prove that for each fixed counting query and differential privacy
level, there is a {\em geometric mechanism} $M^*$ --- a discrete
variant of the well-studied Laplace mechanism --- that is {\em
simultaneously utility-maximizing} for every possible user, subject to
the differential privacy constraint. This is an extremely strong
utility guarantee: {\em every} potential user $u$, no matter what
its side information and loss function, derives as much utility
from $M^*$ as from interacting with a differentially private
mechanism $M_u$ that is optimally tailored to $u$. The first part of
our proof characterizes the utility-maximizing differentially private
mechanism for a fixed but arbitrary user in terms of a basic feasible
solution to a linear program with constraints that encode differential
privacy. The second part shows that all of the relevant vertices of
this polytope (ranging over all
possible users) are derivable from the geometric mechanism via
suitable remappings of its range.
Joint work with Arpita Ghosh and Tim Roughgarden.
Online Advertising
Hamid Nazerzadeh
Online advertising is a multi-billion dollar growing market with
very interesting properties. I briefly explain some of these
properties, and talk about ad allocation problems in the context of
sponsored search and display advertising.
References:
- Dynamic Cost-Per-Action Mechanisms and Applications to Online
Advertising, with Amin Saberi, and Rakesh Vohra.
Proceedings of the 17th International World Wide Web Conference (WWW), 179-188, 2008. - A Combinatorial Allocation Mechanism with Penalties For Banner
Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
Proceedings of the 17th International World Wide Web Conference (WWW), 169-178, 2008. - Advertisement Allocation for Generalized Second Pricing Schemes,
with Ashish Goel, Mohammad Mahdian, and Amin Saberi
Fourth Workshop on Ad Auctions, 2008 - Allocating Online Advertisement Space with Unreliable Estimates,
with Mohammad Mahdian and Amin Saberi.
Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288-294, 2007.
From Bayesian to Worst-Case Optimal Auction Design
Tim Roughgarden
We define a general template for auction design that explicitly
connects Bayesian optimal mechanism design, the dominant paradigm in
economics, with worst-case analysis. In particular, we establish a
general and principled way to identify appropriate performance
benchmarks for prior-free optimal auction design. This framework
enables, for the first time, meaningful competitive analysis for
auction problems with allocation asymmetries, bidder asymmetries, and
costly payments.
The above is joint work with Jason Hartline (from STOC 2008 and
ongoing). Time permitting, we will also briefly discuss another recent
application of Bayesian optimal auction design: rigorous
efficiency-revenue approximation trade-offs (joint work with Shaddin
Dughmi and Mukund Sundararajan).
Yury Lifshits, Caltech, http://yury.name
Social Design is an art and science of setting and managing incentives in
web applications. So far, machine learning is the dominant approach to all
central algorithmic problems in the Web: search, advertising, recommendation
systems. Social design can become an important counterpart to ML, if we find
the ways to incentivise users to give us more and better input data and even
the answers themselves.
This talk is an outline of a new research field of Social Design. We start
with several stories including Internet press conference of president Putin,
reputation crash in largest Russian social network, and Yahoo! SearchMonkey
project. Then we attempt to extract, characterize and classify the commonly
used patterns in social design. Finaly, we set up the research agenda in the
area with focus on social aspects of semantic web and advertising
technologies.
Slides available at http://yury.name/talks.html
Bio: Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics
at St.Petersburg (2007). He is currently a postdoc at Caltech. He won two
gold medals at International Mathematical Olympiads, received the Best Paper
Award in Application Track of CSR'07 and the Yandex Teaching Award (2006)
for his course "Algorithms for Internet". Yury is a maintainer of "The
Homepage of Nearest Neighbors": http://simsearch.yury.name
Inferring Links and Initiators
Evimaria Terzi,IBM
Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value
of 1 (or 0) in cell (i; j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix
we study the problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators.
I will formally define this problem and propose an MCMC framework for estimating the links and the initiators given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect; instead of considering a single
observation matrix M we consider a sequence of observation matrices M1; : : : ;Mt over
time. Finally I will show that this problem is a generalization of several
problems studied in the field of social-network analysis.
Bio: Evimaria Terzi is a Research Staff Member at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue
University (USA) in 2002
and BSc from the University of Thessaloniki (Greece) in 2000.
Reputation Mechanisms for Online Markets
Christina Aperjis
In online marketplaces potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller?s reputation score.
Two-Stage Myopic Dynamics in Network Formation Games
Esteban Arcaute
We consider two-stage myopic dynamics for network formation
games, where utility to a node is based on a function of
the distance to all other nodes. Since our network formation game
is a generalization of Myerson's announcement game, we use
pairwise Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is
unbounded (in the number of nodes in the network), our dynamics
converge to a subset of pairwise Nash stable networks with a constant
efficiency ratio. For some specific utility functions, we prove that
our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.
Measuring and Modeling the Web
Ying Xu
The last couple of decades have witnessed a phenomenal growth in
the World Wide Web. The Web has become an ubiquitous channel
for information sharing and dissemination. This talk describes several
research contributions in an endeavor towards a better understanding
of the Web.
One of the basic questions about the Web is its size, i.e., how many
webpages there are on the Web. A closely related problem is the sizes
of search engine indexes, as search engine index defines an important
subset of the Web that is most easily accessible to users. In the last
ten years both the scientific literature and the popular press dealt at
length with methodologies and estimates for the sizes of the Web and
public search engines. In the first part of the talk, we propose
algorithms for measuring the Web size, including both empirical and
theoretical results.
The Web is so large that even estimating its size is a hard problem.
To better understand the Web, people study random web graph models
-- a stochastic process that generates random graphs that with high
probability resemble the Web graph. Random web graph models are
useful theoretical tools to provide insights to the evolution of the Web
graph, to predict future behaviors, and to generate synthetic data sets
for testing and simulation. In the second half of the talk, we briefly
present several results in analyzing web graph models and applying the
models to real world applications.
Should ad networks bother fighting click fraud?
Bobji Mungamuru
Suppose an ad network decides that a given click-through is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that click-through. Therefore, arguably, the ad network is "taking money out of his own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".
Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim to provide such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.
Optimal Marketing Strategies over Social Networks
Mukund Sundararajan
Abstract: We study revenue maximization in social networks. We identify a
family of strategies called 'influence and exploit' strategies that are easy
to implement, easy to optimize over and approximately optimal.
Based on a paper that will appear in WWW 08. Jointly with Jason Hartline
(Northwestern University) and Vahab Mirrokni (MSR Redmond).i
Ratio Index for Budgeted Learning, with Applications
Brad Null
In the budgeted learning problem, we are allowed to experiment on a set of
alternatives (given a fixed experimentation budget) with the goal of picking
a single alternative with the largest possible expected payoff.
Approximation algorithms for this problem were developed by Guha and
Munagala by rounding a linear program that couples the various alternatives
together. We present an index for this problem, which we call the ratio
index, which also guarantees a constant factor approximation. Index based
policies have the advantage that a single number (i.e. the index) can be
computed for each alternative irrespective of all other alternatives, and
the alternative with the highest index is experimented upon. This is
analogous to the famous Gittins index for the discounted multi-armed bandit
problem.
The ratio index has several interesting structural properties. First, it can
be computed in strongly polynomial time; the techniques we develop may be
useful for deriving strongly polynomial algorithms for other Markov Decision
Problems. Second, with the appropriate discount factor, the Gittins index
and the ratio index are constant factor approximations of each other, and
hence the Gittins index also gives a constant factor approximation to the
budgeted learning problem. Finally, the ratio index can be used to obtain a
"discount oblivious" solution to the multi-armed bandit problem, which in
turn gives constant factor approximations to a wide class of problems that
we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna
Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.
I'll talk about a number of computational problems that arise in online
advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one
manage a portfolio of millions of keywords with minimal human involvement?
Challenges include determining how much to bid for each keyword and what
ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will
lead to the greatest number of purchases or clicks? The problem is made
more difficult by the fact that different classes of users respond well to
different marketing messages, so it's best to use different content for
different user segments.
* Consider an auction where the items to be sold are not known in advance,
but rather arrive online, and buyers are willing to pay different amount
for each item (revealed as the items arrive). Each buyer has a maximum
limit on the total number of items that they will buy. How should the
seller assign items to buyers so as to maximize the seller's revenue?
An Axiomatic Approach to Personalized Ranking Systems
Alon Altman
Personalized ranking systems and trust systems are an essential tool for collaboration in a multi-agent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of well-known and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.
Ad Auctions with Reserved Price
Arash Asadpour
I will talk about the Ad Auctions when the auctioneer enjoys a relevant ad
that she can show in one of the slots if she wants to. The setting is to
some extent similar to the Ad Auctions with reserved price.
I will explain a randomized auction which is truthful in expectation for
both the bidders and the auctioneer. Also our mechanism is individual
rational and budget balanced. I will discus the revenue of this auction and
also will talk about some other settings in which the technique used to
design the auction can be helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham
(CMU).
Boosting the area under ROC curve
Phil Long, Google
We show that any weak ranker that can achieve an area under the ROC curve
slightly better than 1/2 (which can be achieved by random guessing) can be
efficiently boosted to achieve an area under the ROC curve arbitrarily close
to 1. This boosting can be performed even in the presence of independent
misclassification noise, given access to a noise-tolerant weak ranker.
(This is joint work with Rocco Servedio.)
Random input models for Adwords
Aranyak Mehta, Google
I will talk about a new result on Random Input models for the Adwords Problem: What are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst case input. How does the natural "highest-bidder-wins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for Ad auctions.
Who do you know: The Social Network Revolution
Orkut Buyukkokten, Google
Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence in our lives. Now it's easier to cross paths than ever as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of orkut.com, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users and reflect on the lessons we learned.
Deterministic Decentralized Search in Random Graphs
Esteban Arcaute
We study a general framework for decentralized search in
random graphs. Our main focus is on deterministic memoryless
search algorithms that use only local information to reach their
destination in a bounded number of steps in expectation. This class
includes (with small modifications) the search algorithms used in
Kleinberg's pioneering work on long-range percolation graphs and
hierarchical network models. We give a characterization of searchable
graphs in this model, and use this characterization to prove a
monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David Liben-Nowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying Xu.
Interdomain Routing and Games
Michael Schapira,
Hebrew University of Jerusalem ( http://www.cs.huji.ac.il/~mikesch/)
We present a game-theoretic model that captures many of the intricacies of
interdomain routing in today's Internet. In this model, the strategic agents are source-nodes located on a network, who aim to send traffic to a unique destination node . The model enables complex dynamic interactions between the agents (asynchronous, sequential,
and based on partial-information).
We provide realistic conditions that guarantee that executing the Border gateway Protocol (BGP), the only protocol currently used for interdomain routing, is in the best interest of each of the players. Moreover, we show that even coalitions of players of any size
cannot improve their routing outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design, these results do not require any monetary transfers (to or by the agents).
Based on joint work with Hagay Levin and Aviv Zohar.
The Anatomy of Clickbot.A
Neil Daswani,
Google
In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet that attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using a HTTP-based botmaster. Google identified all clicks on its ads exhibiting Clickbot.A-like patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.
Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas
We will first go over a few network worms and contemporary algorithms to detect these network worms. We will then go over some recent techniques to detect these worms using "vunlerability signatures". We will then use this to review two important problems in data stream computation -- distinct value estimation and frequent element computation. We show how these algorithms can be used for automatic worm fingerprinting.
Click Fraud Resistant Methods for Learning Click-Through Rates
Sergei Vassilvitskii
Presenting the paper "Click Fraud Resistant Methods for Learning Click-Through Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.
Pricing Partially Compatible Products
Adam Meyerson, UCLA
Suppose two competing companies provide roughly equivalent
suites of software products. Each version of a product has a possibly
distinct price and quality. Customers would like to purchase a low-
price, high-quality set of products. The issue is that products made
by different companies need not be fully compatible. We reduce the
customer's decision process to a minimum cut, and provide
approximation and hardness results for the problem of computing the
company's best-response strategy in several different models. We also
introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic
Commerce (EC) 2007, and are joint work with David Kempe, Nainesh
Solanki, and Ramnath Chellappa.
Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar, Google
Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptible for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
Detecting Near-Duplicates for Web Crawling
Anish Das Sarma
This is joint work with Gurmeet Singh Manku and Arvind Jain in Google.
Near-duplicate web documents are abundant. Two such documents
differ from each other in a very small portion that displays
advertisements, for example. Such differences are irrelevant for
web search. So the quality of a web crawler increases if it can
assess whether a newly crawled web page is a near-duplicate of a
previously crawled web page or not.
In my talk I will address the problem of efficiently detecting
near-duplicates in the web crawl setting. First I shall review
Charikar's fingerprinting technique used in our approach. Then, I
shall formally define the problem and present algorithmic solutions
for two versions of the problem: online queries (single new
document) and batch queries (multiple new documents). Finally, I
shall briefly talk about our experimental analysis of the techniques.
Clickthrough of and Bidding on keywords
Aleksandra Korolova / Dilys Thomas
Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".
Multi-path Routing
Mihaela Enachescu
I will present an approach for enabling multi-path routing in large
networks. This is based on a project at NEC Labs, and is joint work
with Ravi Kokku. Our approach is to create multiple paths between
various source-destination pairs by selecting a special subset of
relay nodes. We give hardness results and provide some approximation
algorithms.
This work is motivated by the observation that given two end-host in
the Internet there exist potentially many independent or partially
overlapping paths between them. However, only one of these is usually
used by the underlay routing, leading to many inefficiencies (the
single path routings are usually slow to adapt in the presence of
fluctuating network conditions). Multi-path routing provides a more
adaptive framework, and it also raises a variety of (still open)
problems for our enjoyment.
Collaborative tagging systems
Paul Heymann
Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keyword-based systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.
Is Efficiency Expensive?
Mukund Sundararajan
We study the simultaneous optimization of efficiency and revenue in
pay-per-click keyword auctions in a Bayesian setting. Our main
result is
that the efficient keyword auction yields near-optimal revenue
even under
modest competition. In the process, we build on classical results in
auction theory to prove that increasing the number of bidders by the
number
of slots outweighs the benefit of employing the optimal reserve
price.
This is joint work with Tim Roughgarden.
Estimating Search Engine Index and Web Sizes
Ying Xu
I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".
Reusable and multiple goods
Ashish Goel
The talk is based on
Online auctions with re-usable goods
by Mohammad Hajiaghayi, Robert Kleinberg, Mohammad Mahdian, and David
Parkes, EC'05
The Sequential Auction Problem on eBay: An Empirical Analysis and a
Solution by Adam I. Juda and David C. Parkes, EC'06
Mechanism design based on Cost-Per-Acquisition (CPA)
Hamid Nazerzadeh
The Internet advertising has changed dramatically by moving from Cost-Per-Impression charging mechanism to Cost-Per-Click (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and some of our results about this problem.
Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert
We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.) We then look at the one-shot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the full-information stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal. The distributional assumptions under which our theorems hold are quite general, and arguably hold in all real-world situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.

