Archive of RAIN talks
Spring 08-09
| Date | Speaker | Topic |
|---|---|---|
| Apr 3 | Mayur Datar |
Internet search advertising: Challenges in targeting and quality considerations |
| Apr 10 | - |
No meeting this week |
| Apr 17 | D Sivakumar |
Affiliation Networks |
| Apr 24 | - |
No meeting -- WWW 2009 |
| May 1 | - |
No meeting -- BAGT |
| May 8 | Sandeep Pandey Yahoo! Research |
Using Bandit Models in Search Engines |
| May 14 | Richard Chatwin VP Research, Adchemy |
Research Challenges in Online Advertising |
| May 22 | Robert Johnson |
Scaling Social Data |
| May 29 | Yury Lifshits Yahoo! Research |
Data Cloud: Integrating Structured Information into Web Search |
| June 5 | Robbie Yan | Manipulation Robustness of Collaborative Filtering Systems |
Winter 08-09
| Date | Speaker | Topic |
|---|---|---|
| Jan 16 | - | Organizational meeting |
| Jan 23 | G.D. Ramkumar SnapTell, Inc. |
Scalable image recognition algorithms for camera phone applications |
| Jan 30 | Luca de Alfaro | Content-Driven Reputation and Trust for Collaborative Content |
| Feb 6 | Ioannis Antonellis | Tagging with Queries: How and Why? (To appear in WSDM 2009) |
| Feb 13 | Scott Brave Baynote, Inc. |
Content Targeting and Recommendation Systems |
| Feb 20 | Ravi Kumar | Online Social Networks: Modeling and Mining |
| Feb 27 | Michael Mahoney | Community Structure in Large Social and Information Networks |
| Mar 6 | Kostas Kollias | An Incentive-Based Architecture for Social Recommendations |
| Mar 13 | Ali Dasdan | Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction |
Autumn 08-09
| Date | Speaker | Topic |
|---|---|---|
| Oct 3 | - | Talk by Jure Leskovec and organizational meeting |
| Oct 10 | Hady Lauw | Trusts among Users of Online Communities -- an Epinions Case Study |
| Oct 17 | Canceled | - |
| Oct 24 | Bahman Bahmani | Online Budgeted Matching in Random Input Models with Applications to Adwords |
| Oct 31 | Sergei Vassilvitskii | Top-k Aggregation Using Intersection of Ranked Inputs |
| Nov 7 | - | No meeting - Bay Area Theory Seminar (BATS) |
| Nov 14 | Mukund Sundararajan | Utility-Maximizing Privacy Mechanisms |
| Nov 21 | Hamid Nazerzadeh | Online advertising |
| Nov 28 | - | No meeting - Happy Thanksgiving! |
| Dec 5 | Tim Roughgarden | From Bayesian to Worst-Case Optimal Auction Design |
Spring 07-08
| Date | Speaker | Topic |
|---|---|---|
| Apr 11 | BAGT | |
| Apr 18 | Robbie Yan | Reputation Markets |
| Apr 25 | Cancelled for WWW | |
| May 2 | Bobji Mungamuru | Click Fraud |
| May 9 | Ying Xu | Oral Examination |
| May 16 | Esteban Arcaute | Network Formation Games |
| Mar 23 | Christina Aperjis | Reputation Systems |
| May 30 | Evimaria Terzi | Inferring Links and Initiators |
| June 6 | Yury Lifshits | Social Design |
Winter 07-08
| Date | Speaker | Topic |
|---|---|---|
| Jan 25 | Alon Altman | An Axiomatic Approach to Personalized Ranking Systems |
| Feb 1 | Ashish Goel | Open Problems |
| Feb 8 | Cancelled | |
| Feb 15 | Brian Babcock | Adchemy |
| Feb 22 | Rajat Bhattacharjee | Oral Examination |
| Feb 29 | Panagiotis Papadimitriou | Web Graph Similarity for Anomaly Detection |
| Mar 7 | Brad Null | Ratio Index for Budgeted Learning, with Applications |
| Mar 14 | Mukund Sundararajan | Optimal Marketing Strategies over Social Networks |
Autumn 07-08
| Date | Speaker | Topic |
|---|---|---|
| Oct 15 | Michael Schapira | Interdomain Routing and Games |
| Oct 22 | Esteban Arcaute | Decentralized Search of Random Graphs |
| Oct 29 | Orkut Buyukkokten | Social Network Revolution |
| Nov 5 | Aranyak Mehta | Random input models for Adwords |
| Nov 12 | Phil Long | Boosting the area under the ROC curve |
| Nov 19 | Thanks-giving | |
| Nov 26 | cancelled | |
| Dec 3 | Arash Asadpour | Ad auctions with reserved price |
Spring 06-07
| Date | Speaker | Topic |
|---|---|---|
| Apr 13 | Ying Xu | a discussion on WWW'07 papers |
| Apr 20 | BAGT | |
| Apr 27 | Anish Das Sarma | Detecting Near-Duplicates for Web Crawling |
| May 4 | Mayur Datar | Google News Personalization |
| May 11 | Adam Meyerson | Pricing Partially Compatible Products |
| May 18 | Sergei Vassilvitskii | Click Fraud Resistant Methods for Learning Click-Through Rates |
| May 25 | Dilys Thomas | Streaming Algorithms and Worm Fingerprinting |
| June 1 | Neil Daswani | Anatomy of Clickbot.A |
Winter 06-07
| Date | Speaker | Topic |
|---|---|---|
| Jan 26 | Nicolas Lambert | Asymptotically Optimal Repeated Auctions |
| Feb 2 | Hamid Nazerzadeh | Mechanism design based on CPA |
| Feb 9 | Ashish Goel | Reusable and multiple goods |
| Feb 16 | Ying Xu | Estimating Search Engine Index and Web size |
| Feb 23 | Mukund Sundararajan | Is efficiency expensive? |
| March 2 | Paul Heymann | Collaborative tagging systems |
| March 9 | Mihaela Enachescu | Multi-path Routing |
| March 16 | Aleksandra Korolova / Dilys Thomas | Clickthrough of and Bidding on keywords |
Autumn 06-07
| Date | Speaker | Topic |
|---|---|---|
| Oct 25 | Ying Xu | The Netflix Challenge |
| Nov 1 | Ashish/Alexandra | Social Networks |
| Nov 8 | Arash/Shubha | PageRank |
| Nov 15 | TBA | TBA |
| Nov 22 | - | Netflix follow-up |
| Nov 29 | Zoltan Gyongyi | TBA |
| Dec 6 | Sreenivas Gollapudi | TBA |
Spring 05-06
| Date | Speaker | Topic |
|---|---|---|
| Apr 7 | Ashish Goel | Axiomatic PageRank |
| Apr 14 | - | Organizational Meeting |
| Apr 21 | Sergei Vassilvitskii | Relevance Feedback and Web Search |
| Apr 28 | Hamid Nazerzadeh | AdWords with Unreliable Estimates |
| May 5 | Mihaela Enachescu | Compact Routing |
| May 12 | Ying Xu | Streaming Algorithms for Graph Problems |
| May 19 | - | - |
| May 26 | Udi Manber | New Problems in Search |
| Jun 2 | Rajat Bhattacharjee | Incentive Based Ranking Mechanisms |
Winter 05-06
| Date | Speaker | Topic |
|---|---|---|
| Jan 20 | - | Organizational Meeting |
| Jan 27 | Brad Null | Multiarmed Bandit Problem |
| Feb 3 | Edith Cohen | Spatially-Decaying Aggregation |
| Feb 10 | Andrew Tomkins | Routing and Growth Models for Social Networks |
| Feb 17 | - | BAGT |
| Feb 24 | Amin Saberi | Open Problems |
| Mar 3 | Mayur Datar, Ashutosh Garg | Recommendation Systems |
| Mar 10 | Amruta Joshi | Googlebase |
Fall 05-06
| Date | Speaker | Topic |
|---|---|---|
| Oct 7 | Roee Engelberg | Equilibria in Online Games |
| Oct 14 | Ying Xu, Sergei Vassilvitskii | Models for the Web Graph |
| Oct 21 | Gagan Aggarwal | AdWords |
| Oct 28 | Omar Benjelloun | SERF |
| Nov 4 | Zoltan Gyongyi | Link Spam |
| Nov 11 | Mihaela Enachescu | Query Incentive Networks |
| Nov 18 | - | BATS |
| Dec 2 | Rajat Bhattacharjee | Survey of Recommendation Systems |
Spring 04-05
| Date | Speaker | Topic |
|---|---|---|
| Apr 4 | Shubha Nabar | Impact of Search Engines on Evolution of Page Popularity |
| Apr 11 | Svetlozar Nestorov | Mining Structure from Semistructured Data |
| Apr 18 | Sepandar Kamvar | Reputation Management in P2P Networks |
| Apr 25 | Rina Panigrahy | PageRank Axioms |
| May 2 | Amin Saberi | Spread of Viruses on the Internet |
| May 9 | Ramesh Johari | Stable Policy Routing with Provider Independence |
| May 16 | Rajat Bhattacharjee | Reputation Systems |
| May 23 | Mehran Sahami | Matching Text on the Web |
| Jun 6 | Amruta Joshi | Modeling Documents |
Winter 04-05
| Date | Speaker | Topic |
|---|---|---|
| Feb 2 | Ying Xu | Recommendation Systems/Collaborative Filtering |
| Feb 9 | Ashish Goel | Spam and Page Rank |
| Feb 16 | Gagan Aggarwal | AdWords |
| Feb 23 | David Arthur | Gossip and File Sharing |
| Mar 2 | Anil Kamath | AdWords |
| Mar 10 | Sergei Vassilvitskii | |
| Mar 16 | Dilys Thomas | Social Networks |
Abstracts
Internet search advertising: Challenges in targeting and quality considerationsMayur Datar
In this informal talk, I will touch upon two topics related to search
advertising:
Targeting: How do you match advertiser keywords to user queries? What are the
underlying challenges in bridging this semantic gap?
How do you measure the quality of search ads? Is it in the best interest of
all parties involved, users, advertisers and the search engine, to always
show ads for queries? What are the costs and benefits to showing ads and how
do you decide on when is it appropriate to show ads?
Look forward to more questions than answers! Hopefully these will lead to
good research topics.
Bio:
--------------
Mayur Datar works as a Research Scientist with Google Inc. His research
interests are in datamining, algorithms, databases and computer science
theory. Prior to joining Google, Mayur obtained his doctorate degree in
computer science from Stanford university and a Bachelor of Technology degree
from I.I.T. Bombay. He was awarded the President of India, Gold Medal for
being the most outstanding student of his graduating batch from
I.I.T. Bombay. He has published several papers in renowned conferences like
SIGMOD, VLDB, KDD, FOCS, SODA, WWW.
Affiliation Networks
D. Sivakumar
In the last decade, structural properties of several naturally arising
networks (the Internet, socialnetworks, the web graph, etc.) have been
studied intensively with a view to understanding their evolution. In recent
empirical work, Leskovec, Kleinberg, and Faloutsos identify two new and
surprising properties of the evolution of many real-world networks:
densification (the ratio of edges to vertices grows over time), and shrinking
diameter (the diameter reduces over time to a constant). These properties run
counter to conventional wisdom, and are certainly inconsistent with graph
models prior to their work.
In this work, we present the first model that provides a simple, realistic,
and mathematically tractable generative model that intrinsically explains all
the well-known properties of the social networks, as well as densification
and shrinking diameter. Our model is based on ideas studied empirically in
the social sciences, primarily on the groundbreaking work of Breiger (1973)
on bipartite models of social networksthat capture the affiliation of agents
to societies.
We also present algorithms that harness the structural consequences of our
model. Specifically, we show how to overcome the bottleneck of densification
in computing shortest paths between vertices by producing sparse subgraphs
that preserve or approximate shortest distances to all or a distinguished
subset of vertices. This is a rare example of an algorithmic benefit derived
from a realistic graph model.
The talk will review the history of relevant random graph models, and attempt
to illustrate the co-evolution of theoretical modeling and empirical insights
in this fascinating slice of graph theory.
The talk reports work done jointly with Silvio Lattanzi of Sapienza
University, Roma, Italy, and is to appear in STOC 2009.
Using Bandit Models in Search Engines
Sandeep Pandey
Web search engines perform two basic tasks: they acquire Web pages via
crawling, and present lists of pages and advertisements in response to
user search queries. Given the gigantic size of the Web these tasks
can be viewed as constrained optimization problems. On the acquisition
side the constraint is on the crawling resources, while
on the presentation side, the constraint is on user attention: users
cannot be expected to sift through large amounts of content.
Furthermore, some parameters needed to solve these optimization
problems may or may not be known leading to the offline and online versions of
the problems. In this talk we will look at these optimization problems.
In particular, we will focus on the online presentation of advertisements where
the unknown parameters are the ad quality values. I will present a method of
simultaneously estimating unknown parameters and exploiting current estimates.
Our method extends and adapts well-known "Bandit" models from Machine
Learning.
Bio:
--------------
Sandeep Pandey is a Research Scientist in Search and Advertising group at
Yahoo! Research. Before joining Yahoo!, he received his Ph.D. from Carnegie Mellon
and his B.Tech. from IIT Bombay. His research interests include Web crawling and
advertising. He also likes to work on online learning problems such as the multi-armed
bandit problem. In his spare time Sandeep enjoys watching cricket and playing
tennis.
Research Challenges in Online Advertising
Richard Chatwin
VP Research, Adchemy
Adchemy is leading a revolution in online performance marketing, both as a
top-25 U.S. online advertiser and through the development of the Adchemy
Digital Marketing Platform, an integrated suite of products that allows
online marketers to maximize their customers' end-to-end
experience. I will discuss some of the research challenges that we are
tackling in developing our technology solutions, including problems related
to online matching, multi-armed bandits, and the optimizer's curse.
Bio:
--------------
Richard has been VP Research at Adchemy since its inception in late 2004. He
has over nine years of experience in the online advertising industry. He
received his PhD from the Operations Research department at Stanford, where
he focused on airline revenue management. He spent several years in the
consulting world pioneering the development and implementation of real option
valuation methodologies for asset valuation.
Scaling Social Data
Robert Johnson
Director of Engineering, Facebook
Facebook serves billions of pages a day based on hundreds of terabytes of
dynamic data, which presents many scaling challenges for our storage and
caching systems. In particular the social nature of the data gives it some
interesting properties - it is not easily clustered or partitioned by users
or applications, it updates frequently, and many queries require massive
network fan-in. I'll be discussing the problems that arise in scaling such a
system and some of the solutions we've used to get to our current scale, as
well as our plans for infrastructure to solve these problems at even larger
scale and in a more general way for future web 2.0 applications with similar
properties.
Bio:
--------------
Robert Johnson is Director of Engineering at Facebook, where he leads the
software development efforts to cost-effectively scale Facebook's
infrastructure and optimize performance for its many millions of
users. During his time with the company, the number of users has expanded by
more than twenty-fold and Facebook now handles billions of page views a day.
Robert was previously at ActiveVideo Networks where he led the distributed
systems and set-top software development teams. He has worked in a wide
variety of engineering roles from robotics to embedded systems to web
software. He received a B.S. In Engineering and Applied Science from
Caltech.
Data Cloud: Integrating Structured Information into Web Search
Yury Lifshits
In this talk we address two questions: (1) How to use structured data in web
search? (2) How to gather structured data? For the first question we identify
valuable classes of data, present query classes that can benefit from
structured data and describe architecture that combines keyword search with
structured search. For the second question we present Data Cloud: An
ecosystem of data publishers, search engine (data cloud) and data consumers.
We show connection from Data Cloud strategy to classic notion in economics:
network effect in two-sided markets. At the end of the talk an early demo
implementation will be presented.
Bio:
--------------
Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics
at St.Petersburg (2007). He spent a year as a postdoc at Caltech before
joining Yahoo! Research in 2008. He won two gold medals at International
Mathematical Olympiads, received the Best Paper Award in Application Track of
CSR'07 and the Yandex Teaching Award (2006) for his course "Algorithms for
Internet". Yury is a maintainer of "The Homepage of Nearest Neighbors":
http://simsearch.yury.name
Manipulation Robustness of Collaborative Filtering Systems
Robbie Yan
While the expanding universe of products available via Internet
commerce provides consumers with valuable options, sifting through
the numerous alternatives to identify desirable choices can be
challenging. Collaborative filtering systems aid this process by
recommending to users products desired by similar individuals.
Because purchase decisions are influenced by these systems, however,
they have become targets of manipulation by unscrupulous vendors. In
this talk, we will discuss our findings on the robustness of
collaborative filtering systems to manipulation. In particular, we
provide theoretical and empirical results demonstrating that while
nearest neighbor algorithms, which are widely used in commercial
systems, can be highly susceptible to manipulation, two classes of
collaborative filtering algorithms which we refer to as linear and
asymptotically linear are relatively robust. In particular, we find
that as a user rates an increasing number of products, the average
accuracy of predictions made by these algorithms becomes insensitive
to manipulated data. Our results provide guidance for the design of
future collaborative filtering systems.
Scalable image recognition algorithms for camera phone applications
G.D. Ramkumar, SnapTell, Inc.
SnapTell has created technology that allows a mobile phone user to snap a
picture of a product and instantly receive reviews, prices, and information.
The service has been scaled to support more than ten million products including
all books, DVDs, CDs, video games, and other product categories. We present
an outline of the algorithms used to power the service emphasizing scaling
and locality sensitive indexing. SnapTell's popular iPhone application has been
downloaded by over half a million users over the last few weeks.
The technology can also be used to make print advertisements in a magazine,
newspaper, poster, or billboard interactive without adding barcodes or otherwise
altering the content. Consumers snap a picture of the object and send it to a
shortcode to enter into a contest, access a mobile website, or download ringtones,
wallpaper, video clips, or other information. SnapTell is the leading vendor in
this space and offers a solution with unprecedented scale and robustness.
We present a synopsis of the image matching algorithms used to power the
service and discuss potential research topics in the space.
Content-Driven Reputation and Trust for Collaborative Content
Luca de Alfaro (UCSC, visiting at Google)
Open collaboration is one of the most effective ways to create content
and aggregate information, as witnessed by the Wikipedia. In this
talk, we propose a reputation systems for the authors of collaborative
information, and a trust systems for the aggregated information.
The goal of the reputation system is to provide an incentive to
collaboration, and to provide information about the quality of
authors. We propose a content-driven notion of reputation, where
authors of long-lasting contributions gain reputation, while authors
of contributions that are not preserved lose reputation. We show
that, on the Wikipedia, the content-driven reputation of authors is a
good predictor of the quality of their future contributions.
We then describe a trust system for content, where the trust in a
portion of content is computed according to the reputations of the
authors who created and revised it. Again, we show that on the
Wikipedia content trust is a good predictor of future content
longevity. Finally, we describe how the reputation and trust systems
can be made robust to Sybil attacks, in which people use multiple
identities to try boost their reputation and raise the trust of
content of their choice.
The reputation and trust systems have been implemented in the
WikiTrust system/.
Bio:
--------------
Luca de Alfaro received his Ph.D. degree from Stanford University,
where he worked on system specification and verification under the
supervision of Zohar Manna. After graduation, he was for three years
at UC Berkeley as a postdoctoral researcher, working with Thomas A.
Henzinger on verification and on game theory. In 2001, Luca de Alfaro
joined the faculty of UC Santa Cruz. His research interests involve
system specification and verification, embedded software, game theory,
incentive systems for on-line collaboration, and trust and reputation
on the web. Luca de Alfaro is currently at Google as a visiting
scientist, on leave from UC Santa Cruz.
Luca de Alfaro has long been an enthusiastic user of on-line
collaborative tools, including wikis. One evening in 2005, after
working at the wiki he uses to facilitate information sharing among
members of his research group, he decided to start a wiki on cooking,
which is one of his passions. The wiki was soon peppered with spam
and less than useful contributions. Rather than cleaning up, a most
tedious job, Luca de Alfaro shut off the wiki temporarily, and he
started pondering how to provide incentives to collaboration, and
compute quality metrics for content. As he found the job of rating
authors or contributions by hand most tedious, he was interested in
methods for computing reputation and quality metrics purely from
content analysis. The research led to the WikiTrust system/.
The wiki on cooking has been
brought back to life (http://www.cookiwiki.org), even though it
receives little traffic. You are welcome to register, add recipes, and see
WikiTrust in action.
Tagging with Queries: How and Why? (To appear in WSDM 2009)
Ioannis Antonellis
Web search queries capture the information need of search engine
users. Search engines store these queries in their logs and analyze
them to guide their search results.
In this work, we argue that not only a search engine can benefit from
data stored in these logs, but also the web users. We first show how
clickthrough logs can be collected in a distributed fashion using the
http referer field in web server access logs. We then perform a set of
experiments to study the information value of search engine queries
when treated as "tags" or "labels" for the web pages that both appear
as a result and the user actually clicks on. We ask how much extra
information these query tags provide for web pages by comparing them
to tags from the del.icio.us bookmarking site and to the pagetext. We
find that query tags can provide substantially many (on average 250
tags per URL), new tags (on average 125 tags per URL are not present
in the pagetext) for a large fraction of the Web.
Joint work with Hector Garcia-Molina and Jawed Karim
Content Targeting and Recommendation Systems
Scott Brave (Baynote, Inc)
Baynote delivers on-demand recommendation technology and social search
for over 200 websites, including Motorola, NASA, Intuit, and Expedia.
By tapping into the wisdom of the online community and automatically
displaying the best content and products, Baynote's Collective
Intelligence Platform (CIP) improves the user experience while
increasing online revenue, leads and impressions. In this talk, I
will present Baynote's unified approach to content targeting and
discuss some of the opportunities and challenges in building
next-generation recommendation systems.
Bio:
--------------
Scott Brave is a co-founder and CTO at Baynote, Inc.
Prior to Baynote, he was a postdoctoral
scholar in the Department of Communication at Stanford University
where he served as lab manager for the CHIMe (Communication between
Humans and Interactive Media) Lab. Scott received his Ph.D. in
Human-Computer Interaction (Communication Dept.), and B.S. in Computer
Systems Engineering (AI focus) from Stanford University, and his M.S.
from the MIT Media Lab. Scott is also a former Associate Editor of
the "International Journal of Human-Computer Studies" and co-author of
Wired for speech: How voice activates and advances the
human-computer relationship.
Online Social Networks: Modeling and Mining
Ravi Kumar (Yahoo! Research)
Online social networks have become major and driving phenomena on the web. In
this talk we will address key modeling and algorithmic questions related to
large online social networks. From the modeling perspective, we raise the
question of whether there is a generative model for network evolution. The
availability of time-stamped data makes it possible to study this question at
an extremely fine granularity. We exhibit a simple, natural model that leads
to synthetic networks with properties similar to the online ones. From an
algorithmic viewpoint, we focus on data mining challenges posed by the
magnitude of data in these networks. In particular, we examine topics related
to influence and correlation in user activities in such networks.
Bio:
--------------
Ravi Kumar joined Yahoo! Research in July 2005. Prior to this, he was a
research staff member at the IBM Almaden Research Center in the Computer
Science Principles and Methodologies group. He obtained his PhD in Computer
Science from Cornell University in December 1997. His primary interests are
web algorithms, algorithms for large data sets, and theory of computation.
Community Structure in Large Social and Information Networks
Michael Mahoney (Stanford University)
The concept of a community is central to social network analysis, and thus a
large body of work has been devoted to identifying community structure. For
example, a community may be thought of as a set of web pages on related
topics, a set of people who share common interests, or more generally as a
set of nodes in a network more similar amongst themselves than with the
remainder of the network. Motivated by difficulties we experienced at
actually finding meaningful communities in large real-world networks, we have
performed a large scale analysis of a wide range of social and information
networks. Our main methodology uses local spectral methods and involves
computing isoperimetric properties of the networks at various size scales --
a novel application of ideas from statistics and scientific computation to
internet data analysis. Our empirical results suggest a significantly more
refined picture of community structure than has been appreciated
previously. Our most striking finding is that in nearly every network dataset
we examined, we observe tight but almost trivial communities at very small
size scales, and at larger size scales, the best possible communities
gradually ``blend in'' with the rest of the network and thus become less
``community-like.'' This behavior is not explained, even at a qualitative
level, by any of the commonly-used network generation models. Moreover, this
behavior is exactly the opposite of what one would expect based on experience
with and intuition from expander graphs, from graphs that are well-embeddable
in a low-dimensional structure, and from small social networks that have
served as testbeds of community detection algorithms. Possible mechanisms for
reproducing our empirical observations will be discussed, as will
implications of these findings for clustering, classification, and more
general data analysis in modern large social and information networks.
Bio:
--------------
Michael Mahoney is currently a research scientist in the math department at
Stanford University. He is currently working on geometric network analysis;
developing approximate computation and regularization methods for large
informatics graphs; and applications to community detection, clustering, and
information dynamics in large social and information networks. His research
interests also include randomized algorithms for matrices and regression
problems, applications to DNA single nucleotide polymorphism data, and
large-scale statistical data analysis more generally. He has been a faculty
member at Yale University and a researcher at Yahoo.
An Incentive-Based Architecture for Social Recommendations
Kostas Kollias
Recommender systems and social networking services are important Internet
monetization tools and their popularity greatly facilitates web-based commercial
activity. In this talk we will focus on how these tools can be coupled in order
to form an architecture that incentivizes users to provide honest recommendations
in a social network. The architecture consists of one distinct reputation system
for each user, a recommendation language which allows users to place their
reviews, a product ranking algorithm, and a revenue sharing mechanism.
We will discuss the details of the architecture, the properties that allow users
to profit by placing honest recommendations and how fast these recommendations
lead to an equilibrium state.
Joint work with Rajat Bhattacharjee and Ashish Goel
Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
Ali Dasdan
Web search is an indispensable resource for users to search for information
on the Web. Web search is also an important service for publishers and
advertisers to present their content to users. Thus, user satisfaction is the
key and must be quantified. In this talk, our goal is to give a practical
review of web search metrics from a user satisfaction point of view. This
viewpoint is intuitive enough for a typical user to express his or her web
search experience using it. The metrics that we are planning to cover fall
into the following areas: relevance, coverage, comprehensiveness, diversity,
discovery freshness, content freshness, and presentation. We will also
describe how these metrics can be mapped to proxy metrics for the stages of a
generic search engine pipeline. Our hope is that practitioners can apply
these metrics readily and that researchers can get problems to work on,
especially in formalizing and refining metrics.
This short talk is based on a half-day tutorial that the
presenter with Kostas Tsioutsiouliklis and Emre Velipasaoglu will present at
the WWW'09 conference.
Bio:
--------------
Ali Dasdan is a director managing the metrics, analysis, and algorithms group
of Yahoo! Web Search. His group is responsible for defining and implementing
white-box metrics for production web search systems, production web search
data, and the Web. His group is also responsible for monitoring the metrics
and diagnosing the causes for anomalies detected. His research interests are
in web search and advertising since 2006. He obtained his PhD in Computer
Science from the University of Illinois at Urbana-Champaign in 1999.
Trusts among Users of Online Communities -- an Epinions Case Study
Hady Lauw
Trust between a pair of users is an important piece of information for users in an online community (such as electronic commerce websites and product review websites) where users may rely on trust information to make decisions. In this paper, we address the problem of predicting whether a user trusts another user. Most prior work infers unknown trust ratings from known trust ratings. The effectiveness of this approach depends on the connectivity of the known web of trust and can be quite poor when the connectivity is very sparse which is often the case in an online community. In this paper, we therefore propose a classification approach to address the trust prediction problem. We develop a taxonomy to obtain an extensive set of relevant features derived from user attributes and user interactions in an online community. As a test case, we apply the approach to data collected from Epinions, a large product review community that supports various types of interactions as well as a web of trust that can be used for training and evaluation. Empirical results show that the trust among users can be effectively predicted using pre-trained classifiers.
Bio
Hady is currently a Visiting Researcher at Microsoft Search Labs. Prior to that, he earned his Ph.D. at Nanyang Technological University, Singapore under a fellowship funded by A*STAR (Singapore's NSF-equivalent). His research interests are in social networks and Web mining, covering such topics as rating behaviors, trust, and mass collaboration. His work has been published in various conferences and journals such as KDD, SDM, CIKM, WSDM, EC, and TKDE.Top-k Aggregation Using Intersection of Ranked Inputs
Sergei Vassilvitskii
There has been considerable past work on efficiently computing top k
objects by aggregating information from multiple ranked lists of these objects.
An important instance of this problem is query processing in search engines:
one has to combine information from several different posting lists
(rankings) of web pages (objects) to obtain the top k web pages to answer
user queries. Two particularly well-studied approaches to achieve
efficiency in top-k aggregation include early-termination algorithms
(e.g., TA and NRA) and pre-aggregation of some of the input lists.
However, there has been little work on a rigorous treatment of
combining these approaches.
We generalize the TA and NRA algorithms to the case when
pre-aggregated intersection lists are available in addition to the
original lists. We show that our versions of TA and NRA continue to
remain ``instance optimal,'' a very strong optimality notion that is a
highlight of the original TA and NRA algorithms. Using an index of
millions of web pages and real-world search engine queries, we
empirically characterize the performance gains offered by our new
algorithms. We show that the practical benefits of intersection lists
can be fully realized only with an early-termination algorithm.
Joint work with Ravi Kumar, Kunal Punera and Torsten Suel
Utility-Maximizing Privacy Mechanisms
Mukund Sundararajan
A mechanism for releasing information about a statistical database
with sensitive data must resolve a trade-off between utility and
privacy. Informally, publishing fully accurate information maximizes utility
while minimizing privacy, while publishing random noise accomplishes
the opposite. Formally, privacy can be rigorously quantified using the
framework of {\em differential privacy}, which requires that a
mechanism's output distribution is nearly the same (in a strong sense)
whether or not a given database row is included or excluded. Thus
far, (dis)utility has typically been quantified via the expected error
of a (randomly perturbed) response to a query.
We pursue much stronger and more general utility
guarantees. Just as differential privacy guarantees protection
against every potential attacker, independent of its side information,
we seek a mechanism that guarantees near-optimal utility to every
potential user, independent of its side information. Formally,
we model the side information of a potential user as a prior
distribution over query results. An interaction between such a user
and a mechanism induces a posterior distribution -- the utility of the
mechanism for this user is defined as the accuracy of this
posterior as quantified via a user-specific loss function. A
differentially private mechanism $M$ is (near-)optimal for a given
user $u$ if $u$ derives (almost) as much utility from $M$ as from any
other differentially private mechanism.
We prove that for each fixed counting query and differential privacy
level, there is a {\em geometric mechanism} $M^*$ --- a discrete
variant of the well-studied Laplace mechanism --- that is {\em
simultaneously utility-maximizing} for every possible user, subject to
the differential privacy constraint. This is an extremely strong
utility guarantee: {\em every} potential user $u$, no matter what
its side information and loss function, derives as much utility
from $M^*$ as from interacting with a differentially private
mechanism $M_u$ that is optimally tailored to $u$. The first part of
our proof characterizes the utility-maximizing differentially private
mechanism for a fixed but arbitrary user in terms of a basic feasible
solution to a linear program with constraints that encode differential
privacy. The second part shows that all of the relevant vertices of
this polytope (ranging over all
possible users) are derivable from the geometric mechanism via
suitable remappings of its range.
Joint work with Arpita Ghosh and Tim Roughgarden.
Online Advertising
Hamid Nazerzadeh
Online advertising is a multi-billion dollar growing market with
very interesting properties. I briefly explain some of these
properties, and talk about ad allocation problems in the context of
sponsored search and display advertising.
References:
- Dynamic Cost-Per-Action Mechanisms and Applications to Online
Advertising, with Amin Saberi, and Rakesh Vohra.
Proceedings of the 17th International World Wide Web Conference (WWW), 179-188, 2008. - A Combinatorial Allocation Mechanism with Penalties For Banner
Advertising, with Uriel Feige, Nicole Immorlica, and Vahab S. Mirrokni.
Proceedings of the 17th International World Wide Web Conference (WWW), 169-178, 2008. - Advertisement Allocation for Generalized Second Pricing Schemes,
with Ashish Goel, Mohammad Mahdian, and Amin Saberi
Fourth Workshop on Ad Auctions, 2008 - Allocating Online Advertisement Space with Unreliable Estimates,
with Mohammad Mahdian and Amin Saberi.
Proceedings of the 8th ACM Conference on Electronic Commerce (EC), 288-294, 2007.
From Bayesian to Worst-Case Optimal Auction Design
Tim Roughgarden
We define a general template for auction design that explicitly
connects Bayesian optimal mechanism design, the dominant paradigm in
economics, with worst-case analysis. In particular, we establish a
general and principled way to identify appropriate performance
benchmarks for prior-free optimal auction design. This framework
enables, for the first time, meaningful competitive analysis for
auction problems with allocation asymmetries, bidder asymmetries, and
costly payments.
The above is joint work with Jason Hartline (from STOC 2008 and
ongoing). Time permitting, we will also briefly discuss another recent
application of Bayesian optimal auction design: rigorous
efficiency-revenue approximation trade-offs (joint work with Shaddin
Dughmi and Mukund Sundararajan).
Yury Lifshits, Caltech, http://yury.name
Social Design is an art and science of setting and managing incentives in
web applications. So far, machine learning is the dominant approach to all
central algorithmic problems in the Web: search, advertising, recommendation
systems. Social design can become an important counterpart to ML, if we find
the ways to incentivise users to give us more and better input data and even
the answers themselves.
This talk is an outline of a new research field of Social Design. We start
with several stories including Internet press conference of president Putin,
reputation crash in largest Russian social network, and Yahoo! SearchMonkey
project. Then we attempt to extract, characterize and classify the commonly
used patterns in social design. Finaly, we set up the research agenda in the
area with focus on social aspects of semantic web and advertising
technologies.
Slides available at http://yury.name/talks.html
Bio: Yury Lifshits obtained his PhD degree from Steklov Institute of Mathematics
at St.Petersburg (2007). He is currently a postdoc at Caltech. He won two
gold medals at International Mathematical Olympiads, received the Best Paper
Award in Application Track of CSR'07 and the Yandex Teaching Award (2006)
for his course "Algorithms for Internet". Yury is a maintainer of "The
Homepage of Nearest Neighbors": http://simsearch.yury.name
Inferring Links and Initiators
Evimaria Terzi,IBM
Consider a 0-1 observation matrix M, where rows correspond to entities and columns correspond to signals; a value
of 1 (or 0) in cell (i; j) of M indicates that signal j has been observed (or not observed) in entity i. Given such a matrix
we study the problem of inferring the underlying directed links between entities (rows) and finding which entities act as initiators.
I will formally define this problem and propose an MCMC framework for estimating the links and the initiators given the matrix of observations M. I will also show how this framework can be extended to incorporate a temporal aspect; instead of considering a single observation matrix M we consider a sequence of observation matrices M1; : : : ;Mt over
time. Finally I will show that this problem is a generalization of several
problems studied in the field of social-network analysis.
Bio: Evimaria Terzi is a Research Staff Member at IBM Almaden since June 2007. She obtained her PhD from the University of Helsinki (Finland) in January 2007, MSc from Purdue
University (USA) in 2002
and BSc from the University of Thessaloniki (Greece) in 2000.
Reputation Mechanisms for Online Markets
Christina Aperjis
In online marketplaces potential buyers have to decide whether to buy an item and how much to pay for it without being able to observe it and without knowing whether the seller is trustworthy. Reputation mechanisms promote trust by giving information about past transactions. With the goal of incentivizing sellers to advertise truthfully, we study how the marketplace is affected by (i) the reputation mechanism, and (ii) the way buyers interpret the seller?s reputation score.
Two-Stage Myopic Dynamics in Network Formation Games
Esteban Arcaute
We consider two-stage myopic dynamics for network formation
games, where utility to a node is based on a function of
the distance to all other nodes. Since our network formation game
is a generalization of Myerson's announcement game, we use
pairwise Nash stability as the solution concept.
We prove that, although the price of anarchy of the static game is
unbounded (in the number of nodes in the network), our dynamics
converge to a subset of pairwise Nash stable networks with a constant
efficiency ratio. For some specific utility functions, we prove that
our dynamics converge to efficient networks.
This is joint work with Ramesh Johari and Shie Mannor.
Measuring and Modeling the Web
Ying Xu
The last couple of decades have witnessed a phenomenal growth in
the World Wide Web. The Web has become an ubiquitous channel
for information sharing and dissemination. This talk describes several
research contributions in an endeavor towards a better understanding
of the Web.
One of the basic questions about the Web is its size, i.e., how many
webpages there are on the Web. A closely related problem is the sizes
of search engine indexes, as search engine index defines an important
subset of the Web that is most easily accessible to users. In the last
ten years both the scientific literature and the popular press dealt at
length with methodologies and estimates for the sizes of the Web and
public search engines. In the first part of the talk, we propose
algorithms for measuring the Web size, including both empirical and
theoretical results.
The Web is so large that even estimating its size is a hard problem.
To better understand the Web, people study random web graph models
-- a stochastic process that generates random graphs that with high
probability resemble the Web graph. Random web graph models are
useful theoretical tools to provide insights to the evolution of the Web
graph, to predict future behaviors, and to generate synthetic data sets
for testing and simulation. In the second half of the talk, we briefly
present several results in analyzing web graph models and applying the
models to real world applications.
Should ad networks bother fighting click fraud?
Bobji Mungamuru
Suppose an ad network decides that a given click-through is invalid (or "fraudulent"). The implication is that the ad network will not charge the advertiser for that click-through. Therefore, arguably, the ad network is "taking money out of his own pocket" by marking clicks invalid. As such, should ad networks even bother fighting fraud? We analyze a simple economic model of the online advertising market, and conclude that the answer is "yes".
Effective reputation systems should incentivize users to acquire and reveal information about content quality. They should also counter distortions induced by strategic agents that benefit from manipulating reputations. Mechanisms have been proposed in the recent literature with the aim to provide such incentives. These mechanisms, which we will refer to as reputation markets, can be viewed as information markets designed to assess the quality of online content. In this paper we develop equilibrium models to study how incentives created by a reputation market should influence community behavior and the accuracy of assessments.
Optimal Marketing Strategies over Social Networks
Mukund Sundararajan
Abstract: We study revenue maximization in social networks. We identify a
family of strategies called 'influence and exploit' strategies that are easy
to implement, easy to optimize over and approximately optimal.
Based on a paper that will appear in WWW 08. Jointly with Jason Hartline
(Northwestern University) and Vahab Mirrokni (MSR Redmond).i
Ratio Index for Budgeted Learning, with Applications
Brad Null
In the budgeted learning problem, we are allowed to experiment on a set of
alternatives (given a fixed experimentation budget) with the goal of picking
a single alternative with the largest possible expected payoff.
Approximation algorithms for this problem were developed by Guha and
Munagala by rounding a linear program that couples the various alternatives
together. We present an index for this problem, which we call the ratio
index, which also guarantees a constant factor approximation. Index based
policies have the advantage that a single number (i.e. the index) can be
computed for each alternative irrespective of all other alternatives, and
the alternative with the highest index is experimented upon. This is
analogous to the famous Gittins index for the discounted multi-armed bandit
problem.
The ratio index has several interesting structural properties. First, it can
be computed in strongly polynomial time; the techniques we develop may be
useful for deriving strongly polynomial algorithms for other Markov Decision
Problems. Second, with the appropriate discount factor, the Gittins index
and the ratio index are constant factor approximations of each other, and
hence the Gittins index also gives a constant factor approximation to the
budgeted learning problem. Finally, the ratio index can be used to obtain a
"discount oblivious" solution to the multi-armed bandit problem, which in
turn gives constant factor approximations to a wide class of problems that
we call "list ranking problems".
Joint work with Rajat Bhattacharjee, Ashish Goel, and Sanjeev Khanna
Web Graph Similarity for Anomaly Detection
Panagiotis Papadimitriou
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the faultless operation of multiple software and hardware units. Checking the validity of a web graph requires a notion of graph similarity. Web graph similarity helps measure the amount and significance of changes in consecutive web graphs. These measurements validate how well search engines acquire content from the web. In this work we study five similarity schemes: three of them adapted from existing graph similarity measures and two adapted from well-known document and vector similarity methods. We compare and evaluate all five schemes using a sequence of web graphs for Yahoo! and study if the schemes can identify anomalies that may occur due to hardware or other problems.
I'll talk about a number of computational problems that arise in online
advertising, primarily from the perspective of the buyer of advertising.
Topics to be discussed include:
* When buying sponsored search links from search engines, how does one
manage a portfolio of millions of keywords with minimal human involvement?
Challenges include determining how much to bid for each keyword and what
ad text to display to attract the most customers.
* How can one learn what properties of a web site or advertisement will
lead to the greatest number of purchases or clicks? The problem is made
more difficult by the fact that different classes of users respond well to
different marketing messages, so it's best to use different content for
different user segments.
* Consider an auction where the items to be sold are not known in advance,
but rather arrive online, and buyers are willing to pay different amount
for each item (revealed as the items arrive). Each buyer has a maximum
limit on the total number of items that they will buy. How should the
seller assign items to buyers so as to maximize the seller's revenue?
An Axiomatic Approach to Personalized Ranking Systems
Alon Altman
Personalized ranking systems and trust systems are an essential tool for collaboration in a multi-agent environment. In these systems, trust relations between many agents are aggregated to produce a personalized trust rating of the agents. In this talk I introduce the first extensive axiomatic study of this setting, and explore a wide array of well-known and new personalized ranking systems. We adapt several axioms (basic criteria) from the literature on global ranking systems to the context of personalized ranking systems, and fully classify the set of systems that satisfy all of these axioms. We further show that all these axioms are necessary for this result.
Ad Auctions with Reserved Price
Arash Asadpour
I will talk about the Ad Auctions when the auctioneer enjoys a relevant ad
that she can show in one of the slots if she wants to. The setting is to
some extent similar to the Ad Auctions with reserved price.
I will explain a randomized auction which is truthful in expectation for
both the bidders and the auctioneer. Also our mechanism is individual
rational and budget balanced. I will discus the revenue of this auction and
also will talk about some other settings in which the technique used to
design the auction can be helpful.
This is a joint work with Kamal Jain (Microsoft Research) and David Abraham
(CMU).
Boosting the area under ROC curve
Phil Long, Google
We show that any weak ranker that can achieve an area under the ROC curve
slightly better than 1/2 (which can be achieved by random guessing) can be
efficiently boosted to achieve an area under the ROC curve arbitrarily close
to 1. This boosting can be performed even in the presence of independent
misclassification noise, given access to a noise-tolerant weak ranker.
(This is joint work with Rocco Servedio.)
Random input models for Adwords
Aranyak Mehta, Google
I will talk about a new result on Random Input models for the Adwords Problem: What are good algorithms for allocation when the query stream is modeled as a random process, rather than as a worst case input. How does the natural "highest-bidder-wins" greedy algorithm perform? This question leads to an interesting generalization of a classic online matching algorithm. I will also talk about a related result on extended bidding models for Ad auctions.
Who do you know: The Social Network Revolution
Orkut Buyukkokten, Google
Online social networks fundamentally change the way we get connected. The people we cross paths with have the biggest influence in our lives. Now it's easier to cross paths than ever as we are much closer and so much more connected. In this talk I will discuss the motivation behind the development of orkut.com, touch on the social and technical aspects of implementing and maintaining a system that has over 60 million users and reflect on the lessons we learned.
Deterministic Decentralized Search in Random Graphs
Esteban Arcaute
We study a general framework for decentralized search in
random graphs. Our main focus is on deterministic memoryless
search algorithms that use only local information to reach their
destination in a bounded number of steps in expectation. This class
includes (with small modifications) the search algorithms used in
Kleinberg's pioneering work on long-range percolation graphs and
hierarchical network models. We give a characterization of searchable
graphs in this model, and use this characterization to prove a
monotonicity property for searchability.
Joint work with Ning Chen, Ravi Kumar, David Liben-Nowell, Mohammad Mahdian, Hamid Nazerzadeh and Ying Xu.
Interdomain Routing and Games
Michael Schapira,
Hebrew University of Jerusalem ( http://www.cs.huji.ac.il/~mikesch/)
We present a game-theoretic model that captures many of the intricacies of
interdomain routing in today's Internet. In this model, the strategic agents are source-nodes located on a network, who aim to send traffic to a unique destination node . The model enables complex dynamic interactions between the agents (asynchronous, sequential, and based on partial-information).
We provide realistic conditions that guarantee that executing the Border gateway Protocol (BGP), the only protocol currently used for interdomain routing, is in the best interest of each of the players. Moreover, we show that even coalitions of players of any size cannot improve their routing outcomes by jointly deviating from BGP. Unlike the vast majority of works in mechanism design, these results do not require any monetary transfers (to or by the agents).
Based on joint work with Hagay Levin and Aviv Zohar.
The Anatomy of Clickbot.A
Neil Daswani,
Google
In this talk, I will present a detailed case study of the architecture of the Clickbot.A botnet that attempted a low-noise click fraud attack against syndicated search engines. The botnet of over 100,000 machines was controlled using a HTTP-based botmaster. Google identified all clicks on its ads exhibiting Clickbot.A-like patterns and marked them as invalid. We disclose the results of our investigation of this botnet to educate the security research community and provide information regarding the novelties of the attack.
Data Stream Algorithms with Applications to Automatic Worm Fingerprinting
Dilys Thomas
We will first go over a few network worms and contemporary algorithms to detect these network worms. We will then go over some recent techniques to detect these worms using "vunlerability signatures". We will then use this to review two important problems in data stream computation -- distinct value estimation and frequent element computation. We show how these algorithms can be used for automatic worm fingerprinting.
Click Fraud Resistant Methods for Learning Click-Through Rates
Sergei Vassilvitskii
Presenting the paper "Click Fraud Resistant Methods for Learning Click-Through Rates" by N. Immorlica, K. Jain, M. Mahdian, and K. Talwar from WINE 2005.
Pricing Partially Compatible Products
Adam Meyerson, UCLA
Suppose two competing companies provide roughly equivalent
suites of software products. Each version of a product has a possibly
distinct price and quality. Customers would like to purchase a low-
price, high-quality set of products. The issue is that products made
by different companies need not be fully compatible. We reduce the
customer's decision process to a minimum cut, and provide
approximation and hardness results for the problem of computing the
company's best-response strategy in several different models. We also
introduce a range of questions for further work.
These results will appear in the ACM Conference on Electronic
Commerce (EC) 2007, and are joint work with David Kempe, Nainesh
Solanki, and Ramnath Chellappa.
Google News Personalization: Scalable Online Collaborative Filtering
Mayur Datar, Google
Several approaches to collaborative filtering have been studied but seldom have the studies been reported for large (several millions of users and items) and dynamic (the underlying item set is continually changing) settings. In this paper we describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts. We combine recommendations from different algorithms using a linear model. Our approach is content agnostic and consequently domain independent, making it easily adaptible for other applications and languages with minimal effort. This paper will describe our algorithms and system setup in detail, and report results of running the recommendations engine on Google News.
Detecting Near-Duplicates for Web Crawling
Anish Das Sarma
This is joint work with Gurmeet Singh Manku and Arvind Jain in Google.
Near-duplicate web documents are abundant. Two such documents
differ from each other in a very small portion that displays
advertisements, for example. Such differences are irrelevant for
web search. So the quality of a web crawler increases if it can
assess whether a newly crawled web page is a near-duplicate of a
previously crawled web page or not.
In my talk I will address the problem of efficiently detecting
near-duplicates in the web crawl setting. First I shall review
Charikar's fingerprinting technique used in our approach. Then, I
shall formally define the problem and present algorithmic solutions
for two versions of the problem: online queries (single new
document) and batch queries (multiple new documents). Finally, I
shall briefly talk about our experimental analysis of the techniques.
Clickthrough of and Bidding on keywords
Aleksandra Korolova / Dilys Thomas
Aleksandra will talk about the paper "Predicting ClickThrough Rate Using Keyword Clusters" by Regelson and Fain from a Workshop on Sponsored Search Auctions. The results in the paper are mostly experimental. Dilys will briefly talk on "how to bid on keywords".
Multi-path Routing
Mihaela Enachescu
I will present an approach for enabling multi-path routing in large
networks. This is based on a project at NEC Labs, and is joint work
with Ravi Kokku. Our approach is to create multiple paths between
various source-destination pairs by selecting a special subset of
relay nodes. We give hardness results and provide some approximation
algorithms.
This work is motivated by the observation that given two end-host in
the Internet there exist potentially many independent or partially
overlapping paths between them. However, only one of these is usually
used by the underlay routing, leading to many inefficiencies (the
single path routings are usually slow to adapt in the presence of
fluctuating network conditions). Multi-path routing provides a more
adaptive framework, and it also raises a variety of (still open)
problems for our enjoyment.
Collaborative tagging systems
Paul Heymann
Collaborative tagging systems have recently become prevalent on the web. These systems are similar to classical keyword-based systems, but harness the input of many users to label large datasets like URLs on the web, videos, and photos. In my talk, I will give a brief description of tagging systems, and then talk about two recent projects. The first is work to predict tags annotating URLs based on the page text of those URLs. The outcome of this task has important implications for web search and for the usefulness of data created by social bookmarking systems. The second is an earlier (continuing) project on tag hierarchies.
Is Efficiency Expensive?
Mukund Sundararajan
We study the simultaneous optimization of efficiency and revenue in
pay-per-click keyword auctions in a Bayesian setting. Our main
result is
that the efficient keyword auction yields near-optimal revenue
even under
modest competition. In the process, we build on classical results in
auction theory to prove that increasing the number of bidders by the
number
of slots outweighs the benefit of employing the optimal reserve
price.
This is joint work with Tim Roughgarden.
Estimating Search Engine Index and Web Sizes
Ying Xu
I will talk about estimating the index size of search engines, both empirical and theoretical results. The talk is based on our CIKM paper "Estimating Corpus Size via Queries" and ICALP paper "Estimating Sum by Weighted Sampling".
Reusable and multiple goods
Ashish Goel
The talk is based on
Online auctions with re-usable goods
by Mohammad Hajiaghayi, Robert Kleinberg, Mohammad Mahdian, and David
Parkes, EC'05
The Sequential Auction Problem on eBay: An Empirical Analysis and a
Solution by Adam I. Juda and David C. Parkes, EC'06
Mechanism design based on Cost-Per-Acquisition (CPA)
Hamid Nazerzadeh
The Internet advertising has changed dramatically by moving from Cost-Per-Impression charging mechanism to Cost-Per-Click (CPC). I am going to talk about the possibility of going one step further and using CPA instead of CPC, and some of our results about this problem.
Asymptotically Optimal Repeated Auctions for Sponsored Search
Nicolas Lambert
We investigate asymptotically optimal keyword auctions, that is, auctions which maximize revenue as the number of bidders grows. We do so under two alternative behavioral assumptions. The first explicitly models the repeated nature of keyword auctions. It introduces a novel (and, we argue, plausible) assumption on individual bidding, namely that bidders never overbid their value, and bid their actual value if shut out for long enough. Under these conditions we present a broad class of repeated auctions that are asymptotically optimal among all sequential auctions (a superset of repeated auctions). Those auctions have varying payment schemes but share the ranking method. The Google auction belongs to this class, but not the Yahoo auction, and indeed we show that the latter is not asymptotically optimal. (Nonetheless, with some additional distributional assumptions, the Yahoo auction can be shown to belong to a broad category of auctions that are asymptotically optimal among all auction mechanisms that do not rely on ad relevance.) We then look at the one-shot keyword auction, which can be taken to model repeated auctions in which relatively myopic bidders converge on the equilibrium of the full-information stage game. In this case we show that the Google auction remains asymptotically optimal and the Yahoo auction suboptimal. The distributional assumptions under which our theorems hold are quite general, and arguably hold in all real-world situations of interest. We do however show that the Google auction is not asymptotically revenue maximizing for general distributions.

