Learning and Mechanism Design
Vasilis Syrgkanis, Microsoft Research, New England

Points of interaction
- Mechanism design and analysis for learning agents: online learning as a behavioral model in auctions
- Learning good mechanisms from data: PAC learning applied to optimal mechanism design; online learning of good mechanisms
- Learning from non-truthful auction data: econometric analysis in auction settings
- Mechanism design for learning problems: online learning with strategic experts (incentivizing exploration, side-incentives); buying data (private data, costly data, crowdsourcing)

Part 1: Mechanism design and analysis for learning agents

Key insights
- Auctions in practice are simple and non-truthful.
- How do players behave? Plausibly via simple adaptive learning algorithms, e.g. online learning.
- How good is the welfare of learning outcomes?
- Is it computationally easy for players to learn?
- Are there easily learnable simple mechanisms?

Mechanism Design in Combinatorial Markets

- m items for sale, n bidders
- Each bidder i has a value v_i(S) for every bundle S of items
- Typically complement-free, e.g. submodular (decreasing marginal values) or sub-additive (the whole is worth at most the sum of its parts)
- Quasi-linear utility: u_i = v_i(S_i) - p_i
- Auctioneer's objective: maximize welfare, sum_i v_i(S_i)
- Values are known only to the bidders

Algorithmic Mechanism Design

Vickrey-Clarke-Groves (VCG) Mechanism
- Truthful reporting is a dominant strategy

- Maximizes social welfare
- But requires too much communication: each bidder must report an entire valuation function
- And is computationally inefficient: it requires solving the welfare maximization problem exactly

Truthful Algorithmic Mechanism Design [Nisan-Ronen99]
- Seek computationally efficient mechanisms
- Settle for approximately optimal social welfare
- Assume either demand-query or value-query access to the valuation of each player

Many Mechanisms in Practice
- Non-truthful, with simple allocation and pricing schemes
- Many mechanisms run simultaneously or sequentially
- The overall auction system is a non-truthful mechanism

How Do Players Behave?
- Classical game theory: players play according to Nash equilibrium
- But how do players converge to equilibrium? Computing a Nash equilibrium is computationally hard.
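The VCG mechanism described above can be sketched by brute force for tiny instances: choose the allocation maximizing reported welfare, then charge each bidder the externality they impose on the rest. A minimal sketch, for illustration only (the valuation-function interface is an assumption of this sketch, and the enumeration is exponential time, which is exactly the computational drawback noted above):

```python
from itertools import product

def vcg(valuations, items):
    """Brute-force VCG. valuations[i] maps a frozenset of items to bidder i's value.
    Returns (allocation, payments). Exponential-time enumeration; illustration only."""
    n = len(valuations)

    def best_allocation(bidders):
        # Enumerate every assignment of items to bidders in `bidders` (or to nobody: None).
        best, best_w = None, -1.0
        for assign in product(list(bidders) + [None], repeat=len(items)):
            bundles = {i: frozenset(it for it, b in zip(items, assign) if b == i)
                       for i in bidders}
            w = sum(valuations[i](bundles[i]) for i in bidders)
            if w > best_w:
                best, best_w = bundles, w
        return best, best_w

    alloc, _ = best_allocation(range(n))
    payments = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        _, w_without_i = best_allocation(others)                 # welfare if i were absent
        w_others = sum(valuations[j](alloc[j]) for j in others)  # others' welfare under chosen allocation
        payments.append(w_without_i - w_others)                  # externality imposed by i
    return alloc, payments
```

On a single item this reduces to the second-price auction: the highest-value bidder wins and pays the second-highest value.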

Caveats!
- Most scenarios are repeated strategic interactions
- Simple adaptive game playing is more natural: players learn to play well over time from past experience
- e.g. dynamic bid optimization tools in online ad auctions, internet routing, advertising auctions

No-Regret Learning
- Consider a mechanism played repeatedly for T iterations
- Each player uses a learning algorithm that satisfies the no-regret condition:

(1/T) * sum_t u_i(b_t) >= max_b (1/T) * sum_t u_i(b, b_{-i,t}) - epsilon(T), with epsilon(T) -> 0

i.e. the average utility of the algorithm is at least the average utility of any fixed bid vector, up to a vanishing regret term.
- Many simple algorithms achieve no-regret: multiplicative weight updates (MWU), Regret Matching, Follow the Regularized/Perturbed Leader, Mirror Descent [Freund and Schapire 1995, Foster and Vohra 1997, Hart and Mas-Colell 2000, Cesa-Bianchi and Lugosi 2006]

Simple mechanisms with good welfare: Simultaneous Second-Price Auctions (SiSPAs)
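Before turning to SiSPAs, here is a minimal sketch of the simplest of the no-regret algorithms just listed, multiplicative weight updates over a finite set of candidate bids. The full-information utility matrix is a hypothetical stand-in for whatever utilities the mechanism generates (assumed bounded in [0, 1] for the standard guarantee):

```python
import numpy as np

def mwu_bids(utility_matrix, eta=0.1):
    """Multiplicative weight updates over K candidate actions.
    utility_matrix[t][a]: utility of action a at round t (full-information feedback,
    assumed in [0, 1]). Returns the sequence of played probability distributions."""
    T, K = utility_matrix.shape
    w = np.ones(K)
    dists = []
    for t in range(T):
        p = w / w.sum()
        dists.append(p)
        # exponential update on the realized utility of every action
        w = w * np.exp(eta * utility_matrix[t])
    return np.array(dists)
```

With eta of order sqrt(log K / T), this achieves average regret O(sqrt(log K / T)), which vanishes as T grows, matching the no-regret condition above.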

- Sell each item independently via a second-price auction
- Bidders simultaneously submit a bid in each auction
- In each auction the highest bidder wins and pays the second-highest bid
- Pros: easy to describe, simple in design, distributed, prevalent in practice
- Cons: bidders face a complex optimization problem

Welfare at no-regret [Bik99, CKS08, BR11, HKMN11, FKL12, ST13, FFGL13]: if players use no-regret learning, average welfare is at least a constant fraction of optimal (e.g. 1/2 for XOS valuations), even for sub-additive valuations. Similar welfare guarantees hold for many other simple auctions used in practice: the Generalized Second Price auction and the uniform-price multi-unit auction (captured by the notion of a smooth mechanism [ST13]).

Computational Efficiency of No-Regret Learning
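A single round of a SiSPA is easy to simulate; what is hard is bidding well, since a bidder's joint action ranges over all combinations of per-item bids. A minimal sketch of one round:

```python
def sispa_round(bids):
    """One round of simultaneous second-price auctions.
    bids[i][j]: bid of bidder i on item j. Returns per-item (winners, prices)."""
    n, m = len(bids), len(bids[0])
    winners, prices = [], []
    for j in range(m):
        order = sorted(range(n), key=lambda i: bids[i][j], reverse=True)
        winners.append(order[0])                            # highest bidder wins item j
        prices.append(bids[order[1]][j] if n > 1 else 0.0)  # pays second-highest bid
    return winners, prices
```

Note that with B possible bid levels per item, a bidder's joint action space has B^m elements, which is the source of the computational difficulty discussed next.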

- In SiSPAs, the number of possible actions of a player is exponential in the number of items m
- Standard no-regret algorithms (e.g. multiplicative weight updates) require per-iteration computation linear in the number of actions
- This raises two questions:
  - Can we achieve regret rates that are poly(m) with only poly(m) computation per iteration? Not unless RP contains NP [Daskalakis-S16]
  - Are there alternative designs or notions of learning that are poly-time?
    - No-envy learning: no regret for buying any fixed set of items at the posted prices in hindsight [Daskalakis-S16]
    - Single-bid auctions: each bidder submits a single number, their per-item price [Devanur-Morgenstern-S-Weinberg15, Braverman-Mao-Weinberg16]

Part 2: Learning good mechanisms from data

Key insights
- Classic optimal mechanism design requires a prior. What if we only have samples of values?
- Approximately optimal mechanisms from samples: what is the sample complexity? A statistical learning theory question. Can we also have computational efficiency?
- Online optimization of mechanisms:

- Samples arrive online and are not i.i.d.
- What if we observe incomplete data? e.g. only prices and winners, or the items chosen at posted prices

Optimal Mechanism Design
- Selling a single item; each buyer has a private value drawn from a distribution F
- How do we sell the item to maximize revenue?
- [Myerson81]: second price with a reserve (for i.i.d. regular values)
- Setting the optimal reserve requires knowing F
- Sample complexity of optimal mechanisms: what if, instead of knowing F, we have samples from F? [Roughgarden-Cole14, Mohri-Rostamizadeh14]

PAC Learning and Sample Complexity
- Given a hypothesis space H and n samples from a distribution D, compute a hypothesis that is nearly optimal over H
- What error is achievable with n samples?
- Algorithm: empirical risk maximization (ERM)
- The error bound is captured by complexity measures of the hypothesis space: VC dimension, pseudo-dimension, Rademacher complexity

PAC Learning for Optimal Auctions
- The hypothesis space is the space of all second-price auctions with a reserve
- Need to bound a complexity measure of this space, e.g. its Rademacher complexity [Medina-Mohri14]

Beyond i.i.d.: the optimal Myerson auction is more complex
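Concretely, for a single buyer and a posted price (equivalently, a reserve), ERM just picks the price with the best empirical revenue on the samples, and the candidate prices can be restricted to the sample values themselves. A minimal sketch of this single-bidder case (a simplification of the multi-bidder setting above):

```python
import numpy as np

def erm_reserve(samples):
    """Pick the posted price maximizing empirical revenue on i.i.d. value samples.
    A buyer with value v purchases at price r iff v >= r, yielding revenue r.
    Only the sample values themselves need to be tried as candidate prices."""
    samples = np.asarray(samples, dtype=float)
    best_r, best_rev = 0.0, 0.0
    for r in np.unique(samples):
        rev = r * np.mean(samples >= r)   # empirical revenue of posted price r
        if rev > best_rev:
            best_r, best_rev = r, rev
    return best_r, best_rev
```

The sample-complexity question above asks how many samples are needed before the empirical maximizer is near-optimal for the true distribution F.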

- Myerson's auction defines a monotone transformation for each player, transforms each player's value, and runs a second-price auction with reserve 0 on the transformed values
- The space of all such mechanisms has unbounded complexity
- Use independence across buyers to discretize the space into an epsilon-cover:
  - Discretize the transformations to take values in multiples of epsilon [Morgenstern-Roughgarden15]
  - Discretize the values to multiples of epsilon [Devanur-Huang-Psomas16]

Efficiently Learning Optimal Auctions
- ERM for many of these problems can be computationally hard. What if we want a poly-time algorithm?
- Non-i.i.d. regular distributions [Cole-Roughgarden14, Devanur et al.16]
- i.i.d. irregular distributions [Roughgarden-Schrijvers16]
- Non-i.i.d. irregular distributions [Gonczarowski-Nisan17]
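The monotone transformation above is the virtual value phi(v) = v - (1 - F(v)) / f(v); for a regular distribution the optimal reserve solves phi(r) = 0. A minimal sketch for the Uniform[0,1] case, where phi(v) = 2v - 1 and the optimal reserve is 0.5:

```python
def virtual_value(v, F, f):
    """Myerson's virtual value: phi(v) = v - (1 - F(v)) / f(v)."""
    return v - (1.0 - F(v)) / f(v)

# Uniform[0,1]: F(v) = v and f(v) = 1, so phi(v) = 2v - 1.
F = lambda v: v
f = lambda v: 1.0
```

Running a second-price auction on phi(v_i) with reserve 0 is exactly a second-price auction on values with reserve 0.5 in this symmetric case.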

- Typical recipe: discretize in either virtual-value or value space, then run Myerson's auction on the empirical distributions
- Why is this efficiently learnable? The Bayesian version of the problem has a closed-form, simple solution (Myerson)

Multi-item Optimal Auctions
- The optimal mechanism is not well understood and is not easy to learn
- Instead, compete with simple mechanisms:
  - Posted bundle-price mechanisms [Morgenstern-Roughgarden16] (pseudo-dimension)
  - Affine maximizers, bundle pricing, second-price item auctions [Balcan-Sandholm-Vitercik15,16] (Rademacher complexity)
  - Bundle and item pricing [S17] (new split-sample growth measure)
  - Yao's simple approximately optimal mechanisms [Cai-Daskalakis17] (new measure of complexity for product distributions)
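For the simple mechanism classes above, what ERM compares is the empirical revenue of each candidate mechanism on sampled valuations. A minimal sketch for an additive buyer under item pricing versus grand-bundle pricing (a hypothetical toy setup; the cited papers handle much richer valuation classes):

```python
import numpy as np

def revenue_item_pricing(vals, prices):
    """Additive buyer: buys each item j separately iff v_j >= p_j.
    vals: (n_samples, m) matrix of sampled item values; prices: length-m vector."""
    buy = vals >= prices
    return float(np.mean((buy * prices).sum(axis=1)))

def revenue_bundle_pricing(vals, p):
    """Buyer takes the grand bundle iff total value >= p."""
    return float(np.mean(p * (vals.sum(axis=1) >= p)))
```

ERM over a grid of price vectors (or bundle prices) then picks the candidate with the highest empirical revenue, and the complexity measures listed above bound its generalization error.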

Online Learning of Mechanisms
- Valuation samples are not i.i.d. but arrive online, in an arbitrary order
- Can we dynamically optimize the mechanism to perform as well as the best mechanism in hindsight?
- Optimizing over second-price auctions with player-specific reserves [Roughgarden-Wang16]
- Optimizing over Myerson-style auctions over discretized values [Bubeck et al.17]
- Reductions from the online to the offline problem for discretized Myerson and other auctions [Dudik et al.17]

Learning from incomplete data
- What if we only observe responses to posted prices?
- Posting prices online, with buyers selecting their optimal bundle [Amin et al.14, Roth-Ullman-Wu16]
  - Goal is to optimize revenue
  - Assumes goods are continuous and the buyer's value is strongly concave
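The online-optimization setting above can be sketched by running a no-regret algorithm over a discretized grid of reserves with full-information feedback: observing the bid each round lets us evaluate the revenue every grid reserve would have earned. A single-bidder toy version (an illustration under these assumptions, not the algorithm of the cited papers):

```python
import numpy as np

def online_reserves(bid_sequence, grid, eta=0.5):
    """Exponential weights over a discretized grid of reserve prices.
    Full-information feedback: the observed bid b lets us score every reserve r
    (revenue r if b >= r, else 0). Returns the average per-round expected revenue."""
    w = np.ones(len(grid))
    total = 0.0
    for b in bid_sequence:
        p = w / w.sum()
        total += float(p @ (grid * (b >= grid)))  # expected revenue this round
        w = w * np.exp(eta * grid * (b >= grid))  # reward each reserve by its revenue
    return total / len(bid_sequence)
```

Against an arbitrary (even adversarial) bid sequence, this competes with the best fixed grid reserve in hindsight, which is the benchmark in the works above.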

- What if we only observe winners and prices? We can still compute good reserve prices without learning values [Coey et al.17]

Part 3: Learning from non-truthful auction data

Key insights
- To make any inference we need to connect bids to values
- This requires some form of equilibrium/behavioral assumption: BNE, NE, CE, no-regret learning
- In many cases the value distribution can be reconstructed from the bid distribution
- If the goal is to optimize revenue or infer welfare properties, learning the value distribution is not needed

Learning from non-truthful data
- What if we have data from a first-price auction or a Generalized Second Price auction?
- These auctions are not truthful: we only have samples of bids, not values

- This is no longer a PAC learning problem
- It requires structural modeling assumptions to connect bids to values: Bayes-Nash equilibrium, Nash equilibrium, no-regret learners

First-Price Auction: BNE Econometrics [Guerre-Perrigne-Vuong00]
- With n i.i.d. bidders, the BNE best-response condition implies v = b + G(b) / ((n - 1) g(b)), where G and g are the CDF and PDF of the bid distribution
- Inference approach:
  - Step 1. Estimate G and g
  - Step 2. Use the equation to get proxy samples of values
  - Step 3. Use these proxy values as ordinary i.i.d. samples from the value distribution
- Extends to any single-dimensional mechanism design setting
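The three estimation steps above, in a minimal sketch: an empirical CDF for G, a Gaussian kernel density estimate for g, and the inversion formula applied at each observed bid (the bandwidth choice here is purely illustrative):

```python
import numpy as np

def gpv_values(bids, n_bidders, bandwidth=0.05):
    """GPV-style inversion for a first-price auction with i.i.d. bidders:
    v = b + G(b) / ((n - 1) * g(b)), with G the empirical bid CDF and
    g a Gaussian kernel density estimate of the bid density."""
    bids = np.asarray(bids, dtype=float)
    G = lambda b: np.mean(bids <= b)  # Step 1a: empirical CDF
    g = lambda b: np.mean(np.exp(-0.5 * ((b - bids) / bandwidth) ** 2)) / (
        bandwidth * np.sqrt(2.0 * np.pi))  # Step 1b: kernel density estimate
    # Step 2: invert the best-response condition at each observed bid
    return np.array([b + G(b) / ((n_bidders - 1) * g(b)) for b in bids])
```

Step 3 then treats the returned array as i.i.d. draws from the value distribution, e.g. to estimate revenue-optimal reserves.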

- Estimation rates are at least as slow as those of nonparametric density estimation with n samples

No-regret learning [Nekipelov-Syrgkanis-Tardos15, Noti-Nisan16-17]
- If we assume epsilon-regret, then for every fixed action: current average utility >= average deviating utility - epsilon
- These inequalities must be satisfied by the unobserved values; denote the set of values satisfying all of them the rationalizable set of parameters
- Returns sets of possible values rather than point estimates
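A minimal sketch of computing such a rationalizable set over a grid, for repeated second-price auctions in which we observe a bidder's bids and the highest competing prices (a hypothetical simplified setup; [NST15] work with sponsored-search auctions):

```python
import numpy as np

def rationalizable_set(bids, prices, v_grid, dev_grid, eps):
    """Values v consistent with eps-no-regret play in repeated second-price auctions.
    bids[t]: the bidder's observed bid; prices[t]: highest competing bid at round t.
    Utility of a bid b when the value is v: (v - p) if b > p, else 0."""
    bids, prices = np.asarray(bids, float), np.asarray(prices, float)
    out = []
    for v in v_grid:
        played = np.mean((v - prices) * (bids > prices))
        # best average utility of any fixed deviating bid in hindsight
        best_dev = max(np.mean((v - prices) * (d > prices)) for d in dev_grid)
        if played >= best_dev - eps:  # the no-regret inequality for this v
            out.append(v)
    return out
```

The returned set is exactly the set-valued answer described above; the optimistic or quantal-regret refinements then select a single value from it.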

- Can refine to a single value either by an optimistic approach [NST15] or by a quantal-regret approach [NN17]

Revenue inference from non-truthful bids [Chawla-Hartline-Nekipelov14]
- Aim: identify a class of auctions such that
  - by observing bids from the equilibrium of one auction,
  - inference on the equilibrium revenue of any other auction in the class is easy, and
  - the class contains auctions with high revenue compared to the optimal auction
- Class analyzed: rank-based auctions

Rank-based auctions
- A position auction in which bidders are allocated randomly to positions, based only on the relative rank of their bids
- The k-th highest bidder receives the allocation of the k-th position and pays a first price (their own bid)
- Feasibility: position allocations are non-increasing in rank
- For regular distributions, the best rank-based auction is a 2-approximation to the optimal auction

Revenue inference from non-truthful bids [Chawla-Hartline-Nekipelov14]: by restricting mechanism design to rank-based auctions, we achieve:

- A constant approximation to the optimal revenue within the class
- Good estimation rates for the revenue of every auction in the class
- Easy adaptation of the mechanism to the past history of bids
- [Chawla et al. EC16]: allows A/B testing among auctions, and even a universal B test (with improved rates)

Welfare inference from non-truthful bids [Hoy-Nekipelov-S16]
- AGT theory approach: prove worst-case bounds on the price of anarchy, the ratio of optimal to equilibrium welfare, vs.

- Econometrics approach: observe a bid dataset, infer player values/distributions, and calculate the quantity of interest
- Bridging the two approaches: use worst-case price-of-anarchy methodologies, but replace the worst-case proofs with measurements from data

Part 4: Mechanism design for learning problems

Key insights
- Incentivizing exploration: online learning where the choices are recommendations to strategic users
  - Users might have prior biases and need to be convinced
  - The goal is to incentivize taking a desired action, via information design or payment schemes
  - Achieve good regret rates despite the incentive constraints
- Buying data: most machine learning tasks require

inputs from humans
  - Crowdsourcing: incentivizing strategic agents to exert costly effort to produce labels
  - Private data: buying private data from agents who value privacy and incur a cost for providing it

Relevant courses
- Daskalakis, Syrgkanis. Topics in Algorithmic Game Theory and Data Science, MIT 6.853, Spring 2017. https://stellar.mit.edu/S/course/6/sp17/6.853/index.html
- Eva Tardos. Algorithmic Game Theory, Cornell CS6840, Spring 2017. http://www.cs.cornell.edu/courses/cs6840/2017sp/
- Yiling Chen. Prediction, Learning and Games, Harvard CS236r, Spring 2016. https://canvas.harvard.edu/courses/9622

- Nina Balcan. Connections between Learning, Game Theory, and Optimization, Georgia Tech 8803, Fall 2010. http://www.cs.cmu.edu/~ninamf/LGO10/index.html
- Workshop on Algorithmic Game Theory and Data Science

Thank you!