Designing truthful contextual Multi-Armed bandits based sponsored search auctions

Jain, S.; Abhishek, K.; Gujar, G.

DSpace Home
→
Research Publications
→
Year-2020
→
View Item

dc.contributor.author	Abhishek, K.
dc.contributor.author	Jain, S.
dc.contributor.author	Gujar, G.
dc.date.accessioned	2021-06-30T17:27:12Z
dc.date.available	2021-06-30T17:27:12Z
dc.date.issued	2021-06-30
dc.identifier.uri	http://localhost:8080/xmlui/handle/123456789/1927
dc.description.abstract	For sponsored search auctions, we consider contextual multi-armed bandit problem in the presence of strategic agents. In this setting, at each round, an advertising platform (center) runs an auction to select the best-suited ads relevant to the query posted by the user. It is in the best interest of the center to select an ad that has a high expected value (i.e., probability of getting a click × value it derives from a click of the ad). The probability of getting a click (CTR) is unknown to the center and depends on the user’s profile (context) posting the query. Further, the value derived for a click is the private information to the advertiser and thus needs to be elicited truthfully. The existing solution in this setting is not practical as it suffers from very high regret (O(T 2 3 )). Towards designing practically useful mechanisms, we first design an elimination-based algorithm ELinUCB-SB that is ex-post monotone, which is a sufficient condition for truthfulness. Thus, ELinUCB-SB can be naturally extended to ex-post incentive compatible and ex-post individually rational mechanism M-ELinUCB-SB. We show via experiments that the proposed mechanisms outperform the existing mechanism in this setting. Theoretically, however, the mechanism may incur linear regret in some instances, which may not occur frequently. To have a theoretically stronger mechanism for regret, we propose a SupLinUCB-based allocation rule SupLinUCB-S. With the help of SupLinUCB-S, we design a mechanism M-SupLinUCB-S, which is ex-post incentive compatible and ex-post individually rational. We prove that it has regret O(n 2√ dT log T) as against O(n p dT log T) for non-strategic settings; O(n) is price of truthfulness. We demonstrate the efficacy of our mechanisms via simulation and establish superior performance than the existing literature.	en_US
dc.language.iso	en_US	en_US
dc.title	Designing truthful contextual Multi-Armed bandits based sponsored search auctions	en_US
dc.type	Article	en_US