Title: Restless Bandit Model: Resource Allocation and Competition - Examples and Problems
Speaker: Dr. Jing Fu (傅婧), RMIT University
Host: Assoc. Prof. Zengfu Wang (王增福)
Time: Monday, July 12, 2021, 9:30-11:00
Venue: Conference Room 341, suncitygroup太阳集团
Abstract: We study a resource allocation problem with varying requests and with resources of limited capacity shared by multiple requests. It is modeled as a set of heterogeneous restless multi-armed bandit problems (RMABPs) connected by constraints imposed by resource capacity. Following Whittle’s relaxation idea and Weber and Weiss’ asymptotic optimality proof, we propose a simple policy and prove it to be asymptotically optimal in a regime where both arrival rates and capacities increase. We provide a simple sufficient condition for asymptotic optimality of the policy and, in complete generality, propose a method that generates a set of candidate policies for which asymptotic optimality can be checked. To the best of our knowledge, this is the first work providing asymptotic optimality results for such a resource allocation problem and such a combination of multiple RMABPs. In many RMABPs, however, computing asymptotically optimal policies requires knowledge of the transition matrices of the underlying processes, which are sometimes hidden from decision makers. We take first steps towards a tractable and efficient reinforcement learning algorithm for controlling such a system. We set up parallel Q-learning recursions, with each recursion corresponding to one possible value of the Whittle index.
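The learning idea in the last two sentences of the abstract can be illustrated with a minimal Python sketch. It is not the speaker's algorithm: the two-state arm `toy_arm`, the discounted reward criterion, the candidate grid, the uniform exploration rule, the step size, and the function name `learn_whittle_indices` are all assumptions made for illustration. One Q-learning recursion is kept per candidate subsidy for staying passive, every recursion is updated from the same simulated transition, and a state's index is estimated as the candidate at which active and passive Q-values are closest.

```python
import random


def learn_whittle_indices(step, n_states, candidates, gamma=0.5,
                          n_samples=20000, seed=0):
    """Estimate a Whittle index per state from simulated transitions only.

    step(state, action, rng) -> (reward, next_state) simulates one arm
    (action 1 = active, 0 = passive); transition probabilities are never read.
    """
    rng = random.Random(seed)
    # q[k][s][a]: Q-value when passivity earns the subsidy candidates[k].
    q = [[[0.0, 0.0] for _ in range(n_states)] for _ in candidates]
    visits = [[0, 0] for _ in range(n_states)]
    state = 0
    for _ in range(n_samples):
        action = rng.randrange(2)  # uniform exploration; Q-learning is off-policy
        reward, nxt = step(state, action, rng)
        visits[state][action] += 1
        alpha = 1.0 / visits[state][action]  # decaying step size (assumed)
        # One sample updates every recursion in parallel.
        for k, lam in enumerate(candidates):
            subsidised = reward + (lam if action == 0 else 0.0)
            target = subsidised + gamma * max(q[k][nxt])
            q[k][state][action] += alpha * (target - q[k][state][action])
        state = nxt
    # Index estimate: the candidate subsidy at which the state is closest to
    # indifferent between the active and passive actions.
    indices = []
    for s in range(n_states):
        gap = lambda k: abs(q[k][s][1] - q[k][s][0])
        indices.append(candidates[min(range(len(candidates)), key=gap)])
    return indices


def toy_arm(state, action, rng):
    """Hypothetical arm: state 1 pays 1 when activated, but activation makes
    the good state less likely next; resting lets the arm recover."""
    reward = float(state) if action == 1 else 0.0
    p_good = 0.3 if action == 1 else 0.8
    return reward, (1 if rng.random() < p_good else 0)


grid = [i / 4 for i in range(-4, 9)]  # candidate index values from -1 to 2
print(learn_whittle_indices(toy_arm, 2, grid))
```

The grid resolution bounds the accuracy of the estimate: a finer grid gives finer index values at the cost of more recursions per sample.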
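Speaker biography:
Dr. Jing Fu (傅婧) is a Lecturer at RMIT University. Dr. Fu received a bachelor's degree in computer science from Shanghai Jiao Tong University in 2011 and a Ph.D. in electronic engineering from City University of Hong Kong in 2016. From 2016 to 2019, Dr. Fu was a postdoctoral fellow in the School of Mathematics and Statistics at the University of Melbourne, and since 2020 has been a Lecturer in the School of Engineering at RMIT University. Dr. Fu has published a number of papers in leading international journals, including Operations Research, IEEE/ACM Transactions on Networking, and IEEE Journal on Selected Areas in Communications. Dr. Fu's research interests include energy-efficient networking and scheduling, resource allocation in large-scale networks, semi-Markov and Markov decision processes, restless multi-armed bandit problems, and stochastic optimization.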
Written by: Zengfu Wang (王增福)
Reviewed by: Xiaoxu Wang (王小旭)