Learning and Optimization with Seasonal Patterns

A standard assumption in the multi-armed bandit (MAB) framework is that the mean rewards are constant over time. This assumption can be restrictive in the business world, where decision-makers often face an evolving environment with time-varying mean rewards. Ningyuan Chen discusses a non-stationary MAB model with K arms whose mean rewards vary periodically over time.
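To make the setting concrete, here is a minimal sketch of a bandit environment with periodic mean rewards. It is not taken from the study: the sinusoidal shape, the Bernoulli rewards, the arm parameters, and the names `periodic_mean`, `ARMS`, `best_arm`, and `pull` are all illustrative assumptions.

```python
import math
import random

def periodic_mean(base, amplitude, period, phase, t):
    """Mean reward of one arm at time t.

    Assumption for illustration: a sinusoid around `base`; the study's
    model only requires that the mean be periodic in t.
    """
    return base + amplitude * math.sin(2 * math.pi * t / period + phase)

# Hypothetical arms, each given as (base, amplitude, period, phase).
ARMS = [
    (0.5, 0.3, 20, 0.0),       # peaks in the first half of each period
    (0.5, 0.3, 20, math.pi),   # same period, shifted by half a period
    (0.55, 0.0, 1, 0.0),       # constant-mean arm for comparison
]

def best_arm(t):
    """Index of the arm with the highest mean reward at time t."""
    means = [periodic_mean(*arm, t) for arm in ARMS]
    return max(range(len(ARMS)), key=lambda k: means[k])

def pull(k, t, rng):
    """Draw a Bernoulli reward from arm k at time t (assumed reward model)."""
    return 1 if rng.random() < periodic_mean(*ARMS[k], t) else 0

if __name__ == "__main__":
    rng = random.Random(0)
    for t in (5, 15, 25):
        print(t, best_arm(t), pull(best_arm(t), t, rng))
```

In this toy example the best arm alternates between arm 0 and arm 1 within each period of 20 steps, so a policy that commits to a single arm forgoes reward for part of every cycle. This is the kind of situation in which the constant-mean assumption becomes restrictive.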



