Learning and Optimization with Seasonal Patterns

A standard assumption in the multi-armed bandit (MAB) framework is that the mean rewards are constant over time. This assumption can be restrictive in business settings, where decision-makers often face an evolving environment in which the mean rewards vary over time. Ningyuan Chen discusses a non-stationary MAB model with K arms whose mean rewards vary periodically over time.
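To make the setting concrete, here is a minimal sketch of a bandit environment with periodic mean rewards. It is only an illustration: the table `SEASONAL_MEANS`, the period of 4, the choice of K = 3 arms, and all function names are hypothetical and do not come from the interview or the underlying work. The sketch shows two things: the best arm changes with the phase of the season, and a policy that commits to a single arm accumulates the same expected regret in every period, so its regret grows linearly with the horizon.

```python
# Hypothetical seasonal mean rewards: one row per phase of the season,
# one column per arm (K = 3 arms, period = 4). Values are made up.
SEASONAL_MEANS = [
    [0.8, 0.3, 0.2],  # phase 0: arm 0 is best
    [0.5, 0.6, 0.2],  # phase 1: arm 1 is best
    [0.2, 0.4, 0.7],  # phase 2: arm 2 is best
    [0.4, 0.5, 0.3],  # phase 3: arm 1 is best
]
PERIOD = len(SEASONAL_MEANS)


def mean_rewards(t):
    """Mean reward of each arm at round t; repeats every PERIOD rounds."""
    return SEASONAL_MEANS[t % PERIOD]


def best_arm(t):
    """Index of the arm with the highest mean reward at round t."""
    means = mean_rewards(t)
    return max(range(len(means)), key=lambda k: means[k])


def fixed_arm_regret(arm, horizon):
    """Expected regret of always pulling `arm`, measured against an
    oracle that pulls the best arm in every round."""
    return sum(max(mean_rewards(t)) - mean_rewards(t)[arm]
               for t in range(horizon))


best_arms = [best_arm(t) for t in range(PERIOD)]
regret_one_period = fixed_arm_regret(1, PERIOD)
regret_two_periods = fixed_arm_regret(1, 2 * PERIOD)

print("best arm per phase:", best_arms)
print("regret of always pulling arm 1, one period:", regret_one_period)
print("regret of always pulling arm 1, two periods:", regret_two_periods)
```

A stationary MAB algorithm implicitly targets one best arm, which is why this setting calls for a method that accounts for the seasonal pattern in the rewards.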

April 9, 2024

Faculti

