From A/B Tests to Multi-Armed Bandits and RL
Introductory posts on how multi-armed bandits and RL can help with A/B tests and controlled experiments (a minimal ε-greedy sketch follows the list):
📃 Multi-armed Bandits: an alternative to A/B testing, 2021 (5 minutes).
📃 Beyond A/B Testing: Multi-armed Bandit Experiments, 2019 (7 minutes).
📃 Supercharge your A/B Testing using Reinforcement Learning, 2021 (7 minutes).
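The common idea across these posts: instead of a fixed 50/50 split, a bandit gradually shifts traffic toward the better-performing variant as evidence accumulates. A minimal sketch of that idea with an ε-greedy agent on two simulated page variants (the conversion rates below are made up purely for illustration):

```python
import random

# Hypothetical conversion rates of two page variants (unknown to the agent).
TRUE_RATES = [0.10, 0.12]
EPSILON = 0.1  # fraction of traffic reserved for exploration

counts = [0, 0]      # visitors sent to each variant
values = [0.0, 0.0]  # running mean conversion rate per variant

for visitor in range(10_000):
    # Explore with probability EPSILON, otherwise exploit the current best.
    if random.random() < EPSILON:
        arm = random.randrange(2)
    else:
        arm = max(range(2), key=lambda a: values[a])
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    counts[arm] += 1
    # Incremental mean update: no need to store per-visitor outcomes.
    values[arm] += (reward - values[arm]) / counts[arm]

print(f"traffic split: {counts}, estimated rates: {values}")
```

After 10,000 visitors the traffic split should be heavily skewed toward the second variant; that adaptive reallocation is exactly the regret reduction the posts describe relative to a fixed-split A/B test.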
An introduction to RL and Q-learning (QL) with a marketing business case:
📃 Как Reinforcement Learning помогает ритейлерам (How Reinforcement Learning Helps Retailers, in Russian), 2020 (17 minutes).
On Thompson sampling and the ε-greedy strategy (a Thompson sampling sketch follows below):
📃 Solving multiarmed bandits: A comparison of epsilon-greedy and Thompson sampling (Russian translation available), 2018 (12 minutes).
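For Bernoulli rewards (click / no click), Thompson sampling keeps a Beta posterior per arm, samples a plausible rate from each posterior, and plays the arm with the best sample. A minimal sketch under those assumptions (the rates are hypothetical):

```python
import random

TRUE_RATES = [0.10, 0.12]  # hypothetical, unknown to the agent
alpha = [1, 1]             # Beta(1, 1) = uniform prior per arm
beta = [1, 1]

for visitor in range(10_000):
    # Sample a plausible rate from each arm's posterior, play the best sample.
    samples = [random.betavariate(alpha[a], beta[a]) for a in range(2)]
    arm = max(range(2), key=lambda a: samples[a])
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    # Conjugate update: successes go to alpha, failures to beta.
    alpha[arm] += reward
    beta[arm] += 1 - reward

posterior_means = [alpha[a] / (alpha[a] + beta[a]) for a in range(2)]
print(f"posterior means: {posterior_means}")
```

Unlike ε-greedy, exploration here fades automatically: as a posterior concentrates, the inferior arm's samples almost never win the argmax.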
A detailed introduction to bandits - Multi-Armed Bandits, 2020 (70-80 minutes):
📃 Part 1 Mathematical Framework and Terminology.
📃 Part 2 The Bandit Framework.
📃 Part 3 Bandit Algorithms.
📃 Part 4 The Upper Confidence Bound (UCB) Bandit Algorithm (a UCB1 sketch follows this list).
📃 Part 5 Thompson Sampling (Russian translation available).
📃 Part 6 A Comparison of Bandit Algorithms.
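Part 4's UCB1 rule in one line: play the arm maximizing the empirical mean plus the bonus sqrt(2 ln t / n_a), i.e. optimism in the face of uncertainty. A minimal sketch with hypothetical Bernoulli arms:

```python
import math
import random

TRUE_RATES = [0.10, 0.12, 0.08]  # hypothetical Bernoulli arms
K = len(TRUE_RATES)
counts = [0] * K
values = [0.0] * K

for t in range(1, 10_001):
    if t <= K:
        arm = t - 1  # play each arm once to initialize
    else:
        # UCB1: empirical mean plus an exploration bonus that shrinks
        # as an arm gets pulled more often.
        arm = max(range(K),
                  key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
    reward = 1 if random.random() < TRUE_RATES[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print(f"pull counts: {counts}")
```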
A deep dive into QL - Applied Reinforcement Learning, 2022 (40-50 minutes; a tabular Q-learning sketch follows the list):
📃 Part I: Q-Learning.
📃 Part II: Implementation of Q-Learning.
📃 Part III: Deep Q-Networks (DQN).
📃 Part IV: Implementation of DQN.
📃 Part V: Normalized Advantage Function (NAF) for Continuous Control.
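The series builds on the tabular Q-learning update Q(s,a) ← Q(s,a) + α(r + γ max_a' Q(s',a') - Q(s,a)). A minimal sketch on a hypothetical five-state corridor where the agent must walk right to reach a reward (the environment is invented purely for illustration):

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical corridor: states 0..4, reward at state 4
ACTIONS = [-1, 1]      # left, right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(500):
    s = 0
    while s != GOAL:
        # epsilon-greedy over the Q-table, breaking ties randomly
        if random.random() < EPSILON or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s_next = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s_next == GOAL else 0.0
        # Q-learning update: bootstrap from the greedy value of the next state.
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])
        s = s_next

print([round(max(q), 2) for q in Q])  # state values decay by gamma per step from the goal
```

DQN (Parts III-IV) replaces the table Q with a neural network and stabilizes the same update with a replay buffer and a target network.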
A deep dive into RL - Reinforcement Learning Made Simple, 2020-2021 (70-80 minutes; a policy-gradient sketch follows the list):
📃 Part 1: Intro to Basic Concepts and Terminology.
📃 Part 2: Solution Approaches.
📃 Part 3: Model-free solutions, step-by-step.
📃 Part 4: Q Learning, step-by-step.
📃 Part 5: Deep Q Networks, step-by-step.
📃 Part 6: Policy Gradients, step-by-step.
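Part 6's policy-gradient idea reduced to its simplest instance: parameterize a softmax policy and ascend the gradient of log π(a) scaled by the baseline-adjusted reward. A minimal REINFORCE sketch on a hypothetical three-armed bandit, where the absence of states keeps the gradient especially simple:

```python
import math
import random

TRUE_RATES = [0.10, 0.30, 0.20]  # hypothetical Bernoulli arms
K = len(TRUE_RATES)
theta = [0.0] * K                # one logit per action
LR = 0.1
baseline = 0.0                   # running average reward, reduces variance

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for step in range(1, 20_001):
    probs = softmax(theta)
    a = random.choices(range(K), weights=probs)[0]
    r = 1.0 if random.random() < TRUE_RATES[a] else 0.0
    baseline += (r - baseline) / step
    # REINFORCE: the gradient of log softmax is (one_hot(a) - probs);
    # scale it by the advantage (reward minus baseline).
    adv = r - baseline
    for i in range(K):
        grad = (1.0 if i == a else 0.0) - probs[i]
        theta[i] += LR * adv * grad

print([round(p, 3) for p in softmax(theta)])  # mass should concentrate on arm 1
```

This is the value-free counterpart of Q-learning above: instead of learning action values and acting greedily, the policy itself is the learned object, which is what lets the approach scale to continuous actions (cf. Part V of the QL series).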