Tho Le

A Data Scientist. Looking for knowledge!

Reinforcement Learning Notes

19 Feb 2025 » ai, reinforcement learning

Terminologies

Evaluation

  • Off-Policy Evaluation (OPE): a technique in RL to estimate the performance of a target policy (i.e., the policy you want to evaluate) using data collected by a behavior policy (i.e., a different policy used to generate the data).
  • Conservative Q-Learning (CQL): penalizes Q-values for actions outside the data support, which prevents value overestimation there.
  • Implicit Q-Learning (IQL): a simple, stable, and strong method that does not need importance sampling.
  • TD3+BC / BCQ / BRAC / AWAC: options when the action is continuous (e.g., a budget value).
  • FQI/FQE (Fitted Q Iteration/Fitted Q Evaluation): classical, strong baselines.
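To make the OPE idea concrete, here is a minimal sketch of trajectory-level importance sampling, one common way to estimate a target policy's value from behavior-policy data. It is not one of the methods listed above, just the simplest illustration of the setup. The function name `is_estimates`, the toy two-action problem, and the probability callables are all assumptions made for illustration.

```python
def is_estimates(trajectories, target_prob, behavior_prob, gamma=0.99):
    """Return (ordinary, weighted) importance-sampling value estimates.

    trajectories: list of episodes, each a list of (state, action, reward).
    target_prob, behavior_prob: callables (state, action) -> probability.
    """
    weights, returns = [], []
    for episode in trajectories:
        rho, g = 1.0, 0.0
        for t, (s, a, r) in enumerate(episode):
            # Cumulative likelihood ratio between target and behavior policy.
            rho *= target_prob(s, a) / behavior_prob(s, a)
            # Discounted return of the logged episode.
            g += (gamma ** t) * r
        weights.append(rho)
        returns.append(g)

    weighted_sum = sum(w * g for w, g in zip(weights, returns))
    ordinary = weighted_sum / len(trajectories)   # unbiased, high variance
    total = sum(weights)
    weighted = weighted_sum / total if total > 0 else 0.0  # biased, lower variance
    return ordinary, weighted


# Hypothetical toy data: one-step episodes, action 1 pays 1, action 0 pays 0.
# The behavior policy is uniform; the target policy always picks action 1.
logged = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 0, 0.0)], [(0, 1, 1.0)]]
behavior = lambda s, a: 0.5
target = lambda s, a: 1.0 if a == 1 else 0.0

ordinary, weighted = is_estimates(logged, target, behavior)
print(ordinary, weighted)
```

In this toy case both estimates recover the target policy's true value of 1.0. On long trajectories the product of ratios can explode or vanish, which is one reason methods that avoid importance sampling are attractive.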

Limitations

  • Real-time use can be limited.
    • Learning an effective policy requires many interactions with the environment.
    • Inference and policy updates can be slow for deep RL algorithms, which hinders applications such as high-frequency trading or robotics.
    • Rewards in RL are often delayed, which makes learning computationally expensive, especially in dynamic environments where feedback loops must be rapid.
    • Potential solution: