Revisions
I have a tendency to make a lot of typos. My brain reads what I meant, not what I typed. On occasion, I may also make an error (:gasp:) or state something too imprecisely.
I don’t have collaborators checking this site for errors of any kind. Therefore, if you spot an error of any sort, please let me know by reporting it on my GitHub issues page or contact me some other way (see the About page).
To encourage people to let me know about issues, I will list user-submitted revisions below. As of now, there are not many user-submitted revisions, either because the site is too new or too unpopular, or because I’ve been uncharacteristically error-free. Feel free to remedy that!
User-submitted revisions by date
- Aug 18 2025, Kaushik Subramanian (offline report): typo corrections in “Should we abandon RL” and “Why doesn’t Q-learning work with continuous actions?”
- Jan 12 2025, Cedric Brendel (offline report): suggestion to change “finite” to “bounded” in “What is the “horizon” in reinforcement learning?”
- Nov 17 2024, Michael Littman (offline report): typo correction for “If Q-learning is off-policy, why doesn’t it require importance sampling?”
- Nov 13 2024, araffin: typo/phrasing corrections for “Why does experience replay require off-policy learning and how is it different from on-policy learning?”
- Nov 13 2024, Craig Sherstan (offline report): typo corrections for “What is the difference between V(s) and Q(s,a)?” and “Why does the policy gradient include a log probability term?”
- Nov 11 2024, Bram Grooten (offline report): typo corrections for “About” and “Notation” pages.
- Nov 10 2024, Craig Sherstan (offline report): typo correction for “Why does experience replay require off-policy learning and how is it different from on-policy learning?”