I like working through research papers. At least well-written research papers. It's very satisfying to follow a line of logic from beginning to end and see some novel outcome produced as a result. However, the delineation between working through research papers and *being* a researcher has never been made clear to me. Obviously a professional researcher is someone who is likely involved with the publication of research, not just reading it. But I suppose, from my perspective, that's a difference in how one uses the tools at their disposal. Still, I think it's worth sharing my experience with research papers to provide some background on where I'm coming from.

I'm most familiar with research papers in the domain of deep learning. Specifically, I've spent most of my time reading research papers dealing with deep reinforcement learning. The paper "Playing Atari with Deep Reinforcement Learning" (Mnih et al., 2013) was my first foray into reading a paper, trying to understand it, then trying to reproduce it. It was challenging, but when you get it to work, it's very rewarding. The algorithm introduced, Deep Q-Learning, is very elegant and clever. It's not the best RL algorithm, but it serves as a great way to see how deep learning can be applied in this setting. As such, I've taught how to implement DQN a number of times now, to the point where I used to be able to do it live, from scratch, over the course of a 30-minute presentation and end up with something like this.

Deep Q-Learning is a great introduction to this stuff, but things get complicated pretty quickly. I remember trying to wrap my head around "Proximal Policy Optimization Algorithms" (Schulman et al., 2017), and it taking significant effort and experimentation to "get" it. It doesn't take *too* much time and effort to understand a paper, but I distinctly remember the "aha moment" where it went from knowing the logic and math to intuitively understanding what was happening. Once again, the challenge was to implement PPO on my own. I did, but it was much more challenging. Namely, I wasn't satisfied with training on simple environments, as had been the case with my DQN implementations. So I set out to use PPO to train in MuJoCo, which was a *real* hassle. This required me to train in the cloud, and most of my time went into setting up that workflow. But the result, again, was very satisfying.

In a sort of challenge to myself (and to satisfy a summer independent study requirement), I decided to take on a different kind of research paper: "Policy Gradient Methods for Reinforcement Learning with Function Approximation" (Sutton et al., 1999). The previous papers focused on a concrete algorithm which could be implemented and verified, while this paper focused on the theory behind these algorithms. As you may have noticed, the other papers were published in 2013 and 2017 (rather recently), while this paper was published in 1999. You'd think this would mean that this paper should be simpler; you'd be wrong.

Sutton et al. set out to prove that policy gradient methods, combined with function approximation, could converge to locally optimal policies. If this doesn't mean much to you, trust me when I say it's pretty important. You can't have algorithms like PPO without this foundation being in place, and the fact that this paper was published almost 20 years earlier shows just how much work went into this effort afterward (along with the necessary advancements in technology that made it possible).
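
For reference, the central result of that paper, usually called the policy gradient theorem, can be stated roughly as follows. This is only a summary sketch in the paper's style of notation: $\rho$ is the performance of the policy $\pi$ with parameters $\theta$, $d^\pi(s)$ is the (stationary or discounted) state distribution under $\pi$, and $Q^\pi(s,a)$ is the action-value function.

$$
\frac{\partial \rho}{\partial \theta} = \sum_{s} d^{\pi}(s) \sum_{a} \frac{\partial \pi(s,a)}{\partial \theta} \, Q^{\pi}(s,a)
$$

The notable part is what is absent: there is no term for how the state distribution $d^\pi$ changes with $\theta$, which is what makes it practical to estimate the gradient from sampled experience.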

Trying to understand this paper was brutal, because it was *just* math, and they pulled no punches. Still, I worked through it, section by section, making sure I understood every line, every symbol, and every nuance I could to be able to get that "aha" moment that I knew was achievable. I'm happy to say that I do feel I accomplished that, but it certainly wasn't as easy. With the other papers, I could implement something and show that it worked. But this paper was all theory! How could I prove to myself I actually understood what was happening if I couldn't see it "working" firsthand?

Well, it turns out I *could* implement it, but I needed to work much closer to the theory compared to before. The result was this notebook which painstakingly works through the paper, section by section, to describe what's happening in the paper and then to implement it in code. I had to build the code to match the math rather than an algorithm, and it was no small feat (at least for me). The result, though, was quite satisfying: I now feel I can claim that I understand modern deep reinforcement learning from foundations through SOTA implementations (at least at the time).

I, of course, have read and at least partially implemented *many* other papers (I wish I had kept better track), but these three papers and my work with them, to me, paint a very full picture of my experience in this domain. It was painful at first, but I really feel I "get it," and I hope to continue doing this stuff moving forward. I feel confident in my ability to pick apart a research paper and actually understand it to the point of being able to reproduce it. Like I said, I like working through research papers, which may not be enough to *be* a researcher, but I suppose I don't care.
