
Commit b768d93

Revise Related to State of the field
Updated the section title from 'Related Work' to 'State of the field' and refined the content discussing the design principles and comparisons of various RL libraries.
1 parent d77239b commit b768d93

File tree

1 file changed (+1, −1 lines)


paper/paper.md

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ We introduce *Mighty*: a modular library designed to enable research at the inte
Mighty is built around three design principles: *flexibility, smooth integration with existing libraries, and environment parallelization*. First, flexibility is central: Mighty exposes transitions, predictions, networks, and environments to meta-methods, enabling a broad range of research patterns, including black-box outer loops, algorithm-informed inner loops, and environment-level interventions. Second, Mighty integrates smoothly with Gymnasium [@towers-arxiv24a], PufferLib [@suarez-rlc25], and CARL [@benjamins-tmlr23a], and can interface with tools such as evosax [@evosax2022github] in under $100$ lines of code. This minimizes glue code while preserving flexibility. Finally, Mighty uses standard Python and PyTorch for optimized networks together with vectorized CPU environments for fast environment interaction. This design offers high training speeds, even for purely CPU-based environments, without sacrificing algorithmic modularity or code clarity.
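The first principle, exposing transitions and environments to meta-methods, can be illustrated with a minimal sketch. All names here (`Transition`, `CurriculumMetaMethod`, `on_transition`, `on_episode_end`) are hypothetical and chosen for illustration; Mighty's actual hook interface may differ:

```python
from dataclasses import dataclass, field

@dataclass
class Transition:
    """A single environment step, as a meta-method might see it."""
    obs: float
    action: int
    reward: float

@dataclass
class CurriculumMetaMethod:
    """Hypothetical meta-component: raises task difficulty once the
    mean episode reward passes a threshold (an environment-level
    intervention driven by inner-loop data)."""
    threshold: float = 0.5
    task_difficulty: float = 1.0
    history: list = field(default_factory=list)

    def on_transition(self, t: Transition) -> None:
        self.history.append(t.reward)

    def on_episode_end(self) -> None:
        if self.history and sum(self.history) / len(self.history) > self.threshold:
            self.task_difficulty += 1.0
        self.history.clear()

meta = CurriculumMetaMethod()
for r in [0.9, 0.8, 0.7]:  # one toy episode with high rewards
    meta.on_transition(Transition(obs=0.0, action=0, reward=r))
meta.on_episode_end()
print(meta.task_difficulty)  # mean reward 0.8 > 0.5, so difficulty rises to 2.0
```

The point of the sketch is the data flow, not the curriculum logic: because the inner loop hands raw transitions to the meta-component, the same hook could instead drive a black-box outer loop or adapt the policy network.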

-## Related Work
+## State of the field
The rapidly growing ecosystem of RL libraries spans diverse design philosophies -- from low-level composability [@weng-jmlr22a] to turnkey baselines [@raffin-jmlr21a; @huang-jmlr22a] and massive-scale engines [@toledo-misc24a] -- making direct comparison and tool selection challenging. Modular research frameworks expose the internal building blocks of an RL pipeline as standalone components that can be recombined to quickly prototype new algorithms.
TorchRL [@bou-arxiv23a] pioneered this approach in the PyTorch ecosystem, introducing the TensorDict abstraction to pass observations, actions, and rewards seamlessly between modules. Tianshou [@weng-jmlr22a] offers a similarly flexible design with separate *Policy*, *Collector*, and *Buffer* classes, enabling researchers to swap in custom exploration strategies or data collection schemes with minimal boilerplate. Although these libraries excel at inner-loop algorithm development and fine-grained experimentation, unlike Mighty they leave higher-order workflows such as curriculum learning or meta-adaptation across tasks to external scripts or user-written loops. Monolithic baselines such as Stable-Baselines3 (SB3) [@raffin-jmlr21a] and CleanRL/PureJaxRL [@huang-jmlr22a; @lu-neurips22a] prioritize ease of use and reproducibility. However, this simplicity comes at the cost of extensibility: SB3's algorithms hide most of the training loop behind a single `learn()` call, and CleanRL's single-file scripts are not designed for import or extension. Scalable platforms such as RLlib [@liang-icml18a; @liang-neurips21a] and STOIX [@toledo-misc24a] focus on maximizing throughput and supporting distributed execution. Although these systems shine when running large experiments, their APIs do not natively unify component modularity with built-in meta-learning or curriculum design.
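The monolithic-versus-modular trade-off can be sketched in a few lines of plain Python. These are toy classes, not the real API of any library named above; they only mirror the shape of the two designs:

```python
class MonolithicTrainer:
    """SB3-style shape: one learn() call hides the whole training loop."""
    def __init__(self):
        self.steps = 0

    def learn(self, total_timesteps: int) -> None:
        for _ in range(total_timesteps):
            self.steps += 1  # collect, update, log -- all internal

class ModularTrainer:
    """Tianshou/TorchRL-style shape: collection and update are separate,
    swappable components, and the outer loop stays in user hands."""
    def __init__(self, collect, update):
        self.collect = collect
        self.update = update

    def step(self) -> None:
        batch = self.collect()
        self.update(batch)

mono = MonolithicTrainer()
mono.learn(total_timesteps=100)  # no hook between iterations

log = []
mod = ModularTrainer(collect=lambda: [1, 2, 3],
                     update=lambda batch: log.append(len(batch)))
for _ in range(3):   # the outer loop is user code, so a curriculum
    mod.step()       # or meta-method could intervene between steps
print(mono.steps, log)  # 100 [3, 3, 3]
```

The monolithic shape is easier to run but offers no seam between iterations; the modular shape exposes that seam, which is exactly where curriculum learning or meta-adaptation would hook in.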
