unsafe.host

link sharing thread: ml/ai edition

anonymous 2025-05-19@830 No.31

let's share some cool ml/ai papers and other resources!
here's a start:

https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ (The 37 Implementation Details of Proximal Policy Optimization)
https://arxiv.org/pdf/2101.03961 (Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity)
https://arxiv.org/abs/2505.07215 (Measuring General Intelligence with Generated Games)
https://arxiv.org/html/2504.20571v1 (Reinforcement Learning for Reasoning in Large Language Models with One Training Example)

↳ anonymous 2025-05-19@831 No.32

https://storage.googleapis.com/public-technical-paper/INTELLECT_2_Technical_Report.pdf (INTELLECT-2: A Reasoning Model Trained Through Globally Decentralized Reinforcement Learning)
https://howtoscalenn.github.io/ (How To Scale)
https://rlhfbook.com/ (Reinforcement Learning from Human Feedback: A short introduction to RLHF and post-training focused on language models)

↳ anonymous 2025-05-19@833 No.33

https://arxiv.org/abs/2503.01067 (All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning)
https://gorilla.cs.berkeley.edu/blogs/13_bfcl_v3_multi_turn.html (BFCL V3 • Multi-Turn & Multi-Step Function Calling Evaluation)
https://arxiv.org/abs/2505.03335 (Absolute Zero: Reinforced Self-play Reasoning with Zero Data)
https://arxiv.org/pdf/2109.08668 (Primer: Searching for Efficient Transformers for Language Modeling)
https://ofir.io/How-to-Build-Good-Language-Modeling-Benchmarks/ (How to Build Good Language Modeling Benchmarks)