Announcement_3
ROM is now on arXiv. We propose a lightweight streaming framework to detect and mitigate overthinking in large reasoning models at the token level — a compact detection head (<0.01% extra parameters) monitors reasoning in real-time and triggers early exit when redundant steps are detected, achieving 93.51% accuracy with 47.2% shorter responses. Check out our project page and code.