Xinyan Wang
xwang2587@wisc.edu
5606 Morgridge Hall, 1205 University Avenue, Madison, WI 53706
I am Xinyan Wang, a third-year PhD student in Statistics at the University of Wisconsin–Madison, advised by Professor Jun Shao and working with Professor Chaowei Xiao at Johns Hopkins University. I received my BS in Statistics from East China Normal University in 2022 and my MS in Statistics from UW–Madison in 2023. I am also pursuing an MS in Computer Science at UW–Madison.
I work on large reasoning models (LRMs). My goal is to make reasoning models efficient and safe enough to deploy in practice, using reinforcement learning and representation analysis. My current topics of interest include:
- Efficient Reasoning: Reducing redundant computation in large reasoning models at inference time — e.g., ROM, a model-agnostic streaming detector-and-intervention framework that curbs overthinking in frozen LRMs in real time at no accuracy cost.
- Safety of Reasoning Models: Understanding and red-teaming the vulnerabilities that long reasoning traces introduce — e.g., ReasoningBomb (CCS 2026), a reinforcement-learning-based inference-time denial-of-service attack that traps LRMs into pathologically long reasoning.
- Reasoning Distillation: Identifying which teacher signals are reliable when distilling reasoning into models and weighting supervision accordingly — e.g., PW-OPSD, which shows teacher-token reliability in on-policy self-distillation is position-structured and up-weights reliable later tokens at no extra teacher cost.
news
| Date | News |
|---|---|
| May 2026 | Our new paper PW-OPSD is now on arXiv! We find that teacher-token reliability in on-policy self-distillation for reasoning is position-structured, and propose Position-Weighted On-Policy Self-Distillation (PW-OPSD) to up-weight reliable later tokens at no extra teacher cost. Check out our paper and code. |
| May 2026 | ROM is on arXiv. We frame overthinking in large reasoning models as a latent productive-to-redundant transition that surfaces in hidden states around first-correct-solution (FCS) boundaries, and propose ROM, a model-agnostic streaming framework that monitors a frozen LRM with a lightweight hidden-state detector and intervenes at well-formed reasoning boundaries. Our Counterfactual Self-Correction (CSC) augmentation preserves useful pre-FCS self-correction while labeling only post-FCS continuation as redundant. On Qwen3-8B and DeepSeek-R1-Distill-Qwen-32B across MATH500, GSM8K, AIME25, and MMLU-Pro, ROM improves the accuracy–length tradeoff (e.g., Qwen3-8B: 4262 → 3107 tokens with slightly higher accuracy), stacks with L1 for another ~21% token reduction at zero accuracy loss, and cuts wall-clock latency by 46.5%. Check out our project page, code, and dataset. |
| Apr 2026 | Our paper ReasoningBomb has been accepted to ACM CCS 2026! We propose an RL-based framework that crafts short, natural-language prompts to trap LRMs into pathologically long reasoning, with a constant-time surrogate reward that enables a 4.39×10⁵× training speedup. Just 10% malicious traffic cuts benign throughput by 49.8% and monopolizes 64.3% of compute. Check out our paper, website, code, and dataset. |
| Feb 2026 | ReasoningBomb is now on arXiv. Check out our website, code, and dataset. |
| Sep 2024 | Passed my Qualifying Exam. |
selected publications
2026
2023
service
- Reviewer for ACL '26 and ECCV '26.