Our new paper, PW-OPSD, is now on arXiv! We find that teacher-token reliability in on-policy self-distillation for reasoning is position-structured, with later tokens being more reliable, and propose Position-Weighted On-Policy Self-Distillation (PW-OPSD), which up-weights those later tokens at no extra teacher cost. Check out our paper and code.