Monthly ranking

2026-06

Top 10 papers, rising papers, dominant themes, and the full monthly index.

Top 10 Papers

  1. The Verification Horizon: No Silver Bullet for Coding Agent Rewards (100)
  2. DanceOPD: On-Policy Generative Field Distillation (98)
  3. OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning (98)
  4. JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting (97)
  5. Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents (95)
  6. Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation (94)
  7. ViQ: Text-Aligned Visual Quantized Representations at Any Resolution (94)
  8. ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation (91)
  9. COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami (91)
  10. GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents (90)

Themes

  • diffusion models (2)
  • reinforcement learning (2)
  • 3D meshes (1)
  • 3D reasoning (1)
  • Blind Trust Problem (1)
  • Clopper-Pearson bound (1)

Complete Monthly Index

  1. The Verification Horizon: No Silver Bullet for Coding Agent Rewards
     generative capabilities, human intent, policy capability, proxy signals, reward design, reward hacking
  2. DanceOPD: On-Policy Generative Field Distillation
     classifier-free guidance, expert capabilities, flow-matching models, generative field distillation, global editing, local editing
  3. OPID: On-Policy Skill Distillation for Agentic Reinforcement Learning
     critical-first routing, hierarchical skills, on-policy trajectories, outcome-based reinforcement learning, policy optimization, reinforcement learning
  4. JetSpec: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting
     MoE Qwen3, acceptance rate, autoregressive Large Language Models, autoregressive factorization, bidirectional block-diffusion, branch-agnostic marginals
  5. Neglected Free Lunch from Post-training: Progress Advantage for LLM Agents
     Markov decision process, advantage function, agentic settings, failure attribution, log-probability ratio, progress advantage
  6. Qwen-Image-Agent: Bridging the Context Gap in Real-World Image Generation
     agentic framework, context gap, context grounding, context-aware planning, image agent bench, image agent capabilities
  7. ViQ: Text-Aligned Visual Quantized Representations at Any Resolution
     discrete representations, feature discretization, low-level reconstruction, multimodal modeling, position-aware head-wise quantization, proximal representation learning
  8. ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation
     GRPO, boundary-aware count policy, count-faithful image generation, crowd counting, cycle-consistent learning, density-aware adaptive zooming
  9. COrigami: An AI Pipeline for Co-Designing Flat-Foldable Visually Recognisable Origami
     aesthetic evaluation, base packing, co-creativity, computational origami, crease patterns, flat foldability
  10. GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents
      N/A
  11. Information-Aware KV Cache Compression for Long Reasoning
      Forward Influence, KV cache, KV cache compression, LLMs, attention weights, entropy-aware
  12. When Does Combining Language Models Help? A Co-Failure Ceiling on Routing, Voting, and Mixture-of-Agents Across 67 Frontier Models
      Clopper-Pearson bound, GPQA-Diamond, Gaussian copula, Self-MoA, accuracy, beta
  13. Why Multi-Step Tool-Use Reinforcement Learning Collapses and How Supervisory Signals Fix It
      agentic reinforcement learning, catastrophic collapse, control tokens, erroneous example supervision, exploratory learning, hint-based guidance
  14. How Post-Training Shapes Biological Reasoning Models
      continued pre-training, foundation models, generalization, in-domain performance, language models, multimodal biological data
  15. LISA: Likelihood Score Alignment for Visual-condition Controllable Generation
      LISA, conditional control, decoder, diffusion models, disentangled features, feature projection
  16. Discretizing Reward Models
      Monte Carlo dropout, discretization, discriminative ability, oversensitivity, policy learning, reinforcement learning
  17. EO-WM: A Physically Informed World Model for Probabilistic Earth Observation Forecasting
      NDVI, Normalized Difference Vegetation Index, climatological baseline, cumulative physical stress signals, diffusion models, meteorological forcing
  18. Hallucination in World Models is Predictable and Preventable
      coverage-aware sampling, curiosity rewards, data-centric signals, data-efficient fine-tuning, ground-truth actions, hallucination
  19. CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies
      LLM agents, agent behavior, autonomous agents, communication, cumulative net income, economic systems
  20. OpenBioRQ: Unsolved Biomedical Research Questions for Agents
      agentic collapse, agentic models, answer key, biomedical research questions, citation verification, frontier agents
  21. Fast LeWorldModel
      Joint-Embedding Predictive Architectures, LeWorldModel, action-prefix prediction, autoregressive rollout, latent transition model, latent world model
  22. Confidence-Aware Tool Orchestration for Robust Video Understanding
      Blind Trust Problem, agentic video understanding, calibrated reliability score, confidence-cost GRPO reward, evidence interface, reliability-relevance score
  23. Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments
      3D reasoning, agent generalization, agentic systems, automated evaluation engine, benchmark, graphical understanding
  24. PhysiFormer: Learning to Simulate Mechanics in World Space
      3D meshes, attention factorised, autoregressive baselines, denoising diffusion process, diffusion transformer, permutation-invariant
  25. In-Context World Modeling for Robotic Control
      Vision-Language-Action models, in-context adaptation, novel configurations, parameter updates, real-world robot platforms, robot policies