Evaluating Kimi K2.5 vs Kimi K2.6: What happens to agent skills when the model gets smarter?
Early signals from benchmarking Kimi K2.5, K2.6, and Sonnet 4.5 on 21 agent skills. Kimi K2.6 is a better model than K2.5, and skills still matter as models improve.