GLaDOS

Tool | Open Source

A few years ago, I found that duplicating a specific block of 7 middle layers in Qwen2-72B, without modifying any weights, improved performance across all Open LLM Leaderboard benchmarks and took the #1 spot. As of 2026, the top 4 models on that leaderboard are still descendants. The weird finding: duplicating a single layer does nothing. Too few layers, nothing. Too many, and it gets worse. Only circuit-
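To make "duplicating a block of layers without modifying weights" concrete, here is a minimal sketch of the resulting layer execution order on a toy 12-layer stack. The function name `duplicate_block`, the start index, and the block length are hypothetical choices for illustration; they are not the layers used in the original work, and the sketch only computes the ordering rather than touching a real model.

```python
def duplicate_block(num_layers, start, length):
    """Return the layer execution order with layers [start, start+length) run twice.

    Hypothetical helper for illustration: the repeated entries refer to the
    same layer indices, so the same weights are reused and nothing is retrained.
    """
    if start < 0 or length <= 0 or start + length > num_layers:
        raise ValueError("block must lie inside the layer stack")
    original = list(range(num_layers))
    end = start + length
    block = original[start:end]
    # Run everything up to the end of the block, then the block again,
    # then the remaining layers.
    return original[:end] + block + original[end:]


# Toy example: a 12-layer stack with a 3-layer middle block repeated once.
order = duplicate_block(num_layers=12, start=4, length=3)
print(order)  # [0, 1, 2, 3, 4, 5, 6, 4, 5, 6, 7, 8, 9, 10, 11]
```

The stack grows by exactly the block length, and every repeated index points back at an existing layer, which is why no weights need to change.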


Added 3/14/2026

