Learning to Reason in 13 Parameters (Morris et al.)

March 24, 2026

-Tiny LoRA
-LoRA adapter trained with 13 parameters, projected into a rank-$r$ matrix to form $\Delta W$
$W' = W + \Delta W$
-using RL, faster + easier to train with less data than full LoRA, SFT, etc.

Standard LoRA currently defines the adapter as

$\Delta W = A \cdot B$
-where $A$ and $B$ are both trainable matrices of rank $r$ (hyperparameter)
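
A minimal sketch of the standard LoRA update above, in plain Python so it runs without dependencies. The sizes `d_out`, `d_in`, `r` and the helper names (`matmul`, `add`, `lora_update`, `lora_param_count`) are made up for illustration and are not from the paper. Zero-initializing one factor so that $\Delta W = 0$ at the start is the usual LoRA convention and is assumed here, not taken from these notes.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    inner = len(Y)
    assert all(len(row) == inner for row in X), "inner dimensions must match"
    return [[sum(X[i][k] * Y[k][j] for k in range(inner))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

def add(X, Y):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def lora_update(W, A, B):
    """Return W' = W + Delta W, with Delta W = A . B."""
    return add(W, matmul(A, B))

def lora_param_count(d_out, d_in, r):
    """Trainable entries in A (d_out x r) plus B (r x d_in)."""
    return r * (d_out + d_in)

# Hypothetical toy sizes; r is the rank hyperparameter.
d_out, d_in, r = 4, 6, 2
rng = random.Random(0)

W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen weights
A = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d_out)]     # trainable
B = [[0.0] * d_in for _ in range(r)]                                # trainable, zero init (assumption)

W_new = lora_update(W, A, B)
n_params = lora_param_count(d_out, d_in, r)
```

Even at these toy sizes the adapter has `r * (d_out + d_in)` trainable entries, and that count grows with the layer dimensions. This is the baseline the 13-parameter adapter is compared against.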