Learning to Reason in 13 Parameters (Morris et al.)

March 24, 2026

-Tiny LoRA

-LoRA adapter trained with 13 parameters, projected into a rank-$r$ matrix to form $\Delta W$:

$$W' = W + \Delta W$$

-trained using RL; faster and easier to train, with less data, than full LoRA, SFT, etc.

The LoRA adapter is currently

$$\Delta W = A \cdot B$$

-where $A$ and $B$ are both trainable matrices of rank $r$ (a hyperparameter)
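A minimal pure-Python sketch of the standard LoRA update above, to make the shapes and the parameter count concrete. The sizes `d`, `k`, `r`, the helper names, and the choice of which factor is zero-initialized are assumptions for illustration, not taken from the paper.

```python
import random

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_delta(A, B):
    """Return Delta W = A . B for A of shape (d, r) and B of shape (r, k)."""
    return matmul(A, B)

def apply_adapter(W, A, B):
    """Return W' = W + Delta W, leaving the frozen weight W untouched."""
    delta = lora_delta(A, B)
    return [[w + dw for w, dw in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]

def trainable_params(d, k, r):
    """Count the trainable entries of A (d x r) and B (r x k)."""
    return r * (d + k)

# Hypothetical sizes, chosen only for illustration.
d, k, r = 8, 8, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(d)]  # frozen base weight
A = [[rng.gauss(0, 1) for _ in range(r)] for _ in range(d)]  # trainable, random init
B = [[0.0] * k for _ in range(r)]  # trainable, zero init (assumed) so Delta W starts at 0

W_prime = apply_adapter(W, A, B)
print(W_prime == W)               # True: Delta W is zero at initialization
print(trainable_params(d, k, r))  # 32, versus d * k = 64 entries in W
```

Even at these toy sizes the adapter trains $r(d + k)$ values rather than $d \cdot k$, and the count grows with the layer dimensions, which is the gap a 13-parameter adapter has to close.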