Trade-off between memory usage and optimization speed in a simple machine learning task
In modern machine learning, training efficiency depends heavily on the choice of optimization algorithm: preconditioned methods can speed up training substantially, but at the cost of increased memory usage. This study explores the trade-off between memory consumption and optimization performance by generalizing the Shampoo preconditioning algorithm to support arbitrarily sized preconditioning matrices, rather than sizes fixed by the tensor dimensions. We evaluate this approach on a perceptron trained for a simple regression task, measuring how different preconditioner sizes affect training speed. Our results reveal a strong positive correlation between preconditioner size and training speed: larger preconditioners consistently lead to faster learning, although the performance gains scale sublinearly with memory. These findings provide practical insights into balancing computational resources and training efficiency in adaptive optimization.
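To make the memory side of this trade-off concrete, the following minimal sketch counts the preconditioner entries stored for a single m x n weight matrix under three standard schemes: a diagonal preconditioner, standard Shampoo (one m x m and one n x n factor), and a full-matrix preconditioner. The function names and the 64 x 10 layer shape are hypothetical, and the block-diagonal helper is only an illustrative way to interpolate between the extremes; it is an assumption, not necessarily the generalization used in this study.

```python
def preconditioner_entries(m: int, n: int) -> dict[str, int]:
    """Count stored preconditioner entries for an m x n weight matrix."""
    d = m * n  # total number of parameters in the matrix
    return {
        "diagonal": d,             # one scale per parameter
        "shampoo": m * m + n * n,  # left (m x m) and right (n x n) factors
        "full_matrix": d * d,      # one dense (mn x mn) matrix
    }


def block_diagonal_entries(d: int, block: int) -> int:
    """Entries in a hypothetical block-diagonal preconditioner over d
    parameters (an assumption for illustration, not the paper's scheme).
    A trailing block holds the leftover parameters when block does not divide d."""
    full_blocks, rest = divmod(d, block)
    return full_blocks * block * block + rest * rest


# Hypothetical 64 x 10 perceptron layer.
counts = preconditioner_entries(64, 10)
for name, entries in counts.items():
    print(f"{name}: {entries} entries")
print("block size 32:", block_diagonal_entries(64 * 10, 32), "entries")
```

With block size 1 the helper reduces to the diagonal case, and with block size equal to the parameter count it matches the full-matrix case, so the block size acts as a single knob for memory usage in this sketch.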