Performance Study of Modern Zeroth-Order Optimization Methods for LLM Fine-Tuning
Large Language Models (LLMs) are widely employed across a broad range of applications due to their versatility and state-of-the-art performance. As usage scenarios multiply, so does the demand for task-specific adaptation of LLMs through fine-tuning. Full fine-tuning (FT) remains the preferred option in terms of quality, but its high memory and computation requirements limit its practical use, especially at LLM scale. Parameter-efficient fine-tuning (PEFT) techniques, such as LoRA, mitigate this issue by updating only a small subset of model parameters. However, they still require substantial resources because they rely on backpropagation. In contrast, zeroth-order (ZO) optimization methods approximate gradients using only forward passes. By eliminating backpropagation, they reduce memory overhead to an inference-level footprint, which makes them an attractive alternative for memory-constrained environments. Over 2024-2025, several ZO techniques have been proposed that aim to balance efficiency and performance. This paper presents a comparative analysis of 12 zeroth-order optimization methods applied to LLM fine-tuning, evaluated by memory utilization, quality, fine-tuning time, and convergence. According to the results, ZO-SGD-Sign is the best method in terms of memory, with a 42.82% memory reduction. LoHO achieves the best quality and fine-tuning time among the zeroth-order methods relative to SGD, with a 0.6% quality drop and an 11.73% increase in fine-tuning time. No ZO method currently matches the convergence efficiency of Adam and AdamW.
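To make concrete the idea of approximating gradients with forward passes only, the following is a minimal sketch of a two-point, SPSA-style estimator and a plain ZO-SGD step on a NumPy parameter vector. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the function names `zo_gradient_estimate` and `zo_sgd_step`, the Gaussian perturbation, and the default values of `eps` and `lr` are all hypothetical choices. The `use_sign` flag shows one common sign-based variant, and the exact definition of ZO-SGD-Sign used in the study may differ.

```python
import numpy as np


def zo_gradient_estimate(loss_fn, params, eps=1e-3, rng=None):
    """Two-point (SPSA-style) zeroth-order gradient estimate.

    Uses only two evaluations of loss_fn (two "forward passes") and no
    backpropagation. All names and defaults here are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Random perturbation direction, same shape as the parameters.
    z = rng.standard_normal(params.shape)
    loss_plus = loss_fn(params + eps * z)
    loss_minus = loss_fn(params - eps * z)
    # Finite-difference estimate of the directional derivative along z.
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    # Scale the direction by that scalar to obtain a gradient estimate.
    return projected_grad * z


def zo_sgd_step(loss_fn, params, lr=1e-2, eps=1e-3, rng=None, use_sign=False):
    """One ZO-SGD update; use_sign=True keeps only the sign of each coordinate."""
    grad = zo_gradient_estimate(loss_fn, params, eps=eps, rng=rng)
    if use_sign:
        grad = np.sign(grad)
    return params - lr * grad


if __name__ == "__main__":
    # Toy quadratic standing in for a model loss (an assumption for the demo).
    target = np.array([1.0, -2.0, 0.5, 3.0, -1.0])

    def loss(p):
        return float(np.sum((p - target) ** 2))

    rng = np.random.default_rng(0)
    p = np.zeros_like(target)
    print("initial loss:", loss(p))
    for _ in range(500):
        p = zo_sgd_step(loss, p, rng=rng)
    print("final loss:", loss(p))
```

In practice, memory-efficient implementations typically avoid storing the perturbation `z` by regenerating it from a saved random seed, so the only persistent state is the parameters themselves. The sketch keeps `z` in memory for readability.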