LoRA stands for Low-Rank Adaptation.

What is LoRA?

It's a technique designed to efficiently fine-tune large language models (LLMs) and other deep learning models.

The Core Idea: Instead of updating all the parameters of a massive model, LoRA adds pairs of small trainable matrices (low-rank matrices) alongside the pre-trained model's weight matrices. Only these new matrices are trained when adapting the model to a new task or domain; the original weights stay frozen.
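The core idea can be sketched in a few lines of NumPy. The layer sizes, rank, and scaling value below are illustrative assumptions, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, r = 64, 64, 8      # hypothetical layer dimensions and LoRA rank

W = rng.standard_normal((d_out, d_in))     # pre-trained weight, kept frozen
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # trainable, zero-initialized
alpha = 16                                 # scaling hyperparameter

def lora_forward(x):
    # output = frozen path + scaled low-rank update
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B = 0 the adapted layer reproduces the frozen model exactly.
assert np.allclose(lora_forward(x), W @ x)
```

Because B starts at zero, the model's behavior is unchanged before training begins; only A and B would receive gradient updates.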

Efficiency: LoRA drastically reduces the number of trainable parameters, leading to faster training times and lower computational costs.

Reduced Memory Footprint: The added matrices are very small compared to the original model, resulting in significant memory savings.
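The savings are easy to quantify. For a single square weight matrix of hidden size d, full fine-tuning trains d × d parameters, while LoRA trains only the two factors of size r × d each (d = 4096 and r = 8 are illustrative choices, not figures from the text):

```python
d = 4096                      # hypothetical hidden size of one transformer layer
r = 8                         # hypothetical LoRA rank

full = d * d                  # trainable parameters under full fine-tuning
lora = r * d + d * r          # trainable parameters in the A and B factors

print(full, lora, lora / full)   # 16777216 65536 0.00390625
```

For this layer, LoRA trains under 0.4% of the parameters that full fine-tuning would.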

Why use LoRA? 

Flexibility: LoRA can adapt a model to many different tasks without retraining the huge underlying model; each task needs only its own small set of adapter weights.

Preservation of Knowledge: Because the original weights are frozen, fine-tuning with LoRA retains the general knowledge of the large model while specializing it for new purposes.
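This flexibility can be illustrated with one frozen base weight shared by several per-task adapters. The task names and sizes here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 32, 4

W = rng.standard_normal((d, d))   # one frozen base weight, shared by all tasks

# Hypothetical per-task adapters: each task stores only its small (A, B) pair.
adapters = {
    task: (rng.standard_normal((r, d)) * 0.01,
           rng.standard_normal((d, r)) * 0.01)
    for task in ("legal", "medical")
}

def forward(x, task):
    A, B = adapters[task]
    # Same frozen base; the tiny adapter is chosen at call time.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d)
y_legal = forward(x, "legal")
y_medical = forward(x, "medical")
```

Switching domains means swapping a few kilobytes of adapter weights, not reloading a new multi-gigabyte model.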

How does it work? 

Initialization: LoRA injects a pair of trainable low-rank matrices (A and B) alongside selected weight matrices of the pre-trained model. A is typically initialized randomly and B to zero, so training starts from the unmodified model.

Adaptation: During training for a new task, only the parameters of A and B are updated. The original model weights remain frozen.

Inference: To use the adapted model, the low-rank update is added to the original weights: the product of B and A (scaled by a constant) is summed with the frozen weight matrix, so the merged model runs with no extra latency.
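The inference step can be sketched as a one-time weight merge. Sizes and the scaling constant are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, r, alpha = 32, 4, 8

W = rng.standard_normal((d, d))            # frozen pre-trained weight
A = rng.standard_normal((r, d)) * 0.01     # trained low-rank factors
B = rng.standard_normal((d, r)) * 0.01

# Merge for inference: fold the scaled low-rank update into the base
# weights once, so the adapted model needs no extra matmuls per call.
W_merged = W + (alpha / r) * (B @ A)

x = rng.standard_normal(d)
# The merged weight reproduces the unmerged two-path computation.
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)))
```

Merging is also reversible: subtracting the same scaled product recovers the original weights, which is what makes adapter swapping cheap.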

Use Cases of LoRA

Natural Language Processing: Adapt LLMs for new domains (e.g., legal, medical), tasks (e.g., question answering on company policies), or styles (e.g., formal vs. casual).

Computer Vision: Specialize image models (like Stable Diffusion) for new styles or objects.

Other Deep Learning Models: LoRA's principles can potentially be applied to a variety of deep learning architectures.