Ever come across the hurdle of training large models, only to lose the battle to prohibitive computational requirements? One solution to this problem is LoRA (Low-Rank Adaptation), introduced by Edward Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen from Microsoft Corporation.
LoRA, or Low-Rank Adaptation, is a technique for efficiently adapting large language models (LLMs) to specific tasks. LLMs are powerful, but fine-tuning them is computationally expensive due to their vast number of parameters. LoRA addresses this challenge by significantly reducing the number of trainable parameters needed for adaptation.
Here's how LoRA works:
· Freeze Pre-trained Weights: The core idea is to freeze the weights of the pre-trained LLM. These weights capture the general knowledge learned from a massive dataset.
· Introduce Low-Rank Matrices: Instead of fine-tuning all the pre-trained weights, LoRA inserts a pair of low-rank matrices at each layer of the LLM architecture. These matrices have significantly fewer parameters compared to the original weights.
· Train Low-Rank Matrices: During adaptation to a specific task, only the low-rank matrices are updated. They capture the task-specific knowledge needed to adjust the LLM's behavior for the new domain (a minimal code sketch follows this list).
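To make the mechanics concrete, here is a minimal sketch of a LoRA-style linear layer in PyTorch. The layer sizes, rank, scaling factor, and initialization values are illustrative assumptions, not the exact configuration from the paper:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, rank=8, alpha=16):
        super().__init__()
        # Frozen pre-trained weight W (out_features x in_features).
        # In practice this would be loaded from the pre-trained model;
        # a random init is used here only to keep the sketch self-contained.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)
        self.weight.requires_grad = False  # freeze the pre-trained weights

        # Trainable low-rank factors: only these are updated during adaptation.
        self.lora_A = nn.Parameter(torch.zeros(rank, in_features))   # plays the role of V^T
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # plays the role of U
        nn.init.normal_(self.lora_A, std=0.02)  # A gets a small random init
        # B stays at zero, so the adapter starts as a no-op.

        self.scaling = alpha / rank

    def forward(self, x):
        # Frozen path: x @ W^T
        base = x @ self.weight.T
        # Low-rank update path: (x @ A^T) @ B^T, scaled
        update = (x @ self.lora_A.T) @ self.lora_B.T * self.scaling
        return base + update
```

Because lora_B starts at zero, the adapter contributes nothing before training, and gradients flow only into lora_A and lora_B; the frozen weight W is never updated.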
Simplified representation:
Here's a simplified representation of how LoRA modifies the pre-trained weights (W) in a layer:
· Original Weights: W (large matrix)
· Low-Rank Matrices: U (small matrix) and V^T (small matrix, transposed), each with far fewer entries than W
LoRA approximates the weight update (not the weights themselves) as the product of these low-rank matrices, and adds it to the frozen original weights:
W_adapted = W + U * V^T
This approach drastically reduces the number of trainable parameters compared to directly fine-tuning the original weights (W).
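For a sense of scale (assuming a single 4096 × 4096 weight matrix and rank r = 8, numbers chosen purely for illustration): full fine-tuning would update 4096 × 4096 ≈ 16.8 million parameters, while LoRA trains only 4096 × 8 + 8 × 4096 = 65,536 parameters for that layer, a 256× reduction.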
#deep_learning
#LoRA
#Fine_Tuning
#Data_Science