Andy Le’s Post


Fintech Builder | Team Maker

Microsoft released a paper proposing a 1.58-bit LLM technique that achieves performance and perplexity on par with full-precision FP16 models of the same size while using significantly fewer resources. At that bit width, a 120-billion-parameter model could fit on a single consumer GPU with only 24GB of VRAM. This development has the potential to democratize access to powerful language models for a much wider range of users. https://lnkd.in/gRZfSRm4
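The 24GB figure holds up as a back-of-envelope check: at roughly 1.58 bits (log2 3) per ternary weight, the weights of a 120B-parameter model take about 24 GB, versus roughly 240 GB in FP16. A minimal Python sketch of that arithmetic, counting weights only and ignoring activations, KV cache, and framework overhead:

```python
import math

PARAMS = 120e9               # 120-billion-parameter model (figure from the post)
BITS_TERNARY = math.log2(3)  # ~1.58 bits per ternary weight {-1, 0, +1}
BITS_FP16 = 16               # full-precision baseline

def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed to store the weights alone, in gigabytes."""
    return params * bits_per_weight / 8 / 1e9

print(f"FP16 weights:     {weight_memory_gb(PARAMS, BITS_FP16):.1f} GB")     # ~240 GB
print(f"1.58-bit weights: {weight_memory_gb(PARAMS, BITS_TERNARY):.1f} GB")  # ~23.8 GB
```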

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

arxiv.org
