Unum’s Post

How small should a Language Model be? Less than two months ago, we released UForm-Gen, one of the industry's smallest multimodal Generative #AI models. Now downloaded over 100,000 times a month, it's one of the industry's most popular captioning and visual question-answering models. It was built on a Language Model with only 1.3 billion parameters, and on some Vision-Language tasks it outperformed Google Gemini, a model 100-1000x its size.

Scaling down is the key to privacy-preserving AI models running on every one of the billions of chips produced yearly! So, using the NVIDIA DGX-H100 nodes on the Nebius #GPU cloud, we've trained several tiny models! One of them features the smallest Language Model we've used to date, derived from Alibaba Group's Qwen1.5 with only 0.5 billion parameters. The model significantly outperforms our previous result thanks to a considerably larger and higher-quality multimodal dataset and an improved Vision tower!

On the technical side, the new UForm-Gen2 scored 19.58 on MM-Vet, 45.5 on SQA, and 880 on MME, matching benchmark results of last year's models 10-20x its size. It works with the Hugging Face Transformers library out of the box and is already available online, free for commercial use 🤗 https://lnkd.in/gti4HdAK
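For anyone who wants to try it, here is a minimal sketch of captioning an image with the model through the standard Transformers API. The Hub ID "unum-cloud/uform-gen2-qwen-500m" and the exact generation arguments are assumptions on our reading of the release; check the model card behind the link above for the canonical snippet.

```python
# Minimal sketch: image captioning with UForm-Gen2 via Hugging Face Transformers.
# Assumptions: the Hub ID below and the processor call signature; consult the
# model card for the exact, up-to-date usage.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model_id = "unum-cloud/uform-gen2-qwen-500m"  # assumed Hub ID
model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg")
prompt = "Describe the image in two sentences."

# The custom processor packs both the text prompt and the image tensor.
inputs = processor(text=[prompt], images=[image], return_tensors="pt")
with torch.inference_mode():
    output = model.generate(
        **inputs,
        do_sample=False,
        max_new_tokens=128,
        pad_token_id=processor.tokenizer.pad_token_id,
    )

# Strip the prompt tokens and decode only the generated caption.
prompt_len = inputs["input_ids"].shape[1]
print(processor.batch_decode(output[:, prompt_len:], skip_special_tokens=True)[0])
```

With trust_remote_code=True, Transformers pulls the model's custom architecture code from the Hub, which is why a 0.5B-parameter multimodal model loads through the generic AutoModel entry point.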

[Image: UForm-Gen2 captioning previews]
