Lamini makes it possible for enterprises to turn proprietary data into the next generation of LLM capabilities by offering a platform for in-house software teams to uplevel to OpenAI-level AI teams and to build within the security of their existing infrastructure.
Lamini reposted this
Llama 3.2 memory tuning is live on our serverless tier at https://lamini.ai These models are super fast to train and run inference on, and we can remove hallucinations from them using Memory Tuning.
Sharon Zhou, PhD will be speaking at the Open Data Science Conference (ODSC) on Oct. 31. If you're there, be sure to check out her talk “Removing Hallucinations by 95% with Memory Tuning: A Technical Deep Dive”. Looking forward to seeing you there!
Building the future of LLMs. Cofounder & CEO, Lamini. CS Faculty at Stanford. MIT Technology Review’s 35 Under 35. (Speaker).
The Lamini team has been hard at work accelerating & communicating Lamini Memory Tuning. Here’s our latest explainer & blog post ahead of my keynote at the Open Data Science Conference (ODSC). Open to feedback on how to best explain it 🙂 (many creds to the one and only Alanna Brown 🫶)

Lamini Memory Tuning is a new way to fine-tune any open LLM by tuning millions of LoRA adapters and selecting across them in a wide Mixture of Experts at inference time. Instead of optimizing average error on everything, Memory Tuning optimizes for zero error on specific facts, so it recalls those facts nearly perfectly, while still allowing the LLM to generalize with average error on everything else. It changes the paradigm: the LLM becomes near perfect on facts, and stays pretty good at everything else.
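To make that concrete, here is a minimal, hypothetical PyTorch sketch of the architecture the post describes: a bank of LoRA adapters with a router that selects a few per query at inference time. The dimensions, the cosine-similarity routing scheme, and all names here are illustrative assumptions, not Lamini's actual implementation.

```python
# Toy sketch of the idea behind Lamini Memory Tuning as described above:
# a large bank of LoRA adapters, plus a router that selects a few of them
# per query at inference time (a wide Mixture of Experts over adapters).
# Dimensions, routing scheme, and names are illustrative assumptions.
import torch
import torch.nn.functional as F

DIM, RANK, NUM_ADAPTERS = 512, 4, 1000   # real systems use millions of adapters

base_weight = torch.randn(DIM, DIM)      # frozen base LLM projection weight
# Each adapter is a low-rank pair (A, B); its weight delta is B @ A.
A = torch.randn(NUM_ADAPTERS, RANK, DIM) * 0.01
B = torch.randn(NUM_ADAPTERS, DIM, RANK) * 0.01
adapter_keys = torch.randn(NUM_ADAPTERS, DIM)   # one key embedding per adapter

def route(query_emb: torch.Tensor, top_k: int = 2) -> torch.Tensor:
    """Pick the adapters whose keys best match the query embedding."""
    scores = F.cosine_similarity(adapter_keys, query_emb.unsqueeze(0), dim=-1)
    return scores.topk(top_k).indices

def forward(x: torch.Tensor, query_emb: torch.Tensor) -> torch.Tensor:
    """Base projection plus the low-rank deltas of the selected adapters only."""
    out = x @ base_weight.T
    for i in route(query_emb):
        out = out + x @ (B[i] @ A[i]).T   # add adapter i's low-rank update
    return out
```

In this framing, "memory tuning" would mean training each adapter to near-zero loss on its own small set of facts, while routing leaves the frozen base model to handle everything else with ordinary generalization.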
General LLMs work well for shallow tasks. But in many fields, like financial services, manufacturing, biotech, legal tech, retail, and healthcare, they offer little to no utility until they reach a high bar of accuracy. In many cases, that means 9’s of accuracy. If your bar for accuracy is 95%, then 50% accuracy is effectively 0% usefulness. For example, in legal contract review, if AI catches only half the redlines, a lawyer still has to review the entire contract manually. Our goal is to help customers reach 9’s of accuracy, cost-effectively, without the need for a large team of ML/AI engineers.
An interesting trend with new AI products is that they have basically zero utility until they reach a certain degree of accuracy. In other words, the usefulness is roughly binary. Up until point X, the effort required to monitor and check the AI's work exceeds the effort to do it manually. People often fall into the trap of thinking that a 50% solution delivers 50% of the value. In fact, a 50% solution is usually worthless.

Where X lies on the scale of usefulness depends on the application, the organization, and the user. Some people just have a burning need and are more tolerant of errors. Others need error rates approaching zero.

In legal tech, the bar for accuracy is very high, and most products have not yet hit it. Any work that will be sent to a client or court is extremely intolerant of errors. 70% or even 80% will generally not cut it for document drafting or due diligence, for example.

Besides just increasing accuracy, the biggest lever that builders can pull is to make the AI's work easier to check. Better redlines, AI explanations, citing sources: these all help tremendously. The flip side is that once you've hit X, it's immediately obvious, and you get to witness some pretty magical customer experiences.
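A toy calculation makes the "roughly binary" point from these two posts concrete. Under the illustrative assumption that below the accuracy bar a reviewer must still do a full manual pass, the hours saved jump from negative to large exactly at the threshold. All numbers here are made up for illustration:

```python
# Back-of-envelope model of binary utility: value only appears once
# accuracy crosses the point where checking the AI's work becomes
# cheaper than doing the task manually. All numbers are assumptions.

MANUAL_HOURS = 10.0   # assumed time to review a contract fully by hand

def hours_with_ai(accuracy: float) -> float:
    check_hours = 2.0  # assumed time to verify the AI's suggested redlines
    if accuracy < 0.95:
        # Below the bar, the output can't be trusted: full manual pass
        # is still required, so the AI only adds checking overhead.
        return check_hours + MANUAL_HOURS
    return check_hours  # above the bar, a spot-check suffices

for acc in (0.50, 0.80, 0.95):
    saved = MANUAL_HOURS - hours_with_ai(acc)
    print(f"accuracy={acc:.0%}: hours saved = {saved:+.1f}")
# accuracy=50%: hours saved = -2.0   (worse than manual)
# accuracy=80%: hours saved = -2.0
# accuracy=95%: hours saved = +8.0   (the value appears all at once)
```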
🎉 We're excited to support Llama 3.2 3B and 1B models and see how our customers fine-tune these smaller models! Congrats AI at Meta on another game-changing release. Get a $300 credit when you sign up for a Lamini account to fine-tune and run inference on these smaller models for free!
📣 Introducing Llama 3.2: Lightweight models for edge devices, vision models and more! What’s new?
• Llama 3.2 1B & 3B models deliver state-of-the-art capabilities for their class for several on-device use cases, with support for Arm, MediaTek & Qualcomm on day one.
• Llama 3.2 11B & 90B vision models deliver performance competitive with leading closed models, and can be used as drop-in replacements for Llama 3.1 8B & 70B.
• New Llama Guard models to support multimodal use cases and edge deployments.
• The first official distro of Llama Stack simplifies and supercharges the way developers & enterprises can build around Llama to support agentic applications and more.

With Llama 3.2 we’re making it possible to run Llama in even more places, with even more flexible capabilities.

Details in the full announcement ➡️ https://go.fb.me/8ar7oz
Download Llama 3.2 models ➡️ https://go.fb.me/7eiq2z

These models are available to download now directly from Meta and Hugging Face, and will be available across offerings from 25+ partners that are rolling out starting today, including Accenture, Amazon Web Services (AWS), AMD, Microsoft Azure, Databricks, Dell Technologies, Deloitte, Fireworks AI, Google Cloud, Groq, IBM, Infosys, Intel Corporation, Kaggle, NVIDIA, Oracle Cloud, PwC, Scale AI, Snowflake, Together AI and more.

We’ve said it before and we’ll say it again: open source AI is how we ensure that these innovations reflect the global community they’re built for and benefit everyone. We’re continuing our drive to make open source the standard with Llama 3.2.
Lamini reposted this
GPUs learn fast
How many GPUs do you need to remove hallucinations on the data in 10,000 PDFs using Memory Tuning (https://lnkd.in/gHepjUsP)? Here are some training times on different GPU configurations.
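The post's actual timings are in its attached graphic, but the scaling question can be sketched with simple arithmetic. Here is a back-of-envelope estimator assuming near-linear data-parallel scaling; both constants are placeholder assumptions, not Lamini benchmarks:

```python
# Rough estimator for how Memory Tuning time over ~10,000 PDFs might
# scale with GPU count. Assumes near-linear data-parallel scaling;
# both constants are placeholder assumptions, not measured numbers.

TOKENS_PER_PDF = 5_000            # assumed average document length
TOKENS_PER_GPU_HOUR = 50_000_000  # assumed tuning throughput per GPU

def tuning_hours(num_pdfs: int, num_gpus: int, epochs: int = 3) -> float:
    total_tokens = num_pdfs * TOKENS_PER_PDF * epochs
    return total_tokens / (TOKENS_PER_GPU_HOUR * num_gpus)

for gpus in (1, 8, 64):
    print(f"{gpus:>3} GPUs: ~{tuning_hours(10_000, gpus):.2f} hours")
```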
Our co-founder and CEO, Sharon Zhou, PhD, recently spoke at Aurecon #ExemplarForum2024, where she shared insights on:
• High-ROI use cases for Large Language Models (LLMs) in various industries
• Overcoming key challenges in AI deployment: poor model quality, hallucinations, costs, and security
• The shift from general AI to expert-level, domain-specific models
• Lamini's innovative approach to LLM fine-tuning and deployment
• Practical strategies for embedding deep domain expertise into AI models
• The critical importance of robust AI evaluation frameworks

👀 Watch the presentation here: https://lnkd.in/eHSFrixZ
Thanks for having us, Aurecon!
Generative AI has game-changing potential for engineering and design. Check out insights from Dave Mackenzie, Theodore Galanos and Patricia Summers, plus experts at Lamini, Nomic AI and Nous Research. https://bit.ly/3Zv58S5 #GenerativeAI #DigitalTransformation #ExemplarForum2024
🎉🎉🎉 Excited to announce our new self-service offering, Lamini On-Demand. Pay as you go, and burst tuning jobs across multiple GPUs as needed.
🪙 $0.50 per million tokens (input and output, including guaranteed JSON format) and $1 per tuning step
💳 $300 in free credit for new and existing customers
💪 Run your tuning and inference jobs on our high-performance GPU cluster
📈 Achieve 99% accuracy with Lamini Memory Tuning and turn any open LLM into a mixture of experts

Read the blog post by our Product Manager, Eda Z., to learn more. https://lnkd.in/gHUrrXVq
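For a rough sense of what a workload costs at these rates, here is a quick estimate script. Only the unit prices and the free credit come from the post; the workload numbers are illustrative assumptions:

```python
# Quick cost estimate for Lamini On-Demand using the prices quoted above:
# $0.50 per million tokens (input + output) and $1 per tuning step,
# with $300 in free credit. Workload sizes below are assumptions.

PRICE_PER_M_TOKENS = 0.50     # USD per million tokens, input + output
PRICE_PER_TUNING_STEP = 1.00  # USD per tuning step
FREE_CREDIT = 300.00          # USD credit for new and existing customers

def estimate(inference_tokens: int, tuning_steps: int) -> float:
    inference_cost = inference_tokens / 1_000_000 * PRICE_PER_M_TOKENS
    tuning_cost = tuning_steps * PRICE_PER_TUNING_STEP
    return inference_cost + tuning_cost

cost = estimate(inference_tokens=50_000_000, tuning_steps=100)  # example workload
print(f"estimated cost: ${cost:.2f}; "
      f"after credit: ${max(0.0, cost - FREE_CREDIT):.2f}")
# estimated cost: $125.00; after credit: $0.00
```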