Training large foundation models? We built Neptune Scale so you can monitor these runs and debug issues quickly. Now available in beta: https://buff.ly/4eCFUpz General availability coming soon. #generativeai #genai #llm
neptune.ai
Software Development
Palo Alto, California · 37,791 followers
Experiment tracker purpose-built for foundation model training.
About us
Monitor thousands of per-layer metrics—losses, gradients, and activations—at any scale. Visualize them with no lag and no missed spikes. Drill down into logs and debug training issues fast. Keep your model training stable while reducing wasted GPU cycles.
- Website
- https://neptune.ai
- Industry
- Software Development
- Company size
- 51-200 employees
- Headquarters
- Palo Alto, California
- Type
- Privately Held
- Founded
- 2017
- Specialties
- Machine learning, Gen AI, Generative AI, LLMs, Large Language Models, LLMOps, Foundation model training, and Experiment tracking
Locations
- Primary: 2100 Geng Rd, Palo Alto, California 94303, US
- Krańcowa 5, Warsaw, Mazovian 02-493, PL
Updates
-
[New on our blog] Bayesian Deep Learning is Needed in the Age of Large-Scale AI [Paper Reflection]. Author: Vincent Fortuin. Reading time: 6 min. (Link to the full article in the comments.) #generativeai #genai #llm
-
For large foundation models, subtle issues in just a few layers can cause silent degradation of a training run. The problem? Aggregate metrics often mask these instabilities, and without tracking layer-wise activations, gradients, and losses, you simply don't see them. How granular is your logging: do you monitor individual layers, or only the global loss? #generativeai #genai #llm
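To make the layer-wise view concrete, here is a minimal sketch in PyTorch (a toy model with illustrative metric names; this is not Neptune's API) that collects per-layer gradient norms alongside the global loss at every step, so a spike in a single layer stays visible even when the aggregate loss curve looks smooth:

```python
# Minimal sketch: per-layer gradient-norm logging in PyTorch.
# Model size, metric names, and the logging target are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def layer_grad_norms(module: nn.Module) -> dict:
    """L2 norm of each parameter's gradient, keyed by parameter name."""
    return {
        f"grad_norm/{name}": param.grad.norm(p=2).item()
        for name, param in module.named_parameters()
        if param.grad is not None
    }

for step in range(10):
    x = torch.randn(32, 512)              # stand-in batch
    y = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()

    # One series for the global loss plus one per layer/parameter.
    metrics = {"train/loss": loss.item(), **layer_grad_norms(model)}
    # Each entry could be appended to its own metric series in an experiment
    # tracker, e.g. run[name].append(value, step=step) with the neptune client.
    print(step, metrics["train/loss"])

    optimizer.step()
```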
-
Some AI questions seem impossible—until someone dares to answer. At NeurIPS 2024, we challenged Amaury Gouverneur, PhD student at Kungliga Tekniska högskolan, with some of the toughest ones, like: “What combination of existing tech plus new developments will it take for us to run billion-parameter architectures on edge devices?” Watch to hear his perspective. — (Link to the full playlist in the comments) #neurips #generativeai #genai #llm
-
Maintaining AI infrastructure is constant work, and many ML/AI teams are forced to handle it on their own. Keunwoo Choi shares the challenges AI teams face when training foundation models from scratch without dedicated infra support: → Role conflict: researchers take on infrastructure maintenance, often diverting focus from model development. → GPU utilization vs. delivery: maximizing GPU efficiency is tempting (given the cost), but sometimes the speed of iteration matters more. → Debugging nightmares: as GPU clusters scale, failures become more frequent, and error messages rarely provide useful diagnostics. Our upcoming report dives deeper into these challenges. Follow along for more insights! #generativeai #genai #llm #foundationmodels
-
How can we protect generative AI from adversarial attacks? During NeurIPS, Ambrish Rawat, Senior Research Scientist at IBM, presented his work “Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI”. — Read the paper: https://buff.ly/As83NIG Watch the full presentation: https://buff.ly/OtbfCbi #generativeai #genai #llm #neurips
-
[New on our blog] Introduction to State Space Models as Natural Language Models, by Jana Kabrit. TL;DR: → State Space Models (SSMs) use first-order differential equations to represent dynamic systems. → The HiPPO framework provides a mathematical foundation for maintaining continuous representations of time-dependent data, enabling efficient approximation of long-range dependencies in sequence modeling. → Discretization of continuous-time SSMs lays the groundwork for processing natural language and modeling long-range dependencies in a computationally efficient way. → LSSL, S4, and S5 are increasingly sophisticated and efficient sequence-to-sequence state-space models that pave the way for viable SSM-based alternatives to transformer models. (Link to the full article in the comments.) #generativeai #genai #llm
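For intuition on the discretization step, here is a minimal NumPy sketch (a toy example with random matrices; not the article's code and not the HiPPO matrices used by LSSL/S4/S5): a continuous-time linear SSM x'(t) = A x(t) + B u(t), y(t) = C x(t) is mapped to a discrete recurrence with the bilinear transform and unrolled over an input sequence:

```python
# Minimal sketch: bilinear discretization of a continuous-time linear SSM,
# run as a recurrence. All matrices are random placeholders (no HiPPO structure).
import numpy as np

def discretize_bilinear(A, B, step):
    """Map continuous (A, B) to discrete (A_bar, B_bar) with step size `step`."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (step / 2.0) * A)
    A_bar = inv @ (I + (step / 2.0) * A)
    B_bar = inv @ (step * B)
    return A_bar, B_bar

def run_ssm(A_bar, B_bar, C, u):
    """Unroll x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k over a 1-D input."""
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_k in u:
        x = A_bar @ x + (B_bar * u_k).ravel()
        ys.append((C @ x).item())
    return np.array(ys)

state_dim, seq_len, step = 16, 64, 0.1
rng = np.random.default_rng(0)
A = -np.eye(state_dim) + 0.1 * rng.standard_normal((state_dim, state_dim))  # roughly stable dynamics
B = rng.standard_normal((state_dim, 1))
C = rng.standard_normal((1, state_dim))

A_bar, B_bar = discretize_bilinear(A, B, step)
y = run_ssm(A_bar, B_bar, C, rng.standard_normal(seq_len))
print(y.shape)  # (64,)
```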
-
Training LLMs is hard. Training them efficiently is even harder. Here’s what experience has taught Stefan Mesken: → Curriculum design is tricky: deciding what data to use (and when) is one of the biggest optimization challenges. → Hyperparameter tuning matters (a lot): as models scale, they become even more sensitive. Getting this wrong can lead to costly inefficiencies. → Infrastructure is everything: building a supercomputer is closer to constructing a house than buying a laptop. Every detail impacts performance. → Software optimization is a game-changer: a dedicated HPC team can significantly boost training efficiency and unlock new capabilities in the inference pipeline. → Hiring the right team is a key investment: technical expertise across hardware, software, and research is critical to navigating the complexities of LLM development. — More insights like this will be featured in our upcoming State of LLM training report. Stay tuned! #generativeai #genai #llm
-
New: Markdown widgets in Reports. Include notes and explanations, document progress, and highlight key insights in your reports. #generativeai #genai #llm