Together AI

Software Development

San Francisco, California 34,034 followers

The future of AI is open-source. Let's build together.

About us

Together AI is a research-driven artificial intelligence company. We contribute leading open-source research, models, and datasets to advance the frontier of AI. Our decentralized cloud services empower developers and researchers at organizations of all sizes to train, fine-tune, and deploy generative AI models. We believe open and transparent AI systems will drive innovation and create the best outcomes for society.

Website
https://together.ai
Industry
Software Development
Company size
51-200 employees
Headquarters
San Francisco, California
Type
Privately Held
Founded
2022
Specialties
Artificial Intelligence, Cloud Computing, LLM, Open Source, and Decentralized Computing

Locations

  • Primary

    251 Rhode Island St

    Suite 205

    San Francisco, California 94103, US

Updates


    🚀 Announcing the launch of Llama 3.2 and Llama Stack on Together AI, in partnership with AI at Meta. 🎉 We are excited to offer free access to the Llama 3.2 vision model for developers to build and innovate with open source AI. Start building with the Llama-Vision-Free model today: 👉 https://lnkd.in/gWxwdaVd

    ▶ What we are launching:
    - Free Llama 3.2 Vision Model (11B): Develop and experiment with our high-quality Llama-Vision-Free endpoint for multimodal tasks.
    - Together Turbo Inference Endpoints (11B, 90B): High performance and accuracy for tasks like image captioning, visual question answering, and image-text retrieval.
    - New Llama Stack APIs: Standardized APIs to simplify building agentic and retrieval-augmented generation (RAG) conversational apps.

    ▶ Unlock powerful use cases:
    - Interactive Agents: Build AI agents that process both image and text inputs.
    - Image Captioning: Create high-quality image descriptions for e-commerce and digital accessibility.
    - Visual Search: Enable users to search via images, enhancing search efficiency for retail and e-commerce.

    ▶ Industry applications:
    - Healthcare: Accelerate medical image analysis for faster diagnostics.
    - Retail & E-commerce: Revolutionize shopping with image-based search and personalized recommendations.
    - Finance & Legal: Streamline workflows by analyzing visual and textual content to optimize contract reviews and audits.

    💡 Check out napkins.dev: Our open-source demo app uses Llama 3.2 vision to transform sketches and wireframes into React code! Try it out at https://napkins.dev

    🔧 Get started today: Experiment with the Llama-Vision-Free endpoint, or build for production with Llama 3.2 Together Turbo endpoints.

    🌟 Read more on the blog: https://lnkd.in/gDfmXxzu
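As a rough illustration of what calling the free vision endpoint might look like, here is a sketch that builds an OpenAI-compatible multimodal chat payload. The endpoint URL and the exact `meta-llama/Llama-Vision-Free` model string are assumptions based on this post, not verified against the current API docs; check the documentation before relying on them.

```python
import json

# Hypothetical: Together's OpenAI-compatible chat completions endpoint.
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_vision_request(image_url: str, question: str,
                         model: str = "meta-llama/Llama-Vision-Free") -> dict:
    """Build a multimodal chat payload: one text part plus one image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }


payload = build_vision_request("https://example.com/chart.png",
                               "What does this chart show?")
print(json.dumps(payload, indent=2))
# To actually call the API you would POST this payload to API_URL
# with an Authorization: Bearer <TOGETHER_API_KEY> header.
```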


    We’re excited to power the AI infrastructure of leading enterprises like Salesforce, The Washington Post, and Zoom. With the Together Enterprise Platform, you can securely run inference, fine-tuning, and training on your own models, in your own environment, while unlocking 2-3x faster inference speeds and reducing operational costs by up to 50%. Learn more about our enterprise offerings at our new page: together.ai/enterprise


    Announcing Together Cookbooks! 👨🍳 A collection of hands-on notebooks showcasing powerful use cases of open-source models with Together AI, including Text RAG, Multimodal Document RAG, Semantic Search, Rerankers, and Structured JSON extraction. Here's a glimpse at what's inside:

    🖼 Multimodal Document RAG with Nvidia Investor Slide Deck: Implement multimodal RAG with ColQwen2 and Llama 3.2 90B Vision, combining text and images for advanced retrieval.
    📊 Embedding Visualization: Visualize vector embeddings to explore structure in high-dimensional spaces.
    🌐 Knowledge Graphs with Structured Outputs: Generate knowledge graphs from LLMs using structured JSON generation.
    🧠 Semantic Search: Boost search precision with BERT-based embedding models for better retrieval.
    🔎 Improving Search with Rerankers: Refine search results with rerankers to enhance relevance across large document corpora.
    📰 Structured Text Extraction from Images: Extract structured text from images, ideal for document digitization and workflow automation.
    📝 Text RAG: Implement text-based Retrieval-Augmented Generation to enrich responses with relevant knowledge.

    🔗 Explore the cookbooks here: https://lnkd.in/g5E4VYqd
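The semantic-search recipe above boils down to ranking documents by embedding similarity. A minimal, self-contained sketch with made-up 3-dimensional "embeddings" (a real notebook would fetch vectors from an embedding model via the API):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy document "embeddings" (made up for illustration).
docs = {
    "gpu pricing":   [0.9, 0.1, 0.0],
    "rag tutorial":  [0.1, 0.8, 0.3],
    "vision models": [0.0, 0.3, 0.9],
}
query = [0.2, 0.9, 0.2]  # pretend embedding of "how do I build RAG?"

# Rank documents by similarity to the query embedding.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked[0])
```

A reranker would then re-score this shortlist with a stronger (query, document) cross-encoder, as in the rerankers cookbook.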


    Our CEO, Vipul Ved Prakash, joined Clara Shih, CEO of Salesforce AI, on the Ask More of AI podcast to dive into the future of generative AI and how Together AI is leading the charge. In this episode, they break down how Together AI is optimizing AI workloads to make models faster, smarter, and more efficient for real-world applications. Key innovations like FlashAttention, speculative decoding, and model quantization are highlighted, showing how we’re transforming AI workloads for greater speed, scalability, and impact. It’s an exciting look at how we’re helping businesses bring AI from pilot to production at scale. 🎧 Tune in to the conversation here: https://lnkd.in/g8HU3P4B

    Clara Shih

    CEO of Salesforce AI | Founder & Board Chair of Hearsay Systems | TIME 100 AI | WEF YGL

    As generative AI shifts from pilot to production, efficiency, cost, and scalability matter a lot more. Founded 2 years ago as "AWS for Generative AI," Together AI has raised $240M to provide cloud compute optimized for AI workloads. In this week's episode of my #AskMoreOfAI podcast, CEO/founder Vipul Ved Prakash talks about innovations to make models faster and smarter, including:

    🔹 FlashAttention: GPU-aware techniques that reduce the memory needed to compute attention and rearrange calculations to speed up inference.
    🔹 Speculative decoding: Speeds up inference by predicting multiple tokens in advance instead of one at a time, then selecting the best ones and pruning the rest.
    🔹 Model quantization: Reduces model size and speeds up inference by lowering the precision of the numerical representations used in the model without significantly degrading performance. In most LLMs, parameters are stored as 32-bit floating-point numbers, which consume a lot of memory and processing power. Quantization converts these to lower-precision formats, e.g. 16-bit floats or even 8-bit integers.
    🔹 Mixture of Agents: Combines multiple specialized models (agents) that work together, with each agent handling a different aspect of a problem, such as a sales agent, sales manager agent, deal desk agent, and legal contracts agent collaborating.

    Vipul predicts that cloud compute for #GenAI will surpass the traditional hyperscaler business within 2-3 years. Salesforce Ventures is proud to have led the Series A earlier this year, and customers running models on Together can BYOM with Einstein Model Builder. 🎧 Listen or watch here! https://lnkd.in/g6XX4KCR

    • Ask More of AI with Clara Shih, podcast episode featuring Vipul Ved Prakash, CEO and Co-founder of Together AI
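The quantization idea described in the episode can be sketched in a few lines: store weights at low precision with a shared scale, then dequantize on use. This toy symmetric int8 scheme is illustrative only; production quantizers add per-channel scales, zero points, and calibration.

```python
# Toy symmetric int8 quantization of a small weight list (made-up values).
weights = [0.12, -0.5, 0.33, 0.97, -0.81, 0.05]

# One shared scale maps the largest magnitude onto the int8 range.
scale = max(abs(w) for w in weights) / 127
q = [round(w / scale) for w in weights]   # stored as small integers
deq = [qi * scale for qi in q]            # dequantized on use

# Rounding bounds the per-weight error by half the quantization step.
max_err = max(abs(w - d) for w, d in zip(weights, deq))
print(f"max reconstruction error: {max_err:.5f} (step/2 = {scale / 2:.5f})")
```

The memory story is the point: 8-bit integers take a quarter of the space of 32-bit floats, at the cost of a bounded rounding error per weight.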

    📬 Never miss a beat! Today we are introducing "Together We Build", a newsletter with a handpicked selection of news, product launches, novel research, and AI tools from Together AI. Subscribe to keep up with the latest developments in generative AI and LLMs! And don't miss our first issue 👇

    Latest Updates: FREE Llama 3.2 Multimodal & FLUX.1 [schnell], NVIDIA H200s, and Enterprise Platform


    New work on linearizing LLMs! Like subquadratic capabilities? Like modern 7B+ LLMs? But don't have the budget to pre-train billions of parameters on trillions of tokens to get subquadratic, 7B+ LLMs? Then check out LoLCATs, our new work led by Michael Zhang that converts existing Transformers like Llama and Mistral into state-of-the-art subquadratic variants, now for the same cost as a LoRA finetune!

    LoLCATs builds on a simple framework to convert Transformers into subquadratic models:
    1. Swap an LLM's softmax attentions for more efficient alternatives.
    2. Fine-tune the LLM to adapt to these layers and recover pre-trained quality.

    However, to improve linearized LLM quality while drastically reducing the cost of this recovery, we build LoLCATs around two simple findings. First, we can learn to approximate softmax attentions with existing linear attentions. This lets us replace softmax attentions with near-literal drop-in replacements that are still subquadratic to compute. Next, this makes parameter-efficient fine-tuning like LoRA sufficient to adjust for any approximation errors and rapidly recover LM quality.

    The results speak for themselves. LoLCATs-linearized Llama 3 8Bs and Mistral 7Bs significantly outperform both prior linearized LLMs and strong subquadratic LLMs, while training only 0.2% of their parameters on 0.003-0.02% of their training tokens.

    And we did one last thing! Mostly just because we could, we used LoLCATs to linearize the entire Llama 3.1 family, delivering the first linearized 70B and 405B LLMs while significantly improving over baseline quality.

    Learn more on our blog: https://lnkd.in/gQFeZxMY
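The "swap softmax attention for a linear alternative" step rests on a regrouping identity: with a positive feature map phi, (phi(Q) phi(K)^T) V can be computed as phi(Q) (phi(K)^T V), which never materializes the n x n score matrix and is therefore linear in sequence length. A toy sketch of that identity (normalization omitted, feature map made up; LoLCATs actually *learns* the map to match softmax attention):

```python
def matmul(A, B):
    """Dense matrix multiply on lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def phi(M):
    # Made-up elementwise positive feature map (relu + 1), for illustration.
    return [[max(x, 0.0) + 1.0 for x in row] for row in M]

# Tiny single-head example: n=3 tokens, head dimension d=2.
Q = [[0.5, 1.0], [1.0, -0.5], [0.2, 0.3]]
K = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Quadratic grouping: builds the n x n score matrix first.
quadratic = matmul(matmul(phi(Q), transpose(phi(K))), V)
# Linear grouping: contracts keys with values first (a d x d matrix).
linear = matmul(phi(Q), matmul(transpose(phi(K)), V))

# Same output either way; only the computational cost differs.
assert all(abs(a - b) < 1e-9
           for ra, rb in zip(quadratic, linear)
           for a, b in zip(ra, rb))
```

Because the per-token state is a fixed d x d matrix rather than a growing score row, cost scales linearly in n, which is the "subquadratic" claim in the post.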


    In this event we'll discuss how you can perform RAG over complex PDF documents that contain images, graphs, tables, text, charts, and more! We'll describe in detail:
    - How the new image retriever ColPali works
    - How you can fine-tune ColPali to further improve it for your use case
    - How to leverage multi-vector retrieval to retrieve from PDFs
    - How to use vision language models like the new Llama 3.2 Vision series to perform document RAG

    How to Build Multimodal Document RAG with Llama 3.2 Vision and ColQwen2
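Multi-vector retrievers like ColPali score a page by late interaction: each query-token embedding takes its best dot product over the page's patch embeddings, and those per-token maxima are summed (often called MaxSim). A toy sketch with made-up 2-d vectors:

```python
def maxsim(query_vecs, page_vecs):
    """Late-interaction score: sum over query tokens of the best
    dot product against any page-patch embedding."""
    return sum(
        max(sum(qi * pi for qi, pi in zip(q, p)) for p in page_vecs)
        for q in query_vecs
    )

# Two query-token embeddings and two candidate pages (made-up 2-d vectors;
# real ColPali embeddings are one vector per query token / page patch).
query  = [[1.0, 0.0], [0.0, 1.0]]
page_a = [[0.9, 0.1], [0.2, 0.8]]  # patches that match both query tokens
page_b = [[0.1, 0.1], [0.2, 0.1]]  # mostly irrelevant patches

scores = {"page_a": maxsim(query, page_a), "page_b": maxsim(query, page_b)}
best = max(scores, key=scores.get)
print(best, scores)
```

The retrieved page images are then handed to a vision model such as Llama 3.2 Vision to answer the question, which is the pipeline the event walks through.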


    Congratulations to Braintrust on their Series A! 🎉 We've been partnering with them to showcase the performance of our models in real-world tests. Check out this Braintrust recipe showing that Llama 3.2 vision models running on Together AI are 3x faster with the same accuracy as GPT-4o-mini and GPT-4o. Read here: https://lnkd.in/gq-jDNWq

    Braintrust

    We’re thrilled to announce that we've raised a $36M Series A led by Martin Casado at Andreessen Horowitz to advance the future of AI software engineering, bringing our total funding to $45 million. Through our work with top AI engineering and product teams from Notion, Stripe, Vercel, Airtable, Instacart, Zapier, Coda, The Browser Company, and many others, we’ve had a front-row seat to what it takes to build world-class AI products. Along the way, we’ve learned a few key lessons:
    - Crafting effective prompts requires active iteration.
    - Evaluations are crucial for systematically improving quality over time.
    - Production logs provide a vital feedback loop, generating new data points that drive better evaluations.

    Evals are just the first step to building AI apps. That’s why we’re also excited to introduce functions, the flexible primitive for creating prompts, tools, and scorers that sync between your codebase and the Braintrust UI.
