TensorZero

Technology, Information and Internet

About us

Industry: Technology, Information and Internet
Company size: 2-10 employees
Type: Privately Held

Updates

  • TensorZero

    112 followers

    New Feature: Best-of-N Sampling (Inference-Time Optimization)

    TensorZero now has built-in support for Best-of-N (BoN) sampling, an inference-time optimization strategy that can significantly improve the quality of your LLM outputs, with only small changes to your TensorZero configuration (no client changes!).

    Here's how it works:

    1. Generate multiple response candidates using one or more variants (i.e., possibly using different models and prompts)
    2. Use an evaluator model to select the best response from these candidates
    3. Return the selected response as the final output

    This approach allows you to leverage multiple prompts or variants to increase the likelihood of getting a high-quality response. It's particularly useful when you want to benefit from an ensemble of variants or reduce the impact of occasional bad generations.

    We're also releasing a complete runnable example for this feature:

    Improving LLM Chess Ability with Best-of-N Sampling

    This example showcases how best-of-N sampling can significantly enhance an LLM's chess-playing abilities by selecting the most promising moves from multiple generated options.

    → With a few lines of configuration, GPT-4o Mini's success rate in chess puzzles improves from 35% to 41%.

    Enabling BoN sampling takes only a few changes to your TensorZero configuration (no client changes!). Learn more: https://lnkd.in/gKPraZtV
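
The three steps in the post above can be sketched in a few lines of plain Python. This is only an illustration of the general best-of-N idea, not TensorZero's implementation or API: the names `best_of_n`, `Variant`, and `Evaluator` are hypothetical, and the toy variants and evaluator stand in for real model calls.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration only (not part of TensorZero):
# a variant maps a prompt to a response, and an evaluator returns the
# index of the candidate it considers best.
Variant = Callable[[str], str]
Evaluator = Callable[[str, Sequence[str]], int]


def best_of_n(prompt: str, variants: Sequence[Variant], evaluator: Evaluator) -> str:
    """Return the candidate response that the evaluator selects."""
    if not variants:
        raise ValueError("at least one variant is required")
    # Step 1: generate one candidate per variant.
    candidates: List[str] = [variant(prompt) for variant in variants]
    # Step 2: ask the evaluator which candidate is best.
    best_index = evaluator(prompt, candidates)
    if not 0 <= best_index < len(candidates):
        raise ValueError("evaluator returned an out-of-range index")
    # Step 3: return the selected candidate as the final output.
    return candidates[best_index]


# Toy stand-ins for model calls (assumptions, not real models).
def terse_variant(prompt: str) -> str:
    return "e4"


def verbose_variant(prompt: str) -> str:
    return "Nf3, developing a knight and controlling the center"


def longest_answer_evaluator(prompt: str, candidates: Sequence[str]) -> int:
    # A deliberately naive scoring rule: prefer the longest candidate.
    return max(range(len(candidates)), key=lambda i: len(candidates[i]))


if __name__ == "__main__":
    print(best_of_n("Suggest an opening move.",
                    [terse_variant, verbose_variant],
                    longest_answer_evaluator))
```

In a real deployment the variants and the evaluator would each be LLM calls, and according to the post this is configured in TensorZero itself rather than written in client code.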

  • TensorZero

    We're open-sourcing TensorZero: a platform that helps LLM applications graduate from API wrappers into defensible AI products.

    1. Integrate our model gateway
    2. Send metrics or feedback
    3. Unlock compounding improvements in quality, cost, and latency

    TensorZero enables a data & learning flywheel for LLMs by unifying:

    • Inference: one API for all LLMs, with <1ms P99 overhead
    • Observability: inference & feedback → your database
    • Optimization: better prompts, models, inference strategies
    • Experimentation: built-in A/B testing, routing, fallbacks

    Our goal is to help engineers build, manage, and optimize the next generation of LLM applications: AI systems that learn from real-world experience.

    You can find us on GitHub: https://lnkd.in/gTznVUGD

    Next steps?

    The Quick Start (5min) and the Tutorial show it's easy to set up an LLM application with TensorZero. The tutorial teaches how to build a simple chatbot, an email copilot, a weather RAG system, and a structured data extraction pipeline.
    https://lnkd.in/g4b2JzD6
    https://lnkd.in/giKA2snJ

    We are working on a series of complete runnable examples illustrating TensorZero's data & learning flywheel.

    1. Writing Haikus to Satisfy a Judge with Hidden Preferences
    This example fine-tunes GPT-4o Mini to generate haikus tailored to a specific taste. You'll see TensorZero's "data flywheel in a box" in action: better variants lead to better data, and better data leads to better variants. You'll see progress by fine-tuning the LLM multiple times.
    https://lnkd.in/gmNtKVSw

    2. Fine-Tuning TensorZero JSON Functions for Named Entity Recognition (CoNLL++)
    This example shows that an optimized Llama 3.1 8B model can be trained to outperform GPT-4o on an NER task using a small amount of training data, and served by Fireworks AI at a fraction of the cost and latency.
    https://lnkd.in/gxmbDshh

    3. Automated Prompt Engineering for Math Reasoning (GSM8K) with a Custom Recipe (DSPy)
    TensorZero provides a number of pre-built optimization recipes covering common LLM engineering workflows. But you can also easily create your own recipes and workflows! This example shows how to optimize a TensorZero function using an arbitrary tool: here, DSPy.
    https://lnkd.in/g3E7CVvm

    & many more on the way!
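
The "A/B testing, routing, fallbacks" item in the post above can be illustrated with a small, generic sketch: pick a variant at random according to traffic weights, and fall back to the remaining variants if the chosen one fails. This is an assumption about how such routing might look in general, not TensorZero's actual behavior or API; `route_with_fallback` and the toy variants are hypothetical names.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical type for illustration only: a variant maps a prompt to a response.
Variant = Callable[[str], str]


def route_with_fallback(
    prompt: str,
    variants: Dict[str, Variant],
    weights: Dict[str, float],
    rng: Optional[random.Random] = None,
) -> Tuple[str, str]:
    """Sample a variant by weight; on failure, retry among the remaining ones.

    Returns (variant_name, response). Weights are assumed to be positive.
    """
    rng = rng or random.Random()
    remaining = dict(weights)
    last_error: Optional[Exception] = None
    while remaining:
        names = list(remaining)
        # Weighted random choice among the variants not yet tried.
        name = rng.choices(names, weights=[remaining[n] for n in names], k=1)[0]
        try:
            return name, variants[name](prompt)
        except Exception as err:
            # Drop the failed variant and fall back to the others.
            last_error = err
            del remaining[name]
    raise RuntimeError("all variants failed") from last_error


# Toy stand-ins for model calls (assumptions, not real providers).
def unavailable_variant(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def healthy_variant(prompt: str) -> str:
    return "response to: " + prompt


if __name__ == "__main__":
    chosen, response = route_with_fallback(
        "hello",
        {"a": unavailable_variant, "b": healthy_variant},
        {"a": 0.9, "b": 0.1},
    )
    print(chosen, response)
```

Returning the variant name alongside the response is a design choice for the sketch: it lets later feedback be attributed to the variant that actually served the request, which is what an A/B comparison needs.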
