"A Recipe for a Better AI-based Code Generator" explores how Pulumi Copilot uses Retrieval-Augmented Generation (RAG) to combine #LLMs with real-time provider data, ensuring accurate, up-to-date IaC code. Learn how we balance precision, recall, and self-debugging to empower modern infrastructure management. Learn more at https://hubs.ly/Q031XD7R0 #RAG #OpenAI #ArtificialIntelligence #SoftwareEngineering
Pulumi’s Post
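For readers new to the pattern: the sketch below is a generic, illustrative RAG loop in Python, not Pulumi Copilot's actual pipeline. The corpus, retrieval heuristic, and model name are placeholders; a real system would run vector search over live provider schemas rather than keyword overlap.

```python
# Illustrative RAG loop: retrieve provider schema snippets, then generate code
# grounded in them. Assumes OPENAI_API_KEY is set; all data here is made up.
from openai import OpenAI

provider_docs = {
    "aws:s3/bucket:Bucket": "S3 bucket resource; inputs: bucket, acl, tags ...",
    "aws:ec2/instance:Instance": "EC2 instance resource; inputs: ami, instanceType ...",
}

def retrieve(query: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap scoring standing in for real vector search."""
    tokens = query.lower().split()
    scored = sorted(
        provider_docs.items(),
        key=lambda kv: -sum(tok in kv[1].lower() for tok in tokens),
    )
    return [f"{name}: {doc}" for name, doc in scored[:k]]

def generate_iac(query: str) -> str:
    """Ground the LLM in retrieved schema snippets before it writes code."""
    context = "\n".join(retrieve(query))
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {"role": "system", "content": "Generate Pulumi code using only the schema context provided."},
            {"role": "user", "content": f"Schema context:\n{context}\n\nRequest: {query}"},
        ],
    )
    return resp.choices[0].message.content

print(generate_iac("create an S3 bucket with tags"))
```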
More Relevant Posts
-
How do you deploy a multi-adapter LLM on Kubernetes? Excited to share a new example on how to deploy Gemma 2 with multiple LoRA adapters on Google Kubernetes Engine (GKE) using Hugging Face Text Generation Inference (TGI), enabling efficient and scalable multi-task inference. 👀

TL;DR:
🚀 Unlock Multi-task Power: Deploy a single Gemma 2 model with various LoRA adapters for diverse tasks like coding, SQL, and translation.
☁️ Simplified GKE Deployment: Easy-to-follow instructions for setting up and configuring a GKE Autopilot cluster with GPU support.
🔒 Secure Token Management: Securely manage Hugging Face Hub tokens using Kubernetes secrets.
⚙️ Effortless TGI Deployment: Deploy TGI using kubectl and the provided configuration files.
💬 Flexible Inference: Perform inference via cURL or the OpenAI SDK, specifying the desired LoRA adapter for each task (see the sketch after this list).
💰 Cost-Effective Solution: Minimize costs by using a single base model with multiple LoRA adapters and by deleting the GKE cluster or downscaling the pod after use.
✨ Why Multi-LoRA? Understand the key benefits of multi-LoRA inference, such as cost savings, scalability, and simplified management.

Tutorial: https://lnkd.in/eS8EW8GV [Cloud AI Tuesday: #10]
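A rough sketch of the "Flexible Inference" step via the OpenAI SDK. The endpoint URL and adapter IDs below are placeholders, and it is an assumption (based on how TGI's multi-LoRA serving is commonly described) that the OpenAI-compatible route selects the adapter through the model field; check the tutorial for the exact values.

```python
from openai import OpenAI

# Placeholder endpoint: the GKE Service's external IP and TGI's exposed port.
client = OpenAI(base_url="http://<EXTERNAL_IP>:8080/v1", api_key="-")  # TGI ignores the key

# Placeholder adapter IDs, matching whatever adapters were registered at TGI launch.
requests = [
    ("sql-adapter", "Write a SQL query that counts users per country."),
    ("code-adapter", "Write a Python function that reverses a list."),
]

for adapter_id, prompt in requests:
    resp = client.chat.completions.create(
        model=adapter_id,  # assumption: the model field routes the request to the LoRA adapter
        messages=[{"role": "user", "content": prompt}],
        max_tokens=128,
    )
    print(adapter_id, "->", resp.choices[0].message.content)
```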
-
If you have seen my last two posts, you have seen how Dagster Labs helps machine learning engineers iterate faster with their integrations to Modal. Following that, Kyryl Truskovskyi, who consults and teaches machine learning courses, sent me a case study of this in action. Here is my summary:

RLHF is a machine learning technique where models learn based on feedback from humans. This process is iterative and requires constant updates, making automation and scalability crucial.

How Dagster Labs fits in: Dagster is used to orchestrate RLHF workflows. It organizes tasks into "assets" and tracks dependencies, ensuring that data collection, model training, and evaluation happen smoothly in sequence, essentially making the pipeline easy to manage and monitor.

The role of Modal: Modal provides scalable, serverless infrastructure. RLHF often requires running multiple resource-heavy tasks simultaneously, and Modal's serverless platform takes care of that automatically, scaling up when needed and saving time.

Combining Dagster and Modal: Dagster orchestrates the pipeline while Modal executes the heavy compute tasks. This combination simplifies the workflow and enables faster iteration on machine learning models, making RLHF more manageable for engineers.

In short, Dagster and Modal work together to automate and scale RLHF pipelines, offering an efficient solution for handling complex machine learning tasks. See my other posts in the comments, and reach out to me to schedule a call to evaluate Dagster's orchestration tool further. https://lnkd.in/gJMvMxEp
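A hypothetical sketch of the pattern described above: a small Dagster asset graph where the GPU-heavy training step is delegated to a function already deployed on Modal. The app name, function name, and dataset path are made up for illustration; the case study's actual code will differ.

```python
import modal
from dagster import asset

@asset
def preference_dataset() -> str:
    """Collect and store human feedback; returns a URI to the labeled data."""
    return "s3://feedback-bucket/preferences.parquet"  # placeholder

@asset
def tuned_model(preference_dataset: str) -> str:
    """Hand the resource-heavy RLHF training step to Modal's serverless GPUs."""
    train_fn = modal.Function.lookup("rlhf-app", "train_model")  # hypothetical deployed app/function
    return train_fn.remote(dataset_uri=preference_dataset)
```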
-
RAG (Retrieval-Augmented Generation) framework for building modular, open-source applications for production. LangChain/LlamaIndex provide easy-to-use abstractions for quick experimentation and prototyping in Jupyter notebooks. But when things move to production, there are constraints: components should be modular, easily scalable, and extensible. This is where Cognita comes into play. Cognita uses LangChain/LlamaIndex under the hood and brings organization to your codebase, where each RAG component is modular, API-driven, and easily extensible. Cognita can be used easily in a local setup while also offering a production-ready environment along with no-code UI support. Cognita also supports incremental indexing by default. #genai #llm #framework #rag
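To make the "modular, API-driven, easily extensible" idea concrete, here is a small illustrative pattern in plain Python. This is not Cognita's real API, just the shape of a component interface plus a registry that lets implementations be swapped without touching the rest of the application.

```python
from abc import ABC, abstractmethod

class Retriever(ABC):
    """One stage of the RAG pipeline behind a small, swappable interface."""
    @abstractmethod
    def retrieve(self, query: str, k: int = 3) -> list[str]: ...

class KeywordRetriever(Retriever):
    def __init__(self, docs: list[str]):
        self.docs = docs

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        terms = query.lower().split()
        return sorted(self.docs, key=lambda d: -sum(t in d.lower() for t in terms))[:k]

# A registry keeps components pluggable: new implementations just register here.
RETRIEVERS = {"keyword": KeywordRetriever}

retriever = RETRIEVERS["keyword"](
    ["Cognita supports incremental indexing by default.", "RAG pairs retrieval with generation."]
)
print(retriever.retrieve("incremental indexing", k=1))
```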
-
We are excited to announce that the vLLM inference provider is now available in Llama Stack through the collaboration between the Red Hat AI Engineering team and the Llama Stack team from Meta! Check out my recent article with Ashwin Bharambe on the vLLM blog, which provides an introduction and tutorial to help you get started using it locally or deploying it in a Kubernetes cluster.
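The Llama Stack wiring is covered in the article; as a quick local smoke test, a vLLM OpenAI-compatible server can be queried with the standard OpenAI client. The port and model name below are examples, not taken from the article.

```python
from openai import OpenAI

# Assumes a server started with something like: vllm serve meta-llama/Llama-3.1-8B-Instruct
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # vLLM's default port

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Say hello from vLLM."}],
)
print(resp.choices[0].message.content)
```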
-
My YouTube feed is full of thumbnails saying “You won’t believe what this Agent can do”, “Shocking: Llama 3 gets us closer to AGI”, etc. Perfect style transfer from the MrBeast clickbait playbook, but it doesn’t speak to the reality of where we stand.

Arthur C. Clarke famously said, “Any sufficiently advanced technology is indistinguishable from magic.” I recently wrote a memo to my team at Houseware and asked them, “Do agents feel magical yet?” Heard back a resounding “NOT YET!”

There’s a lot to be excited about, but there’s a lot that’s yet to be built. Robustness is yet to be cracked, especially when you start getting into automating business processes. LangGraph and DSPy get us closer, but there’s still an abstraction layer that needs to be baked in to improve the dev experience.

Here’s how I’m thinking: DSPy’s concept of signatures in the Chain-of-Thought module solves for brittleness in prompting, but more importantly, it abstracts away the stupid prompt engineering that I had to do earlier. The agent toolkit needs a similar abstraction. I don’t want to be creating twenty-five nodes and landing myself in a tangled mess of edges. There’s the other extreme too: CrewAI oversimplified it and took away too much control. Building agents should be simpler!

Let's talk about costs lol. Came across a tweet from Jared Palmer over the weekend: "When it comes to agents, running the current frontier models in a loop is comical at current price points. It’s ngmi. At minimum, new middleware is going to be needed to properly select model size for the task(s) at hand, queue + batch process, and efficiently pass around state, files, and memory." ^^100%. Looking at the OpenAI bills on our end, extrapolating this pattern to production use cases in enterprises doesn’t make any sense right now! Figuring this out.

We’re building out end-user-facing agents and learning which patterns work best for which use cases. Eventually, the right abstractions will show up. Stare at the agents, and the abstractions stare back?
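For readers who haven't seen the DSPy abstraction referenced above, here is a toy example of a signature wrapped in ChainOfThought. The model string and task are illustrative, not from the post.

```python
import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))  # placeholder model

class TicketTriage(dspy.Signature):
    """Classify a support ticket."""
    ticket: str = dspy.InputField()
    category: str = dspy.OutputField(desc="one of: billing, bug, feature_request")

# ChainOfThought adds an intermediate reasoning step without hand-written prompts.
triage = dspy.ChainOfThought(TicketTriage)
pred = triage(ticket="I was charged twice for my subscription this month.")
print(pred.category)
```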
-
Just merged my latest contribution to the Terraform Provider for OpenAI: added support for Structured Outputs in the Assistants resource, enabling even more powerful automation and control over your GenAI assistants driven by Terraform. This feature is now available in version v1.5.1! 🚀 Check out the details and get started here: https://lnkd.in/gzVUYPdc Learn more about Structured Outputs in the OpenAI API: https://lnkd.in/gQzTH3iT Shaun Stuart Milan Brown HashiCorp #genai
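The resource itself is configured in HCL, but the underlying feature is the OpenAI Structured Outputs API. As background, here is a minimal Python example of the same response_format mechanism against the Chat Completions API; the schema and model are illustrative, and the Assistants resource configures an analogous response_format.

```python
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # a model with Structured Outputs support
    messages=[{"role": "user", "content": "Extract the event: dinner with Sam on Friday at 7pm."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["title", "day", "time"],
                "additionalProperties": False,
            },
        },
    },
)
print(resp.choices[0].message.content)  # JSON conforming to the schema above
```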
-
I am excited to write this blog together with (Kevin) Huan-Ping Su from Union.
- Anyscale is the best place to run Ray workloads, and Ray is the go-to solution for ML and platform engineers to manage the end-to-end lifecycle of ML workloads as a unified computing engine.
- Union.ai is the best place to orchestrate Flyte pipelines, and Flyte is a great solution for addressing gaps within ML pipelines, such as managing task lifecycles for infra engineers and accessing data from data warehouses.
This blog dives into a RAG example to show the perfect marriage between Anyscale and Union. Sign up for Ray Summit (raysummit.anyscale.com) to learn more about Ray!
In the lead-up to #Raysummit, we have a guest blog on Building a RAG Batch Inference Pipeline with Anyscale and Union.ai 🚀

#Flyte is an open-source orchestrator facilitating production-grade data & ML pipelines. When combined with Ray's distributed computing power, it unites data, platform, infrastructure, and ML engineers to boost productivity and scale across evolving use cases. The Union.ai platform makes orchestrating these pipelines seamless.

The RAG batch inference example actually consists of two pipelines that Union orchestrates (sketched in code below):
🔹 Embedding Generation Pipeline
🔹 Batch Inference Pipeline

#Flyte Deck on Union.ai lets you preview responses without downloading them, ensuring instant validation. Anyscale's Ray platform optimizes the execution of these pipelines, helping deliver leading performance and cost efficiency.

Read more here 👉 https://lnkd.in/gjDKRPmu
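For a feel of the shape of those two pipelines, here is a hypothetical Flyte sketch with stubbed task bodies. Task names and logic are placeholders; the real example delegates the heavy embedding and inference work to Ray on Anyscale.

```python
from typing import List
from flytekit import task, workflow

@task
def generate_embeddings(docs: List[str]) -> List[List[float]]:
    # Placeholder: the real pipeline computes embeddings (e.g. with Ray on Anyscale).
    return [[float(len(d))] for d in docs]

@task
def batch_inference(docs: List[str], embeddings: List[List[float]]) -> List[str]:
    # Placeholder: retrieve context per document and run batched LLM inference.
    return [f"answer grounded in {len(embeddings)} embedded docs" for _ in docs]

@workflow
def rag_batch_pipeline(docs: List[str]) -> List[str]:
    embeddings = generate_embeddings(docs=docs)
    return batch_inference(docs=docs, embeddings=embeddings)
```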
-
With MindsDB you can build AI-powered applications easily, even with no AI/ML experience. Follow along to learn how to set up MindsDB in Docker Desktop.
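As a rough idea of what that setup looks like once the container is running (the image name, port, and SDK calls are based on MindsDB's public docs; double-check them against the walkthrough):

```python
# After something like `docker run -p 47334:47334 mindsdb/mindsdb` (see the guide
# for the exact command), connect to the local instance with the Python SDK:
import mindsdb_sdk

server = mindsdb_sdk.connect("http://127.0.0.1:47334")
print([db.name for db in server.list_databases()])  # quick check that the server is up
```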
-
🎯 MLOps project: Time for inference

Continuing the development of our MLOps project within the MLOps Zoomcamp by DataTalksClub, we must analyze the requirements of our business process and consider what type of inference we should provide and what platform to build it on.

🏗 In our scenario, we will continue to rely on the Mage AI orchestration platform by Mage, where we have already built the project's other pipelines. This lets us integrate them easily and take advantage of the extraordinary capabilities the tool offers.

Initially, we will consider the following scenarios:
📚 Batch or offline inference: we will create a flow that collects input data, prepares it for our model, and performs the predictions. Finally, we will return the results by storing them in a repository shared with the requester.
📤 Online inference: on the one hand, we can build a simple flow in the Mage platform that performs a prediction and then activates an API trigger that allows us to publish the endpoint. We also consider the alternative of providing an independent endpoint that gives us greater scalability and flexibility through a REST API in Flask or FastAPI, or even a UI in Streamlit (a minimal sketch follows this post).

🕵️‍♂️ In addition, since we are going to monitor the performance of our model for issues such as data drift, the inference process must include a final step that collects and stores the data received at inference time. Later, we will retrieve it in the monitoring pipeline to perform the corresponding analysis.

⚙ The process of designing and building our end-to-end MLOps workflow keeps advancing, and we are covering the most relevant components of this subject. #mlopszoomcamp #mlops #machinelearning
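A minimal sketch of the standalone online-inference option mentioned above, as a FastAPI endpoint around a stubbed model. The feature schema and prediction logic are invented for illustration; run it with a server such as uvicorn.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RideFeatures(BaseModel):  # hypothetical feature schema
    distance_km: float
    passenger_count: int

def predict(features: RideFeatures) -> float:
    # Stand-in for the trained model loaded from the model registry.
    return 2.5 + 1.1 * features.distance_km

@app.post("/predict")
def predict_endpoint(features: RideFeatures) -> dict:
    prediction = predict(features)
    # In the project, the incoming payload would also be stored here for later
    # data-drift analysis in the monitoring pipeline.
    return {"predicted_duration_min": prediction}
```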
-
Managing scalable infrastructure for machine learning is challenging enough as it is. Nobody wants to spend hours writing Kubernetes YAML or configuring GPU operators. In our latest #Dagster Deep Dive, Colton P. (Dagster Labs) and Charles Frye (Modal) showed us how using Dagster and Modal together lets you orchestrate ML workflows and scale infrastructure without the added complexity. Read the full recap here: https://bit.ly/3BpoZbr