I think what a lot of people have intuitively figured out, but haven't noticed explicitly, is that using AI for greenfield projects feels much more useful than using it in an established codebase. From what I've seen, there are two main reasons for this:

1. Experienced engineers often work on systems that involve many different, interdependent parts. Current AI tools just aren't built for this kind of task.
2. AI models are trained on a broad range of data, which doesn't always match up with the specific, deep knowledge that experienced devs have built up over years. New devs are lifted up while experienced devs are weighed down.

I'm going to focus on that first point in this post, because I think it's in part what's allowing less experienced devs to see gains that more experienced devs aren't. AI models are getting pretty damn good, to the point where using Claude 3.5 rarely leaves me wanting more. AI tooling is the exact opposite. Working on greenfield projects that have grown, I've started to run into problems: it's becoming increasingly harder to give the AI enough context to get a good response. The changes I'm requesting touch more parts of the codebase, and it's tough to include all the relevant bits. For any given change to my web projects (Django, for example), if I want a solution quickly I need:

1. The relevant HTML
2. Any blocks of other content I'm including
3. The relevant CSS
4. The relevant JS
5. Sometimes an example of a similar feature implemented in another HTML, CSS, or JS file, to maintain consistency
6. The view
7. Any relevant imports
8. Similar views that may have implemented patterns like the one I need
9. Any other functions that the view calls
10. The URL structure
11. Any schemas that might be relevant
12. The database models

And that's not even counting things like repo structure, ownership, git diffs, or (for more complicated scenarios) call graphs. More relevant context means better AI output, but getting that context is a pain, and for best results it should all be in a single message.

I got fed up with this and made a Neovim shortcut that collects these snippets in a haphazard kind of way: it grabs code snippets and file info, and generates a file structure at the top of a temporary buffer based on the files the snippets were grabbed from. It's not perfect, but it helps get more context to the AI without spending ages adding all the metadata. Just by using this, there has been a noticeable improvement in how often I'm able to get zero-shot solutions out of Claude 3.5. At this point I'm just doing manual, informed RAG. I'd like to automate this process, so to that end I'm asking: "How can I automatically find all of the snippets that are relevant to the feature I'm trying to implement?" I cover the rest of my thoughts on this in a post on my blog: https://lnkd.in/gtAmyx7a
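To make that "manual, informed RAG" concrete, here's a rough Python sketch of the kind of context bundle I mean: a file-tree header plus the grabbed snippets, concatenated into one prompt-ready blob. The file paths, line ranges, and helper names are hypothetical placeholders; my actual shortcut lives in Neovim and is far messier.

```python
from pathlib import Path

# Hypothetical example: the snippets the shortcut would have grabbed by hand.
SNIPPETS = [
    ("app/templates/orders/detail.html", (1, 40)),
    ("app/static/js/orders.js", (10, 60)),
    ("app/views/orders.py", (1, 80)),
    ("app/models.py", (120, 180)),
    ("app/urls.py", (1, 30)),
]

def read_snippet(path: str, line_range: tuple[int, int]) -> str:
    """Return the requested lines from a file, tagged with the path they came from."""
    start, end = line_range
    lines = Path(path).read_text().splitlines()[start - 1 : end]
    return f"--- {path} (lines {start}-{end}) ---\n" + "\n".join(lines) + "\n"

def file_tree(paths: list[str]) -> str:
    """A minimal 'repo structure' header listing just the files snippets came from."""
    return "Project files referenced:\n" + "\n".join(f"  {p}" for p in sorted(set(paths)))

def build_context(snippets) -> str:
    header = file_tree([path for path, _ in snippets])
    bodies = [read_snippet(path, line_range) for path, line_range in snippets]
    return header + "\n\n" + "\n".join(bodies)

if __name__ == "__main__":
    # Paste the result into a single message to the model.
    print(build_context(SNIPPETS))
```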
Quinten L.’s Post
More Relevant Posts
-
Pulumi Templates for GenAI Stacks: Pinecone, LangChain First

To build a Generative AI application, you typically need at least two components to start with: a Large Language Model (LLM) and a vector data store. You probably need some sort of frontend component as well, such as a chatbot. Organizations jumping into the GenAI space are now facing an orchestration challenge: they find that moving these components from the developer's laptop to the production environment can be error-prone and time-consuming.

To ease deployments, Infrastructure as Code (IaC) software provider Pulumi has introduced "providers," or templates, for two essential GenAI tools, namely the Pinecone vector database and a version of the LangChain framework for building LLM applications. "We find a lot of the tools out there, like LangChain, are great for local development. But then when you want to go into production, it's left as a DIY exercise," said Joe Duffy, CEO and co-founder of Pulumi, in an interview with TNS. "And it's very challenging because you want to architect for infinite scale so that as you see success with your application, you're able to scale to meet that demand. And that's not very easy to do."

Specifically, Pulumi is supporting the serverless version of Pinecone on AWS, which was unveiled in January, and support for LangChain comes through LangServe, a container management service built on Amazon ECS. The two templates join a portfolio that covers over 150 cloud and SaaS service providers, including many others used in the GenAI space, such as Vercel Next.js for the frontend and Apache Spark. In addition to the templates themselves, Pulumi also mapped out a set of reference architectures that use Pinecone and LangChain.

How to Build a GenAI Stack Using IaC

The idea is that the AI professional, who may not have operations experience, can define and orchestrate an ML stack with Pulumi, using Python or another language. As an IaC solution, Pulumi provides a way to declaratively define an infrastructure. Unlike other IaC approaches, Pulumi lets developers build out their environment using any one of a number of programming languages, such as Python, Go, Java and TypeScript. The deployment engine then provisions the defined environment, and can even check that the operational state stays in sync with the defined state.

The GenAI reference architectures have been designed with best practices in mind, Duffy said. "A lot of the challenge is how to make this scalable: scalable across regions and scalable across subnets and networks. And so this blueprint is built for configurable scale."

This is not Pulumi's first foray into managing AI infrastructure. The company has already developed modules for AWS SageMaker and Microsoft's Azure OpenAI service. There is also a blueprint for deploying an LLM from Hugging Face on Docker, Azure, or Runpod. Of course, the company has plans to further expand the roster going forward. "We're seeing a lot...
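To give a sense of what the IaC side looks like, here is a minimal sketch of declaring a serverless Pinecone index on AWS with Pulumi's Python SDK. The resource and argument names follow my reading of the Pulumi Pinecone provider and may not match the official templates or reference architectures exactly, so treat it as an illustration rather than the documented blueprint.

```python
import pulumi
import pulumi_pinecone as pinecone  # Pulumi's Pinecone provider (assumed installed and configured)

# Declare a serverless Pinecone index on AWS. Argument names are approximate;
# check the provider docs before copying this into a real stack.
index = pinecone.PineconeIndex(
    "rag-index",
    name="rag-index",
    dimension=1536,          # should match the embedding model you plan to use
    metric="cosine",
    spec=pinecone.PineconeSpecArgs(
        serverless=pinecone.PineconeServerlessSpecArgs(
            cloud="aws",
            region="us-west-2",
        ),
    ),
)

# Export the index host so the application layer (e.g. a LangServe container) can reach it.
pulumi.export("index_host", index.host)
```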
-
Here's how I think about the software stack for LLM inference, from a JS/TS dev point of view. There are 6 levels that build on one another:

1) The model: the actual model that will be executed at inference time. Sometimes it's the provider's own models (e.g. GPT-4 et al. for OpenAI), sometimes you can choose yourself (download different GGUF files and run them with llama.cpp). When I say model, I put fine-tunes, base models, and LoRAs all in the same bucket for this post: it's the weights that are being used to infer the next token.

2) The model execution engine (model backend): the models need to be run in some runtime environment to process inputs and produce tokens. Some providers have their own engines for their own models (OpenAI, AnthropicAI), others let you run open-source models in the cloud (e.g. FireworksAI), and then there are engines that you can use locally (llama.cpp). The engine needs to support the architecture of the model. Some providers wrap existing open-source engines; e.g. Ollama uses llama.cpp.

3) The API: the models are mostly exposed through REST APIs. With llama.cpp, you can use bindings. With WebLLM, you can run in the browser.

4) The client library: various options here. Many providers standardize on the OpenAI client library these days, but others choose to have different libs (e.g. Mistral, Google, Anthropic, Ollama). With llama.cpp you can use bindings in various languages, including JS (Node bindings), or clients for the llama.cpp server.

5) The orchestration framework: handles how you integrate LLMs into apps, e.g. for chat, retrieval-augmented generation (in combination with vector stores and embeddings), agents, etc. llama_index and LangChainAI are examples of orchestration frameworks.

6) UI integration: most JavaScript apps are client/server apps with a web frontend. It's important to move information from the server (where the API keys are) to the client, ideally with streaming. The Vercel AI SDK is an example of a UI integration library for AI.

This means that there are 3 types of LLM providers:

A) Integrated providers (such as OpenAI, GoogleAI, Anthropic): they train and host their own proprietary models, have their own execution engines and their own API, and provide client libraries to work with their models.

B) Open-source cloud providers (such as Fireworks, Anyscale, TogetherAI): they host open-source models (and often your own models) and provide a standardized API (often OpenAI-compatible).

C) Local model providers (such as llama.cpp, Ollama, WebLLM): you download and run the model on your machine. Some have their own client (e.g. Ollama).

Right now the orchestration frameworks and the UI integration are separate from the backend LLM provider stack.

Do you agree? How do you see these components evolve?
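A concrete way to see levels 3 and 4 in action: because many providers expose OpenAI-compatible endpoints, the same client library can talk to an integrated provider (type A) or a local server (type C) just by swapping the base URL. A small Python sketch (the post is JS/TS-flavoured, but the idea carries over directly); the model names and the local URL are examples, not recommendations:

```python
from openai import OpenAI  # the de-facto standard client library (level 4)

# (A) Integrated provider: OpenAI's own API and models.
cloud = OpenAI()  # reads OPENAI_API_KEY from the environment

# (C) Local provider: Ollama exposes an OpenAI-compatible REST API (level 3) on localhost.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")  # key is ignored locally

def ask(client: OpenAI, model: str, question: str) -> str:
    """Same call shape regardless of which backend actually runs the model (level 2)."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

print(ask(cloud, "gpt-4o-mini", "Name one advantage of the GGUF format."))
print(ask(local, "llama3", "Name one advantage of the GGUF format."))
```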
-
Solid assessment, Lars. I'd add a few more LLM "providers" (or I'd call them "layers").

Regarding the (B) layer, we're also seeing inference run at the edge in addition to more centralized cloud providers. By this I mean CDN networks (Fastly/Cloudflare) that can run inference at an edge node in order to lower latency a bit for the end client. That said, shaving off a few milliseconds of latency is pretty marginal given compute time is the largest bottleneck on response latency. There are other advantages inference at the edge could provide, though, like caching similar responses, etc.

Regarding the (C) layer, I think that's gonna expand a fair amount to basically being an "embedded LLM" layer. Llama etc. needs a pretty beefy machine to perform. Seems like there will be a future where IoT devices have smaller specialized models embedded for certain niche tasks, and then for more compute-heavy tasks they will cascade up a chain: going to the (B) layer, and failing that, going to a SOTA (A) layer.

Lastly, there's also the possibility of an "on-premise" layer to get inference closer to the end client/IoT device but still have beefier compute. But that only makes sense if bandwidth is the bottleneck (i.e. video, not text).
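The cascade idea in that last part is easy to sketch: try the small embedded model first and only escalate when it can't answer confidently. A hypothetical Python sketch; the tiers and helpers are placeholders rather than any real SDK:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Tier:
    """One rung of the cascade: a name plus a callable that may decline to answer."""
    name: str
    ask: Callable[[str], Optional[str]]  # returns None if this tier can't answer confidently

def cascade(question: str, tiers: list[Tier]) -> str:
    """Walk up the chain: embedded model -> edge/cloud open model -> SOTA provider."""
    for tier in tiers:
        answer = tier.ask(question)
        if answer is not None:
            return f"[{tier.name}] {answer}"
    return "No tier could answer."

# Placeholder tiers for illustration; in practice each would wrap a real client.
tiers = [
    Tier("embedded", lambda q: "42" if len(q) < 20 else None),          # tiny on-device model
    Tier("edge",     lambda q: "edge answer" if "video" not in q else None),
    Tier("sota",     lambda q: "full-size model answer"),               # always answers, costs most
]

print(cascade("2 + 2?", tiers))
```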
-
Agentic Engineer | AI Consultant | Multimodal LLMs | Researcher on Latest AI Papers | Natural Intelligence working on Artificial Intelligence
🚀 Exciting Advances in AI for Long-Context QA and Fine-Grained Citations! 🚀

Our latest project focuses on enabling Long-Context Large Language Models (LLMs) to generate precise citations in Long-Context Question Answering (QA) tasks. Here are some key highlights:

1. **Robust Tech Stack**: Leveraging React.js with TypeScript for the frontend and Node.js with Express for the backend, our infrastructure is designed for scalability and performance. The PostgreSQL database ensures robust data management, while Docker and Kubernetes streamline deployment.
2. **Seamless Authentication**: Implementing OAuth 2.0 ensures secure user authentication, integrating smoothly across both frontend and backend components.
3. **Advanced LLM Integration**: Utilizing Hugging Face Transformers within our backend, we can harness state-of-the-art language models for accurate and context-rich QA responses.
4. **Comprehensive Benchmarking**: By adopting LongBench-Cite, we have established rigorous benchmarking protocols, focusing on correctness and citation quality to ensure the highest standards.
5. **Dynamic Data Visualization**: Our frontend incorporates D3.js for intuitive and interactive data visualizations, providing clear insights into citation quality and correctness metrics.

This project represents a significant leap forward in the application of LLMs for complex QA tasks, ensuring not only accuracy but also the reliability of information through fine-grained citations.

📈 Ready to dive deeper? Watch the full video to explore how we are pushing the boundaries of AI in QA: [Full Paper Link](https://lnkd.in/diittEAY)

#AI #MachineLearning #LLM #QA #TechInnovation #DataScience #DeepLearning #ReactJS #NodeJS #HuggingFace #Kubernetes #Docker #DataVisualization https://lnkd.in/dqjqTbAv
-
Who wants to pair with me on building a cool side-project for software developers? Here's the big idea:

Developers often spend significant amounts of time trying to understand large and/or unfamiliar codebases, tracing through code history to debug issues and searching for relevant code. Git commit history can be a treasure trove of valuable information, but more often than not it is completely useless: in reality, most commit messages look like this: "fixed a bug". Yay.

An AI-powered tool to re-analyze and re-summarize Git commits could dramatically improve developer productivity. Leveraging an LLM to guess the intent of and summarize the code changes in each commit, even if far from perfect, would likely make the codebase much more accessible and understandable. Developers new to the project can quickly get up to speed on an unfamiliar codebase, with more historical context and a timeline that provides valuable insights.

But wait, there's more. When debugging, it's often unclear which commit introduced a bug. Searching an index of AI-enhanced commit descriptions could help zero in on likely culprit commits much faster than manual search. Semantic code search powered by an LLM-generated index could allow more natural-language queries to find relevant code snippets and understand how code evolved.

For companies, this could mean faster and more productive onboarding of new developers, less time wasted on unnecessary code archeology, and ultimately fewer bugs. It could be a powerful productivity multiplier for both new and experienced devs.

My proposal for the technical approach, in a nutshell:
* Use an open-source or proprietary pre-trained code-aware model (e.g. Code Llama, OpenAI Codex, or just GPT-4 / Claude)
* To process a new codebase, go through all the branches and, for each commit, extract the diff and commit message
* Chunk the diff into manageable pieces if needed (modern models have rather large context windows, but still, some commits are monsters)
* Prompt the LLM with each diff chunk + commit message (plus the project description for context), asking it to summarize the likely intent of the code changes in concise natural language
* Aggregate the LLM's responses into a commit-level intent summary
* Index these summaries along with the diffs and commit metadata in a search engine like Elastic
* Build a clean, simple UI to search and explore this data, allowing queries in natural language that get embedded and semantically matched against the index. ALSO provide a CLI tool for easy command-line access.

Additional Ideas:
* Augment the LLM prompts with info from linked pull requests, issues, and documentation to get better context.
* Identify and surface insightful trends, e.g. files changed together frequently, common bug patterns, etc.
* Integrate with IDEs to surface relevant commits in-context. Develop plugins for PyCharm, VS Code, etc.
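Here is a rough Python sketch of the core loop from the technical approach above: walk the commit history, chunk each diff, ask an LLM for an intent summary, and collect the results for indexing. The prompt wording, model name, and chunk size are arbitrary choices, the indexing step is left as a comment, and it assumes the git CLI and the OpenAI Python SDK are available.

```python
import subprocess
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY; any chat-capable model would do
CHUNK_CHARS = 12_000  # crude character-based chunking; real code would count tokens

def commits(repo: str) -> list[tuple[str, str]]:
    """Return (sha, original commit message) pairs, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%H%x09%s"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [tuple(line.split("\t", 1)) for line in out.splitlines() if line]

def diff_of(repo: str, sha: str) -> str:
    """Just the diff of one commit, with the commit message stripped."""
    return subprocess.run(
        ["git", "-C", repo, "show", "--format=", sha],
        capture_output=True, text=True, check=True,
    ).stdout

def summarize(message: str, diff: str) -> str:
    """Summarize one commit's intent, chunking monster diffs and merging the partial answers."""
    chunks = [diff[i : i + CHUNK_CHARS] for i in range(0, len(diff), CHUNK_CHARS)] or [""]
    partials = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{
                "role": "user",
                "content": f"Original commit message: {message}\n\n"
                           f"Diff chunk:\n{chunk}\n\n"
                           "In 1-2 sentences, describe the likely intent of this change.",
            }],
        )
        partials.append(resp.choices[0].message.content)
    return " ".join(partials)

for sha, message in commits("."):
    print(sha[:8], summarize(message, diff_of(".", sha)))
    # Next step: embed and index these summaries (e.g. in Elasticsearch) for semantic search.
```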
-
There are many ways to build AI applications. Most of them aren't that easy to get started with, or require prior knowledge about machine learning. With the new #watsonx.ai flows engine, developers of all skill levels are able to create #generativeAI applications. It simplifies how you integrate with Large Language Models (#LLMs) and connect them to your (#enterprise) data. A quick tutorial on how to build a question-answer tool using JavaScript and IBM #watsonx.ai flows engine.
Build a RAG application with watsonx.ai flows engine
developer.ibm.com
-
Creating a bot that performs a specific task involves several steps. Here's a general outline to guide you through the process:

### 1. Define the Purpose and Capabilities of the Bot
- **Identify the Task:** Clearly define what "X" is. The task could range from answering questions, managing schedules, automating social media interactions, to more complex tasks like data analysis or machine learning applications.
- **Determine the Platform:** Decide where your bot will operate (e.g., web, Discord, Slack, Telegram, standalone application).
- **Understand the User Requirements:** Consider what users expect from the bot and how it will improve or simplify their tasks.

### 2. Choose the Right Tools and Technologies
- **Programming Language:** Based on the bot's requirements, choose a suitable programming language. Python is popular for its simplicity and the vast number of libraries available for AI, machine learning, and web scraping.
- **Frameworks and Libraries:** Depending on the task, you might use frameworks like Dialogflow for conversational bots, TensorFlow or PyTorch for machine learning tasks, or Flask/Django for web applications.
- **APIs and Integration:** If your bot interacts with other services or platforms, identify the necessary APIs and ensure they support your bot's functionality.

### 3. Design the Bot's Architecture
- **Data Flow:** Outline how data will move through your bot, from input to processing to output.
- **State Management:** Decide how your bot will manage state (e.g., user sessions, conversation history).
- **Error Handling:** Plan for handling errors and unexpected inputs gracefully.

### 4. Development
- **Setup Development Environment:** Prepare your development environment with all the necessary tools and libraries.
- **Implement Core Features:** Start coding the core functionalities based on your design. Implement one feature at a time and test thoroughly.
- **Iterate Based on Feedback:** Test your bot with potential users and iterate based on their feedback.

### 5. Testing and Deployment
- **Unit Testing:** Write unit tests for individual components to ensure reliability.
- **Integration Testing:** Test the bot in an integrated environment to see how components interact.
- **Deployment:** Choose a hosting service or platform to deploy your bot. This could be a cloud provider like AWS, Google Cloud, or a server you manage.

### 6. Maintenance and Updates
- **Monitor Performance:** Regularly check the bot's performance and fix any issues.
- **Update Regularly:** Keep the bot updated with new features and improvements based on user feedback.

### Example Bot Proposal: Event Reminder Bot
- **Purpose:** To remind users of upcoming events via Discord.
- **Technologies:** Python for the backend, Discord API for integration, and SQLite for storing event data.
- **Key Features:** Users can add, list, and delete events. The bot sends reminders a day before and an hour before events.
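As a concrete illustration of the Event Reminder Bot proposal, here is a minimal Python sketch using discord.py and sqlite3. The command names, table schema, and token handling are placeholder choices, and the actual reminder loop (a day and an hour before each event) is only noted in a comment.

```python
import os
import sqlite3

import discord
from discord.ext import commands

# Simple storage: one table of events, keyed by guild.
db = sqlite3.connect("events.db")
db.execute("CREATE TABLE IF NOT EXISTS events (guild_id INTEGER, name TEXT, starts_at TEXT)")

intents = discord.Intents.default()
intents.message_content = True  # required so prefix commands can read message text
bot = commands.Bot(command_prefix="!", intents=intents)

@bot.command()
async def add(ctx: commands.Context, name: str, starts_at: str):
    """!add "Team sync" 2024-07-01T10:00 -- store an upcoming event."""
    db.execute("INSERT INTO events VALUES (?, ?, ?)", (ctx.guild.id, name, starts_at))
    db.commit()
    await ctx.send(f"Added event '{name}' at {starts_at}.")

@bot.command()
async def events(ctx: commands.Context):
    """!events -- list this server's upcoming events."""
    rows = db.execute(
        "SELECT name, starts_at FROM events WHERE guild_id = ?", (ctx.guild.id,)
    ).fetchall()
    listing = "\n".join(f"{name} at {starts_at}" for name, starts_at in rows) or "No events yet."
    await ctx.send(listing)

# A real bot would also run a discord.ext.tasks loop that checks the table
# and sends reminders a day and an hour before each event.
bot.run(os.environ["DISCORD_TOKEN"])
```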
-
Laravel Finetuner: Generate training examples, save them as a .jsonl file, upload it to OpenAI, and start the fine-tuning job. Your AI model is now ready.
GitHub - halilcosdu/laravel-finetuner: Laravel Finetuner is a package designed for the Laravel framework that automates the fine-tuning of OpenAI models.
github.com
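For context, the raw OpenAI fine-tuning flow that a package like this automates looks roughly like the following, shown with the OpenAI Python SDK purely for illustration (the training file path and base model are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY

# 1. Upload the .jsonl file of generated training examples.
training_file = client.files.create(
    file=open("training_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start the fine-tuning job against a base model.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)

# 3. Poll until the job finishes; the completed job names the fine-tuned model to use at inference time.
print(client.fine_tuning.jobs.retrieve(job.id).status)
```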
-
How to write and deploy a basic Node.js API with Duet AI on VS Code: a step-by-step guide https://buff.ly/3GSXkyV
- Node.js and AI: Learn how Node.js is powering AI applications and enabling advanced machine learning capabilities. #Nodejs #AI
- VS Code for Node.js: Discover how VS Code, a popular code editor, enhances the development experience for Node.js projects. #VSCode #Nodejs
- The future of Node.js: Explore the potential of Node.js in the AI landscape and how it can revolutionize various industries. #FutureTech
How to write and deploy a basic Node.js API with Duet AI on VS Code a step-by-step guide
geshan.com.np
-
ControlFlow

ControlFlow is a Python framework for building agentic AI workflows. ControlFlow provides a structured, developer-focused framework for defining workflows and delegating work to LLMs, without sacrificing control or transparency:
- Create discrete, observable tasks for an AI to solve.
- Assign one or more specialized AI agents to each task.
- Combine tasks into a flow to orchestrate more complex behaviors.

This task-centric approach allows you to harness the power of AI for complex workflows while maintaining fine-grained control. By defining clear objectives and constraints for each task, you can balance AI autonomy with precise oversight, letting you build sophisticated AI-powered applications with confidence. https://lnkd.in/gM4Newdz
GitHub - PrefectHQ/ControlFlow: 🦾 Take control of your AI agents
github.com
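A minimal sketch of the task/agent/flow idea described above, assuming the `controlflow` package and a configured LLM provider. The agent instructions and task objectives are made up, and the exact class and argument names may differ from the current API, so check the repo before relying on it.

```python
import controlflow as cf  # pip install controlflow; assumes an LLM provider key is configured

# Specialized agents for different sub-tasks (names and instructions are illustrative).
researcher = cf.Agent(name="Researcher", instructions="Collect key facts on the topic.")
writer = cf.Agent(name="Writer", instructions="Write a concise summary from provided facts.")

@cf.flow
def summarize_topic(topic: str) -> str:
    # Discrete, observable tasks, each delegated to a specific agent.
    facts = cf.Task(f"List 3 key facts about {topic}", agents=[researcher]).run()
    summary = cf.Task(
        "Write a two-sentence summary from these facts",
        context={"facts": facts},
        agents=[writer],
    ).run()
    return summary

print(summarize_topic("vector databases"))
```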