Why does the order of words matter for LLMs? Two words: position bias. LLMs rely on positional embeddings to determine "who did what to whom." Without this positional context, words lose their relationships, making it nearly impossible to capture true meaning.

If you're ready to dive deeper into these concepts, and more, check out our new, free, on-demand course: LLM Apps: Evaluation. In just 2 hours, you'll learn how to:
- Build an evaluation pipeline for LLM applications.
- Leverage LLMs as evaluators to assess outputs programmatically.
- Minimize human input by aligning auto-evaluations with best practices.

By the end of the course, you'll have hands-on experience, practical implementation methods, and a clear understanding of how to effectively evaluate and improve your GenAI apps.

Meet your expert instructors:
- Ayush Thakur, AI Engineer at Weights & Biases
- Anish Shah, AI Engineer at Weights & Biases
- Paige Bailey, AI Developer Relations Lead at Google
- Graham Neubig, Co-Founder at All Hands AI

Join us and take the next step in advancing your LLM expertise, one (positional) token at a time! 📚: https://lnkd.in/gCHffA24
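To make the positional-embedding idea concrete, here is a minimal sketch of the classic sinusoidal positional encoding from "Attention Is All You Need." This is one common scheme, not necessarily what any particular model or the course uses; all names below are our own:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
    # Each pair of dimensions gets a different wavelength.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even dims: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd dims: cosine
    return encoding

# The token embeddings of "dog bites man" and "man bites dog" are the same
# set; adding position encodings is what makes the two sequences distinct.
pe = sinusoidal_positional_encoding(seq_len=3, d_model=8)
print(pe.shape)  # (3, 8)
```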
More Relevant Posts
Learn how to build professional-level applications using Large Language Models (LLMs). In this video, we'll guide you through advanced techniques for integrating, fine-tuning, and deploying LLMs for real-world use cases. Perfect for developers, data scientists, and AI enthusiasts ready to master Generative AI and elevate their skills.

Check out this course on Euron, Master Generative AI: Professional-level LLM application development: https://lnkd.in/dpgM-BPs
Check out this plan on Euron, Personal Plan: https://lnkd.in/dMzV_f2H
Ever since my colleague and current boss Dan Massey turned me onto Honeycomb.io's blog posts, I've been a fan. Their latest one is, frankly, a better-written version of much of my thinking on developing with LLMs, so rather than continue with my own piece on the subject I'm just going to leave this here: https://lnkd.in/gGSk4if5

Some of my favorite points:
- "If the question is a mathematical one, computing or calculating the answer will always be faster, cheaper, and more accurate than using AI to derive an answer."
- Re: newer reasoning models: "it can get stuff wrong a lot, but the kinds of things it can do were fundamentally impossible a few years ago"
- "we need to stack an LLM on top of, underneath, or between other computer systems that we're more familiar with" (as someone who has been advocating for using LLMs to drive experimentation/exploration with solvers and traditional ML systems, I'm excited to see them say this; see the sketch below)

The rest of the article is a good discussion of observability in LLM systems, something my colleague Drew Robbins has been working on; I'm sure he'd agree with their findings, as I've heard him say much the same in our conversations.
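To make that "stack an LLM between other systems" point concrete, here is a minimal, hypothetical sketch (not from the Honeycomb article): route questions the program can compute directly to ordinary code, and reserve the LLM for everything else. The call_llm parameter is a stand-in for whatever client you actually use:

```python
import ast
import operator

# Safe evaluator for plain arithmetic expressions like "2 * (3 + 4)".
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    raise ValueError("not plain arithmetic")

def answer(question: str, call_llm) -> str:
    """Compute arithmetic directly; fall back to the LLM otherwise."""
    try:
        return str(_eval(ast.parse(question, mode="eval").body))
    except (ValueError, SyntaxError):
        return call_llm(question)  # call_llm is a placeholder LLM client

print(answer("2 * (3 + 4)", call_llm=lambda q: "..."))  # -> 14, no LLM call
```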
Recently we explored how knowledge graphs (#KG) unlock large language models’ (#LLM) full potential for more robust, reliable AI applications across industries. And now… Check out these fantastic results from our latest AI model performance assessments, using KG + LLM, shared by our very own Head of AI Engineering Rahul Kumar 👇 #AI #RAG #ML
AGI Visionary | Head of AI, 13+ years | AI founder x3 | Speaker & Author x2 | Knowledge Graph Expert | Helping Enterprises Build Intelligent Systems
🚀 Let's talk numbers for G-RAG (KG + RAG)! 🚀

I'm excited to unveil compelling outcomes from our recent AI model performance assessments, showcasing the synergy of knowledge graphs and LLMs (KG + LLM)! Our team at Neurons Lab has been hard at work refining our language models, and the numbers speak for themselves:

🌟 Vanilla LLM:
- Success rate: 30.56%
- Failed: 27.78%
- Partial success: 38.89%

🌟 KG + LLM:
- Success rate: 41.18% 🎉
- Failed: 41.18%
- Partial success: 17.65%

The KG + LLM model not only boosts the success rate by 34.75% in relative terms 📈 compared to the vanilla LLM, but also shows a 54.62% relative decrease 📉 in partial successes, indicating more definitive outcomes. While the failure rate has increased, this reflects the model committing to definitive answers on complex tasks rather than partial ones.

These results highlight our continuous commitment to pushing the boundaries of what's possible with AI, delivering more accurate and reliable outcomes for our customers. Stay tuned, because there's more exciting news to come!

Join us on this journey of connecting the dots in data and unlocking the power of knowledge graphs! 🌟

📺 Check out the full workshop here: https://lnkd.in/dSPMtYHW
👨‍💻 GitHub: https://lnkd.in/dGmvmCGp
👨‍💻 Figma: https://lnkd.in/dKufPVPy
🤓 Reach out to us at Neurons Lab, or DM me if you are struggling to tame the power of LLMs 😉.

#AI #MachineLearning #Innovation #ArtificialIntelligence #DataScience #TechNews #PerformanceBoost #AIResearch #LLM #KnowledgeGraph #RAG
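For readers checking the arithmetic: the 34.75% and 54.62% figures are relative changes, not percentage-point differences. A quick verification using only the numbers quoted above:

```python
def relative_change(old: float, new: float) -> float:
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100

print(relative_change(30.56, 41.18))  # ≈ +34.75 (success-rate boost)
print(relative_change(38.89, 17.65))  # ≈ -54.62 (partial-success drop)
```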
DeepSeek R1: Redefining AI Dynamics

DeepSeek R1, an entirely open-source large language model, is transforming the AI industry by delivering 96.4% cost savings compared to OpenAI o1 while maintaining a comparable level of performance. With exceptional strength in mathematical reasoning and software-engineering tasks, it exemplifies the powerful fusion of innovation and affordability. In a rapidly evolving landscape of diminishing competitive moats and accelerating progress, now is the time to seize opportunities and make your mark in this dynamic domain. For further insights, see Gary Marcus' Substack article.
📢 Check out this week's newsletter! It presents a carefully curated selection of research papers and essential resources designed to enhance your understanding of AI in finance. Below is a summary of the main highlights, with a link to the full newsletter at the bottom of this post. 👇

🕹️ AI-Finance Insights: I summarize three must-read academic papers that blend cutting-edge ML/DL techniques with quant finance:
✍ "Can ChatGPT Beat Wall Street? Unveiling the Potential of AI in Stock Selection"
✍ "Deep Learning for Corporate Bonds"
✍ "Deep Learning from Implied Volatility Surfaces"

💊 AI Essentials: The section on top AI & quant finance learning resources. Today, I'm excited to share a nearly two-hour video course that walks you step by step through the main mathematical concepts behind machine learning. If you're wondering which fundamentals to start with to gain a deeper understanding of machine learning, this video is for you.

🥐 Asset Pricing Insights: In this edition, I recommend a paper exploring the feasibility of successfully timing the market portfolio using a wide range of well-known inputs.

Access all this content for free here: https://lnkd.in/dkxSDJpq
Unlocking success in LLM deployment 🔐

In this video, Guilherme Costa, senior data scientist, and Simon Moe Sørensen, data scientist, at 2021.AI dive deep into some of the real challenges with large language models (LLMs).

🤖 Finding the right fit: It's not just about the coolest tech; it's about solving your specific business problems. Where and how should you use AI?
🛠️ Beyond the demo: LLMs aren't plug-and-play. A lot of engineering goes into making them production-ready.
🎯 Managing expectations: The market is hungry for AI, but development takes time. Let's be realistic about what we can deliver, and when.
🧪 Testing the unpredictable: LLMs are inherently non-deterministic. How do you ensure quality and reliability? It's tricky (one common tactic is sketched below).

Watch now to get the practical knowledge and expert insights needed to succeed with your LLM projects 👉 https://lnkd.in/exBwNjMR
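On the "testing the unpredictable" point, one common tactic (not specific to 2021.AI's approach; call_llm below is a hypothetical client) is to sample the same prompt several times and assert a property of every output, rather than asserting an exact string:

```python
def assert_property(call_llm, prompt: str, check, n_samples: int = 5) -> None:
    """Property-based check: every sampled output must pass `check`."""
    for i in range(n_samples):
        output = call_llm(prompt)  # non-deterministic when temperature > 0
        assert check(output), f"sample {i} failed the property: {output!r}"

# Example: the answer must mention refunds, however it happens to be phrased.
assert_property(
    call_llm=lambda p: "Our refund policy allows returns within 30 days.",
    prompt="What is your refund policy?",
    check=lambda out: "refund" in out.lower(),
)
```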
Reading is probably the most valuable and treasured skill in my toolbox, but finding the best things to read when there's so much going on is a massive challenge. I've been overwhelmed by the rate of progress in AI and have been struggling to keep up with the advancements that seem to drop daily. This newsletter has gone a long way toward keeping me abreast of the highlights. It has also helped me identify at least two potential ideas to build on stagnating concepts I've been working on. I hope some of you find value in it too, and harness the power of learning through reading. https://alphasignal.ai/
Learn how to build professional-level applications using Large Language Models (LLMs). In this course, we'll guide you through advanced techniques for integrating, fine-tuning, and deploying LLMs for real-world use cases. Perfect for developers, data scientists, and AI enthusiasts ready to master Generative AI and elevate their skills.

Course link: https://lnkd.in/g7swuPYk
Get everything with Euron Plus: https://lnkd.in/gwh3krBf
I have some exciting news to share with everyone working on GUI agents! 🌟

Since the initial release of ActIO / UGround in August 2024, the community has made remarkable strides in advancing GUI agents, particularly by integrating visual grounding techniques. We are thrilled to announce that our pioneering work in this area, the UGround paper, has been accepted to #ICLR 2025 with scores of 10, 8, 8, and 5!

🚀 ActIO / UGround is SOTA again. Using the exact same training data, the latest ActIO / UGround model has achieved an incredible 89.4% accuracy on ScreenSpot, outperforming models from Google, Anthropic, Apple, and others. Even more impressive, ActIO / UGround demonstrates exceptional cross-platform generalization on the challenging ScreenSpot-Pro benchmark, despite using no desktop data.

📖 We are proud not only to open-source the model weights but also to share the curated training datasets with the broader research community.

❤️ Finally, I would like to express my heartfelt gratitude to our collaborators from the OSU NLP Group and my teammates at Orby AI for their invaluable contributions to this work. A special shoutout to our lead author, Boyu Gou, for his tireless dedication in delivering these outstanding results! Demi Ruohan Wang Boyuan Zheng Cheng Chang Yiheng Shu Huan Sun Yu Su Gang Li Yining Mao

💻 Don't forget to check out https://lnkd.in/gEBuYsBP for our paper, models, and datasets!
Your Retrieval-Augmented Generation (RAG) setup could be way smarter. Most are stuck with naive RAG, but there's a lot more you can do to boost performance. Let's break it down:

Naive RAG
This is the basic version of RAG:
1. User asks a question.
2. Documents are retrieved based on the query.
3. The query + documents are sent to the LLM.
4. The LLM generates an answer.
It works, but it's not the most efficient. You can do better.

Advanced RAG
To make it smarter, focus on improving retrieval and how you use the retrieved data for augmentation (a sketch of the first two techniques follows this post):

1. Query rewriting: If a query is unclear or depends on previous conversation context, you can:
- Use the LLM to rewrite the query with context in mind.
- Try step-back prompting: create a broader question, then generate a more detailed query.

2. Query expansion: Don't settle for a single query; expand it into multiple queries to cover more ground. This gives richer, more relevant results.

3. Reranking or summarization: Sometimes the LLM gets overwhelmed by too much content. Rerank or summarize retrieved documents so that only the most relevant pieces are sent to the LLM, reducing distractions.

Modular RAG
This is where things get really interesting. Instead of a linear process (retrieve → augment → generate), modular RAG introduces adaptive and iterative retrieval. Different models or steps feed into each other in flexible ways, depending on the use case and the data at hand. It's like customizing your RAG pipeline to your specific needs instead of sticking to a standard process.

Final tip
Whatever changes you make, evaluation is key. Run tests and ablation studies to see if the changes really improve your system. Sometimes adding more complexity doesn't mean better performance!

There's a ton more you can do with RAG beyond the basics; this is just scratching the surface. Check out the papers.

How are you improving your RAG setup? Drop your thoughts below! 👇

I share my learning journey here. Join me and let's grow together. Enjoy this? Repost it to your network and follow Karn Singh for more.

#LLM #RAG #MachineLearning #AdvancedRAG #AI
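A minimal sketch of query rewriting plus query expansion, to make steps 1 and 2 above concrete. call_llm and vector_search are hypothetical stand-ins for your own LLM client and retriever, and the prompts are illustrative rather than a prescribed recipe:

```python
def rewrite_query(call_llm, history: list[str], query: str) -> str:
    """Use the LLM to resolve references like 'it' against the conversation."""
    prompt = (
        "Rewrite the final question so it is self-contained.\n"
        f"Conversation: {history}\nQuestion: {query}\nRewritten question:"
    )
    return call_llm(prompt).strip()

def expand_query(call_llm, query: str, n: int = 3) -> list[str]:
    """Generate paraphrases so retrieval covers more phrasings of the need."""
    prompt = f"Give {n} alternative phrasings of: {query}\nOne per line:"
    variants = [ln.strip() for ln in call_llm(prompt).splitlines() if ln.strip()]
    return [query] + variants[:n]

def retrieve(call_llm, vector_search, history, query, k: int = 5):
    """Rewrite, expand, retrieve for each variant, and deduplicate results."""
    rewritten = rewrite_query(call_llm, history, query)
    docs = []
    for variant in expand_query(call_llm, rewritten):
        for doc in vector_search(variant, k=k):  # your retriever goes here
            if doc not in docs:
                docs.append(doc)
    return docs
```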