
LanceDB

Information Services

San Francisco, California 6,487 followers

Developer-friendly, open source database for multi-modal AI

About us

LanceDB is a developer-friendly, open source database for multimodal AI. From hyper-scalable vector search to advanced retrieval for RAG, from streaming training data to interactive exploration of large-scale AI datasets, LanceDB is the best foundation for your AI application.

Industry: Information Services
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Privately Held
Founded: 2022


Updates

  • A cool step-by-step guide! Thanks for sharing, Isaac Flath

    Isaac Flath
    R&D | Answer.AI

    In this new blog post I cover:
    1. Why traditional search fails
    2. Vector embeddings with LanceDB
    3. Why that's not enough
    4. Chunking, hybrid search, and re-ranking

    90% of users never scroll past the first page of search results, and most only scan the top 3-5 entries before giving up. For content creators, this is a nightmare: your valuable tutorials and explanations are essentially invisible. The problem? Traditional search relies on exact keyword matching, which fails miserably with specialized technical vocabulary.

    Here's the fundamental issue: semantic relationships between technical concepts don't translate to keyword searches. My tutorial on custom FastHTML tags is highly relevant to someone looking for web components, but that term never appears in my post!

    So how do we make technical content discoverable by meaning rather than just keywords? The answer lies in implementing a proper semantic search system. After experimenting with various approaches, there's a three-layer solution that dramatically improves content discovery:
    1. Vector embeddings: convert your content into numerical representations that capture meaning, not just keywords. This lets users find conceptually related content even when terminology differs.
    2. Chunking strategy: don't embed entire documents. Break content into meaningful sections that preserve context while remaining focused enough to be useful in search results.
    3. Hybrid search + re-ranking: combine vector similarity with keyword matching (BM25), then apply a cross-encoder to re-rank results for maximum relevance. That's your MVP 🔥

    Key principles I've learned:
    • Semantic search isn't magic: it's about transforming text into numbers that capture meaning
    • Domain knowledge is crucial: understand your content and actually test your search system
    • Hybrid approaches outperform single methods: vector search + keyword matching + re-ranking works better than any single approach
    • Chunking matters: how you divide content dramatically impacts retrieval quality

    What's your experience with content discovery? Have you implemented semantic search for your technical content? I'd love to hear what's working (or not working) for you!

    Read it now! https://lnkd.in/eyiRTe6n
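
For readers who want to try the third layer, here is a minimal Python sketch of hybrid retrieval over a LanceDB table followed by cross-encoder re-ranking. It is not taken from the blog post: the sample documents, table name, and model choices are illustrative assumptions, and it assumes the lancedb and sentence-transformers packages (plus LanceDB full-text-search support) are installed.

```python
# Minimal sketch: hybrid search (vector + BM25 full-text) over LanceDB,
# then cross-encoder re-ranking. Names, data, and models are illustrative.
import lancedb
from sentence_transformers import SentenceTransformer, CrossEncoder

embedder = SentenceTransformer("all-MiniLM-L6-v2")                # assumed embedding model
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")   # assumed re-ranker

db = lancedb.connect("./search-demo")                             # hypothetical local DB path
docs = [{"text": t, "vector": embedder.encode(t).tolist()} for t in [
    "Custom FastHTML tags let you build reusable web components.",
    "BM25 ranks documents by keyword overlap with the query.",
]]
tbl = db.create_table("posts", data=docs, mode="overwrite")
tbl.create_fts_index("text")                                      # BM25 keyword index

def hybrid_search(query: str, k: int = 5):
    # Run vector search and keyword (FTS) search separately.
    vec_hits = tbl.search(embedder.encode(query).tolist()).limit(k).to_list()
    kw_hits = tbl.search(query, query_type="fts").limit(k).to_list()
    # Merge candidates, de-duplicating by text.
    candidates = {hit["text"]: hit for hit in vec_hits + kw_hits}.values()
    # Cross-encoder scores each (query, document) pair for the final ranking.
    scores = reranker.predict([(query, c["text"]) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [c["text"] for _, c in ranked[:k]]

print(hybrid_search("web components"))
```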

  • 🤷♀️ Advanced Customer Support Chatbot with RASA

    In this example, Rithik Kumar demonstrates how to build an advanced customer support chatbot by integrating Rasa, LanceDB, and #LLMs. #Rasa is an open-source framework for building intelligent chatbots with natural language understanding and dialogue management, and it integrates easily with APIs, databases, and machine learning models for effective customer support.

    💪 Example Notebook - https://lnkd.in/gfMNPDAB
    ✅ Writeup - https://lnkd.in/gzqgWnhY
    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #agent #advanced #customersupport #vectordb #rasa
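
As a rough illustration of how Rasa and LanceDB can be wired together, the sketch below shows a custom Rasa action that answers from a LanceDB knowledge-base table. The table name, schema, and embedding model are illustrative assumptions, not details from the notebook; see the linked example for the actual implementation.

```python
# Sketch of a Rasa custom action backed by LanceDB retrieval.
# Table name, columns, and embedding model are illustrative assumptions.
from typing import Any, Dict, List, Text

import lancedb
from sentence_transformers import SentenceTransformer
from rasa_sdk import Action, Tracker
from rasa_sdk.executor import CollectingDispatcher

EMBEDDER = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model
DB = lancedb.connect("./support-kb")                  # hypothetical knowledge-base path

class ActionAnswerFromKB(Action):
    def name(self) -> Text:
        return "action_answer_from_kb"

    def run(self, dispatcher: CollectingDispatcher, tracker: Tracker,
            domain: Dict[Text, Any]) -> List[Dict[Text, Any]]:
        question = tracker.latest_message.get("text", "")
        table = DB.open_table("support_articles")     # assumed table of FAQ chunks
        hits = table.search(EMBEDDER.encode(question).tolist()).limit(3).to_list()
        if hits:
            # In the full example an LLM would rewrite this context into a reply;
            # here we simply return the closest article text.
            dispatcher.utter_message(text=hits[0]["text"])
        else:
            dispatcher.utter_message(text="Sorry, I couldn't find anything on that.")
        return []
```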

  • ⛓️ Convert any image dataset to Lance format ⛓️

    Using the #Lance format makes your machine learning workflows more efficient, powerful, and flexible. This example converts the cinic and mini-imagenet datasets with a single CLI command.

    🤝 Colab Notebook - https://lnkd.in/gBJGvGFb
    🔖 Check out how it works - https://lnkd.in/gRsdU22U
    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #lance #cli #imagedataset #ai #vectordb
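
The recipe wraps the conversion in a CLI; as a rough idea of what such a conversion involves, here is a minimal Python sketch that writes a folder of images to a Lance dataset using pyarrow and the pylance package. The directory layout, column names, and function name are assumptions for illustration.

```python
# Minimal sketch: turn a folder of images (one subfolder per class)
# into a Lance dataset. Paths and schema are illustrative assumptions.
from pathlib import Path

import lance
import pyarrow as pa

def folder_to_lance(image_root: str, out_uri: str = "images.lance") -> None:
    images, labels, names = [], [], []
    for path in Path(image_root).glob("*/*"):            # <class>/<file> layout assumed
        if path.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            images.append(path.read_bytes())              # raw encoded image bytes
            labels.append(path.parent.name)               # class = parent folder name
            names.append(path.name)
    table = pa.table({
        "image": pa.array(images, type=pa.binary()),
        "label": pa.array(labels, type=pa.string()),
        "filename": pa.array(names, type=pa.string()),
    })
    lance.write_dataset(table, out_uri, mode="overwrite")

# folder_to_lance("./cinic/train")  # hypothetical local copy of the dataset
```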

  • 🧾 Advanced RAG: Parent Document Retriever

    In the #Chunking step of building #RAG, the goal is to create chunks that are long enough to keep the context but short enough for quick retrieval. The #ParentDocumentRetriever balances context and efficiency by splitting documents into small chunks and storing them. During retrieval, it first fetches these small chunks, then uses their parent IDs to return the larger parent documents.

    🤝 Colab Notebook - https://lnkd.in/gFAVvTCF
    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #parentdocument #retriever #rag #advanced #vectordb
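
A minimal sketch of the idea, assuming a LanceDB table of small child chunks that carry a parent_id column; the schema, embedding model, and sample data are illustrative and may differ from the notebook.

```python
# Sketch of parent-document retrieval over LanceDB.
# Schema, model, and data are illustrative assumptions.
import lancedb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

parents = {  # full documents, keyed by parent id
    "doc-1": "Full text of the first document ...",
    "doc-2": "Full text of the second document ...",
}
children = [  # small, focused chunks that point back to their parent
    {"text": "first document, chunk A", "parent_id": "doc-1"},
    {"text": "second document, chunk A", "parent_id": "doc-2"},
]
for child in children:
    child["vector"] = embedder.encode(child["text"]).tolist()

db = lancedb.connect("./parent-retriever-demo")
tbl = db.create_table("child_chunks", data=children, mode="overwrite")

def retrieve_parents(query: str, k: int = 4) -> list[str]:
    # 1) Search the small chunks for precision ...
    hits = tbl.search(embedder.encode(query).tolist()).limit(k).to_list()
    # 2) ... then return the de-duplicated parent documents for context.
    parent_ids = list(dict.fromkeys(hit["parent_id"] for hit in hits))
    return [parents[pid] for pid in parent_ids]

print(retrieve_parents("first document"))
```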

  • 🖇️ Chunking Analysis for Different Languages 🖇️

    A question on everyone's mind: does the chunking approach depend on the language or not?

    📊 Here is a comprehensive analysis - https://lnkd.in/gpvj8Gmn

    In short: yes, chunking does depend on language. The way you break text into sentences matters, and choosing the right tokenizer can improve chunk quality. As for the best chunking approach, it depends on your use case and content type. Whether your text is structured or unstructured, multilingual, or includes images or code will influence the choice.

    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #chunking #analysis #vectordb
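
As a small illustration of why the sentence splitter (and therefore the language) matters, here is a minimal sketch of sentence-aware chunking using NLTK. The chunk size, languages, and example text are assumptions, not taken from the analysis.

```python
# Sketch: sentence-aware chunking whose sentence splitter is language-specific.
# Chunk size and example text are illustrative assumptions.
import nltk
from nltk.tokenize import sent_tokenize

nltk.download("punkt", quiet=True)      # punkt sentence models (older NLTK)
nltk.download("punkt_tab", quiet=True)  # newer NLTK releases use punkt_tab

def chunk_text(text: str, language: str = "english", max_chars: int = 400) -> list[str]:
    """Group whole sentences into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for sentence in sent_tokenize(text, language=language):
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# German abbreviations like "z. B." can trip up an English-trained splitter,
# so the same text may chunk differently depending on the language model used.
german = "Das ist z. B. ein Satz. Hier ist noch einer."
print(chunk_text(german, language="german", max_chars=30))
print(chunk_text(german, language="english", max_chars=30))
```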

  • 📊 Implement Contextual Retrieval with LanceDB

    In naive RAG, a basic chunking method creates vector embeddings for each chunk separately, and RAG systems use these embeddings to find chunks that match the query, but this approach loses the context of the original document. Contextual Embeddings by Anthropic address this issue by prepending relevant document-level context to each chunk before creating embeddings. This enhances the quality of each embedded chunk, leading to more accurate retrieval and reducing the retrieval failure rate by 35%. This implementation uses an OpenAI model to generate the context for each chunk.

    🤝 Colab Notebook - https://lnkd.in/gWb4vjRc
    🔖 Blog - https://lnkd.in/gUzfybNr
    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #rag #contextualrag #embeddings #anthropic #vectordb
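
The core loop looks roughly like the sketch below: an LLM writes a short context that situates each chunk in its document, the context is prepended, and the combined text is embedded and stored in LanceDB. The model names, prompt wording, and table schema are assumptions for illustration; the linked notebook may differ.

```python
# Sketch of contextual embeddings: prepend LLM-generated context to each
# chunk before embedding. Models, prompt, and schema are assumptions.
import lancedb
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
db = lancedb.connect("./contextual-rag-demo")

def situate_chunk(document: str, chunk: str) -> str:
    """Ask the LLM for a short context that situates the chunk in the document."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model choice
        messages=[{
            "role": "user",
            "content": (
                "Here is a document:\n" + document +
                "\n\nWrite one short sentence situating this chunk within the "
                "document, to improve search retrieval of the chunk:\n" + chunk
            ),
        }],
    )
    return resp.choices[0].message.content.strip()

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def index_document(document: str, chunks: list[str]):
    rows = []
    for chunk in chunks:
        contextualized = situate_chunk(document, chunk) + "\n" + chunk
        rows.append({"text": chunk, "vector": embed(contextualized)})
    return db.create_table("contextual_chunks", data=rows, mode="overwrite")
```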

  • 📊 Compare ModernBERT with a Series of BERT Models

    This study compares #ModernBERT, released by Answer.AI, with established #BERT-based models like Google's BERT, ALBERT, and RoBERTa on the Uber10K dataset, using OpenAI embeddings for question-answer pairs. The goal is to highlight the strengths and weaknesses of ModernBERT compared to these well-known models.

    👾 Dataset used is the Uber 10-K (2021) dataset from LlamaIndex - https://lnkd.in/gvC9Ds7h
    🤝 Analysis Notebook - https://lnkd.in/gUuWMcAS
    Star 🌟 LanceDB recipes to keep yourself updated - https://lnkd.in/dvmfDFed

    #modernbert #answerdotai #bert #googlebert #vectordb
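
To make the mechanics concrete, here is a minimal sketch of scoring a question-answer pair with different BERT-family encoders via mean pooling and cosine similarity. The checkpoints, pooling choice, and sample texts are illustrative assumptions; the linked notebook's evaluation setup may differ.

```python
# Sketch: score question-answer similarity with different BERT-family encoders.
# Checkpoints, pooling, and sample texts are illustrative assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

CHECKPOINTS = {                                        # assumed Hugging Face model ids
    "ModernBERT": "answerdotai/ModernBERT-base",       # needs a recent transformers release
    "BERT": "bert-base-uncased",
    "RoBERTa": "roberta-base",
}

def mean_pooled_embedding(model, tokenizer, text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state     # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)      # ignore padding positions
    return (hidden * mask).sum(1) / mask.sum(1)

question = "What does the filing say about revenue?"   # placeholder QA pair
answer = "The filing discusses revenue broken down by business segment."

for name, ckpt in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModel.from_pretrained(ckpt)
    q, a = (mean_pooled_embedding(model, tokenizer, t) for t in (question, answer))
    score = torch.nn.functional.cosine_similarity(q, a).item()
    print(f"{name}: cosine(question, answer) = {score:.3f}")
```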



Funding

LanceDB: 2 total funding rounds
Last round: Seed, US$ 8.0M

See more info on Crunchbase