RAG to Riches: Unlocking the full power of LLMs via Shane McAllister, Lead Developer Advocacy (Global) at MongoDB. https://lnkd.in/ep_Zw3-D
TechNative’s Post
-
Here's an intriguing project on GitHub that aims to replicate DeepSeek-R1's performance results by filling in the components left out of the "open-source" release, e.g. the curation process for reasoning-specific datasets: https://lnkd.in/e8hwQ64A #GitHub #DeepSeek-R1 #OpenSource
-
In today's digital age, the rapid proliferation of information through online platforms has revolutionized how we consume news and engage with current events. However, this unprecedented accessibility has also given rise to a concerning phenomenon: fake news. Fake news, characterized by false or misleading information presented as factual news, has become a pervasive issue with far-reaching consequences. Its impact ranges from influencing public opinion and elections to inciting social discord and undermining trust in credible sources.

The approach for this project combines TF-IDF with LSTM (Long Short-Term Memory) and Bidirectional LSTM (BiLSTM) models for the detection of fake news. First, the FakeNewsNet and ISOT datasets are merged to create a comprehensive corpus of labeled news content. Next, the text data is preprocessed (cleaning, tokenization, and normalization) to ensure uniformity across articles. The key step is the TF-IDF representation: the text is transformed into TF-IDF vectors, which weight each term by its importance within an individual article relative to the rest of the corpus. Finally, the models are trained and evaluated, showcasing the potential of this approach in combating the spread of fake news.

This project was a significant step towards understanding and mitigating the impact of fake news. I am excited to share more details and insights. Check out the complete project on GitHub: https://lnkd.in/d2rqcijM. #FakeNewsDetection #MachineLearning #DataScience #AI #LSTM #BiLSTM #TFIDF #Python #DeepLearning #TechForGood #Innovation
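The post does not include code, so here is a minimal sketch of the TF-IDF step only, in plain Python. The toy corpus, the whitespace tokenizer, and the smoothed IDF formula are assumptions made for illustration; the linked project may preprocess and weight terms differently, and the LSTM/BiLSTM stage is left out.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; real preprocessing would do more."""
    return text.lower().split()


def tfidf_vectors(documents):
    """Return (vocabulary, one list of TF-IDF weights per document)."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed IDF (an assumed variant): terms found in every document
    # still receive a small positive weight instead of zero.
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([counts[t] / total * idf[t] for t in vocabulary])
    return vocabulary, vectors


# Toy, made-up corpus for illustration only.
docs = [
    "officials confirm the budget vote",
    "shocking secret the officials hide",
    "the budget vote passes",
]
vocab, vecs = tfidf_vectors(docs)
```

In the second headline, "shocking" occurs in only one document and so receives a higher weight than "the", which occurs in all three. That is the sense in which TF-IDF emphasizes terms that are distinctive for an individual article.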
-
Mission accomplished! 💪 Excited to announce my 1st place 🏆🥇 win in the Secure RAG Challenge by Understand Tech. In a world where AI security is becoming increasingly critical, developing robust offline RAG systems isn't just a technical challenge; it's a necessity. My winning open-source solution combines hybrid dense-sparse retrieval with advanced reranking to ensure both security and performance, helping organizations and individuals leverage AI while keeping sensitive data protected. 🔒💡 Open source is the future of secure AI: by making these solutions accessible to everyone, we're building a more secure and transparent AI ecosystem together. 🌟 Special shoutout to Trustii.io and Understand Tech for fostering innovation in secure AI solutions! 🙏 Check out my solution here: https://lnkd.in/dH8v--8K 🚀 #AI #Security #MachineLearning #Innovation #RAG #OpenSource
🎉 Announcing the Results of the Secure RAG Challenge by Understand Tech! 🏆

This competition brought together brilliant minds from the data science community to push the boundaries of offline RAG systems, emphasizing data privacy, security, and versatility through open-source technologies. We were impressed by the exceptional #opensource-based solutions, as participants developed comprehensive offline RAG pipelines capable of generating embeddings and facilitating chat-based retrieval without relying on external APIs.

🔍 Competition Highlights:
Focus Areas: System Accuracy, Reproducibility, and Cost of Deployment.
Innovative Solutions: From hybrid encoding and re-ranking approaches to GPU-optimized pipelines, our participants showcased a diverse range of strategies to tackle the challenges of secure offline RAG systems.

🏅 Congratulations to Our Top Five Winners (from four different countries):
🥇 1st Prize: Abdoulaye SAYOUTI SOULEYMANE 🇫🇷
🥈 2nd Prize: Ahmed Benmessaoud 🇩🇿
🥉 3rd Prize: Param Thakkar team 🇮🇳
4️⃣ 4th Prize: Shravan Kumar K. 🇮🇳
5️⃣ 5th Prize: Dao Nguyen Duong 🇻🇳

💰 Prizes Awarded:
🥇 1st Prize: $4,000 cash and a 1-year team free subscription to Understand.Tech.
🥈 2nd Prize: $1,000 cash and a 1-year team free subscription to Understand.Tech.
🥉 3rd Prize: 1-year team free subscription to Understand.Tech.
4️⃣ 4th Prize: 1-year team free subscription to Understand.Tech.
5️⃣ 5th Prize: 1-year team free subscription to Understand.Tech.

🔗 Explore the Competition Results and Winner Solutions: Dive into the detailed solutions and methodologies of our top participants in the official GitHub Repository: https://lnkd.in/dCkpfsWk Each winner's repository includes their source code, setup instructions, comprehensive documentation, and evaluation metrics, providing valuable insights into their unique approaches and solutions.

🙌 A Heartfelt Thank You: A big thank you to all participants for your dedication and innovative solutions. Special appreciation to Understand.Tech for sponsoring the challenge and providing the platform that made this competition possible.

🚀 Join Us in Advancing Secure RAG Systems: Stay tuned for future challenges and opportunities to contribute to the evolving landscape of secure and versatile AI applications. #SecureRAG #UnderstandTech #DataScience #OpenSource #GenAI #CompetitionResults #MachineLearning #DataPrivacy #TechInnovation
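Neither post shows how the hybrid dense-sparse retrieval mentioned above was built, so the following is only a rough sketch of the general idea: score each passage with both a keyword signal and a vector-similarity signal, then blend the two. The passages, the three-dimensional "embeddings", the `alpha` weight, and all function names are made up for illustration. A real offline pipeline would use a local embedding model and a proper sparse scorer such as BM25, and would add a reranking stage.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_overlap(query, text):
    """Sparse signal: fraction of query terms that appear in the text."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def hybrid_rank(query, query_vec, passages, alpha=0.5):
    """Rank passage ids by alpha * dense score + (1 - alpha) * sparse score."""
    scored = []
    for passage_id, (text, vec) in passages.items():
        dense = cosine(query_vec, vec)
        sparse = keyword_overlap(query, text)
        scored.append((alpha * dense + (1 - alpha) * sparse, passage_id))
    return [pid for _, pid in sorted(scored, reverse=True)]


# Hypothetical passages with hand-written stand-in embeddings.
passages = {
    "p1": ("rotate api keys every ninety days", [0.9, 0.1, 0.0]),
    "p2": ("the cafeteria opens at eight", [0.0, 0.2, 0.9]),
    "p3": ("credentials must be stored in the vault", [0.8, 0.3, 0.1]),
}
ranking = hybrid_rank("how often to rotate api keys", [1.0, 0.2, 0.0], passages)
```

Here `p1` ranks first because it matches on both signals, while `p3` shares no keywords with the query yet still outranks `p2` on vector similarity alone. That complementary behavior is the usual argument for combining dense and sparse retrieval.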
-
Responsible and ethical AI has gained prominence on the global stage as recognition of its far-reaching impact on societies worldwide grows. In this talk, Niharika Singhal will delve into the concept of software openness as embodied in the historical definition of Free Software and explain why it should not be forgotten while developing an open AI landscape. The Apache Software Foundation #freesoftware #opensourcesoftware #ethicallicenses #freesoftwarelicense
-
Natural Language Querying of data in S3 with Athena and Generative AI (Text-to-SQL). This could be revolutionary for anyone who writes SQL queries day to day or migrates from one database to another. The text-to-SQL capability of generative AI has the potential to democratize data management and data governance, improve data quality, and accelerate data-driven decision-making across various industries and applications. Dive into the codebase for more details. #aws #nlp #genai #generativeai #awscloud #dataengineer #datascience #datascientist #aiml #awsarchitect #cloudarchitect #datamodernization #datademocratisation #glue
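The codebase is not reproduced in the post, so the snippet below is a hedged sketch of two pieces a text-to-SQL flow typically needs: assembling a prompt that gives the model the table schema, and checking that the generated statement is a single read-only SELECT before it reaches a query engine. The table name, column names, and both function names are hypothetical, and the model call and the Athena execution are deliberately left out.

```python
def build_text_to_sql_prompt(question, table_name, columns):
    """Assemble a prompt that gives the model the table schema and the question."""
    schema_lines = "\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        "You translate questions into a single SQL SELECT statement.\n"
        f"Table: {table_name}\nColumns:\n{schema_lines}\n"
        f"Question: {question}\nSQL:"
    )


def is_read_only_select(sql):
    """Accept only a single SELECT statement (a deliberately conservative check)."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        return False  # more than one statement, or a semicolon inside a literal
    words = statement.lower().split()
    return bool(words) and words[0] == "select"


# Hypothetical schema for illustration only.
prompt = build_text_to_sql_prompt(
    "How many orders were placed in 2023?",
    "orders",
    [("order_id", "bigint"), ("order_date", "date"), ("amount", "double")],
)
```

The guard is intentionally strict: it also rejects valid queries that contain a semicolon inside a string literal or that start with a `WITH` clause. For generated SQL that runs against real data, erring on the side of rejection seems the safer default.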
-
🚀 Keeping AI Open and Scalable: The Power of Community-Driven LLMs

I "recently" had an insightful conversation with Jose Pablo Cabeza García and Antonio Velasco Fernandez from Elastacloud on Software Engineering Daily about the importance of community-driven development for Large Language Models (LLMs).

🔐 Open Source vs. Closed Source: The current LLM landscape is dominated by closed-source models, posing risks of monopolistic behavior. Open-source initiatives are crucial to democratize access and innovation.
👩🏽⚖️ Regulation: Regulation needs a fine balance to ensure safety without stifling innovation. It should support transparency and allow open-source communities to thrive.
🧑🏽💻 Technological Foundations: LLMs, rooted in open research like the 2017 paper “Attention Is All You Need,” show the power of community collaboration in advancing tech.
🗺️ Future Directions: Tools like LangChain and LlamaIndex are expanding LLM capabilities, making them more versatile and powerful.
📚 Get Involved: For those interested, start with LangChain or explore resources like the Awesome LLM GitHub repository (https://lnkd.in/eJduJiDq).

Let’s keep pushing for an open, collaborative future in AI! 🌐 https://lnkd.in/eEVZuCX3 #AI #OpenSource #LLM #Innovation #CommunityDriven
-
If you're exploring frameworks for RAG (Retrieval Augmented Generation) or AI applications, LlamaIndex (https://www.llamaindex.ai/) might save you hundreds of hours. After 80-120 hours of using LlamaIndex, I've found it bridges my Data Science knowledge gaps. As a developer with 25+ years of experience, LLMs can feel like black magic. I need fast results, and LlamaIndex delivers. Spending this many hours on something is far quicker than paying for bloated academic courses for a PhD in Data Science.

LlamaIndex allows rapid RAG solution development. It supports importing various file types (PPT, PDF, DOCX, CSV, text, mailbox files) and integrating them into a vector database. With an OpenAI API key (or Vertex AI), you can query your data and set up chat agents over it with minimal code. Advanced tasks require only a few more lines of code. The framework lets you leverage the latest Data Science techniques, algorithms, and white papers, such as vector scoring and hybrid search, with simple function calls, without needing deep AI expertise (or a PhD).

Drawbacks:
- The tooling is extensive, but the documentation is example-based, not explanatory. Terms like FUSION_MODES.RECIPROCAL_RANK are unclear without further research.
- Understanding the framework's functionality often requires reading the code.
- Even minimal code can take hours to comprehend.
- Many tutorials only skim the surface and lack depth.
- If you want to try to fix issues, it's probably easier to subclass their base class and implement it yourself.
- The Vertex AI implementation was lacking: most basic things got flagged as hate speech or sexual harassment because Vertex AI has too many filters (and fails at many of them). There are no library calls to disable those filters (an option that is available when using the Vertex AI Preview libraries directly).

Despite these challenges, LlamaIndex saves significant time compared to acquiring extensive Data Science expertise. The tech evolves rapidly, and LlamaIndex’s ease of updating keeps you at the cutting edge.
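For readers who hit the same wall with the term "reciprocal rank", here is a small sketch of reciprocal rank fusion as it is generally described: each document's fused score is the sum of 1 / (k + rank) over every ranked list it appears in. This illustrates the general technique in plain Python rather than LlamaIndex's own code; the document ids are made up, and k = 60 is a commonly used constant that is assumed here.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each document scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically by id.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# fused == ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Because only ranks are used, the two retrievers do not need comparable score scales, which is one reason this method is popular for hybrid search.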
-
Using an LLM (guided by an ontology) to populate a knowledge graph is working well. All the code is Apache-2. I’ve now experimented with several ontologies and source document types. Have fun! #LLM #KnowledgeGraph https://lnkd.in/eK_ymM2F
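The linked code is not shown in the post, so what follows is only a sketch of one way an ontology can guide the process: candidate triples proposed by a model are kept only if the predicate exists in the ontology and the subject and object types match it. The ontology, the entity types, the candidate triples, and the function names are all hypothetical, and the LLM call itself is replaced by a hard-coded list.

```python
# Hypothetical ontology: predicate -> (allowed subject type, allowed object type).
ONTOLOGY = {
    "works_for": ("Person", "Organization"),
    "located_in": ("Organization", "Place"),
}


def validate_triples(candidates, entity_types, ontology=ONTOLOGY):
    """Keep only triples whose predicate and entity types fit the ontology."""
    accepted = []
    for subject, predicate, obj in candidates:
        expected = ontology.get(predicate)
        actual = (entity_types.get(subject), entity_types.get(obj))
        if expected is not None and actual == expected:
            accepted.append((subject, predicate, obj))
    return accepted


def build_graph(triples):
    """Store triples as an adjacency map: subject -> list of (predicate, object)."""
    graph = {}
    for subject, predicate, obj in triples:
        graph.setdefault(subject, []).append((predicate, obj))
    return graph


# Stand-in for model output; in practice this would be parsed from an LLM response.
entity_types = {"Ada": "Person", "Acme": "Organization", "Berlin": "Place"}
candidates = [
    ("Ada", "works_for", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Berlin", "works_for", "Ada"),  # wrong entity types, rejected
    ("Ada", "admires", "Berlin"),    # predicate not in the ontology, rejected
]
graph = build_graph(validate_triples(candidates, entity_types))
```

Filtering after generation is just one option. The ontology could also be placed in the prompt so the model proposes fewer invalid triples in the first place.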
-
✨ Exciting news for the open source #GenAI community! 🤗 Hugging Face changes the Text-Generation-Inference (TGI) library license back to Apache 2.0 👉 Why is that a thing? Three reasons:
1. another boost to open source collaboration (more community)
2. easing codebase maintenance (fewer bugs, more performance)
3. expanding the HF open-source ML footprint (more adoption)
A strategic step for future innovation and growth for the #ArtificialIntelligence ecosystem! Thanks Clem Delangue 🤗