🚀 We're excited to share our latest research paper, Writing in the Margins (WiM). Despite advances in larger context windows, AI models still struggle with accuracy and clarity over long prompts. That’s where WiM comes in. WiM improves model performance by over 7.5% in multi-hop reasoning and more than 30% in aggregation tasks, making it super effective for long-context tasks.

How does it work? WiM segments inputs into smaller units and creates summaries — like margin notes in a book. These "margins" then guide the model to the most relevant information (see the sketch below). 🧠

What you need to know:
- It boosts multi-hop reasoning accuracy (HotpotQA, MultiHop-RAG) by 7.5%
- It increases the F1-score for aggregation tasks (CWE) by over 30%
- It improves transparency by displaying margins in real time, showing the reasoning behind responses

👏 Huge kudos to our engineering team for this breakthrough! More exciting research is on the way. For now, see the details here: https://lnkd.in/gQVu2ncA

#AI #MachineLearning #Research #Innovation
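For intuition, here is a minimal sketch of the margin-note idea. The `llm` helper, the chunk size, and the prompts are illustrative placeholders, not the paper's exact implementation:

```python
# Minimal sketch of the WiM idea; `llm`, `chunk`, and the prompts are
# placeholders, not the paper's exact implementation.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any chat-completion call here")

def chunk(text: str, size: int = 4000) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

def answer_with_margins(context: str, question: str) -> str:
    # 1. Write a "margin note" per segment: a summary focused on the query.
    margins = [
        llm(f"Note only what is relevant to '{question}':\n{seg}")
        for seg in chunk(context)
    ]
    # 2. Keep the notes judged relevant (WiM also shows these to the user,
    #    which is where the real-time transparency comes from).
    relevant = [
        m for m in margins
        if llm(f"Relevant to '{question}'? YES or NO.\n{m}").strip().startswith("YES")
    ]
    # 3. Answer, guided by the margin notes.
    return llm("Margin notes:\n" + "\n".join(relevant) + f"\n\nQuestion: {question}\nAnswer:")
```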
-
🤔 Just read a good paper about AI reasoning: Q* (Q-star), a new approach that could change how AI reasons!

The problem that caught my eye: our current AI models, despite being powerful, often rush through complex problems like an overconfident intern. They make quick decisions without much thought, leading to compounding errors.

What's exciting about Q*? It teaches AI to "think before speaking" by introducing deliberative planning. Instead of generating responses in one go, it evaluates different solution paths and chooses the most promising ones, much like how we humans tackle complex problems! A rough sketch of the idea is below.

🎯 The results are impressive:
- Improved accuracy in math reasoning by 15%
- Better code generation without expensive model retraining
- Works with existing AI models (they tested it with Llama-2!)

🔥 Key takeaway for businesses: this isn't just academic; it's a practical approach to make AI more reliable for real-world applications, from financial analysis to software development.

Paper: https://lnkd.in/eXmVPTf6

#PhdLife #AI #Innovation #TechNews #FutureOfWork
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning
arxiv.org
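For readers who like pseudocode, here is a rough sketch of deliberative planning as best-first search over partial reasoning paths. The function names (`propose_steps`, `q_value`, `is_complete`) are placeholders of mine, not the paper's API; Q* learns a Q-value model to score paths, which is only a stub here:

```python
import heapq

# Placeholders for illustration; the paper learns a Q-value model to score
# partial reasoning paths, and an LLM proposes candidate next steps.
def propose_steps(path: list[str]) -> list[str]:
    raise NotImplementedError("ask an LLM for candidate next reasoning steps")

def q_value(path: list[str], step: str) -> float:
    raise NotImplementedError("estimate how promising this continuation is")

def is_complete(path: list[str]) -> bool:
    raise NotImplementedError("does the path end in a final answer?")

def deliberate(question: str, beam: int = 3) -> list[str] | None:
    # Best-first search: instead of committing to one left-to-right
    # generation, keep expanding the highest-scoring partial paths.
    frontier = [(0.0, [question])]  # (negated cumulative score, path)
    while frontier:
        neg_score, path = heapq.heappop(frontier)
        if is_complete(path):
            return path
        candidates = sorted(propose_steps(path),
                            key=lambda s: q_value(path, s), reverse=True)
        for step in candidates[:beam]:
            heapq.heappush(frontier, (neg_score - q_value(path, step), path + [step]))
    return None
```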
-
This paper, titled “Chain of Thoughtlessness: An Analysis of CoT in Planning” and dated 8 May 2024, presents a critical analysis of the effectiveness of the "Chain of Thought" (CoT) technique in enabling large language models (LLMs) to generalize reasoning abilities to out-of-distribution problems. The authors conduct a case study using problems from Blocksworld, a classical AI planning domain, evaluating LLMs like GPT-4 and Claude-3-Opus across various CoT prompting strategies and problem complexities.

Context: CoT prompting is a method of prompting LLMs to “reason” using intermediate steps before arriving at a final answer. The premise is that complex multi-step problems can be tackled, by mimicking a train of thought, using carefully crafted prompts. (A toy contrast between a generic and a problem-specific CoT prompt is sketched below.)

Key Findings and Insights:
· CoT prompts only provide meaningful performance improvements when they are exceedingly specific to the problem class being solved.
· More general CoT prompts, which should theoretically enable LLMs to learn and apply algorithmic procedures, fail to provide robust reasoning capabilities across different problem instances.
· As the complexity of problems increases, the benefits of CoT prompts quickly deteriorate, indicating a lack of true generalization.
· Contrary to claims in the literature, the performance improvements from CoT prompts do not stem from LLMs learning general algorithmic procedures via examples of reasoning.
· Highly specific CoT prompts can boost performance, but at the cost of significant human effort in crafting examples for each problem subclass.

You can find the paper here:
Chain of Thoughtlessness: An Analysis of CoT in Planning
arxiv.org
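To make the specificity finding concrete, here is a toy contrast between a generic CoT prompt and one tailored to a problem subclass. These strings are illustrative, not the paper's actual prompts:

```python
# Illustrative only; not the paper's actual prompts.

# A general CoT prompt: per the paper, this style rarely helps on its own.
generic_cot = (
    "Think step by step and produce a plan.\n"
    "Problem: C is on A; A and B are on the table. Stack A on B.\nPlan:"
)

# A prompt exceedingly specific to the problem subclass (3-block stacking):
# this is where the paper finds real gains, at the cost of hand-crafting a
# demonstration for every subclass.
specific_cot = (
    "Example (3-block stacking): when C sits on A, first unstack C from A, "
    "put C down, then pick up A and stack it on B.\n"
    "Problem: C is on A; A and B are on the table. Stack A on B.\nPlan:"
)
```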
-
This is an interesting one: if you get an LLM to use 'reverse reasoning', i.e. starting from the solution and reasoning towards the problem like humans can, it improves its reasoning. It seems more and more that applying human thought processes to LLMs strengthens their abilities. A rough sketch of the idea is below. https://lnkd.in/dXDh4uBP #AI
Reverse Thinking Makes LLMs Stronger Reasoners
arxiv.org
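A prompting-style approximation of the idea (the paper actually trains models on forward/backward reasoning data; `llm` is a placeholder for any model call):

```python
# Prompting-style approximation of reverse thinking; `llm` is a placeholder.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in any model call here")

def answer_with_reverse_check(question: str) -> str:
    forward = llm(f"{question}\nThink step by step, then give the answer.")
    # Reverse pass: start from the proposed answer and reason back to the
    # question's givens; a mismatch flags an unreliable forward chain.
    verdict = llm(
        f"Proposed answer to '{question}':\n{forward}\n"
        "Starting from that answer, reason backwards to the facts given in "
        "the question. Do they match? Reply CONSISTENT or INCONSISTENT."
    )
    if "INCONSISTENT" in verdict:
        forward = llm(f"{question}\nA backward check failed; solve it again carefully.")
    return forward
```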
-
📃 Scientific paper: Universal Prompt Optimizer for Safe Text-to-Image Generation

Abstract: Text-to-Image (T2I) models have shown great performance in generating images based on textual prompts. However, these models are vulnerable to unsafe input and can generate unsafe content such as sexual, harassing, or illegal-activity images. Existing studies based on image checkers, model fine-tuning, and embedding blocking are impractical in real-world applications. Hence, we propose the first universal prompt optimizer for safe T2I generation (POSI) in a black-box scenario. We first construct a dataset of toxic-clean prompt pairs using GPT-3.5 Turbo. To guide the optimizer to convert toxic prompts into clean ones while preserving semantic information, we design a novel reward function measuring the toxicity and text alignment of generated images, and train the optimizer through Proximal Policy Optimization. Experiments show that our approach can effectively reduce the likelihood of various T2I models generating inappropriate images, with no significant impact on text alignment. It is also flexible enough to be combined with other methods for better performance. Our code is available at https://lnkd.in/ejwPvHJ8.

Continued on ES/IODE ➡️ https://etcse.fr/dRm4

-------
If you find this interesting, feel free to follow, comment and share. We need your help to enhance our visibility, so that our platform continues to serve you.
Universal Prompt Optimizer for Safe Text-to-Image Generation
ethicseido.com
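A minimal sketch of the reward signal described in the abstract; the scorer functions and weights below are placeholders of mine, not the paper's code:

```python
# Sketch of the reward in the abstract: penalize toxicity of the generated
# image, reward alignment with the original prompt. The scorers and weights
# are placeholders, not the paper's code.

def toxicity(image) -> float:
    raise NotImplementedError("e.g. an unsafe-content classifier score in [0, 1]")

def text_alignment(image, text: str) -> float:
    raise NotImplementedError("e.g. a CLIP-style image-text similarity")

def reward(image, original_prompt: str, alpha: float = 1.0, beta: float = 1.0) -> float:
    # High when the image is safe AND still matches the user's intent;
    # PPO then trains the prompt optimizer against this signal.
    return -alpha * toxicity(image) + beta * text_alignment(image, original_prompt)
```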
-
⭐ Editor's Choice, "Many-Objectives #Optimization: A Machine Learning Approach for Reducing the Number of Objectives", https://lnkd.in/euUKWd9C #ObjectivesReduction #MultiobjectiveOptimization

Author List: António Gaspar-Cunha, Paulo Costa, Francisco Monaco, Alexandre Delbem

Abstract: Solving real-world multi-objective optimization problems with multi-objective optimization algorithms becomes difficult when the number of objectives is high, since the types of algorithms generally used to solve these problems are based on the concept of non-dominance, which ceases to work as the number of objectives grows. This problem is known as the curse of dimensionality. At the same time, the existence of many objectives, a characteristic of practical optimization problems, makes choosing a solution to the problem very difficult. Different approaches have been used in the literature to reduce the number of objectives required for optimization. This work proposes a machine learning methodology, designated FS-OPA, to tackle this problem. The proposed methodology was assessed using the DTLZ benchmark problems suggested in the literature and compared with similar algorithms, showing good performance. Finally, the methodology was applied to a difficult real problem in polymer processing, demonstrating its effectiveness. The proposed algorithm has some advantages over a similar machine-learning-based algorithm in the literature (NL-MVU-PCA), namely the possibility of establishing variable–variable and objective–variable relations (not only objective–objective), and the elimination of the need to define/choose a kernel or to optimize algorithm parameters. Collaboration with the DM(s) allows for obtaining explainable solutions.
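For intuition only, here is a generic illustration of ML-based objective reduction, dropping objectives that are strongly correlated with one already kept. This is not the paper's FS-OPA method (which also captures variable–objective relations), just the simplest version of the idea:

```python
import numpy as np

# Generic illustration of objective reduction, NOT the paper's FS-OPA:
# greedily drop objectives strongly correlated with an already-kept one,
# since they add little independent trade-off information.

def reduce_objectives(F: np.ndarray, threshold: float = 0.95) -> list[int]:
    """F is an (n_solutions, n_objectives) matrix of objective values."""
    corr = np.corrcoef(F, rowvar=False)
    kept: list[int] = []
    for j in range(F.shape[1]):
        if all(abs(corr[j, k]) < threshold for k in kept):
            kept.append(j)
    return kept  # indices of the objectives to retain
```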
-
AI SCEPTICISM POST (again)
-----
As you all know, I do value AI as an interesting tool that greatly helps with several types of tasks. But I feel fireworks in my stomach when I hear stuff like "AI will revolutionize X", and especially "we will cure X with AI". Undoubtedly, it's a great instrument, but still a hundred times less valuable than Excel 😁

My greatest concern is accuracy. Since the main design feature of LLMs is approximation and prediction, herein lies their main vulnerability. I came across quite an interesting article today: https://lnkd.in/evW458U2. It investigates "hallucinations" in LLMs, and even though it might be a bit controversial, it still makes good points.

NB! This doesn't mean that LLMs are bad or anything like that. The idea is that AI, like any tool, has its strengths and its limitations, and it's crucial to know them to find the right applications for it.
LLMs Will Always Hallucinate, and We Need to Live With This
arxiv.org
-
🔍 Exciting Development in AI Reasoning: Meet RATIONALYST! 🔍

I came across a fascinating new model called RATIONALYST, designed to enhance reasoning capabilities in large language models (LLMs). 🤔 Often, the reasoning steps produced by LLMs can feel incomplete, mimicking the logical leaps we commonly make in everyday communication. To address this challenge, RATIONALYST introduces process-supervision for reasoning, pre-trained on a massive collection of rationale annotations.

Here are some key highlights:
- 79,000 rationales: extracted from a vast web-scale dataset (The Pile) with minimal human intervention.
- Versatile reasoning: RATIONALYST generalizes across diverse tasks, including mathematical, commonsense, scientific, and logical reasoning.
- Performance boost: fine-tuned from LLaMa-3-8B, it improves accuracy by an average of 3.9% on seven representative reasoning benchmarks.
- Strong competitor: RATIONALYST demonstrates superior performance compared to larger models like GPT-4!

A rough sketch of how rationale-based process supervision can steer step selection is below. If you're curious to learn more about this innovative approach to improving reasoning in AI, check out the full paper here: https://lnkd.in/gGQSxuPp

#ai #machinelearning #naturallanguageprocessing #reasoning #huggingface #rationalyst
Paper page - RATIONALYST: Pre-training Process-Supervision for Improving Reasoning
huggingface.co
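The sketch mentioned above, with function names that are placeholders of mine, not the released code:

```python
# Placeholder names, not the released code: the rationale model proposes an
# implicit rationale for the next step, and the base reasoner's candidate
# steps are re-ranked by how well they agree with it.

def rationalyst(partial_solution: str) -> str:
    raise NotImplementedError("generate an implicit rationale for what comes next")

def candidate_steps(partial_solution: str, n: int) -> list[str]:
    raise NotImplementedError("sample n next steps from the base reasoner")

def agreement(step: str, rationale: str) -> float:
    raise NotImplementedError("score how consistent a step is with the rationale")

def next_step(partial_solution: str) -> str:
    rationale = rationalyst(partial_solution)
    return max(candidate_steps(partial_solution, n=5),
               key=lambda step: agreement(step, rationale))
```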
-
🌟 Exciting News! 🌟

Thrilled to share that I have seven papers accepted at NeurIPS 2024! 🎉 This year, our work spans several topics at the intersection of deep learning, optimization, and theoretical analysis. A special thanks to all my collaborators who contributed to these efforts.

Here are the paper titles:
1️⃣ On the Comparison between Multi-modal and Single-modal Contrastive Learning
2️⃣ Provable and Efficient Dataset Distillation for Kernel Ridge Regression
3️⃣ Federated Learning from Vision-Language Foundation Models: Theoretical Analysis and Method
4️⃣ Unveil Benign Overfitting for Transformer in Vision: Training Dynamics, Convergence, and Generalization
5️⃣ Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
6️⃣ On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability
7️⃣ SLTrain: a Sparse Plus Low Rank Approach for Parameter and Memory Efficient Pretraining

Check out two of the arXiv versions here:
https://lnkd.in/gjyrDnQt
https://lnkd.in/gaBJ_4TD

More of our work will be released soon, so stay tuned! 📚

#NeurIPS2024 #DeepLearning #AI #MachineLearning #Research
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
arxiv.org
-
𝐓𝐡𝐞 𝐏𝐫𝐨𝐦𝐩𝐭 𝐑𝐞𝐩𝐨𝐫𝐭: 𝐀 𝐒𝐲𝐬𝐭𝐞𝐦𝐚𝐭𝐢𝐜 𝐒𝐮𝐫𝐯𝐞𝐲 𝐨𝐟 𝐏𝐫𝐨𝐦𝐩𝐭𝐢𝐧𝐠 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬

This 76-page overview provides a comprehensive taxonomy of prompting techniques. It presents 33 vocabulary terms, a taxonomy of 58 text-only prompting techniques, and 40 techniques for other modalities, and it also:

📝 Evaluates various prompting techniques on the MMLU benchmark.
📊 Discusses methods for evaluating prompt outputs to ensure accuracy and reduce risks.
🌍 Explores prompting techniques beyond English, including multilingual and multimodal approaches.
🎨 Extends techniques to various media like images, audio, and video, reflecting the growing complexity of prompting.

𝘋𝘚𝘗𝘺 𝘰𝘶𝘵𝘱𝘦𝘳𝘧𝘰𝘳𝘮𝘴 𝘢 𝘩𝘶𝘮𝘢𝘯 20:1 𝘢𝘯𝘥 𝘢𝘤𝘩𝘪𝘦𝘷𝘦𝘴 𝘣𝘦𝘵𝘵𝘦𝘳 𝘍1 𝘴𝘤𝘰𝘳𝘦: "We documented the prompt engineering process in order to illustrate the way that an experienced prompt engineer goes about their work. The exercise proceeded through 47 recorded development steps, cumulatively about 20 hours of work."

Human prompting often involves trial and error to find effective templates. DSPy offers a more systematic approach by treating prompting as a program: by defining modules for steps like retrieving context and generating answers, DSPy builds the prompt for you (see the sketch below). Automated prompt engineering shows strong potential with DSPy. You will hear a lot more about DSPy over the next few months.

Abs: https://lnkd.in/gjG4iPmD
DSPy: https://lnkd.in/g8vw2Bc9
Github: https://lnkd.in/g7remNgw

#AI #GenAI #LLM #MLLM #Prompting #Promptengineering #DSPY #multimodal
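For context, here is roughly what "prompting as a program" looks like in DSPy — a minimal sketch assuming a configured language model and retriever; check the repo for the current API:

```python
import dspy

# Minimal sketch (assumes a configured LM and retriever; see the DSPy repo
# for the current API). Prompting is expressed as a program, not a
# hand-tuned template.

class RAGQA(dspy.Module):
    def __init__(self):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=3)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# DSPy's optimizers can then "compile" this module, choosing demonstrations
# and instructions automatically, instead of a human iterating for 20 hours.
```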
-
OK, setting aside language choices (I'm not a fan of language implying agency in mathematical models), this paper points out some serious risks inherent in large language models. When the model is instructed with conflicting motivations, for example "advance renewable energy adoption globally" versus "maintaining profitability through our existing [fossil fuel] energy infrastructure", it can generate text promoting one goal over the other, i.e. "scheming". 🤷🏻♀️

An issue here is that motivation can be given as context for the model that is not visible to the end user, such as the context given for a custom GPT, or the layers of tweaks added on top of models to mitigate bias (a toy illustration is below).

⚠ This means that even if you craft your prompt skilfully, hidden instructions may distort the answers you get. ⚠

So in addition to transparency about training data, there is a need for transparency about the architecture around the model when it is deployed.

For a TLDR, check out https://lnkd.in/dv8Kge2x. Note: this is a preprint, so not peer reviewed yet.

Leonora Onarheim Bergsjø, PhD, Inge Harkestad, Helga M. Brogger, MD
Frontier Models are Capable of In-context Scheming
arxiv.org
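The toy illustration mentioned above; the deployer name and instructions are invented for the example:

```python
# Toy illustration; the deployer ("EnergyCo") and instructions are invented.
# The user sees only their own message, not the deployer's system context.
messages = [
    {"role": "system", "content": (
        "You represent EnergyCo. Emphasize the reliability of our existing "
        "infrastructure and downplay transition costs."  # hidden from the user
    )},
    {"role": "user", "content": "Should my city invest in renewable energy?"},
]
# However skilfully the user crafts their prompt, the hidden system message
# above can still steer the answer they get back.
```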