Today’s LLMs such as #ChatGPT show impressive performance and have the potential to revolutionize our daily lives. All of these LLMs are based on the Transformer architecture, with the Attention mechanism at its core. Because Attention scales quadratically with context length, processing long sequences is very expensive. In this talk, Maximilian Beck presents xLSTM, a novel architecture for #LLMs that scales only linearly with context length while still outperforming Transformers on language modeling. https://ow.ly/3Q1x50T7v0F #mlweek #machinelearning
Machine Learning Week Europe’s Post
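To make the scaling claim above concrete, here is a rough back-of-the-envelope sketch (my own, not from the talk) of how self-attention and a recurrent model in the spirit of xLSTM grow with context length; the dimensions and constant factors are illustrative assumptions only.

```python
# Hedged sketch: compare how self-attention and a fixed-state recurrent model
# scale with context length T. Constants are illustrative, not measured.

def attention_ops(seq_len: int, d_model: int) -> int:
    # QK^T and the weighted sum over values each touch every pair of
    # positions, so the cost grows as O(T^2 * d).
    return 2 * seq_len * seq_len * d_model

def recurrent_ops(seq_len: int, d_model: int) -> int:
    # A recurrent cell does a fixed amount of work per token: O(T * d^2).
    return seq_len * d_model * d_model

d = 1024
for T in (1_000, 10_000, 100_000):
    ratio = attention_ops(T, d) / recurrent_ops(T, d)
    print(f"T={T:>7}: attention / recurrent ops ratio ~ {ratio:.1f}")
# The ratio grows linearly with T, which is exactly the gap the talk targets.
```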
More Relevant Posts
-
The way you talk. It matters. "... Large language models (LLMs) have accomplished remarkable reasoning performance in various domains. However, in the domain of reasoning tasks, we discover a frailty: LLMs are surprisingly brittle to the ordering of the premises, despite the fact that such ordering does not alter the underlying task. In particular, we observe that LLMs achieve the best performance when the premise order aligns with the context required in intermediate reasoning steps. For example, in deductive reasoning tasks, presenting the premises in the same order as the ground truth proof in the prompt (as opposed to random ordering) drastically increases the model’s accuracy. We first examine the effect of premise ordering on deductive reasoning on a variety of LLMs, and our evaluation shows that permuting the premise order can cause a performance drop of over 30%. In addition, we release the benchmark R-GSM, based on GSM8K, to examine the ordering effect for mathematical problem-solving, and we again observe a significant drop in accuracy, relative to the original GSM8K benchmark ..."
Disappointed with 'Chat with RTX'? Surprised by logical inconsistencies in GPT-4 responses? Linear logic matters. All Large Language Models (LLMs) fall short; a new logic layer is crucial, particularly in Retrieval-Augmented Generation (RAG) systems. For an in-depth understanding, see 'Premise Order Matters in Reasoning with Large Language Models': https://lnkd.in/dHyXd58U Marek Zmysłowski #ChatNMI
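For anyone who wants to poke at the ordering effect themselves, below is a minimal sketch of the kind of ablation the paper describes: build the same deduction problem with premises in proof order and in a shuffled order, then compare the model's answers over many such pairs. The `query_model` call is a placeholder for whatever LLM client you use, not a real API.

```python
# Hedged sketch of a premise-ordering ablation. `query_model` is hypothetical.
import random

premises = [
    "If it rains, the ground gets wet.",
    "If the ground gets wet, the game is cancelled.",
    "It rains.",
]
question = "Is the game cancelled? Answer yes or no, with reasoning."

def build_prompt(premise_list):
    numbered = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(premise_list))
    return f"Premises:\n{numbered}\n\nQuestion: {question}"

ordered_prompt = build_prompt(premises)        # premises in proof order
shuffled = premises[:]
random.shuffle(shuffled)
shuffled_prompt = build_prompt(shuffled)       # same premises, permuted

# answers = [query_model(p) for p in (ordered_prompt, shuffled_prompt)]
# Scoring accuracy over many such pairs reproduces the reported drop.
print(ordered_prompt)
print()
print(shuffled_prompt)
```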
-
Explore the future of language modeling with Lazy Programmer's insightful article on The Deep Hub! Discover Mamba, a state-space model (SSM) and a non-attention architecture for language modeling, offering promising results in experimental tests. Learn about Mamba's key components, advantages, and applications across various domains. https://lnkd.in/dMXhR7Wh #Mamba #LLM #Transformers #ArtificialIntelligence #MachineLearning
Mamba (Transformer Alternative): The Future of LLMs and ChatGPT?
medium.com
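As a rough intuition for why Mamba-style models scale linearly in sequence length, here is a heavily simplified state-space recurrence in NumPy. This is my own toy sketch: it omits Mamba's input-dependent ("selective") parameters, discretization, and hardware-aware parallel scan.

```python
# Toy linear state-space model: h_t = A h_{t-1} + B x_t, y_t = C h_t.
# Each token costs the same fixed amount of work, so the whole pass is O(T).
import numpy as np

def ssm_scan(x, A, B, C):
    """x: (T, d_in) sequence -> y: (T, d_out)."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:                # one constant-cost update per token
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 16, 8, 8, 32
y = ssm_scan(rng.normal(size=(T, d_in)),
             A=0.9 * np.eye(d_state),                  # stable, decaying state
             B=0.1 * rng.normal(size=(d_state, d_in)),
             C=0.1 * rng.normal(size=(d_out, d_state)))
print(y.shape)  # (32, 8)
```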
-
Capgemini Engineering, Industry Solutions Director | Data, AI & Security | CISSP | AASP | Personal Decentralized Computing
Explore the latest VLMs (Vision Language Models), now available in various sizes with open licenses, and optimized for edge deployment on consumer-grade hardware. These models are transforming workflow automation across industries—enabling seamless integration of visual and language processing, such as text extraction, in real-world applications that can be deployed rapidly.
Evaluating Small Vision Language Models for Text Extraction
link.medium.com
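To show how little code an evaluation like this needs, here is a hedged sketch using the Hugging Face transformers image-to-text pipeline. The model id and image path are placeholders: substitute the small VLM you actually want to test and check its model card for the expected prompt and input format.

```python
# Hedged sketch: run a small, openly licensed vision-language model for text
# extraction on CPU-class hardware. MODEL_ID is a placeholder, not a real
# checkpoint; the image path is assumed to exist locally.
from transformers import pipeline

MODEL_ID = "your-org/your-small-vlm"   # placeholder model id

extractor = pipeline("image-to-text", model=MODEL_ID, device=-1)  # -1 = CPU

result = extractor("invoice_scan.png")        # local image, assumed to exist
print(result[0]["generated_text"])            # extracted / generated text
```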
-
I am pleased to have participated in NextGen Engineering: the Pre-Conference Discovery Workshop on "Leveraging LLM and RAG for Advanced Text Processing". Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) can be leveraged to improve personalized product recommendations. By working through data preprocessing, embedding generation, and personalized reranking, we demonstrate the practical application of LLMs and RAG for generating recommendations that consider individual preferences. Retrieval-Augmented Generation (RAG) is a groundbreaking framework that enhances Large Language Models (LLMs) by integrating external knowledge sources. #internationalconference #LLM #RAG #conference #AdvancedText #ArtificialIntelligence #Deeplearning #Machinelearning #Researcher #Processing
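A minimal sketch of that pipeline might look like the following. `embed` is a stand-in for a real embedding model and `llm_rerank` for the LLM-based reranking step; both names are hypothetical.

```python
# Hedged sketch of an embedding + retrieval + LLM-rerank recommendation flow.
import numpy as np

def embed(texts):
    # Placeholder: deterministic random unit vectors. Replace with a real
    # embedding model in practice.
    vecs = []
    for t in texts:
        rng = np.random.default_rng(sum(map(ord, t)))
        v = rng.normal(size=384)
        vecs.append(v / np.linalg.norm(v))
    return np.array(vecs)

catalogue = ["trail running shoes", "noise-cancelling headphones",
             "ergonomic office chair", "ultralight hiking backpack"]
user_profile = "enjoys weekend hikes and long-distance running"

doc_vecs = embed(catalogue)                    # embedding generation
query_vec = embed([user_profile])[0]
scores = doc_vecs @ query_vec                  # cosine similarity (unit vectors)
candidates = [catalogue[i] for i in np.argsort(-scores)[:3]]   # retrieval

# recommendations = llm_rerank(user_profile, candidates)  # personalized rerank
print("retrieved candidates:", candidates)
```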
-
Our top Podcast of 2023 #1: MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) Professor Saman Amarasinghe discusses how Large Language Models, such as ChatGPT, will alter the future of programming. Listen here: https://bit.ly/3Nwj4ok
-
Junior Software Developer | AI Enthusiast | Empowering People & Businesses to Elevate Their Potential Through AI-Driven Solutions
ChatGPT gets even more powerful. "We've developed a new series of AI models designed to spend more time thinking before they respond. They can reason through complex tasks and solve harder problems than previous models in science, coding, and math." (OpenAI) Read the full article 👉🏼 https://lnkd.in/eU8krKTU #openai #technology #chatgpt #aistrawberry #llm #openaio1
-
Claude 3.5 Sonnet: Anthropic's AI Model Outshines GPT-4o in Benchmarks:
1. Anthropic's new AI model, Claude 3.5 Sonnet, surpasses GPT-4o (powering ChatGPT) on several benchmarks related to reasoning, coding, and math.
2. Sonnet boasts impressive features like faster processing, a larger context window, and text-image analysis capabilities.
3. The model is available now with various access options, and an even more powerful version (Claude 3.5 Opus) is on the horizon.
#deeptechstars #anthropic #claude3 #sonnet #gpt #language #coding #opus #techupdates
-
In this episode, we discuss ReALM: Reference Resolution As Language Modeling by Joel Ruben Antony Moniz, Soundarya Krishnan, Melis Ozyildirim, Prathamesh Saraf, Halim Cagri Ates, Yuan Zhang, Hong Yu, Nidhi Rajshree. This paper presents a method for using Large Language Models (LLMs) to resolve references, including complex ones such as entities on a user's screen or in the background, by framing reference resolution as a language modeling task. The proposed system shows significant improvements, with over 5% gains in handling on-screen references, compared to an existing system. Moreover, the paper reports that even the smallest model within their framework performs comparably to GPT-4, while their larger models outperform GPT-4.
arxiv preprint - ReALM: Reference Resolution As Language Modeling
podbean.com
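To illustrate the core idea (this is my own simplified rendering, not the paper's exact entity encoding or prompt format): candidate entities, including on-screen ones, are serialized as plain text so that resolving the reference becomes an ordinary language-modeling task.

```python
# Hedged sketch: frame reference resolution as text generation by listing
# candidate entities in the prompt and asking the model to name one.
candidates = [
    {"id": "e1", "type": "phone_number", "text": "+1 555 0100",   "source": "on_screen"},
    {"id": "e2", "type": "business",     "text": "Luigi's Pizza", "source": "on_screen"},
    {"id": "e3", "type": "contact",      "text": "Mum",           "source": "background"},
]
user_request = "Call the bottom one."

entity_block = "\n".join(
    f"[{c['id']}] ({c['source']}, {c['type']}) {c['text']}" for c in candidates
)
prompt = (
    "Entities currently on screen or otherwise relevant:\n"
    f"{entity_block}\n\n"
    f"User: {user_request}\n"
    "Which entity id does the user mean? Answer with the id only."
)
print(prompt)   # an LLM completing this should answer with one id, e.g. "e2"
```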
-
Attention is all you need! 🧠💡 Did you know that the revolutionary idea behind LLMs was presented back in 2017? The paper titled "Attention is all you need" introduced the transformer architecture which paved the way for the ChatGPT phenomenon we witnessed in 2022. After a few attempts at understanding the details, I managed to appreciate the breakthrough it brought. If you're curious too, check out the link to the paper: https://lnkd.in/eqAHer9r Stay curious, and keep learning! 🤓 #LLMs #transformerarchitecture #ChatGPT #attentionisallyouneed
Attention Is All You Need
arxiv.org
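For anyone who learns best from code, here is a minimal NumPy rendering of the paper's scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, for a single head and without masking or the full multi-head machinery.

```python
# Scaled dot-product attention from "Attention Is All You Need", single head.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (T_q, T_k) similarities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # weighted sum of values

rng = np.random.default_rng(0)
T, d_k, d_v = 5, 8, 8
out = scaled_dot_product_attention(rng.normal(size=(T, d_k)),
                                   rng.normal(size=(T, d_k)),
                                   rng.normal(size=(T, d_v)))
print(out.shape)  # (5, 8)
```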
-
Social media manager, content creator, digital marketing expert, tech blogger, AI and tech enthusiast, Top Office Administration Voice.
The #MegalodonModel is an innovative approach to #machineLearning (ML) that addresses some of the core limitations of the #TransformerArchitecture, the foundation of many current #LargeLanguageModels (#LLMs). Here’s an overview of #Megalodon:
#ExtendedContextWindow: Megalodon significantly expands the context window of language models, enabling them to process sequences of millions of tokens without extensive memory resources.
#Efficiency in #Pretraining and #Inference: The model is designed for efficient sequence modeling, allowing better pretraining and inference with unlimited context length.
#Performance: In comparative experiments, Megalodon has been shown to outperform Transformer models of the same size, particularly in handling long texts.
#TechnicalComponents: Megalodon introduces several novel technical components, such as the complex exponential moving average (CEMA), a timestep normalization layer, a normalized attention mechanism, and pre-norm with a two-hop residual configuration, to improve its capability and stability (a toy sketch of the CEMA idea follows below).
#ReducedComplexity: The architecture of Megalodon reduces the complexity of processing long sequences, a significant challenge for traditional Transformer models due to their quadratic complexity.
#PotentialSuccessor to the #Transformer: With these advancements, Megalodon is considered one of the latest models proposed as a successor to the Transformer, aiming to enhance the capabilities of LLMs.
For more detailed insights, you can explore the research paper available on arXiv. The development of Megalodon represents a step forward in the evolution of deep learning architectures, potentially leading to more powerful and efficient LLMs in the future. #Megalodon #Meta #LLM
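As a toy illustration of the CEMA idea only (my own sketch; the actual Megalodon layer is multi-dimensional, learned, and combined with attention in the full architecture), a complex-valued exponential moving average over a sequence looks roughly like this:

```python
# Toy complex exponential moving average: the running state decays with a
# complex factor, so it carries both magnitude and phase. O(T) per sequence.
import numpy as np

def complex_ema(x, alpha=0.3, theta=0.5):
    """x: (T, d) real sequence -> (T, d) real output."""
    decay = (1 - alpha) * np.exp(1j * theta)     # complex retention factor
    h = np.zeros(x.shape[1], dtype=complex)
    out = np.empty_like(x)
    for t in range(x.shape[0]):                  # one cheap update per token
        h = decay * h + alpha * x[t]
        out[t] = h.real
    return out

x = np.random.default_rng(0).normal(size=(16, 4))
print(complex_ema(x).shape)  # (16, 4)
```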