𝐄𝐱𝐜𝐢𝐭𝐢𝐧𝐠 𝐍𝐞𝐰𝐬 𝐢𝐧 𝐎𝐛𝐣𝐞𝐜𝐭 𝐃𝐞𝐭𝐞𝐜𝐭𝐢𝐨𝐧: 𝐘𝐎𝐋𝐎-𝐖𝐨𝐫𝐥𝐝

Thrilled to share an innovative development in object detection: YOLO-World. Building on the efficiency and practicality of the You Only Look Once (YOLO) series of detectors, YOLO-World brings open-vocabulary detection to the table. Traditional detectors are limited to predefined, trained object categories; YOLO-World moves beyond these confines by incorporating vision-language modeling and pre-training on large-scale datasets, which yields consistently strong detection of an expansive range of objects in a zero-shot manner while maintaining high efficiency.

The technological innovation behind this approach is the newly proposed Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) combined with a region-text contrastive loss. Together, these enable a deeper interaction between visual and linguistic information.

The results speak for themselves: on the challenging LVIS dataset, YOLO-World achieves 35.4 average precision (AP) at an impressive 52.0 frames per second (FPS) on a V100, outperforming many state-of-the-art methods in both speed and accuracy. More interestingly, the fine-tuned YOLO-World shows remarkable performance on several downstream tasks, including object detection and open-vocabulary instance segmentation, highlighting the broad applications and potential of this technology.

🔗 https://lnkd.in/gzYp_b2w

YOLO-World is a serious game-changer, introducing flexibility and scalability to object detection that were previously unattainable. Stay tuned for more developments in this space!

#AI #ObjectDetection #MachineLearning #yolo #computervision #datascience #artificialintelligence #innovation #technology #visionmodeling #YOLO-World
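For anyone who wants to try this kind of zero-shot detection, here is a minimal usage sketch. It assumes the Ultralytics package's YOLOWorld interface with pretrained YOLO-World weights; the weight file name, prompt list, and image path below are illustrative placeholders, so check the linked project for the exact API.

```python
# Minimal zero-shot detection sketch (assumes the Ultralytics YOLOWorld API;
# weight file name, prompts, and image path are illustrative placeholders).
from ultralytics import YOLOWorld

# Load a pretrained open-vocabulary YOLO-World checkpoint.
model = YOLOWorld("yolov8s-world.pt")

# Describe the objects of interest in free-form text: no retraining needed.
model.set_classes(["red backpack", "traffic cone", "delivery van"])

# Run detection and visualize the boxes for the custom vocabulary.
results = model.predict("street_scene.jpg")
results[0].show()
```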
More Relevant Posts
-
Introducing YOLO-World: The Future of Open-Vocabulary Object Detection

Object detection is crucial for AI systems operating in the real world, but traditional models are limited to predefined object categories. YOLO-World is a revolutionary new model that overcomes this through open-vocabulary object detection, allowing it to identify any object described in natural language. To learn about it in detail, read https://lnkd.in/dxtEPaXS

The key innovations behind YOLO-World include:
⚡ Building on the exceptional speed and efficiency of YOLOv8
🪄 Using a "prompt-then-detect" paradigm to bypass real-time text encoding
🧠 Grounding visual and language understanding through cross-modal fusion
⚡ Avoiding complex Transformer architectures for faster, lighter inference
🌎 Released open-source under GPL to foster accessibility and collaboration

YOLO-World achieves remarkable performance benchmarks, running up to 20x faster than comparable models on a single GPU while maintaining competitive accuracy. Its small model footprint enables deployment across a wide range of devices.

But what really unlocks YOLO-World's transformative potential is custom prompting. With phrases like "cracked smartphone screen", you can rapidly adapt its detection capabilities without retraining (see the sketch below). This opens up endless innovative applications:
📷 Intelligent surveillance and anomaly detection
🏪 Retail analytics and automated inventory tracking
⚙️ Enabling smarter robotics and navigation
👩🦯 Advancing accessibility for the visually impaired
...and so much more!

YOLO-World heralds a new era of open-vocabulary, real-time perception for AI. Its speed, flexibility, and open-source philosophy empower us to build truly intelligent systems attuned to the real world's boundless complexity.

#ai #computervision #innovation #opensource
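Here is a rough sketch of that "prompt-then-detect" workflow, again assuming the Ultralytics YOLOWorld interface; the "cracked smartphone screen" prompt comes from the post, while the file names are hypothetical.

```python
# Sketch of the "prompt-then-detect" idea: encode the custom prompt once,
# save a detector specialized to that vocabulary, and reuse it without any
# text encoding at inference time. File names here are hypothetical.
from ultralytics import YOLO, YOLOWorld

model = YOLOWorld("yolov8m-world.pt")

# Embed the custom vocabulary offline.
model.set_classes(["cracked smartphone screen"])
model.save("screen_defect_detector.pt")

# Later (e.g. on an edge device): load and run it like an ordinary YOLO model.
detector = YOLO("screen_defect_detector.pt")
results = detector.predict("phone_photo.jpg")
print(results[0].boxes)
```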
YOLO-World: Real-Time, Open-Vocabulary Object Detection
ruintheextinct.medium.com
-
📈 10M+ Views | 🚀 Turning Data into Actionable Insights | 🤖 AI, ML & Analytics Expert | 🎥 Content Creator & YouTuber | 💻 Power Apps Innovator | 🖼️ NFTs Advocate | 💡 Tech & Innovation Visionary | 🔔 Follow for More
"YALTAi introduces a game-changing approach to layout analysis in OCR and similar tasks. By leveraging object detection instead of pixel classification, it significantly enhances segmentation efficiency. The incorporation of YOLOv5 into Kraken 4.1's pipeline yields remarkable performance gains, particularly on smaller datasets. This innovation marks a pivotal shift in document digitization, promising superior extraction accuracy and noise reduction. #AI #MachineLearning #ComputerVision #YALTAi #KrakenEngine"
"YALTAi introduces a game-changing approach to layout analysis in OCR and similar tasks. By leveraging object detection instead of pixel classification, it significantly enhances segmentation efficiency. The incorporation of YOLOv5 into Kraken 4.1's pipeline yields remarkable performance gains, particularly on smaller datasets. This innovation marks a pivotal shift in document digitization, p...
arxiv.org
-
***Research Paper/Project Update*** Excited to share our recent work accepted at ECCV 2024, spearheaded by Abrar Majeedi 😁 🎉. Do check out this work, which he plans to present in Milan. One-line description of the project: how do you assess the performance of a diver (or the actions of a surgeon) and score it? How certain can we be about the model's predicted score, and can we get step-wise feedback to trust the model's predictions? Fun coincidence: one of the datasets we evaluated is related to Olympic diving, and the timing of this paper with the 2024 Paris Olympics felt nice 😎 #MachineLearning #AI #ComputerVision #SportsResearch #ECCV
🚀 I am excited to announce that our paper, "𝗥𝗜𝗖𝗔𝟮: 𝗥𝘂𝗯𝗿𝗶𝗰-𝗜𝗻𝗳𝗼𝗿𝗺𝗲𝗱, 𝗖𝗮𝗹𝗶𝗯𝗿𝗮𝘁𝗲𝗱 𝗔𝘀𝘀𝗲𝘀𝘀𝗺𝗲𝗻𝘁 𝗼𝗳 𝗔𝗰𝘁𝗶𝗼𝗻𝘀," has been accepted at ECCV 2024!

⭐ RICA2 incorporates human-designed scoring rubrics to emulate the human scoring process of activities.
⭐ It also provides calibrated uncertainty estimates, indicating when model predictions can be trusted.
⭐ We demonstrate the effectiveness of RICA2 in automatically evaluating diverse activities such as Olympic diving and surgical procedures.

Thanks to the amazing team: Viswanatha Reddy, Satya Sai Srinath Namburi, and Yin Li. You can read our paper on arXiv and visit our project page for more details.

📄 Paper: https://lnkd.in/gkw4Q-yS
🔗 Project page: https://lnkd.in/gWMHY56E
⌨ Code: https://lnkd.in/gvVFk-7u

See you in Milan!

#ECCV2024 #ComputerVision #AI #MachineLearning #Research
RICA2: Rubric-Informed, Calibrated Assessment of Actions (ECCV 2024)
abrarmajeedi.github.io
-
🎓 MTech in AIML | 🚀 Former AIML Researcher | 📝 Blogger on LLMs, VLMs, & Generative AI | Sharing Research Papers & Innovations Shaping the Future of AI | Exploring the Latest in Generative AI
🚨 New Study Reveals Critical Blind Spot in AI's Visual Intelligence! 🚨

A recent study exposed a critical blind spot in AI's visual intelligence. While these models excel at interpreting photographic realism, they falter dramatically when presented with abstract visualizations like diagrams and charts. Advanced models such as GPT-4o and Claude 3.5 Sonnet achieved a mere 64.7% and 59.9% accuracy respectively on a diverse dataset of 11,193 abstract images, significantly lagging human performance at 82.1%.

The research team employed a "multimodal self-instruct" approach to build this dataset, with each image accompanied by specific questions across eight scenarios, such as dashboards, road maps, diagrams, and tables. Despite their multimodal capabilities, the models struggle particularly with tasks requiring spatial reasoning and abstract concept understanding. Attempts to improve open-source model performance using synthetic data showed some success, yet reliance on costly closed models for reference data poses challenges.

🔗 Read more: https://lnkd.in/dKbVgGzn
🔗 Blog: https://t.ly/QLhNK

#AI #ArtificialIntelligence #VisualIntelligence #MachineLearning #Research #DataScience #TechNews #DeepLearning #NeuralNetworks #AIResearch #TechInnovation #AbstractReasoning #AIChallenges #FutureOfAI #TechTrends #Innovation #SyntheticData #AIModels #Share
Multimodal Self-Instruct
multi-modal-self-instruct.github.io
-
Hugging Face writes about Lag-Llama, a new transformer-based AI model for predicting time-series data, based on Meta's large language model Llama: https://lnkd.in/eU89MeBJ #huggingface #timeseriesforecasting #transformermodels #lagllama #llama #artificialintelligence #meta
time-series-foundation-models/Lag-Llama · Hugging Face
huggingface.co
-
Hugging Face Researchers Introduce Idefics2: A Powerful 8B Vision-Language Model Elevating Multimodal AI Through Advanced OCR and Native Resolution Techniques

Quick read: https://lnkd.in/gHWWRkFr
HF Page: https://lnkd.in/gJaAGSkd
Idefics2 🐶 - a HuggingFaceM4 Collection
huggingface.co
-
Over the past years, foundation models have caused a paradigm shift in machine learning due to their unprecedented capabilities for zero-shot and few-shot generalization. However, despite the success in modalities such as natural language processing and computer vision, foundation models for time series forecasting have lagged behind.

We are proud to present Lag-Llama: the first open-source foundation model for time series forecasting! Lag-Llama is developed jointly with our amazing team from Universite de Montreal, my CERC-AAI lab, ServiceNow and Morgan Stanley! We are grateful for compute resources provided to us by OLCF - the Oak Ridge Leadership Computing Facility (DoE Office of Science User Facility supported under Contract DE-AC05-00OR22725).

Lag-Llama is a general-purpose foundation model for univariate probabilistic time series forecasting based on a decoder-only transformer architecture that uses lags as covariates. Lag-Llama is pretrained on a large corpus of diverse time series data from several domains, and demonstrates strong zero-shot generalization capabilities compared to a wide range of forecasting models on downstream datasets across domains. Moreover, when fine-tuned on relatively small fractions of such previously unseen datasets, Lag-Llama achieves state-of-the-art performance, outperforming prior deep learning approaches and emerging as the best general-purpose model on average.

Lag-Llama serves as a strong contender to the current state of the art in time series forecasting and paves the way for future advancements in foundation models tailored to time series data. Time-series foundation models have the potential to revolutionize applications such as computational medicine, natural sciences, finance, climate, retail, ecology, energy and many more. They will become essential in applications where data availability is limited and a transfer from a general-purpose pretrained model might therefore be required.

Model weights: https://lnkd.in/epuAwQ-X
Demo: https://lnkd.in/evCHJ6-M
GitHub: https://lnkd.in/eQi-uNzW
Paper: https://lnkd.in/eK4BbfUk
Tweet: https://lnkd.in/ezBf8int
Blog: https://lnkd.in/e6XFaNQS
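To make the "lags as covariates" idea concrete, here is a generic illustration (not code from the Lag-Llama repository): each time step is represented by its target value plus a vector of past values at fixed lag offsets, and it is this lag vector that the model consumes as input features. The lag set below is made up for the example.

```python
# Generic illustration of "lags as covariates" (not Lag-Llama's actual code):
# for each time step t, collect the series values at a fixed set of lag
# offsets and use them as the input features for predicting x[t].
import numpy as np

def build_lag_features(series: np.ndarray, lags=(1, 2, 3, 7, 14, 28)):
    """Return (targets, features): features[i] holds [x[t-1], x[t-2], ...]."""
    max_lag = max(lags)
    targets, features = [], []
    for t in range(max_lag, len(series)):
        targets.append(series[t])
        features.append([series[t - lag] for lag in lags])
    return np.array(targets), np.array(features)

if __name__ == "__main__":
    # Toy univariate series: a noisy sinusoid standing in for real data.
    x = np.sin(np.arange(200) / 7.0) + 0.1 * np.random.randn(200)
    y, lag_covariates = build_lag_features(x)
    print(y.shape, lag_covariates.shape)  # -> (172,) (172, 6)
```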
time-series-foundation-models/Lag-Llama · Hugging Face
huggingface.co
-
IBM Executive Architect - Data, AI & Cyber for Resilience - University of Bari Adjunct Professor for Emerging Tech, Innovation, AI & Machine Learning - Member of ISO/JTC 1/SC42
Interesting news from the research team that developed an open-source foundation model for time series that is at the state of the art in this sub-field. The paradigm shift that foundation models triggered in machine learning is opening up use cases that, in the coming months, will go beyond textual-language data and redefine solutions in a range of industrial and practical contexts, and time series are pervasive! See the list of training datasets the team considered, which I share in the first comment, to get an idea of the kinds of application problems involved.
time-series-foundation-models/Lag-Llama · Hugging Face
huggingface.co
-
"Graph embeddings learn the structure of networks and represent it in low-dimensional vector spaces. Community structure is one of the features that are recognized and reproduced by embeddings. We show that an iterative procedure, in which a graph is repeatedly embedded and its links are reweighted based on the geometric proximity between the nodes, reinforces intra-community links and weakens inter-community links, making the clusters of the initial network more visible and more easily detectable. The geometric separation between the communities can become so strong that even a very simple parsing of the links may recover the communities as isolated components with surprisingly high precision. Furthermore, when used as a pre-processing step, our embedding and reweighting procedure can improve the performance of traditional community detection algorithms." https://lnkd.in/dfTk-HEJ.
Iterative embedding and reweighting of complex networks reveals community structure - Scientific Reports
nature.com
-
If you're personally interested in AI ... Or if you're a *business* and you want to apply AI ... You need to understand how to use foundation models, and how to fine-tune them to solve specific problems for you. #machinelearning #datascience #AI #data
time-series-foundation-models/Lag-Llama · Hugging Face
huggingface.co