"YALTAi introduces a game-changing approach to layout analysis in OCR and similar tasks. By leveraging object detection instead of pixel classification, it significantly enhances segmentation efficiency. The incorporation of YOLOv5 into Kraken 4.1's pipeline yields remarkable performance gains, particularly on smaller datasets. This innovation marks a pivotal shift in document digitization, promising superior extraction accuracy and noise reduction. #AI#MachineLearning#ComputerVision#YALTAi#KrakenEngine"
𝐄𝐱𝐜𝐢𝐭𝐢𝐧𝐠 𝐍𝐞𝐰𝐬 𝐢𝐧 𝐎𝐛𝐣𝐞𝐜𝐭 𝐃𝐞𝐭𝐞𝐜𝐭𝐢𝐨𝐧: 𝐘𝐎𝐋𝐎-𝐖𝐨𝐫𝐥𝐝
Thrilled to share an innovative new development in the field of object detection - YOLO-World. Building on the efficiency and practicality established by the You Only Look Once (YOLO) series of detectors, YOLO-World brings an open-vocabulary detection capability to the table.
Traditional detectors are limited by predefined and trained object categories. However, YOLO-World moves beyond these confines by incorporating vision-language modeling and pre-training on large-scale datasets: this manifests as consistent, exceptional performance in detecting an expansive range of objects in a zero-shot manner while maintaining high efficiency.
The technological innovation behind this approach is the newly-proposed Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) combined with a region-text contrastive loss. These facilitate a more profound interaction between visual and linguistic information.
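The region-text matching idea can be illustrated with a toy sketch (this is not the actual RepVL-PAN implementation — all names and values below are made up for illustration): score each detected region's embedding against the text embeddings of arbitrary class prompts and assign each region the best-matching prompt.

```python
import numpy as np

def region_text_scores(region_feats, text_feats):
    """Cosine similarity between detected-region embeddings and
    class-prompt text embeddings; rows are regions, columns are classes."""
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    return r @ t.T

# Toy example: 2 regions, 3 open-vocabulary class prompts
regions = np.array([[1.0, 0.0], [0.0, 1.0]])
texts = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
scores = region_text_scores(regions, texts)
labels = scores.argmax(axis=1)  # each region picks its best-matching prompt
print(labels)  # [0 1]
```

Because the class set is just a list of text embeddings, swapping in new prompts at inference time is what makes the detection "open-vocabulary."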
The results speak for themselves: on the challenging LVIS dataset, YOLO-World achieves a 35.4 average precision (AP) at an impressive 52.0 frames per second (FPS) on a V100 GPU. This achievement outperforms many state-of-the-art methods in terms of both speed and accuracy.
More interestingly, the fine-tuned YOLO-World shows remarkable performance on several downstream tasks. This includes object detection and open-vocabulary instance segmentation, highlighting broad applications and potential for this technology.
🔗 https://lnkd.in/gzYp_b2w
YOLO-World is a serious game-changer, introducing flexibility and scalability to object detection that was previously unattainable. Stay tuned for more developments in this space!
#AI #ObjectDetection #MachineLearning #yolo #computervision #datascience #artificialintelligence #innovation #technology #visionmodeling #YOLOWorld
Computer Vision Consultant - available to help your R&D! 66+ patents and 37+ years of experience in artificial intelligence and high-tech. Passionate about using the latest advancements to improve your business.
Get 3D object shape from poor views: use feature similarity to learn view interactions with UFORecon
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets
arXiv paper abstract https://lnkd.in/eVum-UfM
arXiv PDF paper https://lnkd.in/eBjPBGkm
GitHub https://lnkd.in/eHTBKWp7
Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes.
However, existing methods select only informative and relevant views using predefined scores for training and testing phases.
... introduce and validate a view-combination score to indicate the effectiveness of the input view combination.
... propose UFORecon, a robust view-combination generalizable surface reconstruction framework.
... apply cross-view matching transformers to model interactions between source images and build correlation frustums to capture global correlations.
... framework ... outperforms previous methods in terms of view-combination generalizability and ... conventional generalizable protocol trained with favorable view-combinations ...
Please like and share this post if you enjoyed it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://lnkd.in/emCkRuA
Web site with my other posts by category https://lnkd.in/enY7VpM
LinkedIn https://lnkd.in/ehrfPYQ6
#ComputerVision #3D #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
🚀 Exciting news in the world of AI and computer vision! A new Transformer-based gaze object prediction method, TransGOP, has been introduced, achieving state-of-the-art performance in object detection, gaze estimation, and gaze object prediction. This innovative approach leverages Transformer-based object detectors to accurately predict object locations and establish long-distance gaze relationships, particularly beneficial in dense retail scenarios. The proposed object-to-gaze cross-attention mechanism and end-to-end training further enhance the framework's effectiveness. Stay tuned for the code release at the provided link! #AI #ComputerVision #Transformer #MachineLearning #DeepLearning #TechInnovation
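To make the object-to-gaze cross-attention idea concrete, here is a minimal, hypothetical NumPy sketch of scaled dot-product cross-attention — gaze queries attending over object tokens. This is an illustration of the general mechanism, not TransGOP's actual implementation.

```python
import numpy as np

def cross_attention(Q, K, V):
    """Scaled dot-product attention: gaze queries attend to object keys/values."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over object tokens
    return weights @ V

Q = np.array([[1.0, 0.0]])               # one gaze query
K = np.array([[1.0, 0.0], [0.0, 1.0]])   # two object tokens (keys)
V = np.array([[10.0, 0.0], [0.0, 10.0]]) # object values
out = cross_attention(Q, K, V)
# The query is more similar to the first object, so its value dominates.
```

The gaze branch thereby aggregates object features weighted by relevance, which is how long-distance gaze-object relationships can be modeled in one differentiable step.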
🚀 Innovative Breakthrough in Object Detection: SDDGR for Class Incremental Learning! 🌐🔍
Dive into the world of cutting-edge advancements with Stable Diffusion-based Deep Generative Replay (SDDGR), a groundbreaking approach for Class Incremental Object Detection (CIOD). 🤖✨
Key Highlights:
🎓 Continuous Learning Evolution:
In the realm of Class Incremental Learning (CIL), SDDGR emerges as a pioneering method to address the challenge of catastrophic forgetting. It leverages generative replay alongside the continuous enhancements in generative models.
🔍 Diffusion-based Generative Model:
SDDGR introduces a stable diffusion-based generative model, utilizing pre-trained text-to-diffusion networks. This innovation enables the generation of realistic and diverse synthetic images, contributing to improved object detection.
🖼️ Synthetic Image Generation:
The method incorporates an iterative refinement strategy, ensuring the production of high-quality images that encompass old classes. By adopting an L2 knowledge distillation technique, it enhances the retention of prior knowledge in synthetic images.
🎯 Pseudo-labeling for Old Objects:
SDDGR includes a strategic pseudo-labeling approach for old objects within new task images. This prevents misclassification as background elements, contributing to more accurate and reliable object detection.
📈 State-of-the-Art Performance:
Through rigorous evaluations, SDDGR has showcased superior performance, outperforming existing algorithms and establishing a new state-of-the-art in various CIOD scenarios.
🌐 Further Exploration:
If you're intrigued by the details, explore the arXiv paper abstract and the arXiv PDF paper.
🤔 Your Thoughts?
What potential applications and implications do you envision for advancements like SDDGR in the field of object detection and continuous learning? Share your insights below! 👇
#ObjectDetection #ContinuousLearning #GenerativeReplay #AIInnovation #MachineLearning #TechBreakthrough #AIResearch
Learn new objects without forgetting old ones using stable diffusion to generate images with SDDGR
https://lnkd.in/eTyw28Qk
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
arXiv paper abstract https://lnkd.in/e8dkru76
arXiv PDF paper https://lnkd.in/eMNpqfGM
In ... class incremental learning (CIL), generative replay has become ... a method to mitigate the catastrophic forgetting, alongside the continuous improvements in generative models.
... propose a novel approach called stable diffusion deep generative replay (SDDGR) for CIOD.
... method utilizes a diffusion-based generative model with pre-trained text-to-diffusion networks to generate realistic and diverse synthetic images.
SDDGR incorporates an iterative refinement strategy to produce high-quality images encompassing old classes ... adopt an L2 knowledge distillation technique to improve the retention of prior knowledge in synthetic images.
... approach includes pseudo-labeling for old objects within new task images, preventing misclassification as background elements.
... SDDGR significantly outperforms existing algorithms, achieving a new state-of-the-art in various CIOD scenarios ...
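As a rough illustration of the L2 knowledge-distillation term mentioned in the abstract, the sketch below computes a mean-squared distance between student and frozen-teacher features. The shapes and values are made up; this is not SDDGR's actual code.

```python
import numpy as np

def l2_distill_loss(student_feats, teacher_feats):
    """Mean squared (L2) distance between student features and those of a
    frozen teacher; minimizing it nudges the student to retain old knowledge."""
    return np.mean((student_feats - teacher_feats) ** 2)

teacher = np.array([[1.0, 2.0], [3.0, 4.0]])  # frozen old-model features
student = np.array([[1.5, 2.0], [3.0, 3.0]])  # current-model features
loss = l2_distill_loss(student, teacher)
print(loss)  # 0.3125
```

In a replay setup, this term would be added to the detection loss on the synthetic images so the new model stays close to the old one on old classes.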
Please like and share this post if you enjoyed it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://lnkd.in/emCkRuA
Web site with my other posts by category https://lnkd.in/enY7VpM
LinkedIn https://lnkd.in/ehrfPYQ6
#ComputerVision #ObjectDetection #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
***Research Paper/Project Update***
Excited to share our recent work accepted at ECCV 2024, spearheaded by Abrar Majeedi 😁 🎉. Do check out this work, which he plans to present in Milan.
One-line description of the project: How do you assess the performance of a diver (or the actions of a surgeon) and score it? How certain can we be about the model's predicted score, and can we have step-wise feedback to trust the model's predictions?
Fun coincidence: One of the datasets we evaluated is related to Olympics Diving and the timing of this paper with 2024 Paris Olympics felt nice 😎
#MachineLearning #AI #ComputerVision #SportsResearch #ECCV
Ph.D. Candidate in Deep Learning | University of Wisconsin-Madison
🚀 I am excited to announce that our paper, "𝗥𝗜𝗖𝗔𝟮: 𝗥𝘂𝗯𝗿𝗶𝗰-𝗜𝗻𝗳𝗼𝗿𝗺𝗲𝗱, 𝗖𝗮𝗹𝗶𝗯𝗿𝗮𝘁𝗲𝗱 𝗔𝘀𝘀𝗲𝘀𝘀𝗺𝗲𝗻𝘁 𝗼𝗳 𝗔𝗰𝘁𝗶𝗼𝗻𝘀," has been accepted at ECCV 2024!
⭐ RICA2 incorporates human-designed scoring rubrics to emulate the human scoring process of activities.
⭐ It also provides calibrated uncertainty estimates, indicating when model predictions can be trusted.
⭐ We demonstrate the effectiveness of RICA2 in automatically evaluating diverse activities such as Olympic diving and surgical procedures.
Thanks to the amazing team: Viswanatha Reddy, Satya Sai Srinath Namburi, and Yin Li.
You can read our paper on arXiv and visit our project page for more details.
📄 Paper: https://lnkd.in/gkw4Q-yS
🔗 Project page: https://lnkd.in/gWMHY56E
⌨ Code: https://lnkd.in/gvVFk-7u
See you in Milan!
#ECCV2024 #ComputerVision #AI #MachineLearning #Research
Segment objects into parts, unsupervised, using knowledge inside Stable Diffusion with EmerDiff
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
arXiv paper abstract https://lnkd.in/edyG_uSN
arXiv PDF paper https://lnkd.in/eE6gQCaR
Project page https://lnkd.in/eGhhYif5
Diffusion models have recently received increasing research attention for their remarkable transfer abilities in semantic segmentation tasks.
... generating fine-grained segmentation masks with diffusion models ... requires ... training on annotated datasets ... unclear to what extent pre-trained diffusion models ... understand the semantic relations
... leverage the semantic knowledge extracted from Stable Diffusion (SD) and aim to develop an image segmentor capable of generating fine-grained segmentation maps without any additional training.
... difficulty stems from ... semantically meaningful feature maps typically exist only in the spatially lower-dimensional layers, which poses a challenge in directly extracting pixel-level semantic relations
... framework identifies semantic correspondences between image pixels and spatial locations of low-dimensional feature maps by exploiting SD's generation process and utilizes them for constructing image-resolution segmentation maps.
... segmentation maps are demonstrated to be well delineated and capture detailed parts of the images, indicating the existence of highly accurate pixel-level semantic knowledge in diffusion models.
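The core trick — mapping each full-resolution pixel to its best-matching low-resolution feature location — can be sketched as a toy nearest-neighbor assignment in feature space. This is illustrative only; EmerDiff's actual correspondences are built by exploiting SD's generation process.

```python
import numpy as np

def upsample_labels(pixel_feats, lowres_feats, lowres_labels):
    """Give each image pixel the segment label of the low-resolution
    feature location it is most similar to (cosine similarity)."""
    p = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    f = lowres_feats / np.linalg.norm(lowres_feats, axis=1, keepdims=True)
    nearest = (p @ f.T).argmax(axis=1)  # best low-res match per pixel
    return lowres_labels[nearest]

# Toy example: 4 pixels, 2 low-res locations labeled 0 and 1
lowres_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
lowres_labels = np.array([0, 1])
pixels = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
seg = upsample_labels(pixels, lowres_feats, lowres_labels)
print(seg)  # [0 0 1 1]
```

The semantic grouping lives in the low-dimensional features; the pixel-level map is recovered by propagating those labels up to image resolution.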
Please like and share this post if you enjoyed it using the buttons at the bottom!
Stay up to date. Subscribe to my posts https://lnkd.in/emCkRuA
Web site with my other posts by category https://lnkd.in/enY7VpM
LinkedIn https://lnkd.in/ehrfPYQ6
#ComputerVision #Segmentation #AINewsClips #AI #ML #ArtificialIntelligence #MachineLearning
Ponder .. AI is awesome ..
#RubiksCube #AI
“Artificial intelligence (AI) can solve a Rubik's cube in a fraction of a second with 100% accuracy.
In fact, AI can solve a Rubik's cube in about 20 moves, which is usually the minimum number of steps possible, while a highly skilled human can solve it in about 50 moves.
AI can also explain how it solved the cube, which can help people understand how algorithms solve simple problems.”
#algorithms #problemSolving
Llama 3 can now support 1M context length and is accessible on HuggingFace!
Yesterday, Gradient released an updated version of the foundational Llama 3 8B model, expanding its context length from 8k to over 1.04M tokens.
Here are some big ideas on how this was done:
⛳ Long context training data was generated by augmenting the SlimPajama dataset
⛳ The team used foundational ideas from Rotary Position Embeddings (RoPE), particularly the YaRN (Yet another RoPE extensioN) method which uses NTK-aware interpolation to initialize an optimal schedule for RoPE theta, followed by empirical RoPE theta optimization
⛳ This method requires significantly fewer tokens and training steps than previous techniques, and it enables LLMs like Llama to effectively utilize and extrapolate to much longer context lengths than their original training allowed.
⛳ They also use ideas from Large World Model (LWM) which uses Ring-Attention & a progressive training approach. This involves incrementally training the model on increasingly longer context lengths, thus gradually expanding its capacity to understand and process extensive textual information.
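The RoPE-theta intuition above can be sketched numerically: each RoPE dimension pair rotates with a wavelength determined by theta, so enlarging theta stretches those wavelengths and keeps positional phases in-distribution at longer contexts. The enlarged theta value below is illustrative, not Gradient's actual schedule.

```python
import numpy as np

def rope_wavelengths(dim, theta):
    """Per-pair RoPE rotation wavelengths (in token positions);
    a larger base theta yields longer wavelengths in the slow dimensions."""
    inv_freq = theta ** (-np.arange(0, dim, 2) / dim)
    return 2 * np.pi / inv_freq

base = rope_wavelengths(128, 10_000.0)      # a common default base theta
scaled = rope_wavelengths(128, 4_000_000.0) # hypothetical enlarged theta
# The slowest-rotating dimension now spans far more positions,
# so million-token offsets still fall within familiar phase ranges.
print(base.max(), scaled.max())
```

Methods like YaRN then refine this by interpolating frequencies non-uniformly (NTK-aware) rather than scaling every dimension the same way.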
The brief report and model are available here, please check it out: https://lnkd.in/g38R7Ajp
🚨 I post #genai content daily, follow along for the latest updates!
#llms #llama3 #contextlength
MoE-Mamba: efficient selective state space models with Mixture of Experts 🐍
🔥 SSMs gain popularity, potentially rivaling Transformers. Integrating MoE might boost SSM efficiency further.
📈 MoE-Mamba matches Mamba's results in 2.2x fewer training steps while maintaining Mamba's fast inference, efficiently blending performance with speed. MoE-Mamba outperforms both Mamba and Transformer-MoE.
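As a hedged illustration of the Mixture-of-Experts idea (not MoE-Mamba's actual implementation), the toy NumPy sketch below routes each token to its top-scoring expert via a softmax gate, so only a fraction of parameters is active per token:

```python
import numpy as np

def moe_layer(x, gate_w, experts, top_k=1):
    """Minimal MoE forward pass: a softmax gate scores experts per token,
    and each token's output mixes only its top-k experts."""
    logits = x @ gate_w  # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[::-1][:top_k]
        w = probs[t, top] / probs[t, top].sum()  # renormalize over chosen experts
        for weight, e in zip(w, top):
            out[t] += weight * experts[e](x[t])
    return out

# Toy example: 2 experts that transform their input differently
experts = [lambda v: 2.0 * v, lambda v: -1.0 * v]
x = np.array([[1.0, 0.0], [0.0, 1.0]])
gate_w = np.array([[5.0, 0.0], [0.0, 5.0]])  # token 0 -> expert 0, token 1 -> expert 1
y = moe_layer(x, gate_w, experts, top_k=1)
print(y)  # token 0 doubled, token 1 negated
```

The efficiency win comes from this sparsity: capacity grows with the number of experts while per-token compute stays roughly constant, which is the property MoE-Mamba combines with the SSM backbone.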
——————————————————
Want to stay at the forefront of Generative AI developments? Follow NLPlanet for daily insights into the most relevant news, guides, and research! 🚀