Hugging Face

About us

The AI community building the future.

Website
https://huggingface.co
Industry
Software Development
Company size
51-200 employees
Type
Privately Held
Founded
2016
Specialties
machine learning, natural language processing, and deep learning


Updates

  • Hugging Face reposted this

    In the Large Language Model world, there are mostly two variants of the same model:

    * base model: completes a prompt (llama-3.2-1B)
    * instruction tuned model: completes an instruction (llama-3.2-1B-Instruct)

    BASE MODEL
    prompt: i am going to
    completion: i am going to school, to learn new things.

    INSTRUCTION TUNED (INSTRUCT) MODEL
    prompt: who are you?
    completion: who are you? I am an instruction tuned model. What do you need help with?

    As you might have noticed, both models complete a given prompt. The instruction tuned model is the base model that was later fine-tuned on an instructional dataset, which makes it a really good conversational model.

    Now that we know about the two kinds of models, let's dive into how to format the prompts for each. With the base model it is really simple: you write a piece of text and tokenize it. That is it! For chat completion, the instruction tuned models need special tokens incorporated into the prompt:

    ```
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>

    <|eot_id|><|start_header_id|>user<|end_header_id|>

    Who are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
    ```

    To add to the chaos, the special tokens for different models might be (and mostly are) different. It would be very tedious to consult the model card and hand-build the correct prompt for every instruct model. To solve this problem, Matthew Carrigan from the Hugging Face team introduced the "chat template". It is a Jinja template (one per model) that takes care of formatting the prompt correctly. Each (instruct) model has a tokenizer, and each tokenizer has a chat template. All you have to do is compose a list of messages (a conversation) and apply the chat template to it. To make the example more concrete, let's apply two distinct chat templates.

    For a more detailed understanding I advise reading Matthew Carrigan's blog post on chat templates, titled "Chat Templates: An End to the Silent Performance Killer" (link in comments).

    Happy chatting!
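    As a rough sketch of what "apply the chat template" looks like in code (assuming the transformers AutoTokenizer API; meta-llama/Llama-3.2-1B-Instruct is used as an example checkpoint and is gated on the Hub):

    ```python
    from transformers import AutoTokenizer

    # The tokenizer of an instruct model ships with that model's chat template.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

    # A conversation is just a list of {"role": ..., "content": ...} dicts.
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]

    # apply_chat_template renders the Jinja template into the exact prompt string
    # the model expects, including all of its special tokens.
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,              # return the formatted string instead of token ids
        add_generation_prompt=True,  # append the header that cues the assistant's reply
    )
    print(prompt)
    ```

    Swapping in another instruct model's tokenizer produces that model's own prompt format from the same list of messages.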

  • Hugging Face reposted this

    View profile for Aymeric Roucher

    Machine Learning Engineer @ Hugging Face 🤗 | Polytechnique - Cambridge

    📜 Old-school RNNs can actually rival fancy transformers!

    Remember good old RNNs (Recurrent Neural Networks)? Well, researchers from Mila - Quebec Artificial Intelligence Institute and Borealis AI have just shown that simplified versions of decade-old RNNs can match the performance of today's transformers.

    They took a fresh look at LSTMs (from 1997!) and GRUs (from 2014) and stripped these models down to their bare essentials, creating "minLSTM" and "minGRU". The key changes:
    ❶ Removed dependencies on previous hidden states in the gates
    ❷ Dropped the tanh that had been added to restrict output range in order to avoid vanishing gradients
    ❸ Ensured outputs are time-independent in scale (not sure I understood that well either, don't worry)

    ⚡️ As a result, you can use a "parallel scan" algorithm to train these new, minimal RNNs in parallel, taking 88% more memory but also making them 200x faster than their traditional counterparts for long sequences 🔥

    The results are mind-blowing! Performance-wise, they go toe-to-toe with Transformers or Mamba. And for language modeling, they need 2.5x fewer training steps than Transformers to reach the same performance! 🚀

    🤔 Why does this matter? By showing there are simpler models with similar performance to transformers, this challenges the narrative that we need advanced architectures for better performance!

    💬 François Chollet wrote in a tweet about this paper: "The fact that there are many recent architectures coming from different directions that roughly match Transformers is proof that architectures aren't fundamentally important in the curve-fitting paradigm (aka deep learning)." And: "Curve-fitting is about embedding a dataset on a curve. The critical factor is the dataset, not the specific hard-coded bells and whistles that constrain the curve's shape."

    It's the Bitter Lesson by Richard Sutton striking again: don't try fancy thinking architectures, just scale up your model and data!

    Read the paper 👉 https://lnkd.in/eQiV_8nZ
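    To make the simplification concrete, here is a minimal, sequential-mode sketch of the minGRU recurrence described above, written in PyTorch (layer names are my own; the paper trains the same recurrence with a parallel scan rather than this Python loop):

    ```python
    import torch
    import torch.nn as nn

    class MinGRU(nn.Module):
        """Sequential sketch of minGRU: gate and candidate state depend only on
        the current input x_t, never on the previous hidden state."""

        def __init__(self, input_dim: int, hidden_dim: int):
            super().__init__()
            self.to_z = nn.Linear(input_dim, hidden_dim)  # update gate from x_t only
            self.to_h = nn.Linear(input_dim, hidden_dim)  # candidate state from x_t only (no tanh)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, input_dim)
            batch, seq_len, _ = x.shape
            h = torch.zeros(batch, self.to_h.out_features, device=x.device)
            outputs = []
            for t in range(seq_len):
                z = torch.sigmoid(self.to_z(x[:, t]))  # update gate
                h_tilde = self.to_h(x[:, t])           # candidate hidden state
                h = (1 - z) * h + z * h_tilde          # convex mix of old and new state
                outputs.append(h)
            return torch.stack(outputs, dim=1)         # (batch, seq_len, hidden_dim)

    # Usage: run a random batch through the cell
    rnn = MinGRU(input_dim=16, hidden_dim=32)
    print(rnn(torch.randn(2, 10, 16)).shape)  # torch.Size([2, 10, 32])
    ```

    Because the gate and candidate state never look at the previous hidden state, the whole sequence of updates can be computed with a parallel scan, which is where the speed-up for long sequences comes from.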

  • Hugging Face reposted this

    View profile for Lysandre Debut

    Head of Open Source at Hugging Face

    Transformers v4.45 was just released, and it introduces a change I would not have expected: modularity in modeling files.

    Transformers has always been strict about its single-file policy: a model must be defined in a single file rather than through layers of abstraction. So, what changed, and why are we seemingly moving away from the concept that made transformers what it is today, with 250+ model architectures across many modalities?

    We are responding to an issue that affects both contributors and maintainers: contributing a model to transformers is long and tedious. It often results in PRs spanning 20+ files, with thousands of lines of code. We wanted a solution that removes that constraint from contributors, significantly easing model additions from model authors and community members.

    Still, the single-file policy is at the core of Transformers. Controversial to some due to the constraints it brings with it, we know for a fact that it enabled:
    - Researchers to experiment with and tweak the modeling files,
    - Students to go through the code without jumping from abstraction to abstraction,
    - Community members to contribute models without first needing to understand the rest of the overwhelmingly large package.

    Therefore, we've worked on "Modular Transformers," an approach to designing modeling files in a modular way while maintaining the single-file policy. Contributing a model to Transformers can now be done by subclassing other models, inheriting all their attributes, methods, and forward definitions. The tool we contribute unravels that inheritance into a single file.

    The RoBERTa "Modular" modeling file above defines the base and masked LM models. This is then unraveled into a 1700+ line single-file model definition, which can be inspected, debugged, tweaked, and adapted. The modular definition spans ~30 lines of code: only the differences are now explicit. This is particularly important in the wake of LLMs, with each released model being only slightly different in terms of architecture; most of the difference lies in the data for the pretrained checkpoints.

    While the "Modular" and "Single-file" model definitions serve different purposes, they should both result in the exact same code execution. We aim for no magic, no hidden behavior: define a code path, a property, or a method in the modular file, and you'll see it reflected in the single file.

    With this now merged, we are starting to see model contributions coming in at 215 LoC for the modular file; unraveled across the generated files, the single-file definition stands at 1300+ LoC.

    Now, please come and help us break it! It's experimental and brittle, but it should drastically lower the barrier of entry for model contribution. Come and contribute your model to make it accessible to the community at large!
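    To give a flavor of the idea, here is a hedged sketch (not the exact file shipped in the library) of a modular RoBERTa definition written by subclassing BERT's classes, so that only the differences are spelled out:

    ```python
    # modular_roberta.py -- illustrative sketch of a "modular" modeling file.
    from torch import nn
    from transformers.models.bert.configuration_bert import BertConfig
    from transformers.models.bert.modeling_bert import BertEmbeddings, BertModel


    class RobertaConfig(BertConfig):
        model_type = "roberta"


    class RobertaEmbeddings(BertEmbeddings):
        # The main difference with BERT: position embeddings use a padding index.
        def __init__(self, config):
            super().__init__(config)
            self.padding_idx = config.pad_token_id
            self.position_embeddings = nn.Embedding(
                config.max_position_embeddings, config.hidden_size, padding_idx=self.padding_idx
            )


    class RobertaModel(BertModel):
        # Attention, layers, and forward are all inherited from BertModel; the
        # conversion tool "unravels" this inheritance into a standalone file.
        def __init__(self, config):
            super().__init__(config)
            self.embeddings = RobertaEmbeddings(config)


    # Quick check that the subclassed model instantiates with a tiny config.
    config = RobertaConfig(hidden_size=64, num_hidden_layers=2, num_attention_heads=2,
                           intermediate_size=128, pad_token_id=1)
    model = RobertaModel(config)
    ```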

  • Hugging Face reposted this

    View organization page for Gradio

    🎤 Voice-Restore is now LIVE on Hugging Face! 🚀 The cutting-edge model can fix background noise, reverberations, distortions, and signal loss.

    📣 VoiceRestore uses Flow-Matching Transformers for speech recording quality restoration.
    🔊 The audio restoration app is built with Gradio 5 (we are still in beta! 😎): https://lnkd.in/g9NZpK2e
    💻 Super easy to use: built on 🤗 Transformers by Jade Choghari, integrated seamlessly with Gradio for a smooth experience!
    🔧 Build the Gradio app locally: https://lnkd.in/grbSusMV

    Kudos to the author, Stanislav Kirdey, for the release!

    With Gradio 5, Python is the language for you if you want to build highly performant apps with a slick UI. Extremely simple to start using the beta release: `pip install gradio==5.0b5`

    Docs for Gradio 5 Beta: https://lnkd.in/ghJ97rRn
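    For a sense of how small such an app can be, here is a minimal Gradio sketch of an audio-in/audio-out restoration demo; restore_audio is a hypothetical stand-in for the actual VoiceRestore inference call:

    ```python
    import gradio as gr

    def restore_audio(audio_path: str) -> str:
        # Hypothetical placeholder: the real app would run the VoiceRestore model
        # on the uploaded file and return the path of the cleaned-up audio.
        return audio_path

    demo = gr.Interface(
        fn=restore_audio,
        inputs=gr.Audio(type="filepath", label="Noisy recording"),
        outputs=gr.Audio(type="filepath", label="Restored recording"),
        title="VoiceRestore demo (sketch)",
    )

    if __name__ == "__main__":
        demo.launch()
    ```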

  • Hugging Face reposted this

    View organization page for Gradio

    Now you can take audio notes and transcribe them in real time with Whisper Turbo and Gradio 5! 🤩

    ✨ A completely open-source stack for building high-performing Python apps. Build them locally or host them publicly.

    Realtime Whisper-Large-v3 Turbo with a Gradio app on Hugging Face Spaces: https://lnkd.in/gh_tgd7W

    Kudos to Nishith Jain (@kingnish24 on X) for the brilliant Gradio app 👏
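    A simplified, non-streaming sketch of the same idea (record a clip, then transcribe it), assuming the transformers ASR pipeline and the openai/whisper-large-v3-turbo checkpoint; the actual Space streams microphone chunks and updates the transcript continuously:

    ```python
    import gradio as gr
    from transformers import pipeline

    # Load Whisper Turbo through the automatic-speech-recognition pipeline.
    asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v3-turbo")

    def transcribe(audio_path: str) -> str:
        # Transcribe the recorded clip and return the text.
        return asr(audio_path)["text"]

    demo = gr.Interface(
        fn=transcribe,
        inputs=gr.Audio(sources=["microphone"], type="filepath"),
        outputs=gr.Textbox(label="Transcript"),
    )

    if __name__ == "__main__":
        demo.launch()
    ```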

  • Hugging Face reposted this

    View organization page for Argilla

    How do you start your text classification project on the Hugging Face Hub? David Berenstein will guide you through the journey of creating a text classifier from scratch using open source tools. 🚀

    Agenda:
    - Deploy Argilla on Hugging Face Spaces
    - Configure and create an Argilla dataset
    - Use model predictions to accelerate labeling
    - Train a SetFit model

    👇🏾 Link to the event in the comments
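    As a taste of the last agenda item, here is a hedged sketch of few-shot training with SetFit (assuming setfit>=1.0; a slice of sst2 stands in for the labels you would export from Argilla after annotation):

    ```python
    from datasets import load_dataset
    from setfit import SetFitModel, Trainer

    # Small labeled dataset; SetFit is designed to work with very few examples.
    train_ds = load_dataset("sst2", split="train[:100]").rename_column("sentence", "text")

    # Fine-tune a sentence-transformers backbone plus a classification head.
    model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
    trainer = Trainer(model=model, train_dataset=train_ds)
    trainer.train()

    # Predict labels for new texts.
    print(model.predict(["I loved this movie!", "Terrible, would not recommend."]))
    ```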

  • Hugging Face reposted this

    View profile for Philipp Schmid

    Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

    Whisper Turbo is available on Hugging Face! 🚀 Model: https://lnkd.in/ez6D5Gkf Demo: https://lnkd.in/e4rRc6nD

    View profile for Philipp Schmid

    Technical Lead & LLMs at Hugging Face 🤗 | AWS ML HERO 🦸🏻♂️

    OpenAI has released new Whisper models! 👀 Yesterday, OpenAI updated their GitHub and added a new Whisper V3 Turbo model! The Turbo model is an optimized version of large-v3 that offers 8x faster transcription speed with minimal degradation in accuracy (no benchmarks yet), at roughly half the size. ⚡️

    Coming to Hugging Face soon… 🔜
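    A hedged sketch of what using it through the standard transformers ASR pipeline looks like, assuming the checkpoint lands on the Hub as openai/whisper-large-v3-turbo; "meeting_notes.wav" is a placeholder for any local audio file:

    ```python
    from transformers import pipeline

    # Whisper V3 Turbo via the automatic-speech-recognition pipeline,
    # with chunking enabled for long-form audio.
    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3-turbo",
        chunk_length_s=30,
    )

    # Transcribe a local file and print the text.
    result = asr("meeting_notes.wav", return_timestamps=True)
    print(result["text"])
    ```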



Funding

Hugging Face: 7 total rounds
Last round: Series D