CMU MLSP

Software Development

Pittsburgh, PA · 104 followers

Research lab at Carnegie Mellon University, led by Bhiksha Raj and Rita Singh, for speech and audio processing

About us

Research lab affiliated with the Language Technologies Institute (LTI) at Carnegie Mellon University, led by Bhiksha Raj and Rita Singh, focusing on speech, audio, and audiovisual processing

Industry: Software Development
Company size: 11-50 employees
Headquarters: Pittsburgh, PA
Type: Nonprofit

Updates

  • CMU MLSP

    Congratulations Dr. Ankit Shah!

    Ankit S.

    LLM Arch Assoc Director and Tech Lead @Accenture | Ph.D. CMU LTI | Deep Learning | Machine Learning | AI | Computer Science | AGI

    Very happy to announce that on October 11th I successfully passed my Ph.D. thesis defense on "Computational Audition using Imprecise Labels" at Carnegie Mellon University - School of Computer Science - Language Technologies Institute. I am very grateful to my advisors, Bhiksha Raj and Rita Singh, for their constant support and research guidance throughout the program; without them this milestone would not have been possible. I am also thankful to my thesis committee, Shinji Watanabe, Anurag Kumar, and Jonathan Le Roux, for their suggestions on the thesis document and presentation and for their research guidance along the way. To my parents, Parag S. and Ragini Shah, thank you for your incredible support, blessings, and constant motivation in every goal I have chosen. Special thanks and a shoutout to my collaborators and labmates for their support throughout the journey. To anyone reading this post: "Never give up on your dreams." I didn't get into the Ph.D. program the first time, but I did my best to capitalize on the next opportunity that was presented and continued the pursuit to contribute meaningfully to research.

  • CMU MLSP reposted this

    Xiang Li

    Final-year PhD at CMU | Multimodal Understanding and Generation | Ex-intern @ 2× Microsoft, TikTok, Adobe

    🌟 New image tokenizer for MLLMs: ImageFolder 🚀 ImageFolder generates images in just 10 steps ⚡⚡ with a much shorter token length 💡, lower cost 💡, and better quality 💡 than previous tokenizers. In summary, ImageFolder 🚀 achieves significant improvements over the SOTA VAR baseline (with a ~300M generator):
    📈 FID: 3.30 vs. 2.60 (ours)
    📈 Token length: 680 vs. 265 (ours)
    📈 Linear probing accuracy: 11.3 vs. 58.0 (ours)
    ⚡ Training and inference cost: 4 : 1 (ours)
    🔗 Project: https://lnkd.in/dkwciY8D
    📑 Paper: https://lnkd.in/dM698ZDk
    💻 Code: https://lnkd.in/dWrsbNFP
    👀 Our results demonstrate significant improvements, and we will be sharing further findings and releasing checkpoints soon. #LLM #Autoregressive #Generation #Tokenizer #Research

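    For readers unfamiliar with the headline metric above: FID (Fréchet Inception Distance) compares the feature statistics of generated and real images, and lower is better, so the reported drop from 3.30 (VAR) to 2.60 (ImageFolder) means the generated images move closer to the real-data distribution. A minimal sketch of the standard FID computation (background only, not code from the ImageFolder release) might look like:

    ```python
    import numpy as np
    from scipy.linalg import sqrtm

    def frechet_inception_distance(feats_real, feats_gen):
        """Standard FID between two sets of image features (lower is better)."""
        mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
        cov_r = np.cov(feats_real, rowvar=False)
        cov_g = np.cov(feats_gen, rowvar=False)
        covmean = sqrtm(cov_r @ cov_g)   # matrix square root of the covariance product
        if np.iscomplexobj(covmean):     # sqrtm can return tiny imaginary parts
            covmean = covmean.real
        return float(((mu_r - mu_g) ** 2).sum() + np.trace(cov_r + cov_g - 2.0 * covmean))
    ```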
  • CMU MLSP reposted this

    Era Parihar

    University of Michigan - MS Data Science | Machine Learning | LLMs

    🚀 Excited to share my very first Medium article! 🎉 I’ve just published an article titled "Understanding Encoder-Only Architectures in Transformers: Key Pre-Training Tasks Explained". In this piece, I dive into the intricacies of encoder-only transformer architectures, breaking down the key pre-training tasks that power many of today’s advanced models. If you’re curious about the technical underpinnings of transformers or looking to deepen your understanding of how these models work, this article is for you! This piece stems from my research at the CMU MLSP lab. Check it out and let me know what you think. Feedback and discussions are always welcome! 😊 #AI #MachineLearning #NLP #Transformers #ArtificialIntelligence #Medium

    Understanding Encoder-Only Architectures in Transformers: Key Pre-Training Tasks Explained

    link.medium.com
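
    As background for the article above: the canonical pre-training task for encoder-only transformers such as BERT is masked language modeling, in which a fraction of the input tokens is hidden and the encoder is trained to recover the originals from bidirectional context. A minimal illustrative sketch of the masking step (an assumption about the article's content, not code taken from it):

    ```python
    import random

    MASK_TOKEN = "[MASK]"

    def mask_tokens(tokens, mask_prob=0.15, seed=0):
        """BERT-style masking: hide roughly mask_prob of the tokens; the encoder
        is then trained to predict the original token at each masked position."""
        rng = random.Random(seed)
        n_mask = max(1, round(mask_prob * len(tokens)))
        masked_positions = set(rng.sample(range(len(tokens)), n_mask))
        inputs = [MASK_TOKEN if i in masked_positions else tok for i, tok in enumerate(tokens)]
        labels = [tok if i in masked_positions else None for i, tok in enumerate(tokens)]
        return inputs, labels

    inputs, labels = mask_tokens("the cat sat on the mat".split())
    print(inputs)   # e.g. ['the', 'cat', '[MASK]', 'on', 'the', 'mat']
    print(labels)   # e.g. [None, None, 'sat', None, None, None]
    ```

    Real pipelines add refinements (for example, sometimes keeping or randomly replacing a selected token instead of masking it), but the prediction objective is the same.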

  • CMU MLSP

    In this paper, we investigate the hypothesis that a spoken utterance carries more information than its written transcript, and that this difference can lead to different takeaways, which would manifest as differences in the summaries derived from the two. Although all current evaluation methodologies are text-based, and are therefore naturally biased towards transcript-based summaries, we nonetheless find that summaries derived directly from the spoken utterance do indeed provide different, and in some instances superior, interpretations than those derived from the text transcription.

    Roshan S. Sharma

    Research Scientist at Google | Ph.D. from Carnegie Mellon University | Speech and Language processing

    Excited to share our paper accepted at the #ACL2024 Main conference, "Speech vs. Transcript: Does It Matter for Human Annotators in Speech Summarization?", with Suwon Shon, Mark Lindsey, Hira Dhamyal, Rita Singh, and Bhiksha Raj. In this paper, we take the first steps towards understanding how humans annotate for speech summarization by isolating the different contributing factors. Humans can summarize speech either by (a) listening to the speech directly and constructing the summary from what they heard, or (b) reading a textual transcript of what was spoken in the recording and constructing the summary from that. It is unclear whether the summaries that result from these two approaches are similar or different.

    We examine differences in the resulting human summaries arising from: (a) the source modality (speech or transcript), (b) errors in the source modality (manual transcript or automatic transcript with errors), and (c) the expertise of the human annotators. We collect data from multiple human annotators and devise a range of metrics to holistically assess differences between summaries across multiple dimensions. Our data collection and analysis reveal interesting differences between speech-based and transcript-based summaries: transcript-based annotation is valuable when transcription errors are minimal and longer, more informative summaries are desired, while speech-based annotation is preferable for higher information selectivity, factual consistency, and resilience to transcription errors. Please read our paper and attend our virtual poster session for further discussion.
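
    The paper devises its own suite of metrics to compare the annotated summaries; purely as an illustration of the kind of comparison involved (these are hypothetical summaries and a generic overlap measure, not the paper's metrics or data), one could contrast lexical overlap and length between a speech-based and a transcript-based summary of the same recording:

    ```python
    def word_overlap(summary_a, summary_b):
        """Jaccard overlap of word types between two summaries (a crude similarity proxy)."""
        a, b = set(summary_a.lower().split()), set(summary_b.lower().split())
        return len(a & b) / len(a | b) if (a | b) else 0.0

    # Hypothetical summaries of the same recording, for illustration only.
    speech_summary = "speaker urges quick action on funding"
    transcript_summary = "the speaker urges the committee to act quickly on the funding request"

    print("overlap:", round(word_overlap(speech_summary, transcript_summary), 2))          # 0.33
    print("length ratio:", len(speech_summary.split()) / len(transcript_summary.split()))  # 0.5
    ```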

  • CMU MLSP

    Congratulations to our esteemed Prof. Bhiksha Raj on this remarkable achievement! #isca #ai #achievement #speech

    We are very excited to introduce the 2024 #ISCA Fellows! The ISCA Fellow program recognises ISCA members who have shown outstanding scientific and/or technical contributions and/or continued significant service to ISCA.

    Takayuki Arai, Dept. of Information and Communication Sciences, Sophia University, Tokyo, Japan ⭐ For contributions to developing models of speech production and applying them to speech science, technology, pathology and education

    Kay Berkling (Prof'in), DHBW Mosbach, Baden-Wuerttemberg Cooperative State University, Germany ⭐ For her contributions to education, improving childhood learning and boosting diversity through interactive technology

    Carlos Busso, University of Texas at Dallas, US ⭐ For contributions to speech and multimodal affective signal processing and their technology applications

    Sebastian Moeller, Technische Universität Berlin, and German Research Center for Artificial Intelligence, Berlin, Germany ⭐ For sustained contributions to the evaluation of speech transmission, processing, and speech technology

    Bhiksha Raj, Carnegie Mellon University, Pittsburgh, PA, US ⭐ For contributions to robust speech recognition and other areas in speech and audio processing

    Torbjørn Svendsen, NTNU Norwegian University of Science and Technology, Norway ⭐ For contributions to speech processing research and long-standing contributions to the speech community

    Petra Wagner, Universität Bielefeld, Germany ⭐ For contributions to multimodal prosody, conversational interaction, and speech synthesis

    Congratulations to all!!

  • CMU MLSP reposted this

    Muqiao Yang

    Research Scientist @ Google NY | PhD @ Carnegie Mellon University

    Delighted to share that I have just successfully defended my thesis at CMU 🎓! Heartfelt gratitude to my advisors and committee members Bhiksha Raj, Shinji Watanabe, Rita Singh, and Anurag Kumar, and to everyone who supported and loved me during this journey. I will join Google as a Research Scientist next week in New York City, working on speech-related research ♊️. Looking forward to the new chapter ahead!

