Takuya Yoshioka
2020 – today
- 2024
- [j21] Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. IEEE ACM Trans. Audio Speech Lang. Process. 32: 3355-3364 (2024)
- [c122] Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota:
Look Once to Hear: Target Speech Hearing with Noisy Examples. CHI 2024: 37:1-37:16
- [c121] Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. ICASSP 2024: 10866-10870
- [c120] Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-Talker ASR with Text-Only Domain Adaptation Capability. ICASSP 2024: 11531-11535
- [c119] Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. ICASSP 2024: 11906-11910
- [c118] Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Xuemei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. NAACL-HLT (Findings) 2024: 1615-1627
- [i66] Francis McCann Ramirez, Luka Chkhetiani, Andrew Ehrenberg, Robert McHardy, Rami Botros, Yash Khare, Andrea Vanzo, Taufiquzzaman Peyash, Gabriel Oexle, Michael Liang, Ilya Sklyar, Enver Fakhan, Ahmed Etefy, Daniel McCrystal, Sam Flamini, Domenic Donato, Takuya Yoshioka:
Anatomy of Industrial Scale Multilingual ASR. CoRR abs/2404.09841 (2024)
- [i65] Bandhav Veluri, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota:
Look Once to Hear: Target Speech Hearing with Noisy Examples. CoRR abs/2405.06289 (2024)
- [i64] Vidya Srinivas, Malek Itani, Tuochao Chen, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota:
Knowledge boosting during low-latency inference. CoRR abs/2407.11055 (2024)
- [i63] Tuochao Chen, Qirui Wang, Bohan Wu, Malek Itani, Sefik Emre Eskimez, Takuya Yoshioka, Shyamnath Gollakota:
Target conversation extraction: Source separation using turn-taking dynamics. CoRR abs/2407.11277 (2024)
- 2023
- [c117] Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. AAAI 2023: 10880-10890
- [c116] Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech Separation with Large-Scale Self-Supervised Learning. ICASSP 2023: 1-5
- [c115] Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-Supervised Learning with Bi-Label Masked Speech Prediction for Streaming Multi-Talker Speech Recognition. ICASSP 2023: 1-5
- [c114] Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. ICASSP 2023: 1-5
- [c113] Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-Modality Clues. ICASSP 2023: 1-5
- [c112] Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka:
Breaking the Trade-Off in Personalized Speech Enhancement With Cross-Task Knowledge Distillation. ICASSP 2023: 1-5
- [c111] Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota:
Real-Time Target Sound Extraction. ICASSP 2023: 1-5
- [c110] Heming Wang, Yao Qian, Hemin Yang, Naoyuki Kanda, Peidong Wang, Takuya Yoshioka, Xiaofei Wang, Yiming Wang, Shujie Liu, Zhuo Chen, DeLiang Wang, Michael Zeng:
DATA2VEC-SG: Improving Self-Supervised Learning Representations for Speech Generation Tasks. ICASSP 2023: 1-5
- [c109] Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. ICASSP 2023: 1-5
- [c108] Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating Realistic Speech Overlaps Improves Multi-Talker ASR. ICASSP 2023: 1-5
- [c107] Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. INTERSPEECH 2023: 236-240
- [c106] Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Pärnamaa, Huaming Wang:
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation. INTERSPEECH 2023: 1050-1054
- [c105] Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. INTERSPEECH 2023: 1314-1318
- [c104] Midia Yousefi, Naoyuki Kanda, Dongmei Wang, Zhuo Chen, Xiaofei Wang, Takuya Yoshioka:
Speaker Diarization for ASR Output with T-vectors: A Sequence Classification Approach. INTERSPEECH 2023: 3502-3506
- [c103] Takuya Yoshioka, Keita Sasada, Yuichiro Nakano, Keisuke Fujii:
Experimental Demonstration of Fermionic QAOA with One-Dimensional Cyclic Driver Hamiltonian. QCE 2023: 300-306
- [c102] Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota:
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables. UIST 2023: 89:1-89:15
- [i62] Naoyuki Kanda, Takuya Yoshioka, Yang Liu:
Factual Consistency Oriented Speech Recognition. CoRR abs/2302.12369 (2023)
- [i61] Chenda Li, Yao Qian, Zhuo Chen, Dongmei Wang, Takuya Yoshioka, Shujie Liu, Yanmin Qian, Michael Zeng:
Target Sound Extraction with Variable Cross-modality Clues. CoRR abs/2303.08372 (2023)
- [i60] Ziyi Yang, Mahmoud Khademi, Yichong Xu, Reid Pryzant, Yuwei Fang, Chenguang Zhu, Dongdong Chen, Yao Qian, Mei Gao, Yi-Ling Chen, Robert Gmyr, Naoyuki Kanda, Noel Codella, Bin Xiao, Yu Shi, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data. CoRR abs/2305.12311 (2023)
- [i59] Yuwei Fang, Mahmoud Khademi, Chenguang Zhu, Ziyi Yang, Reid Pryzant, Yichong Xu, Yao Qian, Takuya Yoshioka, Lu Yuan, Michael Zeng, Xuedong Huang:
i-Code Studio: A Configurable and Composable Framework for Integrative AI. CoRR abs/2305.13738 (2023)
- [i58] Chenda Li, Yao Qian, Zhuo Chen, Naoyuki Kanda, Dongmei Wang, Takuya Yoshioka, Yanmin Qian, Michael Zeng:
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers. CoRR abs/2305.18747 (2023)
- [i57] Xiaofei Wang, Manthan Thakker, Zhuo Chen, Naoyuki Kanda, Sefik Emre Eskimez, Sanyuan Chen, Min Tang, Shujie Liu, Jinyu Li, Takuya Yoshioka:
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer. CoRR abs/2308.06873 (2023)
- [i56] Mu Yang, Naoyuki Kanda, Xiaofei Wang, Junkun Chen, Peidong Wang, Jian Xue, Jinyu Li, Takuya Yoshioka:
DiariST: Streaming Speech Translation with Speaker Diarization. CoRR abs/2309.08007 (2023)
- [i55] Jian Wu, Naoyuki Kanda, Takuya Yoshioka, Rui Zhao, Zhuo Chen, Jinyu Li:
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability. CoRR abs/2309.08131 (2023)
- [i54] Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Midia Yousefi, Takuya Yoshioka, Jian Wu:
Profile-Error-Tolerant Target-Speaker Voice Activity Detection. CoRR abs/2309.12521 (2023)
- [i53] Bandhav Veluri, Malek Itani, Justin Chan, Takuya Yoshioka, Shyamnath Gollakota:
Semantic Hearing: Programming Acoustic Scenes with Binaural Hearables. CoRR abs/2311.00320 (2023)
- 2022
- [j20] Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. IEEE J. Sel. Top. Signal Process. 16(6): 1505-1518 (2022)
- [c101] Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang:
One Model to Enhance Them All: Array Geometry Agnostic Multi-Channel Personalized Speech Enhancement. ICASSP 2022: 271-275
- [c100] Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized Speech Enhancement: New Models and Comprehensive Evaluation. ICASSP 2022: 356-360
- [c99] Takuya Yoshioka, Xiaofei Wang, Dongmei Wang:
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays. ICASSP 2022: 921-925
- [c98] Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li:
Continuous Speech Separation with Recurrent Selective Attention Network. ICASSP 2022: 6017-6021
- [c97] Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. ICASSP 2022: 6027-6031
- [c96] Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-Neural Beamformer for Continuous Speech Separation. ICASSP 2022: 6032-6036
- [c95] Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang:
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. ICASSP 2022: 6062-6066
- [c94] Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers Using End-to-End Speaker-Attributed ASR. ICASSP 2022: 8082-8086
- [c93] Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner:
ICASSP 2022 Deep Noise Suppression Challenge. ICASSP 2022: 9271-9275
- [c92] Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. INTERSPEECH 2022: 521-525
- [c91] Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang:
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation. INTERSPEECH 2022: 991-995
- [c90] Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. INTERSPEECH 2022: 3774-3778
- [c89] Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. INTERSPEECH 2022: 3814-3818
- [c88] Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-wise Permutation Invariant Training. INTERSPEECH 2022: 5383-5387
- [c87] Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. SLT 2022: 451-457
- [i52] Takuya Yoshioka, Xiaofei Wang, Dongmei Wang:
PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays. CoRR abs/2201.09586 (2022)
- [i51] Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Multi-Talker ASR with Token-Level Serialized Output Training. CoRR abs/2202.00842 (2022)
- [i50] Harishchandra Dubey, Vishak Gopal, Ross Cutler, Ashkan Aazami, Sergiy Matusevych, Sebastian Braun, Sefik Emre Eskimez, Manthan Thakker, Takuya Yoshioka, Hannes Gamper, Robert Aichner:
ICASSP 2022 Deep Noise Suppression Challenge. CoRR abs/2202.13288 (2022)
- [i49] Naoyuki Kanda, Jian Wu, Yu Wu, Xiong Xiao, Zhong Meng, Xiaofei Wang, Yashesh Gaur, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings. CoRR abs/2203.16685 (2022)
- [i48] Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang:
Fast Real-time Personalized Speech Enhancement: End-to-End Enhancement Network (E3Net) and Knowledge Distillation. CoRR abs/2204.00771 (2022)
- [i47] Xiaofei Wang, Dongmei Wang, Naoyuki Kanda, Sefik Emre Eskimez, Takuya Yoshioka:
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation. CoRR abs/2204.03232 (2022)
- [i46] Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu:
Ultra Fast Speech Separation Model with Teacher Student Learning. CoRR abs/2204.12777 (2022)
- [i45] Ziyi Yang, Yuwei Fang, Chenguang Zhu, Reid Pryzant, Dongdong Chen, Yu Shi, Yichong Xu, Yao Qian, Mei Gao, Yi-Ling Chen, Liyang Lu, Yujia Xie, Robert Gmyr, Noel Codella, Naoyuki Kanda, Bin Xiao, Lu Yuan, Takuya Yoshioka, Michael Zeng, Xuedong Huang:
i-Code: An Integrative and Composable Multimodal Learning Framework. CoRR abs/2205.01818 (2022)
- [i44] Dongmei Wang, Xiong Xiao, Naoyuki Kanda, Takuya Yoshioka, Jian Wu:
Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization. CoRR abs/2208.13085 (2022)
- [i43] Naoyuki Kanda, Jian Wu, Xiaofei Wang, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition. CoRR abs/2209.04974 (2022)
- [i42] Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka:
Simulating realistic speech overlaps improves multi-talker ASR. CoRR abs/2210.15715 (2022)
- [i41] Bandhav Veluri, Justin Chan, Malek Itani, Tuochao Chen, Takuya Yoshioka, Shyamnath Gollakota:
Real-Time Target Sound Extraction. CoRR abs/2211.02250 (2022)
- [i40] Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Pärnamaa, Huaming Wang:
Real-Time Joint Personalized Speech Enhancement and Acoustic Echo Cancellation with E3Net. CoRR abs/2211.02773 (2022)
- [i39] Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka:
Breaking the trade-off in personalized speech enhancement with cross-task knowledge distillation. CoRR abs/2211.02944 (2022)
- [i38] Zhuo Chen, Naoyuki Kanda, Jian Wu, Yu Wu, Xiaofei Wang, Takuya Yoshioka, Jinyu Li, Sunit Sivasankaran, Sefik Emre Eskimez:
Speech separation with large-scale self-supervised learning. CoRR abs/2211.05172 (2022)
- [i37] Zili Huang, Zhuo Chen, Naoyuki Kanda, Jian Wu, Yiming Wang, Jinyu Li, Takuya Yoshioka, Xiaofei Wang, Peidong Wang:
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition. CoRR abs/2211.05564 (2022)
- [i36] Xiaofei Wang, Zhuo Chen, Yu Shi, Jian Wu, Naoyuki Kanda, Takuya Yoshioka:
Breaking trade-offs in speech separation with sparsely-gated mixture of experts. CoRR abs/2211.06493 (2022)
- [i35] Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu:
Exploring WavLM on Speech Enhancement. CoRR abs/2211.09988 (2022)
- 2021
- [c86] Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. ASRU 2021: 296-303
- [c85] Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. EUSIPCO 2021: 1100-1104
- [c84] Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Jinyu Li, Takuya Yoshioka, Chengyi Wang, Shujie Liu, Ming Zhou:
Continuous Speech Separation with Conformer. ICASSP 2021: 5749-5753
- [c83] Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. ICASSP 2021: 5824-5828
- [c82] Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu:
Don't Shoot Butterfly with Rifles: Multi-Channel Continuous Speech Separation with Early Exit Transformer. ICASSP 2021: 6139-6143
- [c81] Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. ICASSP 2021: 6503-6507
- [c80] Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-Attributed ASR on Long-Form Multi-Talker Recordings. ICASSP 2021: 6763-6767
- [c79] Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka:
Human Listening and Live Captioning: Multi-Task Training for Speech Enhancement. Interspeech 2021: 2686-2690
- [c78] Sanyuan Chen, Yu Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu:
Ultra Fast Speech Separation Model with Teacher Student Learning. Interspeech 2021: 3026-3030
- [c77] Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. Interspeech 2021: 3066-3070
- [c76] Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. Interspeech 2021: 3430-3434
- [c75] Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. Interspeech 2021: 4413-4417
- [c74] Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-to-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. SLT 2021: 809-816
- [c73] Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-Channel ASR with Bias Information for Meeting Transcription. SLT 2021: 833-840
- [c72] Chenda Li, Yi Luo, Cong Han, Jinyu Li, Takuya Yoshioka, Tianyan Zhou, Marc Delcroix, Keisuke Kinoshita, Christoph Böddeker, Yanmin Qian, Shinji Watanabe, Zhuo Chen:
Dual-Path RNN for Long Recording Speech Separation. SLT 2021: 865-872
- [c71] Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Maokui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis. SLT 2021: 897-904
- [i34] Xuankai Chang, Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings. CoRR abs/2101.01853 (2021)
- [i33] Dongmei Wang, Takuya Yoshioka, Zhuo Chen, Xiaofei Wang, Tianyan Zhou, Zhong Meng:
Continuous Speech Separation with Ad Hoc Microphone Arrays. CoRR abs/2103.02378 (2021)
- [i32] Naoyuki Kanda, Guoli Ye, Yu Wu, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone. CoRR abs/2103.16776 (2021)
- [i31] Naoyuki Kanda, Guoli Ye, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
End-to-End Speaker-Attributed ASR with Transformer. CoRR abs/2104.02128 (2021)
- [i30] Jian Wu, Zhuo Chen, Sanyuan Chen, Yu Wu, Takuya Yoshioka, Naoyuki Kanda, Shujie Liu, Jinyu Li:
Investigation of Practical Aspects of Single Channel Speech Separation for ASR. CoRR abs/2107.01922 (2021)
- [i29] Naoyuki Kanda, Xiong Xiao, Jian Wu, Tianyan Zhou, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio. CoRR abs/2107.02852 (2021)
- [i28] Naoyuki Kanda, Xiong Xiao, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR. CoRR abs/2110.03151 (2021)
- [i27] Takuya Yoshioka, Xiaofei Wang, Dongmei Wang, Min Tang, Zirun Zhu, Zhuo Chen, Naoyuki Kanda:
VarArray: Array-Geometry-Agnostic Continuous Speech Separation. CoRR abs/2110.05745 (2021)
- [i26] Zhuohuang Zhang, Takuya Yoshioka, Naoyuki Kanda, Zhuo Chen, Xiaofei Wang, Dongmei Wang, Sefik Emre Eskimez:
All-neural beamformer for continuous speech separation. CoRR abs/2110.06428 (2021)
- [i25] Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang:
Personalized Speech Enhancement: New Models and Comprehensive Evaluation. CoRR abs/2110.09625 (2021)
- [i24] Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang:
One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement. CoRR abs/2110.10330 (2021)
- [i23] Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Jian Wu, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Furu Wei:
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing. CoRR abs/2110.13900 (2021)
- [i22] Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei:
Separating Long-Form Speech with Group-Wise Permutation Invariant Training. CoRR abs/2110.14142 (2021)
- [i21] Yixuan Zhang, Zhuo Chen, Jian Wu, Takuya Yoshioka, Peidong Wang, Zhong Meng, Jinyu Li:
Continuous Speech Separation with Recurrent Selective Attention Network. CoRR abs/2110.14838 (2021)
- [i20] Heming Wang, Yao Qian, Xiaofei Wang, Yiming Wang, Chengyi Wang, Shujie Liu, Takuya Yoshioka, Jinyu Li, DeLiang Wang:
Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction. CoRR abs/2110.15430 (2021)
- 2020
- [c70] Yi Luo, Zhuo Chen, Takuya Yoshioka:
Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation. ICASSP 2020: 46-50
- [c69] Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka:
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation. ICASSP 2020: 6394-6398
- [c68] Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Xiong Xiao, Jinyu Li:
Continuous Speech Separation: Dataset and Analysis. ICASSP 2020: 7284-7288
- [c67] Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. INTERSPEECH 2020: 36-40
- [c66] Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-End Architecture of Online Multi-Channel Speech Separation. INTERSPEECH 2020: 81-85
- [c65] Dongmei Wang, Zhuo Chen, Takuya Yoshioka:
Neural Speech Separation Using Spatially Distributed Microphones. INTERSPEECH 2020: 339-343
- [c64] Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. INTERSPEECH 2020: 2797-2801
- [i19] Zhuo Chen, Takuya Yoshioka, Liang Lu, Tianyan Zhou, Zhong Meng, Yi Luo, Jian Wu, Jinyu Li:
Continuous speech separation: dataset and analysis. CoRR abs/2001.11482 (2020)
- [i18] Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Takuya Yoshioka:
Serialized Output Training for End-to-End Overlapped Speech Recognition. CoRR abs/2003.12687 (2020)
- [i17] Dongmei Wang, Zhuo Chen, Takuya Yoshioka:
Neural Speech Separation Using Spatially Distributed Microphones. CoRR abs/2004.13670 (2020)
- [i16] Naoyuki Kanda, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka:
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers. CoRR abs/2006.10930 (2020)
- [i15] Naoyuki Kanda, Xuankai Chang, Yashesh Gaur, Xiaofei Wang, Zhong Meng, Zhuo Chen, Takuya Yoshioka:
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings. CoRR abs/2008.04546 (2020)
- [i14] Jian Wu, Zhuo Chen, Jinyu Li, Takuya Yoshioka, Zhili Tan, Ed Lin, Yi Luo, Lei Xie:
An End-to-end Architecture of Online Multi-channel Speech Separation. CoRR abs/2009.03141 (2020)
- [i13] Xiong Xiao, Naoyuki Kanda, Zhuo Chen, Tianyan Zhou, Takuya Yoshioka, Sanyuan Chen, Yong Zhao, Gang Liu, Yu Wu, Jian Wu, Shujie Liu, Jinyu Li, Yifan Gong:
Microsoft Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2020. CoRR abs/2010.11458 (2020)
- [i12] Sanyuan Chen, Yu Wu, Zhuo Chen, Takuya Yoshioka, Shujie Liu, Jinyu Li:
Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer. CoRR abs/2010.12180 (2020)
- [i11] Desh Raj, Pavel Denisov, Zhuo Chen, Hakan Erdogan, Zili Huang, Mao-Kui He, Shinji Watanabe, Jun Du, Takuya Yoshioka, Yi Luo, Naoyuki Kanda, Jinyu Li, Scott Wisdom, John R. Hershey:
Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis. CoRR abs/2011.02014 (2020)
- [i10] Naoyuki Kanda, Zhong Meng, Liang Lu, Yashesh Gaur, Xiaofei Wang, Zhuo Chen, Takuya Yoshioka:
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR. CoRR abs/2011.02921 (2020)
- [i9] Xiaofei Wang, Naoyuki Kanda, Yashesh Gaur, Zhuo Chen, Zhong Meng, Takuya Yoshioka:
Exploring End-to-End Multi-channel ASR with Bias Information for Meeting Transcription. CoRR abs/2011.03110 (2020)
2010 – 2019
- 2019
- [c63] Peidong Wang, Zhuo Chen, Xiong Xiao, Zhong Meng, Takuya Yoshioka, Tianyan Zhou, Liang Lu, Jinyu Li:
Speech Separation Using Speaker Inventory. ASRU 2019: 230-236
- [c62] Takuya Yoshioka, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Igor Abramovski, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang:
Advances in Online Audio-Visual Meeting Transcription. ASRU 2019: 276-283
- [c61] Andreas Stolcke, Takuya Yoshioka:
DOVER: A Method for Combining Diarization Outputs. ASRU 2019: 757-763
- [c60] Xiong Xiao, Zhuo Chen, Takuya Yoshioka, Hakan Erdogan, Changliang Liu, Dimitrios Dimitriadis, Jasha Droppo, Yifan Gong:
Single-channel Speech Extraction Using Speaker Inventory and Attention Network. ICASSP 2019: 86-90
- [c59] Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis:
Low-latency Speaker-independent Continuous Speech Separation. ICASSP 2019: 6980-6984
- [c58] Takuya Yoshioka, Dimitrios Dimitriadis, Andreas Stolcke, William Hinthorn, Zhuo Chen, Michael Zeng, Xuedong Huang:
Meeting Transcription Using Asynchronous Distant Microphones. INTERSPEECH 2019: 2968-2972
- [i8] Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis:
Low-Latency Speaker-Independent Continuous Speech Separation. CoRR abs/1904.06478 (2019)
- [i7] Takuya Yoshioka, Zhuo Chen, Dimitrios Dimitriadis, William Hinthorn, Xuedong Huang, Andreas Stolcke, Michael Zeng:
Meeting Transcription Using Virtual Microphone Arrays. CoRR abs/1905.02545 (2019)
- [i6] Andreas Stolcke, Takuya Yoshioka:
DOVER: A Method for Combining Diarization Outputs. CoRR abs/1909.08090 (2019)
- [i5] Yi Luo, Zhuo Chen, Takuya Yoshioka:
Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation. CoRR abs/1910.06379 (2019)
- [i4] Yi Luo, Zhuo Chen, Nima Mesgarani, Takuya Yoshioka:
End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation. CoRR abs/1910.14104 (2019)
- [i3] Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou:
Advances in Online Audio-Visual Meeting Transcription. CoRR abs/1912.04979 (2019)
- 2018
- [c57] Zhuo Chen, Takuya Yoshioka, Xiong Xiao, Jinyu Li, Michael L. Seltzer, Yifan Gong:
Efficient Integration of Fixed Beamformers and Speech Separation Networks for Multi-Channel Far-Field Speech Separation. ICASSP 2018: 5384-5388
- [c56] Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Fil Alleva:
Multi-Microphone Neural Speech Separation for Far-Field Multi-Talker Speech Recognition. ICASSP 2018: 5739-5743
- [c55] Christoph Böddeker, Hakan Erdogan, Takuya Yoshioka, Reinhold Haeb-Umbach:
Exploring Practical Aspects of Neural Mask-Based Beamforming for Far-Field Speech Recognition. ICASSP 2018: 6697-6701
- [c54] Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva:
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks. INTERSPEECH 2018: 3038-3042
- [c53] Hakan Erdogan, Takuya Yoshioka:
Investigations on Data Augmentation and Loss Functions for Deep Learning Based Speech-Background Separation. INTERSPEECH 2018: 3499-3503
- [c52] Zhuo Chen, Xiong Xiao, Takuya Yoshioka, Hakan Erdogan, Jinyu Li, Yifan Gong:
Multi-Channel Overlapped Speech Recognition with Location Guided Speech Extraction Network. SLT 2018: 558-565
- [i2] Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong:
Cracking the cocktail party problem by multi-beam deep attractor network. CoRR abs/1803.10924 (2018)
- [i1] Takuya Yoshioka, Hakan Erdogan, Zhuo Chen, Xiong Xiao, Fil Alleva:
Recognizing Overlapped Speech in Meetings: A Multichannel Separation Approach Using Neural Networks. CoRR abs/1810.03655 (2018)
- 2017
- [j19] Takuya Higuchi, Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Tomohiro Nakatani:
Online MVDR Beamformer Based on Complex Gaussian Mixture Model With Spatial Prior for Noise Robust ASR. IEEE ACM Trans. Audio Speech Lang. Process. 25(4): 780-793 (2017)
- [c51] Zhuo Chen, Jinyu Li, Xiong Xiao, Takuya Yoshioka, Huaming Wang, Zhenghao Wang, Yifan Gong:
Cracking the cocktail party problem by multi-beam deep attractor network. ASRU 2017: 437-444
- [c50] Shoko Araki, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Takuya Higuchi, Takuya Yoshioka, Dung T. Tran, Shigeki Karita, Tomohiro Nakatani:
Online meeting recognition in noisy environments with time-frequency mask based MVDR beamforming. HSCMA 2017: 16-20
- [c49] Takuya Higuchi, Takuya Yoshioka, Keisuke Kinoshita, Tomohiro Nakatani:
Unsupervised utterance-wise beamformer estimation with speech recognition-level criterion. ICASSP 2017: 5170-5174
- [p3] Marc Delcroix, Takuya Yoshioka, Nobutaka Ito, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani:
Multichannel Speech Enhancement Approaches to DNN-Based Far-Field Speech Recognition. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 21-49
- [p2] Keisuke Kinoshita, Marc Delcroix, Sharon Gannot, Emanuël A. P. Habets, Reinhold Haeb-Umbach, Walter Kellermann, Volker Leutnant, Roland Maas, Tomohiro Nakatani, Bhiksha Raj, Armin Sehr, Takuya Yoshioka:
The REVERB Challenge: A Benchmark Task for Reverberation-Robust ASR Techniques. New Era for Robust Speech Recognition, Exploiting Deep Learning 2017: 345-354
- 2016
- [j18] Keisuke Kinoshita, Marc Delcroix, Sharon Gannot, Emanuël A. P. Habets, Reinhold Haeb-Umbach, Walter Kellermann, Volker Leutnant, Roland Maas, Tomohiro Nakatani, Bhiksha Raj, Armin Sehr, Takuya Yoshioka:
A summary of the REVERB challenge: state-of-the-art and remaining challenges in reverberant speech processing research. EURASIP J. Adv. Signal Process. 2016: 7 (2016)
- [c48] Takuya Higuchi, Nobutaka Ito, Takuya Yoshioka, Tomohiro Nakatani:
Robust MVDR beamforming using time-frequency masks for online/offline ASR in noise. ICASSP 2016: 5210-5214
- [c47] Marc Delcroix, Keisuke Kinoshita, Chengzhu Yu, Atsunori Ogawa, Takuya Yoshioka, Tomohiro Nakatani:
Context adaptive deep neural networks for fast acoustic model adaptation in noisy conditions. ICASSP 2016: 5270-5274
- [c46] Takuya Yoshioka, Katsunori Ohnishi, Fuming Fang, Tomohiro Nakatani:
Noise robust speech recognition using recent developments in neural networks for computer vision. ICASSP 2016: 5730-5734
- [c45] Marc Delcroix, Keisuke Kinoshita, Atsunori Ogawa, Takuya Yoshioka, Dung T. Tran, Tomohiro Nakatani:
Context Adaptive Neural Network for Rapid Adaptation of Deep CNN Based Acoustic Models. INTERSPEECH 2016: 1573-1577
- [c44] Atsunori Ogawa, Shogo Seki, Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Kazuya Takeda:
Robust Example Search Using Bottleneck Features for Example-Based Speech Enhancement. INTERSPEECH 2016: 3733-3737
- [c43] Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani:
Optimization of Speech Enhancement Front-End with Speech Recognition-Level Criterion. INTERSPEECH 2016: 3808-3812
- [c42] Takuya Higuchi, Takuya Yoshioka, Tomohiro Nakatani:
Sparseness-based multichannel nonnegative matrix factorization for blind source separation. IWAENC 2016: 1-5
- 2015
- [j17]Takuya Yoshioka, Mark J. F. Gales:
Environmentally robust ASR front-end for deep neural network acoustic models. Comput. Speech Lang. 31(1): 65-86 (2015) - [j16]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Strategies for distant speech recognition in reverberant environments. EURASIP J. Adv. Signal Process. 2015: 60 (2015) - [c41]Takuya Yoshioka, Nobutaka Ito, Marc Delcroix, Atsunori Ogawa, Keisuke Kinoshita, Masakiyo Fujimoto, Chengzhu Yu, Wojciech J. Fabian, Miquel Espi, Takuya Higuchi, Shoko Araki, Tomohiro Nakatani:
The NTT CHiME-3 system: Advances in speech enhancement and recognition for mobile multi-microphone devices. ASRU 2015: 436-443 - [c40]Takuya Yoshioka, Shigeki Karita, Tomohiro Nakatani:
Far-field speech recognition using CNN-DNN-HMM with convolution in time. ICASSP 2015: 4360-4364 - [c39]Chengzhu Yu, Atsunori Ogawa, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, John H. L. Hansen:
Robust i-vector extraction for neural network adaptation in noisy environment. INTERSPEECH 2015: 2854-2857 - 2014
- [j15]Takuma Otsuka, Katsuhiko Ishiguro, Takuya Yoshioka, Hiroshi Sawada, Hiroshi G. Okuno:
Multichannel sound source dereverberation and separation for arbitrary number of sources based on Bayesian nonparametrics. IEEE ACM Trans. Audio Speech Lang. Process. 22(12): 2218-2232 (2014) - [c38]Marc Delcroix, Takuya Yoshioka, Atsunori Ogawa, Yotaro Kubo, Masakiyo Fujimoto, Nobutaka Ito, Keisuke Kinoshita, Miquel Espi, Shoko Araki, Takaaki Hori, Tomohiro Nakatani:
Defeating reverberation: Advanced dereverberation and recognition techniques for hands-free speech recognition. GlobalSIP 2014: 522-526 - [c37]Takuya Yoshioka, Xie Chen, Mark J. F. Gales:
Impact of single-microphone dereverberation on DNN-based meeting transcription systems. ICASSP 2014: 5527-5531 - [c36]Takuya Yoshioka, Anton Ragni, Mark J. F. Gales:
Investigation of unsupervised adaptation of DNN acoustic models with filter bank input. ICASSP 2014: 6344-6348 - [c35]Nobutaka Ito, Shoko Araki, Takuya Yoshioka, Tomohiro Nakatani:
Relaxed disjointness based clustering for joint blind source separation and dereverberation. IWAENC 2014: 268-272 - 2013
- [j14]Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Shoko Araki, Atsunori Ogawa, Takaaki Hori, Shinji Watanabe, Masakiyo Fujimoto, Takuya Yoshioka, Takanobu Oba, Yotaro Kubo, Mehrez Souden, Seong-Jun Hahm, Atsushi Nakamura:
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds. Comput. Speech Lang. 27(3): 851-873 (2013) - [j13]Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, Keikichi Hirose:
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting. IEEE Trans. Speech Audio Process. 21(10): 2172-2181 (2013) - [j12]Takuya Yoshioka, Tomohiro Nakatani:
Noise Model Transfer: Novel Approach to Robustness Against Nonstationary Noise. IEEE Trans. Speech Audio Process. 21(10): 2182-2192 (2013) - [j11]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Marc Delcroix, Masakiyo Fujimoto:
Dominance Based Integration of Spatial and Spectral Features for Speech Enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 21(12): 2516-2531 (2013) - [c34]Takuya Yoshioka, Tomohiro Nakatani:
Dereverberation for reverberation-robust microphone arrays. EUSIPCO 2013: 1-5 - [c33]Takuya Yoshioka, Tomohiro Nakatani:
Noise model transfer using affine transformation with application to large vocabulary reverberant speech recognition. ICASSP 2013: 7058-7062 - [c32]Tomohiro Nakatani, Mehrez Souden, Shoko Araki, Takuya Yoshioka, Takaaki Hori, Atsunori Ogawa:
Coupling beamforming with spatial and spectral feature based spectral enhancement and its application to meeting recognition. ICASSP 2013: 7249-7253 - [c31]Roland Maas, Walter Kellermann, Armin Sehr, Takuya Yoshioka, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani:
Formulation of the REMOS concept from an uncertainty decoding perspective. DSP 2013: 1-6 - [c30]Armin Sehr, Takuya Yoshioka, Marc Delcroix, Keisuke Kinoshita, Tomohiro Nakatani, Roland Maas, Walter Kellermann:
Conditional emission densities for combining speech enhancement and recognition systems. INTERSPEECH 2013: 3502-3506 - [c29]Keisuke Kinoshita, Marc Delcroix, Takuya Yoshioka, Tomohiro Nakatani, Armin Sehr, Walter Kellermann, Roland Maas:
The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech. WASPAA 2013: 1-4 - 2012
- [j10]Mehrez Souden, Marc Delcroix, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani:
Noise Power Spectral Density Tracking: A Maximum Likelihood Perspective. IEEE Signal Process. Lett. 19(8): 495-498 (2012) - [j9]Takuya Yoshioka, Armin Sehr, Marc Delcroix, Keisuke Kinoshita, Roland Maas, Tomohiro Nakatani, Walter Kellermann:
Making Machines Understand Us in Reverberant Rooms: Robustness Against Reverberation for Automatic Speech Recognition. IEEE Signal Process. Mag. 29(6): 114-126 (2012) - [j8]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera. IEEE Trans. Speech Audio Process. 20(2): 499-513 (2012) - [j7]Takuya Yoshioka, Tomohiro Nakatani:
Generalization of Multi-Channel Linear Prediction Methods for Blind MIMO Impulse Response Shortening. IEEE Trans. Speech Audio Process. 20(10): 2707-2720 (2012) - [c28]Takuya Yoshioka, Armin Sehr, Marc Delcroix, Keisuke Kinoshita, Roland Maas, Tomohiro Nakatani, Walter Kellermann:
Survey on approaches to speech recognition in reverberant environments. APSIPA 2012: 1-4 - [c27]Tomohiro Nakatani, Takuya Yoshioka, Shoko Araki, Marc Delcroix, Masakiyo Fujimoto:
LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise. ICASSP 2012: 4029-4032 - [c26]Masayuki Suzuki, Takuya Yoshioka, Shinji Watanabe, Nobuaki Minematsu, Keikichi Hirose:
MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments. ICASSP 2012: 4109-4112 - [c25]Takuya Yoshioka, Emmanuel Ternon, Tomohiro Nakatani:
Time-varying residual noise feature model estimation for multi-microphone speech recognition. ICASSP 2012: 4913-4916 - [c24]Takuya Yoshioka, Daichi Sakaue:
Log-normal matrix factorization with application to speech-music separation. SAPA@INTERSPEECH 2012: 80-85 - 2011
- [j6]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi, Hiroshi G. Okuno:
Blind Separation and Dereverberation of Speech Mixtures by Joint Optimization. IEEE Trans. Speech Audio Process. 19(1): 69-84 (2011) - [c23]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Joint unsupervised learning of hidden Markov source models and source location models for multichannel source separation. ICASSP 2011: 237-240 - [c22]Naoki Yasuraoka, Hirokazu Kameoka, Takuya Yoshioka, Hiroshi G. Okuno:
I-Divergence-based dereverberation method with auxiliary function approach. ICASSP 2011: 369-372 - [c21]Takuya Yoshioka, Tomohiro Nakatani:
Speech enhancement based on log spectral envelope model and harmonicity-derived spectral mask, and its coupling with feature compensation. ICASSP 2011: 5064-5067 - [c20]Tomohiro Nakatani, Shoko Araki, Marc Delcroix, Takuya Yoshioka, Masakiyo Fujimoto:
Reduction of Highly Nonstationary Ambient Noise by Integrating Spectral and Locational Characteristics of Speech and Noise for Robust ASR. INTERSPEECH 2011: 1785-1788 - 2010
- [j5]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Speech Dereverberation Based on Variance-Normalized Delayed Linear Prediction. IEEE Trans. Speech Audio Process. 18(7): 1717-1731 (2010) - [c19]Hirokazu Kameoka, Takuya Yoshioka, Mariko Hamamura, Jonathan Le Roux, Kunio Kashino:
Statistical Model of Speech Signals Based on Composite Autoregressive System with Application to Blind Source Separation. LVA/ICA 2010: 245-253 - [c18]Naoki Yasuraoka, Takuya Yoshioka, Tomohiro Nakatani, Atsushi Nakamura, Hiroshi G. Okuno:
Music dereverberation using harmonic structure source model and Wiener filter. ICASSP 2010: 53-56 - [c17]Takuya Yoshioka, Tomohiro Nakatani, Hiroshi G. Okuno:
Noisy speech enhancement based on prior knowledge about spectral envelope and harmonic structure. ICASSP 2010: 4270-4273 - [c16]Tomohiro Nakatani, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto:
Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model. INTERSPEECH 2010: 2766-2769 - [c15]Takaaki Hori, Shoko Araki, Takuya Yoshioka, Masakiyo Fujimoto, Shinji Watanabe, Takanobu Oba, Atsunori Ogawa, Kazuhiro Otsuka, Dan Mikami, Keisuke Kinoshita, Tomohiro Nakatani, Atsushi Nakamura, Junji Yamato:
Real-time meeting recognition and understanding using distant microphones and omni-directional camera. SLT 2010: 424-429 - [p1]Masato Miyoshi, Marc Delcroix, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani, Takafumi Hikichi:
Inverse Filtering for Speech Dereverberation Without the Use of Room Acoustics Information. Speech Dereverberation 2010: 271-310
2000 – 2009
- 2009
- [j4]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
Integrated Speech Enhancement Method Using Noise Suppression and Dereverberation. IEEE Trans. Speech Audio Process. 17(2): 231-246 (2009) - [c14]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
Fast algorithm for conditional separation and dereverberation. EUSIPCO 2009: 1432-1436 - [c13]Hirokazu Kameoka, Tomohiro Nakatani, Takuya Yoshioka:
Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms. ICASSP 2009: 45-48 - [c12]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Real-time speech enhancement in noisy reverberant multi-talker environments based on a location-independent room acoustics model. ICASSP 2009: 137-140 - [c11]Takuya Yoshioka, Hideyuki Tachibana, Tomohiro Nakatani, Masato Miyoshi:
Adaptive dereverberation of speech signals with speaker-position change detection. ICASSP 2009: 3733-3736 - [c10]Takuya Yoshioka, Hirokazu Kameoka, Tomohiro Nakatani, Hiroshi G. Okuno:
Statistical models for speech dereverberation. WASPAA 2009: 145-148 - 2008
- [j3]Tomohiro Nakatani, Biing-Hwang Juang, Takuya Yoshioka, Keisuke Kinoshita, Marc Delcroix, Masato Miyoshi:
Speech Dereverberation Based on Maximum-Likelihood Estimation With Time-Varying Gaussian Source Model. IEEE Trans. Speech Audio Process. 16(8): 1512-1527 (2008) - [c9]Masato Miyoshi, Keisuke Kinoshita, Takuya Yoshioka, Tomohiro Nakatani:
Principles and applications of dereverberation for noisy and reverberant audio signals. ACSCC 2008: 793-796 - [c8]Takuya Yoshioka, Tomohiro Nakatani, Masato Miyoshi:
An integrated method for blind separation and dereverberation of convolutive audio mixtures. EUSIPCO 2008: 1-5 - [c7]Tomohiro Nakatani, Takuya Yoshioka, Keisuke Kinoshita, Masato Miyoshi, Biing-Hwang Juang:
Blind speech dereverberation with multi-channel linear prediction based on short time fourier transform representation. ICASSP 2008: 85-88 - [c6]Takuya Yoshioka, Tomohiro Nakatani, Takafumi Hikichi, Masato Miyoshi:
Maximum likelihood approach to speech enhancement for noisy reverberant signals. ICASSP 2008: 4585-4588 - [c5]Takuya Yoshioka, Masato Miyoshi:
Adaptive suppression of non-stationary noise by using the variational Bayesian method. ICASSP 2008: 4889-4892 - 2007
- [j2]Takuya Yoshioka, Takafumi Hikichi, Masato Miyoshi:
Dereverberation by Using Time-Variant Nature of Speech Production System. EURASIP J. Adv. Signal Process. 2007 (2007) - [c4]Tomohiro Nakatani, Biing-Hwang Juang, Takafumi Hikichi, Takuya Yoshioka, Keisuke Kinoshita, Marc Delcroix, Masato Miyoshi:
Study on Speech Dereverberation with Autocorrelation Codebook. ICASSP (1) 2007: 193-196 - [c3]Tomohiro Nakatani, Takafumi Hikichi, Keisuke Kinoshita, Takuya Yoshioka, Marc Delcroix, Masato Miyoshi, Biing-Hwang Juang:
Robust blind dereverberation of speech signals based on characteristics of short-time speech segments. ISCAS 2007: 2986-2989 - 2006
- [j1]Takuya Yoshioka, Takafumi Hikichi, Masato Miyoshi, Hiroshi G. Okuno:
Common Acoustical Pole Estimation from Multi-Channel Musical Audio Signals. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 89-A(1): 240-247 (2006) - [c2]Takuya Yoshioka, Takafumi Hikichi, Masato Miyoshi, Hiroshi G. Okuno:
Robust decomposition of inverse filter of channel and prediction error filter of speech signal for dereverberation. EUSIPCO 2006: 1-5 - 2004
- [c1]Takuya Yoshioka, Tetsuro Kitahara, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno:
Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries. ISMIR 2004
last updated on 2024-10-07 22:14 CEST by the dblp team
all metadata released as open data under CC0 1.0 license