IEEE/ACM Transactions on Audio, Speech and Language Processing, Volume 29
Volume 29, 2021
- Bijue Jia, Jiancheng Lv, Xi Peng, Yao Chen, Shenglan Yang:
Hierarchical Regulated Iterative Network for Joint Task of Music Detection and Music Relative Loudness Estimation. 1-13
- Nauman Dawalatabad, Srikanth R. Madikeri, C. Chandra Sekhar, Hema A. Murthy:
Novel Architectures for Unsupervised Information Bottleneck Based Speaker Diarization of Meetings. 14-27
- Midia Yousefi, John H. L. Hansen:
Block-Based High Performance CNN Architectures for Frame-Level Overlapping Speech Detection. 28-40
- Jiaming Cheng, Ruiyu Liang, Zhenlin Liang, Li Zhao, Chengwei Huang, Björn W. Schuller:
A Deep Adaptation Network for Speech Enhancement: Combining a Relativistic Discriminator With Multi-Kernel Maximum Mean Discrepancy. 41-53
- Franz Anders, Mario Hlawitschka, Mirco Fuchs:
Comparison of Artificial Neural Network Types for Infant Vocalization Classification. 54-67
- Tomohiko Nakamura, Hirokazu Kameoka:
Harmonic-Temporal Factor Decomposition for Unsupervised Monaural Separation of Harmonic Sounds. 68-82
- Jens Ahrens, Stefan Bilbao:
Computation of Spherical Harmonic Representations of Source Directivity Based on the Finite-Distance Signature. 83-92
- Shun-Po Chuang, Alexander H. Liu, Tzu-Wei Sung, Hung-yi Lee:
Improving Automatic Speech Recognition and Speech Translation via Word Embedding Prediction. 93-105
- Li Chai, Jun Du, Qing-Feng Liu, Chin-Hui Lee:
A Cross-Entropy-Guided Measure (CEGM) for Assessing Speech Recognition Performance and Optimizing DNN-Based Speech Enhancement. 106-117
- De Hu, Zhe Chen, Fuliang Yin:
Passive Geometry Calibration for Microphone Arrays Based on Distributed Damped Newton Optimization. 118-131
- Berrak Sisman, Junichi Yamagishi, Simon King, Haizhou Li:
An Overview of Voice Conversion and Its Challenges: From Statistical Modeling to Deep Learning. 132-157
- Jilu Jin, Gongping Huang, Xuehan Wang, Jingdong Chen, Jacob Benesty, Israel Cohen:
Steering Study of Linear Differential Microphone Arrays. 158-170
- Ching Hua Lee, Bhaskar D. Rao, Harinath Garudadri:
Proportionate Adaptive Filtering Algorithms Derived Using an Iterative Reweighting Framework. 171-186
- Shakeel Ahmed, Muhammad Tufail, Muhammad Rehan, Tanveer Abbas, Amna Majid:
A Novel Approach for Improved Noise Reduction Performance in Feed-Forward Active Noise Control Systems With (Loudspeaker) Saturation Non-Linearity in the Secondary Path. 187-197
- Cunhang Fan, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen:
Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition. 198-209
- Amin Edraki, Wai-Yip Chan, Jesper Jensen, Daniel Fogerty:
Speech Intelligibility Prediction Using Spectro-Temporal Modulation Analysis. 210-225
- Phan Le Son:
On the Design of Sparse Arrays With Frequency-Invariant Beam Pattern. 226-238
- Dylan Menzies, Philip Coleman, Filippo Maria Fazi:
A Room Compensation Method by Modification of Reverberant Audio Objects. 239-252
- Yonggang Hu, Thushara D. Abhayapala, Prasanga N. Samarasinghe:
Multiple Source Direction of Arrival Estimations Using Relative Sound Pressure Based MUSIC. 253-264
- Alan Kan, Qinglin Meng:
The Temporal Limits Encoder as a Sound Coding Strategy for Bilateral Cochlear Implants. 265-273
- Rui Liu, Berrak Sisman, Feilong Bao, Jichen Yang, Guanglai Gao, Haizhou Li:
Exploiting Morphological and Phonological Features to Improve Prosodic Phrasing for Mongolian Speech Synthesis. 274-285
- Fei Ma, Thushara D. Abhayapala, Wen Zhang:
Multiple Circular Arrays of Vector Sensors for Real-Time Sound Field Analysis. 286-299
- David Diaz-Guerra, Antonio Miguel, José Ramón Beltrán:
Robust Sound Source Tracking Using SRP-PHAT and 3D Convolutional Neural Networks. 300-311
- Viet Anh Trinh, Michael I. Mandel:
Directly Comparing the Listening Strategies of Humans and Machines. 312-323
- Leda Sari, Mark Hasegawa-Johnson, Samuel Thomas:
Auxiliary Networks for Joint Speaker Adaptation and Speaker Change Detection. 324-333
- Jielong Yang, Xionghu Zhong, Weiguang Chen, Wenwu Wang:
Multiple Acoustic Source Localization in Microphone Array Networks. 334-347
- Bin Wu, Sakriani Sakti, Jinsong Zhang, Satoshi Nakamura:
Tackling Perception Bias in Unsupervised Phoneme Discovery Using DPGMM-RNN Hybrid Model and Functional Load. 348-362
- Taewoong Lee, Liming Shi, Jesper Kjær Nielsen, Mads Græsbøll Christensen:
Fast Generation of Sound Zones Using Variable Span Trade-Off Filters in the DFT-Domain. 363-378
- Maoshen Jia, Yuxuan Wu, Changchun Bao, Christian H. Ritz:
Multi-Source DOA Estimation in Reverberant Environments by Jointing Detection and Modeling of Time-Frequency Points. 379-392
- Wei Xue, Alastair H. Moore, Mike Brookes, Patrick A. Naylor:
Speech Enhancement Based on Modulation-Domain Parametric Multichannel Kalman Filtering. 393-405
- Wei Song, Jingjin Guo, Ruiji Fu, Ting Liu, Lizhen Liu:
A Knowledge Graph Embedding Approach for Metaphor Processing. 406-420
- Longbiao Cheng, Xingwei Sun, Dingding Yao, Junfeng Li, Yonghong Yan:
Estimation Reliability Function Assisted Sound Source Localization With Enhanced Steering Vector Phase Difference. 421-435
- Wangyang Yu, W. Bastiaan Kleijn:
Room Acoustical Parameter Estimation From Room Impulse Responses Using Deep Neural Networks. 436-447
- Miguel Ferrer, Maria de Diego, Gema Piñero, Alberto González:
Affine Projection Algorithm Over Acoustic Sensor Networks for Active Noise Control. 448-461
- Nico Gößling, Daniel Marquardt, Simon Doclo:
Performance Analysis of the Extended Binaural MVDR Beamformer With Partial Noise Estimation. 462-476
- Gábor Gosztolya, Róbert Busa-Fekete:
Ensemble Bag-of-Audio-Words Representation Improves Paralinguistic Classification Accuracy. 477-488
- Alfred Mertins, Marco Maaß, Fabrice Katzberg:
Room Impulse Response Reshaping and Crosstalk Cancellation Using Convex Optimization. 489-502
- Xuefeng Bai, Pengbo Liu, Yue Zhang:
Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network. 503-514
- Bengt J. Borgström, Michael S. Brandstein:
Speech Enhancement via Attention Masking Network (SEAMNET): An End-to-End System for Joint Suppression of Noise and Reverberation. 515-526
- Juan Manuel Miramont, Marcelo Alejandro Colominas, Gastón Schlotthauer:
Voice Jitter Estimation Using High-Order Synchrosqueezing Operators. 527-536
- Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong:
Speaker Separation Using Speaker Inventories and Estimated Speech. 537-546
- Sandro Cumani:
On the Distribution of Speaker Verification Scores: Generative Models for Unsupervised Calibration. 547-562
- Yu-Ren Chien, Jón Guðnason:
Acoustic Measure of Vocal Strain Based on Glottal Airflow Periodicity. 563-574
- Xingfa Shen, Xingkun Shao, Quanbo Ge, Lili Liu:
RARS: Recognition of Audio Recording Source Based on Residual Neural Network. 575-584
- Gang Chen, Yang Liu, Huanbo Luan, Meng Zhang, Qun Liu, Maosong Sun:
Learning to Generate Explainable Plots for Neural Story Generation. 585-593
- Wenxing Yang, Jacob Benesty, Gongping Huang, Jingdong Chen:
A New Class of Differential Beamformers. 594-606
- Yuki Mitsufuji, Norihiro Takamune, Shoichi Koyama, Hiroshi Saruwatari:
Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain. 607-617
- Dörte Fischer, Simon Doclo:
Robust Constrained MFMVDR Filters for Single-Channel Speech Enhancement Based on Spherical Uncertainty Set. 618-631
- Xudong Zhao, Jacob Benesty, Jingdong Chen, Gongping Huang:
Differential Beamforming From the Beampattern Factorization Perspective. 632-643
- Yuki Kawara, Chenhui Chu, Yuki Arase:
Preordering Encoding on Transformer for Translation. 644-655
- Hirokazu Kameoka, Wen-Chin Huang, Kou Tanaka, Takuhiro Kaneko, Nobukatsu Hojo, Tomoki Toda:
Many-to-Many Voice Transformer Network. 656-670
- Jie Zhang, Huawei Chen, Li-Rong Dai, Richard Christian Hendriks:
A Study on Reference Microphone Selection for Multi-Microphone Speech Enhancement. 671-683
- Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen:
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019. 684-698
- Markus Niermann, Peter Vary:
Listening Enhancement in Noisy Environments: Solutions in Time and Frequency Domain. 699-709
- Hyeonseung Lee, Woo Hyun Kang, Sung Jun Cheon, Hyeongju Kim, Nam Soo Kim:
Gated Recurrent Context: Softmax-Free Attention for Online Encoder-Decoder Speech Recognition. 710-719
- Elizabeth Vargas, James R. Hopgood, Keith E. Brown, Kartic Subr:
On Improved Training of CNN for Acoustic Source Localisation. 720-732
- Yunqi Cai, Lantian Li, Andrew Abel, Xiaoyan Zhu, Dong Wang:
Deep Normalization for Speaker Vectors. 733-744
- Wen-Chin Huang, Tomoki Hayashi, Yi-Chiao Wu, Hirokazu Kameoka, Tomoki Toda:
Pretraining Techniques for Sequence-to-Sequence Voice Conversion. 745-755
- Arindam Jati, Amrutha Nadarajan, Raghuveer Peri, Karel Mundnich, Tiantian Feng, Benjamin Girault, Shrikanth Narayanan:
Temporal Dynamics of Workplace Acoustic Scenes: Egocentric Analysis and Prediction. 756-769
- Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao:
Modeling Future Cost for Neural Machine Translation. 770-781
- Kashif Munir, Hai Zhao, Zuchao Li:
Adaptive Convolution for Semantic Role Labeling. 782-791
- Yi-Chiao Wu, Tomoki Hayashi, Takuma Okamoto, Hisashi Kawai, Tomoki Toda:
Quasi-Periodic Parallel WaveGAN: A Non-Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. 792-806
- Weitao Yuan, Bofei Dong, Shengbei Wang, Masashi Unoki, Wenwu Wang:
Evolving Multi-Resolution Pooling CNN for Monaural Singing Voice Separation. 807-822
- Liming Shi, Taewoong Lee, Lijun Zhang, Jesper Kjær Nielsen, Mads Græsbøll Christensen:
Generation of Personal Sound Zones With Physical Meaningful Constraints and Conjugate Gradient Method. 823-837
- Xi Chen, Jacob Benesty, Gongping Huang, Jingdong Chen:
On the Robustness of the Superdirective Beamformer. 838-849
- Xinsheng Wang, Tingting Qiao, Jihua Zhu, Alan Hanjalic, Odette Scharenborg:
Generating Images From Spoken Descriptions. 850-865
- Vevake Balaraman, Bernardo Magnini:
Domain-Aware Dialogue State Tracker for Multi-Domain Dialogue Systems. 866-873
- Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Shiyin Kang, Zhiyong Wu, Xunying Liu, Helen Meng:
Exemplar-Based Emotive Speech Synthesis. 874-886
- Heinrich Dinkel, Mengyue Wu, Kai Yu:
Towards Duration Robust Weakly Supervised Sound Event Detection. 887-900
- Zamir Ben-Hur, David Lou Alon, Ravish Mehra, Boaz Rafaely:
Binaural Reproduction Based on Bilateral Ambisonics and Ear-Aligned HRTFs. 901-913
- Philipp Aichinger, Franz Pernkopf:
Synthesis and Analysis-By-Synthesis of Modulated Diplophonic Glottal Area Waveforms. 914-926
- Finnian Kelly, John H. L. Hansen:
Analysis and Calibration of Lombard Effect and Whisper for Speaker Recognition. 927-942
- Matthias Müller, Thilo Schulz, Tatiana Ermakova, Philipp P. Caffier:
Lyric or Dramatic - Vibrato Analysis for Voice Type Classification in Professional Opera Singers. 943-955
- Demóstenes Z. Rodríguez, Dick Carrillo, Miguel Arjona Ramírez, Pedro H. J. Nardelli, Sebastian Möller:
Incorporating Wireless Communication Parameters Into the E-Model Algorithm. 956-968
- Tianrui Zong, Yong Xiang, Iynkaran Natgunanathan, Longxiang Gao, Guang Hua, Wanlei Zhou:
Non-Linear-Echo Based Anti-Collusion Mechanism for Audio Signals. 969-984
- Zheng Lian, Bin Liu, Jianhua Tao:
CTNet: Conversational Transformer Network for Emotion Recognition. 985-1000
- Jiacheng Zhang, Huanbo Luan, Maosong Sun, Feifei Zhai, Jingfang Xu, Yang Liu:
Neural Machine Translation With Explicit Phrase Alignment. 1001-1010
- Maria Vukovic, Melissa N. Stolar, Margaret Lech:
Cognitive Load Estimation From Speech Commands to Simulated Aircraft. 1011-1022
- De Hu, Zhe Chen, Fuliang Yin:
Geometry Calibration for Acoustic Transceiver Networks Based on Network Newton Distributed Optimization. 1023-1032
- Yuki Saito, Shinnosuke Takamichi, Hiroshi Saruwatari:
Perceptual-Similarity-Aware Deep Speaker Representation Learning for Multi-Speaker Generative Modeling. 1033-1048
- Tadashi Sakata, Naomitsu Ikeda, Yuichi Ueda, Akira Watanabe:
Vocal Tract Length Estimation Using Accumulated Means of Formants and Its Effects on Speaker-Normalization. 1049-1064
- Jichen Yang, Hongji Wang, Rohan Kumar Das, Yanmin Qian:
Modified Magnitude-Phase Spectrum Information for Spoofing Detection. 1065-1078
- Yanmin Qian, Zhengyang Chen, Shuai Wang:
Audio-Visual Deep Neural Network for Robust Person Verification. 1079-1092
- Peiqin Lin, Meng Yang, Jianhuang Lai:
Deep Selective Memory Network With Selective Attention and Inter-Aspect Modeling for Aspect Level Sentiment Classification. 1093-1106
- Herman Kamper, Yevgen Matusevych, Sharon Goldwater:
Improved Acoustic Word Embeddings for Zero-Resource Languages Using Multilingual Transfer. 1107-1118
- Weiqing Wang, Jin Pan, Hua Yi, Zhanmei Song, Ming Li:
Audio-Based Piano Performance Evaluation for Beginners With Convolutional Neural Network and Attention Mechanism. 1119-1133
- Yi-Chiao Wu, Tomoki Hayashi, Patrick Lumban Tobing, Kazuhiro Kobayashi, Tomoki Toda:
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model With Pitch-Dependent Dilated Convolution Neural Network. 1134-1148
- Vesa Välimäki, Karolina Prawda:
Late-Reverberation Synthesis Using Interleaved Velvet-Noise Sequences. 1149-1160
- Zhuosheng Zhang, Junlong Li, Hai Zhao:
Multi-Turn Dialogue Reading Comprehension With Pivot Turns and Knowledge. 1161-1173
- Clément Gaultier, Srdan Kitic, Rémi Gribonval, Nancy Bertin:
Sparsity-Based Audio Declipping Methods: Selected Overview, New Algorithms, and Large-Scale Evaluation. 1174-1187
- Lachlan Birnie, Thushara D. Abhayapala, Vladimir Tourbabin, Prasanga N. Samarasinghe:
Mixed Source Sound Field Translation for Virtual Binaural Application With Perceptual Validation. 1188-1203
- Monisankha Pal, Manoj Kumar, Raghuveer Peri, Tae Jin Park, So Hyun Kim, Catherine Lord, Somer Bishop, Shrikanth Narayanan:
Meta-Learning With Latent Space Clustering in Generative Adversarial Network for Speaker Diarization. 1204-1219
- Jie Zhang, Jun Du, Li-Rong Dai:
Sensor Selection for Relative Acoustic Transfer Function Steered Linearly-Constrained Beamformers. 1220-1232
- Huang Xie, Tuomas Virtanen:
Zero-Shot Audio Classification Via Semantic Embeddings. 1233-1242
- Xianhong Chen, Changchun Bao:
Phoneme-Unit-Specific Time-Delay Neural Network for Speaker Verification. 1243-1255
- Dong-Yuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen:
Optimal Output-Constrained Active Noise Control Based on Inverse Adaptive Modeling Leak Factor Estimate. 1256-1269
- Ashutosh Pandey, DeLiang Wang:
Dense CNN With Self-Attention for Time-Domain Speech Enhancement. 1270-1279
- Libo Qin, Wanxiang Che, Minheng Ni, Yangming Li, Ting Liu:
Knowing Where to Leverage: Context-Aware Graph Convolutional Network With an Adaptive Fusion Layer for Contextual Spoken Language Understanding. 1280-1289
- Mingyang Zhang, Yi Zhou, Li Zhao, Haizhou Li:
Transfer Learning From Speech Synthesis to Voice Conversion With Non-Parallel Training Data. 1290-1302
- Weipeng He, Petr Motlícek, Jean-Marc Odobez:
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation. 1303-1317
- Yile Wang, Leyang Cui, Yue Zhang:
Improving Skip-Gram Embeddings Using BERT. 1318-1328
- Linzhi Wu, Meishan Zhang:
Deep Graph-Based Character-Level Chinese Dependency Parsing. 1329-1339
- Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang:
Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data. 1340-1351
- Byung Joon Cho, Hyung-Min Park:
Convolutional Maximum-Likelihood Distortionless Response Beamforming With Steering Vector Estimation for Robust Speech Recognition. 1352-1367
- Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen:
An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation. 1368-1396
- Gal Itzhak, Jacob Benesty, Israel Cohen:
On the Design of Differential Kronecker Product Beamformers. 1397-1410
- Zhongshu Ge, Liang Li, Tianshu Qu:
Partially Matching Projection Decoding Method Evaluation Under Different Playback Conditions. 1411-1423
- Sijie Mai, Songlong Xing, Haifeng Hu:
Analyzing Multimodal Sentiment Via Acoustic- and Visual-LSTM With Channel-Aware Temporal Convolution Network. 1424-1437
- Tao Qian, Meishan Zhang, Yinxia Lou, Daiwen Hua:
A Joint Model for Named Entity Recognition With Sentence-Level Entity Type Attentions. 1438-1448
- Ryotaro Sato, Kenta Niwa, Kazunori Kobayashi:
Ambisonic Signal Processing DNNs Guaranteeing Rotation, Scale and Time Translation Equivariance. 1449-1462
- Sooyeon Park, Jung-Woo Choi:
Iterative Echo Labeling Algorithm With Convex Hull Expansion for Room Geometry Estimation. 1463-1478
- Aidan O. T. Hogg, Christine Evers, Alastair H. Moore, Patrick A. Naylor:
Overlapping Speaker Segmentation Using Multiple Hypothesis Tracking of Fundamental Frequency. 1479-1490
- Rajib Sharma, Israel Cohen, Baruch Berdugo:
Controlling Elevation and Azimuth Beamwidths With Concentric Circular Microphone Arrays. 1491-1502
- Runze Wang, Zhen-Hua Ling, Jing-Bo Zhou, Yu Hu:
A Multiple-Integration Encoder for Multi-Turn Text-to-SQL Semantic Parsing. 1503-1513
- Shoukang Hu, Xurong Xie, Shansong Liu, Jianwei Yu, Zi Ye, Mengzhe Geng, Xunying Liu, Helen Meng:
Bayesian Learning of LF-MMI Trained Time Delay Neural Networks for Speech Recognition. 1514-1529
- Matteo Torcoli, Thorsten Kastner, Jürgen Herre:
Objective Measures of Perceptual Audio Quality Reviewed: An Evaluation of Their Application Domain Dependence. 1530-1541
- Heinrich Dinkel, Shuai Wang, Xuenan Xu, Mengyue Wu, Kai Yu:
Voice Activity Detection in the Wild: A Data-Driven Approach Using Teacher-Student Training. 1542-1555
- Songbin Li, Jingang Wang, Peng Liu, Miao Wei, Qiandong Yan:
Detection of Multiple Steganography Methods in Compressed Speech Based on Code Element Embedding, Bi-LSTM and CNN With Attention Mechanisms. 1556-1569
- Qianli Ma, Jiangyue Yan, Zhenxi Lin, Liuhong Yu, Zipeng Chen:
Deformable Self-Attention for Text Classification. 1570-1581
- Yajie Zhang, Zhen-Hua Ling:
Extracting and Predicting Word-Level Style Variations for Speech Synthesis. 1582-1593
- Alexander Bohlender, Ann Spriet, Wouter Tirry, Nilesh Madhu:
Exploiting Temporal Context in CNN Based Multisource DOA Estimation. 1594-1608
- Kohei Yatabe, Daichi Kitamura:
Determined BSS Based on Time-Frequency Masking and Its Application to Harmonic Vector Analysis. 1609-1625
- Ji Won Yoon, Hyeonseung Lee, Hyung Yong Kim, Won-Ik Cho, Nam Soo Kim:
TutorNet: Towards Flexible Knowledge Distillation for End-to-End Speech Recognition. 1626-1638
- Prachi Singh, Sriram Ganapathy:
Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization. 1639-1649
- Penghui Wei, Jiahao Zhao, Wenji Mao:
A Graph-to-Sequence Learning Framework for Summarizing Opinionated Texts. 1650-1660
- Dovid Y. Levin, Shmulik Markovich-Golan, Sharon Gannot:
Near-Field Superdirectivity: An Analytical Perspective. 1661-1674
- Jia-Hao Hsu, Ming-Hsiang Su, Chung-Hsien Wu, Yi-Hsuan Chen:
Speech Emotion Recognition Considering Nonverbal Vocalization in Affective Conversations. 1675-1686
- Tomohiko Nakamura, Shihori Kozuka, Hiroshi Saruwatari:
Time-Domain Audio Source Separation With Neural Networks Based on Multiresolution Analysis. 1687-1701
- Yun Zhang, Yongguo Liu, Jiajing Zhu, Xindong Wu:
FSPRM: A Feature Subsequence Based Probability Representation Model for Chinese Word Embedding. 1702-1716
- Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng:
Any-to-Many Voice Conversion With Location-Relative Sequence-to-Sequence Modeling. 1717-1728
- Rafael Attili Chiea, Márcio H. Costa, Julio A. Cordioli:
An Optimal Envelope-Based Noise Reduction Method for Cochlear Implants: An Upper Bound Performance Investigation. 1729-1739
- Junliang Guo, Zhirui Zhang, Linli Xu, Boxing Chen, Enhong Chen:
Adaptive Adapters: An Efficient Way to Incorporate BERT Into Neural Machine Translation. 1740-1751
- Yi Luo, Cong Han, Nima Mesgarani:
Group Communication With Context Codec for Lightweight Source Separation. 1752-1761
- Zhiwen Xie, Runjie Zhu, Jin Liu, Guangyou Zhou, Jimmy Xiangji Huang:
Hierarchical Neighbor Propagation With Bidirectional Graph Attention Network for Relation Prediction. 1762-1773
- Xuehan Wang, Jacob Benesty, Jingdong Chen, Gongping Huang, Israel Cohen:
Beamforming with Cube Microphone Arrays Via Kronecker Product Decompositions. 1774-1784
- Ke Tan, DeLiang Wang:
Towards Model Compression for Deep Learning Based Speech Enhancement. 1785-1794
- Kristina Tesch, Timo Gerkmann:
Nonlinear Spatial Filtering in Multichannel Speech Enhancement. 1795-1805
- Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li:
Expressive TTS Training With Frame and Style Reconstruction Loss. 1806-1818
- Jipeng Qiang, Xinyu Lu, Yun Li, Yunhao Yuan, Xindong Wu:
Chinese Lexical Simplification. 1819-1828
- Andong Li, Wenzhe Liu, Chengshi Zheng, Cunhang Fan, Xiaodong Li:
Two Heads are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement. 1829-1843
- Eric Carlos Hamdan, Filippo Maria Fazi:
Weighted Orthogonal Vector Rejection Method for Loudspeaker-Based Binaural Audio Reproduction. 1844-1852
- Ke Tan, Xueliang Zhang, DeLiang Wang:
Deep Learning Based Real-Time Speech Enhancement for Dual-Microphone Mobile Phones. 1853-1863
- Kunkun SongGong, Huawei Chen, Wenwu Wang:
Indoor Multi-Speaker Localization Based on Bayesian Nonparametrics in the Circular Harmonic Domain. 1864-1880
- Aleksej Chinaev, Philipp Thüne, Gerald Enzner:
Double-Cross-Correlation Processing for Blind Sampling-Rate and Time-Offset Estimation. 1881-1896
- Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang:
Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT. 1897-1911
- Öykü Deniz Köse, Murat Saraçlar:
Multimodal Representations for Synchronized Speech and Real-Time MRI Video Processing. 1912-1924
- N. P. Narendra, Björn W. Schuller, Paavo Alku:
The Detection of Parkinson's Disease From Speech Using Voice Source Information. 1925-1936
- Robert Rehr, Timo Gerkmann:
SNR-Based Features and Diverse Training Data for Robust DNN-Based Speech Enhancement. 1937-1949
- Nobutaka Ito, Rintaro Ikeshita, Hiroshi Sawada, Tomohiro Nakatani:
A Joint Diagonalization Based Efficient Approach to Underdetermined Blind Audio Source Separation Using the Multichannel Wiener Filter. 1950-1965
- Hao Fei, Shengqiong Wu, Yafeng Ren, Donghong Ji:
Second-Order Semantic Role Labeling With Global Structural Refinement. 1966-1976
- Humberto M. Torres, Mercedes Güemes, Jorge A. Gurlekian, Diego A. Evin:
F0 Perturbation Due to Articulatory Movements: Filtering, Characterization and Applications. 1977-1986
- Khaled Koutini, Hamid Eghbal-zadeh, Gerhard Widmer:
Receptive Field Regularization Techniques for Audio Classification and Tagging With Deep Convolutional Neural Networks. 1987-2000
- Zhong-Qiu Wang, Peidong Wang, DeLiang Wang:
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation. 2001-2014
- Mengjia Zhou, Donghong Ji, Fei Li:
Relation Extraction in Dialogues: A Deep Learning Model Based on the Generality and Specialty of Dialogue Text. 2015-2026
- Minh Nguyen, Gia H. Ngo, Nancy F. Chen:
Domain-Shift Conditioning Using Adaptable Filtering Via Hierarchical Embeddings for Robust Chinese Spell Check. 2027-2036
- Lior Madmoni, Shir Tibor, Israel Nelken, Boaz Rafaely:
The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech. 2037-2047
- Haibin Chen, Qianli Ma, Liuhong Yu, Zhenxi Lin, Jiangyue Yan:
Corpus-Aware Graph Aggregation Network for Sequence Labeling. 2048-2057
- Heming Wang, DeLiang Wang:
Towards Robust Speech Super-Resolution. 2058-2066
- Jianwei Yu, Shi-Xiong Zhang, Bo Wu, Shansong Liu, Shoukang Hu, Mengzhe Geng, Xunying Liu, Helen Meng, Dong Yu:
Audio-Visual Multi-Channel Integration and Recognition of Overlapped Speech. 2067-2082
- Olga Slizovskaia, Gloria Haro, Emilia Gómez:
Conditioned Source Separation for Musical Instrument Performances. 2083-2095
- Xurong Xie, Xunying Liu, Tan Lee, Lan Wang:
Bayesian Learning for Deep Neural Network Adaptation. 2096-2110
- Sankha Subhra Bhattacharjee, Nithin V. George:
Nearest Kronecker Product Decomposition Based Linear-in-The-Parameters Nonlinear Filters. 2111-2122
- Canguang Li, Guohua Wang, Jin Cao, Yi Cai:
A Multi-Agent Communication Based Model for Nested Named Entity Recognition. 2123-2136
- Jonah Ong, Ba-Tuong Vo, Sven Nordholm:
Blind Separation for Multiple Moving Sources With Labeled Random Finite Sets. 2137-2151
- Yixuan Su, Yan Wang, Deng Cai, Simon Baker, Anna Korhonen, Nigel Collier:
PROTOTYPE-TO-STYLE: Dialogue Generation With Style-Aware Editing on Retrieval Memory. 2152-2161
- Alberto Bernardini, Enrico Bozzo, Federico Fontana, Augusto Sarti:
A Wave Digital Newton-Raphson Method for Virtual Analog Modeling of Audio Circuits with Multiple One-Port Nonlinearities. 2162-2173
- Gang Guo, Yi Yu, Rodrigo C. de Lamare, Zongsheng Zheng, Lu Lu, Qiangming Cai:
Proximal Normalized Subband Adaptive Filtering for Acoustic Echo Cancellation. 2174-2188
- Juho Liski, Aki Mäkivirta, Vesa Välimäki:
Audibility of Group-Delay Equalization. 2189-2201
- Farjana Sultana Mim, Naoya Inoue, Paul Reisert, Hiroki Ouchi, Kentaro Inui:
Corruption Is Not All Bad: Incorporating Discourse Structure Into Pre-Training via Corruption for Essay Scoring. 2202-2215
- Dror Kipnis, Roee Diamant:
Graph-Based Clustering of Dolphin Whistles. 2216-2227
- Yuanyuan Liu, Nelly Penttilä, Tiina Ihalainen, Juulia Lintula, Rachel Convey, Okko Räsänen:
Language-Independent Approach for Automatic Computation of Vowel Articulation Features in Dysarthric Speech Assessment. 2228-2243
- César Medina, Rosangela Coelho, Leonardo Zão:
Impulsive Noise Detection for Speech Enhancement in HHT Domain. 2244-2253
- Iván López-Espejo, Zheng-Hua Tan, Jesper Jensen:
A Novel Loss Function and Training Strategy for Noise-Robust Keyword Spotting. 2254-2266
- Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng:
Recent Progress in the CUHK Dysarthric Speech Recognition System. 2267-2281
- Juan Zhao, Tianrui Zong, Yong Xiang, Longxiang Gao, Wanlei Zhou, Gleb Beliakov:
Desynchronization Attacks Resilient Watermarking Method Based on Frequency Singular Value Coefficient Modification. 2282-2295
- Mert Burkay Çöteli, Hüseyin Hacihabiboglu:
Sparse Representations With Legendre Kernels for DOA Estimation and Acoustic Source Separation. 2296-2309
- Nicolas Furnon, Romain Serizel, Slim Essid, Irina Illina:
DNN-Based Mask Estimation for Distributed Speech Enhancement in Spatially Unconstrained Microphone Arrays. 2310-2323
- Or Haim Anidjar, Itshak Lapidot, Chen Hajaj, Amit Dvir, Issachar Gilad:
Hybrid Speech and Text Analysis Methods for Speaker Change Detection. 2324-2338
- Chuang Fan, Chaofa Yuan, Lin Gui, Yue Zhang, Ruifeng Xu:
Multi-Task Sequence Tagging for Emotion-Cause Pair Extraction Via Tag Distribution Refinement. 2339-2350
- Andy T. Liu, Shang-Wen Li, Hung-yi Lee:
TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech. 2351-2366
- Guanlong Zhao, Shaojin Ding, Ricardo Gutierrez-Osuna:
Converting Foreign Accent Speech Without a Reference. 2367-2381
- Kilian Schulze-Forster, Clement S. J. Doire, Gaël Richard, Roland Badeau:
Phoneme Level Lyrics Alignment and Text-Informed Singing Voice Separation. 2382-2395
- Shengqiong Wu, Hao Fei, Yafeng Ren, Bobo Li, Fei Li, Donghong Ji:
High-Order Pair-Wise Aspect and Opinion Terms Extraction With Edge-Enhanced Syntactic Graph Convolution. 2396-2406
- Jingyi Wu, Lin Shang, Xiaoying Gao:
Sentiment Time Series Calibration for Event Detection. 2407-2420
- Kashif Munir, Hai Zhao, Zuchao Li:
Learning Context-Aware Convolutional Filters for Implicit Discourse Relation Classification. 2421-2433
- Seokhwan Kim, Hannes Schulz, R. Chulaka Gunasekara, Chiori Hori, Abhinav Rastogi, Luis Fernando D'Haro:
Editorial: Special Issue on the Eighth Dialog System Technology Challenge. 2434-2436
- Byoungjae Kim, Jungyun Seo, Myoung-Wan Koo:
Randomly Wired Network Based on RoBERTa and Dialog History Attention for Response Selection. 2437-2442
- Jia-Chen Gu, Tianda Li, Zhen-Hua Ling, Quan Liu, Zhiming Su, Yu-Ping Ruan, Xiaodan Zhu:
Deep Contextualized Utterance Representations for Response Selection and Dialogue Analysis. 2443-2455
- Yun-Wei Chu, Kuan-Yen Lin, Chao-Chun Hsu, Lun-Wei Ku:
End-to-End Recurrent Cross-Modality Attention for Video Dialogue. 2456-2464
- Kun Xu, Han Wu, Linfeng Song, Haisong Zhang, Linqi Song, Dong Yu:
Conversational Semantic Role Labeling. 2465-2475
- Zekang Li, Zongjia Li, Jinchao Zhang, Yang Feng, Jie Zhou:
Bridging Text and Video: A Universal Multimodal Transformer for Audio-Visual Scene-Aware Dialog. 2476-2483
- Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz:
GRTr: Generative-Retrieval Transformers for Data-Efficient Dialogue Domain Adaptation. 2484-2492
- Jiali Zeng, Yongjing Yin, Yang Liu, Yubin Ge, Jinsong Su:
Domain Adaptive Meta-Learning for Dialogue State Tracking. 2493-2501
- Chen Zhang, Grandee Lee, Luis Fernando D'Haro, Haizhou Li:
D-Score: Holistic Dialogue Evaluation Without Reference. 2502-2516
- Shrikant Malviya, Rohit Mishra, Santosh Kumar Barnwal, Uma Shanker Tiwary:
HDRS: Hindi Dialogue Restaurant Search Corpus for Dialogue State Tracking in Task-Oriented Environment. 2517-2528
- Seokhwan Kim, Michel Galley, R. Chulaka Gunasekara, Sungjin Lee, Adam Atkinson, Baolin Peng, Hannes Schulz, Jianfeng Gao, Jinchao Li, Mahmoud Adada, Minlie Huang, Luis A. Lastras, Jonathan K. Kummerfeld, Walter S. Lasecki, Chiori Hori, Anoop Cherian, Tim K. Marks, Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta:
Overview of the Eighth Dialog System Technology Challenge: DSTC8. 2529-2540
- Myeongho Jeong, Seungtaek Choi, Jinyoung Yeo, Seung-won Hwang:
Label and Context Augmentation for Response Selection at DSTC8. 2541-2550
- Qing Liu, Lei Chen, Yuan Yuan, Huarui Wu:
History Reuse and Bag-of-Words Loss for Long Summary Generation. 2551-2560
- Lu Zhang, Mingjiang Wang, Qiquan Zhang, Xinsheng Wang, Ming Liu:
PhaseDCN: A Phase-Enhanced Dual-Path Dilated Convolutional Network for Single-Channel Speech Enhancement. 2561-2574
- Kazi Nazmul Haque, Rajib Rana, Jiajun Liu, John H. L. Hansen, Nicholas Cummins, Carlos Busso, Björn W. Schuller:
Guided Generative Adversarial Neural Network for Representation Learning and Audio Generation Using Fewer Labelled Audio Data. 2575-2590
- Toru Nakashika, Kohei Yatabe:
Gamma Boltzmann Machine for Audio Modeling. 2591-2605
- Xintong Li, Lemao Liu, Zhaopeng Tu, Guanlin Li, Shuming Shi, Max Q.-H. Meng:
Attending From Foresight: A Novel Attention Mechanism for Neural Machine Translation. 2606-2616
- Hengshun Zhou, Jun Du, Yuanyuan Zhang, Qing Wang, Qing-Feng Liu, Chin-Hui Lee:
Information Fusion in Attention Networks Using Adaptive and Multi-Level Factorized Bilinear Pooling for Audio-Visual Emotion Recognition. 2617-2629
- Yuling Li, Kui Yu, Yuhong Zhang:
Learning Cross-Lingual Mappings in Imperfectly Isomorphic Embedding Spaces. 2630-2642
- Xiao Zhou, Zhen-Hua Ling, Li-Rong Dai:
UnitNet: A Sequence-to-Sequence Acoustic Model for Concatenative Speech Synthesis. 2643-2655
- Zihan Pan, Malu Zhang, Jibin Wu, Jiadong Wang, Haizhou Li:
Multi-Tone Phase Coding of Interaural Time Difference for Sound Source Localization With Spiking Neural Networks. 2656-2670
- Ken O'Hanlon, Mark B. Sandler:
FifthNet: Structured Compact Neural Networks for Automatic Chord Recognition. 2671-2682
- Simone Spagnol, Riccardo Miccini, Marius George Onofrei, Runar Unnthorsson, Stefania Serafin:
Estimation of Spectral Notches From Pinna Meshes: Insights From a Simple Computational Model. 2683-2695
- Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li:
Target Speaker Verification With Selective Auditory Attention for Single and Multi-Talker Speech. 2696-2709
- Adel Zahedi, Michael Syskind Pedersen, Jan Østergaard, Thomas Ulrich Christiansen, Lars Bramsløw, Jesper Jensen:
Minimum Processing Beamforming. 2710-2724
- Xianghui Wang, Jie Chen, Xiaoyi Chen, Jing Guo, Qian Xiang:
Multichannel Iterative Noise Reduction Filters in the Short-Time-Fourier-Transform Domain Based on Kronecker Product Decomposition. 2725-2740
- Kai-Li Yin, Yi-Fei Pu, Lu Lu:
Robust Q-Gradient Subband Adaptive Filter for Nonlinear Active Noise Control. 2741-2752
- Jaeuk Byun, Jong Won Shin:
Monaural Speech Separation Using Speaker Embedding From Preliminary Separation. 2753-2763
- Xudong Zhao, Gongping Huang, Jingdong Chen, Jacob Benesty:
On the Design of 3D Steerable Beamformers With Uniform Concentric Circular Microphone Arrays. 2764-2778
- Zifeng Cheng, Zhiwei Jiang, Yafeng Yin, Na Li, Qing Gu:
A Unified Target-Oriented Sequence-to-Sequence Model for Emotion-Cause Pair Extraction. 2779-2791
- Hamid Azadi, Mohammad-R. Akbarzadeh-T., Hamid Reza Kobravi, Ali Shoeibi:
Robust Voice Feature Selection Using Interval Type-2 Fuzzy AHP for Automated Diagnosis of Parkinson's Disease. 2792-2802
- Yukiya Hono, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda:
Sinsy: A Deep Neural Network-Based Singing Voice Synthesis System. 2803-2815
- Jian Tang, Jie Zhang, Yan Song, Ian McLoughlin, Li-Rong Dai:
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR. 2816-2828
- Chongman Leong, Xuebo Liu, Derek F. Wong, Lidia S. Chao:
Exploiting Translation Model for Parallel Corpus Mining. 2829-2839
- Neil Zeghidour, David Grangier:
Wavesplit: End-to-End Speech Separation by Speaker Clustering. 2840-2849
- Dino Oglic, Zoran Cvetkovic, Peter Sollich:
Learning Waveform-Based Acoustic Models Using Deep Variational Convolutional Neural Networks. 2850-2863
- Alexandru Nelus, Rainer Martin:
Privacy-Preserving Audio Classification Using Variational Information Feature Extraction. 2864-2877 - Hao Li, DeLiang Wang, Xueliang Zhang, Guanglai Gao:
Recurrent Neural Networks and Acoustic Features for Frame-Level Signal-to-Noise Ratio Estimation. 2878-2887 - Yi Zhou, Xiaoqing Zheng, Xuanjing Huang:
Generating Responses With a Given Syntactic Pattern in Chinese Dialogues. 2888-2898 - Viktor Gunnarsson, Mikael Sternad:
Binaural Auralization of Microphone Array Room Impulse Responses Using Causal Wiener Filtering. 2899-2914 - Zuolong Chen, Huawei Chen, Quansheng Tu:
Sensor Imperfection Tolerance Analysis of Robust Linear Differential Microphone Arrays. 2915-2929 - YuSheng Su, Xu Han, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Peng Li, Jie Zhou, Maosong Sun:
CSS-LM: A Contrastive Framework for Semi-Supervised Fine-Tuning of Pre-Trained Language Models. 2930-2941 - Tobias Kabzinski, Peter Jax:
A Causality-Constrained Frequency-Domain Least-Squares Filter Design Method for Crosstalk Cancellation. 2942-2956 - Frank Zalkow, Meinard Müller:
CTC-Based Learning of Chroma Features for Score-Audio Music Retrieval. 2957-2971 - Teck Kai Chan, Cheng Siong Chin:
Multi-Branch Convolutional Macaron Net for Sound Event Detection. 2972-2985 - Tedd Kourkounakis, Amirhossein Hajavi, Ali Etemad:
FluentNet: End-to-End Detection of Stuttered Speech Disfluencies With Deep Learning. 2986-2999 - Haoyu Li, Junichi Yamagishi:
Multi-Metric Optimization Using Generative Adversarial Networks for Near-End Speech Intelligibility Enhancement. 3000-3011 - Zehao Lin, Shaobo Cui, Guodun Li, Xiaoming Kang, Feng Ji, Feng-Lin Li, Zhongzhou Zhao, Haiqing Chen, Yin Zhang:
Predict-Then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems. 3012-3024 - Metin Calis, Steven van de Par, Richard Heusdens, Richard Christian Hendriks:
Localization Based on Enhanced Low Frequency Interaural Level Difference. 3025-3039 - Christopher Liberatore:
Native-Nonnative Voice Conversion by Residual Warping in a Sparse, Anchor-Based Representation. 3040-3051 - Shoichi Koyama, Jesper Brunnström, Hayato Ito, Natsuki Ueno, Hiroshi Saruwatari:
Spatial Active Noise Control Based on Kernel Interpolation of Sound Field. 3052-3063 - Jipeng Qiang, Yun Li, Yi Zhu, Yunhao Yuan, Yang Shi, Xindong Wu:
LSBert: Lexical Simplification Based on BERT. 3064-3076 - Ningyu Zhang, Hongbin Ye, Shumin Deng, Chuanqi Tan, Mosha Chen, Songfang Huang, Fei Huang, Huajun Chen:
Contrastive Information Extraction With Generative Transformer. 3077-3088 - Jianyu Wang, Shanzheng Guan, Shupei Liu, Xiao-Lei Zhang:
Minimum-Volume Multichannel Nonnegative Matrix Factorization for Blind Audio Source Separation. 3089-3103 - Alberto Carini, Stefania Cecchi, Alessandro Terenzi, Simone Orcioni:
A Room Impulse Response Measurement Method Robust Towards Nonlinearities Based on Orthogonal Periodic Sequences. 3104-3117 - Jie Zhang, Changheng Li:
Quantization-Aware Binaural MWF Based Noise Reduction Incorporating External Wireless Devices. 3118-3131 - Biru Zhu, Xingyao Zhang, Ming Gu, Yangdong Deng:
Knowledge Enhanced Fact Checking and Verification. 3132-3143 - Mark A. Poletti, Paul D. Teal:
A Superfast Toeplitz Matrix Inversion Method for Single- and Multi-Channel Inverse Filters and Its Application to Room Equalization. 3144-3157 - Guanlin Li, Lemao Liu, Conghui Zhu, Rui Wang, Tiejun Zhao, Shuming Shi:
Detecting Source Contextual Barriers for Understanding Neural Machine Translation. 3158-3169 - Chia-Chih Kuo, Kuan-Yu Chen, Shang-Bao Luo:
Audio-Aware Spoken Multiple-Choice Question Answering With Pre-Trained Language Models. 3170-3179 - Rui Liu, Zheng Lin, Weiping Wang:
Addressing Extraction and Generation Separately: Keyphrase Prediction With Pre-Trained Language Models. 3180-3191 - Jiangnan Li, Hongliang Pan, Zheng Lin, Peng Fu, Weiping Wang:
Sarcasm Detection With Commonsense Knowledge. 3192-3201 - Runyan Yang, Gaofeng Cheng, Haoran Miao, Ta Li, Pengyuan Zhang, Yonghong Yan:
Keyword Search Using Attention-Based End-to-End ASR and Frame-Synchronous Phoneme Alignments. 3202-3215 - Tareq Alkhaldi, Chenhui Chu, Sadao Kurohashi:
Flexibly Focusing on Supporting Facts, Using Bridge Links, and Jointly Training Specialized Modules for Multi-Hop Question Answering. 3216-3225 - Wenyi Wu, Yegui Xiao, Jianhui Lin, Liying Ma, Khashayar Khorasani:
An Efficient Filter Bank Structure for Adaptive Notch Filtering and Applications. 3226-3241 - Xinsheng Wang, Justin van der Hout, Jihua Zhu, Mark Hasegawa-Johnson, Odette Scharenborg:
Synthesizing Spoken Descriptions of Images. 3242-3254 - Vincent W. Neo, Christine Evers, Patrick A. Naylor:
Enhancement of Noisy Reverberant Speech Using Polynomial Matrix Eigenvalue Decomposition. 3255-3266 - Riccardo Giampiccolo, Mauro Giuseppe de Bari, Alberto Bernardini, Augusto Sarti:
Wave Digital Modeling and Implementation of Nonlinear Audio Circuits With Nullors. 3267-3279 - Xixin Wu, Yuewen Cao, Hui Lu, Songxiang Liu, Disong Wang, Zhiyong Wu, Xunying Liu, Helen Meng:
Speech Emotion Recognition Using Sequential Capsule Networks. 3280-3291 - Yuan Gong, Yu-An Chung, James R. Glass:
PSLA: Improving Audio Tagging With Pretraining, Sampling, Labeling, and Aggregation. 3292-3306 - Licheng Zhang, Zhendong Mao, Benfeng Xu, Quan Wang, Yongdong Zhang:
Review and Arrange: Curriculum Learning for Natural Language Understanding. 3307-3320 - Fei He, Ling He, Jing Zhang, Yuanyuan Li, Xi Xiong:
Automatic Detection of Affective Flattening in Schizophrenia: Acoustic Correlates to Sound Waves and Auditory Perception. 3321-3334 - Saoussen Mathlouthi Bouzid, Chiraz Ben Othmane Zribi:
Efficient Learning Approach for Pronominal Anaphora and Ellipsis Identification and Resolution in Arabic Texts. 3335-3348 - Arda Yüksel, Berke Ugurlu, Aykut Koç:
Semantic Change Detection With Gaussian Word Embeddings. 3349-3361 - Mei Li, Lu Xiang, Xiaomian Kang, Yang Zhao, Yu Zhou, Chengqing Zong:
Medical Term and Status Generation From Chinese Clinical Dialogue With Multi-Granularity Transformer. 3362-3374 - Yongwei Li, Jianhua Tao, Donna Erickson, Bin Liu, Masato Akagi:
$F_0$-Noise-Robust Glottal Source and Vocal Tract Analysis Based on ARX-LF Model. 3375-3383 - Xianwen Liao, Yongzhong Huang, Yongzhuang Wei, Chenhao Zhang, Fu Wang, Yong Wang:
Efficient Estimate of Sentence's Representation Based on the Difference Semantics Model. 3384-3399 - Kwang Myung Jeon, Geon Woo Lee, Nam Kyun Kim, Hong Kook Kim:
TAU-Net: Temporal Activation U-Net Shared With Nonnegative Matrix Factorization for Speech Enhancement in Unseen Noise Environments. 3400-3414 - Yi-Yang Ding, Hao-Jian Lin, Li-Juan Liu, Zhen-Hua Ling, Yu Hu:
Robustness of Speech Spoofing Detectors Against Adversarial Post-Processing of Voice Conversion. 3415-3426 - Yi Zhou, Xiaohai Tian, Haizhou Li:
Language Agnostic Speaker Embedding for Cross-Lingual Personalized Speech Generation. 3427-3439 - Ju Lin, Adriaan J. de Lind van Wijngaarden, Kuang-Ching Wang, Melissa C. Smith:
Speech Enhancement Using Multi-Stage Self-Attentive Temporal Convolutional Networks. 3440-3450 - Wei-Ning Hsu, Benjamin Bolte, Yao-Hung Hubert Tsai, Kushal Lakhotia, Ruslan Salakhutdinov, Abdelrahman Mohamed:
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units. 3451-3460 - Kouei Yamaoka, Nobutaka Ono, Shoji Makino:
Time-Frequency-Bin-Wise Linear Combination of Beamformers for Distortionless Signal Enhancement. 3461-3475 - Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux:
Convolutive Prediction for Monaural Speech Dereverberation and Noisy-Reverberant Speaker Separation. 3476-3490 - Bing Yang, Hong Liu, Xiaofei Li:
Learning Deep Direct-Path Relative Transfer Function for Binaural Sound Source Localization. 3491-3503 - Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Ziqing Yang:
Pre-Training With Whole Word Masking for Chinese BERT. 3504-3514 - Leda Sari, Mark Hasegawa-Johnson, Chang D. Yoo:
Counterfactually Fair Automatic Speech Recognition. 3515-3525 - Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, Lianwu Chen, Donald S. Williamson, Dong Yu:
Multi-Channel Multi-Frame ADL-MVDR for Target Speech Separation. 3526-3540 - Nils L. Westhausen, Rainer Huber, Hannah Baumgartner, Ragini Sinha, Jan Rennies, Bernd T. Meyer:
Reduction of Subjective Listening Effort for TV Broadcast Signals With Recurrent Neural Networks. 3541-3550 - Shota Sasaki, Jun Suzuki, Kentaro Inui:
Subword-Based Compact Reconstruction for Open-Vocabulary Neural Word Embeddings. 3551-3564 - Xiaodong Cui, Wei Zhang, Abdullah Kayi, Mingrui Liu, Ulrich Finkler, Brian Kingsbury, George Saon, David S. Kung:
Asynchronous Decentralized Distributed Training of Acoustic Models. 3565-3576 - Junqing Zhang, Wen Zhang, Jihui Aimee Zhang, Thushara Dheemantha Abhayapala, Lijun Zhang:
Spatial Active Noise Control in Rooms Using Higher Order Sources. 3577-3591 - Bingzhi Chen, Qi Cao, Mixiao Hou, Zheng Zhang, Guangming Lu, David Zhang:
Multimodal Emotion Recognition With Temporal and Semantic Consistency. 3592-3603 - S. Supraja, Andy W. H. Khong, Sivanagaraja Tatinati:
Regularized Phrase-Based Topic Model for Automatic Question Classification With Domain-Agnostic Class Labels. 3604-3616 - Natsuko Maeda, Filippo Maria Fazi, Falk-Martin Hoffmann:
Sound Field Reproduction With a Cylindrical Loudspeaker Array Using First Order Wall Reflections. 3617-3630 - Xugang Lu, Peng Shen, Yu Tsao, Hisashi Kawai:
Coupling a Generative Model With a Discriminative Learning Framework for Speaker Verification. 3631-3641 - Hannes Helmholz, David Lou Alon, Sebastià V. Amengual Garí, Jens Ahrens:
Effects of Additive Noise in Binaural Rendering of Spherical Microphone Array Signals. 3642-3653 - Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro:
Speech Reconstruction With Reminiscent Sound Via Visual Voice Memory. 3654-3667 - Ran Weisman, Tom Shlomo, Vladimir Tourbabin, Paul Calamia, Boaz Rafaely:
Robustness of Acoustic Rake Filters in Minimum Variance Beamforming. 3668-3678 - Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng:
Mixed Precision Low-Bit Quantization of Neural Network Language Models for Speech Recognition. 3679-3693 - Jidong Ge, Yunyun Huang, Xiaoyu Shen, Chuanyi Li, Wei Hu:
Learning Fine-Grained Fact-Article Correspondence in Legal Cases. 3694-3706 - Qiuqiang Kong, Bochen Li, Xuchen Song, Yuan Wan, Yuxuan Wang:
High-Resolution Piano Transcription With Pedals by Regressing Onset and Offset Times. 3707-3717