ASRU 2007: Kyoto, Japan
- Sadaoki Furui, Tatsuya Kawahara: IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2007, Kyoto, Japan, December 9-13, 2007. IEEE 2007, ISBN 978-1-4244-1746-9
Acoustic Modeling and Robust ASR
- Li Deng: Roles of high-fidelity acoustic modeling in robust speech recognition. 1-13
- Esfandiar Zavarehei, Saeed Vaseghi: Interpolation of lost speech segments using LP-HNM model with codebook-mapping post-processing. 14-18
- Amin Haji Abolhassani, Sid-Ahmed Selouani, Douglas D. O'Shaughnessy: Speech enhancement using PCA and variance of the reconstruction error in distributed speech recognition. 19-23
- Mark J. F. Gales, Frank Diehl, Chandra Kant Raut, Marcus Tomalin, Philip C. Woodland, Kai Yu: Development of a phonetic system for large vocabulary Arabic speech recognition. 24-29
- Chuan-Wei Ting, Jen-Tzung Chien: Factor analysis of acoustic features for streamed hidden Markov modeling. 30-35
- Özgür Çetin, Mathew Magimai-Doss, Karen Livescu, Arthur Kantor, Simon King, Chris D. Bartels, Joe Frankel: Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs. 36-41
- Peter Jancovic, Münevver Köküer: Incorporating the voicing information into HMM-based automatic speech recognition. 42-46
- Parya Momayyez, James Waterhouse, Richard Rose: Exploiting complementary aspects of phonological features in automatic speech recognition. 47-52
- Takatoshi Jitsuhiro, Tomoji Toriyama, Kiyoshi Kogure: Robust speech recognition using noise suppression based on multiple composite models and multi-pass search. 53-58
- Mark J. F. Gales, Rogier C. van Dalen: Predictive linear transforms for noise robust speech recognition. 59-64
- Jinyu Li, Li Deng, Dong Yu, Yifan Gong, Alex Acero: High-performance HMM adaptation with joint compensation of additive and convolutive distortions via Vector Taylor Series. 65-70
- Ken'ichi Kumatani, Uwe Mayer, Tobias Gehrig, Emilian Stoimenov, John W. McDonough, Matthias Wölfel: Minimum mutual information beamforming for simultaneous active speakers. 71-76
- Yu Tsao, Chin-Hui Lee: Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition. 77-80
- Liang-Che Sun, Chang-Wen Hsu, Lin-Shan Lee: Modulation spectrum equalization for robust speech recognition. 81-86
- Shih-Hsiang Lin, Yao-Ming Yeh, Berlin Chen: Investigating the use of speech features and their corresponding distribution characteristics for robust speech recognition. 87-92
- Nishanth Ulhas Nair, T. V. Sreenivas: Joint decoding of multiple speech patterns for robust speech recognition. 93-98
- Yi Chen, Chia-Yu Wan, Lin-Shan Lee: Robust speech recognition by properly utilizing reliable frames and segments in corrupted signals. 99-104
- Luis Buera, Antonio Miguel, Eduardo Lleida, Oscar Saz, Alfonso Ortega: Robust speech recognition with on-line unsupervised acoustic feature compensation. 105-110
- Shun'ichi Yamamoto, Kazuhiro Nakadai, Mikio Nakano, Hiroshi Tsujino, Jean-Marc Valin, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno: Design and implementation of a robot audition system for automatic speech recognition of simultaneous speech. 111-116
- Diego Giuliani, Fabio Brugnara: Experiments on cross-system acoustic model adaptation. 117-122
Language Modeling and Speech Understanding
- Ye-Yi Wang: Voice search - Information access via voice queries. 123
- Songfang Huang, Steve Renals: Hierarchical Pitman-Yor language models for ASR in meetings. 124-129
- Xiao Li, Asela Gunawardana, Alex Acero: Adapting grapheme-to-phoneme conversion for name recognition. 130-135
- Bo-June Paul Hsu: Generalized linear interpolation of language models. 136-140
- Jiazhong Nie, Runxin Li, Dingsheng Luo, Xihong Wu: Refine bigram PLSA model by assigning latent topics unevenly. 141-146
- Ahmad Emami, Lidia Mangu: Empirical study of neural network language models for Arabic speech recognition. 147-152
- Xunying Liu, William J. Byrne, Mark J. F. Gales, Adrià de Gispert, Marcus Tomalin, Philip C. Woodland, Kai Yu: Discriminative language model adaptation for Mandarin broadcast speech transcription and translation. 153-158
- Wen Wang, Andreas Stolcke, Jing Zheng: Reranking machine translation hypotheses with structured and web-based language models. 159-164
- Ciro Martins, António J. S. Teixeira, João Paulo Neto: Dynamic language modeling for a daily broadcast news transcription system. 165-170
- Jia Cui, Yi Su, Keith B. Hall, Frederick Jelinek: Investigating linguistic knowledge in a maximum entropy token-based language model. 171-176
- Aaron Heidel, Lin-Shan Lee: Robust topic inference for latent semantic language model adaptation. 177-182
- Alessandro Moschitti, Giuseppe Riccardi, Christian Raymond: Spoken language understanding with kernels for syntactic/semantic structures. 183-188
- Yi-Ting Chen, Shih-Hsiang Lin, Hsin-Min Wang, Berlin Chen: Spoken document summarization using relevant information. 189-194
- Justin Jian Zhang, Ricky Ho Yin Chan, Pascale Fung: Improving lecture speech summarization using rhetorical information. 195-200
- Ani Nenkova, Dan Jurafsky: Automatic detection of contrastive elements in spontaneous speech. 201-206
- Keelan Evanini, David Suendermann, Roberto Pieraccini: Call classification for automated troubleshooting on large corpora. 207-212
- Ye-Yi Wang, Alex Acero: Maximum entropy model parameterization with TF∗IDF weighted vector space model. 213-218
- Matthias H. Heie, Edward W. D. Whittaker, Josef R. Novak, Sadaoki Furui: A language modeling approach to question answering on speech transcripts. 219-224
- Ghinwa F. Choueiter, Stephanie Seneff, James R. Glass: Automatic lexical pronunciations generation and update. 225-230
- Mina Kim, Yoo Rhee Oh, Hong Kook Kim: Non-native pronunciation variation modeling using an indirect data driven method. 231-236
Project Talks
- Jordan Cohen: The GALE project: A description and an update. 237
- Steve Renals, Thomas Hain, Hervé Bourlard: Recognition and understanding of meetings: the AMI and AMIDA projects. 238-247
- Sadaoki Furui, Tetsunori Kobayashi: Introduction of the METI project "Development of fundamental speech recognition technology". 248
Statistical Modeling and Learning
- Jeff A. Bilmes: Submodularity and adaptation. 249
- Deepu Vijayasenan, Fabio Valente, Hervé Bourlard: Agglomerative information bottleneck for speaker diarization of meetings data. 250-255
- Themos Stafylakis, Vassilis Katsouros, George Carayannis: Efficient combination of parametric spaces, models and metrics for speaker diarization. 256-261
- Kyu Jeong Han, Samuel Kim, Shrikanth S. Narayanan: Robust speaker clustering strategies to data source variation for improved speaker diarization. 262-267
- Jinyu Li, Zhi-Jie Yan, Chin-Hui Lee, Ren-Hua Wang: A study on soft margin estimation for LVCSR. 268-271
- Hung-An Chang, James R. Glass: Hierarchical large-margin Gaussian mixture models for phonetic classification. 272-277
- Qiang Fu, Biing-Hwang Juang: Automatic speech recognition based on weighted minimum classification error (W-MCE) training method. 278-283
- Shih-Hung Liu, Fang-Hui Chu, Shih-Hsiang Lin, Hung-Shin Lee, Berlin Chen: Training data selection for improving discriminative training of acoustic models. 284-289
- Peng Liu, Cong Liu, Hui Jiang, Frank K. Soong, Ren-Hua Wang: A constrained line search approach to general discriminative HMM training. 290-295
- Yasuhiro Minami: Mixture Gaussian HMM-trajectory method using likelihood compensation. 296-299
- Yi Liu, Fang Zheng, Lei He, Yunqing Xia: State-dependent mixture tying with variable codebook size for accented speech recognition. 300-305
- Tara N. Sainath, Dimitri Kanevsky, Bhuvana Ramabhadran: Broad phonetic class recognition in a Hidden Markov model framework using extended Baum-Welch transformations. 306-311
- Yan Yin, Hui Jiang: A compact semidefinite programming (SDP) formulation for large margin estimation of HMMs in speech recognition. 312-317
- Takahiro Shinozaki, Tatsuya Kawahara: HMM training based on CV-EM and CV Gaussian mixture optimization. 318-322
- John R. Hershey, Peder A. Olsen, Steven J. Rennie: Variational Kullback-Leibler divergence for Hidden Markov models. 323-328
- Stavros Tsakalidis, Spyros Matsoukas: Bayesian adaptation in HMM training and decoding using a mixture of feature transforms. 329-334
- Chris D. Bartels, Jeff A. Bilmes: Use of syllable nuclei locations to improve ASR. 335-340
- Ken Schutte, James R. Glass: Speech recognition with localized time-frequency pattern detectors. 341-346
- Yun-Hsuan Sung, Constantinos Boulis, Christopher D. Manning, Dan Jurafsky: Regularization, adaptation, and non-independent features improve hidden conditional random fields for phone classification. 347-352
- Andrej Ljolje, Vincent Goffin: Discriminative training of multi-state barge-in models. 353-358
- Andrei Alexandrescu, Katrin Kirchhoff: Graph-based learning for phonetic classification. 359-364
Keynote Talks
- Renato de Mori: Spoken language understanding: a survey. 365-376
- Junichi Tsujii: Combining statistical models with symbolic grammar in parsing. 377-378
Speech Translation and LVCSR
- Stephan Vogel: Speech-translation: from domain-limited to domain-unlimited translation tasks. 379
- Chiori Hori, Bing Zhao, Stephan Vogel, Alex Waibel: Consolidation based speech translation. 380-385
- George Saon, Michael Picheny: Lattice-based Viterbi decoding techniques for speech translation. 386-389
- Krishna Subramanian, David Stallard, Rohit Prasad, Shirin Saleem, Prem Natarajan: Semantic translation error rate for evaluating translation systems. 390-395
- Oliver Bender, Evgeny Matusov, Stefan Hahn, Sasa Hasan, Shahram Khadivi, Hermann Ney: The RWTH Arabic-to-English spoken language translation system. 396-401
- Liang Wang, Eliathamby Ambikairajah, Eric H. C. Choi: A comparisonal study of the multi-layer Kohonen self-organizing feature maps for spoken language identification. 402-407
- Bo Yin, Eliathamby Ambikairajah, Fang Chen: A novel weighting technique for fusing Language Identification systems based on pair-wise performances. 408-412
- Martin Raab, Rainer Gruhn, Elmar Nöth: Non-native speech databases. 413-418
- Frederik Stouten, Jean-Pierre Martens: Dealing with cross-lingual aspects in spoken name recognition. 419-424
- Frank Diehl, Asunción Moreno, Enric Monte: Crosslingual acoustic model development for automatic speech recognition. 425-430
- Rahul Chitturi, John H. L. Hansen: Multi-stream dialect classification using SVM-GMM hybrid classifiers. 431-436
- Helen Mei-Ling Meng, Yuen Yee Lo, Lan Wang, Wing Yiu Lau: Deriving salient learners' mispronunciations from cross-language phonological comparisons. 437-442
- Paul R. Dixon, Diamantino Caseiro, Tasuku Oonishi, Sadaoki Furui: The Titech large vocabulary WFST speech recognition system. 443-448
- David Rybach, Stefan Hahn, Christian Gollan, Ralf Schlüter, Hermann Ney: Advances in Arabic broadcast news transcription at RWTH. 449-454
- Björn Hoffmeister, Christian Plahl, Peter Fritz, Georg Heigold, Jonas Lööf, Ralf Schlüter, Hermann Ney: Development of the 2007 RWTH Mandarin LVCSR system. 455-460
- John W. McDonough, Emilian Stoimenov, Dietrich Klakow: An algorithm for fast composition of weighted finite-state transducers. 461-466
- Ricky Ho Yin Chan, Justin Jian Zhang, Pascale Fung, Lu Cao: A Mandarin lecture speech transcription system for speech summarization. 467-471
- Bhuvana Ramabhadran, Olivier Siohan, Abhinav Sethy: The IBM 2007 speech transcription system for European parliamentary speeches. 472-477
- Hui Lin, Jeff A. Bilmes, Dimitra Vergyri, Katrin Kirchhoff: OOV detection by joint word/phone lattice alignment. 478-483
- Amarnag Subramanya, Chris D. Bartels, Jeff A. Bilmes, Patrick Nguyen: Uncertainty in training large vocabulary speech recognizers. 484-489
- Mei-Yuh Hwang, Gang Peng, Wen Wang, Arlo Faria, Aaron Heidel, Mari Ostendorf: Building a highly accurate Mandarin speech recognizer. 490-495
Spoken and Multi-Modal Dialogue Systems
- Sharon L. Oviatt: Implicit user-adaptive system engagement in speech, pen and multimodal interfaces. 496-501
- Jason D. Williams: Using particle filters to track dialogue state. 502-507
- Jason D. Williams: A method for evaluating and comparing user simulations: The Cramér-von Mises divergence. 508-513
- Antoine Raux, Maxine Eskénazi: A multi-layer architecture for semi-synchronous event-driven dialogue management. 514-519
- Tobias Cincarek, Hiromichi Kawanami, Hiroshi Saruwatari, Kiyohiro Shikano: Development and portability of ASR and Q&A modules for real-environment speech-oriented guidance systems. 520-525
- Jost Schatzmann, Blaise Thomson, Steve J. Young: Error simulation for training statistical dialogue systems. 526-531
- Sebastian Varges, Giuseppe Riccardi: A data-centric architecture for data-driven spoken dialog systems. 532-537
- Cheongjae Lee, Sangkeun Jung, Donghyeon Lee, Gary Geunbae Lee: Example-based error recovery strategy for spoken dialog system. 538-543
- Yi-Cheng Pan, Lin-Shan Lee: Type-II dialogue systems for information access from unstructured knowledge sources. 544-549
- Fabrice Lefèvre, Renato de Mori: Unsupervised state clustering for stochastic dialog management. 550-555
- Jussi Leppänen, Jilei Tian: Dynamic vocabulary prediction for isolated-word dictation on embedded devices. 556-561
- Yi Wu, Rong Zhang, Alexander I. Rudnicky: Data selection for speech recognition. 562-565
- Sabato Marco Siniscalchi, Torbjørn Svendsen, Chin-Hui Lee: Towards bottom-up continuous phone recognition. 566-569
- Qiang Fu, Biing-Hwang Juang: A study on rescoring using HMM-based detectors for continuous speech recognition. 570-575
- Yu Qiao, Satoshi Asakawa, Nobuaki Minematsu: Random discriminant structure analysis for automatic recognition of connected vowels. 576-581
- Abhijeet Sangwan, John H. L. Hansen: Phonological feature based variable frame rate scheme for improved speech recognition. 582-586
- Yuan-Fu Liao, Jia Jang Tu, Sen-Chia Chang, Chin-Hui Lee: An enhanced minimum classification error learning framework for balancing insertion, deletion and substitution errors. 587-590
- Huiqun Deng, Douglas D. O'Shaughnessy, Jean-Guy Dahan, William F. Ganong III: Interpolative variable frame rate transmission of speech features for distributed speech recognition. 591-595
- Björn W. Schuller, Bogdan Vlasenko, Ricardo Minguez, Gerhard Rigoll, Andreas Wendemuth: Comparing one and two-stage acoustic modeling in the recognition of emotion in speech. 596-600
- Teppei Nakano, Shinya Fujie, Tetsunori Kobayashi: Extensible speech recognition system using proxy-agent. 601-606
- Norihide Kitaoka, Kazumasa Yamamoto, Tomohiro Kusamizu, Seiichi Nakagawa, Takeshi Yamada, Satoru Tsuge, Chiyomi Miyajima, Takanobu Nishiura, Masato Nakayama, Yuki Denda, Masakiyo Fujimoto, Tetsuya Takiguchi, Satoshi Tamura, Shingo Kuroiwa, Kazuya Takeda, Satoshi Nakamura: Development of VAD evaluation framework CENSREC-1-C and investigation of relationship between VAD and speech recognition performance. 607-612
Speech Mining and Information Retrieval
- Mark Clements, Marsal Gavaldà: Voice/audio information retrieval: minimizing the need for human ears. 613-623
- César González Ferreras, Valentín Cardeñoso-Payo: A system for speech driven information retrieval. 624-628
- Frank Seide, Peng Yu, Yu Shi: Towards spoken-document retrieval for the enterprise: Approximate word-lattice indexing with text indexers. 629-634
- Sha Meng, Peng Yu, Frank Seide, Jia Liu: A study of lattice-based spoken term detection for Chinese spontaneous speech. 635-640
- Brett Matthews, Upendra V. Chaudhari, Bhuvana Ramabhadran: Fast audio search using vector space modelling. 641-646
- Sophie Rosset, Olivier Galibert, Gilles Adda, Éric Bilinski: The LIMSI QAst systems: Comparison between human and automatic rules generation for question-answering on speech transcriptions. 647-652
- Feifan Liu, Yang Liu: Soundbite identification using reference and automatic transcripts of broadcast news speech. 653-658
- Timothy J. Hazen, Fred Richardson, Anna Margolis: Topic identification from audio recordings using word and phone recognition lattices. 659-664
- Upendra V. Chaudhari, Michael Picheny: Improvements in phone based audio search via constrained match with high order confusion estimates. 665-670
- Michael Levit, Dilek Hakkani-Tür, Gökhan Tür, Daniel Gillick: Integrating several annotation layers for statistical information distillation. 671-676
- Yi-Cheng Pan, Hung-lin Chang, Lin-Shan Lee: Analytical comparison between position specific posterior lattices and confusion networks based on words and subword units for spoken document indexing. 677-682
- Scott Otterson, Mari Ostendorf: Efficient use of overlap information in speaker diarization. 683-686
- Wooil Kim, John H. L. Hansen: Speechfind for CDP: Advances in spoken document retrieval for the U.S. collaborative digitization program. 687-692
- Yan Huang, Oriol Vinyals, Gerald Friedland, Christian A. Müller, Nikki Mirghafori, Chuck Wooters: A fast-match approach for robust, faster than real-time speaker diarization. 693-698
- Konstantin Markov, Satoshi Nakamura: Never-ending learning system for on-line speaker diarization. 699-704
- Vishwa Gupta, Patrick Kenny, Pierre Ouellet, Gilles Boulianne, Pierre Dumouchel: Multiple feature combination to improve speaker diarization of telephone conversations. 705-710
- Abhishek Chandel, Abhinav Parate, Maymon Madathingal, Himanshu Pant, Nitendra Rajput, Shajith Ikbal, Om Deshmukh, Ashish Verma: Sensei: Spoken language assessment for call center agents. 711-716
- Korbinian Riedhammer, Georg Stemmer, Tino Haderlein, Maria Schuster, Frank Rosanowski, Elmar Nöth, Andreas K. Maier: Towards robust automatic evaluation of pathologic telephone speech. 717-722