Whiting School of Engineering
1996 Annual Report
Workshop Draws Internationally Known Experts
For six weeks during July and August 1995, the Center for Language and Speech
Processing (CLSP) was home to 24 scientists who attended LM95, a language
modeling research workshop. A language model is the component of an
automatic speech recognizer (transcriber) that, given its hypothesis of what
was said before, predicts what is likely to be said next. The workshop was the
third in a series sponsored by the federal government and the first hosted by
CLSP. The participants represented academia, industry, and the U.S. government
and included scholars from France, Germany, and Spain. Frederick Jelinek, director
of the Center, chaired the workshop, and Eric Brill, assistant professor of computer
science, and William Byrne, associate research scientist in CLSP, also represented
the Whiting School as participants.
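To make the definition above concrete, here is a minimal sketch of the simplest kind of language model: a bigram model that predicts the next word from the single preceding word, using relative frequencies counted in training text. The function names, the `<s>` start-of-sentence marker, and the three-sentence toy corpus are illustrative assumptions, not anything used at LM95. Practical recognizers typically condition on two preceding words (trigrams) and smooth their estimates so that unseen word pairs do not receive zero probability.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Count adjacent word pairs; <s> marks the start of each sentence."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next | prev); empty dict if prev is unseen."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

# Hypothetical toy training text.
corpus = [
    "what is the weather like",
    "what is the time",
    "what time is it",
]
model = train_bigram(corpus)
probs = next_word_probs(model, "is")
print(probs)  # P(the | is) = 2/3, P(it | is) = 1/3
```

Given the previous word "is", the model ranks "the" above "it" simply because that pair occurred more often in the training text; it knows nothing about anything said earlier than one word back.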
Participants at LM95 were divided into four teams, each with an assigned project
leader and research goal: Spanish: explore language modeling techniques for the
recognition of unrestricted, conversational Spanish over telephone channels;
Language Modeling for Spontaneous Speech: systematically study the baseline
system and analyze its errors; Fast Training and Portability: make better use of
small data sets; and Phrase Structure Language Models: improve the language
model by incorporating linguistic structure. In
addition, two special weeks were organized with dedicated topics and invited
experts in the fields of linguistics and information theory. Lastly, eight guest
participants gave presentations at various times during the workshop.
A Collector of Words
David Yarowsky, assistant professor of computer science and member of the
Center for Language and Speech Processing (CLSP), needs words. Lots of them.
Three to five billion, in fact. Together with Frederick Jelinek, director of CLSP;
Eric Brill, also an assistant professor of computer science; and Sanjeev Khudanpur,
a CLSP research scientist, Yarowsky will combine the tremendous word bank he is
developing with sophisticated computer programs to improve the performance of
language modeling systems.
In the past, researchers have used limited models of simple word sequences to
guide systems such as speech recognizers. The CLSP team is now developing
methods that use far richer contextual clues, including long-distance word
dependencies, syntactic structure, and topic analysis to predict which words a
person has spoken. This research is driven by the word association patterns
observed in the huge text database. Yarowsky is also investigating ways in which
the semantic classification of words in context can be learned automatically from
these patterns.
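One well-known way to add longer-range context to a simple word-sequence model is a cache (or topic) component: words that have already appeared in a conversation become more likely to recur. The sketch below linearly interpolates a bigram estimate with such a cache. It illustrates the general idea only and is not the CLSP team's method; the function name, the weight `lam = 0.9`, and the example probabilities and history are hypothetical, and syntactic structure is not modeled at all.

```python
from collections import Counter

def cache_interpolated_prob(word, bigram_probs, history, lam=0.9):
    """Mix a short-range and a long-range estimate of P(word).

    bigram_probs: P(word | previous word), as a dict over the vocabulary.
    history: words seen earlier in the conversation (the "cache").
    lam: interpolation weight (hypothetical value; normally tuned on held-out text).
    """
    cache = Counter(history)
    # Relative frequency of the word in the recent history; 0 if there is no history yet.
    p_cache = cache[word] / len(history) if history else 0.0
    # Linear interpolation: a weighted average of the two distributions.
    return lam * bigram_probs.get(word, 0.0) + (1 - lam) * p_cache

# Hypothetical P(next | "the") from a bigram model, plus earlier words in the conversation.
bigram_probs = {"meeting": 0.6, "report": 0.4}
history = "we cut the budget and then the budget vote failed".split()

# "budget" has zero bigram probability here but still gets mass from the cache.
p_budget = cache_interpolated_prob("budget", bigram_probs, history)
p_meeting = cache_interpolated_prob("meeting", bigram_probs, history)
print(p_budget, p_meeting)
```

Because both components are probability distributions over the vocabulary, any weight between 0 and 1 yields a valid distribution. The cache lets a topic word mentioned many words earlier influence the prediction, which a bigram model alone cannot do.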
With the text database currently holding about 600 million words, Yarowsky isn't
terribly restrictive about the sources of information that pour into it, but he does
seek a balance of styles. "Can you imagine trying to predict the flow of a natural
conversation if all you were exposed to were scientific journal articles?" Yarowsky
asks. The word bank contains newswire reports, conversational speech
transcriptions, email messages, text from the Internet, and scientific papers.
Yarowsky hopes to make an annotated version of the collection available to
researchers and others via the World Wide Web.
Established 1992
Phone 410-516-4237
Email clsp@jhu.edu
WWW http://www.clsp.jhu.edu/
Affiliated Researchers
Frederick Jelinek, Director
Paul Smolensky, Assistant Director
Biomedical Engineering
John Heinz, Murray B. Sachs, Eric D. Young
Cognitive Science
Michael R. Brent, Luigi Burzio, Robert Frank,
Paul Smolensky
Computer Science
Eric Brill, Steven L. Salzberg, David Yarowsky
Electrical and Computer Engineering
Andreas G. Andreou, William J. Byrne, Gert Cauwenberghs, Frederick Jelinek,
Sanjeev Khudanpur
Mathematical Sciences
Lenore J. Cowen, Carey E. Priebe, Colin O. Wu
Psychology
Peter Jusczyk
Research Areas
Analog and Digital VLSI
Computational Foundations of Grammatical Theory
Corpus Based Natural Language Processing
Information Theory
Language Acquisition and Computational Psycholinguistics
Language Modeling
Machine Learning of Natural Language
Neural Auditory Processing
Pronunciation Modeling
Semantic Analysis and Classification of Text
Speech Signal Processing
Statistical and Combinatorial Data Clustering