Speech and language technology: People are not dictionaries

with Mark Liberman

As a researcher and research manager at AT&T Bell Labs from 1975 to 1990, as a member of Penn's faculty since 1990, and as founder and director of the Linguistic Data Consortium (LDC) since 1992, Liberman has actively shaped the evolution of speech and language research towards quantitative, replicable studies based on published datasets. In its 26 years of existence, the LDC has distributed more than 160,000 copies of nearly 3,000 datasets to more than 5,600 research organizations in 92 countries.

In his own research, a key focus has been applying techniques from machine learning and human language technology to scientific questions using very large speech collections, ranging from dozens to thousands of hours and involving up to tens of thousands of speakers. Application areas include phonetics, the psychology of language, sociolinguistics, and clinical diagnosis and monitoring. The last category includes current collaborations on speech, language, and communicative interaction in Autism Spectrum Disorder, Frontotemporal Degeneration, and Alzheimer's Disease.

During this conversation, I ask Mark how the marriage between linguistics and computer science works today, and how it has worked since the early days of the field, before it even had a name. We talk about the skills young students are equipped with and the applications computational linguistics has today. I also ask seemingly trivial questions like "how many languages are there in the world?", and you never get a trivial answer from a world-class expert like Mark. I have learnt so much from this conversation and I hope you will too! My new favourite quote is: "A language is a dialect with an army and a navy."

Mark Liberman and Geoffrey K. Pullum. Far from the Madding Gerund and Other Dispatches from Language Log. William, James & Co., 2006. ISBN 1-59028-055-5.

More detailed information is available in Mark's CV: https://www.ling.upenn.edu/~myl/LibermanCV.html