Advancements and Challenges in Speech Technologies and Arabic Speech Processing

May 16, 2024
9:00 a.m. – 12:00 p.m.
Auditorium between buildings 4 and 5

Join us for an enlightening seminar series featuring two distinguished speakers, Pedro Moreno and Ahmed Ali, who will share their insights on the evolution and future of Speech Technologies and Arabic Speech Processing, respectively.

Our first speaker, Pedro Moreno, led the Speech R&D Team at Google until April 2024 and has more than 20 years of experience in speech science. In his talk titled “Past, Present and Future of Speech Technologies,” he will share his personal views on the evolution of speech technologies, their impact on society, and the current revolution brought about by Large Language Models. Pedro will also discuss the interplay between foundational and applied research and the role of senior leads in nurturing the next generation of scientists.

Our second speaker, Ahmed Ali, is the head of FANAR Lab, an Arabic Generative AI Lab at Qatar Computing Research Institute (QCRI). In his talk titled “Arabic Speech Processing: Challenges and Opportunities,” Ahmed will delve into the latest research within the ArabicSpeech project, discussing the specific challenges encountered in dialectal Arabic research and the development of novel methodologies and techniques. He will also share insights on potential regional impact and stimulate discussion on future directions and potential collaborations.

Don’t miss this opportunity to learn from these experts and gain a deeper understanding of the advancements and challenges in the field of speech technologies. We look forward to your participation.

Event agenda

Speaker 1: Pedro Moreno
Time: 9:00 – 10:30 a.m.
Title: Past, Present and Future of Speech Technologies

Abstract

Speech technologies started with the dream of machines that could speak and understand humans. The journey began with the Voder at the 1939 New York World’s Fair and has led to the current explosion of speech technologies in our everyday lives. Technologies such as voice search, voice-to-voice translation, automatic captioning of videos and meetings, call summarization, audio mining, and speaker verification are now pervasive.

How did this start, how has it evolved, and what is the future? In this talk, I’ll give my personal view of where we are and where we are going, describe a little of my career and contributions along this journey, and conclude with some thoughts on how the current Large Language Model revolution will affect speech science and how speech technologies can affect society in positive and negative ways.

I’ll also muse about the roles of foundational and applied research, the interplay between the two based on my experience, and the role of senior leads in preparing the next generation of scientists, whether in academic or industry environments.

Biography notes

Pedro J. Moreno led the Speech R&D Team at Google (100 researchers and engineers) until April 2024. In more than 20 years of experience in the field of speech science, he has led projects in:

  • Speech-to-speech real-time translation
  • Noise robustness in ASR
  • Multimedia search engines (SpeechBot)
  • Multimedia Machine Learning (Audio, Music, etc)
  • Internationalization of ASR, leading the expansion of Google Voice Search to more than 100 languages
  • Development of contextual modeling in ASR
  • Development of multilingual and universal ASR systems
  • Development of speech technologies for impaired speakers
  • Research into the use of speech input for LLMs
  • GenAI detection for trust and safety

Pedro started his career in speech science with an MS in Telecommunications Engineering from Universidad Politécnica de Madrid. He then interned for 2 years at Bell Laboratories, working on ASR and speech-to-speech translation. After that, he was awarded a Fulbright Scholarship to continue his Ph.D. studies at Carnegie Mellon University. Pedro started his professional career at HP Labs and then joined Google Research in 2004. His research focus has always been the application of foundational ideas to products to solve user needs.

Speaker 2: Ahmed Ali
Time: 10:30 – 11:30 a.m.
Title: Arabic Speech Processing: Challenges and Opportunities

Abstract

In recent years, the field of Arabic speech processing has experienced substantial advancements, significantly driven by the integration of deep learning and sizable, diverse datasets that encompass both spoken and written forms. The current focus is to improve the accuracy of Modern Standard Arabic in Automatic Speech Recognition and Text to Speech, with limited attention to dialectal Arabic and even less attention to applications such as teaching Arabic to non-native speakers. In this talk, I will delve deeply into the latest research within the ArabicSpeech project, discussing the specific challenges we have encountered in our research on dialectal Arabic, such as multi-dialectal processing and code-switching, and highlighting how these obstacles have driven the development of novel methodologies and techniques. Our efforts have focused on pioneering approaches to creating practical, scalable speech-processing solutions that are effective even without balanced data. Furthermore, I will share our results and insights gained from exploring these techniques, demonstrating their potential regional impact. This presentation will shed light on our research progress and stimulate discussion on future directions and potential collaborations, especially in underrepresented areas such as dialect recognition and educational applications.

Biography notes

Ahmed is the head of FANAR Lab, an Arabic Generative AI Lab at Qatar Computing Research Institute (QCRI). He is the founder of ArabicSpeech.org and the co-founder and Chief Scientist of KANARI AI. He has twenty years of high-impact experience in research and industry, with main expertise in speech processing and Natural Language Processing (NLP). His research output spans more than 90 peer-reviewed publications in top-tier conferences and journals, along with patents. His ability to generate novel ideas and carry research through tech transfer is illustrated by KANARI, a multi-million-dollar startup. He has worked in big corporations such as IBM and Nuance and in startups such as SpinVox and KANARI, and he has advised the UN, ESCWA, Aljazeera, BBC, and DW, among others. His impact on the speech community is reflected in numerous articles featuring his research, including in MIT Tech Review, BBC, and Speechmagazine. He was General Chair for the first IEEE speech conference in the Middle East. His teaching and mentoring skills are demonstrated in running the annual ArabicSpeech, spoken languages of the world committee, speech hackathon, and Johns Hopkins summer school JSALT 2022. Ahmed is known for his leadership in Speech and NLP. He won the prestigious Qatar Foundation Best Innovation Award in 2018 and the World Summit Awards in 2024.
