Summary

Seyyed Saeed Sarfjoo received the B.Sc. degree in information technology from Isfahan University, Isfahan, Iran, in 2009 and the M.Sc. degree in information technology from Qom University, Qom, Iran, in 2012. In 2009, he joined Asr Gooyesh Pardaz Co., Tehran, Iran, as a software developer and researcher, where his work focused mostly on Persian text-to-speech, Persian speech recognition, and Persian interactive voice response systems. In 2013, he began his Ph.D. in computer science at Ozyegin University, Istanbul, Turkey, and received the Ph.D. degree in August 2017; during that period, he was also a graduate research assistant at the Speech Processing Lab of Ozyegin University. In 2017, he joined the National Institute of Informatics, Tokyo, Japan, as a visiting Ph.D. student. He joined the biometrics and speech groups of the Idiap Research Institute, Martigny, Switzerland, in 2018 and 2020, respectively. His research interests include speech recognition, speaker recognition, and speech synthesis.

Education

Doctor of Philosophy in Computer Science

2013–2017
Ozyegin University, Istanbul, Turkey

Thesis Title: Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation with Limited Data

Master of Science in Information Technology

2009–2012
Qom University, Qom, Iran

Thesis Title: A New Framework for Information Retrieval to Use in Persian Spoken Document Retrieval

Bachelor of Science in Information Technology

2004–2009
Isfahan University, Isfahan, Iran

Research Interests

  • Speech recognition
  • Speaker adaptation
  • Speaker verification
  • Spoofing detection
  • Speech synthesis

Selected Publications

  • Zuluaga-Gomez, J., Sarfjoo, S.S., Prasad, A., Nigmatulina, I., Motlicek, P., et al., 2021. BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications. Idiap-RR-15-2021.
  • Ohneiser, O., Sarfjoo, S.S., Helmke, H., Shetty, S., Motlicek, P., et al., 2021. Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances. Interspeech 2021, pp. 3291-3295.
  • Sarfjoo, S.S., Madikeri, S. and Motlicek, P., 2021. Speech Activity Detection Based on Multilingual Speech Recognition System. Interspeech 2021, pp. 4369-4373.
  • Fabien, M., Sarfjoo, S.S., Motlicek, P. and Madikeri, S., 2021. Graph2Speak: Improving Speaker Identification Using Network Knowledge in Criminal Conversational Data. 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021.
  • Prasad, A., Zuluaga-Gomez, J., Motlicek, P., Ohneiser, O., Helmke, H. and Nigmatulina, I., 2021. Grammar Based Identification of Speaker Role for Improving ATCO and Pilot ASR. arXiv preprint arXiv:2108.12175.
  • Sarfjoo, S.S., Madikeri, S., Motlicek, P. and Marcel, S., 2020. Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data. Interspeech 2020, pp. 3815-3819.
  • Zuluaga-Gomez, J., et al., 2020. Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications. Multidisciplinary Digital Publishing Institute Proceedings, Vol. 59, No. 1, 2020.
  • Sarfjoo, S.S., Madikeri, S., Hajibabaei, M., Motlicek, P. and Marcel, S., 2019. Idiap Submission to the NIST SRE 2019 Speaker Recognition Evaluation. Idiap-RR-15-2019, November 2019.
  • Madikeri, S., Sarfjoo, S.S., Motlicek, P. and Marcel, S., 2019. Idiap Submission to the NIST SRE 2018 Speaker Recognition Evaluation. Idiap-RR-17-2019, November 2019.
  • Sarfjoo, S.S., Magimai-Doss, M. and Marcel, S., 2019. Domain Adaptation and Investigation of Robustness of DNN-Based Embeddings for Text-Independent Speaker Verification Using Dilated Residual Networks. Idiap-RR-10-2019, October 2019.
  • Sarfjoo, S.S., Wang, X., Henter, G.E., Lorenzo-Trueba, J., Takaki, S. and Yamagishi, J., 2019. Transformation of Low-Quality Device-Recorded Speech to High-Quality Speech Using Improved SEGAN Model. arXiv preprint arXiv:1911.03952.
  • Sarfjoo, S.S. and Yamagishi, J., 2018. SUPERSEDED - Device Recorded VCTK (Small Subset Version). DataShare, University of Edinburgh, School of Informatics, The Centre for Speech Technology Research (CSTR).
  • Sarfjoo, S.S., Demiroglu, C. and King, S., 2017. Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation with Limited Data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4), pp. 839-851.
  • Sarfjoo, S.S. and Demiroglu, C., 2016. Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data. Interspeech 2016, pp. 317-321.
  • Khodabakhsh, A., Sarfjoo, S.S., Uludag, U., Soyyigit, O. and Demiroglu, C., 2016. Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems. arXiv preprint arXiv:1608.02272.
  • Mohammadi, A., Sarfjoo, S.S. and Demiroglu, C., 2014. Eigenvoice Speaker Adaptation with Minimal Data for Statistical Speech Synthesis Systems Using a MAP Approach and Nearest-Neighbors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), pp. 2146-2157.

Honors and Awards

  • NII Grant, Feb 2017 – July 2017, Tokyo, Japan
  • ISCA Grant for the Interspeech 2016 conference in San Francisco, CA, USA
  • Marie Curie International Reintegration Grant CLSASTS, 2014–2017
  • TUBITAK Research Grant, Feb 2013 – 2017
  • Full scholarship, Özyeğin University
  • Best technical expert at Asr Gooyesh Pardaz Co. for two years, 2011–2013

Work experience

Postdoctoral Researcher

2018–Present
Idiap Research Institute, Martigny, Switzerland
  • Contributed to ATC projects improving ASR accuracy for air traffic controller communications
    • The work spans the H2020 project HAAWAII and several other domain-specific projects (SOL97, SOL96, STARFISH, and COMINT)
    • HAAWAII project:
      • Description: The Horizon 2020 funded HAAWAII project develops a reliable, error-resilient, and adaptable solution to automatically transcribe voice commands from air traffic controllers (ATCOs) and pilots. Using machine learning, the project builds on very large collections of speech data, organized with minimal expert effort, to develop a new set of speech recognition models for the complex ATM environments of the London terminal area (TMA) and Icelandic en-route airspace.
      • Link: https://www.haawaii.de/wp
      • Contribution: Working on VAD, diarization, and ASR systems
    • SOL96 and SOL97 projects:
      • Description: The data sets for these projects were recorded and collected in the course of three SESAR-2020 funded industrial research projects: PJ.10-W2-96 (“SOL96”), PJ.16-W1-044, and PJ.05-W2-975 (the latter two combined and called “SOL97”). SOL96 aims to reduce ATCOs’ workload with an ASR-supported aircraft radar label. SOL97 aims to reduce ATCOs’ workload in an ATC tower environment, including the speech of ATCOs from the Lithuanian ANSP.
      • Links: https://www.sesarju.eu/projects/cwphmi and https://www.remote-tower.eu/wp/project-pj05-w2/solution-97-2/  
      • Contribution: Providing software for data collection, segmentation, and pre-transcription, and iteratively improving the ASR system, including OOV integration, semi-supervised models, AM and LM adaptation, and building dynamic graphs from contextual data
    • STARFISH project:
      • Description: STARFISH is an ongoing project aiming to create and improve ABSR for the Fraport airport, Germany. The collected data is ATCO speech recorded in the ATC operations room during simulations at Fraport.
      • Contribution: Working on VAD, text normalization, and integration of an online ASR pipeline
    • COMINT project:
      • Description: COMINT is an ongoing project aiming to create and improve ASR for air traffic management at the Bern and Zurich airports.
      • Contribution: Working on VAD, in-domain noise augmentation for ASR training, and command prediction models using surveillance data
  • Collaborated with the Uniphore company on developing state-of-the-art automatic speech recognition (ASR) models for the low-resource Vietnamese and Bahasa Indonesia languages
    • Iteratively improved the ASR models through lexicon and data extension, in-domain noise augmentation, advanced semi-supervised learning, and text normalization techniques
  • Collaborated with Uniphore on applying contextual word-boosting methods for ASR on conversational telephony speech data (for call centers)
  • Working on single- and multi-task ASR-based voice activity detection
  • Working on speaker diarization (aIB, AHC, VBx, and BERT-based models)
  • Contributed to the Idiap NIST SRE 2018, 2019, and 2020 submissions
  • Working on new DNN-based methods for text-independent speaker recognition
  • Contributing to the Bob signal-processing and machine learning toolbox: https://www.idiap.ch/software/bob/

Programmer and Researcher

2017–2018
Asr Gooyesh Pardaz Co, Tehran, Iran
  • Ariana project (Persian text-to-speech system)
    • Implementing a glottal-HMM vocoder for Persian text-to-speech synthesis
    • Extending Ariana's lexicon to cover non-formal and newly coined words
    • Implementing the SDK version of Ariana and applying software security measures

Visiting Ph.D. Student

2017
National Institute of Informatics, Tokyo, Japan

Working on unsupervised speaker adaptation and speech enhancement for DNN-based speech synthesis under the supervision of Prof. Junichi Yamagishi

  • Unsupervised speaker adaptation for DNN-based speech synthesis 
  • Speech enhancement for device recorded audio files 
  • The source code of the GAN-based speech enhancement method is available at https://github.com/ssarfjoo/improvedsegan

Research Assistant

2013–2017
Speech Processing Laboratory, Ozyegin University, Istanbul, Turkey

Worked on several speech processing tasks including:

  • Synthetic speech generation and speaker adaptation for HMM-based speech synthesis systems (Using Python)
    • Worked with HTS toolkit, MLSA, STRAIGHT, and World vocoders
  • Cross-lingual speaker adaptation for HMM-based speech synthesis systems between Turkish and English languages (Using Python)
  • Text-independent speaker recognition systems (Using MATLAB)
    • Experienced with GMM-UBM, JFA, TVS, and PLDA technologies.
  • Spoofing detection for speaker verification systems (Using MATLAB)
  • Implementing a unit-selection speech synthesis system
    • Worked with MaryTTS system
  • Recording and preparing a bilingual Turkish-English corpus for cross-lingual adaptation
  • Predicting relationship quality in recently married couples (Using MATLAB)
    • Organized a locally recorded database
    • Used a wide range of speech- and text-based features and various machine learning algorithms (e.g. SVM, decision trees, random forests, kNN, and naive Bayes) and toolkits (e.g. MSSV, openSMILE, Praat, FEAST, and Audioseg)

Programmer and Researcher

2009–2013
Asr Gooyesh Pardaz Co, Tehran, Iran
  • Ariana project (Persian text-to-speech system), 2009-2013
    • Implementing desktop, Android, WCF-based service, and SAPI engine (compatible with screen-reader applications) versions of the Ariana software
    • Worked on NLP, database, UI, accessibility, and software security using C#, C++, Java, and Assembly
  • Nevisa project (Persian speech recognition system), 2011-2013
    • Implementing desktop and WCF-based service versions of the Nevisa software
    • Worked on database, UI, and software security using C#, C++, C, and Assembly
  • Niusha project (Persian interactive voice response system), 2012-2013
    • Worked on UI, service monitoring, and VoIP-based IVR using C#, Visual C++.NET, and C++
    • Project manager (the system comprises Elastix-based IVR, VoIP-based IVR, VXML generator, designer, monitoring, and UI subsystems)

Languages

  • Persian (Native)
  • English (Full professional proficiency)
  • Turkish (Elementary level)
  • French (Elementary level)

Technical Skills

  • DNN Toolkits
    • Torch
    • PyTorch
    • TensorFlow
    • Theano
    • PyOpenCL
    • CURRENNT
  • Speech Toolkits
    • Festival, Speech Tools, Flite
    • HMM Toolkit (HTK), HMM-based Speech Synthesis System (HTS)
    • Merlin, Kaldi, Bob 
  • Programming Skills
    • C, C++, Java, C#, MATLAB, Python, Perl
    • ASP.NET
  • Mobile Programming
    • Android

References

References available upon request.
