Summary

Seyyed Saeed Sarfjoo received the B.Sc. degree in information technology from Isfahan University, Isfahan, Iran, in 2009 and the M.Sc. degree in information technology from Qom University, Qom, Iran, in 2012. In 2009, he joined Asr Gooyesh Pardaz Co., Tehran, Iran, as a software developer and researcher, where his work focused mostly on Persian text-to-speech, Persian speech recognition, and Persian interactive voice response systems. In 2013, he began his Ph.D. in computer science at Ozyegin University, Istanbul, Turkey, and received the Ph.D. degree in August 2017; during that period, he was also a graduate research assistant at the Speech Processing Lab of Ozyegin University. In 2017, he joined the National Institute of Informatics, Tokyo, Japan, as a visiting Ph.D. student. He joined the biometrics and speech groups of the Idiap Research Institute, Martigny, Switzerland, in 2018 and 2020, respectively. His research interests include speech recognition, speaker recognition, and speech synthesis.

Education

Doctor of Philosophy in Computer Science

2013–2017
Ozyegin University, Istanbul, Turkey

Thesis Title: Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation with Limited Data

Master of Science in Information Technology

2009–2012
Qom University, Qom, Iran

Thesis Title: A New Framework for Information Retrieval to Use in Persian Spoken Document Retrieval

Bachelor of Science in Information Technology

2004–2009
Isfahan University, Isfahan, Iran

Research Interests

  • Speech recognition
  • Speaker adaptation
  • Speaker verification
  • Spoofing detection
  • Speech synthesis

Selected Publications

  • Zuluaga-Gomez, J., Sarfjoo, S.S., Prasad, A., Nigmatulina, I., Motlicek, P., et al., 2021. BERTraffic: A Robust BERT-Based Approach for Speaker Change Detection and Role Identification of Air-Traffic Communications. Idiap-RR-15-2021.
  • Ohneiser, O., Sarfjoo, S.S., Helmke, H., Shetty, S., Motlicek, P., et al., 2021. Robust Command Recognition for Lithuanian Air Traffic Control Tower Utterances. Interspeech 2021, pp. 3291-3295.
  • Sarfjoo, S.S., Madikeri, S. and Motlicek, P., 2021. Speech Activity Detection Based on Multilingual Speech Recognition System. Interspeech 2021, pp. 4369-4373.
  • Fabien, M., Sarfjoo, S.S., Motlicek, P. and Madikeri, S., 2021. Graph2Speak: Improving Speaker Identification Using Network Knowledge in Criminal Conversational Data. 1st ISCA Symposium on Security and Privacy in Speech Communication, 2021.
  • Prasad, A., Zuluaga-Gomez, J., Motlicek, P., Ohneiser, O., Helmke, H. and Nigmatulina, I., 2021. Grammar Based Identification of Speaker Role for Improving ATCO and Pilot ASR. arXiv preprint arXiv:2108.12175.
  • Sarfjoo, S.S., Madikeri, S., Motlicek, P. and Marcel, S., 2020. Supervised Domain Adaptation for Text-Independent Speaker Verification Using Limited Data. Interspeech 2020, pp. 3815-3819.
  • Zuluaga-Gomez, J., et al., 2020. Automatic Call Sign Detection: Matching Air Surveillance Data with Air Traffic Spoken Communications. Multidisciplinary Digital Publishing Institute Proceedings, Vol. 59, No. 1, 2020.
  • Sarfjoo, S.S., Madikeri, S., Hajibabaei, M., Motlicek, P. and Marcel, S., 2019. Idiap Submission to the NIST SRE 2019 Speaker Recognition Evaluation. Idiap-RR-15-2019, November 2019.
  • Madikeri, S., Sarfjoo, S.S., Motlicek, P. and Marcel, S., 2019. Idiap Submission to the NIST SRE 2018 Speaker Recognition Evaluation. Idiap-RR-17-2019, November 2019.
  • Sarfjoo, S.S., Magimai-Doss, M. and Marcel, S., 2019. Domain Adaptation and Investigation of Robustness of DNN-Based Embeddings for Text-Independent Speaker Verification Using Dilated Residual Networks. Idiap-RR-10-2019, October 2019.
  • Sarfjoo, S.S., Wang, X., Henter, G.E., Lorenzo-Trueba, J., Takaki, S. and Yamagishi, J., 2019. Transformation of Low-Quality Device-Recorded Speech to High-Quality Speech Using Improved SEGAN Model. arXiv preprint arXiv:1911.03952.
  • Sarfjoo, S.S. and Yamagishi, J., 2018. SUPERSEDED - Device Recorded VCTK (Small Subset Version). DataShare, University of Edinburgh, School of Informatics, The Centre for Speech Technology Research (CSTR).
  • Sarfjoo, S.S., Demiroglu, C. and King, S., 2017. Using Eigenvoices and Nearest-Neighbors in HMM-Based Cross-Lingual Speaker Adaptation with Limited Data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(4), pp. 839-851.
  • Sarfjoo, S.S. and Demiroglu, C., 2016. Cross-Lingual Speaker Adaptation for Statistical Speech Synthesis Using Limited Data. Interspeech 2016, pp. 317-321.
  • Khodabakhsh, A., Sarfjoo, S.S., Uludag, U., Soyyigit, O. and Demiroglu, C., 2016. Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems. arXiv preprint arXiv:1608.02272.
  • Mohammadi, A., Sarfjoo, S.S. and Demiroglu, C., 2014. Eigenvoice Speaker Adaptation with Minimal Data for Statistical Speech Synthesis Systems Using a MAP Approach and Nearest-Neighbors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12), pp. 2146-2157.

Honors and Awards

  • NII Grant, Feb 2017 – July 2017, Tokyo, Japan
  • ISCA Grant for the Interspeech 2016 conference in San Francisco, CA, USA
  • Marie Curie International Reintegration Grant CLSASTS, 2014–2017
  • TUBITAK Research Grant, Feb 2013 – 2017
  • Full scholarship, Özyeğin University
  • Best technical expert at Asr Gooyesh Pardaz Co. for two years, 2011–2013

Work experience

Postdoctoral Researcher

2018–Present
Idiap Research Institute, Martigny, Switzerland
  • Contributed to ATC projects improving ASR accuracy for air traffic controller communications
    • The work spans the H2020 project HAAWAII and several other domain-specific projects (SOL97, SOL96, STARFISH, and COMINT)
    • HAAWAII project:
      • Description: The Horizon 2020 funded HAAWAII project develops a reliable, error-resilient, and adaptable solution to automatically transcribe voice commands from air traffic controllers (ATCOs) and pilots. Using machine learning, the project builds on very large collections of speech data, organized with minimal expert effort, to develop a new set of speech recognition models for the complex ATM environments of the London terminal area (TMA) and Icelandic en-route airspace.
      • Link: https://www.haawaii.de/wp
      • Contribution: Working on VAD, diarization, and ASR systems
    • SOL96 and SOL97 projects:
      • Description: The data sets for these projects were recorded and collected in the course of three SESAR-2020 funded industrial research projects: PJ.10-W2-96 (“SOL96”), PJ.16-W1-044, and PJ.05-W2-975 (the latter two combined and called “SOL97”). SOL96 aims to reduce ATCOs’ workload with an ASR-supported aircraft radar label. SOL97 aims to reduce ATCOs’ workload in an ATC tower environment, including the speech of ATCOs from the Lithuanian ANSP.
      • Links: https://www.sesarju.eu/projects/cwphmi and https://www.remote-tower.eu/wp/project-pj05-w2/solution-97-2/  
      • Contribution: Providing software for data collection, segmentation, and pre-transcription, and iteratively improving the ASR system, including OOV integration, semi-supervised models, AM and LM adaptation, and building dynamic graphs from contextual data
    • STARFISH project:
      • Description: STARFISH is an ongoing project aiming to create and improve ABSR for the Fraport airport, Germany. The collected data is ATCO speech recorded in the ATC operations room during simulations at Fraport.
      • Contribution: Working on VAD, text normalization, and integration of an online ASR pipeline
    • COMINT project:
      • Description: COMINT is an ongoing project aiming to create and improve ASR for air traffic management at the Bern and Zurich airports.
      • Contribution: Working on VAD, in-domain noise augmentation for ASR training, and command prediction models using surveillance data
  • Collaborated with the Uniphore company on developing state-of-the-art automatic speech recognition (ASR) models for the low-resource Vietnamese and Bahasa Indonesia languages
    • Iteratively improved the ASR models through lexicon and data extension, in-domain noise augmentation, advanced semi-supervised learning, and text normalization techniques
  • Collaborated with Uniphore on applying contextual word-boosting methods for ASR on conversational telephony speech data (for call centers)
  • Working on single- and multi-task ASR-based voice activity detection
  • Working on speaker diarization (aIB, AHC, VBx, and BERT-based models)
  • Contributed to the Idiap NIST SRE 2018, 2019, and 2020 submissions
  • Working on new DNN-based methods for text-independent speaker recognition
  • Contributing to the Bob signal-processing and machine learning toolbox: https://www.idiap.ch/software/bob/

Programmer and Researcher

2017–2018
Asr Gooyesh Pardaz Co, Tehran, Iran
  • Ariana project (Persian text-to-speech system)
    • Implementing a glottal-HMM vocoder for Persian text-to-speech synthesis
    • Extending Ariana's lexicon to cover non-formal and newly coined words
    • Implementing the SDK version of Ariana and applying software security measures

Visiting Ph.D. Student

2017
National Institute of Informatics, Tokyo, Japan

Working on unsupervised speaker adaptation and speech enhancement for DNN-based speech synthesis under the supervision of Prof. Junichi Yamagishi

  • Unsupervised speaker adaptation for DNN-based speech synthesis 
  • Speech enhancement for device recorded audio files 
  • The source code of the GAN-based speech enhancement method is available at https://github.com/ssarfjoo/improvedsegan

Research Assistant

2013–2017
Speech Processing Laboratory, Ozyegin University, Istanbul, Turkey

Worked on several speech processing tasks including:

  • Synthetic speech generation and speaker adaptation for HMM-based speech synthesis systems (Using Python)
    • Worked with HTS toolkit, MLSA, STRAIGHT, and World vocoders
  • Cross-lingual speaker adaptation for HMM-based speech synthesis systems between Turkish and English languages (Using Python)
  • Text-independent speaker recognition systems (Using MATLAB)
    • Experienced with GMM-UBM, JFA, TVS, and PLDA technologies.
  • Spoofing detection for speaker verification systems (Using MATLAB)
  • Implementing a unit-selection speech synthesis system
    • Worked with MaryTTS system
  • Recording and preparing a bilingual Turkish-English corpus for cross-lingual adaptation
  • Predicting relationship quality in recently married couples (Using MATLAB)
    • Organized a locally recorded database
    • Used a wide range of speech- and text-based features and various machine learning algorithms (e.g. SVM, decision trees, random forests, kNN, and naive Bayes) and toolkits (e.g. MSSV, openSMILE, Praat, FEAST, and Audioseg)

Programmer and Researcher

2009–2013
Asr Gooyesh Pardaz Co, Tehran, Iran
  • Ariana project (Persian text-to-speech system), 2009-2013
    • Implementing desktop, Android, WCF-based service, and SAPI engine (compatible with screen-reader applications) versions of the Ariana software
    • Worked on NLP, database, UI, accessibility, and software security using C#, C++, Java, and Assembly
  • Nevisa project (Persian speech recognition system), 2011-2013
    • Implementing desktop and WCF-based service versions of the Nevisa software
    • Worked on database, UI, and software security using C#, C++, C, and Assembly
  • Niusha project (Persian interactive voice response system), 2012-2013
    • Worked on UI, service monitoring, and VoIP-based IVR using C#, Visual C++.NET, and C++
    • Project manager (the system comprises Elastix-based IVR, VoIP-based IVR, VXML generator, designer, monitoring, and UI subsystems)

Languages

  • Persian (Native)
  • English (Full professional proficiency)
  • Turkish (Elementary level)
  • French (Elementary level)

Technical Skills

  • DNN Toolkits
    • Torch
    • PyTorch
    • TensorFlow
    • Theano
    • PyOpenCL
    • CURRENNT
  • Speech Toolkits
    • Festival, Speech Tools, Flite
    • HMM Toolkit (HTK), HMM-based Speech Synthesis System (HTS)
    • Merlin, Kaldi, Bob 
  • Programming Skills
    • C, C++, Java, C#, MATLAB, Python, Perl
    • ASP.NET
  • Mobile Programming
    • Android

References

References available upon request.
