Whisper Speaker Identification (WSI)

Whisper Speaker Identification (WSI) is a speaker identification model designed for multilingual scenarios. It adapts the encoder of OpenAI's Whisper and fine-tunes it together with a projection head, using a hybrid loss that combines an online triplet loss with a multi-view self-supervised loss. This training setup is intended to produce discriminative, language-agnostic speaker embeddings. On multilingual datasets, WSI demonstrates state-of-the-art performance, reporting lower Equal Error Rates (EER) and higher F1 scores than the baselines it is compared against.
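To make the online triplet term more concrete, here is a minimal NumPy sketch of one common variant, batch-hard mining. It is not taken from the WSI code base. The function names, the margin of 0.3, the Euclidean distance, and the choice of batch-hard mining are all assumptions made for illustration, and the multi-view self-supervised term is not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so distances depend only on direction."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def batch_hard_triplet_loss(embeddings, speaker_ids, margin=0.3):
    """Illustrative batch-hard triplet loss (hypothetical helper, not WSI's API).

    For every anchor in the batch, pick the farthest same-speaker embedding
    and the closest different-speaker embedding, then apply a hinge with
    the given margin. Anchors lacking a positive or a negative are skipped.
    """
    emb = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    ids = np.asarray(speaker_ids)
    n = len(ids)

    # Pairwise Euclidean distances between all embeddings in the batch.
    diff = emb[:, None, :] - emb[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    same = ids[:, None] == ids[None, :]
    pos_mask = same & ~np.eye(n, dtype=bool)  # same speaker, excluding self
    neg_mask = ~same                          # different speaker

    valid = pos_mask.any(axis=1) & neg_mask.any(axis=1)
    if not valid.any():
        return 0.0

    hardest_pos = np.where(pos_mask, dist, -np.inf).max(axis=1)
    hardest_neg = np.where(neg_mask, dist, np.inf).min(axis=1)

    losses = np.maximum(hardest_pos[valid] - hardest_neg[valid] + margin, 0.0)
    return float(losses.mean())
```

The loss is zero once every speaker's embeddings sit closer to each other than to any other speaker's by at least the margin, which is the sense in which such a term encourages discriminative embeddings. In an actual training loop, the same computation would be written in an autodiff framework so that gradients reach the projection head and encoder.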

Usage

Coming Soon!

Cite This Work

If you use this work, please cite:

Jakaria Islam Emon, Md Abu Salek, Kazi Tamanna Alam
"Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings"
arXiv preprint, 2025.
https://doi.org/10.48550/arXiv.2503.10446

@article{emon2025whisper,
  author    = {Jakaria Islam Emon and Md Abu Salek and Kazi Tamanna Alam},
  title     = {Whisper Speaker Identification: Leveraging Pre-Trained Multilingual Transformers for Robust Speaker Embeddings},
  journal   = {arXiv preprint},
  year      = {2025},
  eprint    = {2503.10446},
  archivePrefix = {arXiv},
  primaryClass = {cs.SD},
  doi       = {10.48550/arXiv.2503.10446}
}

License

This project is licensed under the CC BY-NC-SA 4.0 License.
