Fast Speaker Diarization Using Python
With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), computationally intensive algorithms can be re-implemented to run faster than real time. One example is the training of Gaussian Mixture Models (GMMs), a class of statistical models used in, e.g., speech recognition, image segmentation, and document classification. However, developing and maintaining complex low-level GPU code is difficult and requires a deep understanding of the hardware architecture of the parallel processor, which machine-learning experts do not necessarily have. Furthermore, such low-level implementations are not readily reusable in other applications and are not portable to other platforms, which limits programmer productivity.
We therefore developed a specialization framework that automatically maps computationally intensive GMM training from Python code onto an NVIDIA GPU and executes it there, using SEJITS (Selective Embedded Just-In-Time Specialization), a set of techniques that leverages just-in-time code generation and compilation. Fast Speaker Diarization using Python (FSDP) was a case study demonstrating GMM training with the Expectation-Maximization (EM) algorithm. Using ParLab’s ASP framework, we implemented a fast speaker diarization system in under 100 lines of Python code that runs 50 to 250 times faster than real time, without significant loss in accuracy. This performance is competitive with hand-crafted GPU code, showing that code variant selection and parameter tuning can be separated from application development to increase productivity for both application programmers and performance-tuning specialists.
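To illustrate the computation that the specializer accelerates, here is a minimal sketch of EM training for a GMM. It is not the FSDP or GPU implementation: it is plain Python, restricted to one-dimensional data for brevity (real speech features are multidimensional), and the names `gaussian_pdf` and `em_gmm_1d`, the quantile-based initialization, and the `min_var` variance floor are all assumptions made for this example.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, n_components, n_iters=50, min_var=1e-6):
    """Fit a 1-D GMM with EM; returns (weights, means, variances).

    Hypothetical helper for illustration only. Initialization places the
    means at evenly spaced quantiles of the data (an assumption, not the
    scheme used by FSDP).
    """
    n = len(data)
    ordered = sorted(data)
    means = [ordered[int((k + 0.5) * n / n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iters):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            probs = [w * gaussian_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(probs) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in probs])

        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids collapse

    return weights, means, variances
```

The E-step and M-step loops are independent across data points and components, which is the kind of data parallelism that makes GMM training a natural fit for a GPU.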
FSDP was one of the first implementations of what became the PyCASP framework, eventually leading to SMASH; these projects aim to develop tools for big data processing that map multimedia content-analysis Python applications onto parallel platforms.
The source code for our Gaussian Mixture Model Specializer is available on GitHub.
Researchers @ ICSI:
Collaborators @ ParLab:
- Henry Cook
- Armando Fox
- Ekaterina Gonina
- Shoaib Kamil
- David Patterson