Fast Speaker Diarization Using Python

With the emergence of highly parallel multicore and manycore processors, such as graphics processing units (GPUs), computationally intensive algorithms can be re-implemented to run faster than real time. One example is the training of Gaussian Mixture Models (GMMs), a class of statistical models used in, e.g., speech recognition, image segmentation, and document classification. However, developing and maintaining complex low-level GPU code is difficult and requires a deep understanding of the parallel processor's hardware architecture, which machine-learning experts do not necessarily have. Furthermore, such low-level implementations are not readily reusable in other applications or portable to other platforms, which limits programmer productivity.

We therefore developed a specialization framework that automatically maps computationally intensive GMM training from Python code onto an NVIDIA GPU and executes it there. The framework uses SEJITS (Selective Embedded Just-In-Time Specialization), a set of techniques that leverages just-in-time code generation and compilation. Fast Speaker Diarization using Python (FSDP) was a case study to demonstrate GMM training using the Expectation-Maximization (EM) algorithm. Using ParLab’s ASP framework, we were able to implement a fast speaker diarization system in under 100 lines of Python code that runs 50-250 times faster than real time (that is, an hour of audio is processed in roughly 14 to 72 seconds), without significant loss in accuracy. This performance is competitive with hand-crafted GPU code, showing that code variant selection and parameter tuning can be separated from application development to increase productivity for both application programmers and performance-tuning specialists.
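The EM training of a GMM mentioned above can be sketched in plain NumPy. This is only an illustrative CPU sketch, not the FSDP specializer or its API: the function name `train_gmm_em`, its parameters, the diagonal-covariance assumption, and the random initialization are all hypothetical choices made for the example.

```python
import numpy as np


def train_gmm_em(X, n_components, n_iters=50, seed=0, var_floor=1e-6):
    """Fit a diagonal-covariance GMM to X (n_samples, n_dims) with EM.

    Hypothetical helper for illustration; not part of any FSDP code.
    Returns (weights, means, variances, log_likelihoods).
    """
    rng = np.random.default_rng(seed)
    n, _ = X.shape

    # Initialization (an assumption of this sketch): means are random data
    # points, every component starts with the global variance, weights uniform.
    means = X[rng.choice(n, size=n_components, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + var_floor, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    log_likelihoods = []

    for _ in range(n_iters):
        # E-step: log of weight_k * N(x_n | mean_k, diag(var_k)) for all n, k.
        diff = X[:, None, :] - means[None, :, :]                 # (n, k, d)
        log_prob = -0.5 * (
            np.sum(diff ** 2 / variances[None, :, :], axis=2)
            + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
        )
        log_joint = log_prob + np.log(weights)[None, :]          # (n, k)

        # Normalize with log-sum-exp to obtain responsibilities stably.
        peak = log_joint.max(axis=1, keepdims=True)
        log_norm = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
        resp = np.exp(log_joint - log_norm)                      # (n, k)
        log_likelihoods.append(float(log_norm.sum()))

        # M-step: re-estimate weights, means, and variances from the
        # responsibility-weighted data.
        nk = resp.sum(axis=0) + 1e-12                            # soft counts
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        diff = X[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff ** 2) / nk[:, None]
        variances += var_floor                                   # avoid collapse

    return weights, means, variances, log_likelihoods


if __name__ == "__main__":
    # Synthetic two-cluster data standing in for acoustic feature vectors.
    demo_rng = np.random.default_rng(1)
    features = np.concatenate([
        demo_rng.normal(-5.0, 1.0, size=(200, 2)),
        demo_rng.normal(5.0, 1.0, size=(200, 2)),
    ])
    w, mu, var, ll = train_gmm_em(features, n_components=2, n_iters=30)
    print("weights:", w)
    print("means:", mu)
```

The E-step and M-step are both dense, data-parallel computations over all samples and components, which is why this kind of training is a natural candidate for mapping onto a GPU.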

FSDP was one of the first implementations of what became the PyCASP framework, eventually leading to SMASH; these projects aim to develop tools for big data processing that map multimedia content-analysis Python applications onto parallel platforms.

Project Results

Source Code:

The source code for our Gaussian Mixture Model Specializer is available on GitHub.

Fast Speaker Diarization using Python Publications


FSDP was a collaboration between Speech and Audio & Multimedia researchers at ICSI and University of California – Berkeley’s Parallel Computing Laboratory (ParLab).

Researchers @ ICSI:

Collaborators @ ParLab:

  • Henry Cook
  • Armando Fox
  • Ekaterina Gonina
  • Shoaib Kamil
  • David Patterson