Scalable Multimedia content AnalysiS in a High-level language
Using multimedia as an example, researchers on the SMASH (Scalable Multimedia content AnalysiS in a High-level language) project are developing tools for high-level analysis of large amounts of data. The scalability of content analysis methods affects any field that uses social media videos, whether in science, business, or intelligence-gathering. Making multimedia content analysis more scalable thus enables more researchers across many disciplines to develop better algorithms.
The SMASH project aims to provide a single software environment for productive, efficient, portable, and scalable application development. One major approach to making applications scalable is to explicitly map their computation onto parallel platforms. However, developing efficient parallel implementations that fully utilize the available resources remains a challenge, due to ever-increasing code complexity, limited portability, and the prior knowledge of the underlying hardware that such implementations require. SMASH builds on previous work at ICSI, including the System for Running Systems (SRS) and PyCASP, a Python-based framework that automatically maps computation from Python application code onto parallel platforms.
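The mapping idea can be sketched with a toy example: the application expresses a per-frame feature computation as an ordinary Python "map" over independent data items, and a runtime decides how to spread the work across available hardware. The function names (`extract_energy`, `analyze`) and the thread-pool stand-in below are illustrative assumptions, not PyCASP's actual API.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_energy(frame):
    """Toy per-frame 'feature': mean squared amplitude."""
    return sum(x * x for x in frame) / len(frame)

def analyze(frames):
    # The "map" computational pattern: the same independent computation
    # is applied to every frame, so a runtime is free to parallelize it.
    # A framework like PyCASP would choose the target (multicore CPU,
    # GPU) itself; here a thread pool stands in for that decision.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(extract_energy, frames))

frame_energies = analyze([[1.0, 1.0], [0.0, 0.0], [2.0, 2.0]])
```

The application code stays plain Python; only the runtime behind the map changes when the computation is retargeted.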
As with PyCASP, the SMASH framework uses a pattern-based structural-composition technique to achieve high performance at the application level. However, SMASH uses deep neural nets (DNNs), currently the most accurate approach for audio applications such as automatic speech recognition and audio-based event detection. This research has therefore involved identifying the components required in a DNN framework and analyzing the algorithms needed to develop those components. We are basing our DNN audio-content-analysis code on Caffe, the fastest publicly available CPU/GPU code for DNN-based image classification. We have developed new components for the Caffe framework to adapt it to audio processing; the expanded DNN framework, audioCaffe, is fast and parallelizes seamlessly on both CPUs and GPUs. We are currently testing audioCaffe in different audio analysis applications, as well as exploring additional methods for scaling the DNN framework on GPU clusters.
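To illustrate the kind of computation such a framework schedules, the following is a minimal pure-Python sketch of a feed-forward DNN forward pass classifying one frame of audio features. The layer sizes, weights, and helper names are invented for illustration; they are not taken from Caffe or audioCaffe, which implement these layers as optimized CPU/GPU kernels.

```python
import math
import random

def relu(v):
    # Elementwise rectified linear activation.
    return [max(x, 0.0) for x in v]

def affine(W, b, v):
    # Fully connected layer: W @ v + b, written out longhand.
    return [sum(w * x for w, x in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def softmax(v):
    # Normalize scores into class posteriors (max-shifted for stability).
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def forward(features, layers):
    """Run one feature vector through hidden layers, then a softmax output."""
    h = features
    for W, b in layers[:-1]:
        h = relu(affine(W, b, h))
    W, b = layers[-1]
    return softmax(affine(W, b, h))

# Invented network shape: 13 inputs (e.g., one frame of MFCC-like
# features) -> 32 hidden units -> 4 hypothetical event classes.
rng = random.Random(0)

def rand_layer(n_in, n_out):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

layers = [rand_layer(13, 32), rand_layer(32, 4)]
probs = forward([rng.uniform(-1.0, 1.0) for _ in range(13)], layers)
```

Because every frame is processed independently by the same layer stack, this forward pass composes naturally with the map pattern above, which is what lets the framework batch frames and parallelize across CPUs and GPUs.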
To lay the foundation for this type of large-scale multimedia analysis, SMASH researchers at ICSI are also working with Lawrence Livermore National Laboratory to process the images and videos in the recently released Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset, computing frequently used audio and visual features and developing annotated subcorpora for common multimedia-analysis tasks.
- The audioCaffe source code is available on GitHub
- The audioCaffe + YLI demo runs a demonstration experiment on data from the YLI corpus
- The YLI Corpus (including extracted features for the YFCC100M dataset)
SMASH in the News:
The release of the YFCC100M dataset, and our work to process it, made the news:
The SMASH project is a collaboration between researchers at ICSI and the University of California, Berkeley.
Researchers @ ICSI (Current and Past):
Researchers @ UC Berkeley:
- Khalid Ashraf
- Kurt Keutzer
Funding for SMASH is provided by National Science Foundation grant IIS-1251276. The opinions, findings, and conclusions described on this website are those of the researchers and do not necessarily reflect the views of the funders.