(archive site)

Corpora for AMM Research at ICSI


The Berkeley Multimodal Location Estimation Project

Corpus for the Ambulance Detection Task: We collected a set of videos containing ambulances from multiple cities, then trained an automatic system to guess, based on the sound of the siren, which city an unknown ambulance video was from.

Corpus for the Indoor/Outdoor Detection Task: We collected a corpus of videos and tagged them for several features, then trained a system to automatically detect whether a novel video was recorded indoors or outdoors.

MediaEval 2014 Placing Task Dataset: A subset of the YLI corpus, provided for the MediaEval Benchmarking Initiative’s Placing Task for 2014, 2015, and 2016.

Non-ICSI Corpora for Multimodal Location Estimation: Earlier MediaEval Placing Task datasets may be found on the MediaEval website.


SMASH/Scalable Big Data Analysis

YLI Corpus: Based on the Yahoo Flickr Creative Commons 100 Million (YFCC100M) dataset, a collection of 99.3 million images and 700 thousand videos from Flickr, compiled by Yahoo Labs. We are working with Lawrence Livermore National Laboratory to process the images and videos, computing frequently used audio and visual features and developing subcorpora for multimedia-analysis tasks.


Speech Recognition and Speaker Diarization Research

ICSI Meeting Corpus: We produced an audio corpus of 40+ hours of multichannel studio-quality recordings of actual meetings, the largest of its kind at the time it was released.

AMI Meeting Corpus: ICSI is a member of the Augmented Multi-party Interaction (AMI) consortium, which produced an audio and video corpus of 100 hours of mostly scenario-driven meetings.

Non-ICSI Corpora for Speaker Diarization: Additional data used in our diarization work has come from, among others, the Rich Transcription Evaluation (RT Eval) run by the National Institute of Standards and Technology (NIST). Most of the RT Eval data was drawn from resources hosted by the Linguistic Data Consortium (LDC).