Video Content Analysis
Massive numbers of video clips are generated daily on many types of devices and uploaded to the Internet. Unlike videos that are produced for broadcast or from planned surveillance, the “unconstrained” video clips that can be produced by anyone with a digital camera or camera phone present a significant challenge for manual as well as for automated analysis. Such clips might include any possible scene or event, and generally have limited quality control.
ICSI researchers are involved in several projects to implement methods that allow users to find videos containing specified events, such as “Making a cake,” “Batting a run in,” or “Assembling a shelter.” We use multiple approaches in this work: machine-learning techniques that rely directly on low-level features to categorize event types, as well as techniques that identify audio “concepts.” In this latter, semantic approach, each observable event is made up of specifiable audio or visual concepts, such as people conversing, a crowd cheering, or the whine of a power tool, which can in turn be identified from their features by a trained system. By analyzing the content of a large dataset of consumer-produced videos labeled for the events they depict, a system can determine which features are distinctive of those events or which concepts make them up, and then identify those events in newly uploaded videos.
ICSI’s work in this area originated in the Speech group, and focuses largely on audio concept detection. For example, we make use of our extensive research on speaker diarization to build systems for segmenting audio tracks and grouping the segments by similarity to identify concepts. Another important problem is that of identifying which sounds are distinctive, i.e., largely specific to a particular type of event, and which occur in so many types of video that they are not useful for telling event types apart. In this way, audio concepts can be thought of as being like words: if the word “thoracic” appears frequently in a book, the chances are good that it’s a medical publication, whereas frequent occurrence of the word “the” does not get one very far in identifying genre.
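The word analogy can be made concrete with inverse document frequency, a standard weighting from text retrieval. The sketch below is only an illustration of that analogy, not a description of ICSI's systems: the concept labels, the toy video list, and the function name `concept_idf` are all hypothetical.

```python
import math
from collections import Counter

def concept_idf(videos):
    """Return an inverse-document-frequency weight for each audio concept.

    `videos` is a list in which each item lists the concept labels detected
    in one video. A concept found in every video gets weight 0; rarer
    concepts get larger weights.
    """
    n_videos = len(videos)
    doc_freq = Counter()
    for concepts in videos:
        # Count each concept at most once per video.
        doc_freq.update(set(concepts))
    return {c: math.log(n_videos / df) for c, df in doc_freq.items()}

# Hypothetical detections for four videos (labels invented for illustration).
videos = [
    ["speech", "crowd_cheering", "bat_hit"],
    ["speech", "power_tool"],
    ["speech", "mixer_whir"],
    ["speech", "crowd_cheering"],
]

idf = concept_idf(videos)
for concept, weight in sorted(idf.items(), key=lambda kv: kv[1]):
    print(f"{concept:15s} {weight:.3f}")
```

In this toy data "speech" plays the role of "the": it occurs in every video and so receives zero weight, while a concept such as "power_tool" that occurs in a single video receives the largest weight.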
A state-of-the-art video-search system being built by multiple institutions. ICSI contributes expertise in using audio concept detection for event identification and video categorization. (Part of IARPA’s ALADDIN program.)
ICSI is collaborating with researchers at Lawrence Livermore National Laboratory on a full semantic event-based retrieval system for multimedia.
A browser that presents users with the basic narrative elements of a sitcom — scenes, punchlines, dialogue segments, etc. — and a per-actor filter on top of a standard video player interface, so they can navigate to particular elements or moments they remember.
Video Duplicate Detection Using Acoustic Methods:
ICSI researchers developed a method for determining whether two different videos have duplicate subparts, even if those subparts are not bit-identical, by using acoustic diarization techniques to segment, cluster, and compare the audio tracks.
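One way to picture the final comparison step is as a search for a shared run of segment labels. The following is a minimal sketch under strong assumptions, not the method itself: it assumes each audio track has already been segmented and that segments from both tracks were clustered together, so that equal labels mean acoustically similar segments. Segment durations are ignored, and the label names and the function `longest_shared_run` are hypothetical.

```python
from difflib import SequenceMatcher

def longest_shared_run(labels_a, labels_b):
    """Find the longest contiguous run of cluster labels common to two tracks.

    Returns (start_in_a, start_in_b, length), measured in segments.
    """
    # autojunk=False disables difflib's heuristic that ignores very
    # frequent items, so every label is compared.
    matcher = SequenceMatcher(None, labels_a, labels_b, autojunk=False)
    match = matcher.find_longest_match(0, len(labels_a), 0, len(labels_b))
    return match.a, match.b, match.size

# Hypothetical cluster labels, one per audio segment, for two videos.
track_a = ["music", "speech_1", "laugh", "speech_2", "applause", "music"]
track_b = ["silence", "speech_1", "laugh", "speech_2", "applause", "speech_3"]

start_a, start_b, length = longest_shared_run(track_a, track_b)
print(f"shared run of {length} segments at a[{start_a}] and b[{start_b}]")
```

Because the comparison works on cluster labels rather than raw bytes, two copies of the same passage that were re-encoded differently could still match, which is the property described above. A practical system would also need to weigh segment durations and tolerate occasional label mismatches.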