
The Berkeley Multimodal Location Estimation Project

Location estimation is the task of estimating the geo-coordinates of the place where digital media content was recorded. The Berkeley Multimodal Location Estimation Project (MMLE) uses GPS-tagged media available on the web as training data for an automatic location estimator. The estimator identifies the probable recording location of user-generated media that lacks geolocation metadata. The candidate locations for a given image, video, or audio track are narrowed down using visual features, acoustic cues, and textual tags (metadata) from the recordings, combined with information from external knowledge bases such as placename databases. (Our Location Estimation Tutorial describes how a human could perform this task.)

Location estimation is an inherently difficult, multifaceted problem. A given video may have explicit location indicators, such as GPS coordinates in its metadata; it may have several rich indirect indicators that can be identified by comparison with a knowledge base, such as images that match known landmarks or voices with an identifiable regional accent; or it may have no apparent location indicators at all.

The MMLE team at ICSI has therefore experimented with a number of different types of cues, and combinations of them, to determine which achieve the greatest geo-location accuracy for the processing power required. For example, we compared a purely machine-learning approach, which learns from training data which metadata tags on videos tend to be associated with which geo-coordinates, against an approach that also incorporates external sources such as placename and lexical databases. In a more recent experiment, we built a system that first narrows down possible locations by matching metadata tags against placename databases and then compares key frames from the video with images from those candidate places to produce a finer-grained estimate.
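The purely data-driven tag approach can be illustrated with a small sketch. This is a hypothetical simplification, not the MMLE system: it assumes each training item is a list of tags plus a (lat, lon) pair, summarizes each tag by the centroid of the items that carry it and their mean great-circle distance from that centroid, and predicts a test video's location as the centroid of its most spatially concentrated known tag. The function names, the min_count threshold, the spread heuristic, and the toy data are all illustrative assumptions.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def train_tag_model(training_items, min_count=2):
    """Map each tag to (centroid, mean spread in km) over the items that carry it.

    training_items: iterable of (tags, (lat, lon)) pairs.
    Tags seen fewer than min_count times are dropped (hypothetical threshold).
    """
    coords_by_tag = defaultdict(list)
    for tags, coord in training_items:
        for tag in set(tags):
            coords_by_tag[tag].append(coord)

    model = {}
    for tag, coords in coords_by_tag.items():
        if len(coords) < min_count:
            continue
        # Naive mean of lat/lon; only reasonable for regionally concentrated tags.
        centroid = (sum(c[0] for c in coords) / len(coords),
                    sum(c[1] for c in coords) / len(coords))
        spread = sum(haversine_km(centroid, c) for c in coords) / len(coords)
        model[tag] = (centroid, spread)
    return model


def estimate_location(tags, model, default=None):
    """Return the centroid of the known tag with the smallest spatial spread."""
    known = [model[t] for t in tags if t in model]
    if not known:
        return default
    centroid, _spread = min(known, key=lambda entry: entry[1])
    return centroid


# Toy training data (made up for illustration).
training = [
    (["berkeley", "campanile", "travel"], (37.872, -122.258)),
    (["berkeley", "sathergate"], (37.870, -122.259)),
    (["travel", "paris"], (48.858, 2.294)),
    (["paris", "louvre"], (48.861, 2.336)),
    (["travel", "beach"], (21.276, -157.827)),
]
model = train_tag_model(training)
print(estimate_location(["travel", "berkeley"], model))  # centroid of the "berkeley" tag
```

Requiring a tag to appear at least min_count times keeps one-off tags, whose spread is trivially zero, from dominating the prediction. A widely scattered tag such as "travel" loses to a geographically tight one such as "berkeley".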

Because the training data needed for a particular type of location estimation system can be sparse, we are developing methods that treat the problem as inference over a graph. This approach uses centroid-based candidate fusion to jointly estimate the geo-locations of all test videos, rather than processing each one individually, so that the model incorporates the distribution of features across items in the test set.
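To illustrate joint estimation over a graph, here is a deliberately simplified sketch. It is not the project's inference algorithm: it assumes a hypothetical graph in which test videos are linked to each other and to geotagged training items by weighted edges (for example, weighted by shared tags), and it repeatedly replaces each test video's estimate with the weighted centroid of its neighbors' current locations, a label-propagation-style scheme. Node names, edge weights, and the iteration count are illustrative assumptions.

```python
def propagate_locations(test_ids, edges, anchors, iterations=20):
    """Jointly estimate (lat, lon) for unlabeled test nodes by weighted-centroid propagation.

    test_ids: ids of test items with unknown location.
    edges: dict mapping a node id to a list of (neighbor_id, weight) pairs.
    anchors: dict mapping training-item ids to known (lat, lon) coordinates.
    Returns a dict mapping each test id to (lat, lon), or None if no
    located neighbor was ever reachable.
    """
    estimates = {node: None for node in test_ids}
    for _ in range(iterations):
        updated = {}
        for node in test_ids:
            lat_sum = lon_sum = total_w = 0.0
            for neighbor, weight in edges.get(node, []):
                # Training anchors have fixed coordinates; test neighbors use current estimates.
                loc = anchors.get(neighbor, estimates.get(neighbor))
                if loc is None:
                    continue
                lat_sum += weight * loc[0]
                lon_sum += weight * loc[1]
                total_w += weight
            updated[node] = ((lat_sum / total_w, lon_sum / total_w)
                             if total_w > 0 else estimates[node])
        estimates = updated  # synchronous update: every node sees the previous round
    return estimates


# Toy graph (made up): v1 is linked to two geotagged training items,
# v2 is linked only to v1, and v3 is isolated.
anchors = {"train_a": (37.87, -122.26), "train_b": (37.80, -122.27)}
edges = {
    "v1": [("train_a", 1.0), ("train_b", 1.0), ("v2", 1.0)],
    "v2": [("v1", 1.0)],
}
result = propagate_locations(["v1", "v2", "v3"], edges, anchors)
print(result)
```

Because estimates flow along test-to-test edges, a video with no geotagged neighbor (v2 in the toy graph) can still receive a location from related test videos, which is the benefit of estimating the whole test set jointly. Averaging raw latitude and longitude ignores wraparound at the antimeridian, so a real system would need a spherical centroid.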


ICSI researchers have participated in the MediaEval Benchmarking Initiative’s Placing Task every year since its inception. Our system received a Distinctive Mention in 2012 for representing the “most novel theoretical approach”, and members of the ICSI MMLE research team will participate in organizing the 2014 MediaEval Placing Task.

Project Results


We have developed multiple video corpora for the MMLE project.

Berkeley MMLE in the News:

Our work on location estimation was featured by a leading industry news source:

Berkeley Multimodal Location Estimation Publications


The Berkeley Multimodal Location Estimation Project is a joint project between the Vision and Audio & Multimedia groups at the International Computer Science Institute, the BASiCS (Berkeley Audio Visual Signal Processing and Communication Systems) group at the University of California, Berkeley, and researchers at Technische Universität Berlin.

Researchers @ ICSI (Current and Past):

Collaborators @ UC Berkeley (Current and Past):

  • Venkatesan Ekambaram
  • Kannan Ramchandran

Collaborators @ TU-Berlin:

  • Pascal Kelm
  • Sebastian Schmiedeke
  • Thomas Sikora


The MMLE research was partly funded by National Geospatial-Intelligence Agency NURI grant #HM11582-10-1-0008, a KFAS Doctoral Study Abroad Fellowship, and National Science Foundation EAGER grant IIS-1128599. The opinions, findings, and conclusions described on this website are those of the researchers and do not necessarily reflect the views of the funders.