
Speaker Diarization Publications


AURORA (An ALADDIN Project)

Start Here

  • Benjamín Elizalde, Gerald Friedland, Howard Lei, and Ajay Divakaran. 2012. There Is No Data Like Less Data: Percepts for Video Concept Detection on Consumer-Produced Media. In Proceedings of the ACM International Workshop on Audio and Multimedia Methods for Large-Scale Video Analysis (AMVA) at ACM Multimedia 2012 (MM’12), Nara, Japan, October 2012, pp. 27-32. [PDF]
  • Benjamín Elizalde, Mirco Ravanelli, and Gerald Friedland. 2013. Audio Concept Ranking for Video Event Detection on User-Generated Content. In Proceedings of the InterSpeech First Workshop on Speech, Language and Audio in Multimedia (SLAM ’13), Marseille, France, August 2013. [PDF]
  • Benjamín Elizalde, Howard Lei, and Gerald Friedland. 2013. An i-Vector Representation of Acoustic Environments for Audio-Based Video Event Detection on User Generated Content. In Proceedings of the IEEE International Symposium on Multimedia (ISM 2013), Anaheim, California, December 2013, pp. 114-117. [PDF]

More Publications

  • Benjamín Elizalde and Gerald Friedland. 2013. Lost in Segmentation: Three Approaches for Speech/Non-Speech Detection in Consumer-Produced Videos. In Proceedings of the IEEE International Conference on Multimedia and Expo (ICME 2013), San Jose, California, July 2013. [PDF]
  • Benjamín Elizalde and Gerald Friedland. 2013. Taming the Wild: Acoustic Segmentation in Consumer-Produced Videos. ICSI Technical Report TR-12-016. Berkeley, CA: International Computer Science Institute. [PDF]
  • Benjamín Elizalde, Gerald Friedland, and Karl Ni. 2013. What You Hear Is What You Get: Audio-Based Video Content Analysis. In Proceedings of the Bay Area Machine Learning Symposium 2013 (BayLearn), Menlo Park, California, August 2013. [PDF]
  • Benjamín Elizalde, Howard Lei, Gerald Friedland, and Nils Peters. 2013. Capturing the Acoustic Scene Characteristics for Audio Scene Detection. In Proceedings of the IEEE AASP Challenge: Detection and Classification of Acoustic Scenes and Events (D-CASE) at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2013), New Paltz, New York, October 2013. [PDF]
  • Hui Cheng, Jingen Liu, Saad Ali, Omar Javed, Qian Yu, Amir Tamrakar, Ajay Divakaran, Harpreet S. Sawhney, R. Manmatha, James Allan, Alex Hauptmann, Mubarak Shah, Subhabrata Bhattacharya, Afshin Dehghan, Gerald Friedland, Benjamin Martinez Elizalde, Trevor Darrell, Michael Witbrock, and Jon Curtis. 2012. SRI-Sarnoff AURORA System at TRECVID 2012: Multimedia Event Detection and Recounting. NIST TRECVID 2012. Gaithersburg, MD: National Institute of Standards and Technology. [PDF]
  • Gerald Friedland, Benjamín Martinez Elizalde, Howard Lei, and Ajay Divakaran. 2012. There Is No Data Like Less Data: Percepts for Video Concept Detection on Consumer-Produced Media. ICSI Technical Report TR-12-006. Berkeley, CA: International Computer Science Institute. [PDF]
  • Po-Sen Huang, Robert Mertens, Ajay Divakaran, Gerald Friedland, and Mark Hasegawa-Johnson. 2012. How to Put It into Words – Using Random Forests to Extract Symbol Level Descriptions from Audio Content for Concept Detection. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan, March 2012, pp. 505-508. [PDF]
  • Bhiksha Raj, Benjamín Elizalde, Gerald Friedland, Juan A. Nolazco-Flores, and L. Paola Garcia-Perera. 2012. Segment and Conquer: Acoustic Segmentation on Consumer-Produced (aka “Wild”) Videos. Poster presented at 2nd Greater New York Area Multimedia and Vision Meeting, New York, NY, June 15, 2012.
  • Robert Mertens, Po-Sen Huang, Luke Gottlieb, Gerald Friedland, and Ajay Divakaran. 2011. On the Applicability of Speaker Diarization to Audio Concept Detection for Multimedia Retrieval. In Proceedings of the IEEE International Symposium on Multimedia (ISM 2011), Dana Point, California, December 2011, pp. 446-51. [PDF]
  • Robert Mertens, Howard Lei, Luke Gottlieb, Gerald Friedland, and Ajay Divakaran. 2011. Acoustic Super Models for Large Scale Video Event Detection. In Proceedings of the ACM International Workshop on Events in Multimedia (EiMM11), Scottsdale, Arizona, November 2011. [PDF]

Dia-Localization

Start Here

  • Gerald Friedland, Chuohao Yeo, and Hayley Hung. 2010. Dialocalization: Acoustic Speaker Diarization and Visual Localization as Joint Optimization Problem. ACM Transactions on Multimedia Computing, Communications, and Applications 6:4. [PDF]
  • Gerald Friedland, Hayley Hung, and Chuohao Yeo. 2009. Multi-Modal Speaker Diarization of Real-World Meetings Using Compressed-Domain Video Features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4069-72. [PDF]

More Publications

  • Hayley Hung, Gerald Friedland, and Chuohao Yeo. 2010. Computationally Efficient Clustering of Audio-Visual Meeting Data. In Multimedia Interaction and Intelligent User Interfaces: Principles, Methods, and Applications, edited by M. Etoh, J. Luo, and L. Shao, pp. 25-59.
  • Mary Knox and Gerald Friedland. 2010. Multimodal Speaker Diarization Using Oriented Optical Flow Histograms. In Proceedings of the 11th International Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan, September 2010, pp. 290-93. [PDF]
  • Gerald Friedland, Chuohao Yeo, and Hayley Hung. 2009. Visual Speaker Localization Aided by Acoustic Models. In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia 2009), Beijing, China, October 2009, pp. 195-202. [PDF]
  • Gerald Friedland, Hayley Hung, and Chuohao Yeo. 2008. Multi-modal Speaker Diarization of Real-world Meetings Using Compressed-Domain Video Features. ICSI Technical Report TR-08-007. Berkeley, CA: International Computer Science Institute. [PDF]
  • Hayley Hung and Gerald Friedland. 2008. Towards Audio-Visual On-Line Diarization of Participants in Group Meetings. In Proceedings of European Conference on Computer Vision (ECCV), Marseille, France, October 2008. [PDF]

Fast Speaker Diarization using Python

  • Ekaterina Gonina, Gerald Friedland, Henry Cook, and Kurt Keutzer. 2011. Fast Speaker Diarization Using a High-Level Scripting Language. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2011), Waikoloa, Hawaii, December 11-15, 2011. [PDF]
  • Henry Cook, Ekaterina Gonina, Shoaib Kamil, Gerald Friedland, David Patterson, and Armando Fox. 2011. CUDA-Level Performance with Python-Level Productivity for Gaussian Mixture Model Applications. In Proceedings of the Third USENIX Workshop on Hot Topics in Parallelism (HotPar ’11), Berkeley, California, May 2011. [PDF]

Joke-O-Mat

Start Here

  • Gerald Friedland, Adam Janin, and Luke Gottlieb. 2013. Narrative Theme Navigation for Sitcoms Supported by Fan-Generated Scripts. Multimedia Tools and Applications 63:2, pp. 387-406. [PDF]
  • Gerald Friedland, Luke Gottlieb, and Adam Janin. 2009. Joke-O-Mat: Browsing Sitcoms Punchline by Punchline (ACM Grand Challenge submission). In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia 2009), Beijing, China, October 2009, pp. 1115-16. [PDF]

More Publications

  • Adam Janin, Luke Gottlieb, and Gerald Friedland. 2010. Joke-O-Mat HD: Browsing Sitcoms with Human Derived Transcripts. In Proceedings of the ACM International Conference on Multimedia (ACM Multimedia 2010), Florence, Italy, October 2010, pp. 1591-94. [PDF]
  • Gerald Friedland, Luke Gottlieb, and Adam Janin. 2010. Narrative-Theme Navigation for Sitcoms Supported by Fan-Generated Scripts. In Proceedings of the Third International Workshop on Automated Information Extraction in Media Production (AIEMPro ’10) at the ACM International Conference on Multimedia (ACM Multimedia 2010), Florence, Italy, October 2010, pp. 3-8. [PDF]
  • Gerald Friedland, Luke Gottlieb, and Adam Janin. 2009. Using Artistic Markers and Speaker Identification for Narrative-Theme Navigation of Seinfeld Episodes. In Proceedings of the 11th IEEE International Symposium on Multimedia (ISM09), San Diego, California, December 2009, Workshop on Content-Based Audio/Video Analysis for Novel TV Services, pp. 511-16. [PDF]

The Meeting Diarist

  • Gerald Friedland, Jike Chong, and Adam Janin. 2010. A Parallel Meeting Diarist. In Proceedings of the Workshop on Searching Spontaneous Conversational Speech (SSCS) at the ACM International Conference on Multimedia (ACM Multimedia 2010), Florence, Italy, October 2010, pp. 57-60. [PDF]
  • Gerald Friedland, Jike Chong, and Adam Janin. 2010. Parallelizing Speaker-Attributed Speech Recognition for Meeting Browsing. In Proceedings of the 2010 IEEE International Symposium on Multimedia (ISM2010), Taiwan, December 2010, pp. 121-28. [PDF]

Meeting Dominance Estimation

Start Here

  • Hayley Hung, Yan Huang, Gerald Friedland, and Daniel Gatica-Perez. 2008. Estimating the Dominant Person in Multi-Party Conversations Using Speaker Diarization Strategies. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, April 2008, pp. 2197-2200. [PDF]

More Publications

  • Hayley Hung, Yan Huang, Gerald Friedland, and Daniel Gatica-Perez. 2011. Estimating Dominance in Multi-Party Meetings Using Speaker Diarization from a Single Microphone. IEEE Transactions on Audio, Speech, and Language Processing 19:4, pp. 847-60.
  • Hayley Hung, Dinesh Jayagopi, Chuohao Yeo, Gerald Friedland, Sileye O. Ba, J-M. Odobez, Kannan Ramchandran, Nikki Mirghafori, and Daniel Gatica-Perez. 2007. Using Audio and Video Features to Classify the Most Dominant Person in Meetings. In Proceedings of ACM Multimedia 2007, Augsburg, Germany, September 2007, pp. 835-38.

Online Speaker Diarization

Start Here

  • Gerald Friedland. 2012. Using a GPU, Online Diarization = Offline Diarization. ICSI Technical Report TR-12-004. Berkeley, CA: International Computer Science Institute. [PDF]
  • Oriol Vinyals and Gerald Friedland. 2008. Live Speaker Identification in Meetings:. ICSI Technical Report TR-08-001. Berkeley, CA: International Computer Science Institute. [PDF]

More Publications

  • Carlos Vaquero, Oriol Vinyals, and Gerald Friedland. 2010. A Hybrid Approach to Online Speaker Diarization. In Proceedings of the 11th International Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan, September 2010, pp. 2642-45. [PDF]
  • Oriol Vinyals and Gerald Friedland. 2008. Towards Semantic Analysis of Conversations: A System for the Live Identification of Speakers in Meetings. In Proceedings of IEEE International Conference on Semantic Computing, Santa Clara, August 2008, pp. 426-31. [PDF]
  • Gerald Friedland and Oriol Vinyals. 2008. Live Speaker Identification in Conversations. In Proceedings of ACM Multimedia 2008, Vancouver, Canada, October 2008, pp. 1017-18. [PDF]

Speaker Diarization and Speaker Identification Techniques

Start Here

  • Xavier Anguera Miro, Simon Bozonnet, Nicholas Evans, Corinne Fredouille, Gerald Friedland, and Oriol Vinyals. 2012. Speaker Diarization: A Review of Recent Research. IEEE Transactions on Audio, Speech, and Language Processing 20:2, pp. 356-70. [PDF]
  • Gerald Friedland, Adam Janin, David Imseng, Xavi Anguera, Luke Gottlieb, Marijn Huijbregts, Mary Knox, and Oriol Vinyals. 2012. The ICSI RT-09 Speaker Diarization System. IEEE Transactions on Audio, Speech, and Language Processing 20:2, pp. 371-81. [PDF]
  • Gerald Friedland, Oriol Vinyals, Yan Huang, and Christian Müller. 2009. Fusing Short Term and Long Term Features for Improved Speaker Diarization. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2009), Taipei, Taiwan, April 2009, pp. 4077-80. [PDF]

More Publications

  • Mary Tai Knox, Nikki Mirghafori, and Gerald Friedland. 2013. Exploring Methods of Improving Speaker Accuracy for Speaker Diarization. In Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France, August 2013. [PDF]
  • Howard Lei, Jaeyoung Choi, and Gerald Friedland. 2013. Nowhere to Hide: Exploring User-Verification across Flickr Accounts. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2013), Vancouver, Canada, May 2013. [PDF]
  • Gerald Friedland and Fabio Valente. 2012. Speaker Diarization. In Multimodal Signal Processing: Human Interactions in Meetings, edited by S. Renals, H. Bourlard, J. Carletta, and A. Popescu-Belis. Cambridge/New York: Cambridge University Press.
  • Mary Tai Knox, Nikki Mirghafori, and Gerald Friedland. 2012. Where Did I Go Wrong?: Identifying Troublesome Segments for Speaker Diarization Systems. In Proceedings of the 13th Annual Conference of the International Speech Communication Association (Interspeech 2012), Portland, Oregon, September 2012. [PDF]
  • Howard Lei, Bernd T. Meyer, and Nikki Mirghafori. 2012. Spectro-Temporal Gabor Features for Speaker Recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto, Japan, March 2012, pp. 4241-44. [PDF]
  • Kofi Boakye, Oriol Vinyals, and Gerald Friedland. 2011. Improved Overlapped Speech Handling for Speaker Diarization. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, August 2011, pp. 941-44.
  • Gerald Friedland. 2011. Speaker Diarization. In Speech and Audio Signal Processing, 2nd edition, edited by B. Gold, N. Morgan, and D. Ellis. Oxford: Wiley-Blackwell.
  • Howard Lei, Jaeyoung Choi, Adam Janin, and Gerald Friedland. 2011. User Verification: Matching the Uploaders of Videos across Accounts. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2011), Prague, Czech Republic, May 2011, pp. 2404-07. [PDF]
  • Howard Lei and Nikki Mirghafori. 2011. Data Selection with Kurtosis and Nasality Features for Speaker Recognition. In Proceedings of the 12th Annual Conference of the International Speech Communication Association (Interspeech 2011), Florence, Italy, August 2011, pp. 2753-56. [PDF]
  • Simon Bozonnet, Nicholas Evans, Xavi Anguera, Oriol Vinyals, Gerald Friedland, and Corinne Fredouille. 2010. System Output Combination for Improved Speaker Diarization. In Proceedings of the 11th International Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan, September 2010, pp. 2642-45. [PDF]
  • Gerald Friedland and David Van Leeuwen. 2010. Speaker Recognition and Diarization. In Semantic Computing, edited by P. Sheu, H. Yu, C. V. Ramamoorthy, A. K. Joshi, and L. A. Zadeh, pp. 115-130. Hoboken, NJ: IEEE Press/Wiley.
  • David Imseng and Gerald Friedland. 2010. An Adaptive Initialization Method for Speaker Diarization Based on Prosodic Features. In Proceedings of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, Texas, March 2010, pp. 4946-49. [PDF]
  • David Imseng and Gerald Friedland. 2010. Tuning-Robust Initialization Methods for Speaker Diarization. IEEE Transactions on Audio, Speech, and Language Processing 18:8, pp. 2028-37. [PDF]
  • Howard Lei. 2010. Structured Approaches to Data Selection for Speaker Recognition. PhD dissertation, University of California–Berkeley. [PDF]
  • Howard Lei, Jaeyoung Choi, Adam Janin, and Gerald Friedland. 2010. Persona Linking: Matching Uploaders of Videos across Accounts. ICSI Technical Report TR-10-009. Berkeley, CA: International Computer Science Institute. [PDF]
  • Andreas Stolcke, Gerald Friedland, and David Imseng. 2010. Leveraging Speaker Diarization for Meeting Recognition from Distant Microphones. In Proceedings of the 35th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2010), Dallas, Texas, March 2010, pp. 4390-93. [PDF]
  • Oriol Vinyals, Gerald Friedland, and Nelson Morgan. 2010. Discriminative Training for Hierarchical Clustering in Speaker Diarization. In Proceedings of the 11th International Conference of the International Speech Communication Association (Interspeech 2010), Makuhari, Japan, September 2010, pp. 2326-29. [PDF]
  • Kofi A. Boakye, Beatriz Trueba-Hornero, Oriol Vinyals, and Gerald Friedland. 2008. Overlapped Speech Detection for Improved Speaker Diarization in Multiparty Meetings. In Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), Las Vegas, Nevada, April 2008, pp. 4353-56. [PDF]
  • Gerald Friedland, Oriol Vinyals, Yan Huang, and Christian Müller. 2009. Prosodic and Other Long-Term Features for Speaker Diarization. IEEE Transactions on Audio, Speech, and Language Processing 17:5, pp. 985-93. [PDF]
  • David Imseng and Gerald Friedland. 2009. Robust Speaker Diarization for Short Speech Recordings. In Proceedings of the 11th Biannual IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU 2009), Merano, Italy, December 2009, pp. 432-37. [PDF]
  • Xavi Anguera, Chuck Wooters, and Javier Hernando. 2007. Acoustic Beamforming for Speaker Diarization of Meetings. IEEE Transactions on Audio, Speech and Language Processing 15:7, pp. 2011-22.
  • Oriol Vinyals, Gerald Friedland, and Nikki Mirghafori. 2007. Revisiting a Basic Function on Current CPUs: A Fast Logarithm Implementation with Adjustable Accuracy. ICSI Technical Report TR-07-002. Berkeley, CA: International Computer Science Institute. [PDF]