Dartmouth Events

Computer Science Colloquium

Anshumali Shrivastava, PhD student at Cornell University, will speak on "An Excursion in Probabilistic Hashing Techniques for Big Data."

Tuesday, March 3, 2015
4:15pm – 5:15pm
Carson L01
Intended Audience(s): Public
Categories: Lectures & Seminars

Large-scale machine learning and data mining applications routinely deal with datasets at terabyte (TB) scale, and these are expected to reach petabyte (PB) levels soon. At this scale, conventional algorithms fail, and even simple data mining operations such as search, learning, and clustering become challenging.

In this talk, I will introduce probabilistic hashing techniques for large-scale search and learning. I will show how the old hashing framework, originally meant for sub-linear search, can be converted into fast learning algorithms. I will talk about our recent success in constructing hash functions for dot products by making use of asymmetry. Such a construction is not possible in the conventional setting and was a known hard problem. I will further show how hashing inner products directly speeds up popular learning algorithms. Later, I will discuss recent improvements to some decade-old textbook hashing algorithms, including the fastest way of performing minwise hashing in practice.

I will demonstrate the utility of the above techniques on various real applications, including search, learning, collaborative filtering, and our ongoing collaboration with HRDAG (Human Rights Data Analysis Group) and NCRN (NSF-Census Research Network) on estimating death counts in Syria since March 2011.
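
For readers unfamiliar with minwise hashing, the following is a minimal sketch of the classical textbook scheme, not the faster variant the talk describes. It estimates the Jaccard similarity of two sets as the fraction of hash functions under which both sets share the same minimum hash value. The function names, the use of salted BLAKE2b as a stand-in for random permutations, and the parameter num_hashes are all assumptions made for illustration.

```python
import hashlib


def hash_value(item, seed):
    """Hash an item to a 64-bit integer; the seed picks one hash function.

    Assumption: salted BLAKE2b stands in for a random permutation.
    """
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(
        str(item).encode("utf-8"), digest_size=8, salt=salt
    ).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_hashes=128):
    """Return one minimum hash value per hash function.

    The input set must be non-empty, since min() of nothing is undefined.
    """
    return [min(hash_value(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    a = set(range(100))
    b = set(range(50, 150))
    sig_a = minhash_signature(a, num_hashes=256)
    sig_b = minhash_signature(b, num_hashes=256)
    print("estimated:", estimate_jaccard(sig_a, sig_b))
    print("exact:    ", exact_jaccard(a, b))
```

Each signature has a fixed length regardless of set size, so two large sets can be compared in time proportional to num_hashes. The estimate tightens as num_hashes grows, at the cost of more hashing work per set.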

Anshumali Shrivastava has been a Ph.D. student in the computer science department at Cornell University since 2010. His broad research interests include large-scale machine learning, randomized algorithms for big data systems, and graph mining. His research on hashing inner products won the Best Paper Award at NIPS 2014, and his work on representing graphs won the Best Paper Award at IEEE/ACM ASONAM 2014. Before coming to Cornell, he worked as a scientist at FICO (Fair Isaac Corp.) Research, Bangalore. Anshumali received his bachelor's and master's degrees in mathematics and computing from the Indian Institute of Technology (IIT) Kharagpur in 2008, where he was awarded the Institute Silver Medal for graduating at the top of his class.

For more information, contact:
Shannon Stearne

Events are free and open to the public unless otherwise noted.