Sahil's Notepad
Locality Sensitive Hashing (LSH) and Min Hash
[Indyk-Motwani’98] Many distance-related questions (nearest neighbor, closest x, ...) can be...
Date: 06/11/2008
Set Similarity and Min Hash
Given two sets S1, S2, find similarity(S1, S2), based on Hamming distance (not Euclidean). Jaccard...
Date: 06/10/2008
Information Retrieval & Search - Basic IR Models
Our focus in the database world has primarily been on retrieving information from a structured...
Date: 03/05/2008
Information Theory (1) - The Science of Communication
IT is a beautiful sub-field of CS with applications across the gamut of scientific fields: coding...
Date: 02/21/2008
Random Sampling over Joins
Source: On Random Sampling over Joins. Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, SIGMOD...
Date: 02/11/2008
Converting Between Random Sampling Methods
Sampling an f fraction out of n records. Sampling with replacement: the sample is a multiset of fn...
Date: 02/05/2008
Reservoir Sampling
A simple random sampling strategy to produce a sample without replacement from a stream of data -...
Date: 02/05/2008
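As a rough illustration of the Reservoir Sampling entry above, here is a minimal sketch of one common way to draw a fixed-size sample without replacement from a stream in a single pass. The excerpt does not say which variant the post describes, so this assumes the classic "Algorithm R"; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(
    stream: Iterable[T], k: int, rng: Optional[random.Random] = None
) -> List[T]:
    """Return up to k items drawn uniformly without replacement from stream.

    Assumption: this is the classic Algorithm R, not necessarily the
    variant discussed in the post.
    """
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot in [0, i]; the new item is kept with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Sample 5 records from a stream of 1000 without knowing its length up front.
    print(reservoir_sample(range(1000), 5))
```

The stream is read once and only k items are held in memory, which is why this style of sampling suits data whose size is not known in advance. If the stream has fewer than k items, the sketch simply returns all of them.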