Age of Reason

Random musing of books and stuff I am reading.

2004/12/22

Rabin Karp Fingerprinting

Dan Gusfield presents Rabin & Karp's fingerprinting algorithm for
string matching in expected linear time.

Fingerprinting and Signatures are sparse hashing,
while parity check is dense hashing.

The idea is this.
If the parity of pattern and text differ,
there can be no match.
Otherwise the probability of a false
match is given by the compression ratio
of the parity, 1/2 in this case.

match(pattern,text)
for(i=1 to |text|)
if( hash(pattern[1..n]) == hash(text[i..i+n]) ){
pattern =~ text[i] with
probability of false match being #patterns / #hash
else
no match at i.

Other idea is to compute the sliding hash incrementally inside the loop:

hash(text[i..i+n]) = hash(text[i-1..i+n-1]) + incremental effort.

Rabin and Karp use numbers mod p as the hashing function,
where p is a random prime < maxint on the computer.
Computation mod p is fast and there is
no need to carry around bignums.

The probability of error falls quickly
by hashing w.r.t. multiple random primes.

0 Comments:

Post a Comment

<< Home