### Rabin Karp Fingerprinting

Dan Gusfield presents Rabin & Karp's fingerprinting algorithm for

string matching in expected linear time.

Fingerprinting and Signatures are sparse hashing,

while parity check is dense hashing.

The idea is this.

If the parity of pattern and text differ,

there can be no match.

Otherwise the probability of a false

match is given by the compression ratio

of the parity, 1/2 in this case.

match(pattern,text)

for(i=1 to |text|)

if( hash(pattern[1..n]) == hash(text[i..i+n]) ){

pattern =~ text[i] with

probability of false match being #patterns / #hash

else

no match at i.

Other idea is to compute the sliding hash incrementally inside the loop:

hash(text[i..i+n]) = hash(text[i-1..i+n-1]) + incremental effort.

Rabin and Karp use numbers mod p as the hashing function,

where p is a random prime < maxint on the computer.

Computation mod p is fast and there is

no need to carry around bignums.

The probability of error falls quickly

by hashing w.r.t. multiple random primes.

string matching in expected linear time.

Fingerprinting and Signatures are sparse hashing,

while parity check is dense hashing.

The idea is this.

If the parity of pattern and text differ,

there can be no match.

Otherwise the probability of a false

match is given by the compression ratio

of the parity, 1/2 in this case.

match(pattern,text)

for(i=1 to |text|)

if( hash(pattern[1..n]) == hash(text[i..i+n]) ){

pattern =~ text[i] with

probability of false match being #patterns / #hash

else

no match at i.

Other idea is to compute the sliding hash incrementally inside the loop:

hash(text[i..i+n]) = hash(text[i-1..i+n-1]) + incremental effort.

Rabin and Karp use numbers mod p as the hashing function,

where p is a random prime < maxint on the computer.

Computation mod p is fast and there is

no need to carry around bignums.

The probability of error falls quickly

by hashing w.r.t. multiple random primes.

## 0 Comments:

Post a Comment

<< Home