
A distance-based hash (one where similar input gives similar output, the opposite of a cryptographic hash), suitable for text applications.


jwilkins/nilsimsa


nilsimsa


Nilsimsa is a distance-based hash, which makes it the opposite of more familiar hashes like MD5. Where a small change to the input produces a large difference in an MD5 hash (to avoid collisions), a distance-based hash gives similar inputs similar outputs. This makes it useful for detecting near-duplicate documents without having to store the original text.
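To see the contrast with a cryptographic hash, here is a small sketch using Ruby's standard `Digest::MD5` (not part of this gem). Changing a single character of the input typically flips a large share of the digest's 128 bits. The helper `differing_bits` is hypothetical and exists only for this illustration.

```ruby
require 'digest/md5'

# Hypothetical helper: count how many bits differ between two hex strings
# of equal length by XOR-ing them as integers and counting the set bits.
def differing_bits(hex_a, hex_b)
  (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count('1')
end

a = Digest::MD5.hexdigest('The quick brown fox')
b = Digest::MD5.hexdigest('The quick brown fix') # one character changed

# A cryptographic hash spreads a one-character edit across the whole digest,
# so the two 128-bit digests usually disagree in many positions.
puts "MD5 bits differing: #{differing_bits(a, b)} of 128"
```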

Standard usage is as follows:

```ruby
require 'nilsimsa'

n1 = Nilsimsa.new
text1 = "The quick brown fox"
n1.update(text1)
puts "Text '#{text1}': #{n1.hexdigest}"
```
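The usage above only prints a digest; the point of a distance-based hash is comparing two of them. A minimal sketch follows. `nilsimsa_score` is a hypothetical helper, not part of this gem's API. It assumes `hexdigest` returns 64 hex characters (a 256-bit digest) and uses the scoring convention of the original Nilsimsa tools: 128 minus the number of differing bits, so identical digests score 128 and fully opposite ones score -128. Check the gem's source for its own comparison method before relying on this.

```ruby
# Hypothetical helper: score two Nilsimsa hex digests (assumed 64 hex chars = 256 bits).
# Assumed convention: 128 minus the number of differing bits, so the result
# ranges from -128 (every bit differs) to 128 (identical digests).
def nilsimsa_score(hex_a, hex_b)
  unless hex_a.length == 64 && hex_b.length == 64
    raise ArgumentError, 'expected two 64-character hex digests'
  end
  differing = (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count('1')
  128 - differing
end

d1 = 'f' * 64
d2 = 'f' * 63 + 'e' # 0xf ^ 0xe = 0x1, so exactly one bit differs
puts nilsimsa_score(d1, d1)       # => 128
puts nilsimsa_score(d1, d2)       # => 127
puts nilsimsa_score(d1, '0' * 64) # => -128
```

In practice you would pass `n1.hexdigest` together with the digest of a second text; the closer the score is to 128, the more similar the two texts are expected to be.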
