fix some bugs, update README, turn into a gem, add some tests
PhilT committed Jul 31, 2015
1 parent 3814c18 commit d495600
Showing 5 changed files with 52 additions and 12 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -11,7 +11,7 @@ This is a straight port of SymSpell from C# to Ruby. I've started moving things

Original source with inline comments and README is here: https://github.com/wolfgarbe/symspell.

-I've changed very little from the original source (apart from removing the commandline interface) but please note it has no test coverage at this time.
+I've changed very little from the original source (apart from removing the commandline interface) but please note it has only some very basic end to end tests at this time.


## Usage
@@ -26,5 +26,7 @@ I've changed very little from the original source (apart from removing the comma

## EDIT_DISTANCE_MAX

-`EDIT_DISTANCE_MAX` is the number of letters to remove to find a match. Standard text should be around 2-3 if you have a smaller dictionary you could try larger numbers to catch drastically misspelt words. Note that creating the dictionary will take a lot longer as the combinations go up exponentially.
+`EDIT_DISTANCE_MAX` is the number of operations needed to transform one string into another.
+
+For example the edit distance between **CA** and **ABC** is 2 because **CA** => **AC** => **ABC**. Edit distances of 2-5 are normal. Note, however, increasing EDIT_DISTANCE_MAX exponentially increases the combinations and therefore the time it takes to create the dictionary.
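
The **CA** => **AC** => **ABC** example only yields 2 under the unrestricted Damerau-Levenshtein distance, where a transposed pair may be edited again afterwards. The sketch below is a minimal, self-contained Ruby version of that textbook algorithm, written to check the README's number. It is not the gem's `damerau_levenshtein_distance` method, and the function name `damerau_levenshtein` is an assumption made for illustration.

```ruby
# Illustrative sketch only: a textbook unrestricted Damerau-Levenshtein
# distance. The name `damerau_levenshtein` is hypothetical and this is not
# the implementation in lib/symspell.rb.
def damerau_levenshtein(source, target)
  m = source.length
  n = target.length
  max_dist = m + n

  # last_row[char] is the last (1-based) row in which `char` occurred in source.
  last_row = Hash.new(0)

  # The table is shifted by one in each dimension so that index 0 can hold
  # the sentinel row and column required by the transposition case.
  d = Array.new(m + 2) { Array.new(n + 2, 0) }
  d[0][0] = max_dist
  (0..m).each do |i|
    d[i + 1][0] = max_dist
    d[i + 1][1] = i
  end
  (0..n).each do |j|
    d[0][j + 1] = max_dist
    d[1][j + 1] = j
  end

  (1..m).each do |i|
    last_col = 0 # last column in this row where the characters matched
    (1..n).each do |j|
      k = last_row[target[j - 1]]
      l = last_col
      cost = 1
      if source[i - 1] == target[j - 1]
        cost = 0
        last_col = j
      end
      d[i + 1][j + 1] = [
        d[i][j] + cost,                          # substitution or match
        d[i + 1][j] + 1,                         # insertion
        d[i][j + 1] + 1,                         # deletion
        d[k][l] + (i - k - 1) + 1 + (j - l - 1)  # transposition
      ].min
    end
    last_row[source[i - 1]] = i
  end

  d[m + 1][n + 1]
end

puts damerau_levenshtein('CA', 'ABC') # => 2
```

Under this definition `'andre'` and `'andrew'` are one insertion apart, which is why the `andre` lookup in the tests succeeds with an edit distance limit of 2.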

4 changes: 2 additions & 2 deletions Rakefile
@@ -1,9 +1,9 @@
require 'rake/testtask'

desc 'Test, build and install the gem'
-task :default => [:spec, :install]
+task :default => [:test]

-Rake::TestTask.new(:spec) do |t|
+Rake::TestTask.new(:test) do |t|
t.pattern = 'tests/*_test.rb'
end

16 changes: 8 additions & 8 deletions lib/symspell.rb
@@ -15,7 +15,7 @@ def create_dictionary(corpus)
word_count = 0

File.open(corpus, 'r').each_line do |word|
-word_count += 1 if create_dictionary_entry(word.strip, language)
+word_count += 1 if create_dictionary_entry(word.strip)
end
end

@@ -156,10 +156,10 @@ def parse_words(text)
text.downcase.scan(/[\w-[\d_]]+/).first
end

-def create_dictionary_entry(key, language)
+def create_dictionary_entry(key)
result = false
value = nil
-if valueo = @dictionary[language + key]
+if valueo = @dictionary[key]
if valueo.is_a?(Fixnum)
tmp = valueo
value = DictionaryItem.new
@@ -171,7 +171,7 @@ def create_dictionary_entry(key, language)
elsif @wordlist.count < MAX_INT
value = DictionaryItem.new
value.count = 1
-@dictionary[language + key] = value
+@dictionary[key] = value

@maxlength = key.size if key.size > @maxlength
end
@@ -182,17 +182,17 @@ def create_dictionary_entry(key, language)
result = true

edits(key, 0, Set.new).each do |delete|
-if value2 = @dictionary[language + delete]
+if value2 = @dictionary[delete]
if value2.is_a?(Fixnum)
tmp = value2
di = DictionaryItem.new
di.suggestions << tmp
-@dictionary[language + delete] = di
+@dictionary[delete] = di
add_lowest_distance(di, key, keyint, delete) unless di.suggestions.include?(keyint)
elsif !value2.suggestions.include?(keyint)
end
else
-@dictionary[language + delete] = keyint
+@dictionary[delete] = keyint
end
end
end
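
The loop over `edits(key, 0, Set.new)` in this hunk stores every deletion variant of a dictionary word, which is why a larger `EDIT_DISTANCE_MAX` makes dictionary creation slower. The sketch below shows one way such variants can be enumerated, so that the growth can be measured. `delete_variants` is a hypothetical stand-in written for this note and is not the gem's `edits` method. Skipping the empty string is also an assumption.

```ruby
require 'set'

# Hypothetical helper, not the gem's `edits` method: collects every string
# that can be reached from `word` by deleting between 1 and `max_distance`
# characters. The empty string is never produced (an assumption of this sketch).
def delete_variants(word, max_distance, deletes = Set.new)
  return deletes if max_distance.zero? || word.length <= 1

  word.length.times do |i|
    variant = word[0...i] + word[(i + 1)..-1]
    # Set#add? returns nil if the variant is already present, so each
    # variant is expanded only once. This is safe because a variant of a
    # given length is always reached at the same recursion depth.
    delete_variants(variant, max_distance - 1, deletes) if deletes.add?(variant)
  end

  deletes
end

(1..3).each do |distance|
  puts "#{distance}: #{delete_variants('andrew', distance).size}"
end
# prints:
# 1: 6
# 2: 21
# 3: 41
```

For a six-letter word with distinct letters the counts are 6, 21 and 41, and every one of those variants becomes a dictionary key for each word in the corpus.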
@@ -239,7 +239,7 @@ def damerau_levenshtein_distance(source, target)
sd[letter] = 0 unless sd[letter]
end

-(0..m).each do |i|
+(1..m).each do |i|
db = 0
(0..n).each do |j|
i1 = sd[target[j - 1]]
31 changes: 31 additions & 0 deletions tests/symspell_test.rb
@@ -0,0 +1,31 @@
+require 'minitest/autorun'
+require_relative '../lib/symspell'
+
+class SymSpellTest < Minitest::Test
+  def setup
+    @edit_distance_max = 2
+  end
+
+  def subject
+    @subject ||= SymSpell.new(@edit_distance_max).tap do |subject|
+      subject.create_dictionary 'tests/words.txt'
+    end
+  end
+
+  def test_lookup_correctly_spelled_word
+    assert_equal 'andrew', subject.lookup('andrew').first.term
+  end
+
+  def test_lookup_misspelt_word
+    assert_equal 'andrew', subject.lookup('andre').first.term
+  end
+
+  def test_lookup_fails_to_find_match
+    assert_equal nil, subject.lookup('amigon').first
+  end
+
+  def test_lookup_finds_match_after_turning_up_edit_distance
+    @edit_distance_max = 3
+    assert_equal 'imogen', subject.lookup('amigon').first.term
+  end
+end

7 changes: 7 additions & 0 deletions tests/words.txt
@@ -0,0 +1,7 @@
+mark
+john
+peter
+mary
+andrew
+imogen
