Nate Bunnyfield (natebunnyfield) wrote,
Nate Bunnyfield
natebunnyfield

If you've ever wondered how spellcheckers worked.

In information theory, the Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution. It is named after the Russian scientist Vladimir Levenshtein, who considered this distance in 1965. It is useful in applications that need to determine how similar two strings are, such as spell checkers.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since these three edits change one into the other, and there is no way to do it with less than three edits:

1. kitten
2. sitten (substitution of 'k' for 's')
3. sittin (substitution of 'i' for 'e')
4. sitting (insert 'g' at the end)

from http://en.wikipedia.org/wiki/Levenshtein_distance

And http://www.merriampark.com/ld.htm has a java demo.
Subscribe
  • Post a new comment

    Error

    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.
  • 2 comments