If you've ever wondered how spellcheckers worked. In information… - Nate Bunnyfield
[Jul. 8th, 2005|03:32 pm]
Nate Bunnyfield
If you've ever wondered how spellcheckers work:

In information theory, the Levenshtein distance or edit distance between two strings is given by the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character. It is named after the Russian scientist Vladimir Levenshtein, who considered this distance in 1965. It is useful in applications that need to determine how similar two strings are, such as spell checkers.

For example, the Levenshtein distance between "kitten" and "sitting" is 3, since these three edits change one into the other, and there is no way to do it with fewer than three edits:

1. kitten
2. sitten (substitution of 's' for 'k')
3. sittin (substitution of 'i' for 'e')
4. sitting (insert 'g' at the end)

from http://en.wikipedia.org/wiki/Levenshtein_distance

And http://www.merriampark.com/ld.htm has a Java demo.
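
For anyone who wants to poke at the idea directly, here is a rough sketch in Python of the usual dynamic-programming way to compute the distance quoted above. It is not the code from either link; the function names `levenshtein` and `suggest` and the tiny word list are made up for illustration, and a real spellchecker would do a lot more than this.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    # prev[j] is the distance between the prefix of a processed so far
    # and the first j characters of b. Start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free on a match)
            ))
        prev = curr
    return prev[-1]


def suggest(word: str, dictionary: list[str]) -> str:
    """Hypothetical helper: return the dictionary word closest to `word`."""
    return min(dictionary, key=lambda candidate: levenshtein(word, candidate))


print(levenshtein("kitten", "sitting"))                   # 3
print(suggest("sittin", ["kitten", "sitting", "mitten"]))  # sitting
```

The sketch keeps only two rows of the table at a time, so memory grows with the length of the second string rather than with the product of both lengths.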

Comments:
From: lazyman
2005-07-09 10:32 am (UTC)
Aoccdrnig to rseearch at Cmabrigde Uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.

Not totally true, but an interesting meme nonetheless.

From: subplot2
2005-07-09 06:22 pm (UTC)
Thanks for that link, Brian. Consider it bookmarked.

I first read of this theory in a progressive-type magazine when it initially appeared, and have been too lazy to follow up on it even though it's been endlessly regurgitated over the Internet. Every time I see it I get more annoyed, because the order of the letters obviously does matter: I feel like I'm carrying weights when I read scrambled passages. Sure, I can understand it, but my brain is working differently than if I were reading unscrambled text.

Thanks again, I needed that read.