3 Packages to Build a Spell Checker in Python



This post is going to talk about three different packages for coding a spell checker in Python – pyspellchecker, TextBlob, and autocorrect.

pyspellchecker

The pyspellchecker package allows you to perform spelling corrections, as well as see candidate spellings for a misspelled word. To install the package, you can use pip:

 pip install pyspellchecker 

Once installed, pyspellchecker is really straightforward to use. Note that even though we use “pyspellchecker” when installing via pip, we type “spellchecker” in the import statement. The first step is to create a SpellChecker object, which we’ll call “spell”.

 from spellchecker import SpellChecker
 spell = SpellChecker()

Now, we’re ready to test this out with a few misspellings. We’ll use a few words from this list of commonly misspelled words.

To attempt a correction, you can use the correction method:

 spell.correction("adress") # address 
 spell.correction("becuase") # because 
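According to its README, pyspellchecker follows Peter Norvig’s approach: it looks for permutations within an edit (Levenshtein) distance of 2 of the input and prefers the ones that appear most often in its word frequency list. To make “edit distance” concrete, here is a minimal sketch that computes plain Levenshtein distance with dynamic programming. It is an illustration only, not pyspellchecker’s own code; in particular, Norvig-style edits count swapping two adjacent letters as a single edit, while plain Levenshtein counts it as two.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn a into b (classic dynamic programming)."""
    # prev[j] holds the distance between the a-prefix processed so far and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("adress", "address"))   # 1 (one inserted "d")
print(levenshtein("becuase", "because"))  # 2 (the swapped "ua" costs two substitutions)
```

Both example misspellings are well within the distance of 2 that pyspellchecker searches.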

pyspellchecker also has a method to split the words in a sentence.

 spell.split_words("this sentnce has misspelled werds") #['this', 'sentnce', 'has', 'misspelled', 'werds'] 

Once we have a list of the words in the sentence, we can loop over them (via a list comprehension) and correct each one with our SpellChecker object.

 words = spell.split_words("this sentnce has misspelled werds")
 [spell.correction(word) for word in words]
 # ['this', 'sentence', 'has', 'misspelled', 'words']

If you just want to flag which words are misspelled, you can pass a list of words to the unknown method. It returns a Python set of the potentially misspelled words.

 spell.unknown(["dilema", "column", "aquire"]) #{'aquire', 'dilema'} 
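Conceptually, flagging unknown words amounts to checking each word against a known vocabulary. The toy sketch below does this with a set difference. The tiny VOCABULARY set is a hypothetical stand-in for the much larger word frequency list a real package ships with, and lowercasing the input is an assumption of the sketch.

```python
# Hypothetical, tiny vocabulary; a real spell checker uses a large word list.
VOCABULARY = {"dilemma", "column", "acquire", "this", "sentence", "has"}


def unknown_words(words):
    """Return the set of (lowercased) words that are not in VOCABULARY."""
    return {w.lower() for w in words} - VOCABULARY


# Sorted only to make the printed order predictable; the result is a set.
print(sorted(unknown_words(["dilema", "column", "aquire"])))  # ['aquire', 'dilema']
```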

We can also see the candidate spellings for a misspelled word.

 spell.candidates("conceed") #{'concede', 'conceded'} 
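One way to generate candidates like these, in the spirit of the insertions, deletions, replacements, and transpositions that pyspellchecker’s README describes, is to build every string one edit away from the input and keep only the ones that are real words. In the sketch below, KNOWN_WORDS is a hypothetical mini dictionary; pyspellchecker also searches two edits away and uses a full word frequency list, so its candidate sets can differ from this sketch’s.

```python
import string

# Hypothetical mini dictionary standing in for a real word frequency list.
KNOWN_WORDS = {"concede", "conceded", "conceal", "address", "because"}


def edits1(word):
    """All strings exactly one edit away from word (lowercase a-z only)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def candidates(word):
    """The word itself if known, else the known words one edit away."""
    if word in KNOWN_WORDS:
        return {word}
    return edits1(word) & KNOWN_WORDS


print(sorted(candidates("conceed")))  # ['concede', 'conceded']
```

Here “concede” comes from transposing the final “ed”, and “conceded” from inserting a “d”; “conceal” is more than one edit away, so it is not returned.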

TextBlob

The powerful TextBlob library can also do spelling correction. To install TextBlob, we can use pip (note the all-lowercase package name):

 pip install textblob 

To use TextBlob’s spell-checking functionality, we just need to import the Word class. Then we can create a Word and check its spelling with the spellcheck method, like below.

 from textblob import Word
 word = Word('percieve')
 word.spellcheck()  # [('perceive', 1.0)]

As can be seen above, TextBlob returns a list of (word, confidence) pairs: each pair holds a suggested correction and the confidence score associated with it. In this case, we get just one suggestion back, with a confidence of 1.0, or 100%.

Let’s try another word that returns multiple possibilities. If we input the string “personell”, we get a list of possible corrections with confidence scores because this string is fairly similar in spelling to a few different words.

 word = Word('personell')
 word.spellcheck()
 # [('personal', 0.65),
 #  ('personally', 0.2642857142857143),
 #  ('peroneal', 0.06428571428571428),
 #  ('personnel', 0.014285714285714285),
 #  ('personen', 0.007142857142857143)]
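Because spellcheck() returns a list of (word, confidence) pairs, choosing a correction comes down to taking the highest-scoring pair. The sketch below copies the “personell” output shown above into a plain Python list, so it runs without TextBlob installed. It also checks that those scores add up to 1, which suggests they rank the candidates relative to one another rather than measuring how likely the top word is to be right.

```python
# Output of Word('personell').spellcheck() as shown above, copied by hand.
suggestions = [
    ("personal", 0.65),
    ("personally", 0.2642857142857143),
    ("peroneal", 0.06428571428571428),
    ("personnel", 0.014285714285714285),
    ("personen", 0.007142857142857143),
]

# Pick the pair with the highest confidence score.
best_word, best_score = max(suggestions, key=lambda pair: pair[1])
print(best_word, best_score)  # personal 0.65

# The scores sum to 1 (up to floating-point rounding).
total = sum(score for _, score in suggestions)
print(round(total, 6))  # 1.0
```

Note that the likely intended word, “personnel”, only ranks fourth here, so blindly taking the top suggestion is not always safe.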

According to its documentation, TextBlob’s spelling correction feature is about 70% accurate.
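To check an accuracy figure like this on your own data, one approach is to run a corrector over a list of (misspelling, intended word) pairs and count the exact hits. In the sketch below, correct can be any function that maps a string to a string (for example, a thin wrapper around one of the packages in this post); the lookup-table corrector toy_correct and the labeled pairs are hypothetical.

```python
def accuracy(correct, pairs):
    """Fraction of (misspelled, intended) pairs that correct() fixes exactly."""
    if not pairs:
        raise ValueError("need at least one test pair")
    hits = sum(1 for wrong, right in pairs if correct(wrong) == right)
    return hits / len(pairs)


# Hypothetical stand-in corrector: a fixed lookup table; unknown words pass through.
TOY_FIXES = {"adress": "address", "becuase": "because", "liberry": "liberty"}


def toy_correct(word):
    return TOY_FIXES.get(word, word)


# Hypothetical labeled test set; "liberry" was meant to be "library".
pairs = [("adress", "address"), ("becuase", "because"), ("liberry", "library")]
print(accuracy(toy_correct, pairs))  # 0.6666666666666666
```

The larger and more varied the labeled list, the more meaningful the resulting percentage.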

autocorrect

The last package we’ll examine is called autocorrect. Again, we can install this package with pip:

 pip install autocorrect 

Once installed, we’ll import the Speller class from autocorrect. Then we’ll create a Speller object for the English language (lang='en'), which we’ll use to do spelling corrections.

 from autocorrect import Speller
 check = Speller(lang='en')

Next, we can pass a sentence to our object, and it will attempt to correct any misspellings.

 check("does this sentece have misspelled wordz?") # 'does this sentence have misspelled words?' 
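Note that the corrected sentence above keeps its spacing and the trailing question mark. A sentence-level corrector has to fix words while leaving punctuation alone; one way to sketch that, assuming you already have a word-level correction function, is to substitute only runs of letters with re.sub. The FIXES table and fix_word helper below are hypothetical stand-ins, not autocorrect’s internals.

```python
import re

# Hypothetical word-level corrections standing in for a real spell checker.
FIXES = {"sentece": "sentence", "wordz": "words"}


def fix_word(word):
    """Return the corrected word, or the word unchanged if no fix is known."""
    return FIXES.get(word.lower(), word)


def correct_sentence(sentence):
    """Correct each run of letters in place; punctuation and spaces are kept."""
    return re.sub(r"[A-Za-z]+", lambda m: fix_word(m.group(0)), sentence)


print(correct_sentence("does this sentece have misspelled wordz?"))
# does this sentence have misspelled words?
```

The letters-only pattern is a simplification: it would split a contraction such as “don’t” into two pieces.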

A few caveats

It’s important to keep in mind that no programmatic spell checker is perfect. Python has several ready-made options, such as the ones described above, and you could also build your own using fuzzy matching. Also, words seen out of context are harder to correct when the misspelled string is similar to several real words. For example, take the string “liberry”. This is a known misspelling of “library”, but it is also just one letter off from “liberty”.
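As a starting point for the build-your-own route, Python’s standard library includes difflib.get_close_matches, which returns words whose similarity ratio to the input clears a cutoff (0.6 by default). The sketch below uses it with a tiny hypothetical word list and also prints the similarity scores for “liberry”, which shows the problem described above: “library” and “liberty” score exactly the same.

```python
from difflib import SequenceMatcher, get_close_matches

# Hypothetical word list; a real checker would load a full dictionary.
WORDS = ["library", "liberty", "column"]

# Keep words whose similarity ratio to "liberry" is at least the default cutoff.
matches = get_close_matches("liberry", WORDS)
print(sorted(matches))  # ['liberty', 'library']

# The two candidates are tied on similarity, so spelling alone cannot decide.
for word in ("library", "liberty"):
    print(word, round(SequenceMatcher(None, "liberry", word).ratio(), 3))
# library 0.857
# liberty 0.857
```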

If we use any of the packages above, we get the word “liberty” back. That is not illogical, since the strings are very close in spelling, but context could help reveal which word makes the most sense. For building a contextual spell checker in Python, you might want to look into recurrent neural networks or Markov models.

 spell.correction("liberry")  # 'liberty' (pyspellchecker)
 word = Word("liberry")
 word.spellcheck()  # top suggestion: 'liberty' (TextBlob)
 check("liberry")  # 'liberty' (autocorrect)
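To give a flavor of how context could break a tie like this, here is a toy bigram (first-order Markov) scorer: among candidate corrections, it prefers the one that most often follows the previous word. The BIGRAM_COUNTS table is invented purely for illustration; a real model would estimate such counts from a large text corpus.

```python
# Invented bigram counts: how often the second word follows the first.
BIGRAM_COUNTS = {
    ("public", "library"): 50,
    ("public", "liberty"): 2,
    ("personal", "liberty"): 30,
    ("personal", "library"): 5,
}


def pick_in_context(previous_word, candidates):
    """Choose the candidate that most often follows previous_word.

    Unseen pairs count as 0; on a tie, max() keeps the earlier candidate.
    """
    return max(candidates, key=lambda w: BIGRAM_COUNTS.get((previous_word, w), 0))


candidates = ["liberty", "library"]  # e.g. suggestions from a spell checker
print(pick_in_context("public", candidates))    # library
print(pick_in_context("personal", candidates))  # liberty
```

Even this crude model picks “library” after “public”, something none of the context-free checkers above can do.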

That’s all for this post!