Quickstart
pyspellchecker
is designed to be easy to use to get basic spell checking.
Installation
The best experience is likely to use pip
:
pip install pyspellchecker
If you are using virtual environments, it is recommended to use pipenv
to
combine pip and virtual environments:
pipenv install pyspellchecker
Read more about Pipenv
Basic Usage
Setting up the spell checker requires importing and initializing the instance.
from spellchecker import SpellChecker
spell = SpellChecker()
There are several methods to determine if a word is in the word frequency list:
from spellchecker import SpellChecker
spell = SpellChecker()
spell['morning'] # True
'morning' in spell # True
# find those words from a list of words that are found in the dictionary
spell.known(['morning', 'hapenning']) # {'morning'}
# find those words from a list of words that are not found in the dictionary
spell.unknown(['morning', 'hapenning']) # {'hapenning'}
Once a word is identified as misspelled, you can find the likeliest replacement:
from spellchecker import SpellChecker
spell = SpellChecker()
misspelled = spell.unknown(['morning', 'hapenning']) # {'hapenning'}
for word in misspelled:
spell.correction(word) # 'happening'
If using a set of long words that is taking a long time to process corrections then the Levenshtein distance can be set to 1. The default, is 2.
from spellchecker import SpellChecker
spell = SpellChecker(distance=1) # set the Levenshtein Distance parameter
# do additional work
# now for shorter words, we can revert to Levenshtein Distance of 2!
spell.distance = 2
Or if the word identified as the likeliest is not correct, a list of candidates can also be pulled:
from spellchecker import SpellChecker
spell = SpellChecker()
misspelled = spell.unknown(['morning', 'hapenning']) # {'hapenning'}
for word in misspelled:
spell.correction(word) # {'penning', 'happening', 'henning'}
Changing Language
To set the language of the dictionary to load, one must set the language parameter on initialization.
from spellchecker import SpellChecker
spell = SpellChecker(language='es') # Spanish dictionary
print(spell['mañana'])
Multiple Languages
If you would like to check multiple default languages, it is possible to pass a list of language identifiers to the constructor to load each:
Adding and Removing Terms from a Dictionary
There are several ways to add additional terms to your word frequency dictionary including by filepath, string of text, or by a list of words.
To load a pre-defined dictionary file (either as a json file or a gzipped json file):
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.load_dictionary('./path-to-my-word-frequency.json')
To load a text document that will be parsed into individual words and each word added to the frequency list:
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.load_text_file('./path-to-my-text-doc.txt')
To load plain text from input or another source:
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.load_text('Text to be parsed and added to the system')
Or update using a list of words:
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.load_words(['Text', 'to', 'be','added', 'to', 'the', 'system'])
Or add a single word:
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.add('Text')
Removing words is as simple as adding words:
from spellchecker import SpellChecker
spell = SpellChecker()
spell.word_frequency.remove_words(['Text', 'to', 'be','removed', 'from', 'the', 'system'])
# or remove a single word
spell.word_frequency.remove('meh')
Iterating Over a Dictionary
Iterating over the dictionary is as easy as writing a simple for loop:
from spellchecker import SpellChecker
spell = SpellChecker()
for word in spell:
print("{}: {}".format(word, spell[word]))
The iterator returns the word. To get the number of times that the word is found in the WordFrequency object one can use a simple lookup.
How to Build a New Dictionary
Building a custom or new language dictionary is relatively straight forward. To
begin, you will need to have either a word frequency list or text files that
represent the usage of the terms. Since pyspellchecker
uses word frequency, it
is better to have the most common words have higher frequencies!
Once you have the corpus, code similar to the following should build out the dictionary:
from spellchecker import SpellChecker
# turn off loading a built language dictionary, case sensitive on (if desired)
spell = SpellChecker(language=None, case_sensitive=True)
# if you have a dictionary...
spell.word_frequency.load_dictionary('./path-to-my-json-dictionary.json')
# or... if you have text
spell.word_frequency.load_text_file('./path-to-my-text-doc.txt')
# export it out for later use!
spell.export('my_custom_dictionary.gz', gzipped=True)
It is also possible to build a dictionary from other sources outside of
pyspellchecker
, it requires that the data be in the following format and
saved as a json object:
{
"a": 1,
"b": 2,
"apple": 45,
"bike": 60
}
Note that the data does not need to be sorted!
A quick, command line spell checking program
Setting up a quick and easy command line program using pyspellchecker
is
straight forward:
from spellchecker import SpellChecker
# could add command line arguments to set the parameters of the spell
# check class; setup what type of information to present back, etc.
spell = SpellChecker()
print("To exit, hit return without input!")
while True:
word = input('Input a word to spell check: ')
if word == '': # not sure, but need a way to kill the program...
break
word = word.lower()
if word in spell:
print("'{}' is spelled correctly!".format(word))
else:
cor = spell.correction(word)
print("The best spelling for '{}' is '{}'".format(word, cor))
print("If that is not enough; here are all possible candidate words:")
print(spell.candidates(word))
Using with PyInstaller
It is possible to use pyspellchecker
with tools such as PyInstaller to add
spell-checking to your executable program. To do so, you will need to add the
required dictionaries to the executable.
You will need to add the files to a folder in your executable called spellchecker/resources/
to match the location that pyspellchecker
checks for the supported dictionaries.
pyinstaller --add-binary="spellchecker/resources/en.json.gz:spellchecker/resources" my_prog.py
On windows one should use a semi-colon instead of the colon:
pyinstaller --add-binary="spellchecker/resources/en.json.gz;spellchecker/resources" my_prog.py