.. _quickstart: Quickstart =============================================================================== ``pyspellchecker`` is designed to be easy to use to get basic spell checking. Installation +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ The best experience is likely to use ``pip``: .. code:: bash pip install pyspellchecker If you are using virtual environments, it is recommended to use ``pipenv`` to combine pip and virtual environments: .. code:: bash pipenv install pyspellchecker Read more about `Pipenv `__ Basic Usage +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Setting up the spell checker requires importing and initializing the instance. .. code:: python from spellchecker import SpellChecker spell = SpellChecker() There are several methods to determine if a word is in the word frequency list: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell['morning'] # True 'morning' in spell # True # find those words from a list of words that are found in the dictionary spell.known(['morning', 'hapenning']) # {'morning'} # find those words from a list of words that are not found in the dictionary spell.unknown(['morning', 'hapenning']) # {'hapenning'} Once a word is identified as misspelled, you can find the likeliest replacement: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() misspelled = spell.unknown(['morning', 'hapenning']) # {'hapenning'} for word in misspelled: spell.correction(word) # 'happening' If using a set of long words that is taking a long time to process corrections then the Levenshtein distance can be set to 1. The default, is 2. .. code:: python from spellchecker import SpellChecker spell = SpellChecker(distance=1) # set the Levenshtein Distance parameter # do additional work # now for shorter words, we can revert to Levenshtein Distance of 2! spell.distance = 2 Or if the word identified as the likeliest is not correct, a list of candidates can also be pulled: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() misspelled = spell.unknown(['morning', 'hapenning']) # {'hapenning'} for word in misspelled: spell.correction(word) # {'penning', 'happening', 'henning'} Changing Language +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ To set the language of the dictionary to load, one must set the language parameter on initialization. .. code:: python from spellchecker import SpellChecker spell = SpellChecker(language='es') # Spanish dictionary print(spell['maƱana']) Multiple Languages ------------------------------------------------------------------------------- If you would like to check multiple default languages, it is possible to pass a list of language identifiers to the constructor to load each: .. code:: python from spellchecker import SpellChecker spell = SpellChecker(language=['es', 'en']) Adding and Removing Terms from a Dictionary +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ There are several ways to add additional terms to your word frequency dictionary including by filepath, string of text, or by a list of words. To load a pre-defined dictionary file (either as a json file or a gzipped json file): .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.load_dictionary('./path-to-my-word-frequency.json') To load a text document that will be parsed into individual words and each word added to the frequency list: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.load_text_file('./path-to-my-text-doc.txt') To load plain text from input or another source: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.load_text('Text to be parsed and added to the system') Or update using a list of words: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.load_words(['Text', 'to', 'be','added', 'to', 'the', 'system']) Or add a single word: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.add('Text') Removing words is as simple as adding words: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() spell.word_frequency.remove_words(['Text', 'to', 'be','removed', 'from', 'the', 'system']) # or remove a single word spell.word_frequency.remove('meh') Iterating Over a Dictionary +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Iterating over the dictionary is as easy as writing a simple for loop: .. code:: python from spellchecker import SpellChecker spell = SpellChecker() for word in spell: print("{}: {}".format(word, spell[word])) The iterator returns the word. To get the number of times that the word is found in the WordFrequency object one can use a simple lookup. How to Build a New Dictionary +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Building a custom or new language dictionary is relatively straight forward. To begin, you will need to have either a word frequency list or text files that represent the usage of the terms. Since ``pyspellchecker`` uses word frequency, it is better to have the most common words have higher frequencies! Once you have the corpus, code similar to the following should build out the dictionary: .. code:: python from spellchecker import SpellChecker # turn off loading a built language dictionary, case sensitive on (if desired) spell = SpellChecker(language=None, case_sensitive=True) # if you have a dictionary... spell.word_frequency.load_dictionary('./path-to-my-json-dictionary.json') # or... if you have text spell.word_frequency.load_text_file('./path-to-my-text-doc.txt') # export it out for later use! spell.export('my_custom_dictionary.gz', gzipped=True) It is also possible to build a dictionary from other sources outside of ``pyspellchecker``, it requires that the data be in the following format and saved as a json object: .. code:: python { "a": 1, "b": 2, "apple": 45, "bike": 60 } Note that the data does not need to be sorted! A quick, command line spell checking program +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ Setting up a quick and easy command line program using ``pyspellchecker`` is straight forward: .. code:: python from spellchecker import SpellChecker # could add command line arguments to set the parameters of the spell # check class; setup what type of information to present back, etc. spell = SpellChecker() print("To exit, hit return without input!") while True: word = input('Input a word to spell check: ') if word == '': # not sure, but need a way to kill the program... break word = word.lower() if word in spell: print("'{}' is spelled correctly!".format(word)) else: cor = spell.correction(word) print("The best spelling for '{}' is '{}'".format(word, cor)) print("If that is not enough; here are all possible candidate words:") print(spell.candidates(word)) Using with PyInstaller +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ It is possible to use ``pyspellchecker`` with tools such as PyInstaller to add spell-checking to your executable program. To do so, you will need to add the required dictionaries to the executable. You will need to add the files to a folder in your executable called **spellchecker/resources/** to match the location that ``pyspellchecker`` checks for the supported dictionaries. .. code:: bash pyinstaller --add-binary="spellchecker/resources/en.json.gz:spellchecker/resources" my_prog.py On windows one should use a semi-colon instead of the colon: .. code:: bash pyinstaller --add-binary="spellchecker/resources/en.json.gz;spellchecker/resources" my_prog.py