pyspellchecker API

Here you can find the full developer API for the pyspellchecker project. pyspellchecker provides a library for determining if a word is misspelled and what the likely correct spelling would be based on word frequency.

SpellChecker

class spellchecker.SpellChecker(language: Union[str, Iterable[str]] = 'en', local_dictionary: Union[pathlib.Path, str, None] = None, distance: int = 2, tokenizer: Optional[Callable[[str], Iterable[str]]] = None, case_sensitive: bool = False)

The SpellChecker class encapsulates the basics needed for a simple spell-checking algorithm. It is based on the work of Peter Norvig (https://norvig.com/spell-correct.html).

Parameters:
  • language (str) – The language of the dictionary to load, or None for no dictionary. Supported languages are en, es, de, fr, pt, ru, lv, and eu. Defaults to en. A list of languages may be provided, in which case all of them are loaded.
  • local_dictionary (str) – The path to a locally stored word frequency dictionary; if provided, no language dictionary is loaded.
  • distance (int) – The edit distance to use. Defaults to 2.
  • tokenizer (function) – The function to use to tokenize a string. Defaults to None.
  • case_sensitive (bool) – Whether to use a case-sensitive dictionary; only available when not using a language dictionary. Defaults to False.

Note

Correcting words with a case-sensitive dictionary can be slow.

candidates(word: Union[str, bytes]) → Optional[Set[str]]

Generate possible spelling corrections for the provided word, up to an edit distance of two; larger edit distances are only computed when needed

Parameters:word (str) – The word for which to calculate candidate spellings
Returns:The set of words that are possible candidates or None if there are no candidates
Return type:set
correction(word: Union[str, bytes]) → Optional[str]

The most probable correct spelling for the word

Parameters:word (str) – The word to correct
Returns:The most likely candidate or None if no correction is present
Return type:str
distance

The maximum edit distance to calculate

Note

Valid values are 1 and 2; if any other value is passed, the distance defaults to 2

Type:int
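
The fallback rule in the note above can be illustrated with a small standalone helper. normalize_distance is a hypothetical name, and treating non-numeric input as invalid is an assumption of this sketch rather than documented behaviour.

```python
def normalize_distance(value):
    """Return value if it is 1 or 2, otherwise fall back to 2 (hypothetical helper)."""
    try:
        candidate = int(value)
    except (TypeError, ValueError):
        # Assumption: anything that cannot be read as an integer is invalid.
        return 2
    return candidate if candidate in (1, 2) else 2


print(normalize_distance(1), normalize_distance(5), normalize_distance("x"))
```
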
edit_distance_1(word: Union[str, bytes]) → Set[str]

Compute all strings that are one edit away from word using only the letters in the corpus

Parameters:word (str) – The word for which to calculate the edit distance
Returns:The set of strings that are edit distance one from the provided word
Return type:set
edit_distance_2(word: Union[str, bytes]) → List[str]

Compute all strings that are two edits away from word using only the letters in the corpus

Parameters:word (str) – The word for which to calculate the edit distance
Returns:The list of strings that are edit distance two from the provided word
Return type:list
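
The two edit-distance methods above can be sketched with the standard library alone. The sketch follows the approach in the Peter Norvig essay cited earlier and is not the library's source; the function names edits1 and edits2 and the explicit letters argument are assumptions made for illustration. The letters argument stands in for "the letters in the corpus".

```python
def edits1(word, letters):
    """All strings one edit (delete, transpose, replace, insert) away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def edits2(word, letters):
    """All strings reachable by applying a single edit twice (duplicates kept)."""
    return [e2 for e1 in edits1(word, letters) for e2 in edits1(e1, letters)]


letters = set("abcdefghijklmnopqrstuvwxyz")  # assumed alphabet for this sketch
one_away = edits1("teh", letters)
two_away = edits2("teh", letters)

print("the" in one_away, "them" in two_away)
```

Restricting the alphabet to letters actually seen in the corpus keeps the candidate space small, which matters because the number of two-edit strings grows roughly with the square of the one-edit count.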
export(filepath: Union[pathlib.Path, str], encoding: str = 'utf-8', gzipped: bool = True) → None

Export the word frequency list for import in the future

Parameters:
  • filepath (str) – The filepath to the exported dictionary
  • encoding (str) – The encoding of the resulting output
  • gzipped (bool) – Whether to gzip the dictionary or not
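
To make the idea of a gzipped word frequency file concrete, here is a standard-library sketch of a round trip. It assumes the file is a JSON object mapping each word to its count, which matches the Dict[str, int] accepted by load_json and the "json (optionally gzipped)" file accepted by load_dictionary; the exact layout written by export is not specified in this reference. The file name and data are hypothetical.

```python
import gzip
import json
import os
import tempfile
from collections import Counter

counts = Counter({"hello": 2, "world": 1})  # illustrative data

path = os.path.join(tempfile.mkdtemp(), "freq.json.gz")  # hypothetical file name

# Write the mapping as gzip-compressed JSON text.
with gzip.open(path, "wt", encoding="utf-8") as handle:
    json.dump(dict(counts), handle)

# Read it back and rebuild the counter.
with gzip.open(path, "rt", encoding="utf-8") as handle:
    restored = Counter(json.load(handle))

print(restored == counts)
```
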
known(words: Iterable[Union[str, bytes]]) → Set[str]

The subset of words that appear in the dictionary of words

Parameters:words (list) – List of words to determine which are in the corpus
Returns:The set of those words from the input that are in the corpus
Return type:set
classmethod languages() → Iterable[str]

Returns:A list of all official languages supported by the library
Return type:list

split_words(text: Union[str, bytes]) → Iterable[str]

Split text into individual words using either a simple whitespace regex or the tokenizer passed to the constructor

Parameters:text (str) – The text to split into individual words
Returns:A listing of all words in the provided text
Return type:list(str)
unknown(words: Iterable[Union[str, bytes]]) → Set[str]

The subset of words that do not appear in the dictionary

Parameters:words (list) – List of words to determine which are not in the corpus
Returns:The set of those words from the input that are not in the corpus
Return type:set
word_frequency

An encapsulation of the word frequency dictionary

Note

Not settable

Type:WordFrequency
word_usage_frequency(word: Union[str, bytes], total_words: Optional[int] = None) → float

Calculate the frequency of the provided word as seen across the entire dictionary

Parameters:
  • word (str) – The word for which the word probability is calculated
  • total_words (int) – The total number of words to use in the calculation; leave as the default (None) to use the total from the whole word frequency list
Returns:The probability that the word is the correct word
Return type:float
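
The calculation described above can be sketched as a relative frequency: the word's count divided by a total. This is a standard-library illustration under that assumption, not the library's code; usage_frequency and the sample counts are hypothetical.

```python
from collections import Counter


def usage_frequency(counts, word, total_words=None):
    """Relative frequency of word; total_words defaults to the sum of all counts."""
    if total_words is None:
        total_words = sum(counts.values())
    return counts[word] / total_words


counts = Counter({"hello": 2, "help": 1, "world": 1})  # illustrative counts

whole = usage_frequency(counts, "hello")        # 2 occurrences out of 4
scaled = usage_frequency(counts, "hello", 10)   # 2 occurrences out of a supplied total of 10

print(whole, scaled)
```

Passing an explicit total is useful when comparing words against a denominator other than the current dictionary size, for example a total computed before some words were removed.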

WordFrequency

class spellchecker.WordFrequency(tokenizer=None, case_sensitive=False)

Store the dictionary as a word frequency list while allowing for different methods to load the data and update over time

add(word: Union[str, bytes], val: int = 1) → None

Add a word to the word frequency list

Parameters:
  • word (str) – The word to add
  • val (int) – The number of times to insert the word
dictionary

A counting dictionary of all words in the corpus and the number of times each has been seen

Note

Not settable

Type:Counter
items() → Generator[Tuple[str, int], None, None]

Iterator over the words in the dictionary and their counts

Yields:(str, int) – The next word in the dictionary and the number of instances of that word in the dictionary

Note

This is the same as dict.items()

keys() → Generator[str, None, None]

Iterator over the keys of the dictionary

Yields:str – The next key in the dictionary

Note

This is the same as WordFrequency.words()

letters

The listing of all letters found within the corpus

Note

Not settable

Type:set
load_dictionary(filename: Union[pathlib.Path, str], encoding: str = 'utf-8') → None

Load in a pre-built word frequency list

Parameters:
  • filename (str) – The filepath to the JSON (optionally gzipped) file to be loaded
  • encoding (str) – The encoding of the dictionary
load_json(data: Dict[str, int]) → None

Load in a pre-built word frequency list from a dictionary object

Parameters:data (dict) – The dictionary to be loaded
load_text(text: Union[str, bytes], tokenizer: Optional[Callable[[str], Iterable[str]]] = None) → None

Load text from which to generate a word frequency list

Parameters:
  • text (str) – The text to be loaded
  • tokenizer (function) – The function to use to tokenize a string
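
Per the signature above, a tokenizer is any callable that takes a string and returns an iterable of strings. The sketch below shows one such callable and how its output turns into word counts, using only the standard library. apostrophe_tokenizer and its regular expression are hypothetical, not the library's default tokenizer.

```python
import re
from collections import Counter


def apostrophe_tokenizer(text):
    """Hypothetical tokenizer: lowercase words, keeping internal apostrophes."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


text = "Don't panic. Don't stop."
tokens = apostrophe_tokenizer(text)

# Counting the tokens gives a word frequency mapping for the text.
counts = Counter(tokens)

print(tokens, counts)
```

A function with this shape could be passed as the tokenizer argument of load_text or load_text_file, or given to the WordFrequency constructor.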
load_text_file(filename: Union[pathlib.Path, str], encoding: str = 'utf-8', tokenizer: Optional[Callable[[str], Iterable[str]]] = None) → None

Load in a text file from which to generate a word frequency list

Parameters:
  • filename (str) – The filepath to the text file to be loaded
  • encoding (str) – The encoding of the text file
  • tokenizer (function) – The function to use to tokenize a string
load_words(words: Iterable[Union[str, bytes]]) → None

Load a list of words from which to generate a word frequency list

Parameters:words (list) – The list of words to be loaded
longest_word_length

The longest word length in the dictionary

Note

Not settable

Type:int
pop(key: Union[str, bytes], default: Optional[int] = None) → int

Remove the key and return the associated value, or default if the key is not found

Parameters:
  • key (str) – The key to remove
  • default (obj) – The value to return if key is not present
remove(word: Union[str, bytes]) → None

Remove a word from the word frequency list

Parameters:word (str) – The word to remove
remove_by_threshold(threshold: int = 5) → None

Remove all words at, or below, the provided threshold

Parameters:threshold (int) – The threshold at which a word is to be removed
remove_words(words: Iterable[Union[str, bytes]]) → None

Remove a list of words from the word frequency list

Parameters:words (list) – The list of words to remove
tokenize(text: Union[str, bytes]) → Generator[str, None, None]

Tokenize the provided string object into individual words

Parameters:text (str) – The string object to tokenize
Yields:str – The next word in the tokenized string

Note

This is the same as SpellChecker.split_words() unless a tokenizer function was provided.

total_words

The sum of all word occurrences in the word frequency dictionary

Note

Not settable

Type:int
unique_words

The total number of unique words in the word frequency list

Note

Not settable

Type:int
words() → Generator[str, None, None]

Iterator over the words in the dictionary

Yields:str – The next word in the dictionary

Note

This is the same as WordFrequency.keys()