Check spelling using codespell

Today I discovered codespell via this Rich commit. codespell is a really simple spell checker that can be run locally or incorporated into a CI flow.

codespell is designed to run against source code. Rather than using a dictionary of correctly spelled words, it uses a dictionary of known common spelling mistakes, derived from English Wikipedia (defined here). This makes it less likely to be confused by variable or function names, while still being able to spot spelling mistakes in comments.
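
To make the idea concrete, here's a toy Python sketch of a checker driven by a table of known misspellings. This is not codespell's implementation: the MISSPELLINGS table and the check_text function are hypothetical, and the table holds just three entries borrowed from the output further down.

```python
import re

# Hypothetical, hand-picked misspelling table (not codespell's real data file).
# Keys are known mistakes; values are the suggested corrections.
MISSPELLINGS = {
    "perfom": ["perform"],
    "databse": ["database"],
    "preceeded": ["preceded", "proceeded"],
}

# Split on anything that is not a letter, so "calc_totl" becomes "calc", "totl".
WORD_RE = re.compile(r"[A-Za-z]+")


def check_text(path, text, misspellings=MISSPELLINGS):
    """Return codespell-style 'path:line: typo ==> fixes' lines."""
    reports = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group()
            fixes = misspellings.get(word.lower())
            if not fixes:
                continue  # words missing from the table (identifiers included) are never flagged
            if word[0].isupper():
                # Match the case of the original word in the suggestions
                fixes = [fix.capitalize() for fix in fixes]
            reports.append(f"{path}:{lineno}: {word} ==> {', '.join(fixes)}")
    return reports


sample = "Call calc_totl(x) first.\nWe perfom the query against the databse.\n"
for report in check_text("docs/example.rst", sample):
    print(report)
```

Only the two words on line 2 get reported. calc_totl passes silently because nothing in the table matches it, whereas a checker built on a dictionary of correct words would complain about "calc" and "totl".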

Basic usage:

pip install codespell
codespell
# Or point it at a folder, or files in that folder:
codespell docs/*.rst

This outputs any spelling errors it finds in those files. I got this the first time I ran it against the Datasette documentation:

docs/authentication.rst:63: perfom ==> perform
docs/authentication.rst:76: perfom ==> perform
docs/changelog.rst:429: repsonse ==> response
docs/changelog.rst:503: permissons ==> permissions
docs/changelog.rst:717: compatibilty ==> compatibility
docs/changelog.rst:1172: browseable ==> browsable
docs/deploying.rst:191: similiar ==> similar
docs/internals.rst:434: Respons ==> Response, respond
docs/internals.rst:440: Respons ==> Response, respond
docs/internals.rst:717: tha ==> than, that, the
docs/performance.rst:42: databse ==> database
docs/plugin_hooks.rst:667: utilites ==> utilities
docs/publish.rst:168: countainer ==> container
docs/settings.rst:352: inalid ==> invalid
docs/sql_queries.rst:406: preceeded ==> preceded, proceeded
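
Each of those lines follows a path:line: typo ==> suggestions shape, with multiple suggestions separated by commas. If you wanted to post-process the report (say, to count typos per file), a minimal Python parser might look like this. The regular expression is inferred from the sample above rather than from codespell's documentation, and the Finding and parse_report names are hypothetical.

```python
import re
from collections import Counter
from typing import List, NamedTuple

# Line shape inferred from the sample output above:
#   path:line: typo ==> suggestion[, suggestion...]
REPORT_RE = re.compile(r"^(?P<path>.+?):(?P<line>\d+): (?P<typo>\S+) ==> (?P<fixes>.+)$")


class Finding(NamedTuple):
    path: str
    line: int
    typo: str
    fixes: List[str]


def parse_report(output: str) -> List[Finding]:
    """Turn codespell-style report text into Finding tuples."""
    findings = []
    for raw in output.splitlines():
        match = REPORT_RE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        fixes = [fix.strip() for fix in match["fixes"].split(",")]
        findings.append(Finding(match["path"], int(match["line"]), match["typo"], fixes))
    return findings


sample = """\
docs/authentication.rst:63: perfom ==> perform
docs/authentication.rst:76: perfom ==> perform
docs/internals.rst:717: tha ==> than, that, the
"""
per_file = Counter(f.path for f in parse_report(sample))
print(per_file.most_common())
# [('docs/authentication.rst', 2), ('docs/internals.rst', 1)]
```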

You can create a file of additional words that it should ignore and pass that using the --ignore-words option:

codespell docs/*.rst --ignore-words docs/codespell-ignore-words.txt

Since I don't have any words in that file yet, I added one fake word, so my file looks like this:

AddWordsToIgnoreHere

Each ignored word should be on a separate line.
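
To illustrate that one-word-per-line format, here's a minimal Python sketch of how such a file could be parsed and applied to a list of (path, line, word) findings. The function names are hypothetical, "browseable" is just an example entry, and the exact, case-sensitive matching is a simplifying assumption; codespell's own rules for matching ignored words may differ.

```python
def load_ignore_words(text):
    """Parse an ignore-words file: one word per line, blank lines skipped."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def drop_ignored(findings, ignore_words):
    """Return the (path, line, word) findings whose word is not ignored.

    Exact, case-sensitive matching is an assumption made for this sketch.
    """
    return [finding for finding in findings if finding[2] not in ignore_words]


ignore_file = "AddWordsToIgnoreHere\n\nbrowseable\n"
findings = [
    ("docs/changelog.rst", 1172, "browseable"),
    ("docs/deploying.rst", 191, "similiar"),
]
print(drop_ignored(findings, load_ignore_words(ignore_file)))
# [('docs/deploying.rst', 191, 'similiar')]
```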

I added it to my GitHub Actions CI like this:

name: Check spelling in documentation

on: [push, pull_request]

jobs:
  spellcheck:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - name: Set up Python 3.9
      uses: actions/setup-python@v2
      with:
        python-version: "3.9"
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-spellcheck
        restore-keys: |
          ${{ runner.os }}-pip-spellcheck
    - name: Install dependencies
      run: |
        pip install codespell
    - name: Check spelling
      run: codespell docs/*.rst --ignore-words docs/codespell-ignore-words.txt

Now any push or pull request will have the spell checker applied to it, and will fail if any new incorrectly spelled words are detected.

Here's the full PR where I added this to Datasette, and the commit where I added this to sqlite-utils.

Created 2021-08-03T09:34:02-07:00, updated 2021-08-03T10:09:50-07:00