Today I discovered codespell via this Rich commit. codespell is a really simple spell checker that can be run locally or incorporated into a CI flow.
codespell is designed to run against source code. Rather than using a dictionary of correctly spelled words, it uses a dictionary of known common spelling mistakes, derived from English Wikipedia (defined here). This makes it less likely to be confused by variable or function names, while still being able to spot spelling mistakes in comments.
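To make the "dictionary of mistakes" idea concrete, here is a minimal Python sketch of the approach. It is not codespell's actual implementation: the `KNOWN_MISTAKES` table and the `find_mistakes` function are hypothetical names, and the three entries are simply borrowed from the output shown later in this post.

```python
import re

# Hypothetical, tiny stand-in for a dictionary of known misspellings.
# A real tool would load thousands of entries from a data file.
KNOWN_MISTAKES = {
    "perfom": "perform",
    "databse": "database",
    "similiar": "similar",
}


def find_mistakes(text):
    """Yield (line_number, wrong_word, suggestion) for each known misspelling."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        # Split on anything that is not a letter, so identifiers such as
        # get_db are checked piece by piece.
        for word in re.findall(r"[A-Za-z]+", line):
            suggestion = KNOWN_MISTAKES.get(word.lower())
            if suggestion is not None:
                yield line_number, word, suggestion


source = "conn = get_db()  # perfom the databse lookup"
results = list(find_mistakes(source))
for line_number, wrong, suggestion in results:
    print(f"{line_number}: {wrong} ==> {suggestion}")
```

Names like `conn` and `db` are not real English words, but they are not in the table of mistakes either, so they pass silently. A checker built on a dictionary of correct words would have flagged both.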
Basic usage:
pip install codespell
codespell
# Or point it at a folder, or files in that folder:
codespell docs/*.rst
This outputs any spelling errors it finds in those files. I got this the first time I ran it against the Datasette documentation:
docs/authentication.rst:63: perfom ==> perform
docs/authentication.rst:76: perfom ==> perform
docs/changelog.rst:429: repsonse ==> response
docs/changelog.rst:503: permissons ==> permissions
docs/changelog.rst:717: compatibilty ==> compatibility
docs/changelog.rst:1172: browseable ==> browsable
docs/deploying.rst:191: similiar ==> similar
docs/internals.rst:434: Respons ==> Response, respond
docs/internals.rst:440: Respons ==> Response, respond
docs/internals.rst:717: tha ==> than, that, the
docs/performance.rst:42: databse ==> database
docs/plugin_hooks.rst:667: utilites ==> utilities
docs/publish.rst:168: countainer ==> container
docs/settings.rst:352: inalid ==> invalid
docs/sql_queries.rst:406: preceeded ==> preceded, proceeded
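Each line of that output follows the same `path:line: mistake ==> suggestions` shape, so it is easy to post-process. The following is a rough sketch, assuming the format stays exactly as shown above; `parse_report` and `LINE_RE` are names I made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "docs/internals.rst:717: tha ==> than, that, the"
LINE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): (?P<wrong>\S+) ==> (?P<fixes>.+)$"
)


def parse_report(report):
    """Turn codespell-style output into a list of dictionaries."""
    findings = []
    for raw in report.splitlines():
        match = LINE_RE.match(raw.strip())
        if match is None:
            continue  # skip anything that does not look like a finding
        findings.append(
            {
                "path": match["path"],
                "line": int(match["line"]),
                "wrong": match["wrong"],
                "suggestions": [s.strip() for s in match["fixes"].split(",")],
            }
        )
    return findings


sample = """docs/authentication.rst:63: perfom ==> perform
docs/internals.rst:717: tha ==> than, that, the
docs/sql_queries.rst:406: preceeded ==> preceded, proceeded"""

findings = parse_report(sample)
per_file = Counter(finding["path"] for finding in findings)
```

Note that some mistakes come with several suggestions, which is why `suggestions` is a list rather than a single string.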
You can create a file of additional words that it should ignore and pass that using the --ignore-words option:
codespell docs/*.rst --ignore-words docs/codespell-ignore-words.txt
Since I don't have any words in that file yet, I added one fake word, so my file looks like this:
AddWordsToIgnoreHere
Each ignored word should be on a separate line.
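If the list of ignored words grows, a small helper can keep the file tidy. This is only a sketch: `write_ignore_words` is a hypothetical helper of my own, not part of codespell, and the demo writes to a temporary directory rather than the real `docs/` folder.

```python
import tempfile
from pathlib import Path


def write_ignore_words(path, words):
    """Write unique, non-empty words to path, one per line, and return them."""
    unique = sorted({word.strip() for word in words if word.strip()})
    Path(path).write_text(
        "".join(f"{word}\n" for word in unique), encoding="utf-8"
    )
    return unique


# Demo in a temporary directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "codespell-ignore-words.txt"
    written = write_ignore_words(
        target, ["AddWordsToIgnoreHere", "", "AddWordsToIgnoreHere"]
    )
    contents = target.read_text(encoding="utf-8")
```

Duplicates and blank entries are dropped, so the resulting file keeps to the one-word-per-line layout.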
I added it to my GitHub Actions CI like this:
name: Check spelling in documentation
on: [push, pull_request]
jobs:
  spellcheck:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    # No build matrix is defined in this workflow, so the version is spelled out
    - name: Set up Python 3.9
      uses: actions/setup-python@v2
      with:
        python-version: "3.9"
    - uses: actions/cache@v2
      name: Configure pip caching
      with:
        path: ~/.cache/pip
        key: ${{ runner.os }}-pip-spellcheck
        restore-keys: |
          ${{ runner.os }}-pip-spellcheck
    - name: Install dependencies
      run: |
        pip install codespell
    - name: Check spelling
      run: codespell docs/*.rst --ignore-words docs/codespell-ignore-words.txt
Now any push or pull request will have the spell checker applied to it, and will fail if any new incorrectly spelled words are detected.
Here's the full PR where I added this to Datasette, and the commit where I added this to sqlite-utils.
Created 2021-08-03T09:34:02-07:00, updated 2021-08-03T10:09:50-07:00