Today I added "related TILs" to this TIL website - so each TIL now shows five related TILs at the bottom of the page.
I'm generating the related content using a SQLite full-text search query.
UPDATE: I switched this out for a different implementation, described in Storing and serving related documents with openai-to-sqlite and embeddings.
I take the title and body text of an entry, strip out any non-alphanumeric characters, de-dupe the words, and then combine them to form a big OR search query.
I built an initial prototype of this using an Observable notebook that generated these queries from pasted text. See also my research issue.
Here's a simple version of that query, using some example words pulled from this entry. It executes against til_fts
which is a SQLite FTS table created using the .enable_fts()
method of the sqlite-utils Python library.
select title, rank from til_fts where til_fts match '
i OR wanted OR to OR run OR some OR django OR tests OR using OR
pytestdjango OR and OR with OR configured OR pick OR up OR the
OR databaseurl OR environment OR variable OR via OR djdatabaseurl
OR against OR a OR postgresql OR server OR running OR in OR
github OR actions OR it OR took OR while OR figure OR out OR
right OR pattern OR trick OR was OR define OR postgres OR service'
order by rank limit 5
Here are the results from that query. Unsurprisingly the entry itself shows up first, but the other items look relevant enough to me:
title | rank |
---|---|
Running tests against PostgreSQL in a service container | -61.04335068286244 |
Talking to a PostgreSQL service container from inside a Docker container | -37.54518907167069 |
Allowing a container in Docker Desktop for Mac to talk to a PostgreSQL server on the host machine | -29.712660785491842 |
Installing different PostgreSQL server versions in GitHub Actions | -27.738649522063493 |
Docker Compose for Django development | -25.027695483498224 |
The key trick here is the order by rank
- SQLite FTS has a robust scoring mechanism built in, and it turns out it's good enough that simply feeding in all of the words from the entry and sorting by that rank provides good-enough related content results on its own.
In order to display related content I need to do two additional things: I need to filter out the current entry, and I need to join against the til
table in order to retrieve the information I want to display.
Here's the finished query:
select
til.topic, til.slug, til.title, til.created
from
til
join til_fts on til.rowid = til_fts.rowid
where
til_fts match :words
and not (
til.slug = :slug
and til.topic = :topic
)
order by
til_fts.rank
limit
5
And an example of it running, which returns the following:
topic | slug | title | created |
---|---|---|---|
github-actions | service-containers-docker | Talking to a PostgreSQL service container from inside a Docker container | 2020-09-18T11:43:19-07:00 |
docker | docker-for-mac-container-to-postgresql-on-host | Allowing a container in Docker Desktop for Mac to talk to a PostgreSQL server on the host machine | 2022-03-31T22:48:17-07:00 |
github-actions | different-postgresql-versions | Installing different PostgreSQL server versions in GitHub Actions | 2021-07-05T17:43:13-07:00 |
docker | docker-compose-for-django-development | Docker Compose for Django development | 2021-05-24T22:08:23-07:00 |
django | just-with-django | Using just with Django | 2022-06-06T14:24:37-07:00 |
I decided to implement this using a custom Jinja template function, added using a Datasette one-off plugin.
The custom function is called related_tils()
, and is called from the template like this:
{% set related = related_tils(til) %}
{% if related %}
<h3>Related</h3>
<ul class="related">
{% for til in related %}
<li><span class="topic">{{ til.topic }}</span> <a href="/{{ til.topic }}/{{ til.slug }}">{{ til.title }}</a> - {{ til.created[:10] }}</li>
{% endfor %}
</ul>
{% endif %}
Here's the implementation of that custom template function. It's registered using the extra_template_vars
Datasette plugin hook (described here).
The function itself is an inner async def
function, because it needs to be able to use await
in order to execute its SQL query.
Jinja automatically awaits this kind of function when the template is rendered (see enable_async=True
in the Jinja documentation).
So this function takes a TIL entry, creates a de-duped list of words from the title and body, uses that to construct and execute the SQL query and returns the results:
from datasette import hookimpl
import re
non_alphanumeric = re.compile(r"[^a-zA-Z0-9\s]")
multi_spaces = re.compile(r"\s+")
@hookimpl
def extra_template_vars(request, datasette):
async def related_tils(til):
text = til["title"] + " " + til["body"]
text = non_alphanumeric.sub(" ", text)
text = multi_spaces.sub(" ", text)
words = list(set(text.lower().strip().split()))
sql = """
select
til.topic, til.slug, til.title, til.created
from
til
join til_fts on til.rowid = til_fts.rowid
where
til_fts match :words
and not (
til.slug = :slug
and til.topic = :topic
)
order by
til_fts.rank
limit
5
"""
result = await datasette.get_database().execute(
sql,
{"words": " OR ".join(words), "slug": til["slug"], "topic": til["topic"]},
)
return result.rows
return {
"related_tils": related_tils,
}
Here's the full code for that plugin, which also registers some other template functions.
Created 2022-07-31T17:53:23-07:00, updated 2023-08-14T23:23:40-07:00 · History · Edit