Protocols in Python

Datasette currently has a few API internals that return sqlite3.Row objects. I was thinking about how this might work in the future - if Datasette ever expands beyond SQLite (plugin-provided backends for PostgreSQL and DuckDB for example) I'd want a way to return data from other stores using objects that behave like sqlite3.Row but are not exactly that class.

I thought about implementing my own wrapper class for sqlite3.Row, but one of its benefits is that it's written in C and hence should provide optimal memory usage and performance.

It looks like that's what typing.Protocol() is for.

Here's some code I put together (with initial assistance from both Claude and ChatGPT) to explore what that would look like:

from typing import Any, Dict, List, Protocol, Union
import sqlite3


class RowProtocol(Protocol):
    def keys(self) -> List[str]:
        ...

    def __getitem__(self, index: Union[int, str]) -> Any:
        ...


class MyRow:
    def __init__(self, data: Dict[str, Any]):
        self.data = data

    def keys(self) -> List[str]:
        return list(self.data.keys())

    def __getitem__(self, index: Union[int, str]) -> Any:
        if isinstance(index, int):
            key = self.keys()[index]
            return self.data.get(key)
        elif isinstance(index, str):
            return self.data.get(index)
        else:
            raise TypeError("Index must be either int or str.")


def get_rows() -> List[RowProtocol]:
    row1 = MyRow({"name": "Milo", "species": "cat"})

    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    row2 = conn.execute("select 'Cleo' as name, 'dog' as species").fetchone()

    return [row1, row2]


if __name__ == "__main__":
    rows = get_rows()
    for row in rows:
        # Uncomment this when running mypy:
        # reveal_type(row)
        print(row.keys(), row["name"])

This passes a mypy check. Running it demonstrates that the MyRow and sqlite3.Row objects can be treated equivalently.

Uncommenting reveal_type(row) causes mypy to print out the RowProtocol type while it is running.

The thing that surprised me about this at first is that I had expected I would need to "register" the types with the protocol in some way - but it turns out protocols really are just a formalization of Python's duck typing.

Effectively this code is saying "the objects returned by get_rows() should only be accessed via their .keys() and __getitem__() methods".

Which looks like exactly what I would need to implement my own alternative to sqlite3.Row in the future in a way that works neatly with Python type checking tools.

Conditional reveal_type

That reveal_type(row) line will raise an error if you run the code using python and not mypy. The fix for that looks like this:

from typing import TYPE_CHECKING

...

if TYPE_CHECKING:
    reveal_type(obj)

Elsewhere

Created 2023-07-26T08:24:47-07:00, updated 2023-07-26T14:20:20-07:00 · History · Edit