Given a large, heterogeneous table I wanted to identify patterns in the rows in terms of which columns were not null.
Imagine a table like this for example:
id field1 field2 field3 field4
I want to know how many records have values for (id, field1, field2) compared to the number of rows with values for (id, field3, field4).
I worked out the following query pattern for answering this question:
select case when [pk] is not null then 'pk, ' else '' end || case when [created] is not null then 'created, ' else '' end || case when [planet_int] is not null then 'planet_int, ' else '' end || case when [on_earth] is not null then 'on_earth, ' else '' end || case when [state] is not null then 'state, ' else '' end || case when [city_id] is not null then 'city_id, ' else '' end || case when [neighborhood] is not null then 'neighborhood, ' else '' end || case when [tags] is not null then 'tags, ' else '' end || case when [complex_array] is not null then 'complex_array, ' else '' end || case when [distinct_some_null] is not null then 'distinct_some_null, ' else '' end as columns, count(*) as num_rows from [facetable] group by columns order by num_rows desc
Try that here
This has the desired effect: it gives me back all of the combinations of not-null columns in the table, with a count for each one.
(Running this on a table with 1,000,000+ rows took about 40 seconds, so I had to use
datasette data.db --setting sql_time_limit_ms 100000 to bump up the default time limit in Datasette.)
One remaining problem: how to generate the above query for an arbitrary table. I came up with the following SQL query for generating a SQL query like the above:
select 'select ' || group_concat(' case when [' || name || '] is not null then ' || quote(name || ', ') || ' else '''' end', ' || ') || ' as columns, count(*) as num_rows from [' || :table || '] group by columns order by num_rows desc' as query from pragma_table_info(:table)
Try that out
:table as an input and generates SQL which can be used to generate column-combination counts.
Created 2021-01-12T14:00:51-08:00 · Edit