I wanted to run a query against the GitHub GraphQL API using curl
on the command line, while keeping the query itself as readable as possible. Here's the recipe I came up with (tested in both bash
and zsh
), with TOKEN replaced by my GitHub API personal access token:
curl -s https://api.github.com/graphql -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d "$(jq -c -n --arg query '
{
search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
repositoryCount
nodes {
__typename
... on Repository {
nameWithOwner
description
defaultBranchRef {
name
target {
... on Commit {
committedDate
url
message
}
}
}
}
}
}
}' '{"query":$query}')"
As you can see, the GraphQL query itself is embedded in plain text inside a complex set of escaping tricks.
I needed to encode the query as part of a JSON document that looks like this:
{"query":"\n{\n search(type: REPOSITORY, query: \"user:simonw topic:git-scraping\", first: 100) {\n repositoryCount\n nodes {\n __typename\n ... on Repository {\n nameWithOwner\n description\n defaultBranchRef {\n name\n target {\n ... on Commit {\n committedDate\n url\n message\n }\n }\n }\n }\n }\n }\n}"}
The jq
recipe handles the construction of that document for me:
jq -c -n --arg query '
{
search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
repositoryCount
nodes {
__typename
... on Repository {
nameWithOwner
description
defaultBranchRef {
name
target {
... on Commit {
committedDate
url
message
}
}
}
}
}
}
}' '{"query":$query}'
jq -c
means "compact syntax" - so the JSON itself is produced as a single line.
The -n
option stands for --null-input
- which is described as:
Don´t read any input at all! Instead, the filter is run once using null as the input. This is useful when using
jq
as a simple calculator or to construct JSON data from scratch.
Then --arg query '...'
sets a variable within jq
to the string representing my GraphQL query.
Finally I evaluate the jq
expression '{"query":$query}'
which constructs the final document with my GraphQL query as the value for the "query"
key.
Having constructed the JSON document, I needed to pass it to the curl -d
option to submit it to the server.
The recipe for doing that is:
-d "$(jq -c -n --arg query ...)"
I tried doing this with -d $(jq ...)
first, and it didn't work - because whitespace inside the substition was treated as separate tokens passed to curl
.
Adding the wrapping double quotes caused the substition result to be treated as a single value.
I was worried that double quotes within the string itself would break out of the pattern, but this Stackoverflow answer reassured me otherwise:
Once one is inside
$(...)
, quoting starts all over from scratch.In other words,
"..."
and$(...)
can nest within each other. Command substitution,$(...)
, can contain one or more complete double-quoted strings.
The resulting (but truncated) JSON from the GraphQL query looks like this:
{
"data": {
"search": {
"repositoryCount": 22,
"nodes": [
{
"__typename": "Repository",
"nameWithOwner": "simonw/csv-diff",
"description": "Python CLI tool and library for diffing CSV and JSON files",
"defaultBranchRef": {
"name": "main",
"target": {
"committedDate": "2021-02-23T02:53:11Z",
"url": "https://github.com/simonw/csv-diff/commit/33e0a5918283c02a339a1fb507fc7a9cda89a198",
"message": "Handle missing JSON keys, refs #13"
}
}
},
My end goal was to create a SQLite database with a record for each of my GitHub repositories that were tagged git-scraping
that included the date of their most recent commit. Here's how I did that:
curl https://api.github.com/graphql -X POST \
-H "Authorization: Bearer ..." \
-H "Content-Type: application/json" \
-d "$(jq -c -n --arg query '
{
search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
repositoryCount
nodes {
__typename
... on Repository {
nameWithOwner
description
defaultBranchRef {
name
target {
... on Commit {
committedDate
url
message
}
}
}
}
}
}
}' '{"query":$query}')" \
| jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten
The line doing the work at the end is:
| jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten
This uses jq
to pull out the {"data": {"search": {"nodes": [...]
array from the returned JSON, then pipes that into sqlite-utils insert.
This line does the rest:
sqlite-utils insert /tmp/github.db repos - --flatten
That reads from standard input (-
) and creates a repos
table in the new github.db
SQLite file.
The --flatten
option at the end ensures that nested fields such as {"defaultBranchRef": {"target": {"committedDate": ...
are flattened to columns with names like defaultBranchRef_target_committedDate
.
The final table schema looks like this:
% sqlite-utils schema /tmp/github.db
CREATE TABLE [repos] (
[__typename] TEXT,
[nameWithOwner] TEXT,
[description] TEXT,
[defaultBranchRef_name] TEXT,
[defaultBranchRef_target_committedDate] TEXT,
[defaultBranchRef_target_url] TEXT,
[defaultBranchRef_target_message] TEXT
);
Created 2022-02-21T16:11:12-08:00, updated 2023-04-30T22:17:01-07:00 · History · Edit