Using curl to run GraphQL queries from the command line

I wanted to run a query against the GitHub GraphQL API using curl on the command line, while keeping the query itself as readable as possible. Here's the recipe I came up with (tested in both bash and zsh), with TOKEN replaced by my GitHub API personal access token:

curl -s https://api.github.com/graphql -X POST \
-H "Authorization: Bearer TOKEN" \
-H "Content-Type: application/json" \
-d "$(jq -c -n --arg query '
{
  search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
    repositoryCount
    nodes {
      __typename
      ... on Repository {
        nameWithOwner
        description
        defaultBranchRef {
          name
          target {
            ... on Commit {
              committedDate
              url
              message
            }
          }
        }
      }
    }
  }
}' '{"query":$query}')"

As you can see, the GraphQL query itself is embedded in plain text inside a complex set of escaping tricks.

Building a JSON document with jq

I needed to encode the query as part of a JSON document that looks like this:

{"query":"\n{\n  search(type: REPOSITORY, query: \"user:simonw topic:git-scraping\", first: 100) {\n    repositoryCount\n    nodes {\n      __typename\n      ... on Repository {\n        nameWithOwner\n        description\n        defaultBranchRef {\n          name\n          target {\n            ... on Commit {\n              committedDate\n              url\n              message\n            }\n          }\n        }\n      }\n    }\n  }\n}"}

The jq recipe handles the construction of that document for me:

jq -c -n --arg query '
{
  search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
    repositoryCount
    nodes {
      __typename
      ... on Repository {
        nameWithOwner
        description
        defaultBranchRef {
          name
          target {
            ... on Commit {
              committedDate
              url
              message
            }
          }
        }
      }
    }
  }
}' '{"query":$query}'

jq -c means "compact syntax" - so the JSON itself is produced as a single line.

The -n option stands for --null-input - which is described as:

Don´t read any input at all! Instead, the filter is run once using null as the input. This is useful when using jq as a simple calculator or to construct JSON data from scratch.

Then --arg query '...' sets a variable within jq to the string representing my GraphQL query.

Finally I evaluate the jq expression '{"query":$query}' which constructs the final document with my GraphQL query as the value for the "query" key.

Passing that to curl with "$()"

Having constructed the JSON document, I needed to pass it to the curl -d option to submit it to the server.

The recipe for doing that is:

-d "$(jq -c -n --arg query ...)"

I tried doing this with -d $(jq ...) first, and it didn't work - because whitespace inside the substition was treated as separate tokens passed to curl.

Adding the wrapping double quotes caused the substition result to be treated as a single value.

I was worried that double quotes within the string itself would break out of the pattern, but this Stackoverflow answer reassured me otherwise:

Once one is inside $(...), quoting starts all over from scratch.

In other words, "..." and $(...) can nest within each other. Command substitution, $(...), can contain one or more complete double-quoted strings.

The resulting (but truncated) JSON from the GraphQL query looks like this:

{
  "data": {
    "search": {
      "repositoryCount": 22,
      "nodes": [
        {
          "__typename": "Repository",
          "nameWithOwner": "simonw/csv-diff",
          "description": "Python CLI tool and library for diffing CSV and JSON files",
          "defaultBranchRef": {
            "name": "main",
            "target": {
              "committedDate": "2021-02-23T02:53:11Z",
              "url": "https://github.com/simonw/csv-diff/commit/33e0a5918283c02a339a1fb507fc7a9cda89a198",
              "message": "Handle missing JSON keys, refs #13"
            }
          }
        },

Combining it with sqlite-utils insert

My end goal was to create a SQLite database with a record for each of my GitHub repositories that were tagged git-scraping that included the date of their most recent commit. Here's how I did that:

curl https://api.github.com/graphql -X POST \
-H "Authorization: Bearer ..." \
-H "Content-Type: application/json" \
-d "$(jq -c -n --arg query '
{
  search(type: REPOSITORY, query: "user:simonw topic:git-scraping", first: 100) {
    repositoryCount
    nodes {
      __typename
      ... on Repository {
        nameWithOwner
        description
        defaultBranchRef {
          name
          target {
            ... on Commit {
              committedDate
              url
              message
            }
          }
        }
      }
    }
  }
}' '{"query":$query}')" \
  | jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten

The line doing the work at the end is:

| jq .data.search.nodes | sqlite-utils insert /tmp/github.db repos - --flatten

This uses jq to pull out the {"data": {"search": {"nodes": [...] array from the returned JSON, then pipes that into sqlite-utils insert.

This line does the rest:

sqlite-utils insert /tmp/github.db repos - --flatten

That reads from standard input (-) and creates a repos table in the new github.db SQLite file.

The --flatten option at the end ensures that nested fields such as {"defaultBranchRef": {"target": {"committedDate": ... are flattened to columns with names like defaultBranchRef_target_committedDate.

The final table schema looks like this:

% sqlite-utils schema /tmp/github.db 
CREATE TABLE [repos] (
   [__typename] TEXT,
   [nameWithOwner] TEXT,
   [description] TEXT,
   [defaultBranchRef_name] TEXT,
   [defaultBranchRef_target_committedDate] TEXT,
   [defaultBranchRef_target_url] TEXT,
   [defaultBranchRef_target_message] TEXT
);

Created 2022-02-21T16:11:12-08:00, updated 2023-04-30T22:17:01-07:00 · History · Edit