Limited JSON API for Google searches using Programmable Search Engine

I figured out how to use a JSON API to run a very limited Google search today in a legit, non-screen-scraper way.

Google offer a product called Programmable Search Engine, which used to be called Google Custom Search.

It's intended for creating a search engine for your own site, by restricting results to specific domains - but when you create one you can opt to search the whole web instead.

You can then use their JSON API to run searches.

It's quite limited: each request returns at most 10 results, and the number of queries you can run per day is capped.

But it works! And it's pretty easy to get running.

First, create a new Programmable Search Engine from the dashboard. The create page is pretty straightforward:

[Screenshot of the create form: you basically just need to give it a name and solve a captcha]

Now get an API key - I used the button in the middle of the API documentation:

[Screenshot: a nice blue button for creating an API key]

You need the "Search engine ID" from the dashboard - mine was 84ec3c54dca9646ff.

And that's it! You can combine the API key and search engine ID to run searches:

https://www.googleapis.com/customsearch/v1?key=API-KEY
  &cx=84ec3c54dca9646ff
  &q=SEARCH-TERM
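
Here's a minimal Python sketch of assembling that URL with the standard library. The function name `build_search_url` is made up for illustration, and `API-KEY` is still a placeholder. The `start` parameter is taken from the URL template that shows up in the JSON response; treating it as a 1-based offset for pagination is an assumption based on the `startIndex` values in that response.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, search_engine_id, query, start=None):
    """Return a Custom Search JSON API URL for the given query."""
    params = {"key": api_key, "cx": search_engine_id, "q": query}
    if start is not None:
        # Assumed to be the 1-based index of the first result to return
        params["start"] = start
    # urlencode takes care of escaping quotes, colons and spaces
    return API_ENDPOINT + "?" + urlencode(params)


# "API-KEY" is a placeholder, not a working key
print(build_search_url("API-KEY", "84ec3c54dca9646ff", "datasette"))
```

Building the query string with `urlencode` rather than string concatenation means search terms containing quotes or colons don't need to be escaped by hand.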

It seems to support many of the same search operators as Google. I tried this query, URL-encoded as the q= parameter, and seemed to get the results I wanted:

"powered by datasette" -site:github.com -site:simonwillison.net -site:datasette.io -site:pypi.org

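As a rough sketch, a query like that can be put together and URL-encoded in Python like this. The helper name `build_query` and its arguments are hypothetical, not part of any Google library; it only covers the exact-phrase and `-site:` exclusion operators used above.

```python
from urllib.parse import quote_plus


def build_query(phrase, exclude_sites=()):
    """Combine an exact phrase with -site: exclusions into one query string."""
    parts = ['"{}"'.format(phrase)]
    parts.extend("-site:{}".format(site) for site in exclude_sites)
    return " ".join(parts)


query = build_query(
    "powered by datasette",
    ["github.com", "simonwillison.net", "datasette.io", "pypi.org"],
)
print(query)
# The URL-encoded form, ready to use as the value of q=
print(quote_plus(query))
```
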
The results come back as JSON that looks like this (truncated after the first result):

{
  "kind": "customsearch#search",
  "url": {
    "type": "application/json",
    "template": "https://www.googleapis.com/customsearch/v1?q={searchTerms}&num={count?}&start={startIndex?}&lr={language?}&safe={safe?}&cx={cx?}&sort={sort?}&filter={filter?}&gl={gl?}&cr={cr?}&googlehost={googleHost?}&c2coff={disableCnTwTranslation?}&hq={hq?}&hl={hl?}&siteSearch={siteSearch?}&siteSearchFilter={siteSearchFilter?}&exactTerms={exactTerms?}&excludeTerms={excludeTerms?}&linkSite={linkSite?}&orTerms={orTerms?}&relatedSite={relatedSite?}&dateRestrict={dateRestrict?}&lowRange={lowRange?}&highRange={highRange?}&searchType={searchType}&fileType={fileType?}&rights={rights?}&imgSize={imgSize?}&imgType={imgType?}&imgColorType={imgColorType?}&imgDominantColor={imgDominantColor?}&alt=json"
  },
  "queries": {
    "request": [
      {
        "title": "Google Custom Search - \"powered by datasette\" -site:github.com -site:simonwillison.net -site:datasette.io -site:pypi.org",
        "totalResults": "65200",
        "searchTerms": "\"powered by datasette\" -site:github.com -site:simonwillison.net -site:datasette.io -site:pypi.org",
        "count": 10,
        "startIndex": 1,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "84ec3c54dca9646ff"
      }
    ],
    "nextPage": [
      {
        "title": "Google Custom Search - \"powered by datasette\" -site:github.com -site:simonwillison.net -site:datasette.io -site:pypi.org",
        "totalResults": "65200",
        "searchTerms": "\"powered by datasette\" -site:github.com -site:simonwillison.net -site:datasette.io -site:pypi.org",
        "count": 10,
        "startIndex": 11,
        "inputEncoding": "utf8",
        "outputEncoding": "utf8",
        "safe": "off",
        "cx": "84ec3c54dca9646ff"
      }
    ]
  },
  "context": {
    "title": "The whole web"
  },
  "searchInformation": {
    "searchTime": 0.25516,
    "formattedSearchTime": "0.26",
    "totalResults": "65200",
    "formattedTotalResults": "65,200"
  },
  "items": [
    {
      "kind": "customsearch#result",
      "title": "hhs",
      "htmlTitle": "hhs",
      "link": "https://hhscovid.publicaccountability.org/hhs",
      "displayLink": "hhscovid.publicaccountability.org",
      "snippet": "Powered by Datasette · Queries took 5.536ms · Data source: U.S. Department of Health & Human Services · Home · Name Search · Dataset Search · Browse Datasets.",
      "htmlSnippet": "<b>Powered by Datasette</b> · Queries took 5.536ms · Data source: U.S. Department of Health &amp; Human Services &middot; Home &middot; Name Search &middot; Dataset Search &middot; Browse Datasets.",
      "cacheId": "QbpCTHbMliYJ",
      "formattedUrl": "https://hhscovid.publicaccountability.org/hhs",
      "htmlFormattedUrl": "https://hhscovid.publicaccountability.org/hhs",
      "pagemap": {
        "metatags": [
          {
            "viewport": "width=device-width, initial-scale=1, shrink-to-fit=no"
          }
        ]
      }
    }
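
A small sketch of pulling the useful parts out of that response in Python. The `sample` dictionary is a cut-down copy of the JSON above, and the function names `summarize` and `next_start` are made up for this example. That `items` and `nextPage` can be missing from a response (for example when there are no results or no further pages) is an assumption, so both are read defensively.

```python
import json

# A cut-down response with the same shape as the JSON above
sample = json.loads("""
{
  "queries": {
    "nextPage": [{"startIndex": 11}]
  },
  "searchInformation": {"totalResults": "65200"},
  "items": [
    {
      "title": "hhs",
      "link": "https://hhscovid.publicaccountability.org/hhs",
      "displayLink": "hhscovid.publicaccountability.org"
    }
  ]
}
""")


def summarize(response):
    """Return (title, link) pairs for each result in a response."""
    # Assumption: "items" may be absent, so fall back to an empty list
    return [(item["title"], item["link"]) for item in response.get("items", [])]


def next_start(response):
    """Return the startIndex for the next page, or None if there is none."""
    next_pages = response.get("queries", {}).get("nextPage", [])
    return next_pages[0]["startIndex"] if next_pages else None


print(summarize(sample))
print(next_start(sample))
# totalResults is a string in the JSON, so convert it before doing arithmetic
print(int(sample["searchInformation"]["totalResults"]))
```

The value returned by `next_start` would be passed back as the `start` parameter to fetch the following page.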

As a bonus, you can pipe results into a SQLite database using sqlite-utils like this:

curl 'https://www.googleapis.com/customsearch...' | \
  jq .items | sqlite-utils insert /tmp/search.db search -   
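
If you'd rather do that last step from Python, here's a rough equivalent. Note that this sketch swaps in the standard library `sqlite3` module rather than sqlite-utils, so it has to create the table and serialize nested values such as `pagemap` to JSON text by hand. The function name `insert_items` is hypothetical, and the column names are taken directly from the keys of the result objects, which is fine for this API's output but not safe for untrusted keys.

```python
import json
import sqlite3


def insert_items(db, items, table="search"):
    """Insert a list of result dicts into a table, creating it if needed."""
    if not items:
        return
    # Use the union of keys across all items as the column list
    columns = []
    for item in items:
        for key in item:
            if key not in columns:
                columns.append(key)
    quoted = ", ".join('"{}"'.format(c) for c in columns)
    db.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, quoted))
    placeholders = ", ".join("?" for _ in columns)
    rows = []
    for item in items:
        row = []
        for c in columns:
            value = item.get(c)
            # Nested values such as "pagemap" are stored as JSON text
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            row.append(value)
        rows.append(row)
    db.executemany(
        'INSERT INTO "{}" ({}) VALUES ({})'.format(table, quoted, placeholders),
        rows,
    )
    db.commit()


# One result in the same shape as the "items" array shown earlier
items = [
    {
        "title": "hhs",
        "link": "https://hhscovid.publicaccountability.org/hhs",
        "pagemap": {"metatags": [{"viewport": "width=device-width"}]},
    }
]
db = sqlite3.connect(":memory:")
insert_items(db, items)
print(db.execute("SELECT title, link FROM search").fetchall())
```

Using `:memory:` keeps the example self-contained; a path like `/tmp/search.db` would write to the same file as the command above.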

Created 2023-09-16T17:00:48-07:00