TIL search: cloud
I decided to publish static CSV files to accompany my https://cdc-vaccination-history.datasette.io/ project, using a Google Cloud bucket (see [cdc-vaccination-history issue #9](https://github.com/simonw/cdc-vaccination-history/issues/9)).
The Google Cloud tutorial on [https://cloud.google.com/storage/docs/hosting-static-website-http#gsutil](https://cloud.google.com/storage/docs/hosting...
I deployed https://metmusem.datasettes.com/ by creating a folder on my computer containing a Dockerfile and then shipping that folder up to Google Cloud Run.
Normally I use [datasette publish cloudrun](https://docs.datasette.io/en/stable/publish.html#publishing-to-google-cloud-run) to deploy to Cloud Run, but in this case I decided to do it by...
Spotted in [this Cloud Run example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml):
```yaml
name: Build and Deploy to Cloud Run
on:
  push:
    branches:
    - master
```
Useful if you don't want people opening pull requests against your repo that inadvertently trigger a deploy action!
An alternative mechanism I've used...
For [datasette/issues/1522](https://github.com/simonw/datasette/issues/1522) I wanted to use a Docker build argument in a `Dockerfile` that would then be deployed to Cloud Run.
I needed this to be able to control the version of Datasette that was deployed. Here's my simplified `Dockerfile`:
```dockerfile
FROM python:3-alpine
ARG DATASETTE_REF
# Copy to...
```
...One is to find the item ID using `op items list` and `grep`:
```bash
op items list | grep 'Datasette Cloud Dev'
```
This displayed:
```
uv4maokwxaaymkmoxawwcyfeve Datasette Cloud Dev Simon Personal 4 minutes ago
```
You can then access the item using `op item get` and that ID:
```bash
op item get uv4maokwxaaymkmoxawwcyfeve
```
This output what looked like YAML:
```yaml
ID: uv4maokwxaaymkmoxawwcyfeve
Title...
```
In [VIAL issue 724](https://github.com/CAVaccineInventory/vial/issues/724) a Cloud Scheduler job that triggered a Cloud Run hosted export script - by sending an HTTP POST to an endpoint - was returning an error. The logs showed that the error happened exactly three minutes after the task started executing.
Turns out the HTTP endpoint (which does a lot of work...
From [this example](https://github.com/GoogleCloudPlatform/github-actions/blob/20c294aabd5331f9f7b8a26e6075d41c31ce5e0d/example-workflows/cloud-run/.github/workflows/cloud-run.yml) I learned that you can set environment variables once at the top of a workflow such that they will be available in ALL jobs:
```yaml
name: Build and Deploy to Cloud Run
on:
  push:
    branches:
    - master
env:
  PROJECT_ID: ${{ secrets...
```
The `gcloud run services list` command lists your services running on Google Cloud Run:
```
~ % gcloud run services list --platform=managed
SERVICE REGION URL LAST DEPLOYED BY LAST DEPLOYED AT
✔ calands us-central1 https://calands-j7hipcg4aq-uc.a.run.app ...@gmail.com 2020-09-02T00:15:29.563846Z
✔ cloud-run-hello us-central1 https://cloud-run-hello-j7hipcg4aq-uc.a.run...
```
...I ran `chmod 755 submit-to-datasette-cloud.sh` and added it to the GitHub repository.
## Running it in GitHub Actions
Having set the `DS_TOKEN` secret for my repository, I added the following to the `scrape.yml` file:
```yaml
- name: Submit latest to Datasette Cloud
  env:
    DS_TOKEN: ${{ secrets.DS_TOKEN }}
  run: |-
    ./submit-to-datasette-cloud.sh
```
Now every...
...Today I figured out how to use it to run an hourly task (a "Val") that fetches data from an Atom feed, parses it and then submits the resulting parsed data to a table running on [Datasette Cloud](https://www.datasette.cloud/) via the [Datasette JSON write API](https://docs.datasette.io/en/latest/json_api.html#the-json-write...
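The insert call itself is a single authenticated POST. Here's a minimal Python sketch of building that request - the base URL, `data` database name, table name and token are all placeholder assumptions, but the endpoint shape follows the Datasette JSON write API:

```python
import json
import urllib.request


def build_insert_request(base_url, table, rows, token):
    # The write API's insert endpoint is /{database}/{table}/-/insert;
    # "data" is an assumed database name, as is everything below
    url = f"{base_url}/data/{table}/-/insert"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_insert_request(
    "https://demo.datasette.cloud",  # placeholder team URL
    "feed_entries",                  # placeholder table name
    [{"title": "Example entry", "url": "https://example.com/"}],
    "fake-token",                    # placeholder API token
)
# urllib.request.urlopen(req) would actually send it
```

The hourly Val does essentially this after parsing the Atom feed into a list of row dictionaries.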
I use the [google-github-actions/setup-gcloud](https://github.com/google-github-actions/setup-gcloud) action in all of my GitHub Actions workflows that deploy applications to Cloud Run.
The pattern I used to use looked like this:
```yaml
- name: Set up Python
  uses: actions/setup-python@v4
  with:
    python-version: '3.10'
- name: Set up Cloud Run
  uses...
```
Google Cloud provides extremely fine-grained billing, but you need to access it through BigQuery, which I find quite inconvenient.
You can export a dump [from BigQuery](https://console.cloud.google.com/bigquery) to your Google Drive and then download and import it into Datasette.
I started with a `SELECT *` query against the export table it had created for me...
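Once downloaded, the CSV export needs to go into a SQLite database for Datasette to serve. Here's a standard-library-only sketch of that import step (the filenames and table name are made up, and my actual workflow may well have used a tool like sqlite-utils instead):

```python
import csv
import sqlite3


def csv_to_sqlite(csv_path, db_path, table):
    # Read the CSV export: first row is headers, rest are data rows
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)
        rows = list(reader)
    conn = sqlite3.connect(db_path)
    # Create a table with one text column per CSV header
    cols = ", ".join(f'"{h}"' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    conn.commit()
    return conn


# Usage (hypothetical filenames):
# csv_to_sqlite("billing_export.csv", "billing.db", "billing")
```

Note this stores every column as text - fine for exploring in Datasette, though you'd want real types for serious analysis.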
...I hadn't tried it myself, but I'd heard good things about Google Cloud Vision - so I gave that a go using [their online demo](https://cloud.google.com/vision/docs/drag-and-drop):
 - but it's a bit of a pain to set up.
You have to install two extras for it. First, this:
gcloud alpha logging tail
That installs the functionality, but as the documentation will tell you:
> To...
...If you want a per-service pricing breakdown for your Google Cloud Run services within a project (each service is a different deployed application), the easiest way to get one is to apply labels to those services, then request a by-label pricing breakdown.
This command will update a service (restarting it) with a new label:
```bash
gcloud run...
```
We launched the [Datasette Cloud blog](https://www.datasette.cloud/blog/) today. The Datasette Cloud site itself is a Django app - it uses Django and PostgreSQL to manage accounts, teams and soon billing and payments, then launches dedicated containers running Datasette for each customer.
It's been a while since I've built a new blog implementation in Django! I...
...
```yaml
- name: Set up Cloud Run
  uses: google-github-actions/setup-gcloud@v0
  with:
    version: '275.0.0'
    service_account_email: ${{ secrets.GCP_SA_EMAIL }}
    service_account_key: ${{ secrets.GCP_SA_KEY }}
- name: Deploy demo to Cloud Run
  env:
    CLIENT_SECRET: ${{ secrets.AUTH0_CLIENT_SECRET }}
  run: |-
    gcloud config set run/region us-central1
    gcloud config set project datasette-222320
    wget https://latest...
```
I have two different Google Cloud accounts active at the moment. Here's how to list them with `gcloud auth list`:
```
% gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
simon@example.com
* me@gmail.com
To set the active account, run:
$ gcloud config set account `ACCOUNT`
```
And to switch between them with `gcloud config set account`:
```
% gcloud config set account me...
```
Today while running `datasette publish cloudrun ...` I noticed the following:
```
Uploading tarball of [.] to [gs://datasette-222320_cloudbuild/source/1618465936.523167-939ed21aedff4cb8a2c914c099fb48cd.tgz]
```
`gs://` indicates a Google Cloud Storage bucket. Can I see what's in that `datasette-222320_cloudbuild` bucket?
Turns out I can:
```
~ % gsutil ls -l gs://datasette-222320_cloudbuild/source/ | head -n 10
36929 2019-05-03T13...
```
...The code example looked like this:
```javascript
const kv = await Deno.openKv();
```
Wait, that looks like a core language feature? Are they shipping a client for their own proprietary hosted cloud database as part of their core language?
They're not - at least not in the open source implementation of Deno. I dug in and I think I understand what...
...2
```
Here's a script I wrote using this technique for the TIL [Use labels on Cloud Run services for a billing breakdown](https://til.simonwillison.net/til/til/cloudrun_use-labels-for-billing-breakdown.md):
```bash
#!/bin/bash
for line in $(
  gcloud run services list --platform=managed \
    --format="csv(SERVICE,REGION)" \
    --filter "NOT metadata.labels.service:*" \
  | tail -n +2...
```
...The `access_token` will work for an hour, but you can store the `refresh_token` and use it to obtain a new `access_token` any time you like.
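Exchanging the stored `refresh_token` for a fresh `access_token` is a single POST to Google's standard OAuth 2.0 token endpoint. A sketch of building that request (the client credentials and token values are placeholders):

```python
import urllib.parse
import urllib.request


def build_refresh_request(client_id, client_secret, refresh_token):
    # POST grant_type=refresh_token to Google's token endpoint;
    # the JSON response includes a new access_token
    data = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("utf-8")
    return urllib.request.Request(
        "https://oauth2.googleapis.com/token", data=data, method="POST"
    )


req = build_refresh_request("client-id", "client-secret", "stored-refresh-token")
# urllib.request.urlopen(req) returns JSON containing the new access_token
```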
## Setting up the OAuth app
In the Google Cloud console you need to navigate to "APIs and Services", "Credentials", "Create Credentials" and select "OAuth client ID". You need to create a client...
...Edge TTL is set to Use cache-control header if present, bypass cache if not.](https://static.simonwillison.net/static/2024/cloudflare-cache-rule.jpg)
I've told it that for any incoming request with a hostname containing `.datasette.site` (see [background in my weeknotes](https://simonwillison.net/2024/Jan/7/page-caching-and-custom-templates-for-datasette-cloud/)) it...
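Because the rule bypasses the cache when no `Cache-Control` header is present, the origin has to opt in explicitly. A hypothetical helper showing the kind of header that makes Cloudflare cache a response at the edge (`s-maxage` applies to shared caches like Cloudflare's, as opposed to `max-age` for browsers):

```python
def edge_cache_headers(seconds):
    # s-maxage targets shared caches (Cloudflare's edge); without a
    # Cache-Control header the rule above bypasses the cache entirely
    return {"Cache-Control": f"s-maxage={seconds}, public"}


headers = edge_cache_headers(120)
```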
I got an unexpected traffic spike to https://russian-ira-facebook-ads.datasettes.com/ - which runs on Cloud Run - and decided to use `robots.txt` to block crawlers.
Re-deploying that instance was a little hard because I didn't have a clean repeatable deployment script in place for it (it's an older project) - so I decided to try...
...This may result in poor performance" - this particular project runs on Google Cloud Run so I'm less concerned about tying up a worker than I would be normally, plus the export option is only available to trusted staff users with access to the Django Admin interface.
To add the CSV export option to a `ModelAdmin` subclass, do the following...
...Note that while the Google Drive API implies that a file can live in more than one folder - `parents` is an array of IDs - Google Drive [simplified their model in September 2020](https://cloud.google.com/blog/products/g-suite/simplifying-google-drives-folder-structure-and-sharing-models) such that a file can only be in a single folder.
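The `parents` field is exposed through the Drive v3 `files.get` endpoint. A sketch of requesting just that field (the file ID and token are placeholders):

```python
import urllib.request


def build_parents_request(file_id, token):
    # Ask the Drive v3 API for only the parents field of a file;
    # since September 2020 this array contains at most one folder ID
    url = f"https://www.googleapis.com/drive/v3/files/{file_id}?fields=parents"
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )


req = build_parents_request("FILE_ID", "fake-token")
# urllib.request.urlopen(req) returns JSON like {"parents": ["FOLDER_ID"]}
```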
## Code...
...string
>,
message string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION 's3://datasette-cloud-fly-logs/'
```
I ran that in the Athena query editor at https://us-east-1.console.aws.amazon.com/athena/home?region=us-east-1#/query-editor
As you...
...Rising temperatures, ocean acidification, and altered (slight bold) precipitation patterns all contribute to shifts in the distribution and behavior of marine (big bold) species (bold), influencing the delicate balance of under water ecosystems (3 words in bold).](https://static.simonwillison.net/static/2024/colbert-vis-2.jpg)
That's from [colbert.aiserv.cloud](https://colbert.aiserv.cloud/), a really neat...
...I'm really excited about cloud-based development environments such as [GitHub Codespaces](https://github.com/features/codespaces) for exactly this reason - I love the idea that you can get a working environment by clicking a green button, and if it breaks you can throw it away and click the button again to get a brand new one.
Today I...
[Protomaps](https://protomaps.com/) is "an open source map of the world, deployable as a single static file on cloud storage". It involves some _very_ clever technology, rooted in the [PMTiles](https://github.com/protomaps/PMTiles) file format which lets you create a single static file containing vector tile data which is designed to be hosted on static hosting but...
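The trick that makes a single static file workable is HTTP Range requests: the client fetches only the byte ranges covering the tiles it needs, rather than downloading the whole archive. A sketch of such a request (the URL and byte offsets are made up):

```python
import urllib.request


def build_range_request(url, start, end):
    # Static file hosts honor the Range header, which is what lets a
    # PMTiles client read one tile's bytes out of a multi-gigabyte file
    return urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"}
    )


req = build_range_request("https://example.com/world.pmtiles", 1024, 2047)
```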
...name
)
select
group_concat(query, ' union all ')
from
queries
```
I tried this against the FiveThirtyEight database and the query it produced was way beyond the URL length limit for Cloud Run.
Here's the result if [run against latest.datasette.io/fixtures](https://latest.datasette.io/fixtures?sql=with+tables+as+%28%0D%0A++select%0D%0A++++name+as+table...