For google-drive-to-sqlite I wanted a mechanism to recursively return metadata on every file in a specified Google Drive folder.
To fetch files in a single folder with the ID FOLDER_ID you need to use this API:
https://www.googleapis.com/drive/v3/files?q=...
The q= parameter needs to be fed the following special search query:
"FOLDER_ID" in parents
So the URL for a folder with ID 1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i ends up looking like this:
https://www.googleapis.com/drive/v3/files?q=%221E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i%22%20in%20parents
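The encoding step can be sketched with the standard library. This is a minimal illustration, not part of google-drive-to-sqlite: the helper name folder_listing_url is made up here, and it uses quote rather than the default quote_plus so that spaces become %20 as in the URL above.

```python
from urllib.parse import quote, urlencode

FILES_URL = "https://www.googleapis.com/drive/v3/files"


def folder_listing_url(folder_id):
    # Hypothetical helper: build the search query, then percent-encode it
    query = '"{}" in parents'.format(folder_id)
    # quote_via=quote encodes spaces as %20 instead of +
    return FILES_URL + "?" + urlencode({"q": query}, quote_via=quote)


url = folder_listing_url("1E6Zg2X2bjjtPzVfX8YqdXZDCoB3AVA7i")
print(url)
```

In practice an HTTP client such as httpx will do this encoding for you if you pass the query as a params dictionary.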
You need to accompany that with an Authorization: Bearer YOUR_ACCESS_TOKEN header - see Google OAuth for CLI applications for more on that.
The metadata representation you get back by default is pretty thin.
You can add &fields=* to the URL to get back the full JSON representation - but this can be a little too much data, mainly because it includes a list of every user who has permissions on each file.
You can specify individual fields, but it's not instantly obvious how to do so. The syntax turns out to look like this:
nextPageToken, files(id, kind, name)
Each comma-separated token corresponds to a key in the response JSON. The files(id, kind, name) bit selects the fields you want from items in the files array.
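Here's a small sketch of building that fields value from a list of keys. The function name build_fields is hypothetical, chosen for illustration; it only does string formatting and makes no API calls.

```python
def build_fields(file_keys, top_level=("nextPageToken",)):
    # Hypothetical helper: top-level keys first, then the per-file keys
    # wrapped in files(...), all comma-separated
    return "{}, files({})".format(", ".join(top_level), ", ".join(file_keys))


fields = build_fields(["id", "kind", "name"])
print(fields)
```

The resulting string is what goes in the fields= parameter of the request.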
The full list of file keys I've been using is:
[
    "kind",
    "id",
    "name",
    "mimeType",
    "starred",
    "trashed",
    "explicitlyTrashed",
    "parents",
    "spaces",
    "version",
    "webViewLink",
    "iconLink",
    "hasThumbnail",
    "thumbnailVersion",
    "viewedByMe",
    "createdTime",
    "modifiedTime",
    "modifiedByMe",
    "owners",
    "lastModifyingUser",
    "shared",
    "ownedByMe",
    "viewersCanCopyContent",
    "copyRequiresWriterPermission",
    "writersCanShare",
    "folderColorRgb",
    "quotaBytesUsed",
    "isAppAuthorized",
    "linkShareMetadata",
]
This will get you the files and folders within a folder... but you have to do extra work if you want to recursively fetch files from every sub-folder.
A sub-folder in the response from that API looks like this:
{
  "kind": "drive#file",
  "id": "1ixwEcEUmZ5-RZY2AQrwG-s1rvOF9LtqX",
  "name": "My sub-folder",
  "mimeType": "application/vnd.google-apps.folder"
}
So fetching everything involves fetching the contents of a folder, then picking out anything with a mimeType of application/vnd.google-apps.folder, extracting the id and recursively fetching the contents of that folder too.
You also need to handle pagination, in case a folder contains more than 100 items (the default page size). That's done using the nextPageToken key in the response JSON, which you can then pass as the pageToken= parameter to the API to retrieve the next page.
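The token-passing loop can be sketched without touching the network. In this illustration iterate_pages and FAKE_PAGES are made-up names: fetch_page stands in for a real HTTP call and is assumed to return a dictionary shaped like the Drive response, with a files list and an optional nextPageToken.

```python
def iterate_pages(fetch_page):
    # fetch_page(page_token) is a hypothetical callable returning a dict
    # like {"files": [...], "nextPageToken": "..."}
    page_token = None
    while True:
        data = fetch_page(page_token)
        yield from data.get("files", [])
        # No nextPageToken in the response means this was the last page
        page_token = data.get("nextPageToken")
        if page_token is None:
            break


# Fake three-page listing standing in for the real API
FAKE_PAGES = {
    None: {"files": [{"id": "a"}, {"id": "b"}], "nextPageToken": "p2"},
    "p2": {"files": [{"id": "c"}], "nextPageToken": "p3"},
    "p3": {"files": [{"id": "d"}]},
}

ids = [f["id"] for f in iterate_pages(lambda token: FAKE_PAGES[token])]
print(ids)
```

The first request is made with no token at all, and each later request carries the token from the previous response.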
Note that while the Google Drive API implies that a file can live in more than one folder - parents is an array of IDs - Google Drive simplified their model in September 2020 such that a file can only be in a single folder.
Here's the code I used to implement recursive fetching of folder contents:
import httpx


def paginate_files(access_token, *, q=None, fields=None):
    # Yield every file matching q, following nextPageToken across pages
    pageToken = None
    files_url = "https://www.googleapis.com/drive/v3/files"
    params = {}
    if fields is not None:
        params["fields"] = "nextPageToken, files({})".format(",".join(fields))
    if q is not None:
        params["q"] = q
    while True:
        if pageToken is not None:
            params["pageToken"] = pageToken
        else:
            params.pop("pageToken", None)
        data = httpx.get(
            files_url,
            params=params,
            headers={"Authorization": "Bearer {}".format(access_token)},
            timeout=30.0,
        ).json()
        if "error" in data:
            raise Exception(data)
        yield from data["files"]
        pageToken = data.get("nextPageToken", None)
        if pageToken is None:
            break


def files_in_folder_recursive(access_token, folder_id, fields):
    # fields must include "id" and "mimeType" for the recursion to work
    for file in paginate_files(
        access_token, q='"{}" in parents'.format(folder_id), fields=fields
    ):
        yield file
        # Descend into sub-folders as they are encountered
        if file["mimeType"] == "application/vnd.google-apps.folder":
            yield from files_in_folder_recursive(access_token, file["id"], fields)
Created 2022-02-16T20:56:05-08:00, updated 2022-02-17T09:49:18-08:00