Using Tesseract.js to OCR every image on a page

Pasting this code into a DevTools console should load Tesseract.js from a CDN, loop through every image loaded by that page (every PNG, GIF, JPG or JPEG), run OCR on them and output the result to the DevTools console.

There's one major catch: the images need to be served in a context that allows JavaScript to read their content - either from the same domain, or from a separate domain with a permissive CORS policy.

Very few sites do this! It worked on for me, where it successfully OCRs the Google logo as containing the text "Google".

var s = document.createElement("script")
s.src = "";
s.onload = (async () => {
  const imageUrls = performance.getEntries().map(f =>
    n => n.includes('.jpg') || n.includes('.gif') || n.includes('.png')  || n.includes('.jpeg')
  const worker = Tesseract.createWorker();
  await worker.load();
  await worker.loadLanguage('eng');
  await worker.initialize('eng');
  for (const url of imageUrls) {
    var { data: { text } } = await worker.recognize(url);

Created 2021-11-09T12:27:11-08:00 · Edit