There's a story going around at the moment that people have found code from their private GitHub repositories in the AI training data known as The Stack, using this search tool: https://huggingface.co/spaces/bigcode/in-the-stack …
Via this comment on Hacker News I started exploring the ClickHouse Playground. It's really cool, and among other things it allows CORS-enabled API hits that can query a decade of history from the GitHub events archive in less than a second. …