Summarizing Hacker News discussion themes with Claude and LLM

I've been experimenting with the combination of Claude and my LLM CLI tool to give me quick summaries of long discussions on Hacker News.

I only use this occasionally: usually I'll follow the conversation directly, but sometimes it can be neat to get a quick summary of the key themes in a huge 100+ comment thread.

I'm using Claude for this because both of the current Claude models (Claude Instant, which is cheaper and faster, and Claude 2, which is better) can handle up to 100,000 tokens.

My Bash script for this does the following:

1. Validates that it was passed a numeric Hacker News item ID
2. Fetches the full discussion thread as JSON from the Hacker News Algolia API
3. Extracts the text of every comment in the tree using jq
4. Pipes those comments into Claude via LLM with a summarization prompt

I'm using LLM with the llm-claude plugin, which requires an API key from Anthropic. Prior to obtaining that key I used llm-openrouter, which provides access to Claude via a paid endpoint.

In both cases I had set the claude alias in my local LLM install using llm aliases set claude longer-model-id, where longer-model-id stands for the full model ID registered by the plugin.
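If you're reproducing this, the setup looks roughly like the following. The key name here follows the llm-claude plugin's conventions and longer-model-id is a placeholder, so check the plugin's README for the exact values:

# Install the plugin into LLM
llm install llm-claude
# Paste in the Anthropic API key when prompted (key name per the plugin docs)
llm keys set claude
# Alias the plugin's longer model ID to "claude" (substitute the real ID)
llm aliases set claude longer-model-id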

After several rounds of iteration, I settled on the following prompt:

HACKER NEWS CONTENT HERE

Summarize the themes of the opinions expressed here, including quotes where appropriate.
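That shape, content first and instruction last, matches how LLM assembles piped input: anything on standard input is placed ahead of the prompt passed as an argument. A minimal illustration, using echo in place of a real thread:

echo 'HACKER NEWS CONTENT HERE' | \
  llm -m claude 'Summarize the themes of the opinions expressed here, including quotes where appropriate.'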

This is beginning to work quite well. Here's the summary it generated for this thread on Teaching with AI with 316 comments:

Here is a summary of some of the key themes that emerged from the discussion on using AI systems like ChatGPT for education:

On potential benefits:

On risks and limitations:

On implementation challenges:

In summary, the discussion highlighted both opportunities and challenges in leveraging AI for education, with many calling for careful implementation and rethinking of learning objectives, assessments, and the role of teachers.

A quick spot-check on the quoted comments found them on the page, so if it's hallucinating any I haven't caught it yet.

I'm using the Claude API for free for non-commercial purposes. This mechanism can use a LOT of tokens, so if you're going to try this keep an eye on how much it might cost you.
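One way to gauge the size before spending anything is to count the tokens first. This sketch uses the ttok CLI tool, which counts with OpenAI's tokenizer, so treat the result as an approximation for Claude:

# pip install ttok - counts tokens read from standard input
curl -s "https://hn.algolia.com/api/v1/items/37340314" | \
  jq -r 'recurse(.children[]) | .text' | \
  ttok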

Here's the Bash script I wrote (with a skeleton by GPT-4, hence the argument validation), called hn-summary.sh:


#!/bin/bash

# Validate that the argument is an integer
if [[ ! $1 =~ ^[0-9]+$ ]]; then
  echo "Please provide a valid integer as the argument."
  exit 1
fi

# Fetch the thread from the Hacker News Algolia API, extract every
# comment's text with jq and pipe it all into Claude for summarization
curl -s "https://hn.algolia.com/api/v1/items/$1" | \
  jq -r 'recurse(.children[]) | .text' | \
  llm -m claude 'Summarize the themes of the opinions expressed here, including quotes where appropriate.'
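To use it, save it as hn-summary.sh and make it executable:

chmod +x hn-summary.sh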

I run it like this, passing in the ID of a Hacker News item:

hn-summary.sh 37340314

Follow-up questions

Since this is using LLM, I can ask follow-up questions directly after running the script using llm -c (for --continue):

llm -c 'More detail and quotes about student motivation'

LLM replied:

Here are some additional details and quotes about the theme of student motivation and learning when using AI systems like ChatGPT in education:

Several comments expressed concerns that students would use ChatGPT simply to get grades or test scores, without actually learning:

However, some argued that intrinsic motivation matters more than grades:

And that students would still need to learn fundamentals:

The overall concern was that easy access to ChatGPT could negatively impact student motivation to deeply engage with course material.

Follow-up questions could get expensive as well, since they require re-submitting the entire prompt to Claude every time.

Everything gets logged

Every prompt and response run through LLM is logged to a SQLite database. You can access the last 3 logged messages with:

llm logs -n 3

Or run Datasette against the full log database like this:

datasette "$(llm logs path)"
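You can also query that database directly. This sketch assumes the responses table used by recent LLM versions; the schema has changed between releases, so inspect the database if the names don't match:

sqlite3 "$(llm logs path)" \
  'select model, prompt, response from responses order by id desc limit 3'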

Price estimates

I found a PDF of Claude's July 2023 pricing.

The Anthropic logs listed my initial prompt (with all of the comments) as 38,254 prompt tokens returning 410 completion tokens.

At those prices (using Code Interpreter as the world's most over-powered calculator) that would have cost me:
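The same arithmetic is easy to reconstruct in shell. The per-million-token prices below are my reading of that July 2023 price list (Claude 2 at $11.02 in / $32.68 out, Claude Instant at $1.63 / $5.51), so verify them before relying on these numbers:

# Assumed July 2023 prices per million tokens
echo 'scale=4; (38254 * 11.02 + 410 * 32.68) / 1000000' | bc   # Claude 2: ~$0.43
echo 'scale=4; (38254 * 1.63 + 410 * 5.51) / 1000000' | bc     # Claude Instant: ~$0.065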

Adding attribution

I tried this variation of the one-liner to include attribution for each of the quotes:

curl -s "https://hn.algolia.com/api/v1/items/$1" | \
  jq -r 'recurse(.children[]) | .author + ": " + .text' | \
  llm -m claude 'Summarize the themes of the opinions expressed here, including quotes (with author attribution) where appropriate.'

Two things have changed here. First, I'm now getting the jq program to output username: comment-text:

jq -r 'recurse(.children[]) | .author + ": " + .text'
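Here's a toy illustration of that jq program against a hypothetical nested comment tree: recurse(.children[]) visits the root and then walks every children array:

echo '{"author": "op", "text": "Original post", "children": [
  {"author": "alice", "text": "First reply", "children": [
    {"author": "bob", "text": "Nested reply", "children": []}]}]}' | \
  jq -r 'recurse(.children[]) | .author + ": " + .text'
# op: Original post
# alice: First reply
# bob: Nested reply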

I've also updated the prompt to add the parenthetical here:

...including quotes (with author attribution) where appropriate.

This works, but I'm not sure I like it better. It seems to result in longer quotes, where I actually liked the shorter, inline quotes from the previous version.

Example output from this conversation about NGINX Unit (item 37453854):

There are a few key themes in the discussion about Nginx Unit:

  1. Performance and benchmarking

"Trying this with a moderately complex PHP Laravel app and got a 40% speed improvement. Very useful to be able to run multiple apps each using different lang runtimes/versions without needing separate docker containers." - Dachande663

"Nginx Unit + PHP seems to handedly out-perform Nginx + php-fpm[1][2][3]." - jchook

  2. Configuration and ease of use

"Why the obsession (it seems to be the prominent point in the readme) with configuration via API? How often do you need to add php support on the fly? I want to configure my app server via files so it just starts up in the state that I expect." - gibmeat

"This allows you to start up generic machines with no configuration and customize them after boot from a remote host. It's not so much "on the fly", as it is moving the long-term config storage to a different system." - sneak

  3. Language and runtime support

"Instead the docs have you do something manual with certbot (a complete nono if you believe in automatic SSL and are using docker images that don't persist data, as Docker is meant to be used)." - tomjen3

"I'd love to see a performance benchmark for unit. Especially for an nodejs/express app" - abdellah123

  4. Comparisons to similar tools

"I switched to caddy from nginx and didn't look back. Auto SSL wildcards was enough." - andrewstuart

"This seems to be a “universal” app server, like gunicorn is for running Pyhthon stuff, php-fpm for PHP and so on." - ExoticPearTree

In summary, commenters seem intrigued by Nginx Unit's potential for performance, ease of use, and language support, but want to see more benchmarking and comparisons to similar tools before adopting it more widely.


Created 2023-09-09T14:17:27-07:00, updated 2023-09-10T09:00:22-07:00