A follow-up to "Using OpenAI functions and their Python library for data extraction" and "Using the ChatGPT streaming API from Python". If I have a stream of chunks of a larger JSON document, how can I output full individual JSON objects as soon as they are available?
My completed JSON will look like this:
```json
{
  "items": [
    {
      "title": "Classic and Custom Car Show",
      "date": "Sat, Sep 2",
      "venue_name": "1700 W Hillsdale Blvd • San Mateo, CA",
      "start_time": "9:00 AM"
    },
    {
      "title": "【衛蘭、王君馨、衛詩、鍾舒漫、吉中鳴牧師】MMO AINEO LIVE 愛你喲!佈道會(三藩市站)",
      "date": "Sat, Aug 26",
      "venue_name": "San Mateo Performing Arts Center • San Mateo, CA",
      "start_time": "7:30 PM"
    },
    {
      "title": "San Francisco Small Business Expo 2023",
      "date": "Fri, Aug 18",
      "venue_name": "Hyatt Regency San Francisco Airport • Burlingame, CA",
      "start_time": "10:00 AM"
    },
    {
      "title": "Stanford Genetics Conference on Structural Variants and DNA Repeats",
      "date": "Thu, Sep 7",
      "venue_name": "Stanford Center for Academic Medicine • Palo Alto, CA",
      "start_time": "8:00 AM"
    },
    {
      "title": "DJ QUIK / ROJAI / Z-MAN / DJ SAURUS",
      "date": "Fri, Aug 18",
      "venue_name": "The Longboard Margarita Bar • Pacifica, CALIFORNIA",
      "start_time": "9:00 PM"
    },
    {
      "title": "DUE SOUTH W/ CHERRY GLAZERR, MOMMA (DUO), KING ISIS (FREE!)",
      "date": "Sat, Aug 26",
      "venue_name": "Jerry Garcia Amphitheater • San Francisco, CA",
      "start_time": "2:30 PM"
    }
  ]
}
```
If that's going to arrive as a sequence of chunks, how can I display those items as soon as they become available?
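The reason this needs special handling is that Python's standard `json` module only parses complete documents. A quick check, feeding it the first of the chunks used in the example below, shows that a perfectly valid prefix of the document is still rejected:

```python
import json

# The first chunk of the streamed document (copied from the example below).
first_chunk = '{\n "items": [\n {\n "ti'

try:
    json.loads(first_chunk)
    parsed = True
except json.JSONDecodeError as exc:
    # json.loads needs the whole document; a partial one always fails.
    parsed = False
    print(f"JSONDecodeError: {exc.msg} at position {exc.pos}")
```

So something has to hold on to the partial state between chunks, which is what an incremental parser provides.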
After much experimentation I figured out this recipe using ijson:
```python
import ijson
import json
import time

# The JSON document split into chunks, the way a streaming API might deliver it
chunks = [
    '{\n "items": [\n {\n "ti',
    'tle": "Classic and Custom Car ',
    'Show",\n "date": "Sat, Sep',
    ' 2",\n "venue_name": "1700',
    " W Hillsdale Blvd • San Mateo,",
    ' CA",\n "start_time": "9:0',
    '0 AM"\n },\n {\n "titl',
    'e": "【衛蘭、王君馨、衛詩、鍾舒漫、吉中鳴牧師】MMO ',
    'AINEO LIVE 愛你喲!佈道會(三藩市站)",\n ',
    ' "date": "Sat, Aug 26",\n ',
    ' "venue_name": "San Mateo Per',
    "forming Arts Center • San Mate",
    'o, CA",\n "start_time": "7',
    ':30 PM"\n },\n {\n "ti',
    'tle": "San Francisco Small Bus',
    'iness Expo 2023",\n "date"',
    ': "Fri, Aug 18",\n "venue_',
    'name": "Hyatt Regency San Fran',
    "cisco Airport • Burlingame, CA",
    '",\n "start_time": "10:00 ',
    'AM"\n },\n {\n "title"',
    ': "Stanford Genetics Conferenc',
    "e on Structural Variants and D",
    'NA Repeats",\n "date": "Th',
    'u, Sep 7",\n "venue_name":',
    ' "Stanford Center for Academic',
    ' Medicine • Palo Alto, CA",\n ',
    ' "start_time": "8:00 AM"\n ',
    ' },\n {\n "title": "DJ ',
    "QUIK / ROJAI / Z-MAN / DJ SAUR",
    'US",\n "date": "Fri, Aug 1',
    '8",\n "venue_name": "The L',
    "ongboard Margarita Bar • Pacif",
    'ica, CALIFORNIA",\n "start',
    '_time": "9:00 PM"\n },\n {',
    '\n "title": "DUE SOUTH W/ ',
    "CHERRY GLAZERR, MOMMA (DUO), K",
    'ING ISIS (FREE!)",\n "date',
    '": "Sat, Aug 26",\n "venue',
    '_name": "Jerry Garcia Amphithe',
    'ater • San Francisco, CA",\n ',
    ' "start_time": "2:30 PM"\n ',
    " }\n ]\n}",
]

# ijson appends each fully parsed item to this list
events = ijson.sendable_list()
# "items.item" selects each element of the array under the "items" key
coro = ijson.items_coro(events, "items.item")
seen_events = set()

for chunk in chunks:
    coro.send(chunk.encode("utf-8"))
    if events:
        # Any we have not seen yet?
        unseen_events = [e for e in events if json.dumps(e) not in seen_events]
        if unseen_events:
            for event in unseen_events:
                seen_events.add(json.dumps(event))
                print(json.dumps(event))
                time.sleep(1)

# Signal the end of the input; this raises if the JSON was incomplete
coro.close()
```
You create an ijson coroutine, then send it chunks of JSON data. It appends each new item to the `sendable_list()` as soon as that item has been fully parsed.
The hardest part to figure out was this:

```python
coro = ijson.items_coro(events, "items.item")
```

This is the syntax to indicate that I'm interested in the items of the array in the `{"items": [...]}` object.
The confusing part is that `item` here is effectively a reserved word in ijson, meaning "each element of an array". I'm not sure what you are meant to do if your nested object also has a key called `"item"`; I guess work to avoid that situation coming up!
This works, though. When executed, the code above prints each of the nested objects one at a time, with a one-second sleep after each one.
Created 2023-08-15T18:07:38-07:00