Inspired by py-free-threading.github.io I decided to try out a beta of Python 3.13 with the new free-threaded mode enabled, which removes the GIL.
I chose to use the macOS installer to get a pre-built binary. I downloaded and ran the macOS installer from www.python.org/downloads/release/python-3130b3/, and then when I got to this screen:
I selected the "Customize" option and checked this additional box:
Once it had finished installing I didn't run the script to add it to my path (I didn't want to intefere with my many other Python versions).
This gave me two new Python binaries - one with free-threading enabled and one without:
/Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13
/Library/Frameworks/PythonT.framework/Versions/3.13/bin/python3.13t
Note the PythonT.framework
in the second path.
It also seemed to setup the following aliases in /usr/local/bin
:
/usr/local/bin/python3.13
/usr/local/bin/python3.13t
These are symlinks:
ls -lah /usr/local/bin | grep python3.13
lrwxr-xr-x 1 root wheel 73B Jul 12 16:26 python3.13 -> ../../../Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13
lrwxr-xr-x 1 root wheel 80B Jul 12 16:26 python3.13-config -> ../../../Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13-config
lrwxr-xr-x 1 root wheel 81B Jul 12 16:26 python3.13-intel64 -> ../../../Library/Frameworks/Python.framework/Versions/3.13/bin/python3.13-intel64
lrwxr-xr-x 1 root wheel 75B Jul 12 16:26 python3.13t -> ../../../Library/Frameworks/PythonT.framework/Versions/3.13/bin/python3.13t
lrwxr-xr-x 1 root wheel 82B Jul 12 16:26 python3.13t-config -> ../../../Library/Frameworks/PythonT.framework/Versions/3.13/bin/python3.13t-config
Starting those Python processes shows which one has free-threading in the interpreter header:
% /usr/local/bin/python3.13
Python 3.13.0b3 (v3.13.0b3:7b413952e8, Jun 27 2024, 09:57:31) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
% /usr/local/bin/python3.13t
Python 3.13.0b3 experimental free-threading build (v3.13.0b3:7b413952e8, Jun 27 2024, 10:04:51) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
I asked Claude 3.5 Sonnet to write me a quick test script, then iterated on and simplified the result myself until I got to this:
import argparse
import time
from concurrent.futures import ThreadPoolExecutor
def cpu_bound_task(n):
"""A CPU-bound task that computes the sum of squares up to n."""
return sum(i * i for i in range(n))
def main():
parser = argparse.ArgumentParser(description="Run a CPU-bound task with threads")
parser.add_argument("--threads", type=int, default=4, help="Number of threads")
parser.add_argument("--tasks", type=int, default=10, help="Number of tasks")
parser.add_argument(
"--size", type=int, default=5000000, help="Task size (n for sum of squares)"
)
args = parser.parse_args()
print(f"Running {args.tasks} tasks of size {args.size} with {args.threads} threads")
start_time = time.time()
with ThreadPoolExecutor(max_workers=args.threads) as executor:
list(executor.map(cpu_bound_task, [args.tasks] * args.size))
end_time = time.time()
duration = end_time - start_time
print(f"Time with threads: {duration:.2f} seconds")
if __name__ == "__main__":
main()
I saved this as gildemo.py
and tried running it with the new Python binaries the one with free-threading enabled and the one without.
IMPORTANT: This code has a bug, explained below.
Here's what I saw in Activity Monitor while running the scripts:
No free-threading (with the GIL) - reported 99% CPU usage:
Free-threading (no GIL) - reported 258.8% CPU usage:
And here are some results from running the script (invalid due to the bug described below).
First with the GIL in place:
% python3.13 gildemo.py --size 1000000
Running 10 tasks of size 1000000 with 4 threads
Time with threads: 17.14 seconds
And then without the GIL:
% python3.13t gildemo.py --size 1000000
Running 10 tasks of size 1000000 with 4 threads
Time with threads: 8.19 seconds
Bartek Ogryczak spotted a bug in the code above:
I was wondering with only 2x improvement with 4 threads, and seems like the the code is off:
map(cpu_bound_task, [args.tasks] * args.size)
ought to be
map(cpu_bound_task, [args.size] * args.tasks)
I had to think a bit about this. The executor.map()
method takes two arguments - the function to be run in the thread, and a list of arguments where each argument will be passed to a separate invocation of that function.
tasks
is the number of times to run the function in the test. size
controls the complexity of that function (the sum of squares up to that number).
Running [x] * 5
produces [x, x, x, x, x]
- the list duplicated by that constant.
So Bartek is right! Consider this invocation:
python3.13t gildemo.py --size 10000000 --threads 4 --tasks 10
That should run sum(i * i for i in range(10000000))
10 times in four different threads. But, thanks to the bug in the code, it actually runs sum(i * i for i in range(10))
10000000 times in four threads!
The original mistake here came from Claude, but I'm responsible for it - I broke the cardinal rule of LLM-assisted programming, which is to closely review every line of code it produces.
After applying Bartek's fix I get these results with free-threading (no GIL):
% python3.13t gildemo.py --size 10000000 --threads 4
Running 10 tasks of size 10000000 with 4 threads
Time with threads: 1.38 seconds
% python3.13t gildemo.py --size 10000000 --threads 8
Running 10 tasks of size 10000000 with 8 threads
Time with threads: 1.10 seconds
% python3.13t gildemo.py --size 10000000 --threads 12
Running 10 tasks of size 10000000 with 12 threads
Time with threads: 1.00 seconds
And these results with the GIL in place:
% python3.13 gildemo.py --size 10000000 --threads 4
Running 10 tasks of size 10000000 with 4 threads
Time with threads: 4.01 seconds
% python3.13 gildemo.py --size 10000000 --threads 8
Running 10 tasks of size 10000000 with 8 threads
Time with threads: 4.03 seconds
% python3.13 gildemo.py --size 10000000 --threads 12
Running 10 tasks of size 10000000 with 12 threads
Time with threads: 3.92 seconds
Created 2024-07-12T17:13:36-07:00, updated 2024-07-12T20:43:04-07:00 · History · Edit