TikTok may or may not be banned in the USA within the next 24 hours or so. Here's a pattern you can use to download all of the videos from a specific account.
I used a variant of my Twitter scraping trick. Start by loading up a profile page - like https://www.tiktok.com/@ilgallinaio_special - in Firefox or Chrome or Safari.
Open up the DevTools and paste in the following JavaScript:
window.videoUrls = new Set();
function collect() {
  Array.from(document.querySelectorAll('a[href*="/video/"]'), el => el.href).forEach(href => {
    window.videoUrls.add(href);
  });
}
setInterval(collect, 500);
This will scan the page every half a second looking for links to TikTok videos - links with /video/ in their URL - and add those to a growing set called videoUrls.
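Because videoUrls is a Set, links picked up again on later scans are stored only once. Here is a rough Python analogue of that collect-and-deduplicate step. It is only a sketch: it parses static HTML strings with the standard library's html.parser instead of querying the live DOM. VideoLinkCollector is a hypothetical name. Unlike el.href, it keeps each href exactly as written rather than resolving relative links to absolute URLs.

```python
from html.parser import HTMLParser


class VideoLinkCollector(HTMLParser):
    """Collect hrefs containing "/video/" into a set, like window.videoUrls."""

    def __init__(self) -> None:
        super().__init__()
        self.video_urls: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # Mirror the CSS selector a[href*="/video/"]: <a> tags whose href contains /video/
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/video/" in href:
            self.video_urls.add(href)  # a set silently ignores repeats


# Two overlapping "scans" of the page, as happens while scrolling
scan_1 = '<a href="https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622">a</a>'
scan_2 = scan_1 + (
    '<a href="https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493">b</a>'
    '<a href="/about">not a video</a>'
)

collector = VideoLinkCollector()
for snapshot in (scan_1, scan_2):
    collector.feed(snapshot)
print(len(collector.video_urls))  # 2
```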
Now switch to the "oldest" sort direction and scroll down the page until you reach the bottom. TikTok implements infinite-ish scrolling so this may take a while for an account with a lot of videos.
Once you get to the bottom, copy out the collected list of URLs. In Firefox I used this command for that:
copy(Array.from(window.videoUrls))
That copied the array of URLs to my clipboard. I then pasted them into a file and saved it as videos.json - the file contents looked something like this (but a lot longer):
[
  "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
  "https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
  "https://www.tiktok.com/@ilgallinaio_special/video/7205157890372537606",
  "https://www.tiktok.com/@ilgallinaio_special/video/7205189803074211077"
]
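If you'd rather handle the file in Python than with jq, here is a small sketch. It parses the pasted array, drops duplicates and anything that isn't a /video/ link, and extracts the numeric ID from each URL. That is a cheap safety net if you paste together lists from more than one session. The function names are hypothetical. For the real file you would pass in the contents of videos.json.

```python
import json
import re

# Captures the numeric video ID from URLs like .../@user/video/7204803049351695622
VIDEO_ID_RE = re.compile(r"/video/(\d+)")


def parse_video_urls(text: str) -> list[str]:
    """Parse a JSON array of URLs, keeping video links only; first occurrence wins."""
    unique: dict[str, None] = {}  # dicts preserve insertion order, so this dedupes in order
    for url in json.loads(text):
        if VIDEO_ID_RE.search(url):
            unique.setdefault(url, None)
    return list(unique)


def video_id(url: str) -> str:
    """Return the numeric ID from a TikTok video URL."""
    match = VIDEO_ID_RE.search(url)
    if match is None:
        raise ValueError(f"no /video/<id> segment in {url!r}")
    return match.group(1)


# Inline sample; for the real file use parse_video_urls(open("videos.json").read())
sample = json.dumps([
    "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
    "https://www.tiktok.com/@ilgallinaio_special/video/7204877634189151493",
    "https://www.tiktok.com/@ilgallinaio_special/video/7204803049351695622",
])
urls = parse_video_urls(sample)
print(len(urls))  # 2
print([video_id(u) for u in urls])  # ['7204803049351695622', '7204877634189151493']
```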
The yt-dlp Python program can download from TikTok. I ran it against all of the URLs in my videos.json file like this:
mkdir -p downloads
jq -r '.[]' videos.json | while read -r url; do
  if uvx yt-dlp -o "downloads/%(title)s-%(id)s.%(ext)s" "$url"; then
    echo "Successfully downloaded: $url"
  else
    echo "Failed to download: $url"
  fi
  sleep 1
done
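The same loop can be written in Python, which makes it easier to collect failures for a retry pass. This sketch shells out to the exact uvx yt-dlp command above. download_all and its run and pause parameters are hypothetical names. The run parameter is injectable so the loop can be exercised without network access.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Iterable, List

OUTPUT_TEMPLATE = "downloads/%(title)s-%(id)s.%(ext)s"


def download_all(
    urls: Iterable[str],
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    pause: float = 1.0,
) -> List[str]:
    """Download each URL with yt-dlp, one at a time; return the URLs that failed."""
    Path("downloads").mkdir(parents=True, exist_ok=True)  # mkdir -p downloads
    failed: List[str] = []
    for url in urls:
        # Same command as the shell loop: uvx yt-dlp -o TEMPLATE URL
        result = run(["uvx", "yt-dlp", "-o", OUTPUT_TEMPLATE, url])
        if result.returncode == 0:
            print(f"Successfully downloaded: {url}")
        else:
            print(f"Failed to download: {url}")
            failed.append(url)
        time.sleep(pause)  # pause between downloads, like `sleep 1`
    return failed
```

To use it, pass in the list from json.load(open("videos.json")). Calling it again on the returned list retries only the failures.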
This creates a downloads/ folder containing files with names like this:
#perte -7204803049351695622.mp4
#perte -7204877634189151493.mp4
#perte -7205189803074211077.mp4
#perte i galli moroseta🐓🐓🌸🍾🍾💪😅-7205157890372537606.mp4
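yt-dlp's -o template uses Python-style %(field)s placeholders, so ordinary % formatting shows how the filenames above are assembled. This sketch assumes the title field was "#perte " with a trailing space, which is what the filenames suggest. It also skips the filename sanitization yt-dlp applies. id_from_filename is a hypothetical helper for mapping a file back to its URL.

```python
# Plain %-formatting with a dict of (assumed) metadata fields
template = "downloads/%(title)s-%(id)s.%(ext)s"
info = {"title": "#perte ", "id": "7204803049351695622", "ext": "mp4"}
print(template % info)  # downloads/#perte -7204803049351695622.mp4


def id_from_filename(name: str) -> str:
    """Recover the video ID: the text between the last '-' and the extension."""
    stem = name.rsplit(".", 1)[0]  # drop ".mp4"
    return stem.rsplit("-", 1)[1]  # split from the right, since titles may contain '-'


print(id_from_filename("#perte i galli moroseta🐓🐓🌸🍾🍾💪😅-7205157890372537606.mp4"))
# 7205157890372537606
```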
I ran this against an account that wasn't just dancing chickens, so I decided to use Whisper, running on macOS via mlx-whisper, to generate transcripts as text files that I could search later on.
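Here's the recipe I used for that, powered by uv run. It's written for zsh, the default shell on macOS, where ${f:r} strips the file extension (in bash you would use ${f%.*} instead):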
for f in *.mp4; do [[ ! -f "${f:r}.txt" ]] && echo "Processing $f" && uv run --with mlx-whisper mlx_whisper "$f"; done
This can be run multiple times - it checks to see if a .txt file exists already and only executes against .mp4 files that have not yet been processed.
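For anyone not using zsh, here is the same skip-if-already-transcribed logic as a Python sketch. It assumes, as the one-liner does, that mlx_whisper writes <name>.txt into the directory it is run from. pending_videos and transcribe_pending are hypothetical names.

```python
import subprocess
from pathlib import Path


def pending_videos(folder: Path) -> list[Path]:
    """Return .mp4 files with no matching .txt transcript yet (same check as the zsh loop)."""
    # with_suffix swaps only the final ".mp4", like zsh's ${f:r}.txt
    return sorted(p for p in folder.glob("*.mp4") if not p.with_suffix(".txt").exists())


def transcribe_pending(folder: Path) -> None:
    """Run mlx_whisper on each untranscribed video; safe to re-run after interruptions."""
    for video in pending_videos(folder):
        print(f"Processing {video.name}")
        # Same command as the one-liner, run inside `folder` so the .txt lands beside the .mp4
        subprocess.run(
            ["uv", "run", "--with", "mlx-whisper", "mlx_whisper", video.name],
            cwd=folder,
            check=False,
        )
```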
After I kicked this off against a larger account I realized a progress bar would be nice. I got ChatGPT o1 to write me this script:
#!/usr/bin/env python3
import sys
import time
import subprocess


def main():
    if len(sys.argv) < 3:
        print(f"Usage: {sys.argv[0]} <total> <shell_command>")
        sys.exit(1)

    total = int(sys.argv[1])
    # If your command may include spaces, you might need to do this:
    # shell_command = ' '.join(sys.argv[2:])
    # but for the simple example provided:
    shell_command = sys.argv[2]

    # -- Step 1: Get initial progress and record the time --
    try:
        initial_output = subprocess.check_output(shell_command, shell=True)
        done_initial = int(initial_output.strip())
    except Exception as e:
        print(f"Error running initial command: {shell_command}\n{e}")
        sys.exit(1)

    # Clamp in case the command returns something above the total or below zero
    if done_initial < 0:
        done_initial = 0
    if done_initial > total:
        done_initial = total

    time_initial = time.time()

    # Print one quick update before we start the loop
    print_progress(done_initial, total, 0, 0)

    # If we already reached (or exceeded) the total, exit immediately
    if done_initial >= total:
        print("\nDone!")
        sys.exit(0)

    # -- Step 2: Repeatedly poll the command to update progress --
    polling_interval = 1.0  # seconds between checks
    while True:
        time.sleep(polling_interval)

        # Fetch current progress
        try:
            output = subprocess.check_output(shell_command, shell=True)
            done = int(output.strip())
        except Exception as e:
            print(f"\nError running command: {shell_command}\n{e}")
            sys.exit(1)

        # Clamp done to never exceed total or go below 0
        if done < 0:
            done = 0
        if done > total:
            done = total

        # How much progress has been made since we started measuring?
        delta_done = done - done_initial
        delta_time = time.time() - time_initial

        # Print the progress bar
        print_progress(done, total, delta_done, delta_time)

        if done >= total:
            break

    print("\nDone!")


def print_progress(done, total, delta_done, delta_time):
    """
    Print a single-line progress bar with percentage and ETA (if possible).
    Overwrites the previous line via carriage return.
    """
    # Fraction complete
    fraction = done / total if total else 1.0

    # Build the bar
    bar_length = 50
    filled_length = int(bar_length * fraction)
    bar = "#" * filled_length + "-" * (bar_length - filled_length)

    # Compute ETA based only on new progress (delta_done)
    if delta_done > 0:
        time_per_item = delta_time / delta_done
        remaining = total - done
        eta_seconds = int(time_per_item * remaining)
        eta_string = format_eta(eta_seconds)
    else:
        # If no new items have completed since the script started, can't guess yet
        eta_string = "calculating..."

    progress_line = (
        f"\r[{bar}] {done}/{total} ({fraction*100:.1f}%) - ETA: {eta_string}"
    )
    print(progress_line, end='', flush=True)


def format_eta(seconds):
    """Convert a number of seconds into an H:MM:SS or MM:SS format string."""
    h = seconds // 3600
    m = (seconds % 3600) // 60
    s = seconds % 60
    if h > 0:
        return f"{h:d}:{m:02d}:{s:02d}"
    else:
        return f"{m:02d}:{s:02d}"


if __name__ == "__main__":
    main()
Which I can then run like this:
uv run progress.py 45 'ls *.mp4 | wc -l'
The 45 there is the expected number of downloads (found with jq length < videos.json). The ls *.mp4 | wc -l string is the shell command the script runs on each polling iteration to count how many items have been processed so far. Since the files land in downloads/, run it from inside that folder.
This displays an ASCII progress bar along with an ETA. The ETA is estimated from how many new items have completed since the script started and how long those took.
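To make the ETA arithmetic concrete, here is the calculation progress.py performs, pulled out into standalone functions and run on made-up numbers. eta_seconds is a hypothetical name. format_eta here uses divmod but produces the same strings as the script's version.

```python
from typing import Optional


def eta_seconds(total: int, done_initial: int, done: int, elapsed: float) -> Optional[int]:
    """ETA as progress.py computes it: seconds per new item, times items remaining."""
    delta_done = done - done_initial
    if delta_done <= 0:
        return None  # progress.py shows "calculating..." in this case
    time_per_item = elapsed / delta_done
    return int(time_per_item * (total - done))


def format_eta(seconds: int) -> str:
    """H:MM:SS when there are hours, otherwise MM:SS."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:d}:{m:02d}:{s:02d}" if h > 0 else f"{m:02d}:{s:02d}"


# Hypothetical run: 10 of 45 files existed at start, 16 exist 60 seconds later.
# 6 new items in 60s = 10s per item; 29 remaining = 290s.
eta = eta_seconds(total=45, done_initial=10, done=16, elapsed=60.0)
print(eta, format_eta(eta))  # 290 04:50
```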
Created 2025-01-18T17:06:27-08:00, updated 2025-01-18T17:13:06-08:00