Python Bits — Using Threads

This is the second post in my series of Python blog posts; you can find the first one here. In this one we’ll add threads to our Imgur album downloader, hopefully making it a bit faster than before.

First off, you might be wondering whether threads are useful in Python at all, given the infamous GIL (in CPython, of course). The GIL lets only one thread execute Python bytecode at a time, so threads don’t speed up CPU-bound work. Luckily, in our case the threads spend most of their time waiting for network activity, and CPython releases the GIL while a thread is blocked on I/O, so other threads can run in the meantime. Threads are therefore genuinely beneficial here, since most of the time we’ll be doing either network I/O (downloading the images) or file I/O (writing them to disk).
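
To make this concrete, here is a minimal sketch that is not part of the downloader. It assumes time.sleep is a fair stand-in for a blocking network call (both release the GIL while waiting), and fake_download, run_sequential and run_threaded are names made up for this illustration.

```python
import threading
import time


def fake_download(seconds):
    # time.sleep stands in for a blocking network call; like real socket
    # I/O, it releases the GIL while it waits
    time.sleep(seconds)


def run_sequential(n, seconds):
    # run the fake downloads one after another and return the elapsed time
    start = time.perf_counter()
    for _ in range(n):
        fake_download(seconds)
    return time.perf_counter() - start


def run_threaded(n, seconds):
    # run the same fake downloads in one thread each and return the elapsed time
    start = time.perf_counter()
    threads = [
        threading.Thread(target=fake_download, args=(seconds,))
        for _ in range(n)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


sequential = run_sequential(4, 0.2)
threaded = run_threaded(4, 0.2)
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

The sequential run has to sit through every wait in turn, while the threaded run overlaps them, so it should finish in roughly the time of a single wait. Real downloads won’t scale this neatly, since bandwidth and the server also play a part.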

Since this is Python, using threads is quite simple: we import the threading module, create a Thread instance, and tell it to start. When creating the instance we tell it which function to run and, if needed, pass the arguments for that function. We then call join from the main thread so that it waits for our thread to finish.

import threading

# we'll use a Python thread to call this function
def foo(arg1):
    print(arg1)

# using the below syntax you can create a Python thread
# Note that args must be a tuple, hence the comma (,)
# after the "3"; don't forget it
thread = threading.Thread(target=foo, args=(3,))
# starting the thread is a no-brainer
thread.start()
# wait for the thread to finish
thread.join()

We’ll simply call our download_img function from each thread, telling each one to download a different picture. One problem we now face is the progress bar: since the threads run concurrently with the main thread, our for loop finishes as soon as it has launched all the threads, so a progress bar driven by that loop would reach 100% before the images have finished downloading.
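
A small sketch of that effect, separate from the downloader: slow_task is a hypothetical worker that sleeps instead of downloading, and the finished list only exists so we can count completions.

```python
import threading
import time

finished = []  # ids of the workers that have completed


def slow_task(task_id):
    time.sleep(0.2)  # stand-in for a slow download
    finished.append(task_id)  # list.append is thread-safe in CPython


threads = []
for task_id in range(5):
    t = threading.Thread(target=slow_task, args=(task_id,))
    threads.append(t)
    t.start()

# the launch loop is already over, but the workers are still sleeping
done_after_loop = len(finished)

for t in threads:
    t.join()

# only after join() do we know that every worker has completed
done_after_join = len(finished)

print(done_after_loop, done_after_join)
```

The first number is the count of workers that had finished when the launch loop ended, which is what a loop-driven progress bar would be reporting as 100%. The second is the count once join has returned for every thread.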

To counter this, we’ll update the progress bar manually as each thread finishes its download, like this:

bar = progressbar.ProgressBar(max_value=len(img_lst))

# inside download_img, once the image has been written to disk:
with lock:
    bar.update(i)
    i += 1

max_value tells the progress bar how many items we have; when the count reaches that number, the bar should be at 100%. To update it, we take a lock, pass the current value of a shared counter to the bar, and then increment the counter. The lock is necessary so that multiple threads don’t update the same global variable at the same time and corrupt the count.
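
To see the locking pattern in isolation, here is a minimal sketch with a plain counter instead of a progress bar. The names counter and worker are made up for this example. The point is that counter += 1 is a read-modify-write sequence rather than a single atomic step, so without the lock two threads can read the same old value and one of the updates can get lost.

```python
import threading

lock = threading.Lock()
counter = 0


def worker(times):
    global counter
    for _ in range(times):
        # only one thread at a time may run the read-modify-write below
        with lock:
            counter += 1


threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 80000: with the lock held, no increment is lost
```

The full downloader script that follows uses the same with lock: pattern around bar.update(i) and i += 1.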

#!/usr/bin/env python
import os
import re
import sys
import threading

import progressbar
import requests
from imgurpython import ImgurClient

# matches the file extension at the end of a link, including the dot
regex = re.compile(r'\.(\w+)$')

def get_extension(link):
    ext = regex.search(link).group()
    return ext

lock = threading.Lock()
i = 1

def download_img(img):
    # Without "global", Python would treat "i" as a local variable
    # (because we assign to it below) and complain that it is used
    # before assignment. "bar" is only read, so declaring it is
    # optional, but it makes clear that all our threads share these
    # two global variables.
    global i, bar
    file_ext = get_extension(img.link)
    resp = requests.get(img.link, stream=True)
    # create unique name by combining file id with its extension
    file_name = img.id + file_ext
    with open(file_name, 'wb') as f:
        for chunk in resp.iter_content(chunk_size=1024):
            f.write(chunk)
    with lock:
        bar.update(i)
        i += 1

try:
    album_id = sys.argv[1]
except IndexError:
    sys.exit('Please specify an album id')

client_id = os.getenv('IMGUR_CLIENT_ID')
client_secret = os.getenv('IMGUR_CLIENT_SECRET')
client = ImgurClient(client_id, client_secret)
img_lst = client.get_album_images(album_id)
bar = progressbar.ProgressBar(max_value=len(img_lst))

threads = []
for img in img_lst:
    t = threading.Thread(target=download_img, args=(img,))
    threads.append(t)
    t.start()

# the main thread waits here for all threads to finish
for t in threads:
    t.join()

Phew, that was quite some work with locks and all. In the next post, we’ll move from these messy threads to the new and shiny async/await style of writing asynchronous code.