Building a fault-tolerant work queue for command-line executions with GNU Parallel
I've recently been backfilling my listening history from Podcast Addict, which has involved scraping the database and then converting the posts to Micropub commands.
As part of this, I needed to script the execution of ~600 commands. I started off by producing a single script that could execute each of the commands, like so:
#!/usr/bin/env bash
# one micropub call per post to backfill, ~600 lines in total
micropub create form h=entry ...
micropub create form h=entry ...
micropub create form h=entry ...
However, this highlighted a number of cases where intermittent errors would result in failures that I needed to retry. That meant I needed to start recording what passed and failed, and then have some means of reading the input queue and the failure queue, and removing successful entries.
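Hand-rolling that bookkeeping would have looked something like the sketch below; the passed/failed filenames, and the use of eval to run each line, are purely illustrative rather than anything I actually ran:

#!/usr/bin/env bash
# hypothetical sketch: run each command in joblist, recording results,
# so failures can be fed back in as the next joblist
while IFS= read -r cmd; do
  if eval "$cmd"; then
    echo "$cmd" >> passed
  else
    echo "$cmd" >> failed
  fi
done < joblist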
To start off with, I simply retried the failed entries manually, but after maybe a couple of dozen I grew a bit bored of it, knowing I had so many more to go.
I started to consider writing something for this in either Ruby or Go, feeling that it was just complex enough to need something a bit more thought out, but I was also surprised I'd not seen something like this before that I could utilise.
Fortunately, it turns out it is a solved problem, and my searching found that I could use GNU Parallel for this purpose.
This would allow us to run the following:
# on first run, create the (empty) queue of jobs
touch joblist

# start processing the queue:
#   -j1                         throttle to one job at a time
#   --retry-failed --retries 3  retry commands that fail, up to 3 times
#   --joblog joblog             store a record of what passes/fails, to allow
#                               rerunning and resuming where you left off
tail -f joblist | parallel -j1 --retry-failed --retries 3 --joblog joblog

# then, we can enqueue jobs, i.e.:
echo "micropub create form h=entry ..." >> joblist
This is super convenient, and gives us a fault-tolerant solution, allowing us to retry failed commands, as well as resume the jobs where we left off.
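If the pipeline gets stopped part way through, my reading of the docs is that the failed jobs can then be retried straight from the joblog, as --retry-failed takes the commands to rerun from the log itself:

# rerun only the jobs the joblog recorded as failing
parallel --retry-failed --joblog joblog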
Because this is running with parallel, we can also parallelise it well, for speed boosts!
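For instance, raising the -j value (the 4 below is an arbitrary choice) lets parallel run several jobs concurrently:

# run up to four jobs at once, rather than throttling with -j1
tail -f joblist | parallel -j4 --retry-failed --retries 3 --joblog joblog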