A pitfall of Python's subprocess Ctrl+C handling (compared to bash)

Usually, I start writing my scripts in bash and then translate them to Python if they get too complex. When doing so, I assume that invoking a simple command from a bash script is equivalent to the corresponding subprocess.run call. However, this is not true when it comes to Ctrl+C (a.k.a. SIGINT) handling.

A reliable bash script

This bash script shows a common pattern where a script first sets up some state, does something with it, and then cleans up using a shell trap:

#!/usr/bin/env bash
set -eu

# Set up some temporary stuff and ensure that it is cleaned up on exit. The `sleep`s simulate non-trivial clean-up.
trap 'rm -f a && sleep 0.2 && rm -f b && sleep 0.2 && rm -f c' EXIT
touch a b c

# Simulate a long running process
tail -q -f a b c

For example, we may want to perform a backup using rsync. Before doing it, we unlock and mount the target volume using cryptsetup open and mount. Whether things go right or wrong, we clean up the setup steps with umount and cryptsetup close.

Granted, this pattern is not 100% bulletproof. If our script gets killed [or the power goes off, or the system crashes...], the cleanup step will never run, which may leave some junk behind. But since this is very uncommon, it's often acceptable to deal with it manually.

But something that may happen more often is that you forget something, or run out of time, and want to stop the process. In this case, you press Ctrl+C, which sends a SIGINT to the shell, which triggers the trap and cleans everything up properly:

$ ./shell_script_with_traps
^C
$ ls a b c
ls: cannot access 'a': No such file or directory
ls: cannot access 'b': No such file or directory
ls: cannot access 'c': No such file or directory

...maybe it wasn't so reliable after all

Thus, our script becomes part of our collection of reliable, battle-hardened scripts. We move on to building greater things, which involves invoking that script using Python's subprocess.run. Can you guess what happens when we press Ctrl+C this time?

$ python -c "import subprocess; subprocess.run(['./shell_script_with_traps'])"
^C
[...the traceback for the KeyboardInterrupt exception...]
$ ls a b c
ls: cannot access 'a': No such file or directory
ls: cannot access 'b': No such file or directory
c

Only half of our cleanup trap seems to have run. Let's try using asyncio.create_subprocess_exec instead:

$ python -c "import asyncio
async def run():
    proc = await asyncio.create_subprocess_exec('./shell_script_with_traps')
    await proc.communicate()
asyncio.run(run())"
^C
[...the traceback for the KeyboardInterrupt exception...]
$ ls a b c
a  b  c

Now the trap does not appear to run at all. What's going on?

SIGINT handling in Python vs bash

When we press Ctrl+C, all processes in the foreground process group receive the SIGINT signal. So, both Python and bash get a chance to handle the signal.

As expected, bash runs the trap upon receiving SIGINT. Python's subprocess.run, on the other hand, gives the bash process a grace period of 250 milliseconds to exit and then forcefully kills it. This is why the example's trap got interrupted halfway through: removing a and b (plus the first sleep) fits within the grace period, but the process gets killed during the second sleep, leaving c behind.
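
Under the hood, this grace period comes from CPython's subprocess module (see https://bugs.python.org/issue25942). Heavily simplified, and paraphrased rather than quoted (the details vary between Python versions), subprocess.run behaves roughly like this sketch:

import subprocess

def run_paraphrased(args):
    # Rough paraphrase of subprocess.run's internals; not the actual code.
    with subprocess.Popen(args) as process:
        try:
            # communicate() catches the KeyboardInterrupt itself, waits up
            # to Popen._sigint_wait_secs (0.25 seconds) for the child to
            # exit on its own, then re-raises the KeyboardInterrupt...
            process.communicate()
        except BaseException:
            # ...which lands here, where the child is forcefully killed.
            process.kill()
            raise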

When using asyncio.create_subprocess_exec, we don't even get this grace period, so the bash process gets killed immediately, often before the trap can run at all.
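
The reason, again paraphrasing CPython's internals: when the event loop is torn down after the KeyboardInterrupt, asyncio closes the subprocess transport, and closing the transport kills any child that is still running. This excerpt-style sketch (simplified from asyncio/base_subprocess.py, not runnable on its own) shows the idea:

# Simplified paraphrase of asyncio's BaseSubprocessTransport.close():
if self._proc is not None and self._returncode is None:
    try:
        self._proc.kill()  # SIGKILL on POSIX: no grace period at all
    except ProcessLookupError:
        pass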

By contrast, in a bash-only scenario (if the script were called from within another bash script), the exit trap would not get interrupted. See this excerpt from Greg's Wiki:

bash is among a few shells that implement a wait and cooperative exit approach at handling SIGINT/SIGQUIT delivery. When interpreting a script, upon receiving a SIGINT, it doesn't exit straight away but instead waits for the currently running command to return and only exits (by killing itself with SIGINT) if that command was also killed by that SIGINT. The idea is that if your script calls vi for instance, and you press Ctrl+C within vi to cancel an action, that should not be considered as a request to abort the script.

Finally, note that this problem is not specific to bash or its traps. For example, a C program that runs some expensive cleanup logic on SIGINT would also get prematurely killed by Python.
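
To illustrate (sketched in Python rather than C, for brevity), any child process whose SIGINT handler takes longer than the grace period will lose the race:

# slow_cleanup.py: a hypothetical child with slow SIGINT clean-up.
# Run it via subprocess.run and press Ctrl+C: it dies mid-cleanup.
import signal
import sys
import time

def cleanup(signum, frame):
    print("cleaning up...")
    time.sleep(1)  # simulate expensive clean-up; exceeds the 250 ms grace
    print("done")
    sys.exit(130)

signal.signal(signal.SIGINT, cleanup)
signal.pause()  # POSIX only: wait until a signal arrives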

What can we do about it?

Part of the problem is that our bash script's cleanup logic was never truly reliable in the first place. We implemented something that seemed good enough, but, as so often happens, our assumption was too optimistic: getting our shell script killed is not so uncommon after all.

Unfortunately, it is not always simple or possible to implement bulletproof cleanup logic, which generally involves passing the buck to the operating system. For scenarios involving regular files, there are quite a few tricks, such as using pipes (including bash's process substitution), and various Linux extras such as open(..., O_TMPFILE), memfd_create, namespaces (see also: bubblewrap), and so on. But more complex scenarios involving cryptsetup, filesystem mounts, swap files, and so on, are often impossible or very awkward to handle this way.
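
As a quick illustration of the pass-the-buck approach, here is a Linux-only Python sketch using os.memfd_create: the temporary file disappears on its own, so there is no cleanup step left to interrupt:

import os

# Linux-only (Python 3.8+): memfd_create returns an anonymous in-memory
# file that vanishes once the last file descriptor to it is closed.
fd = os.memfd_create("scratch")
with os.fdopen(fd, "r+b") as f:
    f.write(b"temporary data")
    f.seek(0)
    print(f.read())
# The file is already gone here; nothing to trap, nothing to rm.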

We can also try to work around it from the Python side, though unfortunately there doesn't seem to be any simple way to do it without adding a bunch of complexity: switching to lower-level interfaces such as subprocess.Popen and handling the errors manually, as sketched below. This may be something to improve in the standard library in the future.
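
For instance, here is a minimal sketch of that lower-level approach (run_and_wait is a made-up name, and a second Ctrl+C will still abandon the child):

import subprocess

def run_and_wait(cmd):
    """Like subprocess.run(cmd), but on Ctrl+C wait for the child to
    finish its own SIGINT handling instead of killing it after 250 ms."""
    proc = subprocess.Popen(cmd)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        # The child sits in our foreground process group, so it received
        # the SIGINT too; block until its trap/cleanup finishes.
        # (Pressing Ctrl+C again will interrupt this wait as well.)
        proc.wait()
        raise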

Another possibility could be to spawn the exit logic as a background process that waits for the foreground script to exit and then cleans up, though this feels like pushing in the direction of building a Rube Goldberg machine that will break in even more subtle ways.

Ultimately, I don't have any simple, universal and reliable fix for this at the moment. If you have any idea, don't hesitate to contact me!