I recently dockerized an SFTP file server setup on an old Raspberry Pi 2 server, and I was surprised by how slow logging in to the server was: about 4 seconds per connection [1].
My first intuition was that the time was lost to network latency or encryption, but the previous setup didn't have this problem, so that couldn't be it.
In fact the problem easily reproduces on a cheap VPS, though to a lesser degree:
$ docker run --name sftptest -d -p 127.0.0.1:1234:22 --entrypoint ash atmoz/sftp:alpine -c "/entrypoint foo:pass:::upload"
$ time sshpass -p pass ssh -p1234 -o PreferredAuthentications=password -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null foo@127.0.0.1
[...]
real 0m1.290s
user 0m0.014s
sys 0m0.008s
$ docker rm -f sftptest
At first, I tried playing with various OpenSSH switches and debugging options, but that didn't help much, as the time seemed to be lost "doing nothing". But after some more tinkering, I tried strace, and quickly found that it was trying to close several million file descriptors:
$ docker run --name sftptest -d -p 127.0.0.1:1234:22 --entrypoint ash atmoz/sftp:alpine -c "apk add strace && strace -f /entrypoint foo:pass:::upload"
$ timeout 10s sshpass -p pass ssh -p1234 -o PreferredAuthentications=password -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null foo@127.0.0.1
$ docker logs sftptest
[...a lot of logs...]
[pid 39] close(119676) = -1 EBADF (Bad file descriptor)
[pid 39] close(119677) = -1 EBADF (Bad file descriptor)
[pid 39] close(119678) = -1 EBADF (Bad file descriptor)
[pid 39] close(119679) = -1 EBADF (Bad file descriptor)
[pid 39] close(119680) = -1 EBADF (Bad file descriptor)
[pid 39] close(119681) = -1 EBADF (Bad file descriptor)
[pid 39] close(119682) = -1 EBADF (Bad file descriptor)
[...a lot of logs...]
^C
$ docker rm -f sftptest
After some more analysis, I found that all those close() calls seemed to come from a function named closefrom_fallback in OpenSSH, which will close all file descriptors up to OPEN_MAX in an attempt to ensure most file descriptors are closed [2].
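To give an idea of what that means, here's a minimal sketch of what such a brute-force fallback roughly looks like (simplified for illustration, not OpenSSH's actual code):
#include <unistd.h>

/* Simplified sketch of a brute-force closefrom(): with no reliable way to
 * know which descriptors are open, try to close every possible one. */
static void
closefrom_fallback(int lowfd)
{
    long fd, maxfd;

    /* OPEN_MAX as seen at runtime: typically 1024 on the host,
     * but 1048576 inside a default Docker container. */
    maxfd = sysconf(_SC_OPEN_MAX);
    if (maxfd < 0)
        maxfd = 1024;

    /* Almost every call fails with EBADF, but each one is still a syscall. */
    for (fd = lowfd; fd < maxfd; fd++)
        (void)close((int)fd);
}
With a limit of 1048576, that loop means around a million close() syscalls per process that runs it, which is exactly the wall of EBADF errors visible in the strace output above.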
So why does this only happen when running in a Docker container? The problem is that, by default, Docker containers get a very generous limit of 1048576 open files, instead of the usual default of 1024 found on most setups:
$ getconf OPEN_MAX
1024
$ docker run --rm alpine:3.13 getconf OPEN_MAX
1048576
This high OPEN_MAX setting, combined with the brute-force 'close all file descriptors' approach, has already caused issues in various applications, but I haven't found anyone reporting this for OpenSSH, so I guess there are a lot of annoyingly slow SFTP logins out there.
Thankfully, working around the problem is pretty easy once we know the cause: specifying a smaller OPEN_MAX limit by passing the --ulimit nofile=1024:1024 flag to docker run gets us about 14x faster logins:
$ docker run --name sftptest -d -p 127.0.0.1:1234:22 --ulimit nofile=1024:1024 --entrypoint ash atmoz/sftp:alpine -c "/entrypoint foo:pass:::upload"
$ time sshpass -p pass ssh -p1234 -o PreferredAuthentications=password -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null foo@127.0.0.1
[...]
real 0m0.091s
user 0m0.015s
sys 0m0.000s
$ docker rm -f sftptest
1. I was using nPlayer as the client, which seems to open a new connection on every folder change, so those 4 seconds quickly added up to a minute. ↩
2. I haven't looked in detail at why this closefrom_fallback code path is hit, but it appears that OpenSSH will try to enumerate open files by looking at /proc/$$/fd, which it can't do after chroot is called to jail the SFTP session to a directory. ↩
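For the curious, here's a similarly simplified sketch of that /proc-based path, and of how it would degrade to the brute-force loop shown earlier when /proc/$$/fd can't be read (for example inside the chroot). Again, this is illustrative rather than OpenSSH's actual code:
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Simplified sketch of the /proc-based path: close only the descriptors
 * that are actually open, as listed under /proc/<pid>/fd. */
static void
closefrom_proc(int lowfd)
{
    char fdpath[64];
    DIR *dirp;
    struct dirent *dent;
    int fd;

    snprintf(fdpath, sizeof(fdpath), "/proc/%ld/fd", (long)getpid());
    if ((dirp = opendir(fdpath)) == NULL) {
        /* No /proc here (e.g. inside the sftp chroot):
         * fall back to blindly closing everything up to OPEN_MAX. */
        closefrom_fallback(lowfd);
        return;
    }
    while ((dent = readdir(dirp)) != NULL) {
        /* Entries are fd numbers; skip "." and "..". */
        if (dent->d_name[0] < '0' || dent->d_name[0] > '9')
            continue;
        fd = atoi(dent->d_name);
        if (fd >= lowfd && fd != dirfd(dirp))
            (void)close(fd);
    }
    closedir(dirp);
}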