SSH Keystroke Obfuscation Bypass

June 24, 2024
A disclosure for an OpenSSH keystroke obfuscation bypass affecting current OpenSSH versions after 9.4.

Introduction

OpenSSH version 9.5 introduced measures to mitigate a keystroke timing attack via traffic analysis. The patch involved adding keystroke timing obfuscation to the SSH client. As per the release notes, this feature “attempts to hide inter-keystroke timings by sending interactive traffic at fixed intervals (default: every 20ms) when there is only a small amount of data being sent”. Additionally, fake chaff packets are sent after the last real keystroke, significantly complicating traffic analysis by shrouding the real keystrokes in a stream of impostors. The feature can be controlled and disabled via the ObscureKeystrokeTiming option in the SSH config.
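
For reference, a client-side configuration fragment might look like the following. The accepted values (yes, no, or an interval specifier such as interval:80) are the ones documented in ssh_config(5); the Host * scope is only an example and should be adapted.

```
# ~/.ssh/config
Host *
    # Default behaviour: obscure keystroke timing with a 20ms interval
    ObscureKeystrokeTiming yes
    # Alternatives (use only one of these instead):
    # ObscureKeystrokeTiming interval:80
    # ObscureKeystrokeTiming no
```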

As part of my Bachelor’s dissertation, I researched how keystroke latency analysis can be used to infer the commands a client runs in an SSH session. In the course of this work, I discovered a way to bypass the measures introduced in OpenSSH 9.5, which still works against the latest release. I notified the developers on April 24th and received a response from Damien Miller himself (the developer who introduced the patch), but unfortunately all further correspondence was met with silence. Hence the publication of this disclosure.

The existing problem

Previous implementations of the SSH protocol leaked a significant amount of metadata, especially when used interactively. Although the traffic is fully encrypted, this metadata can be used, in principle, to breach the confidentiality of the underlying session. Simply put, each time you press a key in an interactive SSH session, that keystroke is packaged, padded, encrypted, and sent over the wire to the server, on its own. It is then echoed back from the server. This means that each keystroke can be clearly identified and timestamped, opening up potential keystroke-latency attacks to infer what was typed by the client. This can be taken further by incorporating additional context, such as the size of the server’s responses and other metadata points. I have analysed these in detail in my university paper, which will be published once it is marked.

The above can be observed using Wireshark, with a display filter for ssh. However, to make this process easier, I wrote a tool called SSHniff to automate the metadata extraction process. As part of my research project, I also included Jupyter notebooks where I showcase how the intercepted latencies can be leveraged to infer the underlying UNIX command, using algorithms like Dynamic Time Warping (DTW) and/or Time Series Forests. This is also summarised in this blogpost.

Obfuscation in a Nutshell

While this attack vector was dismissed by many in the past, preventative measures were first introduced in October of last year (2023). The idea is to veil the real keystroke packets among a wave of fake packets that look just the same to an external observer. These are the so-called chaff. Further, all outgoing packets are sent on a fixed schedule, roughly 20ms apart. The “chaff” is in reality just SSH2_MSG_PING and SSH2_MSG_PONG packets that have the same size as the keystroke packets. Whenever a keystroke is typed, these chaff packets start flooding out, hiding any subsequent keystrokes. They are also sent for a certain time interval after the last real keystroke.
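
The intended behaviour can be illustrated with a toy model. The sketch below is not OpenSSH's implementation: the function name, the tail_ms parameter and its values are assumptions made purely for illustration. It sends the first keystroke immediately, snaps every later keystroke onto a fixed 20ms grid, and fills the remaining slots with chaff, so an observer who only sees send times sees evenly spaced packets.

```python
import math

def obfuscated_schedule(keystrokes_ms, interval_ms=20, tail_ms=100):
    """Toy model of timing obfuscation (not OpenSSH's actual code).

    keystrokes_ms: sorted times at which keys were pressed.
    tail_ms: hypothetical duration for which chaff continues after the last key.
    Returns a list of (send_time_ms, kind) with kind in {"key", "chaff"}.
    """
    if not keystrokes_ms:
        return []
    start = keystrokes_ms[0]  # the first keystroke is sent immediately
    # Every later keystroke waits for the next slot on the fixed grid.
    # Keys falling into the same slot collapse into one event in this model.
    key_slots = {math.ceil((t - start) / interval_ms) for t in keystrokes_ms}
    last_slot = max(key_slots) + tail_ms // interval_ms
    return [
        (start + slot * interval_ms, "key" if slot in key_slots else "chaff")
        for slot in range(last_slot + 1)
    ]

schedule = obfuscated_schedule([0, 47, 130], tail_ms=40)
gaps = [b[0] - a[0] for a, b in zip(schedule, schedule[1:])]
print(gaps)  # every gap equals interval_ms
print([t for t, kind in schedule if kind == "key"])  # [0, 60, 140]
```

If all packets in the schedule had the same size, the "key" entries would be indistinguishable from the "chaff" entries on the wire.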

Discovering the Bypass

Part of my thesis was to evaluate how effective the preventative measures introduced by OpenSSH are. While I expected them to completely break this attack vector, when I loaded a Wireshark capture into SSHniff, I realised that certain packets still stuck out substantially among the hundreds of packets hovering around 20ms intervals.

In this session, I ran uptime. As you can see, there is one spike for each letter in uptime, but note that the first real keystroke is at latency zero, and the last spike corresponds to the Enter keystroke, for a total of seven real keystrokes.

To verify, I ran some additional commands and found the same behaviour. netstat -tlpn, for instance, has thirteen characters (including the space and the hyphen). The Enter keystroke was omitted in this session.

As I came to realise, these spikes were caused by SSHniff skipping three packets each time (hence the 60ms relative latency for each spike), which boiled down to the tool’s implementation, as it only looks for packets of a certain size K, corresponding to the keystroke packets. This implied that among the chaff, certain packets were slightly larger or smaller than the tool expected, so it ignored them.

This prompted me to take a closer look in Wireshark, and indeed, I realised that each keystroke typed after the chaff is triggered produces a larger packet (as well as two server-side echoes), which means it can be quite clearly identified. I used the previous methods like DTW to check whether I could use these outlying packets to infer the underlying command just as before, and it worked, which served as verification that these are indeed the real keystroke packets.

For more information on the latency analysis part of this attack, refer to the Keystroke Latency Analysis section.

I then notified the OpenSSH developers and started digging deeper into the observed behaviour.

Fat Packets

These outlying packets were roughly twice the size of “normal” keystrokes (and therefore also twice the size of the chaff packets). I say roughly because the absolute on-the-wire size depends on the encryption ciphers used, among other things.

The chaff are the packets of length 102, which, with this cipher set, would normally be the size of regular keystrokes (those that SSHniff would also filter out). Since the client-side packet initiating these triplets is larger (138), all three packets slip through the cracks and cause the aforementioned spikes when plotted.
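
The size difference alone is enough to separate real keystrokes from chaff. The following is a minimal sketch, not SSHniff's actual code: the tuple layout and function name are hypothetical, and the two lengths are the ones from the capture described above, which will differ with other ciphers or link layers.

```python
CHAFF_LEN = 102  # length of chaff (and of ordinary keystrokes) in this capture
FAT_LEN = 138    # length of the client packet that opens each triplet

def fat_packet_latencies(packets):
    """Return the gaps between consecutive client-side fat packets.

    packets: iterable of (timestamp_us, from_client, frame_len) tuples,
    a hypothetical layout chosen for this example.
    """
    times = [ts for ts, from_client, frame_len in packets
             if from_client and frame_len == FAT_LEN]
    return [later - earlier for earlier, later in zip(times, times[1:])]

# Synthetic trace: mostly chaff, with fat packets at 60ms and 180ms.
trace = [
    (0, True, CHAFF_LEN), (20_000, True, CHAFF_LEN), (40_000, True, CHAFF_LEN),
    (60_000, True, FAT_LEN), (61_000, False, CHAFF_LEN), (61_500, False, CHAFF_LEN),
    (80_000, True, CHAFF_LEN), (180_000, True, FAT_LEN),
]
print(fat_packet_latencies(trace))  # [120000]
```

Note that the first keystroke of a burst is sent as a regular-sized packet, so this filter alone does not catch it.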

It is interesting to note that the initial keystroke is sent “normally”, without being packaged in a fat packet. Only once the chaff flood is initiated do subsequent real keystrokes produce these larger packets. Similarly, waiting for the chaff to subside before typing the next keystroke also produces a regular packet pair (followed by chaff).

OpenSSH Verbose Output

I compiled OpenSSH v9.7 with the -DPACKET_DEBUG flag, to get a more verbose view of the session. In the following session, I ran the whoami command. I will now show how the client constructs/packages keystrokes.

We start with the initial keystroke, which, as stated, is sent with a normal size and followed by chaff packets.

debug1: packet_start[94]

plain:     buffer len = 15
0000: 00 00 00 00 00 5e 00 00 00 00 00 00 00 01 77     .....^........w
debug1: send: len 20 (includes padlen 5, aadlen 4)

encrypted: buffer len = 36
0000: c7 e4 05 07 10 e1 f3 4b 24 ba 61 e8 fe 6e 0b 01  .......K$.a..n..
0016: 79 50 4e af 6a 96 31 5e ff fa ec bf 2b 3b 91 42  yPN.j.1^....+;.B
0032: a7 14 64 a6                                      ..d.

debug3: obfuscate_keystroke_timing: starting: interval ~20ms

debug1: input: packet len 20

debug1: partial packet: block 8, need 16, maclen 0, authlen 16, aadlen 4

read_poll enc/full: buffer len = 36
0000: d7 47 be 72 2c e8 e7 e1 ae 38 fe a2 f4 e9 e0 04  .G.r,....8......
0016: 9c 00 fd 1d 41 f8 d9 6e 61 4b 90 4e a4 e6 2c 30  ....A..naK.N..,0
0032: 92 92 42 54                                      ..BT
debug1: input: padlen 5

debug1: input: len before de-compress 10

Packet type 94 is SSH2_MSG_CHANNEL_DATA (defined in ssh2.h) and stores an individual keystroke. We can also see the obfuscation starting, with an interval of ~20ms (default). The packet has an encrypted length of 36 bytes, which matches what we see on Wireshark when looking at the TCP payload length.
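
The 36 bytes can be reproduced from the values in the log. The short calculation below is only a sanity check on the numbers shown above; the function name is made up for this example, and the authlen of 16 is taken from the partial packet debug line.

```python
def wire_length(plain_len, padlen, authlen):
    """Encrypted size of one SSH packet, using the fields from the debug output.

    plain_len already contains the 4-byte length field and the 1-byte
    padding-length field that precede the payload.
    """
    return plain_len + padlen + authlen

# plain buffer len = 15, padlen 5, authlen 16
print(wire_length(15, 5, 16))  # 36
# The "send: len 20" line is the same sum without the authentication tag.
print(15 + 5)  # 20
```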

This is followed by the server-side echo, which we “read” on the client side:

read/plain[94]:

buffer len = 9
0000: 00 00 00 00 00 00 00 01 77                       ........w
debug1: received packet type 94

w

Next, have a look at some of the chaff that follows the first real keystroke. This is sent by the client:

debug1: packet_start[192]

plain:     buffer len = 15
0000: 00 00 00 00 00 c0 00 00 00 05 50 49 4e 47 21     ..........PING!
debug1: send: len 20 (includes padlen 5, aadlen 4)

encrypted: buffer len = 36
0000: d7 cf 6b 64 25 d6 40 89 68 eb 4d 6c a0 cb de e6  ..kd%.@.h.Ml....
0016: d0 b5 14 81 c4 57 6f c4 3a 82 eb 55 44 d2 b4 9d  .....Wo.:..UD...
0032: 3b 58 12 ac                                      ;X..
debug1: input: packet len 20

debug1: partial packet: block 8, need 16, maclen 0, authlen 16, aadlen 4

read_poll enc/full: buffer len = 36
0000: 88 40 00 52 0e 4b fc eb 89 f7 72 1f d6 a4 3f dd  .@.R.K....r...?.
0016: 0b dd 27 19 0e a8 84 f7 74 6f 43 e7 8c eb 16 9e  ..'.....toC.....
0032: 37 4e 89 95                                      7N..
debug1: input: padlen 5

debug1: input: len before de-compress 10

We can see it is an SSH2_MSG_PING, and most importantly, it is also 36 bytes, perfectly matching the real keystroke. Several of these PINGs are sent, and each is followed by the server’s PONG (SSH2_MSG_PONG), again of 36 bytes.

read/plain[193]:

buffer len = 9
0000: 00 00 00 05 50 49 4e 47 21                       ....PING!
debug1: received packet type 193

debug1: Received SSH2_MSG_PONG len 5
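
To make the two client-side plain buffers easier to read, the sketch below decodes them by hand. It assumes the layout visible in the dumps above (four length bytes and one padding-length byte, both still zero before sending, followed by the message type). The function name is hypothetical and only the two message types shown here are handled.

```python
import struct

SSH2_MSG_CHANNEL_DATA = 94
SSH2_MSG_PING = 192

def parse_plain(buf):
    """Decode a client-side 'plain' buffer as printed in the debug output."""
    msg_type = buf[5]  # bytes 0-3: length, byte 4: padlen, byte 5: type
    body = buf[6:]
    if msg_type == SSH2_MSG_CHANNEL_DATA:
        # uint32 recipient channel, then a length-prefixed data string
        channel, length = struct.unpack(">II", body[:8])
        return {"type": msg_type, "channel": channel, "data": body[8:8 + length]}
    if msg_type == SSH2_MSG_PING:
        # a single length-prefixed data string
        (length,) = struct.unpack(">I", body[:4])
        return {"type": msg_type, "data": body[4:4 + length]}
    raise ValueError(f"unhandled message type {msg_type}")

keystroke = bytes.fromhex("00 00 00 00 00 5e 00 00 00 00 00 00 00 01 77")
ping = bytes.fromhex("00 00 00 00 00 c0 00 00 00 05 50 49 4e 47 21")
print(parse_plain(keystroke))  # {'type': 94, 'channel': 0, 'data': b'w'}
print(parse_plain(ping))       # {'type': 192, 'data': b'PING!'}
```

Both buffers are 15 bytes long before padding, which is why the keystroke and the chaff end up with identical encrypted sizes.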

So far, everything seems to run as intended. However, when we reach the second real keystroke, namely h, things start behaving differently.

First, the keystroke packet is constructed, just as before:

debug1: packet_start[94]

plain:     buffer len = 15
0000: 00 00 00 00 00 5e 00 00 00 00 00 00 00 01 68     .....^........h
debug1: send: len 20 (includes padlen 5, aadlen 4)

encrypted: buffer len = 36
0000: c3 22 ea f0 f5 47 15 db 95 c9 64 ec e6 66 40 a2  ."...G....d..f@.
0016: d2 fc 71 e2 59 35 c3 a7 85 90 4c b9 7f 17 fd 65  ..q.Y5....L....e
0032: 97 54 c3 e6                                      .T..

Same length, but notably the partial packet and read_poll debug entries are missing, which is because the packet is not actually sent yet. What follows is the construction of a PING packet, before this keystroke is sent:

debug1: packet_start[192]

plain:     buffer len = 15
0000: 00 00 00 00 00 c0 00 00 00 05 50 49 4e 47 21     ..........PING!
debug1: send: len 20 (includes padlen 5, aadlen 4)

encrypted: buffer len = 72
0000: c3 22 ea f0 f5 47 15 db 95 c9 64 ec e6 66 40 a2  ."...G....d..f@.
0016: d2 fc 71 e2 59 35 c3 a7 85 90 4c b9 7f 17 fd 65  ..q.Y5....L....e
0032: 97 54 c3 e6 c6 59 df 64 eb c8 ba d4 f7 ed 5a 88  .T...Y.d......Z.
0048: 53 13 da 7e 7f 1d 63 9d dd 23 40 b4 b9 67 6e f3  S..~..c..#@..gn.
0064: 76 12 66 1b 89 5b 5a 21                          v.f..[Z!

debug1: input: packet len 20

debug1: partial packet: block 8, need 16, maclen 0, authlen 16, aadlen 4

read_poll enc/full: buffer len = 36
0000: 47 3e ca 40 05 b8 a8 5b 1d 1a 2b bd bd c6 d5 35  G>.@...[..+....5
0016: d1 dc 56 f2 28 8a c4 07 df cb 73 e1 fb cc 0a 9e  ..V.(.....s.....
0032: 20 73 c7 97                                       s..
debug1: input: padlen 5

debug1: input: len before de-compress 10

Here we see the previously-missing partial packet and read_poll debug entries, but we also see that because these two packets were essentially combined, the encrypted buffer length is now 72, instead of 36 bytes.
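
This can be checked directly against the dumps: the first 36 bytes of the 72-byte buffer are the already-encrypted keystroke from the previous step, with the encrypted PING appended behind it. The snippet below simply compares the hex copied from the output above.

```python
# Encrypted buffer of the "h" keystroke (36 bytes), copied from the log.
keystroke_enc = bytes.fromhex(
    "c3 22 ea f0 f5 47 15 db 95 c9 64 ec e6 66 40 a2"
    "d2 fc 71 e2 59 35 c3 a7 85 90 4c b9 7f 17 fd 65"
    "97 54 c3 e6"
)

# Encrypted buffer printed after the PING was constructed (72 bytes).
combined = bytes.fromhex(
    "c3 22 ea f0 f5 47 15 db 95 c9 64 ec e6 66 40 a2"
    "d2 fc 71 e2 59 35 c3 a7 85 90 4c b9 7f 17 fd 65"
    "97 54 c3 e6 c6 59 df 64 eb c8 ba d4 f7 ed 5a 88"
    "53 13 da 7e 7f 1d 63 9d dd 23 40 b4 b9 67 6e f3"
    "76 12 66 1b 89 5b 5a 21"
)

print(len(keystroke_enc), len(combined))    # 36 72
print(combined.startswith(keystroke_enc))   # True
print(len(combined) - len(keystroke_enc))   # 36, the size of one PING
```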

Finally, we get two server-side echoes, starting with the PONG, followed by the keystroke echo for h:

read/plain[193]:

buffer len = 9
0000: 00 00 00 05 50 49 4e 47 21                       ....PING!
debug1: received packet type 193

debug1: Received SSH2_MSG_PONG len 5

debug1: input: packet len 20

debug1: partial packet: block 8, need 16, maclen 0, authlen 16, aadlen 4

read_poll enc/full: buffer len = 36
0000: ec 9f ef a2 55 7e c3 4c f8 75 08 a9 8d 45 7e 14  ....U~.L.u...E~.
0016: 1f 55 b1 44 6e ea c7 f9 c9 ef ed ef 33 42 a7 29  .U.Dn.......3B.)
0032: 67 84 fa 94                                      g...
debug1: input: padlen 5

debug1: input: len before de-compress 10

read/plain[94]:

buffer len = 9
0000: 00 00 00 00 00 00 00 01 68                       ........h
debug1: received packet type 94

h

This is what the triplet spikes look like at the verbose debug level. It also explains the larger size and the duplicate echoes, as the real keystrokes are packaged up together with a PING packet, producing a single packet twice the size of a “normal” packet, and triggering two server-side responses.

SSHniff

In the spirit of following the good old “PoC or GTFO” mindset, I wrote an atrocious but functional “patch” into SSHniff, where if SSH versions after 9.4 are detected, it is assumed that obfuscation is in use and the bypass is employed. Note that it really is an atrocious bunch of code that I polluted my text editor with, but it ought to suffice in showing that the current keystroke obfuscation is completely transparent.

Here is an example of running SSHniff on an intercepted SSH session that used the obfuscation: I ran iptables -S, whoami, ls -al, and finally fat-fingered exi, followed by exit. You can verify this for yourself, as I included the PCAP here.

<SNIP>
┃╭─────────────────Client─────────────────╮      ╭─────────────────Server─────────────────╮
┃│           192.168.0.19:55932           │      │            192.168.0.16:22             │
┃│    e42184b06d45385a906f0803d04c83da    │----->│    aae6b9604f6f3356543709a376d7f657    │
┃│          SSH-2.0-OpenSSH_9.7           │      │          SSH-2.0-OpenSSH_9.7           │
┃╰────────────────────────────────────────╯      ╰────────────────────────────────────────╯
<SNIP>
┣━ tcp.seq ─ Latency μs ─ Type
[4450](       0) ─ Keystroke
[4774](  177182) ─ Keystroke
[5026](  119630) ─ Keystroke
[5170](   60477) ─ Keystroke
[5530](  182991) ─ Keystroke
[5638](   36727) ─ Keystroke
[5998](  175786) ─ Keystroke
[6142](   59886) ─ Keystroke
[6394](  119464) ─ Keystroke
[6646](  117633) ─ Keystroke
[7078](  219396) ─ Keystroke
┣╮ [10858]( 3478329) ─ Enter
┃╰─╼[236]
┣━
[10858](       0) ─ Keystroke
[11290](  238980) ─ Keystroke
[11470](   80064) ─ Keystroke
[11650](   79103) ─ Keystroke
[11902](  122768) ─ Keystroke
[12226](  158690) ─ Keystroke
┣╮ [15034]( 3324090) ─ Enter
┃╰─╼[204]
┣━
[15034](       0) ─ Keystroke
[15322](  162362) ─ Keystroke
[15502](   81398) ─ Keystroke
[15682](   83084) ─ Keystroke
[15862](   79398) ─ Keystroke
[16114](  123489) ─ Keystroke
┣╮ [18598]( 1363393) ─ Enter
┃╰─╼[3116]
┣━
[18598](       0) ─ Keystroke
[18922](  164250) ─ Keystroke
[19210](  144942) ─ Keystroke
┣╮ [22522]( 1822534) ─ Enter
┃╰─╼[256]
┣━
[22522](       0) ─ Keystroke
[22846](  162024) ─ Keystroke
[23134](  149977) ─ Keystroke
[23458](  158038) ─ Keystroke
┣╮ [27350](  204709) ─ Enter
┃╰─╼[272]
┣━
┣━━━━

As you can see, the keystrokes are extracted seamlessly and are ripe to be fed to the analysis tool.

Keystroke Latency Analysis

This is not part of the initial disclosure for the obfuscation bypass, but it should help the reader understand both the impact of the metadata leaked by the SSH protocol and the need for such preventative measures. It will also paint a more complete picture of the entire attack and discovery process.

To demonstrate how SSH metadata can be used to breach confidentiality, I will show a PoC of how to use SSHniff to extract keystrokes and infer the underlying command(s).

Wireshark captures can be fed to the tool, which then produces output like this:

Among other things, it shows any keystroke sequences typed out during the session, as well as their relative latency (in microseconds), TCP sequence numbers, and the inferred keystroke type. Using the packet sizes, we can distinguish certain keystrokes, like backspaces, Enter (Return), and horizontal arrow keys, which is yet another crucial point in traffic analysis. Here, the only thing typed in the session was exit, followed by Enter (Return).

The tool can also serialise the data, such that it can then be plotted and processed. Using a Jupyter notebook, I set up this proof of concept, using a small dataset accumulated for my thesis.

Rhythmic Commands

For a full look at the research, consult the paper (link TBD), and/or consult the notebook on the SSHniff repository. If you are only interested in the obfuscation bypass, scroll all the way down in the notebook until the “Patch Analysis” section.

In a nutshell, I show that commands can produce certain “profiles”, or rhythms, when typed, which are identifiable by their latency. The below plot is an example where I myself typed out sudo apt upgrade 18 times:

The dataset collected by external participants also showed this to be the case (although naturally, some commands were more identifiable than others):

Using algorithms like the Euclidean Distance or DTW, an intercepted (unlabelled) keystroke sequence could be compared to the commands in the dataset, therefore calculating the “similarity” between the sequences.
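
As a rough illustration of the comparison step, here is a minimal pure-Python DTW. It is a sketch rather than the code from the notebook: the function names are made up, and the “profiles” are single sequences copied from the SSHniff output earlier in this post, labelled on the assumption that the blocks appear in the order the commands were run. A real dataset would hold many samples per command.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute difference as cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # b[j-1] is matched again
                                    cost[i][j - 1],      # a[i-1] is matched again
                                    cost[i - 1][j - 1])  # both sequences advance
    return cost[len(a)][len(b)]

def closest_command(observed, profiles):
    """Return the label whose profile has the smallest DTW distance."""
    return min(profiles, key=lambda name: dtw_distance(observed, profiles[name]))

# Inter-keystroke latencies in microseconds (leading zero dropped).
profiles = {
    "whoami": [238980, 80064, 79103, 122768, 158690],
    "ls -al": [162362, 81398, 83084, 79398, 123489],
    "exit": [162024, 149977, 158038],
}
observed = [164250, 144942]  # the mistyped "exi"
print(closest_command(observed, profiles))  # exit
```

DTW is convenient here because the sequences being compared do not need to have the same length, unlike a plain Euclidean distance.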

This is what such a sequence, observed by SSHniff, might look like:

Some of the results are summarised in this table: