SSHniff: An SSH Metadata Analyser
As part of my final-year bachelor research project, I took a stab at determining how feasible a metadata-based attack on the SSH protocol is nowadays. This type of attack was first introduced in 2001, in a paper called “Timing Analysis of Keystrokes and Timing Attacks on SSH”. It is based on the fact that interactive SSH sessions leak significant metadata: keystrokes, while encrypted, are sent to the server and echoed back one at a time.
This allows an observer to detect both the number of keystrokes sent to the server and their individual timings. Consequently, an attack based on these metrics is theoretically feasible, potentially compromising the confidentiality of the session. Advancements in fields like Keystroke Dynamics (KD), where typing rhythms are used to authenticate or fingerprint users, support this notion.
In pursuit of this research, I developed SSHniff, an SSH metadata analysis tool based on Ben Reardon’s Packet Strider. It is written in Rust and comes with a suite of functions to aid in the analysis of interactive SSH sessions. The goal is to raise awareness about this attack vector and ensure that the necessary mitigations are implemented. Additionally, the tool can aid in developing and verifying these mitigations, as touched upon in this blog post.
How it works
The tool makes use of certain tell-tale packets in a given SSH connection sequence to calculate properties of the session, such as the size of keystroke packets and the method and time of authentication. The primary source of this information is the initial Key Exchange (KEX) procedure, which sets the stage for the attack. The detailed methodology behind this algorithm is explained in the paper accompanying this tool, which will be released upon approval from my university.
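As a rough illustration of why keystroke packets are so recognisable, the sketch below estimates the on-the-wire size of a single-keystroke SSH_MSG_CHANNEL_DATA packet from the binary packet layout in RFC 4253. It assumes chacha20-poly1305@openssh.com (8-byte alignment, 16-byte Poly1305 tag) and that, as in OpenSSH's AEAD handling, the 4-byte length field is excluded from padding alignment. The function names are hypothetical and not part of SSHniff, whose actual approach is described in the paper.

```python
def channel_data_payload_len(data_len: int) -> int:
    """Payload size of an SSH_MSG_CHANNEL_DATA message carrying data_len bytes."""
    # message type (1) + recipient channel (uint32) + string length (uint32) + data
    return 1 + 4 + 4 + data_len


def ssh_packet_size(payload_len: int, block_size: int = 8, tag_len: int = 16,
                    length_field_aligned: bool = False) -> int:
    """Estimate the wire size of one SSH binary packet (RFC 4253, section 6).

    Defaults model chacha20-poly1305@openssh.com: 8-byte alignment, a 16-byte
    tag, and a 4-byte length field that is not counted towards alignment.
    """
    # Bytes padded to a multiple of block_size: padding_length byte + payload
    # (+ the length field for classic, non-AEAD and non-EtM modes).
    aligned = 1 + payload_len + (4 if length_field_aligned else 0)
    padding = block_size - (aligned % block_size)
    if padding < 4:  # RFC 4253 requires at least 4 bytes of random padding
        padding += block_size
    # length field + padding_length byte + payload + padding + authentication tag
    return 4 + 1 + payload_len + padding + tag_len


# One typed character, e.g. the "u" in "uptime"
print(ssh_packet_size(channel_data_payload_len(1)))  # 36
# A three-byte escape sequence, such as an arrow key ("\x1b[A")
print(ssh_packet_size(channel_data_payload_len(3)))  # 44
```

Under these assumptions a single keystroke comes out at 36 bytes, which is consistent with the 36-byte tcp.seq increments between consecutive keystrokes in the demo output below; a multi-byte key would, under the same assumptions, land in a different size bucket.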
The tool relies on a network capture to autonomously pinpoint one or more SSH sessions and extract their metadata. By default, it outputs data in a human-readable format, but it can also produce JSON output for further processing. The repository includes example captures and analysis notebooks demonstrating how intercepted data can be used to infer UNIX commands run during the session. Currently, the tool is tailored to OpenSSH and has not been tested on other implementations.
SSHniff Demo
So, what does running this tool actually look like, you might ask? Currently, to use the analyser, you need a PCAP or PCAPNG network capture of an SSH session. In future iterations, the tool will also include real-time monitoring capabilities.
In this demo, I will let SSHniff analyse a session where I authenticated using a password and ran the commands uptime, ls -al, and exit, in that order. The session was observed using Wireshark and saved to a pcap file.
┏━━━━ Results
┃ Stream 0
┃ Duration (UTC): 2024-06-18 11:28:40 - 2024-06-18 11:29:01
┃ KEX curve25519-sha256
┃ Encryption chacha20-poly1305@openssh.com
┃ MAC umac-64-etm@openssh.com
┃ Compression none
┃╭─────────────────Client─────────────────╮      ╭─────────────────Server─────────────────╮
┃│          192.168.0.135:52925           │      │            192.168.0.45:22             │
┃│    779664e66160bf75999f091fce5edb5a    │----->│    ae8bd7dd09970555aa4c6ed22adbbf56    │
┃│    SSH-2.0-OpenSSH_for_Windows_8.6     │      │SSH-2.0-OpenSSH_8.4p1 Raspbian-5+deb11u3│
┃╰────────────────────────────────────────╯      ╰────────────────────────────────────────╯
┃
┣━ Timeline of Events
┣ [1123] Server hostkey accepted
┣ [1594] New Keys (21)
┣ [1610] Keystroke Size Indicator
┣ [1651] First login prompt
┣ [1714] OfferRSAKey
┣ [1703] RejectedKey
┣ [1755] CorrectPassword
┃
┣━ Keystroke Sequences
┣━ tcp.seq ─ Latency μs ─ Type
┣ [2602] ─ ( 0) ─ Keystroke
┣ [2638] ─ ( 29759) ─ Keystroke
┣ [2674] ─ ( 315379) ─ Keystroke
┣ [2710] ─ ( 314804) ─ Keystroke
┣ [2746] ─ ( 330272) ─ Keystroke
┣ [2782] ─ ( 74439) ─ Keystroke
┣╮ [2854] ─ ( 3210230) ─ Enter
┃╰─╼[304]
┣━
┣ [2854] ─ ( 0) ─ Keystroke
┣ [2890] ─ ( 74920) ─ Keystroke
┣ [2926] ─ ( 240266) ─ Keystroke
┣ [2962] ─ ( 854781) ─ Keystroke
┣ [2998] ─ ( 495017) ─ Keystroke
┣ [3034] ─ ( 75195) ─ Keystroke
┣╮ [3106] ─ ( 3419802) ─ Enter
┃╰─╼[2364]
┣━
┣ [3106] ─ ( 0) ─ Keystroke
┣ [3142] ─ ( 75640) ─ Keystroke
┣ [3178] ─ ( 119807) ─ Keystroke
┣ [3214] ─ ( 149607) ─ Keystroke
┣╮ [3286] ─ ( 740109) ─ Enter
┃╰─╼[200]
┣━
┃
┣━━━━
As you can see, the output includes several data points about the session. Generally useful information includes the duration of the session; the key exchange, encryption, MAC, and compression algorithms used; and the client and server HASSH fingerprints. The Timeline of Events section details the packets leading up to the active session, including key offers and password attempts.
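For reference, a HASSH fingerprint is the MD5 hash of the algorithm name-lists a peer advertises in its SSH_MSG_KEXINIT, joined with semicolons in the order key exchange; encryption; MAC; compression (client-to-server lists for the client's hassh, server-to-client lists for hasshServer). A minimal sketch follows; the algorithm lists are made-up placeholders, not the ones from this capture, so the result will not match the fingerprints shown above.

```python
import hashlib


def hassh(kex: list[str], encryption: list[str], mac: list[str],
          compression: list[str]) -> str:
    """Compute a HASSH-style fingerprint from KEXINIT name-lists.

    For the client fingerprint pass the client-to-server lists;
    for hasshServer pass the server-to-client lists.
    """
    # Each name-list stays comma-separated as on the wire; fields are joined by ';'
    fields = [",".join(names) for names in (kex, encryption, mac, compression)]
    return hashlib.md5(";".join(fields).encode("ascii")).hexdigest()


# Placeholder algorithm lists for illustration only (not from the demo capture)
fp = hassh(
    kex=["curve25519-sha256", "diffie-hellman-group14-sha256"],
    encryption=["chacha20-poly1305@openssh.com", "aes256-gcm@openssh.com"],
    mac=["umac-64-etm@openssh.com", "hmac-sha2-256"],
    compression=["none"],
)
print(fp)  # a 32-character hex digest
```

Because the fingerprint depends only on which algorithms are offered and in what order, it identifies client and server software builds rather than individual users.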
Next comes the really interesting part of this research: the observed keystroke sequences. As you can see, the three commands are clearly identifiable: they are recorded, timestamped, and ripe for analysis. Furthermore, the Enter keystroke can be clearly identified, delimiting each command sequence. It also allows the tool to measure the response size of each command. The first, uptime, produces little output and therefore has a relatively small response size of 304 bytes. Depending on the directory, ls -al can produce a much larger output, as is the case here (2364 bytes). The usefulness of the response sizes has yet to be investigated, as my focus was on the keystroke timings and the latencies produced by different commands. However, I suspect there is great potential in combining the two for a context-enhanced version of the attack.
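To make the Enter-delimiting concrete, here is a simplified sketch of how already-classified packets could be grouped into command sequences with per-keystroke latencies and a response size. It assumes the hard part (classifying each client packet as Keystroke, Delete or Enter, and summing the server bytes after each Enter) has already been done; the Event and Sequence types and group_sequences are hypothetical and not SSHniff's actual Rust implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    tcp_seq: int
    t_us: int                # capture timestamp in microseconds
    kind: str                # "Keystroke", "Delete" or "Enter" (assumed pre-classified)
    response_bytes: int = 0  # server bytes following an Enter (assumed pre-computed)


@dataclass
class Sequence:
    latencies_us: list[int] = field(default_factory=list)
    kinds: list[str] = field(default_factory=list)
    response_bytes: int = 0


def group_sequences(events: list[Event]) -> list[Sequence]:
    """Split a keystroke stream into command sequences delimited by Enter."""
    sequences: list[Sequence] = []
    current, prev_t = Sequence(), None
    for ev in events:
        # The first packet of a sequence gets latency 0, as in the output above
        current.latencies_us.append(0 if prev_t is None else ev.t_us - prev_t)
        current.kinds.append(ev.kind)
        prev_t = ev.t_us
        if ev.kind == "Enter":
            current.response_bytes = ev.response_bytes
            sequences.append(current)
            current, prev_t = Sequence(), None
    if current.kinds:  # trailing keystrokes that were never submitted
        sequences.append(current)
    return sequences


# Rebuild the "uptime" sequence from the demo output by accumulating its latencies
tcp_seqs = [2602, 2638, 2674, 2710, 2746, 2782, 2854]
lat = [0, 29759, 315379, 314804, 330272, 74439, 3210230]
events, t = [], 0
for i, (seq, dt) in enumerate(zip(tcp_seqs, lat)):
    t += dt
    kind = "Enter" if i == len(lat) - 1 else "Keystroke"
    events.append(Event(seq, t, kind, 304 if kind == "Enter" else 0))

seqs = group_sequences(events)
print(len(seqs), seqs[0].latencies_us == lat, seqs[0].response_bytes)  # 1 True 304
```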
Another useful characteristic of the current SSH implementation is that certain keys produce distinctly identifiable signatures at the packet level. For instance, the Delete key (Backspace) can be clearly identified, which is very useful when trying to determine what command was run, since it reveals which keystrokes were likely typos or accidental. SSHniff points those out, too:
<...SNIP...>
┣━ Keystroke Sequences
┣━ tcp.seq ─ Latency μs ─ Type
┣ [2602] ─ ( 0) ─ Keystroke
┣ [2638] ─ ( 104849) ─ Keystroke
┣ [2674] ─ ( 89948) ─ Keystroke
┣ [2710] ─ ( 105028) ─ Keystroke
┣ [2746] ─ ( 165089) ─ Keystroke
┣ [2782] ─ ( 120250) ─ Keystroke
┣ [2818] ─ ( 779709) ─ Delete
┣ [2854] ─ ( 165404) ─ Delete
┣ [2890] ─ ( 179973) ─ Delete
┣ [2926] ─ ( 449512) ─ Keystroke
┣ [2962] ─ ( 390009) ─ Keystroke
┣ [2998] ─ ( 1275194) ─ Delete
┣ [3034] ─ ( 134892) ─ Keystroke
┣ [3070] ─ ( 150137) ─ Keystroke
┣╮ [3142] ─ ( 2009890) ─ Enter
┃╰─╼[240]
┣━
┣ [3142] ─ ( 0) ─ Keystroke
┣ [3178] ─ ( 165742) ─ Keystroke
┣ [3214] ─ ( 104241) ─ Keystroke
┣ [3250] ─ ( 119883) ─ Keystroke
┣╮ [3322] ─ ( 163810) ─ Enter
┃╰─╼[264]
┣━
In this example, I fat-fingered the whoami command (twice) before leaving the session with exit. Nevertheless, the backspaces I typed were detected, so they can be accounted for when analysing the session and trying to determine what I typed. The output always includes each packet’s tcp.seq number, so one can always fall back to the network capture for manual debugging and analysis.
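Accounting for those deletions can be as simple as replaying the sequence against a line buffer: each Keystroke adds a character and each Delete removes the previous one. The helper below is a hypothetical sketch, not code from the SSHniff notebooks. Applied to the mistyped whoami sequence above, it leaves six surviving keystrokes, matching the length of "whoami". It deliberately ignores cursor movement, which arrow keys would complicate.

```python
def surviving_keystrokes(kinds: list[str]) -> list[int]:
    """Return the indices of keystrokes that survive backspacing.

    Models the shell's line buffer as a stack: Keystroke pushes, Delete pops.
    Cursor movement (arrow keys) is deliberately not modelled.
    """
    buffer: list[int] = []
    for i, kind in enumerate(kinds):
        if kind == "Keystroke":
            buffer.append(i)
        elif kind == "Delete" and buffer:
            buffer.pop()
        elif kind == "Enter":
            break
    return buffer


# Event types of the mistyped "whoami" sequence from the output above
whoami = (["Keystroke"] * 6 + ["Delete"] * 3 + ["Keystroke"] * 2
          + ["Delete"] + ["Keystroke"] * 2 + ["Enter"])
kept = surviving_keystrokes(whoami)
print(len(kept))  # 6, the length of "whoami"
```

Keeping the indices rather than just a count means the latencies of the surviving keystrokes can be extracted afterwards for comparison against clean profiles.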
Other identifiable keystrokes include:
- Vertical/Horizontal arrow keys for navigating commands
- Tab-completion (WIP)
- Ctrl-keys (WIP)
Analysing Latencies
“Okay, so a really motivated attacker can see how much I typed and when I typed it… So what?”
Well, that was the second part of my research: finding out how feasible this attack is. To this end, I put together a small dataset from colleagues and friends alike, who accessed a VPS and typed common bash commands while their keystrokes were recorded. I must stress that this dataset ended up being considerably smaller than I had hoped, but it nevertheless produced valuable results. I also created a second dataset tailored to my own typing, to see whether a targeted attack would be more effective.
The 2001 version of this attack used a Hidden Markov Model (HMM) to obtain information from the intercepted keystroke latencies. This approach proved effective for narrowing down randomly chosen passwords, leading to the conclusion that an attacker “could potentially extract 1.2 bits of information per character pair by using the latency information”. Specifically, the original researchers gathered latency data for 142 keystroke pairs, which formed the states of the HMM used to identify keystroke combinations that might occur in a password.
This research, by contrast, focuses on a fixed set of commands, which changes the context significantly. The aim was to uncover more general information about an SSH session, not just credentials. Since common UNIX commands were expected to make up the majority of the intercepted keystrokes, the range of potential keystroke combinations is smaller than in the original study. As such, a different approach was taken to leverage this relatively limited scope of potential commands.
The list of commands provided to the participants of the study included text like:
sudo apt-get install
iptables -L
pacman -Syu
netstat -tlpn
cat /etc/hosts
The aim was to cast a wide net of commands, flags, and keywords, which could then be analysed and used to profile observed command sequences. I visualised the latencies produced by different commands, which showed that there is a sort of “rhythm” for each command. Naturally, some rhythms were more pronounced than others, which is likely correlated with how familiar the participant was with the underlying command, but overall, the graphs consistently seemed to follow a certain trend for each command:
As one would expect, these rhythms were significantly more pronounced in the personalised dataset, as in the graph depicted below, which shows 18 captures of the sudo apt upgrade command.
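One straightforward way to turn several captures of the same command into a reference rhythm is to z-normalise each latency series, so that overall typing speed matters less than relative timing, and then average them position by position. This is an assumption about how a profile could be built, not necessarily what the notebooks do, and it only works for captures of equal length (i.e. without typos); the function names and latencies are hypothetical.

```python
from statistics import mean, pstdev


def z_normalise(series: list[float]) -> list[float]:
    """Rescale a latency series to zero mean and unit (population) variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


def rhythm_profile(captures: list[list[float]]) -> list[float]:
    """Average equal-length, z-normalised captures position by position."""
    if len({len(c) for c in captures}) != 1:
        raise ValueError("captures must have equal length (e.g. typo-free)")
    normalised = [z_normalise(c) for c in captures]
    return [mean(column) for column in zip(*normalised)]


# A fast and a slow capture with the same relative rhythm (hypothetical, in ms)
profile = rhythm_profile([[100, 200, 300], [200, 400, 600]])
print([round(x, 3) for x in profile])  # [-1.225, 0.0, 1.225]
```

Both captures collapse onto the same normalised shape, which is the property that makes a per-command "rhythm" comparable across sessions typed at different speeds.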
Comparing (Algo)Rhythms
Matching an unlabelled keystroke sequence, such as the one depicted below, to a profile within the dataset is a matter of comparing time-series data.
There are various algorithms and methods to perform this comparison, each with its own merits and drawbacks. The Euclidean distance is fast but rigid: by design, it compares the two series point by point, so it handles misaligned series poorly and is undefined for series of different lengths. A more flexible option is Dynamic Time Warping (DTW), which stretches and compresses regions of one series to fit another. As such, it excels at spotting certain commands and keywords, even if they are shifted or stretched in time relative to the dataset. Finally, there are also interval-based approaches that split a given series into smaller intervals, which are used to train individual models that vote on an overall consensus of similarity. One such algorithm is the Time Series Forest (TSF), an ensemble of decision trees trained on summary statistics (mean, standard deviation, and slope) of the intervals.
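The practical difference between the first two measures fits in a few lines. Below is a textbook O(n·m) DTW with absolute-difference cost alongside a plain Euclidean distance; neither is claimed to be the notebooks' find_best_match implementation, and real use would normally add normalisation and a warping window. The latency values are hypothetical.

```python
import math


def euclidean(a: list[float], b: list[float]) -> float:
    """Point-by-point distance; only defined for equal-length series."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(a: list[float], b: list[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: diagonal match, step in a, step in b
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


template = [0, 100, 300, 100]       # hypothetical reference rhythm (ms)
observed = [0, 100, 100, 300, 100]  # same shape, one segment stretched
print(dtw(template, observed))      # 0.0
try:
    euclidean(template, observed)
except ValueError as err:
    print("Euclidean:", err)        # Euclidean: length mismatch: 4 != 5
```

DTW absorbs the extra element at zero cost, while the Euclidean distance refuses the comparison outright, which mirrors the "Failed at len" lines in the notebook output below.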
I implemented the analysis part in the notebooks found in the analysis folder of the SSHniff repository. The SSHniff output can be loaded into the notebooks directly and used to analyse observed command sequences. An example snippet using the participant dataset looks like this:
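# `ssh` (observed sequences from SSHniff output), `cmds` (labelled dataset) and
# both matching helpers are assumed to be defined earlier in the notebook.
for sequence in ssh:
    print("=" * 64)
    print("DTW: ", find_best_match(sequence, cmds)[0])
    try:
        print("Euclidean: ", find_best_match_euclidean(sequence, cmds)[0])
    except Exception:
        # Euclidean distance needs equal-length series, so mismatches fail here
        print(f"Failed at len {len(sequence)}")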
# Real command: sudo systemctl status sshd
================================================================
DTW: ('systemctl status ', 0.5927303838841784)
Failed at len 26
# Real command: systemctl stop dnsmasq
================================================================
DTW: ('cat /etc/hosts', 0.6240965905382237)
Failed at len 22
# Real command: cat /etc/resolv.conf
================================================================
DTW: ('cat /etc/hosts', 0.5082970508023236)
Euclidean: ('cat /etc/resolv.conf', 0.6612118700951216)
# Real command: uptime
================================================================
DTW: ('uptime', 0.2845382440682646)
Euclidean: ('uptime', 0.2845382440682646)
# Real command: cat /etc/hosts
================================================================
DTW: ('cat /etc/hosts', 0.46613383232370575)
Euclidean: ('cat /etc/hosts', 0.5126030891431617)
# Real command: systemctl start firejail
================================================================
DTW: ('systemctl start ', 0.7892191949017342)
Failed at len 47
# Real command: touch test
================================================================
DTW: ('touch ', 0.4542034657833189)
Euclidean: ('rm -rf /tmp/*', 0.6451812010810625)
# Real command: exit
================================================================
DTW: ('exit', 0.19418610318151172)
Euclidean: ('exit', 0.19418610318151172)
This shows both the DTW and the Euclidean approach, the latter failing to handle sequences of different lengths. The number attached to each output is the distance to the inferred command: the lower it is, the closer the match, so it can be read as an inverse measure of how confident the match is.
I’ll leave the precise nuances of each algorithm to the paper, but the overall results are shown in the table below.
Conclusion
For a long time, developers and users alike felt that this attack was not realistic or impactful enough to be bothered with. However, in October of last year, Damien Miller introduced a patch that adds keystroke obfuscation to the OpenSSH client (>=9.5), hiding real packets in a swarm of fake “chaff”. In light of my research, I consider this a great step in the right direction, as it would completely throw off SSHniff, or any implementation based on the described methodology for that matter.
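To see why this kind of obfuscation defeats timing analysis, consider a deliberately simplified model: keystrokes are only released on a fixed 20 ms tick (the default interval documented for OpenSSH's ObscureKeystrokeTiming option), and every tick with nothing queued carries a same-sized fake packet until some time after typing stops. This is not OpenSSH's actual code; in particular, the fixed 200 ms chaff tail is an arbitrary assumption (OpenSSH randomises it), and the obfuscate function is hypothetical.

```python
import math


def obfuscate(keystroke_times_ms: list[float], interval_ms: int = 20,
              chaff_tail_ms: int = 200) -> list[tuple[float, str]]:
    """Simplified model of fixed-interval keystroke sending with chaff.

    Real keystrokes are held until the next tick; ticks with nothing queued
    carry a fake ("chaff") packet until chaff_tail_ms after the last keystroke.
    Returns (send_time_ms, kind) pairs with kind "real" or "chaff".
    """
    pending = sorted(keystroke_times_ms)
    if not pending:
        return []
    tick = math.ceil(pending[0] / interval_ms) * interval_ms  # first send slot
    stop = pending[-1] + chaff_tail_ms
    schedule: list[tuple[float, str]] = []
    i = 0
    while i < len(pending) or tick <= stop:
        if i < len(pending) and pending[i] <= tick:
            schedule.append((tick, "real"))   # release one queued keystroke
            i += 1
        else:
            schedule.append((tick, "chaff"))  # idle slot: send a fake packet
        tick += interval_ms
    return schedule


typed = [0, 30, 345, 660]  # hypothetical keystroke times in milliseconds
schedule = obfuscate(typed)
gaps = {b[0] - a[0] for a, b in zip(schedule, schedule[1:])}
print(gaps)  # {20}: every packet follows the previous one by exactly 20 ms
print(sum(kind == "real" for _, kind in schedule), "real of", len(schedule))  # 4 real of 44
```

In this model, an observer who can only see packet sizes and times sees a uniform cadence, so both the keystroke count and the inter-keystroke latencies that SSHniff relies on are hidden, provided chaff and real packets really are indistinguishable.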
Unfortunately, however, during the last stages of my research I discovered that the obfuscation is not implemented properly and can be fully bypassed, which I outline in a separate follow-up blog post. The silver lining is that with a tool like SSHniff, verifying that a patch is implemented properly is much easier than sifting through Wireshark packets and counting sizes.
As stated previously, the dataset at my disposal was quite small, which is also why I created a second, personalised one to showcase the impact of a targeted attack. However, even with the smaller participant dataset, certain commands were consistently identifiable with various algorithms. This was merely a PoC, highlighting that SSH leaks more metadata than is comfortable. It is not infeasible for actors with substantially more resources to leverage this on a much more impactful scale; network captures are essentially immutable, so once a session is observed and stored, it could eventually fall prey to more sophisticated attacks.
In the conclusion of my thesis, I also highlighted that future work could incorporate additional metadata and context into the analysis. For instance, the sizes of the responses returned by the server for each command sequence could narrow down the set of commands to consider for each keystroke sequence. Additionally, one can infer certain parameters, like the CWD’s path length (or the current username), from the length of the CLI prompt packet returned by the server after each command. There are many unexplored ways to work with this metadata, but the important thing is that most of them would be muzzled by the patch introduced in 9.5, which is great.
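To illustrate both the prompt-length idea and its limits, the sketch below works out which prompt lengths are consistent with an observed prompt packet size. It assumes chacha20-poly1305 framing (8-byte alignment excluding the 4-byte length field, 16-byte tag), a prompt sent alone in one SSH_MSG_CHANNEL_DATA packet without colour or terminal-title escape codes, and a username and hostname the observer already knows. The prompt and all function names are hypothetical.

```python
def channel_data_packet_size(data_len: int, block_size: int = 8, tag_len: int = 16) -> int:
    """Wire size of an SSH_MSG_CHANNEL_DATA packet, assuming chacha20-poly1305."""
    payload = 1 + 4 + 4 + data_len  # msg type, channel, string length, data
    padding = block_size - ((1 + payload) % block_size)  # length field excluded
    if padding < 4:  # RFC 4253 minimum padding
        padding += block_size
    return 4 + 1 + payload + padding + tag_len


def candidate_lengths(observed_size: int, max_len: int = 4096) -> list[int]:
    """All data lengths that would produce a packet of exactly observed_size bytes."""
    return [n for n in range(max_len + 1) if channel_data_packet_size(n) == observed_size]


prompt = "user@host:~/projects$ "             # hypothetical prompt, no escape codes
size = channel_data_packet_size(len(prompt))  # what a passive observer would see
fixed = len("user@host:") + len("$ ")         # parts assumed known to the observer
lengths = candidate_lengths(size)
print(size, lengths[0] - fixed, lengths[-1] - fixed)  # 60 7 14
```

Because of the block padding, one packet size maps to a window of eight possible lengths, so under these assumptions the path length (10 characters for "~/projects") is narrowed to a range rather than revealed exactly; combining it with other metadata would be needed to tighten it further.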