Cold case: protocol reverse engineering (part 1)

October 14, 2017

Some years ago I was asked to reverse engineer an old protocol used in automation systems. The goal was to be able to communicate with equipment already in place in some industrial buildings nearby. Let me explain.

Note: at the end of the story I'll list some of the commands and techniques that I used to reverse engineer this protocol. If you're not interested in this story, ~~feel free to subscribe and wait for the last part~~ you can find the second post here.

Some background

In the industrial sector, automation is everything. Both long and short processes are automated nowadays: there are many mechanical devices controlled by one or more PLCs (Programmable Logic Controllers). Today a PLC is a sort of dedicated computer. Each PLC is connected to an automation network via fieldbus protocols, such as PROFINET (over Industrial Ethernet).

Usually PLCs contain some logic (for example, open valve X if temperature Y is rising), but of course your production process may require many variables and on-the-fly changes, so they're constantly connected to one or more servers, called supervisors. Such a system is called a DCS (Distributed Control System) or SCADA (Supervisory Control And Data Acquisition): many PLCs with logic, some control stations (for local operators) and some supervisors.

With the same system you can acquire, store and manipulate data about the production process: for example, quantities of chemical compounds, temperatures and timings.

Some of these systems (mostly the older ones, the DCSs) use proprietary protocols between PLCs and supervisors, and offer no APIs. SCADA systems often use standard protocols (as an alternative to proprietary ones) and/or expose APIs through OPC drivers.

Time goes by

Time goes by, for everything. In this case, I had my hands on a 15-year-old system, with a spare PLC and a spare supervisor (as the real system was still in use, and producing). The PLC was equipped with some digital and analog I/O boards. It was working perfectly: the PLC logic was ok, the supervisor was supervis-ing, etc.

So, why was I there? Because the supervisor had one big issue: it ran on Windows 2000. And there was no way to make the software work on newer systems, nor to access or interface with it through some driver. Of course, there was an easy way to solve this: dismantle everything and rebuild the production line with newer hardware. In short, replace fully functional hardware (and software) with a new product (which may have some issues, who knows).

You will agree with me that rebuilding the line would have been a big waste of time, money and hardware.

So my job was to find a way to interface the PLCs with a newer system.

Ground truth

I had few clues. All I had was:

An OPC devkit for writing a driver that speaks OPC (an open standard supported by SCADA systems)
The PLC and the supervisor in my hands

The first thing I noticed was that the supervisor and the PLC were connected by two (distinct) cables that looked like industrial Ethernet cables. Pin mapping and some tests revealed that I was right: the interfaces were Ethernet. That might be consistent with some sort of ancient PROFINET draft. At least, that's what I was hoping.

So the first step was a man-in-the-middle "attack": with a dual-Ethernet workstation, I set up a passive-tap-bridge and put the workstation in the middle of the supervisor-PLC connection. A passive-tap-bridge is an Ethernet bridge with some features:

No changes to packets (no firewall, etc.)
No Spanning Tree Protocol, loop protection or anything like that
Interfaces set to promiscuous mode
No IP address or anything like that
and Wireshark, of course
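
For illustration, here is a minimal sketch of how such a bridge could be brought up on a Linux workstation with iproute2. The operating system, the bridge name `br0` and the interface names `eth0`/`eth1` are assumptions, not details of the original setup. The first helper only builds the commands; actually running them requires root.

```python
import subprocess

def tap_bridge_commands(bridge, ports):
    """Build iproute2 commands for a transparent bridge: no STP, no IP addresses."""
    # Create the bridge with Spanning Tree Protocol disabled.
    cmds = [["ip", "link", "add", "name", bridge, "type", "bridge", "stp_state", "0"]]
    for port in ports:
        cmds += [
            ["ip", "addr", "flush", "dev", port],                # no IP address on the port
            ["ip", "link", "set", "dev", port, "promisc", "on"],  # promiscuous mode
            ["ip", "link", "set", "dev", port, "master", bridge], # attach port to the bridge
            ["ip", "link", "set", "dev", port, "up"],
        ]
    cmds.append(["ip", "link", "set", "dev", bridge, "up"])
    return cmds

def run_commands(cmds):
    """Execute the commands in order, stopping at the first failure (needs root)."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)

# Hypothetical usage:
#   run_commands(tap_bridge_commands("br0", ["eth0", "eth1"]))
```

With the bridge up, Wireshark can capture on `br0` (or on either port) while the two devices keep talking to each other.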

A side note: the PLC was connected with both cables (for redundancy), so I took the second cable offline to force the supervisor and the PLC to talk over the first link, where I was listening.

I wasn't that lucky. I expected the data stream to be a TCP or UDP connection of some sort. Instead, I got a bunch of Ethernet II frames carrying unknown data.
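
To make "Ethernet II frames with unknown data" concrete, here is a small sketch of how the 14-byte header of a captured frame can be split into destination, source and EtherType. The table of known EtherTypes is deliberately tiny, the function name is illustrative, and VLAN tags are ignored for simplicity.

```python
import struct

# A few registered EtherType values; everything else is reported as "unknown".
KNOWN_ETHERTYPES = {
    0x0800: "IPv4",
    0x0806: "ARP",
    0x86DD: "IPv6",
    0x8892: "PROFINET",
}

def parse_ethernet_header(frame: bytes) -> dict:
    """Split a raw frame (without preamble) into header fields and payload."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, type_or_len = struct.unpack("!6s6sH", frame[:14])
    if type_or_len >= 0x0600:
        # Ethernet II: the field is an EtherType naming the payload protocol.
        kind = KNOWN_ETHERTYPES.get(type_or_len, "unknown")
    else:
        # IEEE 802.3: the field is a payload length instead.
        kind = "802.3 length field"
    return {
        "dst": dst.hex(":"),
        "src": src.hex(":"),
        "ethertype": type_or_len,
        "kind": kind,
        "payload": frame[14:],
    }
```

A frame that carries IP would show up as `IPv4` here; a proprietary protocol sitting directly on Ethernet shows up as `unknown` with an opaque payload.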

I made some hypotheses:

The protocol was too old for some sort of encryption (so, no encryption)
I was looking at the raw data of a proprietary protocol
OR, there was some sort of weak proprietary encryption on the protocols above Ethernet

It's not uncommon to see C&C protocols that run directly on top of Ethernet. So I started to analyze some frames.

The analysis (part 1)

At first look, no clue. There were two strings of 3 chars (maybe some sort of "magic" for that protocol, or maybe some random coincidence) in many frames. Also, the layer 2 protocol type (EtherType) 0x8275 meant nothing to me. So I collected some data streams between the supervisor and the PLC (by querying the PLC from the supervisor's software), extracted the Ethernet payloads and started to analyze them with vbindiff, a diff-like tool that deals with binary files.
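
The same kind of comparison that vbindiff shows interactively can be sketched in a few lines. This is only an illustration of the idea, not the tool's implementation, and the function names are made up for the example.

```python
def diff_offsets(a: bytes, b: bytes) -> list:
    """Offsets at which two payloads differ, compared over their common length."""
    common = min(len(a), len(b))
    return [i for i in range(common) if a[i] != b[i]]

def format_diff(a: bytes, b: bytes) -> list:
    """Human-readable lines describing every differing byte."""
    lines = [f"offset 0x{i:02x}: {a[i]:02x} -> {b[i]:02x}" for i in diff_offsets(a, b)]
    if len(a) != len(b):
        lines.append(f"length differs: {len(a)} vs {len(b)}")
    return lines
```

Running this over every pair of payloads from the same kind of query quickly shows which offsets never change and which ones are worth a closer look.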

I noticed that some bytes were incremented linearly, like a counter. Other bytes were swapped when the direction was reversed (e.g. in the PLC response). That might indicate some sort of addressing (it makes sense). Then, the value that I was querying was probably at the end of the frame, because a change on the PLC side was reflected by a change (somehow) at the end of the Ethernet payload. But the change was not linear (it did not track the real change in any obvious way), so I had no idea how that value was calculated.
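
Both observations can be checked mechanically. The sketch below is a hypothetical helper, not the procedure used at the time: it labels each byte offset across a sequence of payloads sent in the same direction, and looks for pairs of offsets whose values are exchanged between a request and its response. It assumes a one-byte counter that increases by one per frame and wraps at 256.

```python
def classify_columns(frames: list) -> dict:
    """Label each offset as 'constant', 'counter' or 'variable' across frames."""
    length = min(len(f) for f in frames)
    labels = {}
    for i in range(length):
        column = [f[i] for f in frames]
        if len(set(column)) == 1:
            labels[i] = "constant"
        elif all((nxt - cur) % 256 == 1 for cur, nxt in zip(column, column[1:])):
            labels[i] = "counter"   # +1 per frame, modulo 256
        else:
            labels[i] = "variable"
    return labels

def swapped_pairs(request: bytes, response: bytes) -> list:
    """Offset pairs (i, j) whose byte values are exchanged in the response."""
    n = min(len(request), len(response))
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n)
        if request[i] != request[j]
        and request[i] == response[j]
        and request[j] == response[i]
    ]
```

Offsets labelled `counter` are candidates for sequence numbers, while swapped pairs are candidates for source and destination address fields.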

To recap, this analysis told me that:

The protocol was not encrypted: if it had been encrypted, I would not have seen the same bits, in the same place, in two different messages.
There was some sort of counter: usually this is used to avoid duplicate packets at the destination, and to provide flow control, retransmissions, etc. So at least one of these mechanisms was in place.
There was some sort of addressing scheme (with at least 1 byte)
The two cables were meant to be used as two separate networks, but they carried the same data (for redundancy)
Each command (e.g. set this value to X) required at least 1 reply. Usually 2.
The Ethernet FCS (Frame Check Sequence) was wrong. In each frame.
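
As an aside on the last point, this is a minimal sketch of how the FCS of a captured frame could be re-checked by hand. It assumes the capture still contains the four trailing FCS bytes (many network cards strip them before the frame reaches the capture tool). The Ethernet FCS is a CRC-32 that matches `zlib.crc32`, transmitted least significant byte first; the function names are illustrative.

```python
import struct
import zlib

def compute_fcs(frame_without_fcs: bytes) -> bytes:
    """CRC-32 over destination MAC through payload, in on-the-wire byte order."""
    return struct.pack("<I", zlib.crc32(frame_without_fcs) & 0xFFFFFFFF)

def fcs_is_valid(frame_with_fcs: bytes) -> bool:
    """Compare the last four bytes of a frame with the CRC of everything before them."""
    if len(frame_with_fcs) < 18:
        raise ValueError("too short to hold an Ethernet header plus FCS")
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return compute_fcs(body) == fcs
```

If this check fails for every frame while the devices happily accept the traffic, the trailing bytes are probably not a standard FCS at all.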

With 50-60 traces (with various commands and replies each), I started to look for correlations between these bytes in different frames.

One thing that I noticed after some hours of digging was that, when there were two replies, the first one was really small. Then, analyzing further, I discovered that there was another short (16-bit) value that was swapped between two frames of the same transmission. This last finding conflicted with the addressing scheme assumption made earlier, so I was really unhappy.

I tried some interesting things, such as byte conversions and endianness swapping. There was no clue until something caught my attention: swapping the cables changed a byte just before the "address byte" mentioned earlier. So it might indicate some sort of 2-byte addressing scheme or subnetting thing.
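
The "byte conversions and endianness swapping" step can be illustrated with a small helper that decodes the same raw field in both byte orders. This is a generic sketch with made-up names; which field widths and types the real protocol used is not something it assumes.

```python
import struct

def interpretations(raw: bytes) -> dict:
    """Decode a 2- or 4-byte field as big-endian and little-endian numbers."""
    if len(raw) == 2:
        formats = {"uint16": "H", "int16": "h"}
    elif len(raw) == 4:
        formats = {"uint32": "I", "int32": "i", "float32": "f"}
    else:
        raise ValueError("expected a 2- or 4-byte field")
    decoded = {}
    for name, code in formats.items():
        decoded[name + "_be"] = struct.unpack(">" + code, raw)[0]  # big-endian
        decoded[name + "_le"] = struct.unpack("<" + code, raw)[0]  # little-endian
    return decoded
```

Comparing these candidate values with what the supervisor displays for the same query is a quick way to rule encodings in or out.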

After a week of testing, I was about to surrender. There was something familiar about all these bits, but I didn't know what.

Then, I somehow had a breakthrough: what if this was a subset (or custom version) of a widely used protocol?

(go to part 2)