Cold case: protocol reverse engineering (part 2)

If you missed the previous post, you can find it here.

I was determined to find out which protocol they were mimicking. If you know the OSI model (often called ISO/OSI), you already know that each layer may (and in fact does) add a header (sometimes also a trailer) to every Protocol Data Unit (PDU), in a process called "encapsulation".

So I ignored the Ethernet "type" field and tried to match the protocol manually by looking at the raw PDU (printed in hexadecimal on a piece of paper). After some trial and error, I found that the PDU was a real, complete IP packet! I hadn't tried IP first because I thought they had built a custom protocol based on something simpler or older than IP.
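Matching by hand amounts to checking a few structural invariants of the IPv4 header. The following is a minimal sketch of that check (not the code I used at the time): version nibble, header length, total length, and the RFC 1071 header checksum. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Internet checksum (RFC 1071) over an IPv4 header: the one's-complement
 * sum of all 16-bit words, checksum field included, must be 0xFFFF. */
static bool ipv4_header_checksum_ok(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                      /* fold carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return sum == 0xFFFF;
}

/* Heuristic: does this Ethernet payload look like a complete IPv4 packet?
 * Checks version, header length, total length and header checksum. */
bool looks_like_ipv4(const uint8_t *p, size_t len)
{
    if (len < 20)
        return false;                      /* shorter than a minimal IPv4 header */
    uint8_t version = p[0] >> 4;
    size_t ihl = (size_t)(p[0] & 0x0F) * 4; /* header length in bytes */
    uint16_t total_len = (uint16_t)((p[2] << 8) | p[3]);
    if (version != 4 || ihl < 20 || ihl > len)
        return false;
    /* Ethernet may pad short frames, so len can exceed total_len */
    if (total_len < ihl || total_len > len)
        return false;
    return ipv4_header_checksum_ok(p, ihl);
}
```

A random byte sequence almost never passes all four checks at once, which is why a valid checksum is such a strong hint.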

After that discovery, I forced Wireshark to decode it as IP, and the TCP connection showed up as well. Counters, addresses and the other things I had been guessing at before (see the previous post) were actually IP and TCP fields (IP addresses, TCP ports, sequence numbers, etc.).
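Once the frame is known to be IPv4/TCP, those "mysterious" counters and addresses sit at fixed offsets. A minimal sketch of pulling them out (the struct and function names are hypothetical); the `payload` pointer marks where the proprietary data begins:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

struct tcp_view {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    const uint8_t *payload;   /* points into the original buffer */
    size_t payload_len;
};

static uint16_t rd16(const uint8_t *p) { return (uint16_t)((p[0] << 8) | p[1]); }
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Parse an IPv4 packet carrying TCP (all fields are big-endian).
 * Returns false if the packet is not TCP or is truncated. */
bool parse_ipv4_tcp(const uint8_t *ip, size_t len, struct tcp_view *out)
{
    if (len < 20 || (ip[0] >> 4) != 4 || ip[9] != 6)   /* protocol 6 = TCP */
        return false;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;
    size_t total = rd16(ip + 2);
    if (ihl < 20 || total > len || total < ihl + 20)
        return false;
    const uint8_t *tcp = ip + ihl;
    size_t doff = (size_t)(tcp[12] >> 4) * 4;           /* TCP header length */
    if (doff < 20 || ihl + doff > total)
        return false;
    out->src_ip = rd32(ip + 12);
    out->dst_ip = rd32(ip + 16);
    out->src_port = rd16(tcp);
    out->dst_port = rd16(tcp + 2);
    out->seq = rd32(tcp + 4);
    out->ack = rd32(tcp + 8);
    out->payload = tcp + doff;
    out->payload_len = total - ihl - doff;
    return true;
}
```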

I was happy because this was a big step forward. Now I could assume that the payload at the TCP level was the actual proprietary protocol. And so came the hardest part.

I made about 200 traces and went back to vbindiff to compare the TCP payloads. Note that, as in the previous case, each trace was made with a passive bridge tap while performing some action on the controller/PLC. Every trace was then saved with the raw capture and its "metadata" (the complete environment, e.g. which commands the controller sent, which actions were performed on the PLC, and which values were displayed or set).
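The core of each comparison is simple: line up two payloads captured for similar actions and list the byte offsets that changed. A minimal sketch of that step (hypothetical helper, not vbindiff's code):

```c
#include <stddef.h>
#include <stdint.h>

/* Record the offsets at which two payloads differ, storing at most max_out
 * of them in out[]. Bytes past the end of the shorter payload count as
 * differing. Returns the total number of differing offsets. */
size_t diff_offsets(const uint8_t *a, size_t len_a,
                    const uint8_t *b, size_t len_b,
                    size_t *out, size_t max_out)
{
    size_t longest = len_a > len_b ? len_a : len_b;
    size_t count = 0;
    for (size_t i = 0; i < longest; i++) {
        int differs = i >= len_a || i >= len_b || a[i] != b[i];
        if (!differs)
            continue;
        if (count < max_out)
            out[count] = i;
        count++;
    }
    return count;
}
```

Cross-referencing the differing offsets with the saved metadata (which value was set, which command was sent) is what turns raw differences into field boundaries.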

What I learned, after many comparisons (which took me nearly two weeks), is that:

A side note: the very first thing I actually noticed was the floating-point values. I still remembered the IEEE 754 logic and structure (I had studied it in high school, nearly 7 years earlier), and that helped, because floating-point values are stored in a distinctive format.
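As an illustration of why IEEE 754 values stand out, a "round" value such as 12.5 is stored as single precision `41 48 00 00`: a plausible exponent byte followed by mostly zero fraction bits. A minimal sketch for testing candidate bytes; the byte order of the protocol is an assumption, so both are supported:

```c
#include <stdint.h>
#include <string.h>
#include <stdbool.h>

/* Reassemble 4 payload bytes into a 32-bit word. The byte order used by
 * the protocol is an assumption to be tested both ways. */
uint32_t load_u32(const uint8_t *p, bool big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Interpret a 32-bit word as IEEE 754 binary32. memcpy avoids
 * strict-aliasing problems; assumes the host float is binary32. */
float as_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Split a binary32 word into sign (1 bit), exponent (8 bits, bias 127)
 * and fraction (23 bits). The unbiased exponent is only meaningful for
 * normal numbers; exponent fields 0 and 255 are special cases. */
void ieee754_fields(uint32_t bits, unsigned *sign, int *exponent, uint32_t *fraction)
{
    *sign = bits >> 31;
    *exponent = (int)((bits >> 23) & 0xFF) - 127;
    *fraction = bits & 0x7FFFFF;
}
```

For `41 48 00 00` read big-endian this gives sign 0, exponent 3 and fraction `0x480000`, i.e. 1.5625 × 2³ = 12.5.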

There was enough data to write some custom software for testing. But one question remained: how could I reproduce the "wrong" ethertype for IP? I suspected that the ethertype was being modified by the industrial Ethernet board (or its driver), so I built my "PLC" client for Windows and ran it on the controller. And I was right: every Ethernet frame belonging to an IP/TCP connection to the PLC had its ethertype rewritten.

After some adjustments and testing, my client worked as well as the original controller for setting and getting values (not yet the PLC-programming part). But still: how could I use a different industrial Ethernet card with Windows and make it change the ethertype?

With a kernel driver, of course.

The Windows Kernel Driver parabola

I must admit: that was a really impulsive decision. But still, it taught me a lot.

Unfortunately I don't remember much of it: I'm not a Windows fan. I downloaded the whole Windows Driver Kit (WDK) and started playing with some of the samples. Then I wrote a simple "network filter" that did the same thing as the board: it checked whether the connection was to the PLC's TCP port and, if so, changed the ethertype field.
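The filter's decision logic can be sketched on a plain frame buffer, leaving out all the NDIS plumbing a real Windows filter driver needs. This is a minimal sketch, not the original driver: the PLC port and the replacement ethertype below are hypothetical placeholders (0x88B5 is one of the IEEE "local experimental" ethertypes, used here only as a stand-in), and only untagged Ethernet II frames are handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical placeholders: the real port and ethertype are not given here. */
#define PLC_TCP_PORT      10000u   /* hypothetical PLC TCP port */
#define CUSTOM_ETHERTYPE  0x88B5u  /* stand-in "custom" ethertype */
#define ETHERTYPE_IPV4    0x0800u

/* Outgoing direction only: if an untagged Ethernet II frame carries
 * IPv4/TCP addressed to the PLC port, overwrite its ethertype in place.
 * Returns true if the frame was rewritten. */
bool rewrite_ethertype_for_plc(uint8_t *frame, size_t len)
{
    const size_t eth_hdr = 14;               /* dst MAC, src MAC, ethertype */
    if (len < eth_hdr + 20)
        return false;
    uint16_t type = (uint16_t)((frame[12] << 8) | frame[13]);
    if (type != ETHERTYPE_IPV4)
        return false;
    const uint8_t *ip = frame + eth_hdr;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;
    if ((ip[0] >> 4) != 4 || ip[9] != 6 || ihl < 20 || len < eth_hdr + ihl + 4)
        return false;                         /* not IPv4/TCP, or truncated */
    uint16_t dst_port = (uint16_t)((ip[ihl + 2] << 8) | ip[ihl + 3]);
    if (dst_port != PLC_TCP_PORT)
        return false;
    frame[12] = (uint8_t)(CUSTOM_ETHERTYPE >> 8);
    frame[13] = (uint8_t)(CUSTOM_ETHERTYPE & 0xFF);
    return true;
}
```

The ethertype is not covered by the IP or TCP checksums, so rewriting it in place leaves the rest of the packet valid.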

The "development environment" was a pain: two physical machines (why didn't I use VMs? I tried, but something in the Windows kernel wouldn't cooperate...) linked by both an Ethernet and a serial connection. There was an option to use Ethernet only, but for some reason it didn't work at all. The "host" was a plain Windows 7 PC with Visual Studio; the "guest" was a Windows 7 machine with kernel debugging enabled.

As you might expect, with a debugger attached to a kernel you can do pretty much anything: freeze the machine (and then resume execution), inject drivers, trigger events and, of course, trigger a Blue Screen of Death.

The driver worked perfectly, except for two bugs (which caused a few BSODs during testing) that I fixed right after the crashes. I checked with Wireshark: the packets were indistinguishable (that is, you could not tell whether a packet had been sent by the original controller or by my software).

Then, after testing my client on the "driver-equipped" machine, I accidentally launched my PLC client on the host machine too (which didn't have the driver). And it worked like a charm. WTF? Why?

It turned out that the custom ethertype (and therefore the Windows driver) was not really necessary: the PLC accepted the standard IP ethertype (0x0800) just fine.

See why I say that was an impulsive decision?

Conclusion

After that, I moved all the code into the OPC toolkit to build an OPC driver, so I was able to get and set values using OPC-compliant software, such as Kepware's.

It took me nearly three months to reverse engineer this protocol. And I still wonder how the hell I pulled it off.

Appendix - tools

To summarize the tools and how I used them:

A lot of what helped me did not come from studying networking: programming skills (I'm a C developer, and I've written some small programs in assembly), operating system internals, and RFC commentary on technical tricks in standards (see, for example, the RFCs about IEEE 754 and TCP Tahoe/Reno).