The legacy code behind WannaCry – the skeleton in the closet
While there’s been lots of discussion on the WannaCry ransomware (as well as its deeper implications) making the rounds, it’s definitely worthwhile to have a discussion about the core vulnerability which made this mess possible in the first place: a simple bug that should have been prevented by following (secure) coding best practices.
The WannaCry ransomware was built on the EternalBlue exploit, leaked by the Shadow Brokers hacker group in April 2017. EternalBlue targeted an implementation mistake in the handler for the ancient version 1 of the Server Message Block protocol (SMBv1) in the Windows kernel – enabled by default on any OS from XP to Windows Server 2016. What made the vulnerability so severe is that an attacker could trigger it by sending SMB protocol messages to an affected machine across the Internet – worse yet, as the SMB protocol handling is done in a kernel module, a successful exploit allowed the attacker to run their shellcode in ring 0 (i.e. with the highest privileges).
This is not the first time a critical vulnerability was found in the processing of the proprietary SMBv1 protocol (there are 33 unique SMBv1-specific vulnerabilities at the time of writing), and there is constant pressure on Microsoft to get rid of it completely or just disable it by default. Thankfully, there are more secure versions of the protocol available – SMBv2 has been available since the release of Vista in 2006, and SMBv3 since 2012. Microsoft itself asked users to disable SMBv1 support on their machines and added an audit tool that could identify SMBv1 usage, but could not remove support for it even in Windows 10 and Windows Server 2016, as it would break compatibility with a large number of machines still running Windows XP, not to mention many different types of consumer devices that can only communicate via SMBv1.
Microsoft released a patch to address the vulnerability in question for supported systems (Vista and up) on 14th March 2017, but this did not prevent the attack, both because security updates were applied with a delay and because legacy Windows versions that no longer received patches (e.g. XP) were still in use by many. Shortly after the large-scale attack, Microsoft decided to release a patch even for these no longer supported versions.
Unfortunately, the sheer inertia of legacy code is strong – and with that comes the potential for critical remotely exploitable kernel-level vulnerabilities.
How a simple integer problem can ruin your day (and heap)
Note: Most of the source code in this post is a reconstruction based on IDA decompilation of srv.sys (v6.1.7601.17608, Windows 7 32-bit) – unfortunately the decompilation process introduced some artifacts and made the code overall harder to understand. Whenever possible, we tried to highlight relevant operations by representing them in equivalent higher-level code.
This example goes to show that no matter how serious a vulnerability may be, at its core there is always a simple programming bug, frequently in legacy code implementing functionality that most developers no longer use. The core of the vulnerability is in functions responsible for converting FEA (Full Extended Attribute as defined in the SMBv1 standard) list blocks in SMBv1 messages (SMB_FEA_LIST; see 2.2.1.2.2.1 in the MS CIFS standard), specifically converting between OS/2 and NT variants of SMBv1. The use case itself (converting between OS/2 and NT formats) implies code written in the 1990s – code that could be doing quite dangerous things, since such a conversion implies a copying-and-rewriting of the message.
In this case, observe the following code snippet (as decompiled from SRV.SYS) which takes a pointer to the start of the SMB_FEA_LIST block as the a1 parameter:
int __stdcall SrvOs2FeaListSizeToNt(_DWORD *a1)
{
  _WORD *v1; // eax@1
  unsigned int v2; // edi@1
  unsigned int v3; // esi@1
  int v4; // ebx@3
  int v6; // [sp+Ch] [bp-4h]@1

  v1 = a1;
  v6 = 0;
  v2 = (unsigned int)a1 + *a1;
  v3 = (unsigned int)(a1 + 1);
  if ( (unsigned int)(a1 + 1) < v2 )
  {
    while ( v3 + 4 < v2 )
    {
      v4 = *(_WORD *)(v3 + 2) + *(_BYTE *)(v3 + 1);
      if ( v4 + v3 + 4 + 1 > v2 )
        break;
      if ( RtlSizeTAdd(v6, (v4 + 12) & 0xFFFFFFFC, &v6) < 0 )
        return 0;
      v3 += v4 + 5;
      if ( v3 >= v2 )
        return v6;
      v1 = a1;
    }
    *v1 = (_WORD)(v3 - v1);
  }
  return v6;
}
The key line is the assignment *v1 = (_WORD)(v3 - v1); near the end of the function; it is executed once we have looped through the entire list of FEA elements in the SMBv1 message. That line overwrites the length of the SMB_FEA_LIST block with the calculated length into the memory area pointed to by the a1 function parameter, practically performing the following operation:
(WORD) SMB_FEA_LIST->SizeOfListInBytes = (WORD) ((DWORD) pointer_to_end_of_list - (DWORD) pointer_to_start_of_list);
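This is one of the rare cases where it is actually simpler to look at the code in assembly: there the result is written back with a 16-bit store, so only the low word of the field is updated.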
The pointer subtraction should give us the size of the list – this also means we’re (correctly!) not trusting the length value in the message blindly. However, there is a critical problem here: the SizeOfListInBytes field itself is a 32-bit DWORD (ULONG as per the specification), but we are storing a 16-bit value in it. Thus, if the original value of SizeOfListInBytes is larger than 65535 (2^16 - 1), the upper 16 bits of that value will be retained during that operation. For instance, if we have an SMB_FEA_LIST block where SizeOfListInBytes (from the message itself) has the value of 65536 (2^16, i.e. 0x10000) but the actual length of the content is 65535 bytes (0xFFFF), SizeOfListInBytes will contain 131071 (0x1FFFF) after returning from this function. That is obviously not the correct value.
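The arithmetic above can be reproduced with a small model. This is only a sketch, not the driver's code: the helper names (read_le32, write_le32, store_size_as_word, size_after_word_store) are made up for illustration, and the field is modeled as four little-endian bytes so the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian field from four bytes. */
uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value as four little-endian bytes. */
void write_le32(uint8_t *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (uint8_t)(v & 0xFFu);
    p[1] = (uint8_t)((v >> 8) & 0xFFu);
    p[2] = (uint8_t)((v >> 16) & 0xFFu);
    p[3] = (uint8_t)((v >> 24) & 0xFFu);
}

/* Model of the flawed store: only the two low-order bytes are rewritten. */
void store_size_as_word(uint8_t *size_field, uint32_t computed_size)
{
    assert(size_field != NULL);
    size_field[0] = (uint8_t)(computed_size & 0xFFu);
    size_field[1] = (uint8_t)((computed_size >> 8) & 0xFFu);
    /* size_field[2] and size_field[3] keep the bytes sent in the message. */
}

/* Value of the size field after the 16-bit store, given the size claimed
 * in the message and the size measured by walking the list. */
uint32_t size_after_word_store(uint32_t claimed_size, uint32_t measured_size)
{
    uint8_t field[4];

    write_le32(field, claimed_size);
    store_size_as_word(field, measured_size);
    return read_le32(field);
}
```

Calling size_after_word_store(0x10000, 0xFFFF) models the example from the text: the high word of the claimed size survives the store and is combined with the low word of the measured size.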
In addition to overwriting the size value, this function also calculates the total length of the data in the FEA list in NT format; the parsing of the SMB_FEA structures is properly validated at all steps (including the extra byte for storing the trailing zero) and the calculation cannot overflow due to the use of the safe function RtlSizeTAdd. Thus, in addition to returning the total length of FEA data as parsed from the input (by overwriting the SizeOfListInBytes in the header), the function also returns the recalculated size of the same data in NT format.
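The size calculation described above can be sketched in portable C. This is an illustrative reconstruction rather than the original source: checked_size_add is a stand-in for the kernel's RtlSizeTAdd, nt_fea_list_size is a made-up name, and the record layout (one flag byte, one name-length byte, a two-byte little-endian value length, then name, trailing zero and value) is taken from the decompiled offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a checked addition: returns 0 on success, -1 on overflow. */
int checked_size_add(size_t a, size_t b, size_t *sum)
{
    assert(sum != NULL);
    if (a > SIZE_MAX - b) {
        return -1;
    }
    *sum = a + b;
    return 0;
}

/* Walks the OS/2-style records in fea[0..len) and returns the size the same
 * data needs in NT format, or 0 if the running total would overflow.
 * Parsing stops at the first record that does not fit into the buffer. */
size_t nt_fea_list_size(const uint8_t *fea, size_t len)
{
    size_t pos = 0;
    size_t total = 0;

    assert(fea != NULL || len == 0);
    while (len - pos > 4) {
        size_t name_len = fea[pos + 1];
        size_t value_len = (size_t)fea[pos + 2] | ((size_t)fea[pos + 3] << 8);
        size_t payload = name_len + value_len;

        /* name, trailing zero and value must fit after the 4-byte header */
        if (payload + 1 > len - pos - 4) {
            break;
        }
        /* 8-byte NT header + name + zero + value, rounded up to 4 bytes */
        if (checked_size_add(total, (payload + 12) & ~(size_t)3, &total) != 0) {
            return 0;
        }
        pos += payload + 5;
    }
    return total;
}
```

For a single record with a one-byte name and a two-byte value, the OS/2 form occupies 8 bytes while the NT form needs 12, which is why the function has to track two different sizes in the first place.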
The main problem at this point is that these two values can be out of sync if the SizeOfListInBytes value is sufficiently large in the input message – it will end up containing a much bigger number than the actual size of the SMB1 message.
From type mismatch error to heap corruption
Programming mistakes like the type mismatch shown above can appear anywhere in code – and since the consequences are only triggered under very specific circumstances that are unlikely to occur during normal usage, they can be hard to track down. What made things worse in this case was reliance on this incorrectly-calculated value for sensitive memory operations.
The vulnerable function shown above was called from SrvOs2FeaListToNt; as that function is quite long, we will only show the relevant parts here. First, we can see that the recalculated size for the NT format FEA list block is stored in variable v5, while the (now corrupted) size of the OS/2 format SMB_FEA_LIST is written into the first four bytes of the SMB_FEA_LIST block itself. Then we allocate enough space to hold the NT format FEA list block.
v5 = SrvOs2FeaListSizeToNt(a1);
// ...
v7 = (_DWORD *)SrvAllocateNonPagedPool(v5, 21);
Shortly thereafter we define the boundaries of the copy operation by calculating a pointer from the corrupted length value:
v8 = &a1[*(_DWORD *)a1 - 5];
It may look a little intimidating (the wonders of decompilation), but it can be rewritten to a much simpler form as such:
v8 = SMB_FEA_LIST + SMB_FEA_LIST->SizeOfListInBytes - 5
That line effectively means that we take SizeOfListInBytes (which at this point can contain a value that’s larger than the actual size of the list), subtract 5, and add the result to the pointer; the resulting “end” pointer can address a memory area beyond the allocated heap chunk.
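To get a feel for how far that “end” pointer can drift, here is a small worked calculation. The function name loop_bound_overshoot is hypothetical, and the sketch works with offsets from the start of the list rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Returns how many bytes the loop bound (list start + size field - 5) lies
 * beyond the bound that a correct size field would have produced. */
uint32_t loop_bound_overshoot(uint32_t size_field, uint32_t actual_size)
{
    assert(size_field >= 5 && actual_size >= 5);

    uint32_t bound_offset = size_field - 5;  /* offset of v8 from the list start */
    uint32_t true_bound = actual_size - 5;   /* offset v8 should have had */

    return bound_offset > true_bound ? bound_offset - true_bound : 0;
}
```

With the values from the earlier example (a size field of 0x1FFFF for a list that is really 0xFFFF bytes long), the bound sits 0x10000 bytes, i.e. 64 kB, past where it should be.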
Then the code iterates through the SMB_FEA_LIST block (we made some minor changes to the code to make the role of some variables clearer):
while ( !(*source_position & 0x7F) )
{
  v12 = dest_position;
  v11 = (signed __int16) source_position;
  dest_position = (_DWORD *)SrvOs2FeaToNt(dest_position, source_position);
  source_position += (unsigned __int8) source_position [1] + *((_WORD *)source_position + 1) + 5;
  if ( source_position > v8 )
  {
    dest_position = v12;
    goto LABEL_13;
  }
}
This loop continues until source_position passes v8, at which point it exits and a final check determines whether we have reached the end of the block; similarly, processing stops if any of the low seven bits of the first byte of an SMB_FEA block is set, since the loop condition tests that byte against the mask 0x7F. However, neither of those exit cases is particularly interesting with respect to this vulnerability.
As we just established, in case of a corrupt SizeOfListInBytes value, v8 will point to a memory address past the end of the SMB_FEA_LIST buffer, thus in that while loop we’ll call SrvOs2FeaToNt repeatedly with out-of-bounds pointer values:
unsigned int __stdcall SrvOs2FeaToNt(int a1, int a2)
{
  int v2; // esi@1
  _BYTE *v3; // ebx@1
  unsigned int result; // eax@1

  v2 = a1;
  *(_BYTE *)(a1 + 4) = *(_BYTE *)a2; // copy ExtendedAttributeFlag
  *(_BYTE *)(a1 + 5) = *(_BYTE *)(a2 + 1); // copy AttributeNameLengthInBytes
  *(_WORD *)(a1 + 6) = *(_WORD *)(a2 + 2); // copy AttributeValueLengthInBytes
  _memmove((void *)(a1 + 8), (const void *)(a2 + 4), *(_BYTE *)(a2 + 1)); // copy AttributeName
  v3 = (_BYTE *)(*(_BYTE *)(a1 + 5) + a1 + 8); // calculate current position in target buffer
  *v3++ = 0; // add trailing zero
  _memmove(v3, (const void *)(*(_BYTE *)(v2 + 5) + a2 + 5), *(_WORD *)(v2 + 6)); // copy AttributeValue
  result = (unsigned int)&v3[*(_WORD *)(a1 + 6) + 3] & 0xFFFFFFFC; // calculate final position in destination with alignment
  *(_DWORD *)v2 = ((unsigned int)&v3[*(_WORD *)(v2 + 6) + 3] & 0xFFFFFFFC) - v2; // update first 4 bytes in destination buffer with length (result_ptr - start_ptr)
  return result;
}
Normally, this function would copy over the individual SMB_FEA blocks between the two formats from the a2 buffer into the a1 buffer, one after another, with some minor differences in the result (such as writing the total size of the block into the first 4 bytes of a1).
However, at this point (due to the corruption of SizeOfListInBytes, and thus changing the exit condition of the loop) we know that a1 points past the end of the buffer we allocated to hold the converted FEA list block, and a2 is similarly indexing memory areas beyond the heap block holding the SMB_FEA_LIST element. Thus, the function will perform repeated out-of-bounds memory manipulation operations; particularly, the second memmove operation will potentially cause a buffer overflow on the heap with up to 64 kB of data read from a memory area past the end of the buffer.
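For contrast, a conversion routine that knows the remaining length of both buffers cannot be steered out of bounds by a bad loop bound. The following is a minimal sketch under that assumption, not Microsoft's code: fea_to_nt_checked and its parameters are invented for illustration, and the field offsets follow the decompiled function above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Converts one OS/2-style record at src into NT layout at dst.
 * Returns the number of bytes written to dst, or 0 if the record would
 * cross the end of either buffer (nothing is written in that case). */
size_t fea_to_nt_checked(uint8_t *dst, size_t dst_len,
                         const uint8_t *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    if (src_len < 5) {
        return 0; /* 4-byte header plus the trailing zero of the name */
    }

    size_t name_len = src[1];
    size_t value_len = (size_t)src[2] | ((size_t)src[3] << 8);
    size_t src_need = 4 + name_len + 1 + value_len;
    size_t dst_need = (8 + name_len + 1 + value_len + 3) & ~(size_t)3;

    if (src_need > src_len || dst_need > dst_len) {
        return 0;
    }

    memset(dst, 0, dst_need);                 /* also zeroes alignment padding */
    dst[0] = (uint8_t)(dst_need & 0xFFu);     /* record length, little-endian */
    dst[1] = (uint8_t)((dst_need >> 8) & 0xFFu);
    dst[2] = (uint8_t)((dst_need >> 16) & 0xFFu);
    dst[4] = src[0];                          /* flag byte */
    dst[5] = src[1];                          /* name length */
    dst[6] = src[2];                          /* value length, low byte */
    dst[7] = src[3];                          /* value length, high byte */
    memcpy(dst + 8, src + 4, name_len);       /* name */
    dst[8 + name_len] = 0;                    /* trailing zero */
    memcpy(dst + 9 + name_len, src + 5 + name_len, value_len); /* value */
    return dst_need;
}
```

A caller would advance the source by name length + value length + 5 and the destination by the returned value, stopping as soon as the function returns 0. The design choice here is simply that every copy is validated against lengths derived from the allocations themselves, not from a field in the message.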
At this point, if the attacker has managed to successfully set up the heap beforehand (e.g. via heap grooming techniques and use of other vulnerabilities), the overflow can corrupt the heap and overwrite part of a subsequent heap memory block holding SMB data in a controlled manner, eventually leading to code execution – there have been many excellent in-depth articles written about the exploitation details.
Lessons learned
The heap overflow was actually triggered by a simple mathematical error in a part of the code that looks like input validation functionality – after all, it recalculated the actual size of the block, and overwrote the size value in the block header with the calculated value. This is commendable – many protocol handlers just blindly trust the size parameter sent in the message, which can lead to buffer overflow- and overread-type problems (just think of Heartbleed – trusting the size parameter was the root of that vulnerability).
Unfortunately, the data type used for storing the result of this calculation was inappropriate, resulting in situations where the calculated value would not be stored correctly. Such situations would probably not show up during regular testing, either – after all, who’d think to test SMBv1 messages with OS/2 style FEA list blocks that were larger than 64 kB, right?
Incidentally, when Microsoft fixed the vulnerability, the only change in the code was changing the type of v1 in SrvOs2FeaListSizeToNt into _DWORD*. In the fixed code, all 4 bytes of the calculated length are stored appropriately, and the input / output buffer lengths stayed consistent – a simple solution to a simple (but too often overlooked) programming bug.
Type mismatch issues are a typical and dangerous vulnerability type. To prevent these problems, it is essential that all developers in the team apply secure coding best practices – specifically for this case, verifying that assignment operations are performed on the appropriate data type. In the specific case of handling protocol messages, each individual block structure should be defined via a struct, in which case the size is stored through a field of the declared 32-bit type, so the assignment covers all four bytes, and any narrowing of the computed value to a WORD – for instance – can be flagged by the compiler's conversion warnings, which the developer can then notice and fix.
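As a sketch of that advice, the header can be modeled as a struct and the size written through its field. The type FeaListHeader and the function set_list_size are hypothetical names; only the field name SizeOfListInBytes comes from the specification, and the rest of the header is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the list header; only the size field is shown. */
typedef struct {
    uint32_t SizeOfListInBytes;
} FeaListHeader;

/* Stores the distance between start and end in the header.
 * Both pointers must point into the same buffer.
 * Returns 0 on success, -1 if the range is reversed or too large. */
int set_list_size(FeaListHeader *hdr, const uint8_t *start, const uint8_t *end)
{
    assert(hdr != NULL && start != NULL && end != NULL);
    if (end < start) {
        return -1;
    }

    ptrdiff_t distance = end - start;
    if ((uintmax_t)distance > UINT32_MAX) {
        return -1; /* would not fit into the 32-bit field */
    }

    /* The field has its declared width, so all four bytes are rewritten. */
    hdr->SizeOfListInBytes = (uint32_t)distance;
    return 0;
}
```

Because the store goes through the typed field, stale high-order bytes from the message cannot survive it, and the only narrowing step (from the pointer difference to 32 bits) is explicit and range-checked.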
Working with integers and the interactions of different integer types can be deceptively difficult. That, however, does not mean that writing correct code is impossible: there are well-established best practices on how to deal with situations such as the bug that led to the vulnerability responsible for WannaCry.
Check out our course catalog to find out how SCADEMY Secure Coding Academy can help your engineers gain these essential skills.