
13.2. Techniques to Block Buffer Overflow Attacks

"All non-trivial programs have bugs."

This section discusses some of the most important techniques in use, or in various stages of research, to detect and prevent buffer overflow attacks and related exploits on computer systems. These procedures and systems potentially help prevent infections by fast-spreading computer worms.

There is no major difference between the overflow technique of the Morris Internet worm [7] and today's more advanced attacks (such as Linux/Slapper [2]), other than the complexity and overflow type. These worms are built on a classic set of ideas involving the overflow of stack or heap structures. They can be classified into a few main categories.

For example, most of the BSD- or UNIX-based worms, such as Morris, Linux/Slapper, BSD/Scalper, and Solaris/Sadmind, can be classified as shellcode-based worms.

Shellcode is a short sequence of code that runs a command shell on the remote system (for instance, /bin/sh on UNIX or cmd.exe on Windows). The hacker community exchanges copies of shellcode for many operating systems, and some hackers build exploits to run such code or their modified versions via an overflow. After such a shell is executed on the remote machine, the worm can copy itself to the remote system and completely control the system. On the other hand, hackers use this technique to "own" yet another remote machine.

Other worm classes, like W32/CodeRed, do not use the shellcode technique. Instead, they hijack a thread in a faulty application and run themselves as part of the exploited host service, using run-time code injection. The techniques presented in this chapter provide protection against both shellcode and run-time code injection attacks.

There is one more significant class of attacks, known as the return-to-LIBC attack. In this case, the attacker attempts to force a return into existing, well-known, standard code on the system (for instance, C run-time code or OS APIs). The attacker accomplishes this by overflowing the stack in such a way that an instruction like ret would return the execution flow to a desired API call, with the parameters that the attacker chooses. (The stack is overflowed with the desired parameters, as well as a "return address," which is actually the address of the desired API call.)

In this way, neither the stack nor the heap is executed. This matters because some anti-overflow techniques check for code running where, in most cases, it should not: on the stack or in the heap. This kind of attack would be immune to such protection techniques because code is not run on the stack or heap.

Although existing worms do not currently use the return-to-LIBC technique, I expect that future worms will. In preparation for such worms, I spend some time describing mitigation techniques against the return-to-LIBC attacks.

13.2.1. Code Reviews

The most effective method of preventing buffer overflow attacks is code review performed by security experts. More often than not, companies release applications with minimal or no code review, leading to potential security problems.

Even if code reviews are performed, many people are not properly educated to find potential security issues in time. It is imperative to train professionals about security at all stages of development. Programmers need to be as educated about security as QA professionals.

Code reviews are particularly important because those who own the source code are in the best position to defend it. However, we cannot assume that the developer will detect all security flaws. In fact, outsiders, such as security professionals or hackers, report the majority of flaws. Another problem is that security code reviews often neglect to validate the design and focus only on the code itself. This alone can lead to serious vulnerability problems.

13.2.1.1 Security Updates

Many security professionals believe that publishing exploits forces companies to make fixes available quickly, thus improving overall security for the public. In fact, even when patches (security updates) are made available, customers often neglect to apply them until the patched vulnerabilities have been used against them.

There are several reasons for poor adoption of security updates:

  • People are unaware they exist or do not want to apply the patches.

  • They are often costly to implement at large corporations.

  • Sometimes patches do not fix the security flaw completely.

  • Patches occasionally cause crashes or incompatibility with existing systems.

Working updates/patches are the most effective types of protection against specific security flaws. Neglecting to apply security updates is not a good practice, even if some updates cause problems on some systems. A good example of this is Microsoft Security Bulletin MS03-007, which was incorrectly known to many people as the "WebDav vulnerability." One of the actual buffer overflow vulnerabilities was located in ring 3, the user mode. In particular, a run-time library (RTL) function of the NT-native API module, NTDLL.DLL, needed to be fixed. In addition, the integer overflow vulnerability condition existed in the kernel as well.

Because the initial exploit worked over the WebDav feature of IIS, some security professionals believed that disabling WebDav was good enough to mitigate possible attacks against the system. The patch that Microsoft provided replaces NTDLL.DLL, which is considered major surgery and can cause complications on some systems.

Due to possible complications and because some security experts believed that disabling the WebDav feature in IIS was sufficient protection, many people did not apply the patch, disabling WebDav instead. This situation left many systems without serious protection.

The main lesson is that a vulnerability can be demonstrated by exploiting a particular application; however, if the vulnerability lies in a shared component, such as an OS component, all of the applications using that particular component are potentially vulnerable. Simply because the exploit demonstrates the vulnerability through one application does not mean that other applications are safe; the proper fix is to address the root cause. In this case, disabling the application masked the true problem. This situation is even worse when it comes to statically linked libraries such as zlib or openssl, which might have vulnerabilities. Many software vendors neglect or do not realize that their software is vulnerable and do not issue patches when such libraries are affected.

We need to take every available measure at every stage of software deployment to protect against potential attacks on vulnerable software. We need to adopt everything available from the ground up, from source to run-time protections, to mitigate attacks. At the same time, we need to understand the capabilities and limitations of each type of protective technique.

13.2.2. Compiler-Level Solutions

For some time, programmers have adopted bounds-checking software, such as BoundsChecker. This helps programmers find many types of existing overflows and other software quality problems. As buffer overflow attacks have become more popular and successful, security professionals have started to think about compiler-level solutions to prevent certain kinds of attacks.

C and C++ leave ample room for buffer overflow errors of all types. Because C and C++ code is especially vulnerable, programmers must adopt compiler-level solutions.

Such solutions cannot eliminate the need for code reviews, however. Compiler-level solutions are primarily safety guards against the most common types of stack-based overflow attacks. Most of these solutions do not provide any protection against heap-based overflows, nor can they provide 100% protection against all stack-based overflow situations. In fact, this chapter provides a few simple examples of why such systems remain vulnerable to the very stack-smashing attacks that they are supposed to prevent.

However, we should keep in mind that each additional technique raises the bar: circumventing it requires a greater level of skill, which a smaller population of attackers possesses. Further, attackers with the required skill set will hopefully need to spend more time to create a successful attack.

Unfortunately, attackers have some advantages:

  • They have access to at least the compiled code and even the source code in the case of open-source targets.

  • They have time.

  • The difficulty of exploitation varies. Some vulnerabilities are easily exploited by the attackers, while others take months to develop. The complexity of defense does not change, however. It is equally difficult regardless of how easily the vulnerability is exploited. (Even a two-line code change can be difficult and extremely costly to deliver in some projects. And by defense I mean more than source fixes.)

  • They do not have to be completely accurate to target all the systems, although some exploits need acute precision.

13.2.2.1 StackGuard

StackGuard was introduced in 1998 [8] as one of the first compiler-level extensions to prevent certain types of stack-based overflows in run-time code and was created as an extension of the gcc compiler. StackGuard cleverly introduces return address modification detection using a "canary" technique. Most stack-based overflows occur by overflowing buffers that are placed next to a function return address on the stack. Usually a missing bounds check provides the means to overflow a buffer with a long string value, thus manipulating a function return address on the stack. This attack is called stack smashing [9].

When the function returns to its caller, it picks up a newly presented address that the attacker has placed there. StackGuard protects against such attacks by inserting a canary value next to the return address on the stack (see Figure 13.2).

Figure 13.2. StackGuard places a "canary" below the "return address" on the stack.


StackGuard is a simple patch to the function_prologue and function_epilogue of gcc. By extending the prologue to set the canary and the epilogue to check it, alteration of the canary can be detected at runtime.

Thus when the canary value changes, the epilogue routine will execute the "canary-death-handler" instead of letting the function return. When the attack is detected, the attacker's code does not have a chance to run.
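The canary mechanism can be sketched as a small simulation. This is not StackGuard's actual code: the frame is modeled as a byte array so the "overflow" stays memory-safe, and the layout, the canary value, the return address, and all function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simulated frame, low to high addresses: buffer, canary, return address.
   Sizes and offsets are illustrative assumptions, not StackGuard's layout. */
enum { BUF_SIZE = 8, CANARY_OFF = 8, RET_OFF = 12, FRAME_SIZE = 16 };

static const uint32_t CANARY    = 0x000AFF0Du;  /* hypothetical canary value */
static const uint32_t SAVED_RET = 0x00401296u;  /* hypothetical return address */

/* "Prologue": store the canary and the return address above the buffer. */
static void frame_prologue(unsigned char frame[FRAME_SIZE])
{
    memset(frame, 0, FRAME_SIZE);
    memcpy(frame + CANARY_OFF, &CANARY, sizeof CANARY);
    memcpy(frame + RET_OFF, &SAVED_RET, sizeof SAVED_RET);
}

/* Copy with no check against BUF_SIZE, as a buggy function would do.
   The length is clamped to the simulated frame so the demo itself is safe. */
static void unchecked_copy(unsigned char frame[FRAME_SIZE], const char *src)
{
    size_t n = strlen(src) + 1;
    if (n > FRAME_SIZE)
        n = FRAME_SIZE;
    memcpy(frame, src, n);
}

/* "Epilogue": returns 1 when the canary is intact, 0 when it was altered. */
static int frame_epilogue_ok(const unsigned char frame[FRAME_SIZE])
{
    uint32_t seen;
    memcpy(&seen, frame + CANARY_OFF, sizeof seen);
    return seen == CANARY;
}

/* Runs one simulated call; returns 1 if the overflow was detected. */
int overflow_detected(const char *input)
{
    unsigned char frame[FRAME_SIZE];

    assert(input != NULL);
    frame_prologue(frame);
    unchecked_copy(frame, input);
    return frame_epilogue_ok(frame) ? 0 : 1;
}
```

In this model, an input that fits in the 8-byte buffer leaves the canary untouched, while a longer input must pass over the canary before it can reach the return address slot, so the epilogue check fails first.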

There are a few issues that StackGuard's 2.x implementation did not address, some of which will be addressed in StackGuard 3. It does not protect against frame pointer (EBP) attacks because the canary is placed next to the return address, so the overflow of the frame pointer itself may not be detected. This is because the canary value does not need to be changed to modify the frame pointer.

Further, StackGuard remains vulnerable to attacks that target the function pointers among local variables. However, it is a fact that StackGuard itself could have effectively blocked many Internet worms, such as the Morris worm, assuming that the application containing the vulnerable code, such as fingerd, was compiled with it.

The Morris worm used a shellcode-based attack and modified the return address of main() on the stack to run its shellcode, which was passed as a "string" to the vulnerable fingerd service [10].

Recompiling the vulnerable service with StackGuard can prevent Linux worms that use simple stack-smashing attacks. Worms such as Linux/Slapper use heap-based overflows, which StackGuard itself cannot prevent. It is important, however, to note that heap overflows are not a common technique in today's computer worms; most worms use a simple stack-based overflow.

Using StackGuard is strongly recommended. In fact, Linux compilations are available that have been recompiled with StackGuard to make the system more secure.

Microsoft independently developed [11] a technique similar to StackGuard's for Microsoft Visual C++ .NET 2003 7.0. This was changed in the 7.1 release to another method, which shows similarities to that of ProPolice.

13.2.2.2 ProPolice

IBM researcher Hiroaki Etoh [12] developed ProPolice. ProPolice introduces many novel features based on the foundations of StackGuard. Like StackGuard, it provides compiler-level protection against buffer overflows. Its novel ideas include moving the canary value and optimizing buffers and function pointer locations on the stack so that attempts to exploit the function pointers are more difficult to accomplish because they are out of the way. See Figure 13.3 for an illustration.

Figure 13.3. The "canary" of ProPolice below the frame pointer and "return address."


By default, ProPolice protects both the frame pointer and the return address by placing the canary value below the frame pointer. ProPolice also concatenates string buffers and places them above the local variables, thereby providing better protection for function pointers that are local variables.
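The effect of canary placement can be sketched with simple offset arithmetic. The two layouts below are hypothetical 32-bit frames with an 8-byte buffer at offset 0; the structure, constants, and function names are assumptions for illustration and are not taken from StackGuard or ProPolice.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offsets, measured from the start of an 8-byte buffer, of the
   4-byte slots that sit above it in a hypothetical 32-bit frame. */
struct frame_layout {
    size_t canary_off;
    size_t saved_fp_off;
    size_t ret_off;
};

/* Canary next to the return address: the saved frame pointer lies
   between the buffer and the canary. */
static const struct frame_layout CANARY_NEXT_TO_RET = { 12, 8, 16 };

/* Canary below the frame pointer: the canary is the first slot that
   a linear overflow of the buffer reaches. */
static const struct frame_layout CANARY_BELOW_FP = { 8, 12, 16 };

/* A linear write of write_len bytes starting at offset 0 modifies the
   slot at off when it extends past that offset. */
static int touches(size_t write_len, size_t off)
{
    return write_len > off;
}

/* Returns 1 when the write corrupts the saved frame pointer while
   leaving the canary intact, so a canary check would not notice. */
int fp_attack_undetected(const struct frame_layout *l, size_t write_len)
{
    assert(l != NULL);
    return touches(write_len, l->saved_fp_off) &&
           !touches(write_len, l->canary_off);
}
```

With a 12-byte write, the first layout lets the saved frame pointer change without disturbing the canary, whereas in the second layout the same write has already overwritten the canary.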

ProPolice also attempts to create local copies of passed-in function pointers; however, compiler optimizations can cause problems for this trick. Remaining issues include function pointers in passed-in structures that contain string buffers.

Like StackGuard, ProPolice is also finding its place in operating system builds. Its current claim to fame is that it is included in the OpenBSD 3.3 release; it will make a system considerably more difficult to attack. ProPolice makes stack-based overflows much more difficult and should present a formidable challenge to even accomplished attackers.

Because ProPolice protects stack integrity, it will not prevent attacks against heap-based structures [13], so worms like Linux/Slapper [2] challenge it.

13.2.2.3 Microsoft Visual Studio .NET 2003: 7.0 and 7.1

Microsoft first introduced the /GS option in Visual Studio .NET 2003. The new option is called Buffer Security Check, which is available as a code generation option and is turned on by default.

Consider the buggy C code shown in Listing 13.1.

Listing 13.1. Buggy C Code
#include <string.h>

int Bogus(char *mystring)
{
      char buf[8];

      strcpy(buf, mystring); // oops! No bounds check on an 8-byte buffer.
      return 0;
}

int main(void)
{
      Bogus("Here is a typical stack overflow!");
      return 0;
}

The compiler primarily protects arrays that are at least five bytes long; the security check code is not generated for shorter buffers. This is probably done as a performance trade-off, assuming that most overflows happen in larger buffers. Regardless of how short the buffer is, however, if an attacker can get his/her input to the buggy function, that particular function can be exploited.
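For comparison, the kind of source-level fix that a code review would ask for can be sketched as follows. BogusChecked is a hypothetical name, and rejecting over-long input (rather than truncating it) is only one possible policy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical checked variant of Bogus() from Listing 13.1.
   Returns 0 on success and -1 when the input does not fit. */
int BogusChecked(const char *mystring)
{
    char buf[8];
    size_t len;

    assert(mystring != NULL);
    len = strlen(mystring);
    if (len >= sizeof buf)      /* no room for the string plus its NUL */
        return -1;              /* reject instead of overflowing */
    memcpy(buf, mystring, len + 1);
    return 0;
}
```

A seven-character string is the longest input accepted, because the eighth byte of buf is needed for the terminating NUL. The compiler's security check remains useful as a second line of defense for the cases a review misses.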

Now let's look at some code that VC .NET 2003 7.0 generated:

00401296  push offset string "Here is a typical stack overflow!"
0040129B  call Bogus (401000h)

So far, we have passed a pointer to a long string to Bogus() via the stack. Listing 13.2 shows what happens inside Bogus().

Listing 13.2. Setting a "Security Cookie"
Bogus:
00401000  sub  esp,0Ch
00401003  mov  eax,dword ptr [___security_cookie (407030h)]
00401008  xor  eax,dword ptr [esp+0Ch]
0040100C  lea  edx,[esp]
00401010  mov  dword ptr [esp+8],eax

Bogus() will first access a security_cookie value randomly generated by the CRT. A special CRT routine initializes this value to a random DWORD. The reason is simple: If the attacker can guess the security_cookie value, he/she will be able to cause an overflow, present a "fake" security_cookie value, and remain undetected by the Buffer Security Check feature. (This attack remains feasible if an attacker can get around the security check, overwrite a previous frame above the stack, run its code via a function pointer, and fix the stack afterward to remain hidden.)

The value of security_cookie is XORed with the current return address and then saved next to the return address on the stack as a cookie. Then the buggy copy takes place as an in-lined strcpy(), as shown in Listing 13.3.

Listing 13.3. The Potential Overflow Condition
00401014  mov  eax,dword ptr [esp+10h]
00401018  sub  edx,eax
0040101A  lea  ebx,[ebx]
00401020  mov  cl,byte ptr [eax]
00401022  mov  byte ptr [edx+eax],cl
00401025  inc  eax
00401026  test cl,cl
00401028  jne  Bogus+20h (401020h)

Finally, the epilogue routine of Bogus() picks up the saved cookie value and "decodes" it to the "ecx" register (see Listing 13.4).

Listing 13.4. Decoding the "Security Cookie"
0040102A  mov  ecx,dword ptr [esp+8]
0040102E  xor  eax,eax
00401030  xor  ecx,dword ptr [esp+0Ch]
00401034  add  esp,0Ch
00401037  jmp  __security_check_cookie (4013F1h)

Next the epilogue jumps to the C runtime defined in seccook.c within the CRT source code, as shown in Listing 13.5.

Listing 13.5. The Standard "Security" Handler
void __declspec(naked) __fastcall __security_check_cookie(DWORD_PTR cookie)
{
    /* x86 version written in asm to preserve all regs */
    __asm {
        cmp ecx, __security_cookie
        jne failure
        ret
failure:
        jmp report_failure
    }
}

Thus a comparison is made against the original security cookie value. If a mismatch is detected, the code continues to report_failure. However, standard reporting only occurs if a user_handler was not previously set. The user handler allows setting an arbitrary handler to provide functionality differently than the default method. As user_handler is a function pointer placed in the data section, an overflow of the user_handler itself might be possible in some cases, allowing an attacker to run his/her code of choice via this handler.

If a user_handler was not set, which is normally done with _set_security_error_handler(), then the stack overflow is reported to the user, and the program's execution is stopped.
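The encode, decode, and compare steps of Listings 13.2 through 13.5 can be restated in C as a rough sketch. The function names and the cookie value are assumptions; the return address 0x004012A0 assumes that the call at 0040129B is a 5-byte instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Prologue step (Listing 13.2): combine the process-wide cookie with
   the return address and save the result next to the return address. */
uint32_t cookie_encode(uint32_t security_cookie, uint32_t return_address)
{
    return security_cookie ^ return_address;
}

/* Epilogue and check (Listings 13.4 and 13.5): decode the saved slot
   with the return address currently on the stack, then compare with
   the global cookie. Returns 1 when the check passes. */
int cookie_check(uint32_t saved_slot, uint32_t return_address_now,
                 uint32_t security_cookie)
{
    uint32_t decoded = saved_slot ^ return_address_now;
    return decoded == security_cookie;
}

/* Small walk-through with made-up values. */
void cookie_demo(void)
{
    const uint32_t cookie = 0x1234ABCDu;   /* arbitrary example cookie */
    const uint32_t ret    = 0x004012A0u;   /* assumed return address */
    uint32_t slot = cookie_encode(cookie, ret);

    /* Untouched frame: the check passes. */
    assert(cookie_check(slot, ret, cookie));

    /* Return address overwritten, saved slot intact: the check fails. */
    assert(!cookie_check(slot, 0x41414141u, cookie));

    /* Slot and return address both overwritten by the same filler. */
    assert(!cookie_check(0x41414141u, 0x41414141u, cookie));
}
```

The last case fails the check only because the attacker does not know the cookie; an attacker who could guess it could compute a matching slot value for any return address.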

The cookie value is placed below the frame pointer when there is one. In this way, the check can now answer attacks on the frame pointer.

Microsoft clearly improved the Buffer Security Check feature in the 7.1 edition of the compiler. The cookie value is no longer XORed against the return address, which did not have any obvious benefits. Instead, the cookie is saved and checked. Some of the aforementioned issues, however, have not yet been solved.

The most important feature of the 7.1 edition is that string buffers are joined together and the compiler moves the function pointers and other local variables below the buffers on the stack. Microsoft's implementation of the stack integrity check matches the most important features of ProPolice.

Like ProPolice, the Microsoft Visual Studio .NET 2003 7.1 security check also has conflicts with its own compiler optimization switches. For instance, in optimized code, passed-in function pointers might be direct references to a previous stack frame above the stack. This means that such function pointers can be overwritten and abused before the security check can take place because the check does not occur until the function returns; nested calls that use corrupted local function pointers passed as parameters (via optimized direct references to the caller's stack frame) are vulnerable to those corruptions.

One alternative to consider is using pragmas to turn off code optimization for certain code sections (sections that pass function pointers, for example). This is a good practice to put in place for other problems, such as clearing an "in-memory secret" (deleting a temporary key) as the last line of a function, which clever code optimization might eliminate as dead code. This is because the variable does not appear to be used as the end of the function is reached.
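The dead-store problem with in-memory secrets can also be approached without pragmas. The sketch below swaps in a different, commonly used technique: writing through a volatile-qualified pointer, which compilers in practice do not remove. The names secure_wipe and use_temporary_key are hypothetical, and the "key" handling is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Zeroes n bytes at p through a volatile pointer so that the stores
   are treated as observable and are not discarded as dead code. */
void secure_wipe(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;

    while (n--)
        *vp++ = 0;
}

/* Hypothetical function that derives a temporary key, uses it, and
   clears it as its last step. Returns a checksum of the key bytes. */
int use_temporary_key(const char *material)
{
    unsigned char key[16];
    int checksum = 0;
    size_t i, len;

    assert(material != NULL);
    len = strlen(material);
    for (i = 0; i < sizeof key; i++) {
        key[i] = len ? (unsigned char)material[i % len] : 0;
        checksum += key[i];
    }

    /* A plain memset(key, 0, sizeof key) here could be optimized away
       because key is never read again before the function returns. */
    secure_wipe(key, sizeof key);
    return checksum;
}
```

Where the platform offers a dedicated routine for this purpose, that routine is usually preferable; the volatile loop is only a portable fallback.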

Another remaining challenge is standard Windows exception handling. When an exception occurs, the exception handler chain is traversed to find an active exception handler to invoke. Many generic Windows exploits are based on overwriting stack-based exception handler frames to run the attacker's code. Several current exploits, as well as the W32/CodeRed worm, use this technique.

The Buffer Security Check feature itself does not mitigate such problems. An alternative that does mitigate this attack was developed at Symantec. Refer to Section 13.3, "Worm-Blocking Techniques," for more information. Also note that Microsoft is planning several changes to the /GS implementation in Visual Studio 2005, which will likely address some of the deficiencies described in this section.

13.2.3. Operating System-Level Solutions and Run-Time Extensions

Compiler-level stack integrity checking is only one option for operating system-level protections against overflows. While recompiled system components (OS or third-party) are less vulnerable to stack-based attacks, unprotected components (OS or otherwise) cause the system to remain vulnerable.

Although most Intel processors do not provide a page-level mechanism to prevent stack execution, some processors do, and operating systems can take advantage of that protection on such systems. (Alternatives for Intel systems are described in detail later.)

The major issue is that compiler-level protection requires source code to compile. During the last few years, some newer solutions have emerged that do not require source code, but they are specific to certain processors, such as Intel, or to certain operating systems, such as Linux. The following section discusses some of the most significant of such system extensions.

13.2.3.1 Solaris on SPARC

A number of operating systems have built-in features to protect them from certain types of buffer overflow attacks. For example, Solaris systems can be protected from stack execution by changing a system setting located in the /etc/system file. In this way, Solaris can prevent stack-based buffer overflow attacks on SPARC when the attack results in stack execution. See Figure 13.4 for a depiction.

Figure 13.4. Configuration options on Solaris on SPARC to prevent stack execution.
set noexec_user_stack=1
set noexec_user_stack_log=1

As a result of this system setting change, the user stack area of Solaris processes will not be mapped as executable (exec); thus, executing the stack results in a core dump, which is also logged in the system log file, if so configured. See Figure 13.5 for a depiction.

Figure 13.5. User stack of "sh" process not marked executable ("exec"), as pmap shows.
#pmap 653
653:   /sbin/sh
00010000    272K read/exec         /sbin/sh
00062000     16K read/write/exec   /sbin/sh
00066000     24K read/write/exec   [ heap ]
FFBEE000      8K read/write        [ stack ]
       total      320K

Certain protection systems have attempted to achieve similar results using executable and data segments on Intel processors (see examples in Section 13.2.5). Both of these solutions will prevent stack execution.

Although such solutions are attractive, it is important to remember that there are significant overflow dangers that do not involve executing code on the stack, such as heap overflows and return-to-LIBC attacks.

This is exactly where compiler-based solutions might help because compiler solutions, such as StackGuard, ProPolice, or Microsoft's Buffer Security Check, attempt to avoid exploitation via return addresses and frame pointer modifications, and some of them make it more difficult to exploit function pointers. Thus it is fair to say that these systems nicely complement each other. It is also clear that other techniques need to be applied to mitigate remaining issues.

13.2.4. Subsystem Extensions: Libsafe

Some solutions add attack-prevention logic within the user-mode process address space of individual applications. Libsafe [14] is a run-time protection available on Linux. It protects against hijacked return addresses as well as frame pointer attacks, but it might not be able to protect processes that do not use frame pointers on the stack between function calls. In such a situation, Libsafe simply lets the application do whatever it wants.

Libsafe takes advantage of a standard Linux feature that allows a sort of preemptive "overloading" of functions in dynamically loaded libraries. Libsafe loads as a dynamic library and loads function names, such as memcpy() and strcpy(), into the process address space. Thus when GLIBC (the standard C run-time library on Linux) is loaded, such functions will already be known, and the Libsafe version of these routines will be used instead of the GLIBC version. When an application calls strcpy(), it will call into Libsafe first.

Libsafe traces the stack using the frame pointers from the stack structures. Then Libsafe uses function-specific logic to validate the parameters and to determine whether a parameter is long enough to overwrite the location of a frame pointer or return address. In such a case, Libsafe will immediately stop executing the process. Otherwise, it will call the original function from GLIBC by dynamically switching to it.
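The bound that such a wrapper enforces can be sketched as follows. This is not Libsafe's code: the function names are hypothetical, the frame pointer is passed in as a plain number instead of being found by walking the stack, and the checked copy returns an error where Libsafe would terminate the process.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Number of bytes between a stack destination and the saved frame
   pointer of the frame that contains it. Returns SIZE_MAX when the
   destination is not below the frame pointer, meaning no bound is
   known and the call would pass through unchecked. */
size_t room_below_frame(uintptr_t dst, uintptr_t frame_ptr)
{
    return dst < frame_ptr ? (size_t)(frame_ptr - dst) : SIZE_MAX;
}

/* Checked stand-in for strcpy(): copies only when the source string,
   including its NUL, fits in the room available.
   Returns 0 on success and -1 when the copy is refused. */
int checked_strcpy(char *dst, const char *src, size_t room)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) + 1 > room)
        return -1;
    strcpy(dst, src);
    return 0;
}
```

In a real interposing library, a wrapper of this kind would be exported under the name strcpy so that the dynamic loader resolves the application's calls to it before GLIBC's version.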

Currently, the functions that Libsafe protects include memcpy(), strcpy(), strncpy(), wcscpy(), stpcpy(), wcpcpy(), strcat(), strncat(), wcscat(), [v]sprintf(), [v]snprintf(), vprintf(), vfprintf(), getwd(), gets(), and realpath(). This is not an exhaustive list of "vulnerable" functions, but it certainly contains some of the most common causes of vulnerabilities in C code.

Libsafe 2.0 protects the most wanted list of "vulnerable" function calls from public enemy stack-smashing attacks. It also protects the functions that can be used to execute format string exploits [10].

13.2.5. Kernel Mode Extensions

Many kernel-mode extensions attempt to deal with a large set of attacks, but such solutions face major challenges, such as intense exposure to false positives. Any kernel-mode extension is also susceptible to stability problems. This is somewhat true of a technique first deployed in PaX [15] for open-source systems, the direct manipulation of the page flags in the page tables, and it becomes more problematic on closed-source systems.

PaX and its follow-up implementation, SecureStack [16], set the Supervisor bit of page flags to cause a page fault, which the product's driver handles when the user-mode code accesses such pages. This makes it possible to check whether or not the instruction pointer points to a writeable page on the stack or heap.

The implementation uses a clever technique to minimize performance impact so that page faults occur mostly on execution, rather than on data access. This technique keeps performance degradation down to less than 5%.

The trick to this clever technique lies in its use of the translation look-aside buffers (TLBs) of Intel processors. On a 32-bit Intel architecture, a page table entry (PTE) describes every 4KB page of memory. The PTE describes the page location and, through various attributes, its availability. One of the PTE flags is the Supervisor bit. When the Supervisor bit is set in the PTE for a given page, access to that page in user mode will generate an exception. In turn, the product's driver, which is set up to handle exceptions in kernel mode, performs the security check. PaX and SecureStack set this bit for certain user-mode pages, such as writeable pages or stack areas.

The key to this trick is that Pentium and above processors have two TLBs: one for data access (DTLB) and one for instruction access (ITLB). Page faults are minimized by setting the Supervisor bit only in the ITLB copy of a PTE, not in the DTLB copy [16]. Thus executing writeable pages via the ITLB can be detected and prevented. An important feature of this technique is that stack and heap execution of writeable pages is blocked.
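The decision that the fault handler makes can be sketched in a few lines. The flag constants follow the low bits of a 32-bit x86 page table entry, where bit 2 is the User/Supervisor flag and a cleared bit means supervisor-only; the function names and the policy itself are simplified assumptions, not PaX or SecureStack code.

```c
#include <assert.h>
#include <stdint.h>

/* Low flag bits of a 32-bit x86 page table entry. */
#define PTE_PRESENT  0x1u   /* bit 0: page is present */
#define PTE_WRITABLE 0x2u   /* bit 1: read/write */
#define PTE_USER     0x4u   /* bit 2: user-accessible; clear = supervisor */

enum access_kind { ACCESS_DATA, ACCESS_EXEC };

/* Marks a page as supervisor-only so that user-mode access faults. */
uint32_t mark_supervisor(uint32_t pte)
{
    return pte & ~PTE_USER;
}

/* Hypothetical policy applied when a user-mode access faults on a
   marked page. logical_flags are the flags the OS intended for the
   page before it was marked. Returns 1 to block, 0 to allow. */
int should_block(uint32_t logical_flags, enum access_kind kind)
{
    assert(logical_flags & PTE_PRESENT);
    return kind == ACCESS_EXEC && (logical_flags & PTE_WRITABLE) != 0;
}
```

Under this policy, a data access to a writeable page is allowed to proceed, while an instruction fetch from the same page is treated as an attack. The split TLBs only make the common data-access path cheap; the decision itself depends on nothing more than the access type and the writable flag.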

Unfortunately, writeable page execution is common (mostly on Windows systems, but also on others). Packed executables exemplify this problem. When legitimate writeable page execution occurs, this system will have a false positive.

Fortunately, executing writeable pages is uncommon on server platforms. As a way to mitigate the false positive problem, PaX provides tools that make applications PaX-friendly. Other exclusion systems can further mitigate the problem.

PaX implements another stack execution prevention strategy on Intel by segmenting the process address space in such a way that stack execution can be prevented via the segment rights themselves. The benefit of this segmentation is that there is virtually no performance penalty. However, this solution needs to be tightly integrated in the operating system itself, which leads to development difficulties on non-open-source platforms.

The outstanding issue is stability. Solutions such as these are processor-dependent and, to an extent, OS-version-dependent, which might also include service pack dependency. These techniques provide the means to protect against large classes of user-mode attacks, which are the most common. However, they do not necessarily provide protection against kernel-mode (ring 0) overflows, so such systems are vulnerable to bugs in system and third-party drivers where malicious input can produce harmful side effects. (Newer versions of PaX have extra protection for kernel pages.)

Further, the attacker can challenge stack and heap execution prevention with the aforementioned return-to-LIBC type of attack, but these problems can be further mitigated by other techniques, as described in Section 13.3, "Worm-Blocking Techniques."

13.2.6. Program Shepherding

Another interesting technique was discussed in an MIT research paper [17] with some promising results. This new technique is called program shepherding.

Program shepherding was built on a dynamic optimizer called DynamoRIO. The goal of RIO on Dynamo was to speed up code execution by optimizing code without recompiling the actual executables involved. This project was a collaboration between Hewlett-Packard and the Massachusetts Institute of Technology [18]. Program shepherding was built into this model, so it can use the faster code execution to offset the cost of code flow verification. It does so by implementing a code cache into which the program's code is copied in fragments; the code is validated in the cache before it is executed. Thus the system never runs the original code, only its cached copy, which executes on the real CPU rather than under emulation. The program fragments are modified on the fly in the code cache to establish control over the code. This allows the secured execution of applications.
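The kind of check that can run before a fragment enters the code cache can be sketched as a lookup against known code regions. The structure, the function name, and the sample policy (allow a transfer only into a known region that is not writeable) are assumptions for illustration and are not taken from DynamoRIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A mapped region of the process, covering [start, end). */
struct code_region {
    uintptr_t start;
    uintptr_t end;
    int writable;      /* nonzero if the region can be modified */
};

/* Returns 1 when a control transfer to target is permitted under the
   sample policy: the target must fall inside a known region, and that
   region must not be writeable. Unknown targets are refused. */
int transfer_allowed(const struct code_region *regions, size_t count,
                     uintptr_t target)
{
    size_t i;

    assert(regions != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (target >= regions[i].start && target < regions[i].end)
            return !regions[i].writable;
    }
    return 0;
}
```

Because every control transfer passes through the code cache, a check like this runs once when a fragment is first copied rather than on every execution, which keeps the overhead manageable.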

The basic system needs extensions to address some tricky exploitation techniques. A particularly difficult problem is the detection of code flow change that occurs as the result of arbitrary data change in the process address space. For example, places such as the global offset table (GOT) on Unix or the Import Address Table (IAT) on Windows might be modified to make code flow changes that are hard to detect based on code flow verification in a cache.
