4.2. File Infection Techniques
In this section, you will learn about the common virus infection strategies that virus writers8 have used over the years to invade new host systems.
4.2.1. Overwriting Viruses
Some viruses simply locate another file on the disk and overwrite it with their own copy. Of course, this is a very primitive technique, but it is certainly the easiest approach of all. Such simple viruses can do major damage when they overwrite files on the entire disk.
Overwriting viruses cannot be disinfected from a system. Infected files must be deleted from the disk and restored from backups. Figure 4.3 shows how the content of the host program changes when an overwriting virus attacks it.
Figure 4.3. An overwriting virus infection that changes host size.
Normally, overwriting viruses are not very successful threats because the obvious side effects of the infections are easily discovered by users. However, such viruses have better potential when this technique is combined with network-based propagation. For instance, the VBS/LoveLetter.A@mm virus mass mails itself to other systems. When executed, it will overwrite with its own copy any local files with the following extensions:
Another overwriting virus infection method is used by the so-called tiny viruses. A classic family of this type is the Trivial family on DOS. During the early 1990s, many virus writers attempted to write the shortest possible binary virus. Not surprisingly, there are many variants of Trivial. Some of the viruses are as short as 22 bytes (Trivial.22).
The algorithm for such viruses is simple:
The shortest viruses are often unable to infect more than a single host program in the same directory in which the virus was executed. This is because finding the next host file would be "as expensive" as a couple of bytes of extra code. Such viruses are not advanced enough to attack a file marked read-only because that would take a couple of extra instructions.
Often the virus code is optimized to take advantage of the content of the registers during program execution as they are passed in by the operating system. Thus the virus code itself does not need to initialize registers that have known content set by the system loader. By using this condition, virus writers can make their creation even shorter.
Such optimization, however, can cause fatal errors when the virus code is executed on the wrong platform, which did not initialize the registers in the way that the virus expected.
Some tricky overwriting viruses also use BIOS disk writes instead of DOS file functions to infect new files. A very primitive form of such a virus was implemented in 15 bytes. The virus overwrites each sector on the disk with itself. Evidently, the system corruption is so major that such viruses kill the host system very quickly, keeping the virus from spreading any further.
Figure 4.4 illustrates an overwriting virus that simply overwrites the beginning of the host but does not change its size.
Figure 4.4. An overwriting virus that does not change the size of the host.
4.2.2. Random Overwriting Viruses
Another rare variation of the overwriting method does not change the code of the program at the top of the host file. Instead, the virus seeks to a random location in the host program and overwrites the file with itself at that location. Evidently, the virus code might not even get control during execution of the host. In both cases, the host program is lost during the virus's attack and often crashes before the virus code can execute. An example of this virus is the Russian virus Omud9, as shown in Figure 4.5.
Figure 4.5. A random overwriter virus.
To improve performance by reducing the disk I/O, modern antivirus scanners are optimized to find viruses at "well-known" locations of the file whenever possible. Thus random overwriting viruses are often problematic for scanners to find because a scanner would need to scan the contents of the complete host program for the virus code, which is too I/O expensive.
4.2.3. Appending Viruses
A very typical DOS COM file infection technique is called normal COM. In this technique, a jump (JMP) instruction is inserted at the front of the host to point to the end of the original host. A typical example of this virus is Vienna, which was published in Ralf Burger's computer virus book in a slightly modified form with its source code. This was back in 19861987.
The technique gets its name from the location of the virus body, which is appended to the end of the file. (It is interesting to note that some viruses infect EXE files as COM by first converting the EXE file to a COM file. The Vacsina virus family uses this technique.)
The jump instruction is sometimes replaced with equally functional instructions, such as the following:
a.) CALL start_of_virus b.) PUSH offset start_of_virus RET
The first three overwritten bytes at the top of the host program (sometimes 416) are stored in the virus body. When the virus-infected program is executed, the virus loads in memory with the actual infected host. The jump instruction directs control to the virus body, and then the virus typically replicates itself by locating new host programs on the disk or by executing some sort of activation routine (also called a trigger). Finally, the virus virtually cleans the program in memory by copying the original bytes to offset CS:0x100 (the location where the COM files are loaded) and executes the original program by jumping back to CS:0x100. The COM files are loaded to CS:0x100 because the program segment prefix (PSP) is placed at CS:0CS:0xFF.
Figure 4.6 shows how a DOS COM appender virus infects a host program.
Figure 4.6. A typical DOS COM appender virus.
Obviously the appender technique can be implemented for any other type of executable file, such as EXE, NE, PE, and ELF formats, and so on. Such files have a header section that stores the address of the main entry point, which, in most cases, will be replaced with a new entry point to the start of the virus code appended to the end of the file.
Section 4.3 is dedicated to Win32 infection techniques to demonstrate the principles of file infection techniques in modern file formats. These formats often have complicated internal structuresoffering many more opportunities to attackers.
4.2.4. Prepending Viruses
A common virus infection technique uses the principle of inserting virus code at the front of host programs. Such viruses are called prepending viruses. This is a simple kind of infection, and it is often very successful. Virus writers have implemented it on various operating systems, causing major virus outbreaks in many.
An example of a COM prepender virus is the Hungarian virus Polimer.512.A, which prepends itself, 512 bytes long, at the front of the executable and shifts the original program content to follow itself.
Let's take a look at the front of the Polimer virus in DOS DEBUG. Polimer is a good example to study because the top of the virus code is a completely harmless data area with a message that is displayed onscreen during execution of infected programs.
>DEBUG polimer.com -d 142F:0100 E9 80 00 00 3F 3F 3F 3F-3F 3F 3F 3F 43 4F 4D 00 ....????????COM. 142F:0110 05 00 00 00 2E 8B 26 68-01 00 00 00 00 00 00 00 ......&h........ 142F:0120 00 00 00 00 00 00 00 00-41 20 6C 65 27 6A 6F 62 ........A le'job 142F:0130 62 20 6B 61 7A 65 74 74-61 20 61 20 50 4F 4C 49 b kazetta a POLI 142F:0140 4D 45 52 20 6B 61 7A 65-74 74 61 20 21 20 20 20 MER kazetta ! 142F:0150 56 65 67 79 65 20 65 7A-74 20 21 20 20 20 20 0A Vegye ezt ! .
The virus body is loaded to offset 0x100 in memory. The virus code starts with a jump (0xe9) instruction to give control to the virus code after its own data area. Because Polimer is 512 bytes (0x200) long, at the front of COM executable, offset 0x300 in memory should be the original host program (0x100+0x200=0x300). Indeed, in this example, the actual infected host is the Free Memory Query Program. Prepending COM viruses can easily start their host programs by copying the original programs' content to offset 0x100 and giving it control.
-d300 142F:0300 E9 9E 00 0D 46 72 65 65-20 4D 65 6D 6F 72 79 20 ....Free Memory 142F:0310 51 75 65 72 79 20 50 72-6F 67 72 61 6D 2C 20 56 Query Program, V 142F:0320 65 72 73 69 6F 6E 20 34-2E 30 33 0D 0A 53 4D 47 ersion 4.03..SMG 142F:0330 20 53 6F 66 74 77 61 72-65 0D 0A 28 43 29 20 43 Software..(C) C 142F:0340 6F 70 79 72 69 67 68 74-20 31 39 38 36 2C 31 39 opyright 1986,19 142F:0350 38 37 20 53 74 65 76 65-6E 20 4D 2E 20 47 65 6F 87 Steven M. Geo 142F:0360 72 67 69 61 64 65 73 0D-0A 1A 00 00 00 00 00 00 rgiades.........
Figure 4.7 illustrates how a prepender virus is inserted at the front of a host program.
Figure 4.7. A typical prepender virus.
Prepender viruses are often implemented in high-level languages such as C, Pascal, or Delphi. Depending on the actual structure of the executable, the execution of the original program might not be as trivial a task as it is for COM files. This is exactly why a generic solution involves creation of a new temporary file on the disk to hold the content of the original host program. Then a function, such as system(), is used to execute the original program in the temporary file. Such viruses typically pass command-line parameters of the infected host to the host program stored in the temporary file. Thus the functionality of the application will not break because of missing parameters.
4.2.5. Classic Parasitic Viruses
A variation of the prepender technique is known as the classic parasitic infection, as shown in Figure 4.8. Such viruses overwrite the top of the host with their own code and save the top of the original host program to the very end of the host, usually virus-size long. The first such virus was Virdem, written by Ralf Burger. In fact, Virdem is one of the first examples of a file virus ever seen; Burger's book did not even contain information about any other kinds of computer viruses but file viruses. Burger distributed his creation at the Chaos Computer Club conference in December 1986.
Figure 4.8. A classic parasitic virus.
Often when such viruses are repaired, a common problem occurs. In many cases, the repair definition directs to copy N number of bytes to the front of the file by calculating backward from the end of the infected program. Then the file is truncated at FILESIZE-N, where N is typically the size of the virusbut the size of the file can change. The most common reason for this is a multiple infection, when the file is infected more than once.
In other cases, the file has some extra data appended, such as inoculation information placed there by some other antivirus program. For instance, the Jerusalem virus uses the MsDos marker at the end of the infected file to "recognize" files that are already infected. Some early antivirus programs appended the string to the end of all COM and EXE files to inoculate files from recurrent Jerusalem infections. Although it might sound like a great idea, the extra modification of the files can easily cause trouble for disinfectors. This happens when the inoculated file is already infected with another parasitic virus. When the FILESIZE-N calculation is used, the repair routine will seek to an incorrect location 5 bytes after the top of the original program content. This repair will result in a garbage host program that will crash when executed. This kind of disinfection is often called a half-cooked repair10.
Some special parasitic infectors do not save the top of the host to the end of the host program. Instead they use a temporary file to store this information outside of the file, sometimes with hidden attributes. For example, the Hungarian DOS virus, Qpa, uses this technique and saves 333 bytes (the size of the virus) to an extra file. Some members of the infamous W32/Klez family use this technique to store the entire host program in a new file.
4.2.6. Cavity Viruses
Cavity viruses (as shown in Figure 4.9) typically do not increase the size of the object they infect. Instead they overwrite a part of the file that can be used to store the virus code safely. Cavity infectors typically overwrite areas of files that contain zeros in binary files. However, other areas also can be overwritten, such as 0xCC- filled blocks that C compilers often use for instruction alignment. Other viruses overwrite areas that contain spaces (0x20).
Figure 4.9. A cavity virus injects itself into a cave of the host.
The first known virus to use this technique was Lehigh, in 1987. Lehigh was a fairly unsuccessful virus. However, Ken van Wyk created a lot of publicity about the virus and eventually set up the VIRUS-L newsgroup on Usenet to discuss his findings.
Cavity infectors are usually slow spreaders on DOS systems. The Bulgarian Darth_Vader viruses, for instance, never caused major outbreaks. This was also due to the fact that Darth_Vader was a slow infector virus. It waited for a program to be written, and only then did it infect the program using a cavity of the host.
A special kind of cavity virus infection relies on PE programs' relocation sections. Relocations of most executables are not used in normal situations. Modern linker versions can be configured to compile PE executable files without a relocation tableto make them shorter. Relocation cavity viruses overwrite the relocation section when there are relocations in the host. When the relocation section is longer than the virus, the virus does not increase the file size. Such viruses make sure that the relocation section is the last or it has sufficient length. Otherwise, the file gets corrupted easily during infection. For example, the W32/CTX and W95/Vulcano virus families use this technique.
4.2.7. Fractionated Cavity Viruses
A few Windows 95 viruses implement the cavity infection technique extremely successfully. The W95/CIH virus implements a variation of cavity infection called the fractionated cavity technique. In this case, the virus code is split between a loader routine and N number of sections that contain section slack space. First the loader (HEAD) routine of the virus locates the snippets of the virus code and reads them into a continuous area of memory, using an offset tablet kept in the HEAD part of the virus code. During infection, the virus locates the section slack gaps of portable executable (PE) files and injects its code into as many section slack holes as necessary.
A new viral entry point will be presented in the header of the file to point to the start of the virus code, usually inside the header section of the host applications. Some shorter cavity infectors, such as Murkry, use this area to infect files in a single step. However, CIH is longer and needs to split its code into snippets. Eventually, the virus executes the original host program from the stored entry point (EP). The advantage of the technique is that the virus only needs to "remember" the original EP of the host and simply jump there to execute the loaded program in memory.
Figure 4.10 represents the state of the host program before and after the infection of a fractionated cavity virus. The host would normally start at its entry point (EP) defined in its header section. The virus replaces that EP value with VEP, the viral entry point. The VEP points to the loader of the virus snippets. If there is not enough slack space in the file to present the loader in a single snippet, the file cannot get infected.
Figure 4.10. A fractionated cavity virus.
The section slacks are typically presented in modern file formats such as PE, and they can be easily located using the section header information of such files.
One of the special problems of cavity virus repair is that the content of overwritten areas cannot be restored 100%. This happens when the virus overwrites areas of files that usually contain zeros, but in other cases contain some other pattern. Thus the cryptographic checksums of files after repair will be often different from the original program's content. Furthermore, exact identification of such viruses is complicated because the virus snippets need to be pieced together.
4.2.8. Compressing Viruses
A special virus infection technique uses the approach of compressing the content of the host program. Sometimes this technique is used to hide the host program's size increase after the infection by packing the host program sufficiently with a binary packing algorithm. Compressor viruses are sometimes called "beneficial" because such viruses might compress the infected program to a much shorter size, saving disk space. (Runtime binary packers, such as PKLITE, LZEXE, UPX, or ASPACK, are extremely popular programs. Many of these have been used independently by attackers to pack the content of Trojan horses, viruses, and computer worms to make them obfuscated and shorter.)
The DOS virus, Cruncher, was among the first to use the compression technique. Some of the 32-bit Windows viruses that use this technique include W32/HybrisF (a file infector plug-in of the Hybris worm), written by the virus writer, Vecna. Another infamous example is W32/Aldebera, which combines the infection method with polymorphism. Aldebera attempts to compress the host in such a way that the host remains equivalent in size to the original file. This virus was written by the virus writer, B0/S0 (Bozo) of the IKX virus writing group, in 1999.
The W32/Redemption virus of the virus writer, Jacky Qwerty, also uses the compression technique to infect 32-bit PE files on Windows systems. Figure 4.11 shows how a compressor virus attacks a file.
Figure 4.11. A compressor virus.
4.2.9. Amoeba Infection Technique
A rarely seen virus infection technique, Amoeba, embeds the host program inside the virus body. This is done by prepending the head part of the virus to the front of the file and appending the tail part to the very end of the host file. The head has access to the tail and is loaded later. The original host program is reconstructed as a new file on the disk for proper execution afterwards. For example, W32/Sand.12300, written by the virus writer, Alcopaul, uses this technique to infect PE files on Windows systems. Sand is written in Visual Basic.
Figure 4.12 shows the host program before and after infection by a virus that uses the Amoeba infection technique.
Figure 4.12. The Amoeba infection method.
4.2.10. Embedded Decryptor Technique
Some crafty viruses inject their decryptors into the executable's code. The entry point of the host is modified to point to the decryptor code. The location of the decryptor is randomly selected, and the decryptor is split into many parts. The overwritten blocks are stored inside the virus code for proper execution of the host program after infection.
When the infected application starts, the decryptor is executed. The decryptor of the virus decrypts the encrypted virus body and gives it control. The Slovakian polymorphic virus, One_Half, used this method to infect DOS COM and EXE files in May 1994. Evidently, the proper infection of EXE files with this technique is a more complicated task. If relocations are applied to parts of the file that are overwritten with pieces of the virus decryptor, the decryptor might get corrupted in memory. This can result in problems in executing host programs properly.
Figure 4.13 shows the "Swiss cheese" layout of infected program content. The detection of such viruses made scanning code more complicated. The scanner needed either to detect decryptor blocks split into many parts or to include some more advanced scanning technique, such as code emulation, to resolve detection easily. (These techniques are discussed in Chapter 11.)
Figure 4.13. A "Swiss cheese" infection.
The easiest way to analyze such virus code is based on the use of special goat (decoy) files filled with a constant pattern, such as 0x41 ("A") characters. After the test infection, the overwritten parts stand out in the infected test program the following way:
142F:0D80 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0D90 41 41 41 41 41 2E FD 16-2E F9 FB 36 E9 77 FD 41 AAAAA......6.w.A 142F:0DA0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0DB0 41 41 41 41 41 41 41 41-41 41 41 41 41 41 41 41 AAAAAAAAAAAAAAAA 142F:0DC0 41 41 41 41 41 41 41 41-41 3E 2E BB 88 14 2E F9 AAAAAAAAA>...... 142F:0DD0 EB 9B 41 41 41 41 41 41-41 41 41 41 41 41 41 41 ..AAAAAAAAAAAAAA
Note the 0xE9 (JMP) and 0xEB (JMP short) patterns in the previous dump in two pieces of One_Half's decryptor. These are the pointers to the next decryptor block. In the past, several antivirus products would put together the pieces of the decryptor by following these offsets to decrypt the virus quickly and identify it properly.
4.2.11. Embedded Decryptor and Virus Body Technique
A more sophisticated infection technique was used by the Bulgarian virus, Commander_Bomber, written by Dark Avenger as one of his last known viruses in late 1993. The virus was named after the string that can be found in the virus body: COMMANDER BOMBER WAS HERE.
The Commander_Bomber virus body is split into several parts, which are placed at random positions on the host program, overwriting original content of the host. The head of the virus code starts in the front of the file and gives control to the next piece of the virus code, and so on. These pieces overwrite the host program in a way similar to the One_Half virus. The overwritten parts are stored at the end of the file, and a table is used to describe their locations.
Figure 4.14 shows the sophistication of the virus code's location within the host program. Scanners must follow the spiral path of the control flow from block to block until they find the main virus body.
Figure 4.14. Commander_Bomber-style infection.
The control blocks are polymorphic, generated by the DAME (Dark Avenger Mutation Engine) of the virus. This makes the blocks especially difficult to read because they contain a lot of garbage code with obfuscated ways to give control to the next block, until the nonencrypted virus body is reached. Eventually, the control arrives at the main virus body, which can be practically anywhere in the file, not at its very end. This is a major advantage for such viruses, because scanners need to locate where the main body of the virus is stored. Back in 1993, this technique was extremely sophisticated, and only a few scanners were able to detect such viruses effectively. The host program is reconstructed by the virus in runtime.
4.2.12. Obfuscated Tricky Jump Technique
W32/Donut, the first virus to infect .NET executables, was not dependent on JIT compilation as discussed in Chapter 3. This is because first editions of the .NET executable format can be attacked at its entry-point code, which is still architecture-dependent. (In later versions of Windows, this platform-dependent code will be eliminated by moving the functionality to the system loader itself.)
Donut gets control immediately upon executing an infected .NET PE file. The virus uses the simplest possible infection technique to infect .NET images. In fact, Donut turns .NET executables to regular-looking PE files. This is because the virus nullifies the data directory entry of the CLR header when it infects a .NET application.
The six-byte-long jump to the _CorExeMain() import at the entry point of .NET files is replaced by Donut with a jump to the virus entry point. The _CorExeMain() function is used to fire up the CLR execution of the MSIL code. The entry point in the header is not changed by the virus. This technique is called an obfuscated tricky jump. Evidently, this method can fool some heuristic scanners.
The actual jump at the entry point will be replaced with a 0xE9 (JMP) opcode, followed by an offset to the start of the virus body in the first physical byte of the relocation section, as shown in Figure 4.15.
Figure 4.15. An applied obfuscated tricky jump technique.
The obfuscated tricky jump is a common technique to avoid changing the original entry point of the file. One of the first viruses that used this trick was DOS COM infector, Leapfrog, which followed the jump instruction at the front of the host and inserted its own jump to the actual entry point instead, as Figure 4.15 demonstrates.
The first documented Win32 virus, W32/Cabanas11, used this technique as an antiheuristic feature to infect regular PE files on Windows 95 and Windows NT.
When activated, W32/Donut displays the following message box shown in Figure 4.16.
Figure 4.16. The message box of the W32/Donut virus.
The virus writer wanted to call this creation ".dotNet," but because this is a platform name, it cannot be the name of the virus. For obvious reasons, viruses are not called "DOS," "Windows," and so on. So I decided to name the virus something that sounds similar to "dotNET," calling it Donut instead.
4.2.13. Entry-Point Obscuring (EPO) Viruses
Entry-point obscuring viruses do not change the entry point of the application to infect it; neither do they change the code at the entry point. Instead, they change the program code somewhere in such a way that the virus gets control randomly.
220.127.116.11 Basic EPO Techniques on DOS
Several viruses use the EPO strategy on DOS to avoid easy detection with fast scanners that scan the file near its entry-point code. For example, in early 1997 the Olivia virus12 infected DOS EXE and COM files using this method. This technique became increasingly popular among virus writers to defeat heuristics analyzer programs after 1995.
Olivia infects COM and EXE files as they are run or renamed or as their attributes are changed. First, the virus clears the attributes of the file, and then it opens the file to analyze its structure.
Figure 4.17 demonstrates the simplified look of an EPO virus-infected program.
Figure 4.17. A typical encrypted DOS EPO virus.
If the victim has a COM extension, Olivia uses a special function that reads four bytes in a loop from the beginning of the victim and checks for E9h (JMP), EBh (JMP short), 90h (NOP), F8h (CLC), F9h (STC), FAh (CLI), FBh (STI), FCh (CLD), and FDh (STD) each time. If one of the previous instructions is found, the virus seeks the place of the next such instruction. If that position is not in the last 64 bytes of the host, the virus modifies the host program at the location where the previous instruction sequence was detected.
Olivia uses the 0x68 (Intel 286 PUSH) opcode to push a word value to the stack. This is followed by a 0xC3 (RET) instruction, which gives control to the virus code by popping the pushed offset to the decryptor of the virus.
(0x68) PUSH offset DECRYPTOR (0xC3) RET
In Figure 4.17, a jump instruction is shown to transfer control to a decryptor located at the end of the file, followed by the encrypted virus body. Other viruses often use a CALL instruction or similar trampoline to transfer control to the start of the virus body.
Figure 4.18 shows the happy birthday message displayed by Olivia upon activation.
Figure 4.18. The payload of the Olivia virus.
18.104.22.168 Advanced EPO Techniques on DOS
The Nexiv_Der virus13 (shown in Figure 4.19) is polymorphic in COM files, and it also infects the disk's boot sector (DBS). The most interesting technique of this virus, however, is the special EPO technique that it uses to infect files. Nexiv_Der was named after a backward string contained in its encrypted body: "Nexiv_Der takes on your files."
Figure 4.19. A polymorphic EPO virus.
This virus traces the execution of a program as an application debugger does. Then it patches the code at a randomly selected location to a CALL instruction. This CALL instruction points to the polymorphic decryptor of the virus.
The execution path through a program depends on many parameters, including command-line arguments passed to the program and DOS version number. Depending on the same parameters, an infected victim program will most likely run the virus code upon normal execution each time. However, the virus might not run at all on a different version of DOS because the virus code cannot take control. This generates a major problem for even sophisticated heuristic scanners that use a virtual machine to simulate the execution of programs because it is difficult to emulate all of the system calls and the execution path of the victim.
The major idea of the Nexiv_Der virus is based on its hook of the INT 1 handler (TRACE) under DOS. This handler is the real infection routine. It starts to trace the host program for at least 256 instructions and stops at the maximum of 2,048 iterations. If the last instruction of the trace happens to be an E8h, E9h, or 80C0..CF opcode (CALL, JMP, ADD AL,byte .. OR BH,byte), then Nexiv_Der replaces it with a CALL instruction, which starts the virus at the end of the file. Figure 4.19 shows a high-level look at a Nexiv_Der-infected executable.
The main advantage of this technique is the increased likelihood of virus code execution in a similar host-system environment. This technique, however, is too complicated and thus encountered very rarely.
22.214.171.124 EPO Viruses on 16-Bit Windows
One of the first EPO viruses in the wild was the Tentacle_II family14 on Windows 3.x systems. This virus does not change the original entry point of the NE header, which is the obvious choice of typical 16-bit Windows viruses. How can it take control, then? The virus takes advantage of the NE file structure. Although the NE on disk structure is more complicated to parse, it provides many more possibilities for an attacker to inject code reliably into the execution flow. Tentacle_II takes advantage of the module reference table of NE files to find common function calls that are expected to be executed among the first function calls made by the host program.
Tentacle_II checks for the KERNEL and VBRUN300 module names in the module reference table. The virus picks the module number of the found module name and reads the segment relocation records of every segment. It looks for the relocation record 91 (INITTASK) in the case of KERNEL or 100 (THUNKMAIN) in the case that VBRUN300 has been found previously. Both of these relocation records point to standard initialization code that must be called at the beginning of a Windows application. For example, the original KEYVIEW.EXE (a standard Windows application) has a relocation entry for KERNEL.91 for its first segment as follows:
type offset target PTR 0053h USER.1 OFFS 007Eh KERNEL.178 PTR 0073h FAXOPT.12 PTR 00D3h USER.5 PTR 005Bh FAXOPT.44 PTR 00CAh KERNEL.30 PTR 0031h USER.176 PTR 00A0hKERNEL.91 (INITTASK) PTR 008Eh KERNEL.102 Relocations: 9
Segment relocation records:
a.) Segment 0001h relocations type offset target PTR 0053h USER.1 OFFS 007Eh KERNEL.178 PTR 0073h FAXOPT.12 PTR 00D3h USER.5 PTR 005Bh FAXOPT.44 PTR 00CAh KERNEL.30 PTR 0031h USER.176 PTR 00A0h 0003h:002Eh (VIRUS_SEGMENT:2Eh) PTR 008Eh KERNEL.102 Relocations: 9 b.) Segment 0003h relocations (VIRUS_SEGMENT) type offset target PTR 2964h SHELL.6 (REGQUERYVALUE) PTR 2968h SHELL.5 (REGSETVALUE) PTR 296Ch KERNEL.91 (INITTASK -> STARTHOST) Relocations: 3
Thus the infected file starts as it would before the infection, but when the application calls one of the preceding initialization functions, control is passed to the address where the virus starts.
The VIRUS_SEGMENT has three relocation records. One of these will point to the original initialization procedure KERNEL.91 or VBRUN300. In this way, the virus is able to start the host program after itself. This infection technique is an NE entry pointobscuring infection technique, which makes Tentacle_II an antiheuristic Windows virus.
The preceding analysis was made with the help of Borland's TDUMP (Turbo Dump) utility. In the analysis techniques and tools sections, I will give a longer introduction to such tools and their role in virus analysis.
The payload of Tentacle_II is shown in Figure 4.20. The virus creates a TENTACLE.GIF file on the disk, which will be displayed each time a GIF image is viewed on the infected system.
Figure 4.20. The payload of the Tentacle_II virus.
126.96.36.199 API-Hooking Technique on Win32
On Win32 systems, EPO techniques became highly advanced. The PE file format15 can be attacked in different ways. One of the most common EPO techniques is based on the hooks of an instruction pattern in the program's code section. A typical Win32 application makes a lot of calls to APIs (application program interfaces). Many Win32 EPO viruses take advantage of API CALL points and change these pointers to their own start code.
For example, the W32/CTX and W32/Dengue viruses of GriYo locate a CALL instruction in the host program's code section that points to the import directory. In this way, the virus can reliably identify byte patterns that belong to a function call. After that, the CALL instruction is modified in such a way that it will point to the start of the virus code located elsewhere, typically appended to the end of the file. Such viruses typically search for one or both API call implementations:
This kind of virus also makes its selection for an API hook location totally at random; in some cases, the virus might not even get control each time a host program is executed. Some families of computer viruses make sure that the virus will execute from the file most of the time.
Viruses can hook an API that is called whenever the application exits back to the system. In this case, most programs call the ExitProcess() API. By replacing the call to ExitProcess() with the call to the virus body, a virus can trigger its infection routine more reliably whenever the application exits. To make antivirus detection more difficult, viruses often combine EPO techniques with code obfuscation techniques, such as encryption or polymorphism.
Figure 4.21 illustrates a Win32 EPO virus that replaces a CALL to ExitProcess() API with a CALL to the virus code. After the virus takes control, it will eventually run the original code (C) by fixing the code in memory and giving control to the fixed block.
Figure 4.21. An EPO virus that hooks API calls of the host.
Normally disk activity increases whenever the application exits. This happens for several different reasons. For example, if an application has used a lot of virtual memory, the operating system will need to do a lot of paging, which increases disk activity. Thus it is likely that viruses like this remain unnoticed for a long time.
188.8.131.52 Function Call Hooking on Win32
Another common technique of EPO viruses is to locate a function call reliably in the application's code section to a subroutine of the program. Because the pattern of a CALL instruction could be part of another instruction's data, the virus would not be able to identify the instruction boundaries properly by looking for CALL instruction alone.
To solve this problem, viruses often check to see whether the CALL instruction points to a pattern that appears to be the start of a typical subroutine call, similar to the following:
PUSH EBP ; opcode 0x55 MOV EBP, ESP ; opcode 0x89E5
Figure 4.22 illustrates the replacement of a function call to Foobar() with a call to the start of virus code. The Foobar() function starts with the 0x55 0x89 0xe5 sequence; it is easily identified as a function entry point. A similar opcode sequence is 0x55 0x8B 0xEC, which also translates to the same assembly. This virus technique is used by variants of W32/RainSong (created by the virus writer, Bumblebee).
Figure 4.22. Function callhooking EPO virus.
184.108.40.206 Import Table Replacing on Win32
Newer Win32 viruses infect Win32 executables in such a way that they do not need to modify the original code of the program to take control. Instead, such EPO viruses work somewhat similarly to the 16-bit Windows virus Tentacle_II.
To get control, the virus simply changes the import address table entries of the PE host in such a way that each API call of the application via the import address directory will run the virus code instead. In turn, the activated virus code presents a new import table in the memory image of the program. As a result, consequential API CALLs run proper, original entry-point code via the fixed import table.
This technique is used by the W32/Idele family of computer viruses, written by the virus writer, Doxtor L, as shown in Figure 4.23. W32/Idele changes the program section slack area of the code section with a small routine that allocates memory and decrypts the virus code into the allocated block and then executes it. Thus Idele avoids creating import entries with addresses that do not point to the code section.
Figure 4.23. Import table-replacing EPO virus.
220.127.116.11 Instruction Tracing Technique on Win32
The Nexiv_Der virus inspired modern virus writing on 32-bit Windows systems. In 2003, new viruses started to appear that use EPO, based on the technique that was pioneered on DOS. For example, the W32/Perenast16 family of viruses is capable of tracing host programs before infection by running the host as a hidden debug process using standard Windows debug APIs.
18.104.22.168 Use of "Unknown" Entry Points
Another technique to execute virus code in a semi-EPO manner involves code execution via non-well-known entry points of applications. The Win32 PE file format is commonly known to execute applications from the MAIN entry point stored in the PE.OptionalHeader.AddressOfEntryPoint field of the executable's header structure. Thus it is common knowledge that such programs always start wherever this field points.
It might come as a surprise that this is not necessarily the first entry point in a PE file that the system loader executes. On Windows NT systems and above, the system loader looks for the thread local storage (TLS) data directory in the PE files header first. If it finds TLS entry points, it executes these first. Only afterward will it run the MAIN entry-point code.
The following two message boxes are printed by a TLSDEMO program of Peter Ferrie. The demo was created when he discovered the TLS entry-point trick at Symantec during heuristic analysis research in 2000.
When the application is executed, it prints a message box from both the TLS and the MAIN entry points of the applications.
First, it prints the message box from the TLS, as shown in Figure 4.24.
Figure 4.24. The TLS entry point is executed first.
When you click on the OK button, you arrive at the real main entry point, as shown in Figure 4.25.
Figure 4.25. The main entry point is executed next.
Initially we did not talk about this trick because it could be used to develop even trickier viruses. However, the virus writer, roy g biv, discovered this undocumented trick and has already used it successfully in some of his W32/Chiton17 viruses in 2003.
22.214.171.124 Code IntegrationBased EPO Viruses
A very sophisticated virus infection technique is called code integration. A virus using this technique inserts its own code into the execution flow of the host program using standard EPO techniques and merges its code with the host program's code. This is a complicated process that requires complete disassembling and reassembling of the host. Fortunately, it is extremely complicated to develop such viruses. Disassembling the host program is a fairly CPU-intensive operation that requires a lot of memory. Such viruses need to update the host program's content with proper relocations for code and data sections of the host. The W95/Zmist virus, by the Russian virus writer, Zombie, uses this approach. Because of its high sophistication, this technique is detailed in Chapter 7.
Figure 4.26 shows a typical layout of a file infected with a sophisticated code-integration EPO virus.
Figure 4.26. A poly and metamorphic code-integration virus.
Code integration is a major challenge for scanners and computer virus analysts. The entire file must be examined to find the virus. The virus is camouflaged in the code section of the infected host program, and it is very difficult to locate the instruction that transfers control to the start of the virus. In the case of W95/Zmist, the decryptor of the virus code is not in one piece but is split in a manner similar to the One_Half or the Commander_Bomber virus.
4.2.14. Possible Future Infection Techniques: Code Builders
After reading the previous sections, you might wonder what could get more complicated and sophisticated than the code-integration EPO technique. This section provides you with an example that has not yet been seen in the most complex implementations of known computer viruses with the kind of sophistication that is unknown in computer viruses. The closest example is the W95/Zmist virus. The Zmist virus makes use of the host program's content in a manner that is similar to the Code Builder technique. Zmist calls into the host program's code to execute an RET (Return) instruction from it. Thus, the virus code flows into the host program's code and back. The author of the virus probably intended to extend this approach to build the entire virus body on the fly, using the content of the host program. Consider the code-builder virus shown in Figure 4.27.
Figure 4.27. A code-builder virus.
The idea is based on the fact that any program might contain another set of programs in it as instructions or instruction sequences. A virus might be able to analyze the host program's code in such a sophisticated way that these strings of instruction could be used as the virus itself. It might be difficult to find code that would transfer control properly with accurate register state. However, to demonstrate the idea, imagine a simple code-builder virus that would find the letters V, I, R, U, and S in the host program's code. The builder of the virus would copy these pieces together into memory. The builder itself would look like a generic sequence of code, which could be easy to vary based on metamorphic techniques. The builder would be integrated into the code of the host program itself.
Fortunately, this is a rather complicated virus, but certainly it would be very challenging to detect it in files. (A few members of the W95/Henky family use an approach similar to this, except that the viruses are not EPO, which simplifies their detection.)