
11.8. Regular and Generic Disinfection Methods

Traditionally, antivirus scanners have only been able to disinfect viruses that have been analyzed beforehand by product developers [32]. Producers of antivirus products were largely able to keep up with new viruses, adding detection and disinfection routines, until about 1996. Users of antivirus programs have logically come to expect that a detected virus can always be repaired, restoring the host programs to their clean state. Although full backups provide easy restoration of all infected programs, they are available only when a backup strategy is in place or when backup is an integrated part of a disaster recovery system.

The situation quickly changed after 15,000 additional viruses were generated overnight using the PS-MPC kit. Even the producers of exact identification scanners and disinfectors had to admit that generic methods were necessary to clean viruses.

As the number of viruses continues to grow, more and more viruses are detected but not disinfected, because the developers do not consider every virus important enough to warrant a specific disinfection routine. Unfortunately, some users will eventually get infected by such viruses.

It is possible (but difficult) to disinfect unknown viruses. There are several approaches to this problem. One method is to trace the execution of a possibly infected program with debugger interfaces until the virus has restored the host to its original state [33]. This method works but cannot be considered truly reliable. An alternative is to emulate the program and collect information on its execution, using this information with generic rules to perform rule-based disinfection. Although this is difficult to implement, it produces surprisingly good results for DOS viruses and also can be applied to other classes of viruses, such as Win32 viruses.

How many viruses can be removed in this way? Testing a generic disinfector is a very difficult task. Counting how many particular viruses it can handle does not make sense because it is a generic antivirus product. It is more important to test how many different types of viruses it can handle by using such methods. A figure of 60% is quite possible, at least for DOS viruses. Most antivirus programs (such as my old program, Pasteur) do not even come close to this percentage of disinfection because there have never been sufficient resources to write disinfection by hand for each virus variant.

Generic methods explained in this chapter can be used as a disinfection solution without using heuristics to detect the virus in the first place. In such a case, the virus is detected and identified by normal methods, but it is repaired generically. This method was used effectively against virus generation kits by several antivirus products, including the Solomon engine [34, 35] used by NAI. Generic disinfection can reduce the size of the antivirus database greatly because less virus variant-specific data needs to be stored.

11.8.1. Standard Disinfection

Before we can talk about generic disinfection, we should understand how a virus is repaired by the antivirus program. Virus infection techniques are the subject of Chapter 4, where it was demonstrated that in most cases, a virus adds itself to the end of the host file. If this is the case, the virus modifies the beginning of the program to transfer control to itself. Unless the virus is very primitive, it saves the beginning of the file within the virus code because it will be necessary to execute the victim file correctly after infection (see Listing 11.15).

Listing 11.15. A Simple DOS COM Infector Virus
a. Victim program

b. Infected program

Every virus adds new functionalities to the victim. The infected victim will execute the virus code, which will infect other files or system areas or go resident in memory. The virus code then "repairs" the beginning of the victim in memory and starts it. This sounds very simple. Unfortunately, it is only simple from the point of view of the virus, which modifies a few bytes in the victim file and saves a piece of the file's original code in the virus body (in this example: CCC).

In the early years, there were no problems with conventional disinfection. We had enough time to analyze viruses because there were only a few. We could spend weeks with every new sample until we had all the information necessary to clean it successfully.

Basically, the cleaning process is as easy as the infection. The following is all we need to know:

  • How to find the virus (in most cases, with a search string selected from the virus)

  • Where the original beginning of the host file (CCC) can be found in the virus

  • The size of the virus body in bytes

If we have all this information, we can easily remove the virus: "Let's read the original beginning from the virus code and put it back in its original place and then truncate the file at its original end, calculating where this is from the virus size." That's it! This method might have been interesting for the first ten viruses, but everyone who has spent years with viruses finds it just too tedious.
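The three pieces of information listed above are enough to write the repair as a few lines of code. The following is only a minimal sketch for the simplest case, an unencrypted virus appended to a COM-style file; the function name, its parameters, and the toy sample are illustrative assumptions, not taken from any real product.

```python
def disinfect_appender(infected: bytes, virus_size: int,
                       saved_offset: int, saved_len: int) -> bytes:
    """Remove an appended virus from a COM-style file image.

    virus_size   -- size of the virus body in bytes
    saved_offset -- offset, inside the virus body, of the saved host bytes (CCC)
    saved_len    -- number of bytes at the start of the host that were replaced
    """
    original_end = len(infected) - virus_size
    if original_end < saved_len:
        raise ValueError("virus size is inconsistent with the file size")
    if saved_offset + saved_len > virus_size:
        raise ValueError("saved bytes lie outside the virus body")
    virus_body = infected[original_end:]
    saved = virus_body[saved_offset:saved_offset + saved_len]
    # Put the original beginning back, keep the rest of the host,
    # and truncate the file at its original end.
    return saved + infected[saved_len:original_end]


# Toy sample (hypothetical): the host starts with "CCC"; the made-up 8-byte
# "virus" keeps those bytes at offset 2 of its body, and "JJJ" stands in for
# the jump that the virus wrote over the beginning of the host.
host = b"CCC" + b"rest-of-host"
virus = b"VV" + b"CCC" + b"VVV"
infected = b"JJJ" + host[3:] + virus

assert disinfect_appender(infected, len(virus), 2, 3) == host
```

The sanity checks matter in practice: if the assumed virus size does not fit the file, it is safer to refuse to repair than to truncate the host in the wrong place.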

So we developed so-called goat systems to replicate virus samples automatically. These systems save time. We can calculate the place of the original bytes in the virus body by comparing many infected samples to uninfected ones, using a special utility. This system works as long as the virus is not encrypted, self-mutating, or polymorphic. Of course, it must not have an antigoat mechanism or a new infection technique that our disinfector does not know how to handle. If one of these problems occurs, we must analyze the virus manually. If we are lucky, this is enough. If not, we must change our antivirus strategy by adding new functions to it or by modifying already existing ones. This can take a lot of time and is therefore not efficient enough.
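The comparison of infected and uninfected goat files can be sketched as follows. This is a simplified illustration that assumes an unencrypted appending virus of constant size; the helper names and the toy infector are hypothetical and exist only to exercise the idea.

```python
def candidate_saved_offsets(clean: bytes, infected: bytes, saved_len: int) -> set:
    """Offsets inside the appended virus body where the host's first bytes appear."""
    virus_body = infected[len(clean):]
    head = clean[:saved_len]
    return {i for i in range(len(virus_body) - saved_len + 1)
            if virus_body[i:i + saved_len] == head}


def locate_saved_bytes(pairs, saved_len: int = 3) -> list:
    """Intersect the candidate offsets over many (clean, infected) goat pairs."""
    common = None
    for clean, infected in pairs:
        found = candidate_saved_offsets(clean, infected, saved_len)
        common = found if common is None else common & found
    return sorted(common or ())


def toy_infect(clean: bytes) -> bytes:
    """Hypothetical appender: saves the first 3 host bytes at offset 2 of its body."""
    body = b"\x90\x90" + clean[:3] + b"\x90\x90\x90"
    return b"\xE9\x00\x00" + clean[3:] + body


goats = [b"\x90" * 10, b"\xCD\x20\x00" + b"\x00" * 7]
pairs = [(goat, toy_infect(goat)) for goat in goats]

assert locate_saved_bytes(pairs) == [2]
```

A single goat made only of NOPs matches at many offsets here, which is why goats with different beginnings are compared: the intersection narrows the candidates down to the one offset that holds in every sample.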

11.8.2. Generic Decryptors

Most of the better antivirus products have a generic decryptor to combat polymorphic viruses, so it appears we can solve the biggest problem that way. We can decrypt the virus so we can use the old search-string technique once again, which is great. Basically, the generic decryptor method is a part of the generic disinfection technique.
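A generic decryptor has to decide when the decryption loop has finished. One possible stopping rule, used here purely as an assumption for illustration, is to watch for the first instruction that executes from memory the emulated code itself has written. The sketch below works on an abstract trace of emulated steps rather than on a real CPU emulator; the trace format is hypothetical.

```python
def decryption_finished_at(trace):
    """Return the index of the first step that executes previously written memory.

    trace is an iterable of (ip, written_addresses) tuples, one per emulated
    instruction: ip is the address of the instruction, written_addresses the
    memory addresses that the instruction stored to.
    Returns None if no such step is seen.
    """
    dirty = set()
    for step, (ip, written) in enumerate(trace):
        if ip in dirty:
            return step          # control reached code that was just decrypted
        dirty.update(written)
    return None


# Toy trace: a two-pass decryptor loop at 0x100..0x104 rewrites 0x200 and
# 0x201, then control is transferred to 0x200.
trace = [
    (0x100, [0x200]),
    (0x102, []),
    (0x100, [0x201]),
    (0x102, []),
    (0x104, []),
    (0x200, []),
]

assert decryption_finished_at(trace) == 5
```

Once this point is reached, the emulated memory holds the decrypted virus body, and an ordinary search string can be applied to it.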

11.8.3. How Does a Generic Disinfector Work?

The idea of doing generic disinfection without any information stored about the original file was first developed by Frans Veldman in his TBCLEAN program.

The generic disinfection method is simple but great: The disinfector loads the infected file and starts to emulate it until the virus restores the infected file to its "original" form and is ready to execute it. So the generic disinfector uses the virus to perform the most important part of the cleaning process. As Veldman said, "Let the virus do the dirty work!" The virus has the beginning of the original file. All we need to do is copy the cleaned program back into the file.
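The idea can be sketched without a full CPU emulator by looking only at what the emulator observes: memory writes and control transfers. The event format, the function name, and the filler virus body below are assumptions made for illustration; the byte values and sizes of the toy sample follow the VCL.379 figures shown later in Listing 11.17. The size of the clean host is passed in because finding it is a separate problem.

```python
COM_BASE = 0x100  # DOS loads a COM image at offset 100h of its segment


def let_the_virus_clean(infected: bytes, events, clean_size: int) -> bytes:
    """Replay emulator events until the virus jumps back to the repaired host.

    events is an iterable of (kind, address, data) tuples, where kind is
    "write" (the emulated code stored data at address) or "jump" (control
    was transferred to address; data is unused).
    """
    memory = bytearray(COM_BASE) + bytearray(infected)
    host_restored = False
    for kind, addr, data in events:
        if kind == "write":
            memory[addr:addr + len(data)] = data
            if addr <= COM_BASE < addr + len(data):
                host_restored = True      # the entry point was rewritten
        elif kind == "jump" and addr == COM_BASE and host_restored:
            # The virus is about to run the host: copy the cleaned image out.
            return bytes(memory[COM_BASE:COM_BASE + clean_size])
    raise RuntimeError("the virus never handed control back to the host")


# Toy sample: 1,000-byte NOP goat, 379 bytes of filler standing in for the virus.
infected = b"\xE9\xE5\x03" + b"\x90" * 997 + b"\xCC" * 379
events = [("write", 0x100, b"\x90\x90\x90"), ("jump", 0x100, b"")]

assert let_the_virus_clean(infected, events, 1000) == b"\x90" * 1000
```

A real implementation would generate these events from an emulator and would need limits on the number of emulated instructions, since nothing guarantees that the virus ever reaches its repair code.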

However, there are still a few questions that have not been answered. These are addressed in the following sections.

11.8.4. How Can the Disinfector Be Sure That the File Is Infected?

We can use all the techniques that we used for heuristic scanners. The generic disinfector is a combination of a heuristic scanner and a heuristic disinfector. Thus the disinfector will not remove the "unknown from the unknown" [33] but will remove the virus from the unknown. Standard detection methods, however, also can be applied to detect the virus first. Then the emulator can be used to let the virus do the cleaning for us.

11.8.5. Where Is the Original End of the Host File?

This question is also very important. We cannot always simply remove the part of the program that gained control; otherwise we cannot handle viruses like One_Half (see Chapter 4), which insert the decryptor into the host program.

In most cases, we can truncate the file at the position to which the first jump (JMP) points or where the entry point is, but not with viruses like One_Half. If we truncate the file at that position, we will remove too much, and the "disinfected" program will not work anymore.

Another problem appears when removing too few bytes from the infected program, leaving some remnant virus code behind. In this case, other virus scanners might find search strings from the file after disinfection, causing ghost positives.

We should collect information about the virus during emulation. That way, we can get a very good result.

11.8.6. How Many Virus Types Can We Handle This Way?

The number of methods that viruses can use to infect programs or system areas is virtually unlimited. Although we cannot handle all viruses by using only generic disinfection techniques, we can handle most of the simple ones.

Boot Sector Viruses

Unfortunately, it is relatively easy to write a boot sector virus. Nowadays, file viruses outnumber boot sector viruses by a large margin, and boot sector viruses are less and less common. Thus it is not a very big problem to handle boot sector viruses using conventional methods. We also can use generic methods to detect and disinfect boot sector viruses. Emulation of the boot program is simple, and most boot viruses store the original boot sector somewhere on the disk and will load it at one point in their execution. This moment can be captured, and the virus can be disinfected generically.

File Viruses

Many more possible ways to infect files exist because there are so many different file structures. The biggest problem is the overwriting method, in which the virus overwrites the beginning of the file with its body, without saving the original code. Such viruses are impossible to disinfect without information about the file structure before infection. Although it is not possible to disinfect such viruses, these are easily detected using heuristics. Less than 5% of viruses are overwriting and cannot be disinfected.

There are other problematic cases, such as EPO Windows application infectors, device driver infectors, cluster infectors, batch file infectors, object file infectors, and parasitic macro infectors. Together, these account for about 10% of all known viruses today.

Several other viruses cause problems for heuristic techniques [36]. Such viruses use different infection techniques, with dirty tricks specifically designed to make detection and disinfection with generic methods difficult. These viruses make up about 15% of all viruses.

When we combine overwriting viruses and other special cases, the result is that about 30% of all viruses cannot be handled easily, or at all, with generic methods. If the part of the virus code where the virus repairs the infected program cannot gain control during emulation, then the disinfector cannot get the necessary information. We should control the execution of the virus code very intelligently. For example, when the virus executes its "Are you there?" call, the emulator should give the answer the virus wants. In this way, the virus thinks that its code is already resident in memory and repairs the host file! However, even this technique is difficult to implement in all cases.
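One way an emulator might guess "the answer the virus wants" is to peek at the instruction that follows the interrupt call. The sketch below handles only the simplest pattern, a `CMP AX, imm16` (opcode 0x3D) placed directly after the call, and assumes that equality means "already resident". Both the heuristic and the sample bytes are illustrative assumptions, not a description of any particular product or virus.

```python
def are_you_there_reply(code_after_int: bytes):
    """Guess the AX value that a residency check expects.

    code_after_int holds the bytes that follow the INT instruction.
    Returns the 16-bit immediate of a directly following CMP AX, imm16,
    or None if the code does not match that simple pattern.
    """
    if len(code_after_int) >= 3 and code_after_int[0] == 0x3D:  # CMP AX, imm16
        return int.from_bytes(code_after_int[1:3], "little")
    return None


# Hypothetical residency check:
#   INT 21h
#   CMP AX, 01FEh      ; 3D FE 01
#   JE  already_there  ; 74 xx
after_int = bytes([0x3D, 0xFE, 0x01, 0x74, 0x10])

assert are_you_there_reply(after_int) == 0x01FE
```

If the function returns a value, the emulated interrupt handler can place it in AX before resuming; if it returns None, the emulator has to fall back on other strategies. Checks that compare a different register, or that invert the branch, defeat this simple rule, which is one reason the technique is hard to implement in all cases.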

11.8.7. Examples of Heuristics for Generic Repair

AHD (Advanced Heuristic Disinfector) was a research project, but such heuristics are built into most current antivirus software. AHD used the generic disinfection method combined with a heuristic scanner. These are the heuristic flags of the program:

  • Encryption: A code decryptor function is found.

  • Open existing file (r/w): The program opens another executable file for write. This flag is very common in viruses and in some normal programs (like make.exe).

  • Suspicious file access: Might be able to infect a file. AHD can display additional information about the virus type, such as recursive infection structure (direct action).

  • Time/date trigger routine: This virus might have an activation routine.

  • Memory-resident code: This program is a TSR.

  • Interrupt hook: When the program hooks a critical interrupt, like INT 21h, we can display all the hooked interrupts (INT XXh .. INT YYh).

  • Undocumented interrupt calls: AHD knows a lot of "undocumented" interrupts, so this flag will be displayed when the interrupt looks tricky, like the VSAFE/VWATCH uninstall interrupt sequence, which is very common in DOS viruses to disable the resident components of MSAV (Microsoft AntiVirus on DOS).

  • Relocation in memory: The program relocates itself in a tricky way.

  • Looking for memory size: The program tries to modify BIOS TOP memory by overwriting the BIOS data area at location 0:413h.

  • Self-relocating code

  • Code to search for files: The program tries to find other executable programs (*.COM and *.EXE; also *.côm, and so on, which means the same for DOS via canonical functions, and at least the Hungarian Qpa virus uses it as an antiheuristic).

  • Strange memory allocation

  • Replication: This program overwrites the beginning of other programs.

  • Antidebugging code

  • Direct disk access (boot infection or damage)

  • Use of undocumented DOS features

  • EXE/COM determination: The program tries to check whether a file is an EXE file.

  • Program load trap

  • CMOS access: The program tries to modify the CMOS data area.

  • Vector code: The virus tries to use the generic disinfector as a vector to execute itself on the system by exploiting the code tracing-based analyzer.
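Flags such as these are typically combined into a verdict. The text does not say how AHD weighs them, so the following is only a sketch with hypothetical weights and a hypothetical threshold; it covers a subset of the flags above and shows why a single common flag, such as opening a file for writing, should not be enough on its own.

```python
from enum import Flag, auto


class Heur(Flag):
    NONE = 0
    ENCRYPTION = auto()
    OPEN_EXISTING_FILE_RW = auto()
    SUSPICIOUS_FILE_ACCESS = auto()
    TIME_DATE_TRIGGER = auto()
    MEMORY_RESIDENT = auto()
    INTERRUPT_HOOK = auto()
    SELF_RELOCATING = auto()
    FILE_SEARCH = auto()
    REPLICATION = auto()


# Hypothetical weights and threshold, chosen only for this example.
WEIGHTS = {
    Heur.ENCRYPTION: 1,
    Heur.OPEN_EXISTING_FILE_RW: 1,
    Heur.SUSPICIOUS_FILE_ACCESS: 2,
    Heur.TIME_DATE_TRIGGER: 1,
    Heur.MEMORY_RESIDENT: 1,
    Heur.INTERRUPT_HOOK: 1,
    Heur.SELF_RELOCATING: 2,
    Heur.FILE_SEARCH: 1,
    Heur.REPLICATION: 3,
}
THRESHOLD = 5


def score(flags: Heur) -> int:
    """Sum the weights of all flags that are set."""
    return sum(weight for flag, weight in WEIGHTS.items() if flag in flags)


def probably_infected(flags: Heur) -> bool:
    return score(flags) >= THRESHOLD


# A file-copying tool that only opens another executable for writing.
tool_like = Heur.OPEN_EXISTING_FILE_RW
# A program that relocates itself, searches for files, opens them for
# writing, accesses them suspiciously, and has a date trigger.
virus_like = (Heur.SELF_RELOCATING | Heur.FILE_SEARCH |
              Heur.OPEN_EXISTING_FILE_RW | Heur.SUSPICIOUS_FILE_ACCESS |
              Heur.TIME_DATE_TRIGGER)

assert not probably_infected(tool_like)
assert probably_infected(virus_like)
```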

11.8.8. Generic Disinfection Examples

Here are two examples of disinfection using AHD. In the first case, shown in Listing 11.16, the virus is polymorphic. It uses the original Mutation Engine (MtE). The virus is recognized using heuristic analysis, and the clean state of the program is restored.

Listing 11.16. The Zeppelin Virus, Which Uses the MtE Engine but Is Nonetheless Repaired Generically
 - Encrypted code
 - Self-relocating code
 - Code to search for files
 - Open existing file (r/w)
 - Suspicious file access
 - Time/Date trigger routine
  -> Probably infected with an unknown virus

 1. Infect host starts with -> 0xE9 0xFC 0x13 0x53 0x6F
 2. Clean host starts with  -> 0xEB 0x3C 0x90 0x53 0x6F
 3. Original file size: 5119 , Virus size: 4097
          Virus can be removed generically.

During emulation, the near jump (0xE9) at the beginning of the host, which transfers control to the start of the virus body, is replaced by a short jump (0xEB). This is the original host code, which the virus puts back in order to run the host.

Next, let's take a look at the disinfection of a virus generated with VCL (Virus Creation Laboratory), called VCL.379, shown in Listing 11.17.

Listing 11.17. The VCL.379 Virus Repaired Generically
 - Self-relocating code
 - Code to search for files
 - Open existing file (r/w)
 - Suspicious file access
 - Time/Date trigger routine
  -> Probably infected with an unknown virus

 1. Infect host starts with -> 0xE9 0xE5 0x03 0x90 0x90
 2. Clean host starts with  -> 0x90 0x90 0x90 0x90 0x90
 3. Original file size: 1000 , Virus size: 379
          Virus can be removed generically.

During the emulation of VCL.379, the host program is restored perfectly. The host is a typical goat file that contains 1,000 NOP instructions (0x90 bytes).
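The figures in Listings 11.16 and 11.17 can be cross-checked against the truncation rule from Section 11.8.5. In both cases the infected host starts with a 3-byte `JMP` (0xE9) whose 16-bit displacement is stored in little-endian order, and the file offset it targets matches the reported original file size. The following sketch performs that arithmetic for a COM file; the function name is an illustrative assumption.

```python
COM_BASE = 0x100  # a COM image is loaded at offset 100h of its segment


def com_jump_target_offset(first_bytes: bytes) -> int:
    """File offset targeted by a JMP rel16 (0xE9) at the start of a COM file."""
    if len(first_bytes) < 3 or first_bytes[0] != 0xE9:
        raise ValueError("file does not start with a 3-byte JMP (0xE9)")
    rel16 = int.from_bytes(first_bytes[1:3], "little")
    # The displacement is relative to the address of the next instruction
    # (COM_BASE + 3), and the instruction pointer wraps at 64 KB.
    target_ip = (COM_BASE + 3 + rel16) & 0xFFFF
    return target_ip - COM_BASE


# Listing 11.16: infected host starts with E9 FC 13, original size 5119.
assert com_jump_target_offset(bytes([0xE9, 0xFC, 0x13, 0x53, 0x6F])) == 5119
# Listing 11.17: infected host starts with E9 E5 03, original size 1000.
assert com_jump_target_offset(bytes([0xE9, 0xE5, 0x03, 0x90, 0x90])) == 1000
```

For simple appending viruses like these two, truncating at the jump target gives the original file size. As noted earlier, this shortcut fails for viruses such as One_Half, so the value is best treated as one piece of evidence alongside what is collected during emulation.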


More information about goat files and their use is available in Chapter 15, "Malicious Code Analysis Techniques."
