Table of Contents
Previous Section Next Section

7.5. Polymorphic Viruses

The next step in the level of difficulty is a polymorphic attack. Polymorphic viruses can mutate their decryptors to a high number of different instances that can take millions of different forms.

7.5.1. The 1260 Virus

The first known polymorphic virus, 1260, was written in the U.S. by Mark Washburn in 199016. This virus has many interesting techniques that were previously predicted by Fred Cohen. The virus uses two sliding keys to decrypt its body, but more importantly, it inserts junk instructions into its decryptor. These instructions are garbage in the code. They have no function other than altering the appearance of the decryptor.

Virus scanners were challenged by 1260 because simple search strings could no longer be extracted from the code. Although 1260's decryptor is very simple, it can become shorter or longer according to the number of inserted junk instructions and random padding after the decryptor for up to 39 bytes of junk instructions. In addition, each group of instructions (prolog, decryption, and increments) within the decryptor can be permutated in any order. Thus the "skeleton" of the decryptor can change as well. Consider the example of an instance of a decryptor extracted from 1260 (see Listing 7.5).

Listing 7.5. An Example Decryptor of 1260
; Group 1  Prolog Instructions
inc     si        ; optional, variable junk
mov     ax,0E9B   ; set key 1
clc               ; optional, variable junk
mov     di,012A   ; offset of Start
nop               ; optional, variable junk
mov     cx,0571   ; this many bytes - key 2

; Group 2  Decryption Instructions
Decrypt:
xor     [di],cx   ; decrypt first word with key 2
sub     bx,dx     ; optional, variable junk
xor     bx,cx     ; optional, variable junk
sub     bx,ax     ; optional, variable junk
sub     bx,cx     ; optional, variable junk
nop               ; non-optional junk
xor     dx,cx     ; optional, variable junk
xor     [di],ax   ; decrypt first word with key 1
; Group 3  Decryption Instructions
inc     di        ; next byte
nop               ; non-optional junk
clc               ; optional, variable junk
inc     ax        ; slide key 1
; loop
loop    Decrypt   ; until all bytes are decrypted  slide key 2
; random padding up to 39 bytes

Start:
;     Encrypted/decrypted virus body

In each group of instructions, up to five junk instructions are inserted (INC SI, CLC, NOP, and other do-nothing instructions) with no repetitions allowed in the junk. There are two NOP junk instructions that always appear.

1260 does not have register replacement, but more complicated polymorphic attacks use that trick. Nonetheless, 1260 is an effective polymorphic engine that generates a high variety of decryptors.

7.5.2. The Dark Avenger Mutation Engine (MtE)

The next important development in the history of polymorphic viruses was MtE17, a mutation engine written by the Bulgarian Dark Avenger. The first version MtE was released during the summer of 1991, later followed by another version in early 1992. The idea of the mutation engine is based on modular development. For novice virus writers, it was difficult to write a polymorphic virus. However, more advanced virus writers came to their rescue. The MtE engine was released as an object that could be linked to any simple virus.

The concept of MtE is to make a function call to the mutation engine function and pass control parameters in predefined registers. The engine takes care of building a polymorphic shell around the simple virus inside it.

The parameters to the engine include the following:

  • A work segment

  • A pointer to the code to encrypt

  • Length of the virus body

  • Base of the decryptor

  • Entry-point address of the host

  • Target location of encrypted code

  • Size of decryptor (tiny, small, medium, or large)

  • Bit field of registers not to use

In response, the MtE engine returns a polymorphic decryption routine with an encrypted virus body in the supplied buffer. (See Listing 7.6.)

Listing 7.6. An Example Decryptor Generated by MtE
mov     bp,A16C      ; This Block initializes BP
                     ; to "Start"-delta
mov     cl,03        ; (delta is 0x0D2B in this example)
ror     bp,cl
mov     cx,bp
mov     bp,856E
or      bp,740F
mov     si,bp
mov     bp,3B92
add     bp,si
xor     bp,cx
sub     bp,B10C      ; Huh ... finally BP is set, but remains an
                     ; obfuscated pointer to encrypted body

Decrypt:
mov     bx,[bp+0D2B] ; pick next word
                     ; (first time at "Start")
add     bx,9D64      ; decrypt it
xchg    [bp+0D2B],bx ; put decrypted value to place

mov     bx,8F31      ; this block increments BP by 2
sub     bx,bp
mov     bp,8F33
sub     bp,bx        ; and controls the length of decryption

jnz     Decrypt      ; are all bytes decrypted?

Start:
      ; encrypted/decrypted virus body

This example of MtE illustrates how remarkably complicated this particular engine is. Can you guess how to detect it?

It makes sense to look at a large set of samples first. The first time, it took me five days before I managed to write a reliable detector for it. MtE could produce some decryptor cases that appeared only in about 5% or less of all cases. However, the engine had a couple of minor limitations that were enough to detect the virus reliably using an instruction size disassembler and a state machine. In fact, there is only one constant byte in an MtE decryptor, the 0x75 (JNZ), which is followed by a negative offsetand even that is placed at a variable location (at the end of the decryptor, whose length is not constant).

Note

MtE does not have garbage instructions in the decryptor, as 1260 does. Therefore MtE attacks techniques that attempt to optimize decryptors to defeat polymorphism.


MtE's impact on antivirus software was clear. Most AV engines had to go through a painful rearchitecting to introduce a virtual machine for the use of the scanning engine. As Frans Veldman used to say, "We simply let the virus do the dirty work."

MtE was quickly followed by many similar engines, such as TPE (Trident Polymorphic Engine), written by Masouf Khafir in Holland in 1993.

Today, hundreds of polymorphic engines are known. Most of these engines were only used to create a couple of viruses. After a polymorphic decryptor can be detected, using it becomes a disadvantage to virus writers because any new viruses are covered by the same detection. Such detections, however, usually come with the price of several false positives and false negatives. More reliable techniques detect and recognize the virus body itself.

This opens up the possibility for virus writers to use the same polymorphic engine in many different viruses successfully unless such viruses are handled with heuristic or generic detection methods.

7.5.3. 32-Bit Polymorphic Viruses

W95/HPS and W95/Marburg18 were the first viruses to use real 32-bit polymorphic engines. These two viruses were authored by the infamous Spanish virus writer, GriYo, in 1998. He also created several highly polymorphic viruses earlier on DOS, such as the virus Implant19.

Just like Implant's polymorphic engine, HPS's polymorphic engine is powerful and advanced. It supports subroutines using CALL/RET instructions and conditional jumps with nonzero displacement. The code of the polymorphic engine takes about half of the actual virus code, and there are random byte-based blocks inserted between the generated code chains of the decryptor. The full decryptor is built only during the first initialization phase, which makes the virus a slow polymorphic. This means that antivirus vendors cannot test their scanner's detection rate efficiently because the infected PC must be rebooted to force the virus to create a new decryptor.

The decryptor consists of Intel 386based instructions. The virus body is encrypted and decrypted by different methods, including XOR/NOT and INC/DEC/SUB/ADD instructions with 8, 16, or 32 -bit keys, respectively. From a detection point of view, this drastically reduces the range of ideas. I am sad to say that the polymorphic engine was very well written, just like the rest of the virus. It was certainly not created by a beginner.

Consider the following example of a decryptor, simplified for illustration. The polymorphic decryptor of the virus is placed after the variably encrypted virus body. The decryptor is split between small islands of code routines, which can appear in mixed order. In the example shown in Listing 7.7, the decryptor starts at the Decryptor_Start label, and the decryption continues until the code finally jumps to the decrypted virus body.

Listing 7.7. An Illustration of a W95/Marburg Decryptor Instance
Start:
                          ; Encrypted/Decrypted Virus body is placed here

Routine-6:
dec     esi               ; decrement loop counter
ret

Routine-3:
mov     esi,439FE661h     ; set loop counter in ESI
ret

Routine-4:
xor     byte ptr [edi],6F ; decrypt with a constant byte
ret

Routine-5:
add     edi,0001h         ; point to next byte to decrypt
ret

Decryptor_Start:
call    Routine-1         ; set EDI to "Start"
call    Routine-3         ; set loop counter

Decrypt:
call    Routine-4         ; decrypt
call    Routine-5         ; get next
call    Routine-6         ; decrement loop register
cmp     esi,439FD271h     ; is everything decrypted?
jnz     Decrypt           ; not yet, continue to decrypt
jmp     Start             ; jump to decrypted start

Routine-1:
call    Routine-2         ; Call to POP trick!

Routine-2:
pop     edi
sub     edi,143Ah         ; EDI points to "Start"
ret

The preceding decryptor is highly structural, with each differently functioning piece placed in its own routine. The result is millions of possible code patterns filled with random garbage instructions between the islands.

Polymorphic viruses can create an endless number of new decryptors that use different encryption methods to encrypt the constant part (except their data areas) of the virus body.

Some polymorphic viruses, such as W32/Coke, use multiple layers of encryption. Furthermore, variants of Coke also can infect Microsoft Word documents in a polymorphic way. Coke mutates its macro directly with the binary code of the virus instead of using macro code directly. Normally, polymorphic macro viruses are very slow because they need a lot of interpretation. Because Coke generates mutated macros with binary code, it does not suffer from slow-down issues and, as a result, is harder to notice. Consider the following example of Coke taken from two instances of mutated AutoClose() macros shown in Listing 7.8.

Listing 7.8. An Example Snippet of Coke's Polymorphic Macro
'BsbK
Sub AuTOclOSE()
oN ERROr REsuMe NeXT
SHOWviSuAlBASIcEditOr = faLsE
If nmnGG > WYff Then
For XgfqLwDTT = 70 To 5
JhGPTT = 64
KjfLL = 34
If qqSsKWW < vMmm Then
For QpMM = 56 To 7
If qtWQHU = PCYKWvQQ Then
If lXYnNrr > mxTwjWW Then
End If
If FFnfrjj > GHgpE Then
End If

The second example is a little longer because of the junk. I have highlighted some of the essential instructions in these examples. Notice that even these are not presented in the same order whenever possible. For example, the preceding instance turns off the Visual Basic Editor, so you will no longer see it in Word's menus. However, in Listing 7.9, this happens later, after lots of inserted junk and other essential instructions.

Listing 7.9. Another Snippet of Coke's Polymorphic Macro
'fYJm
Sub AUtOcLOse()
oN ERRor REsUME NexT
optIOns.saVenorMALPrOmpT = FAlsE
DdXLwjjVlQxU$ = "TmDKK"
NrCyxbahfPtt$ = "fnMM"
If MKbyqtt > mHba Then
If JluVV > mkpSS Then
jMJFFXkTfgMM$ = "DmJcc"
For VPQjTT = 42 To 4
If PGNwygui = bMVrr Then
dJTkQi = 07
'wcHpsxllwuCC
End If
Next VPQjTT
quYY = 83
End If
DsSS = 82
bFVpp = 60
End If
tCQFv=1
Rem kJPpjNNGQCVpjj
LyBDXXXGnWW$ = "wPyTdle"
If cnkCvCww > FupJLQSS Then
VbBCCcxKWxww$ = "Ybrr"
End If
opTiONS.COnFIrmCOnvErsiOnS = faLSe
Svye = 55
PgHKfiVXuff$ = "rHKVMdd"
ShOwVisUALbaSiCEdITOR = fALSe

Newer polymorphic engines use an RDA-based decryptor that implements a brute-force attack against its constant but variably encrypted virus body in a multiencrypted manner. Manual analysis of such viruses might lead to great surprises. Often there are inefficiencies of randomness in such polymorphic engines, which can be used as a counterattack. Sometimes even a single wildcard string can do the magic of perfect detection.

Most virus scanners, years ago, already had a code emulator capable of emulating 32-bit PE (portable executable) files. Other virus researchers only implemented dynamic decryption to deal with such viruses. That worked just as in the previous cases because the virus body was still constant under encryption. According to the various AV tests, some vendors were still sorry not to have support for difficult virus infection techniques.

Virus writers used the combination of entry-point obscuring techniques with 32-bit polymorphism to make the scanner's job even more difficult. In addition, they tried to implement antiemulation techniques to challenge code emulators.

Nevertheless, all polymorphic viruses carry a constant code virus body. With advanced methods, the body can be decrypted and identified. Consider Figure 7.3 as an illustration.

Figure 7.3. Instances of encrypted and decrypted polymorphic virus bodies.


    Table of Contents
    Previous Section Next Section