
Generic Detection for Visual Basic Internet Worms

 
Posted by _aLfa_ on Wed Sep 08, 2004 1:06 am

Andy Nikishin and Mike Pavlyushchik
Kaspersky Lab, Russia


Recently, we have seen a growing tendency for virus (worm) writers to create their programs in high-level languages such as C++, Pascal (Delphi) and Visual Basic. This trend has put strong pressure on anti-virus experts to find heuristic methods for the generic detection of such programs.

It is no secret that Internet worms hold the 'number one spot' in all virus-related charts and lists, and Visual Basic is one of the most popular languages among today's worm writers. For these reasons, we decided to start looking into the possibility of generic detection of Internet worms written in Visual Basic.


Starting Point
To determine whether or not a program is an Internet worm, we have to analyse the program's behaviour, determine what undesirable things the program does and how it does those things.

Statistics show that most Internet worms written in Visual Basic (VB) spread via MS Outlook. This is because MS Outlook is exposed as a COM object and can therefore be accessed by any external program. At the same time, Visual Basic performs these actions 'transparently', meaning that even the program's author may know nothing about how they really work.


VB File Format Overview
There are several versions of MS Visual Basic, but we shall examine only versions 5 and 6, since most recent Internet worms have been written with these versions. The executable file format of the two versions is very similar, so we will analyse them together. The internal structure of files compiled with MS Visual Basic differs from that of files created by other compilers: the file contains not only program code, but also a large amount of data that describes the code and is used at run time.

Anti-virus scanners usually examine program code starting from the entry point, but in VB files this approach is useless. The entry point of a VB file points to a short stub that simply calls a run-time function which never returns:

[asm:yahelc1a]
0040273C   push  0004028B4 ; sInitData
00402741   call  000402736 ; MSVBVM60.ThunRTMain
00402746   add   [eax], al
00402748   add   [eax], al
0040274A   add   [eax], al
[/asm:yahelc1a]

From this point on, program execution is controlled by the run-time library, which simply calls the program's procedures stored in the file. To prepare the program for running, the run-time library uses the data stored in the sInitData structure. A pointer to this structure is passed to the ThunRTMain() run-time function, which starts program execution.
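The stub is regular enough to be recognised mechanically. Below is a minimal Python sketch, assuming the raw bytes at the entry point and its virtual address are already available (PE parsing is omitted) and reading the operand 0004028B4 in the listing as the address 004028B4h. The function name is hypothetical. A real scanner would also confirm, via the import table, that the call target is the thunk for MSVBVM60.ThunRTMain.

```python
import struct
from typing import Optional, Tuple

PUSH_IMM32 = 0x68  # opcode of 'push imm32'
CALL_REL32 = 0xE8  # opcode of 'call rel32'


def parse_vb_entry_stub(code: bytes, entry_va: int) -> Optional[Tuple[int, int]]:
    """Match 'push imm32; call rel32' at the entry point.

    Returns (pushed pointer, call target VA), or None if the bytes do not
    look like the stub. The pushed value is taken to be the sInitData VA.
    """
    if len(code) < 10 or code[0] != PUSH_IMM32 or code[5] != CALL_REL32:
        return None
    init_data_va = struct.unpack_from("<I", code, 1)[0]
    rel = struct.unpack_from("<i", code, 6)[0]
    # rel32 is relative to the next instruction, i.e. entry_va + 10.
    call_target = (entry_va + 10 + rel) & 0xFFFFFFFF
    return init_data_va, call_target


# Bytes assembled by hand from the listing: push 004028B4h / call 00402736h
stub = bytes.fromhex("68B4284000E8F0FFFFFF")
init_data, thunk = parse_vb_entry_stub(stub, 0x40273C)
```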

A great deal of useful information can be obtained by analysing the sInitData structure and its sub-structures: for example, the names of the project and of the compiled file, all imported and declared functions, the OCX files used, the start and end of the native code stream, the structures that describe all modules and forms, and so on.
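Before any of these fields can be read, the virtual address of sInitData has to be mapped to a file offset. The sketch below assumes a hypothetical `Section` record holding the usual PE section-header fields. It also assumes, based on community documentation of the VB5/6 format rather than on this article, that the structure begins with the ASCII signature "VB5!". The function names are illustrative.

```python
from typing import List, NamedTuple, Optional


class Section(NamedTuple):
    """Hypothetical minimal view of a PE section header."""
    virtual_address: int  # RVA of the section
    virtual_size: int
    raw_offset: int       # PointerToRawData
    raw_size: int         # SizeOfRawData


VB_MAGIC = b"VB5!"  # assumption: signature at the start of sInitData (VB5 and VB6)


def va_to_offset(va: int, image_base: int, sections: List[Section]) -> Optional[int]:
    """Translate a virtual address into a file offset, or None if unmapped."""
    rva = va - image_base
    for s in sections:
        size = max(s.virtual_size, s.raw_size)
        if s.virtual_address <= rva < s.virtual_address + size:
            delta = rva - s.virtual_address
            # Addresses beyond the raw data are zero-filled and have no file bytes.
            return s.raw_offset + delta if delta < s.raw_size else None
    return None


def has_vb_header(image: bytes, init_data_va: int, image_base: int,
                  sections: List[Section]) -> bool:
    """Check whether the pointer taken from the entry stub lands on 'VB5!'."""
    off = va_to_offset(init_data_va, image_base, sections)
    return off is not None and image[off:off + 4] == VB_MAGIC
```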

The MS Visual Basic compiler can create two types of executable: 'Native Code', which contains procedures compiled to native Intel x86 code, and 'P-Code', which contains byte code interpreted by the Visual Basic virtual machine at run time. Naturally, each code format is reflected differently in the sInitData structure and needs to be processed separately.


Native Code Analysis
From the sInitData structure we can see that the native code stream (or 'segment') lies within the file as one contiguous block. Unlike Delphi code, for example, it does not contain statically linked run-time code. This means that the stream contains only author-defined code, with no run-time procedures that would merely waste time during analysis. The analysis range is therefore strictly limited to the code stream.
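Because the stream is a single block, a scan for run-time calls can be bounded by its start and end. Calls into the run-time library appear as indirect calls through import table slots, such as `call ds:__vbaLateMemCallLd` (opcode FF 15 followed by the slot's absolute address). The minimal sketch below assumes the stream's bytes and base address have already been taken from sInitData, and that the import table has been resolved into a hypothetical slot-to-name map.

```python
import struct
from typing import Dict, List, Tuple


def find_import_calls(code: bytes, code_va: int,
                      iat: Dict[int, str]) -> List[Tuple[int, str]]:
    """Return (call VA, import name) for every 'call dword ptr ds:[slot]'
    (FF 15 + abs32) whose slot is a known import table entry.

    This is a naive byte scan, not a disassembler: a match that falls inside
    another instruction's operands would also be reported, so hits should
    be confirmed with a proper instruction decoder.
    """
    hits = []
    for i in range(len(code) - 5):
        if code[i] == 0xFF and code[i + 1] == 0x15:
            slot = struct.unpack_from("<I", code, i + 2)[0]
            if slot in iat:
                hits.append((code_va + i, iat[slot]))
    return hits
```

Passing only the bytes between the stream's start and end keeps run-time and padding bytes out of the scan.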

Let's return to Internet worms that use MS Outlook to spread. MS Outlook is exposed as a COM object with the ProgID (OLE Automation programmatic identifier) 'Outlook.Application'. To work with this object, the program first has to create an instance of it, for example as follows:

[vb:yahelc1a]
Set objOutlook = CreateObject("Outlook.Application")
[/vb:yahelc1a]

Next, the program uses the object instance by calling its methods and properties. Depending on how the variable that holds the object instance is declared, Visual Basic automatically performs either early binding (at compile time) or late binding (at run time) for these calls.


Late Binding
Late binding is performed if the variable that holds the object instance is declared as Object or Variant: Dim objOutlook As Object or Dim objOutlook As Variant.

The VB run-time library has a set of functions for late-bound calls. Their names are built from 'LateMem' plus various prefixes and suffixes: __vba[Var]LateMem[Named][Call][St|Ld][Ad|Rf].

For example:
__vbaLateMemSt
__vbaLateMemCallLd
__vbaLateMemNamedStAd
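The naming template translates directly into a regular expression that can pick LateMem functions out of a file's import list. This is a sketch under our own reading of the template, in which every bracketed part is optional. It may accept combinations that the run-time library does not actually export, which is harmless when the regex is only used to filter real import names.

```python
import re

# Every bracketed part of __vba[Var]LateMem[Named][Call][St|Ld][Ad|Rf] is optional.
LATEMEM_RE = re.compile(
    r"^__vba(?:Var)?LateMem(?:Named)?(?:Call)?(?:St|Ld)?(?:Ad|Rf)?$")


def is_latemem(name: str) -> bool:
    return LATEMEM_RE.match(name) is not None


imports = ["__vbaLateMemSt", "__vbaLateMemCallLd", "__vbaLateMemNamedStAd",
           "__vbaHresultCheckObj", "rtcCreateObject"]
print([n for n in imports if is_latemem(n)])
# ['__vbaLateMemSt', '__vbaLateMemCallLd', '__vbaLateMemNamedStAd']
```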

We will call these functions 'LateMem functions'. Each of them receives the name of the method being called (as a string), the number of the method's parameters, the parameters themselves and, optionally, a pointer for the result value. For those familiar with the basics of COM: all LateMem functions use the IDispatch interface. A LateMem function converts the method name into a member ID (DISPID) by calling IDispatch::GetIDsOfNames(), then invokes the method with its parameters by calling IDispatch::Invoke(). For example:

[vb:yahelc1a]
Set objNamespace = objOutlook.GetNamespace("MAPI")
[/vb:yahelc1a]

The compiled code is as follows:

[asm:yahelc1a]
00401764   sub   esp, 10h
00401767   mov   ecx, 8                ; VT_BSTR
0040176C   mov   edx, esp              ; Param1
0040176E   mov   eax, offset aMapi     ; "MAPI"
00401773   push  1                     ; params count
00401775   push  offset aGetnamespace  ; "GetNamespace"
0040177A   mov   [edx], ecx            ; Param1.vt=VT_BSTR
0040177C   mov   ecx, [ebp+dummy+4]
00401782   mov   [edx+4], ecx
00401785   mov   ecx, [ebp+var_14]
00401788   push  ecx
00401789   mov   [edx+8], eax          ; Param1.bstrVal="MAPI"
0040178C   mov   eax, [ebp+dummy+0Ch]
00401792   mov   [edx+0Ch], eax
00401795   lea   edx, [ebp+objOutlook]
00401798   push  edx
00401799   call  ds:__vbaLateMemCallLd ; objOutlook.GetNamespace
[/asm:yahelc1a]

As can be seen, before __vbaLateMemCallLd is called, the pointer to the object instance (objOutlook), the method name ("GetNamespace") and the single parameter ("MAPI", stored as a VT_BSTR VARIANT) are placed on the stack.

Thus, by going through the code and analysing the LateMem function calls, we can find all late-bound calls to COM objects.
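The following sketch shows this recovery step over an instruction stream that has already been decoded. We assume each instruction is reduced to a hypothetical `Insn` record, with immediates kept as integers and `offset aXxx` operands resolved to their string constants (as the disassembler comments do). The rule that the parameter count is pushed just before the method name comes from the single listing above, not from a documented function signature.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Operand after simple resolution: int = immediate value, str = resolved
# string constant, None = anything unresolved (registers, stack slots, ...).
Operand = Union[int, str, None]


@dataclass
class Insn:
    op: str               # only "push" and "call" are used here
    arg: Operand = None   # for "call": the resolved import name


def recover_late_calls(insns: List[Insn]) -> List[Tuple[str, str, Optional[int]]]:
    """Return (LateMem function, method name, parameter count) triples."""
    found = []
    pushed: List[Operand] = []
    for insn in insns:
        if insn.op == "push":
            pushed.append(insn.arg)
        elif insn.op == "call":
            target = insn.arg
            if isinstance(target, str) and "LateMem" in target:
                # Most recently pushed string constant = method name; in the
                # listing, the parameter count is pushed just before it.
                for i in range(len(pushed) - 1, -1, -1):
                    if isinstance(pushed[i], str):
                        prev = pushed[i - 1] if i > 0 else None
                        count = prev if isinstance(prev, int) else None
                        found.append((target, pushed[i], count))
                        break
            pushed.clear()  # crude: assume every call consumes what was pushed
    return found


# The pushes from the GetNamespace listing, in execution order.
listing = [
    Insn("push", 1),               # params count
    Insn("push", "GetNamespace"),  # method name
    Insn("push"),                  # push ecx
    Insn("push"),                  # push edx
    Insn("call", "__vbaLateMemCallLd"),
]
calls = recover_late_calls(listing)
```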


Early Binding
Visual Basic performs early binding if the variable that holds the object instance is declared with a specific type from the application's type library:

[vb:yahelc1a]
Dim objOutlook As Outlook.Application
[/vb:yahelc1a]

In this case, the compiled code looks completely different. For the same example, it will be:

[asm:yahelc1a]
00401794   mov   eax, [ebp+objOutlook]
00401797   lea   edx, [ebp+objNamespace]
0040179A   push  edx
0040179B   push  offset aMapi          ; "MAPI"
004017A0   mov   ecx, [eax]
004017A2   push  eax
004017A3   call  dword ptr [ecx+4Ch]   ; GetNamespace
[/asm:yahelc1a]

Here, the method name is not pushed on the stack as a parameter. Instead, the method is called directly through the virtual function table (vtable). By analysing this code we can determine which method was called from its vtable offset (4Ch in our example), but we need to know the interface type to map the method number to an exact method name.
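The vtable offset can be read straight from the instruction bytes: `call dword ptr [ecx+4Ch]` is encoded as FF 51 4C. The minimal sketch below decodes `call dword ptr [reg+disp]` (opcode FF /2 with an 8-bit or 32-bit displacement). It assumes 32-bit code in which each vtable entry is a 4-byte pointer, and the function name is hypothetical. As noted above, the resulting slot number means nothing until the interface is known.

```python
import struct
from typing import Optional, Tuple

REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_vtable_call(code: bytes) -> Optional[Tuple[str, int, int]]:
    """Decode 'call dword ptr [reg+disp]' and return
    (base register, vtable offset, slot index), or None if it does not match."""
    if len(code) < 3 or code[0] != 0xFF:
        return None
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if reg != 2 or rm == 4:  # /2 selects CALL; rm == 4 would need a SIB byte
        return None
    if mod == 1:
        disp = struct.unpack_from("<b", code, 2)[0]   # signed 8-bit displacement
    elif mod == 2 and len(code) >= 6:
        disp = struct.unpack_from("<i", code, 2)[0]   # signed 32-bit displacement
    else:
        return None
    if disp < 0 or disp % 4:
        return None
    return REGS[rm], disp, disp // 4  # 4 bytes per vtable entry
```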

The disassembly above does not reveal the interface type, so it looks as if we have come to a dead end. Fortunately, there is a way out. If we look at the code immediately after the method call, we will see:

[asm:yahelc1a]
004017A6   cmp   eax, esi
004017A8   fnclex
004017AA   jge   short loc_4017BE
004017AC   mov   ecx, [ebp+objOutlook]
004017AF   push  4Ch
004017B1   push  offset GUID__Application ; {00063001-0000-0000-C000-000000000046}
004017B6   push  ecx
004017B7   push  eax
004017B8   call  ds:__vbaHresultCheckObj
004017BE   ...
[/asm:yahelc1a]

The __vbaHresultCheckObj() function displays an error message if the called method returns an error value. Let us look at the input parameters of this function. The third parameter is a reference to the GUID of the interface called (in this case, _Application). The fourth is the offset in the method table (vtable), which is in fact the method's number multiplied by four (here it is 4Ch, which corresponds to the GetNamespace method).
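Pairing the interface GUID with the vtable offset resolves the method name. The minimal sketch below assumes the values pushed before the call have already been recovered, with immediates as integers and the GUID operand resolved to its string form. The lookup table holds only the one pair given in this article. We assume a real scanner would generate the full table from the Outlook type library. The names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple, Union

Operand = Union[int, str, None]  # int immediate, str resolved GUID, None unresolved

OUTLOOK_APPLICATION_IID = "{00063001-0000-0000-C000-000000000046}"  # _Application

# (interface GUID, vtable offset) -> method name; only the pair from the listing.
METHOD_TABLE: Dict[Tuple[str, int], str] = {
    (OUTLOOK_APPLICATION_IID, 0x4C): "GetNamespace",
}


def resolve_hresult_check(pushed: List[Operand]) -> Optional[Tuple[str, int, str]]:
    """Given the values pushed before __vbaHresultCheckObj (in push order),
    return (interface GUID, vtable offset, method name or slot placeholder)."""
    if len(pushed) < 4:
        return None
    # The last value pushed is the first parameter, so reverse the last four.
    args = list(reversed(pushed[-4:]))
    iid, offset = args[2], args[3]  # third and fourth parameters
    if not isinstance(iid, str) or not isinstance(offset, int):
        return None
    return iid, offset, METHOD_TABLE.get((iid, offset), f"<slot {offset // 4}>")


# push 4Ch / push offset GUID__Application / push ecx / push eax
print(resolve_hresult_check([0x4C, OUTLOOK_APPLICATION_IID, None, None]))
```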

By tracing the calls to __vbaHresultCheckObj(), we find all of the program's early-bound calls to COM objects. Combined with the late-binding analysis, this lets us find all COM object calls in a program, regardless of whether late or early binding was used. We then pick out the calls made to MS Outlook objects to understand how the program interacts with MS Outlook. Finally, using the evidence collected, we can pass a verdict on the program: guilty or not (i.e. worm or not)!
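The verdict step could be as simple as checking that the Outlook calls cover every stage of a mass-mailing routine. The sketch below is a hypothetical rule of our own, not the authors' detection logic. Its input is assumed to be the merged list of late-bound and early-bound method names already attributed to Outlook objects, and the grouping of methods into stages is an assumption.

```python
from typing import Iterable, List, Set

# Hypothetical rule: each set is one stage of a typical mass-mailing routine,
# and the program is flagged only if every stage is represented.
MASS_MAILER_STAGES: List[Set[str]] = [
    {"getnamespace", "addresslists", "addressentries"},  # harvest addresses
    {"createitem"},                                      # create a message
    {"recipients"},                                      # address it
    {"attachments"},                                     # attach a file
    {"send"},                                            # send it
]


def looks_like_outlook_worm(outlook_methods: Iterable[str]) -> bool:
    seen = {m.lower() for m in outlook_methods}  # method names are case-insensitive
    return all(stage & seen for stage in MASS_MAILER_STAGES)
```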


P-Code Analysis
When analysing files compiled to P-Code, we find that no code is executed by the CPU (except for the entry point). All procedures are compiled into byte code, which is interpreted, controlled and run by Visual Basic's run-time library. Of course, such code needs additional data to organise work with objects, local procedure data, constants and so on, so the format of sInitData is slightly different.

While investigating sInitData and its sub-structures, we look at the module description tables (which are, in fact, descriptions of classes). Among other data, each module has its own table of constants used by the P-Code. These constants are references to strings, GUIDs, and declared and run-time functions. Note that these tables are not present in files compiled to native code. This is understandable: in native code, all references to constants are embedded directly in the executable code. The following is an example of a constant table:

[asm:yahelc1a]
004017CC  dd  offset rtcShell
004017D0  dd  offset aCProgramFilesN  ; "C:\ProgramFiles\NortonAntiVirus\*.dat"
004017D4  dd  offset aOutlook_applic  ; "Outlook.Application"
004017D8  dd  offset rtcCreateObject
004017DC  dd  offset aMapi            ; "MAPI"
004017E0  dd  offset aGetnamespace    ; "GetNameSpace"
004017E4  dd  offset aOutlook         ; "Outlook"
004017E8  dd  offset aGuest           ; "Guest"
004017EC  dd  offset aPassword        ; "password"
004017F0  dd  offset aLogon           ; "Logon"
004017F4  dd  offset aAddresslists    ; "AddressLists"
004017F8  dd  offset aCount           ; "Count"
004017FC  dd  offset aCreateitem      ; "CreateItem"
00401800  dd  offset aAddressentries  ; "AddressEntries"
00401804  dd  offset aRecipients      ; "Recipients"
00401808  dd  offset aAdd             ; "Add"
0040180C  dd  offset aSubject         ; "Subject"
00401810  dd  offset aBody            ; "Body"
00401814  dd  offset aAttachments     ; "Attachments"
00401818  dd  offset aSend            ; "Send"
0040181C  dd  offset aLogoff          ; "Logoff"
00401820  dd  offset a_vxv            ; ".vxv"
00401824  dd  offset kernel32_OpenProcess_
00401828  dd  offset kernel32_GetExitCodeProcess_
0040182C  dd  offset rtcDoEvents
00401830  dd  offset rtcGetTimer
00401834  ...
[/asm:yahelc1a]

Here we see that the names of all the COM object methods used are present in this table. Even a simple analysis of these strings allows us to detect an Internet worm with high probability. It is also possible to analyse the P-Code itself: such analysis reveals all COM method calls, just as native code analysis does. However, this approach is more difficult and laborious and requires in-depth knowledge of the P-Code structure and its additional data, so we shall not examine it here.
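A first-pass string heuristic over the constant table takes only a few lines. The minimal sketch below assumes that the table's file offset and entry count have already been taken from the module description. It also assumes that a hypothetical `symbols` map resolves each pointer to its string or function name, as the disassembler comments do in the table above. The member set is drawn from that example table, and both it and the threshold are our assumptions, not tuned values.

```python
import struct
from typing import Dict, Iterable, List, Optional

OUTLOOK_PROGID = "outlook.application"
# Outlook member names taken from the example constant table (an assumption).
OUTLOOK_MEMBERS = {
    "getnamespace", "logon", "addresslists", "addressentries", "createitem",
    "recipients", "subject", "body", "attachments", "send", "logoff",
}


def read_constant_table(image: bytes, table_off: int, count: int,
                        symbols: Dict[int, str]) -> List[Optional[str]]:
    """Read `count` little-endian 32-bit pointers at file offset `table_off`
    and resolve each via `symbols`; unresolved entries become None."""
    return [symbols.get(struct.unpack_from("<I", image, table_off + 4 * i)[0])
            for i in range(count)]


def outlook_score(constants: Iterable[Optional[str]]) -> int:
    """Count distinct Outlook member names; 0 if the ProgID is absent."""
    names = {c.lower() for c in constants if c}
    if OUTLOOK_PROGID not in names:
        return 0
    return len(names & OUTLOOK_MEMBERS)


def is_suspicious(constants: Iterable[Optional[str]], threshold: int = 5) -> bool:
    return outlook_score(constants) >= threshold
```

Applied to the string entries of the example table, `outlook_score` finds all 11 member names together with the ProgID, well above the assumed threshold.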


Conclusion
Usually, we finish our articles with a warning that the situation on the virus front is going from bad to worse. This time, however, we can break with tradition. Despite appearances, it is not difficult to write generic detection procedures for Visual Basic worms, regardless of the type of code. So, in this case, we can say that the situation has gone from bad to better.


Date: January 2002
Source: Virus Bulletin