_aLfa_ Site Admin
Joined: 21 Sep 2002 Posts: 233 Location: Aveiro, Portugal
Posted: Wed Sep 08, 2004 1:06 am
Post subject: Generic Detection for Visual Basic Internet Worms
Andy Nikishin and Mike Pavlyushchik
Kaspersky Lab, Russia
Recently, we have seen a growing tendency for virus (worm) writers to write their creations using high-level languages such as C++, Pascal (Delphi), Visual Basic and so on. This trend has placed a strong demand on anti-virus experts to find methods of generic detection for such programs using heuristics.
It is no secret that Internet worms hold the 'number one spot' in all virus-related charts and lists, and Visual Basic is one of the most popular languages among today's worm writers. For these reasons, we decided to start looking into the possibility of generic detection of Internet worms written in Visual Basic.
Starting Point
To determine whether or not a program is an Internet worm, we have to analyse the program's behaviour, determine what undesirable things the program does and how it does those things.
Statistics show that most of the Internet worms written in Visual Basic (VB) spread using MS Outlook. The reason for this is that MS Outlook is exposed as a COM object and as such can be accessed by any external program. In addition, Visual Basic handles this access 'transparently', meaning that even the program author may know nothing about how it really works.
VB File Format Overview
There are several versions of MS Visual Basic, but we shall examine only versions 5 and 6, since most of the recent Internet worms have been written with these versions. The executable file format for the two versions is very similar, so we will analyse them as one. The internal structure of files compiled with MS Visual Basic differs from those created by other compilers. The file contains not only program code, but also a lot of data that describes the code and which is used at run time.
Usually, anti-virus scanners check program code from the entry point, but in VB files this method is useless. The entry point of a VB file points to a short stub that simply calls a run-time function that never returns:
[asm:yahelc1a]- 0040273C push 0004028B4 ; sInitData
- 00402741 call 000402736 ; MSVBVM60.ThunRTMain
- 00402746 add [eax], al
- 00402748 add [eax], al
- 0040274A add [eax], al
[/asm:yahelc1a]
Further program execution is under the control of a run-time library that simply calls the program's procedures from the file. To prepare the program for running, the run-time library uses data stored in the sInitData structure. A pointer to this structure is passed to the ThunRTMain() run-time function, which initiates program execution.
A great deal of useful information can be obtained by analysing the sInitData structure and its sub-structures. For example, we can find the name of the project and compiled file, all imported and declared functions, the OCX files used, the beginning and end of the native code stream, structures that describe all modules and forms, and so on.
The MS Visual Basic compiler can create two types of executable - 'Native Code', which contains procedures compiled to native Intel x86 code, and 'P-Code', which contains the byte code interpreted by the Visual Basic virtual machine at run time. Of course, each code format is reflected in the sInitData structure in a different way, and needs to be processed separately.
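As a rough illustration of the stub described above, the following Python sketch recognises the 'push imm32; call rel32' byte pattern at the entry point and recovers the sInitData address. The function name parse_vb_entry_stub is hypothetical, and the stub bytes are reconstructed from the listing above by assuming the standard x86 encodings (68 for push imm32, E8 for call rel32). It does not verify that the call really reaches ThunRTMain, which a real scanner would have to check through the import table.

```python
import struct

def parse_vb_entry_stub(code: bytes, entry_va: int):
    """Return (sinitdata_va, call_target_va) if `code` starts with
    'push imm32; call rel32', otherwise None. Hypothetical helper."""
    if len(code) < 10 or code[0] != 0x68 or code[5] != 0xE8:
        return None
    # Operand of the push: virtual address of sInitData (little-endian).
    (sinit_va,) = struct.unpack_from("<I", code, 1)
    # Operand of the call: signed displacement from the next instruction.
    (rel,) = struct.unpack_from("<i", code, 6)
    call_target = (entry_va + 10 + rel) & 0xFFFFFFFF
    return sinit_va, call_target

# Bytes reconstructed from the listing: push 004028B4 / call 00402736.
stub = bytes.fromhex("68B4284000" "E8F0FFFFFF")
result = parse_vb_entry_stub(stub, 0x0040273C)
```

Here result holds the two addresses seen in the disassembly; any other byte pattern yields None, which a scanner could treat as 'not a plain VB5/6 stub'.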
Native Code Analysis
From the sInitData structure we can see that the native code stream or 'segment' lies within the file as a single contiguous piece. It does not contain statically linked run-time code as, for example, Delphi code does. This means that the stream contains only author-defined code, without any run-time procedures, which would only take up valuable time during analysis. So, the analysis range is limited quite strictly by the code stream.
Let's return to Internet worms that use MS Outlook to spread. MS Outlook exposes a COM object with the ProgID (OLE Automation programmatic identifier) 'Outlook.Application'. To work with this object the program has to create an instance of it in some way. For example, it can be done as follows:
[vb:yahelc1a]- Set objOutlook = CreateObject("Outlook.Application")
[/vb:yahelc1a]
Next, the program uses the object instance by calling its methods and properties. Depending on how the variable that holds the object instance was declared, Visual Basic automatically performs either early binding (at compile time) or late binding (at run time) for each method or property call.
Late Binding
Late binding is performed if the type of the variable that holds the object instance is defined as Object or Variant: Dim objOutlook As Object or Dim objOutlook As Variant.
The VB run-time library has a set of functions for late binding calls. Their names are constructed using 'LateMem' with various prefixes and postfixes: __vba[Var]LateMem[Named][Call][St|Ld][Ad|Rf].
For example:
__vbaLateMemSt
__vbaLateMemCallLd
__vbaLateMemNamedStAd
We will call these 'LateMem functions'. Each of them receives the name of the called method (as a string), the number of the method's parameters, the parameters themselves, and (optionally) a pointer for the result value. For those who are familiar with COM technology basics, we can say that all LateMem functions use the IDispatch interface. The LateMem functions transform the method name to a memberId by calling IDispatch::GetIDsOfNames(), then invoke the method with its parameters by calling IDispatch::Invoke(). For example:
[vb:yahelc1a]- Set objNamespace = objOutlook.GetNamespace("MAPI")
[/vb:yahelc1a]
The compiled code is as follows:
[asm:yahelc1a]- 00401764 sub esp, 10h
- 00401767 mov ecx, 8 ; VT_BSTR
- 0040176C mov edx, esp ; Param1
- 0040176E mov eax, offset aMapi ; "MAPI"
- 00401773 push 1 ; params count
- 00401775 push offset aGetnamespace ; "GetNamespace"
- 0040177A mov [edx], ecx
- ; Param1.vt=VT_BSTR
- 0040177C mov ecx, [ebp+dummy+4]
- 00401782 mov [edx+4], ecx
- 00401785 mov ecx, [ebp+var_14]
- 00401788 push ecx
- 00401789 mov [edx+8], eax
- ; Param1.bstrVal="MAPI"
- 0040178C mov eax, [ebp+dummy+0Ch]
- 00401792 mov [edx+0Ch], eax
- 00401795 lea edx, [ebp+objOutlook]
- 00401798 push edx
- 00401799 call ds:__vbaLateMemCallLd
- ; objOutlook.GetNamespace
[/asm:yahelc1a]
As can be seen, before the __vbaLateMemCallLd function is called, the pointer to the object instance (objOutlook), the method name ("GetNamespace") and one parameter ("MAPI") are placed on the stack.
Thus, by going through the code and analysing LateMem function calls, we will find all the late binding calls to COM objects.
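The first step of such a pass is to know which imported run-time functions are LateMem functions. The Python sketch below is a direct transcription of the naming pattern given earlier, __vba[Var]LateMem[Named][Call][St|Ld][Ad|Rf], into a regular expression, treating every bracketed part as optional. The helper name latemem_imports and the sample import list are assumptions made for illustration; extracting the import names from the file is not shown.

```python
import re

# Each bracketed part of the article's pattern becomes an optional group.
LATEMEM_RE = re.compile(
    r"^__vba(Var)?LateMem(Named)?(Call)?(St|Ld)?(Ad|Rf)?$"
)

def latemem_imports(import_names):
    """Return the imported names that match the LateMem pattern."""
    return [name for name in import_names if LATEMEM_RE.match(name)]

# Hypothetical import list mixing LateMem and other run-time functions.
sample = [
    "__vbaLateMemSt",
    "__vbaLateMemCallLd",
    "__vbaLateMemNamedStAd",
    "__vbaHresultCheckObj",
    "rtcCreateObject",
]
found = latemem_imports(sample)
```

Call sites that reference any of the names in found are the places where the pushed method-name string is worth reading.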
Early Binding
Visual Basic performs early binding if the type of the variable that holds the object instance is defined as an application-defined type:
[vb:yahelc1a]- Dim objOutlook As Outlook.Application
[/vb:yahelc1a]
In this case, the compiled code looks completely different. For the same example, the compiled code will be:
[asm:yahelc1a]- 00401794 mov eax, [ebp+objOutlook]
- 00401797 lea edx, [ebp+objNamespace]
- 0040179A push edx
- 0040179B push offset aMapi ; "MAPI"
- 004017A0 mov ecx, [eax]
- 004017A2 push eax
- 004017A3 call dword ptr [ecx+4Ch] ; GetNamespace
[/asm:yahelc1a]
Here the pointer to the method name is not pushed on the stack as a parameter. Instead, the method function is called directly, using the virtual function table (vtable). By analysing this code we can determine which method has been called, based on the vtable offset (in our example 4Ch), but we need to know the interface type to bind the method number with the exact method name.
From the code shown above, we cannot see the interface type, thus it looks as if we have come to a dead end. Fortunately, there is a way out of this situation. If we look at the code just after the method's call, we will see:
[asm:yahelc1a]- 004017A6 cmp eax, esi
- 004017A8 fnclex
- 004017AA jge short loc_4017BE
- 004017AC mov ecx, [ebp+objOutlook]
- 004017AF push 4Ch
- 004017B1 push offset GUID__Application
- ; {00063001-0000-0000-C000-000000000046}
- 004017B6 push ecx
- 004017B7 push eax
- 004017B8 call ds:__vbaHresultCheckObj
- 004017BE ...
[/asm:yahelc1a]
The __vbaHresultCheckObj() function shows an error message if the method called returns an error value. Let us check the input parameters of this function. The third parameter is a reference to the GUID of the interface called (which, in this case, is _Application) and the fourth parameter is the offset in the method table (vtable) - in fact, the method's index in the table multiplied by four, the size of a pointer (here we have 4Ch; this value corresponds to the GetNamespace method).
Tracing calls to the __vbaHresultCheckObj() function shows us all the program's calls to COM objects using early binding. As a result, we are able to find all the calls to COM objects in a program. Moreover, it is unimportant what kind of binding was used - late or early. We filter all calls of interest to an MS Outlook object to understand how the program interacts with MS Outlook. Finally, using the evidence we have collected, we can pass verdict on the program: guilty or not (i.e. worm or not)!
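The two values recovered from a __vbaHresultCheckObj() call site can be normalised with a few lines of Python. This is a sketch under stated assumptions: the helper names are hypothetical, pointers are taken to be 4 bytes wide (32-bit code), and the GUID is assumed to be stored in the usual in-memory layout with its first three fields little-endian, which is what uuid.UUID(bytes_le=...) expects.

```python
import uuid

POINTER_SIZE = 4  # assumption: 32-bit x86 code, as in the listings

def vtable_slot(offset: int) -> int:
    """Convert a vtable byte offset into a zero-based slot number."""
    if offset % POINTER_SIZE:
        raise ValueError("offset is not pointer-aligned")
    return offset // POINTER_SIZE

def guid_from_memory(raw: bytes) -> str:
    """Format 16 raw GUID bytes the way the disassembly comment shows."""
    return "{%s}" % str(uuid.UUID(bytes_le=raw)).upper()

# Values from the example: offset 4Ch and the _Application GUID.
slot = vtable_slot(0x4C)
raw_guid = bytes.fromhex("0130060000000000C000000000000046")
guid_text = guid_from_memory(raw_guid)
```

The slot number counts every entry in the vtable. If the interface derives from IDispatch, as is assumed here for _Application, the first seven slots belong to IUnknown and IDispatch, so mapping a slot to a method name still requires the interface definition for that GUID.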
P-Code Analysis
During the analysis of P-Code compiled files we find that there is no code executed by the CPU (except the entry point). All procedures are compiled into byte code, which is interpreted, controlled and run by Visual Basic's run-time library. Of course, such code needs different data to organize work with objects, procedure-local data, constants, and so on. Therefore, the format of sInitData is slightly different.
In the process of investigating sInitData and its substructures, we look at the module description tables (in fact, these are descriptions of classes). Among other data there is a table of constants that is used by P-Code. Every module has its own table. These constants are references to strings, GUIDs, declared and run-time functions. Note that these tables are not present in files compiled in native code. Of course, this is understandable - all references to constants are already put into executable code. The following is an example of a constant table:
[asm:yahelc1a]- 004017CC dd offset rtcShell
- 004017D0 dd offset aCProgramFilesN
- ; "C:\ProgramFiles\NortonAntiVirus\*.dat"
- 004017D4 dd offset aOutlook_applic
- ; "Outlook.Application"
- 004017D8 dd offset rtcCreateObject
- 004017DC dd offset aMapi ; "MAPI"
- 004017E0 dd offset aGetnamespace ; "GetNameSpace"
- 004017E4 dd offset aOutlook ; "Outlook"
- 004017E8 dd offset aGuest ; "Guest"
- 004017EC dd offset aPassword ; "password"
- 004017F0 dd offset aLogon ; "Logon"
- 004017F4 dd offset aAddresslists ; "AddressLists"
- 004017F8 dd offset aCount ; "Count"
- 004017FC dd offset aCreateitem ; "CreateItem"
- 00401800 dd offset aAddressentries
- ; "AddressEntries"
- 00401804 dd offset aRecipients ; "Recipients"
- 00401808 dd offset aAdd ; "Add"
- 0040180C dd offset aSubject ; "Subject"
- 00401810 dd offset aBody ; "Body"
- 00401814 dd offset aAttachments ; "Attachments"
- 00401818 dd offset aSend ; "Send"
- 0040181C dd offset aLogoff ; "Logoff"
- 00401820 dd offset a_vxv ; ".vxv"
- 00401824 dd offset kernel32_OpenProcess_
- 00401828 dd offset kernel32_GetExitCodeProcess_
- 0040182C dd offset rtcDoEvents
- 00401830 dd offset rtcGetTimer
- 00401834 ...
[/asm:yahelc1a]
Here we see that the names of all the COM object methods used are present in this table. Even a simple analysis of these strings gives us the opportunity to detect an Internet worm in a program with a high probability. In addition, it is possible to analyse the P-Code itself. Such analysis shows us all the COM method calls, just as Native code analysis does. However, this variant is more difficult and more laborious, and it needs in-depth knowledge of the P-Code structure and its additional data, so we shall not examine this method here.
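The 'simple analysis of these strings' can be sketched as a set test over the strings referenced by a module's constant table. The rule below is an illustrative assumption, not the rule used in any real scanner: it requires the Outlook ProgID together with 'Attachments' and 'Send', plus at least one address book name. The function name and both indicator sets are hypothetical, and comparison is case-insensitive because the table above spells 'GetNameSpace' differently from the source code.

```python
# Hypothetical indicator sets, chosen from the example table above.
REQUIRED = {"outlook.application", "attachments", "send"}
ADDRESS_BOOK = {"addresslists", "addressentries"}

def looks_like_outlook_mailer(constants) -> bool:
    """True if the strings suggest mass-mailing through Outlook."""
    seen = {c.lower() for c in constants}
    return REQUIRED <= seen and bool(ADDRESS_BOOK & seen)

# Strings taken from the example constant table.
table_strings = [
    "Outlook.Application", "MAPI", "GetNameSpace", "Logon",
    "AddressLists", "Count", "CreateItem", "AddressEntries",
    "Recipients", "Add", "Subject", "Body", "Attachments",
    "Send", "Logoff",
]
suspicious = looks_like_outlook_mailer(table_strings)
benign = looks_like_outlook_mailer(["Outlook.Application", "Send"])
```

A legitimate mail-merge tool could match the same rule, so a hit is better treated as one piece of evidence to combine with others than as a verdict on its own.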
Conclusion
Usually, we finish our articles with a warning, saying that the situation on the virus front goes from bad to worse. This time, however, we can turn our backs on tradition. In spite of the apparent complexity, it is not difficult to write generic detection procedures to reveal Visual Basic worms, regardless of the code's type. Thus, in this case, we are able to say that the situation has gone from bad to better.
Date: January 2002
Source: Virus Bulletin