This time it's a simple 8-bit virtual machine! Spoilers!
MalwareTech has published another challenge on his Discord channel.
This malware has stolen a flag and encrypted it with a very simple encryption algorithm; unfortunately, the decrypter is coded in a custom 8-bit assembly language which it runs using a minimalistic virtual machine. Your job is to reverse engineer the malware and figure out the instruction set used by the VM, once you've done this you should write your own VM in Python which will run the decrypter and recover the flag.
So we are reversing a virtual machine with custom assembly. Ok then!
At a minimum, a virtual machine must have registers, opcodes to move data and a way to track the state of execution.
radare2 vm1.exe
 -- Use 'e asm.offset=true' to show offsets in 16bit segment addressing mode.
[0x00402370]> aaa
[ WARNING : block size exceeding max block size at 0x00401000
[+] Try changing it with e anal.bb.maxsize
[x] Analyze all flags starting with sym. and entry0 (aa)
[x] Analyze function calls (aac)
[x] Analyze len bytes of instructions for references (aar)
[x] Use -AA or aaaa to perform additional experimental analysis.
[x] Constructing a function name for fcn.* and sym.func.* functions (aan)
[0x00402370]> aflv
0x00401000 1 1022 sym.vm1_easy.exe__MD5Transform_MD5__CAXQAKQAE_Z
0x00401e50 5 160 sym.vm1_easy.exe__Encode_MD5__CAXPAEPAKI_Z
0x00401ef0 5 117 sym.vm1_easy.exe__Decode_MD5__CAXPAKPAEI_Z
0x00401f70 1 22 sym.vm1_easy.exe___0MD5__QAE_XZ
0x00401f90 1 70 sym.vm1_easy.exe__Init_MD5__QAEXXZ
0x00401fe0 10 263 sym.vm1_easy.exe__Update_MD5__QAEXPAEI_Z
0x004020f0 4 161 sym.vm1_easy.exe__Final_MD5__QAEXXZ
0x004021a0 5 74 sym.vm1_easy.exe__writeToString_MD5__QAEXXZ
0x004021f0 1 51 sym.vm1_easy.exe__digestMemory_MD5__QAEPADPAEH_Z
0x00402230 1 60 sym.vm1_easy.exe__digestString_MD5__QAEPADPAD_Z
0x00402270 10 109 fcn.00402270
0x004022e0 6 139 fcn.004022e0
0x00402370 1 128 entry0
0x004023f6 1 6 sub.ntdll.dll_memset_3f6
0x004023fc 1 6 sub.ntdll.dll_memcpy_3fc
0x00402402 1 6 sub.ntdll.dll_sprintf_402
0x00402408 1 6 sub.ntdll.dll_strlen_408
This looks very similar to the shellcode1.exe challenge, but this time there are two additional functions: 0x00402270 and 0x004022e0. Let's take a look at main first.
As you can see in the screenshot, the most interesting part of the main function is the transfer of 507 bytes onto the heap, followed by a call to fcn.004022e0:
0x004023b5 e826ffffff call fcn.004022e0
Let's see what's there:
[0x004022e0]> s fcn.004022e0; pdf;
| ; var int local_10h @ ebp-0x10
| ; var int local_ch @ ebp-0xc
| ; var int local_8h @ ebp-0x8
| ; var int local_1h @ ebp-0x1
The function only has local variables. By the way, you can rename local variables and function arguments with:
[0x004022e0]> afvn local_1h eip
To make following function calls easier, let's enter visual mode.
push ebp
mov ebp, esp
sub esp, 0x10
mov byte [local_1h], 0
First, we have the function prologue and stack space allocation. Then the local_1h variable is set to the byte 0.
mov eax, 1
test eax, eax
je 0x402367
eax is set to 1 and we enter a while loop. The test eax, eax followed by je 0x402367 checks whether the contents of eax are 0. The test instruction performs a bitwise and of its operands, sets the flags and discards the result. If the result of the and is 0, the ZF zero flag will be set, and this is what je checks to determine whether to jump or not. Here we see that if eax is 0, we will return from this function with the usual epilogue:
mov esp, ebp
pop ebp
ret
Since eax is 1, we will continue on the other branch.
movzx ecx, byte [local_1h]
mov edx, dword [0x40423c]
movzx eax, byte [edx + ecx + 0xff]
mov dword [local_10h], eax
This block loads the value of local_1h into ecx, reads the byte at offset local_1h + 0xff from the base pointer stored at 0x40423c, and assigns it to local_10h. movzx means that the byte is extended with 0s to fill the full 4 bytes of the destination register.
add cl, 1
mov byte [local_1h], cl
movzx edx, byte [local_1h]
mov eax, dword [0x40423c]
movzx ecx, byte [eax + edx + 0xff]
mov dword [local_ch], ecx
Then we increment local_1h and use it to move the byte at offset local_1h + 0xff from the base stored at 0x40423c into local_ch. I am not going to show the next snippet, as it repeats the same pattern. So far we have the following:
- local_1h acts as a pointer to the index
- local_10h was assigned the byte at offset 0xff + 0
- local_ch the byte at offset 0xff + 1
- local_8h the byte at offset 0xff + 2
- The data source is located at 0x40423c
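The fetch pattern summarised above could be sketched in Python roughly as follows. The names fetch and CODE_OFFSET are my own, the sample memory is made up, and I am assuming that the counter is also advanced after the third read, which the excerpts above do not show.

```python
CODE_OFFSET = 0xFF  # instruction bytes are read at base + 0xff + index (from the disassembly)


def fetch(mem: bytes, ip: int):
    """Read one 3-byte instruction at index ip; return (opcode, arg1, arg2, next_ip)."""
    fields = []
    for _ in range(3):
        fields.append(mem[CODE_OFFSET + ip])  # like movzx: a single byte widened to an int
        ip = (ip + 1) & 0xFF                  # local_1h is one byte wide, so it wraps at 256
    opcode, arg1, arg2 = fields
    return opcode, arg1, arg2, ip


# Hypothetical memory image: 0xff bytes of data followed by a single instruction.
sample_mem = bytes(0xFF) + bytes([0x01, 0x10, 0x41])
opcode, arg1, arg2, next_ip = fetch(sample_mem, 0)
```

The three values correspond to local_10h, local_ch and local_8h in the disassembly.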
After the assignments, the variables are pushed onto the stack and another function is called. We can confirm the calling convention:
:> afc cdecl
cdecl means the arguments are passed right to left and the caller cleans up the stack afterwards.
mov ecx, dword [local_8h]
push ecx
mov edx, dword [local_ch]
push edx
mov eax, dword [local_10h]
push eax
call fcn.00402270
This is the same thing as
fcn.00402270(local_10h, local_ch, local_8h)
We can switch to another function right inside the visual mode by pressing the Vim-style Shift+: and typing a seek command for it.
fcn.00402270 (int arg_8h, int arg_ch, int arg_10h);
; var int local_4h @ ebp-0x4
; arg int arg_8h @ ebp+0x8
; arg int arg_ch @ ebp+0xc
; arg int arg_10h @ ebp+0x10
This may be a little confusing, but it is correct. Since the values are pushed right to left, the last value pushed is the first argument and ends up closest to ebp, so the arguments map like this: arg_8h is our local_10h, arg_ch is local_ch and arg_10h is local_8h.
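To double-check that mapping, here is a small Python model of the stack as the callee sees it. It is only a sketch: call_cdecl is a made-up helper, and it assumes 4-byte stack slots and the standard push ebp / mov ebp, esp prologue.

```python
def call_cdecl(args):
    """Model the callee's frame right after `push ebp; mov ebp, esp`."""
    stack = []
    for value in reversed(args):     # cdecl: the last argument is pushed first
        stack.append(value)
    stack.append("return address")   # pushed by `call`
    stack.append("saved ebp")        # pushed by the callee's prologue
    # ebp now points at the saved ebp; every earlier push sits 4 bytes higher.
    frame = {}
    for offset, value in zip(range(0, 4 * len(stack), 4), reversed(stack)):
        frame[f"ebp+0x{offset:x}"] = value
    return frame


frame = call_cdecl(["local_10h", "local_ch", "local_8h"])
```

In this model frame["ebp+0x8"] holds local_10h, frame["ebp+0xc"] holds local_ch and frame["ebp+0x10"] holds local_8h, which matches the arg_8h, arg_ch and arg_10h names radare2 assigned.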
mov eax, dword [arg_8h]
mov dword [local_4h], eax
local_4h = arg_8h
Now it seems like the function is entering an if/else if/else block.
cmp dword [local_4h], 1
If local_4h = 1...
mov ecx, dword [0x40423c]
add ecx, dword [arg_ch]
mov dl, byte [arg_10h]
mov byte [ecx], dl
0x40423c + arg_ch = arg_10h
Here we can start making some assumptions about the code. It checks whether local_4h == 0x01, and if so, it writes the low byte of arg_10h to the location at 0x40423c + arg_ch. It looks like local_4h may be the opcode, 0x40423c is the base of our VM memory, arg_ch is an offset into that memory and arg_10h is the value to store there. Let's see what else is happening. This branch sets the lower byte of eax to 1 and returns.
On the other branch, we have cmp dword [local_4h], 2. If our opcode variable is 0x02, the following happens:
mov eax, dword [0x40423c]
add eax, dword [arg_ch]
mov cl, byte [eax]
mov byte [0x404240], cl
jmp 0x4022d5;[gi]
Move the byte at 0x40423c + arg_ch into the memory location 0x404240, which sits at a +4 offset from 0x40423c, and jump to the block that returns 1.
:> ? 0x404240 - 0x40423c
hex 0x4
octal 04
unit 4
segment 0000:0004
int32 4
string "\x04"
binary 0b00000100
fvalue: 4.0
float: 0.000000f
double: 0.000000
trits 0t11
It looks like some flag is being set. Let's look at another opcode:
cmp dword [local_4h], 3
movzx edx, byte [0x404240]
mov eax, dword [0x40423c]
add eax, dword [arg_ch]
movzx ecx, byte [eax]
xor ecx, edx
mov edx, dword [0x40423c]
add edx, dword [arg_ch]
mov byte [edx], cl
jmp 0x4022d5;[gi]
Load the flag from 0x404240, xor it with the byte at 0x40423c + arg_ch and update that location with the result. These are all the opcodes we have. Seems pretty straightforward.
0x01 - mov dst, data
0x02 - setkey data
0x03 - xor dst, key
0x04 - ret
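Translated into Python, the opcode table might look like the sketch below. This is not the attached solution. The function name execute, the tuple-based toy program and the choice to raise on an unknown opcode are all assumptions of mine, and the demo data is made up rather than taken from the challenge.

```python
def execute(mem: bytearray, key: int, opcode: int, arg1: int, arg2: int):
    """Apply one instruction to mem; return (new_key, keep_running)."""
    if opcode == 0x01:      # mov dst, data: store an immediate byte at mem[arg1]
        mem[arg1] = arg2 & 0xFF
    elif opcode == 0x02:    # setkey: load the key byte from mem[arg1]
        key = mem[arg1]
    elif opcode == 0x03:    # xor dst, key: XOR mem[arg1] with the key in place
        mem[arg1] ^= key
    elif opcode == 0x04:    # ret: stop the VM
        return key, False
    else:                   # assumption: treat anything else as an error
        raise ValueError(f"unknown opcode {opcode:#04x}")
    return key, True


# Toy program on made-up data: store a key, store an "encrypted" byte, decrypt it.
demo_mem = bytearray(16)
demo_program = [
    (0x01, 0, 0x2A),             # mem[0] = 0x2a
    (0x01, 1, ord("A") ^ 0x2A),  # mem[1] = 'A' xor 0x2a
    (0x02, 0, 0),                # key = mem[0]
    (0x03, 1, 0),                # mem[1] ^= key
    (0x04, 0, 0),                # ret
]
key, running = 0, True
for op, a, b in demo_program:
    key, running = execute(demo_mem, key, op, a, b)
    if not running:
        break
```

After the loop finishes, demo_mem[1] holds the byte for "A" again, which is the same decrypt-in-place pattern the table describes.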
Now let's pull the memory. If you remember, we established that the memory size was 507 bytes. Let's dump it in Python format.
[0x0040423c]> pcp 570 @ 0x404040
import struct
buf = struct.pack ("570B", *[
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0xde,
At this point, all that's left is to translate the binary into a Python VM. I am not going to cover how to do that, so try it yourself. If you feel stuck, the code is attached. But really, try it yourself first. Many thanks to MalwareTech for creating and posting these fun little challenges.