Monday, February 15, 2016

Have I found the mysterious anti-debug?

In my previous post I mentioned that running the game under a debugger would, after a while, force the game to terminate.
I speculated that the debugger was being detected either directly via an API, or indirectly via a timing anti-debug.

I did some experiments, and the evidence points to a timing anti-debug. The time it takes to terminate the game is variable and, it turns out, termination only happens when the performance of the game is rather bad. In this case, it was Olly 2's fault. There seems to be some kind of bug in OllyDbg 2.01 whereby all threads of a running application are suspended and resumed constantly; the game still runs, although with a 30-35% penalty. The timing anti-debug notices that more ticks are being spent than normal and, with careful communication between two threads, it calls NtTerminateProcess by spawning several threads that point to a VM program (only one of the threads has a different VM program than the rest).
In most cases, what Olly is doing is normal behaviour, it's how it's usually done, but not in my case: I've observed Olly idling and not doing this suspend/resume thing. The bug seems to disappear (at least in my case) once I hit a memory breakpoint. Then Olly acts normally, and the game does not terminate, or at least not as fast as before. If the avg grows as time passes because of small slowdowns, the game will still terminate eventually.

I looked at my trace log of one of the obfuscated threads and, lo and behold, there is an RDTSC at address 3955DEA9 (quick reminder: there is no ASLR). The result of RDTSC is stored in EDX:EAX; these values are later used in a loop, encrypted and stored in a table.
Now that I know what is what, I can better understand the underlying algorithm. One thing is certain, the mystery is solved.
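As a small illustration of the EDX:EAX detail: RDTSC returns the 64-bit time-stamp counter split across two 32-bit registers, with EDX holding the high half and EAX the low half. The Python sketch below (the function names are illustrative, not taken from SecuROM) shows how two such readings combine into tick counts and a delta. It does not attempt to reproduce the encryption or the table mentioned above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def tsc_from_registers(edx, eax):
    """Combine the two 32-bit halves returned by RDTSC into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

def elapsed_ticks(start, end):
    """Ticks between two readings, tolerating a wrap of the 64-bit counter."""
    return (end - start) & MASK64

# Two made-up readings that straddle a carry from EAX into EDX
first = tsc_from_registers(0x00000012, 0xFFFFFF00)
second = tsc_from_registers(0x00000013, 0x00000100)
print(hex(first), hex(second), elapsed_ticks(first, second))
```

A timing check only has to compare such deltas against what it considers normal, which is why a slowed-down process stands out.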

Quick reminder that timing anti-debugs are, in my opinion, the most difficult to handle; it isn't as easy as returning 0 from GetTickCount.

Wednesday, February 10, 2016

Hunting for the mysterious anti-debug.

Well, it's probably not that mysterious. But let me point you to my last post, where I mentioned the problem briefly in the last paragraph.

I mentioned there were two threads, one is 3939EF70 and the other is 3939F9C0. Since the game utilizes no ASLR, and SecuROM expects most addresses to be the same on any system(hardcoded), there is no need to recompute them for each system.

The first thread to be started is 3939EF70. The thread is extremely obfuscated; here is but a sample of it:

lea esp,[esp-4]
mov dword ptr ss:[esp],ebp
mov ebp,esp
sub esp,4
mov dword ptr ss:[esp],32E
xor dword ptr ss:[esp],00000326
sub esp,dword ptr ss:[esp]
lea esp,[esp-4]
mov dword ptr ss:[esp],esi
mov eax,-1D9
mov eax,dword ptr ds:[eax+3C04C67D]
xor eax,0000911A
mov dword ptr ds:[3C04C4A4],eax
mov eax,-0E7
mov eax,dword ptr ds:[eax+3C04C58F]
xor eax,00009A12
mov dword ptr ds:[3C04C4A8],eax
mov eax,-55
mov eax,dword ptr ds:[eax+3C04C4F9]
add eax,dword ptr ds:[3C04C4A8]
mov dword ptr ds:[3C04C4AC],eax
call 3939EFE5
add dword ptr ss:[esp],3E
push dword ptr ss:[esp]
sub dword ptr ss:[esp],3B
push ebx
mov ebx,dword ptr ss:[esp+4]
xchg dword ptr ss:[esp],ebx
xchg dword ptr ss:[esp],ebp
mov ebp,dword ptr ss:[ebp]
sub ebp,13
xchg dword ptr ss:[esp],ebp
mov dword ptr ss:[esp+4],A60004C2
jmp short 3939F00F
retn 4

This is from the tracer; in reality some of these instructions overlap. Initially, this thread just seems to loop, checking whether a value at one hardcoded address is bigger or smaller than another, and jumping to a different piece of code depending on the result. At first, though, all of these branches end up at the same place: a call to GetTickCount, 120 seconds added to the returned value, and then Sleep(120 seconds).
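Several of the constants in the sample above only look arbitrary. Folding them by hand is error-prone, so here is a minimal Python sketch (the helper names are illustrative) that does the arithmetic for the stack adjustment and for the three displaced loads.

```python
MASK32 = 0xFFFFFFFF

def fold_xor(a, b):
    """mov [esp],a ; xor [esp],b  ->  the value actually left on the stack."""
    return (a ^ b) & MASK32

def fold_displacement(index, base):
    """mov eax,index ; mov eax,[eax+base]  ->  the address actually read."""
    return (index + base) & MASK32

# 'sub esp,[esp]' after 'mov 32E' / 'xor 326' subtracts this many bytes
print(hex(fold_xor(0x32E, 0x326)))

# The three obfuscated loads from the sample
loads = [(-0x1D9, 0x3C04C67D), (-0x0E7, 0x3C04C58F), (-0x55, 0x3C04C4F9)]
for index, base in loads:
    print(hex(fold_displacement(index, base)))
```

With the constants folded, the first load reads 3C04C4A4 and the following store writes back to the same address; the second pair does the same with 3C04C4A8; the third load reads 3C04C4A4 again before the sum lands in 3C04C4AC.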

It repeats the aforementioned Sleep indefinitely, until the other thread, 3939F9C0, signals it by writing different values to these hardcoded addresses, thereby making thread 3939EF70 take different branches. At some point, 3939EF70 starts another thread with CreateRemoteThread that executes the VM. Interestingly, there is no synchronization between the threads; both rely on the fact that one of them will be sleeping while the other is modifying the shared addresses.
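The GetTickCount arithmetic in the wait step has one subtlety worth spelling out: GetTickCount returns milliseconds in a 32-bit DWORD that wraps around, so adding 120 seconds has to be done modulo 2^32. The Python sketch below only models that arithmetic. It assumes the sum is used as a deadline, which the trace does not confirm, and all names in it are illustrative.

```python
MASK32 = 0xFFFFFFFF
WAIT_MS = 120 * 1000  # the 120 seconds mentioned above, in milliseconds

def deadline_after(tick_ms, wait_ms=WAIT_MS):
    """Tick value at which the wait ends, wrapped to 32 bits like a DWORD."""
    return (tick_ms + wait_ms) & MASK32

def deadline_reached(now_ms, deadline_ms):
    """Wrap-safe comparison: treat the 32-bit difference as a signed number."""
    diff = (now_ms - deadline_ms) & MASK32
    return diff < 0x80000000

# A tick count close to the wrap-around point still gives a usable deadline
start = 0xFFFFFFF0
deadline = deadline_after(start)
print(deadline, deadline_reached(start, deadline), deadline_reached(deadline, deadline))
```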

I mentioned an anti-debug, that's right. The game runs under a debugger for as long as thread 3939F9C0 allows it to; then, at some random point between 5 and 30 minutes in (rarely longer), the thread calls NtTerminateProcess.
I've been speculating that this is a timing anti-debug: that there is some 'avg' value that goes up as the debugger handles various events and generally slows down the game by 20-30%. As soon as this value crosses some threshold, the thread calls NtTerminateProcess. This seems to be further reinforced by the fact that if I start the game under a debugger and then detach, the game never terminates. If that is not the case, then I am being detected by a different method.
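To make the speculation concrete, here is a minimal Python sketch of how such an 'avg' check could work. Everything in it is assumed: the class name, the exponential moving average, the baseline, the tolerance and the smoothing factor are placeholders, not values recovered from SecuROM.

```python
class TimingWatchdog:
    """Hypothetical model: a moving average of tick deltas with a threshold."""

    def __init__(self, baseline_ticks, tolerance=1.25, smoothing=0.5):
        self.avg = float(baseline_ticks)
        self.limit = baseline_ticks * tolerance  # assumed threshold
        self.smoothing = smoothing

    def sample(self, delta_ticks):
        """Feed one measurement; return True once the average crosses the limit."""
        self.avg += self.smoothing * (delta_ticks - self.avg)
        return self.avg > self.limit

watchdog = TimingWatchdog(baseline_ticks=1000)
slow_run = [1300] * 10  # a steady 30% slowdown, as under the debugger
tripped_at = next(i for i, d in enumerate(slow_run) if watchdog.sample(d))
print(tripped_at)
```

A model like this would also explain the variable time to termination: occasional slow samples push the average up gradually, while a constant slowdown crosses the limit quickly.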

Oh yes, I managed to manually patch the exe to disable code verification. Now I can tamper with some of the code (except the packed code, which I can only modify at runtime).

Sunday, February 7, 2016

SecuROM's anti-tampering verification is only one if?

I was wondering how to deal with it; turns out, I didn't have to. The verification was a loop that computed a checksum of the code and then compared the resulting checksum to a DWORD from an array.

There were two different instances of this, and I simply had to patch the conditional jumps. One was a je short, the other was a jne. I changed the je to a jmp and nop'ed out the jne.
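For illustration, the two patches boil down to a few byte substitutions. The Python sketch below works on an in-memory buffer; the sample bytes and offsets are made up, and only the opcodes are standard x86 encodings (74 = je short, EB = jmp short, 75 = jne short, 0F 85 = jne near, 90 = nop).

```python
JE_SHORT, JMP_SHORT, JNE_SHORT, NOP = 0x74, 0xEB, 0x75, 0x90

def force_jump(code, offset):
    """Turn 'je short' into an unconditional 'jmp short' (same length)."""
    if code[offset] != JE_SHORT:
        raise ValueError("no je short at this offset")
    code[offset] = JMP_SHORT

def kill_jump(code, offset):
    """Overwrite a jne (short or near form) with nops so it never branches."""
    if code[offset] == JNE_SHORT:
        length = 2
    elif code[offset:offset + 2] == b"\x0f\x85":
        length = 6
    else:
        raise ValueError("no jne at this offset")
    code[offset:offset + length] = bytes([NOP]) * length

# Made-up bytes: cmp eax,ecx / je +5 / cmp eax,edx / jne +3
code = bytearray.fromhex("39C8 7405 39D0 7503")
force_jump(code, 2)
kill_jump(code, 6)
print(code.hex())
```

Checking the opcode before writing guards against patching the wrong offset, which is easy to do when the surrounding code is obfuscated.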

With this I could modify the code as I please.

Saturday, February 6, 2016

3DM will stop cracking for a whole year!

In a surprise announcement, 3DM cracking group have said that they will stop cracking single-player games under the pretext that they want to see if sales of games increase.

Why did I say pretext? Remember they last said they had difficulty cracking the latest iteration of Denuvo? As of now I propose either of the following reasons:

  1. 3DM cannot crack Denuvo and want to work on it for the next year. This sounds more plausible. They might even drop out of the scene completely, only to re-emerge years later when things around Denuvo have quieted down.
  2. They are genuinely doing this for the stated reason.
  3. They have come to an understanding with Denuvo GmbH/VMProtect to cease their activities for now.

More on this here.

Sunday, January 31, 2016

Analyzing the SecuROM 8.10.X VM.

I want to thank ARTeam for providing the docs on the SecuROM 7.30 VM; they really helped and are mostly still relevant today.

That said, I have not worked on SecuROM 7.30 ever, but I believe the VM has changed since then.

Here is an overview of the VM initialization.

When entering the VM, an argument is pushed to the stack. It's a pointer to a pointer to the VM opcodes. I call this argument a "program".
The dummy call after pushfd is used to get the address of the VM.



In the picture above several things happen. A spinlock is created by the thread which enters the VM and will initialize the context; all other threads, if any, will wait until the first thread has finished the initialization.

Then SecuROM uses the x86 loop construct to iterate over all 100 possible VM thread contexts and find the first free one. The busy flag 0x66666666 indicates whether a context is in use by a thread or not. I should note that SecuROM 7.30 only supported up to 10 threads, while SecuROM 8.10 supports 100.

After the first free context is found, SecuROM jumps to the following code which sets the busy flag.

lea edx,[ebx+24]
mov dword ptr ds:[edx],66666666

Then the VM context is zeroed out, but care is taken not to zero out the busy flag. Afterwards the lock is released and the jump "je short 38D702FE" takes us to the last step of the initialization.
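Put together, the allocation steps look roughly like this. The Python sketch below models each context as a list of DWORDs. The context size is a placeholder, the spinlock is left out, and the sketch assumes ebx points at the start of the context when the flag is written. Only the slot count (100), the flag value (0x66666666) and the flag offset (+24h) come from the disassembly.

```python
BUSY_FLAG = 0x66666666
FLAG_INDEX = 0x24 // 4        # busy flag at context+24h, counted in DWORDs
CONTEXT_COUNT = 100
CONTEXT_DWORDS = 0x110        # placeholder size, not taken from SecuROM

def new_context_pool():
    """Create the table of VM thread contexts, all free."""
    return [[0] * CONTEXT_DWORDS for _ in range(CONTEXT_COUNT)]

def claim_context(pool):
    """Find the first free context, mark it busy, and clear everything else."""
    for index, ctx in enumerate(pool):
        if ctx[FLAG_INDEX] != BUSY_FLAG:
            ctx[FLAG_INDEX] = BUSY_FLAG
            for i in range(len(ctx)):
                if i != FLAG_INDEX:      # take care not to wipe the flag
                    ctx[i] = 0
            return index
    return None  # every context is in use

pool = new_context_pool()
print(claim_context(pool), claim_context(pool))
```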


pop ebx loads the VM context for this thread in ebx.

sub dword ptr ss:[esp],7 subtracts 7 bytes from the address that the dummy call mentioned above pushed to the stack, so it ends up as the VM function address, 38D70280 in this exe.

This part fills the VM context struct. I've taken the liberty of adding captions next to the instructions which are self-explanatory.


Next is this obfuscated code.


In a nutshell, it fetches the delta to the pointer to the opcodes, adds the VM entry point, and fetches 2 DWORDs(aka 8 bytes).

CPU Disasm
Address   Hex dump      Command                        Comments
38D7035A  8B70 04       mov esi,dword ptr ds:[eax+4]
38D7035D  8B00          mov eax,dword ptr ds:[eax]

The first 4 bytes of the opcode are the modifier; the next 4 bytes are the obfuscated address of the handler.
The modifier is used to calculate the next handler address. The modifier stored in the VM context is updated with this new one after some xor and shift operations are performed.

The "encrypted" address of the first handler is decrypted with a XOR.
xor esi,48371826
It's then copied to eax.
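As a sketch of that fetch-and-decrypt step, the Python below unpacks one 8-byte opcode the way the two mov instructions do and applies the XOR key. The sample bytes are invented; only the layout (modifier first, handler second) and the key 48371826 come from the code above.

```python
import struct

HANDLER_XOR_KEY = 0x48371826

def decode_opcode(raw):
    """Split 8 opcode bytes into (modifier, decrypted handler value)."""
    modifier, obfuscated = struct.unpack_from("<II", raw, 0)
    return modifier, obfuscated ^ HANDLER_XOR_KEY

# Invented opcode: modifier 0x00001234, handler value 0x00000400 before encryption
sample = struct.pack("<II", 0x00001234, 0x00000400 ^ HANDLER_XOR_KEY)
modifier, handler = decode_opcode(sample)
print(hex(modifier), hex(handler))
```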

Finally

CPU Disasm
Address   Hex dump      Command                        Comments
38D70396  B9 19000000   mov ecx,19
38D7039B  83F1 1D       xor ecx,0000001D
38D7039E  01D9          add ecx,ebx
38D703A0  8301 08       add dword ptr ds:[ecx],8       <-- Add 8 bytes to VM EIP
38D703A3  FFE0          jmp eax

The VM EIP(program counter) is incremented by 8 bytes, and we jump to the address of the first handler.
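The mov/xor pair is just a disguised constant. A quick check in Python (the dictionary-based context is purely illustrative) shows that it resolves to offset 4, which matches the add dword ptr ds:[ebx+4],8 at the end of the deobfuscated handler listed further down.

```python
VM_EIP_OFFSET = 0x19 ^ 0x1D   # mov ecx,19 ; xor ecx,1D

def advance_vm_eip(context, step=8):
    """Mimic 'add dword ptr [ebx+offset],8' on a dict keyed by offset."""
    context[VM_EIP_OFFSET] = (context.get(VM_EIP_OFFSET, 0) + step) & 0xFFFFFFFF
    return context[VM_EIP_OFFSET]

context = {VM_EIP_OFFSET: 0}
advance_vm_eip(context)
print(hex(VM_EIP_OFFSET), context[VM_EIP_OFFSET])
```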

Here comes the juicy part. The jump to first handler goes to this code

Step 1.

First, ebx+20 is updated with the address of the next pseudo handler, for lack of a better word (stored as a value relative to the VM entry point, see Step 3). If we follow the jump, we end up where the actual first instruction of this particular handler is executed.

Step 2.

and if we follow the jump we end up at what I call the "dispatcher". 

Step 3.

The dispatcher adds the VM entry point to the value added in ebx+20 to form the address of the next pseudo handler.

Basically, from what I understood, a single handler, which is usually a sequence of instructions, has been split into several small pseudo handlers, each reached in three steps. In my opinion this is just obfuscation to slow down reverse engineering.
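The three-step hop can be modelled in a few lines. In this Python sketch each pseudo handler is a pair of (next delta, operation), and the dispatcher adds the VM entry point to the delta stored at context offset 20h. The sketch assumes the VM function address 38D70280 quoted earlier is the entry point meant here; the deltas and the split into three pseudo handlers are invented for the example.

```python
VM_ENTRY = 0x38D70280
NEXT_DELTA = 0x20          # context slot holding the next pseudo handler delta

def run_pseudo_handlers(table, first_address, context):
    """Follow the chain: store next delta, run one operation, dispatch."""
    trace = []
    address = first_address
    while address is not None:
        next_delta, operation = table[address]
        context[NEXT_DELTA] = next_delta            # step 1
        trace.append(operation)                     # step 2: the real instruction
        if next_delta is None:                      # end of this invented chain
            break
        address = VM_ENTRY + context[NEXT_DELTA]    # step 3: the dispatcher
    return trace

# Invented chain of three pseudo handlers making up one handler
table = {
    VM_ENTRY + 0x500: (0x9A0, "mov esi,dword ptr ds:[ebx+4]"),
    VM_ENTRY + 0x9A0: (0x720, "add esi,dword ptr ds:[ebx+0C]"),
    VM_ENTRY + 0x720: (None, "add esi,4"),
}
print(run_pseudo_handlers(table, VM_ENTRY + 0x500, {}))
```

Collecting only the step 2 operations, as the trace list does here, is the same flattening that produces a readable handler once the jumps and dispatcher code are stripped.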

Sometimes in Step 1 there is an additional instruction that moves a value into ebx+400, which is usually used to subtract 4 bytes from the stack pointer.

Now, if we follow each jump to where it leads, and remove all jumps and dispatcher code, the first handler's code is basically this.

mov esi,dword ptr ds:[ebx+4]
add esi,dword ptr ds:[ebx+0C]
add esi,4
push dword ptr ds:[esi]
pop edi
mov dword ptr ds:[ebx+400],4
sub esi,dword ptr ds:[ebx+400]
push dword ptr ds:[esi]
pop esi
mov cl,byte ptr ds:[ebx+10]
push eax
xor eax,esi
xor eax,dword ptr ss:[esp]
add esp,4
shl eax,10
shr eax,18
xor al,2A
add byte ptr ds:[ebx+10],al
mov eax,3BF7C894
push edi
push eax
mov eax,esi
shl eax,18
shr eax,18
ror al,cl
xor al,78
shl eax,2
add eax,ebx
mov edi,eax
pop eax
push eax
pop dword ptr ds:[edi]
pop edi
push edi
pop eax
sub al,cl
xor eax,00B42D00
add dword ptr ds:[ebx+4],8 <-- Update VM EIP with 8 bytes.

In my case, this handler basically calculated the address of another handler.
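Reading the listing line by line, the register shuffling reduces to three results: the byte at [ebx+10] is updated, the constant 3BF7C894 is stored into a context slot whose offset is derived from the modifier, and eax is derived from the second DWORD of the opcode. The Python sketch below is one reading of that data flow, assuming esi ends up holding the first opcode DWORD and edi the second; the function and variable names are illustrative, and the input values are made up.

```python
def ror8(value, count):
    """8-bit rotate right, as 'ror al,cl' does."""
    count = (count & 0x1F) % 8
    value &= 0xFF
    return ((value >> count) | (value << (8 - count))) & 0xFF

def first_handler(modifier, second_dword, key):
    """Return (new key byte, context offset written, final eax)."""
    cl = key & 0xFF                                   # mov cl,[ebx+10]
    # push eax / xor eax,esi / xor eax,[esp] leaves eax = esi;
    # shl 10 / shr 18 then isolates bits 8..15 of the modifier
    new_key = (key + (((modifier >> 8) & 0xFF) ^ 0x2A)) & 0xFF
    # low byte of the modifier, rotated and xored, times 4, added to ebx
    slot_offset = (ror8(modifier, cl) ^ 0x78) << 2    # 3BF7C894 is stored here
    low = ((second_dword & 0xFF) - cl) & 0xFF         # sub al,cl
    eax = ((second_dword & 0xFFFFFF00) | low) ^ 0x00B42D00
    return new_key, slot_offset, eax

new_key, slot_offset, eax = first_handler(0x00001234, 0x11223344, 3)
print(hex(new_key), hex(slot_offset), hex(eax))
```

Note that cl is read before the byte at [ebx+10] is updated, so both the rotate and the final subtraction use the old key.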

This isn't an exhaustive analysis of the SecuROM 8 VM; it barely scratches the surface. Furthermore, I have not identified the VM exit procedure.

Denuvo and VMProtect are the same?

Recently I've been reading about Denuvo, and how certain code seems not just similar but identical to that of VMProtect. Russian websites are also saying that Denuvo<=>VMProtect, indicating that perhaps the two companies are sharing the same code base: certain features in VMProtect appear in Denuvo and disappear from VMProtect, and vice versa.

Here is the article in question (Russian).

Wednesday, January 6, 2016

Just Cause 3 and Denuvo

So apparently the same thing is happening with JC3 that happened with FIFA15 and DAI.

The founder of notorious Chinese cracking forum 3DM is warning that given the current state of anti-piracy technology, in two years there might be no more pirate games to play. The claims come after attempts to breach the Denuvo security protecting Just Cause 3 pushed the group's cracking expert to breaking point.

Do you know what drives technological innovation? Competition! Right now anti-tamper/DRM solutions are being sought because of piracy, and get broken, which forces the authors to come up with new and interesting ways to prevent their solution from being broken.

The same cannot be said for "our" side, us reverse engineers. Most people keep their tools private, and we have a stagnation of publicly available tools to help us combat these new solutions and techniques.
Do you know why? Money. Denuvo is paid for, while cracking is something people do free of charge, so there is little incentive to release internal tools or docs, or to even bother.

Until then, this "prediction" of 3DM might have some merit.

Monday, August 17, 2015

mmap equivalent in Windows or How To Map Physical Memory to Userspace.

Windows unfortunately has no equivalent of mmap that can access physical memory, e.g. by mapping /dev/mem to some userspace address. However, this can be achieved with a simple (not really) kernel-mode driver that I've personally used on Windows 7 x64, though there should be no reason why it wouldn't work on Windows 8/8.1/10.

I found it in this article on CodeProject.

Now, even if you compile the driver, on x64 Windows systems the driver needs to be signed. For development purposes this requirement can be disabled; follow the article on MSDN on how to do that.

But beware: fiddling with physical memory can lead to some very dangerous results if you aren't careful, e.g. permanent hardware failure or data loss.
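For comparison, this is roughly what the mmap side looks like on Linux. The Python sketch maps a window of a file at a given byte offset. Pointing it at /dev/mem (as root, and subject to kernel restrictions) would be the physical-memory case, which is left here as an untested assumption; the demo uses an ordinary temporary file instead, and the function name is illustrative.

```python
import mmap
import os
import tempfile

def read_mapped(path, offset, length):
    """Map the region containing [offset, offset+length) and return its bytes."""
    granularity = mmap.ALLOCATIONGRANULARITY
    base = offset - (offset % granularity)   # mmap offsets must be aligned
    delta = offset - base
    with open(path, "rb") as handle:
        with mmap.mmap(handle.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base) as view:
            return view[delta:delta + length]

# Demo on a throwaway file rather than /dev/mem
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * mmap.ALLOCATIONGRANULARITY + b"\xde\xad\xbe\xef")
    demo_path = tmp.name
data = read_mapped(demo_path, mmap.ALLOCATIONGRANULARITY, 4)
os.remove(demo_path)
print(data.hex())
```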

Tuesday, August 4, 2015

The scary Virtual Machine

Sorry for the cheesy thread title, but I had no idea what to put there.

But anyway, I recently came across more virtual machines, and honestly, when you get to the gist of it, they aren't all that difficult to understand or implement.

For instance, this guy here wrote his own C compiler for the C89 standard and made it work for his own custom virtual CPU, for which he wrote several "emulators" (emulator, virtual machine, it's all the same in this context) in C, Java and finally JavaScript. This actually gave me an idea to implement some VMs in JavaScript as well; I mean, you can run the thing in your browser.

Now, virtual machines like VirtualBox, VMware and QEMU are different. They try to emulate a whole computer with its peripherals, and they also take advantage of a CPU's special virtualization features for hardware virtualization. They are indeed harder to write and understand, and I myself couldn't even begin to comprehend VirtualBox's code.

But we aren't interested in those(or at least I am not) right now, we just want to emulate a CPU, or even create our own, the sky is the limit.
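To show how little is needed, here is a toy CPU in Python. The instruction set (PUSH, ADD, MUL, HALT) is made up for this example and has nothing to do with any of the VMs mentioned above; the point is only the fetch-decode-execute loop.

```python
PUSH, ADD, MUL, HALT = range(4)

def run(program):
    """Execute a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    pc = 0
    while True:
        opcode, operand = program[pc]     # fetch
        pc += 1
        if opcode == PUSH:                # decode and execute
            stack.append(operand)
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown opcode {opcode}")

# (2 + 3) * 4
program = [(PUSH, 2), (PUSH, 3), (ADD, None), (PUSH, 4), (MUL, None), (HALT, None)]
print(run(program))
```

Adding registers, memory or jumps is a matter of a few more branches in the same loop.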

Monday, February 2, 2015

Back to the Roots.

Initially this blog was about compiling stuff for Windows, then I turned it into a RE blog, but today I plan to go back to the roots and post some stuff about compilation.

The goal was to cross-compile MySQL from Linux x86_64 to Android, ARM-v7a. But there are a few problems here.
1.) Google's NDK offers a very slimmed-down version of GCC, and it provides its own standard C library, Bionic, which is missing a ton of stuff: widechar support, for one, and many headers.
2.) The Crystax NDK does not add these missing headers, so the only option left was to compile my own GCC with GLIBC. I did that nearly 2 years ago with crosstool-ng and have successfully compiled PHP like that (but not with all features).
3.) Some targets built by the CMake system have to be run during the build, which isn't possible when cross-compiling. You first need to build MySQL for the host and then gather the tools you need, which further increases the complexity of compiling MySQL for Android.
4.) Static linking of GLIBC. This is the most important part, as it needs to be set in the CMAKE_C_FLAGS before you build (and the build takes a while).

Needless to say, I managed to compile it with default features, but I forgot to statically link GLIBC, and MySQL did not run.