Over the past couple of months some major changes have happened in my life. The biggest one was the sudden passing of my father. He died in a car crash, and this affected me deeply. That, together with a sudden loss of interest in RE shortly before, made me step away for some time.
In the meantime, Denuvo was cracked, so there is no point in pursuing that anymore. x64dbg has progressed marvelously, with ever more features, bug fixes and usability improvements, so I believe it has officially surpassed Ollydbg, not to mention its x64 support, which is the de facto standard these days.
Monday, December 19, 2016
Tuesday, July 5, 2016
Useless post.
Going to dig in a bit. Presumably what I am looking at is the Origin DRM and not Denuvo. And yes, that is a (useless) screenshot of the first thing I see when I load up an Origin game in x64dbg, so it's nothing special.
Quite frankly, I am angry about the shitposting on reddit regarding Denuvo: the false promises, the contradictory information about what Denuvo does, how it works, etc.
How about I post what I find, along with screenshots (or videos?) that at least prove or disprove something, or shed some light on the functionality?
Saturday, April 16, 2016
Might take a look at Unravel.
Unlike the other games, this one seems interesting, as I actually want to play it. It uses Denuvo. This shouldn't be interpreted as if I am going to crack it; that will likely be impossible, as I have never studied Denuvo. Moreover, x64dbg still lacks a tracer and the underlying TitanEngine is wonky (aka the tools aren't mature enough; no offense, mrexodia).
Some progress on Crysis 3.
In light of the news that Crytek open sourced their entire engine, I am able to study the Crysis 3 machine code and identify critical code and structures.
One particularly annoying problem was the clipping/culling of characters at a really short distance in both singleplayer and multiplayer; in multiplayer this is unacceptable. There is no option that fixes this, so obviously I had to dig into the code, but it is C++ and it's also lots of math (I mean, this is 3D programming), both of which I am unfamiliar with. So I opted to use Cheat Engine to find the variable responsible, which led me to a piece of code holding a float that, when changed, turned out to partly affect the drawing distance. Unfortunately, multiple subsystems use this value and changing it fucked up the game. The code, which I couldn't have identified without the CryEngine source, was the CCamera class constructor, and the particular value might be related to the Z buffer or a frustum plane. If I can find where they are calculated, perhaps I can fix this drawing issue.
I also managed to enable all CryEngine commands, though not all are modifiable. One in particular is the Nanovision blur command, which might help people in multiplayer.
Sunday, April 3, 2016
Great article on API hooking!
As I was browsing RE related blogs and articles I stumbled upon this gem.
Amazing article that gave me more insight on how to perform more stealthy hooking.
Monday, March 28, 2016
Analyzing the SecuROM 8.10.X VM Part 2
This will be a small post that I will update as I discover more.
If you go back to the previous article, I mentioned the part where SecuROM walks all 100 possible VM contexts and finds the first free one. Well, the table starts at address 0x3BF7C2FC. Since all the contexts are allocated from a single heap (likely created by HeapCreate, with the memory later allocated using RtlAllocateHeap and said heap handle), they are contiguous, so I subtracted the address of one context from the next one in the table and came up with a size of 0x460 (1120) bytes for each VM context. This includes the VM context structure and the scratchpad, which is the area where the virtual registers are written and where code is written and executed.
Address 0x3BF7C2F8 is the spinlock value.
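To make the layout a bit more concrete, here is a minimal Python sketch of the arithmetic. The two addresses and the 0x460 size come from my notes above; the function names, the `first_context` argument and the list of busy flags are made up for illustration, since I haven't mapped the real structures.

```python
# Addresses and size are from the post; everything else is a hypothetical model.
CONTEXT_TABLE = 0x3BF7C2FC   # start of the table of VM contexts
SPINLOCK = 0x3BF7C2F8        # spinlock value, the DWORD right before the table
CONTEXT_SIZE = 0x460         # 1120 bytes: context structure + scratchpad
MAX_CONTEXTS = 100


def context_address(first_context, index):
    """Address of the index-th context, assuming the contexts are contiguous."""
    if not 0 <= index < MAX_CONTEXTS:
        raise IndexError("there are only 100 VM contexts")
    return first_context + index * CONTEXT_SIZE


def first_free(busy_flags):
    """Index of the first context whose busy flag is 0, or None if all are taken."""
    for index, busy in enumerate(busy_flags[:MAX_CONTEXTS]):
        if busy == 0:
            return index
    return None
```

The contiguity is what makes the size easy to recover: any two neighbouring contexts are exactly 0x460 bytes apart.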
Now, when analyzing the first VM program, on the surface it felt like it wouldn't do much, but I actually saw that before it zeroes out the busy flag, it recursively (although I could be wrong) calls the VM a dozen more times with different arguments. That much code will take very long to trace. In addition to this, it also writes code to the scratchpad and executes it there.
Now, let's analyze the first obfuscated VM handler from the previous post:
mov esi,dword ptr ds:[ebx+4]
add esi,dword ptr ds:[ebx+0C]
add esi,4
This part fetches the current VM EIP (which is a delta from the VM entry point) and adds the entry point to compute the pointer to the opcodes. "add esi, 4" then advances that pointer by 4 bytes, so it points at the second DWORD of the opcode.
push dword ptr ds:[esi]
pop edi
Translates to exactly mov edi, dword ptr ds:[esi] which moves the new opcode into register EDI.
mov dword ptr ds:[ebx+400],4
sub esi,dword ptr ds:[ebx+400]
This is both boring and interesting. It's boring because it just moves the opcode pointer back by 4 bytes; it's interesting because, instead of fetching the first DWORD and then advancing by 4 bytes to get the next one, the handler does it backwards. It's also interesting because offset 0x400 in the VM context is referenced in lots of other places. Simplified, this is the equivalent of "sub esi, 4".
push dword ptr ds:[esi]
pop esi
Translates to "mov esi, dword ptr ds:[esi]". Again, moving an opcode into register ESI.
mov cl,byte ptr ds:[ebx+10]
Moves the modifier from the VM context into the 8-bit register CL.
push eax
xor eax, esi
xor eax,dword ptr ss:[esp]
add esp,4
While EAX isn't referenced before this point, it contains the address of the start of the handler, in my case 0x38F78918.
The address of the handler is xor'ed with the opcode we loaded into ESI: 38F78918 ^ 687ADD02 = 508D541A. And because push eax pushed 38F78918 onto the stack, xor eax,dword ptr ss:[esp] computes 508D541A ^ 38F78918 = 687ADD02.
To summarize, the old EAX value cancels itself out (the same xor trick used for swapping values), so the whole sequence can be translated as mov eax, esi.
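For anyone who wants to convince themselves without a debugger, the four instructions can be replayed in a few lines of Python. This is only a sketch of the sequence as I traced it; the function name and the list standing in for the stack are mine.

```python
MASK32 = 0xFFFFFFFF


def xor_move(eax, esi):
    """Replay push eax / xor eax,esi / xor eax,[esp] / add esp,4."""
    stack = [eax]                      # push eax
    eax = (eax ^ esi) & MASK32         # xor eax, esi
    eax = (eax ^ stack[-1]) & MASK32   # xor eax, dword ptr ss:[esp]
    stack.pop()                        # add esp, 4
    return eax


handler_address = 0x38F78918   # EAX on entry, from my trace
opcode = 0x687ADD02            # ESI, the opcode DWORD
```

Whatever EAX held on entry is xor'ed in and then xor'ed out again, so the result is always the value of ESI.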
shl eax,10
shr eax,18
xor al,2A
add byte ptr ds:[ebx+10],al
So the opcode we previously moved into ESI, and then into EAX, is used to update the modifier, which by default is always 0x95. The shifts chop off bits to extract the 3rd byte of the opcode (0xDD), which is decrypted with the xor (0xDD ^ 0x2A = 0xF7) and added to the modifier. Since it is a 1 byte add, 0x95 + 0xF7 wraps around to 0x8C.
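Working the numbers through with the values from my trace, the updated modifier comes out as 0x8C, which is also the value the second handler later loads into CL. A small Python sketch of the byte extraction and the 1 byte add (helper names are mine):

```python
def extract_byte(opcode, shl, shr):
    """shl eax,N followed by shr eax,M on a 32-bit register."""
    return ((opcode << shl) & 0xFFFFFFFF) >> shr


def update_modifier(modifier, opcode, key):
    byte = extract_byte(opcode, 0x10, 0x18)   # shl eax,10 / shr eax,18
    al = byte ^ key                           # xor al,2A
    return (modifier + al) & 0xFF             # add byte ptr ds:[ebx+10],al
```

With the default modifier 0x95, opcode 687ADD02 and key 0x2A, the extracted byte is 0xDD and the new modifier is 0x8C.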
mov eax,3BF7C894
push edi
push eax
mov eax,esi
shl eax,18
shr eax,18
ror al,cl
xor al,78
shl eax,2
The constant being moved to EAX will be discussed later on, as it is saved on the stack for later use. We will focus on mov eax, esi, which moves the opcode 687ADD02 into EAX. The next two shifts extract the 4th byte, which is 0x2, and ror al,cl rotates it right by the modifier still held in CL (0x95, loaded before the update above). Since 0x95 is far larger than the width of AL, the count wraps around: the CPU masks it to 5 bits (0x15) and an 8-bit rotate repeats every 8 positions, so the effective rotation is 5 to the right, which is the same as rotating left by 3. We get 0x10. The value is then xor'ed with 0x78 to produce 0x68, which is left shifted by 2 to produce 0x1A0.
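The rotate is the only non-obvious step here, so here is a Python sketch of it. The masking of the count to 5 bits is standard x86 behaviour; the function names are mine, and the keys 0x78 and 2 are simply the ones this particular handler uses.

```python
def ror8(value, count):
    """Rotate an 8-bit value right the way 'ror al,cl' does."""
    count &= 0x1F        # x86 masks the rotate count to 5 bits
    count %= 8           # an 8-bit rotate repeats every 8 positions
    value &= 0xFF
    return ((value >> count) | (value << (8 - count))) & 0xFF


def register_offset(opcode, modifier):
    al = opcode & 0xFF           # shl eax,18 / shr eax,18 keeps the low byte
    al = ror8(al, modifier)      # ror al,cl
    al ^= 0x78                   # xor al,78
    return al << 2               # shl eax,2
```

For opcode 687ADD02 and modifier 0x95 this gives ror8(0x02, 0x95) = 0x10 and a final offset of 0x1A0.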
add eax,ebx
mov edi,eax
pop eax
push eax
So all that junk above just computes offset 0x1A0. EBX, which contains the address of the VM context, is then added to it and the result is moved into EDI. Now remember the constant from before, 0x3BF7C894: it's moved back into EAX via pop eax and then pushed again. So pop eax, push eax can be translated as mov eax, dword ptr ss:[esp].
pop dword ptr ds:[edi]
The constant is stored to where EDI points to via that pop. It points to offset 0x1A0 in the VM context.
pop edi
push edi
pop eax
So what happens here? Well, pop edi moves one of the opcodes (the DWORD pushed earlier with push edi) into EDI, which is then pushed onto the stack again and popped right back into EAX. So we can translate the last two instructions as mov eax, edi. The opcode was 3852A852.
sub al,cl
xor eax,00B42D00
So CL (0x95) is subtracted from the last byte of opcode 3852A852, producing 3852A8BD (0x52 - 0x95 wraps around to 0xBD), which is then xor'ed with 00B42D00. The final value in EAX is 38E685BD.
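The same decoding in Python, as a sketch with a made-up function name. The point to notice is that sub al,cl only touches the low byte, so the upper three bytes of the opcode go into the xor untouched.

```python
def next_handler(opcode, modifier, key):
    """Replay sub al,cl / xor eax,key on the second opcode DWORD."""
    al = (opcode - modifier) & 0xFF     # sub al,cl, wraps within one byte
    eax = (opcode & 0xFFFFFF00) | al    # upper 24 bits are unchanged
    return eax ^ key                    # xor eax,00B42D00
```

Feeding it 3852A852, the modifier 0x95 and the key 00B42D00 yields the address of the next handler, 38E685BD.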
add dword ptr ds:[ebx+4],8 <-- Update VM EIP with 8 bytes.
jmp eax <-- Jump to computed handler.
Pretty self-explanatory. VM EIP is incremented by 8 bytes, and we jump to the address in EAX, which is the next handler.
So what this handler does, in a nutshell, is not only store the constant 3BF7C894 at offset 0x1A0 (this could be a virtual register), but also compute the address of the next handler. So we can probably simplify this handler to "mov reg, imm", or "mov reg1A0, 3BF7C894".
UPDATE:
Let's look at the handler that we jump to, which is the second handler. First thing to notice is this is a 12 byte opcode, and not 8 as with the previous one.
mov esi, dword ptr [ebx + 4]
add esi, dword ptr [ebx + 0xc]
Standard VM EIP delta, with the VM address being added to it.
add esi, 4
mov edi, dword ptr [esi]
One opcode being loaded into EDI, note again how ESI was incremented by 4, so it's loading the second DWORD opcode. The value is 02C4EC00.
mov dword ptr [ebx + 0x400], 4
sub esi, dword ptr [ebx + 0x400]
push dword ptr [esi + 8]
ESI is moved back by 4 so that it points to the first DWORD again, then the value at offset 0x8 from it, which is the 3rd opcode DWORD, is pushed on the stack. So it's stored for later use.
push esi
xor esi, dword ptr [esi]
xor esi, dword ptr [esp]
add esp, 4
Again, standard swap using the xor trick. So essentially, this is mov esi, dword ptr ds:[esi]. We just loaded the first opcode.
mov cl, byte ptr [ebx + 0x10]
The modifier in the VM context is loaded into CL.
push eax
xor eax, esi
xor eax, dword ptr [esp]
add esp, 4
This piece translates to exactly, mov eax, esi.
shl eax, 0x10
shr eax, 0x18
xor al, 0x40
add byte ptr [ebx + 0x10], al
The value in EAX is E518721C. (0xE518721C << 0x10) >> 0x18 = 0x72, and 0x72 ^ 0x40 = 0x32. We are extracting the third byte, decrypting it with the xor and adding it to the modifier (0x8C + 0x32 = 0xBE).
push eax
xor eax, edi
xor eax, dword ptr [esp]
add esp, 4
Don't even need to see this in action to know that it is doing mov eax, edi.
xor eax, 0x2c4ec60
So 02C4EC00 ^ 02C4EC60 = 0x60.
sub esp, 4
mov dword ptr [esp], eax
This can simply be interpreted as push eax.
push esi
pop eax
Seems like we are moving what was in ESI to EAX, i.e. mov eax, esi.
shl eax, 0
shr eax, 0x18
rol al, cl
xor al, 0x36
shl eax, 2
This piece is decrypting the value using the modifier. The first left shift is redundant; the right shift leaves the top byte, 0xE5. CL is 0x8C, which for an 8-bit rotate is effectively a rotation by 4, so the whole operation is as follows: rol(0xE5, 4) = 0x5E, 0x5E ^ 0x36 = 0x68, 0x68 << 2 = 0x1A0. Woohoo, so it's our virtual register where we stored our constant before.
push eax
add dword ptr [esp], ebx
pop eax
The value in EAX is now 0x1A0. It is pushed on the stack, EBX is added to it(it contains the VM context address) and is popped back into EAX.
push edi
xor edi, eax
xor edi, dword ptr [esp]
add esp, 4
Translates to mov edi, eax.
pop eax
push eax
neg eax
sub dword ptr [edi], eax
At this point, pop eax moves the value 0x60 that was pushed earlier into EAX and pushes it back on the stack again. It is then negated, which is to say 0 - 0x60 = FFFFFFA0, and subtracted from the virtual register reg1A0, which holds the constant 3BF7C894. Subtracting -0x60 is the same as adding 0x60, so the result is 3BF7C8F4.
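The neg/sub pair is just an obfuscated add, which is easy to check with 32-bit wraparound arithmetic. A quick Python sketch (helper names are mine):

```python
MASK32 = 0xFFFFFFFF


def neg32(value):
    """neg eax on a 32-bit register."""
    return (-value) & MASK32


def sub32(a, b):
    """sub dword ptr [edi], eax with 32-bit wraparound."""
    return (a - b) & MASK32


reg1A0 = 0x3BF7C894   # constant stored by the first handler
imm = 0x60            # decrypted immediate
```

sub32(reg1A0, neg32(imm)) and a plain (reg1A0 + imm) give the same value, 3BF7C8F4.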
mov dword ptr [ebx + 0x400], 0x2152f
mov eax, dword ptr [ebx + 0x400]
pop eax
Completely redundant operation. You move the value from EBX+400 to eax, but then do pop eax, which moves the value 0x60 to EAX overwriting the previous value. Which is also irrelevant, because it's overwritten in the next sequence.
pop eax
rol al, cl
xor eax, 0x2c4ec60
add dword ptr [ebx + 4], 0xc
jmp eax
So the first instruction pops the value 3A2245FD, which is the third opcode DWORD, into EAX. Its last byte 0xFD is rotated left by CL (0x8C, effectively 4) to produce the value 3A2245DF, which is then xor'ed with 02C4EC60: 3A2245DF ^ 02C4EC60 = 38E6A9BF.
VM EIP is then incremented by 12!!! bytes and we jump to the next handler at address 38E6A9BF.
So this handler essentially does add reg1A0, 0x60, or "add vmreg, imm"?
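Putting the whole second handler together, here is a rough Python model that takes the three opcode DWORDs and the current modifier and returns what the handler works out. It is only a sketch of this one handler as I traced it: the function names and the tuple it returns are mine, and the keys (0x40, 0x36, 02C4EC60) are hardcoded because they belong to this specific handler.

```python
def rol8(value, count):
    """Rotate an 8-bit value left the way 'rol al,cl' does."""
    count = (count & 0x1F) % 8   # 5-bit count mask, then 8-bit periodicity
    value &= 0xFF
    return ((value << count) | (value >> (8 - count))) & 0xFF


def decode_add_imm(op1, op2, op3, modifier):
    """Model of the second handler for one 12-byte opcode."""
    cl = modifier                                    # mov cl,[ebx+0x10]
    # shl 0x10 / shr 0x18 / xor al,0x40 / add [ebx+0x10],al
    new_modifier = (modifier + (((op1 >> 8) & 0xFF) ^ 0x40)) & 0xFF
    imm = op2 ^ 0x02C4EC60                           # decrypted immediate
    reg = (rol8(op1 >> 24, cl) ^ 0x36) << 2          # virtual register offset
    # rol al,cl only changes the low byte of the third DWORD
    nxt = ((op3 & 0xFFFFFF00) | rol8(op3, cl)) ^ 0x02C4EC60
    return reg, imm, new_modifier, nxt
```

Called with E518721C, 02C4EC00, 3A2245FD and the modifier 0x8C, it returns offset 0x1A0, immediate 0x60, new modifier 0xBE and next handler 38E6A9BF, i.e. "add reg1A0, 0x60" followed by a jump.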
Tuesday, March 22, 2016
Anti-dumping trick or coincidence?
In MSVC the way to name/rename a thread is to call RaiseException with special parameters. More information can be found on MSDN.
The exception value is specific: it's 0x406D1388. If you see this exception value, it likely means that an application is trying to set a name for its thread.
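For reference, this is roughly what the exception carries, modelled in Python with ctypes. The structure layout follows the THREADNAME_INFO definition from the MSDN article; the helper functions are hypothetical, and nothing here actually raises the exception, it only builds and recognises the data.

```python
import ctypes

MS_VC_EXCEPTION = 0x406D1388   # exception code used for thread naming


class THREADNAME_INFO(ctypes.Structure):
    """Argument block passed to RaiseException, per the MSDN sample."""
    _fields_ = [
        ("dwType", ctypes.c_uint32),       # must be 0x1000
        ("szName", ctypes.c_char_p),       # pointer to the ANSI thread name
        ("dwThreadID", ctypes.c_uint32),   # thread ID, 0xFFFFFFFF (-1) = caller
        ("dwFlags", ctypes.c_uint32),      # reserved, must be zero
    ]


def make_thread_name_info(name, thread_id=0xFFFFFFFF):
    """Hypothetical helper: fill in the block for a given name."""
    return THREADNAME_INFO(0x1000, name.encode("ascii"), thread_id, 0)


def is_thread_naming_exception(code):
    """Hypothetical helper for a debugger script: spot the naming exception."""
    return code == MS_VC_EXCEPTION
```

In a debugger script, a check like is_thread_naming_exception lets you pass this exception straight back to the application instead of breaking on it.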
This is the case in CryEngine; however, in my dumped exe the code was failing. The exception handler used was _except_handler3, which appears to be the generic handler from the MSVC runtime, and which in turn uses a secondary, user-provided handler table.
_except_handler3 has an internal check that calls VirtualQuery on this handler table and inspects the page protection, specifically the Protect member of MEMORY_BASIC_INFORMATION. Since the (user-supplied) handler table was in the .rdata section, the protection should have been PAGE_READONLY, but in my case it was PAGE_WRITECOPY. The section was not read-only because, when I rebuilt the imports, some of the pointers were in the .rdata section, so the import reconstructor made it writable. This causes _except_handler3's page protection check to fail, so it never calls any of the handlers, and the thread renaming exception never gets handled, leading to a crash early on.
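The idea behind the check can be modelled in a few lines of Python. To be clear, this is a simplified model of the behaviour I observed, not the CRT's actual code: the PAGE_* values are the documented Windows constants, while the function and its exact rule (reject anything writable) are my assumption.

```python
# Documented Windows memory protection constants.
PAGE_READONLY = 0x02
PAGE_READWRITE = 0x04
PAGE_WRITECOPY = 0x08
PAGE_EXECUTE_READ = 0x20
PAGE_EXECUTE_READWRITE = 0x40
PAGE_EXECUTE_WRITECOPY = 0x80

WRITABLE = (PAGE_READWRITE | PAGE_WRITECOPY |
            PAGE_EXECUTE_READWRITE | PAGE_EXECUTE_WRITECOPY)


def handler_table_accepted(protect):
    """Simplified model: refuse a handler table that sits in writable memory."""
    return (protect & WRITABLE) == 0
```

Under this model the original .rdata (PAGE_READONLY) passes, while my dump's PAGE_WRITECOPY section is refused, which matches the handlers never being called.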
But here's the problem: if I make the .rdata section read-only, the PE loader fails early, as it cannot write the resolved imports to the FirstThunk (IAT) addresses. So essentially, I need proper import rebuilding.
UPDATE: After consulting with other people, who pretty much solved my problem, I learned that it is normal for the IAT to reside in .rdata or another read-only section, and the loader shouldn't choke on it. We're not quite sure what the exact cause was, but upon closer inspection the IAT RVA and IAT Size were set to 0 in the PE header. Upon fixing this, all was well.
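If you want to check a dump for the same problem, the IAT entry is data directory number 12 in the optional header. Here is a small Python sketch that reads it straight from the raw file bytes; the function name is mine and it does no validation beyond the signatures.

```python
import struct

IMAGE_DIRECTORY_ENTRY_IAT = 12


def iat_directory(pe_bytes):
    """Return (rva, size) of the IAT data directory from raw PE file bytes."""
    if pe_bytes[:2] != b"MZ":
        raise ValueError("missing MZ header")
    e_lfanew = struct.unpack_from("<I", pe_bytes, 0x3C)[0]
    if pe_bytes[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    optional = e_lfanew + 4 + 20          # signature + IMAGE_FILE_HEADER
    magic = struct.unpack_from("<H", pe_bytes, optional)[0]
    if magic == 0x10B:                    # PE32
        directories = optional + 96
    elif magic == 0x20B:                  # PE32+
        directories = optional + 112
    else:
        raise ValueError("unknown optional header magic")
    entry = directories + 8 * IMAGE_DIRECTORY_ENTRY_IAT
    return struct.unpack_from("<II", pe_bytes, entry)
```

A result of (0, 0) on a dump whose imports were rebuilt is exactly the situation I ran into.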
Thursday, March 17, 2016
For those of you that have time to spare.
As we have more or less shifted to x64, we find ourselves in need of new tools, in this case free debuggers. I don't think we will ever see Olly64 come to fruition, as the developer has not posted any updates in years. With that in mind, I urge those who have the time and like to contribute to open source projects to check out x64dbg. It has the potential to replace Olly for both 32 and 64-bit. Unfortunately, the devs need help: there are only 2-3 active contributors, and lots of features that would make the debugger more useful are still missing.
It comes with the Snowman decompiler built-in, although I've found it to be less than accurate. There are also plans to have graphs, just like IDA, but so far nobody has come forth to contribute.
With your contributions, you will be indirectly helping to defeat current and future x64 protections (Denuvo/VMProtect being one). Of course, if you loathe piracy, then the other reason is malware research.
Addendum: You can also join the development channel on #x64dbg@irc.freenode.net.
Labels: contribute, crack, debugger, decompiler, denuvo, free, graphs, help, open source, understaffed, urge, vmprotect, x64dbg
Monday, February 15, 2016
Have I found the mysterious anti-debug?
In my previous post I mentioned that running the game under a debugger, would, after a while, force terminate the game.
I speculated either the debugger was being found by an API directly, or indirectly via a timing anti-debug.
I did some experiments, and the evidence points to a timing anti-debug. The time it takes to terminate the game is variable and, it turns out, it only happens if the performance of the game is rather bad. In this case, it was Olly 2's fault. There seems to be some kind of bug in Ollydbg 2.01 whereby all threads of a running application are constantly suspended and resumed; the game still runs, although with a 30-35% penalty. The timing anti-debug sees that more ticks are being expended than normal and, with careful communication between two threads, it calls NtTerminateProcess by spawning several threads that point to a VM program (only one of the threads has a different VM program than the rest).
In most cases what Olly is doing is normal behaviour, it's how it's usually done, but not in my case: I've observed Olly idling and not doing this suspend/resume thing. The bug seems to disappear (at least in my case) once I hit a memory breakpoint. Then Olly acts normally and the game does not terminate, or at least not as fast as before; if the average grows as time passes because of small slowdowns, it will terminate eventually.
I looked at my trace log of one of the obfuscated threads and, lo and behold, there is an RDTSC at address 3955DEA9 (quick reminder: there is no ASLR). The result of RDTSC is returned in EDX:EAX; these values are later used in a loop, encrypted and stored in a table.
Now that I know what is what, I can better understand the underlying algorithm. One thing is certain, the mystery is solved.
Quick reminder that timing anti-debugs are, in my opinion, the most difficult to handle; it isn't as easy as returning 0 from GetTickCount.
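To illustrate the kind of check I suspect, here is a toy Python model. It is purely hypothetical: I have not recovered the real algorithm, so the running average, the threshold and all the names are assumptions. The only part taken from the trace is that RDTSC returns its 64-bit counter split across EDX:EAX.

```python
def combine_tsc(edx, eax):
    """Join the EDX:EAX pair returned by RDTSC into one 64-bit value."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)


class TimingWatchdog:
    """Hypothetical check: trip once the average interval gets too large."""

    def __init__(self, threshold):
        self.threshold = threshold   # made-up limit, in ticks
        self.samples = 0
        self.average = 0.0
        self.last = None

    def observe(self, tsc):
        """Feed one timestamp; return True when the average exceeds the limit."""
        if self.last is not None:
            delta = tsc - self.last
            self.samples += 1
            # incremental running mean of the intervals seen so far
            self.average += (delta - self.average) / self.samples
        self.last = tsc
        return self.samples > 0 and self.average > self.threshold
```

A check built on an average would explain why the termination time varies: a few slow intervals are tolerated, but a sustained slowdown from the debugger eventually pushes the average over the limit.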
Labels: anti-debug, GetTickCount, rdtsc, securom, time, timing, vm
Wednesday, February 10, 2016
Hunting for the mysterious anti-debug.
Well, it's probably not that mysterious. But let me point you to my last post, where I mentioned the problem briefly in the last paragraph.
I also managed to manually patch the exe to disable code verification, so now I can tamper with some of the code (except the packed code, which I can only modify at runtime).
I mentioned there were two threads: one is 3939EF70 and the other is 3939F9C0. Since the game doesn't use ASLR and SecuROM expects most addresses to be the same on any system (they are hardcoded), there is no need to recompute them for each system.
The first thread to be started is 3939EF70. It is extremely obfuscated; here is but a sample of it:
lea esp,[esp-4]
mov dword ptr ss:[esp],ebp
mov ebp,esp
sub esp,4
mov dword ptr ss:[esp],32E
xor dword ptr ss:[esp],00000326
sub esp,dword ptr ss:[esp]
lea esp,[esp-4]
mov dword ptr ss:[esp],esi
mov eax,-1D9
mov eax,dword ptr ds:[eax+3C04C67D]
xor eax,0000911A
mov dword ptr ds:[3C04C4A4],eax
mov eax,-0E7
mov eax,dword ptr ds:[eax+3C04C58F]
xor eax,00009A12
mov dword ptr ds:[3C04C4A8],eax
mov eax,-55
mov eax,dword ptr ds:[eax+3C04C4F9]
add eax,dword ptr ds:[3C04C4A8]
mov dword ptr ds:[3C04C4AC],eax
call 3939EFE5
add dword ptr ss:[esp],3E
push dword ptr ss:[esp]
sub dword ptr ss:[esp],3B
push ebx
mov ebx,dword ptr ss:[esp+4]
xchg dword ptr ss:[esp],ebx
xchg dword ptr ss:[esp],ebp
mov ebp,dword ptr ss:[ebp]
sub ebp,13
xchg dword ptr ss:[esp],ebp
mov dword ptr ss:[esp+4],A60004C2
jmp short 3939F00F
retn 4
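Much of the noise in the listing above is constant obfuscation that folds away with plain arithmetic. The following is a minimal sketch of that folding; the helper names are made up for illustration, and only the numeric values are taken from the listing.

```python
MASK32 = 0xFFFFFFFF

def fold_xor(a: int, b: int) -> int:
    """`mov [esp],a` followed by `xor [esp],b` leaves a ^ b on the stack."""
    return (a ^ b) & MASK32

def fold_address(index: int, displacement: int) -> int:
    """`mov eax,index` followed by `mov eax,[eax+displacement]` reads index + displacement."""
    return (index + displacement) & MASK32

# Values copied from the listing above.
stack_adjust = fold_xor(0x32E, 0x326)          # operand of `sub esp,[esp]`
reads = [
    fold_address(-0x1D9, 0x3C04C67D),
    fold_address(-0x0E7, 0x3C04C58F),
    fold_address(-0x55, 0x3C04C4F9),
]

print(hex(stack_adjust))
print([hex(address) for address in reads])
```

Folded this way, the `sub esp` reserves 8 bytes, and the three obfuscated loads read from 3C04C4A4, 3C04C4A8 and 3C04C4A4, the same small block of hardcoded addresses that the plain `mov dword ptr ds:[...]` stores write to.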
The disassembly above comes from the tracer; in reality some of these instructions overlap. Initially this thread seems to just loop, comparing values at particular hardcoded addresses and jumping to different pieces of code depending on whether one is bigger or smaller than the other. At first, though, every branch ends up in the same place: a call to GetTickCount, 120 seconds added to the returned value, and then Sleep(120 seconds).
It repeats this Sleep indefinitely until the other thread, 3939F9C0, signals it by writing different values to these hardcoded addresses, which makes thread 3939EF70 take different branches. At some point, 3939EF70 starts another thread with CreateRemoteThread that executes the VM. Interestingly, there is no synchronization between the threads; both rely on the other one being in Sleep while one of them modifies the shared addresses.
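The signalling pattern described here (one thread polling shared memory and sleeping, the other writing to it, no lock anywhere) can be modelled roughly as below. This is only an illustration of the pattern, not SecuROM's code: the dictionary stands in for the hardcoded addresses, the command value 7 is invented, and the sleep is shortened from 120 seconds to a few milliseconds.

```python
import threading
import time

shared = {"command": 0}   # stands in for the hardcoded addresses
seen = []

def worker(poll_interval: float) -> None:
    # Stand-in for thread 3939EF70: check the shared value, sleep, repeat.
    while True:
        command = shared["command"]      # plain read, no lock
        if command != 0:
            seen.append(command)         # here the real thread would take another branch
            return
        time.sleep(poll_interval)        # the real thread sleeps for 120 seconds

def controller(command: int) -> None:
    # Stand-in for thread 3939F9C0: signal by overwriting the shared value.
    shared["command"] = command

def run_demo() -> list:
    shared["command"] = 0
    seen.clear()
    thread = threading.Thread(target=worker, args=(0.01,))
    thread.start()
    time.sleep(0.05)                     # let the worker poll a few times first
    controller(7)
    thread.join(timeout=5)
    return list(seen)

print(run_demo())
```

Python's interpreter lock hides the data race that the native code would have, so the sketch shows only the polling structure, not the risk of relying on Sleep instead of real synchronization.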
I mentioned an anti-debug, and there is one. The game runs under a debugger for as long as thread 3939F9C0 allows it to; then, at a random point between 5 and 30 minutes in (rarely longer), that thread calls NtTerminateProcess.
I've been speculating that this is a timing anti-debug: there is some 'avg' value that goes up as the debugger handles various events and generally slows the game down by 20-30%. As soon as this value crosses some threshold, the thread calls NtTerminateProcess. This is reinforced by the fact that if I start the game under a debugger and then detach, the game never terminates. If that is not the case, then I am being detected by a different method.
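To make the hypothesis concrete, here is a purely hypothetical model of such a check. Nothing in it is recovered from SecuROM: the moving average, the smoothing factor, the threshold and the sample values are all assumptions, chosen only to show how a 20-30% slowdown could accumulate until a limit is crossed.

```python
def first_trip(samples_ms, expected_ms=120_000.0, alpha=0.2, threshold=1.25):
    """Return the index of the first sample at which the averaged slowdown
    exceeds the threshold, or None if it never does (all parameters hypothetical)."""
    average = 1.0
    for index, measured in enumerate(samples_ms):
        ratio = measured / expected_ms               # 1.0 means "as fast as expected"
        average = (1 - alpha) * average + alpha * ratio
        if average > threshold:
            return index                             # the real check would terminate here
    return None

native = [120_000.0] * 20      # no debugger: measurements match expectations
debugged = [156_000.0] * 20    # a steady 30% slowdown under a debugger

print(first_trip(native))
print(first_trip(debugged))
```

A scheme like this would also fit the detach observation: once the debugger is gone, new samples pull the average back down before it reaches the threshold.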
Oh yes, I managed to manually patch the exe to disable code verification. Now I can tamper with some of the code (except the packed code, which I can only modify at runtime).
Sunday, February 7, 2016
SecuROM's anti-tampering verification is only one if?
I was wondering how to deal with it; turns out, I didn't have to. The verification was a loop that computed a checksum of the code and then compared the result to a DWORD from an array.
There were two different instances of this, and I simply had to patch the conditional jumps. One was a je short, the other a jne. I changed the je to a jmp and NOPed out the jne.
With this I can modify the code as I please.
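At the byte level the two patches look roughly like this. It is a sketch under assumptions: the sample bytes and offsets are invented, and since the post does not say whether the jne was the short or the near form, the helper handles both standard x86 encodings (75 xx and 0F 85 rel32).

```python
JE_SHORT = 0x74
JNE_SHORT = 0x75
JMP_SHORT = 0xEB
NOP = 0x90

def force_je_short(code: bytearray, offset: int) -> None:
    """Turn `je short rel8` (74 xx) into `jmp short rel8` (EB xx)."""
    if code[offset] != JE_SHORT:
        raise ValueError("no je short at this offset")
    code[offset] = JMP_SHORT                 # the rel8 operand stays untouched

def nop_jne(code: bytearray, offset: int) -> None:
    """Overwrite a jne (75 xx, or 0F 85 rel32) with NOPs of the same length."""
    if code[offset] == JNE_SHORT:
        length = 2
    elif code[offset:offset + 2] == b"\x0f\x85":
        length = 6
    else:
        raise ValueError("no jne at this offset")
    code[offset:offset + length] = bytes([NOP]) * length

# Hypothetical bytes: je short +5, nop, jne short +2, nop
code = bytearray.fromhex("74 05 90 75 02 90")
force_je_short(code, 0)
nop_jne(code, 3)
print(code.hex())
```

Keeping the patch the same length as the original instruction matters, because everything after it must stay at the same address.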
Saturday, February 6, 2016
3DM will stop cracking for a whole year!
In a surprise announcement, 3DM cracking group have said that they will stop cracking single-player games under the pretext that they want to see if sales of games increase.
More on this here.
Why did I say pretext? Remember they last said they had difficulty cracking the latest iteration of Denuvo? As of now I propose one of the following reasons:
- 3DM cannot crack Denuvo and want to work on it for the next year. This sounds more plausible. They might even drop out of the scene completely, only to re-emerge years later when things around Denuvo have quieted down.
- They are genuinely doing this for the stated reason.
- They have come to an understanding with Denuvo GmbH/VMProtect to cease their activities for now.
Sunday, January 31, 2016
Analyzing the SecuROM 8.10.X VM.
I want to thank ARTeam for providing the docs on the SecuROM 7.30 VM; they really helped and are mostly still relevant today.
That said, I have never worked on SecuROM 7.30 myself, and I believe the VM has changed since then.
Here is an overview of the VM initialization.
When entering the VM, an argument is pushed to the stack. It's a pointer to a pointer to the VM opcodes. I call this argument a "program".
The dummy call after pushfd is used to get the address of the VM.
In the picture above several things happen. A spinlock is taken by the thread which enters the VM and will initialize the context; all other threads, if any, wait until the first thread has finished the initialization.
Then SecuROM uses the x86 loop instruction to iterate over all 100 possible VM thread contexts and finds the first free one. The busy flag 0x66666666 indicates whether a context is busy or not. I should note that SecuROM 7.30 only supported up to 10 threads, while SecuROM 8.10 supports 100.
After the first free context is found, SecuROM jumps to the following code, which sets the busy flag.
lea edx,[ebx+24]
mov dword ptr ds:[edx],66666666
Then the VM context is zeroed out, but care is taken not to zero out the busy flag. Afterwards the lock is released and the jump "je short 38D702FE" takes us to the last step of the initialization.
pop ebx loads the VM context for this thread into ebx.
sub dword ptr ss:[esp],7 subtracts 7 bytes from the VM function address, which, as mentioned above, was pushed to the stack by the dummy call, so it ends up as 38D70280 in this exe.
This part fills the VM context struct. I've taken the liberty of adding captions next to the instructions, which are self-explanatory.
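The context allocation described above (take the lock, scan up to 100 contexts for one without the busy flag, mark it, zero the rest) can be sketched as follows. The count, the flag value and the 0x24 offset come from the post; the context size of 0x404 bytes, the Python lock standing in for the spinlock, and the assumption that ebx points at the candidate context during the scan are mine.

```python
import threading

BUSY = 0x66666666
BUSY_OFFSET = 0x24       # from `lea edx,[ebx+24]`
CONTEXT_COUNT = 100
CONTEXT_SIZE = 0x404     # assumption: just large enough to cover ebx+400

contexts = [bytearray(CONTEXT_SIZE) for _ in range(CONTEXT_COUNT)]
init_lock = threading.Lock()   # stands in for SecuROM's spinlock

def read_u32(buffer: bytearray, offset: int) -> int:
    return int.from_bytes(buffer[offset:offset + 4], "little")

def write_u32(buffer: bytearray, offset: int, value: int) -> None:
    buffer[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")

def acquire_context() -> int:
    """Claim the first free context and return its index."""
    with init_lock:
        for index, context in enumerate(contexts):
            if read_u32(context, BUSY_OFFSET) == BUSY:
                continue                                  # in use by another thread
            write_u32(context, BUSY_OFFSET, BUSY)         # mark it busy first
            # Zero everything except the busy flag itself.
            context[:BUSY_OFFSET] = bytes(BUSY_OFFSET)
            context[BUSY_OFFSET + 4:] = bytes(CONTEXT_SIZE - BUSY_OFFSET - 4)
            return index
    raise RuntimeError("all VM contexts are busy")

first = acquire_context()
second = acquire_context()
print(first, second)
```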
Next is this obfuscated code.
In a nutshell, it fetches the delta to the pointer to the opcodes, adds the VM entry point, and fetches 2 DWORDs (8 bytes).
CPU Disasm
Address Hex dump Command Comments
38D7035A 8B70 04 mov esi,dword ptr ds:[eax+4]
38D7035D 8B00 mov eax,dword ptr ds:[eax]
The first 4 bytes of the opcode are the modifier; the next 4 bytes are the obfuscated address of the handler.
The modifier is used to calculate the next handler address. The one in the VM context is updated with this new value after some XORs and shifts are performed.
The "encrypted" address of the first handler is decrypted with a XOR.
xor esi,48371826
It's then copied to eax.
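A minimal sketch of this fetch-and-decrypt step, assuming the 8-byte opcode layout described above (little-endian modifier first, obfuscated handler second). The XOR key is the one from the listing; the sample opcode bytes and the function name are invented for the example.

```python
import struct

HANDLER_XOR_KEY = 0x48371826   # from `xor esi,48371826`

def fetch_opcode(program: bytes, offset: int):
    """Read one 8-byte opcode and return (modifier, decrypted handler value)."""
    modifier, encrypted_handler = struct.unpack_from("<II", program, offset)
    return modifier, encrypted_handler ^ HANDLER_XOR_KEY

# Hypothetical opcode: an arbitrary modifier plus an "encrypted" handler address.
sample = struct.pack("<II", 0x11223344, 0x38D70500 ^ HANDLER_XOR_KEY)
modifier, handler = fetch_opcode(sample, 0)
print(hex(modifier), hex(handler))
```

The sketch covers only the XOR seen for the first handler; the modifier-driven XORs and shifts used for later handlers are not modelled.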
Finally
CPU Disasm
Address Hex dump Command Comments
38D70396 B9 19000000 mov ecx,19
38D7039B 83F1 1D xor ecx,0000001D
38D7039E 01D9 add ecx,ebx
38D703A0 8301 08 add dword ptr ds:[ecx],8 <-- Add 8 bytes to VM EIP
38D703A3 FFE0 jmp eax
The VM EIP(program counter) is incremented by 8 bytes, and we jump to the address of the first handler.
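The mov/xor pair at the top of that listing is another folded constant: 0x19 XOR 0x1D gives the context offset of the VM EIP. A short sketch, assuming the context is a flat little-endian byte buffer (the buffer size below is arbitrary):

```python
VM_EIP_OFFSET = 0x19 ^ 0x1D    # `mov ecx,19` followed by `xor ecx,0000001D`

def advance_vm_eip(context: bytearray, step: int = 8) -> int:
    """Model `add dword ptr [ebx+offset],8`: move the VM EIP to the next opcode."""
    start = VM_EIP_OFFSET
    value = (int.from_bytes(context[start:start + 4], "little") + step) & 0xFFFFFFFF
    context[start:start + 4] = value.to_bytes(4, "little")
    return value

context = bytearray(16)        # arbitrary size, large enough for offset 4
print(VM_EIP_OFFSET, advance_vm_eip(context), advance_vm_eip(context))
```

The folded offset is 4, which matches the plain `add dword ptr ds:[ebx+4],8` that closes the deobfuscated handler further down.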
Here comes the juicy part. The jump to the first handler goes to this code.
Step 1.
First, ebx+20 is updated with the address of the next pseudo handler, for lack of a better word. If we follow the jump, we end up where the actual first instruction of this particular handler is executed.
Step 2.
If we follow the next jump, we end up at what I call the "dispatcher".
Step 3.
The dispatcher adds the VM entry point to the value stored in ebx+20 to form the address of the next pseudo handler.
Basically, from what I understood, a single handler, which is usually a sequence of instructions, has been split into several small pseudo handlers, each reached in three steps. In my opinion this is just obfuscation to slow down reverse engineering.
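The three-step chaining can be modelled as a small loop. This is a sketch of the control flow only: the fragment addresses and deltas are invented, the instructions are kept as strings rather than executed, and the end marker is a modelling convenience, since the real VM exit has not been identified.

```python
VM_ENTRY = 0x38D70280   # VM entry point in this exe, per the post
NEXT_SLOT = 0x20        # context offset written in Step 1 (ebx+20)
END = None              # hypothetical marker for "handler finished"

def run_handler(fragments, first_address):
    """Follow the pseudo handler chain and return the instructions in order."""
    context = {NEXT_SLOT: 0}
    executed = []
    address = first_address
    while True:
        instruction, next_delta = fragments[address]
        if next_delta is not END:
            context[NEXT_SLOT] = next_delta                    # Step 1: store next delta
        executed.append(instruction)                           # Step 2: the real instruction
        if next_delta is END:
            return executed
        address = (VM_ENTRY + context[NEXT_SLOT]) & 0xFFFFFFFF  # Step 3: dispatcher

# Hypothetical fragments, deliberately scattered: address -> (instruction, next delta)
fragments = {
    VM_ENTRY + 0x100: ("mov esi,dword ptr ds:[ebx+4]", 0x340),
    VM_ENTRY + 0x340: ("add esi,dword ptr ds:[ebx+0C]", 0x1C8),
    VM_ENTRY + 0x1C8: ("add esi,4", END),
}
print(run_handler(fragments, VM_ENTRY + 0x100))
```

Recording the instructions in execution order and dropping the jumps and dispatcher code is the same flattening that turns the scattered fragments into one linear handler.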
Sometimes in Step 1 there is an additional instruction that moves a value into ebx+400, which is usually used to subtract 4 bytes from the stack pointer.
Now, if we follow each jump to where it leads and remove all jumps and dispatcher code, the first handler's code is basically this.
mov esi,dword ptr ds:[ebx+4]
add esi,dword ptr ds:[ebx+0C]
add esi,4
push dword ptr ds:[esi]
pop edi
mov dword ptr ds:[ebx+400],4
sub esi,dword ptr ds:[ebx+400]
push dword ptr ds:[esi]
pop esi
mov cl,byte ptr ds:[ebx+10]
push eax
xor eax,esi
xor eax,dword ptr ss:[esp]
add esp,4
shl eax,10
shr eax,18
xor al,2A
add byte ptr ds:[ebx+10],al
mov eax,3BF7C894
push edi
push eax
mov eax,esi
shl eax,18
shr eax,18
ror al,cl
xor al,78
shl eax,2
add eax,ebx
mov edi,eax
pop eax
push eax
pop dword ptr ds:[edi]
pop edi
push edi
pop eax
sub al,cl
xor eax,00B42D00
add dword ptr ds:[ebx+4],8 <-- Update VM EIP with 8 bytes.
In my case, this handler basically calculated the address of another handler.
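For readers who prefer to follow the data flow in a high-level language, here is a line-by-line Python model of the listing above. The instruction semantics are standard x86, but the setup is assumed: the context is a flat 0x404-byte buffer, memory is a dictionary of dwords, and the meaning I attach to ebx+0C (a base added to the VM EIP) is a guess. The sample values are invented.

```python
MASK32 = 0xFFFFFFFF

def read_u32(buffer, offset):
    return int.from_bytes(buffer[offset:offset + 4], "little")

def write_u32(buffer, offset, value):
    buffer[offset:offset + 4] = (value & MASK32).to_bytes(4, "little")

def ror8(value, count):
    """Rotate an 8-bit value right; only the low 3 bits of the count matter."""
    count &= 7
    return ((value >> count) | (value << (8 - count))) & 0xFF

def first_handler(context, memory):
    """Model the flattened handler; returns the value left in eax."""
    opcode_address = (read_u32(context, 0x04) + read_u32(context, 0x0C)) & MASK32
    second = memory[(opcode_address + 4) & MASK32]   # push [esi] / pop edi
    write_u32(context, 0x400, 4)                     # scratch used by `sub esi,[ebx+400]`
    first = memory[opcode_address]                   # push [esi] / pop esi
    key = context[0x10]                              # mov cl,[ebx+10]
    # push eax / xor eax,esi / xor eax,[esp] leaves eax = esi; shl 10h, shr 18h
    # then keeps bits 8..15, which are XORed with 0x2A and added to the key byte.
    context[0x10] = (key + (((first >> 8) & 0xFF) ^ 0x2A)) & 0xFF
    # Low byte of the first dword, rotated by the old key and XORed with 0x78,
    # selects a dword slot in the context that receives the constant.
    slot = ror8(first & 0xFF, key) ^ 0x78
    write_u32(context, slot * 4, 0x3BF7C894)
    # eax = second dword, `sub al,cl`, then `xor eax,00B42D00`.
    low = ((second & 0xFF) - key) & 0xFF
    result = ((second & 0xFFFFFF00) | low) ^ 0x00B42D00
    write_u32(context, 0x04, read_u32(context, 0x04) + 8)   # advance the VM EIP
    return result

# Hypothetical state: base 0x1000, VM EIP 0, key byte 3, one opcode in memory.
context = bytearray(0x404)
write_u32(context, 0x0C, 0x1000)
context[0x10] = 3
memory = {0x1000: 0x0000A1F0, 0x1004: 0x12345678}
result = first_handler(context, memory)
print(hex(result), hex(context[0x10]), hex(read_u32(context, 0x04)))
```

Read this way, the handler updates a rolling key byte at ebx+10, stores a constant into a key-selected context slot, and leaves a value derived from the second opcode dword in eax. How that value becomes the next handler address is not shown in the listing and is not modelled here.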
This isn't an exhaustive analysis of the SecuROM 8 VM; it barely scratches the surface. Furthermore, I have not identified the VM exit procedure.
Denuvo and VMProtect are the same?
Recently I've been reading up on Denuvo and how certain code seems not merely similar but identical to that of VMProtect. Russian websites are also saying that Denuvo<=>VMProtect, indicating that perhaps the two companies share the same code base: certain features in VMProtect appear in Denuvo and disappear from VMProtect, and vice versa.
Here is the article in question (Russian).
Wednesday, January 6, 2016
Just Cause 3 and Denuvo
So apparently the same thing is happening with JC3 that happened with FIFA15 and DAI.
The founder of notorious Chinese cracking forum 3DM is warning that given the current state of anti-piracy technology, in two years there might be no more pirate games to play. The claims come after attempts to breach the Denuvo security protecting Just Cause 3 pushed the group's cracking expert to breaking point.
Do you know what drives technological innovation? Competition! Right now anti-tamper/DRM solutions are being sought because of piracy, and they get broken, which forces the authors to come up with new and interesting ways to prevent their solution from being broken.
The same cannot be said of "our" side, us reverse engineers. Most people keep their tools private, and publicly available tools that could help us combat these new solutions and techniques have stagnated.
Do you know why? Money. Denuvo is paid for, while cracking is something people do free of charge, so there is little incentive to release internal tools or docs, or even to bother.
Until then, this "prediction" of 3DM might have some merit.