Saturday, October 24, 2020

After some cleanup I managed to extract the unique code verification blocks and narrowed them down to 35. That count ignores the inserted junk instructions and any differences in registers and memory locations. Below is the code I used to find the unique instances.

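import java.nio.ByteBuffer;
import java.security.MessageDigest;

for (Vertex n : m.keySet()) {
    int size = m.get(n).size();
    for (int j = 0; j < size; j++) {
        Vertex v = m.get(n).get(j);

        long hash = 0;
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        for (int k = 0; k < v.insns.size(); k++) {
            Instruction in = v.insns.get(k);
            // Skip junk: EB 00 is a jmp to the very next instruction.
            if (in.bytes.length == 2 && in.bytes[0] == (byte) 0xEB && in.bytes[1] == (byte) 0x0)
                continue;
            // The rest of this loop body is missing from the listing. Feeding the
            // raw bytes to the digest is a placeholder (an assumption); the real
            // step also ignored register and memory-location differences.
            md5.update(in.bytes);
        }
        hash = ByteBuffer.wrap(md5.digest()).getInt();

        // putIfAbsent returns the previous value, so non-null means a duplicate.
        if (occurrences.putIfAbsent(hash, v.insns) != null)
            System.out.println("not added");
    }
}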

However, those 35 unique instances are all variations of just two operations, xor and add. In other words, xor and add were mutated into 35 unique blocks of code, which were then repeated until there were 22718 checks.
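
The deduplication idea above can be tried in a self-contained form. This is only a minimal sketch, not the tool's actual code: the class name BlockDedup, the fingerprint and unique methods, and the use of raw byte arrays in place of the Vertex/Instruction types are all assumptions made for illustration. It filters only the EB 00 junk jump and does not normalize registers or memory operands.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BlockDedup {
    // True for the two-byte junk jump EB 00 (jmp to the next instruction).
    static boolean isJunk(byte[] insn) {
        return insn.length == 2 && insn[0] == (byte) 0xEB && insn[1] == 0;
    }

    // Hash a block (a list of encoded instructions), ignoring junk jumps.
    // Only the first 4 bytes of the MD5 digest are kept, as in the post.
    static int fingerprint(List<byte[]> block) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            for (byte[] insn : block) {
                if (!isJunk(insn)) md5.update(insn);
            }
            return ByteBuffer.wrap(md5.digest()).getInt();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Keep the first block seen for each fingerprint.
    static Map<Integer, List<byte[]>> unique(List<List<byte[]>> blocks) {
        Map<Integer, List<byte[]>> occurrences = new HashMap<>();
        for (List<byte[]> b : blocks) occurrences.putIfAbsent(fingerprint(b), b);
        return occurrences;
    }

    public static void main(String[] args) {
        byte[] add = {0x01, 0x44, 0x24, 0x1C};   // add dword ptr [esp+1C],eax
        byte[] xor = {0x31, 0x44, 0x24, 0x1C};   // xor dword ptr [esp+1C],eax
        byte[] junk = {(byte) 0xEB, 0x00};       // jmp to next instruction
        List<List<byte[]>> blocks = List.of(
            List.of(add), List.of(junk, add), List.of(xor));
        System.out.println(unique(blocks).size() + " unique blocks");
    }
}
```

The first two blocks differ only by the junk jump, so they share a fingerprint and collapse into one entry. Truncating MD5 to 32 bits keeps the map key small, at the cost of a small collision risk that grows with the number of distinct blocks.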

Sunday, October 11, 2020

My tool is progressing nicely.

 After a month's work I have a tool that can more or less create basic blocks from instructions.
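
For readers unfamiliar with the idea, here is a minimal sketch of one common way to split a linear instruction list into basic blocks, the "leaders" approach. It is not necessarily how my tool does it: the Insn class, its fields, and the split method are hypothetical names, and only direct branches with a known target are handled. The instruction sizes in main are taken from the sample listing further down in this post.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class BasicBlocks {
    // Hypothetical decoded instruction: address, size, and branch target
    // (-1 when the instruction is not a branch).
    static final class Insn {
        final long addr; final int size; final long target;
        Insn(long addr, int size, long target) {
            this.addr = addr; this.size = size; this.target = target;
        }
    }

    // A leader starts a new block: the first instruction, every branch
    // target, and every instruction that follows a branch.
    static List<List<Insn>> split(List<Insn> insns) {
        TreeSet<Long> leaders = new TreeSet<>();
        if (!insns.isEmpty()) leaders.add(insns.get(0).addr);
        for (Insn i : insns) {
            if (i.target >= 0) {
                leaders.add(i.target);
                leaders.add(i.addr + i.size);
            }
        }
        List<List<Insn>> blocks = new ArrayList<>();
        List<Insn> cur = null;
        for (Insn i : insns) {
            if (cur == null || leaders.contains(i.addr)) {
                cur = new ArrayList<>();
                blocks.add(cur);
            }
            cur.add(i);
        }
        return blocks;
    }

    // Builds the 14 instructions of the sample check; only the final jne branches.
    static List<Insn> sample() {
        int[] sizes = {3, 8, 8, 4, 5, 5, 4, 3, 2, 4, 3, 3, 5, 2};
        List<Insn> insns = new ArrayList<>();
        long addr = 0x393844C3L;
        for (int k = 0; k < sizes.length; k++) {
            long target = (k == sizes.length - 1) ? 0x393844EBL : -1;
            insns.add(new Insn(addr, sizes[k], target));
            addr += sizes[k];
        }
        return insns;
    }

    public static void main(String[] args) {
        for (List<Insn> b : split(sample())) {
            System.out.printf("%08X: %d instructions%n", b.get(0).addr, b.size());
        }
    }
}
```

On the sample this yields two blocks: the setup code starting at 393844C3 and the loop body starting at 393844EB, which is the target of the jne.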

I decided on my first target: identifying and extracting all basic blocks of the protector's little anti-bp checks.

393844C3  83EC 20               sub esp,20
393844C6  C74424 1C 406CD77A    mov dword ptr ss:[esp+1C],7AD76C40
393844CE  C74424 18 6E000000    mov dword ptr ss:[esp+18],6E
393844D6  894C24 14             mov dword ptr ss:[esp+14],ecx
393844DA  B9 90443839           mov ecx,39384490
393844DF  C14C24 1C 10          ror dword ptr ss:[esp+1C],10
393844E4  894424 10             mov dword ptr ss:[esp+10],eax
393844E8  C1E6 00               shl esi,0
393844EB  8B01                  mov eax,dword ptr ds:[ecx]
393844ED  014424 1C             add dword ptr ss:[esp+1C],eax
393844F1  C1E7 00               shl edi,0
393844F4  83C1 04               add ecx,4
393844F7  66:FF4C24 18          dec word ptr ss:[esp+18]
393844FC  75 ED                 jne 393844EB


This is just one sample of the code. The overall pattern is sub esp, <size>, followed by three movs, and the most important part of the pattern is mov reg, imm. The checks process at most 294 bytes each. There are 22718 of them, plus an additional 5500 hits that are probably false positives.
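
To make the sample easier to follow, here is a minimal Java sketch of what the listed instructions compute: the seed is rotated right by 16 bits, then dwords are read from the start address and added to it until the counter reaches zero. The class and method names are mine, the region in main is a zero-filled stand-in rather than the real bytes at 39384490, and whatever the protector does with the result after the loop is not part of the listing, so it is not modeled here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ChecksumSketch {
    // Mirrors the sample listing. Assumes count >= 1, since the original
    // dec/jne loop always runs its body at least once.
    static int checksum(byte[] code, int offset, int seed, int count) {
        ByteBuffer buf = ByteBuffer.wrap(code).order(ByteOrder.LITTLE_ENDIAN);
        int acc = Integer.rotateRight(seed, 16);   // ror dword ptr [esp+1C],10
        for (int i = 0; i < count; i++) {          // dec word ptr [esp+18] / jne
            acc += buf.getInt(offset + 4 * i);     // mov eax,[ecx] / add [esp+1C],eax / add ecx,4
        }
        return acc;
    }

    public static void main(String[] args) {
        // Zero-filled stand-in for the checked region (hypothetical data).
        byte[] region = new byte[0x6E * 4];
        System.out.printf("%08X%n", checksum(region, 0, 0x7AD76C40, 0x6E));
        // prints 6C407AD7, the rotated seed, because every dword is zero
    }
}
```

The two shl reg,0 instructions in the listing change nothing, so the sketch leaves them out.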

The next step is to work out whether these 22k checks all use the same algorithm or different ones, and to extract that algorithm programmatically. I am exploring dataflow analysis and symbolic execution for this; if the algorithm turns out to be the same everywhere, analyzing a single sample is enough.
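
If the algorithm really is the same everywhere, extraction could be as simple as reading the immediates out of each check. The following is a rough sketch under that assumption, matching only the exact encodings seen in the sample (C7 44 24 disp8 imm32 for the stack stores and B8+r imm32 for mov reg,imm). The CheckParams class and its fields are hypothetical, and a mutated check that uses other encodings would need the dataflow or symbolic approach instead.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

public class CheckParams {
    // Immediates stored to [esp+disp8], in program order. In the sample the
    // first is the seed and the second is the loop count.
    final List<Integer> stackImms = new ArrayList<>();
    // Immediate of the first mov reg,imm32 (the start address in the sample).
    Integer start;

    static int imm32(byte[] b, int at) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).getInt(at);
    }

    // Each element of insns is the encoded bytes of one instruction.
    static CheckParams extract(List<byte[]> insns) {
        CheckParams p = new CheckParams();
        for (byte[] b : insns) {
            if (b.length == 8 && b[0] == (byte) 0xC7 && b[1] == 0x44 && b[2] == 0x24) {
                // mov dword ptr [esp+disp8], imm32
                p.stackImms.add(imm32(b, 4));
            } else if (b.length == 5 && (b[0] & 0xF8) == 0xB8 && p.start == null) {
                // mov reg, imm32
                p.start = imm32(b, 1);
            }
        }
        return p;
    }

    // The first five instructions of the sample listing.
    static List<byte[]> sample() {
        return List.of(
            new byte[] {(byte) 0x83, (byte) 0xEC, 0x20},
            new byte[] {(byte) 0xC7, 0x44, 0x24, 0x1C, 0x40, 0x6C, (byte) 0xD7, 0x7A},
            new byte[] {(byte) 0xC7, 0x44, 0x24, 0x18, 0x6E, 0x00, 0x00, 0x00},
            new byte[] {(byte) 0x89, 0x4C, 0x24, 0x14},
            new byte[] {(byte) 0xB9, (byte) 0x90, 0x44, 0x38, 0x39});
    }

    public static void main(String[] args) {
        CheckParams p = extract(sample());
        System.out.printf("seed=%08X count=%X start=%08X%n",
            p.stackImms.get(0), p.stackImms.get(1), p.start);
    }
}
```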