How to Make IDA's F5 Output Look Better
2023-3-31 19:18:43 Author: BeFun安全实验室

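Preface

As everyone knows, IDA's F5 (the decompiler) is extremely powerful for reverse engineering, but it has its limits. The pseudocode it produces can be messy: piles of forced type casts, pointer offsets whose meaning is unclear, and so on.

This article uses a CTF reversing challenge, a virtual machine that resembles 32-bit x86, as an example. It records how to infer and recover a structure type from the information available and then clean up the F5 output so the code is more intuitive and readable.

Related files: https://github.com/Inv0k3r/pwnable_files/raw/master/mvze.zip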

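Some IDA hotkeys

D: change the data type. db is a byte, dw is a word (2 bytes), dd is a doubleword (4 bytes), dq is a quadword (8 bytes).

Numpad *: create an array.

Y: change the type. Put the cursor on a function name to change the function's prototype, or on a variable to change the variable's type.

N: rename a function or variable.

U: undefine. This turns disassembly that was recognized as a function back into raw bytes.

C: define as code. This converts bytes that were not recognized as code into instructions.

P: create a function. What C produces is only disassembly; to use F5 you still have to create a function.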

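Getting started

First, find the main function:

Inside main, the strings show that the program expects a binary file as input, so it is reasonable to guess that sub_27BE processes that file. It contains two functions:

Judging by its contents, the first one performs initialization:

The second one reads the instruction-sequence file:

Of the two functions in the while loop below, one is clearly the instruction fetch function. The other opens onto a large switch and is the function that executes the VM instructions:

For any moderately complex binary, recovering the data structures is very important. Below I describe how to recover a structure step by step from what IDA shows.

What we already know is that the program reads instructions and executes them, so the first thing to pin down is the instruction format.

Look at the first function in the while loop:

Clearly the variable v6 holds the instruction.

Every instruction appears to be 12 bytes long, i.e. the encoding is fixed-length. The location a1+12000 appears to hold the address of the next instruction, i.e. the rip register. The returned value v2 is **(a1+12000): first read an address from a1+12000, then return the contents at that address.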
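The fetch step described above can be modelled in a few lines of Python. This is only a sketch under assumptions: the real binary keeps a pointer at a1+12000, whereas the hypothetical `fetch` helper here uses a byte offset into a `bytes` object, and it assumes rip moves forward by exactly one instruction per fetch. `INSN_SIZE` is a name introduced for illustration.

```python
INSN_SIZE = 12  # every instruction is 12 bytes long (fixed-length encoding)


def fetch(code: bytes, rip: int):
    """Return (instruction_bytes, next_rip) for the instruction at byte offset rip."""
    insn = code[rip:rip + INSN_SIZE]
    if len(insn) != INSN_SIZE:
        raise ValueError("truncated instruction at offset {}".format(rip))
    return insn, rip + INSN_SIZE


# Two dummy 12-byte instructions, only to exercise the helper.
program = bytes(range(24))
first, rip = fetch(program, 0)
second, rip = fetch(program, rip)
```

Because the encoding is fixed-length, instruction number n always starts at byte n * 12, which is what makes a linear disassembler for this VM easy to write.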

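First rename them according to their meaning, and display 12000 in hexadecimal:

Then create a new structure and place the rip register at offset 0x2ee0:

Next, split the structure up according to the initialization function seen earlier; ignore the member names for now. Note that a1 in the initialization function is a QWORD pointer (8 bytes per element), so the later +500 and +1000 are in units of 8 bytes:

At the same time, the caller tells us that the structure is at least 12064 bytes (3015*4+4):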
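The offset arithmetic is easy to get wrong by a factor of 8, so here is a quick check in Python. It only restates the numbers from the text; the variable names are made up for this sketch.

```python
QWORD = 8  # a1 is treated as a QWORD pointer in the initialization function

off_500 = 500 * QWORD     # byte offset of a1 + 500
off_1000 = 1000 * QWORD   # byte offset of a1 + 1000
rip_off = 12000           # a1 + 12000 in the fetch function (already a byte offset)
min_size = 3015 * 4 + 4   # lower bound on the structure size, taken from the caller

print(hex(off_500), hex(off_1000), hex(rip_off), min_size, hex(min_size))
# 0xfa0 0x1f40 0x2ee0 12064 0x2f20
```

These are the same hexadecimal offsets that show up as field names (field_fa0, 1f40, 2ee0) in the rest of the walkthrough.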

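There is a loop that calls malloc four times, which suggests an array:

Result after the fix:

Now look at this function:

Roughly, it treats parameter 1 as an array of 4-byte elements and returns the address at the index given by parameter 2. The first call is equivalent to returning &(a1->field_fa0[0]). Change every a1 to the newly created structure type, then look at the file-reading function again:

What gets read into fa0 is the instruction sequence, so name it code. This function returns the number of instructions, so 2f1c in the caller is the instruction byte count; name it binary_size.

The purpose of field_0 is unclear for now, so just create a 0x2ee0 array for it.

The program is 64-bit and int is 32 bits, so the opcode should be 4 bytes long. Judging by the arguments in case 0, the high 4 bytes of the instruction are passed as argument 2 and the low 4 bytes are the opcode itself (see https://www.cnblogs.com/goodhacker/p/7692443.html). Since the total instruction length is 12, the remaining 4 bytes are our argument a3, tentatively named param2. Create a structure for it: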
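The 12-byte instruction layout can be expressed with Python's `struct` module. This is a sketch, assuming little-endian fields (the target is x86-64) in the order opcode, param1, param2; `decode` and `param1` are names chosen here, while `param2` follows the text.

```python
import struct


def decode(insn: bytes):
    """Split one 12-byte instruction into (opcode, param1, param2)."""
    # Three little-endian unsigned 32-bit fields, lowest address first.
    opcode, param1, param2 = struct.unpack("<III", insn)
    return opcode, param1, param2


raw = struct.pack("<III", 3, 0x77, 0)
print(decode(raw))  # (3, 119, 0)
```

`struct.unpack` raises `struct.error` if the buffer is not exactly 12 bytes, which is a convenient sanity check when slicing instructions out of the binary.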

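Then fix the types of the function parameters:

The program is now very clear. field_2ef8 holds the 4 heap arrays of length 4 allocated by the earlier malloc calls, and it is indexed by the instruction's second operand. This suggests the array acts as a set of general-purpose registers, so we name it regs. Also note that every function takes the same first argument, so this is clearly a C++ class and the first argument is this.

Open the first case:

Hmm, that does not look right. Switch to the disassembly, undefine the function (hotkey U), and re-create it (hotkey C, then hotkey P). If it does not work the first time, try a few more times. IDA's reanalysis did not take effect for me, possibly because of a bug:

This time it is fine. Judging by the function that executes instructions, the return value is unused, so this instruction is simply field_2ee8+4.

Using the method above, fix the types of all the functions, then infer the meaning of each structure member and the parameters and return value of each function from what they do, and manually adjust the F5 output.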

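One point worth making: some statements in the screenshots above are prefixed with a forced cast of the form (xxxx). This usually means the types on the two sides do not match, and we can use the information in the cast to correct the data type. For example:

First, the return value of this function is clearly unused, so change it to a void function.

Then change the type of regs in the structure. It is flagged as unsigned int*, whereas I had set it to int earlier, so change it to unsigned int and see:

Now look at the other functions:

This shows that the types given to the function parameters earlier were not quite right either, so change them all to unsigned int based on this information. Final result: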

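Next, reverse each instruction.

Instruction semantics

case0:

Increment first, then assign, so this is probably push param, and 0x2ee8 is the stack-pointer register. Rename it to rsp and name the function push_int.

case1:

Similar to the previous one; name it push_reg.

case2:

This one pops from the stack: pop_reg.

case3:

Assigns a value to a register: mov_reg_int.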
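To make the four cases concrete, here is a small Python model of them. It is a sketch under assumptions, not the decompiled code: rsp is modelled as a list index that starts at 0, push increments before storing (as described for case 0), and pop reads before decrementing (assumed as the mirror image). `MiniVM` is a hypothetical name.

```python
class MiniVM:
    def __init__(self):
        self.stack = [0] * 1000
        self.regs = [0] * 4
        self.rsp = 0  # index of the top of the stack (assumed to start at 0)

    def push_int(self, value):          # case 0
        self.rsp += 1                   # increment first ...
        self.stack[self.rsp] = value & 0xFFFFFFFF  # ... then assign

    def push_reg(self, idx):            # case 1
        self.push_int(self.regs[idx])

    def pop_reg(self, idx):             # case 2
        self.regs[idx] = self.stack[self.rsp]
        self.rsp -= 1

    def mov_reg_int(self, idx, value):  # case 3
        self.regs[idx] = value & 0xFFFFFFFF


vm = MiniVM()
vm.mov_reg_int(1, 0x25)
vm.push_reg(1)
vm.push_int(7)
vm.pop_reg(0)   # regs[0] receives 7
vm.pop_reg(2)   # regs[2] receives 0x25
```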

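If an instruction's purpose is unclear, skip it for now and look at the others.

This one sets a register value according to how the arguments compare. That register can be seen as a condition register, and the instruction is cmp.

Reads one character into register 0.

Prints the contents of register 0.

At 1f40 there is an int array of size 1000, presumably used to store data; call it data.

From this one, 2ef0 should be rbp, and the instruction is mov rbp, rsp.

This shows that the stack is indexed by rsp, so the 1000 ints at the start of the structure should be the stack. This instruction is leave.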
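With the members named so far, the recovered layout can be cross-checked against the known offsets using `ctypes`. This is a sketch with assumptions: the 8-byte widths of rip, rsp, rbp, and each regs entry are inferred from the spacing of the offsets (0x2ee0, 0x2ee8, 0x2ef0, 0x2ef8), the code array is assumed to fill the gap between 0xfa0 and 0x1f40, and the 4 bytes at 0x2f18 are left as a placeholder named `unknown_2f18`.

```python
import ctypes


class VM(ctypes.Structure):
    _fields_ = [
        ("stack", ctypes.c_uint32 * 1000),   # 0x0000: 1000 ints indexed by rsp
        ("code", ctypes.c_uint32 * 1000),    # 0x0fa0: instruction sequence
        ("data", ctypes.c_uint32 * 1000),    # 0x1f40: data array
        ("rip", ctypes.c_uint64),            # 0x2ee0: pointer to the next instruction
        ("rsp", ctypes.c_uint64),            # 0x2ee8
        ("rbp", ctypes.c_uint64),            # 0x2ef0
        ("regs", ctypes.c_uint64 * 4),       # 0x2ef8: 4 pointers to malloc'ed registers
        ("unknown_2f18", ctypes.c_uint32),   # placeholder, purpose not established here
        ("binary_size", ctypes.c_uint32),    # 0x2f1c
    ]


for name in ("stack", "code", "data", "rip", "rsp", "rbp", "regs", "binary_size"):
    print("{:12} {:#06x}".format(name, getattr(VM, name).offset))
print("sizeof", ctypes.sizeof(VM))
```

If the guessed widths were wrong, the offsets of regs and binary_size or the total size would stop matching 0x2ef8, 0x2f1c, and 12064, so the check is a cheap way to catch a mis-sized member before typing the structure into IDA.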

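This should be the ret instruction.

This one is a bit more complicated. It converts the address of rbp to a string, converts it back to an address in the variable i, uses i to find the corresponding idx in the stack, and pushes that onto the stack. It can be understood as push rbp. The other instructions are similar. Final result:

With these results, we can write a script that converts the binary into assembly code.

Solving the challenge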

codes = open("binary", "rb").read()
reg_list = {
    0: 'rax',
    1: 'rbx',
    2: 'rcx',
    3: 'rdx'
}
# Every instruction is 12 bytes: opcode, operand 1, operand 2 (little-endian).
for i in range(len(codes) // 12):
    code = codes[i*12:i*12+12]
    instrument = int.from_bytes(code[0:4], 'little')
    data1 = int.from_bytes(code[4:8], 'little')
    data2 = int.from_bytes(code[8:12], 'little')
    # Print the index and the raw instruction as 24 hex digits.
    print("{:3}. {} -> ".format(i, hex(int.from_bytes(code, 'little'))[2:].rjust(24, '0')), end='')
    if instrument == 0: print("push {}".format(data1), end='')
    if instrument == 1: print("push {}".format(reg_list[data1]), end='')
    if instrument == 2: print("pop {}".format(reg_list[data1]), end='')
    if instrument == 3:  # mov
        # Show printable immediates as characters, everything else as \xNN.
        if data1 >= 32 and data1 <= 126:
            char = chr(data1)
        else:
            char = '\\x' + hex(data1)[2:].rjust(2, '0')
        print("mov {}, {}({})".format(reg_list[data2], hex(data1), char), end='')
    if instrument == 4: print("mov {}, data[{}]".format(reg_list[data2], data1), end='')
    if instrument == 5: print("add {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 6: print("sub {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 7: print("mul {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 8: print("div {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 9: print("xor {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 10: print("mov {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 11: print("mov data[{}], {}".format(data2, reg_list[data1]), end='')
    if instrument == 12: print("mov data[{}], {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 13: print("inc {}".format(reg_list[data1]), end='')
    if instrument == 14: print("dec {}".format(reg_list[data1]), end='')
    if instrument == 15: print("cmp {}, {}".format(data2, reg_list[data1]), end='')
    if instrument == 16: print("cmp {}, {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 17: print("jl code[{}]".format(data1), end='')
    if instrument == 18: print("push {}; call code[{}]".format(data2, data1), end='')
    if instrument == 19: print("push rbp", end='')
    if instrument == 20: print("mov rbp, rsp", end='')
    if instrument == 21: print("mov rsp, rbp", end='')
    if instrument == 22: print("pop rbp", end='')
    if instrument == 23: print("pop rip, ret", end='')
    if instrument == 24: print("mov [rsp-{}], {}".format(((~data1+1)&0xffffffff), reg_list[data2]), end='')
    if instrument == 25: print("add {}, {}".format(reg_list[data2], data1), end='')
    if instrument == 26: print("sub {}, {}".format(reg_list[data2], data1), end='')
    if instrument == 27: print("mov {}, data[{}]".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 28: print("mov data[{}], {}".format(reg_list[data2], reg_list[data1]), end='')
    if instrument == 29: print("jne code[{}]".format(data1), end='')
    if instrument == 30: print("jmp code[{}]".format(data1), end='')
    if instrument == 50: print("in rax(int)", end='')
    if instrument == 51: print("out rax(int)", end='')
    if instrument == 52: print("out rax(char)", end='')
    if instrument == 53: print("in rax(char)", end='')
    if instrument == 54: print("hlt", end='')
    print()
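Opcode 24 in the script prints its operand as `(~data1+1)&0xffffffff`, which is the magnitude of a negative 32-bit two's-complement value. The sketch below shows the same conversion in isolation; `to_signed32` is a helper name introduced here, not something from the challenge.

```python
def to_signed32(value: int) -> int:
    """Interpret a 32-bit pattern as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


param = 0xFFFFFFFD
print(to_signed32(param))         # -3
print((~param + 1) & 0xFFFFFFFF)  # 3
```

So an operand of 0xfffffffd is displayed as an offset of 3 below the base register.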

This produces the following output:

  0. 000000000000008f00000003 -> mov rax, 0x8f(\x8f)
  1. 00000000000000000000000b -> mov data[0], rax
  2. 000000000000007700000003 -> mov rax, 0x77(w)
  3. 00000001000000000000000b -> mov data[1], rax
  4. 000000000000003300000003 -> mov rax, 0x33(3)
  5. 00000002000000000000000b -> mov data[2], rax
  6. 000000000000006e00000003 -> mov rax, 0x6e(n)
  7. 00000003000000000000000b -> mov data[3], rax
  8. 000000000000006c00000003 -> mov rax, 0x6c(l)
  9. 00000004000000000000000b -> mov data[4], rax
 10. 000000000000003100000003 -> mov rax, 0x31(1)
 11. 00000005000000000000000b -> mov data[5], rax
 12. 000000000000006e00000003 -> mov rax, 0x6e(n)
 13. 00000006000000000000000b -> mov data[6], rax
 14. 000000000000006700000003 -> mov rax, 0x67(g)
 15. 00000007000000000000000b -> mov data[7], rax
 16. 000000000000006c00000003 -> mov rax, 0x6c(l)
 17. 00000008000000000000000b -> mov data[8], rax
 18. 000000000000006f00000003 -> mov rax, 0x6f(o)
 19. 00000009000000000000000b -> mov data[9], rax
 20. 000000000000007600000003 -> mov rax, 0x76(v)
 21. 0000000a000000000000000b -> mov data[10], rax
 22. 000000000000003300000003 -> mov rax, 0x33(3)
 23. 0000000b000000000000000b -> mov data[11], rax
 24. 000000000000006a00000003 -> mov rax, 0x6a(j)
 25. 0000000c000000000000000b -> mov data[12], rax
 26. 000000000000006300000003 -> mov rax, 0x63(c)
 27. 0000000d000000000000000b -> mov data[13], rax
 28. 000000000000006800000003 -> mov rax, 0x68(h)
 29. 0000000e000000000000000b -> mov data[14], rax
 30. 000000000000006500000003 -> mov rax, 0x65(e)
 31. 0000000f000000000000000b -> mov data[15], rax
 32. 000000000000003100000003 -> mov rax, 0x31(1)
 33. 00000010000000000000000b -> mov data[16], rax
 34. 000000000000003400000003 -> mov rax, 0x34(4)
 35. 00000011000000000000000b -> mov data[17], rax
 36. 000000000000003300000003 -> mov rax, 0x33(3)
 37. 00000012000000000000000b -> mov data[18], rax
 38. 000000000000009e00000003 -> mov rax, 0x9e(\x9e)
 39. 00000013000000000000000b -> mov data[19], rax
 40. 00000000000000c000000003 -> mov rax, 0xc0(\xc0)
 41. 00000014000000000000000b -> mov data[20], rax
 42. 00000000000000cd00000003 -> mov rax, 0xcd(\xcd)
 43. 00000015000000000000000b -> mov data[21], rax
 44. 00000000000000cb00000003 -> mov rax, 0xcb(\xcb)
 45. 00000016000000000000000b -> mov data[22], rax
 46. 000000000000008500000003 -> mov rax, 0x85(\x85)
 47. 00000017000000000000000b -> mov data[23], rax
 48. 000000000000008400000003 -> mov rax, 0x84(\x84)
 49. 00000018000000000000000b -> mov data[24], rax
 50. 000000000000009800000003 -> mov rax, 0x98(\x98)
 51. 00000019000000000000000b -> mov data[25], rax
 52. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
 53. 0000001a000000000000000b -> mov data[26], rax
 54. 000000000000009d00000003 -> mov rax, 0x9d(\x9d)
 55. 0000001b000000000000000b -> mov data[27], rax
 56. 000000000000008300000003 -> mov rax, 0x83(\x83)
 57. 0000001c000000000000000b -> mov data[28], rax
 58. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
 59. 0000001d000000000000000b -> mov data[29], rax
 60. 000000000000008e00000003 -> mov rax, 0x8e(\x8e)
 61. 0000001e000000000000000b -> mov data[30], rax
 62. 00000000000000d200000003 -> mov rax, 0xd2(\xd2)
 63. 0000001f000000000000000b -> mov data[31], rax
 64. 00000000000000fb00000003 -> mov rax, 0xfb(\xfb)
 65. 00000020000000000000000b -> mov data[32], rax
 66. 000000000000001a00000003 -> mov rax, 0x1a(\x1a)
 67. 00000021000000000000000b -> mov data[33], rax
 68. 000000000000005700000003 -> mov rax, 0x57(W)
 69. 00000022000000000000000b -> mov data[34], rax
 70. 000000000000005200000003 -> mov rax, 0x52(R)
 71. 00000023000000000000000b -> mov data[35], rax
 72. 00000000000000ef00000003 -> mov rax, 0xef(\xef)
 73. 00000024000000000000000b -> mov data[36], rax
 74. 000000000000006900000003 -> mov rax, 0x69(i)
 75. 000000000000000000000034 -> out rax(char)
 76. 000000000000006e00000003 -> mov rax, 0x6e(n)
 77. 000000000000000000000034 -> out rax(char)
 78. 000000000000007000000003 -> mov rax, 0x70(p)
 79. 000000000000000000000034 -> out rax(char)
 80. 000000000000007500000003 -> mov rax, 0x75(u)
 81. 000000000000000000000034 -> out rax(char)
 82. 000000000000007400000003 -> mov rax, 0x74(t)
 83. 000000000000000000000034 -> out rax(char)
 84. 000000000000002000000003 -> mov rax, 0x20( )
 85. 000000000000000000000034 -> out rax(char)
 86. 000000000000007900000003 -> mov rax, 0x79(y)
 87. 000000000000000000000034 -> out rax(char)
 88. 000000000000006f00000003 -> mov rax, 0x6f(o)
 89. 000000000000000000000034 -> out rax(char)
 90. 000000000000007500000003 -> mov rax, 0x75(u)
 91. 000000000000000000000034 -> out rax(char)
 92. 000000000000007200000003 -> mov rax, 0x72(r)
 93. 000000000000000000000034 -> out rax(char)
 94. 000000000000002000000003 -> mov rax, 0x20( )
 95. 000000000000000000000034 -> out rax(char)
 96. 000000000000006600000003 -> mov rax, 0x66(f)
 97. 000000000000000000000034 -> out rax(char)
 98. 000000000000006c00000003 -> mov rax, 0x6c(l)
 99. 000000000000000000000034 -> out rax(char)
100. 000000000000006100000003 -> mov rax, 0x61(a)
101. 000000000000000000000034 -> out rax(char)
102. 000000000000006700000003 -> mov rax, 0x67(g)
103. 000000000000000000000034 -> out rax(char)
104. 000000000000003a00000003 -> mov rax, 0x3a(:)
105. 000000000000000000000034 -> out rax(char)
106. 000000000000000a00000003 -> mov rax, 0xa(\x0a)
107. 000000000000000000000034 -> out rax(char)
108. 000000010000002500000003 -> mov rbx, 0x25(%)
109. 000000000000000000000035 -> in rax(char)
110. 00000001000000000000000c -> mov data[rbx], rax
111. 00000000000000010000000d -> inc rbx
112. 00000036000000010000000f -> cmp 54, rbx
113. 000000000000006d00000011 -> jl code[109]
114. 000000010000000000000003 -> mov rbx, 0x0(\x00)
115. 000000000000000100000001 -> push rbx
116. 000000020000000000000004 -> mov rcx, data[0]
117. 000000000000000200000001 -> push rcx
118. 00000077000000dd00000012 -> push 119; call code[221]
119. 000000000000000200000002 -> pop rcx
120. 000000000000000100000002 -> pop rbx
121. 00000000000000000000000b -> mov data[0], rax
122. 00000000000000010000000d -> inc rbx
123. 00000012000000010000000f -> cmp 18, rbx
124. 00000000000000730000001d -> jne code[115]
125. 000000000000001300000004 -> mov rax, data[19]
126. 000000010000002500000004 -> mov rbx, data[37]
127. 000000010000000000000010 -> cmp rbx, rax
128. 00000000000000d20000001d -> jne code[210]
129. 000000000000001400000004 -> mov rax, data[20]
130. 000000010000002600000004 -> mov rbx, data[38]
131. 000000010000000000000010 -> cmp rbx, rax
132. 00000000000000d20000001d -> jne code[210]
133. 000000000000001500000004 -> mov rax, data[21]
134. 000000010000002700000004 -> mov rbx, data[39]
135. 000000010000000000000010 -> cmp rbx, rax
136. 00000000000000d20000001d -> jne code[210]
137. 000000000000001600000004 -> mov rax, data[22]
138. 000000010000002800000004 -> mov rbx, data[40]
139. 000000010000000000000010 -> cmp rbx, rax
140. 00000000000000d20000001d -> jne code[210]
141. 000000000000001700000004 -> mov rax, data[23]
142. 000000010000002900000004 -> mov rbx, data[41]
143. 000000010000000000000010 -> cmp rbx, rax
144. 00000000000000d20000001d -> jne code[210]
145. 000000000000001800000004 -> mov rax, data[24]
146. 000000010000002a00000004 -> mov rbx, data[42]
147. 000000010000000000000010 -> cmp rbx, rax
148. 00000000000000d20000001d -> jne code[210]
149. 000000000000001900000004 -> mov rax, data[25]
150. 000000010000002b00000004 -> mov rbx, data[43]
151. 000000010000000000000010 -> cmp rbx, rax
152. 00000000000000d20000001d -> jne code[210]
153. 000000000000001a00000004 -> mov rax, data[26]
154. 000000010000002c00000004 -> mov rbx, data[44]
155. 000000010000000000000010 -> cmp rbx, rax
156. 00000000000000d20000001d -> jne code[210]
157. 000000000000001b00000004 -> mov rax, data[27]
158. 000000010000002d00000004 -> mov rbx, data[45]
159. 000000010000000000000010 -> cmp rbx, rax
160. 00000000000000d20000001d -> jne code[210]
161. 000000000000001c00000004 -> mov rax, data[28]
162. 000000010000002e00000004 -> mov rbx, data[46]
163. 000000010000000000000010 -> cmp rbx, rax
164. 00000000000000d20000001d -> jne code[210]
165. 000000000000001d00000004 -> mov rax, data[29]
166. 000000010000002f00000004 -> mov rbx, data[47]
167. 000000010000000000000010 -> cmp rbx, rax
168. 00000000000000d20000001d -> jne code[210]
169. 000000000000001e00000004 -> mov rax, data[30]
170. 000000010000003000000004 -> mov rbx, data[48]
171. 000000010000000000000010 -> cmp rbx, rax
172. 00000000000000d20000001d -> jne code[210]
173. 000000000000001f00000004 -> mov rax, data[31]
174. 000000010000003100000004 -> mov rbx, data[49]
175. 000000010000000000000010 -> cmp rbx, rax
176. 00000000000000d20000001d -> jne code[210]
177. 000000000000002000000004 -> mov rax, data[32]
178. 000000010000003200000004 -> mov rbx, data[50]
179. 000000010000000000000010 -> cmp rbx, rax
180. 00000000000000d20000001d -> jne code[210]
181. 000000000000002100000004 -> mov rax, data[33]
182. 000000010000003300000004 -> mov rbx, data[51]
183. 000000010000000000000010 -> cmp rbx, rax
184. 00000000000000d20000001d -> jne code[210]
185. 000000000000002200000004 -> mov rax, data[34]
186. 000000010000003400000004 -> mov rbx, data[52]
187. 000000010000000000000010 -> cmp rbx, rax
188. 00000000000000d20000001d -> jne code[210]
189. 000000000000002300000004 -> mov rax, data[35]
190. 000000010000003500000004 -> mov rbx, data[53]
191. 000000010000000000000010 -> cmp rbx, rax
192. 00000000000000d20000001d -> jne code[210]
193. 000000000000002400000004 -> mov rax, data[36]
194. 000000010000003600000004 -> mov rbx, data[54]
195. 000000010000000000000010 -> cmp rbx, rax
196. 00000000000000d20000001d -> jne code[210]
197. 000000000000007200000003 -> mov rax, 0x72(r)
198. 000000000000000000000034 -> out rax(char)
199. 000000000000006900000003 -> mov rax, 0x69(i)
200. 000000000000000000000034 -> out rax(char)
201. 000000000000006700000003 -> mov rax, 0x67(g)
202. 000000000000000000000034 -> out rax(char)
203. 000000000000006800000003 -> mov rax, 0x68(h)
204. 000000000000000000000034 -> out rax(char)
205. 000000000000007400000003 -> mov rax, 0x74(t)
206. 000000000000000000000034 -> out rax(char)
207. 000000000000000a00000003 -> mov rax, 0xa(\x0a)
208. 000000000000000000000034 -> out rax(char)
209. 00000000000000dc0000001e -> jmp code[220]
210. 000000000000007700000003 -> mov rax, 0x77(w)
211. 000000000000000000000034 -> out rax(char)
212. 000000000000007200000003 -> mov rax, 0x72(r)
213. 000000000000000000000034 -> out rax(char)
214. 000000000000006f00000003 -> mov rax, 0x6f(o)
215. 000000000000000000000034 -> out rax(char)
216. 000000000000006e00000003 -> mov rax, 0x6e(n)
217. 000000000000000000000034 -> out rax(char)
218. 000000000000006700000003 -> mov rax, 0x67(g)
219. 000000000000000000000034 -> out rax(char)
220. 000000000000000000000036 -> hlt
221. 000000000000000000000013 -> push rbp
222. 000000000000000000000014 -> mov rbp, rsp
223. 00000003fffffffd00000018 -> mov [rsp-3], rdx
224. 000000030000002500000019 -> add rdx, 37
225. 00000002000000030000001b -> mov rcx, data[rdx]
226. 00000003fffffffd00000018 -> mov [rsp-3], rdx
227. 000000030000000100000019 -> add rdx, 1
228. 00000000000000030000001b -> mov rax, data[rdx]
229. 00000003fffffffd00000018 -> mov [rsp-3], rdx
230. 000000020000000300000005 -> add rcx, rdx
231. 000000000000000200000009 -> xor rax, rcx
232. 00000003fffffffe00000018 -> mov [rsp-2], rdx
233. 000000000000000300000009 -> xor rax, rdx
234. 00000003fffffffd00000018 -> mov [rsp-3], rdx
235. 000000030000002500000019 -> add rdx, 37
236. 00000003000000000000001c -> mov data[rdx], rax
237. 000000000000000000000015 -> mov rsp, rbp
238. 000000000000000000000016 -> pop rbp
239. 000000000000000000000017 -> pop rip, ret

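The assembly is not very long, and the logic can be understood by reading it directly. It is essentially a classic XOR scheme. Based on the assembly, write the inverse script: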

data = [119, 51, 110, 108, 49, 110, 103, 108, 111, 118, 51, 106, 99, 104, 101, 49, 52, 51]
data2 = [158, 192, 205, 203, 133, 132, 152, 142, 157, 131, 142, 142, 210, 251, 26, 87, 82, 239]
flag = ''
temp = 0x8f
for i in range(18):
    flag += chr(((data2[i]) ^ data[i] ^ temp) - i)
    temp = data2[i]
print(flag)
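As a sanity check on the inverse script, the forward transform can be written down and round-tripped. The `encode` function below is obtained purely by algebraically inverting the solver (cipher[i] = (plain[i] + i) ^ key[i] ^ previous cipher value, seeded with 0x8f); it is an assumption about what the VM routine at code[221] computes and has not been checked against the binary. `KEY`, `SEED`, `encode`, and `decode` are names introduced for this sketch.

```python
KEY = b"w3nl1nglov3jche143"  # data[1..18] in the listing
SEED = 0x8f                  # data[0] in the listing


def encode(plain: bytes) -> list:
    """Forward transform implied by the solver: chain each output into the next."""
    out, prev = [], SEED
    for i, ch in enumerate(plain):
        prev = (ch + i) ^ KEY[i] ^ prev
        out.append(prev)
    return out


def decode(cipher: list) -> bytes:
    """Same arithmetic as the solver script, returning bytes instead of str."""
    out, prev = bytearray(), SEED
    for i, c in enumerate(cipher):
        out.append(((c ^ KEY[i] ^ prev) - i) & 0xFF)
        prev = c
    return bytes(out)


sample = b"roundtrip-test-abc"  # any 18-byte input works
assert decode(encode(sample)) == sample
```

Because each output byte feeds into the next one, decoding has to use the previous ciphertext value rather than the previous plaintext value, which is why the solver updates temp with data2[i].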

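Summary

If you can effectively recover structures and data types while reversing, the code becomes much more readable and you can understand what the program does faster.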



Source: http://mp.weixin.qq.com/s?__biz=MzI3NDEzMDgzNw==&mid=2247484617&idx=1&sn=81482b04e1dc50c6a94b14a826a6f251&chksm=eb19f633dc6e7f25d843710ff33e392fd1d1f53ddfaeab44690d7d65ad35276c1e7d84901070#rd