Linux 32-bit Binary Exploitation – Assembly Basics Part I

Hello everyone, and welcome to a new series on binary exploitation. This is the first part, and many more will follow. I will start with assembly basics and the concepts required for basic binary exploitation, explained in layman's terms. So, if you are new to assembly, binary exploitation, or buffer overflows, you are very welcome here, because all the basics you need are explained in detail in this series. I am putting a lot of thought into making it as easy as possible while covering the most important concepts required to learn assembly and get started with binary exploitation. If you would rather go directly to binary exploitation, see Part II, Linux 32-bit binary exploitation.

This series covers 32-bit assembly basics, core concepts, binary exploitation, and buffer overflow (return-to-libc) exploitation.

1. What is Assembly

2. Why Assembly

3. Decimal

4. Binary

Binary to Decimal Conversion
Decimal to Binary Conversion

5. Hexadecimal

Hexadecimal to Decimal Conversion
Decimal to Hexadecimal Conversion

6. Segment & Offset

7. Data Types in Assembly

8. Registers

a. General Purpose Registers

b. Segment Registers

c. Stack Registers

d. Special Purpose Registers

9. Structure of Assembly Program

10. Linux System Calls

11. Executing an Assembly Program

12. Writing a Hello World Program in Assembly language

Before jumping directly into binary exploitation, some basics are important, so let's hop on to them first.

You need to understand the basics of assembly, registers, binary, and hexadecimal. I will explain the registers and assembly basics that are required for this tutorial. The binary I am going to exploit in this series is intentionally vulnerable to a buffer overflow (return-to-libc) attack.

What is Assembly:

Assembly is a low-level programming language. Programs written in assembly language are translated into machine code by an assembler. Each computer architecture has its own assembly language, and each assembler has its own syntax for writing it.

Why Assembly:

1. If something crashes on Windows or Linux, the error report usually includes the location or action that caused the crash. If you have to fix that error, knowing assembly is the most direct way to troubleshoot low-level memory problems.

2. If you need precise control over what your program is doing, a high-level language often cannot give you full control over the machine.

3. Even a heavily optimizing high-level language compiler is still a general-purpose compiler, so the code it produces is general-purpose code. For a specific task, carefully hand-optimized assembly can run faster.

4. The main reason for me: languages you already know, like Python, Java, and C++, limit you to the functions and features they provide, but in assembly you are limited only by the hardware you own. You can play around with memory and CPU instructions to a great extent, which is pretty much fun.

1) Decimal: The decimal system is a base 10 system, meaning that it uses 10 digits, 0 to 9, to make up all numbers.

Example: Let’s take 275

	Hundreds	Tens	Units
Digit	2	7	5
Explanation	2x10^2	7x10^1	5x10^0
Value	200	70	5

So, the output is 200+70+5 = 275.

Let's take another example: 3456

	Thousands	Hundreds	Tens	Units
Digit	3	4	5	6
Explanation	3x10^3	4x10^2	5x10^1	6x10^0
Value	3000	400	50	6

So, the output is 3000+400+50+6 = 3456
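The place-value breakdown in the two tables above can be sketched in a few lines of Python. This is only an illustration; the function name `place_values` is made up for this example and is not part of any library.

```python
def place_values(number, base=10):
    """Split a non-negative integer into the value contributed by each digit."""
    digits = []
    while number > 0:
        number, digit = divmod(number, base)
        digits.append(digit)  # least significant digit comes out first
    # digit * base**position, listed from the most significant digit down
    return [d * base**i for i, d in reversed(list(enumerate(digits)))]


print(place_values(275))   # [200, 70, 5]
print(place_values(3456))  # [3000, 400, 50, 6]
```

Adding up each list gives back the original number, which is exactly what the "Value" rows of the tables show.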

Well, that's how the decimal system works. I guess you have no doubts regarding this, so let's move on to the next one.

2) Binary: The binary system is a base 2 system. It consists of only two values, 0 and 1. Computers work only in binary, so you should understand how binary values are converted.

Binary to Decimal Conversion:

Multiply each binary digit by 2 raised to the power of its position, counting positions from 0 at the rightmost digit, and then add up the results.

Let’s take the binary value 11001 and convert it to decimal:

Digit	Power of 2	Total
1 x	2^4	16
1 x	2^3	8
0 x	2^2	0
0 x	2^1	0
1 x	2^0	1

Now add the totals: 16+8+0+0+1 = 25.

25 is the decimal value of the binary number 11001. That's how you do it. It might look complicated at first glance, but if you try it once, you will get it in an instant.
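The same multiply-and-add procedure can be written as a short Python sketch. The helper name `binary_to_decimal` is hypothetical; Python's built-in `int(text, 2)` does the same job and is shown for comparison.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s to its decimal value."""
    total = 0
    # Walk the digits from right to left so the position starts at 0.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2**position
    return total


print(binary_to_decimal("11001"))  # 25
print(int("11001", 2))             # 25, using the built-in conversion
```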

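Decimal to Binary Conversion: Divide the number by 2 repeatedly and note the remainder at each step. The remainders, read from the last one to the first, form the binary value.

Let's take the number 275

275/2 = 137, remainder 1

137/2 = 68, remainder 1

68/2 = 34, remainder 0

34/2 = 17, remainder 0

17/2 = 8, remainder 1

8/2 = 4, remainder 0

4/2 = 2, remainder 0

2/2 = 1, remainder 0

1/2 = 0, remainder 1

Reading the remainders from bottom to top, the binary value of decimal 275 is 100010011.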

Points to Note:

· Divide the number by 2; if it divides evenly the remainder is 0, otherwise it is 1

· Repeat with the quotient until the quotient is 0

· Usually 1 represents TRUE, and 0 FALSE

· 001001100 is equal to 1001100; zeros at the start of a value add nothing, so you can leave them out
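The repeated division by 2 described above can be sketched in Python as follows. The function name `decimal_to_binary` is an illustrative choice; the built-in `bin()` gives the same digits with a `0b` prefix.

```python
def decimal_to_binary(number):
    """Convert a non-negative integer to a binary string by repeated division."""
    if number == 0:
        return "0"
    remainders = []
    while number > 0:
        number, remainder = divmod(number, 2)  # quotient and remainder in one step
        remainders.append(str(remainder))
    # The first remainder is the rightmost bit, so reverse before joining.
    return "".join(reversed(remainders))


print(decimal_to_binary(275))  # 100010011
print(bin(275))                # 0b100010011
```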

3) Hexadecimal: Hexadecimal is a base 16 system. One hexadecimal digit represents exactly 4 bits, so a byte (8 bits) is always two hex digits, and the sizes computers work with (8, 16, 32, 64 bits and so on) map neatly onto whole hex digits. That makes hexadecimal a perfect fit for computers. Also, "hex" is just the short form of hexadecimal, so don't think they are different.

You need to remember these before getting into hexadecimal conversion

Hex	Decimal	Binary
0	0	0
1	1	1
2	2	10
3	3	11
4	4	100
5	5	101
6	6	110
7	7	111
8	8	1000
9	9	1001
A	10	1010
B	11	1011
C	12	1100
D	13	1101
E	14	1110
F	15	1111

Hexadecimal to Decimal Conversion:

Let's take the hexadecimal value "D80" as an example and convert it into a decimal value.

Hex digit	16^position	Digit value x 16^position	Total
D x	16^2	13 x 256	3328
8 x	16^1	8 x 16	128
0 x	16^0	0 x 1	0

So, the total is 3328+128+0 = 3456. That means D80 is the hexadecimal value of the decimal number 3456.
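Here is a small Python sketch of the table above. `HEX_DIGITS` and `hex_to_decimal` are names invented for this illustration; the built-in `int(text, 16)` performs the same conversion.

```python
HEX_DIGITS = "0123456789ABCDEF"  # the index of each character is its decimal value


def hex_to_decimal(hex_string):
    """Convert a hexadecimal string such as 'D80' to its decimal value."""
    total = 0
    for position, char in enumerate(reversed(hex_string.upper())):
        total += HEX_DIGITS.index(char) * 16**position
    return total


print(hex_to_decimal("D80"))  # 3456
print(int("D80", 16))         # 3456, using the built-in conversion
```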

Decimal to Hexadecimal Conversion:

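Let's take the decimal value 3456 and convert it to hexadecimal. Divide by 16 repeatedly (keeping only the whole part of each result) and convert each remainder to a hex digit. The digits come out starting with the last one, so read them in reverse order at the end.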

3456/16 = 216

216*16 = 3456

3456-3456 = 0. So, the remainder is 0, and the hexadecimal digit for decimal 0 is 0

216/16 = 13

13*16 = 208

216-208 = 8. So, the remainder is 8, and the hexadecimal digit for decimal 8 is 8

13/16 = 0

0*16 = 0

13-0 = 13. So, the remainder is 13, and the hexadecimal digit for decimal 13 is D

Reading the digits in reverse order (D, 8, 0), D80 is the hexadecimal value of decimal 3456. Hope you understood this; if not, you can drop a comment below.

Note:

Hex = Hexadecimal
In assemblers that use Intel syntax, common on DOS and Windows, hex is often written with an h suffix, as in 0D80h
In Unix environments, hex is written with a 0x prefix, as in 0xD80
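The divide-by-16 steps for decimal to hexadecimal conversion can be sketched in Python like this. The names `HEX_DIGITS` and `decimal_to_hex` are illustrative only; the built-in `hex()` returns the same digits in lowercase with the `0x` prefix.

```python
HEX_DIGITS = "0123456789ABCDEF"


def decimal_to_hex(number):
    """Convert a non-negative integer to a hexadecimal string by repeated division."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, 16)
        digits.append(HEX_DIGITS[remainder])  # remainder 13 becomes 'D', and so on
    # Digits were produced last-to-first, so reverse them.
    return "".join(reversed(digits))


print(decimal_to_hex(3456))  # D80
print(hex(3456))             # 0xd80
```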

Segment & Offset:

Everything on your computer is connected through a series of wires called the bus. In the old days, the processor addressed the RAM by sending a 16-bit location over the bus. This meant that computers could only address 65536 bytes (64 KB) of memory: the largest 16-bit value is 1111111111111111 = 65535, and addresses start at 0.

That was plenty back then, but it soon was not enough. So, designers came up with a way to send 20 bits over the bus, thus allowing for a total of 1 MB of memory.

Memory is divided into blocks of bytes called segments, and a byte can be accessed by specifying its offset within a segment. So, whenever the processor wants to access data, it uses the segment number together with the offset number.
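As a rough illustration of how a segment and an offset combine into one 20-bit address, here is a Python sketch. It assumes the real-mode rule of the original 8086 (physical address = segment x 16 + offset), which the text above does not spell out; the function name `physical_address` is made up for this example.

```python
def physical_address(segment, offset):
    """Real-mode 8086 rule (an assumption of this sketch): segment * 16 + offset."""
    # Shifting left by 4 bits multiplies by 16; the mask keeps the result in 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address(0x1234, 0x0010)))  # 0x12350
print(2**16)  # 65536 addresses reachable with 16 bits
print(2**20)  # 1048576 addresses (1 MB) reachable with 20 bits
```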

Data Types in Assembly:

Before you get into assembly programming, you need to understand the data types and registers in assembly. Registers are the most important things in assembly; without them the processor could not do any processing. For that reason, I will explain the data types in assembly and list the types of registers with a very brief explanation.

Bits are the smallest unit of data on a computer. Each bit can represent only 2 values, 1 and 0. A single bit is too small to be useful on its own, so we got the nibble. A nibble is a collection of 4 bits. The most important unit of data used by your computer is the byte. A byte is the smallest unit that your processor can address. It is made up of 8 bits, or 2 nibbles. A word is simply 2 bytes, or 16 bits. Originally a word was the size of the bus from the CPU to the RAM. Today most computers have at least a 32-bit bus, but most people were used to 1 word = 16 bits, so it was kept that way.

Byte	8 Bits
Word	16 bits (2 Bytes)
Double Word (Dword)	32 Bits (2 Words)
Quad Word (Qword)	64 Bits (2 Dwords)
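To get a feel for what each size in the table can hold, here is a small Python sketch that prints the largest unsigned value for each one. The dictionary `SIZES_IN_BITS` and the helper `max_unsigned` are names chosen for this illustration.

```python
SIZES_IN_BITS = {"byte": 8, "word": 16, "dword": 32, "qword": 64}


def max_unsigned(bits):
    """Largest unsigned value that fits in the given number of bits."""
    return 2**bits - 1


for name, bits in SIZES_IN_BITS.items():
    print(f"{name:5} {bits:2} bits  max = {hex(max_unsigned(bits))}")
```

Each size is exactly double the previous one, which is why the maximum value gains two more `f` digits per byte added.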

Registers:

A processor contains small storage areas called registers. They are too small to store files; instead, they hold the values a program is working on while it is running.

Registers can be divided into the following categories:

1) General Purpose Registers: On 16-bit x86, all general-purpose registers are 16 bits wide and can be broken up into two 8-bit registers. For example, AX can be broken up into AL (the low byte) and AH (the high byte).

· AX – Accumulator:

Made up of: AH, AL
Common uses: Math operations, I/O operations, INT 21

· BX – Base:

Made up of: BH, BL
Common uses: Base or Pointer

· CX – Counter

Made up of: CH, CL
Common uses: Loops and Repeats

· DX – Data

Made up of: DH, DL
Common uses: Various data, character output

When the 32-bit 80386 came out, it extended these registers to EAX, EBX, ECX, and EDX. The E stands for Extended, and that's just what they are: 32-bit extensions of the originals, with AX, BX, CX, and DX still accessible as their low 16 bits.
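How EAX, AX, AH, and AL overlap can be shown with simple bit masks. The sketch below is only a model in Python, not real register access, and `split_eax` is a hypothetical helper name.

```python
def split_eax(eax):
    """Show which parts of a 32-bit value the names EAX, AX, AH and AL refer to."""
    ax = eax & 0xFFFF               # low 16 bits of EAX
    return {
        "eax": eax & 0xFFFFFFFF,    # the full 32 bits
        "ax": ax,
        "ah": (ax >> 8) & 0xFF,     # high byte of AX
        "al": ax & 0xFF,            # low byte of AX
    }


parts = split_eax(0x12345678)
print({name: hex(value) for name, value in parts.items()})
# {'eax': '0x12345678', 'ax': '0x5678', 'ah': '0x56', 'al': '0x78'}
```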

2) Segment Registers:

CS - Code Segment. The memory block that stores code

DS - Data Segment. The memory block that stores data

ES - Extra Segment. Commonly used for video stuff

SS - Stack Segment. The memory block that holds the stack, where the processor stores return addresses from routines

3) Stack Registers:

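BP - Base Pointer. Used in conjunction with SP for stack operations; it usually marks the base of the current stack frame (EBP in 32-bit code)
SP - Stack Pointer. Points to the top of the stack (ESP in 32-bit code)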

4) Special Purpose Registers:

IP - Instruction Pointer. Holds the offset of the next instruction to be executed (EIP in 32-bit code)

Flags - These are a bit different from all other registers. Each flag is a single bit inside the flags register (EFLAGS in 32-bit code). It's either 1 (true), or 0 (false). There are several flags, including the Carry flag, Overflow flag, Parity flag, Direction flag, and more. You usually don't assign values to these manually. Most of them are set automatically depending on the result of the previous instruction.
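To see how flags follow from the result of an instruction, here is a simplified Python model of an 8-bit addition. It covers only four flags (carry, zero, sign, parity) and is a teaching sketch, not an emulator; the function name `add_8bit` is made up for this example.

```python
def add_8bit(a, b):
    """Add two 8-bit values and report a few of the flags the CPU would set."""
    raw = (a & 0xFF) + (b & 0xFF)
    result = raw & 0xFF  # only 8 bits fit in the destination
    flags = {
        "CF": int(raw > 0xFF),                       # carry out of the top bit
        "ZF": int(result == 0),                      # result is zero
        "SF": result >> 7,                           # top bit of the result
        "PF": int(bin(result).count("1") % 2 == 0),  # even number of 1 bits
    }
    return result, flags


print(add_8bit(0xFF, 0x01))  # (0, {'CF': 1, 'ZF': 1, 'SF': 0, 'PF': 1})
```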

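Structure of Assembly Program:

text = where assembly instructions are stored

data & bss = where variables are stored

heap = region of memory where a program can allocate and manipulate data dynamically at run time

stack = holds return addresses and local variables; in compiled programs it is managed by compiler-generated code. On x86 Linux it sits at the high end of the address space and grows toward lower addresses

.data --> all initialized data, e.g. strings
.bss --> all uninitialized data
.text --> program instructions (executable code)
.global _start --> exports a symbol so it can be used from outside this file; here it makes _start visible to the linker
_start: --> start of the program, like the main() routine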

.byte = 1 byte

.ascii = string

.asciz = Null Terminated String

.int = 32-bit integer

.short = 16-bit integer

.float = single precision floating point number

.double = double precision floating point number

.comm --> declares a common memory area

.lcomm --> declares a local common memory area

Space declared with .comm and .lcomm is created at runtime; whatever you define here does not occupy any space inside the executable produced by the assembler and linker.
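To make the sizes of the data directives listed above (.byte, .short, .int, .float, .double) concrete, the sketch below uses Python's `struct` module to produce the bytes each one would occupy on little-endian x86. The mapping `DIRECTIVE_FORMATS` and the helper `emit` are my own illustration, not something the assembler provides.

```python
import struct

# Assumed mapping from directive to struct format ("<" means little-endian, as on x86).
DIRECTIVE_FORMATS = {
    ".byte": "<B",    # 1 byte
    ".short": "<H",   # 16-bit integer
    ".int": "<I",     # 32-bit integer
    ".float": "<f",   # single precision
    ".double": "<d",  # double precision
}


def emit(directive, value):
    """Return the raw bytes a directive would place in memory for this value."""
    return struct.pack(DIRECTIVE_FORMATS[directive], value)


print(emit(".int", 0xD80).hex())                          # 800d0000
print(len(emit(".short", 1)), len(emit(".double", 1.0)))  # 2 8
print(b"Hello World" + b"\x00")  # what .asciz "Hello World" stores: the text plus a null byte
```

Note how 0xD80 is stored as 80 0d 00 00: the least significant byte comes first in memory.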

Linux System Calls:

The next important concept required to understand 32-bit assembly on Linux is Linux system calls. A system call is how a user program asks the kernel to do something on its behalf, such as reading input or producing output.

Before you start programming in assembly, you need to understand how Linux system calls work, as we will be using them a lot. In higher-level programs, system calls are usually reached through library functions that make the request to the kernel and return the result; in assembly, we invoke them directly. System calls are helpful in buffer overflow exploitation as well.

Examples: exit(), read(), write() etc.

Arguments to a syscall: whenever you invoke a Linux system call, you need to load the appropriate registers with the arguments the system call requires.

EAX - System call number

EBX - First Argument

ECX - Second Argument

EDX - Third Argument

ESI - Fourth Argument

EDI - Fifth Argument

A sixth argument, when needed, goes in EBP. Some older calls that needed more arguments instead took a pointer to a structure containing them.

System calls are invoked by processes using a software interrupt: INT 0x80

When the interrupt is raised, the kernel calls the system call interrupt handler, which takes all the arguments and performs the requested operation based on the system call number.
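The register convention above can be modelled with a small Python sketch that shows which register each value would go into before INT 0x80 is raised. This does not make a real system call; `syscall_registers` is a hypothetical helper, and the table only contains the two syscall numbers used in this post.

```python
ARGUMENT_REGISTERS = ["ebx", "ecx", "edx", "esi", "edi"]
SYSCALL_NUMBERS = {"exit": 1, "write": 4}  # only the two calls used in this post


def syscall_registers(name, *args):
    """Return the register contents needed to invoke the named syscall."""
    if len(args) > len(ARGUMENT_REGISTERS):
        raise ValueError("this sketch handles at most five arguments")
    registers = {"eax": SYSCALL_NUMBERS[name]}       # syscall number always goes in EAX
    registers.update(zip(ARGUMENT_REGISTERS, args))  # arguments fill EBX, ECX, EDX, ...
    return registers


print(syscall_registers("exit", 0))  # {'eax': 1, 'ebx': 0}
```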

Assembly Program to Execute System call:

.text
.global _start

_start:

      movl $1, %eax
      movl $0, %ebx
      int $0x80

Defining a system call:

exit(0) is the system call used to exit a program. Here is the explanation of the above program.

1. The syscall number for exit() is 1, so EAX must be loaded with 1. The mov instruction loads the value 1 into the eax register (%eax)

movl $1, %eax

2. The exit "status" is, let's say, 0, so EBX must be loaded with 0

movl $0, %ebx

3. Calling the syscall by raising the interrupt 0x80

int $0x80

Executing an Assembly Program:

1) On Unix-like systems, an assembly program for the GNU assembler is usually saved with the extension ".s". So, always save your assembly program with an extension of .s

2) Create an object file by assembling the program with the GNU assembler (on a 64-bit system, add the --32 option to produce 32-bit code)

as -o program.o program.s

3) Use the linker to turn it into an executable (on a 64-bit system, add -m elf_i386 to match)

ld -o program program.o

Writing a Hello World Program in Assembly language:

Writing a hello world program in assembly is not as easy as in other programming languages. You need a good understanding of the write and exit system calls. So, let me explain how to write a Hello World program in assembly.

Step 1: use the write() syscall to print the "Hello World" message

Step 2: use exit() to exit the program

So, how do you write some data in assembly? You need to understand the arguments used by the write() syscall.

We need to follow this to write data in assembly.

write() takes 3 arguments:

· File descriptor: where the data should be written

· Buffer: pointer to the memory that holds the data to be written

· Count: number of bytes to write, starting from the beginning of the buffer

There are file descriptor numbers for all standard streams, and there are 3 standard streams in total: standard input, standard output, and standard error. Similarly, system calls have numbers, and the syscall number for write() is 4.

Here are the commonly used file descriptor numbers

1) stdin, file descriptor 0

2) stdout, file descriptor 1

3) stderr, file descriptor 2

As explained above, a write() call needs the syscall number, file descriptor, buffer, and count. All 4 must be passed for successful execution.

1) We need to call the write syscall to write something. The syscall number for write() is 4, so store 4 in EAX

2) As we want to output the data to the screen, we use STDOUT. The file descriptor for STDOUT is 1, so store 1 in EBX

3) The data to be written is the buffer, so buf = pointer to a memory location containing the "Hello World" string. Store the address of that string in ECX

4) The size of the string should be given as the count, so pass 11, which is the size of "Hello World" (including the space), in EDX
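Before looking at the assembly, it may help to see the same three arguments in a higher-level form. The Python sketch below uses `os.write`, which wraps the write() system call; the wrapper `write_bytes` and the constant `STDOUT` are names I made up so that the arguments line up with the file descriptor, buffer, and count described above.

```python
import os

STDOUT = 1  # file descriptor number for standard output


def write_bytes(fd, buf, count):
    """Mirror write(fd, buf, count): write the first `count` bytes of buf to fd."""
    return os.write(fd, buf[:count])


message = b"Hello World"
print(len(message))  # 11, the count that goes into EDX

written = write_bytes(STDOUT, message, len(message))
os.write(STDOUT, b"\n")
```

The return value of `os.write` is the number of bytes actually written, which is also what the write() syscall leaves in EAX.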

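.data

HelloWorldString:
        .ascii "Hello World"

.text
.global _start

_start:
        # load all the arguments for write(1, HelloWorldString, 11)
        movl $4, %eax                   # syscall number for write()
        movl $1, %ebx                   # file descriptor 1 = stdout
        movl $HelloWorldString, %ecx    # address of the string
        movl $11, %edx                  # number of bytes to write
        int $0x80                       # raise the interrupt to invoke the syscall

        # exit the program with exit(0)
        movl $1, %eax                   # syscall number for exit()
        movl $0, %ebx                   # exit status 0
        int $0x80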

Executing:

Save the file as hello.s

as -o hello.o hello.s

ld -o hello hello.o

./hello

That's it for this post. Now that we are done with at least the basics, I think you can get at least a vague idea of what is going on in the debugger if you have read this whole article. In the next post of this series, I will explain Linux 32-bit binary exploitation with an example. So, stay tuned, and if you have any feedback, please comment below.
