Assembly Language
1. Introduction
Assembly language is the "highest low-level language": an umbrella term for human-readable code that is machine-dependent. There are no variables; we work with registers and memory directly.
The syntax for assembly language varies depending on the CPU architecture and the assembler. (For example, there are two different syntaxes for x86 assembly: the Intel syntax [commonly used in DOS and Windows] and the AT&T syntax [commonly used everywhere else].) The main families of CPU architectures in wide use today appear to be x86, x64 (the 64-bit extension of x86, also called x86-64), and ARM. Be careful: assembly written for one architecture, or in one syntax, generally cannot be used unchanged with another.
2. x86 Architectures
The registers were originally 16-bit; they later became 32-bit, prefixed by E, and then 64-bit, prefixed by R.
There are several special-purpose registers:
- Accumulator registers: AL, AH, AX, EAX, RAX
- Counter registers (for loops and strings): CL, CH, CX, ECX, RCX
- Base registers: BL, BH, BX, EBX, RBX
- Stack pointer: SP, ESP, RSP
- Frame pointer (a.k.a., stack base pointer): BP, EBP, RBP
- Source index for string operations: SI, ESI, RSI
- Destination index for string operations: DI, EDI, RDI
- Instruction pointer: IP, EIP, RIP
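To make the naming scheme concrete, here is a small C sketch of how the narrower names are views onto the same register, using the accumulator as the example (RAX is the 64-bit register, EAX its low 32 bits, AX its low 16 bits, AL and AH the low and high bytes of AX). The helper names are hypothetical, chosen only for illustration; a register is modelled as a plain 64-bit integer.

```c
#include <stdint.h>

/* Hypothetical helpers: treat a 64-bit register as a plain integer and
 * extract the part that each narrower register name refers to. */
static uint32_t low32(uint64_t rax) { return (uint32_t)rax; }        /* EAX */
static uint16_t low16(uint64_t rax) { return (uint16_t)rax; }        /* AX  */
static uint8_t  low8(uint64_t rax)  { return (uint8_t)rax; }         /* AL  */
static uint8_t  high8(uint64_t rax) { return (uint8_t)(rax >> 8); }  /* AH  */

/* Writing AX replaces only the low 16 bits of the full register. */
static uint64_t write_ax(uint64_t rax, uint16_t value)
{
    return (rax & ~(uint64_t)0xFFFF) | value;
}

/* Writing EAX on x86-64 clears the upper 32 bits of RAX. */
static uint64_t write_eax(uint64_t rax, uint32_t value)
{
    (void)rax; /* the old contents do not survive the write */
    return (uint64_t)value;
}
```

With RAX holding 0x1122334455667788, the views are EAX = 0x55667788, AX = 0x7788, AL = 0x88, and AH = 0x77.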
Following Bryant and O'Hallaron, we will write:
- \(E_{a}\) for a generic register name; its value is denoted \(R[E_{a}]\)
  - Examples: EAX (resp. %EAX) for Intel (resp. AT&T) syntax
- \(Imm\) for an immediate value, which can be used as a memory address whose contents are denoted \(M[Imm]\)
  - Examples: 0x8048d8e, 42, etc.
- \((E_{a})\) for indirect addressing: the register's value is used as a memory address, and the operand is the contents at that address, i.e., \(M[R[E_{a}]]\)
- \(Imm(E_{a}, E_{i}, s)\) for scaled addressing, accessing the contents of the memory address \(M[Imm+R[E_{a}]+(R[E_{i}]\cdot s)]\), where the scale \(s\) is 1, 2, 4, or 8
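The address arithmetic in this notation can be sketched in C. This is a toy model, not how any assembler or CPU is implemented: `effective_address` and `load32` are hypothetical names, and memory is just a byte array indexed from zero.

```c
#include <stdint.h>

/* Address named by Imm(Ea, Ei, s): Imm + R[Ea] + R[Ei] * s.
 * `imm`, `base`, and `index` stand in for Imm, R[Ea], and R[Ei].
 * On x86 the scale s is 1, 2, 4, or 8; that is not checked here. */
static uint64_t effective_address(int64_t imm, uint64_t base,
                                  uint64_t index, unsigned scale)
{
    return (uint64_t)imm + base + index * scale;
}

/* M[addr] for a 4-byte operand in a toy memory array. Bytes are
 * combined in little-endian order, as on x86. The caller must make
 * sure addr + 3 is inside the array. */
static uint32_t load32(const uint8_t *memory, uint64_t addr)
{
    return (uint32_t)memory[addr]
         | (uint32_t)memory[addr + 1] << 8
         | (uint32_t)memory[addr + 2] << 16
         | (uint32_t)memory[addr + 3] << 24;
}
```

For instance, with \(Imm = 4\), \(R[E_{a}] = 8\), \(R[E_{i}] = 3\), and \(s = 4\), the address is \(4 + 8 + 3\cdot 4 = 24\). The simpler forms are special cases: \((E_{a})\) is `effective_address(0, base, 0, 1)`, and an operand such as -24(%rbp) is `effective_address(-24, rbp, 0, 1)`.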
The x86 family also has a very large number of instructions; the example in the next section needs only a handful of them (mainly mov, plus push, pop, and ret).
2.1. Functions Using the C Calling Convention
A compiler takes a high-level language and translates it into assembly (or directly into machine code). A C compiler will usually keep the function name as the assembly label, or mangle it in some consistent manner. For example,
    /* exchange.c */
    int exchange(int *xp, int *yp) {
        int x = *xp;
        *xp = *yp;
        *yp = x;
        return x;
    }
When we compile it using gcc -O0 -fverbose-asm -S -o exchange.S exchange.c, we get the following assembly code:
    exchange:
    .LFB0:
        .cfi_startproc
        endbr64
        pushq   %rbp                # 
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp          #,
        .cfi_def_cfa_register 6
        movq    %rdi, -24(%rbp)     # xp, xp
        movq    %rsi, -32(%rbp)     # yp, yp
    # exchange.c:3: int x = *xp;
        movq    -24(%rbp), %rax     # xp, tmp85
        movl    (%rax), %eax        # *xp_3(D), tmp86
        movl    %eax, -4(%rbp)      # tmp86, x
    # exchange.c:4: *xp = *yp;
        movq    -32(%rbp), %rax     # yp, tmp87
        movl    (%rax), %edx        # *yp_5(D), _1
    # exchange.c:4: *xp = *yp;
        movq    -24(%rbp), %rax     # xp, tmp88
        movl    %edx, (%rax)        # _1, *xp_3(D)
    # exchange.c:5: *yp = x;
        movq    -32(%rbp), %rax     # yp, tmp89
        movl    -4(%rbp), %edx      # x, tmp90
        movl    %edx, (%rax)        # tmp90, *yp_5(D)
    # exchange.c:6: return x;
        movl    -4(%rbp), %eax      # x, _8
    # exchange.c:7: }
        popq    %rbp                # 
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
Roughly what happens in this code:
- We set up the stack frame: pushq %rbp saves the caller's base pointer, and movq %rsp, %rbp makes the current stack pointer the new base pointer. The arguments xp and yp arrive in %rdi and %rsi and are copied to the stack slots -24(%rbp) and -32(%rbp).
- We load the pointer xp from -24(%rbp) into %rax, read the value stored in memory at that address (i.e., *xp) into %eax, and assign it to x, which lives at -4(%rbp) in the assembly.
- We load the pointer yp from -32(%rbp) into %rax, then read the value stored in memory at that address (i.e., *yp) into %edx.
- We load the pointer xp into %rax again, then update \(M[R[\%rax]]\) to be the value of %edx (i.e., the value of *yp).
- We load the pointer yp into %rax, load the value of x into %edx, and then update \(M[R[\%rax]]\) to be the value of %edx (i.e., the value of x).
- The return value x is stored in %eax.
- We clean up the function call: popq %rbp restores the caller's base pointer, and ret returns to the caller.
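As a rough cross-check of the walkthrough above, the same steps can be written back in C, one statement per instruction. This is only a sketch: exchange_steps and its local names (slot_xp, slot_yp, rax, eax, edx) are made up here to stand in for the stack slots and registers, and the frame setup (pushq/popq of %rbp) has no C counterpart.

```c
/* exchange() rewritten so that each statement mirrors one instruction
 * of the -O0 listing. The local names are illustrative stand-ins for
 * registers and stack slots, not anything the compiler emits. */
int exchange_steps(int *xp, int *yp)
{
    int *slot_xp = xp;   /* movq %rdi, -24(%rbp) */
    int *slot_yp = yp;   /* movq %rsi, -32(%rbp) */
    int *rax;            /* %rax while it holds an address */
    int eax, edx;        /* %eax and %edx */
    int x;               /* the stack slot -4(%rbp) */

    /* int x = *xp; */
    rax = slot_xp;       /* movq -24(%rbp), %rax */
    eax = *rax;          /* movl (%rax), %eax    */
    x = eax;             /* movl %eax, -4(%rbp)  */

    /* *xp = *yp; */
    rax = slot_yp;       /* movq -32(%rbp), %rax */
    edx = *rax;          /* movl (%rax), %edx    */
    rax = slot_xp;       /* movq -24(%rbp), %rax */
    *rax = edx;          /* movl %edx, (%rax)    */

    /* *yp = x; */
    rax = slot_yp;       /* movq -32(%rbp), %rax */
    edx = x;             /* movl -4(%rbp), %edx  */
    *rax = edx;          /* movl %edx, (%rax)    */

    /* return x; */
    eax = x;             /* movl -4(%rbp), %eax  */
    return eax;          /* ret: the caller reads the result from %eax */
}
```

Calling exchange_steps(&a, &b) with a = 1 and b = 2 leaves a = 2 and b = 1 and returns 1, the same as the original exchange. Note how often the pointers are reloaded from their stack slots: with -O0 the compiler keeps every variable in memory and makes no attempt to reuse values already in registers.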
3. References
- Zeyuan Hu, Understanding how function call works, 30 July 2017
- Jonathan Bartlett, Programming from the Ground Up
- Randal Bryant and David O'Hallaron, Computer Systems: A Programmer's Perspective, third edition; see especially section 3.7. (DO NOT GET the international paperback edition; it's riddled with errors.)
3.1. MMIX Assembly Language
- Donald Knuth, The Art of Computer Programming, Volume 1, Fascicle 1
- Introduction to MMIXAL
- Documentation for MMIXAL emulator