x86-64#
Note
We use AT&T assembly syntax. This is what clang uses. In AT&T syntax the source operand comes before the destination operand, i.e., instr src, dest.
Tip
You can use the x86-64 emulator x64-emu to experiment and practice x86-64 assembly in your browser — it supports all (and only) the instructions and features described here.
Registers that we will use#
Register | Description | Preserved Across Calls |
---|---|---|
rip | Instruction pointer; cannot be manipulated directly | Not applicable (not preserved) |
rax | General purpose; stores return value | No |
rbx | General purpose; sometimes also used as the base pointer | Yes |
rcx | General purpose; used for 4th argument | No |
rdx | General purpose; used for 3rd argument | No |
rsp | Stack pointer | Yes (automatically) |
rbp | Can be used as base pointer | Yes |
rsi | General purpose; used for 2nd argument | No |
rdi | General purpose; used for 1st argument | No |
r8 | General purpose; used for 5th argument | No |
r9 | General purpose; used for 6th argument | No |
r10 | General purpose | No |
r11 | General purpose | No |
r12 | General purpose | Yes |
r13 | General purpose | Yes |
r14 | General purpose | Yes |
r15 | General purpose | Yes |
The stack#
In x86 the stack starts at a high address and grows towards lower addresses.
Pushing on the stack: first decrements the stack pointer (rsp) by the size of the object being pushed and then writes the value to the memory location the stack pointer points to.
Popping from the stack: reads the value from the memory location the stack pointer points to and then increments the stack pointer by the size of the object being popped.
To allocate 16 bytes of space on the stack, e.g., for storing a function's local variables, we decrement the stack pointer by 16 bytes:
subq $16, %rsp
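As a rough sketch of the push and pop descriptions above, each instruction can be read as a pair of simpler instructions. The register choices are arbitrary, and the comments use #, which clang and GNU as accept. One difference to keep in mind: subq and addq change the CPU flags, whereas pushq and popq do not.

```asm
    # pushq %rax behaves like these two instructions:
    subq $8, %rsp          # make room: the stack pointer moves 8 bytes down
    movq %rax, (%rsp)      # store the value at the new top of the stack

    # popq %rcx behaves like these two instructions:
    movq (%rsp), %rcx      # read the value at the top of the stack
    addq $8, %rsp          # release the 8 bytes: the stack pointer moves back up
```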
Endianness#
x86 is a little-endian architecture. This means that lower-significance bytes are written at lower addresses. For example, the four-byte number 0xFFA02B1C, when written to memory at address \(n\), is stored as follows:
Address: | \(\cdots\) | \(n\) | \(n+1\) | \(n+2\) | \(n+3\) | \(\cdots\) |
---|---|---|---|---|---|---|
Contents: | \(\cdots\) | 1C | 2B | A0 | FF | \(\cdots\) |
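One way to observe this byte order, sketched under the assumption that the labels value and padding (both made up for this example) are laid out back to back in the data section, is to load a quadword starting one byte past the stored number:

```asm
    .data
value:
    .quad 0xFFA02B1C        # bytes in memory: 1C 2B A0 FF 00 00 00 00
padding:
    .quad 0                 # eight zero bytes right after value

    .text
    .global main
main:
    leaq value(%rip), %rcx  # rcx = address n of the stored number
    movq 1(%rcx), %rax      # read 8 bytes starting at n+1: 2B A0 FF 00 00 00 00 00
                            # rax = 0x0000000000FFA02B
    retq
```

The byte 1C, stored at the lowest address, has dropped out of the loaded value, and 2B has become the least significant byte.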
CPU flags#
Flag | Meaning |
---|---|
SF | Set if the last arithmetic/logic instruction resulted in a negative number |
ZF | Set if the last arithmetic/logic instruction resulted in zero |
OF | Set if the last arithmetic/logic instruction resulted in a signed overflow |
CF | Set if the last arithmetic/logic instruction produced a carry (or a borrow) |
PF | Set if the number of 1 bits in the lowest byte of the result of the last arithmetic/logic instruction is even |
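A small illustration of how these flags respond to two subtractions; the values are chosen only for this sketch:

```asm
    movq $5, %rax
    subq $5, %rax           # result 0:  ZF=1, SF=0, CF=0, OF=0, PF=1

    movq $3, %rax
    subq $5, %rax           # result -2: ZF=0, SF=1, OF=0, PF=0
                            # CF=1 because 3 - 5 needs a borrow when the
                            # operands are read as unsigned numbers
```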
Instructions that we will use#
Below are categorized lists of instructions that we will use.
Note that when an instruction, like movq, takes both a source and a destination operand, the two operands cannot both be memory operands.
Data movement instructions#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
movq | Move 64-bit value from source to destination | - | Register, Memory | Register, Memory |
leaq | Load effective address into register | - | Memory | Register |
Arithmetic/logic instructions#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
incq | Increment value by 1 | OF, SF, ZF, PF (CF is left unchanged) | Register, Memory | Register, Memory |
decq | Decrement value by 1 | OF, SF, ZF, PF (CF is left unchanged) | Register, Memory | Register, Memory |
negq | Negate value (two's complement) | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
addq | Add source to destination | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
subq | Subtract source from destination | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
imulq | Signed multiply destination by source | OF, SF, ZF, CF, PF | Register, Memory | Register |
cqto | Convert quadword (rax) to octoword (rdx:rax) by sign extension | - | - | rdx:rax |
idivq | Signed divide rdx:rax by source | Undefined (see the note below) | Register, Memory | rax (quotient), rdx (remainder) |
cmpq | Compare source and destination (sets flags) | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
notq | Bitwise NOT (complement) operation | - | Register, Memory | Register, Memory |
xorq | Bitwise XOR destination with source | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
orq | Bitwise OR destination with source | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
andq | Bitwise AND destination with source | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
shlq | Shift left destination by count bits | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |
sarq | Arithmetic shift right destination by count bits | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |
shrq | Logical shift right destination by count bits | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |
Note
The instruction idivq does not set flags in any defined way; there is no guarantee what values the flags will have after idivq.
If there is an overflow (the quotient does not fit in rax), or if the divisor is zero, idivq will result in a trap (a CPU-level exception caught by the operating system and usually passed on to the language's runtime).
Proper use of the cqto instruction prevents overflows, except when the most negative 64-bit value is divided by -1.
It is a good idea (and this is what we do in this course) to produce code that checks that the divisor is not zero before every division operation.
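The division recipe described in this note might look as follows. The function name safe_div, the label div_by_zero, and the choice to return 0 for a zero divisor are assumptions made for this sketch only.

```asm
    .text
    .global safe_div
safe_div:                   # hypothetical: rdi = dividend, rsi = divisor
    cmpq $0, %rsi           # check the divisor before dividing
    je div_by_zero
    movq %rdi, %rax         # the dividend goes into rax
    cqto                    # sign-extend rax into rdx:rax
    idivq %rsi              # rax = quotient, rdx = remainder
    retq
div_by_zero:
    movq $0, %rax           # assumption: report 0 instead of trapping
    retq
```

Skipping cqto would leave whatever happened to be in rdx as the upper half of the dividend, which typically produces a wrong quotient or an overflow trap.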
Note
The imulq instruction multiplies two 64-bit values (source and destination).
The result is computed as a 128-bit number.
If the result is too big to fit in the 64-bit destination, both the OF and CF flags are set.
Note
The register cl, which is used to indicate the number of shifts in shifting operations, is simply the lowest byte of the rcx register.
We don't have direct access to it in the fragment we use in this course (explained in this document), but it can be set indirectly by setting rcx accordingly.
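A short sketch of the three shift instructions, with the count given once through rcx (and hence cl) and twice as an immediate. The values are picked only for illustration.

```asm
    movq $5, %rax
    movq $3, %rcx           # the shift count 3 ends up in cl
    shlq %cl, %rax          # rax = 5 << 3 = 40

    movq $-16, %rax
    sarq $2, %rax           # arithmetic shift keeps the sign: rax = -4

    movq $-16, %rax         # bit pattern 0xFFFFFFFFFFFFFFF0
    shrq $60, %rax          # logical shift fills with zeros: rax = 15
```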
Stack management#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
pushq | Push value onto the stack | - | Register, Memory | Stack |
popq | Pop value from the stack | - | Stack | Register, Memory |
Call and return#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
retq | Return from function | - | - | - |
callq | Call a function at destination | - | - | Memory address |
Jumps#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
jmp | Unconditionally jump to destination | - | - | Memory address |
je | Jump if equal (ZF=1) | - | - | Memory address |
jne | Jump if not equal (ZF=0) | - | - | Memory address |
jg | Jump if greater (ZF=0 and SF=OF) | - | - | Memory address |
jge | Jump if greater or equal (SF=OF) | - | - | Memory address |
jl | Jump if less (SF!=OF) | - | - | Memory address |
jle | Jump if less or equal (ZF=1 or SF!=OF) | - | - | Memory address |
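As an illustration of combining cmpq with conditional jumps, here is a minimal loop. The function name sum_to_n and its labels are made up for this sketch. Note the AT&T operand order: after cmpq a, b, the jump jle is taken when b is less than or equal to a.

```asm
    .text
    .global sum_to_n
sum_to_n:                   # hypothetical: rdi = n, returns 1 + 2 + ... + n
    movq $0, %rax           # running sum
loop_start:
    cmpq $0, %rdi           # computes rdi - 0 and sets the flags
    jle loop_end            # leave the loop once rdi <= 0
    addq %rdi, %rax         # sum += rdi
    decq %rdi               # rdi -= 1
    jmp loop_start
loop_end:
    retq
```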
Setting memory conditionally#
Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
---|---|---|---|---|
sete | Set byte to 1 if equal (ZF=1) | - | - | Memory |
setne | Set byte to 1 if not equal (ZF=0) | - | - | Memory |
setg | Set byte to 1 if greater (ZF=0 and SF=OF) | - | - | Memory |
setge | Set byte to 1 if greater or equal (SF=OF) | - | - | Memory |
setl | Set byte to 1 if less (SF!=OF) | - | - | Memory |
setle | Set byte to 1 if less or equal (ZF=1 or SF!=OF) | - | - | Memory |
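Since these instructions write a single byte to memory, one possible pattern is to clear a full 8-byte stack slot first, set its lowest byte, and then load the slot as a quadword (this relies on little-endian byte order). The function name is_equal is an assumption for this sketch.

```asm
    .text
    .global is_equal
is_equal:                   # hypothetical: returns 1 if rdi == rsi, else 0
    subq $8, %rsp           # scratch slot; no call is made while it is in use
    movq $0, (%rsp)         # clear all 8 bytes of the slot
    cmpq %rsi, %rdi         # sets ZF when rdi == rsi
    sete (%rsp)             # lowest byte of the slot becomes 1 or 0
    movq (%rsp), %rax       # load the whole slot as the result
    addq $8, %rsp           # release the slot
    retq
```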
Instruction operands#
We write register operands with a preceding %, e.g., xorq %rax, %rax.
We write immediate integer operands with a preceding $, e.g., $10.
Labels (which stand for the memory addresses they refer to) can be used directly as memory operands; however, rip-relative addressing (see below) is the preferred way of referring to labels.
We write indirect addresses using parentheses, e.g., (%rax) for the memory location whose address is in the register rax. As a jump operand, such an address must also be preceded by an asterisk, i.e., jmp *(%rax).
We write relative indirect addresses with an offset preceding the parentheses, e.g., 10(%rax) for the memory location whose address is 10 bytes after the address stored in the register rax. As a jump operand, such an address must also be preceded by an asterisk, i.e., jmp *10(%rax).
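The operand forms above can be seen side by side in the following fragment. It reserves 16 bytes of stack first so that the memory operands refer to space the code owns. The registers and values are arbitrary.

```asm
    subq $16, %rsp          # immediate and register operands
    movq $10, %rax          # immediate -> register
    movq %rax, (%rsp)       # register -> memory at the address in rsp
    movq %rax, 8(%rsp)      # register -> memory 8 bytes after that address
    movq 8(%rsp), %rcx      # memory -> register: rcx = 10
    addq $16, %rsp          # give the space back
```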
Conditional jumps do not support indirect addressing. That is, while jmp *10(%rax) is a valid instruction, je *10(%rax) is not. Such (relative) indirect conditional jumps must be encoded manually: instead of je *10(%rax), one should write the following code:
; code before jump
jne after_jump
jmp *10(%rax)
after_jump:
; code after jump
Tip
x86-64 uses so-called rip-relative addressing for making code relocatable. That is, in order to move data from a label abc, one would write movq abc(%rip), %rax instead of movq abc, %rax. This effectively does the same thing but is compiled in a different way so that the code is relocatable, i.e., functions correctly regardless of where in memory it is loaded. Rip-relative addressing can only be used for accessing data and not for jumps.
Jumps to labels are automatically made relative.
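A minimal sketch of rip-relative access to data, using a made-up label counter: the same label is loaded from, stored to, and has its address taken with leaq.

```asm
    .data
counter:
    .quad 41

    .text
    .global main
main:
    movq counter(%rip), %rax    # load the value stored at counter: rax = 41
    incq %rax
    movq %rax, counter(%rip)    # store 42 back
    leaq counter(%rip), %rcx    # rcx = address of counter (no memory access)
    movq (%rcx), %rax           # indirect load through rcx: rax = 42
    retq
```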
Sections in assembly code#
Assembly programs are divided into so-called sections: data section(s) and code section(s).
These are indicated by the .data and .text directives in the assembly program.
Labels#
Points in the code or data section can be marked with a label by including a line label: before the declaration or code being labeled.
We write .global label (on a separate line) to export a label (code or data) so it can be referred to from other modules (other source files) that are linked with the code, e.g., to export the function main, we would write:
.text
.global main
main:
movq $0, %rax
retq
Storing constants#
Constant 64-bit integers can be stored in the data section by writing .quad n, where \(n\) is the constant integer to be stored.
Constant strings can be stored in the data section by writing .asciiz "str" to store the string “str”.
These constants can be accessed by their preceding labels:
string1:
.asciiz "abcd"
integer1:
.quad 123456
Calling convention (System V ABI)#
We use the System V ABI calling convention. This is to be compatible with Linux and macOS, and with the C programming language on these operating systems. See the System V ABI for details here: x86-64-abi-20210928.pdf.
Passing arguments#
When calling a function, the first 6 arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8, %r9.
All the remaining arguments, if any, are passed on the stack, from right to left.
That is, the last argument of the function is pushed on the stack, and then, the one before last, until finally the 7th argument is pushed on the stack, before calling the target function.
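A sketch of a call with eight arguments may help. Both sum8 and call_sum8 are hypothetical names. The callee here reads its 7th and 8th arguments relative to rsp, which works because it does not move rsp. The two pushes add up to 16 bytes, so the stack stays 16-byte aligned at the callq.

```asm
    .text
sum8:                       # hypothetical callee: returns the sum of its 8 arguments
    movq %rdi, %rax
    addq %rsi, %rax
    addq %rdx, %rax
    addq %rcx, %rax
    addq %r8, %rax
    addq %r9, %rax
    addq 8(%rsp), %rax      # 7th argument: just above the return address
    addq 16(%rsp), %rax     # 8th argument: one slot higher
    retq

    .global call_sum8
call_sum8:                  # hypothetical caller: computes sum8(1, ..., 8) = 36
    pushq %rbp
    movq %rsp, %rbp
    movq $8, %rax
    pushq %rax              # the last (8th) argument is pushed first
    movq $7, %rax
    pushq %rax              # the 7th argument is pushed last
    movq $1, %rdi
    movq $2, %rsi
    movq $3, %rdx
    movq $4, %rcx
    movq $5, %r8
    movq $6, %r9
    callq sum8
    movq %rbp, %rsp         # drops the two stack arguments
    popq %rbp
    retq
```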
Callee versus caller saved registers#
Each function must ensure that it preserves all callee-saved registers (registers marked to be preserved across calls in the table of registers above, e.g., rbp). These must be stored on the stack as part of the function prologue, i.e., the code that runs immediately at the beginning of the function. The only exception is the rsp register itself, which is automatically preserved by the return instruction. (Note that the function must ensure that the rsp register is exactly as it was at the beginning of the call before it can safely invoke the return instruction because the return address is read from the stack.)
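For instance, a function that wants to use rbx and r12 as scratch registers could save and restore them as sketched below. The function name and its computation are made up. The restores happen in the reverse order of the saves because the stack is last-in, first-out.

```asm
    .text
    .global twice_plus
twice_plus:                 # hypothetical: returns 2 * rdi + rsi
    pushq %rbx              # save the callee-saved registers we will overwrite
    pushq %r12
    movq %rdi, %rbx
    movq %rsi, %r12
    addq %rbx, %rbx         # rbx = 2 * rdi
    addq %r12, %rbx         # rbx = 2 * rdi + rsi
    movq %rbx, %rax         # return value
    popq %r12               # restore in reverse order
    popq %rbx
    retq                    # rsp is back where it was on entry
```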
Function prologue and epilogue#
A typical simple function prologue and epilogue look as follows. Here we assume that rbp is the only callee-saved register the function will touch.
function prologue#
pushq %rbp ; save the caller's base pointer on the stack
movq %rsp, %rbp ; set our base pointer to the current stack pointer
; this is useful to be able to restore it and
; as an anchor for referring to variables on the stack
subq $112, %rsp ; reserve memory on stack for function's local
; variables (112 = 14 * 8)
function epilogue#
movq %rbp, %rsp ; restore the rsp to where it was right after pushing %rbp
; of the caller
popq %rbp ; restore the rbp to caller's value
; the stack pointer is now exactly where it was before entering
; the function, i.e., right at the return address
retq ; return to the caller
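Putting the prologue and epilogue around a small body shows how rbp serves as an anchor for local variables. The function add_locals and its two locals are assumptions for this sketch, and the comments use #, which clang and GNU as accept.

```asm
    .text
    .global add_locals
add_locals:                 # hypothetical: returns rdi + rsi via two stack locals
    pushq %rbp              # prologue
    movq %rsp, %rbp
    subq $16, %rsp          # room for two 8-byte locals
    movq %rdi, -8(%rbp)     # first local
    movq %rsi, -16(%rbp)    # second local
    movq -8(%rbp), %rax
    addq -16(%rbp), %rax    # rax = first local + second local
    movq %rbp, %rsp         # epilogue
    popq %rbp
    retq
```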
Stack alignment#
The System V ABI calling convention mandates that at any function call the stack pointer (rsp) must be 16-byte aligned. That is, the numeric value of rsp must be divisible by 16.
The best way to ensure this is to pre-allocate all the space that the function needs on the stack in the prologue and make sure it is 16-byte aligned. The function prologue above does ensure that the stack is in 16-byte alignment.
It is a good exercise to try to convince yourself that this is indeed the case.
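As a related sketch, consider a function that needs three 8-byte locals, i.e., 24 bytes, which is not a multiple of 16. One option is to round the reservation up to 32 bytes. The names three_locals and helper are hypothetical, and the comments track the value of rsp modulo 16.

```asm
    .text
helper:                     # hypothetical callee that does nothing
    retq

    .global three_locals
three_locals:
                            # on entry: rsp % 16 == 8, because the caller's callq
                            # pushed an 8-byte return address onto an aligned stack
    pushq %rbp              # rsp % 16 == 0
    movq %rsp, %rbp
    subq $32, %rsp          # 24 bytes needed, rounded up to 32: rsp % 16 == 0
    callq helper            # the stack is 16-byte aligned at this call
    movq %rbp, %rsp
    popq %rbp
    retq
```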