x86-64

Note

We use AT&T assembly syntax, which is the default syntax of clang (and of GCC). In AT&T syntax the source operand comes before the destination operand, i.e., instr src, dest.

Tip

You can use the x86-64 emulator x64-emu to experiment and practice x86-64 assembly in your browser — it supports all (and only) the instructions and features described here.

Registers that we will use

Register	Description	Preserved Across Calls
`rip`	Instruction pointer; cannot be manipulated directly	Irrelevant but no!
`rax`	General purpose; stores return value	No
`rbx`	General purpose; sometimes also used as the base pointer	Yes
`rcx`	General purpose; used for 4th argument	No
`rdx`	General purpose; used for 3rd argument	No
`rsp`	Stack pointer	Yes (automatically)
`rbp`	Can be used as base pointer	Yes
`rsi`	General purpose; used for 2nd argument	No
`rdi`	General purpose; used for 1st argument	No
`r8`	General purpose; used for 5th argument	No
`r9`	General purpose; used for 6th argument	No
`r10`	General purpose	No
`r11`	General purpose	No
`r12`	General purpose	Yes
`r13`	General purpose	Yes
`r14`	General purpose	Yes
`r15`	General purpose	Yes

The stack

In x86 the stack starts at a high address and grows towards lower addresses.

Pushing on the stack first decrements the stack pointer (rsp) by the size of the object being pushed and then writes the value to the memory location the stack pointer points to.
Popping from the stack first reads the value from the memory location the stack pointer points to and then increments the stack pointer by the size of the object being popped.
To allocate 16 bytes of space on the stack, e.g., for storing a function's local variables, we decrement the stack pointer by 16 bytes:

subq $16, %rsp
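To make the order of operations concrete, here is a small C model of pushq, popq and subq on a simulated stack. It is only a sketch under stated assumptions: the stack is a hypothetical 64-byte array, "addresses" are offsets into that array, every pushed value is 8 bytes, and the names sim_stack, sim_push, sim_pop and sim_alloc are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64   /* size of the hypothetical simulated stack */

/* Simulated stack: rsp is an offset into mem and starts at the high end. */
struct sim_stack {
    unsigned char mem[STACK_BYTES];
    size_t rsp;
};

void sim_init(struct sim_stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = STACK_BYTES;                 /* empty stack: rsp at the highest address */
}

/* pushq: first decrement rsp by 8, then store the value where rsp points. */
void sim_push(struct sim_stack *s, uint64_t value) {
    assert(s->rsp >= 8);                  /* the simulated stack must not overflow */
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* popq: first read the value where rsp points, then increment rsp by 8. */
uint64_t sim_pop(struct sim_stack *s) {
    assert(s->rsp + 8 <= STACK_BYTES);    /* there must be something to pop */
    uint64_t value;
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += 8;
    return value;
}

/* subq $n, %rsp: reserve n bytes by moving rsp down; nothing is written. */
void sim_alloc(struct sim_stack *s, size_t n) {
    assert(n <= s->rsp);
    s->rsp -= n;
}
```

Pushing 1 and then 2 leaves rsp 16 bytes below its starting value, and the next two pops return 2 and then 1, restoring rsp.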

Endianness

x86 is a little-endian architecture. This means that less significant bytes are stored at lower addresses. For example, the four-byte number 0xFFA02B1C, when written to memory at address $n$, is stored as follows:

Address:	$\cdots$	$n$	$n+1$	$n+2$	$n+3$	$\cdots$
Contents:	$\cdots$	1C	2B	A0	FF	$\cdots$
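The byte order in the table can be reproduced with a short C sketch. The function store_le32 is a hypothetical helper written for this illustration: it places the least significant byte at the lowest index, as an x86 store does, independently of the machine running the code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write value to out[0..3] in little-endian order: out[0] gets the least
   significant byte, out[3] the most significant one. */
void store_le32(uint32_t value, unsigned char *out) {
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        out[i] = (unsigned char)(value >> (8 * i));   /* bits 8i .. 8i+7 */
    }
}

/* Returns 1 when the machine running this code is itself little-endian
   (as every x86-64 machine is): the lowest-addressed byte of 1 is 1. */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}
```

Calling store_le32(0xFFA02B1C, buf) fills buf with 1C, 2B, A0, FF, matching the table.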

CPU flags

Flag	Meaning
`SF`	Set if the result of the last arithmetic/logic instruction is negative (its most significant bit is 1)
`ZF`	Set if the result of the last arithmetic/logic instruction is zero
`OF`	Set if the last arithmetic/logic instruction resulted in a signed overflow
`CF`	Set if the last arithmetic/logic instruction produced a carry out of (or, for subtraction, a borrow into) the most significant bit, i.e., an unsigned overflow
`PF`	Set if the least significant byte of the result of the last arithmetic/logic instruction contains an even number of 1 bits
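As a sketch of how these definitions play out, the C function below computes the five flags for the 64-bit subtraction dest - src, which is what subq and cmpq perform. The struct and function names are hypothetical; the formulas follow the table above, including the detail that PF looks only at the low byte of the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct flags { bool of, sf, zf, cf, pf; };

/* Flags produced by the 64-bit subtraction dest - src (subq / cmpq). */
struct flags flags_after_sub(uint64_t dest, uint64_t src) {
    uint64_t result = dest - src;              /* wraps modulo 2^64, like the CPU */
    struct flags f;
    f.zf = (result == 0);
    f.sf = (result >> 63) & 1;                 /* sign = most significant bit */
    f.cf = dest < src;                         /* borrow = unsigned overflow */
    /* signed overflow: dest and src have different signs and the result's
       sign differs from dest's sign */
    f.of = (((dest ^ src) & (dest ^ result)) >> 63) & 1;
    /* parity of the lowest byte only: set when the count of 1 bits is even */
    unsigned ones = 0;
    for (unsigned low = (unsigned)(result & 0xFF); low != 0; low >>= 1)
        ones += low & 1;
    f.pf = (ones % 2 == 0);
    return f;
}

/* A few worked cases. */
void flags_examples(void) {
    struct flags f = flags_after_sub(5, 5);     /* result 0: zero, even parity */
    assert(f.zf && !f.sf && !f.cf && !f.of && f.pf);
    f = flags_after_sub(3, 5);                  /* result -2 = 0xFF...FE */
    assert(!f.zf && f.sf && f.cf && !f.of && !f.pf);
    f = flags_after_sub(UINT64_C(1) << 63, 1);  /* INT64_MIN - 1 overflows */
    assert(!f.zf && !f.sf && !f.cf && f.of);
}
```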

Instructions that we will use

Below are categorized lists of the instructions that we will use. Note that when an instruction such as movq takes both a source and a destination operand, the source and the destination cannot both be memory operands.

Data movement instructions

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`movq`	Move 64-bit value from source to destination	-	Immediate, Register, Memory	Register, Memory
`leaq`	Load effective address into register	-	Memory	Register

Arithmetic/logic instructions

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`incq`	Increment value by 1	OF, SF, ZF, PF (CF unchanged)	Register, Memory	Register, Memory
`decq`	Decrement value by 1	OF, SF, ZF, PF (CF unchanged)	Register, Memory	Register, Memory
`negq`	Negate value (two’s complement)	OF, SF, ZF, CF, PF	Register, Memory	Register, Memory
`addq`	Add source to destination	OF, SF, ZF, CF, PF	Immediate, Register, Memory	Register, Memory
`subq`	Subtract source from destination	OF, SF, ZF, CF, PF	Immediate, Register, Memory	Register, Memory
`imulq`	Signed multiply destination by source	OF, CF (SF, ZF, PF undefined)	Register, Memory	Register
`cqto`	Sign-extend `rax` into `rdx:rax` (convert quadword to octaword)	-	`rax`	`rdx:rax`
`idivq`	Signed divide `rdx:rax` by the source operand	Undefined (see the note below)	Register, Memory	`rax` (quotient), `rdx` (remainder)
`cmpq`	Subtract source from destination, set the flags, and discard the result	OF, SF, ZF, CF, PF	Immediate, Register, Memory	Register, Memory (not modified)
`notq`	Bitwise NOT (complement) operation	-	Register, Memory	Register, Memory
`xorq`	Bitwise XOR destination with source	OF, SF, ZF, CF, PF (OF and CF cleared)	Immediate, Register, Memory	Register, Memory
`orq`	Bitwise OR destination with source	OF, SF, ZF, CF, PF (OF and CF cleared)	Immediate, Register, Memory	Register, Memory
`andq`	Bitwise AND destination with source	OF, SF, ZF, CF, PF (OF and CF cleared)	Immediate, Register, Memory	Register, Memory
`shlq`	Shift destination left by count bits	OF, SF, ZF, CF, PF (CF = last bit shifted out; OF defined only for 1-bit shifts)	Immediate, `cl`	Register, Memory
`sarq`	Arithmetic shift right (copies of the sign bit shifted in) destination by count bits	OF, SF, ZF, CF, PF (CF = last bit shifted out; OF defined only for 1-bit shifts)	Immediate, `cl`	Register, Memory
`shrq`	Logical shift right (zeros shifted in) destination by count bits	OF, SF, ZF, CF, PF (CF = last bit shifted out; OF defined only for 1-bit shifts)	Immediate, `cl`	Register, Memory

Note

The instruction idivq does not set the flags in any meaningful way; the values of the flags after idivq are undefined. If the divisor is zero, or if there is an overflow (the quotient does not fit in rax), idivq causes a trap (a CPU-level exception that is caught by the operating system and usually passed on to the language's runtime). Executing cqto right before idivq, so that rdx:rax holds the sign-extended value of rax, rules out every overflow except one: dividing the most negative value, $-2^{63}$, by $-1$, whose quotient $2^{63}$ does not fit in rax. It is a good idea (and this is what we do in this course) to generate code that checks that the divisor is not zero before every division operation.
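The following C sketch models the cqto; idivq pair on 64-bit values, including the two trap conditions described above. The struct and function names are hypothetical. C99 integer division also truncates toward zero and gives the remainder the sign of the dividend, so C's / and % produce the same quotient and remainder as idivq whenever idivq does not trap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* After cqto; idivq the quotient is in rax and the remainder in rdx. */
struct divresult { int64_t quotient, remainder; };

/* True if `cqto; idivq divisor` with dividend in rax would trap:
   division by zero, or the single overflowing case INT64_MIN / -1. */
bool idivq_would_trap(int64_t dividend, int64_t divisor) {
    return divisor == 0 || (dividend == INT64_MIN && divisor == -1);
}

/* Model of cqto; idivq: quotient truncated toward zero, remainder with the
   sign of the dividend. The same two cases are undefined behavior in C. */
struct divresult cqto_idivq(int64_t dividend, int64_t divisor) {
    assert(!idivq_would_trap(dividend, divisor));
    struct divresult r = { dividend / divisor, dividend % divisor };
    return r;
}
```

For instance, -7 divided by 2 gives quotient -3 and remainder -1 (not -4 and 1).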

Note

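The imulq instruction multiplies two 64-bit values (source and destination) and stores the result in the destination, which must be a register. The product is computed as a 128-bit number, of which only the low 64 bits are stored. If the product does not fit in the 64-bit destination as a signed value (i.e., the stored result is truncated), both the OF and CF flags are set; otherwise both are cleared.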
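A C sketch of the same rule, assuming GCC or Clang on a 64-bit target (the __int128 type is a compiler extension, not standard C). The function name imulq_model is hypothetical; it keeps the low 64 bits of the 128-bit product, as imulq does, and reports whether OF and CF would be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of the two-operand `imulq src, dest`: form the full 128-bit product,
   keep its low 64 bits, and set *of_cf (standing for both OF and CF) when the
   product does not fit in a signed 64-bit value. */
int64_t imulq_model(int64_t dest, int64_t src, bool *of_cf) {
    assert(of_cf != NULL);
    __int128 full = (__int128)dest * (__int128)src;   /* cannot overflow 128 bits */
    *of_cf = full < INT64_MIN || full > INT64_MAX;
    /* Truncate to the low 64 bits; converting the out-of-range unsigned value
       back to int64_t wraps on GCC and Clang (implementation-defined in C). */
    return (int64_t)(uint64_t)full;
}
```

For example, INT64_MAX * 2 stores -2 (the low 64 bits of $2^{64} - 2$) and sets both flags.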

Note

The register cl, which supplies the shift count in shift instructions, is simply the lowest byte of the rcx register. The fragment of x86-64 we use in this course (described in this document) gives no direct access to cl, but it can be set indirectly by setting rcx accordingly.
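The C sketch below models a 64-bit shift by cl. The count is taken from cl, the low byte of rcx, and for 64-bit shifts the processor then uses only the low 6 bits of that count, so shifting by 65 is the same as shifting by 1. The function names are hypothetical, and sarq is spelled out with unsigned arithmetic because right-shifting a negative signed value is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* cl is the lowest byte of rcx; 64-bit shifts use only its low 6 bits. */
static unsigned shift_count_from_rcx(uint64_t rcx) {
    uint8_t cl = (uint8_t)rcx;
    return cl & 63;
}

/* shlq %cl, dest: zeros shifted in from the right. */
uint64_t shlq_cl(uint64_t value, uint64_t rcx) {
    return value << shift_count_from_rcx(rcx);
}

/* shrq %cl, dest: zeros shifted in from the left. */
uint64_t shrq_cl(uint64_t value, uint64_t rcx) {
    return value >> shift_count_from_rcx(rcx);
}

/* sarq %cl, dest: copies of the sign bit shifted in from the left. */
uint64_t sarq_cl(uint64_t value, uint64_t rcx) {
    unsigned n = shift_count_from_rcx(rcx);
    uint64_t result = value >> n;
    if (n != 0 && (value >> 63))                 /* negative: fill with 1 bits */
        result |= ~UINT64_C(0) << (64 - n);
    return result;
}

void shift_examples(void) {
    assert(shrq_cl(UINT64_C(0xFFFFFFFFFFFFFFF0), 4) == UINT64_C(0x0FFFFFFFFFFFFFFF));
    assert(sarq_cl(UINT64_C(0xFFFFFFFFFFFFFFF0), 4) == UINT64_C(0xFFFFFFFFFFFFFFFF));
    assert(shlq_cl(1, 0x141) == 2);              /* cl = 0x41 = 65, 65 mod 64 = 1 */
}
```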

Stack management

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`pushq`	Push value onto the stack	-	Register, Memory	Stack
`popq`	Pop value from the stack	-	Stack	Register, Memory

Call and return

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`retq`	Return from function	-	-	-
`callq`	Call a function at destination	-	-	Memory address

Jumps

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`jmp`	Unconditionally jump to destination	-	-	Memory address
`je`	Jump if equal (ZF=1)	-	-	Memory address
`jne`	Jump if not equal (ZF=0)	-	-	Memory address
`jg`	Jump if greater (ZF=0, SF=OF)	-	-	Memory address
`jge`	Jump if greater or equal (SF=OF)	-	-	Memory address
`jl`	Jump if less (SF!=OF)	-	-	Memory address
`jle`	Jump if less or equal (ZF=1 or SF!=OF)	-	-	Memory address
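A common stumbling block with AT&T syntax is the operand order: after cmpq src, dest, the jump jl is taken when dest is less than src, because cmpq computes dest - src. The hypothetical C function below decides whether each jump in the table is taken using only ZF, SF and OF, computed as cmpq would compute them, so it can be checked against C's own signed comparisons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cond { COND_E, COND_NE, COND_G, COND_GE, COND_L, COND_LE };

/* Is j<cond> taken after `cmpq src, dest`? Uses the flags, not a direct comparison. */
bool jump_taken(enum cond c, int64_t dest, int64_t src) {
    uint64_t d = (uint64_t)dest, s = (uint64_t)src;
    uint64_t r = d - s;                               /* cmpq computes dest - src */
    bool zf = (r == 0);
    bool sf = (r >> 63) & 1;
    bool of = (((d ^ s) & (d ^ r)) >> 63) & 1;        /* signed overflow */
    switch (c) {
    case COND_E:  return zf;                          /* je  */
    case COND_NE: return !zf;                         /* jne */
    case COND_G:  return !zf && sf == of;             /* jg  */
    case COND_GE: return sf == of;                    /* jge */
    case COND_L:  return sf != of;                    /* jl  */
    case COND_LE: return zf || sf != of;              /* jle */
    }
    assert(!"unknown condition");
    return false;
}
```

For example, jump_taken(COND_L, INT64_MIN, 1) is true even though the subtraction overflows: SF is 0 but OF is 1, so SF != OF.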

Setting memory conditionally

Instruction	Description	Affected Flags (OF, SF, ZF, CF, PF)	Source Operand(s)	Destination Operand
`sete`	Set byte to 1 if equal (ZF=1), otherwise to 0	-	-	Memory
`setne`	Set byte to 1 if not equal (ZF=0), otherwise to 0	-	-	Memory
`setg`	Set byte to 1 if greater (ZF=0, SF=OF), otherwise to 0	-	-	Memory
`setge`	Set byte to 1 if greater or equal (SF=OF), otherwise to 0	-	-	Memory
`setl`	Set byte to 1 if less (SF!=OF), otherwise to 0	-	-	Memory
`setle`	Set byte to 1 if less or equal (ZF=1 or SF!=OF), otherwise to 0	-	-	Memory

Instruction operands

We write register operands with a preceding %, e.g., xorq %rax, %rax.
We write immediate integer operands with a preceding $, e.g., $10.
Labels (standing for the memory addresses they refer to) can be used directly as memory operands; however, rip-relative addressing (see below) is the preferred way of referring to labels.
We write indirect addresses using parentheses, e.g., (%rax) for the memory location whose address is in the register rax. When such an address is the operand of a jump, it must also be preceded by an asterisk, i.e., jmp *(%rax).
We write relative indirect addresses with an offset preceding the parentheses, e.g., 10(%rax) for the memory location whose address is 10 bytes after the address stored in the register rax. As a jump operand, such an address must likewise be preceded by an asterisk, i.e., jmp *10(%rax).

Conditional jumps do not support indirect addressing. That is, while jmp *10(%rax) is a valid instruction, je *10(%rax) is not. Such (relative) indirect conditional jumps must be encoded manually, by jumping over an unconditional indirect jump with the opposite condition. That is, instead of je *10(%rax) one should write the following code (in AT&T syntax, # starts a comment):

# code before jump
jne after_jump
jmp *10(%rax)
after_jump:
# code after jump

Tip

x86-64 uses so-called rip-relative addressing to make code relocatable. That is, to load data from a label abc, one would write movq abc(%rip), %rax instead of movq abc, %rax. Both read the same memory location, but the first form encodes the address as an offset from the instruction pointer rather than as an absolute address, so the code works correctly regardless of where in memory it is loaded. Rip-relative addressing is only needed for accessing data, not for jumps: jumps to labels are automatically encoded relative to rip.

Sections in assembly code

Assembly programs are divided into so-called sections: data section(s) and code section(s). These are introduced by the .data and .text directives, respectively.

Labels

Points in the code or data section can be marked with a label by writing a line label: right before the declaration or code being labeled.

We write .global label (on a separate line) to export a label (code or data) so that it can be referred to from other modules (other source files) linked with the code. For example, to export the function main, we would write:

.text
.global main
main:
  movq $0, %rax
  retq

Storing constants

Constant 64-bit integers can be stored in the data section by writing .quad n, where $n$ is the constant integer to store.
Constant strings can be stored in the data section by writing .asciiz "str", which stores the characters of the string “str” followed by a terminating zero byte (the GNU assembler and clang spell this directive .asciz).

These constants can be accessed through the labels that precede them:

string1:
.asciiz "abcd"

integer1:
.quad 123456

Calling convention (System V ABI)

We use the System V ABI calling convention, for compatibility with Linux and macOS and with C code compiled on these operating systems. See the System V ABI for details: x86-64-abi-20210928.pdf.

Passing arguments

When calling a function, the first 6 (integer or pointer) arguments are passed in registers, in this order: %rdi, %rsi, %rdx, %rcx, %r8, %r9. All remaining arguments, if any, are passed on the stack, pushed from right to left. That is, the last argument of the function is pushed on the stack first, then the one before last, and so on, until finally the 7th argument is pushed, right before calling the target function.
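Inside the called function, the argument locations can be summarized with the small C sketch below. It assumes 8-byte integer or pointer arguments and a callee that starts with the standard prologue pushq %rbp; movq %rsp, %rbp. Then 0(%rbp) holds the saved rbp and 8(%rbp) the return address, so the 7th argument, which was pushed last and therefore sits lowest, is at 16(%rbp). The function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Argument registers in order (System V ABI, integer/pointer arguments). */
static const char *const ARG_REGS[6] = { "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9" };

/* Register holding argument n (1-based), for n = 1..6. */
const char *arg_register(int n) {
    assert(n >= 1 && n <= 6);
    return ARG_REGS[n - 1];
}

/* Inverse lookup: which argument (1..6) a register holds, or 0 if none. */
int arg_number_of(const char *reg) {
    for (int i = 0; i < 6; i++)
        if (strcmp(ARG_REGS[i], reg) == 0)
            return i + 1;
    return 0;
}

/* Offset from %rbp of stack argument n (n >= 7) after the standard prologue:
   skip the saved rbp (8 bytes) and the return address (8 bytes). */
int stack_arg_rbp_offset(int n) {
    assert(n >= 7);
    return 16 + 8 * (n - 7);
}
```

So in a function with 8 arguments, the 8th one is read with movq 24(%rbp), %rax.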

Callee versus caller saved registers

Each function must ensure that it preserves all callee-saved registers (the registers marked as preserved across calls in the table of registers above, e.g., rbp). Every such register that the function modifies must be saved on the stack as part of the function prologue, i.e., the code that runs at the very beginning of the function, and restored before the function returns. The only exception is the rsp register itself, which is automatically restored by the return instruction: retq pops the return address pushed by callq, leaving rsp where it was before the call. (Note that the function must ensure that rsp is exactly as it was at the beginning of the call before it can safely execute the return instruction, because the return address is read from the stack.)

Function prologue and epilogue

A typical simple function prologue and epilogue look as follows. Here we assume that rbp is the only callee-saved register the function touches.

Function prologue

pushq %rbp        # save the caller's base pointer on the stack
movq %rsp, %rbp   # set our base pointer to the current stack pointer;
                  # this makes it easy to restore rsp later and gives
                  # an anchor for referring to variables on the stack
subq $112, %rsp   # reserve memory on the stack for the function's local
                  # variables (112 = 14 * 8)

Function epilogue

movq %rbp, %rsp  # restore rsp to where it was right after pushing
                 # the caller's %rbp
popq %rbp        # restore rbp to the caller's value;
                 # the stack pointer is now exactly where it was when the
                 # function was entered, i.e., pointing at the return address
retq             # return to the caller

Stack alignment

The System V ABI calling convention requires the stack pointer (rsp) to be 16-byte aligned at every function call, i.e., immediately before each callq instruction executes. That is, the numeric value of rsp must be divisible by 16 at that point. The easiest way to ensure this is to pre-allocate in the prologue all the stack space the function needs, choosing its size so that rsp is 16-byte aligned once the prologue has finished. The function prologue above does ensure that the stack is 16-byte aligned. It is a good exercise to try to convince yourself that this is indeed the case.
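The alignment arithmetic can be checked with the C sketch below. It assumes the prologue shape used above: the caller's callq pushes an 8-byte return address onto a 16-byte-aligned stack, the prologue pushes rbp and possibly further callee-saved registers (8 bytes each), and then reserves local_bytes with subq. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bytes pushed or reserved since rsp was last 16-byte aligned (just before
   the caller's callq): return address + saved rbp + extra pushes + locals. */
static unsigned bytes_below_aligned(unsigned extra_pushes, unsigned local_bytes) {
    return 8 + 8 + 8 * extra_pushes + local_bytes;
}

/* True if rsp is 16-byte aligned after the prologue, so the body may call functions. */
bool prologue_keeps_alignment(unsigned extra_pushes, unsigned local_bytes) {
    return bytes_below_aligned(extra_pushes, local_bytes) % 16 == 0;
}

/* Smallest subq amount >= needed (a multiple of 8) that keeps rsp aligned. */
unsigned padded_local_bytes(unsigned extra_pushes, unsigned needed) {
    assert(needed % 8 == 0);              /* locals are allocated in 8-byte slots */
    unsigned local = needed;
    if (!prologue_keeps_alignment(extra_pushes, local))
        local += 8;                       /* one 8-byte pad restores alignment */
    return local;
}
```

With no extra pushes, 8 + 8 + 112 = 128 bytes lie below the last aligned position, a multiple of 16, which is why subq $112, %rsp keeps rsp aligned. With one extra pushed register, 104 bytes of locals would keep the stack aligned instead.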
