x86-64

Note

We use AT&T assembly syntax. This is what clang uses. In AT&T syntax the source operand comes before the destination operand, i.e., instr src, dest.

Tip

You can use the x86-64 emulator x64-emu to experiment and practice x86-64 assembly in your browser — it supports all (and only) the instructions and features described here.

Registers that we will use

| Register | Description | Preserved Across Calls |
|---|---|---|
| rip | Instruction pointer; cannot be manipulated directly | Irrelevant but no! |
| rax | General purpose; stores return value | No |
| rbx | General purpose; sometimes also used as the base pointer | Yes |
| rcx | General purpose; used for 4th argument | No |
| rdx | General purpose; used for 3rd argument | No |
| rsp | Stack pointer | Yes (automatically) |
| rbp | Can be used as base pointer | Yes |
| rsi | General purpose; used for 2nd argument | No |
| rdi | General purpose; used for 1st argument | No |
| r8 | General purpose; used for 5th argument | No |
| r9 | General purpose; used for 6th argument | No |
| r10 | General purpose | No |
| r11 | General purpose | No |
| r12 | General purpose | Yes |
| r13 | General purpose | Yes |
| r14 | General purpose | Yes |
| r15 | General purpose | Yes |

The stack

In x86 the stack starts at a high address and grows toward lower addresses.

  • Pushing on the stack: first decrements the stack pointer (rsp) by the size of the object being pushed, then writes the value to the memory location the stack pointer points to.

  • Popping from the stack: reads the value from the memory location the stack pointer points to, then increments the stack pointer by the size of the object being popped.

  • To allocate 16 bytes of space on the stack, e.g., for storing a function's local variables, we decrement the stack pointer by 16 bytes:

subq $16, %rsp
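The order of operations in pushq and popq can be sketched as a small C model. It rests on simplifying assumptions. The hypothetical `Stack` type represents 64 bytes of memory as a byte array, rsp is an offset into that array, and all pushed values are 8 bytes. None of these names come from any real tool.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: 64 bytes of "memory"; rsp is a byte offset into it.
   The stack starts at the high end and grows toward offset 0. */
typedef struct {
    uint8_t mem[64];
    size_t rsp;
} Stack;

void stack_init(Stack *s) {
    memset(s->mem, 0, sizeof s->mem);
    s->rsp = sizeof s->mem;            /* empty stack: rsp at the top */
}

/* pushq: first decrement rsp by 8, then write the value at rsp. */
void pushq(Stack *s, uint64_t value) {
    assert(s->rsp >= 8);               /* the model has no more room */
    s->rsp -= 8;
    memcpy(&s->mem[s->rsp], &value, sizeof value);
}

/* popq: first read the value at rsp, then increment rsp by 8. */
uint64_t popq(Stack *s) {
    uint64_t value;
    assert(s->rsp + 8 <= sizeof s->mem);
    memcpy(&value, &s->mem[s->rsp], sizeof value);
    s->rsp += 8;
    return value;
}

/* subq $n, %rsp: reserve n bytes without writing anything. */
void reserve(Stack *s, size_t n) {
    assert(s->rsp >= n);
    s->rsp -= n;
}
```

Pushing 1 and then 2 moves rsp from 64 to 48. Popping then returns 2 first and 1 second, because the last value written sits at the lowest address.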

Endianness

x86 is a little-endian architecture: the less significant bytes of a multi-byte value are stored at lower addresses. For example, the four-byte number 0xFFA02B1C, when written to memory at address \(n\), is stored as follows:

| Address | \(\cdots\) | \(n\) | \(n+1\) | \(n+2\) | \(n+3\) | \(\cdots\) |
|---|---|---|---|---|---|---|
| Contents | \(\cdots\) | 1C | 2B | A0 | FF | \(\cdots\) |
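The byte layout in the table can be reproduced in C by writing the number one byte at a time, least significant byte first. This is a minimal sketch, and `store_le32` and `host_is_little_endian` are hypothetical helper names. `store_le32` yields the little-endian layout whatever the host's own byte order is. `host_is_little_endian` checks the host by examining the lowest-addressed byte of a known value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[0..3] in little-endian order: out[0] (lowest address)
   gets the least significant byte, out[3] the most significant one. */
void store_le32(uint32_t v, uint8_t out[4]) {
    for (int i = 0; i < 4; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Returns 1 if the machine running this code stores the least
   significant byte at the lowest address (as x86-64 does). */
int host_is_little_endian(void) {
    uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* byte at the lowest address of probe */
    return first == 1;
}
```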

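CPU flags

| Flag | Meaning |
|---|---|
| SF | Set if the last arithmetic/logic instruction produced a negative result, i.e., the most significant (sign) bit of the result is 1 |
| ZF | Set if the last arithmetic/logic instruction produced zero |
| OF | Set if the last arithmetic/logic instruction caused a signed overflow, i.e., the true result does not fit in the destination as a signed (two's complement) number |
| CF | Set if the last arithmetic/logic instruction produced a carry out of (for subtraction, a borrow into) the most significant bit, i.e., an unsigned overflow |
| PF | Set if the least significant byte of the result of the last arithmetic/logic instruction contains an even number of 1 bits |

Not every instruction updates every flag; the tables below list the flags each instruction affects.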
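The following C sketch computes the five flags for a 64-bit `addq src, dst`, which makes the definitions above concrete. `Flags` and `addq_flags` are hypothetical names. The OF and CF formulas are the standard ones. OF signals signed overflow, which happens when both operands have the same sign and the result has the other sign. CF signals that the unsigned sum wrapped around.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int sf, zf, of, cf, pf; } Flags;

/* Flags after "addq src, dst"; *result receives dst + src (mod 2^64). */
Flags addq_flags(uint64_t src, uint64_t dst, uint64_t *result) {
    uint64_t r = dst + src;                 /* unsigned addition wraps mod 2^64 */
    Flags f;
    f.sf = (int)(r >> 63);                  /* sign bit of the result */
    f.zf = (r == 0);
    /* signed overflow: operands have the same sign, result the other */
    f.of = (int)((~(src ^ dst) & (src ^ r)) >> 63);
    f.cf = (r < dst);                       /* unsigned sum wrapped around */
    /* parity flag: only the least significant byte of the result counts */
    int ones = 0;
    for (int i = 0; i < 8; i++) {
        ones += (int)((r >> i) & 1);
    }
    f.pf = (ones % 2 == 0);
    *result = r;
    return f;
}
```

Adding 1 to -1 (all bits set) sets ZF and CF but not OF. Adding 1 to the largest signed value sets OF and SF but not CF.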

Instructions that we will use

Below are categorized lists of the instructions that we will use. Note that when an instruction, like movq, takes both a source and a destination operand, the source and the destination cannot both be memory operands.

Data movement instructions

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| movq | Move 64-bit value from source to destination | - | Immediate, Register, Memory | Register, Memory |
| leaq | Load the effective address of the source operand (without accessing memory) into the register | - | Memory | Register |
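The difference between movq and leaq with the same memory operand can be sketched in C. The sketch assumes, for simplicity, that memory is a small array of 64-bit words indexed by byte address divided by 8; the array `mem` and both functions are hypothetical. Under that assumption, `movq 8(%rax), %rbx` reads memory. In contrast, `leaq 8(%rax), %rbx` only computes the address rax + 8 and never touches memory.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-byte memory made of four 64-bit words. */
uint64_t mem[4] = {100, 200, 300, 400};

/* movq disp(%reg), %dst: load the 8-byte value stored at reg + disp. */
uint64_t model_movq_load(uint64_t reg, int64_t disp) {
    uint64_t addr = reg + (uint64_t)disp;
    assert(addr % 8 == 0 && addr / 8 < 4);  /* stay inside the model */
    return mem[addr / 8];
}

/* leaq disp(%reg), %dst: the result is the address itself; no load. */
uint64_t model_leaq(uint64_t reg, int64_t disp) {
    return reg + (uint64_t)disp;
}
```

leaq affects no flags, so it can also serve as plain addition into a different register.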

Arithmetic/logic instructions

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| incq | Increment value by 1 | OF, SF, ZF, PF (CF is not affected) | Register, Memory | Register, Memory |
| decq | Decrement value by 1 | OF, SF, ZF, PF (CF is not affected) | Register, Memory | Register, Memory |
| negq | Negate value (two’s complement) | OF, SF, ZF, CF, PF | Register, Memory | Register, Memory |
| addq | Add source to destination | OF, SF, ZF, CF, PF | Immediate, Register, Memory | Register, Memory |
| subq | Subtract source from destination | OF, SF, ZF, CF, PF | Immediate, Register, Memory | Register, Memory |
| imulq | Signed multiply destination by source | OF, CF (SF, ZF, PF are undefined) | Register, Memory | Register |
| cqto | Sign-extend rax into rdx:rax (convert quadword to octaword) | - | rax (implicit) | rdx:rax |
| idivq | Signed divide rdx:rax by the source (divisor) | Undefined (see the note below) | Register, Memory | rax (quotient), rdx (remainder) |
| cmpq | Compute destination minus source and set flags accordingly; the result is discarded | OF, SF, ZF, CF, PF | Immediate, Register, Memory | Register, Memory |
| notq | Bitwise NOT (complement) | - | Register, Memory | Register, Memory |
| xorq | Bitwise XOR destination with source | OF, SF, ZF, CF, PF (OF and CF are cleared) | Immediate, Register, Memory | Register, Memory |
| orq | Bitwise OR destination with source | OF, SF, ZF, CF, PF (OF and CF are cleared) | Immediate, Register, Memory | Register, Memory |
| andq | Bitwise AND destination with source | OF, SF, ZF, CF, PF (OF and CF are cleared) | Immediate, Register, Memory | Register, Memory |
| shlq | Shift destination left by count bits | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |
| sarq | Arithmetic shift right destination by count bits (vacated bits are filled with the sign bit) | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |
| shrq | Logical shift right destination by count bits (vacated bits are filled with zeros) | OF, SF, ZF, CF, PF | Immediate, cl | Register, Memory |

Note

The instruction idivq does not set flags in any meaningful way; in fact, there is no guarantee what values the flags will have after idivq. If there is an overflow (the quotient does not fit in rax), or if the divisor is zero, idivq results in a trap (a CPU-level exception caught by the operating system and usually passed on to the language's runtime). Using cqto to sign-extend rax into rdx:rax before idivq prevents overflow in all cases but one: dividing the most negative value, \(-2^{63}\), by \(-1\), whose quotient \(2^{63}\) does not fit in rax. It is a good idea (and this is what we do in this course) to generate code that checks that the divisor is not zero before every division operation.
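These semantics can be modeled in C, because C's signed division also truncates toward zero and gives the remainder the sign of the dividend, as idivq does. This is a sketch under stated assumptions. The hypothetical `checked_idivq` takes the dividend as a 64-bit value, so the sign extension that cqto performs is implicit. It returns 0 in place of a trap when the divisor is zero or the quotient would overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Models "cqto; idivq divisor" with the dividend in rax.
   Returns 1 and stores quotient/remainder on success; returns 0 where
   the real instruction would trap (division by zero or overflow). */
int checked_idivq(int64_t rax, int64_t divisor, int64_t *quot, int64_t *rem) {
    if (divisor == 0) {
        return 0;                  /* idivq would trap: divide by zero */
    }
    if (rax == INT64_MIN && divisor == -1) {
        return 0;                  /* quotient 2^63 does not fit in rax */
    }
    *quot = rax / divisor;         /* truncates toward zero, like idivq */
    *rem = rax % divisor;          /* remainder has the dividend's sign */
    return 1;
}
```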

Note

The imulq instruction multiplies two signed 64-bit values (source and destination). The full product is computed as a 128-bit number, and its low 64 bits are stored in the destination. If the product is too big to fit in the 64-bit destination, both the OF and CF flags are set; otherwise both are cleared.
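A C sketch of two-operand imulq follows. It assumes a GCC- or Clang-compatible compiler, because it relies on the non-standard `__int128` type. The full product is computed in 128 bits and the destination receives the low 64 bits. The returned value, which both OF and CF would take, is 1 exactly when the product does not fit in a signed 64-bit value. `imulq_model` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Models "imulq src, dst": *dst receives the low 64 bits of the signed
   product; the return value is what OF and CF would both be set to. */
int imulq_model(int64_t src, int64_t *dst) {
    __int128 full = (__int128)src * (__int128)*dst;  /* exact product */
    /* keep the low 64 bits; the final cast wraps on GCC/Clang */
    *dst = (int64_t)(uint64_t)full;
    return full < INT64_MIN || full > INT64_MAX;
}
```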

Note

The register cl, which holds the shift count for the shift instructions, is simply the lowest byte of the rcx register. We don't have direct access to it in the fragment of x86-64 we use in this course (the one described in this document), but it can be set indirectly by setting rcx accordingly.
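For 64-bit shifts, the processor uses only the low 6 bits of the count, whether it comes from an immediate or from cl. The effective count is therefore the count modulo 64. The C sketch below models shlq, shrq, and sarq with that masking, using hypothetical function names. sarq fills the vacated bits with copies of the sign bit, and shrq fills them with zeros.

```c
#include <assert.h>
#include <stdint.h>

/* Only the low 6 bits of the count are used for 64-bit operands. */
uint64_t shlq_model(uint64_t x, unsigned count) {
    return x << (count & 63);
}

/* Logical shift right: vacated high bits are filled with 0. */
uint64_t shrq_model(uint64_t x, unsigned count) {
    return x >> (count & 63);
}

/* Arithmetic shift right: vacated high bits are copies of the sign bit.
   Written without right-shifting a negative number, which C leaves
   implementation-defined. */
int64_t sarq_model(int64_t x, unsigned count) {
    count &= 63;
    if (x >= 0) {
        return x >> count;
    }
    return ~(~x >> count);     /* ~x is non-negative when x < 0 */
}
```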

Stack management

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| pushq | Push value onto the stack | - | Register, Memory | Stack |
| popq | Pop value from the stack | - | Stack | Register, Memory |

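Call and return

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| retq | Return from function: pop the return address from the stack and jump to it | - | - | - |
| callq | Call a function at destination: push the return address onto the stack and jump to the destination | - | - | Memory address |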

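Jumps

The conditional jumps test the flags set by a preceding instruction, typically cmpq. Because of the AT&T operand order, after cmpq %rbx, %rax the jump jg is taken if rax > rbx. The conditions of jg, jge, jl, and jle compare the operands as signed numbers.

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| jmp | Unconditionally jump to destination | - | - | Memory address |
| je | Jump if equal (ZF=1) | - | - | Memory address |
| jne | Jump if not equal (ZF=0) | - | - | Memory address |
| jg | Jump if greater, signed (ZF=0 and SF=OF) | - | - | Memory address |
| jge | Jump if greater or equal, signed (SF=OF) | - | - | Memory address |
| jl | Jump if less, signed (SF!=OF) | - | - | Memory address |
| jle | Jump if less or equal, signed (ZF=1 or SF!=OF) | - | - | Memory address |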
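Why these flag combinations amount to signed comparisons can be checked with a small C model. `cmpq src, dst` computes dst minus src and sets the flags as subq would, and each jump condition is then evaluated on those flags. `CmpFlags`, `cmpq_flags`, and the `cond_*` helpers are hypothetical names. The test exhaustively checks edge values, including the cases where the subtraction overflows.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int zf, sf, of; } CmpFlags;

/* Flags set by "cmpq src, dst", i.e., by computing dst - src. */
CmpFlags cmpq_flags(int64_t src, int64_t dst) {
    uint64_t a = (uint64_t)dst, b = (uint64_t)src;
    uint64_t r = a - b;                 /* wraps mod 2^64, no overflow UB */
    CmpFlags f;
    f.zf = (r == 0);
    f.sf = (int)(r >> 63);
    /* signed overflow in a - b: operands differ in sign and the
       result's sign differs from a's */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 63);
    return f;
}

int cond_e(CmpFlags f)  { return f.zf; }                   /* je  */
int cond_g(CmpFlags f)  { return !f.zf && f.sf == f.of; }  /* jg  */
int cond_ge(CmpFlags f) { return f.sf == f.of; }           /* jge */
int cond_l(CmpFlags f)  { return f.sf != f.of; }           /* jl  */
int cond_le(CmpFlags f) { return f.zf || f.sf != f.of; }   /* jle */
```

For example, cmpq $1, dst with dst = \(-2^{63}\) wraps to a positive result (SF=0) but sets OF=1, so SF!=OF correctly reports "less".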

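Setting memory conditionally

These instructions test the same flag conditions as the corresponding conditional jumps. They write a single byte, so the other bytes of a 64-bit memory slot keep their previous values.

| Instruction | Description | Affected Flags (OF, SF, ZF, CF, PF) | Source Operand(s) | Destination Operand |
|---|---|---|---|---|
| sete | Set byte to 1 if equal (ZF=1), otherwise to 0 | - | - | Memory |
| setne | Set byte to 1 if not equal (ZF=0), otherwise to 0 | - | - | Memory |
| setg | Set byte to 1 if greater, signed (ZF=0 and SF=OF), otherwise to 0 | - | - | Memory |
| setge | Set byte to 1 if greater or equal, signed (SF=OF), otherwise to 0 | - | - | Memory |
| setl | Set byte to 1 if less, signed (SF!=OF), otherwise to 0 | - | - | Memory |
| setle | Set byte to 1 if less or equal, signed (ZF=1 or SF!=OF), otherwise to 0 | - | - | Memory |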

Instruction operands

  • We write register operands with a preceding %, e.g., xorq %rax, %rax.

  • We write immediate integer operands with a preceding $, e.g., $10.

  • Labels (standing for the memory addresses they refer to) can be used directly as memory operands; however, rip-relative addressing (see below) is the preferred way of referring to labels.

  • We write indirect addresses using parentheses, e.g., (%rax) for the memory location whose address is in the register rax. When used as the operand of a jump, such an address must also be preceded by an asterisk, e.g., jmp *(%rax), which jumps to the address stored at that memory location.

  • We write relative indirect addresses with an offset preceding the parentheses, e.g., 10(%rax) for the memory location whose address is 10 bytes after the address stored in the register rax. When used as the operand of a jump, such an address must also be preceded by an asterisk, e.g., jmp *10(%rax).

Conditional jumps do not support indirect addressing. That is, while jmp *10(%rax) is a valid instruction, je *10(%rax) is not. Such (relative) indirect conditional jumps must be encoded manually by using the opposite condition to skip over an unconditional indirect jump. That is, instead of je *10(%rax), one should write:

# code before jump
jne after_jump     # condition false: skip the indirect jump
jmp *10(%rax)      # condition true: jump to the address stored at 10(%rax)
after_jump:
# code after jump

Tip

x86-64 uses so-called rip-relative addressing to make code relocatable. That is, to move data from a label abc into rax, one would write movq abc(%rip), %rax instead of movq abc, %rax. Both have the same effect, but the first is encoded as an offset from the instruction pointer, so the code is relocatable, i.e., it works correctly regardless of where in memory it is loaded. Rip-relative addressing is needed only for accessing data, not for jumps: jumps to labels are automatically encoded as relative.

Sections in assembly code

Assembly programs are divided into so-called sections: data section(s) and code section(s). These are introduced by the .data and .text directives in the assembly program.

Labels

Points in the code or data section can be marked with a label by including a line label: before the declaration or code being labeled.

We write .global label (on a separate line) to export a label (code or data) so that it can be referred to from other modules (other source files) that are linked with the code. For example, to export the function main, we would write:

.text
.global main
main:
  movq $0, %rax
  retq

Storing constants

  • Constant 64-bit integers can be stored in the data section by writing .quad n, where \(n\) is the constant integer to be stored.

  • Constant strings can be stored in the data section by writing .asciiz "str" to store the string “str”.

These constants can be accessed through the labels preceding them, e.g., movq integer1(%rip), %rax loads 123456 into rax:

.data
string1:
.asciiz "abcd"

integer1:
.quad 123456

Calling convention (System V ABI)

We use the System V ABI calling convention for compatibility with Linux and macOS, and with the C programming language on these operating systems. See the System V ABI for details here: x86-64-abi-20210928.pdf.

Passing arguments

When calling a function, the first six (integer or pointer) arguments are passed in registers: %rdi, %rsi, %rdx, %rcx, %r8, %r9. Any remaining arguments are passed on the stack, pushed from right to left. That is, the last argument of the function is pushed on the stack first, then the one before last, and so on, until finally the 7th argument is pushed, right before calling the target function.
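The placement rule can be summarized by a small C function. Given the i-th argument (counting from 1), it reports where the callee finds that argument. This is a sketch under stated assumptions. All arguments are 8-byte integers or pointers, and `arg_location` is a hypothetical helper. The stack offsets also assume the callee has already run the prologue from the section on function prologue and epilogue (pushq %rbp followed by movq %rsp, %rbp). At that point 0(%rbp) holds the saved rbp, 8(%rbp) holds the return address, and the 7th argument starts at 16(%rbp).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes where the i-th (1-based) integer/pointer argument lives into buf:
   a register for the first six, otherwise an rbp-relative stack slot
   (valid after "pushq %rbp; movq %rsp, %rbp" in the callee). */
void arg_location(int i, char *buf, size_t size) {
    static const char *regs[6] = {"%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"};
    assert(i >= 1);
    if (i <= 6) {
        snprintf(buf, size, "%s", regs[i - 1]);
    } else {
        /* 0(%rbp): saved rbp, 8(%rbp): return address, 16(%rbp): 7th arg */
        snprintf(buf, size, "%d(%%rbp)", 16 + 8 * (i - 7));
    }
}
```

Because the 7th argument was pushed last, it sits at the lowest stack address, directly above the return address. The remaining stack arguments follow at increasing addresses.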

Callee versus caller saved registers

Each function must preserve all callee-saved registers (those marked as preserved across calls in the table of registers above, e.g., rbp). Any such register that the function modifies must be saved on the stack as part of the function prologue, i.e., the code that runs at the very beginning of the function, and restored before returning. The only exception is the rsp register itself, which is restored by the return instruction: retq pops the return address, leaving rsp exactly where it was before the call. (Note that the function must ensure that rsp is exactly as it was at the beginning of the call before it can safely execute the return instruction, because the return address is read from the stack.) All other registers are caller-saved: any call may overwrite them, so a caller that still needs their values after the call must save them itself.

Function prologue and epilogue

A typical simple function prologue and epilogue can be as follows. Here we assume that rbp is the only callee-saved register the function modifies.

function prologue

pushq %rbp        # save the caller's base pointer on the stack
movq %rsp, %rbp   # set our base pointer to the current stack pointer;
                  # this is useful to be able to restore rsp later and
                  # as an anchor for referring to variables on the stack
subq $112, %rsp   # reserve memory on the stack for the function's local
                  # variables (112 = 14 * 8)

function epilogue

movq %rbp, %rsp  # restore rsp to where it was right after pushing
                 # the caller's %rbp
popq %rbp        # restore rbp to the caller's value; the stack pointer
                 # is now exactly where it was on entry to the function,
                 # i.e., pointing at the return address
retq             # return to the caller

Stack alignment

The System V ABI calling convention mandates that the stack pointer (rsp) be 16-byte aligned at every function call, i.e., immediately before the call instruction executes. That is, the numeric value of rsp must be divisible by 16 at that point. The best way to ensure this is to pre-allocate all the stack space the function needs in the prologue and make sure the resulting rsp is 16-byte aligned. The function prologue above does ensure that the stack is 16-byte aligned. It is a good exercise to convince yourself that this is indeed the case.
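The alignment rule reduces to arithmetic. Immediately before a call instruction, rsp is a multiple of 16. The call pushes an 8-byte return address, so on entry rsp is 8 more than a multiple of 16. The C sketch below uses a hypothetical helper, `frame_size`, to round the space for local variables up so that rsp is again a multiple of 16 after the prologue. It accounts for the return address, the 8-byte registers pushed in the prologue, and the subq. It assumes the prologue pushes only 8-byte registers and then reserves everything else with a single subq.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to subtract from rsp in the prologue so that rsp is 16-byte
   aligned afterwards, given the bytes of locals needed and how many
   8-byte registers the prologue pushes (e.g., 1 for pushq %rbp). */
size_t frame_size(size_t locals, size_t pushed_regs) {
    /* 8 bytes of return address + 8 bytes per pushed register */
    size_t used = 8 + 8 * pushed_regs;
    size_t total = used + locals;
    size_t padded = (total + 15) / 16 * 16;  /* round up to a multiple of 16 */
    return padded - used;
}
```

With this helper, a function that needs 100 bytes of locals and pushes only rbp would reserve 112 bytes rather than 100.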