Last Updated on March 16, 2026 by Vivekanand

In Part 1 of this series, we explored how to make syscalls directly to the kernel. But most real programs rarely call syscalls directly — instead, they call functions, which may themselves invoke syscalls deep in the call stack.
When one function calls another, both must agree on a contract: Where do arguments go? Which registers can I clobber? Who cleans up the stack? This contract is called a calling convention, and it varies dramatically between platforms. Understanding calling conventions is essential for any low-level programmer.
This comprehensive guide covers the three major calling conventions you’ll encounter when writing cross-platform assembly:
| Convention | Platforms | Architecture |
|---|---|---|
| System V AMD64 ABI | Linux, macOS, FreeBSD | x86-64 |
| Microsoft x64 | Windows | x86-64 |
| AAPCS64 | Linux, macOS, Windows | ARM64 |
By the end, you’ll understand how to write assembly functions that play nicely with C code — and debug mysterious crashes caused by calling convention violations.
What Calling Conventions Define
A calling convention specifies the rules that both the caller and callee must follow:
- Parameter passing — Which registers hold arguments, and in what order
- Return values — Where the result goes
- Register preservation — Which registers must be saved/restored by the callee
- Stack layout — Alignment requirements, shadow space, red zones
- Cleanup responsibility — Who adjusts the stack pointer after the call
Understanding these calling convention rules is essential for:
- Writing assembly functions callable from C
- Calling C library functions from assembly
- Debugging crashes in optimized code
- Reverse engineering compiled binaries
System V AMD64 ABI (Linux & macOS x86-64)
The System V AMD64 ABI is the standard calling convention on Unix-like x86-64 systems. It passes up to six integer and eight floating-point arguments in registers, which reduces stack traffic.
Integer and Pointer Arguments
The first six integer/pointer arguments go in registers, in this order:
| Argument | Register |
|---|---|
| 1st | rdi |
| 2nd | rsi |
| 3rd | rdx |
| 4th | rcx |
| 5th | r8 |
| 6th | r9 |
Arguments beyond the sixth are pushed onto the stack in reverse order (right-to-left), so the 7th argument is at the lowest address.
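To make the register order and the stack layout concrete, here is a minimal C sketch that maps an argument position to its location. The helper names `sysv_int_arg_reg` and `sysv_stack_arg_offset` are hypothetical, and the sketch assumes every argument is an integer or pointer (no floating-point or by-value struct arguments mixed in).

```c
#include <stddef.h>

/* Hypothetical helper: register that carries the n-th (1-based) integer or
   pointer argument under System V AMD64, or NULL if it goes on the stack. */
const char *sysv_int_arg_reg(int n)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };

    if (n >= 1 && n <= 6)
        return regs[n - 1];
    return NULL;
}

/* Hypothetical helper: byte offset from rsp, at function entry, of the n-th
   integer argument when n > 6. [rsp] holds the return address, so the 7th
   argument sits at [rsp + 8], the 8th at [rsp + 16], and so on.
   Returns -1 for arguments that travel in registers. */
long sysv_stack_arg_offset(int n)
{
    if (n <= 6)
        return -1;
    return 8L * (n - 6);
}
```

For example, `sysv_int_arg_reg(4)` yields `"rcx"`, while `sysv_stack_arg_offset(7)` yields 8, which matches the 7th argument being at the lowest address.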
Floating-Point Arguments
The first eight floating-point arguments use vector registers:
| Argument | Register |
|---|---|
| 1st-8th | xmm0 – xmm7 |
For variadic functions (like printf), the caller must set al to an upper bound on the number of vector registers used; in practice this is usually the exact count.
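A small variadic function is an easy way to see this rule in compiler output. The sketch below is illustrative only; `sum_doubles` is a made-up name, not part of any library.

```c
#include <stdarg.h>

/* Sums `count` double arguments. Because the function is variadic, a
   System V caller loads al before the call so the callee knows whether
   it needs to save the vector registers. */
double sum_doubles(int count, ...)
{
    va_list args;
    double total = 0.0;

    va_start(args, count);
    for (int i = 0; i < count; i++)
        total += va_arg(args, double);  /* doubles arrive in xmm0-xmm7 first */
    va_end(args);

    return total;
}
```

If you compile a call such as `sum_doubles(3, 1.0, 2.0, 3.5)` for Linux x86-64 and inspect the assembly, you should typically see eax being set to 3 just before the call, though the exact instruction depends on the compiler and flags.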
Return Values
| Type | Location |
|---|---|
| Integer/pointer (≤64 bits) | rax |
| Integer (128 bits) | rax (low), rdx (high) |
| Floating-point | xmm0 |
| Struct (small) | rax/rdx or xmm0/xmm1 |
| Struct (large) | Caller passes hidden pointer in rdi |
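The small/large struct distinction is easiest to see with two concrete types. The following C sketch uses made-up names (`struct pair`, `struct quad`, `make_pair`, `make_quad`); the comments describe how a System V compiler is expected to return each one.

```c
#include <stdint.h>

/* 16 bytes of integer fields: small enough to come back in rax (lo) and rdx (hi). */
struct pair {
    int64_t lo;
    int64_t hi;
};

/* 32 bytes: too large for registers, so the caller passes a hidden
   pointer in rdi and the callee writes the result through it. */
struct quad {
    int64_t v[4];
};

struct pair make_pair(int64_t lo, int64_t hi)
{
    struct pair p = { lo, hi };
    return p;
}

/* Because rdi carries the hidden pointer, `seed` moves to rsi. */
struct quad make_quad(int64_t seed)
{
    struct quad q = { { seed, seed + 1, seed + 2, seed + 3 } };
    return q;
}
```

Note the side effect for `make_quad`: the hidden pointer consumes the first argument register, so every visible argument shifts down by one.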
Register Preservation
This is critical: if you clobber a callee-saved register without restoring it, you’ll corrupt the caller’s state.
| Caller-saved (volatile) | Callee-saved (non-volatile) |
|---|---|
| rax, rcx, rdx, rsi, rdi, r8–r11 | rbx, rbp, r12–r15, rsp |
Callee-saved means: if your function uses these registers, you must save them at entry and restore them before returning.
Stack Requirements
- 16-byte alignment: The stack pointer (rsp) must be 16-byte aligned before the call instruction. Since call pushes an 8-byte return address, rsp will be misaligned by 8 bytes at function entry. Prologues typically subtract 8 (or 8 + 16n) to restore alignment.
- Red zone: Leaf functions (functions that don’t call other functions) can use 128 bytes below rsp without adjusting the stack pointer. This optimization avoids prologue/epilogue overhead for simple functions.
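The alignment arithmetic is easy to get wrong once pushes and locals are involved, so here is a minimal sketch of it in C. The helper `sysv_frame_adjust` is hypothetical and assumes every pushed register is 8 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: bytes a System V prologue should subtract from rsp
   so that rsp is 16-byte aligned again, given how many 8-byte registers
   were already pushed and how many bytes of locals are needed. */
size_t sysv_frame_adjust(size_t pushed_regs, size_t local_bytes)
{
    /* At entry rsp is 8 bytes past a 16-byte boundary (the return address);
       each push moves it another 8 bytes. */
    size_t used = 8 + 8 * pushed_regs;

    /* Round the whole frame up to a multiple of 16, then remove what is
       already on the stack to get the sub amount. */
    size_t total = (used + local_bytes + 15) & ~(size_t)15;
    return total - used;
}
```

With no pushes and no locals the result is 8, matching the “subtract 8” case above. After a single push (for example push rbp) and no locals the result is 0, because the push itself restores alignment.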
Code Example
# Linux x86-64: add two numbers, print result
# Assemble: cc -o example example.s
# Run: ./example
.intel_syntax noprefix
.section .data
fmt: .asciz "Sum of %d and %d is %d\n"
.section .text
.global main
# int add_nums(int a, int b)
# Arguments: edi = a, esi = b
# Returns: eax = a + b
add_nums:
lea eax, [rdi + rsi] # eax = a + b
ret
main:
push rbp # Also realigns rsp to 16 bytes
mov rbp, rsp
# Call add_nums(5, 3)
mov edi, 5 # First argument
mov esi, 3 # Second argument
call add_nums # Result in eax
# Call printf(fmt, 5, 3, result)
mov ecx, eax # 4th arg: result
mov edx, 3 # 3rd arg: second number
mov esi, 5 # 2nd arg: first number
lea rdi, [rip + fmt] # 1st arg: format string
xor eax, eax # No vector arguments
call printf
xor eax, eax # Return 0
pop rbp
ret
Note the xor eax, eax before calling printf — this tells the variadic function that we’re not passing any floating-point arguments in vector registers. This is a key detail of the System V calling convention.
Microsoft x64 Calling Convention (Windows)
Windows uses a distinctly different calling convention that prioritizes simplicity and debugging over raw performance. The Microsoft x64 calling convention is well-documented and consistent across all Windows applications.
Key Differences from System V
| Aspect | System V | Windows x64 |
|---|---|---|
| Register arguments | 6 | 4 |
| Shadow space | None | 32 bytes required |
| Red zone | 128 bytes | None |
| Variadic handling | al = vector count | No count; floating-point args are duplicated in the matching integer register |
Argument Registers
Only four registers are used for arguments in the Windows calling convention:
| Argument | Integer/Pointer | Floating Point |
|---|---|---|
| 1st | rcx | xmm0 |
| 2nd | rdx | xmm1 |
| 3rd | r8 | xmm2 |
| 4th | r9 | xmm3 |
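Register slots are positional: the argument in position N uses the Nth integer register or the Nth floating-point register depending on its type, and the other register of that slot goes unused. Arguments 5+ go on the stack, and the caller must always allocate 32 bytes of “shadow space” before the call, even if the callee takes fewer than 4 args.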
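The positional slots and the effect of shadow space on stack offsets can be sketched in a few lines of C. The helper names `win64_arg_reg` and `win64_stack_arg_offset` are hypothetical, and the sketch assumes every argument fits in 8 bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: register for the argument at 1-based `position`
   under Microsoft x64. The slot is chosen by position, not by counting
   each type separately, so f(int a, double b) uses rcx and xmm1.
   Returns NULL when the argument is passed on the stack. */
const char *win64_arg_reg(int position, bool is_float)
{
    static const char *const int_regs[] = { "rcx", "rdx", "r8", "r9" };
    static const char *const fp_regs[]  = { "xmm0", "xmm1", "xmm2", "xmm3" };

    if (position < 1 || position > 4)
        return NULL;
    return is_float ? fp_regs[position - 1] : int_regs[position - 1];
}

/* Hypothetical helper: offset from rsp, at function entry, of a stack
   argument (position >= 5). The 8-byte return address and the 32 bytes
   of shadow space come first. Returns -1 for register arguments. */
long win64_stack_arg_offset(int position)
{
    if (position < 5)
        return -1;
    return 8L + 32L + 8L * (position - 5);
}
```

So the 5th argument is found at [rsp + 40] on entry, not at [rsp + 8] as a System V habit might suggest.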
Code Example (Windows)
; Windows x64: add two numbers, print result
; Assemble: ml64 /c example.asm
; Link: link /subsystem:console example.obj msvcrt.lib
.data
fmt db "Sum of %d and %d is %d", 10, 0
.code
extern printf:proc
add_nums proc
lea eax, [rcx + rdx] ; First two args in rcx, rdx
ret
add_nums endp
main proc
sub rsp, 40 ; 32 shadow + 8 alignment
; Call add_nums(5, 3)
mov ecx, 5
mov edx, 3
call add_nums
; Call printf(fmt, 5, 3, result)
mov r9d, eax ; 4th arg
mov r8d, 3 ; 3rd arg
mov edx, 5 ; 2nd arg
lea rcx, fmt ; 1st arg
call printf
xor eax, eax
add rsp, 40
ret
main endp
end
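The `sub rsp, 40` in main is the smallest case of a general calculation. As a rough sketch, the hypothetical helper `win64_frame_reserve` below computes the amount for a function that pushes no registers; it assumes all outgoing arguments occupy 8-byte slots.

```c
#include <stddef.h>

/* Hypothetical helper: bytes to subtract from rsp in a Windows x64
   function that pushes no registers, needs `local_bytes` of locals, and
   calls functions taking at most `max_call_args` arguments. */
size_t win64_frame_reserve(size_t max_call_args, size_t local_bytes)
{
    /* Every call needs at least four 8-byte slots (the shadow space),
       even when the callee takes fewer arguments. */
    size_t slots = max_call_args > 4 ? max_call_args : 4;
    size_t needed = 8 * slots + local_bytes;

    /* rsp is 8 bytes past a 16-byte boundary at entry, so the reserved
       amount plus 8 must be a multiple of 16. */
    return ((needed + 8 + 15) & ~(size_t)15) - 8;
}
```

For the example above (`printf` with four arguments, no locals) this gives 40: 32 bytes of shadow space plus 8 bytes of alignment padding.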
AAPCS64 (ARM64 – All Platforms)
ARM64 uses the AAPCS64 calling convention across Linux, macOS, and Windows. It’s elegant and uses 8 registers for arguments.
Note on Apple ARM64 (macOS) deviations: While Apple Silicon Macs follow AAPCS64 for the majority of cases, Apple’s ABI has some documented deviations from the standard AAPCS64 spec. Most notably, the rules for passing small structs and for variadic functions (including the va_list layout) differ from Linux ARM64. On Apple platforms, for example, variadic arguments are passed on the stack rather than in registers, so a printf call written for Linux ARM64 does not carry over unchanged. For general integer/pointer arguments to non-variadic functions, the conventions are identical. See Apple’s ARM64 ABI documentation for details.
Argument Passing
| Argument | Register |
|---|---|
| 1st-8th integer | x0–x7 |
| 1st-8th float | v0–v7 |
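Each x register also has a 32-bit view named w, which is why 32-bit int arguments appear as w0, w1, and so on in ARM64 code. The following C sketch shows the naming rule; `aapcs64_int_arg_reg` is a hypothetical helper that only handles 32-bit and 64-bit integer arguments.

```c
#include <stdio.h>

/* Hypothetical helper: writes the AAPCS64 register name for the integer
   argument at 1-based `position` into `out`: "w3" for a 32-bit value,
   "x3" for a 64-bit value or pointer. Returns 0 on success, or -1 if the
   argument goes on the stack or the width is not 32 or 64. */
int aapcs64_int_arg_reg(int position, int bits, char *out, size_t out_size)
{
    if (position < 1 || position > 8)
        return -1;
    if (bits != 32 && bits != 64)
        return -1;

    /* Argument N lives in register number N - 1. */
    snprintf(out, out_size, "%c%d", bits == 32 ? 'w' : 'x', position - 1);
    return 0;
}
```

Asking for position 4 with 32 bits fills the buffer with `w3`, and the same position with 64 bits gives `x3`: the same physical register viewed at two widths.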
Register Preservation
| Caller-saved | Callee-saved |
|---|---|
| x0–x17 (x18 is the platform register; avoid it in portable code) | x19–x28, x29 (fp), x30 (lr), sp |
Code Example (ARM64)
// ARM64 Linux: add two numbers, print result
// Assemble: cc -o example example.s
// Run: ./example
.section .data
fmt: .asciz "Sum of %d and %d is %d\n"
.section .text
.global main
add_nums:
add w0, w0, w1 // w0 = w0 + w1
ret
main:
stp x29, x30, [sp, #-16]!
mov x29, sp
// Call add_nums(5, 3)
mov w0, #5
mov w1, #3
bl add_nums
// Call printf(fmt, 5, 3, result)
mov w3, w0 // 4th arg
mov w2, #3 // 3rd arg
mov w1, #5 // 2nd arg
adrp x0, fmt
add x0, x0, :lo12:fmt
bl printf
mov w0, #0
ldp x29, x30, [sp], #16
ret
Side-by-Side Comparison
Here’s how the three major calling conventions compare:
| Feature | System V AMD64 | Windows x64 | AAPCS64 |
|---|---|---|---|
| Integer args | 6 regs | 4 regs | 8 regs |
| Float args | 8 regs | 4 regs | 8 regs |
| Red zone | 128 bytes | None | None |
| Shadow space | None | 32 bytes | None |
| Stack alignment | 16 bytes | 16 bytes | 16 bytes |
| Return value | rax | rax | x0 |
Stack Frame Visualization
Understanding calling conventions becomes easier when you visualize the stack frame structure for each ABI:
System V AMD64 Stack Frame
graph TB
subgraph "Stack Frame (System V AMD64)"
A["Previous Frame"] --> B["Return Address (8 bytes)"]
B --> C["Saved RBP (8 bytes)"]
C --> D["Local Variables"]
D --> E["Red Zone (128 bytes) - Optional"]
end
Windows x64 Stack Frame
Note on shadow space placement: The caller allocates the 32-byte shadow space at the bottom of its own frame before executing the call instruction, so at function entry it sits immediately above the return address, at [rsp + 8] through [rsp + 39]. It belongs to the caller’s frame, not the callee’s. The diagram below is a simplified view from inside the callee and does not reflect that address order; in practice the callee may use the shadow space to spill its register arguments.
graph TB
subgraph "Stack Frame (Windows x64)"
A["Previous Frame"] --> B["Return Address (8 bytes)"]
B --> C["Shadow Space (32 bytes) — allocated by caller"]
C --> D["Saved RBP (8 bytes)"]
D --> E["Local Variables"]
end
ARM64 Stack Frame
graph TB
subgraph "Stack Frame (ARM64)"
A["Previous Frame"] --> B["Saved x29/x30 (16 bytes)"]
B --> C["Local Variables"]
C --> D["Args beyond x7"]
end
Common Pitfalls and Debugging Tips
These are the most common calling convention mistakes that trip up assembly programmers:
1. Forgetting Shadow Space (Windows)
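Windows requires 32 bytes of shadow space even for functions with fewer than 4 arguments. The callee is free to spill its register arguments there, so if you skip the allocation it overwrites whatever the caller kept in those 32 bytes, corrupting the stack.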
2. Stack Misalignment
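All three calling conventions require 16-byte alignment before call. On x86-64, misalignment typically crashes on aligned SSE/AVX loads and stores such as movaps, often deep inside a library function rather than at the call site.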
3. Clobbering Callee-Saved Registers
If you use a callee-saved register without saving/restoring it, you’ll corrupt the caller’s state. The failure often shows up far from the offending function, because the caller only notices when it next reads the register.
What’s Next?
Now you understand how calling conventions define function communication across the call boundary. In Part 3, we’ll explore program startup: what happens before main() runs, how command-line arguments are passed, and the role of the C runtime.
Experiment: Try the code examples on Godbolt with different compilers and optimization levels. Watch how the generated code follows these calling conventions.

