x86-64 & ARM64 Calling Conventions Demystified

Learn x86-64 and ARM64 calling conventions across Linux, macOS, and Windows. Master parameter passing, stack frames, and register preservation with hands-on assembly examples.

Last Updated on March 16, 2026 by Vivekanand

Calling conventions define how functions communicate across different CPU architectures

In Part 1 of this series, we explored how to make syscalls directly to the kernel. But most real programs rarely call syscalls directly — instead, they call functions, which may themselves invoke syscalls deep in the call stack.

When one function calls another, both must agree on a contract: Where do arguments go? Which registers can I clobber? Who cleans up the stack? This contract is called a calling convention, and it varies dramatically between platforms. Understanding calling conventions is essential for any low-level programmer.

This comprehensive guide covers the three major calling conventions you’ll encounter when writing cross-platform assembly:

| Convention | Platforms | Architecture |
|---|---|---|
| System V AMD64 ABI | Linux, macOS, FreeBSD | x86-64 |
| Microsoft x64 | Windows | x86-64 |
| AAPCS64 | Linux, macOS, Windows | ARM64 |

By the end, you’ll understand how to write assembly functions that play nicely with C code — and debug mysterious crashes caused by calling convention violations.


What Calling Conventions Define

A calling convention specifies the rules that both the caller and callee must follow:

  1. Parameter passing — Which registers hold arguments, and in what order
  2. Return values — Where the result goes
  3. Register preservation — Which registers must be saved/restored by the callee
  4. Stack layout — Alignment requirements, shadow space, red zones
  5. Cleanup responsibility — Who adjusts the stack pointer after the call

Understanding these calling convention rules is essential for:

  • Writing assembly functions callable from C
  • Calling C library functions from assembly
  • Debugging crashes in optimized code
  • Reverse engineering compiled binaries

System V AMD64 ABI (Linux & macOS x86-64)

The System V AMD64 ABI is the standard calling convention on x86-64 Unix-like systems, including Linux, macOS, and FreeBSD. It passes up to six integer and eight floating-point arguments in registers, which keeps most calls off the stack.

Integer and Pointer Arguments

The first six integer/pointer arguments go in registers, in this order:

| Argument | Register |
|---|---|
| 1st | rdi |
| 2nd | rsi |
| 3rd | rdx |
| 4th | rcx |
| 5th | r8 |
| 6th | r9 |

Arguments beyond the sixth are pushed onto the stack in reverse order (right-to-left), so the 7th argument is at the lowest address.
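The register order and the spill rule can be captured in a small lookup. The C sketch below is only a model for reasoning about argument placement, not part of any ABI tooling. The helper names `sysv_int_arg_location` and `sysv_stack_arg_offset` are hypothetical, and the sketch assumes every argument is an integer or pointer that occupies one 8-byte slot.

```c
#include <assert.h>

/* Registers for the first six integer/pointer arguments (System V AMD64). */
static const char *const SYSV_INT_REGS[6] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};

/* Hypothetical helper: location of the n-th (1-based) integer argument. */
const char *sysv_int_arg_location(int n) {
    assert(n >= 1);
    if (n <= 6) return SYSV_INT_REGS[n - 1];
    return "stack";
}

/* Offset from rsp at callee entry, where [rsp] holds the return address.
   Returns -1 for arguments that travel in registers. */
long sysv_stack_arg_offset(int n) {
    assert(n >= 1);
    if (n <= 6) return -1;
    return 8 + 8L * (n - 7);   /* 7th arg at [rsp+8], 8th at [rsp+16], ... */
}
```

Under these assumptions, the 7th argument is found at `[rsp+8]` on entry to the callee, which is the lowest-addressed stack argument.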

Floating-Point Arguments

The first eight floating-point arguments use vector registers:

| Argument | Register |
|---|---|
| 1st–8th | xmm0–xmm7 |

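For variadic functions (like printf), the caller must set al to the number of vector registers used for arguments. The ABI only requires an upper bound, so any value from the actual count up to 8 is valid.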
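To see why the callee needs that hint, consider a variadic function written in C. This is a minimal sketch; the function name `average` is made up for illustration. Nothing in its prototype says how many doubles will arrive in vector registers.

```c
#include <assert.h>
#include <stdarg.h>

/* Variadic function: the prototype does not reveal how many doubles the
   caller placed in vector registers, which is why System V callers
   report a count (or upper bound) in al. */
double average(int count, ...) {
    assert(count > 0);
    va_list args;
    va_start(args, count);
    double sum = 0.0;
    for (int i = 0; i < count; i++) {
        sum += va_arg(args, double);   /* each vararg here is a double */
    }
    va_end(args);
    return sum / count;
}
```

For a call such as `average(3, 1.0, 2.0, 6.0)`, a System V compiler places the count in edi, the three doubles in xmm0 through xmm2, and typically emits `mov eax, 3` just before the call.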

Return Values

| Type | Location |
|---|---|
| Integer/pointer (≤64 bits) | rax |
| Integer (128 bits) | rax (low), rdx (high) |
| Floating-point | xmm0 |
| Struct (small) | rax/rdx or xmm0/xmm1 |
| Struct (large) | Caller passes hidden pointer in rdi |
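The two struct rows are easiest to observe by compiling a pair of small C functions and reading the generated assembly. The types and function names below are hypothetical. The comments describe what System V compilers are expected to do for structs of up to 16 bytes versus larger ones.

```c
#include <assert.h>
#include <stdint.h>

/* 16 bytes of integers: System V returns this in rax (lo) and rdx (hi). */
struct Pair { int64_t lo; int64_t hi; };
static_assert(sizeof(struct Pair) == 16, "Pair should fit in two registers");

/* 32 bytes: too large for registers, so the caller passes a hidden
   pointer in rdi and the callee writes the result through it. */
struct Big { int64_t v[4]; };

struct Pair make_pair(int64_t a, int64_t b) {
    struct Pair p = { a, b };
    return p;
}

struct Big make_big(int64_t x) {
    struct Big b = { { x, x + 1, x + 2, x + 3 } };
    return b;
}
```

Because the hidden pointer occupies rdi, the visible argument `x` of `make_big` moves to rsi, which is a common surprise when calling such a function from assembly.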

Register Preservation

This is critical: if you clobber a callee-saved register without restoring it, you’ll corrupt the caller’s state.

| Caller-saved (volatile) | Callee-saved (non-volatile) |
|---|---|
| rax, rcx, rdx, rsi, rdi, r8–r11 | rbx, rbp, r12–r15, rsp |

Callee-saved means: if your function uses these registers, you must save them at entry and restore them before returning.
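When picking a scratch register in hand-written assembly, it can help to have the table in executable form. The lookup below is a small sketch with a hypothetical name, `sysv_must_preserve`, and it covers only the 64-bit general-purpose registers listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Callee-saved general-purpose registers in the System V AMD64 ABI. */
static const char *const SYSV_CALLEE_SAVED[] = {
    "rbx", "rbp", "rsp", "r12", "r13", "r14", "r15"
};

/* Hypothetical helper: must a function preserve this register? */
bool sysv_must_preserve(const char *reg) {
    assert(reg != NULL);
    size_t n = sizeof SYSV_CALLEE_SAVED / sizeof SYSV_CALLEE_SAVED[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(reg, SYSV_CALLEE_SAVED[i]) == 0) return true;
    }
    return false;
}
```

A register for which this returns false, such as r10 or r11, can be used freely inside a function but must be assumed destroyed after any call.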

Stack Requirements

  • 16-byte alignment: The stack pointer (rsp) must be 16-byte aligned before the call instruction. Since call pushes an 8-byte return address, rsp will be misaligned by 8 bytes at function entry. Prologues restore alignment either by pushing an odd number of 8-byte registers (for example, push rbp) or by subtracting 8 (or 8 + 16n) from rsp.
  • Red zone: Leaf functions (functions that don’t call other functions) can use 128 bytes below rsp without adjusting the stack pointer. This optimization avoids prologue/epilogue overhead for simple functions.
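The alignment arithmetic above is easy to get wrong by hand, so here is a worked version in C. It is a sketch under stated assumptions: the helper names are hypothetical, every push is 8 bytes, and the goal is the smallest `sub rsp, N` that fits the locals and leaves rsp 16-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Smallest N for `sub rsp, N` after `pushes` register pushes, such that
   N >= locals and rsp ends up 16-byte aligned. At entry rsp is 8 bytes
   past a 16-byte boundary because of the pushed return address. */
size_t sysv_frame_adjust(size_t locals, size_t pushes) {
    size_t used = 8 + 8 * pushes;                        /* return address + pushes */
    size_t total = (used + locals + 15) & ~(size_t)15;   /* round up to 16 */
    assert(total % 16 == 0 && total - used >= locals);
    return total - used;
}

/* A leaf function can keep up to 128 bytes of locals in the red zone
   without adjusting rsp at all. */
bool fits_in_red_zone(size_t locals, bool is_leaf) {
    return is_leaf && locals <= 128;
}
```

With no pushes and no locals the result is 8, matching the plain "subtract 8" case. With a single `push rbp` and no locals it is 0, because the push alone restores alignment.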

Code Example

# Linux x86-64: add two numbers, print result
# Assemble: cc -o example example.s
# Run: ./example

.intel_syntax noprefix          # Listing uses Intel operand order

.section .data
fmt:    .asciz "Sum of %d and %d is %d\n"

.section .text
.global main

# int add_nums(int a, int b)
# Arguments: edi = a, esi = b
# Returns: eax = a + b
add_nums:
    lea     eax, [rdi + rsi]    # eax = low 32 bits of rdi + rsi
    ret

main:
    push    rbp                 # Save rbp; also realigns rsp to 16 bytes
    mov     rbp, rsp

    # Call add_nums(5, 3)
    mov     edi, 5              # First argument
    mov     esi, 3              # Second argument
    call    add_nums            # Result in eax

    # Call printf(fmt, 5, 3, result)
    mov     ecx, eax            # 4th arg: result
    mov     edx, 3              # 3rd arg: second number
    mov     esi, 5              # 2nd arg: first number
    lea     rdi, [rip + fmt]    # 1st arg: format string
    xor     eax, eax            # al = 0: no vector arguments
    call    printf

    xor     eax, eax            # Return 0
    pop     rbp
    ret

Note the xor eax, eax before calling printf — this tells the variadic function that we’re not passing any floating-point arguments in vector registers. This is a key detail of the System V calling convention.
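For comparison with compiler output, the same logic can be written in C and compiled with `-S`. This is a sketch rather than the article's original source. `format_sum` is a hypothetical helper that uses `snprintf` instead of `printf` so the text can be inspected. Like `printf`, `snprintf` is variadic, so on System V the same `al` rule applies to the call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Same contract as the assembly version: a in edi, b in esi, result in eax. */
int add_nums(int a, int b) {
    return a + b;
}

/* Hypothetical helper: formats the message into buf instead of printing it.
   Returns the number of characters snprintf would have written. */
int format_sum(char *buf, size_t size, int a, int b) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Sum of %d and %d is %d\n",
                    a, b, add_nums(a, b));
}
```

Calling `format_sum(buf, sizeof buf, 5, 3)` fills `buf` with the same line that the assembly program prints.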


Microsoft x64 Calling Convention (Windows)

Windows uses a distinctly different calling convention that prioritizes simplicity and debugging over raw performance. The Microsoft x64 calling convention is documented by Microsoft and is the default for 64-bit Windows code (compilers also offer __vectorcall as an opt-in alternative).

Key Differences from System V

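| Aspect | System V | Windows x64 |
|---|---|---|
| Register arguments | 6 | 4 |
| Shadow space | None | 32 bytes required |
| Red zone | 128 bytes | None |
| Variadic handling | al = vector count | No al count; floating-point varargs are also copied into the matching integer register |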

Argument Registers

Only four registers are used for arguments in the Windows calling convention:

| Argument | Integer/Pointer | Floating Point |
|---|---|---|
| 1st | rcx | xmm0 |
| 2nd | rdx | xmm1 |
| 3rd | r8 | xmm2 |
| 4th | r9 | xmm3 |

Arguments 5+ go on the stack, and the caller must always allocate 32 bytes of “shadow space” even if using fewer than 4 args.
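The amount to subtract from rsp follows mechanically from these rules. The C sketch below works through the arithmetic. The helper names are hypothetical, and it assumes a prologue that pushes no registers and arguments that each occupy one 8-byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes to subtract from rsp in a prologue that pushes nothing, given the
   largest argument count of any call the function makes and its locals. */
size_t win64_frame_size(size_t max_call_args, size_t locals) {
    size_t slots = max_call_args < 4 ? 4 : max_call_args;  /* shadow space covers 4 */
    size_t needed = slots * 8 + locals;
    size_t total = (8 + needed + 15) & ~(size_t)15;         /* 8 = return address */
    assert(total % 16 == 0);
    return total - 8;
}

/* Offset of the n-th (1-based) argument's slot from rsp at callee entry.
   [rsp] is the return address; slots 1 to 4 are the shadow space. */
size_t win64_arg_slot_offset(size_t n) {
    assert(n >= 1);
    return 8 * n;
}
```

For a function whose calls take at most four arguments, this gives 40 bytes: 32 of shadow space plus 8 to restore alignment. The 5th argument of a call is found at `[rsp+40]` on entry to the callee.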

Code Example (Windows)

; Windows x64: add two numbers, print result
; Assemble: ml64 /c example.asm
; Link: link /subsystem:console example.obj msvcrt.lib

.data
fmt     db "Sum of %d and %d is %d", 10, 0

.code
extern printf:proc

add_nums proc
    lea     eax, [rcx + rdx]    ; First two args in rcx, rdx
    ret
add_nums endp

main proc
    sub     rsp, 40             ; 32 shadow + 8 alignment
    
    ; Call add_nums(5, 3)
    mov     ecx, 5
    mov     edx, 3
    call    add_nums
    
    ; Call printf(fmt, 5, 3, result)
    mov     r9d, eax            ; 4th arg
    mov     r8d, 3              ; 3rd arg
    mov     edx, 5              ; 2nd arg
    lea     rcx, fmt            ; 1st arg
    call    printf
    
    xor     eax, eax
    add     rsp, 40
    ret
main endp
end

AAPCS64 (ARM64 – All Platforms)

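ARM64 uses the AAPCS64 calling convention across Linux, macOS, and Windows. It passes the first eight integer or pointer arguments in x0–x7 and the first eight floating-point arguments in v0–v7, so most calls need no stack arguments at all.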

Note on Apple ARM64 (macOS) deviations: While Apple Silicon Macs follow AAPCS64 for the majority of cases, Apple’s ABI has some documented deviations from the standard AAPCS64 spec. Most notably, variadic arguments are always passed on the stack (and va_list is a simple pointer rather than a structure), and some rules for small structs and for packing stack arguments differ from Linux ARM64. Code that mixes Apple ARM64 and standard AAPCS64 assumptions in those areas may behave differently across platforms; in particular, a printf call that passes its variadic arguments in registers, as Linux expects, will not work unchanged on macOS. For non-variadic integer/pointer arguments passed in registers, the conventions are identical. See Apple’s ARM64 ABI documentation for details.

Argument Passing

| Argument | Register |
|---|---|
| 1st–8th integer | x0–x7 |
| 1st–8th float | v0–v7 |

Register Preservation

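| Caller-saved | Callee-saved |
|---|---|
| x0–x18 | x19–x28, x29 (fp), x30 (lr), sp |

x18 is the platform register: it is reserved on macOS and Windows, so avoid it in portable code. The low 64 bits of v8–v15 are also callee-saved.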

Code Example (ARM64)

// ARM64 Linux: add two numbers, print result
// Assemble: cc -o example example.s
// Run: ./example

.section .data
fmt:    .asciz "Sum of %d and %d is %d\n"

.section .text
.global main

add_nums:
    add     w0, w0, w1          // w0 = w0 + w1
    ret

main:
    stp     x29, x30, [sp, #-16]!
    mov     x29, sp
    
    // Call add_nums(5, 3)
    mov     w0, #5
    mov     w1, #3
    bl      add_nums
    
    // Call printf(fmt, 5, 3, result)
    mov     w3, w0              // 4th arg
    mov     w2, #3              // 3rd arg
    mov     w1, #5              // 2nd arg
    adrp    x0, fmt
    add     x0, x0, :lo12:fmt
    bl      printf
    
    mov     w0, #0
    ldp     x29, x30, [sp], #16
    ret

Side-by-Side Comparison

Here’s how the three major calling conventions compare:

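| Feature | System V AMD64 | Windows x64 | AAPCS64 |
|---|---|---|---|
| Integer args | 6 regs | 4 regs | 8 regs |
| Float args | 8 regs | 4 regs | 8 regs |
| Red zone | 128 bytes | None | None |
| Shadow space | None | 32 bytes | None |
| Stack alignment | 16 bytes | 16 bytes | 16 bytes |
| Return value | rax | rax | x0 |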
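The comparison can also be encoded as data, which makes it easy to ask how a given signature behaves under each ABI. The struct and function below are hypothetical names for a sketch. It only models integer arguments and ignores the Windows rule that integer and floating-point arguments share the same four positional slots.

```c
#include <assert.h>

struct AbiSummary {
    const char *name;
    int int_arg_regs;
    int float_arg_regs;
    int red_zone_bytes;
    int shadow_space_bytes;
    int stack_alignment;
    const char *int_return_reg;
};

enum { SYSV = 0, WIN64 = 1, AAPCS64 = 2 };

/* Values copied from the comparison table above. */
static const struct AbiSummary ABIS[] = {
    { "System V AMD64", 6, 8, 128,  0, 16, "rax" },
    { "Windows x64",    4, 4,   0, 32, 16, "rax" },
    { "AAPCS64",        8, 8,   0,  0, 16, "x0"  },
};

/* How many of n integer arguments end up on the stack under this ABI? */
int int_args_on_stack(int abi, int n) {
    assert(abi >= SYSV && abi <= AAPCS64 && n >= 0);
    int regs = ABIS[abi].int_arg_regs;
    return n > regs ? n - regs : 0;
}
```

A function taking eight integer arguments, for example, needs two stack slots on System V, four on Windows x64, and none on AAPCS64.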

Stack Frame Visualization

Understanding calling conventions becomes easier when you visualize the stack frame structure for each ABI:

System V AMD64 Stack Frame

graph TB
    subgraph "Stack Frame (System V AMD64)"
        A["Previous Frame"] --> B["Return Address (8 bytes)"]
        B --> C["Saved RBP (8 bytes)"]
        C --> D["Local Variables"]
        D --> E["Red Zone (128 bytes) - Optional"]
    end

Windows x64 Stack Frame

Note on shadow space placement: The caller allocates the 32-byte shadow space at the bottom of its own frame before executing the call instruction, so at callee entry it sits directly above the return address (at [rsp+8] through [rsp+39]). It belongs to the caller’s frame, not the callee’s, but the callee may use it to spill its register arguments. The diagram below runs from higher to lower addresses.

graph TB
    subgraph "Stack Frame (Windows x64)"
        A["Previous Frame"] --> B["Shadow Space (32 bytes), allocated by caller"]
        B --> C["Return Address (8 bytes)"]
        C --> D["Saved RBP (8 bytes)"]
        D --> E["Local Variables"]
    end

ARM64 Stack Frame

graph TB
    subgraph "Stack Frame (ARM64)"
        A["Previous Frame"] --> B["Saved x29/x30 (16 bytes)"]
        B --> C["Local Variables"]
        C --> D["Args beyond x7"]
    end

Common Pitfalls and Debugging Tips

These are the most common calling convention mistakes that trip up assembly programmers:

1. Forgetting Shadow Space (Windows)

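Windows requires 32 bytes of shadow space even for functions with fewer than 4 arguments. The callee is free to store its register arguments there, so if the caller did not reserve the space, those stores overwrite the caller’s own locals or saved registers.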

2. Stack Misalignment

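All three calling conventions require the stack pointer to be 16-byte aligned at the point of a call. On x86-64, misalignment typically surfaces as a crash on an aligned SSE/AVX access such as movaps inside the callee; on ARM64, the hardware can fault on any memory access made through a misaligned sp.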
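When debugging such a crash, the usual first step is to check the low four bits of the stack pointer. The C sketch below models that check on plain integers. The function names are hypothetical, and the addresses in any usage are made-up example values rather than output from a real session.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when sp satisfies the 16-byte rule that applies just before a call. */
bool aligned_before_call(uint64_t sp) {
    return (sp & 15) == 0;
}

/* Models the x86-64 call instruction, which pushes an 8-byte return address. */
uint64_t sp_after_call(uint64_t sp) {
    assert(aligned_before_call(sp));
    return sp - 8;
}

/* At x86-64 function entry, a correctly aligned caller leaves rsp
   8 bytes past a 16-byte boundary. */
bool plausible_x86_entry_sp(uint64_t sp) {
    return (sp & 15) == 8;
}
```

In a debugger, the equivalent check is to evaluate `rsp & 15` at the call site: a result of 0 is correct before the call, and 8 is expected on the first instruction of an x86-64 callee.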

3. Clobbering Callee-Saved Registers

If you use a callee-saved register without saving/restoring it, you’ll corrupt the caller’s state. This is a critical calling convention rule.


What’s Next?

Now you understand how calling conventions define function communication across the call boundary. In Part 3, we’ll explore program startup: what happens before main() runs, how command-line arguments are passed, and the role of the C runtime.

Experiment: Try the code examples on Godbolt with different compilers and optimization levels. Watch how the generated code follows these calling conventions.

