Technical Glossary
The complete technical dictionary for the laboratory. Definitions and cross-references for every instruction, architecture, and compiler concept.
-O2
-O2 is an optimization level accepted by GCC and Clang. In GCC it enables nearly all optimizations that do not involve a space-speed tradeoff, giving a strong balance between compilation time and execution speed.
AAPCS64
The Procedure Call Standard for the Arm 64-bit Architecture. The calling convention used by ARM64/AArch64 on Linux and most embedded environments. Apple platforms (macOS, iOS) follow it with some documented deviations.
AArch64
See ARM64 / AArch64. The 64-bit state of the ARMv8-A architecture and later, featuring 31 general-purpose registers and fixed-width instructions.
ABI
The Application Binary Interface (ABI) is the low-level interface between an application and the operating system or between different program modules. It dictates calling conventions, register usage, data types, and memory layout.
ARM64 / AArch64
The 64-bit execution state of the ARM architecture. Uses 31 general-purpose 64-bit registers (X0–X30) and fixed-width 32-bit instructions.
ASLR (Address Space Layout Randomization)
A security technique that randomizes the base addresses of the stack, heap, shared libraries, and (for position-independent executables) the executable's own segments each time a process is loaded.
Assembly Language
Low-level programming at the instruction level. Assembly provides a symbolic representation of the machine code instructions used to control a specific CPU architecture.
AST (Abstract Syntax Tree)
A tree representation of the syntactic structure of source code. Each node represents a construct (e.g., binary expression, function declaration).
AT&T Syntax
The assembly syntax style used by the GNU Assembler (GAS) and default in GCC/Clang output on Unix systems. Distinguished by source-first operand order, % register prefixes, and $ immediate prefixes.
Basic Block
A straight-line sequence of instructions with no branches in except to its first instruction and no branches out except at its last. Control flow enters at the top and exits at the bottom. Basic blocks are the fundamental units of control-flow graphs and SSA-form IR.
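As a rough illustration, the sketch below partitions a flat instruction list into basic blocks using the classic "leader" rule: the first instruction, any branch target, and any instruction following a branch each start a new block. The `(label, op, target)` tuple format and the function name `split_basic_blocks` are hypothetical, chosen only for this example.

```python
def split_basic_blocks(instrs):
    """Partition (label, op, target) tuples into basic blocks.

    Hypothetical format: label is a str or None, op is a mnemonic,
    target is a label name for branches and None otherwise.
    """
    branch_ops = {"jmp", "br", "ret"}
    targets = {target for _, _, target in instrs if target is not None}
    leaders = set()
    for i, (label, op, _) in enumerate(instrs):
        # The first instruction and every branch target start a block.
        if i == 0 or label in targets:
            leaders.add(i)
        # The instruction after a branch also starts a block.
        if op in branch_ops and i + 1 < len(instrs):
            leaders.add(i + 1)
    order = sorted(leaders)
    return [instrs[s:e] for s, e in zip(order, order[1:] + [len(instrs)])]


program = [
    (None, "mov", None),
    (None, "br", "done"),
    (None, "add", None),
    ("done", "ret", None),
]
blocks = split_basic_blocks(program)
```

Here the branch ends the first block, the unreachable `add` forms its own block, and the labelled `ret` starts the third.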
Binary Analysis
Binary analysis is the process of extracting information from compiled executable files (like ELF or PE) without access to the source code, often used for reverse engineering, security auditing, and debugging.
Breakpoint (Software / Hardware)
A debugging mechanism that pauses program execution at a specific address. Software breakpoints work by replacing an instruction with a trap (`int3` on x86-64, `brk #0` on ARM64). Hardware breakpoints use CPU debug registers and don't modify code.
BSS Section (.bss)
The Block Started by Symbol section in an executable. Holds uninitialized (and, in practice, zero-initialized) global and static variables. Takes no space in the file on disk; the memory is zero-filled at load time.
C
C is a general-purpose, procedural programming language that provides low-level memory access and a simple set of keywords. It is widely considered the lingua franca of systems programming.
C/C++
C and C++ are compiled, statically-typed systems programming languages. They provide direct access to memory manipulation and hardware-level operations, making them the standard for OS kernels, drivers, and compilers.
Callee-Saved Register
A register whose value must be preserved across function calls. If a function (the callee) wants to use it, it must save and restore it (typically via push/pop). Also called non-volatile registers. On System V x86-64: RBX, RBP, R12–R15.
Caller-Saved Register
A register that may be overwritten by any function call. The caller must save it before calling if it needs the value afterward. Also called volatile registers. On System V x86-64: RAX, RCX, RDX, RSI, RDI, R8–R11.
Calling Convention
A set of rules defining how functions receive parameters, return values, and which registers must be preserved. Common examples include System V AMD64 ABI, Microsoft x64, and AAPCS64 for ARM64.
Clang
Clang is a compiler frontend for the C, C++, and Objective-C programming languages. It is designed to be largely command-line compatible with GCC and translates source code into LLVM Intermediate Representation (IR).
Code Generation
Code generation is the final phase of compilation where an intermediate representation (like LLVM IR) is translated into the specific machine code or assembly language for a target architecture.
Compiler Engineering
The study and implementation of language translators. This field covers lexical analysis, parsing, semantic analysis, optimization, and code generation.
Compiler Optimization
Compiler optimization refers to the various passes (like constant folding, dead code elimination, and loop unrolling) a compiler applies to the IR to produce faster or smaller machine code.
Constant Folding
A compiler optimization that evaluates constant expressions at compile time instead of generating code to compute them at runtime. For example, 3 + 4 * 2 is replaced by 11 in the compiled output.
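A minimal sketch of the idea, assuming a toy expression tree made of nested `(op, left, right)` tuples with integers and variable names as leaves. The representation and the `fold` function are illustrative, not taken from any real compiler.

```python
import operator

# Hypothetical operator table for the toy expression tree.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Fold constant subexpressions bottom-up."""
    if not isinstance(node, tuple):
        return node  # leaf: an int constant or a variable name
    op, left, right = node
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)  # both operands known at "compile time"
    return (op, left, right)


folded_constant = fold(("+", 3, ("*", 4, 2)))    # 3 + 4 * 2
folded_partial = fold(("+", "x", ("*", 4, 2)))   # x + 4 * 2
```

The fully constant expression collapses to a single value, while the expression involving `x` is only partially folded.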
Data Section (.data)
The section of an executable that contains initialized global and static variables. Unlike .bss, these values occupy space in the file on disk because they have non-zero initial values that must be loaded into memory.
Dead Code Elimination
A compiler optimization that removes code which can never be executed or whose results are never used.
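The "results are never used" case can be sketched with a backward liveness scan over straight-line code. The `(dest, op, args)` instruction format is hypothetical, and the sketch assumes no instruction has side effects; a real pass must also keep stores, calls, and other effectful operations.

```python
def eliminate_dead_code(instrs, live_out):
    """Drop instructions whose results are never used.

    instrs: list of (dest, op, args) tuples in execution order.
    live_out: names that must still be available after the last instruction.
    Assumes every instruction is side-effect free (a simplification).
    """
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(instrs):
        if dest in live:
            live.discard(dest)
            # Operands that are names become live before this instruction.
            live.update(a for a in args if isinstance(a, str))
            kept.append((dest, op, args))
    kept.reverse()
    return kept


code = [
    ("a", "const", (1,)),
    ("b", "const", (2,)),
    ("c", "add", ("a", "a")),
    ("d", "mul", ("b", "b")),  # result never used
]
optimized = eliminate_dead_code(code, live_out={"c"})
```

Removing `d` makes `b` dead as well, so both instructions disappear in a single backward pass.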
Debugger
A debugger is a development tool that allows developers to inspect the runtime state of a program. It supports setting breakpoints, stepping through execution, and examining memory and CPU registers.
dyld
See Dynamic Linker (ld.so / dyld). The runtime component that loads shared libraries (.dylib) and resolves symbols on macOS.
Dynamic Linker (ld.so / dyld)
The runtime component that loads shared libraries (.so on Linux, .dylib on macOS) and resolves external symbols when a program starts or lazily at first call. On x86-64 Linux with glibc it is `ld-linux-x86-64.so.2`; on macOS it is `dyld`.
Dynamic Linking
The process of deferring symbol resolution to load time or runtime, allowing multiple programs to share a single copy of a library. This reduces binary size and allows library updates without recompilation.
ELF (Executable and Linkable Format)
The standard binary format on Linux and most Unix-like systems.
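As a small illustration, the first 20 bytes of an ELF file are enough to identify it: the magic `\x7fELF`, a class byte (1 for 32-bit, 2 for 64-bit), a data-encoding byte (1 for little-endian), and the 16-bit `e_type` and `e_machine` fields at offsets 16 and 18. The function name `parse_elf_ident` and the returned dictionary layout are assumptions made for this sketch.

```python
import struct


def parse_elf_ident(data):
    """Read a few identifying fields from the start of an ELF header."""
    if len(data) < 20 or data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = data[4], data[5]
    endian = "<" if ei_data == 1 else ">"
    # e_type and e_machine follow the 16-byte e_ident array.
    e_type, e_machine = struct.unpack_from(endian + "HH", data, 16)
    return {
        "bits": 64 if ei_class == 2 else 32,
        "little_endian": ei_data == 1,
        "type": e_type,        # e.g. 2 = ET_EXEC, 3 = ET_DYN
        "machine": e_machine,  # e.g. 62 = x86-64, 183 = AArch64
    }


# Synthetic header: 64-bit, little-endian, ET_DYN, x86-64.
sample = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<HH", 3, 62)
info = parse_elf_ident(sample)
```

On a Linux system the same function could be applied to the first bytes read from a real binary opened in `"rb"` mode.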
Executable Formats
An executable format is a standardized file structure used by an OS to organize code and data in a binary. Common formats include ELF (Linux), Mach-O (macOS), and PE (Windows).
Function Epilogue
The sequence of instructions at the end of a function that tears down the stack frame and returns control to the caller. Typically restores the frame pointer and stack pointer, then executes a return instruction (`ret`).
Function Prologue
The sequence of instructions at the start of a function that sets up the stack frame. Typically saves the old frame pointer, establishes a new frame pointer, and allocates space for local variables on the stack.
GAS (GNU Assembler)
The assembler in the GNU Binutils toolchain. Default on Linux. For x86 targets it uses AT&T syntax by default (switchable to Intel with `.intel_syntax`). Invoked as `as` or through GCC.
GCC
The GNU Compiler Collection (GCC) is a highly optimized, open-source compiler system produced by the GNU Project supporting various programming languages and target architectures.
GOT (Global Offset Table)
A table of pointers in position-independent code used to access global variables and functions from shared libraries.
Graph Coloring
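A classic approach to register allocation, used alongside alternatives such as linear scan. Models variables (live ranges) as nodes of an interference graph and registers as colors: two variables that are live at the same time share an edge and must receive different colors.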
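The sketch below colors an interference graph, in which two variables are connected when they are live at the same time and therefore cannot share a register. It uses a simple greedy, highest-degree-first strategy rather than a full Chaitin-style allocator, and the names `color_graph` and `interference` are hypothetical.

```python
def color_graph(interference, num_registers):
    """Greedily assign register numbers to variables.

    interference: dict mapping each variable to the set of variables
    that are live at the same time (assumed symmetric).
    Returns (assignment, spilled).
    """
    assignment, spilled = {}, []
    # Visit the most constrained (highest-degree) variables first.
    for node in sorted(interference, key=lambda n: (-len(interference[n]), n)):
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in range(num_registers) if r not in used]
        if free:
            assignment[node] = free[0]
        else:
            spilled.append(node)  # no color left: keep this value in memory
    return assignment, spilled


interference = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
with_three, spilled_three = color_graph(interference, 3)
with_two, spilled_two = color_graph(interference, 2)
```

With three registers every variable receives a color; with two, the triangle `a`, `b`, `c` cannot be colored and one variable is spilled.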
Instruction Selection
The compiler backend phase that maps target-independent IR operations to actual machine instructions. LLVM uses SelectionDAG and GlobalISel for this stage.
Instruction Set Architecture (ISA)
The abstract specification of a processor's machine language: its instructions, registers, memory model, and encoding format. x86-64 and ARM64 are ISAs.
Intel Syntax
The assembly syntax style originated by Intel. Uses destination-first operand order (e.g., mov dest, src) and does not use register prefixes.
Intermediate Representation (IR)
A compiler's internal representation of a program, sitting between the source language and target machine code. Designed to be language-independent and target-independent.
IR
Intermediate Representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code before translating it to target machine code.
Lazy Binding
A dynamic linking strategy where external function addresses are resolved on first call rather than at program startup.
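A toy model of the mechanism, assuming a dictionary stands in for the shared library and another for the table of resolved addresses. The class `LazyBinder` and its attributes are invented for illustration and do not correspond to any real loader API.

```python
class LazyBinder:
    """Resolve each external function on first call, then reuse the result."""

    def __init__(self, library):
        self.library = library   # stand-in for a shared library's exports
        self.got = {}            # stand-in for the table of resolved addresses
        self.resolutions = 0     # how many lookups were actually performed

    def call(self, name, *args):
        if name not in self.got:
            # First call: look the symbol up and cache the result.
            self.got[name] = self.library[name]
            self.resolutions += 1
        return self.got[name](*args)


binder = LazyBinder({"square": lambda x: x * x, "unused": lambda: None})
first = binder.call("square", 3)
second = binder.call("square", 4)
```

Only `square` is ever resolved, and only once; `unused` costs nothing because it is never called.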
ld.so
See Dynamic Linker (ld.so / dyld). The runtime component that loads shared libraries (.so) and resolves symbols on Linux.
Lexer / Tokenizer
The first phase of a compiler frontend that reads raw source text and produces a stream of tokens — classified chunks like identifiers, keywords, operators, and literals.
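A minimal regex-based tokenizer for a made-up toy language illustrates the idea. The token names, the keyword set, and the `tokenize` function are assumptions for this sketch only.

```python
import re

# Hypothetical token classes for a toy language; order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))
KEYWORDS = {"return", "if", "else"}


def tokenize(text):
    """Turn raw source text into a list of (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind, value = match.lastgroup, match.group()
        if kind == "SKIP":
            continue  # whitespace carries no meaning here
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r}")
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, value))
    return tokens


tokens = tokenize("return x + 42")
```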
Linker
A linker is a tool that takes one or more object files generated by a compiler and combines them into a single executable program or library. It resolves symbols and performs address relocation.
Linux
Linux is a widely-used, open-source operating system kernel that forms the foundation of numerous distributions. It provides core OS functionalities including memory management, process scheduling, and hardware interfacing.
LLVM
LLVM is a compiler framework built around a well-specified Intermediate Representation (IR) on which its optimizations operate. It serves as the backend for many modern compilers, including Clang and rustc.
LLVM IR
LLVM's typed, SSA-form intermediate representation. Serves as the universal interface between frontends and backends.
Loader
The operating system component responsible for reading an executable file from disk, mapping its segments into virtual memory, and transferring control to the entry point.
Loop Unrolling
A compiler optimization that replicates the body of a loop multiple times to reduce branch overhead and enable further optimizations like vectorization. Trades code size for speed.
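The shape of the transformation can be shown by hand. The two functions below compute the same sum; the second processes four elements per iteration and handles leftovers in a remainder loop. This is only a sketch of what a compiler does to machine code, so no speedup should be expected from running it in Python, and the function names are hypothetical.

```python
def sum_rolled(values):
    """Original loop: one element and one loop test per iteration."""
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


def sum_unrolled_by_4(values):
    """Same computation with the body replicated four times."""
    total = 0
    n = len(values)
    limit = n - n % 4  # largest multiple of 4 not exceeding n
    i = 0
    while i < limit:
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    # Remainder loop for the last 0 to 3 elements.
    while i < n:
        total += values[i]
        i += 1
    return total


data = list(range(10))
```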
LTO (Link-Time Optimization)
An optimization technique where the compiler defers optimization until link time, enabling cross-module analysis.
Mach-O
The executable file format used by macOS, iOS, and other Apple platforms. Supports fat/universal binaries bundling multiple architectures.
macOS
macOS is a Unix-based operating system developed by Apple. It features a hybrid kernel (XNU) combining elements of Mach and FreeBSD, and heavily utilizes Mach ports for IPC and system-level operations.
MASM
Microsoft's assembler for x86/x64, using Intel syntax with unique directives (PROC, ENDP, INVOKE).
NASM (Netwide Assembler)
A popular open-source x86/x64 assembler using Intel syntax. Widely used for learning assembly due to its clean, consistent syntax. Cross-platform (Linux, macOS, Windows).
Object Files
Object files are the intermediate output of a compiler. They contain machine code and a symbol table, but references to external functions or variables remain unresolved until processed by a linker.
Operating Systems
An operating system is the core system software that manages computer hardware and software resources, providing common services like memory management, process scheduling, and file systems for applications.
Optimization Pass
A single transformation applied to the IR during compilation. Examples include constant folding, dead code elimination, and loop unrolling.
Parser
The compiler phase that takes a token stream from the lexer and builds an Abstract Syntax Tree (AST) according to the language grammar. Validates syntactic correctness.
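A small recursive-descent parser for arithmetic expressions shows how a token stream becomes a tree. It assumes tokens are already Python integers and operator strings, and it represents AST nodes as `(op, left, right)` tuples; both choices are made purely for illustration.

```python
def parse(tokens):
    """Parse a list of ints and operator strings into a nested-tuple AST.

    Grammar (illustrative):
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUMBER | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def factor():
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if isinstance(tok, int):
            return tok
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


precedence_tree = parse([3, "+", 4, "*", 2])
grouped_tree = parse(["(", 1, "+", 2, ")", "*", 3])
```

Operator precedence falls out of the grammar: multiplication binds tighter because `term` is parsed beneath `expr`.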
PE (Portable Executable)
The executable file format used by Windows for .exe, .dll, and .sys files. Supports multiple architectures, including x86, x86-64, and ARM64.
PHI Node
An SSA construct that selects a value based on which predecessor basic block control flow came from. Necessary because in SSA form each variable has exactly one definition, so a value that differs between incoming paths needs an explicit merge point.
PLT (Procedure Linkage Table)
A table of small code stubs used for calling dynamically-linked functions.
Process Loading
The sequence of steps the OS performs to launch a program: reading the executable header, mapping segments into virtual memory, and jumping to the entry point.
ptrace
A Unix system call that allows one process to observe and control the execution of another.
Register
A small, fast storage location inside the CPU. x86-64 has 16 general-purpose 64-bit registers (RAX–R15); ARM64 has 31 (X0–X30) plus the zero register XZR. Fastest level of the memory hierarchy.
Register Allocation
The compiler phase that maps an unlimited number of virtual registers to the finite set of physical CPU registers.
Register Spilling
When the register allocator runs out of physical registers to hold live values, it 'spills' a value to a stack slot in memory and reloads it when needed. This adds load/store latency but lets allocation succeed with a limited register set.
Relocation
A record in an object file that tells the linker to patch an instruction or data reference with the correct address once the final layout is determined.
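A simplified model of the patching step, assuming each relocation is an `(offset, symbol, addend)` tuple that requests a 32-bit little-endian absolute address (similar in spirit to an absolute 32-bit relocation type, but not a faithful model of any real object format). The function name and the sample symbol are hypothetical.

```python
import struct


def apply_relocations(section, relocations, symbol_addresses):
    """Patch placeholder bytes with final symbol addresses.

    relocations: list of (offset, symbol, addend) tuples (hypothetical format).
    symbol_addresses: final address of each symbol after layout.
    """
    patched = bytearray(section)
    for offset, symbol, addend in relocations:
        value = symbol_addresses[symbol] + addend
        # Write the address as a 32-bit little-endian value in place.
        struct.pack_into("<I", patched, offset, value)
    return bytes(patched)


# 0xB8 is `mov eax, imm32` on x86; the four zero bytes are the placeholder.
section = b"\xb8\x00\x00\x00\x00"
linked = apply_relocations(section, [(1, "counter", 0)], {"counter": 0x404000})
```

The opcode byte is untouched; only the four placeholder bytes receive the address chosen at link time.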
SSA (Static Single Assignment)
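An IR property where every variable is assigned exactly once. Because each use then refers to a single definition, def-use relationships are explicit, which simplifies optimizations such as constant propagation and dead code elimination.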
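For straight-line code, conversion to SSA amounts to giving each assignment a fresh version number and rewriting later uses to refer to the latest version. The sketch below does only that; it assumes a hypothetical `(dest, op, args)` instruction format and ignores control flow, which is where PHI nodes would be required.

```python
def to_ssa(instrs):
    """Rename variables so each is assigned exactly once (no branches)."""
    version = {}
    renamed = []
    for dest, op, args in instrs:
        # Uses refer to the most recent version of each variable.
        new_args = tuple(
            f"{a}{version[a]}" if a in version else a for a in args
        )
        # Each definition creates a new version of its destination.
        version[dest] = version.get(dest, 0) + 1
        renamed.append((f"{dest}{version[dest]}", op, new_args))
    return renamed


straight_line = [
    ("x", "const", (1,)),
    ("x", "add", ("x", 1)),
    ("y", "mul", ("x", 2)),
]
ssa_form = to_ssa(straight_line)
```

After renaming, the two assignments to `x` become `x1` and `x2`, and the use in `y` unambiguously refers to `x2`.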
Stack Frame
The region of the call stack allocated for a single function invocation. It typically stores the return address, saved frame pointer, local variables, and arguments passed on the stack.
Static Linking
Combining all object files and library code into a single self-contained executable at build time. No runtime dependencies on shared libraries.
svc (Supervisor Call)
See System Call (syscall / svc). On ARM64 architectures, the `svc #0` (or `svc #0x80` on macOS) instruction is used to switch the CPU to kernel mode and request a system service.
Symbol Table
A data structure in object files that maps symbol names to their addresses and sizes. The linker uses this to resolve external references and combine object files.
Syscall
See System Call (syscall / svc). The interface between user-space programs and the OS kernel.
System Call (syscall / svc)
The interface between user-space programs and the OS kernel. Transitions the CPU from user mode to kernel mode to request OS services. On x86-64 Linux: `syscall`; on ARM64: `svc #0`.
System V ABI
The ABI, including the calling convention, used on Linux, macOS, and most other Unix-like systems for x86-64. It passes the first six integer or pointer arguments in registers (RDI, RSI, RDX, RCX, R8, R9) and returns integer values in RAX.
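The register order can be sketched as a simple assignment routine. It covers only integer and pointer arguments; floating-point arguments, structures, and varargs follow additional rules that are deliberately left out. The function name `assign_integer_args` is hypothetical.

```python
# Integer/pointer argument registers, in order.
SYSV_INT_ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def assign_integer_args(args):
    """Split integer arguments into register-passed and stack-passed.

    Simplified: ignores floating-point, struct, and variadic arguments.
    """
    in_registers = dict(zip(SYSV_INT_ARG_REGS, args))
    on_stack = list(args[len(SYSV_INT_ARG_REGS):])
    return in_registers, on_stack


regs, stack = assign_integer_args([10, 20, 30, 40, 50, 60, 70, 80])
```

The seventh and eighth arguments no longer fit in registers and are passed on the stack.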
Systems Programming
Systems programming involves writing system software (like OS kernels, drivers, and compilers) where performance and hardware constraints are critical. It typically requires manual memory management and direct hardware interaction.
Text Section (.text)
The section of an executable that contains the machine code instructions. Typically mapped as read-only and executable in memory.
Virtual Memory
An abstraction that gives each process its own private, contiguous address space, mapped to physical memory by the CPU's MMU.
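Address translation can be sketched with a single-level page table, assuming 4 KiB pages and a dictionary that maps virtual page numbers to physical frame numbers. Real MMUs use multi-level tables and permission bits; the names here are illustrative only.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


def translate(page_table, vaddr):
    """Map a virtual address to a physical address.

    page_table: dict of virtual page number -> physical frame number.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        # An unmapped page would trigger a page fault in hardware.
        raise LookupError(f"page fault at {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset


page_table = {0x10: 0x2A}
physical = translate(page_table, 0x10123)
```

The offset within the page is preserved; only the page number is replaced by the frame number.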
Windows
Microsoft Windows is a family of proprietary operating systems with a fundamentally different architecture from Unix-like systems, notably using the PE executable format and a specialized API suite (Win32/NT API).
Windows x64 ABI
Microsoft's calling convention for 64-bit Windows. Notable for its 32-byte 'shadow space' and using RCX, RDX, R8, R9 for the first four arguments.
x64
x64 (or x86-64) is a 64-bit extension to the x86 instruction set architecture, offering more and wider general-purpose registers, a larger virtual address space, and new instructions compared to 32-bit x86.
x86-64 Architecture
The 64-bit extension of the x86 instruction set architecture, originally developed by AMD as AMD64 and later adopted by Intel as Intel 64.