Memory & Binary

How Memory Works

Stack, heap, registers, and why buffer overflows happen — the theory behind exploitation

Updated 2026-04-18
#memory #buffer-overflow #stack #heap #aslr #nx #rop
TL;DR
  • Memory is divided into segments: stack (local variables, function frames), heap (dynamic allocation), text (code), data (globals)
  • The stack grows downward; buffer overflows write past the end of a buffer and overwrite adjacent memory
  • The instruction pointer (EIP/RIP) holds the address of the next instruction; overwriting the saved return address that is loaded into it when a function returns gives control of the program
  • Modern defences: ASLR (randomises addresses), NX/DEP (non-executable stack), stack canaries, PIE
  • ROP (Return-Oriented Programming) chains existing code gadgets to bypass NX without injecting new code

How a Program Uses Memory

When a program runs, the OS allocates a virtual address space divided into segments:

High addresses
  Stack: local variables, function frames (grows downward ↓)
  ↓ free space ↑
  Heap: dynamic allocation via malloc/new (grows upward ↑)
  BSS segment: uninitialised globals
  Data segment: initialised globals
  Text segment: code (read-only)
Low addresses
NOTE

Each running process has its own virtual address space. The OS maps it to physical memory. This is why two processes can use the same address (e.g. 0x7fff...) without conflicting.
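To see these segments in a live process on Linux, /proc/self/maps lists every mapping with its address range, permissions, and backing file. The sketch below parses that format in Python; it assumes a Linux procfs, and the names Mapping, parse_maps_line, and read_own_maps are illustrative, not from any library.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Mapping:
    start: int
    end: int
    perms: str           # e.g. "r-xp": read, write, execute, private (p) or shared (s)
    path: Optional[str]  # file path, "[heap]", "[stack]", or None for anonymous memory


def parse_maps_line(line: str) -> Mapping:
    """Parse one /proc/<pid>/maps line: address perms offset dev inode [pathname]."""
    parts = line.split(maxsplit=5)   # maxsplit keeps paths containing spaces intact
    start_s, end_s = parts[0].split("-")
    path = parts[5].strip() if len(parts) == 6 else None
    return Mapping(int(start_s, 16), int(end_s, 16), parts[1], path)


def read_own_maps() -> List[Mapping]:
    """Parse this process's own memory map (Linux only)."""
    text = Path("/proc/self/maps").read_text()
    return [parse_maps_line(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__" and Path("/proc/self/maps").exists():
    # Print the stack, the heap, and every executable (text) mapping
    for m in read_own_maps():
        if m.path in ("[heap]", "[stack]") or "x" in m.perms:
            print(f"{m.start:#x}-{m.end:#x} {m.perms} {m.path or ''}")
```

Running it twice with ASLR enabled should show the stack and library addresses changing between runs.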


CPU Registers

Registers are tiny, ultra-fast storage locations inside the CPU.

Register | Name | Purpose
EIP / RIP | Instruction Pointer | Address of the next instruction to execute
ESP / RSP | Stack Pointer | Top of the stack (lowest address in use)
EBP / RBP | Base Pointer | Base of the current stack frame
EAX / RAX | Accumulator | General purpose; holds the function return value
EBX / RBX | Base | General purpose
ECX / RCX | Counter | Loop and string-operation counter
EDX / RDX | Data | General purpose
ESI / RSI | Source Index | Source for string/memory operations
EDI / RDI | Destination Index | Destination for string/memory operations

E prefix = 32-bit (x86). R prefix = 64-bit (x86-64). EIP/RIP is the most important — controlling it means controlling what the CPU executes next.
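The E and R names are views of the same physical register, not separate storage. A small Python model of that aliasing under the usual x86-64 rules (the helper names are hypothetical): writing a 32-bit register zero-extends into the full 64-bit register, while 16-bit writes leave the upper bits alone.

```python
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def register_views(rax: int) -> dict:
    """Split a 64-bit RAX value into the narrower x86 names that alias it."""
    rax &= MASK64
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,   # low 32 bits
        "AX": rax & 0xFFFF,         # low 16 bits
        "AH": (rax >> 8) & 0xFF,    # bits 8-15
        "AL": rax & 0xFF,           # low 8 bits
    }


def write_eax(rax: int, value: int) -> int:
    """Model `mov eax, value` in 64-bit mode: the old upper 32 bits of rax are discarded (zeroed)."""
    return value & 0xFFFF_FFFF


def write_ax(rax: int, value: int) -> int:
    """Model `mov ax, value`: only the low 16 bits change; the upper 48 bits are preserved."""
    return (rax & ~0xFFFF & MASK64) | (value & 0xFFFF)
```

For example, register_views(0x1122334455667788)["EAX"] is 0x55667788.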


The Stack — Frame by Frame

The stack manages function calls. Every function call pushes a stack frame containing:

  • Return address (where to go back after the function ends)
  • Saved base pointer
  • Local variables and buffers
c
#include <string.h>

void vulnerable(char *input) {
    char buffer[64];           // 64 bytes on the stack
    strcpy(buffer, input);     // copies input with no length check
}

When vulnerable() is called:

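High address (top of frame)
  return address   ← overwrite to redirect execution
  saved EBP
  buffer[64]       ← input is copied here, starting at the lowest address and filling upward
Low address

Assuming 32-bit x86 and no compiler padding between buffer and the saved EBP (real offsets vary, which is why they are measured rather than assumed):

  • Step 1, normal input (63 characters or fewer): fits in buffer together with the terminating NUL that strcpy() also copies. Safe.
  • Step 2, overflow input (64 characters or more): strcpy() keeps writing past the end of buffer.
  • Step 3: input bytes 65–68 overwrite the saved EBP.
  • Step 4: input bytes 69–72 overwrite the return address.
  • Step 5: when vulnerable() returns, ret loads the overwritten return address into EIP, sending execution to shellcode or a ROP chain.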
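The byte counts above can be checked with a small Python model of the frame. It is a sketch under the same assumptions (32-bit little-endian, no padding, no canary); the saved EBP and return address values are made up, and model_strcpy only imitates strcpy()'s copying rules.

```python
import struct

# Hypothetical 32-bit frame for vulnerable(): buffer[64], then saved EBP, then the return address.
# Assumes no compiler padding or canary; real offsets must be measured, not assumed.
BUF_SIZE = 64
SAVED_EBP_OFF = BUF_SIZE         # input bytes 65-68 (1-based)
RET_OFF = SAVED_EBP_OFF + 4      # input bytes 69-72 (1-based)
MODEL_SIZE = RET_OFF + 4 + 16    # also model a little of the caller's frame above


def make_frame(saved_ebp: int, ret_addr: int) -> bytearray:
    """Build the modelled frame with little-endian saved EBP and return address."""
    frame = bytearray(MODEL_SIZE)
    struct.pack_into("<II", frame, SAVED_EBP_OFF, saved_ebp, ret_addr)
    return frame


def model_strcpy(frame: bytearray, src: bytes) -> None:
    """Copy like strcpy(buffer, src): stop at the first NUL, copy that NUL too, never check size."""
    data = src.split(b"\x00", 1)[0] + b"\x00"
    if len(data) > len(frame):
        raise ValueError("input is longer than this model covers")
    frame[:len(data)] = data


def saved_values(frame: bytearray) -> tuple:
    """Return (saved EBP, return address) as the function epilogue would load them."""
    return struct.unpack_from("<II", frame, SAVED_EBP_OFF)
```

Copying exactly 64 characters is already one byte too many: the terminating NUL lands on the lowest byte of the saved EBP, turning a value such as 0xbffff7a8 into 0xbffff700 (a classic off-by-one).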

Buffer Overflows — The Classic Exploit

A buffer overflow happens when a program writes more data into a buffer than it can hold, overwriting adjacent memory.

c
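#include <stdio.h>

// Vulnerable: gets() has no length limit and was removed from the C standard in C11. NEVER use it.
void vuln(void) {
    char name[100];
    gets(name);
}

// Safe alternative: fgets() stores at most sizeof(name) - 1 characters plus a NUL terminator
void safe(void) {
    char name[100];
    if (fgets(name, sizeof(name), stdin) == NULL) {
        return;   // EOF or read error
    }
}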

Finding the Offset (How Many Bytes to Overwrite EIP)

bash
# Generate a cyclic pattern (no repeating 4-byte sequences)
python3 -c "import pwn; print(pwn.cyclic(200).decode())"

# Or with Metasploit's tools
/usr/share/metasploit-framework/tools/exploit/pattern_create.rb -l 200

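# After the crash, look up the value left in EIP; each tool only recognises its own pattern
python3 -c "import pwn; print(pwn.cyclic_find(0x61616171))"  # pwntools pattern: "qaaa" is at offset 64
/usr/share/metasploit-framework/tools/exploit/pattern_offset.rb -q <EIP value>   # Metasploit pattern (Aa0Aa1...)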
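The cyclic pattern works because it is a de Bruijn sequence: every 4-byte window occurs only once, so the four bytes that end up in EIP identify exactly one offset. Below is a from-scratch Python sketch of the idea (not pwntools' own code), assuming pwntools' defaults of a lowercase alphabet and 4-byte windows; under those assumptions it should reproduce the same pattern.

```python
import string
import struct
from itertools import islice
from typing import Iterator


def de_bruijn(alphabet: str = string.ascii_lowercase, n: int = 4) -> Iterator[str]:
    """Yield a de Bruijn sequence: every length-n string over `alphabet` occurs once (cyclically).

    Standard Lyndon-word (FKM) construction, written as a lazy generator.
    """
    k = len(alphabet)
    a = [0] * (n + 1)

    def db(t: int, p: int) -> Iterator[str]:
        if t > n:
            if n % p == 0:
                for i in a[1:p + 1]:
                    yield alphabet[i]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    yield from db(1, 1)


def cyclic(length: int) -> bytes:
    """First `length` bytes of the pattern."""
    return "".join(islice(de_bruijn(), length)).encode()


def cyclic_find(value: int, search_len: int = 20_000) -> int:
    """Offset of a crashed 32-bit EIP value in the pattern (little-endian), or -1 if absent."""
    return cyclic(search_len).find(struct.pack("<I", value))
```

With these definitions cyclic(16) is b"aaaabaaacaaadaaa", and cyclic_find(0x61616171) returns 64 because 0x61616171 packs little-endian to b"qaaa".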

Writing the Exploit

python
import pwn

# Offset found: 112 bytes to reach EIP
padding = b"A" * 112

# Address to jump to (e.g. a JMP ESP instruction or shellcode address)
# Must match target architecture — little-endian for x86
eip = pwn.p32(0xdeadbeef)   # 32-bit little-endian pack

# Shellcode — execve("/bin/sh", NULL, NULL)
shellcode = pwn.shellcraft.i386.linux.sh()
shellcode_bytes = pwn.asm(shellcode)

payload = padding + eip + b"\x90" * 16 + shellcode_bytes  # NOP sled before shellcode
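p32() exists because x86 stores multi-byte integers little-endian, lowest byte first. The standard library's struct module produces the same bytes; the helper names below mirror pwntools' for readability but are local, hypothetical stand-ins, not the pwntools functions themselves.

```python
import struct


def p32(value: int) -> bytes:
    """Pack an unsigned 32-bit integer little-endian (struct.error if out of range)."""
    return struct.pack("<I", value)


def p64(value: int) -> bytes:
    """64-bit counterpart for x86-64 targets."""
    return struct.pack("<Q", value)


def u32(data: bytes) -> int:
    """Unpack 4 little-endian bytes, e.g. a value leaked from memory."""
    return struct.unpack("<I", data)[0]
```

p32(0xdeadbeef) gives b"\xef\xbe\xad\xde". p64(0x401136) gives b"\x36\x11\x40\x00\x00\x00\x00\x00": the high bytes of typical 64-bit addresses are zero, and a NUL byte ends the copy in string functions such as strcpy().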
NOP Sled

A NOP sled (\x90\x90\x90...) is a run of one-byte "do nothing" instructions placed before the shellcode. It gives the overwritten return address a larger target: landing anywhere in the sled slides execution down to the shellcode. This matters when the exact shellcode address is uncertain, for example because the stack shifts slightly between environments.
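A toy model of why the sled works: it treats memory as a flat byte string and 0x90 as the only NOP, and uses 0xcc (int3) bytes as a placeholder for shellcode. Real instruction decoding is far richer; this only shows the sliding behaviour.

```python
NOP = 0x90  # one-byte x86 NOP


def slide(memory: bytes, landing: int) -> int:
    """Start executing at `landing`; step over NOPs and return the offset where other code begins."""
    pc = landing
    while pc < len(memory) and memory[pc] == NOP:
        pc += 1  # a NOP does nothing except advance the instruction pointer by one byte
    return pc


# Layout: 8 bytes of unrelated data, a 16-byte sled, then placeholder "shellcode" (int3 bytes)
memory = b"A" * 8 + bytes([NOP]) * 16 + b"\xcc" * 4
SHELLCODE_START = 8 + 16
```

Every landing offset from 8 to 23 ends at SHELLCODE_START; landing at offset 4 does not, because those bytes are not NOPs.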


Modern Defences

Real-world systems have mitigations. Understanding each one shapes the exploitation approach.

ASLR (Address Space Layout Randomisation)

Randomises the base addresses of the stack, heap, and libraries on every execution.

bash
# Check ASLR status on Linux
cat /proc/sys/kernel/randomize_va_space
# 0 = disabled, 1 = partial, 2 = full

# Disable for testing (requires root)
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

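Bypasses: memory leaks (e.g. a format string bug that prints a pointer), brute force (practical mainly on 32-bit, where randomisation entropy is low), partial overwrites (changing only the low-order bytes of an address, relying on the low 12 bits that page-granular randomisation leaves unchanged), and heap spraying (filling memory so that a guessed address is likely to hit attacker data).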
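Most leak-based ASLR bypasses come down to one subtraction: a leaked pointer minus that symbol's known offset in the file gives the module's load base, and every other address follows from the base. A sketch assuming 4 KiB pages and page-granular randomisation of library and executable mappings; the addresses and offsets below are hypothetical, not taken from a real libc build.

```python
PAGE_SIZE = 0x1000  # 4 KiB pages


def base_from_leak(leaked_addr: int, symbol_offset: int) -> int:
    """Load base of a module, given one leaked pointer into it and that symbol's file offset."""
    base = leaked_addr - symbol_offset
    if base % PAGE_SIZE:
        raise ValueError("base is not page-aligned: wrong offset or a different library build?")
    return base


def same_low_bits(addr_run1: int, addr_run2: int) -> bool:
    """True if two runs' addresses for the same symbol differ only above the low 12 bits."""
    return (addr_run1 ^ addr_run2) % PAGE_SIZE == 0


# Hypothetical numbers: a pointer to symbol A leaked at runtime, A at offset 0x87E50 in the
# file, and a second symbol B at offset 0x58740 in the same file
libc_base = base_from_leak(0x7F3A1C287E50, 0x87E50)
symbol_b = libc_base + 0x58740
```

The page-alignment check is a cheap sanity test: if the computed base is not a multiple of 0x1000, the offset almost certainly belongs to a different build of the library.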

NX / DEP (Non-Executable Memory)

Marks the stack and heap as non-executable. Shellcode injected there causes a segfault instead of running.

bash
# Check if NX is enabled on a binary
checksec --file=./binary

Bypass: Return-Oriented Programming (ROP) — use existing executable code, not injected shellcode.

Stack Canaries

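A random value placed between local variables and the saved return address. Before returning, the compiler-inserted function epilogue compares it with a reference copy; if an overflow has changed it, the program aborts (glibc prints "*** stack smashing detected ***") instead of using the corrupted return address.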

bash
# checksec shows all protections at once
checksec --file=./binary
# RELRO      STACK CANARY   NX         PIE
# Full       Canary found   NX enabled Enabled

Bypass: leak the canary value first (e.g. via a format string bug), then write it back unchanged in the overflow so the check passes.
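A toy Python model of the check, assuming a simplified 64-bit layout (buffer, canary, saved RBP, return address, no padding) and a glibc-style canary whose lowest byte is zero. The class name, addresses, and layout are illustrative only.

```python
import secrets
import struct


class CanaryFrame:
    """Toy 64-bit frame: buffer | canary | saved RBP | return address (assumed layout, no padding)."""

    def __init__(self, buf_size: int = 64, saved_rbp: int = 0x7FFD00001000, ret_addr: int = 0x401200):
        self.buf_size = buf_size
        # glibc-style canary: random, except the lowest byte (first in memory) is 0x00
        self.reference = b"\x00" + secrets.token_bytes(7)
        self.mem = bytearray(buf_size) + self.reference + struct.pack("<QQ", saved_rbp, ret_addr)

    def write(self, data: bytes) -> None:
        """Unchecked copy into the buffer, e.g. read() with an attacker-chosen length."""
        self.mem[:len(data)] = data

    def epilogue(self) -> int:
        """Compiler-inserted check before `ret`: abort if the canary changed, else return the address."""
        stored = bytes(self.mem[self.buf_size:self.buf_size + 8])
        if stored != self.reference:
            raise RuntimeError("*** stack smashing detected ***")
        return struct.unpack_from("<Q", self.mem, self.buf_size + 16)[0]
```

Overwriting the canary with filler trips the check; writing a leaked copy of the canary back at the same offset passes it, which is why a leak defeats this defence.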

PIE (Position Independent Executable)

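Compiles the main executable as position-independent code, so ASLR can load its text segment at a random base too, not just the stack, heap, and libraries.

Impact: Without PIE, the text segment always sits at the same fixed address, so ROP gadget addresses found offline work unchanged on every run. With PIE + ASLR, every region is randomised, and an exploit first needs a leak to locate the code.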


ROP — Return-Oriented Programming

ROP bypasses NX by chaining together small existing code sequences called gadgets — each ending with a ret instruction.

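Normally, CALL pushes a return address and RET pops it back into the instruction pointer. ROP abuses RET: because every gadget ends in ret, each one pops the next address off the stack and jumps there, so a list of addresses written over the saved return address runs gadget after gadget.

ROP chain calling system("/bin/sh") on x86-64 Linux (stack layout, lowest address first; values are popped in this order):
  address of gadget 1 (pop rdi; ret)   ← overwrites the saved return address
  address of "/bin/sh"                 ← popped into rdi, the first argument
  address of gadget 2 (pop rsi; ret)
  value for rsi                        ← popped into rsi, the second argument (system() ignores it; shown to illustrate chaining)
  address of system()                  ← gadget 2's ret jumps here; system() runs with the controlled rdi

On x86-64 glibc, system() may crash if RSP is not 16-byte aligned at the call; inserting an extra bare ret gadget before system() is a common fix.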
bash
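# Find ROP gadgets in a binary
ROPgadget --binary ./binary                   # list every gadget found
ROPgadget --binary ./binary --only "pop|ret"  # only gadgets built from pop/ret instructions
ropper -f ./binary

# pwntools automates chain building (assumes system is reachable from the binary,
# e.g. through its PLT, and that the string "/bin/sh" appears in it)
python3 -c "
from pwn import *
elf = ELF('./binary')
rop = ROP(elf)
rop.call('system', [next(elf.search(b'/bin/sh'))])
print(rop.dump())
"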

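The chain layout can be traced with a toy interpreter that only understands pop <reg>; ret gadgets. All addresses are hypothetical placeholders, and the model ignores stack alignment and everything a real CPU does beyond these two instructions.

```python
from typing import Dict, List, Tuple

# Hypothetical addresses, for illustration only
POP_RDI_RET = 0x401203
POP_RSI_RET = 0x401205
SYSTEM = 0x7FFFF7E50D70
BIN_SH = 0x7FFFF7F985BD

# Gadgets as instruction lists; each one here is "pop <reg>" followed by "ret"
GADGETS: Dict[int, List[str]] = {
    POP_RDI_RET: ["pop rdi", "ret"],
    POP_RSI_RET: ["pop rsi", "ret"],
}


def run_chain(stack: List[int], functions: Dict[int, str]) -> Tuple[str, Dict[str, int]]:
    """Trace a chain from the hijacked `ret` onward. stack[0] is the overwritten return address."""
    regs = {"rdi": 0, "rsi": 0}
    sp = 0
    while sp < len(stack):
        rip = stack[sp]                  # ret: pop the next stack value into RIP
        sp += 1
        if rip in functions:             # reached a real function: it reads its args from registers
            return functions[rip], regs
        for insn in GADGETS[rip]:
            if insn.startswith("pop "):
                regs[insn.split()[1]] = stack[sp]   # pop <reg>: take the next stack value
                sp += 1
            # "ret" needs no code here: the next loop iteration performs the pop into RIP
    raise ValueError("chain ran off the end without reaching a function")


chain = [POP_RDI_RET, BIN_SH, POP_RSI_RET, 0, SYSTEM]
```

run_chain(chain, {SYSTEM: "system"}) ends in system with rdi pointing at the "/bin/sh" placeholder, mirroring the stack layout above.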
The Heap — Dynamic Memory

The heap stores data allocated at runtime (malloc() in C, new in C++).

Heap vulnerabilities:

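Vulnerability | Cause | Impact
Heap overflow | Write past the end of an allocated chunk | Overwrite adjacent chunk data or allocator metadata
Use-After-Free (UAF) | Use a pointer after free() | The dangling pointer reaches whatever reuses the chunk, often leading to arbitrary read/write
Double Free | free() the same pointer twice | Heap corruption; the allocator can hand out the same chunk twice
Heap spray | Fill the heap with copies of shellcode/data at predictable offsets | Make a guessed address reliably land on attacker data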
Use-After-Free in Modern Exploits

UAF is one of the dominant classes of memory corruption bugs in modern browsers (Chrome, Firefox). Object lifetimes in the DOM and the JavaScript engine are complex, and a large share of Chrome exploits seen in the wild have been UAF bugs.
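Why a dangling pointer is dangerous is easiest to see with a toy allocator that, like many real ones, hands the most recently freed chunk of a given size to the next request of that size. This sketch is loosely inspired by glibc's tcache but omits its metadata and hardening checks; ToyHeap and its methods are hypothetical.

```python
from typing import Dict, List


class ToyHeap:
    """Toy allocator with per-size LIFO free lists and no safety checks."""

    def __init__(self, size: int = 4096):
        self.mem = bytearray(size)
        self.top = 0                                # next never-used offset
        self.free_lists: Dict[int, List[int]] = {}

    def malloc(self, n: int) -> int:
        bin_ = self.free_lists.get(n)
        if bin_:
            return bin_.pop()                       # reuse the most recently freed chunk
        if self.top + n > len(self.mem):
            raise MemoryError("toy heap exhausted")
        ptr, self.top = self.top, self.top + n
        return ptr

    def free(self, ptr: int, n: int) -> None:
        # No check that ptr is still live: a double free simply pushes it twice
        self.free_lists.setdefault(n, []).append(ptr)

    def write(self, ptr: int, data: bytes) -> None:
        self.mem[ptr:ptr + len(data)] = data

    def read(self, ptr: int, n: int) -> bytes:
        return bytes(self.mem[ptr:ptr + n])
```

If a session object is freed while a pointer to it is kept, the next same-size allocation reuses that chunk, and the stale pointer reads whatever the new owner writes. Freeing one chunk twice makes two later malloc() calls return the same memory.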


Format String Vulnerabilities

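printf(user_input): if user_input contains format specifiers such as %x, %s, or %n, printf fetches "arguments" that were never passed, reading whatever sits where they would be (stack words on 32-bit x86; registers and then the stack on x86-64). %n goes further and writes to the address it fetches.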

c
// Vulnerable
printf(user_input);           // user controls format string

// Safe
printf("%s", user_input);    // format string is fixed
bash
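# Leak stack values (on 64-bit, %p prints full pointer-width values where %x prints only 32 bits):
./vuln <<< "%x.%x.%x.%x.%x"

# Write to memory (very powerful):
# %n writes the count of characters printed so far to the address taken from the next argument slot;
# unless that slot holds an attacker-chosen address, this usually just crashes
./vuln <<< "%n"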

What format strings give you:

  • Arbitrary read — leak ASLR addresses, canary values
  • Arbitrary write — overwrite function pointers, GOT entries
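printf cannot tell how many arguments it was actually given; it fetches the next slot for each specifier. A toy Python model of that behaviour for %x, %p, %n, and %%, under loud assumptions: `slots` stands in for the caller's stack or argument registers, field width is the only flag supported, and %p output is approximated.

```python
import re
from typing import Dict, List, Tuple

SPEC = re.compile(r"%(\d*)([xpn%])")


def toy_printf(fmt: str, slots: List[int]) -> Tuple[str, Dict[int, int]]:
    """Return (printed text, {address: value written by %n}) for a format string."""
    out: List[str] = []
    writes: Dict[int, int] = {}
    idx = 0
    pos = 0
    for m in SPEC.finditer(fmt):
        out.append(fmt[pos:m.start()])
        pos = m.end()
        width, conv = m.groups()
        if conv == "%":
            out.append("%")
            continue
        arg = slots[idx]          # fetch the next slot, whether or not the caller passed it
        idx += 1
        if conv == "n":
            writes[arg] = len("".join(out))   # %n: store the number of characters printed so far
            continue
        if conv == "x":
            text = f"{arg & 0xFFFFFFFF:x}"    # %x prints an unsigned int: only the low 32 bits
        else:
            text = f"{arg:#x}"                # %p: full pointer with 0x prefix (glibc prints NULL as "(nil)")
        out.append(text.rjust(int(width or 0)))
    out.append(fmt[pos:])
    return "".join(out), writes
```

With the format "AAAA%8x%n", the first slot is printed padded to 8 characters, and %n then records 12 (4 + 8) at the address found in the second slot. Field widths are how the written value is controlled.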

Tools for Binary Exploitation

Tool | Purpose
gdb + pwndbg/peda | Dynamic analysis, crash examination
pwntools (Python) | Exploit scripting library
checksec | Enumerate binary protections
ROPgadget / ropper | Find ROP gadgets
Ghidra / IDA Pro | Static disassembly and decompilation
ltrace / strace | Trace library/system calls
objdump | Disassemble a binary
strings | Extract printable strings (e.g. to find hardcoded credentials)
bash
# Basic GDB workflow
gdb ./binary
(gdb) run < input.txt         # Run with input
(gdb) info registers          # Print all registers after crash
(gdb) x/32x $esp              # Examine 32 hex words from the stack pointer (64-bit: x/32gx $rsp)
(gdb) disassemble main        # Disassemble main function

Operational Notes

  • 32-bit vs 64-bit matters a lot: on 64-bit Linux (System V ABI) the first six function arguments are passed in registers (rdi, rsi, rdx, rcx, r8, r9), not on the stack, and Windows x64 uses rcx, rdx, r8, r9. ROP chains and calling conventions differ entirely.
  • ASLR defeats most binary exploits on modern systems without a leak. Your first goal in most heap/stack exploits is getting a memory address, not code execution directly.
  • CTF exploits vs real-world exploits: CTF binaries often have some protections disabled to make challenges tractable, while real programs usually ship with all of them enabled. The skills transfer, but the complexity scales up significantly.
  • `strings ./binary | grep -i pass` — always run strings on an unknown binary first. Hardcoded credentials are found this way more often than you'd expect.
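A rough Python equivalent of strings ./binary | grep -i pass, handy where binutils is not installed. It assumes the default strings behaviour of reporting runs of at least 4 printable ASCII characters; GNU strings has more options (encodings, sections) that this sketch ignores, and the function names are hypothetical.

```python
import re
import sys
from typing import Iterator, List

# Runs of 4+ printable ASCII characters (space..tilde, plus tab), approximating strings' default
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e\t]{4,}")


def extract_strings(data: bytes) -> Iterator[str]:
    """Yield each printable run found in raw binary data."""
    for match in PRINTABLE_RUN.finditer(data):
        yield match.group().decode("ascii")


def grep_strings(path: str, needle: str = "pass") -> List[str]:
    """Case-insensitive search of a file's printable strings, like `strings FILE | grep -i NEEDLE`."""
    with open(path, "rb") as fh:
        data = fh.read()
    return [s for s in extract_strings(data) if needle.lower() in s.lower()]


if __name__ == "__main__" and len(sys.argv) > 1:
    for hit in grep_strings(sys.argv[1]):
        print(hit)
```

Usage: python3 strings_grep.py ./binary (the script name is only an example).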

  • Exploitation — apply this foundation to real CVEs, Metasploit, and manual exploit development
  • Post-Exploitation — once inside, memory skills help dump credentials from LSASS
  • Evasion & AV Bypass — shellcode obfuscation relies on understanding how memory and execution work