Memory paging fundamentals for kernel exploitation (Win10 x64)
How a 48-bit virtual address is walked through PML4 → PDPT → PD → PT to a physical frame on Windows 10 x64, where the PTE for any address lives, and which PTE control bits (U/S, R/W, NX, P) a kernel write primitive flips to defeat SMEP/NX.
Mechanism
Why it works
x86-64 in long mode uses 4-level paging. Translation starts at the physical address in CR3 (the PML4 base) and walks four 512-entry tables, consuming 9 address bits at each level, then a 12-bit page offset:
| Bits | Field | Table walked |
|---|---|---|
| 63–48 | sign extension | (canonical) |
| 47–39 | PML4 index | PML4 |
| 38–30 | PDPT index | PDPT (PDPE) |
| 29–21 | PD index | PD (PDE) |
| 20–12 | PT index | PT (PTE) |
| 11–0 | page offset | 4 KB frame |
Each table is one 4 KB page of 512 × 8-byte entries (9 index bits = 512
entries). Bits 63–48 must copy bit 47 (the canonical form), so user addresses
start with 0x0000 and kernel addresses with 0xFFFF. A Page Frame Number (PFN)
stored in an entry is a physical page number: the physical base of that page
is PFN * 0x1000. The final PTE's PFN, times 0x1000, plus the 12-bit offset, is
the physical address.
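As a rough illustration of the table above, the following C sketch splits a virtual address into its four indices and page offset and rebuilds a physical address from a PFN. The names `va_parts`, `split_va`, `is_canonical`, and `phys_addr` are invented for this example and are not part of any Windows or debugger API.

```c
#include <stdint.h>

/* Indices extracted from a 48-bit x86-64 virtual address (4-level paging). */
typedef struct {
    unsigned pml4;   /* bits 47-39 */
    unsigned pdpt;   /* bits 38-30 */
    unsigned pd;     /* bits 29-21 */
    unsigned pt;     /* bits 20-12 */
    unsigned offset; /* bits 11-0  */
} va_parts;

/* Split a virtual address into its four table indices and page offset. */
static va_parts split_va(uint64_t va) {
    va_parts p;
    p.pml4   = (unsigned)((va >> 39) & 0x1FF);
    p.pdpt   = (unsigned)((va >> 30) & 0x1FF);
    p.pd     = (unsigned)((va >> 21) & 0x1FF);
    p.pt     = (unsigned)((va >> 12) & 0x1FF);
    p.offset = (unsigned)(va & 0xFFF);
    return p;
}

/* An address is canonical when bits 63-48 all equal bit 47. */
static int is_canonical(uint64_t va) {
    uint64_t top = va >> 47;   /* bit 47 plus the 16 bits above it */
    return top == 0 || top == 0x1FFFF;
}

/* Physical address = PFN * 0x1000 + 12-bit page offset of the VA. */
static uint64_t phys_addr(uint64_t pfn, uint64_t va) {
    return (pfn << 12) | (va & 0xFFF);
}
```

For a user address such as 0x1f0000, `split_va` yields PML4, PDPT, and PD indices of 0 and a PT index of 0x1F0; `phys_addr(0x12345, 0x1f0abc)` evaluates to 0x12345abc.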
The exploitation-relevant trick: because the page tables are themselves
pages in physical memory, the OS maps them back into virtual memory through a
self-referencing PML4 entry. On Windows 10 x64 the PTE array is mapped at a
known base: historically the fixed address 0xFFFFF68000000000 and, since
page-table randomization (a mitigation introduced in Windows 10 version 1607),
a per-boot randomized base such as 0xFFFFFE0000000000. Given that base, the
PTE that controls a virtual address VA is found by indexing into that array by
VA's 36-bit page number:
    PTE address = PTE_BASE + ((VA >> 12) & 0xFFFFFFFFF) * 8
Each PTE carries control bits that the MMU enforces on every access:
- P (Present, bit 0) — entry is valid; cleared ⇒ page fault.
- R/W (bit 1) — writable when set, read-only when clear.
- U/S (bit 2) — User when set, Supervisor (kernel) when clear.
- A (Accessed)/D (Dirty) — set by the CPU on reference/write.
- NX (bit 63) — No-eXecute when set (DEP / NonPagedPoolNx).
Three of these bits are what kernel exploits target: U/S, R/W, and NX. SMEP forbids ring 0 from executing a page whose PTE has U/S = User; SMAP forbids ring-0 data access to such pages. So an attacker with a kernel write primitive who can locate the PTE of their user-mode shellcode page can clear the U/S bit (User → Supervisor) and clear NX. Because the CPU treats a page as user-mode only if U/S is set at every level of the walk, clearing it in the leaf PTE alone is enough. The page keeps its user-mode address, but the MMU now treats it as a supervisor, executable page, so SMEP and NX no longer fault on it. This is the core of "PTE overwrite" SMEP/DEP bypasses.
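The bit layout above can be expressed as a small C model. This is a simplified sketch, not how the MMU is implemented: it looks only at the leaf PTE (real hardware combines the bits from all four levels) and assumes SMEP is enabled. The macro and function names are made up for the example.

```c
#include <stdint.h>

#define PTE_P   (1ULL << 0)    /* Present */
#define PTE_RW  (1ULL << 1)    /* Writable when set */
#define PTE_US  (1ULL << 2)    /* User (1) / Supervisor (0) */
#define PTE_A   (1ULL << 5)    /* Accessed */
#define PTE_D   (1ULL << 6)    /* Dirty */
#define PTE_NX  (1ULL << 63)   /* No-eXecute when set */

/* PFN field: bits 47-12 of the entry. */
static uint64_t pte_pfn(uint64_t pte) {
    return (pte >> 12) & 0xFFFFFFFFFULL;
}

/* Simplified model (leaf PTE only, SMEP assumed on):
   may ring 0 fetch instructions from the page this PTE maps? */
static int ring0_can_execute(uint64_t pte) {
    if (!(pte & PTE_P))  return 0;   /* not present: page fault */
    if (pte & PTE_NX)    return 0;   /* NX forbids any instruction fetch */
    if (pte & PTE_US)    return 0;   /* SMEP forbids ring-0 fetch from user pages */
    return 1;
}
```

Under this model a present, user, executable entry such as 0x0090000012345867 is rejected by the SMEP check, while the same value with `PTE_US` cleared is accepted.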
Walkthrough
Authorized testing only
Use a Windows 10 x64 VM with a kernel debugger you own. Page-table base and PFNs are randomized per boot; numbers below are illustrative of one session.
1. Attach WinDbg and find a target virtual address. Pick a user-mode page you
control (e.g., a VirtualAlloc'd shellcode buffer at VA).
2. Walk the tables with !pte. !pte <VA> prints all four entries and the
final frame, with control bits:
    kd> !pte 0x1f0000
                                               VA 00000000001f0000
    PXE at FFFFFE7F3F9FCF00    PPE at FFFFFE7F3F9E0000    PDE at FFFFFE7F3C000000    PTE at FFFFFE0000000F80
    contains ...               contains ...               contains ...               contains 0090000012345867
                                                                                     pfn 12345     ---DA--UWEV
The trailing flag string (---DA--UWEV) decodes the bits. Here D = Dirty, A =
Accessed, U = User, W = Writable, E = Executable (NX clear; a dash in that
position means NX is set), and V = Valid (Present). The "PTE at ..." address
is the kernel virtual address of the 8-byte PTE that governs VA.
3. Convert virtual → physical manually. Confirm the walk with !vtop, passing
the directory base from CR3, and dump the resulting physical page with !dd:
    kd> !vtop @cr3 0x1f0000 ; $$ resolve VA using current CR3
    kd> !dd <physical_base> ; $$ dump the physical page contents
4. Compute the PTE address yourself. This is what an exploit does without a debugger: it derives the controlling PTE from the randomized base.
    // PTE_BASE is leaked/known for this boot (randomized on Win10).
    unsigned long long pte_of(unsigned long long va, unsigned long long pte_base) {
        // Mask to the 36-bit page number so sign-extended kernel VAs index correctly.
        unsigned long long page_number = (va >> 12) & 0xFFFFFFFFFULL;
        return pte_base + page_number * 8;  // 8 bytes per PTE
    }
5. Flip the bits with a write primitive. Read the current PTE, clear U/S (make it Supervisor) and clear NX (make it executable), write it back:
    // kread64/kwrite64 stand for whatever kernel read/write primitive is available.
    unsigned long long pte_addr = pte_of(va, pte_base);
    unsigned long long pte = kread64(pte_addr);
    pte &= ~(1ULL << 2);   // U/S: User(1) -> Supervisor(0), so SMEP/SMAP see a kernel page
    pte &= ~(1ULL << 63);  // NX -> 0, so the page becomes executable (defeats DEP/NX)
    pte |= (1ULL << 1);    // R/W set, if a writable shellcode page is desired
    kwrite64(pte_addr, pte);
    // The TLB may still cache the old translation for va; it is replaced only after
    // an INVLPG or a CR3 reload (for example, on a context switch).
The user-address shellcode at va is now a kernel-executable page; redirecting
kernel control flow to va no longer faults under SMEP/DEP.
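The arithmetic in steps 4 and 5 can be checked on plain integers, with no kernel access, using the sample values from this session. The sketch below is only a dry run: `SAMPLE_PTE_BASE` is the illustrative base from the !pte output (the real one differs on every boot), and `pte_addr_of`, `va_of_pte`, and `patched_pte` are hypothetical helper names.

```c
#include <stdint.h>

/* Assumed base for illustration only; the real value is randomized per boot. */
#define SAMPLE_PTE_BASE 0xFFFFFE0000000000ULL

/* Address of the PTE that maps va, given the base of the PTE array. */
static uint64_t pte_addr_of(uint64_t va, uint64_t pte_base) {
    return pte_base + ((va >> 12) & 0xFFFFFFFFFULL) * 8;
}

/* Inverse mapping: which virtual page does the PTE at pte_addr describe? */
static uint64_t va_of_pte(uint64_t pte_addr, uint64_t pte_base) {
    uint64_t va = ((pte_addr - pte_base) >> 3) << 12;
    if (va & (1ULL << 47))                /* restore canonical sign extension */
        va |= 0xFFFF000000000000ULL;
    return va;
}

/* The bit changes from step 5, applied to a plain integer value. */
static uint64_t patched_pte(uint64_t pte) {
    pte &= ~(1ULL << 2);    /* U/S -> Supervisor */
    pte &= ~(1ULL << 63);   /* NX  -> executable */
    pte |=  (1ULL << 1);    /* R/W -> writable   */
    return pte;
}
```

With these values, `pte_addr_of(0x1f0000, SAMPLE_PTE_BASE)` gives 0xFFFFFE0000000F80, matching the "PTE at" column of the !pte output, and `patched_pte(0x0090000012345867)` gives 0x0090000012345863. The inverse helper is useful when reading a debugger dump of the PTE array and mapping an entry back to the page it controls.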
Detection
- PatchGuard does not monitor arbitrary user PTEs, so a single U/S flip is largely invisible at runtime; detection is easier at the primitive level (an exploitable kernel write).
- On hardened systems, HVCI and Kernel CFG raise the bar so that a single PTE flip is no longer sufficient on its own.
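One possible way to look for the effect of such a flip, for example during a memory-forensics pass over captured page tables, is to flag lower-half (user) addresses whose present PTE is marked Supervisor. The sketch below assumes a hypothetical array of (VA, PTE) pairs already extracted by some other tool; `pte_sample`, `is_suspicious`, and `count_suspicious` are illustrative names, and the heuristic may produce false positives.

```c
#include <stddef.h>
#include <stdint.h>

/* One (virtual address, leaf PTE value) pair from a hypothetical snapshot. */
typedef struct {
    uint64_t va;
    uint64_t pte;
} pte_sample;

/* Top of the lower canonical half, where user-mode addresses live. */
#define LOWER_HALF_MAX 0x00007FFFFFFFFFFFULL

/* Heuristic: a present page at a user address should have U/S = User. */
static int is_suspicious(const pte_sample *s) {
    int present    = (s->pte & (1ULL << 0)) != 0;
    int supervisor = (s->pte & (1ULL << 2)) == 0;
    return s->va <= LOWER_HALF_MAX && present && supervisor;
}

/* Count the samples that match the heuristic. */
static size_t count_suspicious(const pte_sample *samples, size_t n) {
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)is_suspicious(&samples[i]);
    return hits;
}
```

A normal user page, such as VA 0x1f0000 with PTE 0x0090000012345867, is not flagged, whereas the same VA with U/S cleared (0x0090000012345863) is.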
Mitigation
- SMEP / SMAP (hardware): make user-tagged pages (U/S = 1) unusable from ring 0 unless the attacker flips the bit, which is exactly what the PTE overwrite does; defense-in-depth must therefore deny the attacker a kernel write primitive.
- Page-table randomization (Win10): randomizes PTE_BASE, so the attacker must first leak it before computing PTE addresses.
- HVCI (Hypervisor-protected Code Integrity): uses second-level address translation (EPT on Intel) so the guest cannot make an arbitrary page kernel-executable even after flipping a guest PTE.
- Keep NonPagedPoolNx / DEP enforced so kernel data pages are non-executable by default.