Rolling your own breakpoints without 0xCC
Some time ago, I had the need to interrupt the execution of a program at very specific instruction addresses in code. Basically, I needed to implement breakpoints from scratch. You will probably never want to do this, and that’s completely normal. However, it is an interesting thing to know about, and if you are some explorer from the internet who wants to implement breakpoint-like functionality without 0xCC, then you’re in the right place!
First, a little about how breakpoints are implemented in your favorite C++ debugger on x86 platforms…
How breakpoints work
From a debugger perspective, breakpoints will stop your program whenever it hits a specific line of code. If you’ve done much at the assembly level, you probably know they specifically stop when you hit a specific processor instruction. They can feel like magic, but their implementation is actually surprisingly simple.
Say we have the following assembly code for a function that adds 2+5 and returns the result: (The machine code bytes are shown on the left.)
B8 05 00 00 00 mov eax,5
83 C0 02 add eax,2
C3 ret
Now say I want to place a breakpoint on the add instruction. The debugger implements this by rewriting the code in memory. Specifically, it overwrites the first byte of the add instruction with int 3:
B8 05 00 00 00 mov eax,5
CC int 3
C0 02 ; Effectively garbage
C3 ret
When the int 3 instruction is hit, the breakpoint interrupt is raised, which then notifies the debugger attached to the application. The debugger will let you know the breakpoint was hit, let you do your thing, and before you resume the program it will restore the original instruction.
Unfortunately, the details of rewriting the instruction are outside the scope of this post. In short, though, the general process is:
- Enable writing on the code segment
- Save the old instruction, write int 3
- Flush the instruction cache
- Disable writing again.
There’s a bit more to it, like getting the application into a suspended state, but that should get you started.
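The save-and-patch part of those steps can be sketched on a plain byte array holding the example function's machine code. This is only an illustration under stated assumptions: the Breakpoint struct and the helper names set_breakpoint and clear_breakpoint are hypothetical, and because the buffer is ordinary data, the sketch skips the steps that real code memory needs on Windows (changing page protection with VirtualProtect and calling FlushInstructionCache).

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

// Hypothetical bookkeeping for one breakpoint: where it is and what it replaced.
typedef struct {
    size_t  offset;        // Offset of the patched byte within the code buffer
    uint8_t originalByte;  // Byte that was there before we wrote int 3
    int     active;        // Non-zero while the int 3 is in place
} Breakpoint;

// Save the byte at `offset` and overwrite it with int 3.
static void set_breakpoint(uint8_t* code, size_t offset, Breakpoint* bp)
{
    bp->offset = offset;
    bp->originalByte = code[offset];
    bp->active = 1;
    code[offset] = INT3_OPCODE;
}

// Put the saved byte back so the original instruction can execute again.
static void clear_breakpoint(uint8_t* code, Breakpoint* bp)
{
    if (bp->active)
    {
        code[bp->offset] = bp->originalByte;
        bp->active = 0;
    }
}
```

Calling set_breakpoint(code, 5, &bp) on the bytes B8 05 00 00 00 83 C0 02 C3 turns the 83 at offset 5 into CC and leaves the rest alone; clear_breakpoint(code, &bp) puts the 83 back.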
Why int 3 is special
What is so special about int 3? As you may be able to guess, the int assembly instruction simply causes the processor to call the given interrupt.
int 3 is extra special though. Normally, an int instruction is two bytes, like 0xCD 0x03. However, the x86 instruction set has a special opcode specifically for int 3 which is only one byte: 0xCC. Intel’s x86 developer’s manual describes it as “Interrupt 3—trap to debugger.”
This makes it ideal for replacing other instructions, because some instructions (like ret) are only one byte long. If we replaced ret (0xC3) with a two-byte instruction like 0xCD 0x03, we would be overwriting unrelated code that follows the ret (probably the prologue of another function).
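To make the size argument concrete, here is a small sketch that patches an instruction in a byte array and reports how many bytes of the patch spill past it. The function name patch_instruction is made up for this illustration, and the instruction length is passed in by hand rather than decoded from the bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Overwrite the instruction at `offset` (which is `instructionLength` bytes long)
// with `patch`, and return how many bytes beyond that instruction were clobbered.
// Zero means the patch fit entirely inside the instruction.
static size_t patch_instruction(uint8_t* code, size_t offset, size_t instructionLength,
                                const uint8_t* patch, size_t patchLength)
{
    memcpy(code + offset, patch, patchLength);
    return patchLength > instructionLength ? patchLength - instructionLength : 0;
}
```

Take a buffer containing C3 55 8B EC (a ret followed by push ebp / mov ebp,esp). Patching offset 0 with the single byte CC returns 0 and leaves the 55 intact. Patching it with CD 03 returns 1, and the 55 that started the next function's prologue has become 03.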
Why I couldn’t use int 3
Unfortunately, for my particular use case, I couldn’t use int 3! For this specific project, we needed to be able to use the Visual Studio debugger on the program as it ran. In Windows environments, int 3 always goes straight to the debugger if one is attached. You can’t even intercept it with a vectored exception handler!1
Therefore, I needed a similar solution that still met the requirements that int 3 does:
- Must be a one-byte instruction2
- Must cause execution to jump to a specific point
- Must let us resume execution as normal afterwards
Quite the tall order!
The quest for another method
Long story short, there is actually an instruction that meets these criteria: the hlt instruction! This instruction is normally used to halt the processor and stop execution. However, it is a privileged instruction. When unprivileged code executes it, the processor raises the #GP(0) exception (a general protection fault). On Windows, this will cause your vectored exception handler to fire with an EXCEPTION_PRIV_INSTRUCTION exception. (On Linux, a general protection fault from user mode is delivered as SIGSEGV.)
So if you implement breakpoints exactly like a debugger would, but with the hlt instruction instead of int 3, you can co-exist with other debuggers manipulating your code.
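As a rough sketch of what "exactly like a debugger, but with hlt" might look like, the code below keeps a small table of patched locations and shows the decision an exception handler would make. Everything here is an assumption made for illustration: the names (HltBreakpoint, set_hlt_breakpoint, handle_priv_instruction), the fixed-size table, and the use of a plain byte array in place of real code memory. In a real Windows handler, the fault location would come from the instruction pointer in the exception's context record.

```c
#include <stddef.h>
#include <stdint.h>

#define HLT_OPCODE 0xF4
#define MAX_BREAKPOINTS 16

// Hypothetical record of one patched location.
typedef struct {
    size_t  offset;        // Where the hlt was written
    uint8_t originalByte;  // Byte it replaced
} HltBreakpoint;

static HltBreakpoint breakpoints[MAX_BREAKPOINTS];
static size_t breakpointCount = 0;

// Save the original byte and write hlt over it. Returns 0 if the table is full.
static int set_hlt_breakpoint(uint8_t* code, size_t offset)
{
    if (breakpointCount >= MAX_BREAKPOINTS)
    { return 0; }
    breakpoints[breakpointCount].offset = offset;
    breakpoints[breakpointCount].originalByte = code[offset];
    breakpointCount++;
    code[offset] = HLT_OPCODE;
    return 1;
}

// Called with the offset of the faulting instruction. Returns 1 if the fault was
// one of our breakpoints (the original byte is restored so execution can resume
// at the same offset), or 0 if the privileged instruction was not ours.
static int handle_priv_instruction(uint8_t* code, size_t faultOffset)
{
    for (size_t i = 0; i < breakpointCount; i++)
    {
        if (breakpoints[i].offset == faultOffset && code[faultOffset] == HLT_OPCODE)
        {
            code[faultOffset] = breakpoints[i].originalByte;
            breakpoints[i] = breakpoints[breakpointCount - 1]; // Remove the entry
            breakpointCount--;
            return 1;
        }
    }
    return 0;
}
```

In a vectored exception handler, a return value of 1 here would map to EXCEPTION_CONTINUE_EXECUTION and 0 to EXCEPTION_CONTINUE_SEARCH, so that a privileged-instruction fault that isn't one of your breakpoints still reaches whoever should handle it.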
Is this a hack? Yes. Does it work? Yes. If, like me, you find yourself wanting to have breakpoint-like functionality without using the 0xCC instruction, now you know how! Uhh…but don’t try it in a kernel-mode driver. There, hlt is a legal instruction and really does halt the processor until the next interrupt, so that’d probably not go well.
There is the theoretical problem of what happens when you and the debugger start fighting over a specific instruction, but in our case we weren’t too concerned about this situation.
Sample code
Here’s a simple example of this concept in use. In this example, I left out all of the debugger-like stuff in favor of having the hlt instructions placed at compile-time.
This sample simulates round-robin cooperative thread-switching in a Windows application.
// A simple example of using the hlt instruction to trigger a context switch
// Please note that you should never actually use this as a method of thread cooperation
// (This uses MSVC inline assembly and the Eip context field, so it only builds as 32-bit x86 code.)
#include <stdio.h>
#include <Windows.h>
#include <assert.h>

#define NUM_THREADS 2
#define HLT_INSTRUCTION 0xF4

static HANDLE threads[NUM_THREADS];
static int currentThread = 0;

LONG CALLBACK VectoredHandler(PEXCEPTION_POINTERS exceptionInfo)
{
    // If the exception isn't EXCEPTION_PRIV_INSTRUCTION, or the instruction that caused the exception isn't HLT, we don't do anything:
    if (exceptionInfo->ExceptionRecord->ExceptionCode != EXCEPTION_PRIV_INSTRUCTION
        || *((unsigned char*)(void*)exceptionInfo->ContextRecord->Eip) != HLT_INSTRUCTION)
    {
        return EXCEPTION_CONTINUE_SEARCH;
    }

    // Advance past the hlt instruction: (Normally, you'd just restore the old instruction instead.)
    exceptionInfo->ContextRecord->Eip++;

    // This was a HLT, so we do a "task switch": resume the next thread, then suspend this one.
    int previousThread = currentThread;
    currentThread++;
    if (currentThread >= NUM_THREADS)
    { currentThread = 0; }
    ResumeThread(threads[currentThread]);
    SuspendThread(threads[previousThread]);
    return EXCEPTION_CONTINUE_EXECUTION;
}
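
// Thread entry point. The signature matches what CreateThread expects (LPTHREAD_START_ROUTINE);
// the thread number arrives packed into the pointer-sized parameter.
DWORD WINAPI ThreadMain(LPVOID parameter)
{
    int threadNumber = (int)(INT_PTR)parameter;
    while (1)
    {
        printf("Thread %d is running...\n", threadNumber);
        printf("Thread %d is switching away...\n", threadNumber);
        __asm hlt;
        printf("Thread %d is waking up!\n", threadNumber);
    }
}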

int main()
{
    // Register the exception handler:
    void* exceptionHandler = AddVectoredExceptionHandler(1, VectoredHandler);
    assert(exceptionHandler != NULL);

    // Create the threads:
    for (int i = 0; i < NUM_THREADS; i++)
    {
        threads[i] = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)ThreadMain, (LPVOID)(INT_PTR)(i + 1), CREATE_SUSPENDED, NULL);
        assert(threads[i] != NULL);
    }

    // Start the first thread and wait for all threads to exit.
    // (The calls are made outside of assert() so they still run when NDEBUG is defined.)
    DWORD resumeResult = ResumeThread(threads[0]);
    assert(resumeResult != (DWORD)-1); // ResumeThread returns (DWORD)-1 on failure
    DWORD waitResult = WaitForMultipleObjects(NUM_THREADS, threads, TRUE, INFINITE);
    assert(waitResult != WAIT_FAILED);

    // Cleanup:
    for (int i = 0; i < NUM_THREADS; i++)
    {
        BOOL closed = CloseHandle(threads[i]);
        assert(closed);
    }
    ULONG removed = RemoveVectoredExceptionHandler(exceptionHandler);
    assert(removed != 0);
    return 0;
}
If you run this program, you’ll see that the two threads never run at the same time. The output will show one thread running after another.
(NOTE: As stated in the documentation for SuspendThread, you shouldn’t ever actually do this. Although SuspendThread is probably safe in this situation, you’re much better off using synchronization objects or some other thread synchronization method.)