2nd FASM Technical Discussion, Brno, 2007 August 25th

Debugging in Long Mode - AMD64
by Feryno

References:
  24593.pdf from www.amd.com (AMD64 Architecture Programmer's Manual, Volume 2: System Programming)
  www.kernel.org (Linux kernel)
  man ptrace (Linux help)
  msdn.microsoft.com
  my own mistakes, and a lot of years spent debugging because of them...

While debugging, we play with an executable program. We can stop it, change its memory/registers while it is stopped, step it, and resume its execution. The CPU executes code very quickly. During debugging, we can execute code at a speed that human senses (sight) can follow. To play this game, we need another program: a debugger.

Why do programmers need debugging?
1. to find bugs (critical errors causing a program crash)
2. to find mistakes in procedures that give wrong, unexpected results
3. to improve procedures by exploring them: stepping through instructions and watching registers/memory change
4. to examine an unknown executable discovered in the system
(5. to learn what instructions do - especially suitable for beginners. I do it very often too, instead of reading manuals.)

Debugging is possible thanks to CPU features called "exceptions". The first 32 interrupt vectors (int00h-int1Fh) are reserved for exceptions. Exceptions behave very similarly to interrupts: every exception interrupts the program's execution, and control is transferred from the currently executing program to the routine that handles the exception. These routines are part of the OS kernel and are called "exception handlers".

During the control transfer to the exception handler, the CPU stops execution of the program and saves its return instruction pointer (RIP), stack pointer (RSP) and flags register (RFLAGS), together with the CS and SS selectors, on the handler's stack. The handler is responsible for saving the remaining state of the interrupted program (GPRs, XMM registers, ...). Saving the registers allows the interrupted program to be restarted after the handler finishes handling the exception.
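As a sketch of what the CPU actually leaves on the handler's stack in long mode, the C struct below lists the saved values from the lowest address (where RSP points on handler entry) upward. The names struct int_frame, struct int_frame_err and pushes_error_code are hypothetical, and the error-code list covers only the classic exceptions; newer CPUs define a few more vectors that push one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: stack image built by the CPU when it delivers an
   interrupt or exception in long mode. Fields are listed from the lowest
   address (where the handler's RSP points) upward; every slot is 8 bytes. */
struct int_frame {
    uint64_t rip;     /* where execution resumes after iretq */
    uint64_t cs;      /* code segment selector, zero-extended */
    uint64_t rflags;  /* flags of the interrupted program */
    uint64_t rsp;     /* stack pointer of the interrupted program */
    uint64_t ss;      /* stack segment selector, zero-extended */
};

/* Some exceptions push an error code below RIP; the handler has to drop
   it (add rsp,8) before iretq. */
struct int_frame_err {
    uint64_t error_code;
    struct int_frame frame;
};

static_assert(sizeof(struct int_frame) == 5 * 8, "five qwords");

/* 1 if the exception vector pushes an error code. Classic set only:
   #DF(8), #TS(10), #NP(11), #SS(12), #GP(13), #PF(14), #AC(17). */
int pushes_error_code(unsigned vector)
{
    switch (vector) {
    case 8: case 10: case 11: case 12: case 13: case 14: case 17:
        return 1;
    default:
        return 0;
    }
}
```

For example, a page-fault (int0E) handler sees the error code at [rsp] and the saved RIP at [rsp+8].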
Most exceptions signal a "degenerate" instruction/code in the program. In this case, the exception is reported at the instruction boundary before the instruction that caused it, and the interrupted instruction isn't allowed to complete. These exceptions are called "faults".

To make life more complicated, the reported instruction pointer sometimes points to the following instruction: the exception is reported at the boundary after the instruction that caused it, and that instruction is allowed to complete. These exceptions are called "traps". The benefit of traps for us is that they are the core of debugging.

exception divide by zero triggers int00 vector
samples:
    mov rcx,0
    div rcx                 ; divisor is 0

    mov rdx,3
    mov rax,0
    mov rcx,2
    div rcx                 ; RDX:RAX / RCX = 3 shl 63, the quotient doesn't fit into the destination register RAX

exception debug (single step, hardware breakpoint, icebp) triggers int01 vector
samples:
    icebp                   ; db 0F1h

    pushfq
    or qword [rsp],1 shl 8  ; set Trap Flag
    popfq                   ; the core method of single stepping (in fact, the OS sets this bit in the program's context and reloads the registers when task switching)

    lea rax,[trap_instruction]
    mov dr0,rax
    mov eax,1
    mov dr7,rax             ; the core method of hardware breakpoints
trap_instruction:

    lea rax,[mem_write_addr]
    mov dr0,rax
    mov eax,10001h
    mov dr7,rax
    mov byte [mem_write_addr],1  ; int01 occurs after this write completes
    ...
mem_write_addr db ?

exception breakpoint triggers int03 vector
samples:
    int3                    ; db 0CCh
    int 03h                 ; db 0CDh, 03h

exception invalid opcode triggers int06 vector
samples:
    ud2
    db 8Dh, 0C2h            ; "lea eax,edx": the source operand is a register; a correct form is e.g. lea eax,[rdx]

(a lot of instructions are illegal in long mode...)
exception double fault triggers int08 vector
exception stack fault triggers int0C vector
exception general protection triggers int0D vector
exception page fault triggers int0E vector
exception alignment check triggers int11h vector (17 decimal)
samples:
    ; kernel: enable alignment checking
    mov rax,cr0
    or rax,1 shl 18
    mov cr0,rax             ; AM bit of CR0

    ; user mode program, stack aligned at qword or dqword
    pushfq
    or qword [rsp],1 shl 18 ; set AC bit of rflags
    popfq
    mov eax,[rsp+1]         ; misaligned dword: occurs only when CPL=3, never when CPL < 3

Interactions:

        OS
       /  \
 program   debugger

1. The program causes an exception.
2. The CPU stops the execution of the program, saves its instruction pointer, stack pointer and flags, and gives control to the corresponding exception handler (= interrupt vector).
3. The OS handles the interrupt vector and notifies the debugger about the exception.
   Linux64:
       mov eax,sys_wait4
       syscall
   Win64:
       call qword [KERNEL32.WaitForDebugEvent]
4. The user can change registers/memory of the program via the debugger.
   Linux64:
       mov edi,PTRACE_GETREGS
       mov eax,sys_ptrace
       syscall
     (requests: PTRACE_GETREGS, PTRACE_SETREGS, PTRACE_PEEKTEXT, PTRACE_POKETEXT, PTRACE_PEEKDATA, PTRACE_POKEDATA)
   Win64:
       call qword [KERNEL32.GetThreadContext]
     (functions: GetThreadContext, SetThreadContext, ReadProcessMemory, WriteProcessMemory)
5. The user can resume execution of the program via the debugger.
   Linux64:
       mov edi,PTRACE_CONT
       mov eax,sys_ptrace
       syscall
     or
       mov edi,PTRACE_SINGLESTEP
       mov eax,sys_ptrace
       syscall
   Win64:
       call qword [KERNEL32.ContinueDebugEvent]
     (for a single step, set TF in CONTEXT.EFlags with SetThreadContext first)

If the program doesn't cause any exception, it runs to its end and terminates. In this case, the debugger doesn't encounter any exception; it is only notified about the program's termination at the end. This is the dream of every assembly coder and the desired final stage of developing any program... well, not exactly: some procedures may still behave incorrectly and return unexpected values...

"Hardware" Breakpoint triggers int01 vector

A HW BP is set up by writing some debug registers.
We need to focus only on debug registers 0, 1, 2, 3, 6 and 7. DR4, DR5 and DR8-DR15 aren't used. Isn't it a pity? But on the other hand, it could be even more complicated...

The debug registers can be read and written only when the current privilege level (CPL) is 0 (most privileged), i.e. in the kernel.
    ; CPL=0
    mov rax,dr7
    mov dr3,rcx

A user mode debugger running at CPL=3 can access the debug registers of a program while the program is stopped after causing an exception.
   Linux64 (the debug registers are not part of the PTRACE_GETREGS register set; they are read/written at offsetof(struct user, u_debugreg) in the USER area):
       mov edi,PTRACE_PEEKUSER
       mov eax,sys_ptrace
       syscall
       mov edi,PTRACE_POKEUSER
       mov eax,sys_ptrace
       syscall
   Win64 (CONTEXT.ContextFlags must include CONTEXT_DEBUG_REGISTERS):
       call qword [KERNEL32.GetThreadContext]
       call qword [KERNEL32.SetThreadContext]

DR0 DR1 DR2 DR3
These 64-bit registers hold virtual (linear) addresses.
    lea rax,[address]
    mov dr0,rax

When we set a debug register DR0-DR3, we must also set its conditions in the DR7 register: enable bit, type, length.

DR7 bit(s)  mnemonic  description
31-30       LEN3      Length of Breakpoint #3
29-28       R/W3      Type of Transaction to Trap (#3)
27-26       LEN2      Length of Breakpoint #2
25-24       R/W2      Type of Transaction to Trap (#2)
23-22       LEN1      Length of Breakpoint #1
21-20       R/W1      Type of Transaction to Trap (#1)
19-18       LEN0      Length of Breakpoint #0
17-16       R/W0      Type of Transaction to Trap (#0)
7, 5, 3, 1  G3-G0     Global Breakpoint #3-#0 Enabled
6           L3        Local Breakpoint #3 Enabled
4           L2        Local Breakpoint #2 Enabled
2           L1        Local Breakpoint #1 Enabled
0           L0        Local Breakpoint #0 Enabled

LEN0-LEN3
  00b  1 byte
  01b  2 bytes, the address in the corresponding DR0-DR3 must be word aligned
  10b  8 bytes, the address must be qword aligned
  11b  4 bytes, the address must be dword aligned

R/W0-R/W3
  00b  int01 on instruction execution only; LEN must be 1 byte (LENx = 00b)
  01b  int01 on data write only
  10b  int01 on I/O read/write only (in, out, insb, outsb), if CR4.DE=1 (bit 3 of CR4); if CR4.DE=0 this setting is undefined
  11b  int01 on data read or data write only

To set debug register DRx, x = 0, 1, 2, 3 (length = LEN encoding, type = R/W encoding; this simple form overwrites the settings of the other breakpoints):
    lea rax,[address]
    mov drx,rax
    mov eax,((length*4 + type) shl (x*4 + 16)) + (1 shl (x*2))
    mov dr7,rax

; example 0
; watch reading from or writing into 1 qword at address 100005120h (address range 100005120h-100005127h)
    lea rax,[100005120h]
    mov dr0,rax
    mov rax,dr7
    and eax,not ((1111b shl 16) + 11b)  ; mask off LEN0, R/W0, G0, L0
    or eax,(1011b shl 16) + 1           ; LEN0=10b (8 bytes), R/W0=11b (read/write), L0=1
    mov dr7,rax                         ; set it finally
; Done, now we can wait until the code falls into the trap!
After any byte at 100005120h-100005127h is accessed, int01 occurs and the DR6.B0 bit is set to 1.

; example 1
; watch writing into 8 bytes at address range 40AF31h-40AF38h
; setting length=8 doesn't work (the address isn't aligned at a qword boundary),
; so we must set more breakpoints to cover the whole address range:
; breakpoint 0 watches 1 byte at 40AF31h
; breakpoint 1 watches 1 word at 40AF32h-40AF33h
; breakpoint 2 watches 1 dword at 40AF34h-40AF37h
; breakpoint 3 watches 1 byte at 40AF38h
    mov rax,dr7
    and eax,0000FF00h                   ; mask off all breakpoint bits
    lea rdx,[40AF31h]
    mov dr0,rdx
    or eax,(0001b shl 16) + 1           ; LEN0=00b (1 byte), R/W0=01b (write), L0=1
    lea rdx,[40AF32h]
    mov dr1,rdx
    or eax,(0101b shl 20) + 100b        ; LEN1=01b (2 bytes), R/W1=01b, L1=1
    lea rdx,[40AF34h]
    mov dr2,rdx
    or eax,(1101b shl 24) + 10000b      ; LEN2=11b (4 bytes), R/W2=01b, L2=1
    lea rdx,[40AF38h]
    mov dr3,rdx
    or eax,(0001b shl 28) + 1000000b    ; LEN3=00b (1 byte), R/W3=01b, L3=1
    mov dr7,rax

; example 2
; break on the execution of the instruction at 401235h
; note: the instruction must start exactly at this address;
; if the address lies somewhere inside the instruction (an instruction of 2 or more bytes), int01 won't occur!!!
    lea rax,[401235h]
    mov dr0,rax
    mov rax,dr7
    and eax,not ((1111b shl 16) + 11b)  ; mask off LEN0, R/W0, G0, L0
    or eax,(0000b shl 16) + 1           ; LEN0=00b, R/W0=00b (execution), L0=1
    mov dr7,rax

; example 3
; watch reading from or writing into ports 20h-27h (kernel debugging only - in, out, insb, outsb)
    mov rax,cr4
    or rax,1 shl 3                      ; CR4.DE, bit 3 (Debugging Extensions) on
    mov cr4,rax
    mov eax,20h
    mov dr3,rax
    mov rax,dr7
    and eax,not ((1111b shl 28) + 11000000b)  ; mask off LEN3, R/W3, G3, L3
    or eax,(1010b shl 28) + 01000000b         ; LEN3=10b (8 bytes), R/W3=10b (I/O), L3=1
    mov dr7,rax

The condition that caused an int01 exception is recorded in the DR6 debug-status register.

DR6 bit  name  event
14       BS    Single Step (rFLAGS.TF was set)
13       BD    Breakpoint Debug Access Detected (DR7.GD was set)
3        B3    Breakpoint #3 Condition Detected
2        B2    Breakpoint #2 Condition Detected
1        B1    Breakpoint #1 Condition Detected
0        B0    Breakpoint #0 Condition Detected

DR7 bit  mnemonic  description
13       GD        General Detect Enabled

When GD is set, the debug exception (int01) occurs when an attempt is made to execute a MOV DRn instruction to any debug register (DR0-DR3, DR6, DR7). The processor clears this bit to 0 when the int01 handler is entered, allowing the int01 handler to read and write the DR registers. The int01 exception occurs before the instruction is executed, and the processor sets DR6.BD. Software debuggers can use this bit to prevent the currently executing program from interfering with the debug operation.

int01_handler:
    push rax
    mov rax,dr6
    bt eax,14
    jc single_step_detected
    bt eax,13
    jc debug_access_detected
    test eax,1 shl 3
    jnz bp3_detected
    test eax,1 shl 2
    jnz bp2_detected
    test eax,1 shl 1
    jnz bp1_detected
    test eax,1
    jnz bp0_detected
icebp_detected:
    ...
    pop rax
    iretq
; note: the processor does not clear the DR6 status bits by itself, so the handler should clear them before returning

An instruction-execution breakpoint and the general-detect condition cause the int01 exception to occur BEFORE the instruction is executed. All other breakpoints (data write only, data read or data write, I/O read or I/O write) and single-stepping conditions cause the int01 exception to occur AFTER the instruction is executed. If int01 conditions occur during a repeated instruction (REP prefix, e.g. rep movsb), the breakpoint can be taken between iterations.
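The DR7 arithmetic of the examples above and the DR6 dispatch of int01_handler can be cross-checked in C. This is a minimal sketch with hypothetical helper names (dr7_bits, dr7_mask, split_range, dr6_cause). It only computes register values; actually loading them still requires mov drN at CPL 0, or the OS interfaces listed above. split_range assumes the wanted cover is made of naturally aligned 1/2/4/8-byte pieces, taken largest first.

```c
#include <assert.h>
#include <stdint.h>

/* R/W field encodings (DR7 bits 16-17 for breakpoint 0, +4 per breakpoint). */
enum dr_type { DR_EXEC = 0, DR_WRITE = 1, DR_IO = 2, DR_RW = 3 };

/* LEN field encoding; note the odd order: 10b = 8 bytes, 11b = 4 bytes. */
static unsigned len_code(unsigned bytes)
{
    switch (bytes) {
    case 1: return 0;
    case 2: return 1;
    case 8: return 2;
    case 4: return 3;
    default: assert(!"length must be 1, 2, 4 or 8"); return 0;
    }
}

/* ((LEN*4 + R/W) shl (x*4 + 16)) + (1 shl (x*2)): fields plus local enable Lx. */
uint64_t dr7_bits(unsigned x, enum dr_type type, unsigned bytes)
{
    assert(x < 4);
    uint64_t field = ((uint64_t)len_code(bytes) << 2) | (uint64_t)type;
    return (field << (x * 4 + 16)) | (1ull << (x * 2));
}

/* All DR7 bits owned by breakpoint x: LENx, R/Wx, Gx and Lx. */
uint64_t dr7_mask(unsigned x)
{
    assert(x < 4);
    return (0xFull << (x * 4 + 16)) | (3ull << (x * 2));
}

struct hw_range { uint64_t addr; unsigned len; };

/* Cover [addr, addr+size) with naturally aligned 1/2/4/8-byte pieces,
   largest first. Returns the piece count, or -1 if more than max pieces
   (e.g. the 4 debug registers) would be needed. */
int split_range(uint64_t addr, uint64_t size, struct hw_range *out, int max)
{
    int n = 0;
    while (size > 0) {
        unsigned len = 8;
        while (len > 1 && ((addr & (len - 1)) != 0 || len > size))
            len /= 2;
        if (n == max)
            return -1;
        out[n].addr = addr;
        out[n].len = len;
        n++;
        addr += len;
        size -= len;
    }
    return n;
}

/* int01 causes, checked in the same order as int01_handler. */
enum int01_cause { CAUSE_BP0, CAUSE_BP1, CAUSE_BP2, CAUSE_BP3,
                   CAUSE_SINGLE_STEP, CAUSE_DR_ACCESS, CAUSE_OTHER };

enum int01_cause dr6_cause(uint64_t dr6)
{
    if (dr6 & (1ull << 14))
        return CAUSE_SINGLE_STEP;                  /* BS */
    if (dr6 & (1ull << 13))
        return CAUSE_DR_ACCESS;                    /* BD */
    for (int x = 3; x >= 0; x--)                   /* B3 .. B0 */
        if (dr6 & (1ull << x))
            return (enum int01_cause)(CAUSE_BP0 + x);
    return CAUSE_OTHER;                            /* e.g. icebp */
}
```

For example 1, split_range(0x40AF31, 8, r, 4) produces the same four pieces as the hand-made breakpoints 0-3. OR-ing dr7_bits(i, DR_WRITE, r[i].len) over them gives 1D510055h, the value built by the four or instructions there.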
Data-breakpoint conditions on the previous instruction are reported before an instruction-breakpoint condition on the next instruction. However, if an instruction breakpoint and a data breakpoint both result from executing a single instruction, the instruction breakpoint occurs first (before the instruction is executed), followed by the data breakpoint (after the instruction is executed).

Single Step triggers int01 vector

Single-step breakpoints are enabled by setting the rFLAGS.TF bit to 1. When single stepping is enabled, an int01 exception occurs after every executed instruction until single stepping is disabled by clearing rFLAGS.TF to 0. No int01 occurs after the instruction that sets the TF bit; the first int01 occurs after the instruction that follows it.
    pushf
    or dword [rsp],1 shl 8
    popf                            ; rFLAGS.TF=1 now (no int01 after this popf)
    mov edx,eax                     ; int01 occurs for the 1st time, after mov completes (single step is a TRAP type of exception, not a FAULT)
    pushf                           ; int01 occurs for the 2nd time
    and dword [rsp],not (1 shl 8)   ; int01 occurs for the 3rd time
    popf                            ; int01 occurs for the 4th and last time (TF was still 1 when popf started); rFLAGS.TF=0 now
    mov ebx,ecx                     ; this doesn't trigger int01 anymore

int01_handler:
; When an int01 exception occurs due to single stepping, the processor clears rFLAGS.TF to 0 before entering the int01 handler, so that the handler itself is not single stepped.
; The processor also sets DR6.BS to 1, which indicates that the int01 exception occurred as a result of single stepping.
    push rax
    mov rax,dr6
    bt eax,14                       ; DR6.BS
    jnc not_single_step
single_step_detected:
; The rFLAGS image pushed onto the handler's stack still has the TF bit set, so single stepping resumes when IRETQ pops that image into the rFLAGS register.
    pop rax
    iretq

Single stepping can be a bit more complicated; we will discuss it later.
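A user-mode debugger never touches TF directly; on Linux, PTRACE_SINGLESTEP asks the kernel to set it in the stopped program's saved context. The function below is a minimal sketch assuming Linux with ptrace permitted (count_single_steps and its step limit are hypothetical). It starts a program under ptrace and single-steps it, which corresponds to steps 3 and 5 of the interaction list; on x86-64 Linux, glibc implements waitpid() with the wait4 system call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `path` under ptrace and single-steps it until it terminates or
   `limit` steps have been taken (then it is killed). Returns the number of
   single-step stops, or -1 if tracing could not be set up. */
long count_single_steps(const char *path, long limit)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: become traceable; the kernel stops it with SIGTRAP
           right after a successful exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execl(path, path, (char *)NULL);
        _exit(127);                         /* exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1)     /* step 3: wait for the event */
        return -1;
    if (!WIFSTOPPED(status))                /* child exited: no tracing */
        return -1;

    long steps = 0;
    int sig = 0;                            /* signal to pass on, 0 = none */
    while (steps < limit) {
        /* Step 5: resume for exactly one instruction (kernel sets TF). */
        if (ptrace(PTRACE_SINGLESTEP, pid, NULL, (void *)(long)sig) == -1) {
            steps = -1;
            break;
        }
        if (waitpid(pid, &status, 0) == -1)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return steps;                   /* program terminated by itself */
        if (WSTOPSIG(status) == SIGTRAP) {
            steps++;                        /* one single-step trap */
            sig = 0;
        } else {
            sig = WSTOPSIG(status);         /* deliver e.g. SIGSEGV next */
        }
    }
    kill(pid, SIGKILL);                     /* limit reached or ptrace failed */
    waitpid(pid, &status, 0);
    return steps;
}
```

Every SIGTRAP stop after PTRACE_SINGLESTEP stands for one int01 trap in the traced program. Other signals are passed on, so a crashing program still dies instead of faulting on the same instruction forever.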
"Software" Breakpoint triggers int03 vector

    db 0CCh                 ; = int3, very useful: this 1-byte instruction fits over the first byte of any other instruction
    db 0CDh, 03h            ; = int 03h, useless here: it can't fit into 1-byte instructions (cld; push/pop gpr64; xchg gpr32,eax; stosb; ...)

A software breakpoint can be:
  - compiled into the program at the development stage (a trick to get easily and quickly into the desired part of a program under development)
  - put into the program by the debugger

Steps when handling a SW BP:
1. The debugger reads the original byte and saves it, together with its address, into an internal buffer.
2. The debugger replaces the original byte with the byte 0CCh.
3. The debugger waits until int03 occurs.
4. The int03 handler gets the address just after the executed 0CCh byte.
5. The debugger calculates an internal value X by subtracting 1 from the address obtained in step 4 (X = RIP - 1).
6. The debugger checks whether any address stored in its internal buffer matches X.
7. If no such address is found, it is an int3 instruction compiled into the program (the program's source contains an int3 instruction; the developer must remove it eventually):
       jmp end_of_int03_handler
8. If such an address is found, it was a breakpoint caused by the 0CCh byte the debugger inserted into the program:
       restore the original byte at address X
       decrease RIP of the program by 1 (RIP - 1 = X)
   end_of_int03_handler:
       iretq

Other Features (not well known, that's a pity)

We can watch the addresses of instructions and events causing control transfers: JMP, CALL, RET, Jcc, JrCXZ, LOOPcc, JMPF, CALLF, RETF, INTn, INT 3, ICEBP, exceptions, IRET, IRETQ, SYSCALL, SYSRET, NMI, SMI, RSM.
We just need to enable 1 bit in 1 register...
The register is called the Debug-Control MSR:
    DebugCtlMSR = 01D9h
    mov ecx,DebugCtlMSR
    rdmsr
    or eax,1                ; set DebugCtlMSR.LBR
    wrmsr

DebugCtlMSR bit  mnemonic  description
1                BTF       Branch Single Step
0                LBR       Last-Branch Record

Setting the LBR bit orders the processor to record the source and target addresses of the last control transfer (branch instruction, interrupt or exception) taken before a debug exception (int01) occurs. The processor automatically disables control-transfer recording when int01 occurs by clearing DebugCtlMSR.LBR to 0. The contents of the control-transfer recording MSRs are not altered by the processor when int01 occurs. Before exiting the debug-exception handler, software can set DebugCtlMSR.LBR to 1 to re-enable the recording mechanism.

With the LBR bit of DebugCtlMSR enabled, the processor saves the source and destination addresses of the control-transfer events (branches such as call and jmp, interrupts, exceptions) taken before control is given to int01 in LastBranchFromIP, LastBranchToIP, LastExceptionFromIP and LastExceptionToIP. These 64-bit read-only MSRs record the control transfers:
    LastBranchFromIP    = 01DBh
    LastBranchToIP      = 01DCh
    LastExceptionFromIP = 01DDh
    LastExceptionToIP   = 01DEh

    mov ecx,LastBranchFromIP
    rdmsr
    mov dword [x+4],edx
    mov dword [x],eax       ; qword [x] holds the 64-bit address
    ...
x dq ?

DebugCtlMSR.BTF changes the behavior of the rFLAGS.TF bit. When BTF is cleared to 0 (the normal, most common setting), rFLAGS.TF controls single stepping on every instruction. When BTF is set to 1, rFLAGS.TF controls single stepping on control transfers (branch instructions, interrupts, exceptions): the single step doesn't occur on every instruction, but only on control transfers ("bigger single steps").
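Reading these MSRs requires CPL 0 (rdmsr/wrmsr are privileged). The C sketch below assumes Linux with root rights and the msr driver loaded, and all helper names are hypothetical. msr_join rebuilds the 64-bit value from RDMSR's EDX:EAX halves, as the mov dword [x+4],edx / mov dword [x],eax pair does. debugctl_update computes a read-modify-write DebugCtl value with LBR/BTF set or cleared. read_msr reads an MSR from user space through /dev/cpu/N/msr, where the file offset selects the MSR number. The MSR numbers are the AMD ones listed above; other processors may differ.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* AMD64 MSR numbers from the text above. */
#define MSR_DEBUGCTL            0x1D9
#define MSR_LAST_BRANCH_FROM_IP 0x1DB
#define MSR_LAST_BRANCH_TO_IP   0x1DC
#define MSR_LAST_EXC_FROM_IP    0x1DD
#define MSR_LAST_EXC_TO_IP      0x1DE

#define DEBUGCTL_LBR (1ull << 0)   /* Last-Branch Record */
#define DEBUGCTL_BTF (1ull << 1)   /* Branch Single Step */

/* RDMSR returns the high half in EDX and the low half in EAX. */
uint64_t msr_join(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Read-modify-write: keep all other DebugCtl bits, force LBR and BTF. */
uint64_t debugctl_update(uint64_t old, int lbr, int btf)
{
    old &= ~(DEBUGCTL_LBR | DEBUGCTL_BTF);
    if (lbr)
        old |= DEBUGCTL_LBR;
    if (btf)
        old |= DEBUGCTL_BTF;
    return old;
}

/* Linux only: read one MSR of one CPU via the msr driver. The file offset
   is the MSR number and each read returns 8 bytes. Needs root and the msr
   module; returns 0 on success, -1 on failure. */
int read_msr(int cpu, uint32_t msr, uint64_t *value)
{
    char path[32];
    snprintf(path, sizeof path, "/dev/cpu/%d/msr", cpu);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, value, sizeof *value, (off_t)msr);
    close(fd);
    return n == (ssize_t)sizeof *value ? 0 : -1;
}
```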
Debuggers can use BTF to perform a "coarse" single step across blocks of code (bounded by control transfers) and then, as the search for the problem narrows, switch to a "fine" single step on every instruction (DebugCtlMSR.BTF=0, rFLAGS.TF=1).

Summary:

symbols: address (in binary) -> label (in source)
  symbols supported in fdbg:
    Linux64: ELF64 - DWARF (Debugging With Attributed Record Formats)
    Win64: exports (as in DLLs) - very useful and easy in FASM
           DBGHELP.DLL - not very useful in FASM, useful in C

breakpoints on execution: must lie at the beginning of an instruction (not inside it!)
  "software" breakpoint: int3, db 0CCh
    disadvantage - modifies the memory of the program
  "hardware" breakpoint: debug registers
    advantage - doesn't modify the memory of the program
    advantage - can watch reading/writing memory (and I/O ports)
    disadvantage - only 4 breakpoints

steps:
  trace into
  step over
    usefulness of step over: rep (repnz scasb, rep movsb, rep stosb), call, loop

There are 2 groups of assembly programmers who really need debugging:
1. Programmers doing a biiiiig boooooring exxxxxhaustive job (the reason for making a lot of mistakes).
2. Programmers making a lot of mistakes because they have just started to learn assembly. They improve rapidly thanks to debugging.

This is a specimen (sample) of the first type of programmer: [photo: DSC00056.JPG]
This is the second type of programmer... [photo: DSC00040.JPG]
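The software-breakpoint steps 1-8 and the "step over" entry of the summary fit together: step over is commonly done by planting a temporary 0CCh right after the current instruction and letting the program run. The sketch below models this under stated assumptions. The program's code is a plain byte array standing in for memory that a real debugger would access with PTRACE_PEEKTEXT/POKETEXT or ReadProcessMemory/WriteProcessMemory. All names are hypothetical, and step_over_length recognises only the call, loop and rep string encodings from the summary; it is not a general instruction-length decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SW_BP 16

/* One entry of the debugger's internal buffer (steps 1 and 6). */
struct sw_bp { uint64_t addr; uint8_t orig; };

/* Hypothetical model of a stopped program: its code bytes mapped at base. */
struct debuggee {
    uint8_t *mem;
    size_t size;
    uint64_t base;
    uint64_t rip;
    struct sw_bp bp[MAX_SW_BP];
    int nbp;
};

/* Steps 1-2: save the original byte and its address, then write 0CCh. */
int sw_bp_set(struct debuggee *d, uint64_t addr)
{
    assert(addr >= d->base && addr - d->base < d->size);
    if (d->nbp == MAX_SW_BP)
        return -1;
    uint8_t *p = &d->mem[addr - d->base];
    d->bp[d->nbp].addr = addr;
    d->bp[d->nbp].orig = *p;
    d->nbp++;
    *p = 0xCC;
    return 0;
}

/* Steps 5-8, called after int03 with d->rip just past the 0CCh byte.
   Returns 1 for a debugger breakpoint (original byte restored, RIP = X,
   entry dropped) or 0 for an int3 compiled into the program. */
int sw_bp_hit(struct debuggee *d)
{
    uint64_t x = d->rip - 1;                  /* step 5 */
    for (int i = 0; i < d->nbp; i++) {        /* step 6 */
        if (d->bp[i].addr == x) {             /* step 8 */
            d->mem[x - d->base] = d->bp[i].orig;
            d->bp[i] = d->bp[--d->nbp];
            d->rip = x;
            return 1;
        }
    }
    return 0;                                 /* step 7 */
}

/* Length of the instruction at code if "step over" should run it at full
   speed, else 0. Knows only call rel32, loop/loopz/loopnz and
   rep/repz/repnz string instructions (optionally with a REX prefix). */
size_t step_over_length(const uint8_t *code, size_t avail)
{
    if (avail >= 5 && code[0] == 0xE8)
        return 5;
    if (avail >= 2 && code[0] >= 0xE0 && code[0] <= 0xE2)
        return 2;
    if (avail >= 2 && (code[0] == 0xF2 || code[0] == 0xF3)) {
        size_t i = (avail >= 3 && (code[1] & 0xF0) == 0x40) ? 2 : 1;
        uint8_t op = code[i];
        if (op >= 0xA4 && op <= 0xAF && op != 0xA8 && op != 0xA9)
            return i + 1;   /* movs, cmps, stos, lods, scas */
    }
    return 0;
}

/* Returns 1 after planting a temporary breakpoint behind the instruction
   (the caller then continues the program), or 0 when the caller should
   do a plain single step instead (trace into). */
int step_over(struct debuggee *d)
{
    assert(d->rip >= d->base && d->rip - d->base < d->size);
    size_t off = (size_t)(d->rip - d->base);
    size_t len = step_over_length(d->mem + off, d->size - off);
    if (len == 0 || off + len >= d->size)
        return 0;
    return sw_bp_set(d, d->rip + len) == 0;
}
```

This follows step 8 literally, so a hit breakpoint disappears. A persistent breakpoint would additionally need one single step over the restored instruction before 0CCh is written back.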