Binary Analysis With Python GDB API
Hello everyone, welcome back to my blog. Today, I am going to dig into the GDB debugger and walk through an example that shows what it can do. I hope this post will be a useful resource for anyone doing reverse engineering.
GDB, or GNU Debugger, is a powerful tool used for debugging and analyzing programs written in C, C++, and other languages. It enables users to scrutinize and manipulate the execution of a program, aiding in the identification and resolution of bugs or vulnerabilities.
In this blog, we will explore a practical example, shedding light on how GDB can be effectively employed during the reverse engineering process. By understanding GDB’s commands, breakpoints, and watchpoints, readers will gain valuable insights into the inner workings of programs, making the reverse engineering journey more informed and efficient.
What is a Stack Frame?
Imagine your program is like a book, and each function call is a new chapter. The stack frame is like a bookmark, keeping track of where you are in the story. It contains essential information about the current function, such as local variables, return addresses, and parameters.
Navigating the Stack with GDB:
When you’re debugging with GDB, it’s like being a detective, examining clues in your program’s memory. Let’s break it down:
A. Breakpoints:
- Think of breakpoints as magnifying glasses. You place them where you want to inspect the story closely.
- In GDB, type `break` followed by the function or line number you're interested in, and GDB will pause your program there.
B. Inspecting the Stack Frame:
- Use the `backtrace` or `bt` command in GDB to see the call stack. It's like flipping through the pages of your book to see how you got to the current chapter.
- The stack frame for each function call shows you where you are in the grand scheme of your program.
C. Local Variables:
- Local variables are like characters in your story. They have roles limited to their function (chapter).
- Type `info locals` in GDB, and it will reveal the local characters of your current function.
D. Parameter Peek:
- Parameters are the messengers between chapters, carrying information from one function to another.
- Use `info args` in GDB to eavesdrop on these messengers and understand what each function is passing along.
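The same stack information can also be reached from GDB's Python API. Here is a minimal sketch, assuming the script is sourced inside GDB while the program is stopped (the `gdb` module does not exist in a normal Python interpreter) and that the binary has enough symbol information for `frame.block()` to work. The helper names `walk_frames` and `frame_symbols` are made up for this illustration.

```python
try:
    import gdb  # provided by GDB itself; not importable from plain Python
except ImportError:
    gdb = None


def walk_frames(frame):
    """Yield (level, function name, pc) from `frame` outward, like `bt`."""
    level = 0
    while frame is not None:
        yield level, frame.name(), frame.pc()
        frame = frame.older()
        level += 1


def frame_symbols(frame):
    """Split the symbols of a frame's innermost block into arguments and
    locals, roughly what `info args` and `info locals` print."""
    args, local_vars = {}, {}
    for sym in frame.block():
        if sym.is_argument:
            args[sym.name] = sym.value(frame)
        elif sym.is_variable:
            local_vars[sym.name] = sym.value(frame)
    return args, local_vars


if gdb is not None:
    try:
        current = gdb.selected_frame()
    except gdb.error:
        current = None  # no process is running yet
    if current is not None:
        for level, name, pc in walk_frames(current):
            print(f"#{level} {name} at {pc:#x}")
```

Both helpers only rely on the methods they call, so they can be tried out with stand-in objects before pointing them at a real process.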
Why Should You Care?
Understanding the stack frame is crucial for debugging. It helps you trace the flow of your program, identify where issues might be hiding, and even foresee problems before they emerge.
Binary Analysis Challenge
First of all, I’ve created a binary that we will use for GDB debugging. I haven’t listed every GDB command above because GDB is easier to learn through examples. We will begin with some static analysis of the file and then proceed to dynamic analysis. You can think of it as a small reverse engineering exercise: first we will find the secret value using only basic GDB commands, without the Python GDB API. Afterward, once we understand the methodology, we will explore how to automate it in a Python file.
Static Analysis
We can see that the file is not stripped, dynamically linked, and 64-bit.
We can also see that most security protections are in place, but we are not going to exploit them.
We just used the `strings` command.
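To make it clearer what `strings` does, here is a rough Python approximation: it scans raw bytes for runs of printable ASCII characters. This is only a sketch and not the real tool's implementation; the function name `extract_strings` and the sample bytes (including the format string) are invented for the example.

```python
import re


def extract_strings(data: bytes, min_len: int = 4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Made-up bytes standing in for a piece of a binary file.
sample = b"\x7fELF\x02\x01\x01\x00The secret value is: %d\n\x00abc\x00"
print(extract_strings(sample))  # ['The secret value is: %d']
```

To try it on a real file, read the file in binary mode (for example `open("secret", "rb").read()`) and pass the bytes to `extract_strings`.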
GDB Analysis without Python
I love using GDB; it is especially valuable when you only have a limited set of pre-installed tools for analyzing a file. We now switch from static analysis to dynamic analysis, looking at the program at runtime and at the registers as they change. Our focus is solely on finding the secret value.
To disassemble the file, we can use `info functions` to find the main function. Then, using `disassemble main`, we can disassemble it.
1. Function Prologue
0x0000000000001149: endbr64
0x000000000000114d: push rbp
0x000000000000114e: mov rbp,rsp
0x0000000000001151: sub rsp,0x10
2. Initialization
0x0000000000001155: mov rax,0x12345678
0x000000000000115c: mov DWORD PTR [rbp-0xc],eax
0x000000000000115f: mov DWORD PTR [rbp-0x8],0x0
3. Loop
0x0000000000001166: jmp 0x1184 <main+59>
- Inside the loop:
0x0000000000001168 to 0x0000000000001188: Complex arithmetic operations involving shifts, additions, XOR, and comparisons.
0x0000000000001184: Loop termination condition check (the `cmp`/`jle` pair that the `jmp` above targets).
4. Print Result
0x000000000000118a: mov DWORD PTR [rbp-0x4],0x0
0x0000000000001192: Print the result using `printf`.
5. Function Epilogue
0x00000000000011ab: mov eax,0x0
0x00000000000011b0: leave
0x00000000000011b1: ret
This disassembly represents a function, presumably the `main` function, with a loop that performs complex arithmetic operations on the `DWORD PTR [rbp-0xc]` variable. The result is printed using `printf`. The loop continues until a certain condition is met. The function then concludes with the function epilogue.
- `jmp 0x1184`: This is an unconditional jump that enters the loop. It jumps to the address `0x1184`, which is the condition check at the end of the loop.
- Inside the loop:
  - `sar DWORD PTR [rbp-0xc], 0x2`: Right-shifts the value at the memory address `[rbp-0xc]` by 2 bits (arithmetic shift).
  - `mov edx, DWORD PTR [rbp-0x8]`: Moves the value at the memory address `[rbp-0x8]` into the `edx` register.
  - `mov eax, edx`: Copies the value in `edx` into `eax`.
  - `shl eax, 0x2`: Left-shifts the value in `eax` by 2 bits.
  - `add eax, edx`: Adds the value in `edx` to `eax`.
  - `add eax, eax`: Doubles the current value in `eax`.
  - `add eax, edx`: Adds the original `edx` value to `eax`.
  - `add eax, 0x7`: Adds 7 to `eax`.
  - `xor DWORD PTR [rbp-0xc], eax`: Performs an XOR operation between the current value at `[rbp-0xc]` and `eax`, storing the result back in `[rbp-0xc]`.
  - `add DWORD PTR [rbp-0x8], 0x1`: Increments the value at `[rbp-0x8]` by 1.
- `cmp DWORD PTR [rbp-0x8], 0x3`: Compares the value at `[rbp-0x8]` with 3.
- `jle 0x1168`: Jumps back to the beginning of the loop (`0x1168`) if the compared value was less than or equal to (jle) 3. Otherwise, it continues with the next instruction after the loop.
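This block of code is a loop that performs a series of arithmetic operations and bit manipulations on values stored at specific memory addresses. Because the counter at `[rbp-0x8]` starts at 0 and the loop repeats while it is less than or equal to 3, the loop runs four times (counter values 0 through 3). On each pass, the value at `[rbp-0xc]` is shifted right by 2 bits, the shift-and-add sequence leaves 11 * counter + 7 in `eax`, and that result is XORed into `[rbp-0xc]`. The outcome depends on the initial values in memory at `[rbp-0xc]` (`0x12345678`) and `[rbp-0x8]` (0).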
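The loop described above can be replayed in plain Python to predict what the program will compute before we ever run it under GDB. This is a sketch that mirrors the disassembly instruction by instruction; the function names are mine, and the 32-bit signed handling is there to imitate `sar` on a `DWORD`.

```python
def to_signed32(value):
    """Interpret the low 32 bits of `value` as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def emulate_loop(secret=0x12345678, counter=0):
    """Replay the loop from the disassembly of main."""
    secret = to_signed32(secret)
    while counter <= 3:                     # cmp [rbp-0x8], 0x3 / jle 0x1168
        secret >>= 2                        # sar [rbp-0xc], 0x2 (Python's >> is arithmetic)
        eax = counter                       # mov edx, [rbp-0x8] / mov eax, edx
        eax <<= 2                           # shl eax, 0x2   -> 4 * counter
        eax += counter                      # add eax, edx   -> 5 * counter
        eax += eax                          # add eax, eax   -> 10 * counter
        eax += counter                      # add eax, edx   -> 11 * counter
        eax += 7                            # add eax, 0x7
        secret = to_signed32(secret ^ eax)  # xor [rbp-0xc], eax
        counter += 1                        # add [rbp-0x8], 0x1
    return secret & 0xFFFFFFFF


print(hex(emulate_loop()))  # 0x123478
```

The last value XORed in is 11 * 3 + 7 = 40 (`0x28`), so the emulation also tells us what to expect in `eax` right after the loop.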
These commands set breakpoints at the specified locations:
- `break main`: Before the loop starts.
- `break *0x000000000000117d`: Inside the loop.
- `break *0x0000000000001192`: After the loop.
First Breakpoint
We can see that `0x12345678` is being stored in `rax` after the first breakpoint.
Second Breakpoint
We can observe that `0x12345678` was stored in `rax`, as is evident at the second breakpoint.
Third Breakpoint
- Memory Contents at `rbp-0xc`:
  - The memory contents at `rbp-0xc` are `0x78 0x34 0x12 0x00`. Read as a little-endian 32-bit integer, this sequence of bytes represents the value `0x00123478`.
- Printed Value of `rbp-0xc`:
  - The printed value of `rbp-0xc` is `(void *) 0x7fffffffdee4`. This address points to the beginning of the memory region where the secret value is stored.
- Registers:
  - `rax` holds the value `0x28`.
  - `rbx`, `rcx`, `rsi`, and `rdi` are related to memory addresses and values for function parameters.
  - `rbp` points to the base of the stack frame.
  - `rsp` points to the current stack position.
  - `rip` is the instruction pointer and points to the address `0x555555555192`, which is within the `main` function at offset 73.
- Instruction at `rip` (`0x555555555192`):
  - The instruction at the current instruction pointer (`rip`) is part of the `printf` call within the `main` function.
In summary, at the breakpoint, GDB is displaying the value stored at the memory location `rbp-0xc`, which contains the byte sequence `0x78 0x34 0x12 0x00`, representing the integer value `0x00123478`. This value is subsequently printed in the `printf` statement.
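One detail worth spelling out: the addresses in the static disassembly (such as `0x1192`) are offsets within the file, while the `rip` value above (`0x555555555192`) is a runtime address. Here is a small sketch of how the two relate, assuming the binary is position independent and was loaded at base `0x555555554000`. That base is my inference from subtracting the two values; it is not shown in the original output.

```python
ASSUMED_LOAD_BASE = 0x555555554000  # inferred: 0x555555555192 - 0x1192
MAIN_FILE_OFFSET = 0x1149           # start of main in the static disassembly


def runtime_address(file_offset, base=ASSUMED_LOAD_BASE):
    """Map an address from the static disassembly to its runtime address."""
    return base + file_offset


def offset_in_main(runtime_addr, base=ASSUMED_LOAD_BASE):
    """Return N such that runtime_addr corresponds to main+N."""
    return runtime_addr - base - MAIN_FILE_OFFSET


print(hex(runtime_address(0x1192)))    # 0x555555555192
print(offset_in_main(0x555555555192))  # 73
```

Writing breakpoints as `*main+73` instead of a raw address sidesteps this, because GDB resolves `main` to wherever it actually ends up.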
1. Memory Contents at `rbp-0xc`:
- The secret value is stored at memory address `RBP-0xC`, which is `0x7fffffffdee4` in this case.
- The `x/4xb $rbp-0xc` command displays the 4 bytes at this address: `0x78 0x34 0x12 0x00`.
2. Interpret the Value:
- These bytes represent the secret value in little-endian format (least significant byte first).
- Reversing the byte order gives the hexadecimal value `0x00123478`.
3. Verify the Value:
- The `print $rbp-0xc` command indirectly confirms the value, showing the pointer to the same memory address.
- The program’s output will likely further verify the value when it prints “The secret value is: …”
Therefore, the secret value at the third breakpoint is `0x00123478`, which is 1193080 in decimal.
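The little-endian conversion is easy to double-check in Python. This short sketch takes the four bytes printed by `x/4xb $rbp-0xc` and decodes them two different ways:

```python
import struct

raw = bytes([0x78, 0x34, 0x12, 0x00])  # bytes shown by x/4xb $rbp-0xc

# Least significant byte first, interpreted as a signed 32-bit integer.
value = int.from_bytes(raw, byteorder="little", signed=True)
(same_value,) = struct.unpack("<i", raw)  # "<i" = little-endian int32

print(hex(value), value)  # 0x123478 1193080
```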
GDB Analysis with Python
As we can see, it is not hard to analyze the registers now that we have used a few GDB commands. I will implement Python code to automate this. Let’s see what we can do.
Using the GDB API in Python can offer several advantages and enhance the debugging and analysis process:
A. Automation:
- With the GDB Python API, you can automate repetitive tasks and analyses. This is particularly useful when dealing with large codebases or when performing repetitive debugging operations.
B. Scripting:
- Python is a powerful scripting language that allows you to write scripts to control GDB dynamically. You can create custom scripts to automate debugging scenarios, making it easier to test different aspects of your program.
C. Integration with Other Tools:
- Python can be easily integrated with other tools and libraries. This allows you to extend GDB’s functionality by combining it with data analysis, visualization, or custom algorithms implemented in Python.
D. Custom Commands:
- You can create custom GDB commands using Python, tailoring the debugging environment to suit your specific needs. This can help in quickly accessing and visualizing information relevant to your analysis.
E. Data Analysis and Visualization:
- Python’s extensive libraries for data analysis and visualization (such as NumPy and Matplotlib) can be utilized to analyze and visualize data extracted from GDB. This can be helpful in understanding program behavior, identifying patterns, and making informed decisions during debugging.
F. Dynamic Debugging Scenarios:
- Python allows you to dynamically control the debugging process based on runtime conditions. You can set breakpoints, evaluate expressions, and modify the program’s state based on the current state of the program during execution.
G. Extensibility:
- The GDB Python API enables you to extend GDB’s functionality beyond what is possible with standard GDB commands. You can develop custom solutions tailored to specific projects or debugging requirements.
H. Cross-Platform Support:
- Python is a cross-platform language, which means your scripts can run on different operating systems without modification. This provides consistency in your debugging workflow across various environments.
check: https://devguide.python.org/development-tools/gdb/
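As an illustration of the custom commands point, here is a hedged sketch of a user-defined GDB command built on `gdb.Command`. The command name `secret`, the class name, and the `parse_offset` helper are all hypothetical. It assumes an x86-64 target stopped inside `main`, where the value of interest lives at `$rbp - 0xc`.

```python
try:
    import gdb  # only available when the script is loaded by GDB
except ImportError:
    gdb = None

DEFAULT_OFFSET = 0xC  # the secret lives at $rbp - 0xc in this binary


def parse_offset(arg):
    """Turn the command argument (e.g. "0xc" or "12") into an int offset."""
    arg = arg.strip()
    return int(arg, 0) if arg else DEFAULT_OFFSET


if gdb is not None:

    class ShowSecret(gdb.Command):
        """secret [OFFSET]: print the 32-bit value at $rbp - OFFSET."""

        def __init__(self):
            super().__init__("secret", gdb.COMMAND_USER)

        def invoke(self, arg, from_tty):
            offset = parse_offset(arg)
            value = gdb.parse_and_eval(f"*(int *)($rbp - {offset})")
            print(f"[rbp-{offset:#x}] = {int(value) & 0xFFFFFFFF:#x}")

    ShowSecret()  # instantiating the class registers the command
```

After sourcing the file in GDB, typing `secret` (or `secret 0x8` for the loop counter) at a breakpoint would print the corresponding stack slot.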
Explanation:
A. Connect to GDB:
- `gdb.execute("file secret")`: Load the "secret" file into GDB.
B. Set Breakpoints:
- `gdb.Breakpoint("main")`: Set a breakpoint at the beginning of the `main` function (address `0x0000000000001155`).
- `gdb.Breakpoint("*main+29")`: Set a breakpoint at the `jmp` that enters the loop (address `0x0000000000001166`).
- `gdb.Breakpoint("*main+73")`: Set a breakpoint after the loop (address `0x0000000000001192`).
C. Define Breakpoint Event Handler:
- `handle_breakpoint(event)`: Define a function to be called when a breakpoint is hit. It prints the value of `RAX` and the secret value stored at `($rbp - 0xc)`. It then continues to the next breakpoint and, if the last breakpoint is hit, sleeps for 1 second.
D. Connect the Event Handler:
- `gdb.events.stop.connect(handle_breakpoint)`: Register the defined event handler for the `stop` event (when a breakpoint is hit).
E. Run the Program:
- `gdb.execute("run")`: Start the program and stop at the specified breakpoints.
F. Check Inferior Validity:
- `if not gdb.selected_inferior().is_valid():`: Check if the inferior (process being debugged) is valid. If not, kill the program.
The script automates the debugging process by setting breakpoints at specific locations, defining an event handler to print values, and controlling the flow between breakpoints. It demonstrates the use of the GDB Python API to interact with GDB programmatically.
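For readers who want something to start from, here is a sketch of a script that follows the steps explained above. It is a reconstruction from that explanation and may differ from the exact script I ran. The `LOCATIONS` list and the `format_stop` helper are names introduced for this sketch, and resuming through `gdb.post_event` is my choice here so that `continue` is issued from GDB's event loop rather than from inside the stop handler.

```python
import time

try:
    import gdb  # only available when the script is run by GDB
except ImportError:
    gdb = None

# Breakpoint locations taken from the explanation above.
LOCATIONS = ["main", "*main+29", "*main+73"]


def format_stop(location, rax, secret):
    """Build the line printed at each stop (pure Python, no GDB needed)."""
    return (f"[{location}] rax={rax & 0xFFFFFFFFFFFFFFFF:#x} "
            f"secret={secret & 0xFFFFFFFF:#x}")


def handle_breakpoint(event):
    """Stop-event handler: print RAX and the secret, then resume."""
    if not isinstance(event, gdb.BreakpointEvent):
        return
    # Assumes `location` echoes the string the breakpoint was created with.
    location = event.breakpoints[0].location
    rax = int(gdb.parse_and_eval("$rax"))
    secret = int(gdb.parse_and_eval("*(int *)($rbp - 0xc)"))
    print(format_stop(location, rax, secret))
    if location == LOCATIONS[-1]:
        time.sleep(1)  # brief pause at the last breakpoint
    gdb.post_event(lambda: gdb.execute("continue"))


def main():
    gdb.execute("file secret")
    for location in LOCATIONS:
        gdb.Breakpoint(location)
    gdb.events.stop.connect(handle_breakpoint)
    gdb.execute("run")
    if not gdb.selected_inferior().is_valid():
        try:
            gdb.execute("kill")
        except gdb.error:
            pass  # nothing left to kill


if gdb is not None:
    main()
```

Keeping the formatting in a plain function means that part can be checked outside GDB, while everything touching the `gdb` module only runs when GDB loads the file.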
We need to pass the script to GDB with the `-x` option to run it.
The result
Conclusion
In this blog post, we learned how to analyze a file through GDB. I know this is not exhaustive; we have only scratched the surface of how we can use the Python GDB API to analyze a file. There will be more blogs about debugging; this is just the beginning. Thank you for reading this blog.
You can follow me on:
Youtube: https://www.youtube.com/@lockpin010
Linkedin: https://www.linkedin.com/in/ahmetgoker/
Twitter: https://twitter.com/lockpin010_