Binary Analysis With Python GDB API

Ahmet Göker
10 min read · Feb 3, 2024


Hello everyone, and welcome back to my blog. Today I am going to dig into the GDB debugger and walk through an example that shows what it can do. I hope this post proves useful to anyone working in reverse engineering.

GDB, or GNU Debugger, is a powerful tool used for debugging and analyzing programs written in C, C++, and other languages. It enables users to scrutinize and manipulate the execution of a program, aiding in the identification and resolution of bugs or vulnerabilities.

In this blog, we will explore a practical example, shedding light on how GDB can be effectively employed during the reverse engineering process. By understanding GDB’s commands, breakpoints, and watchpoints, readers will gain valuable insights into the inner workings of programs, making the reverse engineering journey more informed and efficient.

What is a Stack Frame?

Imagine your program is like a book, and each function call is a new chapter. The stack frame is like a bookmark, keeping track of where you are in the story. It contains essential information about the current function, such as local variables, return addresses, and parameters.

Navigating the Stack with GDB:

When you’re debugging with GDB, it’s like being a detective, examining clues in your program’s memory. Let’s break it down:

A. Breakpoints:

  • Think of breakpoints as magnifying glasses. You place them where you want to inspect the story closely.
  • In GDB, type break followed by the function or line number you're interested in, and GDB will pause your program there.

B. Inspecting the Stack Frame:

  • Use the backtrace or bt command in GDB to see the call stack. It's like flipping through the pages of your book to see how you got to the current chapter.
  • The stack frame for each function call shows you where you are in the grand scheme of your program.

C. Local Variables:

  • Local variables are like characters in your story. They have roles limited to their function (chapter).
  • Type info locals in GDB, and it will reveal the local characters of your current function.

D. Parameter Peek:

  • Parameters are the messengers between chapters, carrying information from one function to another.
  • Use info args in GDB to eavesdrop on these messengers and understand what each function is passing along.
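The four commands above (break, bt, info locals, info args) have rough counterparts in GDB's Python API. The following is a minimal sketch, not a replacement for those commands. The helper names walk_frames, frame_symbols, and show_stack are hypothetical, and reading locals and arguments assumes the binary was built with debug information, which may not be true of the challenge binary used later.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def walk_frames(frame):
    """Yield (depth, function name, pc) from `frame` outwards, like `bt`."""
    depth = 0
    while frame is not None:
        yield depth, frame.name() or "??", frame.pc()
        frame = frame.older()
        depth += 1


def frame_symbols(frame):
    """Return (args, locals) dicts, like `info args` and `info locals`."""
    args, local_vars = {}, {}
    try:
        block = frame.block()
    except RuntimeError:
        return args, local_vars  # no debug info for this frame
    while block is not None:
        for sym in block:
            if sym.is_argument:
                args[sym.name] = sym.value(frame)
            elif sym.is_variable:
                local_vars[sym.name] = sym.value(frame)
        if block.function is not None:
            break  # reached the function's outermost block
        block = block.superblock
    return args, local_vars


def show_stack():
    """Print the call stack of the stopped program (call from within GDB)."""
    for depth, name, pc in walk_frames(gdb.newest_frame()):
        print(f"#{depth} {pc:#x} in {name}")
    args, local_vars = frame_symbols(gdb.selected_frame())
    print("args:", args)
    print("locals:", local_vars)
```

Inside a GDB session stopped at a breakpoint, loading this file with the source command and then running `python show_stack()` would print one line per frame followed by the symbols of the selected frame.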

Why Should You Care?

Understanding the stack frame is crucial for debugging. It helps you trace the flow of your program, identify where issues might be hiding, and even foresee problems before they emerge.

Binary Analysis Challenge

First of all, I’ve created a binary that we will use for GDB debugging. I haven’t listed every GDB command above because GDB is best learned through examples. We will begin with some static analysis of the file and then proceed to dynamic analysis. You can think of it as reverse engineering: we will first find the secret value using basic GDB commands, without relying on the Python GDB API. Afterward, with the methodology understood, we will explore how to automate it in a Python file.

Static Analysis

We can see that the file is 64-bit, dynamically linked, and not stripped.

We can also see that most security protections are enabled, but we are not going to exploit the binary.

Finally, we ran the strings command on the file.
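Part of this static triage can be reproduced in a few lines of Python. This is only a rough sketch: it reads the ELF identification bytes to report the word size, byte order, and object type, and extracts printable ASCII runs in the spirit of strings. It does not check linking, symbol stripping, or security protections. The function names are hypothetical.

```python
import re

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}
ELF_TYPES = {2: "executable (ET_EXEC)", 3: "shared object / PIE (ET_DYN)"}


def elf_summary(data):
    """Return (word size, byte order, object type) from raw ELF bytes."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = ELF_CLASSES.get(data[4], "unknown class")     # EI_CLASS
    byteorder = "little" if data[5] == 1 else "big"      # EI_DATA
    e_type = int.from_bytes(data[16:18], byteorder)      # e_type field
    return bits, byteorder, ELF_TYPES.get(e_type, f"type {e_type}")


def printable_strings(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII characters."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


def summarize(path):
    """Print a short report for the file at `path`."""
    with open(path, "rb") as handle:
        data = handle.read()
    print(elf_summary(data))
    for text in printable_strings(data):
        print(text)
```

Calling `summarize("secret")` would print the header summary followed by every printable string, assuming the binary is named secret and sits in the current directory.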

GDB Analysis without Python

I love using GDB; it is invaluable when only a limited set of pre-installed tools is available to analyze the file. We now switch from static analysis to dynamic analysis, examining the program at runtime along with its registers, which can also be modified. Our focus is solely on finding the secret value.

To disassemble the file, we can use info functions to find the main function. Then, using disassemble main, we can disassemble it.

1. Function Prologue

  • 0x0000000000001149: endbr64
  • 0x000000000000114d: push rbp
  • 0x000000000000114e: mov rbp,rsp
  • 0x0000000000001151: sub rsp,0x10

2. Initialization

  • 0x0000000000001155: mov rax,0x12345678
  • 0x000000000000115c: mov DWORD PTR [rbp-0xc],eax
  • 0x000000000000115f: mov DWORD PTR [rbp-0x8],0x0

3. Loop:

  • 0x0000000000001166: jmp 0x1184 <main+59>
  • Inside the loop:
  • 0x0000000000001168 up to 0x0000000000001184: Arithmetic operations involving shifts, additions, and XOR.
  • 0x0000000000001184: Loop termination condition check (cmp followed by jle), which is also the target of the initial jmp.

4. Print Result

  • 0x000000000000118a: mov DWORD PTR [rbp-0x4],0x0
  • 0x0000000000001192: Print the result using printf.

5. Function Epilogue

  • 0x00000000000011ab: mov eax,0x0
  • 0x00000000000011b0: leave
  • 0x00000000000011b1: ret

This disassembly is the main function. It contains a loop that performs arithmetic and bitwise operations on the variable at DWORD PTR [rbp-0xc], using the variable at [rbp-0x8] as the loop counter. The loop continues until the counter exceeds 3. The result is then printed using printf, and the function concludes with the epilogue.

  1. jmp 0x1184: This is an unconditional jump to the loop condition check at 0x1184, which sits at the end of the loop. The condition is therefore tested before the first iteration.
  2. Inside the loop:
  • sar DWORD PTR [rbp-0xc], 0x2: Right-shifts the value at the memory address [rbp-0xc] by 2 bits (arithmetic shift).
  • mov edx, DWORD PTR [rbp-0x8]: Moves the loop counter at [rbp-0x8] into the edx register.
  • mov eax, edx: Copies the value in edx into eax.
  • shl eax, 0x2: Left-shifts the value in eax by 2 bits (eax is now 4 × edx).
  • add eax, edx: Adds the value in edx to eax (5 × edx).
  • add eax, eax: Doubles the current value in eax (10 × edx).
  • add eax, edx: Adds the original edx value to eax (11 × edx).
  • add eax, 0x7: Adds 7 to eax, so eax = 11 × counter + 7.
  • xor DWORD PTR [rbp-0xc], eax: Performs an XOR operation between the current value at [rbp-0xc] and eax, storing the result back in [rbp-0xc].
  • add DWORD PTR [rbp-0x8], 0x1: Increments the counter at [rbp-0x8] by 1.
  3. cmp DWORD PTR [rbp-0x8], 0x3: Compares the counter at [rbp-0x8] with 3.
  4. jle 0x1168: Jumps back to the beginning of the loop (0x1168) if the counter is less than or equal to 3 (signed comparison). Otherwise, execution continues with the next instruction after the loop.

This block of code is a loop that performs a series of arithmetic operations and bit manipulations on two stack variables. The counter at [rbp-0x8] starts at 0 and the loop repeats while it is less than or equal to 3, so the body runs four times (counter values 0 through 3). The result depends only on the initial values set before the loop: 0x12345678 at [rbp-0xc] and 0 at [rbp-0x8].
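Because the loop uses only two stack variables and constants, it can be emulated outside the debugger to predict what GDB should show. The following is a sketch that mirrors the disassembled instructions one by one. The function names are hypothetical, and the 32-bit masking is an assumption based on the DWORD operands.

```python
def to_signed32(value):
    """Interpret the low 32 bits of `value` as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def emulate_loop(secret=0x12345678, last_index=3):
    """Replay the loop from main and return (final secret, per-iteration trace)."""
    trace = []
    counter = 0                                   # mov DWORD PTR [rbp-0x8], 0x0
    while counter <= last_index:                  # cmp [rbp-0x8], 0x3 ; jle
        secret = to_signed32(secret) >> 2         # sar [rbp-0xc], 0x2
        eax = counter << 2                        # shl eax, 0x2  -> 4 * counter
        eax += counter                            # add eax, edx  -> 5 * counter
        eax += eax                                # add eax, eax  -> 10 * counter
        eax += counter                            # add eax, edx  -> 11 * counter
        eax += 7                                  # add eax, 0x7
        secret = (secret ^ eax) & 0xFFFFFFFF      # xor [rbp-0xc], eax
        trace.append((counter, eax, secret))
        counter += 1                              # add [rbp-0x8], 0x1
    return secret, trace


final, steps = emulate_loop()
for counter, eax, value in steps:
    print(f"i={counter} eax={eax:#x} secret={value:#010x}")
print(f"final secret: {final:#x} ({final})")
```

The last iteration leaves 0x28 in eax, which matches the rax value observed at the third breakpoint, and the final value 0x123478 matches the bytes 0x78 0x34 0x12 0x00 read from rbp-0xc.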

These commands set breakpoints at the specified locations:

  • break main: Before the loop starts.
  • break *0x000000000000117d: Inside the loop.
  • break *0x0000000000001192: After the loop.

First breakpoint

At the first breakpoint, we can see that `0x12345678` is being stored in `rax`.

Second breakpoint

We can observe at the second breakpoint that `0x12345678` was stored in `rax`.

Third breakpoint

  1. Memory Contents at rbp-0xc:
  • The memory contents at rbp-0xc are 0x78 0x34 0x12 0x00. Read in little-endian order, this sequence of bytes represents the value 0x00123478.
  2. Printed Value of rbp-0xc:
  • The printed value of rbp-0xc is (void *) 0x7fffffffdee4. This address points to the beginning of the memory region where the secret value is stored.
  3. Registers:
  • rax holds the value 0x28, the last value XORed into the secret (11 × 3 + 7 = 40).
  • rbx, rcx, rsi, and rdi are related to memory addresses and values for function parameters.
  • rbp points to the base of the stack frame.
  • rsp points to the current stack position.
  • rip is the instruction pointer and points to the address 0x555555555192, which is within the main function at offset 73.
  4. Instruction at rip (0x555555555192):
  • The instruction at the current instruction pointer (rip) is part of the printf call within the main function.

In summary, at this breakpoint we inspect the value stored at the memory location rbp-0xc, which contains the byte sequence 0x78 0x34 0x12 0x00, representing the integer value 0x00123478 in little-endian order. This value is subsequently printed by the printf call.

1. Memory Contents at rbp-0xc:

  • The secret value is stored at memory address RBP-0xC, which is 0x7fffffffdee4 in this case.
  • The x/4xb $rbp-0xc command displays the 4 bytes at this address: 0x78 0x34 0x12 0x00.

2. Interpret the Value:

  • These bytes represent the secret value in little-endian format (least significant byte first).
  • Reversing the byte order gives the hexadecimal value 0x00123478, which is 1193080 in decimal.

3. Verify the Value:

  • The print $rbp-0xc command confirms the address rather than the value: it shows the pointer to the same memory location that x/4xb read from.
  • The program’s output will likely further verify the value when it prints “The secret value is: …”

Therefore, the secret value at the third breakpoint is 1193080 (0x123478).
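The byte-order conversion can be checked in Python. The sketch below decodes the four bytes shown by x/4xb as a little-endian 32-bit integer. The helper read_secret_from_gdb is a hypothetical convenience that only works inside a GDB session stopped in main, and it assumes the frame layout described above.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def decode_le32(raw):
    """Decode 4 raw bytes as a signed little-endian 32-bit integer."""
    data = bytes(raw)
    if len(data) != 4:
        raise ValueError("expected exactly 4 bytes")
    return int.from_bytes(data, "little", signed=True)


def read_secret_from_gdb(offset=0xC):
    """Read the int at $rbp-offset from the stopped inferior (GDB only)."""
    address = int(gdb.parse_and_eval("$rbp")) - offset
    raw = gdb.selected_inferior().read_memory(address, 4)
    return decode_le32(raw.tobytes())


dump = [0x78, 0x34, 0x12, 0x00]   # bytes as printed by x/4xb $rbp-0xc
value = decode_le32(dump)
print(f"{value:#x} = {value}")
```

Inside GDB, the single command `print *(int *)($rbp - 0xc)` performs the same decoding without manual byte reversal.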

GDB Analysis with Python

As we can see, it is not hard to analyze the registers now that we have used some instructions. I will implement Python code to automate this. Let’s see what we can do.

Using the GDB API in Python can offer several advantages and enhance the debugging and analysis process:

A. Automation:

  • With the GDB Python API, you can automate repetitive tasks and analyses. This is particularly useful when dealing with large codebases or when performing repetitive debugging operations.

B. Scripting:

  • Python is a powerful scripting language that allows you to write scripts to control GDB dynamically. You can create custom scripts to automate debugging scenarios, making it easier to test different aspects of your program.

C. Integration with Other Tools:

  • Python can be easily integrated with other tools and libraries. This allows you to extend GDB’s functionality by combining it with data analysis, visualization, or custom algorithms implemented in Python.

D. Custom Commands:

  • You can create custom GDB commands using Python, tailoring the debugging environment to suit your specific needs. This can help in quickly accessing and visualizing information relevant to your analysis.

E. Data Analysis and Visualization:

  • Python’s extensive libraries for data analysis and visualization (such as NumPy and Matplotlib) can be utilized to analyze and visualize data extracted from GDB. This can be helpful in understanding program behavior, identifying patterns, and making informed decisions during debugging.

F. Dynamic Debugging Scenarios:

  • Python allows you to dynamically control the debugging process based on runtime conditions. You can set breakpoints, evaluate expressions, and modify the program’s state based on the current state of the program during execution.

G. Extensibility:

  • The GDB Python API enables you to extend GDB’s functionality beyond what is possible with standard GDB commands. You can develop custom solutions tailored to specific projects or debugging requirements.

H. Cross-Platform Support:

  • Python is a cross-platform language, so scripts can generally run wherever GDB is built with Python support. Scripts that hard-code registers such as rax or rbp, like the one in this post, remain specific to one architecture.
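As an illustration of the custom commands point, here is a minimal sketch of a user-defined GDB command built on gdb.Command. The command name peek-local and the helper parse_offset are hypothetical. The command assumes an x86-64 frame that uses rbp as the frame pointer, as the binary in this post does.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None


def parse_offset(arg, default=0xC):
    """Parse the command argument as a hex or decimal offset."""
    arg = arg.strip()
    return int(arg, 0) if arg else default


if gdb is not None:

    class PeekLocal(gdb.Command):
        """peek-local [OFFSET]: print the 32-bit int stored at $rbp-OFFSET."""

        def __init__(self):
            super().__init__("peek-local", gdb.COMMAND_DATA)

        def invoke(self, arg, from_tty):
            offset = parse_offset(arg)
            expr = f"*(int *)($rbp - {offset})"
            value = int(gdb.parse_and_eval(expr))
            print(f"[rbp-{offset:#x}] = {value:#x} ({value})")

    PeekLocal()  # instantiating the class registers the command with GDB
```

After loading the file with the source command, typing `peek-local` at a breakpoint in main would print the value at rbp-0xc, and `peek-local 0x8` would print the loop counter.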

check: https://devguide.python.org/development-tools/gdb/

Explanation:

A. Connect to GDB:

  • gdb.execute("file secret"): Load the "secret" file into GDB.

B. Set Breakpoints:

  • gdb.Breakpoint("main"): Set a breakpoint at the beginning of the main function (address 0x0000000000001155).
  • gdb.Breakpoint("*main+29"): Set a breakpoint at the jmp that enters the loop (address 0x0000000000001166).
  • gdb.Breakpoint("*main+73"): Set a breakpoint after the loop (address 0x0000000000001192).

C. Define Breakpoint Event Handler:

  • handle_breakpoint(event): Define a function to be called when a breakpoint is hit. It prints the values of RAX and the secret value stored at ($rbp - 0xc). It then continues to the next breakpoint and, if the last breakpoint is hit, sleeps for 1 second.

D. Connect the Event Handler:

  • gdb.events.stop.connect(handle_breakpoint): Register the defined event handler for the stop event, which fires whenever the inferior stops, including when a breakpoint is hit.

E. Run the Program:

  • gdb.execute("run"): Start the program and stop at the specified breakpoints.

F. Check Inferior Validity:

  • if not gdb.selected_inferior().is_valid(): Check whether the inferior (the process being debugged) is still valid. If it is not, kill the program.

The script automates the debugging process by setting breakpoints at specific locations, defining an event handler to print values, and controlling the flow between breakpoints. It demonstrates the use of the GDB Python API to interact with GDB programmatically.
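The steps described above can be sketched as a single script. This is a reconstruction from the description, not the original script: the names SECRET_EXPR, format_stop, and handle_stop are hypothetical, and the binary is assumed to be called secret and located in the current directory. Unlike the description, this sketch resumes execution from the main flow rather than from inside the handler, since issuing continue from within a stop handler can be fragile.

```python
try:
    import gdb  # only importable inside GDB's embedded Python interpreter
except ImportError:
    gdb = None

BREAKPOINTS = ("main", "*main+29", "*main+73")
SECRET_EXPR = "*(int *)($rbp - 0xc)"  # the 32-bit local holding the secret


def format_stop(rax, secret):
    """Build the line printed at every breakpoint stop."""
    return f"rax={rax:#x} secret={secret:#x} ({secret})"


def handle_stop(event):
    """Stop-event handler: report rax and the secret at breakpoint stops."""
    if not isinstance(event, gdb.BreakpointEvent):
        return  # ignore stops caused by signals, stepping, and so on
    rax = int(gdb.parse_and_eval("$rax"))
    secret = int(gdb.parse_and_eval(SECRET_EXPR))
    print(format_stop(rax, secret))


def main():
    gdb.execute("file secret")               # A. load the binary
    for spec in BREAKPOINTS:                 # B. set the three breakpoints
        gdb.Breakpoint(spec)
    gdb.events.stop.connect(handle_stop)     # C/D. register the handler
    gdb.execute("run")                       # E. run to the first breakpoint
    # Resume until the process has exited (pid becomes 0).
    while gdb.selected_inferior().pid != 0:
        gdb.execute("continue")


if gdb is not None:
    main()
```

At the first stop the secret slot still holds whatever was on the stack, because the mov that initializes it has not executed yet. Only the value reported at the *main+73 breakpoint is the final secret.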

We need to pass the script to GDB with the -x option to run it.
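For repeatable runs, the GDB invocation itself can be wrapped in Python. This is a small sketch using the standard gdb options -q (quiet), -batch (exit after the script finishes), and -x (execute a command file). The script name analyze_secret.py is a hypothetical placeholder.

```python
import subprocess


def gdb_command(script, batch=True):
    """Build the argument list that runs `script` inside GDB."""
    command = ["gdb", "-q"]
    if batch:
        command.append("-batch")  # exit once the script has finished
    command += ["-x", script]
    return command


def run_script(script):
    """Run GDB with `script` and return its captured standard output."""
    result = subprocess.run(gdb_command(script), capture_output=True, text=True)
    return result.stdout


print(" ".join(gdb_command("analyze_secret.py")))
```

Dropping -batch keeps GDB open at its prompt after the script completes, which is convenient while developing the script.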

The result

Conclusion

In this blog post, we learned how to analyze a file with GDB. This is not exhaustive; we have only scratched the surface of how the Python GDB API can be used to analyze a file. There will be more blogs about debugging; this is just the beginning. Thank you for reading this blog.

You can follow me on:

Youtube: https://www.youtube.com/@lockpin010

Linkedin: https://www.linkedin.com/in/ahmetgoker/

Twitter: https://twitter.com/lockpin010_
