JVM | Reverse-Engineering
Hello everyone,
Welcome back to another blog post, this time about reverse engineering Java class files. You will read about and understand these concepts:
- Cracking a password protected application
- Basic String Obfuscation
- Advanced String Obfuscation
Before covering these concepts, let me explain how Java is structured.
1. Java Bytecode: When you write a Java program, it needs to be translated into a format that computers can understand and execute. This translation process produces bytecode. Bytecode is like a set of instructions that a special virtual computer called the Java Virtual Machine (JVM) can read and follow.
2. Java Virtual Machine (JVM): It’s a software program that runs on your computer like any other process; it is not part of the physical CPU. The JVM reads and executes the bytecode instructions produced from the Java source code. It acts as a custom virtual CPU with its own instruction set, which is different from the regular CPU’s instruction set.
3. Stack-Based Language: The way JVM executes bytecode is by using a stack. A stack is like a bucket where you can put things on top and take them off from the top. Instead of using registers like your computer’s CPU, the JVM uses this stack to handle temporary variables and calculations.
4. Stack Underflow and Stack Overflow: When you try to take something off an empty stack, it’s called a Stack Underflow. When you push more onto the stack than it has room for, it’s called a Stack Overflow (similar to a bucket overflowing with too many items). For the operand stack, the JVM checks both cases when it verifies a class, so well-formed bytecode never triggers them.
5. Java Bytecode for “Hello World”: The bytecode to print “Hello World” to the console involves three main instructions:
— getstatic java/lang/System.out:Ljava/io/PrintStream;: This reads the static `out` field of the `System` class, which represents the console, and pushes it onto the stack.
— ldc Hello World: This loads the string “Hello World” onto the stack.
— invokevirtual java/io/PrintStream.println:(Ljava/lang/String;)V: This tells the JVM to call the `println` method on the `System.out` object, using the string at the top of the stack as the argument.
6. Class Files: Java source code is compiled into .class files, one class per file. These files can reference other classes, and the JVM links them together when the program runs. By using a tool called javap, we can see the methods and fields present in a class. Methods and fields have names and descriptors. Descriptors represent the arguments and return type of a method or the type of a field.
7. Descriptors: Descriptors are a way to represent the signature of a method or the type of a field. For example, a method like `void main(String[] args, int i)` has the name `main` and the descriptor `([Ljava/lang/String;I)V`, where `[Ljava/lang/String;` represents an array of Strings, `I` represents an int, and `V` represents void (no return value).
8. javap Tool: javap is a command-line tool that comes with the Java Development Kit (JDK). It can be used to disassemble compiled Java classes and show the methods and fields they contain. The -v flag stands for verbose mode, which provides more detailed information, and the -p flag shows private members of the class.
So, the command javap -v -p HelloWorld.class will disassemble the HelloWorld.class file and display the methods and fields it contains, including private members, in a detailed format.
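To make descriptors less abstract, here is a small sketch that asks the JDK itself to build one. It relies on `java.lang.invoke.MethodType` from the standard library; the class name `DescriptorDemo` and the helper `descriptorOf` are just names I picked for this example.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Builds the JVM descriptor for a method with the given return and parameter types.
    static String descriptorOf(Class<?> returnType, Class<?>... parameterTypes) {
        return MethodType.methodType(returnType, parameterTypes).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // void main(String[] args, int i)
        System.out.println(descriptorOf(void.class, String[].class, int.class));
        // prints ([Ljava/lang/String;I)V

        // void println(String s)
        System.out.println(descriptorOf(void.class, String.class));
        // prints (Ljava/lang/String;)V
    }
}
```

The second line it prints is the same descriptor that shows up in the `invokevirtual` instruction for `println`.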
Please take a look at: https://en.wikipedia.org/wiki/List_of_Java_bytecode_instructions
Bytecode
Let me explain the picture above:
1. LDC 0: This instruction loads the constant value 0 onto the stack.
Stack: [0]
2. LDC 3: This instruction loads the constant value 3 onto the stack.
Stack: [0, 3]
3. SWAP: This instruction swaps the top two values on the stack.
Stack: [3, 0]
4. POP: This instruction removes the top value from the stack.
Stack: [3]
5. INEG: This instruction negates the top value on the stack (changes its sign).
Stack: [-3]
The bytecode instructions first load the values 0 and 3 onto the stack, then swap their positions, remove the top value (which is now 0), and finally negate the remaining value, resulting in -3 being at the top of the stack.
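If you want to replay these five instructions yourself, here is a minimal sketch that simulates the operand stack with an `ArrayDeque`. It is only an illustration of the stack behaviour, not how the JVM is implemented, and the class name `StackDemo` is my own.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class StackDemo {
    // Runs the five instructions from the walkthrough on a simulated
    // operand stack and returns the value left on top.
    static int run() {
        Deque<Integer> stack = new ArrayDeque<>();

        stack.push(0);              // LDC 0 -> [0]
        stack.push(3);              // LDC 3 -> [0, 3]

        int top = stack.pop();      // SWAP  -> [3, 0]
        int below = stack.pop();
        stack.push(top);
        stack.push(below);

        stack.pop();                // POP   -> [3]

        stack.push(-stack.pop());   // INEG  -> [-3]

        return stack.peek();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints -3
    }
}
```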
If you understood this correctly, let’s dive into the programs that we want to reverse and crack.
Hello World
1. Class Information:
— Class Name: Main
— Superclass: java/lang/Object
— Interfaces: None
— Fields: None
— Methods: 2 (a constructor and a main method)
— Attributes: 1 (SourceFile; the constant pool is a separate section of the class file, not an attribute)
2. Constant Pool: The constant pool holds references to various constants used in the class, such as method names, field names, and string literals. For example, #1, #2, #3, #4 are references to methods and fields.
3. Constructor Main():
— Descriptor: ()V (No arguments, Void return type)
— Flags: 0x0000 (No specific flags)
— Code: The constructor body (bytecode offsets 0 to 4) invokes the superclass constructor (`java/lang/Object.<init>:()V`) and then returns.
— LineNumberTable: The line number table shows the line number where the constructor code exists in the original source file. In this case, it’s at line 3.
4. `public static void main(String[])` Method:
— Descriptor: ([Ljava/lang/String;)V (Takes an array of Strings as an argument, Void return type)
— Flags: 0x0009 (ACC_PUBLIC and ACC_STATIC)
— Code: The `stack=2, locals=2` header means the method needs an operand stack at most two entries deep and two local variable slots (slot 0 holds `args`, slot 1 holds an int). The method first pushes the constant 0 (iconst_0) and stores it in local variable 1 (istore_1). It then loads System.out onto the stack, loads the string “Hello World”, and calls the println method to print it.
— The iinc instruction is then used to increment local variable 1 by 2.
— LineNumberTable: The line number table maps lines of the original source file to bytecode offsets. Here, source lines 5, 6, 7, and 8 start at bytecode offsets 0, 2, 10, and 13.
5. Source File: The SourceFile attribute specifies the name of the original source file from which the class was compiled. In this case, it’s SecretSourceFile.java.
Overall, this describes a Java class named `Main` with a constructor and a `main` method that prints “Hello World” to the console and increments a local variable by 2. Inside the class file the instructions are stored as numeric opcodes; javap shows them by their readable names.
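Putting the pieces together, the source behind this class probably looked something like the sketch below. This is my reconstruction from the disassembly, not the original file: the variable name `i` is a guess, and I named the class `MainSketch` instead of `Main` only so the example does not clash with other code.

```java
class MainSketch {
    // The compiler adds the default constructor that calls Object.<init>.
    public static void main(String[] args) {
        int i = 0;                          // iconst_0, istore_1
        System.out.println("Hello World");  // getstatic, ldc, invokevirtual
        i += 2;                             // iinc on local variable 1
    }                                       // return
}
```

Compiling this and running `javap -v -p` on the result should give you a listing close to the one described above.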
1. Cracking a password protected application
This time we have a compiled class file, and we are going to crack the password protected application inside it.
I am now going to explain the disassembled class, which is named `PasswordProtectedApplication`. Let’s go through the details step by step:
1. Class Information:
— Class Name: PasswordProtectedApplication
— Superclass: java/lang/Object
— Interfaces: None
— Fields: None
— Methods: 2 (a constructor and a `main` method)
— Attributes: 1 (most likely SourceFile; the constant pool is a separate section of the class file, not an attribute)
2. Constant Pool: The constant pool holds references to various constants used in the class, such as method names, field names, and string literals. For example, #1, #2, #3, #4 are references to methods and fields.
3. Constructor <init>():
— Descriptor: ()V (No arguments, Void return type)
— Flags: 0x0000 (No specific flags)
— Code: The constructor body (0 to 2) invokes the superclass constructor (`java/lang/Object."<init>":()V`) and then returns.
4. public static void main(String[]) Method:
— Descriptor: ([Ljava/lang/String;)V (Takes an array of Strings as an argument, Void return type)
— Flags: 0x0009 (ACC_PUBLIC and ACC_STATIC)
— Code: The main method starts with loading a string “yxvF2ho95ANJVCX” (the password) and stores it. It then checks if the entered password (supplied as a command-line argument) matches the stored password using the equals method. If the passwords match, it prints “You guessed the correct password” using System.out.println. Otherwise, it prints “You guessed the wrong password” or “Please supply a password” depending on whether the argument is missing or incorrect.
— LineNumberTable: The line number table indicates the line numbers corresponding to specific instructions in the original source file.
The class `PasswordProtectedApplication` appears to be a simple Java program that checks whether the supplied command-line argument matches a predefined password and prints different messages based on the outcome. The class is designed to protect access to certain parts of the program by requiring the correct password for entry.
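Based on that description, the original logic can be sketched roughly as follows. The password and the three messages come from the disassembly; the class name `PasswordCheckSketch`, the helper `check`, and the exact order of the checks are my assumptions.

```java
class PasswordCheckSketch {
    // Returns the message the program would print for the given arguments.
    static String check(String[] args) {
        String password = "yxvF2ho95ANJVCX"; // string constant found in the class file
        if (args.length == 0) {
            return "Please supply a password";
        }
        if (args[0].equals(password)) {
            return "You guessed the correct password";
        }
        return "You guessed the wrong password";
    }

    public static void main(String[] args) {
        System.out.println(check(args));
    }
}
```

The weakness is easy to see in this form: the password is a plain string literal, so it ends up readable in the constant pool.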
“yxvF2ho95ANJVCX” → This should be the password, let me give it a try:
- I entered a password which was wrong.
- I did not enter any password, because I just wanted to see the output. It was as I expected.
- Lastly, I entered the correct password, which we found earlier.
2. Basic String Obfuscation
It seems that “aRa2lPT6A6gIqm4RE” is XORed, because we can see an xor method.
In this instance, the real string is not directly visible in the class file itself. To uncover its content, you’ll need to employ methods like decompilation, bytecode analysis, or virtualization. These approaches will allow you to explore the class’s internal structure and decipher the hidden string within.
In this level I am going to use “JD-GUI”, a very useful decompiler tool. I can simply drop the class file into the tool. Let’s see!
JD-GUI: http://java-decompiler.github.io/
As expected, the string is XORed. Let me break it down:
This code defines a private static method called xor, which takes a single argument paramString of type String. The purpose of this method is to perform an XOR encryption-like operation on the characters of the input string.
1. `char[] arrayOfChar1 = paramString.toCharArray();` : The toCharArray() method is called on paramString, converting it into an array of characters, arrayOfChar1. This is done to make it easy to manipulate individual characters.
2. `char[] arrayOfChar2 = new char[arrayOfChar1.length];` : An array `arrayOfChar2` is created with the same length as `arrayOfChar1`. This array will store the result of the XOR operation.
3. `for (byte b = 0; b < arrayOfChar2.length; b++)` : A loop is started that iterates over each character in `arrayOfChar1`, with the loop variable `b` ranging from 0 to `arrayOfChar2.length - 1`.
4. `char c = arrayOfChar1[b];` : Inside the loop, the current character in `arrayOfChar1` is extracted and stored in the variable c.
5. `arrayOfChar2[b] = (char)(c ^ b % 3);` : The XOR operation is performed on the character c using the value of b modulo 3. Because `%` binds tighter than `^`, this is `c ^ (b % 3)`: the character’s numeric value is XORed with 0, 1, or 2 depending on its position.
XOR is a bitwise operation that compares the bits of two values. When two bits are different, the XOR result is 1; when they are the same, the XOR result is 0.
In this case, the XOR operation is used to transform the characters based on their positions in the input string. The effect is to obfuscate the original characters.
6. return new String(arrayOfChar2); : After the loop, the modified arrayOfChar2 is used to create a new String object. This new string contains the result of the XOR operation.
In short, the xor method takes a string as input, converts it into an array of characters, and then performs an XOR-based transformation on each character based on its position in the input string. The resulting obfuscated characters are combined to create a new string, which is then returned as the output of the method.
Because XOR is its own inverse, running the obfuscated string through the same method decodes it, so we can get the real password :)
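Here is a small sketch of that decoding step. The `xor` method follows the decompiled logic described in the steps above (I renamed the variables for readability), and the class name `XorDecode` is my own. Treat the printed value as the candidate password to try against the application.

```java
class XorDecode {
    // Same logic as the decompiled method: XOR each character with (index % 3).
    static String xor(String input) {
        char[] in = input.toCharArray();
        char[] out = new char[in.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (char) (in[i] ^ (i % 3));
        }
        return new String(out);
    }

    public static void main(String[] args) {
        String obfuscated = "aRa2lPT6A6gIqm4RE";
        String decoded = xor(obfuscated);
        System.out.println(decoded); // prints aSc2mRT7C6fKql6RD

        // Applying xor a second time gives back the obfuscated string.
        System.out.println(xor(decoded).equals(obfuscated)); // prints true
    }
}
```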
3. Advanced String Obfuscation
In order to get the password, we first need to understand “advanced bytecode manipulation”. Let me cover it:
Advanced bytecode manipulation in Java refers to the practice of programmatically modifying Java bytecode at a low level. Java bytecode is the intermediate representation of Java code that is generated by the Java compiler and executed by the Java Virtual Machine (JVM).
Bytecode manipulation allows developers to inspect, modify, or generate bytecode instructions, providing them with fine-grained control over the behavior of Java applications at runtime. This process is commonly used in various scenarios, such as:
1. Instrumentation: Bytecode manipulation is often used for instrumentation, which involves modifying Java classes to collect runtime data, monitor performance, or add custom logging. For example, it can be used to add logging statements to methods, track method execution time, or inject custom code for profiling purposes.
2. Aspect-Oriented Programming (AOP): AOP is a programming paradigm that allows cross-cutting concerns (such as logging, security, and transaction management) to be separated from the main business logic. Bytecode manipulation enables AOP frameworks to inject these concerns into the bytecode at runtime without modifying the original source code.
3. Code Generation: Bytecode manipulation is used to dynamically generate classes and methods at runtime. This can be useful in scenarios where code needs to be generated based on runtime conditions or user-defined configurations.
4. Optimizations: Advanced bytecode manipulation techniques can be employed to perform bytecode-level optimizations, such as code size reduction, constant folding, and inlining of methods.
5. Hot Code Swapping: During development and debugging, bytecode manipulation enables tools like hot code swapping, which allows developers to change parts of the code and see the changes immediately without restarting the application.
To achieve bytecode manipulation, developers commonly use libraries and frameworks such as ASM (a popular bytecode manipulation library), ByteBuddy, Javassist, and cglib. These tools provide APIs and utilities that simplify the process of working with bytecode and make it easier to create custom bytecode transformations and manipulations.
Link: https://asm.ow2.io/
It is worth noting that bytecode manipulation is a powerful technique, but it should be used with caution, as incorrect modifications can lead to unexpected behavior and runtime errors. Proper understanding of bytecode, the JVM, and the intended modifications is crucial to ensuring the stability and correctness of the manipulated code.
Now that you have an idea of how code can be deobfuscated, let’s dive into it.
It seems that the file is a JAR file. We can execute it with the `java -jar` command:
I am unable to read it:
During my investigation, I came across a Java disassembler called “Krakatau”. You can read the official docs, but let me explain briefly.
Krakatau provides an assembler and disassembler for Java bytecode, which allows you to convert binary classfiles to a human readable text format, make changes, and convert it back to a classfile, even for obfuscated code. You can also create your own classfiles from scratch by writing bytecode manually, and can examine and compare low level details of Java binaries. Unlike javap, the Krakatau disassembler can handle even highly obfuscated code, and the disassembled output can be reassembled into a classfile.
Krakatau also provides a decompiler for converting Java binaries to readable source code. Unlike other decompilers, the Krakatau decompiler was specifically designed for working with obfuscated code and can easily handle tricks that break other decompilers. However, the Krakatau decompiler does not support some Java 8+ features such as lambdas, so it works best on older code.
Link: https://github.com/Storyyeller/Krakatau
Instead of reversing statically, we can reverse dynamically, because static reversing would be painful here.
Krakatau is written in Rust, so when you clone it to your local machine, you must install Rust and cargo for it to build and run properly.
I got a ZIP file:
It seems that 3 Java classes were stored in disassembled.zip, which I then unzipped.
1. bipush: pushes a byte onto the stack as an integer value.
2. invokestatic: invokes a static method and puts the result on the stack (might be void); the method is identified by a method reference index in the constant pool (indexbyte1 << 8 | indexbyte2).
{3,38}?
3. invokevirtual: invokes a virtual method on the object objectref and puts the result on the stack (might be void); the method is identified by a method reference index in the constant pool (indexbyte1 << 8 | indexbyte2).
Could it be “1.a(3, 38)”?
I think that the result of 1.a(3, 38) will be passed to println(); the invokevirtual instruction is the print call. And I reckon that it will give “”
Let me explain the above:
L82: getstatic [42]: is an instruction used to read a static field of a class. In this case, it gets the value of the static field referenced by constant pool index 42.
L85: iconst_4: is an instruction to push the integer value 4 onto the stack.
L86: bipush 87: is used to push a byte-sized (8-bit) constant value onto the stack. In this case, it is pushing the value 87 onto the stack.
L88: invokestatic [15]: is an instruction used to invoke a static method of a class. In this case, it is invoking the static method referenced by constant pool index 15.
L91: invokevirtual [48]: is used to invoke a non-static (instance) method. It is invoking the virtual method referenced by constant pool index 48.
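To show how a listing like this comes out of raw bytes, here is a tiny disassembler sketch that handles only these five instructions. The byte array is my own reconstruction from the listing using the standard opcode values (getstatic 0xB2, iconst_4 0x07, bipush 0x10, invokestatic 0xB8, invokevirtual 0xB6); it is not copied from the actual class file, and the names `MiniDisasm`, `disassemble`, and `index` are hypothetical.

```java
class MiniDisasm {
    // Decodes the instructions in code; startOffset is the bytecode
    // offset of the first byte, used only for the Lxx labels.
    static String disassemble(byte[] code, int startOffset) {
        StringBuilder sb = new StringBuilder();
        int pc = 0;
        while (pc < code.length) {
            int opcode = code[pc] & 0xFF;
            sb.append('L').append(startOffset + pc).append(": ");
            switch (opcode) {
                case 0x07: // iconst_4 has no operand
                    sb.append("iconst_4");
                    pc += 1;
                    break;
                case 0x10: // bipush takes one signed byte
                    sb.append("bipush ").append(code[pc + 1]);
                    pc += 2;
                    break;
                case 0xB2: // getstatic takes a two-byte constant pool index
                    sb.append("getstatic [").append(index(code, pc)).append(']');
                    pc += 3;
                    break;
                case 0xB6:
                    sb.append("invokevirtual [").append(index(code, pc)).append(']');
                    pc += 3;
                    break;
                case 0xB8:
                    sb.append("invokestatic [").append(index(code, pc)).append(']');
                    pc += 3;
                    break;
                default:
                    throw new IllegalArgumentException("opcode not handled in this sketch: " + opcode);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    // (indexbyte1 << 8) | indexbyte2, treating both bytes as unsigned.
    static int index(byte[] code, int pc) {
        return ((code[pc + 1] & 0xFF) << 8) | (code[pc + 2] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] code = {
            (byte) 0xB2, 0x00, 0x2A,  // getstatic [42]
            0x07,                     // iconst_4
            0x10, 0x57,               // bipush 87
            (byte) 0xB8, 0x00, 0x0F,  // invokestatic [15]
            (byte) 0xB6, 0x00, 0x30   // invokevirtual [48]
        };
        System.out.print(disassemble(code, 82));
    }
}
```

Note how the instruction sizes (3, 1, 2, 3 bytes) explain why the labels jump from L82 to L85, L86, L88, and L91.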
Now that we know everything about this manipulation, we are going to use Krakatau again. I had already manipulated the bytecode, so just watch and follow along.
As you can see, 3 classes appear. 0.j is the one being manipulated. We will be using “assemble.py” to assemble the .j files.
Interestingly enough, the “result.jar” is corrupted. Instead, we can run just the one class file that contains the main method.
There is a flag -cp → It is a command-line option that stands for “classpath.” The classpath is used to specify the location(s) where the JVM should look for classes and resources that are required by the Java application being executed.
When you run a Java program from the command line, the JVM needs to know where to find the compiled .class files for the classes used in the program and any required libraries (JAR files) that the program depends on. The -cp option allows you to set the classpath to one or more directories or JAR files, separated by the system-specific path separator.
0 → is the bytecode file that was manipulated.
As I said: Bytecode manipulation is a technique used in software development, particularly in the context of languages that are compiled to bytecode, such as Java and Kotlin. It involves programmatically modifying the bytecode of a program at runtime, before or during its execution. This allows developers to add, remove, or modify instructions within the compiled code, which can be useful for various purposes.
You can use:
- https://asm.ow2.io/
- xxDark/SSVM: Java VM running on a JVM (github.com)
- Col-E/Recaf: The modern Java bytecode editor (github.com)
I also recommend you watch this video: https://youtu.be/0EWWBrvU6rs
Conclusion
I must say that dynamically reversing was the most challenging part. I learned a lot about dealing with bytecode, which is crucial for optimizing and altering runtime behavior. I had a lot of fun while writing and explaining this topic. It is highly recommended to learn programming languages to have a better understanding of them during the reverse engineering process. The important thing is not completing it within 2 or 3 days but rather focusing on what you learn from it, using different tools and techniques. I will cover one more topic about OBF, but first, I need to understand and complete it.
Ahmet | Threat Cases Operator