This tutorial series will guide you through the basics of decompiling a C++ executable, from setup all the way to reversing C++ classes.
The video tutorial is created by James Tate over on his excellent YouTube channel, and it is highly recommended that you subscribe here: James Tate - YouTube.
The first step, of course, is to download Ghidra if you haven’t already, which you can do from the official site:
At the time of writing this tutorial, the version of Ghidra was 10.2.3.
You will also need a Java Development Kit (JDK) version 17+, which you can download from the AdoptOpenJDK official site: AdoptOpenJDK - Open source, prebuilt OpenJDK binaries.
You can now run Ghidra from the extracted folder by running the main script from bash (or double-clicking on it):
./ghidraRun
It may ask you for your JDK path. Enter the directory where you installed your OpenJDK at the prompt:
******************************************************************
JDK 17+ (64-bit) could not be found and must be manually chosen!
******************************************************************
Enter path to JDK home directory:
If you already have Java installed and just need to find the JDK home directory, you can execute the following:
> which javac # returns location of the java compiler
> javac -version # returns the version of the java compiler
Note that on macOS the JDK was installed to: /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
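The same lookup can be sketched in Python. This is a minimal sketch that assumes the conventional `<JDK home>/bin/javac` layout; the helper name `jdk_home_from_javac` is hypothetical and not part of Ghidra or the JDK.

```python
import os
import shutil


def jdk_home_from_javac(javac_path):
    """Return the directory two levels above javac (assumes <home>/bin/javac)."""
    real = os.path.realpath(javac_path)  # follow symlinks such as /usr/bin/javac
    return os.path.dirname(os.path.dirname(real))


if __name__ == "__main__":
    javac = shutil.which("javac")  # same lookup as `which javac`
    if javac is None:
        print("javac not found on PATH")
    else:
        print(jdk_home_from_javac(javac))
```

On systems where `javac` on the PATH is a launcher stub rather than a symlink (as is typically the case on macOS), this may not lead to the real JDK home, so treat the result as a starting point.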
First of all, you need a project to start reverse-engineering a binary executable. To do this, use File -> New Project.
Select Non-Shared Project, give it a name such as Example, and click Finish.
To follow along in this tutorial, you can either compile the sample code provided or download the pre-compiled executables.
Both are available on James’s GitHub repository: GitHub - james-tate/ghidraExampleSource.
Note that there are two pre-compiled executables in this repository: one is stripped (which means it doesn’t have any debug symbols) and the other is standard.
You can use the compiler of your choice as long as it supports C++. So, if you have a special compiler for PS2/Dreamcast/Xbox/GameCube, etc., feel free to use that. But bear in mind that importing executables for those systems will require a third-party plugin known as a loader.
You can import a file into Ghidra very simply with: File -> Import File. Find the executable file that you built with your C++ compiler.
This will open the import dialog. In this tutorial, we also want to load in the external libraries. This makes it easier to reverse engineer, as you can swap between the main executable and the libraries really easily in Ghidra.
To do this, click “Options” and set the Library Paths in the dialog.
It will show the Import Results dialog with a lot of interesting information it found about the binary.
Now, finally, double-click on the example executable to unleash the Ghidra!
It will now start importing the file and ask you if you want to analyze it. Select “Yes” and keep the default settings.
If you have symbols, you can use the Navigation -> Go To... menu and type “main”. But if you don’t have symbols (e.g. you used the stripped version), then you will need to find it yourself.
To find it manually, go to the .text section, and it will take you to the entry function. If you are using the same example as the video tutorial, then you will have a __libc_start_main function, and its first parameter is a function pointer to the main function.
If you are using a different executable or compiled with a different compiler, this can be set up differently. But entry will call main somewhere, so finding it may require a bit of debugging with a debugger such as gdb or an emulator’s built-in debugger.
When you have found what you believe to be the main method, right-click on the auto-generated function name, and select “Rename Function”.
One of the main advantages of Ghidra is its free, out-of-the-box decompiler. Now that you have found the main function, it is easy to decompile it by going to Window -> Decompile.
If you have debug symbols in the executable then it will look very similar to the original source:
In this section, we will learn how to use structures in Ghidra by applying them to data and navigating through the program using cross-references. We will also learn how to change the function signature to improve data presentation and how to create an array and apply it to a global offset.
Before we can use structures in Ghidra, we need to set them up. To do this, we can follow these steps:
Open the program in Ghidra and go to the Data Type Manager:
Create a new structure and name it:
Add fields to the structure and set their data types and offsets.
Save the structure.
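To make the idea of fields, data types and offsets concrete, here is a minimal Python sketch using `ctypes`. The structure `Example`, its field names and its types are hypothetical placeholders and are not taken from the sample binary; the helper `layout` simply lists each field with its offset and size, much like the table in a structure editor.

```python
import ctypes


class Example(ctypes.Structure):
    """Hypothetical 12-byte structure; field names and types are placeholders."""
    _fields_ = [
        ("id", ctypes.c_uint32),      # offset 0x0, 4 bytes
        ("flag", ctypes.c_uint8),     # offset 0x4, 1 byte
        ("pad", ctypes.c_uint8 * 3),  # offset 0x5, explicit padding
        ("value", ctypes.c_uint32),   # offset 0x8, 4 bytes
    ]


def layout(struct_type):
    """Return (name, offset, size) for every field of a ctypes structure."""
    return [
        (name, getattr(struct_type, name).offset, getattr(struct_type, name).size)
        for name, _ in struct_type._fields_
    ]


if __name__ == "__main__":
    for name, offset, size in layout(Example):
        print("0x%x  %-6s %d bytes" % (offset, name, size))
```

If a field such as `flag` were left out of the definition, the bytes at that offset would stay undefined in the structure.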
Once we have set up the structure, we can apply it to data by following these steps:
Note that if you get something similar to:
yourStructName.field0x4._0_1_
Then this means that at offset 0x4 in the struct we have an undefined field for the structure.
There is also a shortcut for doing this directly from the decompile view by right-clicking and selecting “Auto Create Structure”.
To see where the global structure or function is being used, we can go to the listing view and look at the cross-references. The cross-references show us everywhere in the program that is referencing that particular global variable. We can double-click on the cross-reference to quickly navigate to that location in the program.
To navigate through the program using cross-references, we can follow these steps:
To change the function signature in Ghidra, we can follow these steps:
We can also use Ghidra to create arrays. To do this, we first need to identify the size of the elements in the array. In our example, we can see that the size of each element is 4 bytes. We can then right-click on the global variable and select “Create Array”. We can then specify the number of elements we want to create, making sure not to create too many and overwrite existing data.
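The element-count check can be sketched in a few lines of Python. This is only an illustration: the addresses used in the usage note are made up, and little-endian byte order is an assumption about the target.

```python
import struct

ELEMENT_SIZE = 4  # bytes per element, as in the tutorial's example


def max_elements(array_start, next_defined_addr, element_size=ELEMENT_SIZE):
    """Largest element count that fits before the next defined data item."""
    return (next_defined_addr - array_start) // element_size


def read_u32_array(raw, count):
    """Interpret the first count*4 bytes of raw as little-endian unsigned 32-bit values."""
    return list(struct.unpack_from("<%dI" % count, raw, 0))
```

For example, with a hypothetical array starting at 0x21000 and the next defined symbol at 0x21010, `max_elements(0x21000, 0x21010)` gives 4, so asking for more than 4 elements would run into the neighbouring data.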
To create an array in Ghidra, we can follow these steps:
Follow these easy steps to analyze and identify classes in Ghidra.
In the video, the instructor shows this code in the decompile window:
ppcVar1 = operator.new(0x14);
FUN_000111f4(ppcVar1);
(***ppcVar1)(ppcVar1, ranNum);
(**(*ppcVar1 + 0xc))(ppcVar1, ranNum & 0xffff);
Take a close look at the code and try to identify the class constructor and virtual function calls. This will help you understand the class structure better.
The variable ppcVar1 can be renamed to this, as it represents the this pointer of the class that was created with operator.new.
Note that operator.new only appears if you have added the external libc library that it was compiled with to the project.
The line FUN_000111f4(ppcVar1); is most likely a constructor call, as it comes directly after the new call and also takes in the this pointer.
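The two indirect calls in the snippet go through the object's virtual table: the first dereferences the table at offset 0, the second at offset 0xc. A minimal Python sketch of the arithmetic follows, assuming a 32-bit target (4-byte pointers) and a single vtable pointer stored at offset 0 of the object. Both are assumptions about the sample binary, and the helper names are hypothetical.

```python
POINTER_SIZE = 4  # assumption: 32-bit target, so each vtable entry is 4 bytes


def vtable_slot(byte_offset, pointer_size=POINTER_SIZE):
    """Convert a byte offset into the vtable to a zero-based slot index."""
    if byte_offset % pointer_size != 0:
        raise ValueError("offset is not aligned to a vtable entry")
    return byte_offset // pointer_size


def member_bytes(object_size, pointer_size=POINTER_SIZE):
    """Bytes left for data members after the vtable pointer at offset 0."""
    return object_size - pointer_size
```

Under these assumptions, `vtable_slot(0xc)` is slot 3 (the fourth virtual function), and the `operator.new(0x14)` allocation leaves `member_bytes(0x14)`, that is 16 bytes, for data members.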
Now that you have a better understanding of the code, let’s create a class in Ghidra:
thiscall
and save.
Take some time to identify the data types of the class members. Once you know what each member is, update their names to make your code easier to understand.
You will notice that the constructor calls a function at the start. This is the constructor for the base class.
If you click on the PTR___cxa_pure_virtual_000117ec it will take you to the listing view, where it shows three other virtual functions:
It’s time to create a structure to represent the VTable for the Base class:
func *
If you click on PTR_FUN_000112a8+1_000117c you will be taken to the listing view with 6 functions listed:
You can change these all to __thiscall, as they are all functions that will go into the VTable.
In this tutorial, we will learn how to analyze a derived class in C++ and rename its functions for better understanding. We will start by setting up the derived class and then analyze its functions one by one.
If we go through all of the virtual functions in the VTable, we will eventually find the destructor for the class, which calls operator.delete (if you have the libc library).
__thiscall
~ClassNameDestructor
In this tutorial, we’ll explore a derived class constructor and its associated members. We’ll also create a virtual table pointer for better understanding of the virtual function calls.
The derived class has the following decompilation after setting most of the variable names:
/* DISPLAY WARNING: Type casts are NOT being printed */
void __thiscall Nest::Nest(Nest *this)
{
  bool bVar1;
  char *pcVar2;
  bool bVar3;
  char *pcVar4;
  char *local_10;
  char *local_c;

  *&this->vptr = &NestVtable;
  pcVar2 = malloc(0x20);
  this->hashsub1 = pcVar2;
  pcVar2 = malloc(0x20);
  this->hashsub2 = pcVar2;
  puts("Creating Nest Object");
  this->0x1337 = 0x1337;
  this->hash = "8689d701c21f91c4085f08d9a411c629";
  local_10 = this->hashsub2;
  local_c = this->hashsub1;
  bVar3 = false;
  while (bVar1 = bVar3, *this->hash != '\0') {
    pcVar4 = this->hash;
    this->hash = pcVar4 + 0x1;
    pcVar2 = local_c + 0x1;
    *local_c = *pcVar4;
    bVar3 = bVar1 ^ 0x1;
    local_c = pcVar2;
    if (bVar1) {
      *local_10 = *this->hash;
      local_10 = local_10 + 0x1;
      local_c = pcVar2;
    }
  }
  return;
}
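To check your reading of the copy loop, it can help to transliterate it into a few lines of Python. The sketch below follows the listing literally, reading local_c as the write cursor into hashsub1 and local_10 as the write cursor into hashsub2; that reading is an assumption, and the original source may well look different. The function name `split_hash` is hypothetical.

```python
def split_hash(hash_bytes):
    """Literal model of the constructor loop; hash_bytes must end with a NUL byte."""
    sub1 = bytearray()  # bytes written through local_c (hashsub1)
    sub2 = bytearray()  # bytes written through local_10 (hashsub2)
    pos = 0             # models the advancing this->hash pointer
    odd = False         # models bVar3 (copied into bVar1 at the top of the loop)
    while hash_bytes[pos] != 0:
        current = hash_bytes[pos]
        pos += 1                     # this->hash = pcVar4 + 1
        sub1.append(current)         # *local_c = *pcVar4
        was_odd, odd = odd, not odd  # bVar1 = bVar3; bVar3 = bVar1 ^ 1
        if was_odd:
            # *local_10 = *this->hash, read after the pointer was advanced
            sub2.append(hash_bytes[pos])
    return bytes(sub1), bytes(sub2)
```

Read this way, every character goes into the first buffer, while the second buffer receives the character that follows each odd-numbered iteration, ending with the terminating NUL. The loop also advances this->hash itself, so the member no longer points at the start of the string once the constructor returns.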
Global classes are set up before the main function is even called, in a function called init.
Reverse engineering on shared libraries can be a time-consuming task, especially when dealing with embedded systems. In this tutorial, we will explore the tools and capabilities available in native Linux, as well as the scripting interface and headless analysis tool that Ghidra offers. We will use a nonsensical example to show how to use Ghidra’s headless analysis tool to scan multiple shared libraries in order to speed up your analysis.
Before starting, make sure that you have the following tools installed on your system:
ldd - to list shared library dependencies
objdump - to display information about object files
You will also need a set of shared libraries to work with. You can download the example libraries from the author’s GitHub page.
ldd and objdump
We can use the ldd command to list the shared library dependencies for a given binary. For example, running ldd <binary> will show which shared libraries the binary will try to pull in to execute.
We can also use objdump to display information about object files. For example, running objdump -T <binary> will show the binary’s dynamic symbol table, which includes its exported symbols.
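When there are many binaries to check, the ldd output can be collected from a script. The following is a minimal Python sketch, assuming the usual `name => path (address)` line format printed by glibc's ldd; the helper names `parse_ldd` and `run_ldd` are hypothetical.

```python
import subprocess


def parse_ldd(output):
    """Map each library name to its resolved path (None when not found)."""
    deps = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # entries without a mapping, e.g. linux-vdso or the loader
        name, _, target = line.partition("=>")
        name, target = name.strip(), target.strip()
        if target.startswith("("):
            continue  # only a load address, no file on disk
        if target.startswith("not found"):
            deps[name] = None
        else:
            # drop the trailing load address, e.g. "(0x00007f...)"
            deps[name] = target.split(" (")[0]
    return deps


def run_ldd(binary):
    """Run ldd on a binary and parse its output (requires ldd on PATH)."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)
```

Libraries mapped to `None` are the ones the loader could not find. Note that ldd may execute code from the binary on some systems, so it is safer not to run it on untrusted files.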
To use Ghidra for reverse engineering shared libraries, we first need to load the shared libraries into the project. We can do this by selecting “File > Import > External Libraries” and then selecting the shared libraries we want to load.
We can then use Ghidra to analyze the shared libraries. For example, we can click on a function in the binary and Ghidra will automatically switch to the location of where that function lives inside of the shared library.
However, if we have hundreds of binaries or shared libraries to analyze, this process can be time-consuming. In such cases, we can use Ghidra’s headless analysis tool and scripting interface.
To use Ghidra’s headless analysis tool, we need to create a Python script that uses the Ghidra FlatProgramAPI to perform our analysis. Here, we will create a script to extract the names of objects created by calling the setName function:
from ghidra.program.model.address import Address

# Walk every instruction in the program in forward order.
instructions = currentProgram.getListing().getInstructions(True)
for instruction in instructions:
    mnemonic = instruction.getMnemonicString()
    if mnemonic == "CALL":
        operands = instruction.getOpObjects(0)
        # Only direct calls carry an address operand; skip register/indirect calls.
        if len(operands) == 0 or not isinstance(operands[0], Address):
            continue
        func = getFunctionContaining(operands[0])
        callingFunc = getFunctionContaining(instruction.getAddress())
        if func is not None and func.getName() == "setName":
            # Walk backward from the call while still inside the calling function.
            inst = instruction.getPrevious()
            while inst is not None and getFunctionContaining(inst.getAddress()) == callingFunc:
                for i in range(inst.getNumOperands()):
                    for ref in inst.getOperandReferences(i):
                        if ref.getReferenceType().isData():
                            data = getDataAt(ref.getToAddress())
                            if data is not None and data.getDataType().toString() == "string":
                                print("Found name of the {}() to be {} in {}".format(
                                    callingFunc, data, currentProgram))
                inst = inst.getPrevious()
This script gets a list of all the instructions starting at the first instruction, then goes through each of those instructions and gets the mnemonic string. We only want it to print out the string whenever it’s a CALL. We then get the address in which we are making a call to and walk backward from this call to get each instruction until we find an instruction that is loading a string. We print out that string value.
You can execute this script by importing it in the Ghidra Script Manager.
We can also use this script with the analyzeHeadless tool to automate our analysis. For example, we can run the following command:
./analyzeHeadless $(pwd) names -import $(pwd)/*.so -recursive -postScript my_script.py
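If you drive the headless analyzer from another script, the same command line can be assembled programmatically. Here is a minimal Python sketch; the helper name `headless_command` and the paths in the usage example are hypothetical, while the flags are the ones used in the command above.

```python
import glob
import os


def headless_command(ghidra_support_dir, project_dir, project_name, libraries, script):
    """Build the analyzeHeadless argument list: import the libraries, then run a post-script."""
    cmd = [
        os.path.join(ghidra_support_dir, "analyzeHeadless"),
        project_dir,
        project_name,
        "-import",
    ]
    cmd.extend(libraries)
    cmd.extend(["-recursive", "-postScript", script])
    return cmd


if __name__ == "__main__":
    libs = sorted(glob.glob(os.path.join(os.getcwd(), "*.so")))
    print(" ".join(headless_command(".", os.getcwd(), "names", libs, "my_script.py")))
```

The resulting list can be passed to `subprocess.run`. If the script does not live in one of Ghidra's default script directories, you may also need to point Ghidra at it with the `-scriptPath` option.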