Introduction to Decompiling C++ with Ghidra

Updated: 1st September 2019

Introduction

This tutorial series will guide you through the basics of decompiling a C++ executable, from setup all the way to reversing C++ classes.

The video tutorial is created by James Tate over on his excellent YouTube channel, and it is highly recommended that you subscribe here: James Tate - YouTube.

Download and Run Ghidra

The first step, of course, is to download Ghidra if you haven’t already, which you can do from the official site:

Download Ghidra from the Official Site

At the time of writing this tutorial, the version of Ghidra was 10.2.3.

You will also need a Java Development Kit (JDK), version 17 or later, which you can download from the official AdoptOpenJDK site: AdoptOpenJDK - Open source, prebuilt OpenJDK binaries.

You can now run Ghidra from the extracted folder by running the main script from bash (or double-clicking on it):

./ghidraRun

It may ask you for your JDK path. Enter the directory where you installed OpenJDK ¹ at the prompt:

******************************************************************
JDK 17+ (64-bit) could not be found and must be manually chosen!
******************************************************************
Enter path to JDK home directory: 

If you already have Java installed and just need to find the JDK home directory, you can run the following:

> which javac # returns location of the java compiler
> javac -version # returns the version of the java compiler

Note that on macOS the JDK was installed to: /Library/Java/JavaVirtualMachines/temurin-17.jdk/Contents/Home
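The lookup can also be scripted. The following is a minimal sketch, assuming the usual layout in which javac lives at <JDK home>/bin/javac; the helper name jdk_home_from_javac is made up for this example and is not part of Ghidra or the JDK.

```python
import os
import shutil


def jdk_home_from_javac(javac_path):
    """Return the JDK home directory for a given javac executable path."""
    # Resolve symlinks first: a javac on the PATH is often a link into the real JDK.
    real_path = os.path.realpath(javac_path)
    # Assumption: javac sits at <JDK home>/bin/javac, so go up two levels.
    return os.path.dirname(os.path.dirname(real_path))


if __name__ == "__main__":
    javac = shutil.which("javac")  # same lookup as `which javac`
    if javac is None:
        print("javac was not found on the PATH")
    else:
        print(jdk_home_from_javac(javac))
```

The printed directory is the value to enter at the JDK home prompt, provided the installation follows that layout.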

Create a New Project

First of all, you need a project to start reverse-engineering a binary executable. To do this, use File -> New Project.

Select Non-Shared project, give it a name such as Example and click Finish.

Obtaining Your Binary Executable to Reverse

To follow along in this tutorial, you can either compile the sample code provided or download the pre-compiled executables.

Both are available on James’s GitHub repository: GitHub - james-tate/ghidraExampleSource.

Note that there are two pre-compiled executables in this repository: one is stripped (which means it doesn’t have any debug symbols) and the other is standard.

You can use the compiler of your choice as long as it supports C++. So, if you have a special compiler for PS2/Dreamcast/Xbox/Gamecube, etc., feel free to use that. But bear in mind that importing executables for those systems will require a third-party plugin known as a loader.

Import Your Binary Executable

You can import a file into Ghidra with File -> Import File. Select the executable file that you built with your C++ compiler.

This will open the import dialog. In this tutorial, we also want to load in the external libraries. This makes it easier to reverse engineer, as you can swap between the main executable and the libraries really easily in Ghidra. ²

To do this, click “Options” and set the Library Paths in the dialog.

It will show the Import Results dialog with a lot of interesting information it found about the binary.

Now, finally, double-click on the example executable to unleash the Ghidra!

Ghidra will now open the file and ask whether you want to analyze it. Select “Yes” and keep the default settings.

How to Find the Main Function

If you have symbols, you can use the Navigation -> Go To... menu and type “main”. But if you don’t have symbols (e.g., you used the stripped version), then you will need to find it yourself.

To find it manually, go to the .text section, which will take you to the entry function. If you are using the same example as the video tutorial, entry will call __libc_start_main, and the first parameter of that call is a function pointer to the main function.

If you are using a different executable or compiled with a different compiler, this can be set up differently. But entry will call main somewhere, so it may require a bit of debugging with a debugger such as gdb or an emulator’s built-in debugger.

When you have found what you believe to be the main function, right-click on the auto-generated function name and select “Rename Function”.

Decompile the Main Function

One of Ghidra’s main advantages is its free, out-of-the-box decompiler. Now that you have found the main function, you can decompile it by going to Window -> Decompile.

If the executable has debug symbols, the output will look very similar to the original source.

Using Structures in Ghidra

In this section, we will learn how to use structures in Ghidra by applying them to data and navigating through the program using cross-references. We will also learn how to change the function signature to improve data presentation and how to create an array and apply it to a global offset ³.

Setting Up Structures in Ghidra

Before we can use structures in Ghidra, we need to set them up. To do this, we can follow these steps:

Open the program in Ghidra and go to the Data Type Manager.
Create a new structure and name it.
Add fields to the structure and set their data types and offsets.
Save the structure.

Once we have set up the structure, we can apply it to data by following these steps:

Highlight the data and right-click.
Choose “Data Type” and select the structure we created.
Click “Apply” to apply the structure to the data.

Note that you may see something similar to:

yourStructName.field0x4._0_1_

This means that the structure has an undefined field at offset 0x4; the _0_1_ suffix indicates that 1 byte is being accessed at offset 0 within that field.

There is also a shortcut for creating a structure directly from the decompile view: right-click and select “Auto Create Structure”.
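To make the idea of named fields at fixed offsets concrete, here is a small sketch that models a structure outside Ghidra with Python’s ctypes module. The structure name, field names, and types are invented for illustration; the only thing taken from the text is an unidentified field starting at offset 0x4.

```python
import ctypes


class Example(ctypes.Structure):
    # Hypothetical layout: every name and type here is an assumption.
    _fields_ = [
        ("id", ctypes.c_uint32),            # offset 0x0, 4 bytes
        ("field_0x4", ctypes.c_uint8 * 4),  # offset 0x4, not yet identified
        ("count", ctypes.c_uint32),         # offset 0x8, 4 bytes
    ]


# Print each field with its offset and size, much like the structure editor shows them.
for name, _ctype in Example._fields_:
    field = getattr(Example, name)
    print("%-10s offset=0x%x size=%d" % (name, field.offset, field.size))
```

Once the purpose of the bytes at offset 0x4 is understood, the placeholder would be replaced with a properly typed and named field, which is the same refinement you perform in Ghidra’s structure editor.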

Creating Arrays and Changing Function Signatures

Navigating through the Program with Cross-References

To see where the global structure or function is being used, we can go to the listing view and look at the cross-references. The cross-references show us everywhere in the program that is referencing that particular global variable. We can double-click on the cross-reference to quickly navigate to that location in the program.

To navigate through the program using cross-references, we can follow these steps:

Go to the listing view and look for the cross-references.
Click on the cross-reference to go directly to the function.
Note the ‘R’ or ‘W’ beside the Cross Reference indicating whether the function Reads or Writes to it.

Changing Function Signatures and Naming

To change the function signature in Ghidra, we can follow these steps:

Highlight the function and right-click.
Choose “Edit Function Signature”.
Change the data type to the correct type (in this case, a global structure pointer).
Click “OK” to save the changes.

Creating Arrays

We can also use Ghidra to create arrays. To do this, we first need to identify the size of the elements in the array. In our example, we can see that the size of each element is 4 bytes. We can then right-click on the global variable and select “Create Array”. We can then specify the number of elements we want to create, making sure not to create too many and overwrite existing data.

To create an array in Ghidra, we can follow these steps:

Highlight the data and right-click.
Choose “Data Type” and select “Create Array”.
Choose the number of elements and the data type.
Click “OK” to create the array.
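The warning about not creating too many elements comes down to simple arithmetic. The sketch below assumes you know the address of the array and the address of the next defined item after it; the function name and the example addresses are hypothetical.

```python
def max_array_elements(start, next_defined, element_size=4):
    """Return how many elements fit between start and the next defined item."""
    if element_size <= 0:
        raise ValueError("element_size must be positive")
    if next_defined < start:
        raise ValueError("next_defined must not be below start")
    # Only whole elements count; any leftover bytes stay outside the array.
    return (next_defined - start) // element_size


# Hypothetical addresses: 0x28 bytes are free before the next defined data item.
print(max_array_elements(0x00021000, 0x00021028))
```

With 4-byte elements and 0x28 (40) bytes available, the array can hold at most 10 elements without overwriting the data that follows it.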

Analyzing and Identifying C++ Classes in Ghidra

Follow these easy steps to analyze and identify classes in Ghidra.

Step 1: Identify C++ Instance creation logic

In the video, the instructor shows this code in the decompile window:

ppcVar1 = operator.new(0x14);
FUN_000111f4(ppcVar1);
(***ppcVar1)(ppcVar1, ranNum);
(**(*ppcVar1 + 0xc))(ppcVar1, ranNum & 0xffff);

Take a close look at the code and try to identify the class constructor and virtual function calls. This will help you understand the class structure better.

The variable ppcVar1 can be renamed to this as it represents the this pointer of the class that was created with operator.new.

Note that operator.new only appears if you have added the external libC library that it was compiled with to the project.

The line FUN_000111f4(ppcVar1); is most likely a constructor call as it comes directly after the new call and also takes in the this pointer.
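The two indirect calls go through the virtual table: the first pointer stored in the object is the vtable pointer, and the offset added to it selects a slot. The sketch below converts such an offset into a slot index. It assumes a 32-bit target with 4-byte pointers and that the 0xc in the listing is a byte offset; both are assumptions, and the function name is made up.

```python
POINTER_SIZE = 4  # assumption: 32-bit target, so each vtable entry is 4 bytes


def vtable_slot(byte_offset, pointer_size=POINTER_SIZE):
    """Convert a byte offset into a vtable to a zero-based slot index."""
    if byte_offset % pointer_size != 0:
        raise ValueError("offset is not aligned to a vtable entry")
    return byte_offset // pointer_size


print(vtable_slot(0x0))  # (***ppcVar1)(ppcVar1, ranNum)          -> slot 0
print(vtable_slot(0xc))  # (**(*ppcVar1 + 0xc))(ppcVar1, ...)     -> slot 3
```

Under those assumptions, the first call uses the first virtual function and the second call uses the fourth, which helps when matching calls to entries in the vtable listing later on.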

Step 2: Create a class in Ghidra

Now that you have a better understanding of the code, let’s create a class in Ghidra:

Edit the constructor’s function signature, set the calling convention to __thiscall, and save.
Now when you right-click on the first parameter to the constructor function you can choose “Auto Create Class” to create the class.
Give the auto-generated class a more meaningful name.

Step 3: Give your class members meaningful names

Take some time to identify the data types of the class members. Once you know what each member is, update their names to make your code easier to understand.

Step 4: Set up the virtual table for the base class

You will notice that the constructor calls a function at the start. This is the constructor for the base class.

If you click on PTR___cxa_pure_virtual_000117ec, it will take you to the listing view, where it shows three other virtual functions.

It’s time to create a structure to represent the VTable for the Base class:

Create a new structure (New -> Structure) called “BaseVtable” with a virtual function.
Add the other virtual functions in the same way (func *).
Now do the same for the Derived class, as it will override some of the virtual functions.

If you click on PTR_FUN_000112a8+1_000117c, you will be taken to the listing view with 6 functions listed.

You can change these all to __thiscall as they are all the functions that will go into the VTable.

Destructors

In this tutorial, we will learn how to analyze a derived class in C++ and rename its functions for better understanding. We will start by setting up the derived class and then analyze its functions one by one.

Detecting Destructors

If you go through all of the virtual functions in the VTable, you will eventually find the destructor for the class, which calls operator.delete (if you have the libc library).

Set it to __thiscall.
Rename it to a suitable destructor name, such as ~ClassNameDestructor.

Derived Class Constructors

In this tutorial, we’ll explore a derived class constructor and its associated members. We’ll also create a virtual table pointer for better understanding of the virtual function calls.

Analyzing the Derived Class Constructor

The derived class has the following decompilation after setting most of the variable names:

/* DISPLAY WARNING: Type casts are NOT being printed */

void __thiscall Nest::Nest(Nest *this)
{
  bool bVar1;
  char *pcVar2;
  bool bVar3;
  char *pcVar4;
  char *local_10;
  char *local_c;

  *&this->vptr = &NestVtable;
  pcVar2 = malloc(0x20);
  this->hashsub1 = pcVar2;
  pcVar2 = malloc(0x20);
  this->hashsub2 = pcVar2;
  puts("Creating Nest Object");
  this->0x1337 = 0x1337;
  this->hash = "8689d701c21f91c4085f08d9a411c629";
  local_10 = this->hashsub2;
  local_c = this->hashsub1;
  bVar3 = false;

  while (bVar1 = bVar3, *this->hash != '\0') {
    pcVar4 = this->hash;
    this->hash = pcVar4 + 0x1;
    pcVar2 = local_c + 0x1;
    *local_c = *pcVar4;
    bVar3 = bVar1 ^ 0x1;
    local_c = pcVar2;
    if (bVar1) {
      *local_10 = *this->hash;
      local_10 = local_10 + 0x1;
      local_c = pcVar2;
    }
  }
  return;
}
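The loop in the constructor is easier to follow when it is modelled outside the decompiler. The sketch below is one reading of the listing, taking local_c as the write cursor for hashsub1 and local_10 as the write cursor for hashsub2. It is not the original source, and the function name is invented.

```python
def split_hash(hash_text):
    """Model the constructor loop; returns the (hashsub1, hashsub2) contents."""
    sub1 = []       # bytes written through local_c
    sub2 = []       # bytes written through local_10
    toggle = False  # plays the role of bVar3/bVar1
    pos = 0         # position of this->hash within the string
    while pos < len(hash_text):
        current = hash_text[pos]
        pos += 1              # this->hash = pcVar4 + 1
        sub1.append(current)  # *local_c = *pcVar4
        if toggle:
            # *local_10 = *this->hash: the character after the one just copied,
            # which is the terminating NUL once the end of the string is reached.
            sub2.append(hash_text[pos] if pos < len(hash_text) else "\0")
        toggle = not toggle   # bVar3 = bVar1 ^ 1
    return "".join(sub1), "".join(sub2)


first, second = split_hash("8689d701c21f91c4085f08d9a411c629")
print(len(first), len(second))
```

Read this way, every character of the hash is copied into hashsub1, while hashsub2 receives a character only on every second iteration, so it ends up half as long.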

C++ Classes Stack and Global Classes

Global classes are set up before the main function is even called, in a function called init.

Ghidra Shared Library Scripting and Headless Analysis

Reverse engineering on shared libraries can be a time-consuming task, especially when dealing with embedded systems. In this tutorial, we will explore the tools and capabilities available in native Linux, as well as the scripting interface and headless analysis tool that Ghidra offers. We will use a nonsensical example to show how to use Ghidra’s headless analysis tool to scan multiple shared libraries in order to speed up your analysis.

Prerequisites

Before starting, make sure that you have the following tools installed on your system:

ldd - to list shared library dependencies
objdump - to display information about object files
Ghidra

You will also need a set of shared libraries to work with. You can download the example libraries from the author’s GitHub page.

Analyzing Shared Libraries

Using `ldd` and `objdump`

We can use the ldd command to list the shared library dependencies for a given binary. For example, running ldd <binary> will show which shared libraries the binary will try to pull in to execute.

We can also use objdump to display information about object files. For example, running objdump -T <binary> will show the exported symbols from the binary.
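When many binaries need checking, the ldd output can be collected programmatically. The sketch below parses the usual “name => path (address)” lines into a dictionary. The sample text, including the library name libexample.so, is made up for illustration, and the exact ldd output format can vary between systems.

```python
import subprocess


def parse_ldd(output):
    """Map each library name in ldd output to its resolved path, or None."""
    libraries = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" not in line:
            continue  # lines such as the dynamic loader have no "name => path" form
        name, _, rest = line.partition("=>")
        rest = rest.strip()
        if rest.startswith("not found") or rest.startswith("("):
            path = None  # unresolved, or no path given
        else:
            path = rest.split(" (", 1)[0]  # drop the trailing load address
        libraries[name.strip()] = path
    return libraries


def list_dependencies(binary):
    """Run ldd on a binary and return its parsed dependencies."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return parse_ldd(result.stdout)


if __name__ == "__main__":
    sample = (
        "\tlibexample.so => not found\n"
        "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
        "\t/lib64/ld-linux-x86-64.so.2 (0x00007f0000200000)\n"
    )
    print(parse_ldd(sample))
```

Entries that map to None are the libraries the loader could not resolve, which are often the ones you need to locate before importing them into Ghidra.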

Using Ghidra

To use Ghidra for reverse engineering shared libraries, we first need to load the shared libraries into the project. We can do this by selecting “File > Import > External Libraries” and then selecting the shared libraries we want to load.

We can then use Ghidra to analyze the shared libraries. For example, we can click on a function in the binary and Ghidra will automatically switch to the location of where that function lives inside of the shared library.

However, if we have hundreds of binaries or shared libraries to analyze, this process can be time-consuming. In such cases, we can use Ghidra’s headless analysis tool and scripting interface.

Using Ghidra’s Headless Analysis Tool and Scripting Interface

To use Ghidra’s headless analysis tool, we need to create a Python script that uses Ghidra’s FlatProgramAPI to perform our analysis. Here, we will create a script to extract the names of objects created by calling the setName function:

# Ghidra script (Jython): print the strings referenced before each call to setName().
instructions = currentProgram.getListing().getInstructions(True)  # True = iterate forward

for instruction in instructions:
	mnemonic = instruction.getMnemonicString()

	if mnemonic == "CALL":
		# Operand 0 of a direct call holds the address being called.
		funcAddress = instruction.getOpObjects(0)[0]
		func = getFunctionContaining(toAddr(funcAddress.getOffset()))
		callingFunc = getFunctionContaining(instruction.getAddress())
		if func is not None and callingFunc is not None:
			if func.getName() == "setName":
				# Walk backward through the calling function, looking for string references.
				inst = instruction.getPrevious()
				while inst is not None and getFunctionContaining(inst.getAddress()) == callingFunc:
					for i in range(inst.getNumOperands()):
						for op in inst.getOperandReferences(i):
							if op.getReferenceType().isData():
								data = getDataAt(op.getToAddress())
								if data is not None:
									if data.getDataType().toString() == "string":
										print("Found name of the {}() to be {} in {}".format(
											callingFunc,
											data,
											currentProgram))
					# getPrevious() returns None at the start of the program.
					inst = inst.getPrevious()

This script gets a list of all the instructions, starting at the first one, then goes through each of them and gets its mnemonic string. We are only interested in CALL instructions. For each one, we get the address being called and, if the target is setName, walk backward from the call through the calling function, checking each instruction for a reference to a string. We print out each string value that we find.

You can execute this script by importing it in the Ghidra Script Manager.

We can also use this script with the analyzeHeadless tool to automate our analysis. For example, we can run the following command:

./analyzeHeadless $(pwd) names -import $(pwd)/*.so -recursive -postScript my_script.py
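If the list of libraries is produced by another tool, the same invocation can be assembled in Python. The sketch below only builds the argument list and does not run Ghidra. It uses the single-dash -import, -recursive and -postScript options of analyzeHeadless; the function name and the example paths are hypothetical.

```python
import glob
import os


def build_headless_command(analyze_headless, project_dir, project_name,
                           libraries, post_script):
    """Build an analyzeHeadless argument list for a set of shared libraries."""
    if not libraries:
        raise ValueError("no libraries to import")
    command = [analyze_headless, project_dir, project_name, "-import"]
    command.extend(libraries)  # one entry per file, like the shell glob expansion
    command.extend(["-recursive", "-postScript", post_script])
    return command


if __name__ == "__main__":
    here = os.getcwd()
    libs = sorted(glob.glob(os.path.join(here, "*.so")))
    if libs:
        print(" ".join(build_headless_command(
            "./analyzeHeadless", here, "names", libs, "my_script.py")))
    else:
        print("no .so files found in " + here)
```

The resulting list can be passed to subprocess.run once the path to analyzeHeadless in your Ghidra installation is known.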

Binary Diffing with Ghidra