RISC-V: Baremetal From The Ground Up (Chipyard Edition)
This article walks through what happens behind the scenes when a baremetal C program is compiled and linked into a RISC-V binary file.
Let's start with something simple. The "hello world" equivalent in the embedded systems world is the blinky LED program:
This might look intimidating. Let's break down the elements:
First, we define SET_BITS and CLEAR_BITS as function-like macros. These come in handy because Memory-Mapped Input/Output (MMIO) registers are usually operated on at the bit level: we touch only the fields we care about and leave the remaining bits intact.
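As a minimal sketch of what such bit-level macros typically look like (the exact definitions in the original program may differ, and demo_bit_ops is a hypothetical helper added only to exercise them on an ordinary variable):

```c
#include <stdint.h>

/* Set every bit of REG that is 1 in MASK; all other bits keep their value. */
#define SET_BITS(REG, MASK)   ((REG) |= (MASK))
/* Clear every bit of REG that is 1 in MASK; all other bits keep their value. */
#define CLEAR_BITS(REG, MASK) ((REG) &= ~(MASK))

/* Hypothetical helper for illustration: sets bit 0 and clears bit 2 of a
 * plain variable, leaving every other bit untouched. */
static uint32_t demo_bit_ops(uint32_t reg)
{
    SET_BITS(reg, 0x1u);
    CLEAR_BITS(reg, 0x4u);
    return reg;
}
```

Calling demo_bit_ops(0x6) turns binary 110 into 011: bit 1 is never mentioned in either mask, so it survives unchanged.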
Then, we define GPIOA_OUTPUT_VAL, GPIOA_OUTPUT_EN, and CLINT_MTIME. These are the memory addresses of the corresponding MMIO registers.
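To show how an address becomes something C code can read and write, here is a small sketch. REG32, reg32_write, and reg32_read are hypothetical names, not taken from the original program, and no real register address is used: on a host machine the helpers can be pointed at an ordinary variable instead.

```c
#include <stdint.h>

/* Treat an integer address as a 32-bit hardware register. `volatile` tells
 * the compiler that every access has a side effect, so it must not be
 * removed, merged, or cached in a CPU register. */
#define REG32(addr) (*(volatile uint32_t *)(uintptr_t)(addr))

/* Hypothetical helpers for illustration. */
static inline void reg32_write(uintptr_t addr, uint32_t value)
{
    REG32(addr) = value;
}

static inline uint32_t reg32_read(uintptr_t addr)
{
    return REG32(addr);
}
```

On the SoC, addr would be one of the register addresses defined above; the access pattern stays the same.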
After that, we write our first actual line of C code, which defines GPIO_PIN as a constant. We also define a global variable called counter.
Note:
We only define the addresses of the MMIO registers that we are going to use here. To demonstrate the read-only data section, we deliberately define GPIO_PIN as a global constant instead of a macro.
In a more complete program, these elements are defined in a slightly different way (see CLINT, GPIO, and HAL_GPIO in Baremetal-IDE for an example).
Then, we define delay(), which reads the mtime register in the CLINT to keep track of time and halts the program for a given number of ticks.
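The busy-wait idea behind such a delay can be sketched as follows. This is not the original delay() implementation: the tick source is passed in as a function pointer (tick_source_t, a hypothetical name) so that the same loop can be tried on a host machine with a fake counter instead of the real mtime register.

```c
#include <stdint.h>

/* Hypothetical abstraction: a function that returns the current tick count.
 * On hardware this would read the CLINT mtime register. */
typedef uint64_t (*tick_source_t)(void);

/* Busy-wait until at least `ticks` ticks have elapsed. */
static void delay_ticks(tick_source_t now, uint64_t ticks)
{
    uint64_t start = now();
    /* Unsigned subtraction gives the elapsed ticks even if the counter
     * wraps around between the two reads. */
    while ((now() - start) < ticks) {
        /* spin */
    }
}

/* Host-side stand-in for mtime: advances by one tick on every read. */
static uint64_t fake_mtime_value;

static uint64_t fake_mtime(void)
{
    return fake_mtime_value++;
}
```

Comparing elapsed ticks (now - start) rather than an absolute deadline is a deliberate choice here, because it keeps the loop correct across counter wrap-around.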
Note:
We use ticks, instead of a physical unit of time such as seconds or milliseconds, in the delay function. This is because, without special circuits such as a Real-Time Clock (RTC), the SoC has no sense of how fast the real-world wall clock runs. The ratio between ticks and seconds is determined by the input clock frequency and the internal clock tree settings of the SoC.
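If the timer frequency is known, converting a wall-clock duration to ticks is a single multiplication and division. The sketch below assumes the caller supplies that frequency; ms_to_ticks is a hypothetical helper, and the frequencies used in the usage note are example values only, not properties of any particular SoC.

```c
#include <stdint.h>

/* Convert milliseconds to timer ticks for a counter running at `timer_hz`
 * ticks per second. The result is rounded down. */
static uint64_t ms_to_ticks(uint64_t milliseconds, uint64_t timer_hz)
{
    return (milliseconds * timer_hz) / 1000u;
}
```

For example, with an assumed 32768 Hz timer, ms_to_ticks(500, 32768) gives 16384 ticks. The same 500 ms on an assumed 1 MHz timer would be 500000 ticks, which is why a tick count alone says nothing about real time.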
Finally, we move to the main() function. Inside the function, we first enable the GPIO output functionality and then enter an infinite loop. On each iteration of the loop, we toggle the LED, delay for 1000 ticks, and then increment the counter.
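The body of that loop can be sketched with the registers passed in as pointers, so the logic can be exercised on ordinary variables. The function names and the XOR-based toggle are assumptions made for illustration; the original program may toggle the pin differently, and the delay call is left out here.

```c
#include <stdint.h>

/* Enable the output driver for one GPIO pin by setting its bit in the
 * output-enable register. */
static void gpio_enable_output(volatile uint32_t *output_en, uint32_t pin)
{
    *output_en |= (1u << pin);
}

/* One iteration of the blink loop, without the delay: flip the pin's bit in
 * the output-value register and count the iteration. */
static void blink_step(volatile uint32_t *output_val, uint32_t pin,
                       uint32_t *counter)
{
    *output_val ^= (1u << pin);
    *counter += 1u;
}
```

On the SoC, output_en and output_val would point at the GPIOA_OUTPUT_EN and GPIOA_OUTPUT_VAL registers, and blink_step would be called inside the infinite loop, followed by the delay.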
Note:
You might be more familiar with the main that looks like this:
Ours looks different because, in embedded systems, main is normally an infinite loop that never returns. It does not make sense for an embedded program to "exit from main()", since there is no additional code past main().
In order to compile this C program to something our SoC can understand, we need to use the RISC-V Toolchain.
RISC-V Toolchain
The RISC-V Toolchain is a collection of executables that helps us to compile, assemble, and link the program we write in C/C++ to binary format. It can also provide tools for us to debug and analyze the generated binaries.
There is a wide range of choices of toolchains, usually marked by different prefixes. The following is a simple list of the common ones that we may encounter:
TODO
Here, we will use the riscv-gnu-toolchain from riscv-collab (it comes with the prefix riscv64-unknown-elf-).
For toolchain installation, see Setting up RISC-V Toolchain.
In the toolchain directory, we can see a set of executables:
-gcc is the most general one. You can consider it the entry executable: depending on the flags passed to it, it invokes the compiler, the assembler, and the linker.
-as is the assembler itself.
-ld is the linker itself.
-objdump, -readelf, and -nm are ELF file analyzers.
-objcopy is the format converter. It can convert between ELF, binary, hex, and many other formats.
All of these toolchain executables run on the host machine, but they know the architecture of the target SoC, and thus can build binaries in a format that our target can understand.
Build Process
Now let's dive into the build process. There are several stages in the build process. Normally, the toolchain will join several stages together to speed up the build process. Here, we pass special flags to the toolchain to let it stop at each stage, so we can take a look at the intermediate contents.
Pre-processing Stage
The first stage is the pre-processing stage.
In this stage, the pre-processor resolves all the pre-processor directives (basically, every line that starts with a "#" mark), such as #define macros.
By default, the compiler will not generate this intermediate "main.i" file for us. To get it, we pass the -E argument to tell the compiler to stop after pre-processing. We use the -o argument to specify the output file.
We can see that in main.i, all of the macro definitions have been processed and every macro use has been replaced with its definition.
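To make the substitution concrete, the sketch below writes the same statement twice: once with macros, and once expanded by hand the way the pre-processor would expand it. LED_MASK and both function names are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>

#define LED_MASK            (1u << 3)   /* hypothetical value, for illustration */
#define SET_BITS(REG, MASK) ((REG) |= (MASK))

/* What we write: the source still contains macro names. */
static uint32_t before_preprocessing(uint32_t reg)
{
    SET_BITS(reg, LED_MASK);
    return reg;
}

/* What the compiler proper receives: SET_BITS and LED_MASK are gone and
 * only their replacement text, parentheses included, remains. */
static uint32_t after_preprocessing(uint32_t reg)
{
    ((reg) |= ((1u << 3)));
    return reg;
}
```

Both functions behave identically, since pre-processing is purely textual substitution and does not change what the code computes.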
Code Generation Stage
The "main.i" file is then passed through the compiler again for the code-generation stage.
In this stage, all of the high-level C/C++ code will be converted to architecture-specific assembly language.
As before, the compiler will not generate this intermediate file for us by default, and we need to use the -S argument to tell the compiler to stop after code generation.
The resulting file is our familiar RISC-V assembly code.
Assembling Stage
At the assembling stage, the assembly language will be further converted into binary instructions.
The output file is also called a "relocatable object file". The word "relocatable" indicates that the addresses in the program (where each piece of code is placed in memory) are not determined yet.
Same as before, we need to supply the -c flag to prevent the compiler from proceeding to the linking stage.
The relocatable object file is in the Executable and Linkable Format (ELF). Since it's a binary format, we can no longer examine the content directly with a text editor, so we need the toolchain to decode the content.
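As a rough idea of what such decoding tools look at first, the sketch below checks the first bytes of an ELF header: the magic number, the byte order, the file type, and the target machine. The function and constant names are hypothetical; the field offsets and values (e_type at byte 16, e_machine at byte 18, 1 for a relocatable file, 243 for RISC-V) follow the ELF specification.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    ELF_ET_REL   = 1,    /* e_type: relocatable object file (.o) */
    ELF_ET_EXEC  = 2,    /* e_type: fully linked executable */
    ELF_EM_RISCV = 243   /* e_machine: RISC-V */
};

/* Read a 16-bit little-endian field starting at `offset`. */
static uint16_t read_le16(const uint8_t *buf, size_t offset)
{
    return (uint16_t)(buf[offset] | (buf[offset + 1] << 8));
}

/* Return 1 if `buf` begins with an ELF header that describes a
 * little-endian RISC-V relocatable object file, 0 otherwise. */
static int is_riscv_relocatable(const uint8_t *buf, size_t len)
{
    if (len < 20) {
        return 0;                       /* too short to hold the fields */
    }
    if (buf[0] != 0x7F || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F') {
        return 0;                       /* missing ELF magic number */
    }
    if (buf[5] != 1) {
        return 0;                       /* EI_DATA: 1 means little-endian */
    }
    return read_le16(buf, 16) == ELF_ET_REL &&
           read_le16(buf, 18) == ELF_EM_RISCV;
}
```

A file such as main.o would be expected to pass this check, while the final linked executable would report the executable type instead.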
Analyzing Relocatable Object Files
There's still one last stage (linking stage) remaining, but let's take a side track here and examine the content of the generated "main.o" file first.
The ELF format describes how various elements of the code (e.g. code, data, read-only data, uninitialized data) are located in different sections.
We will use riscv64-unknown-elf-objdump to analyze our program.
Display Section Headers
Let's first examine the section headers in main.o.
By running objdump with the -h argument, we can print out all the section headers in an ELF file.
The .text section holds the code of the program.
The .data section holds the initialized global data.
The .bss section holds the uninitialized global data. The memory mapped to this section is expected to be zeroed by the program's startup code. The name "bss" stands for "block started by symbol" and is kept for historical reasons. With the RISC-V compiler's default settings, a .sbss section is also generated, which stands for "small .bss data", and the compiler decides to put our counter variable there.
The .rodata section holds the read-only data. With the RISC-V compiler's default settings, a .srodata section is generated instead, which stands for "small .rodata data", and the compiler decides to put our GPIO_PIN constant there.
.comment and .riscv.attributes are metadata sections added by the toolchain. .comment records the compiler version string, and .riscv.attributes records attributes of the target, such as the ISA string.
Note that all the sections start at address 0x00. This is the reason why .o files are called relocatable: all of the addresses are relative, and it is the linker's job, during the linking stage, to convert these relative addresses into absolute locations.
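The mapping from C definitions to sections described above can be summarized in a short sketch. All names and values are hypothetical, and the section named in each comment is where a typical RISC-V GCC build would be expected to place the object; the exact choice (for example .bss versus .sbss) depends on compiler settings.

```c
#include <stdint.h>

/* Initialized, writable global: normally placed in .data (or .sdata). */
uint32_t blink_period = 1000u;

/* Global without an initializer: normally placed in .bss (or .sbss). It
 * takes no space in the object file and reads as zero at program start. */
uint32_t loop_count;

/* Read-only global: normally placed in .rodata (or .srodata). */
const uint32_t led_pin = 3u;

/* Function body: its machine code is placed in .text. */
uint32_t next_deadline(uint32_t now)
{
    loop_count++;
    return now + blink_period;
}
```

Compiling a file like this with -c and running objdump with -h on the result is a quick way to check which of these sections actually appear.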
Display Full Content
With the -s argument, we can print out the full content of the ELF file. The result will be large, so we redirect the output to a file.
Display Disassembly
With the -d argument, we can print out the disassembly of the text section.
Linking Stage
This is the final stage before we can get an executable binary program.
The linker will put different pieces of code and data to our desired address locations, resolve all the not-yet-defined symbols, and merge all the programs and external libraries into a single file.
We need to tell the linker how we want the program to be linked together, and that is through the use of a linker script.
Linker scripts are written using linker commands and use the file extension .ld.
Linker Commands
ENTRY defines the entry point of the program. It is the first piece of code the MCU will execute. The debugger will also set the initial PC location according to this entry value.
The syntax of the ENTRY command is shown below, where entry_symbol_name is the name of the entry function.
MEMORY defines the various memory regions in the MCU and provides their locations and sizes. The linker also uses these values to calculate the total code size and memory usage, and to determine whether the program fits inside the memory.
The syntax of the MEMORY command is shown below.
The attributes are defined as follows:
R: Read-only sections
W: Read and write sections
X: Sections containing executable code
A: Allocated sections
I: Initialized sections
L: Initialized sections, same as I
!: Invert the meaning of the following attributes
SECTIONS defines which sections are mapped to which memory regions, as well as the order of the mapping. It generates the defined sections in the final ELF file. For example, we can map the .text section to the FLASH region.
The syntax of the SECTIONS command is shown below.
When virtual memory address and load memory address are the same, we only need to write the virtual memory address.
Writing Linker Script
For the sake of simplicity and ease of understanding, for now we will not care about the C runtime hassles and interrupt routines. We will make our program enter directly to main, and start to run our blinky LED program.
Thus, the entry symbol of our program will just be main.
In the Chipyard tutorial SoC design, we have three memory regions.
To keep things simple, we will stack all the sections one after another in the scratchpad memory.
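What "stacking" means can be modeled in a few lines of C. This is only a model of the linker's bookkeeping, not a linker script: section_t, align_up, and place_sections are hypothetical names, and the origin, length, and section sizes used in the usage note are made-up numbers rather than the real scratchpad parameters.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    size;     /* in bytes */
    uint32_t    address;  /* filled in by place_sections() */
} section_t;

/* Round `value` up to the next multiple of `align` (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1u) & ~(align - 1u);
}

/* Place the sections back to back, starting at `origin`. Returns 1 if they
 * all fit inside a region of `length` bytes, and 0 if the region overflows. */
static int place_sections(section_t *sections, size_t count,
                          uint32_t origin, uint32_t length, uint32_t align)
{
    uint32_t location = origin;  /* plays the role of the location counter */

    for (size_t i = 0; i < count; i++) {
        location = align_up(location, align);
        sections[i].address = location;
        location += sections[i].size;
    }
    return (location - origin) <= length;
}
```

With a made-up origin of 0x1000 and sections of 0x12, 0x4, and 0x8 bytes aligned to 4 bytes, the sections land at 0x1000, 0x1014, and 0x1018. This is the same kind of arithmetic the linker performs when it turns relative addresses into absolute ones and checks that the program fits in the region.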
Now we have our unsafe-but-usable linker script:
Finally, we are ready for linking.
With the -T argument, we tell gcc which linker script to use when linking the program.
Also for simplicity, we are not going to link the standard C library for now. To do that, we add the -nostdlib argument.
Format Conversion
Loading the Program
TODO
Startup Code
Our LED has successfully blinked. However, if we try running other more complex programs, they might fail. This is because we have made a lot of assumptions about the state of the SoC when we enter the main() function.
This state is usually set up by a startup file. This piece of the program is responsible for setting up the interrupt vector, initializing the stack, zeroing out the .bss section, and sometimes also copying the .data section to SRAM. Hence, we will write our own startup file to properly initialize the SoC.
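Two of those duties, copying .data and zeroing .bss, are simple memory loops. The sketch below is a hedged illustration rather than the startup file itself: init_data_and_bss is a hypothetical function, and it takes the section boundaries as parameters, whereas real startup code would obtain them from symbols defined in the linker script and would usually run before any C environment exists.

```c
#include <stdint.h>

/* Copy the initial values of .data from the program image to RAM, then
 * fill .bss with zeros. All ranges are [start, end) in 32-bit words. */
static void init_data_and_bss(const uint32_t *data_src,
                              uint32_t *data_dst, uint32_t *data_end,
                              uint32_t *bss_start, uint32_t *bss_end)
{
    while (data_dst < data_end) {
        *data_dst++ = *data_src++;
    }
    while (bss_start < bss_end) {
        *bss_start++ = 0u;
    }
}
```

Without the second loop, a global such as counter would start with whatever value happened to be in memory at power-up, which is one of the hidden assumptions mentioned above.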
// TODO: change
Boot Flow:
The program starts at the BootROM `path`.
Jump to the entry point, which is at the label _enter in freedom-metal/src/entry.S.
Initialize the global pointer gp register using the generated symbol __global_pointer$.
Write the mtvec register with early_trap_vector as the default exception handler.
Read mhartid into register a0 and call _start, which exists in crt0.S.
Initialize stack pointer, sp, with _sp generated symbol. Harts with mhartid of one or larger are offset by (_sp + __stack_size * mhartid). The __stack_size field is generated in the linker file.
Check if mhartid == __metal_boot_hart and run the init code if they are equal. All other harts skip init and go to the Post Init Flow, step #15.
Boot Hart Init Flow Begins Here
Init data section to destination in defined RAM space
Copy ITIM section, if ITIM code exists, to destination
Zero out bss section
Call atexit library function which registers the libc and freedom-metal destructors to run after main returns
Call the __libc_init_array library function, which runs all functions marked with __attribute__((constructor)).
Post Init Flow Begins Here
Call the C routine __metal_synchronize_harts, where hart 0 will release all harts once their individual msip bits are set. The msip bit is typically used to assert a software interrupt on individual harts; however, interrupts are not yet enabled, so msip is used here as a gatekeeping mechanism.
Check misa register to see if floating point hardware is part of the design, and set up mstatus accordingly.
Single or multi-hart design redirection step
a. If the design is single-hart only, or a multi-hart design without a C-implemented function secondary_main, ONLY the boot hart will continue to main().
b. For multi-hart designs, all other CPUs will enter sleep via the WFI instruction through the weak secondary_main label in crt0.S, while the boot hart runs the application program.
c. In a multi-hart design which includes a C-defined secondary_main function, all harts will enter secondary_main as the primary C function.
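The constructor step in the boot hart init flow above can be illustrated with a tiny example. It relies on the GCC/Clang constructor attribute; early_setup and init_ran are hypothetical names. On a hosted system the C runtime runs such functions before main(), and in the flow described here that job is done by the call to __libc_init_array.

```c
/* Set to 1 by the constructor below, before main() starts. */
static int init_ran;

/* Functions marked with the constructor attribute are collected by the
 * toolchain into an initialization array and called during startup. */
__attribute__((constructor))
static void early_setup(void)
{
    init_ran = 1;
}
```

If the startup code skipped the initialization array, init_ran would still be 0 when main() begins, which is an easy way to check whether that step is in place.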
Interrupt Vector
Stack Initialization
__stack_size
__boot_hart_idx
__global_pointer$
_sp: Address of the end of the stack for hart 0, used to initialize the beginning of the stack, since the stack grows toward lower addresses. On a multi-hart system, the start address of the stack for each hart is calculated using (_sp + __stack_size * mhartid).
metal_segment_bss_target_start and metal_segment_bss_target_end: Used to zero out global data mapped to the .bss section.
metal_segment_data_source_start, metal_segment_data_target_start, and metal_segment_data_target_end: Used to copy data from the image to its destination in RAM.
metal_segment_itim_source_start, metal_segment_itim_target_start, and metal_segment_itim_target_end: Code or data can be placed in itim sections using __attribute__((section(".itim"))).
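The per-hart stack formula quoted for _sp is a one-line computation. The sketch below writes it as a C function; hart_stack_top is a hypothetical name, and the addresses in the usage note are made-up example values, not the ones generated by the linker file.

```c
#include <stdint.h>

/* Initial stack pointer for a given hart, following the formula
 * _sp + __stack_size * mhartid from the description above. */
static uintptr_t hart_stack_top(uintptr_t sp_hart0, uintptr_t stack_size,
                                uintptr_t hartid)
{
    return sp_hart0 + stack_size * hartid;
}
```

With an example _sp of 0x80004000 and a stack size of 0x400 bytes, hart 0 starts at 0x80004000 and hart 2 starts at 0x80004800, so each hart gets its own non-overlapping 0x400-byte stack.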