Compilation process with Linker and Startup code

Compilation converts source code into object code, and several steps are needed to produce the final executable file. The linker and the startup code are two important parts of building and booting: they make sure the binary code is organized correctly and the system can run.

Last update: 2022-06-04


Overview

Following are the steps that a program goes through until it is translated into an executable form:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

In general, compilation for microcontrollers (MCUs) is the same as the compilation process for executables on an operating system. However, there are some key differences:

  • Cross-compilation: An MCU cannot run a compiler itself, so a cross-compiler running on a host machine is needed
  • Library: MCUs use lightweight versions of libraries to reduce the program footprint, which might also increase performance
  • Hardware-dependent: Many libraries implement only minimal code that mostly does nothing, such as for standard I/O. The actual implementation must be provided for the specific hardware.
  • Linking: The executable has to define sections that are stored in different memory spaces at run time (Flash/RAM). On an MCU, the CPU can execute instructions directly from Flash.

For the general steps in compilation, refer to the Compilation for C/C++ on OS.

ARM Toolchain

If you use STM32CubeIDE, the IDE already includes a toolchain for STM32 MCUs. If you start without the IDE, you can start with the Arm GNU Toolchain.

A good alternative toolchain package is The xPack Build Framework:

The xPack project aims to provide a set of cross-platform tools to manage, configure and build complex, modular, multi-target (multi-architecture, multi-board, multi-toolchain) projects, with an emphasis on C/C++ and bare-metal embedded projects.

Download and install them. Remember to add the folders containing the binaries to the PATH environment variable of your system.

Add Arm GNU toolchain to System Environment

EABI

The default ARM toolchain application binary interface is the Embedded Application Binary Interface (EABI). It defines the conventions for files, data types, register mapping, stack frames, and parameter passing rules. The EABI is commonly used on ARM and PowerPC CPUs.

Example program

nostdlib.zip

This is the Blink - Hello World program: blink an LED at 10 Hz.

You can use any STM32 board because this is a very simple project. I chose a Nucleo-64 board with an STM32F411RE.

The main code blinks the LED on PA5 by using registers:

main.c
#include <stdint.h>
#include "delay.h"

/* Clock */
#define RCC_AHB1ENR     *((volatile uint32_t*) (0x40023830))

/* GPIO A */
#define GPIOA_MODER     *((volatile uint32_t*) (0x40020000))
#define GPIOA_BSRR      *((volatile uint32_t*) (0x40020018))

/* Global initialized variable */
uint32_t isLoop = 1;

int main(void) {
    /* turn on clock on GPIOA */
    RCC_AHB1ENR |= (1 << 0);

    /* set PA5 to output mode */
    GPIOA_MODER &= ~(1 << 11);
    GPIOA_MODER |=  (1 << 10);

    while(isLoop) {
        /* set HIGH on PA5 */
        GPIOA_BSRR |= (1 << 5);
        delay();

        /* set LOW on PA5 */
        GPIOA_BSRR |= (1 << (5+16));
        delay();
    }
    return 0;
}
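
The bit arithmetic in main.c can be checked on a host PC before touching hardware. The sketch below is only an illustration and not part of the project: the helper names moder_set_output, bsrr_set_mask, and bsrr_reset_mask are hypothetical, and they work on plain integers rather than on the real registers. It assumes the STM32F4 GPIO layout used above: two MODER bits per pin (01 means output), BSRR set bits in the lower half-word, and BSRR reset bits in the upper half-word.

```c
#include <assert.h>
#include <stdint.h>

/* Return a MODER value with the 2-bit field of `pin` set to 01 (output). */
uint32_t moder_set_output(uint32_t moder, unsigned pin) {
    assert(pin < 16);
    moder &= ~(UINT32_C(3) << (pin * 2));   /* clear both mode bits */
    moder |=  (UINT32_C(1) << (pin * 2));   /* 01 = general-purpose output */
    return moder;
}

/* BSRR bit that drives `pin` high (bits 0..15). */
uint32_t bsrr_set_mask(unsigned pin) {
    assert(pin < 16);
    return UINT32_C(1) << pin;
}

/* BSRR bit that drives `pin` low (bits 16..31). */
uint32_t bsrr_reset_mask(unsigned pin) {
    assert(pin < 16);
    return UINT32_C(1) << (pin + 16);
}
```

For PA5 this reproduces the constants in main.c: bit 11 cleared and bit 10 set in MODER, bit 5 to set the pin, and bit 21 to reset it. Since the reference manual describes BSRR as a write-only register, a plain assignment of the mask should work as well as the |= used above.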

The delay function using a busy loop:

delay.c
#include <stdint.h>

/* Global Read-only variable */
const uint32_t DELAY_MAX = 0x0000BEEF;

/* Global uninitialized variable */
uint32_t delay_counter;

void delay(void) {
    for(delay_counter=DELAY_MAX; delay_counter--;);
}
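
The loop in delay.c relies on the compiler keeping every iteration, which holds for the unoptimized builds in this tutorial. With optimization enabled, a compiler may shorten or remove a loop over a plain global counter. A minimal sketch of a more defensive variant follows; the names busy_wait and wait_counter are hypothetical, and the function returns the iteration count only so that its behavior can be checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* volatile forces a real load and store on every iteration */
static volatile uint32_t wait_counter;

/* Count down from `ticks`; returns how many times the loop body ran. */
uint32_t busy_wait(uint32_t ticks) {
    uint32_t iterations = 0;
    for (wait_counter = ticks; wait_counter != 0; wait_counter--) {
        iterations++;
    }
    assert(wait_counter == 0);   /* the counter stops exactly at zero */
    return iterations;
}
```

Calling busy_wait(0xBEEF) runs the body 0xBEEF times, the same count as DELAY_MAX in delay.c.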

Firstly, just try to compile the program without any specific option:

arm-none-eabi-gcc \
    main.c delay.c \
    -o main.elf
/arm-none-eabi/bin/ld.exe:
/arm-none-eabi/lib\libc.a(lib_a-exit.o): in function `exit':
(.text.exit+0x2c): undefined reference to `_exit'

Of course, the link step fails!

By default, GCC links the application against libc from the newlib package, whose exit function calls the low-level function _exit. Nothing in this build provides an implementation of _exit.

For now, we tell the compiler not to use the standard libraries and startup files.

arm-none-eabi-gcc \
    -nostdlib \
    main.c delay.c \
    -o main.elf

Just ignore the warning about entry symbol. We’ll fix it later.

Compiler options

In Step 5: Check the compilation settings, STM32CubeIDE automatically sets some compilation flags. How are those flags selected?

GNU Online Documentation is available for different versions.

Target Architecture

We can use either the -march= or the -mcpu= option; -mcpu=cortex-m4 is easier to understand and remember. Cortex-M cores support only the Thumb instruction set, so the -mthumb option is required. Let us use software floating point for now.

-mcpu=cortex-m4 -mthumb -mfloat-abi=soft

Target GNU standard

-std=gnu11

Target Libraries

The GNU Arm toolchain uses newlib to provide a standard implementation of the C libraries. To reduce code size and keep the library independent of the hardware, MCUs use a lightweight version named newlib-nano.

However, newlib-nano does not provide an implementation of the low-level system calls used by C standard library functions such as printf() or scanf(). To make the application link, an additional library named nosys should be added. This library provides only stub implementations of the low-level system calls, most of which simply return an error value.

To use the newlib-nano and nosys libs:

--specs=nano.specs --specs=nosys.specs

For now, we ignore the standard libraries and come back to them later.

Compilation warnings

To see potential problems, enable the common set of warnings:

-Wall

Debug level

Turn on debug if needed:

-g

Hence, the build command will be:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -nostdlib \
    -Wall \
    main.c delay.c \
    -o main.elf

Just ignore the warning about the entry symbol Reset_Handler. We’ll fix it later.

Program sections

Compile each source file to an object file, then run arm-none-eabi-objdump to see the sections of the output:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -nostdlib \
    -Wall \
    -c main.c \
    -o main.o

arm-none-eabi-objdump -h main.o > main.o.obj_h
main.o.obj_h
main.o:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000068  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000004  00000000  00000000  0000009c  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  000000a0  2**0
                  ALLOC
  3 .comment      0000003a  00000000  00000000  000000a0  2**0
                  CONTENTS, READONLY
  4 .ARM.attributes 0000002e  00000000  00000000  000000da  2**0
                  CONTENTS, READONLY

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb \
    -nostdlib \
    -std=gnu11 \
    -Wall \
    -c delay.c \
    -o delay.o

arm-none-eabi-objdump -h delay.o > delay.o.obj_h
delay.o.obj_h
delay.o:     file format elf32-littlearm

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         0000002c  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000060  2**0
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000004  00000000  00000000  00000060  2**2
                  ALLOC
  3 .rodata       00000004  00000000  00000000  00000060  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .comment      0000003a  00000000  00000000  00000064  2**0
                  CONTENTS, READONLY
  5 .ARM.attributes 0000002e  00000000  00000000  0000009e  2**0
                  CONTENTS, READONLY

.text: Code and Data

This section holds the instructions, which are located in Flash. It also holds constant values that the compiler encodes as raw bytes at the end of a function.

.data: Initialized variables

These variables can change at run time, so they must live in RAM. Their initial values are stored in Flash and copied to RAM by the startup code.

In this example, in main.o, there are 4 bytes for uint32_t isLoop = 1;.

.bss: Uninitialized variables

These variables can also change at run time, so they live in RAM too. Because they have no initial value, nothing needs to be stored in Flash for them; memory is only reserved, and the startup code fills it with zeros.

The .bss section takes no space in the file: it is described only by its size, whereas the .data section occupies as many bytes as the sum of the sizes of the initialized variables.

In this example, in delay.o, there are 4 bytes for uint32_t delay_counter;.

.rodata: Read-only data

Constant variables are stored in Flash and read directly from there.

In this example, in delay.o, there are 4 bytes for const uint32_t DELAY_MAX = 0x0000BEEF;.

Section locations

| Data (variable) | Load time | Run time | Section | Note |
|---|---|---|---|---|
| Global initialized, global static initialized, local static initialized | Flash | RAM | .data | Copied from Flash to RAM by startup code |
| Global uninitialized, global static uninitialized, local static uninitialized | - | RAM | .bss | Space reserved by startup code |
| All global constants | Flash | - | .rodata | |
| All other local | - | RAM (Stack) | - | App code uses the stack to store them |
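
As an illustration of the table, the sketch below declares one object of each kind and notes in comments where a typical arm-none-eabi-gcc build is expected to place it. All names are hypothetical and not part of the example project. Running arm-none-eabi-objdump -h on the resulting object file is the way to confirm the placement for a given compiler version.

```c
#include <assert.h>
#include <stdint.h>

uint32_t boot_flag = 1;               /* initialized global      -> .data   */
uint32_t tick_total;                  /* uninitialized global    -> .bss    */
const uint32_t TICK_LIMIT = 0xBEEF;   /* constant global         -> .rodata */
static uint32_t retry_budget = 3;     /* initialized file static -> .data   */

/* Local static with an initializer: lives in .data, keeps its value across calls. */
uint32_t next_id(void) {
    static uint32_t id = 100;
    return id++;
}

/* Local static without an initializer: lives in .bss and starts at zero. */
uint32_t count_call(void) {
    static uint32_t calls;
    return ++calls;
}

/* Plain locals such as `sum` live on the stack (or in registers). */
uint32_t budget_left(uint32_t used) {
    uint32_t sum = retry_budget;
    assert(used <= sum);
    return sum - used;
}
```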

Linker and Locator

The linker merges the sections from all object files into the final executable file and assigns (locates) each section to an address in memory.

main.c --> main.o {
    .text, 
    .data, 
    .bss, 
    .rodata
}

delay.c --> delay.o {
    .text,
    .data,
    .bss,
    .rodata
}

main.elf = main.o + delay.o = {
    .text = .text(main) + .text(delay)
    .data = .data(main) + .data(delay)
    .bss = .bss(main) + .bss(delay)
    .rodata = .rodata(main) + .rodata(delay)
}

A linker script is used to describe the memory layout:

ENTRY command

Sets the entry point in the ELF header, which tells tools such as GDB which instruction is executed first.

ENTRY(symbol)

MEMORY command

Describes the different memory regions in the system. The linker uses this information to calculate addresses.

MEMORY
{
    name (attribute): ORIGIN = <address>, LENGTH = <size>
}

SECTIONS command

Creates the memory layout by defining the output sections and their order. For each section, it selects which input data goes in, and where that data is stored and loaded.

The location counter is a special symbol denoted by a dot (.). The linker automatically updates it with the current output address. A symbol can be assigned the value of the location counter to mark a boundary, and the location counter itself can also be assigned.

SECTIONS
{
    <symbol> = LOADADDR(.<section>);
    .<section> :
    {
        <symbol> = .;
        *(.sub_section);
        . = ALIGN(n);
    } ><Run Location> [AT> Storage Location]
}
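
The statement . = ALIGN(n) in the template rounds the location counter up to the next multiple of n. The same arithmetic can be written in C as a minimal sketch; align_up is a hypothetical helper name, and it assumes n is a power of two, which is the usual case for alignment values.

```c
#include <assert.h>
#include <stdint.h>

/* Round `addr` up to the next multiple of `n`; `n` must be a power of two. */
uint32_t align_up(uint32_t addr, uint32_t n) {
    assert(n != 0 && (n & (n - 1)) == 0);
    return (addr + (n - 1)) & ~(n - 1);
}
```

An address that is already aligned is returned unchanged, so align_up(0x08000094, 4) stays at 0x08000094 while align_up(0x08000095, 4) moves to 0x08000098.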

Here is the linker script:

linker.ld
ENTRY(Reset_Handler)

MEMORY
{
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH   (rx)    : ORIGIN = 0x08000000,   LENGTH = 512K
}

_estack = ORIGIN(RAM) + LENGTH(RAM);

SECTIONS
{
    .isr_vector :
    {
        *(.isr_vector)
    } >FLASH

    .text :
    {
        *(.text)
        _etext = .;
    } >FLASH

    .rodata :
    {
        *(.rodata)
    } >FLASH

    _lddata = LOADADDR(.data);
    .data :
    {
        _sdata = .;
        *(.data)
        _edata = .;
    } >RAM AT> FLASH

    .bss :
    {
        _sbss = .;
        *(.bss)
        _ebss = .;
    } >RAM
}

In the Linker Script, we define some symbols:

  • _etext: End address of .text section
  • _lddata: Load address (from Flash) of .data section
  • _sdata: Start address of .data section
  • _edata: End address of .data section
  • _sbss: Start address of .bss section
  • _ebss: End address of .bss section

To build with a linker script, use -T <linkerfile>. Add the option -Wl,-Map=<output> to write the full memory map to a file.

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -nostdlib \
    -Wall \
    -T linker.ld -Wl,-Map=main.tmp.map \
    main.o delay.o \
    -o main.tmp

Open the file main.tmp.map to see the addresses assigned to symbols in the linker scripts.

main.tmp.map
Memory Configuration

Name             Origin             Length             Attributes
RAM              0x0000000020000000 0x0000000000020000 xrw
FLASH            0x0000000008000000 0x0000000000080000 xr
*default*        0x0000000000000000 0xffffffffffffffff

Linker script and memory map

LOAD main.o
LOAD delay.o
                0x0000000020020000                _estack = (ORIGIN (RAM) + LENGTH (RAM))

.isr_vector
 *(.isr_vector)

.text           0x0000000008000000       0x94
 *(.text)
 .text          0x0000000008000000       0x68 main.o
                0x0000000008000000                main
 .text          0x0000000008000068       0x2c delay.o
                0x0000000008000068                delay
                0x0000000008000094                _etext = .

.glue_7         0x0000000008000094        0x0
 .glue_7        0x0000000008000094        0x0 linker stubs

.glue_7t        0x0000000008000094        0x0
 .glue_7t       0x0000000008000094        0x0 linker stubs

.vfp11_veneer   0x0000000008000094        0x0
 .vfp11_veneer  0x0000000008000094        0x0 linker stubs

.v4_bx          0x0000000008000094        0x0
 .v4_bx         0x0000000008000094        0x0 linker stubs

.iplt           0x0000000008000094        0x0
 .iplt          0x0000000008000094        0x0 main.o

.rodata         0x0000000008000094        0x4
 *(.rodata)
 .rodata        0x0000000008000094        0x4 delay.o
                0x0000000008000094                DELAY_MAX
                0x0000000008000098                _lddata = LOADADDR (.data)

.rel.dyn        0x0000000008000098        0x0
 .rel.iplt      0x0000000008000098        0x0 main.o

.data           0x0000000020000000        0x4 load address 0x0000000008000098
                0x0000000020000000                _sdata = .
 *(.data)
 .data          0x0000000020000000        0x4 main.o
                0x0000000020000000                isLoop
 .data          0x0000000020000004        0x0 delay.o
                0x0000000020000004                _edata = .

.igot.plt       0x0000000020000004        0x0 load address 0x000000000800009c
 .igot.plt      0x0000000020000004        0x0 main.o

.bss            0x0000000020000004        0x4 load address 0x000000000800009c
                0x0000000020000004                _sbss = .
 *(.bss)
 .bss           0x0000000020000004        0x0 main.o
 .bss           0x0000000020000004        0x4 delay.o
                0x0000000020000004                delay_counter
                0x0000000020000008                _ebss = .
OUTPUT(main.tmp elf32-littlearm)
LOAD linker stubs

.comment        0x0000000000000000       0x39
 .comment       0x0000000000000000       0x39 main.o
                                         0x3a (size before relaxing)
 .comment       0x0000000000000039       0x3a delay.o

.ARM.attributes
                0x0000000000000000       0x2e
 .ARM.attributes
                0x0000000000000000       0x2e main.o
 .ARM.attributes
                0x000000000000002e       0x2e delay.o

The .data section starts at 0x20000000 in RAM (loaded from 0x08000098 in Flash) and holds only the 4 bytes of uint32_t isLoop = 1;.

The .bss section starts at 0x20000004 and holds the 4 bytes of uint32_t delay_counter;. The map prints a load address for it (0x0800009c), but no bytes are stored in Flash for .bss.

The .rodata section starts at 0x08000094 in Flash and holds the 4 bytes of const uint32_t DELAY_MAX = 0x0000BEEF;.
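
The addresses in the map file follow from simple arithmetic on the section sizes. The sketch below reproduces that arithmetic for the layout in linker.ld; it is a simplified model, not how the linker is implemented. The type layout_t and the function place_sections are hypothetical names, and the model assumes that no alignment padding is needed between sections, which holds for the sizes in this example.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_ORIGIN UINT32_C(0x08000000)
#define RAM_ORIGIN   UINT32_C(0x20000000)
#define RAM_LENGTH   (UINT32_C(128) * 1024)

typedef struct {
    uint32_t text, rodata, lddata;   /* Flash addresses */
    uint32_t sdata, edata;           /* .data bounds in RAM */
    uint32_t sbss, ebss;             /* .bss bounds in RAM */
    uint32_t estack;                 /* first address past the end of RAM */
} layout_t;

/* Place sections back to back in the order used by linker.ld. */
layout_t place_sections(uint32_t isr_size, uint32_t text_size,
                        uint32_t rodata_size, uint32_t data_size,
                        uint32_t bss_size) {
    layout_t l;
    l.text   = FLASH_ORIGIN + isr_size;     /* .text follows .isr_vector */
    l.rodata = l.text + text_size;          /* same value as _etext */
    l.lddata = l.rodata + rodata_size;      /* load address of .data */
    l.sdata  = RAM_ORIGIN;
    l.edata  = l.sdata + data_size;
    l.sbss   = l.edata;
    l.ebss   = l.sbss + bss_size;
    l.estack = RAM_ORIGIN + RAM_LENGTH;
    assert(l.ebss <= l.estack);             /* data must fit in RAM */
    return l;
}
```

With an empty .isr_vector and the sizes reported above (0x94 bytes of .text, then 4 bytes each of .rodata, .data, and .bss), the model gives _lddata = 0x08000098, _ebss = 0x20000008, and _estack = 0x20020000, matching the map file.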


To find the symbols and their addresses:

arm-none-eabi-nm main.tmp
main.tmp.sym
20000008 B _ebss
20000004 D _edata
20020000 T _estack
08000094 T _etext
08000098 A _lddata
20000004 B _sbss
20000000 D _sdata
08000068 T delay
20000004 B delay_counter
08000094 R DELAY_MAX
20000000 D isLoop
08000000 T main
         U Reset_Handler

Linker Symbols

Accessing a linker script defined variable from source code is not intuitive. In particular, a linker script symbol is not equivalent to a variable declaration in a high level language; it is instead a symbol that does not have a value.

Before going further, it is important to note that compilers often transform names in the source code into different names when they are stored in the symbol table. For example, Fortran compilers commonly prepend or append an underscore, and C++ performs extensive name mangling. Therefore there might be a discrepancy between the name of a variable as it is used in source code and the name of the same variable as it is defined in a linker script. For example in C a linker script variable might be referred to as:

extern int foo;

But in the linker script it might be defined as:

_foo = 1000;

In the remaining examples however it is assumed that no name transformation has taken place.

When a symbol is declared in a high level language such as C, two things happen:

  • The first is that the compiler reserves enough space in the program’s memory to hold the value of the symbol.
  • The second is that the compiler creates an entry in the program’s symbol table which holds the symbol’s address, i.e. the symbol table contains the address of the block of memory holding the symbol’s value.

So for example the following C declaration, at file scope:

int foo = 1000;

creates an entry called foo in the symbol table. This entry holds the address of an int sized block of memory where the number 1000 is initially stored.

When a program references a symbol the compiler generates code that first accesses the symbol table to find the address of the symbol’s memory block and then code to read the value from that memory block. So:

foo = 1;

looks up the symbol foo in the symbol table, gets the address associated with this symbol and then writes the value 1 into that address. Whereas:

int * a = & foo;

looks up the symbol foo in the symbol table, gets its address and then copies this address into the block of memory associated with the variable a.

Linker scripts symbol declarations, by contrast, create an entry in the symbol table but do not assign any memory to them. Thus they are an address without a value. So for example the linker script definition:

foo = 1000;

creates an entry in the symbol table called foo which holds the address of memory location 1000, but nothing special is stored at address 1000. This means that you cannot access the value of a linker script defined symbol - it has no value - all you can do is access the address of a linker script defined symbol.

Hence, when you are using a linker script defined symbol in source code you should always take the address of the symbol, and never attempt to use its value. For example suppose you want to copy the contents of a section of memory called .ROM into a section called .FLASH and the linker script contains these declarations:

start_of_ROM   = .ROM;
end_of_ROM     = .ROM + sizeof (.ROM);
start_of_FLASH = .FLASH;

Then the C source code to perform the copy would be as below. Note the use of the & operators. These are correct.

extern char start_of_ROM, end_of_ROM, start_of_FLASH;
memcpy (& start_of_FLASH, & start_of_ROM, & end_of_ROM - & start_of_ROM);

Alternatively the symbols can be treated as the names of vectors or arrays and then the code will again work as expected:

extern char start_of_ROM[], end_of_ROM[], start_of_FLASH[];
memcpy (start_of_FLASH, start_of_ROM, end_of_ROM - start_of_ROM);

Note how using this method does not require the use of & operators.

Vector Table

On reset, the processor loads the MSP with the value stored at address 0x00000000, then loads the PC with the value stored at 0x00000004, which must be the address of the Reset_Handler function. Execution starts from there.

There are 15 system exceptions, including the Reset handler, and up to 240 interrupts.

The Table 37. Vector table for STM32F411xC/E in the document RM0383: Reference manual STM32F411xC/E advanced Arm®-based 32-bit MCUs shows the supported Exceptions and Interrupts:

Vector table for F411xC/E

vector.c
#include <stdint.h>

#define RAM_START       0x20000000
#define RAM_SIZE        (128 * 1024)
#define RAM_END         ((RAM_START) + (RAM_SIZE))

void Default_Handler(void) {
    while(1) {}
}

void Reset_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void NMI_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void HardFault_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void MemManage_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void BusFault_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void UsageFault_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void SVC_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void DebugMon_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void PendSV_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void SysTick_Handler(void) __attribute__ ((weak, alias("Default_Handler")));
void WWDG_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void PVD_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TAMP_STAMP_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void RTC_WKUP_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void FLASH_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void RCC_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI0_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI1_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI3_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI4_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream0_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream1_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream3_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream4_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream5_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream6_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void ADC_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI9_5_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM1_BRK_TIM9_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM1_UP_TIM10_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM1_TRG_COM_TIM11_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM1_CC_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM3_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM4_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C1_EV_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C1_ER_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C2_EV_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C2_ER_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SPI1_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SPI2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void USART1_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void USART2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void EXTI15_10_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void RTC_Alarm_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void OTG_FS_WKUP_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA1_Stream7_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SDIO_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void TIM5_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SPI3_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream0_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream1_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream2_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream3_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream4_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void OTG_FS_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream5_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream6_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void DMA2_Stream7_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void USART6_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C3_EV_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void I2C3_ER_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void FPU_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SPI4_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));
void SPI5_IRQHandler(void) __attribute__ ((weak, alias("Default_Handler")));

__attribute__ ((section(".isr_vector")))
uint32_t vector_table[] = {
    (uint32_t) RAM_END,
    (uint32_t) Reset_Handler,
    (uint32_t) NMI_Handler,
    (uint32_t) HardFault_Handler,
    (uint32_t) MemManage_Handler,
    (uint32_t) BusFault_Handler,
    (uint32_t) UsageFault_Handler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) SVC_Handler,
    (uint32_t) DebugMon_Handler,
    (uint32_t) 0,
    (uint32_t) PendSV_Handler,
    (uint32_t) SysTick_Handler,
    (uint32_t) WWDG_IRQHandler,
    (uint32_t) PVD_IRQHandler,
    (uint32_t) TAMP_STAMP_IRQHandler,
    (uint32_t) RTC_WKUP_IRQHandler,
    (uint32_t) FLASH_IRQHandler,
    (uint32_t) RCC_IRQHandler,
    (uint32_t) EXTI0_IRQHandler,
    (uint32_t) EXTI1_IRQHandler,
    (uint32_t) EXTI2_IRQHandler,
    (uint32_t) EXTI3_IRQHandler,
    (uint32_t) EXTI4_IRQHandler,
    (uint32_t) DMA1_Stream0_IRQHandler,
    (uint32_t) DMA1_Stream1_IRQHandler,
    (uint32_t) DMA1_Stream2_IRQHandler,
    (uint32_t) DMA1_Stream3_IRQHandler,
    (uint32_t) DMA1_Stream4_IRQHandler,
    (uint32_t) DMA1_Stream5_IRQHandler,
    (uint32_t) DMA1_Stream6_IRQHandler,
    (uint32_t) ADC_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) EXTI9_5_IRQHandler,
    (uint32_t) TIM1_BRK_TIM9_IRQHandler,
    (uint32_t) TIM1_UP_TIM10_IRQHandler,
    (uint32_t) TIM1_TRG_COM_TIM11_IRQHandler,
    (uint32_t) TIM1_CC_IRQHandler,
    (uint32_t) TIM2_IRQHandler,
    (uint32_t) TIM3_IRQHandler,
    (uint32_t) TIM4_IRQHandler,
    (uint32_t) I2C1_EV_IRQHandler,
    (uint32_t) I2C1_ER_IRQHandler,
    (uint32_t) I2C2_EV_IRQHandler,
    (uint32_t) I2C2_ER_IRQHandler,
    (uint32_t) SPI1_IRQHandler,
    (uint32_t) SPI2_IRQHandler,
    (uint32_t) USART1_IRQHandler,
    (uint32_t) USART2_IRQHandler,
    (uint32_t) 0,
    (uint32_t) EXTI15_10_IRQHandler,
    (uint32_t) RTC_Alarm_IRQHandler,
    (uint32_t) OTG_FS_WKUP_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) DMA1_Stream7_IRQHandler,
    (uint32_t) 0,
    (uint32_t) SDIO_IRQHandler,
    (uint32_t) TIM5_IRQHandler,
    (uint32_t) SPI3_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) DMA2_Stream0_IRQHandler,
    (uint32_t) DMA2_Stream1_IRQHandler,
    (uint32_t) DMA2_Stream2_IRQHandler,
    (uint32_t) DMA2_Stream3_IRQHandler,
    (uint32_t) DMA2_Stream4_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) OTG_FS_IRQHandler,
    (uint32_t) DMA2_Stream5_IRQHandler,
    (uint32_t) DMA2_Stream6_IRQHandler,
    (uint32_t) DMA2_Stream7_IRQHandler,
    (uint32_t) USART6_IRQHandler,
    (uint32_t) I2C3_EV_IRQHandler,
    (uint32_t) I2C3_ER_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) FPU_IRQHandler,
    (uint32_t) 0,
    (uint32_t) 0,
    (uint32_t) SPI4_IRQHandler,
    (uint32_t) SPI5_IRQHandler,
};

weak and alias attributes

Each handler is declared as a weak alias of Default_Handler. If the application defines a handler with the same name, the linker uses that definition; otherwise the name resolves to Default_Handler.
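
The fallback behavior can be observed on a host PC as well. The sketch below is a simplified illustration that assumes GCC (or a compatible compiler) on an ELF target such as Linux; the handler names Demo_A_IRQHandler and Demo_B_IRQHandler and the function run_demo_handlers are hypothetical. Unlike the real Default_Handler, the one here returns and counts its calls so that the result can be checked.

```c
#include <assert.h>

static int default_hits;   /* incremented each time the fallback runs */

void Default_Handler(void) {
    default_hits++;        /* the real handler spins forever instead */
}

/* Neither name has a strong definition, so both resolve to the fallback. */
void Demo_A_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));
void Demo_B_IRQHandler(void) __attribute__((weak, alias("Default_Handler")));

/* Call both aliases and report how many calls reached the fallback. */
int run_demo_handlers(void) {
    assert(Demo_A_IRQHandler == Demo_B_IRQHandler);   /* same address */
    Demo_A_IRQHandler();
    Demo_B_IRQHandler();
    return default_hits;
}
```

A strong definition of one of these names in another source file would replace the alias at link time, which is the same mechanism that lets startup.c supply its own Reset_Handler.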


section attribute

Code or data can be assigned to a specific memory location by placing it in a named section, which the linker script then maps to an address.

Now compile vector.c and include vector.o in the link; you will see that the section .isr_vector is now filled:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -nostdlib \
    -Wall \
    -T linker.ld -Wl,-Map=vector.tmp.map \
    main.o delay.o vector.o \
    -o vector.tmp
vector.tmp.map
.isr_vector     0x0000000008000000      0x198
 *(.isr_vector)
 .isr_vector    0x0000000008000000      0x198 vector.o
                0x0000000008000000                vector_table

Startup code

The startup code is responsible for setting up the right environment for the main code to run:

  • Provide the vector table
  • Implement the Reset Handler
    • Copy the .data section from Flash to RAM
    • Fill the .bss section with zeros
    • Call the main function
startup.c
#include <stdint.h>

extern uint32_t _sdata;
extern uint32_t _edata;
extern uint32_t _lddata;
extern uint32_t _sbss;
extern uint32_t _ebss;

extern int main(void);

void Reset_Handler(void) {
    // copy .data section from flash to ram
    uint32_t size = (uint32_t)&_edata - (uint32_t)&_sdata;
    uint8_t *pRAM = (uint8_t*)&_sdata;
    uint8_t *pFlash = (uint8_t*)&_lddata;

    for(uint32_t i=0; i<size; i++) {
        pRAM[i] = pFlash[i];
    }

    // fill .bss section with zeros
    size = (uint32_t)&_ebss - (uint32_t)&_sbss;
    pRAM = (uint8_t*)&_sbss;

    for(uint32_t i=0; i<size; i++) {
        pRAM[i] = 0;
    }

    // call to main; main() loops forever in this example and is not expected to return
    main();
}
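
The two loops in Reset_Handler can also be written as small helpers that copy and clear one 32-bit word at a time. The following is a sketch under stated assumptions, not the project's code: copy_data and zero_bss are hypothetical names, and the word-wise access assumes that the section boundaries are 4-byte aligned, which holds for the sizes in this example but is not enforced by linker.ld.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the words in [dst, dst_end) from the load image at `src`. */
void copy_data(uint32_t *dst, const uint32_t *dst_end, const uint32_t *src) {
    assert(dst <= dst_end);
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}

/* Clear the words in [start, end). */
void zero_bss(uint32_t *start, const uint32_t *end) {
    assert(start <= end);
    while (start < end) {
        *start++ = 0;
    }
}
```

Inside Reset_Handler the calls would be copy_data(&_sdata, &_edata, &_lddata) and zero_bss(&_sbss, &_ebss), passing the addresses of the linker symbols rather than their values. Taking pointers as parameters also makes the helpers easy to test on a host with ordinary arrays.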

Examine the binary file

Compile vector.c and startup.c to object files in the same way as before, then link all object files:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -nostdlib \
    -Wall \
    -T linker.ld -Wl,-Map=main.elf.map \
    main.o delay.o vector.o startup.o \
    -o main.elf

The ELF file wraps the binary image with extra metadata, such as the symbol table:

arm-none-eabi-nm main.elf > main.elf.sym
main.elf.sym
20000008 B _ebss
20000004 D _edata
20020000 D _estack
080002b8 T _etext
080002bc A _lddata
20000004 B _sbss
20000000 D _sdata
0800022c W ADC_IRQHandler
0800022c W BusFault_Handler
0800022c W DebugMon_Handler
0800022c T Default_Handler
08000200 T delay
20000004 B delay_counter
080002b8 R DELAY_MAX
0800022c W DMA1_Stream0_IRQHandler
0800022c W DMA1_Stream1_IRQHandler
0800022c W DMA1_Stream2_IRQHandler
0800022c W DMA1_Stream3_IRQHandler
0800022c W DMA1_Stream4_IRQHandler
0800022c W DMA1_Stream5_IRQHandler
0800022c W DMA1_Stream6_IRQHandler
0800022c W DMA1_Stream7_IRQHandler
0800022c W DMA2_Stream0_IRQHandler
0800022c W DMA2_Stream1_IRQHandler
0800022c W DMA2_Stream2_IRQHandler
0800022c W DMA2_Stream3_IRQHandler
0800022c W DMA2_Stream4_IRQHandler
0800022c W DMA2_Stream5_IRQHandler
0800022c W DMA2_Stream6_IRQHandler
0800022c W DMA2_Stream7_IRQHandler
0800022c W EXTI0_IRQHandler
0800022c W EXTI1_IRQHandler
0800022c W EXTI15_10_IRQHandler
0800022c W EXTI2_IRQHandler
0800022c W EXTI3_IRQHandler
0800022c W EXTI4_IRQHandler
0800022c W EXTI9_5_IRQHandler
0800022c W FLASH_IRQHandler
0800022c W FPU_IRQHandler
0800022c W HardFault_Handler
0800022c W I2C1_ER_IRQHandler
0800022c W I2C1_EV_IRQHandler
0800022c W I2C2_ER_IRQHandler
0800022c W I2C2_EV_IRQHandler
0800022c W I2C3_ER_IRQHandler
0800022c W I2C3_EV_IRQHandler
20000000 D isLoop
08000198 T main
0800022c W MemManage_Handler
0800022c W NMI_Handler
0800022c W OTG_FS_IRQHandler
0800022c W OTG_FS_WKUP_IRQHandler
0800022c W PendSV_Handler
0800022c W PVD_IRQHandler
0800022c W RCC_IRQHandler
08000234 T Reset_Handler
0800022c W RTC_Alarm_IRQHandler
0800022c W RTC_WKUP_IRQHandler
0800022c W SDIO_IRQHandler
0800022c W SPI1_IRQHandler
0800022c W SPI2_IRQHandler
0800022c W SPI3_IRQHandler
0800022c W SPI4_IRQHandler
0800022c W SPI5_IRQHandler
0800022c W SVC_Handler
0800022c W SysTick_Handler
0800022c W TAMP_STAMP_IRQHandler
0800022c W TIM1_BRK_TIM9_IRQHandler
0800022c W TIM1_CC_IRQHandler
0800022c W TIM1_TRG_COM_TIM11_IRQHandler
0800022c W TIM1_UP_TIM10_IRQHandler
0800022c W TIM2_IRQHandler
0800022c W TIM3_IRQHandler
0800022c W TIM4_IRQHandler
0800022c W TIM5_IRQHandler
0800022c W UsageFault_Handler
0800022c W USART1_IRQHandler
0800022c W USART2_IRQHandler
0800022c W USART6_IRQHandler
08000000 D vector_table
0800022c W WWDG_IRQHandler

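The Reset_Handler is at 0x08000234 and the Default_Handler is at 0x0800022c. Every weak (W) handler in the list shares the Default_Handler address, because none of them has been overridden.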


Extract binary content:

arm-none-eabi-objcopy -O binary main.elf main.bin

Examine binary file

Check isr_vector at 0x08000000:

group 4 bytes, use little-endian, start at 0, size 32 bytes
xxd -g4 -e -s0 -l32 main.bin
00000000: 20020000 08000235 0800022d 0800022d  ... 5...-...-...
00000010: 0800022d 0800022d 0800022d 00000000  -...-...-.......

You will notice that:

  • The initial MSP value at offset 0x00000000 is the RAM_END value 0x20020000 (the _estack symbol).
  • The Reset_Handler address is stored at offset 0x00000004 as 0x08000235, not 0x08000234: the LSB is set to 1 to indicate Thumb state.
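
Both observations can be reproduced with a few lines of C. The following is a host-side sketch, not part of the project: the names vector_head, read_le32 and thumb_target are hypothetical, and the byte array simply repeats the first eight bytes of main.bin in file order.

```c
#include <stdint.h>

/* First 8 bytes of main.bin in file order. xxd -e prints them as the
 * little-endian words 20020000 and 08000235. */
const uint8_t vector_head[8] = {
    0x00, 0x00, 0x02, 0x20,   /* word 0: initial MSP  */
    0x35, 0x02, 0x00, 0x08    /* word 1: reset vector */
};

/* Assemble a 32-bit little-endian word from 4 consecutive bytes. */
uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Clear bit 0 (the Thumb state flag) to get the real code address. */
uint32_t thumb_target(uint32_t vector) {
    return vector & ~UINT32_C(1);
}
```

Calling thumb_target(read_le32(vector_head + 4)) yields the address that nm reports for Reset_Handler.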

Let's check the value of DELAY_MAX. The symbol table above lists it at 0x080002b8, which is offset 0x2b8 in main.bin:

xxd -g4 -e -s0x2b8 -l4 main.bin
000002b8: 0000beef

The constant value 0x0000BEEF is stored at that address.


Review assembly code

You can read the assembly code from the ELF file using objdump:

arm-none-eabi-objdump -S main.elf
main.elf.s
main.elf:     file format elf32-littlearm


Disassembly of section .text:

08000198 <main>:
 8000198:   b580        push    {r7, lr}
 800019a:   af00        add r7, sp, #0
 800019c:   4b14        ldr r3, [pc, #80]   ; (80001f0 <main+0x58>)
 800019e:   681b        ldr r3, [r3, #0]
 80001a0:   4a13        ldr r2, [pc, #76]   ; (80001f0 <main+0x58>)
 80001a2:   f043 0301   orr.w   r3, r3, #1
 80001a6:   6013        str r3, [r2, #0]
 80001a8:   4b12        ldr r3, [pc, #72]   ; (80001f4 <main+0x5c>)
 80001aa:   681b        ldr r3, [r3, #0]
 80001ac:   4a11        ldr r2, [pc, #68]   ; (80001f4 <main+0x5c>)
 80001ae:   f423 6300   bic.w   r3, r3, #2048   ; 0x800
 80001b2:   6013        str r3, [r2, #0]
 80001b4:   4b0f        ldr r3, [pc, #60]   ; (80001f4 <main+0x5c>)
 80001b6:   681b        ldr r3, [r3, #0]
 80001b8:   4a0e        ldr r2, [pc, #56]   ; (80001f4 <main+0x5c>)
 80001ba:   f443 6380   orr.w   r3, r3, #1024   ; 0x400
 80001be:   6013        str r3, [r2, #0]
 80001c0:   e00f        b.n 80001e2 <main+0x4a>
 80001c2:   4b0d        ldr r3, [pc, #52]   ; (80001f8 <main+0x60>)
 80001c4:   681b        ldr r3, [r3, #0]
 80001c6:   4a0c        ldr r2, [pc, #48]   ; (80001f8 <main+0x60>)
 80001c8:   f043 0320   orr.w   r3, r3, #32
 80001cc:   6013        str r3, [r2, #0]
 80001ce:   f000 f817   bl  8000200 <delay>
 80001d2:   4b09        ldr r3, [pc, #36]   ; (80001f8 <main+0x60>)
 80001d4:   681b        ldr r3, [r3, #0]
 80001d6:   4a08        ldr r2, [pc, #32]   ; (80001f8 <main+0x60>)
 80001d8:   f443 1300   orr.w   r3, r3, #2097152    ; 0x200000
 80001dc:   6013        str r3, [r2, #0]
 80001de:   f000 f80f   bl  8000200 <delay>
 80001e2:   4b06        ldr r3, [pc, #24]   ; (80001fc <main+0x64>)
 80001e4:   681b        ldr r3, [r3, #0]
 80001e6:   2b00        cmp r3, #0
 80001e8:   d1eb        bne.n   80001c2 <main+0x2a>
 80001ea:   2300        movs    r3, #0
 80001ec:   4618        mov r0, r3
 80001ee:   bd80        pop {r7, pc}
 80001f0:   40023830    .word   0x40023830
 80001f4:   40020000    .word   0x40020000
 80001f8:   40020018    .word   0x40020018
 80001fc:   20000000    .word   0x20000000

08000200 <delay>:
 8000200:   b480        push    {r7}
 8000202:   af00        add r7, sp, #0
 8000204:   f64b 62ef   movw    r2, #48879  ; 0xbeef
 8000208:   4b07        ldr r3, [pc, #28]   ; (8000228 <delay+0x28>)
 800020a:   601a        str r2, [r3, #0]
 800020c:   bf00        nop
 800020e:   4b06        ldr r3, [pc, #24]   ; (8000228 <delay+0x28>)
 8000210:   681b        ldr r3, [r3, #0]
 8000212:   1e5a        subs    r2, r3, #1
 8000214:   4904        ldr r1, [pc, #16]   ; (8000228 <delay+0x28>)
 8000216:   600a        str r2, [r1, #0]
 8000218:   2b00        cmp r3, #0
 800021a:   d1f8        bne.n   800020e <delay+0xe>
 800021c:   bf00        nop
 800021e:   bf00        nop
 8000220:   46bd        mov sp, r7
 8000222:   bc80        pop {r7}
 8000224:   4770        bx  lr
 8000226:   bf00        nop
 8000228:   20000004    .word   0x20000004

0800022c <Default_Handler>:
 800022c:   b480        push    {r7}
 800022e:   af00        add r7, sp, #0
 8000230:   e7fe        b.n 8000230 <Default_Handler+0x4>
    ...

08000234 <Reset_Handler>:
 8000234:   b580        push    {r7, lr}
 8000236:   b086        sub sp, #24
 8000238:   af00        add r7, sp, #0
 800023a:   4a1a        ldr r2, [pc, #104]  ; (80002a4 <Reset_Handler+0x70>)
 800023c:   4b1a        ldr r3, [pc, #104]  ; (80002a8 <Reset_Handler+0x74>)
 800023e:   1ad3        subs    r3, r2, r3
 8000240:   60fb        str r3, [r7, #12]
 8000242:   4b19        ldr r3, [pc, #100]  ; (80002a8 <Reset_Handler+0x74>)
 8000244:   60bb        str r3, [r7, #8]
 8000246:   4b19        ldr r3, [pc, #100]  ; (80002ac <Reset_Handler+0x78>)
 8000248:   607b        str r3, [r7, #4]
 800024a:   2300        movs    r3, #0
 800024c:   617b        str r3, [r7, #20]
 800024e:   e00a        b.n 8000266 <Reset_Handler+0x32>
 8000250:   697b        ldr r3, [r7, #20]
 8000252:   687a        ldr r2, [r7, #4]
 8000254:   441a        add r2, r3
 8000256:   697b        ldr r3, [r7, #20]
 8000258:   68b9        ldr r1, [r7, #8]
 800025a:   440b        add r3, r1
 800025c:   7812        ldrb    r2, [r2, #0]
 800025e:   701a        strb    r2, [r3, #0]
 8000260:   697b        ldr r3, [r7, #20]
 8000262:   3301        adds    r3, #1
 8000264:   617b        str r3, [r7, #20]
 8000266:   697b        ldr r3, [r7, #20]
 8000268:   68fa        ldr r2, [r7, #12]
 800026a:   429a        cmp r2, r3
 800026c:   d8f0        bhi.n   8000250 <Reset_Handler+0x1c>
 800026e:   4a10        ldr r2, [pc, #64]   ; (80002b0 <Reset_Handler+0x7c>)
 8000270:   4b10        ldr r3, [pc, #64]   ; (80002b4 <Reset_Handler+0x80>)
 8000272:   1ad3        subs    r3, r2, r3
 8000274:   60fb        str r3, [r7, #12]
 8000276:   4b0f        ldr r3, [pc, #60]   ; (80002b4 <Reset_Handler+0x80>)
 8000278:   60bb        str r3, [r7, #8]
 800027a:   2300        movs    r3, #0
 800027c:   613b        str r3, [r7, #16]
 800027e:   e007        b.n 8000290 <Reset_Handler+0x5c>
 8000280:   693b        ldr r3, [r7, #16]
 8000282:   68ba        ldr r2, [r7, #8]
 8000284:   4413        add r3, r2
 8000286:   2200        movs    r2, #0
 8000288:   701a        strb    r2, [r3, #0]
 800028a:   693b        ldr r3, [r7, #16]
 800028c:   3301        adds    r3, #1
 800028e:   613b        str r3, [r7, #16]
 8000290:   693b        ldr r3, [r7, #16]
 8000292:   68fa        ldr r2, [r7, #12]
 8000294:   429a        cmp r2, r3
 8000296:   d8f3        bhi.n   8000280 <Reset_Handler+0x4c>
 8000298:   f7ff ff7e   bl  8000198 <main>
 800029c:   bf00        nop
 800029e:   3718        adds    r7, #24
 80002a0:   46bd        mov sp, r7
 80002a2:   bd80        pop {r7, pc}
 80002a4:   20000004    .word   0x20000004
 80002a8:   20000000    .word   0x20000000
 80002ac:   080002bc    .word   0x080002bc
 80002b0:   20000008    .word   0x20000008
 80002b4:   20000004    .word   0x20000004
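
Many lines in the listing load a constant with ldr rX, [pc, #imm], and objdump prints the resolved address in the trailing comment. The arithmetic behind that comment can be sketched as follows; the function name is hypothetical and the code only runs on a host.

```c
#include <stdint.h>

/* Address read by a Thumb "ldr rX, [pc, #imm]" located at insn_addr.
 * In Thumb state the PC reads as the instruction address + 4, and that
 * value is rounded down to a multiple of 4 before imm is added. */
uint32_t thumb_literal_address(uint32_t insn_addr, uint32_t imm) {
    uint32_t pc = insn_addr + 4u;
    return (pc & ~UINT32_C(3)) + imm;
}
```

For the first load in main, thumb_literal_address(0x0800019c, 80) gives 0x080001f0, the word that holds the RCC_AHB1ENR address 0x40023830.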

Download and Debug#


Run OpenOCD

Each target has its own configuration, such as _CPUTAPID, _ENDIAN, or the debug registers. OpenOCD reads these settings from a target configuration file, so you need the file that matches your target.

For example, to target an STM32F411 MCU:

stm32f4x.cfg
# script for stm32f4x family

#
# stm32 devices support both JTAG and SWD transports.
#
source [find target/swj-dp.tcl]
source [find mem_helper.tcl]

if { [info exists CHIPNAME] } {
   set _CHIPNAME $CHIPNAME
} else {
   set _CHIPNAME stm32f4x
}

set _ENDIAN little

# Work-area is a space in RAM used for flash programming
# By default use 32kB (Available RAM in smallest device STM32F410)
if { [info exists WORKAREASIZE] } {
   set _WORKAREASIZE $WORKAREASIZE
} else {
   set _WORKAREASIZE 0x8000
}

#jtag scan chain
if { [info exists CPUTAPID] } {
   set _CPUTAPID $CPUTAPID
} else {
   if { [using_jtag] } {
      # See STM Document RM0090
      # Section 38.6.3 - corresponds to Cortex-M4 r0p1
      set _CPUTAPID 0x4ba00477
   } {
      set _CPUTAPID 0x2ba01477
   }
}

swj_newdap $_CHIPNAME cpu -irlen 4 -ircapture 0x1 -irmask 0xf -expected-id $_CPUTAPID
dap create $_CHIPNAME.dap -chain-position $_CHIPNAME.cpu

tpiu create $_CHIPNAME.tpiu -dap $_CHIPNAME.dap -ap-num 0 -baseaddr 0xE0040000

if {[using_jtag]} {
   jtag newtap $_CHIPNAME bs -irlen 5
}

set _TARGETNAME $_CHIPNAME.cpu
target create $_TARGETNAME cortex_m -endian $_ENDIAN -dap $_CHIPNAME.dap

$_TARGETNAME configure -work-area-phys 0x20000000 -work-area-size $_WORKAREASIZE -work-area-backup 0

set _FLASHNAME $_CHIPNAME.flash
flash bank $_FLASHNAME stm32f2x 0 0 0 0 $_TARGETNAME

flash bank $_CHIPNAME.otp stm32f2x 0x1fff7800 0 0 0 $_TARGETNAME

if { [info exists QUADSPI] && $QUADSPI } {
   set a [llength [flash list]]
   set _QSPINAME $_CHIPNAME.qspi
   flash bank $_QSPINAME stmqspi 0x90000000 0 0 0 $_TARGETNAME 0xA0001000
}

# JTAG speed should be <= F_CPU/6. F_CPU after reset is 16MHz, so use F_JTAG = 2MHz
#
# Since we may be running of an RC oscilator, we crank down the speed a
# bit more to be on the safe side. Perhaps superstition, but if are
# running off a crystal, we can run closer to the limit. Note
# that there can be a pretty wide band where things are more or less stable.
adapter speed 2000

adapter srst delay 100
if {[using_jtag]} {
 jtag_ntrst_delay 100
}

reset_config srst_nogate

if {![using_hla]} {
   # if srst is not fitted use SYSRESETREQ to
   # perform a soft reset
   cortex_m reset_config sysresetreq
}

$_TARGETNAME configure -event examine-end {
    # Enable debug during low power modes (uses more power)
    # DBGMCU_CR |= DBG_STANDBY | DBG_STOP | DBG_SLEEP
    mmw 0xE0042004 0x00000007 0

    # Stop watchdog counters during halt
    # DBGMCU_APB1_FZ |= DBG_IWDG_STOP | DBG_WWDG_STOP
    mmw 0xE0042008 0x00001800 0
}

proc proc_post_enable {_chipname} {
    targets $_chipname.cpu

    if { [$_chipname.tpiu cget -protocol] eq "sync" } {
        switch [$_chipname.tpiu cget -port-width] {
            1 {
                mmw 0xE0042004 0x00000060 0x000000c0
                mmw 0x40021020 0x00000000 0x0000ff00
                mmw 0x40021000 0x000000a0 0x000000f0
                mmw 0x40021008 0x000000f0 0x00000000
              }
            2 {
                mmw 0xE0042004 0x000000a0 0x000000c0
                mmw 0x40021020 0x00000000 0x000fff00
                mmw 0x40021000 0x000002a0 0x000003f0
                mmw 0x40021008 0x000003f0 0x00000000
              }
            4 {
                mmw 0xE0042004 0x000000e0 0x000000c0
                mmw 0x40021020 0x00000000 0x0fffff00
                mmw 0x40021000 0x00002aa0 0x00003ff0
                mmw 0x40021008 0x00003ff0 0x00000000
              }
        }
    } else {
        mmw 0xE0042004 0x00000020 0x000000c0
    }
}

$_CHIPNAME.tpiu configure -event post-enable "proc_post_enable $_CHIPNAME"

$_TARGETNAME configure -event reset-init {
    # Configure PLL to boost clock to HSI x 4 (64 MHz)
    mww 0x40023804 0x08012008   ;# RCC_PLLCFGR 16 Mhz /8 (M) * 128 (N) /4(P)
    mww 0x40023C00 0x00000102   ;# FLASH_ACR = PRFTBE | 2(Latency)
    mmw 0x40023800 0x01000000 0 ;# RCC_CR |= PLLON
    sleep 10                    ;# Wait for PLL to lock
    mmw 0x40023808 0x00001000 0 ;# RCC_CFGR |= RCC_CFGR_PPRE1_DIV2
    mmw 0x40023808 0x00000002 0 ;# RCC_CFGR |= RCC_CFGR_SW_PLL

    # Boost JTAG frequency
    adapter speed 8000
}

$_TARGETNAME configure -event reset-start {
    # Reduce speed since CPU speed will slow down to 16MHz with the reset
    adapter speed 2000
}
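
The reset-init handler in the file above writes 0x08012008 to RCC_PLLCFGR, and its comment gives the formula 16 MHz / 8 (M) * 128 (N) / 4 (P). As a worked check, the sketch below decodes those fields from the register value and recomputes the 64 MHz result. It assumes the STM32F4 bit layout (PLLM in bits 5:0, PLLN in bits 14:6, PLLP in bits 17:16); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded PLL dividers and multiplier. */
typedef struct {
    uint32_t m;   /* PLLM, bits 5:0   : input divider               */
    uint32_t n;   /* PLLN, bits 14:6  : VCO multiplier              */
    uint32_t p;   /* PLLP, bits 17:16 : encoded 0..3 means 2,4,6,8  */
} pll_config;

/* Extract M, N and P from a raw RCC_PLLCFGR value. */
pll_config decode_pllcfgr(uint32_t reg) {
    pll_config c;
    c.m = reg & 0x3Fu;
    c.n = (reg >> 6) & 0x1FFu;
    c.p = (((reg >> 16) & 0x3u) + 1u) * 2u;
    return c;
}

/* PLL output frequency in Hz: f_in / M * N / P. */
uint32_t pll_output_hz(uint32_t input_hz, pll_config c) {
    return input_hz / c.m * c.n / c.p;
}
```

With the 16 MHz HSI as input, pll_output_hz(16000000, decode_pllcfgr(0x08012008)) returns 64000000.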

On a board with an ST Link debugger:

board.cfg
source [find interface/stlink.cfg]

transport select hla_swd

# increase working area to 64KB
set WORKAREASIZE 0x10000

source [find target/stm32f4x.cfg]

reset_config srst_only

You can use any STM32-compatible debugger, such as ST-Link V2/V3 or J-Link, to connect to the Serial Wire Debug (SWD) interface on the target MCU.

Debugging a target board

openocd -f board.cfg
xPack OpenOCD x86_64 Open On-Chip Debugger 0.11.0+dev (2022-03-25-17:32)
Licensed under GNU GPL v2
For bug reports, read
        http://openocd.org/doc/doxygen/bugs.html
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
srst_only separate srst_nogate srst_open_drain connect_deassert_srst

Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections
Info : clock speed 2000 kHz
Info : STLINK V2J39M27 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.276040
Info : [stm32f4x.cpu] Cortex-M4 r0p1 processor detected
Info : [stm32f4x.cpu] target has 6 breakpoints, 4 watchpoints
Info : starting gdb server for stm32f4x.cpu on 3333
Info : Listening on port 3333 for gdb connections

The OpenOCD commands, with examples, are documented online.


Telnet client

Run the Telnet client:

telnet 127.0.0.1 4444

Telnet gives you direct access to the OpenOCD server, so you can type OpenOCD commands as they are:

flash write_image erase main.elf
reset halt
resume

Use Telnet to connect to OpenOCD


GDB Client

Build a debug version with the -g option and name it main-debug.elf:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -nostdlib \
    -std=gnu11 \
    -Wall \
    -g \
    -T linker.ld -Wl,-Map=main-debug.elf.map \
    main.c delay.c vector.c startup.c \
    -o main-debug.elf

Run the GDB client with the debug version:

arm-none-eabi-gdb main-debug.elf

Then connect to the OpenOCD server:

target extended-remote localhost:3333

Inside GDB, every OpenOCD command must be prefixed with monitor:

(gdb)
monitor flash write_image erase main.elf
monitor reset halt
monitor resume

GDB also has its own command set, which you can use as usual:

(gdb)
br main
step

Use GDB to connect to OpenOCD

Use standard library#

newlib-nano.zip

Let's look at a new example that uses the standard library.

The source code is based on the example above, with some modifications:

  • Include stdio.h to use the printf() function
  • Use Semihosting for output if the macro USE_SEMIHOSTING is defined

#include <stdint.h>
#include <stdio.h>
#include "delay.h"

/* Clock */
#define RCC_AHB1ENR     *((volatile uint32_t*) (0x40023830))

/* GPIO A */
#define GPIOA_MODER     *((volatile uint32_t*) (0x40020000))
#define GPIOA_BSRR      *((volatile uint32_t*) (0x40020018))

/* Global initialized variable */
uint32_t isLoop = 1;

#ifdef USE_SEMIHOSTING
/* Semihosting */
extern void initialise_monitor_handles(void);
#endif

int main() {
    char counter = 0;

#ifdef USE_SEMIHOSTING
    initialise_monitor_handles();
#endif

    /* turn on clock on GPIOA */
    RCC_AHB1ENR |= (1 << 0);

    /* set PA5 to output mode */
    GPIOA_MODER &= ~(1 << 11);
    GPIOA_MODER |=  (1 << 10);

    while(isLoop) {
        /* set HIGH on PA5 */
        GPIOA_BSRR |= (1 << 5);
        delay();

        /* set LOW on PA5 */
        GPIOA_BSRR |= (1 << (5+16));
        delay();

        /* output */
        printf("counter = %d\n", counter);
        counter++;
    }
    return 0;
}
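
The bit operations in main() follow the GPIO register layout: MODER holds two bits per pin (01 selects output mode), and BSRR sets a pin through bits 0 to 15 and resets it through bits 16 to 31. The host-side sketch below generalizes that arithmetic to any pin. The helper names are hypothetical, and the functions work on plain values rather than on the real registers.

```c
#include <stdint.h>

/* Return moder with the 2-bit field of `pin` set to 01 (output mode). */
uint32_t moder_set_output(uint32_t moder, unsigned pin) {
    moder &= ~(UINT32_C(3) << (pin * 2u));   /* clear both mode bits */
    moder |=  (UINT32_C(1) << (pin * 2u));   /* 01 = general output  */
    return moder;
}

/* BSRR value that drives `pin` high (bits 0..15 set pins). */
uint32_t bsrr_set(unsigned pin) {
    return UINT32_C(1) << pin;
}

/* BSRR value that drives `pin` low (bits 16..31 reset pins). */
uint32_t bsrr_reset(unsigned pin) {
    return UINT32_C(1) << (pin + 16u);
}
```

For PA5, bsrr_set(5) is 0x20 and bsrr_reset(5) is 0x200000, the same immediates that appear in the orr.w instructions of the disassembly.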

Let's try to compile it:

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    -Wall \
    -T linker.ld -Wl,-Map=main.elf.map \
    main.c delay.c vector.c startup.c \
    -o main.elf
 \crt0.o: in function `_mainCRTStartup':
    (.text+0x64): undefined reference to `__bss_start__'
    (.text+0x68): undefined reference to `__bss_end__'
 \libc.a(lib_a-exit.o): in function `exit':
    (.text.exit+0x16): undefined reference to `_exit'
 \libc.a(lib_a-sbrkr.o): in function `_sbrk_r':
    (.text._sbrk_r+0xc): undefined reference to `_sbrk'
 \libc.a(lib_a-writer.o): in function `_write_r':
    (.text._write_r+0x14): undefined reference to `_write'
 \libc.a(lib_a-closer.o): in function `_close_r':
    (.text._close_r+0xc): undefined reference to `_close'
 \libc.a(lib_a-fstatr.o): in function `_fstat_r':
    (.text._fstat_r+0x12): undefined reference to `_fstat'
 \libc.a(lib_a-isattyr.o): in function `_isatty_r':
    (.text._isatty_r+0xc): undefined reference to `_isatty'
 \libc.a(lib_a-lseekr.o): in function `_lseek_r':
    (.text._lseek_r+0x14): undefined reference to `_lseek'
 \libc.a(lib_a-readr.o): in function `_read_r':
    (.text._read_r+0x14): undefined reference to `_read'
 \libc.a(lib_a-abort.o): in function `abort':
    (.text.abort+0xa): undefined reference to `_exit'
 \libc.a(lib_a-signalr.o): in function `_kill_r':
    (.text._kill_r+0x12): undefined reference to `_kill'
 \libc.a(lib_a-signalr.o): in function `_getpid_r':
    (.text._getpid_r+0x0): undefined reference to `_getpid'
 \libgcc.a(unwind-arm.o): in function `get_eit_entry':
    (.text.get_eit_entry+0x90): undefined reference to `__exidx_start'
    (.text.get_eit_entry+0x94): undefined reference to `__exidx_end'

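The errors show that, without -nostdlib, the compiler links against libc by default, and libc expects symbols and system calls that the project does not define yet.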

Standard C libraries

The GNU Arm toolchain uses newlib as its implementation of the standard C library. To reduce code size and stay independent of the hardware, MCUs typically use a lightweight variant called newlib-nano.

However, newlib-nano does not implement the low-level system calls that standard library functions such as printf() or scanf() rely on. To make the application link, add the nosys library. It provides stub implementations of the low-level system calls, most of which simply return an error value.
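
To make the idea of a stub system call concrete, here is a minimal sketch written in the style of such a library. It is an illustration, not a copy of the nosys source: the function performs no I/O and only reports that the operation is not implemented.

```c
#include <errno.h>

/* Stub for the low-level write call that printf() eventually reaches.
 * It ignores its arguments and fails with ENOSYS. */
int _write(int file, char *ptr, int len) {
    (void)file;
    (void)ptr;
    (void)len;
    errno = ENOSYS;   /* "function not implemented" */
    return -1;
}
```

With a stub like this the program links, but printf() output goes nowhere. To see the text you would replace _write with a version that sends the bytes to real hardware or to a debugger.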

newlib-nano is enabled with the linker option --specs=nano.specs, and nosys with the linker option --specs=nosys.specs. Both options are included by default in the GCC linker options of a generated project.

arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    --specs=nano.specs --specs=nosys.specs \
    -Wall \
    -T linker.ld -Wl,-Map=main.elf.map \
    main.c delay.c vector.c startup.c \
    -o main.elf
\crt0.o: in function `_mainCRTStartup':
    (.text+0x64): undefined reference to `__bss_start__'
    (.text+0x68): undefined reference to `__bss_end__'
\libnosys.a(sbrk.o): in function `_sbrk':
    (.text._sbrk+0x18): undefined reference to `end'

A few errors remain, and they have to be fixed in the linker script.

Let's download the newlib source code from the newlib FTP directory.

Search for __bss_start__ and __bss_end__ and you will find this note:

    * `mcore/crt0.S`: Renamed file from `crt0.s`.
    Only invoke `init()` and `fini()` routines for ELF builds.
    Use `__bss_start__` and `__bss_end__` to locate `.bss` section.

Search for the function _sbrk and you will see in the source code that the end symbol is defined by the linker. It marks the initial end of the heap, that is, the address from which the heap starts to grow:

void * _sbrk (ptrdiff_t incr) {
  extern char end asm ("end");  /* Defined by the linker.  */
  static char *heap_end;
  char *prev_heap_end;

  if (heap_end == NULL)
    heap_end = &end;
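
The excerpt above stops after the initialization. To show how a function with this contract can continue, the sketch below implements a small bump allocator that can be run on a host. It is not the newlib code: demo_heap stands in for the RAM that starts at the linker symbol end, and the names demo_heap, DEMO_HEAP_SIZE and demo_sbrk are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in for the RAM region that begins at `end` on the target. */
#define DEMO_HEAP_SIZE 512
char demo_heap[DEMO_HEAP_SIZE];

/* Move the heap end by incr bytes and return the previous end,
 * or (void *)-1 with errno = ENOMEM when the region is exhausted. */
void *demo_sbrk(ptrdiff_t incr) {
    static char *heap_end;          /* current end of the heap */
    char *prev_heap_end;

    if (heap_end == NULL)
        heap_end = demo_heap;       /* first call: start at `end` */

    /* Refuse requests that would leave the reserved region. */
    if (incr > (demo_heap + DEMO_HEAP_SIZE) - heap_end ||
        incr < demo_heap - heap_end) {
        errno = ENOMEM;
        return (void *)-1;
    }

    prev_heap_end = heap_end;
    heap_end += incr;
    return prev_heap_end;
}
```

malloc() obtains memory through calls of this kind, which is why the linker must tell the library where the heap begins.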

Update linker sections#

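The linker script needs the following updates:

  • add alignment to each section
  • include subsections, e.g. *(.text*)
  • define the symbols that newlib expects: __bss_start__, __bss_end__, end, __exidx_start and __exidx_end
  • add a new section that reserves memory for the heap and stack, so that the link fails if they do not fit in RAM
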
linker.ld
ENTRY(Reset_Handler)

MEMORY
{
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 128K
  FLASH   (rx)    : ORIGIN = 0x08000000,   LENGTH = 512K
}

_estack = ORIGIN(RAM) + LENGTH(RAM);
_Min_Heap_Size = 0x200; /* required amount of heap */
_Min_Stack_Size = 0x400; /* required amount of stack */

SECTIONS
{
    .isr_vector :
    {
        . = ALIGN(4);
        KEEP(*(.isr_vector)) /* Startup code */
        . = ALIGN(4);
    } >FLASH

    .text :
    {
        . = ALIGN(4);
        *(.text)
        *(.text*)
        *(.glue_7)         /* glue arm to thumb code */
        *(.glue_7t)        /* glue thumb to arm code */
        *(.eh_frame)
        KEEP (*(.init))
        KEEP (*(.fini))
        . = ALIGN(4);
        _etext = .;
    } >FLASH

    .rodata :
    {
        . = ALIGN(4);
        *(.rodata)
        *(.rodata*)
        . = ALIGN(4);
    } >FLASH

    .ARM : {
        . = ALIGN(4);
        __exidx_start = .;
        *(.ARM.exidx*)
        __exidx_end = .;
        . = ALIGN(4);
    } >FLASH

    _lddata = LOADADDR(.data);
    .data :
    {
        . = ALIGN(4);
        _sdata = .;
        *(.data)
        *(.data*)
        . = ALIGN(4);
        _edata = .;
    } >RAM AT> FLASH

    .bss :
    {
        . = ALIGN(4);
        _sbss = .;
        __bss_start__ = _sbss;
        *(.bss)
        *(.bss.*)
        *(COMMON)
        . = ALIGN(4);
        _ebss = .;
        __bss_end__ = _ebss;
    } >RAM

    ._user_heap_stack :
    {
        . = ALIGN(8);
        end = .;
        _end = .;
        __end__ = .;
        . = . + _Min_Heap_Size;
        . = . + _Min_Stack_Size;
        . = ALIGN(8);
    } >RAM

    /* Remove information from the compiler libraries */
    /DISCARD/ :
    {
        libc.a ( * )
        libm.a ( * )
        libgcc.a ( * )
    }

    .ARM.attributes 0 : { *(.ARM.attributes) }
}
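
The ._user_heap_stack section does not place any data. It only advances the location counter by _Min_Heap_Size + _Min_Stack_Size, so the linker reports a RAM overflow when .data, .bss, heap and stack cannot all fit. The same arithmetic can be sketched on a host as follows; the function names are hypothetical and the constants are taken from linker.ld.

```c
#include <stdbool.h>
#include <stdint.h>

#define RAM_LENGTH      (128u * 1024u)  /* LENGTH(RAM)     */
#define MIN_HEAP_SIZE   0x200u          /* _Min_Heap_Size  */
#define MIN_STACK_SIZE  0x400u          /* _Min_Stack_Size */

/* Round x up to the next multiple of 8, like ". = ALIGN(8);". */
uint32_t align8(uint32_t x) {
    return (x + 7u) & ~UINT32_C(7);
}

/* RAM offset at which ._user_heap_stack ends, given the number of
 * bytes that .data and .bss occupy together. */
uint32_t heap_stack_end(uint32_t data_bss_bytes) {
    uint32_t end = align8(data_bss_bytes);   /* end = .; */
    return align8(end + MIN_HEAP_SIZE + MIN_STACK_SIZE);
}

/* The link can succeed only if the reserved block fits in RAM. */
bool fits_in_ram(uint32_t data_bss_bytes) {
    return heap_stack_end(data_bss_bytes) <= RAM_LENGTH;
}
```

With the 8 bytes of .data and .bss from the earlier symbol table, heap_stack_end(8) is 0x608, far below the 128 KB limit.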

Update startup code#

startup.c
#include <stdint.h>

extern uint32_t _sdata;
extern uint32_t _edata;
extern uint32_t _lddata;
extern uint32_t _sbss;
extern uint32_t _ebss;

extern int main(void);                // defined in main.c, returns int
extern void __libc_init_array(void);  // provided by newlib, runs init functions

void Reset_Handler(void) {
    // copy .data section from flash to ram
    uint32_t size = (uint32_t)&_edata - (uint32_t)&_sdata;
    uint8_t *pRAM = (uint8_t*)&_sdata;
    uint8_t *pFlash = (uint8_t*)&_lddata;

    for(int i=0; i<size; i++) {
        pRAM[i] = pFlash[i];
    }

    // initialize .bss section
    size = (uint32_t)&_ebss - (uint32_t)&_sbss;
    pRAM = (uint8_t*)&_sbss;

    for(int i=0; i<size; i++) {
        pRAM[i] = 0;
    }

    // init libc
    __libc_init_array();

    // call to main
    main();
}
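
The startup code calls __libc_init_array() without showing what it does. In newlib this routine walks tables of function pointers that the linker collects (for example from .init_array) and calls each entry before main() runs. The sketch below imitates that loop on a host with a hand-written table. It is not newlib's code, and every name in it is hypothetical.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

#define MAX_LOG 4
int init_log[MAX_LOG];   /* ids of the hooks, in the order they ran */
int init_count;

/* Append an id to the log so the call order can be inspected. */
void record(int id) {
    if (init_count < MAX_LOG)
        init_log[init_count++] = id;
}

void init_first_demo(void)  { record(1); }
void init_second_demo(void) { record(2); }

/* Stand-in for the table built by the linker. */
init_fn demo_init_array[] = { init_first_demo, init_second_demo };

/* Call every entry from start up to, but not including, end. */
void run_init_array(init_fn *start, init_fn *end) {
    for (init_fn *fn = start; fn != end; fn++) {
        (*fn)();
    }
}
```

Calling run_init_array(demo_init_array, demo_init_array + 2) runs both hooks in table order. This is why the call has to come after .data and .bss are ready: the init functions may already use global variables.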

Run OpenOCD with Semihosting#

Compile with Semihosting:

  • Use --specs=rdimon.specs
  • Use -DUSE_SEMIHOSTING
  • Use -lrdimon
arm-none-eabi-gcc \
    -mcpu=cortex-m4 -mthumb -mfloat-abi=soft \
    -std=gnu11 \
    --specs=rdimon.specs \
    -Wall \
    -DUSE_SEMIHOSTING \
    -T linker.ld -Wl,-Map=main.elf.map -lrdimon \
    main.c delay.c vector.c startup.c \
    -o main-semi.elf

Run OpenOCD, connect with Telnet, and run the commands below:

arm semihosting enable
halt
flash write_image erase main-semi.elf
reset halt
resume

Use Semihosting with OpenOCD
