

Introduction to ARM Cortex-M & STM32 MCUs

The ARM Cortex-M is a group of 32-bit RISC ARM processor cores optimized for low-cost and energy-efficient integrated circuits. This post gives an overview of registers, the memory map, interrupts, clock sources and the Cortex Microcontroller Software Interface Standard (CMSIS) library. It also briefly compares the STM32 MCU product lines.

Last update: 2022-05-16

ARM Cortex-M processors#

The ARM (Advanced RISC Machines) processors use Reduced Instruction Set Computing (RISC) architectures, which have gone through many revisions and profiles (ARMv6, ARMv6-M, ARMv7, ARMv7-A, etc.).

ARM Cortex is a wide family of 32/64-bit processor cores, each based on one of the ARM architecture revisions. For example, the Cortex-M4 core implements the ARMv7-M architecture.

ARM Cortex processors are divided into three main subfamilies:

  • Cortex-A, which stands for Application
  • Cortex-R, which stands for Real-Time
  • Cortex-M, which stands for Microcontroller (embedded)

Operational Modes#

The processor has two operating modes:

Thread mode (default)
  • Used to execute application software
  • The processor enters Thread mode when it comes out of reset
  • Can run at the privileged or unprivileged access level, selected by the nPRIV bit of the CONTROL register
Handler mode
  • Used to handle exceptions. The Interrupt Program Status register IPSR contains the exception type number of the current Interrupt Service Routine (ISR)
  • The processor returns to Thread mode when it has finished exception processing
  • Always in privileged access level

Access Levels#

Unprivileged
  • Has limited access to the MSR and MRS instructions, and cannot use the CPS instruction.
  • Cannot access the system timer, NVIC, or system control block.
  • Might have restricted access to memory or peripherals.
  • Must use the SVC instruction to make a supervisor call to transfer control to privileged software.
Privileged (default)
  • Can use all the instructions and has access to all resources.
  • Can write to the CONTROL register to change the privilege level for software execution
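The privilege rules above can be sketched as a small host-side C function that takes raw IPSR and CONTROL values. This is only an illustration: on a real target the values would be read with the CMSIS-Core intrinsics `__get_IPSR()` and `__get_CONTROL()`, and the helper name `is_privileged` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define IPSR_EXCEPTION_MASK 0x1FFu    /* IPSR[8:0]: current exception number, 0 in Thread mode */
#define CONTROL_NPRIV       (1u << 0) /* CONTROL[0]: 1 = Thread mode runs unprivileged */

/* Hypothetical helper: true if code running with these register values is privileged. */
bool is_privileged(uint32_t ipsr, uint32_t control)
{
    if ((ipsr & IPSR_EXCEPTION_MASK) != 0u) {
        return true;                        /* Handler mode is always privileged */
    }
    return (control & CONTROL_NPRIV) == 0u; /* Thread mode: decided by nPRIV */
}
```

Note that nPRIV has no effect in Handler mode, which is why an unprivileged task can only reach privileged code through an exception such as SVC.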


The processor implements two stacks, the main stack and the process stack, with independent copies of the stack pointer.

The Stack Pointer (SP) is register R13. In Thread mode, bit[1] of the CONTROL register (SPSEL) selects the stack pointer to use:

  • 0: Main Stack Pointer (MSP). This is the reset value.
  • 1: Process Stack Pointer (PSP).

On reset, the processor loads the MSP with the value from address 0x00000000 and then starts executing from the reset vector stored at address 0x00000004.
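The reset step can be modelled on a host by reading the first two words of the vector table: word 0 becomes the initial MSP and word 1 is the reset handler address, whose bit 0 must be set for Thumb state. The sketch below, with the hypothetical names `reset_state` and `simulate_reset`, only mimics what the hardware does; the table values in the usage note are made up.

```c
#include <stdint.h>

/* What the core takes from the vector table at reset (hypothetical model). */
typedef struct {
    uint32_t msp;      /* initial Main Stack Pointer from word 0 */
    uint32_t reset_pc; /* first instruction address from word 1, bit 0 cleared */
    int thumb;         /* bit 0 of word 1, copied into EPSR.T */
} reset_state;

reset_state simulate_reset(const uint32_t *vector_table)
{
    reset_state s;
    s.msp = vector_table[0] & ~3u;       /* SP is word-aligned: bits[1:0] read as zero */
    s.reset_pc = vector_table[1] & ~1u;  /* instructions are fetched from an even address */
    s.thumb = (int)(vector_table[1] & 1u);
    return s;
}
```

For example, a table starting with `0x20020000, 0x08000199` yields MSP = 0x20020000 and a first instruction fetched from 0x08000198 in Thumb state.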

Handler mode always uses the MSP, so the processor ignores explicit writes to the active stack pointer bit of the CONTROL register when in Handler mode. The exception entry and return mechanisms update the CONTROL register.

In an OS environment, it is recommended that threads running in Thread mode use the process stack, and the kernel and exception handlers use the main stack.
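A hedged sketch of the selection rule: given raw IPSR and CONTROL values, the hypothetical helper below reports which stack pointer R13 currently refers to. An RTOS typically sets SPSEL (for example with the CMSIS-Core intrinsic `__set_CONTROL()`) so that tasks run on the PSP while handlers stay on the MSP.

```c
#include <stdint.h>

#define IPSR_EXCEPTION_MASK 0x1FFu    /* IPSR[8:0]: exception number, 0 in Thread mode */
#define CONTROL_SPSEL       (1u << 1) /* CONTROL[1]: stack select, only used in Thread mode */

typedef enum { STACK_MSP, STACK_PSP } stack_sel;

/* Hypothetical helper: which physical stack pointer R13 maps to. */
stack_sel active_stack(uint32_t ipsr, uint32_t control)
{
    if ((ipsr & IPSR_EXCEPTION_MASK) != 0u) {
        return STACK_MSP; /* Handler mode always uses the MSP, SPSEL is ignored */
    }
    return (control & CONTROL_SPSEL) != 0u ? STACK_PSP : STACK_MSP;
}
```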

Core Registers#

Like other RISC architectures, Cortex-M processors are load/store machines: data-processing instructions operate only on CPU registers, and only two categories of instructions, load and store, transfer data between CPU registers and memory locations.

Processor register set on ARM Cortex-M Microprocessor

  • R0 to R12 are general-purpose registers and can be used as operands for ARM instructions. Some of them, however, are given special roles by the compiler's calling convention (for example, R0 to R3 carry function arguments).

  • R13 is the Stack Pointer (SP) register, which is also said to be banked: there are two physical stack pointers, the Main Stack Pointer (MSP) and the Process Stack Pointer (PSP), and which one R13 refers to depends on the processor mode and the SPSEL bit of the CONTROL register. Real-Time Operating Systems (RTOS) typically rely on this for context switching.

  • R14 is the Link Register (LR), a special-purpose register that holds the address to return to when a function call completes. This is more efficient than the traditional scheme of storing return addresses on a call stack (sometimes called a machine stack): a call to a leaf function (one that calls no other function) does not need to write and read the return address in stack memory, which can save a considerable percentage of execution time when small subroutines are called repeatedly.

  • R15 is the Program Counter (PC) register, which has the address of the next instruction to be executed from memory. Usually, the PC is incremented after fetching an instruction. However, control transfer instructions can change the sequence by placing a new value in the PC register.

    In a debugger, the PC register shows the address of the instruction that will be executed in the next step, i.e. the instruction in the execute stage. The actual PC value is the address of the instruction in the fetch stage (two 16-bit instructions ahead: in Thumb state, an instruction that reads PC gets its own address + 4). Read more in Load instruction example.

  • Program status register (PSR) combines Application Program Status Register (APSR), Interrupt Program Status Register (IPSR), Execution Program Status Register (EPSR)

  • PRIMASK is the Priority Mask register which prevents the activation of all exceptions with configurable priority

  • FAULTMASK is the Fault Mask register which prevents activation of all exceptions except for Non-Maskable Interrupt (NMI)

  • CONTROL is the register that controls the stack used and the privilege level for software execution when the processor is in Thread mode and indicates whether the FPU state is active
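To make the "PC is ahead" remark concrete: in Thumb state, an instruction that reads PC sees its own address plus 4, and a PC-relative load (`LDR Rt, [PC, #imm]`, the literal form) uses that value rounded down to a word boundary. The sketch below computes both; the function names are hypothetical and the addresses in the usage note are made up.

```c
#include <stdint.h>

/* Value an instruction located at instr_addr observes when it reads PC (Thumb state). */
uint32_t thumb_pc_read_value(uint32_t instr_addr)
{
    return instr_addr + 4u;
}

/* Address accessed by LDR (literal): Align(PC, 4) + imm. */
uint32_t ldr_literal_address(uint32_t instr_addr, uint32_t imm)
{
    return (thumb_pc_read_value(instr_addr) & ~3u) + imm;
}
```

Because of the word alignment, 16-bit loads at 0x08000100 and 0x08000102 with the same offset of 8 both read the literal at 0x0800010C.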

Memory Map#

ARM defines a standardized memory address space common to all Cortex-M cores, which ensures code portability among different silicon manufacturers. The address space is 4 GB wide (a 32-bit address covers 2^32 bytes), and it is organized in several subregions with different logical functionalities.

Fixed memory map for ARM cores

The first 512 MB are dedicated to the code area:

  • All Cortex-M processors map the code area starting at address 0x00000000. This area also includes the pointer to the beginning of the stack (usually placed in SRAM) and the system interrupt vector table.

  • An area starting at address 0x08000000 is bound to the internal MCU flash memory, and it is the area where program code resides. With a specific boot configuration, this area is also aliased from address 0x00000000. This means that it is perfectly possible to refer to the content of the flash memory both starting at address 0x08000000 and 0x00000000.

  • System Memory is a ROM region containing a bootloader pre-programmed by ST, which can be used to load code through several peripherals, including USART, USB and CAN.

  • Option Bytes region contains a series of bit flags which can be used to configure several aspects of the MCU (such as flash read protection, hardware watchdog, boot mode and so on) and are related to a specific microcontroller.

The next 512 MB are mapped to internal SRAM:

  • It starts at address 0x20000000 and can potentially extend to 0x3FFFFFFF.

  • This area can also be aliased to the start-up address 0x00000000 to execute code from internal RAM.

The remaining space is for peripherals and other regions:

  • Other memory regions are mapped to external RAM, peripherals and the internal core registers. All Cortex processor registers are at fixed locations for all Cortex-based microcontrollers. This allows code to be more easily ported between different core variants and indeed other vendors’ Cortex-based microcontrollers.
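To make the fixed map concrete, the sketch below classifies a 32-bit address into the architectural regions. The boundaries follow the ARMv7-M memory map; the enum and function names are hypothetical, and each vendor populates only part of every region.

```c
#include <stdint.h>

/* Regions of the fixed ARMv7-M memory map (names are illustrative). */
typedef enum {
    REGION_CODE,          /* 0x00000000 - 0x1FFFFFFF: flash, system memory, boot aliases */
    REGION_SRAM,          /* 0x20000000 - 0x3FFFFFFF: on-chip SRAM */
    REGION_PERIPHERAL,    /* 0x40000000 - 0x5FFFFFFF: on-chip peripherals */
    REGION_EXTERNAL_RAM,  /* 0x60000000 - 0x9FFFFFFF */
    REGION_EXTERNAL_DEV,  /* 0xA0000000 - 0xDFFFFFFF */
    REGION_PPB,           /* 0xE0000000 - 0xE00FFFFF: NVIC, SysTick, SCB, debug */
    REGION_VENDOR_SYSTEM  /* 0xE0100000 - 0xFFFFFFFF */
} cm_region;

cm_region cortexm_region(uint32_t addr)
{
    if (addr < 0x20000000u) return REGION_CODE;
    if (addr < 0x40000000u) return REGION_SRAM;
    if (addr < 0x60000000u) return REGION_PERIPHERAL;
    if (addr < 0xA0000000u) return REGION_EXTERNAL_RAM;
    if (addr < 0xE0000000u) return REGION_EXTERNAL_DEV;
    if (addr < 0xE0100000u) return REGION_PPB;
    return REGION_VENDOR_SYSTEM;
}
```

For instance, the SysTick registers at 0xE000E010 fall in the Private Peripheral Bus, which is why their address is the same on every Cortex-M3/M4 device.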

Memory Map for Code Area


In embedded applications, it is quite common to work with a single bit of a word using bit-masking. For example:

uint8_t flags = 0;
flags |= 0x4; // set bit 2 (mask 0x4)

compiles (without optimization) to the following assembly code:

0x0a: 79fb ldrb r3, [r7, #7]
0x0c: f043 0304 orr.w r3, r3, #4
0x10: 71fb strb r3, [r7, #7]

Such a simple operation requires three assembly instructions (load, modify, store). If an interrupt occurs between the load and the store and its handler modifies the same variable, that change is lost when the stale value is written back.
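The race can be reproduced deterministically on a host. The sketch below splits `flags |= 0x4` into the same three steps as the assembly and calls a stand-in "ISR" between the load and the store; `fake_isr` and `set_bit2_interrupted` are hypothetical names used only for this simulation.

```c
#include <stdint.h>

static uint8_t flags;

/* Stand-in for an interrupt handler that sets bit 0 of the same variable. */
static void fake_isr(void)
{
    flags |= 0x1;
}

/* Performs `flags |= 0x4` as load / modify / store with an "interrupt" in between. */
uint8_t set_bit2_interrupted(void)
{
    flags = 0;
    uint8_t r3 = flags; /* ldrb: load the current value into a register */
    fake_isr();         /* the interrupt fires here and updates flags to 0x1 */
    r3 |= 0x4;          /* orr: modify the now-stale copy */
    flags = r3;         /* strb: store back, overwriting the ISR's update */
    return flags;       /* 0x4 instead of the expected 0x5 */
}
```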

Bit-banding maps each bit of a given memory area to a whole word in an aliased bit-band memory region, so a single bit can be read or written atomically with one load or store instruction.

Memory Map of an address in a bit-banding region

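ARM defines two bit-band regions for Cortex-M3 and Cortex-M4 based MCUs (bit-banding is an optional feature of these cores and is not available on Cortex-M0/M0+/M7). Each one is 1 MB wide and mapped to a 32 MB bit-band alias region, because each of its 8 Mbit becomes a 4-byte word.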

  • The first one starts at 0x20000000 and ends at 0x200FFFFF, and it is aliased from 0x22000000 to 0x23FFFFFF. It is dedicated to the bit access of SRAM memory locations.

  • The second bit-band region starts at 0x40000000 and ends at 0x400FFFFF. It covers peripheral registers and is aliased from 0x42000000 to 0x43FFFFFF.

The following two C macros compute bit-band alias addresses:

#include <stdint.h>

// Define base address of bit-band
#define BITBAND_SRAM_BASE 0x20000000u
// Define base address of alias band
#define ALIAS_SRAM_BASE 0x22000000u
// Convert the address of SRAM variable a, bit b, to its alias address
#define BITBAND_SRAM(a,b) (ALIAS_SRAM_BASE + ((uint32_t)&(a) - BITBAND_SRAM_BASE) * 32u + (uint32_t)(b) * 4u)

// Define base address of peripheral bit-band
#define BITBAND_PERI_BASE 0x40000000u
// Define base address of peripheral alias band
#define ALIAS_PERI_BASE 0x42000000u
// Convert peripheral register address a, bit b, to its alias address
#define BITBAND_PERI(a,b) (ALIAS_PERI_BASE + ((uint32_t)(a) - BITBAND_PERI_BASE) * 32u + (uint32_t)(b) * 4u)

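The following example sets pin 5 of the GPIOA port (PA5) through its bit-band alias. The GPIOA base address 0x40020000 and the ODR offset 0x14 are those of STM32F4 devices:

#define GPIOA_PERH_ADDR 0x40020000u
#define ODR_ADDR_OFF 0x14u
// Alias word for bit 5 of GPIOA->ODR (address 0x42400294)
#define GPIOA_PIN5 ((volatile uint32_t *)BITBAND_PERI(GPIOA_PERH_ADDR + ODR_ADDR_OFF, 5))

*GPIOA_PIN5 = 0x1; // Turns GPIO HIGH (writing 0x0 turns it LOW)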

Memory Map for Bit-banding Area
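Because bit-band addresses are easy to get wrong, a host-side version of the same formula (alias = alias base + byte offset × 32 + bit × 4) is handy for checking them. The functions below are a hypothetical sketch, including the inverse mapping from an alias address back to the byte and bit it controls.

```c
#include <stdint.h>

/* alias = alias_base + (byte_addr - region_base) * 32 + bit * 4 */
uint32_t bitband_alias(uint32_t region_base, uint32_t alias_base,
                       uint32_t byte_addr, unsigned bit)
{
    return alias_base + (byte_addr - region_base) * 32u + (uint32_t)bit * 4u;
}

/* Inverse mapping: each byte owns 32 alias bytes (8 bits x 4-byte words),
   so the quotient gives the byte and the remainder / 4 gives the bit (0..7). */
void bitband_target(uint32_t region_base, uint32_t alias_base, uint32_t alias_addr,
                    uint32_t *byte_addr, unsigned *bit)
{
    uint32_t offset = alias_addr - alias_base;
    *byte_addr = region_base + offset / 32u;
    *bit = (unsigned)((offset % 32u) / 4u);
}
```

For GPIOA->ODR at 0x40020014 on an STM32F4, bit 5 maps to alias 0x42400294, and bit 3 of the SRAM word at 0x20000004 maps to 0x2200008C.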

Thumb Instruction Set#

Classic ARM processors execute the 32-bit ARM instruction set, which offers a rich set of instructions and high performance but makes the firmware larger. To reduce the memory footprint, ARM introduced the 16-bit Thumb instruction set, whose instructions are transparently expanded to 32-bit ARM instructions in hardware with little performance loss. Later, ARM introduced the Thumb-2 instruction set, which mixes 16-bit and 32-bit instructions in one operating state. Cortex-M processors execute only Thumb/Thumb-2 instructions.

The T bit of the EPSR register

The Execution Program Status Register (EPSR) has a T bit that indicates Thumb state. On ARM processors that support both instruction sets:

If the T bit is 1, the next instruction is decoded as Thumb.
If the T bit is 0, the next instruction is decoded as ARM.

Cortex-M processors, including the Cortex-M4, support only Thumb state, so the T bit must always be 1; executing an instruction with T = 0 causes a UsageFault (invalid state).

Bit 0 (the LSB) of an address written to the PC by a BX/BLX branch or a load into the PC is copied into the T bit. Therefore, every such branch target address must be odd. The compiler usually takes care of this, but if you call a function through a hard-coded address, you must set the LSB yourself.

void myfunc(void) {
  __asm volatile("nop");
}

int main(void) {
    // myfunc is located at 0x080001d8 in this build, but the compiler assigns the value 0x080001d9
    void (*pfunc_by_name)(void) = myfunc;

    // be careful when loading an address manually:
    // using 0x080001d8 will cause a UsageFault exception (invalid state)
    void (*pfunc_by_addr)(void) = (void (*)(void))0x080001d9;

    pfunc_by_name();
    pfunc_by_addr(); // calls myfunc in Thumb state
    while (1) {}
}

Compiler changes the address of a function to maintain the T bit
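The Thumb-bit rules can be captured in a few host-side helpers. The names are hypothetical and the address in the usage note is the one from the example above; `EPSR_T` marks bit 24 of xPSR, where the T bit lives.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPSR_T (1u << 24) /* T bit position inside xPSR */

/* Value to load into a function pointer for code located at code_addr. */
uint32_t thumb_call_address(uint32_t code_addr)
{
    return code_addr | 1u;
}

/* On Cortex-M a branch target is only valid with bit 0 set; otherwise EPSR.T
   becomes 0 and the next instruction faults (UsageFault INVSTATE on M3/M4). */
bool is_valid_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0u;
}

/* True if an xPSR value (e.g. from a stacked exception frame) has T set. */
bool xpsr_thumb_state(uint32_t xpsr)
{
    return (xpsr & EPSR_T) != 0u;
}
```

A stacked xPSR with the T bit cleared is a common clue when debugging an INVSTATE fault caused by calling an even address.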

Instruction Pipeline#

Before an instruction is executed, the CPU has to fetch it from memory and decode it, so completing an instruction takes three stages: fetch, decode and execute. Pipelined CPUs overlap these stages for consecutive instructions to increase instruction throughput: the basic instruction cycle is broken up into a series of steps, as if the instructions traveled along a pipeline.

3-stage instruction pipeline

When dealing with pipelines, branching is an issue to be addressed. A taken branch invalidates the pipeline: the two instructions that have already been fetched and decoded after the branch are discarded, and the pipeline is refilled from the branch target.
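The cost of a flush can be estimated with an idealized model of the 3-stage pipeline. The assumptions are loud here: one cycle per stage, no memory wait states, and every taken branch discarding exactly the two younger instructions. Real Cortex-M cores add branch speculation and flash wait states, so this hypothetical `pipeline_cycles` function only illustrates the arithmetic.

```c
/* Idealized 3-stage pipeline: cycles needed to execute n_instructions,
   n_taken_branches of which are taken branches. */
unsigned pipeline_cycles(unsigned n_instructions, unsigned n_taken_branches)
{
    if (n_instructions == 0u) {
        return 0u;
    }
    /* The first instruction needs 3 cycles (fetch, decode, execute);
       each following one completes 1 cycle after its predecessor. */
    unsigned cycles = n_instructions + 2u;
    /* Each taken branch discards the fetched and decoded instructions,
       leaving 2 empty cycles while the pipeline refills. */
    cycles += 2u * n_taken_branches;
    return cycles;
}
```

Under this model, 10 straight-line instructions take 12 cycles, and one taken branch among them raises that to 14.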

Memory Alignment#

Aligned and Unaligned memory access

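ARM based CPUs can access byte (8-bit), half-word (16-bit) and word (32-bit) signed and unsigned variables with a single load or store instruction, unlike 8-bit MCU architectures, which read memory byte by byte. An access is aligned when its address is a multiple of the access size. Keeping data aligned can waste memory, because the compiler inserts padding bytes between variables or structure members. Unaligned accesses avoid that waste but cost performance: Cortex-M3/M4 split them into several bus transfers (and allow them only for single loads and stores), while Cortex-M0/M0+ do not support them and raise a HardFault.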
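The alignment rule and the padding it causes can be checked on a host. In the sketch below, `is_aligned` and `struct sample` are hypothetical; the padding shown is what typical ABIs, including the ARM AAPCS, produce for a 4-byte-aligned `uint32_t`, but the C standard itself does not fix the layout.

```c
#include <stdbool.h>
#include <stddef.h> /* offsetof, for inspecting the struct layout */
#include <stdint.h>

/* An access of `size` bytes (a power of two: 1, 2 or 4) is aligned
   when addr is a multiple of size. */
bool is_aligned(uint32_t addr, uint32_t size)
{
    return (addr & (size - 1u)) == 0u;
}

/* The compiler pads this struct so that `value` is word-aligned:
   3 bytes are wasted after `tag`, making the struct 8 bytes instead of 5. */
struct sample {
    uint8_t  tag;
    uint32_t value;
};
```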

Interrupts and Exceptions#

Interrupts and exceptions are events that alter the program flow. When an exception or an interrupt occurs, the CPU suspends the execution of the current task, saves its context (on Cortex-M, the hardware automatically pushes R0 to R3, R12, LR, PC and xPSR onto the current stack) and starts executing a routine designed to handle the event. This routine is called an Exception Handler for exceptions and an Interrupt Service Routine (ISR) for interrupts. After the exception or interrupt has been handled, the CPU restores the context and resumes the previous execution flow, and the previous task can continue its execution. In the ARM architecture, interrupts are one type of exception.

  • Interrupts are usually generated from on-chip peripherals (e.g., a timer) or external inputs (e.g. a tactile switch connected to a GPIO), and in some cases they can be triggered by software.

  • Exceptions are, instead, related to software execution, and the CPU itself can be a source of exceptions.

Each exception (and hence each interrupt) has a number that uniquely identifies it. Cortex-M cores have a predefined exception (vector) table that contains the addresses of the functions handling those exceptions.

Number | Exception Type | Priority | Function
1 | Reset | -3 | Reset
2 | NMI | -2 | Non-Maskable Interrupt
3 | Hard Fault | -1 | Faults not handled by any other handler (escalated faults)
4 | Memory Management Fault | Configurable | Memory protection (MPU) violation
5 | Bus Fault | Configurable | Bus error on instruction fetch or data access
6 | Usage Fault | Configurable | Program error (undefined instruction, invalid state, etc.)
7 ~ 10 | Reserved | | Reserved
11 | SVCall | Configurable | System service call (SVC instruction)
12 | Debug Monitor | Configurable | Debug monitor
13 | Reserved | | Reserved
14 | PendSV | Configurable | Pendable request for system service
15 | SysTick | Configurable | System Timer
16 and above | IRQ | Configurable | External interrupt requests (IRQ0, IRQ1, ...)
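Two numbering facts follow from this table: each vector table entry is a 4-byte address, so exception n sits at offset 4 × n from the table base (entry 0 holds the initial MSP), and CMSIS numbers interrupts as `IRQn = exception number - 16`, which gives system exceptions negative values such as `SysTick_IRQn = -1`. The exception number is also the value IPSR holds in Handler mode. The two helpers below are a hypothetical sketch of that arithmetic.

```c
#include <stdint.h>

/* CMSIS-style IRQ number: negative for system exceptions, 0 for IRQ0. */
int32_t exception_to_irqn(uint32_t exception_number)
{
    return (int32_t)exception_number - 16;
}

/* Byte offset of an exception's entry from the start of the vector table. */
uint32_t vector_offset(uint32_t exception_number)
{
    return exception_number * 4u;
}
```

For example, the SysTick handler address is stored at offset 0x3C and the IRQ0 handler at offset 0x40.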

System Timer#

Cortex-M based processors can optionally provide a System Timer, also known as SysTick, which is a 24-bit down-counting timer used to provide a system tick for Real-Time Operating Systems (RTOS). It generates periodic interrupts to schedule tasks or to measure delays. When the counter reaches zero, it reloads from its reload register and fires exception number 15, as seen in the exception table above.
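Because the counter runs from the reload value down to 0, one tick lasts reload + 1 clock cycles, and the reload value must fit in 24 bits. The hypothetical helper below performs that calculation and range check on a host; on a target, the CMSIS-Core function `SysTick_Config(SystemCoreClock / 1000)` does the equivalent check and returns a non-zero value when the period does not fit.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu /* 24-bit reload register */

/* Reload value for a tick rate of tick_hz with a core clock of core_hz.
   Returns 0 on success, -1 if the period cannot be represented. */
int systick_reload(uint32_t core_hz, uint32_t tick_hz, uint32_t *reload)
{
    if (tick_hz == 0u || core_hz < tick_hz) {
        return -1;                    /* less than one clock cycle per tick */
    }
    uint32_t ticks = core_hz / tick_hz; /* clock cycles per tick */
    if (ticks - 1u > SYSTICK_MAX_RELOAD) {
        return -1;                    /* tick too slow for a 24-bit counter */
    }
    *reload = ticks - 1u;             /* counter spans reload..0 = ticks cycles */
    return 0;
}
```

For a 1 ms tick at 72 MHz the reload value is 71999, while a 1 s tick at the same clock (72 million cycles) exceeds the 24-bit limit.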

Power Mode#

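Cortex-M processors provide several levels of power management, configured via the System Control Register (SCR); the low-power modes are entered by executing the WFI (wait for interrupt) or WFE (wait for event) instruction.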

  • Run mode: the core runs at full clock speed and all peripherals in use are active
  • Sleep mode: the core clock is stopped while peripherals keep running; an interrupt wakes the core up
  • Deep sleep mode: most clocks are stopped; an external event or wake-up interrupt is needed to wake up
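Selecting between sleep and deep sleep is a single bit, SLEEPDEEP (bit 2), in the SCR; the low-power state starts when the core executes WFI or WFE. The host-side sketch below only computes the new SCR value, and the helper name is hypothetical. On a target with CMSIS-Core, the equivalent would be `SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;` followed by `__WFI();`.

```c
#include <stdbool.h>
#include <stdint.h>

#define SCR_SLEEPONEXIT (1u << 1) /* re-enter sleep when returning from the last ISR */
#define SCR_SLEEPDEEP   (1u << 2) /* 0 = sleep, 1 = deep sleep on WFI/WFE */

/* Return the SCR value selecting sleep or deep sleep, preserving other bits. */
uint32_t scr_select_sleep(uint32_t scr, bool deep)
{
    return deep ? (scr | SCR_SLEEPDEEP) : (scr & ~SCR_SLEEPDEEP);
}
```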

CMSIS for SW development#

Cortex Microcontroller Software Interface Standard (CMSIS) is a vendor-independent hardware abstraction layer for the Cortex-M processor series and specifies debugger interfaces. The CMSIS consists of the following components:

  • CMSIS-CORE: API for the Cortex-M processor core and peripherals
  • CMSIS-Driver: defines generic peripheral driver interfaces for middleware making them reusable across supported devices
  • CMSIS-DSP: library of signal processing functions for fixed-point and single-precision floating-point data
  • CMSIS-RTOS API: Common API for Real-Time Operating Systems
  • CMSIS-Pack: a packaging format for software components, including source, header and library files, documentation, flash programming algorithms, source code templates and example projects
  • CMSIS-SVD: System View Description for Peripherals
  • CMSIS-DAP: Debug Access Port

Cortex-M comparison#

The following table is excerpted from the ARM website.

Feature | Cortex-M0 | Cortex-M0+ | Cortex-M3 | Cortex-M4 | Cortex-M33 | Cortex-M7
ISA | Armv6-M | Armv6-M | Armv7-M | Armv7-M | Armv8-M Mainline | Armv7-M
Instruction set | Thumb, Thumb-2 (all cores)
Pipeline stages | 3 | 2 | 3 | 3 | 3 | 6
Memory Protection Unit | No | Yes | Yes | Yes | Yes | Yes
Maximum MPU regions | 0 | 8 | 8 | 8 | 16 | 16
Trace (ETM or MTB) | No | MTB | ETMv3 | ETMv3 | MTB and/or ETMv4 | ETMv4
DSP | No | No | No | Yes | Yes | Yes
Floating point hardware | No | No | No | Yes | Yes | Yes
Bus protocol | AHB Lite | AHB Lite | AHB Lite, APB | AHB Lite, APB | AHB5 | AXI4, AHB Lite, APB, TCM
Maximum # external interrupts | 32 | 32 | 240 | 240 | 480 | 240
CMSIS Support | Yes (all cores)

STM32 Microcontrollers#

STM32 is a broad range of ARM Cortex-M microcontrollers divided into several families, grouped into high-performance, mainstream, ultra-low-power and wireless lines. Internally, each microcontroller consists of the processor core, static RAM, flash memory, a debugging interface, and various peripherals.

Here are advantages of using STM32 MCUs:

  • Cortex-M based MCUs have a large community, free toolchains, and many shared knowledge articles

  • Pin-to-pin compatibility across most STM32 MCUs makes it possible to change the MCU while keeping the pin assignments

  • Most I/O pins are 5 V tolerant, so they can interface with devices that do not use 3.3 V without a level shifter

  • STM32 MCUs are low-cost while offering ARM-based cores and RTOS support

  • An integrated bootloader is shipped in internal ROM, which allows reprogramming the internal flash memory through communication peripherals


STM32 MCUs comparison

Type | Family | Core | Max Frequency | Flash
High Performance | STM32H7 | Cortex-M7 / Cortex-M4 | 480 MHz / 240 MHz | 1 to 2 MB
High Performance | STM32F7 | Cortex-M7 | 216 MHz | 256 KB to 2 MB
High Performance | STM32F4 | Cortex-M4 | 180 MHz | 64 KB to 2 MB
High Performance | STM32F2 | Cortex-M3 | 120 MHz | 128 KB to 1 MB
Mainstream | STM32G4 | Cortex-M4 | 170 MHz | 32 to 512 KB
Mainstream | STM32F3 | Cortex-M4 | 72 MHz | 16 to 512 KB
Mainstream | STM32F1 | Cortex-M3 | 72 MHz | 16 KB to 1 MB
Mainstream | STM32G0 | Cortex-M0+ | 64 MHz | 16 to 512 KB
Mainstream | STM32F0 | Cortex-M0 | 48 MHz | 16 to 256 KB
Ultra-low-power | STM32L5 | Cortex-M33 | 110 MHz | 256 to 512 KB
Ultra-low-power | STM32L4+ | Cortex-M4 | 120 MHz | 512 KB to 2 MB
Ultra-low-power | STM32L4 | Cortex-M4 | 80 MHz | 64 KB to 1 MB
Ultra-low-power | STM32L1 | Cortex-M3 | 32 MHz | 32 to 512 KB
Ultra-low-power | STM32L0 | Cortex-M0+ | 32 MHz | 8 to 192 KB
Wireless | STM32WB | Cortex-M4 / Cortex-M0+ | 64 MHz / 32 MHz | 256 KB to 1 MB
Wireless | STM32WL | Cortex-M4 | 48 MHz | 64 KB to 256 KB