Skip to content



STM32 »

A simple implementation of a Task Scheduler

Multitasking is the ability to execute more than one task in a fashion that is seemingly simultaneous. A scheduler is a part of operating system that is responsible for selecting, switching from a task to another one. This guide presents a simple way to implement a Round-Robin task scheduler to handle multiple tasks without priority.

Last update: 2022-06-29


STM32-Tutorials F411RE_Task_Scheduler.zip F411RE_Task_Scheduler_PendSV.zip

Task#

A Task is a piece of code, or a function, that does a specific job when it is allowed to run.

A Task has its own Stack to create its local variables. Its stack can be used to store addition information which is needed to save or restore the task from running state.

Usually, a task is an infinite loop which can repeatedly do multiple steps.

void task_main(void *param) {
    // init task
    ...
    // main loop
    while(1) {
        // do things over and over
    }
}

Scheduler#

Round-Robin scheduling method is the time slices assigned to each task are equal, and all tasks are done in a circular order.

The time slice is defined by a special timer - SysTick which is a free-run timer to produce a periodical interrupt.

In the SysTick Handler, we will to the context switching to move to a next task.

There are some others steps to initialize a scheduler:

Steps to run a Scheduler

Task Table#

Task Table is the place to store information of all tasks. Because each task should have its own stack to save its local variable, the task table must include the PSP values which point to the current stack pointer of each task.

The scheduler also needs to know which task is currently running to do save/ restore stack to the corresponding task.

Here is a minimal Task Table implementation, without using struct:

#define TASK_NUMBER_MAX   (16)
uint32_t __uCurrentTaskIdx = 0;
uint32_t __puTasksPSP[TASK_NUMBER_MAX] = {0};

The helper functions to access to the Task Table:

// return PSP value stored in slot at __uCurrentTaskIdx index 
uint32_t get_current_psp() {
  return __puTasksPSP[__uCurrentTaskIdx];
}

// save PSP value to the slot at __uCurrentTaskIdx index
void save_current_psp(uint32_t psp) {
  __puTasksPSP[__uCurrentTaskIdx] = psp;
}

Add a Task#

We need to set how much RAM is reserved for a Task Stack. Let say 1 KB. The RAM space for Tasks should not overlap the Main Stack because Handler mode always use the Main Stack.

#define RAM_START         (0x20000000u)
#define RAM_SIZE          (128 * 1024) // 128 KB

#define MAIN_STACK        (RAM_START + RAM_SIZE)
#define TASK_STACK_SIZE   (1024u)

For the i-th task, the beginning Stack address should be:

uint32_t* psp = (uint32_t*)(MAIN_STACK - (i+1)*TASK_STACK_SIZE);

Structure of Scheduler and Task Stack

Set Task Stack#

When adding a task, we initialize the PSP value of the new task at the start address of new 1 KB RAM. We reserve 1 KB for Main Stack, then every 1 KB for a new task. The Task Table will be filled a new slot when a new task is added.

void init_task(void (*handler)) {
  int i=0;

  // find an empty slot
  for(; i<TASK_NUMBER_MAX; i++) {
    if (__puTasksPSP[i] == 0) break;
  }

  if(i >= TASK_NUMBER_MAX) {
    printf("Can not register a new task anymore!\n");
    return;
  } else {
    printf("Register a task %p at slot %i\n", handler, i);
  }

  // calculate new PSP
  uint32_t* psp = (uint32_t*)(MAIN_STACK - (i+1)*TASK_STACK_SIZE);

  // fill stack frame
  ...

Fill Stack Frame#

When a task is switched in (load and run), the CPU will use PSP pointer to find below information:

  • Last CPU status → Load xPSR register
  • Next instruction to be executed → Load PC register
  • Return mode → Load LR register
  • Last execution status → Load all R0 to R12 registers

As we use the SysTick exception to trigger Context Switching, the below registers are automatically saved at exception:

  • xPSR, PC, LR, R12, R3 to R0, in fixed order

Therefore, we will prepare additional registers in the order R11 to R4.

What are the value of those registers???

  • xPSR = 0x01000000: no special status should be set at initial stage of a task, except the Thumb State bit must be set
  • PC = task’ handler function: should be the address of the task’s function
  • LR = 0xFFFFFFFD: when exception occurs, LR contains the EXC_RETURN value to control exception exit. This should make exception returns Thread mode, non-floating-point state from PSP
  • R12 to R0 = any dummy data: can use a pattern for debugging
void init_task(void (*handler)) {
    ...
    // fill dummy stack frame
    *(--psp) = 0x01000000u; // Dummy xPSR, just enable Thumb State bit;
    *(--psp) = (uint32_t) handler; // PC
    *(--psp) = 0xFFFFFFFDu; // LR with EXC_RETURN to return to Thread using PSP
    *(--psp) = 0x12121212u; // Dummy R12
    *(--psp) = 0x03030303u; // Dummy R3
    *(--psp) = 0x02020202u; // Dummy R2
    *(--psp) = 0x01010101u; // Dummy R1
    *(--psp) = 0x00000000u; // Dummy R0
    *(--psp) = 0x11111111u; // Dummy R11
    *(--psp) = 0x10101010u; // Dummy R10
    *(--psp) = 0x09090909u; // Dummy R9
    *(--psp) = 0x08080808u; // Dummy R8
    *(--psp) = 0x07070707u; // Dummy R7
    *(--psp) = 0x06060606u; // Dummy R6
    *(--psp) = 0x05050505u; // Dummy R5
    *(--psp) = 0x04040404u; // Dummy R4

    // save PSP
    __puTasksPSP[i] = (uint32_t)psp;

Start scheduling#

The scheduler has to do some extra steps before it can run the first task.

Enable PSP#

Task must use its own stack, therefore, we have to enable Process Stack mode. The initial PSP value should be the PSP of the first task. To access PSP and CONTROL register, we have to use inline assembly code.

void start_scheduler() {
  printf("Start Scheduler!\n");

  // start with the first task
  __uCurrentTaskIdx = 0;

  // prepare PSP of the first task
  __asm volatile("BL get_current_psp"); // return PSP in R0
  __asm volatile("MSR PSP, R0");  // set PSP

  // change to use PSP
  __asm volatile("MRS R0, CONTROL");
  __asm volatile("ORR R0, R0, #2"); // set bit[1] SPSEL
  __asm volatile("MSR CONTROL, R0");

  ...
}

Start SysTick#

We need to configure the SysTick to trigger every 1 ms. The application is still have Privileged Access level, therefore, we can start a timer at this point.

/* SysTick register address */
#define SCSR                  *((volatile uint32_t*) 0xE000E010u)
#define SRVR                  *((volatile uint32_t*) 0xE000E014u)

void start_scheduler() {
  ...
  // clear and set the period
  SRVR &= ~0xFFFFFFFF;
  SRVR |= 16000-1; // 1000 Hz ~ 1 ms
  // enable SysTick
  SCSR |= (1 << 1); // enable SysTick Exception request
  SCSR |= (1 << 2); // select system clock
  ...
}

Set Access Level (optional)#

This is an optional step to move the user task to Unprivileged Access level. If user tasks are set to Unprivileged, they are limited to access system memory, then you have to implement System Call through SVC exception.

void start_scheduler() {
  ...
  // Move to Unprivileged level
  __asm volatile("MRS R0, CONTROL");
  __asm volatile("ORR R0, R0, #1"); // Set bit[0] nPRIV
  __asm volatile("MSR CONTROL, R0");
  ...
}

Run the first task#

Finally, we run the first task. Note the PC value we have pushed to the task’s Stack Frame.

void start_scheduler() {
  ...
  // get the handler of the first task by tracing back from PSP which is at R4 slot
  void (*handler)() = (void (*))((uint32_t*)__puTasksPSP[__uCurrentTaskIdx])[14];

  // execute the handler
  handler();
}

The SysTick will be triggered to do Context Switching.

Context Switching#

The Context Switching is implemented in the SysTick_Handler() as it is triggered periodically.

Note 1:
The xPSR, PC, LR, R12, R3 to R0 registers are automatically saved to Task’s PSP stack when an exception occurs. They are also restored automatically when the exception exits.

Context Switching on Registers

Note 2:

The processor use Fill Descending Stack, therefore, when pushing registers to stack, the PSP is decreased, while popping register from stack, the PSP is increased.

Therefore, we can use STMDB (Store memory using Decrement Before) to save registers at PSP. Note to use the exclamation to update the changed address (new PSP value).

__asm volatile("STMDB R0!, {R4-R11}"); // R0 is updated after decrement

The same note is applied for LDMIA (Load memory using Increment After) to pop registers from stack.

Note 3:
To note change the PSP stack of the current task, SysTick_Handler() must be a naked function to remove prologue and epilogue sequence of a function.
Note 4:

The scheduler will decode which task is run next. A Round-Robin, it is easy to make a circular index.

void select_next_task() {
    /* Round-Robin scheduler */
    __uCurrentTaskIdx++;
    // check if a task is register at current slot
    if (__uCurrentTaskIdx >= TASK_NUMBER_MAX || __puTasksPSP[__uCurrentTaskIdx] == 0) {
        __uCurrentTaskIdx=0;
    }
}

The implementation of Context Switching:

__attribute__ ((naked)) void SysTick_Handler() {
  // save LR back to main, must do this firstly
  __asm volatile("PUSH {LR}");

  printf("****\n");

  /* Save the context of current task */

  // get current PSP
  __asm volatile("MRS R0, PSP");
  // save R4 to R11 to PSP Frame Stack
  __asm volatile("STMDB R0!, {R4-R11}"); // R0 is updated after decrement
  // save current value of PSP
  __asm volatile("BL save_current_psp"); // R0 is first argument

  /* Do scheduling */

  // select next task
  __asm volatile("BL select_next_task");

  /* Retrieve the context of next task */

  // get its past PSP value
  __asm volatile("BL get_current_psp"); // return PSP is in R0
  // retrieve R4-R11 from PSP Fram Stack
  __asm volatile("LDMIA R0!, {R4-R11}"); // R0 is updated after increment
  // update PSP
  __asm volatile("MSR PSP, R0");

  // exit
  __asm volatile("POP {LR}");
  __asm volatile("BX LR");
}

Main function#

In the main, we just need to do:

  • Declare Tasks’ main function
  • Add tasks into the Scheduler
  • Start the scheduler
int main() {
    // some initialization
    ...
    // add tasks
    init_task(task1_main);
    init_task(task2_main);
    init_task(task3_main);
    init_task(task4_main);
    // start
    start_scheduler();
    // should never go here
    for(;;);****
}

Debug and Run#

Set a breakpoint before the scheduler starts:

  • Check the Stack values
  • Check the dummy Stack Frame

The Stack Frame is filled

Run the scheduler, and open the output of the system, such as SWV to see the log.

All tasks should be running!

All tasks are running

There is no gurantee that the messages in output stream are completed. You mostly see the messages are interrupted into piecese because different tasks will try to use one sharing output. You will learn to synchronize output to avoid interruption later.

Scheduler with PendSV#

The delay problem

Let say task A has to delay 1 ms in its execution, how long does it wait actually?

If there are N tasks running in above scheduler, task A will have to wait N ms, not 1 ms.

Extended delay in of blocking tasks

System Ticks#

System keeps tracking the number of ticks using a global counter, and increases it by 1 at every SysTick exception. As discussed in Pendable Service Call, SysTick Handler only needs to trigger the Context Switching by setting up the pending bit for PendSV. Note that, reschedule() is called from an Exception which is in Handler mode with Privileged access level.

uint32_t __gSysTicks = 0;

void SysTick_Handler(void) {
  printf("****\n");
  __gSysTicks++;
  reschedule(1); // privileged
}

Context Switching in PendSV#

We move the Context Switching from SysTick Handler to the PendSV Handler. The sequence is kept as the same as before:

  • Save the context of current task
  • Select next task
  • Retrieve the context of next task
  • Resume next task execution
__attribute__ ((naked)) void PendSV_Handler(void) {
  // save LR back to main, must do this firstly
   __asm volatile("PUSH {LR}");

  /* Save the context of current task */

  // get current PSP
  __asm volatile("MRS R0, PSP");
  // save R4 to R11 to PSP Frame Stack
  __asm volatile("STMDB R0!, {R4-R11}"); // R0 is updated after decrement
  // save current value of PSP
  __asm volatile("BL save_current_psp"); // R0 is first argument

  /* Do scheduling */

  // select next task
  __asm volatile("BL select_next_task");

  /* Retrieve the context of next task */

  // get its past PSP value
  __asm volatile("BL get_current_psp"); // return PSP is in R0
  // retrieve R4-R11 from PSP Fram Stack
  __asm volatile("LDMIA R0!, {R4-R11}"); // R0 is updated after increment
  // update PSP
  __asm volatile("MSR PSP, R0");

  // exit
  __asm volatile("POP {LR}");
  __asm volatile("BX LR");
}

Task state#

When a task has got nothing to do, it should go to the Blocked State in which it actually does nothing instead of consuming CPU to do a meaningless loop.

To make it simple, a task has 2 states: RUNNING and BLOCKED.

The scheduler should schedule only RUNNING tasks, and unblock BLOCKED tasks if their blocking period is over.

typedef enum {
  RUNNING,
  BLOCKED
} TaskState_t;

Task Control Block#

There is a structure called Task Control Block to group all information related to a task:

  • Handler: the main function of the task
  • PSP: the current process stack pointer
  • State: task is running or blocked
  • WaitToTick: task is blocked until the system ticks meet this value
typedef struct {
  void (*handler)(void);
  uint32_t psp;
  TaskState_t state;
  uint32_t waitToTick;
} TCB_t;

uint32_t __uCurrentTaskIdx = 0;
TCB_t __pTasks[TASK_NUMBER_MAX] = {0};

uint32_t get_current_psp() {
  return __pTasks[__uCurrentTaskIdx].psp;
}

void save_current_psp(uint32_t psp) {
  __pTasks[__uCurrentTaskIdx].psp = psp;
}

Blocked Task#

A task_delay() function will set the current task to BLOCKED state, and set its waitToTick value. When a task is blocked, it should tell scheduler to run a next task. Note that, reschedule() is called from a user task which is in Thread mode with Unprivileged access level.

void task_delay(uint32_t ticks) {
  if(__uCurrentTaskIdx) {
    __pTasks[__uCurrentTaskIdx].state = BLOCKED;
    __pTasks[__uCurrentTaskIdx].waitToTick = __gSysTicks + ticks;
    reschedule(0); // unprivileged
  }
}

Idle Task#

This is the default and the first task of the scheduler. In case all user tasks are blocked, the Idle task will be run. As soon as a user task is unblocked, that task is scheduled to run on next SysTick exception.

void idle_main(void) {
  while(1) {
    __asm volatile("NOP");
  }
}

The idle_main is hidden to user, when a user tasks is added, scheduler will try to add the idle_main at the first slot:

void init_task(void (*handler)) {

  // init idle if it is not found
  if (handler != idle_main) {
    if (__pTasks[0].handler != idle_main) {
      init_task(idle_main);
    }
  }

  // find an empty slot
  int i=0;
  for(; i<TASK_NUMBER_MAX; i++) {
    if (__pTasks[i].psp == 0) break;
  }


  if(i >= TASK_NUMBER_MAX) {
    printf("Can not register a new task anymore!\n");
    return;
  } else {
    printf("Register a task %p at slot %i\n", handler, i);
  }


  // calculate new PSP
  uint32_t* psp = (uint32_t*)(MAIN_STACK - (i+1)*TASK_STACK_SIZE);

  // fill dummy stack frame
  *(--psp) = 0x01000000u; // Dummy xPSR, just enable Thumb State bit;
  *(--psp) = (uint32_t) handler; // PC
  *(--psp) = 0xFFFFFFFDu; // LR with EXC_RETURN to return to Thread using PSP
  *(--psp) = 0x12121212u; // Dummy R12
  *(--psp) = 0x03030303u; // Dummy R3
  *(--psp) = 0x02020202u; // Dummy R2
  *(--psp) = 0x01010101u; // Dummy R1
  *(--psp) = 0x00000000u; // Dummy R0
  *(--psp) = 0x11111111u; // Dummy R11
  *(--psp) = 0x10101010u; // Dummy R10
  *(--psp) = 0x09090909u; // Dummy R9
  *(--psp) = 0x08080808u; // Dummy R8
  *(--psp) = 0x07070707u; // Dummy R7
  *(--psp) = 0x06060606u; // Dummy R6
  *(--psp) = 0x05050505u; // Dummy R5
  *(--psp) = 0x04040404u; // Dummy R4

  // save PSP
  __pTasks[i].psp = (uint32_t)psp;

  // initial state
  __pTasks[i].state = RUNNING;
  __pTasks[i].handler = handler;
}

Select next task#

  • At every context switching request, a task is selected to be run only when it is in RUNNING state
  • All tasks will be checked to unblock if its waitToTick meets the global tick
void select_next_task() {
  /* Check all task state */
  for(int i=0; i<TASK_NUMBER_MAX; i++) {
    if(__pTasks[i].state == BLOCKED) {
      if(__gSysTicks >= __pTasks[i].waitToTick) {
        __pTasks[i].state = RUNNING;
      }
    }
  }

  /* Round-Robin scheduler */
  while(1) {
    __uCurrentTaskIdx++;
    // check if a task is register at current slot
    if (__uCurrentTaskIdx >= TASK_NUMBER_MAX || __pTasks[__uCurrentTaskIdx].psp == 0) {
      __uCurrentTaskIdx=0;
    }

    if(__pTasks[__uCurrentTaskIdx].state == RUNNING) break;
  }
}

Reschedule the task#

From above, we know that there are 2 moments that requests to reschedule the tasks:

  • In SysTick Handler: call from Handler mode with Privileged access level

    We can directly access the Interrupt Control and State Register to set the pending bit for PendSV

  • In Delay Task: call from Thread mode with Unprivileged access level

    We have limited access to the Interrupt Control and State Register, therefore, we have to request a Supervisor Call SVC

Fault Exception

  • Calling SVC in an SysTick handler causes Hard Fault (FORCED)
  • Direct access ICSR register in Thread Mode with unprivileged access causes Bus Fault (PRECISERR)
void reschedule(uint32_t priv) {
  if(priv) {
    // trigger PendSV directly
    ICSR |= (1 << 28);
  } else {
    // call Supervior exception to get Privileged access
    __asm volatile("SVC #255");
  }
}

To call to SVC exception, refer to Supervisor Call (SVC):

__attribute__ ((naked)) void SVC_Handler(void) {
  __asm volatile("TST LR, 4"); // check LR to know which stack is used
  __asm volatile("ITE EQ"); // 2 next instructions are conditional
  __asm volatile("MRSEQ R0, MSP"); // save MSP if bit 2 is 0
  __asm volatile("MRSNE R0, PSP"); // save PSP if bit 2 is 1
  __asm volatile("B SVC_Handler_main"); // pass R0 as the argument
}

void SVC_Handler_main(uint32_t* SP) {
  // get the address of the instruction saved in PC
  uint8_t *pInstruction = (uint8_t*)(SP[6]);

  // go back 2 bytes (16-bit opcode)
  pInstruction -= 2;

  // get the opcode, in little endian
  uint8_t svc_num = *pInstruction;

  switch(svc_num) {
    case 0xFF:
      // trigger PendSV
      ICSR |= (1 << 28);
      break;
    default:
      break;
  }
}

User Tasks and Main function#

Here is the time we use task_delay() to delay a task in non-blocking way. The expected result is: print 1111 after 16 ticks, print 2222 after 8 ticks, and so on.

/* Tasks */
void task1_main(void) {
  while(1) {
    printf("1111\n");
    task_delay(16);
  }
}

void task2_main(void) {
  while(1) {
    printf("2222\n");
    task_delay(8);
  }
}

void task3_main(void) {
  while(1) {
    printf("3333\n");
    task_delay(4);
  }
}

void task4_main(void) {
  while(1) {
    printf("4444\n");
    task_delay(2);
  }
}

int main(void)
{
  // some initialization
    ...
  // add tasks
  init_task(task1_main);
  init_task(task2_main);
  init_task(task3_main);
  init_task(task4_main);
  // start
  start_scheduler();
  // should never go here
  for(;;);
}

Debug and Run#

Run the scheduler, and open the output of the system, such as SWV to see the log.

All tasks should be running!

You can count the number of ticks **** to see how many ticks are passed before a task prints out again.

Tasks are running and have correct delays

Comments