[riot-notifications] [RIOT-OS/RIOT] cpu/cortexm_common: replace irq_restore by __set_PRIMASK for stm32l152re (#11919)

Francisco notifications at github.com
Tue Jul 30 16:43:22 CEST 2019


I talked to @cladmi IRL and he suggested I share more details on the possible explanation I have for this bug, even though it is still very unclear...

**The faulty scenario**

So, from the debugging output, at some point after wake-up the `pc` gets corrupted and unreachable instructions get executed. This only happens when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` are set. An example of the debugging output:

```
2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO #   set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO #   frame 0
2019-07-09 17:38:58,317 - INFO #   bt
2019-07-09 17:38:58,318 - INFO # 
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.
```

**Hints to cause**

Looking through the `stm32` and `cortex-m3` errata sheets and datasheets, I found a mention of a similar issue on the `stm32f4` in [this ERRATA](https://www.st.com/content/ccc/resource/technical/document/errata_sheet/0a/98/58/84/86/b6/47/a2/DM00037591.pdf/files/DM00037591.pdf/jcr:content/translations/en.DM00037591.pdf), section 2.1.3. This errata gives some hints about issues that happen when the WFE/WFI instructions are placed at a 4-byte-aligned address, and about problems with the prefetch buffer. Although this is a different CPU (`cortex-m4`), it made me look into the prefetch buffer and think that a similar issue might be happening on the `cortex-m3`.

On the STM32L1 (`cortex-m3`), the flash prefetch buffer can fetch two 32-bit instructions or four 16-bit instructions, but only during sequential code execution. In our code `PRFTEN` and `ACC64` are enabled, so flash is read 64 bits at a time. The [stm32l1xxx reference manual](https://www.st.com/content/ccc/resource/technical/document/reference_manual/cc/f9/93/b2/f0/82/42/57/CD00240193.pdf/files/CD00240193.pdf/jcr:content/translations/en.CD00240193.pdf) states:

```
When the code is not sequential (branch), the instruction may not be present neither in the
current instruction line used nor in the prefetched instruction line. In this case, the penalty in
terms of number of cycles is at least equal to the number of Wait States.
```

This led me to believe that, for some reason, when a branch instruction is present the core executes a corrupted instruction from the prefetch buffer, in other words an instruction that isn't actually there. For some reason this only happens when HCLK and FCLK stay enabled in sleep mode, i.e. when the `DBGMCU_CR` debug bits are set. This might have something to do with different wake-up timings, since the clock to the core stays enabled? I wasn't able to find many details on what happens on wake-up, or on what could be different when HCLK stays enabled.


In conclusion, I don't really know why the hard fault occurs, or why jumping right after `WFI` causes it; I only have hints. But this fix, replacing the call to `irq_restore` with `__set_PRIMASK`, avoids the scenario in which the fault can occur. As the stm32 errata would phrase it, _it is a workaround for a faulty scenario_, but IMO the least intrusive one: no instructions are added and the code behaves exactly the same, it just avoids the jump, so there are fewer instructions rather than more. Also, this only happens when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` are set, and they are only set because of openocd.



-- 
View it on GitHub:
https://github.com/RIOT-OS/RIOT/pull/11919#issuecomment-516450531