[riot-notifications] [RIOT-OS/RIOT] stm32152re: hardfault when DBGMCU_CR_DBG* bits are set and branch after __WFI() (#14015)

Francisco notifications at github.com
Mon May 4 16:22:33 CEST 2020

### Description

This issue documents a recurring problem that has been seen on `stm32l152re` platforms.

#### The history so far:

When #7385 was introduced, the code that goes to sleep (i.e. calls `__WFI()`) was changed from `irq_enable` to `irq_restore`, and that broke `stm32l152re`.

#8518 fixed the issue by introducing a `__NOP()` after `__WFI()`. This held until #11159, where the call to `cortexm_sleep` was changed; from then on the `__NOP()` no longer fixed the issue but instead somehow triggered it.

#11159 re-introduced the issue by changing the way `pm_set_lowest()` was called, since `pm_set()` was now implemented for `STM32L1`. A single `__NOP()` did not fix the issue anymore.

In #11820 it was discovered that the changes in #7385 did not break the code in general, but only when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` were enabled. OpenOCD sets these bits by default after an `examine-end` event, and it does so for all stm32 boards.
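
To make the condition concrete, here is a minimal sketch of how the three debug bits could be checked or cleared. The helper names `dbg_lowpower_bits_set` and `dbg_lowpower_bits_clear` are hypothetical, and the bit values are redefined only so the snippet builds off-target; on a real build they would come from the vendor CMSIS header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low-power debug bits of DBGMCU_CR. Redefined here only so the sketch
 * compiles on a host; a target build gets them from the vendor header. */
#ifndef DBGMCU_CR_DBG_SLEEP
#define DBGMCU_CR_DBG_SLEEP   (1UL << 0)
#define DBGMCU_CR_DBG_STOP    (1UL << 1)
#define DBGMCU_CR_DBG_STANDBY (1UL << 2)
#endif

#define DBG_LOWPOWER_MASK \
    (DBGMCU_CR_DBG_SLEEP | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_STANDBY)

/* Hypothetical helper: true if any of the three bits is set in the
 * given DBGMCU_CR value. */
static inline bool dbg_lowpower_bits_set(uint32_t dbgmcu_cr)
{
    return (dbgmcu_cr & DBG_LOWPOWER_MASK) != 0;
}

/* Hypothetical helper: returns the register value with the three bits
 * cleared and every other bit left untouched. */
static inline uint32_t dbg_lowpower_bits_clear(uint32_t dbgmcu_cr)
{
    return dbgmcu_cr & ~(uint32_t)DBG_LOWPOWER_MASK;
}
```

On target, something like `DBGMCU->CR = dbg_lowpower_bits_clear(DBGMCU->CR);` early in a test application could be used to check whether the crash disappears once the bits are cleared. That has not been verified here.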

In #11919, since the problem was the branch, commit 5d96127 replaced `irq_restore` with `__set_PRIMASK(state);`, which inlines the function call and so avoids the jump and the whole issue altogether.
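
The shape of that workaround can be sketched as follows. This is a simplified, hypothetical `sleep_sketch()` and not the actual `cortexm_sleep` code. The `#ifndef __arm__` block only provides host stand-ins so the snippet compiles off-target; on a Cortex-M build the same names are expected to come from the CMSIS headers.

```c
#include <stdint.h>

#ifndef __arm__
/* Host stand-ins (assumption for illustration only). On target, include
 * the vendor CMSIS header here instead of these definitions. */
static uint32_t fake_primask;
static unsigned wfi_count;
static inline uint32_t __get_PRIMASK(void) { return fake_primask; }
static inline void __set_PRIMASK(uint32_t v) { fake_primask = v; }
static inline void __disable_irq(void) { fake_primask = 1; }
static inline void __WFI(void) { wfi_count++; }
#endif

/* Hypothetical, simplified sleep routine. */
static inline void sleep_sketch(void)
{
    uint32_t state = __get_PRIMASK(); /* remember the current interrupt mask */
    __disable_irq();                  /* mask IRQs; a pending IRQ still wakes WFI */
    __WFI();                          /* sleep until an interrupt is pending */
    __set_PRIMASK(state);             /* restore inline: no call right after WFI */
}
```

The point of the sketch is only the last line: the mask is restored with an inlined register write rather than through a function call, so no branch follows the `__WFI()` directly.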

#13999 inlines the implementation of `irq_restore`, so the fix in #11919 will be removed.

**The faulty scenario**

The debugging output shows that at some point after wake-up the `pc` gets corrupted and unreachable instructions get executed. This only happens when `DBGMCU_CR_DBG_STANDBY | DBGMCU_CR_DBG_STOP | DBGMCU_CR_DBG_SLEEP` are set. An example of the debugging output:

```
2019-07-09 17:38:58,314 - INFO # Attempting to reconstruct state for debugging...
2019-07-09 17:38:58,315 - INFO # In GDB:
2019-07-09 17:38:58,316 - INFO #   set $pc=0x7822d5e
2019-07-09 17:38:58,317 - INFO #   frame 0
2019-07-09 17:38:58,317 - INFO #   bt
2019-07-09 17:38:58,318 - INFO # 
2019-07-09 17:38:58,319 - INFO # ISR stack overflowed by at least 16 bytes.
```

**Hints to cause**

Looking through the `stm32` and `cortex-m3` errata and datasheets, I found a mention of a similar issue with `stm32f4` in [this errata sheet](https://www.st.com/content/ccc/resource/technical/document/errata_sheet/0a/98/58/84/86/b6/47/a2/DM00037591.pdf/files/DM00037591.pdf/jcr:content/translations/en.DM00037591.pdf), section 2.1.3. It hints at issues that happen when WFE/WFI are placed at a 4-byte alignment, and at problems with the prefetch buffer. Although this is a different CPU (`cortex-m4`), it made me look into the prefetch buffer and made me think a similar issue might be happening on `cortex-m3`.

In the case of `cortex-m3`, the prefetch buffer can fetch two 32-bit instructions or four 16-bit instructions, but only during sequential code execution. In our code `PRFTEN` and `ACC64` are enabled, so we are reading 64 bits at a time. The [stm32l1xxx reference manual](https://www.st.com/content/ccc/resource/technical/document/reference_manual/cc/f9/93/b2/f0/82/42/57/CD00240193.pdf/files/CD00240193.pdf/jcr:content/translations/en.CD00240193.pdf) states:

> When the code is not sequential (branch), the instruction may not be present neither in the
> current instruction line used nor in the prefetched instruction line. In this case, the penalty in
> terms of number of cycles is at least equal to the number of Wait States.
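
As a rough illustration of what the manual describes, the sketch below models flash as 64-bit (8-byte) lines and checks whether a branch target lies in the line currently in use or in the next sequential one. The assumption that the prefetched line is always the next sequential line, and the names `flash_line` and `target_is_buffered`, are mine for illustration; this is not a model of the actual hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* With 64-bit access, one flash line holds 8 bytes: two 32-bit or four
 * 16-bit instructions. */
#define FLASH_LINE_BYTES 8u

/* Index of the 64-bit line that contains the given address. */
static inline uint32_t flash_line(uint32_t addr)
{
    return addr / FLASH_LINE_BYTES;
}

/* Hypothetical model: true if the branch target lies in the line that
 * contains pc or in the next sequential line (assumed to be the one
 * prefetched). Any other target would incur the wait-state penalty. */
static inline bool target_is_buffered(uint32_t pc, uint32_t target)
{
    uint32_t cur = flash_line(pc);
    uint32_t tgt = flash_line(target);
    return tgt == cur || tgt == cur + 1u;
}
```

Under this model a call to a function placed elsewhere in flash, such as a non-inlined `irq_restore`, would almost always miss both lines, which is the non-sequential case the quote refers to.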

This led me to believe that, for some reason, when the branch instruction is present the core executes a corrupted prefetch-buffer instruction, or in other words an instruction that isn't present. For some reason this only happens when HCLK and FCLK stay enabled in sleep mode. This might have something to do with different wake-up times, since the clock is always enabled for the core. I wasn't able to find many details of what happens on wake-up, or what could be different when HCLK stays enabled.

#### Steps to reproduce the issue

This issue does not show up currently in master unless a single `__NOP` is added after https://github.com/fjmolinas/RIOT/blob/e7a1b40cde17dc5f407c9b3884a2603ab656ac7e/cpu/cortexm_common/include/cpu.h#L172. This may change depending on current master, since for a while a single `__NOP()` fixed the issue. 

#### Expected results

No crash, ever.

#### Actual results

Has crashed in the past.

#### Possible FIXES

If the issue shows up again, three `__NOP()` instructions could fix it.
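
A minimal sketch of that padding, assuming the `__NOP()` instructions go directly after the `__WFI()`. The function name `wfi_with_nop_padding` is hypothetical, and the `#ifndef __arm__` block only provides host stand-ins so the snippet compiles off-target. Whether three is enough on every build has not been confirmed.

```c
#ifndef __arm__
/* Host stand-ins (assumption for illustration only). On target, the
 * CMSIS headers provide __WFI() and __NOP(). */
static unsigned wfi_count;
static unsigned nop_count;
static inline void __WFI(void) { wfi_count++; }
static inline void __NOP(void) { nop_count++; }
#endif

/* Hypothetical helper: sleep, then execute three NOPs so that whatever
 * instruction follows is not placed immediately after the WFI. */
static inline void wfi_with_nop_padding(void)
{
    __WFI();
    __NOP();
    __NOP();
    __NOP();
}
```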
