<p>The machine code generated by this PR looks like this:</p>
<div class="highlight highlight-source-assembly"><pre><span class="pl-en">Disassembly of section .text.thread_yield_higher:</span>

<span class="pl-c1">00000000</span><span class="pl-en"> <thread_yield_higher>:</span>
<span class="pl-en">   </span><span class="pl-c1">0</span><span class="pl-en">:   020007b7                lui     a5</span><span class="pl-s1">,</span><span class="pl-c1">0x2000</span>
<span class="pl-en">   </span><span class="pl-c1">4</span><span class="pl-en">:   </span><span class="pl-c1">4705</span><span class="pl-en">                    li      a4</span><span class="pl-s1">,</span><span class="pl-c1">1</span>
<span class="pl-en">   </span><span class="pl-c1">6</span><span class="pl-en">:   c398                    sw      a4</span><span class="pl-s1">,</span><span class="pl-c1">0</span><span class="pl-en">(a5)</span>
<span class="pl-en">   </span><span class="pl-c1">8</span><span class="pl-en">:   020006b7                lui     a3</span><span class="pl-s1">,</span><span class="pl-c1">0x2000</span>
<span class="pl-en">   c:   </span><span class="pl-c1">4785</span><span class="pl-en">                    li      a5</span><span class="pl-s1">,</span><span class="pl-c1">1</span>

<span class="pl-en">0000000e <.L12>:</span>
<span class="pl-en">   e:   </span><span class="pl-c1">4298</span><span class="pl-en">                    lw      a4</span><span class="pl-s1">,</span><span class="pl-c1">0</span><span class="pl-en">(a3)</span>
<span class="pl-en">  </span><span class="pl-c1">10</span><span class="pl-en">:   fef70fe3                beq     a4</span><span class="pl-s1">,</span><span class="pl-en">a5</span><span class="pl-s1">,</span><span class="pl-en">e <.L12></span>
<span class="pl-en">  </span><span class="pl-c1">14</span><span class="pl-en">:   </span><span class="pl-c1">8082</span><span class="pl-en">                    </span><span class="pl-k">ret</span></pre></div>
<p>That is suboptimal because the address of <code>CLINT_MSIP</code> is first loaded into <code>a5</code> and later loaded again into <code>a3</code>; why GCC doesn't reuse <code>a5</code> for the subsequent load is a mystery to me. Similarly, the immediate <code>1</code> is first placed in <code>a4</code> and later again in <code>a5</code>, although the register could simply be reused. Dropping the duplicated code saves 6 B of ROM:</p>
<div class="highlight highlight-source-assembly"><pre><span class="pl-en">Disassembly of section .text.thread_yield_higher:</span>

<span class="pl-c1">00000000</span><span class="pl-en"> <thread_yield_higher>:</span>
<span class="pl-en">   </span><span class="pl-c1">0</span><span class="pl-en">:   </span><span class="pl-c1">4785</span><span class="pl-en">                    li      a5</span><span class="pl-s1">,</span><span class="pl-c1">1</span>
<span class="pl-en">   </span><span class="pl-c1">2</span><span class="pl-en">:   </span><span class="pl-c1">02000737</span><span class="pl-en">                lui     a4</span><span class="pl-s1">,</span><span class="pl-c1">0x2000</span>
<span class="pl-en">   </span><span class="pl-c1">6</span><span class="pl-en">:   c31c                    sw      a5</span><span class="pl-s1">,</span><span class="pl-c1">0</span><span class="pl-en">(a4)</span>

<span class="pl-c1">00000008</span><span class="pl-en"> <loop_until_softirq_handled>:</span>
<span class="pl-en">   </span><span class="pl-c1">8</span><span class="pl-en">:   </span><span class="pl-c1">4314</span><span class="pl-en">                    lw      a3</span><span class="pl-s1">,</span><span class="pl-c1">0</span><span class="pl-en">(a4)</span>
<span class="pl-en">   a:   fef68fe3                beq     a3</span><span class="pl-s1">,</span><span class="pl-en">a5</span><span class="pl-s1">,</span><span class="pl-c1">8</span><span class="pl-en"> <loop_until_softirq_handled></span>
<span class="pl-en">   e:   </span><span class="pl-c1">8082</span><span class="pl-en">                    </span><span class="pl-k">ret</span></pre></div>
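<p>For reference, the 6 B difference can be checked against the raw encodings in the two listings. In RISC-V, an instruction whose two lowest bits are not <code>0b11</code> is a 16-bit compressed (RVC) instruction; all other instructions in these listings are 32 bit wide. The small C sketch below (the helper names are made up for illustration) simply sums up the instruction lengths:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: RISC-V encodes the instruction length in the
 * lowest bits. If bits [1:0] are not 0b11, it is a 16-bit compressed
 * (RVC) instruction; for the instructions in these listings, all
 * others are 32-bit instructions. */
static unsigned rv_insn_len(uint32_t enc)
{
    return ((enc & 0x3u) != 0x3u) ? 2u : 4u;
}

/* Sum of the instruction lengths in bytes */
static unsigned listing_size(const uint32_t *enc, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += rv_insn_len(enc[i]);
    }
    return total;
}

/* Encodings copied from the GCC generated listing */
static const uint32_t gcc_version[] = {
    0x020007b7, /* lui a5,0x2000 */
    0x4705,     /* li  a4,1      */
    0xc398,     /* sw  a4,0(a5)  */
    0x020006b7, /* lui a3,0x2000 */
    0x4785,     /* li  a5,1      */
    0x4298,     /* lw  a4,0(a3)  */
    0xfef70fe3, /* beq a4,a5,e   */
    0x8082,     /* ret           */
};

/* Encodings copied from the inline assembly listing */
static const uint32_t asm_version[] = {
    0x4785,     /* li  a5,1      */
    0x02000737, /* lui a4,0x2000 */
    0xc31c,     /* sw  a5,0(a4)  */
    0x4314,     /* lw  a3,0(a4)  */
    0xfef68fe3, /* beq a3,a5,8   */
    0x8082,     /* ret           */
};

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))
```

<p>This yields 22 B (8 instructions) for the GCC output and 16 B (6 instructions) for the inline assembly version, i.e. 6 B and two instructions less.</p>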
<details> <summary>patch of <code>thread_yield_higher()</code> to save 6 B of ROM</summary>
<div class="highlight highlight-source-diff"><pre>commit 40117a082b354961ed9088e477963e8fce35ed74 (HEAD -> fe310)
Author: Marian Buschsieweke <marian.buschsieweke@ovgu.de>
Date:   Thu Jan 7 17:10:31 2021 +0100

    cpu/fe310: optimize thread_yield_higher()

<span class="pl-c1">diff --git a/cpu/fe310/thread_arch.c b/cpu/fe310/thread_arch.c</span>
index ed5687a49f..e12fe76efb 100644
<span class="pl-md">--- a/cpu/fe310/thread_arch.c</span>
<span class="pl-mi1">+++ b/cpu/fe310/thread_arch.c</span>
<span class="pl-mdr">@@ -180,13 +180,32 @@</span> void cpu_switch_context_exit(void)
 
 void thread_yield_higher(void)
 {
<span class="pl-md"><span class="pl-md">-</span>    /* Use SW intr to schedule context switch */</span>
<span class="pl-md"><span class="pl-md">-</span>    CLINT_REG(CLINT_MSIP) = 1;</span>
<span class="pl-md"><span class="pl-md">-</span></span>
<span class="pl-md"><span class="pl-md">-</span>    /* Latency of SW intr can be a few cycles; wait for the SW intr.</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    /* Let the compiler do the register allocation via a dummy output */</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    uint32_t tmp;</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    uint32_t one = 1;</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    uintptr_t clint_msip = CLINT_CTRL_ADDR + CLINT_MSIP;</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    /* The assembly below implements:</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     *</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     *     *((volatile uint32_t *)clint_msip) = 1;</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     *     while (*((volatile uint32_t *)clint_msip) == 1) { }</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     *</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     * One would assume that the compiler is able to implement this efficiently,</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     * but alas, this is not the case. The inline assembly version saves 6 B.</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     *</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     * Latency of SW intr can be a few cycles; wait for the SW intr.</span>
      * We cannot use WFI here as the SW intr may be delivered before we
<span class="pl-md"><span class="pl-md">-</span>     * reach the WFI instruction, thereby causing a thread deadlock. */</span>
<span class="pl-md"><span class="pl-md">-</span>    while (CLINT_REG(CLINT_MSIP) == 1) {};</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     * reach the WFI instruction, thereby causing a thread deadlock.</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>     */</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    __asm__ volatile (</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        "sw %[one], 0(%[clint_msip])"                                       "\n"</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        "loop_until_softirq_handled:"                                       "\n"</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        "lw %[tmp], 0(%[clint_msip])"                                       "\n"</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        "beq %[tmp], %[one], loop_until_softirq_handled"                    "\n"</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        : [tmp]         "=&r"(tmp)</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        : [one]         "r"(one),</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>          [clint_msip]  "r"(clint_msip)</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>        : "memory"</span>
<span class="pl-mi1"><span class="pl-mi1">+</span>    );</span>
 }
 
 /**
</pre></div>
</details>
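<p>That said, I would be surprised if two fewer instructions made much of a difference to the 6-7% performance regression, especially since both of them sit before the polling loop and thus execute only once per yield.</p>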
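<p>A possible refinement of the patch itself: the named label <code>loop_until_softirq_handled</code> defines an ordinary assembler symbol. If GCC ever emits the asm statement more than once in the same assembly output (e.g. because <code>thread_yield_higher()</code> gets inlined somewhere, possibly with LTO), assembly would presumably fail with a duplicate symbol error; GCC's Extended Asm documentation warns about duplicated asm and offers <code>%=</code> to make labels unique. A GNU as local numeric label (<code>1:</code> / <code>1b</code>) avoids the issue as well. Below is an untested sketch of that variant. <code>msip_trigger_and_wait()</code>, <code>fake_msip</code>, <code>fake_softirq_handler()</code> and <code>host_demo()</code> are hypothetical names; the non-RISC-V branch and the <code>SIGALRM</code> demo only exist so the snippet can be exercised on a host, with the signal handler standing in for the software interrupt handler that (by assumption) clears MSIP on the target.</p>

```c
#define _POSIX_C_SOURCE 200809L /* for alarm() in the host demo */
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not an existing RIOT API): write 1 to MSIP to
 * request the software interrupt, then busy-wait until MSIP no longer
 * reads 1. WFI is not used, as the interrupt may be delivered before
 * the WFI instruction is reached. */
static inline void msip_trigger_and_wait(volatile uint32_t *msip)
{
#if defined(__riscv)
    uint32_t tmp;
    uint32_t one = 1;
    __asm__ volatile (
        "sw   %[one], 0(%[msip])"      "\n"
        /* "1:" is a GNU as local label: it may be defined any number of
         * times, so duplicating this asm statement does not clash */
        "1:"                           "\n"
        "lw   %[tmp], 0(%[msip])"      "\n"
        "beq  %[tmp], %[one], 1b"      "\n"
        : [tmp]  "=&r"(tmp)
        : [one]  "r"(one),
          [msip] "r"(msip)
        : "memory"
    );
#else
    /* Portable fallback, equivalent to the C code the patch replaces */
    *msip = 1;
    while (*msip == 1) { }
#endif
}

/* Host-only demo: SIGALRM plays the role of the software interrupt and
 * its handler clears the fake MSIP register (assumption: on the target
 * the trap handler is what clears MSIP). */
static volatile uint32_t fake_msip;

static void fake_softirq_handler(int sig)
{
    (void)sig;
    fake_msip = 0;
}

static void host_demo(void)
{
    signal(SIGALRM, fake_softirq_handler);
    alarm(1);                          /* "interrupt" fires ~1 s later */
    msip_trigger_and_wait(&fake_msip); /* returns once MSIP is cleared */
}
```

<p>On the FE310, <code>thread_yield_higher()</code> would then boil down to something like <code>msip_trigger_and_wait((volatile uint32_t *)(CLINT_CTRL_ADDR + CLINT_MSIP));</code>. Whether GCC then produces exactly the 16 B sequence shown above would still have to be checked with <code>objdump</code>.</p>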

<p style="font-size:small;-webkit-text-size-adjust:none;color:#666;"><a href="https://github.com/RIOT-OS/RIOT/pull/15277#issuecomment-756221953">View it on GitHub</a></p>