[riot-devel] sched_active_thread is a volatile pointer, but volatile is ignored

Juan Ignacio Carrano j.carrano at fu-berlin.de
Wed Feb 27 14:51:12 CET 2019



On 8/1/19 21:47, Kees Bakker wrote:

>> E.g.:
>>
>> void msg_recv(...)
>> {
>>     if (!sched_active_thread->waiters) {
>>      // platform specific macro to suspend thread
>>     }
>>
>>     thread_t *waiter = sched_active_thread->waiters;
>>
>>     // ...
>> }
>>
>> (or similar)
>>
>> My understanding of volatile back then was that the compiler could,
>> without volatile, assume that sched_active_thread->waiters equals NULL.
> Well, in the example code, the compiler would just read 
> sched_active_thread->waiters
> once and keep it in a register. If either sched_active_thread or waiters
> can change in between the uses in that function you must declare some
> things volatile.


Say you do something like this:

extern void f(void);

int x;

int test(void)
{
     int y = x;
     f();
     return y+x;
}

Then the compiler, knowing nothing about "f()", _has_ to re-read x in 
the return statement. This is because f() could possibly modify x. This 
is a behavior in which we can rely, because it is needed even in single 
threaded programs. This is what happens when the above snippet is 
compiled with anything above -O0:

int test(void)
{
    0:	b538      	push	{r3, r4, r5, lr}
     int y = x;
    2:	4c03      	ldr	r4, [pc, #12]	; (10 <test+0x10>)
    4:	6825      	ldr	r5, [r4, #0]
     f();
    6:	f7ff fffe 	bl	0 <f>
     return y+x;
    a:	6820      	ldr	r0, [r4, #0]
}
    c:	4428      	add	r0, r5
    e:	bd38      	pop	{r3, r4, r5, pc}
   10:	00000000 	.word	0x00000000

See the two lines with "ldr	rN, [r4, #0]".

In the provided example of "msg_recv", if the "platform specific macro 
to suspend thread" includes a call to a function not defined in the 
current translation unit, then the compiler cannot assume that this 
function does not access or change globals. The only problem would be if 
the compiler thinks he knows what the function does (via inline or LTO). 
Looking at the code for _msg_receive() it does not seem to be the case, 
to I'm puzzled as to why it would break. I tried removing it, and there 
does not seem to be a problem. I'll PR it can be tried in the CI.

Marian,
 > I think getting rid of `volatile` and adding "proper" memory barriers 
to places
 > where context switches are supposed to happen will make the code more 
robust and
 > might allow the compiler to optimize the code better

Yes, that should work in those cases where the compiler may think he is 
sure a function call cannot modify certain variables. For example, the code:

int x;

int test2(void)
{
     int y = x;
     int k = z(x);
     return y+x+k;
}

Gets turned into:

int z(int x)
{
     return x*x;
}
   14:	fb00 f000 	mul.w	r0, r0, r0
   18:	4770      	bx	lr

0000001a <test2>:

int test2(void)
{
   1a:	b510      	push	{r4, lr}
     int y = x;
   1c:	4b03      	ldr	r3, [pc, #12]	; (2c <test2+0x12>)
   1e:	681c      	ldr	r4, [r3, #0]
     int k = z(x);
   20:	4620      	mov	r0, r4
   22:	f7ff fffe 	bl	14 <z>
     return y+x+k;
}
   26:	eb00 0044 	add.w	r0, r0, r4, lsl #1
   2a:	bd10      	pop	{r4, pc}
   2c:	00000000 	.word	0x00000000


I compile with -Og to prevent inlining. Observer that the global "x" is 
only loaded once.


 > `void __sync_synchronize(void)` placed just before and just after a 
context...

Yes, but most likely overkill. __sync_synchronize() is meant to issue a 
hardware barrier (that is, force the CPU to comlete all pending memory 
operations.) This is needed in multi-core, and maybe some single-core 
scenarios, but it is not what we are looking for. We need a _compiler 
barrier_, like linux's barrier macro(). See this code:

int x;

int test3(void)
{
     int y = x;
     __sync_synchronize();
     //__asm__ __volatile__("": : :"memory");  // linux's barrier()
     return y + x;
}


With no barriers, the ASM shows that x is read once. 
__sync_synchronize() inserts a "dmb" (data memory barrier) instruction 
in addition to forcing an additional read:

int test3(void)
{
     int y = x;
   30:	4b03      	ldr	r3, [pc, #12]	; (40 <test3+0x10>)
   32:	6818      	ldr	r0, [r3, #0]
     __sync_synchronize();
   34:	f3bf 8f5b 	dmb	ish
     return y + x;
   38:	681b      	ldr	r3, [r3, #0]
}
   3a:	4418      	add	r0, r3
   3c:	4770      	bx	lr

The other barrier causes a re-read, but without the "dmb", which is 
probably what we want in most cases. Linus has a series on emails[1] in 
which he explains why barrier() is good and essentially free.

Regards,

Juan.

[1] https://yarchive.net/comp/linux/compiler_barriers.html


More information about the devel mailing list