[riot-notifications] [RIOT-OS/RIOT] sys: Added simple memory barrier API (#11438)

Kaspar Schleiser notifications at github.com
Fri Apr 26 11:24:37 CEST 2019


> Without LTO the call to `bar()` is effectively a compiler barrier. With LTO `bar()` gets inlined and the accesses to `a` get reordered across what has been the call to `bar()` and combined to a single 32 bit store.

... and thus a function call cannot be relied on as a compiler barrier.

I have tried the same using a global variable, a volatile variable and a volatile asm access:

bar.c:

```
#ifdef MEM_VOLATILE
volatile int _bar = 0x1337;
#endif

#ifdef MEM
int _bar = 0x1337;
#endif

int bar(void) {
#ifdef ASM_VOLATILE
    __asm__ volatile ("mov    %eax,%eax;");
#endif
#if defined(MEM_VOLATILE) || defined(MEM)
    return _bar;
#else
    return 0x1337;
#endif
}
```

When compiled with ```-DMEM``` (returning a global variable instead of a literal):
```
gcc -flto -DMEM -c -O3 -o foo.o foo.c
gcc -flto -DMEM -c -O3 -o bar.o bar.c
gcc -flto -DMEM -O3 -o lto foo.o bar.o
```

```
0000000000001040 <main>:
    1040:  48 83 ec 08           sub    $0x8,%rsp
    1044:  ba 37 13 00 00        mov    $0x1337,%edx
    1049:  be a0 a1 a2 a3        mov    $0xa3a2a1a0,%esi
    104e:  31 c0                 xor    %eax,%eax
    1050:  48 8d 3d ad 0f 00 00  lea    0xfad(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
    1057:  c7 05 d3 2f 00 00 a0  movl   $0xa3a2a1a0,0x2fd3(%rip)        # 4034 <a>
    105e:  a1 a2 a3
    1061:  e8 ca ff ff ff        callq  1030 <printf@plt>
    1066:  31 c0                 xor    %eax,%eax
    1068:  48 83 c4 08           add    $0x8,%rsp
    106c:  c3                    retq
    106d:  0f 1f 00              nopl   (%rax)
```

This is identical to the original LTO version. So it makes no difference whether the function is trivially "simple" to the compiler (just returning a constant) or returns a global variable.

When compiled with ```-DMEM_VOLATILE```:

```
gcc -flto -DMEM_VOLATILE -c -O3 -o foo.o foo.c
gcc -flto -DMEM_VOLATILE -c -O3 -o bar.o bar.c
gcc -flto -DMEM_VOLATILE -O3 -o lto foo.o bar.o
```

```
 31 0000000000001040 <main>:
 32     1040:  b8 a2 a3 ff ff        mov    $0xffffa3a2,%eax
 33     1045:  48 83 ec 08           sub    $0x8,%rsp
 34     1049:  c6 05 e8 2f 00 00 a0  movb   $0xa0,0x2fe8(%rip)        # 4038 <__TMC_END__>
 35     1050:  8b 15 da 2f 00 00     mov    0x2fda(%rip),%edx        # 4030 <_bar>
 36     1056:  66 89 05 dd 2f 00 00  mov    %ax,0x2fdd(%rip)        # 403a <__TMC_END__+0x2>
 37     105d:  48 8d 3d a0 0f 00 00  lea    0xfa0(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
 38     1064:  31 c0                 xor    %eax,%eax
 39     1066:  c6 05 cc 2f 00 00 a1  movb   $0xa1,0x2fcc(%rip)        # 4039 <__TMC_END__+0x1>
 40     106d:  8b 35 c5 2f 00 00     mov    0x2fc5(%rip),%esi        # 4038 <__TMC_END__>
 41     1073:  e8 b8 ff ff ff        callq  1030 <printf@plt>
 42     1078:  31 c0                 xor    %eax,%eax
 43     107a:  48 83 c4 08           add    $0x8,%rsp
 44     107e:  c3                    retq
 45     107f:  90                    nop
```

Here the volatile read of `_bar` (line nr 35) is emitted after the write of `0xa0`: the accesses to `a` are not reordered across the inlined function body containing the volatile memory access.

I then modified [foo.c](https://gist.github.com/db72bd7d1e2effeddb7a9a4a459e6d44) to define `_bar` itself (and dropped bar.c). Interestingly, the resulting code is 100% identical to the LTO-inlined version: gcc does not reorder the writes to `a` around the volatile read.
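
The single-file variant is not reproduced in this message. A minimal sketch of what it could look like is given here; the layout of `a` and the stored values are inferred from the disassembly, and `fill_a` and `check_fill_a` are hypothetical names, not taken from the gist:

```c
#include <assert.h>
#include <stdint.h>

/* Defined in the same translation unit, as in the MEM_VOLATILE build. */
volatile int _bar = 0x1337;

/* Assumed layout: four bytes that are later read back as one 32-bit value. */
uint8_t a[4];

/* Same body as bar() with -DMEM_VOLATILE, visible to the compiler without LTO. */
static inline int bar(void)
{
    return _bar; /* volatile read: the compiler has to emit this load */
}

/* Byte stores to `a` with the (inlinable) call between the first and second. */
int fill_a(void)
{
    a[0] = 0xa0;
    int res = bar();
    a[1] = 0xa1;
    a[2] = 0xa2;
    a[3] = 0xa3;
    return res;
}

/* Runtime sanity check of the values only; it says nothing about ordering. */
void check_fill_a(void)
{
    int res = fill_a();
    (void)res; /* avoids an unused-variable warning when NDEBUG is set */
    assert(res == 0x1337);
    assert(a[0] == 0xa0 && a[1] == 0xa1 && a[2] == 0xa2 && a[3] == 0xa3);
}
```

Where the load of `_bar` ends up relative to the stores can only be seen in the generated code, for example by compiling with `gcc -O3 -S` and reading the assembly of `fill_a`.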

Here's the effect of an ```__asm__ volatile``` block (compile with ```-DASM_VOLATILE```):

```
gcc -flto -DASM_VOLATILE -c -O3 -o foo.o foo.c
gcc -flto -DASM_VOLATILE -c -O3 -o bar.o bar.c
gcc -flto -DASM_VOLATILE -O3 -o lto foo.o bar.o
```

```
 31 0000000000001040 <main>:
 32     1040:  48 83 ec 08           sub    $0x8,%rsp
 33     1044:  c6 05 e9 2f 00 00 a0  movb   $0xa0,0x2fe9(%rip)        # 4034 <a>
 34     104b:  89 c0                 mov    %eax,%eax
 35     104d:  b8 a2 a3 ff ff        mov    $0xffffa3a2,%eax
 36     1052:  c6 05 dc 2f 00 00 a1  movb   $0xa1,0x2fdc(%rip)        # 4035 <a+0x1>
 37     1059:  ba 37 13 00 00        mov    $0x1337,%edx
 38     105e:  48 8d 3d 9f 0f 00 00  lea    0xf9f(%rip),%rdi        # 2004 <_IO_stdin_used+0x4>
 39     1065:  66 89 05 ca 2f 00 00  mov    %ax,0x2fca(%rip)        # 4036 <a+0x2>
 40     106c:  8b 35 c2 2f 00 00     mov    0x2fc2(%rip),%esi        # 4034 <a>
 41     1072:  31 c0                 xor    %eax,%eax
 42     1074:  e8 b7 ff ff ff        callq  1030 <printf@plt>
 43     1079:  31 c0                 xor    %eax,%eax
 44     107b:  48 83 c4 08           add    $0x8,%rsp
 45     107f:  c3                    retq
```

Here, gcc does not reorder the writes to 'a' around the asm statement (line nr 34).

So, while the function call itself pretty much vanishes when using LTO (the call as such no longer has any effect), gcc doesn't show any behavior that I would consider unexpected.
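
Where the ordering should not depend on what the optimizer happens to do with a particular function body, the usual GCC/Clang idiom is an empty asm statement with a `"memory"` clobber. A minimal sketch, assuming GCC-compatible inline asm; `COMPILER_BARRIER`, `fill_a_with_barrier` and `check_barrier_fill` are hypothetical names and not the API proposed in this PR:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro name. The empty asm emits no instruction; the "memory"
 * clobber tells the compiler that memory may be read or written here, so it
 * must not move or merge memory accesses across this point. */
#define COMPILER_BARRIER() __asm__ volatile ("" ::: "memory")

/* Assumed layout, mirroring the four bytes written in the test above. */
uint8_t a[4];

void fill_a_with_barrier(void)
{
    a[0] = 0xa0;
    COMPILER_BARRIER(); /* the store above stays above, the stores below stay below */
    a[1] = 0xa1;
    a[2] = 0xa2;
    a[3] = 0xa3;
}

/* Runtime sanity check of the values only; ordering is visible in the asm. */
void check_barrier_fill(void)
{
    fill_a_with_barrier();
    assert(a[0] == 0xa0 && a[1] == 0xa1 && a[2] == 0xa2 && a[3] == 0xa3);
}
```

This constrains only the compiler. It does not emit a fence instruction, so it does not by itself order the accesses as seen by other cores or by devices.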



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/RIOT-OS/RIOT/pull/11438#issuecomment-486990929