[riot-notifications] [RIOT-OS/RIOT] core/mutex: Add mutex_cancel (#15442)

Kaspar Schleiser notifications at github.com
Sun Nov 15 12:32:54 CET 2020

> In fact, the semantics of this function originally where different, but I had to change it in order to allow `xtimer_mutex_lock_timeout()` to be implemented on it. This is IMO a good indicator that the semantics is already set in stone due to the feature being exposed indirectly. So why not making it officially set in stone then.

Having the option to cancel a mutex makes implementation of xtime_mutex_lock_timeout() et al easier, I like that. But a basic primitive like mutex _usually_ doesn't have a cancel method, and _usually_ doesn't need its return code checked for "lock() exited, is the mutex now locked?".

In C, ensuring that code won't break because suddenly mutexes are canceled is usually done by convention ("don't call mutex_cancel() unless all users of that mutex expect that!". That's fine, C world works like this. But these things also make reasoning about code more difficult, because there are now constraints that are not in code. In our case, mutex is one of the most used primitives, and exposing mutex_cancel() makes all that code essentially prone to undefined behavior. It is also not feasible to check the return code of mutex_unlock() everywhere.

There has been some work on providing safe Rust abstractions on top of RIOT's API, e.g., @chrysn's awesome [riot-wrappers](https://gitlab.com/etonomy/riot-wrappers). I'm also working on a drop in replacement for core written in Rust.
In Rust, it is possible to express many more constraints in actual code, so the compiler can prevent a lot of misuse.
Now, when mutex_lock() can suddenly return "ECANCELED", the guarantees of those higher-level abstractions become watered, as in, they depend on other code using the mutex to not call mutex_cancel(), (even if that is unlikely to happen). If the concurrency guarantees are to be held at compile time, that case needs to be handled, and it needs to be handled *where it might show up* (so not once, but in code locking mutexes) . Going with convention here would make the abstractions less safe, 
handling the case in order to keep guarantees would make the abstractions less convenient and less efficient.

Previously, we had core and xtimer intermingled badly at the code level (xtimer accessed core structs directly and had half a mutex implementation inside). That was obviously suboptimal.
With this PR, the mutex implementation becomes intermingled with xtimer at API level. I prefer that over the previous solution, but it requires mutex_lock() to possibly return ECANCELED.

As long as the use of this is contained to mutex+xtimer+ztimer, higher level code can pretty safely just not handle ECANCELED outside of the mutex lock timeout functions. If mutex_cancel() is part of the public API, higher level code needs to handle the case, or be open to misuse.

If it were to me, I'd put the function declaration in a separate header that is not supposed to be included by "normal" mutex users.
Or just re-declare in xtimer.c and ztimer_core.c.
But I'd be fine with doing this through documentation.

> IMO it doesn't matter if we have to maintain this features due to it being used xtimer_mutex_lock_timeout() and ztimer_mutex_lock_timeout(), or because it is directly supported as public API.

Asked the other way around - do we know any other use case for mutex_cancel()?
If we don't, how about we keep it out of the public API for now, and re-visit once we do?
That would make life easier for people writing safe higher-level abstractions.

You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.riot-os.org/pipermail/notifications/attachments/20201115/beee93c6/attachment.htm>

More information about the notifications mailing list