[riot-devel] Odd problems with xtimer

Michael Andersen michael at steelcode.com
Sat Feb 13 01:28:20 CET 2016


Hi

I am using other IPC messages, yes. There is a thread waiting with
xtimer_msg_receive_timeout that gets messages either from xtimer_set_msg or
from the network stack on packet reception.

Incidentally, if I decrease the load on the MCU by increasing the sensor
sampling interval, the problem seems to go away (or at least it has not
shown again in the past day).

I inserted a trap in the timer code that stops everything and lets me
debug if a timer already in the linked list has the same address as the
timer to be inserted. I'll follow up with more info when I reproduce it.
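
For reference, here is a minimal sketch of that kind of trap. It assumes a
simplified sorted singly linked list, not the actual xtimer code: the names
demo_timer_t, demo_insert and demo_list_contains are hypothetical, and the
insert loop is only modeled on the behaviour described further down in this
thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for xtimer_t: only the fields the list logic needs. */
typedef struct demo_timer {
    struct demo_timer *next;
    uint32_t target;
} demo_timer_t;

/* Trap check: returns true if `timer` is already linked into the list.
 * The walk is bounded by max_nodes so that it terminates even if the
 * list has already been corrupted into a cycle. */
static bool demo_list_contains(const demo_timer_t *head,
                               const demo_timer_t *timer,
                               unsigned max_nodes)
{
    for (unsigned i = 0; head != NULL && i < max_nodes; i++) {
        if (head == timer) {
            return true;
        }
        head = head->next;
    }
    return false;
}

/* Sorted insert that assumes `timer` is NOT yet in the list.
 * If the same timer is inserted twice, the loop stops at the timer's own
 * next pointer and the last line links the timer to itself. */
static void demo_insert(demo_timer_t **list_head, demo_timer_t *timer)
{
    assert(list_head != NULL && timer != NULL);

    while (*list_head != NULL && (*list_head)->target <= timer->target) {
        list_head = &(*list_head)->next;
    }
    timer->next = *list_head;
    *list_head = timer;
}
```

In this sketch, calling demo_insert twice with the same timer object leaves
that timer's next pointer pointing at itself, which is the kind of endless
loop reported earlier. Calling demo_list_contains right before the insert
and halting when it returns true catches the duplicate before the list is
damaged.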

Thanks
Michael



On Thu, Feb 11, 2016 at 2:05 AM, Joakim Nohlgård <joakim.nohlgard at eistec.se>
wrote:

> Also, you can use the .map file to find out if there are any buffers
> or other things nearby which may have overflowed and messed up your
> state.
>
> Are you using any IPC messages other than the xtimer functions?
> (I wonder if there might be a race between the timer ISR callbacks and
> the message reception in xtimer)
>
> Regards, Joakim
>
> On Wed, Feb 10, 2016 at 11:21 PM, Martine Lenders
> <authmillenon at gmail.com> wrote:
> > Hi,
> > normally you can guess where the timer came from by looking at the
> > address (or the debugger tells you directly). Is this somehow possible
> > for your case (i.e. 0x200010a4)? That might be helpful for the timer
> > people.
> >
> > Regards,
> > Martine
> >
> > 2016-02-10 23:08 GMT+01:00 Michael Andersen <michael at steelcode.com>:
> >>
> >> Hi
> >>
> >> Thanks for the reply. I am on a platform essentially equal to a
> >> samr21xpro.
> >>
> >> The short answers:
> >>  - samr21xpro
> >>  - Only one declared xtimer_t object that is used more than once. I use
> >>    it with xtimer_set_msg for a thread to send itself a message. Both
> >>    the timer and the msg object are statically allocated. On the other
> >>    hand, I have RPL and all sorts of network things going and I have no
> >>    doubt there are a ton of timers involved. In terms of ephemeral
> >>    timers, I call xtimer_usleep a LOT with intervals of between 1ms and
> >>    100ms from multiple threads. I also send packets every 200ms or so
> >>    and receive them every 500ms or so.
> >>  - The interrupt load might be pretty steep if the radio is interrupting
> >>    on every packet (promiscuous mode). I don't think it is. Otherwise I
> >>    would imagine that other than the timers it is less than ten per
> >>    second.
> >>
> >> As for memory corruption, that may well be the case. I will double check
> >> my code. I thought it was somewhat unusual that multiple boards would
> >> all get a timer pointing to itself, but I suppose not all corruption is
> >> non-deterministic and they all run identical firmware, so it might be
> >> corruption.
> >>
> >> One question, in the network stacks, are there ever two threads possibly
> >> using the same timer object? I ask because the timer_remove and the
> >> insert are in two different critical sections, and if there are
> >> concurrent calls with the same timer object then it might be possible to
> >> interrupt between the critical sections and insert a timer that is
> >> already in the list. What would then happen is that this loop would end
> >> with list_head equal to the timer (assuming no other timer has the same
> >> time), and then the next two lines would basically link the timer to
> >> itself.
> >>
> >> I could be wrong though, that is just a guess.
> >>
> >>
> >> On Wed, Feb 10, 2016 at 2:45 AM, Kaspar Schleiser <kaspar at schleiser.de>
> >> wrote:
> >>>
> >>> Hey Michael,
> >>>
> >>> On 02/10/2016 07:57 AM, Michael Andersen wrote:
> >>> > it seems that one of the nodes in the list points to itself, hence
> >>> > the endless loop.
> >>> >
> >>> > My first question is: when is this possible? It seems at first glance
> >>> > that all code paths that lead here call remove_timer to prevent this
> >>> > sort of problem.
> >>> It should not be possible (tm).
> >>>
> >>> I took another look at the code. It seems to me that timer->next gets
> >>> overwritten whenever a timer is set, so there can't be some outdated
> >>> value.
> >>>
> >>> It might be that the list logic has a bug somewhere, but I remember
> >>> testing it quite rigorously.
> >>>
> >>> > I don't access the same timer object from two
> >>> > different threads. My code using xtimer functions is not reentered.
> >>> >
> >>> > I don't use that many timer operations in my application code, but I
> >>> > do assume that the following functions don't require any freeing or
> >>> > removing afterwards, am I wrong?
> >>> Completely right.
> >>>
> >>> Could you tell us more on how you are using timers?
> >>>
> >>> Interesting would be things like
> >>>
> >>> - what platform are you on
> >>> - how many timers are simultaneously active
> >>> - how are the intervals
> >>> - how is the interrupt load
> >>>
> >>> ... that might help corner the issue.
> >>>
> >>> You should consider xtimer just showing a problem which might be caused
> >>> by memory corruption.
> >>>
> >>> Kaspar
> >>> _______________________________________________
> >>> devel mailing list
> >>> devel at riot-os.org
> >>> https://lists.riot-os.org/mailman/listinfo/devel