
rtps::History::get_change_nts segfaults in 2.6.11 #6291

@mhidalgo-rai

Description

Is there an already existing issue for this?

  • I have searched the existing issues

Expected behavior

rmw_fastrtps_cpp runs without crashing.

Current behavior

rmw_fastrtps_cpp segfaults frequently.

Steps to reproduce

Unfortunately, I don't have a minimal reproducible example (MRE). This happens in a heavily multi-threaded, multi-process application. The best I have right now is a stack trace:

#0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=125245314811456) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=11, threadid=125245314811456) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=125245314811456, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
#3  0x000071f02855d476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
#4  0x000071f02855d520 in <signal handler called> () at /lib/x86_64-linux-gnu/libc.so.6
#5  0x000071ebd92422af in eprosima::fastrtps::rtps::GuidPrefix_t::operator==(eprosima::fastrtps::rtps::GuidPrefix_t const&) const
    (prefix=..., this=0x71e9d802a) at ./include/fastdds/rtps/common/GuidPrefix_t.hpp:62
#6  eprosima::fastrtps::rtps::operator==(eprosima::fastrtps::rtps::GUID_t const&, eprosima::fastrtps::rtps::GUID_t const&) (g2=..., g1=...)
    at ./include/fastdds/rtps/common/Guid.h:138
#7  eprosima::fastrtps::rtps::History::get_change_nts(eprosima::fastrtps::rtps::SequenceNumber_t const&, eprosima::fastrtps::rtps::GUID_t const&, eprosima::fastrtps::rtps::CacheChange_t**, __gnu_cxx::__normal_iterator<eprosima::fastrtps::rtps::CacheChange_t* const*, std::vector<eprosima::fastrtps::rtps::CacheChange_t*, std::allocator<eprosima::fastrtps::rtps::CacheChange_t*> > >) const
   Python Exception <class 'gdb.error'>: value has been optimized out
 (this=<optimized out>, seq=..., guid=..., change=0x71e8f27fb988, hint=) at ./src/cpp/rtps/history/History.cpp:189
#8  0x000071ebd925c3d6 in eprosima::fastrtps::rtps::StatefulReader::NotifyChanges(eprosima::fastrtps::rtps::WriterProxy*)
    (this=this@entry=0x71eaec526b90, prox=<optimized out>) at ./src/cpp/rtps/reader/StatefulReader.cpp:1094
#9  0x000071ebd925ce3b in eprosima::fastrtps::rtps::StatefulReader::change_received(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::WriterProxy*, unsigned long)
    (this=0x71eaec526b90, a_change=0x71ebe802abc0, prox=<optimized out>, unknown_missing_changes_up_to=<optimized out>)
    at ./src/cpp/rtps/reader/StatefulReader.cpp:1042
#10 0x000071ebd925d359 in eprosima::fastrtps::rtps::StatefulReader::processDataMsg(eprosima::fastrtps::rtps::CacheChange_t*)
    (this=0x71eaec526b90, change=0x626f0eff9640) at ./src/cpp/rtps/reader/StatefulReader.cpp:610
#11 0x000071ebd9233bdc in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_to_intraprocesses(eprosima::fastrtps::rtps::CacheChange_t*)
    (this=0x626f0eff87f0, change=0x626f0eff9640) at ./src/cpp/rtps/writer/StatefulWriter.cpp:630
#12 0x000071ebd923bfd5 in eprosima::fastrtps::rtps::StatefulWriter::deliver_sample_nts(eprosima::fastrtps::rtps::CacheChange_t*, eprosima::fastrtps::rtps::RTPSMessageGroup&, eprosima::fastrtps::rtps::LocatorSelectorSender&, std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&)
    (this=this@entry=0x626f0eff87f0, cache_change=cache_change@entry=0x626f0eff9640, group=..., locator_selector=..., max_blocking_time=...)
    at ./src/cpp/rtps/writer/StatefulWriter.cpp:2107
#13 0x000071ebd942abb6 in eprosima::fastdds::rtps::FlowControllerImpl<eprosima::fastdds::rtps::FlowControllerSyncPublishMode, eprosima::fastdds::rtps::FlowControllerFifoSchedule>::run() (this=0x626f0e3729c0) at ./src/cpp/rtps/flowcontrol/FlowControllerImpl.hpp:1341
#14 0x000071f024682253 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x000071f0285afac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#16 0x000071f0286418c0 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
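To make frames #5 to #7 easier to read, here is a simplified, hypothetical model of the kind of lookup `History::get_change_nts` performs: a forward scan from a `hint` iterator over a vector of `CacheChange_t*`, matching on writer GUID and sequence number. The stand-in types, the `int64_t` sequence number, and the early exit on a larger sequence number are assumptions for illustration, not the Fast DDS source. What the model does show is that each stored pointer is dereferenced to compare GUIDs, so a freed or corrupted entry in the vector is undefined behavior at exactly that comparison.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the Fast DDS types named in the trace.
struct GUID_t {
    std::array<std::uint8_t, 12> prefix{};  // models GuidPrefix_t
    std::array<std::uint8_t, 4> entity{};   // models EntityId_t
    bool operator==(const GUID_t& other) const {
        return prefix == other.prefix && entity == other.entity;
    }
};

struct CacheChange_t {
    GUID_t writerGUID;
    std::int64_t sequenceNumber;  // models SequenceNumber_t
};

using const_iterator = std::vector<CacheChange_t*>::const_iterator;

// Scan forward from `hint` for the change written by `guid` with sequence
// number `seq`. Every element is dereferenced to compare its writer GUID
// (the operator== calls in frames #5 and #6), so a dangling pointer in
// `changes` is undefined behavior right at that comparison.
const_iterator get_change_nts(const std::vector<CacheChange_t*>& changes,
                              std::int64_t seq, const GUID_t& guid,
                              CacheChange_t** change, const_iterator hint) {
    assert(change != nullptr);
    *change = nullptr;
    auto it = hint;
    for (; it != changes.end(); ++it) {
        if ((*it)->writerGUID == guid) {
            if ((*it)->sequenceNumber == seq) {
                *change = *it;  // found the requested change
                break;
            }
            if ((*it)->sequenceNumber > seq) {
                // Assumes one writer's changes are stored in sequence order.
                break;
            }
        }
    }
    return it;
}
```

In this model, nothing guards the dereference itself: correctness depends entirely on every pointer in the vector still referring to a live change while the scan runs.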

The only other insight I have is that this happens with 2.6.11 but not with 2.6.10.

Fast DDS version/commit

2.6.11

Platform/Architecture

Other. Please specify in Additional context section.

Transport layer

Default configuration, UDPv4 & SHM

Additional context

Ubuntu 22.04 (dockerized). Even though this arguably belongs in https://github.com/ros2/rmw_fastrtps, I'm filing it here because the issue started showing up for us when 2.6.11 was released to Humble. I went through the diff between 2.6.10 and 2.6.11, but I can't put my finger on the breaking change.

XML configuration file

Relevant log output

Network traffic capture

No response
