Conversation

@deart2k (Member) commented Jun 9, 2021

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • New Feature
  • Bug Fix
  • Improvement
  • Performance Improvement
  • Backward Incompatible Change
  • Build/Testing/Packaging Improvement
  • Documentation (changelog entry is not required)
  • Other
  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

Detailed description / Documentation draft:

...

By adding documentation, you'll allow users to try your new feature immediately, rather than when someone else has time to document it later. Documentation is necessary for all features that affect the user experience in any way. You can add a brief documentation draft above, or add documentation directly into your patch as Markdown files in the docs folder.

If you are doing this for the first time, it's recommended to read the lightweight Contributing to ClickHouse Documentation guide first.

@deart2k requested review from acmnu and ilejn on Jun 9, 2021 at 11:53
@ilejn commented Jun 9, 2021

Looks good to me.

@deart2k merged commit 2ea2d48 into adqm-21.3.x on Jun 9, 2021
rvasin pushed a commit that referenced this pull request Sep 29, 2022
…s and parallel KILL

Right now it is possible to call QueryStatus::addPipelineExecutor() while
executors_mutex is already acquired; this can happen when the query is
cancelled via KILL QUERY.

Below are some debugger traces from a real example, where lots of
ProcessList::insert() calls got deadlocked.

Let's look at the lock owner for one of the threads that was deadlocked
in ProcessList::insert():

    (gdb) p *mutex
    $2 = {
      __data = {
        __owner = 46899,
      },
    }

And now let's see the stack trace of thread 46899:

    #0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:103
    #1  0x00007fb65569b714 in __GI___pthread_mutex_lock (mutex=0x7fb4a9d15298) at ../nptl/pthread_mutex_lock.c:80
    #2  0x000000001b6edd91 in pthread_mutex_lock (arg=0x7fb4a9d15298) at ../src/Common/ThreadFuzzer.cpp:317
    #3  std::__1::__libcpp_mutex_lock (__m=0x7fb4a9d15298) at ../contrib/libcxx/include/__threading_support:303
    #4  std::__1::mutex::lock (this=0x7fb4a9d15298) at ../contrib/libcxx/src/mutex.cpp:33
    #5  0x0000000014c7ae63 in std::__1::lock_guard<std::__1::mutex>::lock_guard (__m=..., this=<optimized out>) at ../contrib/libcxx/include/__mutex_base:91
    #6  DB::QueryStatus::addPipelineExecutor (this=0x7fb4a9d14f90, e=0x80) at ../src/Interpreters/ProcessList.cpp:372
    #7  0x0000000015bee4a7 in DB::PipelineExecutor::PipelineExecutor (this=0x7fb4b1e53618, processors=..., elem=<optimized out>) at ../src/Processors/Executors/PipelineExecutor.cpp:54
    #12 std::__1::make_shared<DB::PipelineExecutor, std::__1::vector<std::__1::shared_ptr<DB::IProcessor>, std::__1::allocator<std::__1::shared_ptr<DB::IProcessor> > >&, DB::QueryStatus*&, void> (__args=@0x7fb63095b9b0: 0x7fb4a9d14f90, __args=@0x7fb63095b9b0: 0x7fb4a9d14f90) at ../contrib/libcxx/include/__memory/shared_ptr.h:963
    #13 DB::QueryPipelineBuilder::execute (this=0x7fb63095b8b0) at ../src/QueryPipeline/QueryPipelineBuilder.cpp:552
    #14 0x00000000158c6c27 in DB::Connection::sendExternalTablesData (this=0x7fb6545e9d98, data=...) at ../src/Client/Connection.cpp:797
    #27 0x0000000014043a81 in DB::RemoteQueryExecutorRoutine::operator() (this=0x7fb63095bf20, sink=...) at ../src/QueryPipeline/RemoteQueryExecutorReadContext.cpp:46
    #32 0x000000000a16dd4f in make_fcontext () at ../contrib/boost/libs/context/src/asm/make_x86_64_sysv_elf_gas.S:71

The logs also show very strange things for this thread:

    2022.09.13 14:14:51.228979 [ 51145 ] {1712D4E914EC7C99} <Debug> Connection (localhost:9000): Sent data for 1 external tables, total 11 rows in 0.00046389 sec., 23688 rows/sec., 3.84 KiB (8.07 MiB/sec.), compressed 1.1070121092649958 times to 3.47 KiB (7.29 MiB/sec.)
    ...
    2022.09.13 14:14:51.719402 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (from 10.101.15.181:42478) KILL QUERY WHERE query_id = '1712D4E914EC7C99' (stage: Complete)
    2022.09.13 14:14:51.719488 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> executeQuery: (internal) SELECT query_id, user, query FROM system.processes WHERE query_id = '1712D4E914EC7C99' (stage: Complete)
    2022.09.13 14:14:51.719754 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> ContextAccess (default): Access granted: SELECT(user, query_id, query) ON system.processes
    2022.09.13 14:14:51.720544 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
    2022.09.13 14:14:53.228964 [ 46899 ] {7c90ffa4-1dc8-42fd-938c-4e307c244394} <Debug> Connection (localhost:9000): Sent data for 2 scalars, total 2 rows in 2.6838e-05 sec., 73461 rows/sec., 68.00 B (2.38 MiB/sec.), compressed 0.4594594594594595 times to 148.00 B (5.16 MiB/sec.)

How is this possible? The answer is fibers and the query cancellation
routine. During cancellation of an async query, the server enters the
fibers again and tries to finish them gracefully. Because of this, while
cancelling the query it may call QueryStatus::addPipelineExecutor() from
QueryStatus::cancelQuery().
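
To make the failure mode concrete, here is a minimal standalone sketch
(hypothetical names; a plain callback stands in for the fiber, not the real
ClickHouse code): cancelQuery() holds executors_mutex while resuming the
query, and the resumed code registers a new executor via
addPipelineExecutor(), which then tries to lock the same non-recursive
mutex on the same thread. The program below deliberately deadlocks:

    #include <functional>
    #include <mutex>
    #include <vector>

    struct QueryStatusSketch
    {
        std::mutex executors_mutex;
        std::vector<int> executors;

        void addPipelineExecutor(int e)
        {
            /// Second lock on the same thread: std::mutex is not
            /// recursive, so this blocks forever.
            std::lock_guard<std::mutex> lock(executors_mutex);
            executors.push_back(e);
        }

        void cancelQuery(const std::function<void (QueryStatusSketch &)> & resume_fiber)
        {
            /// First lock: still held while the "fiber" is resumed.
            std::lock_guard<std::mutex> lock(executors_mutex);
            resume_fiber(*this);
        }
    };

    int main()
    {
        QueryStatusSketch status;
        /// The resumed routine constructs a new pipeline executor
        /// (cf. Connection::sendExternalTablesData in the trace above),
        /// and the executor registers itself -> self-deadlock.
        status.cancelQuery([](QueryStatusSketch & s) { s.addPipelineExecutor(42); });
    }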

Signed-off-by: Azat Khuzhin <[email protected]>
Demilivor pushed a commit that referenced this pull request May 15, 2023
Reordering of the static variables should be enough to fix this:

    [ RUN      ] IOResourceStaticResourceManager.Prioritization
    =================================================================
    ==13==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7f7d8d621970 at pc 0x5636b80dcbb2 bp 0x7f7c48e47dd0 sp 0x7f7c48e47dc8
    READ of size 8 at 0x7f7d8d621970 thread T3975 (ThreadPool)
        #0 0x5636b80dcbb1 in IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_1::operator()(long) const build_docker/./src/IO/Resource/tests/gtest_resource_manager_static.cpp:78:13
        #1 0x5636b80dcbb1 in IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0::operator()() const build_docker/./src/IO/Resource/tests/gtest_resource_manager_static.cpp:95:21
        #2 0x5636b80dcbb1 in decltype(std::declval<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&>()()) std::__1::__invoke[abi:v15000]<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&) build_docker/./contrib/llvm-project/libcxx/include/__functional/invoke.h:394:23
        #3 0x5636b80dcbb1 in decltype(auto) std::__1::__apply_tuple_impl[abi:v15000]<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&, std::__1::tuple<>&>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&, std::__1::tuple<>&, std::__1::__tuple_indices<>) build_docker/./contrib/llvm-project/libcxx/include/tuple:1789:1
        #4 0x5636b80dcbb1 in decltype(auto) std::__1::apply[abi:v15000]<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&, std::__1::tuple<>&>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&, std::__1::tuple<>&) build_docker/./contrib/llvm-project/libcxx/include/tuple:1798:1
        #5 0x5636b80dcbb1 in ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'()::operator()() build_docker/./src/Common/ThreadPool.h:228:13
        #6 0x5636b80dcbb1 in decltype(std::declval<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>()()) std::__1::__invoke[abi:v15000]<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'()&>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&) build_docker/./contrib/llvm-project/libcxx/include/__functional/invoke.h:394:23
        #7 0x5636b80dcbb1 in void std::__1::__invoke_void_return_wrapper<void, true>::__call<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'()&>(ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'()&) build_docker/./contrib/llvm-project/libcxx/include/__functional/invoke.h:479:9
        #8 0x5636b80dcbb1 in std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'(), void ()>::operator()[abi:v15000]() build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:235:12
        #9 0x5636b80dcbb1 in void std::__1::__function::__policy_invoker<void ()>::__call_impl<std::__1::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<true>::ThreadFromGlobalPoolImpl<IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0>(IOResourceStaticResourceManager_Prioritization_Test::TestBody()::$_0&&)::'lambda'(), void ()>>(std::__1::__function::__policy_storage const*) build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:716:16
        #10 0x5636ea219310 in std::__1::__function::__policy_func<void ()>::operator()[abi:v15000]() const build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:848:16
        #11 0x5636ea219310 in std::__1::function<void ()>::operator()() const build_docker/./contrib/llvm-project/libcxx/include/__functional/function.h:1187:12
        #12 0x5636ea219310 in ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) build_docker/./src/Common/ThreadPool.cpp:416:13
        #13 0x5636ea2258ac in void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()::operator()() const build_docker/./src/Common/ThreadPool.cpp:180:73
        #14 0x5636ea2258ac in decltype(std::declval<void>()()) std::__1::__invoke[abi:v15000]<void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>(void&&) build_docker/./contrib/llvm-project/libcxx/include/__functional/invoke.h:394:23
        #15 0x5636ea2258ac in void std::__1::__thread_execute[abi:v15000]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>(std::__1::tuple<void, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>&, std::__1::__tuple_indices<>) build_docker/./contrib/llvm-project/libcxx/include/thread:284:5
        #16 0x5636ea2258ac in void* std::__1::__thread_proxy[abi:v15000]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void ThreadPoolImpl<std::__1::thread>::scheduleImpl<void>(std::__1::function<void ()>, long, std::__1::optional<unsigned long>, bool)::'lambda0'()>>(void*) build_docker/./contrib/llvm-project/libcxx/include/thread:295:5
        #17 0x7f7d8fcf1608 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x8608) (BuildId: 7b4536f41cdaa5888408e82d0836e33dcf436466)
        #18 0x7f7d8fc16132 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x11f132) (BuildId: 1878e6b475720c7c51969e69ab2d276fae6d1dee)

    Address 0x7f7d8d621970 is located in stack of thread T0 at offset 368 in frame
        #0 0x5636b80d9bef in IOResourceStaticResourceManager_Prioritization_Test::TestBody() build_docker/./src/IO/Resource/tests/gtest_resource_manager_static.cpp:46

      This frame has 10 object(s):
        [32, 36) 'requests_per_thead' (line 48)
        [48, 208) 't' (line 49)
        [272, 296) 'ref.tmp' (line 51)
        [336, 352) 'last_priority' (line 74)
        [368, 376) 'check' (line 75) <== Memory access at offset 368 is inside this variable
        [400, 424) 'name' (line 83)
        [464, 512) 'ref.tmp13' (line 87)
        [544, 560) 'c' (line 101)
        [576, 600) 'ref.tmp32' (line 101)
        [640, 664) 'ref.tmp41' (line 102)
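
For illustration, here is a minimal sketch of this bug class (not the real
test; std::jthread stands in for ClickHouse's ThreadFromGlobalPool): the
container of threads is declared before the variable the threads reference,
so the threads are joined only after that variable's lifetime has ended.

    #include <thread>
    #include <vector>

    int main()
    {
        /// Buggy ordering: t is declared first, so it is destroyed last.
        /// Its jthreads are joined in t's destructor, i.e. after `check`
        /// has already gone out of scope, so a late-running thread reads
        /// a dead stack slot (stack-use-after-scope under ASan).
        std::vector<std::jthread> t;
        int check = 0;
        t.emplace_back([&check] { check = 1; });

        /// The fix: reorder the declarations so that `check` comes before
        /// `t`; then t is destroyed (and its threads joined) while `check`
        /// is still alive.
    }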

Signed-off-by: Azat Khuzhin <[email protected]>