  1. Apache NiFi MiNiFi C++
  2. MINIFICPP-1675

Stack overflow in ExtractText for long matches on regular expressions

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • Fix Version/s: 0.12.0
    • None

    Description

      Regex search can crash in ExtractText::ReadCallback::process at ExtractText.cpp:184 when calling 

      while (std::regex_search(workStr, matches, rgx)) {
      

      because matching a long line overflows the stack.

      Possible options to handle the issue suggested by szaszm:

      1. Check whether <regex.h> regcomp/regexec works better. If it does, revert https://github.com/apache/nifi-minifi-cpp/pull/1159/files and always use regcomp/regexec on libstdc++ (or on Linux, if the standard library cannot be detected).
      2. Check whether using the POSIX grammar instead of ECMAScript avoids the issue with std::regex.
      3. Add an option to limit the size of the matched string. I would make this a property in minifi.properties rather than a processor property, since the issue is specific to the standard library implementation, not to the processor.
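
      The third option can be sketched as follows. This is a minimal illustration, not the fix that was merged: the helper name bounded_regex_search and the max_search_len parameter are hypothetical, and the issue does not say how the limit would be named or read from minifi.properties. The sketch only shows that restricting the searched range caps how much input a backtracking std::regex implementation has to walk.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper, not MiNiFi code: searches only the first
// `max_search_len` bytes of `input`. With a backtracking std::regex
// implementation, the recursion depth is then bounded by the limit
// instead of by the full line length.
bool bounded_regex_search(const std::string& input, const std::regex& rgx,
                          std::size_t max_search_len, std::smatch& matches) {
  const auto end = input.size() > max_search_len
      ? input.cbegin() + static_cast<std::ptrdiff_t>(max_search_len)
      : input.cend();
  // Iterator overload of std::regex_search; `matches` refers into `input`,
  // so `input` must outlive it.
  return std::regex_search(input.cbegin(), end, matches, rgx);
}
```

      The trade-off is that a match starting or ending beyond the limit is not found, so the limit has to be chosen larger than any prefix the configured regexes are expected to capture.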

      An example backtrace with over 2000 frames:

      #2010 0x00007f073a7ae987 in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_search_from_first (this=0x7f07350ce6b0) at /usr/include/c++/10.2.1/bits/regex_executor.h:101
      #2011 0x00007f073565469c in std::__detail::_Executor<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, std::__cxx11::regex_traits<char>, true>::_M_search (this=0x7f07350ce6b0) at /usr/include/c++/10.2.1/bits/regex_executor.tcc:42
      #2012 0x00007f073565364f in std::__detail::__regex_algo_impl<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, char, std::__cxx11::regex_traits<char>, (std::__detail::_RegexExecutorPolicy)0, false> (__s=..., __e=..., __m=..., __re=..., __flags=0) at /usr/include/c++/10.2.1/bits/regex.tcc:82
      #2013 0x00007f0735651ad4 in std::regex_search<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, char, std::__cxx11::regex_traits<char> > (__s=..., __e=..., __m=..., __re=..., __flags=0) at /usr/include/c++/10.2.1/bits/regex.h:2337
      #2014 0x00007f073565075a in std::regex_search<std::char_traits<char>, std::allocator<char>, std::allocator<std::__cxx11::sub_match<__gnu_cxx::__normal_iterator<char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >, char, std::__cxx11::regex_traits<char> > (__s=..., __m=..., __e=..., __f=0) at /usr/include/c++/10.2.1/bits/regex.h:2443
      #2015 0x00007f073564d3ac in org::apache::nifi::minifi::processors::ExtractText::ReadCallback::process (this=0x7f07350ced40, stream=...) at /opt/minifi/extensions/standard-processors/processors/ExtractText.cpp:184
      #2016 0x00007f073a6ffc36 in org::apache::nifi::minifi::core::ProcessSession::read (this=0x7f07346509a0, flow=..., callback=0x7f07350ced40) at /opt/minifi/libminifi/src/core/ProcessSession.cpp:317
      #2017 0x00007f073564cc5a in org::apache::nifi::minifi::processors::ExtractText::onTrigger (this=0x7f0735365b40, context=0x7f07357842f0, session=0x7f07346509a0) at /opt/minifi/extensions/standard-processors/processors/ExtractText.cpp:111
      #2018 0x00007f073a71bd4e in org::apache::nifi::minifi::core::Processor::onTrigger (this=0x7f0735365b40, context=..., session=...) at /opt/minifi/libminifi/include/core/Processor.h:222
      #2019 0x00007f073a71a4fb in org::apache::nifi::minifi::core::Processor::onTrigger (this=0x7f0735365b40, context=..., sessionFactory=...) at /opt/minifi/libminifi/src/core/Processor.cpp:244
      #2020 0x00007f073a63c51c in org::apache::nifi::minifi::SchedulingAgent::onTrigger (this=0x7f0739793b60, processor=..., processContext=..., sessionFactory=...) at /opt/minifi/libminifi/src/SchedulingAgent.cpp:120
      #2021 0x00007f073a5c3290 in org::apache::nifi::minifi::EventDrivenSchedulingAgent::run (this=0x7f0739793b60, processor=..., processContext=..., sessionFactory=...) at /opt/minifi/libminifi/src/EventDrivenSchedulingAgent.cpp:45
      #2022 0x00007f073a642604 in operator() (__closure=0x7f0735786ad0) at /opt/minifi/libminifi/src/ThreadedSchedulingAgent.cpp:100
      #2023 0x00007f073a643ac2 in std::__invoke_impl<org::apache::nifi::minifi::utils::TaskRescheduleInfo, org::apache::nifi::minifi::ThreadedSchedulingAgent::schedule(std::shared_ptr<org::apache::nifi::minifi::core::Processor>)::<lambda()>&>(std::__invoke_other, struct {...} &)
          (__f=...) at /usr/include/c++/10.2.1/bits/invoke.h:60
      #2024 0x00007f073a6439c2 in std::__invoke_r<org::apache::nifi::minifi::utils::TaskRescheduleInfo, org::apache::nifi::minifi::ThreadedSchedulingAgent::schedule(std::shared_ptr<org::apache::nifi::minifi::core::Processor>)::<lambda()>&>(struct {...} &) (__fn=...)
          at /usr/include/c++/10.2.1/bits/invoke.h:113
      #2025 0x00007f073a643842 in std::_Function_handler<org::apache::nifi::minifi::utils::TaskRescheduleInfo(), org::apache::nifi::minifi::ThreadedSchedulingAgent::schedule(std::shared_ptr<org::apache::nifi::minifi::core::Processor>)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/10.2.1/bits/std_function.h:291
      #2026 0x00007f073a64194c in std::function<org::apache::nifi::minifi::utils::TaskRescheduleInfo ()>::operator()() const (this=0x7f07350cf1b0) at /usr/include/c++/10.2.1/bits/std_function.h:622
      #2027 0x00007f073a6416ef in org::apache::nifi::minifi::utils::Worker<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::run (this=0x7f07350cf180) at /opt/minifi/libminifi/include/utils/ThreadPool.h:96
      #2028 0x00007f073a836984 in org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::run_tasks (this=0x7f07380b2f80, thread=...) at /opt/minifi/libminifi/src/utils/ThreadPool.cpp:52
      #2029 0x00007f073a8552e4 in std::__invoke_impl<void, void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*&)(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>), org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*&, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>&> (__f=
          @0x7f0735365010: (void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*)(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo> * const, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)) 0x7f073a83676c <org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::run_tasks(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>, __t=@0x7f0735365030: 0x7f07380b2f80)
          at /usr/include/c++/10.2.1/bits/invoke.h:73
      #2030 0x00007f073a854213 in std::__invoke<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*&)(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>), org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*&, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>&> (__fn=
          @0x7f0735365010: (void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*)(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo> * const, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)) 0x7f073a83676c <org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::run_tasks(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>) at /usr/include/c++/10.2.1/bits/invoke.h:95
      #2031 0x00007f073a85214c in std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>::__call<void, , 0ul, 1ul>(std::tuple<>&&, std::_Index_tuple<0ul, 1ul>) (this=0x7f0735365010, __args=...) at /usr/include/c++/10.2.1/functional:416
      #2032 0x00007f073a84fa97 in std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>::operator()<, void>() (this=0x7f0735365010) at /usr/include/c++/10.2.1/functional:499
      #2033 0x00007f073a84b606 in std::__invoke_impl<void, std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>&>(std::__invoke_other, std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>&) (__f=...)
          at /usr/include/c++/10.2.1/bits/invoke.h:60
      #2034 0x00007f073a846358 in std::__invoke_r<void, std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>&>(std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)>&) (__fn=...) at /usr/include/c++/10.2.1/bits/invoke.h:110
      #2035 0x00007f073a841f1e in std::_Function_handler<void (), std::_Bind<void (org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::*(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>*, std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>))(std::shared_ptr<org::apache::nifi::minifi::utils::WorkerThread>)> >::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/10.2.1/bits/std_function.h:291
      #2036 0x0000560140de593e in std::function<void ()>::operator()() const (this=0x7f0735365048) at /usr/include/c++/10.2.1/bits/std_function.h:622
      #2037 0x00007f073a83d8bc in org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}::operator()() (this=0x7f0735365048) at /opt/minifi/libminifi/include/utils/ThreadPool.h:290
      #2038 0x00007f073a85821e in std::__invoke_impl<void, org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}>(std::__invoke_other, org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}&&) (__f=...) at /usr/include/c++/10.2.1/bits/invoke.h:60
      #2039 0x00007f073a857e37 in std::__invoke<org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}>(org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}&&) (__fn=...) at /usr/include/c++/10.2.1/bits/invoke.h:95
      #2040 0x00007f073a857af0 in std::thread::_Invoker<std::tuple<org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}> >::_M_invoke<0ul>(std::_Index_tuple<0ul>) (
          this=0x7f0735365048) at /usr/include/c++/10.2.1/thread:264
      #2041 0x00007f073a8578ee in std::thread::_Invoker<std::tuple<org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}> >::operator()() (this=0x7f0735365048)
          at /usr/include/c++/10.2.1/thread:271
      #2042 0x00007f073a8576c8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<org::apache::nifi::minifi::utils::ThreadPool<org::apache::nifi::minifi::utils::TaskRescheduleInfo>::createThread(std::function<void ()>&&)::{lambda()#1}> > >::_M_run() (
          this=0x7f0735365040) at /usr/include/c++/10.2.1/thread:215
      #2043 0x00007f0739c2675b in ?? () from /usr/lib/libstdc++.so.6
      #2044 0x00007f073ad6719e in ?? () from /lib/ld-musl-x86_64.so.1
      #2045 0x0000000000000000 in ?? ()
      

      In the crash scenario, the regex "^(.*)(Z std(err|out) F )" was applied to the following log line. The line may need to be lengthened to reproduce the crash on a local system:

      2021-10-25T14:07:57.646713997Z stderr F {\"level\":\"error\",\"ts\":\"2021-10-25T14:07:57.646Z\",\"msg\":\"Operation failed with internal error.\",\"service\":\"cadence-history\",\"error\":\"InternalServiceError{Message: AppendHistoryNodes failed. Failed to start transaction. Error: context deadline exceeded}\",\"metric-scope\":228,\"logging-call-at\":\"persistenceMetricClients.go:1448\",\"stacktrace\":\"github.com/uber/cadence/common/log/loggerimpl.(*loggerImpl).Error\\n\\t/cadence/common/log/loggerimpl/logger.go:134\\ngithub.com/uber/cadence/common/persistence.(*historyPersistenceClient).updateErrorMetric\\n\\t/cadence/common/persistence/persistenceMetricClients.go:1448\\ngithub.com/uber/cadence/common/persistence.(*historyPersistenceClient).AppendHistoryNodes\\n\\t/cadence/common/persistence/persistenceMetricClients.go:1326\\ngithub.com/uber/cadence/service/history/shard.(*contextImpl).AppendHistoryV2Events\\n\\t/cadence/service/history/shard/context.go:965\\ngithub.com/uber/cadence/service/history/execution.(*contextImpl).appendHistoryV2EventsWithRetry.func1\\n\\t/cadence/service/history/execution/context.go:969\\ngithub.com/uber/cadence/common/backoff.Retry\\n\\t/cadence/common/backoff/retry.go:105\\ngithub.com/uber/cadence/service/history/execution.(*contextImpl).appendHistoryV2EventsWithRetry\\n\\t/cadence/service/history/execution/context.go:973\\ngithub.com/uber/cadence/service/history/execution.(*contextImpl).PersistFirstWorkflowEvents\\n\\t/cadence/service/history/execution/context.go:913\\ngithub.com/uber/cadence/service/history.(*historyEngineImpl).startWorkflowHelper\\n\\t/cadence/service/history/historyEngine.go:674\\ngithub.com/uber/cadence/service/history.(*historyEngineImpl).StartWorkflowExecution\\n\\t/cadence/service/history/historyEngine.go:554\\ngithub.com/uber/cadence/service/history.(*handlerImpl).StartWorkflowExecution\\n\\t/cadence/service/history/handler.go:721\\ngithub.com/uber/cadence/service/history.ThriftHandler.StartWorkflowExecution\\n\\t/cadence/service/history/thriftHandler.go:270\\ngithub.com/uber/cadence/.gen/go/history/historyserviceserver.handler.StartWorkflowExecution\\n\\t/cadence/.gen/go/history/historyserviceserver/server.go:1356\\ngo.uber.org/yarpc/encoding/thrift.thriftUnaryHandler.Handle\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/encoding/thrift/inbound.go:61\\ngo.uber.org/yarpc/internal/observability.(*Middleware).Handle\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/internal/observability/middleware.go:141\\ngo.uber.org/yarpc/api/middleware.unaryHandlerWithMiddleware.Handle\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/api/middleware/inbound.go:71\\ngo.uber.org/yarpc/api/transport.InvokeUnaryHandler\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/api/transport/handler_invoker.go:70\\ngo.uber.org/yarpc/transport/tchannel.handler.callHandler\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/transport/tchannel/handler.go:215\\ngo.uber.org/yarpc/transport/tchannel.handler.handle\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/transport/tchannel/handler.go:118\\ngo.uber.org/yarpc/transport/tchannel.handler.Handle\\n\\t/go/pkg/mod/go.uber.org/yarpc@v1.42.0/transport/tchannel/handler.go:107\\ngithub.com/uber/tchannel-go.channelHandler.Handle\\n\\t/go/pkg/mod/github.com/uber/tchannel-go@v1.16.0/handlers.go:126\\ngithub.com/uber/tchannel-go.(*Connection).dispatchInbound\\n\\t/go/pkg/mod/github.com/uber/tchannel-go@v1.16.0/inbound.go:203\"}\n
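
      To try the regcomp/regexec route on this regex and log line, something like the sketch below could be used. It is an illustration under stated assumptions, not the code from the MiNiFi repository: posix_search is a hypothetical helper, it requires C++17 for std::optional, and it treats the input as a NUL-terminated C string. Whether regexec avoids the overflow depends on the C library in use, so this is only a starting point for that check.

```cpp
#include <regex.h>

#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper, not MiNiFi code: compiles `pattern` as a POSIX
// extended regex and returns the matched groups (index 0 is the whole
// match), or std::nullopt if compilation fails or nothing matches.
std::optional<std::vector<std::string>> posix_search(const std::string& pattern,
                                                     const std::string& input) {
  regex_t compiled;
  if (regcomp(&compiled, pattern.c_str(), REG_EXTENDED) != 0) {
    return std::nullopt;
  }
  // re_nsub is the number of parenthesized subexpressions in the pattern.
  std::vector<regmatch_t> groups(compiled.re_nsub + 1);
  const int rc = regexec(&compiled, input.c_str(), groups.size(), groups.data(), 0);
  regfree(&compiled);
  if (rc != 0) {
    return std::nullopt;
  }
  std::vector<std::string> result;
  for (const regmatch_t& g : groups) {
    // rm_so is -1 for groups that did not take part in the match.
    if (g.rm_so < 0) {
      result.emplace_back();
    } else {
      result.push_back(input.substr(static_cast<std::size_t>(g.rm_so),
                                    static_cast<std::size_t>(g.rm_eo - g.rm_so)));
    }
  }
  return result;
}
```

      Calling posix_search("^(.*)(Z std(err|out) F )", line) on a lengthened copy of the log line should return the timestamp prefix as group 1 and "Z stderr F " as group 2.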
      

       

      Attachments

        1. core.tar.gz
          7.91 MB
          Gábor Gyimesi

            People

              Assignee: lordgamez Gábor Gyimesi
              Reporter: lordgamez Gábor Gyimesi

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 40m