ORC-959 (sub-task of ORC-979 C++ API QA, project ORC)

C++ reader crash in resolving nested List columns for SearchArgument



    • Type: Sub-task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0
    • Fix Version/s: 1.7.0
    • Component/s: C++
    • Labels: None


      SearchArgument currently only provides interfaces that take column names, and only columns that are struct fields can be resolved correctly. Resolving any other column (e.g. one inside a LIST or MAP) crashes the reader.

      The following code reproduces the issue:

      #include <orc/OrcFile.hh>
      using namespace std;
      using namespace orc;
      int main() {
        ORC_UNIQUE_PTR<InputStream> inStream = readLocalFile("complextypestbl.orc");
        ReaderOptions options;
        ORC_UNIQUE_PTR<Reader> reader = createReader(move(inStream), options);
        RowReaderOptions rowReaderOptions;
        // Build the predicate f <= "bbb" on the nested column "f".
        ORC_UNIQUE_PTR<SearchArgumentBuilder> sarg = SearchArgumentFactory::newBuilder();
        sarg->lessThanEquals("f", PredicateDataType::STRING, Literal("bbb", 3));
        ORC_UNIQUE_PTR<SearchArgument> final_sarg = sarg->build();
        // Push the predicate down; SargsApplier is only constructed when a SearchArgument is set.
        rowReaderOptions.searchArgument(move(final_sarg));
        // The crash happens here, while the column name "f" is being resolved.
        ORC_UNIQUE_PTR<RowReader> rowReader = reader->createRowReader(rowReaderOptions);
        ORC_UNIQUE_PTR<ColumnVectorBatch> batch = rowReader->createRowBatch(1024);
        return 0;
      }

      complextypestbl.orc is an ORC file of an ACID table with the following schema:

      id bigint
      int_array array<int>
      int_array_array array<array<int>>
      int_map map<string, int> 
      int_map_array array<map<string, int>>
      nested_struct struct<a: int, b: array<int>, c: struct<d: array<array<struct<e: int, f: string>>>>, g: map<string, struct<h: struct<i: array<double>>>>>

      The C++ code above pushes down a predicate on the "f" column. GDB stack trace for the crash:

      Program received signal SIGSEGV, Segmentation fault.
      orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
      28	      if (type.getFieldName(i) == colName) {
      (gdb) bt
      #0  orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:28
      #1  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
      #2  0x000000000045a518 in orc::SargsApplier::findColumn (type=..., colName=...) at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:31
      #3  0x000000000045a67f in orc::SargsApplier::SargsApplier (this=0x200b9f0, type=..., searchArgument=<optimized out>, rowIndexStride=<optimized out>, writerVersion=<optimized out>)
          at /home/quanlong/workspace/orc/c++/src/sargs/SargsApplier.cc:56
      #4  0x00000000004253f8 in orc::RowReaderImpl::RowReaderImpl (this=0x2009760, _contents=..., opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:244
      #5  0x00000000004257ad in orc::ReaderImpl::createRowReader (this=<optimized out>, opts=...) at /home/quanlong/workspace/orc/c++/src/Reader.cc:765
      #6  0x000000000040b688 in main ()
      (gdb) l
      24	  // find column id from column name
      25	  uint64_t SargsApplier::findColumn(const Type& type,
      26	                                    const std::string& colName) {
      27	    for (uint64_t i = 0; i != type.getSubtypeCount(); ++i) {
      28	      if (type.getFieldName(i) == colName) {
      29	        return type.getSubtype(i)->getColumnId();
      30	      } else {
      31	        uint64_t ret = findColumn(*type.getSubtype(i), colName);
      32	        if (ret != INVALID_COLUMN_ID) {
      (gdb) p type.getKind()
      $16 = orc::LIST

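      Only STRUCT types have valid field names. findColumn recurses into every subtype and calls getFieldName(i) regardless of kind, so it crashes as soon as it reaches a LIST type.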
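
      The failure mode can be illustrated with a small standalone sketch. It does not use the ORC library and is not the fix that was merged for this issue: MockType, addChild, assignIds, buildSchema and findColumnGuarded are hypothetical names invented here, and the schema is a trimmed-down version of the "d" field above. The only point it makes is that field names should be consulted only when the current type is a STRUCT, while the recursion still descends through LIST and MAP children.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for orc::Type; this is NOT the ORC API.
enum class Kind { INT, STRING, LIST, MAP, STRUCT };

struct MockType {
  Kind kind = Kind::INT;
  uint64_t columnId = 0;
  std::vector<std::string> fieldNames;  // populated only for STRUCT
  std::vector<std::unique_ptr<MockType>> subtypes;
};

constexpr uint64_t INVALID_COLUMN_ID = std::numeric_limits<uint64_t>::max();

// Append a child; a field name is recorded only when the parent is a STRUCT.
MockType& addChild(MockType& parent, Kind kind, const std::string& name = "") {
  auto child = std::make_unique<MockType>();
  child->kind = kind;
  if (parent.kind == Kind::STRUCT) parent.fieldNames.push_back(name);
  parent.subtypes.push_back(std::move(child));
  return *parent.subtypes.back();
}

// Number the tree in pre-order, starting at the root.
void assignIds(MockType& type, uint64_t& next) {
  type.columnId = next++;
  for (auto& sub : type.subtypes) assignIds(*sub, next);
}

// Mock schema: struct<d: array<array<struct<e: int, f: string>>>>
std::unique_ptr<MockType> buildSchema() {
  auto root = std::make_unique<MockType>();
  root->kind = Kind::STRUCT;
  MockType& d = addChild(*root, Kind::LIST, "d");
  MockType& inner = addChild(d, Kind::LIST);
  MockType& elem = addChild(inner, Kind::STRUCT);
  addChild(elem, Kind::INT, "e");
  addChild(elem, Kind::STRING, "f");
  uint64_t next = 0;
  assignIds(*root, next);
  return root;
}

// Same recursion shape as the listing above, but field names are only
// compared when the current type is a STRUCT.
uint64_t findColumnGuarded(const MockType& type, const std::string& colName) {
  assert(type.kind != Kind::STRUCT ||
         type.fieldNames.size() == type.subtypes.size());
  for (std::size_t i = 0; i != type.subtypes.size(); ++i) {
    if (type.kind == Kind::STRUCT && type.fieldNames[i] == colName) {
      return type.subtypes[i]->columnId;
    }
    uint64_t ret = findColumnGuarded(*type.subtypes[i], colName);
    if (ret != INVALID_COLUMN_ID) return ret;
  }
  return INVALID_COLUMN_ID;
}
```

      On this mock schema, findColumnGuarded(*buildSchema(), "f") walks through both LIST levels without touching their (empty) field names and returns the id of "f". An unknown name yields INVALID_COLUMN_ID. The ids here belong to the mock tree only and do not correspond to column ids in complextypestbl.orc.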


        1. complextypestbl.orc
          2 kB
          Quanlong Huang

              Assignee: stigahuang Quanlong Huang
              Reporter: stigahuang Quanlong Huang