IMPALA-7586

Incorrect results when querying primary = "\"" in Kudu and HBase

Details

    Description

      Version string from catalogd web ui:

      catalogd version 3.1.0-cdh6.x-SNAPSHOT RELEASE (build 8baac7f5849b6bacb02fedeb9b3fe2b2ee9450ee)
      

      A reproduction script for the impala-shell:

      create table test(name string, primary key(name) ) stored as kudu;
      
      insert into test values ("\"");
      -- Modified 1 row(s), 0 row error(s) in 4.01s
      
      -- row found in full table scan
      select * from test;
      -- Fetched 1 row(s) in 0.15s
      
      -- row not found on = predicate (pushed to kudu)
      select * from test where name="\"";
      -- Fetched 0 row(s) in 0.13s
      
      -- row found when predicate cannot be pushed to kudu
      select * from test where name like "\"";
      -- Fetched 1 row(s) in 0.13s
      

      This was originally reported as KUDU-2575. I tried to reproduce directly against Kudu using the Python client but got the expected result.

      From the plan and profile, Impala is pushing down the predicate, but Kudu is not being scanned, possibly because the Kudu client short-circuits the scan as having no results based on the predicate Impala pushes down.

      00:SCAN KUDU [default.test]
         kudu predicates: name = '"'
         mem-estimate=0B mem-reservation=0B thread-reservation=1
         tuple-ids=0 row-size=15B cardinality=unavailable
         in pipelines: 00(GETNEXT)
      
      KUDU_SCAN_NODE (id=0)
                - AverageScannerThreadConcurrency: 0.00 (0.0)
                - InactiveTotalTime: 0ns (0)
                - KuduRemoteScanTokens: 0 (0)
                - MaterializeTupleTime(*): 0ns (0)
                - NumScannerThreadMemUnavailable: 0 (0)
                - NumScannerThreadsStarted: 1 (1)
                - PeakMemoryUsage: 24.0 KiB (24576)
                - PeakScannerThreadConcurrency: 1 (1)
                - RowBatchBytesEnqueued: 16.0 KiB (16384)
                - RowBatchQueueGetWaitTime: 0ns (0)
                - RowBatchQueuePeakMemoryUsage: 0 B (0)
                - RowBatchQueuePutWaitTime: 0ns (0)
                - RowBatchesEnqueued: 1 (1)
                - RowsRead: 0 (0)
      ===>  - RowsReturned: 0 (0)
                - RowsReturnedRate: 0 per second (0)
                - ScanRangesComplete: 1 (1)
                - ScannerThreadsInvoluntaryContextSwitches: 0 (0)
                - ScannerThreadsTotalWallClockTime: 0ns (0)
                  - ScannerThreadsSysTime: 158.00us (158000)
                  - ScannerThreadsUserTime: 0ns (0)
                - ScannerThreadsVoluntaryContextSwitches: 2 (2)
      ===>  - TotalKuduScanRoundTrips: 0 (0)
                - TotalTime: 1ms (1999972)
      

      I also confirmed Kudu sees no scan from Impala for this query using the /scans page of the tablet servers.

      Full profile attached.
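
      For anyone retracing this, the plan and profile excerpts above can be collected from the impala-shell roughly as sketched below. This assumes the test table from the reproduction script still exists. The explain_level value is a guess at what produces the per-node detail shown above, and the last query is an untried variant whose result is not part of this report.

      ```sql
      -- Sketch only: assumes the `test` table created by the reproduction script.
      -- Raise the explain level so the plan lists per-node detail such as the
      -- "kudu predicates" line (the exact level needed is an assumption).
      set explain_level=2;
      explain select * from test where name="\"";

      -- Run the failing query, then print the runtime profile of the most
      -- recent query in this impala-shell session.
      select * from test where name="\"";
      profile;

      -- Untried variant: the same one-character literal written without a
      -- backslash escape. Comparing its result with the query above may help
      -- show whether the escape sequence is involved.
      select * from test where name='"';
      ```

      In the profile, the counters to compare are RowsReturned and TotalKuduScanRoundTrips, the two marked with ===> above.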

      Attachments

        1. impalakudu_pred_bug.profile
          21 kB
          William Berkeley

            People

              tarmstrong Tim Armstrong
              wdberkeley William Berkeley