KYLIN-5007 (project: Kylin)

Queries with a LIMIT clause may fail when a string dimension is encoded as an integer type

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v3.0.2
    • Fix Version/s: v3.1.3
    • Component/s: Query Engine
    • Labels: None

    Description

      Hi, team.

      Recently we encountered a problem where queries may fail if the SQL contains a LIMIT clause. The SQL looks like this:

      select gid from some_table group by gid limit 100
      

      The error message looks like this:

      Not sorted! last: source_v1=null,...,gid=276,... fetched: source_v1=null,...,gid=100506,...
      

      After searching the issue list, we found that this is similar to KYLIN-2425, KYLIN-3089, and KYLIN-4942, and it appears that those problems have not been completely resolved.

      It is a row-key encoding problem: the cube uses integer:4 to encode the string column gid.
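
      To see why the encoding changes the sort order, here is a minimal sketch assuming a simplified fixed-width, big-endian encoding in which each value is offset by 2^(bits-1) so that negative numbers sort first. IntegerEncodingSketch and encode are hypothetical names and this is not Kylin's IntegerDimEnc; the point is only that unsigned byte-wise comparison of such keys follows numeric order, while String.compareTo on the original values does not. It assumes Java 9+ for Arrays.compareUnsigned.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified fixed-width integer encoding for illustration only.
 * It is NOT Kylin's IntegerDimEnc; it just shows that byte-wise ordering of
 * encoded keys follows numeric order, not string order.
 */
public class IntegerEncodingSketch {

    // Encode a decimal string into `width` big-endian bytes. Adding 2^(bits-1)
    // maps [min, max] onto [0, 2^bits - 1], so unsigned byte comparison of the
    // result matches signed numeric order.
    static byte[] encode(String value, int width) {
        if (width < 1 || width > 7) {
            throw new IllegalArgumentException("width must be between 1 and 7");
        }
        long v = Long.parseLong(value);
        int bits = width * 8;
        long min = -(1L << (bits - 1));
        long max = (1L << (bits - 1)) - 1;
        if (v < min || v > max) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " bytes");
        }
        long shifted = v + (1L << (bits - 1));
        byte[] out = new byte[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = (byte) (shifted & 0xFF); // lowest byte goes last (big-endian)
            shifted >>>= 8;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] a = encode("276", 4);
        byte[] b = encode("100506", 4);
        // Row keys are ordered by unsigned byte comparison.
        System.out.println("encoded order: " + Integer.signum(Arrays.compareUnsigned(a, b))); // -1
        // The original string values compare character by character: '2' > '1'.
        System.out.println("string order:  " + Integer.signum("276".compareTo("100506")));    // 1
    }
}
```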

      As kangkaisen mentioned in KYLIN-3089, the comparator in SortMergedPartitionResultIterator differs from the one in SortedIteratorMergerWithLimit. SortedIteratorMergerWithLimit compares dimension tuples in their original data type, "string", rather than in the encoded data type, "integer", used in the row keys. In the exception message above, 276 < 100506 does not hold because the values are compared as strings: "276" sorts after "100506" since the first characters compare as '2' > '1'.
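
      The mismatch can be reproduced outside Kylin with a small k-way merge. The sketch below is hypothetical: MergeWithLimitSketch and merge are illustrative names, not Kylin's SortedIteratorMergerWithLimit. Each partition is already sorted by the numeric value of gid, as integer-encoded row keys would be, and the merger checks its output against whichever comparator it is given. It assumes Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical k-way merge with a limit and a sortedness check.
 * Illustrative only; this is not Kylin's SortedIteratorMergerWithLimit.
 */
public class MergeWithLimitSketch {

    // Merges partitions assumed to be sorted under `cmp`, returns at most `limit`
    // values, and fails fast if the output is not sorted under `cmp`.
    static List<String> merge(List<List<String>> partitions, Comparator<String> cmp, int limit) {
        // Heap entries are {partition index, position in partition}, ordered by the value they point at.
        PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
                (x, y) -> cmp.compare(partitions.get(x[0]).get(x[1]),
                                      partitions.get(y[0]).get(y[1])));
        for (int p = 0; p < partitions.size(); p++) {
            if (!partitions.get(p).isEmpty()) {
                heap.add(new int[] {p, 0});
            }
        }
        List<String> out = new ArrayList<>();
        String last = null;
        while (!heap.isEmpty() && out.size() < limit) {
            int[] top = heap.poll();
            String fetched = partitions.get(top[0]).get(top[1]);
            if (last != null && cmp.compare(last, fetched) > 0) {
                throw new IllegalStateException("Not sorted! last: " + last + " fetched: " + fetched);
            }
            out.add(fetched);
            last = fetched;
            if (top[1] + 1 < partitions.get(top[0]).size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each partition is sorted by the numeric value of gid, like integer:4 row keys.
        List<List<String>> partitions = List.of(
                List.of("276", "100506"),
                List.of("5", "300"));
        System.out.println(merge(partitions, Comparator.comparingLong(Long::parseLong), 3));
        try {
            merge(partitions, Comparator.naturalOrder(), 3);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      With the numeric comparator the merge prints [5, 276, 300]; with string natural order it fails with "Not sorted! last: 276 fetched: 100506", which mirrors the reported error.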

      One possible fix is to skip limit pushdown when the column type and the encoding type may produce different comparison results, although this could make such queries slower.
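
      A minimal sketch of that workaround, assuming the planner can see each column's declared type and its row-key encoding as strings. LimitPushdownGuard, canPushDownLimit, and the names in STRING_TYPES and NUMERIC_ENCODINGS are hypothetical; the check only covers the case reported here (a string column with integer encoding), and a real rule would need to list every type/encoding pair whose orders can disagree. It assumes Java 9+ for Set.of.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical guard: skip limit pushdown when the declared column type and the
 * row-key encoding can sort values differently. Names are illustrative, not Kylin's API.
 */
public class LimitPushdownGuard {

    // Declared types whose values are compared as strings (assumed names).
    private static final Set<String> STRING_TYPES = Set.of("varchar", "char", "string");

    // Encodings whose byte order follows numeric order (assumed names).
    private static final Set<String> NUMERIC_ENCODINGS = Set.of("integer");

    static boolean canPushDownLimit(String columnType, String encoding) {
        // "varchar(256)" -> "varchar"
        String type = columnType.split("\\(")[0].trim().toLowerCase(Locale.ROOT);
        // "integer:4" -> "integer"
        String enc = encoding.split(":")[0].trim().toLowerCase(Locale.ROOT);
        boolean mismatch = STRING_TYPES.contains(type) && NUMERIC_ENCODINGS.contains(enc);
        // On a mismatch, the merger's comparator and the row-key order can disagree,
        // so the LIMIT is not pushed down: correct results, possibly slower scans.
        return !mismatch;
    }

    public static void main(String[] args) {
        System.out.println(canPushDownLimit("varchar(256)", "integer:4")); // false
        System.out.println(canPushDownLimit("varchar(256)", "dict"));      // true
        System.out.println(canPushDownLimit("bigint", "integer:8"));       // true
    }
}
```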

      Attachments

        1. image-2021-06-10-10-03-54-775.png (131 kB, Congling Xia)

      People

        Assignee: Congling Xia (xiacongling)
        Reporter: Congling Xia (xiacongling)