  1. Tajo (Retired)
  2. TAJO-1283

ORDER BY with the first descending order causes wrong results

Details

    Description

      Each ORDER BY key can be specified with ascending or descending order.
      Recently, I found that ORDER BY with a descending first key produces wrong results.

      If only the second key is descending, the query works correctly. The other cases also work correctly.

      select l_orderkey, l_partkey from lineitem order by l_orderkey, l_partkey desc;
      
      l_orderkey,  l_partkey
      -------------------------------
      1,  155190
      1,  67310
      1,  63700
      1,  24027
      1,  15635
      1,  2132
      2,  106170
      3,  183095
      3,  128449
      3,  62143
      3,  29380
      3,  19036
      3,  4297
      ...
      

      However, if the first sort key is descending, the query returns the wrong number of rows and the range partitions are wrong, as follows:

      default> select l_orderkey, l_partkey from lineitem order by l_orderkey desc, l_partkey;
      l_orderkey,  l_partkey
      -------------------------------
      3000000,  61045
      3000000,  159113
      3000000,  167695
      3000000,  167904
      3000000,  196339
      ...
      
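For reference, the ordering that this query is expected to produce can be sketched outside Tajo. The snippet below is only an illustration, not Tajo code: `sort_rows` is a hypothetical helper, and the sample rows are a handful of (l_orderkey, l_partkey) pairs copied from the first result listing above. It relies on Python's stable sort, applying the keys from last to first.

```python
def sort_rows(rows, descending):
    """Sort tuples by several keys; descending[i] is True if key i is DESC.

    Hypothetical reference helper, not part of Tajo. Because list.sort is
    stable (also with reverse=True), sorting by the last key first and the
    first key last yields the usual multi-key ORDER BY semantics.
    """
    out = list(rows)
    for idx in reversed(range(len(descending))):
        out.sort(key=lambda row, i=idx: row[i], reverse=descending[idx])
    return out


# Sample (l_orderkey, l_partkey) pairs taken from the listing above.
rows = [(1, 155190), (1, 2132), (2, 106170), (3, 4297), (3, 183095)]

# ORDER BY l_orderkey DESC, l_partkey ASC
expected = sort_rows(rows, [True, False])
print(expected)
# [(3, 4297), (3, 183095), (2, 106170), (1, 2132), (1, 155190)]
```

Each row appears exactly once, and l_partkey ascends within each l_orderkey group, which is the property the result files below violate.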

      My investigation suggests that this is related to an offset problem in RowFile or to an index problem. The final result includes duplicated rows, and the final row was wrong, as follows:

      part-02-000000-000
      3000000|61045
      3000000|159113
      3000000|167695
      3000000|167904
      3000000|196339
      2999975|28334
      2999975|194023
      2999974|8020
      2999974|124152
      2999974|129921
      2999974|139248
      2999974|168914
      2999974|187923
      2999973|30533
      2999973|36196
      ...
      2919713|133486
      2919713|195963
      2919712|86257
      2919712|94542
      2919712|107370
      2919712|166342 <- duplicated rows
      2919712|178277
      ....
      1|63700
      1|67310
      1|155190
      [EOF]
      
      part-02-000001-000
      |96127                     <- looks wrong
      6000000|32255
      6000000|96127
      5999975|6452
      5999975|7272
      5999975|37131
      ....
      ....
      2919713|133486
      2919713|195963
      2919712|94542
      2919712|107370
      2919712|166342    <- duplicated rows
      [EOF]
      
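The two symptoms marked above (a malformed row and rows that appear in both part files) can be detected mechanically. The following is a minimal sketch under stated assumptions: it assumes each line has the form `l_orderkey|l_partkey`, the helper names `parse` and `shared_rows` are hypothetical, and the sample lines are excerpts copied from the two listings above rather than complete files.

```python
def parse(lines):
    """Split 'l_orderkey|l_partkey' lines into (rows, malformed_lines)."""
    rows, malformed = [], []
    for line in lines:
        fields = line.strip().split("|")
        # A valid row has exactly two non-empty, all-digit fields.
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            rows.append((int(fields[0]), int(fields[1])))
        else:
            malformed.append(line)
    return rows, malformed


def shared_rows(part_a, part_b):
    """Rows present in both partitions; a range-partitioned sort should have none."""
    return sorted(set(part_a) & set(part_b))


# Excerpts copied from part-02-000000-000 and part-02-000001-000 above.
part0_lines = ["2919713|133486", "2919713|195963", "2919712|86257",
               "2919712|94542", "2919712|107370", "2919712|166342",
               "2919712|178277"]
part1_lines = ["|96127", "6000000|32255", "6000000|96127",
               "2919713|133486", "2919713|195963", "2919712|94542",
               "2919712|107370", "2919712|166342"]

part0, bad0 = parse(part0_lines)
part1, bad1 = parse(part1_lines)

print("malformed:", bad0 + bad1)
print("duplicated across parts:", shared_rows(part0, part1))
```

On these excerpts the check reports the `|96127` line as malformed and five rows shared by both part files, matching the annotations in the listings.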

          People

            sirpkt Keuntae Park
            hyunsik Hyunsik Choi