Uploaded image for project: 'Tajo'
  1. Tajo
  2. TAJO-1081

Non-forwarded (simple) query shows wrong rows.

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.9.0
    • Component/s: Java Client, TajoMaster
    • Labels:
      None

      Description

      Non-forward queries show wrong rows. It is the very urgent and critical bug that must be resolved before 0.9.0 release.

      default> \d region
      
      table name: default.region
      table path: file:/Users/hyunsik/tpch/region
      store type: CSV
      number of rows: 0
      volume: 494 B
      Options: 
      	'csvfile.delimiter'='|'
      
      schema: 
      r_regionkey	INT8
      r_name	TEXT
      r_comment	TEXT
      
      default> 
      default> select * from region;
      r_regionkey,  r_name,  r_comment
      -------------------------------
      ,  ,  
      2,  ,  
      ,  ,  
      ,  ,  
      ,  ,  
      ,  ,  
      ,  ,  
      ,  ,  
      ,  ,  
      ,  "
         � (0,  
      0,  AFRICA,  lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to 
      1,  AMERICA,  hs use ironic, even requests. s
      2,  ASIA,  ges. thinly even pinto beans ca
      3,  EUROPE,  ly final courts cajole furiously final excuse
      4,  MIDDLE EAST,  uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl
      (15 rows, 0.03 sec, 494 B selected)
      
      1. TAJO-1081.patch
        2 kB
        Hyunsik Choi

        Activity

        Hide
        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik,
        I have examined the bug. The problem is that the "number of rows" information is not provided when a table is created (This is also true for sample TPC-H tables in the unit tests). However, computing the number of rows given a data file may take a lot of time when the table is big, stored in a binary file format, and organized in a complex file type. The cost of computing row count may be similar to that of executing the "select count from table_name" query. So, I think that we should remove the "number of rows" information in the response to "\d table_name" command. Even though this information might be computed sometime, somewhere, a "\d" command may be issued just right after a "create external table" command. How do you think about it ?

        Show
        mhthanh Mai Hai Thanh added a comment - Hi Hyunsik, I have examined the bug. The problem is that the "number of rows" information is not provided when a table is created (This is also true for sample TPC-H tables in the unit tests). However, computing the number of rows given a data file may take a lot of time when the table is big, stored in a binary file format, and organized in a complex file type. The cost of computing row count may be similar to that of executing the "select count from table_name" query. So, I think that we should remove the "number of rows" information in the response to "\d table_name" command. Even though this information might be computed sometime, somewhere, a "\d" command may be issued just right after a "create external table" command. How do you think about it ?
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thank you for sharing your investigation.

        I also investigated the problem because I cannot figure out what is the main cause.

        Your investigation is partially right. We use the number of rows in the catalog that we can obtain the number by executing "\d". Sometimes, it is not available especially when users register external tables. As you mentioned, we need {{select count from..} queries.

        But, Tajo client uses the number of rows as supplementary information for LIMIT clause or displaying. Even though the number is not available, Tajo client can work well because Tajo client can read rows until scanner reaches out the end of tuples.

        I also found the bug from PlannerUtil::getNonZeroLengthDataFiles() method. This method should have used AbstractStorageManager.hiddenFileFilter in order to skip hidden files which has prefix .. The current implementation reads all files even some files are not valid.

        If you want, you can keep going this issue. Otherwise, I can take this issue.

        Best regards,
        Hyunsik

        Show
        hyunsik Hyunsik Choi added a comment - Thank you for sharing your investigation. I also investigated the problem because I cannot figure out what is the main cause. Your investigation is partially right. We use the number of rows in the catalog that we can obtain the number by executing "\d". Sometimes, it is not available especially when users register external tables. As you mentioned, we need {{select count from..} queries. But, Tajo client uses the number of rows as supplementary information for LIMIT clause or displaying. Even though the number is not available, Tajo client can work well because Tajo client can read rows until scanner reaches out the end of tuples. I also found the bug from PlannerUtil::getNonZeroLengthDataFiles() method. This method should have used AbstractStorageManager.hiddenFileFilter in order to skip hidden files which has prefix . . The current implementation reads all files even some files are not valid. If you want, you can keep going this issue. Otherwise, I can take this issue. Best regards, Hyunsik
        Hide
        mhthanh Mai Hai Thanh added a comment -

        Hi Hyunsik,
        Please take this issue. It is an urgent one.

        Show
        mhthanh Mai Hai Thanh added a comment - Hi Hyunsik, Please take this issue. It is an urgent one.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thank you Mai for your understanding. I just took the issue.

        Show
        hyunsik Hyunsik Choi added a comment - Thank you Mai for your understanding. I just took the issue.
        Hide
        mhthanh Mai Hai Thanh added a comment - - edited

        It seems that you miss the code to fix the "number of rows" problem. The "\d" command still shows zero row count after I apply your patch.

        Show
        mhthanh Mai Hai Thanh added a comment - - edited It seems that you miss the code to fix the "number of rows" problem. The "\d" command still shows zero row count after I apply your patch.
        Hide
        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12672207/TAJO-1081.patch
        against master revision 52e5543.

        +1 @author. The patch does not contain any @author tags.

        -1 tests included. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 212 new Findbugs (version 2.0.3) warnings.

        -1 release audit. The applied patch generated 105 release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-core tajo-storage.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/512//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/512//console

        This message is automatically generated.

        Show
        tajoqa Tajo QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12672207/TAJO-1081.patch against master revision 52e5543. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The applied patch does not increase the total number of javadoc warnings. +1 checkstyle. The patch generated 0 code style errors. -1 findbugs. The patch appears to introduce 212 new Findbugs (version 2.0.3) warnings. -1 release audit. The applied patch generated 105 release audit warnings. +1 core tests. The patch passed unit tests in tajo-core tajo-storage. Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/512//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-core.html Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/512//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-storage.html Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/512//console This message is automatically generated.
        Hide
        githubbot ASF GitHub Bot added a comment -

        GitHub user hyunsik opened a pull request:

        https://github.com/apache/tajo/pull/175

        TAJO-1081: Non-forwarded (simple) query shows wrong rows.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/hyunsik/tajo TAJO-1081

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/175.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #175


        commit ecc11bed916c8d5c0bfa4b0fb29770c81f57b26a
        Author: Hyunsik Choi <hyunsik@apache.org>
        Date: 2014-10-01T23:53:11Z

        TAJO-1081: Non-forwarded (simple) query shows wrong rows.


        Show
        githubbot ASF GitHub Bot added a comment - GitHub user hyunsik opened a pull request: https://github.com/apache/tajo/pull/175 TAJO-1081 : Non-forwarded (simple) query shows wrong rows. You can merge this pull request into a Git repository by running: $ git pull https://github.com/hyunsik/tajo TAJO-1081 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tajo/pull/175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #175 commit ecc11bed916c8d5c0bfa4b0fb29770c81f57b26a Author: Hyunsik Choi <hyunsik@apache.org> Date: 2014-10-01T23:53:11Z TAJO-1081 : Non-forwarded (simple) query shows wrong rows.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Hi Mai Hai Thanh,

        I also fixed the wrong row number problem. Thank you for the comment.

        Show
        hyunsik Hyunsik Choi added a comment - Hi Mai Hai Thanh , I also fixed the wrong row number problem. Thank you for the comment.
        Hide
        mhthanh Mai Hai Thanh added a comment -

        +1 LGTM

        Show
        mhthanh Mai Hai Thanh added a comment - +1 LGTM
        Hide
        jihoonson Jihoon Son added a comment -

        +1 LGTM, too.

        Show
        jihoonson Jihoon Son added a comment - +1 LGTM, too.
        Hide
        hyunsik Hyunsik Choi added a comment -

        Thank you all guys for the reviews. I'll commit it shortly.

        Show
        hyunsik Hyunsik Choi added a comment - Thank you all guys for the reviews. I'll commit it shortly.
        Hide
        hyunsik Hyunsik Choi added a comment -

        committed. Fixing it as resolved.

        Show
        hyunsik Hyunsik Choi added a comment - committed. Fixing it as resolved.
        Hide
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/175

        Show
        githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/tajo/pull/175
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-build #384 (See https://builds.apache.org/job/Tajo-master-build/384/)
        TAJO-1081: Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399)

        • tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
        • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
        • tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java
        • CHANGES
        • tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java
        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-build #384 (See https://builds.apache.org/job/Tajo-master-build/384/ ) TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399) tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java CHANGES tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Hide
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #26 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/26/)
        TAJO-1081: Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399)

        • tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java
        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        • tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java
        • tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java
        • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
        • CHANGES
        Show
        hudson Hudson added a comment - FAILURE: Integrated in Tajo-master-CODEGEN-build #26 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/26/ ) TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399) tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java CHANGES
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #385 (See https://builds.apache.org/job/Tajo-master-build/385/)
        TAJO-1081: Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951)

        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-build #385 (See https://builds.apache.org/job/Tajo-master-build/385/ ) TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951) tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-CODEGEN-build #27 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/27/)
        TAJO-1081: Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951)

        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-master-CODEGEN-build #27 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/27/ ) TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951) tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Hide
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-block_iteration-branch-build #15 (See https://builds.apache.org/job/Tajo-block_iteration-branch-build/15/)
        TAJO-1081: Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399)

        • tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java
        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java
        • tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java
        • tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java
        • CHANGES
        • tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java
          TAJO-1081: Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951)
        • tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java
        Show
        hudson Hudson added a comment - SUCCESS: Integrated in Tajo-block_iteration-branch-build #15 (See https://builds.apache.org/job/Tajo-block_iteration-branch-build/15/ ) TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (hyunsik: rev 3d029fa90e483e6d0678b8b6c72c4a2410e97399) tajo-core/src/main/java/org/apache/tajo/master/GlobalEngine.java tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java tajo-storage/src/main/java/org/apache/tajo/storage/AbstractStorageManager.java tajo-client/src/main/java/org/apache/tajo/client/TajoClient.java tajo-core/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java CHANGES tajo-client/src/main/java/org/apache/tajo/cli/DescTableCommand.java TAJO-1081 : Non-forwarded (simple) query shows wrong rows. (missed fix for test) (hyunsik: rev a90438f9846519b63a80376ec09b760a22bbc951) tajo-client/src/main/java/org/apache/tajo/cli/DefaultTajoCliOutputFormatter.java

          People

          • Assignee:
            hyunsik Hyunsik Choi
            Reporter:
            hyunsik Hyunsik Choi
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development