Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0, 0.9.0
    • Component/s: None
    • Labels:
      None

      Description

      Add a FileScanner and a FileAppender for reading and writing Avro files.

      1. TAJO-711_140415_rebased.patch
        46 kB
        Hyunsik Choi
      2. TAJO-711_20140413_20:36:40.patch
        46 kB
        David Chen
      3. TAJO-711_20140413_21:00:34.patch
        46 kB
        David Chen
      4. TAJO-711_20140413_21:46:27.patch
        46 kB
        David Chen
      5. TAJO-711_20140414_11:07:13.patch
        46 kB
        David Chen
      6. TAJO-711_20140415_11:13:43.patch
        46 kB
        David Chen
      7. TAJO-711.patch
        46 kB
        David Chen
      8. TAJO-711.patch
        45 kB
        David Chen

        Activity

        davidzchen David Chen added a comment -

        Created a review request against branch master in reviewboard
        https://reviews.apache.org/r/20082/

        davidzchen David Chen added a comment -

        I have an initial implementation done and have posted an RB. There are still a few changes and some validation work I would like to do before this is fully ready:

        • Test the use of avro.schema.url. Currently, the tests only cover avro.schema.literal.
        • Converting between Avro and Tajo is slightly tricky because data sets usually treat the Avro schema as the "true" schema. I would like to do more validation to find additional corner cases. For this ticket, I'll do a best-effort validation with flat schemas, though Avro support might not be truly battle-tested until TAJO-710 is done, because most of our data here at LinkedIn has nested schemas.
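        To make the two validation items above concrete, here is a minimal Java sketch of what they might check. It is not the code in the patch: the class name AvroSchemaSketch, the rule that avro.schema.literal takes precedence over avro.schema.url (an assumed precedence, similar to Hive's AvroSerDe), and the Avro-to-Tajo type mapping are all assumptions. Fields are passed as a simple name-to-primitive-type map instead of a parsed Avro Schema, and anything that is not a flat primitive type is rejected, matching the flat-schema scope of this ticket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not the Avro scanner/appender code in the patch.
class AvroSchemaSketch {
  static final String SCHEMA_LITERAL = "avro.schema.literal";
  static final String SCHEMA_URL = "avro.schema.url";

  // Returns the Avro schema JSON from table properties.
  // Assumption: the literal takes precedence over the URL when both are set.
  static String resolveSchemaJson(Map<String, String> props) throws IOException {
    String literal = props.get(SCHEMA_LITERAL);
    if (literal != null) {
      return literal;
    }
    String url = props.get(SCHEMA_URL);
    if (url == null) {
      throw new IllegalArgumentException(
          "Either " + SCHEMA_LITERAL + " or " + SCHEMA_URL + " must be set");
    }
    // Opens any URL scheme the JDK supports (file:, http:, ...); hdfs: would need a Hadoop FileSystem.
    try (InputStream in = URI.create(url).toURL().openStream()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  // Assumed mapping from Avro primitive type names to Tajo type names (flat schemas only).
  static String toTajoType(String avroType) {
    switch (avroType) {
      case "boolean": return "BOOLEAN";
      case "int":     return "INT4";
      case "long":    return "INT8";
      case "float":   return "FLOAT4";
      case "double":  return "FLOAT8";
      case "string":  return "TEXT";
      case "bytes":   return "BLOB";
      default:
        // record, array, map, enum, fixed and unions (including ["null", T]) are out of scope here.
        throw new UnsupportedOperationException("Unsupported Avro type: " + avroType);
    }
  }

  // Converts ordered (field name -> Avro primitive type) pairs into (column name -> Tajo type).
  static Map<String, String> toTajoColumns(Map<String, String> avroFields) {
    Map<String, String> columns = new LinkedHashMap<>();
    for (Map.Entry<String, String> field : avroFields.entrySet()) {
      columns.put(field.getKey(), toTajoType(field.getValue()));
    }
    return columns;
  }
}
```

        For a flat record with fields id (int) and name (string), toTajoColumns yields {id=INT4, name=TEXT}. Note that nullable fields, which Avro expresses as unions such as ["null", "int"], are common even in flat schemas and would need explicit handling in a real implementation.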

        Something else we will want to look at is schema evolution across partitions. I haven't looked too closely at TAJO-283 yet, but are we storing table properties in the partitions? For example, say that partitions i...j are created with Avro schema A, set by either the avro.schema.url or avro.schema.literal property. Now, partitions j+1...k are created with an evolved Avro schema A'. Does the current implementation of partitions in Tajo support storing such properties within the partitions? In any event, if this might be an issue, we can create a separate ticket for this work.

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12638977/TAJO-711.patch
        against master revision 5b0cf0d.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 4 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-catalog/tajo-catalog-common tajo-storage.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/312//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/312//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/312//console

        This message is automatically generated.

        hyunsik Hyunsik Choi added a comment -

        The patch is very straightforward. Your ongoing work looks great to me.

        As you asked in TAJO-710, we first need to consider how to specify nested schemas in our DDL statements. I'll also spend some of my own time thinking about the DDL statements.

        The ideas you mentioned look really interesting. Avro schemas are usually too long to embed in DDL statements, so avro.schema.url looks helpful and seems like a very cool idea. As for per-partition properties, Tajo currently does not store partition entries in the catalog. For each query on a partitioned table, Tajo traverses the partition directories in HDFS that match the partition predicates. We should change this to use partition entries stored in the catalog (i.e., an RDBMS), and we should also add 'ALTER TABLE ADD/DROP PARTITION' statements. So, for now, a partition does not have its own table properties. As part of the partition improvement work, we also need to allow each partition to have its own physical properties. I've created a Jira issue (TAJO-744) for this.
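        A minimal sketch of the directory-based partition pruning described above, assuming Hive-style column=value directory names relative to the table root. PartitionPruningSketch and its methods are hypothetical and do not correspond to Tajo's planner classes; a real implementation would evaluate the query's partition predicates against a directory listing in HDFS (and handle escaped values) rather than filter an in-memory list of strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch of directory-based partition pruning; not Tajo's actual code.
class PartitionPruningSketch {

  // Parses a relative path like "dt=20140413/country=KR" into {dt=20140413, country=KR}.
  static Map<String, String> parsePartitionPath(String relativePath) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : relativePath.split("/")) {
      int eq = segment.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Not a partition directory: " + segment);
      }
      values.put(segment.substring(0, eq), segment.substring(eq + 1));
    }
    return values;
  }

  // Keeps only the partition directories whose column values satisfy the predicate.
  static List<String> prune(List<String> partitionPaths,
                            Predicate<Map<String, String>> predicate) {
    List<String> matched = new ArrayList<>();
    for (String path : partitionPaths) {
      if (predicate.test(parsePartitionPath(path))) {
        matched.add(path);
      }
    }
    return matched;
  }
}
```

        For example, pruning dt=20140412/country=KR, dt=20140413/country=KR and dt=20140413/country=US with the predicate p -> "20140413".equals(p.get("dt")) keeps the two dt=20140413 directories. With partition entries stored in the catalog, the same filtering could run against catalog rows instead of a file system listing.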

        Your idea about schema evolution also gave me a rough idea for a different kind of table that explicitly supports schema evolution across historical partitions accumulated over a long period. If it turns out to be necessary, it would be nice to consider this kind of table.

        davidzchen David Chen added a comment -

        Updated the review request against branch master in reviewboard

        davidzchen David Chen added a comment -

        Hi Hyunsik,

        Sorry for the delay. I have been a bit busy this week. I have updated the tests to also test avro.schema.url, and the tests pass. I think this patch is ready to be committed now.

        Thanks for opening the ticket. I agree that we may want to store partition entries in the catalog as well. From what I understand, Hive currently does this, and that is how it handles this kind of schema evolution.
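        One way per-partition properties could support the evolution scenario from the earlier comment (partitions i...j on schema A, partitions j+1...k on A'), sketched under two assumptions: each catalog partition entry carries its own property map, and a partition-level value overrides the table-level value with the same key. PartitionEntry, effectiveProperties, and the schema URLs in the example are hypothetical; this is neither Tajo's catalog API nor Hive's metastore API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical catalog row for one partition; not an actual Tajo or Hive metastore class.
class PartitionEntry {
  final String name;                     // e.g. "dt=20140413"
  final Map<String, String> properties;  // partition-level overrides, possibly empty

  PartitionEntry(String name, Map<String, String> properties) {
    this.name = name;
    this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
  }
}

class PartitionPropertiesSketch {
  // Table-level properties act as defaults; a partition-level value wins for the same key.
  static Map<String, String> effectiveProperties(Map<String, String> tableProps,
                                                 PartitionEntry partition) {
    Map<String, String> merged = new HashMap<>(tableProps);
    merged.putAll(partition.properties);
    return merged;
  }
}
```

        With the table's avro.schema.url pointing at schema A, an older partition with no overrides resolves to A, while a newer partition that stores its own avro.schema.url resolves to A'. Any other table properties still apply to both partitions.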

        I'm interested in learning more about your idea for a table that explicitly supports schema evolution. Can you elaborate on how that would work?

        Thanks,
        David

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640015/TAJO-711_20140413_20%3A36%3A40.patch
        against master revision 8982684.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-catalog/tajo-catalog-common tajo-storage:
        org.apache.tajo.storage.TestMergeScanner
        org.apache.tajo.storage.v2.TestStorages
        org.apache.tajo.storage.TestStorages

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/331//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/331//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/331//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/331//console

        This message is automatically generated.

        davidzchen David Chen added a comment (edited) -

        Actually, it looks like I spoke too soon. Interestingly, the tests passed on my machine but failed on Jenkins. I'm looking into the test failures now.

        Update: Strangely, the test errors appear to be caused by StoreType.AVRO not being found. I'll check whether there were any issues with the patch itself.

        davidzchen David Chen added a comment -

        Updated the review request against branch master in reviewboard
        https://reviews.apache.org/r/20082/

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640018/TAJO-711_20140413_21%3A00%3A34.patch
        against master revision 8982684.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-catalog/tajo-catalog-common tajo-storage:
        org.apache.tajo.storage.TestMergeScanner
        org.apache.tajo.storage.v2.TestStorages
        org.apache.tajo.storage.TestStorages

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/332//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/332//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/332//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/332//console

        This message is automatically generated.

        davidzchen David Chen added a comment -

        Created a review request against branch master in reviewboard
        https://reviews.apache.org/r/20299/

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640019/TAJO-711.patch
        against master revision 8982684.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-catalog/tajo-catalog-common tajo-storage:
        org.apache.tajo.storage.TestMergeScanner
        org.apache.tajo.storage.v2.TestStorages
        org.apache.tajo.storage.TestStorages

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/333//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/333//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/333//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/333//console

        This message is automatically generated.

        davidzchen David Chen added a comment -

        Updated the review request against branch master in reviewboard

        davidzchen David Chen added a comment -

        Hmm. I got more changes when I fetched upstream again, so I rebased on master again and am uploading a new patch to see whether this fixes the problem.

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640021/TAJO-711_20140413_21%3A46%3A27.patch
        against master revision 8982684.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-catalog/tajo-catalog-common tajo-storage:
        org.apache.tajo.storage.TestMergeScanner
        org.apache.tajo.storage.v2.TestStorages
        org.apache.tajo.storage.TestStorages

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/334//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/334//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Findbugs warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/334//artifact/incubator-tajo/patchprocess/newPatchFindbugsWarningstajo-catalog-common.html
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/334//console

        This message is automatically generated.

        davidzchen David Chen added a comment -

        This is very strange. I took a look at the Jenkins test script and tried downloading the same patch that Jenkins ran, applying it on a fresh clone of the master branch, and then running the tests, but I still do not get the test failures that Jenkins is getting. Any idea why Jenkins is hitting these errors?

        hyunsik Hyunsik Choi added a comment -

        Hi David,

        I have an account with access to the Jenkins jobs, and I'm investigating this problem. I don't think your patch causes it. I suspect that the Findbugs errors were introduced by another revision, or that Jenkins has some problem. If I find the cause, I'll share it here. For now, please ignore the error message.

        Thanks,
        Hyunsik

        hyunsik Hyunsik Choi added a comment -

        FYI, here is an update on the Jenkins problem. It was caused by some Findbugs errors introduced in recent revisions. I submitted a patch (TAJO-759) to fix them, so the problem should be fixed soon.

        Thanks,
        Hyunsik

        davidzchen David Chen added a comment -

        Updated the review request against branch master in reviewboard
        https://reviews.apache.org/r/20299/

        davidzchen David Chen added a comment -

        I see. Thanks for looking into it, Hyunsik!

        I have rebased on master and submitted a new patch.

        Thanks,
        David

        tajoqa Tajo QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640110/TAJO-711_20140414_11%3A07%3A13.patch
        against master revision 06a1496.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        -1 release audit. The applied patch generated 1 release audit warnings.

        -1 core tests. The patch failed these unit tests in tajo-catalog/tajo-catalog-common tajo-storage:
        org.apache.tajo.storage.v2.TestStorages
        org.apache.tajo.storage.TestMergeScanner
        org.apache.tajo.storage.TestStorages

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/341//testReport/
        Release audit warnings: https://builds.apache.org/job/PreCommit-TAJO-Build/341//artifact/incubator-tajo/patchprocess/patchReleaseAuditProblems.txt
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/341//console

        This message is automatically generated.

        davidzchen David Chen added a comment -

        Hi Hyunsik,

        It looks like those tests are still failing with the same problem: java.lang.NoSuchFieldError: AVRO

        Is it possible to look at the Jenkins working directory for that job and see whether anything looks suspicious in the way the changes to CatalogProtos.proto were applied? The failures may be caused either by the patch not applying correctly to CatalogProtos, so that AVRO was never added to the StoreType enum, or by a stale StoreType class being used to build tajo-storage. Since these failures do not reproduce on my machine, I am unable to debug this further without access to the Jenkins machine.
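        A java.lang.NoSuchFieldError at run time generally means the calling code was compiled against a version of a class that declares the field, while a different version without that field was loaded when the code ran. That fits the stale-StoreType theory above. A small diagnostic along these lines, run on the build machine, could confirm which copy of the enum is on the classpath. EnumClasspathCheck is a hypothetical helper that uses only standard JDK reflection calls.

```java
import java.net.URL;
import java.security.CodeSource;

// Hypothetical diagnostic helper; uses only standard JDK reflection.
class EnumClasspathCheck {
  // True if the enum class actually loaded at run time declares a constant with this name.
  static boolean hasConstant(Class<? extends Enum<?>> enumClass, String constantName) {
    for (Enum<?> constant : enumClass.getEnumConstants()) {
      if (constant.name().equals(constantName)) {
        return true;
      }
    }
    return false;
  }

  // Where the class was loaded from (e.g. a JAR path), or a placeholder for JDK bootstrap classes.
  static String loadedFrom(Class<?> cls) {
    CodeSource source = cls.getProtectionDomain().getCodeSource();
    URL location = (source == null) ? null : source.getLocation();
    return (location == null) ? "<bootstrap or unknown>" : location.toString();
  }
}
```

        Calling hasConstant(CatalogProtos.StoreType.class, "AVRO") and loadedFrom(CatalogProtos.StoreType.class) from a test in tajo-storage would show whether AVRO exists in the loaded enum and which JAR supplied it; a path into a previously installed artifact rather than the freshly built module would point to a stale class.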

        Thanks,
        David

        hyunsik Hyunsik Choi added a comment - - edited

        First of all, thank you for the nice contribution!

        The pre-commit process appears to continue even after the initial build fails the RAT (Apache Release Audit Tool) license check. As a result, the subsequent tests may have run against the existing JAR files instead of the patched ones, which caused the pre-commit test failures. I'll fix the pre-commit test script so that it stops when the RAT check fails.

        Also, as Jinho Kim mentioned, I suggest adding an exclude rule to ${TAJO_HOME}/pom.xml. I think excluding *.avsc is better, because *.schema files are used for another purpose. This will fix the current problem:

          <exclude>**/*.avsc</exclude>
        

        With a trivial fix to pom.xml, I have already tested your patch, and all unit tests pass. I'm reviewing your patch now and will comment soon!

        hyunsik Hyunsik Choi added a comment -

        Hi David,

        TAJO-753, a required issue for the 0.8.0 release, moves some constants from CatalogConstants to StorageConstants and removes the Parquet dependencies from the catalog-common module.

        So the TAJO-711 patch needs to be rebased on top of TAJO-753. I have uploaded the rebased patch that I am using in my review. I hope it is helpful.

        tajoqa Tajo QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640279/TAJO-711_140415_rebased.patch
        against master revision 1d24a25.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-catalog/tajo-catalog-common tajo-storage.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/343//testReport/
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/343//console

        This message is automatically generated.

        davidzchen David Chen added a comment -

        Updated the review request against branch master in reviewboard
        https://reviews.apache.org/r/20299/

        davidzchen David Chen added a comment -

        Thanks for your help, Hyunsik and Jinho! I was not aware of the rat checks, but that makes sense.

        I have also made another change to move all the Parquet version properties from the base pom.xml into the tajo-storage pom.xml since there is no longer a dependency of tajo-catalog on Parquet.

        I have posted a new patch with the rebase and this change.

        Thanks!
        David

        tajoqa Tajo QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12640308/TAJO-711_20140415_11%3A13%3A43.patch
        against master revision 1d24a25.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 5 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in tajo-catalog/tajo-catalog-common tajo-storage.

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/344//testReport/
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/344//console

        This message is automatically generated.

        hyunsik Hyunsik Choi added a comment - - edited

        These are my comments on the concept of a schema-evolving table.

        A few days ago, I discussed your idea with Hyoungjun offline. We were very happy to see such an interesting idea. Hyoungjun gave me some additional suggestions, and I have added some concrete ideas of my own.

        I'd like to state some assumptions and define some terms before discussing the idea.

        • A partitioned table has a schema.
          • Let us call this schema 'parent schema'.
        • Each partition has its own schema.
          • Let us call this schema 'partition schema'.
        • Let us call this kind of table 'a schema-evolving table'.

        (I know these names are not great; they are only temporary. I hope someone suggests better ones.)

        The rough idea is as follows:

        • Although a schema is actually an ordered set of fields, we treat it as an unordered set of fields when dealing with the relationship between the parent schema and the partition schemas.
        • The parent schema of a schema-evolving table must be a superset of all fields in the partition schemas.
        • The field set of each partition schema must be a subset of the parent schema.
        • Fields with the same name must have the same data type in all partition schemas, including the parent schema.
        • Partition schemas may differ from one another.
        • The order of fields may differ among partitions (because we treat the fields as a set).
        • Newly added fields of new partitions are appended to the tail of the parent schema.
          • This schema maintenance will be performed when 'ALTER TABLE ADD PARTITION' is executed.
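        As a rough illustration of the field-set rules above, here is a minimal Java sketch of the schema maintenance step that could run on 'ALTER TABLE ADD PARTITION'. It assumes a schema can be modeled as an ordered map from field name to type name; SchemaEvolutionSketch and mergeIntoParent are hypothetical names, not part of Tajo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of parent-schema maintenance; not Tajo's catalog API.
public class SchemaEvolutionSketch {

  /**
   * Merges a partition schema into the parent schema.
   * Fields are matched by name (the schema is treated as a set), same-named
   * fields must share a type, and unseen fields are appended to the tail.
   * LinkedHashMap preserves insertion order, so the parent's order is kept.
   */
  public static LinkedHashMap<String, String> mergeIntoParent(
      LinkedHashMap<String, String> parent, Map<String, String> partition) {
    LinkedHashMap<String, String> merged = new LinkedHashMap<>(parent);
    for (Map.Entry<String, String> field : partition.entrySet()) {
      String existingType = merged.get(field.getKey());
      if (existingType == null) {
        // New field: append it to the tail of the parent schema.
        merged.put(field.getKey(), field.getValue());
      } else if (!existingType.equals(field.getValue())) {
        throw new IllegalArgumentException("Type conflict on field '" + field.getKey()
            + "': " + existingType + " vs " + field.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> parent = new LinkedHashMap<>();
    parent.put("id", "INT4");
    parent.put("name", "TEXT");

    // A new partition whose fields are reordered and include one new field.
    LinkedHashMap<String, String> partition = new LinkedHashMap<>();
    partition.put("name", "TEXT");
    partition.put("id", "INT4");
    partition.put("email", "TEXT");

    System.out.println(mergeIntoParent(parent, partition).keySet()); // [id, name, email]
  }
}
```

        Because the merge returns a new map and rejects type conflicts, a partition that violates the same-name/same-type rule would be refused before the parent schema changes.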

        In the planning phase, Tajo will use only the parent schema, and then rewrite the projection plan for each partition if needed. If a partition does not have a field required by a query, that field will be NULL when the partition is processed. Because every partition goes through the same projection, processing multiple partitions with different schemas still outputs tuples with the same schema.
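        The per-partition projection rewrite described above might look roughly like the following sketch. It assumes rows are plain Object arrays and schemas are lists of field names; PartitionProjectionSketch, buildProjection, and project are hypothetical names, not Tajo's planner API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of projecting partition rows onto the parent schema.
public class PartitionProjectionSketch {

  /** For each parent field, returns its index in the partition schema, or -1 if absent. */
  public static int[] buildProjection(List<String> parentFields, List<String> partitionFields) {
    Map<String, Integer> positions = new HashMap<>();
    for (int i = 0; i < partitionFields.size(); i++) {
      positions.put(partitionFields.get(i), i);
    }
    int[] projection = new int[parentFields.size()];
    for (int i = 0; i < parentFields.size(); i++) {
      projection[i] = positions.getOrDefault(parentFields.get(i), -1);
    }
    return projection;
  }

  /** Rewrites one partition row into parent-schema order; missing fields become null. */
  public static Object[] project(Object[] partitionRow, int[] projection) {
    Object[] out = new Object[projection.length];
    for (int i = 0; i < projection.length; i++) {
      out[i] = projection[i] < 0 ? null : partitionRow[projection[i]];
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> parent = Arrays.asList("id", "name", "email");
    List<String> oldPartition = Arrays.asList("name", "id"); // written before 'email' existed
    int[] projection = buildProjection(parent, oldPartition);
    System.out.println(Arrays.toString(project(new Object[] {"N", 1}, projection))); // [1, N, null]
  }
}
```

        The projection array is computed once per partition, so the per-row cost is a simple index lookup, and every partition emits rows in the parent schema's order.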

        hyunsik Hyunsik Choi added a comment - - edited

        Excellent! Big +1 for the latest patch. I tested it on a local cluster, and it works perfectly. Thank you for your awesome contribution! I'll commit it tonight if there are no further comments.

        There is one very trivial suggestion. An instance of a FileScanner, including AvroScanner, can be created and then closed without FileScanner::init() ever being invoked. I'm sorry for not mentioning this in the javadoc. In any case, FileScanner::close() should check its member variables for null before using them.
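        A minimal sketch of the suggested null check, assuming a scanner whose underlying reader is only created in init(). NullSafeScannerSketch is a hypothetical class, not Tajo's FileScanner or AvroScanner; the reader is modeled as a plain java.io.Closeable, and wrapping IOException in UncheckedIOException is a choice made only to keep the sketch short.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical scanner skeleton; not Tajo's FileScanner/AvroScanner.
public class NullSafeScannerSketch implements Closeable {
  // Created only in init(); stays null if close() is called before init().
  private Closeable reader;

  public void init(Closeable reader) {
    this.reader = reader;
  }

  @Override
  public void close() {
    if (reader == null) {
      return; // init() was never called, or close() already ran
    }
    try {
      reader.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      reader = null; // makes a second close() a no-op
    }
  }

  public static void main(String[] args) {
    new NullSafeScannerSketch().close(); // no NullPointerException
    System.out.println("closed without init()");
  }
}
```

        Resetting the field to null in the finally block also makes repeated close() calls harmless.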

        As I mentioned, I tested the patch on a local cluster. First, I prepared the Avro schema as follows:

        {
          "type": "record",
          "namespace": "org.apache.tajo",
          "name": "table1",
          "fields": [
            { "name": "id", "type": "int" },
            { "name": "name", "type": "string" }
          ]
        }
        

        Then, I created one database and one table as follows:

        default> create database avro2;
        Ok
        
        default> \c avro2
        
        avro> create table avro2 (id int, name text) using avro with ('avro.schema.url' = 'file:///home/hyunsik/schema.avsc');
        Ok
        avro> \d avro2
        
        table name: avro.avro2
        table path: hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2
        store type: AVRO
        number of rows: 0
        volume: 0 B
        Options: 
          'avro.schema.url'='file:///home/hyunsik/schema.avsc'
        
        schema: 
        id  INT4
        name  TEXT
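        This table takes its Avro schema from the 'avro.schema.url' option rather than 'avro.schema.literal'. A minimal sketch of resolving either option to schema JSON follows. It assumes the literal takes precedence when both are set, and it only handles URL schemes the JDK can open (such as file://); an hdfs:// URL would need Hadoop's FileSystem API instead. AvroSchemaOptionSketch and resolveSchemaJson are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not Tajo's implementation.
public class AvroSchemaOptionSketch {
  static final String SCHEMA_LITERAL = "avro.schema.literal";
  static final String SCHEMA_URL = "avro.schema.url";

  /** Returns the schema JSON; the literal wins if both options are set (assumption). */
  public static String resolveSchemaJson(Map<String, String> options) {
    String literal = options.get(SCHEMA_LITERAL);
    if (literal != null) {
      return literal;
    }
    String url = options.get(SCHEMA_URL);
    if (url == null) {
      throw new IllegalArgumentException(
          "Either " + SCHEMA_LITERAL + " or " + SCHEMA_URL + " must be set");
    }
    // Only JDK-supported schemes (e.g. file://) work here.
    try (InputStream in = URI.create(url).toURL().openStream()) {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      byte[] chunk = new byte[4096];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buffer.write(chunk, 0, n);
      }
      return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    } catch (IOException e) {
      throw new UncheckedIOException("Cannot read Avro schema from " + url, e);
    }
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put(SCHEMA_LITERAL, "{\"type\": \"string\"}");
    System.out.println(resolveSchemaJson(options));
  }
}
```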
        

        Next, I inserted 6,001,215 rows into the Avro table via an INSERT OVERWRITE INTO statement as follows:

        avro> insert overwrite into avro2 (id, name) select l_orderkey::int4, l_returnflag from tpch.lineitem;
        Progress: 8%, response time: 0.397 sec
        Progress: 17%, response time: 1.2 sec
        Progress: 69%, response time: 2.202 sec
        Progress: 100%, response time: 2.909 sec
        final state: QUERY_SUCCEEDED, response time: 2.909 sec
        OK
        

        I checked the generated files.

        [hyunsik@local05 hadoop-2.3.0]$ bin/hadoop dfs -ls hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2
        DEPRECATED: Use of this script to execute hdfs command is deprecated.
        Instead use the hdfs command for it.
        
        Java HotSpot(TM) 64-Bit Server VM warning: You have loaded library /home/hyunsik/Code/hadoop-2.3.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
        It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
        14/04/16 14:43:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
        Found 23 items
        -rw-r--r--   3 hyunsik supergroup    1331444 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000000
        -rw-r--r--   3 hyunsik supergroup    1335487 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000001
        -rw-r--r--   3 hyunsik supergroup    1335522 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000002
        -rw-r--r--   3 hyunsik supergroup    1351444 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000003
        -rw-r--r--   3 hyunsik supergroup    1590096 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000004
        -rw-r--r--   3 hyunsik supergroup    1590222 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000005
        -rw-r--r--   3 hyunsik supergroup    1589538 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000006
        -rw-r--r--   3 hyunsik supergroup    1590408 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000007
        -rw-r--r--   3 hyunsik supergroup    1590168 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000008
        -rw-r--r--   3 hyunsik supergroup    1589226 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000009
        -rw-r--r--   3 hyunsik supergroup    1589688 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000010
        -rw-r--r--   3 hyunsik supergroup    1589790 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000011
        -rw-r--r--   3 hyunsik supergroup    1590048 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000012
        -rw-r--r--   3 hyunsik supergroup    1590204 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000013
        -rw-r--r--   3 hyunsik supergroup    1590234 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000014
        -rw-r--r--   3 hyunsik supergroup    1589562 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000015
        -rw-r--r--   3 hyunsik supergroup    1590276 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000016
        -rw-r--r--   3 hyunsik supergroup    1590720 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000017
        -rw-r--r--   3 hyunsik supergroup    1590198 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000018
        -rw-r--r--   3 hyunsik supergroup    1589508 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000019
        -rw-r--r--   3 hyunsik supergroup    1590042 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000020
        -rw-r--r--   3 hyunsik supergroup    1589814 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000021
        -rw-r--r--   3 hyunsik supergroup    1026861 2014-04-16 14:40 hdfs://127.0.0.1:8020/tajo/warehouse/avro/avro2/part-01-000022
        

        Then, I executed some simple queries and some distributed queries:

        avro> select id from avro2 limit 10;
        Progress: 100%, response time: 0.351 sec
        final state: QUERY_SUCCEEDED, response time: 0.351 sec
        result: 10 rows (80 B)
        id
        -------------------------------
        1860579
        1860579
        1860579
        1860580
        1860580
        1860580
        1860580
        1860580
        1860580
        1860581
        
        avro> select id, name from avro2 order by id asc limit 10;
        Progress: 8%, response time: 0.399 sec
        Progress: 41%, response time: 1.202 sec
        Progress: 100%, response time: 1.574 sec
        final state: QUERY_SUCCEEDED, response time: 1.574 sec
        result: 10 rows (40 B)
        id,  name
        -------------------------------
        1,  N
        1,  N
        1,  N
        1,  N
        1,  N
        1,  N
        2,  N
        3,  R
        3,  R
        3,  A
        avro> select id, name from avro2 order by id desc limit 10;
        Progress: 6%, response time: 0.401 sec
        Progress: 45%, response time: 1.203 sec
        Progress: 100%, response time: 1.551 sec
        final state: QUERY_SUCCEEDED, response time: 1.551 sec
        result: 10 rows (100 B)
        id,  name
        -------------------------------
        6000000,  N
        6000000,  N
        5999975,  R
        5999975,  A
        5999975,  A
        5999974,  R
        5999974,  R
        5999973,  N
        5999972,  N
        5999972,  N
        
        avro> select count(id), count(name) from avro2;
        Progress: 19%, response time: 0.401 sec
        Progress: 100%, response time: 0.776 sec
        final state: QUERY_SUCCEEDED, response time: 0.776 sec
        result: 1 rows (16 B)
        ?count,  ?count_1
        -------------------------------
        6001215,  6001215
        
        hyunsik Hyunsik Choi added a comment -

        Currently, Tajo does not support nested schemas. For this and a few other reasons, the schemas of Avro tables must be specified via the WITH clause. Later, we can improve how schemas are specified by converting Tajo schemas into Avro schemas. Full support will come once Tajo directly supports nested schemas and non-scalar data types.
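        For the later step mentioned here, converting a Tajo schema into an Avro schema, a minimal sketch for flat schemas is below. The Tajo-to-Avro type mapping is an assumption for illustration, only a few primitive types are covered, and TajoToAvroSchemaSketch is a hypothetical class. For the (id INT4, name TEXT) table used earlier in this thread, it produces the same record as the hand-written schema.avsc.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical converter for flat schemas; not part of Tajo.
public class TajoToAvroSchemaSketch {

  // Assumed mapping from Tajo scalar types to Avro primitive types.
  private static String toAvroType(String tajoType) {
    switch (tajoType) {
      case "BOOLEAN": return "boolean";
      case "INT4":    return "int";
      case "INT8":    return "long";
      case "FLOAT4":  return "float";
      case "FLOAT8":  return "double";
      case "TEXT":    return "string";
      case "BLOB":    return "bytes";
      default: throw new IllegalArgumentException("Unsupported Tajo type: " + tajoType);
    }
  }

  /** Builds Avro record JSON; names are not escaped since identifiers are assumed plain. */
  public static String toAvroSchemaJson(String namespace, String name,
                                        LinkedHashMap<String, String> fields) {
    StringBuilder sb = new StringBuilder();
    sb.append("{\"type\":\"record\",\"namespace\":\"").append(namespace)
      .append("\",\"name\":\"").append(name).append("\",\"fields\":[");
    boolean first = true;
    for (Map.Entry<String, String> field : fields.entrySet()) {
      if (!first) {
        sb.append(',');
      }
      first = false;
      sb.append("{\"name\":\"").append(field.getKey())
        .append("\",\"type\":\"").append(toAvroType(field.getValue())).append("\"}");
    }
    return sb.append("]}").toString();
  }

  public static void main(String[] args) {
    LinkedHashMap<String, String> fields = new LinkedHashMap<>();
    fields.put("id", "INT4");
    fields.put("name", "TEXT");
    System.out.println(toAvroSchemaJson("org.apache.tajo", "table1", fields));
  }
}
```

        A real converter would also need to decide how to represent NULLs, for example by emitting a ["null", type] union for each field, since a plain Avro primitive field cannot hold null.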

        jhkim Jinho Kim added a comment -

        +1
        Great!! David.

        hyunsik Hyunsik Choi added a comment -

        This patch can be applied to both 0.8 and 0.9 without any modification. So, I committed it to both 0.8 and 0.9. Thank you for your contribution!

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #175 (See https://builds.apache.org/job/Tajo-master-build/175/)
        TAJO-711: Add Avro storage support. (David Chen via hyunsik) (hyunsik: rev 8da52ede0afecdc9c3302748d322ee4929986d5a)

        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
        • tajo-catalog/tajo-catalog-common/src/main/proto/CatalogProtos.proto
        • tajo-storage/pom.xml
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestStorages.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/package-info.java
        • tajo-storage/src/test/resources/testVariousTypes.avsc
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestMergeScanner.java
        • CHANGES.txt
        • tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogUtil.java
        • pom.xml
        • tajo-storage/src/test/resources/storage-default.xml
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroUtil.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/StorageConstants.java
        • tajo-storage/src/test/java/org/apache/tajo/storage/v2/TestStorages.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroScanner.java
        • tajo-storage/src/main/resources/storage-default.xml
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-0.8.0-build #74 (See https://builds.apache.org/job/Tajo-0.8.0-build/74/)
        TAJO-711: Add Avro storage support. (David Chen via hyunsik) (hyunsik: rev 9c7badcd601ad1d6f056a676262c6702514771ef)

        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroScanner.java
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestMergeScanner.java
        • tajo-storage/src/main/resources/storage-default.xml
        • tajo-storage/src/test/resources/testVariousTypes.avsc
        • pom.xml
        • tajo-storage/pom.xml
        • tajo-storage/src/test/java/org/apache/tajo/storage/TestStorages.java
        • tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/CatalogUtil.java
        • tajo-storage/src/test/java/org/apache/tajo/storage/v2/TestStorages.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroAppender.java
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/AvroUtil.java
        • tajo-catalog/tajo-catalog-common/src/main/proto/CatalogProtos.proto
        • tajo-storage/src/test/resources/storage-default.xml
        • tajo-storage/src/main/java/org/apache/tajo/storage/avro/package-info.java
        • CHANGES.txt
        • tajo-storage/src/main/java/org/apache/tajo/storage/StorageConstants.java
        davidzchen David Chen added a comment - - edited

        Thanks for your detailed review and for committing this patch!

        I believe AvroScanner already checks whether dataFileReader is null, but AvroAppender makes no such check. It also seems possible for a FileAppender to be closed without init having been called, in which case close must still check whether the member variables it uses are null. If that is the case, I can open a separate ticket for this fix; it would be a trivial change.
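        The null check described above could look roughly like the following. This is a minimal, hypothetical sketch, not the actual AvroAppender code: the class name NullSafeAppender is invented for illustration, and a plain java.io.Closeable stands in for the Avro DataFileWriter that init() would open.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical sketch, not Tajo's AvroAppender: the Closeable field stands in
// for the Avro DataFileWriter that a real init() would open.
class NullSafeAppender implements Closeable {
  // Assigned only by init(); remains null if close() is called first.
  private Closeable writer;

  // Stand-in for init(): in a real appender this would open the output file.
  void init(Closeable writer) {
    this.writer = writer;
  }

  @Override
  public void close() throws IOException {
    // init() may never have run, so guard each member it would have set.
    if (writer != null) {
      writer.close();
      writer = null; // a repeated close() is now a no-op
    }
  }
}
```

        Clearing the field after closing also makes a second close() harmless, so the appender tolerates both a missing init() and a repeated close().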

        Also, thanks for walking me through your idea for schema evolution for tables, Hyunsik. I find it very interesting and would like to think about it some more before getting back to you.


          People

          • Assignee:
            davidzchen David Chen
            Reporter:
            davidzchen David Chen
          • Votes:
            0
            Watchers:
            3
