Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.0, 0.20.1, 0.21.0, 0.22.0
    • Fix Version/s: 0.21.0
    • Component/s: None
    • Labels:
      None

      Description

      It would be nice if TFile could be split by record sequence number. That way, columnar storage such as PIG-833 can align fields that belong to the same row but are stored in different columns.
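
      To illustrate the idea, here is a minimal sketch of how a known record count might be divided into contiguous record-number ranges, so that the same range can be read from every column file and the rows stay aligned. The class RecordSplits and its split method are hypothetical names used only for illustration and are not part of TFile; each resulting range could then be handed to a record-number-based scanner such as the createScannerByRecordNum() interface discussed in the comments.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): divides the record numbers
// [0, recordCount) into numSplits contiguous half-open ranges [begin, end).
final class RecordSplits {

    private RecordSplits() {
    }

    static List<long[]> split(long recordCount, int numSplits) {
        if (recordCount < 0 || numSplits <= 0) {
            throw new IllegalArgumentException(
                "recordCount must be >= 0 and numSplits must be > 0");
        }
        List<long[]> ranges = new ArrayList<>();
        long base = recordCount / numSplits;   // minimum records per split
        long extra = recordCount % numSplits;  // first 'extra' splits get one more
        long begin = 0;
        for (int i = 0; i < numSplits; i++) {
            long end = begin + base + (i < extra ? 1 : 0);
            ranges.add(new long[] {begin, end});
            begin = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Applying the same ranges to every column file keeps rows aligned.
        for (long[] r : split(10, 3)) {
            System.out.println("[" + r[0] + ", " + r[1] + ")");
        }
    }
}
```

      With 10 records and 3 splits, the ranges are [0, 4), [4, 7) and [7, 10). Because the ranges depend only on the record count, files that hold different columns of the same rows get identical boundaries, which is not guaranteed when splitting by byte offset.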

      1. hadoop-6218-trunk-20091012.patch
        18 kB
        Hong Tang
      2. hadoop-6218-20091012.patch
        18 kB
        Hong Tang
      3. HADOOP-6218-0.20.patch
        15 kB
        Raghu Angadi
      4. HADOOP-6218-0.20.patch
        14 kB
        Raghu Angadi
      5. hadoop-6218-20090827.patch
        12 kB
        Hong Tang

        Activity

        Hong Tang added a comment -

        Preliminary patch for review. Need to add more test cases.

        Raghu Angadi added a comment -

        Preliminary review:

        The patch looks good.

        The patch changes the 'createScanner' API names. For 0.20.2 and 0.21, can we keep the old methods as well (deprecated in 0.21)? I think this is required for 0.20.x at least.

        A couple of new public methods need Javadoc.

        We need to add a unit test.

        This is not an on-disk format change for TFile; it mainly exposes the record count that TFile already keeps. In that sense, I think it is safe for 0.20 and 0.21.

        Raghu Angadi added a comment -

        Patch for 0.20 is attached.

        It also includes the previous 'createScanner' methods for backward compatibility; they are marked deprecated.

        I will add a unit test for both the trunk and 0.20 patches later.

        Hong Tang added a comment -

        We probably need to do the following before marking the patch ready:

        • Add some unit tests.
        • Retain the old create(..) APIs but mark them deprecated.
        Hong Tang added a comment -

        Sorry, it looks like the @deprecated tag is already in the 0.20 patch.

        Raghu Angadi added a comment -

        The attached 0.20 patch includes a test for the createScannerByRecordNum() interface.

        Is there a test for Reader.getKeyNear()? The test for Reader.getRecordNumNear() should be along the same lines.

        Hong Tang added a comment -

        Added a few more tests.

        Hong Tang added a comment -

        Forgot to mention that the patch "hadoop-6218-20091012.patch" applies to Hadoop 0.20.

        Hong Tang added a comment -

        Patch "hadoop-6218-trunk-20091012.patch" is for hadoop-common trunk and 0.21.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421854/hadoop-6218-20091012.patch
        against trunk revision 823756.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 12 new or modified tests.

        -1 patch. The patch command could not apply the patch.

        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/80/console

        This message is automatically generated.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12421855/hadoop-6218-trunk-20091012.patch
        against trunk revision 823756.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 12 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/81/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/81/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/81/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/81/console

        This message is automatically generated.

        Hong Tang added a comment -

        I think the patch is ready to commit.

        Raghu Angadi added a comment -

        +1.

        We could remove the deprecated API in 0.21 and/or trunk.

        Devaraj Das added a comment -

        I just committed this to trunk. This patch cannot be committed to the 0.20/0.21 branches because it adds a new feature.
        Thanks, Hong and Raghu!

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #59 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/59/)
        Adds a feature where TFile can be split by Record Sequence number. Contributed by Hong Tang and Raghu Angadi.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #126 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/126/)
        Adds a feature where TFile can be split by Record Sequence number. Contributed by Hong Tang and Raghu Angadi.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #67 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/67/)
        Moving the commit comment to 0.20.2.

        Devaraj Das added a comment -

        Since the voting regarding committing the patches on 0.20/21 branches passed, I committed the respective patches there as well.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #92 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk-Commit/92/)
        Committing the core jars into mapred.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #135 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/135/)
        Moving the commit comment to 0.20.2.

        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #120 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Mapreduce-trunk/120/)
        Committing the core jars into mapred.

        Hudson added a comment -

        Integrated in Hdfs-Patch-h2.grid.sp2.yahoo.net #52 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h2.grid.sp2.yahoo.net/52/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #79 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk-Commit/79/)

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #120 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Hdfs-trunk/120/)

        Hudson added a comment -

        Integrated in Hdfs-Patch-h5.grid.sp2.yahoo.net #78 (See http://hudson.zones.apache.org/hudson/job/Hdfs-Patch-h5.grid.sp2.yahoo.net/78/)


          People

          • Assignee:
            Hong Tang
            Reporter:
            Hong Tang
          • Votes:
            0
            Watchers:
            6
