Hadoop Common / HADOOP-8597

FsShell's Text command should be able to read avro data files

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.3-alpha
    • Component/s: fs
    • Labels:

      Description

      Apache Avro's DataFiles are similar to SequenceFiles. Since they are becoming popular as a data format, it would be useful for fs -text to support reading them, as it already reads SequenceFiles. This should be easy, since Avro is already a dependency and provides the required classes.

      What remains to be discussed is the output we ought to emit. Avro DataFiles are not plain text, nor do they have the single key-value pair structure of SequenceFiles. They usually contain a set of fields defined as a record, and the usual text output, as available from avro-tools via http://avro.apache.org/docs/current/api/java/org/apache/avro/tool/DataFileReadTool.html, is proper JSON.

      I think we should emit JSON rather than a delimited form: Avro has many complex structures, and JSON is the easiest, lowest-effort way to display them (Avro supports dumping to JSON by itself).
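
One plausible way for a text-style command to decide how to decode a file is to look at its leading bytes. The sketch below is only an illustration of that idea and is not taken from the patch: the class `FormatSniffer` and its `Kind` enum are hypothetical names. It relies on two known file signatures: SequenceFiles start with the bytes `SEQ`, and Avro data files start with `Obj` followed by the byte 1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of Hadoop. */
class FormatSniffer {
    /** File kinds this sketch distinguishes; the names are illustrative. */
    enum Kind { SEQUENCE_FILE, AVRO_DATA_FILE, OTHER }

    /** Classifies a file from the bytes at its very beginning. */
    static Kind classify(byte[] head) {
        if (head.length >= 3 && head[0] == 'S' && head[1] == 'E' && head[2] == 'Q') {
            return Kind.SEQUENCE_FILE;
        }
        if (head.length >= 4 && head[0] == 'O' && head[1] == 'b'
                && head[2] == 'j' && head[3] == 1) {
            return Kind.AVRO_DATA_FILE;
        }
        return Kind.OTHER;
    }

    /** Reads up to four bytes from the stream and classifies them. */
    static Kind sniff(InputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break; // stream ended before four bytes were available
            }
            n += r;
        }
        return classify(Arrays.copyOf(head, n));
    }

    public static void main(String[] args) throws IOException {
        byte[] avroHeader = {'O', 'b', 'j', 1, 0, 0};
        // prints AVRO_DATA_FILE
        System.out.println(sniff(new ByteArrayInputStream(avroHeader)));
    }
}
```

A real command would also need to rewind or re-open the stream after sniffing so that the chosen decoder sees the file from its first byte.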

      1. HADOOP-8597.patch (4 kB, Ivan Vladimirov Ivanov)
      2. HADOOP-8597-2.patch (14 kB, Ivan Vladimirov Ivanov)
      3. HADOOP-8597.patch (14 kB, Doug Cutting)
      4. HADOOP-8597.patch (14 kB, Doug Cutting)

        Activity

        Transition | Time In Source Status | Execution Times | Last Executer | Last Execution Date
        Patch Available -> Open | 1h 14m | 1 | Doug Cutting | 10/Sep/12 22:36
        Open -> Patch Available | 58d 12h 47m | 2 | Doug Cutting | 10/Sep/12 22:39
        Patch Available -> Resolved | 23h 8m | 1 | Doug Cutting | 11/Sep/12 21:48
        Resolved -> Closed | 156d 16h 24m | 1 | Arun C Murthy | 15/Feb/13 13:12
        Arun C Murthy made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1194 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1194/)
        HADOOP-8597. Permit FsShell's text command to read Avro files. Contributed by Ivan Vladimirov. (Revision 1383607)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383607
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1163 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1163/)
        HADOOP-8597. Permit FsShell's text command to read Avro files. Contributed by Ivan Vladimirov. (Revision 1383607)

        Result = FAILURE
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383607
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2743 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2743/)
        HADOOP-8597. Permit FsShell's text command to read Avro files. Contributed by Ivan Vladimirov. (Revision 1383607)

        Result = FAILURE
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383607
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2719 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2719/)
        HADOOP-8597. Permit FsShell's text command to read Avro files. Contributed by Ivan Vladimirov. (Revision 1383607)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383607
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java
        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Resolved [ 5 ]
        Fix Version/s 2.0.3-alpha [ 12323273 ]
        Resolution Fixed [ 1 ]
        Doug Cutting added a comment -

        I just committed this. Thanks, Ivan!

        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2782 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2782/)
        HADOOP-8597. Permit FsShell's text command to read Avro files. Contributed by Ivan Vladimirov. (Revision 1383607)

        Result = SUCCESS
        cutting : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1383607
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/CHANGES.txt
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/Display.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/fs/shell/TestTextCommand.java
        Doug Cutting added a comment -

        Ivan, patches are normally against trunk. After they're committed to trunk they may be backported to a branch.

        http://wiki.apache.org/hadoop/HowToContribute

        This patch should probably be committed to trunk and to branch-2 with fix-version 2.0.3-alpha.

        Ivan Vladimirov Ivanov added a comment -

        Sorry for the inconvenience that applying my patch caused. Since I am new to the project, I was unsure against which version (or branch) to create the patch, so I chose "release-2.0.0-alpha". It seemed to most closely match the "Affects Version/s" field. In retrospect, the choice was probably a mistake. To avoid such problems in the future, I would like to ask: should patches be created against the first branch with a version number greater than or equal to that in the "Affects Version/s" field ("branch-2.0.1-alpha" in this case), or, if the version is new enough, directly against trunk?

        Thank you for taking the time to review my patch. I hope that it will be useful and would be very happy if it gets committed.

        Hadoop QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12544535/HADOOP-8597.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1429//testReport/
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1429//console

        This message is automatically generated.

        Doug Cutting made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Doug Cutting made changes -
        Attachment HADOOP-8597.patch [ 12544535 ]
        Doug Cutting added a comment -

        New version with AvroFileInputStream made static.

        Doug Cutting made changes -
        Status Patch Available [ 10002 ] Open [ 1 ]
        Doug Cutting added a comment -

        Jenkins says that org.apache.hadoop.fs.shell.Display$AvroFileInputStream should be static.
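
For context, Findbugs typically raises this kind of warning when a nested class never uses its enclosing instance. The sketch below illustrates the difference in general terms; `ShellDisplaySketch`, `InnerStream`, and `NestedStream` are hypothetical names, not the classes from the patch.

```java
import java.lang.reflect.Modifier;

/** Hypothetical outer class for illustration; not Hadoop's Display. */
class ShellDisplaySketch {
    /** Inner class: each instance is tied to a ShellDisplaySketch instance. */
    class InnerStream {
        String label() {
            return "inner";
        }
    }

    /** Static nested class: no enclosing instance is needed to create it. */
    static class NestedStream {
        String label() {
            return "nested";
        }
    }

    public static void main(String[] args) {
        // A static nested class can be created on its own.
        NestedStream nested = new NestedStream();
        // An inner class needs an instance of the outer class first.
        InnerStream inner = new ShellDisplaySketch().new InnerStream();

        System.out.println(nested.label() + " static="
                + Modifier.isStatic(NestedStream.class.getModifiers()));
        System.out.println(inner.label() + " static="
                + Modifier.isStatic(InnerStream.class.getModifiers()));
    }
}
```

Declaring the nested class static avoids tying every instance to an outer object that it never uses.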

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12544521/HADOOP-8597.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        -1 findbugs. The patch appears to introduce 1 new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in hadoop-common-project/hadoop-common.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/1427//testReport/
        Findbugs warnings: https://builds.apache.org/job/PreCommit-HADOOP-Build/1427//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-common.html
        Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/1427//console

        This message is automatically generated.

        Doug Cutting made changes -
        Status Open [ 1 ] Patch Available [ 10002 ]
        Assignee Ivan Vladimirov Ivanov [ ivani ]
        Doug Cutting added a comment -

        +1 Patch looks good to me. Let's see what Jenkins says.

        Doug Cutting made changes -
        Attachment HADOOP-8597.patch [ 12544521 ]
        Doug Cutting added a comment -

        Not sure why, but your patch file didn't apply cleanly for me. Here's the same patch, but a version that applies cleanly.

        Ivan Vladimirov Ivanov made changes -
        Attachment HADOOP-8597-2.patch [ 12544405 ]
        Ivan Vladimirov Ivanov added a comment -

        Done - a unit test is added.

        Doug Cutting added a comment -

        This looks like a useful addition. Can you please add a unit test for it?

        Ivan Vladimirov Ivanov made changes -
        Field Original Value New Value
        Attachment HADOOP-8597.patch [ 12543525 ]
        Ivan Vladimirov Ivanov added a comment -

        The proposed patch adds the logic to output the content of Avro data files in JSON format.

        The implementation does not use the DataFileReadTool class since, as it turned out, the org.apache.avro.tool package is not currently part of the project's dependencies. As a consequence, this allowed a more memory-efficient implementation that keeps only a constant number of Avro records in memory.
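
The constant-memory idea can be sketched without Avro at all. The example below is an assumption-laden illustration, not the code from the patch: `RecordLineInputStream` is a hypothetical name, and an `Iterator<String>` of already-rendered JSON lines stands in for an Avro file reader plus JSON encoder. The stream renders one record at a time into a small buffer and serves bytes from it, so memory use does not grow with the file size.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

/**
 * Hypothetical sketch: exposes a sequence of records as a byte stream,
 * holding the bytes of at most one record in memory at a time.
 */
class RecordLineInputStream extends InputStream {
    // Stand-in for an Avro reader; each element is one record rendered as JSON.
    private final Iterator<String> records;
    private byte[] buffer = new byte[0];
    private int pos = 0;

    RecordLineInputStream(Iterator<String> records) {
        this.records = records;
    }

    @Override
    public int read() {
        // Refill the buffer from the next record once the current one is used up.
        while (pos >= buffer.length) {
            if (!records.hasNext()) {
                return -1; // nothing buffered and no records left
            }
            buffer = (records.next() + "\n").getBytes(StandardCharsets.UTF_8);
            pos = 0;
        }
        return buffer[pos++] & 0xff;
    }

    public static void main(String[] args) {
        Iterator<String> json = Arrays.asList("{\"id\":1}", "{\"id\":2}").iterator();
        RecordLineInputStream in = new RecordLineInputStream(json);
        int b;
        while ((b = in.read()) != -1) {
            System.out.write(b);
        }
        System.out.flush();
    }
}
```

Because the result is an ordinary `InputStream`, a caller that already copies streams to standard output would not need to know that the bytes come from decoded records.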

        Harsh J created issue -

          People

          • Assignee: Ivan Vladimirov Ivanov
          • Reporter: Harsh J
          • Votes: 1
          • Watchers: 11
