Hadoop Common / HADOOP-8597

FsShell's Text command should be able to read avro data files


    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.3-alpha
    • Component/s: fs

      Description

      Apache Avro's DataFiles are similar to SequenceFiles. Since they are becoming popular as a data format, it would be useful if fs -text supported reading them, just as it reads SequenceFiles. This should be easy, since Avro is already a dependency and provides the required classes.
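
      For fs -text to treat Avro DataFiles the way it treats SequenceFiles, it first has to tell the formats apart. A minimal sketch of one way to do that, assuming detection by leading magic bytes: Avro data files start with the bytes 'O', 'b', 'j', 1 and SequenceFiles start with "SEQ". The class and method names here (FormatSniffer, sniff) are hypothetical and are not taken from the attached patches.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not the code from the attached patches.
class FormatSniffer {
    enum Format { AVRO_DATA_FILE, SEQUENCE_FILE, PLAIN_TEXT }

    // Avro object container files begin with 'O', 'b', 'j', followed by the byte 1.
    private static final byte[] AVRO_MAGIC = {'O', 'b', 'j', 1};
    // SequenceFiles begin with the ASCII bytes "SEQ" (a version byte follows).
    private static final byte[] SEQ_MAGIC = {'S', 'E', 'Q'};

    // Returns true if 'data' is at least as long as 'prefix' and starts with it.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Classifies a file from its first few bytes; anything unrecognized is
    // treated as plain text in this sketch.
    static Format sniff(byte[] head) {
        if (startsWith(head, AVRO_MAGIC)) {
            return Format.AVRO_DATA_FILE;
        }
        if (startsWith(head, SEQ_MAGIC)) {
            return Format.SEQUENCE_FILE;
        }
        return Format.PLAIN_TEXT;
    }

    public static void main(String[] args) {
        System.out.println(sniff(new byte[] {'O', 'b', 'j', 1, 0}));
        System.out.println(sniff(new byte[] {'S', 'E', 'Q', 6}));
        System.out.println(sniff("hello".getBytes(StandardCharsets.US_ASCII)));
    }
}
```

      Once a file is recognized as an Avro DataFile, the records could then be handed to Avro's own reader classes and printed one per line.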

      What remains open for discussion is the output we ought to emit. Avro DataFiles are not as simple as text, nor do they have the single key-value pair structure of SequenceFiles. They usually contain a set of fields defined as a record, and the usual text output, as produced by avro-tools via http://avro.apache.org/docs/current/api/java/org/apache/avro/tool/DataFileReadTool.html, is proper JSON.

      I think we should use JSON as the output format rather than a delimited form: Avro has many complex structures, and JSON is the easiest, least-effort way to display them (Avro supports JSON dumping by itself).

        Attachments

        1. HADOOP-8597.patch (4 kB, Ivan Vladimirov Ivanov)
        2. HADOOP-8597-2.patch (14 kB, Ivan Vladimirov Ivanov)
        3. HADOOP-8597.patch (14 kB, Doug Cutting)
        4. HADOOP-8597.patch (14 kB, Doug Cutting)

            People

            • Assignee: Ivan Vladimirov Ivanov (ivani)
            • Reporter: Harsh J (qwertymaniac)