  1. Hadoop Common
  2. HADOOP-8597

FsShell's Text command should be able to read avro data files

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0-alpha
    • Fix Version/s: 2.0.3-alpha
    • Component/s: fs

    Description

      Apache Avro's DataFiles are similar to SequenceFiles. Since they are becoming popular as a data format, it would be useful for fs -text to support reading them, just as it reads SequenceFiles. This should be easy, since Avro is already a dependency and provides the required classes.

      The open question is what output we ought to emit. Avro DataFiles are not plain text, nor do they have the single key-value pair structure of SequenceFiles. They usually contain a set of fields defined as a record, and the usual text output, as produced by avro-tools via http://avro.apache.org/docs/current/api/java/org/apache/avro/tool/DataFileReadTool.html, is proper JSON.

      I think we should use JSON as the output format rather than a delimited form: Avro has many complex structures, and JSON is the easiest way to display them and takes the least work (Avro supports JSON dumping by itself).
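
      To make the JSON proposal concrete, here is a minimal sketch of how a data file could be dumped as JSON with Avro's generic API. It is not the code in the attached patches: the class name AvroJsonDump and the method dump are hypothetical, and wiring it into FsShell (opening the file through the Hadoop FileSystem, detecting the Avro magic bytes) is left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Hypothetical helper, for illustration only.
public class AvroJsonDump {

    /** Reads an Avro data file from 'in' and writes each record to 'out' as JSON. */
    public static void dump(InputStream in, OutputStream out) throws IOException {
        // The generic reader takes the writer's schema from the file header,
        // so no generated classes are needed.
        GenericDatumReader<Object> reader = new GenericDatumReader<>();
        try (DataFileStream<Object> stream = new DataFileStream<>(in, reader)) {
            Schema schema = stream.getSchema();
            GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
            JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out);
            // DataFileStream is Iterable, so records can be streamed one at a time.
            for (Object datum : stream) {
                writer.write(datum, encoder);
            }
            encoder.flush();
        }
    }
}
```

      Because the schema travels in the file header, nested records, arrays, maps and unions are all rendered by Avro's own JSON encoder, which is the "least work" argument made above.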

      Attachments

        1. HADOOP-8597.patch
          14 kB
          Doug Cutting
        2. HADOOP-8597.patch
          14 kB
          Doug Cutting
        3. HADOOP-8597.patch
          4 kB
          Ivan Vladimirov Ivanov
        4. HADOOP-8597-2.patch
          14 kB
          Ivan Vladimirov Ivanov

          People

            Assignee: Ivan Vladimirov Ivanov (ivani)
            Reporter: Harsh J (qwertymaniac)
            Votes: 1
            Watchers: 9