  1. Tajo (Retired)
  2. TAJO-30

Parquet Integration

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • 0.8.0, 0.9.0
    • None

    Description

      Parquet (http://parquet.io/) is a columnar storage format developed by Twitter and Cloudera. This issue adds Parquet support to Tajo.

      The implementation consists of the following:

      • ParquetScanner and ParquetAppender - FileScanner and FileAppender implementations for reading and writing Parquet files.
      • TajoParquetReader and TajoParquetWriter - Top-level reader and writer that deserialize Parquet records into Tajo Tuples and serialize Tajo Tuples to Parquet.
      • TajoReadSupport and TajoWriteSupport - Abstractions that convert between Parquet and Tajo records.
      • TajoRecordMaterializer - Materializes Tajo Tuples from Parquet's internal representation.
      • TajoRecordConverter - Used by TajoRecordMaterializer to materialize a Tajo Tuple.
      • TajoSchemaConverter - Converts between Tajo and Parquet schemas.
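
      To illustrate the kind of work a schema converter such as TajoSchemaConverter does, here is a minimal, self-contained sketch. It is not the code from the attached patches. The class ParquetSchemaSketch, the TajoType enum, and the type mapping are all assumptions made for illustration, and the sketch does not use the Tajo or Parquet libraries. It only renders a list of column names and types as a Parquet message type definition.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a schema converter; not the code from the patch.
class ParquetSchemaSketch {

    // Small subset of Tajo column types, used here only for illustration.
    enum TajoType { BOOLEAN, INT2, INT4, INT8, FLOAT4, FLOAT8, TEXT }

    // Assumed mapping from a Tajo type to a Parquet primitive type name.
    static String toParquetPrimitive(TajoType type) {
        switch (type) {
            case BOOLEAN: return "boolean";
            case INT2:
            case INT4:    return "int32";   // Parquet has no 16-bit primitive
            case INT8:    return "int64";
            case FLOAT4:  return "float";
            case FLOAT8:  return "double";
            case TEXT:    return "binary";  // annotated as UTF8 below
            default:
                throw new IllegalArgumentException("Unsupported type: " + type);
        }
    }

    // Renders columns (name -> type, in insertion order) as a Parquet message type definition.
    static String toParquetMessage(String messageName, Map<String, TajoType> columns) {
        StringBuilder sb = new StringBuilder("message " + messageName + " {\n");
        for (Map.Entry<String, TajoType> col : columns.entrySet()) {
            sb.append("  optional ")        // every column is treated as nullable here
              .append(toParquetPrimitive(col.getValue()))
              .append(' ')
              .append(col.getKey());
            if (col.getValue() == TajoType.TEXT) {
                sb.append(" (UTF8)");
            }
            sb.append(";\n");
        }
        return sb.append("}\n").toString();
    }

    public static void main(String[] args) {
        Map<String, TajoType> columns = new LinkedHashMap<>();
        columns.put("id", TajoType.INT8);
        columns.put("name", TajoType.TEXT);
        columns.put("score", TajoType.FLOAT8);
        System.out.print(toParquetMessage("table_schema", columns));
    }
}
```

      Marking every column as optional is a simplification chosen so that null values can be represented. The actual converter may choose repetition levels and type annotations differently.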

      Attachments

        1. TAJO-30_20140326_21:04:36.patch
          74 kB
          David Chen
        2. TAJO-30_20140326_05:34:17.patch
          74 kB
          David Chen
        3. TAJO-30_20140326_05:06:57.patch
          74 kB
          David Chen
        4. null_handling.patch
          4 kB
          Hyunsik Choi
        5. TAJO-30.patch
          56 kB
          David Chen

            People

              davidzchen David Chen
              hyunsik Hyunsik Choi
              Votes: 0
              Watchers: 11