  1. Tajo
  2. TAJO-424

Make serializer/deserializer configurable in CSVFile

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: Storage
    • Labels:
      None

      Description

      The CSVFile serializer/deserializer is hard-coded to TextSerializeDeserialize in LazyTuple. It should be configurable.
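
      As a rough illustration of what "configurable" could look like, the sketch below picks a serializer/deserializer class by name from table properties and falls back to a text implementation. It is not the code from the attached patches. The property key csvfile.serde is taken from the review comments on this issue; the Serde interface, its method signatures, the two implementations, and the load helper are all hypothetical names invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class SerdeDemo {

    /** Hypothetical serde contract; Tajo's real interface has different methods. */
    public interface Serde {
        byte[] serialize(String value);
        String deserialize(byte[] bytes);
    }

    /** Default implementation: values are stored as UTF-8 text. */
    public static class TextSerde implements Serde {
        @Override public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        @Override public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Toy alternative that stores the text reversed, only to show the swap. */
    public static class ReversedTextSerde implements Serde {
        @Override public byte[] serialize(String value) {
            String reversed = new StringBuilder(value).reverse().toString();
            return reversed.getBytes(StandardCharsets.UTF_8);
        }
        @Override public String deserialize(byte[] bytes) {
            String stored = new String(bytes, StandardCharsets.UTF_8);
            return new StringBuilder(stored).reverse().toString();
        }
    }

    /** Instantiates the serde named in the table properties, defaulting to text. */
    public static Serde load(Properties tableMeta) {
        String className =
            tableMeta.getProperty("csvfile.serde", TextSerde.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(Serde.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load serde: " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties meta = new Properties();
        // No property set: the text serde is used.
        System.out.println(load(meta).getClass().getSimpleName());

        // Property set: the named class is loaded reflectively instead.
        meta.setProperty("csvfile.serde", ReversedTextSerde.class.getName());
        Serde serde = load(meta);
        System.out.println(serde.deserialize(serde.serialize("tajo")));
    }
}
```

      Resolving the class by name keeps the reader code independent of any particular implementation, which is the property the description asks for. A misconfigured class name fails at load time, before any rows are read.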

      1. TAJO-424.patch
        23 kB
        Jinho Kim
      2. TAJO-424_2.patch
        65 kB
        Jinho Kim

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-trunk-postcommit #626 (See https://builds.apache.org/job/Tajo-trunk-postcommit/626/)
        TAJO-424: Make serializer/deserializer configurable in CSVFile. (jinho) (jinossy: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=2c53ccc11ffc94bbbddfaa4fe1831b12405ff229)

        • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/TestCompressionStorages.java
        • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/TestLazyTuple.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/TextSerializerDeserializer.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/LazyTuple.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/SerializerDeserializer.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/CSVFile.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/CompressedSplitLineReader.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/SplitLineReader.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/TextSerializeDeserialize.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/SerializeDeserialize.java
        • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/TestStorages.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/BinarySerializeDeserialize.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/LineReader.java
        • CHANGES.txt
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/BinarySerializerDeserializer.java
        • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/v2/RCFileScanner.java
        jhkim Jinho Kim added a comment -

        Thanks jihoon for the review!
        I've just committed it.

        jihoonson Jihoon Son added a comment -

        +1.
        This patch looks good to me.

        jhkim Jinho Kim added a comment -

        Hyunsik,
        Thank you for the review.
        1) I've updated the second patch.
        2) You're right, we need to rename it. I will create a new JIRA issue.

        hyunsik Hyunsik Choi added a comment -

        +1

        Nice work. In addition, I would like to make two suggestions. (1) SerializeDeserialize is a verb, and I think it is the only verb among many member variables. How about changing the name to SerializeDeserialize*r*? (2) If we support custom (de)serializers, CSVFile is not a proper name anymore. We need to rename the format name and its properties, such as csvfile.delimiter and csvfile.serde. This is probably the right time to rename them because Tajo is still at a relatively early stage. It would be good to create a separate JIRA issue for it.

        Anyway, I agree with this patch.

        jhkim Jinho Kim added a comment -

        This patch additionally fixes a row count that was sometimes wrong.
        I referred to MAPREDUCE-5656.


          People

          • Assignee:
            jhkim Jinho Kim
            Reporter:
            jhkim Jinho Kim