Tajo / TAJO-200

RCFile compatible to apache hive

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.8.0
    • Component/s: storage
    • Labels:
      None

      Description

      • Support both text and binary serialization/deserialization.
        • default: org.apache.tajo.storage.BinarySerializeDeserialize
      • Use SequenceFile.Metadata to record the serde.
        • key: rcfile.serde
        • value: org.apache.tajo.storage.BinarySerializeDeserialize or org.apache.tajo.storage.TextSerializeDeserialize
      • Improve memory efficiency.
      • Support Tajo projection pushdown.
      • Support compression.
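
The serde selection described above can be sketched as a lookup on the file's metadata entries with a fallback to the binary default. This is only an illustration under stated assumptions: the class `RcFileSerdeResolver` and its `resolve` method are hypothetical names, a plain `Map` stands in for the SequenceFile metadata, and rejecting unknown class names is an assumed policy rather than something this issue specifies.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical helper, not Tajo's actual code: picks the serde class name
// from a file's metadata entries, falling back to the binary default.
final class RcFileSerdeResolver {
    static final String SERDE_KEY = "rcfile.serde";
    static final String BINARY_SERDE = "org.apache.tajo.storage.BinarySerializeDeserialize";
    static final String TEXT_SERDE = "org.apache.tajo.storage.TextSerializeDeserialize";

    static String resolve(Map<String, String> metadata) {
        String name = metadata.get(SERDE_KEY);
        // No entry (or an empty one) means the default binary serde.
        if (name == null || name.isEmpty()) {
            return BINARY_SERDE;
        }
        // Assumed policy: only the two serdes named in the description are accepted.
        if (!name.equals(BINARY_SERDE) && !name.equals(TEXT_SERDE)) {
            throw new IllegalArgumentException("Unsupported " + SERDE_KEY + ": " + name);
        }
        return name;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> text = Collections.singletonMap(SERDE_KEY, TEXT_SERDE);
        System.out.println(resolve(none));
        System.out.println(resolve(text));
    }
}
```

Keeping the choice in the file metadata means a reader can pick the matching serde per file without consulting the table definition.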

      Attachments

      1. TAJO-200_2.patch (174 kB, Jinho Kim)
      2. TAJO-200.patch (169 kB, Jinho Kim)

          Activity

          Jinho Kim added a comment -
          Text Serialize/Deserialize
          // Tajo
          CREATE TABLE tablename (col1 type, col2 type)
          USING RCFILE WITH ('rcfile.serde'='org.apache.tajo.storage.TextSerializeDeserialize')
          
          // Hive (0.11 and earlier)
          CREATE TABLE tablename (col1 type, col2 type)
          STORED AS RCFILE 
          
          
          Binary Serialize/Deserialize
          // Tajo
          CREATE TABLE tablename (col1 type, col2 type)
          USING RCFILE
          
          // Hive
          CREATE TABLE tablename (col1 type, col2 type)
          ROW FORMAT SERDE 
            'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' 
          STORED AS RCFILE 
          
          Lineitem example
          CREATE TABLE lineitem (L_ORDERKEY bigint, 
          L_PARTKEY bigint, 
          L_SUPPKEY bigint, 
          L_LINENUMBER bigint, 
          L_QUANTITY double, 
          L_EXTENDEDPRICE double, 
          L_DISCOUNT double, 
          L_TAX double, 
          L_RETURNFLAG text, 
          L_LINESTATUS text, 
          L_SHIPDATE text, 
          L_COMMITDATE text, 
          L_RECEIPTDATE text, 
          L_SHIPINSTRUCT text, 
          L_SHIPMODE text, 
          L_COMMENT text) 
          USING RCFILE WITH ('rcfile.serde'='org.apache.tajo.storage.TextSerializeDeserialize', 
          'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec',
          'rcfile.null'='\\N')
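
To illustrate what a null marker such as the 'rcfile.null' value above does in a text serde, here is a minimal sketch. It assumes the DDL literal '\\N' is unescaped to a backslash followed by N, which matches Hive's default null representation; the class `NullMarkerText` and its methods are hypothetical and are not Tajo's implementation.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a text serde null marker; not Tajo's implementation.
final class NullMarkerText {
    // Assumption: the marker is the two characters backslash and N.
    static final String NULL_MARKER = "\\N";

    // Writes a cell as UTF-8 text, substituting the marker for null.
    static byte[] serialize(String value) {
        String text = (value == null) ? NULL_MARKER : value;
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Reads a cell back, mapping the marker to null.
    static String deserialize(byte[] cell) {
        String text = new String(cell, StandardCharsets.UTF_8);
        return NULL_MARKER.equals(text) ? null : text;
    }

    public static void main(String[] args) {
        System.out.println(deserialize(serialize("AIR")));
        System.out.println(deserialize(serialize(null)));
    }
}
```

One consequence of this scheme is that a real text value equal to the marker cannot be told apart from null, so the marker should be a string that does not occur in the data.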
          
          Jinho Kim added a comment -

          I've attached the patch and verified it with 'mvn clean install'.

          Jinho Kim added a comment -

          I've uploaded a second patch that fixes bzip2 decompression.

          Hyunsik Choi added a comment -

          +1
          Great job!

          Jinho Kim added a comment -

          Thanks, Hyunsik, for the review!
          I've just committed it.

          Hudson added a comment -

          SUCCESS: Integrated in Tajo-trunk-postcommit #559 (See https://builds.apache.org/job/Tajo-trunk-postcommit/559/)
          TAJO-200: RCFile compatible to apache hive. (jinho) (jinossy: https://git-wip-us.apache.org/repos/asf?p=incubator-tajo.git&a=commit&h=effa7df63627940fe7b7e6c89591795a95cdbb3e)

          • tajo-common/src/main/java/org/apache/tajo/datum/BlobDatum.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/RCFile.java
          • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/rcfile/TestRCFile.java
          • tajo-core/tajo-core-storage/src/main/resources/storage-default.xml
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/SerializeDeserialize.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/BinarySerializeDeserialize.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/NonSyncByteArrayOutputStream.java
          • CHANGES.txt
          • tajo-common/src/main/java/org/apache/tajo/datum/Inet4Datum.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/NonSyncChunkedByteArrayOutputStream.java
          • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/TestStorages.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/RCFileWrapper.java
          • tajo-common/src/main/java/org/apache/tajo/datum/TextDatum.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/TextSerializeDeserialize.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/NonSyncByteArrayInputStream.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/NonSyncDataOutputBuffer.java
          • tajo-core/tajo-core-storage/src/test/java/org/apache/tajo/storage/TestCompressionStorages.java
          • tajo-core/tajo-core-storage/src/test/resources/storage-default.xml
          • tajo-common/src/main/java/org/apache/tajo/datum/DatumFactory.java
          • tajo-common/src/main/java/org/apache/tajo/util/Bytes.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/rcfile/NonSyncDataInputBuffer.java
          • tajo-core/tajo-core-storage/src/main/java/org/apache/tajo/storage/v2/RCFileScanner.java

            People

            • Assignee: Jinho Kim
            • Reporter: Jinho Kim
            • Votes: 0
            • Watchers: 2
