Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.11.0
    • Component/s: Documentation
    • Labels:
      None

      Activity

      githubbot ASF GitHub Bot added a comment -

      GitHub user eminency opened a pull request:

      https://github.com/apache/tajo/pull/764

      TAJO-1682: Write ORC document

      You can merge this pull request into a Git repository by running:

      $ git pull https://github.com/eminency/tajo TAJO-1682

      Alternatively you can review and apply these changes as the patch at:

      https://github.com/apache/tajo/pull/764.patch

      To close this pull request, make a commit to your master/trunk branch
      with (at least) the following in the commit message:

      This closes #764


      commit 962dd359c3959cda5c9513567cab94511e1b800a
      Author: Jongyoung Park <eminency@gmail.com>
      Date: 2015-07-17T03:29:58Z

      Initial ORC document

      commit a8766828bb0cc359c236cad84486e7aaefc6fe4f
      Author: Jongyoung Park <eminency@gmail.com>
      Date: 2015-07-17T03:30:56Z

      adjust title length

      commit 75e3ac4b03cc398fd3794f8e2f8cc3d8d1f26833
      Author: Jongyoung Park <eminency@gmail.com>
      Date: 2015-07-17T05:42:58Z

      file_formats.rst is modified for orc


      githubbot ASF GitHub Bot added a comment -

      Github user jihoonson commented on a diff in the pull request:

      https://github.com/apache/tajo/pull/764#discussion_r39709356

      — Diff: tajo-docs/src/main/sphinx/table_management/orc.rst —
      @@ -0,0 +1,48 @@
      +***
      +ORC
      +***
      +
      +*ORC(Optimized Row Columnar)* is a columnar storage format from Hive. ORC improves performance for reading,
      +writing, and processing data.
      +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
      +
      +==========================
      +How to Create a ORC Table?
      — End diff –

      This should be changed to ```create an ORC```.

      githubbot ASF GitHub Bot added a comment -

      Github user jihoonson commented on a diff in the pull request:

      https://github.com/apache/tajo/pull/764#discussion_r39709431

      — Diff: tajo-docs/src/main/sphinx/table_management/orc.rst —
      @@ -0,0 +1,48 @@
      +***
      +ORC
      +***
      +
      +*ORC(Optimized Row Columnar)* is a columnar storage format from Hive. ORC improves performance for reading,
      +writing, and processing data.
      +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
      +
      +==========================
      +How to Create a ORC Table?
      +==========================
      +
      +If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
      +
      +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
      +statement. Below is an example statement for creating a table using orc files.
      +
      +.. code-block:: sql
      +
      + CREATE TABLE table1 (
      + id int,
      + name text,
      + score float,
      + type text
      + ) USING orc;
      +
      +===================
      +Physical Properties
      +===================
      +
      +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
      +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
      +
      +Now, ORC file provides the following physical properties.
      +
      +* ``orc.max.merge.distance``: Reading property. When stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
      +* ``orc.stripe.size``: Writing property. It decides size of each stripe. Default is 64MB.
      +* ``orc.compression.kind``: Writing property. The compression algorithm used to compress data. It should be one of ``none``, ``snappy``, ``zlib``. Default is ``none``.
      +* ``orc.buffer.size``: Writing property. It decides size of writing buffer. Default is 256KB.
      +* ``orc.rowindex.stride``: Writing property. Define the default ORC index stride in number of rows. (Stride is the number of rows an index entry represents.) Default is 10000.
      +
      +======================================
      +Compatibility Issues with Apache Hive™
      +======================================
      +
      +At the moment, Tajo only supports flat relational tables.
      +As a result, Tajo's ORC storage type does not support nested schemas.
      +However, we are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
      — End diff –

      I think this sentence is redundant. It would be enough to say that Tajo currently supports only flat schemas.
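
      For reference, the ``WITH`` clause described in the quoted diff might be used as in the sketch below. The property keys are taken from the diff; the chosen values are illustrative assumptions, and the exact value format Tajo accepts for each key has not been checked here.

      ```sql
      -- Sketch only: property keys come from the quoted diff,
      -- values are illustrative assumptions.
      CREATE TABLE table1 (
        id int,
        name text,
        score float,
        type text
      ) USING orc WITH (
        'orc.compression.kind' = 'snappy',  -- one of none, snappy, zlib per the diff
        'orc.rowindex.stride' = '10000'     -- rows per index entry (the stated default)
      );
      ```

      Properties that are left out should fall back to the defaults listed in the diff.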

      githubbot ASF GitHub Bot added a comment -

      Github user jihoonson commented on a diff in the pull request:

      https://github.com/apache/tajo/pull/764#discussion_r39709603

      — Diff: tajo-docs/src/main/sphinx/table_management/orc.rst —
      @@ -0,0 +1,48 @@
      +***
      +ORC
      +***
      +
      +*ORC(Optimized Row Columnar)* is a columnar storage format from Hive. ORC improves performance for reading,
      +writing, and processing data.
      +For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
      +
      +==========================
      +How to Create a ORC Table?
      +==========================
      +
      +If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
      +
      +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
      +statement. Below is an example statement for creating a table using orc files.
      +
      +.. code-block:: sql
      +
      + CREATE TABLE table1 (
      + id int,
      + name text,
      + score float,
      + type text
      + ) USING orc;
      +
      +===================
      +Physical Properties
      +===================
      +
      +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
      +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
      +
      +Now, ORC file provides the following physical properties.
      +
      +* ``orc.max.merge.distance``: Reading property. When stripes are too closer and the distance is lower than this value, they are merged and read at once. Default is 1MB.
      — End diff –

      ```Reading property``` and ```Writing property``` look odd to me. The sentences would be clear enough without them.

      githubbot ASF GitHub Bot added a comment -

      Github user eminency commented on the pull request:

      https://github.com/apache/tajo/pull/764#issuecomment-140977064

      Hi, @jihoonson .

      I applied your suggestions.

      githubbot ASF GitHub Bot added a comment -

      Github user jihoonson commented on the pull request:

      https://github.com/apache/tajo/pull/764#issuecomment-140980974

      +1, thank you for the quick update.

      githubbot ASF GitHub Bot added a comment -

      Github user asfgit closed the pull request at:

      https://github.com/apache/tajo/pull/764

      hudson Hudson added a comment -

      FAILURE: Integrated in Tajo-master-CODEGEN-build #516 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/516/)
      TAJO-1682: Write ORC document. (jihoonson: rev b0c0a390e3e774c4004156ad0027cf8d3de4c876)

      • tajo-docs/src/main/sphinx/table_management/orc.rst
      • CHANGES
      • tajo-docs/src/main/sphinx/table_management/file_formats.rst
      jihoonson Jihoon Son added a comment -

      I'll change the issue type to 'TASK' because I've already committed it as TASK.

      jihoonson Jihoon Son added a comment -

      This patch is committed to master and 0.11.
      Thank you for your work!

      hudson Hudson added a comment -

      SUCCESS: Integrated in Tajo-master-build #874 (See https://builds.apache.org/job/Tajo-master-build/874/)
      TAJO-1682: Write ORC document. (jihoonson: rev b0c0a390e3e774c4004156ad0027cf8d3de4c876)

      • tajo-docs/src/main/sphinx/table_management/orc.rst
      • CHANGES
      • tajo-docs/src/main/sphinx/table_management/file_formats.rst
      hudson Hudson added a comment -

      SUCCESS: Integrated in Tajo-0.11.0-build #51 (See https://builds.apache.org/job/Tajo-0.11.0-build/51/)
      TAJO-1682: Write ORC document. (jihoonson: rev 7a603d592b281e8dffc6eccbd354343230c39dad)

      • tajo-docs/src/main/sphinx/table_management/file_formats.rst
      • CHANGES
      • tajo-docs/src/main/sphinx/table_management/orc.rst

        People

        • Assignee:
          eminency Jongyoung Park
          Reporter:
          eminency Jongyoung Park
        • Votes:
          0
          Watchers:
          4
