Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.12.0, 0.11.1
    • Component/s: Documentation
    • Labels:
      None

      Description

      Currently, the Tajo documentation doesn't provide enough information about partitioned tables, so we need to add more information to the following page:
      http://tajo.apache.org/docs/current/partitioning/column_partitioning.html

        Activity

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-0.11.1-build #154 (See https://builds.apache.org/job/Tajo-0.11.1-build/154/)
        TAJO-1740: Update Partition Table document. (jaehwa) (blrunner: rev 4c34c53c61701497b7eabc6e819a8f20d3526c5a)

        • tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
        • CHANGES
        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #1055 (See https://builds.apache.org/job/Tajo-master-build/1055/)
        TAJO-1740: Update Partition Table document. (blrunner: rev 6c87e8e9daba0a934c6dbba20b361788e42f2cda)

        • tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
        • CHANGES
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #657 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/657/)
        TAJO-1740: Update Partition Table document. (blrunner: rev 6c87e8e9daba0a934c6dbba20b361788e42f2cda)

        • CHANGES
        • tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst
        githubbot ASF GitHub Bot added a comment -

        Github user asfgit closed the pull request at:

        https://github.com/apache/tajo/pull/896

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172446722

        Thanks @eminency and @jinossy. I'll ship it soon.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172441743

        It looks good. +1, too.

        githubbot ASF GitHub Bot added a comment -

        Github user jinossy commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172440900

        +1 ship it!

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172433690

        Thanks @eminency. I addressed your comments and added some descriptions.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172430843

        Hi, I left some comments.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r49963051

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -44,9 +72,166 @@ during query planning phase.

        • LIKE predicates with a leading wild-card character
        • IN list predicates

        +.. code-block:: sql
        +
        + SELECT * FROM student WHERE country = 'KOREA' AND city = 'SEOUL';
        + SELECT * FROM student WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON');
        + SELECT * FROM student WHERE country = 'USA' AND city <> 'NEWYORK';
        +
        +
        +==================================================
        +Add data to Partition Table
        +==================================================
        +
        +Tajo provides a very useful feature of dynamic partitioning. You don't need to use any syntax with both ``INSERT INTO ... SELECT`` and ``Create Table As Select(CTAS)`` statments for dynamic partitioning. Tajo will automatically filter the data, create directories, move filtered data to appropriate directory and create partition over it.
        +
        +For example, assume there are both ``student_source`` and ``student`` tables composed of the following schema.
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student_source (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + phone TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +
        +How to INSERT dynamically to partition table
        +--------------------------------------------------------
        +
        +If you want to load an entire country or an entire city in one fell swoop:
        +
        +.. code-block:: sql
        +
        + INSERT OVERWRITE INTO student
        + SELECT id, name, gender, grade, phone, country, city
        + FROM student_source;
        +
        +
        +How to CTAS dynamically to partition table
        +--------------------------------------------------------
        +
        +when a partition table is created:
        +
        +.. code-block:: sql
        +
        + DROP TABLE if exists student;
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + phone TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT)
        + AS SELECT id, name, gender, grade, phone, country, city
        + FROM student_source;
        +
        +
        +.. note::
        +
        + When loading data into a partition, it’s necessary to include the partition columns as the last columns in the query. The column names in the source query don’t need to match the partition column names.
        +
        +
        ==================================================
        Compatibility Issues with Apache Hive™
        ==================================================

        If partitioned tables of Hive are created as external tables in Tajo, Tajo can process the Hive partitioned tables directly.
        -There haven't known compatibility issues yet.
        \ No newline at end of file
        +
        +
        +How to create partition table
        +--------------------------------------------------------
        +
        +If you create a partition table as follows in Tajo:
        +
        +.. code-block:: sql
        +
        + default> CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + phone TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +
        +And then you can get table information in Hive:
        +
        +.. code-block:: sql
        +
        + hive> desc student;
        + OK
        + id int
        + name string
        + gender char(1)
        + grade string
        + phone string
        + country string
        + city string
        +
        + # Partition Information
        + # col_name data_type comment
        +
        + country string
        + city string
        +
        +
        +Or as you create the table in Hive:
        +
        +.. code-block:: sql
        +
        + hive > CREATE TABLE student (
        + id int,
        + name string,
        + gender char(1),
        + grade string,
        + phone string
        + ) PARTITIONED BY (country string, city string)
        + ROW FORMAT DELIMITED
        + FIELDS TERMINATED BY '|' ;
        +
        +You will see table information in Tajo:
        +
        +.. code-block:: sql
        +
        + default> \d student;
        + table name: default.student
        + table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
        + store type: TEXT
        + number of rows: 0
        + volume: 0 B
        + Options:
        + 'text.null'='\\N'
        + 'transient_lastDdlTime'='1438756422'
        + 'text.delimiter'='|'
        +
        + schema:
        + id INT4
        + name TEXT
        + gender CHAR(1)
        + grade TEXT
        + phone TEXT
        +
        + Partitions:
        + type:COLUMN
        + columns::default.student.country (TEXT), default.student.city (TEXT)
        +
        +
        +How to add data to partition table
        +--------------------------------------------------------
        +
        +In Tajo, you can add data dynamically to partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statments. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system
        — End diff –

        A '.' is missing at the end of the statement.
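The diff above quotes the partition-pruning behavior: Tajo uses predicates on partition columns during the query planning phase, so only matching partitions are scanned. The small Python sketch below illustrates that idea. It is not Tajo's planner code. It assumes Hive-style `column=value` directory names and models each `WHERE` clause as a Python function over the partition values. The helpers `parse_partition` and `prune` and the partition paths are all hypothetical.

```python
from typing import Callable, Dict, List

# Made-up partition directories of the ``student`` table, assuming the
# Hive-style ``column=value`` layout.
PARTITIONS = [
    "country=KOREA/city=SEOUL",
    "country=KOREA/city=BUSAN",
    "country=USA/city=NEWYORK",
    "country=USA/city=BOSTON",
]


def parse_partition(path: str) -> Dict[str, str]:
    """Turn 'country=USA/city=BOSTON' into {'country': 'USA', 'city': 'BOSTON'}."""
    return dict(segment.split("=", 1) for segment in path.split("/"))


def prune(paths: List[str],
          predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep only the partitions whose key values satisfy the predicate."""
    return [p for p in paths if predicate(parse_partition(p))]


# WHERE country = 'KOREA' AND city = 'SEOUL'
q1 = prune(PARTITIONS, lambda v: v["country"] == "KOREA" and v["city"] == "SEOUL")
# WHERE country = 'USA' AND (city = 'NEWYORK' OR city = 'BOSTON')
q2 = prune(PARTITIONS, lambda v: v["country"] == "USA" and v["city"] in ("NEWYORK", "BOSTON"))
# WHERE country = 'USA' AND city <> 'NEWYORK'
q3 = prune(PARTITIONS, lambda v: v["country"] == "USA" and v["city"] != "NEWYORK")

print(q1)  # ['country=KOREA/city=SEOUL']
print(q2)  # ['country=USA/city=NEWYORK', 'country=USA/city=BOSTON']
print(q3)  # ['country=USA/city=BOSTON']
```

Only the directories returned by `prune` would need to be read. The other partitions are skipped without opening their files.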

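The diff also describes dynamic partitioning for `INSERT ... SELECT` and CTAS. Tajo filters the rows, creates directories, and moves each row into the matching directory, taking the partition columns from the last columns of the query. The Python sketch below illustrates that idea and is not Tajo's implementation. An in-memory dict stands in for the distributed file system, the `key=value` directory naming is assumed to be Hive-style, and `dynamic_partition` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def dynamic_partition(rows: Sequence[Row],
                      partition_keys: Sequence[str]) -> Dict[str, List[Row]]:
    """Group query output rows into Hive-style partition directories.

    The last len(partition_keys) values of each row are the partition
    values. They are matched to the keys by position only, so the column
    names used in the source query do not matter.
    """
    n = len(partition_keys)
    if n == 0:
        raise ValueError("at least one partition key is required")
    layout: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        data, part_values = row[:-n], row[-n:]
        # ('USA', 'BOSTON') -> 'country=USA/city=BOSTON'
        directory = "/".join(f"{k}={v}" for k, v in zip(partition_keys, part_values))
        # In this sketch only the non-partition columns are kept per row.
        layout[directory].append(data)
    return dict(layout)


# Made-up rows shaped like the output of
#   SELECT id, name, gender, grade, phone, country, city FROM student_source
rows = [
    (1, "Kim", "M", "A", "010-1111", "KOREA", "SEOUL"),
    (2, "Lee", "F", "B", "010-2222", "KOREA", "SEOUL"),
    (3, "Smith", "M", "A", "555-3333", "USA", "BOSTON"),
]

layout = dynamic_partition(rows, ["country", "city"])
for directory, data in layout.items():
    print(directory, data)
```

The sketch matches trailing values to keys by position rather than by name. This mirrors the note in the diff that the column names in the source query don't need to match the partition column names.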
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r49962664

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +If you want to make country as partitioned column, your Tajo definition would be this:

        .. code-block:: sql

        - CREATE TABLE orders (
        - id INT,
        - item_name TEXT,
        - price
        - ) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT);
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + city TEXT,
        + phone TEXT
        + ) PARTITION BY COLUMN (country TEXT);
        +
        +Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values.
        +Here's an example statement to create a table:
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + phone TEXT
        + ) USING PARQUET
        + PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +The statement above creates the student table with id, name, grade. The table is also partitioned and data is stored in parquet files.
        — End diff –

        'with id, name, grade' looks ambiguous. Did you miss something like 'etc.'?
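One way to make the column list unambiguous is to enumerate the data columns followed by the partition columns. The Hive compatibility section of the same patch shows Hive's `desc` printing the Tajo-created `student` table that way. The Python sketch below reproduces that listing and is only an illustration. The type mapping covers just the types in the example: `INT`/`INT4` to `int`, `TEXT` to `string`, and `CHAR(n)` to `char(n)`. The helpers `to_hive_type` and `hive_describe` are hypothetical, not Tajo or Hive APIs.

```python
import re
from typing import List, Tuple

Column = Tuple[str, str]  # (column name, Tajo type)


def to_hive_type(tajo_type: str) -> str:
    """Map the Tajo types used in this example to Hive type names (assumed subset)."""
    t = tajo_type.upper()
    if t in ("INT", "INT4"):
        return "int"
    if t == "TEXT":
        return "string"
    m = re.fullmatch(r"CHAR\((\d+)\)", t)
    if m:
        return f"char({m.group(1)})"
    raise ValueError(f"type not covered by this sketch: {tajo_type}")


def hive_describe(columns: List[Column], partitions: List[Column]) -> List[str]:
    """Render lines resembling Hive's `desc` output for a partitioned table."""
    # Hive lists the data columns first, then the partition columns.
    lines = [f"{n} {to_hive_type(t)}" for n, t in columns + partitions]
    lines += ["", "# Partition Information", "# col_name data_type comment", ""]
    lines += [f"{n} {to_hive_type(t)}" for n, t in partitions]
    return lines


student = [("id", "INT"), ("name", "TEXT"), ("gender", "char(1)"),
           ("grade", "TEXT"), ("phone", "TEXT")]
keys = [("country", "TEXT"), ("city", "TEXT")]
out = hive_describe(student, keys)
print("\n".join(out))
```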

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r49962595

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +If you want to make country as partitioned column, your Tajo definition would be this:

        .. code-block:: sql

        - CREATE TABLE orders (
        - id INT,
        - item_name TEXT,
        - price
        - ) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT);
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + city TEXT,
        + phone TEXT
        + ) PARTITION BY COLUMN (country TEXT);
        +
        +Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values.
        — End diff –

        1. Using 'can' instead of 'will' looks better.
        2. I think the meaning of "but the 2nd column will be the original values." is not so clear.
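Users can still filter with `WHERE country = '...'` even though `country` is not in the table's column list. One plausible explanation: in a Hive-style layout, the partition value lives in the directory name rather than in the data files, and it is attached to each row when the partition is read. The Python sketch below shows that idea under those assumptions. The path, the sample row, and the helpers `partition_values` and `read_partition` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def partition_values(path: str) -> Dict[str, str]:
    """Collect the 'key=value' segments of a partition directory path."""
    return dict(seg.split("=", 1) for seg in path.strip("/").split("/") if "=" in seg)


def read_partition(path: str, columns: Sequence[str],
                   rows: Sequence[Tuple[object, ...]]) -> List[Dict[str, object]]:
    """Rebuild full records by attaching the path's partition values to each row."""
    extra = partition_values(path)
    return [{**dict(zip(columns, row)), **extra} for row in rows]


# Hypothetical directory of a table created with PARTITION BY COLUMN (country TEXT);
# in this sketch its data files hold only id, name, gender, grade, city and phone.
path = "/warehouse/student/country=KOREA"
cols = ["id", "name", "gender", "grade", "city", "phone"]
rows = [(1, "Kim", "M", "A", "SEOUL", "010-1111")]

records = read_partition(path, cols, rows)
# The filter WHERE country = 'KOREA' can now be evaluated on the rebuilt records.
matches = [r for r in records if r["country"] == "KOREA"]
print(matches)
```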

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-172295051

        Hi guys, is there anything else needed to commit this patch?

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-161849906

        Thanks @dkhwangbo

        And @jihoonson, if possible, could you check this patch again?

        githubbot ASF GitHub Bot added a comment -

        Github user dkhwangbo commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-161630152

        +1

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r46517234

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +Now you want to partition on country. Your Tajo definition would be this:
        — End diff –

        Thanks @dkhwangbo. I addressed your comments.

        githubbot ASF GitHub Bot added a comment -

        Github user dkhwangbo commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r46516840

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +Now you want to partition on country. Your Tajo definition would be this:
        — End diff –

        looks better!

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r46516518

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +Now you want to partition on country. Your Tajo definition would be this:
        — End diff –

        Thanks @dkhwangbo.
        How about writing ```If you want to make country as partitioned column``` instead of your suggestion?

        githubbot ASF GitHub Bot added a comment -

        Github user dkhwangbo commented on the pull request:

        https://github.com/apache/tajo/pull/896#issuecomment-161527753

        I left a comment.

        githubbot ASF GitHub Bot added a comment -

        Github user dkhwangbo commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/896#discussion_r46514712

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,50 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );

        -Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys.
        -Then, you should create a table as follows:
        +Now you want to partition on country. Your Tajo definition would be this:
        — End diff –

        I think ```If you want to make partition on country,``` would be better.

        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/896

        TAJO-1740: Update Partition Table document

        Please see https://github.com/apache/tajo/pull/728 which includes history for TAJO-1740.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo TAJO-1740-2

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/896.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #896


        commit 3f983e61adfbb5c936e935fcc026e8ffcd1067bc
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-12-03T03:03:51Z

        TAJO-1740: Update Partition Table document


        githubbot ASF GitHub Bot added a comment -

        Github user blrunner closed the pull request at:

        https://github.com/apache/tajo/pull/728

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/728#issuecomment-161502087

        Thanks @jihoonson .
        It appears to be tangled. I'll create a new PR soon.

        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on the pull request:

        https://github.com/apache/tajo/pull/728#issuecomment-161279525

        Thanks, but your patch looks like it includes other patches. Would you fix it?

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/728#issuecomment-161227637

        Thanks @jihoonson

        I've addressed your comments in the following commit.
        https://github.com/blrunner/tajo/commit/37283a0164a9fba298d1bd64e9b7f38b74d2dc10

        tajoqa Tajo QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12754421/TAJO-1740.patch
        against master revision release-0.9.0-rc0-585-gce44234.

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/898//testReport/
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/898//console

        This message is automatically generated.

        githubbot ASF GitHub Bot added a comment -

        Github user jihoonson commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/728#discussion_r46116233

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,40 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume a table with the following schema.

        - id INT,
        - item_name TEXT,
        - price FLOAT
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + grade TEXT
        — End diff –

        Maybe you intend this table to have five columns: id, name, grade, country, and city. If so, how about showing all columns, as in the ```student_source``` example below? It would be much easier to understand.

        hyunsik Hyunsik Choi added a comment -

        I think this issue can be resolved independently of creating the 0.11.0 RC artifact, so I rescheduled it to 0.11.1.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/677#issuecomment-143170294

        I saw the new PR. Please ignore my last comment.
        https://github.com/apache/tajo/pull/728

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/677#issuecomment-143170169

        I recommend keeping a single page for column partitioning. Each section is too short, and after we add new partition types, it may be hard to see all the content at once.

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/728#issuecomment-142468734

        Thanks @hyunsik
        I've addressed your comments.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/728#discussion_r40145878

        — Diff: tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst —
        @@ -11,22 +11,40 @@ How to Create a Column Partitioned Table
        You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        the ``PARTITION BY COLUMN`` clause with partition keys.

        -For example, assume there is a table ``orders`` composed of the following schema. ::
        +For example, assume there is a table ``student`` composed of the following schema.
        — End diff –

        ``assume a table with the following schema`` would be better.

        tajoqa Tajo QA added a comment -

        +1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12754421/TAJO-1740.patch
        against master revision release-0.9.0-rc0-445-gfd4a3f8.

        +1 @author. The patch does not contain any @author tags.

        +0 tests included. The patch appears to be a documentation patch that doesn't require tests.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The applied patch does not increase the total number of javadoc warnings.

        +1 checkstyle. The patch generated 0 code style errors.

        +1 findbugs. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed unit tests in .

        Test results: https://builds.apache.org/job/PreCommit-TAJO-Build/856//testReport/
        Console output: https://builds.apache.org/job/PreCommit-TAJO-Build/856//console

        This message is automatically generated.

        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/728

        TAJO-1740: Update Partition Table document

        Currently, Tajo doesn't provide enough information about partitioned tables. Thus, we need to add more information to the following documentation.
        http://tajo.apache.org/docs/current/partitioning/column_partitioning.html

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo tajo-partition-document

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/728.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #728


        commit 888505d462d79e86c34d3ea8383e4207153f2731
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-09-04T08:47:17Z

        TAJO-1740: Update Partition Table document


        githubbot ASF GitHub Bot added a comment -

        Github user blrunner closed the pull request at:

        https://github.com/apache/tajo/pull/677

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/677#issuecomment-136245900

        Hi @eminency

        Thank you for your detailed review.
        Honestly, I've changed the existing content too much, so I'll make a new PR that keeps it. When making the new PR, I'll be sure to reference your feedback.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on the pull request:

        https://github.com/apache/tajo/pull/677#issuecomment-132450030

        Hi, @blrunner .

        I reviewed the documents you wrote and left some inline comments.

        Please check them out.

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37380422

        — Diff: tajo-docs/src/main/sphinx/partitioning/hive_compatibility.rst —
        @@ -0,0 +1,90 @@
        +********************************
        +Hive Compatibility
        +********************************
        +
        +Tajo provides HiveCatalogStore to process the Hive partitioned tables directly. If you wish to use HiveCatalogStore, you should specify hive configurations to both tajo-env.sh file and catalog-site.xml file. Please see the following page.
        +
        +.. toctree::
        + :maxdepth: 1
        +
        + /hive_integration
        +
        +================================================
        +How to create partition table
        +================================================
        +
        +If you want to create a partition table as follows in Tajo:
        +
        +.. code-block:: sql
        +
        + default> CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +
        +And then you can get table information in Hive:
        +
        +.. code-block:: sql
        +
        + hive> desc student;
        + OK
        + id int
        + name string
        + grade string
        + country string
        + city string
        +
        + # Partition Information
        + # col_name data_type comment
        +
        + country string
        + city string
        +
        +
        +Or as you create the table in Hive:
        +
        +.. code-block:: sql
        +
        + hive > CREATE TABLE student (
        + id int,
        + name string,
        + grade string
        + ) PARTITIONED BY (country string, city string)
        + ROW FORMAT DELIMITED
        + FIELDS TERMINATED BY '|' ;
        +
        +You will see table information in Tajo:
        +
        +.. code-block:: sql
        +
        + default> \d student;
        + table name: default.student
        + table uri: hdfs://your_hdfs_namespace/user/hive/warehouse/student
        + store type: TEXT
        + number of rows: 0
        + volume: 0 B
        + Options:
        + 'text.null'='\N'
        + 'transient_lastDdlTime'='1438756422'
        + 'text.delimiter'='|'
        +
        + schema:
        + id INT4
        + name TEXT
        + grade TEXT
        +
        + Partitions:
        + type:COLUMN
        + columns::default.student.country (TEXT), default.student.city (TEXT)
        +
        +
        +
        +================================================
        +How to add data to partition table
        +================================================
        +
        +In Tajo, you can add data dynamically to partition table of Hive with both ``INSERT INTO ... SELECT`` and ``Create Table As Select (CTAS)`` statments. Tajo will automatically filter the data to HiveMetastore, create directories and move filtered data to appropriate directory on the distributed file system
        — End diff –

        The '.' is missing at the end of the last sentence.
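The last paragraph of the diff above says that, on ``INSERT INTO ... SELECT`` or CTAS, Tajo creates the partition directories and moves each row into the appropriate one. The Python below is only a rough illustration of that routing step. It assumes Hive-style ``key=value`` directory names (the ``country=USA/city=NEWYORK`` form used in this PR's ``define_partition_table.rst``). ``partition_path`` and ``route_rows`` are hypothetical names and do not come from Tajo's code.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def partition_path(keys: Sequence[str], values: Sequence[str]) -> str:
    """Build a Hive-style directory name such as 'country=USA/city=NEWYORK'."""
    return "/".join(f"{k}={v}" for k, v in zip(keys, values))


def route_rows(rows: List[dict], data_cols: Sequence[str],
               partition_keys: Sequence[str]) -> Dict[str, List[Tuple]]:
    """Group rows by target partition directory (hypothetical helper, not a Tajo API)."""
    layout: Dict[str, List[Tuple]] = defaultdict(list)
    for row in rows:
        path = partition_path(partition_keys, [str(row[k]) for k in partition_keys])
        # In this sketch the partition values are kept only in the directory
        # name, so only the data columns are written to the file.
        layout[path].append(tuple(row[c] for c in data_cols))
    return dict(layout)


rows = [
    {"id": 1, "name": "kim", "grade": "A", "country": "KOREA", "city": "SEOUL"},
    {"id": 2, "name": "lee", "grade": "B", "country": "KOREA", "city": "BUSAN"},
    {"id": 3, "name": "smith", "grade": "A", "country": "USA", "city": "NEWYORK"},
    {"id": 4, "name": "park", "grade": "C", "country": "KOREA", "city": "SEOUL"},
]

layout = route_rows(rows, ["id", "name", "grade"], ["country", "city"])
for path in sorted(layout):
    print(path, layout[path])
```

Each distinct ``(country, city)`` pair becomes one directory. This mirrors the Hive ``desc`` output in the diff, where ``country`` and ``city`` are listed under "Partition Information" and not stored as ordinary data columns.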

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37380325

        — Diff: tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst —
        @@ -0,0 +1,73 @@
        +*********************************
        +Define Partition Table
        +*********************************
        +
        +Tajo makes it easy to specify an automatic partition scheme when the table is created.
        +
        +================================================
        +How to Create Partitione Table
        +================================================
        +
        +You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        +the ``PARTITION BY COLUMN`` clause with partition keys.
        +
        +For example, assume there is a table ``student`` composed of the following schema.
        +
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + grade TEXT
        +
        +Now you want to partition on country. Your Tajo definition would be this:
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) PARTITION BY COLUMN (country TEXT);
        +
        +Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values.
        +Here's an example statement to create a table:
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) USING PARQUET
        + PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +The statement above creates the student table with id, name, grade. The table is also partitioned and data is stored in parquet files.
        +
        +You might have noticed that while the partitioning key columns are a part of the table DDL, they’re only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data will be automatically split out into different folders, e.g. country=USA/city=NEWYORK. During a read operation, Tajo will use the folder structure to quickly locate the right partitions and also return the partitioning columns as columns in the result set.
        +
        +
        +==================================================
        +Partition Pruning on Partition Table
        +==================================================
        +
        +The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions without processing
        +during query planning phase.
        +
        +* ``=``
        +* ``<>``
        +* ``>``
        +* ``<``
        +* ``>=``
        +* ``<=``
        +* LIKE predicates with a leading wild-card character
        +* IN list predicates
        +
        +Now above example table data is partitioned by country and city, so when the query is applied on table it can easily access the required row by the help partitions
        — End diff –

        'the help of partitions' looks better
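The diff above describes two read-side behaviors. First, ``WHERE`` predicates on partition columns let unqualified partitions be skipped during query planning. Second, Tajo returns the partitioning columns in the result set. The following is a minimal Python sketch of both ideas, under the same Hive-style ``key=value`` directory assumption. The predicate is a plain Python function rather than parsed SQL, and ``parse_partition_path``, ``prune``, and ``with_partition_columns`` are hypothetical helpers, not Tajo's planner code.

```python
from typing import Callable, Dict, List


def parse_partition_path(path: str) -> Dict[str, str]:
    """Turn 'country=USA/city=NEWYORK' into {'country': 'USA', 'city': 'NEWYORK'}."""
    return dict(part.split("=", 1) for part in path.split("/"))


def prune(paths: List[str],
          predicate: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Keep only directories whose partition values satisfy the predicate.

    Only directory names are inspected, so pruned partitions are never read.
    """
    return [p for p in paths if predicate(parse_partition_path(p))]


def with_partition_columns(path: str, rows: List[tuple]) -> List[tuple]:
    """Append the partition values taken from the path to each stored row."""
    values = tuple(parse_partition_path(path).values())
    return [row + values for row in rows]


paths = [
    "country=KOREA/city=BUSAN",
    "country=KOREA/city=SEOUL",
    "country=USA/city=NEWYORK",
]

# Stands in for: WHERE country = 'KOREA' AND city IN ('SEOUL', 'DAEJEON')
selected = prune(paths, lambda v: v["country"] == "KOREA"
                 and v["city"] in ("SEOUL", "DAEJEON"))
print(selected)  # ['country=KOREA/city=SEOUL']
print(with_partition_columns(selected[0], [(1, "kim", "A")]))
# [(1, 'kim', 'A', 'KOREA', 'SEOUL')]
```

Only one of the three directories survives the predicate, so only its files would need to be scanned.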

        Show
        githubbot ASF GitHub Bot added a comment - Github user eminency commented on a diff in the pull request: https://github.com/apache/tajo/pull/677#discussion_r37380325 — Diff: tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst — @@ -0,0 +1,73 @@ +********************************* +Define Partition Table +********************************* + +Tajo makes it easy to specify an automatic partition scheme when the table is created. + +================================================ +How to Create Partitione Table +================================================ + +You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use +the ``PARTITION BY COLUMN`` clause with partition keys. + +For example, assume there is a table ``student`` composed of the following schema. + +.. code-block:: sql + + id INT, + name TEXT, + grade TEXT + +Now you want to partition on country. Your Tajo definition would be this: + +.. code-block:: sql + + CREATE TABLE student ( + id INT, + name TEXT, + grade TEXT + ) PARTITION BY COLUMN (country TEXT); + +Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values. +Here's an example statement to create a table: + +.. code-block:: sql + + CREATE TABLE student ( + id INT, + name TEXT, + grade TEXT + ) USING PARQUET + PARTITION BY COLUMN (country TEXT, city TEXT); + +The statement above creates the student table with id, name, grade. The table is also partitioned and data is stored in parquet files. + +You might have noticed that while the partitioning key columns are a part of the table DDL, they’re only listed in the ``PARTITION BY`` clause. In Tajo, as data is written to disk, each partition of data will be automatically split out into different folders, e.g. country=USA/city=NEWYORK. 
During a read operation, Tajo will use the folder structure to quickly locate the right partitions and also return the partitioning columns as columns in the result set. + + +================================================== +Partition Pruning on Partition Table +================================================== + +The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions without processing +during query planning phase. + +* ``=`` +* ``<>`` +* ``>`` +* ``<`` +* ``>=`` +* ``<=`` +* LIKE predicates with a leading wild-card character +* IN list predicates + +Now above example table data is partitioned by country and city, so when the query is applied on table it can easily access the required row by the help partitions — End diff – 'the help of partitions' looks better
        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37380187

        --- Diff: tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst ---
        @@ -0,0 +1,73 @@
        +*********************************
        +Define Partition Table
        +*********************************
        +
        +Tajo makes it easy to specify an automatic partition scheme when the table is created.
        +
        +================================================
        +How to Create Partitione Table
        +================================================
        +
        +You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use
        +the ``PARTITION BY COLUMN`` clause with partition keys.
        +
        +For example, assume there is a table ``student`` composed of the following schema.
        +
        +.. code-block:: sql
        +
        + id INT,
        + name TEXT,
        + grade TEXT
        +
        +Now you want to partition on country. Your Tajo definition would be this:
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) PARTITION BY COLUMN (country TEXT);
        +
        +Now your users will still query on ``WHERE country = '...'`` but the 2nd column will be the original values.
        +Here's an example statement to create a table:
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) USING PARQUET
        + PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +The statement above creates the student table with id, name, grade. The table is also partitioned and data is stored in parquet files.
        --- End diff --

        Is 'parquet files' right?
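        The diff above creates a table with the regular columns id, name and grade. The keys country and city appear only in PARTITION BY COLUMN. The same PR says that on read, Tajo returns the partitioning columns as columns in the result set. A hypothetical Python sketch of that read step is shown below. It assumes that data files under a partition directory hold only the regular columns and that the partition values come from the directory name. The function name attach_partition_values is made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

TABLE_COLUMNS = ("id", "name", "grade")  # from the CREATE TABLE column list
PARTITION_KEYS = ("country", "city")     # listed only in PARTITION BY COLUMN (...)


def attach_partition_values(
    partition_dir: str, file_rows: Sequence[Tuple[object, ...]]
) -> List[Dict[str, object]]:
    """Rebuild full result rows from a partition directory name plus its file rows.

    Hypothetical sketch: assumes files store only TABLE_COLUMNS and the directory
    is named like 'country=USA/city=NEWYORK'.
    """
    values = dict(segment.split("=", 1) for segment in partition_dir.strip("/").split("/"))
    result: List[Dict[str, object]] = []
    for row in file_rows:
        record: Dict[str, object] = dict(zip(TABLE_COLUMNS, row))
        for key in PARTITION_KEYS:
            record[key] = values[key]  # taken from the path, not from the file
        result.append(record)
    return result


if __name__ == "__main__":
    print(attach_partition_values("country=USA/city=NEWYORK", [(1, "Kim", "A")]))
    # prints [{'id': 1, 'name': 'Kim', 'grade': 'A', 'country': 'USA', 'city': 'NEWYORK'}]
```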

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37380110

        --- Diff: tajo-docs/src/main/sphinx/partitioning/define_partition_table.rst ---
        @@ -0,0 +1,73 @@
        +*********************************
        +Define Partition Table
        +*********************************
        +
        +Tajo makes it easy to specify an automatic partition scheme when the table is created.
        +
        +================================================
        +How to Create Partitione Table
        +================================================
        --- End diff --

        'Partitione' -> 'Partition'

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37380089

        --- Diff: tajo-docs/src/main/sphinx/partitioning/alter_partition.rst ---
        @@ -0,0 +1,66 @@
        +********************************
        +Alter partition
        +********************************
        +
        +You can ALTER TABLE to add or drop partitions for partition table.
        +
        +For example, assume there is a ``student`` table composed of the following schema.
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +
        +========================
        +ADD PARTITION
        +========================
        +
        +Synopsis
        +
        +.. code-block:: sql
        +
        + ALTER TABLE <table_name> [IF NOT EXISTS] ADD PARTITION (<partition column> = <partition value>, ...) [LOCATION = <partition's path>]
        +
        +Description
        +
        +You can use ``ALTER TABLE ADD PARTITION`` to ADD PARTITIONs to a table. The location must be a directory inside of which data files reside. If the location doesn't exist on the file system, Tajo will make the location by force. ``ADD PARTITION`` changes the table metadata, but does not load data. If the data does not exist in the partition's location, queries will not return any results. An error is thrown if the partition for the table already exists. You can use ``IF NOT EXISTS`` to skip the error.
        +
        +Examples
        +
        +.. code-block:: sql
        +
        + -- Each ADD PARTITION clause creates a subdirectory in HDFS.
        + ALTER TABLE student ADD PARTITION (country='KOREA', city='SEOUL');
        + ALTER TABLE student ADD PARTITION (country='KOREA', city='PUSAN');
        + ALTER TABLE student ADD PARTITION (country='USA', city='NEWYORK');
        + ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON');
        + -- Redirect queries, INSERT, and LOAD DATA for one partition to a specific different directory.
        + ALTER TABLE student ADD PARTITION (country='USA', city='BOSTON') LOCATION '/usr/external_data/new_years_day';
        +
        +
        +========================
        + DROP PARTITION
        +========================
        +
        +Synopsis
        +
        +.. code-block:: sql
        +
        + ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE]
        +
        +Description
        +
        +You can use ``ALTER TABLE DROP PARTITION`` to drop a partition for a table. This doesn't remove the data for partition table. But if ``PURGE`` is specified, the partition data will be removed. The metadata is completely lost in all cases. An error is thrown if the partition for the table doesn't exists. You can use ``IF EXISTS`` to skip the error.
        +
        +Examples
        +
        +.. code-block:: sql
        +
        + -- Delete just metadata
        + ALTER TABLE table1 DROP PARTITION (country = 'KOREA' , city = 'SEOUL');
        + -- Delete table data and metadata
        + ALTER TABLE table1 DROP PARTITION (country = 'USA', city = 'NEWYORK' ) PURGE
        --- End diff --

        ';' is missing.
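        The ADD PARTITION and DROP PARTITION rules in this diff are:
          • A missing location is created.
          • No data is loaded.
          • Metadata is always removed on drop, but data is removed only with PURGE.
          • IF NOT EXISTS / IF EXISTS skip the error.
        These rules can be summarized as a toy state model. This is a hypothetical Python sketch, not Tajo's catalog code. The class name, the base directory, and the in-memory set standing in for HDFS are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

# A partition spec such as (("country", "KOREA"), ("city", "SEOUL")).
PartitionSpec = Tuple[Tuple[str, str], ...]


@dataclass
class ToyPartitionedTable:
    base_dir: str
    catalog: Dict[PartitionSpec, str] = field(default_factory=dict)  # metadata: spec -> location
    file_system: Set[str] = field(default_factory=set)  # directories that exist on "disk"

    def default_location(self, spec: PartitionSpec) -> str:
        return self.base_dir + "/" + "/".join(f"{k}={v}" for k, v in spec)

    def add_partition(self, spec: PartitionSpec, location: Optional[str] = None,
                      if_not_exists: bool = False) -> None:
        if spec in self.catalog:
            if if_not_exists:
                return  # IF NOT EXISTS: skip the error
            raise ValueError(f"partition {spec} already exists")
        path = location or self.default_location(spec)
        self.file_system.add(path)  # location is created if missing; no data is loaded
        self.catalog[spec] = path   # only the table metadata changes

    def drop_partition(self, spec: PartitionSpec, purge: bool = False,
                       if_exists: bool = False) -> None:
        if spec not in self.catalog:
            if if_exists:
                return  # IF EXISTS: skip the error
            raise ValueError(f"partition {spec} does not exist")
        path = self.catalog.pop(spec)  # metadata is removed in all cases
        if purge:
            self.file_system.discard(path)  # data is removed only with PURGE


if __name__ == "__main__":
    table = ToyPartitionedTable("/hypothetical/warehouse/student")
    seoul = (("country", "KOREA"), ("city", "SEOUL"))
    table.add_partition(seoul)
    table.add_partition(seoul, if_not_exists=True)  # no error the second time
    table.drop_partition(seoul)                      # metadata gone, directory kept
    print(sorted(table.file_system))
    # prints ['/hypothetical/warehouse/student/country=KOREA/city=SEOUL']
```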

        githubbot ASF GitHub Bot added a comment -

        Github user eminency commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/677#discussion_r37379980

        --- Diff: tajo-docs/src/main/sphinx/partitioning/add_data_to_partition_table.rst ---
        @@ -0,0 +1,62 @@
        +********************************
        +Add data to Partition Table
        +********************************
        +
        +Tajo provides a very useful feature of dynamic partitioning. You don't need to use any syntax with both ``INSERT INTO ... SELECT`` and ``Create Table As Select(CTAS)`` statments for dynamic partitioning. Tajo will automatically filter the data, create directories, move filtered data to appropriate directory and create partition over it.
        +
        +For example, assume there are both ``student_source`` and ``student`` tables composed of the following schema.
        +
        +.. code-block:: sql
        +
        + CREATE TABLE student_source (
        + id INT,
        + name TEXT,
        + gender char(1),
        + grade TEXT,
        + country TEXT,
        + city TEXT,
        + phone TEXT
        + );
        +
        + CREATE TABLE student (
        + id INT,
        + name TEXT,
        + grade TEXT
        + ) PARTITION BY COLUMN (country TEXT, city TEXT);
        +
        +
        +================================================
        +How to INSERT dynamically to partition table
        +================================================
        +
        +If you want to load an entire country or an entire city in one fell swoop:
        +
        +.. code-block:: sql
        +
        + INSERT OVERWRITE INTO student
        + SELECT id, name, grade, country, city
        + FROM student_source;
        +
        +
        +================================================
        +How to CTAS dynamically to partition table
        +================================================
        +
        +If you want to load an entire country or an entire city in one fell swoop:
        --- End diff --

        The explanation is exactly the same as the one above.
        I think it would be better to add some words like 'including CREATE TABLE' or 'when a partition table is created'.
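        The diff describes dynamic partitioning: Tajo filters the data, creates directories and moves the filtered data into them. For the quoted INSERT OVERWRITE INTO student SELECT id, name, grade, country, city FROM student_source, this can be modeled roughly as routing each output row by its partition values. The hypothetical Python sketch below assumes that the partition columns are matched positionally as the trailing columns of the SELECT list, in PARTITION BY COLUMN order, as in the quoted example. The function name route_rows is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[object, ...]


def route_rows(rows: Sequence[Row], partition_keys: Sequence[str]) -> Dict[str, List[Row]]:
    """Group SELECT output rows by their trailing partition values.

    Returns {partition directory: rows to write there}. Each stored row keeps
    only the non-partition columns.
    """
    n = len(partition_keys)
    if n == 0:
        raise ValueError("a partitioned table needs at least one partition key")
    partitions: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        data, key_values = row[:-n], row[-n:]
        directory = "/".join(f"{k}={v}" for k, v in zip(partition_keys, key_values))
        partitions[directory].append(data)  # a new directory appears on first use
    return dict(partitions)


if __name__ == "__main__":
    # Rows as produced by: SELECT id, name, grade, country, city FROM student_source
    selected = [
        (1, "Kim", "A", "KOREA", "SEOUL"),
        (2, "Lee", "B", "KOREA", "PUSAN"),
        (3, "Park", "A", "KOREA", "SEOUL"),
    ]
    for directory, data in route_rows(selected, ("country", "city")).items():
        print(directory, data)
```

The CTAS case would differ mainly in that the target table is created by the same statement before the rows are routed.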

        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/677

        TAJO-1740: Update Partition Table document

        Currently, Tajo doesn't provide enough information about partition tables. Thus, we need to add more information to the following documentation.

        http://tajo.apache.org/docs/current/partitioning/column_partitioning.html

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo TAJO-1740

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/677.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #677


        commit 56f5e805ea3f1dba9924901081c932172c4a007f
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2015-08-05T08:22:37Z

        TAJO-1740: Update Partition Table document



          People

            • Assignee: blrunner Jaehwa Jung
            • Reporter: blrunner Jaehwa Jung
            • Votes: 0
            • Watchers: 3