Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.10.0
    • Component/s: Catalog
    • Labels:
      None

      Description

      Currently, Tajo supports column partitioning for partitioned tables, but HCatalogStore has a gap. Tajo doesn't store the list of partitions for a partitioned table in its CatalogStore, whereas Hive stores the list of partitions in the Hive metastore. So, if you write a partitioned table with Tajo, you must then run the msck repair command or add each partition on Hive. I think this is very inefficient. Thus, Tajo needs to repair the partition directory list after writing a partitioned table through HCatalogStore, as follows:

      INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem;
      ALTER TABLE t1 ADD PARTITIONS;
      

      For reference, "ADD PARTITIONS" would run only on HCatalogStore. It would not run on any other CatalogStore.
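
      To illustrate the manual Hive-side step described above, here is a minimal sketch in Python (illustrative only; it is not part of Tajo or Hive). It derives a partition spec from a Hive-style key=value directory path and emits the matching HiveQL statement. The helper names, the /warehouse/t1 path, and the choice to quote every partition value as a string are assumptions made for this example. Values containing quotes or URL-escaped characters are not handled.

```python
from pathlib import PurePosixPath


def partition_spec(table_root, partition_dir):
    """Parse Hive-style key=value path segments found below the table root."""
    relative = PurePosixPath(partition_dir).relative_to(PurePosixPath(table_root))
    spec = {}
    for segment in relative.parts:
        key, sep, value = segment.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value partition directory: {segment!r}")
        spec[key] = value
    if not spec:
        raise ValueError("partition_dir must be below table_root")
    return spec


def add_partition_statement(table, table_root, partition_dir):
    """Build one ALTER TABLE ... ADD PARTITION statement for one directory."""
    spec = partition_spec(table_root, partition_dir)
    # Assumption: every value is emitted as a quoted string literal.
    cols = ", ".join(f"{key}='{value}'" for key, value in spec.items())
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({cols}) LOCATION '{partition_dir}';"
    )


def repair_statement(table):
    """Build the single statement that asks Hive to discover partitions itself."""
    return f"MSCK REPAIR TABLE {table};"


if __name__ == "__main__":
    # Hypothetical warehouse path, used only for illustration.
    print(add_partition_statement("t1", "/warehouse/t1", "/warehouse/t1/l_orderkey=1"))
    print(repair_statement("t1"))
```

      One ADD PARTITION statement is needed per partition directory, whereas MSCK REPAIR TABLE is a single statement per table.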

        Activity

        githubbot ASF GitHub Bot added a comment -

        GitHub user blrunner opened a pull request:

        https://github.com/apache/tajo/pull/263

        TAJO-1053: ADD PARTITIONS for HCatalogStore.

        I tried to resolve this issue. Unfortunately, Tajo currently doesn't provide an ALTER PARTITION command. In addition, Hive doesn't provide an API for repairing all partitions at once. So the only way is to scan all the partition directories written by Tajo and call the ALTER PARTITION API for each partition. But that is very inefficient and would degrade HiveMetaStore performance. Thus, we need to guide our users to run the msck command on Hive.

        You can merge this pull request into a Git repository by running:

        $ git pull https://github.com/blrunner/tajo TAJO-1053

        Alternatively you can review and apply these changes as the patch at:

        https://github.com/apache/tajo/pull/263.patch

        To close this pull request, make a commit to your master/trunk branch
        with (at least) the following in the commit message:

        This closes #263


        commit e90a003c1d86ef4bbe484906cd4f329c84d6bdb5
        Author: JaeHwa Jung <blrunner@apache.org>
        Date: 2014-11-20T08:15:17Z

        TAJO-1053: ADD PARTITIONS for HCatalogStore.


        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on a diff in the pull request:

        https://github.com/apache/tajo/pull/263#discussion_r20838803

        --- Diff: tajo-docs/src/main/sphinx/hcatalog_integration.rst ---
        @@ -37,3 +37,16 @@ Finally, you should specify HCatalogStore as Tajo catalog driver class in ``conf
        <name>tajo.catalog.store.class</name>
        <value>org.apache.tajo.catalog.store.HCatalogStore</value>
        </property>
        +
        +.. note::
        +
        + Hive stores a list of partitions for each table in its metastore. If new partitions are
        + directly added to HDFS, HiveMetastore will not able aware of these partitions unless the user
        + ``ALTER TABLE table_name ADD PARTITION`` commands on each of the newly added partitions or
        + ``MSCK REPAIR TABLE table_name`` command.
        +
        + But current tajo doesn't provide ``ADD PARTITION`` command and hive doesn't provide a api for
        --- End diff --

        ```a api``` should be an api.

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/263#issuecomment-64300228

        Thanks @hyunsik .
        I've just updated it.

        githubbot ASF GitHub Bot added a comment -

        Github user hyunsik commented on the pull request:

        https://github.com/apache/tajo/pull/263#issuecomment-64316480

        +1 ship it!

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner closed the pull request at:

        https://github.com/apache/tajo/pull/263

        githubbot ASF GitHub Bot added a comment -

        Github user blrunner commented on the pull request:

        https://github.com/apache/tajo/pull/263#issuecomment-64318304

        @hyunsik, thanks for the quick review.

        hudson Hudson added a comment -

        SUCCESS: Integrated in Tajo-master-build #465 (See https://builds.apache.org/job/Tajo-master-build/465/)
        TAJO-1053: ADD PARTITIONS for HCatalogStore. (jaehwa) (blrunner: rev 3ae44b1d2a1cf49123eb1d1c30f081a8a8d0e7fb)

        • tajo-docs/src/main/sphinx/hcatalog_integration.rst
        • CHANGES
        hudson Hudson added a comment -

        FAILURE: Integrated in Tajo-master-CODEGEN-build #107 (See https://builds.apache.org/job/Tajo-master-CODEGEN-build/107/)
        TAJO-1053: ADD PARTITIONS for HCatalogStore. (jaehwa) (blrunner: rev 3ae44b1d2a1cf49123eb1d1c30f081a8a8d0e7fb)

        • tajo-docs/src/main/sphinx/hcatalog_integration.rst
        • CHANGES

          People

          • Assignee:
            blrunner Jaehwa Jung
            Reporter:
            blrunner Jaehwa Jung
          • Votes:
            0
            Watchers:
            3
