Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Won't Fix
    • Affects Version/s: 0.8.0
    • Fix Version/s: 0.8.0
    • Component/s: Catalog
    • Labels:
      None

      Description

      I designed Tajo to use the Hive metastore through HCatalog.
      For this, Tajo needs an interface for connecting to HCatalog.
      I think this interface would also be useful for connecting to other catalogs, such as:
      other Hive catalogs and HBase catalogs.

      So I named this interface CatalogDriver. It will have the following properties:

      • catalog namespace name
      • catalog URI
      • catalog driver class
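
      The three properties above could be sketched as a small Java type. This is only an illustration of the proposal: the method names, the `SimpleCatalogDriver` class, and the example values are assumptions, and no such interface is confirmed to exist in Tajo.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical shape of the proposed interface; names are assumptions.
interface CatalogDriver {
    String getNamespace();       // catalog namespace name
    URI getUri();                // catalog URI
    String getDriverClassName(); // catalog driver class
}

// Minimal immutable implementation that only carries the three properties.
final class SimpleCatalogDriver implements CatalogDriver {
    private final String namespace;
    private final URI uri;
    private final String driverClassName;

    SimpleCatalogDriver(String namespace, URI uri, String driverClassName) {
        this.namespace = Objects.requireNonNull(namespace, "namespace");
        this.uri = Objects.requireNonNull(uri, "uri");
        this.driverClassName = Objects.requireNonNull(driverClassName, "driverClassName");
    }

    @Override public String getNamespace() { return namespace; }
    @Override public URI getUri() { return uri; }
    @Override public String getDriverClassName() { return driverClassName; }
}
```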
      1. TAJO-289_2.patch
        89 kB
        Jaehwa Jung
      2. TAJO-289.patch
        91 kB
        Jaehwa Jung

        Activity

        blrunner Jaehwa Jung added a comment -

        TAJO-16 already includes this issue.

        blrunner Jaehwa Jung added a comment -

        Thanks, Jihoon.
        My replies are as follows:

        • What is the purpose of JobConf?
          I understand your question. As you mentioned, JobConf is not used in my patch; it is used in the HCatalog and MapReduce1 jar files. Of course, the MapReduce1 jar files already include the JobConf class file, but if I add the MapReduce1 jar file to the Tajo library path, TajoWorker fails to start up. For TajoWorker, I deleted the default resource file declarations. If anyone has another idea for this issue, I'll follow it.
        • HCatalogStore supports DROP TABLE statement
          You are right. HCatalog cannot support CREATE and DROP statements. But HCatalog uses HiveMetaStoreClient to get table information, and HiveMetaStoreClient supports CREATE, DROP, and ALTER statements. So I created the CREATE and DROP issues. For now, I think HCatalogStore could be renamed to something like HiveStore.
        • DatumFactory.
          I agree with you. I have reverted DatumFactory now. We should implement another data handler for Hive.
        jihoonson Jihoon Son added a comment -

        Thanks for your patch.
        There are some points that need discussion.

        • What is the purpose of JobConf? If you want to store key-value pairs, we already have TajoConf for the same purpose. In addition, it looks like JobConf is not used in your patch.
        • Why did you divide the HCatalogStore issue into separate sub-issues like 'HCatalogStore supports SELECT statement' and 'HCatalogStore supports DROP TABLE statement'? I think HCatalogStore will not work until all the sub-issues are submitted.
        • It would be better to move the code for checking null values to some other class instead of DatumFactory. We may need to design a new class to handle null values from Hive stores.
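
        As a rough illustration of the last point, null detection for Hive text data could live in a small class of its own. The class name `HiveNullHandler` and its methods are hypothetical and are not taken from the patch. The sketch assumes the table uses Hive's default text null marker `\N` (the `serialization.null.format` table property); tables that override it would pass their own token.

```java
import java.util.Optional;

// Hypothetical helper, not part of Tajo or of the patch under review.
final class HiveNullHandler {
    // Assumption: Hive's default null marker for text tables, the two characters \N.
    static final String DEFAULT_NULL_TOKEN = "\\N";

    private final String nullToken;

    HiveNullHandler() {
        this(DEFAULT_NULL_TOKEN);
    }

    HiveNullHandler(String nullToken) {
        this.nullToken = nullToken;
    }

    // True when the raw text field represents a NULL value.
    boolean isNull(String rawField) {
        return rawField == null || rawField.equals(nullToken);
    }

    // Wraps a raw field so callers cannot forget the null check.
    Optional<String> toValue(String rawField) {
        return isNull(rawField) ? Optional.empty() : Optional.of(rawField);
    }
}
```

        Keeping this logic out of a general-purpose factory means the Hive-specific null convention can change without touching code shared by other stores.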
        blrunner Jaehwa Jung added a comment -

        I'm so glad to upload this patch.
        From now on, Tajo can connect to HiveMetaStore and read data stored on HDFS through it. Of course, there are some limitations:

        • The current version supports only the text file format.
        • The current version does not support compression.

        But I'll implement the unsupported features soon.

        If you want to use HiveMetaStore as the CatalogServer, you have to start the HiveMetaStore server as follows:

        $HIVE_HOME/bin/hive --service metastore
        

        Or you can start the HiveMetaStore server through HCatalog as follows:

        $HCATALOG_HOME/sbin/hcat_server.sh start
        

        If you start the HCatalog server, it actually starts the HiveMetaStore server.

        After starting the HiveMetaStore server, you have to update catalog-site.xml as follows:

          <property>
            <name>tajo.catalog.store.class</name>
            <value>org.apache.tajo.catalog.store.HCatalogStore</value>
          </property>
          <property>
            <name>tajo.catalog.uri</name>
            <value>thrift://localhost:10001</value>
          </property>
        

        'tajo.catalog.uri' is the HiveMetaStore server URI. You must specify your own server address in this property.
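
        Before putting a value into this property, it can help to check that it parses as expected. The following is a minimal sketch using only `java.net.URI`; the `MetastoreUriCheck` class is hypothetical, and the requirement that the scheme be `thrift` is an assumption based on the example value above.

```java
import java.net.URI;

// Hypothetical validation helper for illustration; not part of Tajo.
final class MetastoreUriCheck {
    // Parses the configured value and rejects anything that is not thrift://host:port.
    static URI parse(String value) {
        URI uri = URI.create(value.trim()); // throws IllegalArgumentException on bad syntax
        if (!"thrift".equals(uri.getScheme())) {
            throw new IllegalArgumentException("expected a thrift:// URI but got: " + value);
        }
        if (uri.getHost() == null || uri.getPort() == -1) {
            throw new IllegalArgumentException("host and port are required: " + value);
        }
        return uri;
    }
}
```

        For example, `MetastoreUriCheck.parse("thrift://localhost:10001")` returns a URI whose host is `localhost` and whose port is `10001`.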

        I also borrowed JobConf from CDH and removed its static resource file declarations. If JobConf uses the static resource files (mapred-default.xml, mapred-site.xml), TajoWorker reports the following error:

        Service:org.apache.tajo.worker.TajoWorkerManagerService is started.
        2013-11-08 23:44:03,411 INFO  worker.TaskRunnerManager (TaskRunnerManager.java:run(139)) - FinishedQueryMasterTaskCleanThread started: expire interval minutes = 720
        2013-11-08 23:44:03,419 ERROR service.CompositeService (CompositeService.java:start(72)) - Error starting services org.apache.tajo.worker.TajoWorker
        org.jboss.netty.channel.ChannelException: Failed to bind to: 0.0.0.0/0.0.0.0:8082
        	at org.jboss.netty.bootstrap.ServerBootstrap.bind(ServerBootstrap.java:272)
        	at org.apache.tajo.pullserver.TajoPullServerService.start(TajoPullServerService.java:237)
        	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
        	at org.apache.tajo.worker.TajoWorker.start(TajoWorker.java:257)
        	at org.apache.tajo.worker.TajoWorker.startWorker(TajoWorker.java:125)
        	at org.apache.tajo.worker.TajoWorker.main(TajoWorker.java:659)
        Caused by: java.net.BindException: Address already in use
        	at sun.nio.ch.Net.bind(Native Method)
        	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:124)
        	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        	at org.jboss.netty.channel.socket.nio.NioServerBoss$RegisterTask.run(NioServerBoss.java:193)
        	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
        	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
        	at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
        	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
        	at java.lang.Thread.run(Thread.java:680)
        

        After I modified JobConf, TajoWorker always starts up successfully.
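
        The trace above ends in `java.net.BindException: Address already in use` for port 8082, which means some socket already held that port when the pull server tried to bind. A quick way to check whether a port is free on the local machine is sketched below. `PortProbe` is a hypothetical helper, not Tajo code, and it does not explain which process or configuration took the port.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical diagnostic helper; not part of Tajo.
final class PortProbe {
    // Returns true if a server socket can currently be bound to the port
    // on the wildcard address, false if the bind fails.
    static boolean isFree(int port) {
        try (ServerSocket socket = new ServerSocket()) {
            socket.bind(new InetSocketAddress(port));
            return true;
        } catch (IOException e) {
            // BindException (address already in use) is an IOException.
            return false;
        }
    }
}
```

        Calling `PortProbe.isFree(8082)` just before starting TajoWorker would show whether the pull server port is already taken.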

        Finally, my development environment is as follows:

        • hadoop 2.0.x-alpha
        • hive-0.11.0
        • hcatalog-0.5.0
        jihoonson Jihoon Son added a comment -

        +1, JaeHwa.
        I'm glad that you are starting to implement HCatalogStore.
        And I am always happy to discuss the Catalog Federation issue.

        blrunner Jaehwa Jung added a comment -

        Thanks, Jihoon.

        I think Catalog Federation would let users analyze Hive tables and Tajo tables simultaneously. But as I mentioned to you in TAJO-298, we will discuss Catalog Federation fully.

        Anyway, I'm going to implement HCatalogStore and some interfaces for integrating with the Hive metastore. First, I'll develop HCatalogStore to support SELECT ... FROM queries. Then I'll develop interfaces to support other query types in separate issues.

        jihoonson Jihoon Son added a comment -

        +1 for this idea.

        With this approach, users cannot use HCatalog and Tajo's own catalog simultaneously,
        but it will make the implementation and query planning simpler.

        blrunner Jaehwa Jung added a comment -

        At first, I planned to implement a new class called CatalogDriver. But now I think that class is unnecessary because CatalogStore already exists. If users want to use some database as a catalog server, we just have to implement the CatalogStore interface. I think the HCatalog integration works the same way. So I will implement HCatalogStore by extending AbstractDBStore.
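
        The idea of plugging a new backend in behind a common store interface can be sketched as follows. All three types are simplified stand-ins with assumed names and methods: the real CatalogStore and AbstractDBStore in Tajo have different, larger APIs, and the in-memory map only takes the place of calls to the metastore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Simplified stand-in for a catalog store interface (not Tajo's real API).
interface SketchCatalogStore {
    void addTable(String name, String path);
    Optional<String> getTablePath(String name);
    boolean existsTable(String name);
}

// Shared behavior lives in an abstract base, so each backend implements only
// the operations that actually differ.
abstract class SketchAbstractStore implements SketchCatalogStore {
    protected final String catalogUri;

    protected SketchAbstractStore(String catalogUri) {
        this.catalogUri = catalogUri;
    }

    @Override
    public boolean existsTable(String name) {
        return getTablePath(name).isPresent();
    }
}

// Backend for a Hive-style metastore. A real implementation would call the
// metastore at catalogUri; here a map stands in for it.
final class SketchHCatalogStore extends SketchAbstractStore {
    private final Map<String, String> tables = new HashMap<>();

    SketchHCatalogStore(String catalogUri) {
        super(catalogUri);
    }

    @Override
    public void addTable(String name, String path) {
        tables.put(name, path);
    }

    @Override
    public Optional<String> getTablePath(String name) {
        return Optional.ofNullable(tables.get(name));
    }
}
```

        With this layout, code that uses the catalog depends only on the interface, and a different backend is selected by configuration rather than by a separate driver abstraction.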


          People

          • Assignee:
            blrunner Jaehwa Jung
            Reporter:
            blrunner Jaehwa Jung
          • Votes:
            0
            Watchers:
            2

            Dates

            • Created:
              Updated:
              Resolved:
