HIVE-80

Add testcases for concurrent query execution

      Description

      Can use one driver object per query.

          Activity

          Carl Steinbach added a comment -

          HiveServer can't support concurrent connections due to a limitation of the current HiveServer Thrift API. There's a proposal for a new HiveServer2 Thrift API which fixes these problems located here:

          https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Thrift+API

          Konstantin Boudnik added a comment -

          Perhaps my question is bigger than this issue or should be asked somewhere else, but looking at the Hive code I couldn't help noticing that it is essentially built as a singleton, i.e. only a single instance of the Hive object is allowed to exist.

          What was/is the design motivation behind this? Why can't Hive be instantiated at will by the client, so that different instances can be used independently for query analysis, job submission, etc.? This would make Hive much more flexible and extensible from the perspective of Hive client applications, wouldn't it?

          Frank LoVecchio added a comment -

          Edit: I put up some test results for different scenarios, including multiple connections threaded, here: https://github.com/franklovecchio/hiveserver-loadtest . Multi-threaded jobs = wonky.

          Frank LoVecchio added a comment -

          We have tested multiple client connections submitting jobs rapidly to a single hiveserver (running on a Brisk implementation); I would not recommend doing this unless you have a queue of some sort on your end. Otherwise, you will see this:

          ERROR in runJob
          org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused

          Has anyone used multiple hiveservers for managing multiple connections? (something like Amazon's cloud map/reduce, where they spin up a temporary instance?)

          Thanks
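The client-side queue suggested above can be as simple as a single-threaded executor that serializes submissions, so only one request at a time reaches the server. This is a minimal sketch, not Hive API: `SerializedSubmitter` and `runQuery` are hypothetical names, and the query body is a placeholder for a real Thrift client call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of a client-side queue: a single-threaded executor serializes
// query submissions so only one request at a time reaches a hiveserver
// that cannot handle concurrent connections. runQuery is a hypothetical
// stand-in for an actual Thrift client call.
public class SerializedSubmitter {
    private final ExecutorService queue = Executors.newSingleThreadExecutor();

    public Future<String> submit(final String sql) {
        // Tasks run strictly one after another, in submission order.
        return queue.submit(() -> runQuery(sql));
    }

    private String runQuery(String sql) {
        return "result-of:" + sql;  // placeholder for client.execute(sql)
    }

    public void shutdown() {
        queue.shutdown();
    }

    public static void main(String[] args) throws Exception {
        SerializedSubmitter submitter = new SerializedSubmitter();
        List<Future<String>> results = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            results.add(submitter.submit("SELECT " + i));
        }
        for (Future<String> f : results) {
            System.out.println(f.get());
        }
        submitter.shutdown();
    }
}
```

Callers still get a `Future` back immediately, so the client stays asynchronous even though the server sees one request at a time.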

          Florin Diaconeasa added a comment -

          Hello,

          I'm thinking about writing a Hive client in order to handle some of the queries that I wish to run and the dependencies between them.

          From what i read on the Hive wiki ( http://wiki.apache.org/hadoop/Hive/HiveServer ) it appears that the hive server is single threaded.

          I might be mistaken, but this is what I understand from that text: if I launch 2 requests from my client towards the hive server, the hive server will not handle them in parallel, and the hadoop requests made by hive itself will run one after another, not in parallel.

          Is this right? If (hopefully) not, could someone please clarify that for me?

          Thank you.

          Venkatesh Seetharam added a comment -

          > Does anyone know whether Driver is thread-safe?
          Driver is not thread safe.

          Jeff Zhang added a comment -

          Does anyone know whether Driver is thread-safe? If so, each query can be executed by one Driver.
          And I notice that in HWI there is one Driver per session; it seems Driver should be thread-safe, but I just want to be sure about it.

          Neil Conway added a comment -

          Arvind, I'm not actively working on it, so please go ahead.

          Arvind Prabhakar added a comment -

          This sounds like a good plan. If Neil is not actively working on this issue, I can move this to my queue and start working on it.

          Ning Zhang added a comment -

          Yes, we should add more test cases for parallel execution. There is an open issue, HIVE-1019, for parallel execution. The HIVE_PLAN* file names need to be unique rather than relying on the timestamp.

          Ashish Thusoo added a comment -

          Yes, I think what Ning is saying is correct. We should, however, add a test case to the unit tests to check that. I am not sure that we added a test case for the parallel execution stuff.

          Ning Zhang added a comment -

          I think after HIVE-549 (parallel execution) was committed, Driver allows more than one task to run in parallel. So this JIRA may be fixed as a by-product. But we haven't tried it yet. It may make sense to try it on trunk or release 0.5.

          John Sichi added a comment -

          From internal discussions at Facebook, the best I can gather is that there may still be some thread-unsafe code, but no one knows for sure. Given that, the only approach may be to do as much review as possible (e.g. grep for statics that shouldn't be there), ask everyone to add any known issues here, and then set up a testbed and see what turns up.

          Arvind Prabhakar added a comment -

          I wanted to fix this JIRA and so started looking at it. From what I have observed it appears that the HiveServer is multi-thread capable. Specifically:

          • The HiveServer is using a TThreadPoolServer which is multi-threaded.
          • The ThriftHiveProcessorFactory overrides the getProcessor() call and returns a new instance of HiveServerHandler on every invocation.
          • Every instance of HiveServerHandler has its own thread local session state and a private driver instance.
          • Query execution is thread safe thanks to HIVE-77.

          Given the above, I believe this JIRA should be marked closed and resolved. If you think I missed something in my analysis, could you please point it out?
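The second and third bullets in Arvind's analysis can be sketched in plain Java without the Thrift dependency: a factory that returns a fresh handler on every call, each handler owning a private driver. `HandlerFactory`, `Handler`, and `Driver` below are hypothetical stand-ins, not Hive or Thrift classes.

```java
// Plain-Java sketch of the per-connection pattern described above: the
// factory returns a new handler on every invocation (as
// ThriftHiveProcessorFactory.getProcessor() is said to do), so each
// connection gets its own handler with its own private driver. All class
// names here are hypothetical stand-ins, not Hive classes.
public class HandlerFactorySketch {
    static class Driver { /* per-handler query driver */ }

    static class Handler {
        // Each handler owns a private driver, never shared across threads.
        final Driver driver = new Driver();
    }

    static class HandlerFactory {
        // Mirrors getProcessor(): a new handler on every invocation.
        Handler getHandler() {
            return new Handler();
        }
    }

    public static void main(String[] args) {
        HandlerFactory factory = new HandlerFactory();
        Handler a = factory.getHandler();
        Handler b = factory.getHandler();
        // Two connections -> two handlers -> two independent drivers.
        System.out.println(a != b && a.driver != b.driver);  // true
    }
}
```

The key property is that no driver instance is ever reachable from two connections at once, which is what makes per-connection state safe even on a thread-pool server.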

          Cliff Resnick added a comment -

          The ThreadLocal fix won't work with the unit tests, which run an embedded mapreduce on a separate thread. In a real-world scenario perhaps it's a worthwhile hack - it does work for us - but there are certainly better options.

          After 0.4 release, if HiveConnection is still not threadsafe I will delve more into this. In the meantime I'm removing the patch.

          Namit Jain added a comment -

          The unit tests are failing with this patch

          Cliff Resnick added a comment -

          This fixes a broken patch previously submitted.

          Matt Pestritto added a comment -

          Patch Attached.

          Cliff Resnick added a comment -

          I'm attaching a patch to org.apache.hadoop.hive.ql.exec.Utilities. Currently this class has a static field instance of type mapredWork. Changing the reference to ThreadLocal has eliminated the race conditions we found while executing several concurrent queries through a simple HiveConnection pool.
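The shape of that change is easy to sketch in isolation: a shared static field becomes a ThreadLocal, so each query thread sees only its own plan. `Plan` below is a hypothetical stand-in for Hive's mapredWork class, not the actual patch.

```java
// Sketch of the fix described above: replacing a shared static field with
// a ThreadLocal so concurrent query threads no longer overwrite each
// other's plan. "Plan" is a hypothetical stand-in for mapredWork.
public class ThreadLocalPlanSketch {
    static class Plan {
        final String name;
        Plan(String name) { this.name = name; }
    }

    // Before the patch (race-prone): static Plan current;
    // After the patch: each thread gets an independent slot.
    private static final ThreadLocal<Plan> current = new ThreadLocal<>();

    static void setPlan(Plan p) { current.set(p); }
    static Plan getPlan()      { return current.get(); }

    public static void main(String[] args) throws InterruptedException {
        setPlan(new Plan("main-plan"));
        Thread other = new Thread(() -> {
            setPlan(new Plan("other-plan"));
            System.out.println(getPlan().name);  // other-plan
        });
        other.start();
        other.join();
        // The other thread's set() did not clobber this thread's plan.
        System.out.println(getPlan().name);  // main-plan
    }
}
```

The limitation Cliff notes later follows directly from this design: state stored in a ThreadLocal is invisible to any other thread, including a test harness that runs embedded mapreduce work on a separate thread.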

          Namit Jain added a comment -

          The patch looks OK - I don't know why job was static to start with.

          Neil Conway added a comment -

          I quickly hacked this patch together to see if it fixed the problem by removing the static JobConf variable. It seemed to fix the races, but it looked like the execution of multiple queries was still serialized. I haven't had a chance to look into it further...

          Ashish Thusoo added a comment -

          hmm...

          We do generate a new JobConf in ExecDriver. I think the static can be dropped in HiveInputFormat. I don't think that is needed at all. Have you done that to get around this issue?

          Neil Conway added a comment -

          BTW, one issue we've run into when running multiple queries by using multiple Drivers in a client program is that HiveInputFormat seems to be dependent on a static "JobConf" variable, so there's a race condition when running multiple queries concurrently.

          Neil Conway added a comment -

          Raghu, thanks for your work on 438. In the short term, we're planning to implement the MQO prototype using multiple Drivers embedded directly in a client program. That means there's no short-term dependency on getting this ticket resolved. I should have the cycles to look at this in ~2 weeks – but if someone else would like to do it first, by all means go ahead.

          Raghotham Murthy added a comment -

          Neil, you can get the patch that Zheng posted to HIVE-438 and copy over the libthrift.jar and libfb303.jar (also attached). That should get you moving with this jira.

          Raghotham Murthy added a comment -

          Actually, right after I commented on this jira, I realized that the problem is an incompatible change between Thrift in Apache and the Thrift we had used originally. The following changes have to be made in hive to get it working with any Apache Thrift version.

          • change libthrift.jar
          • change com.facebook.thrift to org.apache.thrift
          • handle some incompatible changes
          • fix some of the warnings

          I'll create a new jira for this.

          Neil Conway added a comment -

          Raghu: Well, running "ant thriftif" works with the trunk release of thrift, as well – but the generated code doesn't compile; same results for r760184 and r758922. I'll do a binary search tomorrow to find a version of Thrift that works with hive, unless anyone knows of one offhand ...?

          Raghotham Murthy added a comment -

          We have not really recorded the thrift version used to generate the files. Can you try one of the instant releases at: http://instant.thrift-rpc.org ? I was able to run ant thriftif with r760184 of thrift at http://gitweb.thrift-rpc.org/?p=thrift.git;spfx=thrift-instant-r760184;a=snapshot;h=b1139424416009c980a9634c44f2806f469f8c1c;sf=tgz.

          Neil Conway added a comment -

          BTW, is there any documentation on how to regenerate the Thrift-generated source files? I made a trivial change to service/if/hive_service.thrift, and then reran

          $ cd service
          $ ant thriftif

          However, the resulting Thrift-generated code fails to compile. The first few of the many compile errors are:

          [javac] Compiling 4 source files to /Users/neilconway/hive-trunk/build/service/classes
          [javac] /Users/neilconway/hive-trunk/service/src/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java:46: cannot find symbol
          [javac] symbol : constructor Client(org.apache.thrift.protocol.TProtocol,org.apache.thrift.protocol.TProtocol)
          [javac] location: class org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore.Client
          [javac] super(iprot, oprot);
          [javac] ^
          [javac] /Users/neilconway/hive-trunk/service/src/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java:57: writeMessageBegin(com.facebook.thrift.protocol.TMessage) in com.facebook.thrift.protocol.TProtocol cannot be applied to (org.apache.thrift.protocol.TMessage)
          [javac] oprot_.writeMessageBegin(new TMessage("compile", TMessageType.CALL, seqid_));
          [javac] ^
          [javac] /Users/neilconway/hive-trunk/service/src/gen-javabean/org/apache/hadoop/hive/service/ThriftHive.java:60: write(org.apache.thrift.protocol.TProtocol) in org.apache.hadoop.hive.service.ThriftHive.compile_args cannot be applied to (com.facebook.thrift.protocol.TProtocol)
          [javac] args.write(oprot_);
          [javac] ^

          So I'm guessing I need to be using an old version of Thrift? Any info on which version to use or which procedure to follow would be very helpful.

          Neil Conway added a comment -

          Raghu, thanks for the feedback.

          Maintaining backward compatibility for the hive service is important, then? Okay, I can just add new methods rather than changing the existing ones.

          WRT the map of queryId => driver, that's correct.

          Raghotham Murthy added a comment -

          Neil, please go ahead and take this over. Your plan sounds good.

          A few comments:

          • This change would mean breaking compatibility of the hive service. It might be better to just add additional methods that use the queryId and deprecate the old methods.
          • Inside the HiveServerHandler, I am assuming, you would just use a map of queryid to driver (instead of a single driver object)
          • If a client dies, thrift will time out and release the handler. So, that should not cause problems.
          Neil Conway added a comment -

          A reasonable way to implement this might be as follows:

          • Change the HiveServer#execute() method to return a unique ID for each active query (this can just be QueryPlan#getQueryId()).
          • Change the rest of the HiveServer methods to be parameterized by the query ID
          • Inside HiveServer, create a separate Driver object for each active query
          • Perhaps add a HiveServer#close() method that clients can use when they're finished executing a query

          One issue is that if a client dies spontaneously, it might not call close(), which would leak resources at the server.

          Comments? If you're not working on this right now Raghu, I'd be happy to take a crack at it.
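The bookkeeping behind the steps above can be sketched as a map from query ID to driver. This is a hedged illustration only: `MultiQueryServerSketch`, `QueryDriver`, and the method bodies are hypothetical names, not the actual HiveServer API, and the UUID stands in for QueryPlan#getQueryId().

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the design proposed above: execute() allocates a fresh driver
// per query and returns a unique ID; other methods are parameterized by
// that ID; close() releases the driver. QueryDriver is a hypothetical
// stand-in, and the UUID stands in for QueryPlan#getQueryId().
public class MultiQueryServerSketch {
    static class QueryDriver {
        String run(String sql) { return "ran:" + sql; }  // placeholder
        void release() { /* free per-query resources */ }
    }

    private final Map<String, QueryDriver> active = new ConcurrentHashMap<>();

    // Returns a unique ID identifying the active query.
    public String execute(String sql) {
        String queryId = UUID.randomUUID().toString();
        QueryDriver driver = new QueryDriver();
        active.put(queryId, driver);
        driver.run(sql);
        return queryId;
    }

    // The rest of the server methods look up the per-query driver by ID.
    public QueryDriver driverFor(String queryId) {
        return active.get(queryId);
    }

    // Clients call this when done. A client that dies without calling
    // close() leaks its map entry - the resource-leak concern noted above.
    public void close(String queryId) {
        QueryDriver driver = active.remove(queryId);
        if (driver != null) {
            driver.release();
        }
    }

    public static void main(String[] args) {
        MultiQueryServerSketch server = new MultiQueryServerSketch();
        String id1 = server.execute("SELECT 1");
        String id2 = server.execute("SELECT 2");
        System.out.println(!id1.equals(id2));              // true: unique IDs
        System.out.println(server.driverFor(id1) != null); // true
        server.close(id1);
        System.out.println(server.driverFor(id1) == null); // true
    }
}
```

A production version would also need some expiry policy on `active` (e.g. a timeout sweep) to reclaim entries for clients that never call close().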

          Ashish Thusoo added a comment -

          Downgrading to critical as HiveServer has a bunch of other fixes and is at best experimental right now.


  People

  • Assignee: Carl Steinbach
  • Reporter: Raghotham Murthy
  • Votes: 6
  • Watchers: 29