Hive / HIVE-3709

Stop storing default ConfVars in temp file

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.10.0
    • Fix Version/s: 0.11.0
    • Component/s: Configuration
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      To work around issues with Hadoop's Configuration object, specifically its addResource(InputStream) method, default configurations are written to a temp file (I think HIVE-2362 introduced this).

      This, however, means that once that file is deleted from /tmp, the client crashes. This is particularly problematic for long-running services like the metastore server.

      Writing a custom InputStream that deals with the problems in the Configuration object should provide a workaround that does not introduce a time bomb into Hive.
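      A minimal sketch of the in-memory idea, assuming the defaults are available as a key/value map: serialize them into Hadoop's XML configuration format in memory and hand out an InputStream over those bytes (for example to Configuration.addResource(InputStream)) instead of a temp file. The class and method names (InMemoryDefaults, toConfigurationXml, openDefaults) are hypothetical, and the code deliberately avoids a Hadoop dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not Hive code): serializes default settings into
// Hadoop-style configuration XML in memory, so no temp file is written.
public class InMemoryDefaults {

  // Escape the five XML special characters so values survive parsing.
  static String escapeXml(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (char c : s.toCharArray()) {
      switch (c) {
        case '&': sb.append("&amp;"); break;
        case '<': sb.append("&lt;"); break;
        case '>': sb.append("&gt;"); break;
        case '"': sb.append("&quot;"); break;
        case '\'': sb.append("&apos;"); break;
        default: sb.append(c);
      }
    }
    return sb.toString();
  }

  // Build <configuration><property><name/><value/></property>...</configuration>.
  static byte[] toConfigurationXml(Map<String, String> defaults) {
    StringBuilder xml = new StringBuilder("<?xml version=\"1.0\"?>\n<configuration>\n");
    for (Map.Entry<String, String> e : defaults.entrySet()) {
      xml.append("  <property><name>").append(escapeXml(e.getKey()))
         .append("</name><value>").append(escapeXml(e.getValue()))
         .append("</value></property>\n");
    }
    xml.append("</configuration>\n");
    return xml.toString().getBytes(StandardCharsets.UTF_8);
  }

  // The bytes live in process memory; deleting files under /tmp cannot affect them.
  static InputStream openDefaults(Map<String, String> defaults) {
    return new ByteArrayInputStream(toConfigurationXml(defaults));
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> defaults = new LinkedHashMap<>();
    defaults.put("hive.metastore.warehouse.dir", "/user/hive/warehouse");
    try (InputStream in = openDefaults(defaults)) {
      System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
  }
}
```

      Because nothing is written under /tmp, there is no file for a cleanup job to delete out from under a long-running process such as the metastore server.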

      1. HIVE-3709.3.patch.txt
        7 kB
        Kevin Wilfong
      2. HIVE-3709.2.patch.txt
        5 kB
        Kevin Wilfong
      3. HIVE-3709.1.patch.txt
        5 kB
        Kevin Wilfong


          Activity

          Ashutosh Chauhan added a comment -

          +1 for getting rid of writing temp file. I have also been hit by this.

          Kevin Wilfong added a comment -

          https://reviews.facebook.net/D6723

          Carl Steinbach added a comment -

          +1. Will commit if tests pass.

          Carl Steinbach added a comment -

          @Kevin: While testing, I got failures in TestMTQueries and TestHiveServerSessions. I think these problems can probably be fixed by modifying getConfVarInputStream() to return a new InputStream instead of a cached copy.

          Kevin Wilfong added a comment -

          Thanks, Carl. I switched to caching the byte[] and returning a new InputStream that wraps it on each call. Now those two tests pass.
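          The difference between handing out one cached stream and a new stream per call can be shown in isolation. In this sketch, ConfVarStreams and getSharedStream are hypothetical names and the XML payload is a placeholder; only getConfVarInputStream() is named in the discussion, and this is not the actual HiveConf code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration: cache the serialized defaults as a byte[]
// and wrap them in a fresh stream for every caller.
public class ConfVarStreams {
  private static final byte[] CONF_VAR_BYTES =
      "<configuration></configuration>".getBytes(StandardCharsets.UTF_8);

  // Problematic variant: one stream shared by every caller. Once a reader
  // has consumed it, later readers hit end-of-stream immediately.
  private static final InputStream SHARED = new ByteArrayInputStream(CONF_VAR_BYTES);

  static InputStream getSharedStream() {
    return SHARED;
  }

  // Fixed variant: the bytes are shared, but each caller gets its own
  // stream object with its own read position.
  static InputStream getConfVarInputStream() {
    return new ByteArrayInputStream(CONF_VAR_BYTES);
  }

  // Read the stream to the end and report how many bytes were seen.
  static int drain(InputStream in) {
    try {
      return in.readAllBytes().length;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(drain(getSharedStream()));       // full payload
    System.out.println(drain(getSharedStream()));       // 0: already consumed
    System.out.println(drain(getConfVarInputStream())); // full payload
    System.out.println(drain(getConfVarInputStream())); // full payload again
  }
}
```

          Sharing the byte[] is safe because ByteArrayInputStream never writes to its buffer; only the read position differs between streams.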

          Carl Steinbach added a comment -

          +1. Running tests.

          Carl Steinbach added a comment -

          @Kevin: I still see errors in TestHiveServerSessions when I run the test individually:

          % ant clean package test -Dtestcase=TestHiveServerSessions

          test:
          [echo] Project: service
          [junit] WARNING: multiple versions of ant detected in path for junit
          [junit] jar:file:/Users/carl/.local/java/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
          [junit] and jar:file:/Users/carl/Work/repos/hive-test/build/ivy/lib/hadoop0.20.shim/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
          [junit] Running org.apache.hadoop.hive.service.TestHiveServerSessions
          [junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_789001489.txt
          [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 8.439 sec
          [junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_788616740.txt
          [junit] [Fatal Error] :1:1: Content is not allowed in prolog.
          [junit] [Fatal Error] :92:58: The element type "name" must be terminated by the matching end-tag "</name>".
          [junit] Test org.apache.hadoop.hive.service.TestHiveServerSessions FAILED
          [for] /Users/carl/Work/repos/hive-test/service/build.xml: The following error occurred while executing this line:
          [for] /Users/carl/Work/repos/hive-test/build.xml:325: The following error occurred while executing this line:
          [for] /Users/carl/Work/repos/hive-test/build-common.xml:455: Tests failed!

          BUILD FAILED
          /Users/carl/Work/repos/hive-test/build.xml:320: Keepgoing execution: 1 of 12 iterations failed.

          Kevin Wilfong added a comment -

          I ran TestHiveServerSessions individually (several times), along with TestHiveServer and TestMTQueries, and they all passed. The full ant test run still passes for me.

          Ashutosh Chauhan added a comment -

          Kevin, will HADOOP-8573 fix this?

          Kevin Wilfong added a comment -

          It looks like that fixes the single-threaded issue, where the same InputStream ends up being read repeatedly; that issue is why I overrode the close method to reset the InputStream.
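          A minimal sketch of the single-threaded fix described above, assuming the consumer closes the stream after each load: override close() so that it rewinds the stream instead of releasing anything. RewindOnCloseStream is a hypothetical name; this is not the committed LoopingByteArrayInputStream, whose source is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: a ByteArrayInputStream whose close() rewinds to the
// start, so the same instance can be read again after a consumer closes it.
public class RewindOnCloseStream extends ByteArrayInputStream {
  public RewindOnCloseStream(byte[] buf) {
    super(buf);
  }

  @Override
  public void close() {
    // ByteArrayInputStream.close() has no effect; rewinding to the mark
    // (position 0 unless mark() was called) makes the next read start over.
    reset();
  }

  public static void main(String[] args) {
    byte[] data = "<configuration/>".getBytes(StandardCharsets.UTF_8);
    InputStream in = new RewindOnCloseStream(data);
    for (int i = 0; i < 2; i++) {
      // Each simulated "load" reads everything and then closes the stream.
      try (InputStream s = in) {
        System.out.println(new String(s.readAllBytes(), StandardCharsets.UTF_8));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }
}
```

          This only helps when one thread at a time uses the instance; the read position is still shared state.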

          It does not look like it will fix the multi-threaded issue. Suppose two threads each get a Configuration object built with the copy constructor, and loadResources has not been called before the copy. Because the copy constructor does not clone the resources themselves, both objects share the same InputStream, so both threads could call loadResources at about the same time. That could cause the errors Carl was seeing in TestHiveServerSessions.
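          One way to avoid the race described above, sketched under the assumption that each loading thread can be given its own stream: share the immutable byte[] but keep a separate ByteArrayInputStream per thread via ThreadLocal, so concurrent loads never advance each other's read position. PerThreadStreams and readAllAndRewind are hypothetical names, and this is not necessarily how the final patch handles it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: shared immutable bytes, one stream (and therefore
// one read position) per thread.
public class PerThreadStreams {
  private final ThreadLocal<ByteArrayInputStream> local;

  public PerThreadStreams(byte[] buf) {
    // buf is never written after construction, so sharing it is safe.
    this.local = ThreadLocal.withInitial(() -> new ByteArrayInputStream(buf));
  }

  // Read everything visible to the calling thread, then rewind for reuse.
  public String readAllAndRewind() {
    ByteArrayInputStream in = local.get();
    byte[] out = new byte[in.available()];
    int n = in.read(out, 0, out.length); // -1 only if the buffer is empty
    in.reset();
    return new String(out, 0, Math.max(n, 0), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws Exception {
    PerThreadStreams streams =
        new PerThreadStreams("<configuration/>".getBytes(StandardCharsets.UTF_8));
    Callable<String> load = streams::readAllAndRewind;
    ExecutorService pool = Executors.newFixedThreadPool(4);
    try {
      List<Future<String>> results = new ArrayList<>();
      for (int i = 0; i < 8; i++) {
        results.add(pool.submit(load));
      }
      for (Future<String> f : results) {
        System.out.println(f.get()); // each task sees the complete document
      }
    } finally {
      pool.shutdown();
    }
  }
}
```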

          Carl Steinbach added a comment -

          Another option for fixing this problem is to give administrators the ability to specify the location where the temporary file should be written (for setuid processes running on POSIX systems, the conventional location is somewhere under /var). For example, we could add a configuration property named hive.process.local.temporary.dir and have it default to ${user.home}/.hive/${process_id}.
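          Carl's proposal amounts to a configurable directory with a per-process default. A minimal sketch of that lookup, assuming the configuration is available as a Map: hive.process.local.temporary.dir is only the name proposed in the comment, not an existing Hive setting, and LocalTempDir is hypothetical. It needs Java 9+ for ProcessHandle and Map.of.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical sketch of the proposed property, not an existing Hive API.
public class LocalTempDir {
  static final String KEY = "hive.process.local.temporary.dir";

  // Use the configured directory if set; otherwise ${user.home}/.hive/${process_id}.
  static Path resolve(Map<String, String> conf, String userHome, long pid) {
    String configured = conf.get(KEY);
    if (configured != null && !configured.isEmpty()) {
      return Paths.get(configured);
    }
    return Paths.get(userHome, ".hive", Long.toString(pid));
  }

  public static void main(String[] args) {
    Path dir = resolve(Map.of(), System.getProperty("user.home"),
        ProcessHandle.current().pid());
    System.out.println(dir);
  }
}
```

          Keying the default on the process ID keeps concurrent processes from sharing a file, but nothing here removes the directory if the process dies.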

          Chris McConnell added a comment -

          I was also looking into this with 3596. I was able to fix it by using a location similar to the one Carl suggested above; however, I think that pushes the problem to another location rather than addressing the actual issue. I like where Kevin is going with this fix. I had also thought about checking the confVarURL in the copy constructor and removing and re-creating the file if it did not exist, but even that would not be perfect, depending on timing.

          Kevin Wilfong added a comment -

          I had considered the solution Carl mentioned, but I didn't go that route because it could leave clutter in the specified directory; in particular, we can't guarantee the file will be deleted in the presence of catastrophic failures. To solve this, the user would need to set up some sort of periodic cleanup, which puts us back in the same position. We might be able to work around this by regularly touching the file, but I'm not 100% sure.

          Carl, are you still seeing threading problems with the most recent patch? TestHiveServerSessions has been succeeding for me consistently.

          Ashutosh Chauhan added a comment -

          I am also not in favor of the workaround. Writing to a filesystem unnecessarily should be avoided. Kevin's approach is better.

          Carl Steinbach added a comment -

          Committed to trunk. Thanks Kevin!

          Carl Steinbach added a comment -

          @Kevin: I tried running TestHiveServerSessions again and wasn't able to provoke the failure I saw before.

          Hudson added a comment -

          Integrated in Hive-trunk-h0.21 #1824 (See https://builds.apache.org/job/Hive-trunk-h0.21/1824/)
          HIVE-3709. Stop storing default ConfVars in temp file (Kevin Wilfong via cws) (Revision 1415038)

          Result = FAILURE
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1415038
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/LoopingByteArrayInputStream.java
          Hudson added a comment -

          Integrated in Hive-trunk-hadoop2 #54 (See https://builds.apache.org/job/Hive-trunk-hadoop2/54/)
          HIVE-3709. Stop storing default ConfVars in temp file (Kevin Wilfong via cws) (Revision 1415038)

          Result = ABORTED
          cws : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1415038
          Files :

          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
          • /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/LoopingByteArrayInputStream.java

            People

            • Assignee: Kevin Wilfong
            • Reporter: Kevin Wilfong
            • Votes: 0
            • Watchers: 7
