Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
Description
HiveHistory log files (hive_job_log_hive_*.txt) store information about each Hive query, such as the query string, plan, counters, and MR job progress.
There is no mechanism to delete these files, so they accumulate over time and use up a lot of disk space.
I don't think most people use them, so I think it would be better to turn this off by default. Jobtracker logs already capture most of this information, though not in as structured a form as the history logs.
Attachments
- HIVE-4513.1.patch (37 kB, Thejas Nair)
- HIVE-4513.2.patch (38 kB, Thejas Nair)
- HIVE-4513.3.patch (38 kB, Thejas Nair)
- HIVE-4513.4.patch (38 kB, Thejas Nair)
- HIVE-4513.5.patch (42 kB, Thejas Nair)
- HIVE-4513.6.patch (43 kB, Thejas Nair)
Issue Links
- is blocked by
  - HIVE-4500 HS2 holding too many file handles of hive_job_log_hive_*.txt files (Closed)
- is duplicated by
  - HIVE-3779 An empty value to hive.logquery.location can't disable the creation of hive history log files (Resolved)
  - HIVE-1708 make hive history file configurable (Resolved)
- relates to
  - HIVE-4374 Hive does not periodically cleanup hive.exec.scratchdir (Open)
Activity
Having this on by default can cause problems if the same machine and config are used for a large number of queries. For example, during stress tests on HiveServer2, the mount holding /tmp filled up because the hive history files (around 20k of them) took 2.5 GB.
HIVE-4513.1.patch - Uses a proxy class that makes HiveHistory a no-op if hive history is disabled.
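For readers unfamiliar with the approach, a no-op proxy of this kind can be sketched with java.lang.reflect.Proxy. This is only an illustration, not the code in the patch: the History interface below is a hypothetical, trimmed-down stand-in for HiveHistory, and the class and method names are made up for the example.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class NoOpHistoryDemo {
    /** Hypothetical, trimmed-down stand-in for the real HiveHistory interface. */
    public interface History {
        void startQuery(String cmd, String id);
        void endQuery(String id);
        String getHistFileName();
    }

    /**
     * Handler that ignores every call. Returning null is enough here because
     * the interface methods return either void or a reference type; methods
     * with primitive return types would need a matching default value.
     */
    private static final InvocationHandler NO_OP = (proxy, method, args) -> null;

    /** Returns the real implementation when enabled, otherwise a no-op proxy. */
    public static History create(boolean enabled, History real) {
        if (enabled) {
            return real;
        }
        return (History) Proxy.newProxyInstance(
                History.class.getClassLoader(),
                new Class<?>[] {History.class},
                NO_OP);
    }

    public static void main(String[] args) {
        History h = create(false, null);
        h.startQuery("select 1", "q1"); // silently ignored, no file is written
        h.endQuery("q1");
        System.out.println("history file: " + h.getHistFileName());
    }
}
```

The appeal of this design is that callers keep using the interface unconditionally, so no "is history enabled?" checks have to be scattered through the session code.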
HIVE-4513.2.patch - The SessionState temp file gets created in the same dir as the hive history files, but it was relying on HiveHistory to create that dir. Now it independently does a create-if-not-exist.
review board link with HIVE-4513.3.patch - https://reviews.apache.org/r/11029/
HIVE-4513.4.patch
- Adds @Override to the interface functions implemented in HiveHistoryImpl.
- Removes javadoc duplication in HiveHistoryImpl; it automatically inherits the documentation from the interface.
- Logs the exception in code unrelated to the patch, to partly address Brock's concern. Since that code is not part of the patch, I don't want to increase the scope to address that concern.
thiruvel Interesting that you asked me today, I am working on it right now!
HIVE-4513.5.patch - patch addressing review comments.
Also removed the use of hive history from TestCliDriver.vm and TestHBaseCliDriver.vm, as it is redundant there: any failure that gets logged in hive history also results in a non-zero client exit code.
I will upload the updated patch to reviewboard once it is up again.
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597018/HIVE-4513.5.patch
ERROR: -1 due to 4 failed/errored test(s), 2775 tests executed
Failed tests:
org.apache.hadoop.hive.service.TestHiveServerSessions.testSessionVars
org.apache.hcatalog.mapreduce.TestSequenceFileReadWrite.testSequenceTableWriteReadMR
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_bucketed_table
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_script_broken_pipe1
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/359/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/359/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 4 tests failed
This message is automatically generated.
HIVE-4513.6.patch - addresses review comments.
Also fixes the race condition that was causing the TestHiveServerSessions.testSessionVars test failure. The race condition gets exposed when hivehistory is disabled, because when hive history is enabled it attempts to create the same dir this way, but on failure it just logs a warning.
I found another issue while making these changes: the SessionState temp file gets created in the history file directory. Created HIVE-5055 to track that.
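As an illustration of the create-if-not-exist pattern behind this kind of race, here is a minimal sketch. It is not the code from the patch: it assumes java.nio.file (Java 7+), and the class name, method name, and directory names are hypothetical. A separate "check exists, then mkdir" sequence can fail when two threads run it at once, whereas Files.createDirectories treats an already existing directory as success.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class EnsureDirDemo {
    /**
     * Creates the directory (and any missing parents) if needed.
     * Files.createDirectories does not throw when the directory already
     * exists, so callers do not need a separate exists() check.
     */
    public static Path ensureDir(Path dir) throws IOException {
        return Files.createDirectories(dir);
    }

    public static void main(String[] args) throws Exception {
        Path base = Files.createTempDirectory("hist-demo");
        Path target = base.resolve("session-logs"); // hypothetical dir name

        // Several threads ask for the same directory at the same time.
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                try {
                    ensureDir(target);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("directory exists: " + Files.isDirectory(target));
    }
}
```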
Overall: -1 at least one tests failed
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597289/HIVE-4513.6.patch
ERROR: -1 due to 1 failed/errored test(s), 2776 tests executed
Failed tests:
org.apache.hadoop.hive.cli.TestNegativeMinimrCliDriver.testNegativeCliDriver_mapreduce_stack_trace_hadoop20
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/385/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/385/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
This message is automatically generated.
Regarding the pre-commit test result, testNegativeCliDriver_mapreduce_stack_trace_hadoop20 is a flaky test tracked in HIVE-4851.
Reviewboard has the latest patch.
Thanks thejas. +1. I guess we can close the duplicates HIVE-1708 and HIVE-3779.
We back-ported this to Hive10 and it works as expected. cdrome had some comments on the patch. These fall in the vicinity but should be addressed as a separate JIRA.
1. HiveHistoryViewer.java: It's good as "private void init()"
2. HiveHistoryUtil.java: parseLine() method is not thread-safe. It uses parseBuffer which could be modified by multiple threads. Currently only HWA uses it.
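To make the thread-safety concern concrete, here is a hedged sketch of one way a line parser can avoid shared mutable state. It is not the HiveHistoryUtil code: the KEY="value" line format is a simplifying assumption, and the class and method names are hypothetical. Each call builds and returns its own map instead of writing into a shared buffer, and the only shared object is an immutable compiled Pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParseLineDemo {
    // Compiled patterns are immutable and safe to share between threads.
    private static final Pattern KEY_VALUE =
            Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /**
     * Parses KEY="value" pairs from one line into a fresh map.
     * No static or instance buffer is touched, so concurrent calls
     * cannot interfere with each other.
     */
    public static Map<String, String> parseLine(String line) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(line); // Matcher is local to this call
        while (m.find()) {
            result.put(m.group(1), m.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the assumed format.
        String line = "QUERY_ID=\"q1\" TIME=\"42\"";
        System.out.println(parseLine(line));
    }
}
```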
> Chris Drome had some comments on the patch. These fall in the vicinity but should be addressed as a separate JIRA.
Created HIVE-5071 to address thread safety issues.
thiruvel Thanks for the feedback and pointers to the duplicate jiras. Yes, I think it is better to address the thread safety issue in another jira, as it is not a regression introduced by this patch; the issue was there when the function was in HiveHistory.java (but I should have noticed the bug!)
Overall: +1 all checks pass
Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12597289/HIVE-4513.6.patch
SUCCESS: +1 2850 tests passed
Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/412/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/412/console
Messages:
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
This message is automatically generated.
FAILURE: Integrated in Hive-trunk-h0.21 #2265 (See https://builds.apache.org/job/Hive-trunk-h0.21/2265/)
HIVE-4513 : disable hivehistory logs by default (Thejas Nair via Ashutosh Chauhan) (hashutosh: http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1513445)
- /hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
- /hive/trunk/conf/hive-default.xml.template
- /hive/trunk/hbase-handler/src/test/templates/TestHBaseCliDriver.vm
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryImpl.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryProxyHandler.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryUtil.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistoryViewer.java
- /hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/session/SessionState.java
- /hive/trunk/ql/src/test/org/apache/hadoop/hive/ql/history/TestHiveHistory.java
- /hive/trunk/ql/src/test/templates/TestCliDriver.vm
FAILURE: Integrated in Hive-trunk-hadoop2 #356 (See https://builds.apache.org/job/Hive-trunk-hadoop2/356/)
FAILURE: Integrated in Hive-trunk-hadoop2-ptest #56 (See https://builds.apache.org/job/Hive-trunk-hadoop2-ptest/56/)
FAILURE: Integrated in Hive-trunk-hadoop1-ptest #126 (See https://builds.apache.org/job/Hive-trunk-hadoop1-ptest/126/)
This issue has been fixed and released as part of the 0.12 release. If you find further issues, please create a new jira and link it to this one.
This adds the configuration parameter hive.session.history.enabled in HiveConf.java and hive-default.xml.template, so it needs to be documented in the wiki here:
Blocked by HIVE-4500, as it relies on the config parameter being introduced in that patch.