[HIVE-7292] Hive on Spark - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Done
Affects Version/s: None
Fix Version/s: 1.1.0
Component/s: Spark
Labels:
- Spark-M1
- Spark-M2
- Spark-M3
- Spark-M4
- Spark-M5

Description

Spark, an open-source cluster computing framework for data analytics, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need either MapReduce or Tez on their cluster. This initiative will give users a new alternative so that they can consolidate their backend.

Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.

Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, improving the user experience as Tez does.

This is an umbrella JIRA that will cover many upcoming subtasks. A design doc will be attached here shortly and will be on the wiki as well. Feedback from the community is greatly appreciated!

Attachments

Hive-on-Spark.pdf
25/Jun/14 23:28
290 kB
Xuefu Zhang

Issue Links

contains

HIVE-7717 Add .q tests coverage for "union all" [Spark Branch]

Resolved

HIVE-7767 hive.optimize.union.remove does not work properly [Spark Branch]

Resolved

HIVE-8242 Investigate test failures when hive.multigroupby.singlereducer and hive.optimize.multigroupby.common.distincts are set to false [Spark Branch]

Open

HIVE-7745 NullPointerException when turn on hive.optimize.union.remove, hive.merge.mapfiles and hive.merge.mapredfiles [Spark Branch]

Resolved

HIVE-8216 auto_smb_mapjoin_14.q failed test with exception. [Spark Branch]

Resolved

HIVE-8233 multi-table insertion doesn't work with ForwardOperator [Spark Branch]

Resolved

HIVE-8496 Re-enable statistics [Spark Branch]

Resolved

HIVE-8542 Enable groupby_map_ppr.q and groupby_map_ppr_multi_distinct.q [Spark Branch]

Resolved

HIVE-8545 Exception when casting Text to BytesWritable [Spark Branch]

Resolved

HIVE-8208 Multi-table insertion optimization #1: don't always break operator tree. [Spark Branch]

Resolved

HIVE-8215 Multi-table insertion optimization #3: use 1+1 tasks instead of 1+N tasks [Spark Branch]

Resolved

HIVE-8249 Refactoring SparkPlan and SparkPlanGenerator [Spark Branch]

Resolved

HIVE-8209 Multi-table insertion optimization #2: use separate context [Spark Branch]

Resolved

HIVE-8463 Add numPartitions info to SparkEdgeProperty [Spark Branch]

Resolved

HIVE-8207 Add .q tests for multi-table insertion [Spark Branch]

Resolved

HIVE-8430 Enable parquet_join.q [Spark Branch]

Resolved

HIVE-8431 Enable smb_mapjoin_11.q and smb_mapjoin_12.q [Spark Branch]

Resolved

HIVE-8533 Enable all q-tests for multi-insertion [Spark Branch]

Resolved

HIVE-7437 Check if servlet-api and jetty module in Spark library are an issue for hive-spark integration [Spark Branch]

Resolved

HIVE-7939 Refactoring GraphTran to make it conform to SparkTran interface. [Spark Branch]

Resolved

depends upon

SPARK-2633 enhance spark listener API to gather more spark job information

Resolved

SPARK-2636 Expose job ID in JobWaiter API

Resolved

SPARK-2895 Support mapPartitionsWithContext in Spark Java API

Resolved

SPARK-2421 Spark should treat writable as serializable for keys

Resolved

SPARK-4290 Provide an equivalent functionality of distributed cache as MR does

Resolved

SPARK-2420 Dependency changes for compatibility with Hive

Resolved

incorporates

HIVE-7773 Union all query finished with errors [Spark Branch]

Resolved

HIVE-7525 Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]

Resolved

HIVE-7728 Enable q-tests for TABLESAMPLE feature [Spark Branch]

Resolved

HIVE-7729 Enable q-tests for ANALYZE TABLE feature [Spark Branch]

Resolved

HIVE-7731 Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch]

Resolved

HIVE-7775 enable sample8.q.[Spark Branch]

Resolved

HIVE-7776 enable sample10.q.[Spark Branch]

Resolved

HIVE-7810 Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]

Resolved

HIVE-7870 Insert overwrite table query does not generate correct task plan [Spark Branch]

Resolved

HIVE-8054 Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]

Resolved

HIVE-8055 Code cleanup after HIVE-8054 [Spark Branch]

Resolved

HIVE-7613 Research optimization of auto convert join to map join [Spark branch]

Resolved

HIVE-7746 Cleanup SparkClient and make refreshLocalResources method synchronized [Spark Branch]

Resolved

HIVE-7372 Select query gives unpredictable incorrect result when parallelism is greater than 1 [Spark Branch]

Resolved

HIVE-7387 Guava version conflict between hadoop and spark [Spark-Branch]

Resolved

HIVE-7431 When run on spark cluster, some spark tasks may fail

Resolved

HIVE-7467 When querying HBase table, task fails with exception: java.lang.IllegalAccessError: com/google/protobuf/HBaseZeroCopyByteString

Resolved

HIVE-7489 Change ql/pom.xml to fix mvn project setup [Spark Branch]

Resolved

HIVE-7530 Go thru the common code to find references to HIVE_EXECUCTION_ENGINE to make sure conditions works with Spark [Spark Branch]

Resolved

HIVE-7540 NotSerializableException encountered when using sortByKey transformation

Resolved

HIVE-7556 Fix code style, license header, tabs, etc. [Spark Branch]

Resolved

HIVE-7624 Reduce operator initialization failed when running multiple MR query on spark

Resolved

HIVE-7626 Add jar through CLI did not loaded by Spark executor[Spark Branck]

Resolved

HIVE-7627 FSStatsPublisher does fit into Spark multi-thread task mode[Spark Branch]

Resolved

HIVE-7642 Set hive input format by configuration.[Spark Branch]

Resolved

HIVE-7643 ExecMapper static states lead to unpredictable query result.[Spark Branch]

Resolved

HIVE-7747 Submitting a query to Spark from HiveServer2 fails [Spark Branch]

Resolved

HIVE-7761 Failed to analyze stats with CounterStatsAggregator [SparkBranch]

Resolved

HIVE-7763 Failed to query TABLESAMPLE on empty bucket table [Spark Branch]

Resolved

HIVE-7780 Query with OVER clause return duplicate results[Spark Branch]

Resolved

HIVE-7795 Enable ptf.q and ptf_streaming.q.[Spark Branch]

Resolved

HIVE-7799 TRANSFORM failed in transform_ppr1.q[Spark Branch]

Resolved

HIVE-7909 Fix sample8.q automatic test failure[Spark Branch]

Resolved

HIVE-7916 Snappy-java error when running hive query on spark [Spark Branch]

Resolved

HIVE-7956 When inserting into a bucketed table, all data goes to a single bucket [Spark Branch]

Resolved

HIVE-8118 Support work that have multiple child works to work around SPARK-3622 [Spark Branch]

Resolved

HIVE-8300 Missing guava lib causes IllegalStateException when deserializing a task [Spark Branch]

Resolved

HIVE-8426 paralle.q assert failed.[Spark Branch]

Resolved

HIVE-8548 Integrate with remote Spark context after HIVE-8528 [Spark Branch]

Resolved

HIVE-7551 expand spark accumulator to support hive counter [Spark Branch]

Resolved

HIVE-7552 Collect spark job statistic through spark metrics [Spark Branch]

Resolved

HIVE-8699 Enable support for common map join [Spark Branch]

Open

HIVE-7516 Add capacity control over queries running on Spark cluster [Spark Branch]

Resolved

HIVE-7564 Remove some redundant code plus a bit of cleanup in SparkClient [Spark Branch]

Resolved

HIVE-7659 Unnecessary sort in query plan [Spark Branch]

Resolved

HIVE-7707 Optimize SparkMapRecordHandler implementation

Resolved

HIVE-7726 Refactor to reuse common logic between SparkMapRecordHandler and ExecMapper.

Resolved

HIVE-7727 Refactor to reuse common logic between SparkReduceRecordHandler and ExecReducer

Resolved

HIVE-8029 Remove reducers number configure in SparkTask [Spark Branch]

Resolved

HIVE-8219 Multi-Insert optimization, don't sink the source into a file [Spark Branch]

Resolved

HIVE-8220 Refactor multi-insert code such that plan splitting and task generation are modular and reusable [Spark Branch]

Resolved

HIVE-7772 Add tests for order/sort/distribute/cluster by query [Spark Branch]

Resolved

HIVE-8098 The spark golden file for union_remove_25 is different from MR version [Spark Branch]

Open

HIVE-7370 Initial ground work for Hive on Spark [Spark branch]

Resolved

HIVE-7371 Identify a minimum set of JARs needed to ship to Spark cluster [Spark Branch]

Resolved

HIVE-7526 Research to use groupby transformation to replace Hive existing partitionByKey and SparkCollector combination

Resolved

HIVE-7567 support automatic calculating reduce task number [Spark Branch]

Resolved

HIVE-8024 Find out whether it's possible to remove UnionOperator from original operator tree [Spark Branch]

Resolved

HIVE-8043 Support merging small files [Spark Branch]

Resolved

HIVE-8274 Refactoring SparkPlan and SparkPlanGeneration [Spark Branch]

Resolved

HIVE-8537 Update to use the stable TaskContext API [Spark Branch]

Resolved

HIVE-7893 Find a way to get a job identifier when submitting a spark job [Spark Branch]

Resolved

HIVE-8073 Go thru all operator plan optimizations and disable those that are not suitable for Spark [Spark Branch]

Closed

is blocked by

SPARK-2243 Support multiple SparkContexts in the same JVM

Resolved

is related to

HIVE-9611 Allow SPARK_HOME as well as spark.home to define sparks location

Resolved

HIVE-9134 Uber JIRA to track HOS performance work

Open

HIVE-9367 CombineFileInputFormatShim#getDirIndices is expensive

Resolved

SPARK-2741 Publish version of spark assembly which does not contain Hive

Resolved

relates to

HIVE-7607 Spark "Explain" should give useful information on dependencies [Spark Branch]

Resolved

requires

HIVE-7958 SparkWork generated by SparkCompiler may require multiple Spark jobs to run

Resolved

SPARK-2688 Need a way to run multiple data pipeline concurrently

Resolved

HIVE-7391 Refactoring TezWork/TezEdgeProperty for code reuse

Resolved

links to

https://cwiki.apache.org/confluence/display/Hive/Hive+on+Spark

Sub-Tasks

1.	Refactoring: make Hive reduce side data processing reusable [Spark Branch]	Reopened	Xuefu Zhang
2.	Refactoring: make Hive map side data processing reusable [Spark Branch]	Resolved	Xuefu Zhang
3.	Create SparkWork [Spark Branch]	Resolved	Xuefu Zhang
4.	Create SparkTask [Spark Branch]	Resolved	Chinna Rao Lalam
5.	Create SparkCompiler [Spark Branch]	Resolved	Xuefu Zhang
6.	Create SparkClient, interface to Spark cluster [Spark Branch]	Resolved	Chengxiang Li
7.	Create RDD translator, translating Hive Tables into Spark RDDs [Spark Branch]	Resolved	Rui Li
8.	Create SparkShuffler, shuffling data between map-side data processing and reduce-side processing [Spark Branch]	Resolved	Rui Li
9.	Create SparkPlan, DAG representation of a Spark job [Spark Branch]	Resolved	Xuefu Zhang
10.	Create MapFunction [Spark Branch]	Resolved	Xuefu Zhang
11.	Create ReduceFunction [Spark Branch]	Resolved	Xuefu Zhang
12.	Create SparkPlanGenerator [Spark Branch]	Resolved	Xuefu Zhang
13.	Create a MiniSparkCluster and set up a testing framework [Spark Branch]	Resolved	Rui Li
14.	Research into reduce-side join [Spark Branch]	Resolved	Szehon Ho
15.	Spark 1.0.1 is released, stop using SNAPSHOT [Spark Branch]	Resolved	Brock Noland
16.	Exclude hadoop 1 from spark dep [Spark Branch]	Resolved	Brock Noland
17.	Load Spark configuration into Hive driver [Spark Branch]	Resolved	Chengxiang Li
18.	Counters, statistics, and metrics [Spark Branch]	Resolved	Chengxiang Li
19.	Spark job monitoring and error reporting [Spark Branch]	Resolved	Chengxiang Li
20.	Implement pre-commit testing [Spark Branch]	Resolved	Brock Noland
21.	Enhance SparkCollector [Spark Branch]	Resolved	Venki Korukanti
22.	Enhance HiveReduceFunction's row clustering [Spark Branch]	Resolved	Chao Sun
23.	Support Hive's multi-table insert query with Spark [Spark Branch]	Resolved	Chao Sun
24.	Support order by and sort by on Spark [Spark Branch]	Resolved	Rui Li
25.	Support cluster by and distributed by [Spark Branch]	Resolved	Rui Li
26.	Support union all on Spark [Spark Branch]	Resolved	Na Yang
27.	StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch]	Open	Unassigned
28.	StarterProject: Fix exception handling in POC code [Spark Branch]	Resolved	Chao Sun
29.	StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch]	Resolved	Chao Sun
30.	Make sure multi-MR queries work [Spark Branch]	Resolved	Chao Sun
31.	Support dynamic partitioning [Spark Branch]	Resolved	Chinna Rao Lalam
32.	Instantiate SparkClient per user session [Spark Branch]	Resolved	Chinna Rao Lalam
33.	Support analyze table [Spark Branch]	Resolved	Chengxiang Li
34.	Find solution for closures containing writables [Spark Branch]	Resolved	Unassigned
35.	Support Hive TABLESAMPLE [Spark Branch]	Resolved	Chengxiang Li
36.	Create TestSparkCliDriver to run test in spark local mode [Spark Branch]	Resolved	Szehon Ho
37.	Update to Spark 1.2 [Spark Branch]	Resolved	Brock Noland
38.	Implement native HiveMapFunction [Spark Branch]	Resolved	Chengxiang Li
39.	Implement native HiveReduceFunction [Spark Branch]	Resolved	Chengxiang Li
40.	Start running .q file tests on spark [Spark Branch]	Resolved	Chinna Rao Lalam
41.	Fix qtest-spark pom.xml reference to test properties [Spark Branch]	Resolved	Brock Noland
42.	Create SparkReporter [Spark Branch]	Resolved	Chengxiang Li
43.	Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch]	Resolved	Chao Sun
44.	TestSparkCliDriver should not use includeQueryFiles [Spark Branch]	Resolved	Brock Noland
45.	Add .q tests coverage for "union all" [Spark Branch]	Resolved	Na Yang
46.	Enable q-tests for TABLESAMPLE feature [Spark Branch]	Resolved	Chengxiang Li
47.	Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch]	Resolved	Chao Sun
48.	Enable q-tests for ANALYZE TABLE feature [Spark Branch]	Resolved	Na Yang
49.	Add qfile_regex to qtest-spark pom [Spark Branch]	Resolved	Brock Noland
50.	Enable timestamp.* tests [Spark Branch]	Resolved	Brock Noland
51.	Enable avro* tests [Spark Branch]	Resolved	Brock Noland
52.	PTest2 separates test files with spaces while QTestGen uses commas [Spark Branch]	Resolved	Brock Noland
53.	Cleanup Reduce operator code [Spark Branch]	Resolved	Rui Li
54.	hive.optimize.union.remove does not work properly [Spark Branch]	Resolved	Na Yang
55.	Integrate with Spark executor scaling [Spark Branch]	Resolved	Chengxiang Li
56.	Research optimization of auto convert join to map join [Spark branch]	Resolved	Suhas Satish
57.	Support windowing and analytic functions [Spark Branch]	Resolved	Chengxiang Li
58.	Enable windowing and analytic function qtests [Spark Branch]	Resolved	Chengxiang Li
59.	Union all query finished with errors [Spark Branch]	Resolved	Rui Li
60.	Enable tests on Spark branch (1) [Sparch Branch]	Resolved	Brock Noland
61.	Enable tests on Spark branch (2) [Sparch Branch]	Resolved	Venki Korukanti
62.	Enable tests on Spark branch (3) [Sparch Branch]	Resolved	Chengxiang Li
63.	Enable tests on Spark branch (4) [Sparch Branch]	Resolved	Chinna Rao Lalam
64.	Enable map-join tests which Tez executes [Spark Branch]	Resolved	Rui Li
65.	CounterStatsAggregator throws a class cast exception	Resolved	Brock Noland
66.	union_null.q is not deterministic	Closed	Brock Noland
67.	StarterProject: enable groupby4.q [Spark Branch]	Resolved	Suhas Satish
68.	Research commented out unset in Utiltities [Spark Branch]	Resolved	Unassigned
69.	Update union_null results now that it's deterministic [Spark Branch]	Resolved	Brock Noland
70.	Refresh SparkContext when spark configuration changes [Spark Branch]	Resolved	Chinna Rao Lalam
71.	Enable reduce-side join tests (1) [Spark Branch]	Resolved	Szehon Ho
72.	Merge from trunk (1) [Spark Branch]	Resolved	Brock Noland
73.	Re-order spark.query.files in sorted order [Spark Branch]	Resolved	Brock Noland
74.	Build long running HS2 test framework	Closed	Suhas Satish
75.	Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch]	Resolved	Na Yang
76.	Re-enable lazy HiveBaseFunctionResultList [Spark Branch]	Resolved	Jimmy Xiang
77.	Enable qtest load_dyn_part1.q [Spark Branch]	Resolved	Venki Korukanti
78.	orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch]	Resolved	Venki Korukanti
79.	optimize_nullscan.q fails due to differences in explain plan [Spark Branch]	Resolved	Venki Korukanti
80.	Support multiple concurrent users	Resolved	Chengxiang Li
81.	Support subquery [Spark Branch]	Resolved	Xuefu Zhang
82.	enable Qtest scriptfile1.q [Spark Branch]	Resolved	Chengxiang Li
83.	enable sample8.q.[Spark Branch]	Resolved	Chengxiang Li
84.	enable sample10.q.[Spark Branch]	Resolved	Chengxiang Li
85.	Insert overwrite table query does not generate correct task plan [Spark Branch]	Resolved	Na Yang
86.	Research Hive dependency on MR distributed cache[Spark Branch]	Open	Unassigned
87.	Merge from trunk (2) [Spark Branch]	Resolved	Brock Noland
88.	Investigate query failures (1)	Resolved	Thomas Friedrich
89.	Investigate query failures (2)	Resolved	Thomas Friedrich
90.	Investigate query failures (3)	Resolved	Thomas Friedrich
91.	Investigate query failures (4)	Resolved	Thomas Friedrich
92.	Merge from trunk (3) [Spark Branch]	Resolved	Brock Noland
93.	Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch]	Resolved	Rui Li
94.	Fix TestSparkCliDriver => optimize_nullscan.q	Resolved	Brock Noland
95.	Merge trunk into spark 9/12/2014	Resolved	Brock Noland
96.	Enable vectorization for spark [spark branch]	Resolved	Chinna Rao Lalam
97.	Code cleanup after HIVE-8054 [Spark Branch]	Resolved	Na Yang
98.	Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch]	Resolved	Na Yang
99.	Remove obsolete code from SparkWork [Spark Branch]	Resolved	Chao Sun
100.	Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch]	Resolved	Na Yang
101.	Support SMB Join for Hive on Spark [Spark Branch]	Resolved	Szehon Ho
102.	Merge from trunk to spark 9/20/14	Resolved	Brock Noland
103.	clone SparkWork for join optimization	Resolved	Unassigned
104.	GroupByShuffler.java missing apache license header [Spark Branch]	Resolved	Chao Sun
105.	Merge from trunk to spark 9/29/14	Resolved	Xuefu Zhang
106.	Enable windowing.q for spark [Spark Branch]	Resolved	Jimmy Xiang
107.	Merge trunk into spark 10/4/2015 [Spark Branch]	Resolved	Brock Noland
108.	Fix fs_default_name2.q on spark [Spark Branch]	Resolved	Brock Noland
109.	Investigate flaky test parallel.q	Resolved	Jimmy Xiang
110.	TPCDS query #7 fails with IndexOutOfBoundsException [Spark Branch]	Resolved	Jimmy Xiang
111.	Research Bucket Map Join [Spark Branch]	Resolved	Na Yang
112.	Research on skewed join [Spark Branch]	Resolved	Rui Li
113.	Make reduce side join work for all join queries [Spark Branch]	Resolved	Xuefu Zhang
114.	Turn on all join .q tests [Spark Branch]	Resolved	Chao Sun
115.	Print Spark job progress format info on the console[Spark Branch]	Resolved	Chengxiang Li
116.	Support Hive Counter to collect spark job metric[Spark Branch]	Resolved	Chengxiang Li
117.	Update timestamp in status console [Spark Branch]	Resolved	Brock Noland
118.	TPC-DS Query 96 parallelism is not set correcly	Resolved	Chao Sun
119.	Merge trunk into spark 10/17/14 [Spark Branch]	Resolved	Brock Noland
120.	UT: add TestSparkMinimrCliDriver to run UTs that use HDFS	Open	Thomas Friedrich
121.	UT: fix bucket_num_reducers test	Open	Chinna Rao Lalam
122.	UTs: create missing output files for some tests under clientpositive/spark	Open	Thomas Friedrich
123.	UT: add test flag in hive-site.xml for spark tests	Resolved	Thomas Friedrich
124.	UT: fix rcfile_bigdata test [Spark Branch]	Resolved	Chinna Rao Lalam
125.	UT: fix bucketsort_insert tests - related to SMBMapJoinOperator	Resolved	Chinna Rao Lalam
126.	UT: fix list_bucket_dml_2 test [Spark Branch]	Resolved	Chinna Rao Lalam
127.	Update async action in SparkClient as Spark add new Java action API[Spark Branch]	Resolved	Chengxiang Li
128.	Add remote Spark client to Hive [Spark Branch]	Resolved	Marcelo Masiero Vanzin
129.	Enable collect table statistics based on SparkCounter[Spark Branch]	Resolved	Chengxiang Li
130.	HivePairFlatMapFunction.java missing license header [Spark Branch]	Resolved	Chao Sun
131.	Add InterfaceAudience annotations to spark-client [Spark Branch]	Resolved	Marcelo Masiero Vanzin
132.	convert joinOp to MapJoinOp and generate MapWorks only [Spark Branch]	Resolved	Suhas Satish
133.	Implement bucket map join optimization [Spark Branch]	Resolved	Jimmy Xiang
134.	Convert SMBJoin to MapJoin [Spark Branch]	Resolved	Szehon Ho
135.	Support hints of SMBJoin [Spark Branch]	Resolved	Szehon Ho
136.	Reduce Side Join with single reducer [Spark Branch]	Resolved	Szehon Ho
137.	Enable parallelism in Reduce Side Join [Spark Branch]	Resolved	Szehon Ho
138.	Increase level of parallelism in reduce phase [Spark Branch]	Resolved	Jimmy Xiang
139.	Combine Hive Operator statistic and Spark Metric to an uniformed query statistic.[Spark Branch]	Resolved	Chengxiang Li
140.	Result differences after merge [Spark Branch]	Resolved	Brock Noland
141.	Fix tests after merge [Spark Branch]	Resolved	Brock Noland
142.	Enable table statistic collection on counter for CTAS query[Spark Branch]	Resolved	Chengxiang Li
143.	spark-client build failed sometimes.[Spark Branch]	Resolved	Chengxiang Li
144.	Collect Spark TaskMetrics and build job statistic[Spark Branch]	Resolved	Chengxiang Li
145.	Null Pointer Exception when counter is used for stats during inserting overwrite partitioned tables [Spark Branch]	Resolved	Na Yang
146.	numRows and rawDataSize are not collected by the Spark stats [Spark Branch]	Resolved	Na Yang
147.	Investigate test failures related to HIVE-8545 [Spark Branch]	Resolved	Jimmy Xiang
148.	Fix hadoop-1 build [Spark Branch]	Resolved	Jimmy Xiang
149.	Merge from trunk 11/6/14 [SPARK BRANCH]	Resolved	Brock Noland
150.	Should only register used counters in SparkCounters[Spark Branch]	Resolved	Chengxiang Li
151.	insert1.q and ppd_join4.q hangs with hadoop-1 [Spark Branch]	Resolved	Chengxiang Li
152.	Create some tests that use Spark counter for stats collection [Spark Branch]	Resolved	Chengxiang Li
153.	UT: update hive-site.xml for spark UTs to add hive_admin_user to admin role	Resolved	Thomas Friedrich
154.	UT: fix partition test case [Spark Branch]	Resolved	Chinna Rao Lalam
155.	UT: fix udf_context_aware	Resolved	Aihua Xu
156.	UT: fix hook_context_cs test case	Open	Unassigned
157.	Switch precommit test from local to local-cluster [Spark Branch]	Resolved	Szehon Ho
158.	Print prettier Spark work graph after HIVE-8793 [Spark Branch]	Resolved	Jimmy Xiang
159.	Release RDD cache when Hive query is done [Spark Branch]	Resolved	Jimmy Xiang
160.	Choose a persisent policy for RDD caching [Spark Branch]	Resolved	Jimmy Xiang
161.	Hive/Spark/Yarn integration [Spark Branch]	Resolved	Chengxiang Li
162.	Update new spark progress API for local submitted job monitoring [Spark Branch]	Resolved	Rui Li
163.	Visualize generated Spark plan [Spark Branch]	Closed	Chinna Rao Lalam
164.	Downgrade guava version to be consistent with Hive and the rest of Hadoop [Spark Branch]	Open	Unassigned
165.	Fix test TestHiveKVResultCache [Spark Branch]	Resolved	Jimmy Xiang
166.	Use MEMORY_AND_DISK for RDD caching [Spark Branch]	Resolved	Jimmy Xiang
167.	Merge from trunk to spark [Spark Branch]	Resolved	Brock Noland
168.	downgrade guava version for spark branch from 14.0.1 to 11.0.2.[Spark Branch]	Resolved	Chengxiang Li
169.	Servlet classes signer information does not match [Spark branch]	Resolved	Chengxiang Li
170.	IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch]	Resolved	Xuefu Zhang
171.	Remove unnecessary dependency collection task [Spark Branch]	Resolved	Rui Li
172.	Make sure Spark + HS2 work [Spark Branch]	Resolved	Chengxiang Li
173.	Merge from trunk Nov 28 2014	Resolved	Brock Noland
174.	Find thread leak in RSC Tests [Spark Branch]	Resolved	Rui Li
175.	Logging is not configured in spark-submit sub-process	Resolved	Brock Noland
176.	SparkCounter display name is not set correctly[Spark Branch]	Resolved	Chengxiang Li
177.	Clean up temp files of RSC [Spark Branch]	Open	Unassigned
178.	Avoid using SPARK_JAVA_OPTS [Spark Branch]	Resolved	Rui Li
179.	Re-enable remaining tests after HIVE-8970 [Spark Branch]	Resolved	Chao Sun
180.	Enable ppd_join4 [Spark Branch]	Resolved	Chao Sun
181.	Replace akka for remote spark client RPC [Spark Branch]	Resolved	Marcelo Masiero Vanzin
182.	Spark Memory can be formatted string [Spark Branch]	Resolved	Jimmy Xiang
183.	Support multiple mapjoin operators in one work [Spark Branch]	Resolved	Jimmy Xiang
184.	HiveException: Conflict on row inspector for {table}	Resolved	Jimmy Xiang
185.	Choosing right preference between map join and bucket map join [Spark Branch]	Open	Unassigned
186.	Add additional logging to SetSparkReducerParallelism [Spark Branch]	Resolved	Brock Noland
187.	Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch]	Resolved	Chengxiang Li
188.	NPE in RemoteSparkJobStatus.getSparkStatistics [Spark Branch]	Resolved	Rui Li
189.	Generate better plan for queries containing both union and multi-insert [Spark Branch]	Resolved	Chao Sun
190.	Allow RPC Configuration [Spark Branch]	Resolved	Unassigned
191.	Hive should not submit second SparkTask while previous one has failed.[Spark Branch]	Resolved	Chengxiang Li
192.	Hive hangs while failed to get executorCount[Spark Branch]	Resolved	Chengxiang Li
193.	Skip child tasks if parent task failed [Spark Branch]	Resolved	Unassigned
194.	Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch]	Resolved	Jimmy Xiang
195.	Investigate IOContext object initialization problem [Spark Branch]	Resolved	Xuefu Zhang
196.	Spark Client RPC should have larger default max message size [Spark Branch]	Resolved	Brock Noland
197.	Spark counter serialization error in spark.log [Spark Branch]	Resolved	Chengxiang Li
198.	Error when cleaning up in spark.log [Spark Branch]	Open	Unassigned
199.	TimeoutException when trying get executor count from RSC [Spark Branch]	Resolved	Chengxiang Li
200.	Check cross product for conditional task [Spark Branch]	Resolved	Rui Li
201.	infer_bucket_sort_convert_join.q and mapjoin_hook.q failed.[Spark Branch]	Resolved	Xuefu Zhang
202.	bucket_map_join_spark4.q failed due to NPE.[Spark Branch]	Resolved	Jimmy Xiang
203.	Support backup task for join related optimization [Spark Branch]	Patch Available	Chao Sun
204.	windowing.q failed when mapred.reduce.tasks is set to larger than one	Resolved	Chao Sun
205.	Add unit test for multi sessions.[Spark Branch]	Resolved	Chengxiang Li
206.	Enable beeline query progress information for Spark job[Spark Branch]	Resolved	Chengxiang Li
207.	RSC stdout is logged twice [Spark Branch]	Resolved	Jimmy Xiang
208.	Clean up GenSparkProcContext.clonedReduceSinks and related code [Spark Branch]	Closed	Chao Sun
209.	authorization_admin_almighty1.q fails with result diff [Spark Branch]	Resolved	Unassigned
210.	Merge from trunk to spark 12/26/2014 [Spark Branch]	Resolved	Brock Noland
211.	UT: set hive.support.concurrency to true for spark UTs	Open	Unassigned
212.	UT: udf_in_file fails with filenotfoundexception [Spark Branch]	Resolved	Chinna Rao Lalam
213.	Create a separate API for remote Spark Context RPC other than job submission [Spark Branch]	Resolved	Marcelo Masiero Vanzin
214.	Add listeners on JobHandle so job status change can be notified to the client [Spark Branch]	Resolved	Marcelo Masiero Vanzin
215.	TimeOutException when using RSC with beeline [Spark Branch]	Resolved	Unassigned
216.	One-pass SMB Optimizations [Spark Branch]	Resolved	Szehon Ho
217.	Choose Kryo as the serializer for pTest [Spark Branch]	Resolved	Xuefu Zhang
218.	Test windowing.q is failing [Spark Branch]	Resolved	Unassigned
219.	Add more log information for debug RSC[Spark Branch]	Resolved	Chengxiang Li
220.	Spark branch compile failed on hadoop-1[Spark Branch]	Resolved	Chengxiang Li
221.	Research on build mini HoS cluster on YARN for unit test[Spark Branch]	Resolved	Chengxiang Li
222.	Remove authorization_admin_almighty1 from spark tests [Spark Branch]	Resolved	Xuefu Zhang
223.	Investigate differences for auto join tests in explain after merge from trunk [Spark Branch]	Resolved	Chao Sun
224.	Followup for HIVE-9125, update ppd_join4.q.out for Spark [Spark Branch]	Resolved	Xuefu Zhang
225.	Remove tabs from spark code [Spark Branch]	Resolved	Brock Noland
226.	SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch]	Resolved	Rui Li
227.	Merge trunk to spark 1/5/2015 [Spark Branch]	Resolved	Szehon Ho
228.	Merge from spark to trunk January 2015	Resolved	Szehon Ho
229.	Explain query should share the same Spark application with regular queries [Spark Branch]	Resolved	Jimmy Xiang
230.	Ensure custom UDF works with Spark [Spark Branch]	Resolved	Xuefu Zhang
231.	Code cleanup [Spark Branch]	Resolved	Szehon Ho
232.	TODO cleanup task1.[Spark Branch]	Resolved	Chengxiang Li
233.	Cleanup code for getting spark job progress and metrics	Open	Rui Li
234.	Improve replication factor of small table file given big table partitions [Spark branch]	Open	Jimmy Xiang
235.	Set default miniClusterType back to none in QTestUtil.[Spark branch]	Resolved	Chengxiang Li
236.	Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch]	Resolved	Xuefu Zhang
237.	thrift.transport.TTransportException [Spark Branch]	Open	Chao Sun
238.	Cleanup Modified Files [Spark Branch]	Resolved	Szehon Ho
239.	Merge from trunk to spark 1/8/2015	Resolved	Szehon Ho
240.	BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch]	Resolved	Chengxiang Li
241.	Address review items on HIVE-9257 [Spark Branch]	Resolved	Brock Noland
242.	Optimize split grouping for CombineHiveInputFormat [Spark Branch]	Resolved	Jimmy Xiang
243.	Address review of HIVE-9257 (ii) [Spark Branch]	Resolved	Szehon Ho
244.	Fix windowing.q for Spark on trunk	Resolved	Rui Li
245.	Merge from spark to trunk (follow-up of HIVE-9257)	Resolved	Szehon Ho
246.	SparkJobMonitor timeout as sortByKey would launch extra Spark job before original job get submitted [Spark Branch]	Resolved	Chengxiang Li
247.	Fix tests with some versions of Spark + Snappy [Spark Branch]	Resolved	Brock Noland
248.	add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode [Spark Branch]	Resolved	Pierre Yin
249.	Shutting down cli takes quite some time [Spark Branch]	Resolved	Rui Li
250.	Make WAIT_SUBMISSION_TIMEOUT configuable and check timeout in SparkJobMonitor level.[Spark Branch]	Resolved	Chengxiang Li
251.	Avoid ser/de loggers as logging framework can be incompatible on driver and workers	Resolved	Rui Li
252.	ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch]	Resolved	Chengxiang Li
253.	Add jar/file doesn't work with yarn-cluster mode [Spark Branch]	Resolved	Rui Li
254.	Merge trunk to spark 1/21/2015	Resolved	Szehon Ho
255.	Move more hive.spark.* configurations to HiveConf [Spark Branch]	Resolved	Szehon Ho
256.	LocalSparkJobStatus may return failed job as successful [Spark Branch]	Resolved	Rui Li
257.	Push YARN configuration to Spark while deply Spark on YARN [Spark Branch]	Resolved	Chengxiang Li
258.	MapJoin task shouldn't start if HashTableSink task failed [Spark Branch]	Resolved	Unassigned
259.	No error thrown when global limit optimization failed to find enough number of rows [Spark Branch]	Resolved	Rui Li
260.	Make Remote Spark Context secure [Spark Branch]	Resolved	Marcelo Masiero Vanzin
261.	Failed job may not throw exceptions [Spark Branch]	Resolved	Rui Li
262.	Enable CBO related tests [Spark Branch]	Closed	Chinna Rao Lalam
263.	UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch]	Resolved	Chao Sun
264.	Hive reports exception because Hive's Derby version conflicts with Spark's Derby version [Spark Branch]	Patch Available	Pierre Yin
265.	Enable infer_bucket_sort_dyn_part.q for TestMiniSparkOnYarnCliDriver test. [Spark Branch]	Open	Unassigned
266.	SparkSessionImpl calculates wrong number of cores in TestSparkCliDriver [Spark Branch]	Open	Unassigned
267.	Merge trunk to Spark branch 2/2/2015 [Spark Branch]	Resolved	Xuefu Zhang
268.	SHUFFLE_SORT should only be used for order by query [Spark Branch]	Closed	Rui Li
269.	Revert changes in two test configuration files accidentally brought in by HIVE-9552 [Spark Branch]	Resolved	Xuefu Zhang
270.	Enable more unit tests for UNION ALL [Spark Branch]	Closed	Chao Sun
271.	Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]	Resolved	Jimmy Xiang
272.	'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch]	Closed	Rui Li
273.	Improve some qtests	Closed	Rui Li
274.	Support Impersonation [Spark Branch]	Closed	Brock Noland
275.	Address RB comments for HIVE-9425 [Spark Branch]	Closed	Unassigned
276.	Hive on Spark is not as aggressive as MR on map join [Spark Branch]	Resolved	Unassigned
277.	Merge trunk to Spark branch 2/15/2015 [Spark Branch]	Closed	Xuefu Zhang
278.	Upgrade to spark 1.3 [Spark Branch]	Closed	Brock Noland
279.	Print yarn application id to console [Spark Branch]	Closed	Rui Li
280.	Utilize spark.kryo.classesToRegister [Spark Branch]	Closed	Jimmy Xiang
281.	java.lang.NoSuchMethodError occurs during hive query execution which has 'ADD FILE XXXX.jar' sentence	Resolved	Unassigned
282.	Merge trunk to Spark branch 02/27/2015 [Spark Branch]	Closed	Xuefu Zhang
283.	Load spark-defaults.conf from classpath [Spark Branch]	Closed	Brock Noland
284.	Querying parquet tables fails with IllegalStateException [Spark Branch]	Resolved	Unassigned
285.	Print spark job id in history file [spark branch]	Closed	Chinna Rao Lalam
286.	Add jar/file doesn't work with yarn-cluster mode [Spark Branch]	Closed	Rui Li
287.	Merge trunk to Spark branch 3/6/2015 [Spark Branch]	Closed	Xuefu Zhang
288.	New Beeline queries will hang if Beeline terminates improperly [Spark Branch]	Closed	Jimmy Xiang
289.	Avoid Utilities.getMapRedWork for spark [Spark Branch]	Closed	Rui Li
290.	RSC has memory leak while executing multiple queries [Spark Branch]	Closed	Chengxiang Li
291.	HiveInputFormat implementations getsplits may lead to memory leak.[Spark Branch]	Open	Unassigned
292.	Log the information of cached RDD [Spark Branch]	Resolved	Chinna Rao Lalam
293.	Provide more informative stage description in Spark Web UI [Spark Branch]	Open	Unassigned
294.	Improve common join performance [Spark Branch]	Open	Unassigned
295.	Merge trunk to Spark branch 03/27/2015 [Spark Branch]	Closed	Xuefu Zhang
296.	Fix test failures after HIVE-10130 [Spark Branch]	Closed	Chao Sun
297.	Merge Spark branch to master 7/30/2015	Closed	Xuefu Zhang
298.	Implement Hybrid Hybrid Grace Hash Join for Spark Branch [Spark Branch]	Open	Unassigned
299.	Hive on Spark job configuration needs to be logged [Spark Branch]	Closed	Szehon Ho
300.	ParseException issue (Failed to recognize predicate 'user') [Spark Branch]	Resolved	Sivashankar
301.	Merge trunk to spark 4/14/2015 [Spark Branch]	Resolved	Szehon Ho
302.	Fix test failures after last merge from trunk [Spark Branch]	Open	Unassigned
303.	Merge spark to trunk 4/15/2015	Closed	Szehon Ho
304.	Cancel connection when remote Spark driver process has failed [Spark Branch]	Closed	Chao Sun
305.	Enable parallel order by for spark [Spark Branch]	Closed	Rui Li
306.	Hive query should fail when it fails to initialize a session in SetSparkReducerParallelism [Spark Branch]	Closed	Chao Sun
307.	NPE in SparkUtilities::isDedicatedCluster [Spark Branch]	Closed	Rui Li
308.	Dynamic RDD caching optimization for HoS.[Spark Branch]	Closed	Chengxiang Li
309.	Combine equivalent Works for HoS[Spark Branch]	Closed	Chengxiang Li
310.	Followup for HIVE-10550, check performance w.r.t. persistence level [Spark Branch]	Open	GaoLun
311.	Make HIVE-10001 work with Spark [Spark Branch]	Open	Unassigned
312.	Make HIVE-10568 work with Spark [Spark Branch]	Closed	Rui Li
313.	Merge master to Spark branch 7/29/2015 [Spark Branch]	Resolved	Xuefu Zhang
314.	Merge master to Spark branch 6/7/2015 [Spark Branch]	Resolved	Unassigned
315.	HoS can't control number of map tasks for runtime skew join [Spark Branch]	Closed	Rui Li
316.	Upgrade Spark dependency to 1.4 [Spark Branch]	Closed	Rui Li
317.	Hive not able to pass Hive's Kerberos credential to spark-submit process [Spark Branch]	Resolved	Unassigned
318.	Enable more tests for grouping by skewed data [Spark Branch]	Resolved	Mohit Sabharwal
319.	Add more tests for HIVE-10844[Spark Branch]	Closed	GaoLun
320.	Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch]	Closed	Xuefu Zhang
321.	Merge master to Spark branch 6/20/2015 [Spark Branch]	Resolved	Xuefu Zhang
322.	Support multi edge between nodes in SparkPlan[Spark Branch]	Closed	Chengxiang Li
323.	Investigate intermittent failure of join28.q for Spark	Resolved	Mohit Sabharwal
324.	Add support for running negative q-tests [Spark Branch]	Closed	Mohit Sabharwal
325.	HashTableSinkOperator doesn't support vectorization [Spark Branch]	Closed	Rui Li
326.	Support hive.explain.user for Spark	Closed	Sahil Takiar
327.	Query fails when there isn't a comparator for an operator [Spark Branch]	Closed	Rui Li
328.	Enable native vectorized map join for spark [Spark Branch]	Closed	Rui Li
329.	Research on recent failed qtests[Spark Branch]	Resolved	Chengxiang Li
330.	Combine equivalent leaf works in SparkWork [Spark Branch]	Open	Chengxiang Li
331.	Optimization around job submission and adding jars [Spark Branch]	Resolved	Chengxiang Li
332.	Print "Execution completed successfully" as part of spark job info [Spark Branch]	Closed	Ferdinand Xu
333.	Prewarm Hive on Spark containers [Spark Branch]	Closed	Xuefu Zhang
334.	Merge master to Spark branch 9/16/2015 [Spark Branch]	Resolved	Xuefu Zhang
335.	HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch]	Resolved	Unassigned
336.	Merge file doesn't work for ORC table when running on Spark. [Spark Branch]	Closed	Rui Li
337.	Fix test failures after HIVE-11844 [Spark Branch]	Closed	Rui Li
338.	Merge master to Spark branch 10/28/2015 [Spark Branch]	Closed	Xuefu Zhang
339.	Merge master into spark 11/17/2015 [Spark Branch]	Resolved	Xuefu Zhang
340.	[Spark Branch] ClassNotFoundException occurs during query case with group by and UDF defined	Open	Chengxiang Li
341.	NullPointerException thrown by Executors causes job can't be finished	Open	Unassigned

People

Assignee: Xuefu Zhang

Reporter: Xuefu Zhang

Votes: 49

Watchers: 188

Dates

Created: 25/Jun/14 19:56

Updated: 19/Oct/21 12:21

Resolved: 11/Sep/17 21:43