HIVE-7292: Hive on Spark

Details

    Description

      Spark, an open-source data analytics cluster computing framework, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone, yet to take advantage of Hive they still need to have either MapReduce or Tez on their cluster. This initiative will provide users with a new alternative so that they can consolidate their backends.

      Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.

      Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, improving the user experience much as Tez does.

      This is an umbrella JIRA that will cover many upcoming subtasks. A design doc will be attached here shortly and will also be posted on the wiki. Feedback from the community is greatly appreciated!

      Attachments

        1. Hive-on-Spark.pdf (290 kB, Xuefu Zhang)

        Issue Links

          1.
          Refactoring: make Hive reduce side data processing reusable [Spark Branch] Sub-task Reopened Xuefu Zhang
          2.
          Refactoring: make Hive map side data processing reusable [Spark Branch] Sub-task Resolved Xuefu Zhang
          3.
          Create SparkWork [Spark Branch] Sub-task Resolved Xuefu Zhang
          4.
          Create SparkTask [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          5.
          Create SparkCompiler [Spark Branch] Sub-task Resolved Xuefu Zhang
          6.
          Create SparkClient, interface to Spark cluster [Spark Branch] Sub-task Resolved Chengxiang Li
          7.
          Create RDD translator, translating Hive Tables into Spark RDDs [Spark Branch] Sub-task Resolved Rui Li
          8.
          Create SparkShuffler, shuffling data between map-side data processing and reduce-side processing [Spark Branch] Sub-task Resolved Rui Li
          9.
          Create SparkPlan, DAG representation of a Spark job [Spark Branch] Sub-task Resolved Xuefu Zhang
          10.
          Create MapFunction [Spark Branch] Sub-task Resolved Xuefu Zhang
          11.
          Create ReduceFunction [Spark Branch] Sub-task Resolved Xuefu Zhang
          12.
          Create SparkPlanGenerator [Spark Branch] Sub-task Resolved Xuefu Zhang
          13.
          Create a MiniSparkCluster and set up a testing framework [Spark Branch] Sub-task Resolved Rui Li
          14.
          Research into reduce-side join [Spark Branch] Sub-task Resolved Szehon Ho
          15.
          Spark 1.0.1 is released, stop using SNAPSHOT [Spark Branch] Sub-task Resolved Brock Noland
          16.
          Exclude hadoop 1 from spark dep [Spark Branch] Sub-task Resolved Brock Noland
          17.
          Load Spark configuration into Hive driver [Spark Branch] Sub-task Resolved Chengxiang Li
          18.
          Counters, statistics, and metrics [Spark Branch] Sub-task Resolved Chengxiang Li
          19.
          Spark job monitoring and error reporting [Spark Branch] Sub-task Resolved Chengxiang Li
          20.
          Implement pre-commit testing [Spark Branch] Sub-task Resolved Brock Noland
          21.
          Enhance SparkCollector [Spark Branch] Sub-task Resolved Venki Korukanti
          22.
          Enhance HiveReduceFunction's row clustering [Spark Branch] Sub-task Resolved Chao Sun
          23.
          Support Hive's multi-table insert query with Spark [Spark Branch] Sub-task Resolved Chao Sun
          24.
          Support order by and sort by on Spark [Spark Branch] Sub-task Resolved Rui Li
          25.
          Support cluster by and distributed by [Spark Branch] Sub-task Resolved Rui Li
          26.
          Support union all on Spark [Spark Branch] Sub-task Resolved Na Yang
          27.
          StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch] Sub-task Open Unassigned
          28.
          StarterProject: Fix exception handling in POC code [Spark Branch] Sub-task Resolved Chao Sun
          29.
          StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch] Sub-task Resolved Chao Sun
          30.
          Make sure multi-MR queries work [Spark Branch] Sub-task Resolved Chao Sun
          31.
          Support dynamic partitioning [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          32.
          Instantiate SparkClient per user session [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          33.
          Support analyze table [Spark Branch] Sub-task Resolved Chengxiang Li
          34.
          Find solution for closures containing writables [Spark Branch] Sub-task Resolved Unassigned
          35.
          Support Hive TABLESAMPLE [Spark Branch] Sub-task Resolved Chengxiang Li
          36.
          Create TestSparkCliDriver to run test in spark local mode [Spark Branch] Sub-task Resolved Szehon Ho
          37.
          Update to Spark 1.2 [Spark Branch] Sub-task Resolved Brock Noland
          38.
          Implement native HiveMapFunction [Spark Branch] Sub-task Resolved Chengxiang Li
          39.
          Implement native HiveReduceFunction [Spark Branch] Sub-task Resolved Chengxiang Li
          40.
          Start running .q file tests on spark [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          41.
          Fix qtest-spark pom.xml reference to test properties [Spark Branch] Sub-task Resolved Brock Noland
          42.
          Create SparkReporter [Spark Branch] Sub-task Resolved Chengxiang Li
          43.
          Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch] Sub-task Resolved Chao Sun
          44.
          TestSparkCliDriver should not use includeQueryFiles [Spark Branch] Sub-task Resolved Brock Noland
          45.
          Add .q tests coverage for "union all" [Spark Branch] Sub-task Resolved Na Yang
          46.
          Enable q-tests for TABLESAMPLE feature [Spark Branch] Sub-task Resolved Chengxiang Li
          47.
          Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] Sub-task Resolved Chao Sun
          48.
          Enable q-tests for ANALYZE TABLE feature [Spark Branch] Sub-task Resolved Na Yang
          49.
          Add qfile_regex to qtest-spark pom [Spark Branch] Sub-task Resolved Brock Noland
          50.
          Enable timestamp.* tests [Spark Branch] Sub-task Resolved Brock Noland
          51.
          Enable avro* tests [Spark Branch] Sub-task Resolved Brock Noland
          52.
          PTest2 separates test files with spaces while QTestGen uses commas [Spark Branch] Sub-task Resolved Brock Noland
          53.
          Cleanup Reduce operator code [Spark Branch] Sub-task Resolved Rui Li
          54.
          hive.optimize.union.remove does not work properly [Spark Branch] Sub-task Resolved Na Yang
          55.
          Integrate with Spark executor scaling [Spark Branch] Sub-task Resolved Chengxiang Li
          56.
          Research optimization of auto convert join to map join [Spark branch] Sub-task Resolved Suhas Satish
          57.
          Support windowing and analytic functions [Spark Branch] Sub-task Resolved Chengxiang Li
          58.
          Enable windowing and analytic function qtests [Spark Branch] Sub-task Resolved Chengxiang Li
          59.
          Union all query finished with errors [Spark Branch] Sub-task Resolved Rui Li
          60.
          Enable tests on Spark branch (1) [Spark Branch] Sub-task Resolved Brock Noland
          61.
          Enable tests on Spark branch (2) [Spark Branch] Sub-task Resolved Venki Korukanti
          62.
          Enable tests on Spark branch (3) [Spark Branch] Sub-task Resolved Chengxiang Li
          63.
          Enable tests on Spark branch (4) [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          64.
          Enable map-join tests which Tez executes [Spark Branch] Sub-task Resolved Rui Li
          65.
          CounterStatsAggregator throws a class cast exception Sub-task Resolved Brock Noland
          66.
          union_null.q is not deterministic Sub-task Closed Brock Noland
          67.
          StarterProject: enable groupby4.q [Spark Branch] Sub-task Resolved Suhas Satish
          68.
          Research commented-out unset in Utilities [Spark Branch] Sub-task Resolved Unassigned
          69.
          Update union_null results now that it's deterministic [Spark Branch] Sub-task Resolved Brock Noland
          70.
          Refresh SparkContext when spark configuration changes [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          71.
          Enable reduce-side join tests (1) [Spark Branch] Sub-task Resolved Szehon Ho
          72.
          Merge from trunk (1) [Spark Branch] Sub-task Resolved Brock Noland
          73.
          Re-order spark.query.files in sorted order [Spark Branch] Sub-task Resolved Brock Noland
          74.
          Build long running HS2 test framework Sub-task Closed Suhas Satish
          75.
          Insert overwrite table query has strange behavior when hive.optimize.union.remove=true is set [Spark Branch] Sub-task Resolved Na Yang
          76.
          Re-enable lazy HiveBaseFunctionResultList [Spark Branch] Sub-task Resolved Jimmy Xiang
          77.
          Enable qtest load_dyn_part1.q [Spark Branch] Sub-task Resolved Venki Korukanti
          78.
          orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch] Sub-task Resolved Venki Korukanti
          79.
          optimize_nullscan.q fails due to differences in explain plan [Spark Branch] Sub-task Resolved Venki Korukanti
          80.
          Support multiple concurrent users Sub-task Resolved Chengxiang Li
          81.
          Support subquery [Spark Branch] Sub-task Resolved Xuefu Zhang
          82.
          Enable qtest scriptfile1.q [Spark Branch] Sub-task Resolved Chengxiang Li
          83.
          Enable sample8.q [Spark Branch] Sub-task Resolved Chengxiang Li
          84.
          Enable sample10.q [Spark Branch] Sub-task Resolved Chengxiang Li
          85.
          Insert overwrite table query does not generate correct task plan [Spark Branch] Sub-task Resolved Na Yang
          86.
          Research Hive dependency on MR distributed cache [Spark Branch] Sub-task Open Unassigned
          87.
          Merge from trunk (2) [Spark Branch] Sub-task Resolved Brock Noland
          88.
          Investigate query failures (1) Sub-task Resolved Thomas Friedrich
          89.
          Investigate query failures (2) Sub-task Resolved Thomas Friedrich
          90.
          Investigate query failures (3) Sub-task Resolved Thomas Friedrich
          91.
          Investigate query failures (4) Sub-task Resolved Thomas Friedrich
          92.
          Merge from trunk (3) [Spark Branch] Sub-task Resolved Brock Noland
          93.
          Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] Sub-task Resolved Rui Li
          94.
          Fix TestSparkCliDriver => optimize_nullscan.q Sub-task Resolved Brock Noland
          95.
          Merge trunk into spark 9/12/2014 Sub-task Resolved Brock Noland
          96.
          Enable vectorization for spark [spark branch] Sub-task Resolved Chinna Rao Lalam
          97.
          Code cleanup after HIVE-8054 [Spark Branch] Sub-task Resolved Na Yang
          98.
          Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] Sub-task Resolved Na Yang
          99.
          Remove obsolete code from SparkWork [Spark Branch] Sub-task Resolved Chao Sun
          100.
          Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch] Sub-task Resolved Na Yang
          101.
          Support SMB Join for Hive on Spark [Spark Branch] Sub-task Resolved Szehon Ho
          102.
          Merge from trunk to spark 9/20/14 Sub-task Resolved Brock Noland
          103.
          clone SparkWork for join optimization Sub-task Resolved Unassigned
          104.
          GroupByShuffler.java missing apache license header [Spark Branch] Sub-task Resolved Chao Sun
          105.
          Merge from trunk to spark 9/29/14 Sub-task Resolved Xuefu Zhang
          106.
          Enable windowing.q for spark [Spark Branch] Sub-task Resolved Jimmy Xiang
          107.
          Merge trunk into spark 10/4/2015 [Spark Branch] Sub-task Resolved Brock Noland
          108.
          Fix fs_default_name2.q on spark [Spark Branch] Sub-task Resolved Brock Noland
          109.
          Investigate flaky test parallel.q Sub-task Resolved Jimmy Xiang
          110.
          TPCDS query #7 fails with IndexOutOfBoundsException [Spark Branch] Sub-task Resolved Jimmy Xiang
          111.
          Research Bucket Map Join [Spark Branch] Sub-task Resolved Na Yang
          112.
          Research on skewed join [Spark Branch] Sub-task Resolved Rui Li
          113.
          Make reduce side join work for all join queries [Spark Branch] Sub-task Resolved Xuefu Zhang
          114.
          Turn on all join .q tests [Spark Branch] Sub-task Resolved Chao Sun
          115.
          Print Spark job progress format info on the console [Spark Branch] Sub-task Resolved Chengxiang Li
          116.
          Support Hive Counter to collect Spark job metrics [Spark Branch] Sub-task Resolved Chengxiang Li
          117.
          Update timestamp in status console [Spark Branch] Sub-task Resolved Brock Noland
          118.
          TPC-DS Query 96 parallelism is not set correctly Sub-task Resolved Chao Sun
          119.
          Merge trunk into spark 10/17/14 [Spark Branch] Sub-task Resolved Brock Noland
          120.
          UT: add TestSparkMinimrCliDriver to run UTs that use HDFS Sub-task Open Thomas Friedrich
          121.
          UT: fix bucket_num_reducers test Sub-task Open Chinna Rao Lalam
          122.
          UTs: create missing output files for some tests under clientpositive/spark Sub-task Open Thomas Friedrich
          123.
          UT: add test flag in hive-site.xml for spark tests Sub-task Resolved Thomas Friedrich
          124.
          UT: fix rcfile_bigdata test [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          125.
          UT: fix bucketsort_insert tests - related to SMBMapJoinOperator Sub-task Resolved Chinna Rao Lalam
          126.
          UT: fix list_bucket_dml_2 test [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          127.
          Update async action in SparkClient as Spark adds new Java action API [Spark Branch] Sub-task Resolved Chengxiang Li
          128.
          Add remote Spark client to Hive [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          129.
          Enable collecting table statistics based on SparkCounter [Spark Branch] Sub-task Resolved Chengxiang Li
          130.
          HivePairFlatMapFunction.java missing license header [Spark Branch] Sub-task Resolved Chao Sun
          131.
          Add InterfaceAudience annotations to spark-client [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          132.
          convert joinOp to MapJoinOp and generate MapWorks only [Spark Branch] Sub-task Resolved Suhas Satish
          133.
          Implement bucket map join optimization [Spark Branch] Sub-task Resolved Jimmy Xiang
          134.
          Convert SMBJoin to MapJoin [Spark Branch] Sub-task Resolved Szehon Ho
          135.
          Support hints of SMBJoin [Spark Branch] Sub-task Resolved Szehon Ho
          136.
          Reduce Side Join with single reducer [Spark Branch] Sub-task Resolved Szehon Ho
          137.
          Enable parallelism in Reduce Side Join [Spark Branch] Sub-task Resolved Szehon Ho
          138.
          Increase level of parallelism in reduce phase [Spark Branch] Sub-task Resolved Jimmy Xiang
          139.
          Combine Hive Operator statistics and Spark Metrics into a unified query statistic [Spark Branch] Sub-task Resolved Chengxiang Li
          140.
          Result differences after merge [Spark Branch] Sub-task Resolved Brock Noland
          141.
          Fix tests after merge [Spark Branch] Sub-task Resolved Brock Noland
          142.
          Enable table statistic collection on counter for CTAS query [Spark Branch] Sub-task Resolved Chengxiang Li
          143.
          spark-client build sometimes fails [Spark Branch] Sub-task Resolved Chengxiang Li
          144.
          Collect Spark TaskMetrics and build job statistic [Spark Branch] Sub-task Resolved Chengxiang Li
          145.
          Null Pointer Exception when counter is used for stats during insert overwrite of partitioned tables [Spark Branch] Sub-task Resolved Na Yang
          146.
          numRows and rawDataSize are not collected by the Spark stats [Spark Branch] Sub-task Resolved Na Yang
          147.
          Investigate test failures related to HIVE-8545 [Spark Branch] Sub-task Resolved Jimmy Xiang
          148.
          Fix hadoop-1 build [Spark Branch] Sub-task Resolved Jimmy Xiang
          149.
          Merge from trunk 11/6/14 [SPARK BRANCH] Sub-task Resolved Brock Noland
          150.
          Should only register used counters in SparkCounters [Spark Branch] Sub-task Resolved Chengxiang Li
          151.
          insert1.q and ppd_join4.q hang with hadoop-1 [Spark Branch] Sub-task Resolved Chengxiang Li
          152.
          Create some tests that use Spark counter for stats collection [Spark Branch] Sub-task Resolved Chengxiang Li
          153.
          UT: update hive-site.xml for spark UTs to add hive_admin_user to admin role Sub-task Resolved Thomas Friedrich
          154.
          UT: fix partition test case [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          155.
          UT: fix udf_context_aware Sub-task Resolved Aihua Xu
          156.
          UT: fix hook_context_cs test case Sub-task Open Unassigned
          157.
          Switch precommit test from local to local-cluster [Spark Branch] Sub-task Resolved Szehon Ho
          158.
          Print prettier Spark work graph after HIVE-8793 [Spark Branch] Sub-task Resolved Jimmy Xiang
          159.
          Release RDD cache when Hive query is done [Spark Branch] Sub-task Resolved Jimmy Xiang
          160.
          Choose a persistence policy for RDD caching [Spark Branch] Sub-task Resolved Jimmy Xiang
          161.
          Hive/Spark/Yarn integration [Spark Branch] Sub-task Resolved Chengxiang Li
          162.
          Update new spark progress API for local submitted job monitoring [Spark Branch] Sub-task Resolved Rui Li
          163.
          Visualize generated Spark plan [Spark Branch] Sub-task Closed Chinna Rao Lalam
          164.
          Downgrade guava version to be consistent with Hive and the rest of Hadoop [Spark Branch] Sub-task Open Unassigned
          165.
          Fix test TestHiveKVResultCache [Spark Branch] Sub-task Resolved Jimmy Xiang
          166.
          Use MEMORY_AND_DISK for RDD caching [Spark Branch] Sub-task Resolved Jimmy Xiang
          167.
          Merge from trunk to spark [Spark Branch] Sub-task Resolved Brock Noland
          168.
          Downgrade Guava version for Spark branch from 14.0.1 to 11.0.2 [Spark Branch] Sub-task Resolved Chengxiang Li
          169.
          Servlet classes signer information does not match [Spark branch] Sub-task Resolved Chengxiang Li
          170.
          IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] Sub-task Resolved Xuefu Zhang
          171.
          Remove unnecessary dependency collection task [Spark Branch] Sub-task Resolved Rui Li
          172.
          Make sure Spark + HS2 work [Spark Branch] Sub-task Resolved Chengxiang Li
          173.
          Merge from trunk Nov 28 2014 Sub-task Resolved Brock Noland
          174.
          Find thread leak in RSC Tests [Spark Branch] Sub-task Resolved Rui Li
          175.
          Logging is not configured in spark-submit sub-process Sub-task Resolved Brock Noland
          176.
          SparkCounter display name is not set correctly [Spark Branch] Sub-task Resolved Chengxiang Li
          177.
          Clean up temp files of RSC [Spark Branch] Sub-task Open Unassigned
          178.
          Avoid using SPARK_JAVA_OPTS [Spark Branch] Sub-task Resolved Rui Li
          179.
          Re-enable remaining tests after HIVE-8970 [Spark Branch] Sub-task Resolved Chao Sun
          180.
          Enable ppd_join4 [Spark Branch] Sub-task Resolved Chao Sun
          181.
          Replace akka for remote spark client RPC [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          182.
          Spark memory can be a formatted string [Spark Branch] Sub-task Resolved Jimmy Xiang
          183.
          Support multiple mapjoin operators in one work [Spark Branch] Sub-task Resolved Jimmy Xiang
          184.
          HiveException: Conflict on row inspector for {table} Sub-task Resolved Jimmy Xiang
          185.
          Choosing right preference between map join and bucket map join [Spark Branch] Sub-task Open Unassigned
          186.
          Add additional logging to SetSparkReducerParallelism [Spark Branch] Sub-task Resolved Brock Noland
          187.
          Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] Sub-task Resolved Chengxiang Li
          188.
          NPE in RemoteSparkJobStatus.getSparkStatistics [Spark Branch] Sub-task Resolved Rui Li
          189.
          Generate better plan for queries containing both union and multi-insert [Spark Branch] Sub-task Resolved Chao Sun
          190.
          Allow RPC Configuration [Spark Branch] Sub-task Resolved Unassigned
          191.
          Hive should not submit a second SparkTask when the previous one has failed [Spark Branch] Sub-task Resolved Chengxiang Li
          192.
          Hive hangs when it fails to get executorCount [Spark Branch] Sub-task Resolved Chengxiang Li
          193.
          Skip child tasks if parent task failed [Spark Branch] Sub-task Resolved Unassigned
          194.
          Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch] Sub-task Resolved Jimmy Xiang
          195.
          Investigate IOContext object initialization problem [Spark Branch] Sub-task Resolved Xuefu Zhang
          196.
          Spark Client RPC should have larger default max message size [Spark Branch] Sub-task Resolved Brock Noland
          197.
          Spark counter serialization error in spark.log [Spark Branch] Sub-task Resolved Chengxiang Li
          198.
          Error when cleaning up in spark.log [Spark Branch] Sub-task Open Unassigned
          199.
          TimeoutException when trying to get executor count from RSC [Spark Branch] Sub-task Resolved Chengxiang Li
          200.
          Check cross product for conditional task [Spark Branch] Sub-task Resolved Rui Li
          201.
          infer_bucket_sort_convert_join.q and mapjoin_hook.q failed [Spark Branch] Sub-task Resolved Xuefu Zhang
          202.
          bucket_map_join_spark4.q failed due to NPE [Spark Branch] Sub-task Resolved Jimmy Xiang
          203.
          Support backup task for join related optimization [Spark Branch] Sub-task Patch Available Chao Sun
          204.
          windowing.q failed when mapred.reduce.tasks is set to larger than one Sub-task Resolved Chao Sun
          205.
          Add unit test for multiple sessions [Spark Branch] Sub-task Resolved Chengxiang Li
          206.
          Enable beeline query progress information for Spark job [Spark Branch] Sub-task Resolved Chengxiang Li
          207.
          RSC stdout is logged twice [Spark Branch] Sub-task Resolved Jimmy Xiang
          208.
          Clean up GenSparkProcContext.clonedReduceSinks and related code [Spark Branch] Sub-task Closed Chao Sun
          209.
          authorization_admin_almighty1.q fails with result diff [Spark Branch] Sub-task Resolved Unassigned
          210.
          Merge from trunk to spark 12/26/2014 [Spark Branch] Sub-task Resolved Brock Noland
          211.
          UT: set hive.support.concurrency to true for spark UTs Sub-task Open Unassigned
          212.
          UT: udf_in_file fails with filenotfoundexception [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          213.
          Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          214.
          Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          215.
          TimeOutException when using RSC with beeline [Spark Branch] Sub-task Resolved Unassigned
          216.
          One-pass SMB Optimizations [Spark Branch] Sub-task Resolved Szehon Ho
          217.
          Choose Kryo as the serializer for pTest [Spark Branch] Sub-task Resolved Xuefu Zhang
          218.
          Test windowing.q is failing [Spark Branch] Sub-task Resolved Unassigned
          219.
          Add more log information for debugging RSC [Spark Branch] Sub-task Resolved Chengxiang Li
          220.
          Spark branch compile failed on hadoop-1 [Spark Branch] Sub-task Resolved Chengxiang Li
          221.
          Research on building a mini HoS cluster on YARN for unit tests [Spark Branch] Sub-task Resolved Chengxiang Li
          222.
          Remove authorization_admin_almighty1 from spark tests [Spark Branch] Sub-task Resolved Xuefu Zhang
          223.
          Investigate differences for auto join tests in explain after merge from trunk [Spark Branch] Sub-task Resolved Chao Sun
          224.
          Followup for HIVE-9125, update ppd_join4.q.out for Spark [Spark Branch] Sub-task Resolved Xuefu Zhang
          225.
          Remove tabs from spark code [Spark Branch] Sub-task Resolved Brock Noland
          226.
          SetSparkReducerParallelism is likely to set too small a number of reducers [Spark Branch] Sub-task Resolved Rui Li
          227.
          Merge trunk to spark 1/5/2015 [Spark Branch] Sub-task Resolved Szehon Ho
          228.
          Merge from spark to trunk January 2015 Sub-task Resolved Szehon Ho
          229.
          Explain query should share the same Spark application with regular queries [Spark Branch] Sub-task Resolved Jimmy Xiang
          230.
          Ensure custom UDF works with Spark [Spark Branch] Sub-task Resolved Xuefu Zhang
          231.
          Code cleanup [Spark Branch] Sub-task Resolved Szehon Ho
          232.
          TODO cleanup task1 [Spark Branch] Sub-task Resolved Chengxiang Li
          233.
          Cleanup code for getting spark job progress and metrics Sub-task Open Rui Li
          234.
          Improve replication factor of small table file given big table partitions [Spark branch] Sub-task Open Jimmy Xiang
          235.
          Set default miniClusterType back to none in QTestUtil [Spark Branch] Sub-task Resolved Chengxiang Li
          236.
          Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch] Sub-task Resolved Xuefu Zhang
          237.
          thrift.transport.TTransportException [Spark Branch] Sub-task Open Chao Sun
          238.
          Cleanup Modified Files [Spark Branch] Sub-task Resolved Szehon Ho
          239.
          Merge from trunk to spark 1/8/2015 Sub-task Resolved Szehon Ho
          240.
          BaseProtocol.Error failed to deserialize due to NPE [Spark Branch] Sub-task Resolved Chengxiang Li
          241.
          Address review items on HIVE-9257 [Spark Branch] Sub-task Resolved Brock Noland
          242.
          Optimize split grouping for CombineHiveInputFormat [Spark Branch] Sub-task Resolved Jimmy Xiang
          243.
          Address review of HIVE-9257 (ii) [Spark Branch] Sub-task Resolved Szehon Ho
          244.
          Fix windowing.q for Spark on trunk Sub-task Resolved Rui Li
          245.
          Merge from spark to trunk (follow-up of HIVE-9257) Sub-task Resolved Szehon Ho
          246.
          SparkJobMonitor timeout as sortByKey would launch an extra Spark job before the original job gets submitted [Spark Branch] Sub-task Resolved Chengxiang Li
          247.
          Fix tests with some versions of Spark + Snappy [Spark Branch] Sub-task Resolved Brock Noland
          248.
          add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode [Spark Branch] Sub-task Resolved Pierre Yin
          249.
          Shutting down cli takes quite some time [Spark Branch] Sub-task Resolved Rui Li
          250.
          Make WAIT_SUBMISSION_TIMEOUT configurable and check timeout at SparkJobMonitor level [Spark Branch] Sub-task Resolved Chengxiang Li
          251.
          Avoid ser/de loggers as logging framework can be incompatible on driver and workers Sub-task Resolved Rui Li
          252.
          ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch] Sub-task Resolved Chengxiang Li
          253.
          Add jar/file doesn't work with yarn-cluster mode [Spark Branch] Sub-task Resolved Rui Li
          254.
          Merge trunk to spark 1/21/2015 Sub-task Resolved Szehon Ho
          255.
          Move more hive.spark.* configurations to HiveConf [Spark Branch] Sub-task Resolved Szehon Ho
          256.
          LocalSparkJobStatus may return failed job as successful [Spark Branch] Sub-task Resolved Rui Li
          257.
          Push YARN configuration to Spark when deploying Spark on YARN [Spark Branch] Sub-task Resolved Chengxiang Li
          258.
          MapJoin task shouldn't start if HashTableSink task failed [Spark Branch] Sub-task Resolved Unassigned
          259.
          No error thrown when global limit optimization fails to find enough rows [Spark Branch] Sub-task Resolved Rui Li
          260.
          Make Remote Spark Context secure [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin
          261.
          Failed job may not throw exceptions [Spark Branch] Sub-task Resolved Rui Li
          262.
          Enable CBO related tests [Spark Branch] Sub-task Closed Chinna Rao Lalam
          263.
          UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch] Sub-task Resolved Chao Sun
          264.
          Hive reported exception because Hive's Derby version conflicts with Spark's Derby version [Spark Branch] Sub-task Patch Available Pierre Yin
          265.
          Enable infer_bucket_sort_dyn_part.q for TestMiniSparkOnYarnCliDriver test. [Spark Branch] Sub-task Open Unassigned
          266.
          SparkSessionImpl calculates wrong number of cores in TestSparkCliDriver [Spark Branch] Sub-task Open Unassigned
          267.
          Merge trunk to Spark branch 2/2/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang
          268.
          SHUFFLE_SORT should only be used for order by query [Spark Branch] Sub-task Closed Rui Li
          269.
          Revert changes in two test configuration files accidentally brought in by HIVE-9552 [Spark Branch] Sub-task Resolved Xuefu Zhang
          270.
          Enable more unit tests for UNION ALL [Spark Branch] Sub-task Closed Chao Sun
          271.
          Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch] Sub-task Resolved Jimmy Xiang
          272.
          'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] Sub-task Closed Rui Li
          273.
          Improve some qtests Sub-task Closed Rui Li
          274.
          Support Impersonation [Spark Branch] Sub-task Closed Brock Noland
          275.
          Address RB comments for HIVE-9425 [Spark Branch] Sub-task Closed Unassigned
          276.
          Hive on Spark is not as aggressive as MR on map join [Spark Branch] Sub-task Resolved Unassigned
          277.
          Merge trunk to Spark branch 2/15/2015 [Spark Branch] Sub-task Closed Xuefu Zhang
          278.
          Upgrade to spark 1.3 [Spark Branch] Sub-task Closed Brock Noland
          279.
          Print yarn application id to console [Spark Branch] Sub-task Closed Rui Li
          280.
          Utilize spark.kryo.classesToRegister [Spark Branch] Sub-task Closed Jimmy Xiang
          281.
          java.lang.NoSuchMethodError occurs during execution of a Hive query containing an 'ADD FILE XXXX.jar' statement Sub-task Resolved Unassigned
          282.
          Merge trunk to Spark branch 02/27/2015 [Spark Branch] Sub-task Closed Xuefu Zhang
          283.
          Load spark-defaults.conf from classpath [Spark Branch] Sub-task Closed Brock Noland
          284.
          Querying parquet tables fails with IllegalStateException [Spark Branch] Sub-task Resolved Unassigned
          285.
          Print Spark job ID in history file [Spark Branch] Sub-task Closed Chinna Rao Lalam
          286.
          Add jar/file doesn't work with yarn-cluster mode [Spark Branch] Sub-task Closed Rui Li
          287.
          Merge trunk to Spark branch 3/6/2015 [Spark Branch] Sub-task Closed Xuefu Zhang
          288.
          New Beeline queries will hang if Beeline terminates improperly [Spark Branch] Sub-task Closed Jimmy Xiang
          289.
          Avoid Utilities.getMapRedWork for spark [Spark Branch] Sub-task Closed Rui Li
          290.
          RSC has a memory leak while executing multiple queries [Spark Branch] Sub-task Closed Chengxiang Li
          291.
          getSplits in HiveInputFormat implementations may lead to a memory leak [Spark Branch] Sub-task Open Unassigned
          292.
          Log the information of cached RDD [Spark Branch] Sub-task Resolved Chinna Rao Lalam
          293.
          Provide more informative stage description in Spark Web UI [Spark Branch] Sub-task Open Unassigned
          294.
          Improve common join performance [Spark Branch] Sub-task Open Unassigned
          295.
          Merge trunk to Spark branch 03/27/2015 [Spark Branch] Sub-task Closed Xuefu Zhang
          296.
          Fix test failures after HIVE-10130 [Spark Branch] Sub-task Closed Chao Sun
          297.
          Merge Spark branch to master 7/30/2015 Sub-task Closed Xuefu Zhang
          298.
          Implement Hybrid Hybrid Grace Hash Join for Spark Branch [Spark Branch] Sub-task Open Unassigned
          299.
          Hive on Spark job configuration needs to be logged [Spark Branch] Sub-task Closed Szehon Ho
          300.
          ParseException issue (Failed to recognize predicate 'user') [Spark Branch] Sub-task Resolved Sivashankar
          301.
          Merge trunk to spark 4/14/2015 [Spark Branch] Sub-task Resolved Szehon Ho
          302.
          Fix test failures after last merge from trunk [Spark Branch] Sub-task Open Unassigned
          303.
          Merge spark to trunk 4/15/2015 Sub-task Closed Szehon Ho
          304.
          Cancel connection when remote Spark driver process has failed [Spark Branch] Sub-task Closed Chao Sun
          305.
          Enable parallel order by for spark [Spark Branch] Sub-task Closed Rui Li
          306.
          Hive query should fail when it fails to initialize a session in SetSparkReducerParallelism [Spark Branch] Sub-task Closed Chao Sun
          307.
          NPE in SparkUtilities::isDedicatedCluster [Spark Branch] Sub-task Closed Rui Li
          308.
          Dynamic RDD caching optimization for HoS [Spark Branch] Sub-task Closed Chengxiang Li
          309.
          Combine equivalent Works for HoS [Spark Branch] Sub-task Closed Chengxiang Li
          310.
          Followup for HIVE-10550, check performance w.r.t. persistence level [Spark Branch] Sub-task Open GaoLun
          311.
          Make HIVE-10001 work with Spark [Spark Branch] Sub-task Open Unassigned
          312.
          Make HIVE-10568 work with Spark [Spark Branch] Sub-task Closed Rui Li
          313.
          Merge master to Spark branch 7/29/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang
          314.
          Merge master to Spark branch 6/7/2015 [Spark Branch] Sub-task Resolved Unassigned
          315.
          HoS can't control number of map tasks for runtime skew join [Spark Branch] Sub-task Closed Rui Li
          316.
          Upgrade Spark dependency to 1.4 [Spark Branch] Sub-task Closed Rui Li
          317.
          Hive not able to pass Hive's Kerberos credential to spark-submit process [Spark Branch] Sub-task Resolved Unassigned
          318.
          Enable more tests for grouping by skewed data [Spark Branch] Sub-task Resolved Mohit Sabharwal
          319.
          Add more tests for HIVE-10844 [Spark Branch] Sub-task Closed GaoLun
          320.
          Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch] Sub-task Closed Xuefu Zhang
          321.
          Merge master to Spark branch 6/20/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang
          322.
          Support multiple edges between nodes in SparkPlan [Spark Branch] Sub-task Closed Chengxiang Li
          323.
          Investigate intermittent failure of join28.q for Spark Sub-task Resolved Mohit Sabharwal
          324.
          Add support for running negative q-tests [Spark Branch] Sub-task Closed Mohit Sabharwal
          325.
          HashTableSinkOperator doesn't support vectorization [Spark Branch] Sub-task Closed Rui Li
          326.
          Support hive.explain.user for Spark Sub-task Closed Sahil Takiar
          327.
          Query fails when there isn't a comparator for an operator [Spark Branch] Sub-task Closed Rui Li
          328.
          Enable native vectorized map join for spark [Spark Branch] Sub-task Closed Rui Li
          329.
          Research on recent failed qtests [Spark Branch] Sub-task Resolved Chengxiang Li
          330.
          Combine equivalent leaf works in SparkWork [Spark Branch] Sub-task Open Chengxiang Li
          331.
          Optimization around job submission and adding jars [Spark Branch] Sub-task Resolved Chengxiang Li
          332.
          Print "Execution completed successfully" as part of spark job info [Spark Branch] Sub-task Closed Ferdinand Xu
          333.
          Prewarm Hive on Spark containers [Spark Branch] Sub-task Closed Xuefu Zhang
          334.
          Merge master to Spark branch 9/16/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang
          335.
          HiveException (Failed to close AbstractFileMergeOperator) occurs while loading data into an ORC file when hive.merge.sparkfiles is set to true [Spark Branch] Sub-task Resolved Unassigned
          336.
          Merge file doesn't work for ORC table when running on Spark. [Spark Branch] Sub-task Closed Rui Li
          337.
          Fix test failures after HIVE-11844 [Spark Branch] Sub-task Closed Rui Li
          338.
          Merge master to Spark branch 10/28/2015 [Spark Branch] Sub-task Closed Xuefu Zhang
          339.
          Merge master into spark 11/17/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang
          340.
          [Spark Branch] ClassNotFoundException occurs during query case with group by and UDF defined Sub-task Open Chengxiang Li
          341.
          NullPointerException thrown by executors prevents the job from finishing Sub-task Open Unassigned


            People

              Assignee: Xuefu Zhang
              Reporter: Xuefu Zhang
              Votes: 49
              Watchers: 188
