HIVE-7292: Hive on Spark


Details

    Description

      Spark, an open-source cluster computing framework for data analytics, has gained significant momentum recently. Many Hive users already have Spark installed as their computing backbone. To take advantage of Hive, they still need either MapReduce or Tez on their cluster. This initiative will give those users a new alternative so that they can consolidate their backend.

      Secondly, providing such an alternative further increases Hive's adoption, as it exposes Spark users to a viable, feature-rich, de facto standard SQL tool on Hadoop.

      Finally, allowing Hive to run on Spark also has performance benefits. Hive queries, especially those involving multiple reducer stages, will run faster, improving the user experience just as Tez does.

      This is an umbrella JIRA that will cover many upcoming subtasks. A design doc will be attached here shortly and will be on the wiki as well. Feedback from the community is greatly appreciated!

      Attachments

        1. Hive-on-Spark.pdf
          290 kB
          Xuefu Zhang

        Issue Links

        1.
        Refactoring: make Hive reduce side data processing reusable [Spark Branch] Sub-task Reopened Xuefu Zhang Actions
        2.
        Refactoring: make Hive map side data processing reusable [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        3.
        Create SparkWork [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        4.
        Create SparkTask [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        5.
        Create SparkCompiler [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        6.
        Create SparkClient, interface to Spark cluster [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        7.
        Create RDD translator, translating Hive Tables into Spark RDDs [Spark Branch] Sub-task Resolved Rui Li Actions
        8.
        Create SparkShuffler, shuffling data between map-side data processing and reduce-side processing [Spark Branch] Sub-task Resolved Rui Li Actions
        9.
        Create SparkPlan, DAG representation of a Spark job [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        10.
        Create MapFunction [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        11.
        Create ReduceFunction [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        12.
        Create SparkPlanGenerator [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        13.
        Create a MiniSparkCluster and set up a testing framework [Spark Branch] Sub-task Resolved Rui Li Actions
        14.
        Research into reduce-side join [Spark Branch] Sub-task Resolved Szehon Ho Actions
        15.
        Spark 1.0.1 is released, stop using SNAPSHOT [Spark Branch] Sub-task Resolved Brock Noland Actions
        16.
        Exclude hadoop 1 from spark dep [Spark Branch] Sub-task Resolved Brock Noland Actions
        17.
        Load Spark configuration into Hive driver [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        18.
        Counters, statistics, and metrics [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        19.
        Spark job monitoring and error reporting [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        20.
        Implement pre-commit testing [Spark Branch] Sub-task Resolved Brock Noland Actions
        21.
        Enhance SparkCollector [Spark Branch] Sub-task Resolved Venki Korukanti Actions
        22.
        Enhance HiveReduceFunction's row clustering [Spark Branch] Sub-task Resolved Chao Sun Actions
        23.
        Support Hive's multi-table insert query with Spark [Spark Branch] Sub-task Resolved Chao Sun Actions
        24.
        Support order by and sort by on Spark [Spark Branch] Sub-task Resolved Rui Li Actions
        25.
        Support cluster by and distributed by [Spark Branch] Sub-task Resolved Rui Li Actions
        26.
        Support union all on Spark [Spark Branch] Sub-task Resolved Na Yang Actions
        27.
        StarterProject: Move configuration from SparkClient to HiveConf [Spark Branch] Sub-task Open Unassigned Actions
        28.
        StarterProject: Fix exception handling in POC code [Spark Branch] Sub-task Resolved Chao Sun Actions
        29.
        StarterProject: Move from assert to Guava Preconditions.* in Hive on Spark [Spark Branch] Sub-task Resolved Chao Sun Actions
        30.
        Make sure multi-MR queries work [Spark Branch] Sub-task Resolved Chao Sun Actions
        31.
        Support dynamic partitioning [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        32.
        Instantiate SparkClient per user session [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        33.
        Support analyze table [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        34.
        Find solution for closures containing writables [Spark Branch] Sub-task Resolved Unassigned Actions
        35.
        Support Hive TABLESAMPLE [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        36.
        Create TestSparkCliDriver to run test in spark local mode [Spark Branch] Sub-task Resolved Szehon Ho Actions
        37.
        Update to Spark 1.2 [Spark Branch] Sub-task Resolved Brock Noland Actions
        38.
        Implement native HiveMapFunction [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        39.
        Implement native HiveReduceFunction [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        40.
        Start running .q file tests on spark [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        41.
        Fix qtest-spark pom.xml reference to test properties [Spark Branch] Sub-task Resolved Brock Noland Actions
        42.
        Create SparkReporter [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        43.
        Incorrect result returned when a map work has multiple downstream reduce works [Spark Branch] Sub-task Resolved Chao Sun Actions
        44.
        TestSparkCliDriver should not use includeQueryFiles [Spark Branch] Sub-task Resolved Brock Noland Actions
        45.
        Add .q tests coverage for "union all" [Spark Branch] Sub-task Resolved Na Yang Actions
        46.
        Enable q-tests for TABLESAMPLE feature [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        47.
        Research to find out if it's possible to submit Spark jobs concurrently using shared SparkContext [Spark Branch] Sub-task Resolved Chao Sun Actions
        48.
        Enable q-tests for ANALYZE TABLE feature [Spark Branch] Sub-task Resolved Na Yang Actions
        49.
        Add qfile_regex to qtest-spark pom [Spark Branch] Sub-task Resolved Brock Noland Actions
        50.
        Enable timestamp.* tests [Spark Branch] Sub-task Resolved Brock Noland Actions
        51.
        Enable avro* tests [Spark Branch] Sub-task Resolved Brock Noland Actions
        52.
        PTest2 separates test files with spaces while QTestGen uses commas [Spark Branch] Sub-task Resolved Brock Noland Actions
        53.
        Cleanup Reduce operator code [Spark Branch] Sub-task Resolved Rui Li Actions
        54.
        hive.optimize.union.remove does not work properly [Spark Branch] Sub-task Resolved Na Yang Actions
        55.
        Integrate with Spark executor scaling [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        56.
        Research optimization of auto convert join to map join [Spark branch] Sub-task Resolved Suhas Satish Actions
        57.
        Support windowing and analytic functions [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        58.
        Enable windowing and analytic function qtests [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        59.
        Union all query finished with errors [Spark Branch] Sub-task Resolved Rui Li Actions
        60.
        Enable tests on Spark branch (1) [Sparch Branch] Sub-task Resolved Brock Noland Actions
        61.
        Enable tests on Spark branch (2) [Sparch Branch] Sub-task Resolved Venki Korukanti Actions
        62.
        Enable tests on Spark branch (3) [Sparch Branch] Sub-task Resolved Chengxiang Li Actions
        63.
        Enable tests on Spark branch (4) [Sparch Branch] Sub-task Resolved Chinna Rao Lalam Actions
        64.
        Enable map-join tests which Tez executes [Spark Branch] Sub-task Resolved Rui Li Actions
        65.
        CounterStatsAggregator throws a class cast exception Sub-task Resolved Brock Noland Actions
        66.
        union_null.q is not deterministic Sub-task Closed Brock Noland Actions
        67.
        StarterProject: enable groupby4.q [Spark Branch] Sub-task Resolved Suhas Satish Actions
        68.
        Research commented out unset in Utiltities [Spark Branch] Sub-task Resolved Unassigned Actions
        69.
        Update union_null results now that it's deterministic [Spark Branch] Sub-task Resolved Brock Noland Actions
        70.
        Refresh SparkContext when spark configuration changes [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        71.
        Enable reduce-side join tests (1) [Spark Branch] Sub-task Resolved Szehon Ho Actions
        72.
        Merge from trunk (1) [Spark Branch] Sub-task Resolved Brock Noland Actions
        73.
        Re-order spark.query.files in sorted order [Spark Branch] Sub-task Resolved Brock Noland Actions
        74.
        Build long running HS2 test framework Sub-task Closed Suhas Satish Actions
        75.
        Insert overwrite table query has strange behavior when set hive.optimize.union.remove=true [Spark Branch] Sub-task Resolved Na Yang Actions
        76.
        Re-enable lazy HiveBaseFunctionResultList [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        77.
        Enable qtest load_dyn_part1.q [Spark Branch] Sub-task Resolved Venki Korukanti Actions
        78.
        orc_analyze.q fails due to random mapred.task.id in FileSinkOperator [Spark Branch] Sub-task Resolved Venki Korukanti Actions
        79.
        optimize_nullscan.q fails due to differences in explain plan [Spark Branch] Sub-task Resolved Venki Korukanti Actions
        80.
        Support multiple concurrent users Sub-task Resolved Chengxiang Li Actions
        81.
        Support subquery [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        82.
        enable Qtest scriptfile1.q [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        83.
        enable sample8.q.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        84.
        enable sample10.q.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        85.
        Insert overwrite table query does not generate correct task plan [Spark Branch] Sub-task Resolved Na Yang Actions
        86.
        Research Hive dependency on MR distributed cache[Spark Branch] Sub-task Open Unassigned Actions
        87.
        Merge from trunk (2) [Spark Branch] Sub-task Resolved Brock Noland Actions
        88.
        Investigate query failures (1) Sub-task Resolved Thomas Friedrich Actions
        89.
        Investigate query failures (2) Sub-task Resolved Thomas Friedrich Actions
        90.
        Investigate query failures (3) Sub-task Resolved Thomas Friedrich Actions
        91.
        Investigate query failures (4) Sub-task Resolved Thomas Friedrich Actions
        92.
        Merge from trunk (3) [Spark Branch] Sub-task Resolved Brock Noland Actions
        93.
        Use HiveKey instead of BytesWritable as key type of the pair RDD [Spark Branch] Sub-task Resolved Rui Li Actions
        94.
        Fix TestSparkCliDriver => optimize_nullscan.q Sub-task Resolved Brock Noland Actions
        95.
        Merge trunk into spark 9/12/2014 Sub-task Resolved Brock Noland Actions
        96.
        Enable vectorization for spark [spark branch] Sub-task Resolved Chinna Rao Lalam Actions
        97.
        Code cleanup after HIVE-8054 [Spark Branch] Sub-task Resolved Na Yang Actions
        98.
        Disable hive.optimize.union.remove when hive.execution.engine=spark [Spark Branch] Sub-task Resolved Na Yang Actions
        99.
        Remove obsolete code from SparkWork [Spark Branch] Sub-task Resolved Chao Sun Actions
        100.
        Refactor the GraphTran code by moving union handling logic to UnionTran [Spark Branch] Sub-task Resolved Na Yang Actions
        101.
        Support SMB Join for Hive on Spark [Spark Branch] Sub-task Resolved Szehon Ho Actions
        102.
        Merge from trunk to spark 9/20/14 Sub-task Resolved Brock Noland Actions
        103.
        clone SparkWork for join optimization Sub-task Resolved Unassigned Actions
        104.
        GroupByShuffler.java missing apache license header [Spark Branch] Sub-task Resolved Chao Sun Actions
        105.
        Merge from trunk to spark 9/29/14 Sub-task Resolved Xuefu Zhang Actions
        106.
        Enable windowing.q for spark [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        107.
        Merge trunk into spark 10/4/2015 [Spark Branch] Sub-task Resolved Brock Noland Actions
        108.
        Fix fs_default_name2.q on spark [Spark Branch] Sub-task Resolved Brock Noland Actions
        109.
        Investigate flaky test parallel.q Sub-task Resolved Jimmy Xiang Actions
        110.
        TPCDS query #7 fails with IndexOutOfBoundsException [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        111.
        Research Bucket Map Join [Spark Branch] Sub-task Resolved Na Yang Actions
        112.
        Research on skewed join [Spark Branch] Sub-task Resolved Rui Li Actions
        113.
        Make reduce side join work for all join queries [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        114.
        Turn on all join .q tests [Spark Branch] Sub-task Resolved Chao Sun Actions
        115.
        Print Spark job progress format info on the console[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        116.
        Support Hive Counter to collect spark job metric[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        117.
        Update timestamp in status console [Spark Branch] Sub-task Resolved Brock Noland Actions
        118.
        TPC-DS Query 96 parallelism is not set correcly Sub-task Resolved Chao Sun Actions
        119.
        Merge trunk into spark 10/17/14 [Spark Branch] Sub-task Resolved Brock Noland Actions
        120.
        UT: add TestSparkMinimrCliDriver to run UTs that use HDFS Sub-task Open Thomas Friedrich Actions
        121.
        UT: fix bucket_num_reducers test Sub-task Open Chinna Rao Lalam Actions
        122.
        UTs: create missing output files for some tests under clientpositive/spark Sub-task Open Thomas Friedrich Actions
        123.
        UT: add test flag in hive-site.xml for spark tests Sub-task Resolved Thomas Friedrich Actions
        124.
        UT: fix rcfile_bigdata test [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        125.
        UT: fix bucketsort_insert tests - related to SMBMapJoinOperator Sub-task Resolved Chinna Rao Lalam Actions
        126.
        UT: fix list_bucket_dml_2 test [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        127.
        Update async action in SparkClient as Spark add new Java action API[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        128.
        Add remote Spark client to Hive [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        129.
        Enable collect table statistics based on SparkCounter[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        130.
        HivePairFlatMapFunction.java missing license header [Spark Branch] Sub-task Resolved Chao Sun Actions
        131.
        Add InterfaceAudience annotations to spark-client [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        132.
        convert joinOp to MapJoinOp and generate MapWorks only [Spark Branch] Sub-task Resolved Suhas Satish Actions
        133.
        Implement bucket map join optimization [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        134.
        Convert SMBJoin to MapJoin [Spark Branch] Sub-task Resolved Szehon Ho Actions
        135.
        Support hints of SMBJoin [Spark Branch] Sub-task Resolved Szehon Ho Actions
        136.
        Reduce Side Join with single reducer [Spark Branch] Sub-task Resolved Szehon Ho Actions
        137.
        Enable parallelism in Reduce Side Join [Spark Branch] Sub-task Resolved Szehon Ho Actions
        138.
        Increase level of parallelism in reduce phase [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        139.
        Combine Hive Operator statistic and Spark Metric to an uniformed query statistic.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        140.
        Result differences after merge [Spark Branch] Sub-task Resolved Brock Noland Actions
        141.
        Fix tests after merge [Spark Branch] Sub-task Resolved Brock Noland Actions
        142.
        Enable table statistic collection on counter for CTAS query[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        143.
        spark-client build failed sometimes.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        144.
        Collect Spark TaskMetrics and build job statistic[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        145.
        Null Pointer Exception when counter is used for stats during inserting overwrite partitioned tables [Spark Branch] Sub-task Resolved Na Yang Actions
        146.
        numRows and rawDataSize are not collected by the Spark stats [Spark Branch] Sub-task Resolved Na Yang Actions
        147.
        Investigate test failures related to HIVE-8545 [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        148.
        Fix hadoop-1 build [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        149.
        Merge from trunk 11/6/14 [SPARK BRANCH] Sub-task Resolved Brock Noland Actions
        150.
        Should only register used counters in SparkCounters[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        151.
        insert1.q and ppd_join4.q hangs with hadoop-1 [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        152.
        Create some tests that use Spark counter for stats collection [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        153.
        UT: update hive-site.xml for spark UTs to add hive_admin_user to admin role Sub-task Resolved Thomas Friedrich Actions
        154.
        UT: fix partition test case [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        155.
        UT: fix udf_context_aware Sub-task Resolved Aihua Xu Actions
        156.
        UT: fix hook_context_cs test case Sub-task Open Unassigned Actions
        157.
        Switch precommit test from local to local-cluster [Spark Branch] Sub-task Resolved Szehon Ho Actions
        158.
        Print prettier Spark work graph after HIVE-8793 [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        159.
        Release RDD cache when Hive query is done [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        160.
        Choose a persisent policy for RDD caching [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        161.
        Hive/Spark/Yarn integration [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        162.
        Update new spark progress API for local submitted job monitoring [Spark Branch] Sub-task Resolved Rui Li Actions
        163.
        Visualize generated Spark plan [Spark Branch] Sub-task Closed Chinna Rao Lalam Actions
        164.
        Downgrade guava version to be consistent with Hive and the rest of Hadoop [Spark Branch] Sub-task Open Unassigned Actions
        165.
        Fix test TestHiveKVResultCache [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        166.
        Use MEMORY_AND_DISK for RDD caching [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        167.
        Merge from trunk to spark [Spark Branch] Sub-task Resolved Brock Noland Actions
        168.
        downgrade guava version for spark branch from 14.0.1 to 11.0.2.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        169.
        Servlet classes signer information does not match [Spark branch] Sub-task Resolved Chengxiang Li Actions
        170.
        IOContext problem with multiple MapWorks cloned for multi-insert [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        171.
        Remove unnecessary dependency collection task [Spark Branch] Sub-task Resolved Rui Li Actions
        172.
        Make sure Spark + HS2 work [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        173.
        Merge from trunk Nov 28 2014 Sub-task Resolved Brock Noland Actions
        174.
        Find thread leak in RSC Tests [Spark Branch] Sub-task Resolved Rui Li Actions
        175.
        Logging is not configured in spark-submit sub-process Sub-task Resolved Brock Noland Actions
        176.
        SparkCounter display name is not set correctly[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        177.
        Clean up temp files of RSC [Spark Branch] Sub-task Open Unassigned Actions
        178.
        Avoid using SPARK_JAVA_OPTS [Spark Branch] Sub-task Resolved Rui Li Actions
        179.
        Re-enable remaining tests after HIVE-8970 [Spark Branch] Sub-task Resolved Chao Sun Actions
        180.
        Enable ppd_join4 [Spark Branch] Sub-task Resolved Chao Sun Actions
        181.
        Replace akka for remote spark client RPC [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        182.
        Spark Memory can be formatted string [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        183.
        Support multiple mapjoin operators in one work [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        184.
        HiveException: Conflict on row inspector for {table} Sub-task Resolved Jimmy Xiang Actions
        185.
        Choosing right preference between map join and bucket map join [Spark Branch] Sub-task Open Unassigned Actions
        186.
        Add additional logging to SetSparkReducerParallelism [Spark Branch] Sub-task Resolved Brock Noland Actions
        187.
        Remove wrappers for SparkJobInfo and SparkStageInfo [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        188.
        NPE in RemoteSparkJobStatus.getSparkStatistics [Spark Branch] Sub-task Resolved Rui Li Actions
        189.
        Generate better plan for queries containing both union and multi-insert [Spark Branch] Sub-task Resolved Chao Sun Actions
        190.
        Allow RPC Configuration [Spark Branch] Sub-task Resolved Unassigned Actions
        191.
        Hive should not submit second SparkTask while previous one has failed.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        192.
        Hive hangs while failed to get executorCount[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        193.
        Skip child tasks if parent task failed [Spark Branch] Sub-task Resolved Unassigned Actions
        194.
        Bucket mapjoin should use the new alias in posToAliasMap [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        195.
        Investigate IOContext object initialization problem [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        196.
        Spark Client RPC should have larger default max message size [Spark Branch] Sub-task Resolved Brock Noland Actions
        197.
        Spark counter serialization error in spark.log [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        198.
        Error when cleaning up in spark.log [Spark Branch] Sub-task Open Unassigned Actions
        199.
        TimeoutException when trying get executor count from RSC [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        200.
        Check cross product for conditional task [Spark Branch] Sub-task Resolved Rui Li Actions
        201.
        infer_bucket_sort_convert_join.q and mapjoin_hook.q failed.[Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        202.
        bucket_map_join_spark4.q failed due to NPE.[Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        203.
        Support backup task for join related optimization [Spark Branch] Sub-task Patch Available Chao Sun Actions
        204.
        windowing.q failed when mapred.reduce.tasks is set to larger than one Sub-task Resolved Chao Sun Actions
        205.
        Add unit test for multi sessions.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        206.
        Enable beeline query progress information for Spark job[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        207.
        RSC stdout is logged twice [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        208.
        Clean up GenSparkProcContext.clonedReduceSinks and related code [Spark Branch] Sub-task Closed Chao Sun Actions
        209.
        authorization_admin_almighty1.q fails with result diff [Spark Branch] Sub-task Resolved Unassigned Actions
        210.
        Merge from trunk to spark 12/26/2014 [Spark Branch] Sub-task Resolved Brock Noland Actions
        211.
        UT: set hive.support.concurrency to true for spark UTs Sub-task Open Unassigned Actions
        212.
        UT: udf_in_file fails with filenotfoundexception [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        213.
        Create a separate API for remote Spark Context RPC other than job submission [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        214.
        Add listeners on JobHandle so job status change can be notified to the client [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        215.
        TimeOutException when using RSC with beeline [Spark Branch] Sub-task Resolved Unassigned Actions
        216.
        One-pass SMB Optimizations [Spark Branch] Sub-task Resolved Szehon Ho Actions
        217.
        Choose Kryo as the serializer for pTest [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        218.
        Test windowing.q is failing [Spark Branch] Sub-task Resolved Unassigned Actions
        219.
        Add more log information for debug RSC[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        220.
        Spark branch compile failed on hadoop-1[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        221.
        Research on build mini HoS cluster on YARN for unit test[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        222.
        Remove authorization_admin_almighty1 from spark tests [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        223.
        Investigate differences for auto join tests in explain after merge from trunk [Spark Branch] Sub-task Resolved Chao Sun Actions
        224.
        Followup for HIVE-9125, update ppd_join4.q.out for Spark [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        225.
        Remove tabs from spark code [Spark Branch] Sub-task Resolved Brock Noland Actions
        226.
        SetSparkReducerParallelism is likely to set too small number of reducers [Spark Branch] Sub-task Resolved Rui Li Actions
        227.
        Merge trunk to spark 1/5/2015 [Spark Branch] Sub-task Resolved Szehon Ho Actions
        228.
        Merge from spark to trunk January 2015 Sub-task Resolved Szehon Ho Actions
        229.
        Explain query should share the same Spark application with regular queries [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        230.
        Ensure custom UDF works with Spark [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        231.
        Code cleanup [Spark Branch] Sub-task Resolved Szehon Ho Actions
        232.
        TODO cleanup task1.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        233.
        Cleanup code for getting spark job progress and metrics Sub-task Open Rui Li Actions
        234.
        Improve replication factor of small table file given big table partitions [Spark branch] Sub-task Open Jimmy Xiang Actions
        235.
        Set default miniClusterType back to none in QTestUtil.[Spark branch] Sub-task Resolved Chengxiang Li Actions
        236.
        Let Context.isLocalOnlyExecutionMode() return false if execution engine is Spark [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        237.
        thrift.transport.TTransportException [Spark Branch] Sub-task Open Chao Sun Actions
        238.
        Cleanup Modified Files [Spark Branch] Sub-task Resolved Szehon Ho Actions
        239.
        Merge from trunk to spark 1/8/2015 Sub-task Resolved Szehon Ho Actions
        240.
        BaseProtocol.Error failed to deserialization due to NPE.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        241.
        Address review items on HIVE-9257 [Spark Branch] Sub-task Resolved Brock Noland Actions
        242.
        Optimize split grouping for CombineHiveInputFormat [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        243.
        Address review of HIVE-9257 (ii) [Spark Branch] Sub-task Resolved Szehon Ho Actions
        244.
        Fix windowing.q for Spark on trunk Sub-task Resolved Rui Li Actions
        245.
        Merge from spark to trunk (follow-up of HIVE-9257) Sub-task Resolved Szehon Ho Actions
        246.
        SparkJobMonitor timeout as sortByKey would launch extra Spark job before original job get submitted [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        247.
        Fix tests with some versions of Spark + Snappy [Spark Branch] Sub-task Resolved Brock Noland Actions
        248.
        add num-executors / executor-cores / executor-memory option support for hive on spark in Yarn mode [Spark Branch] Sub-task Resolved Pierre Yin Actions
        249.
        Shutting down cli takes quite some time [Spark Branch] Sub-task Resolved Rui Li Actions
        250.
        Make WAIT_SUBMISSION_TIMEOUT configuable and check timeout in SparkJobMonitor level.[Spark Branch] Sub-task Resolved Chengxiang Li Actions
        251.
        Avoid ser/de loggers as logging framework can be incompatible on driver and workers Sub-task Resolved Rui Li Actions
        252.
        ClassNotFoundException occurs during hive query case execution with UDF defined [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        253.
        Add jar/file doesn't work with yarn-cluster mode [Spark Branch] Sub-task Resolved Rui Li Actions
        254.
        Merge trunk to spark 1/21/2015 Sub-task Resolved Szehon Ho Actions
        255.
        Move more hive.spark.* configurations to HiveConf [Spark Branch] Sub-task Resolved Szehon Ho Actions
        256.
        LocalSparkJobStatus may return failed job as successful [Spark Branch] Sub-task Resolved Rui Li Actions
        257.
        Push YARN configuration to Spark while deply Spark on YARN [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        258.
        MapJoin task shouldn't start if HashTableSink task failed [Spark Branch] Sub-task Resolved Unassigned Actions
        259.
        No error thrown when global limit optimization failed to find enough number of rows [Spark Branch] Sub-task Resolved Rui Li Actions
        260.
        Make Remote Spark Context secure [Spark Branch] Sub-task Resolved Marcelo Masiero Vanzin Actions
        261.
        Failed job may not throw exceptions [Spark Branch] Sub-task Resolved Rui Li Actions
        262.
        Enable CBO related tests [Spark Branch] Sub-task Closed Chinna Rao Lalam Actions
        263.
        UNION ALL query failed with ArrayIndexOutOfBoundsException [Spark Branch] Sub-task Resolved Chao Sun Actions
        264.
        Hive reports exception because Hive's Derby version conflicts with Spark's Derby version [Spark Branch] Sub-task Patch Available Pierre Yin Actions
        265.
        Enable infer_bucket_sort_dyn_part.q for TestMiniSparkOnYarnCliDriver test. [Spark Branch] Sub-task Open Unassigned Actions
        266.
        SparkSessionImpl calculates wrong number of cores in TestSparkCliDriver [Spark Branch] Sub-task Open Unassigned Actions
        267.
        Merge trunk to Spark branch 2/2/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        268.
        SHUFFLE_SORT should only be used for order by query [Spark Branch] Sub-task Closed Rui Li Actions
        269.
        Revert changes in two test configuration files accidentally brought in by HIVE-9552 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        270.
        Enable more unit tests for UNION ALL [Spark Branch] Sub-task Closed Chao Sun Actions
        271.
        Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch] Sub-task Resolved Jimmy Xiang Actions
        272.
        'Error while trying to create table container' occurs during hive query case execution when hive.optimize.skewjoin set to 'true' [Spark Branch] Sub-task Closed Rui Li Actions
        273.
        Improve some qtests Sub-task Closed Rui Li Actions
        274.
        Support Impersonation [Spark Branch] Sub-task Closed Brock Noland Actions
        275.
        Address RB comments for HIVE-9425 [Spark Branch] Sub-task Closed Unassigned Actions
        276.
        Hive on Spark is not as aggressive as MR on map join [Spark Branch] Sub-task Resolved Unassigned Actions
        277.
        Merge trunk to Spark branch 2/15/2015 [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        278.
        Upgrade to spark 1.3 [Spark Branch] Sub-task Closed Brock Noland Actions
        279.
        Print yarn application id to console [Spark Branch] Sub-task Closed Rui Li Actions
        280.
        Utilize spark.kryo.classesToRegister [Spark Branch] Sub-task Closed Jimmy Xiang Actions
        281.
        java.lang.NoSuchMethodError occurs during execution of a Hive query that contains an 'ADD FILE XXXX.jar' statement Sub-task Resolved Unassigned Actions
        282.
        Merge trunk to Spark branch 02/27/2015 [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        283.
        Load spark-defaults.conf from classpath [Spark Branch] Sub-task Closed Brock Noland Actions
        284.
        Querying parquet tables fails with IllegalStateException [Spark Branch] Sub-task Resolved Unassigned Actions
        285.
        Print Spark job id in history file [Spark Branch] Sub-task Closed Chinna Rao Lalam Actions
        286.
        Add jar/file doesn't work with yarn-cluster mode [Spark Branch] Sub-task Closed Rui Li Actions
        287.
        Merge trunk to Spark branch 3/6/2015 [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        288.
        New Beeline queries will hang if Beeline terminates improperly [Spark Branch] Sub-task Closed Jimmy Xiang Actions
        289.
        Avoid Utilities.getMapRedWork for spark [Spark Branch] Sub-task Closed Rui Li Actions
        290.
        RSC has memory leak while executing multiple queries [Spark Branch] Sub-task Closed Chengxiang Li Actions
        291.
        getSplits in HiveInputFormat implementations may lead to memory leak [Spark Branch] Sub-task Open Unassigned Actions
        292.
        Log the information of cached RDD [Spark Branch] Sub-task Resolved Chinna Rao Lalam Actions
        293.
        Provide more informative stage description in Spark Web UI [Spark Branch] Sub-task Open Unassigned Actions
        294.
        Improve common join performance [Spark Branch] Sub-task Open Unassigned Actions
        295.
        Merge trunk to Spark branch 03/27/2015 [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        296.
        Fix test failures after HIVE-10130 [Spark Branch] Sub-task Closed Chao Sun Actions
        297.
        Merge Spark branch to master 7/30/2015 Sub-task Closed Xuefu Zhang Actions
        298.
        Implement Hybrid Hybrid Grace Hash Join for Spark Branch [Spark Branch] Sub-task Open Unassigned Actions
        299.
        Hive on Spark job configuration needs to be logged [Spark Branch] Sub-task Closed Szehon Ho Actions
        300.
        ParseException issue (Failed to recognize predicate 'user') [Spark Branch] Sub-task Resolved Sivashankar Actions
        301.
        Merge trunk to spark 4/14/2015 [Spark Branch] Sub-task Resolved Szehon Ho Actions
        302.
        Fix test failures after last merge from trunk [Spark Branch] Sub-task Open Unassigned Actions
        303.
        Merge spark to trunk 4/15/2015 Sub-task Closed Szehon Ho Actions
        304.
        Cancel connection when remote Spark driver process has failed [Spark Branch] Sub-task Closed Chao Sun Actions
        305.
        Enable parallel order by for spark [Spark Branch] Sub-task Closed Rui Li Actions
        306.
        Hive query should fail when it fails to initialize a session in SetSparkReducerParallelism [Spark Branch] Sub-task Closed Chao Sun Actions
        307.
        NPE in SparkUtilities::isDedicatedCluster [Spark Branch] Sub-task Closed Rui Li Actions
        308.
        Dynamic RDD caching optimization for HoS [Spark Branch] Sub-task Closed Chengxiang Li Actions
        309.
        Combine equivalent Works for HoS [Spark Branch] Sub-task Closed Chengxiang Li Actions
        310.
        Followup for HIVE-10550, check performance w.r.t. persistence level [Spark Branch] Sub-task Open GaoLun Actions
        311.
        Make HIVE-10001 work with Spark [Spark Branch] Sub-task Open Unassigned Actions
        312.
        Make HIVE-10568 work with Spark [Spark Branch] Sub-task Closed Rui Li Actions
        313.
        Merge master to Spark branch 7/29/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        314.
        Merge master to Spark branch 6/7/2015 [Spark Branch] Sub-task Resolved Unassigned Actions
        315.
        HoS can't control number of map tasks for runtime skew join [Spark Branch] Sub-task Closed Rui Li Actions
        316.
        Upgrade Spark dependency to 1.4 [Spark Branch] Sub-task Closed Rui Li Actions
        317.
        Hive not able to pass Hive's Kerberos credential to spark-submit process [Spark Branch] Sub-task Resolved Unassigned Actions
        318.
        Enable more tests for grouping by skewed data [Spark Branch] Sub-task Resolved Mohit Sabharwal Actions
        319.
        Add more tests for HIVE-10844 [Spark Branch] Sub-task Closed GaoLun Actions
        320.
        Remote Spark client doesn't use Kerberos keytab to authenticate [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        321.
        Merge master to Spark branch 6/20/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        322.
        Support multiple edges between nodes in SparkPlan [Spark Branch] Sub-task Closed Chengxiang Li Actions
        323.
        Investigate intermittent failure of join28.q for Spark Sub-task Resolved Mohit Sabharwal Actions
        324.
        Add support for running negative q-tests [Spark Branch] Sub-task Closed Mohit Sabharwal Actions
        325.
        HashTableSinkOperator doesn't support vectorization [Spark Branch] Sub-task Closed Rui Li Actions
        326.
        Support hive.explain.user for Spark Sub-task Closed Sahil Takiar Actions
        327.
        Query fails when there isn't a comparator for an operator [Spark Branch] Sub-task Closed Rui Li Actions
        328.
        Enable native vectorized map join for spark [Spark Branch] Sub-task Closed Rui Li Actions
        329.
        Research recently failed qtests [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        330.
        Combine equivalent leaf works in SparkWork [Spark Branch] Sub-task Open Chengxiang Li Actions
        331.
        Optimization around job submission and adding jars [Spark Branch] Sub-task Resolved Chengxiang Li Actions
        332.
        Print "Execution completed successfully" as part of spark job info [Spark Branch] Sub-task Closed Ferdinand Xu Actions
        333.
        Prewarm Hive on Spark containers [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        334.
        Merge master to Spark branch 9/16/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        335.
        HiveException (Failed to close AbstractFileMergeOperator) occurs during loading data to ORC file, when hive.merge.sparkfiles is set to true. [Spark Branch] Sub-task Resolved Unassigned Actions
        336.
        Merge file doesn't work for ORC table when running on Spark. [Spark Branch] Sub-task Closed Rui Li Actions
        337.
        Fix test failures after HIVE-11844 [Spark Branch] Sub-task Closed Rui Li Actions
        338.
        Merge master to Spark branch 10/28/2015 [Spark Branch] Sub-task Closed Xuefu Zhang Actions
        339.
        Merge master into spark 11/17/2015 [Spark Branch] Sub-task Resolved Xuefu Zhang Actions
        340.
        [Spark Branch] ClassNotFoundException occurs during query case with group by and UDF defined Sub-task Open Chengxiang Li Actions
        341.
        NullPointerException thrown by Executors prevents job from finishing Sub-task Open Unassigned Actions


          People

            Assignee: Xuefu Zhang
            Reporter: Xuefu Zhang
            Votes: 49
            Watchers: 188

