  1. Kudu
  2. KUDU-2210

Apache Spark hangs while reading a Kudu table.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client, perf, spark
    • Labels: None

      Description

      When I try to read a Kudu table with Apache Spark using the following code

      import org.apache.kudu.spark.kudu._
      import sqlContext.implicits._

      // Kudu table name and master addresses
      val kuduOptions: Map[String, String] = Map(
        "kudu.table"  -> "test_table",
        "kudu.master" -> "host1:7051,host2:7051,host3:7051")

      // Load the table as a DataFrame and expose it to Spark SQL as "t"
      val kuduDF = sqlContext.read.options(kuduOptions).kudu
      kuduDF.registerTempTable("t")
      sqlContext.sql("SELECT * FROM t WHERE id IN (1111, 2222)").show(50, false)
      

      the job completes 95% of its tasks and then hangs for more than three days. The table is partitioned by date, and the partitions are uneven in size: one partition is 12 GB, about 20 partitions are between 1 GB and 3 GB each, and the remaining partitions hold only megabytes or kilobytes of data.
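      Because the partitions are described as uneven, one thing worth checking is how far the largest partition is from the average, since a few oversized partitions can keep the last tasks running long after the rest have finished. The following is only a minimal sketch in plain Scala: `SkewCheck` and `skewRatio` are hypothetical names, and the sizes in `main` are illustrative assumptions loosely based on the description above (one 12 GB partition, twenty partitions assumed to be 2 GB each), not measured values.

```scala
object SkewCheck {
  // Hypothetical helper: ratio of the largest partition to the mean partition size.
  // A value near 1.0 means evenly sized partitions; larger values mean more skew.
  def skewRatio(sizes: Seq[Long]): Double = {
    require(sizes.nonEmpty, "need at least one partition size")
    val mean = sizes.sum.toDouble / sizes.size
    require(mean > 0, "total size must be positive")
    sizes.max.toDouble / mean
  }

  def main(args: Array[String]): Unit = {
    // Illustrative sizes in GB (assumed, not measured): one 12 GB partition
    // plus twenty partitions of 2 GB each; the small partitions are left out.
    val illustrativeSizesGb: Seq[Long] = 12L +: Seq.fill(20)(2L)
    println(f"largest / mean partition size: ${skewRatio(illustrativeSizesGb)}%.2f")
  }
}
```

      In a Spark shell, real per-partition row counts could be collected from the DataFrame's underlying RDD (for example with `mapPartitionsWithIndex`) and passed to the same helper; note that doing so scans the whole table.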

            People

            • Assignee: Unassigned
            • Reporter: Andrew Ya (Andrew_ya)