Kudu / KUDU-2210

Apache Spark gets stuck while reading a Kudu table.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not A Bug
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: client, perf, spark
    • Labels: None

    Description

      When I try to read a Kudu table with Apache Spark using the following code:

      import org.apache.kudu.spark.kudu._
      import sqlContext.implicits._

      // Kudu connection options: table name and the Kudu master addresses.
      val kuduOptions: Map[String, String] = Map(
        "kudu.table"  -> "test_table",
        "kudu.master" -> "host1:7051,host2:7051,host3:7051")

      // Read the table into a DataFrame and query it with Spark SQL.
      val kuduDF = sqlContext.read.options(kuduOptions).kudu
      kuduDF.registerTempTable("t")
      sqlContext.sql("SELECT * FROM t WHERE id IN (1111, 2222)").show(50, false)
      

      After completing 95% of the tasks, the job stays stuck for more than three days. The table is partitioned by date and the partitions are uneven in size: one partition is 12 GB, about 20 partitions are between 1 GB and 3 GB, and some partitions hold only megabytes or kilobytes of data.
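
      One way to see how that skew maps onto the Spark job is to count the rows in each Spark partition of the same scan. The sketch below is only a diagnostic, not part of the reported failure; it assumes the same spark-shell session and the kuduOptions map from the snippet above.

      import org.apache.kudu.spark.kudu._

      // Re-read the table and count rows per Spark partition to confirm
      // that a few partitions carry most of the data.
      val df = sqlContext.read.options(kuduOptions).kudu

      val rowsPerPartition = df.rdd
        .mapPartitionsWithIndex((idx, rows) => Iterator((idx, rows.size)))
        .collect()

      // Print the largest partitions first.
      rowsPerPartition.sortBy(-_._2).take(10).foreach { case (idx, n) =>
        println(s"partition $idx: $n rows")
      }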


    People

      Assignee: Unassigned
      Reporter: Andrew Ya (Andrew_ya)
      Votes: 0
      Watchers: 2

    Dates

      Created:
      Updated:
      Resolved: