SPARK-5923: Very slow query when using Oracle Hive metastore and table has lots of partitions


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.2.0
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      This has two aspects:

      • The direct SQL support for Oracle is broken in Hive 0.13.1. It fails when a table has more than 1000 partitions because of the Oracle limit on the number of items in an IN clause. This causes a fallback to the ORM path, which is very slow (20 minutes to even start the query).
      • Hive itself does not suffer from this problem because it passes down to the metadata query the filter terms that restrict the partitions returned. SparkSQL always asks for all partitions, even if they are not all needed. Even when we patched Hive, the query still took 2 minutes to start.
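
One common way to stay under the Oracle IN-list limit is to split the partition IDs into batches of at most 1000 and issue one query per batch. The following is a minimal sketch of that idea, not the actual Hive patch: the helper names (`chunked`, `build_partition_queries`), the `:1, :2, ...` bind-placeholder style, and the queried table and column names are assumptions chosen for illustration.

```python
from typing import Iterator, List, Sequence, Tuple

# Oracle rejects IN lists with more than 1000 expressions (ORA-01795).
ORACLE_MAX_IN_ITEMS = 1000


def chunked(items: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_partition_queries(
    part_ids: Sequence[int],
    batch_size: int = ORACLE_MAX_IN_ITEMS,
) -> List[Tuple[str, List[int]]]:
    """Return one (sql, params) pair per batch of partition IDs.

    The table and column names are illustrative assumptions; the SQL uses
    positional bind placeholders instead of inlined literals.
    """
    queries = []
    for batch in chunked(part_ids, batch_size):
        placeholders = ", ".join(f":{i + 1}" for i in range(len(batch)))
        sql = (
            "SELECT PART_ID, PART_NAME FROM PARTITIONS "
            f"WHERE PART_ID IN ({placeholders})"
        )
        queries.append((sql, list(batch)))
    return queries
```

For example, 2500 partition IDs would be split into three queries carrying 1000, 1000, and 500 bind parameters. Batching only addresses the first aspect above; it does not reduce the number of partitions fetched, which is what passing the filter terms down to the metadata query achieves.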


          People

            Assignee: Unassigned
            Reporter: Matthew Taylor (tbfenet)
            Votes: 0
            Watchers: 4
