Spark / SPARK-8131 Improve Database support / SPARK-8435

Cannot create tables in a specific database using a provider


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5.0
    • Component/s: SQL
    • Labels: None
    • Environment: Spark SQL 1.4.0 (Spark-Shell), Hive metastore, MySQL Driver, Linux

    Description

      Hello,

      I've been trying to create tables in different databases (catalogs) using a Hive metastore, and when I execute the CREATE statement I realized that the table is created in the default database instead.

      This is what I'm trying:

      scala> sqlContext.sql("CREATE DATABASE IF NOT EXISTS testmetastore COMMENT 'Testing catalogs' ")
      scala> sqlContext.sql("USE testmetastore")
      scala> sqlContext.sql("CREATE TABLE students USING org.apache.spark.sql.parquet OPTIONS (path '/user/hive, highavailability 'true', DefaultLimit '1000')")

      And this is what I get. I can see that it is kind of working, because when it checks whether the table exists it searches in the correct database (testmetastore). But when it finally creates the table, it uses the default database.

      scala> sqlContext.sql("CREATE TABLE students USING a OPTIONS (highavailability 'true', DefaultLimit '1000')").show
      15/06/18 10:28:48 INFO HiveMetaStore: 0: get_table : db=testmetastore tbl=students
      15/06/18 10:28:48 INFO audit: ugi=ccaballero ip=unknown-ip-addr cmd=get_table : db=testmetastore tbl=students
      15/06/18 10:28:48 INFO Persistence: Request to load fields "comment,name,type" of class org.apache.hadoop.hive.metastore.model.MFieldSchema but object is embedded, so ignored
      15/06/18 10:28:48 INFO Persistence: Request to load fields "comment,name,type" of class org.apache.hadoop.hive.metastore.model.MFieldSchema but object is embedded, so ignored
      15/06/18 10:28:48 INFO HiveMetaStore: 0: create_table: Table(tableName:students, dbName:default, owner:ccaballero, createTime:1434616128, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{DefaultLimit=1000, serialization.format=1, highavailability=true}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=a}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
      15/06/18 10:28:48 INFO audit: ugi=ccaballero ip=unknown-ip-addr cmd=create_table: Table(tableName:students, dbName:default, owner:ccaballero, createTime:1434616128, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:array<string>, comment:from deserializer)], location:null, inputFormat:org.apache.hadoop.mapred.SequenceFileInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe, parameters:{DefaultLimit=1000, serialization.format=1, highavailability=true}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{})), partitionKeys:[], parameters:{EXTERNAL=TRUE, spark.sql.sources.provider=a}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
      15/06/18 10:28:49 INFO SparkContext: Starting job: show at <console>:20
      15/06/18 10:28:49 INFO DAGScheduler: Got job 2 (show at <console>:20) with 1 output partitions (allowLocal=false)
      15/06/18 10:28:49 INFO DAGScheduler: Final stage: ResultStage 2(show at <console>:20)
      15/06/18 10:28:49 INFO DAGScheduler: Parents of final stage: List()
      15/06/18 10:28:49 INFO DAGScheduler: Missing parents: List()
      15/06/18 10:28:49 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[6] at show at <console>:20), which has no missing parents
      15/06/18 10:28:49 INFO MemoryStore: ensureFreeSpace(1792) called with curMem=0, maxMem=278302556
      15/06/18 10:28:49 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 1792.0 B, free 265.4 MB)
      15/06/18 10:28:49 INFO MemoryStore: ensureFreeSpace(1139) called with curMem=1792, maxMem=278302556
      15/06/18 10:28:49 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1139.0 B, free 265.4 MB)
      15/06/18 10:28:49 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:59110 (size: 1139.0 B, free: 265.4 MB)
      15/06/18 10:28:49 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
      15/06/18 10:28:49 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[6] at show at <console>:20)
      15/06/18 10:28:49 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
      15/06/18 10:28:49 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1379 bytes)
      15/06/18 10:28:49 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
      15/06/18 10:28:49 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 628 bytes result sent to driver
      15/06/18 10:28:49 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 10 ms on localhost (1/1)
      15/06/18 10:28:49 INFO DAGScheduler: ResultStage 2 (show at <console>:20) finished in 0.010 s
      15/06/18 10:28:49 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
      15/06/18 10:28:49 INFO DAGScheduler: Job 2 finished: show at <console>:20, took 0.016204 s
      ++
      ||
      ++
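
      If it helps narrow things down, I would expect a plain Hive CREATE TABLE (without the USING ... provider clause) to respect the current database, which would isolate the problem to the data source / provider path. A minimal check could look like this (sketch only; students_hive is just a throwaway table name):

      scala> sqlContext.sql("USE testmetastore")
      scala> sqlContext.sql("CREATE TABLE students_hive (name STRING)")
      scala> sqlContext.tableNames("testmetastore")   // students_hive should appear here if plain Hive DDL honors USE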

      Any suggestions would be appreciated.

      Thank you.


      People

        Assignee: Cheng Lian (lian cheng)
        Reporter: Cristian (ccaballero)
        Votes: 3
        Watchers: 5
