HBase / HBASE-17547

HBase-Spark Module: TableCatalog doesn't support multiple columns from a single column family

Details

    • Reviewed
    • spark-hbase
    • Patch

    Description

      Issue: HBase-Spark Module: TableCatalog doesn't support multiple columns from a single column family.

      Description:
      The DataSource API in the HBase-Spark module fails when a catalog maps more than one column to the same column family.
      If the catalog defines multiple columns from one column family (whether or not other column families are present), saving the DataFrame throws an exception. For example:

      def empcatalog = s"""{
        |"table":{"namespace":"empschema", "name":"emp"},
        |"rowkey":"key",
        |"columns":{
          |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
          |"city":{"cf":"pdata", "col":"city", "type":"string"},
          |"empName":{"cf":"pdata", "col":"name", "type":"string"},
          |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
          |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
        |}
      |}""".stripMargin

      Here, city, name, designation, and salary all belong to the pdata column family.

      Exception thrown while saving the DataFrame to HBase:

      java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
      at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
      at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
      at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
      at scala.collection.immutable.List.foreach(List.scala:381)
      at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
      at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
      at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
      at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
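
The trace suggests that table creation adds one family per entry in the list it is given, and that the descriptor rejects a name it has already seen. The following is a minimal sketch of that failure mode. `SketchTableDescriptor` and `DuplicateFamilySketch` are hypothetical stand-ins written for illustration; they are not the HBase `HTableDescriptor` or the module's `HBaseRelation`.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a table descriptor. It only mimics the
// "reject a family that already exists" behavior seen in the stack trace.
final class SketchTableDescriptor {
  private val families = mutable.LinkedHashSet.empty[String]

  def addFamily(name: String): Unit = {
    if (families.contains(name))
      throw new IllegalArgumentException(
        s"Family '$name' already exists so cannot be added")
    families += name
  }

  def familyNames: List[String] = families.toList
}

object DuplicateFamilySketch {
  // Adds each family in order and reports the first failure, if any.
  def tryCreate(cfs: Seq[String]): Either[String, List[String]] = {
    val desc = new SketchTableDescriptor
    try {
      cfs.foreach(desc.addFamily)
      Right(desc.familyNames)
    } catch {
      case e: IllegalArgumentException => Left(e.getMessage)
    }
  }

  def main(args: Array[String]): Unit = {
    // Four columns in "pdata" yield four "pdata" entries: the second add fails.
    println(tryCreate(Seq("pdata", "pdata", "pdata", "pdata")))
    // A de-duplicated list succeeds.
    println(tryCreate(Seq("pdata")))
  }
}
```

With a list that repeats `pdata`, the second `addFamily` call fails with the same message as in the trace; with a single entry per family the sketch succeeds.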

      The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate column family names; it should return each family only once.
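
One way to avoid the duplicates is to de-duplicate the family names before returning them. The sketch below illustrates the idea on the columns from the example catalog. The `Field` case class, the `ColumnFamilySketch` object, and both helper methods are assumptions made for illustration; they do not reproduce the module's actual code or the attached patch.

```scala
// Hypothetical stand-in for one catalog column entry; not the module's real class.
final case class Field(colName: String, cf: String, col: String)

object ColumnFamilySketch {
  // Columns from the example catalog in this issue.
  val fields: Seq[Field] = Seq(
    Field("empNumber", "rowkey", "key"),
    Field("city", "pdata", "city"),
    Field("empName", "pdata", "name"),
    Field("jobDesignation", "pdata", "designation"),
    Field("salary", "pdata", "salary")
  )

  // One entry per column: a family shared by several columns is repeated.
  def familiesWithDuplicates(fs: Seq[Field]): Seq[String] =
    fs.map(_.cf).filter(_ != "rowkey")

  // One entry per family: duplicates removed, first-seen order kept.
  def familiesDistinct(fs: Seq[Field]): Seq[String] =
    familiesWithDuplicates(fs).distinct

  def main(args: Array[String]): Unit = {
    println(familiesWithDuplicates(fields).size) // four entries, all "pdata"
    println(familiesDistinct(fields).size)       // a single "pdata" entry
  }
}
```

`distinct` preserves the order in which families first appear, so the resulting list stays stable across runs.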

      A unit test covering this case has been added to DefaultSourceSuite.scala, in the writeCatalog definition.

      Attachments

        1. HBASE-17547.master.001.patch
          4 kB
          Chetan Khatri

        Activity

          People

            chetkhatri Chetan Khatri
            chetkhatri Chetan Khatri
            Votes: 0
            Watchers: 5