Details
Bug
Status: Resolved
Major
Resolution: Fixed
None
None
Reviewed
spark-hbase
Patch
Description
Issue: HBase-Spark Module: HBaseTableCatalog doesn't support multiple columns from a single column family.
Description:
The DataSource API in the HBase-Spark module fails when a catalog maps more than one column to the same column family.
If your catalog has multiple columns in a single column family (or in each of several column families), writing to HBase throws an exception. For example:
def empcatalog = s"""{
                    |"table":{"namespace":"empschema", "name":"emp"},
                    |"rowkey":"key",
                    |"columns":{
                    |"empNumber":{"cf":"rowkey", "col":"key", "type":"string"},
                    |"city":{"cf":"pdata", "col":"city", "type":"string"},
                    |"empName":{"cf":"pdata", "col":"name", "type":"string"},
                    |"jobDesignation":{"cf":"pdata", "col":"designation", "type":"string"},
                    |"salary":{"cf":"pdata", "col":"salary", "type":"string"}
                    |}
                    |}""".stripMargin
Here, the city, name, designation, and salary columns all come from the pdata column family.
Saving the DataFrame to HBase fails with the following exception:
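To make the mapping concrete, here is a minimal Scala sketch that models the columns of empcatalog as plain (name, family, qualifier) entries and groups them by column family. The `CatalogColumn` case class and the hand-copied list are illustrative stand-ins, not the connector's own types.

```scala
// Hypothetical stand-in for one entry of the catalog's "columns" map.
case class CatalogColumn(name: String, cf: String, col: String)

// The columns declared in empcatalog, copied by hand.
val columns = Seq(
  CatalogColumn("empNumber", "rowkey", "key"),
  CatalogColumn("city", "pdata", "city"),
  CatalogColumn("empName", "pdata", "name"),
  CatalogColumn("jobDesignation", "pdata", "designation"),
  CatalogColumn("salary", "pdata", "salary")
)

// Map each real column family (skipping the "rowkey" pseudo-family) to its qualifiers.
val qualifiersByFamily: Map[String, Seq[String]] =
  columns
    .filter(_.cf != "rowkey")
    .groupBy(_.cf)
    .map { case (cf, cs) => cf -> cs.map(_.col) }

qualifiersByFamily.foreach { case (cf, qs) => println(s"$cf -> ${qs.mkString(", ")}") }
```

A single family owning four qualifiers is exactly the layout that triggers the failure described next.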
java.lang.IllegalArgumentException: Family 'pdata' already exists so cannot be added
at org.apache.hadoop.hbase.HTableDescriptor.addFamily(HTableDescriptor.java:827)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:98)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation$$anonfun$createTable$1.apply(HBaseRelation.scala:95)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.sql.execution.datasources.hbase.HBaseRelation.createTable(HBaseRelation.scala:95)
at org.apache.spark.sql.execution.datasources.hbase.DefaultSource.createRelation(HBaseRelation.scala:58)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:457)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211)
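The trace shows `createTable` iterating over a list of families and calling `HTableDescriptor.addFamily` for each one. The sketch below mimics that loop with a hypothetical `FakeTableDescriptor` that, like the real descriptor, rejects a family it already holds. It assumes the family list contains one entry per non-rowkey catalog column, which is what produces the duplicate.

```scala
import scala.collection.mutable

// Hypothetical stand-in for HTableDescriptor: refuses to add a family twice,
// reusing the message from the exception above.
class FakeTableDescriptor {
  private val families = mutable.LinkedHashSet.empty[String]

  def addFamily(name: String): Unit = {
    if (families.contains(name))
      throw new IllegalArgumentException(s"Family '$name' already exists so cannot be added")
    families += name
  }

  def familyNames: Seq[String] = families.toSeq
}

// Assumed input: one family entry per non-rowkey column of empcatalog, duplicates included.
val familiesWithDuplicates = Seq("pdata", "pdata", "pdata", "pdata")

val descriptor = new FakeTableDescriptor
val error: Option[String] =
  try {
    familiesWithDuplicates.foreach(descriptor.addFamily)
    None
  } catch {
    case e: IllegalArgumentException => Some(e.getMessage)
  }

println(error.getOrElse("table created"))
```

The first `pdata` is accepted and the second one throws, so the table is never created.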
The getColumnFamilies method in HBaseTableCatalog.scala returns duplicate column family names; it should return each family only once.
A unit test covering this case has been added to DefaultSourceSuite.scala (see the writeCatalog object definition).
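A minimal sketch of the fix, assuming the method derives families from the catalog's field list: drop the row-key pseudo-family and deduplicate with `distinct` before the list reaches `createTable`. `Field`, `RowKeyFamily`, and both `getColumnFamilies*` functions are simplified, hypothetical stand-ins, not the connector's actual source.

```scala
// Hypothetical, simplified catalog field: only the column family matters here.
case class Field(colName: String, cf: String, col: String)

val RowKeyFamily = "rowkey" // stand-in for the catalog's row-key marker

// Buggy shape: one entry per column, so a shared family repeats.
def getColumnFamiliesBuggy(fields: Seq[Field]): Seq[String] =
  fields.map(_.cf).filter(_ != RowKeyFamily)

// Fixed shape: each family is returned once, in first-seen order.
def getColumnFamilies(fields: Seq[Field]): Seq[String] =
  fields.map(_.cf).filter(_ != RowKeyFamily).distinct

val fields = Seq(
  Field("empNumber", "rowkey", "key"),
  Field("city", "pdata", "city"),
  Field("empName", "pdata", "name"),
  Field("jobDesignation", "pdata", "designation"),
  Field("salary", "pdata", "salary")
)

println(getColumnFamiliesBuggy(fields)) // List(pdata, pdata, pdata, pdata)
println(getColumnFamilies(fields))      // List(pdata)
```

Because `distinct` keeps the first occurrence of each element, families are still created in the order they are declared in the catalog.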