HBASE-22711

Spark connector doesn't use the given mapping when inserting data


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: connector-1.0.0
    • Fix Version/s: connector-1.0.1
    • Component/s: hbase-connectors
    • Labels:
      None

      Description

      In some cases a Spark DataFrame cannot be read back with the same mapping it was written with. For example:

      val sql = spark.sqlContext
      
      val persons =
          """[
            |{"name": "alice", "age": 20, "height": 5, "email": "alice@alice.com"},
            |{"name": "bob", "age": 23, "height": 6, "email": "bob@bob.com"},
            |{"name": "carol", "age": 12, "email": "carol@carol.com", "height": 4.11}
            |]
          """.stripMargin
      
      val df = spark.read.json(Seq(persons).toDS)
      
      df.write
        .format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .save()
      

      It cannot be read back with the same mapping:

      val df2 = sql.read
        .format("org.apache.hadoop.hbase.spark")
        .option("hbase.columns.mapping", "name STRING :key, age SHORT p:age, email STRING c:email, height FLOAT p:height")
        .option("hbase.table", "person")
        .option("hbase.spark.use.hbasecontext", false)
        .load()
      
      df2.createOrReplaceTempView("tableView")
      
      val results = sql.sql("SELECT * FROM tableView")
      results.show()
      

      The results:

      +---+-----+---------+---------------+
      |age| name|   height|          email|
      +---+-----+---------+---------------+
      |  0|alice|   2.3125|alice@alice.com|
      |  0|  bob|    2.375|    bob@bob.com|
      |  0|carol|2.2568748|carol@carol.com|
      +---+-----+---------+---------------+
      

      Spark stores integer values as longs and floating-point values as doubles, so the SHORT column and the FLOAT column are both written as 8-byte values in HBase:

      shell> scan 'person'
       alice                column=p:age, timestamp=1563450714829, value=\x00\x00\x00\x00\x00\x00\x00\x14
       alice                column=p:height, timestamp=1563450714829, value=@\x14\x00\x00\x00\x00\x00\x00
      
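The misread values can be reproduced without HBase at all: the cells hold 8-byte big-endian longs/doubles, while decoding with the SHORT/FLOAT mapping types consumes only the leading 2 or 4 bytes. A minimal sketch using plain java.nio.ByteBuffer (no connector classes; the decoding step is an assumption about what the read path does with the mapped types):

```scala
import java.nio.ByteBuffer

// Spark holds age as a long and height as a double, so the connector
// writes 8-byte big-endian cells (see the scan output above).
val ageCell    = ByteBuffer.allocate(8).putLong(20L).array()    // \x00..\x00\x14
val heightCell = ByteBuffer.allocate(8).putDouble(5.0).array()  // @\x14\x00..\x00

// Decoding with the mapping types reads only the leading bytes of each cell:
val ageAsShort    = ByteBuffer.wrap(ageCell).getShort    // first 2 bytes -> 0
val heightAsFloat = ByteBuffer.wrap(heightCell).getFloat // first 4 bytes -> 2.3125

println(s"age=$ageAsShort height=$heightAsFloat")
```

The same arithmetic explains bob's 2.375 (the first four bytes of the double 6.0 reinterpreted as a float) and the all-zero age column in the results above.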

      People

      • Assignee: meszibalu Balazs Meszaros
      • Reporter: meszibalu Balazs Meszaros
      • Votes: 0
      • Watchers: 2
