  1. CarbonData
  2. CARBONDATA-3565

Binary to string issue when loading dataframe data in NewRddIterator

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.6.0
    • Fix Version/s: 2.0.0
    • Component/s: spark-integration
    • Labels: None

    Description

      • issue
        When a Spark DataFrame (SQL) loads complex binary data into a Hive table, the data is corrupted when it is read back. In RddIterator, the data is converted to a string and then converted back.
      • test case
        The binary data can be produced with DataOutputStream#writeDouble and similar methods.
      • discussion
        I think the CarbonScalaUtil#getString operation can be removed now. I dug into the code from 2016: it was used in the kettle CsvInput (commit: 0018756d). That code has since been removed, so I think this conversion is redundant. (UPDATE: The follow-up code in GenericParser uses this string-conversion logic, so it should be taken into account here.)
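
The failure mode described above can be reproduced outside CarbonData. The following is a minimal sketch, not the project's code: the class and method names (`BinaryRoundTrip`, `doubleToBytes`, `viaString`) are hypothetical, and UTF-8 is assumed as the charset for the string conversion, which may differ from what the load path actually uses. It writes a double with DataOutputStream#writeDouble, pushes the bytes through a String, and compares the result with the original.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryRoundTrip {

    // Serialize a double to its 8-byte big-endian form, as in the test case.
    static byte[] doubleToBytes(double value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeDouble(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Hypothetical stand-in for the binary -> string -> binary conversion.
    // Assumption: UTF-8 is used on both sides of the conversion.
    static byte[] viaString(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] original = doubleToBytes(Math.PI);
        byte[] roundTripped = viaString(original);
        System.out.println("original:      " + Arrays.toString(original));
        System.out.println("round-tripped: " + Arrays.toString(roundTripped));
        System.out.println("identical:     " + Arrays.equals(original, roundTripped));
    }
}
```

Bytes that do not form valid UTF-8 sequences are replaced during decoding, so the re-encoded array no longer matches the original. Passing the byte array through unchanged avoids the problem, which is in line with the suggestion to drop the string conversion where nothing downstream depends on it.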

            People

              Assignee: Unassigned
              Reporter: ChenKai (514793425@qq.com)
              Votes: 0
              Watchers: 1

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 6h