Sqoop / SQOOP-3267

Incremental import to HBase deletes only last version of column

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.4.7
    • Fix Version/s: 1.5.0
    • Component/s: hbase-integration
    • Labels:
      None

      Description

      Deletes have been supported since SQOOP-3149, but only the last version of a column is deleted when the corresponding cell is set to NULL in the source table.

      This can lead to unexpected and misleading results if the row has been transferred multiple times, which can easily happen if it is being modified on the source side: once the last version is deleted, an older version of the column becomes visible again in HBase.
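
The effect described above can be illustrated with a toy model of a single multi-versioned cell. This is only a sketch in plain Java, not Sqoop or HBase code: `VersionedCell` and its methods are hypothetical names, and the two delete methods merely mimic the difference between `Delete.addColumn` (latest version only) and `Delete.addColumns` (all versions) in the HBase client API.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of one multi-versioned cell (hypothetical; not Sqoop or HBase code). */
public class VersionedCell {
    // timestamp -> value; the highest timestamp is the newest version
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    /** Removes only the newest version, similar in spirit to Delete.addColumn. */
    public void deleteLatestVersion() {
        versions.pollLastEntry();
    }

    /** Removes every version, similar in spirit to Delete.addColumns. */
    public void deleteAllVersions() {
        versions.clear();
    }

    /** Returns what a default read would see: the newest remaining value, or null. */
    public String read() {
        Map.Entry<Long, String> newest = versions.lastEntry();
        return newest == null ? null : newest.getValue();
    }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(1L, "first-import-value");   // written by the first import
        cell.put(2L, "second-import-value");  // written after the source row changed

        // The source column is now NULL, but only the newest version is removed.
        cell.deleteLatestVersion();
        System.out.println(cell.read());      // prints "first-import-value"

        // Removing all versions leaves nothing to read.
        cell.deleteAllVersions();
        System.out.println(cell.read());      // prints "null"
    }
}
```

In this model the stale value from the first import resurfaces after the single-version delete, which is the misleading result the description refers to.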

      Also, SQOOP-3149 uses a new Put command for every column instead of a single Put per row as before. This could lead to a performance drop for wide tables (for which HBase is otherwise usually recommended).
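
To make the difference in mutation count concrete, here is a minimal sketch in plain Java. `PutGrouping` and `RowMutation` are hypothetical stand-ins (`RowMutation` plays the role of an HBase Put), and the code does not reproduce what either SQOOP-3149 or the attached patches actually do; it only contrasts the two grouping strategies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy comparison of mutation counts (hypothetical; not Sqoop or HBase code). */
public class PutGrouping {

    /** Hypothetical stand-in for a Put: one row key plus the columns it writes. */
    public static final class RowMutation {
        final String rowKey;
        final Map<String, String> columns = new LinkedHashMap<>();

        RowMutation(String rowKey) {
            this.rowKey = rowKey;
        }
    }

    /** Builds one mutation per non-null column. */
    public static List<RowMutation> perColumn(String rowKey, Map<String, String> record) {
        List<RowMutation> mutations = new ArrayList<>();
        for (Map.Entry<String, String> field : record.entrySet()) {
            if (field.getValue() != null) {
                RowMutation m = new RowMutation(rowKey);
                m.columns.put(field.getKey(), field.getValue());
                mutations.add(m);
            }
        }
        return mutations;
    }

    /** Builds at most one mutation carrying all non-null columns of the row. */
    public static List<RowMutation> perRow(String rowKey, Map<String, String> record) {
        RowMutation m = new RowMutation(rowKey);
        for (Map.Entry<String, String> field : record.entrySet()) {
            if (field.getValue() != null) {
                m.columns.put(field.getKey(), field.getValue());
            }
        }
        List<RowMutation> mutations = new ArrayList<>();
        if (!m.columns.isEmpty()) {
            mutations.add(m);
        }
        return mutations;
    }

    public static void main(String[] args) {
        Map<String, String> record = new LinkedHashMap<>();
        record.put("name", "Alice");
        record.put("email", null);   // NULL in the source table
        record.put("city", "Budapest");

        System.out.println(perColumn("row1", record).size()); // prints 2
        System.out.println(perRow("row1", record).size());    // prints 1
    }
}
```

With per-column grouping the number of mutations grows with the number of non-null columns, while per-row grouping stays at one mutation per row regardless of width.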

      Jilani Shaik, Anna Szonyi, could you please comment on what you think the expected behavior should be here?

        Attachments

        1. SQOOP-3267.1.patch
          2 kB
          Daniel Voros
        2. SQOOP-3267.2.patch
          19 kB
          Daniel Voros

              People

              • Assignee:
                dvoros Daniel Voros
                Reporter:
                dvoros Daniel Voros
              • Votes:
                0
                Watchers:
                6
