HBase / HBASE-14267

In a MapReduce-on-HBase job, a scanner restart in TableInputFormat can return wrong data.


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Abandoned
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Client, mapreduce
    • Labels: None

    Description

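      When I run a MapReduce job on HBase, my mapper modifies the byte array returned by Result.getRow() in place, for example by reversing the row. Because my program does complicated processing on the data, each row takes a long time, and the scanner lease on the region server expires. Result.getRow() caches the row and returns the same array on every call, so the in-place change is visible to anything else that holds that array.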
      Result#195

        public byte [] getRow() {
          if (this.row == null) {
            this.row = (this.cells == null || this.cells.length == 0) ?
                null :
                CellUtil.cloneRow(this.cells[0]);
          }
          return this.row;
        }
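
      To illustrate the aliasing, here is a minimal sketch that does not use HBase at all. CachedRowResult is a hypothetical stand-in for Result that caches its row the same way the excerpt above does, and reverseInPlace plays the role of my mapper. The names are made up for this example; only the caching pattern is taken from the code quoted above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowAliasingDemo {
    /** Hypothetical stand-in for Result: caches the row and returns the same array each time. */
    static final class CachedRowResult {
        private final byte[] firstCellRow;
        private byte[] row;

        CachedRowResult(byte[] firstCellRow) {
            this.firstCellRow = firstCellRow;
        }

        byte[] getRow() {
            if (row == null) {
                // Plays the role of CellUtil.cloneRow(cells[0]): copied once, then cached.
                row = Arrays.copyOf(firstCellRow, firstCellRow.length);
            }
            return row; // every caller receives this same array
        }
    }

    /** Reverses the array in place, as the mapper in this report does. */
    static void reverseInPlace(byte[] b) {
        for (int i = 0, j = b.length - 1; i < j; i++, j--) {
            byte t = b[i];
            b[i] = b[j];
            b[j] = t;
        }
    }

    public static void main(String[] args) {
        CachedRowResult result =
            new CachedRowResult("row-001".getBytes(StandardCharsets.UTF_8));
        byte[] heldByReader = result.getRow(); // reference kept by some other component
        reverseInPlace(result.getRow());       // mapper mutates the shared array
        // Prints "100-wor": the reference held elsewhere changed as well.
        System.out.println(new String(heldByReader, StandardCharsets.UTF_8));
    }
}
```

      The point is only that a second holder of the array sees the mapper's change; nothing here depends on how the real Result builds its cells.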
      

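      When the lease expires, TableInputFormat restarts the scan from lastSuccessfulRow. That field is set from key.get() right after key.set(value.getRow()), so it refers to the same array my mapper has already modified. The scan therefore restarts from the modified row instead of the real last row, and the job reads wrong data.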
      TableRecordReaderImpl#218

            } catch (IOException e) {
              // do not retry if the exception tells us not to do so
              if (e instanceof DoNotRetryIOException) {
                throw e;
              }
              // try to handle all other IOExceptions by restarting
              // the scanner, if the second call fails, it will be rethrown
              LOG.info("recovered from " + StringUtils.stringifyException(e));
              if (lastSuccessfulRow == null) {
                LOG.warn("We are restarting the first next() invocation," +
                    " if your mapper has restarted a few other times like this" +
                    " then you should consider killing this job and investigate" +
                    " why it's taking so long.");
              }
              if (lastSuccessfulRow == null) {
                restart(scan.getStartRow());
              } else {
                restart(lastSuccessfulRow);
                scanner.next();    // skip presumed already mapped row
              }
              value = scanner.next();
              if (value != null && value.isStale()) numStale++;
              numRestarts++;
            }
            if (value != null && value.size() > 0) {
              key.set(value.getRow());
              lastSuccessfulRow = key.get();
              return true;
            }
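
      A possible workaround is a defensive copy, sketched below without any HBase dependency. Either the mapper works on its own copy of the row (reversedCopy), or the reader remembers a private copy for restarts (rememberRow). Both helper names are hypothetical, and I am not claiming this is what the attached patch does; it only shows the idea with java.util.Arrays.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DefensiveRowCopy {
    /** Mapper-side option: return a reversed copy and leave the input untouched. */
    static byte[] reversedCopy(byte[] row) {
        byte[] out = Arrays.copyOf(row, row.length);
        for (int i = 0, j = out.length - 1; i < j; i++, j--) {
            byte t = out[i];
            out[i] = out[j];
            out[j] = t;
        }
        return out;
    }

    /** Reader-side option: keep a private copy to restart from. */
    static byte[] rememberRow(byte[] row) {
        return Arrays.copyOf(row, row.length);
    }

    public static void main(String[] args) {
        // Stands in for the array returned by value.getRow().
        byte[] row = "row-001".getBytes(StandardCharsets.UTF_8);
        byte[] lastSuccessfulRow = row;    // reader keeps the shared reference
        byte[] mapped = reversedCopy(row); // mapper only touches its own copy
        System.out.println(new String(mapped, StandardCharsets.UTF_8));            // 100-wor
        System.out.println(new String(lastSuccessfulRow, StandardCharsets.UTF_8)); // row-001
    }
}
```

      With the mapper-side copy the job can be fixed without touching HBase. The reader-side copy costs one extra array allocation per row but protects every mapper.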
      

      Attachments

        1. HBASE_14267_trunk_v1.patch
          0.5 kB
          Qianxi Zhang

          People

            Assignee: Unassigned
            Reporter: Qianxi Zhang
            Votes: 0
            Watchers: 6
