ORC-1030

Java Tools Recover File command does not accurately find OrcFile.MAGIC


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.7.0, 1.6.11
    • Fix Version/s: 1.7.1
    • Component/s: Java, tools
    • Labels: None

    Description

              while (remaining > 0) {
                int toRead = (int) Math.min(DEFAULT_BLOCK_SIZE, remaining);
                byte[] data = new byte[toRead];
                long startPos = corruptFileLen - remaining;
                fdis.readFully(startPos, data, 0, toRead);
      
                // find all MAGIC string and see if the file is readable from there
                int index = 0;
                long nextFooterOffset;
                byte[] magicBytes = OrcFile.MAGIC.getBytes(StandardCharsets.UTF_8);
                while (index != -1) {
                  index = indexOf(data, magicBytes, index + 1);
                  if (index != -1) {
                    nextFooterOffset = startPos + index + magicBytes.length + 1;
                    if (isReadable(corruptPath, conf, nextFooterOffset)) {
                      footerOffsets.add(nextFooterOffset);
                    }
                  }
                }
      
                System.err.println("Scanning for valid footers - startPos: " + startPos +
                    " toRead: " + toRead + " remaining: " + remaining);
                remaining = remaining - toRead;
              }
      

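      OrcFile.MAGIC may straddle the boundary between two adjacent reads, with its first bytes at the end of one block and the rest at the start of the next. In that case the magic string is never found and the recoverable position in the file is missed, because the current implementation only matches within a single read.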
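      One possible way to cover the boundary case is to let each read overlap the previous one by pattern.length - 1 bytes. The sketch below is only an illustration under stated assumptions, not the fix that was committed: it scans an in-memory byte array instead of an FSDataInputStream, and the class OverlappingScanSketch and its methods findAll and matchesAt are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ORC: scans a byte array block by block,
// re-reading the last (pattern.length - 1) bytes of the previous block so
// that a pattern split across a block boundary is still seen whole.
final class OverlappingScanSketch {

  /** Returns the absolute offset of every occurrence of pattern in file. */
  static List<Long> findAll(byte[] file, byte[] pattern, int blockSize) {
    if (pattern.length == 0 || blockSize <= 0) {
      throw new IllegalArgumentException("pattern and blockSize must be non-empty");
    }
    List<Long> offsets = new ArrayList<>();
    int overlap = pattern.length - 1;
    long remaining = file.length;
    while (remaining > 0) {
      long startPos = file.length - remaining;
      int toRead = (int) Math.min(blockSize, remaining);
      // Back up by the overlap; a full match never fits inside the overlap
      // alone, so no offset is reported twice.
      long readFrom = Math.max(0, startPos - overlap);
      byte[] data = Arrays.copyOfRange(file, (int) readFrom, (int) (startPos + toRead));
      for (int i = 0; i + pattern.length <= data.length; i++) {
        if (matchesAt(data, pattern, i)) {
          offsets.add(readFrom + i);
        }
      }
      remaining -= toRead;
    }
    return offsets;
  }

  // True if pattern occurs in data starting exactly at position pos.
  private static boolean matchesAt(byte[] data, byte[] pattern, int pos) {
    for (int j = 0; j < pattern.length; j++) {
      if (data[pos + j] != pattern[j]) {
        return false;
      }
    }
    return true;
  }
}
```

      With a block size of 4, the input xxORCyy splits into xxOR and Cyy, so a per-block search finds nothing, while the overlapping scan reports offset 2.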

        private static int indexOf(final byte[] data, final byte[] pattern, final int index) {
          if (data == null || data.length == 0 || pattern == null || pattern.length == 0 ||
              index > data.length || index < 0) {
            return -1;
          }
      
          int j = 0;
          for (int i = index; i < data.length; i++) {
            if (pattern[j] == data[i]) {
              j++;
            } else {
              j = 0;
            }
      
            if (j == pattern.length) {
              return i - pattern.length + 1;
            }
          }
      
          return -1;
        }
      

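      This matching algorithm is wrong because i does not backtrack after a partial match fails in the middle. As a simple example, with data = OOORC, pattern = ORC, and index = 1, the algorithm returns -1 even though the pattern occurs at index 2: the O at position 1 starts a partial match, the O at position 2 is consumed by the failed comparison against R, and it is never reconsidered as the start of a new match.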
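      A minimal sketch of a search that does backtrack is shown here. It is a plain brute-force comparison that restarts the pattern at every candidate position. It is not necessarily the change that was committed for this issue, and the class name IndexOfSketch is hypothetical.

```java
// Hypothetical replacement for the indexOf helper quoted above.
final class IndexOfSketch {

  /**
   * Returns the first position at or after index where pattern occurs in
   * data, or -1 if there is none.
   */
  static int indexOf(final byte[] data, final byte[] pattern, final int index) {
    if (data == null || data.length == 0 || pattern == null || pattern.length == 0 ||
        index > data.length || index < 0) {
      return -1;
    }

    // Try every start position i; the inner loop always compares from
    // pattern[0], so a failed partial match never consumes later bytes.
    for (int i = index; i + pattern.length <= data.length; i++) {
      int j = 0;
      while (j < pattern.length && data[i + j] == pattern[j]) {
        j++;
      }
      if (j == pattern.length) {
        return i;
      }
    }

    return -1;
  }
}
```

      For the example above (data = OOORC, pattern = ORC, index = 1), this version returns 2. The cost is O(data.length * pattern.length) in the worst case, which is negligible for a pattern as short as OrcFile.MAGIC.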

          People

            Guiyankuang Yiqun Zhang