Hadoop Common / HADOOP-1758

processing escapes in a jute record is quadratic

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 0.13.0
    • Fix Version/s: 0.15.0
    • Component/s: record
    • Labels:
      None

      Description

      The following code appears in hadoop/src/c++/librecordio/csvarchive.cc:

      static void replaceAll(std::string s, const char *src, char c)
      {
        std::string::size_type pos = 0;
        while (pos != std::string::npos) {
          pos = s.find(src);
          if (pos != std::string::npos) {
            s.replace(pos, strlen(src), 1, c);
          }
        }
      }
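A minimal sketch of a linear-time alternative to replaceAll, assuming it is acceptable to build the result in a temporary string and swap it in. The name replaceAllLinear is hypothetical. It resumes each search just past the previous match instead of at position 0, and appends each unchanged run exactly once, so for a short fixed pattern the work is proportional to the string length. It also takes the string by reference; the quoted replaceAll takes std::string s by value, so as written its edits apply only to a local copy.

```cpp
#include <cstring>
#include <string>

// Hypothetical replacement for replaceAll: one left-to-right pass per pattern.
static void replaceAllLinear(std::string& s, const char* src, char c) {
  const std::string::size_type len = std::strlen(src);
  if (len == 0) return;  // an empty pattern would match everywhere
  std::string out;
  out.reserve(s.size());  // the result is never longer than the input when len >= 1
  std::string::size_type start = 0;
  std::string::size_type pos;
  while ((pos = s.find(src, start)) != std::string::npos) {
    out.append(s, start, pos - start);  // copy the unchanged run before the match
    out.push_back(c);                   // emit the decoded character
    start = pos + len;                  // resume after the match; never rescan output
  }
  out.append(s, start, std::string::npos);  // copy the tail after the last match
  s.swap(out);
}
```

Because the search never revisits bytes it has already produced, "%2525" decodes to "%25" here, whereas restarting from position 0 would decode the newly produced "%25" a second time and yield "%".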

      It is used to decode Jute escapes when a string is deserialized:

      void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
      {
        t = readUptoTerminator(stream);
        if (t[0] != '\'') {
          throw new IOException("Errror deserializing string.");
        }
        t.erase(0, 1); /// erase first character
        replaceAll(t, "%0D", 0x0D);
        replaceAll(t, "%0A", 0x0A);
        replaceAll(t, "%7D", 0x7D);
        replaceAll(t, "%00", 0x00);
        replaceAll(t, "%2C", 0x2C);
        replaceAll(t, "%25", 0x25);
      }
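Each pass in deserialize restarts its search at position 0 and shifts the rest of the string on every match. A small instrumented model of that loop makes the cost visible, assuming that each std::string::replace call moves every byte after the match (the usual behavior for a contiguous string buffer). The function name countShiftedBytes is hypothetical; it repeats the search-from-zero and replace steps of replaceAll and adds up the length of the tail that moves on each replacement. For an input made of k consecutive "%0D" escapes, the total is 3k(k-1)/2 bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the quoted replaceAll loop. It returns how many bytes
// must shift left across all replacements; std::string does not report this itself.
static std::size_t countShiftedBytes(std::string s, const std::string& src, char c) {
  std::size_t shifted = 0;
  std::string::size_type pos = s.find(src);
  while (pos != std::string::npos) {
    shifted += s.size() - (pos + src.size());  // the tail after the match moves left
    s.replace(pos, src.size(), 1, c);
    pos = s.find(src);  // restart from the beginning, as the quoted code does
  }
  return shifted;
}
```

With this model, 10 escapes shift 135 bytes and 100 escapes shift 14,850. The repeated searches from position 0 add a further quadratic term that the model does not count.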

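      Each call to replaceAll restarts its search at the beginning of the string, and each replacement shifts the entire remainder of the string, so decoding a string of length n that contains k escapes takes on the order of k·n work, which is quadratic when escapes are dense. Practically anything would be better. I propose that deserialize allocate a char * buffer (no larger than the input, since each replacement is shorter than the escape it replaces), scan for each %, and either do a general hex conversion in place or check for one of the six patterns. After each replacement, it should copy the unmodified text down and resume scanning for % from that point.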
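The in-place approach proposed above can be sketched as follows, assuming the general hex conversion variant (any %XY with two hex digits is decoded, not only the six escapes handled in deserialize) and assuming that a % not followed by two hex digits is left as-is. The names decodeEscapesInPlace and hexValue are hypothetical. Because each three-byte escape becomes one byte, the write position never overtakes the read position, so decoded bytes can overwrite the same buffer in a single left-to-right pass.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: the value of a hex digit, or -1 if c is not one.
static int hexValue(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decodes every "%XY" in t in one pass, writing the result over the input.
// Assumption: a '%' not followed by two hex digits is copied through unchanged.
static void decodeEscapesInPlace(std::string& t) {
  std::size_t out = 0;
  for (std::size_t in = 0; in < t.size();) {
    if (t[in] == '%' && in + 2 < t.size()) {
      const int hi = hexValue(t[in + 1]);
      const int lo = hexValue(t[in + 2]);
      if (hi >= 0 && lo >= 0) {
        t[out++] = static_cast<char>(hi * 16 + lo);  // three bytes become one
        in += 3;
        continue;
      }
    }
    t[out++] = t[in++];  // ordinary byte: copy it down
  }
  t.resize(out);  // drop the bytes freed by decoding
}
```

In deserialize, the six replaceAll calls would then collapse into a single decodeEscapesInPlace(t) after the leading quote is erased. Restricting decoding to the six known escapes would only change the check inside the loop.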

      -dk

    People

    • Assignee: Vivek Ratan (vivekr)
    • Reporter: Dick King (dking)
    • Votes: 0
    • Watchers: 0