[HADOOP-1758] processing escapes in a jute record is quadratic - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.13.0
Fix Version/s: 0.15.0
Component/s: record
Labels: None

Description

The following code appears in hadoop/src/c++/librecordio/csvarchive.cc:

static void replaceAll(std::string s, const char *src, char c)
{
  std::string::size_type pos = 0;
  while (pos != std::string::npos) {
    pos = s.find(src);
    if (pos != std::string::npos) {
      s.replace(pos, strlen(src), 1, c);
    }
  }
}

This is used in the context of replacing jute escapes in the code:

void hadoop::ICsvArchive::deserialize(std::string& t, const char* tag)
{
  t = readUptoTerminator(stream);
  if (t[0] != '\'') {
    throw new IOException("Errror deserializing string.");
  }
  t.erase(0, 1); /// erase first character
  replaceAll(t, "%0D", 0x0D);
  replaceAll(t, "%0A", 0x0A);
  replaceAll(t, "%7D", 0x7D);
  replaceAll(t, "%00", 0x00);
  replaceAll(t, "%2C", 0x2C);
  replaceAll(t, "%25", 0x25);
}

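Each call to replaceAll restarts s.find(src) from the beginning of the string for every instance of the escape sequence, and each s.replace shifts the rest of the string down, so the cost grows quadratically with the number of escapes in the record. Practically anything would be better. I would propose that within deserialize we allocate a char * buffer [the result never outgrows the input, since each replacement is smaller than the original], scan for each %, and either do a general hex conversion in place or look for one of the six patterns. After each replacement we move the unmodified text down and continue scanning for the next % from that point.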

-dk
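The proposal above can be sketched as a single left-to-right pass that compacts the string in place. This is only an illustrative sketch of the "general hex conversion" variant, not the contents of the attached patch. The names hexValue and unescapeInPlace are hypothetical, and it works directly on the std::string rather than a separate char * buffer. Note that it decodes any %XX sequence, not just the six patterns listed in deserialize.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: numeric value of one hex digit, or -1 if c is not one.
static int hexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Hypothetical replacement for the six replaceAll() calls. Decodes every
// well-formed %XX escape in one pass; 'in' reads ahead, 'out' writes behind,
// so each character is examined and moved at most once (linear time).
static void unescapeInPlace(std::string& s)
{
  const std::string::size_type len = s.length();
  std::string::size_type out = 0;
  for (std::string::size_type in = 0; in < len; ++in) {
    char c = s[in];
    if (c == '%' && in + 2 < len) {
      const int hi = hexValue(s[in + 1]);
      const int lo = hexValue(s[in + 2]);
      if (hi >= 0 && lo >= 0) {
        c = static_cast<char>((hi << 4) | lo);
        in += 2;  // skip the two hex digits just consumed
      }
    }
    s[out++] = c;  // out <= in always holds, so unread input is never overwritten
  }
  s.resize(out);   // drop the leftover tail
}
```

Under these assumptions, deserialize would call unescapeInPlace(t) once after t.erase(0, 1) instead of making six replaceAll calls. Because a decoded character is never rescanned, an input such as "%250D" yields the literal text "%0D" rather than a carriage return. Malformed or truncated escapes are copied through unchanged.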

Attachments

1758_01.patch
30/Aug/07 05:00
2 kB
Vivek Ratan

People

Assignee: Vivek Ratan
Reporter: Dick King
Votes: 0
Watchers: 0

Dates

Created: 22/Aug/07 18:03
Updated: 05/Nov/07 18:12
Resolved: 05/Sep/07 21:56