[STDCXX-142] std::stringstream insertion of character arrays very slow - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Critical
Resolution: Fixed
Affects Version/s: 4.1.2, 4.1.3
Fix Version/s: 4.1.4
Component/s: 27. Input/Output
Labels:
None
Environment:

all

Description

From http://mail-archives.apache.org/mod_mbox/incubator-stdcxx-dev/200602.mbox/%3c43ECE5A3.1070609@roguewave.com%3e:

-------- Original Message --------
Subject: Benchmarking stdcxx
Date: Fri, 10 Feb 2006 12:12:35 -0700
From: Andrew Black <ablack@roguewave.com>
Reply-To: stdcxx-dev@incubator.apache.org
To: stdcxx-dev@incubator.apache.org

Greetings all.

I thought it might be interesting to do some benchmarking, comparing the
performance of stdcxx with other standard libraries. As there are a
number of attributes that can be compared when doing a benchmark, and an
even larger number of classes that can be looked at, there is a fair
amount of choice in what to measure. As a starting point, I chose to
measure the runtime performance of stringstream objects.

Measurements were taken on my Linux box (a 1.9 GHz P4) under a light
load (a number of applications were running, but most were idle), using
an 8d (single threaded, release, shared) build of stdcxx. Each test was
run 5 times in a row, with a count of 500000 iterations. The following
table lists the run times collected. All times are in seconds.

----------------------------------------------------
test name                gcc 3.2.3      stdcxx 4.1.3
                       usr     sys       usr     sys
----------------------------------------------------
read_single          8.977   0.008    13.997   0.012
                     7.856   0.008    13.913   0.016
                     8.021   0.012    13.817   0.024
                     7.736   0.020    28.634   0.016
                     7.844   0.012    13.841   0.016
----------------------------------------------------
read_multi           0.608   0.744     0.864   0.756
                     0.688   0.704     0.860   0.736
                     0.660   0.728     0.856   0.712
                     0.608   0.792     0.848   0.724
                     0.552   0.796     0.796   0.780
----------------------------------------------------
write_single         1.976   0.000    30.450   0.048
                     2.356   0.012    30.526   0.064
                     1.984   0.000    30.354   0.032
                     1.964   0.012    30.350   0.028
                     1.936   0.000    30.286   0.036
----------------------------------------------------
write_multi          1.172   2.352    32.326   2.320
                     1.092   2.444    31.102   2.216
                     1.164   2.360    30.482   2.248
                     1.148   2.380    31.930   2.180
                     1.000   2.532    29.534   2.272
----------------------------------------------------
read_write_single    7.684   0.000    13.649   0.016
                     7.684   0.012    13.685   0.016
                     7.664   0.012    14.193   0.016
                     8.353   0.012    13.745   0.016
                     7.700   0.012    13.677   0.004
----------------------------------------------------
read_write_cycle     0.056             0.412   0.000
                     0.056             0.424   0.004
                     0.056             0.428   0.004
                     0.056             0.420   0.004
                     0.056             0.412   0.004
----------------------------------------------------
read_write_multi     0.664   0.732     1.028   0.716
                     0.676   0.712     0.988   0.744
                     0.632   0.752     1.036   0.716
                     0.688   0.704     1.080   0.732
                     0.632   0.732     0.940   0.804
----------------------------------------------------
write_read_single    7.868   0.016    43.407   0.044
                     7.896   0.012    43.895   0.044
                     7.888   0.008    43.307   0.076
                     7.912   0.012    43.391   0.032
                     8.337   0.016    43.375   0.044
----------------------------------------------------
write_read_cycle     0.056   0.000     0.412   0.004
                     0.056   0.000     0.404   0.016
                     0.056   0.000     0.412   0.000
                     0.056   0.000     0.420   0.000
                     0.052   0.004     0.416   0.004
----------------------------------------------------
write_read_multi     7.340   2.404    43.591   2.408
                     7.420   2.400    42.347   2.196
                     7.440   2.376    45.227   2.336
                     7.232   2.476    43.679   2.316
                     7.348   2.488    44.271   2.348
----------------------------------------------------
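
The attached benchmark source is not reproduced on this page. As a rough illustration of the kind of test timed above, the following sketch inserts a character array into a std::stringstream a fixed number of times and reports the processor time used. The function name time_insertions, the WriteResult struct, and the 16-character payload are assumptions made for this example, and std::clock() reports combined processor time rather than the separate usr and sys figures in the table.

```cpp
#include <cstddef>
#include <ctime>
#include <sstream>
#include <string>

// Hypothetical result type for this sketch (not part of the original benchmark).
struct WriteResult {
    double cpu_seconds;   // processor time consumed by the insertion loop
    std::size_t bytes;    // number of characters held by the stream afterwards
};

// Inserts a character array into one std::stringstream `iterations` times
// and measures the processor time spent in the loop with std::clock().
inline WriteResult time_insertions(std::size_t iterations)
{
    static const char payload[] = "0123456789abcdef";   // 16 characters (assumed)

    std::stringstream stream;
    const std::clock_t start = std::clock();
    for (std::size_t i = 0; i != iterations; ++i)
        stream << payload;          // operator<< for a character array
    const std::clock_t stop = std::clock();

    WriteResult result;
    result.cpu_seconds = double(stop - start) / CLOCKS_PER_SEC;
    result.bytes = stream.str().size();
    return result;
}
```

Calling time_insertions(500000) once per library build and comparing cpu_seconds would give a comparison in the same spirit as the write tests, though not the same numbers.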

Analysis:
Using the numbers above, I did some basic analysis. System times for a
given test appear to be roughly the same for both libraries, so I am
setting those numbers aside for now and looking only at user times.
To summarize the user times, I see three statistical operations that
could be of use.
The first operation is the arithmetic average ('average') of the
numbers. This is the 'classic' sum and divide average. The second
operation is the median value (middle number) in the set. The final
operation is what I term the 'middle average'. I calculate this by
throwing out the highest and lowest value, then calculating the
arithmetic average of the remaining numbers.
In the tables below, ratio indicates how much longer the stdcxx runs
take compared to the gcc runs, with 0% indicating they take the same
amount of time.
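
The three statistics and the ratio described here can be sketched as follows. The function names are chosen for this example only, and median() assumes an odd number of samples, as in the five-run sets used in this benchmark.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic average: sum of the samples divided by their count.
inline double average(const std::vector<double>& v)
{
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// Median of an odd-sized sample: the middle value after sorting.
inline double median(std::vector<double> v)
{
    std::sort(v.begin(), v.end());
    return v[v.size() / 2];
}

// 'Middle average': drop the highest and the lowest value, then take the
// arithmetic average of what remains. Requires at least three samples.
inline double middle_average(std::vector<double> v)
{
    std::sort(v.begin(), v.end());
    return std::accumulate(v.begin() + 1, v.end() - 1, 0.0) / (v.size() - 2);
}

// Ratio as used in the tables: how much longer stdcxx takes than gcc,
// in percent, with 0 meaning both take the same amount of time.
inline double ratio_percent(double gcc, double stdcxx)
{
    return (stdcxx / gcc - 1.0) * 100.0;
}
```

Applied to the read_single user times, this gives an average of 8.087 for gcc and 16.840 for stdcxx, and a ratio of 108.25% after rounding, matching the first table below.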

-----------------------------------------------
read_single            gcc    stdcxx      ratio
-----------------------------------------------
average              8.087    16.840    108.25%
middle average       7.907    13.917     76.01%
median               7.856    13.913     77.10%
-----------------------------------------------

-----------------------------------------------
read_multi             gcc    stdcxx      ratio
-----------------------------------------------
average              0.623     0.845     35.56%
middle average       0.625     0.855     36.67%
median               0.608     0.856     40.79%
-----------------------------------------------

-----------------------------------------------
write_single           gcc    stdcxx      ratio
-----------------------------------------------
average              2.043    30.393   1387.53%
middle average       1.975    30.385   1438.72%
median               1.976    30.354   1436.13%
-----------------------------------------------

-----------------------------------------------
write_multi            gcc    stdcxx      ratio
-----------------------------------------------
average              1.115    31.075   2686.48%
middle average       1.135    31.171   2647.18%
median               1.148    31.102   2609.23%
-----------------------------------------------

-----------------------------------------------
read_write_single      gcc    stdcxx      ratio
-----------------------------------------------
average              7.817    13.790     76.41%
middle average       7.689    13.720     78.20%
median               7.684    13.685     78.10%
-----------------------------------------------

-----------------------------------------------
read_write_cycle       gcc    stdcxx      ratio
-----------------------------------------------
average              0.056     0.419    648.57%
middle average       0.056     0.419    647.62%
median               0.056     0.420    650.00%
-----------------------------------------------

-----------------------------------------------
read_write_multi       gcc    stdcxx      ratio
-----------------------------------------------
average              0.658     1.014     54.07%
middle average       0.657     1.017     54.77%
median               0.664     1.028     54.82%
-----------------------------------------------

-----------------------------------------------
write_read_single      gcc    stdcxx      ratio
-----------------------------------------------
average              7.980    43.475    444.79%
middle average       7.899    43.391    449.35%
median               7.896    43.391    449.53%
-----------------------------------------------

-----------------------------------------------
write_read_cycle       gcc    stdcxx      ratio
-----------------------------------------------
average              0.055     0.413    647.83%
middle average       0.056     0.413    638.10%
median               0.056     0.412    635.71%
-----------------------------------------------

-----------------------------------------------
write_read_multi       gcc    stdcxx      ratio
-----------------------------------------------
average              7.356    43.823    495.74%
middle average       7.369    43.847    494.99%
median               7.348    43.679    494.43%
-----------------------------------------------

Conclusions:
Looking over the processed numbers from the runs, one thing that jumps
out at me is the write times, particularly the write_single and
write_multi benchmarks. Both of these benchmarks are an order of
magnitude slower than their GCC counterparts (at least on this
computer): the stdcxx runs take roughly 15 times as long for
write_single and roughly 28 times as long for write_multi. The
write_multi benchmark in particular shows what happens if you stream
large amounts of data (~250 MB worth of data in this case) into a
stringstream, without streaming any out.
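
One plausible contributor to write times like these, suggested by the blocking issue STDCXX-149 (stringbuf::overflow() makes only one write position available), is a buffer that grows by a single character each time it fills up. The toy model below is not stdcxx code. It only counts how many characters would be copied during reallocation under two hypothetical growth policies; copies_for, grow_by_one, and grow_double are names invented for this sketch.

```cpp
#include <cstddef>

// Counts the characters copied by reallocations while appending n characters,
// one at a time, to an initially empty buffer. grow(capacity) must return a
// new capacity greater than the old one.
template <class Grow>
std::size_t copies_for(std::size_t n, Grow grow)
{
    std::size_t capacity = 0, size = 0, copies = 0;
    for (std::size_t i = 0; i != n; ++i) {
        if (size == capacity) {
            capacity = grow(capacity);
            copies += size;     // existing contents move to the new block
        }
        ++size;
    }
    return copies;
}

// Hypothetical policy: make exactly one more write position available.
inline std::size_t grow_by_one(std::size_t capacity) { return capacity + 1; }

// Hypothetical policy: double the capacity on every reallocation.
inline std::size_t grow_double(std::size_t capacity)
{
    return capacity ? capacity * 2 : 1;
}
```

For 1000 appended characters, the one-at-a-time policy copies 499500 characters while the doubling policy copies 1023. Copying that grows quadratically with the amount of data written could account for slowdowns of the size measured here, although the model says nothing about what stdcxx 4.1.3 actually does.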

Future:
For those interested in trying to repeat these tests, I have attached
the source and makefile I used to generate these benchmarks. This
particular benchmark is a work in progress. There are several
additional things that could be benchmarked regarding stringstreams.
These include allocation (default, string, copy), pseudo-random
read/writes (rather than pattern read/writes), reads and writes of
varying length strings, and reading/writing using something other than
the insertion and extraction operators.

--Andrew Black

Issue Links

is blocked by: STDCXX-149 [LWG #432] stringbuf::overflow() makes only one write position available (Closed)

is related to: STDCXX-366 Add benchmarking framework (Open)

Sub-Tasks

std::stringbuf(const string&) inefficient

Resolved

Martin Sebor

People

Assignee: Martin Sebor
Reporter: Martin Sebor
Votes: 0
Watchers: 0

Dates

Created: 13/Feb/06 08:14
Updated: 17/Oct/07 21:30
Resolved: 05/Mar/06 07:23