Benchmarking HTML entity encoding

I benchmarked HTML entity encoding with Surveyor::Benchmark::HTMLEntities using the system I describe in Distribute benchmarks. Surveyor::App is already making things easier for me.

Tokuhirom shipped HTML::Escape, which implements its encoder in XS, as he describes in his blog post about it. It can be a lot faster than the pure Perl HTML::Entities, roughly 21 times faster in this run:

% env survey -p Surveyor::Benchmark::HTMLEntities http://www.perl.org/
Fetching http://www.perl.org/
HTML is 14983 bytes
> (418)
< (418)
& (7)
' (103)
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 13 wallclock secs (13.75 usr +  0.01 sys = 13.76 CPU) @ 726.74/s (n=10000)
html_escape:  1 wallclock secs ( 0.64 usr +  0.00 sys =  0.64 CPU) @ 15625.00/s (n=10000)

Setting PERL_ONLY=1 makes it a fair pure Perl fight, and HTML::Escape is still almost twice as fast:

% env PERL_ONLY=1 survey -p Surveyor::Benchmark::HTMLEntities http://www.perl.org/
Fetching http://www.perl.org/
HTML is 14857 bytes
> (416)
< (416)
& (7)
' (103)
Benchmark: timing 10000 iterations of html_entities, html_escape...
html_entities: 14 wallclock secs (13.74 usr +  0.01 sys = 13.75 CPU) @ 727.27/s (n=10000)
html_escape:  7 wallclock secs ( 7.32 usr +  0.01 sys =  7.33 CPU) @ 1364.26/s (n=10000)

There’s a reason for that, though: HTML::Escape only handles the characters <, >, &, ', and ", while HTML::Entities lets me configure the characters to escape and by default also escapes wide characters.
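
To make that difference concrete, here is a minimal sketch that runs both encoders over the same short string. The sample input and variable names are my own, picked only for illustration; the calls are the documented encode_entities and escape_html functions.

```perl
use strict;
use warnings;
use HTML::Entities ();              # call encode_entities fully qualified
use HTML::Escape qw(escape_html);

# Illustrative input (an assumption, not from the benchmark):
# one non-ASCII character plus some markup.
my $input = "caf\x{e9} <b>";

# With no second argument, HTML::Entities uses its default set,
# which covers more than the five markup characters.
my $by_entities = HTML::Entities::encode_entities($input);

# HTML::Escape has no configuration; it escapes its fixed set.
my $by_escape = escape_html($input);

binmode STDOUT, ':encoding(UTF-8)';
print "HTML::Entities: $by_entities\n";
print "HTML::Escape:   $by_escape\n";
print $by_entities eq $by_escape ? "same output\n" : "outputs differ\n";
```

Here the two results differ because HTML::Entities turns the accented character into an entity while HTML::Escape passes it through, so the default configurations are not doing the same amount of work.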

My Surveyor::App made this simple for me. I created the benchmark, but also ran the target code tests to ensure that I’m returning the same thing. Through that I was able to adjust the target code of HTML::Entities to only escape the same things as HTML::Escape. I might have skipped that step otherwise.
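
A rough sketch of that adjustment, outside of Surveyor::App, might look like the following. It is not the code in Surveyor::Benchmark::HTMLEntities: the encode_five helper, the stand-in input, and the iteration count are assumptions for illustration. It limits HTML::Entities through the optional second argument to encode_entities, checks that both encoders agree, and only then times them with the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use HTML::Entities ();
use HTML::Escape qw(escape_html);

# The five characters the post says HTML::Escape handles.
my $UNSAFE = q{<>&"'};

# Hypothetical helper: HTML::Entities limited to that set through
# the optional second argument of encode_entities.
sub encode_five {
    my ($html) = @_;
    return HTML::Entities::encode_entities( $html, $UNSAFE );
}

# Stand-in input (an assumption); the runs above fetched a live page.
my $html = q{<p class="x">Tom & Jerry's "show"</p>} x 100;

# Check that both targets return the same thing before timing anything.
die "outputs differ\n" unless encode_five($html) eq escape_html($html);

# Arbitrary, small iteration count so the sketch finishes quickly.
# Benchmark may warn that this is too few iterations for a reliable count.
my $results = timethese( 1_000, {
    html_entities => sub { my $out = encode_five($html) },
    html_escape   => sub { my $out = escape_html($html) },
} );
```

The equality check is the important part: if an HTML::Escape release escapes characters beyond these five, an input containing them would make the script die instead of reporting a misleading comparison.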

And, now knowing this, I updated the Stack Overflow answer for "How can I encode a string for HTML?"