I benchmark HTML entity encoding with Surveyor::Benchmark::HTMLEntities using the system I describe in Distribute benchmarks. Already Surveyor::App is making things easier for me.
Tokuhirom shipped HTML::Escape, which implements its encoder in XS, as he describes in his blog post about it. It can be a lot faster than the pure Perl HTML::Entities:
    % env survey -p Surveyor::Benchmark::HTMLEntities http://www.perl.org/
    Fetching http://www.perl.org/
    HTML is 14983 bytes
    > (418)
    < (418)
    & (7)
    ' (103)
    Benchmark: timing 10000 iterations of html_entities, html_escape...
    html_entities: 13 wallclock secs (13.75 usr + 0.01 sys = 13.76 CPU) @ 726.74/s (n=10000)
    html_escape: 1 wallclock secs ( 0.64 usr + 0.00 sys = 0.64 CPU) @ 15625.00/s (n=10000)
The fair fight, pure Perl against pure Perl, also goes to HTML::Escape, which is still almost twice as fast:
    % env PERL_ONLY=1 survey -p Surveyor::Benchmark::HTMLEntities http://www.perl.org/
    Fetching http://www.perl.org/
    HTML is 14857 bytes
    > (416)
    < (416)
    & (7)
    ' (103)
    Benchmark: timing 10000 iterations of html_entities, html_escape...
    html_entities: 14 wallclock secs (13.74 usr + 0.01 sys = 13.75 CPU) @ 727.27/s (n=10000)
    html_escape: 7 wallclock secs ( 7.32 usr + 0.01 sys = 7.33 CPU) @ 1364.26/s (n=10000)
There’s a reason for that, though: HTML::Escape only handles the characters <, >, &, ', and ", while HTML::Entities lets me configure the characters to escape and by default also escapes wide characters.
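To make that difference concrete, here is a minimal sketch of how the two modules can be put on equal footing. It is not the code inside Surveyor::Benchmark::HTMLEntities. The helper name entities_subset and the sample string are made up for illustration, and the five-character list is taken from the paragraph above, so check it against the documentation for the HTML::Escape version you have installed. The second argument to encode_entities restricts it to just those characters.

```perl
use strict;
use warnings;

use Benchmark qw(cmpthese);
use HTML::Entities qw(encode_entities);
use HTML::Escape qw(escape_html);

# The five characters named above. Passing them as the second argument
# keeps encode_entities from also encoding control and high-bit characters.
my $UNSAFE = q{<>&"'};

# Hypothetical helper, not part of either module.
sub entities_subset {
    my ($html) = @_;
    return encode_entities($html, $UNSAFE);
}

# Made-up sample input that only uses the five characters in question.
my $sample = q{<a href="/x?a=1&b=2">Tom & Jerry's</a>} x 100;

# Confirm both routes return the same string before timing anything.
die "outputs differ\n"
    unless entities_subset($sample) eq escape_html($sample);

# Run each candidate for at least one CPU second and print a comparison.
cmpthese(-1, {
    html_entities => sub { my $out = entities_subset($sample) },
    html_escape   => sub { my $out = escape_html($sample) },
});
```

The equality check matters more than the timing: without it, a benchmark like this can end up comparing two routines that do different amounts of work.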
My Surveyor::App made this simple for me. I created the benchmark, but also ran the target code tests to ensure that I’m returning the same thing. Through that I was able to adjust the target code of HTML::Entities to only escape the same things as HTML::Escape. I might have skipped that step otherwise.
And, now knowing this, I updated the Stack Overflow answer for How can I encode a string for HTML?.