Item401: OSX fails unit tests - notably UTF8 seg faults.
Priority: Normal
Current State: Closed
Released In: 1.0.0
Target Release: patch
Applies To: Engine
Component:
Branches:
Weird thing. It seems to me that
/($regex{validUtf8StringRegex})*/
causes it to barf, so perhaps it's a resource limit that's making it crash?
The patch below stops it, and I presume that, while horrid, it still works?
Index: /Users/svend/Sites/foswiki/core/lib/Foswiki.pm
===================================================================
--- Foswiki.pm (revision 1184)
+++ Foswiki.pm (working copy)
@@ -521,7 +521,10 @@
return undef if ( $text =~ $regex{validAsciiStringRegex} );
# If not UTF-8 - assume in site character set, no conversion required
- return undef unless ( $text =~ $regex{validUtf8StringRegex} );
+ #return undef unless ( $text =~ $regex{validUtf8StringRegex} ); #<-- this seg faults on OSX leopard.
+ my $trial = $text;
+ $trial =~ s/$regex{validUtf8CharRegex}//g;
+ return unless (length($trial) == 0);
# If site charset is already UTF-8, there is no need to convert anything:
if ( $Foswiki::cfg{Site}{CharSet} =~ /^utf-?8$/i ) {
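To make the two checks concrete, here is a minimal, self-contained Perl sketch comparing the original whole-string match with the strip-and-check approach from the patch. `$utf8Char` is a hypothetical stand-in for `$regex{validUtf8CharRegex}` (whose real definition is not shown in this item), written from the RFC 3629 byte ranges, and both helper names are hypothetical. One plausible explanation for the crash, not confirmed here, is that perl's regex engine before 5.10 recursed once per iteration of a repeated group like `(?:...)*`, so a long string could exhaust the C stack on Leopard's perl 5.8.8. The substitution matches one character at a time, so it avoids that deep recursion.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $regex{validUtf8CharRegex}: one well-formed
# UTF-8 character, following the RFC 3629 byte ranges. Foswiki's own
# definition may differ.
my $utf8Char = qr/
      [\x00-\x7F]                          # 1 byte: ASCII
    | [\xC2-\xDF][\x80-\xBF]               # 2 bytes
    | \xE0[\xA0-\xBF][\x80-\xBF]           # 3 bytes, no overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}    # 3 bytes
    | \xED[\x80-\x9F][\x80-\xBF]           # 3 bytes, no surrogates
    | \xF0[\x90-\xBF][\x80-\xBF]{2}        # 4 bytes, no overlongs
    | [\xF1-\xF3][\x80-\xBF]{3}            # 4 bytes
    | \xF4[\x80-\x8F][\x80-\xBF]{2}        # 4 bytes, up to U+10FFFF
/x;

# Original style of check: match the whole string as one repeated group.
sub validByAnchoredRegex {
    my ($text) = @_;
    return $text =~ /\A(?:$utf8Char)*\z/ ? 1 : 0;
}

# Patched style of check: delete every valid character; leftovers are invalid.
sub validByStripping {
    my ($text) = @_;
    ( my $trial = $text ) =~ s/$utf8Char//g;
    return length($trial) == 0 ? 1 : 0;
}

my %samples = (
    ascii        => "WebHome",
    latin1_bytes => "Caf\xE9",           # ISO-8859-1 e-acute, not UTF-8
    utf8_e_acute => "Caf\xC3\xA9",
    utf8_euro    => "\xE2\x82\xAC100",
);
for my $name ( sort keys %samples ) {
    printf "%-13s anchored=%d stripped=%d\n", $name,
        validByAnchoredRegex( $samples{$name} ),
        validByStripping( $samples{$name} );
}
```

Because every alternative starts with an ASCII or lead byte, a stray continuation byte or truncated sequence is never consumed by `s///g`, so any leftover byte means the input was not valid UTF-8 and both functions should agree.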
Sorry, I can't really comment usefully here. I have been avoiding this piece of code like the plague. Richard is probably the only person who fully understands it.
I'm just wondering (and apparently I'm not the only one) why we're using a regexp where we could use Encode directly.
I know Encode is another module to require, and thus another piece of code that gets loaded, but some modules (such as Wysiwyg) already require it anyway.
In my humble opinion, if we want to go UTF-8, we will have to use a proper tool to do it, and Encode seems the appropriate choice.
Reinventing the wheel with a regexp can work, but...
Also, Encode is implemented in XS, so it is much quicker than a regexp at doing the same job.
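The Encode-based check suggested above might look something like this minimal sketch, assuming the input is a byte string that has not already been decoded. `isValidUtf8` is a hypothetical helper name. It asks `Encode::decode` for the strict `'UTF-8'` encoding with `FB_CROAK`, so the first malformed sequence makes it die, and adds `LEAVE_SRC` so the argument is not modified in place.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if $octets is a well-formed UTF-8 byte string.
sub isValidUtf8 {
    my ($octets) = @_;
    return eval {
        # FB_CROAK: die on the first malformed sequence.
        # LEAVE_SRC: do not let decode() consume its input buffer.
        Encode::decode( 'UTF-8', $octets,
            Encode::FB_CROAK() | Encode::LEAVE_SRC() );
        1;
    } ? 1 : 0;
}

for my $bytes ( "WebHome", "Caf\xC3\xA9", "Caf\xE9" ) {
    print isValidUtf8($bytes) ? "valid\n" : "invalid\n";
}
```

The strict `'UTF-8'` name is deliberate: Encode's lax `'utf8'` follows Perl's internal encoding and may accept sequences (such as encoded surrogates) that strict UTF-8 rejects.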
Funny:
http://develop.twiki.org/trac/changeset/17776
Item6146: Adding Encode as a required CPAN module
Encode was first released with perl 5.007003 (patchlevel perl/15039, released on 2002-03-05)
But according to people who use it, there is no point doing UTF-8 with anything older than perl 5.8.3.
The patch above does fix the segfault on my Mac.
I've committed it, and am adding a new task (Item438) for Olivier and his Encode replacement work.
I'm going to close this, even though there are still unit test failures: the remaining ones are the 'rename topic' issues (Item439), which are not OS X-specific. They will occur on any case-insensitive file system, notably on Windows too.
--
SvenDowideit - 12 Dec 2008
If we are not supporting Perl 5.6 at all, we can just use Encode for this, or use Perl's built-in feature to do the same check (see the security part of TWiki:Codev.UTF8; can we get that page pulled into Foswiki, btw?). I suspect the performance benefit of using Encode is tiny, if any: this code only processes a small part of the URL (the topic and web names), and the cost is only paid by sites with UTF-8 URLs, since I believe there's an earlier check for pure ASCII.
That regex was mostly there to work across both 5.6 and 5.8, but it's clearly not that easy to read. If we are only supporting 5.8 (for all sites, not just those with UTF-8 turned on), then Encode is the way to go.
--
RichardDonkin - 13 Dec 2008
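For the built-in route mentioned above, one candidate (not necessarily the one described on TWiki:Codev.UTF8) is `utf8::decode`, which needs no module and returns false when its argument is not well-formed. This is a minimal sketch with a hypothetical `classifyBytes` helper that keeps the pure-ASCII check first, as the existing code does.

```perl
use strict;
use warnings;

# Hypothetical helper: classify a byte string in the same order as the
# Foswiki code path (ASCII check first, then a UTF-8 check).
sub classifyBytes {
    my ($bytes) = @_;

    # Pure 7-bit ASCII needs no conversion at all.
    return 'ascii' if $bytes =~ /\A[\x00-\x7F]*\z/;

    # utf8::decode converts in place and returns false on malformed
    # input, so work on a copy and leave the caller's bytes intact.
    my $copy = $bytes;
    return utf8::decode($copy) ? 'utf8' : 'other';
}

for my $bytes ( "WebHome", "Caf\xC3\xA9", "Caf\xE9" ) {
    print classifyBytes($bytes), "\n";
}
```

Note that `utf8::decode` checks Perl's looser "extended" UTF-8, so it may accept a few sequences that strict UTF-8 (as used by Encode's `'UTF-8'`) rejects.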