Item10748: foswiki Response::body() corrupting international chars if the underlying store is utf-8 and Site-CharSet is set to utf-8
Priority: Urgent
Current State: Needs Developer
Released In: n/a
Target Release: n/a
For example: any ndash (HTML entity: &ndash;), '–', is converted to a question mark when MongoDBPlugin is enabled.
Currently we have
$Foswiki::cfg{Site}{CharSet} = 'utf-8';
Steps to reproduce:
- Configure with MongoDBPlugin enabled. The search/query algorithms don't need to be set.
- Edit a topic in WYSIWYG
- Switch to raw
- Place &ndash; (the ndash entity) in the raw text
- Switch back to WYSIWYG
- Save & continue
- Observe ? instead of –
- Disable MongoDBPlugin in configure
- Place the – character into the topic again, and save
- Observe that – is no longer corrupted
--
PaulHarvey - 16 May 2011
The only reason I've got WYSIWYG in the steps to reproduce is to avoid any ambiguity about which dash to paste in. "The UTF-8 version of –" just seems long-winded.
WysiwygPlugin happily converts entities into the "native" charset (by design). Given that UTF-8 can represent them natively, they are converted to literal characters instead of being left as entities.
--
PaulHarvey - 16 May 2011
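A minimal sketch of the entity behaviour described above, for readers who want to see it in isolation. This is not WysiwygPlugin's actual code: `entity_for_charset` and the one-entry `%ENTITY` table are hypothetical names made up for illustration. It only assumes the core `Encode` module.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical one-entry table; a real converter would know many entities.
my %ENTITY = ( ndash => "\x{2013}" );

# Hypothetical helper: return the literal character when $charset can
# represent it, otherwise leave the entity untouched.
sub entity_for_charset {
    my ( $name, $charset ) = @_;
    my $char = $ENTITY{$name};
    return "&$name;" unless defined $char;

    # encode() may modify its input when a CHECK value is given, so work
    # on a copy. FB_CROAK makes encode() die on unrepresentable characters.
    my $copy = $char;
    my $ok = eval { Encode::encode( $charset, $copy, Encode::FB_CROAK() ); 1 };
    return $ok ? $char : "&$name;";
}

my $in_utf8   = entity_for_charset( 'ndash', 'utf-8' );         # literal U+2013
my $in_latin1 = entity_for_charset( 'ndash', 'iso-8859-1' );    # stays '&ndash;'
```

With a utf-8 site charset the entity becomes a real U+2013 character in the topic text, which is why the rest of the pipeline has to cope with non-ASCII data.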
Okay, a new and improved procedure:
- Disable MongoDBPlugin
- Edit this topic, and save: TestNdash.txt - observe that the topic remains unchanged
- Enable MongoDBPlugin
- Edit the topic, and save again
- Observe that the ndash is replaced with a '?'
--
PaulHarvey - 17 May 2011
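One common way a '?' can appear is sketched below. It is an assumption about the mechanism, not a confirmed diagnosis of this task: when `Encode::encode` is asked for a charset that cannot represent a character and no CHECK argument is given, it substitutes a replacement character, which is '?' for iso-8859-1.

```perl
use strict;
use warnings;
use Encode ();

my $text = "a \x{2013} b";    # character string containing U+2013 EN DASH

# No CHECK argument: unrepresentable characters are silently substituted.
my $latin1 = Encode::encode( 'iso-8859-1', $text );

# UTF-8 can represent the character, as the three bytes E2 80 93.
my $utf8 = Encode::encode( 'UTF-8', $text );

printf "latin1: %s\n", $latin1;
printf "utf-8 bytes: %s\n",
  join ' ', map { sprintf '%02X', ord } split //, $utf8;
```

If something like this is happening, somewhere on the save path the text would be getting encoded to (or treated as) a single-byte charset even though the site charset is utf-8.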
The attached file uses UTF-8 encoding
--
PaulHarvey - 17 May 2011
This appears to be a Foswiki core issue. I've committed a change that I suspect resolves it. Please provide feedback.
--
SvenDowideit - 17 May 2011
It now works for plain old CGI. But FastCGIEngineContrib is crashing: some Foswiki.pm regexes fail with an error mentioning "malformed utf-8", and other times it's "wide character in output".
--
PaulHarvey - 17 May 2011
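For reference, a small sketch of where "wide character" messages generally come from in plain Perl. It does not reproduce the FastCGI crash itself and uses in-memory filehandles rather than Foswiki's engine. Printing a string that contains a code point above 255 to a handle with no encoding layer triggers Perl's "Wide character in print" warning; the same print to a handle with a `:encoding(UTF-8)` layer is silent.

```perl
use strict;
use warnings;

my $dash = "\x{2013}";    # a character above code point 255

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# In-memory handle with no encoding layer: Perl warns and falls back to
# writing the string's internal UTF-8 bytes.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $dash;
close $raw;

# Same print with an explicit encoding layer: no warning.
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die "open: $!";
print {$enc} $dash;
close $enc;

my $wide = grep { /Wide character/ } @warnings;
print "wide-character warnings: $wide\n";
```

The "malformed utf-8" case is typically the opposite mistake, where byte strings that are not valid UTF-8 get flagged or treated as character data, but that is not shown here.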
Same feedback as I gave on Paul's other task: I would think binmode on the output filehandle is more appropriate than encode, but I don't have time to test it.
--
OlivierRaginel - 17 May 2011
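To make the two options concrete, here is a hedged sketch comparing an explicit `Encode::encode` call with a `binmode` encoding layer. It uses in-memory handles as stand-ins for the real output handle, so it says nothing about how Foswiki's engines actually write the response. Both routes yield the same bytes, and applying both at once double-encodes.

```perl
use strict;
use warnings;
use Encode ();

my $body = "caf\x{e9} \x{2013} ok";    # character string

# Option 1: encode explicitly, then print bytes to a plain handle.
open my $out1, '>', \my $bytes1 or die "open: $!";
print {$out1} Encode::encode( 'UTF-8', $body );
close $out1;

# Option 2: put an encoding layer on the handle and print characters.
open my $out2, '>', \my $bytes2 or die "open: $!";
binmode $out2, ':encoding(UTF-8)';
print {$out2} $body;
close $out2;

# Doing both encodes the already-encoded bytes a second time.
open my $out3, '>', \my $bytes3 or die "open: $!";
binmode $out3, ':encoding(UTF-8)';
print {$out3} Encode::encode( 'UTF-8', $body );
close $out3;

print "same bytes: ", ( $bytes1 eq $bytes2 ? "yes" : "no" ), "\n";
print "double-encoded length: ", length($bytes3), " vs ", length($bytes1), "\n";
```

Whichever route is chosen, it needs to be applied in exactly one place per output path, which is part of why mixed engines (CGI, FastCGI) can behave differently.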
We need to consider the viewfile case.
--
PaulHarvey - 18 May 2011
If every open uses the :utf8 layer, then viewfile should also work. But I'll try to work on this and test it ASAP.
--
OlivierRaginel - 18 May 2011
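A rough sketch of what "every open in utf8" means for a text file, assuming a throwaway temp file rather than a real topic or attachment. Reading and writing through a `:encoding(UTF-8)` layer round-trips characters, while reading the same file with `:raw` gives the underlying bytes. Whether viewfile can use a text layer for attachments that are not text is not addressed here.

```perl
use strict;
use warnings;
use File::Temp ();

my $text = "before \x{2013} after\n";

# Temp file that is removed when $tmp goes out of scope.
my $tmp  = File::Temp->new;
my $path = $tmp->filename;

# Write characters through an encoding layer.
open my $w, '>:encoding(UTF-8)', $path or die "write: $!";
print {$w} $text;
close $w or die "close: $!";

# Read them back through the same layer: characters again.
open my $r, '<:encoding(UTF-8)', $path or die "read: $!";
my $read_back = do { local $/; <$r> };
close $r;

# Read without decoding: the ndash shows up as three bytes.
open my $rb, '<:raw', $path or die "read raw: $!";
my $bytes = do { local $/; <$rb> };
close $rb;

printf "chars: %d, bytes: %d\n", length($read_back), length($bytes);
```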
See also Item10635.
--
PaulHarvey - 19 May 2011
This is so dodgy, it makes me want to weep. I personally do not favour attempting to fix this without major investment in unit tests and conversion to Unicode.
--
CrawfordCurrie - 21 May 2011
I think we all agree there. The question is: what should we target? My guess is that we should move everything to utf8, and hence do all open calls in utf8 and binmode STDOUT too.
But yes, this needs serious testing, and I'll try to start by looking into how Catalyst and others are dealing with it.
--
OlivierRaginel - 21 May 2011