Item10825: Using save in a rest call destroys character encoding.
Priority: Urgent
Current State: Closed
Released In: 2.0.0
Target Release: major
Given you POST a form to the save handler asynchronously, e.g. via jquery.form's ajaxSubmit.
Then this content will be utf8 encoded no matter what the charset of the html page is. That's standard by the specs.
That's different to a normal POST which will use the same charset as indicated by the html page it comes from.
Foswiki's Request object however silently assumes that anything it receives is encoded the way it has said while
rendering the html page.
So no surprise chars get mangled during a rest call of save.
Next problem is that the call itself does not specify any charset in the http headers part of the Content-Type header.
That's a no-go unfortunately.
From what I have tested the
only http header that differentiates a normal POST from an ajax POST is the
X-Requested-With
header, which is set to
XMLHttpRequest
in case of a rest call ... which also indicates that the content is encoded
in utf8.
Here's the patch to trunk's Request.pm that tries to fix this:
Index: lib/Foswiki/Request.pm
===================================================================
--- lib/Foswiki/Request.pm (revision 11812)
+++ lib/Foswiki/Request.pm (working copy)
@@ -407,11 +407,24 @@
my ( $key, @value ) = rearrange( [ 'NAME', [qw(VALUE VALUES)] ], @p );
+ my $requestedWith = $this->header("X-Requested-With") || '';
+
return @{ $this->{param_list} } unless defined $key;
if ( defined $value[0] ) {
push @{ $this->{param_list} }, $key
unless exists $this->{param}{$key};
- $this->{param}{$key} = ref $value[0] eq 'ARRAY' ? $value[0] : [@value];
+
+ if ($requestedWith eq 'XMLHttpRequest') {
+ # transform to site charset
+ if (ref $value[0] eq 'ARRAY') {
+ $this->{param}{$key} = [map(toSiteCharSet($_), @{$value[0]})];
+ } else {
+ $this->{param}{$key} = [map(toSiteCharSet($_), @value)];
+ }
+ } else {
+ $this->{param}{$key} = ref $value[0] eq 'ARRAY' ? $value[0] : [@value];
+ }
+
}
if ( defined $this->{param}{$key} ) {
return wantarray
@@ -423,6 +436,28 @@
}
}
+sub toSiteCharSet {
+ my $encoded = shift;
+
+ return $encoded
+ if ( !defined $Foswiki::cfg{Site}{CharSet}
+ || $Foswiki::cfg{Site}{CharSet} =~ m/^utf-?8$/i );
+
+ require Encode;
+ import Encode;
+ my $encoding = Encode::resolve_alias( $Foswiki::cfg{Site}{CharSet} );
+ if ( not $encoding ) {
+ print STDERR "'Conversion from $Foswiki::cfg{Site}{CharSet} not supported, or name not recognised - check perldoc Encode::Supported\n";
+ return $encoded;
+ }
+ else {
+
+ # converts to {Site}{CharSet}, generating HTML NCR's when needed
+ my $octets = Encode::decode( 'utf-8', $encoded );
+ return Encode::encode( $encoding, $octets, &Encode::FB_HTMLCREF() );
+ }
+}
+
=begin TML
Any comments?
--
MichaelDaum - 01 Jun 2011
This needs to work correctly for any request, not just XHR.
So please test both
X-Requested-With
as well as
Content-Type
.
Lack of
Content-Type
in a request which contains body data, such as POST, leaves us guessing about the nature of the data, and should be considered a very easily fixed bug of the client which built the request (unless the data is so alien that it can't be described with a MIME type or charset).
--
PaulHarvey - 01 Jun 2011
Please don't leave the "Waiting for" field empty when asking for feedback. I assume you meant to ask Micha, so I put his name in.
--
CrawfordCurrie - 08 Jun 2011
It was me leaving WaitingFor empty. That's by purpose to express the concept "anybody who feels inclined to comment / can somebody cross-check and leave his 2cent / This is sensitive stuff / I just don't want break it without a second opinion".
From what I can tell: it saved my life on a client's intranet using some custom ajax code, similar to the async "Save+Continue" in
NatEditPlugin.
--
MichaelDaum - 08 Jun 2011
Just one minor comment: as Paul wrote, it might make sense to encode more than just XHR, and some stuff which might not be in UTF-8. Si maybe just extend the code so that the toSiteCharset function takes an encoding as argument, and passes utf-8 for an XHR. Then it can easily be extended to get the encoding from the HTTP header, and re-encode it there.
Removing anybody from the WaitingFor field as plague doesn't like it. Micha, if you want other people to comment, ask them, or put them in the field.
--
OlivierRaginel - 25 Jun 2011
Michael. There is a developer desire to have me release a 1.1.4 within the next very few days. Is this one you want to fix the next day or so or do we have to carry it over to 1.1.5?
I agree it is an important bug
--
KennethLavrsen - 26 Jun 2011
I have been promising to help "finish" this task but got distracted on tackling deeper UTF-8 issues on trunk. I don't have time in the next few days to help do any more for 1.1.4.
At minimum we should just apply Michael's fix as-is for 1.1.4. XHR are probably the vast majority of rest script requests anyway. So in that case we should open a new urgent task to cover non-XHR requests for 1.1.5.
But it would be nice if somebody has time to add code and test non-XHR requests for 1.1.4. I just can't do it this week.
--
PaulHarvey - 26 Jun 2011
first up
do not duplicate existing functions
there's already a
Foswiki::I18N::toSiteCharSet
. if it needs some care, fix it.
- Nother version is part of the WysiwygPlugin, called
RESTParameter2SiteCharSet()
... so there already is redundancy. And they are causing trouble as they are applied multiple times to the same string by foswiki. -- MichaelDaum - 27 Jul 2011
secondly, I'd be tempted to set a foswiki context for it, and thus test for that context - that way plugins or whatever also get to know about it.
--
SvenDowideit - 01 Jul 2011
these charset conversion functions desperately need a new home
--
MichaelDaum - 01 Jul 2011
Warning: some plugins implementing REST handlers decode url params to the site's charset on their own. If both - the core and the plugins - do so, things go wrong badly.
See Item10995.
For what it's worth, there is quite an array of plugins that I reviewed that all implement some kind of encoding/decoding similar but not 100% equivalent to the method above, most notably also found in
WysiwygPlugin::Handlers.pm, called
RESTParameters2CharSet()
.
--
MichaelDaum - 23 Jul 2011
Which plugins?
--
CrawfordCurrie - 26 Sep 2011
The above patch is wrong. Basically we can't fix it until all of Foswiki decodes to internal string representation ... nothing we want to do on foswiki-1.1.x as it affects lots of code that needs fixing. So downgrading this bug item to normal not to block 1.1.4 anymore.
--
MichaelDaum - 18 Oct 2011
Concur, but I really want this "done" for 1.2/2.0 (and I will help with the work), so I'm making it urgent again against
ReleaseTarget minor.
--
PaulHarvey - 19 Oct 2011
Item11242 marked as duplicate of this item.
--
MichaelDaum - 09 Nov 2011
Also:
Item11932
--
MichaelDaum - 08 Jun 2012
Latest
NatEditPlugin now has got a REST handler for save+continue which is a thin wrapper around the normal save to perform the parameter transcoding.
WysiwygPlugin basically does the same for its REST handlers.
So this all isn't
really party time.
--
MichaelDaum - 10 Jul 2012
So why not use the one in
WysiwygPlugin?
--
SvenDowideit - 11 Jul 2012
Cus I've got wikis with no
WysiwygPlugin installed. And gawd it would be better to remove it from
NatEditPlugin and WysiwygPlugin and make it all work without jumping thru burning hoops. Right now I am more driven by urgent requests of users not being able to save&continue.
--
MichaelDaum - 11 Jul 2012
See also
ImprovedRESTSupport
--
CrawfordCurrie - 27 Mar 2013
Discussed at
FoswikiCamp2014, and agreed that documenting
save
script as inappropriate for REST (unless the site is Unicode) is appropriate.
--
CrawfordCurrie - 13 Mar 2014