Priority: Urgent
Current State: Waiting for Release
Released In: 2.2.0
Target Release: minor
I just discovered in
Foswiki/Serialise/Embedded.pm
:
$args =~ s/\s*([^=]+)="([^"]*)"/
$res{$1} = Foswiki::Meta::dataDecode( $2 ), ''/ge;
That regex allows spaces in parameter names in meta-data. So I might have:
%META:FOO{bar="ok bar ="snafu"}%
When this is parsed, the Meta object will end up with FOO having two parameters,
'bar' => 'ok'
and
'bar ' => 'snafu'
This is generally not a problem because meta-data tends to be generated by someone e.g. plugins. But where a footpad has hacked a topic, they might have introduced an errant space (indeed, this is how I encountered this).
I think this is just a bad regex (it's probably been there since the early days of the empire). However fixing it might cause an issue as a parameter that has up to now been ignored suddenly jumps into life or gets a new value.
Either the code needs to change to trim the parameter name, or this "feature" needs to be clearly documented.
(Actually it allows arbitrary chars. So
%META:WORKFLOW{bar="ok" *&^ ^%£ $% $="oh dear"}%
is also fine. The parameter name
*&^ ^%£ $% $
is quite likely to break something)
MetaData says:
- Format is the same as for any other macros except that each meta-data macro must be on a line on its own.
-
%META:<type>{key1="value1" key2="value2" ...}%
- The characters
%"\r\n{}
are encoded in argument values, using the standard URL encoding.
- Meta-data is divided into core meta-data, described below, and extension meta-data, which shares the same syntax but is used by extensions.
- Dates are stored as "epoch times" i.e. the integer number of seconds since 1st January 1970.
Which is clearly wrong, as the macro parser is nowhere near as liberal. I favour tightening down the syntax.
--
CrawfordCurrie - 10 Apr 2017
(13:24:56) gac410: This only impacts META: and not macro { } syntax, right?
(13:26:17) jast: makes sense to me, too, FWIW
(13:26:24) cdot: right
(13:26:55) MichaelDaum: just a bad regex ... depends on how much worse you are going to make it ;)
(13:27:22) cdot: MichaelDaum: I was just going to use \S. Since the core is unicode now, that should be OK.
(13:29:05) MichaelDaum: you mean [^\s"] ?
(13:29:47) MichaelDaum: what does Foswiki::Attr look like?
(13:30:20) cdot: Foswiki::Attrs uses [#a-z0-9_]+
(13:31:48) MichaelDaum: gac410, not sure. Let's fix it.
(13:32:20) MichaelDaum: cdot, maybe we should try to bring both parsers F:A and F:S:E in line ... a bit
(13:32:51) MichaelDaum: stuff like %META:FOO{$="bar"}% is a bit off imho
(13:33:06) cdot: maybe. The Attrs parser is a lot more functional than the META parser, which just needs to be fast.
(13:33:20) cdot: for example, Attrs has "plain text"
(13:33:30) cdot: and other "features".
(13:33:47) gac410: For F:A I added # as a valid charcter to support SCRIPTURL{ #=" " } for anchors
(13:33:50) MichaelDaum: +1 on faster. de-serializing meta is _the_ no.1 performance killer atm.... as the meta cache that we have isnt being used most of the time
(13:34:31) cdot: y. The app I found this problem with is the SQL cache (DBIStoreContrib). You can't have '#' in a column name in SQL :-(
(13:34:50) cdot: so I think tightening even further to exclude the # is good business.
(13:34:50) MichaelDaum: ah ok
(13:34:57) MichaelDaum: y
(13:35:22) gac410: cdot... Remove # ... From the Foswiki::Attr and macros. That breaks all the Canonical URL changes
(13:35:44) gac410: no, just disallow # in META
(13:35:51) gac410: oh... okay phew...
(13:40:10) cdot: OK, changing the META parser to: $args =~ s/\s*([a-z][a-z0-9_]*)="([^"]*)"/.../gi
(13:40:44) MichaelDaum: that might be too latin1 centric
(13:41:10) MichaelDaum: and break people using MetaDataPlugin with unicode attribute names
(13:41:56) MichaelDaum: in MetaDataPlugin DataForm field names become meta data key names...
(13:43:01) cdot: %META:汉语/漢語{中文="OMG"}%
(13:43:06) gac410: Y. We support unicode fieldnames now ... I think.
(13:43:37) cdot: if it were user facing I might worry about it. But this is hidden, system level stuff.
(13:44:03) MichaelDaum: cdot, lets keep any [a-z][A-Z] type of regexes out of the code, please
(13:44:43) cdot: all right, we can compromise on \w+ then
Fixed by limiting parameter names to perl \w (unicode word character)
--
Main.CrawfordCurrie - 10 Apr 2017 - 15:30
This branch has
not been merged to any release branch nor scheduled for any release. Please do so and specify the next appropriate release this is going to be included. Thanks.
--
MichaelDaum - 01 Oct 2018
Actually this seems to be merged into master, which makes it part of Foswiki 2.2. So it could either be "Waiting for Release" in "2.2" target minor, or as a single commit, it could be cherry picked into Relase02x01 and set waiting for 2.1.7
--
GeorgeClark - 01 Oct 2018