Internationalisation guidelines
This document is targeted at developers (core code, plugin code and user interface developers). If you are looking for instructions on configuring your Foswiki to work with your local language, see InternationalizationSupplement.
In the following, I18N is shorthand for "internationalisation".
The Good News
It's easy to do
- It really is easy to internationalize your plugin or core code, so it works with almost any language, not just English!
- Typically, only a few lines need to change, in very simple ways
- All the hard work has been done for you already.
It will help your plugin, and Foswiki
- Using I18N lets your plugin (and Foswiki) be much more widely used, meaning more feedback, patches and general goodness
- The I18N regexes make your code more flexible, e.g. there's now a single place to change the definition of a WikiWord across all plugins.
- They also work across systems from Windows to Linux and even mainframes, and across Perl versions and browsers
What you need to do
You do need to read this page and be a little careful when using regular expressions: any time you see [A-Z], \w or \b, a little alarm bell should go off in your head saying "I should use one of the I18N-aware regular expressions instead".
Goal of this page
Foswiki has reasonable internationalization (I18N) support today (see UserInterfaceInternationalisation). However, plugin developers and core coders need guidelines to avoid I18N issues in the future, and to ensure their plugins are widely used, not just by people who speak English as their first language.
I18N code has tended to regress over the last few years, partly due to lack of unit tests, but also due to new code being written that doesn't follow these guidelines.
Overview
Internationalization in Foswiki supports use of locales to ensure WikiWords and other page contents work with international characters (e.g. GrødWeb.BlåBærGrød), which means avoiding all use of [A-Z] or \w in regular expressions (except when you really mean A to Z of course, e.g. variable names that only use ASCII alphanumerics).
Fortunately, the changes required to your code are quite simple, and "all the hard work has been done" (to quote SteffenPoulsen!).
UserInterfaceInternationalisation guidelines are now included below. These show you how to make your user interface text ("message strings") internationalised, so that it can be translated as part of TranslationUserInterface efforts.
What if you don't bother?
Unless you are careful about using regular expressions ('regexes') that match alphabetic characters, your plugin or core module probably won't work for users in languages other than English.
Guidelines
Preparing a plugin or core module for I18N
There's one simple thing you absolutely must do in your plugin or core module to allow for I18N. You have to make sure the plugin can "see" the regular expressions set up in Foswiki.pm, and that it uses locale information.
You can do this by adding the following lines to the plugin, somewhere after the package declaration. This is from InterwikiPlugin, which is part of the core and a good example:
BEGIN {
    # Do a dynamic 'use locale' for this module
    if( $Foswiki::cfg{UseLocale} ) {
        require locale;
        import locale();
    }
}
Now everything is set up for using the regular expressions from Foswiki.pm. This means that you can just write $Foswiki::regex{wikiWordRegex} instead of figuring out your own rules for matching a WikiWord across the many sorts of broken I18N locales and different Perl versions. A huge amount of testing and debugging is encapsulated for you, and you are also guaranteed that your regex code will work in future when Foswiki has UnicodeSupport!
Fixing regular expressions that match letters
The main thing to do when internationalizing plugins is to never use [A-Z] except when you really mean A to Z, ASCII only (e.g. in Macro names perhaps). Also, you should never, ever use \w or \b, since these match 'words' based on locales and I18N characters, but don't work in many environments that have broken locales (including all Windows systems!). Instead, read on for how to write simple regexes that do the same thing portably across a wide variety of systems.
Whenever you see these patterns, ask yourself 'should this match accented characters as well as A-Z?' and 'is this really trying to match a page (topic) name or a web name?' (Page names are normally WikiWords, but not always.)
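To make the first question concrete, here is a minimal sketch of what goes wrong with an ASCII-only class. The extended character-class string below is a hypothetical stand-in covering the Latin-1 letters; it is not the value Foswiki actually computes, and the topic name is written with escapes so the example does not depend on the source file encoding.

```perl
use strict;
use warnings;

# "BlåBærGrød", spelled with code point escapes.
my $topic = "Bl\x{E5}B\x{E6}rGr\x{F8}d";

# ASCII-only class: fails at the first accented letter.
my $ascii_ok = ( $topic =~ /^[a-zA-Z0-9]+$/ ) ? 1 : 0;

# Hypothetical stand-in for an I18N-aware character-class string
# (ASCII alphanumerics plus the Latin-1 letters). In real code this
# string would come from Foswiki's predefined regexes instead.
my $mixed_alpha_num = 'a-zA-Z0-9\x{C0}-\x{D6}\x{D8}-\x{F6}\x{F8}-\x{FF}';
my $i18n_ok = ( $topic =~ /^[$mixed_alpha_num]+$/ ) ? 1 : 0;

print "ASCII-only class matches: $ascii_ok\n";
print "Extended class matches:   $i18n_ok\n";
```

The only difference between the two tests is the content of the character class, which is why swapping in a centrally defined class string is usually a one-line change.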
Once you have identified these problem areas, you need to use the regexes carefully crafted in the startup code in Foswiki.pm. These regexes work across Perl 5.005_03, Perl 5.6, and 5.8 or higher, including environments with very broken Perl locales (e.g. Windows), so they are your best option for cross-version and cross-platform I18N support. Code using these regexes will also work with future UnicodeSupport when that's implemented, despite the actual regexes changing dramatically.
Quickly find problematic regexes - to easily check core code or plugins for possible use of regexes that don't take account of I18N, just run the following one-liner under Linux or Cygwin. It will search all *.pm files, including any in subdirectories, for use of [a-z], \w and \b, which are all potential issues unless you really mean A to Z without any international characters:
find . -name '*.pm' | xargs egrep -i '(\[a-z\]|\\w|\\b)' >regex-warnings.txt
Some editors such as VimEditor and EmacsCPerlMode can take this output file and help you easily navigate to the right place to see the code in context.
Types of pre-defined regexes
The startup code in Foswiki.pm pre-defines a number of complete regexes, as well as strings for use in building character classes as part of regexes. Naming is used to distinguish these - examples are from the point of view of the calling code:
- Complete regexes are compiled using qr/.../ and can be used as part of larger regexes, or as is. They are named fooRegex, e.g. $Foswiki::regex{wikiWordRegex}, and are usually 'concept regexes' that match email addresses, WikiWords, etc.
- Strings for use in character classes (i.e. within [....] in a regex) are just strings and must be used only in character classes. They are named foo, i.e. no Regex suffix - for example $Foswiki::regex{mixedAlphaNum}. On a Perl platform with broken I18N locales, this would be the string "a-zA-Z0-9" - note no square brackets!
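The two naming styles can be sketched outside Foswiki as follows. The %regex hash here is a stand-in that uses the ASCII fallback strings mentioned above, and the shape of the WikiWord pattern is an assumption made for illustration rather than a copy of the definition in Foswiki.pm.

```perl
use strict;
use warnings;

# Stand-in for the predefined values on a platform with broken locales
# (assumption: real code reads these from Foswiki, not from a local hash).
my %regex = (
    upperAlpha    => 'A-Z',
    lowerAlphaNum => 'a-z0-9',
    mixedAlphaNum => 'a-zA-Z0-9',
);

# A complete 'concept regex': compiled with qr// and named with a Regex
# suffix. The exact pattern is assumed for this sketch.
$regex{wikiWordRegex} =
  qr/[$regex{upperAlpha}]+[$regex{lowerAlphaNum}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/;

# Character-class string: the caller supplies the square brackets.
my $is_alnum = ( 'Topic42' =~ /^[$regex{mixedAlphaNum}]+$/ ) ? 1 : 0;

# Complete regex: used as is, or embedded in a larger pattern.
my $is_wikiword  = ( 'WebHome' =~ /^$regex{wikiWordRegex}$/ ) ? 1 : 0;
my $not_wikiword = ( 'Webhome' =~ /^$regex{wikiWordRegex}$/ ) ? 1 : 0;

print "alnum=$is_alnum wikiword=$is_wikiword other=$not_wikiword\n";
```

Putting a character-class string into a pattern without the surrounding brackets would silently match the literal text "a-zA-Z0-9", which is why the naming distinction matters.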
Fixing core code regexes
This is similar to the plugin code below, but a bit less verbose as you have direct access to regexes without going through the plugin API. For example, you would change the following:
if( $topic =~ /^\^\([\_\-a-zA-Z0-9\|]+\)\$$/ ) {
Into:
if( $topic =~ /^\^\([\_\-$Foswiki::regex{mixedAlphaNum}\|]+\)\$$/ ) {
That's still not hugely readable, but more complex regexes can be greatly simplified. The following code is much more readable than using a-z etc, as well as working for I18N:
$anchorName =~ s/($Foswiki::regex{wikiWordRegex})/_$1/go;
# Prevent automatic WikiWord or CAPWORD linking in explicit links
$link =~ s/(?<=[\s\(])($Foswiki::regex{wikiWordRegex}|[$Foswiki::regex{upperAlpha}])/<nop>$1/;
Fixing plugin regexes
Plugin code is a bit more verbose than core code as it must first get the regexes via the Plugin API - here's an example adapted from Plugin:InterwikiPlugin:
# Regexes for the Site:page format InterWiki reference
my $mixedAlphaNum = Foswiki::Func::getRegularExpression('mixedAlphaNum');
my $upperAlpha = Foswiki::Func::getRegularExpression('upperAlpha');
$sitePattern = "([$upperAlpha][$mixedAlphaNum]+)";
$pagePattern = "([${mixedAlphaNum}_\/][$mixedAlphaNum" . '\.\/\+\_\,\;\:\!\?\%\#-]+?)';
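As a rough illustration of how these two pattern strings might be used, the sketch below runs outside Foswiki. The get_regular_expression function is a hypothetical stand-in for Foswiki::Func::getRegularExpression that returns ASCII fallback strings, and anchoring the combined pattern at both ends is a choice made for this example only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Foswiki::Func::getRegularExpression(), so the
# sketch is self-contained. ASCII-only fallback values are assumed.
my %fallback = (
    upperAlpha    => 'A-Z',
    mixedAlphaNum => 'a-zA-Z0-9',
);
sub get_regular_expression { return $fallback{ $_[0] } }

my $mixedAlphaNum = get_regular_expression('mixedAlphaNum');
my $upperAlpha    = get_regular_expression('upperAlpha');

# Same construction as in the plugin excerpt above.
my $sitePattern = "([$upperAlpha][$mixedAlphaNum]+)";
my $pagePattern = "([${mixedAlphaNum}_\/][$mixedAlphaNum" . '\.\/\+\_\,\;\:\!\?\%\#-]+?)';

# Compile the combined pattern once; anchored so the whole string must match.
my $interwiki = qr/^${sitePattern}:${pagePattern}$/;

my ( $site, $page ) = 'Wikipedia:Perl_language' =~ $interwiki;
print "site=$site page=$page\n";
```

Because the letter classes come from one place, the same code would accept accented site and page names once the stand-in is replaced by the real plugin API call on a system with working locales.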
Regex efficiency
You may also want to use /o on your regex, or compile it using $fooPatternRegex = qr/$someRegexVar/, which should give better performance if used more than once, e.g. in loops or when running under mod_perl. See perldoc perlop and perldoc perlre for details, and don't use this if the regex (not the substitution right-hand-side) includes 'real' variables that vary between invocations of your code, e.g. user name.
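A minimal sketch of the qr// approach, assuming a pattern string that does not change between invocations. The pattern text and the candidate list are hypothetical, and the ASCII-only classes are used purely to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical pattern string that stays the same for the life of the process.
my $someRegexVar = '[A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*';

# Compile once, outside the loop.
my $fooPatternRegex = qr/^(?:$someRegexVar)$/;

# Reuse the compiled regex for every candidate.
my @candidates = qw( WebHome sandbox WikiWord TOC );
my @matches    = grep { $_ =~ $fooPatternRegex } @candidates;

print "@matches\n";
```

Unlike /o, a qr// object makes the point of compilation explicit, so it is easier to see that a value such as a user name must not be interpolated into it.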
International message strings
As Foswiki supports user interface internationalization, you should now avoid putting English language strings directly into Perl code. In addition, you should follow the main InternationalisationGuidelines to ensure that regular expressions and other code work well across multiple languages and locales (i.e. countries or regions).
The Foswiki::I18N class encapsulates message text internationalization, and the i18n field of the Foswiki session object is an instance of this class. Thus, wherever you might need to write an English string inside Perl code, you must wrap it in a call to the Foswiki::I18N::maketext method, like this:
# $session is an instance of Foswiki class
my $msg = $session->i18n->maketext("Access denied: you don't have access for editing this topic.");
You can also interpolate parameters into the text, and let the translator correctly translate messages, keeping a place for your parameters. Just write placeholders numbered according to the parameter order in the maketext call: [_1] for the first parameter, [_2] for the second, and so on. Then you can do things like this:
# $session is an instance of Foswiki class
my $msg = $session->i18n->maketext("This is topic [_1] on the web [_2].", $topic, $web);
Note that translators can change the order in which parameters appear in translated text (e.g. [_2] appearing before [_1]), but they must keep the text's semantics, so that substituting the first parameter into [_1] and the second parameter into [_2] says the same thing as the original, whatever order they are in.
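The reordering can be sketched with Perl's core Locale::Maketext module, which uses the same [_1] bracket notation. This is an illustration only: the My::L10N classes and the German lexicon entry are hypothetical, and the sketch does not go through Foswiki::I18N or a Foswiki session.

```perl
use strict;
use warnings;

# Hypothetical localisation classes, defined inline for the sketch.
package My::L10N;
use Locale::Maketext ();
our @ISA = ('Locale::Maketext');

package My::L10N::en;
our @ISA     = ('My::L10N');
our %Lexicon = ( _AUTO => 1 );    # English keys translate to themselves

package My::L10N::de;
our @ISA     = ('My::L10N');
our %Lexicon = (
    # The translation mentions [_2] before [_1].
    'This is topic [_1] on the web [_2].' =>
      'Im Web [_2] ist dies das Thema [_1].',
);

package main;

my $key = 'This is topic [_1] on the web [_2].';

# The calling code passes the parameters in the same order for every language.
my $en = My::L10N->get_handle('en')->maketext( $key, 'WebHome', 'Main' );
my $de = My::L10N->get_handle('de')->maketext( $key, 'WebHome', 'Main' );

print "$en\n$de\n";
```

The calling code never changes: only the lexicon entry decides where each parameter lands, which is why the numbered placeholders must keep their meaning in every translation.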
See UserInterfaceInternationalisation for guidelines for writing message strings that can be translated.
Testing your fixes
Don't forget to test your code across a number of different I18N areas:
- Page (WikiWord) and web names with I18N characters
- Page contents with I18N characters - usually not a problem
- Attachments with I18N characters in filename or the topic/web that contains attachment
- Searching for I18N characters - especially if external programs used
- Sorting to include I18N characters - whether using internal Perl code or external programs
If there is a valid locale that works within Perl, most things should 'just work' once you have fixed the regexes. However, on Windows and other platforms where locales are broken in Perl terms, you will only be able to do I18N for page contents and page/web names.
Example of code that needs fixing
Taking the original UpdateInfoPlugin as an example (now fixed...): this uses \w to match a WikiWord (which is actually incorrect anyway, as it will match non-WikiWords!), when it should use the relevant WikiWord regex via the plugin API. You frequently find that you end up fixing other bugs when adding I18N support, because it forces you to look closely at the regexes.
Discussion
Any comments on how the I18N documentation is written or could be improved