Item9289: Backlink search is A-Z centric and cannot handle I18N or special characters
Priority: Normal
Current State: Confirmed
Released In: n/a
Target Release: minor
Applies To: Engine
Component: I18N, Backlinks
Branches:
Our backlinks feature is flawed.
There are several problems:
- If you have a topic called "Hello++Hello" the search code barfs because of the + signs in a regex.
- If the topic name you search for is the start of a longer I18N word, e.g. SuperDuper inside SuperDuperØsters, the search reports that word as a backlink to SuperDuper because it does not see the Ø as a letter. The same happens with accented characters.
There are two problems.
First, the regex search cannot handle regex metacharacters. We need to escape the regex characters of %TOPIC. Maybe we need a BACKLINKS macro, defined only in template land, where we can escape the regex characters in TOPIC?
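To illustrate the escaping step, here is a minimal sketch in Python rather than in the engine's own code. The helper names find_literal and find_unescaped are hypothetical and exist only for this example. Depending on the regex engine and version, an unescaped "++" is either rejected as a syntax error or read as a quantifier, so in neither case does it match the literal topic name.

```python
import re

def find_literal(topic, text):
    """Search for the topic name as a literal string.

    re.escape() backslash-escapes regex metacharacters such as '+',
    so they are matched literally instead of being read as operators.
    """
    return re.search(re.escape(topic), text) is not None

def find_unescaped(topic, text):
    """Search with the raw topic name used directly as a pattern.

    Shown only for contrast: the engine may reject the pattern or
    interpret '++' as a quantifier rather than two literal plus signs.
    """
    try:
        return re.search(topic, text) is not None
    except re.error:
        return False  # pattern rejected by the regex engine

text = "This page links to Hello++Hello somewhere."
print(find_literal("Hello++Hello", text))
print(find_unescaped("Hello++Hello", text))
```

The same idea applies wherever the topic name is interpolated into a search pattern: escape first, then embed.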
Second, the regexes in the backlinks templates contain [A-Za-z0-9].
We cannot use A-Z in ANY regex in our user interface. Most countries in this world have characters other than A-Z.
Even if we cannot get it perfect, we can at least make it near perfect. We probably have to use a character class, maybe a negative one that lists all the characters we do not see as part of a topic name. That depends on the name filter, and I am not sure we can get at it. But at least we can use the default. Not many admins will ever change the name filter anyway.
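The difference between an A-Z boundary and a Unicode-aware one can be sketched as follows. This is an illustration in Python, not the template code: the function names are hypothetical, and using \w as the definition of "part of a topic name" is an assumption made for the example (it also treats the underscore as a word character), not a statement about the name filter.

```python
import re

def backlink_ascii(topic, text):
    """Boundary test built on [A-Za-z0-9], as in the current templates."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(topic) + r"(?![A-Za-z0-9])"
    return re.search(pattern, text) is not None

def backlink_unicode(topic, text):
    """Boundary test built on \\w, which is Unicode-aware for str patterns
    in Python 3, so letters such as Ø or é count as part of a word."""
    pattern = r"(?<!\w)" + re.escape(topic) + r"(?!\w)"
    return re.search(pattern, text) is not None

other_topic = "See SuperDuperØsters for details."
real_link = "See SuperDuper for details."

# The ASCII class does not see Ø as a letter, so it reports a false backlink.
print(backlink_ascii("SuperDuper", other_topic))
# The Unicode-aware boundary treats Ø as part of the longer word.
print(backlink_unicode("SuperDuper", other_topic))
# A genuine reference is still found.
print(backlink_unicode("SuperDuper", real_link))
```

Whether the search backends used for backlinks accept an equivalent construct is a separate question that this sketch does not answer.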
--
KennethLavrsen - 09 Jul 2010
Defer to 1.1.5
--
GeorgeClark - 13 Dec 2011
Defer to 2.0. UTF8 core rework is planned for a major release.
--
GeorgeClark - 18 Feb 2012
The A-Z referred to is in templates. Not sure how to put Unicode character ranges into searches, as the search engines are not required to support any specific classes.
The templates also don't provide any way to quotemeta.
--
Main.CrawfordCurrie - 24 Jun 2015 - 16:17