This question about Developing extensions (plugins, skins, etc.): Asked
Solr WordDelimiterFilterFactory splitOnNumerics?
We are using Solr 5.5.5 and SolrPlugin 7.30 on the latest Foswiki.
It seems that the Solr WordDelimiterFilterFactory default setting splitOnNumerics=1 leads to poor search results, at least in our context.
For example, when searching for acc09 (a well-known abbreviation in our context), the top-ranked results are .xlsx attachments whose filenames contain acc and 09 separately somewhere, while the documents themselves do not contain acc09.
On the other hand, when searching for the quoted phrase "acc09", or when restricting the results to topics with the facet, we get many topics that contain acc09 exactly (the expected result).
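The difference between the unquoted and the quoted search can be illustrated with a deliberately simplified model of the analysis chain. This is a sketch, not Lucene's implementation: the helper names (`tokenize`, `all_terms_match`, `phrase_match`) and the sample filename `acc_budget_09.xlsx` are hypothetical, lowercasing stands in for a separate LowerCaseFilterFactory, and options such as catenateWords, preserveOriginal and splitOnCaseChange are ignored. It also assumes an unquoted query behaves like "every part must occur somewhere", which in real Solr depends on the query parser settings.

```python
import re


def tokenize(text: str, split_on_numerics: bool = True) -> list[str]:
    """Simplified stand-in for tokenizer + WordDelimiterFilter + LowerCaseFilter.

    Hypothetical helper, not Lucene code: splits on any non-alphanumeric
    character and, if split_on_numerics is True, also at letter/digit
    boundaries ("acc09" -> "acc", "09").
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", text) if p]
    if split_on_numerics:
        parts = [sub for p in parts for sub in re.findall(r"[A-Za-z]+|[0-9]+", p)]
    return [p.lower() for p in parts]


def all_terms_match(query: str, text: str, split_on_numerics: bool = True) -> bool:
    """Unquoted query modelled as: every query term occurs somewhere in the text."""
    doc_terms = set(tokenize(text, split_on_numerics))
    return all(t in doc_terms for t in tokenize(query, split_on_numerics))


def phrase_match(query: str, text: str, split_on_numerics: bool = True) -> bool:
    """Quoted query modelled as: the query terms occur adjacent and in order."""
    q = tokenize(query, split_on_numerics)
    d = tokenize(text, split_on_numerics)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    filename = "acc_budget_09.xlsx"  # hypothetical attachment name
    topic = "Summary of the acc09 figures"  # hypothetical topic text
    for split in (True, False):
        print(f"splitOnNumerics={int(split)}")
        print("  tokens of acc09:     ", tokenize("acc09", split))
        print("  unquoted vs filename:", all_terms_match("acc09", filename, split))
        print("  quoted vs filename:  ", phrase_match("acc09", filename, split))
        print("  unquoted vs topic:   ", all_terms_match("acc09", topic, split))
```

Under this model, with splitting enabled the unquoted query matches the attachment name because acc and 09 both occur in it, while the quoted form requires the two parts to be adjacent, which would be consistent with the behaviour described above. With splitting disabled, acc09 stays a single term and only exact occurrences match.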
Two questions:
- Why are topics containing several exact matches of the term not ranked at the top?
- Would it not be more intuitive to have splitOnNumerics=0 as the default?
http://lucene.apache.org/core/5_5_5/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilterFactory.html
I think I will adjust foswiki_configs/conf/schema.xml and set splitOnNumerics=0.
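If you prefer to script the change rather than edit the file by hand, a minimal sketch could look like the following. The helper `set_split_on_numerics` is hypothetical; it assumes the filters are declared as `<filter class="solr.WordDelimiterFilterFactory" .../>` elements, and it uses Python's standard ElementTree, whose default parser drops XML comments and does not keep the original formatting, so keep a backup of schema.xml. Because the index-time analyzer changes, the core has to be reloaded and the existing content reindexed before documents are tokenized the new way.

```python
import xml.etree.ElementTree as ET


def set_split_on_numerics(schema_xml: str, value: str = "0") -> tuple[str, int]:
    """Set splitOnNumerics on every WordDelimiterFilterFactory <filter> element.

    Hypothetical helper: returns the rewritten XML text and the number of
    filters changed. ElementTree's default parser drops XML comments and
    does not preserve the original formatting.
    """
    root = ET.fromstring(schema_xml)
    changed = 0
    for flt in root.iter("filter"):
        if flt.get("class", "").endswith("WordDelimiterFilterFactory"):
            flt.set("splitOnNumerics", value)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    # Hypothetical, heavily trimmed schema fragment for demonstration only.
    sample = (
        '<schema name="example" version="1.5">'
        '<fieldType name="text" class="solr.TextField">'
        '<analyzer>'
        '<tokenizer class="solr.WhitespaceTokenizerFactory"/>'
        '<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnNumerics="1"/>'
        '<filter class="solr.LowerCaseFilterFactory"/>'
        '</analyzer>'
        '</fieldType>'
        '</schema>'
    )
    new_xml, n = set_split_on_numerics(sample)
    print(n, "filter(s) updated")
    print(new_xml)
```

The helper deliberately changes every WordDelimiterFilterFactory it finds, including those in separate index and query analyzers, so that both sides keep tokenizing acc09 the same way.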
--
UlrichLeodolter - 04 Dec 2019
Could you report back whether these settings give better results? If so, I'll integrate them into the next release. Thanks.
--
MichaelDaum - 04 Dec 2019