Item8629: refactor KinoSearchContrib to use StringifierContrib
Priority: Urgent
Current State: Closed
Released In: n/a
Target Release: n/a
Or, consider deprecating KinoSearchContrib when the SolrPlugin is released. We should be able to keep KinoSearchPlugin (but plugged into the Solr backend) so that existing topic contents continue to work. If we're really ambitious, we can ship kinosearch script shims so that KinoSearch upgraders wouldn't even have to change their system scripts.
--
WillNorris - 26 Feb 2010
I would much rather keep the variety. There may be other reasons why Solr isn't suitable for some users.
--
SvenDowideit - 27 Feb 2010
but plugged into the Solr backend ... does not compute.
Basically, kino as well as good old plucene are embedded frontends to lucene. Solr, on the other hand, is a "serverization" of lucene ... and a lot more. Feature-wise, it is a true superset of kino or plucene.
Main disadvantage: lots of Perl dependencies.
StringifierContrib is meant to be a shared base for both, as they both need serialization of binary data. This is comparable with Tika, which I tried as well. Tika does have some advantages over StringifierContrib, for example it can also return metadata in an XML format, like the EXIF data of a photo. However, I abandoned Tika because basic tests on various office files showed that StringifierContrib had higher coverage while being more robust against weird office files.
StringifierContrib has been externalized and improved over time. For one, it does not crash on password-protected xls files etc.
Coming soon: caching, i.e. don't re-stringify the same binary file unless it has changed in the meantime. Caching is currently part of SolrPlugin for no good reason.
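To make the caching idea concrete, here is a minimal sketch in Python (not the Perl used by StringifierContrib or SolrPlugin). It assumes the cache is keyed by a SHA-256 digest of the file contents. The names `StringifyCache`, `get_text` and the `extractor` callable are hypothetical and only illustrate the "skip unchanged files" logic.

```python
import hashlib


class StringifyCache:
    """Remember extracted text per file and reuse it while the bytes are unchanged.

    Hypothetical illustration only; not the actual StringifierContrib API.
    """

    def __init__(self, extractor):
        self.extractor = extractor  # assumed callable: path -> extracted text
        self.entries = {}           # path -> (content digest, extracted text)

    @staticmethod
    def _digest(path):
        # Hash the file in chunks so large attachments are not loaded at once.
        h = hashlib.sha256()
        with open(path, "rb") as fh:
            for chunk in iter(lambda: fh.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    def get_text(self, path):
        digest = self._digest(path)
        cached = self.entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]            # unchanged: reuse the earlier result
        text = self.extractor(path)     # new or changed: stringify again
        self.entries[path] = (digest, text)
        return text
```

Hashing still reads the whole file. A cheaper variant could compare modification time and size instead, at the cost of missing changes that preserve both.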
There are still some more fundamental issues with StringifierContrib inherited from its origins. Foremost, it recodes all strings to iso-8859-15, which might very well clash with the rest of the site's encoding. Next, its per-line way of trying to detect a document's encoding might be overkill. So there's room for speed improvements here.
Other than that, it is a solid piece of work.
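The two encoding points above (hard-coded recoding to iso-8859-15 and per-line detection) can be illustrated with a small Python sketch. It is not how StringifierContrib works internally. It assumes the document is either UTF-8 or iso-8859-15, decodes the whole byte string once instead of line by line, and recodes to a configurable site charset. The function names and the `site_charset` parameter are hypothetical.

```python
def decode_once(raw, candidates=("utf-8", "iso-8859-15")):
    """Decode the whole byte string with the first candidate charset that fits.

    Returns (text, charset_name). One attempt per candidate for the whole
    document, rather than one detection pass per line.
    """
    for name in candidates:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # No candidate decoded cleanly: fall back to the last one, replacing bad bytes.
    return raw.decode(candidates[-1], errors="replace"), candidates[-1]


def recode_for_site(raw, site_charset="utf-8"):
    """Recode extracted bytes to the site's charset instead of a fixed one."""
    text, detected = decode_once(raw)
    return text.encode(site_charset, errors="replace"), detected
```

Trying UTF-8 first works because text in a single-byte charset containing non-ASCII characters is rarely valid UTF-8, so a failed strict decode is a reasonable signal to try the next candidate.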
--
MichaelDaum - 27 Feb 2010
Done.
--
AndrewJones - 06 Jun 2010
There may be something wrong with this work?
<alice|wl_> any help with kinosearch here?
<alice|wl_> accoring to ks_test I miss a Foswiki/Contrib/KinoSearchContrib/StringifyBase.pm
<alice|wl_> it is not in the package
--
SvenDowideit - 12 Jul 2010
I have removed the ks_test script from the contrib, as the functionality is covered by the stringify script in StringifierContrib.
--
AndrewJones - 14 Jul 2010
Forgot to remove config items.
--
AndrewJones - 25 Mar 2011