This question about : Answered
Solr search doesn't show contents if the attachment is .doc or .pdf
I'm not quite sure whether this is a question for Foswiki or for the Solr search engine.
We use the Solr search engine with Foswiki. In the search results, if the attachment is a .doc or .pdf file, its contents are shown as unrecognised characters like below, but if the attachment is .txt or .xlsx, it's fine. Any advice is appreciated.
?? ?? ? ?? ? ????????????????????????????????????????????????????????????????????????????????????????????????????????????????¿? ????? ????? ? ? ? ·? ?? ? ? ? ?? ??? ? ? ????? ????? ? ? ? ????????? ? ? ? ? ???????? ? ? ? ? ? ? ? ? ? ? ????? ? ? ??? ? ? ? ? ?
P.S. We don't use the NatSkin plugin.
--
YangShen - 30 Nov 2014
This happens when the helpers that you configured in StringifierContrib are unable to read the files correctly. Best results are achieved using
$Foswiki::cfg{StringifierContrib}{WordIndexer} = 'soffice';
$Foswiki::cfg{StringifierContrib}{PowerpointIndexer} = 'soffice';
or
$Foswiki::cfg{StringifierContrib}{WordIndexer} = 'wv';
Password-protected office files may fail as well. They are only readable with user interaction, so they cannot be full-text indexed.
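To see which of these helper programs are actually installed on the server, a small shell function along the following lines can help. This is only a sketch: the name `check_helpers` is made up for illustration, and the binary names in the example call are assumptions (the wv package, for instance, usually ships its converter as `wvText`), so adjust them to whatever your StringifierContrib settings point at.

```shell
#!/bin/sh
# check_helpers (hypothetical name): report which of the given commands
# can be found on PATH. Returns 0 if all were found, 1 if any is missing.
check_helpers() {
    missing=0
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found:   $cmd"
        else
            echo "missing: $cmd"
            missing=1
        fi
    done
    return "$missing"
}

# Example call; the binary names are assumptions, adjust as needed:
# check_helpers soffice antiword wvText pdftotext
```

Run it as the same user that runs the indexer, since that user's PATH is the one that matters.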
Once you have changed these settings, go to
<foswiki-dir>/working/work_areas/SolrPlugin/
and delete all subdirectories in there.
These are the cached stringified versions of your office documents. Once you've deleted them your next indexing run will extract
the office docs using the newer stringifier settings.
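Clearing the cache can be sketched as a shell function like the one below. The function name `clear_stringifier_cache` and the path in the example call are placeholders, not part of Foswiki; the `-mindepth` and `-maxdepth` options assume GNU or BSD `find`. Double-check the path before running it, since it removes directories recursively.

```shell
#!/bin/sh
# clear_stringifier_cache (hypothetical name): delete every subdirectory of
# the given work area, leaving the work area itself and any plain files
# directly inside it in place.
clear_stringifier_cache() {
    work_area=$1
    if [ ! -d "$work_area" ]; then
        echo "no such directory: $work_area" >&2
        return 1
    fi
    # -mindepth 1 / -maxdepth 1 restrict the match to direct children.
    find "$work_area" -mindepth 1 -maxdepth 1 -type d -exec rm -rf {} +
}

# Example call; /path/to/foswiki stands in for your <foswiki-dir>:
# clear_stringifier_cache /path/to/foswiki/working/work_areas/SolrPlugin
```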
There is a testing tool available as well: see
<foswiki-dir>/tools/stringify <file-name>
which you can use to run the stringifier from the command line.
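If you want to test many files, a rough check on the tool's output can flag the ones that come out as mostly question marks. The sketch below is a heuristic of my own, not something shipped with Foswiki: the name `looks_garbled` and the 50% threshold are arbitrary, it counts bytes rather than characters, and it assumes the stringify tool prints the extracted text to standard output.

```shell
#!/bin/sh
# looks_garbled (hypothetical name): read text on stdin and succeed (exit 0)
# if more than half of the non-whitespace bytes are '?'.
# Empty input counts as "not garbled" and returns 1.
looks_garbled() {
    text=$(cat)
    # Wrapping wc in $(( )) strips the padding some wc versions add.
    total=$(( $(printf '%s' "$text" | tr -d '[:space:]' | wc -c) ))
    qmarks=$(( $(printf '%s' "$text" | tr -cd '?' | wc -c) ))
    if [ "$total" -gt 0 ] && [ $((qmarks * 2)) -gt "$total" ]; then
        return 0
    fi
    return 1
}

# Example call; path and file name are placeholders:
# /path/to/foswiki/tools/stringify some.doc | looks_garbled && echo "output looks garbled"
```

A file that yields no text at all is not flagged by this check, so empty output is worth looking at separately.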
--
MichaelDaum - 01 Dec 2014
'wv' was not working when I tried it; 'antiword' looks like the correct one. All the others are good. Thanks a lot.
--
YangShen - 14 Jan 2015