This is the topic to discuss development of
StringifierContrib
If you need support, go to
Support.StringifierContrib where you can ask questions and find answers to previously asked questions.
If you want to report a bug or request a feature, go to
Tasks.StringifierContrib where you can see already submitted issues and where you can submit a new bug report or feature request.
Active Items
Discussion
I use Apache Tika to stringify - it's one actively maintained package that converts many file formats.
Currently it's on a TWiki site, not a Foswiki site, but I wanted to let folks know it's an option,
and I'm happy to provide more detail if there is interest.
--
ClifKussmaul - 21 Dec 2011
Clif, yes please - it makes sense for us to support different options to cater for setups
--
SvenDowideit - 22 Dec 2011
Steps:
- Download tika.jar from tika.apache.org, and save it (e.g. in /usr/local/tika).
- Download tika2txt (attached below), adjust the paths & Java options, save it (e.g. in /usr/local/bin), and test.
- Download Tika.pm (attached below), adjust the paths and file types handled, and save it with other Stringifiers.
- Disable other stringifiers to avoid conflicts, and test.
--
ClifKussmaul - 22 Dec 2011
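For readers without access to the attachments, here is a rough Python sketch of what a wrapper along the lines of the steps above might do. It is not the attached tika2txt script. The jar path follows the example location from the steps, the Java heap option is a hypothetical value, and the function names `build_command` and `tika_to_text` are made up for illustration. It assumes the jar is the runnable Tika application jar, which accepts `--text` to print plain text to standard output.

```python
import subprocess
from pathlib import Path

# Assumed locations and options; adjust to your own setup.
TIKA_JAR = "/usr/local/tika/tika.jar"   # example path from the steps above
JAVA_OPTS = ["-Xmx256m"]                # hypothetical Java heap setting


def build_command(path, jar=TIKA_JAR, java_opts=JAVA_OPTS):
    """Return the argument list that asks Tika for plain-text output."""
    return ["java", *java_opts, "-jar", jar, "--text", str(path)]


def tika_to_text(path, timeout=60):
    """Run Tika on one file; return the extracted text, or '' on any failure."""
    if not Path(path).is_file():
        return ""
    try:
        result = subprocess.run(
            build_command(path),
            capture_output=True,   # collect stdout and stderr
            text=True,             # decode output to str
            errors="replace",      # do not fail on undecodable bytes
            timeout=timeout,       # give up on files that hang the converter
            check=True,            # non-zero exit status raises an error
        )
    except (subprocess.SubprocessError, OSError):
        # Covers timeouts, non-zero exits, and a missing java binary.
        return ""
    return result.stdout
```

Returning an empty string on failure is one possible choice: it lets an indexer skip a problem file instead of aborting the whole run. Whether that suits a given site is up to the admin.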
I've been evaluating Tika in depth and found that it has lower coverage of various office formats than
StringifierContrib.
--
MichaelDaum - 23 Dec 2011
It might have lower coverage, but it's useful to have alternatives that might give different results or suit the admin better.
--
SvenDowideit - 23 Dec 2011
On a client site with pdf, doc/docx, ppt/pptx, xls/xlsx files, results for the largest files of each type seemed similar, and sometimes favored Tika. We had some problems with PPTX in Tika 0.9 that were fixed in Tika 1.0. YMMV, of course.
--
ClifKussmaul - 23 Dec 2011