Item9900: fall back to HTML::TreeBuilder if html2text not available
Priority: Normal
Current State: No Action Required
Released In: n/a
Target Release: n/a
--
WillNorris - 26 Oct 2010
Note, I deliberately switched from HTML::TreeBuilder to html2text because the former has significant encoding problems. Their output isn't equivalent either, which results in significant indexing differences. As the HTML converter is reused by other converters as well, the impact of changing this workhorse is quite significant.
--
MichaelDaum - 27 Oct 2010
Is there another pure-Perl module that could be used as a fallback?
--
WillNorris - 27 Oct 2010
What is the state of play with this?
I have a stringify problem in that there is no html2text when it tries to index a PPT file I have uploaded.
This is the full output from the Cron log when Kinoupdate ran:
Uncaught exception from user code:
exec of html2text -ascii %FILENAME|F% failed: No such file or directory at /home2/mydomain/public_html/foswiki/lib/Foswiki/Sandbox.pm line 542.
at /usr/lib/perl5/site_perl/5.8.8/Error.pm line 184
Error::throw('Error::Simple', 'exec of html2text -ascii %FILENAME|F% failed: No such file or...') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Sandbox.pm line 542
Foswiki::Sandbox::sysCommand('Foswiki::Sandbox', 'html2text -ascii %FILENAME|F%', 'FILENAME', '/tmp/eOi3Kxb968') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/StringifierContrib/Plugins/HTML.pm line 31
Foswiki::Contrib::StringifierContrib::Plugins::HTML::stringForFile('Foswiki::Contrib::StringifierContrib::Plugins::HTML=HASH(0x2b...', '/tmp/eOi3Kxb968') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/StringifierContrib.pm line 43
Foswiki::Contrib::StringifierContrib::stringFor('Foswiki::Contrib::StringifierContrib', '/tmp/eOi3Kxb968') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/StringifierContrib/Plugins/PPT.pm line 46
Foswiki::Contrib::StringifierContrib::Plugins::PPT::stringForFile('Foswiki::Contrib::StringifierContrib::Plugins::PPT=HASH(0x2b0...', '/home2/mydomain/public_html/foswiki/pub/MyCompany/ZincRedox/Zi...') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/StringifierContrib.pm line 43
Foswiki::Contrib::StringifierContrib::stringFor('Foswiki::Contrib::StringifierContrib', '/home2/mydomain/public_html/foswiki/pub/MyCompany/ZincRedox/Zi...') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/KinoSearchContrib/Index.pm line 568
Foswiki::Contrib::KinoSearchContrib::Index::indexAttachment('Foswiki::Contrib::KinoSearchContrib::Index=HASH(0x2a1c990)', 'KinoSearch::InvIndexer=HASH(0x2afdfa0)', 'MyCompany', 'ZincRedox', 'HASH(0x170f430)') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/KinoSearchContrib/Index.pm line 535
Foswiki::Contrib::KinoSearchContrib::Index::indexTopic('Foswiki::Contrib::KinoSearchContrib::Index=HASH(0x2a1c990)', 'KinoSearch::InvIndexer=HASH(0x2afdfa0)', 'MyCompany', 'ZincRedox', 'WIKIWEBMASTER', 1, 'TopicSummary', 1, 'INCLUDEWARNING', ...) called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/KinoSearchContrib/Index.pm line 336
Foswiki::Contrib::KinoSearchContrib::Index::addTopics('Foswiki::Contrib::KinoSearchContrib::Index=HASH(0x2a1c990)', 'MyCompany', 'ZincRedox') called at /home2/mydomain/public_html/foswiki/lib/Foswiki/Contrib/KinoSearchContrib/Index.pm line 116
Foswiki::Contrib::KinoSearchContrib::Index::updateIndex('Foswiki::Contrib::KinoSearchContrib::Index=HASH(0x2a1c990)', '') called at ./kinoupdate line 38
The error left a load of files in the Kinosearch directory, so I cleared them out and ran Kinoupdate. That failed too. So now my index is missing.
--
BobCorless - 09 Nov 2010
I installed this Perl module
http://search.cpan.org/~kryde/HTML-FormatExternal-18/
It appears to include Html2Text.
However, I get the same error in Stringify at line 542.
--
BobCorless - 10 Nov 2010
Since installing HTML::FormatExternal, the cron job has been running without errors. However, the PPT is not getting indexed. Running stringify against the PPT from a command line results in the error "exec of html2text -ascii %FILENAME|F% failed: No such file or directory at /home2/mydomain/public_html/foswiki/lib/Foswiki/Sandbox.pm line 542"
--
BobCorless - 11 Nov 2010
Bob, please install html2text. This is a new dependency. The old way of converting HTML to text was flawed with respect to UTF-8 encoding. That's why I changed the backend. It should be fine afterwards.
--
MichaelDaum - 11 Nov 2010
html2text is included in HTML::FormatExternal, which I have already installed. If I uninstall HTML::FormatExternal and specifically install just HTML::FormatText::Html2text (18), it installs HTML::FormatExternal again.
I still get the error described above when running stringify from a prompt. Kinoupdate runs without errors (command line or rest) but does not index the ppt.
--
BobCorless - 11 Nov 2010
Please install the correct html2text. This one is NOT a Perl-based one. See
http://www.mbayer.de/html2text/
--
MichaelDaum - 12 Nov 2010
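For anyone hitting the same "No such file or directory" error: a quick way to confirm whether a standalone html2text executable is visible on the PATH of the account that runs the indexer is sketched below. This is only an illustration in Python and is not part of StringifierContrib; the function names find_executable and describe are made up for the example.

```python
import shutil


def find_executable(name="html2text"):
    """Return the full path of `name` if it is on the PATH, else None."""
    return shutil.which(name)


def describe(name="html2text"):
    """Build a one-line, human-readable report for the lookup."""
    path = find_executable(name)
    if path is None:
        return f"{name}: not found on PATH"
    return f"{name}: found at {path}"


if __name__ == "__main__":
    # Run this as the same user the cron job uses, since its PATH may differ.
    print(describe())
```

If this reports "not found" when run as the cron user, the exec failure above is to be expected regardless of which Perl modules are installed.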
Ah OK. Easy when you know how......
..... however.....
I think I installed it correctly following the instructions because when I run the stringify command from the tools directory and nominate the PPT file, I do get output with text lines from the PPT file. But these are not showing up in the Kinosearch after both a kinoupdate and kinoindex.
--
BobCorless - 12 Nov 2010
Something is wrong with my Stringifier install because I've uploaded a DOC for the first time on 1.1.1 and I get errors about antiword path. This worked in 1.0.9 so I need to investigate where I've gone wrong.
--
BobCorless - 12 Nov 2010
Please let me know what comes out of it so that I can update the documentation if necessary, in case it helps others upgrading Stringifier or Kino.
--
MichaelDaum - 12 Nov 2010
Micha, you forgot to add html2textCmd in configure so that the user can define it (useful in a hosted environment when it's not in the PATH). I also changed the name so it's more aligned with the rest of the plugins.
--
OlivierRaginel - 12 Nov 2010
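To illustrate why a configurable command helps on hosted systems, here is a rough sketch of the general idea: prefer an explicitly configured path, and only fall back to a PATH lookup when nothing is configured. It is written in Python purely as an illustration and does not reflect how Foswiki actually resolves html2textCmd; resolve_command and build_argv are hypothetical names, and the only detail taken from this thread is the `-ascii` flag seen in the error message.

```python
import os
import shutil


def resolve_command(configured=None, default="html2text"):
    """Return a usable executable path, or None if none can be found.

    `configured` stands in for a site-specific setting (assumption);
    when it is empty, fall back to searching the PATH for `default`.
    """
    if configured:
        if os.path.isfile(configured) and os.access(configured, os.X_OK):
            return configured
        return None  # configured, but missing or not executable
    return shutil.which(default)


def build_argv(executable, filename):
    """Build the argument list, mirroring 'html2text -ascii <file>'."""
    return [executable, "-ascii", filename]
```

Returning None for a configured-but-missing path, rather than silently falling back to the PATH, is a design choice in this sketch: it makes a wrong setting visible instead of hiding it.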
Thanks for fixing that.
--
MichaelDaum - 12 Nov 2010