Item13134: HTML::Tidy fails to validate html5
Priority: Urgent
Current State: Closed
Released In: 2.1.7
Target Release: patch
Applies To: Extension
Component: UnitTestContrib
Branches: master Release02x01
HTMLValidation tests currently fail, mostly because HTML::Tidy cannot deal with HTML5.
Example:
<meta charset="utf-8" />
... produces
line 5 column 19 - Warning: <meta> proprietary attribute "charset"
line 5 column 19 - Warning: <meta> lacks "content" attribute
... which is obviously wrong.
Similarly, any custom data-* attribute, as introduced by HTML5, will cause validation warnings.
See also Item10739. Paul's approach was to filter tidy's output and ignore those warnings.
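A minimal sketch of that filtering idea in Perl, assuming the warnings are available as plain message strings like the ones quoted above. The helper name filter_html5_warnings and its pattern list are hypothetical, not part of HTML::Tidy or of Paul's actual change; the data-* pattern assumes tidy reports such attributes as 'proprietary attribute "data-..."'. The first two sample messages are the ones quoted above; the other two are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical list of tidy warnings treated as HTML5 false alarms.
# The patterns are assumptions derived from the messages quoted in this item.
my @HTML5_FALSE_ALARMS = (
    qr/<meta> proprietary attribute "charset"/,
    qr/<meta> lacks "content" attribute/,
    qr/proprietary attribute "data-[\w-]+"/,    # custom data-* attributes
);

# Returns only the messages that match none of the false-alarm patterns.
sub filter_html5_warnings {
    my @messages = @_;
    return grep {
        my $msg = $_;
        !grep { $msg =~ $_ } @HTML5_FALSE_ALARMS;
    } @messages;
}

my @messages = (
    'line 5 column 19 - Warning: <meta> proprietary attribute "charset"',
    'line 5 column 19 - Warning: <meta> lacks "content" attribute',
    'line 9 column 1 - Warning: <div> proprietary attribute "data-id"',
    'line 12 column 3 - Warning: missing </span> before </div>',
);

# prints: line 12 column 3 - Warning: missing </span> before </div>
print "$_\n" for filter_html5_warnings(@messages);
```

Note that dropping every 'lacks "content" attribute' warning also hides genuine errors on other <meta> tags; a stricter filter would only drop it when a charset warning was reported at the same line and column.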
Validating HTML5 under Perl doesn't seem to be easy. Alternatives are:
Note that there are more reasons to look into replacing HTML::Tidy with another HTML validator. For example, building HTML::Tidy using cpanm requires tidyp, yet another fork of the original tidy project. However, tidyp isn't shipped by distributions, so you have to download and compile tidyp yourself before cpanm-ing HTML::Tidy.
--
MichaelDaum - 05 Dec 2014
The best approach at the moment is to filter out these false alarms. I have downloaded the HTML5-aware tidy version to check that no problems persist.
--
MichaelDaum - 13 Dec 2014
Task::HTML5 could be an option.
--
MichaelDaum - 26 Feb 2016
Task::HTML5 had its latest release in 2011, so I assume it is no longer maintained.
The latest tool is called HTML::Tidy5, as per https://github.com/petdance/html-tidy5 and http://www.html-tidy.org/
However, it does not compile out of the box (see https://github.com/petdance/html-tidy5/issues/3), as it requires tidy 5.6.0, yet Debian and Ubuntu only have 5.2.0.
The tidy command-line tool seems to be fine, though.
--
MichaelDaum - 02 Jun 2019
HTML::Tidy5 is probably the correct route. It is from the same author as HTML::Tidy and depends on htmltidy; both appear to be actively maintained.
--
TimothyLegge - 23 Feb 2020