This question about Missing functionality: Answered
Rendering of broken links
In our installation, we create project pages from a template that generates links to documentation (PDFs, for example) that lives on our network. These links are served through our Apache server, so in essence the files are sitting on our Apache server. For example:
http://ourwiki.company.com/v/somedocument.pdf
I would like to figure out a way to display some sort of special "broken" icon if the PDF file does not actually exist at the link target. Currently you have to click on the link, and then you get redirected to the "link missing" page.
It would be great if there was a way to render the link (prior to clicking it) in a way that showed the destination was missing, similar to the way undefined WikiWords automatically get decorated with "?".
--
JimParker - 04 Jun 2012
None of the current extensions I could find provides this feature. Checking every external link would add some page load time and server load, so some sort of server-side caching of the link status would most likely be required.
The ExternalLinkPlugin, which marks off-site links with a small icon, could probably be extended to do checking like this. Unfortunately, nothing off the shelf comes to mind.
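A minimal sketch of the caching idea, assuming a simple in-memory hash keyed by URL. The names cached_link_ok, %link_status and $MAX_AGE are hypothetical and not part of any existing plugin; the actual check is passed in as a code reference so the cache does not care how the link is tested.

```perl
use strict;
use warnings;

# Hypothetical in-memory cache: URL => { ok => 0|1, checked => epoch seconds }
my %link_status;
my $MAX_AGE = 3600;    # assumed lifetime of a cached result, in seconds

# Return the cached status for $url. $checker->($url) is only called when
# there is no cached entry or the entry is at least $MAX_AGE seconds old.
# $now is optional and defaults to the current time.
sub cached_link_ok {
    my ( $url, $checker, $now ) = @_;
    $now = time unless defined $now;

    my $entry = $link_status{$url};
    if ( !$entry || $now - $entry->{checked} >= $MAX_AGE ) {
        $entry = { ok => ( $checker->($url) ? 1 : 0 ), checked => $now };
        $link_status{$url} = $entry;
    }
    return $entry->{ok};
}
```

A hash like this only helps when the Perl process stays alive between requests. Under plain CGI each request starts a new process, so the status would have to be kept in a file or database instead.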
--
GeorgeClark - 04 Jun 2012
OK, good idea. I will see if I can hack that plugin.
I only want to do this checking on a handful of links on certain pages, so I might be able to deal with the performance cost.
--
JimParker - 05 Jun 2012
Hmmm... I would need some sort of function I could call from Perl that tells me whether the target link is valid. With that info, I could add code to decorate the external link one way for valid links and another way for invalid links.
Any ideas?
--
JimParker - 05 Jun 2012
The CPAN:LWP library can be used to fetch URLs from Perl. Rather than issuing a GET, it would probably be better to use the HEAD operation so that the complete download is avoided. Foswiki::Net also provides an HTTP read function, but it doesn't expose the HEAD method.
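As a rough sketch of the HEAD approach, assuming LWP::UserAgent is installed: the function name link_is_alive and the 5 second timeout are illustrative choices, not anything from the plugin. The user agent can be passed in, which makes the function easy to exercise without touching the network.

```perl
use strict;
use warnings;

# Return 1 if a HEAD request for $url succeeds (a 2xx status), else 0.
# $ua is optional; any object with a head() method whose result responds
# to is_success() will do. By default an LWP::UserAgent is created.
sub link_is_alive {
    my ( $url, $ua ) = @_;
    $ua ||= do {
        require LWP::UserAgent;
        LWP::UserAgent->new( timeout => 5 );    # timeout is an assumed value
    };
    my $response = $ua->head($url);
    return $response->is_success ? 1 : 0;
}
```

With the example URL from the question, a call would look like link_is_alive('http://ourwiki.company.com/v/somedocument.pdf'). The short timeout keeps an unreachable server from stalling the page for long.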
--
GeorgeClark - 08 Jun 2012
Thanks, good tip. I had already implemented this with a simple call to LWP::Simple::head(). It works fine now. I even added a config field where the user can select the icon to display for a good or bad link.
The next challenge I have is that I now want to be able to selectively turn off the call to "head", as it causes slower page loading (as expected!). It's a challenge for me, the novice, because this is not actually a macro extension, but something that gets invoked automatically for all external links. Still debating the approach.
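One possible approach, sketched here under assumptions, is a preference that is set only on the topics or webs where checking is wanted, with the handler skipping the HEAD call everywhere else. The preference name EXTERNALLINKPLUGIN_CHECKLINKS and the function checking_enabled are hypothetical; inside a plugin the raw value could be read with Foswiki::Func::getPreferencesValue.

```perl
use strict;
use warnings;

# Decide whether link checking is enabled, given a raw preference value.
# In a plugin the value could come from something like
#   Foswiki::Func::getPreferencesValue('EXTERNALLINKPLUGIN_CHECKLINKS')
# where the preference name is a hypothetical one chosen for this sketch.
sub checking_enabled {
    my ($value) = @_;
    return 0 unless defined $value;    # preference not set: do not check
    $value =~ s/^\s+|\s+$//g;          # trim surrounding whitespace
    return ( $value =~ /^(?:on|yes|true|1)$/i ) ? 1 : 0;
}
```

Because the default is "off", pages that never set the preference pay no extra load time.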
--
JimParker - 11 Jun 2012
One last question: how could I modify this to make it also process links that are written directly into the page without the [[url][text]] format? I am having a tough time decoding the regex that does it now.
--
JimParker - 11 Jun 2012
Matching HTML links can get a bit tricky. There is a regex for finding HTML tags in WysiwygPlugin/TML2HTML.pm that might be helpful:
    $text =~ s/(\<a
        (?:\s+
          (?: href|target|title|class )=   # Supported attribute
          (?: \'[^\']*\' | \"[^\"]*\" | [^\'\"\s]+ )+   # One or more SQ, DQ or space delimited strings
        )+                                 # One or more attributes - href is required
        \s*\>
        .*?                                # the link text
        \<\/a\s*\>                         # closing tag
      )/
      $this->_liftOutLink($1)/geixo;
This pulls out the entire <a>...</a> tag for processing by the _liftOutLink subroutine. It's written in the "x" regex format so that the individual components can be documented.
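To see what the pattern captures, here is a small standalone sketch that reuses it outside the WysiwygPlugin. The subroutines mark_link and mark_all_links are made-up stand-ins for _liftOutLink and its caller; they only wrap each matched anchor in a marker so the match is visible.

```perl
use strict;
use warnings;

# Hypothetical stand-in for _liftOutLink: wrap the matched anchor in a marker.
sub mark_link {
    my ($anchor) = @_;
    return "[LINK:$anchor]";
}

# Apply the anchor-matching pattern to $text and return the modified copy.
sub mark_all_links {
    my ($text) = @_;
    $text =~ s/(\<a
        (?:\s+
          (?: href|target|title|class )=   # supported attribute
          (?: \'[^\']*\' | \"[^\"]*\" | [^\'\"\s]+ )+   # quoted or bare value
        )+                                 # one or more attributes
        \s*\>
        .*?                                # the link text
        \<\/a\s*\>                         # closing tag
      )/mark_link($1)/geix;
    return $text;
}

print mark_all_links(q{See <a href="http://example.com/a.pdf">doc</a> now}), "\n";
```

An anchor without any of the listed attributes is left alone, since the pattern requires at least one attribute. A decorating callback would replace mark_link with code that inspects the href and adds the appropriate icon.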
In thinking about this task, note that this plugin currently uses the commonTagsHandler, which probably runs at the wrong time. %MACROS will not have been completely expanded, so link verification might fail if the links are made up of %MACRO results. You might have to do some of this work in a postRenderingHandler. I'm not sure where the best place to do this would be.
--
GeorgeClark - 12 Jun 2012
Well, it DOES work with $formfield() values being expanded first. I have a search that extracts $formfield values, and I use those in a table format statement that generates a link. The link is properly adorned.
The bigger issue I have is that I wanted a raw http:// link in the page to be checked. But if the raw http:// URL isn't bracketed ([[url][text]]), then the commonTagsHandler() never gets a crack at the URL.
So my regex isn't going to get a chance to work.
--
JimParker - 12 Jun 2012
OK, new update. I DID get this to work... I WAS getting the info I needed in the commonTagsHandler(). I just needed a special regex that would find the right links I wanted to test against.
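The regex itself is not shown in this thread. Purely as an illustration of the idea, a pattern that picks out only the links worth testing might be anchored on a known URL prefix. The prefix below is taken from the example URL in the question; the names $PREFIX and find_checkable_urls are assumptions.

```perl
use strict;
use warnings;

# Assumed prefix of the links worth checking (from the example URL above).
my $PREFIX = 'http://ourwiki.company.com/v/';

# Return every URL under $PREFIX found in $text, whether bracketed or bare.
# A URL ends at whitespace, a square bracket, a quote or an angle bracket.
sub find_checkable_urls {
    my ($text) = @_;
    my @urls = $text =~ m{(\Q$PREFIX\E[^\s\]\["'<>]+)}g;
    return @urls;
}
```

Restricting the match to one prefix also keeps the number of HEAD requests small. Note that a URL followed directly by sentence punctuation would pick up the trailing character, so real page text may need an extra trimming step.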
--
JimParker - 13 Jun 2012
Is there any way to check all pages for broken links? The idea is to do this as a periodic maintenance task to make sure all external links are still valid.
--
DonaldFast - 24 Jan 2014
Never mind -- I see that a general site link checker will work. (I picked the wrong one and it was failing.)
--
DonaldFast - 24 Jan 2014