Item2646: Fix search code so that forking also works reliably in Windows
Priority: Normal
Current State: Needs Developer
Released In: n/a
Target Release: n/a
We seem to have a problem with the search code. It does not correctly handle the maximum length command line you have available in Windows.
At the moment we recommend that Windows users use PurePerl, but we really should be able to use forking as well.
It may work today depending on the length of the path to the Foswiki installation directory.
This is the text from a previous bug, Item2504, which was resolved with the workaround of recommending the pure Perl search for Windows.
I have refactored the information from Item2504 below.
--
KennethLavrsen - 16 Jan 2010
Take this simple METASEARCH:
<table>
%METASEARCH{type="parent" web="%WEB%" topic="WebHome" format="<tr><td>[[$web.$topic][$topic]]</td></tr>"}%
</table>
This successfully shows the children of WebHome as delivered with Foswiki 1.0.8 (strictly, 1.0.6 upgraded with the 1.0.7 and 1.0.8 upgrades), but not my own topics which have this as their parent.
Using raw=debug to show you one topic (of about 4) that does not get found (I had to remove the % in front of META as it gets removed on save otherwise):
META:TOPICINFO{author="JulianLevens" date="1260800812" format="1.1" version="1.1"}%
META:TOPICPARENT{name="WebHome"}%
---+!! !PCI
Whereas this provided topic (WebNotify) with foswiki:
META:TOPICINFO{author="ProjectContributor" date="1231502400" format="1.1" version="1"}%
META:TOPICPARENT{name="WebHome"}%
Is found as part of that search.
However, if I switch to PurePerl, then both items are found.
But this search:
%SEARCH{ "*." topic="FileSend*Converter" scope="text" type="regex" nosearch="on" nonoise="on" nototal="off" format="$n()---+++$topic%BR%$percntINCLUDE{\"$topic\" section=\"summary\"}$percnt"}%
Works fine with the Forking algorithm, but pure perl gives me this output on the page:
Could not perform search. Error was: Quantifier follows nothing in regex; marked by <-- HERE in m/* <-- HERE ./ at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 41, line 1. at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 41 Foswiki::Store::SearchAlgorithms::PurePerl::__ANON__('META:TOPICINFO{author="JulianLevens" date="1249485343" forma...') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/SearchAlgorithms/PurePerl.pm line 47 Foswiki::Store::SearchAlgorithms::PurePerl::search('*.', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)', 'C:/PROGRA~1/Foswiki/Foswiki_1_0_8_pa/data/Main/', undef, 'Main') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store/RcsFile.pm line 332 Foswiki::Store::RcsFile::searchInWebContent('Foswiki::Store::RcsLite=HASH(0x1ff31ac)', '*.', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Store.pm line 2029 Foswiki::Store::searchInWebContent('Foswiki::Store=HASH(0xf088e4)', '*.', 'Main', 'ARRAY(0x1f43e0c)', 'HASH(0x1fa2494)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Search.pm line 260 Foswiki::Search::_searchTopics('Foswiki::Search=HASH(0x1780f14)', 'Main', 'text', 'regex', 'HASH(0x1780ef4)', 'ARRAY(0x1f6050c)', 'FileSendBinConverter', 'FileSendCountsConverter', 'FileSendEZTSVConverter', ...) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Search.pm line 680 Foswiki::Search::searchWeb('Foswiki::Search=HASH(0x1780f14)', 'inline', 1, 'topic', 'FileSend*Converter', 'search', '*.', 'basetopic', 'FileSend', ...) 
called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3836 Foswiki::__ANON__() called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 379 eval {...} called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 371 Error::subs::try('CODE(0x17800d4)', 'HASH(0x1780e54)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3845 Foswiki::SEARCH('Foswiki=HASH(0x74fe2c)', 'Foswiki::Attrs=HASH(0x1780d14)', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2872 Foswiki::_expandTagOnTopicRendering('Foswiki=HASH(0x74fe2c)', 'SEARCH', ' "*." topic="FileSend*Converter" scope="text" type="regex" no...', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2777 Foswiki::_processTags('Foswiki=HASH(0x74fe2c)', '---+!! File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'CODE(0xe13954)', 16, 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 2694 Foswiki::expandAllTags('Foswiki=HASH(0x74fe2c)', 'SCALAR(0xe14144)', 'FileSend', 'Main', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki.pm line 3022 Foswiki::handleCommonTags('Foswiki=HASH(0x74fe2c)', '---+!! File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'Main', 'FileSend', 'Foswiki::Meta=HASH(0x1d7d0a4)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI/View.pm line 388 Foswiki::UI::View::_prepare('---+!! 
File Send\x{a}%TOC%\x{a}%STARTSECTION{type="include"}%\x{a}---++ I...', 'Foswiki=HASH(0x74fe2c)', 'Main', 'FileSend', 'Foswiki::Meta=HASH(0x1d7d0a4)', 0) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI/View.pm line 368 Foswiki::UI::View::view('Foswiki=HASH(0x74fe2c)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 304 Foswiki::UI::__ANON__() called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 379 eval {...} called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/CPAN/lib//Error.pm line 371 Error::subs::try('CODE(0x936c54)', 'HASH(0x1d845dc)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 391 Foswiki::UI::_execute('Foswiki::Request=HASH(0xec2874)', 'CODE(0xec755c)', 'view', 1) called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/UI.pm line 275 Foswiki::UI::handleRequest('Foswiki::Request=HASH(0xec2874)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/lib/Foswiki/Engine/CGI.pm line 29 Foswiki::Engine::CGI::run('Foswiki::Engine::CGI=HASH(0xd75c04)') called at C:/Program Files/Foswiki/Foswiki_1_0_8_pa/bin/view line 45
It was found that the search had a typo, but there was still an error after correcting it. The discussion of the typo has been removed for clarity.
The typo was %SEARCH{ "*." which should have been %SEARCH{ ".*"
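A minimal Perl sketch of why the mistyped pattern fails in PurePerl: `*.` starts with a quantifier that has nothing to repeat, so Perl refuses to compile it, whereas `.*` is valid. The helper name `regex_error` is hypothetical and is not part of Foswiki.

```perl
use strict;
use warnings;

# Return undef if the pattern compiles, otherwise the compile error text.
# (Hypothetical helper, for illustration only.)
sub regex_error {
    my ($pattern) = @_;
    my $re = eval { qr/$pattern/ };
    return defined $re ? undef : $@;
}

for my $pattern ( '*.', '.*' ) {
    my $err = regex_error($pattern);
    print "'$pattern': ", ( defined $err ? "invalid: $err" : "valid\n" );
}
```

For `*.` the error text contains "Quantifier follows nothing in regex", the same message shown in the trace above.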
After some testing I found this in the Apache error logs. Crucially, the Sandbox::sysCommand error is not created when I switch to PurePerl searching.
This would suggest that the forking algorithm is not entirely successful calling grep under Windows. Is this a clue?
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] [Tue Dec 22 04:53:12 2009] CGI.pm: Use of uninitialized value $_ in -d at C:/strawberry/perl/lib/CGI.pm line 4083., referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] [Tue Dec 22 04:53:12 2009] CGI.pm: Use of uninitialized value $_ in -d at C:/strawberry/perl/lib/CGI.pm line 4083., referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8885), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8782), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8897), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8873), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8765), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8899), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8796), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8911), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8887), referer: http://tw4-wiki/
[Tue Dec 22 13:53:14 2009] [error] [client 10.132.96.221] WARNING: Sandbox::sysCommand commandline probably too long (8779), referer: http://tw4-wiki/
This certainly gave me an idea. I created a topic called WikiChild1 and that appears even under the forking algorithm. This possibly explains why the apparently standard topics Web... and Wiki... appear whereas mine earlier in the alphabet do not. It suggests the possibility to me that a list of topics to search is passed to grep and when this list is too big the early entries are ignored. Remember I'm running on Windows (more details above) which may be relevant.
--
JulianLevens - 22 Dec 2009
If the problem is that grep cannot take all the topics under Windows because of the maximum command-line length, then there is no solution other than PurePerl, which is not a bad solution I would say.
Then the actions would be
- Change the configure setting to non expert
- Document in configure that PurePerl should be used for native Windows (not cygwin) - if I understand the conclusion right
- Put a note in InstallationGuide about this.
This is what we decided to do in Item2504
--
KennethLavrsen - 05 Jan 2010
Had a chat with Sven and my assumption is wrong.
Here is the IRC log
[02:11] <Lavr> Sven what is your view on http://foswiki.org/Tasks/Item2504? Should Windows always search with PurePerl?
[02:12] <SvenDowideit> no :)
[02:12] <SvenDowideit> my opinion is that we should fix our bugs
[02:13] <SvenDowideit> grep used to work quite well on windows, but somewhere it stopped being as reliable
[02:14] <SvenDowideit> its a bit surprising because i recal running the unit tests on windows last time i had time
[02:14] <Lavr> The submitters argument is that it fails because Windows has a limit on max number of characters on command line. So maybe it will always fail if we pass too long a string to grep?
[02:14] <SvenDowideit> that has always been the case
[02:14] <SvenDowideit> on unix too
[02:14] <SvenDowideit> thats why there is code there attempting to deal with it
[02:15] <Lavr> it seems this is the reporters problem. Try and read his follow up carefully.
[02:15] <SvenDowideit> but the attempt is very simplistic -
[02:15] <SvenDowideit> if i had time, i would have already
[02:15] <SvenDowideit> at this point i'm already stealing time to try the rc
[02:15] <Lavr> That is OK. I just wanted to hear your view and I got it. Thanks
[02:16] * toffe82 has joined #foswiki
[02:16] <SvenDowideit> i did put a comment in the sandbox code wrt how we should recode it iirc
[02:16] <SvenDowideit> atm it chooses a number of topics to add to the command line each time
[02:17] <SvenDowideit> what it should to is calculate based on the length of the data path and the topic names +1space
[02:17] <SvenDowideit> and then compare to the known length of the command buffer
[02:17] <Lavr> Ah so it can fail if people on the site uses very long topic names
[02:18] <SvenDowideit> (also slightly doccoed in the code i think)
[02:18] <SvenDowideit> worse
[02:18] <SvenDowideit> it will fail more the longer the path to the data dir
[02:18] <Lavr> blast
[02:18] <SvenDowideit> ie - c:\ProgramFiles\foswiki\foswiki\data
[02:19] <SvenDowideit> but the maths would not be that hard to impl
[02:19] <Lavr> I'll copy this IRC trail to the report.
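A minimal Perl sketch of the calculation Sven describes above: rather than a fixed number of topics, each set is filled until the data path plus topic name plus one space would exceed the command buffer. The name `batch_topics`, its parameters and the example numbers are hypothetical and are not taken from Forking.pm or the Sandbox code; the sketch also ignores any quoting or escaping that sysCommand may add.

```perl
use strict;
use warnings;

# Split @$topics into sets whose estimated command line stays within
# $max_len characters. $base_len is the length already used by the grep
# command and the search string; $data_dir is the path to the web's data.
# Assumption: each file costs path + topic + ".txt" + one separating space.
sub batch_topics {
    my ( $topics, $data_dir, $base_len, $max_len ) = @_;
    my ( @batches, @current );
    my $used = $base_len;
    for my $topic (@$topics) {
        my $cost = length($data_dir) + length($topic) + length('.txt') + 1;

        # Start a new set when this topic would overflow the budget.
        # A single over-long topic still gets a set of its own.
        if ( @current && $used + $cost > $max_len ) {
            push @batches, [@current];
            @current = ();
            $used    = $base_len;
        }
        push @current, $topic;
        $used += $cost;
    }
    push @batches, [@current] if @current;
    return @batches;
}

my @batches = batch_topics( [ 'WebHome', 'WebNotify', 'WikiChild1' ],
    'C:/foswiki/data/Main/', 100, 170 );
print scalar(@batches), " grep call(s) needed\n";
print "  @$_\n" for @batches;
```

Because the cost is recomputed per topic, a longer data path or longer topic names simply yield smaller sets instead of an over-long command line.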
--
KennethLavrsen - 08 Jan 2010
PurePerl will be OK with the documentation updated for other Windows users. I'll try to look at fixing the forking option, but I have no idea how long that will take; my workload is way too high right now. As I intend to move to Fast CGI eventually, will NativeSearchContrib be another alternative, or does that also depend on grep and have similar issues?
--
JulianLevens - 11 Jan 2010
I use NativeSearchContrib with FastCGI and ModPerl and it works well as far as I can tell, but I never tested it on Windows.
--
GilmarSantosJr - 11 Jan 2010
Related? Support.Question380
--
PaulHarvey - 14 Jan 2010
This new item, derived from Item2504, has been set to Waiting For JulianLevens and set to Urgent with 1.1.0 scope.
--
KennethLavrsen - 16 Jan 2010
No way is this urgent. NativeSearchContrib (which does not use grep or any other command-line tool) has worked on Windows for a long time now. Flipping back to "Normal" and "Waiting" to see if we can determine what the real problem is here, because it's not clear from the discussion above (several people talking at cross-purposes, AFAICT).
--
CrawfordCurrie - 09 Apr 2010
During my investigations I found that the code already has comments and code pertaining to this:
# process topics in sets, fix for Codev.ArgumentListIsTooLongForSearch
my $maxTopicsInSet = 512; # max number of topics for a grep call
#TODO: the number is actually dependant on the length of the path to each file
#SMELL: the following while loop should probably be made by sysCommand, as this is a leaky abstraction.
##heck, on pre WinXP its only 2048, post XP its 8192 - http://support.microsoft.com/kb/830473
$maxTopicsInSet = 128 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );
I also note, from http://partmaps.org/era/unix/arg-max.html, that a maximum of 512 topics could cause problems on non-Windows boxes, albeit unlikely and only on old systems.
As far as I can see it's not possible to pass, to grep, the files to process via another file.
A quick fix for me is to change the last line above to:
$maxTopicsInSet = 64 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );
then I can use the forking algorithm successfully. By adding some prints to STDERR I was able to get a better idea of how the forking sub search calls sandbox->sysCommand and how sysCommand expands this into the final command line. sysCommand handles this expansion a little differently between OSes, so a one-size-fits-all calculation is not easy. The calculation would also have to be done each time around the loop; e.g. when searching a total of 1000 topics, use 100 topics in the first set, 75 in the next and so on, depending on the length of the topic names selected.
Note that I calculated the new size of 64 as reasonable for our set-up, and it probably is for the vast majority of Windows set-ups. Of course, if someone else has a deep directory structure and/or a long web name and/or long topic names and/or a large regex, then that could still break. It's pretty clear that 128 is just too generous.
The general problem with this idea is that it would create a fragile link between sysCommand and the forking sub search. A change to sysCommand could cause the calculation to be too generous and grep to fail.
Pushing this logic into sysCommand would make more sense, but sysCommand was designed, quite reasonably, to be agnostic to the nature of the command passed, by using templates and so on. It would require introducing a special list of params that are expanded into the template last. The idea is that if 1000 characters are already consumed, then 7192 are left, and the items of the one 'special' param can be inserted one at a time until just before the total budget is used up. It would also require sysCommand to return the list shortened by the items consumed, and the caller of course to handle that appropriately.
Is this a reasonable approach? Or have I missed something important?
A performance question: at what size will forking become slower than pure-perl due to increased overhead of extra grep calls?
Note: a very simple patch is possible (to 64 from 128) to allow the forking search to work on Windows, but caveats would need to be added to the docs. Could a configure variable be used to set the limit? Indeed, is all the work suggested above overkill?
--
JulianLevens - 16 Apr 2010
Some supporting maths (where appropriate size includes all escaping prior to passing to grep):
| Element | Size | Notes |
| grep command | 128 | overkill, on my machine it's 54: c:/PROGRA~1/GnuWin32/bin/grep "-E" "-i" "-l" "-H" "--" |
| single regex | 512 | overkill, my largest is around 150 and that's with a number of AND (ie ';') chunks, and aren't these chunks broken down and grepped one by one? |
That leaves 7552 (8192 - 128 - 512)
| Element | Size | Notes |
| Path name | 50 | for a particular set-up this is a fixed size, e.g. C:\\PROGRA~1\\Foswiki\\Foswiki_1_0_8_pa\\data\\; with some effort (renaming the directory and reconfiguring) this could be made quite a bit smaller if necessary |
| Extras | 6 | \\ + .txt |
| Web Name | 12 | these two will vary topic by topic |
| Topic Name | 38 | but an average length of 50 combined is reasonable |
That's 106 per topic in the search. That suggests a maximum of 71 topics to pass to each grep call (7552 / 106). Turning that on its head and setting the limit to 64 allows an extra 12 chars per average topic name length. (Or, pessimistically, set the limit to 32 to allow for very large topic path names.)
- In theory this will not cater for all Windows installations
- In practice this will cater for all Windows installations †
On my set-up with the current Windows limit set to 128 topics my searches were only failing by a few hundred bytes not thousands.
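The arithmetic above can be checked with a short Perl sketch. The figures are the estimates from the two tables; the function names are illustrative only and do not exist in Foswiki.

```perl
use strict;
use warnings;

# Largest number of topics per grep call for a given command-line limit.
sub max_topics_per_call {
    my ( $limit, $fixed_overhead, $per_topic ) = @_;
    return int( ( $limit - $fixed_overhead ) / $per_topic );
}

# Characters available to each topic if the set size is fixed at $set_size.
sub chars_per_topic {
    my ( $limit, $fixed_overhead, $set_size ) = @_;
    return int( ( $limit - $fixed_overhead ) / $set_size );
}

my $limit     = 8192;                # post-XP limit quoted in the code comment
my $overhead  = 128 + 512;           # grep command + single regex
my $per_topic = 50 + 6 + 12 + 38;    # path + extras + web + topic = 106

printf "max topics per call: %d\n",
  max_topics_per_call( $limit, $overhead, $per_topic );
printf "chars per topic at a set size of 64: %d\n",
  chars_per_topic( $limit, $overhead, 64 );
printf "chars per topic at a set size of 32: %d\n",
  chars_per_topic( $limit, $overhead, 32 );
```

With these estimates the sketch gives 71 topics per call, and 118 characters per topic at a set size of 64, which is the 12 characters of headroom over the estimated 106.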
So my 'patch', such as it is, follows:
Forking.pm, circa line 68:
- $maxTopicsInSet = 128 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );
+ $maxTopicsInSet = 64 if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' );
Sorry, I'm not au fait with SVN GIT et al just yet.
†: probably
--
JulianLevens - 23 Apr 2010
Hold that thought, a number of my application searches are failing. I doubt this has anything to do with the above, but it suggests another problem with Forking.pm on Windows.
More to follow ...
--
JulianLevens - 27 Apr 2010
I've amended Forking.pm by updating the following block as follows:
if ( $Foswiki::cfg{DetailedOS} eq 'MSWin32' ) {
#try to escape the ^ and "" for native windows grep and apache
$searchString =~ s/\[\^/[^^/g;
# Fix escaping and quoting for Windows
$searchString =~ s#\\#\\\\#g;
$searchString =~ s#"#\\"#g;
$searchString = q(") . $searchString . q(");
}
My searches are now largely correct, with one fly in the ointment being the need to convert '\.' within searches to [.] (and I cannot be sure that there are no further special cases with '\').
However, I've also done some rough timing (in seconds) just using stopwatch on the browser as follows.
| Application | Forking 64 | Forking 100 | PurePerl |
| List company applications | 53 | 21 | 12 |
| List company teams | 65 | 48 | 15 |
I feel that the difference in timing is sufficient to be considered significant, even though this performance testing has not been exhaustive.
Forking 64/100 refers to the number of files allowed per grep. It suggests the possibility that performance could indeed improve considerably if 512 files per set were possible as in Unix/Linux.
I am left to recommend PurePerl for Windows and rely on:
- Future foswiki caching to improve performance
- Possibly NativeSearchContrib
- Enhance grep on Windows to allow all the files to be searched to be specified in one go, by writing them to a separate file, but ...
--
JulianLevens - 28 May 2010
we should probably commit the patch, even if it's incomplete, as it's an improvement.
--
SvenDowideit - 08 Jul 2010
in a depressing bit of parallel evolution, I made maxTopicsInSet dependent on some maths in Item9134 - but I'm adding the quoting fixes to see how things improve.
mmm, it might not come to anything tho - it looks like trunk forking on windows is pretty unhappy.
--
SvenDowideit - 15 Jul 2010