Item10149: versions query w/SEARCH, RCS store = hang
Priority: Enhancement
Current State: Confirmed
Released In: 2.2.0
Target Release: minor
Applies To: Engine
Component: SEARCH
Branches: trunk
I've not had much luck with the versions query in SEARCH.
It works fine for individual topics with QUERY, but SEARCH is just too slow unless you only have a few dozen topics.
I think we need a strategy before we ship this. At a minimum, disable versions in SEARCH with RCS stores, or alternatively we introduce a new feature to make SEARCHes abort long-running queries in a graceful manner (can we just return a shorter-than-normal resultset, and log/emit an error?).
The query I ran on trunk.foswiki.org was something like:
%SEARCH{
"versions[author='PaulHarvey']"
web="Development"
limit="50"
}%
Which ran for several minutes before Koen killed the process for me.
--
PaulHarvey - 11 Dec 2010
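To make the "return a shorter-than-normal resultset, and log/emit an error" idea above concrete, here is a minimal sketch in Python. It is not Foswiki code: `bounded_search`, `matches`, `timeout_s` and the logger name are all hypothetical, and the 10-second default is only a placeholder.

```python
import logging
import time

log = logging.getLogger("search")


def bounded_search(topics, matches, limit=50, timeout_s=10.0, clock=time.monotonic):
    """Scan topics until `limit` hits are found or `timeout_s` elapses.

    Returns (hits, timed_out). On timeout the hits found so far are
    returned instead of raising, and a warning is logged.
    """
    deadline = clock() + timeout_s
    hits = []
    for topic in topics:
        # The deadline is only checked between topics (assumption: one
        # topic is the smallest unit of work we can interrupt).
        if clock() >= deadline:
            log.warning("search aborted after %.1fs with %d of %d hits",
                        timeout_s, len(hits), limit)
            return hits, True
        if matches(topic):
            hits.append(topic)
            if len(hits) >= limit:
                break
    return hits, False
```

Because the check happens between topics, a single topic whose revision history takes minutes to load would still block; a check like this bounds the number of topics visited, not the cost of any one of them.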
Yeah, well, TBH I'm not that surprised. A versions query has to load a shitload of information just to query, and that's not efficient. As you say, the RCS store just isn't built for this kind of query.
Should this be a fix just for a versions query, or is there a more general problem, that a user should be able to put a limit on the amount of time spent on a query? A general mechanism would work for other types of bad query.
--
CrawfordCurrie - 11 Apr 2011
I don't have any good ideas on how to control a hypothetical "search timeout" feature. Configure setting? Macro param? URL param? What should be the default?
On my own wiki, I have some reports that take tens of seconds, and a dot graph that takes minutes - I need to be able to run those from a cron job, where I save the output back into a "cache" topic which is refreshed every hour.
Let's say we have a default SEARCH timeout of 10s - I need a way for my cron-job scenario to override it so that it will run to completion from CLI.
Additionally: I don't know about general Foswiki practice, but the biggest troubles on my wiki are nested searches. If there are 200 SEARCHes as a result of an outer SEARCH, how do we apply a "timeout" in that case, if each individual SEARCH is still on the order of ~2s?
Hmm, so at first glance it seems that a general solution is a can of worms.
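One possible shape for the nested case, sketched in Python under stated assumptions: instead of a timeout per SEARCH, a single budget is created per request and shared by the outer search and everything nested in it, and passing no limit covers the cron-job/CLI scenario. `SearchBudget` and `run_nested` are hypothetical names, not an existing Foswiki API.

```python
import time


class SearchBudget:
    """One time allowance shared by an outer search and all searches nested in it."""

    def __init__(self, seconds=None, clock=time.monotonic):
        # seconds=None means "no limit", e.g. a cron job run from the CLI.
        self._clock = clock
        self._deadline = None if seconds is None else clock() + seconds

    def exhausted(self):
        return self._deadline is not None and self._clock() >= self._deadline


def run_nested(outer_topics, inner_search, budget):
    """Run inner_search(topic) for each outer hit until the shared budget runs out."""
    results = {}
    for topic in outer_topics:
        if budget.exhausted():
            break  # stop starting new inner searches; keep what we have
        results[topic] = inner_search(topic)
    return results
```

With 200 inner searches at ~2s each, a 10s budget would stop after roughly five of them, whereas a 10s per-search timeout would never trigger. This does not answer where the setting should live (configure, macro parameter or URL parameter).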
If I get time to invest in this, I think I'd rather try to ship a SearchAlgorithm which continues to work with RCS store but perhaps caches just the %META part of every topic version in working somehow (reproducing the data/ directory layout, but Topic.txt,v would be a directory).
--
PaulHarvey - 11 Apr 2011
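The cache layout described above could look something like the following sketch. It is only an illustration: the `working/meta_cache` location, the one-file-per-revision naming and both function names are assumptions, not an existing Foswiki layout.

```python
from pathlib import PurePosixPath


def meta_cache_path(working_dir, web, topic, rev):
    """Path of the cached %META block for one revision of a topic.

    Mirrors data/<web>/<topic>.txt,v, except that the ",v" name is a
    directory holding one file per revision (hypothetical layout).
    """
    return PurePosixPath(working_dir) / web / f"{topic}.txt,v" / str(rev)


def extract_meta(topic_text):
    """Keep only the %META:...% lines of a topic's text."""
    return "\n".join(
        line for line in topic_text.splitlines() if line.startswith("%META:")
    )
```

For example, `meta_cache_path("working/meta_cache", "Development", "WebHome", 3)` gives `working/meta_cache/Development/WebHome.txt,v/3`. A versions query could then read these small files instead of asking RCS to rebuild each revision.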
I have added a caveat emptor to the QuerySearch topic to warn of the performance risks. Having done this I think it is valid to regrade this report from 'Urgent' to 'Enhancement'.
--
CrawfordCurrie - 21 Jan 2013
I think that caching of the RCS log information plus the %META would be good as a basic feature of the RCS based Store. Anything that needs to dip into the RCS log records, for example the Attachment display of the comment field, is horribly slow. Being able to access the topic metadata history without a full RCS pass would be very helpful. Maybe store it alongside the file and file,v as file,meta.
--
GeorgeClark - 21 Jan 2013
Crawford. Any more to add to Caveat Emptor with regard to the PlainFileStore? Is the very poor performance of the versions search alleviated with the PFS?
--
GeorgeClark - 12 Jan 2015
Anecdotally yes, but I haven't benchmarked it. The RCS performance is down to having to reconstruct every previous revision from the ,v. Since the PFS stores the old revisions as plain text, it should be much faster.
--
CrawfordCurrie - 17 Jan 2015
Missed the release meeting where this was discussed, so here's my take on it.
RCS is fundamentally slow for versions, and I personally think it's a waste of time trying to do anything with it. As I said above, I haven't benchmarked the PFS for these queries; a first step before any other work would be to do this.
One way to accelerate these queries would be to use a store cache, like the DBIStoreContrib (which uses an SQL DB to cache the store in a structured DB for accelerated queries). Personally I think that's the best approach. The DBIStoreContrib doesn't currently cache old revisions, but it could be made to do so. Either way, I think this should be taken out of the mainstream release plan, and if anyone really wants it, they can extend DBIStoreContrib (or fund that work).
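As a rough illustration of why caching old revisions in an SQL DB would help, the sketch below answers the equivalent of versions[author='PaulHarvey'] with one indexed lookup. The `revision` table, its columns and both function names are invented for this example; this is not the DBIStoreContrib schema or API.

```python
import sqlite3


def build_revision_cache(revisions):
    """Load (web, topic, rev, author) rows into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE revision (web TEXT, topic TEXT, rev INTEGER, author TEXT)"
    )
    # Index on the columns the query filters by (hypothetical schema).
    db.execute("CREATE INDEX revision_author ON revision (web, author)")
    db.executemany("INSERT INTO revision VALUES (?, ?, ?, ?)", revisions)
    return db


def topics_with_revision_by(db, web, author, limit=50):
    """Topics in `web` that have at least one revision by `author`."""
    rows = db.execute(
        "SELECT DISTINCT topic FROM revision "
        "WHERE web = ? AND author = ? ORDER BY topic LIMIT ?",
        (web, author, limit),
    )
    return [topic for (topic,) in rows]
```

The cost moves to write time: every save has to add a row, and an existing RCS store would need a one-off pass over each ,v file to populate the table.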
Another type of store might be another valid approach - one that simply stores entire topics and their history in a DB for rapid retrieval, avoiding file system accesses. Could be an interesting project, especially when coupled with MichaelDaum's ideas on minimising topic re-reads.
--
Main.CrawfordCurrie - 21 Mar 2017 - 08:29