Item5718: if file dates don't match the edit time, SEARCH and limit will return incorrect results.
Priority: Normal
Current State: Confirmed
Released In: n/a
Target Release: n/a
Applies To: Engine
Component: SEARCH
Branches:
When I tell WebChanges to give me more results by clicking 100, 200, 500 etc, it gives me
different results. This seems to happen with
SEARCH generally, at least when using
order
and
reverse
. The simplest way to describe it is that increasing
limit
increases the number of things it
finds. Example:
TWiki web with less than 300 topics. A team made possibly hundreds of edits to many dozens of topics in the space of a day, 18 Jun 2008. Going into WebChanges, they expected to see basically all displayed changes as being from 18 Jun 2008, but there were a few from today (19 Jun 2008), then several from 18 Jun 2008, then the rest were from
before the 18th.
So I've done some testing. I searched (CTL+F) for the string " -18 Jun 2008 - " on the WebChanges result (this matched the rev line for each result uniquely). Here's what I got:
Limit |
Hits |
50 |
19 |
100 |
39 |
200 |
54 |
500 |
66 |
- If there were 66 hits in reverse chronological order in a listing of 500, they should also show up in both the 100- and 200-limit searches. (When I did this testing, there were 17 results from today, the 19th.)
- In every case, results from before 18 Jun 2008 were displayed at the bottom, after the 18 June 2008 hits - even on the 50-limit search, which should have at least tailed out still displaying hits from this date.
- Also, the order changes. Eg, the first two topics displayed on the 50-limit search are in reverse order on the 500-limited search.
I've marked this "urgent" because it seems to indicate that the validity of search output is somewhat compromised.
If I say "give me the most recent 50 changes", I expect it to look at all the topics and give me the 50 most recently changed. Instead, it seems to be getting the 50 topics that it can grab without having to get out of its chair, and then listing them in reverse chronological order.
--
TWiki:Main/MarcusLeonard - 19 Jun 2008
mmm, I
think this is due to the non-obvious way that the search options are applied - I think limit happens first, then the sort. There was once an attempt to change that, but iirc there were some TWiki apps that didn't like the result.
I'm about to make some changes to the way that information is passed to the search and query backends - perhaps this will be helped by it.
damned good find!
--
SvenDowideit - 19 Jun 2008
ah yes :/
# For performance:
# * sort by approx time (to get a rough list)
# * shorten list to the limit + some slack
# * sort by rev date on shortened list to get the accurate list
# SMELL: Ciaro had efficient two stage handling of modified sort.
# SMELL: In Dakar this seems to be pointless since latest rev
--
SvenDowideit - 19 Jun 2008
This is a known problem all the way back to Cairo.
The problem manifests itself very badly if you have manipulated (edited or copied) topics in and out of a web via the file system.
I almost want to make a bet that this is what happened that day. I would not be surprised if the 100s of topics were manipulated directly on the file system or copied from another web.
TWiki generates the list of last modified by doing a pre-sort using the file time stamp instead of the timestamp in the topic. This gives a huge performance advantage when you have many topics in a web.
Normally
WebChanges works pretty well and trying it here on Bugs I get valid results no matter what limit. But we have seen this problem when someone have hacked the files directly here.
In a future release we will have to revisit this. With an improved storage we can improve this a lot by having a simple DB (could be as simple as an additional file of topic and their time stamp in date order) that the modified criteria is taken from.
Marcus - put this back to urgent if I am wrong that the files were manipulated outside TWiki editing.
--
KennethLavrsen - 27 Jun 2008
Hi Kenneth,
They were edited in the normal way. As I mentioned above, the team using the web made possibly hundreds of edits to many dozens of topics in the space of a day, so it wasn't done at the filesystem level, it was done by the normal edit process through browsers.
I do sometimes do mass operations in the web at the filesystem level, but this problem was flagged by their team leader when they noticed that their edits weren't showing up as expected. I'm assuming that, even if the
txt
files had been tinkered with directly, any subsequent "normal" TWiki editing would have "put things right"...?
I'm just thinking about why the system-level edits would make a difference... Even if they were edited at the system level (I tend to use
sed
to do the bulk stuff), the timestamps should still line up the same way. Let's say I do a bulk edit on 300 topics to change permissions in
META:PREFERENCE
, and all the
txt
files get a timestamp within a second or two of each other. If humans come along the next day and do a mass of edits, all those edits they made are still more recent than the system edits. Seems to me that the edit, whether done by a TWiki perl script or by me with
sed
, is still an edit that changes the file at the given time. So, if TWiki uses the file timestamp instead of reading the serial date inside the file, it should be
more accurate in terms of showing what got edited last;
any sort of edit should be faithfully noted. Seems like there's something fishy going on here... Sven's comment above does seem to explain it.
Anyway, I'll mark it back to urgent as you advise. Cheers,
--
MarcusLeonard - 05 Jul 2008 (PS: Just noticed the signature block now leaves out the TWiki.org interwiki link - is that right?)
I still cannot reproduce this Marcus.
There is something you do not tell us here. There is an important detail that is left out. Can you create an example and attach the web here in a tar.gz?
Did you modify the
WebChanges?
--
KennethLavrsen - 13 Jul 2008
Hi,
I tried hard to reproduce the issue in my development environment. Questions for you -Main.MarcusLeonard
[1]. Did you modify WebChanges topic? If yes - is it possible to add the content in this page?
[2]. Kindly request you to attach the .tar.gz of the Web which is showing this issue?
Please let me know if any one able to reproduce this issue???
Main.SopanShewale
TWiki:Codev.GeorgetownReleaseMeeting2008x07x21 decided to downgrade this to normal as noone can reproduce this and reporter has not given the needed info to reproduce it.
--
KennethLavrsen - 22 Jul 2008
Thanks
MarcusLeonard for sharing the details...
In fact I am able to recreate the bug on my sample instances....
My instance shows following - i have tested this on TWiki
Limit |
Number of topics* |
date/time |
I am working on this.
--
SopanShewale - 01 Aug 2008
Would love a patch for this if you come up with one!
--
MarcusLeonard - 02 Aug 2008
First limiting and then sorting is sacrificing correctness for speed. This is a known trade-off in TWiki that has been made rather conscious. If you'd like correct sorting+limitting, have a look at
DBCachePlugin, where this tradeoff isn't needed anymore, and rewrite
WebChanges using an appropriate DBQUERY statement. That's the only solution we can offer right now without changing
SEARCH in the core.
--
MichaelDaum - 03 Aug 2008
Yikes. Okay... I'll look at some interim solutions, I guess. But can you tell me what the plan for this problem is in the longer term? It seems pretty serious to me; basically,
SEARCH doesn't do what it says it does, or what people will think it does. Limiting-then-searching seems like a "lucky dip", because you won't necessarily know how the limiting will be applied. It reduces confidence in TWiki, which is not a good situation for something that's supposed to be a premier information management tool.
--
MarcusLeonard - 04 Aug 2008
In the unlikely event that someone else comes across this problem, here's a workaround. Bacically,
touch
the files back to their intended dates. This implies UNIX/Linux, but if you're messing around with files like this, you're likely already there.
Problem: you've updated all the timestamps to when you messed with the files. Solution: get the save-dates out of the files and then touch each file with its date. Do something like this:
for file in *.txt
do
stamp=`head -1 -q $file | cut -d '"' -f 4`
#echo ${stamp} # Check you cut the right field out.
touch -d @${stamp} ${file}
done
Fixed. WebChanges now looks sensible again.
--
MarcusLeonard - 06 Mar 2010
I wonder if we can do something about the pre-optimisation that can cause this
--
SvenDowideit - 09 Apr 2010
updated summary.
I wonder if we should write a
FoswikiValidationPlugin that can detect issues and fix them? Sadly, we don't want to be slowing down the
SEARCH rendering to compare the file time and the in TOPICINFO times..
--
SvenDowideit - 18 May 2010
I have actually fixed this issue in foswiki 1.1 to
some degree (compared to foswiki 1.0, we correctly show topics when using the
SEARCH date
parameter), however, this is done by
not using the filesystem time, so is dramatically slower.
I feel that we
should be able to have the 2 default store's that we ship set the .txt file's timestamp as forced by
forcedate
, so I've got a (failing) unit test for this, and then will look at making it so for 1.1 - and then I'll look towards a restHandler / manage action to do the same as Marcus' script above.
sven@quad7:~/src/foswiki$ bzr diff trunk/UnitTestContrib/test/unit/StoreSmokeTests.pm
=== modified file 'trunk/UnitTestContrib/test/unit/StoreSmokeTests.pm'
--- trunk/UnitTestContrib/test/unit/StoreSmokeTests.pm 2010-05-26 14:30:16 +0000
+++ trunk/UnitTestContrib/test/unit/StoreSmokeTests.pm 2010-05-27 01:04:30 +0000
@@ -343,4 +343,29 @@
}
+#the file system date has always been wrong when using forcedate.
+sub verify_forcedate {
+ my $this = shift;
+
+ my $text = <<HERE;
+%META:TOPICINFO{author="TopicUserMapping_guest" date="1" format="1.1" version="1.2"}%
+---+ Progressive Sexuality
+A Symbol Interpreted In American Architecture. Meta-Physics Of Marxism & Poverty In The American Landscape. Exploration Of Crime In Mexican Sculptures: A Study Seen In American Literature. Brief Survey Of Suicide In Italian Art: The Big Picture. Special Studies In Bisexual Female Architecture. Brief Survey Of Suicide In Polytheistic Literature: Analysis, Analysis, and Critical Thinking. Radical Paganism: Modern Theories. Liberal Mexican Religion In The Modern Age.
+
+HERE
+ my $topicObject =
+ Foswiki::Meta->new( $this->{twiki}, $this->{test_web},
+ "VeryOldTopic", $text );
+ my $rev = $topicObject->save(forcedate=>123);
+ $this->assert_num_equals(1, $rev);
+ my $file_date = $this->{twiki}->{store}->getApproxRevTime( $this->{test_web}, "VeryOldTopic" );
+
+ #potential test code for release 1.0.x - but it didn't work there either
+ #my $rev = $this->{twiki}->{store}->saveTopic( $this->{twiki}->{user}, $this->{test_web}, "VeryOldTopic",
+ # $text, undef, { forcedate=>123 } );
+ #my $file_date = $this->{twiki}->{store}->getTopicLatestRevTime( $this->{test_web}, "VeryOldTopic" );
+
+ $this->assert_num_equals(123, $file_date);
+}
+
1;
--
SvenDowideit - 27 May 2010