Feature Proposal: Support multi-key sorting in SEARCH
Motivation
The current support for sorting SEARCH results is crude at best. Improve it to allow sorting on multiple keys in different directions.
This proposal is a formalisation of a discussion that has been running for years. Sven reckons it's too late for him to implement in 1.1, but maybe it's important enough that we need to reconsider that, and get him some help.
Description and Documentation
Deprecate
order
and
reverse
parameters on SEARCH.
Add
sort
parameter (using the
format
tokens as a comma separated list) as follows:
If the
sort
parameter is given, the
order
and
reverse
parameters will be ignored.
Add a
groupby
parameter, which triggers a
footer
and
header
to be rendered whenever that 'sorting' specifier changes.
so if you want to reproduce the legacy
order
output, you can say
%!SEARCH{
sort="$web,$topic"
groupby="$web"
]%
or, as a complex and slightly weird example
%SEARCH{
sort=""%CALC{$SOMETHIGN($date)}%,$formfield(Status)"}%"
groupby="$formfield(WaitingFor)"
}
which is likely to output many headers and footers - as the result is not sorted by
WaitingFor
- but if one person has alot of tasks waiting on them (um, like i do) they would show up in an output ordered by last edit date and status.
--
Contributors: Many
Discussion
the reason I think its too late, is that I have to re-write the sorting code already :/ but that said, yes, a spec that is agreed to will improve the chances that it'll be done
--
SvenDowideit - 10 Mar 2010
one thing that is missing from the above proposal, is how to do partitioning, or grouping - the historical rendering groups by web, and outputs one header and footer per grouping - imo this is still a useful thing - imagine grouping by date - day/week, or by status..
which brings me back to the original proposal,
MakeSEARCHResultPartitioningByWebOptional and adding a
partition
or
groupby
parameter, with 'web' as the default..?
--
SvenDowideit - 13 Mar 2010
Yes, I deliberately avoided that here to keep the focus on sorting. Currently the only sorting option you have is:
order="key"
. As you highlighted elsewhere, results are partitioned by web, so this in fact has to be read as
order="web,key"
- "web" is an implicit major sort key. Let's say you simply extend this with
partition
with
web
as the default. and say
order="web" partition="topic"
, you are applying a major sort key
topic
and minor sort key
web
. That's workable, but:
- is not intention-revealing for the user, and
- requires complex interpretation of
reverse
(and/or support for partition="-topic"
? Dunno)
The alternative is to have
partition
as a boolean, and write
sort="topic,web" partition="on"
which IMHO is more readable.
--
CrawfordCurrie - 14 Mar 2010
I'll see what happens if we use
partition
as a bool - though I think that will tell the user that its only the first of the multi-key sort, and it might limit the applicability needlessly - no idea
I'm preferring to call the param
groupby
and have it actually list the field/sto group results by - when there is a grouping, there's one header&footer output per group..
wrt reverse - yeah, basically, the sorting is implicit by the use of the
groupby
so maybe we just go all nerdy, and do
sort="(web,date),topic"
to groupby (web,date)....
--
SvenDowideit - 16 Mar 2010
I've changed the proposal to be much more expressive - note that while we're using the
format
$tokens
, I'm not proposing that the
sort
parameter is made into a string and then sorted (though on some Search Algo's that may be a useful pre-sort.
--
SvenDowideit - 17 Mar 2010
OK, I see the connection of
$keys
with the format tokens. Makes sense.
re:
call the param groupby
and have it actually list the field/sto group results again raises the interesting question "should a group impose an implicit sort key"? To illustrate, let's say I have a sequence of results: Aardvark, Cat, Iguana, Jaguar,
KingCobra,
KomodoDragon, Lemur, Zebra. I say
sort="-$topic" groupby="class"
(where class divides between "mammals" and "reptiles"). I would expect:
viz. the
sort is by topic name, but that is then divided into
groups. If "groupby" defines an implicit sort key, then:
- Zebra
- Lemur
- Jaguar
- Cat
- Aardvark
which to me is counter-intuitive.
The second interesting question is, how does the visual representation of a group get defined? Are there additional
groupheader
, groupseparator= and
groupfooter
params?
While not directly impinging on the definition of the
sort
parameter, I think these questions are important enough to raise a concern.
--
CrawfordCurrie - 17 Mar 2010
it depends if groups are higher order than results, and I think they are. So your counter-intuitive example is logical, it groups and then sorts within the groups.
--
ArthurClemens - 17 Mar 2010
If we look at the available keys,
$web
,
$topic
,
$createddate
,
$modified
,
$editby
,
$formfield(name)
we can see that the date keys are useless for grouping as they stand. So we are left with grouping by
$web
,
$topic
,
$editby
,
$formfield(name)
. Grouping by topic is interesting - do the result sets treat multiple hits on the same topic as separate, groupable hits? If not, what does "group by topic" mean?
--
CrawfordCurrie - 17 Mar 2010
Grouping by date can theoretically be enhanced with grouping by day, week, month, year.
--
ArthurClemens - 17 Mar 2010
all good points - for after we've gotten 1.1 cooked
--
SvenDowideit - 01 Apr 2010
MartinKaufmann raised the need for having different sorting criteria. For example, suppose I want to sort stuff based on typical dotted-version-numbers used for software versions. I would like versions (in this case, perl versions) to sort so that "5.8.8" < "5.10" < "5.10.1"
I would also like this ordering-criterion to be pluggable. Perhaps plugins should be able to register comparison-operators.
--
MichaelTempest - 08 Jul 2010
The more I think about it, the more I think it's essential that we make the sort expression use the query syntax. For example, say I have a topic that has a formfield containing the name of another topic; I want to sort
numerically on the value of a formfield
in the second topic.
sort="number:OtherTopic/OtherField"
the typename: can be naturally extended to support different collation sequences; number, alpha, date, version etc.
--
CrawfordCurrie - 08 Jul 2010
Great discussion. At the risk of sounding a
little crazy/precious/annoying - any chance we avoid "using up" the : character just now?
How about wrapping it in parens like a method call:
sort="number(OtherTopic/OtherField)"
I have an extremely embryonic, distant vision forming in my head about how one might extend the query syntax to work with hierarchical data/result sets a little easier and it involves the :
On the other hand, I won't be offended at all if we go ahead with
numeric:...
- don't want to get in the way of progress - just wondering if the
parens()
idea was at all appealing in this case
(OTOH maybe I missed the boat entirely - are we talking about extending query syntax, or is
numeric:...
only meant to be pseudo-query syntax?)
--
PaulHarvey - 08 Jul 2010
AFAICT,
sort="p,q,r..."
means sort first by
p, then by
q, the by
r, etc. Then, Crawford's suggestion is that each of
p,
q,
r, etc is an expression in query syntax for the value to sort by, optionally preceded by the collation method. If present, the collation method is separated from the sort-by expression with
:
--
MichaelTempest - 08 Jul 2010
Correct. And if there is no collation method, then the sort-by expression stands on it's own, which doesn't happen with parentheses, as they are legal in a query. The intent was:
sort-exprs ::= sort-expr ( ',' sort-exprs )? ;
sort-expr ::= ( collation-method ':' )? query ;
collation-method ::= /^[A-Za-z]+$/ ;
as long as we
parse the sort order rather than trying to use regex analysis on it, we are not at risk of "using up" either ':' or ',' - and I very much appreciate the desire not to. This can be done as a clean derivation of the Query parser - in the same way as the IF parser is derived.
--
CrawfordCurrie - 08 Jul 2010
Sort::Maker is now part of the distributed CPAN libs. See
GeneralSortingMechanism (the attached file shows code examples). That is for constructing sort queries. Up to now we have discussed the sort command queries.
Noone has argued against Crawford's idea to use query language. Do we need to decide on that?
--
ArthurClemens - 17 Feb 2011
Actually, I'd vote for re-using Crawfords suggestion of using some deriative of query syntax for specifying sort parameters.
If there is a caveat, it's this: sort=SomeTextField will potentially be able to use an index on a back-end store that supports it. Whereas something simple like sort=uc(
SomeTextField) would prevent the use of the index (it's possible to define the index as uc(
SomeTextField), but that requires extra user nous).
Note that in general sorting and indexing issues are close bedfellows. i.e. a query for 'LEVENS' = uc(
SomeTextField) is likely to lose the benefit of the index.
On my work with
VDBI, it has become clear that I may need to store fields two or more times: exactly as-is and indexable version(s). I see that this sort of concept has already been picked up in
AllowTypedData.
--
JulianLevens - 27 Feb 2012
Is it not possible to automate generation of new indexes somehow? Perhaps that's asking too much.
See also:
Tasks/Item11022 and
SearchOrderByTopicElement.
Something that concerns me is sorting on a
QuerySearch expression which resolves to something that isn't a scalar, eg.
attachments.name
, I guess we simply need to not support those (pick
attachments[0].name
instead), but we need SEARCH macro to be more vocal/informative when these unsupported things are given to it, rather than just silent failure.
--
PaulHarvey - 27 Feb 2012