Feature Proposal: Refactor ACL Check and Pager on ResultSet As Filter
Motivation
This is the beginning of
DelegateACLChecksToStoreOrCache - making large-scale wikis with complex permissions and groups perform well.
Description and Documentation
When iterating over a Query
ResultSet, rather than the code needing to parse&load a meta object to then evaluate the permissions, the iterator should be wrapped in a
FilterIterator that is able to skip all hidden results.
For the current implementation, the iterators will need to parse&load, and thus a
ResultSet will have new accessors in addition to
next()
:
isMeta()
nextMeta()
(or something)
This filter API will need to be selectable, listener style, so that we can add an
ACLCacheContrib that stores the info for faster querying.
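The wrapping described above can be sketched as follows. This is a minimal illustration in Python rather than Foswiki's Perl, and it is not the actual Foswiki implementation: the class name `FilterIterator` comes from the proposal, but `make_view_filter`, the `acl` dictionary and the "missing entry means unrestricted" rule are assumptions made up for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

# A filter is any callable that says whether a topic address may be shown.
TopicFilter = Callable[[str], bool]


class FilterIterator:
    """Wraps a result iterator and silently skips entries that any filter rejects."""

    def __init__(self, inner: Iterable[str], filters: Iterable[TopicFilter]):
        self._inner: Iterator[str] = iter(inner)
        self._filters = list(filters)  # listener style: zero or more registered filters

    def __iter__(self) -> "FilterIterator":
        return self

    def __next__(self) -> str:
        # Keep pulling from the wrapped iterator until an entry passes every filter.
        for address in self._inner:
            if all(check(address) for check in self._filters):
                return address
        raise StopIteration


def make_view_filter(user: str, acl: Dict[str, Set[str]]) -> TopicFilter:
    """Hypothetical ACL lookup: `acl` maps an address to the users allowed to VIEW it.
    Addresses missing from `acl` are treated as unrestricted (an assumption)."""

    def can_view(address: str) -> bool:
        allowed = acl.get(address)
        return allowed is None or user in allowed

    return can_view
```

A caller iterates the wrapper exactly as it would iterate the raw result set. Swapping in a cache-backed filter would only change what is passed in `filters`.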
API change
No change to users of
Foswiki::Func::query
, they will still be handed an iterator that will only give them topics that the user has permission to see.
Internally to Store::*Algo::query
in 1.1,
query
returns a
ResultSet iterator of topics that match with an ACL context of
Admin
, and then
Foswiki::Search::FormatResults
parses and loads them to test ACLs.
Instead, the unaccelerated core
query
will return a
FilterIterator on the previous
ResultSet which has an identical result.
The point of this is that other
*Algo implementations can return their own pre-evaluated ACL iterators. The API contract is that query will give you an iterator of tom elements that the user (or group) is allowed to XXX (ok, initially VIEW,
but I wonder if filtering a ResultSet for CHANGE or some other permission might also have some utility).
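To make the contract concrete, here is a hedged Python sketch (not Foswiki code) of two interchangeable algorithms that honour it: one filters each hit after matching, the other consults a pre-evaluated ACL index. `CoreAlgo`, `IndexedAlgo`, `allowed` and the shape of the ACL table are all hypothetical names and structures chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Set, Tuple

# Hypothetical ACL table: (address, action) -> users allowed.
# A missing key means "unrestricted" (an assumption for this sketch).
Acl = Dict[Tuple[str, str], Set[str]]


def allowed(acl: Acl, address: str, user: str, action: str) -> bool:
    users = acl.get((address, action))
    return users is None or user in users


class CoreAlgo:
    """Unaccelerated: match first, then check the ACL of every hit."""

    def __init__(self, topics: Iterable[str], acl: Acl):
        self.topics = list(topics)
        self.acl = acl

    def query(self, predicate: Callable[[str], bool], user: str,
              action: str = "VIEW") -> Iterator[str]:
        matches = (t for t in self.topics if predicate(t))
        return (t for t in matches if allowed(self.acl, t, user, action))


class IndexedAlgo:
    """Accelerated: ACLs are evaluated once, up front, per (user, action)."""

    def __init__(self, topics: Iterable[str], acl: Acl, users: Iterable[str],
                 actions: Iterable[str] = ("VIEW", "CHANGE")):
        self.topics = list(topics)
        actions = list(actions)
        self.visible = {
            (u, a): {t for t in self.topics if allowed(acl, t, u, a)}
            for u in users for a in actions
        }

    def query(self, predicate: Callable[[str], bool], user: str,
              action: str = "VIEW") -> Iterator[str]:
        permitted = self.visible.get((user, action), set())
        return (t for t in self.topics if t in permitted and predicate(t))
```

Both classes return the same topics for the same arguments. Only the point at which the permission is evaluated differs, which is the freedom the contract is meant to leave to each algorithm.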
One of the ideas behind this, is to make
SEARCH{format=" * $topic"}
a viable replacement for
VarTOPICLIST, by making it possible to not load and parse each topic in a result set during formatting unless absolutely necessary.
TOPICLIST
is effectively the same as SEARCHing in an Admin context and only rendering the name of each topic. (At the moment, even in Admin context, we needlessly load the topic.)
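The "load only when absolutely necessary" idea can be sketched like this. It is an illustrative Python model, not Foswiki's formatter: `LazyTopic`, `format_results` and the `loader` callback are hypothetical, and only two format tokens (`$topic` and `$text`) are handled.

```python
from typing import Callable, Dict, Iterable, List, Optional


class LazyTopic:
    """Holds a 'Web.Topic' address and loads the topic body only on first access."""

    def __init__(self, address: str, loader: Callable[[str], Dict[str, str]]):
        self.address = address
        self._loader = loader  # hypothetical parse-and-load callback
        self._meta: Optional[Dict[str, str]] = None

    @property
    def meta(self) -> Dict[str, str]:
        if self._meta is None:
            self._meta = self._loader(self.address)  # the expensive step
        return self._meta


def format_results(topics: Iterable[LazyTopic], fmt: str) -> List[str]:
    """Expands $topic from the address alone; touches `meta` only if $text is used."""
    needs_text = "$text" in fmt
    lines = []
    for topic in topics:
        name = topic.address.split(".", 1)[1]
        line = fmt.replace("$topic", name)
        if needs_text:
            line = line.replace("$text", topic.meta["text"])
        lines.append(line)
    return lines
```

With a format of `" * $topic"` the loader is never called, so listing topic names costs no more than walking the result set.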
Clearly, the next thing to write is an ACL cache that we can ship with the core, but MongoDB will be faster again.
Examples
1360
on a web of 25,000 topics should be an extremely tight loop.
VarTOPICLIST should be able to be implemented this way too - with the ACL filter off.
Impact
Implementation
--
Contributors: SvenDowideit - 29 Apr 2011
Discussion
I don't quite get what nextMeta vs isMeta means. Shouldn't the resultset iterator just give you the next "thing" - and the resultset should only contain topics you're allowed to access? That's what I'd like to see as a user of
Foswiki::Func::query
. Somehow I'd assumed that the search algo would be able to take the "user"-supplied query, and AND it with a subquery which filters it according to VIEW permissions for a certain user. Something like:
<user query form.name='FooForm'...> AND (
    NOT (
        JoeUser IN DENYVIEW
        OR OtherGroup IN DENYVIEW
        OR AllowGroup IN DENYVIEW
    ) AND (
        NOT (ALLOWVIEW) OR (
            JoeUser IN ALLOWVIEW
            OR OtherGroup IN ALLOWVIEW
            OR AllowGroup IN ALLOWVIEW
        )
    )
)
Whereas it seems the proposal you've got here is still going to result in N ACL-database-requests to get a total hit count for a web that has N topics in it.
Which is probably still going to be a lot faster than the current situation; but not as fast as wrapping it all up into a single query.
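The combined query above can be modelled as predicate composition. The following Python sketch mirrors only the expression as written in this comment (it does not reproduce Foswiki's full access-control rules). `acl_clause`, `all_of` and the dictionary representation of a topic are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable

Topic = Dict[str, object]
Predicate = Callable[[Topic], bool]


def acl_clause(identities: Iterable[str]) -> Predicate:
    """Builds the VIEW subquery for a user plus the groups the user belongs to."""
    ids = set(identities)

    def clause(topic: Topic) -> bool:
        deny = set(topic.get("DENYVIEW", ()))
        allow = set(topic.get("ALLOWVIEW", ()))
        not_denied = not (ids & deny)                  # NOT (x IN DENYVIEW OR ...)
        permitted = not allow or bool(ids & allow)     # NOT (ALLOWVIEW) OR (x IN ALLOWVIEW ...)
        return not_denied and permitted

    return clause


def all_of(*predicates: Predicate) -> Predicate:
    """ANDs the user's query with the ACL subquery so both run in one pass."""
    return lambda topic: all(p(topic) for p in predicates)
```

A store that understands such a composed predicate can evaluate it in a single query instead of making one ACL lookup per hit.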
I guess what I'm asking for is clarification on how a user of
Foswiki::Func::query
is affected by this refactoring effort (because I'd imagined that we'd modify the calling signature to include some sort of
aclfilter => { cuid => 'JoeUser', action => 'VIEW'}
option).
Foswiki::Func::query
could throw an exception if
aclfilter
isn't supported by the search algo (or doesn't know about the
action
)
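One possible shape for that calling convention, sketched in Python rather than Perl. Nothing here is an existing Foswiki API: `SimpleAlgo`, `UnsupportedAclFilter`, `supported_actions` and the behaviour when `aclfilter` is omitted (no filtering) are all assumptions.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple


class UnsupportedAclFilter(Exception):
    """Raised when the algorithm cannot honour the requested aclfilter."""


class SimpleAlgo:
    # Actions this hypothetical algorithm knows how to filter on.
    supported_actions = frozenset({"VIEW"})

    def __init__(self, topics: Iterable[str], acl: Dict[Tuple[str, str], Set[str]]):
        self.topics = list(topics)
        self.acl = acl  # (address, action) -> allowed cuids; missing key = unrestricted

    def query(self, predicate: Callable[[str], bool],
              aclfilter: Optional[Dict[str, str]] = None) -> List[str]:
        matches = [t for t in self.topics if predicate(t)]
        if aclfilter is None:
            return matches  # assumption: no filter requested means no ACL check
        action = aclfilter["action"]
        if action not in self.supported_actions:
            raise UnsupportedAclFilter("cannot filter on action %r" % action)
        cuid = aclfilter["cuid"]
        return [t for t in matches if self._allowed(t, cuid, action)]

    def _allowed(self, address: str, cuid: str, action: str) -> bool:
        users = self.acl.get((address, action))
        return users is None or cuid in users
```

Raising early lets a caller fall back to its own per-topic check when an algorithm cannot filter on the requested action.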
--
PaulHarvey - 29 Apr 2011
next
on a result set returns a string that (most times) is an address (web.topic), so we can't take that. The point of ismeta / getmeta is to reuse any topic object that was already created as a side effect of not being able to do an ACL check out of band.
ok, I need to rewind a little and elaborate on my thinking.
One great big assumption (for me) is that we don't want to irreversibly tie up the store, search and ACL components. They should be
able to be combined into a superscalar monolith, but it should be possible to use (er, for eg) google for the search engine, random github repos for the store, and facebook's user and friend system for users and grouping to filter for permissions.
Or, more reasonably, to utilize a corporation's LDAP for users and some broader permissions to resources, leverage the company-wide federated search, and do so from information that a store serves up from their document repository.
While this may sound huge and unlikely (and it's not, I've done one already, and am planning another), the crux of it is whether we can make an API that allows us to be that flexible without hurting the normal case.
I've updated the spec at the top.
Basically, Paul, your suggestion does not apply to core Foswiki as we ship it; this feature request is about making the API able to do it.
--
SvenDowideit - 29 Apr 2011
Okay so basically we agree that checking ACLs must become part of a search, not something done afterwards on application layer. Hiding this behind a filter facade seems to be an appropriate way to go. However from your explanations above I get that
currently
Foswiki::Func::query
does
not return an ACL-checked result set, or does it? That needs clarification as we might run into a compatibility problem changing this. So calling this function requires some kind of
ignorePermissions
option defaulting to the
value we use now. There also might be some code buried in plugins that make use of
Foswiki::Func::query
and then do their own ACL check while formatting results. We need to take care of these poor buggers as well.
Checking ACLs can't be done on the backend without a complete user/groups model living in the same place. This is the far bigger issue behind this all as it needs rethinking of a wider range of Foswiki's core modules lingering around in isolation right now.
Wrt external authorities for users and groups: facebook can't check ACLs for you, that'd be a misunderstanding. They only provide a record of user information besides a single sign on feature, but don't have any notion of ACLs for private objects of course. To a certain level of abstraction LDAP+Kerberos isn't different.
Judging from my years of work on
LdapContrib I can only conclude that we should implement a kind of
PluggableUserManagement that caches user information dragged in from different connectors (a set of LDAP directories, OAuth, OpenID, you name it) and populates
the tables holding user mappings as users register and log in. Prefetching user records (as is the current default in LdapContrib) should be the absolute exception, mostly exclusively used to fetch groups and roles, but not all of the user records.
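The populate-on-login idea can be sketched as below. This is a hypothetical Python model, not LdapContrib's behaviour or an existing Foswiki class: `UserCache`, the connector callables and the record format are invented for illustration.

```python
from typing import Callable, Dict, Iterable, Optional

# A connector looks one user up in an external source (LDAP, OAuth, ...) and
# returns a record, or None if that source does not know the user.
Connector = Callable[[str], Optional[Dict[str, str]]]


class UserCache:
    """Caches user records locally, filling the table lazily as users log in."""

    def __init__(self, connectors: Iterable[Connector]):
        self._connectors = list(connectors)
        self._users: Dict[str, Dict[str, str]] = {}

    def login(self, login_name: str) -> Dict[str, str]:
        if login_name not in self._users:
            for connector in self._connectors:
                record = connector(login_name)
                if record is not None:
                    self._users[login_name] = record  # populate on first login
                    break
            else:
                raise KeyError(login_name)  # no connector knows this user
        return self._users[login_name]

    def known_users(self) -> int:
        return len(self._users)
```

Only users who actually log in end up in the local table, so ACL checks can run against local data without prefetching a whole directory.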
In 99.9% of all cases where people set up a large-scale and well-performing Foswiki, all three of (1) the content store, (2) the user/groups/roles records and (3) the ACLs on objects will be set up living in a homogeneous infrastructure. Not doing so will inevitably bring down performance.
In fact, this would be such a deprecated configuration that it should IMHO not be encouraged or even made possible at all. In that sense I am all for a "monolithic" architecture, even if it means giving up some of the flexibility we have right now. It is this
flexibility which hurts further development in that area.
Note also, that even with a "monolithic" architecture there's no reason not to have a pluggable user mapping and stuff. The difference however is that instead of making all of the user mapping replaceable, it will become more of a "connectors" way to
interface with external user management systems. Independent from this is the actual login management to
redirect users to the authority that is able to authenticate the user. That's more of a way of orchestrating the browser rather than serving as a single
login gateway to all of the connectors. That's because delegating authentication is bad by design of course, and not really considered by anybody.
--
MichaelDaum - 29 Apr 2011
Cool. I think it would be useful if we
don't hard-code 'VIEW' - we are desperately missing nice ways to view what the ACL situation is; if we can see what topics we can (RENAME, CHANGE, COMMENT, etc.) that would be one tool users could use to build confidence & understanding of ACLs.
Secondly, what Sven seems to be saying is that an ACL query embedded with the user query will still be possible in this architecture if the search algo chooses to do that, so i guess that's cool.
--
PaulHarvey - 29 Apr 2011
This was implemented in mid-2011 and thus will be in 1.2.0. No docco is expected to be written because the only effect users should see is that paging is orders of magnitude faster than in 1.1.
(ok, so for the 3 people I know about, it does change the
SearchAlgorithm API significantly. But this is an expert foswiki internals area where I suspect they will appreciate the ability to optimise ACL evaluation)
--
SvenDowideit - 01 Mar 2012