Feature Proposal: Refactor ACL Check and Pager on ResultSet As Filter

Motivation

This is the beginning of DelegateACLChecksToStoreOrCache: making large-scale wikis with complex permissions and groups performant.

Description and Documentation

When iterating over a query ResultSet, the calling code should not need to parse and load a meta object just to evaluate permissions. Instead, the iterator should be wrapped in a FilterIterator that skips all hidden results.
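A minimal sketch of the wrapping idea, written in Python purely for illustration (Foswiki itself is Perl). The class name follows the text, but the constructor signature, the `can_view` predicate and the sample topic names are assumptions, not the actual Foswiki API:

```python
from typing import Callable, Iterable


class FilterIterator:
    """Wraps any iterator and yields only the items a predicate accepts."""

    def __init__(self, source: Iterable[str], accept: Callable[[str], bool]):
        self._source = iter(source)
        self._accept = accept

    def __iter__(self) -> "FilterIterator":
        return self

    def __next__(self) -> str:
        # Keep pulling from the wrapped iterator until an item passes.
        # StopIteration from the source propagates and ends the loop.
        while True:
            item = next(self._source)
            if self._accept(item):
                return item


# Hypothetical data: topics hidden from the current user.
HIDDEN = {"Main.SecretPlans"}


def can_view(address: str) -> bool:
    return address not in HIDDEN


raw_results = ["Main.WebHome", "Main.SecretPlans", "Sandbox.TestTopic"]
visible = list(FilterIterator(raw_results, can_view))
```

The caller never sees the hidden address, and never needs to know how the predicate reached its decision.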

For the current implementation, the iterators will still need to parse and load topics, so a ResultSet will gain new accessors in addition to next(): isMeta() and nextMeta() (or something similar).
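One way to read this, sketched in Python under stated assumptions: when the filter has to load a topic to check its ACL, it keeps that object so the formatter does not load it again. The accessor names `is_meta` / `get_meta`, the in-memory `STORE` and the `allow_view` field are all hypothetical stand-ins:

```python
# Hypothetical in-memory store: address -> parsed topic data.
STORE = {
    "Main.WebHome": {"allow_view": None, "text": "Welcome"},
    "Main.Private": {"allow_view": {"AdminUser"}, "text": "Secret"},
}
loads = []  # records every parse-and-load, to show nothing is loaded twice


def load_meta(address):
    loads.append(address)
    return STORE[address]


class AclFilteredResultSet:
    """Yields only viewable addresses and keeps the meta it had to load."""

    def __init__(self, addresses, user):
        self._addresses = iter(addresses)
        self._user = user
        self._meta = None

    def __iter__(self):
        return self

    def __next__(self):
        for address in self._addresses:
            meta = load_meta(address)  # side effect of the in-band ACL check
            allowed = meta["allow_view"]
            if allowed is None or self._user in allowed:
                self._meta = meta
                return address
        self._meta = None
        raise StopIteration

    def is_meta(self):
        return self._meta is not None

    def get_meta(self):
        return self._meta


results = AclFilteredResultSet(["Main.WebHome", "Main.Private"], "JoeUser")
rendered = []
for address in results:
    # Reuse the object the filter already loaded, if there is one.
    meta = results.get_meta() if results.is_meta() else load_meta(address)
    rendered.append(f"{address}: {meta['text']}")
```

An accelerated filter that checks ACLs out of band would simply report no meta, and the formatter would load the topic only if it needs it.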

This filter API will need to be selectable, listener style, so that we can add an ACLCacheContrib that stores the info for faster querying.

API change

No change to users of Foswiki::Func::query, they will still be handed an iterator that will only give them topics that the user has permission to see.

Internally to Store::*Algo::query

in 1.1, query returns a ResultSet iterator of topics that match with an ACL context of Admin, and then Foswiki::Search::FormatResults parses and loads them to test ACLs.

Instead, the unaccelerated core query will return a FilterIterator on the previous ResultSet which has an identical result.

The point of this is that other *Algo implementations can return their own pre-evaluated ACL iterators. The API contract is that query will give you an iterator of tom elements that the user (or group) is allowed to XXX (ok, initially VIEW, but I wonder if filtering a ResultSet for CHANGE or some other permission might also have some utility).
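The contract can be illustrated with two interchangeable implementations, sketched in Python. The ACL table, the function names and the `mode` parameter are assumptions made for this example; the only point is that a per-item filter and a pre-evaluated index must hand back the same addresses:

```python
from collections import defaultdict

# Hypothetical ACL table: (address, mode) -> users allowed to do that.
ACL = {
    ("Main.WebHome", "VIEW"): {"JoeUser", "AdminUser"},
    ("Main.WebHome", "CHANGE"): {"AdminUser"},
    ("Main.Notes", "VIEW"): {"JoeUser"},
    ("Main.Notes", "CHANGE"): {"JoeUser"},
}


def filtered_query(addresses, user, mode="VIEW"):
    """Unaccelerated path: test each candidate one at a time."""
    return (a for a in addresses if user in ACL.get((a, mode), set()))


def build_index(acl):
    """Pre-evaluate the ACLs into (user, mode) -> set of addresses."""
    index = defaultdict(set)
    for (address, mode), users in acl.items():
        for user in users:
            index[(user, mode)].add(address)
    return index


INDEX = build_index(ACL)


def indexed_query(addresses, user, mode="VIEW"):
    """Accelerated path: one index lookup, then a set-membership test."""
    allowed = INDEX.get((user, mode), set())
    return (a for a in addresses if a in allowed)


hits = ["Main.WebHome", "Main.Notes"]
view_slow = list(filtered_query(hits, "JoeUser"))
view_fast = list(indexed_query(hits, "JoeUser"))
change_fast = list(indexed_query(hits, "JoeUser", mode="CHANGE"))
```

Passing the mode as a parameter is what makes filtering for CHANGE (or anything else) no extra work for the caller.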

Performance

One of the ideas behind this is to make SEARCH{format="   * $topic"} a viable replacement for VarTOPICLIST, by making it possible to not load and parse each topic in a result set during formatting unless absolutely necessary. TOPICLIST is effectively the same as SEARCHing in an Admin context and only rendering the name of each topic. (At the moment, even in Admin context, we needlessly load the topic.)
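A rough Python sketch of load-on-demand formatting. The token handling is heavily simplified and `load_topic` is a hypothetical stand-in for the real parse-and-load step; it is meant only to show that a name-only format never has to touch the store:

```python
loaded = []  # records every topic that actually had to be loaded


def load_topic(address):
    """Hypothetical stand-in for the expensive parse-and-load step."""
    loaded.append(address)
    return {"text": f"Body of {address}"}


def format_results(addresses, fmt):
    # Only formats that reference topic content force a load.
    needs_body = "$text" in fmt
    lines = []
    for address in addresses:
        web, topic = address.split(".", 1)
        line = fmt.replace("$web", web).replace("$topic", topic)
        if needs_body:
            line = line.replace("$text", load_topic(address)["text"])
        lines.append(line)
    return lines


hits = ["Main.WebHome", "Main.Notes"]
names_only = format_results(hits, "   * $topic")
loads_after_names = len(loaded)
with_text = format_results(hits, "$topic: $text")
```

With the name-only format the loop is plain string work on the addresses, which is what would make it competitive with TOPICLIST.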

Clearly, the next thing to write is an ACL cache that we can ship with the core :-) but MongoDB will be faster again :-)

Examples

1360

on a web of 25,000 topics should be an extremely tight loop. VarTOPICLIST should be able to be implemented this way too - with the ACL filter off.

Impact

%WHATDOESITAFFECT%

Implementation

-- Contributors: SvenDowideit - 29 Apr 2011

Discussion

I don't quite get what nextMeta vs isMeta means. Shouldn't the resultset iterator just give you the next "thing" - and the resultset should only contain topics you're allowed to access? That's what I'd like to see as a user of Foswiki::Func::query. Somehow I'd assumed that the search algo would be able to take the "user"-supplied query, and AND it with a subquery which filters it according to VIEW permissions for a certain user. Something like:

  • JoeUser is the user we're checking ACLs for
  • JoeUser belongs to AllowGroup via membership of OtherGroup (also for the sake of this example, these are the only two groups he belongs to)
  • The web we're querying has ALLOWWEBVIEW = AllowGroup and DENYWEBVIEW = DenyGroup

<user query form.name='FooForm'...> AND (
  NOT (
    JoeUser IN DENYVIEW
    OR OtherGroup IN DENYVIEW
    OR AllowGroup IN DENYVIEW
  ) AND (
    NOT (ALLOWVIEW) OR (
      JoeUser IN ALLOWVIEW
      OR OtherGroup IN ALLOWVIEW
      OR AllowGroup IN ALLOWVIEW
    )
  )
)
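The same predicate can be written as a small Python function. This mirrors only the pseudo-query above, not Foswiki's full ACL rules; the function and parameter names are made up for the example:

```python
def may_view(identities, allow_view, deny_view):
    """identities: the user plus every group they belong to (directly or not).

    Denied if any identity is on the deny list; otherwise allowed when the
    allow list is empty or names at least one identity.
    """
    denied = bool(identities & deny_view)
    allowed = not allow_view or bool(identities & allow_view)
    return not denied and allowed


# JoeUser belongs to AllowGroup via membership of OtherGroup.
joe = {"JoeUser", "OtherGroup", "AllowGroup"}

web_default = may_view(joe, allow_view={"AllowGroup"}, deny_view={"DenyGroup"})
no_acls = may_view(joe, allow_view=set(), deny_view=set())
not_listed = may_view(joe, allow_view={"SomeoneElse"}, deny_view=set())
group_denied = may_view(joe, allow_view={"AllowGroup"}, deny_view={"OtherGroup"})
```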

Whereas it seems the proposal you've got here is still going to result in N ACL-database-requests to get a total hit count for a web that has N topics in it.

Which is probably still going to be a lot faster than the current situation; but not as fast as wrapping it all up into a single query.

I guess what I'm asking for is clarification on how a user of Foswiki::Func::query is affected by this refactoring effort (because I'd imagined that we'd modify the calling signature to include some sort of aclfilter => { cuid => 'JoeUser', action => 'VIEW'} option).

Foswiki::Func::query could throw an exception if aclfilter isn't supported by the search algo (or doesn't know about the action)
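Sketched in Python, the calling convention might look something like this. The `aclfilter` keys come from the suggestion above; the exception class, the `supported_actions` attribute and the ACL data are hypothetical:

```python
class UnsupportedAclFilter(Exception):
    """Raised when a search algo cannot honour the requested filter."""


# Hypothetical ACL data: address -> {action: set of permitted cUIDs}.
ACLS = {
    "Main.WebHome": {"VIEW": {"JoeUser", "AdminUser"}},
    "Main.Private": {"VIEW": {"AdminUser"}},
}


class BasicSearchAlgo:
    supported_actions = frozenset({"VIEW"})

    def query(self, addresses, aclfilter=None):
        if aclfilter is None:
            # No filter requested: hand back everything that matched.
            return iter(addresses)
        action = aclfilter["action"]
        if action not in self.supported_actions:
            raise UnsupportedAclFilter(f"action {action!r} is not supported")
        cuid = aclfilter["cuid"]
        return (
            a for a in addresses
            if cuid in ACLS.get(a, {}).get(action, set())
        )


algo = BasicSearchAlgo()
hits = ["Main.WebHome", "Main.Private"]
joe_view = list(algo.query(hits, aclfilter={"cuid": "JoeUser", "action": "VIEW"}))

try:
    algo.query(hits, aclfilter={"cuid": "JoeUser", "action": "RENAME"})
    rename_rejected = False
except UnsupportedAclFilter:
    rename_rejected = True
```

Because the check runs before any iterator is returned, the caller finds out about an unsupported action immediately rather than halfway through the results.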

-- PaulHarvey - 29 Apr 2011

next on a result set returns a string that (most times) is an address (web.topic) - so we can't take that. the point of ismeta / getmeta is to use any pre-created topic object that was made as a side effect of not being able to do an ACL check out of band.

ok, I need to rewind a little and elaborate on my thinking.

One great big assumption (for me) is that we don't want to irreversibly tie up the store, search and ACL components. They should be able to be combined into a superscalar monolith, but it should be possible to use (er, for example) Google for the search engine, random GitHub repos for the store, and Facebook's user and friend system for users and grouping to filter for permissions.

Or, more reasonably, utilize a corporation's LDAP for users and some broader permissions to resources, leverage the company-wide federated search, and do so from information that a store serves up from their document repository.

While this may sound huge and unlikely (and it's not, I've done one already, and am planning another), the crux of it is: can we make an API that allows us to be that flexible without hurting the normal case?

I've updated the spec at the top.

basically Paul, your suggestion does not apply to core foswiki as we ship it, this feature request is about making the API able to do it.

-- SvenDowideit - 29 Apr 2011

Okay, so basically we agree that checking ACLs must become part of a search, not something done afterwards on the application layer. Hiding this behind a filter facade seems to be an appropriate way to go. However, from your explanations above I get that currently Foswiki::Func::query does not return an ACL-checked result set, or does it? That needs clarification as we might run into a compatibility problem changing this. So calling this function requires some kind of ignorePermissions option, defaulting to the behaviour we have now. There also might be some code buried in plugins that makes use of Foswiki::Func::query and then does its own ACL check while formatting results. We need to take care of these poor buggers as well.

Checking ACLs can't be done on the backend without a complete user/groups model living in the same place. This is the far bigger issue behind this all as it needs rethinking of a wider range of Foswiki's core modules lingering around in isolation right now.

Wrt external authorities for users and groups: Facebook can't check ACLs for you, that'd be a misunderstanding. They only provide a record of user information besides a single sign-on feature, but don't have any notion of ACLs for private objects of course. To a certain level of abstraction LDAP+Kerberos isn't different. Judging from my years of work on LdapContrib, I can only conclude that we should implement a kind of PluggableUserManagement that caches user information dragged in from different connectors (a set of LDAP directories, OAuth, OpenID, you name it) and populates the tables holding user mappings as users register and log in. Prefetching user records (as is the current default in LdapContrib) should be the absolute exception, used almost exclusively to fetch groups and roles, but not all of the user records.

In 99.9% of all cases where people set up a large-scale, well-performing Foswiki, all three of (1) the content store, (2) the user/groups/roles records and (3) the ACLs on objects will live in a homogeneous infrastructure. Not doing so will inevitably bring down performance. In fact, this would be such a deprecated configuration that it should IMHO not be encouraged or even made possible at all. In that I am all for a "monolithic" architecture, even when it means giving up some of the flexibility we have right now. It is this flexibility which hurts further development in that area.

Note also that even with a "monolithic" architecture there's no reason not to have a pluggable user mapping and stuff. The difference however is that instead of making all of the user mapping replaceable, it will become more of a "connectors" way to interface with external user management systems. Independent from this is the actual login management to redirect users to the authority that is able to authenticate the user. That's more of a way of orchestrating the browser rather than serving as a single login gateway to all of the connectors. That's because delegating authentication is bad by design of course, and not really considered by anybody.

-- MichaelDaum - 29 Apr 2011

Cool. I think it would be useful if we don't hard-code 'VIEW' - we are desperately missing nice ways to view what the ACL situation is; if we can see what topics we can (RENAME, CHANGE, COMMENT, etc.) that would be one tool users could use to build confidence & understanding of ACLs.

Secondly, what Sven seems to be saying is that an ACL query embedded with the user query will still be possible in this architecture if the search algo chooses to do that, so i guess that's cool.

-- PaulHarvey - 29 Apr 2011

This was implemented in mid-2011 and thus will be in 1.2.0. No docco is expected to be written because the only effect users should see is that paging is orders of magnitude faster than in 1.1.

(ok, so for the 3 people I know about, it does change the SearchAlgorithm API significantly. But this is an expert Foswiki internals area where I suspect they will appreciate the ability to optimise ACL evaluation)

-- SvenDowideit - 01 Mar 2012
Topic revision: r11 - 08 Jul 2015, MichaelDaum