File::Temp UNLINK option to have files removed after use. Or, when files need to be preserved for other use, clean up at the end of processing. Extensions should not be leaving behind clutter in the first place.
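As a minimal sketch of the UNLINK behaviour mentioned above (the helper name scratch_demo and the file contents are made up for illustration; only File::Temp itself is real):

```perl
use strict;
use warnings;
use File::Temp ();

# Returns the path of a scratch file that existed only inside the inner block.
sub scratch_demo {
    my $path;
    {
        # UNLINK => 1 asks File::Temp to delete the file when $tmp is destroyed.
        my $tmp = File::Temp->new( SUFFIX => '.tmp', UNLINK => 1 );
        print {$tmp} "intermediate data\n";
        $tmp->flush;
        $path = $tmp->filename;
        die "expected $path to exist" unless -e $path;
    }    # $tmp goes out of scope here, so the file is removed
    return $path;
}

my $gone = scratch_demo();
print -e $gone ? "still there\n" : "cleaned up\n";
```

This only covers files whose lifetime is a single request; files that must survive the request need one of the other approaches discussed here.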
Persistent files that are related to topics can be handled in an afterRename handler, or tracked and removed in an afterSaveHandler, so they are either moved or deleted as needed.
GenPDFAddOn and DirectedGraphPlugin use a lot of temporary files, as well as persistent files in the extension work_area and persistent files cached in pub/.
It was a bit of effort, but hopefully they are not leaving behind cruft in the temp directories. DGP keeps a map of dynamic attachments in the work_area, so it can match up before/after each run to remove stale attachments.
Do we need a new handler, or do we need Tasks opened against extensions that leave stuff behind?
-- GeorgeClark - 23 May 2011
Fair observations, but here are some examples with my rationale:
GoogleWeatherPlugin - caches location-specific data for a period of time long enough to keep Google from blacklisting the requestor. Need to remove data for locations that time out and are no longer used. But they might be used again by some other topic before the required time between requests. So you can't hook rename/save - you need something timer-directed. I have other cases like this where webcam data is cached from servers that blacklist abusers.
Rename will handle rename, move, and move-to-trash (delete). But when the trash (which last I looked still needed a manual empty script) is emptied, what callback is going to remove the stale data? Perhaps we need one...but currently admins just groan and add a cron cleanup script.
GaugePlugin - creates dynamic data in pub that has to remain for as long as a browser might do a refresh - but needs to go away eventually. The gory details are under negotiation (I'm trying to get the maintainer on TWiki to improve things), but it also lends itself to a timer-based approach.
VarCachePlugin - handles expiration of cached data only when a topic is viewed. It has no way to empty the cache at the expiration time for a topic that's not been viewed for, say a month. You pretty much need a timer-based mechanism if you want to free the disk space.
I agree that plugins/extensions shouldn't leave stuff behind unnecessarily - and it's fair to open tasks against those that do. The ones you mentioned were deficient; I'm glad they were fixed. But I think some do require a timer mechanism because there is no (guaranteed) webserver event to trigger GC. (Note that tick_foswiki handles the login session cache and stale edit locks. Any plugin that does something that looks like this will also want to be timer driven. Maybe someday we'll have a semi-persistent editor undo buffer - that might want to be removed (by some admins) if there is no login for a few months.)
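To make the timer-driven case concrete, here is a rough sketch of the kind of age-based sweep such a plugin might run from a periodic job. The function name sweep_stale, the directory layout and the one-hour limit are assumptions for illustration, not part of any existing plugin:

```perl
use strict;
use warnings;
use File::Spec;

# Remove plain files in $dir that have not been modified within the last
# $max_age seconds. Returns the number of files removed.
# $now is optional and defaults to the current time (handy for testing).
sub sweep_stale {
    my ( $dir, $max_age, $now ) = @_;
    $now //= time;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    my $removed = 0;
    for my $entry ( readdir $dh ) {
        my $path = File::Spec->catfile( $dir, $entry );
        next unless -f $path;                 # skip '.', '..' and subdirectories
        my $mtime = ( stat $path )[9];
        next if $now - $mtime <= $max_age;    # still fresh, keep it
        unlink $path and $removed++;
    }
    closedir $dh;
    return $removed;
}
```

A cache like the weather or webcam ones described above would call something like sweep_stale( $cacheDir, 3600 ) from the periodic job rather than from a view or save.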
This proposal provides a standard way - driven by an existing mechanism - to handle the cleanup cases that want/need time-driven garbage collection. It keeps the knowledge about what the script does in the plugin - where it belongs.
So I think this proposal serves a useful purpose. I agree that when possible, it's better to have plugins clean up after each request. When it's not - or when it's very expensive or inconvenient, GC is the alternative.
This proposal is a pretty simple, clean way to provide GC to plugins that need it. No plugin is required to use it.
I know on my (small) site, it will get rid of easily a half-dozen heuristic GC scripts.
On the other hand, it may encourage lazy plugin authors to clean up this way rather than aggressively cleaning up for each request. That's not optimal, but it beats the current state where lazy plugin authors don't clean up at all.
-- TimotheLitt - 23 May 2011
Providing a new handler, which any extension can define and which is called by the tick script, is a great idea.
It is so silly to waste runtime as part of views, edits and saves to clean garbage. It is much better to let a cron job do this in the background (scheduler task in Windows).
Having ONE standard tick script that can run a few times per day is not a big deal to set up.
And then the plugin author can write a simple routine to remove garbage.
Same handler can also be used by an extension to send emails, and any other thing that needs to run regularly. It does not have to be limited to garbage handling.
With a little care this can become really useful.
-- KennethLavrsen - 23 May 2011
Kenneth -- Exactly, although I do think that simple cleanup shouldn't be deferred. And tick_foswiki is already part of the distribution and should already be running in every installation. And it already creates a session. That's why I picked it.
If we put other functions into this handler, we might want to have the cron job run at a relatively high frequency - maybe every 15 or 30 minutes, and pass the handler the time since the last run as well as the current time. Then the handler can decide if enough time has passed for it to run. (e.g. something that reads a lot of topics might want to run once a week, while simple garbage collection might run daily and some e-mails might be every 15 minutes....)
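One way a handler could make that decision, sketched under the assumption that it is handed the current time and the seconds elapsed since the previous tick (is_due and its arguments are hypothetical names, not an existing API):

```perl
use strict;
use warnings;

# True when an $interval boundary (in seconds since the epoch) was crossed
# between the previous tick and this one. The handler keeps no state of its own.
sub is_due {
    my ( $now, $elapsed, $interval ) = @_;
    my $previous = $now - $elapsed;
    return int( $now / $interval ) > int( $previous / $interval );
}

# Example: with 15 minute ticks, a daily job runs on the tick that crosses midnight UTC.
my $tick = 15 * 60;
my $day  = 24 * 60 * 60;
print is_due( time, $tick, $day ) ? "run daily job\n" : "skip daily job\n";
```

This only approximates "once per interval": the job fires on the first tick after each boundary, so a missed tick delays it rather than skipping it.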
Thanks for all the thoughts - They'll refine the prototype.
-- TimotheLitt - 24 May 2011
Yes, I like this idea. MartinCleaver proposed something similar some years ago, but no-one ever implemented it. tick_foswiki is exactly the right place for this to be called.
It would be best if you could support registration of a listener. For example,
StaticMethod registerPulseHandler(\&handler, $schedule)
where handler is the handler function and $schedule is the schedule on which the handler function should be called. That would allow a plugin to register different handlers on different (and even user-defined) schedules. There are CPAN modules for handling cron scripts that support the specification of schedules (the default would be to simply call the handler on every pulse of tick_foswiki).
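A rough sketch of what such a registry could look like. The Sketch::Pulse package and the pulse function are invented for illustration, and deciding whether a cron-style schedule is due is delegated to a caller-supplied predicate rather than to any particular CPAN module:

```perl
use strict;
use warnings;

package Sketch::Pulse;    # hypothetical namespace, not part of Foswiki

my @handlers;

# Remember a handler and its optional schedule. No schedule means "every pulse".
sub registerPulseHandler {
    my ( $handler, $schedule ) = @_;
    die "handler must be a code reference" unless ref $handler eq 'CODE';
    push @handlers, { code => $handler, schedule => $schedule };
    return scalar @handlers;
}

# Called once per tick. Handlers without a schedule always run; for the
# others, $is_due->( $schedule, $now ) decides.
sub pulse {
    my ( $now, $is_due ) = @_;
    my $ran = 0;
    for my $h (@handlers) {
        next if defined $h->{schedule} && !$is_due->( $h->{schedule}, $now );
        $h->{code}->($now);
        $ran++;
    }
    return $ran;
}

package main;

my @log;
Sketch::Pulse::registerPulseHandler( sub { push @log, 'every tick' } );
Sketch::Pulse::registerPulseHandler( sub { push @log, 'daily' }, '1 0 * * *' );

# Stand-in predicate: pretend no scheduled handler is due on this pulse.
Sketch::Pulse::pulse( time, sub { 0 } );
print "$_\n" for @log;
```

Keeping the schedule test behind a predicate means the registry itself does not care which scheduling library, if any, ends up being used.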
I believe there are other feature requests/tasks covering this same topic; it would be worth having a search.
(The problem with letting the handler decide whether it's time to be called is that it would require it to remember when it was last called, i.e. yet another timestamp file. A cron schedule could be easily and consistently specified in configure, and would support non-linear schedules.)
If we increase the calling frequency of tick_foswiki we might need to consider a daemon version that keeps the perl interpreter (and foswiki) in memory. Cross that bridge when we come to it.
-- CrawfordCurrie - 24 May 2011
You mean CPAN:Schedule::Cron? Doesn't seem to be actively maintained, but I see the possibilities. Do you have experience with this (or some other)?
I had already thought of the daemon version, though I was aiming to keep things simple.
It seems to me that the fancy scheduling can be done as a second phase, since the default would just be to call the named handler (pluginCleanup) that I started with on every tick. I think that would handle the common cases with something that's easy to backport and easy to use.
A plugin wanting fancy scheduling wouldn't have the pluginCleanup function; instead it would register a schedule in initPlugin. Not wanting to waste cycles doing that normally, I'd suggest a context variable (perhaps 'cleanup_active') that tells plugins that they can register.
I'm a bit cautious about adding schedules to configure - perhaps they're OK as expert over-rides. We already have an overwhelming amount of configurability. It's important that plugins have a sensible set of defaults so that they normally just plug and play.
A related consideration is synchronization. Schedule::Cron can fork - which is good for performance, but ripe for interaction bugs. But even if run non-forking, we still have periodic events in tick_foswiki running against webserver events. At the risk of adding complexity, perhaps we also need an api for locking a plugin's persistent data. Something like a shared(read,view)/exclusive(create,delete,write) lock on the plugin.pm file taken explicitly by the plugin during normal operations, and implicitly locked exclusive by tick_foswiki around the callbacks? Wrapped in a "lockPersistentData( 'read' | 'write')" syntax...
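A minimal sketch of what a lockPersistentData wrapper might look like using advisory flock locks. The signature (an explicit lock file path plus 'read' or 'write') and the lock file name are assumptions; the proposal above only names the call:

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use File::Temp ();

# Take a shared ('read') or exclusive ('write') advisory lock on $lockfile.
# Returns the file handle; closing it releases the lock.
sub lockPersistentData {
    my ( $lockfile, $mode ) = @_;
    sysopen( my $fh, $lockfile, O_RDWR | O_CREAT )
      or die "Cannot open $lockfile: $!";
    flock( $fh, $mode eq 'write' ? LOCK_EX : LOCK_SH )
      or die "Cannot lock $lockfile: $!";
    return $fh;
}

# Usage: the periodic job takes the exclusive lock around its cleanup.
my $dir  = File::Temp::tempdir( CLEANUP => 1 );    # stands in for a work_area
my $lock = lockPersistentData( "$dir/MyPlugin.lock", 'write' );
# ... expire or rewrite cached data here ...
close $lock;
```

Because flock is advisory, this only helps if the webserver-side code in the same plugin takes the matching shared lock before reading its persistent data.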
I've stumbled across a number of cases of plugins that don't understand concurrency issues - while they are broken today, periodic events will make things worse. At least this is an opportunity to raise consciousness by providing an API. Can someone take that to a separate feature proposal?
I'll try to run some experiments with Schedule::Cron in the next few days and see if it feels viable.
-- TimotheLitt - 24 May 2011
Registering a pulse handler when not running in the pulse service is a NOP; no need to explain the context variable to plugin authors, all they need to know is that registerPulseHandler only does something when called by a pulse service.
BTW I want to get away from "named handlers" in the plugin sense and move towards a listener/event architecture for plugins. So again, I ask you not to add a "pluginCleanup" handler, but instead support registering arbitrary functions as pulse handlers.
Schedules only need to be added to configure if a plugin needs admin-configurable schedules. I'm sure much of the time the plugin author will just want to say "on every tick" - or perhaps, "no more often than once a week".
-- CrawfordCurrie - 24 May 2011
In principle, a plugin may require more than one type of processing, and the schedules for each may be different. It is much easier to do this, and also much easier to specify the schedule for each handler, if we support registering arbitrary functions as pulse handlers.
The plugin's schedules might be configurable via configure, which means that the admin could set them to be the same. This means we should not rely on the schedules being unique. So I suggest giving names to handlers, so that the pulse-scheduler has a unique identifier for each one. The scheduler should combine the caller's package with the given identifier so that different plugins may use the same identifiers without clashing.
Something like this:
# Shuffle the deck on every tick
Foswiki::Func::registerPulseHandler( 'shuffle tags', \&pulseExampleShuffler );
# Toss out old stuff based on the admin's schedule. The default is "do it daily"
Foswiki::Func::registerPulseHandler( 'clean up', \&pulseExampleGC, $Foswiki::cfg{Plugins}{MyPlugin}{GarbageCollectionSchedule} || '1 0 * * *' );
-- MichaelTempest - 25 May 2011
I have built a working prototype of a timed task daemon - working for TWiki, that is :-(. But that should be good enough for some feedback and to test the theories. I didn't take all the advice, but you should recognize what's here. (He who does gets extra votes...) I think it's a reasonable start. One of the constraints on this prototype was that I did not want to modify any core files, which some of your suggestions would require.
In the attached tar file, you'll find 3 files. Here's how to get started.
First, find your friendly TWiki test system. (I don't have a Foswiki running yet, and in any case I want them to accept it too.)
cpan install R/RO/ROLAND/Schedule-Cron-1.01_1.tar.gz (note this is the latest "Developer" release; the standard release wouldn't install and has bugs).
cd to your twiki root, and unpack the tar file.
mv your tick_twiki.pl file to something like standard_tick_twiki.pl.
Move (or link) etc/sysconfig/TWiki to the real etc/sysconfig. Make sure it's owned by your webserver. Edit it to match your configuration.
Create softlinks to tools/experimental_tick_twiki.pl from /etc/init.d/TWiki and tools/tick_twiki.pl
Run chkconfig --add TWiki (I haven't tested this yet, but it should work.)
You should be in business. You may want to adjust the frequency of your tick_twiki runs - all they do is restart the daemon if it's crashed. So every 30 mins is probably reasonable. But then, so is not running it at all.
You can run /etc/init.d/TWiki status to verify. (start if chkconfig doesn't start it for you)
You can enable PeriodicTestPlugin - it does nothing useful, but does test the APIs. Feel free to try your own clients.
There are two mechanisms provided:
pluginCleanup mechanism (Crawford may persuade me to remove it later, but I like the simplicity for the writer, and it's consistent with the way plugins work now.)
$TWiki::cfg{CleanupSchedule}, which I am currently overriding to 0-59/2 * * * * 30 at the top of the file. A real default will be negotiated later - for now, if you comment this out, you'll get the traditional daily at midnight default.
API Highlights:
$TWiki::cfg{CleanupSchedule} or a default. I suggest using a $TWiki::cfg{Plugins}{YourName}{FooSchedule}.
-d for more, and you really DON'T want -v (I warned you). Look in the debug.log and warn*.log files.
If run under the perl debugger, you can set breakpoints in the daemon; you'll get an X-window when you hit them.
You can get a full listing of the execution queue from the command line using status dump. This signals the daemon, and writes to the debug log.
--help on the command line will give you a mini man page for the script.
This should be enough for reasonable experimentation. I know it has rough edges, and it probably has bugs. (What do you expect for a couple of hours of prototyping?)
I will qualify the task names with the caller's package in the next iteration; for now, do something like "$pluginName_".name.
I suppose configure should learn about entering and validating crontab time strings. I think that's quite different between the two forks, so I'm in no rush.
I'm not sure about making core changes to provide stub routines (probably in plugins.pm). For now, we can live with the context variable.
Do not start porting to Foswiki yet. It's not stable yet, and the TWiki folks need to have their say.
However, I do think it's at about 80% (maybe better) complete. I encourage you to play with it and also to separate your thoughts on functional deficiencies from those on style. (Not that style isn't important, but it's not first on my list for this.)
Enjoy,
-- TimotheLitt - 25 May 2011
Sounds good, but I hope you will take on board what Michael and I have said about registering a handler (which is consistent with the existing registerTagHandler and registerRESTHandler) rather than having the hard-coded, only-one-function pluginCleanup approach, which is very limited. BTW pluginCleanup is not consistent with the rest of the plugin architecture, which uses handler functions to implement listeners installed at different positions in the rendering cycle (which has always been a PITA as each plugin can only register a single listener at each position). The pluginCleanup function is not a handler in this sense, so is out of band (and potentially confusing) for most plugin authors.
WRT functionality, I think you need to support the concept of different functions being applied on different schedules without requiring the plugin author to disentangle the schedule. A classic requirement for this is found in the mailer; we want to be able to mail out change details on a different schedule to mailing out digests, which have a different schedule to newsletters. At the moment we have to do this with separate cron jobs, which is error prone due to synchronisation issues.
-- CrawfordCurrie - 25 May 2011
The prototype already supports both models.
Periodic Task(I)Wed May 25 08:38:05 2011: Schedule::Cron - Starting job 0 with ('initWiki','none',{'p' => '/var/www/servers/twiki/working/tick_daemon.pid','d' => 1},bless( {...}
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 0-59/2 * * * * 30 TWiki::Plugins::PeriodicTestPlugin::cronTask1( 1,4,19 )
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 15 8-17/2 * * 1-5 TWiki::Plugins::PeriodicTestPlugin::Mail( runmail,Mailer.Log )
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 18 20 * Jul-Sep Sun,Sat TWiki::Plugins::PeriodicTestPlugin::News( runnews,News.Log )
Periodic Task(I)Wed May 25 08:38:06 2011: initWiki
Periodic Task(I)Wed May 25 08:38:06 2011: AddTask: 0-59/2 * * * * 30 TWiki::Periodic::TickTock( HASH(0x87ed2dc) )
Periodic Task[24320]: Event queue listing
Periodic Task[24320]: 0-59/2 * * * * 30 Next: Wed May 25 08:38:30 2011 - TWiki::Plugins::PeriodicTestPlugin::cronTask1 (session, 1, 4, 19)
Periodic Task[24320]: 15 8-17/2 * * 1-5 Next: Wed May 25 10:15:00 2011 - TWiki::Plugins::PeriodicTestPlugin::Mail (session, runmail, Mailer.Log)
Periodic Task[24320]: 18 20 * Jul-Sep Sun,Sat Next: Sat Jul 2 20:18:00 2011 - TWiki::Plugins::PeriodicTestPlugin::News (session, runnews, News.Log)
Periodic Task[24320]: 0-59/2 * * * * 30 Next: Wed May 25 08:38:30 2011 - TWiki::Periodic::TickTock (session, HASH(0x87ed2dc))
Periodic Task[24320]: End of event queue
Periodic Task(I)Wed May 25 08:38:06 2011: initWiki finished successfully
Periodic Task(I)Wed May 25 08:38:06 2011: Schedule::Cron - Finished job 0
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Starting job 0 with ('TWiki::Plugins::PeriodicTestPlugin::cronTask1',bless( {...}
Periodic Task(I)Wed May 25 08:38:30 2011: TWiki::Plugins::PeriodicTestPlugin::cronTask1 finished successfully
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Finished job 0
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Starting job 3 with ('TWiki::Periodic::TickTock',bless( {...}
Periodic Task(I)Wed May 25 08:38:30 2011: Expire sessions
Periodic Task(I)Wed May 25 08:38:30 2011: Expire leases
Periodic Task(I)Wed May 25 08:38:30 2011: Cleanup plugins
Periodic Task(I)Wed May 25 08:38:30 2011: ReplaceSchedule: New schedule for TWiki::Plugins::PeriodicTestPlugin::cronTask1: 0-59 * * * * 10
Periodic Task(E)Wed May 25 08:38:30 2011: Schedule::Cron - Error within job 3: delete at /var/www/servers/twiki/lib/TWiki/Plugins/PeriodicTestPlugin.pm line 152.
Periodic Task(W)Wed May 25 08:38:30 2011: TWiki::Periodic::TickTock exited with status 1
Periodic Task(I)Wed May 25 08:38:30 2011: Schedule::Cron - Finished job 3
-- TimotheLitt - 25 May 2011
Cool! I reserve the right to dislike pluginCleanup (there are far too many plugin handlers already) but can live with it. Not sure why you had problems with the username; that should be specified in Config.spec for the extension, I guess. I do like the fact that you are trying to keep this out of the core at this stage, so it can be used with older releases, but I suspect we should consider integrating your work directly in the core (or as a default plugin) so it's available to everyone without the need to install an additional extension. Beyond that I can't really comment until we've seen the code.
-- CrawfordCurrie - 25 May 2011
Thanks. We all have our dislikes; I won't make a final call on pluginCleanup until I have some more experience.
I was just distracted with the username, too much going on. It's now specified in sysconfig/TWiki (I need it before creating the session). It's one username for all tasks.
Yes, my hope is that you (and TWiki) will integrate this - it's very isolated (2 files), and it is a plug-in (no pun intended) replacement for the tick script. It won't be very useful unless plugins can count on it being there - and at that point, I hope the plugins will start taking responsibility for their garbage collection. Note that the plugin in the prototype is just a demo/test scaffolding. It would not be released, but EmptyPlugin would get a subset as sample code.
Code: The latest code snapshot is attached to this topic. I'm still fussing with logging and error handling, and it will need more documentation.
I may add a variant of AddTask (probably spelled AddAsyncTask) that runs the task in its own fork. This would support resource-intensive tasks - but they'd have more potential synchronization issues. Standard AddTasks will continue to run single-threaded in the daemon - but still need to worry about synchronization issues wrt. the webserver.
That said, it's mostly there. (And quite a bit more complicated and function-rich than my original baseline.)
You're welcome to peruse, play with the prototype, & review the code. Just understand that it's still evolving. (Though that's a good time to make helpful comments. I do listen...)
I appreciate all the constructive comments and thoughts.
-- TimotheLitt - 25 May 2011
Posted V2.0-004, which has basic configure support, including pretty thorough validation of timespecs. The GUI isn't pretty, but it does seem to work. It would be nicer to at least have six text boxes - 1 for each subfield, but I didn't see how to do that. Crontab is inherently ugly.
This adds three new files to lib/TWiki/Configure/. You also need to apply a small patch to TWiki.spec; the patch file is included. The setting is in "Miscellaneous Settings".
Custom schedules for other tasks would add a couple of lines in TWiki.spec & clone the CleanupSchedule.pm file - just change the package name and the cfg key.
I've heard that Foswiki has a "new" configure architecture, but as I haven't looked at it, don't know how much work it will be to port. However, the timespec validator (about 200 lines of script!) is a separate routine - hopefully that's the hard part.
-- TimotheLitt - 26 May 2011
Please don't extend the API beyond Foswiki::Func (or TWiki::Func). This really does make it hard for new developers to learn. In my opinion, the API is already too broad and poorly defined, so please do not make it worse by adding another package to the API that plugins may use. I expect the "engine" code would not live in TWiki::Func; but please do add a wrapper in TWiki::Func. It makes the learning curve less steep.
-- MichaelTempest - 26 May 2011
Seems reasonable. I don't want to touch (Foswiki|TWiki)::Func.pm, but the next drop will export the API functions into the TWiki::Func namespace as
TWiki::Func::AddTask, TWiki::Func::DeleteTask, TWiki::Func::NextRuntime, and TWiki::Func::ReplaceSchedule
They only exist when running under the task daemon, and will be documented there.
TWiki seems to want this for their next release, modulo the checkin mechanics. I hope to be "done" soon...
-- TimotheLitt - 26 May 2011
Progress. I have a much closer to industrial strength prototype running. I created a documentation page - it doesn't really belong in this topic, so I put it at PeriodicTasks. I hope no-one is offended - I couldn't find a better place. It has admin and developer documentation, including installation instructions (sigh, yes for TWiki) & screenshots. It's not intended as a discussion topic, but as a start on release documentation. Comments are welcome here.
It's worth a look.
Asynchronous task support is working, logging is working and there's a configure GUI that is better than crontab - at least, I think so. I had to break my rules and make a very small patch to configure to make it work, however.
There is no longer an /etc/sysconfig file - I came up with a scheme that eliminates that.
There is a mechanism for on-the-fly reconfiguration.
I had some problems with Schedule::Cron. I've included a patch in the latest .tar file, but it's not documented in my install instructions. The owner promised to review and release a new kit late this week. It's not a small patch - I fixed a year's worth of his RT backlog as well as my 4 bugs.
I hope less effort will be required with the Foswiki new Configure.
Latest code snapshot is here (same place).
Enjoy,
-- TimotheLitt - 31 May 2011
Tried to stay out of this discussion up to now but got curious as soon as PeriodicTasks materialized. I like the direction this goes but would really like to see it mature into a more general async task manager usable not only for periodic tasks but also for those tasks that better stay out of the response code flow and run only once and immediately. Examples for these kind of tasks are
Foswiki::Func::registerTask instead of addTask :-). I'm less sure about the way it's implied to hand around state between different plugins or tasks/runs ($session). I also need to understand why su/gid bit is necessary, as it's not permitted on our webserver environment, and also why a symlink to bin is necessary (really this should be a script that's no harder to run than foswiki_tick.pl or a rest handler), but I guess that's minor implementation detail.
What counts is the API.
I am finishing up a project in a couple of months in which I will need this functionality. Two modes: run every N minutes (I already do this with a traditional cron job, to re-generate report topics & attachments). And secondly what I'm really wanting is a "single-shot" task launched async'ly (also blocking, avoiding pile-ups). I haven't thought through how to allow the plugin to manage a schedule that gets longer and longer, further and further behind (drop tasks? implement a producer/consumer message queue thingy?)...
Cool work
-- PaulHarvey - 31 May 2011
Let's see:
Michael -
Once and immediate tasks - that really wasn't my objective. I'd really need to think harder about the requirements. You may just want your own daemon; I could package up some of the infrastructure (like registering for config change notification); there are already tools like Proc::Daemon.
Paul,
register vs. add - you know, I thought about that. But add is so much less typing :-). Maybe - before there are lots of consumers. Are we that pedantic?
The setuid/setgid story:
root, you don't need it. Things will work, life will be good.
nobody, apache, webserver or who-knows what.
tick_twiki.pl, which this can be a drop-in replacement for) to finding setlib.cfg was a wrapper script that read cd /path-to-bin && ./tick_twiki.pl. That's a config file by another name - and it meant a separate script (small) for each wiki. And entering an editor is more work than creating a symlink. My scheme got rid of that. (Though it doesn't care if you run in that directory, it will still follow $0).
registerTagHandler, registerRESTHandler - and to branch off to add for the sake of 4 characters of typing is rather churlish. Note also that the coding standard for both Foswiki and TWiki requires lower-case first character function names.
A couple of notes; I see from reading the code example in PeriodicTasks that the plugin author is expected to know about the context variable Periodic_Tasks. Why? Why can't this check be done in Foswiki::Func::registerTask? One less thing for the plugin author to have to worry about.
Also, your example shows a task being added from a plugin, but doesn't say how you might add a task from a Contrib (which doesn't have an init function). This is a problem that affects other plugins that can themselves be extended, and is done by supporting registration through configure. For example, the JQueryPlugin lets you define $Foswiki::cfg{Plugins}{JQueryPlugin}{Plugins} to be a set of modules that the JQueryPlugin is to load when it is started up. That lets you register a new jquery-plugin without having to implement a Foswiki-plugin. This is needed for - for example - MailerContrib, which doesn't have a plugin. The analogous task registration might be something like this:
$Foswiki::cfg{PeriodicTasks}{Mailer}{Function} = 'Foswiki::Contrib::MailerContrib::notify';
$Foswiki::cfg{PeriodicTasks}{Mailer}{Schedule} = '1 * * * 3';
$Foswiki::cfg{PeriodicTasks}{Mailer}{Arguments} = [ 1, 2, 3 ];
$Foswiki::cfg{PeriodicTasks}{CacheCleanup}{Function} = 'Foswiki::Plugins::CacheCleanupPlugin::pluginCleanup';
$Foswiki::cfg{PeriodicTasks}{CacheCleanup}{Schedule} = '1 * * * *';
If you adopt this approach you don't actually need any changes to the Foswiki::Func API - the whole thing can be done via configure - though I confess I rather like your fine-grained task management through the Func API.
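For what it's worth, here is a sketch of how a task runner might turn such configuration entries into calls. The Sketch::Contrib::Mailer package, the local %cfg hash and resolve_function are stand-ins invented for this example; a real runner would read $Foswiki::cfg and would need to require the module first:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Contrib entry point.
package Sketch::Contrib::Mailer;
sub notify { return 'notified: ' . join( ',', @_ ) }

package main;

# Local stand-in for the configuration hash.
my %cfg = (
    PeriodicTasks => {
        Mailer => {
            Function  => 'Sketch::Contrib::Mailer::notify',
            Schedule  => '1 * * * 3',
            Arguments => [ 1, 2, 3 ],
        },
    },
);

# Turn a fully qualified function name into a code reference, or die.
sub resolve_function {
    my ($name) = @_;
    my ( $package, $function ) = $name =~ /^(.+)::(\w+)$/
      or die "Not a qualified function name: $name";
    my $code = $package->can($function)
      or die "No such function: $name";
    return $code;
}

# Call every configured task once; scheduling is deliberately left out.
for my $task ( sort keys %{ $cfg{PeriodicTasks} } ) {
    my $spec = $cfg{PeriodicTasks}{$task};
    my $code = resolve_function( $spec->{Function} );
    print "$task: ", $code->( @{ $spec->{Arguments} || [] } ), "\n";
}
```

Resolving the name up front means a typo in configure is reported when the task list is loaded rather than when the schedule first fires.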
-- CrawfordCurrie - 03 Jun 2011
Thank you for your detailed answers. I understand about the symlinks - it just doesn't "feel" the same as the other scripts (for example, running rest script). But maybe I'm too close to it these days (I am running trunk in production). I understand the apprehension admins (other than myself?) must feel about having to enumerate specific LIB paths just to fire something that should "just work".
I'm not saying it's a bad idea, it's just that so many arbitrary inconsistencies have been painfully removed, it would be a shame to add a new one. Which might mean that we find a solution that covers all the other scripts as well?
And I understand we're dragging out the scope of your original goals. That just means you're doing something right.
-- PaulHarvey - 03 Jun 2011
Crawford,
I don't remember seeing a coding standard, but as I'm making other changes, I'll adapt. Pointer?
The reason that the context variable is required is that the API doesn't exist when running under the webserver (e.g. normally). The whole thing lives in what used to be tick_*wiki.pl, which materializes all this stuff before calling *wiki->new(). It's unconventional - but the idea was to avoid touching func.pm - and also, to not load the code for the services into the webserver environment. (Keep in mind that tasks generate wiki requests; unlike everything else that's oriented toward responding to them...)
So you can't call Add/Register/anything unless you know it's there...you'll die calling an undefined function. So there's no way to stub it out. Of course, if I patched *Func.pm, I could put stub routines there - but that's just baggage for the webserver. And what would the caller do? There's nothing you can do with them, because the data isn't there. The caller doesn't want to check each call - either it's running scheduled, or it's running under the webserver. It's not a fine-grained choice. If it was just one "Add a task" call, it would be a wash. But as you'll see, probably it's a larger block of code.
And this really is a new context, so it felt reasonable. Given that EmptyPlugin will provide a template, the test will just be part of the formula for how you write a plugin.
I hadn't gotten to the Contrib problem - thanks for explaining the situation and for the concept. As a first pass, I now provide a Contrib loader, which runs after the normal wiki->new initialization, but before any task is scheduled. Your contrib would have a small interface to the task scheduler. It can simply wrap your existing code, or take advantage of the other facilities.
You define what to load with these items - you can point to any module in @INC, but usually I'd expect it to be as shown:
{Periodic}{Contrib}{*}{Module} = 'Xwiki::Contrib::*::Tasks';
{Periodic}{Contrib}{*}{Version} = "3.0"; # Optional, minimum acceptable version
and it will require/import and optionally VERSION-check all listed {Module}s. I will eventually call each's initContrib with something like the same signature as initPlugin - I should be able to dig that out of the session - except maybe $installweb?
This gives the contrib a chance to initialize & decide what other part(s) of itself (if any) to load. And it can then decide what task(s) it wants to schedule. For example, MailerContrib may have multiple schedules - maybe different for news vs mail, maybe per-web. It would obtain those from the normal contrib's namespace. I don't see any point in replacing command line arguments in the framework.
I also provide a contribCleanup convenience call-if-there, so it would be exactly analogous to a plugin - except only loaded in the scheduled environment.
I put an example below - which loads.
Does this seem reasonable to you?
As for nested plugins - I'm inclined to say that if JQuery loads an extension, JQuery gets to pass on the call to its initPlugin. (However that's spelled.) And by extension check for, and call, pluginCleanup from its own.
Paul,
Based on your previous comment, I will take a setlib.cfg from cwd, if there is one. If not, I'll follow the links. So for most of us, nothing changes. If you (cd bin && ../tools/tick_*wiki.pl), you'll get the setlib.cfg from there. If you (cd tools && ./tick_*wiki.pl) (where there's no setlib.cfg), I'll try to follow the links, thus supporting system startup of the (unusual) multi-version-wiki environment.
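The lookup order described above could be sketched roughly as follows. This is only an illustration: the `find_setlib` name is made up, and the assumption that the fallback looks in a bin/ directory beside the resolved script's directory is a guess, not something stated in this thread.

```perl
use strict;
use warnings;
use Cwd qw(abs_path getcwd);
use File::Basename qw(dirname);
use File::Spec;

# Return the path of the setlib.cfg to use, or undef if none is found.
# $cwd is the current directory, $script the path the script was invoked as.
sub find_setlib {
    my ( $cwd, $script ) = @_;

    # 1. A setlib.cfg in the current directory wins.
    my $local = File::Spec->catfile( $cwd, 'setlib.cfg' );
    return $local if -f $local;

    # 2. Otherwise resolve symlinks to find the real script location and
    #    (assumption) look in the bin/ directory beside the script's directory.
    my $real = abs_path($script);
    return undef unless defined $real;
    my $candidate =
      File::Spec->catfile( dirname( dirname($real) ), 'bin', 'setlib.cfg' );
    return -f $candidate ? $candidate : undef;
}

my $found = find_setlib( getcwd(), $0 );
print defined $found ? "using $found\n" : "no setlib.cfg found\n";
```

Because the symlink is resolved before the search, a startup link in a system directory would still lead back to the wiki installation that owns the script.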
I've been beaten-up in the past for not thinking of the multiple-version-wiki environment. So far, the links are the best way I know to make that work for system startup... And other scripts can certainly do the same thing, including shell scripts - readlink -en is your friend in the shell. But if you have other ideas, I'm open.
And now for the other news.
I've been pondering all the feedback, and come to some conclusions.
First, I'm not developing a general queue or batch job management system. The world has enough of those. But *wiki does have a unique set of issues that do seem worth addressing. We are pure perl, we have a complex set of configurable plugins/addons. It's expensive to instantiate a session, so a persistent environment is desirable. We would like to have unified configuration and management of maintenance and some batch/off-line processing. Scheduled processing is one part of the problem, but there are other events that we want to trigger tasks. And we don't include maintenance processing as a first class construct - whether it's working area cleanup in plugins, the tick_*wiki stuff, or Contribs like Mailer.
I'm coming to think of this as an environment that processes a non-web source of wiki requests. The environment is different because under a webserver, you have to deal with time limits, users who navigate away, webserver restarts, and other external factors that raise havoc with maintenance activities. The environment I'm creating has a wiki session, but is stable and event-driven. Plus, it integrates maintenance coding into developing plugins/extensions rather than leaving it to ad-hoc cron scripts.
So, since I've said this is a prototype - I'm making some changes. Again.
Orthogonal to the synchronous/asynchronous (threaded vs. forked) task types, I'm implementing a triggering model. So in addition to a task being triggered by a cron-like schedule, it can also be triggered on anything that select (the system call, not the perl function) can wait on.
So, for example, you can register (and yes, I called the API registerFileHandles) a callback for a listening socket. I do that internally, so one can get status from the command line (or a plugin). I expect I'll have a forking version too, as most sane people don't like to write non-blocking select-threaded code.
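For readers who have not written a select-driven loop, the idea might look something like the sketch below. The real registerFileHandles signature is not shown in this discussion, so `register_handle`, `unregister_handle` and `poll_once` are hypothetical names, and a pipe stands in for the status socket.

```perl
use strict;
use warnings;
use IO::Select;
use IO::Handle;

my $select = IO::Select->new;
my %callback;    # fileno => [ handle, code ref ]

# Hypothetical registration call: run $code whenever $fh becomes readable.
sub register_handle {
    my ( $fh, $code ) = @_;
    $select->add($fh);
    $callback{ fileno $fh } = [ $fh, $code ];
    return;
}

sub unregister_handle {
    my ($fh) = @_;
    $select->remove($fh);
    delete $callback{ fileno $fh };
    return;
}

# Wait up to $timeout seconds and dispatch every readable handle once.
# Returns the number of callbacks run.
sub poll_once {
    my ($timeout) = @_;
    my $ran = 0;
    for my $fh ( $select->can_read($timeout) ) {
        my $entry = $callback{ fileno $fh } or next;
        $entry->[1]->($fh);
        $ran++;
    }
    return $ran;
}

# Demonstration: a pipe stands in for a status socket.
pipe( my $reader, my $writer ) or die "pipe failed: $!";
$writer->autoflush(1);

my @seen;
register_handle(
    $reader,
    sub {
        my ($fh) = @_;
        my $n = sysread( $fh, my $buf, 4096 );
        push @seen, $buf if $n;
    }
);

print {$writer} "status list\n";
poll_once(1);
print "received: $seen[0]";
```

The same `poll_once` timeout is where a cron-like scheduler would plug in, by passing the number of seconds until the next scheduled job.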
But, you can also register other events. One that I'm building in is inotify - makes watching for config file changes much more efficient and response more timely. And you can use that to monitor directories (e.g. under working/yourfacility/) used for request queues. So your off-line PDF generation can watch for a request, have a thread forked in real-time & put its output back in pub. Or whatever. I will fall back to polled monitoring with stat() on systems that don't support inotify, though hopefully over time others will add their equivalents. I'm still thinking about the minimum semantics to make supporting multiple systems easy.
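The polled fallback could be as simple as comparing directory snapshots between ticks. The following is a rough sketch of that idea, not the framework's code; `snapshot` and `changes` are illustrative names, and a temporary directory stands in for a request queue.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Take a snapshot of a directory: plain file name => mtime.
sub snapshot {
    my ($dir) = @_;
    my %seen;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    for my $name ( readdir $dh ) {
        my $path = "$dir/$name";
        next unless -f $path;
        $seen{$name} = ( stat $path )[9];
    }
    closedir $dh;
    return \%seen;
}

# Compare two snapshots and report what changed between polls.
sub changes {
    my ( $old, $new ) = @_;
    my @added   = grep { !exists $old->{$_} } sort keys %$new;
    my @removed = grep { !exists $new->{$_} } sort keys %$old;
    my @modified =
      grep { exists $old->{$_} && $old->{$_} != $new->{$_} } sort keys %$new;
    return { added => \@added, removed => \@removed, modified => \@modified };
}

# Demonstration: a temporary directory stands in for a request queue.
my $queue  = tempdir( CLEANUP => 1 );
my $before = snapshot($queue);
open( my $fh, '>', "$queue/request-1" ) or die "Cannot create request: $!";
close $fh;
my $diff = changes( $before, snapshot($queue) );
print "new requests: @{ $diff->{added} }\n";
```

Polling with mtime only sees changes at the granularity of the poll interval and the filesystem timestamp, which is one reason inotify gives a more timely response.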
I am also thinking about an at (or after) a specific time trigger. Cron is great for expressing periodic schedules, but clueless about "do this on 4-jul-1853" or "retry this once 30 seconds from now."
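A one-shot trigger of this kind can be modelled as a small time-ordered queue. The sketch below is one possible shape, with invented names (`run_at`, `run_after`, `fire_due`, `next_timeout`); the current time is passed in explicitly so the behaviour can be checked without waiting.

```perl
use strict;
use warnings;

my @timers;    # each entry: [ epoch seconds, code ref ], kept sorted by time

# Schedule $code to run once at or after the absolute time $when.
sub run_at {
    my ( $when, $code ) = @_;
    @timers = sort { $a->[0] <=> $b->[0] } @timers, [ $when, $code ];
    return;
}

# Schedule $code to run once, $delay seconds after $now.
sub run_after {
    my ( $delay, $code, $now ) = @_;
    $now = time unless defined $now;
    return run_at( $now + $delay, $code );
}

# Run every timer that is due at $now; return how many fired.
sub fire_due {
    my ($now) = @_;
    $now = time unless defined $now;
    my $fired = 0;
    while ( @timers && $timers[0][0] <= $now ) {
        my $timer = shift @timers;
        $timer->[1]->();
        $fired++;
    }
    return $fired;
}

# Seconds until the next timer, usable as a select timeout; undef if none.
sub next_timeout {
    my ($now) = @_;
    $now = time unless defined $now;
    return undef unless @timers;
    my $wait = $timers[0][0] - $now;
    return $wait > 0 ? $wait : 0;
}

# Demonstration with a fixed clock starting at t=1000.
my @log;
run_after( 30, sub { push @log, 'retry' }, 1000 );
run_at( 1010, sub { push @log, 'first' } );
fire_due(1030);
print "fired in order: @log\n";
```

`next_timeout` is what would let a select-based loop sleep exactly until the next one-shot task is due.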
And so I expect I'll change spellings and signatures - but then, no one else has actually coded to this yet - that I know of.
I think this will provide enough infrastructure for others to build solutions to the issues raised in prior comments.
I suspect it will be a few days - some of this is tricky, and I have other stuff in my queue as well.
By the way, I've seen a taint issue in tick_twiki (4.2.3) but haven't investigated. Anyone been there (care to?) It would be helpful, as setuid forces taint mode...and I don't need the distraction of investigating...
| 03 Jun 2011 - 12:06 | (main) Periodic Task[15812](E): Schedule::Cron - Error within job 5: Insecure dependency in unlink while running with -T switch at /var/www/servers/twiki/lib/TWiki/Store/RcsFile.pm line 732.|
Here's what a contrib interface module looks like (Unsurprisingly similar to a plugin, I hope. Not the same to catch errors.):
    package TWiki::Contrib::PeriodicTasks::MailerContrib;
    # Always use strict to enforce variable scoping
    use warnings;
    use strict;
    require TWiki::Func; # The plugins API
    use vars qw( $VERSION $RELEASE );
    #$VERSION = '$Rev: 15942 (11 Aug 2008) $';
    $VERSION = 1.1; # Checked by loader.
    $RELEASE = 'V0.000-001';
    our $contribName = 'MailerContrib';

    sub initContrib {
        my( $topic, $web, $user, $installWeb ) = @_;
        TWiki::Func::writeDebug( "$contribName loaded" );
        unless( TWiki::Func::getContext()->{Periodic_Task} ) {
            die "Configuration error: " . __PACKAGE__ . ": $contribName should never be initialized by a webserver";
        }
        # Task definitions, reconfig handler, etc goes here.
        my $dummy = $TWiki::cfg{Contrib}{$contribName}{Useless};
        return 1;
    }

    # Task run on standard plugin/contrib cleanup schedule
    #
    # You need only define this subroutine for it to be called on the admin-defined schedule
    # $TWiki::cfg{CleanupSchedule}
    #
    # For a simple contrib, this is all you need. This sample code simply deletes old files
    # in the working area. The age is configured by a web preference or a config item.
    #
    # This name (contribCleanup) is required.
    sub contribCleanup {
        my( $session, $now ) = @_;
        TWiki::Func::writeDebug( "$contribName: Running contribCleanup: $now" );
        my $wa = TWiki::Func::getWorkArea($contribName);
        # Maximum age (in hours) for files before they are deleted.
        # Note that updating MaxAge in configure will be reflected here without any code in the contrib.
        my $maxage = TWiki::Func::getPreferencesValue( "\U$contribName\E_MAXAGE" ) || $TWiki::cfg{Contrib}{$contribName}{MaxAge} || 24;
        my $oldest = $now - ($maxage*60*60);
        # One might want to select only certain files from the working area and/or log deletions.
        foreach my $wf ( glob( "$wa/*" ) ) {
            my( $uid, $gid, $mtime ) = (stat $wf)[4,5,9];
            # Only touch files owned by the effective user and group.
            if( $uid == $> && $gid == $)+0 && $mtime < $oldest) {
                $wf =~ /^(.*$)$/; # Untaint so -T works
                $wf = $1;
                unlink $wf or TWiki::Func::writeWarning( "Unable to delete $wf: $!" );
            }
        }
        return 0;
    }
    1;
Finally, just for fun, here's a status report generated from a network-triggered synchronous task - the daemon packages it up and sends it back to your command line.
    tools/experimental_tick_twiki.pl -d status list
    Daemon is running (16104)
    Job queue (ordered by next scheduled execution time):
    Job 0 */2 * * * * 34 Next: Fri Jun 3 12:24:34 2011 - TWiki::Plugins::PeriodicTestPlugin::cronTask1
    Job 5 */2 * * * * 34 Next: Fri Jun 3 12:24:34 2011 - TWiki::Periodic::TickTock
    Job 6 */1 * * * * 19 Next: Fri Jun 3 12:25:19 2011 - TWiki::Periodic::ReConfig
    Job 3 */2 * * * * 23 Next: Fri Jun 3 12:26:23 2011 - TWiki::Periodic::Forker-1
    Job 4 */2 * * * * 23 Next: Fri Jun 3 12:26:23 2011 - TWiki::Periodic::Forker-2
    Job 2 18 20 * Jul-Sep Sun,Sat Next: Sat Jul 2 20:18:00 2011 - TWiki::Plugins::PeriodicTestPlugin::News
    Job 1 1,3,4,6-17 8,15-17 * Feb,Apr,May,Aug-Dec/2 Tue,Thu 1,7,14,23 Next: Tue Aug 2 08:01:01 2011 - TWiki::Periodic::Mail
    End of job queue
    Active asynchronous tasks:
    PID Started Name
    16093 Fri Jun 3 12:24:23 2011 TWiki::Periodic::Forker-1
    16094 Fri Jun 3 12:24:23 2011 TWiki::Periodic::Forker-2
    End of active task list
OK, it's not that exciting, but I can be easily amused. Sometimes.
-- TimotheLitt - 03 Jun 2011
Very cool! Especially the inotify stuff, it is exciting. Thank you for entertaining the ever growing scope creep.
FWIW, I never had a problem running multiple wikis with the existing arrangements. My own scripts do require you to have a FOSWIKI_LIBS envar set or run the cumbersome sudo -u www-data perl -wT -I /path/to/foswiki/lib mytick.pl, however.
-- PaulHarvey - 04 Jun 2011
The coding standards are at FoswikiCodingConventions (which is in turn linked from DevelopersBible, where all the developer help is portaled).
w.r.t the API - OK, I understand. You are "monkey patching" the API during task runs. That feels wrong to me, because it means that the API is different depending on the runtime context - ouch! What was the rationale for not doing this stuff in your own namespace e.g. Foswiki::Tasks? You would still need to consult the runtime context to determine if the API is available, but in terms of code separation and encapsulation I feel it would be cleaner.
Func API. This would make sense if the functionality was ultimately to be adopted into Func, but my gut still tells me what you have here is so significantly more than that, that it ought to stand in its own package.
initContrib. Contribs don't have an init step, because there is nothing to init them from. Plugins are registered at startup, by virtue of their entry in configure (and auto-discovery, though that's discouraged), but there is nothing analogous for Contribs. Are you advocating a general purpose init step for contribs? If not, if initContrib is specific to the task environment, then the name needs to be specific to the role - e.g. initPeriodicTasks.
I'm still struggling with the semantics of the cleanup step, especially now you have added this step to contribs. There has to be a clear definition of what cleanup actually means. There are a number of different points at which "cleanup" is appropriate; for example:
tick_twiki taint issues; as you know that module is trivial, and doesn't take any input other than what comes from data files (which may be tainted, of course). I have not seen any such issues with Foswiki, but there have been several thousand bugfixes in the core code since we forked, several of which involved taint issues, and it could be any one of those. Only by nailing down the issue to a reproducible testcase (and ideally reproducing it on Foswiki) could it be addressed.
I really like the sound of your triggering model. That's something I've wanted for the longest time.
Keep up the good work!
-- CrawfordCurrie - 04 Jun 2011
Timothe, exciting times...
-- MichaelDaum - 04 Jun 2011
I've been scratching my head about a few things for the last few days. I have not been able to resolve these things in my head, so I figure I should mention them.
If plugins need an API to the task scheduler, then I do believe that Foswiki::Func should provide that API. However, I wonder how much of an API is needed.
As Crawford pointed out, contribs also have a use for periodic tasks and config-change-handler tasks, but contribs have no initialisation interface and so a Foswiki::Func API for adding tasks would not be useful to contribs. We could add an initialisation interface for contribs, but I think that should be the subject of a separate feature proposal. With contribs as they are today, I do not see how contribs could use the task scheduler.
I am also unclear about the usecase for replaceSchedule. When would that be used? Do we have a usecase for plugins (or contribs) to change their schedule on-the-fly? That could be powerful, but I suspect nasty surprises could lurk there. How will developers debug this, and provide support for it? Will configure be able to query the daemon about the current schedule?
Defining a task schedule via Foswiki::cfg sounds simpler and more attractive than using a run-time API for managing schedules, so I am delighted to see that PeriodicTasks now shows a configure interface. But... how does that mechanism for defining a schedule interact with addTask and replaceSchedule? I assume that addTask and replaceSchedule won't be modifying Foswiki::cfg...
In contrast, I do see the point of something like nextRuntime, which could be useful to regular web-server processing as well as the scheduled tasks.
I do take Crawford's point that perhaps the API should live in another package. I therefore suggest something like Foswiki::Func::taskScheduler which returns a reference to a task scheduler object (when executing from the daemon) or undef (when executing in a web-server environment). That object may provide the API. This would make the functionality discoverable to new or inexperienced developers (because it is accessible from Foswiki::Func), it would avoid dumping many functions into Foswiki::Func, and it would encapsulate the task API.
(Or - how about if Foswiki::Func::taskScheduler returns a reference to an object, and that object's class conforms to an interface (pure virtual base class)? The API is defined in terms of that interface. The actual class might differ between daemon and webserver usage.)
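To illustrate the accessor idea, here is a small sketch of the first variant (object in the daemon, undef under a web server). Everything in the `Sketch::` namespace is hypothetical, including the argument lists given to addTask and replaceSchedule; only the `Periodic_Task` context name is taken from the sample contrib earlier in this topic.

```perl
use strict;
use warnings;

# Hypothetical interface class: names the methods every scheduler must offer.
package Sketch::TaskScheduler;

sub new {
    my ($class) = @_;
    return bless { tasks => {} }, $class;
}
sub addTask         { die ref(shift) . ' must implement addTask' }
sub replaceSchedule { die ref(shift) . ' must implement replaceSchedule' }

# Hypothetical implementation that would live inside the daemon.
package Sketch::TaskScheduler::Daemon;
our @ISA = ('Sketch::TaskScheduler');

sub addTask {
    my ( $self, $name, $schedule, $code ) = @_;
    $self->{tasks}{$name} = { schedule => $schedule, code => $code };
    return 1;
}

sub replaceSchedule {
    my ( $self, $name, $schedule ) = @_;
    return 0 unless exists $self->{tasks}{$name};
    $self->{tasks}{$name}{schedule} = $schedule;
    return 1;
}

# Stand-in for the accessor suggested for Foswiki::Func.
package Sketch::Func;

my %context;      # stand-in for the session context
my $scheduler;    # created lazily, daemon only

sub setContext { my ( $key, $value ) = @_; $context{$key} = $value; return }

sub taskScheduler {
    return undef unless $context{Periodic_Task};
    $scheduler ||= Sketch::TaskScheduler::Daemon->new;
    return $scheduler;
}

package main;

# Web-server environment: no scheduler, so callers must check for undef.
my $in_web = Sketch::Func::taskScheduler();
print "web server: ", ( defined $in_web ? "scheduler" : "no scheduler" ), "\n";

# Daemon environment: the same call returns an object carrying the API.
Sketch::Func::setContext( Periodic_Task => 1 );
my $sched = Sketch::Func::taskScheduler();
$sched->addTask( 'cleanup', '15 3 * * *', sub { return 'cleaned' } );
$sched->replaceSchedule( 'cleanup', '45 4 * * *' );
print "cleanup runs at: $sched->{tasks}{cleanup}{schedule}\n";
```

The second variant would replace the `return undef` branch with a web-server class derived from the same base, so callers never need the undef check.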
This is good and exciting work, but some aspects still have me puzzled.
-- MichaelTempest - 06 Jun 2011
Michael correctly observes that the plugins API will have to tell configure if it changes something. In reality, the daemon, and the plugins API all have to tell each other what's going on. Simplest approach is to kick the daemon in the head each time a change is made (though of course, plugins using the API and configure may still conflict).
-- CrawfordCurrie - 06 Jun 2011
Thanks for all the feedback. Sorry I've been off-line for a while.
I have a pretty good idea of what version 3 will look like - just need a few tuits to get it consolidated. I will post something when the bits are there.
A couple of quick responses. I plan to rename this - it's no longer simply periodic, nor are the tasks only cleanup. Probably "Task Framework". I've been brow-beaten into a bigger project - but it seems useful and there's this other stuff I'm avoiding by working on it. However, the degenerate case (I just want my plugin to delete old files once in a while) remains simple.
Think of the Tasks Environment as a new place under which the whole wiki code runs for specialized functions. A webserver "replacement" for these pesky maintenance tasks. But there's no user doing a GET or POST to trigger action. (Or to abort it at an inopportune time.) So, to run under this environment, contribs have to register themselves. That's how the environment knows they want service. I decided to use the familiar plugin model - but from a contrib's point of view, it's plugging out of wiki, and in to the framework. Current plugins, which aren't contribs, can run in both environments.
For an example, let me pick on MailerContrib again. Today, a shell script (twiki_maintenance) is started by cron using a crontab schedule. That script sequentially runs (on sub-schedules) various tasks, including run_tick_twiki, runmailnewsnotify, runmailwebnotify, and runstatistics. One has to do it this way because MailerContrib in particular gets in trouble if it runs news and web concurrently. Each of these run scripts does a cd and runs the corresponding perl notify script. And that script is just a command line UI wrapper for the worker Contrib code, which is built as an object!
Under the Tasking Framework/environment, things are parallel - but simpler. MailerContrib registers with Configure at install time (pretty much as today - I want Configure to be the management interface for wiki, we don't need yet another - quite.) The registration causes the tasking daemon to load MailerContrib's interface module. It doesn't know what MailerContrib wants, just that it needs to be activated. MailerContrib consults its config items (and an astrologer or anything else) and calls back requesting that something be called with the news argument list weekly and the webchanges argument list daily. It can defer loading most of its code until the first call - which may be in a private (asynch) fork. No crontab, no shell wrappers with magic -I switches, it just gets called. Mail generation may take a while on a large, busy web, so perhaps it asks for an asynch task that handles both. Or it takes out a lock. That's your design choice. It's somewhat less work. But all the scheduling and control is in one place for the administrator - under configure.
I'm more than happy to remove the Foswiki::Func aliases. I'm already thinking along the lines of a more object-oriented interface - it will reduce the number of API names (by turning them into methods). But the abstractions are a bit different - we have something like a task which is activated by a trigger, that may be periodic, inotify, or something else.
The current schedule - and more - is available dynamically from the daemon. In fact, it has an embedded (very limited) webserver. So debuggers can connect directly and restart, suspend and resume tasks. And if you use the magic macro, it can be embedded in a wiki page. The command line management tools just talk to the webserver and get text. There's even a magic ability to click a button and start the daemon. (Yes, it starts itself, and no, you don't need shell access.)
The daemon notices that LocalSite.cfg has been modified (polling or inotify); it reloads it on-the-fly. Your tasks can register for notifications - by specific configuration item - and will be told. So the simplest model is that configure changes a schedule, your task is registered for that config item, and calls replaceSchedule.
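The notify-on-config-item model might be sketched like this. It is an illustration under assumptions: `watch_config`, `notify_changes` and `replace_schedule` are invented stand-ins (the real registration call and the replaceSchedule signature are not shown here), and the `{Contrib}{MailerContrib}{NewsSchedule}` item is a made-up example key.

```perl
use strict;
use warnings;

my %watchers;    # flattened config key => list of callbacks

# Hypothetical registration call: be told when one config item changes.
sub watch_config {
    my ( $key, $code ) = @_;
    push @{ $watchers{$key} }, $code;
    return;
}

# Flatten a nested config hash into '{A}{B}' => value pairs.
sub flatten {
    my ( $cfg, $prefix ) = @_;
    $prefix = '' unless defined $prefix;
    my %flat;
    for my $k ( keys %$cfg ) {
        my $path = $prefix . '{' . $k . '}';
        if ( ref $cfg->{$k} eq 'HASH' ) {
            %flat = ( %flat, flatten( $cfg->{$k}, $path ) );
        }
        else {
            $flat{$path} = $cfg->{$k};
        }
    }
    return %flat;
}

# Compare the old and reloaded config; notify watchers of changed items.
# Returns the number of watched items that changed.
sub notify_changes {
    my ( $old, $new ) = @_;
    my %before  = flatten($old);
    my %after   = flatten($new);
    my $changed = 0;
    for my $key ( sort keys %watchers ) {
        my ( $was, $is ) = ( $before{$key}, $after{$key} );
        next if !defined $was && !defined $is;
        next if defined $was && defined $is && $was eq $is;
        $_->( $key, $is ) for @{ $watchers{$key} };
        $changed++;
    }
    return $changed;
}

# Demonstration with a stand-in for replaceSchedule.
my %schedule = ( news => '0 6 * * Mon' );
sub replace_schedule { my ( $task, $spec ) = @_; $schedule{$task} = $spec; return }

watch_config( '{Contrib}{MailerContrib}{NewsSchedule}',
    sub { my ( $key, $value ) = @_; replace_schedule( news => $value ) } );

my $old = { Contrib => { MailerContrib => { NewsSchedule => '0 6 * * Mon' } } };
my $new = { Contrib => { MailerContrib => { NewsSchedule => '30 7 * * Fri' } } };
notify_changes( $old, $new );
print "news schedule is now: $schedule{news}\n";
```

Only watched items are compared, so an unrelated edit in configure would not wake any task.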
All this is running now. What I haven't gotten to is the non-periodic stuff (except inotify on the config file, which works well), changing the API yet again, and posting updated documentation and screenshots.
Kicking daemons in the head is OK for developers and environments where not much is happening. If all the folks who've stepped up and said they want to also run queues and indexing and other long operations actually use this, it will be a big deal to stop and start. The graceful restart waits for all async (forked) kids to finish - and that could be minutes or hours, and during that time, I don't pick new tasks to run so the pipeline drains. It's not hard to deal with dynamic changes if you code for that from the start - and this is a new thing, so there's no excuse not to.
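The cost of a graceful restart is easier to see in code. This is a minimal sketch of the drain behaviour described above, with invented names (`start_async`, `drain`) and short sleeps standing in for long-running forked tasks; it is not the daemon's implementation.

```perl
use strict;
use warnings;
use POSIX ();

my %kids;            # pid => task name for running async tasks
my $draining = 0;    # true once a graceful restart has been requested

# Start an asynchronous (forked) task unless the daemon is draining.
# Returns the child pid, or 0 if the task was refused.
sub start_async {
    my ( $name, $code ) = @_;
    return 0 if $draining;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        $code->();
        POSIX::_exit(0);    # leave the child without running parent cleanup
    }
    $kids{$pid} = $name;
    return $pid;
}

# Stop accepting new work, then block until every child has exited.
# Returns the sorted names of the tasks that were waited for.
sub drain {
    $draining = 1;
    my @finished;
    while (%kids) {
        my $pid = waitpid( -1, 0 );
        last if $pid <= 0;
        push @finished, delete $kids{$pid} if exists $kids{$pid};
    }
    my @sorted = sort @finished;
    return @sorted;
}

# Demonstration: two short sleeps stand in for long-running tasks.
start_async( 'Forker-1', sub { select( undef, undef, undef, 0.2 ) } );
start_async( 'Forker-2', sub { select( undef, undef, undef, 0.1 ) } );
my @done = drain();
print "drained: @done\n";
```

`drain` blocks for as long as the slowest child runs, and nothing new starts in the meantime, which is exactly why handling configuration changes dynamically is preferable to restarting.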
Then again, nothing has to migrate until the owner has round (or octagonal) tuits. Cron is still there.
Some of this will be easier to deal with when you can actually touch it. I understand the "other wiki" problem. To that end, I'm trying to get a new VM running Foswiki (as well as TWiki) trunk. It hasn't been easy - I just wrote up some of the challenges. But since for now this VM is dedicated to this, I should be able to make it available to interested parties on the public network. The good news is that except for configure, it really knows very little of which wiki (or the wiki's internals) it's running over.
But it sure is far from the 25 lines of code that would have solved the "clean up your disk space" problem.
Mailer | V0.000-001 | lib/TWiki/Contrib/PeriodicTasks/MailerContrib.pm |
TopicSummary | Plugins need a working/temp file cleanup mechanism |
CurrentState | ParkedProposal |
CommittedDeveloper | SvenDowideit |
ReasonForDecision | AcceptedBy14DayRule |
DateOfCommitment | 7 Sep 2011 |
ConcernRaisedBy | |
BugTracking | Tasks.Item10780 |
RelatedTopics | PeriodicTasks |
PlannedFor |