TIP DBIStoreContrib is not installed on Foswiki.org.

DBIStoreContrib

Supports fast queries and searches over the content of topics and attachments by caching wiki topics in an SQL database (e.g. MS SQL Server, MySQL, PostgreSQL)

  • Automatically translates %SEARCH{type="query" into SQL queries
  • Supports plain-text searches over attachment content
  • Foswiki data is "cached" in an external SQL database that can be queried by other tools (though changes made in the database by other tools will not be reflected in the wiki)

The DBIStoreContrib has been designed to be as compatible as possible with existing %SEARCH expressions, and is usable with all well-written extensions (those that have been written to use the Foswiki::Func API, and do not perform direct-to-disk file operations.)

It has "personality modules" that support the following SQL implementations:

Note that there are many different versions of these databases and full compatibility with any given version cannot be guaranteed.

The release package includes the DBIStorePlugin, which hooks into the Foswiki save operations to incrementally update the database.

Alternatively, you can use the included dbistore_manage.pl program to create cron (or iwatch) jobs to update the database offline.

Database Schema

The database schema must reflect 1:1 the structure of data in Foswiki topics. The default schema will work for most databases, though you may need to tweak it for Microsoft SQL Server.

The schema is defined using a Perl hash structure. Top level keys in this hash represent:
  • The name of a column type declaration, if the key starts with an underscore e.g. _DATE
    • column type names map to a hash defining a column, as described below
  • The name of a table
    • table names map to a hash defining the columns in each row of the table
      • Each column name maps to a hash defining the column
      • column names can also map to the name of a column type declaration e.g. _DATE.

Declaring tables

Some basic administrative tables must always exist in the schema:

topic
Contains all wiki topics.

  • web - the web name
  • name - the topic name
  • timestamp - the date/time the row was inserted
  • text - the full text of the topic, without embedded meta-data
  • raw - the full text of the topic including embedded meta-data

metatypes
Contains the names of all meta-tables.
  • name - table name, e.g. TOPICINFO, FORM etc

Other tables are used for different types of Foswiki %META: - for example, the table TOPICINFO corresponds to the %META:TOPICINFO found at the top of all (raw) topics. Each column in these tables corresponds to an attribute in the corresponding %META:.

If the default names of tables would conflict with other uses in the database, you can define an optional prefix to be used on table names in configure.

If the FILEATTACHMENT table in the schema has a field called 'text', and the StringifierContrib is installed, then the attachment will automatically be serialised (if necessary) and stored in text form in the database. This pseudo-field can be searched using standard SEARCH queries, for example:

%SEARCH{"attachments.text~'*telephoto*' type="query"}%

Note that this will only tell you the topic, and not the specific attachment that contained the text.

Automatically extending the schema

The schema should normally specify a table for every %META: that may occur in topics. This works so long as you know what meta-data will be added by the plugins you use. If you don't know, you can optionally specify that new tables are automatically added by setting {AutoloadUnknownMETA} in configure.

Normally only columns defined in the schema are loaded. Columns can be automatically added for attributes missing from the schema by setting {AutoAddUnknownFields} in configure. Automatically added columns will be given the type specified by the _DEFAULT column type declaration.

IDEA! it is always best to make sure all meta-data has a schema entry. Automatic extension of the schema is relatively expensive, and should be avoided if possible.

Defining columns

Each row in each table always has a tid column, which must be an integer type. This column is used to associate the row with a row in the topic table (and meta-data in other tables).

Columns are defined in a Perl hash containing attributes:
  • type (required) specifies the SQL type for the column
  • basetype optionally specifies a column type declaration to base this type on (local attributes override those in the base type)
  • truncate_to if set to a length, then the value of that field stored in the DB will be truncated to that length. This only affects searching. If truncate_to is not set, then trying to store a value longer than the field accepts will be an SQL error.
  • default can be used to provide a default for the column. If you don't give a default, one will automatically be chosen from the column type.
  • unique may be set to the name of a uniqueness constraint (see below)
  • primary if true, the column will be the PRIMARY KEY for the table. This automatically enforces a uniqueness constraint on the column.
  • index if true, then an index will be created for the column

For example, the WORKFLOW table for the WorkflowPlugin might be:
       tid => { type => 'INT', primary => 1 }
       name => { type => 'TEXT', truncate_to => 30 }

Using a basetype
  _INDEXED => { type => 'INT', index => 1, unique => 'smatter' },
  ...
  SOMEDATA => {
    attr => { basetype => '_INDEXED', type=> 'TEXT', truncate_to => 20 }
  }
will give a column type of
{ type => 'TEXT', index => 1, unique => 'smatter', truncate_to => 20 }

Uniqueness constraints

The unique column attribute is used to specify what unique-set the column belongs to. Unique-sets force each value of a column (or the combined values of a set of a group of columns) to be unique in the table. There may be more than one unique-set in a single table. For example,

    topic => {
        ...
        web  => { type => 'VARCHAR(256)', index => 1, unique => 'webtopic' },
        name => { type => 'VARCHAR(128)', index => 1, unique => 'webtopic' },
        ...
        },
specifies the unique-set webtopic on both web and name columns. The database will enforce the constraint that no two rows can have the same web and topic (but two rows can have the same web, so long as the topic is different, or the same topic, so long as the web is different). Note that primary columns are automatically unique and don't need a unique attribute.

_DEFAULT

The column type declaration _DEFAULT must exist and must be a text type, ideally supporting arbitrary length text strings, if either of the {AutoloadUnknownMETA} or {AutoAddUnknownFields} options are enabled in configure.

SQL Types

Different database implementations have different requirements for the best types to use with that database, so the schema has to be tailored for the specific database being used. See http://foswiki.org/Support/DBIStoreContribSchemas for more information.

Search Limitations

Regular Expressions

Regular expression searches are mapped to whatever regular expression support exists in the database. Most SQL databases support sophisticated regular expression matching; however there are features of the default Perl syntax supported by Foswiki that cannot be mapped to the databases. Regular expressions written using this extended syntax may fail.

Numeric comparison

Foswiki is built on Perl, which "knows" when two values in the database can be compared using numeric comparison (the , <, >, < and > = operators). SQL doesn't have this kind of support, and will always
use lexical comparison on strings, so be careful when comparing using numeric comparison operators, as the results are likely to be different to what base Foswiki would return.

Date comparison

Date conversion using the d2n operator is not supported with PostgreSQL and MySQL.

Row indexes

Integer indexes are not supported. Use queries instead.

Representational types

The type defined in the schema must be long enough to store any possible value for the given field, but be as short as possible to maximise the DB's chance of building a decent index for it.

length of an array

The length() operator only works on string data, not on tables.

Unicode

Unicode characters are supported in the database by UTF-8 encoding everything and storing as bytes. This is a compromise between the need to store international characters, and doing so in an efficient way (SQL Server, especially, is very bad at storing unicode natively).

If you intend to query the database outside of Foswiki, bear in mind that all strings are UTF-8 encoded. That will also apply to any data you have stored in form field values and extended meta-data tags.

If you are using Foswiki <2 then data will automatically be converted from your {Site}{CharSet} to UTF-8 before saving in the database.

Installation Instructions

You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.

Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab -> "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button. Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will not show up in the search results.

You can also install from the shell by running the extension installer as the web server user: (Be sure to run as the webserver user, not as root!)
cd /path/to/foswiki
perl tools/extension_installer <NameOfExtension> install

If you have any problems, or if the extension isn't available in configure, then you can still install manually from the command-line. See https://foswiki.org/Support/ManuallyInstallingExtensions for more help.

  • Go to configure and:
    1. Set a DSN in the 'Extensions' section (the default is for sqlite3). Example DSNs:
      • dbi:ODBC:wiki - generic ODBC
      • dbi:Pg:dbname=foswiki - PostgreSQL
      • dbi:mysql:database=wiki;host=mysqlserver;port=1234 - MySQL
    2. Select a suitable personality module for your database type.
    3. Select Foswiki::Store::QueryAlgorithms::DBIStoreContrib for the {Store}{QueryAlgorithm} EXPERT setting 1 Select Foswiki::Store::SearchAlgorithms::DBIStoreContrib for the {Store}{SearchAlgorithm} setting

Loading the database

It is necessary to initialise the database from the existing store. You can use tools/dbistore_manage.pl to do this.
$ cd bin
$ perl ../tools/dbistore_manage.pl --help
will give details.

The dbistore_manage.pl program also allows you to run Foswiki search queries and SQL statements against the database from the command-line.

The best approach when initially setting up a DB is to pick a specific topic and load it. For example,
$ perl ../tools/dbistore_manage.pl --reset --load Sandbox.TestTopic --query "web='Sandbox' AND topic='TestTopic'"
If you are using PostgreSQL, MySQL or SQLite, the default schema should work fine for you. However if you are using SQL Server, you will have to tweak the schema as descibed in "Microsoft SQL Server Notes", below.

Testing

Basic tests for queries can be found in the DBIStoreTest topic.

MySQL Notes

Here's how the test environment (where the database server is running on the same host as Foswiki) is set up:
CREATE USER foswiki@localhost IDENTIFIED BY 'foswiki';
GRANT SELECT, INSERT, CREATE, DROP ON foswiki to foswiki@localhost;
FLUSH PRIVILEGES;
CREATE DATABASE foswiki;
ALTER DATABASE foswiki COLLATE utf8_unicode_ci;
We assume a UTF-8 encoding.

Postgresql Notes

Here's how the test environment (where the database server is running on the same host as Foswiki) is set up:
CREATE USER foswiki PASSWORD 'foswiki';
CREATE DATABASE foswiki;
We allow the DBD driver to handle the encoding. This will usually be UTF-8.

Microsoft SQL Server Notes

SQL Server is supported via ODBC. See "ODBC Notes" for more.

If you are using the FreeTDS driver, make sure your odbc.ini is selecting the highest possible protocol version that your server supports. See http://www.freetds.org/userguide/choosingtdsprotocol.htm#FTN.AEN870

A couple of {Extensions}{DBIStoreContrib}{DBIAttributes} must be set in configure:
{
    LongTruncOk => 1,
    LongReadLen => 1024
}
Note that LongReadLen is used to determine when to force bound parameters to use SQL_LONGVARBINARY for transmission over TDS. If you find the server is complaining about failure to convert to the server character set, you may need to reduce (or increase!) this limit.

Unicode

Unicode support in SQL Server consists of the UTF-16 encoding, which uses 2 bytes per character. If you are using mainly western characters with only a few high-bit characters, this is inefficient, but is the only international character set support available. While there is DBD support for unicode conversion to/from UTF-16, we found it difficult to configure and concluded it was not worth the effort. Anyone who wants to take the time to work out how to support UTF-16 is very welcome to try, and contribute their results back to the community.

The alternative we have chosen is to simply store UTF-8 octets in the database, and pretend they are character strings. However there is no UTF-8 collation on SQL server, so sorting (and possibly searching) of non-western character sets will not work. However it should suffice for users of principally western text.

Note that you must make sure you set a case-sensitive collation for the database. For example,
ALTER DATABASE Foswiki COLLATE Latin1_General_100_CS_AS_SC;
Without this, upper and lower case strings will be compared case-insensitively, and the DB will be unable to distinguish ThisTopicName and THISTOPICNAME.

ALERT! IF YOU WANT FULL SUPPORT FOR UNICODE CHARACTERS, DON'T USE SQL SERVER

Authentication

If you are using Windows authentication for access to the database server, then the simplest thing to do is to use the ODBC driver and create a data source for SQL Server which uses Windows authentication. Then set an empty username and password in the DBIStoreContrib configuration.

Regular Expressions

SQL Server does not come equipped with regular expression matching, which is required for Foswiki, so you will need to install a regular expression library. The default personality module included in this module requires the user function dbo.fn_RegExIsMatch to stub the .NET RegEx class. Instructions for building and installing this user function can be found at http://www.codeproject.com/Articles/19502/A-T-SQL-Regular-Expression-Library-for-SQL-Server

Schema

Depending on the version you are running you will need to replace TEXT types with a VARCHAR type in the schema. Note that index fields are limited to 900 bytes.

SQLite Notes

SQLite requires the pcre module to be installed to support regular expression searches. The path to this module is set up in configure. We allow the DBD driver to handle the encoding. This will usually be UTF-8.

ODBC Notes

DBD::ODBC can be used as an interface to a number of different database types, and it may be the simplest way to access your database. ODBC should be transparent; simply select the personality that applies to your database, with an ODBC DSN. You may need to tune the DBD::ODBC driver for use with your specific database; this can be done using the configure EXPERT setting {Extensions}{DBIStoreContrib}{DBIAttributes}.

Note that there has been no testing with ODBC drivers that are compiled with unicode support enabled.

How it works

Searches in Foswiki fall into two types - text searches and query searches. Text searches are what you use to find text in a topic, while query searches use a structured query language to specify logical combinations of field values. Foswiki has two internal configuration options, {Store}{SearchAlgorithm} and {Store}{QueryAlgorithm}, that select the algorithms that it will use for these different types of search.

The default Foswiki query algorithm works by cherry-picking parts of queries to create regular expression searches that can then be passed though the search algorithm. These searches are then used as a filter on the set of topics, to narrow down to a subset that is then "brute force" matched against the original query to give a final set of matching topics. This is fine when you have relatively few queries, but is not terribly efficient if you make a lot of use of query searches.

The DBIStoreContrib does things differently. When you select DBIStoreContrib for the query algorithm, query searches are automatically transformed into SQL queries, which are very fast. Further, you can optionally choose to use the database for standard text searches (including regular expression searches, if your DB supports them).

DBIStoreContrib maintains the SQL database using an (included) Foswiki plugin to "eavesdrop" on save operations in Foswiki. When a topic is saved, it is automatically added to the database in the background. The DB only caches the latest version of a topic; version histories are not stored (that would be the job of a full store implementation, which is a different project).

The database is interfaced to via the standard Perl DBI interface, so any RDBMS that has an adapter can be used for the cache. A number of common SQL implementations are supported out of the box. If you want to use a different database you will have to implement a personality module for it; if you do this, please contribute the new personality module (and appropriate usage information) back to the community.

The schema used to represent Foswiki topics is (currently) a 1:1 mapping of the schema described in QuerySearch. The same schema could be used to store Foswiki topics in the actual store, and this is one of the longer term goals. Among other things, this would allow us to search topic histories.

The Code

Here's an overview of the important bits of the contrib:
  • lib/Foswiki/Contrib/DBIStoreContrib/HoistSQL.pm - code that hoists SQL statements from Foswiki queries
  • lib/Foswiki/Contrib/DBIStoreContrib/Personality/*.pm - implementations of personality modules for different databases
  • lib/Foswiki/Plugins/DBIStorePlugin.pm - plugin handlers (Foswiki <=1.1 only)
  • lib/Foswiki/Store/QueryAlgorithms/DBIStoreContrib.pm - the query algorithm
  • lib/Foswiki/Store/SearchAlgorithms/DBIStoreContrib.pm - the search algorithm
  • tools/dbistore_manage.pl - command line program for manipulating and querying the DB.

Info

Change History:  
2.2 (Jul 2017) Fixed Foswikitask:Item14427
2.1 (Jun 2017) Fixed Foswikitask:Item14419
2.0 (April 2017) Unicode support for Foswiki 2.0
1.2 (18 Oct 2014) Foswikitask:Item12992: Added UNIQUE constraints to schema Foswikitask:Item13056: fix fileattachment moves in plugin
1.1 (12 Aug 2014) Fixes for Foswikitask:Item12989 and Foswikitask:Item12990:
1.0 (8 May 2014) First version
Dependencies:
NameVersionDescription
DBI>=0.1Required
Encode>=0required
DBD::Pg>=1.49Optional,Required for PostgreSQL.
DBD::MySql>=0Optional,Required for MySQL.
DBD::SQLite>=0Optional,Required for SQLite.
DBD::ODBC>=0Optional,Required for ODBC
File::Temp>=0optional,Required for serialised attachments
Foswiki::Contrib::StringifierContrib>=0optional,Required for serialised attachments
Home page: http://foswiki.org/bin/view/Extensions/DBIStoreContrib

PackageForm edit

Author Crawford Currie http://c-dot.co.uk
Version 2.3
Release 14 Jul 2017
Description Supports fast queries and searches over the content of topics and attachments by caching wiki topics in an SQL database (e.g. MS SQL Server, MySQL, PostgreSQL)
Copyright © 2010-2017 C-Dot Consultants
License GPL (Gnu General Public License)
Home http://foswiki.org/Extensions/DBIStoreContrib
Support http://foswiki.org/bin/view/Support/DBIStoreContrib
Repository https://github.com/foswiki/DBIStoreContrib
ExtensionClassification
ExtensionType ContribPackage
Compatibility Foswiki 1.1.9 and later
IncompatibleWith
ImageUrl
DemoUrl http://
SupportUrl DBIStoreContrib
ModificationPolicy CoordinateWithAuthor
I Attachment Action Size Date Who Comment
DBIStoreContrib.md5md5 DBIStoreContrib.md5 manage 168 bytes 14 Jul 2017 - 11:58 CrawfordCurrie  
DBIStoreContrib.sha1sha1 DBIStoreContrib.sha1 manage 192 bytes 14 Jul 2017 - 11:58 CrawfordCurrie  
DBIStoreContrib.tgztgz DBIStoreContrib.tgz manage 59 K 14 Jul 2017 - 11:57 CrawfordCurrie  
DBIStoreContrib.zipzip DBIStoreContrib.zip manage 74 K 14 Jul 2017 - 11:57 CrawfordCurrie  
DBIStoreContrib_installerEXT DBIStoreContrib_installer manage 6 K 14 Jul 2017 - 11:58 CrawfordCurrie  
Topic revision: r12 - 14 Jul 2017, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement. Creative Commons License    Legal Imprint    Privacy Policy