DBIStoreContrib
Supports fast queries and searches over the content of topics and attachments by caching wiki topics in an SQL database (e.g. MS SQL Server, MySQL, PostgreSQL)
- Automatically translates
%SEARCH{type="query"
into SQL queries
- Supports plain-text searches over attachment content
- Foswiki data is "cached" in an external SQL database that can be queried by other tools (though changes made in the database by other tools will not be reflected in the wiki)
The DBIStoreContrib has been designed to be
as compatible as possible with
existing
%SEARCH
expressions, and is usable with all well-written
extensions (those that have been written to use the Foswiki::Func API,
and do not perform direct-to-disk file operations.)
It has "personality modules" that support the following SQL implementations:
Note that there are many different versions of these databases and
full compatibility with any given version cannot be guaranteed.
The release package includes the DBIStorePlugin, which hooks into the
Foswiki save operations to incrementally update the database.
Alternatively, you can use the included
dbistore_manage.pl
program
to create cron (or iwatch) jobs to update the database offline.
Database Schema
The database schema must reflect 1:1 the structure of data in Foswiki
topics. The default schema will work for most databases, though you
may need to tweak it for Microsoft SQL Server.
The schema is defined using a Perl hash structure. Top level keys in
this hash represent:
- The name of a column type declaration, if the key starts with an underscore e.g.
_DATE
- column type names map to a hash defining a column, as described below
- The name of a table
- table names map to a hash defining the columns in each row of the table
- Each column name maps to a hash defining the column
- column names can also map to the name of a column type declaration e.g.
_DATE
.
Declaring tables
Some basic administrative tables must always exist in the schema:
topic
Contains all wiki topics.
-
web
- the web name
-
name
- the topic name
-
timestamp
- the date/time the row was inserted
-
text
- the full text of the topic, without embedded meta-data
-
raw
- the full text of the topic including embedded meta-data
metatypes
Contains the names of all meta-tables.
-
name
- table name, e.g. TOPICINFO
, FORM
etc
Other tables are used for different types of Foswiki
%META:
-
for example, the table
TOPICINFO
corresponds to the
%META:TOPICINFO
found at the top of all (raw) topics. Each
column in these tables corresponds to an attribute in the
corresponding
%META:
.
If the default names of tables would conflict with other uses in the
database, you can define an optional prefix to be used on table names
in
configure
.
If the
FILEATTACHMENT
table in the schema has a field called 'text',
and the
StringifierContrib is installed, then the attachment will
automatically be serialised (if necessary) and stored in text form in
the database. This pseudo-field can be searched using standard SEARCH
queries, for example:
%SEARCH{"attachments.text~'*telephoto*' type="query"}%
Note that this will only tell you the topic, and not the specific
attachment that contained the text.
Automatically extending the schema
The schema should normally specify a table for every
%META:
that may occur in topics. This works so long as you know what
meta-data will be added by the plugins you use. If you don't know, you
can optionally specify that new tables are automatically added by
setting
{AutoloadUnknownMETA}
in
configure
.
Normally only columns defined in the schema are loaded. Columns can be
automatically added for attributes missing from the schema by setting
{AutoAddUnknownFields}
in
configure
. Automatically added columns
will be given the type specified by the
_DEFAULT
column type
declaration.
it is always best to make sure all meta-data has a schema
entry. Automatic extension of the schema is relatively expensive, and
should be avoided if possible.
Defining columns
Each row in each table always has a
tid
column, which must be an
integer type. This column is used to associate the row with a row in
the
topic
table (and meta-data in other tables).
Columns are defined in a Perl hash containing attributes:
-
type
(required) specifies the SQL type for the column
-
basetype
optionally specifies a column type declaration to base this type on (local attributes override those in the base type)
-
truncate_to
if set to a length, then the value of that field stored in the DB will be truncated to that length. This only affects searching. If truncate_to
is not set, then trying to store a value longer than the field accepts will be an SQL error.
-
default
can be used to provide a default for the column. If you don't give a default
, one will automatically be chosen from the column type.
-
unique
may be set to the name of a uniqueness constraint (see below)
-
primary
if true, the column will be the PRIMARY KEY for the table. This automatically enforces a uniqueness constraint on the column.
-
index
if true, then an index will be created for the column
For example, the WORKFLOW table for the
WorkflowPlugin might be:
tid => { type => 'INT', primary => 1 }
name => { type => 'TEXT', truncate_to => 30 }
Using a
basetype
_INDEXED => { type => 'INT', index => 1, unique => 'smatter' },
...
SOMEDATA => {
attr => { basetype => '_INDEXED', type=> 'TEXT', truncate_to => 20 }
}
will give a column type of
{ type => 'TEXT', index => 1, unique => 'smatter', truncate_to => 20 }
Uniqueness constraints
The
unique
column attribute is used to specify what
unique-set the
column belongs to. Unique-sets force each value of a column (or the
combined values of a set of a group of columns) to be unique in the
table. There may be more than one unique-set in a single table. For
example,
topic => {
...
web => { type => 'VARCHAR(256)', index => 1, unique => 'webtopic' },
name => { type => 'VARCHAR(128)', index => 1, unique => 'webtopic' },
...
},
specifies the unique-set
webtopic
on both
web
and
name
columns. The database will enforce the constraint that no two rows can
have the same
web
and
topic
(but two rows can have the same
web
,
so long as the
topic
is different, or the same
topic
, so long as
the
web
is different). Note that
primary
columns are automatically
unique and don't need a
unique
attribute.
_DEFAULT
The column type declaration
_DEFAULT
must exist and must be a text
type, ideally supporting arbitrary length text strings, if either of
the
{AutoloadUnknownMETA}
or
{AutoAddUnknownFields}
options are
enabled in
configure
.
SQL Types
Different database implementations have different requirements for the
best types to use with that database, so the schema has to be tailored
for the specific database being used. See
http://foswiki.org/Support/DBIStoreContribSchemas for more
information.
Search Limitations
Regular Expressions
Regular expression searches are mapped to whatever regular expression
support exists in the database. Most SQL databases support
sophisticated regular expression matching; however there are features
of the default Perl syntax supported by Foswiki that cannot be mapped
to the databases. Regular expressions written using this extended
syntax may fail.
Numeric comparison
Foswiki is built on Perl, which "knows" when two values in the
database can be compared using numeric comparison (the
, <, >, <
and
> = operators). SQL doesn't have this kind of support, and will always
use lexical comparison on strings, so be careful when comparing using
numeric comparison operators, as the results are likely to be
different to what base Foswiki would return.
Date comparison
Date conversion using the
d2n
operator is not supported with
PostgreSQL and MySQL.
Row indexes
Integer indexes are not supported. Use queries instead.
Representational types
The type defined in the schema must be long enough to store any
possible value for the given field, but be as short as possible to
maximise the DB's chance of building a decent index for it.
length of an array
The
length()
operator only works on string data, not on tables.
Unicode
Unicode characters are supported in the database by UTF-8 encoding
everything and storing as bytes. This is a compromise between the need
to store international characters, and doing so in an efficient way
(SQL Server, especially, is very bad at storing unicode natively).
If you intend to query the database outside of Foswiki, bear in mind
that all strings are UTF-8 encoded. That will also apply to any data
you have stored in form field values and extended meta-data tags.
If you are using Foswiki <2 then data will automatically be converted
from your {Site}{CharSet} to UTF-8 before saving in the database.
Installation Instructions
You do not need to install anything in the browser to use this extension. The following instructions are for the administrator who installs the extension on the server.
Open configure, and open the "Extensions" section. "Extensions Operation and Maintenance" Tab -> "Install, Update or Remove extensions" Tab. Click the "Search for Extensions" button.
Enter part of the extension name or description and press search. Select the desired extension(s) and click install. If an extension is already installed, it will
not show up in the
search results.
You can also install from the shell by running the extension installer as the web server user: (Be sure to run as the webserver user, not as root!)
cd /path/to/foswiki
perl tools/extension_installer <NameOfExtension> install
If you have any problems, or if the extension isn't available in
configure
, then you can still install manually from the command-line. See
https://foswiki.org/Support/ManuallyInstallingExtensions for more help.
- Go to
configure
and:
- Set a DSN in the 'Extensions' section (the default is for sqlite3). Example DSNs:
-
dbi:ODBC:wiki
- generic ODBC
-
dbi:Pg:dbname=foswiki
- PostgreSQL
-
dbi:mysql:database=wiki;host=mysqlserver;port=1234
- MySQL
- Select a suitable personality module for your database type.
- Select
Foswiki::Store::QueryAlgorithms::DBIStoreContrib
for the {Store}{QueryAlgorithm}
EXPERT setting 1 Select Foswiki::Store::SearchAlgorithms::DBIStoreContrib
for the {Store}{SearchAlgorithm}
setting
Loading the database
It is necessary to initialise the database from the existing
store. You can use
tools/dbistore_manage.pl
to do this.
$ cd bin
$ perl ../tools/dbistore_manage.pl --help
will give details.
The
dbistore_manage.pl
program also allows you to run Foswiki search
queries and SQL statements against the database from the command-line.
The best approach when initially setting up a DB is to pick a specific
topic and load it. For example,
$ perl ../tools/dbistore_manage.pl --reset --load Sandbox.TestTopic --query "web='Sandbox' AND topic='TestTopic'"
If you are using PostgreSQL, MySQL or SQLite, the default schema
should work fine for you. However if you are using SQL Server, you
will have to tweak the schema as descibed in "Microsoft SQL Server Notes",
below.
Testing
Basic tests for queries can be found in the
DBIStoreTest topic.
MySQL Notes
Here's how the test environment (where the database server is running
on the same host as Foswiki) is set up:
CREATE USER foswiki@localhost IDENTIFIED BY 'foswiki';
GRANT SELECT, INSERT, CREATE, DROP ON foswiki to foswiki@localhost;
FLUSH PRIVILEGES;
CREATE DATABASE foswiki;
ALTER DATABASE foswiki COLLATE utf8_unicode_ci;
We assume a UTF-8 encoding.
Postgresql Notes
Here's how the test environment (where the database server is running
on the same host as Foswiki) is set up:
CREATE USER foswiki PASSWORD 'foswiki';
CREATE DATABASE foswiki;
We allow the DBD driver to handle the encoding. This will usually be
UTF-8.
Microsoft SQL Server Notes
SQL Server is supported via ODBC. See "ODBC Notes" for more.
If you are using the FreeTDS driver, make sure your
odbc.ini
is
selecting the highest possible protocol version that your server
supports. See
http://www.freetds.org/userguide/choosingtdsprotocol.htm#FTN.AEN870
A couple of
{Extensions}{DBIStoreContrib}{DBIAttributes}
must be set
in
configure
:
{
LongTruncOk => 1,
LongReadLen => 1024
}
Note that
LongReadLen
is used to determine when to force
bound parameters to use
SQL_LONGVARBINARY
for transmission over TDS.
If you find the server is complaining about failure to convert
to the server character set, you may need to reduce (or increase!) this
limit.
Unicode
Unicode support in SQL Server consists of the UTF-16 encoding, which
uses 2 bytes per character. If you are using mainly western characters
with only a few high-bit characters, this is inefficient, but is the
only international character set support available. While there is
DBD support for unicode conversion to/from UTF-16, we found it
difficult to configure and concluded it was not worth the
effort. Anyone who wants to take the time to work out how to support
UTF-16 is very welcome to try, and contribute their results back to
the community.
The alternative we have chosen is to simply store UTF-8 octets in the
database, and pretend they are character strings. However there is no
UTF-8 collation on SQL server, so sorting (and possibly searching) of
non-western character sets will not work. However it should suffice for
users of principally western text.
Note that you must make sure you set a case-sensitive collation for
the database. For example,
ALTER DATABASE Foswiki COLLATE Latin1_General_100_CS_AS_SC;
Without this, upper and lower case strings will be compared
case-insensitively, and the DB will be unable to distinguish
ThisTopicName
and
THISTOPICNAME
.
IF YOU WANT FULL SUPPORT FOR UNICODE CHARACTERS, DON'T USE SQL SERVER
Authentication
If you are using Windows authentication for access to the database
server, then the simplest thing to do is to use the ODBC driver and
create a data source for SQL Server which uses Windows
authentication. Then set an empty username and password in the
DBIStoreContrib configuration.
Regular Expressions
SQL Server does not come equipped with regular expression matching,
which is required for Foswiki, so you will need to install a regular
expression library. The default personality module included in this
module requires the user function
dbo.fn_RegExIsMatch
to stub the
.NET
RegEx class. Instructions for building and installing this
user function can be found at
http://www.codeproject.com/Articles/19502/A-T-SQL-Regular-Expression-Library-for-SQL-Server
Schema
Depending on the version you are running you will need to replace
TEXT
types with a
VARCHAR
type in the schema. Note that index
fields are limited to 900 bytes.
SQLite Notes
SQLite requires the
pcre
module to be installed to support regular
expression searches. The path to this module is set up in
configure
.
We allow the DBD driver to handle the encoding. This will usually
be UTF-8.
ODBC Notes
DBD::ODBC can be used as an interface to a number of different
database types, and it may be the simplest way to access your
database. ODBC should be transparent; simply select the personality
that applies to your database, with an ODBC DSN. You may need to tune
the DBD::ODBC driver for use with your specific database; this can be
done using the
configure
EXPERT setting
{Extensions}{DBIStoreContrib}{DBIAttributes}
.
Note that there has been
no testing with ODBC drivers that are
compiled with unicode support enabled.
How it works
Searches in Foswiki fall into two types -
text searches and
query searches. Text searches are what you use to find text in a topic,
while query searches use a structured query language to specify
logical combinations of field values. Foswiki has two internal
configuration options,
{Store}{SearchAlgorithm}
and
{Store}{QueryAlgorithm}
, that select the algorithms that it will use
for these different types of search.
The default Foswiki query algorithm works by cherry-picking parts of
queries to create regular expression searches that can then be passed
though the search algorithm. These searches are then used as a filter
on the set of topics, to narrow down to a subset that is then "brute
force" matched against the original query to give a final set of
matching topics. This is fine when you have relatively few queries,
but is not terribly efficient if you make a lot of use of query
searches.
The DBIStoreContrib does things differently. When you select
DBIStoreContrib for the query algorithm, query searches are
automatically transformed into SQL queries, which are very
fast. Further, you can optionally choose to use the database for
standard text searches (including regular expression searches, if your
DB supports them).
DBIStoreContrib maintains the SQL database using an (included)
Foswiki plugin to "eavesdrop" on save operations in Foswiki. When a
topic is saved, it is automatically added to the database in the
background. The DB only caches the latest version of a topic; version
histories are not stored (that would be the job of a full store
implementation, which is a different project).
The database is interfaced to via the standard Perl DBI interface, so
any RDBMS that has an adapter can be used for the cache. A number of
common SQL implementations are supported out of the box. If you want
to use a different database you will have to implement a personality
module for it; if you do this, please contribute the new personality
module (and appropriate usage information) back to the community.
The schema used to represent Foswiki topics is (currently) a 1:1
mapping of the schema described in
QuerySearch. The same schema
could be used to store Foswiki topics in the actual store, and this is
one of the longer term goals. Among other things, this would allow us
to search topic histories.
The Code
Here's an overview of the important bits of the contrib:
-
lib/Foswiki/Contrib/DBIStoreContrib/HoistSQL.pm
- code that hoists SQL statements from Foswiki queries
-
lib/Foswiki/Contrib/DBIStoreContrib/Personality/*.pm
- implementations of personality modules for different databases
-
lib/Foswiki/Plugins/DBIStorePlugin.pm
- plugin handlers (Foswiki <=1.1 only)
-
lib/Foswiki/Store/QueryAlgorithms/DBIStoreContrib.pm
- the query algorithm
-
lib/Foswiki/Store/SearchAlgorithms/DBIStoreContrib.pm
- the search algorithm
-
tools/dbistore_manage.pl
- command line program for manipulating and querying the DB.
Info