This question about Missing functionality: Answered
Search By Relevance
A client asks:
We've had requests from some folks that search be more "relevant". We asked them, "Relevant to what?" The idea is that search results would be ordered by "metrics", e.g. "most visited" shown first, or something like that. Is that possible? Just wondering what our options are.
They're planning to install KinoSearchPlugin, but my understanding is that it will give them indexed searches and the ability to search within attachments. Nothing about "relevancy".
I'm not convinced that "relevancy" even means much in a small wiki, but... any suggestions for these people?
--
VickiBrown - 13 Feb 2014
(Screenshot: most relevant results when searching for "web server performance" using SolrPlugin)
Sorting search results by relevance is a bit harder to explain than sorting documents by their own static properties (e.g. sort by name). That's because relevance is a dynamic value computed at search time, whereas the properties of the result documents themselves are gathered at index time and are thus rather static.
Relevance is a function that computes a score for each document in the result set based on the query, the person issuing the query, and a few other factors that might play a role. Think of the relevance function as a weighted sum of factors, each contributing to the score of a result document and thereby influencing the sort order.
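To make the "weighted sum" picture concrete, here is a minimal Python sketch. It assumes the per-document factor values (which in practice depend on the query and on the person searching) have already been computed and normalized to the range [0, 1]; the factor names and weights are made up for illustration and are not SolrPlugin settings.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    name: str
    # Hypothetical factor values, already normalized to [0, 1] for the
    # current query and user, e.g. {"text_match": 0.8, "recency": 0.3}.
    factors: dict[str, float] = field(default_factory=dict)


# Hypothetical weights: how strongly each factor contributes to the score.
WEIGHTS = {"text_match": 0.7, "recency": 0.2, "popularity": 0.1}


def relevance(doc: Doc, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the document's factors; a missing factor counts as 0."""
    return sum(w * doc.factors.get(name, 0.0) for name, w in weights.items())


def sort_by_relevance(docs: list[Doc], weights: dict[str, float] = WEIGHTS) -> list[Doc]:
    """Return the documents ordered by descending relevance score."""
    return sorted(docs, key=lambda d: relevance(d, weights), reverse=True)


if __name__ == "__main__":
    docs = [
        Doc("FreshButVague", {"text_match": 0.3, "recency": 1.0, "popularity": 0.9}),
        Doc("OldButExact", {"text_match": 1.0, "recency": 0.1}),
    ]
    for d in sort_by_relevance(docs):
        print(f"{d.name}: {relevance(d):.2f}")
```

With these weights OldButExact (0.72) is listed before FreshButVague (0.50). Shifting enough weight from text_match to recency and popularity (e.g. 0.2 / 0.4 / 0.4) flips the order, which is exactly the kind of tuning decision a relevance function encodes.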
When designing a relevance function you'd like to model a kind of sorting that feels most natural to users. Things that are considered to be more "interesting" or "relevant" to the user are sorted before others.
Typical properties that play a role in relevance functions are:
- best match for the query terms: the fewer parts of the query match, the lower the score (this might also take fuzzy or phonetic matches into account)
- distance of query terms found in a result document: a two-word query matches best when both words are found adjacent in a document rather than in distant places of the same document
- most recently changed: documents that have been created or edited more recently can be considered more relevant
- number of likes
- number of times a document has been put into a favorites list
- five-star rating done by users voting on the document's importance or quality
- number of times a document has been viewed, with views from the past fading out over time
- featured documents: manually boost documents to appear at the top of certain queries
- click-through: elevate result documents that have actually been clicked on
- personal interests: metadata about the person issuing the query, probably gathered explicitly as part of a skills management effort (fields of interest, current projects, past projects)
- documents that "friends" or team members (or friends of friends) have liked or visited recently.
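A few of the factors listed above can be sketched in plain Python: query-term coverage, term proximity, recency of change, and view counts where older views fade out. This is only an illustration under stated assumptions: timestamps are seconds since the epoch, the half-lives are arbitrary choices, and the view timestamps would come from some hypothetical access log. Real engines such as Solr use considerably more elaborate text scoring.

```python
def term_coverage(query_terms: list[str], doc_terms: list[str]) -> float:
    """Best match: fraction of distinct query terms found in the document (0..1)."""
    wanted = set(query_terms)
    if not wanted:
        return 0.0
    return len(wanted & set(doc_terms)) / len(wanted)


def proximity(term_a: str, term_b: str, doc_terms: list[str]) -> float:
    """Term distance: 1.0 when adjacent, 1/gap further apart, 0.0 if a term is missing."""
    pos_a = [i for i, t in enumerate(doc_terms) if t == term_a]
    pos_b = [i for i, t in enumerate(doc_terms) if t == term_b]
    if not pos_a or not pos_b:
        return 0.0
    gap = min(abs(i - j) for i in pos_a for j in pos_b)
    return 1.0 / gap if gap > 0 else 1.0


def recency(changed_at: float, now: float, half_life_days: float = 30.0) -> float:
    """Most recently changed: 1.0 for a fresh change, halving every half_life_days."""
    age_days = max(0.0, (now - changed_at) / 86400)
    return 0.5 ** (age_days / half_life_days)


def decayed_views(view_times: list[float], now: float, half_life_days: float = 7.0) -> float:
    """View count in which each view is weighted by its own (exponentially fading) recency."""
    return sum(recency(t, now, half_life_days) for t in view_times)


if __name__ == "__main__":
    DAY = 86400
    doc = "how to tune web server performance with nginx".split()
    print(term_coverage(["web", "server", "performance"], doc))  # 1.0
    print(proximity("web", "server", doc))                       # 1.0 (adjacent)
    print(recency(0, 30 * DAY))                                  # 0.5 (one half-life old)
    print(decayed_views([0, 23 * DAY, 30 * DAY], now=30 * DAY))  # older views count less
```

Each function yields one number per document; feeding them into a weighted sum is what turns them into a single relevance score.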
Some of these factors don't really pay off on small to medium-sized wikis, as they can distort the sort order more than they help create one that feels "natural".
Small wikis suffer from sparse statistics. In a corporate intranet wiki, the single most important document may well not be visited at all anymore, because every employee of the organization has already internalized its content. Yet this one document is still considered the most important.
How to configure a search engine is thus quite specific to the use case and the data available.
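A tiny numeric illustration of the sparse-data problem: once raw view counts enter the score, the key document that nobody visits anymore can drop below a page that barely matches the query. The topics, scores and view counts below are invented for the example, not measured data.

```python
# Hypothetical topics: (name, text-match score in [0, 1], views in the last month).
DOCS = [
    ("CompanyHandbook", 0.9, 0),  # key document; everybody has internalized it
    ("LunchMenu", 0.2, 40),       # barely matches the query, but visited daily
]


def rank(docs: list[tuple[str, float, int]], view_weight: float) -> list[str]:
    """Names ordered by text match plus a bonus proportional to the raw view count."""
    ordered = sorted(docs, key=lambda d: d[1] + view_weight * d[2], reverse=True)
    return [name for name, _, _ in ordered]


print(rank(DOCS, view_weight=0.0))   # ['CompanyHandbook', 'LunchMenu']
print(rank(DOCS, view_weight=0.05))  # ['LunchMenu', 'CompanyHandbook']
```

A modest weight on popularity is enough to bury the document that matters most, because in this setting the view count says nothing about its importance.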
Most of these factors can be implemented using SolrPlugin. Some of them need additional extensions that interact with Solr; for example, we don't have a proper LikePlugin yet. It's not hard to implement, I just didn't have the time to do it. Another missing feature that plays a role in other products is "follow" or "friends". Well-known social networking sites add these into the mix as well.
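As a sketch of how such factors can be expressed on the Solr side, the request below uses Solr's standard edismax query parser: `qf` weights the fields a match is found in, and `boost` multiplies the text score by function queries. It talks to Solr's HTTP API directly, not through SolrPlugin's own configuration, which may expose this differently. The Solr URL, the core name `foswiki` and the field names `title`, `tag`, `category`, `text`, `date` and `likes` are assumptions; `likes` in particular would only exist once something like the missing LikePlugin indexes it.

```python
from urllib.parse import urlencode

# Assumed Solr location and core name; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/foswiki/select"


def solr_params(user_query: str) -> dict[str, str]:
    """edismax request mixing text relevance with recency and a hypothetical likes count."""
    return {
        "q": user_query,
        "defType": "edismax",
        # Hypothetical field names; a title match weighs five times a body-text match.
        "qf": "title^5 tag^2 category^2 text",
        # Multiplicative boost:
        # - recip(ms(NOW,date),3.16e-11,1,1) is the date-boosting recipe from the Solr
        #   function query docs: about 1.0 for a fresh change, about 0.5 after a year;
        # - log() is base 10 in Solr, so log(sum(likes,10)) is 1.0 for zero likes.
        "boost": "product(recip(ms(NOW,date),3.16e-11,1,1),log(sum(likes,10)))",
        "fl": "id,title,score",
    }


def select_url(params: dict[str, str]) -> str:
    """Full GET URL for the Solr select handler (no request is sent here)."""
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(select_url(solr_params("web server performance")))
```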
The "sort by relevance" available in SolrPlugin out of the box shows the best matches first (relevance of the documents with respect to the user's query) and breaks ties by putting the most recently changed documents first. Matches in titles score higher than matches found elsewhere, followed by a few other places where a match counts more (topic title, topic name, categories, tags, summary field, ...).
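The ordering described above can be emulated in a few lines of Python: a per-field match score in which title matches count more, with ties broken by the time of the last change. The toy term-counting scorer stands in for Solr's real scoring, and the field names and boost values are hypothetical, not the ones SolrPlugin ships with.

```python
from dataclasses import dataclass

# Hypothetical field boosts: a match in the title counts more than one in tags or body text.
FIELD_BOOSTS = {"title": 5.0, "tags": 2.0, "text": 1.0}


@dataclass
class Topic:
    name: str
    fields: dict[str, str]  # field name -> field content
    modified: int           # time of last change, seconds since the epoch


def match_score(topic: Topic, query: str) -> float:
    """Sum of field boosts for every query term found in that field."""
    terms = query.lower().split()
    score = 0.0
    for field_name, boost in FIELD_BOOSTS.items():
        words = set(topic.fields.get(field_name, "").lower().split())
        score += boost * sum(1 for t in terms if t in words)
    return score


def sort_by_relevance(topics: list[Topic], query: str) -> list[Topic]:
    """Best match first; equal scores are ordered most recently changed first."""
    return sorted(topics, key=lambda t: (-match_score(t, query), -t.modified))


if __name__ == "__main__":
    topics = [
        Topic("OldNotes", {"title": "notes", "text": "nginx config"}, modified=50),
        Topic("WebServers", {"title": "web servers", "text": "apache and nginx"}, modified=200),
        Topic("NginxTuning", {"title": "nginx tuning", "text": "worker processes"}, modified=100),
    ]
    print([t.name for t in sort_by_relevance(topics, "nginx")])
    # ['NginxTuning', 'WebServers', 'OldNotes']
```

NginxTuning wins through its title match; WebServers and OldNotes both match only in the body text, so the more recently changed WebServers comes first.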
Anything else needs more customizing.
Besides its role in sorting search results, "relevance" is used to compute "similar documents", another feature of SolrPlugin. It lets you add an extra navigation item pointing to other documents in the wiki, based on the metadata considered relevant when comparing two documents. Following these links lets you hop from topic to topic within the thematic field you are currently working on, without having to craft the links manually. All you have to do is assign proper metadata like categories and tags to your topics.
The snapshot below shows a topic about nginx on my personal wiki and a few other topics in the same web that are considered similar to it.
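The idea of "similar documents" can be illustrated with a toy metadata comparison: rank the other topics by Jaccard similarity (shared values divided by all distinct values) of their categories and tags. This is a swapped-in technique chosen for clarity, not how SolrPlugin or Solr computes similarity, and the topics and metadata are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of metadata values two topics have in common, from 0.0 to 1.0."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_topics(current: str, metadata: dict[str, set[str]], limit: int = 3) -> list[str]:
    """Other topics sharing metadata with `current`, most similar first (ties by name)."""
    base = metadata[current]
    scored = [
        (jaccard(base, meta), name)
        for name, meta in metadata.items()
        if name != current
    ]
    scored = [(s, n) for s, n in scored if s > 0]
    scored.sort(key=lambda sn: (-sn[0], sn[1]))
    return [n for _, n in scored[:limit]]


if __name__ == "__main__":
    # Hypothetical categories and tags, prefixed so a tag cannot collide with a category.
    meta = {
        "NginxTuning": {"category:WebServer", "tag:nginx", "tag:performance"},
        "NginxSsl": {"category:WebServer", "tag:nginx", "tag:ssl"},
        "ApacheTuning": {"category:WebServer", "tag:apache", "tag:performance"},
        "LunchMenu": {"category:Office"},
    }
    print(similar_topics("NginxTuning", meta))  # ['ApacheTuning', 'NginxSsl']
```

Topics without any shared metadata (LunchMenu here) never show up, which is why assigning categories and tags consistently matters for this feature.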
(Screenshot: similar topics using the default relevance metric)
Note that documents might be considered "similar" based on quite different DataForm fields, depending on the WikiApp the current topic is part of.
(Screenshot: similar topics using a customized relevance metric)
Here the recent remake of RoboCop is considered most similar to the original RoboCop movie from 1987. Movies are considered similar based on their genre, director, writer and producer, in addition to the normal metadata gathered by the indexer.
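A customized similarity metric like the movie example can be sketched as a weighted count of shared values per DataForm field. The field weights below are hypothetical, and the movie names and credits are placeholders rather than real data; the point is only that shared writers and genres can outweigh everything else when the weights say so.

```python
# Hypothetical per-field weights: sharing a director or writer says more than sharing a genre.
FIELD_WEIGHTS = {"genre": 1.0, "director": 3.0, "writer": 2.0, "producer": 2.0}

Movie = dict[str, set[str]]  # DataForm field name -> set of values


def movie_similarity(a: Movie, b: Movie) -> float:
    """Weighted number of values the two movies share, field by field."""
    return sum(
        weight * len(a.get(name, set()) & b.get(name, set()))
        for name, weight in FIELD_WEIGHTS.items()
    )


def most_similar(current: str, catalog: dict[str, Movie]) -> list[str]:
    """Other movies ordered by similarity to `current`, best first."""
    others = [name for name in catalog if name != current]
    return sorted(others, key=lambda n: movie_similarity(catalog[current], catalog[n]), reverse=True)


if __name__ == "__main__":
    # Placeholder data, not real movie credits.
    catalog = {
        "OriginalMovie": {"genre": {"Action", "SciFi"}, "director": {"Director A"},
                          "writer": {"Writer A", "Writer B"}},
        "Remake": {"genre": {"Action", "SciFi"}, "director": {"Director B"},
                   "writer": {"Writer A", "Writer B", "Writer C"}},
        "OtherActionFilm": {"genre": {"Action"}, "director": {"Director C"},
                            "producer": {"Producer A"}},
    }
    print(most_similar("OriginalMovie", catalog))  # ['Remake', 'OtherActionFilm']
```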
Bottom line: relevance is a central concept in search engines. It helps users find results faster by sorting the matching documents the way they expect them to be sorted, whatever that means for the deployed system. Sort-by-relevance does play an important role even in small wikis used for personal knowledge management. It becomes more powerful the broader the scope of the relevance function, for instance by taking more social factors into account. Be warned, however: wikis behind a firewall suffer from sparse data. Relying on the wrong metric might hurt more than it helps by distorting the sort order in unexpected ways.
(cross-posted here: http://www.michaeldaumconsulting.com/Blog/AreYourSearchResultsRelevant)
--
MichaelDaum - 13 Feb 2014
Vicki, I just went through a fresh SolrPlugin install, using the code on trunk and with Michael's guidance. I have updated the documentation and some other bits and pieces to reflect what I found. I can recommend that you try SolrPlugin, but beware that the default pages it comes with (e.g. SolrSearch) have a stack of dependencies.
--
CrawfordCurrie - 13 Feb 2014