Writing Query Rules to Classify Resources in Delores Extensions

The power of Waypoint comes from the fact that a set of classification facets is used to pigeon-hole individual OERs. The user can then ‘filter’ the content by selecting qualifying criteria from any number of facets.

In Waypoint, individual resources are classified against each of the facets selected to describe them, using rules which have been written for a particular domain. Lucene, which does the indexing, has a query language syntax which is based on Boolean logic but has additional operators which may be used to increase the power of discrimination through such things as proximity and range measures and fuzzy search.

So, a standard Boolean rule might be written: ‘gear AND machining’, which, not surprisingly, would find all content which contains both ‘gear’ and ‘machining’.

Alternatively, a phrase: ‘“gear machining”’, with the two words in double quotes, will return only those descriptions which have these two words next to each other in this order.

This can be modified to a proximity query, so ‘“gear machining”~10’ (the tilde follows the closing quote directly) will find any occurrence of these two words where they occur within 10 words of each other in either order.
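To make the difference between the three rule types above concrete, here is a minimal sketch in plain Python. It is not Lucene and does not reproduce Lucene's analysis or scoring; the function names (`matches_and`, `matches_phrase`, `matches_proximity`) and the sample description are hypothetical, and the proximity check is a simplified reading of "within N words, in either order".

```python
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches_and(text, *terms):
    """Boolean AND: every term occurs somewhere in the text."""
    tokens = set(tokenize(text))
    return all(term.lower() in tokens for term in terms)


def matches_phrase(text, first, second):
    """Phrase: the two words are adjacent, in the given order."""
    tokens = tokenize(text)
    return any(a == first and b == second for a, b in zip(tokens, tokens[1:]))


def matches_proximity(text, first, second, distance):
    """Proximity (simplified): the two words occur within `distance`
    token positions of each other, in either order."""
    tokens = tokenize(text)
    pos_first = [i for i, t in enumerate(tokens) if t == first]
    pos_second = [i for i, t in enumerate(tokens) if t == second]
    return any(abs(i - j) <= distance for i in pos_first for j in pos_second)


# Hypothetical resource description, used only for illustration.
description = "Machining of a spur gear on a hobbing machine"

print(matches_and(description, "gear", "machining"))            # True
print(matches_phrase(description, "gear", "machining"))         # False
print(matches_proximity(description, "gear", "machining", 10))  # True
```

The same description satisfies the Boolean and proximity rules but not the phrase rule, which is the kind of difference a rule-writer has to weigh when choosing between them.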

The Lucene syntax is very powerful. This, however, brings its own problems, since the performance of the classification depends on the rules that have been written, and the rule-writer is spoilt for choice. The fact is that rule-writing is more an art than a science, and that expertise both in the practice of rule-writing and in the domain is necessary for a good domain classification.

For the novice, the best approach is to limit rules, in the first instance, to those using simple Boolean expressions. Then it is a matter of assessing the classification performance and making step-wise adjustments. Thus, existing rules are tuned to eliminate invalid classifications, and new rules are added to deal with missed resources. It is always necessary for the classification performance to be checked by domain experts. We are taking this step-wise and expert-check approach in developing the classification for Delores Extensions.
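The assess-and-adjust step described above can be supported by a simple comparison between what a rule returned and what the domain experts say it should have returned. The following is a minimal sketch, not part of Waypoint: the function name `assess_rule` and the resource identifiers are hypothetical, and it assumes the experts' judgements are available as a set of identifiers.

```python
def assess_rule(rule_hits, expert_labels):
    """Compare the resources a rule classified into a facet value with
    the resources the domain experts say belong there."""
    correct = rule_hits & expert_labels
    invalid = rule_hits - expert_labels   # classified, but should not be
    missed = expert_labels - rule_hits    # should be classified, but were not
    precision = len(correct) / len(rule_hits) if rule_hits else 0.0
    recall = len(correct) / len(expert_labels) if expert_labels else 0.0
    return {
        "invalid": invalid,
        "missed": missed,
        "precision": precision,
        "recall": recall,
    }


# Hypothetical identifiers, used only for illustration.
rule_hits = {"oer-1", "oer-2", "oer-3"}
expert_labels = {"oer-2", "oer-3", "oer-4"}

report = assess_rule(rule_hits, expert_labels)
print(sorted(report["invalid"]))  # ['oer-1']
print(sorted(report["missed"]))   # ['oer-4']
```

The invalid set points to rules that need tightening, and the missed set points to where a new rule may be needed.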

It should, however, be remembered that the OERs in Delores Extensions are pre-filtered by sux0r into two streams, so that only those engineering resources that apply specifically to engineering design are delivered to Waypoint for further classification.
