The Foxtrot recommender system recommends on-line research papers from a dynamic database. An ontology is used to represent user profiles. This allows ontological relationships to be used to infer more about a profile than can be observed directly from user behaviour, and allows a shared training set to be used for multi-class classification. Profiles are also visualized in terms users understand, allowing direct profile feedback. A year-long trial with over 300 people is currently underway to evaluate this system while it provides a real-world service to the staff and students of Southampton University.
Recommender systems, user profiling, ontology, world wide web
The web is increasingly becoming the primary source of research papers for the modern researcher. We address the problem of recommending on-line research papers to over 300 computer science staff and students at Southampton University for a full academic year.
Recommender systems can help deal with the mass of content available on the World-Wide Web. They remove the burden of explicit search queries by learning profiles of the sort of things relevant to users, then recommending new items that similar people have liked or that are similar to previously relevant items.
The Foxtrot recommender system is a hybrid recommender system and searchable paper database. Collaborative and content-based recommendation are both supported, in addition to direct database searching. The research paper database is classified using a research paper topic ontology. Users are monitored as they browse the web, and any papers they find are added to the database. Figure 1 shows an overview of the Foxtrot architecture.
Foxtrot uses the research paper topic ontology to infer interests beyond those seen directly in observed behaviour. Interest profiles are represented using ontological terms, and users can visualize and update their profiles as they see fit.
Foxtrot is an advanced evolution of the Quickstep recommender system [5].
We use a web proxy to unobtrusively monitor each user's web browsing, adding new research papers to the central database as users discover them. The research paper database thus acts as a pool of shared knowledge, available to all users via search and recommendation. Figure 2 shows the search interface Foxtrot uses. Interest profiles are visualized to allow direct profile feedback to be elicited in addition to traditional relevance feedback on each recommendation. Figure 3 shows the interest/time graph used to visualize profiles; users can draw bars on this graph to indicate specific interest in topics.
A k-Nearest Neighbour algorithm, shown in Figure 4, is used to classify papers within the database. Papers are represented using term vectors, and inverse distance weighting is used within the term-vector space to compute the closeness of new papers to a given class. Ontology topics are represented by about 100 classes, each having 5-10 manually labelled training examples. Multi-class classification allows the training set to be shared by all users, as opposed to each user having their own personal set of positive and negative examples.
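The classification step can be illustrated with a minimal sketch. It assumes sparse term vectors stored as dictionaries, Euclidean distance, and a small smoothing constant to avoid division by zero; the function names, the value of k and the example topics are hypothetical and are not taken from the Foxtrot implementation.

```python
import math
from collections import defaultdict

def distance(a, b):
    """Euclidean distance between two sparse term vectors (dict: term -> weight)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in terms))

def classify(paper, training_set, k=3, eps=1e-9):
    """Return (topic, score) for the best-scoring class among the k nearest examples.

    training_set is a list of (term_vector, topic) pairs shared by all users.
    Each neighbour votes for its topic with weight 1 / distance (assumed weighting).
    """
    nearest = sorted((distance(paper, vec), topic) for vec, topic in training_set)[:k]
    scores = defaultdict(float)
    for dist, topic in nearest:
        scores[topic] += 1.0 / (dist + eps)
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical shared training set: a few labelled examples per topic.
training_set = [
    ({"agent": 1.0, "recommender": 1.0}, "Recommender systems"),
    ({"agent": 1.0, "profile": 1.0}, "Recommender systems"),
    ({"hypertext": 1.0, "link": 1.0}, "Hypermedia"),
    ({"hypertext": 1.0, "navigation": 1.0}, "Hypermedia"),
]
topic, score = classify({"recommender": 1.0, "profile": 1.0}, training_set)
print(topic)
```

Because every paper is assigned to one of the ontology classes rather than to a per-user "relevant" or "not relevant" label, the same labelled examples serve all users.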
Daily profiles are computed by correlating previously browsed research papers with their classification, and storing the profiles in terms of the topics within the ontology. A time decay function weights recently seen papers as being more important than older ones. User feedback also adjusts the interest of topics within the profile. Ontological relationships between topics of interest are used to infer other topics of interest, which might not have been browsed explicitly; an interest value in a specific class adds 50% of the value to its super-class. Figure 5 shows an example of the profiling algorithm in action.
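A minimal sketch of this profiling step follows. The 50% super-class contribution comes from the description above; the decay function (1 / (1 + age in days)), the ontology fragment and all names are assumptions made for illustration, and user feedback is omitted. The sketch propagates interest one level up only, which is one reading of the rule.

```python
from collections import defaultdict

# Hypothetical fragment of a topic ontology: topic -> super-class (None at the root).
SUPER_CLASS = {
    "Recommender systems": "Agents",
    "Agents": "Artificial intelligence",
    "Artificial intelligence": None,
}

def time_decay(age_days):
    """Assumed decay function: recently seen papers count more than older ones."""
    return 1.0 / (1.0 + age_days)

def build_profile(browsed, super_class=SUPER_CLASS, share=0.5):
    """Compute topic interest values from (topic, age_days) pairs of browsed papers."""
    direct = defaultdict(float)
    for topic, age_days in browsed:
        direct[topic] += time_decay(age_days)

    # Inferred interest: each directly observed topic adds 50% of its
    # value to its immediate super-class.
    profile = defaultdict(float, direct)
    for topic, value in direct.items():
        parent = super_class.get(topic)
        if parent is not None:
            profile[parent] += share * value
    return dict(profile)

# Two papers on the same topic, browsed today and yesterday.
profile = build_profile([("Recommender systems", 0), ("Recommender systems", 1)])
print(profile)
```

Here the user never browsed a paper classified as "Agents", yet the profile still records an interest in it because it is the super-class of a browsed topic.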
The current topics of interest are taken from the user's profile, and a list of similar people is computed by applying Pearson-r correlation to all user profiles. Recommendations are thus papers on the current topics of interest that have also been read by people similar to that user. Figure 6 shows an example of the recommendation algorithm in action.
The Foxtrot recommender system uses an ontology-based profile representation, with each class representing a paper topic. This allows a multi-class approach to classification, where the training set of examples can be shared. Traditional profile representations hold personal sets of positive and negative examples that cannot be shared. Ontological relationships between classes are used to infer more information about a profile than is seen explicitly from a user's behaviour, and the profiles are visualized to allow users to update their own profiles. We expect this novel approach will offer advantages over traditional approaches to recommendation.
We are currently evaluating Foxtrot over a full academic year, providing the system to over 300 computer science staff and students. Users are randomly split into two groups; one group can visualize their profiles and one cannot. We will thus test the overall recommender performance and the degree to which direct profile feedback affects performance on a real-world problem.
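The recommendation step can be sketched as follows, assuming profiles are dictionaries of topic interest values. The number of current topics, the number of similar users and all function and variable names are hypothetical choices for illustration, not details of the Foxtrot implementation.

```python
import math

def pearson_r(p, q, topics):
    """Pearson-r correlation of two profiles over a common list of topics."""
    xs = [p.get(t, 0.0) for t in topics]
    ys = [q.get(t, 0.0) for t in topics]
    n = len(topics)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / (sx * sy)

def recommend(user, profiles, read_by, paper_topic, top_topics=2, neighbours=1):
    """Papers on the user's current topics that similar users have read."""
    topics = sorted({t for p in profiles.values() for t in p})
    mine = profiles[user]
    current = sorted(mine, key=mine.get, reverse=True)[:top_topics]
    similar = sorted(
        (u for u in profiles if u != user),
        key=lambda u: pearson_r(mine, profiles[u], topics),
        reverse=True,
    )[:neighbours]
    seen = read_by.get(user, set())
    candidates = set().union(*(read_by.get(u, set()) for u in similar)) - seen
    return sorted(p for p in candidates if paper_topic[p] in current)

# Hypothetical profiles, reading histories and paper classifications.
profiles = {
    "alice": {"Agents": 0.9, "Hypermedia": 0.1, "Machine learning": 0.5},
    "bob": {"Agents": 0.8, "Hypermedia": 0.2, "Machine learning": 0.6},
    "carol": {"Agents": 0.1, "Hypermedia": 0.9, "Machine learning": 0.2},
}
read_by = {"alice": {"p1"}, "bob": {"p1", "p2", "p3"}, "carol": {"p4"}}
paper_topic = {"p1": "Agents", "p2": "Agents", "p3": "Hypermedia", "p4": "Hypermedia"}
print(recommend("alice", profiles, read_by, paper_topic))
```

In this example the profile most correlated with alice's is bob's, so only papers bob has read are considered, and of those only the unseen ones on alice's current topics are returned.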
The Foxtrot trial is due to finish in July 2002. We intend to publish our results shortly after this, as well as evaluating a variety of other profiling techniques on the behavioural log data accumulated.
This work is funded by EPSRC studentship award number 99308831.
Group Lens [3] is an example of a collaborative filter, recommending newsgroup articles based on a Pearson-r correlation of other users' ratings. Fab [1] is a content-based recommender, recommending web pages based on a nearest-neighbour algorithm working with each individual user's set of positive examples. Foxtrot is a hybrid recommender system, combining both of these approaches.
Personal web-based agents such as NewsDude [2] and NewsWeeder [4] build profiles from observed user behaviour. These systems filter news stories and recommend unseen ones based on content. Personal sets of positive and negative examples are maintained for each user's profile. In contrast, by using an ontology to represent user profiles, Foxtrot shares the training examples for all its classes among all users.
Mladenic [6] provides a good survey of text-learning and agent systems, including content-based and collaborative approaches.