Or, ‘How I went round and round in circles….. and then round and round some more’
One of the initial ideas that was suggested back at the beginning of the project was a way of mining course data in order to provide suggestions for similar courses. Since we now have access to the APMS data (a lot more data than my dummy set), finding ways of suggesting similar courses is now something that I can attempt properly.
The first step in the process was deciding on a method of finding keywords from within the various text descriptions of programmes and modules. OpenCalais, which has been used in a previous project at the university – JISCPress, was one such option.
OpenCalais, which has an easy to use API, takes a body of text and returns a series of keywords, broken down by type, and their relevancy score, which indicates how strongly an identified keyword is relevant to the body of text. Initially I looked at using an existing PHP library that would interface with the API (reinventing the wheel and all that), but found that they did not return all of the data I wanted in a easy to use manner, so I wrote my own code, which will be available on Github.
When I first started this process, I had access to data relating to 6436 modules of study, which are part of 878 individual programmes of study for a total of 349 courses. (If one course has two different years’ intake, it will be represented by 2 programmes. A similar situation exists with modules.)
In the first instance, I looked at generating keywords for all programmes of study (a mistake, which I cover later) and modules. I decided to use the following fields to generate keywords for programmes:
- ‘Aims and Objectives’
- ‘Introduction’
- ‘Specialist Facilities’
- ‘Career Opportunities’
I also used the ‘Synopsis’ field for modules.
This process generated 3,335 keywords, broken down into 36 types of keywords. 17,835 links were generated between programmes of study and keywords and 19,255 links between keywords and modules.