I'm looking for suggestions, and I'm not sure of a better place to ask this.
Simply put, I've been assigned to build a recommendation system which suggests learning materials to students after they take a quiz.
The current system: students anonymously go to the website, choose their book's ISBN, chapter, and lesson, and take a self-assessment quiz. If their score is below a certain percentage, they are 'remediated' and given the textbook page number of an example associated with that question (this mapping is all in the database). They can then email their scores to their teacher.
Now, there are thousands of digital learning materials collected over the years, including videos and other instructive materials. The point of the project is to incorporate these materials into the old quiz system and 'intelligently' give the student suggestions for remediation (these suggestions are in the form of links, to my knowledge). Note that this whole project is more or less a proof of concept that is intended to be applied to some of their more modern quiz systems.
Data available: each quiz contains the book title, chapter title, lesson title, the actual question, and an optional hint for the student. The new materials have metadata such as a description, grade level, difficulty (remedial/basic/advanced), etc. Each material is categorized into a certain group, for instance "Expressions" in mathematics. The idea is that the questions will also be associated with these groups in order to narrow down the list of materials retrieved, i.e. get all materials associated with the category that question is in (a rough sketch of that lookup is below).
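To make that concrete, here's a minimal sketch of the category-based lookup, assuming a relational store and hypothetical table/column names (questions.category_id, materials.category_id); none of these names come from the existing system.

```python
import sqlite3

def materials_for_question(conn: sqlite3.Connection, question_id: int) -> list:
    """Return all learning materials in the same category as a given question."""
    return conn.execute(
        """
        SELECT m.id, m.title, m.description, m.level
        FROM materials AS m
        JOIN questions AS q ON q.category_id = m.category_id
        WHERE q.id = ?
        """,
        (question_id,),
    ).fetchall()
```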
For starters, a user registration system is needed to store individual quiz score history. I suppose the kinds of additional data to collect depend on the algorithm that will learn from them. I've basically been given free rein to choose what modifications need to be made and what kinds of data to store; a rough guess at the extra tables is sketched below.
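For what it's worth, this is the kind of schema I have in mind for score history and (later) implicit feedback. All table and column names here are my own assumptions, not anything that exists yet.

```python
import sqlite3

def create_history_tables(conn: sqlite3.Connection) -> None:
    """Create hypothetical tables for per-student quiz history and material views."""
    conn.executescript(
        """
        -- one row per registered student
        CREATE TABLE IF NOT EXISTS students (
            id       INTEGER PRIMARY KEY,
            username TEXT UNIQUE NOT NULL
        );

        -- one row per completed quiz attempt
        CREATE TABLE IF NOT EXISTS quiz_attempts (
            id          INTEGER PRIMARY KEY,
            student_id  INTEGER NOT NULL REFERENCES students(id),
            isbn        TEXT NOT NULL,
            chapter     TEXT NOT NULL,
            lesson      TEXT NOT NULL,
            category_id INTEGER,               -- e.g. the "Expressions" group
            score       REAL NOT NULL,         -- percentage, 0-100
            taken_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );

        -- one row each time a recommended material is viewed;
        -- used later to derive implicit ratings
        CREATE TABLE IF NOT EXISTS material_views (
            id          INTEGER PRIMARY KEY,
            student_id  INTEGER NOT NULL REFERENCES students(id),
            material_id INTEGER NOT NULL,
            category_id INTEGER,
            viewed_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
        """
    )
```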
I've looked at various papers on the topic, and this one seems pretty relevant: http://www.ascilite.org.au/ajet/ajet26/ghauth.pdf
The authors use a combination of cosine similarity between materials and a user rating system. In my case, I could precompute the similarity between each question and the materials in the same category (though I guess that would have to be redone each time a new material is added). This would compare the data I know about the question against the data I know about the materials. I don't think I want students explicitly rating materials; instead, I would assign a rating implicitly based on whether or not a material helped them on their next quiz (e.g. they viewed a material in "Expressions" and the next time they encounter "Expressions" in a quiz their score increases). A rough sketch of both pieces follows.
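This is a sketch of what I have in mind, not the paper's exact method: TF-IDF over the question text and material metadata for the cosine-similarity part, and a crude score-delta mapping for the implicit rating. Field names like material['description'] are assumptions about my own data, and the rating scale is arbitrary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_materials(question_text: str, materials: list[dict]) -> list[tuple[float, dict]]:
    """Rank same-category materials by cosine similarity to a question's text."""
    docs = [question_text] + [f"{m['title']} {m['description']}" for m in materials]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    sims = cosine_similarity(tfidf[0:1], tfidf[1:]).ravel()
    return sorted(zip(sims, materials), key=lambda pair: pair[0], reverse=True)

def implicit_rating(score_before: float, score_after: float) -> float:
    """Map the change in a student's category score (0-100) to a pseudo-rating
    in [0, 1]: no change -> 0.5, large improvement -> 1.0, large drop -> 0.0."""
    return max(0.0, min(1.0, 0.5 + (score_after - score_before) / 100.0))
```

The TF-IDF model would need refitting whenever new materials are added, which is the preprocessing step I mentioned above.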
Is this a good approach to take? Is there a way to use the additional metadata about students to do more than simple heuristics? I would definitely appreciate any suggestions, as this is the first machine-learning project I've ever tried to tackle. Thanks!