Hello all,
I'm writing a crossword generator, and the generated crosswords don't look very natural compared to real ones. I am considering using machine learning to learn where each new word should be placed on the grid.
I have two issues:
I can't think of a generic machine learning technique that could be used. This is quite all right, though: I think I can get away with using a novel stochastic model to describe the process and fitting it to the data with a genetic algorithm (or some other metaheuristic technique).
I can't seem to find a dataset to work with. The only datasets I have found on the Internet are lists of words. For my needs, I require a dataset showing the start and end positions of words on the grid; I don't even really need it to contain the words themselves. Does anyone know if such a dataset exists and, if so, how one would go about getting their hands on it?
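To make the format concrete, here is a rough sketch of the kind of records I mean. It assumes a grid is given as a list of strings with `#` for blocked cells and `.` for open cells; the `extract_slots` name, the `(start, end, direction)` tuple layout, and the minimum slot length of 2 are all just my own illustration, not an existing standard.

```python
from typing import List, Tuple

# (start_cell, end_cell, direction), where cells are (row, col) pairs
Slot = Tuple[Tuple[int, int], Tuple[int, int], str]


def extract_slots(grid: List[str], block: str = "#", min_len: int = 2) -> List[Slot]:
    """Return the start and end cell of every across/down run of open cells."""
    slots: List[Slot] = []
    rows, cols = len(grid), len(grid[0])

    # Across slots: scan each row for maximal runs of non-blocked cells.
    for r in range(rows):
        c = 0
        while c < cols:
            if grid[r][c] == block:
                c += 1
                continue
            start = c
            while c < cols and grid[r][c] != block:
                c += 1
            if c - start >= min_len:
                slots.append(((r, start), (r, c - 1), "across"))

    # Down slots: same scan, column by column.
    for c in range(cols):
        r = 0
        while r < rows:
            if grid[r][c] == block:
                r += 1
                continue
            start = r
            while r < rows and grid[r][c] != block:
                r += 1
            if r - start >= min_len:
                slots.append(((start, c), (r - 1, c), "down"))

    return slots


# Hypothetical 3x3 pattern, purely for illustration.
example = ["..#",
           "...",
           "#.."]
for slot in extract_slots(example):
    print(slot)
```

So a dataset of bare grid patterns (blocked vs. open cells) would already be enough for me, since the word positions can be derived from it.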
Thank you in advance!