Hi,
I want to select a graduate program (in philosophy, no less) that meets a number of rather odd desiderata. To do this, I want to text-mine publications and calls for papers for citations so that I can trace the citation graphs.
Later on I might use dimensionality reduction if it turns out to be relevant, but as a first approximation I don't expect the data to have that much structure.
Anyway --
1) What's a good graph visualization tool that can handle automatically generated node coloring and labeling gracefully, possibly generating dynamic visualizations in Flash or something similar? GraphViz is too simple, and Mathematica, while more serious about layout algorithms, is too awkward.
2) Besides flimsy, failure-prone text2pdf plus ad hoc regexes, what's a good general text-mining strategy for reading the bibliography and citation sections of PDF files (converted to text if necessary), so that I can get at least normalized last names and color-coded institutions?
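For the coloring-and-labeling part of question 1, one tool-agnostic approach is to assign colors in a script before the graph ever reaches a viewer. The sketch below assumes the citations have already been reduced to (citing author, cited author) pairs and that an author-to-institution mapping exists; the function name, palette, and inputs are all hypothetical. It writes GraphViz's DOT format only because that is plain text and easy to inspect. The same color map could feed whichever viewer ends up being used.

```python
from itertools import cycle

# Hypothetical palette; any list of visually distinct colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd", "#8c564b"]


def citation_graph_to_dot(citations, institution_of):
    """Return a DOT digraph in which each institution gets its own fill color.

    citations: iterable of (citing_author, cited_author) pairs (assumed input).
    institution_of: dict mapping author -> institution (assumed input).
    Names are assumed not to contain double quotes.
    """
    citations = list(citations)  # iterated twice below
    palette = cycle(PALETTE)  # colors repeat if institutions outnumber entries
    colors = {}

    def color_for(institution):
        # The first time an institution is seen, give it the next palette color.
        if institution not in colors:
            colors[institution] = next(palette)
        return colors[institution]

    # Sorting makes the color assignment deterministic across runs.
    authors = sorted({author for pair in citations for author in pair})
    lines = ["digraph citations {", "  node [style=filled];"]
    for author in authors:
        institution = institution_of.get(author, "unknown")
        # The Python literal "\\n" emits the two characters \n,
        # which DOT treats as a line break inside a label.
        lines.append(
            f'  "{author}" [label="{author}\\n{institution}", '
            f'fillcolor="{color_for(institution)}"];'
        )
    for citing, cited in citations:
        lines.append(f'  "{citing}" -> "{cited}";')
    lines.append("}")
    return "\n".join(lines)
```

Authors missing from the mapping are grouped under "unknown", so they share one color rather than breaking the run.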
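For the last-name part of question 2, surnames pulled out of PDFs usually need normalizing before they can be matched, whatever extraction step is used: accents are printed inconsistently, case varies, and PDF-to-text conversion can emit ligature characters such as "ﬁ". Below is a minimal sketch that uses only Python's standard library and assumes the surnames have already been isolated as strings (the helper names are hypothetical). It builds a comparison key with Unicode NFKD decomposition, which also expands compatibility ligatures, and then groups spelling variants under that key.

```python
import re
import unicodedata
from collections import defaultdict


def surname_key(raw):
    """Hypothetical helper: reduce a printed surname to a comparison key."""
    # NFKD splits accented letters into base letter + combining mark and
    # expands compatibility characters such as the "ﬁ" ligature into "fi".
    decomposed = unicodedata.normalize("NFKD", raw)
    # Drop the combining marks, keeping only the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, then collapse runs of whitespace, hyphens and apostrophes
    # (straight or curly) into a single space.
    return re.sub(r"[\s\-'\u2019]+", " ", stripped.casefold()).strip()


def group_surname_variants(surnames):
    """Hypothetical helper: map each key to the set of raw spellings seen."""
    groups = defaultdict(set)
    for name in surnames:
        groups[surname_key(name)].add(name)
    return dict(groups)
```

Because the key discards accents and case, it will also merge different people who share a surname. First initials or institutions would be needed to tell them apart.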