What are some methods for comparing related but different text corpora? For example:

1. Product reviews of, say, 5 different models/versions of the same product (possibly by the same manufacturer).
2. Product reviews of the same product, but written by, say, 5 different age groups.

I've used 5 related-but-different corpora in each case, but pairwise comparisons are fine too.

Examples of what I mean by "compare":

* find n-grams that tend to be used more in one group than in another
* find n-grams that seem to be used with the same frequency across groups

A pairwise comparison would compare, say, Group 1 vs. Group 2 (and ignore Groups 3 through 5).
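
To make the pairwise case concrete, here is a rough sketch of the kind of comparison I have in mind: count n-grams in each group, then score each n-gram by the log-ratio of its smoothed relative frequencies. This is only one possible approach, not a recommendation. The names `ngrams`, `count_ngrams`, and `log_ratio` are made up for illustration, the two tiny review lists are invented, and the add-`alpha` smoothing and whitespace tokenization are simplifying assumptions.

```python
from collections import Counter
import math


def ngrams(text, n=1):
    """Split on whitespace and return the list of n-grams as strings."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(docs, n=1):
    """Count n-grams over all documents in one group."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc, n))
    return counts


def log_ratio(counts_a, counts_b, alpha=0.5):
    """Log-ratio of smoothed relative frequencies, group A vs group B.

    Positive: used relatively more in A. Negative: more in B.
    Near zero: similar relative frequency in both groups.
    """
    vocab = set(counts_a) | set(counts_b)
    # Add-alpha smoothing so n-grams missing from one group do not divide by zero.
    total_a = sum(counts_a.values()) + alpha * len(vocab)
    total_b = sum(counts_b.values()) + alpha * len(vocab)
    return {
        g: math.log(((counts_a[g] + alpha) / total_a)
                    / ((counts_b[g] + alpha) / total_b))
        for g in vocab
    }


# Invented toy data standing in for Group 1 and Group 2.
group_1 = ["battery life is great", "great battery"]
group_2 = ["screen is great", "screen cracked"]

scores = log_ratio(count_ngrams(group_1), count_ngrams(group_2))

# Most distinctive n-grams for each side, then the most evenly used ones.
ranked = sorted(scores, key=scores.get)
print("more in group 2:", ranked[:2])
print("more in group 1:", ranked[-2:])
print("similar usage:", sorted(scores, key=lambda g: abs(scores[g]))[:2])
```

With real corpora the raw log-ratio is noisy for rare n-grams, so some minimum count threshold or a significance test would probably be needed on top of this.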