I've been working on a personal project, and as a side effect I've been building a lot of parallel text corpora in language pairs that are hard to find (e.g. Vietnamese/Portuguese, Yoruba/Japanese).
In many ways, progress in natural language research is driven by the availability of data. This is particularly true for the field of statistical machine translation, which thrives on the emergence of large quantities of parallel text: text paired with its translation into a second language.
Europarl: A Parallel Corpus for Statistical Machine Translation
I know that all of this data has some value, and I'm considering setting up a site to sell it instead of letting it just sit on my hard drive. The problem is that I don't really know how to price something like this.
If you work in research that could use data like this, how much would your lab/company pay for a license?