I want to build a crime index and a political instability index based on news stories.

Hello, I have this side project where I crawl the local news websites in my country and want to build a crime index and political instability index.

I have already covered the information retrieval part of the project. My plan is:

1. Unsupervised topic extraction.
2. Near-duplicate detection.
3. Supervised classification by type and incident level (crime/political, high/medium/low).

I will use Python and sklearn and have already researched the algorithms that I can use for those tasks. I think steps 1 and 2 could give me a relevance factor for a story: the more newspapers publish about a story or topic, the more relevant it is.
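A minimal sketch of how that relevance factor could be computed, assuming each story has its text and the name of the outlet that published it. The function name `relevance_by_coverage`, the 0.8 similarity threshold, and the choice of TF-IDF with cosine similarity are illustrative assumptions, not a settled design; stories are grouped into near-duplicate clusters and each story's relevance is the number of distinct outlets in its cluster.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def relevance_by_coverage(texts, sources, threshold=0.8):
    """Group near-duplicate stories and count distinct outlets per group.

    texts     -- list of story bodies
    sources   -- list of outlet names, aligned with texts
    threshold -- cosine similarity above which two stories are treated
                 as near duplicates (hypothetical value, needs tuning)
    """
    tfidf = TfidfVectorizer().fit_transform(texts)
    similarity = cosine_similarity(tfidf)

    # Treat every pair above the threshold as an edge, then take the
    # connected components of that graph as near-duplicate clusters.
    adjacency = csr_matrix((similarity >= threshold).astype(int))
    _, labels = connected_components(adjacency, directed=False)

    # Relevance of a story = number of distinct outlets in its cluster.
    relevance = []
    for label in labels:
        members = np.flatnonzero(labels == label)
        relevance.append(len({sources[j] for j in members}))
    return labels, relevance


texts = [
    "police arrest three suspects after bank robbery downtown",
    "police arrest three suspects after bank robbery downtown today",
    "parliament votes on new budget proposal",
]
sources = ["outlet_a", "outlet_b", "outlet_c"]
labels, relevance = relevance_by_coverage(texts, sources)
```

The pairwise similarity matrix grows quadratically with the number of stories, so for a large crawl this would probably need to be run per day or replaced with an approximate method.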

My next step is to build the monthly, weekly and daily indexes (nationwide and per city) based on the features that I have, and I'm a little lost here, as the "instability sensitivity" might increase over time. I mean, the index for last year's major instability incident could come out lower than the index for this year. I'm also unsure whether to use a fixed 0-100 scale or not.
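One possible way to get a fixed 0-100 scale that stays comparable across years is to score each period and then rank it against a frozen baseline period. The sketch below is only an illustration under assumptions: the severity weights, the function names `raw_scores` and `scale_to_baseline`, and the idea of multiplying severity by relevance are all hypothetical choices.

```python
import numpy as np

# Hypothetical weights per incident level; these would need calibration.
SEVERITY_WEIGHT = {"low": 1.0, "medium": 3.0, "high": 9.0}


def raw_scores(incidents, n_periods):
    """Sum severity weight times relevance for each period.

    incidents -- iterable of (period_index, severity, relevance) tuples,
                 where a period is a day, week or month
    """
    scores = np.zeros(n_periods)
    for period, severity, relevance in incidents:
        scores[period] += SEVERITY_WEIGHT[severity] * relevance
    return scores


def scale_to_baseline(scores, baseline):
    """Map raw scores to 0-100 as a percentile rank within a fixed baseline.

    Because the baseline is frozen, a given raw score always maps to the
    same index value, no matter which year it comes from.
    """
    baseline = np.sort(np.asarray(baseline, dtype=float))
    ranks = np.searchsorted(baseline, scores, side="right")
    return 100.0 * ranks / len(baseline)


incidents = [(0, "high", 2), (0, "low", 1), (2, "medium", 3)]
scores = raw_scores(incidents, n_periods=3)
index = scale_to_baseline(np.array([-1.0, 2.0, 10.0]), baseline=[0, 1, 2, 3])
```

The trade-off is that anything above the baseline maximum saturates at 100. An unbounded alternative, such as a z-score against the baseline, would keep extreme periods distinguishable at the cost of the fixed scale.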

I would appreciate any pointer to a paper, relevant readings or thoughts.

Thanks.

submitted by alzwke