Hi ML community,
I am trying to perform sentiment analysis (positive, negative, neutral) on tweets from a particular domain (say finance). I have two sets of entities belonging to the domain (e.g., financial organizations). My training set consists of 12k tweets mentioning entities from set 1. The first test set is 2.5k tweets about entities from set 1; the second test set is 1k tweets about entities from set 2.
My feature set consists of the standard bag of words from the training set (about 15k features), along with some orthographic and lexical features (50 in total). These include flags for punctuation, exclamation marks, and capitalization, plus counts of known positive/negative words from a lexicon.
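For concreteness, the kind of feature setup I mean looks roughly like this (the lexicons and flag definitions below are placeholders, not my actual code):

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import FeatureUnion
import numpy as np

# Placeholder lexicons -- stand-ins for whatever sentiment lexicon is used.
POS_WORDS = {"gain", "profit", "up"}
NEG_WORDS = {"loss", "down", "crash"}

class LexicalFeatures(BaseEstimator, TransformerMixin):
    """Orthographic/lexical features: punctuation, exclamation, and
    capitalization flags, plus lexicon hit counts."""
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        rows = []
        for text in X:
            tokens = text.split()
            rows.append([
                int("!" in text),                             # exclamation flag
                int(any(c in ".,;?" for c in text)),          # punctuation flag
                int(any(t.isupper() for t in tokens)),        # all-caps token flag
                sum(t.lower() in POS_WORDS for t in tokens),  # positive-word count
                sum(t.lower() in NEG_WORDS for t in tokens),  # negative-word count
            ])
        return np.array(rows)

# Bag of words (capped at ~15k) stacked with the hand-crafted features.
features = FeatureUnion([
    ("bow", CountVectorizer(max_features=15000)),
    ("lex", LexicalFeatures()),
])
```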
The problem is that although all the data belongs to the same domain, the classifier performs well on test set 1 (accuracy ~80%) but poorly on test set 2 (accuracy ~55%).
I want to know what's causing this difference in performance (through some computation/graphs if possible) and how to fix it.
So far, I have performed the following analysis on the features:

* Use only the bag-of-words features. In this case, performance on test set 1 drops by 5% and on test set 2 improves by 4%.
* Check the average proportion of bag-of-words and lexical/orthographic features activated per document for these sets. They all turn out to be pretty close to each other.
I use scikit-learn's SVM with a linear kernel. Any help or tips are appreciated.
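For reference, a minimal sketch of that classifier setup (variable names are placeholders for my data):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# LinearSVC is the usual scikit-learn choice for a linear-kernel SVM on
# sparse text features; the bag of words is capped at ~15k as described.
clf = make_pipeline(CountVectorizer(max_features=15000),
                    LinearSVC(C=1.0))
# clf.fit(train_texts, train_labels)
# clf.score(test_texts, test_labels)
```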