Hi r/machinelearning,
I'm experimenting with sentiment analysis (positive/negative classification) on review text for a commercial application.
As a training set I have ~200k labeled reviews from a popular domain-specific website. I intend to experiment with training a classifier at the sentence level and at the paragraph (or complete-review) level.
The data to be classified is ~300k labeled reviews from the same domain. For legal reasons I am not able to train my classifier(s) on this data.
The approaches I am considering for constructing the feature vectors include: plain uni/bigrams, part-of-speech-filtered uni/bigrams, and part-of-speech-tagged uni/bigrams.
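For concreteness, here is a rough sketch of the three feature sets I have in mind, assuming NLTK for tokenization/tagging and scikit-learn for vectorization (the particular tag filter below is just one possible choice, not a fixed decision):

```python
import nltk
from sklearn.feature_extraction.text import CountVectorizer

# one-time setup:
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')

# keep adjectives, adverbs, and verbs (one possible filter)
CONTENT_TAGS = {'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS',
                'VB', 'VBD', 'VBG', 'VBN', 'VBP', 'VBZ'}

def pos_filtered_tokens(text):
    """Keep only tokens whose POS tag is in CONTENT_TAGS."""
    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return [w.lower() for w, t in tagged if t in CONTENT_TAGS]

def pos_tagged_tokens(text):
    """Append the POS tag to every token, e.g. 'great' -> 'great_JJ'."""
    return [f"{w.lower()}_{t}" for w, t in nltk.pos_tag(nltk.word_tokenize(text))]

# 1) plain uni/bigrams
plain_vec = CountVectorizer(ngram_range=(1, 2))

# 2) POS-filtered uni/bigrams (lowercase=False so the tagger sees original case)
filtered_vec = CountVectorizer(tokenizer=pos_filtered_tokens, lowercase=False,
                               ngram_range=(1, 2))

# 3) POS-tagged uni/bigrams
tagged_vec = CountVectorizer(tokenizer=pos_tagged_tokens, lowercase=False,
                             ngram_range=(1, 2))
```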
Anyway, on to my questions:
Is it even feasible to train a model with a feature vector this large? Even if each review were only 100 words, 200k reviews works out to ~20 million tokens, so the unigram feature space of my training set could in principle be as high as 20 million dimensions (and bigrams only push that higher).
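To show the scale I mean, here is the back-of-the-envelope calculation plus one option I've been looking at (feature hashing via scikit-learn's HashingVectorizer, with an n_features value I picked arbitrarily):

```python
# Rough scale check with my numbers (~100 tokens per review on average).
n_reviews = 200_000
tokens_per_review = 100
print(n_reviews * tokens_per_review)  # ~20M tokens: an upper bound on distinct unigrams

# Hashing features into a fixed-width sparse matrix sidesteps the vocabulary size.
from sklearn.feature_extraction.text import HashingVectorizer

vec = HashingVectorizer(ngram_range=(1, 2), n_features=2**20)  # ~1M hashed columns
X = vec.transform(["the food was great", "service was terribly slow"])
print(X.shape, X.nnz)  # sparse matrix: only non-zero entries are stored
```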
If my feature vectors are all essentially bags of words, how can I use a model trained on the vocabulary of my training set to classify the reviews in my test set? That is to say, won't there be a problem when the vocabulary of a review to be classified only partially overlaps with the vocabulary of the reviews used to train the model?
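To make the question concrete, this is the workflow I assume (toy data): fit the vocabulary on the training reviews, then transform the reviews to be classified with that same fixed vocabulary, so tokens outside it would simply be dropped:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["great food and service", "terrible, would not return"]
train_labels = [1, 0]
new_texts = ["service was great", "completely unheard-of adjective here"]

vec = CountVectorizer(ngram_range=(1, 2))
X_train = vec.fit_transform(train_texts)   # vocabulary is fixed here
clf = LogisticRegression().fit(X_train, train_labels)

X_new = vec.transform(new_texts)           # tokens outside the training vocab are ignored
print(clf.predict(X_new))
```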
Thanks :)