Channel: Machine Learning

Mixture of Gaussians with TFIDF sparse vectors


Hi guys,

I'm a complete newbie when it comes to Machine Learning (and CS in general; I've only had two semesters' worth of courses). I'm trying to write an algorithm for document classification for an internship, and I'm feeling out of my league.

Right now I've got approximately 2000 documents I need to classify, and that number is expected to grow over time. I've got tfidf weightings for each document, so right now I'm trying to write a Gaussian mixture model using the sparse vector of each document's tfidf weights (right now there are about 44000 unique words after normalization, so that's how many dimensions I've got).
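For context, here's a minimal sketch of the kind of tf-idf sparse matrix I mean (using scikit-learn's TfidfVectorizer; the toy documents are placeholders, not my real data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "the quick brown fox jumps over the lazy dog",
    "machine learning methods for document classification",
    "tfidf weighting of sparse document vectors",
]

vectorizer = TfidfVectorizer()       # builds the vocabulary and idf weights
X = vectorizer.fit_transform(docs)   # scipy.sparse CSR matrix, shape (n_docs, n_terms)

print(X.shape)  # one row per document, one column per unique term
```

With my real corpus the matrix would be roughly 2000 x 44000, but each row only has nonzero entries for the terms that actually appear in that document.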

Things seem to be blowing up, though--I can't reasonably compute a 44000x44000 covariance matrix for each Gaussian per iteration, so I'm just using diagonal matrices (the variance of each term). But then the variances turn out to be so small that when it's time to evaluate the exp() part of this density: http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Non-degenerate_case, multiplying by the inverse covariance makes the resulting scalar too huge to compute. Right now I'm trying to standardize my tfidf scores by subtracting the mean and dividing by the standard deviation, but that hasn't seemed to help with the size of the resulting scalar.
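To make the overflow concrete: in high dimensions the exponent in the Gaussian density is a sum of ~44000 terms, so exp() of it overflows (or underflows) almost immediately. A common workaround is to stay in log space and only exponentiate differences of log-densities via log-sum-exp. A sketch of that for a diagonal-covariance mixture (the means, variances, and weights here are made-up stand-ins, not fitted values):

```python
import numpy as np
from scipy.special import logsumexp

def diag_gaussian_logpdf(X, mean, var):
    """Log-density of a diagonal-covariance Gaussian at each row of X.

    log N(x; mu, diag(var)) =
        -0.5 * ( d*log(2*pi) + sum(log var) + sum((x - mu)**2 / var) )
    """
    d = X.shape[1]
    return -0.5 * (d * np.log(2 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum((X - mean) ** 2 / var, axis=1))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1000))                 # 5 points, 1000 dimensions
means = [np.zeros(1000), np.ones(1000)]
vars_ = [np.full(1000, 0.5), np.full(1000, 2.0)]
log_weights = np.log([0.5, 0.5])

# Per-component log joint: log pi_k + log N(x; mu_k, diag(var_k)).
# These values are hugely negative, but that's fine -- we never exp() them directly.
log_joint = np.stack(
    [lw + diag_gaussian_logpdf(X, m, v)
     for lw, m, v in zip(log_weights, means, vars_)],
    axis=1,
)

# E-step responsibilities via log-sum-exp: subtracting the per-row
# normalizer in log space keeps every exponent <= 0, so np.exp is safe.
log_resp = log_joint - logsumexp(log_joint, axis=1, keepdims=True)
resp = np.exp(log_resp)
print(resp.sum(axis=1))  # each row sums to 1
```

The other standard guard is a variance floor (e.g. `var = np.maximum(var, 1e-6)`) so near-zero per-term variances don't blow up the `1/var` terms in the first place.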

I really don't know what I'm doing. Is MoG the wrong approach to this? Apparently LDA is the best thing for this sort of thing, but from what I understand that would definitely be out of my league.

I guess I don't know what I'm asking, exactly, but if anyone could provide some insight as to how I'm approaching this incorrectly, or what might be a better strategy, I would really appreciate it.

submitted by ColonelHapablap
