Mining frequent itemsets in textual documents (storage issues)

Hi guys,

I am working on a research project that aims to scan a large number of documents and identify itemsets in the form of word sequences. Another team is working on the same task using Markov chains, and we will later compare our approaches.

The problem is that the text corpus we are mining is very large: about 19 GB of text files. Whenever we detect an itemset of k words (where k <= 3), we store it in a relational DBMS together with its support count.
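To make the setup concrete, here is a minimal sketch of the counting step in Python. It is an illustration under assumptions, not the project's actual code: the function name `count_sequences` is made up, tokenization is a plain whitespace split, and "support" is taken to mean the number of occurrences of a sequence rather than the number of documents containing it.

```python
from collections import Counter

def count_sequences(tokens, max_k=3):
    """Count every contiguous word sequence of length 1..max_k.

    Hypothetical helper: support here is the raw occurrence count.
    """
    counts = Counter()
    for k in range(1, max_k + 1):
        # Slide a window of width k over the token list.
        for i in range(len(tokens) - k + 1):
            counts[tuple(tokens[i:i + k])] += 1
    return counts

# Whitespace tokenization is an assumption made for the example.
counts = count_sequences("the cat sat on the mat".split())
print(counts[("the",)])        # support of the 1-word sequence "the"
print(counts[("the", "cat")])  # support of the ordered pair "the cat"
```

Counting in memory per batch of documents and writing the aggregated counts out afterwards keeps the number of database writes far below one per detected sequence.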

However, the tables in our relational DBMS get pretty big pretty quickly, and querying the database takes a lot of time. Our queries only search by the first word in a sequence (the order of words matters in our case).
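Since lookups only ever filter on the first word, one possible layout is a single table keyed on the words in order, so that the key itself serves first-word lookups. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The table name `seq`, the column names, the use of an empty string for unused word slots, and the helpers `flush` and `by_first_word` are all assumptions, not the schema actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, for the example only
conn.execute("""
    CREATE TABLE seq (
        w1 TEXT NOT NULL,
        w2 TEXT NOT NULL DEFAULT '',
        w3 TEXT NOT NULL DEFAULT '',
        support INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (w1, w2, w3)  -- leading column w1 serves first-word lookups
    )
""")

def flush(conn, counts):
    """Merge a batch of {word tuple: count} into the table."""
    keys = [(tuple(s) + ("", ""))[:3] for s in counts]  # pad to three words
    # Create missing rows with support 0, then add the batch counts.
    conn.executemany(
        "INSERT OR IGNORE INTO seq (w1, w2, w3, support) VALUES (?, ?, ?, 0)",
        keys,
    )
    conn.executemany(
        "UPDATE seq SET support = support + ? WHERE w1 = ? AND w2 = ? AND w3 = ?",
        [(n, *k) for k, n in zip(keys, counts.values())],
    )
    conn.commit()

def by_first_word(conn, word):
    """Return all stored sequences that start with the given word."""
    return conn.execute(
        "SELECT w1, w2, w3, support FROM seq WHERE w1 = ? "
        "ORDER BY support DESC, w2, w3",
        (word,),
    ).fetchall()

flush(conn, {("the", "cat"): 2, ("the",): 3, ("a", "dog"): 1})
flush(conn, {("the", "cat"): 1})
print(by_first_word(conn, "the"))
```

Whether this helps at 19 GB scale depends on the engine and on how rows are batched, so treat it as a starting point for comparison rather than a fix.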

Does anyone have any experience with similar issues? Would it be feasible to try a NoSQL database or maybe a graph database instead?

submitted by vshehu