Document representation for information retrieval systems

Hi everyone, I am trying to figure out which document representation is best for an information retrieval (IR) system. There are two popular state-of-the-art options: (1) BOW (Bag Of Words), adopted by IR systems like Lucene (http://lucene.apache.org/core/); (2) BOF (Bag Of Features), used by LETOR (http://research.microsoft.com/en-us/um/beijing/projects/letor/). BOW is apparently able to capture the semantics of a document, but only with small vocabularies (e.g., 5000 words), so it does not scale. BOF, on the other hand, is independent of the vocabulary size and can easily be combined with machine learning techniques to build an ad hoc ranking system, even though it does not capture semantics. The question is: which property matters most for an IR system, capturing semantics or learning how to rank a document? In other words: which is the better representation, BOW or BOF?
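To make the comparison concrete, here is a minimal sketch of the two representations in plain Python. Everything in it is an assumption made for illustration: the toy corpus, the whitespace tokenizer, the function names `bow_vector` and `bof_vector`, and the choice of three features (summed term frequency, summed smoothed IDF, document length). It is not Lucene's or LETOR's actual implementation; the features are only loosely in the spirit of those found in LETOR.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; real systems use analyzers).
    return text.lower().split()


def bow_vector(text, vocabulary):
    # BOW: one dimension per vocabulary word, so the vector length
    # grows with the vocabulary size.
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


def bof_vector(query, doc, corpus):
    # BOF: a fixed-length vector describing a (query, document) pair.
    # Its length does not depend on the vocabulary size.
    q_terms = tokenize(query)
    d_terms = tokenize(doc)
    d_counts = Counter(d_terms)
    n_docs = len(corpus)

    # Feature 1: summed frequency of the query terms in the document.
    tf_sum = sum(d_counts[t] for t in q_terms)

    # Feature 2: summed smoothed IDF of the query terms (illustrative formula).
    idf_sum = 0.0
    for t in q_terms:
        df = sum(1 for other in corpus if t in tokenize(other))
        idf_sum += math.log((n_docs + 1) / (df + 1))

    # Feature 3: document length in tokens.
    return [float(tf_sum), idf_sum, float(len(d_terms))]


# Hypothetical toy corpus.
corpus = [
    "lucene is a search library",
    "learning to rank with features",
    "search engines rank documents",
]
vocabulary = sorted({w for d in corpus for w in tokenize(d)})

bow = bow_vector(corpus[0], vocabulary)               # length == len(vocabulary)
bof = bof_vector("search rank", corpus[2], corpus)    # length == 3, always
```

Adding documents with new words makes `bow` longer, while `bof` keeps three entries and can be passed directly to a learning-to-rank model as one training row per query-document pair.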

submitted by dtosato
