Newbie question. I just started my master's in computer science and I need to propose a research project for my thesis. One thing I'm interested in is applying NLP techniques to endangered languages, specifically pre-Hispanic languages of Central America.
Originally, I proposed a semantic search engine (one that searches for concepts rather than just words). However, my proposal was rejected because my thesis adviser suggested that I needed a linguistics grad student who specializes in Amerindian languages on board (since I need to understand the semantics, morphology, and grammar of the language). This last point seems a bit odd to me, since I would've guessed that NLP is based on statistical methods, and I'm not sure deep knowledge of a particular language is required to use them. I do understand that at least some familiarity with a language's general structure is needed, and the more support the better, but would relying on published references about a language's structure really be such a step down? (Put differently: what's the minimum amount of Basque that a native English speaker with zero knowledge of the language would need in order to build a semantic search engine using NLP techniques?)
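For concreteness, here's roughly the kind of thing I have in mind: rank documents by embedding similarity to the query instead of exact keyword overlap. This is just a toy sketch with made-up three-dimensional vectors; real ones would come from a model trained on the target language.

```python
import math

# Toy word vectors (hypothetical; real ones would be learned from a corpus)
VECTORS = {
    "dog":   [0.9, 0.1, 0.0],
    "puppy": [0.85, 0.2, 0.05],
    "car":   [0.0, 0.1, 0.95],
}

def embed(text):
    """Average the vectors of the known words in the text."""
    words = [w for w in text.lower().split() if w in VECTORS]
    if not words:
        return [0.0] * 3
    dims = zip(*(VECTORS[w] for w in words))
    return [sum(d) / len(words) for d in dims]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def search(query, docs):
    """Return docs ranked by semantic similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

# "puppy" never appears in either doc, but the dog doc ranks first
print(search("puppy", ["the dog barked", "the car stalled"]))
```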
I do know a lot of people who speak one of these languages but don't have a linguistics background. They could validate results as users, but couldn't help me much beyond that.
Since a student with the profile I'm looking for doesn't exist (at my school), I need to work around that. So my question is: what kind of projects involving NLP (and possibly deep learning) could I realistically work on over the next 1.5-2 years?
Here's some other info:
I'm fairly new to language technologies and just built my first language-processing application, which generates a word cloud from different Twitter accounts. I'll be taking my first NLP course next semester, though.
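For context, the core of that app is essentially the following (simplified; the sample tweets stand in for text fetched from the Twitter API, and the third-party wordcloud package does the rendering):

```python
from wordcloud import WordCloud

# Placeholder tweets; in the real app these come from the Twitter API
tweets = [
    "Learning NLP one step at a time",
    "NLP for endangered languages matters",
    "Another day, another word cloud",
]

# Join all tweets into one blob; WordCloud sizes words by frequency
text = " ".join(tweets)
cloud = WordCloud(width=800, height=400, background_color="white").generate(text)
cloud.to_file("cloud.png")  # write the rendered image to disk
```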
I do not intend to push the state of the art in language technology (I think that's unrealistic); rather, I want to use existing techniques in a novel way that could be socially relevant.
I was thinking of building n-gram models for different indigenous languages, but I'm not sure that would be enough for a master's research project. They could be useful for something else, though.
If building n-grams is something interesting, how large should my corpus be? For starters, I know the Nahuatl Wikipedia has around 10,000 articles. I know the more the better, but roughly how many articles would it take to start yielding interesting results?
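To be clear about what I mean by "building n-grams", here's a minimal sketch of counting n-gram frequencies over tokenized text. The sample string is just a placeholder for text extracted from, say, the Nahuatl Wikipedia dump:

```python
from collections import Counter
import re

def ngrams(tokens, n):
    """Yield all n-grams (as tuples) from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))

corpus = "in the beginning the word was the word"  # placeholder text
tokens = re.findall(r"\w+", corpus.lower())

# Count bigram frequencies across the corpus
bigrams = Counter(ngrams(tokens, 2))
print(bigrams.most_common(3))  # e.g. [(('the', 'word'), 2), ...]
```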
Any help, advice, or pointers to other resources would be greatly appreciated!