I am a noob machine learner, starting my career in an NLP job in a few months, and I also plan to go to grad school in the next few years. I feel clueless a lot of the time, being mainly self-taught, and I don't have anyone to talk to about machine learning or to discuss research topics and papers with.

Is there any place or website where I can go to find mentors? Is there any redditor here who would be willing to mentor me? I'll be much obliged!

submitted by lone_haranguer

↧

All code from "Machine Learning for Email" now on Github

November 15, 2011, 5:30 am

submitted by agconway

↧

Large scale ML

November 15, 2011, 4:43 pm

Hi all.

I work for a company that is looking to do large-scale data mining. I was wondering whether there are any solutions currently available for large-scale distributed data mining. Thanks!

submitted by ml_noob

↧

How to compare models with different dimensions

November 15, 2011, 8:04 pm

I have some data (~200 dimensions) and not that many examples (~300). I am trying to find the best generative model of the data. I am using GMMs to build the model, but I apply a PCA reduction first and fit the GMM over the reduced subspace. The problem arises when comparing distributions of different dimensionality. I tried cross-validating the likelihood, but in one model the input is 100-dimensional and in another it has some other dimension, so my understanding is that I cannot compare these likelihoods directly: the lower-dimensional data will have more approximation error than the higher-dimensional data (but, one would hope, fewer parameters and less fitting to noise specific to the training data). Any ideas on how to compare these in order to choose the best dimension to represent the data?
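
To make the comparability concern concrete, here is a minimal sketch in plain Python. It is not the PCA + GMM pipeline from the post: a standard Gaussian stands in for the fitted model, and the function names and sample sizes are assumptions chosen for illustration. Each model below is exactly right for its own data, yet the average log-likelihood still changes with the dimension, which suggests that raw likelihood values from spaces of different dimension are not on a common scale.

```python
import math
import random

def gaussian_logpdf(x):
    """Log-density of a standard normal N(0, I) at point x (a list of floats)."""
    d = len(x)
    return -0.5 * d * math.log(2 * math.pi) - 0.5 * sum(v * v for v in x)

def mean_loglik(dim, n_samples=2000, seed=0):
    """Average log-likelihood of samples drawn from the model itself."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        point = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        total += gaussian_logpdf(point)
    return total / n_samples

# Both models are correct for their own data, yet the scores scale with
# the dimension: in expectation, -(dim / 2) * (1 + log(2 * pi)).
ll_low = mean_loglik(dim=2)
ll_high = mean_loglik(dim=10)
print(ll_low, ll_high)
```

The higher-dimensional score comes out much lower even though neither model is "worse", so a fair comparison would need the scores expressed in a common space first.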

submitted by iHeartML

↧

How well does gradient descent work on extremely noisy data?

November 16, 2011, 6:46 am

I'm interested in using gradient descent to predict the probability that a website visitor will click on an ad, given a number of things we know about them (geographic location, browser, operating system, referrer, etc).

Typical click-rates are around 0.1%, and obviously there are a lot of factors that play a part in whether or not the user clicks that aren't represented in the input attributes. This means that from the learning algorithm's perspective the output data is extremely noisy.

How well does gradient descent work on this kind of problem where you're essentially trying to pick out comparatively subtle relationships between the input and output data amidst a lot of noise?

Would a data mining approach be more effective here?

edit: In response to some comments, yes - I would use a logistic regression of some form because the output must be a probability.

edit2: In response to those asking about my cost function - my ultimate goal is, given multiple ads to choose from to show to a user, pick the one they are most likely to click on.
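
The setup described in this post can be sketched roughly as follows: logistic regression fitted by batch gradient descent, then picking the ad with the highest predicted click probability. Everything concrete here is an assumption made for illustration: the feature names, the synthetic data, the per-ad weights, and the click rates, which are set far above the 0.1% mentioned in the post so that a small sample contains some clicks.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, features):
    """Predicted click probability for one feature vector."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))

def log_loss(weights, rows, labels):
    """Mean negative log-likelihood of the observed clicks."""
    eps = 1e-12
    total = 0.0
    for x, y in zip(rows, labels):
        p = min(max(predict(weights, x), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

def train(rows, labels, learning_rate=1.0, epochs=200):
    """Batch gradient descent on the mean log loss, starting from zeros."""
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights

# Synthetic visitors (hypothetical features): bias, is_mobile, from_search.
# Only from_search affects the click rate; is_mobile is pure noise.
rng = random.Random(0)
rows, labels = [], []
for _ in range(2000):
    is_mobile = 1.0 if rng.random() < 0.5 else 0.0
    from_search = 1.0 if rng.random() < 0.5 else 0.0
    click_prob = 0.02 + 0.03 * from_search
    rows.append([1.0, is_mobile, from_search])
    labels.append(1 if rng.random() < click_prob else 0)

weights = train(rows, labels)
initial_loss = log_loss([0.0, 0.0, 0.0], rows, labels)
final_loss = log_loss(weights, rows, labels)
print(initial_loss, final_loss)

def pick_ad(models, visitor):
    """Return the ad id whose model predicts the highest click probability."""
    return max(models, key=lambda ad: predict(models[ad], visitor))

# One weight vector per ad; the second one is made up for the example.
models = {"ad_a": weights, "ad_b": [-5.0, 0.0, 0.0]}
print(pick_ad(models, [1.0, 0.0, 1.0]))
```

Because the log loss is convex, gradient descent with a small enough step keeps lowering the training loss even when most of the variation in clicks is unexplained. How well the fitted probabilities separate ads at a real 0.1% click rate is a question of data volume, which this sketch does not address.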

submitted by sanity

↧

This Guy Broke Jeopardy’s All-Time Record… Using ML Techniques To Train Himself

November 16, 2011, 9:30 am

submitted by cavedave

↧

More courses from Stanford

November 17, 2011, 7:52 am

Link to Probabilistic Graphical Models, starting in January. More courses are listed at the bottom of the page (ML, NLP, Game Theory, etc.).

submitted by amair

↧

Using fmin from scipy optimize

November 17, 2011, 10:03 am

Is using the fmin functions from scipy.optimize a good way to train neural nets? Also, in neural nets, why are the weights of a perceptron used as multipliers? Has anyone tried using powers, or something more complicated, for each weight instead?
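
For reference, a minimal sketch of what training a tiny network with `scipy.optimize.fmin` could look like. It assumes NumPy and SciPy are installed; the 2-2-1 architecture, the XOR data, the starting point, and the iteration limits are all choices made for this example and do not come from the post.

```python
import numpy as np
from scipy.optimize import fmin

# XOR: a tiny problem that a 2-2-1 network can represent.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

def forward(params, inputs):
    """2-2-1 network; params is a flat vector of 9 weights and biases."""
    w1 = params[0:4].reshape(2, 2)   # input -> hidden weights
    b1 = params[4:6]                 # hidden biases
    w2 = params[6:8]                 # hidden -> output weights
    b2 = params[8]                   # output bias
    hidden = np.tanh(inputs @ w1 + b1)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))

def loss(params):
    """Mean squared error over the four XOR cases."""
    return float(np.mean((forward(params, X) - y) ** 2))

rng = np.random.default_rng(0)
start = rng.normal(scale=0.5, size=9)

# fmin runs the derivative-free Nelder-Mead simplex search.
best = fmin(loss, start, maxiter=2000, maxfun=4000, disp=False)
print(loss(start), loss(best))
```

`fmin` never uses gradients, so it only needs the loss function. That is convenient for nine parameters, but a simplex search tends to slow down as the number of weights grows, so this sketch says little about larger networks.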

submitted by marshallp

↧

Can Big Data Fix Healthcare?

November 17, 2011, 12:20 pm

submitted by ohsnaaap

↧

ML for making code recommendations (people that called X also called Y)

November 17, 2011, 11:49 pm

Interesting research project at http://eclipse.org/recommenders that uses static analysis and ML to build code-recommendation tools for the Eclipse IDE.

submitted by microbiotic

↧

Data Scientist vs Statistician?

November 18, 2011, 6:55 pm

Hi, I thought this would be the most appropriate subreddit for this kind of thing. My question is: what exactly is the difference between the two? I tried googling for an answer, but most people either dodge the question or give an inaccurate description of statisticians.

submitted by SinisterSamurai

↧

Am I planning it right for a PhD in ML?

November 20, 2011, 2:01 am

I am a Master's student in EE from a pretty decent school, specializing in Image Processing and Machine Learning. I want to do a PhD in the long run, but I want to be fully prepared, armed with a solid knowledge of the basics and good experience in the field before taking my 5-year plunge. Basically, when I start my PhD, I want to hit the ground running, doing research, and not waste the first year just studying the basics.

Toward this end, my plan after my Master's is to do a research internship in some lab for a year and hopefully get one or more publications out.

I just want to ask you guys, will this really help my PhD applications? I want to get into a good PhD program.

What about working in an ML startup? I know there are a LOT of companies out there with TONS of data. Will it boost my resume if I work at such a core ML company (even if only a small startup) and gain good knowledge, but get no publications?

Thanks a lot!

submitted by marshmallowsOnFire

↧

Ask r/ML: What to do when the size of the feature set is much larger than the training set?

November 20, 2011, 10:37 am

Hi,

As the title asks, what should I do? Pointers to resources I can read about this would be great too. Thanks.

EDIT: Was going to use NN and SVM...

submitted by tshauck

↧

I am interested in applying statistics/machine learning to the field of finance

November 20, 2011, 6:25 pm

I have a very strong background in finance, but a much weaker background in statistics and computer science. I am interested in learning this topic to apply it to my work in financial research.

To be more specific, I am interested in using historical data about a company (a company's 10-K, for example, which includes the income statement, balance sheet, etc.) to predict its credit rating, which is assigned by a third party. There are a number of academic articles on the topic that suggest high accuracy. However, these articles obviously aren't aimed at someone just exploring the field.

So where do I begin? If I wanted to accomplish something like this and get my feet wet, what software do I need? What books or references should I have to learn more? Short of returning to school, what are some of the basic books I should read, if any (on linear algebra, for example)?

Thanks for your comments.

submitted by rrbest

↧