
ML expert opinion on toy example/data?


I spend quite a bit of my time fitting (and re-fitting) complicated, physically motivated non-linear functions to data. I would LOVE to get some kind of machine learning to take over at least some of the burden (and the error-prone human element). That said, I've hit a wall. I came up with a simple toy example that I hope an expert in machine learning/statistics can look at and say "oh yeah, just do X and you're set". I feel that if someone can get me past this plateau, I can charge ahead on the problem myself.

Here is the toy example:

[Figure: simple plot of the two Gaussian profiles and the noisy combined measurement]


And the Python code that generates everything:

import pymc as mc
import numpy as np
import pylab as pl


def GaussFunc(x, amplitude, centroid, sigma):
    return amplitude * np.exp(-0.5 * ((x - centroid) / sigma)**2)


wavelength = np.arange(5000, 5050, 0.02)

# Profile 1
centroid_one = 5025.0
sigma_one = 2.2
height_one = 0.8
profile1 = GaussFunc(wavelength, height_one, centroid_one, sigma_one)

# Profile 2
centroid_two = 5027.0
sigma_two = 1.2
height_two = 0.5
profile2 = GaussFunc(wavelength, height_two, centroid_two, sigma_two)

# Measured values
noise = np.random.normal(0.0, 0.02, len(wavelength))
combined = profile1 + profile2 + noise

# Some of what I thought was the right approach...
sigma_mc_one = mc.Uniform('sig', 0.01, 6.5)
height_mc_one = mc.Uniform('height', 0.1, 2.5)
centroid_mc_one = mc.Uniform('cen', 5015., 5040.)

sigma_mc_two = mc.Uniform('sig2', 0.01, 6.5)
height_mc_two = mc.Uniform('height2', 0.1, 2.5)
centroid_mc_two = mc.Uniform('cen2', 5015., 5040.)

# The remainder is only if you want to plot what this looks like
pl.plot(wavelength, combined, label="Measured")
pl.plot(wavelength, profile1, color='red', linestyle='dashed', label="1")
pl.plot(wavelength, profile2, color='green', linestyle='dashed', label="2")
pl.title("Feature One and Two")
pl.legend()

What I'd like to learn: how to take the wavelength array, the observed "combined" intensity, and the knowledge that some combination of Gaussian functions (each with a height, sigma, and centroid) generated the data, and turn that into the best-fit solution: Gauss1(height, sigma, centroid) and Gauss2(height, sigma, centroid).
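For concreteness, here is my rough guess at what the missing piece of the PyMC (2.x) model might look like, reusing the priors defined above and assuming Gaussian noise with the sigma of 0.02 that I happen to know in this toy case. I have no idea whether this is actually right, which is part of why I'm asking (the mc.Normal here is only the noise likelihood; the profile itself still goes through my own GaussFunc):

@mc.deterministic(plot=False)
def model_spectrum(w=wavelength,
                   h1=height_mc_one, c1=centroid_mc_one, s1=sigma_mc_one,
                   h2=height_mc_two, c2=centroid_mc_two, s2=sigma_mc_two):
    # Sum of two instances of my (arbitrary, user-defined) profile function
    return GaussFunc(w, h1, c1, s1) + GaussFunc(w, h2, c2, s2)

# Likelihood: observed spectrum = model + Gaussian noise (tau = 1/sigma^2)
obs = mc.Normal('obs', mu=model_spectrum, tau=1.0 / 0.02**2,
                value=combined, observed=True)

mcmc = mc.MCMC([sigma_mc_one, height_mc_one, centroid_mc_one,
                sigma_mc_two, height_mc_two, centroid_mc_two,
                model_spectrum, obs])
mcmc.sample(iter=20000, burn=10000)

# Posterior summaries for, e.g., the first centroid
print(centroid_mc_one.stats()['mean'])
print(centroid_mc_one.stats()['95% HPD interval'])

Is something along these lines the right idea, or am I thinking about it entirely wrong?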

Various other thoughts I have about this:

  • Please note that the functions I will actually fit in my real problem are NOT Gaussians -- so please frame any example around something like the GaussFunc defined in my code, and not a "built-in" pymc.Normal()-type function.

  • I understand that model selection is another (IMPORTANT!) issue: with the current noise level, one component (profile) might be all that is statistically justified. But I'd still like to see the best solution for 1, 2, 3, etc. components.

  • I don't have to use PyMC -- it's just the package I've picked to try first. If scikit-learn, astroML, or some other Python package can do the job, please let me know! (A rough non-Bayesian attempt with SciPy is sketched just after this list.)
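For the record, here is a rough non-Bayesian attempt at the same fit using scipy.optimize.curve_fit; the two_gauss helper and the initial guesses are just things I made up for this sketch. I imagine repeating this pattern for 1, 2, 3, ... components and comparing the fits with something like BIC is one way to approach the model-selection point above, but I'd welcome correction:

from scipy.optimize import curve_fit

def two_gauss(x, h1, c1, s1, h2, c2, s2):
    # Sum of two instances of my profile function
    return GaussFunc(x, h1, c1, s1) + GaussFunc(x, h2, c2, s2)

# Rough initial guesses: (height, centroid, sigma) for each component
p0 = [1.0, 5024.0, 2.0, 0.5, 5028.0, 1.0]
popt, pcov = curve_fit(two_gauss, wavelength, combined, p0=p0)
perr = np.sqrt(np.diag(pcov))  # 1-sigma uncertainties from the covariance matrix
print(popt, perr)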

submitted by jbwhitmore
