Channel: Machine Learning

Ways to improve dynamic time warping word recognition system?

I recently got interested in speech recognition and, for my own learning, implemented a simple dynamic time warping (DTW) system for word recognition. However, after some testing I believe I may have made a mistake somewhere in the implementation, because the minimum distance found by traversing the DTW matrix does not reliably separate different words from each other. Here are the steps I followed:

  1. Compute MFCCs for two WAV samples using https://github.com/jameslyons/python_speech_features (I will eventually replace it with my own MFCC implementation).

  2. Compute the L2 norm (Euclidean distance) between each pair of frames, using the first 13 MFCC coefficients.

  3. Traverse the DTW matrix and find the minimum cumulative distance.
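Steps 2 and 3 above can be sketched in plain Python as follows. This is a minimal illustration, not the code from the linked gist: it assumes each recording is already a list of MFCC frames (for example, the first 13 coefficients per frame), uses the classic three-way step pattern, and the hypothetical `normalize` option divides the total cost by `len(seq_a) + len(seq_b)`, an assumed normalization choice so that longer recordings do not automatically score worse.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one MFCC vector, e.g. the first 13 coefficients


def euclidean(a: Frame, b: Frame) -> float:
    """L2 distance between two MFCC frames of equal length (step 2)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a: List[Frame], seq_b: List[Frame], normalize: bool = True) -> float:
    """Minimum cumulative L2 cost of aligning seq_a with seq_b (step 3).

    Assumption: classic step pattern (match, insertion, deletion) with
    equal weights; optional division by len(seq_a) + len(seq_b).
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in seq_a only
                cost[i][j - 1],      # advance in seq_b only
                cost[i - 1][j - 1],  # advance in both
            )
    total = cost[n][m]
    return total / (n + m) if normalize else total


# Toy usage: b is a time-stretched copy of a, so DTW aligns them at zero cost.
a = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
b = [[0.0, 1.0], [0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
print(dtw_distance(a, b, normalize=False))  # 0.0
print(dtw_distance([[0.0, 0.0]], [[3.0, 4.0]]))  # 2.5
```

The toy frames are two-dimensional only to keep the example readable; real MFCC frames would simply be longer vectors.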

Below are my results.

Comparing one of the kiwi files to all the other files, I get the following average distances:

  kiwi     53.5627956541
  apple    52.8226506157
  banana   57.885524018
  lime     48.5113003162
  orange   63.9675030969
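Assuming recognition means picking the word with the smallest average distance (a nearest-template rule), the sketch below applies that rule to the averages reported above. The helper `rank_labels` is hypothetical and only makes the decision explicit: with these numbers, "lime" wins and "kiwi" only ranks third.

```python
from statistics import mean
from typing import Dict, List


def rank_labels(distances: Dict[str, List[float]]) -> List[str]:
    """Return word labels sorted from smallest to largest mean DTW distance."""
    averages = {word: mean(ds) for word, ds in distances.items()}
    return sorted(averages, key=averages.get)


# Averages reported above for one kiwi recording (one value per word here).
reported = {
    "kiwi": [53.5627956541],
    "apple": [52.8226506157],
    "banana": [57.885524018],
    "lime": [48.5113003162],
    "orange": [63.9675030969],
}
ranking = rank_labels(reported)
print(ranking)  # ['lime', 'apple', 'kiwi', 'banana', 'orange']
```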

Here is my code: https://gist.github.com/anonymous/1323692feea2a2bcfba4

I feel I am missing some crucial steps; any advice would be appreciated.

Thanks

Edit:

I am using audio files from

https://dl.dropboxusercontent.com/u/15378192/audio.tar.gz

and

http://www.forvo.com/word/apple/#en

submitted by kaustest