Hi, in my journey to implement audio-visual speech recognition, I have successfully separated the audio and video streams using Xuggle and recognized speech using Sphinx. But I am stuck where the paper talks about extracting features from the audio and video, classifying them, and finally doing decision fusion to improve accuracy. What tools (preferably Java libs) should I use to do this? I am trying to implement: http://www.ncbi.nlm.nih.gov/pubmed/23757540 The guys at http://lts5srv2.epfl.ch/~estellers/AVASR/AVASR_index.html have done something similar, but there is no information about their methodology. Would OpenIMAJ suffice?
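
To make the decision-fusion step concrete, here is a minimal sketch of what I understand late fusion to mean, assuming each classifier (audio and video) outputs one log-likelihood per candidate class. The class name `DecisionFusion`, the method names, and the weight `lambda` are my own placeholders, not taken from the paper or from any library, and the paper may well use a different fusion rule.

```java
/** Hypothetical helper: late (decision-level) fusion of two classifiers' scores. */
final class DecisionFusion {

    private DecisionFusion() {}

    /**
     * Combines per-class log-likelihoods as a weighted sum:
     * lambda * audio + (1 - lambda) * video.
     * lambda near 1 trusts audio more, lambda near 0 trusts video more.
     */
    static double[] fuse(double[] audioLogLik, double[] videoLogLik, double lambda) {
        if (audioLogLik.length != videoLogLik.length) {
            throw new IllegalArgumentException("Both streams must score the same classes");
        }
        if (lambda < 0.0 || lambda > 1.0) {
            throw new IllegalArgumentException("lambda must be in [0, 1]");
        }
        double[] fused = new double[audioLogLik.length];
        for (int i = 0; i < fused.length; i++) {
            fused[i] = lambda * audioLogLik[i] + (1.0 - lambda) * videoLogLik[i];
        }
        return fused;
    }

    /** Returns the index of the highest score, i.e. the class chosen after fusion. */
    static int argMax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("scores must not be empty");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }
}
```

For example, `DecisionFusion.argMax(DecisionFusion.fuse(audioScores, videoScores, 0.7))` would pick the class with the best combined score while weighting audio at 70%. In noisy audio I would presumably lower `lambda` so the video stream counts for more.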