How to regularize / stabilize neural network with sparse inputs?

Hello,

Suppose we have the first layer of a neural network, h = activation(W*x). In general, I would regularize this part of the network by applying a small amount of dropout to x, constraining the row-norm of the weight matrix W, and applying a small amount of weight decay to W.
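
For concreteness, a dense version of that recipe might look like the sketch below. This is just an illustration under assumed names (`p_drop`, `max_norm`, `l2`, and the tanh activation are placeholders for the three regularizers described above, not anything specific from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_regularized_layer(x, W, b, p_drop=0.05, max_norm=3.0, l2=1e-4):
    """h = activation(W @ x) with the three regularizers mentioned above:
    dropout on x, a row-norm constraint on W, and an L2 (weight decay) term."""
    # 1) Inverted dropout on the input.
    mask = rng.random(x.shape) >= p_drop
    x = x * mask / (1.0 - p_drop)

    # 2) Max-norm constraint: rescale any row of W whose norm exceeds max_norm
    #    (in practice this projection is applied after each parameter update).
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    W = W * np.minimum(1.0, max_norm / np.maximum(row_norms, 1e-12))

    # 3) L2 penalty to be added to the task loss (its gradient is the decay).
    l2_penalty = 0.5 * l2 * np.sum(W ** 2)

    h = np.tanh(W @ x + b)  # tanh stands in for "activation"
    return h, l2_penalty
```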

However, I'm not sure that this is still the best strategy if x is very sparse. Suppose, in the most extreme case, that x is a single categorical variable with tens of thousands of possible values. It is also a practical necessity that the runtime of the training algorithm be proportional to the number of non-zero elements in x rather than to the total size of x.
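
To make the sparsity concrete: if x is one-hot (or a small bag of active categories), W*x reduces to summing the corresponding columns of W, so the forward pass can be written as an index lookup whose cost depends only on the number of non-zeros. A minimal sketch in the same NumPy setting as above (the vocabulary size, hidden size, and `active_idx` are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, hidden_dim = 50_000, 128
W = rng.normal(scale=0.01, size=(hidden_dim, vocab_size))
b = np.zeros(hidden_dim)

# A sparse categorical input: only these indices of x are non-zero (value 1).
active_idx = np.array([17, 523, 40321])

def sparse_forward(active_idx):
    # W @ x for a binary sparse x is just the sum of the active columns of W,
    # so the cost is O(nnz * hidden_dim), independent of vocab_size.
    pre = W[:, active_idx].sum(axis=1) + b
    return np.tanh(pre)

h = sparse_forward(active_idx)
```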

  1. Dropout on the inputs. This is a sparse operation if implemented correctly (see the sparse-dropout sketch after this list). However, I am somewhat concerned that it will be too strong and noisy a regularizer for sparse categorical features, since by definition the values of a sparse categorical feature are not positively correlated, whereas features like adjacent pixels in an image are highly positively correlated.

  2. Weight constraints. Neither computing the row-norm of the weight matrix nor dividing the weights by that norm is a sparse operation. Also, many researchers work on sparse linear models, and I've never seen any of them use weight constraints.

  3. Weight decay. If one adds an L1/L2 penalty to the cost function, then the gradients and the updates are no longer sparse. Alternatively, one could use a penalty that applies only when the corresponding entry of x is non-zero (see the lazy-decay sketch after this list).
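
Regarding option 1, dropout on a sparse input can itself stay sparse by dropping active indices directly rather than multiplying by a dense mask. A minimal sketch under the same illustrative setup (the drop probability `p` is a placeholder); the returned indices and scale would then feed the sparse forward pass above:

```python
import numpy as np

def sparse_dropout(active_idx, p=0.1, rng=None):
    """Inverted dropout applied only to the non-zero entries: each active
    index is kept with probability 1 - p, and survivors are rescaled so the
    expected activation is unchanged."""
    if rng is None:
        rng = np.random.default_rng()
    keep = rng.random(active_idx.shape[0]) >= p
    kept_idx = active_idx[keep]
    scale = 1.0 / (1.0 - p)
    return kept_idx, scale
```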
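
And regarding option 3, the "penalty only where x is non-zero" idea can be folded into the update itself, so each step touches only the active columns of W. A minimal sketch (learning rate `lr` and decay strength `l2` are illustrative; this is the simple per-step variant, not the accumulated "lazy" variant used in some sparse-SGD implementations):

```python
import numpy as np

def sparse_sgd_step(W, active_idx, grad_active, lr=0.01, l2=1e-4):
    """SGD update that touches only the columns of W whose corresponding x
    entries are non-zero: gradient step plus L2 decay on those columns only.
    grad_active has shape (hidden_dim, len(active_idx))."""
    W[:, active_idx] -= lr * (grad_active + l2 * W[:, active_idx])
    return W
```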

Any ideas / experience on what sorts of methods work well?

submitted by alexmlamb
