When using the ReLU activation function, what is the right way to initialize the weights and biases of a network?
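For reference, the answer most often given for ReLU networks is He (Kaiming) initialization: draw each weight from a zero-mean distribution with variance 2/fan_in, and start the biases at zero (or a small positive constant). Below is a minimal NumPy sketch of that scheme; the layer sizes and function name are purely illustrative, not from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

def he_init(fan_in, fan_out):
    # He (Kaiming) initialization: weights drawn from N(0, 2 / fan_in),
    # which roughly preserves the variance of activations across layers
    # when the nonlinearity is ReLU.
    W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    # Biases are usually initialized to zero; a small positive constant
    # is sometimes used instead to reduce the chance of "dead" ReLUs
    # early in training.
    b = np.zeros(fan_out)
    return W, b

# Hypothetical layer sizes for a small fully connected network.
W1, b1 = he_init(784, 256)
W2, b2 = he_init(256, 10)
```

Most deep learning frameworks ship an equivalent initializer (e.g. Kaiming/He init), so in practice you would typically use the built-in rather than hand-rolling it.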