1. Replace the sigmoid function with the rectified linear function: it is easy to train and test, and efficient to compute.

2. Dropout at training and test time can improve accuracy significantly, because it is basically aggregating many different, highly regularized deep learning models by a geometric mean.

This might be a standard recipe for current deep learning. Based on this recipe, several of his students have won many Kaggle challenges.
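To make the geometric-mean remark in item 2 concrete, here is a minimal sketch, not anyone's actual training code. It assumes a single softmax layer, a keep probability of 0.5, and tiny made-up weights (`W`, `b`, and the input are hypothetical). In this special case, averaging all dropout masks by a normalized geometric mean gives exactly the same prediction as scaling the inputs by 0.5 at test time; for deeper networks this is only an approximation.

```python
import itertools
import math

def relu(v):
    # Rectified linear unit: max(0, a) applied elementwise.
    return [max(0.0, a) for a in v]

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in z]
    s = sum(e)
    return [a / s for a in e]

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def geometric_mean_prediction(W, b, x):
    # Enumerate every dropout mask (all 2^n of them, each equally likely
    # when the keep probability is 0.5) and take the normalized geometric
    # mean of the resulting softmax outputs.
    masks = list(itertools.product([0, 1], repeat=len(x)))
    log_sum = [0.0] * len(b)
    for mask in masks:
        p = softmax(logits(W, b, [m * xi for m, xi in zip(mask, x)]))
        log_sum = [acc + math.log(pk) for acc, pk in zip(log_sum, p)]
    g = [math.exp(ls / len(masks)) for ls in log_sum]
    total = sum(g)
    return [a / total for a in g]

def weight_scaled_prediction(W, b, x, keep_prob=0.5):
    # Test-time shortcut: one forward pass with inputs scaled by keep_prob.
    return softmax(logits(W, b, [keep_prob * xi for xi in x]))

# Hypothetical toy values, chosen only for illustration.
W = [[0.5, -1.0, 0.3], [-0.2, 0.8, 0.1]]
b = [0.1, -0.1]
h = relu([1.0, -2.0, 3.0])  # hidden activations after the ReLU

ensemble = geometric_mean_prediction(W, b, h)
scaled = weight_scaled_prediction(W, b, h)
print(ensemble, scaled)
```

The two printed distributions should agree up to floating-point error, which is why a single scaled forward pass can stand in for the exponentially large ensemble in this setting.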

]]>One interesting paper by Rafael Frongillo et al. at NIPS 2012 details the connection between prediction markets and stochastic mirror descent (SMD). The market price update in a prediction market is in fact a stochastic mirror descent step. The gradient of the objective function F(x;d) w.r.t. x is -d(C,x), where d is the demand function. The Bregman divergence term uses the conjugate dual of the cost function C as the regularization function.

- From the stochastic online optimisation perspective, the market price x is updated by minimizing a potential objective function F(x;d). For example, if the agent is a Kelly bettor, F(x;d) = W * KL(p || x), where W is the wealth of the agent and p is its belief distribution over the outcome space. In each update, the market tries to match the price x to the agent's belief distribution, subject to a specific regularization term (i.e. the Bregman divergence term). This regularization prevents the market price from moving all the way to the agent's belief.
- The interesting part to me is the following relationship: -d(C,x) = grad of F(x;d).

The form of the demand function determines the form of this potential objective function F: Kelly bettors lead to a KL divergence, and isoelastic utility leads to a Renyi divergence.
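The Kelly case can be sketched numerically. This is only an illustration under stated assumptions, not the paper's algorithm: the regularizer is taken to be the entropic one (so the Bregman divergence is a KL divergence and the mirror descent step becomes an exponentiated-gradient update), the gradient is taken coordinate-wise without the simplex constraint, and the step size `eta`, the starting price, and the belief are hypothetical values.

```python
import math

def kl(p, x):
    # KL(p || x) for discrete distributions; terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / xi) for pi, xi in zip(p, x) if pi > 0)

def potential(x, p, wealth):
    # Potential for a Kelly bettor: F(x; d) = W * KL(p || x).
    return wealth * kl(p, x)

def potential_grad(x, p, wealth):
    # Coordinate-wise gradient of F w.r.t. the price x: -W * p_i / x_i.
    return [-wealth * pi / xi for pi, xi in zip(p, x)]

def numerical_grad(x, p, wealth, h=1e-6):
    # Central finite differences, used only to sanity-check potential_grad.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((potential(up, p, wealth) - potential(down, p, wealth)) / (2 * h))
    return grad

def mirror_descent_step(x, grad, eta):
    # Mirror descent with an entropic regularizer (assumed here):
    # multiply by exp(-eta * grad) and renormalize onto the simplex.
    unnorm = [xi * math.exp(-eta * gi) for xi, gi in zip(x, grad)]
    total = sum(unnorm)
    return [u / total for u in unnorm]

# Hypothetical two-outcome market.
price = [0.5, 0.5]    # current market price x
belief = [0.8, 0.2]   # agent's belief p
wealth = 1.0          # agent's wealth W
eta = 0.1             # step size (assumption)

grad = potential_grad(price, belief, wealth)
new_price = mirror_descent_step(price, grad, eta)
print(new_price, kl(belief, price), kl(belief, new_price))
```

With these values the price moves toward the agent's belief but stops short of it, which is the regularization effect described above.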

]]>