The code currently looks like this:
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    user = mx.symbol.Activation(data = user, act_type="relu")
    user = mx.symbol.FullyConnected(data = user, num_hidden = hidden)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    item = mx.symbol.Activation(data = item, act_type="relu")
    item = mx.symbol.FullyConnected(data = item, num_hidden = hidden)
    # predict by the inner product
    pred = user * item
    pred = mx.symbol.sum_axis(data = pred, axis = 1)
    pred = mx.symbol.Flatten(data = pred)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
My understanding is that the embedding layer should be able to learn anything that a single dense layer on top of it could learn: the embedding table is an arbitrary learned lookup, so a per-branch ReLU and dense layer still just map each id to one fixed vector and could be folded back into a bigger lookup table (see the sketch after the second snippet). I had thought a deep matrix factorization would look more like this:
def get_one_layer_mlp(hidden, k):
    # input
    user = mx.symbol.Variable('user')
    item = mx.symbol.Variable('item')
    score = mx.symbol.Variable('score')
    # user latent features
    user = mx.symbol.Embedding(data = user, input_dim = max_user, output_dim = k)
    # item latent features
    item = mx.symbol.Embedding(data = item, input_dim = max_item, output_dim = k)
    # concatenate the latent vectors and predict with an MLP on top
    pred = mx.symbol.Concat(user, item, dim = 1)
    pred = mx.symbol.FullyConnected(data = pred, num_hidden = hidden)
    pred = mx.symbol.Activation(data = pred, act_type="relu")
    pred = mx.symbol.FullyConnected(data = pred, num_hidden = 1)
    # loss layer
    pred = mx.symbol.LinearRegressionOutput(data = pred, label = score)
    return pred
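To make the claim above concrete, here is a minimal numpy sketch (hypothetical shapes and random weights, not the tutorial's code) showing that for an index input, Embedding -> ReLU -> FullyConnected assigns each user id one fixed vector, so the whole branch could be precomputed into a single larger lookup table:

import numpy as np

max_user, k, hidden = 1000, 16, 64                 # placeholder sizes
emb = np.random.randn(max_user, k)                 # Embedding weights
W = np.random.randn(k, hidden)                     # FullyConnected weights
b = np.random.randn(hidden)                        # FullyConnected bias

def branch(user_ids):
    # embedding lookup, then relu, then dense -- as in the first snippet
    x = emb[user_ids]
    x = np.maximum(x, 0)
    return x @ W + b

# The same function expressed as one precomputed table, one row per user id.
folded = np.maximum(emb, 0) @ W + b                # shape (max_user, hidden)

ids = np.array([3, 42, 7])
assert np.allclose(branch(ids), folded[ids])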
Basically, the network should concatenate the latent vectors and put the layers on top of that concatenation, rather than putting layers on top of each embedding separately.
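For completeness, a hedged sketch of how the concat-style symbol could be trained with the MXNet 1.x Module API; the sizes and the random data below are placeholders for illustration, and max_user / max_item are assumed to be defined as in the original tutorial:

import numpy as np
import mxnet as mx

max_user, max_item = 1000, 1700                    # placeholder vocabulary sizes
net = get_one_layer_mlp(hidden=64, k=16)

# toy (user id, item id) -> score triples, just to show the plumbing
n = 5000
data = {'user': np.random.randint(0, max_user, n).astype('float32'),
        'item': np.random.randint(0, max_item, n).astype('float32')}
label = {'score': np.random.uniform(1, 5, n).astype('float32')}
train_iter = mx.io.NDArrayIter(data, label, batch_size=64, shuffle=True)

mod = mx.mod.Module(symbol=net,
                    data_names=['user', 'item'],
                    label_names=['score'])
mod.fit(train_iter,
        num_epoch=5,
        optimizer='adam',
        optimizer_params={'learning_rate': 1e-3},
        eval_metric='rmse')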