```python
self.cross_entropy_loss = tf.nn.softmax_cross_entropy_with_logits(logits=self.logprobs[:, -1, :], labels=self.states)
```
Why do you use `softmax_cross_entropy_with_logits` here? The first state is `[10.0, 128.0, 1.0, 1.0] * args.max_layers`, and the labels are the same values. The final output of the RNN corresponds to the action, so why apply a softmax cross-entropy between the logits and the raw state/action values?
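To make the concern concrete, here is a minimal NumPy sketch (my own illustration, not code from this repo) of what `tf.nn.softmax_cross_entropy_with_logits` computes: `-sum(labels * log_softmax(logits))` over the last axis. With one-hot labels this is the usual negative log-likelihood; with raw state values like `[10.0, 128.0, 1.0, 1.0]` the labels are not a probability distribution, which is exactly why this loss looks suspicious:

```python
import numpy as np

def softmax_cross_entropy_with_logits(logits, labels):
    """Mirrors tf.nn.softmax_cross_entropy_with_logits:
    loss = -sum(labels * log_softmax(logits)) along the last axis."""
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(labels * log_softmax).sum(axis=-1)

# With one-hot labels, the loss is the negative log-probability of the true class.
logits = np.array([[2.0, 1.0, 0.1]])
one_hot = np.array([[1.0, 0.0, 0.0]])
print(softmax_cross_entropy_with_logits(logits, one_hot))  # ~0.417

# With raw state values as labels (as in the quoted line), the labels do not
# sum to 1, so the result is a scaled quantity rather than a true cross-entropy.
state_labels = np.array([[10.0, 128.0, 1.0, 1.0]])
state_logits = np.array([[0.5, 0.2, 0.1, 0.3]])
print(softmax_cross_entropy_with_logits(state_logits, state_labels))
```

Note that because cross-entropy is linear in the labels, labels summing to `S` simply scale the loss by `S` relative to the normalized distribution, so the gradient direction is preserved but the magnitude is inflated by the label sum.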