Mandarin speech recognition based on a modified DeepSpeech2. The edit distance on the validation data of the AISHELL dataset is 14.98%.
The model is trained end-to-end, from audio to Chinese characters.
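
For reference, an edit-distance figure like the one quoted above is usually computed at the character level and normalized by the reference length. The following is a minimal sketch of that calculation, not the evaluation code of this project. The function names `edit_distance` and `character_error_rate` are illustrative, and normalizing over the whole corpus rather than per sentence is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (here, strings of characters)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edits divided by total reference characters (corpus-level, assumed)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_edits / total_chars
```

Because Chinese text is scored per character, no word segmentation is needed before calling `character_error_rate`.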
Using a smaller (3x3) convolution kernel instead of the 41x21 kernel in the paper improves results on the validation set and speeds up model training.
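
To give a rough sense of the size difference between the two kernels, the sketch below counts the weights in a single 2D convolution layer for each kernel size. The channel counts (1 input channel, 32 output channels) are assumptions chosen for illustration and are not taken from this project. A smaller weight count is one plausible contributor to faster training, not a measured explanation.

```python
def conv2d_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one 2D convolution layer (bias terms excluded)."""
    return kernel_h * kernel_w * in_channels * out_channels


# Assumed channel counts, for illustration only.
in_channels, out_channels = 1, 32

large = conv2d_weights(41, 21, in_channels, out_channels)  # kernel size from the paper
small = conv2d_weights(3, 3, in_channels, out_channels)    # kernel size used here

print(f"41x21 kernel: {large} weights")
print(f"3x3 kernel:   {small} weights")
print(f"ratio:        {large / small:.1f}x")
```

The same ratio applies to the multiply-accumulate operations per output position, since each output value touches every weight of its filter once.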
To speed up decoding in the dynamic GRU, we put sentences of similar length into the same batch, which gives a 1.6x speedup in model training.
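
The length-based batching described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the batching code of this project: `bucket_batches` and `padded_frames` are hypothetical helpers, the utterance lengths are made-up numbers, and sorting the whole set by length is only one simple way to group similar lengths.

```python
import random


def bucket_batches(lengths, batch_size, shuffle=True, seed=0):
    """Group utterance indices into batches of similar length.

    Indices are sorted by length and cut into consecutive batches. The batches
    (not their contents) are then shuffled so training order is not monotone.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    if shuffle:
        random.Random(seed).shuffle(batches)
    return batches


def padded_frames(lengths, batches):
    """Frames processed when every batch is padded to its longest utterance."""
    return sum(max(lengths[i] for i in batch) * len(batch) for batch in batches)


# Made-up utterance lengths in frames, for illustration only.
lengths = [120, 980, 135, 940, 410, 400, 125, 960]

# Baseline: batches formed in the original order.
naive = [list(range(i, i + 4)) for i in range(0, len(lengths), 4)]
bucketed = bucket_batches(lengths, batch_size=4)

print("padded frames, original order:", padded_frames(lengths, naive))
print("padded frames, bucketed:      ", padded_frames(lengths, bucketed))
```

With similar lengths in each batch, fewer padding frames pass through the recurrent layers, which is where the time saving comes from. The actual speedup depends on the length distribution of the data.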

