Pre-processing is slow

For large datasets trained on a GPU, pre-processing takes about as long as the training itself. It would be great to do at least one of these two things:

1. Optimize the pre-processing code
2. Make it possible to dump a pre-processed dataset to disk and restore it later
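Option 2 could look roughly like the sketch below, which caches the pre-processed result on disk and reuses it on later runs. This is an illustration only, assuming the pre-processed dataset is a picklable Python object. The names `cache_key`, `dump_dataset`, `restore_dataset` and `load_or_preprocess`, and the `preprocess(raw_path, **params)` callable, are hypothetical and not part of any existing API. The cache key is a hash of the raw data path plus the pre-processing parameters, so changing a parameter triggers a fresh run.

```python
import hashlib
import json
import os
import pickle
import tempfile


def cache_key(raw_path, params):
    """Stable key from the raw data path and the pre-processing params (hypothetical scheme)."""
    payload = json.dumps({"raw_path": raw_path, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def dump_dataset(dataset, path):
    """Write the dataset to a temp file, then rename it, so a crash never leaves a half-written cache."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(dataset, f, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def restore_dataset(path):
    """Load a previously dumped dataset (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def load_or_preprocess(raw_path, params, preprocess, cache_dir):
    """Return the cached dataset if present; otherwise run preprocess() and dump the result."""
    path = os.path.join(cache_dir, cache_key(raw_path, params) + ".pkl")
    if os.path.exists(path):
        return restore_dataset(path)
    dataset = preprocess(raw_path, **params)
    dump_dataset(dataset, path)
    return dataset
```

With this approach the first training run pays the full pre-processing cost and later runs with the same parameters only pay for deserialization. Note that the key does not track changes to the raw file's contents; hashing the file or including its modification time would avoid serving a stale cache. For very large array data, a format such as NumPy's `.npy` (optionally memory-mapped) may load faster than pickle.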