Any info for tweaking training settings for those with little background in LSTMs? #196

@broccolus

Hi All,

Not sure if this is the right place to post this, but I'm looking for a little extra info on how to choose parameters for model training. I have very little background in neural networks and no real programming skills, but I find this software fascinating and have been curious to try a little experiment. I am tech-savvy enough that I installed everything successfully and started training with the default settings. My data set is about 3,000,000 characters. Right now it seems I have reached a point of diminishing returns: the model is consistently underfitting and the loss value barely changes from one checkpoint to the next. By underfitting I mean that the output consistently contains many gibberish words and erratic sentence structures despite the structured nature of the data set. A few questions:

  1. How many epochs does training a model generally require to produce effective results? I made it to about epoch 13 of 50 and it is going quite slowly (CPU mode on a crap computer; reaching this point took more than 48 hours of constant running). Am I just being impatient? Could the loss value start to change again even after a perceived plateau? Is loss the be-all and end-all of evaluating a training run, or could the model still be improving even if the loss value doesn't change?

  2. If I am faced with underfitting, which model parameters should I change first to improve it? -rnn_size, -num_layers, -batch_size, something else?

  3. Does anybody know of any beginner-friendly resources that explain the theory behind neural networks, so I can understand exactly what is going on and answer these questions myself?

Thanks all
J
