Use of large custom transformer model #63

@Adaickalavan

Description

Referring to the active learning for text classification example given here.

In the given example, we have:

from small_text import TransformerBasedClassificationFactory, TransformerModelArguments

transformer_model_name = 'bert-base-uncased'
transformer_model = TransformerModelArguments(transformer_model_name)
clf_factory = TransformerBasedClassificationFactory(
    transformer_model,
    num_classes,
    kwargs=dict({'device': 'cuda', 'mini_batch_size': 32,
                 'class_weight': 'balanced'}))

In my case, I would like to use the language model meta-llama/Llama-2-7b-chat-hf as a sequence classifier by loading it as

from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    num_labels=1,
)

Then, I would like to perform supervised training with active learning of this Llama sequence classifier on the dataset Birchlabs/openai-prm800k-stepwise-critic.

Questions:

  1. How do I modify the example in the repository to get a clf_factory that uses the base_model above instead of providing TransformerModelArguments?

  2. How do I use small-text to handle Llama's large model size and potentially distribute its training across multiple GPUs?
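For context on question 2, the sharded multi-GPU loading I have in mind is the kind that transformers itself supports via accelerate's device_map feature; a minimal sketch (an assumption on my side, requiring the accelerate package, visible GPUs, and access to the gated Llama-2 weights; it covers model placement only, not distributed training):

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical sketch: shard the 7B model across all visible devices.
# device_map="auto" (backed by the accelerate package) places layers
# across GPUs/CPU automatically; torch_dtype="auto" keeps the
# checkpoint's native half precision to reduce memory use.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    num_labels=1,
    torch_dtype="auto",
    device_map="auto",
)
```

Whether such a pre-sharded model can then be handed to small-text's factory is exactly what I am unsure about.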
