Use of large custom transformer model #63

@Adaickalavan

Description

Referring to the active learning for text classification example given here.

In the given example, we have:

from small_text import TransformerBasedClassificationFactory, TransformerModelArguments

transformer_model_name = 'bert-base-uncased'
transformer_model = TransformerModelArguments(transformer_model_name)
clf_factory = TransformerBasedClassificationFactory(
    transformer_model,
    num_classes,
    kwargs=dict({'device': 'cuda', 'mini_batch_size': 32,
                 'class_weight': 'balanced'}))

In my case, I would like to use the language model meta-llama/Llama-2-7b-chat-hf as a sequence classifier by loading it as

from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf",
    num_labels=1,
)

Then, I would like to perform supervised training with active learning of this Llama sequence classifier on the dataset Birchlabs/openai-prm800k-stepwise-critic.

Questions:

  1. How do I modify the example in the repository to get a clf_factory that uses the base_model above instead of providing TransformerModelArguments?

  2. How do I use small-text to handle Llama's large model size and potentially distribute its training across multiple GPUs?
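For context on question 2, the sharded multi-GPU loading I have in mind is the kind that transformers itself supports via accelerate's device_map feature; a minimal sketch (an assumption on my side, requiring the accelerate package, visible GPUs, and access to the gated Llama-2 weights; it covers model placement only, not distributed training):

```python
from transformers import AutoModelForSequenceClassification

# Hypothetical sketch: shard the 7B model across all visible devices.
# device_map="auto" (backed by the accelerate package) places layers
# across GPUs/CPU automatically; torch_dtype="auto" keeps the
# checkpoint's native half precision to reduce memory use.
base_model = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    num_labels=1,
    torch_dtype="auto",
    device_map="auto",
)
```

Whether such a pre-sharded model can then be handed to small-text's factory is exactly what I am unsure about.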
