hawkh/ML-Challenge

BERT-Based AI Content Classification

This project fine-tunes a pre-trained BERT model to classify text articles as either AI-generated or human-written. It is built with PyTorch and the Hugging Face transformers library and treats the task as binary classification.


Overview

The primary goal of this project is to classify text data into two categories:

  • AI-generated
  • Human-written

The workflow includes:

  1. Tokenizing text data using a BERT tokenizer.
  2. Defining a PyTorch dataset and data loader for text and labels.
  3. Building and training a custom BERT-based classifier.
  4. Evaluating the model using stratified cross-validation.
  5. Saving the trained model for deployment or further analysis.
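Steps 1 to 3 of the workflow can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the class names `ArticleDataset` and `BertClassifier`, the `max_length` of 256, the dropout rate, and the choice of the first-token ([CLS]) hidden state as the pooled representation are all assumptions. The tokenizer and encoder are passed in as arguments so the sketch does not download anything on import.

```python
import torch
from torch import nn
from torch.utils.data import Dataset


class ArticleDataset(Dataset):
    """Pairs raw texts with 0/1 labels and tokenizes on access (hypothetical name)."""

    def __init__(self, texts, labels, tokenizer, max_length=256):
        self.texts = list(texts)
        self.labels = list(labels)
        self.tokenizer = tokenizer      # e.g. a Hugging Face BERT tokenizer
        self.max_length = max_length    # assumed value, tune for your data

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Pad/truncate every article to the same length so batches stack cleanly.
        enc = self.tokenizer(
            self.texts[idx],
            padding="max_length",
            truncation=True,
            max_length=self.max_length,
            return_tensors="pt",
        )
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "label": torch.tensor(self.labels[idx], dtype=torch.long),
        }


class BertClassifier(nn.Module):
    """Encoder plus a linear head over the first-token state (assumed design)."""

    def __init__(self, encoder, num_labels=2, dropout=0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)
        self.head = nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_state = out.last_hidden_state[:, 0]   # hidden state of the [CLS] token
        return self.head(self.dropout(cls_state))  # raw logits, one per class
```

With the transformers library installed, the pieces would typically be wired up as `tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")` and `model = BertClassifier(BertModel.from_pretrained("bert-base-uncased"))`. For step 5, `torch.save(model.state_dict(), path)` is one common way to persist the trained weights.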

Features

  • Pre-trained BERT Model: Fine-tunes bert-base-uncased for text classification.
  • Custom Dataset Class: Implements a PyTorch-compatible dataset class for efficient data handling.
  • Cross-Validation: Uses Stratified K-Fold cross-validation, so every fold preserves the overall ratio of AI-generated to human-written samples.
  • Evaluation Metrics: Calculates accuracy, F1 score, precision, and recall.
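The cross-validation and metric features above could be combined along these lines. This is a sketch under assumptions, not the project's implementation: `cross_validate` and the `fit_predict` callback are hypothetical names, label 1 is assumed to mean AI-generated, and the fold count and seed are placeholders. The callback stands in for "fine-tune a fresh model on the training fold, then predict the validation fold".

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold


def cross_validate(texts, labels, fit_predict, n_splits=5, seed=42):
    """Average accuracy, F1, precision, and recall over stratified folds.

    fit_predict(train_texts, train_labels, val_texts) is a hypothetical
    callback that trains a fresh model and returns 0/1 predictions.
    """
    texts = np.asarray(texts, dtype=object)
    labels = np.asarray(labels)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    fold_scores = []
    for train_idx, val_idx in skf.split(texts, labels):
        preds = fit_predict(texts[train_idx], labels[train_idx], texts[val_idx])
        y_true = labels[val_idx]
        fold_scores.append({
            "accuracy": accuracy_score(y_true, preds),
            # zero_division=0 avoids warnings when a fold has no positive predictions
            "f1": f1_score(y_true, preds, zero_division=0),
            "precision": precision_score(y_true, preds, zero_division=0),
            "recall": recall_score(y_true, preds, zero_division=0),
        })

    # Report the mean of each metric across folds.
    return {
        name: float(np.mean([scores[name] for scores in fold_scores]))
        for name in fold_scores[0]
    }
```

Training a new model inside each fold, rather than reusing one across folds, keeps validation data from leaking into the weights being evaluated.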

Requirements

  • Python 3.7+
  • Libraries:
    • torch
    • transformers
    • pandas
    • numpy
    • scikit-learn (imported as sklearn)

Future Improvements

  • Experiment with advanced pre-trained models like RoBERTa or DeBERTa.
  • Handle class imbalance with techniques like oversampling or weighted loss functions.
  • Extend the model to support multiclass classification for other types of text.
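As one possible take on the weighted-loss idea above, class weights can be set inversely proportional to class frequency and passed to PyTorch's `nn.CrossEntropyLoss`. The helper name `balanced_class_weights` and the weighting formula (total samples divided by the number of classes times the class count) are assumptions chosen for illustration. The project does not specify a scheme.

```python
import torch
from torch import nn


def balanced_class_weights(labels, num_classes=2):
    """Weight each class by n_samples / (num_classes * class_count).

    Hypothetical helper: rarer classes get larger weights, so their
    errors contribute more to the loss.
    """
    counts = torch.bincount(torch.as_tensor(labels), minlength=num_classes).float()
    # clamp avoids division by zero if a class is absent from the data
    return counts.sum() / (num_classes * counts.clamp(min=1))


# Example: three human-written samples (0) for every AI-generated one (1).
weights = balanced_class_weights([0, 0, 0, 1])
loss_fn = nn.CrossEntropyLoss(weight=weights)
```

The resulting `loss_fn` takes raw logits of shape `(batch, num_classes)` and integer labels, the same inputs as the unweighted loss.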
