TrueView

A Next-Generation Social Media Sentiment Analysis Engine


🌟 Project Overview

TrueView is a sentiment analysis system for social media text. It collects posts from Reddit through PRAW (the Python Reddit API Wrapper) and prepares them for downstream sentiment analysis.

⚡ Current Implementation

  • 📥 Data Collection: Automated extraction of Reddit posts through the PRAW API.
  • 💾 Data Storage: Collected posts are saved in Excel format for downstream processing.
  • 🧩 Modular Architecture: Components are kept separate so that additional sentiment analysis algorithms can be added later.

🛠️ Prerequisites

📦 Installation

  1. Install Dependencies
    pip install -r requirements.txt

Alternatively, install the required libraries directly:

pip install numpy pandas scikit-learn scipy joblib
pip install praw openpyxl matplotlib seaborn
pip install flask nltk vader-sentiment

🚀 Quick Start

1. 📡 Data Collection

Configure Reddit API credentials in your .env file:

REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USERNAME=your_username
REDDIT_PASSWORD=your_password
REDDIT_USER_AGENT=your_user_agent
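
How collect_reddit.py reads these values is not shown here. As a minimal sketch, assuming the five keys above are the only required settings, the following checks that all of them are present before any API call is made. The helper names parse_env_text and load_reddit_credentials are hypothetical, not part of the repository.

```python
import os

# The five settings listed in the .env example above.
REQUIRED_KEYS = (
    "REDDIT_CLIENT_ID",
    "REDDIT_CLIENT_SECRET",
    "REDDIT_USERNAME",
    "REDDIT_PASSWORD",
    "REDDIT_USER_AGENT",
)


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_reddit_credentials(env=None):
    """Return the required settings, or raise naming every missing key."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Reddit settings: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}


# Example with placeholder values (hypothetical, not real credentials).
sample = """
# Reddit API settings
REDDIT_CLIENT_ID=abc123
REDDIT_CLIENT_SECRET=s3cret
REDDIT_USERNAME=someone
REDDIT_PASSWORD=hunter2
REDDIT_USER_AGENT="trueview-collector/0.1"
"""
creds = load_reddit_credentials(parse_env_text(sample))
```

Failing early with the list of missing keys is usually easier to debug than an authentication error from the Reddit API.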

Run the data collection script:

python collect_reddit.py
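
The contents of collect_reddit.py are not reproduced in this README. The sketch below shows one way such a script could fetch posts with PRAW and write them to reddit_posts_raw.xlsx. The function names, the chosen columns, and the use of the "hot" listing are assumptions for illustration, not the project's actual implementation.

```python
import os
from datetime import datetime, timezone
from types import SimpleNamespace


def submission_to_row(submission):
    """Flatten one PRAW submission into a plain dict (one spreadsheet row)."""
    created = datetime.fromtimestamp(submission.created_utc, tz=timezone.utc)
    return {
        "id": submission.id,
        "title": submission.title,
        "text": submission.selftext,
        "score": submission.score,
        "created_utc": created.isoformat(),
    }


def collect_rows(subreddit_name, limit=100):
    """Fetch up to `limit` hot posts from a subreddit (requires network access)."""
    import praw  # imported lazily so submission_to_row works without PRAW

    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        username=os.environ["REDDIT_USERNAME"],
        password=os.environ["REDDIT_PASSWORD"],
        user_agent=os.environ["REDDIT_USER_AGENT"],
    )
    posts = reddit.subreddit(subreddit_name).hot(limit=limit)
    return [submission_to_row(post) for post in posts]


def save_rows(rows, path="reddit_posts_raw.xlsx"):
    """Write the collected rows to an Excel file (pandas uses openpyxl for .xlsx)."""
    import pandas as pd

    pd.DataFrame(rows).to_excel(path, index=False)


# Offline check with a stand-in object that mimics a PRAW submission.
fake_post = SimpleNamespace(
    id="abc", title="Hello", selftext="Body text", score=5, created_utc=0
)
row = submission_to_row(fake_post)
```

With credentials in place, `save_rows(collect_rows("python", limit=50))` would fetch and store 50 posts; the subreddit name here is only an example.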

2. 🧠 Model Training

Open the training notebook:

jupyter notebook TfidVectorizer.ipynb

Or run the training pipeline as a script:

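import os

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Load dataset and drop rows with missing text or label,
# since TfidfVectorizer raises an error on NaN documents
df = pd.read_csv("Reddit_Data.csv")
df = df.dropna(subset=["clean_comment", "category"])

# Create pipeline: TF-IDF features followed by logistic regression
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, stop_words='english')),
    ('clf', LogisticRegression(max_iter=1000))
])

# Hold out 20% of the data for evaluation, then train
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_comment'], df['category'], test_size=0.2, random_state=42
)
pipeline.fit(X_train, y_train)

# Report accuracy on the held-out split
print("Test accuracy:", pipeline.score(X_test, y_test))

# Save model (create the output directory if it does not exist)
os.makedirs("model", exist_ok=True)
joblib.dump(pipeline, "model/sentiment_model.pkl")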
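
Once a pipeline has been saved, it can be reloaded and used to score new text. The sketch below trains on a six-sentence toy corpus (invented for illustration, not project data) so that it runs without Reddit_Data.csv. The score_text helper and the LABEL_NAMES mapping are assumptions based on the label scheme described in this README.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Assumed mapping, following the 0/1/2 label scheme in this README.
LABEL_NAMES = {0: "negative", 1: "neutral", 2: "positive"}

# Toy corpus, for illustration only.
texts = [
    "terrible awful broken product",
    "hate this worst purchase",
    "package arrived on tuesday",
    "the box contains a manual",
    "love this amazing phone",
    "great wonderful excellent service",
]
labels = [0, 0, 1, 1, 2, 2]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

# Round-trip through joblib, as the training script does with its model file.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "sentiment_model.pkl")
    joblib.dump(pipeline, model_path)
    model = joblib.load(model_path)


def score_text(model, text):
    """Return the most likely label name and the per-class probabilities."""
    probabilities = model.predict_proba([text])[0]
    scores = {
        LABEL_NAMES[int(cls)]: float(prob)
        for cls, prob in zip(model.classes_, probabilities)
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


label, scores = score_text(model, "what an amazing and wonderful phone")
print(label, scores)
```

Reading class order from `model.classes_` rather than assuming it keeps the name mapping correct even if a label is absent from the training data.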

3. 🌐 Web Application

Start the web application:

python app.py

Note: Visit http://localhost:5000 to use the TrueView dashboard.


📊 Dataset Format

Expected CSV Structure:

  • clean_comment → Preprocessed Reddit comment text.
  • category → Sentiment labels (0=Negative, 1=Neutral, 2=Positive).

Excel Output Structure:

  • reddit_posts_raw.xlsx → Unmodified source data.
  • reddit_posts_labeled.xlsx → The processed dataset with predicted sentiment categories added.
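
A training CSV can be checked against the structure above before it reaches the model. This is a minimal sketch using only the standard library; the validate_dataset function is hypothetical and assumes the two columns and the 0/1/2 labels exactly as listed.

```python
import csv
import io

EXPECTED_COLUMNS = {"clean_comment", "category"}
VALID_LABELS = {"0", "1", "2"}  # 0=Negative, 1=Neutral, 2=Positive


def validate_dataset(handle):
    """Return a list of (line_number, message) problems found in a CSV file."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError("Missing columns: " + ", ".join(sorted(missing)))

    problems = []
    for row in reader:
        line_number = reader.line_num  # last source line read for this row
        if not (row["clean_comment"] or "").strip():
            problems.append((line_number, "empty clean_comment"))
        category = (row["category"] or "").strip()
        if category not in VALID_LABELS:
            problems.append((line_number, f"unexpected category {category!r}"))
    return problems


# Small in-memory example: line 3 has no text, line 4 has an invalid label.
sample_csv = "clean_comment,category\ngreat phone,2\n,1\nmeh,5\n"
problems = validate_dataset(io.StringIO(sample_csv))
```

For a file on disk, pass `open("Reddit_Data.csv", newline="", encoding="utf-8")` instead of the in-memory sample.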

⚙️ Model Performance & Analytics

Hyperparameter tuning:

from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'clf__C': [0.1, 1, 3.14, 10],
    'clf__penalty': ['l1', 'l2'],
    # The default lbfgs solver rejects the l1 penalty; saga supports both
    'clf__solver': ['saga'],
    'tfidf__max_features': [1000, 5000, 10000]
}

rs = RandomizedSearchCV(
    pipeline, param_dist, cv=5, 
    scoring='f1_macro', n_iter=20, random_state=42
)
rs.fit(X_train, y_train)
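
It helps to know how much work this search does. The sketch below enumerates the grid implied by the C, penalty, and max_features lists using only the standard library. When every parameter is a list, RandomizedSearchCV samples combinations without replacement, so n_iter=20 covers 20 of the possible combinations.

```python
from itertools import product

# The three tuned parameters from the search above.
param_dist = {
    "clf__C": [0.1, 1, 3.14, 10],
    "clf__penalty": ["l1", "l2"],
    "tfidf__max_features": [1000, 5000, 10000],
}

# Every possible combination of the listed values.
combinations = [
    dict(zip(param_dist, values)) for values in product(*param_dist.values())
]

n_iter, cv = 20, 5
sampled = min(n_iter, len(combinations))  # candidates actually evaluated
cross_validation_fits = sampled * cv      # one fit per candidate per fold

print(f"{len(combinations)} combinations, {sampled} sampled, "
      f"{cross_validation_fits} cross-validation fits")
```

With the default refit=True, one further fit on the full training set follows the search. Because 20 of the 24 combinations are sampled, an exhaustive GridSearchCV would cost only slightly more here.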

Expected performance:

Best Parameters: {'clf__C': 3.14, 'clf__penalty': 'l2'}
Best CV F1 Score: 0.82

Classification Report:
              precision    recall  f1-score   support
0 (Negative)       0.81      0.83      0.82       500
1 (Neutral)        0.79      0.77      0.78       400
2 (Positive)       0.85      0.86      0.85       600
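
The f1-score column and the macro average used by scoring='f1_macro' can be recomputed from the precision and recall values in the report. The sketch below works from the rounded figures above, so the last digit may differ slightly from a report computed on raw predictions. Note that the cross-validation score and this report are measured on different data, so they need not match exactly.

```python
# Precision, recall, and support copied from the classification report above.
report = {
    "negative": {"precision": 0.81, "recall": 0.83, "support": 500},
    "neutral": {"precision": 0.79, "recall": 0.77, "support": 400},
    "positive": {"precision": 0.85, "recall": 0.86, "support": 600},
}


def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {
    name: f1(row["precision"], row["recall"]) for name, row in report.items()
}

# Macro average: every class counts equally, regardless of its support.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class counts in proportion to its support.
total_support = sum(row["support"] for row in report.values())
weighted_f1 = sum(
    per_class_f1[name] * report[name]["support"] for name in report
) / total_support

for name, value in per_class_f1.items():
    print(f"{name:<9} f1 = {value:.2f}")
print(f"macro f1 = {macro_f1:.2f}, weighted f1 = {weighted_f1:.2f}")
```

Macro averaging is a reasonable choice when the classes are imbalanced, as here (400 to 600 samples each), because the smaller Neutral class is not outweighed by the larger ones.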

💻 Web Interface Usage

  1. TrueView Interface (index2.html):

    • 🎨 Responsive layout with glassmorphism styling.
    • ⚡ Real-time sentiment prediction.
    • 📈 Micro-animations, progress bars, and glow effects.
    • 🌒 Toggleable Dark/Light mode.
  2. Backend API Endpoints:

    POST /predict
    {
        "text": "The UI looks absolutely stunning!"
    }
    
    // Response
    {
        "sentiment": "positive",
        "confidence": 0.98,
        "scores": {
            "negative": 0.01,
            "neutral": 0.01,
            "positive": 0.98
        }
    }
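
The endpoint can also be called from Python without the browser interface. This is a minimal client sketch using only the standard library. It assumes the server accepts a JSON body with a Content-Type of application/json and returns the response shape shown above; the function names are hypothetical.

```python
import json
from urllib import request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build (but do not send) a POST /predict request carrying a JSON body."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        base_url.rstrip("/") + "/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(body):
    """Extract the label, confidence, and per-class scores from a response body."""
    result = json.loads(body)
    return result["sentiment"], result["confidence"], result["scores"]


def predict(text, base_url="http://localhost:5000"):
    """Send the request to a running TrueView server and parse the reply."""
    with request.urlopen(build_predict_request(text, base_url), timeout=10) as resp:
        return parse_prediction(resp.read().decode("utf-8"))


# Offline check using the example response from this README.
example_body = json.dumps({
    "sentiment": "positive",
    "confidence": 0.98,
    "scores": {"negative": 0.01, "neutral": 0.01, "positive": 0.98},
})
sentiment, confidence, scores = parse_prediction(example_body)
prepared = build_predict_request("The UI looks absolutely stunning!")
```

With the server started via `python app.py`, calling `predict("The UI looks absolutely stunning!")` would perform the actual request.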

🔮 Future Architecture Outlook

  • Advanced AI Core: Integration with transformer models such as BERT, RoBERTa, or GPT-based models.
  • Real-time Engine: Streaming analysis of live social media feeds.
  • Cross-Platform Extraction: Support for Twitter, LinkedIn, and other discussion forums.
  • Cloud Scale: Dockerized microservices deployable to Kubernetes on AWS or Azure.
  • Security & Throttling: Authentication and rate limiting on the API endpoints.

💻 Development Setup

# Clone the TrueView repository
git clone https://github.com/Desapphire/TrueView.git

# Initialize environment
python -m venv sentiment_env
source sentiment_env/bin/activate  # Windows: sentiment_env\Scripts\activate

# Install locked dependencies
pip install -r requirements.txt

# Create your local environment file from the template
cp .env.example .env

# Start the development server
python app.py

📄 License & Legal

This project is governed under a Proprietary License. Unauthorized copying, modification, or distribution is strictly prohibited. Refer to the internal legal compliance policy for full terms.


🤝 Collaboration Protocol

  1. 🔀 Clone the private repository.
  2. 🌿 Create a focused feature branch (git checkout -b feature/neon-hud).
  3. 💾 Commit your changes (git commit -m 'Enhance HUD analytics').
  4. 🚀 Push the branch to the remote (git push origin feature/neon-hud).
  5. ✅ Open a Pull Request and request a code review.
