This project implements a sentiment analysis system for social media data. It extracts textual data from Reddit using PRAW (the Python Reddit API Wrapper) and prepares it for downstream sentiment analysis workflows.
- Data Collection: Automated extraction of Reddit posts via the PRAW API.
- Data Storage: Structured persistence in Excel format for reliable downstream processing.
- Modular Architecture: Extensible framework designed to accommodate future sentiment analysis algorithms.
- Required: Python 3.8 or higher.
- API Keys: Reddit API credentials (obtainable from Reddit App Preferences).
Install dependencies:

```
pip install -r requirements.txt
```

Required libraries:

```
pip install numpy pandas scikit-learn scipy joblib
pip install praw openpyxl matplotlib seaborn
pip install flask nltk vader-sentiment
```

Configure Reddit API credentials in your `.env` file:

```
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USERNAME=your_username
REDDIT_PASSWORD=your_password
REDDIT_USER_AGENT=your_user_agent
```

Run the data extractor:

```
python collect_reddit.py
```
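The collector script itself is not shown in this README. A hypothetical sketch of what `collect_reddit.py` might look like, assuming `python-dotenv` is used to load the `.env` values above (the subreddit name, column choices, and function names here are illustrative, not the project's actual code):

```python
# Hypothetical sketch of a Reddit collector; the real collect_reddit.py may differ.

def submission_to_row(s):
    """Flatten a PRAW submission into a plain dict for tabular storage."""
    return {"id": s.id, "title": s.title, "text": s.selftext, "score": s.score}

def collect(subreddit_name="all", limit=100):
    # Heavy/optional dependencies are imported here so the helper above
    # stays usable without them installed.
    import os
    import pandas as pd
    import praw
    from dotenv import load_dotenv

    load_dotenv()  # reads the REDDIT_* variables from .env
    reddit = praw.Reddit(
        client_id=os.environ["REDDIT_CLIENT_ID"],
        client_secret=os.environ["REDDIT_CLIENT_SECRET"],
        username=os.environ["REDDIT_USERNAME"],
        password=os.environ["REDDIT_PASSWORD"],
        user_agent=os.environ["REDDIT_USER_AGENT"],
    )
    rows = [submission_to_row(s)
            for s in reddit.subreddit(subreddit_name).hot(limit=limit)]
    df = pd.DataFrame(rows)
    df.to_excel("reddit_posts_raw.xlsx", index=False)
    return df
```

The row-flattening helper is kept separate from the network call so it can be tested without Reddit credentials.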
Open the interactive training notebook:

```
jupyter notebook TfidVectorizer.ipynb
```

Or run the programmatic training pipeline:
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
import joblib

# Load dataset
df = pd.read_csv("Reddit_Data.csv")

# Create pipeline
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000, stop_words='english')),
    ('clf', LogisticRegression(max_iter=1000))
])

# Train model
X_train, X_test, y_train, y_test = train_test_split(
    df['clean_comment'], df['category'], test_size=0.2, random_state=42
)
pipeline.fit(X_train, y_train)

# Save model
joblib.dump(pipeline, "model/sentiment_model.pkl")
```

Bring the TrueView interface online:
```
python app.py
```

Note: Visit `http://localhost:5000` to interact with the TrueView dashboard.
Expected CSV Structure:
- `clean_comment`: Preprocessed Reddit text.
- `category`: Sentiment labels (0 = Negative, 1 = Neutral, 2 = Positive).
Excel Output Structure:
- `reddit_posts_raw.xlsx`: Unmodified source data.
- `reddit_posts_labeled.xlsx`: Processed dataset augmented with sentiment categories.
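A minimal sketch of how the labeled workbook could be derived from the raw one; the file and column names follow the structures described above, but the mapping step is an illustration, not the project's actual labeling code:

```python
import importlib.util
import pandas as pd

# 0=Negative, 1=Neutral, 2=Positive, per the CSV structure above
LABEL_NAMES = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Tiny stand-in for the real raw workbook
raw = pd.DataFrame({
    "clean_comment": ["great phone", "it is okay", "hate the update"],
    "category": [2, 1, 0],
})

# Add a human-readable sentiment column alongside the numeric category
labeled = raw.assign(sentiment=raw["category"].map(LABEL_NAMES))

# Excel writes require openpyxl; skip gracefully if it is absent
if importlib.util.find_spec("openpyxl"):
    raw.to_excel("reddit_posts_raw.xlsx", index=False)
    labeled.to_excel("reddit_posts_labeled.xlsx", index=False)
```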
Hyperparameter Tuning & Optimization Sandbox:
```python
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'clf__C': [0.1, 1, 3.14, 10],
    'clf__penalty': ['l1', 'l2'],
    # 'liblinear' supports both l1 and l2 penalties; the default
    # 'lbfgs' solver rejects l1
    'clf__solver': ['liblinear'],
    'tfidf__max_features': [1000, 5000, 10000]
}

rs = RandomizedSearchCV(
    pipeline, param_dist, cv=5,
    scoring='f1_macro', n_iter=20, random_state=42
)
rs.fit(X_train, y_train)
```

Expected performance (example run):
```
Best Parameters: {'clf__C': 3.14, 'clf__penalty': 'l2'}
Best CV F1 Score: 0.82

Classification Report:
              precision    recall  f1-score   support

0 (Negative)       0.81      0.83      0.82       500
1 (Neutral)        0.79      0.77      0.78       400
2 (Positive)       0.85      0.86      0.85       600
```
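A report in this format can be produced with `sklearn.metrics.classification_report` on the held-out split. The sketch below is self-contained, using a tiny inline dataset in place of the Reddit CSV, so its scores will not match the example run above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.pipeline import Pipeline

# Tiny inline dataset standing in for Reddit_Data.csv
# (0=Negative, 1=Neutral, 2=Positive)
texts = ["terrible awful experience", "worst update ever", "hate this app",
         "it is okay i guess", "nothing special here", "average at best",
         "love this phone", "fantastic support team", "great new features"]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(texts, labels)

# Scoring on the training texts here just to illustrate the report format;
# real runs should score X_test, y_test from the train/test split
print(classification_report(labels, pipeline.predict(texts),
                            target_names=["Negative", "Neutral", "Positive"],
                            zero_division=0))
```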
- TrueView Interface (`index2.html`):
  - Clean, modern design with responsive glassmorphism effects.
  - Real-time sentiment prediction.
  - Micro-animations, progress bars, and glowing components.
  - Toggleable dark/light mode.
- Backend API Endpoints:

```
POST /predict
{ "text": "The UI looks absolutely stunning!" }

// Response
{
  "sentiment": "positive",
  "confidence": 0.98,
  "scores": { "negative": 0.01, "neutral": 0.01, "positive": 0.98 }
}
```
- Advanced AI Core: Integration with BERT, RoBERTa, or GPT-based transformers.
- Real-time Engine: Stream analytics from live social media feeds.
- Cross-Platform Extraction: Extend collection to Twitter, LinkedIn, and other discussion forums.
- Cloud Scale: Dockerized microservices for deployment to AWS/Azure Kubernetes.
- Security & Throttling: Authentication and rate-limited endpoints.
```
# Clone the TrueView repository
git clone https://github.com/Desapphire/TrueView.git

# Initialize environment
python -m venv sentiment_env
source sentiment_env/bin/activate  # Windows: sentiment_env\Scripts\activate

# Install locked dependencies
pip install -r requirements.txt

# Sync environment variables
cp .env.example .env

# Run the development server
python app.py
```

This project is governed by a Proprietary License. Unauthorized copying, modification, or distribution is prohibited. Refer to the internal legal compliance policy for full terms.
- Clone the private repository.
- Create a focused feature branch (`git checkout -b feature/neon-hud`).
- Commit your changes (`git commit -m 'Enhance HUD analytics'`).
- Push to the branch (`git push origin feature/neon-hud`).
- Open a Pull Request for code review.

