AI Secure

58 repositories

RedCode
Public
[NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
Python
Forks: 10 · Stars: 66 · Issues: 3 · Pull requests: 0 · Updated Nov 14, 2025
SafeAuto
Public
[ICML 2025] SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Python
Forks: 2 · Stars: 24 · Issues: 5 · Pull requests: 0 · Updated Jul 17, 2025
UDora
Public
[ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents
Python
Forks: 4 · Stars: 31 · Issues: 1 · Pull requests: 0 · Updated Jun 24, 2025
PolyGuard
Public
Python
Forks: 2 · Stars: 17 · Issues: 2 · Pull requests: 0 · Updated Jun 18, 2025
AdvAgent
Public
Jupyter Notebook
Forks: 0 · Stars: 22 · Issues: 5 · Pull requests: 0 · Updated May 28, 2025
AgentPoison
Public
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
red-team llm-agent retrieval-augmented-generation
Python
MIT License
Forks: 26 · Stars: 197 · Issues: 4 · Pull requests: 1 · Updated Apr 12, 2025
MMDT
Public
Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
Jupyter Notebook
Forks: 2 · Stars: 26 · Issues: 1 · Pull requests: 0 · Updated Mar 15, 2025
aug-pe
Public
[ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text
language-model differential-privacy ai-privacy large-language-models prompt-engineering
Python
Apache License 2.0
Forks: 17 · Stars: 55 · Issues: 1 · Pull requests: 0 · Updated Jan 11, 2025
FedGame
Public
Official implementation for paper "FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning" (NeurIPS 2023).
Python
MIT License
Forks: 0 · Stars: 13 · Issues: 1 · Pull requests: 0 · Updated Oct 25, 2024
VFL-ADMM
Public
Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM (SaTML 2024)
Python
Apache License 2.0
Forks: 1 · Stars: 5 · Issues: 0 · Pull requests: 0 · Updated Oct 21, 2024
DecodingTrust
Public
A Comprehensive Assessment of Trustworthiness in GPT Models
Python
Creative Commons Attribution Share Alike 4.0 International
Forks: 61 · Stars: 311 · Issues: 13 · Pull requests: 2 · Updated Sep 16, 2024
helm
Public
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
Python
Apache License 2.0
Forks: 357 · Stars: 0 · Issues: 0 · Pull requests: 2 · Updated Jun 12, 2024
DPFL-Robustness
Public
[CCS 2023] Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks
Python
Forks: 0 · Stars: 7 · Issues: 0 · Pull requests: 0 · Updated Feb 15, 2024
hf-blog
Public
Public repo for HF blog posts
Jupyter Notebook
Forks: 981 · Stars: 0 · Issues: 0 · Pull requests: 0 · Updated Jan 26, 2024
DecodingTrust-Data-Legacy
Public
Python
Forks: 0 · Stars: 0 · Issues: 0 · Pull requests: 0 · Updated Dec 25, 2023
TextGuard
Public
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
Python
Forks: 1 · Stars: 13 · Issues: 0 · Pull requests: 0 · Updated Nov 7, 2023
InfoBERT
Public
[ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi…
information-theory language-models bert adversarial-attacks roberta adversarial-defense adversarial-robustness
Python
Forks: 8 · Stars: 85 · Issues: 1 · Pull requests: 0 · Updated Oct 25, 2023
Robustness-Against-Backdoor-Attacks
Public
RAB: Provable Robustness Against Backdoor Attacks
Python
Forks: 6 · Stars: 39 · Issues: 2 · Pull requests: 5 · Updated Oct 3, 2023
semantic-randomized-smoothing
Public
[CCS 2021] TSS: Transformation-specific smoothing for robustness certification
security deep-learning robustness-verification
Roff
Forks: 3 · Stars: 26 · Issues: 0 · Pull requests: 5 · Updated Oct 3, 2023
FLBenchmark-toolkit
Public
Federated Learning Framework Benchmark (UniFed)
benchmark machine-learning federated-learning
Python
Apache License 2.0
Forks: 6 · Stars: 49 · Issues: 5 · Pull requests: 0 · Updated Jun 14, 2023
SecretGen
Public
A general model inversion attack against large pre-trained models.
machine-learning privacy
Python
MIT License
Forks: 2 · Stars: 5 · Issues: 0 · Pull requests: 0 · Updated Apr 22, 2023
adversarial-glue
Public
[NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu C…
nlp machine-learning nlp-library adversarial-examples adversarial-attacks
Python
Forks: 2 · Stars: 13 · Issues: 0 · Pull requests: 0 · Updated Apr 3, 2023
VeriGauge
Public
A unified toolbox for running major robustness verification approaches for DNNs. [S&P 2023]
deep-learning robustness verfication
C
Forks: 7 · Stars: 90 · Issues: 3 · Pull requests: 4 · Updated Mar 24, 2023
Certified-Fairness
Public
[NeurIPS 2022] Code for Certifying Some Distributional Fairness with Subpopulation Decomposition
Python
Forks: 0 · Stars: 5 · Issues: 0 · Pull requests: 0 · Updated Jan 3, 2023
CoPur
Public
CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)
Python
Forks: 1 · Stars: 11 · Issues: 0 · Pull requests: 0 · Updated Dec 7, 2022
transferability-versus-robustness
Public
Python
Forks: 0 · Stars: 0 · Issues: 0 · Pull requests: 0 · Updated Dec 6, 2022
DMLW2022
Public
HTML
Forks: 1 · Stars: 1 · Issues: 0 · Pull requests: 0 · Updated Dec 3, 2022
Certified-Robustness-SoK-Oldver
Public
This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular datasets and pa…
Forks: 10 · Stars: 98 · Issues: 0 · Pull requests: 0 · Updated Oct 18, 2022
Layerwise-Orthogonal-Training
Public
Python
Forks: 0 · Stars: 6 · Issues: 0 · Pull requests: 0 · Updated Oct 11, 2022
CROP
Public
[ICLR 2022] CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing
reinforcement-learning certification robustness
Python
Forks: 2 · Stars: 8 · Issues: 1 · Pull requests: 0 · Updated Jun 16, 2022