    Repositories list

    • RedCode

      Public
      [NeurIPS'24] RedCode: Risky Code Execution and Generation Benchmark for Code Agents
      Python
      106630 Updated Nov 14, 2025
    • SafeAuto

      Public
      [ICML 2025] SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
      Python
      22450 Updated Jul 17, 2025
    • UDora

      Public
      [ICML 2025] UDora: A Unified Red Teaming Framework against LLM Agents
      Python
      43110 Updated Jun 24, 2025
    • PolyGuard

      Public
      Python
      21720 Updated Jun 18, 2025
    • AdvAgent

      Public
      Jupyter Notebook
      02250 Updated May 28, 2025
    • [NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
      Python
      2619741 Updated Apr 12, 2025
    • MMDT

      Public
      Comprehensive Assessment of Trustworthiness in Multimodal Foundation Models
      Jupyter Notebook
      22610 Updated Mar 15, 2025
    • aug-pe

      Public
      [ICML 2024 Spotlight] Differentially Private Synthetic Data via Foundation Model APIs 2: Text
      Python
      175510 Updated Jan 11, 2025
    • FedGame

      Public
      Official implementation for paper "FedGame: A Game-Theoretic Defense against Backdoor Attacks in Federated Learning" (NeurIPS 2023).
      Python
      01310 Updated Oct 25, 2024
    • VFL-ADMM

      Public
      Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM (SaTML 2024)
      Python
      1500 Updated Oct 21, 2024
    • A Comprehensive Assessment of Trustworthiness in GPT Models
      Python
      61311132 Updated Sep 16, 2024
    • helm

      Public
      Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).
      Python
      357002 Updated Jun 12, 2024
    • [CCS 2023] Unraveling the Connections between Privacy and Certified Robustness in Federated Learning Against Poisoning Attacks
      Python
      0700 Updated Feb 15, 2024
    • hf-blog

      Public
      Public repo for HF blog posts
      Jupyter Notebook
      981000 Updated Jan 26, 2024
    • Python
      0000 Updated Dec 25, 2023
    • TextGuard

      Public
      TextGuard: Provable Defense against Backdoor Attacks on Text Classification
      Python
      11300 Updated Nov 7, 2023
    • InfoBERT

      Public
      [ICLR 2021] "InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective" by Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi…
      Python
      88510 Updated Oct 25, 2023
    • RAB: Provable Robustness Against Backdoor Attacks
      Python
      63925 Updated Oct 3, 2023
    • [CCS 2021] TSS: Transformation-specific smoothing for robustness certification
      Roff
      32605 Updated Oct 3, 2023
    • Federated Learning Framework Benchmark (UniFed)
      Python
      64950 Updated Jun 14, 2023
    • SecretGen

      Public
      A general model inversion attack against large pre-trained models.
      Python
      2500 Updated Apr 22, 2023
    • [NeurIPS 2021] "Adversarial GLUE: A Multi-Task Benchmark for Robustness Evaluation of Language Models" by Boxin Wang*, Chejian Xu*, Shuohang Wang, Zhe Gan, Yu C…
      Python
      21300 Updated Apr 3, 2023
    • VeriGauge

      Public
      A united toolbox for running major robustness verification approaches for DNNs. [S&P 2023]
      C
      79034 Updated Mar 24, 2023
    • [NeurIPS 2022] Code for Certifying Some Distributional Fairness with Subpopulation Decomposition
      Python
      0500 Updated Jan 3, 2023
    • CoPur

      Public
      CoPur: Certifiably Robust Collaborative Inference via Feature Purification (NeurIPS 2022)
      Python
      11100 Updated Dec 7, 2022
    • Python
      0000 Updated Dec 6, 2022
    • DMLW2022

      Public
      HTML
      1100 Updated Dec 3, 2022
    • This repo keeps track of popular provable training and verification approaches towards robust neural networks, including leaderboards on popular datasets and pa…
      109800 Updated Oct 18, 2022
    • Python
      0600 Updated Oct 11, 2022
    • CROP

      Public
      [ICLR 2022] CROP: Certifying Robust Policies for Reinforcement Learning through Functional Smoothing
      Python
      2810 Updated Jun 16, 2022