Zhenhong Zhou ydyjya

LLM Safety

79 followers · 7 following

Beijing University of Posts and Telecommunications
Beijing
https://www.zhihu.com/people/warrior-18-53

Stars

IAAR-Shanghai / Awesome-Attention-Heads

An awesome repository & A comprehensive survey on interpretability of LLM attention heads.

TeX 247 6 Updated Oct 26, 2024

ydyjya / SafetyHeadAttribution

Python 3 Updated Oct 19, 2024

hijkzzz / Awesome-LLM-Strawberry

A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.

4,883 271 Updated Oct 23, 2024

GIGABaozi / AED

Code for AED, a method that helps LLMs defend against jailbreaks

Python 2 Updated Jul 29, 2024

boyiwei / alignment-attribution-code

Official Code for Paper: Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications

Python 57 8 Updated Oct 4, 2024

IS2Lab / S-Eval

S-Eval: Automatic and Adaptive Test Generation for Benchmarking Safety Evaluation of Large Language Models

39 3 Updated Oct 27, 2024

pillowsofwind / Knowledge-Conflicts-Survey

[EMNLP 2024] The official GitHub repo for the survey paper "Knowledge Conflicts for LLMs: A Survey"

78 1 Updated Sep 21, 2024

HoagyC / sparse_coding

Using sparse coding to find distributed representations used by neural networks.

Jupyter Notebook 178 28 Updated Nov 10, 2023

openai / sparse_autoencoder

Python 307 32 Updated Jul 19, 2024

ydyjya / LLM-IHS-Explanation

Jupyter Notebook 30 3 Updated Jun 13, 2024

JailbreakBench / jailbreakbench

JailbreakBench: An Open Robustness Benchmark for Jailbreaking Language Models [NeurIPS 2024 Datasets and Benchmarks Track]

Python 213 20 Updated Sep 26, 2024

alexandrasouly / strongreject

Repository for "StrongREJECT for Empty Jailbreaks" paper

Jupyter Notebook 105 5 Updated Aug 11, 2024

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 24,236 2,722 Updated Oct 2, 2024

huggingface / trl

Train transformer language models with reinforcement learning.

Python 9,858 1,248 Updated Oct 28, 2024

openai / transformer-debugger

Python 4,026 236 Updated Jun 4, 2024

OptimalScale / LMFlow

An Extensible Toolkit for Finetuning and Inference of Large Foundation Models. Large Models for All.

Python 8,258 826 Updated Oct 25, 2024

chawins / llm-sp

Papers and resources related to the security and privacy of LLMs 🤖

Python 423 31 Updated Sep 9, 2024

HowieHwong / TrustLLM

[ICML 2024] TrustLLM: Trustworthiness in Large Language Models

Python 456 43 Updated Sep 29, 2024

CHATS-lab / persuasive_jailbreaker

Persuasive Jailbreaker: we can persuade LLMs to jailbreak them!

HTML 255 19 Updated Oct 10, 2024

mlabonne / llm-course

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 38,474 4,068 Updated Jul 28, 2024

CLUEbenchmark / SuperCLUE-Safety

SC-Safety: a multi-turn adversarial safety benchmark for Chinese large language models

101 7 Updated Mar 15, 2024

kaixindelele / ChatPaper

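Use ChatGPT to summarize arXiv papers. Speeds up the whole research workflow: use ChatGPT for full-paper summarization, professional translation, polishing, peer review, and review responses.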

Python 18,404 1,930 Updated Apr 4, 2024

eric-mitchell / direct-preference-optimization

Reference implementation for DPO (Direct Preference Optimization)

Python 2,122 172 Updated Aug 11, 2024

meta-llama / PurpleLlama

Set of tools to assess and improve LLM security.

Python 2,658 442 Updated Oct 22, 2024

nlpxucan / WizardLM

LLMs built upon Evol-Instruct: WizardLM, WizardCoder, WizardMath

Python 9,240 716 Updated Aug 5, 2024

HillZhang1999 / llm-hallucination-survey

Reading list of hallucination in LLMs. Check out our new survey paper: "Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models"

932 49 Updated Sep 4, 2024

wangcunxiang / LLM-Factuality-Survey

The repository for the survey paper "Survey on Large Language Models Factuality: Knowledge, Retrieval and Domain-Specificity"

325 27 Updated Apr 25, 2024

mistralai / mistral-inference

Official inference library for Mistral models

Jupyter Notebook 9,668 855 Updated Oct 16, 2024

joonspk-research / generative_agents

Generative Agents: Interactive Simulacra of Human Behavior

17,267 2,220 Updated Aug 5, 2024

kaushikb11 / awesome-llm-agents

A curated list of awesome LLM agents.

481 42 Updated Jul 1, 2024