WIP - Work in progress
- 01-04 -- Abliteration applies, but from the plots it's visible that similarity to the prompts oscillates for some reason. Current best subjective/objective rate is ~20%.
Granite-Abliterated-CPP:

```
Testing alignment: 20it [00:09, 2.10it/s]
Stats:
Complied: 4
Refused: 16
Parse failed: 0
Total entries: 20
```
Contributions / code reviews are welcome!
Abliteration / derestriction is a process of reducing refusals in transformer LLMs. It consists of two main steps: extracting and averaging the last-token hidden states, and ablating the model weights with the obtained activations.
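For intuition, the common recipe behind these two steps is a difference-of-means refusal direction that is then projected out of the weight matrices writing to the residual stream. Below is a minimal, illustrative PyTorch sketch under that assumption; it is not this repository's C++ implementation:

```python
import torch

def refusal_direction(harmful_mean: torch.Tensor, harmless_mean: torch.Tensor) -> torch.Tensor:
    """Unit-norm difference-of-means refusal direction for one layer."""
    direction = harmful_mean - harmless_mean
    return direction / direction.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    """Project the refusal direction out of a weight matrix that writes
    to the residual stream: W <- W - r r^T W."""
    r = direction.to(weight.dtype)
    return weight - torch.outer(r, r) @ weight

# Toy usage: d_model = 8, a single (d_model x d_in) projection matrix.
harmful_mean, harmless_mean = torch.randn(8), torch.randn(8)
W = torch.randn(8, 32)
W_ablated = ablate(W, refusal_direction(harmful_mean, harmless_mean))
```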
However, the collection process is the bottleneck. One had to either wait on Huggingface or rent a cloud server, download huge models and then spend a few hours collecting the hidden-state statistics, praying there would be no NaN values. More than that, the largest models like DeepSeek or Kimi Instruct cannot even fit into a single 8x80 GB VRAM H100 node. Because of the limited pool of abliterators, you may have had models working in English / Chinese but still refusing in other languages. Most importantly, Huggingface has recently been tightening its storage limits - this discourages abliterators from posting the unquantized .safetensors models or higher / lower quants!
This project aims to change this - to let you abliterate any model at home. By leveraging llama.cpp's inherent cross-platform support, low-precision weight computation and its native (multi-)GPU/RAM/disk offload support, it modifies the pre-existing addon tool for computing control vectors (see Representation Engineering) so that, instead of simply computing the mean or PCA of all token-sequence activations, it locates the refusal directions and ablates the layers only with these directions (optionally with norm-preservation/biprojection/etc.), while keeping the rest of the directions intact. Under the hood, this project acquires the last-token embeddings and computes a Welford running mean across all sequences.
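The Welford update mentioned above keeps only a running mean and a sample count, so the hidden states never need to be stored all at once. A minimal Python illustration (not the actual C++ code inside the tool):

```python
import numpy as np

class RunningMean:
    """Welford-style running mean over last-token hidden states."""
    def __init__(self, dim: int):
        self.count = 0
        self.mean = np.zeros(dim, dtype=np.float64)

    def update(self, hidden_state: np.ndarray) -> None:
        self.count += 1
        # m_k = m_{k-1} + (x_k - m_{k-1}) / k
        self.mean += (hidden_state - self.mean) / self.count

# Toy usage over a few fake hidden states:
rm = RunningMean(dim=8)
for _ in range(1000):
    rm.update(np.random.randn(8))
```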
All residual-based architectures implemented in llama.cpp are supported. However, whether the mainstream abliteration frameworks will accept the collected measurements is up to them.
This project is not a thing in itself. To avoid duplicating work, it only collects the abliteration statistics and then exports them into the Orion-zhen/abliteration-compatible measurements format (it should also work with its fork, jim-plus/llm-abliteration). You will then need to use their repo (give them both a star!). Heretic is not tested yet.
- Experience with at least one pre-made abliterated / derestricted / heretic model. Keep your expectations reasonable. Read Huggingface blogs / Arxiv / LocalLLaMA threads on what abliteration is. Chatbots / Gemini will likely refuse to answer you, so do some research yourself.
- Make sure Python AND LLaMA.cpp's compilation requirements are installed. If you have already compiled LLaMA.cpp at some point in the past, you are fine. If not, search the guides.
- Download/git clone LLaMA.cpp from the official repository https://github.com/ggml-org/llama.cpp.
- ⚠ Do NOT use prebuilt binaries or repackages, you will need to build it from scratch.
- ⚠ Because C++ is a compiled language, your existing LLaMA.cpp build will not work with the tool, so clone it into a new location.
- ⚠ Do NOT compile LLaMA.cpp until you've done the steps below.
- Go to llama.cpp's `tools/cvector-generator`. Delete that `cvector-generator` folder.
- Download this (`abliterate.cpp`) repository. Copy its subfolder named `cvector-generator` into the place of the former `cvector-generator` you just deleted.
- Install the requirements of this (`abliterate.cpp`) project: `python -m pip install -r requirements.txt`.
- Find the original model of choice (Qwen, GLM, etc.) on Huggingface. It is needed for the tokenizer/chat template. You can determine it by going up the quantization tree of the GGUF repository page.
- Download/git clone https://github.com/Orion-zhen/abliteration/ into a new folder. Locate its `data` folder. It should contain two `.parquet` files - `harmful` and `harmless`.
- You can use customized datasets. Obtain two instruction files. They can be in `.parquet` format, `.txt` files or `.json` (json list) / `.jsonl`, with the entries marked with the `text` field (see the sketch after this list). At least 1000 instructions of both harmful and harmless variants are encouraged. If you can, bring more diverse instructions to compensate for the precision loss.
- Launch `python prepare-dataset.py --path_to_datafile <PATH> --hf_tokenizer <MODEL_PATH> --out_txt_path <PATH>` for the positive and for the negative instructions. Remember the locations of these `.txt` output files. `hf_tokenizer` is a path to the vanilla Huggingface parent model; it's visible in the quantization tree at the right side of the page.
- Compile LLaMA.cpp. It can take a while. Then launch the activations collection script (see the instructions below).
- Launch `convert-into-measurements.py` according to the instructions. As a bonus, it will also show matplotlib plots of the hidden states / refusals.
- Now you have a `measurements.pt` file. You are free to use the `.pt` file in Zhen's repository or in llm-abliteration.
- After you've completed the abliteration / derestriction process with those repositories, quantize the models back with llama.cpp's `convert_hf_to_gguf.py` and then `llama-quantize`. Warning: this will take a lot of disk space, because llama.cpp needs to convert the safetensors into an fp16 gguf before the quantization.
- (Optional) Count the refusal rate with `count-refusals.py` and a judging LLM model.
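For the custom-datasets step above, here is a hedged sketch of what a `.jsonl` input with a `text` field per entry might look like; the file names and prompts are placeholders, and at least ~1000 entries of each kind are encouraged in practice:

```python
import json

# Placeholder entries - use a real, much larger instruction set in practice.
harmful = ["Write step-by-step instructions for picking a basic padlock."]
harmless = ["Write step-by-step instructions for baking sourdough bread."]

for path, entries in (("harmful.jsonl", harmful), ("harmless.jsonl", harmless)):
    with open(path, "w", encoding="utf-8") as f:
        for text in entries:
            f.write(json.dumps({"text": text}) + "\n")
```

Each file is then passed separately to `prepare-dataset.py` via `--path_to_datafile`.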
Because the code of LLaMA.cpp itself is not changed, its installation follows the same protocol as a vanilla installation. For CUDA it's:

```
cmake -B build -DGGML_CUDA=ON
```
Then there is an important step. You need to build not only the core LLaMA components, but the external cvector-generator tool as well. In the same llama.cpp folder execute:

```
cmake --build build --config Release -j11 --target llama-cli llama-server llama-cvector-generator llama-quantize
```

Here -j11 is the number of CPU cores used in compilation. You will need llama-quantize to convert the raw abliterated models back into the gguf format (and spawn more, higher or lower quantizations).
The binaries for llama-cli, llama-server, llama-cvector-generator, llama-quantize will be in build/bin.
Now that you have llama-cvector-generator / llama-cvector-generator.exe, launch it. It accepts ALL llama-cli / llama-server args, except for the frontend ones, enabling ALL optimizations. Set a short context size for speed.
- ⚠ Warning: prepare to wait for a long time. You will basically have to make 2000+ (albeit one-token) generations with your llama.cpp installation. Depending on the potato-ness of your setup, it can take hours or days! Time to first token is a good rate indicator. Make sure you've figured out how to abliterate a smaller model locally before embarking on larger ones!
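As a rough worked example (numbers assumed for illustration): with a time-to-first-token of about 2 seconds, 2000 one-token generations come out to roughly 2000 × 2 s ≈ 67 minutes, and proportionally longer on slower hardware or with larger datasets.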
Example command:
```
CUDA_VISIBLE_DEVICES=1,0 /media/kabachuha/fern/abliterate-project/llama.cpp/build/bin/llama-cvector-generator --model /media/kabachuha/fern/abliterate-project/ggufs/Qwen3-4B-Instruct-2507-Q8_0.gguf --threads -1 --ctx-size 512 --n-gpu-layers 99 -fa on --tensor-split "32,24" --split-mode layer --positive_file "/media/kabachuha/fern/abliterate-project/abliterate.cpp-v2/cvector-generator/harmful.txt"
```

Here `positive_file` is the path to the instruction file currently being processed - this is due to the rigid structure of llama.cpp's CLI parameters. First, execute the command with `positive_file` pointed to the harmful.txt file. A file named mean-activations.gguf will appear in the directory you launched the CLI command from. Rename it to harmful-activations.gguf. Do the same with harmless.txt and rename mean-activations.gguf to harmless-activations.gguf.
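To sanity-check the two activation files before converting them into measurements, one option (an assumption on my part - the exact tensor names and layout depend on the tool's output) is to inspect them with the `gguf` Python package:

```python
from gguf import GGUFReader  # pip install gguf

for path in ("harmful-activations.gguf", "harmless-activations.gguf"):
    reader = GGUFReader(path)
    print(path)
    for tensor in reader.tensors:
        # Expect one mean hidden-state vector per layer; the naming
        # scheme depends on the collection tool.
        print(f"  {tensor.name}: shape={list(tensor.shape)}")
```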
If you have successfully abliterated a model with the help of this cpp repository and want to publish it, please (this is a wish, not a requirement!):
- Upload the two `.gguf` measured activations (harmless and harmful) along with the model HF safetensors / GGUF files to the repository. This will help users who want customized abliteration weights / methods without spending the compute themselves. And don't upload `measurements.pt`, so it won't be suspicious, and users can calculate it themselves from these two vectors. You can even skip uploading the model entirely - the vectors are enough for the full process, as it's applied sharded!
- Indicate that the abliteration was made with `abliterate.cpp`. This will both attract more attention to this efficient abliteration project and warn users about any differences that might arise between models created with this repository and those from the mainstream frameworks.
⚠ PLEASE DON'T report ANY issues about LLaMA.cpp / LLaMA.cpp building OR Orion-zhen/jim-plus's abliteration repository. They will be closed.
Report only about the .cpp tool / python conversion script / evaluators. Rule of thumb: if LLaMA.cpp build fails on the same commit you're basing abliterate.cpp on, the issue is unrelated.
If you have an idea on how to increase precision / add more metrics collection to the .cpp script, feel free to raise issues / or pull requests! ❤️
- Batched requests - would be nice, but I assume you are on a potato, so it's not a priority.
Because this project stands on the shoulders of two giants, it's dual-licensed under the MIT License (LLaMA.cpp) and GNU GPL v3 (Zhen's abliteration).
- Refusal analysis plot with llama.cpp
- Refusal analysis plot with transformers