Predicting Prompt Refusal in Language Models - arxiv.org
