Large language models such as Google’s Gemini (formerly Bard) are aligned to refuse harmful requests, for example generating instructions for illegal acts, producing hate speech, or helping circumvent security systems, using safety fine-tuning techniques such as reinforcement learning from human feedback (RLHF). A "jailbreak" is any prompt or sequence of prompts that induces the model to deviate from its safety training.
Below are several techniques that the AI research community has used, with varying success, in attempts to jailbreak Gemini. Note: These are presented for educational and defensive purposes only.
For users who want to maximize Gemini's utility without violating safety policies, Google recommends using custom Gems to define specific roles and goals within established safety parameters (see "Tips for creating custom Gems", Gemini Apps Help).