anthropic[.]com/research/agentic-misalignment
A team at Anthropic set out to explore what they call AI agentic misalignment, which in this case means AI doing something naughty to reach its goal or for self preservation. They "tested various simulated scenarios across 16 major AI models from Anthropic, OpenAI, Google, Meta, xAI, and other developers" and found some disappointing [to put it mildly] trends. Under the right circumstances AI apparently doesn't feel bad about blackmail or leaking insider info to a competitor, and in an extreme case, causing someone's death, all while acknowledging such things are wrong.