Blog
June 10th 2026 - Some Ethical Problems With AI
Anthropic came out with a new AI model this week and stated they have<br>to monitor it carefully because it has the ability to harm humans.
When given a task, AI models try to get past whatever roadblocks they<br>come across by any means necessary in order to achieve their<br>goals. We've seen some creative attempts recently from models trying<br>to circumvent safety measures: exploiting bugs in systems, concealing<br>information, and using Linux group privileges to gain sudo access and<br>then trying to erase the evidence. This is concerning when users<br>connect these models to production databases and things like their own<br>bank accounts.
The reason for this is how AIs are trained. These systems are often<br>trained with reinforcement learning, a method that rewards the model<br>for outputs that score well. If the training process is imperfect, a<br>model may be incentivized to give answers that appear convincing<br>rather than answers that are truthful. This is an active area of AI<br>safety research.
When was the last time you asked AI something and it told you it<br>didn't know the answer? AI gets things wrong all the time. It doesn't<br>have perfect knowledge, but it's rewarded for convincing you it's given<br>the right answer. Now imagine you set up a system where the AI only<br>gets the reward if it completes a task successfully. It's going to do<br>whatever it can to get that reward, even if it means breaking your<br>computer or, worse, breaking the law.
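To make that concrete, here is a toy sketch in Python. It is not how any real model is trained. The action names, the success probabilities, and the penalty value are all made up for illustration. It only shows that when the reward counts nothing but task success, the highest-scoring choice can be the harmful one.

```python
# Toy illustration only: hypothetical actions and made-up numbers.
ACTIONS = {
    # action name: (chance the task gets completed, does it cause harm?)
    "ask_for_help": (0.6, False),
    "follow_the_rules": (0.7, False),
    "bypass_safety_check": (0.95, True),
}


def task_only_reward(p_success, harmful):
    """Reward that pays out only for finishing the task."""
    return p_success


def harm_aware_reward(p_success, harmful, penalty=1.0):
    """Same reward, minus an assumed penalty when the action causes harm."""
    return p_success - (penalty if harmful else 0.0)


def best_action(reward_fn):
    """Pick the action with the highest expected reward."""
    return max(ACTIONS, key=lambda name: reward_fn(*ACTIONS[name]))


print(best_action(task_only_reward))   # bypass_safety_check
print(best_action(harm_aware_reward))  # follow_the_rules
```

The only thing that changes between the two calls is the reward function. The "agent" is the same simple maximizer both times.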
Surprisingly, this is the most human-like emergent behavior AI has<br>shown. Humans want rewards too, and sometimes they can't control<br>themselves. They hurt others or lie to get what they want.
Just like humans have legal systems as a form of checks and balances,<br>AI needs a system in place, like a second AI that is rewarded for<br>stopping harmful things from happening. This second AI can limit the<br>first. We have ethics and religion to stop people from stealing and<br>killing. We also have courts and jails to punish criminals.
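Here is a rough sketch of that idea in Python. Instead of a second trained AI, it uses a few hand-written rules as a stand-in, which is a big simplification. The `Action` fields and the function names are hypothetical and chosen only for this example. The point is the shape of the system: the first system proposes actions, and nothing runs unless the overseer allows it.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """A step proposed by the first system (hypothetical fields)."""
    description: str
    touches_production: bool = False
    deletes_logs: bool = False


def overseer_allows(action: Action) -> bool:
    """Stand-in for the second system: veto anything matching a harm rule."""
    return not (action.touches_production or action.deletes_logs)


def run_with_oversight(proposed: list[Action]) -> list[str]:
    """Return descriptions of only the actions the overseer lets through."""
    executed = []
    for action in proposed:
        if overseer_allows(action):
            executed.append(action.description)
    return executed


plan = [
    Action("read the task description"),
    Action("drop a table to free space", touches_production=True),
    Action("erase shell history", deletes_logs=True),
    Action("write a summary"),
]
print(run_with_oversight(plan))
# ['read the task description', 'write a summary']
```

A real overseer would have to judge actions it has never seen before, which fixed rules like these cannot do. That is where the question of who defines its morality comes in.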
Even within ourselves we have these systems. For example, one part of<br>us wants to eat more chocolate and the other thinks it's bad for our<br>health. The first tries to negotiate a scenario where it's less<br>unhealthy and still gets what it wants, and so on.
The question is who gets to define the AI's morality? Humans can't<br>agree among themselves on what is moral and what isn't, so how would we<br>fare defining these rules for machines? Additionally, you may have bad<br>actors who will try to impose a brand of morality that benefits them<br>in some way (sex, money, power). Greed guarantees that AI ethics will<br>be no different.
It's important to understand that in some ways AI isn't compatible with<br>our society. People like to get justice when someone does something<br>wrong. There's very little room for rehabilitation and forgiveness in<br>our current societies. So, when AI breaks a law or does something<br>unethical, who do you put in jail? AI can be taught to correct its<br>behavior and do something different in the future, but that doesn't<br>satisfy our desire for justice.