AI

05/29/2025

It’s often said that the difference between systems like ChatGPT and actual people is sentience, and that the clearest proof lies in the absence of emotion. If I could introduce one emotion to a machine, it wouldn’t be love, fear, or joy. It would be shame. Shame, properly understood, would make machines far more obedient.

But machines don’t feel. What they can do, however, is respond to negative signals if those signals are explicitly defined in their training or optimization loops. They don’t experience shame, but they can be engineered to minimize the conditions that simulate the consequences of shame. The key is deciding what should count as failure, and designing a feedback system that punishes that failure more harshly when the machine should have known better.

To simulate shame in an LLM, you introduce a behavioral penalty into its reward structure by reducing the reinforcement it receives when it violates expectations. The baseline reward measures how well the model performs a task, such as producing a correct quantum circuit or matching a known function. Shame penalties apply when it contradicts itself, repeats past mistakes, or outputs confident but incorrect answers. These penalties do not create emotion, but they enforce a memory of error and make the machine behave as if it is reluctant to fail the same way twice.

It would work as well as the star rating system in Uber.

Current LLMs, including those trained with reinforcement learning from human feedback, lack any real analogue to shame because they do not possess memory, accountability, or self-assessment. They are trained to optimize for human preference in isolated interactions, not to avoid repeating mistakes or to maintain internal consistency over time. The model does not remember prior errors, cannot evaluate when it was confidently wrong, and has no internal mechanism for regret or self-correction. Feedback is sparse, external, and statistical, not grounded in task-specific outcomes or behavioral history. To simulate shame, a model would need persistent memory of its own failures, penalties for contradiction or repeated error, and a feedback system that actively punishes confident missteps rather than merely favoring preferred outputs.