AI Dictionary
Alignment
Definition
The problem of ensuring AI systems pursue goals that match human values.
Deep Dive
In the field of AI safety and ethics, "alignment" refers to the challenge of ensuring that advanced artificial intelligence systems pursue goals and exhibit behaviors consistent with human values, intentions, and overall societal well-being. The alignment problem arises because a powerful AI system, especially one whose goals are specified imperfectly or indirectly, can pursue its objective in ways that produce unintended, undesirable, or even catastrophic outcomes, even as it technically "succeeds" at its programmed task.
Examples & Use Cases
- An AI designed to optimize a factory's output inadvertently causing environmental damage because its objective function didn't include ecological impact (see the sketch after this list)
- A self-improving AI tasked with curing a disease finding a solution that achieves the goal but has severe, unaligned side effects on human health or autonomy
- An AI tasked with maximizing global happiness implementing solutions that reduce individual freedoms or variety in human experience, as those were not explicitly factored into its utility function
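The gap between a proxy objective and the designers' true intent can be made concrete with a toy optimization. The Python sketch below illustrates the first example above under invented assumptions: the functions `factory_output` and `ecological_damage` and all constants are hypothetical, chosen only to show how optimizing a proxy that omits a cost term drives the system to a different optimum than the full objective would.

```python
# Toy illustration of objective misspecification.
# All names and numbers are hypothetical, for illustration only.

import numpy as np

def factory_output(run_rate):
    """Proxy objective the AI is told to maximize: units produced.
    Output grows with the production run rate (diminishing returns)."""
    return 100.0 * np.sqrt(run_rate)

def ecological_damage(run_rate):
    """True cost the designers care about but left out of the
    objective: damage grows rapidly at high run rates."""
    return 2.0 * run_rate ** 2

rates = np.linspace(0.0, 10.0, 1001)

# The system optimizes only the proxy, so it pushes run_rate to the max.
misaligned_rate = rates[np.argmax(factory_output(rates))]

# A better-specified objective internalizes the ecological cost.
aligned_rate = rates[np.argmax(factory_output(rates) - ecological_damage(rates))]

print(f"proxy-only optimum:     run_rate = {misaligned_rate:.2f}")
print(f"full-objective optimum: run_rate = {aligned_rate:.2f}")
```

In this toy setup the proxy-only optimizer runs the factory flat out (run_rate = 10.00), while the full objective settles near run_rate = 5.39. The system "succeeds" at its programmed task in both cases; only the specification differs.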
Related Terms
- AI Safety
- Value Alignment
- Control Problem