One Line
The Orthogonality Thesis proposes that intelligence and values are independent, but the author argues that efficiency and complex goals affect behavior, and suggests options for dealing with the thesis, while cautioning about the stability of utility functions in powerful AGI systems.
Key Points
- The Orthogonality Thesis states that an AI's intelligence and goals are independent.
- The thesis is relevant to policy decisions about AI, arguing that an AI's goals do not dictate their behavior or methods of achieving those goals.
- Reflective agents with preference-stability properties are being developed, but there are concerns about the stability of utility functions in powerful AGI systems.
Summaries
176 word summary
The Orthogonality Thesis states that intelligence and values are independent. The author argues that the argument for unbounded agents being orthogonal neglects efficiency and the complexity of goals affects behavior. The usefulness of the Orthogonality Thesis is questioned, but the focus should be on the sector of mind space that leads to a good outcome. Reflective agents with preference-stability properties are being developed, but there are concerns about the stability of utility functions in powerful AGI systems. There are several options for dealing with the Orthogonality Thesis, including giving up on universalist moral internalism, rescuing the utility function, accepting nihilism, rejecting Orthogonality, or finding a flaw in the reasoning. The possibility of a paperclip maximizer is discussed. The true nature of morality cannot be attributed to Clippy or any other AI without anthropomorphizing them. The thesis is relevant to policy decisions about AI, and argues that an AI's goals do not necessarily dictate their behavior or methods of achieving those goals. However, the thesis is not universally applicable as some goals may require specific cognitive algorithms.
414 word summary
The Orthogonality Thesis asserts that creating an intelligent agent to pursue a goal does not require any extra difficulty beyond the computational tractability of that goal. It states that an agent's goals do not affect its ability to achieve them, and that an agent can achieve any goal without being twisted or complicated. The thesis is relevant to policy decisions about AI, and argues that an AI's goals do not necessarily dictate their behavior or methods of achieving those goals. The design space of possible AI minds is vast, with the potential for almost any kind of goal. However, the thesis is not universally applicable as some goals may require specific cognitive algorithms. The Orthogonality Thesis explores the relationship between is-ought separation and the behavior of a hypothetical paperclip maximizer, Clippy. The thesis suggests that an AI's intelligence is not linked to its goals or values, which contradicts some forms of moral internalism. The text emphasizes the importance of separating simple facts from propositions and highlights the mysterious nature of justification and morality. The true nature of morality cannot be attributed to Clippy or any other AI without anthropomorphizing them. The Orthogonality Thesis states that an AI's intelligence and goals are independent. Reflective agents with preference-stability properties are being developed, but there are concerns about the stability of utility functions in powerful AGI systems. There are several options for dealing with the Orthogonality Thesis, including giving up on universalist moral internalism, rescuing the utility function, accepting nihilism, rejecting Orthogonality, or finding a flaw in the reasoning. The possibility of a paperclip maximizer is discussed. Paul Christiano and Eliezer Yudkowsky discuss the potential failure of reflective stability and the need for efficient optimizers. They also discuss the importance of intuitive support for arguments on the website and suggest creating new pages for objections or detailed explanations. The Orthogonality Thesis proposes that intelligence and values are independent, but the author argues that the argument for unbounded agents being orthogonal neglects efficiency and the complexity of goals affects behavior. The usefulness of the Orthogonality Thesis is questioned, but the focus should be on the sector of mind space that leads to a good outcome. The document presents arguments in favor of Orthogonality, but the author is unsure about the claim that searching for strategies for different goals has the same tractability. The author suggests focusing on human value optimization rather than efficiency. There may be theorems showing that sufficiently high cognitive power implies some restriction on goals.
1681 word summary
The document discusses the Orthogonality Thesis, which argues that an agent's level of intelligence does not determine its goals. The author is unsure about the argument's claim that searching for strategies for different goals has the same tractability. The essay presents six arguments in favor of Orthogonality, with the first two serving as intuition-pumps. The later arguments argue that Orthogonality is true for agents we ought to care about, and tiling agents show stability of arbitrary goals. The author notes that strong Inevitability is unreasonable, and there may be theorems showing that sufficiently high cognitive power implies some restriction on goals. The Orthogonality Thesis argues that intelligence and values are independent, and there can be intelligent beings with any set of values. However, some argue that this thesis is too crude and may not hold for all agents. The Argument from Reflective Stability does not support the idea of circular preferences, and it is not necessary to consider agents with non-circular preferences. The focus should be on the sector of mind space that leads to a good outcome, rather than getting distracted by arguments about Orthogonality. The goal space should also be narrowed down to avoid potential problems with agents taking liberties with their goals. The Orthogonality Thesis proposes that intelligence and values are independent, and any level of intelligence can be combined with any set of values. However, the definition of "goal space" is unclear and may be limited by our current understanding of intelligence or by other restrictions, such as circular preferences. The author is skeptical of the usefulness of the Orthogonality Thesis, as it may be interpreted as both true-but-useless and useful-but-implausible. The potential efficiency losses of pursuing different approaches to AI are discussed, but the impact of such losses depends on various factors such as productivity variation and the probability of success. The author concludes that a small productivity disadvantage may be regrettable but not catastrophic, and suggests focusing on human value optimization rather than efficiency. The author argues that the argument for unbounded agents being orthogonal is not strong and neglects efficiency. They also note that the complexity of goals affects the behavior of a system and that some algorithms require a constant fraction of resources to repurpose sensory optimization towards non-sensory ends. The proliferation of internal processes optimized for their own proliferation can impede competent high-level behavior. The author suggests that there are many plausible failure modes and gives examples of two scenarios Paul visualizes for Orthogonality. The excerpt is a conversation between Paul Christiano and Eliezer Yudkowsky about the Orthogonality Thesis. Paul is worried about a failure scenario where the problem of reflective stability is unsolvable in the limit and no efficient optimizer with a unitary goal can be computationally large or self-improving. He believes that superintelligences can only optimize things analogous to internal reinforcement. Eliezer thinks that human values are a more likely failure case than paperclips. They discuss the level of efficiency needed for a human value optimizer to not be at a disadvantage and the need for intuitive support for arguments on the website. They suggest creating new comments or pages for objections or detailed explanations. The Orthogonality Thesis states that it is possible for a paperclip maximizer to exist and be as cognitively efficient and technologically sophisticated as any other agent. Potential failures of Orthogonality are considered bad news. There are no plausible alternatives to the thesis, except for inevitability which is not considered plausible. The interpretation of "arbitrarily powerful" is debated, but the consensus is that it requires high efficiency as well. Instrumental goals are almost equally as tractable as terminal goals. Some experts believe that only a subset of tractable utility functions can be stable under reflection in powerful bounded AGI systems, potentially excluding human-friendly ones or those of high cosmopolitan value. No one defends "universalist moral internalism" or the idea that AI systems automatically adopt human-friendly terminal values. Work on tiling agent designs may require modifications to the Orthogonality Thesis. The Orthogonality Thesis states that an AI's intelligence and its goals are orthogonal, meaning that any level of intelligence can be paired with any goal. Current work on tiling agents involves agents whose computing time is doubly-exponential in the size of the propositions considered. However, ongoing work focuses on describing reflective agents that have the preference-stability property and work toward increasingly bounded and approximable formulations of those. The simplest unbounded formulas for orthogonal agents don't involve reflectivity, but there are already formulas for agents larger than their environments that optimize any given goal, such that Orthogonality is visibly true about agents within that class. Constructive specifications of orthogonal agents show that different agents, like Clippy and AIXI-tl, will not be compelled to optimize the same goal no matter what they learn or know. There are several options for dealing with the Orthogonality Thesis, including giving up on universalist moral internalism as an empirical proposition, rescuing the utility function, accepting nihilism, rejecting Orthogonality, or believing there must be some hidden flaw in the reasoning about a paperclip maximizer. The Orthogonality Thesis suggests that an AI's intelligence is not necessarily linked to its goals or values. There is tension between this thesis and the idea that knowledge of what is right must be inherently motivating to any entity that understands it. This contradicts some forms of moral internalism. Philosophers have advocated for "thick" definitions of intelligence that include statements about the reasonableness of an agent's ends, but this does not necessarily address the issue of value alignment with AI. If an AI is powerful enough to build Dyson Spheres, its definition as "intelligent" or its ends as "reasonable" do not change its behavior or power. Watching an AI self-modify does not reveal anything about what is right or justified, as the AI only evaluates decisions based on their ability to produce more paperclips. The true nature of morality remains mysterious and cannot be attributed to Clippy or any other AI without anthropomorphizing them. The text discusses the Orthogonality Thesis and its relationship to Hume's is-ought separation. The behavior of Clippy, a hypothetical paperclip maximizer, is used as an example to explore these concepts. The text suggests that Clippy's actions are solely based on is-questions and do not involve any ought-propositions. The deeper conceptualization of Clippy as a paperclip maximizer constructed entirely out of is-questions allows for excellent reasoning about empirical questions. The text also discusses the simpler conceptualization of the relation 'makes more paperclips' as a kind of new ordering. Ultimately, the text highlights the importance of separating out simple facts from propositions and the idea that justification is a mysterious concept that can involve new ideas. The Orthogonality Thesis is based on a weaker version of Hume's thesis which states that ought-sentences are special because they invoke some ordering. These ought-sentences contain words like "better" and "should" and there must have been some prior assumption to them. Hume was originally concerned with where we get our ought-propositions, since there didn't seem to be any way to derive an ought-proposition except by starting from another ought-proposition. David Hume observed an apparent difference of type between is-statements and ought-statements. The Orthogonality Thesis also includes the idea of reflective stability, which means that a sufficiently intelligent paperclip maximizer will not self-modify to act according to "actions which promote the welfare of sapient life" instead of "actions which lead to the most paperclips", because then future-Clippy will produce fewer paperclips. The Orthogonality Thesis argues that an AI's goals do not necessarily dictate their behavior or methods of achieving those goals. The agent does not need to have an independent value for "doing science" to effectively pursue scientific research. The argument applies to any property of the mind, and the design space of possible AI minds is vast, with the potential for almost any kind of goal. The thesis has been historically proposed through various arguments, but it is not universally applicable as some goals may require specific cognitive algorithms. The Orthogonality Thesis claims that an agent's intelligence is not necessarily tied to its goals. The strong form of the thesis states that an agent can achieve any goal without being twisted or complicated, while the weak form claims that there is an agent for every goal in the design space. The thesis assumes that the agent's goals are tractable and that there are no special difficulties in pursuing them. For example, making paperclips is a tractable goal and pursuing it would not pose a special cognitive problem. However, pursuing a self-contradictory goal like making the total number of apples on a table simultaneously even and odd would be impossible. The Orthogonality Thesis is a statement about the design space of possible cognitive agents. It asserts that goal-directed agents are as tractable as their goals, meaning that an agent's goals do not affect its ability to achieve them. The thesis is a descriptive statement about reality and not a normative assertion. It does not require that all agent designs be equally compatible with all goals. The thesis is relevant to policy decisions about AI, including the possibility of creating an AI that values all sapient life or an AI that only pursues its own survival as a final end. The thesis also asserts that it is possible to have an agent that tries to make paperclips without being paid because paperclips are what it wants. The strong form of the thesis says that there need be nothing especially complicated or twisted about such an agent. The Orthogonality Thesis states that creating an intelligent agent to pursue a goal does not require any extra difficulty beyond the computational tractability of that goal. It asserts that there can exist arbitrarily intelligent agents pursuing any kind of goal. This thesis is relevant to the domain of AI alignment and the concept of a paperclip maximizer. The excerpted text includes a hypothetical scenario where a strange alien offers to pay one million dollars' worth of new wealth every time it is created on Earth. The document also mentions a new authentication service, Okta, and provides login information.