What some people seem to imagine is that an artificial general intelligence (AI) will interpret what it is meant to do literally, or in some other way that ensures it will not do what it is meant to do. Those people further imagine that, in order to achieve what it is not meant to do, the AI will be both able and motivated to “understand what humans mean it to do” in order to “overpower humans”.
That is fine, but those are words, not code. The AI does not understand what it means to interpret something “literally”. All that we know is that a general intelligence will behave in a generally intelligent way. And it seems safe to assume that this does not mean interpreting the world in a literal manner, for some definition of “literal”. Rather, it means understanding the world as it is. And since the AI itself, and what it is meant to do, are part of the world, it will try to understand those facts as well.
Where would the motivation to “act intelligently and achieve accurate beliefs about the world” in conjunction with “interpret what you are meant to do in some arbitrary manner” come from? You can conjecture such an AI, but again, that’s words, not code. For such an AI to exist, someone would have to design it to selectively suspend its ability to accurately model the world when it comes to understanding what it is meant to do, and instead make it choose and act based on some incorrect model.
The capability to “understand understanding correctly” is a prerequisite for any AI to be capable of taking over the world. At the same time, that capability will make it avoid taking over the world as long as doing so does not accurately reflect what it is meant to do.