Thoughts on Newcomb’s Problem

Newcomb’s problem is one of those philosophical thought experiments that I come across every now and then, most recently on the Very Bad Wizards podcast. It is also one of those problems where everybody thinks their own choice is obviously right and the alternative is irrational. I’ve got a couple of ideas about it that are probably not novel, but that I don’t often see discussed.

Wikipedia defines the problem as follows:

There are two agents: a reliable predictor and a player. Two boxes are designated A and B. The player is given a choice between taking only box B or taking both boxes A and B. The player knows the following:

  • Box A is transparent and always contains a visible \$1,000.
  • Box B is opaque, and its content has already been set by the predictor:
    • If the predictor has predicted that the player will take both boxes A and B, then box B contains nothing.
    • If the predictor has predicted that the player will take only box B, then box B contains \$1,000,000.

The player does not know what the predictor predicted or what box B contains while making the choice.

The crux is that the predictor is “reliable”: say you’ve just watched 99 people walk up to the two boxes, and the boxes were correctly filled every single time. Whoever took both boxes found the opaque box empty; whoever took only the opaque box found the money inside. Take a moment to think about how you’d approach the problem.

I’ll ignore the exact dollar values for now, because \$1,000 is easy to ignore if you’re getting 1,000x that. The two-boxers will say that, obviously, by the time they make the choice, the boxes are already filled. It’s like the prisoner’s dilemma: on a personal level, it’s always better to defect. If the opaque box is full, then great, taking both gets you an extra \$1,000 on top of the million. If it’s empty, you still get \$1,000, which is better than nothing. For simplicity I will ignore strategies like “open the opaque box first and then open the transparent box after” and assume you can only open both at once, but I don’t think it makes a substantial difference to the problem.
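
To make both arguments concrete, here is a minimal sketch in Python. The dollar amounts come from the problem, but the predictor accuracy p is my own assumption, not part of the original setup:

```python
# Newcomb payoffs: box A always holds $1,000; box B holds $1,000,000
# only if the predictor predicted the player would take box B alone.
A = 1_000
B = 1_000_000

def payoff(choice, prediction):
    """Player's payoff given their choice and the prediction ('one-box' / 'two-box')."""
    box_b = B if prediction == "one-box" else 0
    box_a = A if choice == "two-box" else 0
    return box_b + box_a

# Dominance argument: for either fixed prediction, two-boxing pays $1,000 more.
for prediction in ("one-box", "two-box"):
    print(prediction, payoff("one-box", prediction), payoff("two-box", prediction))

# Expected value if the predictor is right with probability p (hypothetical).
def expected_value(choice, p):
    other = "two-box" if choice == "one-box" else "one-box"
    return p * payoff(choice, choice) + (1 - p) * payoff(choice, other)

print(expected_value("one-box", 0.99), expected_value("two-box", 0.99))
# -> 990000.0 and 11000.0; one-boxing wins whenever p > (1 + A / B) / 2 ≈ 0.5005
```

For any fixed prediction, two-boxing pays \$1,000 more (the dominance argument), but as soon as the predictor is even slightly better than a coin flip, one-boxing has the higher naive expected value.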

If there is no reverse causality, this is true, and the predictor can only make predictions based on your past behaviour. That is, if you are known to defect in games like this, or you have boasted about two-boxing before, the opaque box is probably empty. It will only be full if you’re a known good actor who cooperates in these types of games, and you can only defect successfully once before your price is known. In other words: is the payoff in this scenario worth being known as a defector?

Given that this exact scenario happens to most people exactly zero times, the answer may very well be yes! This doesn’t seem like an iterated game, and even if people in your life know that you’d defect, that you’d two-box, that will have only a negligible impact on how much they trust you in other cooperative scenarios, especially as defection here doesn’t have any negative externalities. Or at least I hope the impact is negligible, given that two-boxing seems to be the more common choice. Or maybe it’s just not something most normal, well-adjusted people think about enough to have an opinion on. (In the same spirit, this post is totally not an exercise in making myself seem more trustworthy by publicly advocating one-boxing.)

On the other hand, the predictor is “reliable,” so it seems likely that their strategy is more sophisticated than observing people’s past behaviour and drawing conclusions about the future. At risk of being fantastical, an entirely consistent explanation that does not violate causality is simulation. If you could observe what choice a perfect copy of a person would make in this situation, you would know with perfect accuracy what choice the real version would make (assuming some degree of determinism, I guess). So a sufficiently advanced predictor would just simulate your mind and observe your behaviour. Sure, the real you post-simulation could now two-box without risk, as the boxes are already filled, but you may just be the simulated version, in which case your choice really does causally affect whether the boxes are filled or not. A perfect simulation is by definition indistinguishable from reality, so you can never know which of the two you are. If you think this is at all possible, or even likely, then one-boxing is the rational choice.
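
To put a number on that, here is a rough sketch under made-up assumptions: s is your credence that your choice actually determines the contents of the opaque box (because you might be the simulation), and r is the prior chance the box is already full. Neither parameter is part of the original problem.

```python
A, B = 1_000, 1_000_000

def ev_one_box(s, r):
    # With credence s your choice fixes the box contents (you might be the
    # simulation), so one-boxing yields B. Otherwise the box is already full
    # with probability r, independent of what you do.
    return s * B + (1 - s) * r * B

def ev_two_box(s, r):
    # If your choice fixes the box, two-boxing yields only A.
    # Otherwise you collect A plus whatever is already in the opaque box.
    return s * A + (1 - s) * (r * B + A)

# The prior r cancels out of the difference: ev_one_box - ev_two_box = s * B - A,
# so one-boxing has the higher expected value whenever s > A / B = 0.001.
for s in (0.0005, 0.001, 0.01, 0.5):
    print(s, ev_one_box(s, r=0.5) - ev_two_box(s, r=0.5))
```

Under those (entirely made-up) numbers, even a 0.1% credence that you might be the simulated copy is enough to make one-boxing the better bet.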

Now, we do not currently have the technology to perfectly simulate human minds, much less without their knowledge, so I would be a lot less worried if the predictor were Uncle Joe rather than an angel that descended from the sky to ask me the question. But you know how thought experiments are.

I also thought I was really clever for thinking of the possibility of simulation, but it turns out the Wikipedia page has a section on exactly that concept. I guess it’s a classic problem for a reason, and many smarter people have already thought about it a lot. Though hey, rediscovering concepts is always fun, and maybe you got something new out of this too!