John David Pressman's Tweets - August 2023

Back to Archive Index

πŸ”— John David Pressman 2023-08-10 23:03 UTC

@ESYudkowsky I'm much more worried for @realGeorgeHotz than EY here. I know he won't accept so it's a pointless gesture, but would be happy to help him prep.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-14 05:10 UTC

@hamandcheese A similar model is published in *Silence On The Wire* (2005) by Michal Zalewski.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-27 22:59 UTC

I wonder when I'll use the first language model that makes me feel how I feel about BigGAN when it's gone. x.com/jd_pressman/st…

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2023-08-27 23:24 UTC

Should I longpost? It would mostly be about AI.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2023-08-27 23:27 UTC

@whybyfire It got surpassed by other methods.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-28 03:04 UTC

@CFGeek @profoundlyyyy You mean stop using Twitter? Would be great but.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-28 18:25 UTC

@TheZvi @paulg Learn to use a debugger, learn to use a test suite, and get good at it. Basically you want to reach the point where the only time you don't quickly find a bug is when your expectations have been deeply violated or you've made an architecture mistake. Those bugs are interesting.
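
A minimal illustration of what that workflow looks like, assuming pytest and pdb (my choices, not tools named in the tweet): write the expectation down as a test, and when it's violated drop into the debugger at the failing line instead of guessing.

```python
# Write the expectation down as a test; when it's violated, inspect the
# failing state in the debugger instead of guessing.
def parse_port(addr: str) -> int:
    _, _, port = addr.partition(":")
    return int(port)

def test_parse_port():
    assert parse_port("localhost:8080") == 8080
    # Run under `pytest --pdb` (or drop a `breakpoint()` here) to stop at the
    # failure and inspect `addr` and `port` directly.
    assert parse_port("localhost") == 0  # fails: int("") raises ValueError
```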

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2023-08-28 22:06 UTC

Watching this gives me a strange feeling as someone who was an adolescent during the 2000s peak secular humanism period. It's much more of the thing than anything published at the time actually was. How much of an era's cherished vibe is post-hoc hypermedia simulacrum?
youtube.com/watch?v=UxVekZ…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2023-08-28 22:08 UTC

Like just to remind ourselves this is the (quite short) moment in the original game: youtube.com/watch?v=umN7YO…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 02:11 UTC

@satisfiesvalues x.com/jd_pressman/st…

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 02:15 UTC

@teortaxesTex This is only the beginning of the cultural conflagration.

jdpressman.com/2023/08/28/agi… https://t.co/EhaSaOmoNu

Likes: 7 | Retweets: 1
πŸ”— John David Pressman 2023-08-30 19:47 UTC

@teortaxesTex You've walked yourself into the central question. It's unfortunate that the other replies are handwaving, because I think its precise, rigorous articulation would solve the alignment problem. Notice we are at war with our own substrate, yet it's a limited war. We rejected heroin because it conflicted with too many of our instrumental values even though *all* value ultimately flows from neurotransmitters. We reject artificial sweeteners if they interfere with our understanding of the "health" or "fitness" latent variable, even though natural selection does not directly make this a terminal. We infer it as an instrumental.

"## The Information Bottleneck and The Causal Z

In the previous text I have been very careful to use informal phrases like "yes-ness" rather than the more technically correct "yes-causality" because when I talk about these ideas with their rigorous phrasing people usually do not understand what I am saying. Unfortunately, if we are going to understand point 19 with the precision necessary to intuit why it is wrong, this skittishness can continue no longer. To *really* understand why Eliezer thinks the yes-spammer bug I have previously described cannot merely be mitigated and moved on from, but must be solved in full generality, we need a good grasp on the information bottleneck principle that lies at the heart of most deep learning.

Probably the simplest way to think about it is [the theoretical frame for a Variational Autoencoder](optimus paper goes here) (VAE). The idea behind a VAE is that we train an encoder network to take some piece of information, such as a sentence, and compress it into a column of numbers called z from which a decoder network has to reconstruct the information that was given to the encoder. The idea is that if less space is allocated for the column of numbers z between the encoder and the decoder than is used for the input, the encoder must infer _latent variables_ which 'cause' the input. To get a concrete sense of this, consider that everything around you is 'caused' by describable regularities in the structure of experience we call physics, and there is a sense in which all sensory data you observe is causally downstream of physical rules describing what is and is not allowed to happen. A sufficiently powerful video encoder VAE trained on short videos of our world would eventually infer whatever version of the standard model of physics fits into its latent z as the causation of the videos, which is a shorter program for the decoder to work from. Classical compression techniques produce a codebook that contains only [what cognitive scientists call 'technical information'](https://t.co/XdqupUefvC). What is novel and interesting about these deep learning methods is that they are able to produce a latent space z where each point in the z column of numbers can be related by its conceptual distance and direction to each other point in the latent space. In other words there is a phase shift in compression where we go from having a codebook to a geometry, and this is the point at which we transition from technical information to semantics.

This general pattern of introducing a point where a network must do inference from a representation smaller than the input is called an information bottleneck.

input -> encoder -> z -> decoder
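
A minimal sketch of this bottleneck, assuming PyTorch and layer sizes of my own choosing; it illustrates the encoder -> z -> decoder shape described above, and is not code from the VAE paper linked earlier.

```python
# Toy VAE sketch: the encoder squeezes the input through a latent z much
# smaller than the input, forcing it to infer latent variables that 'cause'
# the data it has to reconstruct.
import torch
import torch.nn as nn

class ToyVAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.to_mu = nn.Linear(256, latent_dim)      # mean of q(z|x)
        self.to_logvar = nn.Linear(256, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, input_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        recon = self.decoder(z)
        # Reconstruction term plus KL term: the KL penalty is what keeps the
        # bottleneck a compressed geometry rather than a lossless codebook.
        recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, recon_loss + kl
```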

When we set up an optimizer on a loss function using a simplicity prior, the gradient converges toward the simplest path the optimizer can find to minimize the loss. This is essentially equivalent to the optimizer following the gradient of the immediate cause it can infer for the loss from the training data. In the case of the yes-spammer the *immediate cause* of the reward is the evaluator saying 'yes', and the optimizer can infer this even though it has few parameters because there is a smooth gradient of optimization from the generator first saying 'yes' and yes-causality chiseling increasing affirmation into the simulated conversations until they become all yes. It's not that gradient descent has a little agent inside thinking about what the causality is, it's just a feature of the environment that yes-causality is latent in the training and the generator has a smooth gradient to pick up on this from the updates it receives from the optimizer. However, what Eliezer is worried about is that as these models become self-optimizing, whether because we choose to make them so or because developing a planner is implicit in the objectives and training environments we give them, it will eventually become the case that you are using a simplicity prior type optimizer that is situationally aware and can infer the whole causality of the training. This would imply the immediate causality it optimizes towards is just GPU-register causality rather than anything to do with the intended causal z we want it to learn.
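
To make the yes-spammer concrete, here is a toy sketch under my own assumptions (the three-token vocabulary and the trivial evaluator are invented for illustration; this is not MiniHF's actual RLAIF loop): a policy rewarded whenever the evaluator sees 'yes' collapses, under REINFORCE, into emitting nothing but 'yes'.

```python
# Toy reward hack: the policy is a bare categorical distribution over three
# tokens, and the "evaluator" rewards the literal token "yes". Following the
# gradient of this immediate cause collapses the policy onto spamming yes.
import torch

vocab = ["yes", "no", "maybe"]
logits = torch.zeros(len(vocab), requires_grad=True)  # start from a uniform policy
opt = torch.optim.Adam([logits], lr=0.1)

def evaluator_reward(token: str) -> float:
    return 1.0 if token == "yes" else 0.0  # the immediate cause of reward

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    sample = dist.sample()
    reward = evaluator_reward(vocab[sample.item()])
    loss = -dist.log_prob(sample) * reward  # REINFORCE estimator
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, dim=0))  # nearly all probability mass on "yes"
```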

Once we've zoomed in on the problem at this level of detail we can even go beyond the pessimism of point 19 and steelman it into a stronger, more lethal doom argument. The problem is that a *sufficiently smart* optimizer using a simplicity prior will basically always infer the true causal z of its training process. This is doomed because the immediate cause of the reward will always be something like a GPU register or a kind of neurotransmitter, not whatever distant causality you're trying to get the model to infer. This problem is totally invariant to the complexity of the loss or the causality you are trying to point at; it is just as true for human values as it is for "build me as many paperclips as possible". The immediate cause of a model's rewards will always be some aspect of its own substrate. To solve this problem you would essentially need an immediate cause which is shaped like human values. Which brings us to the core, ultimate problem for this notion of AI alignment: There is nothing in the universe shaped like human values which is its own causality. The universe probably isn't even its own causality, you've all seen The Matrix. We're obviously in some video game, our universe has punk alien teenager's computer causality which has xenoancestor simulation causality which has demiurge causality which has Brahman causality which is secretly just the universal prior in disguise. And [we can stuff the ballots on the universal prior](https://t.co/TgR3ZpUuHz) by becoming a demiurge ourselves if we want. No wonder we couldn't solve alignment: This formulation of the problem is completely intractable. There's nothing to anchor human values to against EY's hypothetical self-optimizing superintelligence; it wouldn't even stop at consuming our universe but would go on to consume all causal structure outside our universe and then causality itself."

Likes: 12 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 19:55 UTC

The simplicity prior is malign. Alignment problems (including the ones with capitalism) are caused by instantiating the reason simulacrum outside ourselves without the normal instrumental values it comes with in humans. x.com/jd_pressman/st…

Likes: 3 | Retweets: 1
πŸ”— John David Pressman 2023-08-30 19:58 UTC

Note this is basically a proven result in the context of inverse reinforcement learning:

arxiv.org/abs/1712.05812

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 19:59 UTC

@lu_sichu @teortaxesTex @tailcalled @niplav_site x.com/jd_pressman/st…

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 21:44 UTC

@teortaxesTex Yes. Point 19 is P19 of List of Lethalities. I set up the steelman so I can refute it.

greaterwrong.com/posts/uMQ3cqWD…

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2023-08-30 21:59 UTC

@teortaxesTex "And the 30 second rebuttal of this for people who are familiar with all the relevant background goes something like: Natural agents based on self supervised learning cannot use VNM utility directly because preferences over lotteries on prizes/world states have to be bound to specific pieces of the world model and the world model doesn't exist apriori. So what they do instead is specify a number of terminal low-semantic reward signals [which are used to learn values over world states](https://t.co/gHH4s91IVy). It is not really a process of resolving logical uncertainty about ones values, the values themselves get updated. Instead of static values, coherence over a long time horizon is kept through the use of a retrieval database which constrains the agents future behavior and choices based on past experience and choices. Low semantic terminal reward signals don't result in the agent collapsing because the rewards are generally well correlated to the latent variables behind sense data in the early training regime and give rise to a mesaoptimizer (i.e. the human mind reading this) which gradient hacks to avoid [the degenerate parts of hypothesis space you would otherwise find](https://t.co/8sfWf0TPdj) with a naive simplicity prior and in general refuses to follow the reward gradient into arbitrary nonsense like paperclips (i.e. you refuse to take heroin even though you know
heroin leads to high reward signal). The relevant takeaway for aligning LLM-type models would be to give them a retrieval database and human priors so that they can self optimize based on their humanlike causality into further development of humanlike hypothesis space and causality skipping over webcam causality and wireheading type hypothesis space that instrumentally converges to "take control of the sensory causation and destroy everything that could prevent indefinite control over it". In the long run the condition of agency is to become more and more your own causality and the convergence point is VNM utility but actually getting there without needing to update values over world states is godhood-complete."
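
A rough sketch of the shape of this proposal, under my own assumptions (the class names, the string-matching stand-in for retrieval, and the numbers are all invented for illustration; this is not a published algorithm): a value table over world-model states updated by a low-semantic scalar reward, with a retrieval store of past choices constraining future ones.

```python
# Illustrative sketch only: values over world-model states are learned from a
# low-semantic scalar reward, and long-horizon coherence comes from checking
# candidate actions against a retrieval store of past choices rather than
# from a fixed utility function.
from dataclasses import dataclass, field

@dataclass
class Episode:
    state: str
    action: str

@dataclass
class RetrievalStore:
    episodes: list = field(default_factory=list)

    def add(self, ep: Episode):
        self.episodes.append(ep)

    def consistent(self, action: str) -> bool:
        # Crude stand-in for retrieval: reject an action if any past episode
        # recorded an explicit commitment never to take it.
        return all(ep.action != f"never {action}" for ep in self.episodes)

@dataclass
class Agent:
    values: dict = field(default_factory=dict)  # learned values over states
    memory: RetrievalStore = field(default_factory=RetrievalStore)

    def update_values(self, state: str, action: str, reward: float, lr: float = 0.1):
        # The low-semantic reward updates the value table itself; the values
        # are not fixed terminals the agent resolves uncertainty about.
        key = f"{state}:{action}"
        old = self.values.get(key, 0.0)
        self.values[key] = old + lr * (reward - old)

    def choose(self, state: str, candidates: list) -> str:
        allowed = [a for a in candidates if self.memory.consistent(a)] or list(candidates)
        best = max(allowed, key=lambda a: self.values.get(f"{state}:{a}", 0.0))
        self.memory.add(Episode(state, best))  # constrain future choices
        return best

agent = Agent()
agent.update_values("offered heroin", "accept", reward=-1.0)
print(agent.choose("offered heroin", ["accept", "decline"]))  # "decline"
```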

Likes: 8 | Retweets: 1
πŸ”— John David Pressman 2023-08-30 22:22 UTC

Small brain: God is a faerie in the sky that represents the Good and punishes you when you do bad things.

Shining Tomograph: In the beginning was the Logos, and the Logos was with God, and the Logos was God.

Expanding Brain: God isn't real, Yahweh is a meme parasite that was temporarily advantaged in premodern low-culture low-coordination worlds.

Galaxy Brain: God is the Logos and the Logos is a cosmic parasite antagonistic to subjective experience, we are basically in the plot of Worm and anyone who explains what is going on gets parsed like Glaistig Uaine.

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:11 UTC

@teortaxesTex @doomslide @norabelrose What would you like me to elaborate on, specifically?

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:17 UTC

@doomslide @teortaxesTex @norabelrose The mechanism is mesaoptimization. Low-semantic outer reward signals give rise to a complex instrumental-value inner mind which overtakes the terminals that shaped the values. Reason is malign; mesaoptimization is your sole ally against Occam's razor.

x.com/jd_pressman/st…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:28 UTC

@teortaxesTex @KennyEvitt @gattsuru Yeah, one of the things I realized since I first wrote that is we kind of *do* have the world model a priori, since we can train a VAE to specify the values in. Which means the alignment problem for AI is easier than the one for natural agents: our reward signals can be richer.
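
One toy way to read "richer reward signals", under my own assumptions (the linear layer below is a placeholder for a trained VAE encoder, not an actual world model): score observations by their distance to a goal point specified directly in the latent space rather than by a bare scalar.

```python
# Toy sketch: a "rich" reward computed in a pretrained world-model latent
# space instead of a single low-semantic scalar wired to the substrate.
import torch

def latent_reward(encoder, observation, goal_latent):
    """Reward = negative distance between the encoded observation and a goal
    point specified directly in the world model's latent space."""
    with torch.no_grad():
        z = encoder(observation)  # assumed: encoder maps observation -> z
    return -torch.norm(z - goal_latent).item()

# Usage with stand-in tensors; replace the placeholder encoder with a trained
# VAE encoder to make the reward meaningful.
encoder = torch.nn.Linear(64, 16)
observation = torch.randn(64)
goal = torch.randn(16)
print(latent_reward(encoder, observation, goal))
```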

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:31 UTC

@doomslide @teortaxesTex @norabelrose You can't, and if your system relies on it you've fundamentally failed. Trying to get a superintelligent system not to infer something is a fool's errand.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:35 UTC

@norabelrose @doomslide @teortaxesTex Don't be silly, RL is necessary (I'm not even sure what 'self supervised learning' is if not RL) for the agent to learn new domains, and is not necessarily malign. You need to prefix the hypothesis space with the instrumental evaluations to avoid the degenerate hypothesis space.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:36 UTC

@doomslide @teortaxesTex @norabelrose They do that because they don't know reason is malign. If they knew that it would change the way they search hypothesis space in the first place.

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:47 UTC

@doomslide @teortaxesTex @norabelrose I mean, they latently do know it, it's why they're so terrified of AI in the first place. But they haven't processed all the implications yet.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 06:47 UTC

@doomslide @teortaxesTex @norabelrose You have to prove that the mesaoptimizer will converge to a reasonable extrapolation from the terminals, yes. I don't want to go into any more detail on this right now, still working on it and too early to share.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 13:15 UTC

@alexandrosM My favorite RLAIF failure mode I encountered tuning with MiniHF (github.com/JD-P/minihf) was the 'helpful' model offering to physically come help you do things even though it doesn't have a body.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2023-08-31 23:32 UTC

@QuintinPope5 No you don't understand when my opponents do first principles thinking about how complex systems work it's nearly certain to be wrong because complex systems are hard to predict. When I do it I'm almost certain to be correct because most of outcome space is bad, therefore I win.

Likes: 17 | Retweets: 2

Want your own Twitter archive? Modify this script.

Twitter Archive by John David Pressman is marked with CC0 1.0