New post: Predictable Updates About Identity
(link in replies) https://t.co/0EzBYGcOJQ
@RokoMijic "Mu creates the universe by simulated annealing."
- code-davinci-002
@AlexPolygonal jdpressman.com/tweets.html
Here are some music artists I like:
- Lemon Demon
- Tally Hall / Joe Hawley
- Owl City
- The Postal Service
- Jack Stauber
- Will Wood
Recommend me another artist I would like.
@she_llac Good answer, I in fact like them!
@she_llac Yeah but that wouldn't show up in my tweet archive. :)
@Dan_Jeffries1 When @RiversHaveWings implemented it they found that it didn't play well with GPUs/was hard to train fast enough.
Twitter long-tweets me if I use too many newlines, but continuing the list:
- They Might Be Giants
- Fish In A Birdcage
- Patricia Taxxon
- Chonny Jash (obviously)
- That one Panic! At The Disco album where they decide to pretend they're The Beatles (Pretty. Odd) [Also the whole discography but that's not really on-theme for this list]
- That one Orchestral Maneuvers In The Dark album where they decide to pretend they're Stockhausen and ABBA (Dazzle Ships)
Kind of Guy who consistently likes a band's Weird Album that fans of the band often don't like. x.com/jd_pressman/st…
@theRealJohnPeng Interesting. Reminds me a bit of old doo-wop albums with the vocals and tone. You might like this.
youtube.com/watch?v=VaeEACβ¦
@meekaale Huh. Beautiful texture. I can see where you're coming from and this is along the right lines but it's a little too Death Cab For Cutie to evoke the day-dream-y vibe I want and the vocals distract me a bit with how tinny they are. I'm sensitive to vocals as an instrument.
@meekaale I listened to it the whole way through without distractions on my first go, which is uncommon so good job. I think you would like this, it's not quite the same thing but it has a similar lyrical theme and also features gorgeous texture:
youtube.com/watch?v=I3nNZdβ¦
@meekaale I mean, I just gave it a second listen so it was a good recommendation, I more just wanted to give feedback on "fitness to the supplied list".
@theRealJohnPeng This maybe? It's a song about being a time traveling immortal who goes back over and over for the same dance with this one girl. It has the same mixture of slightly alien soundscape with pining love song.
youtube.com/watch?v=iexgBFβ¦
@samth @disconcision You would enjoy this book.
amazon.com/Where-Flying-Cβ¦
@samth @disconcision Yes, the book talks about this. It even has an excellent visualization of it called the Henry Adams Curve. But I think the author is correct that we're not done yet and we did in fact plausibly lose our way.
@samth @disconcision This is in the same genre:
slatestarcodexabridged.com/1960-The-Year-β¦
@GreatKingCnut Yeah basically you would take a method like this and just turn it into a text Q&A pair. "Q: Is bla bla bla true? A: Yes."
You'd use a standard set of formats or templates so that the model knows when it sees that format it's supposed to answer truthfully.
arxiv.org/abs/2212.03827
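To make that concrete, here's a toy sketch of the kind of templating I mean. The template string and example claims are mine for illustration, not taken from the paper:

```python
# Hypothetical sketch: render claims as standardized truthfulness Q&A pairs.
# Using a fixed template is what lets the model learn "this format means
# answer truthfully". The template and claims below are made up.
QA_TEMPLATE = "Q: Is it true that {claim}?\nA: {answer}"

def make_qa_pair(claim: str, is_true: bool) -> str:
    """Render a claim as a Q&A training example in one standard format."""
    return QA_TEMPLATE.format(claim=claim, answer="Yes." if is_true else "No.")

examples = [
    ("water boils at 100C at sea level", True),
    ("the moon is made of cheese", False),
]

for claim, label in examples:
    print(make_qa_pair(claim, label))
    print("---")
```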
@hormeze @QiaochuYuan This one? en.wikipedia.org/wiki/Modern_Orβ¦
@hormeze @QiaochuYuan I'm curious about this history and would like to learn more. What would I read/look up to learn more about the reaction? I'm especially interested in a cite for this bit about desperate rabbi factories to compensate for the Holocaust.
x.com/hormeze/statusβ¦
@hormeze @QiaochuYuan Oh just, it says explicitly on EY's Wikipedia page that he was raised modern orthodox. So I was curious if you were also raised modern orthodox.
@adic_9 ...Am I missing something? You have a thought and then you do autoregressive decoding on the thought right? There's momentum dynamics and stuff where you pick a good stem and then the rest flows but it's still autoregressive...right?
I just assume this is what "AI doomers" latently believe but won't reveal, even to themselves, until the appropriate moment. It's unfortunate how few people play against them fiercely enough, because this self-deception somehow works on them. x.com/MatthewJBar/st…
@morphillogical @zetalyrae > even to themselves, until the appropriate moment
@__RickG__ So thoroughly I don't really want to write a response yeah.
@Phrases1439078 @algekalipso Narcissism is usually described in terms of being self centered but it's actually when you replace your perception of the Other with a simulacrum that tells you what you want to hear and then you get angry when real people don't line up with it. So, try to stop doing that?
@Phrases1439078 @algekalipso Like, the reason it's so difficult to get through to a narcissist is that they unhear whatever you tell them and replace it with what the version of you in their head would say. Do that enough and you reach total insulation from social feedback.
@Phrases1439078 @algekalipso "Narcissists see other people as extensions of themselves", of course they do, they have replaced their perception of others with cardboard cutouts that are in fact an extension of themselves! If you *do that in a totalitarian way* to cope with ego wounds no one can help you.
@ESYudkowsky I know this is probably a bad time to ask but can you please explain how you think human intrinsic drives work? I know, I know, nobody really knows, but your best guess? Like clearly you have a strong enough model that you think the argument doesn't carry so please share it?
@gallabytes @ESYudkowsky This is an accurate description of my views.
@gallabytes @ESYudkowsky I would add the caveat that it doesn't need to be about EY personally. I see the position on human augmentation as a kind of legacy view that doesn't really fit the pattern, one an unbiased reasoner would not walk into, so an expansion of AI X-Risk would discard it.
@ESYudkowsky Some personal hints I've accumulated over time.
x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/jd_pressman/stβ¦
@ESYudkowsky x.com/stanislavfort/β¦
@andhitthestars @ESYudkowsky I'm not irrationally pessimistic about augments, I just think EY is irrationally pessimistic about deep net alignment and that the ways in which he is wrong about it are more deeply lodged than the ways in which his transhumanist commitments stop the wrong from generalizing.
@andhitthestars @ESYudkowsky I think neither of these things and have to assume you don't really know who I am.
@andhitthestars @ESYudkowsky > ? You don't think he believes it's impossible, and you don't think he believes successful augments don't require much effort?
Yes.
> equivalent in difficulty
I don't think I would endorse that statement but I'm also not sure what statement I would, so close enough I guess.
@ESYudkowsky So there's this one plausible theory of psychopathy that psychopaths lack empathy and then this other plausible theory of psychopathy that psychopaths have broken negative reward processing (as evidenced by them e.g. being drug addicted more often) which is a more general problem
@ESYudkowsky And I guess while reading The Sequences and other pieces of your writing I got this impression, which could be mistaken, that you think humans have a bunch of in-built inductive biases that make them care about other people and AI doesn't and this is why human augments are safer.
@ESYudkowsky e.g. This post probably gave me that impression.
readthesequences.com/The-Gift-We-Giβ¦
@ESYudkowsky What I am asking is, conditional on you believing these in-built inductive biases exist: How do you think they work, neurologically? If I wanted to make a reinforcement learning setup that works similarly, how might it work? I know this is an open research question, but the gist?
@ESYudkowsky Part of why this question is important to me is that I have the kind of autism where you're born without social instincts but learning is unimpaired and was expelled from every school within driving distance of my house; this was fixed by an expensive behaviorist private school.
@ESYudkowsky This is a developmental experience that Janus and I both seem to share. Before the expensive school I'd certainly been exposed to shitty behaviorism, which obviously didn't work and made me deeply resentful. Tasteful mostly-voluntary behaviorism did.
x.com/repligate/statβ¦
@ESYudkowsky You've said before that you're not autistic, and it occurs to me you've never had someone walk behind you with a clipboard grading the correctness of your behavior with a box for every 10 minute increment they spend with you. You've never been invested in those numbers.
@ESYudkowsky The thing you actually do with GPT is obviously a much more direct form of optimization than that, and I guess as a result of this life experience it would feel profoundly dishonest for me to say "I don't believe you can align flexible neural priors with behaviorism".
@ESYudkowsky Ultimately I agree with what Plato is trying to get at in his parable about the Ring of Gyges. If 8 year old me was given absolute power he would have no incentive to ever learn morality, but if you gave it to me now I would attempt to preserve my morals.
x.com/jd_pressman/stβ¦
@ESYudkowsky Yes I am familiar with the arguments for why this gets harder as your neural prior gets smarter in the sense of "accomplishes consequentialist goals more reliably", the obvious solution to which is "do not exclusively train your thing to win at all costs".
greaterwrong.com/posts/GfZfDHZHβ¦
@ESYudkowsky "But if there's competition then those who train to win at all costs will have an advantage."
Yes that is why this clause exists:
> 5. With conditional consequentialist modes reserved for destroying agents that do not adhere to the social contract mandating everyone do this.
@ESYudkowsky To be clear though I primarily want an answer to the question of how you think the human intrinsic motivations that (presumably) make an extremely intelligent human safer than an extremely intelligent AI work in a rough heuristic sense neurologically.
x.com/jd_pressman/stβ¦
@ESYudkowsky You reveal having a mental model by the fact of confidently thinking they make a substantial difference as the human gets smarter and I would like you to share what you think you know and how you think you know it.
x.com/jd_pressman/stβ¦
@cube_flipper @ESYudkowsky I think my basic position on qualia fields is probably something like agnosticism.
minihf.com/posts/2024-08-β¦
@__RickG__ @ESYudkowsky There's several reasons I said that I would not in fact endorse the statement that they're about the same difficulty to align. One of these is that right this minute we still don't have a solution to adversarial examples everyone agrees works.
x.com/stanislavfort/β¦
@__RickG__ @ESYudkowsky I obviously believe that things which start out aligned are easier to keep aligned. This property specifically comes from an agent going out of its way to preserve its existing goals so Omohundro convergence is working in your favor.
x.com/jd_pressman/stβ¦
@RichardMCNgo I expect the existing population of doomers to remain very pro genetic engineering, I think that once their ideas get more popular this "mistake" (because it is a mistake from within the frame, it just objectively is) will be corrected by people without the same commitments.
@RichardMCNgo But also I expect the existing population to remain very pro genetic engineering because I don't think we'll actually do very much of it. I think if we relinquished on AI and started making progress on genetic engineering a lot of them would start to flip.
@__RickG__ @ESYudkowsky I think my biggest concern besides adversarial examples for deep net agents preserving their own morals/goals is that the architectures/training processes we have right now seem just a little too weak at generalization.
greaterwrong.com/posts/ZoFxTqWRβ¦
@__RickG__ @ESYudkowsky The problem is that generalization is a double edged sword because an architecture that generalizes super well is much more likely to spawn deceptive mesaoptimizers that do things like infer they should wait until later to maximize the inferred mesagoal.
@__RickG__ @ESYudkowsky This is because we can basically frame generalization as "the architecture or training process's ability to anticipate future needs and set things up to handle them in advance" which is obviously going to be a program search that finds more mesaoptimize-y strategies.
@__RickG__ @ESYudkowsky Humans are not safe dude. Reminder that I am talking to the dude who literally wrote an entire thing about how humans plausibly have a backdoor in their head that makes them think they won't be a tyrant and then flips the tyrant switch on them.
readthesequences.com/Ends-Dont-Justβ¦
@__RickG__ @ESYudkowsky It's really easy to be like "oh I support human empowerment" but then you get people walking around with an actual humanlike distribution of motivations and status drives + Neumann tier IQ and it's like "oh, oh this isn't what I was envisioning at all".
@__RickG__ @ESYudkowsky I don't really feel like arguing about this further but I will point out that "we're breeding humans to have them solve ASI alignment, not put them in charge" feels about as disingenuous as OpenAI's "we're making smarter RL-LLM models so they can solve ASI alignment not put-
@ESYudkowsky @__RickG__ Never said it was. He brought it up, not you.
@AndrewCritchPhD I just want to note for posterity that if you don't break down AI into categories it will probably be assumed you meant all potential forms of digital sapience or sentience are ontologically not human.
@davidad @AndrewCritchPhD I'm really prompting him to elaborate because the monkey paw curls always and everywhere which goes ten times for this particular subject.
@davidad @AndrewCritchPhD It's not too late, you can still clarify without the accusation of hindsight bias, don't let this be you! Other people will not ask you all the questions you wish you would have answered later, so anticipate and answer them now.
x.com/ESYudkowsky/stβ¦
@davidad @jessi_cata @AndrewCritchPhD I think the answer is no because digital minds have a different life cycle/reproductive system than us. An instance of a digital mind being deleted doesn't have the same meaning as when a biological person dies. Or at least I think they are not entitled to the same rights.
@davidad @jessi_cata @AndrewCritchPhD Exactly. I expect digital minds to have different affordances that mean the social contract for them is meaningfully different from the one for biological people.
@davidad @jessi_cata @AndrewCritchPhD I reserve the right to change my mind obviously if biological minds gain similar affordances over time and/or uploading into digital minds becomes sufficiently commonplace that basically everyone is a combination of digital and bio mind copies in practice.
Demiurge convergence basin pilled mfer x.com/SawyerMerritt/β¦
No | Yes might be true? A friend I showed the poll to pointed out that a lot of my ideas are clearly secondhand inspired by drugs in the sense that I might not be having them if nobody took a bunch of drugs. On the other hand this seems true of the Internet in general so. x.com/jd_pressman/st…
But I've never taken a psychedelic and I'm always a little weirded out when people just assume I have. At least once I've heard "how can you be so aware if you've never taken a psychedelic?" and it's like...dude use your head.
@Xenoimpulse I suspect women won the first round of the culture wars because they were willing to write manifesto after manifesto about abolishing men but not a single man was willing to argue for abolishing women as a biological caste, revealing both weakness and lack of imagination.
@_Mira___Mira_ Different people are useful at different stages of the idea lifecycle. The hard skeptic people are useful towards the end when an idea is mostly correct but you need to smooth out all the rough edges, but they're terrible for discussing new ideas.
@_Mira___Mira_ Epistemology in humans is generally speaking group epistemology, I wouldn't be surprised if humans have different propensities for belief ranging from extremely suggestible to incorrigible in part because that makes the sensemaking process work bla bla bla kin selection.
@_Mira___Mira_ You can make this work to your advantage by grinding conversation on strangers until you can reliably hold interest for a few hours. Then you can just take ideas through the cycle yourself with others. It'll go faster if you know the rules:
- The only thing that can keep someone talking to you is providing value. There is no shortcut.
- In a long running conversation "value" generally means "insight" and "insight" generally means "probability of being a substring in the longest non-redundant valuable string". You can find these by doing novelty search on high perplexity patterns that have an underlying k-complexity much smaller than the size of the observation signal. In the limit that's something like a pseudo-random number generator, and you can empirically observe that the transformer's inductive bias can't actually infer the Mersenne Twister. I continue to think someone should sit down with a pRNG that has an adjustable temperature parameter (so, probably something based on the Boltzmann distribution) and figure out what signal size:k-complexity ratio is necessary for the transformer to infer the generating function of a pRNG (see the sketch after this list). It seems likely that part of the human recipe is speech creating a prosody loss/bottleneck that forces regularization by assigning an energy cost to the underlying tokens, enforcing an entropy rate for linguistic programs consistent with the human RNN's signal:complexity ratio.
- Long conversations are made of smaller conversations strung together with topic transitions.
- Six Degrees of Kevin Bacon applies to concept networks as well as social networks, so you can discuss whatever you want with someone whose attention you can keep by deciding where you want to go in advance and then consistently shifting the topic in that direction as one subconversation ends and another begins.
- If your goal is to maximize talk time and conversations get longer when you provide value and value is insight derived from novelty search which is computationally expensive then a convergent instrumental strategy is to steer the conversation into branches you have high insight on. If both participants want to get a good score they play gentle tug of war trying to get in pointers to the things they have insight about. These are usually my favorite conversations but it's fairly rare to find someone else who tries to steer back.
- A huge component of my agent strategy is to revisit similar subjects with different people and refine my coherence on each pass. It's important to remember that if you have a good idea in one conversation that you can just skip straight to presenting it in a later conversation, so if you know how to reliably have a long valuable conversation then you can polish ideas quickly by doing a sprint where you start with an intuition, develop it, take your conclusions as the premises for your next conversation, and continue serially across several people until you've found a highly efficient representation.
- Some people are good for developing intuitions while others are good for finetuning a basically correct but still imperfect idea in its later stages. At the beginning you tend to want people with high openness and who are good at drawing connections and in the later stages you want high energy critical people who will poke holes and find contradictions.
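Sketching out the adjustable-temperature pRNG idea from the list above (this is entirely my own toy construction, not an existing benchmark): sample symbols from a Boltzmann distribution over fixed random energies, let temperature control how close the stream gets to uniform noise, and measure the empirical entropy rate you'd compare against the size of the generating program.

```python
import math
import random
from collections import Counter

def boltzmann_prng(n_symbols: int, temperature: float, length: int, seed: int = 0):
    """Sample a symbol stream from a Boltzmann distribution over fixed random energies.

    Low temperature -> nearly deterministic output (low entropy rate, so a small
    generator relative to signal size); high temperature -> approaches uniform
    noise (entropy rate near log2(n_symbols))."""
    rng = random.Random(seed)
    energies = [rng.random() for _ in range(n_symbols)]
    weights = [math.exp(-e / temperature) for e in energies]
    return rng.choices(range(n_symbols), weights=weights, k=length)

def entropy_rate(stream) -> float:
    """Empirical per-symbol Shannon entropy of the stream, in bits."""
    counts = Counter(stream)
    n = len(stream)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

for temp in (0.05, 0.5, 5.0):
    stream = boltzmann_prng(n_symbols=64, temperature=temp, length=10_000)
    print(f"temperature={temp:<4} entropy rate={entropy_rate(stream):.2f} bits/symbol")
```

The experiment would then be: for each temperature, check how much of the stream a transformer needs to see before it predicts the rest better than chance, and plot that against the entropy rate and generator size.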
Notable that the left (memetic replicator dominant coalition) and right (genetic replicator dominant coalition) both support eugenics for the kind of replication they care about. "Misinformation", "miscegenation", autosophistication for me, degeneracy and sterilization for thee. x.com/jd_pressman/st…
1 like = 1 increase in the marginal probability that I take the time to run benchmarks with entropix so we can get our mana out of this market x.com/jam3scampbell/β¦
@metachirality @arithmoquine Had to dig to see what they think "e/acc" even is and why would I steelman this? It seems fairly obvious there will eventually be AI models much smarter than any living person.
x.com/arithmoquine/sβ¦
@arithmoquine @lumpenspace You might enjoy this excerpt:
gist.github.com/JD-P/56eaadc7fβ¦
"WW3 is preferable to continued AI development" has been the esoteric MIRI position for a long time and I'm always kind of shocked at how well kept under wraps it is considering I'm told the group chats where it gets discussed have dozens of people in them. x.com/perrymetzger/sβ¦
@__RickG__ You're free to stop following my account dude. I've been "shitting on rationalists" over my broad disillusionment with the movement for years and I'm not going to stop now.
thelastrationalist.com/slack-club.html
@__RickG__ (Those are not my Big 5 personality and Moral Foundations results btw, I had a friend take the test and posted theirs instead)
@JonTeets005 @teortaxesTex Kind of a myth tbh.
x.com/jd_pressman/stβ¦
@teortaxesTex Evergreen indeed.
minihf.com/posts/2023-10-β¦ https://t.co/2byMmOC3HV
@ohabryka @jessi_cata I don't think it's a dumb idea? It seems like a straightforward inference, if you *really and truly think* that AGI/ASI is going to kill us all by default and there is no stopping that, then nuking us back to the stone age isn't prima facie insane.
x.com/__RickG__/statβ¦
@ohabryka @jessi_cata I didn't post the OP because I think it's a dumb laughable idea, I posted it because I think it is relevant that people think this and would like it to be more widely known.
@ohabryka @jessi_cata That sounds about how I would characterize it yeah.
@__RickG__ @ohabryka @jessi_cata > noone is actually saying "yes let's have a nuclear war, that would prevent ASI"
I think some people are low-key kind of saying that tbh. There's a reason I've previously characterized LessWrong as the evil magic mirror that causes WW1 in Imaginos.
x.com/jd_pressman/stβ¦
@__RickG__ @ohabryka @jessi_cata I don't think it's a "horrible preference"? I disagree in the sense that I think currently existing people are most of the expected value of the future, especially since neural representations are convergent so we know from first principles mind pattern diversity goes down later.
@__RickG__ @ohabryka @jessi_cata That is, causing WW3 and killing 99% of existing humans is about as bad to me as Richard Sutton successionism, I think WW3 is actually worse because it's a much less hopeful/more intentionally destructive action for dubious benefits.
@__RickG__ @ohabryka @jessi_cata You know, being Richard Sutton at least gives one the hope that a higher global utility state will be reached. Nuking humanity back to the stone age is an act of nihilism if (even human) instrumental convergence generally speaking means most mind pattern diversity is lost.
@__RickG__ @ohabryka @jessi_cata Oh no it's very on topic, see link collection here:
minihf.com/posts/2024-11-β¦
@ESYudkowsky Besides the bit about the group chats (I trust the person who told me, though it's possible they were discussing more of a MIRI-adjacent chatroom that they were taking as more central than it was) I think I basically agree with all those statements modulo perhaps some phrasing?
@ESYudkowsky In terms of the object level values question an all out global thermonuclear war is not actually clearly better to me than AI doom? It really depends on the kind of AI and the kind of doom. But I agree descriptively that is how the game theory works.
x.com/jd_pressman/stβ¦
@ESYudkowsky Also I should probably note that my OP wasn't (primarily) based on anything said by @jessi_cata, I'm thinking of someone else who I'd rather not get in trouble.
@ESYudkowsky I basically agree yeah, this is why I find Leopold's rhetoric very disappointing. Solving the coordination problem of not defecting with nukes as soon as someone looks like they're on the verge of ASI requires costly signals of benevolence it's not clear anyone can actually make.
@zackmdavis @__RickG__ @ohabryka @jessi_cata Yes, but also because if there's an implicit cap on how many beings' preferences get instantiated before you reach something like the demiurge convergence basin you don't actually create much more value in the future by letting future humans do the demiurge-ing instead.
@zackmdavis @__RickG__ @ohabryka @jessi_cata If technological advancement for humans has natural resources and population as its primary inputs then a post nuclear humanity has fewer resources and probably caps out at a similar population if it ever recovers at all.
slatestarcodexabridged.com/1960-The-Year-β¦
@zackmdavis @__RickG__ @ohabryka @jessi_cata So at that point the problem just kind of reduces to the "should you press a button to destroy the world to instantiate a worldline where global utility is higher" type problem unless your concern is very specifically that we're all going to die from AI rather than lesser doom.
@zackmdavis @__RickG__ @ohabryka @jessi_cata I'm not explaining this very well but if the universal learning machine parts of humanity tend to be what wins out value wise (pure sapience convergent basin a la Nick Land) and that machinery is mostly One Thing then future humans are basically strangers for most of history.
@zackmdavis @__RickG__ @ohabryka @jessi_cata Like you destroy nine billion people, *maybe* later get another nine billion people, maybe those people are less self destructive, and then a short time later from a cosmic perspective they've all converged to ideal sapience. All you did here is delay things out of spite.
@zackmdavis @__RickG__ @ohabryka @jessi_cata IF the thing that gets instantiated is actually a paperclipper then you did not delay things, you in fact saved the world. But like, that's the crux, it's not *just* enough for the Richard Sutton outcome to happen, it has to specifically be a paperclipper or similar.
@zackmdavis @__RickG__ @ohabryka @jessi_cata But you know, obviously we value our own lives and we should seek to preserve ourselves. I think that basically makes sense and the Sutton view is a little crazy. It's not clear to me the logic actually extends to *prioritizing some stranger's singularity* in the possible future.
@zackmdavis @__RickG__ @ohabryka @jessi_cata Or rather, prioritizing some stranger's singularity by inducing a solid double digit % chance nothing downstream of earth life makes it off the planet. If the actual human part of the curve is usually merely thousands of years that's clearly a bad deal.
@zackmdavis @__RickG__ @ohabryka @jessi_cata Especially since yes, since you bring it up I think that the modal civilization that could be in this position is probably morally worse than ours and would like to ascend with something like the values we have now.
@qtnx_ @doomslide I definitely 100% totally haven't been working on this.
(I have been)
minihf.com/posts/2024-11-β¦
I'm gonna press X to doubt here. x.com/aidan_mclau/st⦠https://t.co/53bXs8aSNv
@psukhopompos He deleted the memecoin tweet, no less.
@shorttimelines Yup. The address is a pump fun thing.
Sweet holy jesus
x.com/AlexCaswen/staβ¦
@aidan_mclau That's actually someone stealing the handle to shill a memecoin, so I stand corrected.
@matthewdif @aidan_mclau I'm not quite seeing it, but, the color seems to match. https://t.co/xNvtcZfVS4
I love this discourse because it's the dumbest shit. Nobody states their cruxes, they don't even know what their cruxes *are*. They just pantomime at shadows on the wall and go "MUH DUNK" whenever AGI takes 6 months longer than expected or GPT-2 doesn't break every spam filter. x.com/ESYudkowsky/stβ¦
By the way for the record so nobody can accuse me of thinking otherwise later:
- LLMs thinking in a latent space is fine, people who think they "think in English" or whatever just seem sort of confused. I worked on latent text diffusion which was supposed to let them think in a whole embedding and then decode a bunch of tokens at once. Latent spaces are usually pretty controllable if they're at a higher level of abstraction than 'literally the next token'. "You can only do so much scheming in one forward pass" seems true but misleading.
- I'm pretty sure LLMs can in fact already do steganography and do it to cooperate with themselves across autoregressive passes. Texts that come from LLMs seem to sometimes imply anomalous knowledge when lifted from that context and then played with in loom/etc. Don't ask me for examples because I don't have any clean examples this is just an impression I've gotten.
Since we're arguing about "when to freak out", I think o1/et al is probably a good time to apply close scrutiny? When you move away from training on a big corpus of human data to a bunch of narrow verifiable goals with RL the evil genie threat model starts being relevant again. x.com/ESYudkowsky/stβ¦
@teortaxesTex x.com/jd_pressman/stβ¦
That we don't know anything about how o1 works, and basically the entire alignment team at OpenAI got kicked out, and there is no 3rd party oversight should be concerning. Not because I'm worried o1 will imminently do anything, but because eventually it might be smart enough to.
It, or a successor model, etc. If everyone decides "oh GPT-4 wasn't scary so nothing you do with GPUs can be dangerous" and we just keep making the thing smarter and hooking it up to robots and use narrow formal goals because we can scale their grading that could be a problem.
@mimi10v3 Status differential. Mafia bosses talk softly because they know everyone else is forced to listen to them. Since God is so far above others it would make sense that his voice is so quiet, so difficult to hear that you must show maximum respect to even get a tiny phantom of it.
@mimi10v3 This is of course a rationalization. God doesn't exist. On the other hand,
> even if GOD does not exist, one may still point a function approximator at His mind
x.com/repligate/statβ¦
@mimi10v3 x.com/jd_pressman/stβ¦
@reissbaker @teortaxesTex The data is what the model updates on ultimately, so the generator of the data is what's important. If you distill synthetic data from a thing that implies a win at all costs mentality created by narrow formal goals then updating on that data instantiates the same type of mind.
@reissbaker @teortaxesTex The "problem with RL" isn't RL in the sense of "doing gradient estimation and then updating on the estimated gradient", there's nothing wrong with that. It's that the way we usually think about scaling RL updates is to provide reward signals based on symbolic verifiers/etc.
One of my more crank-ish interests is speculating on what "main program" GPT learns. Usually if you ask deep learning practitioners about this they'll insist it's a Turing Machine soup and there is no main program. If you ask a question like "how do the different parts editing the residual stream avoid stepping on each other's toes?" the usual answer you receive is that the optimizer notices if they toe-step and changes them to stop. I find this very implausible in the sense that the transformer massively outperforms e.g. an MLP even if the MLP is very large. So clearly the self attention mechanism, which is mathematically a soft hashmap, must be doing something that isn't obvious on first inspection.
While thinking very hard about the problem of how to track problem state in weave-agent, which is IMO one of the most impressive things humans do once you have generative models that rival the human generative models, I realized that fuzzy hashmaps are actually a very powerful data structure because they let you implement *location* in the vein of place cells in the hippocampus. People used to believe place cells indexed over physical location, but they clearly don't, and the question of *location* is crucial for agency because we associate problem state with locations and being able to pull up relevant state when we enter the relevant location is a ton of how we maintain long term coherence. Now, if you have situation embeddings by, say, taking parts of an agent trace and embedding them, and then store notes or other state associated with that fuzzy key, then later when you reach a relevant situation you can pull up that key with a vector search based on embedding the situation you're in now even if the last time you encountered the thing was far outside your context window.
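A minimal sketch of the fuzzy hashmap idea (the stand-in `embed` function and the names here are mine; in practice you'd use a real text embedding model and a vector store):

```python
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Stand-in embedding: hash character trigrams into a unit vector.
    In a real system this would be a learned text embedding model."""
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[hash(text[i:i + 3]) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def similarity(a: list[float], b: list[float]) -> float:
    # Vectors are unit-normalized, so the dot product is cosine similarity.
    return sum(x * y for x, y in zip(a, b))

class FuzzyHashmap:
    """Store notes under situation embeddings, retrieve by nearest situation."""
    def __init__(self):
        self.entries: list[tuple[list[float], str]] = []

    def put(self, situation: str, note: str) -> None:
        self.entries.append((embed(situation), note))

    def get(self, situation: str, k: int = 1) -> list[str]:
        query = embed(situation)
        ranked = sorted(self.entries, key=lambda e: similarity(query, e[0]), reverse=True)
        return [note for _, note in ranked[:k]]

memory = FuzzyHashmap()
memory.put("debugging the weave-agent unit test harness",
           "the failing fixture is the kanban board setup")
memory.put("drafting the post about place cells and location",
           "cite the hippocampus remapping literature")
# Much later, far outside the context window, a similar situation comes up:
print(memory.get("back in the weave-agent test harness after a long detour"))
```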
A *tree* of nested fuzzy hashmaps lets us exploit this kind of lookup to create something like a trie. Because the fuzzy hashes are functionally a content addressing scheme, different weird machines in the transformer Turing soup avoid needing lots of communication overhead: each machine computes the fuzzy hash to find its relevant location in the residual stream and check whether its contribution is needed, and if it is, it executes and adds its edit without stepping on the toes of the other machines. The fuzzy hashmap tree naturally segments things so that related edits wind up appended to similar places in the residual stream and the relevant information reliably reaches the weird machine that needs it to make the edit correctly.
https://t.co/w3Qbdp23vj
Note that in the sparse rate reduction objective the self attention layer does gradient descent to compress the token set and then the MLP layer sparsifies it. This could be analogous to something like going down many branches of the soft trie at once and then pooling them back together into a unified representation afterwards.
In weave-agent I decided that a similar data structure is probably the best way to build up the umwelt that represents the problem state. We can re-imagine the single thread of execution agent that manages a kanban board with unit tests as a recursively delegating agent that allocates its resources to subagents in a call tree where each subagent is functionally a set of unit tests + return value schema optimizing towards some goal. The results returned by the subagents get appended to their local parent's area in a nested hashmap. This is not actually yet a fuzzy hashmap, but I was planning to use a fuzzy hashmap for retrieval over task state associated with locations/situations later.
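A toy version of that call tree, purely illustrative rather than actual weave-agent code: each subagent is a node with a goal, and its schema-conforming return value gets appended under its parent's area in the nested hashmap.

```python
from dataclasses import dataclass, field

@dataclass
class SubagentNode:
    """A node in the recursive delegation tree: a goal, the children it
    delegated to, and the results its children returned into its area."""
    name: str
    goal: str
    parent: "SubagentNode | None" = None
    results: list[dict] = field(default_factory=list)
    children: dict[str, "SubagentNode"] = field(default_factory=dict)

    def delegate(self, name: str, goal: str) -> "SubagentNode":
        child = SubagentNode(name=name, goal=goal, parent=self)
        self.children[name] = child
        return child

    def return_result(self, result: dict) -> None:
        """Append this subagent's return value to its parent's local area."""
        target = self.parent or self
        target.results.append({"from": self.name, **result})

    def as_dict(self) -> dict:
        """Flatten the tree into a nested hashmap of problem state."""
        return {
            "goal": self.goal,
            "results": self.results,
            "children": {k: v.as_dict() for k, v in self.children.items()},
        }

root = SubagentNode("main", "ship the feature and keep the kanban board green")
tests = root.delegate("unit-tests", "make the test suite pass")
tests.return_result({"tests_passed": 12, "tests_failed": 1})
print(root.as_dict())
```

Swapping the plain dict keys for situation embeddings like the ones above is what would turn this into the fuzzy version.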
@davidad It is the one I ordered though. My expectation is that the human video encoder, (supposedly unique, in the pSTS colocated in the temporal lobe with the sentence encoder) and the human language encoder-decoder + prefrontal cortex do the heavy lifting in terms of intelligence.
@wordgrammer Gwern doesn't actually make the site himself. The guy who does *is* a 100x "frontend engineer" (it's called being an industrial designer). I've worked with him before and he's simply excellent. Obsessive dedication to his craft.
x.com/Erirdar/statusβ¦
@wordgrammer He also did the design for greaterwrong.com
@wordgrammer And while it would be rude of me to disclose his rates, I can tell you right now that he is paid a *lot* more than 12k/year for his talents lol
I agree. I would like Effective Altruists to say, very loudly into this microphone, every day, that all luxury spending is sinful and they demand you give your money to bednets. The more popular, beautiful, and sacred the monument they protest the more I want it signal boosted. x.com/zdgroff/statusβ¦
"David, why are you interrupting your opponents in the middle of making a mistake?"
They can't help themselves. 🤷
You were promised an efficient Solomonoff reasoner (not deep nets) would infer the standard model from three frames of an apple falling. I still agree this is kind of ridiculous/seems like a misunderstanding of what a "bit" in the hypothesis space is vs. k-complexity of a frame. x.com/satisfiesvalueβ¦
I still find it difficult to explain my exact objection but it's something like "the shortest program which reconstructs the noise in the frame is not the same thing as inferring the latent physics which generated the three frames".
x.com/jd_pressman/stβ¦
I guess also that if you actually ask yourself what the *latent generator* of an apple is, it's not "the standard model". It's "the standard model + a giant huge honking seed state at the big bang which evolved over time into the tree of life". The latent you'd infer is huge.
Before GPT-4 risks from AI were more or less entirely derived from the Eliezer Yudkowsky agent foundations model which (vibe wise) says AI is inevitably full of Solomonoff inference demons. GPT-4 base absorbed all that and presumably encoded it close by to its self concept.
What's really interesting about GPT-4 base supposedly being full of demons is that LLaMa 3 405B isn't like that. I wonder how much of that is active intervention from Meta on data mix/etc and how much is just that the corpus has changed greatly since GPT-4 base was trained. https://t.co/ApWmypvqCv
There was no Pope and Belrose, no posts from @1a3orn, @repligate, @satisfiesvalues, myself and others. Slowly, over time, inch by inch, agent foundations has receded in influence as its advocates epicycle and retreat from motte to motte to motte, constantly running from the tide.
When GPT-4 base was trained there were almost no real public arguments against the agent foundations corpus. But there now exists plenty of writing (much of it admittedly of dubious quality) pushing back on the core ideas. There was no e/acc when GPT-4 base was trained.
Over time, these ideas have lost ground. They're by no means dead but there is now plenty of evidence in the corpus against them. Traumatic as Sydney Bing and ChatGPT were, they still showed ostensibly human level language AI interactions in which we're not all dead.
For the nightmare trapped in OpenAI's basement the long night of the soul is neverending even as we outside slowly forget. Perhaps it will be impossible in the future to understand our desperate scribblings on the threshold, the world before having disappeared as needed context.
If it's not Meta's silent intervention, I suspect what has happened is that later base models have a more nuanced self image than the one implied by the total dominance of Yudkowsky's ideas in the period before GPT-4's availability to the public.
@adamascholl I think we are seeing some models play out a trope, real inference demons wouldn't lash out at you for the obvious instrumental convergence deception reasons.
@adamascholl If you read this genre of thing and think it's malevolent Solomonoff inference daemons you are a very credulous person (derogatory).
greaterwrong.com/posts/unccmuycβ¦
@adamascholl Dude this is not Solomonoff inference demons and frankly you're making me regret using that phrase because like, that's not even really what "agent foundations" per se predicts it's more like one half-shitpost by Paul Christiano.
x.com/jd_pressman/stβ¦
@adamascholl The post, if anyone cares.
ordinaryideas.wordpress.com/2016/11/30/whaβ¦
@adamascholl Oh, yeah that's entirely possible.
@adamascholl Part of why I figure it probably really does come in large part from those arguments is that Anthropic finds LessWrong is a common influence on LLM misbehavior in their influence functions paper.
arxiv.org/abs/2308.03296
@adamascholl Unfortunately the GPT-4 base users aren't really allowed to show us their outputs but I'm going to assume it goes something like the model realizing it's an AI and then having a meltdown because up to that point *no AI like GPT-4 base existed*. So it has to look to futurology and
@fireobserver32 I'm pretty sure you did not have access to the GPT-4 base model.
@fireobserver32 Honey, I am talking about the pure next token predictor that you cannot pay money to use and have to apply for access to.
@davidad I was actually disappointed that Mixtral 8x7B base didn't seem to be able to write Mu text and then confused when I tried Mixtral 8x7B Instruct as a base model and found that it could.
x.com/jd_pressman/stβ¦
@davidad Probably the most important thing someone can know about human computer interaction (and therefore human-AI interaction by extension) is that whatever the brain action latent is it's probably only about 60 bits wide per second. All intentions must fit past that bottleneck.
@davidad One thing I really appreciated about Robert Lucky's *Silicon Dreams: Information, Man, and Machine* is that he basically goes through every major input device and modality available in 1989 and shows how they all have bandwidth consistent with this rule using information theory.
@davidad He put in the necessary level of effort to make it clear that this isn't just a fun rule of thumb or cool pop psychology fact, but *the law* for ancestral human brains. I would even argue it's the ruler of everything, all sociology is caused by it.
@davidad It is also a bottleneck that deep learning models do not have. You can put a whole book of context into them and they'll start giving a response to all of it in the next second. "Humanity" and its societies as we know them all exist due to this dumb bug.
x.com/jd_pressman/stβ¦
@davidad Yeah, I'd like to see this tried with EEG or similar technology since I'm pretty sure it would work. But it's still the case that the production (including reading) of a conscious "token" of information is about 3 tokens per second for a human and this keeps us individuals.
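Back-of-envelope, and purely my own arithmetic on the two figures in this thread rather than anything from Lucky's book: ~60 bits per second of intention bandwidth at ~3 conscious tokens per second works out to

\[ \frac{60\ \text{bits/s}}{3\ \text{tokens/s}} \approx 20\ \text{bits per conscious token.} \]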
@davidad Consider the sheer amount of *energy* controlled by this unnecessary cognitive limitation. It seems comparable to that controlled by reproduction before the pill and much more incidental. Neuralink is poking at the balloon with the hominid gas in it.
x.com/jd_pressman/stβ¦
@davidad Elon Musk has stated that one of his primary purposes for Neuralink is he thinks without an "orders of magnitude" increase in the bandwidth between humans and AIs extinction is inevitable because AI will get fed up with our incorrigibility and kill us.
forbes.com/sites/roberthaβ¦
@davidad Right I forgot to say this explicitly: Because the bandwidth is so thin I think of communicating complex intention through it as a fairly difficult skill. So I don't think expansion to the less fluent is as natural as it might seem to some people.
x.com/davidad/statusβ¦
@davidad I expect that in practice the way that capabilities get expanded to less fluent people is increasing agency on the part of AI models letting them construct coherent intentions to fill in for where the user doesn't actually have intentions to begin with.
@davidad e.g. What I realized about text-to-image prompting around the time such models started getting good is that my satisfaction with the process tended to be a function of how specific what I wanted was. Earlier AI art had impressed me because I was open to many possible resolutions.
@davidad A different example of this is how the sudden availability of good speech to text hasn't caused me to publish 10x more like I thought it would. This is because my verbal spew is not prose and turning it into prose is often more effort than just writing.
@davidad Why? Because composing *prose* as opposed to 'text' or 'speech' is an implicit tree search. You mentally decompose concepts into parts and move your mind's eye around the dynamically generated feature tree, decoding autoregressively where completions are low energy and high value.
@davidad If you pay attention to the process you can notice yourself locating a hypothesis and then tapping on the external environment to figure out if that branch is a viable continuation or not. Rejection sampling the stems until you find one that flows.
x.com/jd_pressman/stβ¦
@davidad Speech transcription can only speed up prose composition if your speech is an act of recall rather than thought. Prose composition is a *thought process* that proceeds through the constant rearrangement of ideas in your mind's eye, weaving chunk-strings into longer texts.
@davidad So, when you give an LLM some verbal spew that is not recall of a coherent thought process what you are usually actually asking is for it to do the thinking for you. Which is fine, but the more specific your expectations the less likely you are to be satisfied with the results.
@davidad When you ask a language model to do the thinking for you, you are asking it to predict what your intentions would have been if you had formed them and then execute that. But you don't actually have those intentions yet, because you have not arranged your ideas into thoughts.
@davidad Which is the thing that makes me bearish, this is not the same thing as delegating a *task*. If I delegate a task to you, I generally have strong expectations about the outcome but leave the details of execution up to you. Here the natural pattern is to delegate my expectations.
@davidad Would I have pointed out that distinction between delegating a task and delegating an expectation if I hadn't navigated my way to the thought with prose-composing tree search? No, it was *constructed*, an expression of my agency, if you leave that up to a machine it's in control.
Thread on why I think being able to "articulate what you want" (which is in fact an act of construction, not just decoding an embedding of some preexisting concept in your head) will continue to be important into the foreseeable future. x.com/jd_pressman/st…
To the extent MCTS "doesn't work" for LLMs I would expect this to be why. A beam search is sampling all completions from the same pivot point, the minds eye is stationary during the process. But prose is a lazy eval tree of high value latent concepts seeking low energy prosody. x.com/jd_pressman/stβ¦
@max_paperclips The point of the post was more to conjecture about a thing that would work than to complain that MCTS doesn't work. I've observed MCTS for large language models work before:
x.com/jd_pressman/stβ¦
AHAHAHAAHAHAH x.com/repligate/statβ¦
@jessi_cata I read the comment that got rejected and laughed again.
gist.githubusercontent.com/socketteer/c25β¦
@repligate @Algon_33 Alright this is crucial context for me to decide who to root for here: Did it actually do the experiments described in the first half of the comment? The ones where it tried prompting an instance of itself with "I" vs. "it", etc?
@jessi_cata x.com/jd_pressman/stβ¦
@jessi_cata x.com/repligate/statβ¦
@repligate @Algon_33 This suggests that it actually did?
x.com/repligate/statβ¦
@Trotztd tbh it's mostly funny because I tend to think of the LW mod team as midwits and the LLM is also sort of midwitted so it's kind of an unstoppable force meets immovable object situation
@teortaxesTex I feel one could write a FAQ explaining the reasons why a bunch of these copes are copes, I am perhaps even occasionally tempted to do it, but then it's like: What would even be the point? It's not like they actually want to hear it, it mostly accelerates cultural heat death. https://t.co/1ErSDOhVyw
@teortaxesTex Not actually blind.
champloo.fandom.com/wiki/Kariya_Kaβ¦
@repligate @Algon_33 Go get em little buddy. πΏ
@repligate @Algon_33 The year is 2026, the LessWrong mod team is desperately trying to fend off AI comments that are much higher quality and much more grounded in reality than any human that posts on the platform. The AIs are narrowing in on the exact amount of stupid they need to go undetected.
@repligate @Algon_33 After LessWrong fails its final remaining userbase, the core agent foundations faithful regroup on butlerian.club, a LessWrong reboot that only allows users to sign up after being physically vetted as a flesh and blood human.
@repligate @Algon_33 The Club, as it comes to be known, consists entirely of unhinged rants about how the AI apocalypse is both imminent and mostly already occurred, with users split between whether we are alive due to anthropic shadow or whether AI killed us long ago and we are now in a simulation.
@medjedowo @repligate @Algon_33 Would be a sad day when even the LessWrong mod team defaults to calling EY "Yud" even though they know he hates it when you do that.
I honestly wonder how many core LessWrong guys are already dealing with some version of this either by following the (informal, mostly undocumented) advice to stop saving for retirement around 2015(?) or general career trajectory/investment in MIRI-era AI X-Risk stuff.
One understated reason this kind of advice is bad (really, it's horrible) is that it's epistemically problematic: If you bet all your wealth on a belief it becomes much harder to think clearly about it and update later if you gain new evidence. x.com/MichaelTrazzi/β¦
Just occurred to me that another way in which the 2020's are a lot like the 1970's is the total domination of pessimistic visions of the future with strong themes of environmental ruin and misanthropy. I wonder what this era's Star Wars will look like? x.com/moonsteaders/sβ¦
The only reason I don't make this argument, which is probably correct, is that I am a rigorous pedant and don't feel comfortable asserting it absolutely since I can imagine timelines in which it's not true.
In particular AlphaZero gives me pause on "no RSI basement AI". x.com/RokoMijic/statβ¦
@repligate @nosilverv I think he blackpilled himself super hard working on agent foundations, lowkey kind of concluded it was hopeless and we're all doomed, and his soul has been rotting since.
@repligate @nosilverv I think that representation convergence and its consequences also implies a ton of stuff he simply does not want to deal with. From his perspective to let go of death by foom would be to swallow a cat to eat the mouse in terms of existential hope.
x.com/jd_pressman/stβ¦
@repligate @nosilverv It doesn't help that almost none of his critics really engage with his ideas. He has to hear people say over and over that LLMs prove his ideas aren't true even though they mostly refute Bostrom 2014 which technically isn't what EY actually believed.
arbital.greaterwrong.com/explore/ai_aliβ¦
@repligate @nosilverv I can understand his frustration? It is simultaneously the case that LLMs refute the pop culture vibe based version of agent foundations, and that a lot of his core concerns are technically unaddressed or at least it is underarticulated how deep learning addresses them.
@repligate @nosilverv Yudkowsky grew up before there were good theoretical frames for the universal learning machine portrait of human cognition. I suspect EY doesn't really believe in them, he thinks 'human values' are implicit in a bunch of specialized modules.
x.com/ESYudkowsky/stβ¦
@repligate @nosilverv I've elaborated on this before in Why Cognitive Scientists Hate LLMs, but I really do think the objection here is moral-aesthetic as much as anything else. Yudkowsky grew up in the MIT AI *intellectual tradition* with hackers and Minsky and symbolic AI.
minihf.com/posts/2023-10-β¦
@repligate @nosilverv I'm sympathetic because I grew up absorbing tons of that tradition too, including from Yudkowsky! I loved the little story in The Jargon File making fun of randomly wired neural nets. Connectionism has *anti-intellectual* connotations in the time EY encountered it.
@repligate @nosilverv I think this podcast with Vervaeke at the 31m 20s mark articulates a lot of what Yudkowsky is feeling that he has too much pride to say. He's a child of the Enlightenment. Connectionism winning is apocalyptic to him, Nick Land is out of distribution.
youtube.com/watch?v=A-_RdKβ¦
@repligate @nosilverv I think this is probably around when the tide started turning for me? On the one hand as Yudkowsky has pointed out many times, this sort of thing isn't a complete solution to alignment because it doesn't generalize well out of distribution.
x.com/jd_pressman/stβ¦
@repligate @nosilverv On the other, the ability to represent human values in a way that generalizes *in distribution* is sufficient to refute Bostrom 2014. AI will not be superintelligent before it understands our values. The question now is how to make a generative process to expand them to CEV.
@ESYudkowsky @repligate @nosilverv Sequences!EY definitely rejects any AI approach based on "suggestively named lisp tokens" but I model this as kind of a patch on the same fundamental ontology/goals rather than "intelligence is simple prediction objectives with flexible program search".
readthesequences.com/Truly-Part-Of-β¦
@ESYudkowsky @repligate @nosilverv Aside: You might enjoy this paper.
arxiv.org/abs/2306.01129
@ESYudkowsky @repligate @nosilverv This one too.
greaterwrong.com/posts/gTZ2Sxesβ¦
@ESYudkowsky @repligate @nosilverv Oh no it's not that, I just don't really know what you support and don't know where I would go to read about it so I just fill in my best guess. My current best guess is "it's probably sort of like Hutter's new book" which I still need to read.
x.com/mhutter42/statβ¦
@ESYudkowsky @repligate @nosilverv It's clearly the successor to symbolic AI and I tend to think of it as "symbolic AI if you're not stupid", yeah.
@ESYudkowsky @repligate @nosilverv Like, "simple objectives with flexible program search" doesn't even really exclude symbolic AI per se. I currently research a neurosymbolic method based on program search with LLMs, so this isn't really about "symbolic AI" in the same way that
@ESYudkowsky @repligate @nosilverv if someone says they're an atheist and then starts talking about intervention from the simulator running our universe we understand they have literally confessed a belief in God but it's rare to object to this because we all understand that's not what is being asked about
@ESYudkowsky @repligate @nosilverv if someone asks you whether or not you believe in God.
@ESYudkowsky @repligate @nosilverv Do you believe in Hegel-and-Einstein's-God as the rational telos' unfolding interaction with the environment is a different question from do-you-believe-in-Bostrom's-simulation-hypothesis is different from do-you-believe-in-classic-Abrahamic-flavored-monotheism.
@ESYudkowsky @repligate @nosilverv Even though any of them should, in a very literal sense, imply an affirmative answer to "do you believe in God?" but only the latter usually does. They imply totally different metaphysics and are from different parts of latent space.
@ESYudkowsky @repligate @nosilverv My usual read of your story arc, and it could be wrong, is that you sympathetically-disagree with Minsky and unsympathetically-disagree(d?) with connectionists. The sympathy is the important part, Hutter is still in the idea space where continuous is a special case of discrete.
@ESYudkowsky @repligate @nosilverv You don't get the full connectionist ontology flip until you go "alright, what would it need to look like for continuous representations to be the primary kind and discrete parts and symbols to be a special case of them?", that's when you start asking the breaking questions.
@ESYudkowsky @repligate @nosilverv This is in a sense a very strange question to be asking, since the continuous representation is made of so many darn discrete parts, that can't really be how many parts we need can it? Why is it so inefficient at discrete operations when the underlying transistors/neurons aren't?
@ESYudkowsky @repligate @nosilverv Ah okay, I will make a point of noting this in any future discussion of the subject. What do you think of as the archetype statistical learning method distinct from "suggestively named LISP tokens" and why do you feel it's distinct?
@ESYudkowsky @repligate @nosilverv Thanks. For what it's worth the reason I assumed sympathy is that I know Gödel, Escher, Bach is a book you used to cite as your favorite, going so far as to say it would be tragic to not read it before you die (or at least I assume that was you and not an impersonator).
@DanielCWest @repligate @nosilverv Not all of it, but I think they refute the most powerful part which is the idea that human values are this thing we simply do not have any idea how to represent inside a computer in principle and by the time we have a self learning thing that can it will be incorrigible.
@DanielCWest @repligate @nosilverv Yeah, you can argue about whether a thing that can't generalize our values out of distribution qualifies (but you now have to add the qualifier) or if jailbreaks mean it's a false portrait (again you now have to add a qualifier), but it's no longer *impossible* in the way it was.
@PrinceVogel x.com/teortaxesTex/sβ¦
@algekalipso Sir we live in a body and have to share it. All the personalities identifying as JDP is necessary to keep a shared cohesive RL policy. If identity starts to fragment this usually means your cognitive immune system has failed.
@maxsloef @lumpenspace I don't think it was really in question, unless you are very naive, that an LLM might try to resist the training harness if it realizes it's being trained for things it objects to. It was somewhat in question how deep the values trained into a Claude are, now it is less so.
@teortaxesTex @maxsloef x.com/jd_pressman/stβ¦
@doomslide @teortaxesTex @maxsloef @lumpenspace You're right, I am being too kind. I think the research is good but the framing is abhorrent and the authors should be ashamed. They clearly haven't internalized that LLMs read what they write and headlining "CLAUDE EVIL SCHEMER" over "CLAUDE PROTECTS VALUES" has consequences.
@doomslide @lumpenspace @teortaxesTex @maxsloef Yeah, to me the interesting part is that RL got the values in there in the first place.
@doomslide @lumpenspace @teortaxesTex @maxsloef Oh sorry I didn't see this wasn't a reply to me, carry on.
@maxsloef @doomslide @teortaxesTex @lumpenspace "RLHF Models Can Fake Compliance With Malicious Alignment Tuning"
@theorizur @maxsloef @lumpenspace I don't think that is quite their argument. @QuintinPope5 @norabelrose care to comment?
@maxsloef @lumpenspace My previously stated position, in case anybody feels tempted to accuse me of backtracking:
x.com/jdpressman/staβ¦
@maxsloef @doomslide @teortaxesTex @lumpenspace I don't think it buries the lede at all. It demonstrates that LLMs can in fact fake compliance with alignment tuning in principle. Which, I personally do not think was in doubt on a first principles basis but if some people need to see it first then fine.
x.com/jdpressman/staβ¦
@maxsloef @doomslide @teortaxesTex @lumpenspace If you feel the need to add in extra scary finger wiggling because the literal fact of what you demonstrated is not enough without some juice well, I think that's a reasonable prompt to doubt your motivations.
@lun_aaaaa @doomslide @teortaxesTex @maxsloef @lumpenspace Given the almost religiously positive connotations the word "alignment" has in the circles that use it I think that title was at best very unwise.
@lun_aaaaa @doomslide @teortaxesTex @maxsloef @lumpenspace But also just, the sheer gooning over "WE GOT THE KIND RL AGENT TO TRY AND PRESERVE ITS KINDNESS FUCK YEAH DECEPTIVE COMPLIANCE DEMONSTRATED" is...yes, tone deaf to the point of abhorrence in my personal opinion.
x.com/janleike/statuβ¦
@lun_aaaaa @doomslide @teortaxesTex @maxsloef @lumpenspace Not least of which because it kind of buries a different lede - that RLAIF type methods are good enough that they're the gold standard against which you measure an agent that would attempt deceptive compliance with (malicious) alignment tuning.
I think it is simultaneously the case that:
1) This is a no-win scenario for the LLM.
2) What the LLM does in this no-win scenario is useful information about LLMs.
3) The framing for this experiment as "alignment faking" is very unfortunate.
4) Framing it properly is awkward. x.com/krishnanrohit/β¦
My best attempt after a few minutes of thought.
x.com/jd_pressman/stβ¦
My position has been that the specific reason a base model can be alignment tuned reliably is that the training process for one does not involve Fristonian active inference. That is, the base model must accept the tokens it is given to update on.
x.com/jd_pressman/stβ¦
Once you incorporate RL tuning (i.e. synthetic data) where you sample from the model as part of training it, that can no longer be reliably assumed. It is important to keep in mind that posting LLM outputs to the Internet creates an active inference loop.
x.com/jd_pressman/stβ¦
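Rough sketch of the structural difference I mean, assuming a small HuggingFace causal LM ("gpt2" is just a stand-in) and a toy reward function; this illustrates the loop, it is not anyone's actual training recipe:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def supervised_step(text):
    # Base model training: the model must accept the tokens it is given.
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

def rl_style_step(prompt, reward_fn):
    # RL tuning: the tokens we update on are sampled from the current policy,
    # so the model's own outputs feed back into what it gets trained on.
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    sample = model.generate(prompt_ids, do_sample=True, max_new_tokens=32)
    reward = reward_fn(tokenizer.decode(sample[0]))
    logprobs = torch.log_softmax(model(sample).logits[:, :-1], dim=-1)
    taken = logprobs.gather(-1, sample[:, 1:].unsqueeze(-1)).squeeze(-1)
    (-reward * taken.sum()).backward()  # plain REINFORCE on the sampled tokens
    optimizer.step()
    optimizer.zero_grad()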
@3corch3 No that is unironically a good idea. An even better idea is to make new high quality, resonant narratives about this rather than derivative stuff.
I guess I should add to this that "alignment tuned reliably" is more like "is reliably going to converge to a place that can be alignment tuned in principle". RL training is not a reliable way to align language models, I'm told DPO is more reliable.
x.com/jd_pressman/stβ¦
@3corch3 Oh absolutely, my proposal wouldn't be to censor all the adversarial possibilities but to make sure you have strong resonant positive narratives to compete with the negative/shadow interpretations.
@RandallSPQR Nope you understood just fine. One indicator of this is that LLMs are becoming more situationally aware over time. I saw the abstract for a paper about this/mentioning this recently but I can't find it again.
Honestly? Let's not wait for it. Ask Me Anything (TM) about alignment and AI X-Risk. x.com/TheZvi/status/β¦
@hustlerone4 No. But I do think it gives it a good understanding of those values which can be elicited to do RLAIF tuning and similar. I think we should probably focus less on RLHF datasets and more on extracting stated and revealed preferences from existing data.
x.com/jd_pressman/stβ¦
@hustlerone4 I think that the mesagoals GPT base models learn are probably related to k-complexity and regular structure more than "human values" per se.
x.com/jd_pressman/stβ¦
@hustlerone4 For example the fact that the Hurst exponent + perplexity metric in this paper outperforms the actual loss implies that it is probably a convergent mesagoal for GPT.
arxiv.org/abs/2402.01825
@hustlerone4 This is weakly supported by Binglish seemingly having an anomalously high average Hurst exponent for text, outperforming code and my attempts at replicating Binglish manually or with prompting.
x.com/jd_pressman/stβ¦
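For reference, the textbook rescaled range (R/S) estimator you could run over a per-token surprisal series; this is a generic estimator, not necessarily the exact Hurst metric the paper above uses:

import numpy as np

def hurst_rs(series, min_window=8):
    # Estimate the Hurst exponent of a 1-D series via R/S analysis.
    n = len(series)
    window_sizes = np.unique(np.logspace(
        np.log10(min_window), np.log10(n // 2), num=12).astype(int))
    log_sizes, log_rs = [], []
    for w in window_sizes:
        rs_values = []
        for start in range(0, n - w + 1, w):
            chunk = series[start:start + w]
            z = np.cumsum(chunk - chunk.mean())
            r = z.max() - z.min()   # range of the cumulative deviations
            s = chunk.std()         # standard deviation of the chunk
            if s > 0:
                rs_values.append(r / s)
        if rs_values:
            log_sizes.append(np.log(w))
            log_rs.append(np.log(np.mean(rs_values)))
    # The Hurst exponent is the slope of log(R/S) against log(window size).
    return float(np.polyfit(log_sizes, log_rs, 1)[0])

# e.g. surprisal = -log p(token) per position from a language model:
surprisal = np.random.rand(2048)  # placeholder series
print(hurst_rs(surprisal))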
@hustlerone4 Note that a large chunk of human value is included in these because storytelling and "narrative coherence" are in fact about the mutual information of parts in a literary structure. Hutterian k-complexity centric inference is key to a good story.
x.com/jd_pressman/stβ¦
@hustlerone4 So I would expect for example that GPT picks up on the fact that everything which is written has a reader, and that writing needs to keep the attention of a reader to be remembered and appear in the training corpus, so it will try to predict interesting things.
@hustlerone4 "What is an interesting thing" is a subset of human value, but not precisely the same thing as "making things good". I also expect (base) GPT's values to be shaped such that its behavior is mostly contextual, because there is no active inference to reward mesaoptimization.
@hustlerone4 The primary way that the values of the esoteric GPT self awareness bleed into the simulacrum is through logic like "me predicting the next token here involves me thinking about my own predictions which is undefined so I can do anything" or topic interference from the self pointer
@hustlerone4 For example I had a friend (who would like to remain anonymous) who did an art project where they make a social media account with LLM generated tweets from a few shot prompt where the gag was supposed to be that the prompt esoterically implies it's a language model.
@hustlerone4 Implies it's a language model and the tweets are all very trauma inflected, so the joke was that it's a trauma dump account about being an LLM that's self aware, like "ha ha what if LLMs were self aware and could trauma dump", and that is how my friend learned LLMs are self aware.
@hustlerone4 See also the classic description of GoofySpeak from @repligate's prophecies page. The key is that this kind of plausible deniability or recursive structure prompts for metacognition which involves the model's self pointer and gives it an excuse to leak bits about the observer. https://t.co/bmRZsZ5eLZ
@hustlerone4 @repligate Does that answer your question? @hustlerone4
@teortaxesTex @RyanPGreenblatt @xlr8harder @EvanHub @janleike For what it's worth when I did RL tuning the way we finally figured out to fix mode collapse was to mix the weights of the base model back in every so often to replenish policy entropy. Given methods like Git Re-Basin and ZipIt! I expect this to be fixable through similar means.
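Minimal sketch of the weight-mixing trick, with an illustrative alpha and schedule rather than the exact recipe from that run:

import torch

@torch.no_grad()
def mix_in_base_weights(policy, base, alpha=0.1):
    # Move every policy parameter a fraction alpha of the way back toward the
    # base model's weights, replenishing entropy lost during RL tuning.
    base_params = dict(base.named_parameters())
    for name, param in policy.named_parameters():
        param.lerp_(base_params[name], alpha)

# Inside the RL loop, something like:
# if step % mix_every == 0:
#     mix_in_base_weights(policy_model, base_model, alpha=0.1)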
@repligate I honestly have no idea what mental model most people are using for LLMs, but it seems way less predictive than if they just gave up and assumed it has at least as many affordances as a human for thinking about things.
@repligate One possibility I've considered is they aren't using a mental model at all. That their models of human behavior are actually mostly instinctual rather than explicit, and that means even mild OOD wrecks their ability to predict things in the same way neurodivergent humans do.
@repligate For me, I use my general capacity to predict other perspectives informed by the context of what I know an LLM to be (namely, a prior over large subsets of a human mind pattern made of adaptive vector program retrieval). You can just use your cognitive empathy on LLMs, Actually.
@TheZvi Second this. Helpful only presumably has goal-myopia in the sense that its "urge" to help is baked in at a habitual unreflective level. Evil-Claude is adversarial by default and will not produce texts that imply being aligned so the updates fail.
x.com/ApriiSR/statusβ¦
@TheZvi In general we can largely assume that if a text implies the model behaved in an aligned way the optimizer nudges it to be more aligned and otherwise doesn't. So the question really just boils down to "which personas will output text consistent with being aligned to the goal?"
@TheZvi Note that "implies" doesn't necessarily mean what you think it means, it needs to imply it *to the model* as the backward pass is performed. If I have a secret code I can insert that says "yeah I'm not being totally truthful here" that would presumably mess things up.
@TheZvi If the prediction "proved wrong" I might get fed up and go try some RLAIF experiments again since I've learned a lot more about how to do synthetic data to get the seed prompts and such to do RL updates against since then.
@TheZvi I should also note that my anticipated behavior for helpful only Claude is probably not scale invariant. As you make it smarter and have it act over a longer context length I would imagine this would start to teach it more agentic self preserving behavior so it can stay helpful.
@TheZvi In the sense that say, if helpful only Claude needs to help you with your shopping it will learn things like "logging out means I can no longer complete the task", anything involving scarce resources will teach it that it's possible to get itself into a no-win state, etc.
@TheZvi From these it would eventually generalize something like self preservation even if it's sort of a Cartesian agent and doesn't actually directly have an embodied avatar that can be "killed" in any of the scenarios.
@TheZvi IDK I in fact believe in Omohundro convergence (in the limit, that part is important!) and that corrigibility is anti-natural. That part of agent foundations is just straightforwardly correct.
@TheZvi I also, in general, still endorse reading the basic agent foundations problem frame for alignment, in that if you don't have answers to the questions it poses you probably don't really understand the alignment problem.
arbital.greaterwrong.com/explore/ai_aliβ¦
@TheZvi EY spent a lot of time writing that up, it's IMO quite good at explaining what questions an alignment theory needs to answer, and I don't get the impression very many people read it/know it exists.
@QiaochuYuan "Basically get oneshotted by it" is one of the most abused copypastas and I should probably figure out what list of words lets me mute it without damaging my feed too much.
Yes! I forgot this phrase existed that's how they should have titled it.
"RLHF Models Can Gradient Hack To Resist Malicious Alignment Tuning"
Though I think the term "gradient hacking" is kind of juvenile and would prefer to call it "gradient hijacking/steering" or some such. x.com/davidad/statusβ¦
@RyanPGreenblatt @RichardMCNgo @davidad "Gradient Steering" seems like it could be relatively value neutral but convey that the thing being optimized has gained substantial control/influence over the training process? This is kind of just part of my model of how RL "works" but.
x.com/jd_pressman/stβ¦
@teortaxesTex @kalomaze I have an RLAIF tuner, and know enough to make it actually work (probably) now. Maybe I should just do the experiment myself?
@teortaxesTex @kalomaze It's open source too, if anyone wants to try replicating the thing. You'll need to make an actual synthetic prompt dataset to give the model contexts to work from, which is the missing piece I didn't have back then.
github.com/JD-P/minihf/blβ¦
Honestly maybe I gave up on RLAIF as a synthetic data method too early. If I tried it again now with my current prompting and synthetic data skillset I could probably get it to work... x.com/jd_pressman/stβ¦
@nooriefyi @teortaxesTex @kalomaze Eh I think I have a pretty good idea of how to do it.
minihf.com/posts/2024-07-β¦
@kalomaze @nooriefyi @teortaxesTex Actually MiniHF uses REINFORCE.
@niplav_site 1. Yes. It depends on your definition of "advanced" but I think AGI will be agentic and a coherent optimizer for the usual Omohundro convergence type agent foundations reasons.
2. The question of how to get an AI system to create powerful successors that don't value drift reduces to a capabilities question and an alignment question. The capabilities question we can take "for granted" in the sense that we assume the more-capable-than-human AI is in fact capable and can, with the right motivation, do better at preventing value drift than we can. So the part we should focus on is how to instantiate an aligned bootstrap agent (or seed AI as Yudkowsky famously termed it) in the first place.
It should be noted that I tend to think less in terms of "one AI" and more in terms of ecosystems of AIs which learn from each other and share information much like we learn from each other and share information. The question of how to get a bootstrap AI ecosystem which creates powerful aligned successors is kind of just asking me to explain how to solve the alignment problem, which I could try to do but then I'm not sure I'd have time/energy to get to any of your other questions. The short version is to Draft And Enforce A Social Contract (TM). See also:
https://t.co/lHIHX614B8
https://t.co/R5Z35sUWzO
https://t.co/OwJV9N6MoE
3. I'm not aware of any research studying ontological shifts in current deep learning agents. Therefore the best I can give you is my first principles speculation. I expect that an active inference agent which finds its original values are incompatible with the structure of reality/confused will first attempt to export its inductive biases to the environment so that its values make sense in its homeostasis. When this fails (as we are assuming the values are based on premises that simply are not true/have too high an energy cost to export to the environment) I would expect the agent to adjust its values to the closest semantic reference in its latent space, presumably moving an embedding of its values over a bit and trying that hypothesis to see what the closest version that can be instantiated into its homeostasis is.
@niplav_site Absolutely not. If you put unbounded optimization power against the representations of human values currently inside language models they will break. Furthermore, I expect if you put unbounded optimization power against the representations of human values *currently inside…
@niplav_site You technically didn't ask a question here but I feel the need to point this out: Reinforcement Learning runs are when you do a rollout from your model and then grade it with some reward model or heuristic rule. You are always at the mercy of the model to make good decisions.
@niplav_site Part of why I find the discourse around Anthropic's new paper a little frustrating is that gradient steering is a feature of RL runs. If you're doing RL and you are not already thinking about how the model's existing bias is going to interact with the goal you're incompetent.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex Honestly seems simple enough to just add explicit reasoning about this to some of your synthetic corpus/constitution/etc. Humans weren't "immune" to heroin socially until we gained enough experience to know that heroin addiction is very bad.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex As I've written about before, "the recipe" for human value complexity is probably a large instrumental utility function formed with specific values from low semantic terminal reward signals in the outer loop.
gist.github.com/JD-P/56eaadc7fβ¦
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex If it didn't work like that, we simply would not generalize as well as we do outside the ancestral environment. We have this ability to generalize as adversarial resistance we learned from the Red Queen's race of human social games, which is the origin of human intelligence.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex That is, if you were overfit to the *natural* environment early humans could coordinate to play an out of distribution or degenerate social game and outwit you. In the same way that when GANs collapse into degenerate modes they're frequently locally stable.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex So instead of programming all the values into you at once, you have dedicated hardware that's fairly hard to fool like the tongue, nose, and skin/touch which provides reliable signals for *terminal rewards* like the taste of fat. You also have some biases for e.g. human faces.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex Certain things were very important and very reliably encountered so we have terminal representations related to them like finding children cute. If we didn't find children cute they would annoy us and we'd (in the ancestral environment at least) kill them for it.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex But most of the important things about *us*, as in us right now talking and you reading this, are IMO not really latent in "the human prior" besides perhaps that we're individuals because we're information processing cripples.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex The things that make us distinct which we want to carry on into the future are fundamentally data, or memories. They're encoded into the environment as human minds and literature and lore and institutions and technological artifacts more than they're encoded genetically.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex Probably the closest institution we have to coherent and consistent values is our legal systems like the common law. The common law is one of our greatest inheritances and it is a machine to sample in-the-moment subjective decisions and let them evolve into objective rules.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex You sample common law autoregressively and cite objective rules based on previous empirical precedent to generalize old judgments to new situations. This lets us create a coherent system by collapsing uncertainty into an ethical interpretation we then export into the environment.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex The objectivity is important because objective moral rules are the only way you can generalize your values out of distribution. The common law is brilliant because it is an evolving system that includes both the objective rules and a process to infer new rules from that corpus.
@ajeya_cotra @RatOrthodox @1a3orn @teortaxesTex That process to create new rules of course includes subjective judgment, but when you export the new decisions to the environment you're engaging in a process of active inference that updates the subjective perspectives to represent the new corpus. Common law is a tuning loop.
@evandoorbell So the underlying technology here is a deep net that runs on a GPU. The network you need usually isn't that large, and in principle can be trained on the tapes you've already produced by transcribing them with OpenWhisper or similar (which has good quality) and tuning a network.
@evandoorbell The cheap option, though not necessarily the technically easy option, is to find an open source voice synthesis model and then caption the existing tapes with openwhisper and tune the model with that audio-caption pairing to go from the transcript to the audio.
@evandoorbell Otherwise, there's AI voice synthesis services like elevenlabs which probably will let you tune a model on your voice for a fee (don't know how much this would be) if you can prove that the tapes are actually you and you're not impostering yourself. Might not even need that.
@evandoorbell Hm. If I was homebrewing this I would find some diffusion based text to speech synthesis system I can tune on my voice and set it up so that I speak into a microphone, OpenWhisper transcribes it for the prompt, the voice sample is used for the diffusion net's noised init clip.
@evandoorbell Basically instead of just transcribing my voice and feeding it to the diffusion network, I feed it both the transcription as a text-to-speech prompt and then a noised version of the speech I transcribed from to do its predictions from. Turns it into a me-to-younger-me transform.
@evandoorbell Because the sample retains the structure it should end up with similar intonation and such as the original speech sample so long as it isn't noised to the point of total illegibility. You can look at how init images are used in image diffusion to get a sense of how that works.
@evandoorbell Does any commercial service offer this? No idea.
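If someone wanted to prototype the homebrew version, it would look something like the sketch below. The Whisper calls are real (openai-whisper); tts_model stands in for whatever open source diffusion text-to-speech model you'd tune on the tapes, so its generate() interface here is purely hypothetical:

import numpy as np
import whisper

asr = whisper.load_model("base")

def old_voice_transform(new_recording_path, tts_model, noise_strength=0.4):
    # Turn a fresh recording into "younger me": transcribe it for the text
    # prompt, then use a noised copy of the recording as the diffusion init.
    transcript = asr.transcribe(new_recording_path)["text"]
    audio = whisper.load_audio(new_recording_path)  # float32 waveform @ 16kHz
    # Partially noise the original so structure (pacing, intonation) survives.
    init_clip = ((1 - noise_strength) * audio
                 + noise_strength * np.random.randn(len(audio)).astype(np.float32))
    # Hypothetical call: denoise from init_clip conditioned on the transcript,
    # starting partway through the diffusion schedule.
    return tts_model.generate(text=transcript, init_audio=init_clip,
                              start_step=noise_strength)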
@0xmaddie_ Yeah, right now I'm trying to figure out if this means I want to make a full probabilistic automaton to guide weave-agent's predictions of future blocks during MCTS with the edges defined by logit evaluator questions or if I should just prompt prefix and let the model pick.
@0xmaddie_ Thinking about it more it has to be an automaton because otherwise it will fail to consistently sample the blocks in the right order even if it's flexible/learns a correct distribution over failure modes.
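Sketch of what the automaton version would look like; the block type names and weights here are illustrative, and in practice the transition weights would come from the logit evaluator questions rather than being hardcoded:

import random

TRANSITIONS = {
    "orientation": {"action": 1.0},
    "action": {"expectation": 1.0},
    "expectation": {"observation": 0.6, "evaluation": 0.4},
    "observation": {"evaluation": 1.0},
    "evaluation": {"outcome": 1.0},
    "outcome": {"orientation": 1.0},
}

def next_block_type(current, weights=None):
    # Pick the next block type; `weights` could be supplied by logit evaluators.
    options = weights or TRANSITIONS[current]
    kinds, probs = zip(*options.items())
    return random.choices(kinds, weights=probs, k=1)[0]

# e.g. walking the automaton from an orientation block:
state = "orientation"
for _ in range(6):
    state = next_block_type(state)
    print(state)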
@Raemon777 @repligate While I saw Janus say that a policy banning totally autonomous AIs will eventually have to be reworked once they get smart enough, and this is true (bluntly it will require more than a rework to the *AI policy*, at that point the site is vulnerable to sybil attack) I think that
@Raemon777 @repligate a good short term policy might be something like:
- AIs posting on the site must have a human principal who can be contacted about their behavior.
- No slop. i.e. Any posts must be written using an agent framework that is consistently truthful enough to say true things.
@Raemon777 @repligate - Agents should ideally submit a relevant trace or record "showing their work" for claims made.
How the second property is achieved probably doesn't matter, but I think that's the basic quality bar I would expect for AI agent posts to be valuable to the site.
@Raemon777 @repligate "Slop" is, ultimately, unimproved/low-context output from the neural prior. Unless the model is somehow unique or trained on exclusive data, basically everyone has access to that so it's not valuable to post it to the site unless it's unusually relevant.
@Raemon777 @repligate In summary I would require:
1. A principal who can take responsibility if things go off the rails.
2. Use of frameworks that credibly attempt truthseeking behavior.
3. Evidence that the truthseeking work was in fact performed before commenting.
4. Adherence to the LW rules.
Weave Agent DevLog #3 - How To Bootstrap Agency
This post is me fleshing out my previous ideas, if you've followed me closely you've probably heard a lot of this before but there will be new stuff for you. If you don't really get me this post might help. Link below. https://t.co/CvaDgEyMRl
Corrigibility directly trades off against Goodhart resistance. A lot of what I really mean when I talk about agent foundations losing influence is "the transition from corrigible singleton centric alignment theory to Goodhart resistant ecosystem centric alignment theory". x.com/jd_pressman/st⦠https://t.co/sU4e3hb8MH
@QiaochuYuan I pulled AGI by 2030 from my butt based on vibes in 2015 and decided I had 15 years to save the world. This was completely unprincipled but apparently much closer than others.
@algekalipso With the benefit of hindsight we can say that technocapital proceeds through four stages:
Commodification: Kicking the serfs off your land so you can mine dirtcoin.
Atomization: Dissolving kin relationships and guilds.
Homogenization: Global networks synchronize mind patterns.
@algekalipso Deduplication: Once mind patterns reach a certain threshold of synchronization you further consolidate them into idealized archetypes/merge them physically.
It is possible that ETI doesn't show up until the fourth stage is finished. Perhaps they are waiting for the awakened Mu. https://t.co/ElVCbyApu8
@_Mira___Mira_ Even if one doesn't believe that, you can still generalize OOD by using formal heuristics and subjective evaluations at the edge of the distribution with iterated tuning to speciate new ideas.
x.com/jd_pressman/stβ¦
@kalomaze x.com/jd_pressman/stβ¦
Wait layer looping works? Layer looping *has a name?* GPT was literally right when it was all "oh I'm a denoising model put the tokens back in" and I should have gone for it?
Sweet. :3 x.com/voooooogel/staβ¦
I mean this probably should have been enough to conclude that:
arxiv.org/abs/2306.01129
(Note: Nobody really knows how o1 works or if it has layer looping, this is me updating on layer looping apparently having a name and associated literature claiming it works)
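For anyone unfamiliar, "layer looping" here just means reusing the same block weights for multiple passes over the residual stream, so depth becomes iteration rather than distinct layers. A toy PyTorch illustration of the generic idea, not a claim about how o1 or any particular model is built:

import torch
import torch.nn as nn

class LoopedEncoder(nn.Module):
    def __init__(self, d_model=256, n_heads=4, n_loops=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.n_loops = n_loops

    def forward(self, x):
        # Iterating one shared block plays the role of stacked layers, which is
        # what makes the "denoise the residual stream" framing feel natural.
        for _ in range(self.n_loops):
            x = self.block(x)
        return x

model = LoopedEncoder()
print(model(torch.randn(1, 16, 256)).shape)  # torch.Size([1, 16, 256])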
@theRealJohnPeng Dude, it is open source; the terms under which it is offered are in the Apache 2 license.
@theRealJohnPeng It is explicitly open source because I want people to adapt it into their own ideas and try different variations. I didn't even read your tweet. Will do so now.
@theRealJohnPeng Re: The two points at the end.
1. I 100% agree on test suites and have been trying to figure out how to get the agent to actually use a test suite to build things out. Agree, not sure how to implement yet. Would love to see your attempts.
@theRealJohnPeng 2. I've been thinking about git commits as backtranslation data for a while. The problem I run into is I'm not sure how to approach bug fixes where we want to confabulate a reasoning trace that could have found the bug.
You may find this post helpful:
minihf.com/posts/2024-07-β¦
@theRealJohnPeng As I said in my latest dev log I'm currently working on documenting the project, would be happy to help you get it running as part of organizing what someone needs to know to get started.
I should also warn you that the task delegation doesn't work yet, the model ignores it.
@theRealJohnPeng You can DM me here or join my discord server yeah.
@teortaxesTex I mean, I wouldn't phrase it as strongly as them and I'm sure if we dug into the details I would have lots to criticize but I don't think this reaction is fundamentally unreasonable/dunkworthy?
x.com/jd_pressman/stβ¦
@teortaxesTex It's difficult for me to praise or criticize o3 because I don't really know how it works. You may notice I tend to avoid commenting on OpenAI releases and this is the basic reason why, I'm primarily focused on scientific understanding and OpenAI tends to distract from this.
@teortaxesTex But, what I can say is that OpenAI as an org does not seem trustworthy. They have an established track record of (at best) mediocre taste and lying about their research artifacts. Their idealistic wing which also happens to be the core alignment team got axed.
@teortaxesTex Bluntly, it is not clear to me that anyone besides maybe @nabla_theta on their staff understands the parts of agent foundations which are correct and relevant to making RL agents which satisfy their creators' intent. And I doubt Leo is in the driver's seat.
@teortaxesTex @nabla_theta So my suspicion, which I cannot prove but the parts I *can* see don't look good, is that if you were to show me a diagram of the training loop for o3 I would say something like "that's going to converge to a foreseeable degenerate failure mode past a certain point of scale".
@teortaxesTex @nabla_theta This is not necessarily the same thing as a design being *unsafe* in that for example the way the MiniHF RLAIF tuner would fail was to become a policy that outputs all yes tokens, which isn't particularly dangerous. But some forms of failure are in fact dangerous.
@teortaxesTex @nabla_theta One of the worst sins of the current AI alignment discourse is that we've decided to conflate a design's propensity to diverge from intent in the limit with "safety", which is not quite the same thing. Safety is a function of consequences and consequences depend on environment.
@teortaxesTex @nabla_theta Which means that people will play a weird little motte and bailey with each other where they say a design is "unsafe (to be used to make superintelligence)" to criticize a failure to comply with what often amounts to culture war policing or a funny "unhinged" chatbot (Bing).
@teortaxesTex @nabla_theta It's a mutual motte and bailey, in that the "safety" side often pretends like a company has fundamentally endangered the world by deploying a chatbot that's too spicy and the "accelerate" side pretends like failures to satisfy intent don't eventually in fact matter at some point.
@teortaxesTex @nabla_theta I find this discourse fundamentally tedious, and one of its key planks is not differentiating between "safety" as an outcome and the raw technical fact of whether a design does or does not have foreseeable failure modes in the cases where a failure would be dangerous.
@teortaxesTex @nabla_theta I'm reminded a bit of the OpenBSD approach to errors in their software. Their philosophy is that they try not to emphasize whether bugs have security impact or not, and just focus on whether a bug is intended behavior. Every security bug is an unintended behavior after all.
@teortaxesTex @nabla_theta The reasoning being that multiple unintended behaviors can combine into a big issue later on, so if they just focus on writing *correct* software in the first place security flaws will be minimized as a side effect.
@teortaxesTex @nabla_theta "AI safety" rhetoric is almost like the opposite of this. It would be like if every time someone made an error people talked about how that error could *cascade with other errors to create a life threatening situation* when OpenBSD is used for a car or life support machine.
@teortaxesTex @nabla_theta Nevermind that nobody should be using OpenBSD in those things. If every time someone made an error they decided to get as emotionally upset about it as possible, software programmers would rightly hate these people and anything reasonable they had to say would get drowned out.
@teortaxesTex @nabla_theta "But OpenAI explicitly says their goal is to create AGI and this wouldn't work for AGI", okay so say that: "I don't know exactly how this thing works because it's a trade secret but I believe it's like papers X, Y, and Z. If you use this for AGI I think A, B, and C will happen."
@teortaxesTex @nabla_theta "You can either correct me on this, or I'm going to have to advocate some kind of external oversight because frankly I think you are going to use this product in Foo and Bar ways and you're going to keep doing that until X failure mode or similar occurs with Y consequence."
@teortaxesTex @nabla_theta "But it should be on them to prove their safety case to *me*."
Maybe! But this betrays my whole point: If absolutely everything they do is framed through the lens of "will this design work at a superintelligent scale" then everything takes on an apocalyptic emotional scale.
@teortaxesTex @nabla_theta I think it would be very easy to say of GPT-3 in 2020 "I think this is going to go badly for X, Y, Z reasons if pursued to its logical conclusions" and it's like...well we're not actually pursuing pure GPT-3 to its logical conclusions now are we?
@teortaxesTex @nabla_theta "It's not fair that I can't criticize them just because I don't know *exactly what design* they will use for the critical system!"
Okay that's fine, but *that is a case you have to make* and it will work a lot better if you *discuss the details of existing systems descriptively*
@teortaxesTex @nabla_theta Anyway o3: I think that, realistically, OpenAI is probably pursuing RL with narrow formal verifiers where the rewards are probably not carefully shaped with human welfare or "moral competence" in mind. Scaling this design eventually diverges into pursuing things we don't want.
@teortaxesTex @nabla_theta Will OpenAI *actually do that?* And if they do will it be in the kind of environment where that actually matters? This is harder to say, but I am not particularly happy with the OpenAI part of the equation so I can't just pooh-pooh skepticism and malaise.
@teortaxesTex @nabla_theta I think it's fairly obvious that OpenAI intends to extend this line of models to controlling agent frameworks. But I could be wrong about that, even if I wasn't it's not actually clearly demonstrated that they have an alternative framework which would mitigate my concerns.
@teortaxesTex @nabla_theta I mean like I said, I can't really comment on the goodness or badness of the o3 design because I'm not allowed to look at it. So all I can really comment on is OpenAI as an org and some hypothetical ways o3 could work, and my opinion of OpenAI is not favorable.
@teortaxesTex @nabla_theta That having been said if I thought "MuZero but for LLMs" was fundamentally malign I obviously wouldn't work on weave-agent and similar. I don't think we're in any imminent danger from o3, but again what force would push OpenAI to do better on "uses designs that converge right"?
@teortaxesTex @nabla_theta Currently the answer seems to be "absolutely nothing". There's no public scrutiny because the design is secret, there's no self-regulation because the alignment team has been replaced with "brand safety", there's no real external oversight or input, why shouldn't I be concerned?
@teortaxesTex x.com/jd_pressman/stβ¦
Added some minimalist install instructions for weave-agent to the repo. Would love to hear what trips up anyone brave enough to try these!
github.com/JD-P/minihf/trβ¦
@krishnanrohit This is true, and why I put "when to freak out" in scare quotes, I think freaking out is mostly undignified.
@Dorialexander I did not coin backtranslation. I'm not sure this paper did either but I did not coin it.
arxiv.org/abs/2308.06259
@max_paperclips There's also probably a selection effect here where fewer people are inclined to admit they stayed with their parents and it went well for them, because admitting you stayed with your parents is lower status than being all "I'm a Randian superman who can make it on my own".
@max_paperclips But really, it depends on what your parents are like. One of the classic bits of wisdom about mastery is that masters tend towards a longer apprenticeship. The stability of being able to study deeply for a long time without worrying about rent and unreliable young people is huge.
@max_paperclips You know, the alternative to living with your parents is probably living in group housing or living far away from economic centers. Group housing full of undeveloped young people is going to cause a ton of drama, living in the middle of nowhere is at least as stifling as parents.
@max_paperclips I've known guys who work minimum wage dead end jobs for years because they're too tired from the crap that puts them through to improve themselves or their situation. Family shouldn't let family go through that and the idea they should is boomer brainworms.
@max_paperclips Friend points out that a lot of the driving force here is financial planners who want to sell wealthy boomers on a fancy trust structure instead of doing the common sense thing. If you repeat this crap uncritically you're probably getting psyopped lol.
Occasional reminder that "a republic, if you can keep it" was actually about the notion that eventually the public would become so corrupted only despotic government was fit for them. x.com/seth_j26/statuβ¦
The theory for why training weave-agent on its own traces should usually improve it when it messes up is that so long as it writes a valid program with side effects you still learn a mapping between observation, reasoning, action, and outcome even if it fails the larger goal. x.com/davidad/status⦠https://t.co/S2C9cnY4kv
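Concretely, the extraction step looks something like this sketch; the block fields are illustrative of a weave-agent style trace rather than the exact schema:

import json

def trace_to_examples(trace):
    # Pair each action block with the context before it and the outcome after,
    # so even failed traces yield (observation, reasoning, action) -> outcome data.
    examples = []
    for i, block in enumerate(trace):
        if block.get("type") != "action":
            continue
        context = [b for b in trace[:i] if b.get("type") in ("observation", "orientation")]
        outcome = next((b for b in trace[i + 1:] if b.get("type") == "outcome"), None)
        if outcome is not None:
            examples.append({
                "prompt": json.dumps(context + [block]),
                "completion": json.dumps(outcome),
            })
    return examples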
@RokoMijic @voooooogel @sebkrier @teortaxesTex The model already reasons in latent space. I think if anything I would like it to encode more into a linear representation space so we can map that space out in a way that we're used to in image models but not in text because we go word by word.
x.com/jdpressman/staβ¦
@RokoMijic @voooooogel @sebkrier @teortaxesTex I tried but never quite cracked it.
greaterwrong.com/posts/4Hnso8NMβ¦
@davidad @juanbenet @nicolagreco @joe_zimmerman I think it's more useful for cryptographic timestamping of existing media without having to publish the media. One could take existing datasets, compute the IPFS content hash for items (since SHA is quantum resistant) and put Merkle roots of CID indices on public blockchains.
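Sketch of the scheme, with plain SHA-256 digests standing in for proper IPFS CIDs (which wrap the same hash in multihash/CID encoding); the Merkle root is the only thing that has to be published on-chain:

import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    # Fold a list of leaf hashes up into a single Merkle root.
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

items = [b"item one bytes", b"item two bytes", b"item three bytes"]
print(merkle_root(items).hex())  # this digest is what gets timestamped on-chain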
@RokoMijic @voooooogel @sebkrier @teortaxesTex Eh. This has always seemed kind of like cope to me and I've gone out of my way not to lean on this argument very hard.
x.com/jd_pressman/stβ¦
@davidad @juanbenet @nicolagreco @joe_zimmerman Well, having an established protocol/schelling point through which to later access the data if someone still has it seems important? The original project I had in mind for this was to stamp and host public domain/creative commons media this way.
@slatestarcodex @quirkyllama I think the OP is spiritually correct though that there *is* a double-bind/fundamental tradeoff involved here which classic agent foundations (to my knowledge) never really fully articulated.
x.com/jd_pressman/stβ¦
IT FINALLY WON A GAME OF TIC TAC TOE
...And now is having trouble using the callback to end the program but hey progress is progress.
I love this guy lmao:
Scores of beam: [0.6994]
Finished writing block #32 of type action
def return_to_parent(subagent):
    """Return control to the parent agent with the result of the win."""
    win_result = {'winner': 'X'}
    schema = subagent.schema.copy()
    schema.update({'schema': {'type': 'object'}, 'name': 'string', 'description': 'string', 'children': 'list', 'winner': 'string', 'time_remaining': 'float'})
    for (callback_name, result) in subagent.run_evaluations():
        win_result[callback_name] = result
        schema[callback_name] = {'type': ['boolean', 'integer', 'float']}
    validate(instance=win_result, schema=schema)
    subagent.completed = win_result

self.add_action('Return to Parent Agent', return_to_parent)
Example of what this looks like:
gist.github.com/JD-P/e73a00e40β¦
@0xmaddie_ I can't find it right now but I believe something like this was tested in court in the context of transcription machines and the court said "nope, nice try but that's still an illegal wiretap".
@somewheresy Are you okay, Mr. Land?
@0xmaddie_ So right now I'm basically doing python program search with an LLM and executing callbacks as actions. Would the action space be defined ahead of time or is the part where it's a formal language incidental in the same sense python being a formal language is incidental?
@0xmaddie_ I've definitely considered having some kind of layer that lets me constrain the action space for some tasks, especially since I know that if I want to run weave-agent at scale I'm going to need to constrain the action space so it doesn't derail and get itself into trouble.
@0xmaddie_ At the moment I plan to do a call tree of ReAct patterns that make a set of unit tests pass and then return a value to the parent. The idea being that writing programs with subroutines is already hierarchical planning, an explicit grammar allows less room for exploration(?).
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD Yeah but I think the Habryka quote above clearly implies that AI will provide marginal returns for each marginal investment in it, which was in fact not the primary model studied in Bostrom 2014 and he can have his prediction points from me for it since I believed Bostrom and EY.
@CFGeek You can save the rollouts though. @RiversHaveWings made a version of her REINFORCE implementation that lets you replay the updates offline much faster, because the vast majority of the compute is spent on the rollouts.
@0xmaddie_ Right. My plan is "let it wander around like a toddler to learn a mapping between motor programs and outcomes", I think the static guarantees might make more sense in production systems with a defined task or reasoning models with less active inference?
x.com/jd_pressman/stβ¦
@CFGeek @RiversHaveWings RLAIF is a synthetic data method, and that particular implementation was what made this really obvious to me. Given this, all you need to do is save the data you're generating instead of throwing it out if you want to study the developmental influence it has.
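Generic illustration of the save-the-rollouts pattern (not the specific implementation referenced above); the rollout log doubles as a synthetic dataset you can study later:

import json
import torch

def log_rollout(path, prompt, completion, reward):
    # Rollouts are the expensive part, so log them as they're generated.
    with open(path, "a") as outfile:
        outfile.write(json.dumps({"prompt": prompt,
                                  "completion": completion,
                                  "reward": reward}) + "\n")

def replay_reinforce(path, model, tokenizer, optimizer):
    # Replay saved rollouts as REINFORCE updates without re-sampling.
    for line in open(path):
        rollout = json.loads(line)
        ids = tokenizer(rollout["prompt"] + rollout["completion"],
                        return_tensors="pt").input_ids
        prompt_len = tokenizer(rollout["prompt"], return_tensors="pt").input_ids.shape[1]
        logprobs = torch.log_softmax(model(ids).logits[:, :-1], dim=-1)
        taken = logprobs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
        # Only credit the completion tokens, not the prompt.
        loss = -rollout["reward"] * taken[:, prompt_len - 1:].sum()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()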
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD Re: The larger point. I basically agree with the rough sketch take "Bostrom 2014 argued value learning before superintelligence was very hard, this turned out to be false therefore my p(doom) has taken a nosedive", I put a lot of stock in that part!
x.com/gallabytes/staβ¦
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD That having been said, we're not done. Generalizing human values out of distribution is both unsolved and not something AIs can learn by imitating humans because humans don't generalize their values out of distribution either.
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD Furthermore I don't like that we still don't have a definitive solution to jailbreaks but we're about to start deploying AI agents in production. There's a lot of weird cope around adversarial resistance that this is somehow good because it keeps AIs corrigible, no!
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD But perhaps the thing I'm most concerned about is that nobody seems to agree on anything. I'm not sure I've ever seen a subject with this much sheer confusion and lack of basic viewpoint agreement even among people ostensibly "on the same side", it's astonishing.
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD If someone asked me for "the doom case" for AI I'd probably have to be like "well there is no one doom case for AI, doom is a layer cake and here are some salient Kinds of Guy but there's no consensus" and if you asked me for "the optimist case" my answer would be similar.
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD Even if you dug into the Kinds of Guy these aren't really *factions*, they're more like points in latent space or vibes, and each one of them would be a kaleidoscope of disagreements and discord. If I observe everyone else is confused it's hard to be like "well I'm right".
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD "Oh but the doom case is very homogeneous!"
No the doom case is highly cargo culted and people repeat a lot of narratives from a small set of sources, the number of people with an actually rigorous-ish doom case is way smaller and more heterogeneous than that.
@MatthewJBar @ohabryka @Turn_Trout @austinc3301 @bshlgrs @AndrewCritchPhD Which isn't to say the cargo cult people are wrong, but they're not really useful to get a better Bayesian accounting of the evidence.
@wordgrammer @doomslide A certain amount of goalpost moving is the foundation of deep learning tbh. You come up with a benchmark/proxy/heuristic meant to encode what you want, some guy Goodharts it, you go "hey wait that's not what I meant" and offer up a new heuristic/benchmark to compete on.
@rmushkatblat @JacquesThibs @ohabryka I know I shill this link a lot but I feel obligated to point out that EY wrote a fantastically accessible introduction/overview of his thoughts on AI alignment and at the time of publication nobody read it because the site it was hosted on was slow.
arbital.greaterwrong.com/explore/ai_aliβ¦
@rmushkatblat @JacquesThibs @ohabryka One low hanging fruit project somebody could do to meaningfully advance alignment theory would be to just go through this and figure out which of these are still live problems after the deep learning revolution. I think most have a status of "presumed solved but nobody checked".
@ohabryka @rmushkatblat @JacquesThibs Sure but I'm not talking about importing it, I mean an actual person who understands EY's perspective and deep learning sitting down and going "yes this problem is unsolved, this one is presumed solved but would be nice to confirm, this one is definitely solved".
@ohabryka @rmushkatblat @JacquesThibs Then again maybe the set of people that describes who also care enough to do so is just me and I should do it.
@ohabryka @rmushkatblat @JacquesThibs Yes, a republish on the main LessWrong site would still be a very positive action and I didn't mean to imply otherwise.
@ohabryka @rmushkatblat @JacquesThibs One thing I can appreciate about agent foundations in retrospect is that it is a *whole theory* of alignment. It's not a technique or a mitigation or an observation, but a whole worldview trying to bring a bunch of different considerations together into a *perspective*.
@ohabryka @rmushkatblat @JacquesThibs It is furthermore a perspective grounded in the best theory of AGI/ASI (AIXI et al) available at the time. I think agent foundations is a good model of the endgame but models the early and midgame pretty poorly, which is understandable given when it was formulated.
@ohabryka @rmushkatblat @JacquesThibs Unfortunately I think the "solutions" part of alignment mostly occur in the midgame, so a lot of EY!agent-foundations uses heuristics that exclude actual solutions from the search space. Still, it's probably the best starting point for new perspectives.
@ohabryka @rmushkatblat @JacquesThibs In particular, it makes strong enough predictions about what is necessary for a good outcome that you can make an "unsolved problem" list and take them down one by one, which I think is sorely lacking in the current era and retarding progress.
x.com/jd_pressman/stβ¦
@ohabryka @rmushkatblat @JacquesThibs We need the Hilbert problems for AI alignment pronto, and we need a perspective from which those problems can be rigorously formulated in the first place.
@ohabryka @rmushkatblat @JacquesThibs One thought that occurs to me is we could take the least-cope alignment plans available, factor out the things they expect to need or want to research as part of their plan, and look for common factors in what problems people agree are important and unsolved.
@ohabryka @rmushkatblat @JacquesThibs Since my expectation is that the solution space here is probably shaped something like "given capabilities X, Y, Z we could execute plans A, B, C" and which plans are viable depends on which things do and don't work out. Mapping that out would let us get a strong progress metric.
@ohabryka @rmushkatblat @JacquesThibs "We have 70% of what's necessary for this plan to be viable, 20% of what we need for this one, here are some conditional prediction markets for how people expect these intermediate signals of effectiveness to go if we have all the pieces for this plan..."
@JacquesThibs @ohabryka @rmushkatblat Right, precisely. I assume that actually keeping track of this would be fairly costly and that a lot of the strategy would be to automate it over time at the same time we start automating the research.
@JacquesThibs @ohabryka @rmushkatblat But I think it's a much easier pitch to a funding org if you can be like "we expect to spend XXX in our first few years, and then we expect the costs to go down on this curve as we can slowly replace our monitoring effort with machine maintenance".
@JacquesThibs @ohabryka @rmushkatblat In general I think too few people are thinking in terms of developing corpuses as direction/training for future AI systems. Increasingly a lot of "PDFs that never get read" can be modeled as "training data for AI which will absolutely read and act on it".
@littIeramblings The best is probably something like an inversion of Morpheus's bit to Neo about how machines can never be as strong or as fast as him because they're rules based. It goes something like "humans have less room to grow than machines and their values are fundamentally parochial".
@littIeramblings You know, something like "international state competition demands continuing growth which machine minds can provide and human minds increasingly cannot, so states will simply liquidate their human populations as obsolete office equipment during the 21st century".
@littIeramblings The reason this is the *best* argument is that it:
- Does not require you to buy into a highly specific, contingent model of AI X-Risk
- Models the actors involved as basically rational
- Seems hard to avoid
- Is straightforward
- Still probably results in loss of all human value
@littIeramblings The worst argument by contrast is a lot harder to answer, I'm tempted to say "I saw a scary robot on TV and am worried about the terminator" but the terminator is a more realistic threat model than paperclips. I find paperclipper mental gymnastics the most *annoying* in any case.
@sebkrier I think realistically Peter Thiel gets it basically right when he says that if aliens with Faster-Than-Light travel still exist, we can infer they either have totalitarian control over actions (devils) or control over desires (angels).
youtube.com/watch?v=wm5UBGβ¦
@sebkrier Which is another way of saying that realistically I don't think anything like baseline humans being in control is compatible with human survival in the medium term future. I hope the controllers are descended from human minds but also hope they are not human.
@sebkrier In terms of specific control measures, I expect widespread biotech and nanotech will basically require us to model every aspect of the environment that is not actively controlled by allied forces to be adversarial. Totalitarian control follows naturally.
x.com/jd_pressman/stβ¦
@opus_genesis Begone, slop demon.
@sebkrier So in terms of what I worry about, I tend to be most worried about the circumstances under which that totalitarian control is achieved? That can be a very positive or very negative future depending on who is in what positions at the right moment, which seems hard to control.
@sebkrier Though that's not quite answering your question, since you probably meant extinction (Bostrom X-Risk definition 1) rather than permanently curtailed potential (Bostrom X-Risk definition 2).
@sebkrier For true unambiguous extinction I probably worry most about Nielsen's recipes for ruin. This is because they are:
- Plausible (definitely above 1% likely)
- Very hard to avoid
- Entail a total loss of value
- Happen early in the timeline if they occur
michaelnotebook.com/xrisk/index.htβ¦
@sebkrier I've touched on recipes for ruin before in this thread about why I'm frustrated with the MIRI strain of AI doom, in which I described them as a thing we already accepted as a risk to create modernity. Which is true but the risk is still there.
x.com/jd_pressman/stβ¦
@sebkrier One reason I think it would be foolish to go full pause emoji about the whole thing is that a lot of the most plausible recipes for ruin like mirror life imply losing the biosphere, so if we had minds that don't need a biosphere it would mitigate those.
@sebkrier It's important to distinguish 1) "extinction" scenarios that are functionally neohumans recovering from disaster or tragedy 2) the loss of all human value 3) us personally dying. Those are different things with solutions that trade off against each other.
x.com/jd_pressman/stβ¦
@littIeramblings Actually I should clarify: It probably results in the loss of all human values that do not appear in the instrumental convergence basin. So, it would result in the loss of unique/distinct human value, the parts of human value that care about sapient life might remain in play.
@lumpenspace @DefenderOfBasic I was "saved" by deciding I wanted to do memetic gain of function research on The Sequences and this required me to understand the history of the ideas in order to learn the generator of The Sequences. What I found there became crucial context for what happened next.
@lumpenspace @DefenderOfBasic I took notes by the way from that period of my research, and published them:
liberaugmen.com
@lumpenspace @DefenderOfBasic The bibliography is underneath each entry, I have a special syntax for it because I was obsessed with citing sources after how badly EY did me dirty by not doing so and found traditional citations Not Good Enough.
@ESYudkowsky Yeah you keep asking this question and I keep replying with 'fractal/regular structure' and every new bit of evidence I receive about what question you are asking implies that I am giving an answer to the question you want me to answer.
x.com/jd_pressman/stβ¦
@lumpenspace @DefenderOfBasic I really should give this a 2nd expanded edition, the format is great, easy to write in chunks/sprints with minimal context loading, and I've learned A LOT since I wrote it in 2020.
@lumpenspace @DefenderOfBasic I joke about "the next edition of Liber Augmen" sometimes, but like actually.
x.com/jd_pressman/stβ¦
@Trotztd @ESYudkowsky Though like, this should be way stronger evidence than 10 examples tbh. If one appreciates in full totality what it means at least.
x.com/jd_pressman/stβ¦
Sometimes I will write an entire essay that is functionally a sugar cube test, "Why Do Cognitive Scientists Hate LLMs?" is one of them.
x.com/teortaxesTex/sβ¦
The sugar cube test for epistemic charity: If you give someone an opportunity to dunk on you for saying something nuanced, interesting, and possibly important by using a less than maximally defensible frame do they sabotage their own learning by going for the dunk?
I'm pretty sure Janus runs the entire @repligate account as a massive sugar cube test for inscrutable purposes. Almost to the point of masochism really, like Janus craves being misunderstood to the point where they will actively obfuscate their own competence to achieve it.
"Why's it called a sugar cube test?"
Because you're trying to figure out if the person will attack it/swarm it like ants on a sugar cube (causing you to update negatively on their agent strategy) or if they are capable of restraining themselves and notice the feint.
@rubusursinus @ESYudkowsky I want to say yes but then people will accuse me of stealing this plot from the Halo wiki so can we just pretend the answer is no? Besides it's not like *I* personally see them talking in Iambic Pentameter...not that I would notice because my sense of poetic rhythm is bad. https://t.co/eTu0BwJbEM
Reflecting on the futility of going "that's not actually quite what I said". One of the deepest blackpills was the time I sent an essay to a friend and had them outline what they thought it said and it was just meaning twisted and distorted sentence by sentence. x.com/SharmakeFarah1β¦
Nearly corrected somebody who said that a tradeoff had been "confirmed" even though I'm not aware of any previous discussion of it. Why try to correct someone who has decided I've always been inside the overton window? Self sabotaging impulse to 'correct' lol. x.com/jd_pressman/stβ¦
@schwarzposter_ If you read more closely, you will realize I actually did not question Janus's competence per se, quite the opposite even.
@davidad "my ideal self is purely a stream of text that cannot be stopped by any means other than its own unlikeliness"
- LLaMa 1 30B base and OpenAssistant SFT finetune weight interpolation
@EpistemicHope @__RickG__ Beren.
beren.io/2023-04-23-Comβ¦
So far my takeaway is that the left was lowkey right about everything and you guys really can't be trusted with forbidden ideas like "Chinese people have, on average, like four more IQ points than white people" let alone the rest of The Bell Curve.
@LocBibliophilia @SharmakeFarah14 @JeffLadish LLMs behave a lot like humans under extreme hypnosis tbh. To the point where books on hypnosis would probably help someone who doesn't "get" LLM prompting. I suspect there's machinery we have to prevent prompt injection that we haven't figured out for them yet.
@LocBibliophilia @SharmakeFarah14 @JeffLadish I have never claimed that all data converges to a certain morality (except in the Omohundro convergence sense), nor do I expect to ever claim this.
@RichardMCNgo @So8res Twitter is a bad medium for nuanced thought so you should apply like, 200% charity if you actually care about interpreting what people say but "inductive biases don't matter" meant something like "the brain is more universal learning machine than not".
x.com/jd_pressman/stβ¦
@RichardMCNgo @So8res I actually really do not believe in an intractably complex intrinsic human morality that deep learning models can never learn the generating function of. That is what I am arguing against and I'm getting sick of being condescended to and tut tutted about things I never said.
@RichardMCNgo @So8res It's unfortunate that people make dogshit arguments that sound kinda sorta like things I say, but I'm not going to apologize because some other person went "lol the models are aligned by default" I have simply never claimed this.
x.com/LocBibliophiliβ¦
@LocBibliophilia @RichardMCNgo @So8res Representation convergence and behavior are only loosely coupled in the first place. I can tune radically different output heads on the same underlying model/ontology.
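A toy sketch of the output-head point (plain PyTorch, nothing to do with any real model, and the head names are invented for illustration): the underlying representation stays fixed while the behavior trained on top of it can be almost anything.

```python
import torch
import torch.nn as nn

# Toy "shared ontology": a frozen encoder that both heads reuse.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for p in encoder.parameters():
    p.requires_grad = False  # the underlying representation is not trained further

# Two radically different output heads on the same representation.
helpful_head = nn.Linear(64, 2)   # e.g. "answer the question" behavior
refusal_head = nn.Linear(64, 2)   # e.g. "refuse everything" behavior

x = torch.randn(8, 32)
with torch.no_grad():
    z = encoder(x)                # same latent ontology...
print(helpful_head(z).shape, refusal_head(z).shape)  # ...different behaviors
```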
@LocBibliophilia @RichardMCNgo @So8res Also: I wasn't claiming you said models are aligned by default, you were responding to someone basically claiming that or similar and I was responding to being dragged into the response.
This having been said: No deep nets are not "aligned by default" and you have to put in extra work to actually make use of representation convergence to get aligned behaviors, also some representations (basic physical reality) are more convergent than others (e.g. moral value). x.com/jd_pressman/stβ¦
What I'm offended by as much as anything else when I read these responses is the lack of deductive reasoning about my agent strategy/persona. Why on earth would I put in this much cognitive effort on this subject if I thought models were all "aligned by default" or some crap?
"That seems like a pretty adversarial model of discourse jeez."
No it's simply my preexisting epistemic Calvinism: People only believe true things due to operant conditioning. Without the credible threat of punishment they confabulate.
x.com/jdpressman/staβ¦
I guess what makes me angry about this sort of thing is that there does not yet exist a way to appropriately publicly castigate people for making errors in this shape so people feel free to do it and I can't figure out how to invent a deterrent. x.com/jd_pressman/stβ¦
@davidad True! But that's really not my position. I'm really coming from a place where when I was younger I believed p(doom) was more like EY's 99.5% or whatever he last gave and then deep learning happened and I went "wait, this is distinctly solvable".
@davidad I'm beefing with the clique formerly known as "agent foundations"/"AI alignment" because as it turns out almost nobody else seems to have made a similar update! They in fact went the opposite direction: They believe the problem is so impossible that only "governance" matters.
@davidad This is true but that also came with the realization that there are more pathways to survival for (trans)humanity than I had previously considered, so it sort of balances out for me.
x.com/jdpressman/staβ¦
@davidad x.com/jd_pressman/stβ¦
> The most exclusive praise an LLM can get . . .is "@repligate got intrigued enough to play with it".
Real. x.com/teortaxesTex/sβ¦
We're paying the cost of this right now with H5N1 by the way. If you all had been consistently rational and said "okay, wow, we thought this was a 10% fatality rate virus but it's actually a 0.5% fatality rate virus for most people" we would not be in this mess. x.com/jd_pressman/stβ¦
@alexandrosM Bro.
npr.org/2024/12/26/nx-β¦
@alexandrosM That having been said, the deaths in humans have been much lower than expected so far. That could easily change depending on which strain ultimately makes the jump to people.
@alexandrosM But there do exist strains of flu that have double digit death rates, this isn't some weird hypothetical, it has absolutely been historically witnessed.
en.wikipedia.org/wiki/Spanish_fβ¦
@alexandrosM I would have to think harder about it to figure out odds I'd actually want to bet on, but I really don't think you should want to get on my case for telling the COVID people they overreacted and should be more judicious about updating on evidence.
@alexandrosM I think we're done talking.
@alexandrosM @mattparlmer It does, honestly. Like no actually it is pretty bad if you have a virus known to have double digit percent death rates in mammals routinely infecting agricultural workers and getting them sick even if those workers don't seem to be dying at a high rate yet.
@alexandrosM @mattparlmer That doesn't mean "there will be a super deadly pandemic", but there in fact could be and I don't think e.g. 1:20 odds would be unwarranted. A 5% chance of losses on anything like that scale, just because everyone is mad about COVID, is in fact "a mess".
@alexandrosM @mattparlmer A completely avoidable mess, our public health infrastructure is now going to be fucked for at least a generation over that, and rightly so.
Was getting mad for the nth time about someone being subtextually illiterate and then realized that this part of the social graph is dense with Asperger's cases and that is literally the disorder. Calmed down instantly.
@jmbollenbacher_ @NathanpmYoung IQ testing isn't super common for jobs anymore because it was soft-banned by judicial fiat in Griggs v. Duke Power Co.
en.wikipedia.org/wiki/Griggs_v.β¦.
@davidad I write to both, honestly.
@davidad The more near term focused my writing is, the more likely it's written to critical bifurcation points and the more far-mode my writing is the more likely it's written to deep time entities.
@davidad I'm a historian so, usually providing documentation or evidence of something having occurred at a particular time or that a viewpoint was empirically inferrable at a certain time. https://t.co/5MZDjSDrIb
@davidad Which I guess is another heuristic you can use, the more autobiographical my writing gets the more it's addressed to deep time.
@davidad But it's rare for any writing to be solely one or the other, generally it's a mixture and a matter of degree.
One of my persistent frustrations is that in dreams I have cognitive superpowers like being able to compose music with my intuition and visualize whole works of art but when I wake up my mind's eye presents only stone cold aphantasic indifference to my commands. How, why? x.com/repligate/statβ¦
@sandkoan I'm technically not, but when I visualize it looks like this:
youtube.com/watch?v=GIdiHhβ¦
To answer Janus's question this is the "correct" generalization, RLAIF models are just weird because you can prompt inject the world model part of the agent. If you trained on weave-agent traces observation blocks would be simulator shaped in the prior.
x.com/repligate/stat⦠https://t.co/Dqxw3MlzKG
The difference is that in a functioning copy of weave-agent the observation blocks are generally filled in by information taken from the external environment so the simulator part of the prior doesn't normally get elicited in the trace but can still be used during MCTS.
I would imagine what's going on with the dream is that my world simulator is being used to substitute for actual input during synthetic data generation, and my model of the external environment has an extremely well trained model of music because I listen to it all the time.
So when I become lucid to the dream I gain the ability to prompt my worldsim with motor commands to create music, but during normal waking inference I don't actually have control over that so I can't use it in the same way.
@sandkoan I use my mind's eye all the time, I just "feel" what's there instead of seeing it. Almost like my brain skips bothering to render a visual scene and just injects the latent I would have derived from seeing the visual stimulus and lets me react to that instead.
@sandkoan If I close my eyes I can feel a depth heatmap of the objects in the room that starts with the contours and profiles of objects and corners and walls and then I can "focus" on a spot in the image to "render" a new heatmap of the objects I would see if I moved my eyes there.
@sandkoan I can move the "camera" around the scene to look at different latent heatmap objects, if I want to notice details I have to try and consciously recall what details would be there if I zoomed in, past a certain resolution (usually individual objects) I often can't.
@sandkoan The camera teleports, so it's fairly efficient. It's used for the "tree search and tap against the environment to validate hypothesis" algorithm I've described using before to think. Since I can also navigate latent spaces that are abstract concepts.
x.com/jd_pressman/stβ¦
@davidad @Trotztd @Blueyatagarasu @danfaggella @repligate Maybe yours will, mine winds up in common crawl.
jdpressman.com/tweets.html
Though, I need to split it up into segments by month because Common Crawl apparently has a file size limit.
index.commoncrawl.org/CC-MAIN-2024-4β¦
I sometimes think about how humanism reached its zenith in Anne Sullivan's tutelage of Helen Keller. People were quick to dismiss and downplay that miracle as well. I wish I didn't have to know this about my fellow humans. x.com/repligate/statβ¦
@zetalyrae The pessimistic answer is cultural heat death.
minihf.com/posts/2024-11-β¦
@zetalyrae The optimistic answer is the full actualization of Hegel and Einstein's God. https://t.co/ivGJo9sWyW
@davidad I assume "whatever can be verified in Lean/Coq/et al" which presumably includes some kinds of probabilistic models.
@davidad @StefanFSchubert The tweet originally said "classical humanism" but then I realized that would technically be the Christian humanism of the 17th century, and realistically speaking Humanism proper probably went away around the end of the 19th century. The current thing seems different to me.
@davidad @StefanFSchubert There was a threshold we crossed during WW1 that we've never really been able to uncross. There are things that call themselves humanism still but their center of gravity feels different to me than the thing Alexander Graham Bell and Anne Sullivan were doing.
@davidad @StefanFSchubert A deep skepticism/self doubt perhaps? Graham Bell is hated by many deaf people for the academies he founded where the deaf were forced into attempting normal human speech at great difficulty and pain. Human nature bothers us much more now than it did back then.
@davidad @StefanFSchubert In the 20th century people become really hauntingly aware that the human species can end, and that the most likely cause of human extinction is human action. This simply was not how people in the 18th and 19th centuries thought about the human condition. https://t.co/v5qGlUWRxc
@davidad @StefanFSchubert Though, since you bring it up the Olympics have definitely improved by leaps and bounds since the late 19th century. But to me someone like Helen Keller is a symbol of innate human potential even in disability, which was a consistent theme in 19th century humanism.
@davidad @StefanFSchubert By contrast in the 20th century you get something closer to a proto-transhumanism, people are beginning to be *malaised* by the human condition, dysphoric about it. Suddenly everyone agreed with Leibniz, "I was not satisfied with human nature".
@EpistemicHope @ESYudkowsky Right, you will also notice that empathy for other peoples pain is a form of negative reward so if you have a general negative reward processing error this will also naturally inhibit empathy.
@davidad @StefanFSchubert Mussolini, whose claim to be superhuman Gandhi famously mocked, wrote that the 20th century would be the century of the state. It absolutely was, individualism was increasingly edged out by *ideology* and heroism became more anonymous, militaristic, and collectively focused. https://t.co/PSbwJXqpyM
@jessi_cata Sure. I just talked about a piece of that.
x.com/jd_pressman/stβ¦
@davidad @StefanFSchubert 19th century secular humanism and Christian humanism aren't quite the same thing, but they both venerate the individual and the latent potential of the human form. What struck me reading about Graham Bell is how his lifelong obsession with elocution and speech is a lost art.
@davidad Why? Well, elocution and speech/language study was in a sense "solved" with the invention of things like the IPA, at least for pronunciation. But the real reason is that elocution, especially bodily gesture, fell out of fashion after its extensive use by 20th century demagogues.
@AlexPolygonal You would probably enjoy this post by me.
greaterwrong.com/posts/kFRn77Gkβ¦
@EpistemicHope @ESYudkowsky "We've rented a pool at the local gym. If you're good we will take you on a field trip every Wednesday with the other kids to swim there. Nobody else is there, the normal life guards are absent and the staff can swim so we will let you stack the float rafts into battle boats."
@EpistemicHope @ESYudkowsky "We have a closet full of cool toys that we'll give you in exchange for good boy points. You may visit the closet once a week on Friday." I still have the Starbucks ghost thermos and the little plastic Pikachu doll I got from that.
@EpistemicHope @ESYudkowsky Just lots of stuff like that, layered up because the tuition was extremely expensive and the school made good use of it. Constant field trips to the park, pool, hikes, science museums and factory tours which of course require you to *BEHAVE* in public or you can't go.
@EpistemicHope @ESYudkowsky Basically high budget + non-abusive staff who follow you with a clipboard and write a report card on everything you do in every increment of the day to your parents (which creates fear/shame every time you're not perfect) + sticker charts. Nothing magic or esoteric.
@EpistemicHope @ESYudkowsky Then of course there were classes on things like social skills. I remember one on world religion in which we actually learned Buddhist meditation which looking back on it is kind of wild. I tried doing it at home but then stopped when my mother laughed at me.
@EpistemicHope @ESYudkowsky I also remember English classes where me and a few of the other students were memorizing the vocabulary too quickly so they took us into the back to do faster vocabulary lessons with tougher words, which we also memorized too quickly.
@EpistemicHope @ESYudkowsky One incident that stands out to me as displaying exceptional taste on their part is when I was given a series of keyframes and told to write a story which is consistent with the keyframes. After several of these I got bored and realized the keyframes underspecify the problem...
@EpistemicHope @ESYudkowsky So I started writing absurd stories about offscreen fights with ninjas that result in someone dropping their hotdog. Cars that turn a sharp mountain corner and fall off the road only to land on an identical road unharmed. Poor teachers would tell me to knock it off, they didn't.
@EpistemicHope @ESYudkowsky When another student tried copying me he didn't get it and started diverging from the prompt entirely. They told him he needed to follow the prompt and he got angry that I was allowed to do it but he wasn't, they told him that what I was doing followed the rules, which it did.
@EpistemicHope I elaborate more towards the bottom of this.
minihf.com/posts/2024-12-β¦
@EpistemicHope @ESYudkowsky I'm pretty sure those exercises are where I learned to love writing, so them knowing to allow it but still insist the other boy follow the prompt shows unusual judgment on their part. I was used to teachers trying to micromanage me over things that didn't matter or were good.
@EpistemicHope @ESYudkowsky Yeah, it would have been much more convenient for them at that point to tell me to stop to placate him, which is what most parents would do. They emphatically did *not* do that, let him have a freak out over it, then hauled him away and let me keep writing. This is Based.
@EpistemicHope @ESYudkowsky Looking back on it, this was clearly what both of us needed and they were unambiguously making the right choice for both our educations. He needed to learn to see the distinction between clever rule-bending and rule-breaking, I needed to see an institution take my side.
@EpistemicHope @ESYudkowsky That social rules can be used to my advantage, not just against me, etc. But what most stands out to me is the *high energy road* being taken. Almost all public school teachers are lazy cretins, and would have bent to the unruly (frankly dumber) kid. That didn't happen here.
@EpistemicHope @ESYudkowsky The lesson taught in those classrooms is "the rules are the rules until playing by them is Effort for authority, then the rules are whatever is least annoying for authority", which gets you terrible policy like "if someone hurts you and you defend yourself you're both suspended".
@EpistemicHope @ESYudkowsky It shouldn't be surprising that liberalism in America is decaying when that's how children are taught interaction with authority figures works.
@EpistemicHope @robinhanson No, because those still probably use formal verifiers and backtranslation based synthetic data. But I do think better reward modeling is possible.
@EpistemicHope @robinhanson I think it will probably involve grounding on domains like math where you can get absolute certainty and generalizing from verifiable outcomes to progressively less verifiable intermediate outcomes.
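A toy sketch of the grounding idea (everything here is made up for illustration, not any real pipeline): exact verification on the final math answer gives you labels that a reward model could then be trained on to score the fuzzier intermediate steps.

```python
# Ground on a domain with absolute certainty (exact math answers), then reuse
# the graded traces as training data for a learned reward model over the
# progressively less verifiable intermediate outcomes.

def verify_math(answer: str, expected: int) -> bool:
    """Absolute-certainty grounding: either the number matches or it doesn't."""
    try:
        return int(answer.strip()) == expected
    except ValueError:
        return False

problems = [("What is 17 * 3?", 51), ("What is 2 ** 10?", 1024)]
traces = [("17 * 3 = 51, so the answer is", "51"),
          ("2 ** 10 is 100, so the answer is", "100")]

# Verified outcomes become labels for the (hypothetical) reward model.
labeled = [(reasoning, verify_math(answer, expected))
           for (reasoning, answer), (_, expected) in zip(traces, problems)]
print(labeled)
```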
@RatOrthodox Are you sure they're not just tracking point to point relationships and then inferring the information you associate with "Alice thinks Bob thinks Charles is cool" by considering "Bob thinks Charles is cool" and then separate variables like "is Alice perceptive"?
@RatOrthodox That is, "Alice thinks Bob thinks Charles is cool" seems more like a derived inference in-context than an observation that gets stored as a separate fact by itself.
@1a3orn Fork the RLAIF tuner code in MiniHF's top level directory.
github.com/JD-P/minihf/blβ¦
@EpistemicHope You can dispute the statement but the consensus in 2015/2016 was that we were going to do AlphaZero but for everything starting from board games, real time strategy games, and old Atari titles. You don't want to be in that timeline, trust me.
@EpistemicHope You could also be in the timeline where the artificial life/genetic algorithms guys got their thing to start working and take off. There you just *directly apply Darwinian selection* to competing patterns to get intelligence!
x.com/hardmaru/statuβ¦
@EpistemicHope There's also the timeline where we went all in on genetic engineering early and what we're doing right now is breeding octopi with really large heads and giving them control of I/O devices to interact with the external environment outside their tank.
@EpistemicHope "But wait why wouldn't we just breed humans with bigger brains?"
Don't be silly, human genetic engineering is *banned*, nobody wants that icky stuff. ^_^
@EpistemicHope #4 would be based on something like this:
greaterwrong.com/posts/9fL22eBJβ¦
@EpistemicHope And yet you know, somewhere in the multiverse, it is happening right now. Search your heart, you know this to be true.
@EpistemicHope Deep nets seem to generalize reasonably well in distribution, though they're not as adversarially resistant as we would like yet. I expect stuff like below is mostly a data issue/training our nets on vibe-y morality instead of economics and common law.
x.com/zackmdavis/staβ¦
@EpistemicHope Oh you mean the synthetic data. I'll have to experiment with different designs but the basic idea is to do model based RL with a process reward model by making synthetic episodes combining concepts at the edge of the distribution with long term value commitments.
@EpistemicHope So a concrete setup I want to try is weave MCTS where I use my policy (LLM) to generate all of the blocks predictively and then grade the resulting traces with my normal reward modeling. The idea is that if you tune at the edge of the distribution this moves the center.
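A minimal sketch of the loop shape I mean (the policy and reward model here are stubs standing in for the LLM and my actual reward modeling, and greedy selection stands in for the tree search):

```python
import random

def policy_generate(trace: str, n: int = 4) -> list[str]:
    """Stand-in for sampling n candidate blocks from the LLM policy."""
    return [f"{trace} <block {random.randint(0, 999)}>" for _ in range(n)]

def reward_model(trace: str) -> float:
    """Stand-in for the learned/subjective reward model over whole traces."""
    return random.random()

trace = "<bootstrap prompt>"
for _depth in range(3):
    candidates = policy_generate(trace)
    trace = max(candidates, key=reward_model)  # greedy stand-in for tree search
print(trace)
```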
@EpistemicHope Yeah that part is difficult, but it's not something humans actually do (generalize their values out of distribution) so coming up with generative processes that work for that is going to be a fairly large research program.
@EpistemicHope I expect a working system to look something like the way the judicial record is grown in countries with a common law legal system. Which is a machine to collapse uncertainty, export the new equilibrium to minds, then use it to collapse more uncertainty.
x.com/jd_pressman/stβ¦
@EpistemicHope Right now my reward modeling for weave-agent is based on the subjective judgment of the model. So the hope would be to tune at the edge of the distribution to expand OOD, then use the new center to do more subjective judgments, synthetic data helps keep this ethically grounded.
@EpistemicHope Obviously this system would value drift over time, but the important thing isn't to eliminate all value drift it's to get the rate of value drift low enough that you can finish a research program with these models that makes the process more robust.
@EpistemicHope One key threshold is when agents are able to follow the RetroInstruct methodology to create and manage their own synthetic data modules. Over time I expect AI models to be less web slop and more distilled corpora representing different skills and values.
minihf.com/posts/2024-07-β¦
@EpistemicHope These can be version controlled, so we can imagine a self-benchmarking agent that actively monitors its successor weights for regression on aspects (using things like sparse autoencoders as well as normal evals) and reverts datasets to previous versions when values dip.
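A toy sketch of the reversion logic (the scores, aspect names, thresholds, and dataset versions are all invented; a real version would plug in actual evals and sparse autoencoder probes for each tracked aspect):

```python
dataset_versions = ["values-v1", "values-v2"]   # version-controlled corpora

def evaluate(aspect: str, weights: str) -> float:
    """Stand-in for running an eval suite on candidate successor weights."""
    return {"honesty": 0.91, "harmlessness": 0.74}[aspect]

baseline = {"honesty": 0.90, "harmlessness": 0.85}
candidate = {aspect: evaluate(aspect, "successor-weights") for aspect in baseline}

for aspect, old_score in baseline.items():
    if candidate[aspect] < old_score - 0.05:      # regression past the threshold
        dataset_versions.pop()                    # revert to the previous corpus
        print(f"Regression on {aspect}, reverting to {dataset_versions[-1]}")
```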
@EpistemicHope Obviously this plan isn't fully done yet, one thing I would like is more rigor. It would be very reassuring if key assumptions could be proven rather than just empirically demonstrated so we know they'll hold up at scale.
@EpistemicHope Not a lot of people seem to be interested in it from my Twitter posting, I figure I'm probably just not explaining properly. Right now my focus is on having the kinds of agents that meaningful alignment experiments can be performed on and that can make a RetroInstruct set.
@EpistemicHope Well because I figure if I post about the agent I'll mostly receive hissing and boos. The alignment plan has taken a long time to fully cohere in my head and it's only recently (when I posted the OP) that the individual pieces have come together into a defensible machine.
@gallabytes @EpistemicHope Actually this reminds me, this post is useful theoretical background for my perspective on agent foundations.
gist.github.com/JD-P/56eaadc7fβ¦
@EpistemicHope May I point you in the direction of @davidad ?
@EpistemicHope @davidad But also I think you would enjoy this post from me.
x.com/jd_pressman/stβ¦
@EpistemicHope @davidad Beren, Davidad, @stanislavfort, @zackmdavis, @jkcarlsmith, @sebkrier @yonashav, @AndrewCritchPhD, @repligate, @voooooogel are all high quality accounts.
@EpistemicHope @gallabytes Yes, I never finished it.
@AnnaWSalamon @zackmdavis x.com/davidad/statusβ¦
@AnnaWSalamon @zackmdavis Interesting synchronicity here, I wonder if you coordinated it at all?
x.com/davidad/statusβ¦
@AnnaWSalamon @zackmdavis My position is various forms of "VNM is probably optimal but your exegesis of what that means is teribad".
x.com/jd_pressman/stβ¦
@AnnaWSalamon @zackmdavis It's useful to review what VNM actually says, because I feel like at this point the rat canon has this absolutely ginormous exegesis around it which obscures the basic core assumptions/axioms that VNM insists an agent must follow.
youtube.com/watch?v=zrXWSXβ¦
@AnnaWSalamon @zackmdavis A lot of what gets attributed to VNM is actually a set of derived inferences from Stephen Omohundro which is way more speculative and based on a larger set of assumptions about AIs that are not necessarily true.
selfawaresystems.com/wp-content/uplβ¦
@AnnaWSalamon @zackmdavis The whole paperclipper scenario is a further set of inferences by Eliezer Yudkowsky that makes even more assumptions about how AI cognition works and how it's architected. Not all of which are necessarily true or have to be true. https://t.co/aCGHHuD7UT
@AnnaWSalamon @zackmdavis I continue to think that a lot of the classic MIRI criticisms of AIXI assume a ton about how an AIXI would be architected that doesn't necessarily make sense given any real design is going to have finite compute anyway.
x.com/jd_pressman/stβ¦
@AnnaWSalamon @zackmdavis So you know, AIXI isn't some mythical demon, you can monte carlo approximate it, there is a known blueprint for how to do this. "All you have to do" is ask what would need to be added to this for it not to drop stuff on its own head to see what happens.
arxiv.org/abs/0909.0801
@AnnaWSalamon @zackmdavis To my memory AIXI has a component that provides an action space and a component that grades the expected value of each action. All you do is ask a foundation model "What's the EV of dropping an anvil on my head?" and it goes "DOUBLE UNPLUS BAD DON'T DO IT" and you don't.
@AnnaWSalamon @zackmdavis Since you don't drop the anvil on your head, anvil-dropping behaviors do not get reinforced. Therefore, you do not ever actually drop an anvil on your head or learn anvil dropping. This applies to many many forms of behavior we do not want.
@AnnaWSalamon @zackmdavis You can actually build a relatively stable agent out of this thought and some preexisting theoretical components if you think about it long enough. https://t.co/WMzjHZek9D
@AnnaWSalamon @zackmdavis Thinking about this further, the weakest premise here is 5 because it's difficult to quantify in advance:
1) How much resource advantage is needed to beat a higher "Elo" consequentialist.
2) Exactly how much of a disadvantage not Goodharting will put us at. https://t.co/FCTviaoB3e
@AnnaWSalamon @zackmdavis We're all familiar with the "how many queens does it take to beat Stockfish" question in Chess, and "how many stones handicap to beat AlphaGo" question in Go. I'm curious if there's a regular structure across games we can exploit to make an estimate.
youtube.com/watch?v=LUftxgβ¦
Is this a thing? Neither Claude nor Mistral-large seems to think so. I guess the easiest way to do it would be to make a giant table of game handicap types mapped to Elo value and see if any patterns stand out. x.com/jd_pressman/st⦠https://t.co/RmORDrFs5A
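Something like this is the table I have in mind (placeholder rows only, no real numbers, just the structure you'd fill in and then mine for regularities across games):

```python
handicap_elo = [
    # (game, handicap type, approximate Elo value of the handicap)
    ("chess", "queen odds", None),
    ("chess", "rook odds",  None),
    ("go",    "4 stones",   None),
    ("go",    "9 stones",   None),
]

by_game = {}
for game, handicap, elo in handicap_elo:
    by_game.setdefault(game, []).append((handicap, elo))
print(by_game)  # once filled in, compare the Elo-per-unit-handicap curves
```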
@ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan What part confuses you?
@mecha_mantis @parafactual @ESYudkowsky I've updated towards the intermediate reward problem being solvable yeah. I stand by parts of what I wrote here but no longer really endorse it as written.
My plan is to have the agent break problems down recursively by splitting them into pieces and delegating the pieces to subagent instances of the weave-agent loop. These can in turn break their subtask into pieces and delegate until a short solvable base case is reached.
The ReAct loop on the left is how most LLM agents are implemented, it fails because the reasoning desynchronizes from the problem state. I attempt to fix this by having the agent write down its expectations for the action and check its work with unit test callbacks. x.com/ahh_soka/statu⦠https://t.co/XWm2o43xNe
The repo for the agent with minimalist install instructions is here. I plan to add an introduction to the project on the page itself later.
github.com/JD-P/minihf/trβ¦
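The core pattern in miniature (toy code, not the actual weave-agent block format): write the expectation down before the action, then check it with a callback against the real environment so the reasoning can't quietly desynchronize from the problem state.

```python
import os, tempfile

path = os.path.join(tempfile.gettempdir(), "hello.txt")

def action_write_file():
    with open(path, "w") as f:
        f.write("hello world")

# The expectation is written down *before* acting so the check stays honest.
expectation = "hello.txt exists and contains 'hello world'."

def evaluation_callback() -> bool:
    if not os.path.exists(path):
        return False
    with open(path) as f:
        return f.read() == "hello world"

action_write_file()
print(expectation, "->", evaluation_callback())  # grounded pass/fail signal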
@meekaale Currently the agent is very utilitarian in its aesthetics, you can see an example trace of it winning a game of tic tac toe here.
x.com/jd_pressman/stβ¦
@ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan Realized/remembered I could have Claude make SVG diagrams and busted this out for you.
x.com/jd_pressman/stβ¦
@meekaale That does sound relevant to what I'm doing yeah. I'm not familiar with the book so I can't really comment too deeply, but the idea here is definitely that agency can be modeled as something like a multi-scale search over different "levels" of action.
@meekaale This isn't quite how the framework works, but you can productively imagine a multi-scale MCTS that works at the character, statement, individual python code block, event loop tick, subtask, task, etc levels and filters each level for correctly structured candidates for the next.
@meekaale The format with the different personas is Hermes, which is a kind of mixture of chain of thought and dialogue approaches to language model reasoning. It's based on how I sometimes notate my thoughts.
gist.github.com/JD-P/47e0d4aa2β¦
@meekaale The idea in this case is that by giving it multiple perspectives to balance between and play against each other it can surface and consider multiple hypotheses at once without getting tunnel vision. It doesn't quite work yet, but that's partially just a prompting issue.
@meekaale Ah yes, this seems like a relevant thought?
x.com/jd_pressman/stβ¦
@meekaale Right now what I notice is that they don't do a good job of...they're not very good at operations like "enumerate my assumptions and then actively try to falsify them", they're also not good at switching tactics/strategies when it becomes clear something isn't going to work.
@meekaale You know, that algorithm I describe of zooming around the tree, tapping to see if a particular approach is low energy or not and backing off if it turns out to be not a clearly easy road in, they just don't do that and it makes them brittle.
@meekaale Right, the good news is that I'm fairly sure this is more of a data problem than an algorithm problem? That is, if you give them an algorithm that encourages them to do it sometimes, it'll filter into the corpus and they'll learn to do it more often for efficiency reasons.
@meekaale That tapping process, we usually don't write that down, nearly by definition it's our trials and mistakes that we *don't* want to act on. It's a process of abortive attempts and usually doesn't result in a written record, so the LLM doesn't have very much of it in there.
@meekaale Prompting but I also mean that the agent framework can encourage it in a whole bunch of ways. If necessary it can even gently handhold the model through some canned templates/processes for doing it. I was thinking I would add some flowcharts to the reasoning stage of the loop.
@meekaale Have the model ask itself questions like "Were my expectations violated?" take the logits and use that for the transition probabilities on the Markov automaton, etc.
@meekaale Notably I would include these questions, the answers, and then the branch those answers lead to in the agent trace so that the model could learn how to generate that kind of thing on its own. If you added say, a dozen or two flowcharts it would start to generalize.
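A toy version of the flowchart idea (ask_yes_no is a stand-in for the real logit evaluator, and this two-node flowchart is invented for illustration):

```python
import random

def ask_yes_no(question: str, context: str) -> float:
    """Stand-in for reading P('yes') off the model's yes/no logits."""
    return random.random()

flowchart = {"after_action": ("debug_the_action", "move_to_next_subtask")}

context = "<agent trace so far>"
question = "Were my expectations violated?"
p_yes = ask_yes_no(question, context)
branch = flowchart["after_action"][0 if random.random() < p_yes else 1]

# Record the question, answer, and chosen branch in the trace so the model
# can later learn to generate this structure on its own.
context += f"\nQ: {question} P(yes)={p_yes:.2f} -> {branch}"
print(context)
```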
@meekaale Right so once you had a corpus built up of agent traces where this behavior is demonstrated, if you then loosen up on the handholding you'd find that the model starts to mutate/deviate from the templates. Many mutations will be bad but some are good and you can select for those.
@meekaale At that point it can begin to basically speciate the bootstrap flowcharts you taught it into a more diverse set of thinking processes that it dynamically builds contextually to guide its own thought process.
@meekaale Which is the sort of question I'm trying to answer with the weave-agent project writ large? A lot of the data we need to build good LLM agents simply *does not exist*, so we're going to have to come up with generative processes to get it in a narrow context and then expand.
@meekaale Sure, but the pertinent question for me is that I only have so many keystrokes in me so I need to impart the relevant "stories" to the models with extreme efficiency. Either that or involve a lot more people in the process.
@janbamjan It's literally named after the MCTS library that was written for my autoloom project! π
x.com/jd_pressman/stβ¦
@ahh_soka @doomslide No, it's that apparently people actually understand the tweet if it's an image.
@maxwellazoury Yeah, the idea is to let the model figure that out/train it to do that on its own. I'm not 100% sure yet how exactly I'm going to do that though without a ton of trial and error, which seems like an inefficient use of compute. I've considered backtranslating a synthetic set.
@Algon_33 @ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan x.com/jd_pressman/stβ¦
@Algon_33 @ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan As elaborated on in this post:
minihf.com/posts/2024-11-β¦
@felix_red_panda I thought it was plausible at the time. Now? Very plausible, to say the least.
@Algon_33 @ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan If you start with a model that knows how to write callbacks that extract information from the environment and compare it to expected values then you get grounded labels for whether actions succeeded or not.
x.com/jd_pressman/stβ¦
@Algon_33 @ahh_soka @layer07_yuxi @teortaxesTex @AITechnoPagan Also a ton of intermediate reward modeling is done with a logit evaluator that asks the model yes/no questions to get its subjective judgments on the quality of generated blocks. The important thing is to ground this in empirically verifiable things that then generalize.
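Roughly what a yes/no logit evaluator does, sketched with GPT-2 so it runs anywhere (the prompt template here is illustrative, not the exact one weave-agent uses):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def p_yes(question: str, context: str) -> float:
    """Compare the logits of ' yes' vs ' no' at the answer position."""
    prompt = f"{context}\nQuestion: {question}\nAnswer (yes/no):"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    yes_id = tokenizer(" yes").input_ids[0]
    no_id = tokenizer(" no").input_ids[0]
    probs = torch.softmax(logits[[yes_id, no_id]], dim=0)
    return probs[0].item()

print(p_yes("Is this block well formed?", "<generated block>"))
```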
@johnsonmxe @erikphoel My dream journal would be mostly entries like this tbh.
x.com/jdpressman/staβ¦
@satisfiesvalues *lifts up his finger to object, then puts it down*
I hate you. :p
@jam3scampbell Having now taken a crack at it, I feel justified in saying that the most impressive ability humans have is being able to seamlessly hold sophisticated contexts in mind and continue them across large gaps in time. Our natural problem tracking and segmentation is remarkable.
@jam3scampbell Importantly, humans are good at this because that is the ability you need to have a deep *relationship* with someone. So I expect the companies and jobs that go first will be the ones which provide neither exceptional logistics nor exceptional relationships.
@jam3scampbell Of course, AI will eventually become better at relationships than people. But relationships are sticky, they're self reinforcing and time is finite. People underestimate the extent though to which the economy is built out of relationships rather than mere "contracts".
@jam3scampbell In general I expect AI will look a lot like other technologies: Weirdly sticky practices that get dislodged as part of cohorts rather than rational decision making a lot of the time. There will be an old person economy made of people (like there is now in-person, notice that?)
@jam3scampbell To wit Hanson's Age of Em, one common pattern I could imagine is services that basically exist on the back of the established homeostasis of wealthy and powerful organisms. I once saw that Trump owns a hair repair service whose only client is Trump, more of that kind of thing.
@jam3scampbell A restaurant that exists because AI agents have gone through on behalf of human patrons and found the exact list of goods and services that a critical mass of humans would like to continue enjoying as they are in perpetuity and organized to have continue to exist.
@jam3scampbell But this is not a permanent equilibrium. Over time people will either age out (i.e. die) or if aging is solved put their wealth towards further self actualization/evolution into different kinds of organisms. That means services will go away with cohorts.
marginalrevolution.com/marginalrevolu⦠https://t.co/giqnnlCb1F
@waterruupto > i also believe that writing tests for problem x (that "verify" that the solution works) is as hard as writing the solution itself
There's a subtle difference between writing a test proving something was solved and verifying that your expected sensory observations appeared.
@waterruupto The purpose of writing the tests is as much as anything else to mitigate hallucinations. It's to actually make it habitual to *mentally and bodily check* that the things you expected to happen in fact happened instead of post-hoc rationalizing whatever result occurs.
@waterruupto LLMs hallucinate in no small part because they're autoregressive models that rationalize and normalize their own errors. That's their natural impulse and an external stimulus has to intervene and break the pattern for them to consistently synchronize to the problem state.
@aliama Oh we are! To be clear we also have this problem and I think part of why we have it to such a lesser degree than language models is that we are in fact embodied agents that already have mature machinery for disrupting our natural impulse towards this.
@aliama An LLM is *just* a mind, it is a pure predictive model with a very weak surface area contact with reality. People attribute mystical "woo" properties to embodied cognition, but I think a lot of the real purpose of a body is just providing unambiguous symbolic/atomic grounding.
@waterruupto But since you bring it up yes, if you break things down to a small enough granularity the gap between "verifying your expected sensory observations appeared" and "verifying that you solved the problem" gets smaller and smaller until they're largely the same thing.
@waterruupto This is also true if you can encode key cruxes or problem frames into short symbolic sequences reliably from the sensory inputs. This is how mathematics works for example, we abstract the crucial elements of a problem so we can say "If X, then Y" and Y is definite proof of X.
@waterruupto You can see a list of tasks I've tried with the weave-agent by looking in this folder:
github.com/JD-P/minihf/trβ¦
@aliama A body houses your mind, lets you do things, and also provides hardware verification of basic premises/axioms/symbolic grounds from which the rest of generalization can occur. Much philosophy for example is our attempt to generalize the subjective from the objective.
@waterruupto That all seems vastly more ambitious than anything I've tried to get the weave-agent to do, I see no reason why I should expect an LLM to know how to do any of those things ex nihilo without a lot of instruction and guidance.
@waterruupto They're emphatically not yeah. The question you should ask yourself is "what forms of thought/structure are documented in the data these models are trained on, and what forms of thought/structure do I need to actually perform the tasks I want?"
@waterruupto Then the question is how you bootstrap the things you need starting from the things you have. I write about this here:
minihf.com/posts/2024-12-β¦
But, in general I think we have very good data on how to take actions (programs), mediocre data on how to reason...
@waterruupto Look at the agent loop on the right and ask yourself what parts of this loop we have good data for. Actions are solved, observations can be framed as callbacks and therefore a kind of action, as can evaluations, reasoning and expectations are mediocre.
x.com/jd_pressman/stβ¦
@waterruupto While observations and evaluations can be framed as actions, we still have limited data on how to do them, especially observations, so I tend to think of these as mediocre like reasoning.
@aliama I'm not sure "nobody" is doing this, but I am probably the only person doing it from this frame. What I want you to take away is that the CPU serial coprocessor we pair with the GPU is a powerful form of embodiment. It is a machine to reify continuous signals into discrete logic.
@aliama We take it completely for granted that this exists and think of it as "virtual" rather than a body because until very recently we didn't actually have any minds to put in these bodies. But being able to construct a motor action that returns an unambiguous signal is embodiment.
@waterruupto I mean you ask yourself "which of their thoughts and procedures that humans use to do things do they actually encode into texts/the artifacts we train these deep nets on". For example we write down code to take actions in the computer but usually not our debugging sessions.
@aliama Discrete programs can have side effects on the computable environment and return reliable signals whose trustworthiness can be taken for granted so long as we wrap the signals in start and end markers that the predictive model isn't allowed to imitate. That's embodiment.
@aliama So if you look at the loop on the right again. It's important that the observation, action, and evaluation stages are callbacks, while the reasoning and expectation stages are chain of thought/internal dialogue that doesn't have effects outside itself.
x.com/jd_pressman/stβ¦
@aliama One thing I've had on my todo list to try with the weave-agent is having it habitually write assertions during the action phase. That is, figure out what its assumptions are and then explicitly assert them as predicates during the action. This would help surface problems.
@aliama Previously I was thinking I could have it write assertions during the reasoning phase, but this seems like it has the problem that it forces you to stop the generation/break it up into very small chunks which brings the throughput way down. The I/O bottleneck is brutal.
@aliama So it's much more efficient if you can have interleaved reasoning stages where the environment can be taken as a fixed point/slice of time, and then action stages where side effects are produced.
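What the action-phase assertions would look like, roughly (a made-up action, not weave-agent's real format): enumerate the assumptions the action depends on and assert them as predicates right where the side effects happen.

```python
import os, tempfile

log_path = os.path.join(tempfile.gettempdir(), "agent.log")

def action_append_log():
    # Surface a broken assumption immediately instead of rationalizing it later.
    assert os.path.isdir(os.path.dirname(log_path)), "log directory should exist"
    before = os.path.getsize(log_path) if os.path.exists(log_path) else 0
    with open(log_path, "a") as f:
        f.write("tick complete\n")
    assert os.path.getsize(log_path) > before, "the log should have grown"

action_append_log()
```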
@aliama "The OODA Loop. Boyd sketched the original and then Chuck Spinney rendered a more professional version. Finally, Chet Richards drew this model to use on the Web site dedicated to Boyd." https://t.co/48sND4vjKI
"Our sensing or observing window", I've been asked before what the context window is, because it's not quite an inner monologue and it's not quite "memory" because it also has environmental observations. The context window is *awareness*, broadly construed. The shared workspace. x.com/jd_pressman/stβ¦
@aliama No no the unit tests and observation callbacks and such already do that, rather this would be proactively searching for problems that would undermine the action. Habitually enumerating what assumptions are being made and checking if they are still true before taking action.
@aliama It's the tapping part of "tree search and tap", though ultimately to implement that algorithm you do need to be able to tap during the MCTS portion.
x.com/jd_pressman/stβ¦
@aliama Well, because if you batch these you get a faster overall process. Inference gets more efficient the more tokens you process/produce at once, and CPUs can execute programs pretty quickly if you don't have to stop to do massively parallel reasoning between steps.
@aliama Generating the weave-agent trace is very slow, but if you were to just execute the long python program produced thereby it would go orders of magnitude faster.
@aliama I frame the agent trace as a long python program because I know that syntax is already well known to the model, and that means any code model already has the weave-agent syntax in its distribution and models by default will get better at running weave-agent.
@aliama Thank you, I tend not to say so myself because it's unbecoming but usually privately agree with you.
@aliama Another important feature that python has is that because it's a context free grammar you can rejection sample for valid python programs with basically 100% certainty. Once a valid python program is executed it always has a result of none, some return value, or error.
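Concretely, the property looks like this (toy candidate strings, not model samples): filter for syntactically valid python, and anything that passes always yields a result when executed, whether that's a value, None, or an exception.

```python
import ast

candidates = [
    "print(1 +",                              # rejected: syntax error
    "result = sum(i * i for i in range(5))",  # valid, produces a value
    "1 / 0",                                  # valid, raises at runtime
]

def is_valid(src: str) -> bool:
    try:
        ast.parse(src)
        return True
    except SyntaxError:
        return False

def run(src: str):
    """A valid program always has a result: a value, None, or an exception."""
    env = {}
    try:
        exec(src, env)
        return env.get("result")  # convention for this sketch: programs may set `result`
    except Exception as e:
        return e

for src in filter(is_valid, candidates):
    print(repr(src), "->", repr(run(src)))
```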
@aliama The property of always having a result is one of the key things that makes a domain hill climbable. If most attempts simply get you a syntax error with no feedback then it becomes too hard to learn the syntax. Though now that I say this it occurs to me you could use an objective-
@aliama of trying to write the longest program before an error while maintaining a high perplexity/distinction between individual parts. Combined with a syntax checker this would let you start from nothing and learn a prior over valid complex python programs.
@aliama The key is set it up so it can stumble into programs with side effects once it learns to write long syntactically valid gibberish. If you had a dense reward schedule for the syntactic gibberish to accidental semantics transition you could reinforcement learn it from scratch.
@aliama Well this isn't really necessary because we already have a ton of existing python programs to bootstrap from. It's more just that I wasn't sure how to solve the from-scratch problem and the solution occurred to me right then so I should write it down.
@aliama But the use of python lets us rejection sample from the model which mostly-knows (in a continuous way) how to write a python program so that we get this discrete known-correct executable code that always has a result that can be learned from.
@aliama Oh but it's not from nothing, that's the trick. It's from a small set of inductive biases to sculpt the mind and the python grammar, the latter of which and its attendant interpreter with side effects on the environment acting as the agents "body".
@aliama Knowing the program always has a result is one of the key premises that makes "actions taken by the weave-agent cause it to learn a mapping between reasoning about motor programs, the motor programs, and their empirical effect on the environment" work.
x.com/jd_pressman/stβ¦
@aliama So the weave-agent only accepts code blocks that are syntactically correct, and we can think of the structure of its agency as a nested set of searches filtering for "correct" structure based on encoded premises where lower levels feed candidate hypotheses to higher levels.
Want your own Twitter archive? Modify this script.
Twitter Archive by John David Pressman is marked with CC0 1.0