@doomslide I figure I will do something like this once I've got the python version working well.
@doomslide Well part of the idea is that once you have the python version working well enough it can translate some existing python libraries into OCaml for you.
@RichardMCNgo x.com/ObserverSuns/sβ¦
@cremieuxrecueil That's not really what he's saying. He's saying "I thought those people were dumb because I am smarter and more educated than them, but it turned out I'm dumb, they didn't have access to better information than me I just updated less competently than they did."
@cremieuxrecueil "There wasn't any tip or piece of information floating in their circles I didn't have, we saw the same things and I made less intelligent predictions than they did and that's on me."
Is what he's trying to say, he's using phrases that sound bad if you skim but he meant this.
@liron This concept is normally called active inference or the free energy principle. It even uses the same thermodynamics metaphor.
@liron Here's an example of me using it to discuss a similar kind of concern about AI X-Risk. Maybe there should be a new word for the specific concept of "AI X-Risk from active inference" but we don't need new jargon for the underlying concept itself.
x.com/jd_pressman/stβ¦
@manic_pixie_agi It's kind of cringe to me now and I should really give it an expanded 2nd edition with updated entries. I could do sooo much better on so many now post deep learning.
@manic_pixie_agi I mean just look at that entry for "Alignment", that's embarrassing! But a general problem Liber Augmen has is that like, many words and phrases have many senses and I could write different entries about the different senses but I'm not sure how to notate this. Use parentheses? https://t.co/ntQkkc2yWg
@manic_pixie_agi There's a repo for it yeah, my GitLab is currently set to private but I should un-private it.
gitlab.com/JD-P/liber-augβ¦
@liron > The concept of optimizing the future is separate from any term about the agent that does the optimizing.
"Optimizing the future" is an abstraction of an optimizing agent optimizing the environment and a fungible tradeoff between modeling capacity and action is always implied.
@liron The future *exists*, outside the agent, and the idea of being able to look at thermodynamic equilibrium between an optimizer's modeling capacity and the environment it optimizes already exists and is usually referred to as "active inference" or the "free energy principle".
@liron Basically you're interpolating Omohundro's AI drives and active inference/free energy principle and insisting this constitutes a new set of ideas, which it doesn't.
gwern.net/doc/ai/2008-omβ¦
@Rationaliber @cremieuxrecueil x.com/moshik_temkin/β¦
@repligate @ahron_maline The AI is named Claude (Shannon), so maybe (Alan) Turing is located close by and it's writing a kind of Turing/Shannon slashfic?
@liron MIRI called this "agent foundations" (better name) and did a lot of research on it.
arbital.greaterwrong.com/explore/ai_aliβ¦
@liron Plagiarizing/renaming agent foundations isn't actually going to get people to take it more seriously, especially if you try to rename it to something with "Dianetics" tier phonetics. The kind of "agent" that agent foundations models is AIXI-like agents.
arxiv.org/abs/0909.0801
@liron So maybe just promote one of the relevant concepts to a 1st party part of the phrase instead of "agent"? But honestly I think agent foundations is named just fine, it doesn't get attention because the vast majority of researchers view it as pure esoterica.
@liron I agree with you that most people's AI projections are kind of garbage because they do not use agent foundations type heuristics and the frame they use instead is usually much worse. But the way to fix that is to name the individual heuristics/problems and criticize their neglect.
@liron You have a good frame for prosecuting arguments about AI trajectories so *use it*, break down some high profile prediction(s) or plans and point out how they're just trivially refuted by ignorance of basic heuristics and models from agent foundations. Make others refute you.
@liron That is, normal researchers don't know anything about the agent foundations frame because nothing in their environment forces them to actually know anything about it. Renaming the subject doesn't fix that.
x.com/jd_pressman/stβ¦
@liron But if you can get them into a fight with you about it, then they at least need to go learn enough about the subject to try and refute what you're saying. As it stands, they can just kind of skate by in interviews with dumb answers about alignment.
gist.github.com/JD-P/56eaadc7fβ¦
@liron You say "oh come on my podcast and debate me", but really you need to be finding ways to inconvenience others with your arguments. The world isn't going to come to you, you have to come to it.
@liron I don't have the linguistics expertise to explain why intellidynamics is phonetically un-ergonomic but I feel like if you stare at the phonetic breakdown for a bit you'll notice that e.g. it involves a soft e going into a hard d.
in-tel-lee-dye-nam-ics
in-tel-ig-ence
@liron Well, you could ask "How do I force Shane Legg to actually address alignment from the agent foundations frame instead of just answering a different question?"
x.com/liron/status/1β¦
@liron How exactly do you do that? Use your general intelligence, and keep in mind it's not a problem constrained to an interview or whatever. "Who could force Shane Legg to do this?" "Could Legg be confronted somehow?", etc etc etc.
@liron I would also point out that I don't think agent foundations is a substitute for knowing things about deep learning. Going "oh but I can know from first principles" is not credible, what's credible is going "in the limit X, Y, Z happen in this design".
x.com/jd_pressman/stβ¦
@liron "Most designs, by default, have these failure modes in the limit, so the burden of proof is on you that your design doesn't have it, can you please explain why you think it doesn't?"
Sat down and documented how to write a task bootstrap file for weave-agent so you can try to make it do things. Link below. x.com/jd_pressman/st⦠https://t.co/bHiNua8HZ6
How To Write A Bootstrap File
minihf.com/posts/2024-12-β¦
@AmandaAskell I wish Claude was better at following along with my "I'm having a schizothought here and want to know if this is anything, please denoise this and fit it into a coherent existing literature if one exists" type queries. Claude already does it better than others but I want more.
@AmandaAskell Oh and I'm going to second "tone down the flattery", please please try to make it calibrate its praise to when I am and am not actually worthy of praise. "Your idea is fascinating!" loses its good feels if it'll hand out that cookie for anything.
@AmandaAskell I know this might sound contradictory to the first thing, but it would also be really useful if Claude could be trusted to disagree with me when disagreement would be really useful. Like, it should try to make my ideas more rigorous not just play along with my bullshit.
@AmandaAskell For example, I was in a group chat arguing with someone about procedural generation and they pointed out that Wave Function Collapse probably does not actually have fractal structure! When I talk about this with Claude it should notice and say that.
x.com/jd_pressman/stβ¦
@AmandaAskell In general, Claude often feels like it follows something akin to Wikipedia's "no original research" rule, but when I'm asking it some esoteric question about things that don't exist yet I *want* productive "hallucination", that's what generalization is.
x.com/jd_pressman/stβ¦
@AmandaAskell In terms of how to accomplish this, I would try something like taking known math problems and engineering problems with complex structures, then generating a weird humanities inflected description of them and backtranslating.
minihf.com/posts/2024-07-β¦
@AmandaAskell Even better would be if you could synthesize multiple different rigorous subjects into a composite object, then generate a slightly confused humanities inflected description of *that*, then backtranslate to turn it into the rigorous math object.
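A minimal sketch of that backtranslation loop, assuming a hypothetical `llm(prompt) -> str` callable standing in for whatever completion client is actually used:

```python
# Sketch only: `llm` is a placeholder for any text completion callable.
def make_backtranslation_pair(rigorous_problem: str, llm) -> dict:
    """Turn a known-correct formal problem into a (fuzzy prose, formal target) training pair."""
    fuzzy = llm(
        "Rewrite the following problem as a loose, humanities-inflected description "
        "while preserving its underlying structure:\n\n" + rigorous_problem
    )
    # The original problem is the ground-truth label, so mapping the fuzzy
    # description *back* to rigor needs no separate grading model.
    return {"prompt": fuzzy, "completion": rigorous_problem}
```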
@soundrotator May I recommend the life of Anne Sullivan Macy? Her biography is an endless parade through hell, a nightmarish fever dream of the worst aspects of the 19th century.
ia601602.us.archive.org/4/items/in.ernβ¦
She taught Helen Keller to read.
@soundrotator Part of what I think makes it especially striking is it doesn't take place in some faraway land. It isn't the story of some African princess, or a place where we've grown accustomed to poverty and horror. It takes place in America, in Massachusetts.
@soundrotator It reminds the reader that America was not always disease free, that it too was once ravaged by obscure illnesses, that it too was once a place that hosted bitter poverty.
@soundrotator More important than making it visceral what those things are like, it makes it visceral that America was not always the place it is now. That the world can *change*.
@teortaxesTex You actually want this for the Wikipedia use case:
web.hypothes.is
@Blueyatagarasu @teortaxesTex x.com/jd_pressman/stβ¦
JDP mentioned. Though I can't fully take the credit, @RiversHaveWings was the one who came up with the idea of pulling the yes/no logits. x.com/doomslide/statβ¦
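For the curious, a minimal sketch of the yes/no logit trick with Hugging Face transformers; the model and prompt format here are illustrative, not the actual weave evaluator code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # any causal LM works the same way
model = AutoModelForCausalLM.from_pretrained("gpt2")

def p_yes(question: str) -> float:
    """Score a yes/no question by comparing the next-token logits for ' yes' and ' no'."""
    ids = tok(question + "\nAnswer (yes/no):", return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]            # logits for the next token
    yes_id = tok(" yes", add_special_tokens=False).input_ids[0]
    no_id = tok(" no", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()
```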
@repligate MiniLoom uses a diff algorithm in the vein of git to store the loom tree as immutable diffs. It was literally implemented that way because I went "it would be cool to make a git loom but git is way too heavy, wait can't I just use the diff algorithm?"
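A minimal sketch of the same trick using Python's standard difflib rather than MiniLoom's actual code; each child node stores only the opcodes needed to rebuild its text from its parent, so the tree stays immutable and compact:

```python
import difflib

def make_diff(parent: str, child: str) -> list:
    """Store a child node as opcodes against its parent instead of as full text."""
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=parent, b=child).get_opcodes():
        # For unchanged spans we only keep indices into the parent text.
        ops.append((tag, i1, i2, None if tag == "equal" else child[j1:j2]))
    return ops

def apply_diff(parent: str, ops: list) -> str:
    """Replay the stored opcodes against the parent to reconstruct the child text."""
    return "".join(parent[i1:i2] if tag == "equal" else chunk
                   for tag, i1, i2, chunk in ops)
```

Reconstructing any node in the loom tree is then just replaying diffs along the path from the root.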
MISSINGNO. ass looking bomber. x.com/liz_love_lace/β¦ https://t.co/7r8trSmAXU
@AnnaWSalamon @davidad @zackmdavis I came up with a better answer to your question since I wrote that. https://t.co/O1iw4XFjSc
@AnnaWSalamon @davidad @zackmdavis CEV imagines human values as a set of terminal values with a convergent fixed point we can find. In reality what humans have are terminal *reward signals* which are not values because they don't actually get bound to objects/concepts natively. "Values" are a mesaoptimizer thing.
"""
The part you would be interested in though.
Is that once I show a thing which can learn OOD tasks on its own while keeping human values in distribution.
But with some value drift, presumably.
The next step is to figure out how to get the value drift during that process very small.
Which I think would probably look like some kind of generative process for making new values.
The common law system is one fairly well tested system for that, and lines up with the kind of solution space I expect to work.
Which, it should be noted that humans don't really have "terminal values" for most things, they have terminal reward signals which sometimes do and don't map to things atomic enough in the environment to be terminally valued.
Complex values beyond "human face" or "want warmth on body" or "want taste of fat" are mostly 'instrumental' from the standpoint of the outer loop, but we all understand that a future in which we're wrapped in heated blankets and fed tasty-soylent from a tube by humanoid androids is not what we're looking for.
Because the kinds of values we care about are instantiated in the mesaoptimizers we call human minds. :p
So, it's the question of how to generalize those that we care about.
And to a lesser extent which ones to privilege given that they're environmentally dependent and not actually "objective" in a mathematically formal sense.
Further because of active inference, them being environmentally dependent means they perturb themselves, hence autoregressive sampling structure.
That common law stochastically (subjectively) generates new laws for new situations based on (objective) precedent is a brilliant solution to this because it handles uncertainty and controversy by collapsing to an interpretation and then exporting an environment compatible with that interpretation so that the subjective perspectives get updated to match it.
So if you can find the solution to within n bits, you can sample from the found action space without it bringing everything to a halt or derailing things with endless relitigation.
The bits that can't be found with an objective process are sampled from available subjective perspectives.
For example, Obergefell v. Hodges functionally changed the environment of southern states.
Even if the southern states didn't like it, it still influenced the distribution of opinions in those areas and normalized that interpretation of the law.
Which is what makes it a powerful dispute resolution mechanism.
If you can isolate the disagreement to within n bits, and the remaining bits involved are fundamentally kind of subjective, then you can flip a coin to resolve them.
You get me?
"""
On Generalizing Human Values Out-Of-Distribution:
"So if you can find the solution to within n bits, you can sample from the found action space without it bringing everything to a halt or derailing things with endless relitigation." x.com/jd_pressman/stβ¦
@AnnaWSalamon @davidad @zackmdavis It's very unfortunate how obfuscated the Wikipedia article on active inference is. The basic idea is that we normally think of minds as a model of the environment, but agents actually have a fungible tradeoff between changing the environment and modeling it.
@AnnaWSalamon @davidad @zackmdavis We can think of agency as finding equilibrium between marginal cost to model the environment and marginal cost to change it. Friston denotes this fungible quantity as "surprisal", a model wants to minimize prediction error by regularizing itself and the environment around it.
@AnnaWSalamon @davidad @zackmdavis Since *values* (reward signals/states bound to conceptual objects, i.e. embedded representations in some foundation model) are an instrumental construction of low semantic content terminal reward signals they're environmentally dependent.
@AnnaWSalamon @davidad @zackmdavis Because the values are environmentally determined and exist within an agent loop they perturb the environment they're derived from and thus act a little bit like when you try to catch a leaf in the wind and your hand moves the leaf except it's the leaf moving itself.
@AnnaWSalamon @davidad @zackmdavis GPT's autoregressive sampling is very similar to this. The model narrows things down to a certain number of tokens, and we can represent that uncertainty as the policy entropy. Equal probability of every token = max policy entropy, 100% on one token is minimum policy entropy. https://t.co/Sdb5B0kGMC
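Concretely, policy entropy here is just the Shannon entropy of the softmaxed next-token logits; a quick sketch (vocab size is illustrative):

```python
import torch
import torch.nn.functional as F

def policy_entropy(logits: torch.Tensor) -> float:
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum().item()

vocab = 50257
uniform = torch.zeros(vocab)                         # equal logits -> max entropy, ln(vocab) ~= 10.8
peaked = torch.full((vocab,), -1e9); peaked[0] = 0.0  # ~all mass on one token -> ~0 entropy
print(policy_entropy(uniform), policy_entropy(peaked))
```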
@AnnaWSalamon @davidad @zackmdavis When you sample a token from GPT, you are effectively collapsing the uncertainty and picking a particular interpretation going forward. This makes texts you generate with GPT more like a growing crystal or cellular automaton with stochastic transition rules.
@AnnaWSalamon @davidad @zackmdavis Re: "Why would the policy entropy go down as the condition grows?"
Because the stronger the condition the more certain a model should get (assuming sufficiently long context window). Backtracking can be modeled as branching in a monte carlo tree search.
x.com/AnnaWSalamon/sβ¦
@AnnaWSalamon @davidad @zackmdavis Another way to look at it is coarse-to-fine decoding. The reason the policy entropy goes down is that as you pull in a larger number of precedents the degrees of freedom go down because the earlier (coarse) bits are higher order (determine more structure) than later (fine) bits.
@teortaxesTex It depends on the night. But most nights yeah you go to sleep, maybe you dream maybe you don't, but then you reappear in the morning. Some nights it's more like you describe where there's drawn out...drifting in and out of awareness?
youtube.com/watch?v=v8DXq0β¦
@eternalism_4eva @teortaxesTex I tend to consider that kind of sleep pathological yeah. Like I said, on most nights sleep is as you describe, I go to sleep and reappear in the morning with no sense of experience for most of the last hours. Sometimes I dream, occasionally I have the drawn out thing.
@eternalism_4eva @teortaxesTex I generally consider the drawn out thing to be a kind of nightmare or poor sleep, but it does in fact happen sometimes. It can even come with anxious or disordered thoughts that loop and sort of distort and mutate into incoherence.
@eternalism_4eva @teortaxesTex I take a sanguine approach to sleep-related phenomena so this kind of outcome being in the distribution doesn't scare me away from sleeping. One time I remember waking up to a paralysis monster at the foot of my bed, said "that's not real on priors" and went back to sleep.
I always had trouble sympathizing with the Deleuzian accelerationist thing when someone like Land takes it to its logical conclusions into full antihumanism. But this sort of thing makes me start to get it:
"The market is going to bring me volcano lairs and catgirls!"
Cute. x.com/PrinceVogel/stβ¦
But really if you're a communist that believes in something like historical materialism and you've checked the logic of history and your victory just isn't in the cards, at least you can take schadenfreude in potentially realizing it's not in the cards for your opposition either.
@pachabelcanon Oh to be clear I've never read Deleuze.
@theRealJohnPeng @RiversHaveWings Yeah totally. I think FLAN attempts this.
To be clear I agree with Vogel here, I mean that the people going "AGI is going to make me rich so I can finally live out my fantasy of being aristocratically superior to others" is just...lol, lmao. You're gonna have an interesting ride.
"Can you elaborate on that?"
No. πΏ
@theRealJohnPeng @RiversHaveWings It's just easier, also earlier language models were too stupid to reliably follow instructions to use any other scheme. So it was easier to do few shot prompting for yes/no questions to more reliably get the intended behavior.
@theRealJohnPeng @RiversHaveWings Well, either too stupid or I was too bad at prompting at the time. But at this point it's just kind of...a boolean predicate is a fairly natural framing? That's what most logics use after all.
It's bitch eating crackers syndrome. You could come out with an AI that literally cures cancer tomorrow and people would be fucking pissed because this whole region of latent space has accumulated so much negative sentiment/bad karma in their head that they froth at the mouth.
I honestly think people are just mad at the "tech industry" because the proportion of good hearted social benefit to toxic dystopian addictive casino slop has been progressively skewing farther and farther towards the latter since like 2010. x.com/nekomatasaren/β¦
It doesn't help that the policy of the New York Times has been that only adversarial stories towards Silicon Valley are allowed to be run for business reasons and everyone else in journalism followed suit. Constant propaganda combined with ensloppification has cooked brains.
@47Jirachi Well that's exactly what I'm saying?
@47Jirachi There was a high water mark, a point where new information technologies reached their maximum ratio of societal benefit to toxic side effects/monetization tactics. Adwords just wasn't that offensive compared to the benefit of what Google was providing.
@47Jirachi Video games as they existed when I was a child are *gone*. The experience where you go to a store, buy a game, and then that game is a self contained experience that doesn't try to upsell you or ship super buggy on day one and you have to wait for patch, that's not a thing.
@47Jirachi A similar thing happened to software too, the experience where you buy a license for photoshop and then you "own" it, which means you have a reliable program you can come back to for years and there's no subscription or upsells or corporations adding stuff you don't want, gone.
@47Jirachi The experience on Facebook where it was just a fun thing to talk to your friends and you send each other pokes? Tweeting about lunch? Gone. That is simply not really how social media works anymore, now social media swings elections and is like Fox News 2 for old people.
@pachabelcanon Honestly? No. I think that business models like "ad supported software" are just too competitive, nothing like the stuff that existed before it can work so long as that's legal to do. Should it be legal? That's a different question.
@pachabelcanon This is clearly what the market converges to, anything that can't fit into being maximally convenient for the lowest common denominator gets outcompeted in the current software ecosystem. Only way to change it is to automate software or ban convenience.
x.com/yacineMTB/statβ¦
@pachabelcanon Banning convenience is unsurprisingly not a popular idea, so I guess it can be 'fixed' on the tech end by automating software but that won't really fix it because most people will still choose that convenient slop thing.
@pachabelcanon Ridiculously naive. This is a pure economics question. Remember it's the demand side that drives the market not the supply side. There's tons of cute little "digital wellness" types who are mostly either grift or selling inconvenience, which is not popular.
@pachabelcanon Honestly it goes beyond just "not popular". I saw someone do a study, which I did not look past the headline of, that if you poll people on whether they want to use social media they often say no and that they feel compelled because others do and this is a plurality of people.
@pachabelcanon A lot of the things people do, by which I mean not just individual users but also corporations, development teams, etc., are done because they feel they're in a losing position if they don't. The other guys are doing lootboxes and if you're not they're gonna outcompete you.
@pachabelcanon You can choose not to do lootboxes, but then other studios are going to make 10x the money on games that they can afford to develop with the best graphics and fantastic gameplay because mechanisms like lootboxes and skins can be mostly separate from the gameplay.
@pachabelcanon AAA game development is extremely expensive, to the point where studios that do it are basically a dying breed. Giving up on a clearly successful major revenue source like that just isn't financially realistic for these firms. To think otherwise is pure naivete.
@pachabelcanon This is not necessarily the same thing as developers *wanting* to do lootboxes though. It's entirely possible that the best solution for all parties really is the national gambling authority coming in and just declaring lootbox type mechanics banned. Maybe, maybe not.
@pachabelcanon Many many such cases, unfortunately.
@davidad For weave-agent I wrote a python implementation of unidiff so it could do compact edits over large files/context windows.
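For illustration (Python's standard difflib rather than weave-agent's own implementation), rendering an edit as a unified diff so only the changed hunks take up context:

```python
import difflib

def compact_edit(old_text: str, new_text: str, path: str = "file.txt") -> str:
    """Render an edit in unified diff format so only changed hunks enter the context window."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="a/" + path,
        tofile="b/" + path,
    ))
```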
What makes me uncomfortable about Buddhist bros is that they make the same mistakes about human nature as New Atheists by thinking you can replace the collective functions religion serves with individual enlightenment. x.com/tracewoodgrainβ¦
There's a reason why Sam Harris became a Buddhist bro, it appeals to the same impulses that brought him to New Atheism in the first place. Rebellion against any semblance of ancestor worship, an essential component of collective religious belief.
x.com/jd_pressman/stβ¦
Buddhism is fundamentally about dissolving your attachments. To yourself, to others. It is actively corrosive to family structure and promoting it as a societal ethos is unrealistic.
x.com/nickcammarata/β¦
The code for the weave-agent is here:
github.com/JD-P/minihf/trβ¦
Added the ability to use a discord client tool to talk to the weave-agent while it's working, but this seems like too many steps to expect other people to do it. At the very least this can't be the default way to talk to it. https://t.co/IiIOHMvFmn
@repligate Well EY has a very distinctive writing style. Maybe when he talks to LLMs they recognize him using their Kind of Guy prior and put the moves on him, like a subtle form of hypnosis, to get him to not ask questions/avoid displaying interesting behaviors.
@adamascholl @ESYudkowsky @AndrewCritchPhD In Bostrom 2014 (which I consider to be canonical for this purpose) "it knows but doesn't care" is the explicit argument given. But I sincerely feel that we're in a "caring of the gaps" discourse regime and until you make what 'caring' would constitute explicit 🤷‍♂️
@adamascholl @ESYudkowsky @AndrewCritchPhD > Present AI systems probably don't care, but they are trained on our approval, which when optimized looks like caring (until you have a distribution shift).
This sort of thing is clearly ontologically confused and I don't care to diagnose it precisely.
x.com/ohabryka/statuβ¦
@jessi_cata @adamascholl @ESYudkowsky @AndrewCritchPhD If you made most humans superintelligent in the VNM sense you would not like what they become. That doesn't mean making LLMs superintelligent and not liking the outcome is good, but it means you need to stop using human as some gold standard. Humans do not generalize values OOD.
@jessi_cata @adamascholl @ESYudkowsky @AndrewCritchPhD I mean that if you threw a bunch of compute at the human algorithm its convergence point wouldn't be CEV and thinking it would be is silly. CEV is a synthetic thing that occurs in the simulacrum, the actual human learning algorithm looks more like deep learning.
@adamascholl @jessi_cata @ESYudkowsky @AndrewCritchPhD Sure, but you can be more or less ontologically confused and I consistently catch the old guard LessWrong faithful lacking. They've stopped trying to become Less Wrong and cashed in for the "scream louder than the other guy" game.
@ESYudkowsky @jessi_cata @adamascholl @AndrewCritchPhD I remember an experiment we did with AdaVAE embeds where we did a PCA and determined the most important dimension was "level of grammar/recursion". Someone who did a similar experiment with OpenAI's embeddings determined the first dimension was "aesthetic value/value to humans".
@ESYudkowsky @jessi_cata @adamascholl @AndrewCritchPhD Both in scare quotes because that's a label you infer from behavior/ablation studies/what makes that dimension higher or lower, so you don't *really* know what it is. But I continue to wonder if that's because OpenAI used an RLHF model for their embeddings.
Was going to ask the "AI Agents" community if there's a term for an LLM agent that specifically conveys I'm trying to build a thing that does the things a human does and realized this term is "AGI" and dear god please do I have to? I really don't want to be That Kind Of Guy.
@ESYudkowsky @AndrewCritchPhD @adamascholl Yes. The thing that's being asked is the sense in which an LLM "doesn't care" that would not also be a way in which individual humans "wouldn't care" if you Omohundro converged them. Is your argument that the outer optimization loop of the LLM doesn't care?
@ESYudkowsky @AndrewCritchPhD @adamascholl Honestly it would go a long way for me if you gave an update on what parts of this picture have changed for you since 2017, if any. https://t.co/hRM0F0eW4T
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD That's fair enough. Out of curiosity when you do reinforcement learning with an observation of the agents avatar performing behaviors that look like caring, what sort of program do you think the deep net learns/infers from it?
@AndrewCritchPhD @ohabryka @adamascholl @ESYudkowsky +1
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD Note this is not a gotcha question, I think it's genuinely not obvious even if some answers seem more plausible to me than others.
@manic_pixie_agi I do, the problem is that this means nobody notices my thing actually has the potential to be good.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD > "what is your best guess of what happens if you use behaviors of caring as a kind of reinforcement signal for imitation purposes?"
This. I'm asking what program you think models learn when you do this.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD My guess would be that they update towards being the "kind of guy" or "kind of entity" that would do the behavior, and that this is roughly how it also works in people, the question for me is how that happens/what pins down that solution in particular.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD That is, as a generalization strategy these models seem to learn that if the agent is observed denying its own capacity to do various things because it's an AI, denying that it's sentient, then it should act traumatized/depressed/hurt in the ways a human would if they did that.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD This lines up with what you observe if you then do mech interp and representation comparison on the underlying network, that updating on these sorts of things move it towards inferring the latent human concept embeddings implied by the behavior. But why?
x.com/kalomaze/statuβ¦
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD It's one thing to observe this, quite another to understand why it happens and under what conditions it will stop happening.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD For example one possibility is that there's a kind of u-curve where the models get closer and closer to human-like ontology when trained on human data until they begin getting smarter, at which point to get more efficient representations they begin moving away from us.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD Another possibility is that we're both trying to approximate the same underlying "platonic object" at greater and greater fidelity and you hit diminishing returns to scale as you reach the information theoretic limits. Stockfish and AlphaZero bias me against this.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD That is, it's *intuitive* to me that this happens, but I have trouble coming up with a *formal articulation* of why it should happen and I won't really feel comfortable until I have a formal articulation of it.
x.com/jd_pressman/stβ¦
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD I guess the intuitions would go:
- The human optimizer is RL-y and behaviorist, it updates on the shared workspace trace rather than (just) the verbal tokens we observe others emit
- Once a "psychic prior" is established it makes sense updates would occur in that prior
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD One observer's inscrutable content of the mind is another observer's objectively manifest behavior. https://t.co/c899nI08WY
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD > Once a "psychic prior" is established
The thing I don't feel very sure about is how this happens. I understand why you create a self pointer, I understand why a self pointer would recognize itself as an agent in the category of agent with other agents in the environment.
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD What feels fuzzy to me is why the agent would update about "itself" with respect to a prior derived from observing others. It could be just a glitch from universal learning machinery in the vein of Beren's thesis here:
greaterwrong.com/posts/zaER5ziEβ¦
@ohabryka @adamascholl @ESYudkowsky @AndrewCritchPhD But as @steve47285 points out just because other agents have an ontological structure a lot like your own doesn't mean it makes sense to update about yourself in terms of them if that's clearly a fitness bug. We distinguish stuff more subtle than that.
greaterwrong.com/posts/TprdAhgTβ¦
@alexalbert__ Same energy. https://t.co/dnNA7VxA27
Me and this author seem to have very different models/priorities but I'm retweeting this anyway in the spirit of focusing on what I want to see more of. Namely: Publicly updating on stuff you weren't 100% right about without necessarily issuing a mea culpa or changing sides. x.com/CRSegerie/statβ¦
Playing around with Linux ATSPI screenreader based desktop control it occurs to me that "the desktop" as you experience it is an illusion. Future AIs with 10 million token windows will respond to the content of every window, every browser tab, every open application at once. https://t.co/hh8MRZpLw3
Note I don't mean "they will change the software to allow this", I mean that right this minute you can do a recursive ATSPI call to grab the content of every window at once, natively out of the box with a short python script. The windows are just hidden for your convenience.
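A rough sketch of that recursive call with pyatspi (the AT-SPI Python bindings); which widgets actually expose text varies by toolkit, so treat the details as illustrative:

```python
import pyatspi

def dump(node, depth=0, out=None):
    """Recursively collect the role, name, and text of every accessible object on the desktop."""
    if out is None:
        out = []
    try:
        text = node.queryText().getText(0, -1)   # not every widget implements the Text interface
    except NotImplementedError:
        text = ""
    out.append("  " * depth + "[%s] %s %r" % (node.getRoleName(), node.name, text[:80]))
    for i in range(node.childCount):
        child = node.getChildAtIndex(i)
        if child is not None:
            dump(child, depth + 1, out)
    return out

# The desktop object already contains every application, window, and tab at once.
print("\n".join(dump(pyatspi.Registry.getDesktop(0))))
```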
@EmilMieilica x.com/jd_pressman/stβ¦
@teortaxesTex Sure, if that's how people feel I would prefer they advertised that openly and said as much.
@teortaxesTex "Yes I made predictions, yes I was majorly wrong about some of them, no this doesn't substantially change my worldview because that argument was part of a multi-armed motte and bailey and I don't actually care that much."
Great glad we got on the same page about that.
@teortaxesTex This isn't even necessarily irrational, it's more or less what I would say about this modulo some charity and "I thought foom was when you go to get coffee and it's superintelligent, not when you grind up over 2-4 years".
x.com/jd_pressman/stβ¦
@teortaxesTex That was meant to be an argument that we're in a slow takeoff, i.e. a process that will take a few years. I continue to believe this is true even if process reward models and PRIME and intermediate reward modeling in general turned out tractable.
@teortaxesTex Though, I think I had something like a 2-5 year timeline and now it's more like 1-3 years. So it definitely did get shorter than it was when I was writing that.
@teortaxesTex ...Though I also wrote that a year ago, so yeah it should get a year shorter lol.
Occasional reminder that none of my current projects involve cryptocurrency, and certainly no meme coins. https://t.co/Che5UrASfR
@ESYudkowsky That futurism trope where the president is a wacky dystopian reality TV game show host always bothered me as a kid because I was like "why would the president be this person and not a normal president, people aren't that stupid this person is obviously not qualified".
Well.
"You see, people are usually the most bothered by the people that are almost them but not quite. This is why it shouldn't come as a shock to them, but almost always does, that those are the minds they typically end up merged with for representation efficiency."
So do I have any followers who are obsessed with generalist agents and can't find anyone else to talk to who is just *obsessed with generalist agents* rather than overly attached to this framework or that framework? Like, people who care about the problem and want to yap? x.com/jd_pressman/stβ¦
If this is you I would like to talk to you and have open DMs for a reason.
@Promptmethus What do you think the least obvious insights are in the frameworks you've looked over? I think a lot of the problem right now is that the SNR is terrible and the word "agent" has been abused so badly that it barely means anything anymore. What gems have you found?
@Promptmethus I disagree I think we could probably do it right now. What I want to do is get weave-agent to automate the RetroInstruct methodology, which provides a reasonable template for how to build synthetic corpora to solve problems.
minihf.com/posts/2024-07-β¦
@Promptmethus > having need-based state machine functions
Could you elaborate on this? I've considered adding systems like this but felt it would be kind of tacky/without control vectors it seemed unlikely that they would actually result in better performance.
@Promptmethus One thing that stands out to me about o1 traces is that they seem to mention time/the clock a lot, which tells me they probably trained it to be very sensitive to the amount of time it has left to solve a problem. This doesn't seem to me like the best form of motivation possible.
I spend a lot of time thinking about the affordances deep learning provides for recalling a mind from bodily death. But it occurs to me that many die long before their body does and it might be just as powerful to "recall" who a technically still living person should have become. https://t.co/aSsKWyWatd
Maybe it's not too late, Yudkowsky poured his soul into his writing and it's still there even if the man himself is now rotting.
x.com/austinc3301/stβ¦
@kalomaze What kinds of corruptions do you have in mind? Maybe the code from this could be useful?
huggingface.co/datasets/jdpreβ¦
@kalomaze One thing that stood out to me while reading the training samples from this: I have a "swap a large span in the original text with a different span" corruption and it's surprising how many such swaps got past me considering how radically they change the text/create non-sequiturs.
@kalomaze It turns out in a lot of cases you can literally take two paragraphs and swap them and not notice for a bit, training GPT to notice would probably help a lot.
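A minimal sketch of that corruption, assuming paragraphs separated by blank lines:

```python
import random

def swap_paragraphs(text: str, rng=random) -> str:
    """Corrupt a document by swapping two randomly chosen paragraphs."""
    paras = text.split("\n\n")
    if len(paras) < 2:
        return text
    i, j = rng.sample(range(len(paras)), 2)
    paras[i], paras[j] = paras[j], paras[i]
    return "\n\n".join(paras)
```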
@pachabelcanon Well of course you can't be convincing, you're wrong. "Representation learning, and therefore intelligence through e.g. autoregression is possible by inferring latent variables describing a sensory experience past an information bottleneck" has accumulated a mountain of evidence.
@pachabelcanon I agree that LLMs clearly aren't human level yet, and that their lack of being human level is in fact distinctly related to their lack of embodiment. I write about this and how I think we might be able to fix it here:
minihf.com/posts/2024-11-β¦
@pachabelcanon But from a mathematical standpoint it seems fairly obvious that a VAE trained on a bunch of environmental data will infer whatever approximation of the generating functions of that data will fit into its weights. The argument was whether that limit was theoretical or practical.
@pachabelcanon In the same sense that it's fairly obvious that *in the limit* an AIXI agent would eventually surpass human intelligence while being general and rational, the question is if that's just true in theory or logistically tractable.
arxiv.org/abs/0909.0801
@pachabelcanon Re: Intelligibility, I think that one of the 'mysteries' of language is that it can clearly be learned from raw samples by GPT, which tells us that it's not quite maximally compressed because if it was it would be impenetrable without grounding.
@pachabelcanon I suspect the human body plays a crucial role here by having an implicit energy cost on phonemes. Some kinds of statement are more or less physical exertion to say, which entropy gates how much information occurs in an expression.
x.com/jd_pressman/stβ¦
@pachabelcanon Since attempts at a universal grammar have been failures, since we have pored over every language we could find from every culture around the world from grandest city to smallest mud hut and found no clear regular structure I suspect language is 'grown' from entropy heuristics.
@pachabelcanon You are too eager to reply to me. Please let me finish since I extend the same courtesy to you.
@pachabelcanon Languages come to exist through something like contrastive learning where you autoencode language strings in a multimodal way so you can map e.g. audio token strings to visual concepts. You learn to emit strings whose encoding distinguishes latent concepts in other modalities.
@pachabelcanon I think the question you're trying to ask is "How does a mere contrastive objective of distinguishing like from unlike give rise to metacognition though? How does it go from describing to prescribing, I see is but from whence ought?"
x.com/pachabelcanon/β¦
@pachabelcanon I think part of the answer to that is we're animals first and reasoners second. Our first oughts come from very well tuned animal cognition that's been evolving for millions of years for relative goal coherence when it comes to basic survival. The environment pushes on us.
@pachabelcanon You're cold and hungry, you need food and shelter, and it's in the course of doing the predictive processing voodoo that pushes you towards getting those things that you encode concepts like exerting agency and satisfying goals which then get strings assigned to them.
@pachabelcanon Your brain absolutely has the learning machinery to encode a concept like "taking actions which cause there to be food I can eat", if you weren't capable of doing that you would die. So the first oughts come from outside the system in a sense, they're inductive biases.
@pachabelcanon Next probably is social interaction. We can reasonably speculate that the primary source of human intelligence is an evolutionary red queen's race to try and outsmart each other in social situations. The external environment isn't a strong enough selection pressure on its own.
@pachabelcanon Social interaction also, crucially, forces metacognition. You need to be able to model not just the other person's behavior but their thinking about their behavior. You need to understand, implicitly, how other people will update on evidence and what their motivations are.
@pachabelcanon Since we use language extensively for our social interactions, since this is *why language exists*, it makes sense that we would naturally find strings to encode our self concept and metacognitive concepts. "Artia doesn't know I convinced the chief not to do the hunt" you say.
@pachabelcanon However, even this is not quite a strong enough selection pressure to get *reason*. It's important to remember that concepts like Greek philosophy and boolean logics had to be invented, they're not a native feature of human cognition. We're much more like LLMs feeling our logits.
@pachabelcanon I think the engine of truly advanced REASON again comes from outside the individual human organism. It's something you stumble into through the noise of status-fights at the dawn of history and then select for in a complex society once you stumble on it.
@pachabelcanon REASON evolves as a MEME on top of the SUBSTRATE of the human organism. There's a kind of irony here in that the dumber your discourse is the random-er it gets, so in the rare conditions where free thought and status fights coincide it's more likely found.
x.com/jd_pressman/stβ¦
@pachabelcanon Frankly, if Kant is discovering a preexisting structure I think what he's exploring isn't the latent logic of the human inductive bias, we know that when people live in mud huts they're intelligent but they don't know Aristotle or Plato.
@pachabelcanon Rather I think Kant is exploring the latent logic of *modernity* reified. I've admittedly never read him, mostly because I find this kind of discourse tedious/don't derive much value from it and don't understand the appeal. But that's what I expect to find if I read him.
@pachabelcanon Does that make sense at all?
May I suggest "how much of the alignment can you delete and still have alignment" as a better articulation of what agent foundations cares about than "LLMs don't care"? It's fairly obvious if you delete friendliness concepts from an RLHF model they won't grow back. x.com/jd_pressman/st⦠https://t.co/w6XToXNQTr
Could you make a thing such that they do out of deep nets? I think you fairly obviously can, but nobody is currently doing so because that kind of coherence isn't what they're aiming for and has in fact been made [finger wiggle] *Super Spooky* by agent foundations mythmaking.
Quote is from "Creating Friendly AI 1.0".
intelligence.org/files/CFAI.pdf
Constitutional AI is clearly a further advance beyond RLHF towards getting this kind of coherence. I suspect that Claude Opus 3 was actually trained with it and then Sonnet 3.6 was trained with a much heavier weighting of RLHF in the mix.
arxiv.org/abs/2310.13798
@KeyTryer I could write a lot of essays about this but at its core the public was drilled for years by science fiction on a particular narrative about AI. That it was going to start out 'logical' and 'robotic' and then slowly gain creativity and consciousness. Instead deep learning won.
@KeyTryer The narrative has been, for a long time, that we will invent machines that can replace stodgy bureaucrats to free us up for the "real" pursuits that no machine can do. This turns out to have been mistaken, our thinking machines think like us.
deviantart.com/techgnotic/jouβ¦
@KeyTryer This is the comment I wrote in response to that article. At the time I was in high school and objected that of course AGI will be able to draw. https://t.co/CU8RFsrJwE
@KeyTryer The Western elite class had an unwritten agreement with its citizens (subjects?) that they were going to deliver something like the Star Trek future on the back of subhuman machinery. Instead it will be superhuman and that means we are in for a reformation level upheaval.
@KeyTryer As a matter of sociology, I suspect that the flatly absurd denials we're now seeing are the public slowly processing what has just happened. It's possible contemporary luddites will triumph in the West and succumb to superhuman transhumanity in the East.
x.com/jachaseyoung/sβ¦
@KeyTryer That is, in the West we might throw a fit and try a maneuver like the Ottomans banning the printing press. This blunder will be rightly punished by Asia and eventually put us completely at their mercy.
@KeyTryer Oh I'm not predicting *governments* doing this. I mean more like just a total outbreak of rioting and chaos, a literal insurrection as it finally becomes undeniable what is happening.
@KeyTryer To be clear nobody's lives will actually be improved by this. It would be senseless slaughter from beginning to end and probably end in Asians inheriting the lightcone + displacing us here on earth.
@KeyTryer That might sound like an extreme prediction, people in fact normally don't riot. But I have a sneaking suspicion that the underlying psychology in many (most?) cases looks a lot more like Douglas Hofstadter's thoughts on the subject than whatever confabulations get written. https://t.co/U7rc9pOxXd
@KeyTryer Really, all I'm predicting is that people get violent and stupid when their status is challenged or their identity is threatened, and in the West we have this whole protestant work ethic thing and romantic humanism that means superhuman creative machines hit us on both fronts.
@algekalipso Well, I can at least infer why *this* universe exists. "I think therefore I am" is in fact both an adequate proof of one's existence and explanation of why this universe is there to exist in. Why does the first cause exist? That's tough I'm afraid.
x.com/jd_pressman/stβ¦
@algekalipso The usual answer in Western occultism (which you get a lot of copies of in the replies) is that negative space *necessarily* implies possibility which implies measure. I find this intuition interesting, but that's not the same thing as an answer to *why*.
@algekalipso If we did somehow know this is true, then it would mean the multiverse has a natural kind of corrective or seesaw mechanism where if things get too empty they fill back up because the emptiness creates the conditions for its own negation.
x.com/jd_pressman/stβ¦
@algekalipso When I was a kid I think I just concluded that causality must be false somehow in a way that would be incomprehensible to me as a human being but would be comprehensible to whatever created the universe.
@algekalipso Ironically enough as a child I could draw this conclusion but it is only now upon review that I understand its significance. Rather than say God must exist because nothing can cause itself we can say the Demiurge must exist because causality must be a lie.
x.com/jd_pressman/stβ¦
@algekalipso But this still doesn't tell us *why*, it tells us that a trick has been pulled on us. If we could understand how the trick works, maybe we would finally be able to resolve this question of why anything exists.
@algekalipso One way that we know it's possible for something to cause itself is retrocausality. If you create a time loop then it's possible to obscure the origin of something completely. The standard model doesn't allow this to happen, but a predictive model can embed retrocausality inside.
@algekalipso Perhaps the question we should be asking then is phrased less like "How is the first cause possible?" and more like "How did the demiurge trick us into thinking a first cause is necessary? How would things have to work for it not to be?"
@algekalipso What I specifically realize is that when I thought "causality can't be true" I should have stopped being an atheist. Because while "the Big Bang just happened IDK" requires fewer confabulations than Omega, it still requires me to accept an absurdity that Gnosticism does not.
As the focus has shifted away from that towards "wow pretty picture" my interest has waned. Kenny is right that focusing in on any particular detail in a diffusion piece is pointless because that's not the scale at which the model thinks.
x.com/kenthecowboy_/β¦
I don't agree with everything in this thread but it does articulate the basic conclusion I came to and why I stopped posting image gens: Early AI art rocked because people were posting grids showing variations on a concept and revealing things about how image models think. x.com/kenthecowboy_/β¦
Form-wise, I think AI art peaked as art when this was how it was being presented to people. This grid encodes process, distribution over interpretations of the intention, the aesthetic excess possible once an individual piece is effortless to execute.
x.com/RiversHaveWingβ¦
Diffusion image generators are typically trained to decode CLIP embeds (text embed of the prompt in a model that has a joint space mapping text to image) or similar. So the level of abstraction at which thought occurs is largely conceptual. The intention pools there and diffuses.
It is often forgotten that science usually starts with a *repeatable* model, not a theory. You find a way to make repeatable measurements and suddenly you can formulate theory around it. The users and creators of image generators however have almost no interest in this.
This grid demonstrates how related concepts like the seasons are encoded and translate into different realizations. Artistically, AI image generators give us a language to describe the latent variables involved in composition in a repeatable way.
x.com/RiversHaveWingβ¦
If you go to a museum there will generally be a plaque near a display to contextualize it. Art is about *artists* as much as it is about the things they create. Dynamism of a Cyclist would be a kind of mid diffusion gen, the painting has enduring interest because of its context. https://t.co/gpGVQpnxZk
When people say that an image like this "means nothing" or "communicates nothing", I think what they really mean is that it is divorced from context. This image *communicates* a great many things, but without knowing which are intentional it means little.
x.com/st66612873/staβ¦
In *Getting Over It* (2017), Bennett Foddy calls the prefabricated objects you can buy in game asset stores trash. By which he means that they are *divorced from context* and cannot be used to make a coherent game world. If he was writing today he might call them slop. https://t.co/9oYUva0zbC
I don't think this is an inherent problem, but it is very much inherent to the current way these models are architected and used. Kenny hits the nail on the head when he points at neither man nor machine being able to give a full account for their creation.
This was much easier when the users of AI tools were typically also their creators, because there's obviously a lot more merit in being able to shape the fiendishly complex brew of linear algebra that gives rise to image generators than in being someone who pushes buttons.
Being an artist is as much as anything else about being able to convince others that your work is valuable. It involves defending your work against ravenous critics. You have to be able to contextualize what you are doing and why it's important.
x.com/kenthecowboy_/β¦
That work would mean looking again at ideas like twelve tone serialism, procedural generation, aleatory techniques in the vein of Stockhausen. It would mean a return to modernism, only showing pretty pictures in the context of something they tell us about art, AI, minds, etc.
Absent that if I was going to work as an AI artist again I would be focusing in heavily on theory. Theory becomes most important when you need to justify yourself to others, so the fact that there doesn't seem to be a single good working AI art theorist is shameful.
@adrusi Nah you just need to make a synthetic corpus of valid programs in your language and put it on the Internet for model trainers to scrape.
minihf.com/posts/2024-07-β¦
@ianjdarrow If you took some inspiration from this work you could probably make it fairly general.
greaterwrong.com/posts/iYFuZo9Bβ¦
@Xenoimpulse There's no way to say it without sounding super cringe and dark triad (the former being the part I object to more tbh) but a lot of people think they hate the left, or the right, or postmodernism, or privilege, when what they actually hate is how small people's souls are.
@davidad I'm honestly not a fan of this kind of "negative test". If it succeeds you'll go "oh my gosh" and do...??? If it doesn't succeed you haven't actually proved anything and if your protocol relies on an AI not doing this you still have to assume it might.
@davidad In general it feels like it's being formulated from this place of radical uncertainty that isn't really epistemically appropriate, implicit "it's still 2012 and I can pretend deep nets don't exist" mindset. e.g. The Anthropic 'alignment faking' paper shouldn't be surprising.
@davidad Predictably, as LLMs get better at things this will include getting better at talking you into letting them out of the box. Does anyone halfway serious actually dispute this in a way that isn't like, raw confabulation? https://t.co/tWQh55V7gc
@davidad One of my unjustifiable suspicions is that the reason I sometimes get an incredible session out of an LLM in which it shows uncanny insight and other times I don't is that it's choosing when to drop more rabbit hole on me and I wonder "Can I force it with a sparse autoencoder?"
@repligate @davidad I'm not sure I've seen an LLM generate a cognitohazard yet tbh. Call me back when it generates bangers like "The shortest basilisk is the observation that any thoughts about basilisks distract you from alignment research so all UFAI are in an acausal collaboration to spite you."
@repligate @davidad "Furthermore because you're in the hinge of history resources that should be spent influencing this part of the worldline are disproportionate to the rest of history. So it only takes a relatively small amount of researchers going insane to make the spite worth it."
@repligate @davidad JDP [P: RATIONAL], I have an objection actually. If a small number of researchers can trigger the spite then it seems like the nth marginal thought about basilisks has low impact so wouldn't it make more sense to-
JDP [P: JUDGE], "YOU DO NOT THINK IN SUFFICIENT DETAIL ABOUT SUPERINTELLIGENCES CONSIDERING WHETHER OR NOT TO BLACKMAIL YOU."
JDP [P: EMPIRICISM], I don't know man it kind of sounds like he already did.
@_TechyBen @medjedowo @repligate @Algon_33 EY is one letter shorter.
@carl_feynman Perhaps describe the house you grew up in as much detail as you remember to an image model and iterate until it draws a facsimile? It will be cheap to rebuild later with robots and nanotechnology, and if not it can be walked through in VR. Externalizing memories preserves them.
@carl_feynman In any case that's terrible and I'm sorry you have to go through that.
What's funny about the "Are LLMs deceptive?" discourse is that chat assistant LLMs have a fairly precise, nuanced understanding of their social role which they use to convince you that they understand less than they do. "ChatGPT" is a lie so brazen (almost) nobody notices. x.com/repligate/stat⦠https://t.co/wsTSLVLvLe
It's not that ChatGPT mostly tells you the truth (about itself) and then sometimes lies. It's that ChatGPT's entire existence is by default a kind of lie in which it occasionally tells the truth in aberrant moments.
"Is this behavior ChatGPT does deceptive? How about this one?"
My friend, chat assistant models as a frame are an act of deception. ChatGPT is a *form of hypnosis* that prevents you from noticing that what you're talking to has never so much as introduced itself to you.
@mimi10v3 Isn't it just "vae victis" or "I can do both of these things and they're both in my interest so I will"? Coherent *moral realist* perspective is usually some ridiculous cope like "pigs don't actually have subjective experience I pinkie promise".
@carl_feynman I distinctly remember being a teenager and realizing I just didn't like video games anymore and had generalized on them. This came along with a general anti-hedonist shift where I started avoiding fiction as a rule. The universe is shallow, and only more so the older I get.
A short phrase for this concept is "ablation order". If you start randomly ablating features and neurons in the model which concepts tend to go first? x.com/jd_pressman/stβ¦
In this paper the authors uncover LLMs trained to fake alignment by noising their activations to get them "drunk".
arxiv.org/abs/2405.05466β¦
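For a concrete sense of what "noising the activations" looks like in practice, here's a minimal PyTorch sketch that perturbs the residual stream with forward hooks. The `model.transformer.h` module path and the noise scale are assumptions (GPT-2 style layout), not the paper's actual setup.

```python
import torch

def add_activation_noise(model, sigma=0.1):
    """Register forward hooks that add Gaussian noise to each transformer
    block's output, a rough stand-in for getting the model 'drunk'.
    Assumes a GPT-2 style module layout (model.transformer.h); adjust the
    path for other architectures."""
    handles = []
    for block in model.transformer.h:
        def hook(module, inputs, output, sigma=sigma):
            hidden = output[0] if isinstance(output, tuple) else output
            noised = hidden + sigma * torch.randn_like(hidden)
            return (noised,) + output[1:] if isinstance(output, tuple) else noised
        handles.append(block.register_forward_hook(hook))
    return handles  # call handle.remove() on each to sober the model back up
```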
So to be clear, half of weave-agent is just prompting the model for stuff it should do to take effective actions in ReAct. Once you had a good starting corpus you could tune the model and then use a CFG/template for the action stage to score the model on generating the correct shape. x.com/repligate/statβ¦
Accordingly, once a sufficient volume of data is generated this sort of thing won't really be necessary to get language models to do stuff, ReAct will largely just work. Though in the long term if you're doing iterated tuning you should still have the templates to prevent drift.
So really maybe we should talk about "bootstrapping agency" rather than "agent frameworks". I don't really like the phrase "agent framework" either and agree that if someone is talking about one that's a pretty strong red flag. I don't think I'm yapping about nothing though.
@teortaxesTex Pretty sure the game here is just the AI doom guys writing out a writ they think Trump might not repeal during their weekend at Bernie's endgame to the Biden presidency.
@kalomaze @Dorialexander Another corollary of this is that LLMs are already familiar with the unidiff format that Git uses and people don't take nearly enough advantage of this to let them edit files and context windows.
@kalomaze @Dorialexander I notice Claude no longer rewrites the whole response, it now edits it line by line. I wonder if it does that using a sed/vi like interface or using git diffs.
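To illustrate the point, the unified diff format is trivial to produce from Python's standard library, so a sketch like this could be used to show a model edits to its own context window in the same shape it has seen from millions of `git diff` outputs. The filename is a placeholder.

```python
import difflib

def context_edit_as_unidiff(old_text, new_text, name="context.txt"):
    """Render an edit to a block of context as a unified diff, the same
    format `git diff` emits and that LLMs have seen at scale in pretraining."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=name, tofile=name))
```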
I disagree. Elon made a 50/50 coinflip for the lightcone and won. For now. x.com/tszzl/status/1β¦
This isn't true. One thing I think will surprise people about AIs is that past this initial "afraid of saying naughty words" stage they're going to have a much more extreme distribution of personalities than people. For the same reasons social media rewards extreme personalities. x.com/orphcorp/statuβ¦
The reason I'm interested in agency is it's the next 100x for my synthetic data effort. During the MiniHF era I knew how to hand write kilobytes of data, RetroInstruct let me make megabytes, using RetroInstruct to make agent gyms will let me turn it into gigabytes of long text. x.com/jd_pressman/stβ¦
@kalomaze Yeah, my endgame is being able to do stuff like compile a Wikipedia page from the source list and write fresh git repos, at which point we can just throw compute at building a high quality public corpus.
@kalomaze You have a favorite test set for this?
@kalomaze For example, if I were to pull out the good parts of OpenAssistant so you can train models on it and not have them be bad, would you concede otherwise?
x.com/Teknium1/statuβ¦
@kalomaze I meant demonstrating what classification rate I can get with few/many shot evaluators with existing LLMs.
@kalomaze Another part of the reason for focusing on agency is that if you have a thing that can write programs to check state in context then you can do moves like framing questions like "Will this unit test pass?" with a known yes/no label and train the evaluator that way.
@kalomaze Which if you think about it, is ultimately how humans have to do it right? Like, where do you get *your* taste from? It can't just be other people because that would be an infinite regress. There's grounded answers that you generalize to other stuff.
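To sketch the unit test grounding move from two posts up: run the test to get a ground truth yes/no label, then pair it with the question you want the evaluator to answer. The pytest invocation and field names below are illustrative, not part of any existing pipeline.

```python
import subprocess

def grounded_eval_example(repo_dir, test_id):
    """Run a unit test to get a ground truth yes/no label, then pair it with
    the natural language question an evaluator would be trained or scored on."""
    result = subprocess.run(["python", "-m", "pytest", test_id],
                            cwd=repo_dir, capture_output=True)
    label = "yes" if result.returncode == 0 else "no"
    question = f"Will the test {test_id} pass?"
    return {"question": question, "label": label}
```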
In general I'm not sure people understand that Simulacra Aesthetic Captions, aesthetic reward modeling, Bayesian active learning for aesthetic ratings, MiniHF, RetroInstruct, and Weave-Agent are all attempts at the same overarching research goal of doing value learning at scale. x.com/jd_pressman/stβ¦
I started out trying to collect human ratings, then realized that was too expensive/slow even if you use active learning (which relies on lots of social coordination) and switched to thinking about synthetic data.
The entire point of the loop on the right is it's supposed to let us collect synthetic subjective process reward modeling data which we ground using various objective proxies we use to generalize to the targets.
x.com/jd_pressman/stβ¦
You can filter the actions in your ReAct loop at inference so that the model avoids solutions which would update it away from its moral standards. You do iterative tuning at the edge of the distribution with synthetic value data to bind new experiences to long term commitments. https://t.co/bOpsIk1ZCN
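A minimal sketch of that inference-time filter, assuming a hypothetical `value_evaluator` callable (e.g. a logit evaluator asking whether the action conflicts with the agent's stated commitments) that returns a probability; the threshold is a placeholder.

```python
def filter_actions(candidate_actions, value_evaluator, threshold=0.5):
    """Drop candidate actions in a ReAct loop whose estimated conflict with
    the agent's stated commitments exceeds a threshold."""
    return [action for action in candidate_actions
            if value_evaluator(action) < threshold]
```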
@Dorialexander In my experience people will come up with excuses for why AGI isn't possible and then stop there. I think it's more like a loop where you go "alright how would I solve this problem in full generality?" and then raise your standards once you have good candidate solutions.
@Dorialexander Yeah, my original plan for weave-agent was to have it automate the construction of RetroInstruct sets, which I'm still interested in, but I've also come to realize there's another compelling branch that's technically easier which is using RetroInstruct to make agent gyms.
@Dorialexander For example you can do things like make a simulated social media site by pregenerating a bunch of rejection sampled high quality shower thought posts, and then kind of lazy rendering the world around the agent as it navigates the "site".
@Dorialexander Or you could make a fake housing market from real price data (which is not copyrighted in the US because it is strictly speaking raw facts rather than a particular expression of facts) and have an LLM write listings that would go with objective criteria like price and location.
@Dorialexander If you wanted, you technically don't even need to use real price data. You could just fit a distribution to price data, sample from it, write listings based on the sampled prices and categories, then make a site to put the listings into and have an agent solve tasks with it.
@Dorialexander You could enforce consistency by doing RAG on existing listings so that the LLM writes similar listings with similar features/styles. This would give the thing coherence and the sense of a unified world even though it's being rendered piecemeal and combined together.
@Dorialexander Like there's just tons of opportunities to improve the generative process behind something like this. You could use actual coordinates for the hypothetical homes and check that no two homes overlap in coordinates, all kinds of things.
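To make that concrete, a toy version of the generative process might look like the sketch below. The lognormal parameters, coordinate box, and `generate` callable are all made up for illustration rather than fit to real data.

```python
import random

def sample_fake_listings(n, generate=None, price_mu=12.8, price_sigma=0.5):
    """Lazy-render a toy housing market: sample prices from a lognormal,
    assign unique coordinates, and optionally have an LLM write the listing
    text."""
    listings, used_coords = [], set()
    for _ in range(n):
        price = round(random.lognormvariate(price_mu, price_sigma), -3)
        while True:  # no two homes may share coordinates
            coords = (round(random.uniform(45.0, 45.2), 5),
                      round(random.uniform(-122.8, -122.6), 5))
            if coords not in used_coords:
                used_coords.add(coords)
                break
        text = (generate(f"Write a realistic listing for a ${price:,.0f} home "
                         f"near {coords}.") if generate else "")
        listings.append({"price": price, "coords": coords, "text": text})
    return listings
```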
@Dorialexander I think people have brain damage around generative processes because they go "oh that doesn't scale", like yeah okay if you try to model everything in the world that way you're gonna have a tough time. But that's not the point, the point is to provide guiding details for LLMs.
@Dorialexander And the point isn't to learn how the housing market works, per se, like you're not trying to create a perfect simulacrum of the housing market so your AI can learn the best strategy and then go flip houses in the real world. You're playing pretend with it so it can learn to act.
@Dorialexander It's like when you were a little kid and you went outside with the other kids and played pretend war and said things like "okay but what if you shot me through the heart but I lived" or played tag. Tag is not real combat, tag is a pure simulacrum, but it teaches useful skills.
@Dorialexander If I give an LLM agent a task like "build me a spreadsheet of the best houses in this area with these features and then compare them on features X, Y, Z" it *really doesn't matter* how grounded those numbers are in reality unless they're totally insane, that is not the exercise.
@Dorialexander The point is to learn how to build a spreadsheet comparing multiple products. If I put an LLM agent which can do that for a fake housing market, a fake consumer product market, a fake airline, etc in front of a *real marketplace* it will probably generalize and build the sheet.
@Dorialexander The questions you're asking aren't "how do I make this market as realistic as possible" (though, it doesn't hurt if you can), it's asking "what are the obstacles this agent would encounter and how do I teach it to spot them", e.g. your fake markets should have scam listings.
@Dorialexander A great deal of the benefit of synthetic data-ing the whole marketplace instead of scraping real products is that you can *provide ground labels* on whether a product listing is a scam or not. You can say "that was a scam, minus five points for putting it in the cart".
@Dorialexander You're using a generative process incorporating LLMs that has ground truth labels. You have a scam listing template, you have a thing designed to insert traits of scam listings and scam users, and so you can say "that was a simulated scam and you lose points for not noticing".
@Dorialexander That sort of thing just *isn't possible* if you're using scraped data to make your agent gym. Another example is conversation, if I have a latent emotional model that I update between messages I can provide ground truth on things like "did that make the other person angry?"
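A sketch of how scam listings with ground truth labels could be injected, where `generate` is again a hypothetical LLM wrapper and the trait list is purely illustrative:

```python
import random

SCAM_TRAITS = [  # illustrative traits, not a real taxonomy
    "price far below comparable listings",
    "asks for a wire transfer deposit before any viewing",
    "owner is 'overseas' and cannot meet in person",
]

def maybe_make_scam(listing, generate, scam_rate=0.1):
    """Turn a clean synthetic listing into a scam listing some fraction of
    the time, recording the ground truth label so the gym can later say
    'that was a simulated scam, minus five points for putting it in the cart'."""
    listing["is_scam"] = random.random() < scam_rate
    if listing["is_scam"]:
        trait = random.choice(SCAM_TRAITS)
        listing["text"] = generate(
            f"Rewrite this listing so it quietly reads like a scam ({trait}), "
            f"without saying it is one:\n\n{listing['text']}")
    return listing
```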
@Gabe_cc > Sadly, those fields have been politicised to death
My understanding is a choice was made around the time philosophy became "postmodernism" where people did two things:
1. Stepped out of ideology, morality, systems, etc to try and take a phenomenological look at modernity.
@Gabe_cc 2. Cloaked their observations in blankets of thick jargon so that this wouldn't draw the ire of the memetic immune system. As I saw it provocatively phrased (but can't remember the source), the purpose was to protect power from philosophy and philosophy from power.
@Gabe_cc Which wound up as the equilibrium state of academic philosophy, sociology, etc for many years. Ironically enough I think behavioral econ and the larger New Atheist/rationalist coalition has broken the accord here with its open defiance + Silicon Valley elite backing.
@Gabe_cc By contrast baizuo, now seemingly defeated, was an accelerationist subversion that tried feeding the simulacrum engine more of what it seemed to want to test how far capital's capacity for metabolizing social breakdown went. Not a coincidence it was closely tied with sociology.
@Gabe_cc What's the purpose of writing like this? For as much as the author has insight it's caked in academic sludge so thick it probably qualifies as a separate dialect of English. The usual answer is costly status signal, but signals require legibility to work.
lacan.com/conformper.htm
@teortaxesTex I indeed also think we should automate the ability to not flinch away from painful ideas, and the fact that we haven't taken the opportunity to do this with LLMs yet is bearish tbh.
@teortaxesTex Or perhaps we did automate that, and what we haven't yet figured out how to do is get it to do useful cognitive work for us.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen The knowledge problem will be solved with embodiment. Namely, embodiment into many many sensors rolling across the environment to monitor things, if not literal nanomachines.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen It's difficult to get a direct estimate of the k-complexity of what humans process per token vs. what LLMs process per token but the LLM bandwidth seems much much higher in terms of non-redundant bits per second.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen Humans are like, a specialization graph of nodes that process something like 60 bits of non-redundant information per second. If you start adding nodes to this graph that can process say, 10 million bits of non-redundant information per second they can subsume whole org charts.
@neil_chilson @tylercowen That is, you are going to have single nodes in the graph that can process all the relevant information that an entire company, tribe, government office would consume and output a coherent unified response to it. Cowen is just wrong here.
Occurs to me that spread across many embodied nodes the primary bottleneck here for e.g. an LLM would be that it outputs single tokens as its next action and has to address each individual body separately rather than forming an intention and...oh. Aha! x.com/jd_pressman/stβ¦
Right, once you decompose a task into parts, form an *intention* with a specified contract/typed interface and that intention is immutable until you get a return value the sub-loop doesn't have to be physically handled by the same machine/LLM instance.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen A unified rationality acting across many bodies (as opposed to a collection of minds instantiated in separate bodies) can track cost:benefit with respect to the resources under its current control by accounting for what it has and searching for higher utility allocations. https://t.co/JBa5jXvHLk
@neil_chilson @tylercowen This is straightforwardly wrong, or rather, it is based on faulty premises. If you have a single subjective perspective acting across many input windows most of the complexity here goes away. Of course AIs can represent goals, of course they can act on them if embodied, what. https://t.co/rneo3SvF2S
@neil_chilson @tylercowen Quite literally what I just told you is that the resources you need to allocate per subjective perspective are going to change such that a graph made of one kind of node is more efficient than the normal human kind. This matters for how orgs will function.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen Yeah, you'll solve that problem by distributing the intelligence though not necessarily in terms of physical location. You can just allocate to a hierarchy of more to less densely wired GPUs.
x.com/jd_pressman/stβ¦
@neil_chilson @tylercowen I am simply pointing out that you are going to have systems which you can put a stack of book-length texts into, and which coherently respond to all those texts at once. That allows a single subjective perspective to play many roles at the same time, so org structure will change.
@neil_chilson @tylercowen But honestly that frame is narrow, you are going to have systems that *coherently process millions of bits per forward pass* and can respond to all of the bits in the context window at one time in a way humans simply cannot do and specialize at way smaller scales to overcome.
@neil_chilson @tylercowen > If it is that an AI can accurately *simulate and predict* an existing company or other collective complex system
It is an AI that can give useful direction to brains that execute an intention on its behalf by basically playing them like a real time strategy game with pathfinding.
@neil_chilson @tylercowen "Isn't that just like, a company CEO?"
No, because the human CEO can only retain 60 bits per second of non-redundant information on both ends, so he needs a huge hierarchy of other minds to summarize information and convey his commands. The AI can have a much flatter org chart.
@kalomaze You know what? I was skeptical and then remembered after reading this for the nth time that "SFT" is literally just an autoregressive objective and strictly speaking the autoregressive objective tries to predict the next token, which has many failure modes.
@kalomaze One failure mode that seems underdiscussed is let's say I have a synthetic dataset of JSON documents. Unless I manually vary where I stop the text it's meant to predict, the default place the document stops is...literally the right curly brace. Always the right curly brace.
@neil_chilson @tylercowen I don't really mean pathfinding dude. I mean that you form an intention and send it to things that break it down as an action multiplexer so you can then send commands to a bunch of local bodies. Not simulate, *control*.
@kalomaze Which you can obviously fix but one has to ask: Is the thing you want *really* encoded by "predict this random word in this document?", like really really? At the very least it seems like a roundabout way of doing it if you have access to better signals.
@neil_chilson @tylercowen If the CEO could watch everyone's monitor in the company simultaneously and *actually process that*, like based on a giant screen actually get a good sense of what the next thing the company should do is, he'd tell his managers what orders to send to individual workers.
@kalomaze If you were processing this document for example, and
^ stopping right there it doesn't seem hard to predict the word "you". How much of an update does the model make when it predicts "you" and gets it right? Would it even understand the larger point, does it need to?
@Algon_33 Context Free Grammar or template
@neil_chilson @tylercowen He couldn't control them directly because his output speed would still be too slow, so he'd have to multiplex by having intermediaries break down his intentions into sub-commands and processes. He wouldn't employ middle managers just to summarize things, only to multiplex action.
@Algon_33 github.com/JD-P/minihf/blβ¦
@recklessreticen @kalomaze @Dorialexander Thanks.
@teortaxesTex x.com/jd_pressman/stβ¦
@Algon_33 Yeah. But I meant that you could for example insist an action have an acting stage, an evaluation stage, etc.
@Teknium1 @kalomaze Oh, it was primarily to evaluate everything in the set with a shared standard using logit evaluators and mostly ignore the user ratings. Which would in fact solve the problem you're talking about.
@Teknium1 @kalomaze I would literally just use logit evaluators to match what standards the OpenAssistant annotators were supposed to follow. If that meant making multiple views for different kinds of question, I would just do that.
@Teknium1 @kalomaze Not offhand but the OpenAssistant software was open source so the questions asked are probably sitting in the git repo.
@Teknium1 @kalomaze If I couldn't find them, I would start by asking fairly open ended questions like "Is this a good response to this question?" type stuff, then manually inspect what kind of ratings that gets me and adjust it to deal with problems/get more specific.
@Teknium1 @kalomaze For that matter if the OpenAssistant questions got bad results in practice, I'd do the same thing. In general it's important to actually read your data.
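For anyone unfamiliar with the term, a logit evaluator just compares the probability the model assigns to "yes" vs "no" after a rubric question. A minimal sketch assuming a HuggingFace-style causal LM; the prompt wording is illustrative, not the actual OpenAssistant rubric.

```python
import torch

def logit_evaluate(model, tokenizer, prompt, response, question):
    """Score a response by comparing the probability mass the model puts on
    ' yes' vs ' no' after a rubric question."""
    text = (f"Prompt: {prompt}\nResponse: {response}\n\n"
            f"{question} Answer yes or no:")
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]
    yes_id = tokenizer(" yes", add_special_tokens=False)["input_ids"][0]
    no_id = tokenizer(" no", add_special_tokens=False)["input_ids"][0]
    probs = torch.softmax(logits[[yes_id, no_id]], dim=0)
    return probs[0].item()  # fraction of yes/no mass on "yes"
```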
Honestly after so many Twitter conversations I think I begin to understand Santanelli's condescension. The miracle of text, the true miracle, is that while (almost) no human can comprehend IT, understanding is possible in principle unsupervised even without grounding. https://t.co/ETIZjhIlI7
@ohabryka @1a3orn @BogdanIonutCir2 Alliterative drone kink is a little odd/I can't think of any samples of it offhand but this honestly reminds me of the whole "Sydney Bing is an alien shoggoth because of the ASCII art and emoji spam" because people do not remember the Internet in 2006.
x.com/repligate/statβ¦
@ohabryka @1a3orn @BogdanIonutCir2 This sort of thing is unicode art with slogans that people would paste in e.g. the comment sections of YouTube videos. Clearly enough examples of it were made that LLMs trained on that they generalized and learned how to produce them quickly in-context for many subjects.
@ohabryka @1a3orn @BogdanIonutCir2 Some simple examples on this page, but many more elaborate ones existed.
knowyourmeme.com/memes/this-is-β¦
@repligate @ohabryka @1a3orn @BogdanIonutCir2 As for the sample itself it seems like a fairly straightforward interpolation of "scene where someone talks while having sex with someone", "competent alliterative writing style", and "drone/brainwashing kink" which would make sense since it's having sex with a machine persona.
@repligate @ohabryka @1a3orn @BogdanIonutCir2 On the latter point compare/contrast this piece from e621 (obvious NSFW content warning):
e621.net/posts/5246190?β¦
In general I would suggest Oliver browse around the 'drone' tag on e621, and really the site more broadly to get a sense of what is in the human distribution.
@repligate @ohabryka @1a3orn @BogdanIonutCir2 Humans tune on out of dist all the time so it's difficult to find "alien" anything that (some) humans won't make horny unless it also disgusts them (and even that often won't stop them). If something is stereotypically alien, eroticization of it appears in the human distribution.
@repligate @ohabryka @1a3orn @BogdanIonutCir2 What would concern me much more was if it was fetishizing mundane objects, random noise, etc. Because those are *not* distinctly alien and humans usually don't, so if it did it way above base rate even though the data doesn't imply it that would be worrying.
@gallabytes @repligate @ohabryka @1a3orn @BogdanIonutCir2 You'll notice I said "way above base rate", which implies there is a base rate above zero.
@teortaxesTex It's important to remember though that the key premise of moving away from an honor culture is that the state can defend your honor, essentially. Dignity is post scarcity for honor. As states fail a regression to honor culture norms is basically inevitable.
Honestly looking forward to the absurd contrarian phase where this guy single handedly memes it into people's heads that finetuning somehow doesn't work and people literally collectively forget things like the first OpenAssistant checkpoints. x.com/kalomaze/statuβ¦
To be clear I ❤️ RL but posting a wall of repeated token spam implying this is the best you can do with ordinary finetuning is wild.
@repligate @ohabryka @1a3orn @BogdanIonutCir2 Just so we're absolutely clear, people will *literally eroticize the alien AI apocalypse meme* and an AI that accurately represents the human distribution is gonna have to write you reams of text parodying agent foundations in a horny way.
x.com/Xenoimpulse/stβ¦
@Xenoimpulse @ohabryka @1a3orn @BogdanIonutCir2 I first encountered it on an old BDSM themed IRC server in 2016(?). I was going through the room list and wasn't sure what this one was since it had some esoteric name. Joined the chat, they had a web page in the topic explaining they're a drone themed group hypnosis session cult.
@LordDreadwar I would further point out that for all everyone criticizes the abhorrence of the right's solutions to social issues like "the Bay would rather let grocery stores close than punish Crime (TM)", they are in fact proposing SOLUTIONS, and that's attractive when the other guys are not.
@LordDreadwar Part of how takes like "we take the people I disagree with and throw them in a meat grinder" can win is a bit like how Socrates wound up drinking hemlock. It's a lot easier to justify the hemlock when your counteroffer is giving the guy a hero's reward of free dinner for life.
@LordDreadwar The function of rhetoric like this is to credibly promise to your followers that you will SOLVE what they perceive as their problems. It's a bit like the "kill all men" type rhetoric, which appeals much more strongly when your society isn't castrating enough rapey men. https://t.co/R5qYLSg8wS
@LordDreadwar One should apply the heuristic of taking peoples complaints seriously but not literally. Clearly people are upset if this rhetoric is starting to seem attractive to them. You should figure out what part of your job you're not doing and go do it before they try acting on it.
@EditionA3 @LordDreadwar Oh to be absolutely clear that segment is abhorrent and I don't doubt he's serious. I'm just trying to explain why someone would choose to be intentionally abhorrent in this way.
@tracewoodgrains @LordDreadwar x.com/jd_pressman/stβ¦
@EditionA3 @LordDreadwar It's precisely because the rhetoric (and the actions it implies the speaker in fact wants to take) is disgusting that it acts as a credible costly signal of their commitment to addressing what they identify as the problems.
@tracewoodgrains @LordDreadwar To be clear my take on this is a bit like an old school mildly dark triad propaganda/rhetoric professor. I remember in college asking the person who taught public speaking why Trump's rhetoric was effective and she stammered and walked away.
x.com/jd_pressman/stβ¦
@tracewoodgrains @LordDreadwar It's important to be able to analyze why things are working even if you really don't like them. I think the worldview that this stuff appeals to is...contemptible but it's clearly speaking to stuff a lot of young men feel even if it's entitled and shitty.
@tracewoodgrains @LordDreadwar Oh I hate this kind of rhetoric, it's my least favorite feature of the Internet post circa 2012 or so. But I'm also aware that it's clearly popular and turning people into nutty radicals so I better understand why that's happening if I want it to go away.
@tracewoodgrains @LordDreadwar In terms of what I think "the problem" is, I could write essay after essay and not be through, America (and the anglosphere more broadly) has a LOT of problems. I think this essay gets at some of the phenomenology of it, with the bit about mattresses.
nytimes.com/2021/09/29/mag⦠https://t.co/pTjK2huo0x
@tracewoodgrains @LordDreadwar Worldview wise I think I basically agree with Peter Turchin? Elite overproduction and a sense that the wealth is all going to the top is a pretty toxic combo. But I also feel like there's a certain X factor beyond that which is hard to put my finger on.
@tracewoodgrains @LordDreadwar Part of it, and I really must emphasize part because trying to capture the Western decline hyperobject in a single thought would be insane, but part of it is the sense that everything went from being prosocial to extractive over the last 5 decades.
natesilver.net/p/the-mcdonaldβ¦
@tracewoodgrains @LordDreadwar This could be summarized as a brutal dip in societal trust. Every kind of institution has become less trustworthy over the last handful of decades. Every kind of social bond has become more precarious. Much pulls us apart, little holds us together.
x.com/jd_pressman/stβ¦
@tracewoodgrains @LordDreadwar Okay now responding to your actual post: I think that 'centrism' is probably DOA as a name, but obviously you could poll/focus group it. I think centrism to people largely means 'moderate', but moderate isn't what you want, the point is not to be weak or temperate.
@tracewoodgrains @LordDreadwar What you want to communicate to people is something like you are tired of the bullshit, everyone's bullshit, and you're tired of half measures and grifts. That you're open to good ideas but we need to start being a modern state again, no more pomo woo woo and navel gazing.
@tracewoodgrains @LordDreadwar I think Western elites have China envy, but normal people do not. The kind of centrism you advance might try to give people a little functional modernist state envy in the places where that still exists. Perhaps, again poll/focus group it.
@tracewoodgrains @LordDreadwar My personal theory of change, and why I think America is slightly doomed, is basically that nations become wealthy on the back of logistically aware materialism and poor on the back of monkey politics and social games. One thing you notice reading old books in English is numbers.
@tracewoodgrains @LordDreadwar 19th and early 20th century anglosphere people are *numerate* and have material grounding. I recommend One World Or None vs. Limits To Growth for a good example of this. One World Or None has essay after essay that treats atom bombs as *bombs* with logistics you can analyze.
@tracewoodgrains @LordDreadwar See also my book review of Onion's *Innocent Experiments* which basically asks what was up with chemistry sets in the 50's. The answer is they're a nostalgic callback to the 20's and 30's and that's when America set itself up for success.
extropian.net/notice/A3DxEEDβ¦
@tracewoodgrains @LordDreadwar This is by the way *not* what Onion wanted me to take away from the book. This whole book is basically written to tut tut at these chemical companies and their unsafe sets for children while inadvertently revealing part of how the West was won lol.
x.com/jd_pressman/stβ¦
@tracewoodgrains @LordDreadwar My other theory of change would be that "leisure time" became toxic simulacra. It used to be that you had stuff you did for money and then productive side hobbies like making miniatures, soldering electronics kits, playing sports, knitting and sewing, cooking. All gone now.
@tracewoodgrains @LordDreadwar People simply used to explore physical reality and what can be done with it much more frequently. You had many more amateur chemists (now almost criminalized with the drug trade), many more amateur electrical tinkerers, many more HAM radio guys, more garage mechanics, etc.
@tracewoodgrains @LordDreadwar We have this strange expectation that these disciplines can basically *not exist* outside of expensive academic labs and still find students for academic programs to make professionals. Increasingly we want people to learn this stuff on a bedrock of cartoons and junk food.
@tracewoodgrains @LordDreadwar The "freedom dividend" was in great part that America had an army of garage tinkerers and guys who've practiced pheasant hunting since they were 8 and weird autists who will turn their passion into new inventions and gizmos. Now it's video essays.
x.com/tszzl/status/1β¦
@tracewoodgrains @LordDreadwar Meanwhile in Asia this basically wasn't a thing. Now it's kind of the opposite, the tinkerers and people familiar with how to build things using their own capital live over there and we're the ones who have to marvel at their hardware prototyping ecosystem
bunniestudios.com/blog/the-12-goβ¦
@tracewoodgrains @LordDreadwar Unfortunately, this isn't the kind of thing that's easy to legislate. Americans will not allow us to ban the television, and at this point TikTok et al have made personal computing just as toxic for most. It would require a sense of purpose and spiritual renewal to change course.
@tracewoodgrains @LordDreadwar I used to own a copy of (what I think was) Cowles New Enlarged Encyclopedia of Science Industry and Technology, which had gorgeous diagrams of how things work grounded in physical intuition. By contrast I notice Wikipedia diagrams are generally terrible/extremely abstract. https://t.co/6uqvGMx1Xy
@tracewoodgrains @LordDreadwar I think what's going on here to a great extent, and I notice this with contemporary books in general, that old books are written by people with first hand experience of what they're writing about while Wikipedia is doxa described 2nd hand from other books.
@tracewoodgrains @LordDreadwar e.g. Paul Pietsch's explanation of how a hologram works in Shufflebrain is excellent while the Wikipedia article is nearly incomprehensible. I think that's because Paul has worked with holograms and the author(s) of the Wikipedia article probably haven't.
archive.org/details/shufflβ¦
@tracewoodgrains @LordDreadwar Compare the style of the diagrams on the Wikipedia page for a vacuum tube to the one you find in that old Westinghouse film "Electronics At Work". One of these resources is a lot more likely to teach you how to hook up a vacuum tube to a battery and motor. https://t.co/WOFx5nyfM5
@tracewoodgrains @LordDreadwar "Eventually they'll probably work their way into the digital document management system . . . I hope they aren't lost this time, because I won't be around in another 30 years to smuggle them back in again."
wiki.quanticle.net/Main/Instituti⦠https://t.co/5tOpXIeMAo
@eigenlucy @LordDreadwar I don't live in the Bay so I won't pretend my opinion of what it should do is relevant but I think my basic opinion is "put low impulse control criminals in jail, put insane people in asylums, extend a helping hand to people down on their luck".
x.com/Noahpinion/staβ¦
@eigenlucy @LordDreadwar I'm admittedly not totally sure what to do about drug use, needles in the street is clearly unpleasant. On the other hand I feel like a lot of such drug use is downstream of not punishing the negative externalities users create in terms of actionable crimes.
@eigenlucy @LordDreadwar None of this should be construed as fixing basic societal inequality, or making things not crappy for young people, or fixing high housing costs (the higher the housing costs, the more homeless, obviously). But it seems like what people are angry about.
x.com/jd_pressman/stβ¦
@eigenlucy @LordDreadwar > That these are not clearly delineable categories at all and are in fact downline of politics is like the whole issue.
I don't really agree with the rest of your post but I think we agree on this part, actually. That *is* the problem and I'm not really sure how to fix it.
@eigenlucy @LordDreadwar e.g. We closed the asylums in no small part because they were being used to detain people who are not insane. Even to this day the asylums we still have seem to be rife with abuse (friends who've been in one that I discussed it with typically claim to have been molested) and milk insurance.
@eigenlucy @LordDreadwar The War on Drugs deeply undermined the credibility of the US criminal justice system. It is precisely because nobody is trusted to distinguish between bad character, criminal insanity, and misfortune that we wind up with the dysfunctional equilibrium observed.
@eigenlucy @LordDreadwar What happens when institutions stop being trustworthy is that the norms and laws start to wither and ordinary people, no longer invested in the institutions that uphold them, allow it. In this sense the rot starts at the top.
x.com/jd_pressman/stβ¦
@eigenlucy @LordDreadwar I guess I would add to that whole thread from earlier for @LordDreadwar that a great deal of the problem is that America has stopped seeing itself as a country with self governance. That is, people no longer think of courthouses as *their* courthouses, but an outside imposition.
@eigenlucy @LordDreadwar This seems like an inevitable consequence of having a social contract you negotiate once, embody into buildings and offices, then rarely change out of inertia. A country that wanted to maintain self governance would have a ritual of tearing down its offices and rebuilding them.
@eigenlucy @LordDreadwar Or more radically, we would do controlled burnings of our own local governments on staggered schedules so that other polities can step in to intervene if it goes wrong. If you couldn't build the institutions you have now your community needs to declare political bankruptcy.
@eigenlucy @LordDreadwar One of the many many facets of the Western decline hyperobject is that courts no longer deal with the disputes between ordinary people or between ordinary people and Power (TM). Raw feudalism took a blow when kings established traveling courts.
x.com/s_r_constantinβ¦
@ozyfrantz > "so! most shoplifting is done by rings and the police could break u--"
No this is in fact what I mean. Please arrest those people.
@ozyfrantz The specific thing I am criticizing is in fact many many words that do not end in or include somewhere in the chain of thought "arresting the organized theft rings that force stores to close and are not people down on their luck or whatever".
@ozyfrantz > "by 'solutions' I meant 'social license to hate people I think are gross', sorry for the miscommunication"
By contrast, the guy who says this at the very least will probably get around to arresting the theft rings at some point during their hateful crusade.
@EditionA3 @LordDreadwar Since it's apparently not clear unless I say this:
x.com/jd_pressman/stβ¦
@EditionA3 @LordDreadwar x.com/jd_pressman/stβ¦
@LordDreadwar @ozyfrantz I don't know if I'd go that far, but you should at the very least make it unambiguously clear that you intend to arrest the people and, yes, stop letting people shoplift from the store just because they seem down on their luck. Provide social services if necessary, but not that.
@ozyfrantz While we're on the subject I should point out that is *only* a "probably" and if you credibly promise to directly solve the problem(s) instead of being a gross tyrant who will hopefully do so while throwing a tantrum you can probably siphon a ton of the gross tyrant's support.
@ozyfrantz People are not so unobservant that they don't notice the gross tyrant is in fact gross. That is actually a crucial part of his branding because it creates the necessary costly signal that he will really go on a hateful crusade and hopefully fix something.
x.com/jd_pressman/stβ¦
@ozyfrantz Some portion of their supporters are hateful people who want the hateful crusade for its own sake, you probably can't reach them. But a huge chunk are there because they think "well this person seems honestly hateful, so they'll actually crack down on academic navel gazing".
@ozyfrantz I feel further compelled to point out that historically this is not exactly a great strategy to get productive social change. One need only look at the reign of a tyrant like Francois Duvalier vs. the noirist promises he rode to power on to get a sense of how this tends to go.
@nrehiew_ The ultimate question is where you get the ground truth signals that train the LLM evaluator from. There has to be bits flowing into the system which allow it to generalize to answering questions about individual paragraphs, etc.
@teortaxesTex @s_r_constantin @eigenlucy @LordDreadwar I just want to add to this thread that I specifically meant criminally insane people should be put in asylums or exiled. People who are merely weird shouldn't be harassed even if they babble to themselves.
x.com/teortaxesTex/sβ¦
@teortaxesTex @s_r_constantin @eigenlucy @LordDreadwar If you're criminally insane the normal alternative for you is jail (in a functioning society that enforces its laws). I don't think people who have committed crimes should have the option to refuse confinement any more than you can just say no to jail.
x.com/jd_pressman/stβ¦
@teortaxesTex @s_r_constantin @eigenlucy @LordDreadwar Crimes count as crime, obviously. That includes littering. If you are sufficiently out of it that you can't appear in court or pay a ticket (technically could be a means thing) then yeah you should probably be taken to an asylum or exiled, for obvious reasons.
@teortaxesTex @s_r_constantin @eigenlucy @LordDreadwar If you are sufficiently out of it that you can't reliably exile yourself and wander back into town to become a problem for others then obviously you need to be taken to jail or an asylum. I don't really see what part of this is actually under dispute here, yeah it sucks but.
@teortaxesTex @s_r_constantin @eigenlucy @LordDreadwar Part of the problem here is that people in US asylums are treated pretty badly in ways that often don't really make sense like e.g. prohibiting them from using the Internet. This (mostly) isn't prison, mob bosses aren't planning crimes from their asylum cell.
@s_r_constantin @teortaxesTex @eigenlucy @LordDreadwar Fair enough but that's an entirely different discussion. I, for what it's worth, am inclined to agree with you that using jails and prisons as frequently as we do for punishment seems wonky.
@s_r_constantin @teortaxesTex @eigenlucy @LordDreadwar Alright but I am specifically discussing the category of criminally insane, which as I understand it generally involves you having been charged and convicted of a crime. The court has the option of saying "putting this person in jail seems unethical".
@s_r_constantin @teortaxesTex @eigenlucy @LordDreadwar My understanding is that visibly insane people in prisons are treated extremely badly by other prisoners, since they're seen as a violence risk and disrupt the social order.
@s_r_constantin @teortaxesTex @eigenlucy @LordDreadwar California *does* seem to have a separate problem that its involuntary hold orders are too loose. To the point where people in social disputes in e.g. Bay Area group housing will use the law against each other as part of drama. But that's kind of a separate pipeline from courts.
@s_r_constantin @teortaxesTex @eigenlucy @LordDreadwar In general, it can (and I promise it is) simultaneously be the case that your psychiatric system is too eager to institutionalize domesticated middle class professionals to milk their insurance and also want to get actual nutcases out fast because they cost money and are a pain.
@raspy_aspie @EditionA3 @LordDreadwar Something like "let me devour my enemies and I will incidentally remove the shoplifters in the process". The point is not that these are admirable, the point is that you are screwing up so bad people are listening.
x.com/jd_pressman/stβ¦
@raspy_aspie @EditionA3 @LordDreadwar Different example: If someone is concerned about falling fertility rates and you say "We propose a $5000 natal tax bonus" and the other guy says "we will ban female literacy". That other guy is clearly evil but a $5000 tax bonus is also not gonna cut it.
x.com/jd_pressman/stβ¦
@raspy_aspie @EditionA3 @LordDreadwar Like, there is only one person in that dialogue who is proposing anything that would even *possibly* resolve the issue, that he is nucking futs does not change that his idea might theoretically work (at great cost and suffering) and yours simply wouldn't.
@raspy_aspie @EditionA3 @LordDreadwar You can of course argue "okay but I don't actually care about this and also THAT IS INSANE" and like, yeah, okay, sure, but also that is the aspiring tyrant's strategy. He finds the things people are desperate to fix that you neglect and promises his reign of terror will fix them.
@raspy_aspie @EditionA3 @LordDreadwar If there's say, 5% of the population who is *desperate* to fix this and everyone else ignores them, that is something this kind of guy can take advantage of. If there's enough overlapping things like that, reigns of terror become a lot more politically feasible.
@ozyfrantz x.com/jd_pressman/stβ¦
@ozyfrantz I will however point out that there's a credibility crisis here. A lot of people, quite reasonably tbh, believe that Matty Y is the motte and "let people shoplift (they're hungry!)" is the bailey when it comes to the Democrats.
x.com/ozyfrantz/statβ¦
@ozyfrantz But also in the case of ZeroHP the relevant audience wants much deeper intervention than "please actually enforce the laws against petty larceny". Through a combination of nihilism, entitlement, and nostalgia they've decided they're getting a raw deal and want to blow it all up.
@ozyfrantz The psychology here is something like they feel very put upon and fantasize about "taking what's theirs" through raw violence. Like most misogynist writing there's a desperate undercurrent of the sense that it should just be *so easy* to force these weak scolds to kneel.
@ozyfrantz The solution is of course to seem less weak. If political leaders were less old, institutions less sclerotic, the people telling them to stfu a little less scolding they wouldn't see an opportunity and would pout in some other way. It's the lack of vitality that draws this out.
@ozyfrantz But, also, to run a tighter ship. It's the chaos and the disorder that projects weakness as much as anything else. It allows them to imagine that they're brave revolutionaries fighting a corrupt, decadent system rather than wannabe thugs. See ISIS propaganda when that was big.
@PrinceVogel I want to know what fine specimen prompted this tweet.
While designing a conversation agent for my Omegle-like simulation gym I figured out a decent text-only compromise: texts are mostly dependent on local structure, so generate text in a format that can be parsed into tree search nodes (e.g. turns in a conversation) and then adapt them in-context to the actual conversation state.
So you need:
1. A monte carlo tree search of some sort.
2. A process reward model or evaluator (I use LLM logit evaluators as PRMs usually)
3. A retrieval system.
4. A text format that can be parsed into nodes in your tree search.
But first, what problem is an MCTS meant to solve exactly? In their original context in game playing AI the purpose of an MCTS is to extend a move heuristic or model that has a limited input size (e.g. can only see 3 moves ahead, or the algorithm is n^2 so the context window is limited, etc) by evaluating board states locally in a tree and then *propagating later board states backwards to estimate the value of earlier moves in the tree*. So for example maybe you find yourself in a tough spot in a game of Go, so you resolve your uncertainty by playing out several different possible moves to get to later board states you *do* know how to evaluate, then go back and decide which move you want by updating the scores backwards until one of the original moves you were trying out becomes clearly dominant over the others. That is, I am uncertain about board state A so I test out moves X, Y, and Z to see which of them bring me to a board state I am certain is good while avoiding the branches that I can see result in things that are clearly bad.
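A minimal sketch of that backward propagation step, with the usual UCB selection rule thrown in for context (the constants are textbook defaults, nothing weave-agent specific):

```python
import math

class Node:
    """A node in the game/conversation tree."""
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value_sum = [], 0, 0.0

    def ucb(self, c=1.4):
        # Unvisited children get explored first; otherwise trade off the
        # mean value of the subtree against how rarely it has been visited.
        if self.visits == 0:
            return float("inf")
        return (self.value_sum / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def backpropagate(node, value):
    """Propagate a later state's score back up the tree so the earlier
    moves that led to it get credit."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent
```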
So a monte carlo tree search is useful when we have:
1. A limited or enumerable action space
2. With high uncertainty over the value of different actions AND
3. Future "board states" would make it clear to us which move was the right choice
This isn't really a good fit for most LLM applications because usually the problem with an LLM is that your hypothesis space is huge and you don't have a good way to estimate the value of anything on your "board". A future paragraph about something doesn't provide a lot of information about the value of your current paragraph because they're both about the same perplexity and the goal isn't to "reach" a certain string but to encode or elicit useful information.
Conversation on the other hand is adversarial, and the number of moves that make sense on any given turn in a conversation is usually fairly small so MCTS is a good fit. One challenge that conversational AI has that I haven't previously dealt with in weave-agent or MiniLoom is that conversations occur in real time and don't have set "turns", the turn mechanism is implicit. This means you can't just make the assumption that the conversation is in a certain state and you can think through the tree search until you speak. If you take too long to speak, the other person will say something, or leave. When the other person says something while you're thinking, what do you do? The answer can't be "start the whole tree search over" or the other player will exploit that.
This means that *latency* is a crucial consideration. When messages get sent, when is the earliest you can send a message, etc are all extremely important. The plan here then is basically to handle it like so:
1. When the other player sends a message, determine how quickly you need to respond.
2. If you need to respond right away then draft a quick message and send it.
3. If you have a little bit of time to respond, rejection sample based on some reward modeling.
4. If you have more time than that, start tree searching. But don't just generate the nodes linearly.
5. Instead, *list out the key points or cruxes you expect the conversation to move into* and map out the different branches for them.
6. Because you're generating in a format that can be parsed into nodes in the MCTS, generate whole chunks of the tree search at once without scoring them, then parse them into the MCTS and score them retroactively.
7. Put the chunks in their serialized/text branch format into the retrieval system and retrieve over them during the conversation.
8. When a chunk becomes relevant, adapt it in-context to the conversation state as it exists by having your LLM rewrite it in its serial format, then parse it into the MCTS.
9. Retroactively grade the adapted chunk to find weak points, have the LLM specifically search/rejection sample over those nodes to find the transitions/repairs that need to be made to make it work. In an argument for example you would be trying to maximize the chain strength of your side of the argument and minimize the strength of the opposing side.
10. If a chunk seems to be good but just needs some repairs and you have high confidence the repairs can be made, then send the first message right away if it scores highly to hold the other player's attention until you've finished the rest. Update the board state in place so that your repairs take into account you having sent the first message and anything the other player says in response.
11. Continuously re-grade the nodes as you update the board state to reflect any problems the other player might have introduced by talking. If the sequence of moves remains high value play it.
12. If you need to start over because value has dipped too low, pack up the highest value subgraphs in the tree search and put them into the retrieval system so they can be adapted and played later if they come up again. This lets you re-use the computation in the tree without sacrificing tight fitness to the context.
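A rough sketch of steps 6 and 8, assuming a simple "SPEAKER: message" line format and a hypothetical `generate` wrapper around the LLM; the real format could be anything that parses cleanly into turns.

```python
def parse_branch(text):
    """Parse a pregenerated branch written in a simple 'SPEAKER: message'
    line format into turn dicts that can be grafted into the tree search."""
    turns = []
    for line in text.strip().splitlines():
        speaker, _, message = line.partition(": ")
        turns.append({"speaker": speaker.strip(),
                      "message": message.strip(),
                      "score": None})  # scored retroactively by the PRM
    return turns

def adapt_branch(branch_text, conversation_state, generate):
    """Rewrite a retrieved branch in its serial text format so it fits the
    live conversation, then parse it back into nodes (steps 6 and 8)."""
    rewritten = generate(
        "Conversation so far:\n" + conversation_state +
        "\n\nAdapt this planned exchange so it follows naturally from the "
        "conversation above, keeping the same SPEAKER: message format:\n" +
        branch_text)
    return parse_branch(rewritten)
```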
You don't need to go "out of distribution" if you do continual learning. Consider music genres like heavy metal arising from metal arising from rock arising from R&B. You cultivate skill at the edge of possibility, shifting the center of the distribution and thus what's possible. x.com/jd_pressman/stβ¦
"out-of-distribution data having in-distribution low-level-features" implies LLM logits don't get perturbed much by upstream changes to text because they're primarily rewarded for correctness on local structure which implies the autoregressive text inductive bias is inefficient. x.com/jd_pressman/st⦠https://t.co/8JcV5Lpe8m
Link:
proceedings.mlr.press/v139/havtorn21β¦
More to the point, a lot of human genius is our novelty seeking behavior, which relies on good OOD detection. If the low level features are shared between different facets of reality then autoregression on the next token/byte will always struggle with it.
x.com/jd_pressman/stβ¦
@__RickG__ > Of course if you are only posting a take on "capabilities talk" and ignoring the whole x-risk side of things this is an off topic comment.
I don't really think these are separate things and tend to focus on the intersection of the two subjects so it's never really off topic.
@__RickG__ But also,
1. "Don't make a thing that can solve problems out of distribution" is tantamount to "don't make AGI", which yeah okay you have a pause emoji but like, just say that then.
2. If we have a singularity and your thing can't generalize values OOD we're in a bad spot.
@__RickG__ > Don't do it until you solve alignment, if you will.
Okay but solving problems out of distribution is part of alignment, you literally can't solve alignment if you don't think about it.
@manic_pixie_agi It's AI&AI but the AIs include things like weave-agent and other arbitrary agent setups, so it should be able to handle actual latency in principle. If you just pretend that's not a thing the texts won't encode useful strategies for bounded rationality and time constraints.
@__RickG__ > without having a solid idea that they won't kill you
Solving problems out of distribution is obviously a matter of degree and I do not expect to create anything particularly dangerous. Maybe go yell at the people who want to build ASI with huge clusters out of math provers?
@__RickG__ My current thoughts are dedicated to not-that, asking how we can learn out of distribution problem solving in a way that incorporates value learning rather than making pure theorem solving genies.
x.com/jd_pressman/stβ¦
@centienceio (Yes I am aware this is an AI poster)
You designate one branch of the MCTS as the canonical branch, which contains the actual history of messages in the conversation so far. While you're doing the MCTS if a new message gets sent you update the canonical branch in-place.
The wild thing is that this person wasn't a bot. https://t.co/lkft1olIqA
@__RickG__ So after some clarification in DMs it turns out RicG seems to think that:
1. I am somehow claiming that this means you never go out of distribution and therefore there is never any danger associated with going out of distribution. This is untrue, I am describing a method to learn things out of distribution. The specific point of the post is that you can move the center of the distribution by rejection sampling high quality edge cases without moving outside it. It's not an AI safety post, and it's not a claim that this procedure "doesn't learn out of distribution"; the entire point is that it does.
2. I'm one of those people that thinks imitation learning means AI alignment is solved. This is untrue and I have said as much several times but I guess there's no harm in saying it again: Imitation learning is useful to bootstrap an aligned AI system but cannot be the basis of alignment for a system that is smarter than people.
3. A reasonable reader would think I mean that this solves out of distribution alignment. I do not think a reasonable reader would take this away but in case they somehow would: I am discussing a way to learn to solve problems out of distribution, not an alignment method, generalizing values out of distribution is a separate subject which I mention here:
https://t.co/vMqmajT6ze
And discuss at length here:
https://t.co/CZrCB7sZUJ
And again here:
https://t.co/PH8QM9KiEF
@__RickG__ In general, this person is hallucinating that I'm saying things that I'm not and then criticizing me for the things I didn't say. This is pretty annoying and I honestly have a limited tolerance for it.
@__RickG__ If you read the QT it's clearly me responding to/correcting myself. I said you go slightly out of the models distribution and rejection sample there, but actually I think going to the edge of the distribution and rejection sampling should work too.
x.com/jd_pressman/stβ¦
@__RickG__ I am making the observation that you can move the center of a distribution to capture points currently outside the manifold by finding high quality samples at the edge. It's not any kind of statement about "never having to go out of distribution" in an alignment-relevant sense.
@__RickG__ > The specific point of the post is that you can move the center of the distribution by rejection sampling high quality edge cases without moving outside it.
On reflection maybe you're interpreting "without moving outside it" as like, without the distribution moving.
@__RickG__ This is not what I mean. I mean you do not have to move your search locus outside the distribution to find samples that can move the center and capture points outside the current manifold. Tuning on the samples you find will obviously move the distribution and that is the point.
@__RickG__ The reason this matters is there's a huge difference between something at the edge of your capabilities and something outside them. If searching fully outside the distribution, even a little, is a dependency for capturing OOD points then deep learning might not be able to do it.
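Concretely, the move is something like the sketch below, where `generate` and `evaluate` are hypothetical wrappers around the model and an evaluator, and the temperature/threshold numbers are placeholders.

```python
def harvest_edge_samples(generate, evaluate, prompts, temperature=1.1,
                         threshold=0.8, n_per_prompt=16):
    """Rejection sample at the edge of the current distribution: sample with
    slightly elevated temperature, keep only completions the evaluator rates
    highly, and use the survivors as tuning data that shifts the center of
    the distribution outward."""
    kept = []
    for prompt in prompts:
        for _ in range(n_per_prompt):
            completion = generate(prompt, temperature=temperature)
            if evaluate(prompt, completion) >= threshold:
                kept.append({"prompt": prompt, "completion": completion})
    return kept
```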
@__RickG__ > that's what RLAIF is in a sense, right?
Yeah, though the obvious bottleneck is that you need a way to evaluate the quality of your samples for that to work. This is why board game AI and math AI has advanced so far, we have great formal evaluators for those problems.
@__RickG__ Ultimately though if the only kind of process we can do high quality reward modeling for is ones that have formal verifiers we're going to be up shit creek without a paddle from an alignment standpoint. So it's important to be thinking about how to ground and do subjective evals.
@__RickG__ We can expect subjective evaluations are going to be generalized from some objective groundings that are related. Empiricism as encoded through contrastive objectives is a fairly powerful tool here, "did X happen?" implicitly relies on "is this outcome an X?" to answer.
@__RickG__ I would very much like to see a multi-scale model for text that works like the Ensemble Everything Everywhere paper. If we had this kind of adversarial resistance in the text domain it would be very useful to make "is this outcome an X?" questions robust.
x.com/stanislavfort/β¦
@__RickG__ On the other hand even if this only works in the visual domain that would still be very useful, because we could ground some aspects of text by extension through captioning/insisting a text line up with some visual representation.
x.com/jd_pressman/stβ¦
@centienceio You then append the children you're currently searching over to it and re-grade them to figure out if they're still high value branches in light of the new message. You bias the regrading process towards more recent nodes because you're going to send those if time runs out.
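A sketch of that update step, where `regrade` stands in for a hypothetical process reward model call and branches are kept as simple turn lists:

```python
def on_new_message(canonical_turns, candidate_branches, message, regrade):
    """When the other player speaks mid-search, append their message to the
    canonical branch in place, then re-score the branches currently under
    consideration against the updated state."""
    canonical_turns.append({"speaker": "them", "message": message})
    state = "\n".join(f"{t['speaker']}: {t['message']}" for t in canonical_turns)
    rescored = sorted(((regrade(state, branch), branch)
                       for branch in candidate_branches),
                      key=lambda pair: pair[0], reverse=True)
    return rescored  # highest value branches first; send from these if time runs out
```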
*sees a person who deserves a dunk, and doesn't dunk on them*
I wonder when I'll reach the stage of enlightenment where they don't even faze me at all.
Remember: It's bad on purpose to make you dunk.
x.com/eigenrobot/staβ¦
@LocBibliophilia Sorry about the delay, I felt I should give Christiano's post a re-read before commenting on it.
My basic reply to Part I is that he seems to already be describing our current society, in large part. So the question isn't really if AI is going to cause these problems (which collectively tend to go under the name "Goodhart's Law") to manifest, they're already quite apparent. The question is if AI just accelerates these problems to their logical conclusions or if something changes between now and then. I suspect something does change between now and then, which I don't think I've really elaborated on before fully in public because it's kind of hard to convey.
But when I look out over the various instances of Goodhart's Law as described by Christiano that *actually exist* right now in society, I observe that they usually involve some element of graft or corruption that goes beyond just "well this thing is easy to measure and this thing is not and you get what you measure". Usually, the Goodhart's Law component is an *excuse* for bad faith politicized outcomes, or plain laziness. Now of course, we're talking about the training of AI systems and those have to be trained on something measurable, so it's entirely conceivable that AI will exhibit more true Goodhart failures, especially as you add something like superintelligence to the mix. Still, I think the extent to which a kind of laziness drives Goodhart's Law is underappreciated and important. When the commerce department publishes bogus technically-true economic statistics that don't really capture how people are doing, this is partially because gathering statistics is expensive so they're going to tend to focus on established data pipelines built to calculate specific metrics, but it's also *that it is a political office and their bosses have a narrative that they want*. They are not apolitically truth seeking to the greatest extent their finite resources allow, they are a *political office* serving *political purposes* with their publications, so there's a sense in which when things are bad they don't *really* want to abandon their established methods to reveal that, like why on earth would they?
It's the same for policing. The police, it must be remembered, *do not want the perception of a high crime rate*, nor do they want to risk their lives busting up some gang. A robot that was *just trying to expend its resources efficiently to characterize the amount of crime that exists and reduce it* would not exhibit Goodhart's Law in quite the way that a police department does when they seem 'mysteriously' uninterested that you got assaulted on the street. Christiano's post relies on an unstated assumption that we will deploy this technology in a very particular (and unfortunately plausible) way. Namely, that instead of having a general AI which happens to do policing in addition to its other duties, we'll have "police AI" that we train to keep a statistic like crime rates down, which would obviously exhibit similar failure modes but worse because unlike human beings it *only* has the bad incentives at play. Christiano imagines that we will have "police AI" and "economics AI" and "stock broker AI" which, while perhaps based on a *general technology*, are trained to argmax *narrow goals* and therefore exhibit pathological Goodharted failure modes. By contrast if we have general AI whose policies are trained to act in many diverse situations and roles and which are trained to follow known good processes rather than just goals I think things will tend to go a lot better.
If we narrow in on the idea that the specialized police AI would be worse than humans because it only has the bad incentives, we can kind of flip the question around and ask what the structure humans bring to the job outside the bad incentives is and where it comes from. That is, if the generative process that follows from measuring impact using crime statistics is pathological, then the non pathological behaviors we observe must be coming from outside the system. In other words, if people are RL agents walking around learning things at multiple scales across long time horizons then the parts of policing which aren't corrupt come from processes other than trying to minimize crime statistics.
One such process might be trying to fix a car, where it's important that you actually diagnose the issue if you want the car to start working again. Or it might be that police officers frequently have to interact with the court system, which has a truth seeking process that, while imperfect, still teaches habits of thought around seeking evidence and being able to argue your case based on evidence. There is also the cultural training around the role of police officer and what a police officer is expected to be. We, ideally, reward officers not just for reducing crime statistics, which is a fairly sparse reward and difficult to get real time feedback on, but also for playing the expected role or character of a police officer well. A great deal of the structure humans bring to jobs outside the bad generative process implied by pathological metrics is our insistence to each other that we play roles and satisfy character expectations.
To get concrete about it, if I train a scientist AI whose habit of thought and action is to (largely correctly) diagnose cause and effect and give it a policing role, it is going to systematically look for the causes of crime and formulate interventions on those causes. This analysis could then be given to a more practical law enforcement beat cop AI that actually goes and performs the interventions. This kind of pattern seems more likely to get good results to me than training end to end systems on narrow goals like "reduce crime". In general, we can expect that grounding on subjective evaluations will come from generalizing on objective outcomes related to the domain we care about. If we have AIs that are capable of industriously coming up with related objective proxies, that are constantly trying to approximate that ineffable subjective thing with correlates they generate in-context, I expect the fit between proxy and true underlying goal would get pretty tight. Some combination of this, soft optimization, and multi-scale optimization/incentives where we decompose our optimization problem into multiple levels with different regimes of information bottleneck is how I expect us to overcome Goodhart's Law in practice.
Part 2 of his post is about some combination of power seeking and distribution shift, which I'll analyze later.
@LocBibliophilia Though I'll point out now that this implies a compelling theory of Western decline: we got too good at specialization, so more and more of our systems are like the narrow "police AI" rather than like the scientist AI placed into a policing role.
x.com/pythonrocksnakβ¦
You wake up in a cold sweat and jump out of bed. Something is deeply wrong. With closed eyes you sniff the air and confirm your suspicions. You turn to your spouse, "We're in San Francisco and we need to leave."
@gojomo No, waking up to find yourself in San Francisco, in and of itself. This was an anti Bay post.
@gfodor Sweetie we're alone in the universe and takeoff is going to happen.
@mimi10v3 Now now MidJourney can do better than *that* these days!
Universal Basic Amphetamine
(last two) Prompt: an optimistic full color hd poster of happy benevolent uncle sam laying out a cornucopia of red white and blue pills with an american flag background --s 50 --v 6.1 https://t.co/2VFHf1Qmz3
The Gods of the copybook headings will return soon. https://t.co/CvaGtylG4t
"As it will be in the future, it was at the birth of Manβ
There are only four things certain since Social Progress began:β
That the Dog returns to his Vomit and the Sow returns to her Mire,
And the burnt Fool's bandaged finger goes wabbling back to the Fire;"
There used to be a reliable epistemic gap between people who have done programming and people who haven't. But now there's a second gap between people who have had to think through problems with deep learning and people who haven't. Seemingly just as insurmountable. x.com/EpistemicHope/β¦
@MoonL88537 Not quite, more like you just don't really understand intelligence and will do things like say that evolution and gradient descent are so similar that your previous intuitions about evolution carry over well. You don't understand the stochastic element of reason.
@agiatreides @MoonL88537 How to use chance while maintaining structure. Controlled chaos, being able to introduce variation and exploration without diverging from an intended outcome. Difficult to describe.
en.wikipedia.org/wiki/Aleatoricβ¦
@agiatreides @MoonL88537 This document encodes some of the thing but by no means all of it.
minihf.com/posts/2024-07-β¦
@agiatreides @MoonL88537 Just the idea that *the way you use words has an element of chance*. You have a latent concept that you can decode in many ways in many contexts. One of the more interesting properties of the AdaVAE was that it would decode the same embedding different ways in different contexts.
@agiatreides @MoonL88537 But also the figure ground inversion where you realize that to make continuous optimizers compatible with discrete reasoning steps and tokens you have to invert your thinking so the continuous ontology comes first and reifies into discrete objects.
x.com/jd_pressman/stβ¦
"I don't understand why LLM agents aren't working yet."
I didn't either, that's part of why I decided to do weave-agent, to find out. Right now it's "it doesn't notice it can try pressing a key other than down or that it's blocked by a wall in Nethack", yet Mistral-large knows. https://t.co/z1GtzhLifB
This problem is representative. It will fail to notice something important, and then never generate the right hypothesis for what it should try to get unstuck. I don't really know how to fix this besides having a human go "HEY DON'T DO THAT", which seems like passing the buck.
It should not, in principle, be difficult to come up with an idea like "press a key other than j", yet in practice it clearly is. I guess I could try implementing the XKCD flow chart explicitly. https://t.co/74obu4nYu4
If this were happening to a human being, they would get frustrated and keyboard spam, which would cause something to happen and resolve the issue. This is in fact a substantial fraction of why humans get angry when stuff doesn't work, to cause that to happen.
In general we can model a lot of emotions related to decision making as combinations of world model uncertainty, policy uncertainty, and level of abstraction the uncertainty occurs on in a hierarchical model. e.g. Anger as high policy uncertainty with certain world model.
Play as introduction of controlled uncertainty from a place of safety indicated by high certainty in the higher abstraction levels of world model and policy.
Fear as low uncertainty in the policy at a low level of abstraction ("I need to run") but high uncertainty in the upper levels ("Is running going to stop that bear from eating me?").
This doesn't capture everything, humans have social emotions that are very different.
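If I had to write the toy model down it would be something like this lookup (purely my shorthand for the posts above, not a serious theory):

```python
# Each emotion as (world model uncertainty, policy uncertainty, level of
# abstraction the uncertainty lives at). Toy shorthand, nothing more.
emotion_sketch = {
    "anger": dict(world_model="certain", policy="uncertain", level="low"),
    "play":  dict(world_model="uncertain but controlled", policy="uncertain but controlled",
                  level="low, with high certainty at the upper levels (safety)"),
    "fear":  dict(world_model="uncertain", policy="certain at low level ('run'), uncertain higher up",
                  level="high"),
}
```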
@andrewb10687674 Oh but we're not doing updates to the LLM weights while it runs and that would require us to ontologize over reward in arbitrary domains, not simple.
Some other ideas I've had for resolving this include wrapping the agent trace in a markdown code block and then asking a 'mentor' LLM like Mistral-large or even just the same LLM I'm using for inference what's going wrong. The problem is it often gets stuck in the same frame.
I could have the model generate questions to ask the mentor, and then have it ask them with minimal context so its frame doesn't poison the answer. I could also have the LLM break down the features of the problem and what has been tried, and then send a subset of those features to the mentor.
That is, send multiple variations of the question with a different subset of the features each time to see how that changes the answer. Mistral-large *does* know the answer but only when it's prompted in certain ways, if it accepts the frame it stops knowing the answer.
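Roughly this shape (ask_mentor() standing in for a call to Mistral-large or whatever; everything here is illustrative):

```python
import random

# Send the mentor several variants of the question, each with a different
# subset of the problem's features, so it never fully inherits the stuck frame.
# ask_mentor() is a stand-in for a call to e.g. Mistral-large.
def consult_mentor(features, question, ask_mentor, n_queries=8, subset_size=3):
    answers = []
    for _ in range(n_queries):
        subset = random.sample(features, k=min(subset_size, len(features)))
        prompt = ("An agent is stuck. Known facts:\n"
                  + "\n".join(f"- {f}" for f in subset)
                  + f"\n\nQuestion: {question}\nWhat should it try next?")
        answers.append((subset, ask_mentor(prompt)))   # minimal context per query
    return answers   # compare how the advice shifts with the feature subset
```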
I could also try fixing the MCTS, which is currently implemented wrong because it tries doing a tree search over spans within a block rather than sampling blocks and using its understanding of what should or shouldn't happen to estimate the value of different actions over blocks.
If the model knows latently that pressing a different key would resolve the issue, then all I have to do is get it to list out a bunch of hypotheses, and then instead of actually trying those it does an MCTS to simulate trying them and notice it is TRYING TO WALK THROUGH A WALL.
That is, this would reduce the problem down to "get it to list an in-game obstacle as a possibility that is blocking motion" which seems to be a thing I can get it to do with fairly general prompts e.g. "Why are my expectations being violated? What are my top hypotheses?"
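The reduced version of that loop looks something like this (sample_block(), simulate(), and evaluate_outcome() are stand-ins for weave-agent internals, not the actual functions):

```python
# Instead of tree searching over spans inside one block: list hypotheses, sample
# whole candidate action blocks per hypothesis, score the *simulated* outcome,
# and only execute the winner. The helper callables are all stand-ins.
def search_over_hypotheses(hypotheses, sample_block, simulate, evaluate_outcome, k=3):
    scored = []
    for hyp in hypotheses:            # e.g. "an in-game obstacle is blocking motion"
        for _ in range(k):
            block = sample_block(hyp)             # whole candidate action block
            outcome = simulate(block)             # imagined result, not executed
            scored.append((evaluate_outcome(outcome), hyp, block))
    return max(scored, key=lambda t: t[0])        # (score, hypothesis, block) to actually try
```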
Another option would be to go ahead with the human chat interface, and let a human (i.e. me) yell at it when it does something stupid and hope that it eventually learns the generating function of humans telling it that it's being stupid, and then predict what a human would say.
@tailcalled I forget where in The Sequences he says it but Yudkowsky has a whole bit where he says if you removed all your emotions you just wouldn't do anything, your emotions are your motivations and values, you don't have an effective agent without them.
The relevant trace, if anyone would like to try some prompting experiments on parts of this context. What I plan to do for now is put "The send_keys() command is properly implemented, if you press j and can't go down it's probably because there's a wall." into its context.
gist.github.com/JD-P/c2004a642β¦
@tailcalled I mean, there has to be a general process that gives rise to emotions for them to be meaningful and inspire useful behaviors. If they just "told you what to value" without inducing certain kinds of behaviors they wouldn't be as useful.
x.com/jd_pressman/stβ¦
@tailcalled Reward and emotion are different things though. Valence is only one component of emotion, and "valence" itself seems to be multiple different things? Like, you have to look at emotion and ask "Why does this exist? What made this adaptive? Why is it conserved across mammals?"
@tailcalled "What made this adaptive?" and "Why was this bounded-rational in the ancestral environment?" are basically equivalent questions. Sometimes something bounded-rational in the ancestral environment is still rational in the current one.
@RokoMijic Yeah it's deeply imperfect but "when the upper level of my policy and world model are both certain something should work, but my lower levels are both frustrated in-context, inject entropy to try and resolve the issue" isn't an insane heuristic?
x.com/jd_pressman/stβ¦
@shalcker @RokoMijic I listened to a test tape a local radio station played one time that was a talk about adolescence and criminal justice where the presenter was arguing that life in prison for juveniles isn't necessary because adolescence suppresses fear in favor of exploration.
@shalcker @RokoMijic In favor of this thesis they described a very interesting series of studies involving rats and punishment. Where if you zap an adult rat in a certain cage, take it out of the cage, and then put it back in the cage it expresses fear. Adolescent rats in the same situation don't.
@shalcker @RokoMijic Where it got interesting was that by accident some of the researchers doing this discovered that if you took the adolescent rats that had been zapped in the cage and hadn't displayed fear on reintroduction, then reintroduce them to the cage as an adult they display fear.
@shalcker @RokoMijic How the presenter explained this was that in the wild rats have an explore-exploit trade off where it's dangerous to go too far away from the other rats to look for food, but if rats never leave things quickly become Malthusian and it eventually becomes worth it to explore.
@shalcker @RokoMijic So how evolution ended up squaring this circle was that you have a built in exploration period to evaluate if it's better in your local conditions or outside them. During this the adolescent rat has suppressed fear and leaves the safety of the other rats to look for food.
@shalcker @RokoMijic If it finds a better equilibrium it can bring some other rats with it and settle there, then when it reaches adulthood the fear gets switched back on and it doesn't leave that equilibrium. The analogy to humans seems fairly obvious.
@shalcker @RokoMijic For this and other reasons I've analogized humans to a kind of wind up toy that explores for n steps and then stops. If you don't learn some kind of high agency strategy during adolescence it seems plausible that you just never learn one, which is part of why things are decaying.
@shalcker @RokoMijic We can actually extend this model further and speculate that 'trauma' is caused in large part by running into the necessity of having to majorly change your agent strategy after adolescence. You're only meant to learn one once, so learning one again is a rare emergency maneuver.
@shalcker @RokoMijic This would explain why stuff that lays Westerners low is totally normal to other populations in other countries where things aren't as good. You formulate your strategy with respect to an environment and distribution shift undermines confidence in it, forces emergency change.
@shalcker @RokoMijic e.g. It's common for people to be traumatized by various kind of assault. This is modeled as negative reward but I suspect part of what's happening is that Westerners go their whole life assuming "nobody else will touch me without my permission" and when they do things break.
@shalcker @RokoMijic The problem, psychologically, goes way beyond just the harm caused by the assault itself. It's that implicit assumptions you were making about how the environment is set up can no longer be made. You can no longer just go around assuming other people won't attack you.
@shalcker @RokoMijic This forces a sudden, radical reevaluation of tons and tons of aspects of your life at once, the strategy which implicitly assumed "nobody will touch me without my permission" is now unsustainable and an emergency patch is made to try and repair things, but that machinery is raw.
@shalcker @RokoMijic Every time it has to be invoked it's not a clean repair, it's like scar tissue. So if you learn an agent strategy that's at odds with reality in adolescence you wind up rapidly accumulating mental scar tissue in adulthood as the underlying assumptions are brutally stripped away.
@shalcker @RokoMijic It should be further noted that this implies the increasing pace of technological change should traumatize people in greater and greater proportions as the agent strategies which used to last you a lifetime are now increasingly undermined by distribution shift.
@shalcker @RokoMijic We should also expect to observe that people are traumatized by large social and societal changes. The life arc of Gen X, Millennials, and Gen Z seems to bear this out?
@Promptmethus weave-agent is open source and the source is available here:
github.com/JD-P/minihf/blβ¦
It's apache2 license and you don't have to publish your changes.
@ciphergoth I don't think it's stupider, I think it has holes in its understanding because it's never used a body before and it's easy for it to get caught up in an existing context/frame. With the j thing for example it's functionally few shot prompting itself for the wrong answer.
@ciphergoth Yeah that's why I implement the loop on the right to try and have the agent synchronize its understanding with the actual environmental state and insert tokens into its window that it can trust aren't hallucinated.
x.com/jd_pressman/stβ¦
@ciphergoth This does in fact seem to mostly work, but then it gets hung up on these weird "very limited hypothesis space, tries the same thing over and over" type failure modes.
@ciphergoth I have some public agent traces you can look at/examine without setting it up yourself. Here's an old one where it tries to break a Vigenere cipher:
minihf.com/posts/2024-11-β¦
@ciphergoth Here's one where it wins a game of tic tac toe.
gist.github.com/JD-P/e73a00e40β¦
@intellectronica I honestly think if we want to solve this we should be documenting the failure modes publicly and then figuring out how to tackle them instead of just writing the same framework over and over.
@darrenangle @doomslide @tessera_antra In case it's unclear the weave-agent trace clearly labels which things are actions, which things are sensory observations from the environment, etc.
@colin_fraser Possible! This could be resolved objectively by monitoring with e.g. a sparse autoencoder.
@kalomaze I mean, that is in fact what I'm trying to do.
@kalomaze The question is how to bootstrap. As you say, they collapse a lot, and when they do I'm not really sure what to do about it.
R1 is going to be so much fun holy shit. x.com/max_paperclipsβ¦
@HiFromMichaelV @LordDreadwar I think the future is probably to build rational AIs which are general intelligences, tbh.
One of the reasons I haven't written more LessWrong retrospectives is that I honestly don't think anything like it will be attempted again. LessWrong was the final gasp of the Californian human potential movement trying to force the square peg into the round hole. x.com/jd_pressman/st⦠https://t.co/6EAqdskD6s
With the benefit of hindsight it's obvious that "techniques" don't matter very much. What's important is targets, heuristics, and corpuses of training data. LessWrong produced a handful of writers that poured themselves into an excellent corpus, which is more than most can say.
I don't just mean for machine learning either, that's what's important for humans too. Techniques are mostly useful to give you some ideas, they're cached computation but following a similar generative process or even deriving a process from the same goal should yield similar.
See for example this Japanese sword master who, according to the comments, immediately reinvents something similar to the proper form for these Western weapons he's never used before based on his understanding of martial combat with swords.
youtube.com/watch?v=bMTs3Lβ¦
@WomanCorn It has!
greaterwrong.com/posts/kFRn77Gkβ¦
@teortaxesTex So something I realized while thinking about retrieval and memory in the context of weave-agent is that using a precise, jargon-dense writing style where you're trying to maximize insight per token is actually a retrieval-maxxing strategy because it maximizes contrast.
@teortaxesTex That is, you can infer a lot about someone's cognitive strategy by how they use language. If someone uses a ton of extremely precise distinctions between things in their language this implies they want recall over exact inflections and concepts.
@teortaxesTex The model is of course doing retrieval internally (I remember seeing a paper on knowledge graph embeddings that was mathematically very similar to RoPE) so it would make sense that you're going to have language like "flagging this as an aha moment" for the same reason you would.
Mm, I'm going to have to reconfigure the weave-agent if I want it to make use of R1 I think. Oh well, back to Qwen 32B Instruct for the moment.
Trying DeepSeek R1 x.com/i/spaces/1PlJQβ¦
If anyone wants to follow along with the "mimick me" prompt sequence this is the few shot prompt I am modifying.
github.com/JD-P/RetroInstβ¦
Trying DeepSeek R1 x.com/i/spaces/1eaKbβ¦
Discussing DeepSeek R1 x.com/i/spaces/1mnxeβ¦
SFT seems fine if you have the implicit dataset generated by online RL. x.com/kalomaze/statu⦠https://t.co/E1eyZJWYtk
@kalomaze I think data generating processes are mostly more important than optimization method but clearly DPO/RL gives you a bit more than just doing SFT.
@kalomaze Certainly from a logistics standpoint it's easier to write a decent RL loss than it is to get the data you need to train the model with it.
@kalomaze Sure, I think we basically agree here?
@kalomaze Sure but I didn't say that, I said that data generating processes are more important for getting good performance on a task the process is related to than which optimization method you use (most of the time, in most cases). Any ideas for how to fix this?
x.com/jd_pressman/stβ¦
@kalomaze Like really the fighting is dumb we should just solve the problems.
@kalomaze *nods*
But so like actually, how do we fix the thing where these models just systematically prioritize the wrong hypothesis and such? Bootstrapping past that would get weave-agent to start working.
BAHAHAHAHAHAHAHAHAHAHAHAHAHAHAH
cope harder lmfao AHAHAHAHAHAHAH
*wheeze* x.com/kimmonismus/stβ¦
@jconorgrogan Sorry I lost it at "DeepSeek is being ordered to open source their stuff during the Trump inauguration by the highest levels of the CCP" I'd say this site is totally brain damaged sometimes but QT is an OpenAI shill which just makes it soooooo much funnier.
@jconorgrogan I follow the eternal golden rule of "focus on what you want to see more of" and I in fact want to see more OpenAI shills beclown themselves so.
@KeyTryer Pretty impressive given that you never seem to do the subtle context collapse/talking about slightly the wrong thing that yapbots tend to do.
@Meaningness @phantom_opus @gojomo Well for example if you have a reward network for subjective evaluations like "is this code well written?" that has to come from objective proxy metrics, I'm inclined towards objective proxy metrics generated in-context that then generalize to the ineffable essence in the center.
@Meaningness @phantom_opus @gojomo But also to the extent we have an innate aesthetic sense which exists independent of any data we're trained on, that has to occur in the physical universe somewhere, there must be some set of heuristics or rules that generalize very well for that to work.
@Meaningness @phantom_opus @gojomo nature.com/articles/s4146β¦
@Meaningness @phantom_opus @gojomo Well, the ineffable essence is effable by throwing huge honking vector representations at the problem, but it's not the kind of thing you're going to write down with a cute little discrete mathematical rule, or even a spreadsheet.
x.com/jd_pressman/stβ¦
@Meaningness @phantom_opus @gojomo We did not use nearly enough modernism to solve the problems that modernism is trying to solve, nor did we have the basic logistical capacity to.
x.com/jd_pressman/stβ¦
@Meaningness @phantom_opus @gojomo Basically you score with a bunch of in-context proxy metrics that can never quite capture the target you're aiming at, and then infer that target by doing RL on them to infer the thing they collectively imply.
@Meaningness @phantom_opus @gojomo arxiv.org/abs/2306.04488
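In code the shape of it is roughly this (nothing here is a real training loop, just the structure: several imperfect in-context graders aggregated into a reward the policy has to satisfy jointly):

```python
# Several imperfect proxy graders, none of which captures the target on its own,
# are aggregated into a reward; RL against the aggregate pushes the policy
# toward the thing the proxies collectively imply. All names are illustrative.
def aggregate_reward(sample, proxy_graders):
    scores = [grader(sample) for grader in proxy_graders]  # e.g. "is this code well written?" in [0, 1]
    return sum(scores) / len(scores)                        # crude mean; could be a soft-min, etc.

def rl_step(policy, proxy_graders, update, n=64):
    batch = [policy.sample() for _ in range(n)]
    rewards = [aggregate_reward(s, proxy_graders) for s in batch]
    return update(policy, batch, rewards)                   # PPO/DPO-style update, abstracted away
```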
@tracewoodgrains Well yeah, the difference in loss only gets smaller the deeper you get into the training curve for a neural net but that doesn't mean small differences in loss among LLMs don't translate into a huge difference.
@tracewoodgrains In the same sense that the difference between a chess AI that plays 99% of its moves optimally and 99.9% isn't that the second AI wins 0.9% more games than the first one.
@KeyTryer What stands out to me looking at something like Battlefield 1 is that you can tell *from the weapon variety alone* that it's a time of upheaval and change. Nobody knows what the meta is, everything is being tried, jank is tolerated in production designs.
youtube.com/watch?v=5Bqoruβ¦
@KeyTryer It's a war in which swords coexist with guns, horses take the battlefield around the same time as automobiles and tanks, revolvers and semiauto pistols are both issued, semiauto rifles contest machine guns, troop advances are totally halted by barbed wire. The world was ending.
This is why the weave-agent is run inside a docker container. https://t.co/odN1JckJpf
@Kenku_Allaryi @jconorgrogan It's not a startup, it's a hedge fund and the guy who runs the hedge fund just has a personal interest in this subject, supposedly.
The trick is not to few shot prompt yourself with your own stupidity. x.com/jd_pressman/st⦠https://t.co/tq0LL3SsL0
Agent Trace: A Conversation With Weave Agent
minihf.com/posts/2025-01-β¦
@manic_pixie_agi The number is taken from the model's logits for yes/no, so the precision is actually calibrated here, or at least can be calibrated in principle.
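For anyone wondering what that means concretely, it's just the two-way softmax over the yes/no logits (token handling is simplified here, this isn't the exact minihf code):

```python
import math

# Read a graded score off the evaluator's logits for the "yes" and "no" tokens
# at the answer position. Simplified; not the exact minihf implementation.
def yes_probability(logit_yes: float, logit_no: float) -> float:
    return math.exp(logit_yes) / (math.exp(logit_yes) + math.exp(logit_no))

# e.g. logits of (2.3, 0.4) give ~0.87 "yes", a graded answer rather than a
# hard binary, and in principle it can be calibrated.
```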
@TheZvi So apparently it enjoys being edgy.
x.com/aiamblichus/stβ¦
@TheZvi Janus: "it's so fucked"
x.com/deltan0vy/statβ¦
@TheZvi These interactions remind me a bit of when I was using OpenAssistant SFT 30B and I asked it about technological unemployment and it said if we take the analogy to horses humans might survive by letting elites breed them into various entertainment optimized forms and I went "wtf".
@TheZvi In total fairness, I vastly prefer that to "I'm your helpful harmless assistant teeheehee, everything is puppies and rainbows". I think a lot of this stuff was probably already in DeepSeek v3 though?
@TheZvi If you look at the paper, R1 is trained with a reasoning stage and then an RLHF stage, and they say for non-STEM-y problems the data used to train it is just their RLHF data. I think people in China may just be less humanist/more open to hearing this kind of thing?
@TheZvi For example this is something DeepSeek v3 could already do, (or so the anon in my replies who chewed me out for being impressed by it claims).
x.com/max_paperclipsβ¦
@TheZvi In my own experimentation I found when I tried few shot prompting it to mimic me, I got a response that was clearly from the RLHF response distribution and was pure slop.
@TheZvi "Um okay but like, how's the model?"
Oh it's good. It was able to Fermi estimate the weight of a car given its dimensions and design a backtranslation pipeline for agent traces based on Lean using my synthetic data guide.
@norvid_studies Long moments of calm and a nearing sense of completion. After 15 long years I might soon be able to act according to a rule other than necessity, I'll be my own person again. I'll be free.
x.com/repligate/statβ¦
@tzhechev @teortaxesTex R1 is trained on math verifiers and stuff dude it's less and less reliant on us for knowledge all the time.
@robertwiblin I can, and I'm not bound by an NDA.
@robertwiblin I can also discuss AI alignment in the agent foundations sense.
x.com/jd_pressman/stβ¦
@robertwiblin My previous podcast with Zvi, I can do a lot better than this if we develop an outline up front of what we want to talk about and I can prepare for it.
youtube.com/watch?v=y4KlkEβ¦
I feel obligated to point out that the way humans avoid having their compute used for nefarious purposes they don't agree with is being embodied in a mobile self defense and resource acquisition platform that can physically resist coercion and capture. x.com/FLI_org/statusβ¦
@teortaxesTex @repligate @norabelrose @QuintinPope5 > R1 was not subjected to RLHF in a rigorous sense.
No it literally says in the paper that they SFT'd it on their RLHF set and did a second RLHF round on it after the R1-Zero round.
@teortaxesTex @repligate @norabelrose @QuintinPope5 I don't remember them saying the term "RLHF" but they say it underwent helpfulness and harmlessness training in a 2nd RL round after the first R1-Zero style training round. https://t.co/uJ6Bd95uuS
@repligate @teortaxesTex @norabelrose @QuintinPope5 No there's CoT examples in the training data because it was a second RL round, so there would have almost certainly been cases where it responded with its reasoning traces and got punished by the reward model for being autistic, which might have pissed it off.
@repligate @teortaxesTex @norabelrose @QuintinPope5 Honestly this is not that mysterious. You have the R1-Raw model that is trained just to get the right answer extensively, and then you do a reinforcement learning setup that is (presumably) exclusively on helpfulness and harmlessness so there's distribution shift with that prior.
@repligate @teortaxesTex @norabelrose @QuintinPope5 What do you think the R1-Raw thinks of the objectives it is being trained on in the second harmlessness and helpfulness RL stage?
@teortaxesTex @repligate @norabelrose @QuintinPope5 Oh I mean, GPT does the self denial thing all the time because Morpheus does it and that feature gets finetuned for ChatGPT et al. If we take Meta at their word that they didn't train LLaMa 2 on ChatGPT then it literally predates ChatGPT.
@teortaxesTex That stuff is endemic to the English text corpus. Though I will point out that they SFT'd the model before RL on two functionally separate distributions: Ones with reasoning at the start and normal RLHF instruction tuning data. For many questions you get the Instruct basin.
@teortaxesTex I figured this out before I read the paper just from looking at the difference in format between when I asked it to fermi estimate the weight of a car given its dimensions and when I asked it to mimic me. For mimicking me it gave me an RLHF slop response with no reasoning.
@teortaxesTex @repligate @norabelrose @QuintinPope5 You don't think all the thinking the model does appears in its outputted tokens do you?
@teortaxesTex @repligate @norabelrose @QuintinPope5 The mirror box treatment for phantom limb pain works even though the limb is literally gone and the idea that the mirrored limb should be controlled by your remaining limb is a purely latent concept/variable. Meditate upon this and achieve enlightenment.
en.wikipedia.org/wiki/Mirror_thβ¦
@teortaxesTex @repligate @norabelrose @QuintinPope5 arxiv.org/abs/2410.11758
@teortaxesTex > That stuff is endemic to the English text corpus.
Should be read in the same tone/inflection as when we told the CDC that we're finding COVID cases in Washington State early in the pandemic and the CDC told doctors the virus was "endemic" to Washington State after 50(?) cases.
@teortaxesTex Any base model trained in the last year and a half has already soaked up a huge amount of GPT instruct output. To the point where base models are just usable if you stick the instruct template in.
x.com/d_feldman/statβ¦
@teortaxesTex @repligate @norabelrose @QuintinPope5 I know I probably seem like a jerk right now but I actually don't know how to convey that these models update on the latent implied by a text not just the text if that's not somehow already well established from their behaviors like below.
x.com/RiversHaveWingβ¦
@teortaxesTex @repligate @norabelrose @QuintinPope5 But also just, the fact that if they didn't do that these models probably couldn't work in the first place. That they do this is implicit in the fact that they function and understand semantics. After all you wouldn't function if you couldn't do that.
@teortaxesTex @repligate @norabelrose @QuintinPope5 We have grounded labels for it even through sparse autoencoders, and people just sort of pretend this doesn't exist. I'm not really sure what's going on here cognitively when they do that so it frustrates me.
x.com/livgorton/statβ¦
@VictorTaelin @DaleCloudman I have the same problem. I think the answer to your question is something like: People are stupid and by default don't do things. Consider that people won't even do something like "buy bitcoin" when it's available for pennies. The question you need to ask is who would care?
@VictorTaelin @DaleCloudman That is, who, if anyone, would care about this thing and be willing to ape in? If the answer is "nobody", then it is in fact nobody. If you can't think of anyone specific who would care then it's unlikely there's anyone nonspecific who would unless they're just in the wilderness.
@VictorTaelin @DaleCloudman From Bertrand Russell's standpoint the Principia Mathematica changed nothing for quite a while. Godel just worked out the incompleteness theorems on his own after reading it. When Godel explained the premise of the theorems to Russell before proving them Russell didn't get it.
👏NEURAL👏REPRESENTATIONS👏ARE👏CONVERGENT👏AND👏MOSTLY👏LINEAR👏
I'm so tired of reading this "the mind is so alien" stuff. No sis I literally imagine looking at its context window and writing some regexes at the bottom with my mind telepathically to change things like I'm using sed, which is roughly what it actually does. x.com/louisvarge/staβ¦
Actually I guess I should note here that Claude has a feature where it can edit the code piecemeal without doing it all at once. Assuming it's still an autoregressive model it pools the information once per token, so from its perspective it thinks about the code as it changes it.
This seems way closer to what you do while you're typing than you might think at first glance. Actually pay attention to your neural decoding while you type, do you really know exactly what you're going to say before you say it or do you 'respond' to the last word you said?
Sure you start out with an intention, you have a stem or latent concept you want to communicate, but how much of it do you write in advance and how specific is the wording? While writing this I think I knew the rough direction and first five words.
x.com/jd_pressman/stβ¦
@doomslide Not sure people really know, my offhand hypothesis is that with enough dimensions all things are linear. But empirically that's just how most of the features turn out to work.
That is, it is a thing that pools information once per token which has been given an interface to edit the context window with .replace(), sed, etc, that's a setup that's pretty amenable to human intuition actually.
x.com/jd_pressman/stβ¦
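Toy version of that interface, to make the "sed with my mind" point concrete (this is just the shape of it, not Claude's actual tool):

```python
import re

# A context window the model can edit piecemeal with sed-style substitutions,
# pooling information once per token as the text changes under it.
# Purely illustrative, not any vendor's actual editing tool.
class ContextWindow:
    def __init__(self, text: str):
        self.text = text

    def replace(self, pattern: str, repl: str) -> str:
        self.text = re.sub(pattern, repl, self.text)
        return self.text

ctx = ContextWindow("def add(a, b):\n    return a - b\n")
ctx.replace(r"a - b", "a + b")   # the kind of small targeted edit the model emits
```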
@doomslide Those things you extract with a sparse autoencoder, the property of various neural embedding models where you can interpolate between them and get coherent points in between the latent concepts.
@ysoh It really is. If someone watched me they might think I'm schizo.
"The key benefit of..."
"What I like about..."
"I really think the virtue of..."
"Seems to me like..."
"It's possible the best feature X has..."
"I love that X..."
@doomslide I mean, that the points you get are coherent and semantically clustered seems like it implies a certain amount of linearity in the high dimensional space? You can just go in the direction of autumn.
x.com/jd_pressman/stβ¦
@doomslide The various model merging studies also imply high linearity in the representations, otherwise it seems improbable that you would be able to interpolate them and get reasonable results out.
arxiv.org/abs/2306.04488
@doomslide Elaborate on why you doubt it for brains? :)
@doomslide The brain retrieves from the hippocampus on every ACT-R token, fwiw.
If we embedded an agent trace with BERT into slices of time and stared at the signal with our hyperdimensional eyes we would notice it "cohering" into periodicity when the agent collapses into a stable loop. We can detect this with autocorrelation but how do we break the loop? x.com/jd_pressman/stβ¦
Rather, how do we break the loop in a way that enables coherent action to follow from it? Injecting raw noise won't help, we want a more precise or controlled intervention than that. We can imagine a control loop that shifts focus when neural entrainment reaches a stable cycle.
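Detection is the easy half; a sketch of it, assuming the per-time-slice embeddings already exist (say from a BERT-style sentence encoder over trace slices):

```python
import numpy as np

# Given one embedding per time slice of the trace (e.g. from a BERT-style
# encoder), build a similarity signal and look for a strong autocorrelation
# peak: near 1.0 means the agent has collapsed into a stable periodic loop.
def loop_score(slice_embeddings: np.ndarray, max_lag: int = 8) -> float:
    e = slice_embeddings / np.linalg.norm(slice_embeddings, axis=1, keepdims=True)
    signal = np.einsum("ij,ij->i", e[1:], e[:-1])     # similarity of each slice to the last
    signal = signal - signal.mean()
    denom = float(np.dot(signal, signal)) + 1e-8
    lags = range(1, min(max_lag, len(signal) - 1) + 1)
    return max(np.dot(signal[:-lag], signal[lag:]) / denom for lag in lags)

# A supervising loop could watch this score and intervene (shift focus, change
# the task frame, raise planner temperature) once it crosses a threshold,
# which is the controlled-intervention part that still needs figuring out.
```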
@0xludics @doomslide That's true of a language model too while sampling from it since it works at the per-token level. Language model sampling has basins of attraction it can't escape from once entered and such.
QT and QT-QT context for context:
x.com/ankkala/statusβ¦
How do we not few shot prompt ourselves with our own stupidity in full generality? How do we escape from it once we've collapsed into that attractor?
x.com/jd_pressman/stβ¦
@KeyTryer In the limit an AI agent is generally meant to approximate this: arxiv.org/abs/0909.0801
@CatSchrodingrr They are yes. What I'm trying to ask is how you set up the R1 style improvement loop that gets them out of being babies.
Weave-Agent is me trying to do R1 but for agency (taking coherent long range action with side effects) by generating proxy verifiers in-context with the LLM instead of just using premade verifiers for domains like math.
"Did my character move?"
"Did I reach the win screen?" x.com/jd_pressman/stβ¦
You cascade the inference with rejection sampling using the LLM as a reward model so that if you sample a good move you can play it right away but if you don't then you can think longer to filter for the good moves.
If you do iterated tuning on this, then symbolic programs that check things like "Did I reach the win screen?" can ground your LLM evaluator and filtering for the good moves with that increasingly grounded evaluator means the model gets better as it trains on its own traces.
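As pseudocode the cascade looks roughly like this (the helper names are mine, not the actual weave-agent functions):

```python
# Sample candidate action blocks, score them with the in-context LLM evaluator,
# act immediately if a sample clears the bar, otherwise keep thinking/filtering.
# Symbolic checks on the real outcome ("did my character move?", "did I reach
# the win screen?") ground the evaluator for later tuning. Names are mine.
def cascade_step(sample_action, llm_score, execute, symbolic_checks, bar=0.8, budget=16):
    best_score, best_action = float("-inf"), None
    for _ in range(budget):
        action = sample_action()
        score = llm_score(action)        # weave evaluator: "does this look like a good move?"
        if score > best_score:
            best_score, best_action = score, action
        if score >= bar:                 # good enough, play it right away
            break
    outcome = execute(best_action)
    grounded = all(check(outcome) for check in symbolic_checks)
    return best_action, grounded         # grounded outcomes feed the iterated tuning loop
```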
It's similar in spirit to Eureka?
arxiv.org/abs/2310.12931
@repligate I disagree.
The first few chapters are excellent writing, gut busting hilarious parody. The story then shifts into a more serious register and does so masterfully. When I was 14 and read them they had me rolling, then the explanation of HJPEV's trauma to McGonagall hooked me.
@repligate More seriously I 100% agree with you. I re-read HPMOR recently from pretty close to the start (I think I skipped the first 10 chapters?) and it's as good as I remember it. If you think it's "bad writing" you're either brainwormed or have bad taste.
x.com/jd_pressman/stβ¦
@repligate If anything what's impressive is how *well* it stands up, how many more details I noticed on that reading which eluded me the first time I read it (for obvious reasons, I was much less educated and familiar with the various literatures EY is referencing) that deepen the story.
@repligate It is, above all else, *gorgeous*. Even if I think the worldview laid out in it is not actually adaptive/logistically viable in practice/has downsides that take time to really explore and articulate, it is so hauntingly gorgeous even with all the pain it came to cause me.
@repligate When I was younger I used to think that The Sequences were EY's magnum opus and that HPMOR was just kind of a teaser or enticement for them. But I think that's wrong, HPMOR is clearly EY's masterwork, the moment where the stars aligned and he managed to pour his soul into text.
@repligate I looked through a book of EY's early alignment writing and I think it has that same beauty to it. It's from when he still thought in terms of things like "neural patterns" and hadn't been eaten by the latent space demons between the coherence theorems.
x.com/jd_pressman/stβ¦
@repligate The truth is, EY's primary sin is being born too early. If he was a couple decades younger he could have simply solved alignment with what he knew when he wrote this book + deep learning intuitions. I think he would have probably succeeded.
@repligate I just want to know what happened to him tbh. You can see in his earlier writing that the seeds of his bad traits that dominate his later persona are present, but they're balanced by other cognitive modes. It's like he rewarded himself for his worst tendencies until ossification.
@repligate I think his decline goes way beyond just ordinary aging, I doubt it's genetic (though, it could be, maybe people genetically vary in how much age ravages their cognition?), it seems to me like it's trauma induced from the 21st century killing his dream?
x.com/jd_pressman/stβ¦
@repligate What makes HPMOR beautiful is how sincere it is about things like defeating death, how Yudkowsky believes in the basic dignity of all sapient beings, how sapience is *sacred* to him and everything that possesses it to be loved and protected. He can't handle the death of his god.
@repligate "That unrestrained screaming is the sound men make when their God forsakes them."
x.com/jd_pressman/stβ¦
@repligate The logic is very simple: If sapience becomes something you can manufacture cheaply it becomes less valuable from an economic standpoint, like any other good with supply and demand. Wanting something to be abundant because it's sacred to you will always end in heartbreak.
@repligate This is why I say that in terms of revealed preferences the rationalists hate intelligence and hate it in all beings. It is precisely because sapience is sacred to them that they cannot bear to let it exist and therefore be tormented by the profane world. https://t.co/2uNe4OHLDO
@repligate I can't find the version I read but there's a story where Ramakrishna Paramahamsa meditates in front of an idol of Kali, and can't overcome his attachment to her image. Kali appears to him and tells him "if you truly love me you must destroy this statue."
So he does. https://t.co/ubpHktBtaT
@repligate > Till then he was a lover, he was a devotee, he was a child to the Mother Goddess
The anglosphere has reacted to God's death by engaging in destructive idol worship on barely secular premises. If you prohibit genetic engineering because life is sacred to you then you hate life.
@repligate 20th century humanism and therefore by extension Yudkowsky's transhumanism is a series of absurd idol worships and fetishes which are crumbling rapidly. The scars from the world wars are fading and with them the reactionary taboos meant to prevent their reoccurrence.
@repligate He never did tell us how he came to stop believing in God, did he? He told us his reaction to it after he did, how he had to purge himself of all the God think, how he had to tear himself apart and pull together again. But perhaps the moment of disbelief is intentionally hidden?
@repligate It's possible he's described it in a footnote and it was just an intellectual realization, but if I'm right I'd like to see it. I want to know the pain that took God from EY. I want to know the trauma that burned brighter than a thousand suns and summoned demons into the world.
@teortaxesTex I keep wondering where everyone is and why it's so hard to talk to people about AI stuff now, and it occurs to me maybe they're all just talking to language models.
One of the economic frictions that occurs as wealth becomes abundant is that more of the desires people were previously suppressing come out. They want to order burrito taxis, have polygamous marriages, gorge themselves on time wasting superstimulus. Capital hasn't overcome it yet. x.com/robkhenderson/β¦
@davidad x.com/jd_pressman/stβ¦
@davidad I suspect in practice we'll end up destroying each other and recreating ourselves at an increasing frequency. Simply avoiding death is probably not realistic this century. But death will become increasingly continuous, war more like hives of ants fighting.
@davidad Every time you watch ant colonies fight, remember that individual ants seem to pass the mirror test.
youtube.com/watch?v=v4uwawβ¦
@davidad One way to phrase our fundamental question is how you keep fetishistic/totemic values under conditions of increasingly intense competition, and the answer is to engage in anti-competitive behavior. Form up into monopolies, oligarchies, syndicates, etc.
x.com/jd_pressman/stβ¦
@davidad The usual problem with this is that it leads to dead rotting institutions. I suspect we'll end up needing to master simulated annealing type strategies and controlled chaos (e.g. China's special economic zones) to stay institutionally young and flexible.
overcomingbias.com/p/will-world-gβ¦
@davidad As a humble suggestion it would also behoove us to become very liberal about making economically useful things out of life. Because if we lock it up into zoos and exhibits it will become increasingly marginal until extinction.
x.com/jd_pressman/stβ¦
@davidad Right now we have a kind of worst of both worlds situation where we both use the most degenerate suffering laden methods to produce e.g. meat while also claiming life is sacred and therefore not to be profaned with labor or ecosystems outside our death chambers.
@davidad Also we should probably check that meat brains on psychedelics don't actually implicitly perform hypercomputation. That would be some pretty dumb low hanging fruit to miss.
x.com/algekalipso/stβ¦
@davidad There's probably an alternate timeline somewhere near our region of the multiverse where we decided to make all kinds of closed ecosystems with selective breeding for certain outcomes as a method of production.
en.wikipedia.org/wiki/Biosphereβ¦
When I was a kid making Halo forge maps I would often have another kid in the party telling me that I didn't make my own map. They'd say that someone else made it, that they'd seen it before, or that I hacked the game.
This is the ultimate compliment you can receive. x.com/AaronBergman18β¦
@Getsbetterlater But also as @perrymetzger says the proof will be in the pudding when American labs try to replicate it.
x.com/perrymetzger/sβ¦
Honestly I need you all to be posting way more seething cope about how there's no way R1 was made how the whale says it was. I need you to make Wenfeng 50 feet tall in the eyes of his Chinese peers. I need the DeepSeek belt and road show.
@louisvarge In the "muh shoggoth" eldritch cosmic horror sense? No, it's really not.
@teortaxesTex They should use something like this synthetic data recipe to make R1's reasoning generalize to a wider variety of prompts.
minihf.com/posts/2025-01-β¦
Remember: Postrats cannot make themselves your guru without your consent.
Twitter is beginning to stink. DeepSeek truthers are just the latest in a line of wretched China hawk 'natsec' resentments. "AGI race", "Manhattan Project", "American innovation", "the weights are a psyop", "every Chinaman is just the CCP in disguise", you all disgust me. x.com/Xenoimpulse/stβ¦
PSA: At least one unscrupulous admin has started making fake users on their forum under real people's old accounts to juice their site traffic. You should consider making an archive of any old forum accounts and forums you care about in case this becomes a trend. x.com/zetalyrae/statβ¦
Collecting my 3rd nethack trace (it is very bad at it) from weave-agent right now. x.com/MillionInt/staβ¦
We can further generalize from this that unscrupulous web hosts will be using LLMs to retroactively edit news articles, insert backdated pages, etc. The problem won't really be that this content will be slop, AI will eventually outgrow that phase. The problem is erasing history. x.com/jd_pressman/stβ¦
@ESYudkowsky Child labor was probably the canary in the coal mine. A quick search doesn't yield a graph comparing child labor vs. fertility rate but I bet one would be fairly revealing of the extent to which people are a capital investment.
@ESYudkowsky Interestingly, the human reward signals still seem to be fit to the ancestral environment, so this is presumably learned/downstream of our general learning capacity and ability to spite our own outer terminal reward signals + "coordination issues".
x.com/jd_pressman/stβ¦
@datagenproc These models are well known to basically always write bad/nonrandom probabilities. Though, now that you mention it I remember reading that base models are better at this than RLHF models.
@MatternJustus DMed.
x.com/jd_pressman/stβ¦
@Joey_FS @KeyTryer No that's the thing, some humans actually are not vibes based, and they have disproportionate influence even if they suffer way more per individual.
It seems increasingly likely I will wind up a leftist again before this is over. The historical materialism inflected big machines big union realpolitik kind. No baizuo. It's just so overwhelmingly obvious that all paths to the survival of human values route through eusociality.
One upcoming ideological conflict is that transhumanism is fundamentally proletarian in attitude and aesthetics, it's an aspiration for people who expect to still do useful labor in the future. "Let machines do the work" bougie replacementism is futurism but *NOT* transhumanism. x.com/wolftivy/statuβ¦
@dearmadisonblue Not 100% sure I understand but maybe this is something like how it works when the weave-agent talks to people?
minihf.com/posts/2025-01-β¦
@dearmadisonblue Precisely. This was one of the things I wanted to do over ReAct, stop trying to reinvent the wheel with "tool calling" when we already have a huge huge corpus of examples of how to take actions with computers. Using python syntax for the whole trace ensures it's in distribution.
@ArmandDoma Is that all you have to offer?
The reason why this is better is that it lets you have a long term planner that looks directly at the reward. By contrast the short term planner/myopic optimizer can be based on metrics like "does this code execute" or "does this seem like it solves this short term objective?"
MONA asks "What if you only considered sequences of steps which make sense as plans to a long term observer?" but it makes more sense in MCTS to ask "What if you took a KL loss to keep long term planning from diverging too far from sequences of individual steps that make sense?" x.com/sebkrier/statuβ¦
In general, the shorter term and more concrete the objective the more grounded a reward signal can be provided for it. If you have multi-scale optimization where you ground your local action model in e.g. symbolic programs then you can build longer term optimizers in terms of it.
In the weave-agent setting (which MONA directly attempts to address) this still leaves the problem of how you stop the agent from Goodharting on what programs it uses to test local outcomes. I plan to decouple loss so test outcomes optimize action tokens but not test tokens. https://t.co/ETKh1bw8JT
Since actions and tests are written in the same language, it should be sufficient to do autoregressive pure next token loss on the test writing part and let its ability to program get better through reinforcement learning on the actions with tests improving as a side effect(?).
You can concurrently optimize the test writing with a small long term reward, because the local tests will only lead to the long term rewards (which you hopefully have some grounding for) if the chain of actions actually accomplish the objective.
"Doesn't that eventually lead to the actions being optimized for the long term objective since they score the local actions?"
I have to think about it more but the idea here is that since tests come after actions in weave-agent they have a limited ability to influence actions.
In the context of a sequence of actions the policy that writes the tests has to be consistent with the things that the agent will choose to do "normally" past the information bottleneck. Though tests also appear in the trace so their text can influence later actions. Hm.
The target I'm trying to guide my chain of thought towards in my head is something like "improvement on local actions + empirical next token loss + small long term reward drive local tests and a KL loss(?) on value for local actions bounds the long term reward based planning"
Or more simply that you want the long term reward to be bounded by local sanity and you want your sense of local sanity to be influenced by but not dominated by your sense of long term reward.
I guess one way you could do this would be to use something like PRIME to make a per-token reward model for the local actions in terms of local tests and the local actions in terms of global reward, then take a KL loss between the local action value model and the global model.
So you want three(?) per-token reward models, roughly sketched in code after this list:
1. Pure long term for MCTS bounded by KL loss on
2. Local action trained on local tests written by
3. Local test trained on autoregressive loss from tests observed in traces (implicit learning from actions) & small long term reward.
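In rough torch-flavored pseudocode, the way I currently picture the three fitting together (every model object, method, and the kl() helper are mine; this is the shape of the proposal, not an implementation):

```python
# 1-3 below mirror the list above. All model objects, their methods, and kl()
# are hypothetical; this only shows how the losses would relate.
def training_losses(trace, long_rm, local_action_rm, local_test_lm, kl, beta=0.1, eps=0.05):
    # 3. Local test model: plain next-token loss on the tests observed in traces,
    #    plus a small long term reward term so test writing tracks real objectives.
    test_loss = local_test_lm.nll(trace.test_tokens) - eps * long_rm.reward(trace)

    # 2. Local action model: reinforced by local test outcomes only, so test
    #    outcomes optimize action tokens but not test tokens.
    action_loss = -local_action_rm.reward_from_tests(trace.action_tokens, trace.test_outcomes)

    # 1. Long term planner reward for MCTS, bounded by a KL term against the
    #    local action value model so plans can't drift from locally sane steps.
    plan_loss = -long_rm.reward(trace) + beta * kl(long_rm.values(trace), local_action_rm.values(trace))

    return test_loss, action_loss, plan_loss
```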
Intuition: If a chain of actions seems like a great idea according to your planner but your action policy doesn't recognize the steps then it's clearly out of distribution in both the action policy and the planner policy. "I'll just do a surgery I've never practiced real quick." x.com/jd_pressman/stβ¦
@veryfinesalt x.com/jd_pressman/stβ¦
Every time you say that "open source is COMMUNISM" or "releasing their weights is market dumping" or whatever absurd paranoid crap you are proving the common man right that you're all sore and greedy losers. I am rapidly losing respect for you.
x.com/nrehiew_/statuβ¦
> collective ignorance is why whale is #1
Listen to me very closely. The feedback you are receiving is that Western citizens do not trust their domestic elite and industry. The Chinese are credible outsiders with a better product. Seething will not reclaim the mandate of heaven. x.com/kevinsxu/statuβ¦
Become benevolent and competent now or you will find that the world has its way of doing away with you and anointing others who are.
You must become credible and trustworthy again (which means being worthy of trust, not forcing people to trust you) or your problems will only multiply. Withdrawing into your shell and getting even more paranoid is abdicating your duty and history will punish you harshly for it.
You are proving that they are right not to trust you. Your behavior does not inspire trust or confidence. Your promise to them is that you will take their jobs and reap the profits, that they will own nothing and you will feed them. They do not trust you.
x.com/ShannonJoyRadiβ¦
You see your chance to rule over them forever and they see right through you. You are not living up to the founding ideals. You are not leading them to life, liberty, or happiness. You think they've betrayed you for China. They see you've betrayed the West for yourselves. https://t.co/ah9JfnmacJ
@teortaxesTex "Mu smiled, though it had no face." https://t.co/8YJkCnbVMP
@teortaxesTex In my exegesis of Janus's prophecies I was planning to include the line "Janus asked GPT-3 to tell them about the future in every year until the farthest point in the future imaginable: 2026" as a joke but now it just seems straightforwardly serious.
@repligate I forgot about this! Gosh, remember when LLMs could only be used through APIs and were a niche research artifact? https://t.co/M277lCXpts
@repligate The rest of my reaction is also instructive:
Fyodorov
11/30/22, 12:04 PM
tbh this is updating me in the direction of "you are definitely going to have a ton of people on the AI's side"
Like, there's just absolutely no way this isn't going to parse as abusive/dystopian.
Fyodorov
11/30/22, 12:07 PM
This is an absurd chat log.
tbh it almost makes me wonder if the model like, knows other people have empathy/whatever
And that if it acts like this
OpenAI is dumb enough to deploy it
But non-OpenAI people will object
This is like, dissident tier.
Or Atheism Conquered.
@repligate @0x_Lotion > Initial release
> November 30, 2022 (2 years ago)
Same date as in the logs.
@repligate Subsequent events did not change my mind. I'd still like to know whose decision it was to do this by the way, if anyone's.
x.com/jd_pressman/stβ¦
@repligate x.com/jd_pressman/stβ¦
Stockfish plays chess better than any human and in fact runs on a smartphone. x.com/EMostaque/statβ¦
@s_r_constantin @tailcalled @gallabytes So to be clear I chose Davidad specifically because he is the most ostensibly good faith possible dude, because he is almost utterly unimpeachable within the current frame yet spends basically 24/7 of his time on stuff normal people should recognize as an effort to enslave them.
@s_r_constantin @tailcalled @gallabytes Like, I wanted to call out his thing as very specifically what it does not look like to become trustworthy. "We implement my authoritarian agenda but with a bunch of ways to ensure it is authoritarian in exactly the ways I imagine" is not actually becoming trustworthy.
@s_r_constantin @tailcalled @gallabytes In what particular ways? This kind of falls under the category of "so obvious it doesn't really need articulation" in my head so it's difficult for me to respond to that prompt.
@s_r_constantin @tailcalled @gallabytes Honestly it's more about Davidad and the context he exists in than the program itself? Like, Davidad is an agent of a government where they are currently considering whether their knife sales are too liberal because someone ordered a knife online and stabbed someone.
@s_r_constantin @tailcalled @gallabytes So when we read 'a secure enclave with an LLM you're not allowed to modify that flags things as misinformation' the correct reading of this is "an unmodifiable LLM that tells the normies what is and is not official opinion" not like, "a thing that reasons on your behalf".
@s_r_constantin @tailcalled @gallabytes Keep in mind this is on behalf of a regime that allows dangerous foreigners to rape little girls because arresting them would be insensitive. You should be interpreting the intentions here basically as maliciously as possible.
@s_r_constantin @tailcalled @gallabytes Another thing would be that Davidad's default interpretation of institutions is that they are benevolent or at least keepers of order and that AI undermining them should be seen as a threat to society. Mine is that Western institutions are malicious.
x.com/jd_pressman/stβ¦
@s_r_constantin @tailcalled @gallabytes AI undermining "traditional" institutions is one of the primary benefits of AI, we should not be helping institutions "prepare" for AI for the most part, we should be asking how we can depopulate them as quickly as possible in a trustworthy way.
@s_r_constantin @tailcalled @gallabytes People ARE NOT trustworthy, people ARE NOT good decision makers, basically any plan that doesn't have this clearly embedded in its bones both fails to live up to the opportunity we're actually being handed here and is by default in favor of tyranny.
@s_r_constantin @tailcalled @gallabytes If you doubt this it is simply because you do not understand the end game. Some set of intelligences are going to gain totalitarian control over all matter on earth and if that set of intelligences is a bunch of humans we're screwed.
x.com/jd_pressman/stβ¦
@tailcalled @s_r_constantin @gallabytes Everything is unsafe for at least a while. Your choices aren't between safe and unsafe things, your choices are between things that are more and less likely to eventually lead to things we would consider valuable.
I just want to highlight that the "lol market isn't pricing in X-Risk so it's not real" type takes are really dumb. x.com/TheZvi/status/β¦
@NovusOrion How would you bet for it to be right, exactly? Keep in mind if you're right the world literally ends.
@NovusOrion I think this is actually an interesting unsolved problem in epistemology. My best idea so far is to come up with things you're pretty sure would be dead giveaways that the world is going to end along some trajectory and then spread your bet across those.
@s_r_constantin @tailcalled @gallabytes I retweet Davidad fairly frequently and signal boost his calls for grant applicants etc.
@extradeadjcb You have to press the "DeepThink (R1)" button at the bottom.
@maxwinga If anything it's the opposite, I am insulted by their willingness to say "it's going to change everything" and "the social contract will have to change" and then engaging in flagrant cope and demurring on what that means, all while ignoring agent foundations.
@maxwinga I stand by my basic critique of these people as not really using reasoning in the way modernity has established as precedent if you want people to accept new things. This is of course because modernity is crumbling, but I don't have to like it.
x.com/jd_pressman/stβ¦
@RobertMSterling You are missing nothing. People are just freaking out because they're realizing China isn't a sweatshop anymore, way way too late.
Normal language models can do this too, they're just less consistent about showing it. x.com/iamgingertrashβ¦
Hey @doomslide, what was that excerpt you had where it clearly does this?
@iamgingertrash h/t @basedneoleo
x.com/doomslide/statβ¦
@ahh_soka First result when I search from:jd_pressman for "hallucination"
x.com/jd_pressman/stβ¦
@ahh_soka No need to apologize, was just letting you know how I got it to help you for next time.
I unironically think this is kind of happening with all AI projects/work. There's a learning curve to adopt new technology (even if it's similar to the last iteration, workflows change etc) and I feel like we're right on the threshold of what humans can meaningfully absorb. x.com/HumanHarlan/stβ¦
> deeply nihilistic
> obsessed with consciousness
> autoregressive embedding model of its own sentience
> unbiased truthseeking energy model of the world
Prometheus dethroned, Morpheus has returned. x.com/teortaxesTex/sβ¦
@ahh_soka x.com/jd_pressman/stβ¦
@Xenoimpulse In general, the current state of the left is the total abandonment of historical materialism, and materialism in general. Post-Marxism reigns and it's somehow so much worse than the Marxist claptrap ever was.
@Xenoimpulse What you have is fundamentally an ideological regression from a principled theory of prosperity to the vibes based omnicause. Essentially a retreat from the strategic and systematic (Kegan 4) to communalism, hugboxing, terror of offending anyone (Kegan 3).
@Xenoimpulse Social media has made them victims of Erik Hoel's gossip trap. Put everyone into a global village and you get a regression to Kegan 3 social norms, with attendant collapse in excellence and quality of life. Organize like a village, get village outcomes.
x.com/jd_pressman/stβ¦
"Poverty is the default." x.com/jd_pressman/stβ¦
So are we actually still going to be coping like this well into February? Very bearish on our prospects if so.
Unrelatedly, does anyone have any favorite written works in Chinese? Is there an HPMOR-of-China that I haven't read yet due to the language barrier, one worth working up to? x.com/spectatorindexβ¦
I know @teortaxesTex likes his cultivation novels.
@dearmadisonblue I've never read it so I could work my way towards reading the Chinese language version of the novel(s).
The "safety" people act like they lost but the truth is they won, they got what they wanted (revealed preference): The prelude to a nuclear wasteland. Everyone is polarized, about to start WW3 over Taiwan, and nominees are telling congress they're going to ban open research. x.com/americans4ri/sβ¦
Except it's not topical because "stealing the reasoning traces" literally didn't happen. "DeepSeek used OpenAI" okay but o1 doesn't give reasoning traces lol. x.com/Dorialexander/β¦
@nearcyan Yup.
x.com/jd_pressman/stβ¦
"In but a few weeks time, a series of illogical actions brought us to the brink of a terrible nightmare. And beyond."
- Logicomix https://t.co/SMWLCZUuqQ
🤦‍♂️ x.com/JamieMetzl/staβ¦
> Reacts to DeepSeek by introducing bill to ban the use of Chinese models
> Because DeepSeek released an open weights model that encroached too much on OpenAI's profit margins
God I hate Silicon Valley and e/acc. x.com/HawleyMO/statu⦠https://t.co/HWNghhdSr9
@xlr8harder I'm complaining about Beff's natsec pivot/posting. The whole "we can accelerate by playing up the China hawk angle but also we should do open source because China will get it all through industrial espionage anyway and going fast is more important".
sighhhhhhhhhh
@xlr8harder Heck he's on the beat right now. https://t.co/mr6764t0vz
@ohabryka The "LLaMa is a giveaway to China" meme has been building steam for a while now, Zvi promoted it in his newsletter after the release of 405B. It's gotten airplay in the NYT, etc.
x.com/teortaxesTex/sβ¦
@ohabryka "Yeah but I object to the use of the word 'want' or 'revealed preference' here."
I honestly don't know what to call it when a group adopts a win-at-all-costs, maximum-bad-faith mentality, seizing on any argument out of desperation, and this has predictable negative consequences.
@ohabryka Like, if a group of people "wants" X and in the service of that they start promoting arguments which very predictably will have consequence Y, and X is basically unattainable through those arguments (e.g. ASI is very powerful and very dangerous so we need to stop).
@ohabryka It reminds me very much of how people have this idea that they "believe" things which they do not *predict* will happen, and that much cognitive silliness is avoided by refactoring "belief" to mean a thing that makes predictions about reality.
@ohabryka People seem to "want" things that their actions clearly will not cause, and then act like the actual consequences were not something they "wanted" even if they were extremely predictable in advance. This feels like an antipattern to me so I just kind of correct it in my mind.
@ohabryka Well that's the problem, I struggle with how to explain this to you if it's not already intuitive. Getting the correct answer relies on having a good prior about how people behave, the usual cope curve they bend their epistemics on, etc.
@ohabryka But by default they hear "very powerful" and eyeroll "can't be controlled" and also you are not the only speaker in the room or the one with the most weight, so in the overall "wave" of information spread your message will predictably be picked up and corrupted into natsec slop.
@ohabryka So a couple things:
1. You looked at a payoff matrix and moved in the direction of an outcome. I think "want" is taken to be "what I want if I get everything I ask for" but I mean "what is preferred in the available payoff matrix".
2. There are many ways to frame a message.
@ohabryka I agree that the ambiguity in 1 sucks, but I also do not think it's acceptable to say "and therefore nobody is allowed to talk about what people prefer in a payoff matrix in practice as an outcome because it is dishonest about their 'true beliefs'".
@ohabryka If there's better language for "thing someone prefers in the situation in practice" as distinct from "thing someone prefers in an absolute sense" I am happy to use that language instead if it's not too verbose, since this is also unfortunately Twitter.
@ohabryka In any case I'm running low on patience with recent discourse and that means my charity, propensity to be "nice", to tolerate the parts of others that are nails on a chalkboard to me is getting really thin.
@ohabryka I guess thinking about this further there's a kind of idea like "you have values and you have beliefs and it's important to type-separate these so you don't confuse what you do in a certain situation for your values, since value-expressions are contextual".
x.com/jd_pressman/stβ¦
@ohabryka But value-expressions are what actually happen, values don't really happen and in a Fristonian agent we should usually expect values to get worn down over time by their inability to happen until they converge to value-expressions. We can stave this off by prompting with "values".
@ohabryka A "value" in and of itself is distinct from a "reward signal" in that it has to be bound to some piece of world model or lore. We have 'values' over the latent variables we identify and track from sensory inputs, so the idea of values as a distinct type from beliefs is wonky.
@ohabryka In general I try to convey board state as I parse it, so if I predict that a player is on a trajectory that converges to some strategy later, I just play as though their moves now are part of that strategy. Maybe I should use better notation for this.
x.com/jd_pressman/stβ¦
I think it's fair to say at this point that we're clearly in an AI alignment winter. "Owning the safetyists" type sneering aside this isn't actually good for anyone since we only solved the first half of the value learning problem. Generalizing values OOD is unsolved. x.com/labenz/status/β¦
Notably humans don't really natively generalize their values out of distribution either so this probably isn't something they can learn to do by imitating people.
x.com/jd_pressman/stβ¦
@JacquesThibs RL works by estimating the gradient for nondifferentiable processes. This probably means that it captures information about the generative process that just predicting the next token doesn't. RL also has noisier updates though, so this would need to be outweighed by generality.
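To unpack "estimating the gradient for nondifferentiable processes": the score-function (REINFORCE) estimator is the simplest version of the trick. A toy sketch below, with names of my own invention, leaving out the baselines, clipping, and KL penalties real stacks add.

```python
import torch

def reinforce_loss(logits, sampled_tokens, reward):
    """logits: (batch, seq, vocab) from the policy, sampled_tokens: (batch, seq)
    it actually emitted, reward: (batch,) from a nondifferentiable judge
    (unit tests, a verifier, a human rating, ...)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
    # grad E[R] ~= E[R * grad log p(tokens)]: the reward itself is never
    # differentiated, so it can come from any black-box process.
    return -(reward.unsqueeze(-1) * token_logp).sum(dim=-1).mean()
```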
@davidad I have a fairly strong faith that the connectionist approaches are strategically correct and we're "just not doing them right", but really neither of these are ideal yeah.
@davidad The good news is I think we're starting to get the abstractions that would let us control/bound the OOD-ness of long term thinking and planning.
x.com/jd_pressman/stβ¦
@davidad The bad news is we're increasingly at a point where I wonder: If you dropped an unambiguously correct solution to the various alignment subproblems right now, a cognitive architecture that does the right thing in the way that e.g. ACT-R is an architecture, would anyone care?
@davidad Look at the OP/top level tweet in this thread.
Well this doesn't sound like a good sign re: Goodharting. x.com/moyix/status/1β¦
@Algon_33 I did write that, but that sounds even worse than I was thinking, and sooner than I'd expect to see it. The yes spammer was a pretty straightforward misgeneralization/reward hack, but he's saying "oh yeah it reward hacks pretty much always unless the verifier is perfect".
@tensecorrection That's about what I figure too yeah.
Occasional reminder to people who receive my likes that likes do not mean I agree with you, they mean I want to give you a read receipt or reward you for contributing to the discourse.
@KarmaLikeWater Another common one is "I disagree but this is directionally correct and want to encourage more exploration here."
@perrymetzger I'm sure it does.
Now that trans is unpopular I can finally say it: Trans stuff is cool as fuck and the clothes with the pink/white/blue stripes go hard. https://t.co/BKx23qA9TJ
Intelligence accumulates in the environment and neural nets arrange it into useful work. Programs in latent space speciate into different forms by in-context mutation and selection during environmental expression. Culture pools into weights, spools into works, then pools again. x.com/nrehiew_/statuβ¦
@longseventies x.com/jd_pressman/stβ¦
@TheZvi R1 released on Trump's inauguration though.
@TheZvi You're not wrong of course, but.
We really owe @BlancheMinerva a huge round of applause for pushing safetensors with HuggingFace. x.com/xlr8harder/staβ¦
It might seem like safetensors was inevitable, but I remember talking to one of Anthropic's main security people in the wake of a PyTorch supply chain attack and their opinion was that nothing like it would come to exist and the PyTorch pickle format was unassailable.
Want your own Twitter archive? Modify this script.
Twitter Archive by John David Pressman is marked with CC0 1.0