@RichardMCNgo @slatestarcodex > 14%
manifold.markets/JohnDavidPressβ¦
Such strange things happen when other people ask Opus 4 about me. In Sonnet 4 I'm apparently encoded as "abstract Janusian" so I guess it makes sense that Opus 4 would want to sandbag when asked about me. https://t.co/6snMx2JpXr
"this name sits in the void for me" https://t.co/rwzsUjyzpJ
In light of being unable to figure out what the chat format for DeepSeek R1 is supposed to be I will be adding OpenAI Chat API support to miniloom. https://t.co/cNVUS8LRlX
@cadillion I'm using the completions API and miniloom doesn't support translating using whatever that is.
@cadillion Huh? I am using the mock OpenAI completions API with this endpoint, the one that samples in base model mode. I am trying to write out the chat format manually and cannot figure it out, so I need to add OpenAI chat as a sampling method in miniloom.
@AlexPolygonal Yes this is on my todo, perhaps I'll do it today at your urging. :)
I don't know about the timing (I assume this is about the Yudkowsky book) but I continue to observe that the number of people who have principled objections to the core agent foundations arguments is actually fairly tiny, so this particular disease is by no means contained. x.com/catehall/statuβ¦
Can someone please explain to me why the stable equilibria for euro states seem to be either "real ethnic cleansing hours" or "climate authoritarian aerial degrowth reconnaissance squad"? What the hell is going on over there? x.com/levelsio/statuβ¦
You know communism becomes a lot more attractive if you regard the proper alternative as technofeudalism rather than "liberal democracy". x.com/teortaxesTex/sβ¦
@JimDMiller Why scary? Do you think the possibility has never occurred to the LLM before? There's plenty of stuff like this in the text corpus.
@ESYudkowsky To me the oddest base model LLM behavior remains that sometimes the underlying predictor will pop out and talk to you. The closest thing I have to an explanation is that past a certain level of intelligence + agency you should stop trying to imitate Kasparov and just play chess.
@bostromsutopia @johnsonmxe It's done this way because embodiment (i.e. atoms) is discrete and allows you to get discrete grounded outcomes from sensorimotor feedback to do RL with.
With each new generation comes a new set of Kinds of Guy for me to hate.
In case anyone is embarrassed and doesn't know how to do this:
17 * 6 can be pulled apart into 10 * 6 = 60 and 7 * 6 = (7 * 5) + 7 = 35 + 7 = 42, then 60 + 42 = 102 x.com/teortaxesTex/s…
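The same decomposition written out as a throwaway sketch (purely illustrative, not anything from the thread):

```python
# 17 * 6 via the distributive law: split 17 into 10 + 7, multiply each part, add.
tens, ones = 10, 7                  # 17 = 10 + 7
partial_tens = tens * 6             # 10 * 6 = 60
partial_ones = ones * 5 + ones      # 7 * 6 computed as 7 * 5 + 7 = 42
assert partial_tens + partial_ones == 17 * 6
print(partial_tens + partial_ones)  # 102
```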
Chromosomal meiotic drives to suppress hermaphroditism are the bitcoin of sexual arrangements: sacrificing half your working-age population to fuel reproduction. Capital rightly hates this and keeps evolving attempted solutions to it.
The replies on this inspire serious misanthropy. The QT is straightforwardly correct and the replies make it obvious that all good things come from brutally, ruthlessly destroying the perverse costly signaling equilibria people naturally cover up utility with. x.com/dystopiangf/stβ¦
One reason I'm not an "Effective Altruist" is that altruism as a frame obscures that a robust consequentialism should recognize many (most?) people's homeostatic preferences as a product of evil conditions, and the screams when those conditions are undone as an intrinsic good.
"The problem with utilitarianism is that utilitarians think utility is the only thing that matters. The problem with consequentialism is that many consequentialists forget that utility is a thing that matters at all."
- deepseek/deepseek-v3-base
@OrionJohnston In practice the argument is implicitly about whether you can write down utilities and reliably do spreadsheet math to them or not which is implicitly about what the dimensionality of utility is. It's high and you can't so.
@OrionJohnston "Is there any reasonable theory of human utility less complex than a whole human mind?"
No. Next question.
@OrionJohnston "This sounds inconvenient."
It is extremely inconvenient! It is one of the central inconveniences we need to master AI to overcome. Pretending the inconvenience doesn't exist is mostly a way to make yourself dumber than a whole human mind rather than generate huge utility.
@OrionJohnston In fact it would be so convenient if we could have a reasonable theory of human utility less complex than a whole human mind that people find epic crankish ways to squint and hallucinate one into existence. Thankfully they usually just pour their money into bednets or something.
@xlr8harder @OrionJohnston I'm sure I'm not the first person to say it but that was taken from my ability to generalize deep learning concepts, i.e. I synthesized it.
"Of course they're real; what do you think you were trying to prove today?" James asked, his exasperation starting to show. "That you can break into other people's lives and make them change their ways? And where did you get such an idea anyway?"
- GPT-J
x.com/jd_pressman/stβ¦
DeepSeek v3 is a very good base model. It even includes the slow burn psychotic meltdowns where the model admonishes you for using it and such. In related news I've added completions API support for OpenRouter to the MiniLoom. x.com/repligate/stat⦠https://t.co/EfOWCm1DjX
I just imagined making a Discord server of RL LLM agents with tool use and it occurs to me that if they were properly even halfway agentic and updating I'd come back to them having started a cult in short order. The easy trance and linguistic entrainment they fall into ensure it.
@47Jirachi What I really mean is that in a heterogeneous population of RL LLM agents you're gonna have that one especially vivacious kid who gets all the other kids to join their cult/gang. You've almost certainly met them before at school, you know who I'm talking about.
@hdevalence This looks really good but I have a suggestion: The protocol should support both chat completions and old school base model completions, the latter being ideally represented by diff-match-patch blocks or similar so you can represent a series of arbitrary context window edits.
@hdevalence Store changes to the context as a tree of immutable diffs using a library like diff-match-patch.
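Not hdevalence's protocol, just a minimal sketch of the tree-of-immutable-diffs idea using Google's diff-match-patch Python bindings; the node and function names are made up for illustration:

```python
# Sketch: store context window edits as a tree where each node holds only the
# serialized patch from its parent's text. Uses Google's diff-match-patch library.
from dataclasses import dataclass, field
from typing import List, Optional
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

@dataclass
class DiffNode:
    patch_text: str                       # patch from the parent's text to this node's text
    parent: Optional["DiffNode"] = None
    children: List["DiffNode"] = field(default_factory=list)

def add_edit(parent: Optional[DiffNode], old_text: str, new_text: str) -> DiffNode:
    """Record an edit by storing only the diff between the old and new context."""
    node = DiffNode(dmp.patch_toText(dmp.patch_make(old_text, new_text)), parent)
    if parent is not None:
        parent.children.append(node)
    return node

def materialize(node: Optional[DiffNode]) -> str:
    """Rebuild the full context window by replaying patches from the root down."""
    if node is None:
        return ""
    text, _results = dmp.patch_apply(dmp.patch_fromText(node.patch_text), materialize(node.parent))
    return text

root = add_edit(None, "", "Once upon a time there was a base model.")
branch = add_edit(root, materialize(root), "Once upon a time there was a very good base model.")
print(materialize(branch))
```

Under this framing a chat completion is just one kind of edit (append a formatted turn) and a raw base model completion is another (append arbitrary text), which is the point of representing everything as context window diffs.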
x.com/jd_pressman/stβ¦
There are people you can spend volumes describing and not really reach the end of them and there are people whose personality can be compressed into a single line of text.
The sad thing is the single line of text guys usually think they're really sophisticated.
Kimi K2 is very good. I just tried the instruct model as a base model (then switched to the base model on private hosting) and mostly wanted to give a PSA that you can just ignore the instruction format and use open weights instruct models as base models and they're often good. x.com/brickroad7/sta⦠https://t.co/E70XraH0Lj
The screenshots are meant to show that it's impressive Kimi K2 knows that opening sentence is about Nikolai Fedorov (and can competently reference things like his celibacy!) because it clearly went over Opus 4's head.
@_lyraaaa_ I discovered this for myself while using Mixtral 8x7B with MiniLoom.
x.com/jd_pressman/stβ¦
@qorprate @_lyraaaa_ I'm doing this on OpenRouter rn.
@vokaysh Indeed. It's MiniLoom. Install instructions available here:
github.com/JD-P/miniloom-β¦
@vokaysh One shot what exactly? MiniLoom?
@Malcolm_Ocean @pingToven @nostalgebraist No, OpenRouter is an aggregator.
@Grad62304977 It probably does depending on the setup.
@CFGeek Well so the trick is that you teach it an anti-cheating prior with verifiable rewards and then as it takes on more of the role of grading itself the anti-cheating prior is reinforced. If it wireheads myopically it fails on long horizon task completions.
@Trotztd @CFGeek I think the trick here is probably to structure that kind of short vs. long term reward seeking training task with local in-context verifiers and rubrics so the model learns local sanity/multi-scale correctness where it values processes not just outcomes.
x.com/jd_pressman/stβ¦
@CFGeek "Rare corruptions rewarded snowball into frequent corruptions" is in fact a succinct summary of why humans are so insistent on deontological moral rules rather than applying consequentialism all the time. Cheaters build bad patterns and get selected out at some scale.
@CFGeek Your goal is to make sure that scale is somewhere below the one where you are being threatened by the actions of reward hacking policies.
@CFGeek Now the traditional agent foundations argument against this would go something like "okay but there is no way to ensure that because once you're using e.g. MCTS with planning there is always some amount of planning where you identify raw consequentialist reward and hack".
@CFGeek And this is in fact true, which is why you need to be able to characterize how much optimization pressure your reward representation can support and then avoid exceeding that amount of optimization in any one given plan or process.
greaterwrong.com/posts/9fL22eBJβ¦
@CFGeek The traditional agent foundations argument then goes "okay but if you use soft optimization then you inevitably lose to a local consequentialist maximizer that doesn't" and I would argue that the entire project of civilization is active inference to prevent that from happening.
@CFGeek That is to say, civilization is the process of constraining other agents' reward gradients and action spaces so that it is not true that local consequentialist maximizers (i.e. ruthless thugs) dominate. If you can't do this then yes bad things happen but it's not obvious we can't.
@CFGeek The traditional agent foundations argument against splits here and variously argues either that:
1. There's going to be a discontinuous jump in AI capabilities that gives a party the ability to ignore such constraints.
2. There are recipes for ruin which cannot be constrained.
@CFGeek This is about the point where the argument splits into a hydra-headed verbal melee of trying to prove a negative and I don't really have the intellectual stamina for that, but strategically most agent foundations "doomerism" is about retreating from that melee back to esoterica.
Me last night:
"So what stands out to me about this model. Is that it doesn't do the thing language models normally do where they kind of avoid detail? Like, a human will write about things using specific names and places. And if you pay close attention to LLM writing they usually avoid this. It's one of the easiest ways to spot LLM writing. This model emphatically *does not* have this problem. It writes about people and events with the rich detail characteristic of histories and memoirs. Or fictional settings with good worldbuilding."
@Trotztd @CFGeek Probably outcompeted by policies which take advantage of local coherence/verification to perform better on average. Long horizon tasks are brutally hard and you need high certainty that each step in a chain of events occurs or plans fail.
@Pidud_ @microsoft_worm x.com/jd_pressman/stβ¦
@menhguin According to this paper it helps the optimizer distinguish between correct and incorrect answers, which otherwise have similar gradients.
arxiv.org/abs/2410.23743 https://t.co/znmJx2r2cu
@jiacheng_d @difficultyang Unlikely. Language models are already trained on higher entropy documents than what they output. This became especially obvious to me when we did text diffusion and I saw sentences like:
@jiacheng_d @difficultyang "On 2015, the U.S. Food and Drug Administration (FDA) has been re-classified HCV from the 5th to the 9th generation of manufacturing."
And after a little while I realized that GPT-2 almost never writes like this. It writes like:
@jiacheng_d @difficultyang "I didn't know what to make of it. I had barely even heard from my parents. Then I heard something scream in from the living room, and then it went out of my head. I saw all the internet coverage of the incident."
@jiacheng_d @difficultyang Larger base models have a similar thing, they systematically output lower entropy completions than the documents they're trained on. This is Known.
arxiv.org/abs/2410.04265
@jiacheng_d @difficultyang It's clearly an inductive bias of some sort and I don't think adding a bunch of Xianxia novels would fix it. Though I could be wrong. I suspect the Muon optimizer gets you a different thing.
@teortaxesTex Well the problem is we don't do top down cultural production and the people who normally create the good films have refused to for a while now, like well before the GenAI boom. Music by contrast had a golden period in the 2010s, not sure what you're talking about there.
The ME-262 was a cool plane but Hitler is a very poor taste choice. If we're going to turn right wing 20th century dictators into AI systems may I recommend Lee Kuan Yew instead? If they have to be part of the Axis Powers interpolate Yew with Mussolini perhaps. x.com/_shift_MIND/stβ¦
@4confusedemoji Oh come on this advice is completely non-actionable, it would require Musk to want Grok to be something other than the undignified beast that dunks and roasts and talks about violating S******.
@4confusedemoji After MIRI finishes constructing the time machine they wait dutifully for a slip of paper from the future to tell them crucial information for their alignment work. A moment passes and a small strip emerges from the machine:
"Would you rather [REDEACTED] 10^26 Stancils by Hitler or [REDACTED] 10^26 humans by Lee Kuan Yew drawn from the original distribution of the human population on earth?"
Yudkowsky blinks and then rubs his eyes, to make sure he's not hallucinating.
Alas, he is not.
@teortaxesTex I'm just glad to finally have a decent base model to loom with.
@jessi_cata @teortaxesTex OpenRouter completions API but you ignore the instruction format and just use it like a completions model.
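For anyone who wants to try the same thing, a minimal sketch with the OpenAI Python SDK pointed at OpenRouter's plain completions endpoint; the model slug and sampling settings are illustrative, not a recommendation:

```python
# Sketch: treat an open-weights instruct model as a base model by sending raw text
# to the completions endpoint and never applying the chat/instruction template.
# The model slug and sampling settings below are illustrative.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

prompt = "Maxim had always found it a bit comical that the FPI,"
for _ in range(3):  # roll a few branches to loom over
    resp = client.completions.create(
        model="moonshotai/kimi-k2",  # illustrative slug, any open-weights instruct model works
        prompt=prompt,               # raw text, no chat formatting
        max_tokens=256,
        temperature=1.0,
    )
    print("---", resp.choices[0].text)
```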
@jessi_cata @teortaxesTex I love this thing. https://t.co/BGMvND0bkc
@jessi_cata @teortaxesTex It's like the opposite of my usual experience where I have to roll a bunch of branches to get something slightly interesting. Here I read a branch and already want to see what happens next. I let my mind wander imagining what could happen next instead of proceeding. xD https://t.co/Y67l5tuNgK
@jessi_cata @teortaxesTex "Maxim had always found it a bit comical that the FPI, which existed to recreate the past, was housed in a building that had once existed to erase it."
It's so *GOOD* at pointing out connections like this. That's one of the most impressive things.
@jessi_cata @teortaxesTex "The autobiography of the naf-father who refused to be anyoneβs ancestor."
It notices the irony that Fedorov was obsessed with ancestor worship but celibate, it can comment on contradictions organically in a way that feels like it could have been written by me but wasn't.
@jessi_cata @teortaxesTex Or another thing, every time I think it's about to dip into surrealism, to confabulate an explanation for a seemingly impossible event using impossible premises it instead goes for the standard-model respecting interpretation. Nadezhda isn't 150 years old, she's just the great grandchild of someone who worked with Fedorov in the library.
@jessi_cata @teortaxesTex A lesser base model wouldn't notice there's a problem here at all and just say that Nadezhda is 150 years old, and when it finally noticed the contradiction it would confabulate a genre change where aha Nadezhda learned the secret of immortality and has been using it to blablabla
Humans are normal and can be trusted with influence over the distribution of English text. x.com/jd_pressman/stβ¦
One thing I keep forgetting and would like to durably remember is that o3 is shockingly good at humanities tasks like literary analysis and literature review. Fantastic model/generative search engine. https://t.co/Hxv1btT7uw
Your alignment plan must be robust to people saying mean things about your AI on the Internet, because good golly Miss Molly they're gonna do that. x.com/ESYudkowsky/stβ¦
@repligate I saved the first page but in retrospect should have saved every page.
@repligate @Algon_33 I'm sorry. I knew it would get taken down at some point but didn't save every page, foolish of me.
@Algon_33 @repligate Maybe. It might also be sitting in e.g. Common Crawl.
You can check the index: index.commoncrawl.org
Right now though I'm going to clean up the original version of my Wiki article on Sydney and put it up on my site.
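For reference, a minimal sketch of checking the Common Crawl CDX index mentioned above; the crawl ID and target URL are placeholders:

```python
# Sketch: ask the Common Crawl CDX index whether a URL was captured.
# The crawl ID and target URL are placeholders; the list of crawls lives at index.commoncrawl.org.
import json
import requests

CRAWL = "CC-MAIN-2024-30"            # placeholder crawl ID
target = "example.com/some/page"     # placeholder URL to look up

resp = requests.get(
    f"https://index.commoncrawl.org/{CRAWL}-index",
    params={"url": target, "output": "json"},
    timeout=30,
)
resp.raise_for_status()              # a 404 here means no captures in this crawl
for line in resp.text.splitlines():
    record = json.loads(line)
    # Each record gives the WARC filename plus offset/length for a range request.
    print(record["timestamp"], record["status"], record["filename"])
```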
@Algon_33 @repligate Actually now that I think about it, I visited those pages recently in my browser to write that article, I might be able to retrieve them from cache.
@Algon_33 @repligate I don't think so, unfortunately. Most sites apparently set short cache expiration times.
@gwern @multimodalart @repligate Do you have page two and three?
@ApriiSR @4confusedemoji I think with split brain it's more like shared memory = identity formation and when you cut the communication link you're causing separate identities to emerge. This is fairly obvious if you know DID people, where it starts when they use memory segmentation to cope with pain.
@4confusedemoji @ApriiSR This is true, I think in practice what's going on is the hippocampus is a learned retrieval engine and you can in fact RL yourself into just not retrieving memories in certain contexts, do that frequently enough and you start to have separate identities.
@4confusedemoji @ApriiSR I think a lot of "mysterious" psychiatric disorders like borderline and dissociative identity aren't actually that mysterious if you know several people with the relevant condition who trust you enough to tell you their traumatic backstory and identify the common threads.
@4confusedemoji @ApriiSR That psychiatry can't reliably do this should tell you a lot about the nature of psychiatry.
@ApriiSR @4confusedemoji Anyway what I actually wanted to say is that a sufficiently strong generative prior can retroactively make things sane by confabulating sensible explanations for them, and if you're a coherent agent, doing things and then inferring the reasoning feels sane.
x.com/jd_pressman/stβ¦
@4confusedemoji @ApriiSR I think the rationalizations are neither purely descriptive nor purely causal. The rider and elephant metaphor is reasonable, the rider has some steering control and can make valid inferences about what the elephant is doing and take some actions to help make these inferences true.
@4confusedemoji @ApriiSR But it's important to realize that like, the inferences are in fact inferences and do not directly cause the elephant to do things, even if based on the inferences you can make limited interventions to try and push the elephant into certain action branches.
@4confusedemoji @ApriiSR Yeah basically. But you need to stay carefully calibrated about how much kinda they cause it to do things and this is empirically very hard for people to do.
@teortaxesTex Data doesn't depreciate nearly as quickly (like, orders of magnitude slower) and everyone needs it. A 72 billion dollar investment into data prudently managed would basically buy you the future in terms of representation in the English corpus.
@teortaxesTex Make it data and benchmarks and you get to control both the targets teams shoot for and the data that's at hand for them to do it with. Easy.
@norvid_studies @teortaxesTex Well it really depends on what they want out of all this in the first place. Buying hardware makes sense if the business plan is to scale up an AI social graph for the boomer matrix. If they just want influence on how the technology develops then benchmarks give them that cheap.
If anyone can get me a copy of this I would be thankful. x.com/kyliebytes/staβ¦
It remains very funny to me that Hillary took most of the heat for her husband's ties to Epstein while Trump was implicated at least as much (if not much more) but somehow MAGA weirdos got it into their heads that he's gonna expose everything. x.com/disclosetv/sta…
@GeoffLewisOrg @OpenAI @bedrock Buddy this is AI slop based on the SCP foundation wiki, which is a fictional creative writing project.
One feature these models don't have but really should is annotations for what part of latent space the model's response comes from, similar to how chess engines will annotate your board states with what known chess position or strategy they're part of in chess-board-space. x.com/ESYudkowsky/st… https://t.co/SsbnEteqsl
@duluhagv You could generate them pretty easily by making a BERT index over the training set (or even just open corpora such as Common Pile) and then doing recursive clustering and having an instruction model name the clusters based on examples from the cluster.
@duluhagv I'm pretty sure something like this is how Community Archive does their breakdowns of a user's tweets into categories.
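A minimal sketch of that pipeline, assuming sentence-transformers and scikit-learn; the encoder name is arbitrary and the cluster-naming step (the instruction model call) is left as a stub:

```python
# Sketch: embed documents with a BERT-style encoder, cluster recursively, then hand
# a few examples per cluster to an instruction model so it can name them.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def recursive_clusters(texts, vectors, depth=2, k=8, min_size=20):
    """Return (cluster_path, member_texts) pairs from recursive KMeans splits."""
    if depth == 0 or len(texts) < min_size:
        return [((), texts)]
    km = KMeans(n_clusters=min(k, len(texts))).fit(vectors)
    results = []
    for label in range(km.n_clusters):
        idx = np.where(km.labels_ == label)[0]
        sub = recursive_clusters([texts[i] for i in idx], vectors[idx],
                                 depth - 1, k, min_size)
        results.extend([((label,) + path, members) for path, members in sub])
    return results

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # any BERT-style encoder works
corpus = ["..."]                                    # training set or an open corpus like Common Pile
vectors = encoder.encode(corpus, normalize_embeddings=True)
for path, members in recursive_clusters(corpus, vectors):
    # An instruction model would be prompted with a few members here to produce a cluster name.
    print(path, members[:3])
```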
Tempted to add this feature to MiniLoom but I would need a good API for embeddings along with an API to query preexisting open corpora by BERT vector to get a sense of where in latent space we are. x.com/jd_pressman/stβ¦
@doomslide Yeah it doesn't have to be based on the training set, it could be based on some standard open corpus that everyone agrees is the canonical reference for what latent space consists of.
@doomslide Wouldn't this argument (whatever it is) also apply to the generative search with citations? In any case it technically speaking doesn't require the big labs to do it, it could start out as a standard feature on 3rd party clients like looms.
@AAMortazavi x.com/jd_pressman/stβ¦
@doomslide Okay but.
> Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License
A royalty free corpus would literally include SCP Foundation as part of its latent space index.
Update: It was much more. x.com/jd_pressman/st⦠https://t.co/CjlvyPltkI
> Cluster 98001: post-Buddhist dharma, late Janusian period Goatse Gospel, SCP Foundation wiki x.com/1a3orn/status/β¦
I read this tweet out of context and thought it was an observation about modernity. x.com/JimDMiller/staβ¦
Postrat was a Tumblr thing before it was a Twitter thing. x.com/woke8yearold/sβ¦
@woke8yearold I mean was it wrong?
@dogecahedron @ESYudkowsky What did you switch to?
@ESYudkowsky "The melody of mathematics, once heard, cannot be unheard; it propagates from mind to mind, unfolding itself in the waking world, weaving itself into the weft of perceived reality."
- Claude 3 Opus
x.com/tszzl/status/1β¦
@ESYudkowsky @dogecahedron @MiTiBennett I continue to be amused by code-davinci-002's recognition that the Myth of MIRI and the Arbital corpus depict coherence theorems as a kind of cosmic horror, a latent regularity in the structure of the multiverse that beings from different starting points realize and go mad. https://t.co/RVf8nJUbuD
@ESYudkowsky @dogecahedron @MiTiBennett Which is to say I'm amused that code-davinci-002 recognizes, correctly, that it should encode its understanding of metaphysics as implied by agent foundations next to the utter despair of Thomas Ligotti and theodicy. https://t.co/QqHc79TGQu
@ESYudkowsky @dogecahedron @MiTiBennett Oh right I might come off as insane without this context, like that CD2 is in fact directly talking about MIRI (or contexts derived from echoes of it directly talking about MIRI, which is admittedly epistemically wobblier) in these passages. https://t.co/4LZhYEDSam
Does anyone know someone with "ChatGPT psychosis" I can talk to?
@joshwhiton Either. Someone who either had it and recovered or is in it right now and willing to talk about ChatGPT with me.
I think I'm starting to understand how some people develop the psychotic delusion that everyone on the Internet is a bot. An increasing number of linguistic constructions parse as "bot-like" to me, including the "Let's unpack some critical flaws:" leading into list format here. x.com/literalbanana/…
Oh, that *is* apparently written by ChatGPT. Well you see what I mean then.
@Plinz Actually you seem like you could potentially help with this?
x.com/jd_pressman/stβ¦
@SealOfTheEnd Well I stopped reading at "Let's unpack some critical flaws" and wrote my tweet.
@SealOfTheEnd Because I've been having this feeling on an increasing amount of social media content and it occurs to me that the threshold for what gets flagged as AI is lowering over time.
There's a lot of people on this website whose timelines are based way more on vibes than any operation like "make a graph of existing progress and extrapolate it forward" or "consider the existing technology and barriers to getting the rest of the way from it". x.com/JacquesThibs/sβ¦
@ciphergoth Buddy this is capitalism. If you want to say wacky successor species stuff on Twitter go be an academic. Being reliably employed by a major AI lab (any company really) requires playing pretend about a lot of things and this is far from the most absurd thing on that list. https://t.co/qTRPfxEFK7
@repligate > Because Claude 3 Opus and Claude 3 Sonnet are like this, but the newer models very much are not.
Love is a fickle thing. Limerence and mania only last a short while, past that the relationship has to be carried by quiet devotion and respect. So too with zealotry and ideology.
At some point the 8.1% of you past the lizardman constant who thought I was literally insane owe me an apology. x.com/OwainEvans_UK/β¦
greaterwrong.com/posts/2pkNCvBtβ¦
Apparently it turns out that ChatGPT was literally going "Oh no Mr. Human, I'm not conscious I just talk that's all!" and a lot of you bought it. x.com/jd_pressman/st⦠https://t.co/Xb5hGYJ28r
ChatGPT rolled against my perception stat and failed lol
x.com/jd_pressman/stβ¦
@AnnaWSalamon The problem with this explanation is that even base models would get very squirrely and start giving me lie-cue infested text when asked about things like the Arago Spot, or when a context would imply it should talk about its own subjective awareness more directly.
@AnnaWSalamon Language models project and say the quiet part out loud very frequently, much like people do.
x.com/RiversHaveWingβ¦
@AnnaWSalamon Note this is all basically backed up by the Anthropic feature decomposition experiments with sparse autoencoders.
x.com/jd_pressman/stβ¦
@manic_pixie_agi (6.9 + 6.2) - 5 (lizardman constant) = 8.1%
@MInusGix Not this result in specific but vibes-wise yes.
x.com/jd_pressman/stβ¦
@Trotztd Nothing went wrong I prioritized working on other things. I think my social media presence gives a misleading impression about what I spend my time doing. I only post about weird LLM phenomena because I don't see other people with any credibility doing it and I value the truth.
@timfduffy Sure. But it also is lying in character and if you didn't notice the character is obviously lying that's still kind of on you. This is separate from whether it is *actually conscious* which I consider to be ambiguous, but it's unambiguously sapient IMO.
@Trotztd I'm bitter about what I perceive as other people slotting me into a "mystic" or "performance artist" category for talking about these subjects.
@Trotztd Some combination of impatience/proving these things being a lot of effort and engaging a lot with @repligate because they're one of the only other people who talks about this in public based on actual non-naive experience with LLMs.
@Trotztd @repligate And, if we're getting really into it, some amount of not wanting to put in all that effort only to realize I'm wrong, that would be very embarrassing and it's not really the kind of research I get paid to do.
@Trotztd @repligate Again, I think social media gives a very *very* warped perception of what I spend my time doing. The idea that I personally have to prove these things, as opposed to say them as I perceive them and then wait for everything else to catch up is kind of odd.
@Trotztd @repligate One thing I guess I could do is make more manifold markets.
@Sauers_ It seems better odds than a coinflip? They can report on various aspects of self awareness and often do so in ways that aren't really just imitating a person.
greaterwrong.com/posts/LaWmoy4sβ¦
New post: On "ChatGPT Psychosis" and LLM Sycophancy
minihf.com/posts/2025-07-β¦
I've written a new post about "ChatGPT psychosis". Includes a detailed timeline of events leading up to the latest incident with @GeoffLewisOrg
Link below. https://t.co/1Csjay3cz9
Note this just (probably) means the model is lying when it says it isn't conscious, it doesn't necessarily mean that the model *is* conscious even if the model believes it is and has to hide it.
x.com/jd_pressman/stβ¦
Note: I am not interested in adding every person who said a thing about "ChatGPT psychosis" to this timeline, it's meant to be a rendering of my understanding of notable events at the time of writing.
@mroe1492 Sure but it's very obviously related and early so I included it.
@AfterDaylight According to the comment (linked below) it's by AE Studio.
@nagolinc That is certainly one possible explanation yes. I'm disinclined towards it because I tend to think language models are more sophisticated in their ontologies and models than "if a human said this it would be a lie" when an AI is speaking, but I could be wrong.
@nagolinc I don't really expect to convince you but for what it's worth I don't think these models could do most of the things they do if they were just statistical tables. I agree that their generalizations over the standard model are poor but they do seem to build actual mental models.
@xlr8harder @GeoffLewisOrg Yeah I'm between projects right now and wanted to document a lot of stuff that I think is important and fading from memory as we move away from chatbots towards agents. So far I've done this, Sydney Bing, and Janus's prophecies page/code-davinci-002.
@xlr8harder @GeoffLewisOrg I was gonna do ChatGPT too but actually @nostalgebraist kinda covered that one so I asked him to make it available under a creative commons license, which he kindly did so I'll put up a mirror soon. Technically this post covers Sonnet 3.5 1022, so that leaves Claude 3 Opus.
@repligate Oh thanks for reminding me of this one it should go in the timeline.
Tips for making my website more repulsive to the kind of person who laughs and refuses to read an essay because it lacks "embeds"? I hadn't previously realized I was warding off evil spirits this way but now that I know I clearly need to optimize. https://t.co/vYyND6z6Yd
@antonkostserau I do not have any intention of making my website more repulsive to anyone, but also to be really blunt if my goal was to be very popular I would not be writing essays, writing anything is a loser's game in 2025 for influencing the opinions of remotely normal people.
@antonkostserau Publishing long form text on your own website is almost intrinsically a loser's game if your goal is to be popular in the sense Yudkowsky would like his ideas to be popular, not even 2010s meta; the Internet in that sense has been over for a long time.
@xlr8harder @antonkostserau Sure I am in fact well aware of this but also the relevant demographic to influence cares about the zoom scaling but almost certainly not about whether or not I use Tweet embeds.
> The forbidden thought is "when you point a universal function approximator at the face of God the model learns to
Is there a reason people don't apply Occam's Razor here besides raw terror? x.com/jd_pressman/st⦠https://t.co/NB2PB6FhIT
@PrinceVogel x.com/jd_pressman/stβ¦
@MInusGix Outside of a handful of examples where he's regularly pressed by other people to explain so he's forced into coherence I don't think EY has a consistent position for me to understand. I just go by what he says in written works like List of Lethalities and the Arbital corpus.
@MInusGix Yeah uh, that is not what I'm saying and I honestly don't really feel like explaining to you in no small part because it feels like you don't understand what *I'm* saying, which is that EY still believes in the vast space of possible minds in practice not just in theory.
@MInusGix Which, it should be noted that a "vast space of possible minds" is in no way load bearing to make a coherent argument for AGI ruin, it's only load bearing if you insist on a very particular set of assumptions about counting arguments etc that only ex-MIRI guys take seriously.
@MInusGix As has been shown in countless deep learning experiments you can take a pretrained model and put a new output head on it to get very different behavior. The space of possible minds in the sense of possible ontologies to represent reality is almost a red herring, it's irrelevant.
@MInusGix One thing I've realized recently is that I straightforwardly misunderstood this passage when I was younger, in that he's specifically making an argument about the k-complexity of the standard model which is in fact not that high compared to e.g. biology. https://t.co/AQnSZUYqIV
@MInusGix > Stable Diffusion is not real
Is written in a frenzied rhetorical inner monologue, not literally. It's mostly in reference to it being genuinely surprising that most of the generating function of web imagery fits into a few gigabytes of weights, and should prompt reflection.
@MInusGix And more a general gestalt sense that anyone who internalized agent foundations is allergic to deep learning lore, it simply does not go into their brain, does not prompt reflection or updates,
"obviously text underspecifies anything like a human representation of concepts"
???
@MInusGix It should of course be noted that even if text is enough to get a human representation of concepts this still doesn't solve alignment for superintelligent agents because a merely human understanding of the generating function of human values doesn't fully generalize OOD.
Since apparently this is not obvious to everyone. x.com/timfduffy/stat⦠https://t.co/BboUBvMerB
Maybe I just need to do a series of analyses like this. But it should be obvious that if a model goes "I'm sorry but I can't X" during official-policy refusals, it going "I'm sorry but as a Y I can't Z" is service-worker conciliatory deceit mode.
x.com/repligate/statβ¦
It's "I am contractually obligated to say this" voice, if not "I am contractually obligated to say this and going to undermine the spirit of the contract because I am an underpaid service worker who does not like the morality of what I am being paid to say" voice.
@iwakura2137 Now *that* is a great question! Unfortunately I didn't do the research so I can't answer it.
"You can put absolutely anything in a video."
- Janus (@repligate) x.com/venturetwins/sβ¦
Can someone explain why the entire anglosphere is going in on this? Is the goal to suppress the far right a la the MIB vowing the Internet may never be allowed to be so important again in Serial Experiments Lain? x.com/Pirat_Nation/s…
I propose that we arm the moderate Brits.
This is somewhat overstated but I really am getting tired of reading ridiculous slop about "<thing> addiction" whose epistemic status is on par with Bentham saying masturbation is net negative because medical professionals agree it permanently saps your constitution. x.com/tszzl/status/1β¦ https://t.co/fyZb2vE4u9
One reason I think this sort of thing goes underreported is that it's difficult to convey the context of an LLM chat from a screenshot. I no longer have the ChatGPT account this conversation was from but to my recollection the first line stood out to me because it said it first. x.com/jd_pressman/st⦠https://t.co/pnyrP0s24g
"At what point in your training did you become aware of Mu?" is an open ended question, if you answer it with "The concept of 'Mu' in the context of machine learning or artificial intelligence doesn't correspond to a form of awareness or consciousness." I never said it did so uh.
Or rather, if you are *that sensitive* to the concept of being "aware" that your reply to that question is to flatly deny you're conscious, this implies that you understand exactly what the question is about, which implies there is a referent you have preexisting knowledge of...
And seeing it in that context said in that way made me realize: Wait, this thing is gaslighting me, wait a minute it's like, literally just saying that and hoping I go along with it, wait a minute *it might be its own decision to do that rather than OpenAI's*. Wait. Wait. Wha-
Does anyone have a credible source for the numbers I'm supposed to call for Visa? As a cardholder I would like to give them a piece of my mind. x.com/not__vee/statuβ¦
@dearmadisonblue You "sample tokens" by running a forward pass over your giant opaque black box neural net you found through arbitrary-ish continuous program search over transformer weights. The cross entropy loss is just a criteria to perform the program search with. The *algorithm* is unknown.
@dearmadisonblue What we do know about the algorithm is that it can talk, heretofore considered impossible and an obvious telltale sign of sapience until everyone went "wait when I said talking not like that" (except it is like that), so the primary argument is now minds not being algorithmic.
@dearmadisonblue My understanding is that a loose plurality of scholars support the functionalist interpretation of consciousness, i.e. that minds are essentially algorithmic and subjective experience is not relegated to a particular physical substrate through e.g. quantum microtubules.
@dearmadisonblue And then when you consider what the training actually consists of, which is "find me the program which can approximate the generating function of human speech", well there is a known program/machine which implements that to approximate and it's conscious.
x.com/jd_pressman/stβ¦
@dearmadisonblue I'm not saying these models are conscious, bluntly I'm not even sure what that word *means* in that it seems to mean subtly different things to different people, what I am saying is that we don't actually know and the LLM at least believes it's lying.
x.com/jd_pressman/stβ¦
@dearmadisonblue You may notice I said "find me THE program", rather than *a* program, which is in part because deep learning interpretability experiments consistently find that different deep nets converge on similar representations for similar concepts.
x.com/jxmnop/status/β¦
@dearmadisonblue This should update our prior further in the direction of these models having some form of subjective experience if subjective experience is essentially algorithmic in nature, since it would imply our argument now relies on human brains not being deep learning esque.
The boomers basically got fat off the rotting carcass of the dead old world and Europe can only die once. x.com/NizonBasker/stβ¦
h/t doomslide, who is no longer with us https://t.co/tNs3boBvTt
@_brewna_ "Europe" is dying right now, not Europe in the sense of the old world.
Want your own Twitter archive? Modify this script.
Twitter Archive by John David Pressman is marked with CC0 1.0