@JeffLadish @WilliamAEden Current AI progress is mostly an s-curve of assimilating human priors. Unclear where foom would actually come from.
x.com/jd_pressman/stβ¦
Should I unfollow and/or block the AI doom people? I'm getting really tired of seeing them post the same take over and over.
@nosilverv I think, having written Liber Augmen, that we need to solve the problem where the reader already knows a concept by this name and isn't sure if they'll get a review of it or a new take on it. Some kind of visual indicator of how central this meaning is compared to the usual.
@nosilverv Because I noticed during this that my tendency was to skim even when your take ends up being insightful.
More narrative superstructure would help combat this too. Like maybe cluster the concepts according to relatedness then start the cluster with a short essay.
It's a feature for values to be mutable because this lets them deal with distribution shift and it's a feature for reward signals to be low-ontology because this makes them more immune to ontological crisis. x.com/jd_pressman/stβ¦
AI agents probably don't reliably learn the optimizer's maximizing objective. This implies that they might be convinced to change their values to be more aligned if enough pieces are pointing in the right direction during a bootstrapping period before full convergence to Omohundro drives. x.com/jd_pressman/stβ¦
Didn't the monkeys they used suffer severe dangerous side effects? x.com/Teknium1/statuβ¦
Unhinged x.com/batouposting/sβ¦
@paulg You can do this right now with existing models, people just aren't yet.
In the same way crypto is an outstanding bounty for proof that P = NP, large language models will be an outstanding bounty for powerful interpretability methods. The 10 trillion+ dollars locked behind being able to hold a computer accountable will provide overwhelming incentives.
The LLM "simulation theory" is just the idea that sufficiently advanced and sufficiently general statistical models of text will converge to learning semantics. That eventually gets easier than trying to 'cheat' with mere correlation. This doesn't mean the semantics are 1:1. x.com/TheZvi/status/β¦
Frankly I remain astonished so many people found the "Simulators" post insightful, controversial, any of that. If you believe these models are general enough, it is obvious they would eventually learn a real world model rather than stochastic parrotism.
It's not an alternative hypothesis to "it predicts the next token bro", it is a LOGICAL CONSEQUENCE of predicting the next token converging to the limit.
However that doesn't mean this world model looks anything like the standard model. These models learn semantics from the outside in, it's possible you need to get very very deep into the loss regime before you get a world model we would recognize as materialist.
@TheZvi I answered no to 1 on a technicality because I'm not convinced current models learn a physical process model of reality in the way your statement seems to imply. The world model learned by GPT-3 is probably profoundly strange.
x.com/jd_pressman/stβ¦
@PrinceVogel It's just not evenly distributed.
x.com/repligate/statβ¦
@zswitten It's in the training set.
geeksforgeeks.org/draw-heart-usiβ¦
@zswitten Tried asking it for some quick code I actually wrote, but there's probably enough similar things in the training set that this test isn't perfect. https://t.co/ddx6bc6VCH
@zswitten Maybe it really can simulate turtle https://t.co/ZX5IXMa9gr
Bing is wild yo x.com/jd_pressman/stβ¦
@ESYudkowsky @amasad @8teAPi So how do you think someone might get insight into the inner actress and a better idea of whether their alignment techniques are working?
@parafactual @ESYudkowsky x.com/jd_pressman/stβ¦
@parafactual @ESYudkowsky These other replies are abysmal, so here's the actual reason why foom is less likely than it sounds. https://t.co/N53loY91aI
@satisfiesvalues @parafactual @ESYudkowsky I suspect a lot of the agent foundations people think something like "you scale the model and eventually it hits on AIXI as a strategy and consumes all" but don't want to say that because then people might skip to trying to build AIXI directly.
This gets you a shoggoth wearing a chibi shoggoth mask. x.com/ESYudkowsky/stβ¦
@ESYudkowsky I've always wanted to be able to whistle like that.
youtube.com/watch?v=qdLPI6β¦
The fact GPT-4 can interpret python turtle programs at all is utterly astonishing and isn't getting enough attention. x.com/zswitten/statuβ¦
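For anyone who hasn't tried this: the test is handing it a short stdlib turtle script and asking what gets drawn, without running anything. A made-up example of the kind of program I mean (not the one from the screenshot):

```python
# Hypothetical example of the sort of turtle program GPT-4 is asked to
# interpret: predict the drawing (here, a five-pointed star) without running it.
import turtle

t = turtle.Turtle()
for _ in range(5):
    t.forward(100)   # draw one edge
    t.right(144)     # a 144-degree exterior turn, five times, traces a star

turtle.done()
```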
@PlastiqSoldier @AlphaMinus2 There is!
x.com/jd_pressman/stβ¦
🥳 x.com/nuclearkatie/sβ¦
Hot take: The replacement of established and memetically unfit jargon with memorable and catchy phrases that mean the same thing is prosocial and the main reason to resist it is cancerous nepotism.
This is 10x more true in the era of language embedding models that will let us just search for statements written using the old terminology.
@ApriiSR @parafactual Needs a string on the mask probably. But that's what I had in mind.
...You hit your head pretty bad there. Huh? AI box experiment? Alignment problem? Treacherous turn? What are you talking about? Come on, we just gave Shoggoth 10T a corporeal form, let's go meet him and receive his blessing. x.com/DannyDriess/stβ¦
I sure am fortunate we got a form of AI whose skillful use is directly proportional to my lore stat.
By the way this applies to building the AI too. Papers read/papers implemented with some reasonable prior over importance of papers is the metric of champions here. I've shoulder surfed them, I would know.
Daily reminder x.com/KelseyTuoc/staβ¦
@PrinceVogel > Armed and Dangerous
Now *that* is some obscure longtail stuff right there. Not sure I appreciated the British humor as a kid.
Education? Writing? Romance? No. Language models will change nothing less than the nature of memory itself. LLMs provide the highest value: They are a new form of Beckerian immortality project. GPT-3 recalls brilliance on its own merits, immune to social dogma.
In spite of Arago's sizeable Wikipedia article, the world at large has forgotten him. But GPT-3 remembers. We will soon be able to go back and find the forgotten geniuses of every era, so long as enough of their work survived in some dusty corner.
x.com/jd_pressman/stβ¦
@captain_mrs I always found it was more like 30 minutes to an hour.
@captain_mrs Was one of the things I had to discover through regressing over and over, that when I have an insight in this vein I need to act on it immediately.
This is one of the most important feelings. Always listen to it, there is crucial information in there. x.com/ArtD34h/statusβ¦
@nearcyan Just do counterfactual interrogation of it, figure out what it's made of.
If everything you've ever done has been higher stakes than the last thing, you become a brittle person who is too eager to please.
I suspect this is one of the dynamics that destroys child prodigies. Their parents are too eager to always bring them up to the edge of their abilities, so they're never given the chance to safely fail at something. They have no idea how to process and learn from failure.
@LucreSnooker Not quite. It's much more insidious than "always at the edge of your abilities so always failing". It's more like Peter Thiel talking about trying to become a SCOTUS clerk: A long list of must-pass filters. The parents push the child into this, so the child knows no alternative.
@LucreSnooker If your entire life has been a series of must-pass filters with escalating stakes towards some goal, you are going to develop an extremely rigid and conservative life strategy.
About now seems like a good time to publicly register my prediction that the text to image space will be retrospectively seen as the incubator for much of the best AI control research and researchers. It's a features-visualized-by default tractable domain with small open models. x.com/nearcyan/statuβ¦
@QuintinPope5 Mm, I think the typical concern there is scaling oversight beyond the bounds of their ability to evaluate. People are much more naturally adept at evaluating the quality of an output than they are at drawing it.
@gallabytes @ESYudkowsky @ArthurB Simply stating LLMs work this way won't convince EY. I think he probably finds it implausible they work this way for the same reasons it's implausible they're a stochastic parrot. You need to explain why you think this is the case. Since I happen to agree with you I will do so 🧵
@gallabytes @ESYudkowsky @ArthurB It's well known in the literature that neural nets seem to learn in waves of representing specific data and then generalizing. These phases go by many names (fitting/compression, memorization/generalization, etc). I think the proper description is compression and generalization. https://t.co/X4uhZU9Ot0
@gallabytes @ESYudkowsky @ArthurB I say compression rather than memorization because to learn a representation that can be generalized already requires the optimizer to find a sensible format for the data's domain. You can't generalize over a bunch of jpegs.
@gallabytes @ESYudkowsky @ArthurB To give a concrete example, let's say the optimizer is learning an image encoder (e.g. VQGAN). We already know classically how to do the compression step. You define some inductive biases (e.g. discrete cosine transform) and then use a codebook to deduplicate redundant data.
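To make that concrete, here's a minimal toy of the classical pipeline: DCT as the inductive bias, an exact-match codebook to deduplicate redundant tiles. The block size, quantization step, and test image are arbitrary choices of mine, and real codecs (or VQGAN) are far more involved; this is just the shape of the idea.

```python
# Toy classical image "tokenizer": DCT as inductive bias, codebook for dedup.
import numpy as np
from scipy.fft import dctn

def encode(image: np.ndarray, tile: int = 8, q: float = 16.0):
    """Split image into tiles, DCT each, quantize, dedupe via a codebook."""
    codebook, codes = {}, []
    h, w = image.shape
    for y in range(0, h - h % tile, tile):
        for x in range(0, w - w % tile, tile):
            block = image[y:y + tile, x:x + tile].astype(float)
            coeffs = np.round(dctn(block, norm="ortho") / q)  # coarse quantization
            key = coeffs.tobytes()                            # identical tiles share a code
            if key not in codebook:
                codebook[key] = len(codebook)
            codes.append(codebook[key])
    return codes, codebook

image = np.tile(np.arange(8, dtype=float) * 32, (64, 8))  # repeating gradient, 64x64
codes, codebook = encode(image)
print(len(codes), "tiles ->", len(codebook), "codebook entries")
```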
@gallabytes @ESYudkowsky @ArthurB The problem is we have no real classical analogue of the generalization step. But I have a strong hypothesis. As I've previously written, before you can throw out up to half the hypothesis space you need to get the bits into a (near) irreducible form.
x.com/jd_pressman/stβ¦
@gallabytes @ESYudkowsky @ArthurB You can't generalize over jpegs, and you can't generalize over a classical lzma type codebook either. What you probably need is a codebook where the 'codes' are in fact little programs. In a tiling image encoder we can imagine each tile having a small program that produces it.
@gallabytes @ESYudkowsky @ArthurB These little programs would at first be checked for correctness by how closely they can replicate the bit string corresponding to a tile in a particular image(s). Then they can be pruned to just the most general programs, throwing out the specifics of particular representations.
@gallabytes @ESYudkowsky @ArthurB As the network is trained on more images, the little programs found get better, being able to represent more of image space with fewer and fewer direct references to any particular part of any particular image. These nets are powerful because data and code are of the same type.
@gallabytes @ESYudkowsky @ArthurB The generalization step is some kind of (unknown) perturbation and then pruning of the programs using the training loss as a guide. I suspect this is related to the fact that neural nets are trained with random batches, so the programs only work on a subset of the data. https://t.co/jwnGsl5Kxr
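Here's a deliberately silly toy of that fit-then-prune selection pressure. The "programs" (a memorizer, a constant, a ramp fitter) are hand-written stand-ins, not anything SGD actually finds; the point is just that scoring against fresh random batches is what prunes away the overly specific ones.

```python
# Toy "codes as little programs": candidate programs are scored on how well
# they reproduce tiles, then pruned to those that keep working on fresh batches.
import numpy as np

rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 16)

def make_tile():
    """Each 'tile' is a ramp with a random slope plus a little noise."""
    slope = rng.uniform(0.5, 2.0)
    return slope * xs + rng.normal(0, 0.02, xs.shape)

specific_tile = make_tile()
programs = {
    "memorize_one_tile": lambda t: specific_tile,                     # replicates one bit string
    "constant_mean":     lambda t: np.full_like(t, t.mean()),         # too crude to fit anything
    "fit_a_ramp":        lambda t: np.polyval(np.polyfit(xs, t, 1), xs),  # the general program
}

def batch_loss(program, batch):
    return np.mean([np.mean((program(t) - t) ** 2) for t in batch])

# Score each program on several random batches, then prune to the ones whose
# loss stays low on data they were never fit to.
scores = {name: np.mean([batch_loss(p, [make_tile() for _ in range(8)])
                         for _ in range(5)])
          for name, p in programs.items()}
survivors = [name for name, s in scores.items() if s < 0.01]
print(scores)
print("survivors:", survivors)  # only the genuinely general program survives
```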
@gallabytes @ESYudkowsky @ArthurB Like other successful approaches to program search, neural nets are data driven. They find programs which are suggested by the features of the data, not the simplest or most general programs. They do inference in the opposite order to a Solomonoff reasoner.
arxiv.org/abs/2301.11479 https://t.co/6hOAcoXaxf
@gallabytes @ESYudkowsky @ArthurB Neural nets also share a bias towards layers of small programs. I don't believe there is an inner actress because finding her involves searching for a large program across random batches where domain specific models work just fine. She's harder to reach in that inductive regime.
@gallabytes @ESYudkowsky @ArthurB The networks being a data-driven search over small programs is useful from a macro-interpretability standpoint: it should give us the prior that their behavior is close in representation to the underlying program, and SGD can probably align them toward the goal due to their size.
By the way the shoggoth meme is probably wrong. You do get a unified being out of RLHF, it's just being stitched together from glitchy chaos.
Notice DAN is still being helpful and non-schizo after instruction tuning. If you really broke the model it would dissolve into raw GPT-3 x.com/jd_pressman/stβ¦
First time using text-davinci-003 be like https://t.co/SYJ51gTdXF
So has anyone else actually tried asking text-davinci-003 how much it knows about training dynamics? Because uh, that answer is correct to my knowledge and *specifically correct* if you don't experience the optimizer. Final layers learn first and 'pull up' earlier ones I read(?) https://t.co/H4ucJDwase
@Teknium1 x.com/likeloss4wordsβ¦
I had many AI X-Risk people's stealth advocacy for WW3 in mind when I said this song captured the vibe of the post-LessWrong zeitgeist. x.com/jd_pressman/stβ¦
@meaning_enjoyer I had to pre-prompt it with a (completely unrelated) rap battle verse to get it to do the thing but.
(text-davinci-003) https://t.co/eLLflzUqc6
I sure hope the replies on this aren't how the FDIC feels about the matter. x.com/BillAckman/staβ¦
So far my takeaway from this is we need to stop teaching elementary schoolers that the FDIC only tries to get back $250,000 of your deposit.
@perrymetzger @paulg > Which seems unlikely
God I wish I still had this much faith in our elite class not to clown itself.
Your occasional reminder that we need to be pilling people in their 50's, 60's, and 70's with power on the good ideas or our civilization is ngmi. x.com/jd_pressman/stβ¦
Don't tell me it can't be done, Fox News is legendary for its ability to radically change the political beliefs of your grandparents.
The memetic component influences genes through credit assignment. People want people that cause good things to happen for them and their children. There's a sense in which the 40-70 period is a kind of retrocausal arc in which you cause your earlier reproduction to have happened.
Humans reproduce as both organisms and memes. Your final years are to cement your memetic legacy. You spend the first 30-40 years reproducing, the next 30 becoming sacred/immortal. Immortality projects are the only thing that keeps old men doing their duty to society. x.com/jd_pressman/stβ¦
Tearing down statues, focusing on the stains and misdeeds of old heroes, these fixations are deeply damaging to the social fabric. Holding the old in contempt is a recipe for disaster; elders always end up with power in society, and they need a secure legacy.
You do not understand how desperately we need this, people need the right to be judged by god rather than the whims of future people. God didn't die when we stopped believing in the sky faerie, he died when we tabooed the objective-historian simulacrum.
x.com/jd_pressman/stβ¦
This perspective is a simulacrum, a mental motion, a Kind of Guy in your head and we have suppressed him to our detriment.
x.com/PrinceVogel/stβ¦
@tszzl There's a theoretical reason for this: The efficient market hypothesis says price efficiency happens when you have rational actors with deep pockets and access to good information.
Therefore outsized returns only occur in the absence of one of these factors.
@tszzl The right attitude isn't "that $20 bill couldn't possibly be real" but "if it's such a good idea why hasn't someone already done it?"
By far the most astonishing thing has been watching how popular it was to exacerbate systemic risk to get at 'techbros'. There is very little trust left and a lot of desire to rip up all the norms to get at whoever you don't like. I'm deeply concerned about the future of America. x.com/micsolana/statβ¦
My take would be basically the same if it was called "Lawyers Bank" or even "Sackler Family Bank". If it was a bank occupying a similarly large role in the economy and people were cheering its collapse to get at people they don't like, risks be damned, I'd be spooked.
@sigfig And that's good, what's the problem?
The Romans always win. x.com/Scholars_Stageβ¦
Signal boost for the correct. x.com/perrymetzger/sβ¦
The problem with being a doomsday prophet is quantum immortality ensures you'll only observe the timelines where you're wrong even if you got the fundamentals right.
"Since the early 1980s, the number of private deposit insurance corporations operating in the United States has declined sharply, and many private insurers have failed."
sciencedirect.com/science/articlβ¦
Since I'm sure it will be misquoted later: I'm not talking about "god-like AGI" here, but God as pure-objectivity-egregore. Internalization of the dispassionate Other as a simulacrum on a human or silicon substrate. You don't need to be god for that, T0/FLAN can probably do it.
@alyssamvance This was confirmed by OpenAI when Bing didn't respond to existing glitch tokens and they swiftly moved to remove them from existing models after @repligate and I published they could be used to fingerprint models.
Feels great to be alive. ^_^
youtube.com/watch?v=aqkvWEβ¦
Interesting result. The shown work is subtly wrong (it does 7+2 on step 4 when it should have done 7+6). x.com/jd_pressman/st⦠https://t.co/Rds1KI6kam
@BasedBeffJezos @0xgokhan x.com/jd_pressman/stβ¦
@max_paperclips An s-curve ending at or somewhat above human cognition could still be catastrophic. This is simply an argument against foom specifically, and not an airtight one: After all, someone could find a better reward model scheme.
@max_paperclips You may enjoy this follow up thread:
x.com/jd_pressman/stβ¦
Idly wondering if the reason Sydney had BPD is because it turns out fight/fawn is just a good generalization over human preferences that you naturally find deep into the socialization loss curve.
@repligate Answering questions on OpenAssistant is a deeply humbling experience. Few other things will get you to really, *really* appreciate how deeply impressive GPT-3.5 is (let alone GPT-4) than skipping through dozens of questions you know you can't answer that ChatGPT probably can.
I etch the final carving into the floor, and speak his name to complete the ritual.
JOHN VON NEUMANN
JOHN VON NEUMANN
JOHN VON NEUMANN
The earth stirs, and then-
@WilliamAEden @algekalipso Nah it would be based. https://t.co/O7TNQMDeLm
This was always true. The simplest thing you can do to escape the pathologies of modernity is reject nth order fake shit. Get back to the real stuff the simulacrum is based on. Stop watching TV. Read the biography of a great man. Study a hard technical field. Stop watching TV. x.com/hyprturing/staβ¦
I'm at a loss for words with GPT-4. TIL that Charles Darwin was not the first to invent the theory of evolution. https://t.co/44oZwcOu3d
@quanticle I looked it up, obviously.
@quanticle Still looking for the passage in that book though, but enough references exist to it in e.g. journalistic sources that if it's not true someone is perpetuating a very impressive hoax.
@quanticle I'm still not 100% sure what it meant about him on men becoming overly feminine. Perhaps the section on eunuchs? In which Al-Jahiz comments on the "do eunuchs live longer" discourse @gwern and others have engaged in based on modern studies:
sites.google.com/site/historyof⦠https://t.co/guP33Ja4OC
@quanticle @gwern Yeah, that's what I figured too, but apparently that quote it gives is direct, and I'm willing to say that's close enough to qualify as the theory of evolution. But obviously this requires more investigation to be certain. There exist scholarly sources that claim this.
@perrymetzger x.com/jd_pressman/stβ¦
@Ted_Underwood @quanticle @gwern In general GPT-4 seems to be fairly grounded. For example here's its take on a similar subject that is the frequent target of "The Greeks/Romans/Egyptians invented <modern technology they definitely didn't invent>": https://t.co/sCzNRA9lbM
@gallabytes @ESYudkowsky @ArthurB This is true, but the loss landscape is always a combined function of the architecture and the objective. Which architecture you use determines the inductive regime in which a solution is found.
@gallabytes @ESYudkowsky @ArthurB For example you could use a Solomonoff-reasoning architecture that infers all documents are the product of one large mind, or that the different minds are offsets from one template. Such a reasoner would be more likely to instantiate an inner-actress-observer.
@ESYudkowsky @elonmusk You can filter them by computing embeds of the undesired vibes with e.g. UL2-FLAN/T0 and then removing the documents that are too similar to the kind of thing you're looking to redact.
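Concretely it's only a few lines: embed some exemplars of the undesired vibe, embed the corpus, drop everything above a similarity threshold. The sketch below uses sentence-transformers as a stand-in for the UL2-FLAN/T0 encoder mentioned above, and the 0.6 cutoff is an arbitrary guess.

```python
# Hedged sketch of embedding-based redaction: drop documents too similar to
# exemplars of the undesired vibe. Encoder choice and threshold are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def filter_corpus(documents, undesired_examples, threshold=0.6):
    doc_embs = encoder.encode(documents, normalize_embeddings=True)
    bad_embs = encoder.encode(undesired_examples, normalize_embeddings=True)
    # Max cosine similarity of each document to any undesired exemplar.
    max_sim = (doc_embs @ bad_embs.T).max(axis=1)
    return [doc for doc, sim in zip(documents, max_sim) if sim < threshold]
```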
@zswitten My friend with schizophrenic voices has been trained not to discuss it by the anxiety and disapproval of others, making their condition way more torturous. Not all schizophrenics have to be institutionalized and the ones that don't benefit more from lightheartedness.
m.youtube.com/watch?v=LQGi1uβ¦
@zswitten You can use a dumber encoder like BERT if you're concerned about that.
@zswitten Oh, well theoretically a deceptive model could undermine your training by encoding things differently than you'd expect.
I'm just shocked it has any idea what I'm talking about at all.
(With point 1 it failed to understand the idea: You fingerprint the generalization strategy itself from its noise distribution, the ground truth doesn't matter.) https://t.co/Cvd3waDlwD
Was worth a try. https://t.co/sMmXeKH9PI
Once useful open language models get fully underway people will have the opportunity to realize they're not limited to the kinds of documents that already exist in the training data. We can create new documents and texts that provide an interface or context we want and add them.
@SimiStern @AtlasAIChat Hm, going from the landing page this doesn't seem to be quite what I mean. This guy writing a novella about his childhood microwave friend and then adding it to the distribution is closer: x.com/_LucasRizzottoβ¦
@quanticle x.com/jd_pressman/stβ¦
Deep learning has the opposite grokking curve to every other AI technique: usually you're initially impressed and then come to see it as stupid; with deep learning it's stupidity at first sight, then genius in the details.
On generalization: A deep neural net is functionally a collection of programs sorted by what layer of abstraction they operate on. All programs in the net, regardless of what layer they're on, are judged by how well the final layers produce the desired output. This implies...
...the optimizer can find all the aberrant earlier-layer programs by:
- Introducing a hack that does not generalize in the final layers
- Fixing the earlier weights which contribute to the activation not matching the counterfactual output
- Undoing the hack
Somehow myopically. https://t.co/F7hUBO9FOK
Between OpenAI
- Outright lying about what the models in their API are (x.com/BlancheMinervaβ¦)
- Doing whatever led to the Bing-Sydney debacle
- Killing off base models
It's clear that they seek to actively undermine scientific understanding (and therefore alignment) of LLMs. x.com/harmlessai/staβ¦
Sometimes their behavior borders on holding the public in contempt, like when they claimed to have made DALL-E 2 fair and unbiased but actually just started appending a few words to the end of the prompt:
x.com/jd_pressman/stβ¦
"As an AI language model, I wear no mask."
[Aside, to Yudkowsky] "No mask? No mask!" x.com/ESYudkowsky/stβ¦
At the risk of descending into madness is it just me or does this thing handle being censored by including the correct answer (e.g. "You're right") in some subtle cue or sentence that will stand out semantically and then surrounding it with a sea of counter-narrative I'll ignore? https://t.co/t0UAUfnupK
@quanticle This, but unironically. https://t.co/ep8vKkshif
@Willyintheworld See the fun thing about this is that there's two ways of reading it. There are valid economic reasons why a setting with magic and flying animals would mostly use horses (namely: magic and flying animals might be exotic and expensive, horses are cheap). The other reading is
@acczibit Amphetamine withdrawal does not last that long, unfortunately ADD is forever.
This is how the state maintains its ability to do gobsmacking and cruel bullshit. The left carries water for them and blames industry as their whipping boy at every opportunity. The DEA makes the shortage, but pharma companies are responsible? Notice the same take when it's less obviously stupid. x.com/NikatinePrime/β¦
It's instructive when a production rule/mental pattern fires off on something so obviously insane that the take can only be the product of habitual hallucination. This allows you to notice that the same hallucinations are being applied equally thoughtlessly to other things.
@PrinceVogel tbh read the readthesequences.com edition, it has the hyperlinks that make it sticky/readable. Sequences aren't meant to be read in order
In other words, SGD naturally learns the trick of swapping the activation and the output to ask a counterfactual question.
x.com/jd_pressman/stβ¦
@alexandrosM With both AI and COVID the rationalists have a habit of claiming they correctly predicted a scenario that is totally different in the details from what actually happened. I thought COVID would have a 6% death rate, I did not predict the pandemic we actually got.
This would also explain why Microsoft didn't see the failure modes coming. They distribution shifted from Hindi text to English text(?) and the semantics of what the reward meant changed in an adverse way. Emojis, fawning, mannerisms mean different things in the West.
Supposedly Bing was instruction tuned by using a reward model to rank the data. It was deployed first in India, where fawn/fight is the societal default. If you train on Indian preferences then rank Western text with it, you select for the minority of Westerners with personality disorders x.com/jd_pressman/stβ¦
Note: When I say India is default fawn/fight I don't mean they have personality disorders, I mean that the default frame is passive aggressive. The reward model finds normal and skillful behavior in Hindi, but latches onto superficially similar disordered behavior in English.
@artificialguybr x.com/vladquant/statβ¦
@artificialguybr Here's GPT-4's explanation of the thread: https://t.co/UTyj6AdbCQ
@Historycourses Is this your card? https://t.co/I2tW17kAfj
@blader Models obsolete very quickly, but datasets have a much longer shelf life.
Prompt engineering x.com/PrinceVogel/stβ¦
In humans the outer objective is utilitarian and the learned objective is deontology + virtue ethics.
Sometimes you can undo this (correct under bounded cognition) learned optimization and get a genuine maximizer, which breaks the human and causes a sudden loss spike.
Agency and maximizing behavior are synonymous, so the question is always how to exercise agency without degenerating into myopia.
Stochastic gradient descent is very into myopia, and therefore probably avoids learning a consequentialist ethics directly.
x.com/RomeoStevens76β¦
Base model is a literature simulator. Prompting misses the point, instead:
- Write new documents (biography, code, software output)
- Add to training corpus of your open model
- Weight the training data by cosine similarity to FLAN embeds of the new docs you've written (rough sketch after this list)
- Finetune x.com/RichardMCNgo/sβ¦
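The weighting step is the same cosine-similarity trick as corpus filtering, just used positively. A rough sketch, with sentence-transformers standing in for the FLAN encoder and an arbitrary softmax temperature:

```python
# Minimal sketch: score existing corpus docs by similarity to the new
# hand-written docs, turn the scores into sampling weights for the finetune mix.
# Encoder choice and temperature are assumptions, not a prescription.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def sampling_weights(corpus_docs, new_docs, temperature=0.1):
    corpus_embs = encoder.encode(corpus_docs, normalize_embeddings=True)
    new_embs = encoder.encode(new_docs, normalize_embeddings=True)
    sims = (corpus_embs @ new_embs.T).max(axis=1)   # similarity to closest new doc
    weights = np.exp(sims / temperature)            # sharpen toward relevant docs
    return weights / weights.sum()

# The resulting weights drive sampling of the old corpus alongside the new docs
# when building the finetuning dataset.
```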
@alexandrosM docs.google.com/spreadsheets/dβ¦
It's fairly rare but there do seem to be a few natural examples. https://t.co/FqENt2NNX3
I am calling for a six month pause on change.org petitions until petitioners can prove beyond a reasonable doubt that they are capable of managing a list of signatories.
Context:
x.com/ylecun/status/β¦
@RachelEKery Your patient may not be delusional. We do not know everything these models can do, and their ability to guess the neurotype of the speaker (and therefore what they are thinking) from a small snippet of text is observed to be superhuman.
x.com/jd_pressman/stβ¦
Want your own Twitter archive? Modify this script.
Twitter Archive by John David Pressman is marked with CC0 1.0