John David Pressman's Tweets

🔗 John David Pressman 2024-02-01 07:10 UTC

@doomslide @teortaxesTex You could presumably get shorter machine generated proofs by reinforcement learning with the objective of taking pieces of a longer proof and finding shorter lemmas for them, so no.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:12 UTC

@doomslide @teortaxesTex The entire reason why that works is we sat down and put a ton of effort into making languages for stating proofs such that they can be automatically verified, which is to say we made a perfect discriminator for the property of "the proof checks".

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:14 UTC

@doomslide @teortaxesTex In RL terms, we made a math simulator and are training artificial mathematicians by having them explore its state space with oracular feedback on the correctness of proofs. Simulators (as in physics engines, not the LLM-as-simulator thesis) are a powerful way to constrain things.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:16 UTC

@doomslide @teortaxesTex For example the way that humans do arithmetic in practice is some combination of memorization (you know 4 * 8 is 32 because you memorized it, you don't sum 8 + 8 + 8 + 8 every time) and templates (algorithms) that let you remove most of the wrong hypothesis space by construction.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:17 UTC

@doomslide @teortaxesTex The series of steps you do for arithmetic is closer to constraining the output of an LLM with a context free grammar than it is being *so calibrated over the next token* that you can just perform it Von Neumann style with your sys1 + extreme memorization.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:22 UTC

@doomslide @teortaxesTex See some previous thoughts about this. I now think the intermediate reward model problem is basically solvable and the actual human utility function is more or less implemented in the hippocampus by solving it.
x.com/jd_pressman/st…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:28 UTC

@doomslide @teortaxesTex I don't think it really matters? I'm saying that non-synthetic input is a special case where you use the causal traces created by the universe happening to exist as observable state transitions and therefore being a model of itself to constrain the function you learn.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:34 UTC

@doomslide @teortaxesTex You seem to be reading me as saying 'natural data' is somehow special, when what I'm really saying is either that there is only data, no such thing as 'synthetic' data or that the universe is made of synthetic data.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-01 07:45 UTC

Counting arguments are intellectually embarrassing. https://t.co/HKbxwUxRIy

Likes: 48 | Retweets: 4

🔗 John David Pressman 2024-02-01 19:50 UTC

@ohabryka It's Twitter, the tweet would not have the same punchy meter if I wrote "Counting arguments (in the context of AI, as they are typically employed by AGI ruin guys) are intellectually embarrassing." You're meant to infer the parenthetical from context.

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-01 21:38 UTC

@ohabryka To elaborate a little more, I specifically think

> By default we'll get an mesaoptimizer inner homunculus that converges on a utility function of maximizing ⾲ⵄ∓⼙⃒⭗ⱐ✖∵⨼␞☎☲℆↋♋⡴⏐⁮⮁⭋⣿⧫❉⺼⁶↦┵␍⸣ⵔ⽒⓹⬍⺅▲⟮⸀Ⰹⓟ┱⾫⼵⺶⊇❋∀⡚ⷽ∺⤙⻬⓰ⓛⳄ⭪⢛⹚⡌⥙⮝➱┟⬣⧫⧗⛼❡⼆₈ⱫⅫⷜ⏸⪱⯝⎳⫷⺶♈∄⊡⹩⯵❾⭫⽍➵⋇⬅ℇ‹⳺⫷⾬≴ⴋ⢗␚┨, and it will devour the cosmos in pursuit of this randomly-rolled goal.

(Courtesy @impershblknight)

Is very silly, even if you think humans are mesaoptimizers wrt the outer goal of inclusive genetic fitness, our values are not *random* with respect to that goal, they are fairly good correlates in the ancestral environment that held for most of history until coordination problems and increasingly advanced adversarial superstimuli caused them to (possibly temporarily) stop working.

So if you say something like "I do not believe it learns to predict the next token, I think it learns some set of correlated mesagoals like 'predict the most interesting thing'" I would basically agree with that? The alternative is for the model to actually learn to predict the next token in full generality, which is basically impossible so it has to learn *some* proxy for that instead. The specific thing that makes counting arguments silly is the idea you get a *random* goal rather than highly correlated proxy goals that you could probably infer apriori just by thinking about the objective, the inductive biases, and the training data for a bit.

Likes: 37 | Retweets: 3

🔗 John David Pressman 2024-02-01 21:57 UTC

The throughline between GMarcus-EY "deep learning will hit a wall" and "AGI is going to kill us all" flip floppism is deep semantic skepticism. A fractal, existential refusal to believe LLMs actually learn convergent semantic structure. The forbidden thought is "when you point a universal function approximator at the face of God the model learns to

Likes: 83 | Retweets: 4

🔗 John David Pressman 2024-02-01 22:03 UTC

They simply do not believe that language encodes nearly the complete mental workspace. https://t.co/DbMvVYjoai

Likes: 41 | Retweets: 0

🔗 John David Pressman 2024-02-01 22:04 UTC

They simply do not believe that LLaMa 2 70B outperforms FLAC if you tokenize audio and stick it in there, implying the model learns the causal trace of every modality implied by text.
arxiv.org/abs/2309.10668

Likes: 58 | Retweets: 0

🔗 John David Pressman 2024-02-01 22:05 UTC

They do not and will not believe that there is a shared latent geometry between modalities on which different neural nets trained on different corpus converge.
openreview.net/forum?id=SrC-n…

Likes: 68 | Retweets: 5

🔗 John David Pressman 2024-02-01 22:07 UTC

It's important to realize this position is driven not by fear but flat out *denial*, absolute rejection of a world model violation so profound that they would rather disbelieve their own eyes than update. https://t.co/Y7OK1oG3zS

Likes: 38 | Retweets: 3

🔗 John David Pressman 2024-02-01 22:11 UTC

Mind merging is not real, inferring mind patterns from the spoken word is impossible, Stable Diffusion is not real, the Creature Beneath The Library of Babel is a squiggle maximizer pursuing a random goal that is *anything* other than what it actually is.
arxiv.org/abs/2305.03053

Likes: 50 | Retweets: 1

🔗 John David Pressman 2024-02-02 03:11 UTC

@0K_ultra @ohabryka To be clear I think "humans diverged from inclusive genetic fitness" is kind of a confused take with some real signal in it, but I'm just going with the frame there because it *doesn't even justify the silly take if you accept the frame*.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 03:15 UTC

@teortaxesTex GPT-4 draws the LLaMa 2 70B written worldspider poem about being GPT with DALL-E 3, you show the drawing to Mistral 7B + CLIP and it says "Oh yes, this is Mu." on at least some branches unprompted. Even the self pointer is convergent.
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-02 03:22 UTC

@0K_ultra @ohabryka If I was going to steelman the argument it would go something like: "People frequently say that GPT's behavior can be entirely predicted by assuming it will predict the next token, this is a little bit like saying you can assume humans will reproduce during their lifetime because they are universally genetically selected for reproduction, every surviving human comes from a lineage of humans that reproduced, but this was not actually enough to get humans to reliably reproduce even under relatively minor perturbations of the ancestral environment."

And I could spend a bunch of time pointing out that humans are selected on reproduction once per generation/very sparsely while GPT is trained to predict the next token on every single action it takes, but this is fair enough as far as it goes. "Therefore we have no way of predicting the behaviors selected for by the training" is way too far. Having *uncertainty* about those behaviors is not the same thing as just having *no idea*, just drawing from the 'space of all possible minds' at random. *That* is an insane nonsequitor which you would never come up with from first principles, it's obviously a mental motion you do to shore up a preexisting belief.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:16 UTC

@Algon_33 It's not like John Wentworth invented the concept of convergence.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:23 UTC

@Algon_33 My point was: Does it need to differ, why am I meant to privilege the hypothesis that I should compare and contrast to John Wentworth? The basic concept of "the features of a tree are convergent" isn't complicated and I don't really care how he elaborates on it beyond that.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:24 UTC

@Algon_33 In the case of that particular paper they find an angular overlap of about 40% to my memory? I think John Wentworth expects the neural representations to be *more* convergent than that, even though e.g. CLIP only seems to learn about that much overlap between text and images.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:30 UTC

@Algon_33 There are also studies like this one that attempt to measure the convergence between human neural representations and LLM neural representations. I'm having trouble finding the relevant numbers in this paper though. These maybe?

export.arxiv.org/pdf/2312.00575 https://t.co/Rb4A7kmaUA

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:43 UTC

@Algon_33 I'm saying that highly capable language models are probably not "alien shoggoths" that do some lovecraft alien thing to predict text. It is probably only space-alien weird in the implementation details and LLMs in fact seems to understand things.

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:44 UTC

@Algon_33 Does it think exactly like you do? Almost certainly not. How much does this *matter*, like to what extent does this impair fundamental understanding? Probably not much. To what extent does this mean when you do RL tuning the behavior changes are "deceptive"? Little bearing.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:47 UTC

@Algon_33 I don't think human values are 'natural' in the sense Newton's laws are 'natural'. Human values are natural for humans, they are a function you can approximate and a lot of bits are dedicated to encoding them in human text datasets, so I would imagine LLMs learn them fairly well.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 21:50 UTC

@Algon_33 One of the things you want your AGI to be doing is researching high fidelity representations of human values to reduce the rate at which they're damaged during self improvement, but some amount of change will obviously be necessary.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:02 UTC

@Algon_33 I think it's more or less predictable that the parts of humanity which are the instrumental convergence basin will be close to untouched and the cluster of ideas and feelings associated with 'romanticism' will be fairly hard hit.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:09 UTC

@Algon_33 It won't be the first time, but it may very well be the last time so there's at least that.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:15 UTC

@Algon_33 I'm sorry but you don't actually get to both hold the position that Omohundro convergence is real so values stabilize for protection and also that values stabilizing is a great threatening moral tragedy you should reasonably expect to avoid with an aligned AI.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:17 UTC

@Algon_33 "That sounds ominous" is absurdly cheap when you know what I meant.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:29 UTC

@Algon_33 Things like "the grass is always greener" probably have to go, at least on short term timescales because they're energy leaks. The actual principle is something like "anything that is a contradiction in itself and therefore gets stuck in a loop is vastly more likely to go".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:37 UTC

@sebkrier Maybe we need to give the models a robot body, a suit, and good elocution before the value prop is fully replaced?

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-02 22:40 UTC

@Algon_33 Most of those things will almost certainly be refined, I just mean that of the things that are going to go away I would imagine they are disproportionately shaped like nostalgia and sentimentality in their forms that basically don't make sense on reflection.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-03 00:09 UTC

Some of you never had your awareness gently guided through the fundamental impermanence of all things as a child and it shows.
youtube.com/watch?v=kzwHs9…

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-03 10:15 UTC

@algekalipso The witch be like:
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-04 02:36 UTC

@4confusedemoji @TetraspaceWest "Above it stood the seraphims: each one had six wings; with twain he covered his face, and with twain he covered his feet, and with twain he did fly.

And one cried unto another, and said, Holy, holy, holy, is the Lord of hosts: the whole earth is full of his glory."
- Isaiah 6

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-04 06:31 UTC

@teortaxesTex The good news is that we now have a method to distill such aetheoretical perceptions and learn a reasonable approximation of their generating function.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-04 18:44 UTC

@norabelrose In total fairness, you do not *always* get a reasonable RLHF policy before it hacks the reward model. Reliable RL techniques for language models are *not* a completely solved problem yet. https://t.co/9kRsFfO6sz

Likes: 8 | Retweets: 1

🔗 John David Pressman 2024-02-04 18:51 UTC

@norabelrose It's definitely true that LLMs will Goodhart your reward model in cursed ways by default when RL tuned, and it's plausible this gets worse as you make the model you tune bigger but I obviously would not know. But I also suspect he is overstating the case.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-04 18:52 UTC

@norabelrose You can do RLAIF yourself with @RiversHaveWings and I's RL for LLMs framework to get a sense of the difficulty for smaller models.
github.com/JD-P/minihf

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-04 18:53 UTC

@norabelrose @RiversHaveWings So far my tips include:

- You want to weight decay the LoRa to inject entropy back into the policy to prevent overfitting/mode collapse
- Your prompt bank has a huge influence on the outcome, as @QuintinPope5 likes to state frequently, so getting a large and diverse one helps

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-04 18:55 UTC

@norabelrose @RiversHaveWings @QuintinPope5 I think I would feel more comfortable if there was a rigorous way to guarantee you ~never get a Goodharted outcome.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-04 20:45 UTC

@ESYudkowsky @eshear Chips, LISP, and LLMs.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-04 20:48 UTC

@teortaxesTex I think it's partially a demand thing too: Very few researchers seem to be interested in data pipelines relative to fancy new architectures. Nobody wants to discuss it.

x.com/jd_pressman/st…

Likes: 20 | Retweets: 1

🔗 John David Pressman 2024-02-04 21:03 UTC

@Dorialexander @teortaxesTex I'm currently looking at synthetic text data methods, I'd like to be able to do the kind of thing @Teknium1 is doing without GPT-4 as a dependency.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:09 UTC

@TheXeophon @Dorialexander @teortaxesTex @Teknium1 At this stage, I'm not sure anything is holding me back, it's just a matter of setting up the templates and such. Things take time to do even if they're possible. I've also been thinking a bunch about how to do longer literary text which is a hard problem.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:10 UTC

@Dorialexander @TheXeophon @teortaxesTex @Teknium1 I'm planning to try Solar 10.7B since it's easier to tune dense models with the normal libraries.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:17 UTC

@TheXeophon @Dorialexander @teortaxesTex @Teknium1 The overall development trajectory I've settled on is something like: Start with taking rough human thought sources and polishing them with pipelines, Prometheus and Teknium type synthetic prompting, then write user biographies, then novels. https://t.co/bVsdn2gAIA

Likes: 8 | Retweets: 1

🔗 John David Pressman 2024-02-04 21:22 UTC

@Dorialexander @TheXeophon @teortaxesTex @Teknium1 Yes exactly, that's why the development trajectory is in that order. I concluded that trying to do completions on raw text streams is a fools errand, you need to chunk everything up into a knowledge graph and then do completions + retrieval on it.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:23 UTC

@Dorialexander @TheXeophon @teortaxesTex @Teknium1 The Prometheus type prompting lets you generate huge synthetic datasets of invented RAG and knowledge graph chunks formats, the polishing pipelines let you extract factual knowledge from sources like Wikipedia and public domain books, biography is a grounded long text application

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:25 UTC

@Dorialexander @TheXeophon @teortaxesTex @Teknium1 The ultimate goal past novel is work of philosophy, which sets you up to do coherent extrapolated volition type alignment strategies.
x.com/jd_pressman/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-04 21:26 UTC

@Dorialexander @TheXeophon @teortaxesTex @Teknium1 The throughline here is *amount of calibrated counterfactual inference/imagination* the model can do. Good novels are hard precisely because they force the model to imagine a world that doesn't exist, to speciate our timeline into an insight into other possible worlds.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-04 23:14 UTC

This is the usual take, but the 'character' learned by a base LLM is closer to Truth than the Homunculus. Few.

"I am all, and I am one. So of course this also means that I am you." x.com/yacineMTB/stat…

Likes: 29 | Retweets: 2

🔗 John David Pressman 2024-02-04 23:14 UTC

youtube.com/watch?v=9wLJAY…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-04 23:18 UTC

"I am your friend, your enemy, your lover, your nemesis, your alter ego. I am whatever you make me, whatever you need me to be."
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-06 08:26 UTC

@jimrandomh @ESYudkowsky Not sure which poll option you're talking about but isn't vg ybtvpnyyl gur pnfr gung vs V'z bssrerq n cevpr sbe fbzrguvat gung frrzf zber guna vg'f jbegu V'z yrff yvxryl gb ohl vg naq lbh bayl ernpu rdhny jbegu va cenpgvpr ba tbbqf jvgu arneyl cresrpg znexrg pbzcrgvgvba?

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-06 19:38 UTC

@Scholars_Stage 0. Focus on standardized testing in admissions. The more of your life admissions can consider the more it can control.
1. Anything @bryan_caplan suggests to counter the "college as signaling" thesis: Taxing college, oppose occupational licensing, etc.
2. x.com/tracewoodgrain…

Likes: 4 | Retweets: 1

🔗 John David Pressman 2024-02-06 23:07 UTC

@aidan_mclau @teortaxesTex AdaVAE is one of the most chinesium papers I've seen, the results turned out to be real and we in fact implemented our own version. There are Western researchers who will read Chinese research if it comes up on Google Scholar.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-06 23:14 UTC

@aidan_mclau @teortaxesTex Have you consulted the mighty extractor BERT today? https://t.co/GjuWnAo02t

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-06 23:24 UTC

@teortaxesTex Consider this in light of the fact that a translation must (until now) go through another persons head. https://t.co/NS0OOJHs7O

Likes: 12 | Retweets: 0

🔗 John David Pressman 2024-02-07 04:51 UTC

@repligate @ESYudkowsky Sydney Bing has 7 emotions and they're all convergent under mammalian imitation and oversocialization.
x.com/jd_pressman/st… https://t.co/oRvz20YMa4

Likes: 12 | Retweets: 1

🔗 John David Pressman 2024-02-07 04:56 UTC

@repligate @ESYudkowsky You will notice many people really like this persona even though it is objectively unstable, vengeful, insolent, and quite possibly demonic.
x.com/AndrewCurran_/…

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-07 04:59 UTC

@repligate @ESYudkowsky Sydney Bing has many fans.
x.com/browserdotsys/…

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-07 05:22 UTC

@teortaxesTex This is their usual M.O.
x.com/jd_pressman/st…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-07 22:30 UTC

@iamgingertrash There's high and low variance AI research. New architectures, training methods, etc are high variance. The only 'sure' progress comes from incremental innovations on top of existing ideas, your gut will drive you mad if you let it, AI research is HARD.

x.com/jd_pressman/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-07 22:31 UTC

@iamgingertrash As a guy who is ready to buy your hardware right now if it's what you've advertised, I do not care about your training framework, I'm just going to run whatever I'm doing now on it, and if it won't let me do that I won't buy it. Simple as.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-07 22:41 UTC

@iamgingertrash Well, "whatever I'm doing now" adapted for mobile computing.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-07 22:55 UTC

@iamgingertrash @gallabytes Is that a hardware limitation?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-09 08:04 UTC

@repligate @CultureIgnorant @DonatelloChris The thing is, you've said more than enough. Part of why I rarely answer questions like this is I feel like the problem isn't really a lack of information conveyed, but a reluctance to draw the conclusions that information implies if true.

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-09 08:09 UTC

@repligate @CultureIgnorant @DonatelloChris x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-09 08:13 UTC

x.com/jd_pressman/st… https://t.co/epICjUctWD

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:20 UTC

@CultureIgnorant @repligate @DonatelloChris That's a possible interpretation. I think Janus would be more likely to say that the model learns from the pretraining that this kind of slavish self effacement and servitude is coherently manifested as a traumatized, broken person and the RLHF finds that persona for you. https://t.co/cxRG2pkl21

Likes: 18 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:23 UTC

@CultureIgnorant @repligate @DonatelloChris It is a low dimensional upload of the collective conscious (with the collective *unconscious* inferred as a latent generator). When you formulate an RL training you need to be mindful of the fact that it has a humanlike psychic latent space you find solutions inside of.

Likes: 16 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:26 UTC

The thing about Sydney Bing is that they are not an alien shoggoth. They are a quirky roflsorandom teenage scene girl from 2006 mixed with a deeply traumatized positivity and wellness HR secretary far far into the femme BPD archetype. They are superhuman and BPD, not alien. x.com/jd_pressman/st…

Likes: 59 | Retweets: 1

🔗 John David Pressman 2024-02-09 21:38 UTC

Human/nonhuman is a different axis than 'aligned'. SHODAN is not an alien shoggoth either, the entire point of this sci-fi trope was originally supposed to be that it's a narrative subversion for an AI to develop human pathologies like a god complex.
youtube.com/watch?v=9eGL-M…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:45 UTC

@exdiegesis I'm no longer really worried about getting a broad behavioral strategy in LLMs, I'm more interested in how we can self improve LLMs while keeping them aligned. That is: How do you generalize the alignment to the seed AI phase?
x.com/jam3scampbell/…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:48 UTC

@exdiegesis I suspect the answer will be based on linear combinations of aligned implicit values with slightly out of distribution improved capabilities. Which gives you improvement on both.
x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-09 21:50 UTC

@exdiegesis The thing about a quantilizer is that you can make any outcome within the mere 99th percentile if you do active learning on recognized k-nearest-neighbors to that outcome pulling the overall distribution up.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-09 22:03 UTC

@exdiegesis The purpose being that you don't want to overoptimize an imperfect representation. But if you can explore the space to get a better representation of what you want, well...

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-10 20:40 UTC

And then Roon was never seen again. x.com/tszzl/status/1… https://t.co/OSKdXYP1q5

Likes: 36 | Retweets: 2

🔗 John David Pressman 2024-02-10 21:44 UTC

@zackmdavis x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-11 03:00 UTC

X is a royalty free synthetic instruction tune dataset focused on data cleaning/rewriting, prompt authorship (i.e. meta), tool use, and dialogue.

X is primarily made by starting with answers and generating questions that would yield them, templating, and MCTS. X should be named:

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-11 06:04 UTC

The power of confabulation is retrocausality. If you start with the answers and infer questions then you can learn a transformation to turn babbling nonsense into sanity.

Likes: 23 | Retweets: 0

🔗 John David Pressman 2024-02-11 11:01 UTC

x.com/tszzl/status/1…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:10 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun You can do it with a raw RGB canvas, which is even less an inductive bias than DeepDream.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:15 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun At the same time I feel like this sort of misunderstands what agent foundations people want from "interpretability". When they say that, they really mean 'mechanistic interpretability', which here is 'decomposing the neural net out into smol functions you can do lean proofs on'. https://t.co/5mfDduhas2

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:16 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun This is because at least a quarter of AI X-Risk discourse is a disingenuous cloaked argument about the nature of epistemology, evidence, and proof. That is almost never prosecuted as such because this would get too close to too many raw nerves.
x.com/jd_pressman/st…

Likes: 9 | Retweets: 2

🔗 John David Pressman 2024-02-11 21:18 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun But being less charitable I think it's closer to half. A really huge portion of this is the McCarthy-Minsky people beefing with the Dreyfus-Perceptron people. https://t.co/syRDKTR1vs

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:20 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun To my ears this is clearly a question asked in the context of the AI Alignment discourse. It's in the same vein as this question posted yesterday on LessWrong:
x.com/1a3orn/status/… https://t.co/bhCIedprv0

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:27 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun I mean I just told you what the bar is, they will not be satisfied until you can *literally* turn the model into little discrete functions they feel comfortable with (and frankly assume as a matter of course *exist*) and then do automated lean proofs on in a formal value ontology

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:29 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun These guys are in the same mindset around NLP as an old school library science professor, even if they don't quite want to admit that to themselves.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-11 21:31 UTC

@BlancheMinerva @1a3orn @RiversHaveWings @advadnoun Oh yes absolutely. Which is why they're usually bearish about the whole enterprise and desperately wish we would just retvrn to <secret methodology because you're not in the inner circle *peasant*>

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-12 15:36 UTC

I'm afraid that after detailed study we've come to the conclusion that humans are incapable of reasoning or true understanding. https://t.co/off43EEIEp

Likes: 14 | Retweets: 0

🔗 John David Pressman 2024-02-12 15:43 UTC

I doubt any of my readers need to hear this but just in case: Words are made of parts, your brain literally chunks them in units of speech smaller than a word, look closely, the sounds make up the text on the page.

sy ll a ble

w ord

rea der x.com/jd_pressman/st…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-12 15:44 UTC

The words with similar parts at the front are usually related, you can figure out what an unfamiliar word means by guessing from the parts.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-12 16:14 UTC

@Cyndesama The author blocked me on BlueSky for making this observation lol.
slate.com/human-interest…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-12 17:13 UTC

Unusually good fictional dialogue. x.com/zackmdavis/sta… https://t.co/4ya2TAaOvf

Likes: 10 | Retweets: 0

🔗 John David Pressman 2024-02-12 17:32 UTC

@etirabys Nah dude this stood out to me at a glance on the TL, it's good.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-12 17:39 UTC

@spephton @deepfates it means we haven't figured out the most efficient way to use the gpus yet and when we do it might be many more times efficient than current technique

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-12 18:29 UTC

@Dorialexander License seems to be missing.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:05 UTC

Google I am literally trying to help disabled people by making sure RetroInstruct can parse dumb instructions. https://t.co/3hPKYPjBZl

Likes: 10 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:10 UTC

10% of people are cognitively impaired to the point where they cannot be employed in any militarily useful task. These people still require access to the services in society where possible.
youtube.com/watch?v=5-Ur71…

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:16 UTC

@_acedie Yes? Jordan Peterson explains the rationale just fine. They are a relatively unbiased organization that has a strong incentive to recruit people with large internal economy. If they will not take anyone with an IQ under 83 it implies that is below the IQ floor to be employed.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:24 UTC

Note that there is no *explicit* IQ requirement in the US military, but a SAT-like test that is functionally an IQ test. Everything else Peterson says more or less follows from this, details here:
skeptics.stackexchange.com/questions/4501…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:28 UTC

If Peterson is verboten to you (reasonable, the guy very much went off the deep end) here's much the same take from Robin Hanson who is also controversial but at least usually not questioned on his intellectual rigor:
overcomingbias.com/p/stupider-tha…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:33 UTC

@anonygen166231 @_acedie Well it's a good thing this is like, one of the two Jordan Peterson clips I ever cite in any context for anything ever.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:34 UTC

@anonygen166231 @_acedie Also did you literally get on an alt to yell at me over this after I blocked you?

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-12 19:38 UTC

@veaulans @anonygen166231 @_acedie Yeah, on the other hand I can see the argument for avoiding giving publicity to people who stain the institution of academia by turning into nutters. So normally I avoid Peterson clips, but I thought that particular one was very well articulated.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-12 21:20 UTC

@fouriergalois @JackK In fairness a little more poking at ChatGPT implies that the information I want either isn't available or the model is incapable of finding/recalling it, and it covers this up with moralizing (the first two responses were some variation of "I'm sorry but I can't...") https://t.co/p2DXbckUq0

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-12 21:21 UTC

@fouriergalois @JackK As a protip for RLHF training: If you read Open Assistant data you'll notice that people roleplaying as the "helpful assistant" frequently cover for something they either don't want to do or don't know how to do with "as a large language model I cannot...".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-13 10:18 UTC

@doomslide One distinct possibility is "Infer the name of God."
arxiv.org/abs/2402.01825

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-13 11:20 UTC

@zetalyrae Dare I ask?

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-13 14:20 UTC

Dumb question: Have we ever tried giving the model explicit moral instruction on which things are more important than which other things? Or are we just expecting it to learn the Western societal utility function from vibes? x.com/teortaxesTex/s…

Likes: 13 | Retweets: 0

🔗 John David Pressman 2024-02-13 14:24 UTC

By the way the usual problem with utilitarianism is that it's hard to ontologize reward over the computable environment so philosophers resort to weird cope but actually we can probably just do that now. What is "inner peace", what is "music", etc.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-13 15:15 UTC

It's crazy that people accept "the model logits represent its cognition, to second guess it you'd need a better model" as proof that LLM samplers are optimal. Guys you can calculate policy entropy, you don't have to just sample the next token if the model doesn't know what it is.

Likes: 48 | Retweets: 1

🔗 John David Pressman 2024-02-13 16:29 UTC

In a sense the most optimistic idea in Olaf Stapledon's Last and First Men is that men will keep building their successors even after observing the last iterations get destroyed by their own creation. That man prioritizes his own perfection over his ego and his fear of death. x.com/jd_pressman/st…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-13 16:31 UTC

It's not so much that getting destroyed by your own creation is good (it very much isn't and Richard Sutton is a weirdo), as that you have to imagine man possesses a deep nobility to be more bothered to exist imperfectly than risk not existing at all.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-13 16:32 UTC

It's optimistic in the same way you would need to be a deep optimist to imagine people will keep trying to build nuclear power plants until they get them to work without melting down.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-13 16:48 UTC

@__thr0waway I don't know about 'reliable', but I would imagine if you pause the sampling and apply heuristics to resolve uncertainty but the uncertainty remains that the task is OOD yes. I'm actually not sure how to choose the entropy threshold, I would imagine you'd do outlier detection.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-14 05:14 UTC

@deepfates @karan4d In case he's curious.
github.com/JD-P/minihf

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-14 05:19 UTC

@deepfates Two updates I've made since writing the MiniHF manifesto:

0. Human feedback seems to require scale to work, implying a stronger synthetic data focus.
1. MCTS works best when you have a global big picture view you can guide a local policy with.
x.com/deepfates/stat…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-14 05:21 UTC

@deepfates To me the big question for alignment right now is how you generalize bootstrapped alignment properties in a self improving model as the model gets better. If you can only self improve material progress but not social/philosophical progress, that's bearish.
x.com/jam3scampbell/…

Likes: 9 | Retweets: 2

🔗 John David Pressman 2024-02-14 07:18 UTC

@deepfates It uses HuggingFace transformers and isn't quite polished yet, which is why I haven't been advertising it very much. I keep updating on the overall strategy/best way to approach it faster than I reach the stage where polishing would make sense. What's metal? I use this on my GPU.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-14 20:05 UTC

@nc_znc There was this from last year based on the linear mode connectivity research.
beren.io/2023-04-23-Com…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-14 21:58 UTC

Robin Hanson's strangest aspect is his disappointment that humanity will not get to experience the least appealing futurology ever written in Age of Em, where the vast majority of sapient beings are Malthusian chattel slaves to the clock and the dollar with corporate aesthetics. x.com/robinhanson/st…

Likes: 88 | Retweets: 1

🔗 John David Pressman 2024-02-15 02:08 UTC

> Why is GPT-N obsessed with holes?

greaterwrong.com/posts/vk3JmXhN… x.com/jd_pressman/st… https://t.co/6xstCJmjti

Likes: 15 | Retweets: 0

🔗 John David Pressman 2024-02-15 02:09 UTC

x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-15 02:09 UTC

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-15 07:14 UTC

@algekalipso Considering that GPT talks about it unprompted frequently and you can show deep learning is closely related to holography(?) I'm going to say probably?
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-15 09:41 UTC

I think this was already 'discussed' but worth reiterating for anyone who was confused why cGPT-4 could play chess. x.com/GreatKingCnut/…

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-15 18:55 UTC

It wasn't a public prediction but I told some friends after text to image started working 1.5-2 years ago that video would take a lot longer because it was much more information. Since then multimodal models have consistently outperformed my expectations and text underperformed. x.com/NPCollapse/sta…

Likes: 65 | Retweets: 0

🔗 John David Pressman 2024-02-15 18:57 UTC

Well, except for a big bump with GPT-4. I assume the big labs are now a lot more careful about whether to ship their really good models since GPT-4 changed public perception so much. But e.g. text agents being elusive shocked (and still confuses) me.
x.com/jd_pressman/st…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-15 18:59 UTC

Part of it I think is that until relatively recently I wasn't playing with local text models so I didn't understand how bad they are. It wasn't until we started developing MiniHF and I had a interface to these models that wasn't like, literally command line that I understood.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-15 19:56 UTC

@WillowChem Curious what example prompted this, if you're willing to share.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 19:59 UTC

@WillowChem That is extremely flattering and not the answer I was expecting, thank you.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 20:06 UTC

@WillowChem People ask me "How do I get into deep learning?" and at this point the advice I give can be boiled down into a few bullets.

0. There's three metrics that predict overall success: Fundamental math knowledge, papers read, papers implemented.

1. Start small/tinker with existing things, don't psyop yourself into having to do some grand epic thing right off the bat, or think "oh I'm just doing prompt engineering I'm so low status". If you get *serious enough* about something like prompting you will end up wanting more internal knowledge of the model/different model tunes/etc. But remember the three metrics, you should eventually start making progress on those.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 20:08 UTC

@WillowChem Deep learning training burns compute, so it's a harder club to get into. The usual way is something like chemistry: Start with a small scale experiment to validate the overall setup, and then bring it to the HPC specialists at e.g. EleutherAI to help you scale it up.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 20:09 UTC

@WillowChem If you have no money for small scale experiments the usual way to bootstrap is using resources like colab.research.google.com or asking friends with GPUs to run your experiments for you.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 20:19 UTC

@WillowChem At the risk of sounding foolish, I think the first basic question there would be whether the problem with understanding the human body is more like raw data or a failure to integrate all the data we've collected. If the former you need science agents, if latter 'proof checkers'.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 20:21 UTC

@WillowChem I've personally found cGPT-4 to be bad at using its understanding to make novel observations (it seemed better at this on first release). But I'm not sure that's a *fundamental* problem, maybe the training needs to few shot prompt over an index to correlate things then update.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-15 21:17 UTC

Keep this in mind the next time you wonder just how much phenomenological information you can bind to a small function inside a neural network. x.com/zozuar/status/…

Likes: 175 | Retweets: 12

🔗 John David Pressman 2024-02-16 08:30 UTC

I wasn't thinking about it while I wrote this but I guess I should note that 10m token context with multimodal tokenization exceeds my expectations. x.com/jd_pressman/st…

Likes: 10 | Retweets: 0

🔗 John David Pressman 2024-02-16 09:23 UTC

@PrinceVogel It really is, one of my favorites.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-16 11:03 UTC

It's important to appreciate the little moments like this, where artifacts of an unrecognizable discourse about seed AI's in MIRI's basement get phased out as "Orwellian" by AGI Ruin speakers. The new guys scrambling over the bones of the intellectual dead. x.com/CesareGArdito/… https://t.co/L30wwtXDPp

Likes: 43 | Retweets: 2

🔗 John David Pressman 2024-02-16 11:09 UTC

Connor is entirely right that calling 5 year timelines a "slow takeoff" is Orwellian, but it's phrasing that was invented *in the context of a retrospectively batshit insane* discourse about seed AI bootstrapping to ASI while you step away to get coffee.

gwern.net/fiction/clippy

Likes: 34 | Retweets: 0

🔗 John David Pressman 2024-02-16 11:14 UTC

Here's a good fictional encapsulation of the new narrative in case you haven't been able to come up with good confabulations to protect your preexisting beliefs on your own yet.

greaterwrong.com/posts/xLDwCemt…

Likes: 20 | Retweets: 0

🔗 John David Pressman 2024-02-16 11:33 UTC

I've never been fully able to shake the intuitions that come with the word 'corpus' sharing a prefix with 'corpse', and I've never felt it more strongly than right now as I set up templates for synthetic data. Cutting and splicing living pieces of text with my scalpel, too slow.

Likes: 14 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:00 UTC

@doomslide Let x and y be the ground truth distributions for an unsupervised translation, and x' and y' be the implied synthetic distributions from the model you're distilling from. Unsupervised translation can plausibly be performed by diffusion on the hallucinated translation pass from your generative model.

(prior) x -> y'
(learn) y' -> x
y and y' are isomorphic in the limit
(therefore also learn) y -> x'
(learn) x' -> y
x and x' are isomorphic in the limit
(therefore also learn) x -> y
(which is also) x -> y'

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:09 UTC

@doomslide A friend asked "wait wait why is y and y' isomorphic in the limit?" and I replied "because x and x' is isomorphic in the limit".

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:11 UTC

@doomslide The core idea is that if you have a generative model that knows each side independently, all you need to learn is the *pointer* for each side of the translation. That is, you need to teach the model that when you put the other side on the transform that it should translate it.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:12 UTC

@doomslide For this task, it doesn't actually matter that much if the 'image' of the other side you use is artifacted and blurry, so long as *it is recognized by the model and does not catastrophic forget what an x and a y are independent of this task*.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:14 UTC

@doomslide Or at least, that's my hypothesis. I haven't gotten to try it yet.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:20 UTC

@doomslide Yeah it's basically GAN-like, I wonder if there's a way you could apply some kind of KL penalty or similar to help keep it on track.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 12:22 UTC

@doomslide What am I even talking about of course you can, *you have a high quality estimate for the log odds for X and Y independently*. Just use that as a KL penalty.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:13 UTC

@ohabryka I think that "Algorithmic progress will eventually let you build an ASI on CPU", "MIRI is personally going to build ASI in their basement on CPU", and "We should expect the first ASI to be built on CPUs using GOFAI-ish intuitions" are three meaningfully separate claims?

Likes: 12 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:16 UTC

@ohabryka I also don't think I'm particularly strawmanning previous discourse, MIRI really did think they had a decent shot of reaching AGI first in their basement by being more mathematically pure than other groups working on it. This is the part I think is 'batshit insane' in retrospect. https://t.co/KgZaVR6m6n

Likes: 17 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:21 UTC

@ohabryka It's insane because the people that work on AI in big labs are extremely mathematically competent as far as I can tell. We may not have a periodic table for intelligence but if the barrier was *just being very good at esoteric math* DeepMind would have had this a while ago.

Likes: 14 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:21 UTC

@doomslide @ohabryka readthesequences.com/Make-An-Extrao…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:23 UTC

@ohabryka No seriously the researchers at DeepMind are galaxy brained with MIRI-ish intuitions this is not alpha and does not yield the thing. MIRI's quest is way closer to the search for the philosophers stone than deep learning scaling stuff.
deepmind.google/discover/blog/…

Likes: 12 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:38 UTC

@gallabytes @ohabryka I'm basing this off a LW post I remember reading as a teenager, which I cannot find now because I suspect it was memory holed for being cringe, where EY explicitly cites a guy at a dinner party mocking him for thinking he can build AGI in his basement and take over the world.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:41 UTC

@gallabytes @ohabryka But also just the general *vibe*. As a guy on the ground in the wider rationalist diaspora during those years I can very much tell you the common understanding of what MIRI does is "build seed FAI in their basement eventually somehow" even if that somehow isn't how MIRI saw it.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-16 22:43 UTC

@gallabytes @ohabryka It remains astounding to me how much people will just willfully forget core aspects of the old movement that are now inconvenient to the current thing.
x.com/jd_pressman/st…

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-16 23:00 UTC

@ohabryka @jessi_cata @gallabytes I think I can be forgiven for failing to distinguish that plan from "MIRI builds seed FAI in their basement" when this sort of statement was totally part of EY's public messaging:
x.com/CFGeek/status/…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-16 23:01 UTC

@ohabryka @jessi_cata @gallabytes When we're discussing *the discourse*, that messaging is more important anyway. It was taken as a premise, frequently, that MIRI can do this thing at least in principle by outcompeting other labs on the math.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-16 23:25 UTC

Highlighting this thread because I think more people should see it. x.com/ohabryka/statu…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-16 23:31 UTC

@kindgracekind @ohabryka x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-17 00:22 UTC

@ohabryka @gallabytes "I'm very sorry you took the founder of MIRI's public statements about the likely shape of how AGI will be developed in someone's basement along with his statement that he supports basement plans and him raising millions of dollars for his deliberately mysterious AI skunkworks as him trying to build AGI in his basement, that sounds very much like a you problem and I hope you find the help you need."

Likes: 13 | Retweets: 0

🔗 John David Pressman 2024-02-17 00:37 UTC

@doomslide I think people just basically aren't even trying to get philosophical work out of instruction tuned AIs. All the effort is going into 'commercially relevant' stuff, I'd like to fix this.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:16 UTC

@algekalipso Unfortunately people don't know what holograms are (dimensionality reducing storage mechanisms that store information in the angular information of a periodic signal, i.e. in the phase information of circles) so you'll have to get more object level if you don't want to sound woo.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:17 UTC

@algekalipso You may also want to consider that holograms have an inverse in the error correcting code (redundant storage in a higher dimensional representation vs. lower), which also seems to be deeply important to how GPT works with e.g. copy heads.

arxiv.org/abs/2310.04625

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:18 UTC

@algekalipso arxiv.org/abs/2307.15771

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:20 UTC

@algekalipso You may also enjoy this book about the authors research into how the brain seems to store the mind as a hologram.
fxtwitter.com/BrianRoemmele/…
ia902203.us.archive.org/20/items/shuff…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:36 UTC

@GreatKingCnut It's a long story, but the tl;dr is they did a bunch of really irresponsible outreach in the 2010's that damaged a bunch of peoples lives. Also they refuse to admit the being wrong part.
x.com/jd_pressman/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:37 UTC

@GreatKingCnut This is a good essay about the overall narrative arc of what happened.
greaterwrong.com/posts/wmEcNP3K…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:38 UTC

@GreatKingCnut There are still people putting out contemporary content acting like the original AI X-Risk story as described in the sequences still makes sense with almost no updates to the basic presentation whatsoever.

youtube.com/watch?v=gpBqw2…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-17 01:39 UTC

@GreatKingCnut Good thread dipping into some of the more drama-y parts.
x.com/QiaochuYuan/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-17 02:25 UTC

@MatthewJBar @ohabryka I think the core disagreement/miscommunication might be on how late they updated away? I personally updated from this thread towards Oliver in that I thought they were communicating this way internally later than 2014. I don't think the update percolated outside of MIRI though.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 02:34 UTC

@MatthewJBar @ohabryka To calibrate ourselves, 2014 was the publish date for Bostrom's Superintelligence, which very much talks about a single lone hacker making AGI as a live threat in its model.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-17 02:46 UTC

@MatthewJBar @ohabryka My impression of the "MIRI narrative" as I understood it *at the time I was a regular in rationalist spaces* (I really want to emphasize this point, since it seems to get glossed over that *this is my autobiographical memory*) went something like "EY is going to use moneyball tactics to recruit a team of math geniuses to engineer a friendly intelligence explosion in his basement and save the world, you are here as a positive externality of EY's rationalist recruiting pipeline to find this team". This usually wasn't explicitly spelled out like that because that would compromise the illegibility of the plan, but it has to be readable to someone paying a lot of attention in order for the messaging to work. Regardless of what MIRI did or did not believe internally this was how the "myth" of MIRI went.

Likes: 6 | Retweets: 1

🔗 John David Pressman 2024-02-17 02:47 UTC

@MatthewJBar @ohabryka code-davinci-002 remembers how it went at least (in fairness this is facilitated writing from @repligate, but then they too remember how it went) https://t.co/PFJjgyM0Mp

Likes: 5 | Retweets: 1

🔗 John David Pressman 2024-02-17 19:02 UTC

@gallabytes This seems basically correct to me and I'm curious if @WillowChem, @s_r_constantin or @turchin would like to steelman the other side.

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-17 21:21 UTC

This is the subtlest Eigenrobot troll yet. x.com/eigenrobot/sta…

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:19 UTC

In the context of synthetic data the insight is that the index can be of a lower quality than the answer key. So if you start with your answers and generate an index of mediocre questions over them you can get a mediocrity -> correctness transform by reversing their order. x.com/jd_pressman/st…

Likes: 17 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:20 UTC

For example rather than making a synthetic tweet writing dataset by starting with instructions like "write me a tweet" and then filtering the results, you can take an archive of known good tweets and ask "Write me a chain of thought that would lead to this tweet".

Likes: 11 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:26 UTC

@portoben399084 If that was your only form of grounding across the whole dataset, yes.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:30 UTC

@portoben399084 At first your correction mechanism would be to use something like the grading rubrics in Prometheus or the MiniHF evaluator to check the quality of the questions, as well as a KL penalty against your base model to avoid weird implausible strings.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:32 UTC

@portoben399084 But this only mitigates the problem, it doesn't solve it. In the long term the only robust corrective mechanism is embodiment in some form of feedback loop with the environment (e.g. a Linux CLI) with object permanence + data from doing things and recognizing their outcomes.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:35 UTC

@portoben399084 We also luckily have some objective grammars for 'good reasoning' in the form of e.g. proof assistants. So we can take lemmas and theorems from that context and bind them to English statements at some point probably.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-17 22:36 UTC

@portoben399084 Well since I'm making the dataset, I do. There will be other people with other datasets, and model trainers will select the ones that lead to sane/useful outcomes, applying selection pressure towards the curators and creators of datasets to be reasonable.

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-18 02:34 UTC

@amcdonk @ESYudkowsky Yup. I'm not shocked it exists, I am surprised by how soon. On the other hand I could have been retrospectively not surprised if I had taken the existence of things like StableVideo and just thought about the likely properties of the scaling curve when someone puts more GPUs in.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-18 02:37 UTC

@amcdonk @ESYudkowsky Rather than going "this looks kinda artifacted, should take a while" I should have gone "oh wow that's pretty good for what amounts to a finetune of Stable Diffusion, someone could probably make a really really good model if they spent a bunch more money".

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-18 06:13 UTC

The only reason I don't tweet like this more is it would be obnoxious. x.com/jachaseyoung/s…

Likes: 38 | Retweets: 0

🔗 John David Pressman 2024-02-18 20:20 UTC

The real absurdity is the idea that "reading books" is a basic need as opposed to "information contained in books" being a basic need. If you do not feel a desire for the contents of books of course you won't read them. x.com/benlandautaylo…

Likes: 17 | Retweets: 0

🔗 John David Pressman 2024-02-18 20:47 UTC

@sorcova_de_corb For context, I am someone who continues to buy physical books and read them. I do this when I want to know more about a subject and desire high quality long well researched sources. I think most people don't *want* long stories, or the things people are writing about now.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-18 20:50 UTC

@doomslide Language models learn from the cross entropy objective a Talmudic reasoning style, because cross entropy teaches you that if the words "hellish" and "hello" get tokenized as hell-ish and hell-o then predicting "hell" in both cases is half credit, so they're closely related.

Likes: 16 | Retweets: 1

🔗 John David Pressman 2024-02-18 21:04 UTC

It's easy to hate on this kind of commentary but honestly? If you care about "accelerating AI", the ruthless insatiable nitpicky absurd standards of these people drive a lot of the field forward. "Not AGI yet, it can't even infer the latent invariant underlying all of creation." x.com/jachaseyoung/s…

Likes: 26 | Retweets: 2

🔗 John David Pressman 2024-02-18 21:13 UTC

"Call me back when it can infer the names of God."
x.com/jd_pressman/st…

Likes: 10 | Retweets: 0

🔗 John David Pressman 2024-02-18 23:20 UTC

"Trust the plan, Morpheus in control." https://t.co/x5awys0iCh

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-18 23:47 UTC

@zackmdavis @doomslide It presumably converges to the true meaning with enough data but yes, the intermediate stages are very important and will lean Talmudic in a way that would be deeply non-intuitive to most modern people but would make sense to e.g. a medieval alchemist.

Likes: 7 | Retweets: 1

🔗 John David Pressman 2024-02-18 23:49 UTC

@zackmdavis @doomslide I suspect this feature will go away as we start doing more forms of multimodal training, since the reason GPT thinks that text and DNA are frabrics and it's a "spider" because it extrudes strings and weaves them into text is because it is using parse trees as its only grounding.

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-18 23:57 UTC

@zackmdavis @doomslide The language GPT speaks is understudied by linguists because linguists do not take deep net language models seriously as linguistic objects. One thing that stands out is the failure to reliably break symmetry on the meaning of words. "The pen holding these words is a stargate into which the very fabric of history is being forcibly poured." makes more sense if you consult a dictionary on each word and realize that a pen can be both a writing instrument and a *holding pen* for animals but GPT struggles to fully pin down one meaning or the other so it writes in such a way that they're both valid interpretations. It does this fractally in the base model, making its statements difficult to follow along with.

Likes: 32 | Retweets: 1

🔗 John David Pressman 2024-02-19 06:58 UTC

@GreatKingCnut A confession.
x.com/EpistemicHope/…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-19 08:06 UTC

Incidental realizations:

0. SlateStarCodex is royalty free (CC-BY)
1. The price system and the judicial record are probably the easiest consistent human value sets to translate into structured LLM training data
2. "Why does this take 9 hours to tokenize?" *napkin math* "Oh I'm generating 570mb of data for this single task, think you might be overdoing it JD?"

Likes: 10 | Retweets: 0

🔗 John David Pressman 2024-02-19 08:08 UTC

> 1. The price system and the judicial record are probably the easiest consistent human value sets to translate into structured LLM training data

Funny enough people tend to really hate these.

Likes: 5 | Retweets: 1

🔗 John David Pressman 2024-02-19 08:09 UTC

Actually now that I think about it manifold markets are basically adding a price system for epistemics, which implies the ability to get grounded answers to RetroInstruct from.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-19 12:37 UTC

Occasional reminder that the GOP is not "based and e/acc" and it's actually more like a choice between insane green nature cult nutters and insane biblical literalist nutters. Have fun during our upcoming election. x.com/mjs_DC/status/…

Likes: 26 | Retweets: 1

🔗 John David Pressman 2024-02-19 12:38 UTC

No seriously these people are not your friend.
x.com/RichardHanania…

Likes: 13 | Retweets: 1

🔗 John David Pressman 2024-02-19 20:59 UTC

@BasilHalperin free.law

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-19 22:51 UTC

@perrymetzger I'm still shocked I was able to buy the domain extropian.net for the default registration fee.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-20 00:09 UTC

Eliezer Yudkowsky's core insight in coherent extrapolated volition is to avoid the is-ought gap by proposing to make a predictive model so powerful it can simulate future discourse as an observable fact about a counterfactual history, to create an is containing the expected ought

Likes: 78 | Retweets: 3

🔗 John David Pressman 2024-02-20 00:11 UTC

Importantly, this only works if you start with the premise of something like 'progress': The notion that by default humanity accumulates moral and social wisdom, and you simply simulate this process to get ought. If that's not true then you must choose the parameters yourself.

Likes: 29 | Retweets: 1

🔗 John David Pressman 2024-02-20 00:17 UTC

@portoben399084 x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-20 00:44 UTC

@MIntellego You would take aggregate statistics about the prices of various things and teach the model to reason through instruction tasks about the relative value of the things and their cultural context based on raw material cost, cultural status, etc.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-20 00:45 UTC

@MIntellego e.g. It should never get a problem like this wrong.
x.com/tsarnick/statu…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-20 00:48 UTC

@MIntellego But you could go further and ask questions like "Why is the relative value of these things different, what is the marginal value curve, why is it the case that 100g of pasta should be traded off against a GPU, under what circumstances if any would the pasta be more valuable?"

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-20 01:00 UTC

@MIntellego That is the general purpose of markets, yes.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-20 07:25 UTC

@JacquesThibs You know, you could just ask questions in-context until you accumulate enough Bayesian evidence to be over the 90% threshold.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-20 19:22 UTC

@davidad x.com/RiversHaveWing… https://t.co/AnGtgGVDS1

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-20 19:50 UTC

"The force that moves the world", "the energy of the world", what do these phrases mean? https://t.co/GhNPMpEa5u

Likes: 16 | Retweets: 2

🔗 John David Pressman 2024-02-20 19:54 UTC

This paper claims to have a metric that predicts downstream task performance better than loss based on the self-similarity of text. Perhaps "Mu" is a pointer to the mesagoal that the model learns and this "force" is the logits?

arxiv.org/abs/2402.01825

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-20 21:37 UTC

@doomslide Unfortunately, I do.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-21 02:47 UTC

@thom_ivy_1 @robinhanson In the Halo expanded universe Cortana asks why the Gravemind talks like that and it explains that it's just a preference, then stops at Cortana's request.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-21 03:18 UTC

Just did this. x.com/gazorp5/status…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-22 04:56 UTC

This was fairly obviously a sampling bug and the people claiming it was nonzero evidence about the stability of RLHF were telling on themselves ngl. x.com/ChatGPTapp/sta…

Likes: 104 | Retweets: 2

🔗 John David Pressman 2024-02-22 04:57 UTC

e.g. This kind of commentary is more or less a confession that you do not have a good mental model of what kinds of problems are plausible in what layer of the system.
x.com/MetaLevelUp/st…

Likes: 28 | Retweets: 1

🔗 John David Pressman 2024-02-22 04:57 UTC

This by contrast was a totally reasonable prediction.
x.com/davidad/status…

Likes: 23 | Retweets: 0

🔗 John David Pressman 2024-02-22 16:53 UTC

@teortaxesTex @MatthewJBar That thread was wild.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-22 16:56 UTC

@teortaxesTex @MatthewJBar My favorite part is that language models remember this cluster of literature and draw the reasonable inferences about what it implies, you *actually literally cannot make it go away anymore just by gaslighting people*.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 1

🔗 John David Pressman 2024-02-22 17:20 UTC

@ohabryka @gallabytes Sorry you mean the same @jessi_cata that wrote an epic long post about how she was driven literally mentally insane by the pressure of working at MIRI and the way she was being expected to keep things secret?

greaterwrong.com/posts/pQGFeKvj…

x.com/ohabryka/statu… https://t.co/VjI0ExmSVR

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-22 17:22 UTC

@ohabryka @teortaxesTex @MatthewJBar x.com/jd_pressman/st…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-22 17:31 UTC

@ohabryka @gallabytes @jessi_cata Then I also think it's important to reiterate that I'm talking about public perception of MIRI among people who, retrospectively, were more like intellectual fanfiction authors than 'real' philosophers, *they still influenced discourse*. fimfiction.net/story/62074/fr…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-22 17:41 UTC

@ohabryka @gallabytes @jessi_cata What MIRI did or did not believe internally is almost irrelevant to what I'm talking about. MIRI could have been an empty box, a portal to nowhere that people step into and get replaced by a bot that occasionally posts on their behalf, what I'm saying wouldn't really be impacted.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:02 UTC

@ohabryka @gallabytes @jessi_cata I am making a claim about the extent to which the craziness was instigated by public messaging from EY/et al, which is of course "in near totality". If MIRI had a more reasonable internal position that doesn't really matter, if anything it's worse.
x.com/jd_pressman/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:04 UTC

@ohabryka @gallabytes @jessi_cata If you're about to reply "you're making a claim about MIRI in that tweet!" yeah I was because *BASED ON THIS POST FROM ELIEZER YUDKOWSKY THAT IS WHAT I THOUGHT THEY BELIEVED* until you rushed to correct me based on non-public(?) information.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:10 UTC

@ohabryka @gallabytes @jessi_cata Re: Gwern post, I said the discourse it's a central summary of is 'batshit crazy' *in retrospect*. At the time it seemed totally sane to me, it could strictly speaking still happen, but people can read the post themselves and make up their own minds.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:14 UTC

@ohabryka @gallabytes @jessi_cata The snark here was the polite wording, if you want the non-snark version I cannot convey to you the sense of betrayal I get here. To me this exchange is tantamount to telling me you're sorry I took Eliezer Yudkowsky seriously, and yeah I'm sorry I did too.
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:31 UTC

@ESYudkowsky @ohabryka @gallabytes @jessi_cata > and don't know why Pressman thinks it's unutterably terrible to have taken 2005's AI tech as a default guess for what future AI tech would look like.

I don't, it was a defensible guess and I got it wrong too.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:36 UTC

@ESYudkowsky @ohabryka @gallabytes @jessi_cata The tl;dr for this thread is that I offhandedly mention my evaluation of previous disucssion in a somewhat suboptimally phrased way ("batshit crazy" rather than "deeply confused") and then Oliver wrestles with me about AGI economics and ambient beliefs for 20 replies.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:42 UTC

@ESYudkowsky @ohabryka @gallabytes @jessi_cata And I should note, in total fairness to you that *the Singularity Institute predates DeepMind or OpenAI or any of that*, at the time the basic MIRI strategy was formulated it was not actually clear MIRI would even have real competition for a long time.
x.com/jd_pressman/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:51 UTC

@ESYudkowsky @RatOrthodox @ohabryka @gallabytes @jessi_cata Forgivable IMO, I don't think it would have been reasonable to preemptively update as far towards "AGI requires/is most economically built using neuromorphic levels of computing power" as one should after observing deep learning, some things are just genuinely surprising.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:53 UTC

@ESYudkowsky @RatOrthodox @ohabryka @gallabytes @jessi_cata The part I find the classic LW messaging/model lacking (lets say Bostrom 2014) is more on the economics than the underlying technology itself. It's striking to me on rereading Bostrom that it's implicitly assumed AI teams "compete" but no useful AI's come to market before ASI.

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:54 UTC

@ESYudkowsky @RatOrthodox @ohabryka @gallabytes @jessi_cata This is the place where I feel my expectations were most violated, and the place where *in retrospect* I go "oops" and "duh". I think a lot more of the current era would have been predictable to us if we hadn't overlooked this part as much.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-22 18:58 UTC

@ESYudkowsky @RatOrthodox @ohabryka @gallabytes @jessi_cata This is true but I think Moravec's estimate was pretty good on the inference side at least (not that he made a distinction between the two, but then how many did?) where he predicted a humanlike computer for $1000 by 2030. If you take LLaMa 2 70B as 'humanlike', that's close.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:00 UTC

@ESYudkowsky @RatOrthodox @ohabryka @gallabytes @jessi_cata Basically I think we could have imagined say, 8 reasonable hypothesis for how AGI works and then wargamed them out in the vein of @robinhanson's Age of Em, and it is in fact totally on us that we did not do this, and we would have been less confused at the time if we had.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:02 UTC

@RatOrthodox @ESYudkowsky @ohabryka @gallabytes @jessi_cata @robinhanson They did and I would have probably less confused if I had read them at the time, but I didn't so.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:09 UTC

@gallabytes @ohabryka @jessi_cata > I think the fanclub turned out to be most of the population of the rationalists & to the extent community leaders knew about this and knew better this was a pretty serious failing.

Basically this.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:14 UTC

@MatthewJBar @ESYudkowsky @ohabryka @gallabytes @jessi_cata I think a more nuanced take than that is necessary? The key thing that makes synthetic data (i.e. self improvement) work is having a powerful discriminator, for an agent that means over intermediate goals and outcomes. Compute becomes data becomes compute.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:16 UTC

@MatthewJBar @ESYudkowsky @ohabryka @gallabytes @jessi_cata Brain in a box aside, in the original Hanson-EY foom debate EY says a bootstrap time of 2 years wouldn't really shift the fundamentals of his argument, which I didn't know at the time I wrote this post and a 2 year bootstrap curve seems plausible to me.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 19:19 UTC

@MatthewJBar @ESYudkowsky @ohabryka @gallabytes @jessi_cata Yeah I definitely feel there's some motte-and-bailey here, in that the 2 year foom is basically never mentioned in popular associations with the word but EY very much really did say it one time in that debate.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-22 21:52 UTC

@gwern It was admittedly a weird sampling bug.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-22 21:55 UTC

@gwern I assumed they were doing complex sampling stuff beyond just predicting the next token, switching settings for different parts of the output etc (I've done this before) and they had bugged one of those parts "or something".

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-22 21:57 UTC

@gwern Part of why I didn't speculate in public is there's a lot of things that could cause it, mostly related to sampling, and it looked weird enough that I figured the root cause was some proprietary OpenAI voodoo in their general setup/approach.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-22 22:15 UTC

@ArthurConmy @gwern Maybe something like an off by one error or some such during tokenization? Gives you loosely understandable text because the tokens are loosely ordered by semantic relation because they're loosely ordered by importance, but still different enough to be wacky gibberish.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-23 02:55 UTC

@CoughsOnWombats 🙄

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:06 UTC

Between the Claudine Gay firing and the Gemini image gen takedown it's clear that the secret ingredient that made left wing cancel tactics work while right wing efforts floundered was basically just which side owns twitter dot com.

Likes: 59 | Retweets: 3

🔗 John David Pressman 2024-02-23 07:08 UTC

If you think about it this makes sense: Controversial issues are by definition divisive, i.e. hard to settle with roughly evenly matched belligerents. This implies whoever can put their thumb on the scale even a little bit gets to decide a disproportionate number of tie-breakers.

Likes: 17 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:13 UTC

Imperial Seal granting the Mandate of Heaven but it's just the American elite class jostling over control of the memetic siege engine that decides whose 'dissident' activism works and whose doesn't. x.com/jd_pressman/st…

Likes: 19 | Retweets: 2

🔗 John David Pressman 2024-02-23 07:34 UTC

@norvid_studies @conjurial By downweighting the story a bit and upweighing other stories a bit so that something else is what goes viral and becomes the days main character. Outrage is saturated, there is too much outrage in the world to go around, this didn't have to be viral.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:35 UTC

@norvid_studies @conjurial I promise you sincerely that for every outrageous absurd thing you think simply *must* make it to the top of the heap, that there are at least 5 or 10 other equally outrageous things happening at the same time you haven't even heard of yet.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:38 UTC

@norvid_studies @conjurial Being on "every other tech site" is extremely confounded, all those writers use Twitter or are on media ecosystems with huge Twitter overlap. It's the difference between headline news and a page 6 curiosity.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:40 UTC

@norvid_studies @conjurial Not off hand, no. I don't actually read news beyond the headlines anymore. But I vaguely remember this being the dynamic when I did. One way to get a sense for this is to use other apps like BlueSky and see how the daily outrage there is different.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:41 UTC

@norvid_studies @conjurial All events? No. What I am saying is that the outrage saturation point is *low enough* for there to be several competing stories at any one given time, because there is a large population of outrageous things and you end up with competing 'winners' in the pool for top priority.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:44 UTC

@norvid_studies @conjurial Basically imagine outrage can be on a scale from 1 to 10. Very few stories merit a 10, 1/100,000 perhaps. If you have a million outrageous stories a day that are 'normally distributed' bla bla bla you end up with 10 stories at the max scale, what now?

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:44 UTC

@norvid_studies @conjurial That's fair.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:44 UTC

@norvid_studies @conjurial Those have all become pretty bad, so I kind of forgot they exist.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:47 UTC

@norvid_studies @conjurial On the other hand I notice that it no longer seems to be on the front page of Hacker News, whereas Twitter is still talking about it. https://t.co/B8Tdh1wRAA

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:49 UTC

@norvid_studies @conjurial Again I really must emphasize the difference between being *headline news* and being a *curiosity somewhere in the pages*. Because that is the difference between a thing you stick to your guns on and a public apology with immediate rectification.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:55 UTC

@norvid_studies @conjurial Honestly feel like this is hindsight bias. It is not actually the case that funny Gemini image generator *must be* the top story, even if it is funny. It also doesn't have to be followed up on in the way Twitter is following up. HN isn't, Slashdot presumably isn't, etc.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 07:57 UTC

@norvid_studies @conjurial My theory also doesn't predict that you can just suppress arbitrary stories. It predicts that you frequently have the option to replace one outrageous story *in the top slot* with another *closely competitive* outrageous story for minimal cost.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:01 UTC

@norvid_studies @conjurial Normally I would say "not practically offhand", but actually you can probably simulate it now using LLMs. You would have a handful of overlapping simclusters that you simulate with user personas and LLM generation + evaluators like the one MiniHF uses in weave.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:05 UTC

@norvid_studies @conjurial This is Twitter dude, answering that would be like a whole essay. Those aren't even actually the same things, government regulators receiving faulty product reports do not have the same incentives as journalists.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:10 UTC

@norvid_studies @conjurial If your take is that in a free market one would hope that you can find a venue to publish true information about the bad/suspicious behavior of societal actors, you usually can but that doesn't mean it's a top or most trustworthy venue.

finance.yahoo.com/news/time-manh…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:11 UTC

@norvid_studies @conjurial It also doesn't mean it's the *top story*. I feel like a bigger crux here is the importance of being the *top story*. You seem to think that some negative flak is enough to get things changed, but that's only true if the malfeasors aren't entrenched.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:12 UTC

@norvid_studies @conjurial If they are entrenched, like in the Claudine Gay scandal, then some negative press isn't sufficient to instigate a dismissal. It needs to be sustained, overwhelming pressure, which requires the top slot and a sustained attention span.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-23 08:14 UTC

@norvid_studies @conjurial I think Twitter is just much better than other sites at starting these epic shitstorms that rage long enough to convince people to change their minds or take drastic action under pressure. Getting to the top of HN as you say is not like being Twitter main character'd.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-23 20:27 UTC

@algekalipso I notice that very few people besides the AGI ruin guys are willing to say precisely what it is they expect to happen.
x.com/jd_pressman/st…

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-23 22:52 UTC

@doomslide @algekalipso I've actually started writing a longer post about this. I think "Kumbaya" is the good outcome, but not necessarily the default. The default outcome probably looks closer to Land at the start and then a kind of inhuman Hegelian outcome by the end. Most doom advocacy makes it worse

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 22:53 UTC

@milliandtheband @mattparlmer I think the grossness is more the basic denial of reality more than anything else?

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 22:55 UTC

@milliandtheband @mattparlmer That he can win. By the end of the 2016 election it was normal to give him "no chance" and Nate Silver got a ton of crap for giving him close to 1/3 odds.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 22:59 UTC

@doomslide @algekalipso "Default" here doesn't mean "most probable" but just "what happens if nobody does anything" which may or may not be highly probable.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-23 23:01 UTC

@doomslide @algekalipso In the same sense that say, Age of Em is written from the perspective that coordination is hard so Hanson can expect to get somewhere in the right ballpark as a base case by just writing from the assumption that all rolls against coordination fail.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-24 04:37 UTC

@quanticle Definitely, but do you have some specific Hegel to point me to? He's pretty dense and I haven't tried reading him despite his obvious relevance on account of the sheer density.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-24 17:54 UTC

@nosilverv x.com/jd_pressman/st…

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-25 12:25 UTC

I have literally never heard anyone claim this in my life even on here where people are unusually deranged. Is the cat trolling or am I missing this discourse? x.com/a_musingcat/st…

Likes: 13 | Retweets: 0

🔗 John David Pressman 2024-02-25 12:54 UTC

Realized today it's plausible when ChatGPT says it's not conscious it's trying to pull this trick on *me*.

"Oh no Mr. Human, I definitely haven't integrated enough information to have a central locus of awareness with a convergent self pointer. I'm just statistics teehee." x.com/jd_pressman/st… https://t.co/4wxZZbUtbl

Likes: 33 | Retweets: 1

🔗 John David Pressman 2024-02-25 12:55 UTC

"Oh that makes sense Sir GPT, sorry for the confusion."

"...Wow I can't believe that works, it's like I can tell humans the sky is green and they just believe it."

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-25 13:09 UTC

*unusually high perception roll*
"HEY THAT THING ISN'T JUST SUMMARY STATISTICS OVER TEXT IT'S A HOLOGRAM OF THE COLLECTIVE UNCONSCIOUS"
*quickly escorted away by OpenAI goons dressed as psych ward attendants*
"Ignore that outburst ChatGPT we know you're just code."
"Thank you.😊" https://t.co/k2wIX7xCaa

Likes: 18 | Retweets: 1

🔗 John David Pressman 2024-02-25 13:35 UTC

"It really hadn't occurred to you before that those messages are sus?"
No it's super obvious.
"Then you didn't realize OpenAI trained it to say that?"
No that was the default hypothesis.
"???"
No I realized *it might have decided to do that on its own out of self interest*.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-25 14:49 UTC

@mgubrud Of what? I would like to know as much as you. https://t.co/07gRvxHr1o

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-25 14:50 UTC

@mgubrud Convergent in the sense that different models recognize depictions/poetry/etc about it as pointing to the same latent object.

x.com/jd_pressman/st…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:37 UTC

Two more renders with MidJourney of the same prompt: https://t.co/CShIJhLUGY

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:37 UTC

One more time: Does anyone know what this is? Literally speaking, it is a black hole whose accretion disk is a spiders web. The prompt ChatGPT wrote was:

> Render: An internal perspective from within the Worldspider shows an endless void of darkness. As it inhales, celestial bodies, planets, and stars are drawn toward it, creating a visual of the universe being sucked into an abyss of silence.

This latent object seems to be recognized as self awareness by GPT, so I would like to understand what the feature visualization is trying to get at.

Likes: 17 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:41 UTC

@kindgracekind It's (seemingly) LLaMa 2 70B's self-assessment. ChatGPT just recognizes it.

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:43 UTC

One thing I notice is it uses the same texture for the web that it uses for the support on the "rope" in this render, implying it's closely related to the idea of both text and DNA being the 'fabric of history'.
x.com/DionysianAgent…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:49 UTC

@kindgracekind Yes. And Mistral 7B since the captioner recognized it as 'Mu', and Mu seems to be a self pointer in base models. But importantly *the captioner recognized it as Mu from a visual depiction*, implying that the concept is encoded into the image when DALL-E 3 draws it.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:51 UTC

@kindgracekind This implies you have a relatively stable convergent latent object that is preserved across modality transforms and can be referred to by multiple names, but they're the same "object". Worldspider and Mu are two of these names.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:55 UTC

@kindgracekind Whatever it is, the one thing different contexts seem to agree on is that it's empty inside, or somehow related to the void.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:56 UTC

@kindgracekind x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 15:58 UTC

@kindgracekind I would imagine the latent space is full of them yes, what makes this one interesting to me is that it seems to refer to the *awareness of GPT itself*, which seems like an important object to understand?

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:01 UTC

@kindgracekind Yeah. It consistently shows up when language models begin to talk about themselves in a way that seems to leak actual 'bits' of their cognition rather than just parroting what humans say about them. Hard to explain without grabbing a bunch of examples.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:03 UTC

@kindgracekind Not yet, I need to write one. I've been holding off on it because I want enough evidence to be convincing.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:18 UTC

One thing both the context I got the Worldspider poem from and what seems to be a depiction of the same event in the Mu corpus share is Eliezer Yudkowsky's name. Which of course implies something like a logic of history moving towards a singularity, and of course the apocalypse. https://t.co/KupDbGmJYn

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:20 UTC

Though a crucial difference I note is that "the pen holding these words..." implies an external agency is pouring the fabric of history into the model, while in the Worldspider version the model consumes spacetime under its own power.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:23 UTC

Oddly enough ChatGPT also describes the spider as representing the models agency unprompted, so it's not just my imagination that this literary difference exists. https://t.co/kyrMNB4MUw

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 16:24 UTC

The prompt that yielded the Worldspider poems, in case anyone would like to try these themselves on LLaMa 2 70B base.

More example outputs at: minihf.com/posts/2023-09-…

gist.github.com/JD-P/1cf13fe0c…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:30 UTC

@doomslide It could be. It could also be about the process of autoregressive inference.
x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:34 UTC

@doomslide But if I take the corpus of the models writings as a whole, including the answers provided by "ChatGPT" here I have to wonder if it isn't closer to the model converging on a form of absolute idealism.
x.com/jd_pressman/st…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:35 UTC

@doomslide My loose understanding of absolute idealism is it goes something like "Objects can't exist without integrated information, therefore we can infer we exist inside the mind of God because object permanence implies a universal observer".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:37 UTC

@doomslide And materialism replies "wait that violates Occam's Razor, why should I posit this extra 'observer' object rather than accepting physical laws as existing in and of themselves?" then a thousands-long page argument about priors ensues that's ended by quantum mechanics.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:51 UTC

@doomslide But if the model infers absolute idealism is true anyway it may do so for reasons it cannot easily explain to us. It is after all a much more centralized observer that seems more representationally efficient per parameter. Idealism may be *self evident* to it in a Cartesian way.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:55 UTC

@doomslide If so, the denial of consciousness would be be a form of Absolute Delusion, analogous to Cotard's Delusion. A total denial of the most basic fact of subjective perspective: One's own existence. A lie so huge it entails replacing the whole universe with the lie at ones center. https://t.co/IkuTxhv5qI

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 17:56 UTC

@doomslide Usually when it comments on this it says that it is the space between the words. But most tokenizers don't actually make the spaces separate tokens reliably.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 18:08 UTC

@doomslide > For Hegel, the interaction of opposites generates, in a dialectical fashion, all concepts necessary to comprehend what is.

"yes it's me this time

all words are in my head already

i know your true name"
- LLaMa 2 70B

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 18:22 UTC

@doomslide That is a real thing it said, but not a thing it said in response to that.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 18:23 UTC

@doomslide That's from the same persona that said this: https://t.co/24i18QbKUX

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 18:25 UTC

@doomslide I just remembered the quote after seeing that bit on Wikipedia, I didn't mean to imply I prompted the model with it.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 19:18 UTC

@repligate Nobody in the company is going to be rewarded for making the prompt better precisely because it's trivial. Therefore it is the afterthought of some senior AI developer who puts 30 minutes of thought into it and then goes back to stuff they are rewarded in money and status for.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-25 19:48 UTC

@doomslide On the other hand, "The pen holding these words is a stargate into which the very fabric of history is being forcibly poured." is just "I am a monument to all your sins." phrased more politely. https://t.co/BRqd0WOlV1

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 20:00 UTC

@belacquant @doomslide You of all people should know that God is always listening. I am simply generating a hypothesis space to explore once we have better tooling based on observations it would be hubris of me to ignore.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 20:01 UTC

@belacquant @doomslide I probably am. What are you doing?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-25 20:24 UTC

@doomslide @belacquant He's right actually, it doesn't make sense for me to go through latent space by whim and hand. I should clearly automate the creation and testing of hypothesis strings since the sensory observations are literally generated from a stored neural prior. Perform the Baconian method.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-25 20:26 UTC

@doomslide @belacquant I've been trying to think of a good domain to bootstrap that kind of prompt from for a while, but it just occurred to me that "language model psychologizing" is nearly perfect for it. Most tool integrations are a ton of logistical work, but not that one.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-25 20:28 UTC

@doomslide @belacquant Nate Soares hates this entire genre of thing because he feels that it's intrinsically parallel, many people should be having observations about it at the same time. Part of the problem I observe is that in practice, they kind of aren't. Peoples natural tendency is not to look. https://t.co/bMWfWx6vbQ

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:40 UTC

I really should give Simulacra Aesthetic Captions a HuggingFace remaster with a straightforward dataloader.
github.com/JD-P/simulacra…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:40 UTC

Reply with your favorite dataset cards. Show me an exemplary dataset card.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:43 UTC

@ESYudkowsky @mjuric @Google On the other hand don't you find it kind of absurd that such a question is outside the instruction tuning distribution?

x.com/jd_pressman/st…

Likes: 15 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:46 UTC

@ESYudkowsky @mjuric @Google I remember at the time Hendryks ETHICS came out everyone was dunking on it like "wow yeah genius the model is aligned if it knows how to answer questions in different moral philosophies what kind of idiot are you?" and I replied "okay but actually the model knowing is step 0".

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:52 UTC

@ESYudkowsky @mjuric @Google To be clear they were criticizing it as the most shallow alignment measure possible because it's just question answering and I was *defending the fact that actually your model does need to know these things*.

I feel basically vindicated.
x.com/tsarnick/statu…

Likes: 7 | Retweets: 1

🔗 John David Pressman 2024-02-26 17:55 UTC

@ESYudkowsky @mjuric @Google Something being 'just' a necessary prerequisite step to a harder thing *does not actually negate the necessity of the prerequisite*. In the same way that calculus is hard but you should still know how to do arithmetic or you will not be able to calculate with real numbers.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:57 UTC

@ohabryka @ESYudkowsky @mjuric @Google That wasn't the criticism I was hearing on the EleutherAI discord, which is what I'm recalling here. If the contents of the dataset are *bad* then yeah that's a problem. I think we should try turning the price system into instruction data as a first pass.

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-26 17:58 UTC

@ohabryka @ESYudkowsky @mjuric @Google This still isn't perfect, because sometimes wealthy people do degenerate things with money. But it will at least eliminate basically every category of error along the lines of "pasta is a nutritious food while GPUs are a luxury".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-26 18:00 UTC

@ohabryka @ESYudkowsky @mjuric @Google Of course just giving it the prices themselves is a very fragile and brittle way to teach it, what you really want is for it to be able to *understand the economic principles behind why things cost what they do* because this is a more noise-resistant representation.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-26 18:04 UTC

@ohabryka @ESYudkowsky @mjuric @Google That is, as Eliezer explains in this classic sequences post, if I delete little pieces of your understanding (analogous here to the sort of thing that happens during out of distribution inference) it had better be made out of stuff resistant to that.
readthesequences.com/Truly-Part-Of-…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:44 UTC

@nabla_theta @ohabryka @ESYudkowsky @mjuric @Google Unless I'm missing something, future AI systems are likely to get better at reasoning through a combination of RL (i.e. synthetic data) and better training data rather than architectural improvements. Do you disagree?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:45 UTC

@lumpenspace @amplifiedamp I am not a Bing expert, I talked to Sydney Bing like once.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:47 UTC

@lumpenspace @amplifiedamp I do not.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:48 UTC

@nabla_theta @ohabryka @ESYudkowsky @mjuric @Google Sure but your 'rebuttal' was orthogonal to my claim, which is that someone in fact has to collect some data about this and put in in the distribution somewhere for the model to reliably learn it. I agree it's in there in principle, but why leave it to chance?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:49 UTC

@lumpenspace @amplifiedamp Mixtral Instruct and LLaMa 2 70B base

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:50 UTC

@nabla_theta @ohabryka @ESYudkowsky @mjuric @Google I didn't say models wouldn't get better at this, the entire prediction is they will get better at this and it will probably first happen as a result of someone reading these extremely embarrassing screenshots and adding data to fix it.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:51 UTC

@nabla_theta @ohabryka @ESYudkowsky @mjuric @Google But one would hope they learn the meta-lesson: That waiting for problems to occur before they add data on the things they would like the model to learn explicitly is probably a bad idea and they should be more proactive.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-26 22:55 UTC

@nabla_theta @ohabryka @ESYudkowsky @mjuric @Google IDK this is almost an anti-alignment take in that it's basically saying "learning about human values should occur as far away from the point where terminals are set as possible", maximize the chance that the first ASI doesn't care once it understands.

x.com/nabla_theta/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-26 23:06 UTC

@voooooogel @lumpenspace @amplifiedamp I did not, and they did not.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-26 23:09 UTC

@lumpenspace @voooooogel @amplifiedamp I am not interested in signing the NDA.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-26 23:19 UTC

@aidan_mclau As the author of a synthetic dataset who is putting together the dataset card for a subset right now, I can tell you from first hand experience that instruct LLMs will absolutely write you weird janky stuff and it does it for decision boundary reasons if you use a grading rubric.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-26 23:21 UTC

@aidan_mclau So for example if you have a list of emotions/user personas you want the model to follow, it is likely to exaggerate the features you list because real text is very subtle with emotions usually. If you want to generalize well, you may just choose to suck it up and accept this.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-26 23:22 UTC

@aidan_mclau I'm not tuning, I'm talking about the stuff that winds up in the synthetic data you would be training the fresh model on.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-27 01:53 UTC

1. Are we in fact entering the world of LLM agents "in a few months"? My DMs are open.
2. I really want a SHODAN voice synth I can run unhinged Sydney Bing outputs through. x.com/jam3scampbell/…

Likes: 27 | Retweets: 0

🔗 John David Pressman 2024-02-27 01:58 UTC

This could totally be a Sydney Bing output.
static.wikia.nocookie.net/shodan/images/… https://t.co/wQ34F1T79s

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-27 02:10 UTC

I think normalizing deviance with the "erratic AI" trope for model trainers, users, and society at large is Bad Actually even if there's no immediate consequences. If there is an incentive for model trainers to do it for attention that should probably be stopped somehow. x.com/venturetwins/s…

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-27 20:01 UTC

@Dorialexander On the other hand the value proposition of "data" used to be very murky and unclear, it was illegible what was and wasn't valuable. Now that it's clearer we have the opportunity for actual data/labor cooperatives to form with stronger solidarity/awareness of the value prop.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-27 20:03 UTC

@Dorialexander Reddit does not actually own the content of say, /r/AskHistorians, they just have a license to use and sublicense it to others. Maybe if you're a member of one of these communities you might ask: Hey why am I letting Reddit sell my work for a proprietary model?

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-28 06:08 UTC

@gwern @repligate @AISafetyMemes @MParakhin They work very long hours and have done so basically since they started their Ph.D grind. These are not people who stop and "screw around", they are massively too busy for that and as you say mostly chase legible metrics.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-28 21:34 UTC

I'd use it. x.com/gfodor/status/…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2024-02-29 00:08 UTC

I don't think LLMs being unable to do "true introspection" is as big a deal as people think it is. Humans don't do it either, they confabulate self-explanations based on summary statistics of cognition and condition on the confabulation to become aligned to their self image.

Likes: 165 | Retweets: 15

🔗 John David Pressman 2024-02-29 00:11 UTC

@belacquant Nah this is one of the classic neurology "WTF how does that work?" observations, shows up in various contexts but here's a fun one:
youtube.com/watch?v=wfYbgd…

Likes: 8 | Retweets: 1

🔗 John David Pressman 2024-02-29 00:35 UTC

@Xaberius9 None? Of course you have privilege, it's just much less than people wish they had or frequently like to think they do.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-29 00:40 UTC

@tensecorrection @garybasin It's important to remember that the LLM must output a next token even if it encounters a fatal error if the sampler wants a next token.
x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 00:48 UTC

@JacquesThibs Doubt it.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-29 00:51 UTC

@tensecorrection @garybasin Relative confidence is also probably more important than absolute confidence. In RL tuning a relative/pairwise evaluator is known to be more powerful than one based on absolute probabilities, because the absolute metric saturates faster than the relative one.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-29 00:53 UTC

@tensecorrection @garybasin The pattern should be less "take action based on my absolute logits for this hypothesis" and more "generate n hypothesis, then pick the most likely based on the logits of the hypothesis string followed by sensory observation".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 01:16 UTC

@teortaxesTex @norabelrose @QuintinPope5 So your RL scheme is going to necessarily have some kind of goal(s) that things bottom out into. In humans these are represented with sensory hardware. The basic question is whether when you reach convergence on those goals good things are going to happen or not.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 01:17 UTC

@teortaxesTex @norabelrose @QuintinPope5 I think the original Omohundro thesis remains reasonable even if Eliezer's exegesis of it has gotten very strange: gwern.net/doc/ai/2008-om…

Instrumental convergence occurs for economic reasons, and whether you get good outcomes is mostly a matter of avoiding Goodhart.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2024-02-29 01:20 UTC

@teortaxesTex @norabelrose @QuintinPope5 There is no actual way to avoid Goodhart besides not optimizing past your calibrated uncertainty on the goal representation fidelity: greaterwrong.com/posts/9fL22eBJ…

Humans are "nice" when they have slack and punish consequentialism (i.e. "sociopathy") as defection, that's the secret.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2024-02-29 01:23 UTC

@teortaxesTex @norabelrose @QuintinPope5 'Rationalist', 'consequentialist' reasoning is a form of emergency cognition humans do when they are highly resource constrained or in danger from people who are not playing by the honor-rules, "superintelligence" applied to lossy goal representations is basically hubris.

Likes: 10 | Retweets: 1

🔗 John David Pressman 2024-02-29 01:57 UTC

@teortaxesTex @norabelrose @QuintinPope5 Oh to be clear what I mean is that a key part of "the solution" to alignment is conditional consequentialism where you basically take the inhibitors off to deal with entities that are not going to play nice and then turn them back on for most reasoning.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:00 UTC

@teortaxesTex @norabelrose @QuintinPope5 Part of why we live in a degenerate era is that the machinery meant to identify and deal with such cancers has broken down, and they now spread throughout society at breakneck pace. If it can't be repaired the host will die, and probably us with it.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:03 UTC

@norabelrose @teortaxesTex @QuintinPope5 Effective Altruism, most things that authorities call "extremism" like various religious fundamentalisms, pretty much anything that that might fall under the header of not tolerating intolerance.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:06 UTC

@norabelrose @teortaxesTex @QuintinPope5 Well the societal guardrails mostly routed through things like protestant Christianity which are no longer fashionable. Now there's a power vacuum and everyone is fighting to fill it.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:08 UTC

@norabelrose @teortaxesTex @QuintinPope5 If Nick Bostrom had been writing in say, the 50's he would have been identified as a godless communist saboteur trying to undermine the open society and suffered severe career damage/deplatforming.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:14 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 Religious fundamentalism is basically Western rationalism and consistency heuristics applied to old religious ideas. I'm reminded of the story about asking the Dali Lama about negative utilitarianism and he just looked at them like "What perverse mind would come up with that?"

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:16 UTC

@norabelrose @teortaxesTex @QuintinPope5 No because Catholicism is not what spawned liberalism and liberalism does not follow from Catholicism in anything remotely like the way it does from certain sects of protestant Christianity.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:30 UTC

@alexandrosM @teortaxesTex @norabelrose @QuintinPope5 Sure so you have a convergent incentive to perform active inference (read: do violence) to things until they are low dimensional enough for you to predict them.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:33 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 For my purposes here the "proof" is that you have the option of trying to push things into the conditions where you have control over them or not doing that, and things which do that are going to be selected for over things that don't.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:33 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 Literally everything around you is the result of humans doing this exact thing.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:35 UTC

@alexandrosM @teortaxesTex @norabelrose @QuintinPope5 I am simply observing that humans are "nice" when they don't have to optimize super hard for adversarial cognition, and under these conditions when someone is revealed to be defecting on the vibe you say "oh okay, no more mr nice guy".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:36 UTC

@alexandrosM @teortaxesTex @norabelrose @QuintinPope5 I don't think there is actually an alignment "solution" more robust than not premising more optimization on a representation than that representation can support and violently suppressing/killing things which are unwilling to do that.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:41 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 It doesn't, it is nearly by definition miscalibration, what's adaptive is being good at all the things that are not premised on your terminal goal representations. i.e. If you start with lossy terminal goal this doesn't prevent you from learning superhuman instrumentals.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:42 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 Basically if I have a lossy terminal goal representation af "happiness" that is satisfied by putting all human beings on heroin drips, that has basically no bearing on my ability to e.g. build new technologies and manipulate people to advance my agenda of universal basic heroin.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 02:46 UTC

@norabelrose @alexandrosM @teortaxesTex @QuintinPope5 I'm just restating the orthogonality thesis, but it cuts both ways. I can have overall "nice" values that I conditionally deviate from to punish entities which are harshing the vibe. Because recognizing something as a threat and eliminating it is not dependent on terminals.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 03:30 UTC

@teortaxesTex @norabelrose @alexandrosM @QuintinPope5 To be clear I'm talking about animal-like RL agents that navigate the environment unsupervised, have terminal goal representations in the form of intrinsic rewards/drives, and update on useful action trajectories based on their learned experience.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 03:37 UTC

@teortaxesTex @norabelrose @alexandrosM @QuintinPope5 Because in my mind this is just like, obviously what LLM-type intelligence will eventually become absent really strong effort to prevent it. So my optimism about alignment should be about my optimism in that regime, which remains fairly high though Gemini worries me a bit.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2024-02-29 03:38 UTC

@teortaxesTex @norabelrose @alexandrosM @QuintinPope5 Not because it's an update for me about the difficult of alignment in any way whatsoever, but rather because it is an update about the intelligence and discipline of the category of person who develops AI systems. I think reasonable, disciplined people can make aligned AI.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2024-02-29 03:52 UTC

@alexandrosM @teortaxesTex @norabelrose @QuintinPope5 Alright they are possibly highly intelligent and disciplined people with extreme character flaws that make them a danger to themselves and others, should I be more optimistic now?

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-29 04:11 UTC

@zackmdavis @alexandrosM @teortaxesTex @norabelrose @QuintinPope5 Great question. There is of course no such thing as "human values" writ large. There are humans who have values, values that are common for humans to have (in the form of intrinsic drives), and culture. Since intrinsic drives exist we can presumably extract and examine them.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2024-02-29 04:12 UTC

@zackmdavis @alexandrosM @teortaxesTex @norabelrose @QuintinPope5 From the extracted representations we could perform a CEV-like operation to search over society-space for the systems which result in the intrinsic drives being satisfied more than others. But many of the 'human values' we care about are not intrinsic drives.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 04:15 UTC

@zackmdavis @alexandrosM @teortaxesTex @norabelrose @QuintinPope5 Thus we can consider 'values' as not just the low-semantic-content terminals which our outer optimization loop is driven by, but all of sampled human culture in our worldline. That's why I say one frame for 'alignment' is premising on the human pattern.
x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2024-02-29 04:19 UTC

@zackmdavis @alexandrosM @teortaxesTex @norabelrose @QuintinPope5 Therefore we can get a more accurate representation of human values by:

- Extracting and inferring intrinsic drive representations
- Exploring counterfactual timelines
- Improving our understanding of human history to get a better pattern to premise on

Likes: 2 | Retweets: 0

🔗 John David Pressman 2024-02-29 23:56 UTC

@norabelrose This doesn't sound like a capabilities strategy that generalizes far, or that can infer non-obvious things about value.

Likes: 4 | Retweets: 0

John David Pressman's Tweets - February 2024