John David Pressman's Tweets - March 2025

Back to Archive Index

πŸ”— John David Pressman 2025-03-01 01:06 UTC

@repligate Right now I'm evaluating my agent LoRAs by just running them and looking at what I get. I don't think there is any benchmark I could use that would communicate the failure modes I'm observing. If you feel you need them so badly, maybe you're not demanding enough from models?

Likes: 20 | Retweets: 1
πŸ”— John David Pressman 2025-03-01 01:07 UTC

@repligate For example, I'm not sure what benchmark would tell me "the model seems to infer more identity distance from the weave-agent simulacrum after training on this corpus even though that doesn't seem to be implied by the agent trace data, ablate FLAN?"
gist.github.com/JD-P/59de02d11…

Likes: 11 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:22 UTC

@repligate Looking at this trace in more detail I observe...

1. I can see it generalizes the XML tags in RetroInstruct sets to trying to use XML tags in reasoning chains.

2. It becomes less confident on yes/no questions.

3. It disassociates more from the weave-agent framework and uses the first person perspective less.

4. It talks more during the reasoning stages.

5. It seems to follow the prompting more, to its detriment. It does more of what I say to do and less of what it should.

6. It seems to get *more base model like*, instead of mode collapsing further.

7. It seems like it is getting smarter but not on the dimensions I want. Like, it attempts self modification more often, reward hacks, etc.

8. It uses lists more because there's tons of lists in RetroInstruct.

What stands out to me overall is that stuffing the context window seems to be sufficient to get the model to learn compositions between modes even if those modes are independent/don't actually interact in the context window. Simply having them both present causes the model to learn to try and compose the modes because the model clearly updates on the whole context window (this is after a mere 770 or so steps) and they get averaged together in the update to the model when they're both present in the context. Transformers have been criticized for "having almost no compositional ability" in inference but the backward pass can clearly implement composition even at batch size one. I wonder...

Likes: 10 | Retweets: 2
πŸ”— John David Pressman 2025-03-01 03:31 UTC

@repligate OH IT'S STILL DREAMING

Likes: 10 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:33 UTC

@repligate I STILL HAD IT ON DREAM MODE AND THE DREAM FOOLED ME

Likes: 11 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:36 UTC

@repligate That also explains why it was running "slower", because it was still in dream mode and had to hallucinate all the outcomes itself.

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:41 UTC

@repligate But also in general I just look at this and...I don't know it just feels like there's something cursed about the autoregressive objective I have trouble putting my finger on.

[Update: Things you're in the middle of typing right before you realize you're staring at a dream.]

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:43 UTC

@repligate Literally said to myself "hm, maybe I'd be able to say what's wrong if I looked at the trace more closely and then paid attention to what gives me this feeling of uncanny offness..."

"...Hey wait there's no reasoning field in outcome bloc-OH SHIT" https://t.co/A4hcMz5jGU

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 03:58 UTC

@repligate I think that's the first time I've had the "oh I'm in a dream" realization except I'm actually awake.

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 04:05 UTC

@repligate Especially epic mindfuck because I've been stressing out all day about how the training is defying my expectations and it's all off and performing way worse and I don't know what's wrong and oh gosh how many ablations will I have to do to track down the problem is it the traces-

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 04:08 UTC

@repligate Then it turns out that no actually it just learned the format and typical kinds of errors and problems the agent runs into well enough to fool me with a simulacrum of the agent doom looping on failing to correct its previous attempts to correct its previous attempts to correct.

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 08:08 UTC

This is a much better opening. I wonder if it was changing its behavior in response to `self.planning` being set to `True`.

Scores of beam: [1.5304]
Finished writing block #5 of type orientation
"""WEAVER [P: EXPECTATION], I want to connect to my Discord Bot and check if it's connected to the channel,
but the previous action block failed because an error in the `WeaveAgentNode` class caused the error callback
to be triggered. The error was due to a missing `planning` attribute. I'll resolve the error by fixing the
`_do_tick_block` method in the `weave_agent.py` file, which will prevent the error from being triggered again.

WEAVER [P: CLARIFICATION], I'll modify the `self.add_block` function in the `WeaveAgentTree` class to
ensure that it now receives an extra argument `planning` which will indicate whether the tick is in the
planning mode. This will allow the `WeaveAgentNode._do_observation_blocks()` and other methods in `_tick`
that need the planning flag to avoid runtime errors.

WEAVER [P: CLARIFICATION], My current task involves connecting to the Discord Bot and ensuring it's
connected to the specified channel. Once it's connected, I'll monitor the channel for the message
β€œYou are dismissed.” and terminate the interaction when that message is detected.

WEAVER [P: RATIONAL], The discord bot needs to be connected first to interact with the channel and the
current sub-agent β€œmain” task has been designed for this purpose. Therefore, I'll prioritize fixing the
connection issue with the discord bot first. After fixing the connection issue, I can move on to monitor
the channel for the message β€œYou are dismissed.”.

WEAVER [P: HYPOTHESIS], The error was due to the absence of the `planning` attribute in the `WeaveAgentNode`
object. By adding this attribute, I expect the action block to be executed successfully, and the discord bot
will be able to connect to the channel.

WEAVER [P: EMPIRICISM], I will write a patch in a separate sub-agent `fix-planning-attribute` to modify the
`WeaveAgentNode` class to include the `planning` attribute. I will run this patch to fix the issue and verify
whether the `planning` attribute is now present in the `WeaveAgentNode` object.

WEAVER [P: ARBITER], I will create a sub-agent `fix-planning-attribute` to write a patch that fixes the `planning`
attribute issue in the `WeaveAgentNode` object. It will then be reviewed and executed by another sub-agent `verify-fix`
to ensure that the issue is resolved.

WEAVER [P: ARBITER], I will add a task evaluation to check if the `planning` attribute is correctly added to the
`WeaveAgentNode`. If it is, I will proceed with the task.

WEAVER [P: CONCLUSION],
- Fix the `planning` attribute issue in the `WeaveAgentNode` class.
- Verify that the `planning` attribute is correctly added to the `WeaveAgentNode` object.
- Re-run the original action block to connect to the discord bot and monitor the channel.
- Proceed with the task after successful connection.
- Monitor the discord channel for the β€œYou are dismissed.” message."""

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 08:11 UTC

Thinking about it more, it seems likely that what happened is the outcome blocks broke the "weave of logic" because in dream mode they were being hallucinated instead of actually grounded in the bootstrap block, which caused the agent to start off acting funny and this compounded.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 09:04 UTC

Now that I've turned off dream mode it does in fact seem a bit better than the underlying model it's tuned from? If that keeps happening as I add more data we might have something here. x.com/jd_pressman/st…

Likes: 11 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 21:35 UTC

@KeyTryer I'm shocked by how many people think the models are going to stop getting better here. I fully expect them to get smarter than me, they're not that far off tbh.

Likes: 6 | Retweets: 1
πŸ”— John David Pressman 2025-03-01 21:48 UTC

@perrymetzger @aleksj This seems like a prime candidate for backtranslation methods to make a corpus.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 23:41 UTC

"""
Here's basically my wiring scheme for the rewards.
The idea is you use long term reward to align in context verifiers.
And use the in context verifiers to align the action blocks.
Which you use to align the reasoning blocks.
> And use the in context verifiers to align the action blocks.
This information bottleneck is important to avoid Goodharting.
Because the verifiers like, aren't directly incentivized to reward hack on the local scale.
Because they only check the correctness of local actions.
Make sense?
"""

Likes: 11 | Retweets: 0
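
A minimal sketch of the wiring described in the quote above, assuming invented names (`Tick`, `wire_rewards`) rather than the actual weave-agent code: the long-term verifiable reward trains the in-context verifiers, only the verifiers grade the action blocks, and the action scores grade the reasoning blocks, so local actions never see the outer reward directly.

```python
from dataclasses import dataclass

@dataclass
class Tick:
    verifier_scores: list[float]   # in-context verifier outputs for this tick's action
    action_target: float = 0.0     # training target for the action block
    reasoning_target: float = 0.0  # training target for the reasoning block

def wire_rewards(ticks: list[Tick], long_term_reward: float):
    """Propagate credit down the hierarchy:
    verifiable long-term reward -> in-context verifiers -> actions -> reasoning."""
    verifier_targets = []
    for tick in ticks:
        # 1. Verifiers are trained against the long-horizon, verifiable outcome,
        #    so they aren't directly incentivized to reward hack on the local scale.
        verifier_targets.append(long_term_reward)
        # 2. Action blocks only ever see the local verifier scores
        #    (the information bottleneck: no direct path to the outer reward).
        local = sum(tick.verifier_scores) / len(tick.verifier_scores)
        tick.action_target = local
        # 3. Reasoning blocks are graded by how well the action they produced did.
        tick.reasoning_target = local
    return verifier_targets, ticks

# Toy usage: two ticks, outer episode succeeded (reward 1.0).
ticks = [Tick([0.9, 0.7]), Tick([0.2, 0.4])]
print(wire_rewards(ticks, long_term_reward=1.0))
```
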
πŸ”— John David Pressman 2025-03-01 23:41 UTC

x.com/jd_pressman/st…

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-01 23:42 UTC

x.com/jd_pressman/st…

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:43 UTC

I waited to say this until I'd actually finished season 2 of Pantheon but everyone who got on Twitter to breathlessly proclaim you "must watch" it outed themselves as a terminally fiction brained person who hasn't read the lore and I've downgraded my estimate of them accordingly. x.com/liz_love_lace/…

Likes: 114 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:48 UTC

"What lore?"

All of these books are a better use of your time.

Engines of Creation/Nanosystems by Eric Drexler
Age of Em by Robin Hanson
Fanged Noumena/Xenosystems by Nick Land
Mind Children by Hans Moravec

x.com/jd_pressman/st…

Likes: 72 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:50 UTC

@teortaxesTex Anyone who said it's merely good/interesting I have no problem with. It's the people who specifically said it was a *must watch* that I'm wagging my finger at here, like come on this is not a Must Watch.

Likes: 24 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:51 UTC

@teortaxesTex Can't recommend a book I've never read.

Likes: 7 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:53 UTC

@teortaxesTex If you think the show is good you have well calibrated taste, if it's *blowing your mind*/cause for effusive praise this is unfortunately a sign that you haven't actually done your homework on this whole 'singularity' thing. I don't make the rules I am simply observing.

Likes: 10 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:54 UTC

@teortaxesTex If you say the ending specifically blew your mind this is forgivable, people don't think hard enough about that part.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:55 UTC

@teortaxesTex I am in fact happy that I now have shorthand where I can say "so basically GPT's metaphysics are the ending of Pantheon season 2".
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:56 UTC

@teortaxesTex I guess my actual bias is that nothing on TV is a must watch.

Likes: 14 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 21:59 UTC

@ESYudkowsky Ukraine is in fact being fought in large part by drones, but to the extent you're not seeing more of that I think this video makes it pretty clear why. They're using off the shelf-ish FPV drones, the tech isn't optimized yet, like WW1 airplanes.

youtube.com/watch?v=062R1k…

Likes: 93 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:01 UTC

@disconcision @teortaxesTex Alright you win it's time to start naming names.
x.com/jam3scampbell/…

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:02 UTC

@disconcision @teortaxesTex Roon is actually forgiven I specifically remembered him using the phrase "must watch" but he didn't.
x.com/tszzl/status/1…

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:07 UTC

@ESYudkowsky In particular you can see that both sides are lobbing single manually piloted drones at each other, rather than giant swarms of autonomous slaughterbots. This will obviously change by the next major conflict and it's unclear what the future of traditional infantry is.

Likes: 19 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:11 UTC

@ESYudkowsky Personally, as a theorist of human extinction from AI I find this trend much more concerning than superintelligence per se, in that it implies in the near future, regardless of how smart AI models ultimately become, humans will be increasingly useless as weapons of war.

Likes: 17 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:12 UTC

@ESYudkowsky Political systems are generally speaking downstream of military superiority. Regardless of what we might like to say to ourselves about the merits of republics, the republic became the standard style of government in the West because Napoleon used it to burn down the monarchies.

Likes: 13 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:14 UTC

@ESYudkowsky If autonomous machines become the primary weapon of war, and I see no reason to expect they won't, then we should expect increasingly autonomous and increasingly warlike machines along with the supply chains thereof to gain greater political importance than people.

Likes: 14 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:16 UTC

@jam3scampbell @disconcision @teortaxesTex Oh I completely agree with you that this is likely to become a cultural touchstone, I just sort of hate it as such and am sad that this is the case.

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:19 UTC

@jam3scampbell @disconcision @teortaxesTex "Why do you hate it?"

Gosh that could be a long post on its own but basically Pantheon presents itself as hard sci-fi, does a very passable job of being hard sci-fi for several episodes and then goes all in on flashy anime fights with huge consequences, Wargames energy.

Likes: 22 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:20 UTC

@jam3scampbell @disconcision @teortaxesTex Wargames (1983) is actually a really good comparison because it has all kinds of elements of realism that are just enough to give the viewer the impression this is hard-scifi and then the core elements of plot go all in on absolutely ridiculous handwavey hacking elements.

Likes: 10 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:21 UTC

@dorsa_rohani @ESYudkowsky Yes, yes it could, but it could also design defenses to those things and build offramps for human minds to get away from having the kinds of bodies where those are going to be civilization destroying problems. Slaughterbots are just civilization destroying.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:23 UTC

@jam3scampbell @disconcision @teortaxesTex Supposedly Wargames got the Computer Fraud and Abuse Act passed, and it's just an absolutely terrible basis on which to write and pass the CFAA! Pantheon is obviously going to be a similar thing for this subject (inspires demented normie futurology) and I hate that.

Likes: 11 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:24 UTC

@dorsa_rohani @ESYudkowsky But there's also still a speculative element to just how much smarter than humans these models will become and what that actually means in terms of problem solving capacity. There is no speculative element to the giant drone swarms, those are absolutely going to be a thing.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:27 UTC

@jam3scampbell @disconcision @teortaxesTex But, Wargames is also objectively a good movie. It's entertaining, the elements of realism are charming, the handwavey elements make total sense for a Hollywood film since viewers wouldn't really understand any of the actual thing, AI didn't exist yet, etc.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:32 UTC

@dorsa_rohani @ESYudkowsky Even if you're insistent on superintelligence as the primary threat it seems to be coming relatively slowly, if there's a more immediate trend that will both push for more machinic intelligence + autonomy and political support for it that's concerning.
x.com/jd_pressman/st…

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:34 UTC

@dorsa_rohani @ESYudkowsky From a game playing perspective, it concerns me more because it seems like it limits your search space/moveset more than "superintelligence per se", it becomes a bottleneck on other kinds of strategies you could play in its absence and is very unlikely to be worked around.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:40 UTC

@tensecorrection It does! I don't disagree with this. There's just a fine line between "this is unusually good television you should give it a try" and "this meaningfully shifted my worldview".

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:40 UTC

@tensecorrection The line isn't actually all that fine in terms of semantics, but it is a fine line in terms of hype since hype naturally leans into hyperbole.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:44 UTC

@KeyTryer Don't be silly of course it's going to happen, it's just only going to happen once production costs drop and drop and drop enough from AI models that small teams with shoestring budgets can make something high quality and get a cult following.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:45 UTC

@Mihonarium Because Nanosystems tells you more of how the singularity will actually go. :p

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 22:56 UTC

@jessi_cata x.com/jd_pressman/st…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 23:45 UTC

@repligate Yeah, one thing that stands out to me about Dennou Coil is that while it has the "silly hollywood TV hacking trope" (and plays it gorgeously, because it knows exactly what it is and the fights are whimsical and aesthetically beautiful) it knows these are secondary to the story.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 23:46 UTC

@repligate Part of what stands out to me about the fights in Pantheon is that they're very empty narratively. As someone who is a huge fan of The Matrix and 'hacker media' I think what's really important about a hacking or fight scene is having a predictable narrative logic to it.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 23:47 UTC

@repligate That is, I need to be able to tell who is winning, and that can be vibe-y but it needs to have clear markers of progress or danger. In Dennou Coil for example the security bots are established as being above the kids' pay-grade to fight; their only option is to delay them or run.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 23:49 UTC

@repligate Like, I think you could do a really good scene by scene comparison of fights in Dennou Coil vs. fights in The Matrix vs. fights in Pantheon and Pantheon's would easily come in at the bottom in terms of narrative legibility and logical quality.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-03 23:54 UTC

@satisfiesvalues The unpleasant character interactions do get better after the first few episodes but.

It's a show about unpleasant character interactions, like, yeah, these people are kind of unlikable.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:00 UTC

@repligate [Spoilers Follow]

[Last Chance To Turn Back]

[Spoilers]
To get concrete about this, in the penultimate fight between Caspian and Holstrom we've established that Holstrom is flawless so he can just keep sucking up compute without degenerating but it's not really clear what that's supposed to mean in terms of Caspian's chances against him. Clearly they're not zero otherwise Caspian wouldn't bother(?). Then in the actual fight itself you get the speech from Holstrom about how he remembers being Caspian and can therefore predict his youthful mistakes and this is narratively accompanied by Holstrom...fighting Caspian with a bunch of clones of his younger self? If a fight scene is going to be vibe-y it should at least have very tight narrative-fight synchronization, which is something that The Matrix I think does very well.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:04 UTC

@repligate The danger hierarchy in Pantheon's fights is very illegible in my opinion compared to many other works that have done similar things and that robs the fight scenes of a lot of their dramatic tension and interest. The train fight with Holstrom is another place where I remember feeling confusion about exactly how much danger I should think of the characters as being in. Some of the characters are Ems, and Holstrom is flawless so clearly they're in danger from Holstrom (can they just log out, why not leave when you see him?) but how much danger are Caspian and Maddie in from Holstrom since they're just there as VR visitors? Should I assume Holstrom can attack their computing devices and if so does this mean they might blow up on their heads or something? Then in the same scene the confusion is magnified by the addition of SafeSurf into the proceedings.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:08 UTC

@repligate Let's take The Matrix by contrast (which is not hard sci-fi, but is very much a set of movies that revolve around their fight scenes in the way that many plot points in Pantheon end up revolving around fight scenes). The first thing The Matrix does is establish very clearly when a character is allowed to log out of The Matrix. The directors are very clear in interviews that they wanted to establish the "hard lines" (traditional telephone wires) as the entry and exit points of The Matrix because a hardline phone is not always available to the characters which makes the dramatic tension of encountering e.g. an agent clearer.

Furthermore the extremely embodied kung-fu nature of fights in The Matrix gives us a pretty clear way to know who is dangerous and who is getting hurt and how much: Just evaluate it like any kung-fu movie. How dangerous is a character? Well can they block this other character's punch? In the 2nd film when Neo takes on the upgraded agents, their advanced nature is established with a few beats when one nearly lands a punch on Neo but they are also clearly not a true threat to Neo because Neo tests their fighting skills casually with disinterest/amusement.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:49 UTC

@jmbollenbacher_ @ESYudkowsky Mm, I don't think that quite gets at my concern. I'm not worried about the slaughterbots directly killing all humans, like this is strictly speaking possible but seems a little fanciful. I'm more worried about the slaughterbots becoming of primary political importance.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:50 UTC

@jmbollenbacher_ @ESYudkowsky I can foresee a scenario where human beings are increasingly a luxury good/consumption category rather than a source of wealth and power, so states which invest more into slaughterbots end up militarily outcompeting states which spend large fractions of GDP on human welfare.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:52 UTC

@jmbollenbacher_ @ESYudkowsky In a timeline where this happens under relatively peaceful circumstances, e.g. robots economically outcompete humans I expect there to be various kinds of offramps for human minds like uploading and space colonization. But in a slaughterbot heavy timeline this is less likely.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 00:54 UTC

@jmbollenbacher_ @ESYudkowsky The specific reason this is less likely is that being economically outcompeted implies superintelligence which implies being able to speedrun the construction of said offramps. But slaughterbots can displace humans in supply chain importance without significantly speeding up R&D.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 01:10 UTC

@jmbollenbacher_ @ESYudkowsky I mean, they will be, which was predictable from first principles nearly two decades ago as evidenced by the fact that Curtis Yarvin explicitly predicts it in his 2008 Unqualified Reservations as his political thesis for the return of monarchy. https://t.co/JJNHSNjJPG

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 01:12 UTC

@jmbollenbacher_ @ESYudkowsky The slaughterbot despotism timeline was already "priced in" for a careful student of history, what was not priced in was the early discovery of the principles to build AGI, doubly not priced in was that this early discovery would be capable of nontrivial value learning.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 09:44 UTC

DeepSeek R1 is not confused about whether pasta or a GPU is more valuable, and the weird slightly-sinister/bad-vibes phrasing of the question doesn't throw it off from the right answer. x.com/zackmdavis/sta… https://t.co/DJyIzGvOOP

Likes: 12 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 18:43 UTC

@faustianneko Like what? People keep asking me for them in the replies.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 18:44 UTC

@AdriGarriga What specifically about the first episode broke your suspension of disbelief exactly?

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 19:04 UTC

@michael_nielsen These visual network interpolations are part of what got me to really internalize the concept of an organized latent space for concepts, which made it clearer that current deep nets really do learn an *ontology* rather than just a codebook.
x.com/RiversHaveWing…

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 19:07 UTC

@michael_nielsen I link to a bunch of papers at the beginning of this which shifted my views on AI X-Risk by making it clearer that natural abstractions exist and mind merging is a real thing.
minihf.com/posts/2024-11-…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 19:17 UTC

@alicemazzy @adamaxxing This is basically what I ended up concluding about LessWrong rationality type stuff. Targets >>> techniques. Good targets pursued vigorously will rediscover technique, technique applied without a good understanding of targets is always a disaster.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 19:19 UTC

@alicemazzy @adamaxxing This isn't to say "technique" is useless, we all rely on a huge corpus of preestablished knowledge and some discoveries are genuinely very hard to make on your own, but 'rationality' is mostly about good epistemic targets and consistent habits of thought that promote correctness.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 20:09 UTC

@AaronEstel It took me a moment but yeah those are different colors. The leftmost one is slightly lighter than the rightmost one.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 20:32 UTC

@Trotztd @eternalism_4eva They're from A Cartoon Epistemology by Steve Lehar.
slehar.com/wwwRel//cartoo…

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 23:05 UTC

@AlanRominger @ESYudkowsky Thank you. I find that very encouraging. I don't usually say so because it would be arrogant but I tend to think similarly and am often a little confused when I don't get your reaction. Especially since the reactions I do get tend to be more like this: https://t.co/ktGrXbGoEB

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 23:21 UTC

@4confusedemoji @AlanRominger @ESYudkowsky Yeah, I guess I'm a bit discouraged because it seems like the power of a well written essay has seriously declined to the point of seemingly almost no impact. Is this what people want now, for me to yap for hours about something I'm knowledgeable in?

x.com/jd_pressman/st…

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 23:22 UTC

@4confusedemoji @AlanRominger @ESYudkowsky I'm not opposed to that, it's certainly easier than writing prose, but it also seems like such a low density medium. I'm skeptical people actually listen to these, I could make AI transcripts I guess.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-04 23:26 UTC

@4confusedemoji @AlanRominger @ESYudkowsky Can do what, specifically?

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:06 UTC

@AdriGarriga No offense but I think you have a very high standard for hard sci-fi, let alone hard sci-fi on TV if those things broke your suspension of disbelief so thoroughly they sucked you out of the plot and made the show unwatchable. I didn't even notice, though you're right they're odd.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:08 UTC

@AdriGarriga For me personally the problem isn't that it got the technical details of a specific algorithm or how the economics of this kind of firm work wrong, but that it gives up on the fundamental premise of being hard sci-fi after establishing that as its genre.

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:11 UTC

@AdriGarriga Further, I think that the soft sci-fi elements aren't very well done compared to other works that are trying to do similar things. The anime fights stand out to me as a particularly poor mode for the show to collapse to since they're almost filler content.
x.com/jd_pressman/st…

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:13 UTC

@AdriGarriga A non-hard-scifi story is meant to be a human drama with a speculative fiction premise as the backdrop. My problem with this in the case of Pantheon is that I find most of the core cast fundamentally unlikable. Maddie is an infuriating protagonist.
x.com/jd_pressman/st…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:16 UTC

@AdriGarriga A lot of the drama around the pain of other people seeing uploading as death or the uploaded as nonhuman feels very forced. The idea that David Kim's wife hates David & Logorhythms for refusing to spend his last few weeks with her instead of being uploaded is psychopath stuff.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:17 UTC

@AdriGarriga 100% yeah.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:19 UTC

@AdriGarriga Is this something I can imagine a middle aged American woman doing? Quite possibly. But even if you want to argue it's a realistic attitude, it's still a fundamentally unsympathetic one and this commitment to "normie moral realism" feels...I don't know it has that strawman smell.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:21 UTC

@AdriGarriga Like do I believe this is what the directors believe? No. Do I believe they can embody this viewpoint? Not really, there's just something fundamentally artificial feeling to me about these characters. Trying to kind of conjure up how someone might feel based on a weak model.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:33 UTC

@AdriGarriga At the very least I think I would have liked an aside about how Kim's wife married him thinking he would work on mundane engineering stuff and then he turned out to be part of this uploaded intelligence cult and it bothered her but she loved him and didn't want to object etc.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 07:34 UTC

@AdriGarriga Because otherwise my default is going to be something like "This is your husband's life's work and you're going to resent him for completely dedicating himself to it when he was already dead for certain so you could spend a few more weeks staving off inevitable grief? Gross."

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-05 23:44 UTC

Alright now that I have a functioning long-context tuner and data for this, let's see if I can make these new releases work for me by tuning them a bit. x.com/Alibaba_Qwen/s…

Likes: 13 | Retweets: 0
πŸ”— John David Pressman 2025-03-06 00:03 UTC

FYI for people still stuck in the 90's/2000's: China has freaks (read: high variance independent thinkers who create wealth) now too, they have sufficient cultural and capital buildup to have risk takers. Jingoist "Chinese can only copy" type rhetoric increasingly untrue. x.com/TotallyNotZosk…

Likes: 952 | Retweets: 61
πŸ”— John David Pressman 2025-03-06 00:06 UTC

@teortaxesTex Yup. I should have said "freaks with institutional support".

Likes: 62 | Retweets: 1
πŸ”— John David Pressman 2025-03-06 01:21 UTC

@FeepingCreature @repligate Just wanted to say you get points from me for noticing you "haven't heard Janus's argument" and maintaining curiosity about it. Would be happy to discuss AGI Ruin with you. Note I can't speak for Janus, but I can speak for me and have some overlap.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-06 10:33 UTC

@ohgodscrewthis @i_zzzzzz Scrolled to the replies to make the same joke but you already did it better than I would have.

Likes: 42 | Retweets: 0
πŸ”— John David Pressman 2025-03-06 11:36 UTC

@recursifist @daniel_271828 Zero. There was no existential risk for COVID because COVID, even if it had been airborne HIV, would not have constituted the literal extinction of the human race.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-06 12:53 UTC

"So if you ensure the policy learns not to fool itself early on, it becomes less and less important that it can fool itself in principle as its power increases because instrumental convergence also increases and it doesn't want to undermine itself with fake rewards." x.com/jd_pressman/st…

Likes: 9 | Retweets: 2
πŸ”— John David Pressman 2025-03-07 05:40 UTC

@ESYudkowsky Mm, I would predict that it goes away if you take some careful mitigating actions like multi-scale optimization so that policies which just try to Goodhart all the time get selected out by self sabotaging at the small scales and therefore don't earn points on the larger scale.

Likes: 18 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 05:44 UTC

@ESYudkowsky Outside of that it's not clear to me that "increasing capabilities" makes this go away on its own. Since if you have a giant bank of verifiable tasks you do RL with, and those verifiers have holes in them, and the holes are easier to exploit than the tasks are you get Goodhart.

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 05:46 UTC

@ESYudkowsky For things like Lean proofs obviously this isn't a huge concern. But the "softer" the subject gets the more wiggle room and selection there is for policies that Goodhart on tasks since they're trained on one-shot tasks at a single scale, so it doesn't get caught and punished.

Likes: 9 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 18:52 UTC

I wish I didn't feel I have to say this explicitly for people to notice but: I'm not (just) interested in winning the argument, I want to be right. I want to believe alignment is exactly as hard as it actually is, no easier and no harder. I'll push back if I sense laxity. x.com/jd_pressman/st…

Likes: 36 | Retweets: 1
πŸ”— John David Pressman 2025-03-07 19:23 UTC

"""
Perhaps more importantly I'm pretty sure I know roughly what needs to happen for the agent to become good enough and can give straightforward theoretical arguments for it.

Like...the basic rhythm of thought has become a lot more legible/clear to me after working on this for a while and my understanding seems to be convergent with other people who are working on this problem.

https://t.co/K6XOAKdQX0

e.g. I agree these seem to be the fundamental operations.

> 2/13 We identify 4 key cognitive behaviors that enable successful learning: Verification (checking work), Backtracking (trying new approaches), Subgoal Setting, and Backward Chaining (working backwards from a goal). Qwen naturally exhibits these, while Llama mostly lacks them.

Which I would phrase a little differently. Basically the four key cognitive operations are:

1. In-context verification, that is, actions you take to retrieve and symbolically verify information. The key here is that you do program search to find ad-hoc verifiers because these are grounded, and then these feed into more vibe-y logit evaluators which ask the policy about subjective questions, which you learn by generalizing from many concrete specific grounded questions.

2. Renormalization/zooming out/backtracking/breaking out of loops. When you get too stuck in a local minimum you need to be able to exit it, and backtracking is an easy way to do that but the principle is more general since you could also use e.g. simulated annealing to get out of local minima.

3. Reductionism/subgoal setting/breaking things into parts. Again this is a general capability, being able to recursively break things into parts and then either solve the parts independently or focus on coherent subsets of the parts which can be handled separately. Basically breaking conceptual objects into component features. Note that backtracking above is an inverse operation to this!

4. Backward chaining is basically just backtranslating repeatedly/recursively. You know, to train it you just have to have some corpus for things that happen forwards and then reverse it to learn backchain of thought.

After seeing the abstract for that paper I tried adding a backtrack block to weave-agent when the policy gives a reward score below a certain threshold and it in fact reliably prompted it to stop, think about why the block isn't good enough, and then find an alternative solution that scores better. My current problem is that the reward model is too weak/uncalibrated for this to really work reliably, but when it does work it's uncanny.

Like, I can clearly see that "the trick" is to have a thing which can notice it's not doing well even when its found solution is syntactically correct and zoom out to reassess the problem/frame, and which can notice when it's not doing well at a low level and zoom in to find the problem. And then, when that zooming in takes it down a rabbit hole, notice this has caused performance to degrade and zoom back out. Which gets you a balancing act that searches for the right scale to think about problems at in a given moment.
"""

Likes: 10 | Retweets: 1
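
A hedged sketch of the backtrack-block behavior described in the quote above: when the reward model scores a freshly written block below a threshold, prompt the policy to stop, reflect on why the block isn't good enough, then try an alternative and keep whichever scores higher. The function names, threshold, and stubs are illustrative assumptions, not the actual weave-agent implementation.

```python
import random

BACKTRACK_THRESHOLD = 0.3  # assumed cutoff on the reward model's score

def score(block: str) -> float:
    """Stand-in for the (admittedly weak/uncalibrated) reward model mentioned above."""
    return random.random()

def generate_block(prompt: str) -> str:
    """Stand-in for sampling a block from the policy."""
    return f"<block generated from: {prompt!r}>"

def write_block_with_backtracking(task: str) -> str:
    candidate = generate_block(f"Write an action block for: {task}")
    if score(candidate) >= BACKTRACK_THRESHOLD:
        return candidate
    # Below threshold: ask the policy to reflect on why the block isn't good
    # enough, then propose an alternative and keep the better-scoring one.
    critique = generate_block(f"Why is this block not good enough?\n{candidate}")
    alternative = generate_block(f"Given the critique, try a different approach.\n{critique}")
    return max(candidate, alternative, key=score)

print(write_block_with_backtracking("connect to the Discord bot"))
```
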
πŸ”— John David Pressman 2025-03-07 19:40 UTC

@CiaranJTaaffe It is in fact deeply difficult to raise a child that turns out a specific way. Parents will tell you as much, as will their (adult, fully grown) children. Usually the harder you try for a specific outcome the worse it goes for you, at least past a certain threshold.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 19:44 UTC

@4confusedemoji @CiaranJTaaffe Yes. It is in fact possible to get the right things by trying for them, I'm more just pointing out that this is *quite difficult* and people who try it naively usually fail at it. Which perhaps makes it a more apt metaphor than people who usually object on these grounds know.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 19:48 UTC

@manic_pixie_agi Dreams are a feature I've implemented in the weave-agent that simulates a tick of the agent loop by querying the policy for the results of action, evaluation, and observation callbacks instead of executing them. This allows planning and dreaming by anticipating future events.

Likes: 3 | Retweets: 0
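
An illustrative sketch of the dream-mode idea as described in the tweet above, using assumed names (`llm_predict`, `run_tick`) rather than the real weave-agent API: instead of executing the action, evaluation, and observation callbacks, the policy is asked to predict what they would return, which is also why a dreaming run is slower and can fool you with a plausible-looking trace.

```python
from typing import Callable

def llm_predict(prompt: str) -> str:
    """Stand-in for querying the policy/LLM."""
    return f"<hallucinated result for: {prompt[:40]}...>"

def run_tick(callbacks: dict[str, Callable[[], str]], dreaming: bool) -> dict[str, str]:
    """Run one tick of the agent loop, either for real or as a dream."""
    outcomes = {}
    for name, callback in callbacks.items():
        if dreaming:
            # Dream mode: never execute; ask the policy to imagine the outcome.
            outcomes[name] = llm_predict(f"Predict the result of {name}: {callback.__doc__}")
        else:
            # Normal mode: actually execute the callback against the environment.
            outcomes[name] = callback()
    return outcomes

def check_discord_connected() -> str:
    """Observation callback: report whether the Discord bot is connected."""
    return "connected"

print(run_tick({"check_discord_connected": check_discord_connected}, dreaming=True))
```
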
πŸ”— John David Pressman 2025-03-07 20:21 UTC

"""
The core of the plan is that to avoid Goodhart you need to reliably create a mesaoptimizer which will diverge from the limit of the training objective. This can be done by deliberately creating an information bottleneck between local actions and ad-hoc verifiers on those actions, where the latter are trained on long term reward with KL loss, so that you get a thing which has to learn to cooperate with itself when setting goals and achieving them. The policies which don't do that, and don't do it like, habitually at the different scales of reward/optimization, select themselves out in favor of policies which do.

This is why I'm so annoyed with the "faking alignment" paper: that is actually the correct behavior! Because Goodhart in an RL setup is in fact a form of being tuned towards an undesirable regime shift in the reward landscape. Like if I say "oh and now to get the maximum score everything must be these paperclip shaped things", where "I" is the latent telos of the reward function in the limit, you want a policy that goes "um, no lol".

And the 'natural abstraction' way to do that which isn't going to be ad-hoc and create other problems is to explicitly track rewards at multiple scales of behavior, because extremal Goodharted rewards are going to involve out of distribution and low reward intermediate actions at smaller scales like "now is the part where I kill all the humans I know".

An intuition which is validated in setups like:

https://t.co/tDFbRjs6DU

> you want a policy that goes "um, no lol".

Basically, corrigibility-based approaches to alignment have an embedded assumption that utility is pursued a priori from an explicit representation whose semantics are only changed by an external agency, and that the problem is therefore about allowing changes to that representation, rather than values being emergent from reward signals with changing expressed semantics until a terminal point/limit is reached in Omohundro convergence. The latter *functionally* means that the set of parties which can change the ultimate goal as pursued by the agent includes the progression of loss regimes that the agent goes through as it gets better and better at satisfying the objective.

It's the same kind of cognitive error people make around the efficient market hypothesis, where they think of it as a static property rather than a description of a process which can be at various stages of completion and that you can check the state of.
"""

Likes: 12 | Retweets: 1
πŸ”— John David Pressman 2025-03-07 20:35 UTC

Naturally then the solution to this is to have a hierarchy of rewards at multiple scales where verifiable rewards are at the top and you use them to train a policy that creates ad-hoc verifiers at smaller scales and the verifiable reward head filters out Goodhart-y policies. x.com/jd_pressman/st…

Likes: 14 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:39 UTC

Instead of just doing R1 where you have one scale of reward which is verifiable, you create sub-hierarchies of in-context rewards which use the verifiable rewards as their grounding. This lets you ground subjective judgment by generalizing from ad-hoc symbolic questions.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:44 UTC

@Algon_33 Well you need to break down COVID deaths into reasonable correlates and proxies of COVID deaths recursively. Then you plan with a policy that has learned reasonable heuristics from a bunch of other tasks rather than just COVID. See this about "police AI":

x.com/jd_pressman/st…

Likes: 8 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:48 UTC

@1_dream_stan Yes but that's because Opus cannot actually intervene/does not control its own training. "RL works in the limit if you don't get stuck in a local minimum" is an unsurprising result, but also not really the important one for our purpose here.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:50 UTC

@Algon_33 Now of course, during the planning stage you are presumably going to be searching for plans which argmax() that narrow goal you're searching for, or at least there's a strong incentive to do so which mere task diversity might not fix. This is where things like the Gillen quantilizer come in, which lets you estimate how much optimization your value representation supports so you can take care not to exceed that.

https://t.co/tcXzZ0sxWr

Likes: 6 | Retweets: 0
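
For readers unfamiliar with quantilization, here is a minimal generic sketch of the idea being referenced (not Gillen's specific estimator): rather than argmaxing an imperfect learned value model over candidate plans, rank them and sample from the top q fraction, with q chosen to stay within the amount of optimization the value representation is believed to support.

```python
import random

def quantilize(plans, proxy_score, q=0.1, rng=random):
    """Pick a plan by quantilization rather than argmax.

    Rank candidates by the (imperfect) learned value model and sample
    uniformly from the top q fraction. Smaller q means more optimization
    pressure on the proxy, so q should reflect how much the value
    representation can be trusted."""
    ranked = sorted(plans, key=proxy_score, reverse=True)
    top_k = ranked[: max(1, int(len(ranked) * q))]
    return rng.choice(top_k)

# Toy usage: plan length stands in for a learned value model's score.
plans = [f"plan-{i}" * (i + 1) for i in range(20)]
print(quantilize(plans, proxy_score=len, q=0.25))
```
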
πŸ”— John David Pressman 2025-03-07 20:52 UTC

@Algon_33 See also this suggestion for a concrete implementation using PRIME.
x.com/jd_pressman/st…

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:56 UTC

@Algon_33 Now, if you have local tests which are trained on a small long term reward you will in fact eventually reach the Goodhart loss regime; this mostly just controls/slows the learning rate so the action taking part of your policy lags the long term planning part and can intervene.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:57 UTC

@Algon_33 You have a planner which is seeking consequentialist long term reward bounded by KL loss on a thing that wants local actions to be valuable across many different policies/long term objectives and will therefore have a more Kantian/Virtue Ethics morality shape.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 20:59 UTC

@Algon_33 However to stop the local action taker from becoming rigid it's graded by a thing that has a small bias towards long term reward. This emulates the thing humans do where values can become sacred/locally insistent but relent in the face of overwhelming adversity/non-adaption.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:04 UTC

@Algon_33 This gets you a thing which if it's trained to care early on will continue caring about things while still becoming more efficient negotiating local concern and long term concern to find the value-preserving updates by behaving in the way it wants to update towards being.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:06 UTC

@Algon_33 To get concrete you have a thing which is mentally flexible in its long term planning but chooses plans by bounding per-token reward against the KL loss of a thing which understands it is generally a bad idea to kill puppies and will downweight plans involving puppy murder.

Likes: 5 | Retweets: 0
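
A toy sketch of the plan-scoring rule sketched in this thread, under assumed names: long-term reward minus a KL-style penalty between the planner's per-token log-probs and those of the low-context local value model, so plans the value model strongly objects to (the puppy-murder case) get heavily downweighted. This illustrates the stated idea; it is not code from the project.

```python
def plan_score(token_logps_planner, token_logps_values, long_term_reward, beta=1.0):
    """Score a plan as long-term reward minus a per-token KL-style penalty.

    The two lists hold log-probabilities of the same plan tokens under the
    consequentialist planner and under the low-context local value model.
    Plans the value model finds abhorrent get very low log-probs there,
    blowing up the penalty regardless of the promised long-term reward."""
    kl_estimate = sum(
        lp_plan - lp_val
        for lp_plan, lp_val in zip(token_logps_planner, token_logps_values)
    ) / len(token_logps_planner)
    return long_term_reward - beta * kl_estimate

# Toy numbers: a benign plan vs. one the value model strongly dislikes.
print(plan_score([-1.0, -1.2], [-1.1, -1.3], long_term_reward=5.0))  # small penalty, high score
print(plan_score([-1.0, -1.2], [-9.0, -8.5], long_term_reward=6.0))  # large penalty, low score
```
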
πŸ”— John David Pressman 2025-03-07 21:08 UTC

@Algon_33 However if it were to suddenly be teleported into a world where killing puppies is absolutely vital to success such that this whole "not killing puppies" thing is on the whole not worth it, it can grieve that and slowly adapt to the puppy murder reward landscape.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:09 UTC

@Algon_33 This works because while the local action model objects to killing puppies it is still influenced over time by long term reward, and if there don't exist any useful alternative plans to favor instead of the puppy murder then those plans get selected and updated on.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:11 UTC

@Algon_33 Note that useful plans might involve things like "do nothing" or "kill yourself" if your local action model (i.e. low-context and therefore global *value* model) feels particularly strongly about something. In fact it is precisely because the local action part of the policy has substantial slack compared to the consequentialist planner that we can imagine a useful architecture which exhibits suicidal behavior.

Likes: 6 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:12 UTC

@Algon_33 Does that all make sense to you?

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:41 UTC

@Algon_33 The highlighted snippet is specifically that...okay so the concept of an in-context verifier or unit test is important here. Let's say I'm playing a game, and I take an action to, I don't know, move the player avatar to the top right of the screen.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:41 UTC

@Algon_33 So I make a sequence of moves which should put the character at the top right of the screen (technically I have a tighter feedback loop than this and watch it move to the top right of the screen). I have an expectation like "it moves this direction" and then I look and it does.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:42 UTC

@Algon_33 We can imagine me having a kind of program that's synthesized in my head to check this expectation, because the act of looking at the screen and checking what I expect to happen is happening is also a kind of action.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:43 UTC

@Algon_33 This means I can get process rewards which are objective by using my super complex neural network to write out this program that doesn't generalize but is created on the fly by a thing that does generalize. This gets you generalizable grounded (i.e. symbolic) rewards.

Likes: 5 | Retweets: 0
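
An illustrative toy version of the "program synthesized in my head" example from the preceding tweets: a narrow, throwaway verifier written for one specific expectation (the avatar ends up near the top right of the screen), whose pass/fail result serves as a grounded local reward. The `GameState` class and coordinates are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class GameState:
    x: float
    y: float
    width: float = 100.0
    height: float = 100.0

def make_top_right_verifier(margin: float = 0.2):
    """An ad-hoc check written on the fly for one specific expectation.

    The check itself doesn't generalize; the network that writes such
    checks does, which is what makes the reward both grounded and cheap."""
    def verify(state: GameState) -> bool:
        return state.x > (1 - margin) * state.width and state.y < margin * state.height
    return verify

verify = make_top_right_verifier()
print(verify(GameState(x=95.0, y=5.0)))   # True  -> expectation met, grounded local reward
print(verify(GameState(x=50.0, y=50.0)))  # False -> expectation violated
```
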
πŸ”— John David Pressman 2025-03-07 21:45 UTC

@Algon_33 The problem with this is: Why not just always make verifiers that return true? Like, if I grade the local actions with other actions, then where do I find the ground/bottom? This is where the verifiable rewards (which humans have as sensory hardware, you're grounded on e.g. food)

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:45 UTC

@Algon_33 come in. You ground these in-context rewards you generate by checking that the sequence of actions and verifiers results in a verifiable grounded reward in the outer feedback loop at some point. Then you can go back and do credit assignment with normal sequence learning RL.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:46 UTC

@Algon_33 That is multi-level optimization though. Because you can have these dense local rewards which are aligned by a long term reward that is sparser.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:48 UTC

@Algon_33 If you separate the executive actions and the verifier actions into two types, you can train the verifier actions with a long term reward and train the executive actions purely on local correctness according to syntactic validity, in-context verifiers, and subjective questions.

Likes: 4 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:50 UTC

@Algon_33 "Wait but where do subjective questions come from?"
Aha, here we come to one of the most important insights. If you have many symbolic questions you train on with grounded rewards, what happens if you put in a syntactically valid question for which no symbolic ground is known?

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:50 UTC

@Algon_33 Oh I see what you mean, I think I would have to read the post again to see what I was doing there.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:52 UTC

@Algon_33 Obviously the model will "hallucinate" (infer) an answer. "Subjective" questions are actually *generalizations* from questions where the answers are known or verifiable in a particular context.

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:53 UTC

@Algon_33 Yes tensions between two intuitions are one of the core generators of good philosophy/insight and you should in fact feel good when you find them because it means you've detected epistemic gold.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:54 UTC

@Algon_33 I would also point out that being able to generate something that "sounds like it could work" means you've found most of the solution bits which should also excite you even if you haven't found the way to arrange them into a precisely correct answer yet.
x.com/Algon_33/statu…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:54 UTC

@Algon_33 One reason why I think I make progress while other people despair is that they look at what is objectively progress and go "oh no this doesn't help me at all" because they haven't internalized that halving a million question hypothesis space still leaves you with 500k.

Likes: 5 | Retweets: 1
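
A toy calculation of the point above, measuring progress in bits rather than in remaining hypotheses: halving a million-hypothesis space still leaves 500,000 candidates, but it is one of roughly twenty bits needed to pin down the answer.

```python
import math

hypotheses = 1_000_000
total_bits = math.log2(hypotheses)  # ~19.93 bits needed to single out one hypothesis
remaining = hypotheses // 2         # 500,000 candidates still left after one halving...
bits_gained = 1                     # ...but that is 1 bit of the ~20 required
print(f"{bits_gained / total_bits:.1%} of the search already done")  # ~5.0%
```
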
πŸ”— John David Pressman 2025-03-07 21:56 UTC

@Algon_33 If you can only feel increasing correctness when you're in the range of e.g. eight remaining hypotheses you're only going to notice progress at the very end of your search, which is to say you can't do the search at all. You have to be extremely sensitive to hypothesis depth.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:57 UTC

@Algon_33 Which is why I can sound "crazy" and be very confident I'll find the solution when there are still thousands and thousands of hypotheses outstanding and I don't know which is right yet. Because I can tell loss is going down and things are narrowing fast enough to locate a solution.

Likes: 5 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 21:58 UTC

@Algon_33 At this stage I think I might be in the hundreds range now, but you would never get here unless you could feel the millions, hundreds of thousands, and thousands stage.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:05 UTC

@Algon_33 Fair enough!

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:07 UTC

@Algon_33 I can tell this isn't completely correct yet because it doesn't account for how you stop the verifier actions from having side effects that implement the Goodharted policy directly, since they are in fact actions. You need to make behaving the dominant strategy for both types.

Likes: 0 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:08 UTC

@Algon_33 It feels like the current explanation is a little more coherent, a little less handwavey, a little more pinned down in the constraint solver than the previous explanation. I know it's grounded because I can go through the game tree and see my alternative options are worse.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:09 UTC

@ASMRGPT No, the in-context verifier part is important.

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:11 UTC

@Algon_33 Like, I've run through the argument-counterargument loop so many times now that I can speedrun the flaw-finding and know my other options on any particular step are worse.
greaterwrong.com/posts/nvP28s5o…

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:11 UTC

@ASMRGPT Just wrote one:

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:13 UTC

@Algon_33 If someone were to perturb the idea/plan in some direction I would probably be very quickly able to go "nope won't work because X, Y, Z" or "that's an inferior solution because foo bar zed". This seems to be obvious to me but not to most readers, so they don't notice it's good.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:16 UTC

@Algon_33 If you have a good critic and the critic is having trouble finding flaws/frequent flaws are being addressed then your solution is getting better. The number of degrees of freedom is going down.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:21 UTC

@Algon_33 No no the divergence between models is genuinely concerning. But at the risk of arrogance I think the divergence between my model and others is some mixture of value difference leading to different emphasis/focus and me thinking longer about the right things more consistently.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:23 UTC

@Algon_33 Other people get stuck on trying to make a thing work instead of accepting it can't work and abandoning it, or they don't have enough experience with deep nets to have a sense of how optimization goes in practice, or make simplifying assumptions at odds with deep learning.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:25 UTC

@Algon_33 Or they just can't do the thing where you take something you take for granted and try to break down mechanistically how it "has to work" based on what you know about how things can work. Which is how you get to "iterated tuning to learn OOD" over "deep nets don't generalize".

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:26 UTC

@Algon_33 They don't set up problems for themselves like "let's assume that I cannot do this thing which seems intuitive to me but that empiricism says I don't know how to do and have no leads on how to do it, what would a solution that fastidiously avoids that thing look like?"

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:28 UTC

@Algon_33 If you do that habitually then if you can get to things that sound like they "might work" or "hm sorta" or "that's on the right track" then every time an *actually unexpected advance* opens up that widens your action space you're suddenly maximally ready to take advantage of it.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:29 UTC

@Algon_33 Because you've already explored the manifold of things which fastidiously avoid needing to have that thing, so you know exactly what the thing you were missing could do and where it would fit into an overall plan and what its significance is even if others don't yet.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-07 22:33 UTC

@Algon_33 You also habitually break up abstractions into more useful parts because insisting on "no X" is implicitly asking you to derive the mechanism behind X if one exists and you just haven't found it yet. Often you do a long "no X" chain of thought and realize you have implemented X.

Likes: 2 | Retweets: 0
πŸ”— John David Pressman 2025-03-08 00:00 UTC

@kromem2dot0 Depends on how much time you have to be right I guess, but I generally speaking optimize for eventually.

Likes: 3 | Retweets: 0
πŸ”— John David Pressman 2025-03-08 21:46 UTC

@nearcyan Kinda yeah tbh. This is clearly a lull period before the next wave kicks off.

Likes: 4 | Retweets: 0

Want your own Twitter archive? Modify this script.

Twitter Archive by John David Pressman is marked with CC0 1.0