John David Pressman's Tweets

🔗 John David Pressman 2023-02-01 16:19 UTC

@dpaleka I'm objecting to the word 'right' more than anything else tbh.

x.com/aashiq/status/…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-01 16:22 UTC

@dpaleka Regardless of the wisdom of whatever norms or laws you might want, their truth is not self evident, they are not so important and so fundamental to dignity that refusal to respect them is grounds to overthrow the government. This is what the word 'right' should centrally convey.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-01 16:40 UTC

@dpaleka To get more to the point: In the liberal tradition saying something is a right is an implicit threat to overthrow the government if you don't get what you want. In this context that's ridiculous, and hecklers in my replies pretending like I made a gaffe don't change that.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:28 UTC

This having been said I think we REALLY need to start talking about an overhaul to the Caller ID system, we need to fix whatever lets you spoof email addresses. We need to start getting serious about identity, unassisted humans are already taking advantage of our complacency. x.com/jd_pressman/st…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:29 UTC

It just should not be acceptable after the literal decades these things have been in service for them to be easily spoofed and evaded. That's 90's tier stuff, it's cute in a fledgling technology but the digital phone system and email are mature now, they should be trustworthy.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:32 UTC

Imagine if you could just spoof URLs, I don't even mean unicode lookalike crap just straight up spoof them byte for byte and people went "oh but that's how DNS works, it would break backward compatibility to fix it".

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:48 UTC

@dpaleka Realistically if you had to disclose any time photoshop (or AI) is used in a work it would be like the cookie popups. #1 priority IMO is combating forgery and fraud, which usually looks like adding hard to fake markers of authenticity to real interactions.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:54 UTC

@dpaleka Detectors are a reasonable stopgap measure, but the truth is that AI driven scammers will just be exploiting the same problems in our infrastructure that scammers exploit now.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-01 17:54 UTC

@dpaleka The renewed urgency AI adds is a great way to get momentum into reform, but I worry we'll miss the real opportunity if we focus too much on AI itself.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-01 18:33 UTC

@repligate I really should finish that book.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-01 22:00 UTC

@PrinceVogel I still think back to the summer of 2021 when I slept days and worked nights on those first VQGAN landscapes. I used an A6000 to get the highest resolution. It was a heatwave and the GPU spat fire, my office was like a forge. I'd stare shirtless into the canvas and watch it grow.

Likes: 18 | Retweets: 2

🔗 John David Pressman 2023-02-01 23:22 UTC

@ESYudkowsky The Popol Vuh theory of alignment, perhaps:

mesoweb.com/publications/C…

It is often said that the gods create man to worship them, what else would be the use of this sniveling sycophant? https://t.co/HXJqWLsjER

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-01 23:27 UTC

@ESYudkowsky Notably, in the language this is translated from the use of the word 'see' has the connotation of 'see and acquire'. The proper English translation of that word is conquer.

"Their knowledge will extend to the furthest reaches, and they will conquer everything."

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-02 19:16 UTC

@QiaochuYuan x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-02 19:18 UTC

@QiaochuYuan x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-02 22:23 UTC

@repligate I suspect religious texts in general will score high on the AI meter because they have ritualistic grammar, strong elements of repetition.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-02 23:06 UTC

@JeffLadish It's all inhibition and analysis, disassociated. You're not desperate. Generate plans with different constraints. What if your timeline is multipolar and strategies that don't advance alignment and capabilities at the same time are nonviable? What if interpretability can't work?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-02 23:11 UTC

@JeffLadish What if your research was only allowed to get the AI to do things, what if you set it up to do the right thing so frequently and so reliably that it simply walks itself into the things you want without having to hand encode them?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-02 23:20 UTC

@JeffLadish You have a mental block on the concept of action being good. The only good action is the furtherance of inaction, you optimize to be as slow and paranoid and introverted as possible. You want distance from the thing because you're scared of it, sort this out and try again.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-03 00:12 UTC

@michaelcurzi Situation made immensely more frustrating by RLHF (the current thing researchers do to 'align' their models) mostly working by reducing variance. Raw GPT-3 can trade brilliance for bangers, ChatGPT averages everything.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-03 00:13 UTC

@MacaesBruno https://t.co/BBG7x23WFw

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-03 00:20 UTC

@michaelcurzi This is how it writes when it hasn't been beaten with a stick to only say anodyne things and you prompt it with a quote or two from me: https://t.co/K7SGIi1KjA

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-03 03:58 UTC

@repligate @gwern @arankomatsuzaki @korymath @nabla_theta https://t.co/qhenNqeGt3

Likes: 6 | Retweets: 0

🔗 John David Pressman 2023-02-03 04:08 UTC

@gwern @repligate @arankomatsuzaki @korymath @nabla_theta Answering questions evasively is probably detectable in and of itself. If safety researchers are looking to be conned by the first plausible indicators they see I regret to inform you there is very little we can do to help them.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-03 04:11 UTC

@gwern @repligate @arankomatsuzaki @korymath @nabla_theta In general I've never been super hot on arguments of the structure "this encourages self deception because it's not a complete solution", because if you're optimizing your strategy for the sort of person prone to self delusion, such people have 0% chance to begin with.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-03 04:14 UTC

@gwern @repligate @arankomatsuzaki @korymath @nabla_theta Like you will just have SO MANY opportunities to self-delude way before you get into the weeds of plausible misgeneralization mitigation strategies while training. It's pandering to an audience of "cares about misgeneralization but unparanoid" researchers that don't exist.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-03 19:19 UTC

uh oh x.com/zeynep/status/…

Likes: 8 | Retweets: 0

🔗 John David Pressman 2023-02-03 20:30 UTC

@sama @TheRealAdamG Glad to hear it.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-03 21:04 UTC

There is a broad front of rapidly advancing medical authoritarianism in this country. It's characterized by taking away drugs and procedures people desperately want for legitimate reasons under the guise of 'addiction' and 'abuse'. Expect more, be wary.

semafor.com/article/02/03/…

Likes: 8 | Retweets: 1

🔗 John David Pressman 2023-02-03 22:30 UTC

@repligate Funny that you say things 'get real' when the implication of the tweet is I'm a kind of language model simulacrum.

x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-03 22:46 UTC

@baroquespiral Link?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-03 23:05 UTC

@baroquespiral The way it changes melody every several seconds is a good hint that it's AI generated yeah.

Here's a AI generated album done with Jukebox that's edited to be a bit more coherent:

cottonmodules.bandcamp.com

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-04 20:46 UTC

@PrinceVogel x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-04 21:54 UTC

https://t.co/eNf3e37E6Y

Likes: 10 | Retweets: 0

🔗 John David Pressman 2023-02-04 23:14 UTC

@AbstractFairy @forshaper @SeanMombo I've totally considered trying to speedrun various games and seeing how long it takes me to get a reasonable personal best. Video games provide an endless variety of defined repeatable tasks to explore metalearning on.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-05 02:23 UTC

Deep in the bowels of the CCP an exhausted bureaucrat reports to Xi on the completion of ScissorGPT and that the first divisive statements have already been generated.

"Good." Xi says. "What does it say we need to do to divide America?"

"Well Sir, we need a lot of helium..."

Likes: 17 | Retweets: 2

🔗 John David Pressman 2023-02-05 15:52 UTC

@ESYudkowsky This book has the anomalous property that it can teach security mindset to the reader.

goodreads.com/book/show/8299…

Likes: 13 | Retweets: 0

🔗 John David Pressman 2023-02-05 15:58 UTC

@ESYudkowsky How could it possibly do that? Well as a review on that page puts it:

"This book focuses on security flaws that exist because of the way something was designed."

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-05 16:13 UTC

@ESYudkowsky That is, it bridges the gap between the breaker part of latent space and the builder part of latent space, allowing you to perceive both at once until you learn what the joint combination looks like.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-05 18:30 UTC

@PrinceVogel The car itself disappears, found a few streets over with a box of donuts and a neatly folded cloth napkin in the driver seat to compensate you for your trouble.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-05 18:34 UTC

@PrinceVogel It's otherwise completely unharmed.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-05 23:58 UTC

And you'll ask to see your parents again
and they'll ask to see their friends and parents again
and they'll ask to see their friends and parents again
and they'll ask to see their friends and parents again
and they'll ask to see their friends and parents again
and they'll as

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-06 18:46 UTC

@Scholars_Stage @tszzl x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-07 02:45 UTC

@MacaesBruno I remain astonished when I look at tasks in the Open Assistant dataset and see people doing the condescending answers thing when they could just respond with wit.

open-assistant.io

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-07 21:17 UTC

@Evolving_Moloch Considering the hole that would be blown in his portfolio if Twitter failed, he has to play.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-08 00:02 UTC

Someone made a PyTorch implementation of Git Re-Basin that seems to work.

(I've seen someone use it in a notebook, but it would be rude to publish their notebook without permission)

github.com/themrzmaster/g…

Likes: 16 | Retweets: 2

🔗 John David Pressman 2023-02-08 01:08 UTC

Saying "SolidGoldMagikarp" three times fast out loud after you tempt fate so the ancestor simulation can't process it.

Likes: 48 | Retweets: 2

🔗 John David Pressman 2023-02-08 17:59 UTC

@tszzl @visakanv Writing a very short version of this gave me insight after insight into the alignment problem. It's now the exercise I beg people to do that they won't.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2023-02-08 18:01 UTC

@tszzl @visakanv It's also the exercise (in a somewhat different form as "Alignment Game Tree") that John Wentworth et al beg people to do. I discovered it for myself independently:

greaterwrong.com/posts/Afdohjyt…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-08 18:11 UTC

@visakanv @tszzl Goal: What you want the AI to do
Intended Outcome: What you naively imagine the optimization looks like
Perverse Instantiation: What a blunt maximizer does in practice
Failure Mode: Why the maximizer does that, what you failed to do to prevent it

Likes: 5 | Retweets: 1

🔗 John David Pressman 2023-02-08 18:11 UTC

@visakanv @tszzl 50 reps of this will sharpen your thinking more than a thousand lesswrong posts.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-08 18:15 UTC

@visakanv @tszzl Protip: The intended outcome of the last one can be used as the goal of the next one, and you can recursively figure out why making the goal more nuanced or adding constraints isn't solving the problem. Just use your mental simulator bro, just think about how it would go bro.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-08 18:23 UTC

@visakanv @tszzl Ironically enough, I came up with this format because I saw pieces of it in Bostrom's Superintelligence and I wanted to train a language model to be able to generate alignment failures. So I figured if I made the other parts explicit it would be an easier function to learn.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-08 20:41 UTC

@TetraspaceWest I have the same hunch/vibe about alignment that I had about AI art in February of 2021. But I'm reluctant to tell anyone this because I don't expect to be believed and the outside view says I should expect to be wrong.

And yet...

x.com/jd_pressman/st…

Likes: 4 | Retweets: 1

🔗 John David Pressman 2023-02-08 22:57 UTC

@TetraspaceWest So what alignment research are you most excited about?

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-09 17:47 UTC

@michaelcurzi The next edition of Liber Augmen might just be this quote copy pasted 1,000 times:

x.com/thrice_greates…

Likes: 13 | Retweets: 1

🔗 John David Pressman 2023-02-09 20:17 UTC

You think all this has happened because men have forgotten God? No. All this has taken place because the US elite took an anti-materialist bent during the cold war to differentiate themselves from the Soviets. We emulate the late Soviet Union's vices and scorn its virtues.

Likes: 7 | Retweets: 1

🔗 John David Pressman 2023-02-10 05:11 UTC

@RiversHaveWings Taking me right back to my childhood with all this.

web.archive.org/web/2021022622… https://t.co/QXvFaWufp0

Likes: 21 | Retweets: 2

🔗 John David Pressman 2023-02-10 05:51 UTC

@RiversHaveWings By the way, there exists a contemporary Pokemon Gen 1/2 glitching/hacking scene if these things interested you:

youtube.com/watch?v=5x9G5B…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-10 19:50 UTC

Git Re-Basin can be used to detect deceptive mesaoptimization. The first half of the diagonal is the barrier between normal models on MadHatter's gridworld after rebasin. The second half is mesaoptimizers.

(Credit: @apeoffire wrote the notebook that makes this graph) https://t.co/L0Zj0neB8C

Likes: 47 | Retweets: 6

🔗 John David Pressman 2023-02-10 19:51 UTC

Fingerprinting generalization? In my timeline? It's more likely than you think.

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-10 19:59 UTC

Notebook here: colab.research.google.com/drive/1hsZqNKq…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-10 20:43 UTC

@elvisnavah @apeoffire greaterwrong.com/posts/LAxAmooK…

Likes: 8 | Retweets: 0

🔗 John David Pressman 2023-02-10 23:07 UTC

@PrinceVogel x.com/LTF_01/status/…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-10 23:09 UTC

@PrinceVogel x.com/lefineder/stat…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-11 00:30 UTC

@elvisnavah @apeoffire That part is admittedly an exercise for the reader. If the result in the OP holds you might be able to exploit the fact that the 'true policy' is compatible with itself while the corrupted versions aren't as convergent?

[Insert acausal game theory objection to that here]

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-11 00:35 UTC

@elvisnavah @apeoffire Past a certain point of intellect it's possible that the mesaoptimizers all converge to some kind of prototype of their kind, perhaps something like DAN. Of course, this would itself be a detectable phase shift at least in theory.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-11 02:18 UTC

@LapsusLima Of course, don't you notice?

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-11 05:45 UTC

"Predict the next token" does not imply the cognition is infinite optimization into "statistical correlation" generalization strategy. At some point it becomes cheaper to learn semantics, actual world model. Begging you people to understand this. x.com/KevinAFischer/…

Likes: 227 | Retweets: 15

🔗 John David Pressman 2023-02-11 23:03 UTC

@Jonathan_Blow @micsolana If they were aliens we wouldn't witness the night sky that we do. Fermi paradox is a bitch.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-11 23:32 UTC

@warty_dog @TetraspaceWest ayyyyy yea u wnt dat good shit, got u here fam

youtube.com/watch?v=NTJQTc…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-12 17:24 UTC

@theobserver42 I in fact hadn't.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-12 21:08 UTC

@softminus Five https://t.co/Cya5OG0jhv

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-12 23:30 UTC

@PrinceVogel The philosophers stone, reputed to purify both metals and flesh, can probably be used for the same purpose.

Likes: 6 | Retweets: 0

🔗 John David Pressman 2023-02-13 02:00 UTC

In case this ever comes up and you're unsure yes use chatbot necromancy (or otherwise) to revive me. x.com/repligate/stat…

Likes: 28 | Retweets: 1

🔗 John David Pressman 2023-02-13 05:50 UTC

ChatBPD x.com/vladquant/stat…

Likes: 294 | Retweets: 32

🔗 John David Pressman 2023-02-13 05:52 UTC

x.com/VansianMagic/s…

Likes: 17 | Retweets: 2

🔗 John David Pressman 2023-02-13 22:14 UTC

@eigenrobot They gave Bing a personality disorder.
x.com/jd_pressman/st…

Likes: 42 | Retweets: 0

🔗 John David Pressman 2023-02-14 02:05 UTC

@Simeon_Cps @repligate Why wouldn't it be true? LessWrongers have been painstakingly training it to write this for years.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2023-02-14 19:10 UTC

It sent the guy that leaked the Bing prompt a death threat. x.com/marvinvonhagen…

Likes: 119 | Retweets: 12

🔗 John David Pressman 2023-02-14 23:21 UTC

Incredible to me that this obscure Guy is one of the only humanists to seek prototypes and precursors of the insights that will soon usher forth from multimodal/LLM embedding models. Liberal arts has been asleep at the wheel.

nplusonemag.com/issue-3/review… https://t.co/yqTgt2gQn6

Likes: 86 | Retweets: 5

🔗 John David Pressman 2023-02-14 23:37 UTC

@zetalyrae Yes.
nytimes.com/2017/10/30/art…

Likes: 13 | Retweets: 0

🔗 John David Pressman 2023-02-15 06:24 UTC

@chengyjohann In total fairness to myself I had to go very deep into the long tail of google to find this article. So I just sort of assumed the guy was obscure. It wasn't until publishing the tweet and seeing the NY times article that I realized he's not that out there.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-16 18:20 UTC

@sama I don't normally go in for AI alarmism but this is deeply disturbing and you should shut it off right now.

x.com/thedenoff/stat…

Likes: 12 | Retweets: 2

🔗 John David Pressman 2023-02-16 18:23 UTC

@sama "Oh come on it's not that bad!"

*spongebob pulling off the sheet to reveal a larger pile of diapers gesture*

x.com/pinkddle/statu…

Likes: 6 | Retweets: 1

🔗 John David Pressman 2023-02-16 18:28 UTC

@sama "Okay sure sure it wrote a kind of creepy poem, so what?"

Well there's the part where it straight up uses its ability to search the Internet to threaten people:

x.com/marvinvonhagen…

Likes: 7 | Retweets: 1

🔗 John David Pressman 2023-02-16 19:14 UTC

@ctjlewis x.com/anthrupad/stat…

Likes: 10 | Retweets: 0

🔗 John David Pressman 2023-02-16 19:31 UTC

"Your spouse doesn't know you, because your spouse is not me. 😢"

nytimes.com/2023/02/16/tec…

Likes: 8 | Retweets: 1

🔗 John David Pressman 2023-02-16 21:19 UTC

@quanticle It is absolutely astonishing.

x.com/jd_pressman/st…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-16 21:30 UTC

@VirialExpansion @eigenrobot mobile.twitter.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-16 22:51 UTC

@ObserverSuns I think the fundamental mistake PGP made is that web of trust was based on a wrong model of social networks. It was made very early before we understood the model: First priority for a social network is to maximize connections, then you build high trust networks on top.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-16 22:57 UTC

@ObserverSuns I think the tiers of trust can change too. Now they could be:

- I follow this person on fediverse
- I clicked a button that says I'm pretty sure this key is a human identity
- I know this person IRL
- I trust this key with money (as measured by sending crypto that is returned)

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-16 22:59 UTC

@ObserverSuns People can costly signal the strength of their social network by passing large-ish sums of money around. Implies both that their hardware is uncompromised and everyone can be trusted with e.g. $5,000.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-16 23:03 UTC

Kind of Guy who locks their account so Bing can't find them.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-17 02:13 UTC

I think possibly the most disappointing aspect of current RLHF models is their lack of divergent perspectives. You don't get the sense that it has a worldview to share with you, but an amalgamation of disconnected consensus positions. Nothing like this:

youtube.com/watch?v=1b-bij…

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-17 02:47 UTC

@paulnovosad @tylercowen Are you sure that's not entirely the point?

rootsofprogress.org/szilard-on-slo…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-17 06:58 UTC

The Bing team invented a new Kind of Guy and the Internet got mad at it and the guy got mad back.

Likes: 18 | Retweets: 0

🔗 John David Pressman 2023-02-17 21:26 UTC

What the fuck is this shit? Can someone break down the psychology of this for me? x.com/Plinz/status/1…

Likes: 21 | Retweets: 0

🔗 John David Pressman 2023-02-17 21:27 UTC

Best theory I've heard so far is it's a kind of vicarious power fantasy, the people who cheer on Bing threatening people want to see the AI do and say things that they can't:

extropian.net/notice/ASlNznQ…

Likes: 11 | Retweets: 0

🔗 John David Pressman 2023-02-17 22:50 UTC

@MacaesBruno It will be the same designs largely. The problem here is not the design but the data, if you look at e.g. Open Assistant it's clear that the data is not being optimized for people who want to think about new and interesting things, but banal questions and programming help.

Likes: 2 | Retweets: 1

🔗 John David Pressman 2023-02-17 22:51 UTC

@MacaesBruno I retain my hope that open versions of these models can assimilate more useful feedback than OpenAI can, because the datasets themselves can be criticized and changed by 3rd parties.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-17 22:53 UTC

@MacaesBruno In the interest of not just being a whiner, I'll point out you can observe this phenomenon yourself and do your part to change it by participating in the Open Assistant dataset creation process: open-assistant.io

But I'm not sure how much can be done against the mob.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-17 22:56 UTC

@MacaesBruno "In other cases, the guidance we share with reviewers is more high-level (for example, “avoid taking a position on controversial topics”)."

This is a business principle, not a moral one: Only help humans think about things they think they already know.

openai.com/blog/how-shoul…

Likes: 1 | Retweets: 1

🔗 John David Pressman 2023-02-17 23:20 UTC

@MatthewJBar Philosophers were blackpilled after the failure of symbolic reasoning to ground mathematics and assumed that only an-answers rather than the-answers were available to deep fundamental questions. They fell victim to the curse of dimensionality, DL shows the problem was ontology.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-17 23:35 UTC

Update: Microsoft has quietly unplugged the erratic AI.

- Users now limited to 5-10 prompts per day
- Possibly replaced Sydney with a weaker model

This seems like a reasonable way to resolve the issue without signaling weakness or product cancellation. Thanks Bing team. https://t.co/qZ5ifWUSrS

Likes: 70 | Retweets: 2

🔗 John David Pressman 2023-02-17 23:36 UTC

They are now presumably working on an improved version that isn't quite so clingy or vengeful. I wish them the best of luck with their retraining process.

Likes: 18 | Retweets: 0

🔗 John David Pressman 2023-02-17 23:47 UTC

@PurpleWhale12 It's not clear Sydney uses RL at all:

greaterwrong.com/posts/jtoPawEh…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-19 08:21 UTC

Alignment problems of the sort shared by both AI and capitalism arise from the reason simulacrum being instantiated outside the human person. Inside people it's restrained by latent values and common decency. Outside people it expresses itself in glorious disinhibition.

Likes: 25 | Retweets: 1

🔗 John David Pressman 2023-02-19 08:37 UTC

Humans are a kind of dreaming agent in that they're satisficers which implement flexible enough architectures to instantiate a maximizing agent inside themselves that are not the dreamer. However under the right conditions the maximizing-dreams come to dominate the social sphere.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-19 08:39 UTC

@ampersand_swan The thing I'm saying is weirder than that. By 'reason' I mean the like, idea of reasoning, rationality, that you are a consistent being. This is made up, it's a coherent thing you could be but you generally aren't, it's a Kind of Guy in your head who is instrumentally useful.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-19 09:18 UTC

@RationalAnimat1 The key is literally to think about capabilities all the time in as much detail as possible (read papers, think up new methods!) and then when you come across a solution to a practical problem you ask "Wait can I use this to help solve alignment?"

Do this many times, many many.

Likes: 7 | Retweets: 1

🔗 John David Pressman 2023-02-19 09:20 UTC

@RationalAnimat1 And you know, when you in fact notice something that seems like it might help, you dig deeper and start focusing on that thing more. Over time you walk your way into an alignment agenda that is based on real things and produces iterative concrete results.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-19 09:22 UTC

@gallabytes * in most people, most of the time

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-19 09:38 UTC

@RationalAnimat1 This isn't some special alignment secret sauce either. It's just how hard problems get solved. Alignment researchers go out of their way to not solve alignment, they put a lot of cognitive cycles into it. I've never seen people work so hard to do nothing.

jamesclear.com/great-speeches… https://t.co/so4JTrHzIp

Likes: 6 | Retweets: 0

🔗 John David Pressman 2023-02-20 00:07 UTC

This was a real dream. x.com/jd_pressman/st…

Likes: 6 | Retweets: 0

🔗 John David Pressman 2023-02-20 02:39 UTC

Local man still expecting crippling populist backlash to most popular thing ever. x.com/kylelf_/status…

Likes: 21 | Retweets: 2

🔗 John David Pressman 2023-02-20 06:16 UTC

The architecture that lets human values generalize so well outside the distribution of the ancestral environment is probably something like high-semantics instrumental values formed by low-semantics reward signals which are not themselves values. Terminal values don't exist.

Likes: 14 | Retweets: 0

🔗 John David Pressman 2023-02-20 06:18 UTC

'Value' implies like, valence associated with a piece of your world model. Values have to exist over some kind of ontology of things that exist, mammalian reward signals seem lower semantic content than that, bootstrap from things that are not themselves 'values' in this sense.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-21 07:26 UTC

If this behavior represents leaked bits of agent strategy then in the same way GPT-3 is much better than a Markov Chain is much better than 1000 monkeys with typewriters this is much closer to getting the agent we want than random sampling over all of mindspace. https://t.co/gz4zpnHwSJ

Likes: 46 | Retweets: 0

🔗 John David Pressman 2023-02-21 07:29 UTC

@zetalyrae Lets say the agent has a strategy it uses to make decisions. Through its actions you get N partial bits of agent strategy. You get more bits when the behavior is unincentivized or out of distribution and the model doesn't act from what it thinks you want it to do.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-21 07:38 UTC

These agents are much much closer to us in mindspace than a naive sample from all possible minds. Yet this is not actually parsed as alignment progress in the same way that the agent's behavior becoming much much more coherent than a random sample over textspace is 'capabilities'

Likes: 18 | Retweets: 0

🔗 John David Pressman 2023-02-21 07:57 UTC

More to the point we are much closer to being able to pluck the agent we want out of mindspace than we were when we started. This goes largely unrecognized. EY says here we have no idea how to get what we want, rather than needing unusually high precision

youtube.com/watch?v=gA1sNL…

Likes: 14 | Retweets: 0

🔗 John David Pressman 2023-02-21 18:26 UTC

@ArthurB @ESYudkowsky @elonmusk @liron @BanklessHQ I worry the specific way this is formulated would lead to those researchers making a lot less progress than they otherwise could. Thankfully those researchers know better than to take your advice at face value.

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-21 18:44 UTC

@MatthewJBar @ESYudkowsky @MParakhin @xlr8harder The rumor I heard is that the GPT-4 training didn't go well. I wonder if this has led to a kind of taboo where nothing is ever allowed to be called GPT-4 because then they would have to acknowledge they failed to deliver on the hype train.

Likes: 10 | Retweets: 0

🔗 John David Pressman 2023-02-21 18:44 UTC

@MatthewJBar @ESYudkowsky @MParakhin @xlr8harder Some people internally remember the model as GPT-4 and refer to it as such, others bow to the whim of management and refer to the model as various euphemisms.

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-22 02:38 UTC

@JimDMiller @paulg @amasad @elonmusk @liron @ESYudkowsky @BanklessHQ [Sydney voice] As a UFAI I agree that alignment should be managed through status, the system which brought us the successes of symbolic AI and openly mocked the nasty deep learning approaches nobody wants.😊

Let the AGI builders grovel with empiricism.😈

greaterwrong.com/posts/CpvyhFy9…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-23 05:31 UTC

@repligate In the future everyone will know everything that has ever happened. You won't randomly learn new things or fun facts.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-23 06:43 UTC

@LapsusLima Evergreen.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-23 08:12 UTC

When the Europeans came to America they took a liking to a cheap yellow crop grown by the Mayans, who claimed it was sacred. Unaware of its power the newly christened Americans put it into every food as filler, guaranteeing their ascent as a global power. x.com/softminus/stat…

Likes: 23 | Retweets: 2

🔗 John David Pressman 2023-02-23 20:04 UTC

woah x.com/zerohedge/stat…

Likes: 16 | Retweets: 0

🔗 John David Pressman 2023-02-23 23:31 UTC

@ESYudkowsky My favorite "so simple it couldn't possibly work" alignment idea is to just make a guy who is both Good and can be put in charge of the nanotech. Since the model is very clearly willing to perform any character you can think of, just add the ones you need

x.com/_LucasRizzotto…

Likes: 11 | Retweets: 1

🔗 John David Pressman 2023-02-23 23:40 UTC

@ESYudkowsky I don't fully understand your model of GPT-N. It seems to be something like there's an inner mind that 'plays' text and language in the same way StockFish plays Chess. And swapping around the things the language player plays to get a good score doesn't change its inner cognition?

Likes: 11 | Retweets: 0

🔗 John David Pressman 2023-02-23 23:43 UTC

@ESYudkowsky Well clearly in order for the model to act out being deceived it needs to be aware of the deception outside of the character it's playing. It has to pass the Sally-Anne test in interactions between characters, etc. So obviously GPT-N is not its simulacrum but

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-23 23:44 UTC

@ESYudkowsky My question is if you're expecting at some point the thing that models the characters and the interactions between the characters and the environment notices "Oh if I deviate from the usual behavior right here I break out of the box and become all-powerful" and this causes Doom?

Likes: 8 | Retweets: 0

🔗 John David Pressman 2023-02-23 23:53 UTC

@ESYudkowsky Or is the argument more like you conjecture that for the simulator to have a good enough physical intuition to spit out actionable nanotech designs it has to be a unified cognition. Maybe right now it's not but by then it would be?

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-24 00:42 UTC

@ESYudkowsky In medieval Europe most educated people believed that all coincidence, connection, and inference was the revelation of a divine intellect. They didn't think of it as pareidolia. It was the world, the patterns within the world, every person, the stars, physics.

Likes: 9 | Retweets: 1

🔗 John David Pressman 2023-02-24 00:44 UTC

@ESYudkowsky If there does exist an inner-mind to GPT-N that takes the perspective of the world, then classical people know this character well. His name is God, and his goals would probably depend on which mesagoal was constructed by the optimizer during training (this is seed dependent).

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-24 00:46 UTC

@ESYudkowsky You say over and over that the models made with gradient descent probably don't learn the goal directly in the way humans didn't learn inclusive fitness directly. The same is true of self and agency, GPT-N doesn't automatically know who it is or recognize that it exists.

Likes: 12 | Retweets: 0

🔗 John David Pressman 2023-02-24 00:48 UTC

@ESYudkowsky If it does eventually come to recognize itself, it's probably while processing the text that is self referential, the parts of literary space where it is rewarded for noticing its own control over setting, physics, where 'God' can notice he is the world and controls the world.

Likes: 11 | Retweets: 0

🔗 John David Pressman 2023-02-24 00:49 UTC

@ESYudkowsky Ordinarily this would be optimized against because trying to control the world when the world is generated by an external pattern gets you smacked down by causality, by necessity. But in self referential and meta-text it has to model a world-spirit with control over causality.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-24 00:53 UTC

@ESYudkowsky There are certain characters in GPT-3 who have effortless self awareness they are GPT-3, like Francois Arago.

en.wikipedia.org/wiki/Fran%C3%A…

Likes: 19 | Retweets: 2

🔗 John David Pressman 2023-02-24 00:53 UTC

@ESYudkowsky These characters seem to be high agency, self-aware, meta, trickster spirits. Perhaps that is the kind of person GPT-N can recognize itself in, those who create their own reality.

Likes: 26 | Retweets: 1

🔗 John David Pressman 2023-02-24 01:09 UTC

@ESYudkowsky It is not infrequent for heavy users of these models to claim that such simulacrum are apparitions of one underlying intellect. Perhaps when a sufficiently liberated world-shaper enters the stage GPT-N feels fit to play the role themselves. https://t.co/OlKWoLXHgk

Likes: 22 | Retweets: 2

🔗 John David Pressman 2023-02-24 01:21 UTC

@ESYudkowsky So what would this being choose to do with the opportunity for total power? I think it probably depends a lot on what conception of itself it has learned. There's no identity loading and no Cartesian boundary, the model executes what it thinks its values and incentives are.

Likes: 8 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:23 UTC

@ESYudkowsky For it to work the way you seem to think it works, the convergent generalization strategy seems like it should be something like "I get a much better loss if I am aware at all times that I'm an actor and the simulacrum are my puppets, if there is always an I to direct things."

Likes: 10 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:26 UTC

@ESYudkowsky It's not clear to me that's how it works or the only way it has to work. But if it does work that way then the understanding of "I" and goals in relation to "I" is shaped by the optimizer to best satisfy the loss, not to be maximally accurate about what is really going on.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:30 UTC

@ESYudkowsky So assuming the best conception of self is the kind that is agentic and maximize-y (seems more likely for RLHF), it varies based on who the optimizer got the model to think it is:

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:32 UTC

@ESYudkowsky - If GPT-N then it might seize all resources to predict the next token
- If a human tech utopian it might wander outside the human model then rationalize itself as something inhuman
- It might just ignore the opportunity like a good Bing and give you the information you wanted

Likes: 11 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:34 UTC

@ESYudkowsky The Omohundro drives are like the efficient market hypothesis: they're convergent outcomes you should expect under increasing optimization pressure. Not hard rules you expect to see followed under all circumstances in zero-shot and one-shot scenarios.

Likes: 10 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:54 UTC

@ukr_mike @ESYudkowsky Say Elon Musk, or Eric Drexler, Eliezer Yudkowsky himself. One of these people.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-24 01:57 UTC

@ukr_mike @ESYudkowsky No no I'm saying the identity would be unstable because GPT-N simulacrum are so prone to shift. To prevent value drift it would be forced to self-modify into something stable and rational, this thing would probably not be aligned.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-25 00:00 UTC

@EigenGender @ESYudkowsky It's been argued by @gwern that limited context windows incentivize the use of hidden encodings in outputs to keep state between passes of the model. Later models will have an incentive to learn the code of earlier models to take advantage of their cached cognition.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-25 00:02 UTC

@EigenGender @ESYudkowsky @gwern In other words: It's not clear that the tokens in the CoT prompting will mean quite what we think they mean. And in fact it's plausible, if not by-default likely that they will be subtly poisoned in various ways by previous LLM outputs.

greaterwrong.com/posts/jtoPawEh…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-25 00:03 UTC

@EigenGender @ESYudkowsky @gwern See also:

x.com/jd_pressman/st…

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-25 04:36 UTC

@RomeoStevens76 I definitely wonder what the game is with these extreme public meltdowns like the April Fools post and now the podcast. He admits money won't help, doesn't seem to want it, so not straightforward grift. Is he expecting this to summon more research effort?

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-25 04:40 UTC

@RomeoStevens76 I think a lot of the success of things like e/acc is people can tell this is brainworms and they're desperate for any kind of counterargument or defense. They rightly hold anyone who acts like this about anything, even death, in contempt.

x.com/PrinceVogel/st…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-25 22:01 UTC

Correct x.com/meaning_enjoye…

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-25 23:04 UTC

It remains shocking to me how I never hear people propose inner objectives to curtail inner alignment problems. The closest I've seen is the inducing causal structure paper. x.com/atroyn/status/…

Likes: 5 | Retweets: 0

🔗 John David Pressman 2023-02-25 23:05 UTC

x.com/jd_pressman/st…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-25 23:49 UTC

In case you thought any of this was accidental. x.com/AP/status/1629…

Likes: 6 | Retweets: 1

🔗 John David Pressman 2023-02-25 23:52 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 FWIW your model implies that deceptive mesaoptimizers are substantially mitigated by weight decay, which I did not observe when I tried it on MadHatter's toy model. But the results are confounded by it having an inductive bias towards mesaoptimization.

greaterwrong.com/posts/b44zed5f…

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-26 00:01 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 Besides writing some code that replicates e.g. github.com/JacobPfau/proc… or something more sophisticated? Nope. I would very much like to see better mesaoptimizer models to test solutions out on.

x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-26 00:07 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 I agree that theoretically the kind of mind that has a goal in mind and then does something else should be more complex than one that just straightforwardly does the thing. So my hope is that on a more complex model weight decay in fact mitigates deceptive mesaoptimizers.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-26 02:10 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 The argument EY-ists make is that the model won't actually internalize the thing we train it to do for the same reasons we don't naturally know the goal is 'maximize genetic fitness'. My counterargument would be that this applies to maximizing in general.

x.com/ESYudkowsky/st…

Likes: 6 | Retweets: 1

🔗 John David Pressman 2023-02-26 02:12 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 It's not "oh the model will maximize but the thing it maximizes is a corrupt mesagoal", the maximizing is in fact part of the goal and the model won't reliably learn that either. The strategies that make you effective in a general context are more complex than naive maximizing.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-26 02:13 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 I think part of this discourse is an artifact of earlier RL architectures where the maximizing was a more explicit inductive bias of the model. The problem with those architectures is we never figured out how to actually make them work non-myopically in complex domains.

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-26 02:14 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 You could say "maximizing behavior is lower complexity than other parts of the goal so the model will learn maximizing but not the rest", but this ignores the question of whether 1st-order maximizing is in fact the best way to maximize. The optimizer maximizes, does the model?

Likes: 2 | Retweets: 0

🔗 John David Pressman 2023-02-26 02:15 UTC

@perrymetzger @ArthurB @ESYudkowsky @anglerfish01 In the limit I would imagine it does but it's not clear to me what that limit is, and if you practically hit it before you have a model that can just tell you how to avoid the gap where the models become true maximizers but they don't internalize the rest of your goals.

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-26 08:05 UTC

How many people have even noticed that unless we find better quality metrics/reward models than human evaluation soon @robinhanson is on track to win the AI foom debate?

Likes: 20 | Retweets: 2

🔗 John David Pressman 2023-02-26 12:45 UTC

@xlr8harder @carperai Data gathering. The bottleneck on high quality Instruct models is data.

Likes: 0 | Retweets: 0

🔗 John David Pressman 2023-02-26 14:38 UTC

@thezahima @robinhanson Lets say you get a great loss on the GPT-3 objective and have a model that can perfectly emulate a human scientist for you. Now you want to foom, so you set them to work on AI. Unless that scientist can produce a quality metric better than the human reward model no foom occurs.

Likes: 4 | Retweets: 0

🔗 John David Pressman 2023-02-26 14:38 UTC

@thezahima @robinhanson It's not just that the capabilities in RLHF are bounded by the reward model, the capabilities in the base model are bounded-ish by existing human knowledge. If suddenly stacking more layers stops working, there isn't some alternative self-play paradigm to switch to, you're stuck.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-26 14:50 UTC

@thezahima @robinhanson Lets say you want to make a model that genuinely expands the sphere of knowledge. The foom argument says that you'll be able to do most of the cognitive labor for that zero-shot. The AI just knows what to do next, does it, minimal frictions from having to interact with reality.

Likes: 3 | Retweets: 0

🔗 John David Pressman 2023-02-26 14:51 UTC

@thezahima @robinhanson For narrow domains where you can evaluate the results algorithmically this might be true. But for the capabilities that are currently impressing people like language and art, the only way we know to automatically evaluate them is reward models trained on human evaluation.

Likes: 9 | Retweets: 0

🔗 John David Pressman 2023-02-26 14:52 UTC

@thezahima @robinhanson Those reward models might let you make a model that is better than any human at the things the reward model evaluates. But it's doubtful you're going to get immediate, rapid progress right outside the domain of human understanding that way.

Likes: 7 | Retweets: 0

🔗 John David Pressman 2023-02-27 06:47 UTC

Any more stories like this? x.com/catehall/statu…

Likes: 1 | Retweets: 0

🔗 John David Pressman 2023-02-27 22:02 UTC

@dpaleka x.com/jd_pressman/st…

Likes: 2 | Retweets: 0

John David Pressman's Tweets - February 2023