@MalwareTechBlog Why the assumption they don't have PTSD?
@Ted_Underwood At least we still have the text, RIP.
web.archive.org/web/2022082622…
You can set structural objectives for the inner layout of a neural network, enforcing shared causal structure with a legible model.
"Inducing Causal Structure for Interpretable Neural Networks"
proceedings.mlr.press/v162/geiger22a… https://t.co/Z4x0aLOVah
This implies you can also audit an existing neural network circuit for shared causal structure with an arbitrary model so long as you can reliably find the circuit associated with the behavior(s) you care about.
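A minimal sketch of what such an audit could look like, using an interchange intervention: run the network on a base input, overwrite one hidden unit with the value it takes on a source input, and check whether the output matches what the legible model predicts under the same swap. Everything here is a hypothetical toy (the `high_level` model, the hand-written two-neuron "network", and the names are illustrative assumptions, not the paper's code).

```python
from itertools import product

# Legible high-level causal model (assumed for illustration):
# S = a + b, then OUT = S + c.
def high_level(a, b, c, s_override=None):
    s = a + b if s_override is None else s_override
    return s + c

# Hand-written stand-in for a trained network's hidden layer.
# Hypothetical toy: neuron 0 carries 2 * (a + b), neuron 1 carries c.
def low_level_hidden(a, b, c):
    return [2 * (a + b), c]

def low_level_readout(hidden):
    return hidden[0] // 2 + hidden[1]

def interchange_accuracy(neuron, inputs):
    """Fraction of (base, source) pairs where patching `neuron` from the
    source run into the base run reproduces the high-level counterfactual
    in which S takes its source value."""
    hits = total = 0
    for base, source in product(inputs, repeat=2):
        hidden = low_level_hidden(*base)
        hidden[neuron] = low_level_hidden(*source)[neuron]
        low_out = low_level_readout(hidden)
        high_out = high_level(*base, s_override=source[0] + source[1])
        hits += low_out == high_out
        total += 1
    return hits / total

inputs = list(product(range(3), repeat=3))
score_neuron_0 = interchange_accuracy(0, inputs)  # candidate location for S
score_neuron_1 = interchange_accuracy(1, inputs)  # wrong location for S
```

In this toy, neuron 0 scores perfectly as the location of S and neuron 1 does not, which is the kind of signal the audit relies on. The hard part flagged above, reliably finding the candidate circuit in a real network, is assumed away here.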
@joerealboy @MrPrudentialist x.com/jd_pressman/st…
@ESYudkowsky CEV falls under the "not trainable" fill-in of the "Why Your Alignment Plan Doesn't Work" form letter. Though if you have ideas for how to formulate it as a training objective I'm all ears.
@ESYudkowsky Beyond that I suspect the cruxes on CEV will come down to how strong a version of the orthogonality thesis you believe. It's not clear to me that people have values:
- Aggregable across humans
- rationally consistent in a VNM-y way
- Cleanly separated from their intellect
@ESYudkowsky You yourself have previously warned against boosting the IQ of an em because it has a high likelihood of diverging from any human notion of value. This might be intrinsic to any such scheme, and would make a key step of the CEV plan incoherent.
@ESYudkowsky A Landian argument could be seriously made that the alignment problem grapples with the aesthetic-prior irrationality of the human mind, and that alignment (principal-agent) problems arise when we externalize the Rational into consistent utilities no person has ever embodied.
@ESYudkowsky This is usually the step where a romantic says that we should reject rationality, but that's ridiculous. The irrational is a brief interlude between genesis and formatting our light cone. If we find we have reason to reject it the problem is us, and we'll have to fix ourselves.
@ESYudkowsky Yes, my sincere belief is that having wishes that don't go in circles is *at best* a cultivated practice that requires deliberate focus over the course of years. It also requires an aesthetic preference for extreme consistency very few people have.
@ESYudkowsky Vervaeke argues something like shamans invent the foundations for modern humanity by finetuning their adversarial-anthropic prior into an animist prior, at their best the rationalists finetune their anthropic-animist priors into a fully materialist prior.
youtube.com/watch?v=54l8_e…
@ESYudkowsky People with materialist priors become bad at adversarial thinking because understanding the natural world largely doesn't require it, which is how the logical conclusion of Moravec's paradox can exist in Elmer Fudd AI that is fooled by simple perturbations in the input.
@ESYudkowsky This is why during the latter half of the 20th century we regress to animist-priors in the postmodernist vein, it's more individually useful to use a frame that excels at adversarial games when society is on a decaying trajectory, accelerating the decline.
@ESYudkowsky Materialist-prior agents tend to have converge-y goals ("I want to live in extreme wealth!"), animist-prior agents tend to have GAN-y seesaw goals which do not converge ("I want my team to win the Superbowl!"), GANs are infamous for the locality of their values and...
@ESYudkowsky ...inability for their goals to have meaning outside of the adversary. If the adversary were to perish they would recreate it so the saga could continue.
@ESYudkowsky If when the adversary is there you want it gone and when it's gone you want it back you're not leaking value but you are in a loop.
@ESYudkowsky tbh thinking about this made me realize I wasn't distinguishing the "leaks value" dutch book failure case from the "stuck in a loop" failure case in my mind, because I assumed that if you flip-flop and make thermodynamic waste in the process that's de facto wrong, but people can enjoy it.
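A toy sketch of the distinction, assuming a hypothetical agent with cyclic preferences over three goods: when each swap carries a fee the cycle is a money pump (value leaks), and when swaps are free the agent just goes in circles without losing anything. The names and numbers are illustrative only.

```python
# Cyclic preferences (assumed): holding the key, the agent will always
# swap for the value, so A -> B -> C -> A never settles.
PREFERS = {"A": "B", "B": "C", "C": "A"}

def run_trades(start, steps, fee):
    """Follow the agent's preferences for `steps` swaps, charging `fee`
    per swap. Returns the final holding and the change in wealth."""
    holding, wealth = start, 0.0
    for _ in range(steps):
        holding = PREFERS[holding]
        wealth -= fee
    return holding, wealth

leaky = run_trades("A", 3, fee=1.0)  # back at "A", three units poorer
loopy = run_trades("A", 3, fee=0.0)  # back at "A", nothing lost
```

Both runs end where they started; only the first loses value along the way, which is the case a dutch book argument actually bites on.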
@ESYudkowsky The premise assumes an agent which is objectively good and eudaimonic while also causally entangled in its construction with the mortals it is trying to be an outside perspective to. The thought experiment is tangled up with me too so how could I possibly answer?
@sorceressofmath Solved problem: rom1504.github.io/clip-retrieval/
@sorceressofmath It uses a deep learning model's digest of the text/images:
openai.com/blog/clip/
@sorceressofmath It's therefore not suitable in an adversarial context where integrity is important (e.g. the thing IPFS content addressing is trying to solve), but if you have another digest that ensures integrity you could layer this on top for semantic search.
@sorceressofmath You would need to curate the sources so that it doesn't get attacked by spammers, too.
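A rough sketch of the layering idea, under loud assumptions: a SHA-256 digest stands in for the integrity layer (this is not IPFS's actual CID format), and a bag-of-words vector stands in for a learned encoder such as CLIP. The `Index` class and its method names are hypothetical.

```python
import hashlib
import math
from collections import Counter

def content_address(data: bytes) -> str:
    # Integrity digest: any change to the bytes changes the address.
    return hashlib.sha256(data).hexdigest()

def toy_embed(text: str) -> Counter:
    # Stand-in for a learned semantic digest (e.g. a CLIP embedding).
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

class Index:
    def __init__(self):
        self.blobs = {}    # address -> raw bytes
        self.vectors = {}  # address -> semantic digest

    def add(self, text: str) -> str:
        data = text.encode("utf-8")
        addr = content_address(data)
        self.blobs[addr] = data
        self.vectors[addr] = toy_embed(text)
        return addr

    def search(self, query: str, k: int = 1) -> list:
        # Semantic layer: rank addresses by similarity to the query.
        q = toy_embed(query)
        ranked = sorted(self.vectors,
                        key=lambda a: cosine(q, self.vectors[a]),
                        reverse=True)
        return ranked[:k]

    def fetch(self, addr: str) -> bytes:
        # Integrity layer: re-hash before trusting what search pointed at.
        data = self.blobs[addr]
        if content_address(data) != addr:
            raise ValueError("content does not match its address")
        return data
```

Search only ever returns addresses, so a poisoned or fooled embedding can surface the wrong document but cannot substitute tampered bytes for a known address. Curating which sources get indexed is still a separate problem.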
@sorceressofmath There exists an open implementation which is comparable to the huge (and hugely expensive) ViT-H CLIP model OpenAI used as the encoder for DALL-E 2, if you'd like to try out some ideas in this vein:
huggingface.co/laion/CLIP-ViT…
@sorceressofmath This paper claims it may also be possible to avoid spending all that money by aligning the representations of two vastly cheaper monomodal encoders, which would also help if you're working in a domain where labeled data is harder to come by than caption-image pairs:
x.com/FrancescoLocat…
@morphillogical @DRMacIver @ESYudkowsky But the harms to people who were lured there on the back of MIRI/CFAR cluster propaganda were substantial, I had friend after friend disappear from the Internet and show back up on my radar living in poverty in one of those group houses.
x.com/jd_pressman/st…
@michaelcurzi youtube.com/watch?v=yModCU…
@JimDMiller @waitbutwhy Nah, do you know the name of the person that invented nitrogen fertilizer offhand? All of modernity is sitting on the back of that one. How about the people that invented oil drilling?
@mr_scientism @apex_simmaps x.com/jd_pressman/st…
Just to give this a final clarification: The grandparent tweet was a prediction about how physics is likely to work, I am not in possession of nor have I ever claimed to be in possession of a method to vacuum collapse or otherwise destroy the universe.
@ESYudkowsky I think it would work better with a retrieval model where the (presumably freely licensed, redistributable) source documents can be cited along with the output.
Cheaper to train too.
@eigenrobot A surprisingly OK page:
lesswrong.com/tag/rationalis…
@eigenrobot Also relevant:
lesswrong.com/posts/S9B9FgaT…
@Malcolm_Ocean This is also a tactic to make it harder to be QT'd or taken out of context by an angry mob.
@ESYudkowsky @DavidDeutschOxf @ShaneLegg I think Friendship Is Optimal squared the circle by making the utopia flawed, which accidentally made it narratively interesting and desirable in a way that a straightforward utopian work wouldn't have been.
@Scholars_Stage "I know my fate. There will come a day when my name will recall the memory of something frightful, a crisis the likes of which has never been known on earth."
- Nietzsche
@PrinceVogel This one probably doesn't have as much UI sex and polish as the others, but it exists to help people make maps of the places they travel.
wiki.openstreetmap.org/wiki/OSMtracke…
"The AI wireheads you instead of satisfying your existing desires" is basically Fristonian. What Friston is trying to tell you is that as models become more powerful they export their inductive biases to their surroundings. DL models are not causal or discrete, therefore...
@RomeoStevens76 I mostly just raised an eyebrow that of all the tweets I've written that was the one that ended up in an SSC post.
Twitter Archive by John David Pressman is marked with CC0 1.0