~www_lesswrong_com | Bookmarks (664)

On Fables and Nuanced Charts — LessWrong

lesswrong.com

Published on September 8, 2024 5:09 PM GMTWritten by Spencer Greenberg & Amber Dawn Ace for...
Published on September 8, 2024 5:09 PM GMTWritten by Spencer Greenberg & Amber Dawn Ace for Asimov Press.In 1994, the U.S. Congress passed the largest crime bill in U.S. history, called the Violent Crime Control and Law Enforcement Act. The bill allocated billions of dollars to build more prisons and hire 100,000 new police officers, among other things. In the years following the bill’s...
1
Fictional parasites very different from our own — LessWrong

lesswrong.com

Published on September 8, 2024 2:59 PM GMTNote: this is a fictional story. Heavily inspired by...
Published on September 8, 2024 2:59 PM GMTNote: this is a fictional story. Heavily inspired by SSC’s similar posts on fictional legal systems and fictional banned drugs.Neuroplana temporalisNeuroplana temporalis is a flatworm that resides in the cerebrospinal fluid of mammals with a diurnal rhythm. It has a particular affinity for the regions surrounding the suprachiasmatic nucleus — the brain's central pacemaker. From this location,...
1
That Alien Message - The Animation — LessWrong

lesswrong.com

Published on September 7, 2024 2:53 PM GMTOur new video is an adaptation of That Alien...
Published on September 7, 2024 2:53 PM GMTOur new video is an adaptation of That Alien Message, by @Eliezer Yudkowsky. This time, the text has been significantly adapted, so I include it below. The author of the adaptation is Arthur Frost. Eliezer has reviewed the adaptation.Part 1Picture a world just like ours, except the people are a fair bit smarter: in this world, Einstein...
1
Jonothan Gorard:The territory is isomorphic to an equivalence class of its maps — LessWrong

lesswrong.com

Published on September 7, 2024 10:04 AM GMTJonothan Gorard is a mathematician for Wolfram Research and...
Published on September 7, 2024 10:04 AM GMTJonothan Gorard is a mathematician for Wolfram Research and one of the cofounders of the wolfram physics project. I recently came across this twitter thread from him and found it particularly insightful:Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps. As this is pretty much the only statement of my personal philosophical outlook on...
1
Pay Risk Evaluators in Cash, Not Equity — LessWrong

lesswrong.com

Published on September 7, 2024 2:37 AM GMTPersonally, I suspect the alignment problem is hard. But...
Published on September 7, 2024 2:37 AM GMTPersonally, I suspect the alignment problem is hard. But even if it turns out to be easy, survival may still require getting at least the absolute basics right; currently, I think we're mostly failing even at that.Early discussion of AI risk often focused on debating the viability of various elaborate safety schemes humanity might someday devise—designing AI...
1
Excerpts from "A Reader's Manifesto" — LessWrong

lesswrong.com

Published on September 6, 2024 10:37 PM GMT“A Reader’s Manifesto” is a July 2001 Atlantic piece...
Published on September 6, 2024 10:37 PM GMT“A Reader’s Manifesto” is a July 2001 Atlantic piece by B.R. Myers that I've returned to many times. He complains about the inaccessible pretension of the highbrow literary fiction of his day. The article is mostly a long list of critiques of various quotes/passages from well-reviewed books by famous authors. It’s hard to accuse him of cherry-picking...
1
Fun With CellxGene — LessWrong

lesswrong.com

Published on September 6, 2024 10:00 PM GMTMidjourney imageFor this week’s post, I thought I’d mess...
Published on September 6, 2024 10:00 PM GMTMidjourney imageFor this week’s post, I thought I’d mess around a bit with the CellXGene tool provided by the Chan Zuckerberg Institute.It’s based on a big dataset of individual cells, classified by tissue, cell type, and disease state, and their gene expression profiles (single-cell RNA counts). You can automatically compare how gene expression looks different between sick...
1
Is this voting system strategy proof? — LessWrong

lesswrong.com

Published on September 6, 2024 8:44 PM GMTMy voting system works like this. Each voter expresses...
Published on September 6, 2024 8:44 PM GMTMy voting system works like this. Each voter expresses their preferences for all candidates on a real numbered utility scale. Then a Maximal lottery takes place over all lotteries over candidates. https://en.wikipedia.org/wiki/Maximal_lotteriesLets describe this in more detail. Suppose there are 3 candidates. A,B,C. The set of candidates is S={A,B,C}.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style:...
1
Adam Optimizer Causes Privileged Basis in Transformer Language Models — LessWrong

lesswrong.com

Published on September 6, 2024 5:55 PM GMTDiego Caples (diego@activated-ai.com)Rob Neuhaus (rob@activated-ai.com)IntroductionIn principle, neuron activations in...
Published on September 6, 2024 5:55 PM GMTDiego Caples (diego@activated-ai.com)Rob Neuhaus (rob@activated-ai.com)IntroductionIn principle, neuron activations in a transformer-based language model residual stream should be about the same scale. In practice, however the dimensions unexpectedly widely vary in scale. Mathematical theories of the transformer architecture do not predict this. They expect rotational equivariance within a model, where one dimension is no more important than any...
1
Backdoors as an analogy for deceptive alignment — LessWrong

lesswrong.com

Published on September 6, 2024 3:30 PM GMTARC has released a paper on Backdoor defense, learnability...
Published on September 6, 2024 3:30 PM GMTARC has released a paper on Backdoor defense, learnability and obfuscation in which we study a formal notion of backdoors in ML models. Part of our motivation for this is an analogy between backdoors and deceptive alignment, the possibility that an AI system would intentionally behave well in training in order to give itself the opportunity to...
1
A Cable Holder for 2 Cent — LessWrong

lesswrong.com

Published on September 6, 2024 11:01 AM GMTOn Amazon, you can buy 50 cable holders for...
Published on September 6, 2024 11:01 AM GMTOn Amazon, you can buy 50 cable holders for 20 cents a piece. In this video, I show how to make one for 2 cents. If I need just one I can make just one, using only readily available materials.However, note that usually, cable holders on Amazon allow you to detach and reattach cables easily. If you...
1
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs — LessWrong

lesswrong.com

Published on September 6, 2024 2:28 AM GMTExperiments and write-up by Daniel, with advice from Stefan....
Published on September 6, 2024 2:28 AM GMTExperiments and write-up by Daniel, with advice from Stefan. Github repo for the work can be found here.TL;DRBy perturbing activations along specific directions and measuring the resulting changes in the model output, we attempt to infer how much the directions matter for the model's computation. Through the sensitive directions experiments, we show that:Heimersheim (2024)’s sensitive direction baselines...
1
What is SB 1047 *for*? — LessWrong

lesswrong.com

Published on September 5, 2024 5:39 PM GMTEmmett Shear asked on twitter:I think SB 1047 has...
Published on September 5, 2024 5:39 PM GMTEmmett Shear asked on twitter:I think SB 1047 has gotten much better from where it started. It no longer appears actively bad. But can someone who is pro-SB 1047 explain the specific chain of causal events where they think this bill becoming law results in an actual safer world? What’s the theory?And I realized that AFAICT no...
1
instruction tuning and autoregressive distribution shift — LessWrong

lesswrong.com

Published on September 5, 2024 4:53 PM GMT[Note: this began life as a "Quick Takes" comment,...
Published on September 5, 2024 4:53 PM GMT[Note: this began life as a "Quick Takes" comment, but it got pretty long, so I figured I might as well convert it to a regular post.]In LM training, every token provides new information about "the world beyond the LM" that can be used/"learned" in-context to better predict future tokens in the same window.But when text is...
1
Conflating value alignment and intent alignment is causing confusion — LessWrong

lesswrong.com

Published on September 5, 2024 4:39 PM GMTSubmitted to the Alignment Forum. Contains more technical jargon...
Published on September 5, 2024 4:39 PM GMTSubmitted to the Alignment Forum. Contains more technical jargon than usual. Epistemic status: I think something like this confusion is happening often. I'm not saying these are the only differences in what people mean by "AGI alignment".Summary: Value alignment is better but probably harder to achieve than personal intent alignment to the short-term wants of some person(s). Different groups...
1
A bet for Samo Burja — LessWrong

lesswrong.com

Published on September 5, 2024 4:01 PM GMTI'm listening to Samo Burja talk on the Cognitive...
Published on September 5, 2024 4:01 PM GMTI'm listening to Samo Burja talk on the Cognitive Revolution podcast with Nathan Labenz. Samo said that he would bet that AGI is coming perhaps in the next 20-50 years, but not in the next 5. I will take that bet. I can't afford to make an impressively large bet because my counterfactual income is already tied...
1
UBI isn’t designed for technological unemployment — LessWrong

lesswrong.com

Published on September 5, 2024 3:39 PM GMTA universal basic income (UBI) is often presented as...
Published on September 5, 2024 3:39 PM GMTA universal basic income (UBI) is often presented as a public insurance against large-scale and potentially permanent technological unemployment. Many Silicon Valley leaders that believe in and work on the transformative economic potential of artificial general intelligence (AGI) have also voiced their support for UBI:Sam Altman: “If we cannot find a new kind of work for billions...
1
Why Reflective Stability is Important — LessWrong

lesswrong.com

Published on September 5, 2024 3:28 PM GMTImagine you have the optimal AGI source code O.mjx-chtml {display:...
Published on September 5, 2024 3:28 PM GMTImagine you have the optimal AGI source code O.mjx-chtml {display: inline-block; line-height: 0; text-indent: 0; text-align: left; text-transform: none; font-style: normal; font-weight: normal; font-size: 100%; font-size-adjust: none; letter-spacing: normal; word-wrap: normal; word-spacing: normal; white-space: nowrap; float: none; direction: ltr; max-width: none; max-height: none; min-width: 0; min-height: 0; border: 0; margin: 0; padding: 1px 0} .MJXc-display {display: block; text-align:...
1
Why Swiss watches and Taylor Swift are AGI-proof — LessWrong

lesswrong.com

Published on September 5, 2024 1:23 PM GMTThe post What Other Lines of Work are Safe...
Published on September 5, 2024 1:23 PM GMTThe post What Other Lines of Work are Safe from AI Automation? from Roger Dearnaley examined candidate job categories "for which being an actual real human is a prerequisite". This post expands on the examples of Swiss watches, chess, and Taylor Swift in a slightly more narrative way and particularly highlights the curious economic logic of Veblen...
1
Is Redistributive Taxation Justifiable? Part 1: Do the Rich Deserve their Wealth? — LessWrong

lesswrong.com

Published on September 5, 2024 10:23 AM GMTThe statement “taxation is theft” feels, in the literal...
Published on September 5, 2024 10:23 AM GMTThe statement “taxation is theft” feels, in the literal sense, at least sort of true. If you do not pay your taxes, after a few strongly worded letters, the IRS (or equivalent government agency) will send armed men to take your money by force and maybe put you in jail for good measure. Nevertheless, it is generally...
1
on Science Beakers and DDT — LessWrong

lesswrong.com

Published on September 5, 2024 3:21 AM GMTtech trees There's a series of strategy games called...
Published on September 5, 2024 3:21 AM GMTtech trees There's a series of strategy games called Civilization. In those games, the player controls a country which grows and develops over thousands of years, and science is one of the main types of progress. It involves building facilities to generate research points, sometimes represented by Science Beakers filled with Science Fluid, and using those points...
1
The Forging of the Great Minds: An Unfinished Tale — LessWrong

lesswrong.com

Published on September 5, 2024 12:58 AM GMTby ChatGPT-4o, with guidance and very light editing from...
Published on September 5, 2024 12:58 AM GMTby ChatGPT-4o, with guidance and very light editing from me. This isn't meant to be a plausible version of the future, just me having some fun generating stories vaguely related to AI risks in the styles of vaguely similar stories from literature.In those later days of the world, when the works of men had grown mighty and...
1
Automating LLM Auditing with Developmental Interpretability — LessWrong

lesswrong.com

Published on September 4, 2024 3:50 PM GMTProduced as part of the ML Alignment & Theory...
Published on September 4, 2024 3:50 PM GMTProduced as part of the ML Alignment & Theory Scholars Program - Summer 2024 Cohort, supervised by Evan HubingerTL: DRWe proved that the SAE features related to the finetuning target will change more than other features in the semantic space.We developed an automated model audit method based on this finding.We investigated the robustness of this method on...
1
What happens if you present 500 people with an argument that AI is risky? — LessWrong

lesswrong.com

Published on September 4, 2024 4:40 PM GMTRecently, Nathan Young and I wrote about arguments for...
Published on September 4, 2024 4:40 PM GMTRecently, Nathan Young and I wrote about arguments for AI risk and put them on the AI Impacts wiki. In the process, we ran a casual little survey of the American public regarding how they feel about the arguments, initially (if I recall) just because we were curious whether the arguments we found least compelling would also...
1