~www_lesswrong_com | Bookmarks (664)

Two arguments against longtermist thought experiments — LessWrong

lesswrong.com

Published on November 2, 2024 10:22 AM GMT. Epistemic status: shower thoughts. I am currently going through the EA Introductory Course and we discussed two arguments against longtermism which I have not seen elsewhere. So goes a thought experiment: imagine you have toxic waste at hand, which you can process right now at the cost of 100 lives, or bury it so it'll have no effect right...
Both-Sidesism—When Fair & Balanced Goes Wrong — LessWrong

lesswrong.com

Published on November 2, 2024 3:04 AM GMT. In a few days' time, voting will close for millions of Americans in one of the most contentious and globally consequential elections in world history. And while this week’s subject, both-sidesism, is ‘evergreen’ in that the topic will continue to be relevant long into the future, this election has highlighted its significance and introduced some new turns. A...
What can we learn from insecure domains? — LessWrong

lesswrong.com

Published on November 1, 2024 11:53 PM GMT. Cryptocurrency is terrible. With a single click of a button, it is possible to accidentally lose all of your funds. 99.9% of all cryptocurrency projects are complete scams (conservative estimate). Crypto is also tailor-made for ransomware attacks, since it makes it possible to send money in such a way that the receiver has perfect anonymity. [Image caption: typical web3 experience] Similarly,...
Science advances one funeral at a time — LessWrong

lesswrong.com

Published on November 1, 2024 11:06 PM GMT. Major scientific institutions talk a big game about innovation, but the reality is that many of the mechanisms designed to ensure quality (peer review, funding decisions, the academic hierarchy) explicitly incentivize incremental rather than revolutionary progress.[1] Thomas Kuhn's now-famous notion of paradigm shifts was pointing at precisely this phenomenon. When scientists work within what Kuhn called "normal science," they're essentially solving low- to...
Set Theory Multiverse vs Mathematical Truth - Philosophical Discussion — LessWrong

lesswrong.com

Published on November 1, 2024 6:56 PM GMT. I've been thinking about the set theory multiverse and its philosophical implications, particularly regarding mathematical truth. While I understand the pragmatic benefits of the multiverse view, I'm struggling with its philosophical implications. The multiverse view suggests that statements like the Continuum Hypothesis aren't absolutely true or false, but rather true in some set-theoretic universes and false in...
SAE Probing: What is it good for? Absolutely something! — LessWrong

lesswrong.com

Published on November 1, 2024 7:23 PM GMT. Subhash and Josh are co-first authors. Work done as part of the two week research sprint in Neel Nanda’s MATS stream. TLDR: We show that dense probes trained on SAE encodings are competitive with traditional activation probing over 60 diverse binary classification datasets. Specifically, we find that SAE probes have advantages with: low data regimes (~ < 100 training examples); corrupted data (i.e. our...
'Meta', 'mesa', and mountains — LessWrong

lesswrong.com

Published on October 31, 2024 5:25 PM GMT. Recently, in a conversation with a coworker, I was trying to describe the rate at which time passed subjectively, with a term that distinguished from the usual objective clock speed constrained only by general relativity and generic human psychology. I ended up saying "meta-rate". That bothered me for reasons I couldn't put a finger on until today. I...
Toward Safety Cases For AI Scheming — LessWrong

lesswrong.com

Published on October 31, 2024 5:20 PM GMT. Developers of frontier AI systems will face increasingly challenging decisions about whether their AI systems are safe enough to develop and deploy. One reason why systems may not be safe is if they engage in scheming. In our new report "Towards evaluations-based safety cases for AI scheming", written in collaboration with researchers from the UK AI Safety...
AI #88: Thanks for the Memos — LessWrong

lesswrong.com

Published on October 31, 2024 3:00 PM GMT. Following up on the Biden Executive Order on AI, the White House has now issued an extensive memo outlining its AI strategy. The main focus is on government adaptation and encouraging innovation and competitiveness, but there’s also sections on safety and international governance. Who knows if a week or two from now, after the election, we will...
The Compendium, A full argument about extinction risk from AGI — LessWrong

lesswrong.com

Published on October 31, 2024 12:01 PM GMT. We (Connor Leahy, Gabriel Alfour, Chris Scammell, Andrea Miotti, Adam Shimi) have just published The Compendium, which brings together in a single place the most important arguments that drive our models of the AGI race, and what we need to do to avoid catastrophe. We felt that something like this has been missing from the AI conversation. Most of...
Some Preliminary Notes on the Promise of a Wisdom Explosion — LessWrong

lesswrong.com

Published on October 31, 2024 9:21 AM GMT. This post is one-half of my third-prize winning entry for the AI Impacts Essay Competition on the Automation of Wisdom and Philosophy. It proposes a "wisdom explosion" as an alternative to an intelligence explosion that is safer from the standpoint of differential technological development.
What TMS is like — LessWrong

lesswrong.com

Published on October 31, 2024 12:44 AM GMT. There are two nuclear options for treating depression: Ketamine and TMS; this post is about the latter. TMS stands for Transcranial Magnetic Stimulation. Basically, it fixes depression via magnets, which is about the second or third most magical thing that magnets can do. I don’t know a whole lot about the neuroscience - this post isn’t about the how...
AI Safety at the Frontier: Paper Highlights, October '24 — LessWrong

lesswrong.com

Published on October 31, 2024 12:09 AM GMT. This is a selection of AI safety paper highlights in October 2024, from my blog "AI Safety at the Frontier". The selection primarily covers ML-oriented research. It's only concerned with papers (arXiv, conferences etc.), not LessWrong or Alignment Forum posts. As such, it should be a nice addition for people primarily following the forum, who might otherwise...
Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution — LessWrong

lesswrong.com

Published on October 30, 2024 10:50 PM GMT. This work was produced as part of the ML Alignment & Theory Scholars Program - Summer 24 Cohort. TL;DR: The current approach to explaining model internals is to (1) disentangle the language model representations into feature activations using a Sparse Autoencoder (SAE) and then (2) explain the corresponding feature activations using an AutoInterp process. Surprisingly, when formulating this two...
Generic advice caveats — LessWrong

lesswrong.com

Published on October 30, 2024 9:03 PM GMT. You were (probably) linked here from some advice. Unfortunately, that advice has some caveats. See below: There exist some people who should not do the advice. Moreover, people are different. More moreover, situations are different. What worked there/then might not work here/now. Some of the advice is missing context, contradictory, inaccurate, misleading, won’t replicate, or is downright false or useless. Consider reversing...
I turned decision theory problems into memes about trolleys — LessWrong

lesswrong.com

Published on October 30, 2024 8:13 PM GMT. I hope it has some educational, memetic or at least humorous potential. Newcomb's problem; Smoking lesion; Parfit's Hitchhiker; Counterfactual mugging; Xor-blackmail.
The Alignment Trap: AI Safety as Path to Power — LessWrong

lesswrong.com

Published on October 29, 2024 3:21 PM GMT. Recent discussions about artificial intelligence safety have focused heavily on ensuring AI systems remain under human control. While this goal seems laudable on its surface, we should carefully examine whether some proposed safety measures could paradoxically enable rather than prevent dangerous concentrations of power. The Control Paradox: The fundamental tension lies in how we define "safety." Many current approaches...
Housing Roundup #10 — LessWrong

lesswrong.com

Published on October 29, 2024 1:50 PM GMT. There’s more campaign talk about housing. The talk of needing more housing is highly welcome, as one prominent person after another (including Jerome Powell!) is talking like a YIMBY. A lot of the concrete proposals are of course terrible, but not all of them. I’ll start off covering all that along with everyone’s favorite awful policy, which is...
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations — LessWrong

lesswrong.com

Published on October 29, 2024 1:36 PM GMT. 7.1 Post summary / Table of contents: This is the 7th of a series of 8 blog posts, which I’m serializing weekly. (Or email or DM me if you want to read the whole thing right now.) The main thrust of this post is an opinionated discussion of the book The Origin of Consciousness in the Breakdown of the Bicameral Mind by...
Review: “The Case Against Reality” — LessWrong

lesswrong.com

Published on October 29, 2024 1:13 PM GMT. This is not a red stop sign: For one thing, in a ceci n'est pas une pipe way, it’s not a stop sign at all, but a digital representation of a photograph of a stop sign, made visible by a computer monitor or maybe a printer. More subtly, “red” is not a quality of the sign, but of...
A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More — LessWrong

lesswrong.com

Published on October 29, 2024 12:41 PM GMT. This project report was created in September 2024 as part of the BlueDot AI Safety Fundamentals Course, with the guidance of my facilitator, Alexandra Abbas. Work on this project originated as part of an ideation at an Apart Research hackathon. This report dives into APIAYN (A Poem Is All You Need), a simple jailbreak that doesn’t employ direct deception...
Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence — LessWrong

lesswrong.com

Published on October 29, 2024 12:16 PM GMT. To update our credence on whether or not LLMs are conscious, we can ask how many of the Butlin/Long indicator properties for phenomenal consciousness are satisfied by LLMs. To start this program, I zoomed in on an indicator property that is required for consciousness under higher-order theory, nicknamed “HOT-2”: Metacognitive monitoring distinguishing reliable perceptual representations from noise. Do today’s...
AI #87: Staying in Character — LessWrong

lesswrong.com

Published on October 29, 2024 7:10 AM GMT. The big news of the week was the release of a new version of Claude Sonnet 3.5, complete with its ability (for now only through the API) to outright use your computer, if you let it. It’s too early to tell how big an upgrade this is otherwise. ChatGPT got some interface tweaks that, while minor, are...
A path to human autonomy — LessWrong

lesswrong.com

Published on October 29, 2024 3:02 AM GMT. "Each one of us, and also us as the current implementation of humanity are going to be replaced. Persistence in current form is impossible. It's impossible in biology; every species will either die out or it will change and adapt, in which case it is again not the same species. So the next question is once you've...
