~www_lesswrong_com | Bookmarks (664)

Win/continue/lose scenarios and execute/replace/audit protocols — LessWrong

lesswrong.com

Published on November 15, 2024 3:47 PM GMT. In this post, I’ll make a technical point that comes up when thinking about risks from scheming AIs from a control perspective. In brief: Consider a deployment of an AI in a setting where it’s going to be given a sequence of tasks, and you’re worried about a safety failure that can happen suddenly. Every time the AI...
1
Proposing the Conditional AI Safety Treaty (linkpost TIME) — LessWrong

lesswrong.com

Published on November 15, 2024 1:59 PM GMT. Technological progress can excite us, politics can infuriate us, and wars can mobilize us. But faced with the risk of human extinction that the rise of artificial intelligence is causing, we have remained surprisingly passive. In part, perhaps this was because there did not seem to be a solution. This is an idea I would like to...
1
Seven lessons I didn't learn from election day — LessWrong

lesswrong.com

Published on November 14, 2024 6:39 PM GMT. I spent most of my election day -- 3pm to 11pm Pacific time -- trading on Manifold Markets. That went about as well as it could have gone. I doubled the money I was trading with, jumping to 10th place on Manifold's all-time leaderboard. Spending my time trading instead of just nervously watching results come in also...
1
Effects of Non-Uniform Sparsity on Superposition in Toy Models — LessWrong

lesswrong.com

Published on November 14, 2024 4:59 PM GMT. Abstract: This post summarises my findings on the effects of non-uniform feature sparsity on superposition in the ReLU output model, introduced in the Toy Models of Superposition paper. The ReLU output model is a toy model which is shown to exhibit features in superposition instead of a dedicated dimension ('individual neuron') devoted to a single feature. That experiment...
1
The Early Christian Strategy — LessWrong

lesswrong.com

Published on November 14, 2024 5:02 PM GMT. Scott Alexander's latest today discusses Robert Axelrod's Prisoner’s Dilemma Tournament and the morality of Christians, Quakers, and the rationalist community. I think it is of general interest to LessWrong readers. "This is why I’m so fascinated by the early Christians. They played the doomed COOPERATE-BOT strategy and took over the world." https://www.astralcodexten.com/p/the-early-christian-strategy
1
'Estimat - Values and Data’s For Starters'- A Necessary Proposal? — LessWrong

lesswrong.com

Published on November 14, 2024 2:37 PM GMT. 1. PROBLEM: In today’s digital era, teenagers face a dual challenge: processing vast amounts of information and maintaining emotional well-being. Traditional educational models, focused on memorization and mechanical procedures, might be better suited to preparing students for complex decision-making in an increasingly uncertain world. As part of a non-profit Latin American education initiative, Jacominesp, I seek to connect with...
1
AI #90: The Wall — LessWrong

lesswrong.com

Published on November 14, 2024 2:10 PM GMT. As the Trump transition continues and we try to steer and anticipate its decisions on AI as best we can, there was continued discussion about one of the AI debate’s favorite questions: Are we making huge progress real soon now, or is deep learning hitting a wall? My best guess is it is kind of both, that...
1
Evolutionary prompt optimization for SAE feature visualization — LessWrong

lesswrong.com

Published on November 14, 2024 1:06 PM GMT. TLDR: Fluent dreaming for language models is an algorithm based on the GCG method that can reliably find plain-text readable prompts for LLMs that maximize certain logits or residual stream directions by using gradients and genetic algorithms. The authors showed its use for visualizing MLP neurons. We show this method can also help interpret SAE features. We reimplement the algorithm in...
1
AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems — LessWrong

lesswrong.com

Published on November 14, 2024 7:00 AM GMT. YouTube link. Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions. Topics we discuss: How...
1
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI — LessWrong

lesswrong.com

Published on November 14, 2024 6:13 AM GMT. FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI. FrontierMath presents hundreds of unpublished, expert-level mathematics problems that specialists spend days solving. It offers an ongoing measure of AI complex mathematical reasoning progress. We’re introducing FrontierMath, a benchmark of hundreds of original, expert-crafted mathematics problems designed to evaluate advanced reasoning capabilities in AI systems. These problems span major...
1
Concrete Methods for Heuristic Estimation on Neural Networks — LessWrong

lesswrong.com

Published on November 14, 2024 5:07 AM GMT. Thanks to Erik Jenner for helpful comments and discussion. (Epistemic Status: Tentative take on how to think about heuristic estimation and surprise accounting in the context of activation modeling and causal scrubbing. Should not be taken as an authoritative or accurate representation of how ARC thinks about heuristic estimation.) I'm pretty excited about ARC's heuristic arguments agenda. The general idea that "formal(ish)...
1
Heresies in the Shadow of the Sequences — LessWrong

lesswrong.com

Published on November 14, 2024 5:01 AM GMT. "Religions are collections of cherished but mistaken principles. So anything that can be described either literally or metaphorically as a religion will have valuable unexplored ideas in its shadow." -Paul Graham. This post isn't intended to construct full arguments for any of my "heresies" - I am hoping that you may not have considered them at all yet, but...
1
Current Attitudes Toward AI Provide Little Data Relevant to Attitudes Toward AGI — LessWrong

lesswrong.com

Published on November 12, 2024 6:23 PM GMT. Epistemic status: Sudden public attitude shift seems quite possible, but I haven't seen it much in discussion, so I thought I'd float the idea again. This is somewhat dashed off since the goal is just to toss out a few possibilities and questions. In Current AIs Provide Nearly No Data Relevant to AGI Alignment, Thane Ruthenis argues that...
1
Basics of Handling Disagreements with People — LessWrong

lesswrong.com

Published on November 12, 2024 5:55 PM GMT. Epistemic Status: This is a collection of useful heuristics I’ve gathered from a wide range of books and workshops, all rather evidence-based (robustness varies). These techniques are designed to supplement basics of rationalist discourse, helping facilitate interactions, mostly with those unfamiliar with rationalist thought, especially on entry-level arguments. They may also be useful in conversations between rationalists on...
1
Registrations Open for 2024 NYC Secular Solstice & Megameetup — LessWrong

lesswrong.com

Published on November 12, 2024 5:50 PM GMT. On December 14th, New York City will have a Secular Solstice. Solstice is a holiday for people comfortable with uncomfortable truths and who believe in good. Secular Solstices take place in many cities around the world, but for us New York City is the best place for it. The first Solstice started here, amid towers that reach...
1
2024 NYC Secular Solstice & Megameetup — LessWrong

lesswrong.com

Published on November 12, 2024 5:46 PM GMT. Secular Solstice is a celebration of hope in darkness. For more than a decade now, people have gathered in New York City to sing about humanity's distant past and the future we hope to build. You are, of course, invited. This year, Solstice and the traditional Rationalist Megameetup will both be at the Sheraton Brooklyn New York Hotel,...
1
2025 Q1 Pivotal Research Fellowship (Technical & Policy) — LessWrong

lesswrong.com

Published on November 12, 2024 10:56 AM GMT. We’re excited to announce that applications are now open for our 2025 Q1 Pivotal Research Fellowship, a 9-week program designed to enable promising researchers to produce impactful research and accelerate their careers in technical AI safety, AI governance, and biosecurity. About the Fellowship: The Pivotal Research Fellowship is hosted in London at the London Initiative for Safe AI (LISA). It offers...
1
Theories With Mentalistic Atoms Are As Validly Called Theories As Theories With Only Non-Mentalistic Atoms — LessWrong

lesswrong.com

Published on November 12, 2024 6:45 AM GMT. [This is supposed to be a didactic post. I'm not under the impression that I'm saying anything genuinely new. Thanks to Stephen Wolfram.] I'm about an hour into the Yudkowsky-Wolfram discussion [AI-generated transcript from which I'm quoting]. Wolfram thinks we should not fear AI doom very much in particular. I think he is wrong....
1
The lying p value — LessWrong

lesswrong.com

Published on November 12, 2024 6:12 AM GMT. Quick check: do you agree or disagree with the following statement: "If a study finds a result significant at a p=0.05 level, that means they have followed a methodology which produces this conclusion correctly 95% of the time." Yes or no? Keep that in mind, and we’ll get back to it. I’m reading the Fisher book where he popularised...
1
The Packaging and the Payload — LessWrong

lesswrong.com

Published on November 12, 2024 3:07 AM GMT. I. As I've run and studied meetups, there's a useful metaphor that's become more important to how I think about them. For most meetups, there's the packaging, and a payload, and these are related but useful to approach separately. Allow me to expand. The payload is the thing you actually want. If you order some socks off Amazon, the...
1
Consider tabooing "I think" — LessWrong

lesswrong.com

Published on November 12, 2024 2:00 AM GMT. People say "I think" a lot. Here are some examples: I think you brought me the wrong order. I think the numbers in the report are wrong. I think you need to turn left at the light. I think we need to replace the whole water heater. I think iPhones are better than Android phones. I think you should quit your job and...
1
Festival Stats 2024 — LessWrong

lesswrong.com

Published on November 12, 2024 2:00 AM GMT. Each year (2014, 2015, 2016, 2017, 2018, 2019, 2023) I put out a list of how many dance weekends, festivals, camps, and long dances contra bands and callers are doing. I don't really know why I do this, but it's about an hour's work on top of what I'm already collecting for trycontra.com/events so I...
1
Personal AI Planning — LessWrong

lesswrong.com

Published on November 10, 2024 2:00 PM GMT. LLMs are getting much more capable, and progress is rapid. I use them in my daily work, and there are many tasks where they're usefully some combination of faster and more capable than I am. I don't see signs of these capability increases stopping or slowing down, and if they do continue I expect the impact...
1
AI alignment via civilizational cognitive updates — LessWrong

lesswrong.com

Published on November 10, 2024 9:33 AM GMT. (This started as a reply to @Tamsin Leake's reply in my post about why cyborgism maybe should be open. This post does not require you to read our interaction, though it led to this, and I'm very grateful for Tamsin's reply.) In general, this is a counterargument against: we should only share cyborg tools (software that lets AI...
1
