~www_lesswrong_com | Bookmarks (664)

Why do Minimal Bayes Nets often correspond to Causal Models of Reality? — LessWrong

lesswrong.com

Published on August 3, 2024 12:39 PM GMT
Chapter 2 of Pearl's Causality book claims you can recover causal models given only the observational data, under very natural assumptions of minimality and stability[1]. In graphical models lingo, Pearl identifies a causal model of the observational distribution with the distribution's perfect map (if one exists). But I'm confused about a pretty fundamental point: "What does this have to...
1
PIZZA: An Open Source Library for Closed LLM Attribution (or “why did ChatGPT say that?”) — LessWrong

lesswrong.com

Published on August 3, 2024 12:07 PM GMT
From the research & engineering team at Leap Laboratories (incl. @Arush, @sebastian-sosa, @Robbie McCorkell), where we use AI interpretability to accelerate scientific discovery from data. This post is about our LLM attribution repo PIZZA: Prompt Input Z? Zonal Attribution. (In the grand scientific tradition we have tortured our acronym nearly to death. For the crimes of others see...
1
Cooperation and Alignment in Delegation Games: You Need Both! — LessWrong

lesswrong.com

Published on August 3, 2024 10:16 AM GMT
This work was facilitated by the Oxford AI Safety and Governance group, Cooperative AI Foundation, and Oxford Autonomous Intelligent Machines and Systems. Thanks also to Bart Jaworski, Jesse Clifton, Joar Skalse, Sam Barnett, Vincent Conitzer, Charlie Griffin, David Hyland, Michael Wooldridge, Ted Turocy, and Alessandro Abate. This blogpost accompanies the paper Cooperation and Control in Delegation Games...
1
SRE's review of Democracy — LessWrong

lesswrong.com

Published on August 3, 2024 7:20 AM GMT
Day One. We've been handed this old legacy system called "Democracy". It's an emergency. The old maintainers are saying it has been misbehaving lately but they have no idea how to fix it. We've had a meeting with them to find out as much as possible about the system, but it turns out that all the original team...
1
I didn't think I'd take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is! — LessWrong

lesswrong.com

Published on August 2, 2024 10:35 PM GMT
Basically, the user is shown a splatter of colored circles, then the splatter is hidden, and then they're asked to remember what proportion of the splatter was a particular color. To get good at it, they'd have to get good at accurately perceiving and remembering features of entire distributions. Obvious high propensity for transfer to mentally visualizing...
1
Evaluating Sparse Autoencoders with Board Game Models — LessWrong

lesswrong.com

Published on August 2, 2024 7:50 PM GMT
This blog post discusses a collaborative research paper on sparse autoencoders (SAEs), specifically focusing on SAE evaluations and a new training method we call p-annealing. As the first author, I primarily contributed to the evaluation portion of our work. The views expressed here are my own and do not necessarily reflect the perspectives of my co-authors. You...
1
Ethical Deception: Should AI Ever Lie? — LessWrong

lesswrong.com

Published on August 2, 2024 5:53 PM GMT
Ethical Deception: Should AI Ever Lie? Personal Artificial Intelligence Assistants (PAIAs) are coming to your smartphone. Will they always tell the truth? Given the future scenario where humans increasingly seek subjective feedback from AIs, we can expect that their influence will accelerate. Will the widespread use of PAIAs influence social norms and expectations around praise, encouragement, emotional support, beauty,...
1
The Bitter Lesson for AI Safety Research — LessWrong

lesswrong.com

Published on August 2, 2024 6:39 PM GMT
Read the associated paper "Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?": https://arxiv.org/abs/2407.21792 Focus on safety problems that aren’t solved with scale. Benchmarks are crucial in ML to operationalize the properties we want models to have (knowledge, reasoning, ethics, calibration, truthfulness, etc.). They act as a criterion to judge the quality of models and drive implicit competition between researchers....
1
Request for AI risk quotes, especially around speed, large impacts and black boxes — LessWrong

lesswrong.com

Published on August 2, 2024 5:49 PM GMT
@KatjaGrace, Josh Hart and I are finding quotes around different arguments for AI being an existential risk. Full list here: https://docs.google.com/spreadsheets/d/1yB1QIHtA-EMPzqJ_57RvvftvXHTI5ZLAy921Y_8sn3U/edit Currently we are struggling to find proponents of the following arguments: "Loss of control via speed" - that things that might otherwise go well are going to go badly because they are happening so fast; "Loss of control via inferiority" -...
1
A Simple Toy Coherence Theorem — LessWrong

lesswrong.com

Published on August 2, 2024 5:47 PM GMT
This post presents a simple toy coherence theorem, and then uses it to address various common confusions about coherence arguments. Setting: Deterministic MDP. That means at each time t...
1
Optimizing Repeated Correlations — LessWrong

lesswrong.com

Published on August 1, 2024 5:33 PM GMT
At my work, we run experiments – we specify some set of input parameters, run some code, and get various metrics as output. Since we run so many of these, it's important for them to be fast and cheap. Recently I was working on an experiment type that took about 1 hour per run, where the slow part was...
1
Are unpaid UN internships a good idea? — LessWrong

lesswrong.com

Published on August 1, 2024 3:06 PM GMT
Disclaimer: I am outside of the world of international organisations. I am a scientific researcher at university. I am writing this post to open a discussion. Introduction: The UN is an international organisation with the following main goals: maintain international peace and security; develop friendly relations among nations; stand up for human rights; promote better living standards and social progress. Here is a more concrete list...
1
The need for multi-agent experiments — LessWrong

lesswrong.com

Published on August 1, 2024 5:14 PM GMT
TL;DR: Let’s start iterating on experiments that approximate real, society-scale multi-AI deployment. Epistemic status: These ideas seem like my most prominent delta with the average AI Safety researcher, have stood the test of time, and are shared by others I intellectually respect. Please attack them fiercely! Multi-polar risks: Some authors have already written about multi-polar AI failure. I especially like how Andrew...
1
Dragon Agnosticism — LessWrong

lesswrong.com

Published on August 1, 2024 5:00 PM GMT
I'm agnostic on the existence of dragons. I don't usually talk about this, because people might misinterpret me as actually being a covert dragon-believer, but I wanted to give some background for why I disagree with calls for people to publicly assert the non-existence of dragons. Before I do that, though, it's clear that horrible acts...
1
Morristown ACX Meetup — LessWrong

lesswrong.com

Published on August 1, 2024 4:29 PM GMT
A couple of months ago I created a meetup group for ACX-adjacent people in Morristown (https://www.meetup.com/morristown-nj-friendly-ambitious-nerds/). About 7 people have been meeting up weekly and it's been going great. I'm excited to expand the group by hosting an ACX meetup in Morristown! We'll meet at the center of the Green (https://plus.codes/87G7QGW9+RJC). If the weather is good we can hang out...
1
Some comments on intelligence — LessWrong

lesswrong.com

Published on August 1, 2024 3:17 PM GMT
After reading another article on IQ, there are a few things that I wish would become common knowledge to increase the quality of the debate. Posting them here: 1) There is a difference between an abstract definition of intelligence such that it could also apply to aliens or AIs (something like "an agent able to optimize for outcomes in...
1
AI #75: Math is Easier — LessWrong

lesswrong.com

Published on August 1, 2024 1:40 PM GMT
Google DeepMind got a silver medal at the IMO, only one point short of the gold. That’s really exciting. We continuously have people saying ‘AI progress is stalling, it’s all a bubble’ and things like that, and I always find it remarkable how little curiosity or patience such people are willing to exhibit. Meanwhile GPT-4o-Mini seems excellent, OpenAI...
1
Temporary Cognitive Hyperparameter Alteration — LessWrong

lesswrong.com

Published on August 1, 2024 10:27 AM GMT
Social anxiety is one hell of a thing. I used to struggle with it a lot, escaping pressure by fleeing to the toilet. I’ve reduced my levels of social anxiety by bashing it over the head with exposure therapy, repeatedly dealing with anxiety-provoking situations until they became manageable. Nowadays, my levels of social anxiety are low enough...
1
Technology and Progress — LessWrong

lesswrong.com

Published on August 1, 2024 4:49 AM GMT
The audio version can be listened to here: In this essay, I will give my views on technology and progress, focusing specifically on space travel, computers and transhumanism. Futurism is not just predictive. It is also normative. We are concerned with what we should do, not just with what will happen. So, I’m going to talk about what I...
1
2/3 Aussie & NZ AI Safety folk often or sometimes feel lonely or disconnected (and 16 other barriers to impact) — LessWrong

lesswrong.com

Published on August 1, 2024 1:15 AM GMT
I did what I think is the largest piece of research on current and aspiring AI safety folk in Australia & New Zealand. I wanted to understand their career barriers so that I could then optimize my organization's tactics to remove them. Caveats: I am not a trained social scientist nor a statistician. There will be errors. I’ve budgeted a few...
1
Self-Other Overlap: A Neglected Approach to AI Alignment — LessWrong

lesswrong.com

Published on July 30, 2024 4:22 PM GMT
Figure 1. Image generated by DALL-E 3 to represent the concept of self-other overlap. Many thanks to Bogdan Ionut-Cirstea, Steve Byrnes, Gunnar Zarnacke, Jack Foxabbott and Seong Hah Cho for critical comments and feedback on earlier and ongoing versions of this work. Summary: In this post, we introduce self-other overlap training: optimizing for similar internal representations when the model reasons about...
1
Investigating the Ability of LLMs to Recognize Their Own Writing — LessWrong

lesswrong.com

Published on July 30, 2024 3:41 PM GMT
This post is an interim progress report on work being conducted as part of Berkeley's Supervised Program for Alignment Research (SPAR). Summary of Key Points: We test the robustness of an open-source LLM’s (Llama3-8b) ability to recognize its own outputs on a diverse mix of datasets, two different tasks (summarization and continuation), and two different presentation paradigms (paired and...
1
Can Generalized Adversarial Testing Enable More Rigorous LLM Safety Evals? — LessWrong

lesswrong.com

Published on July 30, 2024 2:57 PM GMT
Thanks to Zora Che, Michael Chen, Andi Peng, Lev McKinney, Bilal Chughtai, Shashwat Goel, Domenic Rosati, and Rohit Gandikota. TL;DR: In contrast to evaluating AI systems under normal "input-space" attacks, using "generalized" attacks, which allow an attacker to manipulate weights or activations, might be able to help us better evaluate LLMs for risks – even if they are deployed...
1
RTFB: California’s AB 3211 — LessWrong

lesswrong.com

Published on July 30, 2024 1:10 PM GMT
Some in the tech industry decided now was the time to raise alarm about AB 3211. As Dean Ball points out, there are a lot of bills out there. One must do triage. Dean Ball: But SB 1047 is far from the only AI bill worth discussing. It’s not even the only one of the dozens of AI...
1
