~www_lesswrong_com | Bookmarks (664)

A Rocket–Interpretability Analogy — LessWrong

lesswrong.com

Published on October 21, 2024 1:55 PM GMT 1. 4.4% of the US federal budget went into the space race at its peak. This was surprising to me, until a friend pointed out that landing rockets on specific parts of the moon requires very similar technology to landing rockets in Soviet cities.[1] I wonder how much more enthusiastic the scientists working on Apollo were, with the convenient motivating...
1
Tokyo AI Safety 2025: Call For Papers — LessWrong

lesswrong.com

Published on October 21, 2024 8:43 AM GMT Last April, AI Safety Tokyo and Noeon Research (in collaboration with Reaktor Japan, AI Alignment Network and AI Industry Foundation) hosted TAIS 2024, an AI safety conference in Tokyo, Japan. You can read more about that conference and how well it went here. We are running another conference next April (TAIS 2025), and have just put out our call for papers....
2
OpenAI defected, but we can take honest actions — LessWrong

lesswrong.com

Published on October 21, 2024 8:41 AM GMT
1
Slightly More Than You Wanted To Know: Pregnancy Length Effects — LessWrong

lesswrong.com

Published on October 21, 2024 1:26 AM GMT Pregnancy is most stressful at the beginning and at the end. First you try to conceive, and spend half of each month superimposed between two very different realities. Then there’s morning sickness, and you worry about miscarriage, which is far more likely early on. Near the end, you can be pretty confident you’re having a baby. But there’s...
1
What actual bad outcome has "ethics-based" RLHF AI Alignment already prevented? — LessWrong

lesswrong.com

Published on October 19, 2024 6:11 AM GMT What actually bad outcome has "ethics-based" AI Alignment prevented in the present or near-past? By "ethics-based" AI Alignment I mean optimization directed at LLM-derived AIs that intends to make them safer, more ethical, harmless, etc. Not future AIs, AIs that already exist. What bad thing would have happened if they hadn't been RLHF'd and given restrictive system...
1
What's a good book for a technically-minded 11-year old? — LessWrong

lesswrong.com

Published on October 19, 2024 6:05 AM GMT "I, Robot" comes to mind. What else?
1
Methodology: Contagious Beliefs — LessWrong

lesswrong.com

Published on October 19, 2024 3:58 AM GMT Simulating Political Alignment: This methodology concerns a simulation tool which has been developed to model how beliefs that are not directly related end up correlated in political identities. It models the transmission of beliefs between nodes on a static hexagonal grid, based on a valence matrix. This methodology allows the user to observe the spread of ideas and...
1
AI Prejudices: Practical Implications — LessWrong

lesswrong.com

Published on October 19, 2024 2:19 AM GMT I see widespread dismissal of AI capabilities. This slows down the productivity gains from AI, and is a major contributor to disagreements about the risks of AI. It reminds me of prejudice against various types of biological minds. I will try to minimize the moralizing about fairness in this post, and focus more on selfish reasons to...
1
Start an Upper-Room UV Installation Company? — LessWrong

lesswrong.com

Published on October 19, 2024 2:00 AM GMT While this post touches on biosecurity it's a personal post and I'm not speaking for my employer. If you want to prevent airborne spread of diseases you have a few options: filter breath (masks, PAPRs); replace the air (ventilation); clean the air (filters, UV light). Masks, fans, and air filters are widely available, but what about...
1
How I'd like alignment to get done (as of 2024-10-18) — LessWrong

lesswrong.com

Published on October 18, 2024 11:39 PM GMT Preamble: My alignment proposal involves aligning an encoding of human-friendly values and then turning on a self-improving AGI with that encoding as its target. Obviously this involves "aligning an encoding of human-friendly values" and also "turning on a self-improving AGI with a specific target", two things we currently do not know how to do... As expected, this...
1
Sabotage Evaluations for Frontier Models — LessWrong

lesswrong.com

Published on October 18, 2024 10:33 PM GMT This is a linkpost for a new research paper from the Alignment Evaluations team at Anthropic and other researchers, introducing a new suite of evaluations of models' abilities to undermine measurement, oversight, and decision-making. Paper link. Abstract: Sufficiently capable models could subvert human oversight and decision-making in important contexts. For example, in the context of AI development, models could...
1
D&D Sci Coliseum: Arena of Data — LessWrong

lesswrong.com

Published on October 18, 2024 10:02 PM GMT This is an entry in the 'Dungeons & Data Science' series, a set of puzzles where players are given a dataset to analyze and an objective to pursue using information from that dataset. Estimated Complexity: 4/5 (this is a guess, I will update based on feedback/seeing how the scenario goes). STORY: The Demon King rises in his distant Demon Castle....
1
the Daydication technique — LessWrong

lesswrong.com

Published on October 18, 2024 9:47 PM GMT I came up with a technique that I have derived much benefit from; what kids call a "lifehack" these days. I have done this for many months, found it extremely useful and the usefulness has been rising fairly steadily as I got better at it. All the friends I could get to try it gave very good...
1
[Linkpost] Hawkish nationalism vs international AI power and benefit sharing — LessWrong

lesswrong.com

Published on October 18, 2024 6:13 PM GMT TLDR: In response to Leopold Aschenbrenner’s ‘Situational Awareness’ and its accelerationist national ambitions, we argue against the claim that artificial superintelligence will inevitably be weaponised and turn its country of origin into an untouchable hegemony. Not only do we see this narrative as extremely dangerous, but also expect that the grandest AI challenges call for global coordination...
1
AI #86: Just Think of the Potential — LessWrong

lesswrong.com

Published on October 17, 2024 3:10 PM GMT Dario Amodei is thinking about the potential. The result is a mostly good essay called Machines of Loving Grace, outlining what can be done with ‘powerful AI’ if we had years of what was otherwise relative normality to exploit it in several key domains, and we avoided negative outcomes and solved the control and alignment problems. As...
1
Concrete benefits of making predictions — LessWrong

lesswrong.com

Published on October 17, 2024 2:23 PM GMT Your mind is a prediction machine, constantly trying to anticipate the world around you and altering its forecasts based on new information. It’s always doing this as a background process. But what would happen if you deliberately trained this skill? Could you get better at predicting your projects, your life, and the future? Sadly, I don’t have a...
1
Arithmetic is an underrated world-modeling technology — LessWrong

lesswrong.com

Published on October 17, 2024 2:00 PM GMT Of all the cognitive tools our ancestors left us, what’s best? Society seems to think pretty highly of arithmetic. It’s one of the first things we learn as children. So I think it’s weird that only a tiny percentage of people seem to know how to actually use arithmetic. Or maybe even understand what arithmetic is for....
1
The Computational Complexity of Circuit Discovery for Inner Interpretability — LessWrong

lesswrong.com

Published on October 17, 2024 1:18 PM GMT Authors: Federico Adolfi, Martina G. Vilas, Todd Wareham. Abstract: Many proposed applications of neural networks in machine learning, cognitive/brain science, and society hinge on the feasibility of inner interpretability via circuit discovery. This calls for empirical and theoretical explorations of viable algorithmic options. Despite advances in the design and testing of heuristics, there are concerns about their scalability and...
1
is there a big dictionary somewhere with all your jargon and acronyms and whatnot? — LessWrong

lesswrong.com

Published on October 17, 2024 11:30 AM GMT it would help newcomers
1
It is time to start war gaming for AGI — LessWrong

lesswrong.com

Published on October 17, 2024 5:14 AM GMT In this episode of the Making Sense podcast with Sam Harris, Barton Gellman from The Brennan Center For Justice discusses how he "organized five nonpartisan tabletop exercises premised on an authoritarian candidate winning the presidency to test the resilience of democratic institutions". "The 175 participants across five exercises were Republicans, Democrats, and independents; liberals, conservatives, and centrists. They...
1
Reinforcement Learning: Essential Step Towards AGI or Irrelevant? — LessWrong

lesswrong.com

Published on October 17, 2024 3:37 AM GMT A friend of mine thinks that RL is a dead end: LLMs are much better at problem solving, exploration, and exploitation than any RL algorithm. And I agree that LLMs are better than RL on RL's tasks: companies even have LLMs controlling robots nowadays. The part where we disagree is that I see RL as the step that...
1
The Cognitive Bootcamp Agreement — LessWrong

lesswrong.com

Published on October 16, 2024 11:24 PM GMT For the next Cognitive Bootcamp, I wanted to experiment with a format that is a) more explicitly “intense”, b) a bit more opinionated on how people spend their time, while c) still flexible enough to deal with the fact that people need different things and being too intense can be bad. The next workshop is Friday, Oct 25 at 4pm thru...
1
Bitter lessons about lucid dreaming — LessWrong

lesswrong.com

Published on October 16, 2024 9:27 PM GMT The amount of effort is not proportional to the result. One lucid dream (LD) can take hours or even dozens of hours of effort. On average, a practitioner experiences several dozen LDs in their lifetime before quitting. If they don't quit, they dedicate their entire life to it, day and night, trying endless techniques, practicing reality checks,...
1
Towards Quantitative AI Risk Management — LessWrong

lesswrong.com

Published on October 16, 2024 7:26 PM GMT Reading guidelines: If you are short on time, just read the section “The importance of quantitative risk tolerance & how to turn it into actionable signals”. Tl;dr: We have recently published an AI risk management framework. This framework draws from both existing risk management approaches and AI risk management practices. We then adapted it into a rating system...
1
