~www_lesswrong_com | Bookmarks (664)

Ideologically-facilitated prompt injection: a demo — LessWrong

lesswrong.com

Published on August 24, 2024 12:13 AM GMT. Introduction: When the end user of a deployed LLM gets it to generate text that is opposed to the goals of the deployer, that end user has succeeded at prompt injection. (Figure: A successful prompt injection attack.) This post demonstrates a prompt injection strategy that I call ideologically-facilitated prompt injection. In the following demo, I leverage ideologically-charged language to get gpt-3.5-turbo...
what becoming more secure did for me — LessWrong

lesswrong.com

Published on August 22, 2024 5:44 PM GMT. After I quit my first job two years ago, I was conflict-avoidant to the point of depression. I did ~nothing for five months and moved in with my parents in the middle of nowhere. Social conflicts used to rip me up. I would be anxious for days, sometimes months. I was so avoidant of feelings I didn’t know they...
A primer on the current state of longevity research — LessWrong

lesswrong.com

Published on August 22, 2024 5:14 PM GMT. Note: This post is co-authored with Stacy Li, a PhD student at Berkeley studying aging biology! Highly appreciate all her help in writing, editing, and fact-checking my understanding! Introduction: The last time I read about aging research deeply was around 2021. The general impression I was getting was that aging research was increasingly more and more funded (good!). Unfortunately,...
Some reasons to start a project to stop harmful AI — LessWrong

lesswrong.com

Published on August 22, 2024 4:23 PM GMT. Hey, I’m a coordinator of AI Safety Camp. Our program has supported many projects in the past for finding technical solutions, and my two colleagues still do! Below is my view on safety, and what made me want to support pause/stop AI projects. With safety, I mean constraining a system’s potential for harm. To prevent harms, we must ensure...
The economics of space tethers — LessWrong

lesswrong.com

Published on August 22, 2024 4:15 PM GMT. Some code for this post can be found here. Space tethers take the old, defunct space elevator concept and shorten it. Rockets can fly up to a dangling hook in the sky and then climb to a higher orbit. If the tether rotates, it can act like a catapult, providing a significant boost in a location where...
AI #78: Some Welcome Calm — LessWrong

lesswrong.com

Published on August 22, 2024 2:20 PM GMT. SB 1047 has been amended once more, with both strict improvements and big compromises. I cover the changes, and answer objections to the bill, in my extensive Guide to SB 1047. I follow that up here with reactions to the changes and some thoughts on where the debate goes from here. Ultimately, it is going to come...
How do we know dreams aren't real? — LessWrong

lesswrong.com

Published on August 22, 2024 12:41 PM GMT. Suppose you believe the following: the universe is infinite in the sense that every possible combination of atoms is repeated an infinite number of times (either because the negative curvature of the universe implies the universe is unbounded or because of MWI). Consciousness is an atomic phenomena[1]. That is to say, the only special relationship between past-you and present...
Measuring Structure Development in Algorithmic Transformers — LessWrong

lesswrong.com

Published on August 22, 2024 8:38 AM GMT. tl;dr: We compute the evolution of the local learning coefficient (LLC), a proxy for model complexity, for an algorithmic transformer. The LLC decreases as the model learns more structured solutions, such as head specialization. This post is structured in three main parts, (1) a summary, giving an overview of the main results, (2) the Fine Print, that delves...
Iterative Refinement Stages of Lying in LLMs — LessWrong

lesswrong.com

Published on August 22, 2024 7:32 AM GMT. As models grow increasingly sophisticated, they will surpass human expertise. It is a fundamentally difficult challenge to make sure that those models are robustly aligned (cite scalable oversight). For example, we might hope to reliably know whether a model is being deceptive in order to achieve an instrumental goal (Hubinger et al., 2024). Importantly, deceptive alignment and robust...
How do you finish your tasks faster? — LessWrong

lesswrong.com

Published on August 21, 2024 8:01 PM GMT. I have the following problem: start with a goal that I know how to reach. Everything is there in my mind. But between the start and the finish I stray away from achieving it. It happens like this: I focus on a sub-task. Then, I focus on sub-(sub-task)s. Till I lose my focus. I would like my process...
Just because an LLM said it doesn't mean it's true: an illustrative example — LessWrong

lesswrong.com

Published on August 21, 2024 9:05 PM GMT. This was originally posted in the comments of "You don't know how bad most things are nor precisely how they're bad."; I've broken it out into a post because I think it might be a useful corrective more generally for people inclined to cite LLM remarks as fact. I asked Claude, as an illustrative example, whether ready-made clothing...
AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety? — LessWrong

lesswrong.com

Published on August 21, 2024 6:09 PM GMT. Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required. Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts. SB 1047, the Most-Discussed California AI Legislation: California's Senate Bill 1047 has sparked discussion over AI regulation. While state bills often fly under...
Should LW suggest standard metaprompts? — LessWrong

lesswrong.com

Published on August 21, 2024 4:41 PM GMT. Based on low-quality articles that seem to be coming up with more regularity, and as mentioned in a few recent posts, AI-generated posts are likely to be a permanent feature of LW (and most online forums, I expect). I wonder if we should focus on harm reduction (or actual value creation, in some cases) rather than trying...
Please do not use AI to write for you — LessWrong

lesswrong.com

Published on August 21, 2024 9:53 AM GMT. I've recently seen several articles here that were clearly generated or heavily assisted by AI. They are all dreadful. They are verbose, they are full of "on the one hand" and "on the other", they never make any assertion without also making room for the opposite, and end with "conclusions" that say nothing. Please do not do...
Apply to Aether - Independent LLM Agent Safety Research Group — LessWrong

lesswrong.com

Published on August 21, 2024 9:47 AM GMT. The basic idea: Aether will be a small group of talented early-career AI safety researchers with a shared research vision who work full-time with mentorship on their best effort at making AI go well. That research vision will broadly revolve around the alignment, control, and evaluation of LLM agents. There is a lot of latent talent in the...
the Giga Press was a mistake — LessWrong

lesswrong.com

Published on August 21, 2024 4:51 AM GMT. the giga press: Tesla decided to use large aluminum castings ("gigacastings") for the frame of many of its vehicles, including the Model Y and Cybertruck. This approach and the "Giga Press" used for it have been praised by many articles and youtube videos, repeatedly called revolutionary and a key advantage. Most cars today are made by stamping...
What is the point of 2v2 debates? — LessWrong

lesswrong.com

Published on August 20, 2024 9:59 PM GMT. For instance, I am thinking about the Munk Debates which in 2023 tackled AI x-risk. I don't see how adding more people to a 1v1 debate makes it better in any way. One of the major frustrations with debates is that it is difficult to get the participants to respond to each other. The goal would be...
Where should I look for information on gut health? — LessWrong

lesswrong.com

Published on August 20, 2024 7:44 PM GMT. I've been on a gut health kick, reading Brain Maker, adding more kale for the insoluble fiber, and cutting seed oils and sugar. The author recommends probiotic enemas and fecal transplant, but I've seen mixed information on effectiveness of such treatments, and I also read a concerning article by a professor that seemed to indicate that bacteria from...
Would you benefit from, or object to, a page with LW users' reacts? — LessWrong

lesswrong.com

Published on August 20, 2024 4:35 PM GMT. There is currently an admin-only page that shows a list of all comments that have been reacted to (in chronological order). Periodically I think "it might just be nice to show this to everyone, and to let them filter by individual reacts, or individual users." The reason individual reacts might be nice is to filter for things like...
AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work — LessWrong

lesswrong.com

Published on August 20, 2024 4:22 PM GMT. We wanted to share a recap of our recent outputs with the AF community. Below, we fill in some details about what we have been working on, what motivated us to do it, and how we thought about its importance. We hope that this will help people build off things we have done and see how their...
Trying to be rational for the wrong reasons — LessWrong

lesswrong.com

Published on August 20, 2024 4:18 PM GMT. Rationalists are people who have an irrational preference for rationality. This may sound silly, but when you think about it, it couldn't be any other way. I am not saying that all reasons in favor of rationality are irrational -- in fact, there are many rational reasons to be rational! It's just that "rational reasons to be rational"...
Vilnius – ACX Meetups Everywhere Fall 2024 — LessWrong

lesswrong.com

Published on August 19, 2024 5:38 PM GMT. Hey folks, We're organizing an ACX meetup in Vilnius this September. We're meeting on Sunday the 22nd, at 3pm in Lukiškių Aikštė. We are few, yet we are mighty. RSVPs are optional. Join us here: https://discord.com/invite/jVtnuJep
A primer on why computational predictive toxicology is hard — LessWrong

lesswrong.com

Published on August 19, 2024 5:16 PM GMT. Introduction: There are now (claimed) foundation models for protein sequences, DNA sequences, RNA sequences, molecules, scRNA-seq, chromatin accessibility, pathology slides, medical images, electronic health records, and clinical free-text. It’s a dizzying rate of progress. But there’s a few problems in biology that, interestingly enough, have evaded a similar level of ML progress, despite there seemingly being all the necessary...
Can Current LLMs be Trusted To Produce Paperclips Safely? — LessWrong

lesswrong.com

Published on August 19, 2024 5:17 PM GMT. There's a browser-based game about paperclip maximization: Universal Paperclips, where you control an AI tasked with running a paperclip factory. Initially, you buy raw material, adjust the price per clip, and invest in "AutoClippers" to increase production. The game dramatically increases in complexity as it progresses, allowing you to invest resources and computational cycles into exponentially increasing production...
