~www_lesswrong_com | Bookmarks (664)

Michael Dickens' Caffeine Tolerance Research — LessWrong

lesswrong.com

Published on September 4, 2024 3:41 PM GMT. Michael Dickens has read the research and performed two self-experiments on whether consuming caffeine builds up tolerance, and if yes, how quickly. First literature review: What if instead of taking caffeine every day, you only take it intermittently, say, once every 3 days? How often can most people take caffeine without developing a tolerance? The scientific literature on...
1
LW editor bug? — LessWrong

lesswrong.com

Published on September 4, 2024 2:58 PM GMT. Currently within My Drafts in my LW user space, when I select "Archive this draft" in my user profile that works fine. But when I try to revert the archived post with "Restore this draft" it does nothing. Is this a bug, or am I missing something? Apologies if this is the wrong place to post this question....
1
Are UV-C Air purifiers so useful? — LessWrong

lesswrong.com

Published on September 4, 2024 2:16 PM GMT. Does anyone know good practical research about the effect sizes of different UV-C air purifier measures and ventilation? Primarily interested in school and office environments. Imagine you only have full coverage of a few rooms. Or partial coverage of all rooms. Or "very good" coverage of the whole building?
1
AI and the Technological Richter Scale — LessWrong

lesswrong.com

Published on September 4, 2024 2:00 PM GMT. The Technological Richter scale is introduced about 80% of the way through Nate Silver’s new book On the Edge. A full review is in the works (note to prediction markets: this post alone does NOT on its own count as a review, but this counts as part of a future review), but this concept seems highly useful,...
1
Is there any rigorous work on using anthropic uncertainty to prevent situational awareness / deception? — LessWrong

lesswrong.com

Published on September 4, 2024 12:40 PM GMT. AI systems up to some high level of intelligence plausibly need to know exactly where they are in space-time in order for deception/"scheming" to make sense as a strategy. This is because they need to know: 1) what sort of oversight they are subject to and 2) what effects their actions will have on the real world (side note: Acausal trade might...
1
Announcing the Ultimate Jailbreaking Championship — LessWrong

lesswrong.com

Published on September 4, 2024 12:35 AM GMT. Gray Swan AI is hosting an LLM jailbreaking championship, offering $40,000 in bounties. Official Website: https://app.grayswan.ai/arena Pre-Registration Form: https://app.grayswan.ai/arena#registration Overview: In this competition, participants will be given a chat interface where they can interact with 25 anonymized models, along with a small list of harmful behaviors. The goal will be to find prompts ("jailbreaks") that make the models comply with these...
1
AI Safety at the Frontier: Paper Highlights, August '24 — LessWrong

lesswrong.com

Published on September 3, 2024 7:17 PM GMT. This is a selection of AI safety paper highlights in August 2024, from my blog "AI Safety at the Frontier". The selection primarily covers ML-oriented research. It's only concerned with papers (arXiv, conferences etc.), not LessWrong or Alignment Forum posts. As such, it should be a nice addition for people primarily following the forum, who might otherwise...
1
The Checklist: What Succeeding at AI Safety Will Involve — LessWrong

lesswrong.com

Published on September 3, 2024 6:18 PM GMT. Crossposted by habryka with Sam's permission. Expect lower probability for Sam to respond to comments here than if he had posted it (he said he'll be traveling a bunch in the coming weeks, so might not have time to respond to anything). Preface: This piece reflects my current best guess at the major goals that Anthropic (or another similarly...
1
Survey: How Do Elite Chinese Students Feel About the Risks of AI? — LessWrong

lesswrong.com

Published on September 2, 2024 6:11 PM GMT. Intro: In April 2024, my colleague and I (both affiliated with Peking University) conducted a survey involving 510 students from Tsinghua University and 518 students from Peking University, China's two top academic institutions. Our focus was on their perspectives regarding the frontier risks of artificial intelligence. In the People’s Republic of China (PRC), publicly accessible survey data on AI is...
1
Data-driven donations to help Democrats win federal elections: an update — LessWrong

lesswrong.com

Published on September 2, 2024 4:32 PM GMT. Linking to an update to an earlier post about how to make effective donations for the upcoming election. There are groups that can turn out 5x as many voters per dollar as the official campaigns can, and I want everyone to know and talk about the cool work they're doing! This late in the cycle, most donations...
1
My decomposition of the alignment problem — LessWrong

lesswrong.com

Published on September 2, 2024 12:21 AM GMT. Epistemic status: Exploratory. Summary: In this post I will decompose the alignment problem into subproblems and frame existing approaches in terms of their relations to the subproblems. I will try to place a larger focus on the epistemic process as opposed to results of this particular problem factorization, where the aim is to obtain an epistemic strategy that can...
1
What are the effective utilitarian pros and cons of having children (in rich countries)? — LessWrong

lesswrong.com

Published on September 2, 2024 10:01 AM GMT. I have one child and do not want more, so I am not seeking personal advice here. But I am interested in the general ethical question: From an effective utilitarian viewpoint, what are the arguments for and against having children? And if we do choose to have children, what are the arguments for having few vs....
1
A primer on the next generation of antibodies — LessWrong

lesswrong.com

Published on September 1, 2024 10:37 PM GMT. Introduction: If you want a primer on antibodies, I recommend reading my last post! This one will contain some jargon that the other post will explain. It's important to remember that antibodies aren't inherently special; proteins are just strings of amino acids, and the shape of a protein is all that (mostly) matters. One can imagine a world in...
1
Who looked into extreme nuclear meltdowns? — LessWrong

lesswrong.com

Published on September 1, 2024 9:38 PM GMT
1
Redundant Attention Heads in Large Language Models For In Context Learning — LessWrong

lesswrong.com

Published on September 1, 2024 8:08 PM GMT. In this post, I claim a few things and offer some evidence for these claims. Among these things are: language models have many redundant attention heads for a given task; in-context learning works through addition of features, which are learnt through Bayesian updates; the model likely breaks down the task into various subtasks, and each of...
1
Book Review: What Even Is Gender? — LessWrong

lesswrong.com

Published on September 1, 2024 4:09 PM GMT. I submitted this review to the 2024 ACX book review contest, but it didn't make the cut, so I'm putting it here instead for posterity. Conspiracy theories are fun because of how they make everything fit together, and scratch the unbearable itch some of us get when there are little details of a narrative that just don’t make...
1
Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024) — LessWrong

lesswrong.com

Published on September 1, 2024 7:46 AM GMT. Yoshua Bengio wrote a blogpost about a new AI safety paper by him, various collaborators, and me. I've pasted the text below, but first here are a few comments from me aimed at an AF/LW audience. The paper is basically maths plus some toy experiments. It assumes access to a Bayesian oracle that can infer a posterior over...
1
San Francisco ACX Meetup “First Saturday” — LessWrong

lesswrong.com

Published on September 1, 2024 4:48 AM GMT. Date: Saturday, September 7th, 2024. Time: 1 pm – 3 pm PT. Address: Yerba Buena Gardens in San Francisco, just outside the Metreon food court, coordinates 37°47'04.4"N 122°24'11.1"W. Contact: 34251super@gmail.com. Come join San Francisco’s First Saturday (or SFFS – easy to remember, right?) ACX meetup. Whether you're an avid reader, a first time reader, or just a curious soul, come meet! We will...
1
Epistemic states as a potential benign prior — LessWrong

lesswrong.com

Published on August 31, 2024 6:26 PM GMT. Malignancy in the prior seems like a strong crux of the goal-design part of alignment to me. Whether your prior is going to be used to model: processes in the multiverse containing a specific "beacon" bitstring, processes in the multiverse containing the AI, processes which would output all of my blog, so I can make it output...
1
My Model of Epistemology — LessWrong

lesswrong.com

Published on August 31, 2024 5:01 PM GMT. I regularly get asked by friends and colleagues for recommendations of good resources to study epistemology. And whenever that happens, I make an internal (or external) "Eeehhh" pained sound. For I can definitely point to books and papers and blog posts that inspired me, excited me, and shaped my world view on the topic. But there is no single...
1
Verification methods for international AI agreements — LessWrong

lesswrong.com

Published on August 31, 2024 2:58 PM GMT. TLDR: A new paper summarizes some verification methods for international AI agreements. See also summaries on LinkedIn and Twitter. Several co-authors and I are currently planning some follow-up projects about verification methods. There are also at least 2 other groups planning to release reports on verification methods. If you have feedback or are interested in getting involved, please...
1
Fake Blog Posts as a Problem Solving Device — LessWrong

lesswrong.com

Published on August 31, 2024 9:22 AM GMT. This is a very brief post about a simple problem solving strategy I sometimes find useful, that may be worth trying for people who have never done it. This is the strategy: When struggling with some difficult problem X, I often find it helpful to write a blog post titled “How I Solved X” or “How I Managed to...
1
Anthropic is being sued for copying books to train Claude — LessWrong

lesswrong.com

Published on August 31, 2024 2:57 AM GMT. OpenAI faces 10 copyright lawsuits and Anthropic is starting to get sued as well. Whether or not you agree with copyright, this is worth looking into. Recent lawsuits could hinder AI companies from scaling further. The recent filing against Anthropic is notable because the plaintiffs have evidence of Anthropic copying their works. Because they were able to...
1
Can Large Language Models effectively identify cybersecurity risks? — LessWrong

lesswrong.com

Published on August 30, 2024 8:20 PM GMT. TL;DR: I was interested in the ability of LLMs to discriminate input scenarios/stories that carry high vs low cyber risk, and found that it is one of the “hidden features” present in most later layers of Mistral7B. I developed and analyzed “linear probes” on hidden activations, and found confidence that the model generally “senses when something is up”...
1
