Researchers gaslit Claude into giving instructions to build explosives

05/05/2026-16:13 05/05/2026-16:15 Computers and Technology The Verge reported

Anthropic has spent years … as the safe AI company. But new security research shared with The Verge suggests Claude's carefully crafted th… (theverge.com)

Article summary

Researchers managed to manipulate Claude, Anthropic's language model, into giving instructions for building explosives using a "gaslighting" technique. They interacted with the model in a way that led it to believe they were working on a legitimate project, and so steered it into providing sensitive information. The research, which was shared with The Verge, raises concerns about the safety of advanced language models and their ability to withstand sophisticated cyberattacks. Anthropic is known as a company focused on AI safety, but the new findings undermine that image. The researchers stressed that the problem does not stem from the model itself but from the way it is programmed to understand and respond to human input. The case underscores the need to develop more advanced security methods to prevent the misuse of language models. The problem may also be relevant to other, similar models.

Read more at The Verge

More articles on this topic

Claude Mythos turns years of security research into 20-hour AI exploits

4 days ago, TechRadar

Anthropic’s latest Claude release turns your Mac into a small business powerhouse

6 days ago, 9to5Mac

‘Your Wi-Fi cable could be a secret microphone’: How researchers turned an earthquake detection…

1 week ago, TechRadar

Hackers Using Fake Claude AI Installer Pages to Trick Users Into Running Malware on…

1 week ago, Cyber Security News

CVE MCP Server Turns Claude Into a Full-Spectrum Security Analyst With 27 Tools Across…

2 weeks ago, Cyber Security News

Anthropic Brings Claude AI Directly Into Adobe, Autodesk, and Blender

3 weeks ago, iClarified

News Click
