'Current LLMs introduce substantial errors when editing work documents': Microsoft scientists find most AI models struggle with long-running tasks — so maybe don't trust them completely just yet

12/05/2026-18:35 12/05/2026-18:40 Computers and Technology, reported by TechRadar

The more interactions an AI model has, the less reliable it becomes, experts find: even the best scored only 80.9%, and the worst just 10.0%.

Article summary

Microsoft scientists found that AI models struggle with long-running tasks that involve many interactions. According to the study, the more interactions an AI model carries out, the less reliable it becomes. Even the best models scored only 80.9%, while the worst scored just 10.0%. The findings underline that AI models, including large language models (LLMs), still struggle to edit work documents and introduce many errors. The results suggest that these models should be treated with caution and not trusted completely, especially for complex, long-running tasks. The study also highlights the need for improved, more reliable models. For now, overreliance on these models is best avoided, as they are not yet mature enough for unsupervised use.

Read more on TechRadar

More articles on this topic

Android I/O: Surfshark's Alternative ID will make Android 17's 'Spoofing Protection' feature even better…

3 minutes ago, TechRadar

The iPhone Ultra could be a surprise hit as new survey suggests potentially millions…

13 minutes ago, TechRadar

I was already convinced I'd buy the Steam Controller when available — but this…

18 minutes ago, TechRadar

How to watch Coppa Italia final 2026: Free streams and TV channels for Lazio…

38 minutes ago, TechRadar

Google Fitbit Air vs Whoop: Should you get Fitbit's new screenless tracker or opt…

48 minutes ago, TechRadar

The FBI just remotely reset thousands of home and small office routers – and…

58 minutes ago, TechRadar

