Researchers seek LLM baseline in human interactions

Researchers backed by a trio of leading US universities proposed a mechanism for evaluating the capabilities of large language models (LLMs) based on human psychology, addressing what they stated is a major problem in benchmarking caused by diverse use cases.

In a study of whether LLMs function as people expect, researchers funded by Harvard University, Massachusetts Institute of Technology (MIT) and the University of Chicago devised a method to evaluate how human generalisations shape people's assessments of AI technologies.

MIT explained that people "form beliefs" about what others "do and do not know" during interactions, a principle then carried into assessments of how well an LLM performs.

Researchers developed a human generalisation function by “asking questions, observing how a person or LLM responds and then making inferences about how that person or model would respond to related questions”.

If an LLM shows it can handle a complex subject, people will expect it to be proficient in related, less-complicated areas.

Models which fall short of this belief “could fail when deployed”, MIT stated.

Baseline
A survey asking participants whether they believed a person or an LLM would answer related questions correctly or incorrectly yielded "a dataset of nearly 19,000 examples of how humans generalise about LLM performance across 79 diverse tasks".

The survey found participants were less able to generalise about how LLMs would perform than about how other people would, a finding researchers believe could influence how models are deployed in future.

Alex Imas, professor of behavioural science and economics at the University of Chicago's Booth School of Business, said the research highlighted a "critical issue with deploying LLMs for general consumer use", because people may be put off using the models if they do not fully understand when responses will be accurate.

Imas added the study also provides something of a fundamental baseline for assessing LLM performance, specifically whether they “understand the problem they are solving” when giving correct answers, in turn helping to improve performance in real-world scenarios.

The post Researchers seek LLM baseline in human interactions appeared first on Mobile World Live.
