RAFT: A Real-World Few-Shot Text Classification Benchmark

Large pre-trained language models have shown promise for few-shot learning, completing text-based tasks given only a few task-specific examples. Will models soon solve classification tasks that have so far been reserved for human research assistants? Existing benchmarks are not designed to measure progress in applied settings, and so don't directly answer this question. The RAFT benchmark (Real-world Annotated Few-shot Tasks) focuses on naturally occurring tasks and uses an evaluation setup that mirrors deployment. Baseline evaluations on RAFT reveal areas current techniques struggle with: reasoning over long texts and tasks with many classes. Human baselines show that some classification tasks are difficult for non-expert humans, reflecting that real-world value sometimes depends on domain expertise. Yet even non-expert human baseline F1 scores exceed GPT-3 by an average of 0.11. The RAFT datasets and leaderboard will track which model improvements translate into real-world benefits.

Theme

Computer Science

Date

September 28, 2021

Authors

Neel Alex, Eli Lifland, Lewis Tunstall, Abhishek Thakur, Pegah Maham, C. Jess Riedel, Emmie Hine, Carolyn Ashurst, Paul Sedille, Alexis Carlier, Michael Noetel, Andreas Stuhlmüller

Further reading

Computer Science

Open Problems in Technical AI Governance

July 2024

Anka Reuel, Ben Bucknall, et al.

Computer Science

Recent Trends in China's Large Language Model Landscape

April 2023

GovAI Report

Jeffrey Ding, Jenny Xiao

Computer Science

Exploring the Relevance of Data Privacy-Enhancing Technologies for AI Governance Use Cases

March 2023

Emma Bluemke, Tantum Collins, Ben Garfinkel, Andrew Trask

Computer Science

AI Ethics Statements: Analysis and lessons learnt from NeurIPS Broader Impact Statements

May 2022

FAccT 2022

Carolyn Ashurst, Emmie Hine, Paul Sedille, Alexis Carlier
