TY - GEN
T1 - Standardize: Aligning Language Models with Expert-Defined Standards for Content Generation
T2 - 2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024
AU - Imperial, Joseph Marvin
AU - Forey, Gail
AU - Madabushi, Harish Tayyar
PY - 2024/11/30
Y1 - 2024/11/30
AB - Domain experts across engineering, healthcare, and education follow strict standards for producing quality content such as technical manuals, medication instructions, and children's reading materials. However, current work in controllable text generation has yet to explore using these standards as references for control. Towards this end, we introduce Standardize, a retrieval-style, in-context learning-based framework that guides large language models to align with expert-defined standards. Focusing on English language standards in the education domain as a use case, we consider the Common European Framework of Reference for Languages (CEFR) and the Common Core Standards (CCS) for the task of open-ended content generation. Our findings show that models can gain a 40% to 100% increase in precise accuracy for Llama2 and GPT-4, respectively, demonstrating that extracting knowledge artifacts from standards and integrating them into the generation process can effectively guide models to produce better standard-aligned content.
KW - cs.CL
UR - http://www.scopus.com/inward/record.url?scp=85217812536&partnerID=8YFLogxK
U2 - 10.18653/v1/2024.emnlp-main.94
DO - 10.18653/v1/2024.emnlp-main.94
M3 - Chapter in a published conference proceeding
T3 - EMNLP 2024 - 2024 Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
SP - 1573
EP - 1594
BT - Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
A2 - Al-Onaizan, Yaser
A2 - Bansal, Mohit
A2 - Chen, Yun-Nung
PB - Association for Computational Linguistics
CY - Miami, Florida, U.S.A.
Y2 - 12 November 2024 through 16 November 2024
ER -