Optimizing Abstractive Arabic Summarization via RLHF and DPO with Llama 2

Authors

M. Kahla, Z. G. Yang

DOI:

https://doi.org/10.14232/actacyb.316434

Keywords:

abstractive summarization, Arabic, reinforcement learning, Direct Preference Optimization, RLHF, DPO, Llama 2

Abstract

Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) have shown clear advantages for English, which makes it promising to explore their effectiveness for abstractive summarization in languages with complex morphological and syntactic features, such as Arabic. In this study, we fine-tune the Llama 2 model for this task and find that it substantially improves summarization results. We show that Llama 2, combined with RLHF and DPO, markedly improves the quality of abstractive Arabic summarization. The AraSum corpus also plays a critical role in achieving these results, which underlines its value for training summarization models. Although this work focuses on Arabic, the techniques and insights presented are language-agnostic and can be applied to abstractive summarization in other languages.

Published

2026-06-02

How to Cite

Kahla, M., & Yang, Z. G. (2026). Optimizing Abstractive Arabic Summarization via RLHF and DPO with Llama 2. Acta Cybernetica. https://doi.org/10.14232/actacyb.316434

Section

Special Issue of the 21st Conference on Hungarian Computational Linguistics
