Optimizing Abstractive Arabic Summarization via RLHF and DPO with Llama 2
DOI:
https://doi.org/10.14232/actacyb.316434
Keywords:
abstractive summarization, Arabic, reinforcement learning, Direct Preference Optimization, RLHF, DPO, Llama 2
Abstract
Given the advantages observed with Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) in English, it is promising to explore their effectiveness for abstractive summarization in languages with complex morphological and syntactic features, such as Arabic. In this study, we fine-tune the Llama 2 model and find that it substantially improves summarization results. We show that Llama 2, combined with RLHF and DPO, markedly improves the quality of abstractive Arabic summarization, a challenging task. The AraSum corpus also plays a critical role in achieving these results, which highlights its value for improving summarization models. Although this work focuses on Arabic, the techniques and insights presented are language-agnostic and can be applied to abstractive summarization in other languages.
License
Copyright (c) 2026 Acta Cybernetica

This work is licensed under a Creative Commons Attribution 4.0 International License.

