Abstract
Event extraction (EE) aims to identify event triggers and their corresponding arguments in unstructured text, providing structured knowledge essential for many downstream tasks. Despite the success of instruction-tuned large language models (LLMs) on this task, current methods often produce outputs with inconsistent formats, semantic drift from the source text, and event types that deviate from predefined schemas. These issues arise partly because supervised fine-tuning relies on static loss functions that fail to reflect task-specific objectives such as schema alignment. To address these limitations, we introduce a reinforcement learning framework based on Group Relative Policy Optimization (GRPO) that optimizes instruction-tuned LLMs for event trigger and argument extraction. We propose three complementary reward functions: a format reward to enforce syntactic and structural validity, a BM25-based reward to enhance lexical and semantic consistency with the input text, and a task-specific supervision reward that directly aligns optimization with task-level performance. Extensive experiments on three standard EE datasets demonstrate that our approach consistently and significantly improves EE performance over strong baselines.
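To make the three rewards concrete, the following is a minimal sketch of how they might be computed and combined before being fed to GRPO. It assumes a JSON output format with `trigger`, `event_type`, and `arguments` fields; the function names, schema, squashing, and reward weights are illustrative assumptions, not the paper's released implementation.

```python
# Hypothetical sketch of the three rewards described in the abstract.
# The JSON schema, helper names, and weights are assumptions for illustration.
import json
import math
from collections import Counter

def format_reward(output: str) -> float:
    """1.0 if the generation parses as a JSON list of events with the
    expected keys (syntactic and structural validity), else 0.0."""
    try:
        events = json.loads(output)
        ok = isinstance(events, list) and all(
            isinstance(e, dict) and {"trigger", "event_type", "arguments"} <= e.keys()
            for e in events
        )
        return 1.0 if ok else 0.0
    except ValueError:
        return 0.0

def bm25_reward(output: str, source: str, k1: float = 1.5, b: float = 0.75) -> float:
    """BM25 of the generated text (query) against the source passage
    (document), with IDF estimated over the passage's sentences; higher
    scores reward outputs that stay lexically grounded in the input."""
    sents = [s.split() for s in source.lower().split(".") if s.strip()]
    doc = source.lower().split()
    n, avgdl = len(sents), sum(len(s) for s in sents) / max(1, len(sents))
    tf = Counter(doc)
    score = 0.0
    for t in set(output.lower().split()):
        df = sum(1 for s in sents if t in s)
        idf = math.log((n - df + 0.5) / (df + 0.5) + 1.0)  # Lucene-style, >= 0
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score / (1.0 + score)  # squash into [0, 1)

def task_reward(output: str, gold: list[dict]) -> float:
    """Task-level supervision: micro-F1 of predicted (trigger, event_type)
    pairs against gold annotations."""
    try:
        pred = {(e["trigger"], e["event_type"]) for e in json.loads(output)}
    except (ValueError, TypeError, KeyError):
        return 0.0
    ref = {(e["trigger"], e["event_type"]) for e in gold}
    tp = len(pred & ref)
    if not tp:
        return 0.0
    p, r = tp / len(pred), tp / len(ref)
    return 2 * p * r / (p + r)

def total_reward(output: str, source: str, gold: list[dict],
                 w: tuple = (0.2, 0.2, 0.6)) -> float:
    """Weighted combination used as the GRPO reward; weights are illustrative."""
    return (w[0] * format_reward(output)
            + w[1] * bm25_reward(output, source)
            + w[2] * task_reward(output, gold))
```

Under GRPO, this scalar reward would be computed for each sampled completion in a group, and advantages derived from the group-relative reward statistics; the weighting of the three terms is a design choice not specified here.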