Underlying cause of prompt-masking degradation in K-FAC for CrispEdit
Determine why masking prompt tokens during the Kronecker-Factored Approximate Curvature (K-FAC) calculation in CrispEdit’s low-curvature projection for large language model editing yields suboptimal performance, and ascertain whether the relaxed assumption of token independence in that calculation is responsible for the degradation.
References
We found that masking prompt tokens for K-FAC calculation (mirroring the fine-tuning setup) yielded suboptimal performance, even with a larger number of tokens (\cref{tab:prompt_masking_ablation}). Instead, in our K-FAC calculation for edit samples, we calculate the next token prediction loss over the entire prompt–target sequence. While we are not sure about the underlying cause of this behavior, we suspect that it arises from our relaxed assumption of token independence during K-FAC calculation.
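To make the two setups being compared concrete, the following is a minimal toy sketch, not CrispEdit's implementation. It estimates K-FAC factors for a single linear layer, once with a loss mask that keeps only target tokens and once over the entire prompt–target sequence. The function names, the random stand-in data, and the choice to drop masked tokens from both factors are assumptions made for illustration; the excerpt does not specify how masked positions are handled. Each kept token is treated as an independent sample, which is one common reading of a token-independence assumption.

```python
import random


def mean_outer(vectors):
    """Average of v v^T over a list of equal-length vectors."""
    n, d = len(vectors), len(vectors[0])
    out = [[0.0] * d for _ in range(d)]
    for v in vectors:
        for i in range(d):
            for j in range(d):
                out[i][j] += v[i] * v[j] / n
    return out


def kfac_factors(acts, grads, loss_mask):
    """Toy K-FAC factors (A, G) for one linear layer.

    acts[t]      : layer input at token t (hypothetical toy data)
    grads[t]     : loss gradient w.r.t. the layer output at token t
    loss_mask[t] : 1 if token t contributes to the loss, else 0

    Assumption: every kept token counts as an independent sample,
    and masked tokens are dropped from both factors.
    """
    kept = [t for t, m in enumerate(loss_mask) if m]
    if not kept:
        raise ValueError("loss_mask removes every token")
    A = mean_outer([acts[t] for t in kept])   # input-side factor
    G = mean_outer([grads[t] for t in kept])  # output-gradient factor
    return A, G, len(kept)


# Random stand-ins for one edit sample: 4 prompt tokens, 2 target tokens.
rng = random.Random(0)
num_prompt, num_target, d_in, d_out = 4, 2, 3, 2
seq_len = num_prompt + num_target
acts = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(seq_len)]
grads = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(seq_len)]

prompt_masked = [0] * num_prompt + [1] * num_target  # fine-tuning-style mask
full_sequence = [1] * seq_len                        # entire prompt-target sequence

A_masked, G_masked, n_masked = kfac_factors(acts, grads, prompt_masked)
A_full, G_full, n_full = kfac_factors(acts, grads, full_sequence)
print("tokens used (masked, full):", n_masked, n_full)
```

In this toy, the masked estimate of the input-side factor is built from only two outer products, so its rank is at most 2 in a 3-dimensional input space, while the full-sequence estimate draws on all six tokens. That difference alone does not account for the reported degradation, since the excerpt notes that masking remained suboptimal even with a larger number of tokens.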