A small eval loop for the humanizer skill
A case study in using Caliper to evaluate blader/humanizer, tighten voice calibration, and turn the improvement into an upstream contribution with regression coverage.
Tagged: agents, evaluation, skills, writing