Anthral Labs — Open weights

Anthral Research.

LoRA adapters fine-tuned with GRPO on top of Qwen/Qwen3.5-9B-Base. Trained on retrieval-grounded forecasting questions derived from GDELT events, with strict leakage controls.


I.

Phase 1 — binary forecasting

qwen3.5-9b-gdelt-binary-grpo-phase1

GRPO on 1,015 binary yes/no forecasting questions. Paranoid retrieval — documents flagged for explicit or implied leakage are stripped. LoRA r=16, α=32. 500 steps.
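For reference, the stated adapter hyperparameters (r=16, α=32) map onto a PEFT `LoraConfig` like the sketch below. The `target_modules` and task type are assumptions — the card does not list them:

```python
from peft import LoraConfig

# Sketch of the stated LoRA setup: r=16, alpha=32.
# target_modules is an assumption (attention projections are a
# common default); the card does not specify which modules were adapted.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```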

II.

Phase 2 — mixed binary & free-form

qwen3.5-9b-gdelt-mixed-grpo-phase2

Continued from Phase 1. GRPO on a mixed set of binary and free-form forecasting questions, paranoid retrieval. Same LoRA configuration.


Quick load

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the Phase 1 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base", torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = PeftModel.from_pretrained(base, "./qwen3.5-9b-gdelt-binary-grpo-phase1")
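The card does not specify an output format for the binary questions. As an illustration only, assuming the model ends its generation with an explicit "Yes"/"No", a caller might map the text to a forecast with a hypothetical helper like this:

```python
import re

def parse_binary_answer(text: str):
    """Hypothetical helper: map generated text to a yes/no forecast.

    The card does not define an answer format, so this simply takes
    the last standalone "yes" or "no" token, case-insensitively.
    Returns True for yes, False for no, None if neither appears.
    """
    matches = re.findall(r"\b(yes|no)\b", text.lower())
    if not matches:
        return None
    return matches[-1] == "yes"
```

This keeps decoding and scoring decoupled: however the prompt is phrased, the resolved forecast is a plain boolean (or None when the generation is inconclusive).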