Anthral Labs — Open weights

Anthral Research.

LoRA adapters fine-tuned with GRPO on top of Qwen/Qwen3.5-9B-Base. Trained on retrieval-grounded forecasting questions derived from GDELT events, with strict leakage controls.


I.

Phase 1 — binary forecasting

qwen3.5-9b-gdelt-binary-grpo-phase1

GRPO on 1,015 binary yes/no forecasting questions. Paranoid retrieval — documents flagged for explicit or implied leakage are stripped. LoRA r=16, α=32. 500 steps.
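For reference, the stated adapter hyperparameters (r=16, α=32) map onto a PEFT `LoraConfig` like the sketch below. The `target_modules` and task type are assumptions — the card does not list them:

```python
from peft import LoraConfig

# Sketch of the stated LoRA setup: r=16, alpha=32.
# target_modules is an assumption (attention projections are a
# common default); the card does not specify which modules were adapted.
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```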

II.

Phase 2 — mixed binary & free-form

qwen3.5-9b-gdelt-mixed-grpo-phase2

Continued from Phase 1. GRPO on a mixed set of binary and free-form forecasting questions, paranoid retrieval. Same LoRA configuration.


Quick load

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the Phase 1 LoRA adapter.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base", torch_dtype=torch.bfloat16, device_map="auto"
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = PeftModel.from_pretrained(base, "./qwen3.5-9b-gdelt-binary-grpo-phase1")
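The card does not specify an output format for the binary questions. As an illustration only, assuming the model ends its generation with an explicit "Yes"/"No", a caller might map the text to a forecast with a hypothetical helper like this:

```python
import re

def parse_binary_answer(text: str):
    """Hypothetical helper: map generated text to a yes/no forecast.

    The card does not define an answer format, so this simply takes
    the last standalone "yes" or "no" token, case-insensitively.
    Returns True for yes, False for no, None if neither appears.
    """
    matches = re.findall(r"\b(yes|no)\b", text.lower())
    if not matches:
        return None
    return matches[-1] == "yes"
```

This keeps decoding and scoring decoupled: however the prompt is phrased, the resolved forecast is a plain boolean (or None when the generation is inconclusive).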