Anthral Labs — Open weights
LoRA adapters fine-tuned with GRPO on top of Qwen/Qwen3.5-9B-Base. Trained on retrieval-grounded forecasting questions derived from GDELT events, with strict leakage controls.
Phase 1 — binary forecasting
qwen3.5-9b-gdelt-binary-grpo-phase1
GRPO on 1,015 binary yes/no forecasting questions, with paranoid retrieval: both explicit and implied leakage stripped from the retrieved context. LoRA r=16, α=32; 500 training steps.
- adapter_model.safetensors (111.0 MiB)
- adapter_config.json (1.1 KiB)
- README.md (5.1 KiB)
- tokenizer.json (19.1 MiB)
- tokenizer_config.json (1.2 KiB)
- chat_template.jinja (7.6 KiB)
- processor_config.json (1.2 KiB)
- training_args.bin (7.0 KiB)
Phase 2 — mixed binary & free-form
qwen3.5-9b-gdelt-mixed-grpo-phase2
Continued from the Phase 1 adapter. GRPO on a mixed set of binary and free-form forecasting questions, with the same paranoid retrieval and LoRA configuration as Phase 1.
- adapter_model.safetensors (111.0 MiB)
- adapter_config.json (1.1 KiB)
- README.md (5.1 KiB)
- tokenizer.json (19.1 MiB)
- tokenizer_config.json (1.2 KiB)
- chat_template.jinja (7.6 KiB)
- processor_config.json (1.2 KiB)
- training_args.bin (7.0 KiB)
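Both phases are trained with GRPO on forecasting rewards. The release does not document the reward function; purely as an illustration, a Brier-style reward for binary questions could look like:

```python
def binary_forecast_reward(p_yes: float, outcome: int) -> float:
    """Brier-style reward in [0, 1]: 1 minus squared error.

    p_yes: predicted probability of 'yes'; outcome: 1 if the event
    happened, else 0. Illustrative only -- the actual GRPO reward
    used for these adapters is not documented.
    """
    return 1.0 - (p_yes - outcome) ** 2

print(binary_forecast_reward(0.9, 1))  # 0.99
```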
Quick load
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, then attach the LoRA adapter on top.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-9B-Base",
    torch_dtype="bfloat16",
    device_map="auto",
)
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-9B-Base")
model = PeftModel.from_pretrained(base, "./qwen3.5-9b-gdelt-binary-grpo-phase1")
```