Posts by Collection

portfolio

LMGame-GamingAgent

LLM/VLM gaming agents and model evaluation through games.

LMGame-Multi-Turn-RL-Training

RL training of LLMs/VLMs in multi-turn environments.

LMGame‑Leaderboard

Side‑by‑side leaderboards for Model (no harness) and Agent (harness‑enabled) performance.

LMGame‑Website

Official hub for LMGame resources, docs, and blog updates.

publications

lmgame‑Bench: How Good are LLMs at Playing Games?

Published on arXiv (submitted to NeurIPS ’25), 2025

Introduces lmgame‑Bench, a unified Gym‑style benchmark that tests LLM agents across platformer, puzzle, and narrative games—addressing vision brittleness, prompt variance, and data contamination.

Download Paper

General Modular Harness for LLM Agents in Multi‑Turn Gaming Environments

Published in ICML MAS Workshop, 2025

Introduces a perception–memory–reasoning harness that consistently boosts LLM/VLM gameplay across classic and modern game suites, uncovering module‑specific performance patterns.

Download Paper