GitHub games are open-source projects for testing gameplay ideas, sharing code, and collaborating publicly outside ...
Ready-to-use configurations for Anthropic's Claude Code. A comprehensive collection of AI agents, custom commands, settings, hooks, external integrations (MCPs), and project templates to enhance your ...
Evaluation allows us to assess how a given model performs on a set of specific tasks. This is done by running a set of standardized benchmark tests against the model. Running evaluation ...
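The evaluation idea above can be sketched as a simple scoring loop. This is a minimal illustration only, not any particular framework's API: the `model` callable and the benchmark items are hypothetical placeholders standing in for a real model and a real standardized test set.

```python
# Minimal sketch of a benchmark-style evaluation loop.
# `model` and the benchmark items below are hypothetical placeholders.

def evaluate(model, benchmark):
    """Score a model against a list of (prompt, expected) pairs.

    Returns the fraction of prompts the model answered correctly.
    """
    correct = 0
    for prompt, expected in benchmark:
        if model(prompt) == expected:
            correct += 1
    return correct / len(benchmark)

# Toy example: a "model" that uppercases its input,
# scored against three tasks (it gets two of three right).
toy_model = str.upper
toy_benchmark = [("abc", "ABC"), ("hi", "HI"), ("x", "y")]
score = evaluate(toy_model, toy_benchmark)
print(score)
```

In practice the per-item check would be task-specific (exact match, F1, pass@k, and so on), but the overall shape — iterate over a fixed task set, compare outputs to references, aggregate a score — is the same.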