
Open-source AI models: what is actually open?

Everyone calls their model open-source. Here's a ranked breakdown of what's actually public — weights, inference, training code, data, post-training, and license — for the models that matter right now.

The phrase "open-source model" has become marketing shorthand. In practice, a release can give you downloadable weights and a smooth inference path while hiding everything you'd actually need to reproduce the system — the training data, the recipes, the post-training steps. That's not open-source. That's open-weight, and the distinction matters.

Open model research

Ranked by openness first, not just by benchmark strength.

Ranked by the Artificial Analysis Openness Index as of March 9, 2026. The goal isn't to pick a winner — it's to separate what you can actually reproduce from what you can only run.

Metric key

AA Rank

The model's position on the Artificial Analysis Openness Index.

AA Open

Normalized openness score. Higher means more of the stack is public and reproducible.

AA Int.

Normalized capability score. Shows where openness and raw performance diverge.

Status legend

Open: released and usable. Partial: described but not fully reproducible. Closed: not public.

W: Weights

Downloadable checkpoints.

INF: Inference

A first-party or official runtime path.

TRN: Training code

Enough released code or recipes to reproduce training.

DATA: Training data

Open data, or a public description of the mixture detailed enough to recreate it.

RL: Post-train / RL

An alignment and post-training recipe beyond a vague mention.

LIC: License

Commercially usable without meaningful extra restrictions.

| AA Rank | Model | Lab | AA Open | AA Int. | W | INF | TRN | DATA | RL | LIC |
|---|---|---|---|---|---|---|---|---|---|---|
| #3 | OLMo 3.1 32B Think | Allen Institute for AI | 88.89 | 13.94 | Open | Open | Open | Open | Open | Open |
| #5 | K2 Think V2 | MBZUAI / LLM360 | 88.89 | 24.12 | Open | Open | Open | Open | Open | Open |
| #11 | Nemotron 3 Nano 30B A3B | NVIDIA | 72.22 | 24.27 | Open | Open | Partial | Partial | Partial | Open |
| #19 | GLM-4.5 (Reasoning) | Z.ai | 55.56 | 26.42 | Open | Open | Partial | Closed | Partial | Open |
| #20 | GPT-OSS 120B | OpenAI | 55.56 | 26.03 | Open | Open | Partial | Closed | Partial | Open |
| #30 | Gemma 3 27B Instruct | Google | 50.00 | 10.31 | Open | Open | Closed | Closed | Partial | Open |
| #35 | Magistral Small 1.2 | Mistral | 50.00 | 18.16 | Open | Open | Closed | Closed | Partial | Open |
| #36 | DeepSeek R1 0528 | DeepSeek | 50.00 | 27.07 | Open | Open | Closed | Closed | Partial | Open |
| #41 | GLM-5 (Reasoning) | Z.ai | 50.00 | 49.77 | Open | Open | Closed | Closed | Closed | Open |
| #80 | DeepSeek V3.2 Exp | DeepSeek | 44.44 | 32.94 | Open | Open | Partial | Closed | Closed | Open |
| #97 | Llama 4 Maverick | Meta | 38.89 | 18.00 | Open | Open | Closed | Partial | Partial | Partial |
| #118 | Qwen3.5 397B A17B | Alibaba | 38.89 | 45.05 | Open | Open | Closed | Closed | Closed | Open |
| #127 | Kimi K2.5 | Moonshot AI | 33.33 | 46.81 | Open | Open | Closed | Partial | Partial | Partial |
| #138 | MiniMax-M2.5 | MiniMax | 27.78 | 41.93 | Open | Open | Closed | Partial | Partial | Partial |
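One way to make the component grid concrete is to encode the statuses and filter on them. A minimal sketch in Python, using a few rows from the snapshot — the field names and the pass/fail rule here are illustrative, not Artificial Analysis's actual schema or scoring:

```python
# Toy encoding of a few rows from the snapshot. Field names and the
# fully_open rule are this sketch's own, not the index's methodology.
from dataclasses import dataclass

COMPONENTS = ("weights", "inference", "training_code",
              "training_data", "post_train", "license")

@dataclass
class Release:
    model: str
    aa_open: float
    status: dict  # component -> "Open" | "Partial" | "Closed"

ROWS = [
    Release("OLMo 3.1 32B Think", 88.89, dict.fromkeys(COMPONENTS, "Open")),
    Release("GPT-OSS 120B", 55.56, {
        "weights": "Open", "inference": "Open", "training_code": "Partial",
        "training_data": "Closed", "post_train": "Partial", "license": "Open"}),
    Release("Kimi K2.5", 33.33, {
        "weights": "Open", "inference": "Open", "training_code": "Closed",
        "training_data": "Partial", "post_train": "Partial", "license": "Partial"}),
]

def fully_open(r: Release) -> bool:
    """Every tracked component must be released, not merely described."""
    return all(r.status[c] == "Open" for c in COMPONENTS)

reproducible = [r.model for r in ROWS if fully_open(r)]
print(reproducible)  # only OLMo passes in this subset
```

The strict all-or-nothing rule is deliberately harsher than the AA Open score, which evidently gives partial credit; it answers a narrower question: could you rebuild this model from what was published?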

What should count as open?

If a model is genuinely open, you should be able to do more than run it. The minimum useful stack is weights, inference code, training code, training data, and a post-training recipe — plus a license that doesn't claw back the freedom the release seems to promise.

What stands out in this snapshot

  • OLMo 3.1 32B Think and K2 Think V2 are the real benchmark for openness — they publish the data, recipes, and code, not just the checkpoint.
  • GPT-OSS is a serious step from OpenAI: permissive license, strong runtime support, downloadable weights. But the training stack stays private, which is the divide that matters.
  • The biggest pattern here is the openness-capability inversion. The most capable models are often the least open. That gap is widening.
  • If you're evaluating models for anything beyond inference, the AA Open score matters as much as the benchmark rank. Open-weight and open-source are not the same thing.

Sources