The G7 on Open Source vs. Open Weights


The G7 on Open Source vs Open Weights – tecosystems


Image: Evian-les-Bains (Haute-Savoie), via Wikimedia Commons (https://commons.wikimedia.org/wiki/File:Evian-les-Bains_(Haute-Savoie)_(10004827914).jpg)

The term “open source” was coined in 1998, at least in part, because the term that preceded it was unclear and required explanation. Free software was descriptive and understood within technical communities familiar with it, but misleading to newcomers who understood free in commercial rather than philosophical terms. It was clear, in other words, that a new descriptor was required, and open source was the result.

Two years after the introduction of a proposed definition for open source AI, adoption has been minimal, and there is an increasing awareness that, unlike with pure source code, the assets that make up an AI model cannot be reduced to a single, binary open-or-closed definition.

It’s not that the licenses and promises of open source as they pertain to source code are simple. They can be complex, nuanced and difficult to explain. But they are, at least, binary. Code is either open source or it’s not. By contrast, it’s now clear that open source AI – of which code is only a small part – is going to have to be defined along a spectrum from closed to open.

The G7 nations – with input from parties like the OSI – apparently concur. Their paper published this week, “G7 Vision on AI openness opportunities and shared language,” has several important takeaways. Among them:

First, it clearly and unambiguously states that both open source and AI openness have immense societal benefits. It calls the latter, in fact, “an essential contributor to our economies.”

Second, it acknowledges the risks and potential future harm that can result from the lack of clear, consistent definitions. “This lack of clarity in the field of AI tends to cast doubt on the degree of openness of such technologies, thereby undermining their benefits.”

Third, it explicitly rejects a strict open or closed definition – “the openness of an AI is not binary.”

Fourth, it implicitly rejects existing definitions – “the meaning of Open-Weight or Open Source AI remains contested.”

Lastly, it proposes a four-tier system for categorizing AI projects that sit along a spectrum of openness.

The proposed classifications are similar in some respects to existing attempts like the Linux Foundation’s Model Openness Framework. Both are built for a landscape in which projects will differ in what specifically is made available, the terms it’s made available under and what restrictions, if any, are placed on use. But where the MOF is quite granular, grading projects around 17 components across an entire development lifecycle, the G7 vision is simpler. It defines four tiers based on five components (weights, deployment code, training code, training data and use restrictions).

In rough terms, those tiers can be described as follows, ranging from most open to least:

Open Source AI with Open Data: everything is open and under an OSI license – code, data, weights, every asset.

Open Source AI: what’s available is open, but it may or may not include training data, though it must include full training code.

Open Weights AI: weights and code are available and under an OSI license, but nothing else.

Weights Available AI: weights and code are available and open for inspection, but are released under a license which cannot be called open source due to use restrictions or other prohibited limitations.
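
Read as a decision procedure, the rough tier descriptions above might be sketched as follows. This is an illustration in Python, not anything published in the G7 paper: the `Release` type, its field names and the `classify` function are hypothetical, and the order of the checks (use restrictions first, then training code, then training data) is an assumption about how the tiers nest.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Release:
    """Hypothetical record of the five components the tiers are based on."""
    weights: bool            # model weights are published
    deployment_code: bool    # code needed to run the model is published
    training_code: bool      # full training code is published
    training_data: bool      # training data is published
    use_restrictions: bool   # license restricts use, so it is not OSI-approved


def classify(release: Release) -> Optional[str]:
    """Map a release to one of the four tiers, or None if it fits none.

    Assumption: a release needs at least weights and deployment code
    to reach the lowest tier described in the article.
    """
    if not (release.weights and release.deployment_code):
        return None
    # Assumption: any use restriction rules out the three open tiers.
    if release.use_restrictions:
        return "Weights Available AI"
    if release.training_code and release.training_data:
        return "Open Source AI with Open Data"
    if release.training_code:
        return "Open Source AI"
    return "Open Weights AI"


if __name__ == "__main__":
    example = Release(
        weights=True,
        deployment_code=True,
        training_code=False,
        training_data=False,
        use_restrictions=False,
    )
    print(classify(example))
```

Under these assumptions, a release with unrestricted weights and deployment code but no training code lands in Open Weights AI, while adding any use restriction moves it to Weights Available AI regardless of what else is published.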

It remains to be seen whether or not the industry can adapt to a definition of open that depends on a sliding scale rather than a fixed yes/no. But it also doesn’t have a choice. Two years of development and discussion and two years of living with a proposed definition have gotten us no closer to an industry consensus. Subtly, however, what the G7 nations have done with this document, intentionally or unintentionally, is to acknowledge that fact, make it irrelevant and implicitly propose their alternative.

The challenge for any single definition of open source AI is that it is not possible to please both definition purists and definition pragmatists. The former point out that any definition that allows for any omission of training data is effectively granting the term open source to a project that cannot ever be independently replicated. Which is legitimate. The latter, on the other hand, point to issues with datasets ranging from the byzantine nature of data licensing to the sheer impracticality of the size of these datasets. Which are also legitimate. You can please one of these groups about an open source definition, but not both.

What the G7 is proposing serves as a recognition that that debate is a lost cause. Instead, for all intents and purposes, the G7 is proposing to deprecate the term open source AI in favor of open weights.

It is true, on the one hand, that there are...
