Controlling how data is used with ODRL

My sport coach handles health data: heart rate, training load, sessions, how I feel. By default, everything stays on my machine: the language model that writes the messages runs locally. But I can temporarily switch on a more capable model in the cloud for the main chat. During that window, some of this data could leave my machine and be sent to the provider.

I wanted that boundary to be explicit. Not an if buried somewhere in the code that would decide, case by case, whether a given piece of data is allowed to leave. A rule written separately, readable, attached to the data itself, and enforced by a dedicated component before every send. That is exactly what ODRL allows.

ODRL in brief

ODRL (Open Digital Rights Language) is a W3C language for expressing usage rules over resources. A policy is a set of rules of three kinds: permissions, prohibitions and obligations. Each rule applies to an action on a target, under certain constraints.

The actions I use are the ODRL 2.2 ones: read, use, reproduce and distribute. Constraints are triples: a left operand (leftOperand), an operator, and a right operand (rightOperand). For example “purpose equals coaching”, or “recipient is one of {cloud provider, third-party model}”.

ODRL doesn’t fix the domain vocabulary; you have to define it. So I have a small profile, coach-odrl-v1, that declares the terms specific to the coach:

purposes: coaching, scoring, dataPortability, modelTraining, audit…
recipients: dataSubject (the person themselves), cloudProvider, thirdPartyLLM, externalThirdParty…
execution locations: localRuntime, cloudRuntime.
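
To make the triple structure concrete, here is a minimal sketch of how one constraint could be checked against a request context. It is not the coach's code: the function name `constraint_holds` and the choice to treat a missing context value as "not satisfied" are assumptions of this example, and only the two operators used in this article are covered.

```python
from typing import Any, Callable, Mapping

# Only the two operators that appear in this article; ODRL defines more.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "isAnyOf": lambda actual, expected: actual in expected,
}


def constraint_holds(constraint: Mapping[str, Any], context: Mapping[str, Any]) -> bool:
    """Return True when the context value named by leftOperand satisfies the triple."""
    left = constraint["leftOperand"]
    if left not in context:
        # Assumption of this sketch: a missing context value never satisfies a constraint.
        return False
    # An operator outside OPERATORS raises KeyError rather than being silently accepted.
    check = OPERATORS[constraint["operator"]]
    return check(context[left], constraint["rightOperand"])


purpose_is_coaching = {
    "leftOperand": "purpose",
    "operator": "eq",
    "rightOperand": "coach:coaching",
}
print(constraint_holds(purpose_is_coaching, {"purpose": "coach:coaching"}))  # True
```

Keeping the operators in a table means a new operator is one more entry, not one more branch.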

A concrete policy

Here, simplified, is the policy attached to a piece of health data that must stay local (I call it health_local_only). The coach: and odrl: prefixes refer to the coach’s vocabulary and to ODRL’s, respectively.

{
  "@type": "Set",
  "uid": "urn:coach:policy:health_local_only:…",
  "permission": [
    {
      "action": "read",
      "constraint": [
        {"leftOperand": "purpose", "operator": "isAnyOf",
         "rightOperand": ["coach:coaching", "coach:scoring"]},
        {"leftOperand": "virtualLocation", "operator": "eq",
         "rightOperand": "coach:localRuntime"}
      ]
    }
  ],
  "prohibition": [
    {
      "action": "distribute",
      "constraint": [
        {"leftOperand": "recipient", "operator": "isAnyOf",
         "rightOperand": ["coach:cloudProvider", "coach:thirdPartyLLM",
                          "coach:externalThirdParty"]}
      ]
    },
    {
      "action": "use",
      "constraint": [
        {"leftOperand": "purpose", "operator": "eq",
         "rightOperand": "coach:modelTraining"}
      ]
    }
  ]
}

It reads directly. You may read the data for coaching or scoring, as long as you stay in local execution. You may not send it to a cloud provider or a third-party model, nor use it to train a model. (I removed a few rules for readability; the real policy has a few more.)

When the cloud is explicitly allowed, another policy takes over, health_cloud_llm. It adds one permission: sending the data to a third-party model, but only for coaching or scoring. Training and sending to any external third party remain prohibited.

The rule travels with the data

Every piece of data the coach stores is kept in an envelope that, beyond the content, carries metadata: where it comes from, its category, and the policy that applies to it. A session imported from Garmin or entered in conversation is classified as health data, and gets the health_local_only policy by default.

The choice of policy is made at write time, not at send time. From the moment it is created, the data knows what may be done with it. The rule is carried by the data itself, not by the code that handles it.
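
As an illustration only, an envelope of this kind could look like the sketch below. The field names, the `wrap` helper and the category-to-policy table are hypothetical. For brevity, the envelope stores the policy's name rather than the full policy document.

```python
from dataclasses import dataclass

# Hypothetical table: which policy a category of data receives by default.
DEFAULT_POLICY_BY_CATEGORY = {"health": "health_local_only"}


@dataclass(frozen=True)
class Envelope:
    content: dict   # the data itself
    source: str     # where it comes from
    category: str   # how it was classified
    policy: str     # the policy that applies to it, chosen at write time


def wrap(content: dict, source: str, category: str) -> Envelope:
    """Attach the policy when the data is written, not when it is sent."""
    # An unknown category raises KeyError: no data is stored without a policy.
    policy = DEFAULT_POLICY_BY_CATEGORY[category]
    return Envelope(content=content, source=source, category=category, policy=policy)


session = wrap({"avg_heart_rate": 142}, source="garmin", category="health")
print(session.policy)  # health_local_only
```

Because the dataclass is frozen, the policy field cannot be reassigned after creation. Switching to another policy would mean building a new envelope explicitly.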

A single source of truth

The policies are not hard-coded. They come from a templates file, and that same file serves two things at once:

the program, which generates the concrete policy attached to each piece of data;
the public documentation: two addresses you can open in a browser, describing the vocabulary and the profile.

The goal is that there be no gap between what the system says publicly and what it actually enforces. A test checks exactly that the two coincide: if the published documentation and the enforced rule diverge, the test fails. Consistency isn’t a promise, it’s a check.
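
The shape of such a test might be the following. This is a sketch under assumptions: `TEMPLATES` stands in for the templates file, `build_policy` and `publish_profile` are hypothetical names for the two consumers, and the `urn:example:` identifier is a placeholder, not the coach's real scheme.

```python
import json

# Hypothetical in-memory stand-in for the templates file (one rule kept for brevity).
TEMPLATES = {
    "health_local_only": {
        "prohibition": [
            {"action": "use",
             "constraint": [{"leftOperand": "purpose", "operator": "eq",
                             "rightOperand": "coach:modelTraining"}]}
        ]
    }
}

RULE_KEYS = ("permission", "prohibition", "obligation")


def build_policy(name: str, data_id: str) -> dict:
    """Program side: the concrete policy attached to one piece of data."""
    return {"@type": "Set", "uid": f"urn:example:policy:{name}:{data_id}", **TEMPLATES[name]}


def publish_profile() -> dict:
    """Documentation side: every template, without per-data identifiers."""
    return {name: dict(rules) for name, rules in TEMPLATES.items()}


def rules_of(policy: dict) -> str:
    """Canonical JSON of the rules only, so identifiers do not affect the comparison."""
    return json.dumps({k: v for k, v in policy.items() if k in RULE_KEYS}, sort_keys=True)


def test_published_matches_enforced() -> None:
    published = publish_profile()
    for name in TEMPLATES:
        enforced = build_policy(name, data_id="test")
        assert rules_of(enforced) == rules_of(published[name]), name


test_published_matches_enforced()
```

While both sides read the same templates, the comparison passes trivially. It fails as soon as either side is edited by hand or starts reading from another source.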

Deciding before sending

All of this still has to be enforced. That’s the job of two components common in this kind of architecture: a decision point (PDP, Policy Decision Point) that makes the ruling, and an enforcement point (PEP, Policy Enforcement Point) that asks the question at the right moment and obeys the answer.

The coach’s PEP sits just before the call to the language model. It only fires in one specific case: the target model is not local, and the request’s content carries health context. The rest of the time, it does nothing. No needless cost.

When it fires, it builds a request: “am I allowed to send (distribute) this data, for the coaching purpose, to a third-party model (thirdPartyLLM), in cloud execution (cloudRuntime)?”

{
  "target": "urn:coach:data:…",
  "action": "distribute",
  "context": {
    "purpose": "coach:coaching",
    "recipient": "coach:thirdPartyLLM",
    "virtualLocation": "coach:cloudRuntime"
  }
}
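
The firing condition and the request construction can be sketched in a few lines. The function names and the `urn:example:` target are hypothetical. The request fields mirror the JSON above.

```python
def pep_should_fire(model_is_local: bool, has_health_context: bool) -> bool:
    """The PEP only acts for a non-local model and a request carrying health context."""
    return (not model_is_local) and has_health_context


def build_request(target: str) -> dict:
    """Decision request: may this data be sent to a third-party model in the cloud?"""
    return {
        "target": target,
        "action": "distribute",
        "context": {
            "purpose": "coach:coaching",
            "recipient": "coach:thirdPartyLLM",
            "virtualLocation": "coach:cloudRuntime",
        },
    }


if pep_should_fire(model_is_local=False, has_health_context=True):
    request = build_request("urn:example:data:session-1")
    print(request["action"])  # distribute
```

With a local model, or without health context, `pep_should_fire` returns False and no request is built at all.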

It asks the PDP this question, along with the data’s policy. The answer looks like this:

{
  "allowed": false,
  "reason": "prohibition_active",
  "active_permissions": [],
  "active_prohibitions": ["…:distribute"]
}

With health_local_only, this request hits the prohibition on sending to a cloud recipient: the decision is allowed: false, and the send is blocked. With health_cloud_llm (that is, when I have explicitly opened the cloud), the same request finds a matching permission and no prohibition: the decision becomes allowed: true.

The decision rule is simple: it’s allowed if there is at least one active permission and no active prohibition. A prohibition always wins over a permission.
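
To show the rule in action, here is a toy evaluator. It is emphatically not the engine the coach uses: it matches actions by exact name, knows only `eq` and `isAnyOf`, and the reason labels other than `prohibition_active` are invented for the example. The `cloud` policy is my reduced reconstruction of health_cloud_llm from its description, not the real file.

```python
def _holds(constraint: dict, context: dict) -> bool:
    value = context.get(constraint["leftOperand"])
    expected = constraint["rightOperand"]
    if constraint["operator"] == "eq":
        return value == expected
    if constraint["operator"] == "isAnyOf":
        return value in expected
    raise ValueError(f"unsupported operator: {constraint['operator']}")


def _active(rules: list, request: dict) -> list:
    """Actions of the rules whose action matches and whose constraints all hold."""
    return [
        rule["action"] for rule in rules
        if rule["action"] == request["action"]
        and all(_holds(c, request["context"]) for c in rule.get("constraint", []))
    ]


def decide(policy: dict, request: dict) -> dict:
    permissions = _active(policy.get("permission", []), request)
    prohibitions = _active(policy.get("prohibition", []), request)
    if prohibitions:
        reason = "prohibition_active"
    elif not permissions:
        reason = "no_active_permission"  # hypothetical label
    else:
        reason = "permitted"             # hypothetical label
    # Allowed only with at least one active permission and no active prohibition.
    return {"allowed": bool(permissions) and not prohibitions, "reason": reason,
            "active_permissions": permissions, "active_prohibitions": prohibitions}


def _c(left: str, op: str, right) -> dict:
    return {"leftOperand": left, "operator": op, "rightOperand": right}


no_training = {"action": "use", "constraint": [_c("purpose", "eq", "coach:modelTraining")]}
local_only = {"prohibition": [
    {"action": "distribute", "constraint": [_c("recipient", "isAnyOf", [
        "coach:cloudProvider", "coach:thirdPartyLLM", "coach:externalThirdParty"])]},
    no_training,
]}
cloud = {  # reduced reconstruction, see the note above
    "permission": [{"action": "distribute", "constraint": [
        _c("purpose", "isAnyOf", ["coach:coaching", "coach:scoring"]),
        _c("recipient", "eq", "coach:thirdPartyLLM")]}],
    "prohibition": [
        {"action": "distribute",
         "constraint": [_c("recipient", "eq", "coach:externalThirdParty")]},
        no_training,
    ],
}
request = {"target": "urn:example:data:session-1", "action": "distribute",
           "context": {"purpose": "coach:coaching", "recipient": "coach:thirdPartyLLM",
                       "virtualLocation": "coach:cloudRuntime"}}

print(decide(local_only, request)["allowed"])  # False
print(decide(cloud, request)["allowed"])       # True
```

Note that a policy with no matching permission is also denied: the absence of a prohibition is not enough.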

The engine, and doubt

Two details matter. First, I don’t reimplement ODRL’s logic myself: the decision is made by a dedicated engine (FORCE, and the odrl-evaluator library), which I call like a small service. I pass it the policy and the request, it returns the decision. I don’t have to hand-code “if this action and this recipient then…”; that is precisely what I wanted to avoid.

Second, this engine can fail, take too long to answer, or return something invalid. In all these cases, the default decision is denial, not permission. A failure of the decision point must never turn data meant to stay local into data sent to the cloud: when in doubt, close the door.
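
A fail-closed wrapper around the decision call could look like this sketch. The `evaluate` callable stands in for whatever actually reaches the engine. Its signature, the timeout value and the reason labels are assumptions of this example.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable


def decide_fail_closed(
    evaluate: Callable[[dict, dict], dict],
    policy: dict,
    request: dict,
    timeout_s: float = 2.0,
) -> dict:
    """Return the engine's decision, or a denial if it fails, stalls or answers badly."""
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        decision = pool.submit(evaluate, policy, request).result(timeout=timeout_s)
    except FutureTimeout:
        return {"allowed": False, "reason": "pdp_timeout"}
    except Exception:
        return {"allowed": False, "reason": "pdp_error"}
    finally:
        # Do not block on a stalled call; its late result is simply ignored.
        pool.shutdown(wait=False)
    # Anything that is not a dict with a boolean "allowed" counts as invalid.
    if not isinstance(decision, dict) or not isinstance(decision.get("allowed"), bool):
        return {"allowed": False, "reason": "pdp_invalid_response"}
    return decision


def broken_engine(policy: dict, request: dict) -> dict:
    raise RuntimeError("engine unreachable")


print(decide_fail_closed(broken_engine, {}, {}))
# {'allowed': False, 'reason': 'pdp_error'}
```

Every path that does not end in a well-formed answer from the engine returns `allowed: False`, so a crash, a stall or a malformed response can never be read as permission.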

Conclusion

At the coach’s scale, blocking a send would take a few lines of code. If I chose ODRL, it isn’t to make things complicated: it’s the language in which usage rules are written within dataspaces, those data-sharing spaces where each participant keeps control over what others are allowed to do with its data. I’m preparing the coach to join a sport dataspace, and since this is health data, a standard usage rule, attached to the data and read the same way by every actor, is not a luxury. That’s a topic I’ll detail in another article.

In the meantime, the same building block already serves inside the coach. The rule is written once, in a standard language, attached to the data, published exactly as it is enforced, and evaluated by an engine that knows its semantics. The boundary between “what stays with me” and “what may leave” is no longer a condition lost in the code: it’s a policy you can read, document and check. And as long as I don’t explicitly open the cloud, all health data stays barred from leaving, with the slightest failure of the control resolving into a denial.
