# Data Attribution Pipeline

The **Proof of Attribution** mechanism in OpenLedger cryptographically links each data source to the model outputs it influences, providing an immutable, decentralized record of contributions.

The attribution pipeline follows these steps:

#### **Step 1: Data Contribution**

* Data contributors submit structured, domain-specific datasets for AI model training.
* Each dataset is attributed on-chain, ensuring transparency and verifiability.
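
One way to make a dataset verifiable is to record a content fingerprint alongside the contributor's identity. The sketch below is illustrative only: the record fields and the function names `dataset_fingerprint` and `make_attribution_record` are hypothetical, and the source does not specify how OpenLedger actually encodes or stores on-chain attribution records.

```python
import hashlib
import json


def dataset_fingerprint(records: list[dict]) -> str:
    """Return a SHA-256 hex digest over a canonical JSON encoding of the records."""
    # sort_keys and fixed separators make the encoding independent of key order.
    canonical = json.dumps(records, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_attribution_record(contributor: str, domain: str, records: list[dict]) -> dict:
    """Build a hypothetical attribution record that could be anchored on-chain."""
    return {
        "contributor": contributor,
        "domain": domain,
        "num_records": len(records),
        "fingerprint": dataset_fingerprint(records),
    }


record = make_attribution_record(
    contributor="alice",  # placeholder identifier, not a real address format
    domain="medical-qa",
    records=[{"question": "What is aspirin?", "answer": "An analgesic."}],
)
```

Anyone holding the same dataset can recompute the fingerprint and compare it with the recorded value, which is what makes the record checkable without trusting the contributor.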

#### **Step 2: Datanets and Influence Attribution**

* Contributors submit training data together with metadata that defines its intended use.
* The impact of each data contribution is measured based on:
  * **Feature-level Influence:** Assessing the data’s effect on model training.
  * **Contributor Reputation:** Evaluating the credibility and past contributions of data providers.
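
The source names the two inputs but not how they are combined. As a minimal sketch, assuming both are normalized to the range 0 to 1 and blended with a weighted average, a combined score could look like the following. The function `contribution_score` and the default weight of 0.7 are assumptions made for illustration.

```python
def contribution_score(
    influence: float,
    reputation: float,
    influence_weight: float = 0.7,  # assumed weight, not taken from the source
) -> float:
    """Blend feature-level influence and contributor reputation into one score.

    All three arguments are expected to lie in [0, 1].
    """
    for name, value in (
        ("influence", influence),
        ("reputation", reputation),
        ("influence_weight", influence_weight),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return influence_weight * influence + (1.0 - influence_weight) * reputation


# A highly influential dataset from a contributor with a modest track record.
score = contribution_score(influence=0.9, reputation=0.4)
```

A weighted average keeps the result in the same 0 to 1 range as its inputs, so scores from different contributors stay directly comparable.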

#### **Step 3: Training and Verification**

* Influence scores are calculated to determine the quality and relevance of each contribution.
* Training logs ensure all data contributions are recorded and validated.
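
The source does not say which influence method is used. One common approach is leave-one-out: evaluate once with every contribution, then again with one contributor removed, and treat the drop in score as that contributor's influence. The sketch below uses a toy scoring function (`toy_score`, a stand-in for real training and evaluation) so that it runs on its own; all names here are hypothetical.

```python
from typing import Callable, Mapping, Sequence


def toy_score(points: Sequence[float], target: float = 10.0) -> float:
    """Stand-in for 'train and evaluate': higher is better, 0.0 is a perfect fit."""
    mean = sum(points) / len(points) if points else 0.0
    return -abs(mean - target)


def leave_one_out_influence(
    datasets: Mapping[str, Sequence[float]],
    evaluate: Callable[[Sequence[float]], float],
) -> dict[str, float]:
    """Influence of a contributor = baseline score minus score without their data."""
    baseline = evaluate([x for data in datasets.values() for x in data])
    influence = {}
    for name in datasets:
        remaining = [
            x for other, data in datasets.items() if other != name for x in data
        ]
        influence[name] = baseline - evaluate(remaining)
    return influence


scores = leave_one_out_influence(
    {"alice": [10.0, 10.0], "bob": [10.0, 10.0], "mallory": [40.0, 40.0]},
    toy_score,
)
# Positive values mean the data helped; negative values mean it hurt.
```

Leave-one-out requires one extra evaluation per contributor, so it is only practical here because the toy evaluation is cheap; real systems typically rely on approximations.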

#### **Step 4: Reward Distribution Based on Attribution**

* Data contributors receive token-based rewards proportional to their data’s impact on model outputs.
* A fair attribution system ensures high-value contributions are prioritized.
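
Proportional distribution can be sketched as splitting a fixed reward pool by each contributor's share of the total positive influence. The function `distribute_rewards` and the choice to clamp negative influence to zero are assumptions for illustration, not a description of OpenLedger's actual payout logic.

```python
def distribute_rewards(influence: dict[str, float], reward_pool: float) -> dict[str, float]:
    """Split reward_pool in proportion to each contributor's positive influence."""
    # Assumption: contributions with negative influence earn nothing.
    positive = {name: max(score, 0.0) for name, score in influence.items()}
    total = sum(positive.values())
    if total == 0.0:
        return {name: 0.0 for name in influence}
    return {name: reward_pool * score / total for name, score in positive.items()}


payouts = distribute_rewards(
    {"alice": 3.0, "bob": 1.0, "mallory": -2.0},
    reward_pool=100.0,
)
# alice receives 75.0, bob 25.0, and mallory 0.0
```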

#### **Step 5: Penalizing Malicious or Low-Quality Contributions**

* Contributions flagged as biased, redundant, or adversarial are penalized through stake slashing.
* If a contributor’s penalty score exceeds a threshold, future rewards are reduced, ensuring only high-quality data is retained in model training.
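
The two rules above can be sketched as a stake reduction per flagged contribution plus a running penalty score that lowers future rewards once it passes a threshold. The class `Contributor`, the 10% slash rate, the threshold of 1.0, and the 0.5 reduced multiplier are all assumed values chosen for illustration; the source does not state the actual parameters.

```python
from dataclasses import dataclass


@dataclass
class Contributor:
    stake: float
    penalty_score: float = 0.0


def apply_penalty(contributor: Contributor, severity: float, slash_rate: float = 0.1) -> float:
    """Slash part of the stake for a flagged contribution and return the amount slashed.

    severity is expected in [0, 1]; slash_rate is an assumed protocol parameter.
    """
    if not 0.0 <= severity <= 1.0:
        raise ValueError(f"severity must be in [0, 1], got {severity}")
    slashed = contributor.stake * slash_rate * severity
    contributor.stake -= slashed
    contributor.penalty_score += severity
    return slashed


def reward_multiplier(contributor: Contributor, threshold: float = 1.0, reduced: float = 0.5) -> float:
    """Return the factor applied to future rewards; reduced once the threshold is exceeded."""
    return reduced if contributor.penalty_score > threshold else 1.0


carol = Contributor(stake=100.0)
apply_penalty(carol, severity=0.5)          # first flag: below the threshold
before = reward_multiplier(carol)           # still full rewards
apply_penalty(carol, severity=1.0)          # second flag pushes the score past 1.0
after = reward_multiplier(carol)            # rewards are now reduced
```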

This structured pipeline ensures a provable and trustless attribution system that rewards valuable contributions while maintaining model integrity.

