Data Quality

How Sentinel Transforms Data Validation At Scale - Built for the Data Quality Standards of Today and Tomorrow

Sentinel's primary function is to replace extensive manual data validation. The impact on both time investment and data coverage is substantial, and this post outlines what that difference looks like in practice — particularly in the context of increasingly stringent expectations around data quality and traceability.

Validation Time

The most immediate impact of Sentinel is on validation time:

• Manual validation: ~6 person-days per cohort

• With Sentinel: ~1 hour per cohort

This reduction is driven primarily by eliminating manual sentence-level checks, which traditionally account for the bulk of validation effort. Rather than having reviewers search through outputs for potential issues, Sentinel automatically flags anomalies and directs human attention to the identified problems.

This shifts the human role from detection to review — a far more efficient use of expert time, and one that scales with dataset size in a way that manual review simply cannot. It also introduces a more consistent and reproducible validation process, which is becoming increasingly relevant as data quality expectations continue to formalize across the healthcare ecosystem.

Improved Data Completeness

One of the structural limitations of manual validation is that it requires pre-selection. No reviewer can systematically cover an entire dataset, meaning that errors outside the selected sample go undetected. This is not a reflection of reviewer quality — it is an inherent constraint of human review at scale.

To put the scale in context: a typical cohort contains approximately 55 million concepts per hospital. At that volume, complete manual coverage is not feasible, and partial review introduces known blind spots.

Sentinel addresses this directly by running automated checks that detect:

• Empty or missing values

• Implausible values (via predefined lower and upper thresholds)

• Incorrectly extracted sentences (e.g. repetitive entries, misidentified adverse events)

The result is a level of systematic coverage that manual validation cannot realistically achieve.
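To make the three check types above concrete, here is a minimal sketch of how such checks might look. The field names, threshold values, and repetition cutoff are illustrative assumptions, not Sentinel's actual configuration:

```python
import re

# Illustrative plausibility bounds (lower, upper) per measurement.
# These ranges are assumptions for the sketch, not Sentinel's real thresholds.
PLAUSIBLE_RANGES = {
    "heart_rate_bpm": (20, 250),
    "body_temp_c": (30.0, 45.0),
}

def check_missing(record, fields):
    """Flag empty or missing values in the given fields."""
    return [f for f in fields if record.get(f) in (None, "", "NA")]

def check_plausibility(record):
    """Flag values outside their predefined lower/upper thresholds."""
    flags = []
    for field, (lo, hi) in PLAUSIBLE_RANGES.items():
        value = record.get(field)
        if value is not None and not (lo <= value <= hi):
            flags.append(field)
    return flags

def check_repetition(sentences, threshold=3):
    """Flag sentences extracted repeatedly -- a common sign of
    faulty sentence-level extraction."""
    counts = {}
    for s in sentences:
        key = re.sub(r"\s+", " ", s.strip().lower())
        counts[key] = counts.get(key, 0) + 1
    return [s for s, n in counts.items() if n >= threshold]
```

Each check returns a list of flags rather than a verdict, matching the workflow described above: the system surfaces anomalies, and a human reviewer decides what to do with them.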

Targeted Checks, Maximum Impact

Sentinel focuses specifically on measurement checks and general error definitions that we define once and then apply automatically across all datasets. While these checks target a defined subset of variables, they are selected precisely for their downstream impact on data quality and model performance — not breadth for its own sake.
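The "define once, apply everywhere" pattern could be sketched as a small rule registry. The rule names, fields, and dataclass shape below are hypothetical, intended only to illustrate the idea, not Sentinel's internal design:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    name: str
    applies_to: str                    # field the rule targets
    predicate: Callable[[Any], bool]   # returns True when the value is valid

# Error definitions written once, then applied to every dataset.
RULES = [
    Rule("non_empty", "diagnosis", lambda v: v not in (None, "")),
    Rule("plausible_age", "age", lambda v: v is not None and 0 <= v <= 120),
]

def validate(dataset):
    """Apply every registered rule to every record; return (index, rule) violations."""
    violations = []
    for i, record in enumerate(dataset):
        for rule in RULES:
            if not rule.predicate(record.get(rule.applies_to)):
                violations.append((i, rule.name))
    return violations
```

Because the rules live in one registry, adding a new check extends coverage across all datasets at once, which is what makes this approach scale in a way per-dataset manual review cannot.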

This targeted approach directly supports three outcomes:

• Improved data quality

• Better model performance

• Faster generation of reliable insights

By focusing automated checks where they matter most, Sentinel enables organisations to operationalise data quality in a scalable way. This becomes increasingly relevant in light of frameworks such as the European Health Data Space, where consistent validation, traceability, and dataset completeness are expected — but difficult to achieve through manual processes alone.
