This page is not yet available in Deutsch. Showing English version.

PurviewGDPRBenchmark

Real-World Purview Benchmark: We Tested Microsoft Purview on 89,521 Files

Data & More

|4 min read

Real-World Purview Benchmark: We Tested Microsoft Purview on 89,521 Files

Nobody believes a small Danish company when we say Purview classification does not work for real-world GDPR compliance. So we did something about it: we created a synthetic benchmark dataset that anyone can download and test themselves.

The Dataset

We analyzed over 5 petabytes of real-world data and generated a synthetic dataset that mirrors real unstructured data, without using any actual personal data. The distribution of file types is based on statistical analysis and reflects the typical mix found in enterprise environments.

Synthetic Benchmark Dataset: 89,521 files, 73 GB, 863 KB average, 100% manually validated

89,521 files total
73 GB total size
863 KB average file size
100% manually validated ground truth

The dataset is available on Hugging Face for anyone to download and reproduce: huggingface.co/datasets/kbillesk/synthetic-privacy

Privacy and Security Data in the Benchmark

The benchmark dataset contains 25 categories of privacy and security data. The two categories used for direct Purview comparison are Payment Card (5,286 files) and Passport (53 files).

Privacy data breakdown: 25 categories from Payment Card (5,286 files) to Religious Orientation (2 files)

Why Classification Matters

Everything in Microsoft Purview depends on correct classification. Sensitivity labels, auto-labeling, Data Loss Prevention (DLP), and retention management all rely on the ability to classify and tag data correctly. If classification fails, every downstream feature fails with it.

Benchmark Results: Passport Classification

We used Purview's out-of-the-box Passport classification (set to medium confidence) and compared it against the known ground truth in our dataset. The data was stored in SharePoint.

Passport Benchmark: Purview classified 29,845 files as passports when only 53 exist (99.8% misclassified). Auto-labeling labeled only 1 file correctly (98.1% not labeled).

99.8% misclassified by Purview's classification. Purview relies on simple regular expressions, which produce false positives (data misclassified as a passport) and false negatives (passports stored as images or scans go undetected).
98.1% not labeled by Purview's auto-labeling. Only 1 file was correctly labeled with a sensitivity label, because Purview only supports a limited number of file types.

Benchmark Results: Payment Card Classification

For payment card data, the results were somewhat better but still alarming:

Payment Card Benchmark: Purview classified 26,413 files as payment cards when only 5,286 exist (91% misclassified). Auto-labeling missed 57% of actual payment card data.

91% misclassified by Purview. The regex-based classification produces far too many false positives, incorrectly flagging data as payment cards.
57% not labeled by auto-labeling. Only 43% of payment card data was correctly labeled, because Purview only supports a limited number of file formats.

Purview has no real or compliant GDPR/privacy classification. It offers only a few imprecise samples in a limited number of languages. To achieve actual GDPR compliance, organizations would need to custom-develop classification for 28+ languages across dozens of missing privacy categories.

Missing GDPR classifications in Purview: national certificates (birth, citizenship, marriage, etc.) and privacy data types (health info, political orientation, salary, criminal records, etc.)

Missing national certificates: Birth, citizenship, marriage, divorce, registered partnership, ancestry, name change, proof of citizenship, marital status, place of residence, religion, and more.

Missing privacy data types: Health information (medicine, diagnosis, illness), political orientation, sexual orientation, ethnic origin, trade union affiliation, religious orientation, personal tax information, salary information, employment contracts, recruitment data, CVs, written warnings, work absence, criminal records, written consents, travel information, photo geolocation, and more.

The Labeling Dilemma

Purview offers two labeling options, and both have fundamental limitations:

Purview labeling: Manual labeling only works on new files (0.77% adoption rate). Auto-labeling supports only 5% of unstructured data with 50-99% error rates.

Manual labeling only works on new files, but 99.99% of data already exists. Research shows users dislike manual labeling and do not do it voluntarily. Studies show only 0.77% of data gets manually tagged. When users can only choose from 4-5 label options, the result is too limited for real compliance requirements.

Auto-labeling does not support emails at rest or their attachments, and only covers a limited number of file formats. In real-world environments, Purview auto-labeling supports around 5% of all unstructured data. The classification itself is not GDPR compliant, with a 50-99% error rate that creates systemic compliance and DLP exposure. There are also additional per-page costs for PDF and image OCR scanning on top of the E5 license.

Try It Yourself

Download the dataset from Hugging Face, load it into your own SharePoint or Exchange environment, and run Purview classification against it. Compare the results to the manually validated ground truth. We are confident you will see the same results.

If you want to see how Data & More classifies the same data, get in touch for a demo.

Data & More

Because you can't leak data that isn't there

Real-World Purview Benchmark: We Tested Microsoft Purview on 89,521 Files

The Dataset

Privacy and Security Data in the Benchmark

Why Classification Matters

Benchmark Results: Passport Classification

Benchmark Results: Payment Card Classification

The Labeling Dilemma

Try It Yourself

Weitere Artikel

Why Europe and the Free World Might Be the Real Winners in the AI War

Visit our US & Canada site?

Real-World Purview Benchmark: We Tested Microsoft Purview on 89,521 Files

The Dataset

Privacy and Security Data in the Benchmark

Why Classification Matters

Benchmark Results: Passport Classification

Benchmark Results: Payment Card Classification

What Purview Is Missing for GDPR

The Labeling Dilemma

Try It Yourself

Weitere Artikel

Why Europe and the Free World Might Be the Real Winners in the AI War