OCR Accuracy Compared: Tesseract vs Cloud APIs – wandlio Blog

What is OCR and why does accuracy matter?

OCR (Optical Character Recognition) converts printed or handwritten text into machine-readable characters. The technology is indispensable in modern work: digitizing invoices, making contracts searchable, automatically extracting form data – all OCR applications. But accuracy varies dramatically between available solutions. A recognition error can turn a €1,000 invoice into €10,000, a customer account 12345 into 12346, or a date 01/03/2026 into 01/08/2026.

In this article, we compare the four most important OCR engines using our own test data: Tesseract (open source), Google Cloud Vision, AWS Textract, and Azure Read. We measure not just recognition rate, but also error types, processing time, cost, and data privacy aspects.

Test methodology

We tested 200 documents of different types and quality:

50 printed documents (invoices, contracts, forms) – high quality, clear text
50 photos of documents – taken with smartphone camera, varying lighting and perspective
50 handwritten notes – different handwriting styles, pencil and ballpoint pen
50 scanned historical documents – faded ink, stains, old typefaces (Fraktur with official Tesseract model; Sütterlin only with custom models)

Each OCR engine was tested with default settings. For Tesseract we used version 5.3 with the German language package (deu.traineddata). Cloud APIs were called through their official SDKs.
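For readers who want to reproduce the Tesseract side of this setup, the following is a minimal sketch rather than our actual test harness. It assumes the `tesseract` binary and the `deu` language data are installed; the helper names and any file names passed in are illustrative.

```python
import subprocess
from typing import List


def build_tesseract_command(image_path: str, lang: str = "deu") -> List[str]:
    """Build a Tesseract CLI call that prints the recognized text to stdout."""
    # Passing "stdout" as the output base makes Tesseract write to standard output.
    return ["tesseract", image_path, "stdout", "-l", lang]


def run_tesseract(image_path: str, lang: str = "deu") -> str:
    """Run Tesseract with default settings and return the recognized text."""
    completed = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract exits with an error
    )
    return completed.stdout
```

Calling `run_tesseract("invoice.png")` on a hypothetical scan would return the recognized text as a string.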

Recognition rate: The raw numbers

Engine	Printed	Photos	Handwritten	Historical	Average
Tesseract 5.3	97.8%	91.2%	34.5%	68.7%	73.1%
Google Vision	99.1%	96.8%	78.2%	87.4%	90.4%
AWS Textract	98.7%	95.1%	45.6%	79.3%	79.7%
Azure Read	99.3%	97.4%	81.7%	88.9%	91.8%

The results show a clear trend: cloud APIs are significantly superior for photos, handwriting, and historical documents, while Tesseract almost matches them for well-printed documents. The biggest difference is in handwriting: Google Vision and Azure Read are about 44-47 percentage points ahead of Tesseract, whereas AWS Textract leads by only about 11 points.
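A common way to compute a recognition rate like the ones above is character accuracy: 1 minus the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The sketch below assumes that definition; it is not necessarily the exact scoring behind the table, and the function names are illustrative.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character edits turning hypothesis into reference."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # character missing in the OCR output
                current[j - 1] + 1,      # extra character in the OCR output
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Character accuracy in percent, clamped at 0 for very noisy output."""
    if not reference:
        return 100.0 if not hypothesis else 0.0
    errors = levenshtein(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference)) * 100.0
```

Under this definition, reading the account number 12345 as 12346 is one error in five characters, so the accuracy for that field is 80 percent even though the value is unusable.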

Error types: Not every error is equal

Recognition rate alone doesn't tell the whole story. We categorized errors into four types:

Critical errors (numbers, IBAN, amounts): "1,000 €" becomes "10,000 €" – can lead to incorrect payments
Semantic errors (proper names, technical terms): "Müller-Lüdenscheidt" becomes "Müller-Lüdenscheide" – affects searchability
Formatting errors (columns, tables): Structure is lost – tables recognized as flowing text
Cosmetic errors (punctuation, spacing): Missing commas or double spaces – affect readability but not content

For printed documents, critical errors account for only 5-8% of total errors. For handwritten notes, the proportion of critical errors rises to 15-22%. Tesseract has a known problem distinguishing similar characters (0/O, 1/l/I, 5/S), which is particularly critical for IBANs and invoice numbers.
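For IBANs, many such misreads can be caught after OCR, because the format carries a built-in checksum (ISO 7064 mod 97-10). The following is a minimal sketch of that check as a post-processing step. It was not part of our tests, the function name is illustrative, and it deliberately skips country-specific length rules.

```python
def is_valid_iban(iban: str) -> bool:
    """Check the mod-97 checksum of an IBAN (no country-specific length rules)."""
    compact = iban.replace(" ", "").upper()
    if not (15 <= len(compact) <= 34):
        return False
    if not (compact.isascii() and compact.isalnum()):
        return False
    # Move country code and check digits to the end, then map letters
    # to numbers (A=10 ... Z=35) and test the remainder modulo 97.
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(char, 36)) for char in rearranged)
    return int(digits) % 97 == 1
```

A single wrong digit always changes the remainder, so a one-digit misread fails this check. Letter-for-digit confusions such as O for 0 are usually caught as well, but for fields that must be purely numeric a plain digit check is the more direct test.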

Cost comparison

Engine	Cost per 1,000 pages	Setup cost	Minimum purchase
Tesseract	$0	High (infrastructure)	None
Google Vision	$1.50	Low	None
AWS Textract	$1.50	Low	None
Azure Read	$1.50 (Read) / $10 (Layout)	Low	None

Tesseract is free, but infrastructure costs (servers, maintenance, updates) are not negligible. For 10,000 pages per month, cloud costs come to about $15 at the $1.50 rate, which is clearly less than half a developer day spent on Tesseract infrastructure.
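The monthly figure follows directly from the per-1,000-page prices in the table. The sketch below makes the arithmetic explicit; it assumes flat list prices and ignores free tiers and volume discounts, and the dictionary and function names are illustrative.

```python
# Prices per 1,000 pages in USD, taken from the cost table above.
PRICE_PER_1000_PAGES = {
    "Google Vision": 1.50,
    "AWS Textract": 1.50,
    "Azure Read": 1.50,
    "Azure Layout": 10.00,
}


def monthly_cost(pages_per_month: int, price_per_1000: float) -> float:
    """Flat-rate monthly cost in USD (no free tier, no volume discount)."""
    return pages_per_month / 1000 * price_per_1000


costs_at_10k_pages = {
    engine: monthly_cost(10_000, price)
    for engine, price in PRICE_PER_1000_PAGES.items()
}
```

At 10,000 pages per month this gives $15 for the three $1.50 tiers and $100 for Azure Layout, so the choice of tier matters more than the choice of provider.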

Data privacy: The elephant in the room

Cloud OCR means your documents are sent to servers operated by Google, Amazon, or Microsoft. For personal data (invoices, contracts, applications), this is a GDPR issue. All three providers offer data processing agreements (DPAs), but the details differ:

Google Vision: Online requests processed in-memory only (not persisted); EU endpoints available (eu-vision.googleapis.com). Google does not use submitted images for training
AWS Textract: Offers EU regions (Frankfurt, Ireland), does not store data permanently
Azure Read: EU regions available, comprehensive compliance certifications (ISO 27001, SOC 2, HIPAA, BSI C5) – comparable to AWS and Google Cloud

For GDPR-critical documents, Tesseract is the safest choice since all data is processed locally. Alternatively, all three cloud providers offer EU regions or endpoints that can support GDPR-compliant processing.

With PDF to Text or image conversion on wandlio.de, images are processed locally in the browser – no upload, no privacy risk.

Practical tips: Choosing the right engine

Printed documents, GDPR-critical: Tesseract with German language package. 97-98% accuracy is sufficient for most applications
Printed documents, cloud-ok: Azure Read – best accuracy for printed text, good EU support
Photos of documents: Google Vision or Azure Read – both over 96% recognition rate
Handwriting: Azure Read (81.7%) or Google Vision (78.2%) – Tesseract is not suitable here
Historical documents: Azure Read with special training data – or Tesseract with Fraktur model
High volume, cost-sensitive: AWS Textract – $1.50 per 1,000 pages (volume discount above 1M pages/month: $0.60)

Tesseract in detail: Strengths and weaknesses

Tesseract is the most established open-source OCR engine for production use. Version 4.0 (2018) introduced LSTM-based neural networks – the biggest accuracy leap. Version 5.x is a C++ modernization with performance improvements and bug fixes on the same LSTM engine. Strengths:

Free and privacy-friendly (local processing)
Supports over 100 languages
Integrable in Python (pytesseract), Node.js (tesseract.js), and Docker
PDF input via a pdftoppm preprocessing step (pages are rendered to images first)

Weaknesses:

Poor recognition of handwritten text (34.5%)
No automatic layout recognition (tables, columns)
Sensitive to perspective distortion and poor lighting
No automatic post-processing (spell correction)

For simple documents with clear text, Tesseract is sufficient. For complex layouts, handwriting, or historical documents, cloud APIs are the better choice.
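The pdftoppm preprocessing step listed among the strengths can be sketched as follows. This assumes the Poppler `pdftoppm` tool is installed; the helper names, the 300 DPI default, and the output prefix are illustrative choices, not a fixed recipe.

```python
import subprocess
from pathlib import Path
from typing import List


def build_pdftoppm_command(pdf_path: str, output_prefix: str, dpi: int = 300) -> List[str]:
    """Build a pdftoppm call that renders each PDF page to a PNG file."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, output_prefix]


def pdf_to_page_images(pdf_path: str, output_prefix: str, dpi: int = 300) -> List[Path]:
    """Render the PDF and return the generated page images, sorted by file name."""
    subprocess.run(build_pdftoppm_command(pdf_path, output_prefix, dpi), check=True)
    prefix = Path(output_prefix)
    # pdftoppm appends "-<page number>.png" to the prefix for each page.
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

Each returned image can then be passed to Tesseract page by page.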

Preprocessing: Why good preparation matters more than the engine

An often underestimated factor for OCR accuracy is image preprocessing. Before the text even reaches the OCR algorithm, simple steps can improve recognition by up to 5-15 percentage points on low-quality input; on cleaner input the gains are smaller:

Binarization: Converting to black-and-white with adaptive thresholding removes color noise and improves contrast
Orientation correction: Automatic detection and correction of document orientation (0°, 90°, 180°, 270°)
Noise reduction: Median filters and morphological operations remove stray pixels and scan artifacts
Scaling: Upscaling to 300 DPI significantly improves recognition for low-resolution originals

In our tests, simple preprocessing (binarization + orientation correction) improved Tesseract recognition on photos from 91.2% to 94.8% – a 3.6 percentage point improvement without changing the engine. For cloud APIs, the effect is smaller (1-2 percentage points) since they already preprocess internally.
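To illustrate what adaptive thresholding does, here is a deliberately simple pure-Python sketch that compares each pixel with the mean of its local neighborhood. It is for illustration only: production code would normally use an image library, and the function name, window size, and offset are assumptions rather than the settings used in our tests.

```python
from typing import List


def adaptive_binarize(gray: List[List[int]], window: int = 3, offset: int = 10) -> List[List[int]]:
    """Binarize a grayscale image given as rows of 0-255 values.

    A pixel becomes black (0) if it is darker than its local mean minus
    `offset`; otherwise it becomes white (255).
    """
    height = len(gray)
    width = len(gray[0]) if height else 0
    radius = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the window at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            neighbours = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(neighbours) / len(neighbours)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result
```

Because the threshold follows the local brightness, a dark stroke stays black even when one side of a photo is lit more strongly than the other, which is where a single global threshold tends to fail.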

Conclusion

The choice of OCR engine depends on the use case: Tesseract is free and privacy-friendly, but significantly weaker than cloud APIs for photos, handwriting, and historical documents. Azure Read offers the best overall accuracy (91.8% average), followed by Google Vision (90.4%). AWS Textract is cost-effective at high volumes but weaker on handwriting. For GDPR-critical documents, Tesseract remains the safest choice – with acceptable accuracy for printed text. Cloud APIs offer the best recognition rate but require GDPR compliance review.

Processing time and resource usage

Processing speed is an often underestimated factor, especially for large document volumes. Our measurements on a standard server (4 CPU cores, 8 GB RAM):

Engine	1 page (sec)	100 pages (sec)	GPU acceleration	RAM usage
Tesseract	2.1	210	No	200-500 MB
Google Vision	1.5	45	Yes (Google-side)	0 (API)
AWS Textract	2.0 (sync)	120 (async)	Yes (AWS-side)	0 (API)
Azure Read	1.8	90	Yes (Azure-side)	0 (API)

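Tesseract is slower than the cloud APIs, especially in batch processing. For a single page the gap is small (2.1 seconds versus 1.5-2.0 seconds), but in our 100-page runs the cloud APIs needed only 0.45-1.20 seconds per page, which makes them roughly 1.8x to 4.7x faster than Tesseract at 2.1 seconds per page. Server-side parallelism and GPU acceleration at the cloud providers are the likely reasons.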
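Per-page times and relative speedups follow directly from the 100-page column of the table. The short sketch below does that arithmetic; the dictionary and function names are illustrative.

```python
# Seconds for 100 pages, taken from the processing time table above.
BATCH_SECONDS_100_PAGES = {
    "Tesseract": 210,
    "Google Vision": 45,
    "AWS Textract": 120,
    "Azure Read": 90,
}


def seconds_per_page(engine: str, pages: int = 100) -> float:
    """Average batch time per page for one engine."""
    return BATCH_SECONDS_100_PAGES[engine] / pages


def speedup_over_tesseract(engine: str) -> float:
    """How many times faster an engine finished the 100-page batch than Tesseract."""
    return BATCH_SECONDS_100_PAGES["Tesseract"] / BATCH_SECONDS_100_PAGES[engine]
```

For example, `speedup_over_tesseract("Google Vision")` is about 4.7 and `speedup_over_tesseract("AWS Textract")` is 1.75.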

For real-time applications (mobile apps, web scanners), latency is critical: Google Vision typically responds in 1-2 seconds, while on-device Tesseract takes 2-5 seconds depending on image size and smartphone CPU.

Combined solutions: Best of both worlds

In production use, a combination strategy has proven effective: Tesseract as the local first pass and as the only engine for GDPR-critical documents and offline scenarios, cloud APIs as the fallback for complex cases (handwriting, photos, historical documents). The decision can be automated:

Step 1: Tesseract processes the document locally
Step 2: For GDPR-critical documents (IBAN, tax returns), only the Tesseract result is used and nothing is sent to the cloud
Step 3: For all other documents, if Tesseract's confidence is below 95%, the document is sent to the cloud API

This architecture combines the data privacy advantages of Tesseract with the accuracy of cloud APIs. Typically, Tesseract recognizes 80-90% of documents correctly; only the remaining 10-20% require cloud support.
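The routing logic described in the steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `OcrResult`, `route_document`, and the two engine callables are hypothetical names, and the single per-document confidence value stands in for whatever aggregate you derive from the engine's per-word confidences.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OcrResult:
    text: str
    confidence: float  # assumed document-level confidence in percent (0-100)
    engine: str


# Hypothetical engine interface: image bytes in, OcrResult out.
OcrEngine = Callable[[bytes], OcrResult]


def route_document(
    image: bytes,
    gdpr_critical: bool,
    local_ocr: OcrEngine,
    cloud_ocr: OcrEngine,
    threshold: float = 95.0,
) -> OcrResult:
    """Run local OCR first and fall back to the cloud only when allowed and needed."""
    local_result = local_ocr(image)
    if gdpr_critical:
        # GDPR-critical documents never leave the local system.
        return local_result
    if local_result.confidence >= threshold:
        return local_result
    return cloud_ocr(image)
```

Passing the engines in as callables keeps the routing rule independent of any particular SDK and makes it easy to test with stubs.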