Originally Posted On: https://aaleenmirza110.medium.com/build-an-ocr-service-in-node-js-express-using-tesseract-js-3bfd17ff0977
Optical Character Recognition (OCR) is a powerful tool for extracting text from images and PDFs. In this guide, we’ll build a modular OCR microservice using Node.js, Express, and Tesseract.js, with support for both image and PDF uploads. We’ll follow best practices for structure, error handling, and file management.
1. Introduction
OCR converts the text in images, scanned documents, or handwritten notes into machine-readable text. That makes it useful for tasks ranging from automating data entry to making printed documents searchable and accessible.
In this article, we’ll build a simple yet effective OCR service using Node.js, Express, and the Tesseract OCR engine.
Tesseract.js is a JavaScript port of Tesseract, which makes it easy to integrate OCR into a Node.js web service.
We’ll build a REST API that accepts image or PDF uploads and returns the extracted text, using a clean, maintainable architecture.
You can find the complete source code for this OCR service on GitHub. Feel free to explore, clone, or modify the project to suit your needs; it’s completely open source! If you find it helpful or end up using it in your own projects, a star on the repo would be greatly appreciated. Your support helps keep the project alive and encourages further development!
2. Project Setup
mkdir ocr-service
cd ocr-service
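npm init -y
npm pkg set type=module
The code in this guide uses ES module import syntax, so the second command adds "type": "module" to package.json. You can also add that field by hand.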
3. Project Structure
ocr-service/
├── src/
│ ├── controllers/
│ │ └── ocrController.js
│ ├── routes/
│ │ └── ocrRoutes.js
│ ├── services/
│ │ └── ocrService.js
│ ├── temp/ # Temporary files for processing
│ ├── server.js
│ └── eng.traineddata # Optional: Language data
├── .gitignore
├── package.json
└── README.md
4. Installing Dependencies
We’ll need the following:
- express – Web framework
- multer – Handles file uploads
- tesseract.js – OCR engine
- sharp – Image processing
- dotenv – Environment config
- cors – CORS support
- nodemon – Auto-restart for dev
- child_process – Runs pdftoppm for PDF conversion (built into Node.js, so there is nothing to install)
Install with
npm install express multer tesseract.js sharp dotenv cors
npm install --save-dev nodemon
Note: PDF support requires pdftoppm, which is part of Poppler.
macOS:
brew install poppler
Ubuntu:
sudo apt-get install poppler-utils
5. Setting Up the Express Server
src/server.js
import express from 'express';
import cors from 'cors';
import dotenv from 'dotenv';
import ocrRoutes from './routes/ocrRoutes.js';

// Load variables from .env into process.env
dotenv.config();

const app = express();
const port = process.env.PORT || 3000;

app.use(cors());
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
app.use('/', ocrRoutes);

// Centralized error handler
app.use((err, req, res, next) => {
  console.error('Error caught by central handler:', err.stack);
  res.status(500).json({ error: err.message || 'Something went wrong!' });
});

app.listen(port, () => {
  console.log(`OCR server listening at http://localhost:${port}`);
});
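The handler above answers every error with status 500. As a minimal sketch of one possible refinement, the block below assumes a hypothetical HttpError class (it is not part of Express or of the original project) that lets a controller attach a status code to the error it passes to next().

```javascript
// Hypothetical error type carrying an HTTP status; not part of Express.
class HttpError extends Error {
  constructor(status, message) {
    super(message);
    this.name = 'HttpError';
    this.status = status;
  }
}

// Express recognizes error middleware by its four-argument signature.
const errorHandler = (err, req, res, next) => {
  // Fall back to 500 when the error carries no valid 4xx/5xx status.
  const status =
    Number.isInteger(err.status) && err.status >= 400 && err.status < 600
      ? err.status
      : 500;
  res.status(status).json({ error: err.message || 'Something went wrong!' });
};

// Possible use in server.js, in place of the inline handler:
//   app.use(errorHandler);
// A controller could then call, for example:
//   next(new HttpError(415, 'Unsupported file type'));
```

Plain Error objects still produce a 500, so existing controllers would keep working unchanged.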
6. Designing the Modular Architecture
We break our logic into:
- Routes: Handle endpoints & uploads
- Controllers: Handle requests/responses
- Services: Core logic for OCR & file processing
This promotes clean separation of concerns, testability, and scalability.
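To make the testability point concrete, here is a small sketch. It assumes a hypothetical factory, makeOCRHandler, that receives the service as an argument rather than importing it; the project in this guide imports ocrService directly, so treat this as an illustration of the idea, not as the project's code.

```javascript
// Hypothetical factory: the controller receives its service as an argument,
// so a test can pass a stub instead of the real Tesseract-backed service.
const makeOCRHandler = (service) => async (req, res, next) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }
  try {
    const text = await service.performOCR(req.file.buffer);
    res.json({ text });
  } catch (error) {
    next(error);
  }
};

// Exercising the handler without Express or Tesseract:
const stubService = { performOCR: async () => 'hello' };
const fakeRes = {
  statusCode: 200,
  body: null,
  status(code) { this.statusCode = code; return this; },
  json(payload) { this.body = payload; return this; },
};

const handler = makeOCRHandler(stubService);
handler({ file: { buffer: Buffer.from('fake image bytes') } }, fakeRes, () => {})
  .then(() => console.log(fakeRes.body)); // logs { text: 'hello' }
```

Because the handler only sees the req, res, and service objects it is given, the request/response logic can be checked in isolation from the OCR engine.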
7. Implementing the OCR Service
src/services/ocrService.js
import { createWorker } from 'tesseract.js';
import sharp from 'sharp';
import fs from 'fs/promises';
import path from 'path';
import { fileURLToPath } from 'url';
import { exec } from 'child_process';
import { promisify } from 'util';

const execAsync = promisify(exec);

// Recreate __dirname, which is not defined in ES modules
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Scratch directory for the intermediate PDF and page images
const tempDir = path.join(__dirname, '../temp');
await fs.mkdir(tempDir, { recursive: true });

// Run OCR on a single image buffer
const performOCR = async (imageBuffer) => {
  const worker = await createWorker('eng');
  try {
    const { data: { text } } = await worker.recognize(imageBuffer);
    return text;
  } finally {
    await worker.terminate();
  }
};

// Run OCR on a PDF: write it to disk, render each page to PNG, then OCR page by page
const performPDFOCR = async (pdfBuffer) => {
  let worker = null;
  const tempFiles = [];
  try {
    const pdfPath = path.join(tempDir, `temp_${Date.now()}.pdf`);
    await fs.writeFile(pdfPath, pdfBuffer);
    tempFiles.push(pdfPath);

    // Render every page as a 300 DPI PNG named <outputPrefix>-<page>.png
    const outputPrefix = path.join(tempDir, `page_${Date.now()}`);
    await execAsync(`pdftoppm -png -r 300 "${pdfPath}" "${outputPrefix}"`);

    const files = await fs.readdir(tempDir);
    const pageFiles = files
      .filter(file => file.startsWith(path.basename(outputPrefix)))
      .sort();

    worker = await createWorker('eng');
    let extractedText = '';
    for (const pageFile of pageFiles) {
      const pagePath = path.join(tempDir, pageFile);
      tempFiles.push(pagePath);
      const imageBuffer = await fs.readFile(pagePath);
      const processedImage = await sharp(imageBuffer).sharpen().toBuffer();
      const { data: { text } } = await worker.recognize(processedImage);
      // Separate pages with a blank line
      extractedText += text + '\n\n';
    }
    return extractedText.trim();
  } finally {
    // Always release the worker and delete temp files, even on failure
    if (worker) await worker.terminate();
    for (const file of tempFiles) {
      try { await fs.unlink(file); } catch {}
    }
  }
};

export default { performOCR, performPDFOCR };
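Both temp names in performPDFOCR are derived from Date.now(), so two requests that arrive in the same millisecond could end up sharing a prefix and picking up each other's page images. The sketch below shows one way to reduce that risk. It assumes a hypothetical helper, uniquePrefix, which is not part of the original service.

```javascript
// Hypothetical helper, not part of the original service.
let sequence = 0;

const uniquePrefix = (label) => {
  sequence += 1;
  // The pid separates processes; the counter separates calls within one process.
  // Fields are underscore-delimited and the timestamp comes last.
  return `${label}_${process.pid}_${sequence}_${Date.now()}`;
};

// Possible use inside performPDFOCR:
//   const pdfPath = path.join(tempDir, `${uniquePrefix('temp')}.pdf`);
//   const outputPrefix = path.join(tempDir, uniquePrefix('page'));
```

A random identifier would serve the same purpose; the counter is used here only to keep the example free of extra imports.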
8. Creating Controllers
src/controllers/ocrController.js
import ocrService from '../services/ocrService.js';

const handleHealthCheck = (req, res) => {
  res.json({ message: 'OCR server is running!' });
};

const handleOCRRequest = async (req, res, next) => {
  if (!req.file) {
    return res.status(400).json({ error: 'No file uploaded' });
  }
  // PDFs go through pdftoppm first; images go straight to Tesseract
  const isPDF = req.file.mimetype === 'application/pdf';
  try {
    const text = isPDF
      ? await ocrService.performPDFOCR(req.file.buffer)
      : await ocrService.performOCR(req.file.buffer);
    res.json({ text });
  } catch (error) {
    next(new Error(isPDF ? 'PDF OCR processing failed' : 'Image OCR processing failed'));
  }
};

export { handleHealthCheck, handleOCRRequest };
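The MIME type that multer reports comes from the client, so it may not match the real content. One way to tell PDFs from images without trusting it is to inspect the first bytes of the upload. The helper below, detectFileKind, is a hypothetical illustration and not part of the original project; it checks only the well-known PDF, PNG, and JPEG signatures.

```javascript
// Hypothetical helper: classify an upload by its leading "magic" bytes.
const detectFileKind = (buffer) => {
  const startsWith = (bytes) =>
    buffer.length >= bytes.length && bytes.every((byte, i) => buffer[i] === byte);

  if (startsWith([0x25, 0x50, 0x44, 0x46, 0x2d])) return 'pdf'; // "%PDF-"
  if (startsWith([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'png';
  if (startsWith([0xff, 0xd8, 0xff])) return 'jpeg';
  return 'unknown';
};

console.log(detectFileKind(Buffer.from('%PDF-1.7'))); // prints pdf
```

A controller could call detectFileKind(req.file.buffer) and reject the 'unknown' case with a 4xx response before any OCR work starts.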
9. Defining Routes
src/routes/ocrRoutes.js
import express from 'express';
import multer from 'multer';
import { handleHealthCheck, handleOCRRequest } from '../controllers/ocrController.js';
const router = express.Router();
const upload = multer({ storage: multer.memoryStorage() });
router.get('/', handleHealthCheck);
router.post('/ocr', upload.single('image'), handleOCRRequest);
export default router;
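With memoryStorage, every upload is held in RAM, so it can help to cap the file size and reject unexpected types early. multer supports both through its limits and fileFilter options. The sketch below assumes an illustrative policy (a 10 MB cap and four MIME types) that is not taken from the original project.

```javascript
// Assumed policy for illustration: PDFs plus a few common image types, 10 MB max.
const ALLOWED_TYPES = new Set([
  'application/pdf',
  'image/png',
  'image/jpeg',
  'image/webp',
]);
const limits = { fileSize: 10 * 1024 * 1024 };

// multer calls fileFilter(req, file, cb) once per uploaded file.
const fileFilter = (req, file, cb) => {
  if (ALLOWED_TYPES.has(file.mimetype)) {
    cb(null, true); // accept the file
  } else {
    cb(new Error(`Unsupported file type: ${file.mimetype}`));
  }
};

// In ocrRoutes.js these would be passed to multer:
//   const upload = multer({ storage: multer.memoryStorage(), limits, fileFilter });
```

Errors raised here reach the centralized error handler in server.js like any other middleware error.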
10. Testing the Service
You can test the OCR service using curl or Postman:
Image or PDF Upload
curl -X POST http://localhost:3000/ocr \
  -F "image=@/path/to/your/file.png"
curl -X POST http://localhost:3000/ocr \
  -F "image=@/path/to/your/file.pdf"
Sample Response:
{
"text": "Extracted text from your PDF or image..."
}
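The same upload can also be scripted from Node.js. This is a minimal sketch that assumes Node 18 or later, where fetch, FormData, and Blob are available as globals. The helper names buildOCRForm and requestOCR are hypothetical, and the default URL is the local server from this guide.

```javascript
// Build the multipart body; the field name must match upload.single('image').
const buildOCRForm = (bytes, filename, mimeType) => {
  const form = new FormData();
  form.append('image', new Blob([bytes], { type: mimeType }), filename);
  return form;
};

// Hypothetical client helper: POST the file and return the extracted text.
const requestOCR = async (bytes, filename, mimeType, url = 'http://localhost:3000/ocr') => {
  const response = await fetch(url, {
    method: 'POST',
    body: buildOCRForm(bytes, filename, mimeType),
  });
  const payload = await response.json();
  if (!response.ok) {
    throw new Error(payload.error || `HTTP ${response.status}`);
  }
  return payload.text;
};

// Usage (with the server running):
//   import fs from 'fs/promises';
//   const bytes = await fs.readFile('/path/to/your/file.png');
//   console.log(await requestOCR(bytes, 'file.png', 'image/png'));
```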
Error Handling
Errors that a controller passes to next() are caught by the centralized middleware and returned as JSON with a 500 status:
{
"error": "PDF OCR processing failed"
}
Environment Variables
You can use a .env file to configure settings like PORT, future API keys, etc.
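A .env file for this service could contain a single line such as PORT=8080. Values loaded this way arrive as strings, so it can be worth validating them before use. The sketch below assumes a hypothetical helper, readPort, which is not part of the original server.js.

```javascript
// Hypothetical helper: read PORT from an env-like object with validation.
const readPort = (env, fallback = 3000) => {
  if (env.PORT === undefined || env.PORT === '') return fallback;
  const port = Number(env.PORT);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return port;
};

console.log(readPort({ PORT: '8080' })); // prints 8080
console.log(readPort({}));               // prints 3000
```

In server.js this could replace the inline expression with const port = readPort(process.env), so a typo in .env fails at startup with a clear message.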
Conclusion
You now have a working OCR microservice that:
- Accepts images and PDFs
- Extracts text using Tesseract.js
- Follows a modular and clean architecture
- Cleans up temp files automatically