Help System - Optical Character Recognition (OCR)

Introduction to OCR

Select the PDF file to perform OCR on

Select the destination folder

Tesseract must be installed for this module to work; see the installation tutorial below.

Install Tesseract (Step-By-Step Tutorial)

How to Download and Install Tesseract OCR on Windows

  1. Download Tesseract Installer for Windows

  2. Install Tesseract OCR

  3. Add Installation Path to System Environment Variables

  4. Run Tesseract OCR

1. Download Tesseract Installer for Windows

To use the tesseract command on Windows, we first need to download the Tesseract OCR Windows installer (.exe).

Several sites offer the latest version of Tesseract OCR. One such source is UB Mannheim, whose builds are forked from tesseract-ocr/tesseract (the main repository).

Download the tesseract-ocr-w64-setup-5.3.0.20221222.exe (64-bit) Windows installer.

On macOS, Tesseract can be installed from a terminal with either Homebrew (first command) or MacPorts (second command):

brew install tesseract
sudo port install tesseract

2. Install Tesseract OCR

Next, we'll install Tesseract using the .exe file downloaded in the previous step. Launch the installer to start the Tesseract installation.

Installer Language

Once the setup has finished unpacking, the Installer Language dialog will appear. This dialog sets the language used by the installer itself; additional language data for OCR can be selected later, in the Choose Components step. Here we'll keep English.

Install Tesseract, Figure 2: Tesseract Installer

Tesseract Installer

Click OK to set the installer language for Tesseract OCR.

Tesseract OCR Setup

Next, the Setup Wizard will appear. It guides you through the Tesseract installation on Windows.

Install Tesseract, Figure 3: Tesseract OCR

Tesseract OCR Setup Wizard

Click Next to continue the installation.

Accept License Agreement

Tesseract OCR is licensed under the Apache License, Version 2.0. As it is open source and free to use, you can redistribute and modify versions of Tesseract without any royalty concerns.

Install Tesseract, Figure 4: Tesseract License

Tesseract OCR is licensed under Apache License v2.0. Please accept this license to continue with the installation.

Click I Agree to proceed to installation.

Choose Users

You can choose to install Tesseract for multiple users or for a single user.

Install Tesseract, Figure 5: Tesseract Choose Users

Choose to install Tesseract OCR for the current user (you) or for all user accounts.

Click Next to choose components to install with Tesseract.

Choose Components

In the list of components, ScrollView, Training Tools, Shortcuts creation, and Language data are all selected by default. We will keep the default selection. You can deselect components you do not need, but keeping all of them is usually the safest choice.

Install Tesseract, Figure 6: Tesseract Components

Here, you can choose to include or exclude Tesseract OCR components. For the best results, continue the installation with the default components selected.

Click Next to choose installation location.

Choose Installation Location

Next, we'll choose the location in which to install Tesseract. Make sure you copy the destination folder path. We will need it later to add the installation location to the machine's Path environment variable.

Install Tesseract, Figure 7: Tesseract Install Location

Select an install location for the Tesseract OCR library, and remember this location for later.

Click Next to continue setting up the installation of Tesseract.

Choose the Start Menu Folder

In this last step, we create the shortcuts in the Start menu. You can give the folder any name; here we keep the default.

Install Tesseract, Figure 8: Tesseract Start Menu

Choose the name of Tesseract OCR's Start Menu Folder

Now, click Install and wait for the installation to complete. Once the installation is done, the following screen will appear. Click Finish, and Tesseract OCR is installed on Windows.

Install Tesseract, Figure 9: Tesseract Installer

Tesseract OCR Installation is now complete.
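
Before moving on, it can be worth confirming that the executable really is in the folder you noted during installation. The Python sketch below is only an illustration: the helper name find_tesseract_exe is hypothetical, and the folder C:\Program Files\Tesseract-OCR is an assumed example, so substitute the path you copied from the installer.

```python
from pathlib import Path
from typing import Optional


def find_tesseract_exe(install_dir: str) -> Optional[Path]:
    """Return the path to tesseract.exe inside install_dir, or None if absent.

    Hypothetical helper for illustration; it is not part of Tesseract.
    """
    candidate = Path(install_dir) / "tesseract.exe"
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    # Assumed example location; use the folder you copied from the installer.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    exe = find_tesseract_exe(install_dir)
    if exe is None:
        print(f"tesseract.exe not found in {install_dir}")
    else:
        print(f"Found {exe}")
```

If the script reports that the file is missing, re-check the install location before editing the Path variable.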

3. Add Installation Path to System Environment Variables

Now, we will add the Tesseract installation path to Windows' Environment Variables.

In the Start menu, type "environment variables" or "advanced system settings"

Install Tesseract, Figure 10: System Path Variables

The Windows System Properties Dialog Box

System Properties

Once the System Properties dialog box opens, click the Advanced tab, and then click the Environment Variables button, located towards the bottom right of the dialog.

The Environment Variables dialog box will be presented to you.

Environment Variables

Under System variables, click on the Path variable.

Install Tesseract, Figure 11: Environment Variables

Accessing the Windows' System Environment Variables

Now, click Edit.

Add Tesseract OCR for Windows Installation Directory to Environment Variables

In the Edit environment variable dialog box, click New. Paste the installation path that you copied during step 2, and click OK.

Install Tesseract, Figure 12: Edit Environment Variable

Edit Windows' Path System Environment Variable by adding an entry that includes the Absolute path to the Tesseract OCR installation

That's it! We have successfully downloaded and installed Tesseract OCR and set its environment variable on a Windows machine.
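
To double-check the Path edit, the following Python sketch tests whether a folder appears as an entry in a PATH-style string and asks the system where it would find the tesseract command. The helper name is_on_path and the example folder are assumptions made for illustration; the comparison ignores case and trailing slashes, which suits Windows paths but not case-sensitive file systems.

```python
import os
import shutil


def is_on_path(directory: str, path_value: str, sep: str = os.pathsep) -> bool:
    """Return True if directory is one of the entries in a PATH-style string."""

    def norm(p: str) -> str:
        # Ignore surrounding whitespace and quotes, trailing slashes, and case.
        return p.strip().strip('"').rstrip("\\/").lower()

    target = norm(directory)
    entries = [e for e in path_value.split(sep) if e.strip()]
    return any(norm(entry) == target for entry in entries)


if __name__ == "__main__":
    # Assumed example location; use the folder you added to Path.
    install_dir = r"C:\Program Files\Tesseract-OCR"
    print("On PATH:", is_on_path(install_dir, os.environ.get("PATH", "")))
    # shutil.which returns the full path of the executable, or None.
    print("Resolved executable:", shutil.which("tesseract"))
```

Run it from a newly opened Command Prompt, since windows that were already open keep the old Path value.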

4. Run Tesseract OCR

To check that Tesseract OCR for Windows was installed and added to the environment variables, open Command Prompt (cmd) on your Windows machine and run the "tesseract" command. If everything worked, a short usage guide is displayed, listing the OCR options and the single options such as the one that prints the Tesseract version.

Install Tesseract, Figure 13: Edit Environment Variable

Run the tesseract command in the Windows command line (or Windows PowerShell) to make sure that the above installation steps were done correctly. The console output is the expected result of a successful Windows installation.
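
The same check can be scripted. The Python sketch below runs tesseract --version and extracts the version number. The function names are hypothetical, and the parser assumes that the first output line begins with the word tesseract followed by a version number (optionally prefixed with v); treat it as a rough check rather than a definitive implementation.

```python
import re
import subprocess
from typing import Optional


def parse_tesseract_version(output: str) -> Optional[str]:
    """Extract the version from the first line of `tesseract --version` output."""
    lines = output.strip().splitlines()
    if not lines:
        return None
    match = re.match(r"tesseract\s+v?(\d+(?:\.\d+)*)", lines[0].strip(), re.IGNORECASE)
    return match.group(1) if match else None


def installed_tesseract_version() -> Optional[str]:
    """Run `tesseract --version`; return the version, or None if unavailable."""
    try:
        result = subprocess.run(
            ["tesseract", "--version"],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        return None  # the executable could not be started (e.g. not on PATH)
    # Some builds print the version to stderr, so fall back to that stream.
    return parse_tesseract_version(result.stdout or result.stderr)


if __name__ == "__main__":
    version = installed_tesseract_version()
    if version is None:
        print("Tesseract was not found; check the Path entry.")
    else:
        print(f"Tesseract version {version} is installed.")
```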