In this article we will explain how OCR works. OCR stands for “Optical Character Recognition”. This means that a computer can use OCR to recognize the text in a scanned image and convert it into an editable text document.
This is how OCR works
Imagine you have received a printed presentation from a colleague and want to revise a few passages on your PC. You scan it and run your OCR tool. The following then happens:
- The software first performs a so-called layout analysis. It looks at the structure of the page and separates images from text, remembering where each element sits on the page. It then counts the paragraphs and records individual elements such as page numbers.
- Now comes the hard part. The software looks at the individual text blocks and breaks them down into lines of text. The lines are then split into individual words, and the words into individual letters.
- The OCR software contains stored patterns of letters and other characters. The program compares each scanned letter with these patterns. If a pattern matches closely enough (for example, with 99% similarity), the algorithm decides that the scanned shape is most likely that letter. Because it can compare many patterns in a short time, it can be very precise. This is how it successfully tells an “8” from a “B”.
- In this way the letters and characters are recognized one by one. They are then reassembled into words and put back in their original positions in the text. As soon as the software is finished, the result is saved as a normal document, which you can then edit. Finished!
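The splitting and pattern-comparison steps above can be sketched in a few lines of Python. This is only a toy illustration, not how any particular OCR product is implemented: the 3x5 pixel templates, the function names (`split_into_glyphs`, `similarity`, `recognize`, `read_line`) and the 0.9 threshold are all assumptions made for this example. Real OCR engines work on far larger images and use more robust matching.

```python
# Toy 3x5 "pixel" templates ('#' = ink, '.' = paper). Hypothetical shapes
# invented for this sketch; a real OCR tool stores many more patterns.
TEMPLATES = {
    "8": ["###", "#.#", "###", "#.#", "###"],
    "B": ["##.", "#.#", "##.", "#.#", "##."],
    "1": [".#.", "##.", ".#.", ".#.", "###"],
}

# One scanned line containing "8B1". A single stray ink pixel has been
# added inside the "8" (second row) to simulate scanner noise.
NOISY_LINE = [
    "###.##...#.",
    "###.#.#.##.",
    "###.##...#.",
    "#.#.#.#..#.",
    "###.##..###",
]


def split_into_glyphs(rows):
    """Cut a line image into single characters at fully blank columns."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # Treat the position just past the right edge as a blank column.
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank and start is None:
            start = x  # a new character begins here
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in rows])
            start = None
    return glyphs


def similarity(glyph, template):
    """Return the fraction of pixels on which glyph and template agree."""
    if len(glyph) != len(template) or len(glyph[0]) != len(template[0]):
        return 0.0  # different size: treated as no match in this sketch
    total = len(glyph) * len(glyph[0])
    same = sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )
    return same / total


def recognize(glyph, threshold=0.9):
    """Pick the best-matching template, or '?' if nothing is close enough."""
    best_char, best_score = max(
        ((char, similarity(glyph, tpl)) for char, tpl in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )
    return best_char if best_score >= threshold else "?"


def read_line(rows):
    """Segment a line into glyphs and recognize each one in order."""
    return "".join(recognize(glyph) for glyph in split_into_glyphs(rows))


if __name__ == "__main__":
    print(read_line(NOISY_LINE))
```

In this sketch the noisy “8” still agrees with the “8” template on 14 of 15 pixels but with the “B” template on only 11, so the threshold separates the two. On such a tiny grid a 99% threshold would reject any glyph with even one wrong pixel, which is why the example uses a lower value.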
OCR with Foxit Reader: Convert PDF to Text
If you want to convert the content of a scanned PDF document into text, you can do this with Foxit Reader via OCR. In this practical tip, we will show you exactly how to do this.
Open Foxit Reader as usual and click “File” > “Open” at the top.
Find the file that you want to convert to text, select it, and choose “Open”.
Wait until the document has completely loaded in the program.
Then click on “OCR” at the top and select “Current Document”.
Enter a location and name for the document and select the desired file type.
Click “Save” to convert the document into the chosen file type and save it.