Problem selecting OCR text in scanned PDF

Erik B. shared this question 2 years ago
Answered

I scanned a document, opened it in the free app PDF-Xchange Editor, and applied OCR. I can select, highlight, and copy the text in both the editor and Acrobat Reader, but not once I add the PDF to a reference in Citavi. The document is not secured, so I don't know what's wrong. The odd thing is that Citavi does find the text when I use the search function. So why can't I select any of it?

Replies (3)

Maybe the text in the PDF file is slightly rotated? Then Citavi can no longer draw a frame around the text. (I don't know how many degrees of rotation Citavi still accepts.)

I would try running OCR on the scanned PDF in another program, preferably Acrobat DC or ABBYY FineReader.

All the best

Lee

Thanks. I checked everything again. I re-ran OCR on the original scan, this time making sure to straighten the pages properly. Still the same result. What bugs me is that Citavi clearly recognizes the text when I use the search function. I guess it has something to do with my editor. I'll see if I can try another OCR editor.

Dear Erik,

Thank you for your question.

The PDF file may be corrupted. Please try repairing it as follows.

We have provided a tool for you to download at:

https://we.tl/t-rnFkkN6btv

Please note that all highlights and comments in the PDF file will disappear. The PDF file will be redrawn.

To use the tool, please follow these steps:

Set up PDF Repair

  1. Unzip the ZIP file. Save the "PDF Repair Tool" folder in any location.
  2. Right-click on the "PdfRepair.exe" file in the unzipped folder. Select the "Send to" > "Desktop (create shortcut)" command.
  3. Locate the Ghostscript installation directory and copy the full path of the gswin64c.exe file to the clipboard, e.g.
    C:\Program Files\gs\gs9.16\bin\gswin64c.exe - or -
    C:\Program Files\gs\gs9.52\bin\gswin64c.exe
  4. Right-click the icon of the newly created shortcut on your desktop.
  5. Select Properties > Shortcut tab.
  6. In the "Target" field, add a space after "...\PdfRepair\PdfRepair.exe", followed by --path and the path copied in step 3 in quotation marks, e.g.: --path "C:\Program Files\gs\gs9.52\bin\gswin64c.exe"
  7. Close the window. The tool is now ready for use.

Repair PDF files

  1. Drag and drop a problematic PDF file onto the shortcut to the "PdfRepair.exe" file.
  2. A console window opens and shows the progress.
  3. A repaired copy of the PDF file is created, with "-Repaired" appended to the file name.

Note: after a PDF file has been repaired, it appears that the --path addition has to be entered in the shortcut's "Target" field again.
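
For readers who would rather script this step, here is a minimal sketch of what a Ghostscript-based rewrite of a PDF can look like. This thread does not document how PdfRepair.exe works internally, so treat this as an assumption rather than the tool's actual behavior: the sketch simply asks Ghostscript's pdfwrite device to redraw the file. The helper names (repaired_name, build_repair_command, repair) are made up for illustration, and the Ghostscript path is the example path from the setup steps.

```python
import subprocess
from pathlib import Path

# Assumed Ghostscript location (the example path from the setup steps above).
GS_EXE = r"C:\Program Files\gs\gs9.52\bin\gswin64c.exe"


def repaired_name(pdf_path):
    """Return the output path with "-Repaired" appended to the file name."""
    p = Path(pdf_path)
    return p.with_name(p.stem + "-Repaired" + p.suffix)


def build_repair_command(pdf_path, gs_exe=GS_EXE):
    """Build a Ghostscript command that rewrites the PDF with pdfwrite.

    -o sets the output file (and implies -dBATCH -dNOPAUSE);
    -sDEVICE=pdfwrite makes Ghostscript emit a freshly written PDF.
    """
    return [
        gs_exe,
        "-o", str(repaired_name(pdf_path)),
        "-sDEVICE=pdfwrite",
        str(pdf_path),
    ]


def repair(pdf_path, gs_exe=GS_EXE):
    """Run Ghostscript and return the path of the rewritten copy."""
    subprocess.run(build_repair_command(pdf_path, gs_exe), check=True)
    return repaired_name(pdf_path)
```

Calling repair(r"C:\Scans\test.pdf") would write C:\Scans\test-Repaired.pdf next to the original, provided Ghostscript is installed at the assumed path. As with the tool above, check the result for missing highlights and comments before replacing the original.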

Reduce file size if necessary

You will notice that the repaired PDF files are larger than the original. To change this, do the following:

  1. Open the PDF file using Adobe Acrobat (not the free Adobe Reader).
  2. From the File menu, choose Save File As Other > Reduced Size PDF.
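
If Adobe Acrobat is not available, Ghostscript itself can usually shrink a PDF as well. The following is only a sketch under that assumption and not part of the repair tool: it uses pdfwrite's -dPDFSETTINGS=/ebook preset, which downsamples images and may therefore reduce scan quality. The function name and the "-Small" suffix are hypothetical, and the Ghostscript path is again the example path from the setup steps.

```python
import subprocess
from pathlib import Path

# Assumed Ghostscript location (the example path from the setup steps above).
GS_EXE = r"C:\Program Files\gs\gs9.52\bin\gswin64c.exe"


def build_shrink_command(pdf_path, gs_exe=GS_EXE, preset="/ebook"):
    """Build a Ghostscript command that writes a smaller copy of the PDF.

    The preset is passed to -dPDFSETTINGS; /screen, /ebook, /printer and
    /prepress trade file size against image quality.
    """
    p = Path(pdf_path)
    out = p.with_name(p.stem + "-Small" + p.suffix)  # hypothetical naming
    return [
        gs_exe,
        "-o", str(out),
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=" + preset,
        str(p),
    ]


def shrink(pdf_path, gs_exe=GS_EXE, preset="/ebook"):
    """Run Ghostscript; raises CalledProcessError if it fails."""
    subprocess.run(build_shrink_command(pdf_path, gs_exe, preset), check=True)
```

It is worth checking afterwards that the OCR text in the smaller file can still be selected in Citavi.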

Best regards

Susanne

Thank you, Susanne.

Unfortunately, I don't have Ghostscript, and I can't install anything myself on the computer (I am not the admin).

I am uploading a test PDF here so that you or anyone else can try it. It is a simple scan of a printed document to which I applied OCR in PDF-Xchange Editor. Citavi does not seem to read this file properly, whereas Adobe Acrobat Reader handles the OCR'd text without any problem.

Maybe it is possible to have Citavi read this kind of document produced by PDF-Xchange Editor properly in a future update?

Dear Erik,

Thank you for your reply.

I followed the steps above to repair your PDF file, and annotating in Citavi now works (see attached file).

I will create a bug ticket, but unfortunately I cannot promise a solution.

Best regards

Susanne

Thank you for the quick reply. This will do for now.

Dear Erik,

Thank you for your kind reply.

Best regards

Susanne
