Errors in quotations from pdfs

David S. shared this question 28 days ago
Answered

When I add quotations from PDFs, errors very frequently appear in the quoted text. The formatting is typically off: extra spaces and line breaks are inserted, letters are repeated, passages from another section of the page are picked up and inserted into the quotation, etc.

The significance of the problems is at least twofold: 1) it slows my workflow down considerably as I have to carefully proofread each quotation to find and correct the errors, and 2) it leads to many errors in the quotations since I am not a good enough proofreader to catch them all.

Is anyone else experiencing this problem? My guess is that the problem lies in the formatting of the pdf document itself, rather than in Citavi. Is that correct? Either way, is there any method to address this aggravating problem?

Thanks in advance for any help!

Comments (8)

Hello David

Your guess is very likely correct. If you like, could you please send us one sample PDF file with which you experience these issues?

To transfer the files, please use our secure upload service:

https://www.citavi.com/transfer

So that we can match the files to the issue, please include the ticket ID #159577.

We'll have a look at what might cause the problems and maybe find some relief.

Best regards

Sebastian

Thanks, Sebastian. Your response was quick! Typical great customer service by the Citavi Team. Very much appreciated.

I’ve attached two one-page samples from documents to illustrate the problem, as you requested. The files are named “Citavi Sample 1” and “Citavi Sample 2”.  I’ll just refer to them from now on as #1 and #2 for the sake of brevity.

#1 has no highlighting. It is a portion of a scholarly article I downloaded as a pdf directly from the internet. Also attached is a screenshot showing what happens when I try to add a quotation from #1 as a reference to Citavi. As you can see, I am unable to limit my selection of text to just the first column – I am forced to pick up unwanted text from the second column also. It also completely misses words in the first column.

#2 is a portion of another pdf that I downloaded from the internet. I highlighted it prior to adding quotes into Citavi. You can see the highlighted portion at the bottom of the document. A screenshot of what happens when I try to add that highlighted portion as a quotation within Citavi is also attached. It is not as bad as #1, but still has numerous issues. First, the formatting is all messed up, creating line returns in Citavi that do not exist in the original. Second, spaces are inserted into words. Note the space in the word “Moab”. Also, extra letters are inserted – note “E Eglon” and “14 th h century BC E”.

Those are representative samples. They require me to do a lot of cleaning up in order to add the quotations to references in Citavi.
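
Part of that cleanup (stray line breaks and runs of spaces) can be scripted outside Citavi. The following is only a minimal Python sketch under stated assumptions, not a Citavi feature: the function name clean_quotation is hypothetical, and it assumes the quotation has been copied out as plain text. It does not repair spaces inserted inside words (such as the split “Moab”), duplicated letters, or text picked up from a neighboring column; those still need proofreading against the original.

```python
import re


def clean_quotation(text: str) -> str:
    """Normalize whitespace in text copied from a PDF (hypothetical helper).

    Blank lines are kept as paragraph breaks; single line breaks and
    runs of spaces inside a paragraph are collapsed to one space.
    """
    # Split on blank lines so that real paragraph breaks survive.
    paragraphs = re.split(r"\n\s*\n", text)
    cleaned = []
    for paragraph in paragraphs:
        # Collapse every run of whitespace (spaces, tabs, line breaks) to one space.
        paragraph = re.sub(r"\s+", " ", paragraph).strip()
        if paragraph:
            cleaned.append(paragraph)
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "the king of\nMoab  in the\n14th century\n\nNext paragraph"
    print(clean_quotation(sample))
```

Treating every single line break as a layout artifact is a deliberate simplification; it would also flatten verse or lists, so the result should be checked before it is saved as a quotation.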

My guess is that the overwhelming majority of pdfs have these types of issues.

Thanks again so much for any help that you can provide.

David Spoede

Hello, David

I had a look at both files. Both were created with Nitro PDF and have text selection issues in all PDF viewers I checked (Adobe Acrobat Reader DC and PDFXChange Viewer). I assume that the text recognition feature of the software initially used to extract the text from the images has difficulties aligning the columns and lines of the text, which results in the problems you mention.

Did you create those PDFs yourself from a scanned image?

Best regards

Sebastian

Thanks, Sebastian.

I did not create either PDF myself. Rather, I downloaded both from the web. I do use Nitro as my PDF app since it is much less expensive than Acrobat. Nitro may have been designated as the app that created the PDF when I pulled out the individual pages from the originals to send to you.

David

Hello, David

Could you please point me to the original sources of those PDFs?

While it's not very likely that we can do something about this issue in Citavi (given that Adobe and PDFXChange both fail to handle them well), we'd like to keep track of the tools that generate such problematic files in the first place. Thank you.

Best regards

Sebastian

Sebastian:

Thanks again. You and the rest of the Citavi support team are very diligent and persistent. Very much appreciated!

Anyway, here’s the link to the original of Sample 1: (PDF) “The White Slip I of Tell el-Dab’a and Thera: Critical Challenge for the Aegean Long Chronology”, The White Slip Ware of Late Bronze Age Cyprus, Proceedings of an International Conference Organized in Honour of Malcolm Wiener, Nicosia (29–30 October 1998), 2001, pp. 195–202. | Malcolm H Wiener - Academia.edu

And Sample 2: The Israelite Conquest: History or Myth (unisa.ac.za)

It’s interesting, though. I was just playing around with the pdf from the link at Sample 2, and that pdf seems somewhat “cleaner” than the one I uploaded to Citavi. Many of these articles can be found at multiple locations on the internet, so perhaps I originally downloaded it from a different site. Who knows? It’s frustrating.

Take care,

David Spoede

Citavi Team:

I just received the email below from you and am not sure what to make of it.

David Spoede

Hi David

A bored person has attached nonsense text to some forum postings. I have banned his account and deleted his posts.

Kind regards,

Peter