[Jabref-devel] pull-request (PDF-file metadata: Privacy Filtering all metadata)

a***@tortenboxer.de

2014-04-23 21:04:53 UTC

Dear JabRef developers,
[cc: to Joeran Beel from Docear who recently posted on PDF metadata on this list]

You might have been told by github already, but just to be sure the information comes across: I just submitted a pull-request[*] for a correction that should close bug #869 on the sourceforge tracker:
https://sourceforge.net/p/jabref/bugs/869/
(XMP Export Privacy Settings Ignored)

Below is a copy of the comment accompanying the pull-request, with some technical details. Do you have any comments on the potentially destructive privacy policy discussed in its second-to-last paragraph?

If this is confirmed to close bug #869, and one of you maintainers pulls the commits into the main branch, there is no longer any good reason *not* to tag PDFs. And hopefully the number of PDF preprints that you can simply drag-and-drop on JabRef/Docear/whatever will steadily increase!

happy hacking,
Adrian

[*]
https://github.com/JabRef/jabref/pull/8

This pull-request pertains to the addition of metadata to PDF files associated with entries, as triggered by the menu entry "Write XMP metadata to PDFs" in the "Tools" menu. XMP is an extremely interesting feature that allows tagging PDF files (amongst others) with automatically retrievable metadata in much the same way mp3-tags allow adding title/author/... information to mp3 music files. Actually JabRef exports the metadata not only to two XMP namespaces (Dublin Core and a custom JabRef namespace), but also to the PDF DocumentInformation Object.

Practically from the beginning of JabRef's XMP-writing capabilities, Christopher Oezbek had added *privacy filtering* for the XMP tagging of PDF files with data from the BibTeX record, meaning that the user could define a list of fields (in Preferences->XMP metadata) which should *not* be exported to the PDF file. Unfortunately, the filtering was incomplete: JabRef exports the metadata in three different forms, only one of which was originally filtered. In 2013 filtering was extended to both XMP namespaces, but JabRef still exported all fields into the PDF DocumentInfo object. The two present commits correct this problem. The first (b45316f) prevents private fields from being exported to the PDF DocumentInfo. The second one more aggressively erases these fields even if they already exist in the PDF document.
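
To illustrate the difference between the two commits, here is a minimal sketch. It is not the code from the pull-request: the PDF DocumentInfo is modelled as a plain Map instead of a real PDF object, and the class name, the method names, the eraseExisting flag and the mapping of the "abstract" field to "Subject" are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: the DocumentInfo is a plain Map, not a real PDF object.
class DocInfoPrivacySketch {

    // Maps a BibTeX field name to the DocumentInfo key it is stored under.
    // The four generic keys and the "jabref/" prefix follow the description
    // in this message; the abstract -> Subject mapping is an assumption.
    static String docInfoKey(String bibtexField) {
        switch (bibtexField) {
            case "author":   return "Author";
            case "title":    return "Title";
            case "keywords": return "Keywords";
            case "abstract": return "Subject";
            default:         return "jabref/" + bibtexField;
        }
    }

    // Writes the entry into docInfo while honouring the privacy filter.
    // eraseExisting = false models the first commit (private fields are skipped),
    // eraseExisting = true additionally models the second commit (private
    // fields already present in the PDF are removed).
    static void writeDocumentInfo(Map<String, String> docInfo,
                                  Map<String, String> entry,
                                  Set<String> privateFields,
                                  boolean eraseExisting) {
        for (Map.Entry<String, String> field : entry.entrySet()) {
            if (privateFields.contains(field.getKey())) {
                continue; // first commit: never export a private field
            }
            docInfo.put(docInfoKey(field.getKey()), field.getValue());
        }
        if (eraseExisting) {
            for (String name : privateFields) {
                docInfo.remove(docInfoKey(name)); // second commit: erase leftovers
            }
        }
    }

    public static void main(String[] args) {
        // A PDF that was tagged earlier, while the filter was still buggy.
        Map<String, String> docInfo = new HashMap<>();
        docInfo.put("jabref/review", "leaked private note");

        Map<String, String> entry = new HashMap<>();
        entry.put("title", "Some Preprint");
        entry.put("review", "private note");

        Set<String> privateFields = new HashSet<>(Arrays.asList("review"));

        writeDocumentInfo(docInfo, entry, privateFields, true);
        System.out.println(docInfo);
    }
}
```

With eraseExisting set to false the stale "jabref/review" value would survive the rewrite, which is exactly the situation the second commit is meant to repair.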

The deletion of existing fields might be debatable. It seems the right thing to do for fields clearly generated by JabRef (viz. those prefixed by "jabref/"), but there are four fields which might be of other origin (Author, Title, Subject and Keywords). Making a systematic exception for these four fields, i.e. not erasing them even if they are privacy filtered, would be a bad idea, as it violates the principle of least surprise. This is why the second commit makes no exception. Deactivating the erasure for the four generic fields could however easily be added as an option in the XMP export preferences if it is judged important. The current behaviour has the advantage of reliably correcting PDF files previously tagged with a buggy privacy filtering.
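
For discussion, the optional exception for the four generic fields could look roughly like the sketch below. This option does not exist in the pull-request; the keepGenericFields flag, the class name and the use of a plain Map in place of the real DocumentInfo object are hypothetical and only meant to show how small the change would be.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of an opt-out for erasing the four generic fields.
class GenericFieldErasureSketch {

    // The four DocumentInfo keys that may have been set by other software.
    static final Set<String> GENERIC_KEYS = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("Author", "Title", "Subject", "Keywords")));

    // Removes every privacy-filtered key from docInfo. When keepGenericFields
    // is true (the hypothetical preference), the four generic keys are left
    // untouched; when false, no exception is made, as in the second commit.
    static void erasePrivateFields(Map<String, String> docInfo,
                                   Set<String> privateKeys,
                                   boolean keepGenericFields) {
        for (String key : privateKeys) {
            if (keepGenericFields && GENERIC_KEYS.contains(key)) {
                continue; // possibly of other origin: leave it alone
            }
            docInfo.remove(key);
        }
    }
}
```

The default would presumably stay keepGenericFields = false, so that the behaviour without touching the preferences remains the unsurprising one.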

If these commits are pulled into the master branch and confirmed to work, bug #869 on the sourceforge tracker:
https://sourceforge.net/p/jabref/bugs/869/
can be closed.