PDFoutline.py Help
Maintaining a PDF's Bookmarks and Meta Information
This is a Python script for maintaining a PDF document's bookmark tree, also known as its table of contents (TOC).
It is based on PyMuPDF version 1.9.1 (or above) and wxPython version 3.0.2 (or above).
Output
Once you have finalized the new TOC version, you can save the result in the current file (the PDF incremental update technique is used in this case) or under a new filename. If the input has been decrypted or if it has an inconsistent PDF file structure, you must save your work under a new filename. Encryption is not supported as an output option.
At any time, you may interrupt your work: press the Check Data button before quitting. The current state of your work will be saved as <PDF-filename>.json in the PDF's directory. The PDF itself will remain unchanged. When you resume work, the program offers to restore the previous state from this parameter file. If you decline, the file is deleted. It is also deleted when you press SAVE. While the program is waiting for input, the parameter file is updated every 60 seconds ("auto-save feature").
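The life cycle of the parameter file described above can be sketched as follows. This is only an illustration, not the script's actual code: the function names, the layout of the saved state, and the choice to append ".json" to the full PDF file name are assumptions.

```python
import json
import os


def param_path(pdf_path):
    # Assumption: "<PDF-filename>.json" means the PDF's full name plus ".json",
    # which places the file in the PDF's own directory.
    return pdf_path + ".json"


def save_state(pdf_path, toc_rows, metadata):
    """Write the current work state; the PDF itself is not touched."""
    state = {"toc": toc_rows, "meta": metadata}  # hypothetical layout
    with open(param_path(pdf_path), "w", encoding="utf-8") as f:
        json.dump(state, f)


def load_state(pdf_path):
    """Return the saved state, or None if no parameter file exists."""
    path = param_path(pdf_path)
    if not os.path.exists(path):
        return None
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def discard_state(pdf_path):
    """Delete the parameter file (user declined the restore, or pressed SAVE)."""
    path = param_path(pdf_path)
    if os.path.exists(path):
        os.remove(path)
```

An auto-save timer would simply call save_state every 60 seconds while the dialog is idle.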
Main Dialog
The program's dialog consists of the following main parts:
- The right part displays the current PDF page. If bookmarks point to this page, their target location is indicated by a horizontal red line. You can navigate in the file using buttons, the mouse wheel, or by entering a specific page number.
- The upper left part consists of a grid where the TOC is displayed in tabular format.
- The lower left part displays file and document meta information.
Maintaining File Information
You can change some metadata fields (author, title, subject and keywords). The others are set automatically:
- /CreationDate = current date-time (if original is invalid)
- /ModDate = current date-time
- /Producer = "PyMuPDF" (if original is empty)
- /Creator = "PDFoutline.py" (if original is empty)
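The rules above could be applied roughly as in the following sketch. It is not taken from the script: the function names, the dictionary keys (chosen to resemble a PyMuPDF-style metadata dictionary), and the test for an "invalid" creation date are all assumptions made for illustration.

```python
from datetime import datetime


def pdf_date(dt):
    """Format a datetime as a PDF date string without time zone."""
    return dt.strftime("D:%Y%m%d%H%M%S")


def looks_like_pdf_date(value):
    # Assumption: "valid" means "D:" followed by at least a 4-digit year.
    return (isinstance(value, str) and value.startswith("D:")
            and len(value) >= 6 and value[2:6].isdigit())


def finalize_metadata(original, edits, now):
    """Merge user edits and apply the automatic fields."""
    meta = dict(original)
    for key in ("author", "title", "subject", "keywords"):
        if key in edits:
            meta[key] = edits[key]
    stamp = pdf_date(now)
    if not looks_like_pdf_date(meta.get("creationDate")):
        meta["creationDate"] = stamp      # only if original is invalid
    meta["modDate"] = stamp               # always updated
    if not meta.get("producer"):
        meta["producer"] = "PyMuPDF"      # only if original is empty
    if not meta.get("creator"):
        meta["creator"] = "PDFoutline.py"  # only if original is empty
    return meta
```

For example, finalize_metadata({"creator": "Word"}, {"author": "Jane"}, datetime.now()) keeps the existing creator, sets the author, and fills in the remaining automatic fields.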
Maintaining the TOC
Upon program start, existing bookmark (outline) entries are extracted using the method getToC(simple = False). This information is displayed in the TOC grid in tabular format. The grid contains four columns: Level, Title, Page and Height. Any of this information can be changed at any time without changing the underlying PDF.
If the PDF currently does not contain bookmarks, a dummy entry will be displayed in the grid.
Maintaining TOC Rows
- Delete a row: CTRL+click the row number.
- Duplicate a row: double-click the row number. The new row will be inserted above the current one.
- Move a row: left-click and hold row number, then drop it at the desired location.
There is no separate function for adding a new row. If a PDF contains no outline at all, a dummy bookmark is displayed in the grid to serve as a template for new entries.
Maintaining Grid Cells
You can overtype any cell at any time. Pressing ENTER moves to the next cell to the right (wrapping around to the next row). Double-clicking any cell that is not currently selected displays the corresponding PDF page. Special cell behaviors are as follows:
- Title: Text entered will be indented automatically depending on the entry's level. The column width will be auto-adjusted.
- Height: This denotes the distance of the bookmark's destination from the bottom of the page in so-called user units (PDF terminology). One user unit usually equals one pixel, and 72 pixels equal one inch. Note that MuPDF (and PyMuPDF) correctly handles a page's presentation in terms of its CropBox (if present, MediaBox otherwise). The Height value you supply here will therefore be translated into the following PDF destination array: [page /XYZ 0 height 0]. If you type a number in this cell, the entry's page is automatically displayed and a red horizontal line indicates the entered height coordinate. If you double-click the cell, the page is also displayed and a slider pops up to let you choose this value. If the PDF's original TOC did not contain Height information for an entry, a value of -1 will be displayed initially.
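To make the Height arithmetic concrete, here is a small sketch that builds a destination of the form [page /XYZ 0 height 0] and converts user units to inches and centimeters. The helper names and the page reference "12 0 R" are hypothetical; the script itself delegates the actual writing of destinations to PyMuPDF.

```python
def xyz_destination(page_ref, height):
    """Return the destination text '[page /XYZ 0 height 0]'.

    page_ref is a hypothetical PDF object reference such as "12 0 R".
    """
    return "[%s /XYZ 0 %g 0]" % (page_ref, height)


def units_to_inches(units):
    # 72 user units (usually pixels) equal one inch
    return units / 72.0


def units_to_cm(units):
    # one inch equals 2.54 cm
    return units_to_inches(units) * 2.54


print(xyz_destination("12 0 R", 700))
print(units_to_inches(36), units_to_cm(36))
```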
Validating Input
Your input is checked for errors only when you press Check Data. The following is checked:
- The hierarchy level of row 1 must be 1.
- Hierarchy levels in sequence must not increase by more than 1 (but can decrease by any number).
- Titles must not be empty.
- Page numbers are 1-based and must be in the range [1, pageCount].
- Height must be greater than zero and must not exceed the page's height. If Height was initialized with -1, it is automatically set to the page height minus 36 pixels (0.5 inches = 1.27 cm).
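The checks listed above can be expressed in a few lines. The following is a minimal sketch under stated assumptions, not the script's own validation routine: check_toc, the row layout [level, title, page, height], and the page_heights list (one page height per page) are names introduced here for illustration.

```python
def check_toc(rows, page_heights):
    """Validate TOC rows; return (errors, corrected_rows).

    rows:         list of [level, title, page, height]
    page_heights: height of each page in user units, index 0 = page 1
    """
    errors = []
    fixed = []
    prev_level = 0
    for i, (level, title, page, height) in enumerate(rows, start=1):
        if i == 1 and level != 1:
            errors.append("row 1: level must be 1")
        elif i > 1 and level > prev_level + 1:
            errors.append("row %d: level increases by more than 1" % i)
        if not str(title).strip():
            errors.append("row %d: title must not be empty" % i)
        if not 1 <= page <= len(page_heights):
            errors.append("row %d: page not in [1, pageCount]" % i)
        else:
            page_height = page_heights[page - 1]
            if height == -1:
                # no height in the original TOC: page height minus 36 units
                height = page_height - 36
            if not 0 < height <= page_height:
                errors.append("row %d: height out of range" % i)
        fixed.append([level, title, page, height])
        prev_level = level
    return errors, fixed


errors, fixed = check_toc([[1, "Intro", 1, -1], [3, "Deep", 1, 100]], [792])
print(errors)
print(fixed)
```

In this sketch the height of a row is only checked when its page number is valid, because the page height is otherwise unknown.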
Finishing Work
Changing TOC information will disable the SAVE button. Press Check Data to validate your input and to enable the SAVE button again. As mentioned above, Check Data / QUIT will save your input but will not change the PDF.
Exception Handling
If an exception occurs while saving the PDF, traceback information will be displayed and also saved under the filename <PDF-filename>.txt in the PDF's directory.
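The behavior described above might look like the following sketch. It is an assumption-laden illustration only: save_with_traceback and save_func are hypothetical names, and appending ".txt" to the full PDF file name is one possible reading of <PDF-filename>.txt.

```python
import traceback


def save_with_traceback(save_func, pdf_path):
    """Run save_func(); on failure, log the traceback next to the PDF.

    Returns the traceback text (for display in a dialog) or None on success.
    """
    try:
        save_func()
    except Exception:
        text = traceback.format_exc()
        # Assumption: the log is named after the full PDF file name
        with open(pdf_path + ".txt", "w", encoding="utf-8") as f:
            f.write(text)
        return text
    return None
```

A caller would pass the actual save operation as save_func and show the returned text to the user if it is not None.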