Downloading Armorials from the Österreichische Nationalbibliothek

The Austrian National Library (Österreichische Nationalbibliothek, or ONB) hosts online scans of a number of fifteenth- and sixteenth-century manuscripts that may be of interest to armorial researchers, but sadly its website lacks a PDF download or bulk export feature.

To facilitate offline viewing and transfer to other repositories, we can use a little JavaScript in a browser with “developer mode” enabled to generate a batch of command-line download commands that will retrieve an entire volume.

Start by visiting the ONB web viewer for a manuscript such as the Wappenbuch des André de Rineck. While you are viewing the first page of the manuscript, open the browser’s console window and paste in the following bit of JavaScript:

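window.snap_and_step = function () {
  // Find the first page tile that is not the "loading" placeholder image.
  var src = $('img.imageTile[src!="images/loading4.png"]').first().attr('src');
  // Log a curl command that requests this page with s=1.0, followed by a pause.
  console.log(
    'curl -o page-' + src.match('img=00000([0-9]+)')[1] + '.jpg ' +
    '"https://digital.onb.ac.at/RepViewer/' + src.match('(.*?)&[a-z]=')[1] + '&s=1.0" && sleep 5'
  );
  // Advance the viewer to the next page and repeat in five seconds.
  navigateNextPage();
  setTimeout(window.snap_and_step, 5000);
};
snap_and_step();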

Then sit back and wait. The process is designed to run slowly, with a five-second delay between pages, in part to avoid putting any unusual load on the ONB’s web server. (You can estimate the number of minutes required by dividing the page count by twelve.)
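The divide-by-twelve rule is just the five-second delay expressed as pages per minute. As a small illustration, here is that arithmetic as a function; the name estimateMinutes and the 300-page figure are made up for the example and are not taken from any particular manuscript.

```javascript
// Rough running time in minutes for a volume, given the delay between pages.
// With the default five-second delay this is the same as pageCount / 12.
function estimateMinutes(pageCount, delaySeconds = 5) {
  return (pageCount * delaySeconds) / 60;
}

// Hypothetical 300-page volume.
console.log(estimateMinutes(300)); // prints 25
```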

As it runs, it will output a series of commands in the console log similar to the following, one line for each page:

curl -o page-001.jpg "https://digital.onb.ac.at/RepViewer/image?doc=DOD_50607&img=00000001.jp2&hash=5af078d29f168079215cfe7fb5bec13c59557c6cc26619236e6d08fe2e928319329f9ab8be9a5ac9e0a4039b11c0&s=1.0" && sleep 5
curl -o page-002.jpg "https://digital.onb.ac.at/RepViewer/image?doc=DOD_50607&img=00000002.jp2&hash=5af078d29f168079215cfe7fb5bec13c59557c6cc26519236e6d08fe2e928319329f9ab8be9a5ac9e0a4e0d2a662&s=1.0" && sleep 5
curl -o page-003.jpg "https://digital.onb.ac.at/RepViewer/image?doc=DOD_50607&img=00000003.jp2&hash=5af078d29f168079215cfe7fb5bec13c59557c6cc26419236e6d08fe2e928319329f9ab8be9a5ac9e0a408c536c3&s=1.0" && sleep 5
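To make it clearer how each of those lines is assembled, here is a stand-alone sketch of the string handling, pulled out into a function that can be tried without the viewer. The function name buildCurlCommand and the sample src value (with a shortened hash and a guessed s=0.25 parameter) are assumptions for illustration only. It also varies the page-number pattern slightly: the original expects exactly five leading zeros, so, assuming image numbers stay eight digits wide, it would not match page 1000 or above, whereas this version strips any leading zeros and pads back to three digits.

```javascript
// Build one download command from a tile's relative src attribute.
function buildCurlCommand(src) {
  // Page number: the digits after "img=", leading zeros stripped, then padded to three.
  const page = src.match(/img=0*(\d+)/)[1].padStart(3, '0');
  // Everything before the first single-letter parameter (for example "&s=...").
  const base = src.match(/(.*?)&[a-z]=/)[1];
  return 'curl -o page-' + page + '.jpg ' +
    '"https://digital.onb.ac.at/RepViewer/' + base + '&s=1.0" && sleep 5';
}

// Hypothetical src value; real ones carry a much longer hash.
const sampleSrc = 'image?doc=DOD_50607&img=00000001.jp2&hash=abc123&s=0.25';
console.log(buildCurlCommand(sampleSrc));
```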

When it reaches the last page of the text, it will keep outputting the same command over and over; at some point I’ll get around to refining the code to detect this condition and exit. You can interrupt the process by pasting the following bit of code into the console:

window.snap_and_step = 0
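One possible way to detect the end of the volume automatically is sketched below. It is untested against the live viewer and rests on an assumption: if the tile's src has not changed after several consecutive checks, we are probably on the last page. The names makeStepper and patience are my own inventions; the browser wiring at the bottom relies on the same page globals ($ and navigateNextPage) as the snippet above and is skipped anywhere they do not exist.

```javascript
// Returns a step function. Each call reads the current tile src; if it is new,
// the src is emitted and the viewer is advanced. If it is unchanged (or missing),
// nothing is emitted, and after `patience` unchanged reads in a row step() returns false.
function makeStepper(getSrc, emit, goNext, patience = 3) {
  let lastSrc = null;
  let repeats = 0;
  return function step() {
    const src = getSrc();
    if (!src || src === lastSrc) {
      repeats += 1;
      return repeats < patience; // keep waiting until patience runs out
    }
    lastSrc = src;
    repeats = 0;
    emit(src);
    goNext();
    return true;
  };
}

// Browser wiring: only runs inside the ONB viewer page, where these globals exist.
if (typeof window !== 'undefined' && typeof $ === 'function' &&
    typeof navigateNextPage === 'function') {
  const step = makeStepper(
    () => $('img.imageTile[src!="images/loading4.png"]').first().attr('src'),
    (src) => console.log(
      'curl -o page-' + src.match('img=00000([0-9]+)')[1] + '.jpg ' +
      '"https://digital.onb.ac.at/RepViewer/' + src.match('(.*?)&[a-z]=')[1] + '&s=1.0" && sleep 5'),
    () => navigateNextPage()
  );
  (function loop() {
    if (step()) {
      setTimeout(loop, 5000);
    } else {
      // Starts with "#" so it is a harmless comment if pasted into fetch.sh.
      console.log('# no new page after repeated checks; stopping');
    }
  })();
}
```

Waiting through a few unchanged reads, rather than stopping at the first one, is meant to avoid quitting early when a page is merely slow to load.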

Then, copy all of the console log output and paste it into a text file named something like fetch.sh. Save that file in a new folder on your computer. Open a command-line terminal window and change your working directory to that folder, then run the commands:

$ cd andre-de-rineck
$ sh fetch.sh

Then sit back and wait again. This process also has a built-in delay as it fetches high-quality JPEG images of each page and saves them as numbered page files in the current directory.
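Since the files are numbered, it is easy to check afterwards whether any page failed to download. The following is a minimal sketch for Node.js, assuming the page-NNN.jpg naming used above; the function name findMissingPages is hypothetical.

```javascript
// Given a list of filenames, return the page numbers missing from 1..highest found.
function findMissingPages(filenames) {
  const found = new Set();
  for (const name of filenames) {
    const m = name.match(/^page-(\d+)\.jpg$/);
    if (m) found.add(Number(m[1]));
  }
  const highest = Math.max(0, ...found);
  const missing = [];
  for (let n = 1; n <= highest; n++) {
    if (!found.has(n)) missing.push(n);
  }
  return missing;
}

// Usage from the download folder (Node.js):
//   console.log(findMissingPages(require('fs').readdirSync('.')));
```

It can only spot gaps below the highest page number present, so a download that stopped early will still look complete.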

When the process completes, you’ll have a directory full of images which can be browsed individually, packaged into a PDF, or uploaded to another hosting service such as archive.org.
