Case Study

Digitization of the Medieval and Modern History of the British Isles for the Institute of Historical Research, University of London, UK


Project: XML Conversion – TEI XML Conforming to TEI P5 Standard

Industry: University Institute


Client background:

The client, the Institute of Historical Research at the University of London, began a project to digitize valuable primary and secondary sources on the medieval and modern history of the British Isles. Once the work was under way, however, the client realized that old documents cannot always be digitized with OCR alone. Because of the diversity of typefaces and fonts, the OCR tools frequently failed to read the documents accurately. The client therefore decided to adopt a more rigorous approach to converting them.

Project Requirements:

The documents, which shed valuable light on the medieval and modern periods of the history of the British Isles, had to be digitized and converted into XML. SunTec's association with the project began in 2008, when it had entered its third phase. The target set for the third phase was the digitization of 300 Calendars of State Papers over the following twelve months. British History Online, the agency in charge of executing the project, decided to send SunTec seven titles from the series every month for a year. Each title ran to around 680 pages.

The Challenges:

  • Because of the great diversity of layouts, fonts and typefaces, OCR tools would have been ineffective on these documents.
  • We were expected to maintain a very high level of accuracy (99.995%) throughout. Since OCR tools could not be used, we had to employ the double keying method to reach that level.
  • We had to follow a DTD prepared by the client meticulously. The DTD contained elaborate instructions on how the individual typography of each book was to be handled.
  • The client uploaded the notes files, controls and scans for each publication to our FTP server. The documents had to be scanned at 400 dpi and delivered in greyscale or bitmap format.
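As an illustration of the double keying idea mentioned above, the following is a minimal sketch, not SunTec's actual tooling. It assumes two operators key the same page independently and that the two transcriptions are compared character by character, so that every disagreement can be sent for review. The function names `keying_discrepancies` and `allowed_errors` are hypothetical.

```python
from difflib import SequenceMatcher


def keying_discrepancies(first_pass: str, second_pass: str):
    """Return the spans where two independent keyings disagree.

    Each item is (operation, text_in_first_pass, text_in_second_pass).
    """
    matcher = SequenceMatcher(None, first_pass, second_pass, autojunk=False)
    return [
        (tag, first_pass[i1:i2], second_pass[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def allowed_errors(char_count: int) -> int:
    """Character errors permitted at 99.995% accuracy (5 per 100,000)."""
    return char_count * 5 // 100_000


if __name__ == "__main__":
    # Hypothetical sample text: the second operator mistyped the last digit.
    for diff in keying_discrepancies("State Papers, 1547", "State Papers, 1541"):
        print(diff)
    print(allowed_errors(20_000))
```

At 99.995% accuracy, a text of 20,000 characters may contain at most one wrong character, which is why every disagreement between the two keyings has to be checked against the scan rather than resolved automatically.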

Our Solution:

We put together a team of professionals highly proficient in document digitization. To ensure that the project was delivered in conformity with the guidelines of British History Online, the work was overseen by a dedicated Project Manager. SunTec developed an XML workflow especially for this project and implemented the DTD specified by the client. As a result, we were able to:

  • Quickly and accurately convert the documents provided to us into XML conforming to the Text Encoding Initiative's (TEI) P5 standard.
  • Turn around the project on time while maintaining 99.995% accuracy.
  • Enable the XML output for multi-channel publishing. Besides being printed, the content could now also be made available online or on mobile devices.
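For readers unfamiliar with TEI P5, the sketch below builds a bare-bones TEI document using only the Python standard library. It is a generic skeleton with the header elements TEI P5 requires, not the client's DTD, which is not reproduced in this case study. The function name `build_tei_skeleton` and the placeholder texts are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # namespace used by TEI P5 documents


def build_tei_skeleton(title: str, paragraph: str) -> ET.Element:
    """Build a minimal TEI tree: a teiHeader with a fileDesc, plus text/body."""
    ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

    def q(name: str) -> str:
        return f"{{{TEI_NS}}}{name}"

    tei = ET.Element(q("TEI"))
    file_desc = ET.SubElement(ET.SubElement(tei, q("teiHeader")), q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Placeholder publication statement."
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = "Placeholder source description."
    body = ET.SubElement(ET.SubElement(tei, q("text")), q("body"))
    ET.SubElement(body, q("p")).text = paragraph
    return tei


if __name__ == "__main__":
    tree = build_tei_skeleton("Sample calendar title", "Sample keyed paragraph.")
    print(ET.tostring(tree, encoding="unicode"))
```

Because the content is held in structural markup rather than in a page layout, the same file can feed print, web and mobile outputs.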


  • Our efficiency and quick turnaround time helped the client make significant cost savings.
  • The British history sources became fully searchable and multi-platform compatible, a great boon for scholars and researchers.
  • We abided by the client's DTD throughout and maintained 99.995% accuracy using the double keying method.
Connect with SunTec Digital!

For more information on our services, kindly write to us; for a FREE sample job, kindly fill out our online form.
