Ideas….

a blog for me to record thoughts and ideas

TEI Publishing

Ways to Publish Your TEI Documents

  • CSS: just to make it look pretty.
  • XSLT: to transform it into another format (such as XHTML).
  • XML databases: to query it. A good open-source database is eXist.
  • XML publishing systems: software that people can install to publish their documents. Examples are XTF and TEI Publisher.

More TEI Markup Notes

Encoding Parallel Structures

This is useful for mapping one thing to another; in the example below, French lines are mapped to their English translations. Each line gets an xml:id attribute, and its corresp attribute points to the xml:id of the matching line in the other language.

<lg type="stanza" xml:lang="fr">
<l xml:id="fr2.01" corresp="#en2.01">Nos péchés sont têtus, nos repentirs sont lâches;</l>
<l xml:id="fr2.02" corresp="#en2.02">Nous nous faisons payer grassement nos aveux,</l>
<l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
<l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
</lg>

<lg type="stanza" xml:lang="en">
<l xml:id="en2.01" corresp="#fr2.01">Our sins are stubborn, craven our repentance.</l>
<l xml:id="en2.02" corresp="#fr2.02">For our weak vows we ask excessive prices.</l>
<l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
<l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
</lg>
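
To see how the corresp pointers resolve, here is a rough Python sketch using only the standard library. It assumes the two stanzas sit under one wrapper element with no TEI namespace declared, which is a simplification; the function name paired_lines and the trimmed sample are my own, not part of the talk.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed copy of the stanzas above (only the two lines that cross over).
SAMPLE = """<div>
<lg type="stanza" xml:lang="fr">
<l xml:id="fr2.03" corresp="#en2.04">Et nous rentrons gaiement dans le chemin bourbeux,</l>
<l xml:id="fr2.04" corresp="#en2.03">Croyant par de vils pleurs laver toutes nos taches.</l>
</lg>
<lg type="stanza" xml:lang="en">
<l xml:id="en2.03" corresp="#fr2.04">Trusting our tears will wash away the sentence,</l>
<l xml:id="en2.04" corresp="#fr2.03">We sneak off where the muddy road entices.</l>
</lg>
</div>"""


def paired_lines(root, lang):
    """Return (line, counterpart) text pairs for each <l> in stanzas of `lang`."""
    # Index every line by its xml:id so a pointer like "#en2.04" can be looked up.
    by_id = {el.get(XML_ID): el for el in root.iter("l")}
    pairs = []
    for lg in root.iter("lg"):
        if lg.get(XML_LANG) != lang:
            continue
        for line in lg.iter("l"):
            # Assumes same-document pointers of the form "#some-id".
            target = by_id.get(line.get("corresp", "").lstrip("#"))
            pairs.append((line.text, target.text if target is not None else None))
    return pairs


root = ET.fromstring(SAMPLE)
pairs = paired_lines(root, "fr")
for fr, en in pairs:
    print(fr, "|", en)
```

Note that the third French line pairs with the fourth English line, because the translator swapped the order.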

More Complex Parallelism

It’s also possible to link multiple elements together. The example below maps each French line to the corresponding line in several English versions.

<linkGrp type="alignment">
<link targets="#fr2.01 #en-a2.01 #en-b2.01 #en-c2.01 #en-d2.01"/>
<link targets="#fr2.02 #en-a2.02 #en-b2.02 #en-c2.02 #en-d2.02"/>
<link targets="#fr2.03 #en-a2.03 #en-b2.03 #en-c2.04 #en-d2.03"/>
<link targets="#fr2.04 #en-a2.04 #en-b2.04 #en-c2.03 #en-d2.04"/>
</linkGrp>

So essentially you’re aligning parallel versions of a translation with one another. The lines of the poem are given ids so that everything matches up. If you want, you can also point to an element in another file rather than to an id within the same document.
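
As a rough illustration of what software can do with a linkGrp, the Python sketch below turns each link into a lookup from the French line to its English counterparts. It assumes the first pointer in each link is the source line and that all pointers stay within the same document; alignment_table is a name I made up.

```python
import xml.etree.ElementTree as ET

# Two of the links from the example above.
LINKS = """<linkGrp type="alignment">
<link targets="#fr2.03 #en-a2.03 #en-b2.03 #en-c2.04 #en-d2.03"/>
<link targets="#fr2.04 #en-a2.04 #en-b2.04 #en-c2.03 #en-d2.04"/>
</linkGrp>"""


def alignment_table(link_grp):
    """Map the first id in each <link> to the list of ids aligned with it."""
    table = {}
    for link in link_grp.iter("link"):
        # Pointers are whitespace-separated; drop the leading "#" from each.
        ids = [pointer.lstrip("#") for pointer in link.get("targets", "").split()]
        if ids:
            table[ids[0]] = ids[1:]
    return table


table = alignment_table(ET.fromstring(LINKS))
for source, translations in table.items():
    print(source, "->", ", ".join(translations))
```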

Choice Elements

The choice element lets you record alternative encodings of the same text, such as different spellings, so that a reader can choose between them.

<p>…with them, bycause they woulde
<lb/>not be
<choice>
<abbr>bo?de</abbr>
<expan>bounde</expan>
</choice>
also for an other wo
<lb/>m? at theyr pleasure, whom they
<lb/>knewe not, nor yet what matter</p>

So the markup can hold both the original reading and an alternative version of it. In the example above, the abbreviation as written (abbr) is paired with its expansion (expan); spelling mistakes can be handled the same way by pairing the original (sic) with a correction (corr). You can then display whichever version you want.
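
Here is a small Python sketch of how a display layer might pick one side of each choice. The sample sentence is made up for illustration (it is not from the talk), and the function name reading is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, not taken from the original manuscript.
SAMPLE = """<p>Recd of <choice><abbr>Wm</abbr><expan>William</expan></choice> Brown
<lb/>ten barrels</p>"""


def _flatten(elem, prefer):
    """Collect the text of elem, keeping only the `prefer` child of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        # Inside a <choice>, skip every alternative except the preferred one.
        if not (elem.tag == "choice" and child.tag != prefer):
            parts.append(_flatten(child, prefer))
        # Text that follows the child always belongs to the parent.
        parts.append(child.tail or "")
    return "".join(parts)


def reading(elem, prefer):
    """Return one reading of elem as a single whitespace-normalized string."""
    return " ".join(_flatten(elem, prefer).split())


p = ET.fromstring(SAMPLE)
print(reading(p, "abbr"))
print(reading(p, "expan"))
```

The same function would work for sic/corr pairs by passing "sic" or "corr" as the preferred tag.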

Revision Processes/Editing Processes

It’s possible to look at revised manuscripts and step through the revision process. If you can determine what was added and what was deleted, you can include that in your markup:

<lg>
<head>After <del>an</del><add>the <del>unsolv’d</del></add> argument</head>
<l><del>The</del><add><del>Coming in,</del> A group of</add> little children, and their
<lb/>ways and chatter, flow in <del>upon me</del></l>
<l>Like <add>welcome</add> rippling water o’er my
<lb/>heated <add>nerves and</add> flesh.</l>
</lg>
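
To make the idea of stepping through revisions concrete, here is a minimal Python sketch that reads the head of the example in two states: before any additions, and after all deletions. It is a simplification of my own (the function names are hypothetical), and it cannot recover intermediate states such as an addition that was itself later deleted.

```python
import xml.etree.ElementTree as ET

# The <head> line from the example above.
HEAD = "<head>After <del>an</del><add>the <del>unsolv’d</del></add> argument</head>"


def _flatten(elem, drop):
    """Collect the text of elem, omitting every element named `drop`."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != drop:
            parts.append(_flatten(child, drop))
        # Tail text sits outside the child, so it is kept either way.
        parts.append(child.tail or "")
    return "".join(parts)


def state(elem, drop):
    """Return the text with `drop` elements removed and whitespace normalized."""
    return " ".join(_flatten(elem, drop).split())


head = ET.fromstring(HEAD)
first_draft = state(head, "add")   # ignore additions: the earliest wording
final_text = state(head, "del")    # ignore deletions: the last wording
print(first_draft)
print(final_text)
```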

There is also a more complex version of this that lets you record uncertainty about the text itself when the original is hard to read. In the example below, the encoder marks readings they were unsure of (unclear), text they supplied themselves (supplied), and a place where text is missing (gap).

<p>Johnston etc 1764 Mr Nikl<unclear>e</unclear>
<supplied>s</supplied><gap reason="folded" extent="unknown"/> Brown
<unclear>&amp;Co</unclear> to me George <unclear>Beverly juner</unclear>
to ten Rum Barels at Four pound &per; Barel — — — £40</p>

Slides For this Talk

The slides for this talk can be found on Brown’s Website

First Generation of Civil War Letters Project

Hamilton encoded administrative documents and letters from the Civil War. They used TEI for the text and Dublin Core for the metadata (Dublin Core was used for historical reasons).
They then created a PHP interface. The Dublin Core description serves as the “abstract” for the record, and the text appears when you view the “full record”. The pages themselves were transformed into HTML using XSLT and then styled with CSS.

They created drop-down boxes for personal names, organizations, place names, and geographic features.
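
The talk did not show how the drop-down values were gathered, so the following Python sketch is only a guess at the general idea. It assumes the names were tagged with the standard TEI elements persName, orgName, placeName, and geogName; the sample letter and the function name dropdown_values are invented for illustration, and the real project used PHP rather than Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical letter fragment; names and places are made up.
LETTER = """<p><persName>John Smith</persName> wrote from
<placeName>Clinton</placeName> to the <orgName>Sanitary Commission</orgName>
after crossing the <geogName>Potomac River</geogName>, where
<persName>John Smith</persName> had camped.</p>"""

NAME_TAGS = ("persName", "orgName", "placeName", "geogName")


def dropdown_values(root):
    """Return a sorted list of distinct names for each kind of name element."""
    values = {}
    for tag in NAME_TAGS:
        names = set()
        for el in root.iter(tag):
            # Join nested text and collapse whitespace before de-duplicating.
            name = " ".join("".join(el.itertext()).split())
            if name:
                names.add(name)
        values[tag] = sorted(names)
    return values


values = dropdown_values(ET.fromstring(LETTER))
for tag, names in values.items():
    print(tag, names)
```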

They also included images of the pages. A link to the image of the actual document was encoded in the TEI. The TEI element you might use for this is the page break (pb); below is an example:

<pb facs="./page/image/here.jpg" n="43"/>
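
As a sketch of how a viewer could find those images, the Python below collects the page number and image path from every pb element. The second page break and the function name page_images are hypothetical additions for the sake of the example.

```python
import xml.etree.ElementTree as ET

# The first <pb> is the example above; the second is made up.
SAMPLE = """<div>
<pb facs="./page/image/here.jpg" n="43"/>
<p>Text of page 43.</p>
<pb facs="./page/image/next.jpg" n="44"/>
<p>Text of page 44.</p>
</div>"""


def page_images(root):
    """Return (page number, image path) for each <pb> that has a facs attribute."""
    return [(pb.get("n"), pb.get("facs")) for pb in root.iter("pb") if pb.get("facs")]


images = page_images(ET.fromstring(SAMPLE))
for number, path in images:
    print("page", number, "->", path)
```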

Second Generation of Civil War Letters Project

These letters were encoded and then loaded into an XML database called eXist-db. Because the database is already indexed, it is easier to find data.

The Dublin Core records were replaced by ContentDM. So the records are in ContentDM and the search interface is in PHP, again with drop-down boxes populated from eXist-db. The drop-down boxes link to canned searches in ContentDM, and the “search” then takes you to records in ContentDM.

If you want to view the full text of a letter, a link in the ContentDM record takes you outside ContentDM to a web page. That page is the TEI transformed into HTML by XSLT and styled with CSS. Links in the TEI point to the digital images of the original text.
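
The project did this step with XSLT, which I don't have to hand. As a stand-in, here is a Python sketch of the same kind of mapping, turning a TEI stanza into HTML that CSS can then style. The choice of div and p elements, the class names, and the function name stanza_to_html are all my own assumptions.

```python
import xml.etree.ElementTree as ET

# A short TEI stanza with no namespace declared (a simplification).
TEI = """<lg type="stanza">
<l>Our sins are stubborn, craven our repentance.</l>
<l>For our weak vows we ask excessive prices.</l>
</lg>"""


def stanza_to_html(lg):
    """Render an <lg> as an HTML <div>, with one <p class="line"> per <l>."""
    # Reuse the TEI type attribute as the CSS class of the wrapper.
    div = ET.Element("div", {"class": lg.get("type", "lg")})
    for line in lg.iter("l"):
        p = ET.SubElement(div, "p", {"class": "line"})
        p.text = "".join(line.itertext())
    return ET.tostring(div, encoding="unicode", method="html")


html = stanza_to_html(ET.fromstring(TEI))
print(html)
```

A stylesheet could then target div.stanza and p.line to control the layout.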

Below is a link to the current digital collection:

http://elib.hamilton.edu/hc/hcbrowse.php?id=col_spe-civ