INTRODUCTION TO XML BY EXAMPLE

 

NOTE:

This is an unpublished draft article used ONLY for EDUCATIONAL purposes in San Jose State University CS160 by Richard Sinn. DO NOT reproduce.

 

INTRODUCTION

While humans communicate verbally, companies communicate with documents. In a traditional setting, companies use application forms, memos, accounts receivable documents, purchase orders, etc., to communicate within the organization or with other companies. In today’s e-Business world, most companies have web sites, and they use HTML documents as the main vehicle for communicating with their customers and business partners.

Documents are traditionally in a human-readable or "what you see is what you get" format whose main purpose is to enable communication between humans. One of the most popular document editing tools is Microsoft Word, an editor for formatted documents. While formatted documents look and print extremely well on paper, they are often not suitable for web publishing. Most web documents, such as HTML pages, are in plain ASCII text format. For comparison, [Figure word_Profile00] shows a document named Profile00 in Word format and [Figure xml_Profile00] shows it in XML format.

 

IN NEED OF STRUCTURED DOCUMENTS - XML

E-Business needs a form of document that both humans and computers can understand easily. In addition, the document must contain enough added information (metadata) to convey its underlying structure as well as the meaning of its data. XML sets out to do just that. In general, XML separates a document into the following three parts:

Structure

Structure is the document type and the organization of its elements. Examples of document types are memos, application forms, and resumes. A set of rules enforces what kinds of elements a document can contain, in what order they can occur, and what additional attributes the elements are allowed to have.

Presentation

Presentation is the way information is presented to the reader, whether on the web, on a piece of paper, or via voice synthesis. Whether a block of text is in bold or italic, which fonts to use, and so on are specified in this part.

Data Content

The actual informational data contained in a document.

As we go along in the article, we will present an example to show how XML divides up Structure, Presentation and Data Content. Before that, let’s take a look at a brief history of XML as well as some well-known applications today.

 

HISTORY AND APPLICATIONS OF XML

The idea behind XML has been in development since the 1960s through its parent, SGML. SGML became an international standard in 1986 as the way to publish structured documents. Although SGML contains many useful concepts and can support complex publishing, it found little application outside high-end systems run by corporations with deep pockets, largely because the technology was difficult to use. In the mid-1990s an SGML application called HTML emerged as the main method for publishing electronic documents on a large scale on the World Wide Web. In 1996 one of the working groups in the World Wide Web Consortium started developing XML as a streamlined version of SGML. In a way, XML is a "cleaned-up" SGML: it retains SGML’s very powerful concept of structure, removes the portions with high complexity and limited application, and is designed for transmitting structured data over the web.

To provide better customer service, most financial institutions offer online banking. Users can purchase goods online with their credit cards, download their credit card statements, and pay their bills without any concern about how data are represented in different financial transactions at different institutions. Online banking tasks such as these can easily be done with an application such as Microsoft Money. In fact, Microsoft Money is an OFX-compliant application. OFX stands for Open Financial Exchange, which is an XML application developed jointly by Microsoft, Intuit, and Checkfree. (For more information, please go to http://www.ofx.net/ofx/.) An OFX transaction in XML might look like the following:

<RequestStatement>
  <BankAccount>
    <BankID>888</BankID>
    <AccountID>9394</AccountID>
    <AccountType>CHECKING</AccountType>
  </BankAccount>
</RequestStatement>

Another common XML application initiative is CDF, the Channel Definition Format. It is an XML application that enables timely delivery of business-critical information. Users of "Channels" locate and register for channel information of interest to them and their businesses. After registration, any changes to the selected information appear automatically, so users do not have to revisit the site and download the information again. CDF is heavily used in Windows’ Active Channel and Active Desktop as well as in Microsoft Software Update.

 

WHERE TO START

There is a lot of diverse XML information on the web, so I will recommend two of the most popular places for starters. IBM developerWorks at http://www.ibm.com/developer/ is a good neutral source of information on a variety of the latest technologies, including XML. Under its XML Zone, articles on different XML topics, XML development tools, and sample code are available for download.

The other popular place for learning XML is the Microsoft XML Developer Center at http://www.microsoft.com/xml/. This site has a comprehensive list of tools and sample code to download (although most of them are Windows only). In addition, it offers one of the best free XML tutorials online.

 

XML EXAMPLE

 

[Figure bigPic] shows the big picture of how XML data is developed and processed. First, we have the XML document, which contains special XML character sequences known as markup along with the content character data. Second, an XML document can optionally be associated with a set of rules known as a Document Type Definition (DTD). The DTD specifies rules such as the ordering of elements, default values, etc. The third component is the XML parser, which checks the XML document against the DTD and then splits the document into markup regions and character data regions. After processing by the XML parser, the data is in a well-formed, structured format and can be easily processed by any XML application.
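
To make the parser's role more concrete, here is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree module. The SAMPLE document and the function name split_markup_and_data are hypothetical and chosen only for illustration. Note that this parser checks well-formedness only; it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened profile document used only for this sketch.
SAMPLE = """<profile>
  <owner type="STUDENT" age="20">
    <Name><FirstName>Richard</FirstName><LastName>Sinn</LastName></Name>
  </owner>
</profile>"""


def split_markup_and_data(xml_text):
    """Return (tag, text) pairs for every element that carries character data."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well formed
    pairs = []
    for element in root.iter():
        # element.tag comes from the markup; element.text is the character data.
        text = (element.text or "").strip()
        if text:
            pairs.append((element.tag, text))
    return pairs


pairs = split_markup_and_data(SAMPLE)
print(pairs)  # [('FirstName', 'Richard'), ('LastName', 'Sinn')]
```

Elements that only contain other elements (profile, owner, Name) are skipped here because their own text is just whitespace.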

Let’s make a personal profile as our first XML example. XML documents can be edited with any text editor, such as Notepad in Windows or vi in UNIX. However, if you are using a plain text editor, you will have to type in all the tags manually. There are some XML-specific editors, such as XML Notepad [Figure xml_notepad], that save you from typing all the tags by hand. However, most of these editors today are not as functionally rich as their word processor counterparts.

<?xml version="1.0"?>
<!DOCTYPE profile SYSTEM "profile.dtd">
<profile>
  <owner type = "STUDENT" age = "20">
    <Name>
      <FirstName>Richard</FirstName>
      <MiddleName init = "P">Pong Nam</MiddleName>
      <LastName>Sinn</LastName>
    </Name>
    <Phone>
      <Home>(000)000-0000</Home>
      <Work>(000)000-0000</Work>
      <Fax/>
      <Pager/>
      <Cell/>
    </Phone>
    <Address type = "HOUSE">
      <StreetAddr>555 Bailey Avenue</StreetAddr>
      <City>San Jose</City>
      <State>Ca</State>
      <ZipCode>95141</ZipCode>
    </Address>
    <Email>
      <ul>
        <li>sinn@us.ibm.com</li>
        <li>sinn@mathcs.sjsu.edu</li>
        <li>webmaster@openloop.com</li>
      </ul>
    </Email>
    <Education>
      <Institution>
        <GraduationDate>1998</GraduationDate>
        <schoolName>University of Minnesota-Twin Cities</schoolName>
        <degree type = "MS" major = "CS" gpa = "3.97"/>
      </Institution>
      <Institution>
        <GraduationDate>1994</GraduationDate>
        <schoolName>University of Wisconsin-Madison</schoolName>
        <degree type = "BS" major = "CS" gpa = "3.90"/>
      </Institution>
    </Education>
    <techSkills>
      <Languages>Java</Languages>
      <Languages>C++</Languages>
      <Languages>C</Languages>
      <Languages>JavaScript</Languages>
      <Languages>XML</Languages>
      <Languages>HTML</Languages>
      <Languages>SQL</Languages>
      <System>Windows</System>
    </techSkills>
  </owner>
</profile>

In our example, the XML declaration <?xml version="1.0"?> indicates to the parser that we are using standard XML version 1.0. The second line, the document type declaration, indicates that we are using profile.dtd as our Document Type Definition. Thus, the current XML document will be checked against the rules stated in profile.dtd.

The core of our example shows the use of start- and end-tags to contain content data. Every valid XML document must also be well formed. Among other things, this means every start-tag must be matched by an end-tag. (For example, <profile> must eventually be matched by </profile>. An empty element such as <Fax/> combines both tags in one.) In the following chunk of code,

<Address type = "HOUSE">
  <StreetAddr>555 Bailey Avenue</StreetAddr>
  <City>San Jose</City>
  <State>Ca</State>
  <ZipCode>95141</ZipCode>
</Address>

 

the element Address has an attribute called type, which is set to the value "HOUSE". The element Address also contains four subelements in this order: StreetAddr, City, State, ZipCode.
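
The attribute and the subelement order of the Address element can also be read back by a program. The following is a small sketch, assuming Python's standard-library xml.etree.ElementTree module; the variable names are my own and purely illustrative.

```python
import xml.etree.ElementTree as ET

# The Address fragment from the profile example.
ADDRESS = """<Address type = "HOUSE">
<StreetAddr>555 Bailey Avenue</StreetAddr>
<City>San Jose</City>
<State>Ca</State>
<ZipCode>95141</ZipCode>
</Address>"""

address = ET.fromstring(ADDRESS)

# Attributes are looked up by name on the element.
address_type = address.get("type")

# Iterating over an element yields its subelements in document order.
child_tags = [child.tag for child in address]

print(address_type)  # HOUSE
print(child_tags)    # ['StreetAddr', 'City', 'State', 'ZipCode']
```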

 

DOCUMENT TYPE DEFINITION (DTD) EXAMPLE

 

A DTD is used to make XML authors follow certain rules when writing their XML documents. The following is the profile DTD used in our example.

<!-- Document Type Definition for the Profile Application -->

<!-- A profile document contains one or more owners -->
<!ELEMENT profile (owner)+>

<!-- An owner contains these six sections in this sequence -->
<!ELEMENT owner (Name, Phone, Address, Email, Education, techSkills)>

<!-- Every owner is either a STUDENT or a PROFESSIONAL.
     This is indicated by its type attribute.
     If a value is not supplied for this attribute,
     it defaults to STUDENT. -->
<!ATTLIST owner type (STUDENT|PROFESSIONAL) "STUDENT">

<!-- Every owner must also have an age attribute. -->
<!ATTLIST owner age CDATA #REQUIRED>

<!ELEMENT FirstName ANY>
<!ELEMENT LastName ANY>
<!ELEMENT Name (FirstName, MiddleName, LastName)>
<!ELEMENT MiddleName ANY>
<!ATTLIST MiddleName init (A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z) #IMPLIED>

<!ELEMENT Home ANY>
<!ELEMENT Work ANY>
<!ELEMENT Fax ANY>
<!ELEMENT Pager ANY>
<!ELEMENT Cell ANY>
<!ELEMENT Phone (Home, Work, Fax, Pager, Cell)>

<!ELEMENT StreetAddr ANY>
<!ELEMENT City ANY>
<!ELEMENT State ANY>
<!ELEMENT ZipCode ANY>
<!ELEMENT Address (StreetAddr, City, State, ZipCode)>
<!ATTLIST Address type (HOUSE|APT) "APT">

<!ELEMENT Email (ul)+>
<!ELEMENT li ANY>
<!ELEMENT ul (li)+>

<!ELEMENT Education (Institution)+>
<!ELEMENT GraduationDate ANY>
<!ELEMENT schoolName ANY>
<!ELEMENT degree ANY>
<!ELEMENT Institution (GraduationDate, schoolName, degree)>
<!ATTLIST degree
    type (BS|MS|PhD) "BS"
    major (CS|Math|Other) "CS"
    gpa CDATA #REQUIRED>

<!ELEMENT System ANY>
<!ELEMENT Languages ANY>
<!ELEMENT techSkills (System|Languages)+>

Most of the rules are documented by the comments above. Let’s take a closer look at the rules regarding Address.

<!ELEMENT Address (StreetAddr, City, State, ZipCode)>

<!ATTLIST Address type (HOUSE|APT) "APT">

The first line states that an element of type Address contains exactly four subelements in a fixed order: StreetAddr, then City, then State, and finally ZipCode. The second line states that an Address element has an attribute called type whose value must be either HOUSE or APT. If the attribute is omitted, its default value is APT.
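
To see what these two rules mean in practice, the sketch below restates them by hand in Python. This is only an illustration under stated assumptions: Python's standard-library xml.etree.ElementTree does not validate against a DTD, so the constants and the function check_address are hypothetical stand-ins for what a validating parser would do automatically.

```python
import xml.etree.ElementTree as ET

# These constants restate, by hand, the two DTD lines for Address.
REQUIRED_CHILDREN = ["StreetAddr", "City", "State", "ZipCode"]
ALLOWED_TYPES = {"HOUSE", "APT"}
DEFAULT_TYPE = "APT"


def check_address(xml_text):
    """Return (effective_type, problems) for a single Address element."""
    address = ET.fromstring(xml_text)
    problems = []

    # Rule 1: exactly these four subelements, in this order.
    children = [child.tag for child in address]
    if children != REQUIRED_CHILDREN:
        problems.append("unexpected children: " + ", ".join(children))

    # Rule 2: type must be HOUSE or APT; APT is assumed when it is omitted.
    effective_type = address.get("type", DEFAULT_TYPE)
    if effective_type not in ALLOWED_TYPES:
        problems.append("type must be HOUSE or APT, got " + effective_type)

    return effective_type, problems


GOOD = ("<Address><StreetAddr>555 Bailey Avenue</StreetAddr><City>San Jose</City>"
        "<State>Ca</State><ZipCode>95141</ZipCode></Address>")
BAD = ("<Address type='CONDO'><City>San Jose</City>"
       "<StreetAddr>555 Bailey Avenue</StreetAddr></Address>")

print(check_address(GOOD))  # ('APT', [])
print(check_address(BAD))   # reports two problems
```

The first call shows the default value at work: the type attribute is missing, so APT is used. The second call breaks both rules at once.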

 

CHECKING YOUR XML DOCUMENT

Many different parsers are available. If you visit www.ibm.com/developer/xml, there are more than ten free parsers available for download. In this article, I will introduce a Microsoft command line validation tool called XMLINT.EXE. It is an updated version of the XMLINT command line tool that shipped in the IE4 SDK. The tool checks whether a given XML file is well formed. It also uses the XML DOM to check that the document is valid according to its DTD.

 

[Figure xmlint_error] shows two error messages, one caused by a missing MiddleName element and one by a forgotten </Name> tag. If you correct all the errors, the xmlint parser will not show any message (so, no news is good news).
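
If XMLINT is not at hand, the well-formedness half of this check can be approximated with a few lines of Python. This is a sketch, assuming the standard-library xml.etree.ElementTree module; the function is_well_formed and the two sample strings are hypothetical. It catches a forgotten </Name> tag, but it would not report a missing MiddleName element, because that is a validity error that only a DTD-aware parser can detect.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, "") if xml_text parses, else (False, parser message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, ""


# The </Name> end-tag is missing, so </owner> does not match the open tag.
BROKEN = "<profile><owner><Name><FirstName>Richard</FirstName></owner></profile>"
FIXED = "<profile><owner><Name><FirstName>Richard</FirstName></Name></owner></profile>"

print(is_well_formed(BROKEN))  # (False, <message from the parser>)
print(is_well_formed(FIXED))   # (True, '')
```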

 

VIEW YOUR XML DOCUMENT

A graphical user interface is probably the most convenient way to view an XML document. Before the release of the Microsoft IE 5.0 web browser, the only "web way" of viewing an XML document was through a Java applet such as the one in [Figure xml_appletview]. With IE 5.0, you can view your XML document natively in the browser [Figure vxml1]. Clicking a "+" sign expands that section of the XML document to show its details [Figure vxml2].

 

CONCLUSION

The Information Technology industry is full of buzzword technologies such as groupware, directory systems, the Internet, intranets and extranets. Most of these technologies have been hyped to death with very little thought given to what the Internet is designed for: improving the sharing of information and resources. XML can help you organize your information and resources better. It will be the future of e-Business communication and web publishing. I hope this article has given you a quick introduction to XML, shown you where to download useful XML-related tools, and given you a jump-start on learning XML.

 

AUTHOR

Richard Sinn is a Staff Software Engineer at the IBM Santa Teresa Laboratory in San Jose, California. He is also a lecturer at San Jose State University and a freelance writer for different magazines and journals. He can be reached via e-mail at webmaster@openloop.com or at his Web site at http://www.openloop.com.