Using Just Word 2003 (with a little help from Notepad) to develop XML and XSL documents

Word 2003 is not designed as an XML development environment. Somehow, XML schemas are supposed to sprout from the forehead of Minerva, whenever she has a headache. Apparently, she gets these headaches whenever you present her with a Word XML file. You can then use the XML schema to validate your XML data. Those of us mere mortals who do not have a direct pipeline to Minerva have to use other means. Many use Visual Studio, and there are certainly myriad other tools out there to help you create the XML schema for your XML data. However, if all you have on your computer is Word 2003, and your IT department will not let you install any other application but will allow you a couple of Word 2003 add-ins and tools, then you can still build your XML schema, your XML forms and aggregated file, and the XSL transforms that create pretty, formatted views of your data, sufficient for your needs.

 Resources that you will need in addition to Word 2003

 Notepad (comes with OS.)

 Microsoft Office Word 2003 XML SDK (http://www.microsoft.com/downloads/details.aspx?FamilyID=ca83cb4f-8dee-41a3-9c25-dd889aea781c&DisplayLang=en )

 Microsoft Office 2003 WordprocessingML Transform Inference Tool (http://www.microsoft.com/downloads/details.aspx?FamilyID=2cb5b04e-61d9-4f16-9b18-223ec626080e&DisplayLang=en )

Microsoft Office 2003 Smart Document SDK (http://www.microsoft.com/downloads/details.aspx?FamilyID=24a557f7-eb06-4a2c-8f6c-2767b174126f&DisplayLang=en )

Mary Chipman’s Article: Using Schemas with Word 2003 and Excel 2003

(http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dno2k3ta/html/odc_OF_WordXMSchemas.asp )

Bill Coan's Sample XML Project

(http://www.wordsite.com/downloads/xmlproject.htm )

The DishList62 XML/XSL Sample files (http://common.mvps.org/obts/DishList62-XML-XSL.zip )

The Smart Document SDK is, definitely, an advanced resource, and is only of interest when you begin thinking about enterprise-wide deployment.

 

Getting Started – Lay out the repeating portion of the XML File and check its XML syntax

 

Start your first cut at your XML file in Notepad. Do one complete instance of your second-level element. In my dishlist62.xml file, dishlist is my document element and dishInstance is my second-level element. This will establish what you need in your initial schema file. Start the schema file in Notepad also and do your first cut of what you think it should be. Now, for the first check-out step, select your XML file in your file explorer, right-click it, choose Open with, and then select Word. More than likely you will have an XML syntax error and you will see a common Word error message. For this part of the paper, I took my dishlist62.xml file, edited it down to one dishInstance element, and inserted a local namespace prefix (ujw3:) on all my local elements. When I opened this XML file in Word, I got the error message that is shown in Figure 1. This message tells me the line and character position at which the first syntax error occurred in the file. I can then use that information to correct the syntax and load it into Word again, to pick up the next error. It is very much like the old assembler packages that would only give out the first error they found when assembling the module. I used to hate that one-error-message-at-a-time business. However, when assemblers got more sophisticated, and put out all the error messages that they found while reading a file, I soon learned that the first error generated a cascade of errors for later lines, and that one should only pay attention to the first or second error. So, primitive as it may seem, one error at a time is probably the best you can do with a syntax checker.

 

Figure 1 picture is located at:  http://msmvps.com/obts/gallery/image/522.aspx  

Figure 1: Syntax error in XML file as reported by Word 2003

 

The text at line 15 was the following:

   Step1: _The Crepes:  Start with a large bowl, put in 8 cups of water, heat to slightly above luke warm, add 1 tablespoon of salt, stir in 6 cups of low-gluten flour, add 1 heaping tablespoon of active dry yeast, cover, and place on an even larger plate or bowl in an out of the way corner of the kitchen.  Let the mix rise and fall back--will take 3 to four hours, stir with a whip every hour or so. (Now, if you did indeed add the salt, you can get away with just two inches of free-space between the top of the mix and the top of the bowl.  If you forgot the salt, you need twice the depth of the liquid, as it will double in volume before falling back.  That's why you place it on an even larger bowl, so that when you do forget, you don't have to clean the resulting mess off the counter top, the cabinet fronts, and the floor.)  Put in refrigerator.

 

Column 8 is the position of the colon after ujw3. That was a real puzzler, because the real error was at the end of the line, way out of sight on the right. I had placed the prefix alias between the <-char and the /-char, which is an illegal way to construct an end tag for an element. The lesson to be learned from this example is that syntax error checking is tricky and you are not necessarily looking for the obvious. However, when you get it right, and only then, Word will load the XML file.
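Stopping at the first error is not peculiar to Word; most XML parsers behave the same way. If you ever do get access to a scripting language (which is outside the Word-and-Notepad premise of this article), a minimal sketch of the same check in Python might look like the following. The sample markup, the urn:example:dishlist namespace string, and the function name first_syntax_error are all made up for this illustration, and the line and column that Python's expat parser reports need not match what Word reports for the same file.

```python
import xml.parsers.expat

# A tiny, hypothetical stand-in for the one-instance test file.
GOOD = """<ujw3:dishlist xmlns:ujw3="urn:example:dishlist">
  <ujw3:Step1>Put in refrigerator.</ujw3:Step1>
</ujw3:dishlist>"""

# Same document, but with the prefix placed between "<" and "/" in the end tag.
BAD = GOOD.replace("</ujw3:Step1>", "<ujw3:/Step1>")


def first_syntax_error(xml_text):
    """Return (line, column, message) for the first well-formedness error, or None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        # lineno is 1-based; offset is the 0-based column within that line.
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None


print(first_syntax_error(GOOD))  # no error found in the well-formed version
print(first_syntax_error(BAD))   # only the first error is reported
```

As with Word, you fix the reported error, run the check again, and repeat until nothing is reported.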

 

You can do a similar syntax check with the schema file you created. The schema file is, by definition, an XML file. Open it in Notepad, and save it with an XML extension. Now, open that file with Word, get the details of the syntax error that Word reports, close the file, correct the xsd schema file, overwrite the xml version, and open the xml version in Word again. Do this till Word opens it and shows the tags. Now you are ready to load the xsd file into the Schema Library to use it as the schema for your XML file.

 

 

Attaching the Schema to the XML file

 

Open the XML file in Word. In the right-hand pane, Word will be showing the Document pane. You want the XML Structure pane: drop the pane selector list and choose XML Structure. (If no pane was showing on the right, hit Ctrl-F1 first.) Now, drop the XML Toolbox function selector, and choose the XML Schema dialog function. It will show a box for the schema of your XML target namespace as being unavailable. Click the Add Schema button. This lets you browse to the directory containing your schema file. Select it and OK the selection, and it then asks you for an Alias for your Schema’s namespace. (If your schema file does not define a targetNamespace attribute, then it will first ask you to input that string.) The alias you choose will only be used in the Schema Library’s dialog windows, so you just need to choose something short that lets you recognize it when you have to come back to operate on it in the future. When you OK the alias, Word then tries to establish the xsd file as a schema in the library. It first runs a syntax check, sniffs out all the nooks and crannies in your schema, and barfs up to you the first error it finds. At that point the attempt fails and you are out, of course. Get the details of that error, go back and correct the schema, and try again, and again, and again.

 

Figure 2 shows the error that popped when I tried to load the schema file as a schema in the Schema Library. To create this schema, I had taken the working schema file from the DishList62 sample, and inserted the target namespace prefix that was absent in the sample’s schema. I did this because when you take a beginning XML course, it instructs you to use xs: or xsd: as the prefix for the elements that are defined in the Schema Standards Definition Document, and a prefix of your selection for the elements that are defined in your target namespace. That is practically the first thing a beginning XML course will tell you when it is showing you how to construct an XML file and how to put your stuff into it. Basic stuff! But when you try to do that with Word 2003, it gives you Figure 2.

 

The Figure 2 picture is located at: http://msmvps.com/obts/gallery/image/523.aspx 

Figure 2.  Syntax error in Schema caused by use of a target namespace prefix, as reported by Word 2003

 

The text of the error message is saying, in a completely unintelligible (to me) way, that it will not accept an alias prefix to elements defined in your target namespace.  After taking my confusion to the newsgroups, I was directed to Bill Coan’s XML project sample.  In it, he uses the blank target namespace prefix.  It was only after following his example that I was able to get Word’s Schema Library, document panel, and structure panel to work properly.  Depending on what you have in your schema file, you might get other errors.  But, ultimately, if you have a non-blank target namespace prefix, you will come down to this one, which you will not be able to correct.  So be advised.

 

When I had created this new schema file, with the target namespace prefix, I could not decide if, in the schema declaration, targetNamespace was a name in the Schema standards document or something to be defined in my target namespace. So I tried adding my prefix to it, and got the error shown in Figure 3. So leave that particular feature alone, also.

 

The Figure 3 picture is located at: http://msmvps.com/obts/gallery/image/524.aspx 

Figure 3.  Error message caused by adding a target namespace prefix to the targetNamespace attribute in the schema declaration.
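To make the blank-prefix pattern concrete, here is a minimal sketch, written as Python strings so that the pairing can be checked mechanically. It is not copied from Bill Coan's project or from the DishList62 files: the urn:example:dishlist namespace string, the simplified element types, and the elementFormDefault setting are assumptions made for this illustration. The points it tries to show are that the schema's own elements carry the xs: prefix, that the target namespace is declared as the default (unprefixed) namespace, that the targetNamespace attribute itself carries no prefix, and that the XML file uses the very same string.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: target namespace declared with a blank prefix.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="urn:example:dishlist"
           targetNamespace="urn:example:dishlist"
           elementFormDefault="qualified">
  <xs:element name="dishlist">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="PageTitle" type="xs:string"/>
        <xs:element name="dishInstance" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Hypothetical XML file: same namespace string, again with no prefix.
INSTANCE = """<dishlist xmlns="urn:example:dishlist">
  <PageTitle>Dish List</PageTitle>
  <dishInstance>Western-Style Crepes</dishInstance>
</dishlist>"""


def schema_target_namespace(xsd_text):
    """Read the (unprefixed) targetNamespace attribute from the schema root."""
    return ET.fromstring(xsd_text).get("targetNamespace")


def document_namespace(xml_text):
    """Return the namespace of the document element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag  # "{uri}dishlist" or plain "dishlist"
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else None


print(schema_target_namespace(SCHEMA) == document_namespace(INSTANCE))
```

If the two strings differ by even one character, the XML file and the schema refer to different namespaces.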

 

Word 2003 can Create the Schema from your XML file

 

While you are building your schema file in Notepad, loading it into Word to perform the syntax checks, then loading it into the Schema Library to see if it will work, and all the while popping the XML Toolbox function selector down to select the XML Schema dialog or the Schema Library dialog, it eventually dawns on you that there is another function in that list that might be of interest. This is the Generate Inferred Schema function. Why are you doing all this inefficient, frustrating syntax checking when apparently it could all be done for you by Word 2003? The answer is that, while it can create the schema, you really do not want to use a schema generated by Word 2003. That same introductory XML course will tell you how to write global, named type constructs that greatly simplify a schema document and improve its readability. The inferred schema does not use global named type constructs. Instead, you will find a graceful snake of indented elements coiling down the page as each “re-use” of a type generates a new anonymous type statement. Gross!
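The difference between the two styles is easier to see side by side. The sketch below holds two small schema fragments as Python strings and counts named versus anonymous complex types in each. Both fragments are hand-written for this illustration: the element and type names (ingredient, condiment, ItemType) are hypothetical, and the second fragment only imitates the nested style described above; it is not actual output from Word.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# One global, named type, referenced twice.
HAND_WRITTEN = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="ItemType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
      <xs:element name="amount" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="ingredient" type="ItemType"/>
  <xs:element name="condiment" type="ItemType"/>
</xs:schema>"""

# The same structure spelled out as an anonymous type at every use.
INFERRED_STYLE = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ingredient">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="amount" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
  <xs:element name="condiment">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="amount" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def count_complex_types(xsd_text):
    """Return (named, anonymous) counts of xs:complexType definitions."""
    named = anonymous = 0
    for ctype in ET.fromstring(xsd_text).iter(XS + "complexType"):
        if ctype.get("name"):
            named += 1
        else:
            anonymous += 1
    return named, anonymous


print(count_complex_types(HAND_WRITTEN))    # (1, 0)
print(count_complex_types(INFERRED_STYLE))  # (0, 2)
```

With two elements the saving is small; with a dozen re-uses of the same structure, the anonymous style is what produces the snake.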

 

However, the inferred schema is not without its value. If you want to know how to finagle a point, then you can see how Word would do it, and then try to manage it more gracefully by hand-coding the sequence. Word also makes some surprising choices. One of the structures in the XML Schema definitions is the choice structure (xs:choice), in which one can specify a set of elements that can be included at the specified point. Each element of the set can be included, or not, in any order, and as many times as is needed, within the limits of the minOccurs and maxOccurs attributes. Word codes it instead as a sequence (xs:sequence), with minOccurs set to 0. It is not quite the same, but close, and generally fits your data, as you will only have one cycle of the second-level element in the XML file. Word would not really know that, on the next instance of the second-level element, you might want to input the elements in a different order.
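The "not quite the same, but close" can be pinned down with a toy model. The two functions below are a rough sketch, not a schema validator: one models a repeating choice (any listed element, any order, any number of times), the other models a sequence in which every member is optional. The element names are hypothetical, and the sequence model assumes each member keeps the default maxOccurs of 1, which the inferred schema may or may not do.

```python
def fits_repeating_choice(children, allowed):
    """Model of xs:choice with maxOccurs="unbounded":
    any allowed element, in any order, any number of times."""
    return all(child in allowed for child in children)


def fits_optional_sequence(children, order):
    """Model of xs:sequence whose members all have minOccurs="0"
    (and, by assumption, the default maxOccurs of 1): elements may be
    omitted, but those present must follow the declared order."""
    last = -1
    for child in children:
        if child not in order:
            return False
        position = order.index(child)
        if position <= last:  # out of order, or repeated
            return False
        last = position
    return True


DECLARED = ["name", "amount", "unit"]  # hypothetical child elements

print(fits_repeating_choice(["amount", "name"], DECLARED))   # True
print(fits_optional_sequence(["amount", "name"], DECLARED))  # False
print(fits_optional_sequence(["name", "unit"], DECLARED))    # True
```

Data entered in the order Word first saw it passes both models; data entered in a different order passes only the choice.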

 

Rely on Word to infer your XML Schema file only if you have not had that Introductory XML course!

 

The Power behind the XML Feature: the XSL Transform or View

 

The XML techies will go on and on about how XML provides the means of including the metadata about the data in the data file, and how that means companies can communicate with other companies, how counties can interop with other counties, and, yea, hallelujah! even countries can communicate with other countries, and know what they are saying. But, come on, we know that it is the United Nations out there. Nobody really wants to know what's in someone else’s mind. That would flat-out just be too painful. The real power of XML/XSL is its ability to transform a really drab, boring, huge collection of elements, with twice the typing required by an ordinary document, into a group of beautifully formatted, four-color documents that you can select or switch between with just the click of a mouse. That’s the promise that XML and XSL transforms, and WordprocessingML 2003 in conjunction with the XSL Inference Tool, bring to Office System 2003. XML without full Word formatting capability is like a song with no swing: it don’t mean a thing. So let’s learn how to use the Microsoft Office 2003 WordprocessingML Transform Inference Tool.

 

wml2xslt.exe is the XSL inference tool. It is run from the command line, but it is easier to use if you set up an instance of the Command Prompt so that it starts in whatever directory you have placed wml2xslt. You have an accompanying document that tells you what parameters you can use on the command line and a little bit about what each parameter does. There is also a document that tells you how to prepare a seed document that contains the formatting information that you want to be included in the XSL transform. Assiduous searching will find a little help on the concepts used in the tool. And if you, by chance, search from the TechNet site, rather than the MSDN or Microsoft.com site, you will find the article by Mary Chipman. But that is another statement about how to use the Schema Library. Still, it is important, as it reinforces the basics that you may have found in the Word 2003 help files.

 

At first meeting, this tool appears useless. I recall creating my seed XML file in Word, adding formatting, carefully saving it with the ‘save data only’ box unchecked, running wml2xslt, and getting back a statement that there were no tags with formatting in the input document, so it made no output. Now I could clearly see that I had put formatting in those tags. Why this gibberish from wml2xslt? And where are the explanations for the error messages? I never found any. I have gotten past those problems, but I don’t have versioning turned on, and don’t have the files that produced the errors. I set about trying to recreate them. I took the Dishlist62_T1.xml file as my seed document (which uses Kitchen62_T1.xsd as the schema, but I am not sure that the schema ever gets referenced), and fed it to wml2xslt. I expected it to produce the ‘no tags with formatting’ error message. In fact, it ran without error. The transform produced was very strange, but no errors! So I had to get a bit more sophisticated to get some of the old errors. I created DL62_T2_fmtdSeed_0.xml, in which I put all the input tags in tables. When I ran that file in wml2xslt, I got some of the initial errors. The first are:

 

Warning: smarttags will be removed.

Warning: The following item may result in a WordprocessingML document that is not well-formed: smarttags. The item will be removed from the resulting XSL transform

At node st1:City

Warning: smarttags will be removed.

Warning: The following item may result in a WordprocessingML document that is not well-formed: smarttags. The item will be removed from the resulting XSL transform

At node st1:place

Warning: smarttags will be removed.

Warning: The following item may result in a WordprocessingML document that is not well-formed: smarttags. The item will be removed from the resulting XSL transform

At node st1:place

 

These occur because smart tags are turned on in the Word Options panel.  Turn them off, and they will go away.

 

The following is another error I saw a lot at the beginning.  The command line for running the tool is:

 

Inference Tool>wml2xslt j:\UsingJustWord\DL62_T2_fmtdSeed_0.xml -mx -o j:\UsingJustWord\DL62T1_Inferd3.xslt -nsa -nf -v

 

 And the errors are:

 

At node ns0:PageTitle

At XPath: /ns0:dishlist/ns0:PageTitle

Warning: Text node contains multiple types of formatting.

Warning: Text located in the following XPath contains multiple types of formatting. The resulting XSL transform will apply only the first type of formatting encountered in that node.

 

This error is caused by having the -mx switch included in the command line while not having any mixed content formatting. I am not sure why I ever included the -mx switch, but it was probably because I wasn’t getting anything from those initial runs, and I turned on everything just to see if I could force some interpretable output. One of these errors is produced for each formatted node, so it is really quite intimidating to the novice. Be forewarned.

 

So, the global rules for using wml2xslt are:

 

1)    Make sure that your XML and XSD documents follow the pattern set by Bill Coan’s project—that is, use a blank prefix for the targetNamespace which you define in the document element of the XML file.  Be sure that you use the same targetNamespace string that you use in the Schema document.

2)    Do not use the -mx switch in the wml2xslt command line unless you truly have mixed content.
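If you end up running the tool from a script rather than by hand, the second rule can be built into a small helper so that -mx is never passed by accident. This is only a sketch: the function name wml2xslt_command and its parameters are invented here, it uses only the -o and -mx switches that appear in the command line quoted earlier, and it deliberately leaves out -nsa, -nf and -v, whose meanings are not covered in this article.

```python
def wml2xslt_command(seed_path, output_path, mixed_content=False):
    """Build the argument list for a wml2xslt run.

    The -mx switch is added only when the caller states that the seed
    document really contains mixed content.
    """
    args = ["wml2xslt", seed_path]
    if mixed_content:
        args.append("-mx")
    args += ["-o", output_path]
    return args


# File names taken from the command line shown earlier in this article.
print(wml2xslt_command(r"j:\UsingJustWord\DL62_T2_fmtdSeed_0.xml",
                       r"j:\UsingJustWord\DL62T1_Inferd3.xslt"))
```

The resulting list could be handed to Python's subprocess.run from the directory that holds wml2xslt.exe.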

 

Now, to the formatting problems in the interaction between the Word seed document and wml2xslt itself. When you open the bare seed XML document in Word 2003, the data-only view will show the contents contained in open tag icons. These open tag icons represent the paragraph tags; that is, the contents of the tag will be displayed as a separate paragraph. You can convert the open tag icon to a filled tag icon, which means that the content of the tag will be displayed as text in-line, with no new-line characters separating it from the contents of the next tag. I did a bit of that in the 0-version of the T1 seed and it is shown in Table 1.

 

Water

8

Cups

 

low-gluten flour

6

Cups

 

 

Salt

1

Tablespoon

 



 

Table 1.  The ingredient cell of DL62_T2_fmtdSeed_0.xml

 

When I created the ingredient cell for the seed, I made sure that my three interior tags were all colored tags, and, while they were on different lines, I thought that was due to the way Word had to fit them in the narrow field, and expected to get all the text in one line. However, when I show that cell in a single-cell table that is much wider, it is easy to see that I have a new-line character included between the tags. To see any new lines in the sequence of tags, one has to either squeeze the other two cells of the table down to really thin cells, or copy the cell to a larger single-cell table like the one above. Or, perhaps, merely show the new-line characters. The transformed output from that cell is shown in Table 2. Clearly, some additional work needs to be done.

 

 

 

The Dish

Dish Index

 

Ingredients

 

Western-Style Crepes

 

100101

 

 

Water

8

Cups

 

low-gluten flour

6

Cups

 

 

Salt

1

Tablespoon

Table 2:  The formatted output from the Ingredients cell.

 

Another problem occurred with the Condiment table. I wanted those numbers in-line, and separated by a semi-colon and a space. I managed to get them in-line, but when I added the semi-colon and space, the XML Structure pane showed an error in the structure of the file, saying that the schema did not allow for the inclusion of spaces. In the DL62_T2_fmtdSeed_0 document the condi tags are outside the table, as shown in Table 3.

         

Condiments for this Dish

100501100510100521100525

 
       

 

Table 3:  The Condi Tags surrounding the Condiments Table.

 

Whereas, in the seed from the Dishlist62 sample, the condi tags are inside the Condiments table, as shown in Table 4.

 

Condiments for this Dish

0 100301 100305 100306 100310

 

Table 4: A Condiments Table that includes the conDi tags.  (Unfortunately, the tags were elided during the upload to the blog.  Please see the .doc version in the zip file.)

 

I can’t say for sure what makes the difference, but with a lot of trial and error finagling, you should be able to make it format your XML documents to look the way you want them to look.  (Note to readers: be aware that there is a new paste option when pasting from an XML/XSL document--to preserve the XML structure.)

 

As intimated in the reference to the mixed content switch in the wml2xslt command line, there are features that can enable formatting with mixed content, whatever that is, and with database records. I have not attempted to work with those. Still, I am confident that, if you follow the two global rules about targetNamespace and appropriate use of the command line switches, you can get the XSLT Inference Tool to format those features with no more aggravation than I have experienced with un-mixed content.

 

The final thing for me to say about XSL transformations is that the real power of the XML/XSL nexus is not that you can show the XML document in a pretty skin, but that you can show the same document in several pretty skins. Just by constructing the proper XSL transformation, you can show several sub-sets of the file in radically different presentations, depending upon your needs. And, as the XML techies will babble on about, you can use the one source file and create output transforms for your computer screen, for any number of different printers, for your iPod, for your cell-phone, and even for your wrist watch. Soon you will even be able to send it to that strange molar that keeps broadcasting the radio station to your tonsils. Just wait till the drug companies find out about that! And it can all be done from one XML source file, and all changes or additions or updates are done to that one XML source file, with some tweaking of the XSL transforms, of course. Imagine that: all the world’s commerce hangs on the ability of Notepad to efficiently edit XML files. But only magic makes the XSL files? Well, not quite. wml2xslt isn’t magic, but close.

 

After the Flash—How do you do data input?

 

It is easy to create an Input form for an XML file.

1)    Copy the start of the file, down through the end tag of the second-level element, and then complete the file with the end tag of the first-level element.

2)    Delete all the content from the XML tags.

3)    Copy the second-level element, and paste it after the first instance of the second-level element.  This way you have two instances of the second-level element, then the end tag of the first-level element.

4)    Save the XML file as a Word template.
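In Word and Notepad, steps 1 to 3 are plain cut-and-paste work. For readers who want to see exactly what the resulting skeleton contains, here is a minimal sketch of the same three steps in Python. The sample document, its child element names (dishName, dishIndex) and the function name make_input_skeleton are hypothetical, and the namespace declaration is left out to keep the example short; step 4, saving as a Word template, still has to be done in Word.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free stand-in for the aggregated source file.
SOURCE = """<dishlist>
  <PageTitle>Dish List</PageTitle>
  <dishInstance><dishName>Western-Style Crepes</dishName><dishIndex>100101</dishIndex></dishInstance>
  <dishInstance><dishName>Second Dish</dishName><dishIndex>100102</dishIndex></dishInstance>
</dishlist>"""


def make_input_skeleton(xml_text, second_level_tag):
    """Return an empty two-instance skeleton built from a filled-in XML file."""
    root = ET.fromstring(xml_text)
    first = root.find(second_level_tag)  # first second-level element
    # Step 1: keep the file down through the first instance, drop the rest.
    children = list(root)
    for extra in children[children.index(first) + 1:]:
        root.remove(extra)
    # Step 2: delete all content from the tags (text and whitespace between tags).
    for element in root.iter():
        element.text = None
        element.tail = None
    # Step 3: paste an empty copy of the second-level element after the first.
    root.append(copy.deepcopy(first))
    return root


skeleton = make_input_skeleton(SOURCE, "dishInstance")
print(ET.tostring(skeleton, encoding="unicode"))
```

The printed result is a dishlist element holding an empty PageTitle followed by two empty dishInstance elements, which is the shape the steps above describe.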

 

The easiest source to use for the template is the unformatted seed document that was created for use in the Transform Inference Tool. There is a reason you should save two instances of the second-level element: if you need additional instances of empty child elements of the second-level element, you can cut and paste from the empty second instance. This occurs if the element has a maxOccurs attribute set to unbounded, and the current instance needs more copies of the element than were included in the template. Once the new instance of the second-level element has been completed, it can be copied and pasted into the XML source file that you use to aggregate all the second-level elements. The new instance will now show up in the transformed views of the source file, as if by magic.

 

The empty tag document presented by the template is not a very user-friendly document, especially if the data is to be input by someone who generally does not understand XML tags, or does not know or remember what data the tag names indicate. Fortunately, Word 2003 includes support for a PlaceHolderText attribute. As the template document is being prepared, one can add a PlaceHolderText attribute to each element that will accept text input. The PlaceHolderText attribute contains the text instructions for what data should be entered into the XML tag, like “Enter Name of Recipe here” in the case of my DishList62 template. Now, when the data entry person opens an instance of the template, and turns off the Show XML Tags feature, the PlaceHolderText strings appear on the page to guide what text should be entered where. (The showing of the XML tags in the document is controlled on the XML Structure pane, through a check-box under the XML structure display.) The PlaceHolderText strings disappear as soon as the data entry person clicks into the field.

 

From a designer/programmer point of view, the PlaceHolderText attributes should be defined in the schema, and the text strings entered there so that they would be pushed into, and available for use in, all the documents that use the schema. This isn’t how schemas work; they only validate the structure of the file, and do not push text into it. I did, in fact, construct a PlaceHolderText attribute for all the data elements in my DishList62 schema, only to find out that I would have to enter the PlaceHolderText strings in every XML document that used the schema. I immediately made them optional. I then also discovered that Word 2003 treats its PlaceHolderText attribute as separate from the attributes defined in the schema. So, even though I gave the attribute the PlaceHolderText name, the strings entered in the attributes were not used as placeholder text. Looking at the Attribute Properties Panel, you see a text box for Required Attributes, in which you enter the strings required by the schema, and down below, you see a line that is labeled PlaceHolderText. Only the text entered in that control is used as PlaceHolderText when the tags are turned off in a document. While I initially gnashed my teeth at the clumsiness of this implementation, after using it a bit, and after discovering how difficult it is to define attributes in the schema, I am not going to kvetch about it.

 

Enterprise Deployment

 

Having created the XML file, and pretty, formatted views of it, the final question is how do you share it? The Schema Library, which holds the links between the XML document and its XSL transform views, is a client-side creation of Office. It is not installed on a central server. When you start looking into how to move the schema and transforms to a central server, you get sent to the Smart Document SDK. There you find out that you can put the schema and transforms and such in a server-side directory and control it all from a manifest file. The manifest file is, apparently, just another XML file, but the killer feature is that it has to be digitally signed. It would be nice to be able to get your document off of a server without having to deal with that security popup that asks if you really want to download the file you just selected and double-clicked on. It could be dangerous, you know. Because things on the server can have viruses or ActiveX components, and Oh, my God, it could DESTROY your computer!!! The digital signature might free you from Microsoft’s pledge to protect you from yourself. Still, the SDK talks about smart documents as applications, which is a bit more than an XML file and XSL transforms. So the smart document is probably not the right way to go when it comes to deploying your schema and transforms.

 

Fortunately, the Schema Library has the facility to link to the schema and views when they are saved on a server share, public document repository, or even a SharePoint Document Library. You can share your document and views by sending the document, or the seed, and the template, to a collaborator, and instructing them to find the schema and transforms on the server. Said collaborator can then add the schema and solutions to his local Schema Library. If the document is in a SharePoint or server-side location, then you can grant update permissions, as well as read permissions, and have a real collaboration of effort.

 

I would not suggest developing the initial XML file or schema from a SharePoint document library.  You can’t easily get a document to open in Notepad from a doclib.  When the development is done, Word has no problem opening the files, if the schema and views are located in a SharePoint doclib.

 

In Summary

 

You can

1)    Use Word 2003, with a little help from Notepad, to create XML files and Schemas to validate those files.

2)    Use the wml2xslt.exe XSLT Inference tool to create WordML-formatted views of the XML source files.

3)    Create XML templates and data entry forms that guide non-programmers in filling out new source data.

4)    Use the Schema Library to enable the deployment of the schema to intranet locations so that many can collaborate on the use of the XML documents.

The zip file that contains the .doc form of this article, and the sample files discussed, can be downloaded from:

http://common.mvps.org/obts/UsingJustWord.zip 

 

Published Sun, Jul 10 2005 11:53 by OBTS