The Problem Solver

Tell me and I will forget
Show me and I will remember
Involve me and I will understand
- Confucius -

Working with Office 2007 documents
Office 2007 has a much nicer document format. The new format is XML based rather than a proprietary binary format, and unlike the XML format introduced in Office 2003, this is the real deal: it supports the full functionality, not just a subset.
 
The new format, DOCX in the case of Word, is actually a ZIP file containing a whole range of other documents. Most of these are XML; there can be other file types as well, but the XML files are what it is all about. ZIP and XML mean we should have an easy time opening and reading these files from our .NET programs. After all, both are pretty standard and have plenty of tools and libraries available. But life is even better than that: Microsoft has added native support for reading and writing these files to .NET, in the System.IO.Packaging namespace to be exact. Take a look at the following code for an example:
 
Imports System
Imports System.IO.Packaging
Imports System.Xml

Module Module1
    Sub Main()
        ' Open the DOCX file as a package (a ZIP container of parts).
        Using doc As Package = Package.Open("C:\Documents and Settings\Maurice\My Documents\Microsoft\Demos\1. XML File format\test.docx")
            Dim part As PackagePart
            Console.WriteLine("===== Document parts. =====")

            ' List the URI of every part in the package.
            For Each part In doc.GetParts()
                Console.WriteLine(part.Uri)
            Next
            Console.WriteLine("Press any key to continue and show the contents.")
            Console.ReadKey()

            Console.WriteLine("===== /word/document.xml contents. =====")
            part = doc.GetPart(New Uri("/word/document.xml", UriKind.RelativeOrAbsolute))
            Using reader As XmlReader = XmlReader.Create(part.GetStream())
                ' Print only the text nodes; markup is skipped.
                While reader.Read()
                    If reader.NodeType = XmlNodeType.Text Then
                        Console.WriteLine(reader.Value)
                    End If
                End While
            End Using
            Console.WriteLine("Press any key to end.")
            Console.ReadKey()
        End Using
    End Sub
End Module
 
The only slightly confusing thing is importing the System.IO.Packaging namespace. You would expect to add a reference to a System.IO.Packaging.dll, either from the GAC or the file system, but it just isn’t there. Instead you need to add a reference to WindowsBase.dll. It should be in the list of .NET assemblies, but if not you can find it in the "C:\Program Files\Reference Assemblies\Microsoft\Framework\v3.0" folder.
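
The listing above prints the document text as the reader encounters it. If you want one line of output per paragraph instead, something along the following lines might work. Treat it as a rough sketch rather than the definitive way to do it: the module name ParagraphDump and the command-line argument are made up for illustration, and the namespace URI is the one used by released Office 2007 files, so documents saved by pre-release builds may need a different value. It needs the same WindowsBase.dll reference.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging
Imports System.Text
Imports System.Xml

Module ParagraphDump
    ' WordprocessingML main namespace (assumption: a released Office 2007 file).
    Private Const WordNs As String = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

    Sub Main(ByVal args As String())
        ' Hypothetical usage: ParagraphDump.exe path\to\file.docx
        ' Open read-only so the document is never modified or created.
        Using doc As Package = Package.Open(args(0), FileMode.Open, FileAccess.Read)
            ' Every part has a content type next to its URI.
            For Each part As PackagePart In doc.GetParts()
                Console.WriteLine("{0} ({1})", part.Uri, part.ContentType)
            Next

            Dim body As PackagePart = doc.GetPart(New Uri("/word/document.xml", UriKind.Relative))
            Using input As Stream = body.GetStream(FileMode.Open, FileAccess.Read)
                Dim content As New XmlDocument()
                content.Load(input)

                Dim ns As New XmlNamespaceManager(content.NameTable)
                ns.AddNamespace("w", WordNs)

                ' One output line per w:p paragraph, built from its w:t text runs.
                For Each paragraph As XmlNode In content.SelectNodes("//w:p", ns)
                    Dim line As New StringBuilder()
                    For Each run As XmlNode In paragraph.SelectNodes(".//w:t", ns)
                        line.Append(run.InnerText)
                    Next
                    Console.WriteLine(line.ToString())
                Next
            End Using
        End Using
    End Sub
End Module
```

Loading the whole part into an XmlDocument keeps the XPath simple but holds the entire document in memory, so for very large documents the streaming XmlReader approach is probably the better fit. Paragraphs nested inside other paragraphs (text boxes, for example) would show up more than once with this query.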
 
Enjoy!
 
 
Maurice de Beijer
 
Published Thu, Oct 12 2006 15:28 by Maurice

Comments

# re: Working with Office 2007 documents@ Thursday, October 12, 2006 1:08 PM

How about file repair in a file corruption scenario?

# re: Working with Office 2007 documents@ Thursday, October 12, 2006 1:16 PM

I guess that all depends on what gets corrupted. If it is the ZIP file itself, then these classes will not help. However, if it is the internal structure, they could potentially be used to recover the contents.

by Maurice