In previous versions of Office, Microsoft saved files in proprietary binary formats that differed by file type (doc, xls, ppt, and so on). Developers had to know the Office application model to handle the different file formats, which was painful and costly. With Office 2007 (also known as Office 12), Microsoft introduced a new standard, called OpenXML, for saving Office files as XML. Simply put, everything in the file is saved as XML that adheres to the OpenXML standard. This is an extremely good addition to the Office system: it makes developers' lives much easier and gives them great flexibility to manage and edit Office files.
It also helps on the server side, where an application can send content to users as Office files. Suppose you are sending a newsletter to portal users: you can send it as an attached Word file, so users can save it easily. Another scenario is when users upload documents to your application and you want to analyze their contents, or add some standard data to each document, before saving it to the server. With OpenXML you can achieve this.
The new Office file extensions all end with ‘x’: doc is now docx, xls is now xlsx, and so on. The good news is that you can view a file's XML contents in WinZip, because an Office 2007 file is just a zip container. An example explains it best: open Word 2007, type ‘Hello OpenXML’, then save the document and exit Word.
Open Windows Explorer and browse to the file you've just created, then right-click it and open it with WinZip or WinRAR (I prefer WinRAR as it handles both zip and rar files). You will see the contents of your file listed, as shown in the following screenshot:
Word 2007 Package Structure (Image from MSDN)
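If you would rather not use a zip tool, the same listing can be produced with a few lines of script. The following is a minimal sketch in Python using only the standard zipfile module; the function name list_package_parts is my own choice for illustration, not part of any Office API.

```python
import zipfile


def list_package_parts(path_or_file):
    """Return the names of all entries stored in an Office 2007 package.

    path_or_file may be a file path (for example a .docx file) or any
    binary file-like object; the package is opened as an ordinary zip.
    """
    with zipfile.ZipFile(path_or_file) as package:
        # namelist() returns entry names in the order they are stored.
        return package.namelist()
```

Calling list_package_parts on the document saved above (the file name is whatever you chose when saving) should return entries such as [Content_Types].xml and word/document.xml, matching what the zip tool displays.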
As we said, the docx file is just a package, and this package contains parts. A part can be textual, or it can be binary for image, audio, and video files or any other non-textual format. Open the [Content_Types].xml file and you will find the content types of the parts in the package declared inside it, as indicated in this screenshot:
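To see the same information programmatically, here is a small sketch that parses [Content_Types].xml with Python's standard library. It assumes the usual layout of that file: Default elements that map a file extension to a content type, and Override elements that map one specific part name to a content type. The function name read_content_types is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by [Content_Types].xml in OpenXML packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def read_content_types(path_or_file):
    """Return (defaults, overrides) read from [Content_Types].xml.

    defaults maps a file extension to a content type;
    overrides maps a part name to a content type.
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("[Content_Types].xml"))

    defaults = {
        element.get("Extension"): element.get("ContentType")
        for element in root.findall(CT_NS + "Default")
    }
    overrides = {
        element.get("PartName"): element.get("ContentType")
        for element in root.findall(CT_NS + "Override")
    }
    return defaults, overrides
```

Returning two dictionaries keeps the extension-based defaults apart from the per-part overrides, which mirrors how the file itself is organized.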
Browse to the word folder under the root and open the document.xml file. Scroll down and you will find the content you entered, Hello OpenXML, as indicated in this screenshot:
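The text can also be pulled out of document.xml in code. The sketch below assumes the common WordprocessingML structure, in which paragraphs are w:p elements and the visible text sits in w:t elements inside them; the function name read_paragraphs is my own. It is a simplification that ignores tables, text boxes, headers, and footers.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (the "w:" prefix inside document.xml).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_paragraphs(path_or_file):
    """Return the text of each paragraph found in word/document.xml."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    paragraphs = []
    for paragraph in root.iter(W_NS + "p"):
        # A paragraph may be split into several runs, so join every
        # text element that belongs to it.
        text = "".join(t.text or "" for t in paragraph.iter(W_NS + "t"))
        paragraphs.append(text)
    return paragraphs
```

Joining the text elements matters because Word may split a single sentence into several runs, for example when part of it is formatted differently.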
Go back to the root and open the docProps folder. You will find two XML files, app.xml and core.xml. app.xml contains metadata about the Word file, such as the number of words, characters, paragraphs, and pages, the template used, and so on, while core.xml contains the creator, the user who last modified the document, and the creation and modification times. The contents are shown below:
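Both property files are flat enough to read with one generic helper. The sketch below returns the direct children of the root element as a dictionary keyed by element name with the namespace stripped; the helper name read_properties is hypothetical, and the part names docProps/core.xml and docProps/app.xml are the ones found in a typical Word 2007 package.

```python
import zipfile
import xml.etree.ElementTree as ET


def read_properties(path_or_file, part_name):
    """Return {element name: text} for the top-level properties in a part.

    part_name is the path inside the package, for example
    "docProps/core.xml" or "docProps/app.xml".
    """
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read(part_name))

    properties = {}
    for child in root:
        # ElementTree tags look like "{namespace}name"; keep only the name.
        local_name = child.tag.rsplit("}", 1)[-1]
        # Properties that contain nested elements have no direct text,
        # so their value here is None.
        properties[local_name] = child.text
    return properties
```

For core.xml this should yield keys such as creator and lastModifiedBy, and for app.xml keys such as Words and Pages. All values come back as strings, so convert the counts with int() if you need numbers.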
The _rels folder at the root contains the relationships between the package and the parts inside the zip package.
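As an illustration, the relationships can be listed with the short sketch below. It assumes the relationships file is stored at _rels/.rels and that each Relationship element carries Id, Type, and Target attributes; the function name read_relationships is my own.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship (.rels) parts in OpenXML packages.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def read_relationships(path_or_file, rels_part="_rels/.rels"):
    """Return a list of (id, type, target) tuples from a .rels part."""
    with zipfile.ZipFile(path_or_file) as package:
        root = ET.fromstring(package.read(rels_part))

    return [
        (rel.get("Id"), rel.get("Type"), rel.get("Target"))
        for rel in root.findall(REL_NS + "Relationship")
    ]
```

The Target of each relationship is the path of the part it points to, so following the targets is one way to locate the main document without hard-coding word/document.xml.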
Note: The formal way to work with OpenXML files is to use the packaging API (the System.IO.Packaging namespace) released with .NET 3.0.
For more information about OpenXML, have a look at these locations: