I’m back again with another one of my interesting and insightful posts…or well…at least another post. Unfortunately, since it is quite a big subject, I will have to divide it into two parts. This first part explains what OpenXML is and how to use Silverlight to create Office documents, while the second part will show how to use the simple library I built to make it easier.
For those of you who don’t know what OpenXML is, it is a standardized, XML-based format for storing Office documents, used by MS Office as well as some other office suites. It is standardized by both ECMA and ISO. You can read a lot more about it at http://www.openxmldeveloper.org. Normally you work with it using the OpenXML SDK, but that is not available to Silverlight…yet at least…
I personally had very little experience with the format until recently. Just before x-mas, though, I got assigned to a project at my company that opened my eyes to it. Together with another developer at our office (who actually did most of the work), I got the honorable task of updating the lab material on the OpenXML Developer site and making sure it worked with the upcoming Office 2010. After just a few slides about the standard, it dawned on me that it would be a pretty simple task to implement a small, but extensible, OpenXML library for Silverlight. So I did…
At the end of this blog post, you can download the source for my little library, but I do think I need to explain a little about how it works. It has some oddities and could probably be done a lot better if I had more time. But time is a rarity in my life, so here it is. And by the way, a lot of the oddities are there partly because of the way OpenXML works. And yeah…at the moment it is limited to Word documents and only a small subset of what is possible. But it can quite easily be extended, both with more Word features and with support for spreadsheets, slideshows etc.
The current feature set basically supports creating a new Word document, adding text and styles to it, and saving it. This entry will also include code showing how to extend it with simple image support.
So, how does OpenXML work? Well, I did say it was XML based, and it is. But that doesn’t mean that you can take your .docx files, change their extension to .xml and hope to see what’s inside them. You can, however, change the extension to .zip and open the zip file. Inside it you will find XML files, subfolders and any resources such as images. The minimal package consists of three files, one of them in a subfolder. That is all that is needed to create a document you can open in Word. The “root” file is called [Content_Types].xml. It contains XML elements that declare what type of content the other files in the package hold. It can contain either default mappings by file extension, such as saying that files with the extension .xml contain XML, or overrides. An override tells the client reading the package that even though a certain file has a particular extension, it contains something other than what the default for that extension says. For example, a [Content_Types].xml file could look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Default Extension="xml" ContentType="application/xml"/>
  <Override PartName="/document.xml"
      ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>
It says that any file with the extension .rels contains OpenXML package relationships, and any file with the extension .xml contains XML. But the file called document.xml actually contains something more specific than “just” XML: the main part of a Word document.
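To make the structure concrete, here is a minimal sketch of generating that file with LINQ to XML (System.Xml.Linq). ContentTypesSketch and Build are hypothetical names for this example only, not part of the library or of any SDK; the sketch just writes one Default element per extension and one Override element per part name.

```csharp
using System.Collections.Generic;
using System.Xml.Linq;

// Minimal sketch: generate [Content_Types].xml with LINQ to XML.
// ContentTypesSketch and Build are hypothetical names, not from any library.
public static class ContentTypesSketch
{
    public static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    // defaults:  file extension (without the dot) -> content type
    // overrides: part name from the package root, starting with "/" -> content type
    public static XDocument Build(IDictionary<string, string> defaults,
                                  IDictionary<string, string> overrides)
    {
        var types = new XElement(Ns + "Types");
        foreach (KeyValuePair<string, string> pair in defaults)
            types.Add(new XElement(Ns + "Default",
                new XAttribute("Extension", pair.Key),
                new XAttribute("ContentType", pair.Value)));
        foreach (KeyValuePair<string, string> pair in overrides)
            types.Add(new XElement(Ns + "Override",
                new XAttribute("PartName", pair.Key),
                new XAttribute("ContentType", pair.Value)));
        return new XDocument(new XDeclaration("1.0", "UTF-8", "yes"), types);
    }
}
```

Calling Build with the two defaults ("rels" and "xml") and the single "/document.xml" override from the snippet above produces the same elements. Because the root element lives in the content-types namespace and no prefix is declared, LINQ to XML serializes it with a default xmlns, just like the hand-written file.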
As you can see from the <Override> element, there is something called a part in the package. A part is basically an individual file inside the package. In the simple document I’m talking about, there is only one part, a “document” part, which holds the actual text of the document. But a package could contain lots of parts. An image would be a part, and an XML file containing the styles used in the document is a separate part. Each part of the package has its own content type.
This brings us to the next file, the actual “document” part: the document.xml referenced in the snippet above. As I mentioned before, it contains the actual content of the Office document. It is generally placed in a subfolder, but that is not necessary. It can look like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello Office Open XML</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
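That is a very simple document, but that’s all that is needed… The w:p element is a paragraph, the w:r element is a run (a stretch of text that shares the same formatting) and the w:t element holds the text itself.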
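The same paragraph/run/text nesting can be produced with LINQ to XML. This is a minimal sketch with hypothetical names (DocumentPartSketch, Build), creating one paragraph with one run per string. One detail it handles: by default, leading and trailing spaces inside w:t may be dropped unless the element carries xml:space="preserve", so the sketch adds that attribute only when the text starts or ends with whitespace.

```csharp
using System.Xml.Linq;

// Minimal sketch: build document.xml with one paragraph (w:p) per string,
// each holding one run (w:r) with one text element (w:t).
// DocumentPartSketch and Build are hypothetical names.
public static class DocumentPartSketch
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static XDocument Build(params string[] paragraphs)
    {
        var body = new XElement(W + "body");
        foreach (string text in paragraphs)
        {
            var t = new XElement(W + "t", text);
            // Keep leading/trailing spaces by marking them as significant.
            if (text.Length > 0 &&
                (char.IsWhiteSpace(text[0]) || char.IsWhiteSpace(text[text.Length - 1])))
            {
                t.Add(new XAttribute(XNamespace.Xml + "space", "preserve"));
            }
            body.Add(new XElement(W + "p", new XElement(W + "r", t)));
        }
        return new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName), // declares the w: prefix
                body));
    }
}
```

Build("Hello Office Open XML") gives the same structure as the snippet above; styles, tables and the like would be further elements nested in the same way.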
And then the last file. This is a file called “.rels”, placed in a subfolder called “_rels”. And yes, I wrote that correctly: the file only has an extension and no name. It contains the relationships of the package itself, that is, the parts in the package that the package relies on. In the simple document example I am using, it looks like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"
      Target="document.xml"/>
</Relationships>
As you can see, each relationship has an ID, a type and a target. The ID is not actually used in this case, but it needs to be there. I will explain why in a minute… The type is NOT the content type declared in the [Content_Types].xml file; it is the type of the relationship. In this case it says that the relationship points to an Office document. And the target is of course the path to the actual part. With this relationship, the client reading the package knows where to start… (It is possible to have more than one document in a package and link them…so this relationship is necessary.)
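A relationships file can be generated the same way. This is a minimal sketch; RelationshipInfo and RelationshipsSketch are hypothetical names for this example, and the same builder would work for part-specific rels files too, since they use the same format.

```csharp
using System.Xml.Linq;

// Hypothetical type: one <Relationship> entry.
public sealed class RelationshipInfo
{
    public readonly string Id;     // must be unique within one rels file
    public readonly string Type;   // relationship type URI, NOT a content type
    public readonly string Target; // relative to the folder the relationships belong to
    public RelationshipInfo(string id, string type, string target)
    {
        Id = id; Type = type; Target = target;
    }
}

// Minimal sketch: generate a relationships file such as _rels/.rels.
public static class RelationshipsSketch
{
    public static readonly XNamespace Ns =
        "http://schemas.openxmlformats.org/package/2006/relationships";
    public const string OfficeDocumentType =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

    public static XDocument Build(params RelationshipInfo[] relationships)
    {
        var root = new XElement(Ns + "Relationships");
        foreach (RelationshipInfo rel in relationships)
            root.Add(new XElement(Ns + "Relationship",
                new XAttribute("Id", rel.Id),
                new XAttribute("Type", rel.Type),
                new XAttribute("Target", rel.Target)));
        return new XDocument(new XDeclaration("1.0", "UTF-8", "yes"), root);
    }
}
```

Build(new RelationshipInfo("rId1", RelationshipsSketch.OfficeDocumentType, "document.xml")) reproduces the package relationship shown above.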
There are actually two types of relationship files. The one described previously is mandatory, the other one is not. The other type contains relationships for a specific part. It can, for example, contain a reference to an image that is used in a document part. It is placed in a “_rels” folder as well, but that “_rels” folder must be a subfolder of the folder containing the part it holds relationships for, and the file is named after the part with “.rels” appended. In the simple example used so far, the “document.xml” part is placed in the root of the package, so its relationships would go in “_rels/document.xml.rels”. But the part is often placed in a subfolder called “word”, and in that case its relationships file must be placed in a “_rels” folder below the “word” folder, as “word/_rels/document.xml.rels”.
In this part-specific relationship file, the IDs become very important. They are the identifiers used in the part to point at the other parts it needs. Say, for example, that our document.xml file contained the following XML:
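The naming rule for part relationship files is simple enough to express as a small helper. This is a sketch with hypothetical names (RelsPathSketch, RelsPathFor): it takes the part’s path inside the package and returns where its rels file goes. As a side effect, passing an empty path (the package root) yields the package-level “_rels/.rels”.

```csharp
// Minimal sketch: compute the path of the relationships file for a part,
// e.g. "word/document.xml" -> "word/_rels/document.xml.rels".
// RelsPathSketch and RelsPathFor are hypothetical names.
public static class RelsPathSketch
{
    public static string RelsPathFor(string partPath)
    {
        string path = partPath.TrimStart('/');       // zip entry paths have no leading slash
        int slash = path.LastIndexOf('/');
        string folder = slash >= 0 ? path.Substring(0, slash + 1) : ""; // e.g. "word/"
        string fileName = path.Substring(slash + 1);                    // e.g. "document.xml"
        return folder + "_rels/" + fileName + ".rels";
    }
}
```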
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:pict>
          <v:shape style="width:467pt;height:124pt">
            <v:imagedata r:id="rId1" />
          </v:shape>
        </w:pict>
      </w:r>
    </w:p>
  </w:body>
</w:document>
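For image support, that picture paragraph could be generated the same way. A minimal sketch, assuming LINQ to XML; ImageRunSketch and ImageParagraph are hypothetical names, and the markup simply mirrors the snippet above (Word itself typically writes more attributes on v:shape than this).

```csharp
using System.Globalization;
using System.Xml.Linq;

// Minimal sketch: build the VML picture paragraph from the example above.
// ImageRunSketch and ImageParagraph are hypothetical names.
public static class ImageRunSketch
{
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    public static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    public static readonly XNamespace V = "urn:schemas-microsoft-com:vml";

    // relId must match a Relationship Id in the document part's rels file.
    public static XElement ImageParagraph(string relId, double widthPt, double heightPt)
    {
        // InvariantCulture so that 12.5 is written as "12.5", never "12,5".
        string style = string.Format(CultureInfo.InvariantCulture,
            "width:{0}pt;height:{1}pt", widthPt, heightPt);
        return new XElement(W + "p",
            // Prefix declarations make standalone output use w:, r: and v:;
            // inside a w:document that already declares them they are redundant.
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "r", R.NamespaceName),
            new XAttribute(XNamespace.Xmlns + "v", V.NamespaceName),
            new XElement(W + "r",
                new XElement(W + "pict",
                    new XElement(V + "shape",
                        new XAttribute("style", style),
                        new XElement(V + "imagedata",
                            new XAttribute(R + "id", relId))))));
    }
}
```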
For that document to work, the document part’s rels file needs a relationship that looks something like this:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1"
      Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
      Target="openxmldeveloper.gif"/>
</Relationships>
Note that the path to the target is stated relative to the folder where the document.xml file is located. So in this case, the openxmldeveloper.gif file must be in the same folder as the document.xml file. In the [Content_Types].xml file, however, part names are stated from the root of the package and always start with a slash. And since every part needs a content type, the image needs one there as well, for example <Default Extension="gif" ContentType="image/gif"/>.
Ok…so far it doesn’t seem too complicated. It is just a bunch of different files and some XML. And that is true. However, creating all of that XML without some library to help us makes it a task only for people who either know the OpenXML standard or have time to sit down and learn it. That’s where my library comes into play. It gives us typed control of what XML to create. But that’s covered in the next post.
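Going from a relative relationship target to the absolute part name used in [Content_Types].xml is mostly string handling. This is a minimal sketch with hypothetical names (PartNameSketch, ResolveTarget). It assumes internal targets only: relationships marked TargetMode="External" (hyperlinks, for example) point outside the package and are not handled, and neither is URL escaping.

```csharp
using System.Collections.Generic;

// Minimal sketch: resolve a relationship Target (relative to the folder of the
// source part) into the absolute part name used in [Content_Types].xml.
// PartNameSketch and ResolveTarget are hypothetical names.
public static class PartNameSketch
{
    public static string ResolveTarget(string sourcePartName, string target)
    {
        // Targets starting with "/" are already relative to the package root.
        if (target.Length > 0 && target[0] == '/')
            return target;

        // Folder of the source part, e.g. "/word/" for "/word/document.xml".
        int slash = sourcePartName.LastIndexOf('/');
        string folder = slash >= 0 ? sourcePartName.Substring(0, slash + 1) : "/";

        // Join and normalize "." and ".." segments.
        var segments = new List<string>();
        foreach (string segment in (folder + target).Split('/'))
        {
            if (segment.Length == 0 || segment == ".")
                continue;
            if (segment == "..")
            {
                if (segments.Count > 0)
                    segments.RemoveAt(segments.Count - 1);
            }
            else
            {
                segments.Add(segment);
            }
        }
        return "/" + string.Join("/", segments.ToArray());
    }
}
```

With the document part in the package root, ResolveTarget("/document.xml", "openxmldeveloper.gif") gives "/openxmldeveloper.gif"; with the part in a “word” folder, a target of "media/image1.png" becomes "/word/media/image1.png".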
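Whatever generates the XML, each part eventually has to become the bytes of a zip entry. One detail worth being careful with: saving an XDocument to a StringWriter produces encoding="utf-16" in the XML declaration (StringWriter reports UTF-16), which no longer matches once the string is written to the zip as UTF-8. A minimal sketch that writes UTF-8 bytes directly; PartBytesSketch and ToUtf8Bytes are hypothetical names.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Linq;

// Minimal sketch: serialize one part to the bytes that go into its zip entry.
// PartBytesSketch and ToUtf8Bytes are hypothetical names.
public static class PartBytesSketch
{
    public static byte[] ToUtf8Bytes(XDocument doc)
    {
        // UTF8Encoding(false) = UTF-8 without a byte order mark; the writer
        // puts encoding="utf-8" in the XML declaration to match.
        var settings = new XmlWriterSettings();
        settings.Encoding = new UTF8Encoding(false);
        using (var buffer = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                doc.Save(writer);
            }
            return buffer.ToArray();
        }
    }
}
```

The resulting byte arrays are what gets written to the zip, one entry per part, under paths such as [Content_Types].xml, _rels/.rels and document.xml.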
So is this post done? No…it isn’t. There is one little problem left. The package needs to be compressed using the zip format. In .NET that isn’t a problem, but in Silverlight it is. Silverlight has no DeflateStream, so while it can read zip files, it can’t create them. However, a quick search on the big interweb solves that for us. There is a Silverlight port of the SharpZipLib library available on CodePlex. Finding it makes the entire thing quite simple actually, even though a bit cumbersome when it comes to the XML.
The SharpZipLib library contains a ZipOutputStream class that compresses anything written to it. All we have to do is give it a stream to write to, set the compression level and get going. There are lots of sites out there that explain how to use this library, so I won’t. But in overview, it looks like this:
Create a ZipOutputStream and set the compression level. Then, for each file that needs to be written to the zip file, create a ZipEntry object, set the size of the entry, “put” it in the stream and write the file’s contents. Then “put” the next ZipEntry in the stream, write the next file, and so on. That’s it…
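// Needs the Silverlight port of SharpZipLib: using ICSharpCode.SharpZipLib.Zip;
// outputStream is where the zip ends up, fileBytes holds one file's contents
using (ZipOutputStream stream = new ZipOutputStream(outputStream))
{
    stream.SetLevel(9); // 0 = store only, 9 = best compression
    ZipEntry entry = new ZipEntry("Folder/Filename.ext");
    entry.Size = fileBytes.Length; // uncompressed size of the file
    stream.PutNextEntry(entry);
    stream.Write(fileBytes, 0, fileBytes.Length); // write the file itself
    stream.CloseEntry();
    // ...repeat PutNextEntry/Write/CloseEntry for each remaining file
}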
Something like that. That is more or less what I did, and the code ran, the file was created and everything seemed fine. But Word wouldn’t open the file. It said it was corrupt…hmm…another search on the web…and another answer. Apparently, SharpZipLib can write the entries using the Zip64 extensions of the zip format (the answer I found tied this to running on a 64-bit OS), and Word doesn’t understand them. Luckily, we can turn this off by adding one line of code right after creating the stream, before any entries are added. Just add this and it seems to work fine…
stream.UseZip64 = UseZip64.Off;
With the OpenXML information I have provided above and SharpZipLib, you should be set to create your own Word documents. And if you have a look at the OpenXML Developer site, you should be able to create documents for more or less the entire Office suite. But don’t get carried away…the Office suite is big and costs money for a reason…
In the next post I will cover the little library I have created for saving OpenXML documents. It will show both how to use the features in the library and how to extend it. And…don’t get your hopes up too high. It is still a tiny library…it can only write files, not read them (there are other libraries available for that)…and I won’t have time to extend it until I get a project at work that needs it. But hopefully it can help you a little on the way…
So…see you later for now. And I will try to get the next post up as fast as possible…