Export Docx/doc First Header And Footer As Docx File Using Openxml

September 30, 2023

I want to ask how I can convert the header/footer part of an MS Word document (doc/docx) to HTML. I'm opening the document like: using (WordprocessingDocument wDoc = WordprocessingDocume

Solution 1:

A lot of struggle led me to the following solution:

I created a function that converts the byte array of a .docx document to HTML, as follows:

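using System;
using System.Collections.Generic;
using System.Drawing.Imaging;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools; // WmlToHtmlConverter, UriFixer, DC, Xhtml, NoNamespace, GetXDocument()

public string ConvertToHtml(byte[] fileInfo, string fileName = "Default.docx")
    {
        // Only .docx files are accepted (the extension check is case-insensitive).
        if (string.IsNullOrEmpty(fileName) || !string.Equals(Path.GetExtension(fileName), ".docx", StringComparison.OrdinalIgnoreCase))
            return "Unsupported format";

        //FileInfo fileInfo = new FileInfo(fullFilePath);
        string htmlText = string.Empty;
        try
        {
            htmlText = ParseDOCX(fileInfo, fileName);
        }
        catch (OpenXmlPackageException e)
        {
            if (e.ToString().Contains("Invalid Hyperlink"))
            {
                // FixInvalidUri rewrites the package in place and the result can differ in size,
                // so copy the bytes into an expandable stream and parse the rewritten bytes afterwards.
                using (MemoryStream fs = new MemoryStream())
                {
                    fs.Write(fileInfo, 0, fileInfo.Length);
                    fs.Position = 0;
                    UriFixer.FixInvalidUri(fs, brokenUri => FixUri(brokenUri));
                    htmlText = ParseDOCX(fs.ToArray(), fileName);
                }
            }
        }
        return htmlText;
    }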
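ConvertToHtml passes a FixUri helper to UriFixer.FixInvalidUri (from Open-Xml-PowerTools), but the answer never shows it. Below is a hypothetical minimal version, not the author's. It assumes the callback receives the rejected hyperlink target as a string and must return a Uri; it keeps any target that .NET's Uri can still parse (for example once stray whitespace is trimmed) and otherwise substitutes a placeholder. The class name HyperlinkRepair and the address http://broken-link/ are arbitrary choices.

```csharp
using System;

public static class HyperlinkRepair
{
    // Hypothetical placeholder target for links that cannot be repaired.
    private static readonly Uri Placeholder = new Uri("http://broken-link/");

    // Hypothetical stand-in for the FixUri helper passed to UriFixer.FixInvalidUri.
    // Assumption: it receives the raw target string that the package rejected
    // and must return a Uri that the package will accept instead.
    public static Uri FixUri(string brokenUri)
    {
        if (string.IsNullOrWhiteSpace(brokenUri))
            return Placeholder;

        // Keep the original target when it still parses as an absolute URI after trimming.
        if (Uri.TryCreate(brokenUri.Trim(), UriKind.Absolute, out Uri parsed))
            return parsed;

        // Otherwise replace it so the rest of the document can still be opened and converted.
        return Placeholder;
    }
}
```

To use it with ConvertToHtml, either move FixUri into the same class or call HyperlinkRepair.FixUri from the lambda.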

ParseDOCX does all of the conversion. Here is its code:

private string ParseDOCX(byte[] fileInfo, string fileName)
    {
        try
        {
            //byte[] byteArray = File.ReadAllBytes(fileInfo.FullName);
            using (MemoryStream memoryStream = new MemoryStream())
            {
                memoryStream.Write(fileInfo, 0, fileInfo.Length);

                using (WordprocessingDocument wDoc = WordprocessingDocument.Open(memoryStream, true))
                {

                    int imageCounter = 0;

                    var pageTitle = fileName;
                    var part = wDoc.CoreFilePropertiesPart;
                    if (part != null)
                        pageTitle = (string)part.GetXDocument().Descendants(DC.title).FirstOrDefault() ?? fileName;

                    WmlToHtmlConverterSettings settings = new WmlToHtmlConverterSettings()
                    {
                        AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
                        PageTitle = pageTitle,
                        FabricateCssClasses = true,
                        CssClassPrefix = "pt-",
                        RestrictToSupportedLanguages = false,
                        RestrictToSupportedNumberingFormats = false,
                        ImageHandler = imageInfo =>
                        {
                            ++imageCounter;
                            // Map the image content type (e.g. "image/png") to a GDI+ format; TIFF is re-encoded as GIF.
                            string extension = imageInfo.ContentType.Split('/')[1].ToLower();
                            ImageFormat imageFormat = null;
                            if (extension == "png") imageFormat = ImageFormat.Png;
                            else if (extension == "gif") imageFormat = ImageFormat.Gif;
                            else if (extension == "bmp") imageFormat = ImageFormat.Bmp;
                            else if (extension == "jpeg") imageFormat = ImageFormat.Jpeg;
                            else if (extension == "tiff")
                            {
                                extension = "gif";
                                imageFormat = ImageFormat.Gif;
                            }
                            else if (extension == "x-wmf")
                            {
                                extension = "wmf";
                                imageFormat = ImageFormat.Wmf;
                            }

                            // Returning null skips images in formats not handled above.
                            if (imageFormat == null)
                                return null;

                            string base64 = null;
                            try
                            {
                                using (MemoryStream ms = new MemoryStream())
                                {
                                    imageInfo.Bitmap.Save(ms, imageFormat);
                                    var ba = ms.ToArray();
                                    base64 = System.Convert.ToBase64String(ba);
                                }
                            }
                            catch (System.Runtime.InteropServices.ExternalException)
                            { return null; }

                            ImageFormat format = imageInfo.Bitmap.RawFormat;
                            ImageCodecInfo codec = ImageCodecInfo.GetImageDecoders().First(c => c.FormatID == format.Guid);
                            string mimeType = codec.MimeType;

                            string imageSource = string.Format("data:{0};base64,{1}", mimeType, base64);

                            XElement img = new XElement(Xhtml.img,
                                new XAttribute(NoNamespace.src, imageSource),
                                imageInfo.ImgStyleAttribute,
                                imageInfo.AltText != null ?
                                    new XAttribute(NoNamespace.alt, imageInfo.AltText) : null);
                            return img;
                        }

                    };
                    XElement htmlElement = WmlToHtmlConverter.ConvertToHtml(wDoc, settings);

                    var html = new XDocument(new XDocumentType("html", null, null, null), htmlElement);
                    var htmlString = html.ToString(SaveOptions.DisableFormatting);
                    return htmlString;
                }
            }
        }
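        catch (OpenXmlPackageException e) when (e.ToString().Contains("Invalid Hyperlink"))
        {
            // Rethrow so ConvertToHtml can repair the broken hyperlink and retry;
            // otherwise the catch below would swallow it and the repair path would never run.
            throw;
        }
        catch (Exception)
        {
            return "File contains corrupt data";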
        }
    }

So far everything looked nice and easy, but then I realized that the header and footer of the document were simply skipped, so I had to convert them somehow. I tried to use the GetStream() method of HeaderPart, but of course an exception was thrown, because the header's XML tree (rooted at w:hdr) is not the same as the document's (w:document/w:body).
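The mismatch is visible in the XML itself: a header part's root is w:hdr, whose children are block-level content (w:p, w:tbl, ...), while the main part's root is w:document with those blocks inside w:body. The sketch below uses only System.Xml.Linq (no Open XML SDK) and re-parents a header's blocks under a fresh w:document/w:body; the names HeaderXml and WrapAsDocumentBody are hypothetical. It only illustrates the structural move. A usable package still needs styles, settings and the header's relationships, which is what the rest of this answer deals with.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class HeaderXml
{
    // WordprocessingML main namespace, shared by w:hdr, w:ftr and w:document.
    public static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: copies the block-level children of a w:hdr (or w:ftr) element
    // into a new w:document/w:body so the content has the shape a main document part expects.
    public static XDocument WrapAsDocumentBody(string headerXml)
    {
        XElement root = XElement.Parse(headerXml);
        if (root.Name != W + "hdr" && root.Name != W + "ftr")
            throw new ArgumentException("Expected a w:hdr or w:ftr root element.", nameof(headerXml));

        // Elements that already have a parent are cloned when added, so the source tree is untouched.
        var body = new XElement(W + "body", root.Elements());

        // Carry over the namespace declarations (w:, r:, ...) so prefixes such as r:embed stay resolvable.
        var document = new XElement(W + "document",
            root.Attributes().Where(a => a.IsNamespaceDeclaration),
            body);
        return new XDocument(document);
    }
}
```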

Then I decided to extract the header and footer as new documents (which gave me a hard time) with Open XML's WordprocessingDocument headerDoc = WordprocessingDocument.Create(headerStream, WordprocessingDocumentType.Document), but unfortunately converting that document failed too, as you might expect, because it is just a bare .docx package without any settings, styles, web settings, etc. This took a lot of time to figure out.

In the end I created the new document with Cathal's DocX library instead, and that finally worked. The code is as follows:

public string ConvertHeaderToHtml(HeaderPart header)
    {

        using (MemoryStream headerStream = new MemoryStream())
        {
            // Create a blank document with Cathal's DocX library
            var newDocument = Novacode.DocX.Create(headerStream);
            newDocument.Save();

            using (WordprocessingDocument headerDoc = WordprocessingDocument.Open(headerStream, true))
            {
                var headerParagraphs = new List<OpenXmlElement>(header.Header.Elements());
                var mainPart = headerDoc.MainDocumentPart;

                // Cloning is necessary: the elements still belong to the header's tree,
                // and appending an element that already has a parent throws an exception
                mainPart.Document.Body.Append(headerParagraphs.Select(h => (OpenXmlElement)h.Clone()).ToList());

                // Copy the header's part relationships to the new document's main part
                foreach (IdPartPair parts in header.Parts)
                {
                    // The second parameter of AddPart is very important: without it the relationship ID
                    // changes and the pictures etc. referenced from the header won't show
                    mainPart.AddPart(parts.OpenXmlPart, parts.RelationshipId);
                }
                headerDoc.MainDocumentPart.Document.Save();
                headerDoc.Save();
                headerDoc.Close();
            }
            return ConvertToHtml(headerStream.ToArray());
        }
    }

So that is the header. I pass in the HeaderPart, take the child elements of its Header and copy them into the new document's body. Then I copy the header's relationships into the new document, which is essential if the header contains images. After that, the document is ready for conversion.
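The relationship step matters because header markup refers to images and links only by relationship ID: a picture's a:blip carries r:embed="rIdN", and that ID must still resolve in the new document after copying. Note that header.Parts covers only part relationships; hyperlinks are external relationships held in the header's HyperlinkRelationships collection, so the loop above does not copy them. The sketch below (System.Xml.Linq only; the name RelationshipIds.Referenced is hypothetical) lists every relationship ID that a header or footer XML references, which is a quick way to spot IDs that would be left dangling.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RelationshipIds
{
    // Office Document relationships namespace (the r: prefix in WordprocessingML).
    private static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Hypothetical helper: returns the distinct relationship IDs referenced anywhere in a part's XML,
    // e.g. r:embed / r:link on a:blip (images) and r:id on w:hyperlink.
    public static IReadOnlyList<string> Referenced(string partXml)
    {
        return XElement.Parse(partXml)
            .DescendantsAndSelf()
            .SelectMany(e => e.Attributes())
            // xmlns:r declarations live in the xmlns namespace, so only real r:* attributes match here.
            .Where(a => a.Name.Namespace == R)
            .Select(a => a.Value)
            .Distinct()
            .ToList();
    }
}
```

With the SDK, the result can be compared against header.Parts.Select(p => p.RelationshipId) and header.HyperlinkRelationships.Select(h => h.Id) before copying.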

The same steps, applied to a FooterPart and its Footer element, generate the HTML for the footer.
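As for picking the first header and footer to pass in: the section properties (w:sectPr) in the main document point at them through w:headerReference / w:footerReference elements whose w:type is default, first or even, and whose r:id is resolved through the main part's relationships. The sketch below is hypothetical (FirstHeaderFooter.FindFirstDefault is not from the answer). It reads the package with System.IO.Compression instead of the Open XML SDK and assumes the main part is word/document.xml with its relationships in word/_rels/document.xml.rels, which is the usual layout. With the SDK, the same r:id can be resolved with mainPart.GetPartById to obtain the HeaderPart or FooterPart.

```csharp
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class FirstHeaderFooter
{
    private static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    private static readonly XNamespace R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    private static readonly XNamespace Pkg = "http://schemas.openxmlformats.org/package/2006/relationships";

    // Hypothetical helper. kind is "headerReference" or "footerReference".
    // Returns the zip path of the first default header/footer (e.g. "word/header1.xml"), or null.
    public static string FindFirstDefault(Stream docx, string kind)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            XDocument document = Load(zip, "word/document.xml");
            XDocument rels = Load(zip, "word/_rels/document.xml.rels");

            // First reference in document order with w:type="default" (a missing w:type is treated as default here).
            XElement reference = document.Descendants(W + kind)
                .FirstOrDefault(e => ((string)e.Attribute(W + "type") ?? "default") == "default");
            if (reference == null)
                return null;

            // Resolve the r:id through the main part's relationships.
            string id = (string)reference.Attribute(R + "id");
            string target = rels.Root.Elements(Pkg + "Relationship")
                .Where(rel => (string)rel.Attribute("Id") == id)
                .Select(rel => (string)rel.Attribute("Target"))
                .FirstOrDefault();
            if (target == null)
                return null;

            // Relative targets are resolved against the word/ folder; absolute ones start with '/'.
            return target.StartsWith("/") ? target.Substring(1) : "word/" + target;
        }
    }

    private static XDocument Load(ZipArchive zip, string path)
    {
        ZipArchiveEntry entry = zip.GetEntry(path)
            ?? throw new InvalidDataException("Missing part: " + path);
        using (Stream stream = entry.Open())
            return XDocument.Load(stream);
    }
}
```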

Hope this helps someone.
