Special Character Conversion When Writing XML Content

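XmlWriter includes the WriteRaw method, which lets you write out raw markup by hand. WriteRaw does not escape special characters. This is the opposite of the WriteString method, which escapes certain characters as their equivalent entity references. The characters that are escaped are defined in section 2.4, "Character Data and Markup", of the XML 1.0 recommendation and in section 3.3.3, "Attribute-Value Normalization", of the Extensible Markup Language (XML) 1.0 (Fifth Edition) recommendation. When WriteString is called while an attribute value is being written, the quotation-mark character that delimits the value (" or ') is escaped as well, as &quot; or &apos;. Character values 0x0 through 0x1F are encoded as numeric character references, except for the white-space characters 0x9, 0xA, and 0xD (tab, line feed, and carriage return).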
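The following C# sketch shows the escaping that WriteString performs in element content and inside an attribute value. It assumes a writer created with XmlWriter.Create and XmlWriterSettings, writing to a StringBuilder, rather than the XmlTextWriter used later in this article (XmlTextWriter escapes these particular characters the same way). The class and method names are illustrative only.

```csharp
using System.Text;
using System.Xml;

public static class EscapingSketch
{
    // Writes one element whose attribute value and text both contain
    // characters that WriteString must escape, and returns the markup.
    public static string WriteEscaped()
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("a");

            // Inside an attribute, the " that delimits the value becomes &quot;.
            w.WriteStartAttribute("title");
            w.WriteString("say \"hi\"");
            w.WriteEndAttribute();

            // In element content, <, & and > become &lt;, &amp; and &gt;.
            w.WriteString("1 < 2 & 3 > 2");
            w.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

The returned string is the element a with the attribute title="say &quot;hi&quot;" and the text content 1 &lt; 2 &amp; 3 &gt; 2.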

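The guideline for choosing between WriteString and WriteRaw is therefore: use WriteString when every character needs to be scanned for characters that must be written as entities, and use WriteRaw when the content should be written exactly as supplied.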
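A minimal C# sketch of that guideline, assuming an XmlWriter from XmlWriter.Create writing to a StringBuilder (the WriteParagraph helper is hypothetical): the same input is either escaped by WriteString or copied verbatim by WriteRaw. WriteRaw is only appropriate when the caller already knows the content is well-formed markup, because nothing is checked.

```csharp
using System.Text;
using System.Xml;

public static class RawOrStringSketch
{
    // Writes a p element around content, either escaping content
    // (WriteString) or copying it verbatim (WriteRaw).
    public static string WriteParagraph(string content, bool raw)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter w = XmlWriter.Create(sb, settings))
        {
            w.WriteStartElement("p");
            if (raw)
                w.WriteRaw(content);     // No scanning: the caller guarantees well-formed markup.
            else
                w.WriteString(content);  // Each character is checked and escaped as needed.
            w.WriteEndElement();
        }
        return sb.ToString();
    }
}
```

With the input string for a bold fragment (a b element containing "bold"), raw mode emits the b element as real markup inside p, while string mode emits it as escaped text (&lt;b&gt;bold&lt;/b&gt;).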

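The WriteNode method copies everything from the current node in the reader to the writer, and then advances the reader to the next sibling node for further processing. WriteNode is a fast way to extract information from one document and copy it accurately into another.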
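A minimal C# sketch of that behavior, assuming XmlReader.Create and XmlWriter.Create over in-memory strings (the class name and the CopyFirstA helper are hypothetical): it positions a reader on an element named a, copies that element and its whole subtree with WriteNode(reader, false), and then reports which node the reader landed on.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class WriteNodeSketch
{
    // Copies the first element named "a" (attributes and children included)
    // into a new document; nextNode receives the name of the node the
    // reader is positioned on after the copy.
    public static string CopyFirstA(string xml, out string nextNode)
    {
        var sb = new StringBuilder();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlReader r = XmlReader.Create(new StringReader(xml)))
        {
            r.ReadToDescendant("a");       // Position the reader on the start tag of a.
            using (XmlWriter w = XmlWriter.Create(sb, settings))
            {
                w.WriteNode(r, false);     // Copy a and its subtree, then advance the reader.
            }
            nextNode = r.LocalName;        // The sibling that follows a.
        }
        return sb.ToString();
    }
}
```

For a root element containing an a element (with an attribute and a child) followed by an empty c element, the copy contains only the a subtree, and the reader ends up on c, the next sibling.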

The following table shows the node types that the WriteNode method supports.

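| Node type | Description |
|---|---|
| Element | Writes out the element node and all of its attribute nodes. |
| Attribute | No operation. Use WriteStartAttribute or WriteAttributeString to write the attribute. |
| Text | Writes out the text node. |
| CDATA | Writes out the CDATA section node. |
| EntityReference | Writes out the entity reference node. |
| ProcessingInstruction | Writes out the PI node. |
| Comment | Writes out the comment node. |
| DocumentType | Writes out the DocType node. |
| Whitespace | Writes out the white space node. |
| SignificantWhitespace | Writes out the white space node. |
| EndElement | No operation. |
| EndEntity | No operation. |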
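The rows for comments, processing instructions, and elements can be exercised with a short C# sketch. It assumes a reader and writer created with ConformanceLevel.Fragment so that a comment and a PI may appear at the top level; the class and method names are hypothetical. Because each WriteNode call copies the current node (or element subtree) and then advances the reader itself, the loop never calls Read after the initial positioning.

```csharp
using System.IO;
using System.Text;
using System.Xml;

public static class WriteNodeTypesSketch
{
    // Copies an XML fragment one top-level node at a time with WriteNode.
    public static string CopyFragment(string fragment)
    {
        var sb = new StringBuilder();
        var readerSettings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        var writerSettings = new XmlWriterSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (XmlReader r = XmlReader.Create(new StringReader(fragment), readerSettings))
        using (XmlWriter w = XmlWriter.Create(sb, writerSettings))
        {
            r.Read();              // Move from the initial state to the first node.
            while (!r.EOF)
            {
                w.WriteNode(r, false);
            }
        }
        return sb.ToString();
    }
}
```

A fragment made of a comment, a processing instruction, and a small element is copied back unchanged.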

The following examples show the difference between the WriteString and WriteRaw methods when each is given the "<" character. This code sample uses WriteString.

```vb
' Requires Imports System.Xml.
Dim w As New XmlTextWriter(Console.Out)
w.WriteStartElement("myRoot")
w.WriteString("<")   ' Escaped as &lt;
w.WriteEndElement()
w.Flush()
```

```csharp
// Requires using System.Xml;
XmlTextWriter w = new XmlTextWriter(Console.Out);
w.WriteStartElement("myRoot");
w.WriteString("<");   // Escaped as &lt;
w.WriteEndElement();
w.Flush();
```

Output

```xml
<myRoot>&lt;</myRoot>
```

This code sample uses WriteRaw on the same kind of writer, and the output contains an illegal character as element content.

```vb
w.WriteStartElement("myRoot")
w.WriteRaw("<")   ' Written as-is, not escaped
w.WriteEndElement()
```

```csharp
w.WriteStartElement("myRoot");
w.WriteRaw("<");   // Written as-is, not escaped
w.WriteEndElement();
```

Output

```xml
<myRoot><</myRoot>
```
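Because WriteRaw performs no checking, the second output is not well-formed XML. One way to confirm this, sketched here under the assumption that the output is available as a string (the WellFormedCheck class is hypothetical), is to parse it with XmlReader, which throws XmlException at the first well-formedness error.

```csharp
using System.IO;
using System.Xml;

public static class WellFormedCheck
{
    // Returns true if the text parses as a well-formed XML document.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader r = XmlReader.Create(new StringReader(xml)))
            {
                while (r.Read()) { }   // Reading to the end surfaces any error.
            }
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }
}
```

The WriteString output parses successfully; the WriteRaw output fails because the second "<" cannot start a tag name.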

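The following example shows how to convert an XML document from element-centric form to attribute-centric form. It can also convert XML from attribute-centric form back to element-centric form. In an element-centric design, the XML document has many elements but few attributes. In an attribute-centric design, content that would be elements in an element-centric design becomes attributes of elements instead, so there are fewer elements and more attributes per element.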

The example is useful if you have already designed your XML data in one of these modes and need to convert it to the other.
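To make the two shapes concrete, here is a small in-memory sketch. Note that it swaps in LINQ to XML (System.Xml.Linq) instead of the streaming XmlTextReader/XmlTextWriter approach used by the sample application; the class and method names are hypothetical. It assumes leaf elements contain only text (no mixed content), that no two leaf children of the same parent share a name, and that it is called on an element that has child elements.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class CentricShapesSketch
{
    // Element-centric to attribute-centric: child elements that contain no
    // elements (text-only leaves) become attributes; other children recurse.
    public static XElement ToAttributeCentric(XElement e) =>
        new XElement(e.Name,
            e.Elements().Where(c => !c.HasElements).Select(c => new XAttribute(c.Name, c.Value)),
            e.Elements().Where(c => c.HasElements).Select(c => ToAttributeCentric(c)));

    // Attribute-centric to element-centric: every attribute becomes a child
    // element holding the attribute value; child elements recurse.
    public static XElement ToElementCentric(XElement e) =>
        new XElement(e.Name,
            e.Attributes().Select(a => new XElement(a.Name, a.Value)),
            e.Elements().Select(c => ToElementCentric(c)));
}
```

Applied to an Order element that contains OrderID and an OrderDetail with leaf children, ToAttributeCentric yields an Order with an OrderID attribute and an empty OrderDetail carrying the detail values as attributes; ToElementCentric turns that result back into the original shape.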

The following XML is an element-centric document. Its elements contain no attributes.

Input: centric.xml

```xml
<?xml version='1.0' encoding='UTF-8'?>
<root>
    <Customer>
        <firstname>Jerry</firstname>
        <lastname>Larson</lastname>
        <Order>
            <OrderID>Ord-12345</OrderID>
            <OrderDetail>
                <Quantity>1301</Quantity>
                <UnitPrice>$3000</UnitPrice>
                <ProductName>Computer</ProductName>
            </OrderDetail>
        </Order>
    </Customer>
</root>
```

The following sample application performs the conversion.

```vb
' This program converts an element-centric document to an
' attribute-centric document (mode -a), or an attribute-centric
' document to an element-centric document (mode -e).
Imports System
Imports System.Collections
Imports System.IO
Imports System.Xml

Class ModeConverter

   ' Records an element from the source document, and whether its
   ' start tag has been written to the output yet.
   Friend Class ElementNode
      Private _name As String
      Private _prefix As String
      Private _namespace As String
      Private _startElement As Boolean

      Friend Sub New(prefix As String, name As String, [nameSpace] As String)
         Me._name = name
         Me._prefix = prefix
         Me._namespace = [nameSpace]
         Me._startElement = False
      End Sub 'New

      Public ReadOnly Property name() As String
         Get
            Return _name
         End Get
      End Property

      Public ReadOnly Property prefix() As String
         Get
            Return _prefix
         End Get
      End Property

      Public ReadOnly Property [nameSpace]() As String
         Get
            Return _namespace
         End Get
      End Property

      Public Property startElement() As Boolean
         Get
            Return _startElement
         End Get
         Set
            _startElement = value
         End Set
      End Property
   End Class 'ElementNode

   ' Usage: ModeConverter -a|-e sourceFile targetFile
   Public Shared Sub Main(args() As String)
      Dim converter As New ModeConverter()
      If args.Length < 3 OrElse args(0) = "?" Then
         converter.Usage()
         Return
      End If
      Dim sourceFile As New FileStream(args(1), FileMode.Open, FileAccess.Read, FileShare.Read)
      ' FileMode.Create creates the target file, or overwrites it if it already exists.
      Dim targetFile As New FileStream(args(2), FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite)
      If args(0) = "-a" Then
         converter.ConvertToAttributeCentric(sourceFile, targetFile)
      Else
         converter.ConvertToElementCentric(sourceFile, targetFile)
      End If
   End Sub 'Main

   Public Sub Usage()
      Console.WriteLine("?  This help message")
      Console.WriteLine("Convert -mode sourceFile targetFile")
      Console.WriteLine(ControlChars.Tab & " mode: -e element centric")
      Console.WriteLine(ControlChars.Tab & " mode: -a attribute centric")
   End Sub 'Usage

   ' Turns every leaf element into an attribute of its parent element.
   ' Assumes each leaf element contains only text (no empty leaf elements
   ' and no mixed content), and that a parent's leaf children come before
   ' its other child elements.
   Public Sub ConvertToAttributeCentric(sourceFile As FileStream, targetFile As FileStream)
      ' The stack holds the source elements that are currently open.
      Dim openElements As New Stack()
      Dim reader As New XmlTextReader(sourceFile)
      ' Skip insignificant white space; Formatting.Indented lays out the output.
      reader.WhitespaceHandling = WhitespaceHandling.None
      reader.Read()
      Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
      writer.Formatting = Formatting.Indented

      Do
         Select Case reader.NodeType
            Case XmlNodeType.XmlDeclaration
               Dim standalone As String = reader.GetAttribute("standalone")
               writer.WriteStartDocument(standalone Is Nothing OrElse standalone = "yes")

            Case XmlNodeType.Element
               Dim element As New ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI)
               If openElements.Count = 0 Then
                  ' Always write the root element.
                  writer.WriteStartElement(element.prefix, element.name, element.nameSpace)
                  element.startElement = True
               Else
                  ' The parent has a child element, so it must remain an element.
                  Dim parent As ElementNode = CType(openElements.Peek(), ElementNode)
                  If Not parent.startElement Then
                     writer.WriteStartElement(parent.prefix, parent.name, parent.nameSpace)
                     parent.startElement = True
                  End If
               End If
               openElements.Push(element)

            Case XmlNodeType.Attribute
               Throw New Exception("We should never be here!")

            Case XmlNodeType.Text
               ' The text belongs to a leaf element. Write it as an attribute of
               ' the leaf's parent, whose start tag has already been written.
               Dim attribute As ElementNode = CType(openElements.Pop(), ElementNode)
               writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace)
               ' reader.Value is already unescaped, so use WriteString (not WriteRaw)
               ' to escape characters such as < and & again.
               writer.WriteString(reader.Value)
               writer.WriteEndAttribute()
               reader.Read() ' Skip the leaf element's EndElement.

            Case XmlNodeType.EndElement
               writer.WriteEndElement()
               openElements.Pop()

            Case XmlNodeType.CDATA
               writer.WriteCData(reader.Value)

            Case XmlNodeType.Comment
               writer.WriteComment(reader.Value)

            Case XmlNodeType.ProcessingInstruction
               writer.WriteProcessingInstruction(reader.Name, reader.Value)

            Case XmlNodeType.EntityReference
               writer.WriteEntityRef(reader.Name)

            Case XmlNodeType.Whitespace, XmlNodeType.SignificantWhitespace
               writer.WriteWhitespace(reader.Value)

            Case XmlNodeType.None
               writer.WriteRaw(reader.Value)

            Case XmlNodeType.DocumentType
               writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value)

            Case XmlNodeType.EndEntity
               ' Nothing to write.

            Case Else
               Console.WriteLine("UNKNOWN Node Type = " & CInt(reader.NodeType))
         End Select
      Loop While reader.Read()

      writer.WriteEndDocument()

      reader.Close()
      writer.Flush()
      writer.Close()
   End Sub 'ConvertToAttributeCentric

   ' Turns every attribute into a child element, and uses WriteNode to
   ' copy all other kinds of nodes unchanged.
   Public Sub ConvertToElementCentric(sourceFile As FileStream, targetFile As FileStream)
      Dim reader As New XmlTextReader(sourceFile)
      reader.WhitespaceHandling = WhitespaceHandling.None
      reader.Read()
      Dim writer As New XmlTextWriter(targetFile, reader.Encoding)
      writer.Formatting = Formatting.Indented

      While Not reader.EOF
         Select Case reader.NodeType
            Case XmlNodeType.Element
               writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
               If reader.MoveToFirstAttribute() Then
                  Do
                     ' Each attribute becomes a child element holding its value.
                     writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
                     writer.WriteString(reader.Value)
                     writer.WriteEndElement()
                  Loop While reader.MoveToNextAttribute()
                  reader.MoveToElement()
               End If
               ' An empty element such as <OrderDetail ... /> has no EndElement node.
               If reader.IsEmptyElement Then
                  writer.WriteEndElement()
               End If
               reader.Read()

            Case XmlNodeType.EndElement
               writer.WriteEndElement()
               reader.Read()

            Case XmlNodeType.Text
               Throw New Exception("The input document is not an attribute-centric document.")

            Case Else
               ' WriteNode copies the current node and advances the reader,
               ' so Read is not called here.
               writer.WriteNode(reader, False)
         End Select
      End While

      reader.Close()
      writer.Flush()
      writer.Close()
   End Sub 'ConvertToElementCentric
End Class 'ModeConverter
```
```csharp
// This program converts an element-centric document to an
// attribute-centric document (mode -a), or an attribute-centric
// document to an element-centric document (mode -e).
using System;
using System.Collections;
using System.IO;
using System.Xml;

class ModeConverter {

    // Records an element from the source document, and whether its
    // start tag has been written to the output yet.
    internal class ElementNode {
        String _name;
        String _prefix;
        String _namespace;
        bool   _startElement;

        internal ElementNode(String prefix, String name, String nameSpace) {
            this._name = name;
            this._prefix = prefix;
            this._namespace = nameSpace;
            this._startElement = false;
        }
        public String name {
            get { return _name; }
        }
        public String prefix {
            get { return _prefix; }
        }
        public String nameSpace {
            get { return _namespace; }
        }
        public bool startElement {
            get { return _startElement; }
            set { _startElement = value; }
        }
    }

    // Usage: ModeConverter -a|-e sourceFile targetFile
    public static void Main(String[] args) {
        ModeConverter modeConverter = new ModeConverter();
        if (args.Length < 3 || args[0] == "?") {
            modeConverter.Usage();
            return;
        }
        FileStream sourceFile = new FileStream(args[1], FileMode.Open, FileAccess.Read, FileShare.Read);
        // FileMode.Create creates the target file, or overwrites it if it already exists.
        FileStream targetFile = new FileStream(args[2], FileMode.Create, FileAccess.ReadWrite, FileShare.ReadWrite);
        if (args[0] == "-a") {
            modeConverter.ConvertToAttributeCentric(sourceFile, targetFile);
        } else {
            modeConverter.ConvertToElementCentric(sourceFile, targetFile);
        }
    }

    public void Usage() {
        Console.WriteLine("?  This help message");
        Console.WriteLine("Convert -mode sourceFile targetFile");
        Console.WriteLine("\t mode: -e element centric");
        Console.WriteLine("\t mode: -a attribute centric");
    }

    // Turns every leaf element into an attribute of its parent element.
    // Assumes each leaf element contains only text (no empty leaf elements
    // and no mixed content), and that a parent's leaf children come before
    // its other child elements.
    public void ConvertToAttributeCentric(FileStream sourceFile, FileStream targetFile) {
        // The stack holds the source elements that are currently open.
        Stack stack = new Stack();
        XmlTextReader reader = new XmlTextReader(sourceFile);
        // Skip insignificant white space; Formatting.Indented lays out the output.
        reader.WhitespaceHandling = WhitespaceHandling.None;
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;

        do {
            switch (reader.NodeType) {
                case XmlNodeType.XmlDeclaration:
                    String standalone = reader.GetAttribute("standalone");
                    writer.WriteStartDocument(standalone == null || standalone == "yes");
                    break;

                case XmlNodeType.Element:
                    ElementNode element = new ElementNode(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    if (0 == stack.Count) {
                        // Always write the root element.
                        writer.WriteStartElement(element.prefix, element.name, element.nameSpace);
                        element.startElement = true;
                    } else {
                        // The parent has a child element, so it must remain an element.
                        ElementNode parent = (ElementNode)stack.Peek();
                        if (!parent.startElement) {
                            writer.WriteStartElement(parent.prefix, parent.name, parent.nameSpace);
                            parent.startElement = true;
                        }
                    }
                    stack.Push(element);
                    break;

                case XmlNodeType.Attribute:
                    throw new Exception("We should never be here!");

                case XmlNodeType.Text:
                    // The text belongs to a leaf element. Write it as an attribute of
                    // the leaf's parent, whose start tag has already been written.
                    ElementNode attribute = (ElementNode)stack.Pop();
                    writer.WriteStartAttribute(attribute.prefix, attribute.name, attribute.nameSpace);
                    // reader.Value is already unescaped, so use WriteString (not WriteRaw)
                    // to escape characters such as < and & again.
                    writer.WriteString(reader.Value);
                    writer.WriteEndAttribute();
                    reader.Read(); // Skip the leaf element's EndElement.
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    stack.Pop();
                    break;

                case XmlNodeType.CDATA:
                    writer.WriteCData(reader.Value);
                    break;

                case XmlNodeType.Comment:
                    writer.WriteComment(reader.Value);
                    break;

                case XmlNodeType.ProcessingInstruction:
                    writer.WriteProcessingInstruction(reader.Name, reader.Value);
                    break;

                case XmlNodeType.EntityReference:
                    writer.WriteEntityRef(reader.Name);
                    break;

                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    writer.WriteWhitespace(reader.Value);
                    break;

                case XmlNodeType.None:
                    writer.WriteRaw(reader.Value);
                    break;

                case XmlNodeType.DocumentType:
                    writer.WriteDocType(reader.Name, reader.GetAttribute("PUBLIC"), reader.GetAttribute("SYSTEM"), reader.Value);
                    break;

                case XmlNodeType.EndEntity:
                    // Nothing to write.
                    break;

                default:
                    Console.WriteLine("UNKNOWN Node Type = " + ((int)reader.NodeType));
                    break;
            }
        } while (reader.Read());

        writer.WriteEndDocument();

        reader.Close();
        writer.Flush();
        writer.Close();
    }

    // Turns every attribute into a child element, and uses WriteNode to
    // copy all other kinds of nodes unchanged.
    public void ConvertToElementCentric(FileStream sourceFile, FileStream targetFile) {
        XmlTextReader reader = new XmlTextReader(sourceFile);
        reader.WhitespaceHandling = WhitespaceHandling.None;
        reader.Read();
        XmlTextWriter writer = new XmlTextWriter(targetFile, reader.Encoding);
        writer.Formatting = Formatting.Indented;

        while (!reader.EOF) {
            switch (reader.NodeType) {
                case XmlNodeType.Element:
                    writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                    if (reader.MoveToFirstAttribute()) {
                        do {
                            // Each attribute becomes a child element holding its value.
                            writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                            writer.WriteString(reader.Value);
                            writer.WriteEndElement();
                        } while (reader.MoveToNextAttribute());
                        reader.MoveToElement();
                    }
                    // An empty element such as <OrderDetail ... /> has no EndElement node.
                    if (reader.IsEmptyElement) {
                        writer.WriteEndElement();
                    }
                    reader.Read();
                    break;

                case XmlNodeType.EndElement:
                    writer.WriteEndElement();
                    reader.Read();
                    break;

                case XmlNodeType.Text:
                    throw new Exception("The input document is not an attribute-centric document.");

                default:
                    // WriteNode copies the current node and advances the reader,
                    // so Read is not called here.
                    writer.WriteNode(reader, false);
                    break;
            }
        }

        reader.Close();
        writer.Flush();
        writer.Close();
    }
}
```

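After compiling the code, run it from the command line by typing `<compiled name> -a centric.xml <output file name>`. The output file does not have to exist beforehand: the program opens it with FileMode.Create, which creates the file or overwrites an existing one.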

The output below assumes that the C# program was compiled to centric_cs and run with the command line `C:\centric_cs -a centric.xml centric_out.xml`.

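The -a mode tells the application to convert the input XML to attribute-centric form, and the -e mode converts it to element-centric form. The following output is the new attribute-centric document generated with the -a mode. The elements now carry attributes instead of nested elements.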

Output: centric_out.xml

```xml
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<root>
  <Customer firstname="Jerry" lastname="Larson">
    <Order OrderID="Ord-12345">
      <OrderDetail Quantity="1301" UnitPrice="$3000" ProductName="Computer" />
    </Order>
  </Customer>
</root>
```

See Also

Reference

XmlTextWriter

XmlWriter

Concepts

Creating Well-Formed XML with the XmlTextWriter

XML Output Formatting with XmlTextWriter

Namespace Features within the XmlTextWriter