Avoid Using "<![CDATA[ ]]>" in RSS

Avoid using "<![CDATA[ ]]>" in RSS | WaspDev Blog

<![CDATA[ ]]> is very commonly used in RSS (and Atom) feeds to escape XML special characters. At first glance, it looks very convenient. You simply add <![CDATA[ ]]> blocks and write (almost) any content inside them without worrying about escaping characters:

    <item>
      <title><![CDATA[ in Titles]]></title>
      <link>http://example.com</link>
      <description><![CDATA[
        This description contains HTML markup.
        It allows us to use characters like "&" and brackets directly.
      ]]></description>
    </item>

Why not CDATA?

CDATA seems perfect, doesn't it? Except that a CDATA block cannot contain the sequence ]]> (the one that ends the block), and there is no way to escape it inside a single block. To encode it, you have to split the content across multiple CDATA blocks:

    <text><![CDATA[hello ]]]]><![CDATA[> world]]></text>

The encoded text is "hello ]]> world". As you can see, the XML code is less readable now. CDATA loses most of its simplicity advantage.
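The splitting rule is mechanical, so a serializer can apply it automatically. Here is a minimal sketch, assuming a hypothetical helper named toCdata (it is not taken from any library): every ]]> in the text is rewritten so that its first two characters close one block and the > opens the next.

```javascript
// Hypothetical helper: wraps text in CDATA, splitting wherever "]]>" occurs.
function toCdata(text) {
  // "]]>" becomes "]]" + "]]>" (close block) + "<![CDATA[" (reopen) + ">".
  return "<![CDATA[" + text.replaceAll("]]>", "]]]]><![CDATA[>") + "]]>";
}

console.log(toCdata("hello ]]> world"));
// <![CDATA[hello ]]]]><![CDATA[> world]]>
```

Even in this small form, the helper needs a rule that ordinary escaping never has to think about.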

Even though splitting makes the encoding of ]]> possible, I would say it's still not worth using CDATA:

It adds a special edge case for ]]>, which the serializer must handle.

It can mislead people into thinking the content is raw HTML or somehow safer. No, it is not.

It makes output less uniform, because sometimes you need split CDATA blocks.

It does not change the parsed value. XML parsers expose the same text either way.

It can make debugging confusing, especially if the content itself discusses CDATA, as this article's title does... Just look at the RSS feed of this blog.
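The point about the parsed value can be illustrated with a deliberately simplified decoder. This is only a sketch and not a real XML parser: the names decodeTextContent and unescapeEntities are made up for this example, and it understands nothing except CDATA sections and the five predefined entities.

```javascript
// Hypothetical, simplified decoder for the text content of one XML element.
// It only handles CDATA sections and the five predefined entities.
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

function unescapeEntities(text) {
  return text.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => ENTITIES[name]);
}

function decodeTextContent(raw) {
  const OPEN = "<![CDATA[";
  let out = "";
  let pos = 0;
  while (pos < raw.length) {
    const start = raw.indexOf(OPEN, pos);
    const end = start === -1 ? -1 : raw.indexOf("]]>", start + OPEN.length);
    if (start === -1 || end === -1) {
      // No further complete CDATA section: the rest is escaped text.
      out += unescapeEntities(raw.slice(pos));
      break;
    }
    out += unescapeEntities(raw.slice(pos, start));
    out += raw.slice(start + OPEN.length, end); // CDATA content is literal
    pos = end + 3;
  }
  return out;
}

const viaCdata = decodeTextContent("<![CDATA[hello ]]]]><![CDATA[> world]]>");
const viaEscape = decodeTextContent("hello ]]&gt; world");
console.log(viaCdata === viaEscape, JSON.stringify(viaCdata));
// true "hello ]]> world"
```

Both encodings end up as the same string, so a feed consumer has no way to tell which one the generator used.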

What to do instead?

Just escape these characters (works for HTML too):

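    function xmlEscape(text) {
      // Escape "&" first so the ampersands added by later steps are not escaped again.
      return text
        .replaceAll("&", "&amp;")
        .replaceAll("<", "&lt;")
        .replaceAll(">", "&gt;")
        .replaceAll('"', "&quot;")
        .replaceAll("'", "&apos;");
    }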

Normal escaping is simpler and more uniform.
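As a usage sketch, here is how the escaping function might be applied when rendering a feed item. The renderItem helper and its field names are assumptions made for this example. The escaping function is repeated so the snippet runs on its own.

```javascript
// The article's escaping function, repeated so this snippet is self-contained.
function xmlEscape(text) {
  return text
    .replaceAll("&", "&amp;")
    .replaceAll("<", "&lt;")
    .replaceAll(">", "&gt;")
    .replaceAll('"', "&quot;")
    .replaceAll("'", "&apos;");
}

// Hypothetical item renderer; the field names are assumptions.
function renderItem({ title, link, description }) {
  return [
    "<item>",
    `  <title>${xmlEscape(title)}</title>`,
    `  <link>${xmlEscape(link)}</link>`,
    `  <description>${xmlEscape(description)}</description>`,
    "</item>",
  ].join("\n");
}

console.log(renderItem({
  title: "CDATA & ]]> in Titles",
  link: "http://example.com",
  description: "<p>Tom & Jerry</p>",
}));
// <item>
//   <title>CDATA &amp; ]]&gt; in Titles</title>
//   <link>http://example.com</link>
//   <description>&lt;p&gt;Tom &amp; Jerry&lt;/p&gt;</description>
// </item>
```

Every field goes through the same function, and the ]]> in the title needs no special treatment.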

OK, but some people might say that CDATA makes the RSS content smaller on average, since characters inside it need no escaping (the escaped forms take more characters) and ]]> is rarely encountered. Fair point, however:

Feeds are usually gzip-compressed. Repeated strings like &lt;, &gt;, and &amp; compress very well.

RSS feed size is rarely the bottleneck. Images, HTML pages, CSS, JS, and network latency usually matter much more.

CDATA has a special edge case. You still need to handle ]]> correctly.

Normal escaping is simpler and more uniform. One escaping path works for titles, descriptions, Atom, RSS, attributes, metadata, etc.
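Whether the size difference matters for a particular feed is easy to measure. The sketch below assumes Node.js (it uses the built-in zlib module) and made-up sample content. The helper names are hypothetical, and the numbers it prints depend entirely on the input, so none are shown here.

```javascript
const zlib = require("zlib");

// Escapes the characters that need escaping in element content.
const escapeXml = (s) =>
  s.replaceAll("&", "&amp;").replaceAll("<", "&lt;").replaceAll(">", "&gt;");

// Wraps text in CDATA, splitting on "]]>" as described earlier.
const wrapCdata = (s) =>
  "<![CDATA[" + s.replaceAll("]]>", "]]]]><![CDATA[>") + "]]>";

// Made-up sample content: 200 short HTML descriptions.
const bodies = Array.from({ length: 200 }, (_, i) =>
  `<p>Post <b>#${i}</b>: Tom & Jerry <a href="http://example.com/${i}">more</a></p>`);

// Builds a feed fragment with the given encoding function.
const feed = (encode) =>
  bodies.map((b) => `<description>${encode(b)}</description>`).join("\n");

// Size in bytes after gzip compression.
const gzipSize = (text) => zlib.gzipSync(Buffer.from(text, "utf8")).length;

for (const [name, encode] of [["escaped", escapeXml], ["cdata", wrapCdata]]) {
  const text = feed(encode);
  console.log(name, "raw:", Buffer.byteLength(text), "gzip:", gzipSize(text));
}
```

Running it against a real feed, rather than this synthetic sample, is the only way to know how large the gap is in practice.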

Conclusion

Those are the reasons I would avoid CDATA. They apply especially if you are implementing your own RSS / Atom feed generator. Many libraries, frameworks, and CMSs still generate CDATA for RSS / Atom feeds, and each handles the ]]> sequence in its own way. They are perfectly fine to use if you have to rely on them. CDATA is common because it is convenient for legacy feed generators and visually cleaner for embedded HTML. But for new code, ordinary XML escaping is usually cleaner and more uniform.

See you later.
