
> XML also gives us SVG, which is lovely.

That implies that SVG needed XML. SVG just needed a structured data format. It could just as easily have used JSON or Protocol Buffers and it would still be SVG.

> XML is easy to parse

More like it seems easy to parse. Plenty of people think they are parsing XML but their ad hoc "parsers" know nothing about CDATA, DTDs, external entities, processing instructions, or comments.
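To make that concrete, here is a small sketch in Python. The sample document and the `naive_body` helper are made up for illustration; the regex stands in for the kind of ad hoc "parser" described above, and `xml.etree.ElementTree` stands in for a real one.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document: a commented-out <body>, then the real one in CDATA.
doc = (
    "<note>"
    "<!-- <body>old</body> -->"
    "<body><![CDATA[1 < 2 & 3]]></body>"
    "</note>"
)

def naive_body(xml_text):
    """Ad hoc 'parser': grab whatever sits between the first <body> tags."""
    match = re.search(r"<body>(.*?)</body>", xml_text)
    return match.group(1) if match else None

# The regex knows nothing about comments, so it returns the commented-out text.
naive = naive_body(doc)

# A real parser skips the comment and unwraps the CDATA section.
real = ET.fromstring(doc).find("body").text

print(repr(naive))  # 'old'
print(repr(real))   # '1 < 2 & 3'
```

The regex version does not fail loudly; it just returns the wrong answer, which is arguably worse.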

> easy to edit

...except for gotchas like the fact that you need to entity-escape any ampersands in attributes (like "href").
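A quick sketch of that gotcha, using Python's standard library. The URL is a made-up example; `quoteattr` from `xml.sax.saxutils` is one way to get the escaping right, not the only one.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical URL with a bare ampersand in the query string.
url = "https://example.com/search?q=xml&lang=en"

# Pasting the URL in by hand yields a document that is not well-formed.
bad = '<a href="%s">link</a>' % url
try:
    ET.fromstring(bad)
    bad_parsed = True
except ET.ParseError:
    bad_parsed = False

# quoteattr() adds the surrounding quotes and escapes & as &amp;.
good = "<a href=%s>link</a>" % quoteattr(url)
href = ET.fromstring(good).get("href")

print(bad_parsed)   # False
print(good)         # <a href="https://example.com/search?q=xml&amp;lang=en">link</a>
print(href == url)  # True
```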

> and easy to emit

Harder than it sounds. A single error renders the whole document invalid, and when you're compositing information from different data sources, it's easy to make mistakes: https://web.archive.org/web/20080701064734/http://diveintoma...

> The 1.0 specification http://www.w3.org/TR/REC-xml/ is not very long.

Sure, but combined with the other specs which are assumed to be part of a modern XML stack (namespaces at least, and often XML Schema, XSLT, etc.), you have grown a pretty complicated mess that isn't a great match for what it's often used for.



> It could just as easily have used JSON or Protocol Buffers and it would still be SVG.

Definitely not protocol buffers. XML gives us some really nice bits of SVG, like the ability to put attributes and tags in namespaces, so you can use Inkscape to edit your SVG file, store a bunch of Inkscape-specific data in the SVG file, and not have other editors puke. SVG isn't just data interchange, it's a document edited by humans. JSON doesn't accommodate that very well.
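A rough illustration of the namespace point, in Python. The tiny SVG document and the `inkscape:label` value are invented for the example; the consumer here simply skips any attribute that lives in a namespace, which is one simple way an editor can leave foreign data alone.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
INK_NS = "http://www.inkscape.org/namespaces/inkscape"

# Hypothetical SVG with one editor-specific attribute on the <rect>.
doc = (
    f'<svg xmlns="{SVG_NS}" xmlns:inkscape="{INK_NS}">'
    '<rect width="10" height="5" inkscape:label="background"/>'
    "</svg>"
)

root = ET.fromstring(doc)
rect = root.find(f"{{{SVG_NS}}}rect")

# ElementTree stores namespaced attributes as "{uri}name", so a consumer
# that only cares about plain SVG attributes can filter on the brace.
plain = {k: v for k, v in rect.attrib.items() if not k.startswith("{")}

# A consumer that does understand the extension asks for it by namespace.
label = rect.get(f"{{{INK_NS}}}label")

print(plain)  # {'width': '10', 'height': '5'}
print(label)  # background
```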

> More like it seems easy to parse. Plenty of people think they are parsing XML but their ad hoc "parsers" know nothing about CDATA, DTDs, external entities, processing instructions, or comments.

I wouldn't use an ad-hoc parser for JSON either. The problem here isn't XML, the problem is thinking that you can solve your problem with regular expressions. Ignoring or throwing errors for DTDs, external entities, and PIs is reasonable behavior most of the time, and most parsers can be set to strip comments and erase the differences between entities/CDATA/text. This behavior is good enough for 99% of the use cases.

> ...except for gotchas like the fact that you need to entity-escape any ampersands in attributes (like "href").

Escape sequences in XML are better than JSON, at least. Try escaping a character from the astral plane in JSON, you have to encode it in UTF-16 and then encode each item in the surrogate pair as a separate escape sequence. This is insane. (Yes, sometimes you want to transmit JSON or XML in 7-bit).

Escape sequences are a natural part of any text format, other than plain text.

> Sure, but combined with the other specs which are assumed to be part of a modern XML stack

Most of the modern XML stack is a mistake, a symptom of the years when people thought XML was the coolest thing ever. XSLT is the worst mistake of all. That doesn't mean that you have to use it. Most people don't.

Let's not fall into the trap of thinking that we should use JSON for everything, just like so many fell into the trap of thinking that they should use XML for everything. Both have their use cases.


> Definitely not protocol buffers.

You know Protocol Buffers has a text format, right?

> I wouldn't use an ad-hoc parser for JSON either. The problem here isn't XML, the problem is thinking that you can solve your problem with regular expressions.

You are drawing a false equivalence. No matter how you slice it, JSON is far, far simpler to (correctly) parse than XML.

> Escape sequences in XML are better than JSON, at least.

The best case you have for this is that you need to encode high-Unicode characters over a non-8-bit-clean channel? This is a very fringe use case. And your argument is that "&#x10E6D;" is way better than \uD803\uDE6D? Neither of those looks particularly user-friendly to me.
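For anyone who wants to see the two escapes side by side, here is a sketch with Python's standard library, using the same code point (U+10E6D). Note that this is just what these two modules happen to emit for ASCII-only output: `json` writes lowercase hex, and ElementTree writes a decimal character reference rather than the hex form quoted above.

```python
import json
import xml.etree.ElementTree as ET

ch = "\U00010E6D"  # a single code point outside the Basic Multilingual Plane

# json.dumps defaults to ASCII-only output, so the character becomes
# two \u escapes, one per UTF-16 surrogate.
json_text = json.dumps(ch)

# ET.tostring defaults to us-ascii, so the character becomes one
# numeric character reference.
elem = ET.Element("c")
elem.text = ch
xml_bytes = ET.tostring(elem)

# Both forms decode back to the same single character.
json_back = json.loads(json_text)
xml_back = ET.fromstring(xml_bytes).text

print(json_text)  # "\ud803\ude6d"
print(xml_bytes)  # contains &#69229; (decimal for 0x10E6D)
print(json_back == ch == xml_back)  # True
```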

> Let's not fall into the trap of thinking that we should use JSON for everything, just like so many fell into the trap of thinking that they should use XML for everything. Both have their use cases.

The point is not that JSON is best for everything, the point is that XML is best for almost nothing.


> Definitely not protocol buffers.

I fail to see why protocol buffers couldn't have worked for SVG. They are expressly defined so that you can have extensible types without breaking parsers...

> I wouldn't use an ad-hoc parser for JSON either. The problem here isn't XML

...but you can find plenty of "parsers" for both that don't actually parse either fully, correctly, or securely.

> Try escaping a character from the astral plane in JSON, you have to encode it in UTF-16 and then encode each item in the surrogate pair as a separate escape sequence.

One of many reasons to be annoyed by those who think JSON is a perfectly good data format.

> Escape sequences are a natural part of any text format, other than plain text.

Another reason to loathe them. ;-)

That said, you can avoid escape sequences in text formats, so long as your parse rules are length delimited, rather than based on reserved characters.
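As a minimal sketch of what I mean, here is a netstring-style encoding in Python (`<length>:<bytes>,`). The `encode` and `decode` names are just for this example. Because the parser reads a byte count instead of scanning for a terminator, the payload can contain colons, commas, quotes, or angle brackets with no escaping at all.

```python
def encode(fields):
    """Encode a list of strings as length-prefixed records: b'<len>:<bytes>,'."""
    out = bytearray()
    for field in fields:
        data = field.encode("utf-8")
        out += str(len(data)).encode("ascii") + b":" + data + b","
    return bytes(out)

def decode(buf):
    """Decode records produced by encode(); never inspects payload bytes."""
    fields = []
    i = 0
    while i < len(buf):
        colon = buf.index(b":", i)      # end of the length prefix
        length = int(buf[i:colon])
        start = colon + 1
        end = start + length
        if buf[end:end + 1] != b",":    # sanity check on the record trailer
            raise ValueError("malformed record")
        fields.append(buf[start:end].decode("utf-8"))
        i = end + 1
    return fields

original = ["a,b:c", '"<&>"']
wire = encode(original)
print(wire)                      # b'5:a,b:c,5:"<&>",'
print(decode(wire) == original)  # True
```

The trade-off is that the result is much less pleasant to edit by hand, since every change to a payload means updating its length.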

> Most of the modern XML stack is a mistake, a symptom of the years when people thought XML was the coolest thing ever. XSLT is the worst mistake of all. That doesn't mean that you have to use it. Most people don't.

Agreed, but generally if you aren't supporting it, you aren't using XML.

> Let's not fall into the trap of thinking that we should use JSON for everything, just like so many fell into the trap of thinking that they should use XML for everything. Both have their use cases.

Yes, though I'd argue they are primarily for causing problems.


Actually, JSON is such a simple format that anybody with a bachelor's degree in computer science should be able to write a parser as correct as the standard implementations (which often ignore both the rule that the root element must be an object and the rule against multiple entries with the same identifier).


You're actually wrong.

(1) "A JSON text ... conforms to the JSON value grammar."

(2) The entire standard for the object grammar is, "An object structure is represented as a pair of curly bracket tokens surrounding zero or more name/value pairs. A name is a string. A single colon token follows each name, separating the name from the value. A single comma token separates a value from a following name."

In other words, people SHOULD ignore both of those, because ignoring both of those is a part of the ECMA-404 JSON standard. (The latter actually wasn't even a part of RFC 4627; the operative word there was SHOULD, not MUST.)
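Here is how that plays out in one common implementation, Python's `json` module, as a sketch. It accepts a bare value at the root and keeps the last of several duplicate names. If you want the stricter behavior, `object_pairs_hook` lets you add it yourself; `reject_duplicates` below is a hypothetical helper, not part of the library.

```python
import json

# A bare value at the root is accepted.
root_value = json.loads("42")

# Duplicate names are accepted too; the last one wins.
last_wins = json.loads('{"a": 1, "a": 2}')

def reject_duplicates(pairs):
    """Hypothetical strict hook: refuse objects that repeat a name."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen

try:
    json.loads('{"a": 1, "a": 2}', object_pairs_hook=reject_duplicates)
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(root_value)       # 42
print(last_wins)        # {'a': 2}
print(strict_rejected)  # True
```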


It's simple, but it's limited at the same time. And, as with using an inappropriate data structure, this may lead to issues.


Yes, not allowing comments, for example. How could anyone design a format without such a basic feature?


It all boils down to JSON's ultimate lack of extensibility. This is a significant downside, but, on the other hand, it's also a strong feature of JSON.



