XML Stack Overflow: Why does the js-xlsx project parse XML using Regex?

A highly rated SO answer says it's absolutely killin' to parse XML with Regex, and yet js-xlsx, the latest XLSX I/O library on GitHub, uses Regex, yes, Regex, to parse the Office XML formats. Why? Is it more robust than a browser's implementation? Is it faster? Is it only done for cross-browser compatibility?

For example, at line 2197 of xlsx.js:


t[0].match(tagregex).forEach(function(x) {
    var y = parsexmltag(x);
    switch(y[0]) {
        case '<fills': case '<fills>': case '</fills>': break;

        /* 18.8.20 fill CT_Fill */
        case '<fill>': break;
        case '</fill>': styles.Fills.push(fill); fill = {}; break;

        /* 18.8.32 patternFill CT_PatternFill */
        case '<patternFill':
            if(y.patternType) fill.patternType = y.patternType;
            break;
        case '<patternFill/>': case '</patternFill>': break;

        /* 18.8.3 bgColor CT_Color */
        case '<bgColor':
            if(!fill.bgColor) fill.bgColor = {};
            if(y.indexed) fill.bgColor.indexed = parseInt(y.indexed, 10);
            if(y.theme) fill.bgColor.theme = parseInt(y.theme, 10);
            if(y.tint) fill.bgColor.tint = parseFloat(y.tint);
            /* Excel uses ARGB strings */
            if(y.rgb) fill.bgColor.rgb = y.rgb.substring(y.rgb.length - 6);
            break;
        case '<bgColor/>': case '</bgColor>': break;

Absolutely speechless. What is going on here?
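
For context, the pattern in the excerpt can be reduced to a small self-contained sketch. In it the regex only splits the input into tags, and the switch acts as a simple state machine over them. The definitions of tagregex and parsexmltag below are hypothetical stand-ins written for illustration (the real ones in xlsx.js may differ), and parseFills and sample are made-up names. This naive tokenizer would also break on comments, CDATA sections, or a '>' inside an attribute value.

```javascript
// Hypothetical stand-ins for the library's tagregex and parsexmltag;
// these are assumptions for illustration, not the code from xlsx.js.
var tagregex = /<[^>]*>/g;
var attregex = /([\w:]+)="([^"]*)"/g;

// Return an object whose [0] is the first whitespace-delimited chunk of the
// tag (e.g. '<bgColor' or '</fill>') and whose other keys are attributes.
function parsexmltag(tag) {
    var z = {};
    z[0] = tag.split(/\s+/)[0];
    var m;
    attregex.lastIndex = 0; // reset the shared global regex before reuse
    while ((m = attregex.exec(tag)) !== null) z[m[1]] = m[2];
    return z;
}

// Walk the tags in document order and collect one object per <fill>.
function parseFills(xml) {
    var fills = [], fill = {};
    (xml.match(tagregex) || []).forEach(function (x) {
        var y = parsexmltag(x);
        switch (y[0]) {
            case '</fill>': fills.push(fill); fill = {}; break;
            case '<patternFill':
                if (y.patternType) fill.patternType = y.patternType;
                break;
            case '<bgColor':
                // Keep the last six hex digits of an ARGB string
                if (y.rgb) fill.bgColor = { rgb: y.rgb.substring(y.rgb.length - 6) };
                break;
        }
    });
    return fills;
}

var sample = '<fills count="1"><fill><patternFill patternType="solid">' +
    '<bgColor rgb="FFFF0000"/></patternFill></fill></fills>';
console.log(JSON.stringify(parseFills(sample)));
```

So the regex here is used as a tokenizer for a known, machine-generated subset of XML rather than as a general XML parser, which may be part of what the question is really asking about.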

Saturday, 31 January 2015
