An XML Schema (XSD) Definition to Prevent Leading Zeros in Integers
The XML Schema specification[1] provides several handy data types. For example, `xs:positiveInteger`[2] represents “the standard mathematical concept of the positive integer numbers.”
There’s a hidden gotcha in positiveInteger. It allows an anathema. Leading zeros.
For example, given the definition:
```xml
<xs:element name="node">
  <xs:complexType>
    <xs:attribute name="number" type="xs:positiveInteger"/>
  </xs:complexType>
</xs:element>
```
These are all valid[3]:
```xml
<node number="1"/>
<node number="100"/>
<node number="8675309"/>
<node number="007"/>
```
That last one can cause all kinds of havoc.
When leading zeros are involved in data feeds, they have to be treated either as a string (to maintain the zeros) or converted into an actual integer. Given a system of any size or longevity, the likelihood of different processes making opposing choices approaches 100%. Subtle, super-annoying bugs are born. Ones that take a surprisingly large amount of time to fix[4].
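To make the string-versus-integer split concrete, here is a small Python sketch. The dictionary and variable names are hypothetical stand-ins for two processes reading the same feed: one keeps the ID as text, the other parses it as an integer, and a lookup between them quietly misses.

```python
# Hypothetical process A kept IDs as strings to preserve the zeros.
records_by_string_id = {"007": "Bond"}

# Hypothetical process B parsed the same ID as an actual integer.
parsed_id = int("007")       # 7: the leading zeros are gone

# When B turns its integer back into a key, the lookup fails.
lookup_key = str(parsed_id)  # "7", not "007"
print(lookup_key in records_by_string_id)  # False
```

Nothing errors out here, which is exactly why these bugs take so long to track down.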
Thankfully, XML Schema is robust enough that we can define data types that prohibit leading zeros. For example:
```xml
<xs:simpleType name="_positive_integer_without_leading_zeros">
  <xs:restriction base="xs:positiveInteger">
    <!-- First digit 1-9, then any number of digits (including zero) -->
    <xs:pattern value="[1-9]\d*"/>
  </xs:restriction>
</xs:simpleType>

<xs:element name="node">
  <xs:complexType>
    <xs:attribute name="number" type="_positive_integer_without_leading_zeros"/>
  </xs:complexType>
</xs:element>
```
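One detail worth knowing: XML Schema patterns are implicitly anchored, so the whole attribute value has to match, not just part of it. The Python sketch below approximates that with `re.fullmatch` and contrasts it with an unanchored `re.search`, which happily finds the “7” inside “007”. Python's `re` dialect is not identical to XSD's regular expression language, so treat this as an approximation that holds for simple patterns like this one.

```python
import re

# The restriction from the schema above.
PATTERN = r"[1-9]\d*"

value = "007"

# XSD-style check: the entire value must match the pattern.
anchored = re.fullmatch(PATTERN, value) is not None

# Unanchored search: any substring matching is enough.
unanchored = re.search(PATTERN, value) is not None

print(anchored)    # False
print(unanchored)  # True, because "7" matches on its own
```

That implicit anchoring is why the schema pattern needs no `^` or `$`.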
This works by using a Regular Expression[5] to enforce the data format. It’s a little easier to understand by breaking the pattern’s value into two parts. First:

`[1-9]`

Anything inside square brackets identifies possible values for a single character. So, `[1-9]` at the start of the pattern value means the first character must be one of: 1, 2, 3, 4, 5, 6, 7, 8, or 9. The lack of zero means anything starting with a “0” won’t match and will therefore be rejected as invalid.
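A quick way to see the character class in isolation is to test it against every single digit. This Python sketch uses `re.fullmatch` as a stand-in for the schema validator.

```python
import re

# [1-9] describes exactly one character: any digit except zero.
first_char_class = re.compile(r"[1-9]")

for ch in "0123456789":
    accepted = first_char_class.fullmatch(ch) is not None
    print(ch, accepted)  # "0 False", then True for 1 through 9
```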
The second part of the pattern is:

`\d*`

The `\d` (without the `*`) tells the pattern matcher to look for any single digit. If it were by itself, it would mean there always has to be a second character and that character must be a digit. The `*` modifies the `\d` to allow “zero or more” digits.

If the data being matched is a single character, the `\d*` has no real effect. If there are two or more characters, it enforces the restriction that every character from the second until the end must be a digit. Unlike the earlier `[1-9]`, the `\d` pattern includes all possible digits, including zero.
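The “zero or more” behavior is easy to check on its own. This Python sketch again uses `re.fullmatch` to approximate whole-value matching; the sample strings are arbitrary.

```python
import re

# \d* means "zero or more digits"; zero itself is allowed here.
rest = re.compile(r"\d*")

samples = ["", "0", "00", "8675309", "12a"]
for s in samples:
    # Only "12a" fails, because "a" is not a digit.
    print(repr(s), rest.fullmatch(s) is not None)
```

In both Python 3 and XML Schema, `\d` matches any Unicode decimal digit, not just 0 through 9. For this schema that doesn't matter, since the `xs:positiveInteger` base type already limits the lexical form to ASCII digits.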
Combining the two parts into `[1-9]\d*` produces the desired behavior. These actual integers all pass the validation test:
```xml
<node number="1"/>
<node number="100"/>
<node number="8675309"/>
```
But this one’s accurately rejected as invalid:

`<node number="007"/>`
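Here is the same check run over the example values, as a Python sketch. `is_valid_number` is a hypothetical helper that approximates the schema's pattern facet with `re.fullmatch`; it is not a replacement for real schema validation.

```python
import re

# Approximation of the schema's pattern facet; fullmatch mimics
# XSD's implicit anchoring of the whole value.
NO_LEADING_ZEROS = re.compile(r"[1-9]\d*")

def is_valid_number(value: str) -> bool:
    """Return True if value is a positive integer with no leading zeros."""
    return NO_LEADING_ZEROS.fullmatch(value) is not None

for value in ["1", "100", "8675309", "007"]:
    print(value, is_valid_number(value))  # only "007" prints False
```

As a side effect, a bare “0” is also rejected, which is fine since it isn't a positive integer anyway.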
This little snippet is now safe from those pesky little leading zeros sneaking in.
Count yourself lucky if you’ve never had to deal with leading zeros. If you want to avoid them in future XML, use this type of custom data type instead of `xs:positiveInteger`.
[1] This technique works equally well in XML Schema 1.0 and 1.1.
[2] The `xs:positiveInteger` data type allows “+” at the start of the number (e.g. “+8675309”). The definition above doesn’t. I built it to deal with unique IDs from a database, none of which contain the “+”. Changing the pattern value to `\+?[1-9]\d*` would accommodate the plus if you need it. Other variations are left as exercises for the reader.
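The plus-sign variation can be sanity-checked the same way. This Python sketch assumes the `\+?[1-9]\d*` pattern and again uses `re.fullmatch` as a rough stand-in for the validator; the plus must be escaped because a bare `+` is a quantifier.

```python
import re

# An optional leading "+" followed by the same no-leading-zero digits.
WITH_OPTIONAL_PLUS = re.compile(r"\+?[1-9]\d*")

for value in ["8675309", "+8675309", "+007", "++1"]:
    # "+007" still fails on the zero; "++1" fails on the second plus.
    print(value, WITH_OPTIONAL_PLUS.fullmatch(value) is not None)
```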
[3] While not all validators find the same things, the “007” string was a valid `xs:positiveInteger` value in Saxon-EE 126.96.36.199, LIBXML, and Xerces running in oXygen XML Editor. Speaking of which, if you do any XML work at all and don’t know about oXygen, you should check it out. It’s expensive but totally worth it.
[4] This doesn’t even begin to get into what happens when everyone agrees that strings are the way to go but you run out of numbers and need to add a new zero to the front.
[5] Regular Expressions - a sequence of characters that defines a search pattern - the heart and soul of text processing for an old Perl coder like me.