Date Format Validation
-
What do you suppose is the best way to determine whether a string is a valid date-time in a prescribed format (but a format with a number of options) in Java, without using JodaTime or Java 8? I can't have those things (and that's probably TRWTF, but it's also beside the point).
The format in question is the XSD specification of a datetime: [-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]. I.e., in addition to the required date and time separated by a T, it allows but doesn't demand any of: a leading minus, up to 5 digits of fractional seconds, and either of two formats for timezones.
The context is a datetime field in a request to a SOAP webservice built around Axis2. It should be noted that I don't actually need to parse this field, I just need to be sure it's there and valid.
Method 1 (rejected): Axis2's ConverterUtil.convertToDateTime(String). Default in the generated Axis2 code. This pretends to ensure that the format is valid (throwing Exceptions resulting in SOAP faults if it isn't) before parsing it to a Calendar, but in reality will happily pretend to parse something like 2017-13-43T00:00:00, while actually leaving the field null. For raisins, this blows up on me later on in such a way that I can't handle it.
Method 2: leave it as a String and validate with a big ugly regex. Used in most equivalent places in our codebase. The current regex cheerfully allows such dates as 2017-13-43T25:63:99, but refuses the leading - or fractional seconds unless the timezone is Z.
I could improve on it so that it's actually right and a bit less clumsily written, but it would inevitably be big and ugly.
Method 3: try to parse the String with SimpleDateFormat, then use a combination of String.replace(), length checking, and a smaller and less ugly regex to verify two things. First, that the year has 4 digits, because I can't get SimpleDateFormat to reject a shorter year even in strict mode. Second, that anything after the seconds is valid (fractional seconds and/or timezone), since it allows a valid datestring to be suffixed with any crap.
SimpleDateFormat is also kind of slow, but not unacceptably slow for my purposes. I can't use FastDateFormat for the same reason I can't use JodaTime.
I'm currently using method 3 and it works, but I don't like it. It seemed sensible enough to try to parse the String, but all the surrounding crud needed because SimpleDateFormat sucks is quite ugly. Can anyone suggest a) a better way, or b) whether 2 or 3 is better?
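For concreteness, method 3 could be sketched roughly as below. This is not the actual code in use, just one way to combine the pieces: a regex fixes the overall shape (optional minus, four-digit year, optional fraction, optional timezone) and a strict SimpleDateFormat is left to judge only the date and time core. The class name `Method3Sketch`, the method `isValid` and the exact regex are made up for illustration, and corner cases such as 24:00:00 (which XSD permits) are not handled.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the names and the regex are assumptions, not the poster's code.
final class Method3Sketch {
    // Group 1 is the fixed-width date/time core; the rest is the optional
    // fraction and the optional timezone (Z, or an offset up to 14:00).
    private static final Pattern SHAPE = Pattern.compile(
            "-?(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})"
            + "(?:\\.\\d+)?"
            + "(?:Z|[+-](?:(?:0\\d|1[0-3]):[0-5]\\d|14:00))?");

    static boolean isValid(String s) {
        if (s == null) {
            return false;
        }
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) {
            return false; // wrong shape, short year, or junk after the seconds
        }
        String core = m.group(1);
        // SimpleDateFormat is not thread-safe, so build a fresh one per call.
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss", Locale.ROOT);
        fmt.setLenient(false); // reject month 13, day 43, hour 25 and so on
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        ParsePosition pos = new ParsePosition(0);
        // parse() returns null on failure; also insist the whole core was consumed.
        return fmt.parse(core, pos) != null && pos.getIndex() == core.length();
    }
}
```

Handing SimpleDateFormat only the 19-character core sidesteps both complaints: the regex guarantees the four-digit year, and nothing after the seconds ever reaches the parser.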
-
Assuming these 3 options are your only options, I'd have to agree that 3 is the best option.
On a side note, which Java version are you using? 7?
As for the 4-digit years, this is even documented in SimpleDateFormat's docs. If you give it a two-digit year when it expects a 4-digit year, it will treat that as a literal year, i.e. 01/01/17 (assuming dd/MM/yyyy or MM/dd/yyyy) would be treated as the year 17 AD. You could also check whether the resulting year is less than 1000.
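To illustrate the literal-year behaviour and the "less than 1000" check, here is a small sketch. The class `ShortYearDemo` and its method names are invented for the example; the only library behaviour relied on is SimpleDateFormat parsing a short year literally when the pattern has more than two `y` letters.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class; names are assumptions for illustration only.
final class ShortYearDemo {
    private static final TimeZone UTC = TimeZone.getTimeZone("UTC");

    // Parses text as dd/MM/yyyy in strict mode and returns the resulting year.
    static int parsedYear(String text) {
        SimpleDateFormat fmt = new SimpleDateFormat("dd/MM/yyyy", Locale.ROOT);
        fmt.setLenient(false);
        fmt.setTimeZone(UTC);
        try {
            Calendar cal = new GregorianCalendar(UTC, Locale.ROOT);
            cal.setTime(fmt.parse(text));
            return cal.get(Calendar.YEAR);
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a dd/MM/yyyy date: " + text, e);
        }
    }

    // The suggested sanity check: a year below 1000 means fewer than 4 digits were given.
    static boolean hasFourDigitYear(String text) {
        return parsedYear(text) >= 1000;
    }
}
```

With this, "01/01/17" comes back as year 17 even in strict mode, so the extra year check is what actually catches it.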
-
@carrievs I'd make a simple regex that just captures structural parts of timestamp, then run additional validation for each capture group.
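A rough sketch of that idea, with no date library involved at all: the regex only captures the structural parts and plain integer checks do the rest. Everything here (`StructuralCheck`, `isValid`, `daysInMonth`) is a made-up name, the leap-year rule is the ordinary Gregorian one applied to the year digits (so negative years are handled naively), and hour 24 is simply rejected.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: capture the parts, then validate each capture group.
final class StructuralCheck {
    // Groups: 1 year, 2 month, 3 day, 4 hour, 5 minute, 6 second,
    // 7 and 8 the offset hours and minutes (absent for "Z" or no timezone).
    private static final Pattern SHAPE = Pattern.compile(
            "-?(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2})"
            + "(?:\\.\\d+)?(?:Z|[+-](\\d{2}):(\\d{2}))?");

    static boolean isValid(String s) {
        if (s == null) {
            return false;
        }
        Matcher m = SHAPE.matcher(s);
        if (!m.matches()) {
            return false;
        }
        int year = Integer.parseInt(m.group(1));
        int month = Integer.parseInt(m.group(2));
        int day = Integer.parseInt(m.group(3));
        int hour = Integer.parseInt(m.group(4));
        int minute = Integer.parseInt(m.group(5));
        int second = Integer.parseInt(m.group(6));
        if (month < 1 || month > 12) {
            return false;
        }
        if (day < 1 || day > daysInMonth(year, month)) {
            return false;
        }
        if (hour > 23 || minute > 59 || second > 59) {
            return false;
        }
        if (m.group(7) != null) {
            int offsetHours = Integer.parseInt(m.group(7));
            int offsetMinutes = Integer.parseInt(m.group(8));
            // XSD limits timezone offsets to at most 14:00 either way.
            if (offsetMinutes > 59 || offsetHours > 14
                    || (offsetHours == 14 && offsetMinutes != 0)) {
                return false;
            }
        }
        return true;
    }

    static int daysInMonth(int year, int month) {
        boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        switch (month) {
            case 2:
                return leap ? 29 : 28;
            case 4: case 6: case 9: case 11:
                return 30;
            default:
                return 31;
        }
    }
}
```

The regex stays short because it no longer tries to encode ranges; each range lives in one readable comparison instead.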
-
@carrievs said in Date Format Validation:
Method 3: try to parse the String with SimpleDateFormat, then use a combination of String.replace(), length checking, and a smaller and less ugly regex to verify that it has 4 digits in the year because I can't get SimpleDateFormat to reject that even in strict mode, and that anything after the seconds is valid (fractional seconds and/or timezone) since it allows a valid datestring to be suffixed with any crap.
Probably least insane. IIRC you also need to check the parameters: 2017-01-43 will be parsed to something like 2017-02-12. So you need to check that the year, month and day of month match the raw numbers.
I may recall incorrectly, but i think i ran into that problem the last time i tried to do the same.
I ended up using jodatime.
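The rollover is easy to reproduce, and one cheap way to do the "matches the raw numbers" check is to format the parsed date back and compare it with the input. A sketch, assuming a plain yyyy-MM-dd string; `RoundTripCheck` and its methods are hypothetical names:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo of the lenient rollover and a round-trip comparison.
final class RoundTripCheck {
    // Parses in the default (lenient) mode and formats the result back.
    static String lenientNormalise(String ymd) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.format(fmt.parse(ymd));
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + ymd, e);
        }
    }

    // True only if the parsed fields match the raw text exactly.
    static boolean survivesRoundTrip(String ymd) {
        return ymd.equals(lenientNormalise(ymd));
    }
}
```

Here `lenientNormalise("2017-01-43")` gives "2017-02-12", so the round trip fails for the bogus date and passes for something like "2017-01-27".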
-
@swayde said in Date Format Validation:
@carrievs said in Date Format Validation:
Method 3: try to parse the String with SimpleDateFormat, then use a combination of String.replace(), length checking, and a smaller and less ugly regex to verify that it has 4 digits in the year because I can't get SimpleDateFormat to reject that even in strict mode, and that anything after the seconds is valid (fractional seconds and/or timezone) since it allows a valid datestring to be suffixed with any crap.
Probably least insane. IIRC you also need to check the parameters: 2017-01-43 will be parsed to something like 2017-02-12. So you need to check that the year, month and day of month match the raw numbers.
And to check that, he'd need to parse the original string.
-
@carrievs said in Date Format Validation:
Can anyone suggest a) a better way or b) whether 2 or 3 is better.
If it is similar to an XSD datetime, why not try to use code from JAXB or an XSD validator?
See e.g. https://stackoverflow.com/a/5732733/983949 - so far DatatypeConverter seems to raise exceptions for those of your examples I've quickly tried.
-
@gąska said in Date Format Validation:
And to check that, he'd need to parse the original string.
Yes, but not do the date logic. Parsing isn't hard as such; the date logic is the hard part. If date.getYear() matches 2017, date.getMonth() matches 01 and date.getDayOfMonth() matches 27, you're golden: it's an actual date in a (probably Gregorian) calendar.
https://www.youtube.com/watch?v=-5wpm-gesOY
-
@swayde said in Date Format Validation:
@carrievs said in Date Format Validation:
Method 3: try to parse the String with SimpleDateFormat, then use a combination of String.replace(), length checking, and a smaller and less ugly regex to verify that it has 4 digits in the year because I can't get SimpleDateFormat to reject that even in strict mode, and that anything after the seconds is valid (fractional seconds and/or timezone) since it allows a valid datestring to be suffixed with any crap.
Probably least insane. IIRC you also need to check the parameters: 2017-01-43 will be parsed to something like 2017-02-12. So you need to check that the year, month and day of month match the raw numbers.
I may recall incorrectly, but i think i ran into that problem the last time i tried to do the same.
I ended up using jodatime.
I ran into that but setLenient(false) sorted it out.
-
@carrievs said in Date Format Validation:
I ran into that but setLenient(false) sorted it out.
I'd liked to have known this a few years ago. :/ Oh well...
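For anyone else who missed it, a minimal sketch of what setLenient(false) changes and what it doesn't. `StrictParseDemo` and `parses` are invented names; the SimpleDateFormat calls are the standard ones.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical demo: the same yyyy-MM-dd parse in lenient and strict mode.
final class StrictParseDemo {
    static boolean parses(String ymd, boolean lenient) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        fmt.setLenient(lenient);
        try {
            fmt.parse(ymd); // result discarded, only success matters here
            return true;
        } catch (ParseException e) {
            return false;
        }
    }
}
```

Strict mode rejects "2017-01-43", but it still accepts "17-01-27" and "2017-01-27junk", which is why the extra year and suffix checks from method 3 remain necessary.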
-
:D GG NodeBB.
Now on to actually read the thread...
-
Option 2 will at least guarantee a steady flow of WTF stories for years to come.
-
@carrievs said in Date Format Validation:
I don't actually need to parse this field, I just need to be sure it's there and valid.
Empirical evidence surely exists that says this is basically the same thing...
@gąska said in Date Format Validation:
And to check that, he'd need to parse the original string.
Exactly!
@nerd4sale said in Date Format Validation:
Option 2 will at least guarantee a steady flow of WTF stories for years to come.
Promote suffering now or do it right so the world gets ever so vanishingly slightly better... Oh choices...
-
@tsaukpaetra said in Date Format Validation:
GG NodeBB.
Is there any way to escape emoji? I put backslashes in but it had no effect.
@tsaukpaetra said in Date Format Validation:
I don't actually need to parse this field, I just need to be sure it's there and valid.
Empirical evidence surely exists that says this is basically the same thing...
I misspoke. I mean that I don't actually need to end up with a Date (or any other date-representing Object) that represents the same moment in time as the String.
-
@carrievs said in Date Format Validation:
@tsaukpaetra said in Date Format Validation:
GG NodeBB.
Is there any way to escape emoji? I put backslashes in but it had no effect.
Like so:
@carrievs said in Date Format Validation:
[-]CCYY-MM-DDThh\:mm\:ss[Z|(+|-)hh:mm]
Single back-tick-surrounded text is for "inline" code. Triple it on lines to have NodeBB attempt to figure out what kind of code it is:
[-]CCYY-MM-DDThh\:mm\:ss[Z|(+|-)hh:mm]
-
[-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
&zwnj; is your friend
-
@jbert said in Date Format Validation:
@carrievs said in Date Format Validation:
Can anyone suggest a) a better way or b) whether 2 or 3 is better.
If it is similar to an XSD datetime, why not try to use code from JAXB or an XSD validator?
See e.g. https://stackoverflow.com/a/5732733/983949 - so far DatatypeConverter seems to raise exceptions for those of your examples I've quickly tried.
@CarrieVS
I actually tested the following on Java 1.7:
javax.xml.bind.DatatypeConverter.parseDateTime("2017-99-01T01:01:01")
Again, this is included in the JRE since Java 1.6 and it does raise an exception if you enter anything invalid. If you don't want the parsed result you can still discard it.
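A validate-and-discard wrapper along the same lines might look like the sketch below. Note the swap: it uses javax.xml.datatype.DatatypeFactory rather than DatatypeConverter, because javax.xml.bind was removed from the JDK in Java 11 while javax.xml.datatype has been there since Java 5. That substitution, the class name `XsdDateTimeCheck`, and the extra schema-type check (the factory also accepts other XSD types such as a bare date) are my assumptions, not something tested in this thread.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical wrapper: parse with the JDK's XSD date/time support, keep only the verdict.
final class XsdDateTimeCheck {
    static boolean isXsdDateTime(String s) {
        if (s == null) {
            return false;
        }
        try {
            XMLGregorianCalendar cal =
                    DatatypeFactory.newInstance().newXMLGregorianCalendar(s);
            // The factory also parses xs:date, xs:time etc., so insist on dateTime.
            return DatatypeConstants.DATETIME.equals(cal.getXMLSchemaType());
        } catch (IllegalArgumentException e) {
            return false; // not a valid lexical representation
        } catch (IllegalStateException e) {
            return false; // field combination matches no XSD type
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException("no DatatypeFactory available", e);
        }
    }
}
```

The parsed calendar is thrown away, which fits the original requirement of only needing to know that the field is there and valid.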
-
@jbert Thanks. I've finished the bit of work I was doing, so it's too late to change now, but next time I'm touching that code I'll try it out.
-
@tsaukpaetra said in Date Format Validation:
@carrievs said in Date Format Validation:
@tsaukpaetra said in Date Format Validation:
GG NodeBB.
Is there any way to escape emoji? I put backslashes in but it had no effect.
Like so:
@carrievs said in Date Format Validation:
[-]CCYY-MM-DDThh\:mm\:ss[Z|(+|-)hh:mm]
Single back-tick-surrounded text is for "inline" code. Triple it on lines to have NodeBB attempt to figure out what kind of code it is:
[-]CCYY-MM-DDThh\:mm\:ss[Z|(+|-)hh:mm]
If you have backticks in the code, you can put more backticks in.
-
@jaloopa said in Date Format Validation:
[-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
&zwnj; is your friend
CCYY sounds like a WTF
-
@tsaukpaetra said in Date Format Validation:
Single back-tick-surrounded text is for "inline" code.
(The correct answer is no, no you can not escape emoji without also putting your text in a preformatted tag of some sort. This is because NodeBB is garbage.)
-
@blakeyrat said in Date Format Validation:
@tsaukpaetra said in Date Format Validation:
Single back-tick-surrounded text is for "inline" code.
(The correct answer is no, no you can not escape emoji without also putting your text in a preformatted tag of some sort. This is because NodeBB is garbage.)
Sure i can!
[-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
:
it's been part of HTML since like... i dunno, the Myans?
-
@accalia said in Date Format Validation:
it's been part of HTML since like...
Is that stock, or one of Ben L's modifications to let us use some HTML?
@accalia said in Date Format Validation:
i dunno, the Myans?
-
@accalia said in Date Format Validation:
:
Hate to break it to you, but that's not a standard entity.
-
@blakeyrat said in Date Format Validation:
@accalia said in Date Format Validation:
it's been part of HTML since like...
Is that stock, or one of Ben L's modifications to let us use some HTML?
standard HTML escape character. always been allowed, at least since i've been here.
@accalia said in Date Format Validation:
i dunno, the Myans?
No, the Nyans are a much later phenomenon.
-
@raceprouk said in Date Format Validation:
@accalia said in Date Format Validation:
:
Hate to break it to you, but that's not a standard entity.
cite your source please. cause my source says yes it is.
W3 Consortium lists the following escape characters for
:
::
,:
,:
-
@accalia said in Date Format Validation:
standard HTML escape character. always been allowed, at least since i've been here.
Ben L's HTML code has also always been here, so. Doesn't say much.
L's
L's wow this font is terrible. Look at that. L's. It's not even two letters, it turns into some kind of snake monster.
-
@accalia said in Date Format Validation:
standard HTML escape character
I found a standard that says :, but none that says :.
-
@accalia said in Date Format Validation:
@raceprouk said in Date Format Validation:
@accalia said in Date Format Validation:
:
Hate to break it to you, but that's not a standard entity.
cite your source please. cause my source says yes it is.
W3 Consortium lists the following escape characters for
:
::
,:
,:
Must be new in HTML5 then: it's not in HTML4 or earlier.
-
@ben_lubar said in Date Format Validation:
@accalia said in Date Format Validation:
standard HTML escape character
I found a standard that says :, but none that says :.
-
@accalia said in Date Format Validation:
cite your source please. cause my source says yes it is.
-
@𝔄𝔠𝔠𝔞𝔩𝔦𝔞 I didn't find that one, and it's a reference, not a standard.
-
@wharrgarbl said in Date Format Validation:
CCYY sounds like a WTF
I copied it from this source, can't seem to find the actual authoritative source on the subject.
-
@accalia said in Date Format Validation:
@blakeyrat said in Date Format Validation:
@tsaukpaetra said in Date Format Validation:
Single back-tick-surrounded text is for "inline" code.
(The correct answer is no, no you can not escape emoji without also putting your text in a preformatted tag of some sort. This is because NodeBB is garbage.)
Sure i can!
[-]CCYY-MM-DDThh:mm:ss[Z|(+|-)hh:mm]
:
it's been part of HTML since like... i dunno, the Myans?
I like the part where you had to have a ZWNJ character in there to make it actually work.
[-]CCYY-MM-DDThhss[Z|(+|-)hh:mm]
Go ahead, view raw... it's the same as what you must've thought that you posted...
(or did you do it on purpose to see if anybody would notice?)
-
@raceprouk said in Date Format Validation:
@accalia said in Date Format Validation:
cite your source please. cause my source says yes it is.
that one also doesn't list : as a valid character reference.
so....... whats up with that? :-P
-
@anotherusername said in Date Format Validation:
I like the part where you had to have a character in there to make it actually work.
did you view raw of my post? cause i didn't use a ZWNJ, that would have broken copy paste. i used
:
no sneaky non displaying character shenanigains.
-
@accalia yes. I viewed the raw, copied and pasted it, and found a ZWNJ character in it. Not the HTML escape code for the ZWNJ character, but the literal character itself, invisible, but there nonetheless.
-
@anotherusername said in Date Format Validation:
@accalia yes. I viewed the raw, copied and pasted it, and found a ZWNJ character in it. Not the HTML escape code for the ZWNJ character, but the literal character itself, invisible, but there nonetheless.
where the fuck did that come from? cause i didn't put it there!
.....
okay i give up. fuck emoji.
in my day we didn't need emoji. we had these things we used instead. what were they called again? oh yes. we had WORDS we used our WORDS.
-
@carrievs The WTF would be that source calling a 4-digit year "CCYY". What does CC stand for? I read that as century, so the current date would be printed something like 2117-07-06.
-
At least someone else got confused with this one:
-
@wharrgarbl Them off by one errors will get you.
-
@dkf said in Date Format Validation:
@accalia said in Date Format Validation:
fuck emoji.
Did they give a clitoris in the new EmojiOne update?
-
@ben_lubar said in Date Format Validation:
@dkf said in Date Format Validation:
@accalia said in Date Format Validation:
fuck emoji.
Did they give a clitoris in the new EmojiOne update?
Hmmm. Hard to say...