Unique datetime

Jaime

What's so hard about it?

Race conditions and time-of-check-to-time-of-use (TOCTOU) bugs. That's why I recommended the "evil" singleton.

Your method would work if modified to:

1. n=0
2. Attempt to create file from current time + n, with "don't overwrite existing" flag on.
3. Check for error.
No error: done.
Error: n++ and goto 2.
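
A minimal Python sketch of the steps above. It assumes n counts minutes and borrows the `ErrorReportMMDDYYYYHHMM.txt` name pattern mentioned elsewhere in the thread; the function name and parameters are made up for illustration. Python's `"x"` open mode plays the role of the "don't overwrite existing" flag, so the existence check and the create happen in one step (at least on ordinary local filesystems).

```python
import os
from datetime import datetime, timedelta


def create_unique_report(directory, now=None, fmt="ErrorReport%m%d%Y%H%M.txt"):
    """Create an empty report file named after the first free minute at or after `now`.

    `fmt` and the one-minute step are assumptions, not part of the original spec.
    """
    now = now or datetime.now()
    n = 0
    while True:
        candidate = now + timedelta(minutes=n)
        path = os.path.join(directory, candidate.strftime(fmt))
        try:
            # Mode "x" is exclusive creation: it raises FileExistsError if the
            # name is taken, so there is no gap between checking and creating.
            with open(path, "x"):
                pass
            return path
        except FileExistsError:
            n += 1  # name already used: try the next minute
```

Calling it twice with the same `now` gives two different files, the second one stamped a minute into the future.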

If you get a really big surge of activity you might get a "Schlemiel the Painter" performance problem - similar to the one you get when doing string concatenation in a loop.

Gąska

@Jaime said:

Race conditions

Since they already have a working implementation of the system that saves reports to file, I assume they have it covered or it doesn't matter.

@Jaime said:

time-of-check-to-time-of-use (TOCTOU) bugs

Since they already have a working implementation of the system that saves reports to file, I assume they have it covered or it doesn't matter.

@Jaime said:

2. Attempt to create file from current time + n, with "don't overwrite existing" flag on.

That's an implementation detail of checking for file existence. Please don't expose implementation details to API level if it doesn't give you significant performance gains.

@Jaime said:

If you get a really big surge of activity

If you get a big surge of activity then the whole algorithm is flawed, because it works only for n small enough that users don't notice reports getting future timestamps.

Planar

@flabdablet said:

This will get increasingly slow once the average file creation rate does creep above one per minute but has the advantages of being very robust and not at all "clever".

This has the advantage of being self-regulating: when it starts taking 1 minute to find the next available file name, then you can't generate more than 1 per minute, problem solved.

flabdablet

@Gaska said:

1. n=0
2. Generate file from current time + n.
3. Check if file exists.
False: save file.
True: n++ and goto 2.

What's so hard about it?

That works, and is exactly the fallback algorithm I suggested in the very next paragraph after the one you responded to with

@Gaska said:

Y U NO USE THE FILENAMES THEMSELVES

It's an O(n) algorithm, where n is the number of files generated too soon to get naive timestamp names, and under conditions unanticipated by Marketing (i.e. somewhere, sometime in the real world) it can end up doing a lot of unnecessary work. Remembering a single value sufficient to generate the next filename without search, as proposed by both Jaime and myself, is O(1).
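
A rough Python sketch of the "remember a single value" idea, for comparison with the search. The class and method names are invented here, and the state lives only in memory: it is not thread-safe and does not survive a restart, so a real version would need a lock and some persistent storage.

```python
from datetime import datetime, timedelta


class MinuteStampAllocator:
    """Hands out strictly increasing minute-resolution timestamps in O(1)."""

    def __init__(self):
        self._last = None  # last timestamp handed out, or None if none yet

    def next_stamp(self, now=None):
        # Truncate to the minute, since that is the filename resolution.
        now = (now or datetime.now()).replace(second=0, microsecond=0)
        if self._last is not None and now <= self._last:
            # Current minute already used: take the minute after the last one issued.
            now = self._last + timedelta(minutes=1)
        self._last = now
        return now
```

Each call does a constant amount of work no matter how many names have already run ahead of the clock.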

@Gaska said:

If you get a big surge of activity then the whole algorithm is flawed, because it works only for n small enough that users don't notice reports getting future timestamps.

The whole specification is flawed from the start, as is clearly apparent to everybody except the sales droids. As a working coder, my response to idiotic specifications I can't persuade the client to disidioticate has always been to try to implement them in ways that make their unavoidable edge case failures as unsurprising and unproblematic as possible.

@Gaska said:

>Jaime:
2. Attempt to create file from current time + n, with "don't overwrite existing" flag on.

That's an implementation detail of checking for file existence. Please don't expose implementation details to API level if it doesn't give you significant performance gains.

I think it's perfectly reasonable, when specifying an algorithm as opposed to an API, to do so in a way that doesn't bake in potential race conditions. There's already a distressingly large amount of unreliable code in the world.

ben_lubar

Remember back when they made Thor female? Why didn't they make iron man female? That would have been a much better choice.

flabdablet

@Planar said:

when it starts taking 1 minute to find the next available file name, then you can't generate more than 1 per minute, problem solved.

Isn't @Weng's org the one that spends six figures per terabyte on disk storage? I'm sure they could just add another server farm to generate filenames. Think of the XML deployment opportunities!

ben_lubar

I thought that was snoofle. What ever happened to that guy?

Weng

Yeah that's me. Though I've recently found a less godfuckingawful expensive way to get production disk.

Allocate it to a NAS share instead of a server. The new NAS filers are all just SAN bolt-ons, so it's the same disk, but for some reason it's a third the price. Only problem is that you get slightly less performance: Fibre Channel to the NAS box, then GigE (not even 10G, FFS) to the blade chassis*, and 10G from the blade chassis to the VM host blade, as opposed to FC to the blade chassis and FCoE to the host blade.

Yeah, we actually use real, honest to god blade servers. I'm not doing the derpy non-server IT thing and calling a rackmount server a blade. I also figured out how to get CPU and RAM on the cheap - just ask for enough VMs at once that they just ask you to buy blades (the magic number seems to be 7). You have to time it so that you don't bump them over the 'need a new chassis' line, though.

Gąska

@flabdablet said:

It's an O(n) algorithm, where n is the number of files generated too soon to get naive timestamp names, and under conditions unanticipated by Marketing (i.e. somewhere, sometime in the real world) it can end up doing a lot of unnecessary work. Remembering a single value sufficient to generate the next filename without search, as proposed by both Jaime and myself, is O(1).

As I said before, this algorithm works ONLY for n small enough for users to not notice that reports get future timestamps. Premature optimization is the root of all evil. Not using any additional file to store latest timestamp has the advantage of less junk on disk (imagine someone in sales asking "what's this file for?", or even not asking and just deleting it), and the advantage of having no singleton is having no singleton.

boomzilla

@Jerome_Viveiros said:

Why don't you just overwrite the file?

Yes, the obvious passive aggressive approach. Best kept in the imaginary world of our dreams.

VaelynPhi

When generating the report, check for the file first thing. If it exists, sleep until the next minute, then make the report.

Might be slow, but it is to spec.
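
Something like this Python sketch, assuming the `ErrorReportMMDDYYYYHHMM.txt` name pattern from elsewhere in the thread. The function names and the injectable `clock`/`sleep` parameters are only there for illustration. It is a plain check-then-create, so two processes running it at once can still collide.

```python
import os
import time
from datetime import datetime


def seconds_until_next_minute(now):
    """Seconds from `now` to the start of the following minute."""
    return 60 - now.second - now.microsecond / 1_000_000


def wait_for_free_name(directory, fmt="ErrorReport%m%d%Y%H%M.txt",
                       clock=datetime.now, sleep=time.sleep):
    """Block until the current minute's filename is unused, then return its path."""
    while True:
        now = clock()
        path = os.path.join(directory, now.strftime(fmt))
        if not os.path.exists(path):
            return path
        # Name taken: sleep until the minute rolls over, then check again.
        sleep(seconds_until_next_minute(now))
```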

If you really wanted to punish them, you could have the program refuse to generate another report with a popup and a timer counting down until the next minute. Okay, maybe I'd get fired.

cartman82

Recently I had to do a similar thing when generating something called the serial number for a zone config in the BIND DNS server. It's just a number you have to increase every time you change the config.

Why couldn't I just take a unix timestamp? Well, because some asshole had decided to save 2 bytes of memory and limited the number's max size to less than what I need for a full date. So now I have to take something like the current year/day/hour/minute and increment it for as long as it clashes with the current setting. Luckily, this doesn't change that often, but it still sucks.

Anyway, there's no clever solution for this. Just swallow the turd and implement the suggested "increase by one in case of clash" algorithm.
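
For illustration, a small Python sketch of that algorithm. It assumes the common `YYYYMMDDnn` serial layout (date plus a two-digit revision counter), which is a guess and not necessarily the layout my code uses; `next_serial` is a made-up name. Taking the larger of today's base value and the current serial plus one gives the same result as incrementing until the clash goes away.

```python
from datetime import date


def next_serial(current, today=None):
    """Return a serial greater than `current`, date-based when possible.

    Assumes the YYYYMMDDnn convention; the zone's real layout may differ.
    """
    today = today or date.today()
    base = int(today.strftime("%Y%m%d")) * 100  # YYYYMMDD00
    # If today's base is not already ahead of the current serial, fall back
    # to a plain increment so the serial still moves forward.
    return max(base, current + 1)
```

With more than 100 changes in one day the counter spills into the next day's range, which still keeps the serial increasing.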

dkf

@VaelynPhi said:

If you really wanted to punish them, you could have the program refuse to generate another report with a popup and a timer counting down until the next minute. Okay, maybe I'd get fired.

It's a shame you can't use something like NET SEND to pop up a notification on everyone's desktop whenever that happens. Or is that still a thing?

flabdablet

@cartman82 said:

now, I have to take something like current year/day/hour/minute and increment it as long as it clashes with the current setting.

Why not just take the current setting and increment it by 1? That's how the thing's obviously designed.

cartman82

@flabdablet said:

Why not just take the current setting and increment it by 1? That's how the thing's obviously designed.

Complicated reasons, one of which is, I'm sort of doing it that way.

Maciejasjmj

@accalia said:

ErrorReportMMDDYYYYHHMM(65535).txt
ErrorReportMMDDYYYYHHMM(-65536).txt

A 17-bit integer... I like it.

Anyway, yeah, if file exists, keep adding a minute until it doesn't. Simple, preserves the order of generation, and probably effective enough if they don't average more than one report a minute. And if they do, there's no real workaround anyway - ya canna change the laws of physics.

Or use 2014, 3014, 4014, etc... I don't suppose they plan on using it all the way till year 3000.

@flabdablet said:

It's an O(n) algorithm, where n is the number of files generated too soon to get naive timestamp names, and under conditions unanticipated by Marketing (i.e. somewhere, sometime in the real world) it can end up doing a lot of unnecessary work. Remembering a single value sufficient to generate the next filename without search, as proposed by both Jaime and myself, is O(1).

Who cares? The only thing you need to worry about is "fast enough under conditions" vs. "not fast enough under conditions".

flabdablet

@cartman82 said:

I'm sort of doing it that way

Sure, but I don't see why you need to involve the current year/day/hour/minute at all. Can't a meaningless sequence number just be a meaningless sequence number?

cartman82

It's complicated = it's crappy code hacked together and pushed to production at 10PM on the day of the deadline.

The point is, it's a similar problem to the OP's, and there's no easy solution if you need to use the date for whatever reason.

My reason being, it's crappy code and there's no time to rewrite it.

dkf

@flabdablet said:

Sure, but I don't see why you need to involve the current year/day/hour/minute at all. Can't a meaningless sequence number just be a meaningless sequence number?

But the client is probably saving some bits inside the file by not putting any dates in it. After all, they're not needed: the information is in the filename!

ComputerForumUser

Because it's not me who has to implement or support this, I recommend dealing with collisions by replacing characters in the name with identically rendered ones from the Cyrillic or Greek alphabets. If the filenames are only checked by eye nobody will notice, and if it's automatic it won't happen right away and you can blame the automatic checker.

</evilmode>

dkf

@ComputerForumUser said:

I recommend dealing with collisions by replacing characters in the name with identically rendered ones from the Cyrillic or Greek alphabets.

Or use the full-width forms starting at U+FF00. Has the advantage of being easy to squirrel inside the code without an explicit conversion table (which it's too easy to spot in a code-review).
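
In the same tongue-in-cheek spirit, a Python sketch of why no table is needed: the full-width forms U+FF01 to U+FF5E mirror printable ASCII 0x21 to 0x7E at a fixed offset of 0xFEE0. The function name is made up for illustration.

```python
def to_fullwidth(text):
    """Map printable ASCII (except space) to its full-width lookalike."""
    return "".join(
        # U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII 0x21..0x7E.
        chr(ord(c) + 0xFEE0) if "!" <= c <= "~" else c
        for c in text
    )
```

Note that this also converts the dot before the extension, so a real prank would probably want to convert just one character of the name.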

Zemm

@VaelynPhi said:

When generating the report, check for the file first thing. If it exists, busy wait checking filesystem without a sleep until the next minute, then make the report.

Might be slow, but it is to spec.

FTFY. Save 30 seconds on average for the user.

RTapeLoadingError

Option 1
If the current date/time is taken then change the system clock back one minute and try again. Repeat until you find a time that's not taken and create the file with the "current" time stamp.

This has the advantage that the difference between the system time and the actual time gives an indication of how many minutes have multiple reports.

Option 2
When you need to create a file you work out what the filename should be. You then query a database table with all taken filenames in it. If it's not there you create the file and insert a row into the table. If it is taken then you query the DB to find the first available timestamp in either direction: future or past. You could create the table to have actual time and filename time columns.

This has the advantage of recording all file creation events. You could data mine this table to produce all sorts of reports which could be used to identify busy times and assist with capacity planning and trend analysis. Maybe have a dashboard for managers.

ben_lubar

Option 3

Multiply all timestamps by 60 or 60000 or whatever and then format them modulo 10000 years.

flabdablet

@RTapeLoadingError said:

Maybe have a dashboard for managers.

Release the resulting code as a library with a hard dependency on a huge tree of other libraries for rendering 3D pie charts.

PJH

@flabdablet said:

Release the resulting code as a library with a hard dependency on a huge tree of other libraries for rendering 3D pie charts.

.. and link with systemd.

VaelynPhi

@cartman82 said:

Recently I had to do a similar thing when generating something called the serial number for a zone config in the BIND DNS server. It's just a number you have to increase every time you change the config.

I usually just start with 0 and increment every time the file changes. (My main router at home dials in and informs my server every time its WAN goes up, so that when its IP changes, my DNS settings are changed. A bash script handles this, and doesn't seem to have any trouble parsing then incrementing the field. I am only on iteration 4, though. :)

A workaround to having to parse the actual record, BTW, is just rewriting that section of the file by sedding out what's between some enveloping comments, one of which should include the last value to be easily parsed. I can't count how many times I've had to hack something like this together when doing sysadmin work.
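
A rough Python equivalent of that sed trick, as a sketch only. The `; BEGIN SERIAL last=N` and `; END SERIAL` marker comments are invented for this example (`;` starts a comment in a zone file), and so is the function name.

```python
import re

# Everything between the two marker comments gets regenerated; the opening
# marker carries the last value so it can be read without parsing the record.
BLOCK = re.compile(r"; BEGIN SERIAL last=(\d+)\n.*?; END SERIAL\n", re.DOTALL)


def bump_serial_block(zone_text):
    """Rewrite the marked section of `zone_text` with the serial incremented by one."""
    def repl(match):
        new = int(match.group(1)) + 1
        return f"; BEGIN SERIAL last={new}\n        {new} ; serial\n; END SERIAL\n"

    new_text, count = BLOCK.subn(repl, zone_text, count=1)
    if count != 1:
        raise ValueError("marker comments not found")
    return new_text
```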

@dkf said:

It's a shame you can't use something like NET SEND to pop up a notification on everyone's desktop whenever that happens. Or is that still a thing?

There are solutions built out of smaller pieces; usually it's a matter of setting up xinetd with a small notification script. At least, on Linux. Presumably there are similar methods available on Windows. The modern equivalent is calling a Skype conference, methinks.

@ComputerForumUser said:

I recommend dealing with collisions by replacing characters in the name with identically rendered ones from the Cyrillic or Greek alphabets.

flabdablet

@VaelynPhi said:

replacing characters in the name with identically rendered ones from the Cyrillic or Greek alphabets.

I did this recently to a fake-virus prank cmd script that one of the little darlings left lying around on the school server. I worked out who'd done it by checking the squid logs, then replaced every a, c, e, o and p in the whole thing with Cyrillic lookalikes, put the mod date back the way it was, and added a hidden file named @ðÁÐühð¥.cmd to the same directory. So instead of the @echo off on the first line of his script turning echo off, it jumped to mine instead:

@echo Nice try, James.
@echo If you want to learn about writing viruses, come see me.
@echo Until then, please don't leave annoyances like this on
@echo the school server.   - Stephen

Strangely enough, two days later all his soopa seekrit virus files had disappeared; @ðÁÐühð¥.cmd was still there and still hidden.

He didn't come and find me. Pity. I like a kid with spark.

Zemm

@flabdablet said:

one of the little darlings left lying around on the school server. I worked out who'd done it by checking the squid logs,

So they don't have individual user accounts so it wouldn't be owned by james:students?

flabdablet

This is a primary school (roughly K-6 for US readers). We use per-class rather than per-student Windows accounts, to avoid wasting class time on repeated logoff/logon, forgotten passwords and so forth.

Students do have their own individual folders on the file server, with read/write access granted to their class Windows account. The school culture is healthy enough that we've had almost no incidents of students stomping each other's work so the setup works very well in practice.

Young James saved his fake virus prank in a folder accessible to his entire Year 6 class rather than to his individual student folder, along with a bunch of archive-expander utilities and a huge number of duplicates of the EICAR AV signature test file. Next time I meet him down the shops or wherever I'll ask him who he was actually trying to prank.

Zemm

@flabdablet said:

We use per-class rather than per-student Windows accounts, to avoid wasting class time on repeated logoff/logon, forgotten passwords and so forth.

Still more accounts than used in my high school: everyone used the same user "student" with password coke. Back in the days of NetWare, circa 1996. The student drive had WORM (write once, read many) access only, so one could write but not overwrite files. Don't know how that really worked. But we were supposed to use floppies, so it didn't really matter. Before that there was a strange network of 286s with a read-only F drive: no accounts. They still needed a 5¼" boot disk.

On the powerful 486s with massive 15" screens we played network Doom. It was glorious!

dkf

@Zemm said:

Before that there was a strange network of 286s with a read only F drive: no accounts. They still needed a 5¼" boot disk.

My school (this is the 1980s) had some strange 186s that were not really PC compatible and ran some sort of weird custom OS. Hated them. They also had a few old BBC Model B computers lying around. They were much more fun to program, despite being more limited machines in so many ways.

@Zemm said:

powerful 486s

Luxury! (I think I'll omit the rest of the 4 Yorkshiremen sketch for brevity.)

I remember trying to get Doom working on the 386sx16 systems we had in college. They weren't really powerful enough. (It wasn't until we found some more powerful machines in the CS department's public cluster that I found that that game gives me motion sickness, even if I'm only able to see it out of the corner of my eye.) But they were powerful enough to run Linux configured to be an Xterminal (800×600 on genuine VGA hardware; needed to be monochrome and the displays weren't very happy, but the extra pixels over the standard 640×480 were truly glorious.)