Java is on crack

Zecc

@dkf

dkf

@levicki said in Java is on crack:

HTTP protocol

That's HyperText Transfer Protocol… protocol…

https://thumbs.gfycat.com/SpecificDopeyAcornwoodpecker-size_restricted.gif

Bulb

@levicki said in Java is on crack:

HTTP (over which majority of file transfers are happening nowadays)

Far from majority, really. Also all the zillions of servers would have to be taught to read them from the filesystem and all the bajillions of clients taught to save it on their side. If the client is even saving the data itself. Because the thing over which by very, very far most file transfers happen is pipes.

dkf

@Bulb said in Java is on crack:

Because the thing over which by very, very far most file transfers happen is pipes.

Or, arguably, by just passing a filename and having the other end read it directly.

Bulb

@dkf No, that's not a transfer in the above sense, because that gives the target access to all the attributes in the source filesystem and the argument was about the target often not having them.

dkf

@Bulb said in Java is on crack:

that's not a transfer in the above sense

That's why I said “arguably”…

Gąska

@levicki internet isn't just WWW.

HardwareGeek

@dkf said in Java is on crack:

@Bulb said in Java is on crack:

that's not a transfer in the above sense

That's why I said “arguably”…

This is TDWTF; everything is arguable. Especially in the Garage.

dkf

@levicki said in Java is on crack:

Just open another pipe for metadata.

Managing multiple pipes between the same two processes is a ballache at best, and much worse in many programming languages. In-band metadata encapsulation is a lot easier to get right.
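For illustration, here is a minimal sketch of in-band metadata encapsulation over a single byte stream. The framing (a 4-byte header length, a JSON header, an 8-byte payload length, then the payload) and the names `write_framed` / `read_framed` are assumptions made up for this example, not an existing protocol.

```python
import io
import json
import struct


def write_framed(stream, metadata: dict, payload: bytes) -> None:
    """Write one message: header length, JSON header, payload length, payload."""
    header = json.dumps(metadata).encode("utf-8")
    stream.write(struct.pack(">I", len(header)))   # 4-byte big-endian header size
    stream.write(header)
    stream.write(struct.pack(">Q", len(payload)))  # 8-byte big-endian payload size
    stream.write(payload)


def _read_exact(stream, n: int) -> bytes:
    """Read exactly n bytes; pipes and sockets may return short reads."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            raise EOFError("stream ended mid-frame")
        buf += chunk
    return buf


def read_framed(stream):
    """Read one message written by write_framed and return (metadata, payload)."""
    (header_len,) = struct.unpack(">I", _read_exact(stream, 4))
    metadata = json.loads(_read_exact(stream, header_len).decode("utf-8"))
    (payload_len,) = struct.unpack(">Q", _read_exact(stream, 8))
    return metadata, _read_exact(stream, payload_len)


if __name__ == "__main__":
    pipe = io.BytesIO()  # stands in for a real pipe in this sketch
    write_framed(pipe, {"content_type": "text/plain"}, b"hello")
    pipe.seek(0)
    print(read_framed(pipe))
```

Both ends only ever deal with one stream, so there is no second pipe to keep synchronised with the first.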

Zecc

@levicki I see NodeJS isn't even worth your consideration as a webserver.

Also there's the FTP file transfer protocol (sorry @dkf), and the fact that files might be compressed. (compressed files can carry metadata of course, but how exactly it is stored needs to be defined)

At some point it would just be better to come up with a standard way of storing metadata inside the file itself.
Like in a header, following a shebang, or something.

Zerosquare

@Zecc said in Java is on crack:

At some point it would just be better to come up with a standard way of storing metadata inside the file itself.
Like in a header, following a shebang, or something.

glarbgrl Mac classic https://en.wikipedia.org/wiki/MacBinary glarbgrl

Gąska

@levicki said in Java is on crack:

@Gąska said in Java is on crack:

@levicki internet isn't just WWW.

We are talking about file exchange, most of which is done over HTTP (yes, even WebDAV is HTTP).

You only listed a few servers that are in use in the world out of many thousands - and you only listed ones that serve webpages. There are many internet services that use HTTP for things other than webpages, and they transfer a lot of files too. And most are single-purpose, custom-built software.

At some point it would just be better to come up with a standard way of storing metadata inside the file itself.
Like in a header, following a shebang, or something.

That's a bad idea and here's why.

If you store metadata in the file your OS must open the file to read it which is more expensive than looking it up in a filesystem index/database/separate stream.

Separate stream - no, it has exactly the same performance characteristics as opening the file itself (unless the FS is aware of that special stream and stores it differently than a regular data stream, but that's cheating and it shouldn't be called a separate stream anymore). Index/database (in line with the data you use to find the file on disk in the first place) - yeah, for some kinds of metadata that's a perfect solution, including MIME type. But once you start playing with arbitrary metadata, there are serious scalability concerns with this solution. And if you use an index/database that's separate from the main index/database used for looking up the file on disk, you're pretty much back to square one, doing the performance equivalent of opening the file itself (unless you keep metadata of all files together, but again - scalability.)

It must be able to ignore other possible file errors such as not having read access to contents when you ask only for metadata -- after all, it's ok if another user knows whether certain file is a video, but not what is inside if you didn't grant them access.

Not always. Also, most metadata isn't useful without reading the file anyway.

Furthermore, updating the metadata inside the file always risks corrupting the file

Writing to disk can fail, news at 11. This is as much true for separately stored metadata.

not to mention it's impossible to change metadata of a signed file without invalidating its signature

There are kinds of metadata where this is actually a desired feature.

(metadata is also last access date, whether it is read-only, etc, not just mime type).

Are you doing that on purpose, or have you really not figured out that last access date etc. isn't relevant in a discussion of transferring files over a network? Last access date has a very different purpose from, say, video length, which is also metadata. Can you imagine changing video length without write permission?

Finally, if you have some sort of cold storage like Facebook has for user data which is in their view "stale" such as your cat pictures from last month which are no longer near the top of your feed, you have to spin a rack full of HDDs up just to read said file metadata even though the rest of the file might not be needed in the end.

This isn't file metadata, this is post metadata. Or even post data. Facebook doesn't care when the photo was made, it cares when the photo was posted. (Tinfoil hats aside.)

HardwareGeek

@levicki said in Java is on crack:

cold storage like Facebook ... have to spin a rack full of HDDs

And causing problems for Facebook is a bad thing how?

dkf

@Gąska said in Java is on crack:

But once you start playing with arbitrary metadata, there are serious scalability concerns with this solution.

The biggest problem with arbitrary metadata is you end up with everyone and his dog doing it their own way, and either no sharing of metadata with ostensibly the same meaning, or fighting over particular metadata names. But it does work; big archives actually have workable scalable solutions. They're also incredibly bureaucratic organisations, so formal metadata specifications suit them well.

dkf

@levicki said in Java is on crack:

I hate FB as much as the next guy

Are you sure about that when I'm the next guy? I mean, they're not exactly the spawn of Satan; the devil's children have standards and morals, there are things they won't do, whereas FB seems to have no such bounds…

Gąska

@levicki said in Java is on crack:

@Gąska said in Java is on crack:

You only listed a few servers that are in use in the world out of many thousands

Oh no you won't -- he mentioned file transfers and I said MAJORITY of file transfers is HTTP and listed "a few servers" that happen to be MAJORITY of installed HTTP servers on the planet. So don't you fucking put words in my mouth if you can't fucking read.

Then what was that "by zillions of servers you mean..." line for? I read it as a very clear suggestion that there aren't so many different HTTP server applications in the world that modifying the accepted protocol(s) would be a big problem. You're ignoring the entire world of different applications, both server- and client-side, that use HTTP for communication - you're ignoring all HTTP communication that's not websites.

@levicki said in Java is on crack:

Define "most" compared to Apache, Nginx and IIS marketshare?

Market share in what? Non-websites? I'm pretty sure all three have very close to zero market share here. Total bandwidth used? The clear winner is whatever Netflix uses, and I doubt they use Apache. Some other metric? You have to say what it is, and link to the market share data.

When I said most, I meant the absolute number of applications. Applications, not installations. Apache counts as 1, Nginx counts as 1, IIS counts as 1, each of the thousands of purpose-built HTTP servers counts as one. When talking about the effort to implement a new or updated protocol, it's a much more relevant metric than number of users, because you need to implement the new or updated protocol for each of them.

Not my problem they suffer from NIH syndrome.

But it's your problem that you suffer from "things I don't know about don't exist" syndrome.

@Gąska said in Java is on crack:

unless you keep metadata of all files together, but again - scalability.

You mean like with directory entries?

Directory entries aren't arbitrary - that's why it's practical to lump them all together.

@Gąska said in Java is on crack:

Also, most metadata isn't useful without reading the file anyway.

Why not? For displaying a folder you need file name, length, timestamps, attributes, and if you want an icon you need to know mime type. All that you can do without touching the file. If it is a media file then you need to extract thumbnail once but even that can be with metadata so you really don't need to read file after you created it (and the thumb). Now if you need to show document properties they can be metadata, EXE/DLL/SYS resource chunk? Move it to metadata.

Okay, maybe not most. But certain kinds are. Like the one we started this discussion with "which interpreter to use with this file". Why do you care if you can't run it anyway?

Don't touch the fucking file. Reason is that whenever you touch file over a network for anything more than metadata you are fucking with resource locks and you are ruining performance of the query.

You better put that lock on resource and don't change the MIME type while I'm checking it!

@Gąska said in Java is on crack:

Writing to disk can fail, news at 11. This is as much true for separately stored metadata.

Yes but it is cheaper to make a copy of metadata which is a few KB at most than copy the whole file, change few bytes of metadata and then rename it to the old one once you are sure that the write succeeded.

There are two possible cases:

The system knows about metadata; the system can do the metadata backup independently of the file (like, reserve some space before the file block for it, like it most likely did anyway for original metadata).
The system doesn't know about metadata; the system doesn't care, and the app is free to put the metadata backup inside the file itself.

Yet again, you're specifically designing the worst possible implementation of a solution just to point out its flaws, and ignore that there is a very easy and very obvious way to implement it better, and then the solution wouldn't have these flaws.
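For reference, the copy-then-rename update quoted above can be sketched roughly as below. This is only an illustration for metadata stored inside the file; the helper name `atomic_rewrite` is made up here, and it assumes the temporary file lives on the same filesystem as the target so that the final rename is atomic.

```python
import os
import tempfile


def atomic_rewrite(path: str, new_content: bytes) -> None:
    """Replace the file at `path` with `new_content` without a half-written state."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_content)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reached the disk first
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        # Something failed before the rename: drop the temporary copy, keep the original.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

The cost being argued about is visible here: the whole file is written again even if only a few bytes of embedded metadata changed.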

@Gąska said in Java is on crack:

There are kinds of metadata where this is actually a desired feature.

Again, MAJORITY .vs. few cases.

If you signed the file to make sure it doesn't get modified in any way, a great majority of metadata most likely falls under those you don't want to have changed either. About the only exception being the filesystem bookkeeping stuff, but this is a different category of metadata than we're talking about (the discussion is about file transfers, and filesystem bookkeeping stuff doesn't get preserved when you transfer files over network).

@Gąska said in Java is on crack:

This isn't file metadata, this is post metadata. Or even post data.

Let's say that Facebook can show you the post content with a cat photo thumbnail without waking the rack full of HDDs and for the photo it has to wake them up, that was the point.

I wouldn't call that metadata either. It's more like cache - prefetched and transformed content. You probably shouldn't store thumbnails in the same way you store MIME types.

Gąska

@levicki said in Java is on crack:

there aren't zillions of different HTTP servers in use.

This is factually wrong. Incorrect. Untrue. The opposite of the facts.

Gąska

@levicki said in Java is on crack:

@Gąska said in Java is on crack:

@levicki said in Java is on crack:

there aren't zillions of different HTTP servers in use.

This is factually wrong. Incorrect. Untrue. The opposite of the facts.

Should have added "FOR FILE TRANSFERS" after "in use"

You'd still be wrong.

Gribnit

@levicki said in Java is on crack:

That's the last reply you'll get from me until you apologize.

What on earth even goes through your mind? First you want to be able to take things back and now you think there should be apologies?

Rhywden

@Gribnit Well, I for myself think there should be cake.

topspin

@Rhywden said in Java is on crack:

@Gribnit Well, I for myself think there should be cake.

All I have is a guillotine.

Rhywden

@topspin said in Java is on crack:

@Rhywden said in Java is on crack:

@Gribnit Well, I for myself think there should be cake.

All I have is a guillotine.

A small one, I hope?

[Video: Rowan Atkinson: Toby the Devil - We Are Most Amused and Amazed (01:52 to 05:01), Comedy Centre]

topspin

@Rhywden (Un)fortunately, I seem to fall into several categories.

Gribnit

@levicki Were you talking to @Gąska ?

dkf

@Rhywden said in Java is on crack:

Well, I for myself think there should be cake.

Too bad. I've already eaten it with my coffee.

Luhmann

@levicki said in Java is on crack:

takes all my replies out of context in which I meticulously place them, and then replies to that with the most nonsensical whataboutism.

Welcome to the TDWTF!

Gąska

@levicki said in Java is on crack:

he is wrong because he always takes all my replies out of context in which I meticulously place them

No, that's not it. You're just wrong. In the very specific context of HTTP servers that serve complete copies of files from the filesystem (and you spend so much effort underlining that you're talking about this one specific thing and nothing else), you are wrong that just updating the big three - Apache, Nginx and IIS - is enough to adopt a new protocol. Especially when you look at HTTP servers that serve complete copies of files from the filesystem and host things other than websites. The big three are widely used, they might even be a majority, but they're far from being the only things there. There are thousands of custom-built HTTP servers ("everyone who rolled their own web server using C# HttpListener or whatever", as you put it - a very common thing in the software world) that are mission-critical for a large part of the IT industry. It would be like FAT on flash drives all over again - who cares about all the cool fancy extra metadata if the file has to go through (a storage medium|an intermediate server) that doesn't support it and it will all be silently erased by the time it reaches its destination?

heterodox

Girls, girls. You're both very pretty.

Now shut the fuck up.

Filed under: I'd be the best parent

Gąska

@levicki said in Java is on crack:

@Gąska said in Java is on crack:

Apache, Nginx and IIS - is enough to adopt a new protocol.

Except there is no new protocol -- HTTP servers and clients already support mime-types in headers, most servers already take a guess by consulting mime type database against filename extension (or even sniff file content using mime magic) before serving it. Whether client will trust that or put it back in the filesystem with the file on saving is another thing but it is far from impossible or impractical as you claim. On the contrary, it is trivial to add new headers with additional metadata if needed.

Adding is just half of the job. Or even 1% of the job. The other 99% is agreeing to make this header (or more likely, a brand new duplicate header) authoritative and preserved across downloads no matter how many times it journeys between computers.

I still maintain that the concern you are raising is bullshit whataboutism and it is totally irrelevant to the discussion on whether (additional) metadata for file transfers over the network is possible or not

Whether it's possible is not a discussion. Of course it's possible. It's just doing what is already being done, but in a more meaningful way. No one has even the tiniest doubt that it is possible. But whether it can be done in our world in a way that actually works in practice with mostly existing infrastructure - that's the important part. Sure, you can ignore this part of the discussion if you're not interested in it, and it's fine. But you haven't ignored it. You voiced a very strong opinion on it, actually, and called me wrong several times, sometimes in quite creative ways. And I was just replying to what you said.

so yes, you didn't read what we were discussing.

Do you want me to quote all the other people who talked about feasibility and not possibility in this topic before me - yourself included?

@Gąska said in Java is on crack:

who cares about all the cool fancy extra metadata

Those who care will quite obviously implement it, those who don't -- won't. Nobody will be holding a gun to their head if their current workflow works for them they don't need to do it.

Talk about taking out of context. Look: unless almost everybody implements it, it will not work, not even for yourself - unless you completely isolate your network from anyone who doesn't have it, because otherwise, even if the majority does implement it, there's still a huge chance the data will be lost in transit somewhere. And once you make metadata important, losing it in transit will be a huge problem.

Zecc

@levicki said in Java is on crack:

mime magic

Gąska

@levicki said in Java is on crack:

@Gąska I disagree with the high-lighted part:

And let me pull a quote that shows @Bulb is aware things can be changed (to which you have replied BTW, and your reply focused solely on disputing the number of servers that would have to be updated):

@Bulb said in Java is on crack:

Also all the zillions of servers would have to be taught to read them from the filesystem and all the bajillions of clients taught to save it on their side.

"Have to be taught." As in, it's work, not that it's impossible.

dkf

@levicki said in Java is on crack:

Except there is no new protocol -- HTTP servers and clients already support mime-types in headers, most servers already take a guess by consulting mime type database against filename extension (or even sniff file content using mime magic) before serving it.

FWIW, if you're actually interested in doing MIME type sniffing, the best implementation is Apache Tika. It's a total beast, very expensive to set up in terms of resources and deployment complexity, but it does a much better job than pretty much everything else, handling everything common and masses of more obscure formats too. It's recommended for those occasions when you really are getting crap dumped on you by users without useful existing metadata and you've got to just recover what you can from the wreckage. You know, exactly what happens with user data every day all day long…
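To make the two approaches mentioned in the quote concrete, here is a toy sketch of content sniffing with a fallback to the extension database. It is nowhere near what Tika does: the signature table is a tiny hand-picked sample of well-known magic numbers, and the function name `sniff_mime_type` is made up for this example.

```python
import mimetypes

# A handful of well-known file signatures; real sniffers know thousands.
_MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x1f\x8b", "application/gzip"),
]


def sniff_mime_type(filename: str, head: bytes) -> str:
    """Guess a MIME type from the first bytes, then from the file extension."""
    for magic, mime in _MAGIC:
        if head.startswith(magic):
            return mime
    # No signature matched: fall back to the extension-based database.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


if __name__ == "__main__":
    # A PDF that somebody renamed to .txt is still detected as a PDF.
    print(sniff_mime_type("report.txt", b"%PDF-1.7 ..."))
```

Note that the content check wins over the extension here; which of the two should be authoritative is exactly the kind of thing that would have to be agreed on.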

Captain

@dkf said in Java is on crack:

ballache

Thought that was a french word I needed to learn for a moment...

Bulb

@levicki said in Java is on crack:

This is about direct client / server transfer, not sending files via email where content passes over dozens of servers, some of which are still using 7-bit ASCII encoding.

No, it ISN'T. It was always about each and every transfer, everywhere (including that email content, but that actually does have content-type—except when the attachment is .zip, which it almost always is, renamed to .piz, because otherwise the “spam filter” installed by Mordac will drop it).

If it is not about every transfer it is pointless, because the metadata will get dropped sooner or later and will have to be reconstructed. So you have to be able to reconstruct it, and then you can just always do that and not bother with storing it.

Applications don't have to copy anything around unless they are file managers

Many applications move files around and are not what one would normally call file managers.

regular applications just need to properly tag files on creation / saving.

Exactly. Each and every one of them. Including those that are still in use 20 years after their sources have been eaten by the Grue and their author has moved away and become a hermit somewhere in the Himalayas.

It is already happening on the Internet all the time.

And broken rather often.

HTTP transfer

Is still just a fraction of all transfers that are happening. Yes, most of the new stuff is over HTTP, but it is not the only protocol. And then a lot of things are technically over HTTP, but actually something different wrapped in BOSH. Or not even that, because it is over WebSockets, which are an upgrade from HTTP.

Also don't forget that many transfers are over HTTP, but wrapped in .zip, .tar.gz or other kind of archive and those don't have that metadata.

is not just a stream of bytes but also headers which already pass metadata including how long the stream of bytes is, what is the content-type, what is the encoding

HTTP does have headers, but they are not metadata of the content, they are metadata of the transfer itself. The content-type is there basically only for content negotiation. For actual downloads much of the time the server simply slaps in application/octet-stream and calls it a day.

what is the destination file name

None of the server's business, actually.

Transfer methods that for whatever reason don't go beyond that are minority compared to the rest and can be fixed if necessary.

You are thinking of a BFU downloading files from the internet. I am thinking of every internal transfer in various systems, including between applications in one device. No longer a minority. Downloading from the internet is the minority. And even in the case of a BFU downloading files, the metadata is often not there once you take into account encapsulation in zip.

@Gąska said in Java is on crack:

@Bulb said in Java is on crack:

Also all the zillions of servers would have to be taught to read them from the filesystem and all the bajillions of clients taught to save it on their side.

"Have to be taught." As in, it's work, not that it's impossible.

I am more and more convinced it is an idiotic idea though, and not due to the amount of work it would involve, but principally.

dkf

@Bulb said in Java is on crack:

Or not even that, because it is over WebSockets, which are allegedly an upgrade from HTTP.

FTFY

boomzilla

@levicki said in Java is on crack:

Why are you so much against progress? Why are you holding everyone back with your ancient bullshit tech?

He likes to troll @blakeyrat.

Gąska

@levicki said in Java is on crack:

@Bulb Why are you so much against progress? Why are you holding everyone back with your ancient bullshit tech?

Because your imagined newfangled bullshit tech isn't any better. Just keep important metadata in the file itself. It has worked fine for decades and there's zero gain in moving it out of file.

Here's hoping that they decide to implement things which I suggested one day just so I can see yours and Gąska's heads explode from fury for having to eat crow and adapt your own projects to it.

And millions of IT professionals spending the next two decades swearing that their files lose metadata all the time and they have to fix them either manually or by deploying auxiliary tools that fill in missing metadata automatically (and overwrite any existing, possibly wrong metadata that's already there and has possibly been corrupted - because better safe than sorry), defeating the entire purpose of attaching metadata in the first place - because having things work at all is more important than having things work nicely.

MrL

@Gąska said in Java is on crack:

And millions of IT professionals spending the next two decades swearing that their files lose metadata all the time and they have to fix them either manually or by deploying auxiliary tools that fill in missing metadata automatically (and overwrite any existing, possibly wrong metadata that's already there and has possibly been corrupted - because better safe than sorry), defeating the entire purpose of attaching metadata in the first place - because having things work at all is more important than having things work nicely.

Why do you hate ~~progress~~job security?

heterodox

@levicki said in Java is on crack:

@Bulb Why are you so much against progress? Why are you holding everyone back with your ancient bullshit tech?

"Move fast and break things" doesn't work well when applied on a global scale. It makes people hate using computers.

You can "progress" as far as you want, as long as you have the understanding that no one's going to use your systems and they're not going to be interoperable with any other systems (see also: 90% of CS students' projects and research).

Rhywden

@Gąska said in Java is on crack:

@levicki said in Java is on crack:

@Bulb Why are you so much against progress? Why are you holding everyone back with your ancient bullshit tech?

Because your imagined newfangled bullshit tech isn't any better. Just keep important metadata in the file itself. It has worked fine for decades and there's zero gain in moving it out of file.

Yeah, only move the unimportant data into the filesystem, like file length or access rights!

Gąska

@Rhywden file's length can always be recovered by looking at how long the file is (that's why HTTP downloads work even with missing Content-Length). Access rights don't describe file content.
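A small sketch of what "looking at how long the file is" means for a stream: without a declared length, the receiver just reads until end-of-stream and counts. In HTTP/1.x, a response with neither Content-Length nor chunked encoding ends when the server closes the connection. The helper name `read_until_eof` is made up for this example, and `io.BytesIO` stands in for the network stream.

```python
import io


def read_until_eof(stream, chunk_size: int = 65536) -> bytes:
    """Read a stream of unknown length until it reports end-of-stream."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means the other side is done
            break
        chunks.append(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    body = read_until_eof(io.BytesIO(b"x" * 100_000))
    print(len(body))  # the length is recovered from the data itself
```

The caveat is that a connection dropped halfway looks the same as a finished transfer, which is one reason an explicit length is still sent when it is known.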

Rhywden

@Gąska Ah, an aficionado of Schlemihl!

Gąska

@Rhywden ja ni panimaju.

HardwareGeek

@heterodox said in Java is on crack:

It makes people hate using computers.

It makes people ~~smarter~~wiser?

dkf

@heterodox said in Java is on crack:

see also: 90% of CS students' projects and research

I didn't know it was as low as 90%.

dkf

@Gąska said in Java is on crack:

file's length can always be recovered by looking at how long the file is

It's critical metadata (at the filesystem level, it's mandatory), so it is often encoded multiple times (e.g., as an explicit length and as the length of the stream). Most metadata is much more fragile.

Bulb

@levicki said in Java is on crack:

Why are you so much against progress?

I am not against progress. I am against this particular idea, because it is crap. In large part because files are not strongly typed.

Gąska

@Bulb said in Java is on crack:

In large part because files are not strongly typed.

Most are.

Bulb

@Gąska Those that are generally have a header from which it can be reliably detected too.