The absolute state of web storage protocols

Bulb

@Arantor said in The absolute state of web storage protocols:

That's the thing about webdevs: TCP is a level we almost never touch. It's handled one level below us, so dealing with TCP reconnects and acknowledgements is almost exclusively a library problem. Which is why, for webdevs, this is a whole class of problem they're unfamiliar with.

The connecting and disconnecting itself is handled by the library, but it does not, and cannot, abstract away the failures. But failures, especially the spurious ones, are rare in development environments, so webdevs are still unfamiliar with them and very rarely handle them correctly.

In particular, TCP is prone to this “congestion controlling itself into oblivion”. When a TCP connection loses a packet, it assumes the loss was caused by congestion and slows itself down. But real networks have other sources of packet loss, and sometimes a hiccup causes a bunch of packets to be lost; then the network is fine again, but some TCP connections will now wait a long time before trying again.
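To put rough numbers on "a long time", here is a minimal sketch of just the retransmission-timer backoff, assuming RFC 6298-style behaviour: an initial retransmission timeout (RTO) of 1 s that doubles after every consecutive timeout, capped at 60 s (RFC 6298 only requires a cap, if any, to be at least 60 s; real stacks differ). The function name is made up, and RTT estimation, fast retransmit and congestion-window effects are ignored.

```python
def retransmission_schedule(initial_rto=1.0, max_rto=60.0, timeouts=8):
    """Return (rto, seconds_since_loss) for each consecutive timeout.

    Models only RFC 6298-style exponential backoff: each time the
    retransmission timer expires, the segment is resent and the timer
    doubles, up to max_rto.
    """
    schedule = []
    rto, elapsed = initial_rto, 0.0
    for _ in range(timeouts):
        elapsed += rto  # the timer expires and a retransmission goes out
        schedule.append((rto, elapsed))
        rto = min(rto * 2, max_rto)
    return schedule


for rto, since_loss in retransmission_schedule():
    print(f"timer {rto:4.0f} s -> retransmit at {since_loss:5.0f} s")
```

Under these assumptions, a hiccup lasting 35 s loses the retransmissions sent at 1, 3, 7, 15 and 31 s, and the next one only goes out 63 s after the original loss, almost half a minute after the network came back.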

When it happens to a page load, that page suddenly loads much slower; you hit refresh and it loads fine. But when it happens to a fetch from JavaScript, most web apps become unresponsive, and there's no obvious way to make them retry.

And it can't be handled by generic timeouts. Those have to be long enough for the cases when the network is just genuinely slow. There would have to be some fairly clever adaptive retry policy and most web apps don't even have a dumb one.
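Even the dumb policy would help. A minimal sketch of one (not the clever adaptive kind), assuming the request is safe to repeat and that `request` is a caller-supplied function which enforces its own short per-attempt timeout; `call_with_retries` and `flaky` are hypothetical names. It uses capped exponential backoff with random jitter.

```python
import random
import time


def call_with_retries(request, max_attempts=5, base_delay=0.5, max_delay=8.0,
                      retry_on=(OSError,), sleep=time.sleep, rng=random.random):
    """Call request() until it succeeds or max_attempts is used up.

    request is assumed to enforce its own per-attempt timeout and to raise
    one of retry_on (ConnectionError and TimeoutError are OSError
    subclasses) on a transient failure. Between attempts it sleeps a random
    time in [0, min(max_delay, base_delay * 2**attempt)) ("full jitter"),
    so clients hit by the same hiccup don't all retry in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return request()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the UI
            sleep(rng() * min(max_delay, base_delay * 2 ** attempt))


# Usage with a hypothetical request that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated network hiccup")
    return "ok"


print(call_with_retries(flaky, sleep=lambda seconds: None))  # ok
```

The `sleep` and `rng` parameters exist only so the delays can be checked without waiting; the important design choice is that the last failure is re-raised, so the app can show an error instead of hanging silently.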

This is certainly the problem this group of people I mentioned ran into. I wouldn't describe them as coworkers because they're not, they're a group of #buildinpublic wingnuts on Twitter who declare that the technology is broken because they don't know how to use it correctly.

Yeah, the ones I know similarly declared ZMQ is broken because they didn't understand how to use it correctly either.

Though very often even implementing long-polling correctly is too hard (because they assume it will behave like a full-duplex stateful connection until they realise it isn't), and they end up implementing short polling until the backend folks start complaining about how much extra load the constant reconnections and re-establishing of state produce.

The front-end can only use what the back-end provides, no?

Here I'm calling front-end the part that runs in the browser, and the back-end the part that runs on the web server. We write our front-ends fully client-side these days, so we don't have any other components. When you have a server-side-rendered web app, that server decides how to do updates.

But then it should still be possible to use a different polling style between the browser and the web server than between the web server and the API server(s) behind it. The web server just needs a session cache where it can keep the API connection(s) open for half a minute or a minute, so it doesn't have to re-establish the state for the active connections all the time, even if the client uses stateless short polls.
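A minimal sketch of such a session cache, assuming a hypothetical `connect(session_id)` factory that opens the upstream API connection and returns an object with an optional `close()` method; the 60 s TTL is the "half a minute or a minute" from the post. It is single-threaded and evicts idle entries lazily on the next lookup; a real web server would need locking and probably a background sweep.

```python
import time


class UpstreamSessionCache:
    """Keep one upstream API connection per client session alive for ttl
    seconds after its last use, so stateless short polls from the browser
    reuse it instead of re-establishing state on every request.
    Single-threaded sketch: no locking, lazy eviction on lookup."""

    def __init__(self, connect, ttl=60.0, clock=time.monotonic):
        self._connect = connect   # hypothetical: session_id -> connection
        self._ttl = ttl
        self._clock = clock
        self._entries = {}        # session_id -> (connection, last_used)

    def get(self, session_id):
        now = self._clock()
        self._evict_idle(now)
        entry = self._entries.get(session_id)
        conn = entry[0] if entry else self._connect(session_id)
        self._entries[session_id] = (conn, now)  # refresh last-used time
        return conn

    def _evict_idle(self, now):
        for session_id, (conn, last_used) in list(self._entries.items()):
            if now - last_used > self._ttl:
                del self._entries[session_id]
                close = getattr(conn, "close", None)
                if close is not None:
                    close()
```

Each short poll from the browser calls `cache.get(session_id)`; only a session that has been idle for longer than the TTL pays for a fresh upstream connection.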

Arantor

@Bulb said in The absolute state of web storage protocols:

webdevs are still unfamiliar with them and very rarely handle them correctly

Web devs rarely test anything other than the happy path in general.

web apps don't even have a dumb one.

Same point.

Here I'm calling front-end the part that runs in the browser, and the back-end the part that runs on the web server.

Sure, that's the standard definition, but you're also making the assumption that both sides of the equation are equally competent, and that both sides of the equation talk to each other. My experience is that neither of these are ever fully true.

There's an awful lot of mysticism out there about how any of this works outside of dev environments, and frankly it's a miracle that most of it works 'as well as it does' for whatever definition that currently has.

But my experience of frontend folks in general is that they want to be wilfully ignorant of the backend wherever possible and just want to fling shit over the fence and into the Somebody Else's Problem field that sits between frontend and backend, rather than doing something ridiculous like talking to backend about how to best implement this.

But then it should still be possible to use different polling style between the browser and web server and between the web server and api server(s) behind it.

A lot of this really depends, on some level, on what you're implementing it in. If you're implementing this in PHP, good fucking luck to you unless you're exposing PHP directly to the web rather than through Apache/nginx. PHP implementations of websockets do exist, for example, but they expect to be the direct endpoint of the connection, precisely because they need to do funky things that are 'unexpected' from Apache's or nginx's perspective.

This is, incidentally, one of the reasons Node has made headway, in that it is supposed to be better about this sort of thing...

Bulb

@Arantor said in The absolute state of web storage protocols:

Sure, that's the standard definition, but you're also making the assumption that both sides of the equation are equally competent, and that both sides of the equation talk to each other. My experience is that neither of these are ever fully true.

They have the daily sit-down together, but yeah, we've had our share of crappy front-end devs. For about a year we had one who made basically no progress in that time and was eventually let go. Of the two we have now, one seems to care and is promising, but isn't very experienced yet, and the other still gets lost quite easily. And there's one who helps us part time and usually fixes the shit the second one fumbled.

But my experience of frontend folks in general is that they want to be wilfully ignorant of the backend wherever possible and just want to fling shit over the fence and into the Somebody Else's Problem field that sits between frontend and backend, rather than doing something ridiculous like talking to backend about how to best implement this.

Yeah, it's always the back-end developer who discusses the design with the architect and implements the API first, and only then do the front-end devs do anything.

If you're implementing this on PHP

Our back-ends are usually Java or C#. Since the front-ends are fully client-side, the server only needs to talk JSON and fortunately that isn't something someone would suggest PHP for.

We still expose it through nginx, but that just acts as a proxy and the nginx proxy module seems to handle websockets just fine.

This is, incidentally, one of the reasons Node has made headway, in that it is supposed to be better about this sort of thing...

The main benefit of Node here is that it has “stackless” async, so it uses fewer resources for an open but otherwise idle connection. I'm not sure about Java, but .NET also has “stackless” async, in a similar way to JavaScript. And we aren't using websockets or polling in the Java server at all, only in the .NET one.

Watson

@Arantor said in The absolute state of web storage protocols:

frankly it's a miracle that most of it works 'as well as it does' for whatever definition that currently has.

Twenty-first Law of Systemantics.

dkf

@Bulb said in The absolute state of web storage protocols:

We still expose it through nginx, but that just acts as a proxy and the nginx proxy module seems to handle websockets just fine.

There's a few tweaks needed to the standard config, but those are well known; you put them in and it works. (I've forgotten what exactly they are, but I remember doing them; it's been a while.)

BernieTheBernie

@dkf said in The absolute state of web storage protocols:

I've forgotten what exactly they are, but I remember doing them; it's been a while.

You forgot to mention your here.

dkf

@BernieTheBernie More like "I cut-n-paste something from a website that looked likely, and who remembers that?"

BernieTheBernie

@dkf Ah: you copied code from a random website and tested it in production.
This is the way!

dkf

@BernieTheBernie said in The absolute state of web storage protocols:

@dkf Ah: you copied code from a random website and tested it in production.
This is the way!

Technically, it was tested on the testing service. And then transferred directly to production once it was found to have worked.

Bulb

@dkf said in The absolute state of web storage protocols:

standard config

I don't configure nginx directly, I configure it through the ingress-nginx wrapper. That has its own defaults.

sockpuppet7

use IPFS

Bulb

Apache is working on a library to support a bunch of the different storages (filesystem and key/value):

Apache OpenDAL™

That should help some.

Gustav

@Bulb can't wait for that ACE exploit caused by an on-by-default option to dynamically resolve JNDI expressions in the query key

Bulb

@Gustav Fortunately, this time they are implementing it in Rust, where that kind of thing does not tend to be on by default (and usually isn't even implemented).

Gustav

@Bulb wait what? Apache isn't Java-only shop anymore? I'm getting old.

Bulb

@Gustav They never were. After all, their first project, the Apache HTTP server, is written in C. And over time they adopted a lot of open-source projects in all sorts of weird languages. Java is still their dominant language, but it isn't the only one.

dkf

@Bulb said in The absolute state of web storage protocols:

@Gustav Fortunately this time they are implementing it in Rust where that kind of things does not tend to be on-by-default (and usually even not implemented).

There will still be the other root cause of the log4j problem potentially about: reparsing things that shouldn't be reparsed. That's not a language fault (unless the language is especially bad) but is definitely possible in user code or library code. SQL injection is an example of this sort of thing.
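SQL injection makes the "reparsing" point concrete: splicing a value into the query text makes the database parse data as code, while a bound parameter travels alongside an already-parsed statement. A minimal sketch with Python's built-in `sqlite3`; the table and the hostile value are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

hostile = "nobody' OR '1'='1"

# Wrong: the value becomes part of the SQL text and is parsed as code.
spliced = f"SELECT name FROM users WHERE name = '{hostile}'"
print(conn.execute(spliced).fetchall())  # every row comes back

# Right: the statement is parsed once; the value is bound as plain data.
bound = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,))
print(bound.fetchall())  # []
```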

Bulb

@dkf said in The absolute state of web storage protocols:

There will still be the other root cause of the log4j problem potentially about: reparsing things that shouldn't be reparsed. That's not a language fault (unless the language is especially bad) but is definitely possible in user code or library code. SQL injection is an example of this sort of thing.

Yeah, putting together a URL so that all the parts are URL-encoded the same number of times is very difficult.

Case in point: as we work with Azure Blob storage, we had a bug where some addresses couldn't be resolved. The URL is in the form https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes. Some utility for composing URLs, in the .NET library itself, implicitly URL-encodes the path. But the urlencode function, being designed for query parameters, also escapes / as %2F. And while the server will treat https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes and https://accountname.blob.core.windows.net/container/path%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes as equivalent, it will barf on https://accountname.blob.core.windows.net/container%2Fpath%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes, i.e. if the / after the first element is escaped. The behaviour makes sense in context, but making sure it gets passed correctly is …
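The same trap in Python terms, as a minimal sketch: `urllib.parse.quote` with `safe=''` stands in for an escaper that treats everything like a query parameter and turns every `/` into `%2F`, producing the form the blob endpoint rejects. Escaping each segment separately and joining with a literal `/` keeps the structure while still escaping characters such as `ą`. The helper names are hypothetical and the account name is the placeholder from the post.

```python
from urllib.parse import quote

BASE = "https://accountname.blob.core.windows.net/"  # placeholder account


def naive_blob_url(container, blob_path):
    # Escapes "container/path" in one go, like a query-parameter value:
    # every "/" becomes "%2F", including the one after the container.
    return BASE + quote(f"{container}/{blob_path}", safe="")


def blob_url(container, blob_path):
    # Hypothetical helper: escape each segment on its own, then join
    # with literal slashes so the hierarchy survives.
    segments = [container, *blob_path.split("/")]
    return BASE + "/".join(quote(segment, safe="") for segment in segments)


print(naive_blob_url("container", "path/to/ą.txt"))
# https://accountname.blob.core.windows.net/container%2Fpath%2Fto%2F%C4%85.txt
print(blob_url("container", "path/to/ą.txt"))
# https://accountname.blob.core.windows.net/container/path/to/%C4%85.txt
```

Python's `quote` actually defaults to `safe='/'`, which is the path-friendly behaviour; `safe=''` is used here only to mimic the escape-everything function described above.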

Kamil Podlesak

@dkf said in The absolute state of web storage protocols:

@Bulb said in The absolute state of web storage protocols:

@Gustav Fortunately this time they are implementing it in Rust where that kind of things does not tend to be on-by-default (and usually even not implemented).

There will still be the other root cause of the log4j problem potentially about: reparsing things that shouldn't be reparsed. That's not a language fault (unless the language is especially bad) but is definitely possible in user code or library code. SQL injection is an example of this sort of thing.

Obligatory note: log4j does not have such a problem at all. log4j2 does, which is a completely different library implemented from scratch with lots of new, shiny, "useful" features like special processing of the messages.

LaoC

@Bulb said in The absolute state of web storage protocols:

But while the server will treat https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes and https://accountname.blob.core.windows.net/container/path%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes as equivalent, it will barf on https://accountname.blob.core.windows.net/container%2Fpath%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes, i.e. if the / after the first element is escaped. The behaviour makes sense in context, but making sure it gets passed correctly is …

Is that "makes sense" as in "it's weirdness that you'd expect if you know how it's implemented" or some sense I don't understand?

Bulb

@LaoC Yes, the former.

PleegWat

@Bulb said in The absolute state of web storage protocols:

@dkf said in The absolute state of web storage protocols:

There will still be the other root cause of the log4j problem potentially about: reparsing things that shouldn't be reparsed. That's not a language fault (unless the language is especially bad) but is definitely possible in user code or library code. SQL injection is an example of this sort of thing.

Yeah, putting together a URL so that all the parts are URL-encoded the same number of times is very difficult.

Case in the point, as we work with the Azure Blob storage, we had a bug that some address couldn't be resolved. Well, the URL is in the form https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes. Well, some utility for composing URL, in the .net library itself, implicitly urlencodes the path. But the urlencode function, being designed for query parameters, also escapes / as %2F. But while the server will treat https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes and https://accountname.blob.core.windows.net/container/path%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes as equivalent, it will barf on https://accountname.blob.core.windows.net/container%2Fpath%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes, i.e. if the / after the first element is escaped. The behaviour makes sense in context, but making sure it gets passed correctly is …

Only one of those can be correct, and the fact the other works is a bug. The only reason nobody has noticed this yet is that they haven't tried a container whose name includes the substring %25.
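The `%25` test works because it exposes an extra decoding pass: a name that literally contains the text `%2F` is sent as `%252F`, and only a layer that decodes twice turns it into a real `/`. A minimal sketch with Python's `urllib.parse`; the name is made up.

```python
from urllib.parse import quote, unquote

name = "a%2Fb"               # a made-up name that literally contains "%2F"
wire = quote(name, safe="")  # correctly encoded once: "a%252Fb"

once = unquote(wire)   # "a%2Fb": the name we meant
twice = unquote(once)  # "a/b": a different, two-segment name

print(wire, once, twice)  # a%252Fb a%2Fb a/b
```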

Bulb

@PleegWat said in The absolute state of web storage protocols:

the fact the other works is a bug

I think originally the encoding was only meant for the query part of the URL, because you can just name your documents to fit in the allowed character set. But then people wanted file names in languages that use characters outside ASCII, so web servers started URL-decoding the file names too.

And of course they started doing it inconsistently. If I read the standard correctly (I'm not sure I do, because it's a mess), the path is supposed to be split into segments on / before decoding, and the segments then resolved independently, so %2F shouldn't be equivalent to / (and should usually not occur at all, since a file or directory name can't contain a / on any common system). But since common operating systems also don't usually have an API that takes the path as a list rather than as a string of /-separated components, the two do end up being equivalent when the path is looked up on an actual filesystem.
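The two orders give different answers for the same encoded path. A minimal sketch with hypothetical helper names; real servers also have to deal with dot-segments, empty segments and invalid escapes, which are ignored here.

```python
from urllib.parse import unquote


def split_then_decode(path):
    # Segment boundaries are fixed by the literal "/" in the encoded form,
    # so "%2F" survives as a "/" inside a single segment.
    return [unquote(segment) for segment in path.lstrip("/").split("/")]


def decode_then_split(path):
    # Decoding first erases the difference between "/" and "%2F".
    return unquote(path).lstrip("/").split("/")


p = "/container/dir%2Ffile.txt"
print(split_then_decode(p))  # ['container', 'dir/file.txt']
print(decode_then_split(p))  # ['container', 'dir', 'file.txt']
```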

LaoC

@Bulb said in The absolute state of web storage protocols:

@PleegWat said in The absolute state of web storage protocols:

the fact the other works is a bug

I think originally the encoding was only meant for the query part of the URL, because you can just name your documents to fit in the allowed character set. But then people wanted file names in languages that use characters out of ASCII, so web servers started url-decoding the file names too.

EBCDIC systems have been around forever.
The way I read it, you could always percent-encode whatever you like unless it's a URI component separator.

And of course started to do it inconsistently. If I read the standard correctly—which I'm not sure, because it's a mess—the path is supposed to be split to segments on / before decoding, and then the segments resolved independently, so the %2F shouldn't be equivalent to /

I think the separation refers only to URI components where "path" is a single component.

(and should usually not exist, since a file or directory name can't contain a / on any common system), but since common operating systems also don't usually have an API that would take the path as a list rather than as a string of /-separated components, they do end up being equivalent when the path is looked up on an actual filesystem.

If some OS360 fossil is a cromulent platform to run web servers on, surely so is MacOS Classic

You could read the original RFC 1738 as saying you must encode slashes in paths, although I don't think anyone ever implemented it like this. On URIs in general it says

Octets must be encoded if they have no corresponding graphic
character within the US-ASCII coded character set, if the use of the
corresponding character is unsafe, or if the corresponding character
is reserved for some other interpretation within the particular URL
scheme.

Many URL schemes reserve certain characters for a special meaning:
their appearance in the scheme-specific part of the URL has a
designated semantics. If the character corresponding to an octet is
reserved in a scheme, the octet must be encoded. The characters ";",
"/", "?", ":", "@", "=" and "&" are the characters which may be
reserved for special meaning within a scheme. No other characters may
be reserved within a scheme.

And specifically on HTTP

Within the <path> and <searchpart> components, "/", ";", "?" are
reserved. The "/" character may be used within HTTP to designate a
hierarchical structure.

Yeah, it's an absolute fucking mess.
And then people advocate Everything-over-HTTP because "HTTP is a simple protocol, look I can use telnet!"

BernieTheBernie

@LaoC said in The absolute state of web storage protocols:

@Bulb said in The absolute state of web storage protocols:

But while the server will treat https://accountname.blob.core.windows.net/container/path/that/might/have/further/slashes and https://accountname.blob.core.windows.net/container/path%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes as equivalent, it will barf on https://accountname.blob.core.windows.net/container%2Fpath%2Fthat%2Fmight%2Fhave%2Ffurther%2Fslashes, i.e. if the / after the first element is escaped. The behaviour makes sense in context, but making sure it gets passed correctly is …

Is that "makes sense" as in "it's weirdness that you'd expect if you know how it's implemented" or some sense I don't understand?

That's how Azure writes the URL of a container in your storage account, plus the items in it. Take a look at one I mentioned elsewhere:
https://storagebernie.blob.core.windows.net/testpublic/IMGP8315.JPG
So, storagebernie is my storage, and testpublic is a container therein, and IMGP8315.JPG is an item therein.
A container is actually a flat structure: subdirectories don't really exist, but items with a slash in their names are usually displayed as if they were in subdirectories (and note: when you delete all the items in such a subdirectory, the subdirectory disappears too).

dkf

@LaoC Encoding / as %2F would be a way to get the slash in that segment of the path. It's permitted for a web app endpoint to consume the whole path following its name. That's what's happening in this case. Except then it is also doing %-decoding on that path and losing the difference between the encoded and unencoded parts.

This stuff is all a horrible ad hoc mess.

Bulb

@LaoC said in The absolute state of web storage protocols:

EBCDIC systems have been around forever.

It was never allowed in HTTP though. HTTP has been specified as using US-ASCII for the headers since the beginning.

The way I read it you always could percent-encode whatever you like unless it's an URI component separator.

You can percent-encode whatever you like, and the encoded form is not a separator …

I think the separation refers only to URI components where "path" is a single component.

No, it is defined as a sequence of /-separated segments.

RFC 3986 (which updates 1738 (not obsoletes, though it actually does … it's a Mess™)) says

Use of the slash character to indicate hierarchy is only required when a URI will be used as the context for relative references. (§3.3 Path)

When a URL is composed from a base and a relative URL, this happens on the encoded form, so stripping off components stops at /, but not at %2F.

But it does not say anything about how the server should treat it.
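The relative-resolution part is easy to check with Python's `urllib.parse.urljoin`, which works on the encoded base: the last segment is dropped back to the last literal `/`, and `%2F` does not count as one. The host and paths are made up.

```python
from urllib.parse import urljoin

base_literal = "https://example.com/a/b/c"
base_encoded = "https://example.com/a/b%2Fc"

print(urljoin(base_literal, "d"))     # https://example.com/a/b/d
print(urljoin(base_encoded, "d"))     # https://example.com/a/d
print(urljoin(base_encoded, "../d"))  # https://example.com/d
```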

@dkf said in The absolute state of web storage protocols:

@LaoC Encoding / as %2F would be a way to get the slash in that segment of the path. It's permitted for a web app endpoint to consume the whole path following its name. That's what's happening in this case. Except then it is also doing %-decoding on that path and losing the difference between the encoded and unencoded parts.

This stuff is all a horrible ad hoc mess.

It's even somewhat consistent, I'd say. When the segments are parsed into arguments for some kind of app, the path is almost always split first and then decoded, so encoding is a way to get a / into a specific segment; and when it is passed to a filesystem, the path is resolved as a whole after decoding, so there / and %2F are equivalent.

The blob storage is a bit unusual in that the first / separates components for the app, so it must not be escaped, but the rest is the key to the almost-but-not-exactly filesystem, so there the escaping does not matter.
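A rough model of that behaviour, as a sketch only: this is an assumption inferred from the behaviour described in the thread, not Azure's actual code. The first literal `/` is a structural separator for the service, and everything after it is decoded as a whole into a flat blob key, where `/` and `%2F` end up identical. The function name is hypothetical.

```python
from urllib.parse import unquote


def parse_blob_path(path):
    """Hypothetical model: split off the container at the first literal
    "/", then decode the remainder as one flat blob key."""
    container, sep, rest = path.lstrip("/").partition("/")
    container = unquote(container)
    if not sep or "/" in container:
        # "container%2Fpath..." has no literal "/" after the container.
        raise ValueError(f"no valid container in {path!r}")
    return container, unquote(rest)


print(parse_blob_path("/container/path/to/f"))      # ('container', 'path/to/f')
print(parse_blob_path("/container/path%2Fto%2Ff"))  # ('container', 'path/to/f')
```

Calling it with `/container%2Fpath%2Fto%2Ff` raises `ValueError`, matching the URL form the server rejects.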

My original problem is actually more that .нет only has one URL-escaping function (and so do many other languages), and it escapes / unconditionally, so you have to watch out when assembling the path for a file whose name might need escaping.

@LaoC said in The absolute state of web storage protocols:

Yeah, it's an absolute fucking mess.
And then people advocate Everything-over-HTTP because "HTTP is a simple protocol, look I can use telnet!"

No, you can't. You can use netcat or socat, but telnet actually has some meta-commands that make it not fully 8-bit transparent. What you can use telnet for is FTP, because FTP is explicitly built on top of the telnet protocol.
Of course these days you rather have to use openssl s_client, because everything is wrapped in TLS.
You can only do that with the legacy HTTP/0.9 through HTTP/1.1. HTTP/2 is not textual and HTTP/3 is not even TCP any more.
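The transparency problem with telnet is concrete: in the Telnet protocol (RFC 854) byte 255 is IAC, "interpret as command", so a client speaking real Telnet must send a literal 0xFF data byte as two bytes, and treats incoming IAC sequences as commands rather than data. A minimal sketch of the sending side only; the function name is hypothetical, and the CR handling RFC 854 also requires (CR must be followed by LF or NUL) is left out.

```python
IAC = 0xFF  # Telnet "Interpret As Command" byte (RFC 854)


def telnet_escape(data: bytes) -> bytes:
    # A literal 0xFF in the data stream must be doubled to IAC IAC,
    # otherwise the peer would take it as the start of a command.
    return data.replace(bytes([IAC]), bytes([IAC, IAC]))


request = b"GET / HTTP/1.0\r\n\r\n"
print(telnet_escape(request) == request)  # True: plain ASCII is unaffected
print(telnet_escape(b"\x01\xff\x02"))     # b'\x01\xff\xff\x02'
```

So a plain-ASCII HTTP/1.x request usually survives a Telnet client, while a binary response body containing 0xFF bytes can be mangled on the way back.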

PleegWat

@Bulb said in The absolute state of web storage protocols:

telnet actually has some meta-commands that make it not fully 8-bit transparent.

Some clients may turn those off if they detect they're not talking to a telnet server.

dkf

@Bulb said in The absolute state of web storage protocols:

It's even somewhat consistent I'd say. When the segments are parsed to arguments to some kind of app, it is almost always split first, then decode—so encoding is a way to get a / into specific segment—and if it is passed to a filesystem, the path is resolved as a whole after decoding, so there / and %2F are equivalent.

The blob storage is a bit unusual in that the first / separates components for the app, so it must not be escaped, but the rest is the key to the almost-but-not-exactly filesystem, so there the escaping does not matter.

The point is that at least some web app backends can take part of the path as effectively a parameter, rather than using the query string or the uploaded document for it. That's fine, but requires that it behaves like a resource (indeed, that's why you'd do this, to make an endpoint resource that behaves like a family of resources). You probably shouldn't be percent-encoding the /s in that path section on the client side.

My original problem is actually more that .нет only has one url escaping function (and so do many other languages) and it escapes / unconditionally, so you have to watch out when assembling the path for a file with name that might need escaping.

It's probably more that it's seeing "string I'm putting in the URL has a 'forbidden' character in it; I must encode!" Assume that things are dumb and written that way by someone not aware of all the subtleties, and you won't be disappointed.

Bulb

@dkf said in The absolute state of web storage protocols:

@Bulb said in The absolute state of web storage protocols:

It's even somewhat consistent I'd say. When the segments are parsed to arguments to some kind of app, it is almost always split first, then decode—so encoding is a way to get a / into specific segment—and if it is passed to a filesystem, the path is resolved as a whole after decoding, so there / and %2F are equivalent.

The blob storage is a bit unusual in that the first / separates components for the app, so it must not be escaped, but the rest is the key to the almost-but-not-exactly filesystem, so there the escaping does not matter.

The point is that at least some web app backends can take part of the path as effectively a parameter, rather than using the query string or the uploaded document for it. That's fine, but requires that it behaves like a resource (indeed, that's why you'd do this, to make an endpoint resource that behaves like a family of resources). You probably shouldn't be percent-encoding the /s in that path section on the client side.

It's quite common. In fact, it's standard for REST apps. And yes, I hope those first parse the path to parameters and then decode, which allows passing the special characters into the individual parameters and possibly avoids some re-parsing issues.

My original problem is actually more that .нет only has one url escaping function (and so do many other languages) and it escapes / unconditionally, so you have to watch out when assembling the path for a file with name that might need escaping.

It's probably more that it's seeing "string I'm putting in the URL has a 'forbidden' character in it; I must encode!" Assume that things are dumb and written that way by someone not aware of all the subtleties, and you won't be disappointed.

We knew there could be ‘forbidden’ characters like ą, so it might have been one of the team members who put in the quoting, but due to the way the Url logic works and the lack of an “encode-path” variant, it was in a very wrong place.

hungrier

@Arantor said in The absolute state of web storage protocols:

Apparently no-one had told them that 'if the connection stops you have to reconnect'

[image]

LaoC

@Bulb said in The absolute state of web storage protocols:

And then people advocate Everything-over-HTTP because "HTTP is a simple protocol, look I can use telnet!"

No, you can't. You can use netcat or socat, but telnet actually has some meta-commands that make it not fully 8-bit transparent. What you can use telnet for is FTP, because FTP is explicitly built on top of the telnet protocol.

Of course these days you rather have to use openssl s_client, because everything is wrapped in TLS.

You can only do that with the legacy HTTP/0.9 through HTTP/1.1. HTTP/2 is not textual and HTTP/3 is not even TCP any more.

I'm not saying it's a valid argument. Quite the opposite.

dkf

@Kamil-Podlesak said in The absolute state of web storage protocols:

@dkf said in The absolute state of web storage protocols:

@Bulb said in The absolute state of web storage protocols:

@Gustav Fortunately this time they are implementing it in Rust where that kind of things does not tend to be on-by-default (and usually even not implemented).

There will still be the other root cause of the log4j problem potentially about: reparsing things that shouldn't be reparsed. That's not a language fault (unless the language is especially bad) but is definitely possible in user code or library code. SQL injection is an example of this sort of thing.

Obligatory note: log4j does not have such problem at all. log4j2 does, which is a completely different library implemented from scratch with lots of new, shiny, "useful" features like special processing of the messages.

But you knew what I was talking about, which was my point...

Arantor

We need more products to have a verb name followed by ever more arbitrary digits and letters. Like the 4j is logical, the 2 a bit less so, but the next letter to add to the list is unclear. Maybe we need to mix it up and have adjectives.

Maybe log4j2hot2handle?

LaoC

@Arantor said in The absolute state of web storage protocols:

We need more products to have a verb name followed by ever more arbitrary digits and letters. Like the 4j is logical, the 2 a bit less so, but the next letter to add to the list is unclear. Maybe we need to mix it up and have adjectives.

Maybe log4j2hot2handle?

log4j6🦶↓

Bulb

@Kamil-Podlesak said in The absolute state of web storage protocols:

log4j2 does, which is a completely different library implemented from scratch with lots of new, shiny, "useful" features like special processing of the messages.

Apropos useful features: the only feature I want from a logging library is turning messages on or off by category and severity, and maaaybeee adding a timestamp. Then just dump it to stderr; either journald or containerd will pick it up from there and send it wherever, and that's none of the application's business.
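With Python's standard `logging` module, that wish list is roughly the entire configuration: per-category thresholds, a timestamp in the format string, and one handler writing to stderr. A minimal sketch; the category names `app.db`, `app.http` and `app.other` are hypothetical.

```python
import logging
import sys

# One handler, writing to stderr; journald/containerd pick it up from there.
# force=True (Python 3.8+) replaces any handlers configured earlier.
logging.basicConfig(
    stream=sys.stderr,
    level=logging.INFO,  # default threshold for every category
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    force=True,
)

# Per-category switches; child loggers such as "app.db.pool" inherit them.
logging.getLogger("app.db").setLevel(logging.DEBUG)
logging.getLogger("app.http").setLevel(logging.WARNING)

logging.getLogger("app.db").debug("query took %d ms", 12)  # emitted
logging.getLogger("app.http").info("GET /health")          # suppressed
logging.getLogger("app.other").info("uses the default")    # emitted
```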

loopback0

@Arantor said in The absolute state of web storage protocols:

We need more products to have a verb name followed by ever more arbitrary digits and letters. Like the 4j is logical, the 2 a bit less so, but the next letter to add to the list is unclear. Maybe we need to mix it up and have adjectives.

Maybe log4j2hot2handle?

log3jtokyodrift

Kamil Podlesak

@Bulb said in The absolute state of web storage protocols:

@Kamil-Podlesak said in The absolute state of web storage protocols:

log4j2 does, which is a completely different library implemented from scratch with lots of new, shiny, "useful" features like special processing of the messages.

Apropos useful features—the only feature I want from a logging library is turning messages on or off by a category and severity and maaaybeee adding timestamp.

Well, yes, log4j (1.x) still exists, and if someone really wants something newer, there are at least two projects created by people with similar expectations.

Then just dump it to stderr, either journald or containerd will pick it up from there and send it wherever, that's none of the application's business.

I would also add files; that is the safest. Right now I'm supporting some systems where logging goes to syslogd, which then forwards it to journald... and all the messages are either duplicated (OK, annoying, but I can live with that) or missing completely (that's worse...). TBH it's journald from 2015, because "enterprise" (read: the head cover with longer-wavelength reflective properties).

dkf

@Bulb said in The absolute state of web storage protocols:

Apropos useful features—the only feature I want from a logging library is turning messages on or off by a category and severity and maaaybeee adding timestamp. Then just dump it to stderr, either journald or containerd will pick it up from there and send it wherever, that's none of the application's business.

Adding where the logging message came from can be very helpful too. Plus doing some basic formatting of extra arguments when the message is going to be written and not just dropped (because the DEBUG level isn't enabled, etc.) but that's pretty much trivial.

Everyone thought that was what log4j2 was doing. What they didn't expect was that it could also look stuff up in JNDI (perhaps OK for things in the baseline template, though not something you ought to take advantage of otherwise) and would merrily do so by repeatedly applying its full template parsing engine until there was nothing left to substitute. Which included potentially calling out to JNDI.

Now, I wouldn't expect a Rust logging library to support JNDI (and I wouldn't normally expect a Java logging library to do so either) but the repeated parsing thing is entirely possible (it isn't exactly complicated to write a parse-and-substitute-until-nothing-changes loop) and still really bad because it still opens up various methods of attack through the logging system. Maybe they wouldn't cause a memory error, but they might still cause logging of wrong/misleading information. You don't want that. And that is why there should have been a fuss; looping substitution engines have totally different security properties to one-shot engines.
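A sketch of why the looping variant is worse, with hypothetical helpers `substitute_once` and `substitute_until_fixed_point` and a plain map standing in for log4j2's lookups (this is not log4j2's code). The one-shot pass copies a parameter's text verbatim; the loop re-scans it, so a `${...}` smuggled in through a logged value gets expanded.

```rust
use std::collections::HashMap;

// Replaces each `${key}` in `template` with `vars[key]` in a single left-to-right
// pass. Replacement text is copied verbatim and never re-scanned.
pub fn substitute_once(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("${") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let key = &after[..end];
                match vars.get(key) {
                    Some(value) => out.push_str(value),
                    // Unknown keys are left exactly as written.
                    None => {
                        out.push_str("${");
                        out.push_str(key);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: copy the remainder as-is.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

// The dangerous variant: re-run the one-shot pass until the output stops
// changing, so any `${...}` that arrived inside a *value* is expanded too.
// Capped at 16 rounds here; an uncapped loop can also be made to spin.
pub fn substitute_until_fixed_point(template: &str, vars: &HashMap<&str, &str>) -> String {
    let mut current = template.to_string();
    for _ in 0..16 {
        let next = substitute_once(&current, vars);
        if next == current {
            break;
        }
        current = next;
    }
    current
}
```

With `user` holding attacker-supplied text `${secret}`, the one-shot engine logs that text literally, while the looping engine resolves it to whatever `secret` looks up to.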

PleegWat

@dkf Recursive substitution can be useful but shouldn't be the default. There should be an explicit difference between 'insert this string' and 'substitute any placeholders in this format string, then insert the result', and in the latter case security scanners should trigger when the format string comes from an untrusted source.
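One way to make that distinction explicit, sketched in Rust with hypothetical names (`TrustedTemplate`, `Arg`, `render`) and `{}` as the only placeholder syntax: plain arguments are inserted verbatim, and only a `TrustedTemplate` gets expanded. Its constructor takes `&'static str`, so in ordinary code it can only be a literal from the source tree (`Box::leak` can still manufacture one, so this is a speed bump, not a proof).

```rust
// Hypothetical wrapper for format strings that come from the program itself.
// Taking `&'static str` means ordinary callers can only pass a literal.
#[derive(Clone, Copy)]
pub struct TrustedTemplate(&'static str);

impl TrustedTemplate {
    pub const fn new(s: &'static str) -> Self {
        TrustedTemplate(s)
    }
}

pub enum Arg {
    // "Insert this string": copied verbatim, placeholders inside it mean nothing.
    Plain(String),
    // "Expand this template, then insert the result": only trusted templates qualify.
    Nested(TrustedTemplate, Vec<Arg>),
}

// Fills each `{}` in `template` with the next argument, in order. Inserted text
// is never re-scanned; surplus `{}` slots are left as written and surplus
// arguments are ignored.
pub fn render(template: TrustedTemplate, args: &[Arg]) -> String {
    let mut pieces = template.0.split("{}");
    // `split` always yields at least one piece, possibly empty.
    let mut out = pieces.next().unwrap_or("").to_string();
    let mut args = args.iter();
    for piece in pieces {
        match args.next() {
            Some(Arg::Plain(text)) => out.push_str(text),
            Some(Arg::Nested(inner, inner_args)) => out.push_str(&render(*inner, inner_args)),
            None => out.push_str("{}"),
        }
        out.push_str(piece);
    }
    out
}
```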

HardwareGeek

@dkf said in The absolute state of web storage protocols:

they should have been smarter

That can be said about so many things.

dkf

@PleegWat said in The absolute state of web storage protocols:

@dkf Recursive substitution can be useful but shouldn't be the default.

It could be on for the base templates. That might even be a good idea; those should definitely come from the library, the application source tree, or the deployment config file, and can have a high degree of caching applied. It could perhaps also be applied after the message string is inserted, though that is likely a bad move, as the message could well contain string contents from uncontrolled sources (it's also a more expensive cache due to sheer size). But it definitely shouldn't occur after the parameter values are substituted; those are often from uncontrolled sources.
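A sketch of that staging, assuming a hypothetical `LineFormat` loaded from deployment config with `${key}` references and a single `%m` slot for the message (`%m` borrowed from log4j's pattern layout, the rest made up): the base template is expanded to a fixed point once at startup and cached, and per call the message literal has its `{}` slots filled with parameters in a single pass that never re-scans what it inserted.

```rust
use std::collections::HashMap;
use std::fmt::Display;

// Resolves `${key}` references between trusted config values, repeating until
// nothing changes (bounded at 16 rounds). Acceptable only because every value
// here comes from the deployment config, never from a log call.
fn expand_trusted(template: &str, config: &HashMap<&str, &str>) -> String {
    let mut current = template.to_string();
    for _ in 0..16 {
        let mut next = current.clone();
        for (key, value) in config {
            next = next.replace(&format!("${{{key}}}"), value);
        }
        if next == current {
            break;
        }
        current = next;
    }
    current
}

// Hypothetical cached line format: the expanded base template, split around
// its single `%m` message slot.
pub struct LineFormat {
    before: String,
    after: String,
}

impl LineFormat {
    // Runs once at startup, so the fixed-point expansion is paid for only once.
    pub fn load(base: &str, config: &HashMap<&str, &str>) -> Self {
        let expanded = expand_trusted(base, config);
        let (before, after) = expanded.split_once("%m").unwrap_or((expanded.as_str(), ""));
        LineFormat { before: before.to_string(), after: after.to_string() }
    }

    // Runs per call: the message literal's `{}` slots are filled with the
    // parameters in one pass, and the inserted text is never scanned again.
    pub fn line(&self, message: &'static str, params: &[&dyn Display]) -> String {
        let mut out = self.before.clone();
        let mut pieces = message.split("{}");
        out.push_str(pieces.next().unwrap_or(""));
        let mut params = params.iter();
        for piece in pieces {
            match params.next() {
                Some(param) => out.push_str(&param.to_string()),
                None => out.push_str("{}"),
            }
            out.push_str(piece);
        }
        out.push_str(&self.after);
        out
    }
}
```

Parameters only need to be `Display`, i.e. convertible to strings; nothing about their contents is ever interpreted.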

There should be an explicit difference between 'insert this string' and 'substitute any placeholders in this format string, then insert the result', and in the latter case security scanners should trigger when the format string comes from an untrusted source.

Templates should be from trusted sources only. So too should log messages, though that is all too often honoured in the breach (until it breaks, often when logging a set or map...). Substitution parameters are not trusted... except to be convertible to strings; they're potentially hostile data.

Bulb

@dkf said in The absolute state of web storage protocols:

Templates should be from trusted sources only.

The format strings in the Rust standard library only accept literals, and they are parsed and checked at compile time. But that's sometimes too strict, so of course there are libraries that process templates at runtime.
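For example, with a hypothetical `describe` function: the format string is a literal that the compiler parses and checks against the arguments, and whatever an argument contains is inserted as plain text.

```rust
// `format!` (like `println!` and the rest of the family) only accepts a string
// literal as its format string; the compiler parses it and checks it against
// the arguments, so a mismatch is a build error rather than a runtime surprise.
pub fn describe(user: &str, attempts: u32) -> String {
    // Inline `{name}` captures need Rust 1.58 or newer.
    format!("user {user} failed to log in {attempts} times")
}

// Neither of these compiles:
//   let t = String::from("user {}");
//   let s = format!(t, "bob");        // format string must be a literal
//   let s = format!("{} and {}", 1);  // two placeholders, one argument
```

User-supplied braces come out untouched, because the argument is only ever displayed, never parsed.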

PleegWat

@Bulb How the hell does that work with translation?

Gustav

@PleegWat

PleegWat

@Gustav Guess it was going to be that or "Of course you hardcode all your translations in the binary."

dkf

@Gustav Everyone gets the Elbonian version!

cvi

@PleegWat said in The absolute state of web storage protocols:

@Gustav Guess it was going to be that or "Of course you hardcode all your translations in the binary."

One binary per language.

dkf

@cvi said in The absolute state of web storage protocols:

One binary per ~~language~~ locale.

FTFY, maximising the amount of stupid work. (Also, you need to update the binary every time the timezone changes, a minimum of twice a year.)

Bulb

@PleegWat said in The absolute state of web storage protocols:

@Bulb How the hell does that work with translation?

There are two approaches:

1. Have different templating functions that do interpret templates at runtime.
2. Compile all the translations to macros that will expand to switches by language at the right places.

Both approaches have implementations. The latter makes compilation somewhat … slow, but it has the big benefit of verifying, at compile time, that all the translations expand the same placeholders.
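A sketch of the second approach, with a hypothetical `Lang` enum, a hypothetical `cannot_open!` message and hand-written translations (a real setup would generate such macros from translation files): every arm gets the same named arguments, and since `format!` rejects a named argument that the string never uses, a translation that drops or misspells a placeholder fails to compile.

```rust
// Hypothetical set of supported languages.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Lang {
    En,
    De,
}

// Hypothetical message macro: expands to a `match` on the language, with one
// compile-time-checked `format!` per translation. The arguments are evaluated
// once, before the match.
macro_rules! cannot_open {
    ($lang:expr, path = $path:expr, reason = $reason:expr) => {{
        let path = $path;
        let reason = $reason;
        match $lang {
            Lang::En => format!("Cannot open {path}: {reason}", path = path, reason = reason),
            Lang::De => format!(
                "{path} kann nicht geöffnet werden: {reason}",
                path = path,
                reason = reason
            ),
        }
    }};
}
```

The runtime alternative (the first approach) trades this check for the ability to load translations without rebuilding.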