What the Daily WTF?

aitap

≥1, ≤3, tea with or without milk.

aitap

@Tsaukpaetra said in The Official Status Thread:

Hey we can be sector buddies

Sure, why not?

384 Offline uncorrectable sectors
ST8000DM004-2CX188 <...> 8.00 TB

Wow, how old are those?

aitap

@anonymous234 And these ludicrous penny loafers. Triple pleated khakis? Preposterous! Nobody here has an eye for fashion!

aitap

If the client only Accept:s text/csv, an empty reply should suffice. Otherwise, if rendering a proper HTML error page is too much trouble, a plain text error message should be enough.

On second thought, perhaps I would try to redirect to a special error page, possibly passing the error message in the parameters.

aitap

I'm working with a library providing a C interface that has evolved from a "C++ but mostly C" application written by a guy who had a much better understanding of the physics he needed to model than of software design.

The header file defines a bunch of structs:

typedef struct ab_foo { /* ... */ } AB_FOO;
typedef struct ab_bar { /* ... */ } AB_BAR;
// more structs
typedef struct ab_baz AB_BAZ; // used as opaque pointer

and the functions that have to be called in a certain order:

// this reads the settings file, sets some parameters that cannot be changed later and allocates some memory
int ab_param_init (const char * f_err_name, const char * f_settings_name, AB_FOO *, AB_BAR *, /* more structs */);

// this fixes some more parameters, allocates more memory and downloads stuff from database
int ab_frobnicate_init (const char * f_err_name, AB_FOO *, AB_BAR *, double ini_florgle, AB_BAZ ***, /* more struct pointers */);

// probably the most important function
// can be called repeatedly after setting some structure fields that still can be changed
// fills a different structure with calculated values
int ab_frobnicate (const char *f_err_name, AB_FOO *, AB_BAR *, AB_BAZ ***, /* ... */);

// and the destructor is three different functions freeing different structures
void ab_foo_free (AB_FOO *);
void ab_baz_free (AB_BAR *);
void ab_free (AB_BAZ ***, /* more struct pointers */);

The file specified by const char * f_err_name is opened by all functions and all error messages are appended there. The file specified by const char * f_settings_name is an INI file that is parsed by a copy of iniparser.

I am now in a position to influence the interface changes. Aside from joining the structures in a single Parameter object that can be created with default values set, ditching logging to files and providing a way to set all parameters programmatically without creating an INI file, what else should I (try to) change?
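For illustration, here is a minimal sketch of what the single parameter object might look like. Everything in it is hypothetical: the abx_ prefix, the max_iterations field and the default values are made up, and only ini_florgle is borrowed from the existing ab_frobnicate_init() signature. Errors come back as return codes instead of being appended to a file.

```c
#include <stdlib.h>

/* Hypothetical replacement API; none of these names exist in the real library. */
typedef struct abx_params abx_params;   /* opaque to callers */

enum abx_status { ABX_OK = 0, ABX_EINVAL = 1 };

/* This definition would live in the .c file only, so callers never see the layout. */
struct abx_params {
	double ini_florgle;
	int    max_iterations;   /* made-up field */
};

/* Create a parameter object with every field set to a usable default. */
abx_params *abx_params_new(void) {
	abx_params *p = malloc(sizeof *p);
	if (!p) return NULL;
	p->ini_florgle = 1.0;      /* made-up default */
	p->max_iterations = 100;   /* made-up default */
	return p;
}

/* Setters validate their input and report errors by return code. */
enum abx_status abx_params_set_florgle(abx_params *p, double value) {
	if (!p || !(value > 0.0)) return ABX_EINVAL;   /* also rejects NaN */
	p->ini_florgle = value;
	return ABX_OK;
}

double abx_params_get_florgle(const abx_params *p) {
	return p->ini_florgle;
}

/* One destructor; safe to call with NULL, like free(). */
void abx_params_free(abx_params *p) {
	free(p);
}
```

A caller would create the object, override only the fields it cares about and pass it on; an INI reader could then be an optional helper built on top of the same setters.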

aitap

Today's dream was set in some vague extrapolation of the Half-Life series.

The G-Man, looking really tired, unshaven, with bags under his eyes, his usual suit replaced by something one usually wears in one's own house, was telling the player to stop fucking around with reality and warning about dangers lurking in things even he does not fully understand. The talk took place in a kitchen belonging to a dark underground facility, with an unmarked bottle of something potentially alcoholic on the table.

Drinking with G-Man during the small hours of the night definitely counts as fucking around with things I don't understand, that's for sure.

aitap

@Gąska said in Help Bites:

You can put everything that should not be changed after creation behind opaque struct, so users cannot change them even if they try to.

Good idea, thank you.

Also, this triple pointer looks suspicious - I'm assuming it's some kind of nested array; consider making it flat array.

Yeah, it's a pointer to an "array of arrays". The code we currently have first allocates an array of pointers, then allocates a bunch of actual arrays, then stores it all using the provided pointer. Life is fun.
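A minimal sketch of the flat-array alternative, under loud assumptions: the flat_grid type and its functions are hypothetical, and double stands in for whatever the real element type is. The point is one allocation and one free instead of an array of pointers plus a bunch of separate arrays.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical flat replacement for an "array of arrays": one allocation,
 * row-major layout. The element type (double) is a placeholder. */
typedef struct {
	size_t rows, cols;
	double *data;      /* rows * cols elements, contiguous */
} flat_grid;

/* Returns 0 on success, -1 on bad dimensions, overflow or allocation failure. */
int flat_grid_init(flat_grid *g, size_t rows, size_t cols) {
	if (rows == 0 || cols == 0 || rows > SIZE_MAX / cols) return -1;
	g->data = calloc(rows * cols, sizeof *g->data);
	if (!g->data) return -1;
	g->rows = rows;
	g->cols = cols;
	return 0;
}

/* Element (r, c) lives at index r * cols + c; NULL if out of range. */
double *flat_grid_at(flat_grid *g, size_t r, size_t c) {
	if (r >= g->rows || c >= g->cols) return NULL;
	return &g->data[r * g->cols + c];
}

/* A single free() releases everything. */
void flat_grid_free(flat_grid *g) {
	free(g->data);
	g->data = NULL;
	g->rows = g->cols = 0;
}
```

This only works if all inner arrays have the same length; a ragged structure would need an extra array of offsets.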

aitap

@boomzilla Okay, sorry for the noise.

One more Hail Mary shot. The question title makes it sound as if it's nothing like your problem, but the answers are all about decoding UTF-8 bytes stored in a TEXT column. In plain SQL, on MS SQL Server.

At least it's consistently botched, right? It's not a mixture of encodings?

aitap

it's driving me mad how stupid the population has become

Unfortunately, no matter what you do, 50% of the population will be stupider than the median. (And approximately 50% will be stupider than the mean, but that's relying on the distribution being symmetric.) Maybe it's not the population itself, but increased ability to communicate with different layers of the population exposing you to the stupidity?

then i apply for this thing called technation visa from within the uk <...>, but then i basically was terminated <...> applying for promise route of the exceptional tier 1 visa in 2018

Making a living in a foreign country is hard. A friend of mine is in a similarly precarious situation in an EU country with a PhD visa that's about to expire and uncertain financing prospects. Unfortunately, all immigrants start at -100 points, even if their home country offers much lower standards of living. Will you be able to get some psychological help in either country? It's the only thing that helped my friend, the EU PhD student, to get through a really bad spot and recover their performance enough to avoid getting terminated (and having to leave the country).

use works of Nietzsche, Ayn Rand And unabomber's manifesto

Your work risks not being accepted by the people you want to convert if you rely on works they didn't believe in in the first place. I mean, I read all three of those, but mostly didn't find them convincing. Convincing large masses of people of something is essentially engaging in PR and politics, and most successful politicians don't rely on Ayn Rand. Perhaps your message could be accepted on its own merits, if the improvements to the quality of coding life you are promising are so noticeable?

People want a professional tool like Photoshop to make software, but they get Gimp. Adobe used to provide enterprise-experience to millions of people, but now it's all open source.

I'm sure there are people on both ends of the spectrum. A person with 350 EUR in monthly income wouldn't buy an Adobe product anyway, but they might still need to develop some scripts to analyse their PhD project data. I do agree that diminishing quality as a result of commodification is a problem though. One of my colleagues swears by their copy of Visual Studio 2010 because "it got worse in the later versions".

P.S. Depending on which direction you prefer for this thread, the right place for it might be a limited-access category

aitap

@djls45 said in WTF Bites:

Do you mean HTTPS or VPN?

Sorry, I meant WPA.

aitap

Disclaimer: not a lawyer.

Have I covered all the bases here?

It seems that you have. Maybe even slightly over-covered, but better safe than sorry. Your program would be an "aggregate" (section 5 of the license) that should not be covered by GPLv3. The FAQ (1 2) seems to support this, since your application is using the user interface of the GPLv3 application. It would help if your application was at least partially useful without the GPLv3 application.

Just in case: does the GPLv3 application have any non-system (for sane definitions of "system", section 1) dependencies by itself? You may be required to provide links to those, too.

aitap

@Gribnit Quick, brooms, dustpans, blowtorch, hot plate, buckets!

aitap

@anonymous234 said in WTF Bites:

@aitap I think it makes sense.

Thank you.

@Tsaukpaetra said in WTF Bites:

Link is dead. Bluehost says no go,

⊙▂⊙
Well, the repo has commits from this year. I hope the author is okay and didn't desert the project.

aitap

@Tsaukpaetra said in NAT traversal and stuff:

How much time do you want to invest in this?

Well, is "about as much as required to get it right" a meaningful answer? Is it even possible to "get right" a protocol?

Do you need other applications to use it?

Ideally, I want it to be easy to write an independent implementation of the protocol for someone else in a language I don't even know about, but chances are, no-one ever will be interested in it at all.

How resilient should this protocol be (disconnects, replay attacks, impersonation, MITM, encryption)

[D]TLS with key pinning shall be the inner layer of the protocol, so I'm assuming I've got the security aspect covered. Disconnects? Yes, one of the purposes of this exercise is to recover after partial transfers.

For example, Hypatia's voice protocol was built ground-up and has not-a-few places of weakness, but it works well for our purposes and took a year to develop in-house across three people.

Thanks, that's an important data point.

@dkf said in NAT traversal and stuff:

I've done far too much debugging of binary protocols and their implementations over the past few years, and it's really difficult and yucky.

Thank you for the warning. I needed it.

For your first version, sending JSON over HTTP is relatively easy to make work, especially if you assume that all clients are non-malicious.

I really don't want to make assumptions like that, but perhaps it's unavoidable in a hobby project.

not having to debug those parts will save you quite a lot of hair

That's true. Again, perhaps only people with infinite time can afford not having dependencies. And it's certainly not the case for a hobby project.

aitap

@obeselymorbid said in THE BAD IDEAS THREAD:

Is that (certificate + ID) not the case pretty much everywhere?

Officially, that's the case more or less everywhere. Unofficially, I've seen people just skim the PDF with the naked eye (or do they have QR parsers implanted in the eyeballs?) and decide that it's valid. Or even accept a certificate that's not formally valid in the country, out of their goodwill and my trustworthy looks, also without checking the ID.

aitap

@dkf said in WTF Bites:

The cost of launching a subprocess can, in some cases, relate strongly to the size of process that is doing the launching.

Do you mean the cost of forking? I thought modern Unices had more or less cheap fork because of copy-on-write (and there is also posix_spawn) and Windows wasn't supposed to have this problem because of its spawning model.
Any keywords I could search to read up more?
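For reference, a minimal sketch of the posix_spawn route mentioned above, assuming a POSIX system. The helper name spawn_and_wait is made up; posix_spawnp() and waitpid() are the standard calls. Whether this is actually cheaper than fork() for a large parent depends on how the C library implements it, which this sketch does not measure.

```c
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Start argv[0] (looked up in PATH) as a child process, wait for it, and
 * return its exit status, or -1 on any failure. */
int spawn_and_wait(char *const argv[]) {
	pid_t pid;
	int status;

	/* posix_spawnp() returns an error number directly instead of setting errno. */
	if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
		return -1;
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	/* Report -1 if the child was killed by a signal rather than exiting. */
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The two NULL arguments are the file actions and spawn attributes, which is where redirections and signal masks would go in a less trivial program.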

aitap

@hungrier Right. I meant setting "change language" to Ctrl+Shift and not using "change layout" at all.

aitap

@Gąska said in WTF Bites:

unallocated

Yeah, I failed to demonstrate my observations properly. I don't know a good way to show how much of the memory belonging to a Linux process is currently swapped out (it's not part of RSS), so I provided VSZ as an unreasonably high upper bound.
But you can have my word that the machine has 6 GB of used swap and if I do swapoff -a that 6 GB ends up in RSS of skypeforlinux.
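One possible way to get the per-process number, sketched under the assumption that the kernel exposes a VmSwap line in /proc/<pid>/status (reasonably recent Linux kernels do). The function names are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Extract the "VmSwap:" value (in kB) from the text of /proc/<pid>/status.
 * Returns -1 if the field is missing. */
long parse_vmswap_kb(const char *status_text) {
	const char *line = status_text;

	while (line && *line) {
		long kb;
		/* %ld skips the tabs and spaces that pad the number. */
		if (strncmp(line, "VmSwap:", 7) == 0 && sscanf(line + 7, "%ld", &kb) == 1)
			return kb;
		line = strchr(line, '\n');   /* advance to the next line, if any */
		if (line) line++;
	}
	return -1;
}

/* Read /proc/<pid>/status and return the swapped-out size in kB, or -1. */
long swapped_kb(long pid) {
	char path[64], buf[8192];
	size_t n;
	FILE *f;

	snprintf(path, sizeof path, "/proc/%ld/status", pid);
	f = fopen(path, "r");
	if (!f) return -1;
	n = fread(buf, 1, sizeof buf - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_vmswap_kb(buf);
}
```

Calling swapped_kb() with the PID of skypeforlinux would then give a figure to compare against the used swap reported for the whole machine.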

aitap

I stumbled upon a nice paper, Peer-to-Peer Communication Across Network Address Translators by Bryan Ford, Pyda Srisuresh, and Dan Kegel (USENIX Annual Technical Conference, April 10-15, 2005), and decided to reproduce some of its results.

The first thing you notice when reading it is that TCP NAT traversal requires multiple sockets bound to the same local address: one for the connection to the introducer server (which should be kept alive, lest the NAT table entry be removed), one for the outgoing connection, and one for the incoming connection. In practice, it's sometimes possible to get by without the third socket, but in theory it may still be needed. The Berkeley sockets API does not normally allow multiple sockets to be bound to the same local address, but one can set the SO_REUSEADDR and/or SO_REUSEPORT options (depending on the OS) to get around that.

I would like to ask you to compile and run the following code and tell me if it produces any errors. I am also interested whether it would produce any errors if you build it with #define NO_REUSEPORT on non-Windows.

Now, I know how bad it looks (running someone else's code written in C of all languages and letting it connect to stuff? and call listen()? what does he think he's doing? lives have been ruined this way), but the program is really small (164 LoC), does not have any outside dependencies, and I tried to comment it so you could read it and see that it only creates three sockets, connects one of them to example.org:80 and binds the other two to the same address the first one is bound to. I just need to know how portable it is to bind multiple sockets to the same address, which options are required to do that and which are not. I am particularly interested in results from the Apple-verse, since I don't have access to the hardware myself.

#ifdef _WIN32
	#define NO_REUSEPORT
	#include <winsock2.h>
	#include <ws2tcpip.h>
	#define close closesocket
	/* XXX: link with Ws2_32.lib */
#else
	#define _POSIX_C_SOURCE 200112L /* request Berkeley sockets API when strict C standard required */
	#define _BSD_SOURCE /* NI_MAX... visible with _GNU_SOURCE on new glibc, but requires _BSD_SOURCE explicitly on BSD */
	#define _GNU_SOURCE
	#include <errno.h>
	#include <netdb.h>
	#include <sys/socket.h>
	#include <sys/types.h>
	#include <unistd.h>
	typedef int SOCKET;
	enum { INVALID_SOCKET = -1 };
#endif

#include <stdio.h>

/* In real world use, this would be the introducer server. For the purposes of
 * this test, any server you could connect to will suffice. */
static const char *hostname = "example.org", *port = "80";

/* SO_REUSEADDR seems to be required for this to work. Not sure about
 * SO_REUSEPORT: it may be required on Linux (but is only available on Linux
 * >= 3.9); it has been available on BSDs (I haven't checked yet) and is not
 * needed on Windows, where SO_REUSEADDR already provides necessary changes. */
static int setup_socket(SOCKET sock) {
	int ret = -1;
#ifdef _WIN32
	BOOL
#else
	int
#endif
		opt = 1;

	printf(" setsockopt(SO_REUSEADDR");

	if (setsockopt(
		sock, SOL_SOCKET, SO_REUSEADDR,
#ifdef _WIN32
		(const char *)
#endif
		&opt, sizeof opt)
	)
		goto cleanup;
#ifndef NO_REUSEPORT
	printf(",SO_REUSEPORT");
	if (setsockopt(sock, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof opt))
		goto cleanup;
#endif
	ret = 0;

cleanup:
	printf(")");
	return ret;
}

int main() {
	/* C89 requires declarations before code. */
	char hostaddr[NI_MAXHOST], localserv[NI_MAXSERV];
	SOCKET socket_conn = INVALID_SOCKET, socket_bind = INVALID_SOCKET, socket_listen = INVALID_SOCKET;
	struct addrinfo hints = {0}, *server = NULL;
	struct sockaddr_storage connaddr;
	socklen_t addrlen = sizeof connaddr;
#ifdef _WIN32
	WSADATA wsa;
	{
		int result = WSAStartup(MAKEWORD(2,2), &wsa);
		if (result) {
			printf("WSAStartup failed, %d\n", result);
			return -1;
		}
	}
#endif

	setbuf(stdout, NULL); /* Disable printf() buffering. */

	/* In real world use, first socket is connected to the introducer server
	 * to learn the address of the other party. All connections to the other
	 * party should be made from the same local port as the connection to the
	 * introducer server in the hope of getting the same outbound port on the
	 * other side of NAT. */
	printf("First socket:");

	hints.ai_family = AF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	{
		int result;
		printf(" getaddrinfo(%s)", hostname);
		result = getaddrinfo(hostname, port, &hints, &server);
		if (result) {
			printf(" failed, %d\n", result);
			goto cleanup;
		}

		/* This is not required, but I wanted to print the resolved address. */
		printf(" getnameinfo()");
		result = getnameinfo(server->ai_addr, server->ai_addrlen, hostaddr, sizeof hostaddr, NULL, 0, NI_NUMERICHOST);
		if (result) {
			printf(" failed, %d\n", result);
			goto cleanup;
		}
	}

	/* XXX: We are assuming that the first address in the list returned by
	 * getaddrinfo() is viable. In real world usage, we should iterate over it,
	 * retrying both socket() and connect() calls until we get a result. */
	printf(" socket()");
	socket_conn = socket(server->ai_family, server->ai_socktype, server->ai_protocol);
	if (socket_conn == INVALID_SOCKET) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	/* For this trick to work, all participating sockets should allow address
	 * and port reuse. */
	if (setup_socket(socket_conn)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" connect(%s)", hostaddr);
	if (connect(socket_conn, server->ai_addr, server->ai_addrlen)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" ok\n");
	/* By the way, this connection should stay open, lest the NAT would stop
	 * translating the incoming packets for us. */

	/* In real world usage, the second socket is used to connect() to the
	 * address given to us by the introducer, while the other party tries to
	 * connect() to us. Sometimes it even works, but see below.
	 * The connection attempt should stem from the same source port, hence the
	 * setsockopt() shenanigans. */
	printf("Second socket:");
	printf(" getsockname()");
	if (getsockname(socket_conn, (struct sockaddr *)&connaddr, &addrlen)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	{
		/* Again, this is not required, but I wanted to print the local port. */
		int result;
		printf(" getnameinfo()");
		result = getnameinfo((struct sockaddr *)&connaddr, addrlen, hostaddr, sizeof hostaddr, localserv, sizeof localserv, NI_NUMERICHOST|NI_NUMERICSERV);
		if (result) {
			printf(" failed, %d\n", result);
			goto cleanup;
		}
	}

	/* We are trying to "reproduce" the first socket as close as we can, so no
	 * iteration should be going on here. */
	printf(" socket()");
	socket_bind = socket(server->ai_family, server->ai_socktype, server->ai_protocol);
	if (socket_bind == INVALID_SOCKET) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	if (setup_socket(socket_bind)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" bind(%s,%s)", hostaddr, localserv);
	if (bind(socket_bind, (struct sockaddr *)&connaddr, addrlen)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" ok\n");

	/* Although the double-connect() trick seems to work on both Windows and
	 * Linux for some people, it is not guaranteed to work. Instead, we should
	 * also create a listening socket on the same port as above and wait for
	 * incoming connections there. Only one of the two sockets will ever get a
	 * working connection, since the tuple { protocol, local address, local
	 * port, remote address, remote port } uniquely identifies a connection
	 * and we got the first three to be the same for all three sockets.
	 * We may have to use SO_REUSEPORT for the third socket. */
	printf("Third socket:");
	printf(" socket()");
	socket_listen = socket(server->ai_family, server->ai_socktype, server->ai_protocol);
	if (socket_listen == INVALID_SOCKET) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	if (setup_socket(socket_listen)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" bind(%s,%s)", hostaddr, localserv);
	if (bind(socket_listen, (struct sockaddr *)&connaddr, addrlen)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" listen()");
	if (listen(socket_listen, 1)) {
		printf(" failed, %d\n", errno);
		goto cleanup;
	}

	printf(" ok\n");

cleanup:
	if (socket_conn != INVALID_SOCKET) close(socket_conn);
	if (socket_bind != INVALID_SOCKET) close(socket_bind);
	if (socket_listen != INVALID_SOCKET) close(socket_listen);
	if (server) freeaddrinfo(server);
#ifdef _WIN32
	{
		int result = WSACleanup();
		if (result)
			printf("WSACleanup failed, %d\n", result);
	}
#endif
	return 0;
}

aitap

@pie_flavor said in When you webpack so badly you broke every other script (and most of the web page):

Unsong

So, Apollo crashes into the celestial sphere that holds heavenly bodies, causing the machinery that kept the universe running on mathematics to break... also, Bible singularity, whale puns and Eric S. Raymond references? I'm sold. This might be better than Rapture of the Nerds!

aitap

@Benjamin-Hall said in Good reading for self-taught programmers?:

It wants software to be "smart" and "proactive", remembering everything you've done and trying to apply it, relying on undo rather than asking.

I had forgotten how infuriating chapter 8 was before I got to re-read it. I always remember examples of software trying to be "considerate" in terms of the book, but failing badly. Perhaps Cooper et al. would argue that the programs were ; I just think what a privacy nightmare it could be made into. Bloody software! I don't want a program that learns! I want a program that stays stupid!

(In the spirit of advocatus diaboli, perhaps I'm just jaded by software that tries and fails to be considerate and hate the idea of "smart" software because of that, but maybe I could find a really considerate program convenient. I don't know; I've never seen one that was.)

For me, useful lessons start approximately from chapter 9.

aitap

I've been using Gpg4win to exchange private information with a small number of people I couldn't contact otherwise (or when we had previously agreed not to transmit that information by other means). While my girlfriend is fine with this despite the UX, my parents just don't manage it well, which caused a few problems while I was in a different country.

What typically gets them is the requirement to copy&paste the whole armoured message block, from -----BEGIN PGP MESSAGE----- to -----END PGP MESSAGE-----. They can botch the newline after the BEGIN line, or forget some dashes, or forget the BEGIN and END lines, resulting in an honest but mostly unhelpful message from Kleopatra telling them that their text doesn't look like an OpenPGP message. I can't rely on the mail client to do it right for them because one of them uses web-mail; besides, we could be using a different transport.

I guess we could exchange UTF-8 encoded .txt.gpg files to avoid the problems with armoured blocks, just like we exchange other encrypted files, but is there other software we could use instead?

Being transport-agnostic is a requirement. I don't particularly care about forward secrecy (which seems to preclude offline encryption) or metadata (of course I exchange information with my relatives and friends, that's not exactly secret). I just need signed and encrypted files and text that I can send over whatever.

aitap

@Benjamin-Hall said in Handling when invalid data sneaks past into the db:

Assume you have some kind of business rules like "All Foo.count values, if they're set, must be between 2 and 43, inclusive." But someone messed up the validation on the server so that in one particular workflow, Foo.count could be anything. And so some invalid (but still type-consistent) values like 123 or -2 have gotten into the database.

How do you get the correct value for Foo.count once you realise it's wrong? In this example, it could be an expensive select count(...) and a materialised view, but I expect your actual use case to be nastier.

aitap

@_P_ Also @error_bot xkcd Drama

aitap

@dkf Hmm. Not only that. www.antipope.org is not on Cloudflare, but gets the weird treatment. astralcodexten.substack.com is on Cloudflare, but resolves correctly.

aitap

@topspin said in In other news today...:

Even with their clarification that you're not expected to follow "The Code", I'm not sure if they actually think that's a code you should try to ~~live~~ develop by

I can't be sure, of course, but it's not unlikely that this is the code that D. Richard Hipp follows for himself and it all was just a giant misunderstanding on his part. SQLite makes a big point about being developed cathedral-style, with rare outside contributions being rewritten from scratch, so the code (he thought) was really just for him and a few colleagues.

Interesting how going from forum rules to code of conduct changes the attitude towards the object in question. It must be in the connotations.

aitap

Does anyone know a photo editing application with functionality close to that of PowerPoint 2010 image enhancement features? Not less and not much more.

My relative prepares figures for a textbook by pasting them in PowerPoint, cropping/changing contrast/whatever, then saving the result as a new image. This... mostly works, but occasionally, PowerPoint manages to destroy the aspect ratio of the image. Which application would be easier to adapt to after PowerPoint? IrfanView is not the answer, apparently, or at least needs further training.

aitap

@Kamil-Podlesak And here is an example of such a replacement that actually makes sense and works somewhere in production.

aitap

@PJH Thank you! So far it looks like I won't need SO_REUSEADDR except on *BSD and macOS.

aitap

@MrL, @remi Have a ++ for being so reasonable about your disagreement.

aitap

@Deadfast I once was at a conference in Canada with some EUR, a wad of USD and a Maestro card. The plan was to exchange USD for CAD at the airport, but I forgot something on the plane, and by the time I was able to get it back the exchange booth was closed. I ended up locating one ATM on the premises that worked with Maestro and using it during the whole conference because the bank working hours coincided with the most interesting talks (all of them). No merchants accepted Maestro, either.

aitap

@DogsB The Invisible Sun trilogy, especially its first book, was even more gripping for me. Felt like a new Halting State book, except with geopolitics instead of hackers.

aitap

@Tsaukpaetra I wouldn't call Pockets gaming machines. I used my Pocket 1 to write some useful code and finish two conference talks (I'm not proud of the latter fact). Sure, it runs Quakespasm and StarCraft 1 well, but the battery life is good (approaching one working day, unlike all other laptops I ever owned) only when the CPU is mostly idling. This Pocket 2 was supposed to be a similar productivity gift to a relative, but its battery died. So it goes.

aitap

@loopback0 said in New lab PC:

Stock coolers are better these days. On standard clock speeds, the stock one supplied should be fine.

Thanks!

Fun fact: (the price I'm going to get for Ryzen 5 3600 BOX) - (Ryzen 5 3600 OEM price) is a bit more than the cost of that Arctic Freezer 12. Hopefully, this indicates that Arctic Freezer is no better anyway.

aitap

@Gąska My colleague reported problems with his stock Intel cooler, but then he's a gamer, so I guess I shouldn't be basing my assumptions on his experience. My FX6100 used to overheat with a stock cooler, but that's considered ancient by now.

Thanks for the advice on discrete GPUs; I will choose a better CPU and a cheap video card.

More pixels = more stuff visible at once, and there's never too much stuff visible at once.

But that should probably be accompanied by increasing the physical dimensions of the monitor. I have a 7" laptop with a 1920x1200 display, and at its 323 DPI the bugs in pixel-based application layouts almost outweigh the benefits of sharp fonts.

aitap

@Applied-Mediocrity Thanks for the heads up about M.2 SSD and the cooler, also for the warnings! I didn't even think about buying non-SATA storage. I'll ponder the HDD and the case some more, will see if I find a better option.

aitap

@Circuitsoft said in When one's scientific equipment is another's BadUSB:

The best U3v cameras I've found yet only require power cycling a few times a day.

Wow, that's just horrible. Would you mind telling me which brand of cameras is the least bad, in your experience? I only need precise exposure control, a 12-bit ADC and external synchronisation.

aitap

Are there any books or HOWTOs on the development of application-level network protocols? Specifically, the serialization part? I've been obsessing over this question and on-and-off binge-searching the Web for a few years now, but haven't started implementing anything for fear of getting things wrong. I would like the protocol to be implementable with (1) as little potential for getting things wrong as possible and (2) few to no dependencies. A person who taught me an elective on back-end C++ recommended that I (1) read RFCs for inspiration, (2) separate semantics from serialization in the protocol description and (3) read Stevens' TCP/IP Illustrated. This helped me build some understanding, but didn't get me much further.

From the looks of it, a simple binary protocol seems hard to get wrong: read a one-byte header off the line, determine whether there's a need to read a payload, do stuff; while parsing a text-based protocol involves, well, a parser, a grammar, maybe even a buffer for variable-length strings - all those things that programmers get wrong and/or introduce corner cases in. One could say "Oh, it's just JSON (over HTTPS)", but such a framework is too general and makes it possible to describe so much more stuff different from what I need, which means that a peer expecting something simple along the lines of {"type":"INCOMING", "peer":"c30fde008b867f14ab04cd449bdce5585c5ffb26bed05928a7ebe7d6384d6096"} must be ready to receive a big structured turd (like a MongoDB dump - valid JSON, just not the kind we need) instead and still not crash. And while everyone speaks HTTPS, it would present another dependency with its own attack surface. It is, of course, possible to do that securely, but might be harder. Or not?
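To make the "one-byte header, then maybe a payload" idea concrete, here is a minimal sketch. The message types, their numeric values and the 32-byte peer ID (the size of the 64-digit hexadecimal fingerprint in the JSON example above) are assumptions made for this example, not a protocol anyone has specified.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical message types; the real protocol is not specified here. */
enum { MSG_PING = 0x01, MSG_INCOMING = 0x02 };
enum { PEER_ID_LEN = 32 };   /* e.g. a 256-bit key fingerprint */

struct message {
	unsigned char type;
	unsigned char peer[PEER_ID_LEN];   /* valid only for MSG_INCOMING */
};

/* Try to decode one message from buf[0..len).
 * Returns the number of bytes consumed (> 0), 0 if more data is needed,
 * or -1 if the header byte is not a known type. */
int decode_message(const unsigned char *buf, size_t len, struct message *out) {
	size_t payload;

	if (len < 1) return 0;
	switch (buf[0]) {
	case MSG_PING:     payload = 0;           break;
	case MSG_INCOMING: payload = PEER_ID_LEN; break;
	default:           return -1;
	}
	if (len - 1 < payload) return 0;   /* partial message: wait for more bytes */

	out->type = buf[0];
	if (payload) memcpy(out->peer, buf + 1, payload);
	return (int)(1 + payload);
}
```

Because every payload size is fixed by the type byte, the receiver never allocates anything based on what the peer claims, which is most of the appeal. The price is that adding a field means adding a new type.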

On the other hand, I've just (some people would say ill-advisedly) finished esr's The Art of UNIX Programming and read his post about the virtues of self-describing data. And it's even possible to parse simple JSON without building a tree of it, so maybe receiving a War and Peace equivalent in JSON world should not present such a problem for our hypothetical JSON-speaking peer. (Still, microjson only works with complete JSON representations, which might require buffering data and/or limiting the message size.) Also, JSON does not solve the problem of representing values like IP addresses and key fingerprints as strings and parsing them back: while it's easy to describe an easily comparable binary representation for them (IPv6 address: 16-byte unsigned integer, network order. Done!), there may be multiple ways to make strings of them (upper case or lower case hexadecimal digits? full or simplified representation of IPv6 address?), which all must be handled (harder to implement) or only one way to stringify the data should be presented (harder to describe the protocol).
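A small sketch of the string-representation problem, assuming a POSIX system with inet_pton(). The helper name same_ipv6 is made up. It shows that two differently spelled strings only become comparable once both are converted to the 16-byte network-order form.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Returns 1 if both strings name the same IPv6 address, 0 if they differ,
 * -1 if either one is not a valid IPv6 address. */
int same_ipv6(const char *a, const char *b) {
	struct in6_addr x, y;   /* 16 bytes each, network byte order */

	if (inet_pton(AF_INET6, a, &x) != 1 || inet_pton(AF_INET6, b, &y) != 1)
		return -1;
	return memcmp(&x, &y, sizeof x) == 0;
}
```

For example, "2001:db8::1" and "2001:0DB8:0000:0000:0000:0000:0000:0001" compare equal here, while a plain strcmp() on the two strings would not.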

And exchanging IP addresses and certificate fingerprints is the easy part. Later I hope to design another protocol to merge Merkle trees across a previously established P2P connection.

Perhaps a type-length-value format (Bencode?) with binary payloads is a better idea, but that means hand-parsing, which is hard to get right. There is, indeed, a whole spectrum of already existing formats, some of them human-readable, others binary; some requiring a schema, others fully self-describing; implemented as libraries or code generators. Maybe I ought to choose one of them and stop listening to my NIH syndrome - dependencies, after all, mean code reuse and less work for the implementor - but it's so hard to choose the right one!
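For a sense of how much hand-parsing a length-prefixed format takes, here is a minimal sketch that decodes a single Bencode byte string (decimal length, a colon, then that many raw bytes). The function name and the max_payload limit are assumptions of this example; integers, lists and dictionaries are left out entirely.

```c
#include <stddef.h>

/* Decode one Bencode byte string ("<decimal length>:<bytes>") from buf[0..len).
 * On success, stores the payload position and size and returns the number of
 * bytes consumed. Returns 0 if the input is incomplete, -1 if it is malformed
 * or longer than max_payload (assumed to be well below SIZE_MAX / 10, so the
 * running length cannot overflow). */
long bdecode_bytes(const unsigned char *buf, size_t len, size_t max_payload,
                   const unsigned char **payload, size_t *payload_len) {
	size_t i = 0, n = 0;

	while (i < len && buf[i] >= '0' && buf[i] <= '9') {
		n = n * 10 + (size_t)(buf[i] - '0');
		if (n > max_payload) return -1;   /* refuse oversized messages early */
		i++;
	}
	/* Reject leading zeros so that every length has exactly one spelling. */
	if (i > 1 && buf[0] == '0') return -1;
	if (i == len) return 0;                /* no ':' seen yet */
	if (i == 0 || buf[i] != ':') return -1;
	i++;
	if (len - i < n) return 0;             /* payload not fully received */

	*payload = buf + i;
	*payload_len = n;
	return (long)(i + n);
}
```

Even this small case needs a size limit, an incomplete-input state and a rule about leading zeros, which is roughly the argument for reusing an existing, well-tested implementation.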

Best posts made by aitap