Discourse to NNTP gateway

OffByOne

Usenet used the In-Reply-To header, where it listed a set of message IDs of parent posts. I always assumed it showed parent, grandparent, and so on, in order.

Pendantic dickweedery ahead: You're talking about the References header. In-Reply-To has a different semantic value.

Short summary:

References is constructed roughly as you say, but in the opposite order: it contains the contents of the References header of the parent, if any, followed by the message ID of the parent, so the oldest ancestor comes first and the direct parent last. This header is used to construct the threaded view of the whole conversation.
Each reply inherits the References field of its parent and adds something to that.
In-Reply-To contains a list of message IDs to which the message is a reply. If your message is a reply to 3 different messages (that don't need to have a linear connection in the thread hierarchy), those 3 message IDs are listed.
In-Reply-To is just a collection of message IDs of posts that are replied to, but the contents of that header are independent of the contents of that header in the parent, sibling, child, ... posts.

You can read http://tools.ietf.org/html/rfc5322#section-3.6.4 for a more in-depth explanation of how to construct these headers, if you're into that kind of thing.
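To make the construction rule concrete, here is a minimal Python sketch of how a reply's headers could be derived from its parent, following RFC 5322 section 3.6.4 (parent's References, if any, followed by the parent's Message-ID). The function name and the message IDs are made up for illustration; this is not code from any existing gateway.

```python
from typing import Optional


def reply_headers(parent_message_id: str,
                  parent_references: Optional[str] = None) -> dict:
    """Build threading headers for a reply to a single parent message.

    References is the parent's References (if any) followed by the
    parent's Message-ID; In-Reply-To names only the message replied to.
    """
    inherited = parent_references.split() if parent_references else []
    references = inherited + [parent_message_id]
    return {
        "References": " ".join(references),
        "In-Reply-To": parent_message_id,
    }


# Hypothetical message IDs, for illustration only.
root = "<1@example.invalid>"
first_reply = reply_headers(root)
second_reply = reply_headers("<2@example.invalid>", first_reply["References"])
```

Each reply inherits the whole ancestor chain through References, while In-Reply-To only ever names the direct parent in this single-parent sketch.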

@FrostCat said:

the Discourse preferred model of gathering multiple replies into one post doesn't fit into that mold, however. In Usenet, you'd just see one parent in a threaded newsreader,

That's because threads are constructed using References, which is necessarily single-parent by construction.

@FrostCat said:

but people weren't assumed to be stupid, and post IDs weren't conserved as a scarce resource, so people were able to handle multiple separate replies. Perhaps Jeff wasn't smart enough to handle that.

Yeah, short of parsing the contents of a post for [quote] tags and extracting the thread/post number, there is no way to get that kind of hierarchical information from Discourse.
Because Discourse displays posts linearly and for each quote has the chevron and up arrow, that's not a problem. Translating Discourse posts to NNTP with this meta-information poses a challenge.

As an aside: I don't think I've seen any NNTP client do useful things with the contents of In-Reply-To, but I haven't paid much attention to it either.

OffByOne

@FrostCat said:

Even in Discourse, you can only see one "this is a reply to" top-of-post indicator, so I would probably use the same model NNTP uses, and say a post can only be a reply to one other post for treeview purposes.

Each post has a reply_to_post_number, which is null for a reply to the topic or the value of post_number of the parent post when it is a reply to a specific post (that's what the "this is a reply to" top-of-post indicator links to).

Actually the NNTP model and the Discourse model are quite similar, except that Discourse doesn't provide an easy way to get the post numbers of the posts that are quoted.
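As a sketch of how reply_to_post_number could be mapped onto the NNTP model, the Python below walks the parent links up to the topic root and returns the ancestor chain, oldest first. The Message-ID scheme, the host name and the sample data are assumptions made for illustration; only post_number and reply_to_post_number come from the discussion above.

```python
from typing import Dict, List, Optional


def message_id(topic_id: int, post_number: int) -> str:
    # Hypothetical Message-ID scheme; the host name is a placeholder.
    return f"<{topic_id}.{post_number}@discourse.invalid>"


def references_for(topic_id: int, post_number: int,
                   parents: Dict[int, Optional[int]]) -> List[str]:
    """Walk reply_to_post_number links up to the topic root.

    `parents` maps post_number -> reply_to_post_number (None for a reply
    to the topic itself). Returns ancestor Message-IDs, oldest first.
    """
    chain: List[str] = []
    seen = set()
    current = parents.get(post_number)
    while current is not None and current not in seen:
        seen.add(current)  # guard against accidental cycles in the data
        chain.append(message_id(topic_id, current))
        current = parents.get(current)
    chain.reverse()
    return chain


# Made-up data: post 3 replies to post 2, which replies to post 1.
parents = {1: None, 2: 1, 3: 2, 4: None}
```

Whether a reply to the topic itself should reference the first post instead of having no References at all is a design choice this sketch leaves open.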

FrostCat

@OffByOne said:

Pendantic dickweedery ahead: You're talking about the References header.

Correct, but since I'm going off a 15-year-old memory, I'll call that pendantry non-dickweedy.

I don't think I've ever paid attention to In-Reply-To and I'm not sure how many newsreaders implemented it. Also, RFC5322? Well, that explains it--I was reading Usenet in '89, and had mostly stopped by, oh, let's say 2000.

riking

@OffByOne said:

except that Discourse doesn't provide an easy way to get the post numbers of the posts that are quoted.

[spoiler]It's stored in the database, but not delivered to the client.[/spoiler]

(This is proven by the fact that quoting another topic will produce a backlink to your post)

accalia

@riking said:

[spoiler]It's stored in the database, but not delivered to the client.[/spoiler]

(This is proven by the fact that quoting another topic will produce a backlink to your post)

it's also in the raw post... but then you have to parse it

/\[quote.*post:(?<post_number>\d+).*topic:(?<topic_id>\d+)/
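A rough Python equivalent of that regex, as a sketch only. It assumes quote tags in the raw post look like [quote="user, post:105, topic:4660"]; the sample text is made up. Python spells named groups (?P<name>...), and the greedy .* is swapped for [^\]]* so that two quotes on one line are not merged into a single match.

```python
import re

# [^\]]*? keeps each match inside a single [quote ...] tag, unlike a
# greedy .* which can run across several tags on the same line.
QUOTE_RE = re.compile(
    r"\[quote[^\]]*?post:(?P<post_number>\d+)[^\]]*?topic:(?P<topic_id>\d+)"
)


def quoted_posts(raw: str) -> list:
    """Return (topic_id, post_number) pairs for every quote tag in `raw`."""
    return [
        (int(m.group("topic_id")), int(m.group("post_number")))
        for m in QUOTE_RE.finditer(raw)
    ]


# Assumed raw format; the sample text is made up for illustration.
sample = (
    '[quote="accalia, post:105, topic:4660"]but then you have to parse[/quote]\n'
    'Yeah. [quote="riking, post:104, topic:4660"]stored in the database[/quote]'
)
```

Calling quoted_posts(sample) yields one pair per quote tag, in the order the quotes appear.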

riking

@accalia said:

but then you have to parse

Yeah, but the backend already did that.

accalia

right, but you can do it too, from data that is (optionally) delivered to the client!

riking

It's also in the non-optional data (the cooked).

"<aside class="quote" data-post="105" data-topic="4660"><div class="title">
<div class="quote-controls"></div>
<img width="20" height="20" src="/user_avatar/what.thedailywtf.com/accalia/40/8425.png" class="avatar">accalia:</div>
<blockquote><p>but then you have to parse</p></blockquote></aside>

<p>Yeah, but the backend already did that.</p>"

/(?x)<aside class="quote" data-post="(\d+)" data-topic="(\d+)"> # parsing html with regex /
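For anyone who wants to try the verbose-mode idea in Python, here is a minimal sketch. One thing to watch: in verbose mode unescaped whitespace in the pattern is ignored and "#" starts a comment, so the spaces between attributes are matched with \s+ here. It also assumes the attributes always appear in this exact order, which is only true as long as the cooked format stays as shown above.

```python
import re

# Verbose mode ignores unescaped whitespace in the pattern, so the
# spaces in the markup are matched explicitly with \s+.
ASIDE_RE = re.compile(
    r"""
    <aside\s+class="quote"   # opening tag of a quote block
    \s+data-post="(\d+)"     # post number of the quoted post
    \s+data-topic="(\d+)"    # topic id of the quoted post
    >
    """,
    re.VERBOSE,
)

cooked = (
    '<aside class="quote" data-post="105" data-topic="4660">'
    '<blockquote><p>but then you have to parse</p></blockquote></aside>'
)
match = ASIDE_RE.search(cooked)
post_number, topic_id = (int(g) for g in match.groups())
```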

accalia

@riking said:

parsing html with regex

for this particular purpose it's not that much of a WTF, i mean the format is super specific and machine generated so it's not entirely unreasonable to use a regular syntax to find it.

of course it's far from perfect, still is it really worth wheeling out the SGML parser for such a "small" thing?

OffByOne

@FrostCat said:

Correct, but since I'm going off a 15-year-old memory, I'll call that pendantry non-dickweedy.

I don't think I've ever paid attention to In-Reply-To and I'm not sure how many newsreaders implemented it.

Agreed, my pendandry was of the non-dickweedish kind. Btw, I didn't yet implement In-Reply-To, I didn't even realize what it's for until your previous post.
Yet another thing to add to my TODO-list.

@FrostCat said:

Also, RFC5322? Well, that explains it--I was reading Usenet in '89, and had mostly stopped by, oh, let's say 2000.

Heh. Since I'm implementing an NNTP gateway for the next 10 years, I have no use for outdated specifications like RFC822

My own Usenet usage started around 1999 until 2009-ish.

OffByOne

@accalia said:

@riking said:
parsing html with regex

for this particular purpose it's not that much of a WTF, i mean the format is super specific and machine generated so it's not entirely unreasonable to use a regular syntax to find it.

It's WTF enough for my purposes :P You're right though, the format looks like something that won't change soon and it's verbose enough so a regex won't match other stuff by accident. There is a case to be made to use a regex to parse this particular metadata.

@accalia said:

of course it's far from perfect, still is it really worth wheeling out the SGML parser for such a "small" thing?

If it were only this small thing, I'd just use the regex and be done with it. Thing is, I'd like to have a structured representation of the post contents for other reasons (conversion to text/plain, get a list of emoji that are used and attach them as inline images, same for the avatars, ...), so I think I'm going to unleash a real parser on it anyway.

You brought up the raw post, which I didn't think of. I was thinking of parsing the cooked post.
Fetching the raw is an extra request to the Discourse server. I'm not really concerned about the extra kilobyte or so of traffic, but it has noticeable roundtrip time and delays displaying the post to the NNTP client.
I also intend DiscoNews to be useable on other Discourse instances, and IIRC the raw thing is a what.thedailywtf.com specific extension. I can't rely on it for core functionality.
Then again, the cooked post format might be changed, whereas the raw won't ever change.
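To give an idea of what the "real parser" route could look like, here is a small sketch in Python using the standard library's html.parser (the gateway itself is not written in Python, so this is an illustration, not the actual implementation). It assumes the cooked quote markup shown earlier in the thread, and the class name is made up.

```python
from html.parser import HTMLParser


class CookedPost(HTMLParser):
    """Collect quote metadata and plain text from a cooked post."""

    def __init__(self):
        super().__init__()
        self.quotes = []   # (topic_id, post_number) per <aside class="quote">
        self.text = []     # non-empty text fragments in document order

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "aside" and "quote" in classes:
            post = attributes.get("data-post")
            topic = attributes.get("data-topic")
            if post and topic:
                self.quotes.append((int(topic), int(post)))

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


cooked = (
    '<aside class="quote" data-post="105" data-topic="4660">'
    '<blockquote><p>but then you have to parse</p></blockquote></aside>'
    '<p>Yeah, but the backend already did that.</p>'
)
parser = CookedPost()
parser.feed(cooked)
parser.close()
```

The same pass gives both the quote metadata and the text fragments, which is the kind of structured representation a text/plain conversion would start from.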

accalia

thanks to @riking fetching raw is no longer an extra request, just add the GET parameter include_raw=1 to any of the topic listings and you'll get raw.

;-)

OffByOne

@accalia said:

thanks to @riking fetching raw is no longer an extra request, just add the GET parameter include_raw=1 to any of the topic listings and you'll get raw.

for @riking!

It doesn't seem to work on meta.d... Is it a what.tdwtf only feature or did I just do it wrong?

I need something that works on vanilla Discourse. Parsing the cooked post contents looks like my only option at the moment.

accalia

works on meta.d too. or it did last i tried....

https://meta.discourse.org/t/feature-request-a-civilized-mute-for-users/22114.json?include_raw=1

yep... it works. (should also work for posts.json....

yup.

https://meta.discourse.org/t/22114/posts.json?include_raw=1&post_ids[]=84309

did we miss one?
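For reference, a minimal Python sketch that builds the second kind of URL shown above. The helper name is made up, and nothing here touches the network; it only assembles the query string with include_raw=1 and one post_ids[] entry per post.

```python
from urllib.parse import urlencode


def posts_json_url(base_url: str, topic_id: int, post_ids: list) -> str:
    """Build a /t/<topic>/posts.json URL that asks for raw post bodies."""
    query = urlencode(
        {"include_raw": 1, "post_ids[]": post_ids},
        doseq=True,   # repeat post_ids[] once per id
        safe="[]",    # keep the brackets readable instead of %5B%5D
    )
    return f"{base_url.rstrip('/')}/t/{topic_id}/posts.json?{query}"


url = posts_json_url("https://meta.discourse.org", 22114, [84309])
```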

OffByOne

@accalia said:

works on meta.d too. or it did last i tried....

https://meta.discourse.org/t/feature-request-a-civilized-mute-for-users/22114.json?include_raw=1
https://meta.discourse.org/t/22114/posts.json?include_raw=1&post_ids[]=84309

did we miss one?

No, the error was between my keyboard and chair. I didn't realize the GET parameter only has an effect on JSON requests.
Thanks!

brb, coffee refill...

Maciejasjmj

@accalia said:

thanks to @riking fetching raw is no longer an extra request, just add the GET parameter include_raw=1 to any of the topic listings and you'll get raw.

On this forum, yes.

Personally, I'm thinking of doing a lazy-loading wrapper on the posts - if it succeeds with include_raw[], it just fills the content, if it doesn't, it sends off a request to /raw/x once you access the contents (since it's generally easy to get posts in bulk, sometimes even if you don't need them all).

Not sure if it's brilliant or brillant, though.
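A minimal sketch of that lazy-loading idea in Python, under the assumption that the bulk request may or may not have delivered the raw body. The class, the fetcher callback and the fake fetcher are all hypothetical names; a real fetcher would make the /raw/x request instead.

```python
from typing import Callable, Optional


class LazyPost:
    """Post wrapper that fetches its raw body only when first needed."""

    def __init__(self, post_id: int, raw: Optional[str],
                 fetch_raw: Callable[[int], str]):
        self.post_id = post_id
        self._raw = raw              # filled in when the bulk request had it
        self._fetch_raw = fetch_raw  # fallback, e.g. a request to /raw/x

    @property
    def raw(self) -> str:
        if self._raw is None:
            self._raw = self._fetch_raw(self.post_id)  # fetched at most once
        return self._raw


# Stand-in fetcher so the sketch runs without a network connection.
calls = []


def fake_fetch(post_id: int) -> str:
    calls.append(post_id)
    return f"raw body of post {post_id}"


eager = LazyPost(1, "already delivered", fake_fetch)
lazy = LazyPost(2, None, fake_fetch)
```

Reading eager.raw never calls the fetcher, and reading lazy.raw calls it once and caches the result.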

OffByOne

@Maciejasjmj said:

Personally, I'm thinking of doing a lazy-loading wrapper on the posts - if it succeeds with include_raw[], it just fills the content, if it doesn't, it sends off a request to /raw/x once you access the contents (since it's generally easy to get posts in bulk, sometimes even if you don't need them all).

That's the thing: /raw/x is a plugin: it only works on this forum (and other Discourse instances that have that plugin loaded, but I doubt you'll find many of those in the wild).
include_raw=1 is part of the Discourse core and will be available in all sufficiently recent DC instances.

I'd rather parse the cooked contents than the raw. Proper HTML parsers exist (and there are some very good ones for Perl).
A decent DiscoDownBbHtmlWTF-parser is not as easy to find and I sure as hell am not going to roll my own ;)

That's for my specific use case though. You're writing a generic DC API, so your requirements are different. I think using include_raw=1 will give you the highest chance of success.

Maciejasjmj

@OffByOne said:

That's the thing: /raw/x is a plugin

Huh, so it is... because obviously nobody would be interested in a non-mangled version of their post. I thought it was a default option.

codinghorror

@Arantor said:

what I perceived the biggest UI problem with threading to be - mine was about getting users to actually click on the thing that indicated the proper hierarchy for their post.

I agree with @boomzilla, both threaded and unthreaded have their own sort of problems, but the massive complexity of threading gives it a lot more cons, IMO. Simplicity is a virtue, and "proper" threading is never simple.

But... the elephant in the room is that being wildly off topic is somehow encouraged here. So whatever your solution, it has to deal with a culture where 50 different completely unrelated conversations are going on under a topic with a title that is at best vaguely related to any of those 50 conversations.

(My point is that cultural adaption is harder than the software. And the bigger the conceptual difference between "the old thing" and "the new thing" the worse this transition becomes. Also for the love of God, you guys really ought to change your culture so that being ROUGHLY on topic is expected.)

OffByOne

@codinghorror said:

I agree with @boomzilla, both threaded and unthreaded have their own sort of problems, but the massive complexity of threading gives it a lot more cons, IMO. Simplicity is a virtue, and "proper" threading is never simple.

That depends on how the UI is presented to the user. All Usenet clients (that I know of) have a separate representation of the conversation tree and the actual contents of the selected post.
This separation of structure and content alleviates a LOT of problems with threaded discussions. It gives you an overview of the discussion structure, you can see where in the hierarchy of the whole conversation the post you're reading is located (and where your reply, if you make one, goes).

I agree that a threaded view where the post contents are crammed in with the discussion structure causes some cognitive overload and obfuscates the discussion itself.
For exactly the same reason, I prefer C(++) style separation between header files and code files: one declares the structure of your code (without implementation), the other contains the implementation of that structure, but doesn't declare any structure (INB4 inline functions in header files).
Compare that with Java and C# let's-mash-declaration-and-implementation-together-in-the-same-file style of coding and you'll get cognitive overload for any files that are more than 3 screen lengths long. Using "Class View" in Visual Studio is cheating, because that's a threaded/tree representation of the structure of your code and that's exactly what you don't like ;)

Flattening a discussion tree from 2 dimensions into 1 also loses information: you're making a projection, and that is a non-injective (many-to-one) operation.
What do you lose? An important part of context: namely where in the discussion the post you're reading fits. Context is important: you got that completely right in your blog post (cfr. the Twitter example). That's information you lose by cramming a discussion in 1 dimension.
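A tiny Python illustration of that information loss, using made-up data: two conversations with different reply structure end up as the same linear list once only the post order is kept.

```python
def flatten(parents: dict) -> list:
    """Project a reply tree onto a linear view: post numbers in order.

    `parents` maps post_number -> parent post_number (None for the root).
    """
    return sorted(parents)


# Two made-up conversations with different structure.
chain = {1: None, 2: 1, 3: 2}    # 3 replies to 2, which replies to 1
fan_out = {1: None, 2: 1, 3: 1}  # 2 and 3 both reply to 1
```

Both flatten to the same list of post numbers, so the parent links cannot be recovered from the linear view alone.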

By the way, using your own blog as support for your opinion doesn't count: you're backing up your opinion with your opinion. There probably exists a logical fallacy describing exactly that reasoning flaw. I think "circular reasoning" is reasonably close.
I've reread the blog post you linked to and it doesn't contain many facts but a lot of your opinions. I'm willing to change my viewpoint if you could point me to verifiable facts though.

@codinghorror said:

But... the elephant in the room is that being wildly off topic is somehow encouraged here. So whatever your solution, it has to deal with a culture where 50 different completely unrelated conversations are going on under a topic with a title that is at best vaguely related to any of those 50 conversations.

That has absolutely nothing to do with threaded/linear representation of a conversation, please stay on topic or reply as new topic

It's been repeated over and over again, but I'll say it again: we as a community don't care about topic drift. We can handle it. We don't mind. It's how we roll. We can and do sustain civilized¹ discussion without much effort, despite topic drift. We don't feel like fixing it, because we don't consider that conversation style broken.

We come here to chat and vent, not to have many small self-contained discussions about a single topic per discussion.
If that were the case, we'd probably be better off with a Stack Exchange-like forum than Discourse anyway. As you said yourself in the blog post you linked: "Stack Exchange is not a discussion system – it's actually the opposite of a discussion system".

@codinghorror said:

(My point is that cultural adaption is harder than the software. And the bigger the conceptual difference between "the old thing" and "the new thing" the worse this transition becomes.

Of course it is, that's why change management is so important, both in software development and in other areas of life. People fear change. People would rather stay inefficient in their set ways than try something new which might be objectively better.

One important thing to keep in mind is that something is not necessarily better just because it's new. That's why I used the words "might" and "objectively" in my previous sentence.
New for the sake of new implies lost time for learning how to use the new thing, without any gains afterward.

@codinghorror said:

Also for the love of God, you guys really ought to change your culture so that being ROUGHLY on topic is expected.)

You keep saying that, but you never explain why. What would we (as a community) gain with that? Why would we need to change our ways? Please explain the ROI for doing that.

¹ that's our definition of civilized, of course. It may or may not coincide with what you consider civilized, but frankly, we as a community care more about our own standards than yours. The needs/preferences of the many outweigh the needs/preferences of the one.

boomzilla

@codinghorror said:

But... the elephant in the room is that being wildly off topic is somehow encouraged here. So whatever your solution, it has to deal with a culture where 50 different completely unrelated conversations are going on under a topic with a title that is at best vaguely related to any of those 50 conversations.

I heartily disagree. That is orthogonal to how on topic one stays. The problem is that multiple people may want to reply to a particular message. And then several of those may generate their own replies. Pretty soon you have a giant tree to navigate.

Yamikuronue

@codinghorror said:

being wildly off topic is somehow encouraged here

Have you ever been to a dinner party?

Someone mentions the weather, and two or three other people chime in about the snow and isn't it cold and isn't it too early and it's a damn shame, and someone mentions traffic, and then someone mentions their cousin's crash back in the Great Snowstorm of '89. This is a thread entitled "Weather", but it's also about car crashes now, as each person chimes in more and more stories about cars. Someone mentions a crash they were in where they hurt their leg, and a few others chime in times they hurt themselves, but one guy didn't get to share his crash story so he talks about crashes and black ice, and in short order there's two conversations happening in the same group, one about ice and slipperiness and comparing shoes for traction and have you tried this brand? they're so comfortable and good for ice, while the other conversation is on about Great Grandma's toe that could predict thunderstorms and that time Joey stepped on a nail and tetanus shots and Ebola.

That's what our threads are like, except that everyone can hear each other perfectly and can respond to more than one thread of conversation at a time. It feels totally natural, except for the big glaring title at the top passing judgement on us for daring to drift.

FrostCat

@OffByOne said:

All Usenet clients (that I know of) have a separate representation of the conversation tree and the actual contents of the selected post.

Most of them use a fairly standard treeview, like in file manager programs. Trn had the best one ever, because it was two-dimensional. Basically it was a little flowchart.

FrostCat

@OffByOne said:

You keep saying that, but you never explain why. What would we (as a community) gain with that?

Since he never explains it, it's not safe to assume there's a more reasonable explanation than "it gives him a headache."

Untold numbers of 18-year-olds certainly managed it fine every year in the 80s, so I'm inclined to think it's some kind of defect on Jeff's part.

OffByOne

@FrostCat said:

Since he never explains it, it's not safe to assume there's a more reasonable explanation than "it gives him a headache."

That's why I asked the question directly three times with different words. I hope that way he will 1) grasp what is actually being asked and 2) not skip over it yet again.

People who know me sometimes call me so naïve that it's cute, why do you ask?

FrostCat

@OffByOne said:

That's why I asked the question directly three times with different words.

I've learned through experience that rarely works. All you do is get irritated by being ignored.

OffByOne

@FrostCat said:

I've learned through experience that rarely works. All you do is get irritated by being ignored.

Meh, I don't get irritated easily, especially when I have no emotional investment in the subject.

My experience confirms yours, it rarely works. But if it works, we'll all finally know the answer. For it to work, he only has to latch onto one of my 3 different phrasings.

boomzilla

@FrostCat said:

Since he never explains it, it's not safe to assume there's a more reasonable explanation than "it gives him a headache."

Yeah, I mean CDO is certainly a valid answer here, and a reason to be heavy handed in moderation on forums that you control. Just try to be self aware enough to understand that not everyone will agree with you or care, you'll just irritate them by continually harping on it.

FrostCat

@boomzilla said:

CDO

That's a more charitable guess than what I was thinking. Besides, control freakery probably would make more sense than CDO.

boomzilla

@FrostCat said:

Besides, control freakery probably would make more sense than CDO.

Six of one, half dozen of the other.

codinghorror

I'll just leave this here, then.

This is why there are generally different clusters of people talking -- if you walked into a group of people talking about Sportsball at a party and started screaming "THIS REMINDS ME OF THE DARK KNIGHT RETURNS" that'd be.. uh.. a bad party for everyone involved.

chubertdev

Hey look, it's Discourse's back-end!

http://cdn.theatlantic.com/static/mt/assets/culture_test/batman football field 615.jpg

Yamikuronue

So at your dinner parties, people who want to start a side conversation are shoved into another room so they don't pollute the "pure" conversation?

Also, did you miss the part where everyone involved in the conversation drifted? Nobody wanted to talk about snow for more than the equivalent of a half-dozen posts.

I guarantee if you go to a dinner party and cut people off saying "We were talking about snow, please stay on topic." every five minutes, that'd be just as bad a party.

OffByOne

@codinghorror said:

I'll just leave this here, then.

[Dilbert cartoon that you've already posted somewhere on this forum]

I'd rather you reply to my post than turn this t~~read~~opic into a comic book.

Can you at least answer a few of the questions I posed or reply to some of my statements?

FrostCat

@Yamikuronue said:

I guarantee if you go to a dinner party and cut people off saying "We were talking about snow, please stay on topic." every five minutes, that'd be just as bad a party.

That would be rude. Instead you just have everyone who doesn't want to talk about snow go to a different room.

dkf

@FrostCat said:

Instead you just have everyone who doesn't want to talk about snow go to a different party.

FTFY

FrostCat

@dkf said:

>Instead you just have everyone who doesn't want to talk about snow go to a different party.

FTFY

Like the time Monica and Chandler had competing parties, you know that leaving would be an improvement.

codinghorror

@Yamikuronue said:

I guarantee if you go to a dinner party and cut people off saying "We were talking about snow, please stay on topic." every five minutes, that'd be just as bad a party.

What happens is that the people who were talking about Sportsball will move away from you, since they wanted to talk Sportsball, and they don't appreciate your attempts to hijack and divert their conversation into other areas.

Hence the natural clustering of groups of people by interest.

The other advantage is that we're in the world of bits, not atoms; so "moving to a different room" is like 100 pixels, not 10 feet.

PJH

@OffByOne said:

That's the thing: /raw/x is a plugin: it only works on this forum (and other Discourse instances that have that plugin loaded, but I doubt you'll find many of those in the wild).

I'm fairly certain /raw/cat/post is core, not a plugin.

mott555

@PJH said:

I'm fairly certain /raw/cat/post is core, not a plugin.

/raw/cat sounds like Korean food to me.

Weng

@OffByOne said:

You keep saying that, but you never explain why. What would we (as a community) gain with that? Why would we need to change our ways? Please explain the ROI for doing that.

You see, @codinghorror doesn't understand the difference between purely social contexts and intent-based ones.

Let's compare. There's a community that I'm a part of. It's a group of competitors, officials, roadies, camp followers etc. involved in a racing series. Discussions follow "normal" topic "rules": "Look at my rollcage, tell me what's been done wrong". "Look at what I just bought!" "Did you see the accident between x and y?" "Videos!" "Acceptance letters are out, who's in!?" "I'm in trouble with the wife" "Look at my garage injury and learn from my stupidity!"

There's enough common ground to talk about and debate each issue because we're all there for more or less the same reason, we all do more or less the same things, so stuff stays on topic. There's a clear endgame to each topic: Answer the question. Determine who was at fault. Share opinions and reactions. Watch the guy build the car.

There's another community I'm a part of where the majority of all discussion happens in a "Random thoughts" thread. It's ostensibly a general-purpose car forum. This is more like a bunch of friends sitting around a table in a pub. Different cars. Different nationalities. Different life experiences, different viewpoints, and generally just free-flowing conversation. Other threads pop up for discussions that merit further elaboration, deep discussion/debate, or on recurring events, or whatever.

This community is a hybrid between the two. We have a very broad remit here: "WTF! in a vaguely IT-ish context." Almost all WTF in IT happens in the workplace. Almost all workplaces consider details about IT to be proprietary and IT personnel to be expendable. So our ability to elaborate and give full context is limited in a public forum. We exhaust topics pretty fucking quickly as a result. So they drift. This is compounded by a level of built-in camaraderie because, hey, we're the ones that get it. And, frankly, shared trauma. Having some dickhead who wrote some pretty WTF-y software shouting at you about how you're socializing about WTF-y software wrong is pretty traumatic.

loopback0

@codinghorror said:

The other advantage is that we're in the world of bits, not atoms; so "moving to a different room" is like 100 pixels, not 10 feet.

So that's what the empty space at the sides is for?

@codinghorror said:

if you walked into a group of people talking about Sportsball at a party and started screaming "THIS REMINDS ME OF THE DARK KNIGHT RETURNS" that'd be.. uh.. a bad party for everyone involved.

You're right it would be - except what's happening here is not people entering a topic and just shouting out a new topic of conversation. Conversations drift naturally, and you end up with a group of people who were all once talking about the same thing now with some/all of them talking about something different. That's what happens here, and here there's not a heavy handed moderator forcing people back onto the original topic which is how it works in the real world. If the Sportsball conversation at your party drifts onto talking about race cars, you're going to hate the person who sits there going "HEY GUYS WE WERE TALKING ABOUT SPORTSBALL SO PLEASE GO AWAY IF YOU WANT TO DISCUSS RACE CARS". And both here and IRL, any participant is welcome to bring the topic back to the original one if there's still material there.
Discourse actually makes it very easy to follow when a topic contains different subjects of conversation thanks to the "x Replies" and "In Reply To" indicators (and the little arrows on quotes). This is something it gets right that other forum software I've used doesn't.

HardwareGeek

@loopback0 said:

So that's what the empty space at the sides is for?

+1

chubertdev

@loopback0 said:

So that's what the empty space at the sides is for?

http://img2.wikia.nocookie.net/_cb20090301100547/mk/images/3/35/Sub-Zero_Flawless_Victory.jpg

PJH

@loopback0 said:

So that's what the empty space at the sides is for?

As opposed to the bouts of empty space in the middle we've had recently?

loopback0

That seems at least a little bit accidental.

Kuro

It is surprising how long this topic actually stayed pretty much on topic until codinghorror mentioned topics straying. We should really have a badge for that (and first award it to codinghorror):

Sneaky Derailer
Derail a topic that stayed on topic for more than 100 posts
[Picture of a train or something]

Filed Under: @PJH I request this on my authority of reading this whole topic! | Also, I kinda think it would be hilarious!

PJH

@Kuro said:

Picture of a train or something

Is this the sort of thing you had in mind?

Kuro

Just liking it is not enough, so have a post saying that I liked that post including the picture!

Filed Under: Welp, that was a useless post :D

accalia

I'll happily second @kuro's request and also vote for using that picture!