Discourse search sucks
-
I am amazed, I really am amazed at how balls-ed up this is.
I have a post for the bad ideas thread, so I type 'bad ideas' into the search, sans quotes.
The first result, highlighted in blue, is the "Add your badge idea thread", with "THE BAD IDEAS THREAD" second.
Seriously. How the hell do you get to badge idea before bad ideas for search criteria of bad ideas?
-
If you think that's bad, just look at the "Your topic is similar to..." suggestions when creating a new topic.
-
It's been accurate in the cases where I had similar titles to previous topics (e.g. the Non-IT WTF or Tales From The Mortgage Monster topics) but other than that it might as well be completely random, and probably is.
-
select top 10 title from topics order by newid()
-
No, it's PostgreSQL on the backend.
SELECT * FROM topics ORDER BY RANDOM() LIMIT 10
-
They should probably switch to case insensitive or lower the weight given to case sensitivity.
-
I do have to wonder how they did search because the sane approach is case insensitive by default...
-
They probably used PostgreSQL's full text search plugin. I never used it, don't know if it's case-sensitive by default or they made that decision.
-
PostgreSQL is case-insensitive by default:
12.1. Introduction
- Converting tokens into lexemes. A lexeme is a string, just like a token, but it has been normalized so that different forms of the same word are made alike. For example, normalization almost always includes folding upper-case letters to lower-case,…
…So Dicsores fucked it up on purpose. Supplies!!
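To make the quoted normalisation step concrete, here is a toy sketch in Ruby. It is not PostgreSQL's parser or stemmer: `toy_lexemes` and `toy_matches` are made-up helpers that only downcase words and chop a trailing "s". The prefix mode is purely an assumption about how a query might be built, and nothing in this thread confirms that Discourse does it. The point is just that case folding alone can't explain the ranking, while something like prefix matching could.

```ruby
# Toy stand-in for lexeme normalisation: NOT PostgreSQL's real stemmer.
# Downcases each word and strips one trailing "s" from words longer than
# three characters (hypothetical helper, deliberately crude).
def toy_lexemes(text)
  text.downcase.scan(/[a-z0-9]+/).map { |w| w.length > 3 ? w.sub(/s\z/, "") : w }
end

# Counts how many query lexemes are found in the title.
# With prefix: true, a query lexeme matches any title lexeme that starts
# with it (an ASSUMED query style, not confirmed for Discourse).
def toy_matches(query, title, prefix: false)
  title_lexemes = toy_lexemes(title)
  toy_lexemes(query).count do |q|
    title_lexemes.any? { |t| prefix ? t.start_with?(q) : t == q }
  end
end

titles = ["Add your badge idea thread", "THE BAD IDEAS THREAD"]
titles.each do |title|
  exact  = toy_matches("bad ideas", title)
  prefix = toy_matches("bad ideas", title, prefix: true)
  puts "#{title}: exact=#{exact} prefix=#{prefix}"
end
```

In this toy, exact matching finds both query words in the bad ideas title and only one in the badge title. Only the assumed prefix mode makes the two titles tie, at which point whatever tie-breaker follows decides the order.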
-
Actually... I wouldn't be quite so quick to judge.
If they're doing it based on topic title first, you wouldn't typically throw the subject into a normal fulltext index.
In which case it might be something as simple as:
SELECT topics.* FROM topics WHERE topicname LIKE '%word1%' AND topicname LIKE '%word2%'
which in PGSQL is case sensitive; I think it's ILIKE you have to use for a case-insensitive LIKE. (One of the reasons I got frustrated with SMF's half-assed attempts to be DB agnostic with its roll-your-own abstraction, but it did that before PDO was a commonly available thing.)
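For what it's worth, a minimal Ruby sketch of the case-insensitive variant, building one ILIKE test per search word. The table and column names (`topics`, `topicname`) are the guesses from the query above, not Discourse's real schema, and `title_ilike_query` is a made-up helper. It only builds the SQL string and the bind parameters; it doesn't talk to a database.

```ruby
# Builds a parameterised WHERE clause with one ILIKE test per search word.
# Table/column names are guesses carried over from the post; the helper
# name is hypothetical.
def title_ilike_query(words)
  raise ArgumentError, "need at least one word" if words.empty?

  # Escape LIKE wildcards (and the backslash escape character itself)
  # so user input can't act as a pattern.
  patterns = words.map { |w| "%" + w.gsub(/[\\%_]/) { |c| "\\" + c } + "%" }
  # PostgreSQL-style numbered placeholders: $1, $2, ...
  clauses = (1..patterns.size).map { |i| "topicname ILIKE $#{i}" }
  ["SELECT topics.* FROM topics WHERE #{clauses.join(' AND ')}", patterns]
end

sql, params = title_ilike_query(%w[bad ideas])
puts sql
p params
```

Passing the words as bind parameters rather than pasting them into the string keeps quoting problems out of the query, and escaping `%` and `_` stops a search for "50%" from matching everything that contains "50".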
-
If you're using LIKE and ILIKE, ummm You're Doing It Wrong™... You should be throwing the topic title into a tsvector index, and using to_tsvector() etc.
-
- I'm not a PGSQL guy, I'm a MySQL guy that has had to work with some pretence of PGSQL support.
- We're talking about Discourse's devs here, who'd credit them with knowing how PGSQL works either?
-
posts = posts.order("TS_RANK_CD(TO_TSVECTOR(#{query_locale}, topics.title), #{ts_query}) DESC")
             .order("TS_RANK_CD(post_search_data.search_data, #{ts_query}) DESC")
             .order("topics.bumped_at DESC")
It's using some built in ranking algorithm. I never worked with text search before, so this is pretty much over my head. Either way, not as simple as missing lowercase.
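Chained `.order` calls in ActiveRecord append to the ORDER BY clause, so the first one is the primary sort key and the later ones only break ties. Here's a small Ruby sketch of what that ordering does to a result list. The `Hit` struct and every rank value are invented for illustration; they stand in for whatever TS_RANK_CD would actually return.

```ruby
# Hypothetical search hit. The rank values are made-up numbers standing in
# for TS_RANK_CD results; bumped_at is an integer timestamp.
Hit = Struct.new(:title, :title_rank, :body_rank, :bumped_at)

# Sort descending by title rank, then body rank, then bump time,
# mirroring the order of the three chained .order calls.
def rank_hits(hits)
  hits.sort_by { |h| [-h.title_rank, -h.body_rank, -h.bumped_at] }
end

hits = [
  Hit.new("strong body, weak title", 0.1, 0.9, 300),
  Hit.new("strong title, old",       0.5, 0.2, 100),
  Hit.new("strong title, recent",    0.5, 0.2, 200),
]
rank_hits(hits).each { |h| puts h.title }
```

With this ordering the title rank dominates: a hit with a great body match still lands below anything whose title ranks higher, and when two titles rank the same the body rank and then the bump time decide.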
-
If they're doing it based on topic title first, you wouldn't typically throw the subject into a normal fulltext index.
Umm… yes you would. But you'd pitch it into a different fulltext index to the one over the content.
-
OK, fine, I'll rephrase that: you wouldn't do it in MySQL that way because the overheads of the index on smaller textual columns make it impractical, though I gather it's getting better in more recent versions. If only the people I work with had hosts with sane update policies.
-
I have a post for the bad ideas thread, so I type in 'bad ideas' into the search, sans quotes.
Incidentally, anyone know how I can massage a thread's URL so it goes to my earliest unread post?
Normally I can use the browser's autocomplete when I remember a page title, but Disbugs' URL pollution makes this infeasible, and opening history is slower.
-
You'd probably have better luck jumping out of the topic, then back in from the list.
-
I've had some inconsistent (hello there @Arantor) success with rewriting the URL with the post I'm at.
-
How much off is it? The whole "this is the post you're at" and "go to this post" thing is broked due to wonky detection of which post you're viewing, so if it's off by only 2 - 3 posts that might be it.
For a semi-working helper, check my battles with Ember in the HTML abuse thread, though composing the post with that code active is... bad. Fucking Discurse calls render on each fucking character so it can render the preview, thus running the whole thing hundreds of times for no good reason.
I'm kinda suspecting it's unfixable at this point, since it does the same thing with embedded videos. And that's the code the makers of Discurse wrote, not someone poking around the code using the debugger and looking up random facts on Ember.
-
I'm kinda suspecting it's unfixable at this point, since it does the same thing with embedded videos.
I'm sure I saw a bug on meta about that, and was under the impression it was fixed...
-
I'm sure I saw a bug on meta about that, and was under the impression it was fixed...
A particularly mistaken impression.
Yep, it was fixed for timestamp links, and nothing else whatsoever. For the record, having
https://www.youtube.com/watch?v=7qpDFXPAyXA&feature=kp
in your post still triggers the bug.
-
How much off is it?
The problem is that I enter the page through a URL which is pointing at a fixed post. If I delete the post number it sends me to the first post. If I substitute the word "last" it sends me to the end of the topic. I was wondering if there was a magic keyword that would send me to whatever my first unread post was.
Jumping from a list of topics like @ChaosTheEternal suggested is usually best, since those links already point to the correct post. The problem is that it requires searching around for a link instead of just typing something in the address bar.
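The URL fiddling described above can be sketched in Ruby like this. It assumes topic URLs look like `/t/<slug>/<topic_id>` with an optional trailing post segment, which is how the links in this thread appear. The helper name `retarget_topic_url` and the `/7` post number in the example are made up. It covers only the three behaviours described (no post number, a specific post, "last"); no keyword for "first unread" is assumed, since nobody here has found one.

```ruby
require "uri"

# Rewrites the trailing post segment of a topic URL.
# ASSUMES the path looks like /t/<slug>/<topic_id>[/<post>]; hypothetical helper.
# target: nil drops the post segment (first post); otherwise pass an Integer
# post number or the string "last".
def retarget_topic_url(url, target = nil)
  uri = URI.parse(url)
  match = uri.path.match(%r{\A(/t/[^/]+/\d+)(?:/[^/]+)?/?\z})
  raise ArgumentError, "not a topic URL: #{url}" unless match

  uri.path = target.nil? ? match[1] : "#{match[1]}/#{target}"
  uri.to_s
end

# The "/7" post number here is invented for the example.
url = "https://meta.discourse.org/t/flickering-in-the-preview-with-syntax-highlighting/11303/7"
puts retarget_topic_url(url)          # first post
puts retarget_topic_url(url, "last")  # end of the topic
puts retarget_topic_url(url, 12)      # a specific post
```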
-
It's using some built in ranking algorithm.
Well, from the snippets it looks like a custom algo independent of the one in default tsearch, which means anything goes :)
-
A particularly mistaken impression.
This is the report I was thinking of:
https://meta.discourse.org/t/flickering-in-the-preview-with-syntax-highlighting/11303