ElasticStumped



  • So I have some data I've been chugging into ElasticSearch. The JSON schema looks something like this:

    {
      "events": [
        {
          "type": "foobar",
          "other": "data"
        },
        {
          "type": "monster_mash",
          "other": "data"
        }
      ]
    }
    

    I need to filter results by events.type. It's a NoSQL database, so this ought to be simple, right? Hell, that's even simple in Postgres or SQL Server!

    So I tried this query:

    GET /index_name/_search
    {
      "query": {
        "nested": {
          "path": "events",
          "query": {
            "bool": {
              "must": [
                { "match": { "events.type" : "foobar" } }
              ]
            }
          }
        }
      }
    }
    

    Ok, that query syntax is nightmarish, but it looks like it should work, right? Let's try it:

    Query Parsing Exception:
    [nested] nested object under path [events] is not of nested type"
    

    Uh... wha?

    I don't get what I'm doing wrong here. I've read in the docs that you can override the indexes and tell ElasticSearch that the object marked "events" is a "nested" type object using a mapping, but:

    1. I don't own this DB server, so I'm not going to fuck around with its indexes
    2. Even if I wanted to fuck around with its indexes, the instructions for doing so are extremely vague, and I can't find a good example, and it says you have to delete the old index and start over-- yikes!
    3. I find it impossible to believe that a NoSQL database doesn't have built-in support for something so simple-- I must just be looking at the wrong documentation page, right?
    4. This phrase in the docs:

    Multi level nesting is automatically supported, and detected, resulting in an inner nested query to automatically match the relevant nesting level (and not root) if it exists within another nested query.

    Implies to me that the index should not be needed, but it's not working without one? Are the docs out-of-date?

    (And yes yes, I know I'm "doing it wrong" by not querying on something on the root level, but the data's already there so I have to cope with the world I live in.)

    Any clue here?


  • 🚽 Regular

    @blakeyrat said in ElasticStumped:

    Multi level nesting is automatically supported, and detected, resulting in an inner nested query to automatically match the relevant nesting level (and not root) if it exists within another nested query.

    Implies to me that the index should not be needed, but it's not working without one? Are the docs out-of-date?

    My brain took a serious BSOD trying to parse that sentence in the docs. "if it exists within another nested query" is probably false in your case? But who knows?

    One thing I see is GET /index_name/_search might be wrong. I do something like

    GET /index_name/type_name/_search
    

    Not sure if that's what you meant, since I'm sure you did some redacting to make your example generic, but AFAIK if you query on the index itself without specifying a type, you aren't going to get what you want.



  • @the_quiet_one said in ElasticStumped:

    I do something like
    GET /index_name/type_name/_search

    Are you saying that I could just do:

    GET /index_name/events/_search
    {
      "simple_filter_on": "type";
    }
    

    ???


    EDIT:

    Nope, if I do a query.match_all on that path, I get zero results. Maybe it would work if I set the index of that path to "nested"? But as I stated, I don't want to do that for many many reasons, biggest one being I don't want to fuck up someone else's biznizz.


  • 🚽 Regular

    @blakeyrat said in ElasticStumped:

    @the_quiet_one said in ElasticStumped:

    I do something like
    GET /index_name/type_name/_search

    Are you saying that I could just do:

    GET /index_name/events
    {
      "simple_filter_on": "type";
    }
    

    ???

    If events is a root object, then yes. I believe you can even do something like:

    GET /index_name/events/_search
    {
      "query": {
        "bool": {
          "must": [
            { "match": { "type" : "foobar" } }
          ]
        }
      }
    }
    

    I was under the impression events was a property of an existing root object, though. If so, then you'll have to find the type name of that root object and use that in the URL to your query, with the nested query intact.

    Hopefully that makes sense. ElasticSearch queries are as hard to explain as regex queries, and I can't call myself an expert on the subject either. I'm just a casual user more than anything.



  • @the_quiet_one said in ElasticStumped:

    If so, then you'll have to find the type name of that root object

    How would I go about doing this?


  • 🚽 Regular

    @blakeyrat said in ElasticStumped:

    @the_quiet_one said in ElasticStumped:

    If so, then you'll have to find the type name of that root object

    How would I go about doing this?

    You can start with GET /index_name/_mapping

    If "events" is a root type you should see it right in there with the type and name properties defined. If not, then you'll probably just need to search through the results for any objects that contain an events property with type "nested".
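Since a mapping can be very long, one way to hunt for the field is to save the `GET /index_name/_mapping` response and walk it with a small script. This is only a sketch: `find_field` and the sample mapping are made up for illustration, and it assumes the usual layout where sub-fields sit under a `properties` key.

```python
def find_field(node, name, path=()):
    """Yield (dotted_path, declared_type) for each field called `name` in a mapping dict."""
    if not isinstance(node, dict):
        return
    props = node.get("properties")
    if isinstance(props, dict):
        for field, definition in props.items():
            field_path = path + (field,)
            if field == name and isinstance(definition, dict):
                # Plain object fields usually omit "type", so report "object" for them.
                yield ".".join(field_path), definition.get("type", "object")
            yield from find_field(definition, name, field_path)
    else:
        # Index and type levels have no "properties" key yet; keep descending.
        for key, value in node.items():
            yield from find_field(value, name, path + (key,))


# Hypothetical mapping, shaped like a GET /index_name/_mapping response.
sample = {
    "index_name": {"mappings": {"parent_type": {"properties": {
        "id": {"type": "string"},
        "events": {"properties": {
            "type": {"type": "string"},
            "other": {"type": "string"},
        }},
    }}}}
}

hits = list(find_field(sample, "events"))
print(hits)
```

A hit reported as `nested` means the nested query should work; anything else (`object`, `string`, and so on) would explain the "is not of nested type" error.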



  • @the_quiet_one Oh I already looked at that. "events" isn't mapped as a nested type. That's in my OP. My problem is I need to be able to do this (BTW fucking simple!) query without having to dick with the indexes. That's the question.


  • 🚽 Regular

    @blakeyrat said in ElasticStumped:

    @the_quiet_one Oh I already looked at that. "events" isn't mapped as a nested type. That's in my OP. My problem is I need to be able to do this (BTW fucking simple!) query without having to dick with the indexes. That's the question.

    Ah, sorry. Missed that detail. In my experience, ElasticSearch is very finicky with types, and will crap a turd if anything defined in the index doesn't match a query. What I'm confused by, though, is this is also enforced on insert, so: how the hell did that events array even get in the data without it failing with a similar error? If this is the case, then I don't think there is much else you can do but rebuild the index. :(
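Before rebuilding anything, it may be worth trying the same match with the `nested` wrapper removed. The error only says that `events` isn't mapped as nested; as far as I know, the fields of an ordinary (non-nested) object can be addressed by their dotted path in a regular query. This is an untested sketch against this particular index, and `plain_match` is just a made-up helper that builds the request body.

```python
import json


def plain_match(field, value):
    """Build a bool/match search body with no "nested" wrapper."""
    return {"query": {"bool": {"must": [{"match": {field: value}}]}}}


body = plain_match("events.type", "foobar")

# Print it in the same shape you would paste into Sense.
print("GET /index_name/_search")
print(json.dumps(body, indent=2))
```

If that returns hits, no mapping change is needed for a single-field filter like this one.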

    Now that I've pondered it for a while, I think what that horribly confusing sentence in the docs you quoted in the OP is saying is: If you are performing a search query that requires a nested query within another nested query, you only need to put the "nested" thing at the top of your query. In other words, let's say the event object had a third property listing another array of objects and you wanted to query on that array... you wouldn't need to add a second "nested" in the nested query. Ugh, fuck me sideways, there's no fucking way to word this eloquently.



  • @the_quiet_one said in ElasticStumped:

    What I'm confused by, though, is this is also enforced on insert, so: how the hell did that events array even get in the data without it failing with a similar error?

    Well the mapping is roughly 8 million lines long and it has more than one thing in it called "type", so it's possible I missed it.

    But yes that was confusing me too. If I do a query.match_all on the parent type, it shows me a nice JSON object exactly like I'd expect-- but if I query the JSON object, suddenly it thinks events is a string or something?

    Like I said in the subject line, I'm completely stumped at this point.


  • 🚽 Regular

    @blakeyrat said in ElasticStumped:

    @the_quiet_one said in ElasticStumped:

    What I'm confused by, though, is this is also enforced on insert, so: how the hell did that events array even get in the data without it failing with a similar error?

    Well the mapping is roughly 8 million lines long and it has more than one thing in it called "type", so it's possible I missed it.

    But yes that was confusing me too. If I do a query.match_all on the parent type, it shows me a nice JSON object exactly like I'd expect-- but if I query the JSON object, suddenly it thinks events is a string or something?

    Like I said in the subject line, I'm completely stumped at this point.

    Sadly, I might need to throw my hands in the air. 🤷‍♂️

    Elastic Search is one of those things where half the time I'm just using a tool like Sense to just fiddle with the query until I get what I want. On that end, it's near impossible to troubleshoot it in this manner, where I don't have access to just fool around with it and see what works. It's a black art to me. Hopefully someone on this forum has competency with it that surpasses us mere mortals.



  • @blakeyrat said in ElasticStumped:

    Well the mapping is roughly 8 million lines long and it has more than one thing in it called "type", so it's possible I missed it.

    [Image: Capture.PNG]

    So it looks like ElasticSearch thinks it's a string, and not an array or object or nested.

    But I still don't get why when I do the equivalent of a SELECT TOP 10 * on it I get a JSON object back-- shouldn't I see:

    "events": "(JSON in string form)"
    

    if ElasticSearch is so convinced it's a string? Like, does ElasticSearch keep track of types, then ignore them 90% of the time, or what the fuck is even going on here? Who designed this shit?

    This shit's giving me a headache.

    I'm about a half hour away from writing a C# program to download THE ENTIRE FUCKING INDEX and filter it manually. In a fucking LINQ statement. Because it didn't take me 8 hours to learn fucking LINQ.
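On the question of why a search hands back a proper JSON object: search hits return the stored `_source`, which is the original JSON exactly as it was sent, while the mapping only describes how the fields were indexed. For an array of objects that is not mapped as `nested`, the indexed form is flattened into multi-valued fields. The sketch below mimics that flattening in plain Python. `flatten` is a hypothetical helper for illustration, not something Elasticsearch exposes, and whether this is exactly what happened in this index is an assumption.

```python
def flatten(doc, prefix=""):
    """Collapse nested objects and arrays into dotted, multi-valued fields."""
    fields = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Sub-object: merge its flattened fields under "path."
                for sub_path, sub_values in flatten(item, path + ".").items():
                    fields.setdefault(sub_path, []).extend(sub_values)
            else:
                fields.setdefault(path, []).append(item)
    return fields


# The document from the original post.
source = {"events": [
    {"type": "foobar", "other": "data"},
    {"type": "monster_mash", "other": "data"},
]}

indexed = flatten(source)
print(indexed)
```

In the flattened view the link between each `type` and its own `other` is gone, which is the thing a `nested` mapping preserves. A filter on `events.type` alone doesn't depend on that link.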



  • @the_quiet_one said in ElasticStumped:

    Elastic Search is one of those things where half the time I'm just using a tool like Sense to just fiddle with the query until I get what I want.

    My next question was going to be "does ElasticSearch have any query tools that don't suck shit?" but Sense is what I'm using too. It sucks shit.


  • Impossible Mission - B

    Ugh, you have my sincerest sympathies, @blakeyrat. Every time I have to work with ElasticSearch, I find [nested] nested :wtf:s. It's basically a fractal of bad design.



  • Co-worker I talked to about this yesterday came up with a "solution". ElasticSearch (per its name I suppose) does full-text search indexing on everything-- fortunately the data I need is a unique-enough keyword that it'll come up if I do a full-text search of the index.

    Thus, the solution for now is:

    GET /index_name/_search?q=MyUniqueKeyword
    

    So basically, the solution to "I can't query in a sane way" is "use full-text search instead".
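For what it's worth, the `q=` parameter is parsed as a `query_string` query, so the same search can be written as a request body, and the `field:value` syntax can narrow it to one field. A small sketch follows; `uri_search` is a made-up helper, and whether `events.type` is actually searchable in this index is an assumption.

```python
from urllib.parse import urlencode


def uri_search(index, query_text):
    """Return the URI-search path and a query_string body for the same search."""
    path = f"/{index}/_search?" + urlencode({"q": query_text})
    body = {"query": {"query_string": {"query": query_text}}}
    return path, body


# The full-text search from the post above.
path, body = uri_search("index_name", "MyUniqueKeyword")
print("GET " + path)

# The same idea restricted to a single field (the colon gets URL-encoded).
scoped_path, scoped_body = uri_search("index_name", "events.type:foobar")
print("GET " + scoped_path)
```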

