Parsing JIRA, Now parsing JSON with dynamic using Newtonsoft fails



  • I've looked, but all the references seem to start from the web side.

    Is there a reference for a desktop application capturing dom from a site?

    I want to parse a particular website and grab information from it dynamically.



  • What technology? .Net, Java, UNIX shell, QT?



  • For example, in C#:

    // Requires "using System.Net;". Downloads the page's raw HTML as a single string.
    WebClient client = new WebClient();
    string downloadString = client.DownloadString("http://www.google.com");
    


  • C# works.

    So that just downloads the page as a string?

    Great. Now I just grab a DOM Parser library.

    Thanks!



  • @xaade said:

    Now I just grab a DOM Parser library.



  • @xaade said:

    Great. Now I just grab a DOM Parser library.

    Just use a regex :trolleybus:



  • @hungrier said:

    Just use a regex

    or if you are looking for an overkill



  • sigh

    I suppose I could do that for very specific instances of certain spans.
    But I really really hate regex.



  • That's what I stumbled into.



  • libxml2 can be used to parse HTML trees as well as XML. It's typically present on Linux systems with various language bindings, but according to the site there's a Windows version as well.



  • For static websites (no AJAX, no JavaScript), this is what I did the last time I needed it:

    • Download the page (html only)
    • "Normalize" html into xhtml (there was a library for that in C#)
    • Parse it into an XmlDocument
    • Use XPath to find the data (much better than Regex).
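
    A rough sketch of the last two steps (parse into an XmlDocument, query with XPath), assuming the download and the HTML-to-XHTML normalization already happened. The sample markup, the `record` class name and the `XhtmlScrape` helper are all made up for illustration.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Xml;

    public static class XhtmlScrape
    {
        // Returns the trimmed text of every <span class="record"> in a well-formed XHTML string.
        // "record" is a hypothetical class name, not taken from any real site.
        public static List<string> ExtractRecords(string xhtml)
        {
            var doc = new XmlDocument();
            doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

            var results = new List<string>();
            foreach (XmlNode node in doc.SelectNodes("//span[@class='record']"))
                results.Add(node.InnerText.Trim());
            return results;
        }

        public static void Main()
        {
            // Stand-in for a downloaded, already-normalized page.
            string page = "<html><body><span class='record'>ABC-1</span>"
                        + "<p>noise</p><span class='record'>ABC-2</span></body></html>";
            foreach (string text in ExtractRecords(page))
                Console.WriteLine(text);
        }
    }
    ```

    Real XHTML usually declares a default namespace, in which case the XPath query needs an XmlNamespaceManager and a prefix; the sample above has no namespace to keep things short.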

    If you need, I can dig through the old projects and find the exact libraries I've used.

    If the target site is doing some kind of AJAX or javascript templating, you'll need a headless browser library, like selenium or that node.js thing.



  • well.

    It was a great idea, but there's a login to the site.
    Not sure if I can pull that off....



  • Well, now I have an address that logs me in, but I think the redirects are killing it.

    http://site.com/login?username=blah&password=blahagain&destination=/browse/record

    I wonder if I can use a hidden browser window and capture its DOM when it finishes loading.


  • Grade A Premium Asshole

    What manner of atrocity are you trying to commit?



  • I'm trying to scrape a record from a site that I'm having to log into.
    I gave up trying to do something honest when I realized the login barrier.
    Now I'm just trying to see if I can hack it.

    Yes, it's turning into a :wtf:


  • Grade A Premium Asshole

    And there is no URL that retrieves a JSON feed or equivalent?



  • why would there be, I'm not in control of the site?

    It's an internal JIRA site.


  • Grade A Premium Asshole

    That does not mean it does not exist. Have you tried appending .JSON to the end of the URL you need to scrape info from?


  • Grade A Premium Asshole

    @xaade said:

    It's an internal JIRA site.

    Which product?



  • JIRA v6.1.2



  • Doesn't Jira have like... RSS feeds and such? Why are you going the long way 'round?

    Also it definitely has a JSON API, I just popped open Fiddler and took a look at my company's install. You just need to find some documentation and you should be set.

    Cha-cha-cha-check it out: https://docs.atlassian.com/jira/REST/latest/

    (Oh God, Atlassian has trapped the very souls of developers. From that page:

    The REST APIs are developers who want to integrate JIRA with other standalone or web applications,

    You'd better say nice things about Atlassian products, or next thing you know you'll be transformed via dark magic into a get/issues/latest API call!!!)


  • Grade A Premium Asshole

    You beat me to it.


  • kills Dumbledore

    @cartman82 said:

    Parse it into an XmlDocument
    Use XPath to find the data (much better than Regex).

    If I was going down this route, I'd use an XDocument and Linq to XML.
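
    For comparison, a minimal sketch of the same kind of lookup with XDocument and LINQ to XML. It assumes the input is already well-formed XHTML without a default namespace; the `record` class name and the `LinqScrape` helper are made up.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public static class LinqScrape
    {
        // Returns the trimmed text of every <span class="record"> element.
        public static List<string> ExtractRecords(string xhtml)
        {
            XDocument doc = XDocument.Parse(xhtml);
            return doc.Descendants("span")
                      // The cast yields null when the attribute is missing, so no null check is needed.
                      .Where(e => (string)e.Attribute("class") == "record")
                      .Select(e => e.Value.Trim())
                      .ToList();
        }

        public static void Main()
        {
            string page = "<html><body><span class='record'>ABC-1</span><span>skip</span></body></html>";
            foreach (string text in ExtractRecords(page))
                Console.WriteLine(text);
        }
    }
    ```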



  • hmmm.... that seems great.

    Now on to figure out how to login.



  • If your server's on HTTPS, it supports HTTP Basic Authentication, which is really simple to implement.

    If not, you're stuck with OAuth which is a bitch.
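
    To show how little there is to Basic Authentication: the header is just "Basic " plus the Base64 of `user:password`. A minimal sketch, where `BasicAuth`, `BuildHeaderValue` and `Get` are hypothetical helper names and the URL is whatever your install uses:

    ```csharp
    using System;
    using System.Net;
    using System.Text;

    public static class BasicAuth
    {
        // Builds the value of the Authorization header for HTTP Basic Authentication.
        public static string BuildHeaderValue(string user, string password)
        {
            string raw = user + ":" + password;
            return "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
        }

        // Hypothetical helper: GETs a URL with the Basic header attached.
        public static string Get(string url, string user, string password)
        {
            using (var client = new WebClient())
            {
                client.Headers[HttpRequestHeader.Authorization] = BuildHeaderValue(user, password);
                return client.DownloadString(url);
            }
        }

        public static void Main()
        {
            // No network access here; this only prints the header value.
            Console.WriteLine(BuildHeaderValue("user", "pass"));
        }
    }
    ```

    Since the credentials are only encoded, not encrypted, this is only reasonable over HTTPS.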



  • hahaah.

    I tried https and it timed out.


  • SockDev

    @blakeyrat said:

    If not, you're stuck with OAuth which is a bitch.

    And when it goes wrong (which it will), it's almost impossible to debug.



  • @blakeyrat said:

    OAuth

    Yeah, I don't have access to do that.
    Well, here it ends, kiddos.



  • Ok; I find the easiest OAuth in C# is to "steal" the code from the Twitterizer library: https://github.com/Twitterizer/Twitterizer/tree/develop/Twitterizer2/OAuth

    It's about as well-done as I've seen any OAuth code. (It's written for Twitter specifically, but it should work for any OAuth-using site.) And it's small and simple enough that you can actually take a peek at it and maybe come away with something useful.



  • I'm simply trying to keep notes the way I keep notes, with small applications that sort information for me.

    Not good enough reason for me to ask them to do this

    The first step is to register a new consumer in JIRA. This is done
    through the Application Links administration screens in JIRA. Create a
    new Application Link.

    yeah, so dead in the water.

    /rest/auth/1/session didn't seem to work either
    #Thanks so much everyone for trying to help though.

    EDIT:

    Ok, so I got /rest/auth/1/session to work.

    Now I just need to figure out how to get the cookie header into the next request.

    Game back on.

    EDIT:

    And the response doesn't seem to list a cookie.

    Geez, they're making this hard.



  • #JIRA GET!!!!

    Ok, so basically I had to use cookie authentication: take the cookie header from the session response, add "Cookie: " to the front of it, then send that string as a header on the next REST request.

    Bam....

    I'm in bizniz.
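
    Roughly what that looks like in code, as a sketch and not the exact code I ran. The base URL, the JSON login body and the path passed to `LoginAndGet` are assumptions; `JiraSession` and its methods are made-up names. It only keeps the first cookie from the response, and a real response may carry several.

    ```csharp
    using System;
    using System.Net;

    public static class JiraSession
    {
        // Keeps only the "name=value" part of a Set-Cookie header value,
        // dropping attributes such as Path or HttpOnly.
        public static string ToCookieHeader(string setCookie)
        {
            int semicolon = setCookie.IndexOf(';');
            return (semicolon < 0 ? setCookie : setCookie.Substring(0, semicolon)).Trim();
        }

        // Sketch of the two requests: log in, then reuse the session cookie.
        // loginJson is assumed to look like {"username":"...","password":"..."}.
        public static string LoginAndGet(string baseUrl, string loginJson, string path)
        {
            string setCookie;
            using (var login = new WebClient())
            {
                login.Headers[HttpRequestHeader.ContentType] = "application/json";
                login.UploadString(baseUrl + "/rest/auth/1/session", loginJson); // POST
                setCookie = login.ResponseHeaders[HttpResponseHeader.SetCookie];
            }

            using (var client = new WebClient())
            {
                client.Headers[HttpRequestHeader.Cookie] = ToCookieHeader(setCookie);
                return client.DownloadString(baseUrl + path);
            }
        }

        public static void Main()
        {
            // No network access here; this only shows the header trimming.
            Console.WriteLine(ToCookieHeader("JSESSIONID=abc123; Path=/; HttpOnly"));
        }
    }
    ```

    HttpWebRequest with a CookieContainer would handle the cookies automatically; the manual version is shown because it matches the steps described above.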



  • Ok, now I'm having trouble using a dynamic to parse the JSON.

    Using JSON.NET, using Deserialize or JObject.Parse. End up with a JObject cast to dynamic.

    Try to call one of the properties in the JSON I can SEE with my own eyes, and it says JObject doesn't implement that member.

    Tell Deserialize to deserialize as an ExpandoObject, and it works one level down. Two levels down and I get the JObject bullshit.

    This is supposed to work out of the box.

    I use an ExpandoObject and it works fine.

    So there's some difficulty in getting JObject to understand that dynamic means ExpandoObject.

    This is the latest .NET 4.5 Newtonsoft. (which is version 11)
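
    For reference, a minimal sketch of the two approaches being compared, assuming the Newtonsoft.Json package and a reference to Microsoft.CSharp (needed for `dynamic`). The JSON is a made-up sample shaped loosely like an issue, and `DynamicJson` is a hypothetical helper; this is not a diagnosis of the failure above.

    ```csharp
    using System;
    using System.Dynamic;
    using Newtonsoft.Json;
    using Newtonsoft.Json.Converters;
    using Newtonsoft.Json.Linq;

    public static class DynamicJson
    {
        // JObject supports dynamic member access, so nested properties chain directly.
        public static string SummaryViaJObject(string json)
        {
            dynamic issue = JObject.Parse(json);
            return (string)issue.fields.summary;
        }

        // ExpandoObjectConverter turns nested JSON objects into nested ExpandoObjects
        // instead of leaving JObjects below the first level.
        public static string SummaryViaExpando(string json)
        {
            dynamic issue = JsonConvert.DeserializeObject<ExpandoObject>(json, new ExpandoObjectConverter());
            return (string)issue.fields.summary;
        }

        public static void Main()
        {
            string json = "{\"key\":\"ABC-1\",\"fields\":{\"summary\":\"Fix login\"}}";
            Console.WriteLine(SummaryViaJObject(json));
            Console.WriteLine(SummaryViaExpando(json));
        }
    }
    ```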


  • Grade A Premium Asshole

    #1, I mean no disrespect or criticism.

    #2, is this your first time consuming a web page programmatically?


  • Grade A Premium Asshole

    This fucking parser...


  • SockDev

    @xaade said:

    End up with a JObject cast to dynamic

    i'd avoid that if possible. just use the JObject. it's capable enough and casting to dynamic does..... odd things to the objects.... any objects really.
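
    a minimal sketch of what that looks like, assuming Newtonsoft.Json. the sample JSON and the `PlainJObject` helper are made up.

    ```csharp
    using System;
    using Newtonsoft.Json.Linq;

    public static class PlainJObject
    {
        // Indexer access: throws NullReferenceException if "fields" or "status" is missing.
        public static string StatusName(string json)
        {
            JObject issue = JObject.Parse(json);
            return (string)issue["fields"]["status"]["name"];
        }

        // SelectToken returns null when any part of the path is missing,
        // and casting a null JToken to string yields null.
        public static string StatusNameOrNull(string json)
        {
            return (string)JObject.Parse(json).SelectToken("fields.status.name");
        }

        public static void Main()
        {
            string json = "{\"fields\":{\"status\":{\"name\":\"Open\"}}}";
            Console.WriteLine(StatusName(json));
            Console.WriteLine(StatusNameOrNull("{}") == null);
        }
    }
    ```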


  • I survived the hour long Uno hand

    You mentioned:

    @xaade said:

    C# works.

    If you have a choice, and you're new to this, Node.JS was pretty much MADE for parsing json objects from arbitrary sources like JIRA


  • SockDev

    true, but it's also pretty new to a lot of businesses. there may be VP and c** level resistance....

    that being said if you can make a go of it and want to give it a try i do recommend. you can do some crazy awesome shit with Node.JS



  • or any non-typed language. arbitrary JSON and typed languages amount to lots of headaches.

    PHP can turn any JSON into an associative array :trolleybus:



  • No, but this is the first time I'm consuming it with a C# dynamic.
    I'd have no problems if I wrote a class for the parsing.

    I'm just wondering how I broke this dynamic JSON.NET feature.



  • I was making a C# client app, because I can crank those out like candy, and they're lightweight enough for my programming style.

    Of course, I could make a local single-file page app with javascript, but why would I do that? It makes more sense to just use the site at that point, and I really like to avoid webapps. They seem to be lacking vertical containment.


  • ♿

    There's a java library that I've always used to deal with JIRA. There's at least one .NET library, though it doesn't look like much:



  • @xaade said:

    Try to call one of the properties in the JSON I can SEE with my own eyes, and it says JObject doesn't implement that member.

    What does "call a property" mean in the context of C#?

    I sometimes use a 1-file JSON parser that doesn't bother to even try to serialize the data, but simply creates a big-ass HashTable collection containing other HashTables and ArrayLists, which you can access just using array notation or whatever. For simple non-customer-facing stuff like this, I like that solution. It also helps when APIs that claim to return JSON actually sometimes don't (which is surprisingly common.)

    I think you can get Newtonsoft to do that too, but I can't tell you how off the top of my head.
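
    One way it might be done, as a sketch: parse to a JToken, then walk it recursively into plain dictionaries and lists. `PlainCollections.ToPlain` is a made-up helper, not a Newtonsoft API, and it uses Dictionary and List where the parser described above uses HashTable and ArrayList.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using Newtonsoft.Json.Linq;

    public static class PlainCollections
    {
        // Objects become Dictionary<string, object>, arrays become List<object>,
        // and everything else becomes the underlying .NET value (string, long, bool, null...).
        public static object ToPlain(JToken token)
        {
            switch (token.Type)
            {
                case JTokenType.Object:
                    return ((JObject)token).Properties()
                        .ToDictionary(p => p.Name, p => ToPlain(p.Value));
                case JTokenType.Array:
                    return token.Select(t => ToPlain(t)).ToList();
                default:
                    return ((JValue)token).Value;
            }
        }

        public static void Main()
        {
            string json = "{\"fields\":{\"summary\":\"Fix login\",\"labels\":[\"a\",\"b\"]}}";
            var root = (Dictionary<string, object>)ToPlain(JToken.Parse(json));
            var fields = (Dictionary<string, object>)root["fields"];
            Console.WriteLine(fields["summary"]);
            Console.WriteLine(((List<object>)fields["labels"]).Count);
        }
    }
    ```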



  • From the looks of it, that's what a JObject is, along with other things.
    More like dictionaries though.

    The dynamic thing is supposedly a feature of 1.0, that everyone seems to have no problems with.

    ...


  • BINNED

    @blakeyrat said:

    simply creates a big-ass HashTable collection containing other HashTables and ArrayLists

    Qt does pretty much the same thing. Well, ok, it has JSON Document, Object and Array which have some access methods, but I mostly just convert it to maps containing other maps and lists most of the time.

    What else can you do, really? I can't see a way where you can automatically convert JSON to anything more useful than that.



  • You're supposed to serialize into a predefined object, but that assumes a single party owns both server and client and I've never worked in a place where that was the case.
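
    For what it's worth, a minimal sketch of the predefined-object approach with Newtonsoft.Json. The `Issue` and `IssueFields` classes are made up and only cover the members the client cares about; by default, JSON members with no matching property are ignored and names are matched case-insensitively.

    ```csharp
    using System;
    using Newtonsoft.Json;

    // Hypothetical classes covering only the fields this client needs.
    public class IssueFields
    {
        public string Summary { get; set; }
    }

    public class Issue
    {
        public string Key { get; set; }
        public IssueFields Fields { get; set; }
    }

    public static class TypedJson
    {
        public static Issue Parse(string json)
        {
            return JsonConvert.DeserializeObject<Issue>(json);
        }

        public static void Main()
        {
            string json = "{\"key\":\"ABC-1\",\"extra\":42,\"fields\":{\"summary\":\"Fix login\"}}";
            Issue issue = Parse(json);
            Console.WriteLine(issue.Key + ": " + issue.Fields.Summary);
        }
    }
    ```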


  • BINNED

    Agreed, but what I'm saying is there's pretty much no way to do that with a generic library; object implementation is still up to you. I guess you could make something that creates a class with generic getters and setters, but that's as far as an off-the-shelf solution could go.

    Unless some smartass decides to create a standard for serializing methods into JSON.

    No, don't post links, I really don't want to know. Running out of brain bleach.



  • Would require code generation or reflection, I think. And AFAIK reflection performance is usually not very good.


  • BINNED

    Right. Summary: too much work for not enough gain. Have a parser, implement it yourself. Hashtables / maps / dictionaries / whatever are good enough for generic libs.



  • Yeah, we use hash tables as well. And then flatten them to a different format the rules engine can access :smile:.


  • BINNED

    For a bit of Discourse API thing I managed to write so far (busy/tired at home lately) I did just that: created a pretty generic loop that flattens the objects and assigns property values in my classes based on JSON keys.

    But it only works because there's only a single level of nesting objects and no arrays of objects. Past that I'll have to go more specific and do it "by hand" because I see no sane way of automating it completely.

    Well, other than using The Enterprise Rules Engine, but I'm not Enterprise enough for such a beauty.

