Determining data structure of a blob of data



  • What is the best way (or rather, is there a best way?) to determine the structure of a blob of data?  Basically I send an xml string to a web service, and it returns a blob.  I don't know the structure, as I'm going at it blind.  Unfortunately I can't ask the creators because it's a 3rd party web service.



  • There is no best way.  Any conclusions you make will inherently be tentative. First, try to collect any information about the service - any docs or whatever.  Use that with the function names to guess at the purpose of the particular service or function.  This is the most important step, as you may be able to identify likely structures.  For example, if the service name is "GenerateImage", I would look for JPEG or GIF identifiers in the data.

     After that, make individual changes to the data you send, either in the XML string you send or any prior calls.  This might give you some clues.

     In the end though, this is a ton of guesswork with no guarantee of success.  With such a general question, there is no way anyone can answer with any kind of specifics.
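
     As a rough illustration of the identifier check mentioned above, here is a minimal C# sketch that compares the first few bytes of a blob against some well-known file signatures. The class and method names (BlobSniffer, Identify) are made up for this example, and the signature list is only a small sample; a match is a hint, not proof of the format.

```csharp
using System;

static class BlobSniffer
{
    // Returns a best-guess format name based on the leading bytes, or "unknown".
    public static string Identify(byte[] data)
    {
        if (StartsWith(data, 0x50, 0x4B, 0x03, 0x04)) return "zip";   // "PK\x03\x04", zip local file header
        if (StartsWith(data, 0x1F, 0x8B)) return "gzip";
        if (StartsWith(data, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (StartsWith(data, 0x47, 0x49, 0x46, 0x38)) return "gif";   // "GIF8"
        if (StartsWith(data, 0x89, 0x50, 0x4E, 0x47)) return "png";   // "\x89PNG"
        return "unknown";
    }

    // True when data begins with exactly the given signature bytes.
    private static bool StartsWith(byte[] data, params byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

     Calling BlobSniffer.Identify(File.ReadAllBytes("file1")) on a saved response is one way to use it; anything that comes back "unknown" still needs the manual guesswork described above.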



  • Yeah, that's mostly what I figured.  Biggest problem: I know it's a custom data structure.  A hex dump shows it has the same header as PKZIP, but I don't know enough about zip to figure it out.  It has repetitions of PK followed by blocks of data.



  •  can you run the blob through the file command?



  • @ajp said:

     can you run the blob through the file command?

    I'll have to build a VM for it, but that's a great idea.  I dunno what it will return though.



  •  @steve.syfuhs said:

    I'll have to build a VM for it, but that's a great idea.  I dunno what it will return though.

     Also, have you tried opening it up in 7zip or some other program that groks zip files? If that gives you something that seems to be valid results, then problem solved. Unless of course you're dealing with steganography, in which case these methods will be completely useless. :)



  • Nah, they don't look anything like stegano files.  I was using Winrar, but 7zip is a good idea.



  • Would http compression give it a PK header?



  • If correctly implemented, you would need to send an "Accept-Encoding" header and receive a "Content-Encoding" header naming deflate and/or gzip for the data to be HTTP compressed. GZIP data starts with the bytes 0x1f 0x8b; raw deflate has no magic number at the start (a zlib wrapper, if present, usually starts with 0x78). Neither of those would give you a PK header.

    If you don't know what the output is, how do you know what inputs to send? 



  • If you think it might be protocol compression, can you look at the receiving code?  Not just sniffing the wire, but dumping the blob in the application.  This way any protocol specifics would be already taken care of.  (I'm not a web programmer, but I assume that you call some kind of HttpGet function which returns the actual object data without the http headers, protocol compression, encryption, whatever).  If the data you see comes from this function and it still has the PK header, then the blob itself looks like a compressed archive.

     I agree with the other poster, that just running it against compression programs is the best bet.  (I wish I could see the thread because I forget names very easily).



  • I forgot to put in my last post, if http is compressing the data it will say so in the header. 
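
    A minimal C# sketch of that header check, assuming you already have the value of the Content-Encoding header (for an HttpWebResponse, the ContentEncoding property exposes it). The HttpBody class and its method names are invented for this example; Gunzip only handles the gzip case.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class HttpBody
{
    // True when the Content-Encoding header value names gzip or deflate.
    public static bool IsCompressed(string contentEncoding)
    {
        if (string.IsNullOrEmpty(contentEncoding)) return false;
        foreach (string token in contentEncoding.Split(','))
        {
            string t = token.Trim().ToLowerInvariant();
            if (t == "gzip" || t == "x-gzip" || t == "deflate") return true;
        }
        return false;
    }

    // Decompresses a gzip-encoded body into its original bytes.
    public static byte[] Gunzip(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gz = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            gz.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

    If IsCompressed(resp.ContentEncoding) comes back false and the bytes still start with PK, the PK header belongs to the blob itself rather than to the transport.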



  • @Ixpah said:

    If you don't know what the output is, how do you know what inputs to send? 

     The input is xml.



  • I'm thinking the returned data is being bungled by my code, which is encoding it improperly.

    (In C#) Is it possible to use an HttpWebRequest/WebResponse pair to return binary data?  Everything is getting returned as strings, which may be the problem in the first place.  My code:

                string strXML = "";
                string request = "<xml>...<xml>";
                HttpWebRequest req = null;
                WebResponse rsp = null;

                req = (HttpWebRequest)HttpWebRequest.Create("Removed to protect the guilty");
                
                req.Method = "POST";        // Post method
                req.ContentType = "application/xml";     // content type
                req.Accept = "text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2";
                req.Headers.Add("Pragma", "no-cache");
                req.Headers.Add("Cache-Control", "no-cache");
                req.Headers["Cookie"] = "JSESSIONID=F407D548C0E7219E4827A7C9059332CC";

                // create sender and send
                StreamWriter writer = new StreamWriter(req.GetRequestStream());
                writer.WriteLine(request);
                writer.Close();
               
                // get response in string form
                rsp = req.GetResponse();
                StreamReader reader = new StreamReader(rsp.GetResponseStream());
                strXML = reader.ReadToEnd();

                // create file builder
                FileStream file = new FileStream("file", FileMode.Create);
                
                // initialize stream
                BinaryWriter w = new BinaryWriter(file);

                // write data to file
                w.Write(strXML);
                w.Close();

                Console.Write(strXML);
                Console.ReadLine();

    Like I said, I think the data is getting bungled by the string coding.  Yay, nay, meh?
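
    A small sketch of why the string round trip is suspect, assuming the reader decodes as UTF-8 (the StreamReader default). Bytes that are not valid UTF-8 get replaced during decoding, so re-encoding the text does not give back the original bytes. On top of that, BinaryWriter.Write(string) writes a length prefix in front of the string. The RoundTrip class name is made up for the example.

```csharp
using System;
using System.Text;

static class RoundTrip
{
    // Decodes bytes as UTF-8 text and re-encodes them, which is roughly what
    // reading binary data through a text reader and writing it back out does.
    public static byte[] ThroughUtf8String(byte[] raw)
    {
        string text = Encoding.UTF8.GetString(raw);
        return Encoding.UTF8.GetBytes(text);
    }

    // True when both arrays have the same length and contents.
    public static bool SameBytes(byte[] a, byte[] b)
    {
        if (a.Length != b.Length) return false;
        for (int i = 0; i < a.Length; i++)
        {
            if (a[i] != b[i]) return false;
        }
        return true;
    }
}
```

    Feeding it something like { 0x50, 0x4B, 0x03, 0x04, 0xFF, 0xFE, 0x80 } and comparing with SameBytes shows the damage, while plain ASCII input survives unchanged.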



  • First, I don't know C#, so realize I'm just talking out of my ass right now.

    You could test your use of StreamReader and BinaryWriter by trying them against a known entity.  Try it against a few different things like a JPEG image, an HTML file, maybe a WAV file and see if your output file is correct.  If it works for known binary streams, it will probably work for this unknown data.  If it's broken, you'll see it.

    I can't say if your use of C# is correct or not just by reading it. 



  • @steve.syfuhs said:

    Is it possible to use an HttpWebRequest/WebResponse pair to return binary data?

    Yes, use BinaryReader.   This sample illustrates it in use:

        HttpWebRequest req = (HttpWebRequest) HttpWebRequest.Create(url);
        HttpWebResponse resp = (HttpWebResponse) req.GetResponse();

        BinaryReader br = new BinaryReader(resp.GetResponseStream());
        byte[] data = br.ReadBytes(1000000);

        File.WriteAllBytes(filename, data);

    This is a simplified version; you shouldn't just try to read in a million bytes, for instance, but you can see how it works.

    -cw

     



  • Do not use the HttpRequest/Response classes.

    Visual Studio allows you to communicate with web services in a type-safe manner without having to decode the XML response yourself. Just open your project in Visual Studio, right click on your project in your solution explorer, then click "Add Web Reference". Type the URL of the web service into your prompt and hit ok. Afterward, Visual Studio will add a new namespace to your project.

    Here's a tutorial if you need one: http://www.deitel.com/articles/csharp_tutorials/20051126/csharpwebservices_part10.html



  • @Nod said:

    Do not use the HttpRequest/Response classes.

    Visual Studio allows you to communicate with web services in a type-safe manner without having to decode the XML response yourself. Just open your project in Visual Studio, right click on your project in your solution explorer, then click "Add Web Reference". Type the URL of the web service into your prompt and hit ok. Afterward, Visual Studio will add a new namespace to your project.

    Here's a tutorial if you need one: http://www.deitel.com/articles/csharp_tutorials/20051126/csharpwebservices_part10.html

    Yeeeeeaaaaaa...that would require it to be a WSDL web service.  This service produces ONLY binary.



  • Hi -- Did you ever get this to work with the BinaryReader?  It definitely works pretty well; I've used it before as well, but in a different way. There are lots of overloads and options and many different ways to get this all done, some finickier than others, so if you are trying something but having issues, let us know.

    (I am pleasantly surprised that according to the Firefox spell checker, "finickier" is a word!)



  • I'm still working on it.  How would I determine how large of a byte array to create?  Instead of just reading in say a million bytes.



  • Ok, so it does get mangled when using the StreamReader.  With the BinaryReader it now returns a proper zip header.  Problem is, when I try to open it using WinRar or 7zip it comes up with "unexpected end of archive", but it DOES show a list of files.  Does anyone know of a good place to learn about the structure of a zip file?



  • Are you sure you are getting all of the binary data downloaded properly? Perhaps show us your code.

    Also, test your code with a simple "control" download if you haven't already, like downloading a binary file that you place on a webserver yourself (even localhost) and then comparing what you get with the original to be sure that the problem is not your own code.
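
    For the comparison step, a byte-by-byte check along these lines would do; this is only a sketch, and the BlobCompare class and FirstDifference method are names invented for the example.

```csharp
using System;

static class BlobCompare
{
    // Returns the offset of the first differing byte, or -1 when the arrays are identical.
    // If one array is a strict prefix of the other, returns the shorter length,
    // which is where the truncation begins.
    public static int FirstDifference(byte[] expected, byte[] actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i]) return i;
        }
        return expected.Length == actual.Length ? -1 : common;
    }
}
```

    A result equal to the length of the downloaded copy means the download stopped early rather than being corrupted in the middle.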

     



  • My code is:

    HttpWebRequest req = null;
    req = (HttpWebRequest)HttpWebRequest.Create(url);

    req.Method = "POST"; // Post method
    req.ContentType = "application/xml"; // content type
    req.Accept = "text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2";
    req.Headers.Add("Pragma", "no-cache");
    req.Headers.Add("Cache-Control", "no-cache");

    // create sender and send
    StreamWriter writer = new StreamWriter(req.GetRequestStream());
    writer.WriteLine(request);
    writer.Close();

    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

    BinaryReader br = new BinaryReader(resp.GetResponseStream());
    byte[] data = br.ReadBytes(1000000);

    File.WriteAllBytes("file1", data);
    br.Close();

    Console.ReadLine();

    The code works fine when trying to download a known zip file.



  • @steve.syfuhs said:

    I'm still working on it.  How would I determine how large of a byte array to create?  Instead of just reading in say a million bytes.

    You're better off reading in a fixed buffer of, say, 64K at a time and writing that out to a file.  Repeat until you reach the end of the stream, which you can tell when the read returns zero bytes.  (A read that comes back shorter than you requested is not enough on its own; a network stream can return partial reads before the end.)
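
    A minimal sketch of that loop in C#, using a 64K buffer and stopping only when Stream.Read returns 0, which is how a .NET stream signals its end. The StreamCopy class and CopyAll method are names made up for this example. It also avoids any fixed cap such as ReadBytes(1000000), which would cut off a larger response.

```csharp
using System;
using System.IO;

static class StreamCopy
{
    // Copies everything from source to destination in fixed-size chunks
    // and returns the total number of bytes copied.
    public static long CopyAll(Stream source, Stream destination)
    {
        byte[] buffer = new byte[64 * 1024];
        long total = 0;
        int read;
        // Read returns 0 only at the end of the stream; shorter reads are normal.
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            destination.Write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

    With the response in hand, this would be called as StreamCopy.CopyAll(resp.GetResponseStream(), file), where file is a FileStream opened for writing.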

    @steve.syfuhs said:

    Does anyone know of a good place to learn about the structure of a zip file?

    Google can help you here. PKWare publishes the .ZIP File Format Specification (APPNOTE.TXT).
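
    As a starting point, here is a rough C# sketch that looks for the record signatures the zip format uses: local file headers start with the bytes 50 4B 03 04, central directory entries with 50 4B 01 02, and the end-of-central-directory record with 50 4B 05 06. The ZipProbe class and its method names are invented for the example. It does not parse the archive, and the same byte patterns can turn up by chance inside compressed data, so treat the results as hints.

```csharp
using System;

static class ZipProbe
{
    // Counts occurrences of "PK" followed by the two given record-type bytes.
    public static int CountSignature(byte[] data, byte third, byte fourth)
    {
        int count = 0;
        for (int i = 0; i + 3 < data.Length; i++)
        {
            if (IsSignatureAt(data, i, third, fourth)) count++;
        }
        return count;
    }

    // Offset of the last end-of-central-directory signature, or -1 if none is found.
    // The record is at least 22 bytes long, so the search starts that far from the end.
    public static int FindEndOfCentralDirectory(byte[] data)
    {
        for (int i = data.Length - 22; i >= 0; i--)
        {
            if (IsSignatureAt(data, i, 0x05, 0x06)) return i;
        }
        return -1;
    }

    private static bool IsSignatureAt(byte[] data, int i, byte third, byte fourth)
    {
        return data[i] == 0x50 && data[i + 1] == 0x4B
            && data[i + 2] == third && data[i + 3] == fourth;
    }
}
```

    If CountSignature(data, 0x03, 0x04) finds file headers but FindEndOfCentralDirectory(data) returns -1, the tail of the archive is probably missing, which would be consistent with an "unexpected end of archive" error.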

    -cw



  • @steve.syfuhs said:

    What is the best way (or rather, is there a best way?) to determine the structure of a blob of data?  Basically I send an xml string to a web service, and it returns a blob.
     

    Man, you've got a really cool job. I always wanted to be Jeff Goldblum in Independence Day.

     

