StringBuilder? We've heard of that.



  • Digging more into this code, the SqlQuote function (which takes a string and replaces all single quotes with doubled single quotes) was called in some of the places where they build XML... because you need to do that for some reason.

    All the XML is built via string concatenation with a StringBuilder. Which, given how painful the old XmlDocument method of building XML was before .NET 3.5, I can almost understand. But they kind of missed the point of a StringBuilder:

    StringBuilder _stringBuilder = null;
    
    _stringBuilder = new StringBuilder();
    
    _stringBuilder.Append("<StayDayGroups>");
    foreach (ProspectPayorGroup _prospectPayorGroup in _ListProspectPayorGroup)
    {
    	_stringBuilder.Append("<StayDayGroup");
    	_stringBuilder.Append(" GroupNumber=\"" + _prospectPayorGroup.GroupNumber.ToString() + "\"");
    	_stringBuilder.Append(" StartDay=\"" + _prospectPayorGroup.StartDay.ToString() + "\"");
    	_stringBuilder.Append(" EndDay=\"" + _prospectPayorGroup.EndDay.ToString() + "\"");
    	_stringBuilder.Append(" />");
    }
    _stringBuilder.Append("</StayDayGroups>");
    
    return _stringBuilder.ToString();
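
    For comparison, a sketch of the same output actually leaning on the builder, with no throwaway "+" concatenations (the ProspectPayorGroup class here is a minimal stand-in for the real one):

    ```csharp
    using System.Collections.Generic;
    using System.Text;

    // Minimal stand-in for the real ProspectPayorGroup class (assumption).
    class ProspectPayorGroup
    {
        public int GroupNumber, StartDay, EndDay;
    }

    static class StayDayXml
    {
        // Same output as the original, but AppendFormat writes the values
        // straight into the builder instead of first building intermediate
        // concatenated strings and then copying them in.
        public static string Build(IEnumerable<ProspectPayorGroup> groups)
        {
            var sb = new StringBuilder();
            sb.Append("<StayDayGroups>");
            foreach (var g in groups)
            {
                sb.AppendFormat(
                    "<StayDayGroup GroupNumber=\"{0}\" StartDay=\"{1}\" EndDay=\"{2}\" />",
                    g.GroupNumber, g.StartDay, g.EndDay);
            }
            sb.Append("</StayDayGroups>");
            return sb.ToString();
        }
    }
    ```

    Of course this still emits attribute values with no escaping at all, which is the bigger problem with hand-built XML in any form.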
    

    I'm surprised no one has put a double quote into any of the fields that get put out to XML. I should also note that, with the exception of a few elements or attributes not emitted depending on some condition, the schema is always the same for a particular block. Which is then passed along to a stored procedure.

    And in the process of writing this, I have just discovered that the XML documents are not stored in the database like I initially assumed. It's worse: The stored procs use XPATH queries to pull data out of the blocks of XML, and then insert that into a table. I think I know what my next bit of work will be once I finish removing all the places where they make SQL statements by string formatting...


  • Considered Harmful

    I don't care what the scenario is, if you're manipulating string fragments of serialized XML, you're doing things completely wrong.



  • At my old job there was an application written that compared log entries from one file to log entries in another file.  The way it did this was:

    foreach (line in file1)
        foreach (line2 in file2)
            Parse line to object
            Parse line2 to object
            Compare and store results

    I moved the parsing out of the loop and then compared the objects.  Made it run in seconds instead of hours.
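
    The fix described above, sketched out (ParseLine is a hypothetical stand-in for the real parser; since order didn't matter, a set comparison works):

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class LogCompare
    {
        // Stand-in parser: the real app parsed each log line into an object.
        static string ParseLine(string line) => line.Trim().ToUpperInvariant();

        // Parse each file exactly once (O(n + m) parses), then compare the
        // parsed sets -- instead of re-parsing both lines inside an
        // O(n * m) nested loop as the original did.
        public static (List<string> onlyInFirst, List<string> onlyInSecond)
            Compare(IEnumerable<string> file1, IEnumerable<string> file2)
        {
            var set1 = new HashSet<string>(file1.Select(ParseLine));
            var set2 = new HashSet<string>(file2.Select(ParseLine));
            return (set1.Except(set2).ToList(), set2.Except(set1).ToList());
        }
    }
    ```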



  • XmlWriter (available since at least .NET 2.0) would have been the method of choice here (and looking even fairly similar). Or if you want to use XmlDocument, use Load to load a skeleton and then only fill in the variable parts using either XPath or normal DOM manipulation.
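
    For the StayDayGroups snippet earlier in the thread, that suggestion might look roughly like this sketch (type and method names here are made up, not from the app):

    ```csharp
    using System.Text;
    using System.Xml;

    static class StayDayXmlWriter
    {
        // Same document via XmlWriter, which escapes attribute values
        // (quotes, angle brackets, ampersands) automatically -- the thing
        // the string-concatenation version silently gets wrong.
        public static string Build((int Group, int Start, int End)[] groups)
        {
            var sb = new StringBuilder();
            var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
            using (var w = XmlWriter.Create(sb, settings))
            {
                w.WriteStartElement("StayDayGroups");
                foreach (var g in groups)
                {
                    w.WriteStartElement("StayDayGroup");
                    w.WriteAttributeString("GroupNumber", g.Group.ToString());
                    w.WriteAttributeString("StartDay", g.Start.ToString());
                    w.WriteAttributeString("EndDay", g.End.ToString());
                    w.WriteEndElement();
                }
                w.WriteEndElement();
            }
            return sb.ToString();
        }
    }
    ```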



  • @bardofspoons42 said:

    And in the process of writing this, I have just discovered that the XML documents are not stored in the database like I initially assumed. It's worse: The stored procs use XPATH queries to pull data out of the blocks of XML, and then insert that into a table. I think I know what my next bit of work will be once I finish removing all the places where they make SQL statements by string formatting...
    For SQL 2005 and SQL 2000, this is actually a very efficient way to insert a large amount of data.  SQL 2008 and later have table-valued parameters which are cleaner and a bit faster.
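
    A sketch of the staging side of the table-valued approach (column and type names are hypothetical): the rows the XML blob carries go into a DataTable, which on SQL 2008+ can be sent as a single SqlParameter with SqlDbType.Structured against a user-defined table type, instead of an XML string the proc has to shred with XPath.

    ```csharp
    using System.Data;

    static class StayDayTvp
    {
        // Stage the rows in a DataTable whose shape matches a hypothetical
        // user-defined table type on the server. The DataTable itself is
        // then the value of a Structured parameter on the SqlCommand.
        public static DataTable Build((int Group, int Start, int End)[] groups)
        {
            var t = new DataTable("StayDayGroup");
            t.Columns.Add("GroupNumber", typeof(int));
            t.Columns.Add("StartDay", typeof(int));
            t.Columns.Add("EndDay", typeof(int));
            foreach (var g in groups)
                t.Rows.Add(g.Group, g.Start, g.End);
            return t;
        }
    }
    ```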



  • @mihi said:

    XmlWriter (available since at least .NET 2.0) would have been the method of choice here (and looking even fairly similar). Or if you want to use XmlDocument, use Load to load a skeleton and then only fill in the variable parts using either XPath or normal DOM manipulation.

    You would think, wouldn't you? I've seen all sorts of stuff in here for which there were much better ways long before it was written. All SQL queries in this app are stored procs. And they apparently didn't know about parameters or setting the CommandType property on the SqlCommand. All SQL in this app is in the format of: EXEC procName @Param1={0}, @Param2={1}. That is then run through string.format and then the SqlCommand is run. I'd never seen this particular pattern before this app.

    @Jaime said:

    For SQL 2005 and SQL 2000, this is actually a very efficient way to insert a large amount of data.  SQL 2008 and later have table-valued parameters which are cleaner and a bit faster.

    Looking at the rest of this code, I don't think performance was at the forefront of their minds (see above), and the data sets we're talking about here may be tens of records. I have a feeling that transactions scared them, all the more so if you had to have more than one connection involved...



  • @bardofspoons42 said:

    All SQL queries in this app are stored procs. And they apparently didn't know about parameters or setting the CommandType property on the SqlCommand. All SQL in this app is in the format of: EXEC procName @Param1={0}, @Param2={1}. That is then run through string.format and then the SqlCommand is run.
    In this thread, a few people expressed doubt that anyone would ever be stupid enough to do this.  You have a real-world example of a data access strategy that uses stored procedures yet is still vulnerable to SQL Injection.
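
    The hole is easy to demonstrate with strings alone, no database needed (SqlQuote here is a stand-in for the app's quote-doubling helper, and the format strings mirror the pattern described above):

    ```csharp
    using System;

    static class InjectionDemo
    {
        // Stand-in for the app's helper that doubles single quotes.
        public static string SqlQuote(string s) => s.Replace("'", "''");

        // What the app would need at minimum: a quoted, quote-doubled value.
        public static string BuildCall(string param1)
            => string.Format("EXEC procName @Param1='{0}'", SqlQuote(param1));

        // The pattern actually described in the thread: no quotes around
        // the placeholder, so the input is pasted into the statement raw
        // and quote-doubling never even gets a chance to matter.
        public static string BuildCallUnquoted(string param1)
            => string.Format("EXEC procName @Param1={0}", param1);
    }
    ```

    A parameterized SqlCommand with CommandType.StoredProcedure sidesteps all of this, since values never travel inside the statement text at all.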



  • @Jaime said:

    @bardofspoons42 said:

    All SQL queries in this app are stored procs. And they apparently didn't know about parameters or setting the CommandType property on the SqlCommand. All SQL in this app is in the format of: EXEC procName @Param1={0}, @Param2={1}. That is then run through string.format and then the SqlCommand is run.
    In this thread, a few people expressed doubt that anyone would ever be stupid enough to do this.  You have a real-world example of a data access strategy that uses stored procedures yet is still vulnerable to SQL Injection.

    Most of the time, they do remember to call the function that replaces single quotes with doubled single quotes. At least on the .Net side; I don't want to look at the VB6 side. Though there are so many procs and UDFs that take data, build up statements on their own, and then EXEC them, that I wouldn't be surprised if something could be found.



  • Concatenating strings is a beginner mistake, agreed. However, using a StringBuilder is only slightly better.

    Have they heard of LINQ to XML? Here is the exact same code using that:

    http://pastebin.com/qkd3GQK5
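
    For readers who can't reach the pastebin, a LINQ to XML sketch of the earlier StayDayGroups snippet (not necessarily identical to the linked code) would look roughly like:

    ```csharp
    using System.Linq;
    using System.Xml.Linq;

    static class StayDayLinq
    {
        // The whole document is one expression; XElement/XAttribute
        // handle escaping of values automatically.
        public static string Build((int Group, int Start, int End)[] groups)
        {
            var doc = new XElement("StayDayGroups",
                groups.Select(g => new XElement("StayDayGroup",
                    new XAttribute("GroupNumber", g.Group),
                    new XAttribute("StartDay", g.Start),
                    new XAttribute("EndDay", g.End))));
            return doc.ToString(SaveOptions.DisableFormatting);
        }
    }
    ```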



  • @Sutherlands said:

    At my old job there was an application written that compared log entries from one file to log entries in another file. 

    Someone rewrote the diff command?



  • @Cassidy said:

    @Sutherlands said:

    At my old job there was an application written that compared log entries from one file to log entries in another file. 

    Someone rewrote the diff command?

    Yes, but not in my example, and I don't know who.

    Obviously it didn't just highlight differences.  And order didn't matter.



  • For bulk loading XML into SQL Server there is the built-in SQLXMLBulkLoad component/utility (available since at least SQL Server 2000). It is tremendously fast, munching through 300 MB XML files in no time at all.



  • @Gyske said:

    For bulk loading xml into SQL Server there is the built-in SQLXMLBulkLoad component/utility

    Do you need to create the table schemas first, or does it draw that info from the XML (or from an XSD)...?



  • @Cassidy said:

    @Gyske said:

    For bulk loading xml into SQL Server there is the built-in SQLXMLBulkLoad component/utility

    Do you need to create the table schemas first, or does it draw that info from the XML (or from an XSD)...?

    ditto ... tell me more ... I'm trying to do this with Oracle 10g ... supposedly Oracle has tremendously powerful abilities to work with XML and Databases ... but I can't effing figure it out.



  • @Gyske said:

    For bulk loading XML into SQL Server there is the built-in SQLXMLBulkLoad component/utility (available since at least SQL Server 2000). It is tremendously fast, munching through 300 MB XML files in no time at all.
    Both SQLXMLBulkLoad and bcp are really fast and should be the tools of choice for infrequent large data inserts.  However, for more mundane uses within an application, I prefer to isolate the app from the database schema by using a procedure with a table-valued parameter or an XML parameter.  This allows me to evolve the schema without affecting every layer of the system.

    In other words, I rarely see a choice between the two as they fill very different needs.  Occasionally, I use a bulk load type technology in an app when I absolutely need the performance, but I feel like I made a deal with the devil when I do.

    For example, if I get a feed of data from a customer every weekend and need to update my system, I would avoid both bcp and SQLXMLBulkLoad in the solution.  But, if I needed to load data dumped from an old system into its replacement at go-live, bcp and SQLXMLBulkLoad would be my preference.  A coworker of mine recently got bitten by using bcp for a feed and then finding out after deployment that his changes weren't detected by merge replication and the servers were out of sync.

