C# PDF library



  • I just had a question about people's thoughts on rendering PDF docs in C#.  We're dynamically generating reports for our client from a db into PDF docs.  At the moment we're using SharpPdfLib; it's OK but has a pretty limited set of features.  The script started off nice and neat, but after months of change requests and a few variations of reports hacked in (i.e. some types of reports have extra coversheet information etc.) it has devolved into a several-thousand-line piece of crap.  The client now wants newer, more complex reports.

    We're going to do a rewrite (the money saved through easier maintenance in the future now more than justifies the cost).  I was thinking of rewriting it to be a bit more OO (and dare I say enterprisey).  My general design would be a base class that generates the parts common to all reports, subclassed for the different types of reports, which would do the required db lookups and add in the extra rows etc.  The problem is that SharpPdf is pretty limited: it can't easily insert rows in the middle of tables, has no auto pagination, etc.

    I've had a look around at some other PDF libraries and some look OK, but I just wanted to see what the general consensus is.  So I guess my question is in two parts.

    1.  Is there a .NET PDF API out there that you would recommend?

    2.  Is the OO approach outlined above a good idea, or would I be better off essentially writing a script again, but more broken up and better thought out?
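
    For what it's worth, the base-class idea above could be sketched roughly like this. Every name here (ReportBase, SalesReport, AuditReport, Build) is made up for illustration, the db lookups are stubbed with fixed data, and the "PDF" is just a list of strings standing in for whatever library calls end up being used:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: none of these types belong to SharpPdf or any real PDF library.
public abstract class ReportBase
{
    // Template method: fixes the order of the parts shared by every report.
    public IReadOnlyList<string> Build()
    {
        var lines = new List<string>();
        lines.Add("HEADER: " + Title);
        AddCoverSheet(lines);            // optional hook, does nothing by default
        foreach (var row in LoadRows())  // each subclass supplies its own rows
            lines.Add("ROW: " + row);
        lines.Add("FOOTER");
        return lines;
    }

    protected abstract string Title { get; }

    // A real subclass would do its db lookup here.
    protected abstract IEnumerable<string> LoadRows();

    // Override only in report types that need extra coversheet information.
    protected virtual void AddCoverSheet(List<string> lines) { }
}

public sealed class SalesReport : ReportBase
{
    protected override string Title => "Sales";
    protected override IEnumerable<string> LoadRows() => new[] { "North 120", "South 95" };
}

public sealed class AuditReport : ReportBase
{
    protected override string Title => "Audit";
    protected override IEnumerable<string> LoadRows() => new[] { "Ledger checked" };
    protected override void AddCoverSheet(List<string> lines) => lines.Add("COVER: Confidential");
}
```

    Calling new AuditReport().Build() walks the same steps as SalesReport but slips the cover line in after the header, so a variation lives in one small subclass instead of an if-branch in the main script.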

     



  • I think the OO approach isn't that bad of an idea, although I guess it really depends on whether it's going to be needed.
    What you probably should do, however, is separate creating the PDF from retrieving the data. That way it will be much easier to reuse certain often-used layouts, and with a bit of designing it will prevent situations where you have to "insert rows in the middle of tables easily".

    That's because you really want all your data first, before you start building your PDF.

    But you don't really need OO to do that of course; it would work just as well with a set of functions. So whatever floats your boat.
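
    A minimal sketch of that split, using plain static functions rather than a class hierarchy. The names (ReportRow, ReportPipeline, FetchBaseRows and so on) are invented for the example, the fetch functions return fixed data where a real version would query the db, and Render produces strings where a real version would call the PDF library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type: Order decides where the row ends up in the table.
public sealed class ReportRow
{
    public ReportRow(int order, string text) { Order = order; Text = text; }
    public int Order { get; }
    public string Text { get; }
}

public static class ReportPipeline
{
    // Step 1: data only. Stubbed here; a real version would hit the database.
    public static List<ReportRow> FetchBaseRows() =>
        new List<ReportRow> { new ReportRow(10, "Widgets"), new ReportRow(30, "Gadgets") };

    public static List<ReportRow> FetchExtraRows() =>
        new List<ReportRow> { new ReportRow(20, "Widget subtotal") };

    // Step 2: assemble the complete, ordered table in memory.
    public static List<ReportRow> BuildTable(IEnumerable<ReportRow> baseRows,
                                             IEnumerable<ReportRow> extraRows) =>
        baseRows.Concat(extraRows).OrderBy(r => r.Order).ToList();

    // Step 3: rendering only ever appends rows, top to bottom.
    public static List<string> Render(IEnumerable<ReportRow> table) =>
        table.Select(r => "| " + r.Text + " |").ToList();
}
```

    Because the extra row is merged in at step 2, the renderer never has to insert anything into the middle of a table that is already on the page.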

    Also, don't forget that you can cheat in lots of ways inside a PDF. If your client wants some sort of weird-looking table with rounded borders or I don't know what, you could fake it by placing images behind the table in question. It's more maintenance work afterwards if they want to change the look of the table again, because you will have to adjust the image every time, but at least it's possible.



  • I've been using PDFlib for this sort of thing on behalf of a client. It's pretty expensive, but has language bindings for most of the 'big' languages, including .NET support. With the PPS version of it, you can trivially slurp pages from various source PDFs and merge them into a single output, as well as use what PDFlib calls "blocks" to define areas on a template that can be programmatically filled with arbitrary data.

    Quite powerful in general. If you want to compare it in terms of programming languages, I'd say PDFlib is C: "all the power of assembly language, with all the clarity of assembly language". It doesn't completely shield you from dealing with PostScript, but for most stuff, the closest you'd get to having to code directly in PostScript is the options passed in to the various calls (font/color/size/positioning overrides, etc.).


     



  • If all you are doing is exporting a report to PDF, why don't you investigate third-party reporting libraries?  Most of them support PDF export.  I know Crystal Reports and DevExpress.Net do this, so I'd imagine tools from Infragistics, ActiveReports, ComponentOne etc. would do it as well.

    If you need more generic capabilities, then a PDF generation library might make sense. 

     



  • Typesetting documents is a hard problem, and most libraries I have seen don't do a particularly good job of it. I usually fall back on the old reliable method of generating LaTeX source and letting LaTeX do all the real work. The flexibility and output quality of everything else is just laughably poor.



  • @asuffield said:

    Typesetting documents is a hard problem, and most libraries I have seen don't do a particularly good job of it. I usually fall back on the old reliable method of generating LaTeX source and letting LaTeX do all the real work. The flexibility and output quality of everything else is just laughably poor.


    While you're certainly correct in saying that the typesetting in most (all?) PDF libraries is piss poor, I would certainly say that doing layout design in LaTeX is pretty hard. AFAIK, LaTeX uses flows to position everything; this sometimes leads to unexpected results, such as graphs slipping to the next page when the text above them is enlarged. Pinning the graph to the correct page can be cumbersome, and I doubt you could easily automate it and solve all the corner cases.

    Of course automating PDF generation (which I do a lot) also suffers from this, but in my experience solving the problems is mostly a bit easier.

    Also, I simply assumed from the first post that these were cookie-cutter reports that will be read by management and then thrown away. So I doubt typesetting correctness will really matter that much. And even besides that, although it's a bitch to do, if push comes to shove it's of course always possible to hand-tune the spacing a bit, and manually supply the word breaks and correct ligatures where appropriate. This won't give you the extremely nice justified blocks of text that LaTeX can give you, but it can give you a decent enough look for business circulation.

    P.S.
    For the people who haven't got the faintest clue about typesetting, the following link gives a nice view into what asuffield is talking about.
    http://oestrem.com/thingstwice/?p=65

    Most PDF generation libraries will do better than OOo and Word, but in comparison, much worse than LaTeX.
     



  • @stratos said:

    AFAIK, LaTeX uses flows to position everything, this leads to some unexpected results sometimes when graphs seem to slip to a next page when the text above it is enlarged.

    "Flows" is awfully vague, but TeX does all its positioning with boxes and glue: every letter is a box of some given size and shape, and larger boxes are assembled by gluing together the smaller ones based on the available space (and generating empty boxes of given sizes where necessary), until it has a set of boxes that are the same size and shape as your paper. Unexpected results are usually caused by a flawed understanding of this process.

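    To make the boxes-and-glue idea a bit more concrete, here is a heavily simplified sketch of setting a single line: boxes have fixed widths, each piece of glue has a natural width plus some stretch and shrink, and one common ratio decides how far every glue is stretched or shrunk to hit the target width. The Glue and LineSetter names are invented for this example; it is not TeX's actual code, and it ignores infinite glue, badness, penalties and TeX's limit on how far glue may shrink.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type for the example: a flexible space between boxes.
public sealed class Glue
{
    public Glue(double natural, double stretch, double shrink)
    {
        Natural = natural; Stretch = stretch; Shrink = shrink;
    }
    public double Natural { get; }
    public double Stretch { get; }
    public double Shrink { get; }
}

public static class LineSetter
{
    // Returns the final width of each glue item. When the line has any
    // stretch (or shrink) available, boxes plus glue add up to targetWidth;
    // otherwise the glue keeps its natural width and the line stays
    // underfull or overfull.
    public static double[] SetGlue(IReadOnlyList<double> boxWidths,
                                   IReadOnlyList<Glue> glue,
                                   double targetWidth)
    {
        double naturalWidth = boxWidths.Sum() + glue.Sum(g => g.Natural);
        double excess = targetWidth - naturalWidth;
        bool stretching = excess >= 0;

        // Total flexibility in the direction we need to move.
        double capacity = stretching ? glue.Sum(g => g.Stretch) : glue.Sum(g => g.Shrink);
        double ratio = capacity > 0 ? excess / capacity : 0;

        // Every glue moves by the same ratio of its own flexibility.
        return glue
            .Select(g => g.Natural + ratio * (stretching ? g.Stretch : g.Shrink))
            .ToArray();
    }
}
```

    The same idea is applied vertically when lines are stacked into a page, which is why enlarging text above a figure can push the figure onto the next page: the glue on that page has run out of shrink.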


  • I don't know about the OO part; I've never been able to produce printing / PDF'ing code that didn't suck. It always seems to become spaghetti :)
    As for the library itself, I'd recommend iText# (http://sourceforge.net/projects/itextsharp/). It's a really nice port of a stable Java PDF generator. I've been happy with it so far. It's a bit like LaTeX/HTML, content-oriented: it will do pagination and lines for you, but it might be tricky to force a strange layout that would continue the text flow. Absolute positioning of specific parts is easy though.

