LibreOffice and dead code



  • So at one point LibreOffice (a fork of OpenOffice) had over 5000 unused methods. Somehow this doesn't surprise me since it's open source software.

     



  • The comments on that article are particular telling: there are static analysis tools that can detect dead/unreachable code, and either they weren't used or code bloat didn't seem to be high on the list.

    I suppose it's admirable that they're making a drive to clean up the codebase, but IMHO things should have never reached that state. A small amount of neglect all accumulates to a huge task after a prolonged period of time.

    I've been involved in some projects where I've proposed a code cleanup, reformatting to established standards and creation of documentation (in addition to decent code commenting) prior to any changes but this idea has been rejected since "it will take too long". Never mind that without it, all maintenance and changes are drastically impeded.




  • Not unexpected in such an old piece of software. It wouldn't suprise me if simalar problems exist in other such projects like Windows, Linux or Office.

    In the comments it's noted that this doesn't apply to other engineering disciplines but I doubt that. Take for example an old office building, good chance you'll find lots of unused wiring in the buildings.



  • There is in ours - loads of phone cable running around the walls to phone sockets, and nobody really knows what's active and what could be yanked.

    (I think that'sbeen repeated over the years, so newer cable gets added then becomes inactive, add infinitum...)



  • Just because unused phone cable exists in your building doesn't meant it should. Nor should software contain dead code. If anything, fixing the software is easier than tearing down walls. There are plenty of analysis tools that make this even easier.

    Old software, like old buildings, needs refactoring phases. Those times when you go back and replace the knob and tube wiring, or get rid of the asbestos. Things that should be done despite the fact that it will cost time and money. Since ultimately you have a better codebase/building in the end.



  • @Soviut said:

    Just because unused phone cable exists in your building doesn't meant it should.

    Erm.. yeah.. that was my point (and dtech's).

    He stated that old office buildings probably contain a lot of dead wiring that could be safely stripped out. I confirmed that the situation he described reflects my current workplace, and - like many codebases - has come about as a result of new stuff being added without the old stuff being cleared away, amassing a large amount of unneeded cruft over time.

    I wasn't disagreeing with any of those points, just citing examples of them!



  • Yes, if it was closed source, it would have 50,000, but you wouln't know about it. Half of them would have glaring security issues, but only your competitors' crackers would know about them.

    Note that LibreOffice has only had control of that source for that last year, and they have been plenty busy with it.



  • @robbak said:

    Yes, if it was closed source, it would have 50,000, but you wouln't know about it. Half of them would have glaring security issues, but only your competitors' crackers would know about them.

    No, they wouldn't be security issues. If they were, it means the functions are still being used, and therefore are not dead code.

    It seems that over half of our unused code has now bitten the dust. Unfortunately as we remove more, more wastage tends to be revealed, which explains some of the upward jumps in the graph

    TRWTF is using a dead code analysis tool which doesn't deal with references inside of dead code in a sane manner. That tool will never deal with cycles.



  • @mott555 said:

    So at one point LibreOffice (a fork of OpenOffice) had over 5000 unused methods.

    I took the 5000 unused methods, put them in a new project, wrote up a quick and dirty Main() and it turned out it was a fully working Windows port of Super Mario World. True story.



  • Really? I tried that and got the latest internal dev build of Bioware's secret System Shock III. It runs on the Dragon Age/Mass Effect engine. SHODAN is romanceable. One of us is doing it wrong.



  • @mott555 said:

    So at one point LibreOffice (a fork of OpenOffice) had over 5000 unused methods. Somehow this doesn't surprise me since it's [b]former closed source software[/b].

    http://www.i-programmer.info/news/136-open-source/3607-drive-to-remove-unused-code-in-libreoffice.html

     

    FTFY.



  • @Cassidy said:

    I wasn't disagreeing with any of those points, just citing examples of them!

    Sorry. It sounded like you were saying that because old offices had unremoved cables that it was okay for software to have dead code. I guess my point was more that while it can be physically difficult to remove cables from the walls of an office, there's no excuse with software since the tools available make the process trivial if you don't let it get out of hand like LibreOffice did.



  • @Salamander said:

    @robbak said:

    Yes, if it was closed source, it would have 50,000, but you wouln't know about it. Half of them would have glaring security issues, but only your competitors' crackers would know about them.

    No, they wouldn't be security issues. If they were, it means the functions are still being used, and therefore are not dead code.

    Well, there is dead code. And then there is zombie code, that should by all rights be dead (because it performs no useful function) but still shambles on. You can get zombie code that isn't detected as dead, either because it manipulates data that is never used, or returns values that are never used. That code can still be run and contain security flaws.



  • @robbak said:

    Yes, if it was closed source, it would have 50,000, but you wouln't know about it.

    These have been piling up in OpenOffice for years! Being open source didn't make these issues any more visible. Large projects are difficult to maintain and being open or closed source makes no difference. The only reason this dead code is known is because someone finally got the idea to do an audit. Why wasn't this done early? Because it's boring and nobody usually wants to do that in their spare time. It's not the glory work, and open source pays in pats on the back.

    If anything, closed source is less likely to have these sorts of massive pile ups because there's financial incentive to keep them trim. At the very least, people are paid to do the drudgery. Do you think many people would volunteer to be garbage collectors and janitors in real life if they weren't being paid?



  • @Cassidy said:

    The comments on that article are particular telling: there are static analysis tools that can detect dead/unreachable code, and either they weren't used or code bloat didn't seem to be high on the list.

    But you can't believe the output of such tools. You have to prove that the code isn't used by reflection, and that's a lot of work in a big codebase.

    @Soviut said:

    @robbak said:
    Yes, if it was closed source, it would have 50,000, but you wouln't know about it.

    If anything, closed source is less likely to have these sorts of massive pile ups because there's financial incentive to keep them trim.

    Contrariwise, there's more emphasis on change management and avoiding the risk of removing code which might turn out to be used after all.


  • @Soviut said:

    If anything, closed source is less likely to have these sorts of massive pile ups because there's financial incentive to keep them trim.

    WTF are you talking about?

    Financial incentive to clean up dead code? You must be fucking kidding.

    This is almost always something that's only done after months of trying to convince management to allow time for the cleanup. Most of the time the priorities are adding new features and fixing critical bugs. The benefit of cleanup is far too vague for most managers to sign off on. Clean code isn't something you can monetise.



  • @_gaffer said:

    The benefit of cleanup is far too vague for most managers to sign off on. Clean code isn't something you can monetise.

    It can in some respects - as you've alluded to there, it requires building a strong business case showing that time invested in reformatting/restructing code to maintainable standards will reduce the time taken to implement changes (whether bug-fixes or requested changes) and therefore work out cheaper in the longer term, as well as ensuring compliance to external standards (eg: security compliance) that could distinguish the product form the competition.

    I thought Alex wrote an article on it, entitled "Software Investment" and the challenges of convincing management of the benefits of cruft cleanout. I agree that management see (quite rightly) very little returns for time/effort invested in the short-term and aren't likely to agree to it, but it makes sense. Hell, isn't that what's driven the LibreOffice clean-up projectin the first place?



  • Yes, Cassidy, all this makes sense. My issue is the assertion that closed source software is more likely to be cleaned up due to some financial incentive. That's just a huge pile of shit. Yes, a few companies will do this, but it's by no means the norm.

    LibreOffice is being cleaned up because the focus of development effort is chosen by developers, who are the ones generally fighting for prudent code cleanups and other such housekeeping. In this case there are no managers to stand around, fucking things up because they can't reduce the activity to a simple consideration of cost vs projected profit.

    Further, there is likely to be more incentive to clean up the codebase due to its visibility. Open source collaboration gives a strong motivation to keep a clean house since it makes it easier to attract other volunteers, and there's appreciation to be had from other community members for those who actually put in the work to make everyone else's development easier.



  • Cut some code you think is dead, build, and run your test suite. Passed? repeat.



  • @bgodot said:

    Cut some code you think is dead, build, and run your test suite. Passed? repeat.
    Fail in production because your test suite didn't find something? Priceless.



  • My feeling too - what I think is dead well may differ from what actually is.

    Hence my thoughts of using static analysis tools to identify potentially dead/unreachable code. I know they're not perfect, but they should provide a good starting-point of things to address first.



  • @pjt33 said:

    @Cassidy said:

    The comments on that article are particular telling: there are static analysis tools that can detect dead/unreachable code, and either they weren't used or code bloat didn't seem to be high on the list.

    But you can't believe the output of such tools. You have to prove that the code isn't used by reflection, and that's a lot of work in a big codebase.
    In languages without reflection the static analysis tools would work perfectly and the compiler would complain about unresolved references if you removed live code.

    But shouldn't you limit the use of reflection to very few special circumstances anyways? Those should not be too hard to go through manually.
    I've never used reflection myself and have never felt the need to, either.

     



  • I'm no expert, but.. how would one resolve the reflection issue?

    And if that was solvable by a human.. surely the same process could be automated by a static analysis tool?



  • @_gaffer said:

    LibreOffice is being cleaned up because the focus of development effort is chosen by developers, who are the ones generally fighting for prudent code cleanups and other such housekeeping. In this case there are no managers to stand around, fucking things up because they can't reduce the activity to a simple consideration of cost vs projected profit.

    Yah... in some ways, the developers are the managers - they've made a collective decision to proceed with a cleanup, and have decided that with the amount of volunteer effort at their disposal, the cleanup is a more productive "expenditure". I'd like to think that somewhere there is a manager or two (project manager, team leader, etc) heading up the effort and overseeing the entire process, steering effort in the right direction. It's rare that a community of leaderless peers organise themselves to deliver productive results; many open-source projects (or voluntary projects, etc) experience failure through loss of confidence in project viability, often as a result of weak (or no) leadership and lack of clear vision/direction.

    Of course, many fail due to the reverse (ego clashes, dictatorial leadership)... but that's another discussion.

    Be interesting to compare the before/after results of this cleanup, in terms of software stability and responsiveness.



  • @Cassidy said:

    I'm no expert, but.. how would one resolve the reflection issue?

    And if that was solvable by a human.. surely the same process could be automated by a static analysis tool?

    Fire the idiots on your team and remove their code from the codebase.  This can be accomplished by something like:

    If(checkin.Contains("Reflection"))

       SendTerminationNotice(checkin.Author);



  • @Cassidy said:

    I'm no expert, but.. how would one resolve the reflection issue?

    And if that was solvable by a human.. surely the same process could be automated by a static analysis tool?

    If a human can hold a meaningful conversation, surely a computer could as well?

    More seriously, dynamic analysis would be the tool for this.



  • @Sutherlands said:

    Fire the idiots on your team and remove their code from the codebase.  This can be accomplished by something like:

    If(checkin.Contains("Reflection"))

       Terminate(checkin.Author);

    FTFY, considering some developers I've met...

    @heterodox said:

    If a human can hold a meaningful conversation, surely a computer could as well?

    More seriously, dynamic analysis would be the tool for this.

    I get your point about not everything is within the realm of being computerised, but I just felt that there had to be tools (static or dynamic) that could assist in identifying dead code. Hell, I even approach activities in a project plan's WBS as though they're a basic flow of a Use Case description, but that's the way my mind works.



  • @topspin said:

    But shouldn't you limit the use of reflection to very few special circumstances anyways? Those should not be too hard to go through manually.

    The typical example is plugins, and I wouldn't be surprised to learn that a project like LibreOffice was built around a plugin architecture.

    @Cassidy said:

    I'm no expert, but.. how would one resolve the reflection issue?

    And if that was solvable by a human.. surely the same process could be automated by a static analysis tool?

    1. Find where reflection is used.
    2. Work backwards to find out where it gets its data from.
    3. If necessary, reverse engineer the build system to work out how things get into the directories it scans.

    And while some parts can be automated, in general the problem is bound to be uncomputable.



  • @pjt33 said:

    @topspin said:

    But shouldn't you limit the use of reflection to very few special circumstances anyways? Those should not be too hard to go through manually.

    The typical example is plugins, and I wouldn't be surprised to learn that a project like LibreOffice was built around a plugin architecture.

    Sounds like a shitty design to me. Give the plugins a stable API and be done with it, like you would do with every language not supporting reflection.
    Not disagreeing with you, just saying.

     



  • @topspin said:

    Sounds like a shitty design to me. Give the plugins a stable API and be done with it, like you would do with every language not supporting reflection.
    Not disagreeing with you, just saying.


    You've got it back to front. The reflection is used to load the plugins.



  • But then you're not calling any code in your own code base via reflection that might otherwise be marked dead, are you?



  • @topspin said:

    But then you're not calling any code in your own code base via reflection that might otherwise be marked dead, are you?

    Code which is called from plugins but not from the core would be marked dead.



  • @pjt33 said:

    Code which is called from plugins but not from the core would be marked dead.
     

    Presumably the folks doing the analysis would think to treat their API functions (and their call trees) differently than internal functions.



  • @too_many_usernames said:

    @pjt33 said:
    Code which is called from plugins but not from the core would be marked dead.

    Presumably the folks doing the analysis would think to treat their API functions (and their call trees) differently than internal functions.

    You mean everything in a C++ header file? I've never worked with any sort of plugin, and using stuff not officially marked as part of a plugin API (assuming such a thing exists) is dumb, but couldn't a plugin theoretically call a lot of otherwise dead code?



  •  @boomzilla said:

    @too_many_usernames said:
    @pjt33 said:
    Code which is called from plugins but not from the core would be marked dead.

    Presumably the folks doing the analysis would think to treat their API functions (and their call trees) differently than internal functions.

    You mean everything in a C++ header file? I've never worked with any sort of plugin, and using stuff not officially marked as part of a plugin API (assuming such a thing exists) is dumb, but couldn't a plugin theoretically call a lot of otherwise dead code?

    Yes, yes it can.  Just look at the amount of back compat internal functions and other crap which Windows has to preserve 



  • @boomzilla said:

    You mean everything in a C++ header file? I've never worked with any sort of plugin, and using stuff not officially marked as part of a plugin API (assuming such a thing exists) is dumb, but couldn't a plugin theoretically call a lot of otherwise dead code?

    Which language are we talking about here anyways, btw?

    Speaking about reflection I was thinking Java. And then everything public should be considered part of the API. Code that uses private classes or methods via reflection deserves to be broken by a version update.
    I have no idea how you would do reflection in C++. Is that possibe at all?



  • @topspin said:

    @boomzilla said:
    You mean everything in a C++ header file?

    Which language are we talking about here anyways, btw?

    Speaking about reflection I was thinking Java. And then everything public should be considered part of the API. Code that uses private classes or methods via reflection deserves to be broken by a version update.
    I have no idea how you would do reflection in C++. Is that possibe at all?

    Well, I believe that Libreoffice is really written in C++, and that it's possible to write extensions that way. I'd imagine that a java extension would be more limited in how far it could reach into the guts, which is why I brought C++ into the conversation. And having an extension writer with access to all of the header files gives him a relatively easy end run around whatever sort of SDK they expose for extensions. Again, this would be risky and fragile, but I could imagine an extension analogous to a lot of the VBA WTFs that you get around here.

    However, no, I don't think there's a native way to do reflection (short of going really low level, which probably wouldn't count as reflection any more). I suppose you might have some sort of RTTI (run time type information) framework that could enable it. But I suspect that could mess with your static analysis stuff, making stuff that isn't really used look like it's used. Maybe.



  • RTTI is afaik nowhere near as powerful as reflection. Mostly it is only about types, not methods/functions. And you cannot do anything "by name" like in Java or dynamic languages with eval(). Even with some quite intrusive things like Qt's meta type system you don't get as powerful as reflection.

    Whatever, they probably do have their WTFy reasons.


Log in to reply
 

Looks like your connection to What the Daily WTF? was lost, please wait while we try to reconnect.