Development tool brainstorming



  • To paraphrase Bill Cosby, I posted that last thread so I could bring up this one.

    While semi-listening to that podcast, it occurred to me that balancing the need to refactor against the need to hold off until you have a reasonable 'sample size', of both user responses and candidates for common-code refactoring, is something that might benefit from a new tool, or rather, two of them. The second would mainly be an adjunct to the first, but each could be useful independently.

    The first is a specialized tool for collating, viewing, comparing, and annotating possible candidates for alpha-equivalence refactoring. It would let you mark a function, class, or even just a section of code as a possible match to some other code, build up a database of perhaps 5-15 refactoring candidates at a go, and visually compare them in parallel sub-windows, with annotations on their similarities and differences, and even highlighted blocks and/or relationship arrows between the similar code sections. It could possibly give you heuristic recommendations based on the degree of similarity and the number of candidates as well, though that would be more of a wish-list thing, assuming it could be done at all.

    The second would be a codewalker that could grind through a section (or the entirety) of the code base looking for patterns of code that might make refactoring candidates. While it would have only one conceptually simple function compared to the first one, that function would be immensely intricate, and would probably require the codewalker to analyze both the syntax and the semantics - in effect, it would have to compile the entire program. Assuming it is even feasible, this clearly would not be a daily or even weekly procedure for a project of any real size, but if done every sprint, say, it could give additional leverage on refactoring, especially when combined with the first tool.

    I am considering the pitfalls and advantages of this myself, but I wanted to get the idea out to you to see what you thought, and also to see if any of you here had any ideas for dev tools that are, well, perhaps a bit less fanciful but possibly still useful.


  • area_pol

    @ScholRLEA said in Development tool brainstorming:

    visually compare them in parallel sub-windows, with annotations on their similarities and differences, and even highlight-blocking and/or relationship arrows between the similar code sections

    Sounds like a nice idea to be able to have more info/discussion attached to the code than just traditional comments which are hard to browse or visualize.
    Useful to explain your proposal of architectural change to a bigger team and enable discussion.
    On the other hand, who would have time to make all those comparisons and annotations when they need to get things done?

    @ScholRLEA said in Development tool brainstorming:

    that function would be immensely intricate, and would probably require the codewalker to analyze both the syntax and the semantics

    It could try to compare the patterns - for example similarity of the syntax trees (including types of vars).
    Or even try some machine learning methods - like recurrent neural nets which can take sequences as inputs. ML needs data to train though.
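    As a rough illustration of the syntax-tree comparison idea, here is a minimal sketch in Python. It assumes the code being analyzed is itself Python (so the standard `ast` module can parse it) and that comparing flattened sequences of node type names is a good enough proxy for tree similarity. The function names are made up for the example, and variable types are not considered because Python's AST does not carry them.

```python
import ast
from difflib import SequenceMatcher


def node_types(source: str) -> list[str]:
    """Flatten a snippet's syntax tree into a sequence of node type names."""
    tree = ast.parse(source)
    return [type(node).__name__ for node in ast.walk(tree)]


def tree_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1] comparing the two node-type sequences."""
    return SequenceMatcher(None, node_types(a), node_types(b)).ratio()


if __name__ == "__main__":
    first = "def f(x):\n    return x + 1\n"
    second = "def g(y):\n    return y + 1\n"
    third = "for i in range(3):\n    print(i)\n"
    print(tree_similarity(first, second))  # same shape, different names
    print(tree_similarity(first, third))   # different shape
```

    Because identifiers are not part of the sequence, two snippets that differ only in naming score as identical; a real tool would probably want a proper tree-edit distance rather than this flattened comparison.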


  • Discourse touched me in a no-no place

    @ScholRLEA said in Development tool brainstorming:

    The second would be a codewalker that could grind through a section (or the entirety) of the code base looking for patterns of code that might make refactoring candidates.

    Having done a bunch of refactoring from time to time, you should bear in mind that you can also end up with sections of function that are the same except for some variables and substrings (e.g., parts of error message). The structural similarity finder should be aware of these sorts of things. OTOH, you don't need an optimised build in order to do this sort of thing; it's probably best done at the AST stage. The step to the AST is typically pretty cheap.
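    One way to picture "the same except for some variables and substrings" at the AST stage is to normalize the tree before comparing. The sketch below is only an illustration under some assumptions: the target language is Python, the standard `ast` module does the parsing, and the names `Normalizer` and `fingerprint` are invented for the example. It renames identifiers to positional placeholders and blanks out string literals, so two fragments that differ only in those details produce the same dump.

```python
import ast


class Normalizer(ast.NodeTransformer):
    """Rename identifiers to positional placeholders and blank out strings."""

    def __init__(self):
        self.names = {}

    def _placeholder(self, name):
        # The first distinct name seen becomes v0, the next v1, and so on.
        return self.names.setdefault(name, f"v{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = "_fn"
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._placeholder(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._placeholder(node.id)
        return node

    def visit_Constant(self, node):
        if isinstance(node.value, str):
            node.value = "_str"
        return node


def fingerprint(source: str) -> str:
    """Return a dump of the normalized tree; equal dumps mean equal structure."""
    return ast.dump(Normalizer().visit(ast.parse(source)))


if __name__ == "__main__":
    a = 'def check_age(age):\n    if age < 0:\n        raise ValueError("age must be positive")\n    return age\n'
    b = 'def check_size(size):\n    if size < 0:\n        raise ValueError("size must be positive")\n    return size\n'
    print(fingerprint(a) == fingerprint(b))
```

    This only catches exact structural matches after normalization; attribute names, numeric constants, and statement order are left alone, so it is a starting point rather than a full similarity finder.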

    I'm guessing that you can probably try computing the sensible fragments of a piece of code (probably within some sort of size bounds — you want a lower bound as well as an upper so that you don't get overwhelmed with stupid trivialities) and then mapping those to some sort of descriptor that you can do the similarity matching over. Maybe some of the algorithms used in the searching of DNA and certain types of machine vision would then become useful?
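    A minimal sketch of the fragment-and-descriptor idea, again assuming Python source and the standard `ast` module. The function names, the choice of statements as the "sensible fragments", and the default size bounds are all assumptions made for the example. The descriptor here is just a hash of the nested node type names, and fragments are grouped by exact descriptor match, which is a much cruder stand-in for the fuzzy matching used in DNA search.

```python
import ast
import hashlib
from collections import defaultdict


def subtree_size(node: ast.AST) -> int:
    """Count the nodes in a subtree, including the root."""
    return sum(1 for _ in ast.walk(node))


def shape(node: ast.AST) -> str:
    """Describe a subtree by node type names only, ignoring names and literals."""
    children = ",".join(shape(child) for child in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def candidate_groups(source: str, min_nodes: int = 8, max_nodes: int = 200):
    """Group statements whose shapes match, returning their line numbers."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.stmt):
            continue
        # The lower bound filters out trivial fragments such as a bare return.
        if min_nodes <= subtree_size(node) <= max_nodes:
            descriptor = hashlib.sha1(shape(node).encode()).hexdigest()
            groups[descriptor].append(node.lineno)
    return [lines for lines in groups.values() if len(lines) > 1]
```

    Run over a file containing two validation functions that differ only in names and message text, this reports the two function definitions as one group and their `if` blocks as another, while the lower bound keeps the short `raise` and `return` statements out of the results.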



  • @dkf said in Development tool brainstorming:

    @ScholRLEA said in Development tool brainstorming:

    The second would be a codewalker that could grind through a section (or the entirety) of the code base looking for patterns of code that might make refactoring candidates.

    Having done a bunch of refactoring from time to time, you should bear in mind that you can also end up with sections of function that are the same except for some variables and substrings (e.g., parts of error message). The structural similarity finder should be aware of these sorts of things.

    Fair point; the tool should be string-aware enough to recognize when two or more strings are basically patterns for several similar messages, and to annotate them for possible refactoring into a specialized formatting function.

    OTOH, it should also be able to discount (or more likely, down-weight) some kinds of surface similarities between types of strings that serve different purposes.
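    To make the "strings that are basically patterns for similar messages" case concrete, here is a small sketch that derives a shared template from two messages. It assumes word-level comparison with the standard `difflib` module is adequate, and `common_template` is a hypothetical helper name. It does nothing about down-weighting strings that merely look alike, which would need knowledge of how each string is used.

```python
from difflib import SequenceMatcher


def common_template(a: str, b: str) -> str:
    """Replace the words that differ between two messages with '{}'."""
    words_a, words_b = a.split(), b.split()
    template = []
    matcher = SequenceMatcher(None, words_a, words_b)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            template.extend(words_a[i1:i2])
        else:
            # Any replaced, inserted, or deleted run becomes one placeholder.
            template.append("{}")
    return " ".join(template)


if __name__ == "__main__":
    print(common_template("age must be positive", "size must be positive"))
    print(common_template("cannot open file foo.txt", "cannot open file bar.cfg"))
```

    A template with only one or two placeholders and plenty of fixed text is a reasonable hint that the messages could share a formatting function; a template that is mostly placeholders probably is not.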

    On a related note, a different tool just came to mind, one which I think may already be in some IDEs but I'm not sure: one for automatically collating a list of strings to be separated into a resource file. Basically, it would gather up all the text strings in the source code and give the programmer a checklist of them with a field for adding a constant name for each string. It could then give the developers the option to select those which should be moved out of the code, and generate a resource file based on that checklist and set of constant names. Perhaps it could even provide a series of higher-level fields for organizing the strings into groups.

    This would mainly be for the purpose of internationalization, as it would help catch cases where a coder slips up and forgets to internationalize a section of code, but it would also be useful for refactoring in general, especially if it also did the substring analysis you mentioned.
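    A minimal sketch of the string-collating step, assuming the source being scanned is Python and using the standard `ast` module. The helper names, the `MSG_` prefix, and the checklist fields are all invented for illustration; generating the actual resource file and grouping the strings are left out.

```python
import ast
import re

# Node types whose first statement may be a docstring rather than a message.
_DOC_OWNERS = (ast.Module, ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def collect_strings(source: str):
    """Return sorted (line, text) pairs for string literals, skipping docstrings."""
    tree = ast.parse(source)
    docstrings = set()
    for node in ast.walk(tree):
        if isinstance(node, _DOC_OWNERS) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                docstrings.add(id(first.value))
    return sorted(
        (node.lineno, node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and id(node) not in docstrings
    )


def suggest_name(text: str) -> str:
    """Propose a constant name from the first few words of the string."""
    words = re.findall(r"[A-Za-z0-9]+", text)[:4]
    return "MSG_" + "_".join(w.upper() for w in words) if words else "MSG_EMPTY"


def checklist(source: str):
    """Build one editable checklist row per string found in the source."""
    return [
        {"line": line, "name": suggest_name(text), "text": text, "extract": False}
        for line, text in collect_strings(source)
    ]
```

    Each row starts with `extract` set to `False` so the developer opts strings in rather than out, and the suggested name is only a default for the constant-name field.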

    OTOH, you don't need an optimised build in order to do this sort of thing; it's probably best done at the AST stage. The step to the AST is typically pretty cheap.

    Sorry, yes, I should have been clearer on that point, because I was intending to say more on that but forgot to write out part of what I was going to say. By 'compile', I meant only to an intermediate representation (annotated AST, most likely), not to full code generation. I was actually going to go on an extended tangent regarding that, concerning how different languages would need different support (e.g., Algol/Pascal/C type languages need significant parsing but most of the semantics are rolled into the syntax, versus s-expression languages which are basically already in AST form but need more semantic analysis and often have macros and extensible syntax to deal with) but decided it was too long-winded, and in dropping it I forgot to be specific about how far the codewalker would need to go in preparing the code.

    I also thought I had mentioned that compiler front ends in general are too useful to be locked away inside the compiler, and ought to be exposed as libraries/frameworks/utilities for codewalking programs to use, but... oops?

    I'm guessing that you can probably try computing the sensible fragments of a piece of code (probably within some sort of size bounds — you want a lower bound as well as an upper so that you don't get overwhelmed with stupid trivialities) and then mapping those to some sort of descriptor that you can do the similarity matching over. Maybe some of the algorithms used in the searching of DNA and certain types of machine vision would then become useful?

    Thank you! That's really good advice, yes. I'll look into those.


  • I survived the hour long Uno hand

    @ScholRLEA said in Development tool brainstorming:

    The second would be a codewalker that could grind through a section (or the entirety) of the code base looking for patterns of code that might make refactoring candidates

    CodeClimate does that, among other things: https://codeclimate.com/github/SockDrawer/SockBot

