Need to call a function? Use reflection!



  • Fortunately, reflection in Python is quite a bit simpler than in some other languages. This still made me go :wtf: though:

    func = getattr(hashlib, 'md5')() ... func.update(string) ... func.hexdigest()

    For those not familiar with it, getattr(hashlib, 'md5')() is the same as hashlib.md5()
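A quick way to check that equivalence, as a minimal sketch (the sample input and variable names are made up for illustration, not taken from the offending code):

```python
import hashlib

data = b"hello world"  # hypothetical sample input

# Reflection: look the constructor up by its name as a string, then call it.
via_getattr = getattr(hashlib, "md5")()
via_getattr.update(data)

# Plain attribute access: the same constructor, no string lookup involved.
direct = hashlib.md5()
direct.update(data)

# Both spellings resolve to the very same module attribute...
same_constructor = getattr(hashlib, "md5") is hashlib.md5
# ...so the resulting digests match as well.
same_digest = via_getattr.hexdigest() == direct.hexdigest()
print(same_constructor, same_digest)
```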



  • @Dragnslcr :facepalm: The coder knew the name of the function being called and still used getattr() to get a handle to it just to call it... there are no words. Did you ever track down the offender to ask what the 'reasoning' for this mockery was (before you presumably had them drawn and quartered)?



  • I know exactly who it is. Unfortunately, I'm not the person in charge, so constructive criticism generally isn't appreciated.

    As for the loop that reads in the file to be hashed:

    for bytes in iter(lambda: os.read(f, 2048*func.block_size), b''):



  • @Dragnslcr said in Need to call a function? Use reflection!:

    Unfortunately, I'm not the person in charge, so constructive criticism generally isn't appreciated.

    Your work here is complete. Now you just need to make them aware of the public ridicule.


  • So wait, md5 is a hardcoded string? This shit is fine if you're looking for a method name in an MVC and it's basically a user input, but this... :wtf:



  • @Onyx Got it in one.



  • @Dragnslcr The more I look at that the less I understand it.

    hashlib.md5 has no block_size attribute, but maybe it's in Python 3 or a weird version of something, whatever. Even if it did, why couldn't you just write '512'? If you can hardcode 'md5' you can hardcode '512', surely. Why read it in one kb at a time? Because of the way md5 works you'll still accumulate 100% of the data in memory (unless you're collecting hashes of 1kb chunks? as a checksum?). f.read() would return the whole file with only one call to os.read() which'll make caching a hell of a lot easier on the hard drive. Unless f is a 'file-like' object? A socket or something?

    Not to mention 'bytes' shadows a builtin type.



  • @AyGeePlus said in Need to call a function? Use reflection!:

    Why read it in one kb at a time?

    Because then you only need to keep a scratch buffer and the MD5 state (both of which have constant size) in memory, instead of the entire file. Reading a file in full is rarely a thing you want to do (and only for small files), especially for hashing, where it's a waste of both time and space.
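To make the constant-memory point concrete, here is a rough sketch of chunked hashing. The helper name md5_of_stream, its chunk_size default, and the in-memory payload are all assumptions for illustration; the original code read from a file descriptor with os.read instead.

```python
import hashlib
import io


def md5_of_stream(stream, chunk_size=128 * 1024):
    """Hash a binary stream chunk by chunk (hypothetical helper)."""
    digest = hashlib.md5()
    # iter(callable, sentinel) keeps calling stream.read until it returns b"",
    # so only one chunk plus the fixed-size MD5 state is alive at any time.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


payload = b"x" * 1_000_000  # stand-in for the contents of a large file
chunked = md5_of_stream(io.BytesIO(payload))
one_shot = hashlib.md5(payload).hexdigest()
print(chunked == one_shot)
```

Feeding the data in pieces gives the same digest as hashing it in one go, which is why there is no need to hold the whole file. Naming the loop variable chunk also avoids shadowing the bytes builtin.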

    @Dragnslcr said in Need to call a function? Use reflection!:

    For those not familiar with it, getattr(hashlib, 'md5')() is the same as hashlib.md5()

    They might have confused using getattr with using hashlib.new, which takes any algorithm name and is used to grab ones that aren't exposed as top-level hashlib functions (like RIPEMD and stuff). Still the wrong thing to do for MD5, though.
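For comparison, a small sketch of what hashlib.new is for: picking the algorithm at run time from a string. The algorithm_name variable stands in for a value that would come from configuration or user input; it is an assumption for this example.

```python
import hashlib

data = b"hello world"
algorithm_name = "sha256"  # hypothetical: imagine this arrives from a config file

# hashlib.new accepts the algorithm name as a string, so no getattr is needed.
by_name = hashlib.new(algorithm_name, data).hexdigest()
direct = hashlib.sha256(data).hexdigest()

# The names accepted on this interpreter; the exact set depends on how
# Python and its OpenSSL library were built.
available = hashlib.algorithms_available
print(by_name == direct, algorithm_name in available)
```

When the name is a constant known while writing the code, the direct constructor is still the clearer choice.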



  • @AyGeePlus said in Need to call a function? Use reflection!:

    hashlib.md5 has no block_size attribute, but maybe it's in Python 3 or a weird version of something, whatever.

    It's been there since at least 2.7:

    Python 2.7.8 (default, Nov 10 2014, 08:19:18)
    >>> import hashlib
    >>> hashlib.md5().block_size
    64L

    Even if it did, why couldn't you just write '512'? If you can hardcode 'md5' you can hardcode '512', surely. Why read it in one kb at a time? Because of the way md5 works you'll still accumulate 100% of the data in memory (unless you're collecting hashes of 1kb chunks? as a checksum?).

    You could hard-code the number of bytes to read at a time. I don't know for certain, but I would guess that you'll get somewhat better performance by processing a multiple of the block_size at a time. 128 KB is probably too small, though; I think the optimal disk read size is a few MB these days.
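For reference, the arithmetic behind that 128 KB figure, as a short sketch. The variable names are illustrative only.

```python
import hashlib

# block_size lives on the hash object, so the constructor has to be called first.
block_size = hashlib.md5().block_size

# The read size used in the loop quoted earlier in the thread.
read_size = 2048 * block_size

# Whether the constructor itself exposes the attribute; printed rather than
# assumed, since it is an implementation detail of the interpreter.
on_constructor = hasattr(hashlib.md5, "block_size")

print(block_size, read_size, read_size // 1024, on_constructor)
```

With MD5's 64-byte block this works out to 2048 * 64 = 131072 bytes per read, i.e. 128 KB rather than one KB.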

    f.read() would return the whole file with only one call to os.read() which'll make caching a hell of a lot easier on the hard drive. Unless f is a 'file-like' object? A socket or something?

    It's only doing MD5 hashes of entire files, but the files could potentially be pretty large, so we don't want to read the entire file into memory.



  • @Dragnslcr Wait a minute...

    For those not familiar with it, getattr(hashlib, 'md5')() is the same as hashlib.md5()

    This statement is in fact correct, but now that I look at the code again, it makes even less sense than I thought. The name of that variable is pretty misleading.

    For those unfamiliar with Python, a free-standing Python function is a first-class object, and a reference to a function is perfectly legal (actually, methods are too, but the syntax for working with them with an explicit rather than implicit object argument is quirky). The getattr() function returns such a reference for a function in the module (or any other object) that the first parameter references, looked up by the name passed as a string in the second parameter.

    The crux of this :wtf: is that the coder is getting a function which they already had the name of at coding time, so there's no reason to do it this way - they could just as easily have simply called the function. It isn't justified by renaming the function either, as the import statement has an as clause for just such a purpose. Even if that weren't good enough, all you need to do to get a function reference is to have the FQN (or the local name, if you import it with from y import x so that it is in the local namespace), so func = hashlib.md5 would be sufficient.

    The problem is, despite appearances, that isn't what this code is doing. That last part - that the name of the function is just a constant function reference - touches on why this is the case.

    In Python, AFAIK, an argument list is just a tuple literal that happens to follow a function reference - it may not actually be implemented that way, but that's more or less the semantics. You can call a function - regardless of how that function is referenced - simply by putting a parameter list after the reference. Since there's no typechecking, and arity is checked at run time, a pair of parentheses is a valid function call with no arguments.

    In other words,

    func = getattr(hashlib, 'md5')()
    

    is not returning the reference hashlib.md5, it is calling that reference with an empty argument list, which should return a hash-generating object, not a function. So the variable func isn't a function object, as the name would imply, but a hash-generator object, hence the method calls on it after that.
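The difference the trailing parentheses make can be sketched like this (the names constructor and hasher are chosen for the example and are not from the original code):

```python
import hashlib

constructor = getattr(hashlib, "md5")   # no call: a reference to the function
hasher = getattr(hashlib, "md5")()      # trailing (): the function is called

# The function reference can be called; the hash object it returns cannot.
constructor_is_callable = callable(constructor)
hasher_is_callable = callable(hasher)

# The hash object is what carries update() and hexdigest().
hasher.update(b"abc")
digest = hasher.hexdigest()
print(constructor_is_callable, hasher_is_callable, digest)
```

So a name like hasher would describe the variable better than func does.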



  • @Dragnslcr said in Need to call a function? Use reflection!:

    @AyGeePlus said in Need to call a function? Use reflection!:

    hashlib.md5 has no block_size attribute, but maybe it's in Python 3 or a weird version of something, whatever.

    It's been there since at least 2.7:

    Python 2.7.8 (default, Nov 10 2014, 08:19:18)
    >>> import hashlib
    >>> hashlib.md5().block_size
    64L

    Turns out E_PEBKAC. hashlib.md5().block_size exists, hashlib.md5.block_size does not because of course not, why would a constructor have attributes like that, TDEMSYR.

    @CatPlusPlus Oh, I didn't realize you could roll-up md5 hashes like that. Lots of brain cooties today. I even looked up md5 implementations to see if that was the case, but apparently can't brain.

    Not my finest performance.



  • @Dragnslcr said in Need to call a function? Use reflection!:

    For those not familiar with it, getattr(hashlib, 'md5')() is the same as hashlib.md5()

    Is there a performance difference? For the parallel constructions in Javascript (hashlib['md5']() vs. hashlib.md5() ) I don't believe there is.

    Is getattr() an actual function call in Python, or just syntactic sugar for whatever the engine would have to do to find a method by name anyway?

    Also, is there any difference in Python between calling a method via its containing object and calling it via a function reference? Does Python have anything like this and .call() and .apply() in Javascript?



  • @flabdablet said in Need to call a function? Use reflection!:

    @Dragnslcr said in Need to call a function? Use reflection!:

    For those not familiar with it, getattr(hashlib, 'md5')() is the same as hashlib.md5()

    Is there a performance difference? For the parallel constructions in Javascript (hashlib['md5']() vs. hashlib.md5() ) I don't believe there is.

    Probably not much of a difference. Just the overhead of calling getattr.

    Is getattr() an actual function call in Python, or just syntactic sugar for whatever the engine would have to do to find a method by name anyway?

    It's a built-in function, but I don't know if the interpreter does some magic to optimize it.

    Also, is there any difference in Python between calling a method via its containing object and calling it via a function reference?

    Functionally, nope, it's exactly the same. I don't know if there might be some very small performance penalty for doing the variable lookup.



  • @flabdablet said in Need to call a function? Use reflection!:

    Is getattr() an actual function call in Python

    Yes. It's implemented in the VM, though, so the difference is typically negligible. PyPy might be able to optimise it out in a case like this one, but CPython doesn't have a JIT and only does some rudimentary peephole optimisations on the bytecode. It doesn't really matter anyway.

    @flabdablet said in Need to call a function? Use reflection!:

    Also, is there any difference in Python between calling a method via its containing object and calling it via a function reference?

    No, object.method returns a bound callable. In general, the dis module is your friend if you want to find out things like that: https://ideone.com/VkmpcC

    It's mostly ridiculous micro-optimisation territory though and not really worth keeping in mind.

    Does Python have anything like this and .call() and .apply() in Javascript?

    No, Python doesn't do dynamic scoping at all. Every callable object has __call__ that actually implements the function call operator, but you never use it directly (fn() and fn.__call__() are exactly equivalent). You can unpack lists and dicts into positional/keyword arguments using special syntax: args = [1, 2, 3]; kwargs = { 'x': 1, 'y': 2 }; f(*args, **kwargs), so there is no need for apply either.

    @Dragnslcr said in Need to call a function? Use reflection!:

    I don't know if there might be some very small performance penalty for doing the variable lookup.

    Attribute loads (and global loads) are slightly slower than local loads. You might see generated code that preloads attributes/globals into locals, but you won't pass any code review if you do it in code intended for humans.
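A compact sketch of the points above about bound methods, __call__, and argument unpacking. The function f and its parameters are hypothetical, picked to match the args/kwargs example in the post.

```python
import hashlib

hasher = hashlib.md5()
update = hasher.update   # a bound method: it remembers the object it came from
update(b"abc")           # same effect as hasher.update(b"abc")


def f(a, b, c, x=0, y=0):  # hypothetical function for illustration
    return (a, b, c, x, y)


args = [1, 2, 3]
kwargs = {"x": 1, "y": 2}

# Unpacking covers the use case of JavaScript's .apply().
unpacked = f(*args, **kwargs)
# Calling __call__ explicitly does the same thing as the call operator.
via_dunder = f.__call__(*args, **kwargs)
print(unpacked, via_dunder)
```

Because the bound method already carries its object, there is nothing like JavaScript's this to rebind at the call site.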


  • I survived the hour long Uno hand



  • @Yamikuronue drama tick at a stases?

    What's a stases?


  • @blakeyrat No idea, I was just looking for a miniature of a warg-rider.



  • Call me crazy (I know you do, always plotting against me behind my back...), but maybe it is a misspelling of 'anastasis' (an archaic term meaning resurrection). Maybe.



  • @ScholRLEA It's split into dramatic katastases at the parent blog, but 'katastases' doesn't appear to be a thing. Maybe it's this, which would fit 'dramatic'.

    Filed under: pointless sleuthing.



  • @Dragnslcr What? No! WHY?!?!
    Don't do that to me first thing in the morning =_=



  • @AyGeePlus said in Need to call a function? Use reflection!:

    It's split into dramatic katastases at the parent blog, but 'katastases' doesn't appear to be a thing.

    My guess is a portmanteau of kata+stases.



  • @anotherusername Dramatic pauses during katas?

    Dramatic right-before-theatrical-climax-plural makes more sense. Not a lot more, but more.

