The index is more important than the data



  • One of the systems I regularly have to deal with is Magento – a PHP e-commerce system with a large marketing budget, some enterprise-level features, and a jungle of strange decisions. Here is something I found in its core database adapter code recently:

    public function addIndex($tableName, $indexName, $fields, 
                             $indexType = Varien_Db_Adapter_Interface::INDEX_TYPE_INDEX, $schemaName = null) {
    	// code to build the SQL statement to add this index
    	$cycle = true;
    	while ($cycle === true) {
    		try {
    			$result = $this->raw_query($query);
    			$cycle  = false;
    		} catch (Exception $e) {
    			if (in_array(strtolower($indexType), array('primary', 'unique'))) {
    				$match = array();
    				if (preg_match('#SQLSTATE\[23000\]: [^:]+: 1062[^\']+\'([\d-\.]+)\'#', $e->getMessage(), $match)) {
    					$ids = explode('-', $match[1]);
    					$this->_removeDuplicateEntry($tableName, $fields, $ids);
    					continue;
    				}
    			}
    			throw $e;
    		}
    	}
    	// ...
    }
    

    This code tries to add an index to a database table. If you try to create a primary/unique index and MySQL says "you can't do that - your existing data violates that constraint", the code parses the error message to extract the offending column values, deletes those duplicate rows except one, and calmly tries again. And again.
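
    For contrast, here is a minimal sketch of the more conservative behaviour: look for duplicate key values first and refuse to add the index if any exist, leaving the data alone. This is not Magento code. The helper name `findDuplicateKeys`, the `customer` table and its columns are all hypothetical, and the in-memory SQLite database (via PDO) is only there so the example runs on its own.

```php
<?php
// Hypothetical helper, not part of Magento: report duplicate key values
// instead of deleting rows so that a unique index can be created.
// Identifiers are interpolated into SQL, so they must come from trusted code.
function findDuplicateKeys(PDO $db, string $table, array $fields): array
{
    $cols = implode(', ', $fields);
    $sql  = "SELECT $cols, COUNT(*) AS copies FROM $table "
          . "GROUP BY $cols HAVING COUNT(*) > 1";
    return $db->query($sql)->fetchAll(PDO::FETCH_ASSOC);
}

// Assumed sample data: two rows share the same email address.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customer (id INTEGER, email TEXT)');
$db->exec("INSERT INTO customer VALUES (1, 'a@example.com'), (2, 'a@example.com'), (3, 'b@example.com')");

$duplicates = findDuplicateKeys($db, 'customer', ['email']);
if ($duplicates) {
    // Stop and let a human decide which rows to keep.
    echo "Refusing to add unique index; duplicate keys found:\n";
    foreach ($duplicates as $row) {
        echo "  {$row['email']} ({$row['copies']} rows)\n";
    }
} else {
    $db->exec('CREATE UNIQUE INDEX customer_email ON customer (email)');
}
```

    The point of the design is that the schema change fails loudly and nothing is deleted; what to do about the duplicates stays a separate, deliberate step.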


  • Discourse touched me in a no-no place

    @DCoder said in The index is more important than the data:

    This code tries to add an index to a database table. If you try to create a primary/unique index and MySQL says "you can't do that - your existing data violates that constraint", the code parses the error message to extract the offending column values, deletes those duplicate rows except one, and calmly tries again. And again.

    That's awe-inspiring. It should inspire someone sending the US Army after them to get a bit of shock-and-awe going.



  • @DCoder said in The index is more important than the data:

    One of the systems I regularly have to deal with is Magento – a PHP e-commerce system with a large marketing budget, some enterprise-level features, and a jungle of strange decisions.

    You have my sympathies.



  • @DCoder Why does the idea of parsing an error message to fix it sound familiar... oh, I did that once... D:



  • @DCoder Jesus you're dealing with that hellspawn that I refuse to touch and if you have been in the lounge, you know what my code base is currently like.



  • @LB_ sometimes there is nothing else you can do, but this is levels of :doing_it_wrong: here; this is why I avoid Magento.


  • Winner of the 2016 Presidential Election

    @DCoder said in The index is more important than the data:

    One of the systems I regularly have to deal with is Magento

    Run!!!!!

    (Sorry, reflexes. Now I'm going to read the rest of your post.)


  • Winner of the 2016 Presidential Election

    @Arantor said in The index is more important than the data:

    Jesus you're dealing with that hellspawn that I refuse to touch and if you have been in the lounge, you know what my code base is currently like.

    I once had to write a Magento plugin which completely changed the price calculation rules. Oh, the horrors…

    I don't think even the original developers understood how the price and tax calculation actually worked. Of course, they were representing prices as floats, the calculation logic was spread across at least 10 different classes in 3 different modules, and they basically just rounded in random places and hoped that the result would be correct.
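
    A small generic sketch of why floats plus scattered rounding go wrong. None of this is Magento's actual calculation: the function name `taxCents`, the per-mille rate convention and the sample amounts are made up for illustration, and the rounding helper assumes non-negative amounts.

```php
<?php
// Generic illustration, not Magento code. Amounts are integer cents,
// tax rates are integer per-mille (210 = 21%). Non-negative amounts only.
function taxCents(int $netCents, int $ratePermille): int
{
    // Round half up using integer arithmetic only.
    return intdiv($netCents * $ratePermille + 500, 1000);
}

// Binary floats cannot represent most decimal fractions exactly.
var_dump(0.1 + 0.2 == 0.3);   // bool(false)

// Where the rounding happens changes the answer:
$unitNet = 105;   // 1.05
$qty     = 3;
$rate    = 100;   // 10%

$roundedPerUnit = $qty * taxCents($unitNet, $rate);   // round, then multiply
$roundedOnTotal = taxCents($qty * $unitNet, $rate);   // multiply, then round

echo "per unit: $roundedPerUnit, on total: $roundedOnTotal\n";   // 33 vs 32
```

    Both results are defensible on their own; the trouble starts when different classes pick different rounding points for the same order, which is what "rounded in random places" amounts to.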



  • @asdf you got Magento on you, I'm sorry, I'm so so sorry.

    But yes, that's about consistent with what else I've heard which is why I won't go near it.

    If I thought there were a market I wanted to get into with a Magento replacement and could handle the shit that would come with it, I'd totally build a replacement. But I don't want what that would come with.


  • Winner of the 2016 Presidential Election

    @Arantor said in The index is more important than the data:

    you got Magento on you

    Not anymore, fortunately. Magento is one of the reasons I changed jobs and promised myself to never ever touch PHP again.


  • :belt_onion:

    I had to do something like this once.

    HOWEVER.

    • it was duplicate data due to a bug.
    • the de-duplication was done as part of a data sanitization process before adding the UNIQUE constraint
    • the code picked which duplicate to use intelligently, and merged some duplicate records if needed.
    • if relevant data was lost, the fix was very simple (i.e. a password reset)
    • it was a system used by like 30 people so losing data wasn't that big of a deal anyways (it was duplicate data in the user table)
    • I berated myself thoroughly for not making it unique to begin with
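
    A sanitization pass of that kind might look roughly like the sketch below. The merge rule is an assumption, since the post does not say which one was used: keep the most recently updated row per key and fill its empty fields from the older duplicates. The function name `mergeDuplicates` and the field names are hypothetical.

```php
<?php
// Hypothetical merge rule (the post does not say which one was used):
// keep the most recently updated row per key and fill its empty fields
// from the older duplicates. Keys are compared case-insensitively.
function mergeDuplicates(array $rows, string $key): array
{
    $groups = [];
    foreach ($rows as $row) {
        $groups[strtolower($row[$key])][] = $row;
    }

    $merged = [];
    foreach ($groups as $group) {
        // Newest first; ISO dates sort correctly as plain strings.
        usort($group, function (array $a, array $b): int {
            return strcmp($b['updated_at'], $a['updated_at']);
        });
        $keep = array_shift($group);
        foreach ($group as $older) {
            foreach ($older as $field => $value) {
                if (($keep[$field] ?? null) === null || $keep[$field] === '') {
                    $keep[$field] = $value;
                }
            }
        }
        $merged[] = $keep;
    }
    return $merged;
}

$users = [
    ['id' => 1, 'email' => 'ann@example.com', 'phone' => '555-0100', 'updated_at' => '2015-01-10'],
    ['id' => 7, 'email' => 'Ann@example.com', 'phone' => '',         'updated_at' => '2016-03-02'],
    ['id' => 2, 'email' => 'bob@example.com', 'phone' => null,       'updated_at' => '2016-01-01'],
];

$clean = mergeDuplicates($users, 'email');
print_r($clean);
```

    Only once a pass like this leaves one row per key does adding the UNIQUE constraint become a safe, boring schema change.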


  • @asdf said in The index is more important than the data:

    @DCoder said in The index is more important than the data:

    One of the systems I regularly have to deal with is Magento

    Run!!!!!

    (Sorry, reflexes. Now I'm going to read the rest of your post.)

    Nah. I like solving puzzles, I like debugging problems, I like understanding their root causes, I like learning from mistakes. It's fun!

    @Arantor said in The index is more important than the data:

    @DCoder Jesus you're dealing with that hellspawn that I refuse to touch and if you have been in the lounge, you know how what my code base is currently like.

    The lounge?

    @asdf said in The index is more important than the data:

    I once had to write a Magento plugin which completely changed the price calculation rules. Oh, the horrors…

    I don't think even the original developers understood how the price and tax calculation actually worked. Of course, they were representing prices as floats, the calculation logic was spread across at least 10 different classes in 3 different modules, and they basically just rounded in random places and hoped that the result would be correct.

    In this same project I also had to modify the discount/tax interactions and write custom price calculation rules using additional price variables. I know exactly what you mean...

    I should point out that the calculation logic in PHP code is not the only one, there's also the "price indexing" which does the same calculations in pure SQL for performance reasons, to avoid recalculation of each product's price in the product list.
    Imagine an SQL query that reads a product's price parameters stored in EAV tables, computes the product min/max price, and factors in possible discounts. Now a query that does the same for all products in the inventory, across multiple stores with different currencies, different parameter values, etc... My colleagues took one look at those queries and noped out of code review.

    I also had to build custom EAV objects, and that required copy-pasting a lot of code, because their base EAV implementation was really limited and half the logic was buried in specific child classes.


  • Winner of the 2016 Presidential Election

    @DCoder said in The index is more important than the data:

    I like solving puzzles, I like debugging problems

    Oh, me too. The reason I started hating Magento was mostly the huge amount of boilerplate code necessary to perform simple tasks, the lack of meaningful documentation and the difficulty of finding the correct class to override. I felt I wasn't actually solving problems anymore, but spending 80% of my time fighting the system.

    @DCoder said in The index is more important than the data:

    I should point out that the calculation logic in PHP code is not the only one, there's also the "price indexing" which does the same calculations in pure SQL

    Oh, yeah, I vaguely remember something like that. In the end, I did the worst kind of TDD (trial and error until the tests pass), because I couldn't figure out how everything was supposed to work together. A few weeks ago, a former coworker told me he fixed some edge case in that module, so I guess my code is still in production and working.

    @DCoder said in The index is more important than the data:

    I also had to build custom EAV objects

    May God have mercy upon your soul. Wasn't there also some kind of index that flattened all EAV tables if some configuration setting was enabled, which means there were two code paths for everything that had to do with those?



  • @DCoder the lounge is a board here that is request-access-only at present where the regulars share things with some sense of privacy. Needless to say, some horror stories to be found and a number of the regulars have their own "this is what I put up with" commiseration threads. Mine talks about the fun parts of my workplace, heavily anonymised, of course, but it has juicy bits too.


  • Notification Spam Recipient

    @DCoder said in The index is more important than the data:

    The lounge?

    A not-so-secret area of the forum that's disallowed from spiders and the average noone.

    Edit: :hanzo: 'd. Really should have seen that coming...



  • It isn't so bad if you are doing something with it far simpler than what it is meant to be used for... like a static page masquerading as a storefront, such as the one my FIL asked me to build for him. Despite this, ~~Maginot~~ Magento still bit me in the ass repeatedly.



  • @asdf said in The index is more important than the data:

    The reason I started hating Magento was mostly the huge amount of boilerplate code necessary to perform simple tasks, the lack of meaningful documentation and the difficulty of finding the correct class to override. I felt I wasn't actually solving problems anymore, but spending 80% of my time fighting the system.

    "lack of meaningful documentation" and its cousin "actively misleading documentation". Some PHPdoc comments reference classes that do not exist...

    Also, boilerplate XML configuration. With a custom XML parser that does not support sibling nodes with the same name! The only way to know if your config is even being read is to step over the code with a debugger and hope you don't die of old age before getting to the relevant part. Especially fun in theming, where it parses about fifty XML files into one monster structure before building the layout, and then throws out half of the layout blocks based on instructions in those XML files.
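
    How Magento's parser ends up with that limitation is not stated here. One common way such a limitation arises, shown purely as an assumption, is flattening the tree into an array keyed by element name, so a later sibling silently overwrites an earlier one. The element names below are made up.

```php
<?php
// Illustration of one possible cause, not Magento's actual parser:
// keying child nodes by element name keeps only the last sibling.
$xml = simplexml_load_string(
    '<config><block>header</block><block>footer</block></config>'
);

$byName = [];
foreach ($xml->children() as $child) {
    // The second <block> replaces the first under the same key.
    $byName[$child->getName()] = (string) $child;
}

print_r($byName);                     // only 'footer' survives
echo count($xml->children()), "\n";   // the XML itself has 2 children
```

    No error is raised anywhere along the way, which fits the "step through it in a debugger to see whether your config was read" experience.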

    "Fighting the system" is a very accurate description, especially once you add third-party modules to the mix. I can't count the number of times I had to step through their code in a debugger to make sense of what the hell is going on. Third-party modules are a clusterfuck in any system, I suppose...

    @asdf said in The index is more important than the data:

    Wasn't there also some kind of index that flattened all EAV tables if some configuration setting was enabled, which means there were two code paths for everything that had to do with those?

    Oh yea, that does exist. It only covers the catalog (products) EAV tables, so luckily I did not have to deal with it for my custom objects.


  • Winner of the 2016 Presidential Election

    @DCoder said in The index is more important than the data:

    Especially fun in theming, where it parses about fifty XML files into one monster structure before building the layout, and then throws out half of the layout blocks based on instructions in those XML files.

    I had already completely forgotten about that particular horror.

    I also loved how more than half of the XML configuration file format for custom modules was completely undocumented. It took me forever to figure out how ACLs actually worked, and after three years of working with Magento on a daily basis, I still found features (or "features") I had not known about before.

    @DCoder said in The index is more important than the data:

    "Fighting the system" is a very accurate description, especially once you add third-party modules to the mix. I can't count the number of times I had to step through their code in a debugger to make sense of what the hell is going on.

    I know the feeling. Without a debugger, it was impossible to tell which code was actually executed once you had enough third-party modules installed.

    I had a little hope when I heard about Magento 2, which was supposed to fix some of the worst problems:

    • Finally, custom modules are contained in one folder instead of at least four different folders.
    • Documentation!!!
    • Completely rewritten price calculation logic that's supposed to be easier to understand and extend.
    • A public GitHub repo, where everyone can make feature requests.
    • The ancient JavaScript framework they used was supposed to be replaced with something modern. (Oh, did I mention that the Magento developers love huge amounts of inline JavaScript?)
    • A template system! (Which they removed again in some alpha version, Magento 2 still uses .phtml templates. You may think you know how bad that is, but you have no idea unless you've actually written a Magento module which overrides default templates.)

    I was enthusiastic for a while and made a few sensible feature requests for the new version, based on my experience. They have ignored them all. Not that I care now: when they stopped migrating to a sane template system because it was too much work, I decided I never ever wanted to touch Magento again.


  • Discourse touched me in a no-no place

    @DCoder said in The index is more important than the data:

    I like learning from mistakes.

    Sounds like you can learn from “using Magento” then!



  • @Arantor said in The index is more important than the data:

    @DCoder the lounge is a board here that is request-access-only at present where the regulars share things with some sense of privacy. Needless to say, some horror stories to be found and a number of the regulars have their own "this is what I put up with" commiseration threads. Mine talks about the fun parts of my workplace, heavily anonymised, of course, but it has juicy bits too.

    I started reading your thread... nine hours and 1474 posts later, :wtf: , I am amazed that you are still there, and by the fact that after all that you still see Magento as worse.


    @asdf said in The index is more important than the data:

    I had a little hope when I heard about Magento 2, which was supposed to fix some of the worst problems

    When I heard about M2, I found the end-of-life date for 1.x to be "end of 2018". Which means, we will soon have to raise the question of migrating the project we launched 18 months ago (it took some 2000 hours of work, 90% of it done by me and me alone), and the client will have conniptions. I can't wait. 🍿



  • @DCoder My codebase is getting measurably better. Can we say the same about Magento?!


  • Impossible Mission - B

    @Arantor said in The index is more important than the data:

    @DCoder My codebase is getting measurably better.

    By what measurement? :trollface:



  • @LB_ said in The index is more important than the data:

    Why does the idea of parsing an error message to fix it sound familiar... oh, I did that once...

    Database-sourced errors in general are nasty to deal with. Entity Framework, for example, just tosses you a SqlException and it's up to you to unwrap all the inner exceptions and cross-reference the magic error numbers to find out what exactly the problem is - whether you've violated a PK, a check constraint, or just had the DB crap itself in a myriad of possible vendor-specific ways.
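
    The same idea in this thread's PHP/MySQL setting, as a hedged sketch: branch on the driver's numeric error code rather than pattern-matching the message text. The function name `classifyMysqlError` and the category names are made up; the array layout follows PDO's `errorInfo` (SQLSTATE, driver error code, driver message), and the sample error is hand-written rather than captured from a real server.

```php
<?php
// Sketch only: classify a failure by MySQL's numeric error code rather than
// by pattern-matching the message text. The category names are made up.
function classifyMysqlError(array $errorInfo): string
{
    // PDO's errorInfo layout: [SQLSTATE, driver error code, driver message]
    $code = (int) ($errorInfo[1] ?? 0);

    switch ($code) {
        case 1062: return 'duplicate_key';   // ER_DUP_ENTRY
        case 1451:                           // parent row is still referenced
        case 1452: return 'foreign_key';     // referenced row does not exist
        case 1048: return 'not_null';        // ER_BAD_NULL_ERROR
        case 1213: return 'deadlock';        // ER_LOCK_DEADLOCK
        default:   return 'unknown';
    }
}

// Hand-written sample in the shape a PDOException's errorInfo would have.
$sample = ['23000', 1062, "Duplicate entry '42-7' for key 'PRIMARY'"];
echo classifyMysqlError($sample), "\n";
```

    Knowing which constraint was hit is still a long way from knowing what to do about it, but at least the code is not depending on the wording of an error string.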



  • @masonwheeler said in The index is more important than the data:

    @Arantor said in The index is more important than the data:

    @DCoder My codebase is getting measurably better.

    By what measurement? :trollface:

    The number of defects per month is going down and the amount of code in the platform is going down as we deduplicate and standardise how we do things.


  • Notification Spam Recipient

    @Maciejasjmj said in The index is more important than the data:

    @LB_ said in The index is more important than the data:

    Why does the idea of parsing an error message to fix it sound familiar... oh, I did that once...

    Database-sourced errors in general are nasty to deal with. Entity Framework, for example, just tosses you a SqlException and it's up to you to unwrap all the inner exceptions and cross-reference the magic error numbers to find out what exactly the problem is - whether you've violated a PK, a check constraint, or just had the DB crap itself in a myriad of possible vendor-specific ways.

    Yeah....


