Alternatives to EAV anti pattern



  • What are the alternatives to EAV ?

    I am learning .NET and want to rewrite a mini-version of an application that did at work (in ColdFusion--yes ColdFusion is TRWTF) and redesign the database.

    Basically it started as a static evaluation with multiple sections (which is displayed in tabs-each tab is a separate page because of the number of questions).

    Then, of course, each department wants their version to look different. Different questions, and eventually different sections.

    This was started in a WTFy way by wrapping if statements around each question. I prefer much DRY-er code. And worked on making it dynamic.

    Honestly it was not the most elegant. Couple years later we were able to do a version 2. Version 2 was mostly a front-end re-design and the dynamic forms built in. But we kept the database structure largely similar (I was not given leeway to change it).
    The table the stores the answers is extremely wide (>400 columns) with many nulls.
    And Version 2 is not compatible with Version 1 because there was a huge restructuring of the sections and questions.

    I had our architect (not my favorite person for a number of reasons including his my way or no way attitude though I still learned things from him) plan to redesign it. His idea was to have different table for each version of the answers. This would be less prone to breakage.

    I did not like this as not very DRY. I do see the point of breakage and is probably more performant.

    It isn't a huge amount of data (less than 10K rows)

    Would this actually be a use-case for EAV? If not, what is an alternative?
    While the evaluations are dynamic once each version was designed it was static for a while save for spelling/grammar corrections.

    Ideally one database would house both versions 1 (for each group) and 2 and be able to have future versions and allow for reporting across all version (there are many common fields across all versions and groups).



  • @Karla said:

    The table the stores the answers is extremely wide (>400 columns) with many nulls.

    that's a :wtf:. usually that kind of tables smell of poor normalization.

    OTOH
    @Karla said:

    His idea was to have different table for each version of the answers.

    may work, but, you really need to have it in one DB? can't you make a new structure and a migration script to go from one format to another?



  • what i'm failing to understand, it has a table for evaluations and another table for questions?



  • @Jarry said:

    that's a :wtf:. usually that kind of tables smell of poor normalization.

    Agreed.

    @Jarry said:

    may work, but, you really need to have it in one DB? can't you make a new structure and a migration script to go from one format to another?

    Other than the common elements,

    the versions are not compatible (unless I would union all the differing columns from the 2 answer tables) . That's more WTFery than above.

    @Jarry said:

    what i'm failing to understand, it has a table for evaluations and another table for questions?

    Each db has it's own version of a lookup for questions, sections, questions to groups, and sections to groups.

    The sections are not even the same across versions.

    Each DB has Evaluations and Answers table. The Answers table is the super wide one. Evaluations table is a remnant because with the version 1 there were other evaluations types that were really not compatible. Version 1 was initially created for those other types (I don't care about those). Version 2 other driving point was to separate those other evaluation types from this one. They were rightfully separated into different applications.

    And there is the likelihood of a new version of the evaluation which will not be compatible with either of previous versions.

    And I'm not doing this for work, new boss is outsourcing the replacement (they are looking at packages like SalesForce)
    .
    I just want a sample design to practice .NET and demonstrate how I would design it if I had the power to do so.



  • well, my normal approach is doing the DB design first and then build the application around that. it works for me, but YMMV

    first, identify your entities. IE questions, answers, groups, sections.
    then start making relationships betweens those entities.
    once you got that, normalize everything and see what you get.

    sometimes, the need for an EAV table arises, sometimes not.

    i have to go, but i'll check this thread to see how it goes.

    @Karla said:

    And I'm not doing this for work, new boss is outsourcing the replacement (they are looking at packages like SalesForce). I just want a sample design to practice .NET and demonstrate how I would design it if I had the power to do so.

    that's the best way to learn!


  • Notification Spam Recipient



  • @Jarry said:

    first, identify your entities. IE questions, answers, groups, sections.then start making relationships betweens those entities.once you got that, normalize everything and see what you get.

    Other than EAV, the only other way I can think of to be fully normalized is bunch small question/answer link tables. That seems to me a maintenance and reporting nightmare.

    At one point I did consider saving the more dynamic question/answers as XML (though now I would prefer JSON). Since we don't do a ton a validation maybe that would work?

    The majority of questions are radiobutton Yes/No/NA or Very Unlikely, Unlikely, Neither, Likely, Very Likely. A dozen or so are dropdowns. We have look up tables for all of those. A dozen or so date fields and a dozen or so text and date fields


  • Notification Spam Recipient

    @Karla said:

    That seems to me a maintenance and reporting nightmare.

    This. My thing I posted above is a veritable nightmare for consistent reporting, unless the users don't ever change the questions (which is against the point of the design).



  • That thread was how I learned how to change the default page that opens when you click New Query in SSMS.



  • @Tsaukpaetra said:

    That seems to me a maintenance and reporting nightmare.

    This. My thing I posted above is a veritable nightmare for consistent reporting, unless the users don't ever change the questions (which is against the point of the design).

    I'm ok with some denormalization just for this reason. But with a table 400 columns wide--I have too much denormalization.



  • @Tsaukpaetra said:

    users don't ever change the questions

    Version the questions? Report states "earlier versions" for not the current version?


  • Notification Spam Recipient

    @swayde said:

    Version the questions? Report states "earlier versions" for not the current version?

    Interesting idea, but what would that even look like? I'm not experienced enough to do it properly, so I don't pretend to do it at all. Luckily, the nature of the question changes (so far) seem mostly verbiage or available-answer type.



  • @swayde said:

    users don't ever change the questions

    Version the questions? Report states "earlier versions" for not the current version?

    "Versions" are the particular combination of questions, sections with join tables so groups can display only the sections and the questions they want.

    Versioning the questions would be too much (several hundred question in each version).

    And actually users do not change the questions. I made a rough admin for my own use but they are mostly not savvy enough for that. I display matrices of section X group/ question X group (pass in a section id) and clicking the cell toggles the display.

    Originally, I made it database driven because I was just so tired off correcting bad table based layout on 20 view pages. Another WTF.



  • Nevermind "EAV" -- it sounds to me like you want to calculate direct sums.

    Take a step back and think of it this way. Imagine that each department has its own table. In other words, you have department A that has a table A (and its specific fields) and department B that has its own table with its own fields.

    Now, what you have now is a single table that has tons of columns, because it has all the columns for A, all the columns for B, all the columns for C, etc.

    Eff that. Make a table for A, make one for B, make one for C, etc. Use outer joins to "add" the tables together, as needed (i.e., if and when you actually need to pull data from all the departments). You can carefully factor the tables so that the "main" table has all of the "common" fields, and then make tables for the things that aren't shared. To keep things sane, use the same field names for anything that is shared..

    The architect is right.



  • @Captain said:

    Eff that. Make a table for A, make one for B, make one for C, etc. Use outer joins to "add" the tables together, as needed (i.e., if and when you actually need to pull data from all the departments). You can carefully factor the tables so that the "main" table has all of the "common" fields, and then make tables for the things that aren't shared. To keep things sane, use the same field names for anything that is shared..

    The architect is right.

    I do see how that is sane but I still have a mental block on having 40 tables (2 versions X 20 groups) and another 20 when a newer version comes around.

    And I definitely wouldn't have a user-accessible admin for that (not that we do now--but in an ideal design a user could add a new group and/or new version)


  • Garbage Person

    EAV becomes pretty necessary in this sort of use case if you're stuck with relational. However, I would honestly recommend some sort of document store database storing JSON. Reporting will be interesting, (though reports on EAV tables are terrible in their own right) but if I understand the use case you should be able to summarize things in big batches (or keep running summaries on the fly).



  • @Weng said:

    EAV becomes pretty necessary in this sort of use case if you're stuck with relational. However, I would honestly recommend some sort of document store database storing JSON. Reporting will be interesting, (though reports on EAV tables are terrible in their own right) but if I understand the use case you should be able to summarize things in big batches (or keep running summaries on the fly).

    Since I still want supporting data and common data to be relational is a WTF to just store the different data in a column that is varchar as JSON?

    And since sections are on different pages to lessen the amount of data a different column for each section.


  • Garbage Person

    I've done that, also have keyed document DBs to a relational DB.



  • I worked on a survey project awhile back. Hopefully it was similar enough(and I understand your requirements).

    We had:

    • Survey table (had stuff like survey title, special instructions...)

    • SurveyToQuestions table (had F/K to SurveyID and F/K to QuestionID, and order of questions, assumed the same questions could be used in different surveys)

    • Questions table (we had a F/K to question type b/c some of the "questions" were actually breaking bars or special instructions, also had to account for predetermined answers like check boxes or free form text).

    • Answers table (F/K to QuestionID, and order of answer if multiple answers, like a checkbox/radio button, and a F/K to answer type, we assumed an answer could only belong to one question.)

    • Responses table (contained when a survey was administered, a few other fields)

    • Choices table (F/K back to Responses table, and F/K to QuestionID & and AnswerID, this is table that stored each question/answer a user chose on a survey. Also text field for free form entry.)

    It sounds like you need to squeeze section into the middle of survey question like:

    • Forms table (each version would be stored here).
    • FormsToSection (I'm assuming a section can appear in more than one survey/form)
    • Section
    • SectionToQuestion
    • Question
    • Answer (if you want the same answer to multiple question insert a many to many table in between).

    On the frontend we pulled the survey(by url), and the questions/answers into an object model. Looped through the questions, based on the question type and answer type we displayed the appropriate question (or special instructions).

    foreach(Question q in Questions)
    {
    if (q.TypeID = QuestionType.RadioButton)
    {
    div tag: Show question <- use "id" attribute like this "QuestionID_qid"
    foreach(Answer a in q.Answers)
    radio button <- use "id" attribute like this "QuestionID_qid_aid"
    }
    if (q.TypeID = QuestionType.FreeForm)
    {
    div tag: Show question <- use "id" attribute like this "QuestionID_qid"
    textarea tag: <- used "id" attribute like this "QuestionID_qid", there is no answer id for text boxes in the above datamodel.
    }
    }
    NOTE: qid is QuestionID, aid is AnswerID(the actual values), you'll need these to parse on the middle tier to figure out what was chosen/typed in.

    NOTE2: We added in an admin section so we could edit the text of the questions/answers as well thing like order w/out having to roll SQL code for every change. That paid for itself in gold(no development effort, small QA effort, all on the BAs to work w/the biz unit).

    Hope this helps, but whatever you do, don't do EAV, it will end in tears. The only time I've seen it used was for a small setting table. For larger database designs EAV will end in tears.



  • @Weng said:

    I've done that, also have keyed document DBs to a relational DB.

    OK, I will try this...it is for learning and/or sample code for next job.



  • @Karla said:

    Since I still want supporting data and common data to be relational is a WTF to just store the different data in a column that is varchar as JSON?

    If you're going to store unstructured data, I'd suggest XML over JSON for the time being, given that SQL Server has much more support for it with respect to indexing, xquery, etc.

    [Now, if you want to wait for SQL Server 2016, it becomes a different ballgame...][1]
    [1]: http://blogs.msdn.com/b/jocapc/archive/2015/05/16/json-support-in-sql-server-2016.aspx



  • @c62 said:

    Responses table (contained when a survey was administered, a few other fields)

    I'm not understanding how you are storing the responses and then displaying them.

    The rest sounds like it mostly works for what we need.
    The funny thing is I also just did the db design for a very small survey (20 questions) app that is somewhat EAV but that was after discussing with manager and we did it because it was small and the app ONLY inserts a survey, no display/edit..



  • @Groaner said:

    Since I still want supporting data and common data to be relational is a WTF to just store the different data in a column that is varchar as JSON?

    If you're going to store unstructured data, I'd suggest XML over JSON for the time being, given that SQL Server has much more support for it with respect to indexing, xquery, etc.

    Yeah that is a good point. I don't like the verbosity of XML vs JSON.
    The JSON data wouldn't be what needs optimizing for search and reports.
    So I could do much of the work on data access layer.

    @Groaner said:

    Now, if you want to wait for SQL Server 2016, it becomes a different ballgame...

    Skimming the article, I can't see the expected release date.


  • Garbage Person

    I think SQL releases are usually late Q2. I haven't been paying attention to this time out, though- too much on the docket this year for a SQL version bump.

    There might be a preview floating around by not, and what are preview builds for if not résumé projects.



  • @Weng said:

    I think SQL releases are usually late Q2. I haven't been paying attention to this time out, though- too much on the docket this year for a SQL version bump.

    There might be a preview floating around by not, and what are preview builds for if not résumé projects.

    Exactly. I'll see what I can find.



  • Response table contains things like datetime when survey/form was taken by user something like this:

    Reponses

    • ResponseID (int)
    • SubmittedOn (datetime)
    • SurveyID/FormID (int F/K to Forms/Surveys table)
    • WhoTookTheForm (int F/K to some employee table so we know who wrote the record)
    • IsActiveRecord (bit, we don't do physical deletes, only logical ones)

    Choices table contains the specific question & answer a user entered

    Choices

    • ChoiceID (int)
    • ResponseID (int, F/K to Responses table)
    • QuestionID (int, F/K to Questions table)
    • AnswerID (int, F/K to Answers table)
    • TextField (varchar, this is when users can submit free text)


  • @c62 said:

    Choices

    ChoiceID (int)
    ResponseID (int, F/K to Responses table)
    QuestionID (int, F/K to Questions table)
    AnswerID (int, F/K to Answers table)
    TextField (varchar, this is when users can submit free text)

    Isn't that a modified EAV?

    Unless I am missing something that is similar to what I did on the very small survey app.

    Let me just state my understanding:
    Response to Choices is 1 -> n relationship. Each Choice has either an AnswerID or TextField.

    I guess since there is a variable number of rows in Choices it isn't much different in my mind that EAV.


  • Trolleybus Mechanic

    @Karla said:

    Isn't that a modified EAV?

    Yes. But EAV is taking that to the nth degree. To the point where you abstract the Choice and Answer to generic tables. You end up with just one table.

    The pattern c62 laid out is how I've always seen question/answer objects saved.

    Form (or Test or Whatever) (FormID, FormName, StartDate, EndDate)
    Question (QuestionID, FormID (fk), QuestionOrder, QuestionText, HintText, QuestionType (drop down, radio button, freetext, etc), QuestionAnswerMin, QuestionAnswerMax)
    Answer (AnswerID, QuestionID, AnswerOrder, IsCorrect, IsFreeText, IsForcedLastPosition)

    Response (ResponseID, UserID, FormID, Date)
    ResponseQuestionAnswer (ResponseAnswerID, ResponseID, QuestionID, AnswerID, FreeTextResponse)

    It isn't full-on-space-cowboy EAV. There's some fun outer joins to get a Form + Response based on FormID and UserID. But this will do everything you need. Expand on it with some extra tables for things like section breaks, Question subsets, etc.

    Getting the data out into columns requires a Pivot still, but that's what Reports are for.



  • EAV (as I understand it) is something like this

    Entity Attribute Value
    EmpoyeeID int 86
    FirstName string Tom
    HiredOn datetime 1/1/2016

    (I can't get the formatting right, sorry)

    The example above just uses basic datatypes, but you can also use more complex datatypes as well. The Attribute column may have a F/K to an attributes table. Even though the attribute says "int" or "datetime", the value column stores it as a string/varchar b/c anything can be put in there. The DB can't guarantee referential integrity at this point.

    • The entity: the item being described.
    • The attribute or parameter: a foreign key into a table of attribute definitions. At the very least, the attribute definitions table would contain the following columns: an attribute ID, attribute name, description, data type, and columns assisting input validation, e.g., maximum string length and regular expression, set of permissible values, etc.
    • The value of the attribute.


  • @Lorne_Kates said:

    Yes. But EAV is taking that to the nth degree. To the point where you abstract the Choice and Answer to generic tables. You end up with just one table.

    The pattern c62 laid out is how I've always seen question/answer objects saved.

    Form (or Test or Whatever) (FormID, FormName, StartDate, EndDate)Question (QuestionID, FormID (fk), QuestionOrder, QuestionText, HintText, QuestionType (drop down, radio button, freetext, etc), QuestionAnswerMin, QuestionAnswerMax)Answer (AnswerID, QuestionID, AnswerOrder, IsCorrect, IsFreeText, IsForcedLastPosition)

    Response (ResponseID, UserID, FormID, Date)ResponseQuestionAnswer (ResponseAnswerID, ResponseID, QuestionID, AnswerID, FreeTextResponse)

    This what I did with this small survey app. I did quickly learn DataZen to allow the users to some useful datametrics.

    @Lorne_Kates said:

    It isn't full-on-space-cowboy EAV. There's some fun outer joins to get a Form + Response based on FormID and UserID. But this will do everything you need. Expand on it with some extra tables for things like section breaks, Question subsets, etc.

    I was thinking even if this only a little EAV it has all the bad of full on EAV.

    Since it isn't full-on-space -cowboy I can mention it as experiences.


  • Trolleybus Mechanic

    @Karla said:

    This what I did with this small survey app.

    You done right.

    @Karla said:

    I was thinking even if this only a little EAV it has all the bad of full on EAV.

    That's like saying a little bit of Javascript is as bad as full on Discourse.



  • @c62 said:

    EAV (as I understand it) is something like this

    Entity Attribute ValueEmpoyeeID int 86FirstName string TomHiredOn datetime 1/1/2016

    (I can't get the formatting right, sorry)

    The example above just uses basic datatypes, but you can also use more complex datatypes as well. The Attribute column may have a F/K to an attributes table. Even though the attribute says "int" or "datetime", the value column stores it as a string/varchar b/c anything can be put in there. The DB can't guarantee referential integrity at this point.

    The entity: the item being described.
    The attribute or parameter: a foreign key into a table of attribute definitions. At the very least, the attribute definitions table would contain the following columns: an attribute ID, attribute name, description, data type, and columns assisting input validation, e.g., maximum string length and regular expression, set of permissible values, etc.
    The value of the attribute.

    Yes, in it's purest form...but I was resisting even partial EAV,

    I think I am going to start with admin portion that will drive what the answers table or XML/JSON will be stored.

    I did have a thought on my commute in term of the admin since each "version" was meant to be universal to being with.
    I am going to use Evaluation Types (in the previous system each had 1 evaluation type and many groups that wanted modified versions of it. But all groups should get the default layout until they customize there version)

    So all questions from both database combine for the new question table. I am going to start saying "evaluation type" which starts with standard evaluation of version 1 and 2. Then each group can accept the standard or that group will copy and modify the default *(or one of another group that is likely to have similar questions),

    So instead of 2 versions and 20 customized by group...we will have up to 40 evaluation type, a new group by default is assigned the default version. Which then can be modified. Groups to evaluation types will be one to many. and each type gets an active start and end date.


  • Trolleybus Mechanic

    @c62 said:

    EAV (as I understand it) is something like this

    It's even worse than that.

    Imagine a table: EMPLOYEE (ID, firstname, lastname, HiredDate, FiredDate)

    And a record: (1, Lorne, Kates, 2010-01-01, null)

    Now instead imagine this:

    Table ENTITY (EntityID, EntityName)
    Table ATTRIBUTE (AttributeID, AttributeName, AttributeDataType)
    Table ENTITY_ATTRIBUTE (EntityID, AttributeID) -- Definition of all attributes that can belong to an entity
    Table VALUE (EntitiyID, AttributeID, Value) -- You might instead of ValueVarchar, ValueInt, ValueBool, ValueDateTime, ValueDouble, etc)

    So your one table is now four tables. Your one record is now many:

    ENTITY (1, 'Employee')
    ATTRIBUTE (1, ID, int)
    ATTRIBUTE (2, 'First Name', 'varchar')
    ATTRIBUTE (3, 'Last Name', 'varchar')
    ATTRIBUTE (4, 'HiredDate', 'DateTime')
    ATTRIBUTE (5, 'FiredDate', 'DateTime')
    
    ENTITY_ATTRIBUTE(1, 1)
    ENTITY_ATTRIBUTE(1, 2)
    ENTITY_ATTRIBUTE(1, 3)
    ENTITY_ATTRIBUTE(1, 4)
    ENTITY_ATTRIBUTE(1, 5)
    
    VALUE(1, 1, 1)
    VALUE(1, 2, 'Lorne')
    VALUE(1, 3, 'Kates')
    VALUE(1, 4, '2010-01-01')
    VALUE(1, 5, null) -- Or this may be omitted, depending on your level on insanity
    

    Now try to write SELECT * FROM Employees WHERE EmployeeID = 1;

    That's full-on spacecowboy EAV-- where you end up reimplementing the database in the database.



  • @Lorne_Kates said:

    I was thinking even if this only a little EAV it has all the bad of full on EAV.

    That's like saying a little bit of Javascript is as bad as full on Discourse.

    LOL



  • @Lorne_Kates said:

    Now try to write SELECT * FROM Employees WHERE EmployeeID = 1;

    That's full-on spacecowboy EAV-- where you end up reimplementing the database in the database.

    Do not want.



  • @Lorne_Kates said:

    This what I did with this small survey app.

    You done right.

    Thank you. 😄



  • @Lorne_Kates , your example & explanation was far better than mine.



  • https://what.thedailywtf.com/t/alternatives-to-eav-anti-pattern/54210/19?u=swayde
    That was a fairly good response.
    Essentially add a versions table, keep track of the version number. Then on a new version count one up, and add a new version.



  • Thanks everyone.


  • ♿ (Parody)

    @Karla said:

    Yes, in it's purest form...but I was resisting even partial EAV,

    EAV is shoving unrelated things into the same table. Having all of the questions in a table and all of the answers in a table is not EAV. It's just how a relational DB works.



  • People who design full-on spacecowboy EAV databases need to be taught a lesson, with a cluebat.


  • Winner of the 2016 Presidential Election

    @AlexMedia said:

    People who design full-on spacecowboy EAV databases need to be taught a lesson, with a cluebat.

    Please start with the Magento developers. I had to work with their shitty software for years.



  • @AlexMedia said:

    People who design full-on spacecowboy EAV databases need to be taught a lesson, with a cluebat.

    Every knows you never go full spacecowboy EAV!

    Edit Discurse!



  • @boomzilla said:

    Yes, in it's purest form...but I was resisting even partial EAV,

    EAV is shoving unrelated things into the same table. Having all of the questions in a table and all of the answers in a table is not EAV. It's just how a relational DB works.

    Yes, apparently I did not fully understand. Thank you for pointing it out.


Log in to reply