Entity-Attribute-Value model



  •  Suppose I'm building a kind of social-networking site where users can enter for instance:

         favorite food: pizza

         hair color: black

    i.e. any attribute and any value.

    Here are the designs I am considering:

    Design 1:  Find out all the attributes in advance and design a proper schema for this.  This would allow me to query for instance how many people like Italian food, but if a user wants to add "Last book read: SQL and Relational Theory by C.J. Date" and I don't have such an attribute, he can't do it.

    Design 2:  If the user adds "Last book read:  Antipatterns" then just ALTER TABLE and ... uh, no.

    Design 3:  Every night, redesign the schema properly to accomodate new attributes and allow new queries about books or whatever.

    Design 4:  Have one table with primary key = user and attributes "Attribute" and "Value".  Then just add the row ("Bill", "Last book read:", "Antipatterns").  My gut tells me this is useless for querying; my research tells me this is the notorious Entity-Attribute-Value antipattern.  I agree that it's no good trying to get any meaning out of a table like this.

    Is there any good theory behind the "Entity-Attribute-Value model" and should I study the Wikipedia article on it?  Or better yet, has someone somewhere done this the right way and can I do it that way, whatever it is?

    (I'm not actually building such a site, but I am looking for an appropriate model for something more or less similar to this.)

     

     



  • Hmmm, sounds like you're trying to make it too extensible. I would say run it in beta for a bit with a limited number of users (or limited number of additional fields specified) and see what sort of things people enter in for attributes. Tell them that's what you want them to do ahead of time, and try and get as many people as you can to use the initial model. Then, once you've collected a few hundred thousand or more entries in your EAV list, then go back and figure what people are using it for.

    However, anytime I hear people say "a social networking site" my gut reaction is "why? there's already facebook and myspace, just try to figure out how to make theirs better, either via app or what have you".

    Are you looking to help people find relations between each other based on, for instance, favorite-book=antipatterns, or are you looking to model populations based on inference from studies of subsets of those models? The first seems to stress that you need to specify what attributes are coming, meaning your Design 1:, and the second is useless, unless you're trying to model the behaviours of social networkers.

    Also, keep in mind that there's a few different types of people:

    1) will fill in every box given the opportunity. Will look for ones they don't have that others do. ([+]guilty)

    2) will fill in no box, unless mandatory. ([+] my wife)

    3) don't really care, unless it's important to them.

    Also, it sounds like you're looking to have either a huge number of intersection tables or child tables for a whole helluva lot of data, so what backend were you going to build this on? I would be curious what you end up choosing for your data model, so keep the thread posted whatever you decide. :D

    Lastly, before I go, what's your thought? Which do you think is best? (I like #1)



  • @newfweiler said:

    Design 1:  Find out all the attributes in advance and design a proper schema for this.  This would allow me to query for instance how many people like Italian food, but if a user wants to add "Last book read: SQL and Relational Theory by C.J. Date" and I don't have such an attribute, he can't do it.

    Too inflexabile


    Design 2:  If the user adds "Last book read:  Antipatterns" then just ALTER TABLE and ... uh, no.

    YUK. NEVER!


    Design 3:  Every night, redesign the schema properly to accomodate new attributes and allow new queries about books or whatever.

    Just as bad


    Design 4:  Have one table with primary key = user and attributes "Attribute" and "Value".  Then just add the row ("Bill", "Last book read:", "Antipatterns").  My gut tells me this is useless for querying; my research tells me this is the notorious Entity-Attribute-Value antipattern.  I agree that it's no good trying to get any meaning out of a table like this.

    Is there any good theory behind the "Entity-Attribute-Value model" and should I study the Wikipedia article on it?  Or better yet, has someone somewhere done this the right way and can I do it that way, whatever it is?

    (I'm not actually building such a site, but I am looking for an appropriate model for something more or less similar to this.)

    This is what you want. Why do you call it an antipattern? It is very common and powerfull enough for what you need. Here is a  basic  idea (GODFUCKINGDAMNIT I AM GOING TO KILL THIS MOTHERFUCKING WYSIWYG)

    User Table

    Contains user info that is shared between all users

    Users Profile

    uint - user_Id

    varchar(64) - attribute_name

    text/blob - attribute_value

     

    So then you just insert a new row into users_profile as such:

    (<user id>, "Last Book Read", "A book")

    Then query it like

    SELECT attribute_value, user_id FROM attributes WHERE attribute_name = "Last Book Read"

     

    You may want to also normize the attribute_name by running it though some basic text cleanup (so that "Last Book Read" is the same as "lastbookread" and so on)



  • I agree with NorseLaQuet, when you want to be dyanmic about what people can add as attributes you want a key value table. The "matching" you talk about can still be done, and you can use a few tricks to get people to enter the same values. Like for instance add autocompletion to those form elements so that if someone wants to enter a favorite food it will show pizza when they type "pi" in the value field.




  • @stratos said:

    <opinion>

     

    I agree with everything stratos said.  It certainly isn't an anti-pattern to have dynamic fields in your system.

     



  • @tster said:

    @stratos said:

    <opinion>

     

    I agree with everything stratos said.  It certainly isn't an anti-pattern to have dynamic fields in your system.

    It most certainly is an ani-pattern when implemented using a relational database.

    if 'favourite food' is all you want to look up you may get away with it.  Try writing the SQL for some more complex scenarios.  Also try it with real data complete with spelling mistakes and synonyms (best/favourite/favorite/sexyest/sexiest group/band/artist/musicians) .  You will find that this kind of user content is almost un-analysable using SQL.



  • .@LoztInSpace said:

    @tster said:

    @stratos said:

    <opinion>

     

    I agree with everything stratos said.  It certainly isn't an anti-pattern to have dynamic fields in your system.

    It most certainly is an ani-pattern when implemented using a relational database.

    if 'favourite food' is all you want to look up you may get away with it.  Try writing the SQL for some more complex scenarios.  Also try it with real data complete with spelling mistakes and synonyms (best/favourite/favorite/sexyest/sexiest group/band/artist/musicians) .  You will find that this kind of user content is almost un-analysable using SQL.

    You want to do too much with your database. The point is storing data.

    The problem of trying to get some structure into the names people give their attributes is certainly not something you would want to solve via SQL. 

    For instance you "could" solve the above by asking the user to classify their attribute into a group of predefined categories. Like "food". So that all attributes like "favorite food", "food i make for myself" "food i give to my dog" whatever, all get's put into a category called "food". and then to answer your question.

    [code]SELECT attribute, value, userid, category FROM user_profile WHERE category = 'food'[/code]



  • Your solution is correct, but at the expense of your previous argument.  Proper classification and structure is the difference between data and 'a disk full of stuff'.

    As for your 'dont do too much with your database comment' - I really don't know where to begin.  Data is everything to a business and needs to be properly modelled and stored to be useful.  Generally speaking most people are not capable of writing better ways of data access and manipulation than the clever guys and girls who wrote your database engine (with the possible exception of MySQL).  A bunch of app servers sucking XML stored in a blob into a bunch of Java/PHP scripts is not the way to do it.

     



  • @LoztInSpace said:

    Your solution is correct, but at the expense of your previous argument.  Proper classification and structure is the difference between data and 'a disk full of stuff'.

    As for your 'dont do too much with your database comment' - I really don't know where to begin.  Data is everything to a business and needs to be properly modelled and stored to be useful.  Generally speaking most people are not capable of writing better ways of data access and manipulation than the clever guys and girls who wrote your database engine (with the possible exception of MySQL).  A bunch of app servers sucking XML stored in a blob into a bunch of Java/PHP scripts is not the way to do it.

     

    Call me a new age hippie if you like, but I dislike the idea of putting businesslogic in my database. I prefer to just store data in the database and try to keep computations out of SQL, unless it's the most obvious place to do it. However i also believe that the structure of the database should follow from the needs of the application, and if that application needs users to have userdefined attributes with values then a key value table is the way to go. 

    Also about your last sentence. the 90ies just called, they say they want you back. Times are changing, tech is changing. The most common use case of databases isn't a monolithic data store anymore. Most websites don't have massive ammounts of data that needs to be stored in a relational model. Document based databases are going to take a large chunk of this market, and it is not bad design or bad practice or a anti-pattern or what will you, it's just the database tech adapting to the needs of todays applications.

     



  • We're not quite talking about the same thing.  When your entire business is the website (like this one), then you certainly have more flexibility especially in the short term.  However in many business, your data is more important than any individual app, of which there will be several.

    There are many businesses where a website is only one of many avenues of communication, and I would challenge your assertion that the most common use of databases is to serve web apps.  In real, existing and future business, stuff comes in from all over the place and without some central point that accurately reflects your world, it can get our of hand.  Sure you can write a web site that doesn't allow you to order a sofa with chrome teapot handles, or makes sure that you don't pick a car insurance policy to insure your house, but life isn't always just the web.  Orders come in from phone, fax, paper, feeds from agents & brokers etc etc from a variety of sources with a variety of bugs and quirks.  You need to manage this consistently so you can make assumptions later on about the quality of your data.  Over time, you need to manage sales, vendors, promotions, credit lines, orders, staff comissions as the business evolves.  The tech will also evolve, using new apps, new tech, new vendor systems, driven by lower cost and/or tech du jour.  The one thing that allows you to remain in business is your data and the facts and rules that go with it.  Even if disaster knocks out a few applications, doing business gets harder, but still doable, provided you still know whow your customers, suppliers are, have your orders etc etc.

    Anyway, despite the above, I agree that you will not always code all BL in the DB.  However there are certain things you should & must do to make sense of what you have.  You don't want teapot handle options on commissions, credit lines to sofas, delivery schdeules on favourite foods etc.  Yes, you can write yourself a big old central point that applies all your rules, keeps everything in order, manages all manner of things you will need, or you can take the time to learn a bit about databases and you'll find they usually do everything you need no matter what your applications are written in or how your business changes.

     



  • If it isn't meant to be searchable, just make it a big text area where people can enter whatever the like. (Actually, it's still easier to search that than an EAV schema)
    EAV is bad. EAV where users invent attribute names on the fly is really bad.
    One user writes "favorite food", the next one "favourite food", another one "food I like best" etc. Makes no fscking sense.



  • My gut instinct tells me that if people are required to enter their own attribute name and value, then they're going to stare blankly at the screen for a while before leaving it empty. The tried and true option is to just select a few attributes you think people might like to know, and then ask them to input their values.

    Basically, you're asking people to both ask a question about themselves, and answer it. It is far easier and more user friendly if YOU ask the question.


  • Discourse touched me in a no-no place

    @Scouse said:

    My gut instinct tells me that if people are required to enter their own attribute name and value, then they're going to stare blankly at the screen for a while before leaving it empty. The tried and true option is to just select a few attributes you think people might like to know, and then ask them to input their values.

    Basically, you're asking people to both ask a question about themselves, and answer it. It is far easier and more user friendly if YOU ask the question.

    Or there's the 3rd way - provide the attributes, but also offer an option to add some of their own.


  • @PJH said:

    Or there's the 3rd way - provide the attributes, but also offer an option to add some of their own.
     

    +1



  • @LoztInSpace said:

    It most certainly is an ani-pattern when implemented using a relational database.

    if 'favourite food' is all you want to look up you may get away with it.  Try writing the SQL for some more complex scenarios.  Also try it with real data complete with spelling mistakes and synonyms (best/favourite/favorite/sexyest/sexiest group/band/artist/musicians) .  You will find that this kind of user content is almost un-analysable using SQL.

     

    I didn't say to have the user enter the fields.  I would imagine it being more like an administrator can add a field easily on the fly without having to change the database or code or anything.  

     Another example would be lets say you get to specify your occupation.  Teacher, Trucker, Programmer, etc.  Each occupation comes with a set of attributes:

    Teacher => grade, subject, school, etc.

    Programmer => favorite language, favorite IDE,  etc.

    Then you want people to be able to request to add new occupations with a set of attributes.  And you want to be able to add attributes to a profession after the fact.

    As for your challange of writing the SQL for this stuff.  I have created a task tracking application which has dynamic fields for each project.  There is a full searching mechanism to search on any number of fields and it is done 100% in the SQL.



  • @LoztInSpace said:

    There are many businesses where a website is only one of many avenues of communication, and I would challenge your assertion that the most common use of databases is to serve web apps.  In real, existing and future business, stuff comes in from all over the place and without some central point that accurately reflects your world, it can get our of hand.  Sure you can write a web site that doesn't allow you to order a sofa with chrome teapot handles, or makes sure that you don't pick a car insurance policy to insure your house, but life isn't always just the web.  Orders come in from phone, fax, paper, feeds from agents & brokers etc etc from a variety of sources with a variety of bugs and quirks.

     

    In this case you could write a database layer that stores the data and then a service layer that accesses the database and does all the business logic.  Then all your apps and the website and everything would accesss the service layer.



  • I'd be tempted to add a delimited list as a column to the table listing the available careers, referencing primary keys on an attributes table.

    Attributes
    1 => grade
    2 => subject
    3 => school
    4 => favourite language
    5 => favourite IDE

    Occupations
    Teacher => 1,2,3
    Programmer => 4,5

    Then if you add new attributes, just add them to the attributes table, and append its number to any relevant occupations. It also gives has DRY advantages, if you also had "University Lecturer", for instance, you could re-use the subject attribute.

    I don't know database construction too well, so this could be completely the wrong way of doing this.



  • @tster said:

    As for your challange of writing the SQL for this stuff.  I have created a task tracking application which has dynamic fields for each project.  There is a full searching mechanism to search on any number of fields and it is done 100% in the SQL.

     

     I would be genuinely interested to see that, along with some examples and the query plans if you can.

    @tster said:

    In this case you could write a database layer that stores the data and then a service layer that accesses the database and does all the business logic.  Then all your apps and the website and everything would accesss the service layer.

    Well yes you could, and that's what you need to do for your individual applications, but the DB will do a hell of a lot of work for you if you let it, and you should when it is fundemental data model type stuff.

    @RoadieRich said:

    .....so this could be completely the wrong way of doing this.

    Yup.  Completely and utterly wrong.



  •  sigh, damn this forum software.  I just lost a bunch of shit I typed, and I'm too lazy to retype it.  Instead I'll just post the stored proc that does the filtering.  If it doesn't make any since, ask me about it and I'll try and explain what the various tables are.

     NOTE:  I've trimmed this down to the parts you will be interested in

    declare @project int<o:p></o:p>

    declare @filter int<o:p></o:p>

    set @filter = dbo.ACTIVE_FILTER(@person)<o:p></o:p>

    set @project = dbo.ACTIVE_PROJECT(@person)<o:p></o:p>

    <o:p> </o:p>

      select issue.Name, issue.IssueID, issue.StatusID, issue.SeverityTypeID, issue.AssignedTo, issue.OpenedBy<o:p></o:p>

      from Issue issue<o:p></o:p>

      where issue.ProjectID = @project and issue.deleted = 0 and<o:p></o:p>

      <snip><o:p></o:p>

      and --category<o:p></o:p>

      ( select count(*)<o:p></o:p>

        from FilterEnabledCategory<o:p></o:p>

        where FilterID = @filter ) = --the number of enabled categories<o:p></o:p>

      ( select count(*)<o:p></o:p>

        from Issue issue2<o:p></o:p>

        inner join CategoryItem2Issue items on issue2.IssueID = items.IssueID<o:p></o:p>

        inner join FilterCategoryItemMapping map on map.CategoryItemID = items.CategoryItemID<o:p></o:p>

        inner join CategoryItem catItem on catItem.CategoryItemID = items.CategoryItemID<o:p></o:p>

        inner join FilterEnabledCategory enabled on enabled.CategoryID = catItem.CategoryId<o:p></o:p>

        where issue2.deleted = 0 and issue2.ProjectID = @project and map.FilterID = @filter and issue2.IssueID = issue.IssueID And enabled.FilterID = @filter) <o:p></o:p>



  • @RoadieRich said:

    I'd be tempted to add a delimited list as a column to the table...

     

    You should have just stopped right there, because that's as far as anyone should need to read.



  • Thanks for posting the code.  Without looking at the execution plan I can't be sure but I can't help thinking though that when you compare it to a more relational model it's a bit over intensive.  Also I'm not sure how you do "this and that or (something and not somethingElse)".

     



  • @LoztInSpace said:

    Also I'm not sure how you do "this and that or (something and not somethingElse)".
    I don't


Log in to reply