Database design question



  • We have 2 products at work that need to be globalized to allow things like GUIs in foreign languages and currency etc.

     Today, we had a bit of a friendly debate on how to do it, and in the end, the two apps decided to each do it their own way.  Overall, that's not a big issue, because they are 2 distinct apps developed by 2 distinct teams and ne'er the two shall meet.  But after our debate I started wondering if one way was clearly better or not, and if so, why.  I'm no DB expert by any stretch of the imagination, so I was hoping you guys could help shed light.

     App #1

    A single string table that contains all strings in all languages.  Strings are identified by a string ID and a language ID (e.g. 1 | 1 | red and 1 | 2 | rojo) .  At runtime this is unioned with itself to create a table, for example, of all Spanish strings and all English strings where Spanish doesn't exist.  This table is then linked every time a dropdown needs to be populated, or an output needs to be created.  In cases where selection criteria is based on a specific string, a constant is defined in code, such as StringIdConductorBar = 1234 (to avoid those 'magic numbers') For labels resource files are used localized for each required language.

     

    App #2

    A "label string" table exists where every string that is used as a label on screen is stored.  Much like the bigger string table above, this is keyed off of string ID and language ID.  However, for dropdowns, an individual table is created for each kind of dropdown, for example "organizational roles" will have an "organizational role" table.  The fields will be RoleId, RoleName, Description (string)  This description is a borderline comment for the developers like "remember this role is used only by people in germany" or something similar. Then, a separate Organizational Strings table will exist and it will have the RoleId (as string ID) Language ID and Description (string). This description field would be what is actually displayed on the screen.  The RoleName would be a human readble value.  Such that RoleId 2 is a purchasing officer, so the organizational roles table might look like: [2 | purchasing_officer | employee in charge of purchasing ] then the organizational strings table will look like [2 | 1 | purchasing] (where 1 = english, for example.)   The thought being, that code would then look like "if role = purchasing_officer blah blah blah"  this was an attempt to avoid code that looked like "if role = 2" The argument here, was that a constant could be defined in code (eg const int purchasing_chief = 2) and eliminate the "organizational roles" table, but the counter was that you're maintaining a mapping in 2 places (1 in code where you point a string id to a human readable name, and one in code where you map a string ID to an actual description) and even though the datbase is going to maintain the same mapping with 2 tables, at least you didn't have to change code and data.  Although, it was pointed out, if new roles are created, code will still have to be changed to incorporate the new role in selection logic.

     

    It would have been nice if the apps could have agreed, but since they share no globalization data (only data they share is customer contacts) it wasn't a requirement.  Also, if it makes any difference, App #1 is a windows app and app #2 is a web app.

     

    The parties have both already made up their mind, so I'm not trying to prove my side right, nor their side wrong.  Nor am I trying to build a case for the "losing" side to convert.  But since I have little database experience, I wanted to see what you guys think 

     Is one idea better?  do both suck?  If one is worse, WHY? I'm asking here, because obviously both sides at work can defend theirs and attack the other, but thought maybe a neutral audience could help me learn :)

     Thanks

    Nate
     



  • You probably need both.

    The mechanism described in #1 is necessary to translate the static parts of the application, i.e. the GUI, reports. But it doesn't work out well for semi-dynamic data (which is not absolutely fixed but doesn't change regulary either) like the mentioned role names, where #2 is more appropriate.

    But #1, as simple as you have described it, probably won't work well either. That's because one word in English may have different translations in another language, depending on context. For example, the translation of "code" in English could be "Code" (e.g. secret code) or "Nummer" (e.g. product code) or "Vorschrift" (code of conduct) or even another word in German, but those words are not interchangeable. To make it even worse, the translation might take much more space in another language, so you might have to abbreviate it in some places. What you need is the ability to have a different translation for every occurance of that word or term. On the other hand, since in many cases the same translation is appropriate, the application used by the translator must make it easy to use one translation for many or all places.



  •  Mainly developing in .net, I've normally taken the approach of using the .net globalisation features (e.g. using <FONT class="" face=fmisspellt spellChecked="true">satellite</FONT> assemblies containing string <FONT class="" face=fmisspellt spellChecked="true">resources</FONT> etc)

     I've often also taken the approach of keeping any 'translation' out of the database and instead using user defined types, or <FONT class="" face=fmisspellt spellChecked="true">separate</FONT> tables for things such as status fields, or 'type' fields (e.g.  

    #3       type <FONT class=misspellet face=fmisspellt spellChecked="true">OrderStatus</FONT> = {SUB, <FONT class=misspellet face=fmisspellt spellChecked="true">PRC</FONT>, ACC, <FONT class=misspellet face=fmisspellt spellChecked="true">REJ</FONT>}   or a table in the form of 

    <FONT class=misspellet face=fmisspellt spellChecked="true">relvar</FONT> <FONT class=misspellet face=fmisspellt spellChecked="true">OrderStatus</FONT> heading {<FONT class=misspellet face=fmisspellt spellChecked="true">orderStatus</FONT> char[3]} <FONT class=misspellet face=fmisspellt spellChecked="true">pk</FONT> {<FONT class=misspellet face=fmisspellt spellChecked="true">orderStatus</FONT>} = relation {<FONT class=misspellet face=fmisspellt spellChecked="true">tuple</FONT>{'SUB'}, <FONT class=misspellet face=fmisspellt spellChecked="true">tuple</FONT>{ 'PRC'} .....}   .... Both these allow the code to look like  '  WHERE ORDER STATUS = 'REJ', which I think is nicer than 'WHERE ORDER STATUS = 2' 

     I then deal with translation in the application GUI, I quite like this approach and would advocate it if the situation is right (IE, you have a single application technology being used)....this has just <FONT class="" face=fmisspellt spellChecked="true">happened</FONT> to work quite <FONT class="" face=fmisspellt spellChecked="true">nicely</FONT> with the past couple of apps I've worked on ... I have seen many apps that use a hybrid of all three, as well as some very elaborate versions of #1

    A friend of mine works on a boxed product, which does have multiple clients version using a range of technologies from .net mobile framework running apps on PDA's, to thick client, COM+ / <FONT class="" face=fmisspellt spellChecked="true"><FONT class="" face=fmisspellt spellChecked="true"><FONT class="" face=fmisspellt spellChecked="true"><FONT class=misspellet face=fmisspellt spellChecked="true">VB6</FONT>,</FONT> an <FONT class=misspellet face=fmisspellt spellChecked="true">asp.net</FONT> front end as well as an older unix green screen front end (which <FONT class="" face=fmisspellt spellChecked="true">apparently</FONT> many client still like) ... With the hybrid of technologies they couldn't take advantage (easily) of any globalisation framework, so run a hybrid similar to that of #1 and #2, along</FONT> with </FONT>a 'localisation' editor, so customers can <FONT class="" face=fmisspellt spellChecked="true">customise</FONT> labels themselves.

     


     



  • You could always take the approach used by the system I've just inherited, which has to work in English, German and Swedish.

    Much of the presentation (it's a website) is done through XSLs. Each language has its own directory for these files, and the files are duplicated and then translated into the different languages. There are also duplicated js files full of text to display in alerts. Now there's a nice example of how not to do it...

    The other site I work on is pretty much like your #1. All the text for the site is in the DB, and a function is available to pull it out based on a languageID and a key. Far more sensible.



  • @valerion said:

    Much of the presentation (it's a website) is done through XSLs. Each language has its own directory for these files, and the files are duplicated and then translated into the different languages. There are also duplicated js files full of text to display in alerts. Now there's a nice example of how not to do it...

    Ohh eck, you've just brought back some very very painfull memories... you'd be suprised at the position on the web of some companies (i.e they should no better as they've been doing this a looong time in web terms) I've worked at who did that (and in one case, building a whole new 'inner effect platform' right this moment that handles globalisation in the way you described.. I left)...



  • Well thanks for all the answers.  I'll look at some of the methods mentioned here for later projects.  At this point in the design cycle it's past the point of no return for globalization. We've got about 5 months left until we go live with both apps, and I'd say we have about 6 months of work on both without redesigning anything :)

     



  • I agree with ammoQ that you probably need both mechanisms, although I choose to embed my static translations in the application resource files rather than the DB.  I use .Net so this is relatively trivial to accomplish. My reporting solution is also.Net based so I can reuse the same resource files. 

    Dynamic data is another story altogether, because that's typically user entered.   I haven't done a lot of that in terms of OLTP but, SAP which I've worked with typically does this via a secondary description/long text table keyed of the internal object identifier and language.  Whether it is one single table or not, I don't recall.

    From a data warehouse perspective, all the best practices I've seen is to have separate column sets on the dimension tables corresponding to the language.  For example, the date dimension would look like:

    DimDates
    Id int
    DateKey datetime
    EnglishDayName nvarchar(20)
    EnglishMonthName nvarchar(20)
    FrenchDayName nvarchar(20)
    FrenchMonthName nvarchar(20)
     


  • @Grovesy said:

    @valerion said:

    Much of the presentation (it's a website) is done through XSLs. Each language has its own directory for these files, and the files are duplicated and then translated into the different languages. There are also duplicated js files full of text to display in alerts. Now there's a nice example of how not to do it...

    Ohh eck, you've just brought back some very very painfull memories... you'd be suprised at the position on the web of some companies (i.e they should no better as they've been doing this a looong time in web terms) I've worked at who did that (and in one case, building a whole new 'inner effect platform' right this moment that handles globalisation in the way you described.. I left)...

     

    At least for you they're just painful memories, not a painful reality... 



  • no was meant to be know... must have had a brain freeze there...



  • I guess my question would be "why is this a database problem"?   Apps are localized all the time and do it with resource files, or satellite assemblies, etc.   Making extra calls against the DB in order to just display the GUI is a heavyweight solution to a lightweight problem.

    -cw



  • @CodeWhisperer said:

    I guess my question would be "why is this a database problem"?  

     

    Convenience. Developers are used to store things in the database. Tasks like making the frontend for the translator become simple standard tasks when the i18n data is in the database like everything else. 



  • I would argue, if you were using a modern framework like .net (I'm sure this applies to Java as well?).. It is far easier, <FONT class="" face=fmisspellt spellChecked="true">convienent and standard</FONT> to store your localisations in the resources. This gives you a consistent way of coding throughout your application. The process for setting a label on a form is the same as if you formatting numbers and dates for the current windows locale.

    I do buy the argument though, that many <FONT class="" face=fmisspellt spellChecked="true">developers</FONT> are much more used to storing localisation information in the database (it is a valid approach for many situation) as for many of us <FONT class="" face=fmisspellt spellChecked="true"><FONT class=misspellet face=fmisspellt spellChecked="true">VB6</FONT>/COM+/classic ASP developers who have moved over to .net it used to be the easiest option to do. </FONT>

    On the note of is localisation is a databases problem, I've <FONT class="" face=fmisspellt spellChecked="true">always</FONT> thought that if you are going to do localisation, then it should  be kept away from the actual 'main database', in my experience I've found databases far out live applications (case in example, a large UK bank that uses <FONT class=misspellet face=fmisspellt spellChecked="true">DB2</FONT> for it's pension database, it has a sound database schema modelled around pensions and pensions really haven't moved along much since it was built 20 years ago. Application side, there's been multiple incarnations of COBOL, <FONT class=misspellet face=fmisspellt spellChecked="true">C</FONT>, <FONT class=misspellet face=fmisspellt spellChecked="true">C</FONT>++, SmallTalk, Java, <FONT class=misspellet face=fmisspellt spellChecked="true">VB6</FONT> and .net apps talking to and processing this database).

    Anyhow, what we are localising is the application a users uses to view and manipulate the data, so to move localisation into the database ties the database to an incarnation of the application using it. Often, other applications start to use the database and have <FONT class="" face=fmisspellt spellChecked="true">their</FONT> own localisation <FONT class="" face=fmisspellt spellChecked="true">requirements</FONT> which need <FONT class="" face=fmisspellt spellChecked="true">different</FONT> translations  for example <FONT class=misspellet face=fmisspellt spellChecked="true">app#1</FONT> could be a back office internal <FONT class=misspellet face=fmisspellt spellChecked="true">VB6</FONT> <FONT class="" face=fmisspellt spellChecked="true">win forms</FONT> app <FONT class="" face=fmisspellt spellChecked="true">and the <FONT class=misspellet face=fmisspellt>app#2</FONT> a customer facing web portal... </FONT>

    The scenario often then continues, the old <FONT class=misspellet face=fmisspellt spellChecked="true">VB6</FONT> <FONT class=misspellet face=fmisspellt spellChecked="true">winforms</FONT> application is <FONT class="" face=fmisspellt spellChecked="true">retired</FONT> and replaced with an internal intranet site (<FONT class=misspellet face=fmisspellt spellChecked="true">app#3</FONT>), both have to run <FONT class="" face=fmisspellt spellChecked="true">simultaneously for a period of time</FONT> and the database is further hacked around to support all three.

    This problem just isn't about application localisation, it's generally about allowing the applications that use the database to 'pollute' it's schema and cause a general <FONT class="" face=fmisspellt spellChecked="true"><FONT class="" face=fmisspellt spellChecked="true">maintenance</FONT> </FONT>headache in years time.. people come, people go people forget what entire sets of tables do (which could long be redundant) and we end up with the database <FONT class="" face=fmisspellt>equivalent</FONT> of a 'big ball of mud'

     





  • Well, can't speak for the web app, it's a CF app (and I'm not a web guy).  The windows app is in .Net.   That being said, for GUI stuff such as labels, message boxes, etc, the .Net resources are being used. Where the database comes in, is the drop down.  It's a quotation and configuration tool, so we might have a drop down that lists types of metal used for the given conductor.  E.g. 100A - Copper | 200A - Stainless Steel | 200A - Aluminum | 300A - Copper etc.  These dropdowns are based on previous input parameters.  So what happens is if the customer says he needs a system that can handle 75A, we'll pull all the parts information for all the conductors that can handle 75A.  This includes a description.  It is this description that is in the datbase.

    In this case, it is kept in the database because all the other parts information is in the database.  It would not make a lot of sense to define a part in a database, and then look that description up in a resource table (2 look ups for a single part instead of 1). 

    Other globalization information, such as currency and whether a region is imperial or metric are handled by the .Net framework.

     One poster mentioned something about the possibility of needing several tables because of the various translations of Code.  That doesn't seem to be a problem here, as the language involved in this application is specific to the market we're working in.  It's a somewhat technical jargon.  So "Condcutor Bar" will always be translated in context so that it doesn't come out meaning "A building where alcohol is served to people who lead music" :)  But if this was a more vague, or general application, I can see the danger you are talking about.



  • @ammoQ said:

    Convenience. Developers are used to store things in the database. Tasks like making the frontend for the translator become simple standard tasks when the i18n data is in the database like everything else. 
     

    That made me wonder what would happen when the app can't connect to the database. Would the user get one of the fabulous "translation string not found [ok][cancel]" popups? :)

    Then I saw that the .NET localization methods are used for that. Thank god :) 



  • We can avoid all translation issues if we move away from words and toward culturally-unbiased images. Instead of an error saying "File not found", you have a picture of a little gender non-specific humanoid figure with its hands raised in defeat. Localization would be nothing more than a slider to pick how pink or brown the figure is (or whether it is sitting in a wheelchair for accessibility). I don't see how this could possibly go wrong.



  • @Welbog said:

    a little gender non-specific humanoid figure with its hands raised in defeat
     

    That might be fine for everywhere but France, where defeat is the cultural heritage and is a sensitive issue. And what happens when we join the Galactic Federation, where there'll undoubtedly be species where raised hands is the equivalent of "your mother was a hamster and your father smells of elderberries"? 

    The obvious solution is to become a military ultra-super-mega-power, conquer the entire Universe, and eliminate all languages except English. Then we can get down to the real business of arguing if we should support Canadian, British, Australian, New Zealand, and American versions of "English". You say color, I say colour.



  • @MarcB said:

    @Welbog said:

    a little gender non-specific humanoid figure with its hands raised in defeat
     

    That might be fine for everywhere but France, where defeat is the cultural heritage and is a sensitive issue. And what happens when we join the Galactic Federation, where there'll undoubtedly be species where raised hands is the equivalent of "your mother was a hamster and your father smells of elderberries"? 

    The obvious solution is to become a military ultra-super-mega-power, conquer the entire Universe, and eliminate all languages except English. Then we can get down to the real business of arguing if we should support Canadian, British, Australian, New Zealand, and American versions of "English". You say color, I say colour.

    That's the stupidest idea ever, bless your heart.

     /obviously I support the Southern US dialect :)


Log in to reply