@RTapeLoadingError said:
The apps we deal with use lookups and this works well in general as long as sufficient planning and modelling occurs when the system is set up. The environments also lend themselves to fairly static 'categories' so there's not a massive admin overhead.
In my experience, planning is overrated. Ask any cook, and he'll tell you that the best chef is not the guy with the best recipes, but the guy who can make something tasty with whatever is available in the fridge. Huge ERP vendors have spent zillions of dollars on planning and design, but with most of those products, the users have to change their habits and adapt to the system or the implementation will fail. And guess what is the last thing a user wants? Yeah, change.
Now ask any ERP vendor and they will tell you that their system can be completely customized. And this is true. The catch is that customization is not something that can be easily applied to a single system used by five thousand users. So while each customer can definitely get something custom-made, they still have to cater to the needs of many groups and departments with conflicting priorities. Back to square one.
As for having just a few static categories, this is always the case at first. Then someone needs to filter some stuff, and they start to shove metadata in whatever textbox they can find (just look at SRV records in a Windows Domain DNS). And pretty soon, they use that textbox as a subcategory, or a type, or a class, or a label, or whatever - and they do that without consulting other departments first. So before you notice it, there are tons of data already polluted with multiple schemes of subclassification, and it's a different beast when you cannot go back to a blank drawing board. It's like becoming mayor of Detroit or president of Haiti - you inherit a lot of problems that could have been avoided but it's too late and you have to deal with it.
My advice: always fight the urge to see categories as a property of something (1:N). Categories are relationships (N:N). As long as this basic design is respected, the sky is the limit, patch-wise; you can always correlate, filter, join your way back to sanity.
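To make the N:N point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (item, category, item_category) and the sample rows are made up for illustration; they are not the schema of the ERP discussed in this thread.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database where categories are a relationship (N:N)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE category (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
        -- The link table is the whole point: an item can carry many
        -- categories, and a category can apply to many items.
        CREATE TABLE item_category (
            item_id INTEGER NOT NULL REFERENCES item(id),
            category_id INTEGER NOT NULL REFERENCES category(id),
            PRIMARY KEY (item_id, category_id)
        );
        """
    )
    # Hypothetical sample data.
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(1, "ThinkPad T420"), (2, "Dell U2412 monitor")],
    )
    conn.executemany(
        "INSERT INTO category VALUES (?, ?)",
        [(1, "Laptop"), (2, "IT equipment"), (3, "Leased")],
    )
    conn.executemany(
        "INSERT INTO item_category VALUES (?, ?)",
        [(1, 1), (1, 2), (1, 3), (2, 2)],
    )
    return conn


def items_in(conn, label):
    """Return the names of all items linked to the given category label."""
    rows = conn.execute(
        """
        SELECT i.name
        FROM item AS i
        JOIN item_category AS ic ON ic.item_id = i.id
        JOIN category AS c ON c.id = ic.category_id
        WHERE c.label = ?
        ORDER BY i.name
        """,
        (label,),
    )
    return [name for (name,) in rows]
```

With this layout, a department that needs its own label (say "Leased") adds rows to the link table instead of hijacking a textbox, and items_in(conn, "Leased") still answers with a plain join.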
@RTapeLoadingError said:
Serious questions
- Are you suggesting that a "free text with search" option would be a good fit for your environment?
- If so, do you think that there would be a barrier to people searching for an existing category that fits because it's easier to just type "Laptot" and be done with it? Are there positive/negative implications for the end-user for miscategorising items?
- How would you go about making it as easy for the user to do the right thing as it is to do what they currently do?
As someone who has been part of many data migrations from free text systems to lookup table systems I'd be interested to hear your ideas.
The situation I described occurs in a big ERP where countless people log on to the system to update the inventory. Unfortunately, there is no dropdown list for them to select a category, only a "search" textbox which returns exact matches only (no wildcard, binary collation so case sensitive). When there is no match, the user can click on "Create" to add a category. This leads to the quality of data shown in my original post.
This situation is out of control, and many workarounds have been tried over the last few years, such as maintaining an Excel file with categories, or having just a few "power users" do the data entry. None of this worked. What I came up with was a simple scorecard showing the popularity of the various categories in the system. Now the users have to look up a category on a web page before creating it, and if their category has a low score, a crude soundex match is done to show similar but more popular categories. This won't really fix the mess but at least it will slow down the progression while the power users are using another web page to improve data quality. On this web page a matcher uses the same strategy (show the least popular categories that have a soundex match with a more popular category) and the power users can merge categories visually, fixing hundreds of records at a time. Unfortunately, like any batch solution, this tool will be faced with a hockey stick kind of ROI graph, making it more and more time-consuming to fix less and less data, but that's the nature of the beast. 80/20 and all that.
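For the curious, the scorecard lookup boils down to something like the sketch below. This is not the production code: the function names, the min_score threshold and the sample counts are all assumptions for illustration, and the soundex is the plain textbook American variant.

```python
import string

# Textbook American Soundex letter groups.
_CODES = {}
for _digit, _letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex key of word, or "" if it has no letters."""
    letters = [ch for ch in word.lower() if ch in string.ascii_lowercase]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept after it
    return (result + "000")[:4]


def suggest(candidate, usage, min_score=10):
    """Suggest more popular categories that sound like candidate.

    usage maps a category name to the number of records using it.
    min_score is a hypothetical cut-off: at or above it, the category is
    considered mainstream and no suggestion is made.
    """
    score = usage.get(candidate, 0)
    if score >= min_score:
        return []
    key = soundex(candidate)
    matches = [
        (name, count)
        for name, count in usage.items()
        if name != candidate and count > score and soundex(name) == key
    ]
    # Most popular first.
    return sorted(matches, key=lambda match: -match[1])
```

Given a made-up scorecard such as {"Laptop": 950, "laptop": 40, "Labtop": 3}, calling suggest("Labtop", usage) lists "Laptop" first and "laptop" second. Note how crude soundex really is: "Laptot" gets the key L133 while "Laptop" gets L131, so that particular typo would slip through. An edit-distance matcher would catch it, at the cost of more noise.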
As for the user incentive, at first that was an issue because all users have the same logon in the ERP (another WTF), making it difficult to find who did what. The solution I found was to pull data quality metrics and slice them by date/hour based on the audit field. This allows managers to view on a daily basis how the inventory is doing, and some of them have been using this info to create performance dashboards. These metrics can also track what proportion of the data entry has been previously validated via the web page lookup (where individual logons are used).
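The slicing itself is nothing fancy. A rough sketch, assuming each record exposes an ISO-formatted audit timestamp and a flag saying whether the category went through the web page lookup first (both are hypothetical stand-ins for the real audit field and validation log):

```python
from collections import defaultdict
from datetime import datetime


def validated_share_by_hour(rows):
    """Return {hour bucket: share of entries validated beforehand}.

    rows is an iterable of (audit_timestamp, was_validated) pairs, where
    audit_timestamp is an ISO 8601 string such as "2011-03-01T09:15:00".
    """
    totals = defaultdict(lambda: [0, 0])  # bucket -> [validated, total]
    for stamp, was_validated in rows:
        bucket = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        totals[bucket][1] += 1
        if was_validated:
            totals[bucket][0] += 1
    return {
        bucket: validated / total
        for bucket, (validated, total) in sorted(totals.items())
    }
```

Since everybody shares one logon, the hour bucket is the closest thing to a "who" that the ERP data offers; a manager who knows the shift schedule can take it from there.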
So to answer your questions: in an organization where there is no taxonomy authority, yes, a free text with search would be a good solution, as long as the results can be quantified. A free-form entry with a good matcher will at least let the user see if his entry is mainstream or completely unique. It's like using Google to do a quick spell-check - most of the time the number of results (and the auto-complete) will provide you with the correct spelling. And what is great with that kind of solution is that it lets you build peer pressure artificially - users are lazy and will always try to find the easy way, but at the office nobody wants to stick out from the crowd.
Now a disclaimer: a lot of this jibes with the theory behind NoSQL products, such as Cassandra. But IRL there are huge problems associated with this kind of technology (e.g. Digg, Reddit) so it's not a silver bullet.