Lack of Evidence



  • We have a system that stores a whole bunch of information. One of the tables stores an xml representation of this-is-what-happened. 

    This was sucking up a whole lot of disk space, so someone decided that we don't actually need the list of everything that happened. Instead, let's just store the list of everything that did NOT happen.

    Wait a minute, there are about 20K things we test for. Typically, only 2 or 3 of them actually occur. So now, in order to save about 1K of explanation, they want to store 20K rows of thing[i]-didn't-happen.

    When I explained that this will increase the storage consumption (whereas the purpose of the exercise is to decrease it), they answered: but we want this information!

    Why? You never wanted it before. If we do this, your monthly storage costs will skyrocket!

    Oh, we don't want that.

    Then this is probably not the best approach.

    Hmmm, maybe we need to think about this some more.

    Time for the old clue-bat...


  • Considered Harmful

    The list of what did happen and the list of what did not happen contain the same information, one is the inverse of the other, so either datum is sufficient to yield the answer to either question. You do have the list of what didn't happen, but you're storing it in a compressed format (XML! Hahaha. Sorry.).

    Seriously though, is there a more compact way to store the same data? XML does sound a bit wasteful.
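    The "inverse" point is just a set complement against the catalogue of things that can happen. A minimal sketch, assuming events are identified by integer IDs (the IDs, the 20,000 figure and the function names below are illustrative, not snoofle's actual schema), which also shows one compact encoding: a fixed-width bitmap with one bit per possible event.

```python
# Hypothetical sketch: integer IDs 0..N_EVENTS-1 stand in for the ~20K things tested for.
N_EVENTS = 20_000
ALL_EVENTS = frozenset(range(N_EVENTS))

def did_not_happen(happened):
    """Derive the negative list from the positive one (set complement)."""
    return set(ALL_EVENTS - set(happened))

def did_happen(not_happened):
    """The complement of the complement gives the positive list back."""
    return set(ALL_EVENTS - set(not_happened))

def to_bitmap(happened):
    """Pack an outcome into one bit per possible event (fixed size, 2500 bytes here)."""
    bits = bytearray((N_EVENTS + 7) // 8)
    for event_id in happened:
        bits[event_id // 8] |= 1 << (event_id % 8)
    return bytes(bits)

happened = {17, 4242, 19_999}            # the usual 2 or 3 events
not_happened = did_not_happen(happened)

print(len(happened), len(not_happened))       # 3 19997
print(did_happen(not_happened) == happened)   # True
print(len(to_bitmap(happened)))               # 2500
```

    Storing the 2 or 3 IDs that happened is the smallest option; the bitmap costs the same regardless of outcome. Either way, deriving one list from the other only works while the catalogue (`ALL_EVENTS` here) stays fixed.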



  • @snoofle said:

    We have a system that stores a whole bunch of information.

     Maybe I'm just having a bad day, but I found this line to be particularly amusing.



  • @joe.edwards said:

    Seriously though, is there a more compact way to store the same data? XML does sound a bit wasteful.
     

    <comment>
      <content>
        <paragraph>
          What ever can you mean?
        </paragraph>
      </content>
    </comment>



  • One of my programs had an issue: a ton of very small xml files, and their file sizes were noticeably smaller than the block size, so they were taking even more space than initially modeled.

    When dealing with large xml files, even the most basic compression can greatly reduce their size.
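    Both effects are easy to see in a few lines. A minimal sketch, assuming a 4 KiB filesystem block size (the real value depends on the volume) and a made-up, deliberately repetitive XML document; it uses Python's standard `zlib` module as the "most basic" compression. Whether to compress in the application or with the database's own features is not shown here.

```python
import math
import zlib

def allocated_size(file_size, block_size=4096):
    """Bytes consumed on disk if every file occupies whole blocks (4 KiB assumed)."""
    if file_size == 0:
        return 0
    return math.ceil(file_size / block_size) * block_size

# Made-up XML: the same tags repeated for 1000 hypothetical events.
xml = "<LoggedEventsNotHappening>" + "".join(
    f"<EventNotHappening><EventName>Event{i}</EventName>"
    "<EventNotHappeningStatus>True</EventNotHappeningStatus></EventNotHappening>"
    for i in range(1000)
) + "</LoggedEventsNotHappening>"

raw = xml.encode("utf-8")
packed = zlib.compress(raw, level=6)
restored = zlib.decompress(packed)      # lossless: identical bytes come back

print(len(raw), len(packed))            # the compressed copy is a small fraction of the raw size
print(restored == raw)                  # True
print(allocated_size(300))              # 4096: a 300-byte file still takes a whole block
```

    Repetitive markup is close to the best case for DEFLATE, so the exact ratio on real data will differ.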



  • @snoofle said:

    Time for the old clue-bat...

    Apply directly to the forehead!



  • All: my original proposal was to compress the xml in the db. It was rejected in favor of dropping the table and creating a utility that could reconstruct the xml on-the-fly. Unfortunately, not all the data needed to do that is stored anywhere. To that end, they started "designing" away, and have come up with requirements that would end up generating more than 10 times the data (and by extension, space) previously required to store the xml, and would also require doing lots of the boolean logic twice, effectively doubling the run time.

    I am going to put a stop to this lunacy shortly.

     



  • @joe.edwards said:

    The list of what did happen and the list of what did not happen contain the same information, one is the inverse of the other, so either datum is sufficient to yield the answer to either question.
     

    They are inverse now, but what if in the future the list of what *could* happen changes?

     


  • Considered Harmful

    @Evilweasil said:

    @snoofle said:

    Time for the old clue-bat...

    Apply directly to the forehead!

    Or the MK19.

  • Considered Harmful

    @sprained said:

    They are inverse now, but what if in the future the list of what could happen changes?

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.



  • Need to give a nasty example of what could logically happen if you have to log everything that didn't happen:

    (Praying to the gods^h^h^h evil daemons of CS that the formatting comes out ok...)

    <LoggedEventsNotHappening>
       <EventNotHappening>
          <EventName>MonkeysFlewOuttaMyButt</EventName>
          <EventNotHappeningStatus>True</EventNotHappeningStatus>
       </EventNotHappening>
       <EventNotHappening>
          <EventName>SunWentSuperNova</EventName>
          <EventNotHappeningStatus>True</EventNotHappeningStatus>
       </EventNotHappening>
       <EventNotHappening>
          <EventName>TwoPlusTwoEqualsFive</EventName>
          <EventNotHappeningStatus>True</EventNotHappeningStatus>
       </EventNotHappening>
       <EventNotHappening>
          <EventName>RainInSpainOccurringPredominantlyOutsideThePlainTrue</EventName>
          <EventNotHappeningStatus>True</EventNotHappeningStatus>
       </EventNotHappening>
       <EventNotHappening>
          <EventName>SnoofleCompanyGetAClue</EventName>
          <EventNotHappeningStatus>False</EventNotHappeningStatus>
       </EventNotHappening>
    </LoggedEventsNotHappening>



  • @Anketam said:

    …file sizes were noticeably smaller than the block size, so they were taking even more space than initially modeled.


    They need to start storing those files on a volume with a decent file system. For example, NTFS.



  • @havokk said:

    They need to start storing those files on a volume with a decent file system. For example, NTFS.

    I believed you until that last word.



  • @cdosrun said:

    @snoofle said:

    We have a system that stores a whole bunch of information.

     Maybe I'm just having a bad day, but I found this line to be particularly amusing.

    You have to kind of know snoofle's posts for that one. He means that the number of rows has like nine or so digits. When he says a whole bunch, he actually means a LOT.


  • @toon said:

    @cdosrun said:

    @snoofle said:

    We have a system that stores a whole bunch of information.

     Maybe I'm just having a bad day, but I found this line to be particularly amusing.

    You have to kind of know snoofle's posts for that one. He means that the number of rows has like nine or so digits. When he says a whole bunch, he actually means a LOT.
      Mind you, there are systems that don't store any information...

     



  • @Watson said:

    @toon said:

    @cdosrun said:

    @snoofle said:

    We have a system that stores a whole bunch of information.

     Maybe I'm just having a bad day, but I found this line to be particularly amusing.

    You have to kind of know snoofle's posts for that one. He means that the number of rows has like nine or so digits. When he says a whole bunch, he actually means a LOT.
      Mind you, there are systems that don't store any information...

     

    Nobody's stating the contrary in this thread, AFAICT?



  • @snoofle said:

    Instead, let's just store the list of everything that did NOT happen.

    When the time comes to write that book, it'll be easier for you to write the WTFs that did NOT happen.

     



  • @joe.edwards said:

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.
     

    Not really. Imagine in the old data there are potentially 6 things that could happen, and you store the things that didn't (1,3,4,5,6). Now you add another 3 events bringing the potential total up to 9. Without also storing what could have happened at that point in the past you won't know if the previous store of data meant that just event 2 happened, or event 2,7,8, & 9 did.



  • @ASheridan said:

    @joe.edwards said:

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.
     

    Not really. Imagine in the old data there are potentially 6 things that could happen, and you store the things that didn't (1,3,4,5,6). Now you add another 3 events bringing the potential total up to 9. Without also storing what could have happened at that point in the past you won't know if the previous store of data meant that just event 2 happened, or event 2,7,8, & 9 did.

    Which has the corollary that if an extra thing-that-can-happen is added, you have a snoofleload of data records that you need to update...



  • @toon said:

    snoofleload
    I like this amount. Roughly what would it equate to in other terms?



  • @ASheridan said:

    @toon said:

    snoofleload
    I like this amount. Roughly what would it equate to in other terms?

     

    About $50k in consulting fees.

     



  • @snoofle said:

    I am going to put a stop to this lunacy shortly.

    sound of gun cocking

    FTFY


  • Considered Harmful

    @ASheridan said:

    @joe.edwards said:

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.
     

    Not really. Imagine in the old data there are potentially 6 things that could happen, and you store the things that didn't (1,3,4,5,6). Now you add another 3 events bringing the potential total up to 9. Without also storing what could have happened at that point in the past you won't know if the previous store of data meant that just event 2 happened, or event 2,7,8, & 9 did.


    @joe.edwards said:
    @sprained said:
    They are inverse now, but what if in the future the list of what could happen changes?

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.



  • @joe.edwards said:

    @ASheridan said:

    @joe.edwards said:

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.
     

    Not really. Imagine in the old data there are potentially 6 things that could happen, and you store the things that didn't (1,3,4,5,6). Now you add another 3 events bringing the potential total up to 9. Without also storing what could have happened at that point in the past you won't know if the previous store of data meant that just event 2 happened, or event 2,7,8, & 9 did.

    @joe.edwards said:
    @sprained said:
    They are inverse now, but what if in the future the list of what *could* happen changes?

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.

    Thanks, I never would have gotten your point if you hadn't have pasted it twice in the same comment. It still doesn't mean it works out though. You'd still need to keep track of when the new events were added in order to correctly calculate what happened if you are only storing the events that didn't occur, because the new events wouldn't be in the list of old ones that didn't occur.

     


  • ♿ (Parody)

    @ASheridan said:

    Thanks, I never would have gotten your point if you hadn't have pasted it twice in the same comment. It still doesn't mean it works out though. You'd still need to keep track of when the new events were added in order to correctly calculate what happened if you are only storing the events that didn't occur, because the new events wouldn't be in the list of old ones that didn't occur.

    But even after pasting it twice, you haven't understood it. Third time's a charm?

    @joe.edwards said:

    @sprained said:
    They are inverse now, but what if in the future the list of what could happen changes?

    Then the list of what did happen remains correct (and can still be used to figure out what did not happen), while the list of what did not happen is retroactively incomplete.


  • Considered Harmful

    I assume new events didn't happen because they didn't exist, so they couldn't have happened.

    My quoting fail meant to capture my tags about triggers, but I screwed that up.



  • @joe.edwards said:

    I assume new events didn't happen because they didn't exist, so they couldn't have happened.

    My quoting fail meant to capture my tags about triggers, but I screwed that up.

     

    What I'm getting at though is that you'll have to track when you added the new events, in order to correctly figure out at a later date what events weren't available and therefore couldn't have happened in the past.

     


  • Considered Harmful

    @ASheridan said:

    @joe.edwards said:

    I assume new events didn't happen because they didn't exist, so they couldn't have happened.

    My quoting fail meant to capture my tags about triggers, but I screwed that up.

     

    What I'm getting at though is that you'll have to track when you added the new events, in order to correctly figure out at a later date what events weren't available and therefore couldn't have happened in the past.

     

    All events that are in {possible-events} that aren't in {happened-events} must be in {didn't-happen-events}. It doesn't matter when they were added, if they're not in {happened-events} then they didn't happen. (Here I'm working on the implicit assumption that the kind of events in {possible-events} are not things that occur naturally, but would have to be explicitly invoked. This may be a bogus assumption but I can't know one way or the other.)



  • @joe.edwards said:

    All events that are in {possible-events} that aren't in {happened-events} must be in {didn't-happen-events}. It doesn't matter when they were added, if they're not in {happened-events} then they didn't happen. (Here I'm working on the implicit assumption that the kind of events in {possible-events} are not things that occur naturally, but would have to be explicitly invoked. This may be a bogus assumption but I can't know one way or the other.)
     

    Week 1. Available events are 1,2,3,4,5,6. Event 2 happened, so the stored data becomes: 1,3,4,5,6.

    Week 2. Added events 7, 8, 9. Now, what event happened in week 1 without knowing when 7,8 and 9 were added? Was it just event 2, or was it 2,7,8,9?
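    A minimal sketch of the disputed scenario, assuming events are plain integer IDs and that an event that did not exist yet could not have happened (joe.edwards's stated assumption). `from_positive`, `from_negative` and the `catalogue_version` field are hypothetical names for illustration.

```python
# Hypothetical sketch of the week 1 / week 2 example; event IDs are made up.

def from_positive(stored, catalogue):
    """Rows hold what DID happen: return (happened, not happened) for a catalogue."""
    return set(stored), set(catalogue) - set(stored)

def from_negative(stored, catalogue):
    """Rows hold what did NOT happen: return (happened, not happened) for a catalogue."""
    return set(catalogue) - set(stored), set(stored)

# Week 1: six possible events, only event 2 happened.
week1_catalogue = {1, 2, 3, 4, 5, 6}
stored_positive = {2}
stored_negative = week1_catalogue - stored_positive   # {1, 3, 4, 5, 6}

# Week 2: events 7, 8 and 9 are added to the catalogue.
week2_catalogue = week1_catalogue | {7, 8, 9}

happened_p, _ = from_positive(stored_positive, week2_catalogue)
happened_n, _ = from_negative(stored_negative, week2_catalogue)
print(sorted(happened_p))  # [2]
print(sorted(happened_n))  # [2, 7, 8, 9]  (events that did not exist yet now "happened")

# One fix: record which catalogue version each row was written against.
catalogue_versions = {1: week1_catalogue, 2: week2_catalogue}
row = {"catalogue_version": 1, "not_happened": stored_negative}
happened_v, _ = from_negative(row["not_happened"],
                              catalogue_versions[row["catalogue_version"]])
print(sorted(happened_v))  # [2]
```

    Under that assumption the positive list needs no versioning; the negative list is only safe if old rows are rewritten whenever the catalogue grows, or if each row carries the catalogue it was written against.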


  • ♿ (Parody)

    @ASheridan said:

    @joe.edwards said:
    All events that are in {possible-events} that aren't in {happened-events} must be in {didn't-happen-events}. It doesn't matter when they were added, if they're not in {happened-events} then they didn't happen. (Here I'm working on the implicit assumption that the kind of events in {possible-events} are not things that occur naturally, but would have to be explicitly invoked. This may be a bogus assumption but I can't know one way or the other.)

    Week 1. Available events are 1,2,3,4,5,6. Event 2 happened, so the stored data becomes: 1,3,4,5,6.

    Week 2. Added events 7, 8, 9. Now, what event happened in week 1 without knowing when 7,8 and 9 were added? Was it just event 2, or was it 2,7,8,9?

    That's what he said!



  • @ASheridan and JoeEdwards: You guys are both thinking way too hard. Nobody here thought about any of this - at all!

     


  • Considered Harmful

    @ASheridan said:

    @joe.edwards said:

    All events that are in {possible-events} that aren't in {happened-events} must be in {didn't-happen-events}. It doesn't matter when they were added, if they're not in {happened-events} then they didn't happen. (Here I'm working on the implicit assumption that the kind of events in {possible-events} are not things that occur naturally, but would have to be explicitly invoked. This may be a bogus assumption but I can't know one way or the other.)
     

    Week 1. Available events are 1,2,3,4,5,6. Event 2 happened, so the stored data becomes: 1,3,4,5,6.

    Week 2. Added events 7, 8, 9. Now, what event happened in week 1 without knowing when 7,8 and 9 were added? Was it just event 2, or was it 2,7,8,9?

    Um? Stored data in week 1 is "2". Week 2, we look at data and see "2"; events happened: 2, events that didn't happen: 1, 3, 4, 5, 6, 7, 8, 9.

    You seem to be talking about the opposite storage method, which as you quoted earlier, I already said would become incomplete without some kind of trigger to update stale data.



  • @joe.edwards said:

    All events that are in {possible-events} that aren't in {happened-events} must be in {didn't-happen-events}. It doesn't matter when they were added, if they're not in {happened-events} then they didn't happen. (Here I'm working on the implicit assumption that the kind of events in {possible-events} are not things that occur naturally, but would have to be explicitly invoked. This may be a bogus assumption but I can't know one way or the other.)

    If a thing does not exist, it cannot exist as an element of a set. So a thing might come into existence today, and can be an element of {happened-events}, {didn't-happen-events} or {things-that-aren't-blue}, or any other sets you can think of, but either way, yesterday it was in none of those sets. Even if all things have to be in one of those sets.

    For instance, the main headline for next week's newspaper can't have an A in it, and it also can't have no A's in it, even though all headlines either do, or don't, have at least one A in them.


  • ♿ (Parody)

    @toon said:

    For instance, the main headline for next week's newspaper can't have an A in it, and it also can't have no A's in it, even though all headlines either do, or don't, have at least one A in them.

    Sure, but having "no A's" isn't the same as examining the set and determining that it didn't have any A's. The key assumption for snoofle's case is that the set of events available to be described fully enumerates what can happen at any given time. It's a reasonable assumption, but could also reasonably be false.



  • @snoofle said:

    All: my original proposal was to compress the xml in the db. It was rejected in favor of dropping the table and creating a utility that could reconstruct the xml on-the-fly. Unfortunately, not all the data needed to do that is stored anywhere. To that end, they started "designing" away, and have come up with requirements that would end up generating more than 10 times the data (and by extension, space) previously required to store the xml, and would also require doing lots of the boolean logic twice, effectively doubling the run time.

    I am going to put a stop to this lunacy shortly.

     

    Wait... you're storing XML in a database table and not already compressing it?!?

    OK, that's your problem right there. That should have been in place since the beginning!

     



  • @snoofle said:

    I am going to put a stop to this lunacy shortly.
    This business will get out of control. It will get out of control and we'll be lucky to live through it.

     



  • @boomzilla said:

    @ASheridan said:
    Week 1. Available events are 1,2,3,4,5,6. Event 2 happened, so the stored data becomes: 1,3,4,5,6.

    Week 2. Added events 7, 8, 9. Now, what event happened in week 1 without knowing when 7,8 and 9 were added? Was it just event 2, or was it 2,7,8,9?

    That's what he said!

     

    That made me lol.

     


  • Considered Harmful

    I kind of feel that XML in a database violates 1NF, specifically the rule that says that each field should contain only one value. It seems like it would be easier to query the data and also more compact if each event-that-happened had its own row.
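    With one row per event that happened, the "didn't happen" list becomes a query rather than stored data. A minimal sketch using Python's built-in `sqlite3` module purely so it runs anywhere; the table and column names (`event_type`, `event_occurrence`, `run_id`) are hypothetical, not snoofle's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Catalogue of everything that can happen.
    CREATE TABLE event_type (
        event_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL UNIQUE
    );
    -- One row per event that actually happened in a given run.
    CREATE TABLE event_occurrence (
        run_id   INTEGER NOT NULL,
        event_id INTEGER NOT NULL REFERENCES event_type(event_id),
        PRIMARY KEY (run_id, event_id)
    );
""")
conn.executemany("INSERT INTO event_type VALUES (?, ?)",
                 [(i, f"Event{i}") for i in range(1, 7)])
conn.executemany("INSERT INTO event_occurrence VALUES (?, ?)", [(1, 2)])

# What did NOT happen in run 1 is derived on demand instead of stored:
not_happened = [row[0] for row in conn.execute("""
    SELECT t.event_id FROM event_type AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM event_occurrence AS o
        WHERE o.run_id = ? AND o.event_id = t.event_id)
    ORDER BY t.event_id
""", (1,))]
print(not_happened)  # [1, 3, 4, 5, 6]
```

    The derived list is computed against today's catalogue, so the catalogue-growth caveat argued over earlier in the thread still applies to it.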



  • @boomzilla said:

    @toon said:
    For instance, the main headline for next week's newspaper can't have an A in it, and it also can't have no A's in it, even though all headlines either do, or don't, have at least one A in them.

    Sure, but having "no A's" isn't the same as examining the set and determining that it didn't have any A's. The key assumption for snoofle's case is that the set of events available to be described fully enumerates what can happen at any given time. It's a reasonable assumption, but could also reasonably be false.

    I meant it the other way round: I didn't mean that next week's headline is a set that doesn't have any A's in it, what I meant was that the set of "all things without A's" doesn't include next week's headline. And once next week's paper gets printed, and its headline doesn't have an A, then that does not retroactively mean that "all headlines without A's" includes the headline today.

    Don't get me wrong, I understand that if a possible event doesn't exist, it can't occur. But a possible event that does not exist, can't NOT occur either. Other folks have made the point that you have to keep track of when the things-that-can-happen came into existence. I and others made the point that you need to update the whole snoofleload™ of records if you think of a new event (but I like the other idea better). But I think most of us are agreed that, the instant the company snoofle works for implements a thing that can conceivably happen, the data in the entire table is self-contradictory, and therefore useless, unless you take either of those two measures.

    If the folks over at that company list things that happened, and now they want to list things that didn't happen, then they have a complete list of things that can happen, or they wouldn't have come up with that solution, because it wouldn't make any sense. Snoofle's objection wasn't that it doesn't make sense, but that it takes a shitload of storage. So I've been assuming they know everything that can happen. I think the opposite assumption might be reasonable, but not in this instance, given the OP.

    Of course, my whole shtick pivots upon the notion that new things-that-can-happen can get thought of. That might reasonably not be the case.


  • Considered Harmful

    @toon said:

    Don't get me wrong, I understand that if a possible event doesn't exist, it can't occur. But a possible event that does not exist, can't NOT occur either.

    This sort of makes sense, but it's not how I was conceptualizing the problem. There is a distinction between "things that didn't happen" and "things that didn't happen that might have happened". I was addressing the former, you are talking about the latter. I don't know which they want.

    @toon said:

    Of course, my whole shtick pivots upon the notion that new things-that-can-happen can get thought of. That might reasonably not be the case.

    With ~20K things-that-can-happen it's not unreasonable that some might have been forgotten and still need to be added.


  • ♿ (Parody)

    @toon said:

    But a possible event that does not exist, can't NOT occur either.

    I would say that an event that does not exist cannot help but to not occur. For instance, just yesterday, I didn't determine the quotient of 1 and zero.

    @toon said:

    But I think most of us are agreed that, the instant the company snoofle works for implements a thing that can conceivably happen, the data in the entire table is self-contradictory, and therefore useless, unless you take either of those two measures.

    Yes, if you're recording things that didn't happen, this is trivially true. No one has been arguing this, though a few people have been very vehement about restating this.

    @toon said:

    Of course, my whole shtick pivots upon the notion that new things-that-can-happen can get thought of.

    The more interesting thing is whether we're talking about genuinely new things happening or just things that we're newly interested in recording.



  • @boomzilla said:

    @toon said:
    But a possible event that does not exist, can't NOT occur either.
    I would say that an event that does not exist cannot help but to not occur. For instance, just yesterday, I didn't determine the quotient of 1 and zero.

    That's certainly a good point. Although I'd argue that you didn't not determine it either (since there's no such thing as "determining the quotient of 1 and zero"). I know it's a silly point to make, but I do think it's a valid, and in the case of snoofle's app, relevant one.

    @boomzilla said:
    @toon said:
    Of course, my whole shtick pivots upon the notion that new things-that-can-happen can get thought of.

    The more interesting thing is whether we're talking about genuinely new things happening or just things that we're newly interested in recording.

    I guess I meant the former, but for the purposes of making my point, it's all the same. From the application's point of view, there's arguably no practical difference between the two.



  • @joe.edwards said:

    You seem to be talking about the opposite storage method, which as you quoted earlier, I already said would become incomplete without some kind of trigger to update stale data.
    Yes, I had my wires crossed, apologies!


  • Considered Harmful

    @toon said:

    Although I'd argue that you didn't not determine it either (since there's no such thing as "determining the quotient of 1 and zero").

    Did you determine the quotient of 1 and zero? FILE_NOT_FOUND



  • @Zylon said:

    @snoofle said:

    I am going to put a stop to this lunacy shortly.
    This business will get out of control. It will get out of control and we'll be lucky to live through it.

     

    You arrogant ass! You've killed *us*!

