Seagate hard drives have 120% failure rate



  • Backblaze - the cloud backup company - continues to share their drive experience.

    They measure the annual failure rate (AFR) rather than MTBF because that's an easier number to understand. They count as a failure anytime they have to replace a drive.

    The 1.5 TB Seagate Barracuda 7200 and the Seagate Barracuda Green have the highest AFRs, 25.4% and 120% respectively. And in general the Seagate drives have proven to be less reliable overall.

    The article never explains how you can have a 120% failure rate; however, some commenters offered their explanation:

    You install 100 drives. They all fail and you send them for repair/replacement. You receive 100 new drives and 20 of them fail.

    120% failure rate.
    (bluebeard66, 21 January, 2014 16:41)


  • @El_Heffe said:

    You install 100 drives. They all fail and you send them for repair/replacement. You receive 100 new drives and 20 of them fail.



    120% failure rate.

    That seems like a correct explanation if we take into account that it's the annual failure rate, which would mean that on average, every year, all your drives will fail, and 20% of your replacement drives will also fail.

    Also, that scares me, I have a Seagate HDD.



  • @Maciejasjmj said:

    @El_Heffe said:
    You install 100 drives. They all fail and you send them for repair/replacement. You receive 100 new drives and 20 of them fail.



    120% failure rate.

    That seems like a correct explanation if we take into account that it's the annual failure rate, which would mean that on average, every year, all your drives will fail, and 20% of your replacement drives will also fail.

    Also, that scares me, I have a Seagate HDD.

    Seems correct to me too. We're talking "drives that failed over a period of time", not "how many drives have failed". Suppose you have 100 drives, and exactly one fails each day. So at the end of the first day, 1% failed; at the end of the 2nd day, 2% have failed, and so on. At the end of the 100th day, you have 100% failed. What happens on day 101? Well, if you replaced drive #1 when it failed, and drive #2 when it failed, then at the end of day 101 you have 101 failed drives. Is that 101 out of 101, or 101 out of 100? Depends on how you're counting. I'd look at the number of slots I have (100) and the number of failed drives in my garbage pile (101) and call it 101%.
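
The slot-counting view in the post above can be sketched in a few lines of Python. This is an illustrative simulation only: the 100 slots and the one-failure-per-day schedule come from the post, while the function name `failures_per_slot` is made up for this example, and nothing here is claimed to be how Backblaze computes its figure.

```python
def failures_per_slot(slots: int, days: int, failures_per_day: int = 1) -> float:
    """Failures counted against a fixed number of slots.

    Every failed drive is replaced immediately, so the number of
    drives in service never changes; only the garbage pile grows.
    """
    garbage_pile = 0
    for _ in range(days):
        # The failed drive goes on the pile and a new one goes in the slot.
        garbage_pile += failures_per_day
    return garbage_pile / slots


for day in (1, 2, 100, 101, 365):
    print(f"day {day:3d}: {failures_per_slot(100, day):.0%}")
```

Counted this way the figure passes 100% on day 101 and keeps climbing for as long as the failures continue.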



  •  As a side note, the 1.5TB was notoriously bad, there were even recalls. So my bigger concern is why any significant shop would be using them!



  • @DrPepper said:

    Seems correct to me too. We're talking "drives that failed over a period of time", not "how many drives have failed".
    No, the article specifically says they are measuring the percentage of drives that have to be replaced annually (Annual Fail rate).

    Number of failed drives divided by number of drives purchased = failure percentage

    @DrPepper said:

    Suppose you have 100 drives, and exactly one fails each day. So at the end of the first day, 1% failed; at the end of the 2nd day, 2% have failed, and so on. At the end of the 100th day, you have 100% failed. What happens on day 101? Well, if you replaced drive #1 when it failed, and drive #2 when it failed, then at the end of day 101 you have 101 failed drives. Is that 101 out of 101, or 101 out of 100? Depends on how you're counting. I'd look at the number of slots I have (100) and the number of failed drives in my garbage pile (101) and call it 101%.
    Number of "slots" is irrelevant.  They are measuring percentage of drives that have to be replaced. If the drive in a slot is replaced more than once you have to count each replacement as part of your total.

    At day 100 you have bought/installed 100 drives and 100 have been replaced. That's 100%.

    On day 101 you replace another drive. Now you've bought/installed 101 drives and 101 have failed.  That's still 100%.

    On day 102 you replace another drive. Now you've bought/installed 102 drives and 102 have failed.  That's still 100%.

     

     



  • Maybe someone bought 5 hard drives and stole a sixth and they all broke?

    Breaking is about the only thing Seagate hard drives have done for me.



  • I didn't understand the confusion at all. It means 120% of the drives they used from Seagate failed. It's weird when you phrase it like that, but in the context of a failure rate it makes perfect sense.


  • Discourse touched me in a no-no place

    @El_Heffe said:

    Contrary to popular belief you cannot give more than 100%
    Depends if it is of the maximum sustainable rate or the maximum burst rate.



  • @Ben L. said:

    Maybe someone bought 5 hard drives and stole a sixth and they all broke?

    Breaking is about the only thing Seagate hard drives have done for me.

    I have one Seagate drive in my computer and it's at least 7 years old. I only use it to store stuff temporarily until I decide whether to keep or delete it, but it gets used pretty heavily and still works fine.

    The problem with these things is that one way companies cut costs is by cutting back on QA. That means you might get lucky and get one that lasts 7+ years, or it might die in a few months. Since it's impossible to predict how long a drive will last, I just go for the one with the longest warranty.

     



  • @El_Heffe said:

    @DrPepper said:

    Seems correct to me too. We're talking "drives that failed over a period of time", not "how many drives have failed".
    No, the article specifically says they are measuring the percentage of drives that have to be replaced annually (Annual Fail rate).

    Number of failed drives divided by number of drives purchased = failure percentage

    No, number of drives that failed in a year divided by number of drives in service that year = annual failure rate.

    Whether the replacements were purchased or supplied as warranty replacements, and whether or not the drives that failed were purchased in the given year, does not affect either the number that failed in a year or the number in service for that year. So when you're interpreting an AFR you need to be a little careful; if for example a company buys a thousand identical drives in one hit, then runs them until they fail before replacing them, there's a fairly high chance that their service lives will be quite similar and that the year in which most of them wear out will show an anomalously high AFR.

    Of course, that excuse goes away as soon as the AFR goes over 100%, because that makes it absolutely certain that at least some drives needed replacing at least twice in that year.
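
The definition in the post above (failures in a year divided by drives in service that year) can be sketched as below. Counting service time in drive-days is an assumption made here so that drives installed part-way through the year are weighted fairly; the function name and the sample numbers are illustrative, not figures from the article.

```python
def annual_failure_rate(failures: int, drive_days: float) -> float:
    """Failures per drive-year of service.

    drive_days is the total number of days all drives of a model
    spent in service during the period (an assumed bookkeeping unit).
    """
    drive_years = drive_days / 365
    return failures / drive_years


# 100 slots kept full for a whole year, with 120 replacements along the way.
afr = annual_failure_rate(failures=120, drive_days=100 * 365)
print(f"AFR = {afr:.0%}")
```

In this example there are more failures than slots, so at least one slot must have been refilled more than once during the year.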



  • AFR = 120%  => Expected drive lifetime = 1/1.2 yrs.

    Simples.
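
A short derivation of that one-liner, assuming a steady state in which every failed drive is replaced at once and drives of a model share a mean lifetime $\bar{L}$ (a symbol introduced here for illustration): each slot then sees on average $1/\bar{L}$ failures per year.

```latex
\mathrm{AFR} = \frac{1}{\bar{L}}
\quad\Longrightarrow\quad
\bar{L} = \frac{1}{\mathrm{AFR}} = \frac{1}{1.2}\ \text{yr} \approx 0.83\ \text{yr} \approx 10\ \text{months}
```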

     



  • @El_Heffe said:

    They are measuring percentage of drives that have to be replaced. If the drive in a slot is replaced more than once you have to count each replacement as part of your total.

    No, you may not. They don't count. They are replacements.
    @El_Heffe said:
    At day 100 you have bought/installed 100 drives and 100 have been replaced. That's 100%.

    That is only 100% if you don't count the replacements. If you count the replacements there are already 200 (100 original + 100 replacements) disks and thus only 50% failed. But that number is useless. There are only 100 disks in service, so 100 is your total and the failure rate is 100%.
    @El_Heffe said:
    On day 101 you replace another drive. Now you've bought/installed 101 drives and 101 have failed.  That's still 100%.

    No. You have installed 201 disks in total, but only 100 of them are still installed, because the 101 failed ones are no longer in service. So that's either 50.2% or 101%.
    @El_Heffe said:
    On day 102 you replace another drive. Now you've bought/installed 102 drives and 102 have failed.  That's still 100%.

    Again. You installed 202 disks, 100 of them are running and 102 are in the trash. Either 50.5% or 102%.

    But the rate can't stay at 100% no matter how you count it and it has to keep increasing no matter how you count it.
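
The two ways of counting in the post above can be checked with a small sketch. The function name `failure_ratios` is hypothetical, and the code assumes exactly one replacement is installed for every failed drive, as the post does.

```python
def failure_ratios(slots: int, failed: int) -> tuple[float, float]:
    """Return (failed / all drives ever installed, failed / slots)."""
    ever_installed = slots + failed  # originals plus one replacement per failure
    return failed / ever_installed, failed / slots


for failed in (100, 101, 102):
    per_installed, per_slot = failure_ratios(100, failed)
    print(f"{failed} failed: {per_installed:.1%} of drives installed, {per_slot:.0%} of slots")
```

Both ratios grow with every further failure; the first creeps toward 100% while the second has no upper bound.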


  • ♿ (Parody)

    @El_Heffe said:

    They measure the annual failure rate (AFR) rather than MTBF because
    that's an easier number to understand.

    TRWTF, apparently.

    @AFR on Wikipedia said:

    http://en.wikipedia.org/wiki/Annualized_failure_rate

    It is a relation between the mean time between failure (MTBF) and the hours that a number of devices are run per year.

    So, no problem with 120% AFR with a small enough MTBF. It just gets a little weird intuitively with a short enough MTBF.



  •  What if there are three doors; two of them hide a failed drive, and one has a functioning drive?


  • ♿ (Parody)

    @dhromed said:

     What if there are three doors; two of them hide a failed drive, and one has a functioning drive?

    Only if it has Excel installed. Then you just Noodle and Jam It!



  • @boomzilla said:

    Then you just Noodle and Jam It!
     

    If I jam the noodle, it's not going to be a very good hard drive, and it definitely won't excel.


  • Considered Harmful

    @dhromed said:

     What if there are three doors; two of them hide a failed drive, and one has a functioning drive?

    Is that Seagate's marketing strategy? No, wait; those odds are much better.



  • @dhromed said:

    @boomzilla said:

    Then you just Noodle and Jam It!
     

    If I jam the noodle, it's not going to be a very good hard drive, and it definitely won't excel.


    There are pills to help you get your drive hard.



  • @boomzilla said:

    @AFR on Wikipedia said:
    http://en.wikipedia.org/wiki/Annualized_failure_rate

    It is a relation between the mean time between failure (MTBF) and the hours that a number of devices are run per year.

    So, no problem with 120% AFR with a small enough MTBF. It just gets a little weird intuitively with a short enough MTBF.



    Um... according to that, MTBF would have to be imaginary to get AFR greater than 1.

    They apparently (ab)used the approximation AFR = 100%/(MTBF/8760) which isn't valid for small MTBF.
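
To see where the approximation in the post above parts ways with the formula quoted from Wikipedia, here is a minimal comparison. It assumes a constant failure rate (exponentially distributed lifetimes) and 8760 running hours per year; the function names and the sample MTBF values are illustrative, with 7,300 hours picked only because it makes the approximation come out at 120%.

```python
import math

HOURS_PER_YEAR = 8760


def afr_exact(mtbf_hours: float) -> float:
    """Probability that a drive fails within one year, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def afr_approx(mtbf_hours: float) -> float:
    """Linear approximation: expected failures per drive-year."""
    return HOURS_PER_YEAR / mtbf_hours


for mtbf in (1_000_000, 100_000, 7_300):
    print(f"MTBF {mtbf:>9,} h: exact {afr_exact(mtbf):.2%}, approx {afr_approx(mtbf):.2%}")
```

For a large MTBF the two agree closely. At 7,300 hours the approximation reports 120% while the exact probability is about 70%, and it can never reach 100% for any positive MTBF.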

     




  • I don't think I've seen a Seagate drive last longer than a year since the 1990's.


  • Trolleybus Mechanic

    @mikeTheLiar said:

    There are pills to help you get your drive hard.
     

    Just don't crash your head.



  • @El_Heffe said:

    Backblaze - the cloud backup company - continues to share their drive experience.

    They measure the annual failure rate (AFR) rather than MTBF because that's an easier number to understand. They count as a failure anytime they have to replace a drive.

    The 1.5 TB Seagate Barracuda 7200 and the Seagate Barracuda Green have the highest AFRs, 25.4% and 120% respectively. And in general the Seagate drives have proven to be less reliable overall.

    The article never explains how you can have a 120% failure rate,

     

    Year 1 they buy 100,000 drives of model X

    Year 2 they buy 50,000 drives of model X

    Year 2 they record 60,000 failures on model X drives

    Year 2 AFR = 120%

    Yes, that would be dumb. But I would suspect that's actually what's going on here.
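
The suspected mistake in the post above comes down to the choice of denominator. The sketch below uses the post's hypothetical purchase and failure counts; the variable names are made up here, and the second ratio ignores drives lost in year 1 to keep the arithmetic simple.

```python
bought_year_1 = 100_000
bought_year_2 = 50_000
failures_year_2 = 60_000

# Dividing by that year's purchases alone (the suspected mistake).
vs_purchases = failures_year_2 / bought_year_2

# Dividing by every drive bought so far, ignoring earlier losses for simplicity.
vs_fleet = failures_year_2 / (bought_year_1 + bought_year_2)

print(f"{vs_purchases:.0%} of year-2 purchases, {vs_fleet:.0%} of all drives bought")
```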

     



  • @mikeTheLiar said:

    @dhromed said:

    @boomzilla said:

    Then you just Noodle and Jam It!
     

    If I jam the noodle, it's not going to be a very good hard drive, and it definitely won't excel.

    There are pills to help you get your drive hard.
     

    I hear everyone's moving to solid state to go faster.

     



  • @mott555 said:

    I don't think I've seen a Seagate drive last longer than a year since the 1990's.
     

    Oh no! My 3 year old primary drive doesn't exist! Oh no!



  • @Zadkiel said:

    Year 1 they buy 100,000 drives of model X

    Year 2 they buy 50,000 drives of model X

    Year 2 they record 60,000 failures on model X drives

    Year 2 AFR = 120%

    Yes, that would be dumb. But I would suspect that's actually what's going on here.

    According to this blog post:

    The Seagate Barracuda Green 1.5TB drive, though, has not been doing well. We got them from Seagate as warranty replacements for the older drives, and these new drives are dropping like flies. Their average age shows 0.8 years, but since these are warranty replacements, we believe that they are refurbished drives that were returned by other customers and erased, so they already had some usage when we got them.

    They are running 51 drives and replacing them every 0.8 years, or installing ~60 new drives each year.

    I'm not really sure how any other definition of annual failure rate would make sense, especially if MTBF is under one year.
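
The back-of-the-envelope estimate in the post above can be written out as follows. It treats the blog post's 0.8-year average age as if it were the mean lifetime and assumes every failed drive is replaced straight away; both are simplifications, and the function name is hypothetical.

```python
def replacements_per_year(drives_in_service: int, mean_lifetime_years: float) -> float:
    """Steady-state replacements per year when each slot is refilled on failure."""
    return drives_in_service / mean_lifetime_years


drives = 51
lifetime = 0.8  # years; the blog post's average age, treated here as a lifetime

per_year = replacements_per_year(drives, lifetime)
print(f"{per_year:.1f} replacements per year, AFR = {per_year / drives:.0%}")
```

Under these assumptions the result is roughly 64 replacements a year, an AFR of about 125%, which is in the same neighbourhood as the poster's ~60 and the reported 120%.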



  • @dhromed said:

    Oh no! My 3 year old primary drive doesn't exist! Oh no!
    My almost 3 years old Seagate drive in my work computer died 2 weeks ago.
    Since I didn't want to wait for a replacement to get here, I bought a 2TB Seagate SSHD, which lasted 4 days.



  • Obsly is 120% because not only all Seagate drives fail, but they induce failure in 20% of your other drives


  • Considered Harmful

    @dhromed said:

    @mott555 said:

    I don't think I've seen a Seagate drive last longer than a year since the 1990's.
     

    Oh no! My 3 year old primary drive doesn't exist! Oh no!


    I'm not convinced "I've never seen x" equates to "x does not exist."



  • @serguey123 said:

    Obsly is 120% because not only all Seagate drives fail, but they induce failure in 20% of your other drives

    Do those other drives therefore have a -20% failure rate? Or would that require that 20% of them shipped with a doubled capacity?


  • Discourse touched me in a no-no place

    @dhromed said:

    Oh no! My 3 year old primary drive doesn't exist! Oh no!
    It may well exist but be living on borrowed time.



  • @Buttembly Coder said:

    @serguey123 said:
    Obsly is 120% because not only all Seagate drives fail, but they induce failure in 20% of your other drives

    Do those other drives therefore have a -20% failure rate? Or would that require that 20% of them shipped with a doubled capacity?


    You deduct 20% from their base failure rate; if said drives are Seagate, a reroll will occur and the reduction aura will be applied to your other non-Seagate hardware. If all your hardware is Seagate, you will be purified by fire as the witch you are.



  • Ok, maybe:

    1. Bought 5 hard drives from Seagate
    2. One of them failed six times.

  • Considered Harmful

    @Ben L. said:

    Ok, maybe:

    1. Bought 5 hard drives from Seagate
    1. 5 drives merged into 1 anti-drive


  • People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.



  • @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 0.999

    It's no wonder we live in a world where people and governments can't balance their budgets.


  • ♿ (Parody)

    @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    OR. A poorly implemented, named and explained metric!



  • @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    There's a 100.00000001% chance that this comment has an invalid probability in it.



  •  There is a 0% probability that "replacement rate" says anything about probability.



  • @boomzilla said:

    @too_many_usernames said:
    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    OR. A poorly implemented, named and explained metric!

    Finally, an answer that makes sense.



  • @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    I'm curious. If for every 10 of these drives you were running, you would have to put 12 replacement drives in each year... what figure would you use to indicate that?

    When the event is likely to occur more than once in a specified timeframe, it seems to me that the most effective way to communicate this is a probability higher than 100%, but if you disagree, then I'd like to hear your ideas.



  • @cdosrun said:

    @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    I'm curious. If for every 10 of these drives you were running, you would have to put 12 replacement drives in each year... what figure would you use to indicate that?

    When the event is likely to occur more than once in a specified timeframe, it seems to me that the most effective way to communicate this is a probability higher than 100%, but if you disagree, then I'd like to hear your ideas.

    While the percentage may communicate that it can also lead to bad interpretations due to the timeframe being dropped by lazy reporting or reading.  I'd suggest just using the bolded part of your statement.



  • @Ben L. said:

    @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    There's a 100.00000001% chance that this comment has an invalid probability in it.


    There is a 99.99880000001% chance that I used floating point arithmetic to calculate this result.


  • Discourse touched me in a no-no place

    @cdosrun said:

    I'm curious. If for every 10 of these drives you were running, you would have to put 12 replacement drives in each year... what figure would you use to indicate that?
    I don't know about a figure. More of a diagram or illustration really, of a pile of steaming bovine excrement.



  • @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    Well, first of all, I don't think that Annual Failure Rate is a probability figure. It's more of a historical figure. Based on the 120% value, I bet they are using it to indicate how often they had to replace a given drive type.

    As for how they got a value of 120%, let me give that a shot ...

    You have a farm with 100 hard drive bays. Let's say, for the sake of exploring this 120% figure, that all of those bays are occupied by Seagate Barracuda Green drives. Now, say that one of those drives dies about every 3 days. If you really want to be anal about it, you could even say one drive dies every 73 hours. Being a brand loyalist (come on, you started with 100 Seagates), and being constrained by a corporate energy saving initiative, you only replace your drives with more Seagate Barracuda Green drives, so this rate keeps up. At the end of the year, your boss wants to know why so much money was spent on hard drives that year and asks for an Annual Failure Rate report. Knowing that you did 120 replacements in 100 drive bays that year, you quickly come up with the 120% figure and submit it to your boss.

    Now does this make the 120% figure correct? Who knows, I'm not a statistician, but it certainly explains where it could have come from.
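
The arithmetic behind that story, written out as a quick check. The 100 bays and the 73-hour interval come from the post; the variable names are made up for this sketch.

```python
HOURS_PER_YEAR = 365 * 24  # 8760

bays = 100
hours_between_failures = 73  # across the whole farm, per the post

failures_per_year = HOURS_PER_YEAR / hours_between_failures
afr = failures_per_year / bays

print(f"{failures_per_year:.0f} replacements in {bays} bays -> AFR {afr:.0%}")
```

One failure every 73 hours works out to exactly 120 replacements a year, hence 120% when divided by the 100 bays.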



  • People thinking there are legitimate arguments that the probability of an event can exceed 1...

     

     

    To what depths have I sunk!? I think I may have executed my first moderately successful troll post...


  • Discourse touched me in a no-no place

    @Ben L. said:

    Ok, maybe:

    1. Bought 5 hard drives from Seagate
    2. One of them failed six times.
    No.

    Vaguely amusing, but no.


  • @TheCPUWizard said:

     As a side note, the 1.5TB was notoriously bad, there were even recalls.

    Weren't all 1.5TB drives terrible? I had a 1.5TB WD drive that failed after a few months. I sent it to Singapore for warranty replacement and got back a 2TB fairly quickly.



  • @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    No, they think that there are legitimate arguments that the probability of an event per unit time can exceed 1. I won't be voting for you any time soon!



  • @DaveK said:

    No, they think that there are legitimate arguments that the probability of an event per unit time can exceed 1.
     

    If you're talking about a probability density function, I agree, but that's an instantaneous value; you shouldn't use it for as long a period as a year. The integral of the pdf over any interval must be less than or equal to one: "the probability I'll have n drive failures within T time" is always less than or equal to one.

     



  • @locallunatic said:

    @cdosrun said:

    @too_many_usernames said:

    People thinking there are legitimate arguments that the probability of an event can exceed 1...

    It's no wonder we live in a world where people and governments can't balance their budgets.

    I'm curious. If for every 10 of these drives you were running, you would have to put 12 replacement drives in each year... what figure would you use to indicate that?

    When the event is likely to occur more than once in a specified timeframe, it seems to me that the most effective way to communicate this is a probability higher than 100%, but if you disagree, then I'd like to hear your ideas.

    While the percentage may communicate that it can also lead to bad interpretations due to the timeframe being dropped by lazy reporting or reading.  I'd suggest just using the bolded part of your statement.

    Fair enough. Now, we need to condense it to fit into a table- You don't want to have to read that whole sentence 50 times for 50 drives.

    Let's call it "Annual" (for per year), "Failures" and, to add an extra level of precision (since most drives don't have this batshit high number of failures), "Per 100".

    That way, we have 120 in the AFp100 column instead of 120% in the Annual Failure Percent column.

    Things are so much better designed as a committee.

