What's the fastest hash or CRC algorithm?
-
I need a hashing function, all it'll be doing is verifying data integrity over a high-speed network link while stress-testing a new product of ours. It needs to be very fast so the hashing operation isn't a bottleneck. The network link is a 2 Gbps optical connection, and I'm hoping to saturate this link. Doesn't need to be cryptographically secure, and even collisions aren't really a concern.
Thoughts? I'm not sure how fast MD5 is but I assume there are much faster ways. I can use C or C++. I'm too lazy to benchmark just yet.
Filed Under: Discourse and StackOverflow are now the same thing, You do not have enough reputation to post under the C++ category, go away until you have more rep, inb4 "Use JQuery."
-
I suppose I could do a simple odd-or-even bit count thing. The datasets I want to transmit will probably be randomly-generated 256 MB byte arrays.
-
Hmm... LMGTFY...
Gives me:
If all you're doing is basic-basic checking, a simple checksum would work.
'd:
a simple odd-or-even bit count thing
Filed under: Hanzo'd by the OP, what magic is this?
-
MD4 is slightly faster than MD5 and either should be able to saturate a 2Gbps link with ease; your bottleneck will be elsewhere with those. Although it really depends on the hardware and size of the data, I doubt that you'll run into a bottleneck on a 2Gbps link with those irrespective of how slow the hardware is/how large the data is. CRC32 may be faster but it's going to be a lot more likely to have collisions with no added benefit. Seriously, just try MD4. There's no reason not to. If it doesn't work for you, swap it out.
-
Filed under: Hanzo'd by the OP, what magic is this?
I think this counts as Rubber ducking.
-
If speed is the only concern, I recommend:
int hash(void * data) { return 0; }
-
-
We use Fletcher's Checksum for this kind of thing.
-
all it'll be doing is verifying data integrity
People have been using CRC32 for this since the beginning of time.
-
Yes. but MD5, SHA1, SHA-256, and SHA-512 have less collisions (getting rarer as you go from left to right).
-
CRC32 is specifically designed to detect the types of errors that are common with transmission media (dropped bits, transpositions, single bit errors). Use the right tool for the job.
-
Yes. but MD5, SHA1, SHA-256, and SHA-512 have less collisions (getting rarer as you go from left to right).
/me ponders the usefullness of having a variable bit hash where the hash is the length of the input file.
seconds later
/me realizes we have that already, it's called ENCRYPTION.
seconds later
/me is searching for a desk to introduce to her forehead repeatedly.
-
-
MD 11 is my favorite airplane.
-
Yes. but MD5, SHA1, SHA-256, and SHA-512 have less collisions (getting rarer as you go from left to right).
Riddle me this: how many false negatives will be prevented by choosing a more collision-resistant algorithm. The number will be irrelevant, either the communications link is solid and there are zero collisions, or it is crap and there are multiple collisions.
If there are zero, then the collision rate is irrelevant because there are no false negative (every test for corruption is negative). If there are more than zero, then the collision rate of the algorithm will influence what percent of the positive are seen as negative. Unless that rate is really high, corruption will be detected at a rate really close to the actual rate.
-
I'm actually an aviation buff and its my favorite model as well.
-
Whatever I come up with will be running for days at a time, at least. Even if there are a ton of errors and collisions, the chance of all errors being masked by collisions is pretty much zero. Detecting even a single error will let us know there's a firmware bug somewhere.
(Obviously, I'm hoping for zero errors, but without fully stress-testing the firmware there may be unknown edge cases resulting in garbled transmission)
-
-
Let me guess. L-1011
-
Fuck no.
-
Oh gods you're not some kind of deranged airbus fan, are you? (A340 is the exception)
-
X-15 is where it's at.
-
I'd much rather have an Incom T-65.
-
Fuck off.
-
I think that's a yes.
Is this bloated mass your style, Blakey?
http://www.cato.org/sites/cato.org/files/wp-content/uploads/airbus-beluga-nr1_l3929.jpg
-
Ooh I forgot about that. Those are almost cool. But only because of the drop nose. Otherwise it's dumb.
-
Good save with the edit. :P
-
If you think I would like Airbus anything, you do not know me at all.
-
Well, I mean, you have been known to prefer fox dudes in bras, so I'm not surprised by any preferences you seem to have anymore.
-
Blakeyrat is a Pacific Northwesterner.
-
-
Use a deterministic PRNG, share the seed between the sender and the receiver, and you can verify all the data!
-
Would reducing the packet size or MTU be useful? The reasoning being you could reduce it to the point where the packet overheads exceed the data size, and a simple file will cause considerably more packets to be sent.
-
Given modern CPUs, the more complex hash algorithm implementations that use all the possible extensions that are available will have so little overhead it doesn't matter what you choose.
If you are CPU limited then you might want to use CRC32 as suggested.
-
I figure how you're building the hash might matter more than the specific algorithm. Like, if you're building your hash up as you're reading or writing the stream at each end, that shouldn't cost much, but if you're hashing a big file then sending it, that could cause problems.
-
Try here. If you want something fun, try implementing an LDPC code for your data stream.
Although if you're only looking for the existence of errors, without caring how many or where they are, use a CRC or something else that's simple. Just make sure it doesn't match whatever the firmware and protocols are already doing.
Also, this is a good idea:
Use a deterministic PRNG, share the seed between the sender and the receiver, and you can verify all the data!
If you want something more interesting than the LDPC family, let me know. My grandfather specialized in these kinds of things, and he might know something to look at.
-
Would reducing the packet size or MTU be useful? The reasoning being you could reduce it to the point where the packet overheads exceed the data size, and a simple file will cause considerably more packets to be sent.
This is a custom network protocol using some kind of round-robin or token-passing scheme over a fiber link. I'm not sure if packet size, MTU, or even packet even mean anything here. If they do, they're abstracted away and not anything I have control over from the hardware's API.
-
Would reducing the packet size or MTU be useful?
Does his experimental fiber network even have packets?
Might wanna establish a foundation before you start building those towers.
-
OMG, that poor thing looks like it has a tumor.
-
Whatever I come up with will be running for days at a time, at least. Even if there are a ton of errors and collisions, the chance of all errors being masked by collisions is pretty much zero. Detecting even a single error will let us know there's a firmware bug somewhere.
I'd use adler32() from zlib because that library is probably already somewhere in your code base.
-
That's very similar to Fletcher's checksum, but with two additional mod operations per word. Both are still faster than CRC-32.
-
So, you are aware of a technology / methodology that can transmit digital data without any packets. Please, teach us this so that we may use it to improve our bandwidth...
-
I actually agree with blakey on this one. I don't know if there are packets or not, and the truth is it doesn't matter. I'm testing the board and firmware through the API which customers will be using. It's a black box, I have very little idea what's going on inside, nor do I care, nor do our customers care. All that matters is that I can push data onto the board and it replicates the data to all the other boards without error (it's a shared memory product).
-
a technology / methodology that can transmit digital data without any packets. Please, teach us this so that we may use it to improve our bandwidth...
Morse code, paper tape and Apple II cassette tape all fit the description.
-
So, you are aware of a technology / methodology that can transmit digital data without any packets.
Yes.
But that aside, all we know is that it's a "high speed network link". We don't know what type of network, even. Or what type of data it is. For all I know, it's transmitting streaming video and nothing else. (In which case, packets would be unnecessary.)
All I'm saying is: don't make assumptions. Or at least point out you're making the assumption. Something like, "assuming it's a packet-switched network, and it has a concept of 'MTU's, have you tried X?"
-
Then the solution is simple: Open the connection and keep pushing random data down there in real time.
Or. Find out what what Network Layers are involved and what Protocols are involved so you can better understand how to "saturate" the link. Otherwise don't waste your time trying to figure out a solution that may not be one. To be honest, at byte / word level there is no difference between compressed and uncompressed - unless you are sending simple text. Try Images especially Bit Mapped, or jpeg type if you don't want to check for the colour complexity of the image (but strip out the EXIF)
NULL NULL ETX EBT EOT NAK
-
That's very similar to Fletcher's checksum, but with two additional mod operations per word. Both are still faster than CRC-32.
Yes. Adler is theoretically closer to CRC-32 than Fletcher for reliable error detection.
If the expected error rate is close to zero, any of these will work. Even a simple XOR or additive checksum is probably plenty strong enough for this use case. Accumulating a simple checksum the width of a native machine word over data of that same width should fly like shit off a shovel - if I had a guarantee that the size of the data to be checked was always a multiple of the word size, I'd probably do that instead of even bothering with a library function.
-
Find out what what Network Layers are involved and what Protocols are involved so you can better understand
Not my job. If I can break things through the API, I go to the firmware engineer and say "I broke your network layers and protocols, you fix it."
-
Not my job
Not your job as a developer. However, as a tester, you are responsible for identifying probable failure modes and testing those.
-
Adler is theoretically closer to CRC-32 than Fletcher for reliable error detection
Yes, that's the trade-off: effectiveness (Adler) vs speed (Fletcher).
even bothering with a library function.
Adler and Fletcher are each just 4 lines of code. WTF do you need a library for? We want speed here, so just inline that crap wherever it needs to go.