How did they calculate a single sperm to have 37 megabytes of information?

In: Biology

12 Answers

Anonymous 0 Comments

disclaimer: This is intended as a joke

Does it mean that we can use sperm to store information?

Anonymous 0 Comments

Does this mean that men are like exceptionally large external hard drives?

Anonymous 0 Comments

They ran Little Big City 2 on it.

No, actually, they just know how much DNA a person's cells carry, and a sperm has half that much.

Anonymous 0 Comments

Okay, so how much sperm can I fit in a 1TB HDD? Asking for a friend…

Anonymous 0 Comments

On average that’s how many megabytes of porn a guy has to watch to sperm all over the place.

Anonymous 0 Comments

Other posters here have arguably gone beyond the age limit for this sub and have also mixed up “information” and “data”. Sperm cells carry DNA, which, strictly speaking, does *not* carry information, but rather is a memory molecule, and therefore contains data. Information arises when the algorithms in the DNA are put to use. This is exactly how code written by humans is stored as data, with information emerging only when the code is run (for those older than 5, this is because information is a thermodynamic quantity and requires heat dissipation).

To estimate how much data a sperm cell carries, researchers looked at how much DNA is inside and estimated the space required to store it. I cannot find any source for the 37 MB number, but I’m pretty sure that it simply comes from looking at how much space a FASTA file (a string of letters representing nucleotide bases) of the DNA sequence inside a sperm cell takes up in computer memory. This is why their number is neither the 4 MB nor the 400 MB cited by other users: those numbers are measures of information and not data storage, so their calculations include things like compression and algorithmic complexity, which are difficult to interpret for biological systems.

Source: am a PhD student studying information in biological systems.

Anonymous 0 Comments

That’s actually an extremely misleading number. The human genome contains around 3.1 (men) to 3.2 (women) billion base pairs. Since the X chromosome is about three times longer than the Y chromosome, women have a higher total genome length than men. A base pair is made of two of the four nucleobases: adenine, cytosine, guanine and thymine. Only the four combinations AT, TA, CG and GC are possible, because A and T only and always go together, and C and G only and always go together. These four combinations can be encoded with two bits, so that’s 6.2-6.4 gigabits, or about 750 megabytes for a full, exact copy of a human genome.
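For anyone who wants to check the arithmetic, here is a minimal Python sketch of the estimate above. The function name `genome_bytes` is made up for illustration, and it assumes exactly 2 bits per base pair with no file headers or metadata.

```python
# Back-of-the-envelope storage estimate at 2 bits per base pair.
BITS_PER_BASE_PAIR = 2  # four possibilities (AT, TA, CG, GC) fit in 2 bits


def genome_bytes(base_pairs: int) -> int:
    """Bytes needed to store `base_pairs` at 2 bits each, rounded up."""
    return (base_pairs * BITS_PER_BASE_PAIR + 7) // 8


for label, bp in [("3.1 billion bp", 3_100_000_000),
                  ("3.2 billion bp", 3_200_000_000)]:
    size = genome_bytes(bp)
    # Report both conventions: 1 MB = 10**6 bytes, 1 MiB = 2**20 bytes.
    print(f"{label}: {size / 1e6:.0f} MB, {size / 2**20:.0f} MiB")
```

Depending on whether a megabyte means 10^6 or 2^20 bytes, the result comes out somewhere between about 740 and 800, which is where the "about 750 megabytes" comes from.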

Now, even if you need 750 megabytes to store the “raw data” from a human genome, a computer scientist, at least, will have a hard time calling all of it “information”. For example, if you record 74 minutes of complete silence on a CD, the disc also contains roughly 750 megabytes of “data”, but no actual “information”. Large parts of the human genome are repetitive, only a very small part actually differs between individuals, and among those differences, several base pair sequences occur in only a few well-defined variants. Depending on how you “compress” or ignore the DNA that’s not unique, you could conclude that there’s only 37.5 MB worth of DNA that’s “unique” in each sperm. But DNA isn’t a .zip file: while compression is useful when handling it as digital data, our bodies don’t work that way, so no, there is far more than 37.5 MB of information in a single sperm. A sperm cell doesn’t just contain the unique parts of a person’s genome. It contains one full set of chromosomes (23 of the 46, since we have 2 of each chromosome), and every single one of the base pairs is present.
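The silent-CD point is easy to try out. The sketch below is only an illustration under my own assumptions: it uses Python's standard `zlib` module as a stand-in for "some compressor", and 1 MB buffers instead of a full disc. It compares a buffer of zeros (digital silence) with a buffer of random bytes of the same size.

```python
import os
import zlib

SIZE = 1_000_000  # 1 MB of raw data in both cases

silence = bytes(SIZE)      # all zero bytes, like a recording of silence
noise = os.urandom(SIZE)   # random bytes, with no pattern to exploit

silence_packed = zlib.compress(silence, 9)
noise_packed = zlib.compress(noise, 9)

# Same amount of raw data, very different compressed sizes.
print("silence:", len(silence), "->", len(silence_packed), "bytes")
print("noise:  ", len(noise), "->", len(noise_packed), "bytes")

# Compression is lossless: the original comes back exactly.
assert zlib.decompress(silence_packed) == silence
```

The silence shrinks to a tiny fraction of its raw size while the noise barely shrinks at all, even though both occupy the same space uncompressed. A genome sits somewhere in between, which is why the "how many megabytes" answer depends so much on what you count.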

Anonymous 0 Comments

There are 4 possible nucleotides at each location in our DNA. 4 alternatives can be represented by 2 bits, and there are 8 bits in a byte, so that's 4 base pairs per byte. The human genome is around 3.2 billion base pairs: 3 200 000 000 / 4 = 800 000 000 bytes = 800 MB.

So to get to 37 MB you either include only the protein-coding part of the DNA, or you use the number you would get if you compressed the data in some way. Because one human's DNA is very close to any other human's DNA, you can losslessly compress it to roughly 4 megabytes.
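One way to picture the "very close to other human DNA" idea is to store only the positions where a genome differs from a shared reference. This is a toy sketch and not how real genome compressors work: the function names and the tiny sequences are hypothetical, and it assumes both sequences have the same length and differ only by single-letter substitutions.

```python
def diff_against_reference(reference: str, sample: str) -> list[tuple[int, str]]:
    """Return (position, base) pairs where `sample` differs from `reference`."""
    if len(reference) != len(sample):
        raise ValueError("this toy example only handles equal-length sequences")
    return [(i, s) for i, (r, s) in enumerate(zip(reference, sample)) if r != s]


def rebuild(reference: str, variants: list[tuple[int, str]]) -> str:
    """Reconstruct the sample from the reference plus the stored differences."""
    bases = list(reference)
    for position, base in variants:
        bases[position] = base
    return "".join(bases)


reference = "ACGTACGTAC"
sample = "ACGTTCGTAC"  # differs from the reference at index 4 only

variants = diff_against_reference(reference, sample)
print(variants)
assert rebuild(reference, variants) == sample  # nothing was lost
```

The stored list is much shorter than the sequence itself, but it is only meaningful to someone who already has the reference.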

So whether a sperm contains 37 megabytes of information depends on what you mean by information. You can get values from 800 MB down to 4 MB depending on how you look at it.

What information is, is not an easy question. What is the amount of data in the string “aaaaaaaaaa”? You could compress it to 10a, reducing it from 10 to 3 characters with no loss of information.
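The "10a" trick is usually called run-length encoding. Here is a minimal sketch of it in Python; the function names are my own, and the decoder assumes the original text contains no digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with <count><character>."""
    return "".join(f"{len(list(run))}{char}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode (assumes the original text contained no digits)."""
    pieces = []
    count = ""
    for ch in encoded:
        if ch.isdigit():
            count += ch          # still reading the run length
        else:
            pieces.append(ch * int(count))
            count = ""
    return "".join(pieces)


print(rle_encode("aaaaaaaaaa"))  # 10a
print(rle_encode("AATGCCAT"))    # 2A1T1G2C1A1T
```

Note that the second string gets longer, not shorter: a scheme like this only pays off when the data actually contains long repeats.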

EDIT: Missed that the number was for a haploid genome and a 3->4 mixup.

Anonymous 0 Comments

DNA is coded with 4 letters: A, T, G, C.

Each letter needs 2 bits, so a byte can hold 4 of these letters. A byte can contain, for example, “ATTG”.

If you know how long your data is, then you know how many bytes you need. For example, “AATGCCAT” is 8 letters long, so you need 2 bytes.
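To make the "4 letters per byte" idea concrete, here is a small Python sketch that packs a DNA string at 2 bits per letter. The particular bit assignment (A=00, C=01, G=10, T=11) and the function names are arbitrary choices for illustration, not any standard format.

```python
# Hypothetical 2-bit code for each letter; any fixed assignment would do.
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
LETTERS = "ACGT"  # index in this string == the 2-bit code above


def pack(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 letters per byte, first letter in the high bits."""
    out = bytearray((len(seq) + 3) // 4)  # round up to whole bytes
    for i, base in enumerate(seq):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= CODES[base] << shift
    return bytes(out)


def unpack(data: bytes, length: int) -> str:
    """Recover `length` letters from packed bytes."""
    return "".join(
        LETTERS[(data[i // 4] >> (6 - 2 * (i % 4))) & 0b11] for i in range(length)
    )


packed = pack("AATGCCAT")
print(len(packed))          # 2
print(unpack(packed, 8))    # AATGCCAT
```

The length has to be stored separately, because a sequence whose length is not a multiple of 4 leaves unused bits in the last byte.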

37 MB is approximately 37 million bytes. That means the genetic code must be about 4 * 37 million = 148 million letters.

A sperm has half of your genes/code. If a human has about 300 million letters, then the calculation is correct.