Where is the rest of a 100MB file kept when it is compressed into an 80MB zip file?

5 Answers

Anonymous 0 Comments

Imagine I have the word Mississippi. File compression is like taking that word and splitting it into (M)(iss)(iss)(ippi), then labeling each unique blob: (M) is 1, (iss) is 2, and (ippi) is 3.

Once each blob of data is labeled, you can convert Mississippi to 1223. Much smaller!

The zip file basically contains the condensed data plus an index table that maps each label back to its piece of data.

Same information, but rearranged and simplified, reducing the amount of storage needed to convey it.

Instead of spelling out the whole word, I’m providing instructions on how to recreate the word, and the instructions are smaller than the original word itself.
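Here is a rough sketch of the Mississippi idea in Python. The chunking into `M`, `iss`, `iss`, `ippi` is picked by hand to match the example above, and the `encode`/`decode` helpers are made-up names for illustration; real zip tools find repeats on their own and store them differently.

```python
# Hand-picked chunks matching the example above (an assumption, not how zip splits data).
chunks = ["M", "iss", "iss", "ippi"]

def encode(chunks):
    """Give each distinct chunk a number and return (table, codes)."""
    table = {}   # chunk -> label
    codes = []
    for chunk in chunks:
        if chunk not in table:
            table[chunk] = len(table) + 1  # labels start at 1
        codes.append(table[chunk])
    return table, codes

def decode(table, codes):
    """Rebuild the original text by looking each label up in the table."""
    by_label = {label: chunk for chunk, label in table.items()}
    return "".join(by_label[code] for code in codes)

table, codes = encode(chunks)
print(codes)                 # [1, 2, 2, 3]
print(table)                 # {'M': 1, 'iss': 2, 'ippi': 3}
print(decode(table, codes))  # Mississippi
```

Note that the table has to be stored along with the codes, so for a single short word nothing is saved. The saving shows up when the same blobs repeat many times across a large file.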

Anonymous 0 Comments

File compression is somewhat confusing in how it works, and I only know of one particular algorithm, so I’m going to explain that.

Files at the most basic level just consist of ones and zeros, aka binary. Because there are only two values, pattern detection is easy. The simplest lossless algorithm (lossless meaning no data is lost) is based on finding a repeated pattern and collapsing it. Where the file used to spell out the same thing over and over, it now holds a short code: a number that says how many times the pattern repeats, followed by one copy of the pattern. Rinse and repeat until the file is compressed.
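The scheme described above is usually called run-length encoding. A minimal sketch, assuming the data is a string of `0`/`1` characters and the "pattern" is a single repeated symbol (the function names are just for illustration):

```python
from itertools import groupby

def rle_encode(bits):
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(run)), symbol) for symbol, run in groupby(bits)]

def rle_decode(pairs):
    """Expand each (count, symbol) pair back into the original run."""
    return "".join(symbol * count for count, symbol in pairs)

data = "0" * 8 + "1" * 6 + "0" * 2
pairs = rle_encode(data)
print(pairs)                      # [(8, '0'), (6, '1'), (2, '0')]
print(rle_decode(pairs) == data)  # True
```

This only pays off when the data has long runs. On data that flips between values constantly, the counts take more room than the original.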

There are other algorithms, all better than the simple pattern compression, but those are very complicated and as I said, I don’t really know how they work.

Source: programming experience.

Anonymous 0 Comments

It’s still there. Lossless compression – like in a zip file – tries to **remove redundancy** from a file by, more or less, replacing strings with pointers to entries in a “dictionary”.

E.g., if I wanted to compress

the quick brown fox jumps over the lazy dog

I could replace `the` with `1`, the space with `2`, and then assign a number to the other words.

12324252627212829

I could then represent those numbers with some kind of variable-length encoding (the DEFLATE algorithm uses Huffman coding) so that common strings take up even less space in the output: the `1` and `2` don’t always need to take up 8 to 32 bits each.
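To make both steps concrete, here is a toy sketch: number the words and spaces, then work out Huffman code lengths for the resulting digits. This is not how DEFLATE actually finds repeats (it uses back-references into earlier data rather than a word list), and `huffman_code_lengths` is a helper written just for this example.

```python
import heapq
import re
from collections import Counter

text = "the quick brown fox jumps over the lazy dog"

# Split into words and single spaces, keeping both as tokens.
tokens = re.findall(r"\w+| ", text)

# Number each distinct token in order of first appearance: the=1, space=2, ...
labels = {}
for tok in tokens:
    labels.setdefault(tok, len(labels) + 1)
encoded = "".join(str(labels[tok]) for tok in tokens)
print(encoded)  # 12324252627212829

def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {sym: depth + 1 for sym, depth in {**a, **b}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

freqs = Counter(encoded)
lengths = huffman_code_lengths(freqs)
total_bits = sum(freqs[sym] * lengths[sym] for sym in freqs)
print(lengths["2"])  # 1  (the space is the most common token, so it gets the shortest code)
print(total_bits)    # 44, versus 17 * 8 = 136 bits at one byte per digit
```

The code table itself also has to be stored, which is why this kind of trick only wins on inputs much longer than one sentence.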

Most files contain a lot of redundancy, and as such [they can be compressed quite a bit](https://www.maximumcompression.com/): English text can be brought down to less than one third of its size. This is also why you won’t gain much by compressing already-compressed files (which includes most images and audio/video files). Compression tries to reduce redundancy, to represent the data more efficiently, but already-compressed files don’t have a lot of redundancy left to be stripped away.
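You can see this with Python's standard `zlib` module, which implements DEFLATE. The input here is made-up filler text built from a tiny vocabulary, so it is far more repetitive than real English; treat the sizes as an illustration, not a benchmark.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable
words = "the quick brown fox jumps over lazy dog".split()
original = " ".join(rng.choices(words, k=20000)).encode("ascii")

once = zlib.compress(original)   # compress the redundant text
twice = zlib.compress(once)      # try to compress the compressed output again

print(len(original), len(once), len(twice))

# Lossless: decompressing gives back exactly the original bytes.
print(zlib.decompress(once) == original)  # True
```

The first pass shrinks the text a lot. The second pass has almost nothing left to work with, so its output is about the same size as its input, or even slightly larger.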

Anonymous 0 Comments

Think of each file like a shirt. Compressing a file is like folding a shirt: you can fit more shirts in your drawer if they’re folded. The shirt is technically the same shirt whether it’s folded or not, but it is definitely more compact when folded. And just like a compressed file, you can’t wear a folded shirt. It must be unfolded to be used, and you can’t use a compressed file until you unzip it. Either way, nothing is lost: once unfolded, the shirt/file is the same size it always was.

Anonymous 0 Comments

The other comment is right. It’s not really there; it’s just an abbreviated version that gets rebuilt when you unpack it. It’s like text speak: U wil undrstnd wht I mn bcos u trnslt it bk wn u rd it.