Fragmentation Vs. Defragmentation

We’ve all heard of defragmentation. But first, what is data fragmentation? One of the best explanations is the analogy Roberto Di Cosmo used in a 1998 conference talk: your hard drive (or any other storage device) is like a shelf divided into boxes. All the boxes are the same size, and you use the shelf to store folders and files. When the shelf is empty, it is easy to put a folder in a box. If the folder is too large to fit in one box, you split it and store the excess in the next box. You can do that as long as you have enough space left.

However, data on a computer, especially data used by programs, changes size all the time. Files grow, get deleted or are moved. So very quickly your shelf becomes a mess. Some boxes are half-empty, while others have no room for a growing folder. Suppose there are no free boxes left at the bottom of the shelf (you started filling from the top) but you still need to store a new folder. You have to look for free space in the earlier boxes, so the folder ends up split and stored between parts of other folders. You can imagine how hard it becomes to fetch the entire folder from the shelf. Even if you wrote down where you stored the different parts, you still have to visit several boxes to gather them all.
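To make the shelf analogy concrete, here is a minimal Python sketch. It is a toy model only: the names `store`, `remove` and `count_fragments` are made up for this illustration, and the "fill the first free boxes from the top" rule is an assumption that mirrors the story above, not how any real file system allocates space.

```python
def store(shelf, name, size):
    """Put `size` parts of folder `name` into the first free boxes, top to bottom."""
    free = [i for i, box in enumerate(shelf) if box is None]
    if len(free) < size:
        raise RuntimeError("not enough free boxes on the shelf")
    for i in free[:size]:
        shelf[i] = name
    return free[:size]


def remove(shelf, name):
    """Empty every box that holds a part of folder `name`."""
    for i, box in enumerate(shelf):
        if box == name:
            shelf[i] = None


def count_fragments(shelf, name):
    """Count the separate runs of adjacent boxes that hold folder `name`."""
    runs = 0
    previous = None
    for box in shelf:
        if box == name and previous != name:
            runs += 1
        previous = box
    return runs


# Fill an 8-box shelf completely, from the top.
shelf = [None] * 8
store(shelf, "A", 2)
store(shelf, "B", 1)
store(shelf, "C", 2)
store(shelf, "D", 1)
store(shelf, "E", 2)

# Two small folders are deleted, leaving two isolated holes.
remove(shelf, "B")
remove(shelf, "D")

# A new two-part folder has to be split across those holes.
store(shelf, "F", 2)
print(shelf)
print("fragments of F:", count_fragments(shelf, "F"))
```

Folder F ends up in two pieces even though the shelf had exactly enough room for it, which is the situation described above.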

You can now imagine the pain of your computer searching for a file when the disk is badly fragmented. Compared to your processor’s speed, the time your hard drive needs to collect a fragmented folder is a small eternity. To put an end to the suffering and the delays, we use defragmentation. It does basically what it sounds like: it takes everything out and tries to put all the folders back in order, getting rid of the wasted space and storing the divided parts next to each other again.
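In the same toy shelf model, defragmentation can be sketched as a simple repacking pass. The `defragment` function below is hypothetical and ignores everything a real tool has to worry about (moving data safely, files in use, limited scratch space); it only shows the end result the paragraph describes.

```python
def defragment(shelf):
    """Return a new shelf where each folder's parts are adjacent and free boxes come last."""
    # Remember the folders in the order they first appear on the shelf.
    order = []
    for box in shelf:
        if box is not None and box not in order:
            order.append(box)

    # Put every folder back as one contiguous run.
    packed = []
    for name in order:
        packed.extend([name] * shelf.count(name))

    # Whatever is left over becomes free space at the bottom of the shelf.
    packed.extend([None] * (len(shelf) - len(packed)))
    return packed


fragmented = ["A", "F", None, "C", "F", None, "A", "F"]
print(defragment(fragmented))
```

Every folder is back in one piece and the two free boxes sit together at the end, ready for the next large folder.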

Why it does not concern Linux

Linux does not face the shelf problem, at least not to the same extent. This is due to the type of file system created specifically for Linux: ext4. Like other file systems, ext4 manages the data and the space on a hard drive, but it also does its best to prevent fragmentation in the first place. Going back to the shelf, when you store a folder in a box, ext4 automatically books the neighboring boxes. It tries to anticipate the folder growing, and it actually does so quite well. That way, folders rarely need to be divided and the shelf stays in order. The downside is that this method requires a lot of free space. If there are no boxes left on the shelf, ext4 has no choice but to go back to the old method of filling the holes. This can happen when you have less than about 20% of free space left on your hard drive. So in general your hard drive is not fragmented, and when it is, fragmentation usually stays below 3%.
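The effect of "booking the neighboring boxes" can be shown with another toy sketch. Two folders, A and B, grow one part at a time, taking turns. The function `write_alternating` and its `reserve` parameter are invented for this illustration and are a heavy simplification; this is not ext4's actual allocator, only the general idea of setting space aside for a folder before it needs it.

```python
def count_fragments(shelf, name):
    """Count the separate runs of adjacent boxes that hold folder `name`."""
    runs = 0
    previous = None
    for box in shelf:
        if box == name and previous != name:
            runs += 1
        previous = box
    return runs


def write_alternating(shelf_size, parts, reserve):
    """Let folders A and B each grow to `parts` boxes, one box at a time in turn.

    `reserve` is how many boxes get booked for a folder whenever it runs out
    of booked space (1 means no look-ahead at all).
    """
    shelf = [None] * shelf_size
    booked = {"A": [], "B": []}
    for _ in range(parts):
        for name in ("A", "B"):
            if not booked[name]:
                # Book the next free boxes that nobody else has booked yet.
                taken = {i for boxes in booked.values() for i in boxes}
                free = [i for i, box in enumerate(shelf)
                        if box is None and i not in taken]
                if not free:
                    raise RuntimeError("shelf is full")
                booked[name] = free[:max(1, reserve)]
            shelf[booked[name].pop(0)] = name
    return shelf


no_reservation = write_alternating(shelf_size=8, parts=3, reserve=1)
with_reservation = write_alternating(shelf_size=8, parts=3, reserve=4)

print(no_reservation, count_fragments(no_reservation, "A"))
print(with_reservation, count_fragments(with_reservation, "A"))
```

Without reservation the two folders end up interleaved, each in three pieces. With four boxes booked per folder, each stays in one piece, but two boxes sit unused, which hints at why this approach wants plenty of free space.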

Conclusion

YES, there can be fragmentation on Linux, but NO, you do not have to do anything about it. The only advice I will give you is to manage your hard drive well, use LVM if you can, and leave more than 20% of free space at all times. If for some reason you suspect heavy fragmentation, the simplest solution is to move everything to a separate device and copy it back. Ext4 should do the rest.

Do you have another tip against fragmentation? Or another question about the subject? Please let us know in the comments.

Image credit: Storage by BigStockPhoto