With so many data recovery products offering to undelete your files in a matter of minutes, have you ever wondered how exactly they work, and why recovering deleted files is possible at all? In this article, we'll look at what Windows does when it deletes a file, and what those tools do to reverse it.
- How Windows Deletes Files
- The Rise of Undelete Tools
- Content-Aware Recovery
- Data Recovery is a Lie: SSD Drives
How Windows Deletes Files
When a file is deleted (whether by you via Windows Explorer, by another application, or by the operating system itself), Windows does not immediately fill its content with zeroes or otherwise destroy it. Doing so would take a lot of time. Try deleting a large file you no longer need, such as a big movie you have already watched. Then try deleting a tiny shortcut. Note that both deletions take about the same time. If Windows wiped every file it deleted, the big movie would take far longer to delete than the shortcut. In other words, wiping the content of every deleted file would slow your computer down enormously. (Note that things work differently on SSD drives; more on that later.)
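A quick estimate shows the scale involved. The 4 GiB file size and the 150 MB/s write speed below are assumptions chosen for illustration, not measurements; the point is only that wiping time grows with the file's size, while marking a file as deleted updates a small amount of file system metadata.

```python
# Back-of-the-envelope estimate of how long zero-filling a file would take.
# Both figures are illustrative assumptions, not measurements.
MOVIE_BYTES = 4 * 1024**3          # a 4 GiB movie (assumed)
WRITE_BYTES_PER_S = 150 * 10**6    # ~150 MB/s sustained hard drive write speed (assumed)

def wipe_seconds(size_bytes: int, write_bytes_per_s: int) -> float:
    """Seconds needed to overwrite every byte of a file once (seek time ignored)."""
    return size_bytes / write_bytes_per_s

print(f"Wiping the movie: about {wipe_seconds(MOVIE_BYTES, WRITE_BYTES_PER_S):.0f} seconds")
```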
So if Windows does not actually erase the file, what DOES it do when deleting? Just one simple thing: it marks the file system record that points to that file as… deleted. No longer existing. Not there, empty, not pointing anywhere. As a result, the CONTENT of your file is still sitting somewhere on your disk (unless it's an SSD), but the FILE SYSTEM RECORD pointing to it is either marked as deleted, with the file's chain of allocated clusters released (FAT32), or flagged as no longer in use (NTFS). Of course, I have oversimplified things a great deal, but at this point we won't need any more technical details. Let's move on to the recovery.
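A toy model makes the idea concrete. The `ToyDisk` and `Record` classes below are hypothetical and far simpler than FAT32 or NTFS; they only show that deletion flips a flag in the file system record while the data blocks stay exactly as they were.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """Hypothetical file system record: a name, the blocks it owns, an in-use flag."""
    name: str
    blocks: list[int]
    in_use: bool = True

@dataclass
class ToyDisk:
    """A deliberately simplified disk: raw data blocks plus a table of records."""
    blocks: list[bytes]
    records: list[Record] = field(default_factory=list)

    def delete(self, name: str) -> None:
        # Deletion only clears the record's in-use flag; the data blocks are untouched.
        for rec in self.records:
            if rec.name == name and rec.in_use:
                rec.in_use = False
                return
        raise FileNotFoundError(name)

    def listing(self) -> list[str]:
        """What a directory listing would show: only records still in use."""
        return [r.name for r in self.records if r.in_use]

disk = ToyDisk(blocks=[b"MOVIE-PART-1", b"MOVIE-PART-2", b"", b""],
               records=[Record("movie.mkv", [0, 1])])
disk.delete("movie.mkv")
print(disk.listing())    # []
print(disk.blocks[0:2])  # [b'MOVIE-PART-1', b'MOVIE-PART-2']
```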
The Rise of Undelete Tools
Okay, so we've established that the file still exists somewhere on the disk, but the corresponding record in the file system marks it as “deleted”. Can't we just un-tick that mark?
That was the very idea behind one of the first data recovery tools for Microsoft's MS-DOS. The tool was called “undelete”, and it did just that: it scanned the file system for files marked as “deleted” and reset the corresponding flag. (Back then, deleted files lost the first character of their name, so you would get “~ocument.txt” instead of “document.txt” after undeleting.)
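On FAT file systems the “deleted” mark is concrete: the first byte of the file's 32-byte directory entry is overwritten with 0xE5, which is why the original first character is lost. The sketch below builds and parses a simplified FAT32 entry (only the name, extension, starting cluster and size fields are filled in) and shows roughly what an undelete tool has to do: spot the 0xE5 marker and put a user-supplied first character back. The `make_entry`, `parse_entry` and `undelete_entry` helpers are hypothetical, and a real tool would also have to rebuild the file's cluster chain in the allocation table, which this sketch leaves out.

```python
import struct

DELETED_MARKER = 0xE5  # first byte of a FAT directory entry whose file was deleted

def make_entry(name83: bytes, first_cluster: int, size: int) -> bytes:
    """Build a simplified 32-byte FAT32 directory entry (most fields left zero)."""
    entry = bytearray(32)
    entry[0:11] = name83                                        # 8.3 name, e.g. b"DOCUMENTTXT"
    entry[11] = 0x20                                            # attribute byte: archive
    struct.pack_into("<H", entry, 20, first_cluster >> 16)      # high word of start cluster
    struct.pack_into("<H", entry, 26, first_cluster & 0xFFFF)   # low word of start cluster
    struct.pack_into("<I", entry, 28, size)                     # file size in bytes
    return bytes(entry)

def parse_entry(entry: bytes) -> dict:
    """Decode the fields this sketch cares about."""
    hi, = struct.unpack_from("<H", entry, 20)
    lo, = struct.unpack_from("<H", entry, 26)
    size, = struct.unpack_from("<I", entry, 28)
    return {
        "deleted": entry[0] == DELETED_MARKER,
        "name": entry[0:8].rstrip(b" "),
        "ext": entry[8:11].rstrip(b" "),
        "first_cluster": (hi << 16) | lo,
        "size": size,
    }

def undelete_entry(entry: bytes, first_char: bytes) -> bytes:
    """Hypothetical helper: restore the lost first character of a deleted entry."""
    if entry[0] != DELETED_MARKER or len(first_char) != 1:
        raise ValueError("entry not deleted, or first_char is not a single byte")
    return first_char + entry[1:]

entry = make_entry(b"DOCUMENTTXT", first_cluster=1234, size=5000)
deleted = bytes([DELETED_MARKER]) + entry[1:]               # what deletion does to the entry
print(parse_entry(deleted)["deleted"])                      # True
print(parse_entry(undelete_entry(deleted, b"D"))["name"])   # b'DOCUMENT'
```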
What's wrong with this approach? Quite a lot. For one, the file system in a modern multitasking operating system is constantly changing. New temporary files are created, and log and registry files are written to by the many system tasks and background applications all the time. As a result, the original file system records pointing to the deleted file quickly become obsolete, as the operating system reclaims some or all of them to store information about new files.
Note that, at this point, the original deleted file may still sit on the disk, with all of its former disk space still unclaimed, while the file system record pointing to it may no longer exist.
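The scenario in the paragraph above can be simulated with a small toy model. The record table, block list and allocator below are hypothetical, not any real on-disk format; in particular, the allocator takes free blocks from the end of the disk purely so the example shows the case where the old data survives. A real allocator may just as well overwrite it.

```python
# Toy model (hypothetical, not a real on-disk format): a small record table and
# a list of data blocks. A record is (name, block_numbers), or None when free.
blocks = [b"photo-part-1", b"photo-part-2", b"", b""]
records = [("photo.jpg", [0, 1]), None]
used_blocks = {0, 1}

def delete(slot: int) -> None:
    _name, owned = records[slot]
    records[slot] = None                  # the record slot becomes free for reuse...
    used_blocks.difference_update(owned)  # ...and so do the blocks, which are NOT wiped

def create(name: str, data: list[bytes]) -> None:
    slot = records.index(None)            # reuse the first free record slot
    free = [i for i in range(len(blocks)) if i not in used_blocks]
    if len(data) > len(free):
        raise OSError("disk full")
    # Toy allocation policy: take free blocks from the end of the disk.
    chosen = free[len(free) - len(data):]
    for i, chunk in zip(chosen, data):
        blocks[i] = chunk
    used_blocks.update(chosen)
    records[slot] = (name, chosen)

delete(0)                     # the user deletes photo.jpg
create("log.tmp", [b"log"])   # a background task writes a temporary file
print(records)     # [('log.tmp', [3]), None]
print(blocks[:2])  # [b'photo-part-1', b'photo-part-2']
```

The record that once described `photo.jpg` now describes `log.tmp`, so nothing in the file system points to the photo any more, yet both of its blocks are intact.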
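See the problem? A tool that relies only on file system records cannot find such a file, even though its content is intact. That's why we are now at the second generation of data recovery tools.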
The next generation of data recovery tools no longer relies solely on data obtained from the file system. Modern tools still scan the file system, and will happily recover deleted files if information about them is still available, but they can also use other sources to discover deleted files. Meet content-aware recovery!
Content-Aware Recovery
Modern data recovery products such as RS File Recovery or RS Partition Recovery go to extraordinary lengths to undelete your data. They scan the entire surface of your disk looking for traces of deleted data. What exactly do they look for? Well, here's how it works.
First, a content-aware tool scans the file system. Even if no information is available about the file you deleted, scanning the file system lets the tool build a map of the disk blocks that belong to other (existing) files and exclude them from further scanning. This is smart: a typical consumer hard disk is more than half full, so spending a few seconds scanning the file system can save an hour or more of scanning the actual disk.
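Here is a minimal sketch of that exclusion step, assuming the file system scan has already produced a list of extents for every existing file. The `blocks_to_scan` function and its `(first_block, block_count)` extent format are hypothetical; real tools would derive this information from structures such as FAT cluster chains or NTFS data runs.

```python
def blocks_to_scan(total_blocks: int, file_extents: list[tuple[int, int]]) -> list[int]:
    """Return the numbers of blocks not owned by any existing file.

    file_extents holds (first_block, block_count) pairs collected while scanning
    the file system (a hypothetical format chosen for this sketch).
    """
    occupied = bytearray(total_blocks)  # occupancy bitmap: 1 = owned by a live file
    for start, count in file_extents:
        end = min(start + count, total_blocks)  # clamp extents that run past the end
        occupied[start:end] = b"\x01" * max(end - start, 0)
    return [i for i in range(total_blocks) if not occupied[i]]

# Blocks 0-2 and 5-6 belong to existing files, so only the rest needs a deep scan.
print(blocks_to_scan(10, [(0, 3), (5, 2)]))  # [3, 4, 7, 8, 9]
```

A bitmap with one entry per block keeps the lookup cheap even on a large disk; a production tool would more likely track free ranges than individual block numbers.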
Next, the tool starts reading the disk block by block, skipping the sectors marked as occupied by other files. Each block is checked against a database of characteristic signatures that identify the block as the beginning of a certain type of file. For example, JPEG images always start with the bytes FF D8 FF (the text “JFIF” or “Exif” usually follows a few bytes later), while PDF files begin with “%PDF-”. (Of course, this is an oversimplification again; most real signatures are binary rather than readable text.)
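The signature check can be sketched as follows. The 512-byte block size, the `find_candidates` helper and the tiny signature table are assumptions for illustration; real tools carry a much larger signature database and may scan at cluster rather than sector granularity.

```python
BLOCK_SIZE = 512  # assumed sector size

# A small illustrative subset of well-known file signatures ("magic numbers").
SIGNATURES = {
    b"\xFF\xD8\xFF": "jpeg",
    b"%PDF-": "pdf",
    b"\x89PNG\r\n\x1a\n": "png",
    b"PK\x03\x04": "zip",  # also the container format of DOCX/XLSX files
}

def find_candidates(image: bytes, blocks: list[int]) -> list[tuple[int, str]]:
    """Check the start of each given block against the known signatures."""
    hits = []
    for b in blocks:
        start = b * BLOCK_SIZE
        head = image[start:start + 16]
        for magic, kind in SIGNATURES.items():
            if head.startswith(magic):
                hits.append((b, kind))
                break
    return hits

# Usage: a fake 4-block disk image with a PDF header in block 1 and a JPEG header in block 3.
image = bytearray(4 * BLOCK_SIZE)
image[1 * BLOCK_SIZE:1 * BLOCK_SIZE + 5] = b"%PDF-"
image[3 * BLOCK_SIZE:3 * BLOCK_SIZE + 3] = b"\xFF\xD8\xFF"
print(find_candidates(bytes(image), [0, 1, 2, 3]))  # [(1, 'pdf'), (3, 'jpeg')]
```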
If a signature match is found, the tool performs a secondary check to determine whether it is a random occurrence or the actual beginning of a file. This secondary check is essentially a validation of the file header for that particular format. If the check passes, the data recovery tool parses the header to determine the length of the file (or, for formats whose header doesn't record it, follows the file's internal structure). Having calculated the file's length, the tool reads the required number of blocks from the disk and saves them to a new file on another disk.
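As an illustration of the secondary check and the length calculation, the sketch below uses the BMP format, chosen only because its header stores the total file size directly (a little-endian 32-bit value at offset 2). The `bmp_length` and `carve` helpers and the particular set of checks are hypothetical; real products use their own, more thorough validation.

```python
import struct
from typing import Optional

BLOCK_SIZE = 512
COMMON_DIB_HEADER_SIZES = {12, 40, 108, 124}  # core, INFO, V4 and V5 headers

def bmp_length(head: bytes) -> Optional[int]:
    """Validate a BMP header; return the file length it declares, or None."""
    if len(head) < 18 or head[:2] != b"BM":
        return None
    # Bytes 2-17: file size, two reserved words, pixel-data offset, DIB header size.
    size, res1, res2, pixel_offset, dib_size = struct.unpack_from("<IHHII", head, 2)
    # Secondary checks: a stray "BM" inside other data rarely satisfies all of them.
    if res1 != 0 or res2 != 0:
        return None
    if dib_size not in COMMON_DIB_HEADER_SIZES:
        return None
    if not (14 + dib_size <= pixel_offset < size):
        return None
    return size

def carve(image: bytes, start_block: int) -> Optional[bytes]:
    """Copy the declared number of bytes, assuming the file is stored contiguously."""
    start = start_block * BLOCK_SIZE
    length = bmp_length(image[start:start + BLOCK_SIZE])
    if length is None or start + length > len(image):
        return None
    return image[start:start + length]

# Usage: a fake 1000-byte BMP placed at block 1 of a small disk image.
fake_bmp = b"BM" + struct.pack("<IHHII", 1000, 0, 0, 54, 40) + bytes(1000 - 18)
image = bytes(BLOCK_SIZE) + fake_bmp + bytes(BLOCK_SIZE)
print(len(carve(image, 1)))           # 1000
print(bmp_length(b"BM" + bytes(16)))  # None: the random "BM" fails the checks
```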
Sounds great, but… can you see a problem? This method works perfectly well if all files on your disk are stored in solid, contiguous chunks. But what if they are scattered around the disk in multiple small pieces, or, in other words, fragmented? If this is the case, content-aware recovery may fail. To find out more about recovering fragmented files, read our other article: Recovering Fragmented Files.
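A tiny simulation shows why. The 4-byte blocks and the block layout below are invented for the example; the carver assumes, like the method described above, that the file continues in the blocks immediately after its header.

```python
BLOCK_SIZE = 4  # tiny blocks keep the example readable

# A 3-block file written in two fragments: blocks 1-2 and block 5.
# Blocks 3-4 belong to some other file that was written in between.
disk = [b"????", b"HEAD", b"ER+D", b"xxxx", b"yyyy", b"ATA!", b"????"]
original = b"HEADER+DATA!"

def carve_contiguous(disk: list[bytes], start: int, length: int) -> bytes:
    """What a simple carver does: assume the file continues in the next blocks."""
    n_blocks = -(-length // BLOCK_SIZE)  # ceiling division
    return b"".join(disk[start:start + n_blocks])[:length]

carved = carve_contiguous(disk, 1, len(original))
print(carved)              # b'HEADER+Dxxxx'
print(carved == original)  # False: the third block belongs to another file
```

The header is recovered correctly, and so is the declared length, but the data past the first fragment comes from an unrelated file.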
Data Recovery is a Lie: SSD Drives
Remember how I said that Windows deletes files without actually zeroing their content? I lied. The content of deleted files does indeed remain untouched, but only on traditional (magnetic) hard drives and some types of solid-state media such as memory cards and USB flash drives. But what about super-fast SSD drives? Well, that's another story.
When you delete a file from an SSD drive, Windows still does not erase its content, even though the drive is many times faster than a magnetic hard drive. The procedure is exactly the same, with one addition: Windows sends a “TRIM” command to the drive, basically telling its controller that certain data blocks are no longer in use. From then on, the SSD drive is free to do anything it wants with those blocks. It can wipe them to make subsequent writes faster, reassign them to other logical addresses to spread wear evenly (“wear leveling”), or even move them into a special non-addressable pool of reserved blocks (again, for the purpose of wear leveling).
As a result, the content of a file deleted from an SSD drive is as good as gone the moment it's deleted. Sure, its content may physically remain somewhere on the SSD for some time. However, the SSD controller will return zeroes (or other meaningless filler) if you try to read a trimmed block, and there is no way around this, no matter how low-level you go. Even the specialized imaging hardware used in many forensic labs fails to recover trimmed data.
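A toy controller model illustrates the point. `ToySSD` is hypothetical: it assumes the drive returns zeroes for trimmed blocks (behaviour often described as "deterministic read zero after TRIM"; not every drive guarantees it) and it leaves out garbage collection, which would eventually erase the stale page. The key idea is that every read goes through the controller's mapping table, so once the mapping is gone, the old page is unreachable through the normal read path even though its contents may linger.

```python
class ToySSD:
    """Hypothetical SSD model: a logical-to-physical mapping table over flash pages.

    Assumes the drive returns zeroes for trimmed (unmapped) logical blocks and
    omits garbage collection, which would eventually erase stale pages.
    """

    PAGE_SIZE = 8

    def __init__(self, pages: int) -> None:
        self.flash = [bytes(self.PAGE_SIZE) for _ in range(pages)]  # physical pages
        self.l2p: dict[int, int] = {}  # logical block address -> physical page
        self.next_free = 0

    def write(self, lba: int, data: bytes) -> None:
        if len(data) > self.PAGE_SIZE or self.next_free >= len(self.flash):
            raise ValueError("data too large or no free pages left")
        page = self.next_free  # data goes to a fresh page, not overwritten in place
        self.next_free += 1
        self.flash[page] = data.ljust(self.PAGE_SIZE, b"\x00")
        self.l2p[lba] = page

    def trim(self, lba: int) -> None:
        self.l2p.pop(lba, None)  # drop the mapping; the physical page is untouched for now

    def read(self, lba: int) -> bytes:
        page = self.l2p.get(lba)
        return bytes(self.PAGE_SIZE) if page is None else self.flash[page]

ssd = ToySSD(pages=4)
ssd.write(0, b"secret")
ssd.trim(0)           # models the TRIM sent after Windows deletes the file
print(ssd.read(0))    # b'\x00\x00\x00\x00\x00\x00\x00\x00'
print(ssd.flash[0])   # b'secret\x00\x00' (still physically present)
```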
Is everything lost? Not quite. Some SSD drives don't support TRIM, and even on those that do, TRIM may not be properly enabled. If your SSD drive is formatted with FAT32 (or exFAT) instead of NTFS, Windows will not send the TRIM command. And if the file system gets corrupted, any lost data will not be trimmed. So it's still worth running a good data recovery tool such as RS Partition Recovery on your SSD to see what's still available.