Content-aware search uses a signature search algorithm to identify and locate files of certain types. In general, a persistent file signature is used to detect the presence of a file, and header analysis then determines the length of that file.

However, there are some exceptions to this rule. In this article, we’ll have a look at two extremes: a binary file format with a highly persistent structure, and a text format with no structure at all.

Detecting JPEG Images

JPEG files are easy to identify and easy to analyze. The format is well documented, so parsing a file header is generally not a problem. Let’s look, for example, at a typical JPEG file.

JPEG files have a characteristic signature and a highly structured format, making them easy to detect. All JPEG files begin with the hexadecimal value FFD8 and end with the value FFD9. The same markers can appear several more times inside a single file, where they delimit embedded thumbnail previews of various sizes.

For example, the Canon EOS 5D creates JPEG files with the following structure.

FFD8 – the beginning of the file

FFD8 – first thumbnail preview

FFD9 – end of first preview

FFD8 – second thumbnail preview

FFD9 – end of second preview

FFD9 – end of file
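The nesting shown above can be illustrated with a minimal Python sketch. It assumes that every FFD8 found after the start of the file opens an embedded preview and every FFD9 closes the most recently opened image; the function name `find_jpeg_end` is hypothetical. This is a heuristic only, since the same byte pairs can also occur by chance inside metadata.

```python
SOI = b"\xff\xd8"  # start-of-image marker
EOI = b"\xff\xd9"  # end-of-image marker


def find_jpeg_end(data: bytes, start: int = 0) -> int:
    """Return the offset just past the FFD9 that closes the FFD8 at `start`.

    Returns -1 if `start` is not an FFD8 or no matching FFD9 is found.
    Assumption: markers nest cleanly, as in the structure listed above.
    """
    if data[start:start + 2] != SOI:
        return -1
    depth = 0
    pos = start
    while pos < len(data) - 1:
        pair = data[pos:pos + 2]
        if pair == SOI:
            depth += 1          # outer image or an embedded preview opens
            pos += 2
        elif pair == EOI:
            depth -= 1          # innermost open image closes
            pos += 2
            if depth == 0:
                return pos      # the outer image is now closed
        else:
            pos += 1
    return -1
```

A naive search that stops at the first FFD9 would cut such a file off at the end of the first preview, while the depth counter continues to the marker that closes the outer image.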

As you can see, simply detecting fixed signatures is not enough. The program must also analyze the file header and take the actual file structure into account. If the information stored in the file header does not match the content that follows, the recovered file may come out corrupted. Corrupted images can be recovered with a specialized tool such as RS File Repair.
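As a rough illustration of header analysis, the Python sketch below walks the JPEG segment structure instead of searching for FFD9 directly. It relies on the documented JPEG layout: most segments carry a two-byte big-endian length, and compressed image data follows the start-of-scan segment (FFDA). The function name `jpeg_length_from_header` is hypothetical, and the code is a simplified sketch rather than the method used by any particular recovery tool.

```python
import struct

STANDALONE = {0x01} | set(range(0xD0, 0xD8))  # TEM and RST0..RST7 carry no length


def jpeg_length_from_header(data: bytes, start: int = 0) -> int:
    """Return the JPEG length in bytes by walking its segments, or -1 on failure."""
    if data[start:start + 2] != b"\xff\xd8":
        return -1
    n = len(data)
    pos = start + 2
    while pos + 2 <= n:
        if data[pos] != 0xFF:
            return -1                      # lost sync: structure does not match
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:                 # end of image
            return pos + 2 - start
        if marker in STANDALONE:
            pos += 2
            continue
        if pos + 4 > n:
            return -1
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if seg_len < 2:
            return -1
        pos += 2 + seg_len                 # skip payload, including any thumbnail
        if marker == 0xDA:                 # start of scan: compressed data follows
            while True:
                pos = data.find(b"\xff", pos)
                if pos == -1 or pos + 1 >= n:
                    return -1
                nxt = data[pos + 1]
                if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
                    pos += 2               # stuffed byte or restart marker
                elif nxt == 0xFF:
                    pos += 1               # fill byte
                else:
                    break                  # a real marker: resume segment walk
    return -1
```

Because metadata segments are skipped by their declared length, a thumbnail stored inside one of them does not end the scan early. If a declared length does not match the data that follows, the walk loses sync and reports failure, which mirrors the corruption case described above.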

Detecting Text Files

Text files sit at the opposite end of the spectrum. Having no persistent structure at all, text files are the most difficult to locate, yet among the easiest to recover. Even fragmented text files can be recovered (if identified successfully) and combined into a single file if needed. There are no file headers or system structures to worry about.

Sometimes no formal file headers are available (e.g. for text or HTML files), yet those files can still be recovered. For text-based documents, a data recovery tool analyzes the actual data blocks and tries to determine whether they belong to what appears to be a text file. The decision is made by analyzing the characters in each block. If a block consists mostly of valid characters from a known character set or encoding (e.g. ASCII, Western European, Unicode, or Arabic), it is considered part of a text file. The end of such a file is normally detected once a certain number of non-text bytes (binary data) appear.
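The block test described above might look like the following Python sketch. It handles only plain ASCII text; other character sets would need their own tables of valid bytes. The 90% threshold, the run length of 8 consecutive bytes, and the names `looks_like_text` and `find_text_end` are all assumptions chosen for illustration, not values taken from any real tool.

```python
# Printable ASCII plus tab, line feed, and carriage return.
TEXT_BYTES = frozenset(range(32, 127)) | {9, 10, 13}


def looks_like_text(block: bytes, threshold: float = 0.9) -> bool:
    """Return True if at least `threshold` of the bytes in `block` are text bytes."""
    if not block:
        return False
    text_count = sum(1 for b in block if b in TEXT_BYTES)
    return text_count / len(block) >= threshold


def find_text_end(data: bytes, start: int = 0, max_binary_run: int = 8) -> int:
    """Return the offset where the text starting at `start` appears to end.

    The text is considered finished once `max_binary_run` consecutive
    non-text bytes are seen; the returned offset excludes that run.
    """
    run = 0
    for pos in range(start, len(data)):
        if data[pos] in TEXT_BYTES:
            run = 0
        else:
            run += 1
            if run >= max_binary_run:
                return pos - run + 1
    return len(data) - run  # drop a short trailing run of non-text bytes
```

Requiring a run of several bytes rather than a single one keeps an isolated stray byte from splitting a text file in two.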

Detecting XML and HTML Documents

XML and HTML documents are structured text files. They normally begin with certain tags and end with others. While there is no exact binary signature to look for, such documents can be detected by searching for a known opening tag or declaration (e.g. <?xml) and the matching closing tag. The search must be case-insensitive, as HTML tags can be written in upper case, lower case, or a mix of both. The presence of opening and closing tags allows reliable detection of the beginning and end of such documents.
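A minimal Python sketch of this kind of tag lookup for HTML is shown below. The specific patterns (`<!DOCTYPE html`, `<html`, `</html>`) and the function name `find_html_document` are assumptions chosen for illustration; a real tool would likely recognize a longer list of tags.

```python
import re

# Case-insensitive patterns for the start and end of an HTML document.
HTML_OPEN = re.compile(rb"<!doctype\s+html|<html\b", re.IGNORECASE)
HTML_CLOSE = re.compile(rb"</html\s*>", re.IGNORECASE)


def find_html_document(data: bytes):
    """Return (start, end) offsets of the first HTML document in `data`, or None."""
    opening = HTML_OPEN.search(data)
    if opening is None:
        return None
    # Look for the closing tag only after the opening one.
    closing = HTML_CLOSE.search(data, opening.end())
    if closing is None:
        return None
    return opening.start(), closing.end()
```

For example, given a buffer that contains `<HtMl>...</HTML>` surrounded by binary data, the function returns the offsets of exactly that span, regardless of the letter case used in the tags.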

Frequently Asked Questions

Yes, it is possible to recover deleted files if they have not been overwritten by new data.

Stop using the disk (create an image) as soon as the files have been deleted and use the professional data recovery software RS File Recovery to recover the deleted files.

This greatly depends on the capacity of your hard drive and your computer's performance. Under normal conditions, most recovery operations on a 1 TB HDD take about 3 to 12 hours.

If the file does not open, it means that the file was damaged or corrupted before recovery.

Use "Preview" to evaluate the quality of the recovered file.

When you try to access the drive, you get the message "Drive is not accessible" or "You need to format the partition drive".

Your disk structure is corrupted.

In most cases, the data may still remain available. Just run the data recovery software and scan the desired partition to get it back.

Use the free version of the program to analyze the storage and view the files available for recovery.

You can save them after purchasing the program; you won't need to scan the storage again.

How Content-Aware Search Works


Approved by Adam Bean
