How Content-Aware Search Works

Content-aware search uses an implementation of a signature search algorithm in order to identify and locate files of certain types. In general, a persistent file signature is used to detect the very existence of a file, then header analysis is performed in order to determine the length of the file.

Contents

  1. Detecting JPEG Images
  2. Detecting Text Files
  3. Detecting XML and HTML Documents

However, there are some exceptions to this rule. In this article, we’ll look at two extremes: a binary file format with a highly persistent structure, and a text format with no structure at all.

Detecting JPEG Images

JPEG files are easy to identify and easy to analyze. The format is well documented, so parsing a file header is generally not a problem. Let’s look, for example, at a typical JPEG file.

JPEG files have a characteristic signature and a highly structured format, making them easy to detect. All JPEG files begin with the hexadecimal value FFD8 and end with the value FFD9. These markers can appear several times within a single file, because embedded thumbnail previews of various sizes are delimited by the same pair of values.

For example, the Canon EOS 5D creates JPEG files with the following structure.

FFD8 – the beginning of the file

FFD8 – first thumbnail preview

FFD9 – end of first preview

FFD8 – second thumbnail preview

FFD9 – end of second preview

FFD9 – end of file

As you can see, simply detecting fixed signatures is not enough: the first FFD9 value encountered closes a thumbnail preview, not the file itself. The program must analyze the file header and follow the actual file structure. If the information stored in the file header does not match the content that follows, the recovered file may come out corrupted. Corrupted images can be repaired with a specialized tool such as RS File Repair.
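
The nesting shown above can be illustrated with a short Python sketch. It is not how any particular recovery tool works: it only counts FFD8/FFD9 pairs to find the end marker that closes the outermost image, and the function name `find_jpeg_end` is hypothetical. A real parser would walk the JPEG segment headers instead, since these two-byte values may also occur inside metadata payloads.

```python
SOI = b"\xff\xd8"  # start-of-image marker (FFD8)
EOI = b"\xff\xd9"  # end-of-image marker (FFD9)


def find_jpeg_end(data: bytes, start: int = 0) -> int:
    """Return the offset just past the FFD9 that closes the FFD8 at `start`.

    Returns -1 if `start` is not an FFD8 marker or no matching FFD9 exists.
    Simplifying assumption: every FFD8/FFD9 byte pair is a real marker.
    """
    if data[start:start + 2] != SOI:
        return -1
    depth = 0
    pos = start
    while pos < len(data) - 1:
        pair = data[pos:pos + 2]
        if pair == SOI:
            depth += 1          # entering the file or an embedded preview
            pos += 2
        elif pair == EOI:
            depth -= 1          # leaving a preview or the file itself
            pos += 2
            if depth == 0:
                return pos      # the outermost image is now closed
        else:
            pos += 1
    return -1


# Synthetic data mimicking the layout above: two previews inside one file.
sample = (SOI + b"hdr" + SOI + b"t1" + EOI + SOI + b"t2" + EOI
          + b"img" + EOI + b"junk")
naive_end = sample.find(EOI) + 2     # stops at the end of the first preview
real_end = find_jpeg_end(sample)     # stops at the end of the file
```

On this sample, the naive search cuts the file off right after the first thumbnail, while the depth-counting version keeps everything up to the final end marker and leaves the trailing bytes out.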

Detecting Text Files

Text files sit at the opposite end of the spectrum. Having no persistent structure at all, text files are the most difficult to locate, but among the easiest to recover. Even fragmented text files can be recovered (if identified successfully) and combined into a single file if needed. There are no file headers or system structures to worry about.

Sometimes no formal file headers are available (e.g. for text or HTML files), yet those files can still be recovered. In the case of text-based documents, a data recovery tool analyzes the actual data blocks, trying to determine whether they belong to what appears to be a text file. The decision is made by analyzing the characters the block contains. If a data block consists mostly of valid text characters from a known character set (e.g. ASCII, Western European, Unicode or Arabic), the block is considered to belong to a text file. The end of such a text file is normally detected once a certain number of non-text symbols (binary data) appear.
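
A minimal Python sketch of this block-level decision follows. It is a simplification under stated assumptions: only plain ASCII is checked, and the 95% threshold, the 512-byte block size and the function names `looks_like_text` and `text_extent` are illustrative choices, not values taken from any recovery product. Supporting other character sets would require decoding the block in each candidate encoding.

```python
def looks_like_text(block: bytes, threshold: float = 0.95) -> bool:
    """Guess whether a data block belongs to a text file.

    Counts printable ASCII bytes plus tab, line feed and carriage return,
    and compares their share of the block with `threshold` (an assumption).
    """
    if not block:
        return False
    printable = sum(
        1 for b in block if 32 <= b <= 126 or b in (9, 10, 13)
    )
    return printable / len(block) >= threshold


def text_extent(data: bytes, block_size: int = 512) -> int:
    """Return how many leading bytes of `data` look like text.

    Walks the data block by block and stops at the first block that
    fails the check, treating it as the start of binary data.
    """
    end = 0
    while end < len(data) and looks_like_text(data[end:end + block_size]):
        end = min(end + block_size, len(data))
    return end


# Two blocks of text followed by one block of binary data.
data = b"A line of plain text.\r\n" * 40 + bytes(range(256)) * 2
text_bytes = text_extent(data, block_size=460)
```

Checking whole blocks rather than single bytes tolerates the occasional stray symbol inside a text file, at the cost of locating the end of the file only to block precision.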

Detecting XML and HTML Documents

XML and HTML documents are structured text files. They normally begin with certain tags and end with other tags. While there is no exact binary signature to look for, XML and HTML documents can be detected by looking for one of the characteristic opening tags (e.g. <?xml) and the corresponding closing tags. The lookup must be case-insensitive, as tags can be written in upper case, lower case or a mix of both. The very existence of opening and closing tags allows reliable detection of the beginning and end of such documents.
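
As a rough illustration, a case-insensitive tag lookup could be sketched in Python as below. The tag patterns (`<!DOCTYPE html`, `<html`, `</html>`) and the function name `find_html_document` are assumptions chosen for the example. XML documents would need a different approach for the end of the file, because the closing tag depends on the document's root element.

```python
import re

# Illustrative tag patterns (assumptions); re.IGNORECASE covers tags
# written in upper, lower or mixed case.
OPENING = re.compile(rb"<!doctype\s+html|<html", re.IGNORECASE)
CLOSING = re.compile(rb"</html\s*>", re.IGNORECASE)


def find_html_document(data: bytes):
    """Return (start, end) offsets of the first HTML document in `data`.

    The document starts at the first opening tag and ends just past the
    first closing tag that follows it. Returns None if either is missing.
    """
    opening = OPENING.search(data)
    if opening is None:
        return None
    closing = CLOSING.search(data, opening.end())
    if closing is None:
        return None
    return opening.start(), closing.end()


raw = b"\x00\x01junk<HtMl><body>hi</body></HTML>\xff\xfe"
span = find_html_document(raw)
document = raw[span[0]:span[1]]
```

Searching raw bytes rather than decoded text keeps the sketch usable on disk blocks that also contain binary data before or after the document.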

Frequently Asked Questions

Yes, it is possible to recover deleted files if they have not been overwritten by new data.

Stop using the disk (create an image) as soon as the files have been deleted and use the professional data recovery software RS File Recovery to recover the deleted files.

This greatly depends on the capacity of your hard drive and your computer's performance. Under normal conditions, most recovery operations on a 1 TB HDD take about 3-12 hours.

If the file does not open, it means that the file was damaged or corrupted before recovery.

Use "Preview" to evaluate the quality of the recovered file.

When you try to access the drive, you get the message "Drive is not accessible" or "You need to format the partition drive".

Your disk structure is corrupted.

In most cases, the data may still remain available. Just run the data recovery software and scan the desired partition to get it back.

Please use free versions of programs with which you can analyze the storage and view the files available for recovery.

You can save them after purchasing the program - you won't need to scan it again.
