How Content-Aware Search Works

Content-aware search uses a signature search algorithm to identify and locate files of certain types. In general, a persistent file signature is used to detect that a file exists, and the file header is then analyzed to determine the length of the file.

Contents

  1. Detecting JPEG Images
  2. Detecting Text Files
  3. Detecting XML and HTML Documents

However, there are some exceptions to this rule. In this article, we’ll look at two extremes: a binary file format with a highly persistent structure, and a text format with no structure at all.

Detecting JPEG Images

JPEG files are easy to identify and easy to analyze. The format is well documented, so parsing a file header is generally not a problem. Let’s look, for example, at a typical JPEG file.

JPEG files have a characteristic signature and a highly structured format, making them easy to detect. All JPEG files begin with the hexadecimal value FFD8 and end with the value FFD9. These markers can appear several times within a single file, because embedded thumbnail previews of various sizes are delimited by the same values.

For example, Canon EOS 5D creates JPEG files of the following structure.

FFD8 – the beginning of the file

FFD8 – first thumbnail preview

FFD9 – end of first preview

FFD8 – second thumbnail preview

FFD9 – end of second preview

FFD9 – end of file
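The nesting shown above can be illustrated with a short sketch that pairs each FFD8 with its matching FFD9 by depth. This is only an illustration on toy data, not the method any particular recovery tool uses: the function name `pair_markers` is hypothetical, and in real files the same byte pairs can also occur by coincidence inside metadata.

```python
SOI = b"\xff\xd8"  # start-of-image marker
EOI = b"\xff\xd9"  # end-of-image marker


def pair_markers(data: bytes) -> list[tuple[int, int, int]]:
    """Return (start, end, depth) for every matched FFD8/FFD9 pair.

    Depth 0 is the outermost image; deeper pairs are embedded previews.
    `end` is the offset just past the closing marker.
    """
    pairs = []
    open_starts = []  # offsets of FFD8 markers that are not yet closed
    i = 0
    while i < len(data) - 1:
        two = data[i:i + 2]
        if two == SOI:
            open_starts.append(i)
            i += 2
        elif two == EOI and open_starts:
            start = open_starts.pop()
            pairs.append((start, i + 2, len(open_starts)))
            i += 2
        else:
            i += 1
    return pairs


# Toy layout mirroring the list above: a file with two embedded previews.
sample = SOI + b"hdr" + SOI + b"t1" + EOI + SOI + b"t2" + EOI + b"main" + EOI
for start, end, depth in pair_markers(sample):
    print(f"depth {depth}: bytes {start}..{end}")
```

The pair with depth 0 spans the whole file, while the first FFD9 encountered belongs to a preview, which is why stopping at the first end marker would truncate the image.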

As you can see, simply detecting fixed signatures is not enough. The program must analyze the file header and take the actual file structure into account. If the information stored in the file header does not match the content that follows, the recovered file may come out corrupted. Corrupted images can be repaired with a specialized tool such as RS File Repair.
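One way to take the structure into account is to walk the JPEG marker segments instead of searching for the first FFD9. The sketch below assumes the standard JPEG layout: each segment is a two-byte marker followed by a two-byte big-endian length, and compressed scan data follows the FFDA segment with FF bytes escaped as FF00. The function name `jpeg_length` is hypothetical, and cases such as previews appended after the final end marker are not handled.

```python
import struct
from typing import Optional

# Markers without a length field: TEM (0x01) and RST0-RST7 (0xD0-0xD7).
STANDALONE = {0x01, *range(0xD0, 0xD8)}


def jpeg_length(data: bytes, start: int = 0) -> Optional[int]:
    """Return the byte length of the JPEG at `start`, or None if malformed."""
    if data[start:start + 2] != b"\xff\xd8":
        return None
    pos = start + 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            return None  # expected a marker here: structure is broken
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte before a marker
            pos += 1
            continue
        if marker == 0xD9:  # end of image
            return pos + 2 - start
        if marker in STANDALONE:
            pos += 2
            continue
        if pos + 4 > len(data):
            return None
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if seg_len < 2:
            return None  # the length field counts its own two bytes
        pos += 2 + seg_len  # skip the whole segment, embedded previews included
        if marker == 0xDA:
            # Scan data: advance until a real marker (not FF00, fill, or RSTn).
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt not in (0x00, 0xFF) and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
    return None  # ran out of data before the end marker
```

Because a preview stored inside a metadata segment is skipped together with that segment, its FFD9 is never mistaken for the end of the file. A naive `data.find(b"\xff\xd9")` on the same data would stop inside the metadata.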

Detecting Text Files

Text files are at the opposite end of the spectrum. Having no persistent structure at all, text files are the most difficult to locate, but among the easiest to recover. Even fragmented text files can be recovered (if identified successfully) and combined into a single file if needed. There are no file headers or system structures to worry about.

Sometimes, no formal file headers are available (e.g. for text or HTML files), yet those files can still be recovered. In the case of text-based documents, a data recovery tool analyzes the actual data blocks, attempting to find out whether they belong to what appears to be a text file. The decision is made by analyzing the characters in the block. If a data block consists mostly of printable characters from a known character set (e.g. Western European, Unicode, or Arabic), the block is considered to belong to a text file. The end of such a text file is normally detected after a certain number of non-text symbols (binary data) appear.

Detecting XML and HTML Documents

XML and HTML documents are structured text files. They normally begin with certain tags and end with other tags. While there is no exact binary signature to look for, XML and HTML documents can be detected by looking for one of the opening tags (e.g. <html> or <?xml) and the corresponding closing tag (e.g. </html>). The lookup must be case-insensitive, as tags can be written in upper case, lower case, or a mix of both (e.g. <HtMl>). The existence of opening and closing tags allows reliable detection of the beginning and end of such documents.
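A case-insensitive tag lookup of this kind might look like the sketch below, which is limited to HTML. The specific tags in the patterns and the function name `find_html_document` are assumptions for illustration. XML would additionally require knowing the name of the root element in order to find its closing tag.

```python
import re
from typing import Optional, Tuple

# Candidate opening tags, matched regardless of letter case.
OPENING = re.compile(rb"<!doctype\s+html|<html[\s>]", re.IGNORECASE)
CLOSING = re.compile(rb"</html\s*>", re.IGNORECASE)


def find_html_document(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the first HTML document, or None."""
    opening = OPENING.search(data)
    if opening is None:
        return None
    closing = CLOSING.search(data, opening.start())
    if closing is None:
        return None
    return opening.start(), closing.end()


# Binary noise around a document written with mixed-case tags.
raw = b"\x00\x01<HtMl><body>hi</body></HTML>\xff\xfe"
span = find_html_document(raw)
if span is not None:
    print(raw[span[0]:span[1]])
```

Working on bytes rather than decoded strings lets the same search run directly over raw disk blocks that contain binary data before and after the document.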

Frequently Asked Questions

Yes, it is possible to recover deleted files if they have not been overwritten by new data.

Stop using the disk (create an image) as soon as the files have been deleted and use the professional data recovery software RS File Recovery to recover the deleted files.

This depends greatly on the capacity of your hard drive and your computer's performance. As a rough guide, most recovery operations on a 1 TB HDD take about 3-12 hours under normal conditions.

If the file does not open, it means that the file was damaged or corrupted before recovery.

Use "Preview" to evaluate the quality of the recovered file.

When you try to access the drive, you get the message "Drive is not accessible" or "You need to format the partition drive".

Your disk structure is corrupted.

In most cases, the data may still remain available. Just run the data recovery software and scan the desired partition to get it back.

Please use free versions of programs with which you can analyze the storage and view the files available for recovery.

You can save them after purchasing the program - you won't need to scan it again.
