How Content-Aware Search Works

Content-aware search uses a signature search algorithm to identify and locate files of certain types. In general, a persistent file signature is used to detect the existence of a file, and header analysis is then performed to determine the file’s length.

Contents

  1. Detecting JPEG Images
  2. Detecting Text Files
  3. Detecting XML and HTML Documents

However, there are some exceptions to this rule. In this article, we’ll look at two extremes: a binary file format with a highly persistent structure, and a text format with no structure at all.

Detecting JPEG Images

JPEG files are easy to identify and easy to analyze. The format is well documented, so parsing a file header is generally not a problem. Let’s look, for example, at a typical JPEG file.

JPEG files have a characteristic signature and a highly structured format. Every JPEG file begins with the hexadecimal value FFD8 and ends with the value FFD9. The same pair of signatures can appear several more times inside a single file, where it delimits embedded thumbnail previews of various sizes.

For example, the Canon EOS 5D creates JPEG files with the following structure:

FFD8 – the beginning of the file

FFD8 – first thumbnail preview

FFD9 – end of first preview

FFD8 – second thumbnail preview

FFD9 – end of second preview

FFD9 – end of file
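To see why this layout is awkward for a purely signature-based scan, here is a minimal Python sketch that lists every FFD8/FFD9 pair in a buffer. The `find_markers` helper and the synthetic `sample` bytes are illustrative assumptions, not real camera output or the code of any particular recovery tool.

```python
import re

SOI = b"\xff\xd8"  # start-of-image signature
EOI = b"\xff\xd9"  # end-of-image signature

def find_markers(data: bytes):
    """Return (offset, name) for every SOI/EOI signature, in file order."""
    hits = []
    for m in re.finditer(rb"\xff[\xd8\xd9]", data):
        hits.append((m.start(), "SOI" if m.group() == SOI else "EOI"))
    return hits

# Synthetic stand-in for the layout above: an outer image with two previews.
sample = SOI + b"hdr" + SOI + b"t1" + EOI + SOI + b"t2" + EOI + b"main" + EOI
markers = find_markers(sample)
names = [name for _, name in markers]

# The first EOI belongs to the first preview, not to the file itself.
first_eoi = next(off for off, name in markers if name == "EOI")
```

A carver that stops at `first_eoi` would keep only the bytes up to the end of the first preview and lose the main image.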

As you can see, simply detecting fixed signatures is not enough: a tool that stops at the first FFD9 would cut the file off at the end of the first preview. The program must analyze the file header and follow the actual file structure. If the information stored in the header does not match the content that follows, the recovered file may come out corrupted. Corrupted images can be recovered with a specialized tool such as RS File Repair.
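One way to follow the file structure is to walk the JPEG marker segments instead of searching for the end signature. The sketch below assumes the standard segment layout (a marker followed by a two-byte big-endian length that counts itself) and assumes that previews are stored inside a length-prefixed segment, as in the synthetic sample. The `jpeg_length` function is a hypothetical helper, not the algorithm of any specific product.

```python
import struct

# Markers that carry no length field: TEM (0x01) and RST0-RST7 (0xD0-0xD7).
STANDALONE = {0x01} | set(range(0xD0, 0xD8))

def jpeg_length(data: bytes, start: int = 0):
    """Return the length of the JPEG at `start`, or None if the structure breaks."""
    if data[start:start + 2] != b"\xff\xd8":
        return None
    pos = start + 2
    while pos + 2 <= len(data):
        if data[pos] != 0xFF:
            return None                    # a marker was expected here
        marker = data[pos + 1]
        if marker == 0xFF:                 # fill byte in front of a marker
            pos += 1
            continue
        if marker == 0xD9:                 # end-of-image of the outer file
            return pos + 2 - start
        if marker in STANDALONE:
            pos += 2
            continue
        if pos + 4 > len(data):
            return None
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if seg_len < 2:
            return None
        pos += 2 + seg_len                 # skips any preview embedded in the segment
        if marker == 0xDA:                 # start of scan: compressed data follows
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break                  # a real marker ends the compressed data
                pos += 1
    return None

# Synthetic file: a preview inside an APP1 segment, then scan data, then EOI.
preview = b"\xff\xd8" + b"thumb" + b"\xff\xd9"
payload = b"Exif\x00\x00" + preview
app1 = b"\xff\xe1" + struct.pack(">H", len(payload) + 2) + payload
sos = b"\xff\xda" + struct.pack(">H", 5) + b"\x01\x02\x03"
image = b"\xff\xd8" + app1 + sos + b"\x12\xff\x00\x34" + b"\xff\xd9"
naive_end = image.find(b"\xff\xd9") + 2    # where a signature-only scan would stop
```

Here `jpeg_length(image + b"junk")` reports the full image length, while `naive_end` falls inside the APP1 segment at the end of the embedded preview.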

Detecting Text Files

Text files sit at the opposite end of the spectrum. Having no persistent structure at all, text files are the most difficult to locate but among the easiest to recover. Even fragmented text files can be recovered (if identified successfully) and combined into a single file if needed. There are no file headers or system structures to worry about.

Sometimes, no formal file headers are available (e.g. for text or HTML files), yet those files can still be recovered. In the case of text-based documents, a data recovery tool analyzes the actual data blocks, attempting to determine whether they belong to what appears to be a text file. The decision is made by analyzing the characters in each block. If a data block consists mostly of valid text characters from a known character set or encoding (e.g. ASCII, Western European, Unicode, or Arabic), the block is considered to belong to a text file. The end of such a file is normally detected once a certain number of non-text bytes (binary data) appear.
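The heuristic described above can be sketched roughly as follows. This is a simplified illustration that handles plain ASCII only; the 95% threshold, the run length of 8 bytes, and the function names are assumptions chosen for the example, and a real tool would need separate tables for other character sets.

```python
# Printable ASCII plus tab, line feed and carriage return.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}

def looks_like_text(block: bytes, threshold: float = 0.95) -> bool:
    """True if at least `threshold` of the bytes in the block are text characters."""
    if not block:
        return False
    good = sum(b in PRINTABLE for b in block)
    return good / len(block) >= threshold

def find_text_end(data: bytes, max_binary_run: int = 8) -> int:
    """Return the offset where text ends: the start of the first run of
    `max_binary_run` consecutive non-text bytes, or len(data) if none occurs."""
    run = 0
    for i, b in enumerate(data):
        if b in PRINTABLE:
            run = 0
        else:
            run += 1
            if run >= max_binary_run:
                return i - run + 1
    return len(data)

text = b"Hello, world!\n" * 10
binary = bytes(range(0x80, 0x90))
text_end = find_text_end(text + binary)
```

Tolerating a short run of non-text bytes before declaring the end keeps an occasional stray byte from truncating an otherwise valid text file.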

Detecting XML and HTML Documents

XML and HTML documents are structured text files. They normally begin with certain tags and end with others. While there is no exact binary signature to look for, these documents can be detected by looking for a characteristic opening construct (e.g. <?xml) and the matching closing tag. The lookup must be case-insensitive, as tags can be written in upper case, lower case, or a mix of both. The existence of opening and closing tags allows reliable detection of the beginning and end of such documents.
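A case-insensitive tag lookup of this kind might look like the sketch below. It covers HTML only, and the specific patterns (`<!doctype html`, `<html`, `</html>`) are assumptions chosen for illustration; an XML document would need the name of its own root element to locate the closing tag.

```python
import re

# Assumed opening and closing patterns for HTML; matched regardless of case.
OPEN_RE = re.compile(rb"<!doctype\s+html|<html", re.IGNORECASE)
CLOSE_RE = re.compile(rb"</html\s*>", re.IGNORECASE)

def find_html_document(data: bytes):
    """Return (start, end) offsets of the first HTML document, or None."""
    opening = OPEN_RE.search(data)
    if opening is None:
        return None
    closing = CLOSE_RE.search(data, opening.end())
    if closing is None:
        return None
    return opening.start(), closing.end()

raw = b"\x00\x01junk<HtMl><body>hi</body></HTML>\x00tail"
span = find_html_document(raw)
```

Slicing `raw[span[0]:span[1]]` yields the document without the surrounding binary data, even though the opening and closing tags use different letter cases.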
