Understanding computer searches

The simple search in Perl is a very basic real-time search. It actually reads the files and checks for matches. This makes it a useful tool because it is always up to date: even if files have been modified moments ago, the search reads the current content.
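The real-time approach can be sketched in a few lines. This is a minimal illustration, not the Perl tool itself: it walks a directory (here assumed to hold the site's pages) and reads every file on every query, which is why results are always current but the cost grows with the number of files.

```python
import os

def realtime_search(directory, keyword):
    """Scan every file under `directory` and return the paths containing `keyword`."""
    matches = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            try:
                # Read the file fresh on every query, so recent edits are seen.
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    if keyword.lower() in fh.read().lower():
                        matches.append(path)
            except OSError:
                pass  # unreadable file: skip it
    return matches
```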

As you know, search engines work differently. Since it is impossible to read billions of files each time a search is submitted, the results are computed in advance. Building an index of precomputed results often takes considerable resources.

In cases where there are vast amounts of data to search, it is not uncommon to run servers in the background that update the indexes as files are modified.

Searching files is one of the most basic, yet most depended-upon, features of any website or database.

In database searches, the data is more categorized, so smaller sets of records can be accessed more easily than large pages.

The database design is usually closely related to the search process. By segmenting the data into smaller chunks, such as a file of product names, you could search 100,000 names in one file. The name may serve as a key to the actual datafile: if a name matches, the search script can open the datafile with that name to find more specific data.
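The keyed-lookup idea above can be sketched as follows. The filenames and layout are assumptions for illustration: a small index file holds one product name per line, and each name doubles as the key to a per-product datafile holding the full record.

```python
import os

def find_product(index_path, data_dir, query):
    """Search the name index; on a match, load the full record from its datafile."""
    with open(index_path, encoding="utf-8") as idx:
        for line in idx:
            name = line.strip()
            if query.lower() == name.lower():
                # The name itself is the key: it is also the datafile's name.
                datafile = os.path.join(data_dir, name + ".dat")
                with open(datafile, encoding="utf-8") as fh:
                    return fh.read()
    return None  # no matching name in the index
```

Only the small index file is scanned; the larger datafiles are opened only when a match is found.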

The larger the number of files, the more complex the script gets. A million data records might be split across many smaller files so that each search only has to touch a fraction of the data.

For example, it is not necessary to search every line of every file for a phone number. If a file is named for the actual phone number, you can open it directly, or at least use it to reference the full datafile.
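Taken to its extreme, this direct lookup involves no searching at all. A minimal sketch, assuming each record is stored in a file named after its phone number:

```python
import os

def lookup_phone(data_dir, phone):
    """Open the record file named for the phone number; no scanning required."""
    path = os.path.join(data_dir, phone + ".dat")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    return None  # no file means no record
```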

Since most large searches are custom written, there is no telling exactly how a given programmer has made one work. But they do get quite complicated, and much more extensive than the simple search provided here.

You should understand the limitations of the simple search and not try to run it on a million pages. If you get to that point, you will need something much more advanced since you can't open and process a million files very quickly.

In most cases with site searches, the pages are converted to datafiles and an inverted index is created listing the relevant pages for each keyword. Building this index can take a long time, and when pages change the index must be updated, which makes real-time results more difficult.
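A toy version of that process, assuming the pages are plain-text files in one directory: the build step maps each keyword to the set of pages containing it, and queries are then answered from the index without rereading any page. Real systems add tokenization, ranking, and incremental updates on top of this.

```python
import os
import re

def build_index(directory):
    """Build an inverted index: keyword -> set of filenames containing it."""
    index = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        with open(path, encoding="utf-8", errors="ignore") as fh:
            for word in re.findall(r"[a-z0-9]+", fh.read().lower()):
                index.setdefault(word, set()).add(name)
    return index

def search(index, keyword):
    """Answer a keyword query from the index alone; no files are read."""
    return sorted(index.get(keyword.lower(), set()))
```

The trade-off the text describes is visible here: `search` is fast regardless of page count, but any edited page makes the index stale until `build_index` runs again.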

Search engines like Google can take weeks to update their indexes, with the data spanning thousands of servers. Searches can be simple or extremely complex, but the underlying concept is the same: matching keywords in files.