A Historical Background
Once upon a time, in the primitive and barbarian days before computers, the amount of information gathered by a group of people could be collected in the wisdom and the stories of its older members. In this world, storytellers, magicians, and grandparents were considered great and honored storehouses for all that was known.
Apparently, and according to vast archeological data, campfires were used by the younger members of the community to access the information stored in the minds of the elders using API's such as
public String TellUsAboutTheTimeWhen(String s);.
And then of course, like a sweeping and rapidly-encompassing viral infection, came agriculture, over-production of foodstuffs, and the origins of modern-day commerce.
Dealing with vast storehouses of wheat, rice, and maize became quite a chore for the monarchs and emperors that developed along with the new economy. There was simply too much data to be managed in the minds of the elders (who by now were feeling the effects of hardware obsolescence as they were being pushed quietly into the background).
And so, in order to store all the new information, humanity invented the technology of writing. And though great scholars like Aristotle warned that the invention of the alphabet would lead to the subtle but total demise of the creativity and sensibility of humanity, data began to be stored in voluminous data repositories, called books.
As we know, eventually books propogated with great speed and soon, whole communities of books migrated to the first real "databases", libraries.
Unlike previous versions of data warehouses (people and books), that might be considered the australopithecines of the database lineage, libraries crossed over into the modern-day species, though they were incredibly primitive of course.
Specifically, libraries introduced "standards" by which data could be stored and retrieved.
After all, without standards for accessing data, libraries would be like my closet, endless and engulfing swarms of chaos. Books, and the data within books, had to be quickly accessible by anyone if they were to be useful.
In fact, the usefulness of a library, or any base of data, is proportional to its data storage and retrieval efficiency. This one corollary would drive the evolution of databases over the next 2000 years to its current state.
Thus, early librarians defined standardized filing and retrieval protocols. Perhaps, if you have ever made it off the web, you will have seen an old library with its cute little indexing system (card catalog) and pointers (Dewey decimal system).
And for the next couple thousand years libraries grew, and grew, and grew along with associated storage/retrieval technologies such as the filing cabinet, colored tabs, and three ring binders.
All this until one day about half a century ago, some really bright folks including Alan Turing, working for the British government were asked to invent an advanced tool for breaking German cryptographic "Enigma" codes.
That day the world changed again. That day the computer was born.
The computer was an intensely revolutionary technology of course, but as with any technology, people took it and applied it to old problems instead of using it to its revolutionary potential.
Almost instantly, the computer was applied to the age-old problem of information storage and retrieval. After all, by World War Two, information was already accumulating at rates beyond the space available in publicly supported libraries. And besides, it seemed somehow cheap and tawdry to store the entire archives of "The Three Stooges" in the Library of Congress. Information was seeping out of every crack and pore of modern day society.
Thus, the first attempts at information storage and retrieval followed traditional lines and metaphors. The first systems were based on discrete files in a virtual library. In this file-oriented system, a bunch of files would be stored on a computer and could be accessed by a computer operator. Files of archived data were called "tables" because they looked like tables used in traditional file keeping. Rows in the table were called "records" and columns were called "fields".
Consider the following example:
| Eric |
Tachibana |
erict@eff.org |
213-456-0987 |
| Selena |
Sol |
selena@eff.org |
987-765-4321 |
| Li Hsien |
Lim |
hsien@somedomain.com |
65-777-9876 |
| Jordan |
Ramacciato |
nadroj@otherdomain.com |
222-3456-123 |
The "flat file" system was a start. However, it was seriously inefficient.
Essentially, in order to find a record, someone would have to read through the entire file and hope it was not the last record. With a hundred thousands records, you can imagine the dilemma.
What was needed, computer scientists thought (using existing metaphors again) was a card catalog, a means to achieve random access processing, that is the ability to efficiently access a single record without searching the entire file to find it.
The result was the indexed file-oriented system in which a single index file stored "key" words and pointers to records that were stored elsewhere. This made retrieval much more efficient. It worked just like a card catalog in a library. To find data, one needed only search for keys rather than reading entire records.
However, even with the benefits of indexing, the file-oriented system still suffered from problems including:
- Data Redundancy - the same data might be stored in different places
- Poor Data Control - redundant data might be slightly different such as in the case when Ms. Jones changes her name to Mrs. Johnson and the change is only reflected in some of the files containing her data
- Inability to Easily Manipulate Data - it was a tedious and error prone activity to modify files by hand
- Cryptic Work Flows - accessing the data could take excessive programming effort and was too difficult for real-users (as opposed to programmers).
Consider how troublesome the following data file would be to maintain.
| Mr. Eric Tachibana |
123 Kensigton |
Chemistry 102 |
C+ |
| Mr. Eric Tachibana |
123 Kensigton |
Chinese 3 |
A |
| Mr. Eric Tachibana |
122 Kensigton |
Data Structures |
B |
| Mr. Eric Tachibana |
123 Kensigton |
English 101 |
A |
| Ms. Tonya Lippert |
88 West 1st St. |
Psychology 101 |
A |
| Mrs. Tonya Ducovney |
100 Capitol Ln. |
Psychology 102 |
A |
| Ms. Tonya Lippert |
88 West 1st St. |
Human Cultures |
A |
| Ms. Tonya Lippert |
88 West 1st St. |
European Governments |
A |
What was needed was a truly unique way to deal with the age-old problem, a way that reflected the medium of the computer rather than the tools and metaphors it was replacing.
Enter the database.
Simply put, a database is a computerized record keeping system. More completely, it is a system involving data, the hardware that physically stores that data, the software that utilizes the hardware's file system in order to 1) store the data and 2) provide a standardized method for retrieving or changing the data, and finally, the users who turn the data into information.
Databases, another creature of the 60s, were created to solve the problems with file-oriented systems in that they were compact, fast, easy to use, current, accurate, allowed the easy sharing of data between multiple users, and were secure.
A database might be as complex and demanding as an account tracking system used by a bank to manage the constantly changing accounts of thousands of bank customers, or it could be as simple as a collection of electronic business cards on your laptop.
The important thing is that a database allows you to store data and get it or modify it when you need to easily and efficiently regardless of the amount of data being manipulated. What the data is and how demanding you will be when retrieving and modifying that data is simply a matter of scale.
Traditionally, databases ran on large, powerful mainframes for business applications. You will probably have heard of such packages as Oracle 8 or Sybase SQL Server for example.
However with the advent of small, powerful personal computers, databases have become more readily usable by the average computer user. Microsoft's Access is a popular PC-based engine.
More importantly for our focus, databases have quickly become integral to the design, development, and services offered by web sites.
Consider s site like Amazon.com that must be able to allow users to quickly jump through a vast virtual warehouse of books and compact disks.
How could Amazon.com create web pages for every single item in their inventory and how could they keep all those pages up to date. Well the answer is that their web pages are created on-the-fly by a program that "queries" a database of inventory items and produces an HTML page based on the results of that query.
Database Models
Through history several database models have been created and used. Among those are: Hierarchical, Network, and Relational. Studying the history of datbases is a worth while endeavor but is not in the context of this class. Instead, we will concentrate our focus on relational databases. They are by far the most popular being used today.
Relational Databases
In the 80's the "Relational Database Model" became the rage. The Relational Model developed out of the work done by Dr. E. F. Codd at IBM in the late 1960s who was looking for ways to solve the problems with the existing models.
Because he was a mathematician, he naturally built the model on mathematical concepts which he expounded in the famous work called "A Relational Model of Data for Large Shared Databanks".
At the core of the relational model is the concept of a table (also called a relation) in which all data is stored. Each table is made up of records (horizontal rows also known as tuples) and fields (vertical columns also known as attributes).
It is important to note that how or where the tables of data are stored makes no difference. Each table can be identified by a unique name and that name can be used by the database to find the table behind the scenes. As a user, all you need to know is the table name in order to use it. You do not need to worry about the complexities of how the data is stored on the hard drive.
This is quite a bit different from the hierarchical and network models in which the user had to have an understanding of how the data was structured within the database in order to retrieve, insert, update, or delete records from the database.
So how do you find data in a relational database if there is no map (like a hierarchy defined by pointers) to follow?
Well, in the relational model, operations that manipulate data do so on the basis of the data values themselves. Thus, if you wish to retrieve a row from a table for example, you do so by comparing the value stored within a particular column for that row to some search criteria.
For example, you might say (not getting into syntax yet) "Give me all the rows from the 'STUDENTS' table which have 'Selena' in the 'FIRST_NAME' column."
The database might return a list which looks essentially like this:
| Selena |
Roberts |
SID-268 |
818-934-5069 |
| Selena |
Smith |
SID-991 |
310-234-6475 |
You could then use the data from a retrieved row to query another table. For example, you might say "Okay, I want to know what grade 'Selena Sol' received in 'Underwater Basket Weaving 101'. So I will now use the 'Student ID' number from my previous query as the keyword in my next query. I want the row in the 'Underwater Basket Weaving Course' table where student ID equals 'SID-001'.
This data access methodology makes the relational model a lot different from and better than the earlier database models because it is a much simpler model to understand. This is probably the main reason for the popularity of relational database systems today.
Another benefit of the relational system is that it provides extremely useful tools for database administration. Essentially, tables can not only store actual data but they can also be used as the means for generating meta-data (data about the table and field names which form the database structure, access rights to the database, integrity and data validation rules etc).
Thus everything within the relational model can be stored in tables. This means that many relational systems can use operations recursively in order to provide information about the database. In other words, a user can query information concerning table names, access rights, or some data and the results of these queries would then be presented to the user in the form of a table.
This makes database administration as easy as usage!
Client/Server Databases
As we said before, most databases that you will come across these days will be relational databases. However, there are many types of relational databases and not all of them will be useful for web applications.
In particular, it will be the client/server databases rather than the stand-alone packages that you will use for the web.
A client/server database works like this: A database server is left running 24 hours a day, and 7 days a week. Thus, the server can handle database requests at any hour. Database requests come in from "clients" who access the database through its command line interface or by connecting to a database socket. Requests are handled as they come in and multiple requests can be handled at one time.
For web applications which must be available for world wide time zone usage, it is essential to build upon a client/server database which can run all the time.
For the most part, these are the only types of databases that Internet Service Providers will even provide. However if you are serving web pages yourself, you should consider many of the excellent freeware, shareware or commercial products around. I myself like postgres for UNIX since I prefer a UNIX-based web server. However, there are plenty of good applications for PC and Mac as well.