Introduction to Database Management System

Share:

Database Management System (DBMS):

 A database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data is called database. The primary task of DBMS is to store and retrieve those data and information which are convenient and efficient.

 Database systems are designed to manage large bodies of information. Management of data involves both storing and manipulation of information. The system must be safe to store information, despite of system crash or unauthorized access.

 1.1 Applications of DBMS:

Banking

Airlines

Universities

Credit and Transaction System

Finance

Sales

Manufacturing

Human Resources

Stock exchanges

Power development board etc.

 Now a day DBMS is broadly used in every where to store information and retrieve them when needed.

 1.2 File Systems versus Database system:

Considering a saving bank enterprise that keeps all information about customers and savings accounts.

 File system is a system in which we store information in different files and retrieve information by file name. We have to write different application program to store different types of files. To manipulate them, the system needs a number of application programs that manipulate the files.

A program to debit or credit an account

A program to add a new account

A program to find the balance sheet of an account

A program to generate monthly statements

 System programmers write those programs when needs of bank. New programs are added into the system. Before DBMS organizations usually store data in file system.

            There are many disadvantages to store information in file processing systems:

Data Redundancy and inconsistency: As the information store in files for a long tine.  Different files may be in different formats and programs may be different programming languages. As a result, same information may appear in different places in files, e.g. address of individual customers consists in the file of saving-account records as well as in the file of checking account records. This redundancy increase storage and access cost. This may create data inconsistency, e.g. various copies of same data may no longer support. Consider, change of a customer address in saving account records but may not elsewhere in the system automatically.

Difficulties in accessing data:  In file processing system, it is very difficult to access data. Suppose, one of the bank officer need a list of consumers live in the same area. The officer asks the data processing department to generate a list like that. But the department will unable to delivery because there is not available application program. So the officer has now two alternatives: either obtain the list of all customers or manually prepare or ask the system programmer to write an application facing the criteria. Of course, both of them are unsatisfactory. Suppose, the programmer writes such program and a few days later, the officer need a list of customers with a balance of $10,000 or more. In this case, the programmer has to write another application program for the list. Finally, we can conclude that a particular task will need an application program.

Data Isolation: Data are scattered in various files and files may be in different formats in file system. Writing new programs to retrieve the appropriate data is difficult.

Problems of Integrity: Data stored in the database must satisfy certain types of integrity consistency. e.g. the minimum balance of a bank say $50. Programmer applies these constraints in the system by adding appropriate code in various application programs. When new constraints are added, it is difficult to change the programs. This problem is compounded if constraints involve several data items from different data files.

Atomicity Problems: System failure is an issue of computer or electrical system. Consider a program to transfer $50 from account A to account B. If system failure occurs during the execution of program, it is possible that the $50 was removed from the account A but was not credited to account B, resulting inconsistent in database state. It must essential to database that both debit and credit occur neither nothing occur. The fund transfer must be atomic. It is too much difficult to ensure in file processing system.

Concurrent access anomalies: Many systems allow multiple users to update simultaneously for faster response. This may create inconsistent of data. Suppose, bank account X contain $1000. If two customers withdraw funds (say $100 and $200) from the account X at the same time, the concurrent executions may leave account in an inconsistent state. If the two programs run concurrently, they may both read balance $1000 and write back $900 and $800 respectively. Finally the account contain the resulting value last occur. The account may contain either $900 or $800 instead of correct value $700. To protect against this possibility, the system must maintain supervision. But in file processing system supervision is difficult to provide like that.

Security Problems: All the user of the database has no permission to access all data. For e.g. in banking system, payroll personnel need to see the report only the part of the database contains various bank employees. They do not need to see customer’s information. But if application programs are added to the system in an ad hoc manner, enforcing such security constraint is difficult.

1.3 View of Data:

Any Database is a collection of interrelated files and a set of programs that allow users to access and modify these files. The major purpose of database system is to provide users to with an abstract view of the data. The system hides certain details of how the data are stored and maintained.

1.3.1 Data Abstraction: Most of the database system users are not computer trained, developers hide the complexity from users through several level of abstraction, to simplify users. Those levels are:

Physical level: The primary level of abstraction describes how the data are actually stored. The physical level describes complex primary level of data structures in details.

Logical level: This higher level of abstraction describes what data are stored in the database and how data are related with each other. Database administrator decides what types of information to keep in the database, use the logical level of abstraction.

View level: The highest level of data abstraction describes only part of the entire database. Most of the users don’t need whole information. They only need to access a part of database. The system may provide many views for the same database. Figure 1.1 shows the relationship among the three levels of abstraction.

1.3.2        Instances and Schemas:

Database may change over time as information is inserted and deleted. The collection of data stored in the database at a particular moment is called as instances of the database. The overall design of the database is called the database schema. Schemas are changed infrequently, if at all. E.g. in a program it is similar schema likes to declaration of variable and the particular value of the variable at a particular moment corresponded to an instance of a database schema.

Database systems have several schemas according to data abstraction.

Physical schema which describes the database design at physical level.

Logical schema which describes the database design at logical level.

Database may also have several schemas at view level, sometimes called subschema, that describe different views of the database.

1.4          Data Models:

   The structure of a database is known as data models. Data model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints. To understand this concept of a data model, we describe two data models in this section: The entity-relationship model and the relational model. Both provide the way to describe the design of a database at the logical level.

1.4.1        The Entity-Relationship Model:

The entity-relation (E-R) data model is based on a real world objects, called entities and the relationships among these objects. An entity is a ‘thing’ or ‘object’ in the real world that is distinguishable with other objects, e.g. each person is an entity and bank account can be considered as entities.

Entities are described in a database by a set of attributes. Consider, the attributes account-number and balance describe one particular account in a bank and they form attributes of the account entity set. Similarly, customer-name, customer-street address and customer-city describe as customer entity.

A unique customer identifier must be assigned to each customer.

A relationship is an association among the several entities, e.g. a depositor relationship associates customers with each account that she has. The set of all entities of the same types and the set of all relationships of the same types are termed an entity set and relationship set, respectively.

The overall logical structure (schema) of the database can be expressed graphically by an E-R diagram, which is build up from the following components.

Rectangles, which respect entity set.

Ellipses, which represent attributes

Diamonds, which represent relationships among entity sets

Lines, which link attributes to entity sets and entity sets to relationships.

Each component is labeled with the entity or relationship that is represents. Considering a banking database system Figure 1.2 shows the corresponding E-R diagram. 

1.4.2        Relational Model:

 The relational model uses a collection of tables to represent both data and the relationship among those data. Each table has multiple columns, and each column has a unique name. Figure 1.3 presents a sample relational database comprising three tables: one show a details of bank customers, the second shows account and the third shows which accounts belong to which customers.

 In customer table shows e.g. that the customer identified by customer-id 192-83-7465 is named Johnson and lives at 12 Alma St. in Palo Alto. Account table, shows, e.g. that account-number A-101 has a balance of $500. The depositor table shows which accounts belong to which customer. The relational model is am example of record base model. Record base model are so named because the database is structures in fixed format records of several types. Each record type defines a fixed number of fields or attributes. The columns of the table correspond to the attribute of the record.

Relational data model is widely used data model. A vast majority of current database systems are on the relational model. 

1.4.3       Others Data models

 The Object oriented data model is another type of data model. The object oriented model can be seen as extending the E-R model with notions of encapsulation, methods (functions) and object identity.

 The object relational data model combines features of the object oriented data model and relational data model.

 Semi structured data a model permits the specific of data where individual data items of the same type may have different sets of attributes. The Extensible markup language (XML) is widely used to represent semi structured data.

Network data model and hierarchical data model, preceded the relational data model. these models were tied closely to the underlying implementation and complicated the task of modeling data.

1.5          Database Languages:

 A Database system provides a data definition language to specify the database schema and a data manipulation language to express database queries and updates. Data definition and data manipulation languages are not two separate languages. They are the part of a single database language, widely known as SQL language.

 1.5.1 Data Definition Language

 We specify a database schema by a set of definitions expressed by a special language called a data-definition language (DDL). The following statement in the SQL language defines the account table:

create table account

                        ( account-number char(10),

                          balance integer)         

After exaction of the above DDL statement creates the account table. We define the storage structure and access methods used by the database system by a set of statements in a special type of DDL called data storage and definition language. These statement define the implementation details of the database schema, which are usually hidden from the users.

 The database must specify certain consistency constraints to store data value in database. Suppose the balance of an account should not fall below $100. The DDL provide such constraint facilities. The database systems check this constraint every time the database is updated.

 1.5.2        Data-manipulation Language

 Data manipulation is known as the following:

The retrieval of information stored in the database

The insertion of new information into the database

The deletion of information from the database

The modification of information stored in the database

 A Data-manipulation language (DML) is a language that enables users to access or manipulate data as organized by the appropriate data model. These are basically two types:

            Procedural DMLs requires a user to specify what data are needed and how to get these data.

            Declarative DMLs (also known as nonprocedural DMLs) requires a user to specify what data are needed without specifying how to get those data.

 Declarative DMLs are normally easy to learn and use than procedural DMLs. The DML component of the SQL language is nonprocedural.

            A query is a statement requesting the retrieval of information. The portion of a DML that involves information retrieval is called a query language.

             select customer.customer-name

                        from customer

                        where customer.customer-id=192-83-7465

This query in the SQL language fields the name of the customer whose customer-id =192-83-7465.

If the query were run on the table in figure 1.3, the name Johnson would be displayed.

            Queries may involve information from more than one table. The following query finds the balance of all accounts owned by the customer with customer-id 192-83-7465.

                         select account.balance

                                    from depositor,account

                                    where depositor.customer-id=192-83-7465 and

                                                depositor.account-number=account.account-number

 Consider, the query run on the table in figure 1.3, the system would find that two accounts numbered A-101 and A-201 are owned by the customer 192-83-7465 and balances are $500 and $900 respectively.

 1.5.3        Database Access from application programs:

Application programs are programs that used to interact with the database. Application programs are usually written in a host language, such as COBOL, C, C++, or Java. E.g.  In a banking system are programs that generate payroll, checks, debit accounts, credit accounts, or transfer funds between accounts.

To access the database, DML statements need to be executing from the host language. There are two ways to do this:

By providing an application program interface (a set of procedures) that can be used DML and DDL statements to the database and retrieve the results.The Open Database Connectivity (ODBC) standard defined by Microsoft for use with the C language is a commonly used application program interface standard. The Java Database connectivity (JDBC) standard provides corresponding features to the Java language.

By extending the host language syntax to embed DML calls within the host language program. Usually, a special character prefaces DML calls and a preprocessor, called the DML precompiled, converts the DML statements to normal procedure in the host language.

1.6    Database users and Administrator:

A primary goal of a database system is to retrieve information from and store new information in the database. People who work a database can be categorized as database users or database administrators.

 1.6.1 Database users and user interfaces:

Three are four types of Database users. They are:

 Naive users are genuine users who interact with the system by invoking one of the application programs that have been written previously. Consider, a bank teller who need to transfer $50 from account A to Account B invoke a program called transfer. This program asked the teller for the amount of money to be transferred, the account from which the money is to be transferred and the account to which the money is to be transferred.

Application programmers are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces. Rapid Application Development (RAD) tools are tools that enable an application programmer to construct forms and reports without writing a program. There are also special types of programming languages that combine imperative control structure (for e.g. for loops, while loops, and if then else statements) with statements of the data manipulation language. These languages, sometimes called fourth generation languages, often including special features to facilitate the generation of forms and the display of data on the screen. Most of the database systems include a fourth generation language.

Sophisticated users interact with the system without writing programs. They form their requests in a database query language. They submit query processor whose function is to break down DML statements into instructions that the storage manager understands. Analyst who submit queries to explore data in the database fall in this categories.

Specialized users are sophisticated users who write specialized database applications that do not fit into traditional data processing framework. Among these applications are computer aided design systems, knowledge base and expert systems, systems that store data with complex data types (e.g.  Graphics data and audio data) and environment-modeling systems.

1.6.2 Database Administrator:

One of the main reasons for using DBMS is to have central control of both the data and the programs that access those data. A person who has such central control over the system is called database administrator (DBA). The functions of a DBA include:

 Schema definition: The DBA creates the original database schema by executing a set of data definition statements in the DDL.

Storage structure and access-method definition.

Schema and physical –organization modification: The DBA carries out changes to the schema and physical organization to reflect the changing needs of the organization or to alter the physical organization to improve performance.

Granting of authorization for data access: By granting different types of authorization, the database administration can regulate which parts of the database various users can access. The authorization information is kept in a special system structure that the database system consults whenever someone attempts to access the data in the system.

Routine maintenance: The Database Administrator’s routine maintenance activities are:

Periodically backing the database, either onto tapes or onto remote services, to prevent loss of data in case of disasters such as flooding.

Ensuring that enough free disk space is available for normal operations, and upgrading disk space required.

Monitoring jobs running on the database and ensuring that performance is not degraded by very expensive tasks submitted by some users.

No comments

We appreciate your comment! You can either ask a question or review our blog. Thanks!!