Parity Block

Part 1:

Suppose we are given 3 data disks:

disk 1: 1111 0000
disk 2: 1010 1010
disk 3: 0011 1000

————————————————————–

To compute the redundant (parity) disk, we take the mod-2 sum (bitwise XOR) of the data disks:

disk 4: 0110 0010
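
The mod-2 sum is a bitwise XOR, so the parity block above can be checked with a few lines of Python. This is only a sketch: the helper name `parity` is made up for illustration, and the bit strings are the disk values from Part 1 with the spaces removed.

```python
from functools import reduce

def parity(blocks):
    """Return the mod-2 sum (bitwise XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda acc, b: acc ^ int(b, 2), blocks, 0)
    return format(total, f"0{width}b")

data = ["11110000", "10101010", "00111000"]  # disks 1-3 from Part 1
disk4 = parity(data)
print(disk4)  # 01100010
```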

 

Part 2:

Suppose we updated the value for disk 2:

disk 2 old value: 1010 1010
disk 2 new value: 1100 1100

————————————————————–

Mod-2 sum of the old and new values: 0110 0110
*The 1s mark the bit positions to flip on the parity disk (2, 3, 6, 7).

disk 4 original: 0110 0010
disk 4 updated: 0000 0100

Result:
disk 1: 1111 0000
disk 2: 1100 1100
disk 3: 0011 1000
disk 4: 0000 0100
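
The update in Part 2 only needs the old and new values of the changed disk plus the old parity; the other data disks are not read. A minimal sketch, assuming bit strings without spaces and using hypothetical helper names `xor_bits` and `update_parity`:

```python
def xor_bits(a, b):
    """Bitwise XOR (mod-2 sum) of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), f"0{len(a)}b")

def update_parity(old_parity, old_block, new_block):
    """Flip the parity bits at every position where the data block changed."""
    change_mask = xor_bits(old_block, new_block)  # 1s mark changed positions
    return xor_bits(old_parity, change_mask)

new_disk4 = update_parity("01100010", "10101010", "11001100")
print(new_disk4)  # 00000100
```

Recomputing the parity from all three data disks gives the same value, which is a quick way to check the shortcut.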

 

1.10 Points to Review

  • A database management system (DBMS) is software that supports management of large collections of data. A DBMS provides efficient data access, data independence, data integrity, security, quick application development, support for concurrent access, and recovery from system failure. (Section 1.1)
  • Storing data in a DBMS versus storing it in operating system files has many advantages. (Section 1.3)
  • Using a DBMS provides the user with data independence, efficient data access, automatic data integrity, and security. (Section 1.4)
  • The structure of the data is described in terms of a data model and the description is called a schema. The relational model is currently the most popular data model. A DBMS distinguishes between external, conceptual, and physical schemas. Data independence, which is made possible by these three levels of abstraction, insulates the users of a DBMS from the way the data is structured and stored inside the DBMS. (Section 1.5)
  • A query language and a data manipulation language enable high-level access and modification of the data. (Section 1.6)
  • A transaction is a logical unit of access to a DBMS. The DBMS ensures that either all or none of a transaction’s changes are applied to the database. For performance reasons, the DBMS processes multiple transactions concurrently, but ensures that the result is equivalent to running the transactions one after the other in some order. The DBMS maintains a record of all changes to the data in the system log, in order to undo partial transactions and recover from system crashes. Checkpointing is a periodic operation that can reduce the time for recovery from a crash. (Section 1.7)
  • DBMS code is organized into several modules: the disk space manager, the buffer manager, a layer that supports the abstractions of files and index structures, a layer that implements relational operators, and a layer that optimizes queries and produces an execution plan in terms of relational operators. (Section 1.8)
  • A database administrator (DBA) manages a DBMS for an enterprise. The DBA designs schemas, provides security, restores the system after a failure, and periodically tunes the database to meet changing user needs. Application programmers develop applications that use DBMS functionality to access and manipulate data, and end users invoke these applications. (Section 1.9)

1.9 People who Deal with Databases

  • Database implementors, who build DBMS software; end users, who wish to store and use data in a DBMS
  • Database application programmers develop packages that facilitate data access for end users, who are usually not computer professionals, using the host or data languages and software tools that DBMS vendors provide
  • The database administrator (DBA) is responsible for:
    • Design of the conceptual and physical schemas: interacting with the users of the system to understand what data is to be stored in the DBMS and how it is likely to be used
    • Security and authorization: ensuring that unauthorized data access is not permitted
    • Data availability and recovery from failures: ensuring that if the system fails, users can continue to access as much of the uncorrupted data as possible
    • Database tuning: modifying the database to ensure adequate performance as user requirements change.

       

1.8 Structure of a DBMS

  • When a user issues a query, the parsed query is presented to a query optimizer, which uses information about how the data is stored to produce an efficient execution plan for evaluating the query.
  • An execution plan is a blueprint for evaluating a query, and is usually represented as a tree of relational operators (with annotations that contain additional detailed information about which access methods to use, etc.).
  • The files and access methods layer includes a variety of software for supporting the concept of a file, which, in a DBMS, is a collection of pages or a collection of records. This layer typically supports a heap file, or file of unordered pages, as well as indexes.
  • The buffer manager brings pages in from disk to main memory as needed in response to read requests.
  • The lowest layer of the DBMS software deals with management of space on disk, where the data is stored. Higher layers allocate, deallocate, read, and write pages through the disk space manager.
  • DBMS components associated with concurrency control and recovery include:
    • the transaction manager, which ensures that transactions request and release locks according to a suitable locking protocol and schedules the execution of transactions;
    • the lock manager, which keeps track of requests for locks and grants locks on database objects when they become available;
    • the recovery manager, which is responsible for maintaining a log and restoring the system to a consistent state after a crash.
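
The bookkeeping done by a lock manager can be pictured with a toy lock table. This is a minimal sketch, not how any particular DBMS implements it: the class name `LockManager`, its methods, and the "S"/"X" mode codes are made up for illustration, and it ignores lock upgrades, request queues, and deadlocks.

```python
class LockManager:
    """Toy lock table: object name -> (mode, set of holder transaction ids)."""

    def __init__(self):
        self.table = {}

    def request(self, txn, obj, mode):
        """Try to grant an 'S' (shared) or 'X' (exclusive) lock; True if granted."""
        held = self.table.get(obj)
        if held is None:
            self.table[obj] = (mode, {txn})
            return True
        held_mode, holders = held
        # Shared locks are compatible only with other shared locks.
        if mode == "S" and held_mode == "S":
            holders.add(txn)
            return True
        return False  # a real lock manager would queue the request instead

    def release(self, txn, obj):
        """Drop txn's lock on obj and free the entry when no holders remain."""
        held = self.table.get(obj)
        if held is None:
            return
        held[1].discard(txn)
        if not held[1]:
            del self.table[obj]

lm = LockManager()
print(lm.request("T1", "A", "S"))  # True
print(lm.request("T2", "A", "S"))  # True: shared locks coexist
print(lm.request("T3", "A", "X"))  # False: exclusive conflicts with shared
```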

1.7 Transaction Management

  • A transaction is any one execution of a user program in a DBMS.
  • Concurrent Execution of Transactions
    • A locking protocol is a set of rules to be followed by each transaction and enforced by the DBMS in order to ensure that even though actions of several transactions might be interleaved, the net effect is identical to executing all transactions in some serial order.
    • A lock is a mechanism used to control access to database objects
      • Shared locks on an object can be held by two different transactions at the same time.
      • An exclusive lock on an object ensures that no other transactions hold any lock on this object.

         

  • Incomplete Transactions and System Crashes
    • The DBMS maintains a log of all writes to the database
    • Write-Ahead Log (WAL): each write action must be recorded in the log (on disk) before the corresponding change is reflected in the database itself.
    • Checkpoint: The procedure of periodically forcing some information to disk to reduce the time required to recover from a crash.
  • Summary
    • Every object that is read or written by a transaction is first locked in shared or exclusive mode, respectively. Placing a lock on an object restricts its availability to other transactions and thereby affects performance.
    • For efficient log maintenance, the DBMS must be able to selectively force a collection of pages in main memory to disk. Operating system support for this operation is not always satisfactory.
    • Periodic checkpointing can reduce the time needed to recover from a crash. Of course, this must be balanced against the fact that checkpointing too often slows down normal execution.
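
The write-ahead rule and the use of the log to undo an incomplete transaction can be sketched with an in-memory toy. Everything here is an assumption made for illustration: the class `TinyDB`, its record layout, and the use of plain Python lists and dicts in place of an on-disk log and database pages. A stored value of `None` is taken to mean "key absent".

```python
class TinyDB:
    """Toy store that appends a log record before applying each write."""

    def __init__(self):
        self.log = []    # stands in for the on-disk log
        self.data = {}   # stands in for the database pages

    def write(self, txn, key, value):
        # Write-ahead rule: record the old and new values first...
        self.log.append((txn, key, self.data.get(key), value))
        # ...and only then change the database itself.
        self.data[key] = value

    def undo(self, txn):
        """Roll back one transaction by scanning the log backwards."""
        for t, key, old, _new in reversed(self.log):
            if t != txn:
                continue
            if old is None:
                self.data.pop(key, None)  # key did not exist before the write
            else:
                self.data[key] = old

db = TinyDB()
db.write("T1", "balance", 100)
db.write("T2", "balance", 50)
db.write("T2", "bonus", 10)
db.undo("T2")          # T2 never finished, so its changes are removed
print(db.data)         # {'balance': 100}
```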

1.6 Queries in a DBMS

  • Questions involving the data stored in a DBMS are called queries.
  • Relational calculus is a formal query language based on mathematical logic, and queries in this language have an intuitive, precise meaning.
  • Relational algebra is another formal query language, based on a collection of operators for manipulating relations, which is equivalent in power to the calculus.
  • A DBMS enables users to create, modify, and query data through a data manipulation language (DML).
  • The DML and DDL are collectively referred to as the data sublanguage when embedded within a host language.
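
To make the idea of "operators for manipulating relations" concrete, here is a rough sketch of two relational algebra operators, selection and projection, over a relation held as a list of dicts. The function names and the list-of-dicts representation are assumptions for illustration; the rows are the ones from the Students example in Section 1.5.1.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, fields):
    """Projection: keep only the named fields, dropping duplicate rows."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(fields, key)))
    return result

students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53588", "name": "Smith", "login": "smith@ee", "age": 18, "gpa": 3.2},
]

# Names of students with gpa above 3.3: a projection applied to a selection.
result = project(select(students, lambda r: r["gpa"] > 3.3), ["name"])
print(result)  # [{'name': 'Jones'}]
```

Because each operator takes a relation and returns a relation, operators compose, which is what lets an execution plan be drawn as a tree of them.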

     

1.5 Describing and Storing Data in a DBMS

  • A data model is a collection of high-level data description constructs that hide many low-level storage details.
  • A semantic data model is a more abstract, high-level data model that makes it easier for a user to come up with a good initial description of the data in an enterprise.
  • A database design in terms of a semantic model serves as a useful starting point and is subsequently translated into a database design in terms of the data model the DBMS actually supports.
  • A widely used semantic data model called the entity-relationship (ER) model allows us to pictorially denote entities and the relationships among them

1.5.1 The Relational Model

  • The central data description construct in this model is a relation, which can be thought of as a set of records.
  • A description of data in terms of a data model is called a schema.
  • The schema for a relation specifies its name, the name of each field (or attribute or column), and the type of each field.
  • Example: student information in a university database may be stored in a relation with the following schema (with 5 fields):
    • Students(sid: string, name: string, login: string, age: integer, gpa: real)
    • An example instance of the Students relation:
      sid    name   login     age  gpa
      53666  Jones  jones@cs  18   3.4
      53588  Smith  smith@ee  18   3.2
    • Each row in the Students relation is a record that describes a student. Every row follows the schema of the Students relation, and the schema can therefore be regarded as a template for describing a student.
    • We can make the description of a collection of students more precise by specifying integrity constraints, which are conditions that the records in a relation must satisfy.
    • Other notable models: hierarchical model, network model, object-oriented model, and the object-relational model.
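
The Students schema and an integrity constraint can be sketched in Python. The dataclass mirrors the five fields of the schema above; the constraint checked here (every sid is unique) is an assumed example, since these notes do not name a specific constraint, and the function name `check_unique_sid` is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Student:
    """One record of Students(sid, name, login, age, gpa)."""
    sid: str
    name: str
    login: str
    age: int
    gpa: float

def check_unique_sid(students):
    """Integrity constraint (assumed for illustration): every sid is unique."""
    sids = [s.sid for s in students]
    return len(sids) == len(set(sids))

instance = [
    Student("53666", "Jones", "jones@cs", 18, 3.4),
    Student("53588", "Smith", "smith@ee", 18, 3.2),
]
print(check_unique_sid(instance))  # True
```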

1.5.2 Levels of Abstraction in a DBMS

  • A data definition language (DDL) is used to define the external and conceptual schemas.
  • Information about conceptual, external, and physical schemas is stored in the system catalogs.
  • Any given database has exactly one conceptual schema and one physical schema because it has just one set of stored relations, but it may have several external schemas, each tailored to a particular group of users.

Conceptual Schema

  • The conceptual schema (sometimes called the logical schema) describes the stored data in terms of the data model of the DBMS.
  • Relations contain information about entities and relationships

Physical Schema

  • The physical schema specifies additional storage details. It summarizes how the relations described in the conceptual schema are actually stored on secondary storage devices such as disks and tapes.
  • Designing the physical schema means deciding what file organizations to use to store the relations and creating indexes to speed up data retrieval operations.

External Schema

  • External schemas allow data access to be customized and authorized at the level of individual users or groups of users.
  • Each external schema consists of a collection of views and relations from the conceptual schema.
  • A view is conceptually a relation, but the records in a view are not stored in the DBMS. The records are computed using a definition for the view, in terms of relations stored in the DBMS.
  • The external schema design is guided by the end user requirements.
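
A view behaves like a function that is evaluated each time it is used. The sketch below assumes a stored Students relation held as a list of dicts; the view name `student_directory` and the extra "Doe" row are made up for illustration.

```python
students = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs", "age": 18, "gpa": 3.4},
    {"sid": "53588", "name": "Smith", "login": "smith@ee", "age": 18, "gpa": 3.2},
]

def student_directory():
    """View: recomputed from the stored relation on every call, never stored."""
    return [{"name": s["name"], "login": s["login"]} for s in students]

print(student_directory())

# A made-up row, added only to show that the view tracks the stored data.
students.append({"sid": "50000", "name": "Doe", "login": "doe@cs", "age": 19, "gpa": 3.0})
print(len(student_directory()))  # 3
```

Users of this view see names and logins only; age and gpa stay hidden, which is how an external schema can tailor and restrict access.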

1.5.3 Data Independence

  • Data independence is achieved through the use of the three levels of data abstraction; in particular, the conceptual schema and the external schema provide distinct benefits in this area.
  • Logical data Independence:
    • Users can be shielded from changes in the logical structure of the data, or changes in the choice of relations to be stored.
    • Example: if Students is replaced by two stored relations, Student_public and Student_private, views in the external schema can present the same data to users as before.
  • Physical data independence:
    • The conceptual schema insulates users from changes in the physical storage of the data.
    • The conceptual schema hides details such as how the data is actually laid out on disk, the file structure, and the choice of indexes.
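
Logical data independence can be illustrated with the Student_public / Student_private split. In this sketch the way the fields are divided between the two stored relations is an assumption, as are the names `students_view` and `names_with_gpa_above`. The query is written against the view, so it would not need to change if the stored relations were reorganized again.

```python
student_public = [
    {"sid": "53666", "name": "Jones", "login": "jones@cs"},
    {"sid": "53588", "name": "Smith", "login": "smith@ee"},
]
student_private = [
    {"sid": "53666", "age": 18, "gpa": 3.4},
    {"sid": "53588", "age": 18, "gpa": 3.2},
]

def students_view():
    """View that joins the two stored relations on sid to rebuild Students."""
    private_by_sid = {row["sid"]: row for row in student_private}
    return [{**pub, **private_by_sid[pub["sid"]]} for pub in student_public]

def names_with_gpa_above(threshold):
    """A user query written only against the view, not the stored relations."""
    return [row["name"] for row in students_view() if row["gpa"] > threshold]

print(names_with_gpa_above(3.3))  # ['Jones']
```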