
WHAT IS A HYPERCUBE SYSTEM, HOW MANY TYPES OF MULTIPROCESSOR OPERATING SYSTEMS ARE THERE & WHAT IS MULTIPROCESSOR SYNCHRONIZATION

When clients in a system, particularly CPUs in a multiprocessing system, maintain caches of a common memory resource, problems can arise with inconsistent data. Referring to Figure 4, if the top client has a copy of a memory block from a previous read and the bottom client changes that memory block, the top client is left with an invalid cached copy without any notification of the change. Cache coherence is intended to manage such conflicts and maintain consistency between the caches and memory.

Let us discuss some techniques which can be employed to decrease the impact of bus and memory saturation in a bus-oriented system.

1) Wider Bus Technique: As suggested by its name, the bus is made wider so that more bytes can be transferred in a single bus cycle. In other words, a wider parallel bus increases the bandwidth by transferring more bytes per bus cycle. A bus transaction is needed when a missed block must be fetched into the processor's cache. To support this, many systems provide block-level reads and writes of main memory, so that a missing block can be transferred from main memory to the processor's cache by a single main-memory (block) read action. The advantage of block-level access to memory over word-oriented buses is that the overhead of addressing, acquisition and arbitration is amortised over the large number of items in a block. Block transfers can also use specialised burst bus cycles, which make the transport more efficient.

2) Split Request/Reply Protocols: The memory request and reply are split into two individual operations and are treated as separate bus transactions. As soon as a processor requests a block, the bus is released to other users for the time it takes the memory to fetch and assemble the related group of items.

The bus-oriented system is the simplest architecture of a multiprocessor system. It is generally believed that, in a tightly coupled system, this organisation can support on the order of 10 processors. However, the two contention points (the bus and the shared memory) limit the scalability of the system.


Crossbar-Connected System


A crossbar-connected system (illustrated in Figure 5) is a grid structure of processors and memory modules, with a switch attached at every cross point of the grid. From the Figure it appears to be a contention-free multiprocessing architecture: simultaneous access between processors and memory modules is possible, since N processors are provided with N memory modules, and each processor can access a different memory module. A crossbar needs N² switches for a fully connected network between processors and memory. Processors may or may not have their own private memories; in the latter case the system supports uniform memory access.

Where P = processor, M = memory module, and each cross point of the grid is a switch.

Unfortunately, this system also faces the problem of contention when:

1) Two processors (P1 and P2) attempt to access the same memory module (M1) at the same time. Here contention can be avoided by deferring one processor (P1) until the other one (P2) finishes its work and leaves the memory module (M1) free for processor P1.

2) More than two processors attempt to access the same memory module. This problem cannot be solved by the above-mentioned solution.

Thus, in a crossbar-connected system, contention occurs at the memory modules, not at the processor or interconnection-network level. Addressing it implies quadratic growth in the number of switches and in complexity, which makes the whole system expensive and also limits its scalability.
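To make the deferral idea above concrete, here is a minimal sketch in C; the request pattern, array names and the first-come-first-served arbitration rule are illustrative assumptions, not part of the text. Each processor requests one memory module, the first requester of a module is granted its switch, and later requesters are deferred.

#include <stdio.h>

#define N 4

int main(void) {
    int request[N] = {1, 1, 2, 3};   /* request[p] = module wanted by processor p */
    int owner[N];                    /* owner[m] = processor holding module m, or -1 if free */

    for (int m = 0; m < N; m++)
        owner[m] = -1;

    for (int p = 0; p < N; p++) {
        int m = request[p];
        if (owner[m] < 0) {
            owner[m] = p;            /* close the switch at cross point (p, m) */
            printf("P%d -> M%d granted\n", p, m);
        } else {
            /* contention: this processor must be deferred until the holder frees M */
            printf("P%d -> M%d deferred (M%d busy with P%d)\n", p, m, m, owner[m]);
        }
    }
    return 0;
}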


WHAT IS A HYPERCUBE SYSTEM

Hypercube-based architecture is a fairly recent concept in multiprocessor systems. This architecture has some advantages over other multiprocessing architectures. As Figure 6 indicates, the topology can support a large number of processors by providing increasing interconnections with increasing complexity. In an n-degree hypercube architecture, we have:

1) 2^n nodes (the total number of processors).

2) Nodes are arranged in an n-dimensional cube, i.e., each node is connected to n other nodes.

3) Each node is assigned a unique address which lies between 0 and 2^n − 1.

4) The addresses of adjacent nodes differ in exactly 1 bit, and the maximum internode distance (between a node and the node whose address differs in all n bits) is n.

Let us take the example of a 3-degree hypercube to understand the above structure:

1) A 3-degree hypercube has 2^n nodes, i.e., 2^3 = 8 nodes.

2) Nodes are arranged in a 3-dimensional cube, that is, each node is connected to 3 other nodes.

3) Each node is assigned a unique address which lies between 0 and 7 (2^3 − 1), i.e., 000, 001, 010, 011, 100, 101, 110, 111.

4) The addresses of adjacent nodes differ in exactly 1 bit (e.g., 000 and 001), and the maximum internode distance is 3 (e.g., from 000 to 111).

Hypercubes provide a good basis for scalable systems because their communication length grows only logarithmically with the number of nodes. Each link provides bi-directional communication between two processors. The topology is usually used in loosely coupled multiprocessor systems because the transfer of data between two processors may go through several intermediate processors; the longest internode delay is n hops in an n-degree cube. To increase the input/output bandwidth, input/output devices can be attached to every node (processor).
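As a small illustration of this addressing scheme, the following C sketch (the helper names are our own) computes the internode distance as the number of differing address bits and prints one possible route that corrects one differing bit per hop:

#include <stdio.h>

/* Print the low n bits of x, most significant bit first. */
static void print_bits(unsigned x, int n) {
    for (int i = n - 1; i >= 0; i--)
        putchar(((x >> i) & 1u) ? '1' : '0');
}

/* The routing distance is the number of bits in which the addresses differ. */
static int hypercube_distance(unsigned src, unsigned dst) {
    unsigned diff = src ^ dst;   /* bits where the two addresses differ */
    int hops = 0;
    for (; diff; diff >>= 1)
        hops += (int)(diff & 1u);
    return hops;
}

/* Route by flipping the differing address bits one dimension at a time,
   visiting one intermediate node per corrected bit. */
static void print_route(unsigned src, unsigned dst, int n) {
    unsigned cur = src;
    print_bits(cur, n);
    for (int i = 0; i < n; i++) {
        unsigned bit = 1u << i;
        if ((cur ^ dst) & bit) {
            cur ^= bit;          /* move along dimension i */
            printf(" -> ");
            print_bits(cur, n);
        }
    }
    printf("  (%d hops)\n", hypercube_distance(src, dst));
}

int main(void) {
    /* 3-degree hypercube: route from node 000 to node 111. */
    print_route(0u, 7u, 3);      /* prints: 000 -> 001 -> 011 -> 111  (3 hops) */
    return 0;
}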


Multistage Switch-Based System

A multistage switch-based system permits simultaneous connections between several input-output pairs. It consists of several stages of switches which together form a multistage interconnection network. A network with N inputs and N outputs contains K = log2 N stages, with N/2 switches at each stage. In simple words, an N×N processor-memory interconnection network requires (N/2) × log2 N switches.

For example, an 8×8 processor-memory interconnection network requires (8/2) × log2 8 = 4 × 3 = 12 switches. Each switch acts as a 2×2 crossbar.

This network can connect any processor to any memory module by making the appropriate setting of each of the K stages. The binary addresses of the processor and the memory module give the routing path between the input and output pair: the path is decided on the basis of the destination's binary address, which the sender includes with each request for connection. Various combinations of paths between source and destination are possible by using the different switch functions (straight, swap, copy, etc.).
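A minimal C sketch of this destination-tag idea follows; the convention that a 0 bit selects the upper switch output and a 1 bit the lower one is an assumption for illustration. At each of the K stages the switch setting is read off one bit of the destination address:

#include <stdio.h>

/* At stage k (0-based, input side to output side), the switch inspects bit
   (stages - 1 - k) of the destination address: 0 selects the upper output,
   1 selects the lower output. */
static void route(unsigned src, unsigned dst, int stages) {
    printf("P%u -> M%u:", src, dst);
    for (int k = 0; k < stages; k++) {
        unsigned bit = (dst >> (stages - 1 - k)) & 1u;
        printf("  stage %d: %s", k, bit ? "lower" : "upper");
    }
    printf("\n");
}

int main(void) {
    /* 8x8 network: K = log2(8) = 3 stages; the 3-bit routing tag is the
       destination address included with the connection request. */
    route(2u, 5u, 3);   /* tag 101 -> lower, upper, lower */
    return 0;
}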

In a multistage switch-based system all inputs are connected to all outputs in such a way that no two processors attempt to use the same path at the same time. But the problem of contention at a switch arises when a memory module is contested by several processors. In this situation only one request is allowed through and the rest are dropped. A processor whose request was dropped can retry it, or, if buffers are attached to each switch, the rejected request is held and forwarded by the buffer automatically for later transmission. Such multistage interconnection networks are therefore also called store-and-forward networks.


HOW MANY TYPES OF MULTIPROCESSOR OPERATING SYSTEMS
Multiprocessor operating systems are complex in comparison with multiprogrammed uniprocessor operating systems because a multiprocessor executes tasks concurrently. Therefore, the operating system must be able to support the concurrent execution of multiple tasks in order to increase processor performance.

Depending upon the control structure and its organisation, the three basic types of multiprocessor operating system are:

1)            Separate supervisor

2)            Master-slave

3)            Symmetric Supervision

Separate Supervisors

In a separate supervisor system each processor behaves independently. Each has its own operating system which manages the local input/output devices, file system and memory, as well as keeping its own copy of the kernel, supervisor and data structures, while some common data structures also exist for communication between processors. Access protection between processors is maintained by using a synchronization mechanism such as semaphores. Such an architecture faces the following problems:

1) Little coupling among processors.

2) Parallel execution of a single task is difficult to achieve.

3) The system degrades during a process failure.

4) Inefficient configuration, since the supervisor/kernel code and data structures are replicated for each processor.

Master-Slave

In the master-slave organisation, one processor out of many behaves as the master while the others behave as slaves. The master processor is dedicated to executing the operating system: it works as a scheduler and controller over the slave processors, scheduling the work and also controlling their activity. Therefore, the operating system data structures are usually stored in its private memory. The slave processors are treated simply as a schedulable pool of resources; in other words, the slave processors execute the application programmes.

This arrangement allows the parallel execution of a single task by allocating several subtasks to multiple processors concurrently. Since the operating system is executed by only the master processor, this system is relatively simple to develop and efficient to use. Limited scalability is its main limitation, because the master processor becomes a bottleneck and consequently fails to fully utilise the slave processors.


Symmetric

In the symmetric organisation all processors are identical in configuration, autonomous, and treated equally. To make all the processors functionally identical, all the resources are pooled and are available to all of them. The operating system is also symmetric, as any processor may execute it; in other words, there is one copy of the kernel that can be executed by all processors concurrently. To that end, the whole system must be controlled through proper interlocks for access to scarce data structures and pooled resources.

The simplest way to achieve this is to treat the entire operating system as a critical section and allow only one processor to execute the operating system at a time. This method is called the 'floating master' method because, in spite of the presence of many processors, only one executes the operating system at any instant. The processor that executes the operating system has a special role and acts as a master, but since the operating system is not bound to any specific processor, it floats from one processor to another.

Parallel execution of different applications is achieved by maintaining a queue of ready processes in shared memory. Processor allocation then reduces to assigning the first ready process to the first available processor, until either all processors are busy or the queue is empty. Each idle processor thus fetches the next work item from the queue, as sketched below.
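The following C sketch shows one plausible form of such a shared ready queue: a fixed-size circular buffer guarded by a single lock. All names are illustrative assumptions; enqueue_ready is called when a process becomes ready, and each idle processor calls dequeue_ready to fetch the next work item.

#include <pthread.h>
#include <stddef.h>

#define MAX_READY 64

typedef struct {
    int pid[MAX_READY];       /* identifiers of ready processes */
    size_t head, tail, count;
    pthread_mutex_t lock;     /* serialises access by all processors;
                                 initialise with PTHREAD_MUTEX_INITIALIZER */
} ready_queue_t;

/* Called by whichever processor makes a process ready. */
void enqueue_ready(ready_queue_t *q, int pid) {
    pthread_mutex_lock(&q->lock);
    if (q->count < MAX_READY) {
        q->pid[q->tail] = pid;
        q->tail = (q->tail + 1) % MAX_READY;
        q->count++;
    }
    pthread_mutex_unlock(&q->lock);
}

/* Called by each idle processor: fetch the next work item, or -1 if none. */
int dequeue_ready(ready_queue_t *q) {
    int pid = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count > 0) {
        pid = q->pid[q->head];
        q->head = (q->head + 1) % MAX_READY;
        q->count--;
    }
    pthread_mutex_unlock(&q->lock);
    return pid;
}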
MULTIPROCESSOR OPERATING SYSTEM FUNCTIONS AND REQUIREMENTS

A multiprocessor operating system manages all the available resources and scheduling functionality to form an abstraction that facilitates programme execution and interaction with users.

A processor is one of the most important and basic resources that needs to be managed. For effective use of multiprocessors, processor scheduling is necessary. Processor scheduling undertakes the following tasks:

(i) Allocation of processors among applications in a manner consistent with the system design objectives. This affects the system throughput; throughput can be improved by co-scheduling several applications together, thereby giving fewer processors to each.

(ii) Ensuring efficient use of the processors allocated to an application. This primarily affects the speedup of the system.

The above two tasks conflict with each other to some degree, because maximum speedup requires dedicating a large proportion of a system's processors to a single application, which decreases the throughput of the system. Owing to the difficulty of automating this trade-off, explicit programmer direction is needed to some degree; generally, language translators and preprocessors provide support for explicit and automatic parallelism. The two primary facets of OS support for multiprocessing are:

(i) Flexible and efficient interprocess and interprocessor synchronization mechanisms, and

(ii)         Efficient creation and management of a large number of threads of activity, such as processes or threads.

The latter aspect is important because parallelism is often accomplished by splitting an application into separate, individually executable subtasks that may be allocated to different processors.
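As a small illustration of this splitting, the C sketch below uses POSIX threads; the subtask body is a placeholder of our own. The application is divided into individually executable subtasks that the operating system may schedule on different processors.

#include <pthread.h>
#include <stdio.h>

#define NSUBTASKS 4

/* Placeholder subtask body; each instance may run on any processor. */
static void *subtask(void *arg) {
    long id = (long)arg;
    printf("subtask %ld running\n", id);
    return NULL;
}

int main(void) {
    pthread_t tid[NSUBTASKS];

    /* Create the subtasks; scheduling them onto processors is the OS's job. */
    for (long i = 0; i < NSUBTASKS; i++)
        pthread_create(&tid[i], NULL, subtask, (void *)i);

    /* Wait for all subtasks to finish. */
    for (int i = 0; i < NSUBTASKS; i++)
        pthread_join(tid[i], NULL);

    return 0;
}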

Memory management is the second basic type of resource that needs to be managed. In multiprocessor systems, memory management is highly dependent on the architecture and interconnection scheme.

In loosely coupled systems, memory is usually handled independently on a per-processor basis, and shared memory may be simulated by means of a message-passing mechanism.

In shared-memory systems, the operating system should provide a flexible memory model that facilitates safe and efficient access to shared data structures and synchronization variables.

A multiprocessor operating system should provide a hardware-independent, unified model of shared memory to facilitate porting of applications between different multiprocessor environments. The designers of the Mach operating system exploited the duality of memory management and inter-process communication.

The third basic resource is device management, but it has received little attention in multiprocessor systems to date, because the main focus was earlier on the speedup of compute-intensive applications, which generally do not generate much input/output after the initial loading. Now that multiprocessors are applied to more balanced, general-purpose applications, the input/output requirement increases in proportion to the realised throughput and speed.

WHAT IS MULTIPROCESSOR SYNCHRONIZATION

A multiprocessor system facilitates parallel program execution and read/write sharing of data, and thus may cause processors to concurrently access locations in the shared memory. Therefore, a correct and reliable mechanism is needed to serialize this access; this is called a synchronization mechanism. The mechanism should make accesses to a shared data structure appear atomic with respect to each other. In this section, we introduce some mechanisms and techniques suitable for synchronization in multiprocessors.

Test-and-Set

The test-and-set instruction atomically reads and modifies the contents of a memory location in one memory cycle. It is defined as follows:

FUNCTION Test-and-Set (var m: boolean): boolean;
begin
   Test-and-Set := m;
   m := true
end;

The test-and-set instruction returns the current value of the variable m (a memory location) and sets it to true. This instruction can be used to implement P and V operations (primitives) on a binary semaphore, S, in the following way (S is implemented as a memory location):

P(S): while Test-and-Set(S) do nothing;
V(S): S := false;

Initially, S is set to false. When a P(S) operation is executed for the first time, Test-and-Set(S) returns false (and sets S to true), so the "while" loop of the P(S) operation terminates. All subsequent executions of P(S) keep looping, because S is true, until a V(S) operation is executed.
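For comparison, the same P and V pair can be sketched in C11, whose atomic_flag_test_and_set operation behaves like the instruction described above; the mapping onto P and V functions is our own illustration.

#include <stdatomic.h>

static atomic_flag S = ATOMIC_FLAG_INIT;   /* clear = semaphore free (false) */

void P(void) {
    /* Loop until test-and-set returns false, i.e., we saw S free and set it. */
    while (atomic_flag_test_and_set(&S))
        ;   /* busy-wait: another processor holds the semaphore */
}

void V(void) {
    atomic_flag_clear(&S);   /* S := false, releasing the semaphore */
}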

Compare-and-Swap

The compare-and-swap instruction is used in the optimistic synchronization of concurrent updates to a memory location. It is defined as follows (r1 and r2 are two registers of a processor and m is a memory location):

FUNCTION Compare-and-Swap (var m: integer): integer;
var temp: integer;
begin
   temp := m;
   if temp = r1 then
      begin m := r2; z := 1 end
   else
      begin r1 := temp; z := 0 end
end;

If the contents of r1 and m are identical, this instruction assigns the contents of r2 to m and sets z to 1. Otherwise, it assigns the contents of m to r1 and sets z to 0. Variable z is a flag that indicates the success of the execution. This instruction can be used to synchronize concurrent access to a shared variable.
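A C11 sketch of this optimistic pattern follows, with atomic_compare_exchange_weak standing in for the compare-and-swap instruction; the increment is just an example update. The idea is to read m, compute a new value, and install it only if m has not changed in the meantime; on failure the current value of m is fetched back (mirroring the r1 := temp branch) and the update is retried.

#include <stdatomic.h>

atomic_int m = 0;   /* shared memory location */

void optimistic_increment(void) {
    int expected = atomic_load(&m);   /* temp := m (plays the role of r1) */
    /* Try to install expected + 1. On failure, expected is refreshed with
       the current contents of m, so the new value is recomputed each retry. */
    while (!atomic_compare_exchange_weak(&m, &expected, expected + 1))
        ;   /* loop until the swap succeeds */
}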

Fetch-and-Add

The fetch-and-add instruction is a multiple-operation memory access instruction that atomically adds a constant to a memory location and returns the previous contents of the memory location. It is defined as follows:

FUNCTION Fetch-and-Add (var m: integer; c: integer): integer;
var temp: integer;
begin
   temp := m;
   m := m + c;
   Fetch-and-Add := temp
end;


This instruction is executed by hardware placed in the interconnection network, not by the hardware present in the memory modules. When several processors concurrently execute fetch-and-add instructions on the same memory location, these instructions are combined in the network and are executed by the network in the following way:

A single increment, which is the sum of the increments of all these instructions, is added to the memory location.

A single value is returned by the network to each of the processors, corresponding to an arbitrary serialisation of the execution of the individual instructions.

If a number of processors simultaneously perform fetch-and-add instructions on the same memory location, the net result is as if these instructions were executed serially in some unpredictable order.

The fetch-and-add instruction is powerful: it allows the implementation of P and V operations on a general semaphore, S, in the following manner:

P(S): while (Fetch-and-Add(S, -1) <= 0) do
      begin
         Fetch-and-Add(S, 1);
         while (S <= 0) do nothing;
      end;

The outer "while-do" statement ensures that only one processor succeeds in decrementing S to 0 when multiple processors try to decrement variable S. All the unsuccessful processors add 1 back to S and try again to decrement it. The "while-do" statement forces an unsuccessful processor to wait (before retrying) until S is greater then 0.

V(S): Fetch-and-Add (S, 1).
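The same P and V operations can be sketched in C11 with atomic_fetch_add standing in for the combining network's fetch-and-add; the busy-wait structure follows the pseudocode above, and the function names are our own.

#include <stdatomic.h>

atomic_int S;   /* general semaphore; initialise to the resource count */

void P(void) {
    /* Succeed only when the decrement was applied to a positive S
       (atomic_fetch_add returns the previous contents). */
    while (atomic_fetch_add(&S, -1) <= 0) {
        atomic_fetch_add(&S, 1);        /* undo the failed decrement */
        while (atomic_load(&S) <= 0)
            ;                           /* wait until S becomes positive */
    }
}

void V(void) {
    atomic_fetch_add(&S, 1);
}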
Check Your Progress 1

1)            What is the difference between a loosely coupled system and a tightly coupled system? Give examples.


      
One feature that commonly characterizes tightly coupled systems is that they share a clock. Therefore, multiprocessors are typically tightly coupled, but distributed workstations on a network are not.

Another difference is that in a tightly coupled system the delay experienced when a message is sent from one computer to another is short and the data rate is high; that is, the number of bits per second that can be transferred is large. In a loosely coupled system the opposite is true: the intermachine message delay is large and the data rate is low.

For example, two CPU chips on the same printed circuit board, connected by wires etched onto the board, are likely to be tightly coupled, whereas two computers connected by a 2400 bit/sec modem over the telephone system are certain to be loosely coupled.




2)            What is the difference between symmetric and asymmetric multiprocessing?

In symmetric multiprocessing all processors are peers, whereas in asymmetric multiprocessing the relationship between processors is master-slave. More specifically, in symmetric multiprocessing each CPU runs the same copy of the OS, while in asymmetric multiprocessing the CPUs typically split responsibilities, so each may have specialised (different) software and roles.
