OpenMP - PUC-Rio | Home

OpenMP
Based on material provided by Cristiana Amza, www.eecg.toronto.edu/~amza/ece1747h/ece1747h.html

What is OpenMP?
A standard for shared-memory programming for scientific applications. It has specific support for the needs of scientific applications (unlike Pthreads).

Rapidly gaining acceptance among vendors and application writers. See http://www.openmp.org for more information. Supported by gcc from version 4.3.2 on.

OpenMP API Overview
The API is a set of compiler directives inserted in the source program (in addition to some library functions).

Ideally, compiler directives do not affect sequential code:
  pragmas in C/C++;
  (special) comments in Fortran code.

OpenMP API Example (1 of 2)
Sequential code:
  statement1;
  statement2;
  statement3;
Assume we want to execute statement2 in parallel, and statement1 and statement3 sequentially.

OpenMP API Example (2 of 2)
OpenMP parallel code:
  statement1;
  #pragma omp parallel
  statement2;
  statement3;
Statement2 is (or may be) executed in parallel. Statement1 and statement3 are executed sequentially.

Important Note
By giving a parallel directive, the user asserts that the program will remain correct if the statement is executed in parallel.

The OpenMP compiler does not check correctness.

API Semantics
The master thread executes the sequential code; master and slaves execute the parallel code. Note: this is very similar to the fork-join semantics of the Pthreads create/join primitives. It allows incremental parallelization (!!)
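As a minimal illustration of these fork-join semantics (a sketch, not from the original slides; compile with gcc -fopenmp):

  /* fork_join.c: minimal sketch of OpenMP fork-join semantics */
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      printf("sequential part: master thread only\n");

      #pragma omp parallel   /* a team of threads is spawned here */
      {
          printf("parallel part: thread %d of %d\n",
                 omp_get_thread_num(), omp_get_num_threads());
      }                      /* implicit join: all threads wait here */

      printf("sequential part again: master thread only\n");
      return 0;
  }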

OpenMP Implementation Overview
An OpenMP implementation consists of a compiler and a library, unlike Pthreads (purely a library).

OpenMP Example Usage (1 of 2)
[Diagram: annotated source goes through the OpenMP compiler; a compiler switch selects whether a sequential or a parallel program is produced.]

OpenMP Example Usage (2 of 2)
If you give the sequential switch, comments and pragmas are ignored. If you give the parallel switch, comments and/or pragmas are read and cause translation into a parallel program.

Ideally, one source serves for both the sequential and the parallel program (a big maintenance plus).

OpenMP Directives
Parallelization directives:
  parallel region
  parallel for
Data environment directives:

  shared, private, threadprivate, reduction, etc.
Synchronization directives:
  barrier, critical

General Rules about Directives
They always apply to the next statement, which must be a structured block. Examples:
  #pragma omp ...
  statement;

  #pragma omp ...
  { statement1; statement2; statement3; }

OpenMP Parallel Region
  #pragma omp parallel

A number of threads are spawned at entry. Each thread executes the same code. Each thread waits at the end. Very similar to a number of create/join pairs with the same function in Pthreads.

Getting Threads to Do Different Things
Through explicit thread identification (as in Pthreads), or through work-sharing directives.

Thread Identification
  int omp_get_thread_num()   -- gets the thread id
  int omp_get_num_threads()  -- gets the total number of threads

Example

  #pragma omp parallel
  {
    if( omp_get_thread_num() == 0 )
      master();
    else
      slave();
  }
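A compilable version of this pattern (a sketch; master() and slave() here merely stand in for real work):

  #include <stdio.h>
  #include <omp.h>

  static void master(void) { printf("master: thread 0 coordinates\n"); }
  static void slave(void)  { printf("slave: thread %d works\n", omp_get_thread_num()); }

  int main(void)
  {
      #pragma omp parallel          /* every thread runs the same block */
      {
          if( omp_get_thread_num() == 0 )
              master();             /* thread 0 takes the master role */
          else
              slave();
      }
      return 0;
  }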

Work Sharing Directives
They always occur within a parallel region directive. The two principal ones are:
  parallel for
  parallel section

OpenMP Parallel For
  #pragma omp parallel
  #pragma omp for
  for( ... ) { ... }
Each thread executes a subset of the iterations. All threads wait at the end of the parallel for.
The iterations must be independent, and the for must have a restricted format:
  for (index = start; index relop end; index op= incr)
break, return, exit, and goto are not permitted inside the loop. The loop control variable is the only variable that is private by default.
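For instance, a complete parallel-for program might look like this (a sketch; the array size and the scaling operation are arbitrary choices):

  #include <stdio.h>

  #define N 1000000

  static double a[N];

  int main(void)
  {
      int i;

      #pragma omp parallel
      #pragma omp for              /* iterations are divided among threads */
      for (i = 0; i < N; i++)
          a[i] = 2.0 * i;          /* independent iterations: safe in parallel */
      /* implicit barrier: all threads wait at the end of the for */

      printf("a[%d] = %f\n", N - 1, a[N - 1]);
      return 0;
  }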

Multiple Work Sharing Directives
May occur within a single parallel region:
  #pragma omp parallel
  {
    #pragma omp for
    for( ; ; ) { ... }
    #pragma omp for
    for( ; ; ) { ... }
  }
All threads wait at the end of the first for.

The NoWait Qualifier
  #pragma omp parallel
  {
    #pragma omp for nowait
    for( ; ; ) { ... }
    #pragma omp for
    for( ; ; ) { ... }
  }
Threads proceed to the second for without waiting.

Parallel Sections Directive
  #pragma omp parallel
  {
    #pragma omp sections
    {
      { ... }
      #pragma omp section    /* this is a delimiter */
      { ... }
      #pragma omp section
      { ... }
    }
  }

A Useful Shorthand
  #pragma omp parallel
  #pragma omp for
  for ( ; ; ) { ... }
is equivalent to
  #pragma omp parallel for
  for ( ; ; ) { ... }

(The same holds for parallel sections.)

Note the Difference between ...
  #pragma omp parallel
  {
    #pragma omp for
    for( ; ; ) { ... }
    f();
    #pragma omp for
    for( ; ; ) { ... }
  }
and ...
  #pragma omp parallel for
  for( ; ; ) { ... }
  f();
  #pragma omp parallel for
  for( ; ; ) { ... }
In the first version f() is executed by every thread, inside a single parallel region. In the second version f() is executed once, sequentially, between two separate parallel regions.

Sequential Matrix Multiply
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      for( k=0; k<n; k++ )
        c[i][j] += a[i][k] * b[k][j];

Sequential SOR
  for some number of timesteps/iterations {
    for (i=0; i<n; i++)
      for (j=0; j<n; j++)
        temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                              grid[i][j-1] + grid[i][j+1] );
    for (i=0; i<n; i++)
      for (j=0; j<n; j++)
        grid[i][j] = temp[i][j];
  }

Equivalent OpenMP SOR
  for some number of timesteps/iterations {
    #pragma omp parallel
    {
      #pragma omp for
      for (i=0; i<n; i++)
        for (j=0; j<n; j++)
          temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                                grid[i][j-1] + grid[i][j+1] );
      #pragma omp for
      for (i=0; i<n; i++)
        for (j=0; j<n; j++)
          grid[i][j] = temp[i][j];
    }
  }
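A compilable rendering of one such SOR step (a sketch: grid size, step count, and boundary value are assumptions; loop variables are declared inside the loops so they are private):

  #include <stdio.h>

  #define N 512
  #define STEPS 100

  static double grid[N + 2][N + 2], temp[N + 2][N + 2];

  int main(void)
  {
      grid[0][1] = 100.0;                    /* arbitrary boundary value */
      for (int t = 0; t < STEPS; t++) {
          #pragma omp parallel
          {
              #pragma omp for
              for (int i = 1; i <= N; i++)
                  for (int j = 1; j <= N; j++)
                      temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                           grid[i][j-1] + grid[i][j+1]);
              /* implicit barrier: temp is complete before the copy-back */
              #pragma omp for
              for (int i = 1; i <= N; i++)
                  for (int j = 1; j <= N; j++)
                      grid[i][j] = temp[i][j];
          }
      }
      printf("grid[1][1] = %f\n", grid[1][1]);
      return 0;
  }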

Oftentimes, parallelism is only useful if the problem size is sufficiently big; for smaller sizes, the overhead of parallelization exceeds the benefit.

Conditional Parallelism: Specification
  #pragma omp parallel if( expression )
  #pragma omp for if( expression )
  #pragma omp parallel for if( expression )
Execute in parallel if the expression is true; otherwise execute sequentially.

Conditional Parallelism: Example
  for( i=0; i<n; i++ )
    #pragma omp parallel for if( n-i > 100 )
    for( j=i+1; j<n; j++ )
      ...
The inner loop is executed in parallel only while it still has more than 100 iterations; once too few remain, the overhead would exceed the benefit.

Scheduling of Iterations: Issue
Scheduling means assigning iterations to threads. So far, we have assumed the default, which is block scheduling. OpenMP allows other scheduling strategies as well, for instance cyclic and gss (guided self-scheduling).

Scheduling of Iterations: Specification
  #pragma omp parallel for schedule(strategy)
where the strategy can be one of:
  block (the default; schedule(static) in OpenMP syntax)
  cyclic (schedule(static, 1))
  gss (schedule(guided))

Example
Multiplication of two matrices C = A x B, where the matrix A is upper-triangular (all elements below the diagonal are 0).

Sequential Matrix Multiply Becomes
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ ) {
      c[i][j] = 0.0;
      for( k=i; k<n; k++ )
        c[i][j] += a[i][k] * b[k][j];
    }
The inner k loop starts at i, so the outer iterations have different cost characteristics: with block scheduling, the thread that gets the first rows does most of the work. Cyclic or guided scheduling balances the load.
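A sketch of how the scheduling clause applies to the triangular multiply (the matrix size is an assumption; the schedule kinds are standard OpenMP):

  #include <stdio.h>

  #define N 500
  static double a[N][N], b[N][N], c[N][N];

  int main(void)
  {
      /* schedule(static) would give the costly early rows to one thread;
         schedule(guided) hands out shrinking chunks and balances the load */
      #pragma omp parallel for schedule(guided)
      for (int i = 0; i < N; i++)
          for (int j = 0; j < N; j++) {
              c[i][j] = 0.0;
              for (int k = i; k < N; k++)   /* upper-triangular A: k starts at i */
                  c[i][j] += a[i][k] * b[k][j];
          }
      printf("c[0][0] = %f\n", c[0][0]);
      return 0;
  }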

Reminder: Matrix Multiply
  #pragma omp parallel for private( j, k )
  for( i=0; i<n; i++ )
    for( j=0; j<n; j++ )
      for( k=0; k<n; k++ )
        c[i][j] += a[i][k] * b[k][j];
a, b, c are shared; i, j, k are private.

Data Environment Directives
  private
  threadprivate
  reduction

Private Variables
  #pragma omp parallel for private( list )
Makes a private copy for each thread of each variable in the list. This and all further examples use parallel for, but the same applies to the other region and work-sharing directives.

Private Variables: Example
  for( i=0; i<n; i++ ) {
    tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }
Swaps the values in a and b. There is a loop-carried dependence on tmp, easily fixed by privatizing tmp.

Private Variables: Example (2 of 2)
  #pragma omp parallel for private( tmp )
  for( i=0; i<n; i++ ) {
    tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }
Removes the dependence on tmp. This would be more difficult to do in Pthreads.

Private Variables: Alternative 1
  for( i=0; i<n; i++ ) {
    tmp[i] = a[i];
    a[i] = b[i];
    b[i] = tmp[i];
  }
Requires a change to the sequential program, and is wasteful in space: O(n) instead of O(p).

Private Variables: Alternative 2
  f()
  {
    int tmp; /* local allocation on stack */
    for( i=from; i<to; i++ ) {
      tmp = a[i];
      a[i] = b[i];
      b[i] = tmp;
    }
  }
This is the Pthreads approach: each thread runs f() on its own range, so tmp is naturally private.

Threadprivate
Private variables are private on a parallel-region basis.

Threadprivate variables are global variables that are private throughout the execution of the program.
  #pragma omp threadprivate( list )
Example:
  #pragma omp threadprivate( x )

In Pthreads this requires a program change: an array of size p, accessed as x[pthread_self()]. Costly if accessed frequently. It is not cheap in OpenMP either.
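A small sketch of threadprivate in use (the variable and the printout are illustrative; persistence across regions also assumes a constant thread count):

  #include <stdio.h>
  #include <omp.h>

  /* x is global, but each thread keeps its own copy across parallel regions */
  static int x;
  #pragma omp threadprivate(x)

  int main(void)
  {
      #pragma omp parallel
      x = omp_get_thread_num();     /* each thread writes its own x */

      #pragma omp parallel          /* the values persist into this region */
      printf("thread %d still sees x = %d\n", omp_get_thread_num(), x);
      return 0;
  }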

Reduction Variables
  #pragma omp parallel for reduction( op : list )
op is one of +, *, -, &, ^, |, &&, or ||. The variables in the list must be used with this operator in the loop, and they are automatically initialized to sensible values.

Reduction Variables: Example
  #pragma omp parallel for reduction( +:sum )
  for( i=0; i<n; i++ )
    sum += a[i];

SOR Sequential Code with Convergence
  for( ; diff > delta; ) {
    for (i=0; i<n; i++)
      for (j=0; j<n; j++)
        temp[i][j] = 0.25 * ( grid[i-1][j] + grid[i+1][j] +
                              grid[i][j-1] + grid[i][j+1] );
    diff = 0.0;
    for (i=0; i<n; i++)
      for (j=0; j<n; j++) {
        diff = max( diff, fabs(grid[i][j] - temp[i][j]) );
        grid[i][j] = temp[i][j];
      }
  }

SOR OpenMP Code with Convergence: First Attempt
  for( ; diff > delta; ) {
    #pragma omp parallel for
    for (i=0; i<n; i++)
      ... compute temp[i][j] ...
    diff = 0.0;
    #pragma omp parallel for
    for (i=0; i<n; i++)
      ... copy temp back, accumulating diff ...
  }
Bummer: there is no reduction operator for max or min.
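OpenMP 3.1 and later (newer than these slides) do provide max and min reductions in C/C++. A sketch of what the convergence sweep could then look like, reusing the names from the code above:

  #include <math.h>

  #define N 256
  static double grid[N + 2][N + 2], temp[N + 2][N + 2];

  /* returns the largest change while copying temp back into grid */
  double sweep_diff(void)
  {
      double diff = 0.0;
      #pragma omp parallel for reduction(max:diff)   /* OpenMP 3.1+ */
      for (int i = 1; i <= N; i++)
          for (int j = 1; j <= N; j++) {
              double d = fabs(grid[i][j] - temp[i][j]);
              if (d > diff) diff = d;
              grid[i][j] = temp[i][j];
          }
      return diff;
  }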

Synchronization Primitives
Critical
  #pragma omp critical (name)
Implements critical sections, by name. Similar to Pthreads mutex locks (a name corresponds to a lock).
Barrier
  #pragma omp barrier
Implements a global barrier.

OpenMP SOR with Convergence
  #pragma omp parallel private( mydiff )
  for( ; diff > delta; ) {
    #pragma omp for nowait
    for( i=from; i<to; i++ )
      ... compute temp[i][j] ...
    mydiff = 0.0;
    #pragma omp barrier
    #pragma omp for nowait
    for( i=from; i<to; i++ )
      ... copy temp back, accumulating mydiff ...
    #pragma omp critical
    diff = max( diff, mydiff );
    #pragma omp barrier
  }
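A compilable rendering of this convergence pattern (a sketch: grid size, boundary setup, and the unnamed critical are assumptions; each thread folds its private maximum into diff under the critical):

  #include <stdio.h>
  #include <math.h>

  #define N 256
  static double grid[N + 2][N + 2], temp[N + 2][N + 2];

  int main(void)
  {
      double diff = 1.0, delta = 1e-6;
      grid[0][1] = 100.0;                  /* arbitrary boundary value */

      while (diff > delta) {
          diff = 0.0;
          #pragma omp parallel
          {
              double mydiff = 0.0;         /* private running maximum */
              #pragma omp for
              for (int i = 1; i <= N; i++)
                  for (int j = 1; j <= N; j++)
                      temp[i][j] = 0.25 * (grid[i-1][j] + grid[i+1][j] +
                                           grid[i][j-1] + grid[i][j+1]);
              /* implicit barrier: temp complete before the copy-back */
              #pragma omp for
              for (int i = 1; i <= N; i++)
                  for (int j = 1; j <= N; j++) {
                      double d = fabs(grid[i][j] - temp[i][j]);
                      if (d > mydiff) mydiff = d;
                      grid[i][j] = temp[i][j];
                  }
              #pragma omp critical         /* fold private maxima into diff */
              if (mydiff > diff) diff = mydiff;
          }
      }
      printf("converged: diff = %g\n", diff);
      return 0;
  }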

Big bummer: there are no condition variables. As a result, one must busy-wait for condition synchronization. This is clumsy, and very inefficient on some architectures.

PIPE: Sequential Program
  for( i=0; i<num_pics; i++ ) {
    in_pic = read( i );
    int_pic_1 = trans1( in_pic );
    int_pic_2 = trans2( int_pic_1 );
    int_pic_3 = trans3( int_pic_2 );

    out_pic = trans4( int_pic_3 );
  }

Sequential vs. Parallel Execution
[Figure: sequential vs. pipelined execution; color = picture, horizontal line = processor.]

PIPE: Parallel Program
Processor 0 runs trans1 on every picture, processor 1 runs trans2, and so on; each processor signals the next when a picture is ready:
  P0: for( i=0; i<num_pics; i++ )
        { int_pic_1[i] = trans1( read(i) ); signal P1; }
  (and similarly for P1, P2 and P3)

PIPE: Main Program
  #pragma omp parallel sections
  {
    #pragma omp section
    stage1();
    #pragma omp section
    stage2();
    #pragma omp section
    stage3();
    #pragma omp section

    stage4();
  }

PIPE: Stage 1
  void stage1()
  {
    num1 = 0;
    for( i=0; i<num_pics; i++ ) {
      int_pic_1[i] = trans1( read( i ) );
      #pragma omp critical (c1)
      num1++;
    }
  }

PIPE: Stage 2
  void stage2()
  {
    for( i=0; i<num_pics; i++ ) {
      do {
        #pragma omp critical (c1)
        cond = (num1 <= i);          /* picture i not ready yet? */
      } while( cond );
      int_pic_2[i] = trans2( int_pic_1[i] );
      #pragma omp critical (c2)
      num2++;
    }
  }

OpenMP PIPE
Note the need to exit the critical section during the wait.

Otherwise the other thread could never enter the critical section to update the counter. Never busy-wait inside a critical section!

Example 1
  #pragma omp parallel for private( tmp )
  for( i=0; i<n; i++ ) {
    tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
  }
The dependence on tmp is removed.

Example 2
  for( i=0, sum=0; i<n; i++ )
    sum += a[i];
Dependence on sum.

Reduction
  #pragma omp parallel for reduction( +:sum )

  for( i=0, sum=0; i<n; i++ )
    sum += a[i];

Example 3
  for( i=0, index=0; i<n; i++ ) {
    index += i;
    a[i] = b[index];
  }

Dependence on index. It is an induction variable: its value can be computed from the loop variable.

Induction Variable Elimination
  #pragma omp parallel for
  for( i=0; i<n; i++ )
    a[i] = b[(i*i+i)/2];

Example 4
  for( i=0, index=0; i<n; i++ ) {
    index += f(i);
    b[i] = g(a[index]);
  }

Dependence on the induction variable index, but now there is no closed formula for its value.

Loop Splitting
  index[0] = f(0);
  for( i=1; i<n; i++ )
    index[i] = index[i-1] + f(i);    /* sequential */
  #pragma omp parallel for
  for( i=0; i<n; i++ )
    b[i] = g(a[index[i]]);           /* parallel */
Loop splitting has removed the dependence.

Example 5
  for( k=0; k<n; k++ )
    for( i=0; i<n; i++ )
      for( j=0; j<n; j++ )
        a[i][j] += b[i][k] * c[k][j];
The dependence on a[i][j] prevents parallelization of the k-loop. No dependences are carried by the i- and j-loops.

Example 5 Parallelization
  for( k=0; k<n; k++ )
    #pragma omp parallel for
    for( i=0; i<n; i++ )
      for( j=0; j<n; j++ )
        a[i][j] += b[i][k] * c[k][j];
This enters a parallel region n times. We can do better by reordering the loops.

Loop Reordering
  #pragma omp parallel for
  for( i=0; i<n; i++ )
    for( k=0; k<n; k++ )
      for( j=0; j<n; j++ )
        a[i][j] += b[i][k] * c[k][j];
Now the parallel region is entered only once.

Example 6
  #pragma omp parallel for
  for( i=0; i<n; i++ )
    a[i] = f(i);
  #pragma omp parallel for
  for( i=0; i<n; i++ )
    b[i] = g(i);

Loop Fusion
  #pragma omp parallel for
  for( i=0; i<n; i++ ) {
    a[i] = f(i);
    b[i] = g(i);
  }
Fusing the two loops halves the fork/join overhead.

Example 7
  while( a != NULL ) {
    process( a );
    a++;
  }
The number of loop iterations is unknown in advance.

Special Case of Loop Splitting
  for( count=0, p=a; p!=NULL; count++, p++ );
  #pragma omp parallel for
  for( i=0; i<count; i++ )
    process( a+i );
First count the number of loop iterations, then parallelize the loop.
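A compilable version of count-then-parallelize (a sketch: the 0 sentinel and the doubling inside process() are assumptions):

  #include <stdio.h>

  static void process(int *p) { *p *= 2; }

  int main(void)
  {
      int a[] = { 1, 2, 3, 4, 5, 0 };    /* 0 terminates the array */
      int count, i, *p;

      for (count = 0, p = a; *p != 0; count++, p++)
          ;                               /* sequential pass: count iterations */

      #pragma omp parallel for            /* bounds are now known */
      for (i = 0; i < count; i++)
          process(&a[i]);

      for (i = 0; i < count; i++)
          printf("%d ", a[i]);
      printf("\n");
      return 0;
  }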

Example 8: Linked Lists
  for( p=head; p!=NULL; p=p->next )
    process( p );    /* assume process() does not affect the linked list */
Dependence on p.

Another Special Case of Loop Splitting
  for( p=head, count=0; p!=NULL; p=p->next, count++ );
  count[i] = f( count, omp_get_num_threads() );   /* iterations per thread */
  start[0] = head;
  for( i=0; i<omp_get_num_threads(); i++ ) {
    for( p=start[i], cnt=0; cnt<count[i]; p=p->next, cnt++ );
    start[i+1] = p->next;
  }
First walk the list to count its elements and record each thread's starting node; then process the sublists in parallel:
  #pragma omp parallel sections
  {
    for( i=0; i<omp_get_num_threads(); i++ )
      #pragma omp section
      for( p=start[i]; p!=start[i+1]; p=p->next )
        process( p );
  }

Example 9
  for( i=0, wrap=n; i<n; i++ ) {
    b[i] = a[i] + a[wrap];
    wrap = i;
  }

Dependence on wrap, but only the first iteration causes it.

Loop Peeling
  b[0] = a[0] + a[n];
  #pragma omp parallel for
  for( i=1; i<n; i++ )
    b[i] = a[i] + a[i-1];
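A compilable check of the peeling transformation (the array contents are an assumption; note that wrap equals i-1 for every i >= 1):

  #include <stdio.h>

  #define N 8

  int main(void)
  {
      int a[N + 1], b[N], i;
      for (i = 0; i <= N; i++)
          a[i] = i;                     /* sample data */

      b[0] = a[0] + a[N];               /* peeled first iteration (wrap == n) */
      #pragma omp parallel for          /* remaining iterations are independent */
      for (i = 1; i < N; i++)
          b[i] = a[i] + a[i - 1];

      for (i = 0; i < N; i++)
          printf("%d ", b[i]);
      printf("\n");
      return 0;
  }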

Example 10
  for( i=0; i<n; i++ )
    a[i] = a[i] + a[i+m];
There is a dependence if m < n, and none if m > n; which case holds is known only at run time.

If-Test at Run Time
  if( m > n ) {
    #pragma omp parallel for
    for( i=0; i<n; i++ )
      a[i] = a[i] + a[i+m];
  }
  else {
    for( i=0; i<n; i++ )
      a[i] = a[i] + a[i+m];
  }
The dependence test is deferred until run time, when the loop bounds become known. The code can become messy: there is a point of diminishing returns.
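A compilable sketch of this run-time test (the array size and the values of n and m are assumptions):

  #include <stdio.h>

  /* parallelize only when m > n, i.e. when a[i] and a[i+m] cannot overlap */
  static void update(int *a, int n, int m)
  {
      int i;
      if (m > n) {
          #pragma omp parallel for     /* disjoint reads and writes */
          for (i = 0; i < n; i++)
              a[i] = a[i] + a[i + m];
      } else {                         /* possible overlap: stay sequential */
          for (i = 0; i < n; i++)
              a[i] = a[i] + a[i + m];
      }
  }

  int main(void)
  {
      int a[32] = { 0 };
      update(a, 8, 16);                /* m > n: parallel path */
      update(a, 8, 4);                 /* m <= n: sequential path */
      printf("a[0] = %d\n", a[0]);
      return 0;
  }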
