Advanced topics

by admin last modified 2008-11-07 13:10

How it works – more detailed technical info about the Grid middleware and information about some other tools

Grid middleware

Components of the Workload Management System (WMS)

(based on the EGEE “Grid Tutorial, Grid Middleware” – Handouts for Students)

 

The Workload Management System (WMS) comprises a set of Grid middleware components responsible for the distribution and management of tasks across Grid resources, in such a way that applications are conveniently, efficiently and effectively executed.

The core component of the Workload Management System is the Workload Manager (WM), which accepts and satisfies job management requests coming from its clients. For a computation job there are two main types of request: submission and cancellation. A submission request passes responsibility for the job to the WM. The WM then passes the job to an appropriate Computing Element (CE) for execution, taking into account the requirements and preferences expressed in the job description. The choice of resource is the outcome of a matchmaking process between submission requests and available resources.
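The matchmaking idea can be illustrated with a minimal Python sketch. It is not the WMS implementation: the ComputingElement and JobRequest classes, their fields, and the ranking rule (most free CPUs) are assumptions made up for this illustration. Requirements filter the candidate CEs, and a preference ranks the ones that remain.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """Hypothetical, simplified view of a CE as seen by the broker."""
    name: str
    free_cpus: int
    memory_mb: int
    software: set = field(default_factory=set)


@dataclass
class JobRequest:
    """Hypothetical job description holding only hard requirements."""
    min_memory_mb: int
    required_software: set = field(default_factory=set)


def matchmake(job, elements):
    """Return the best-matching CE for the job, or None if nothing fits.

    Requirements filter the candidates; the preference (assumed here to be
    "most free CPUs") ranks the CEs that remain.
    """
    candidates = [
        ce for ce in elements
        if ce.free_cpus > 0
        and ce.memory_mb >= job.min_memory_mb
        and job.required_software <= ce.software  # subset test
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda ce: ce.free_cpus)


elements = [
    ComputingElement("ce-a", free_cpus=4, memory_mb=2048, software={"gcc"}),
    ComputingElement("ce-b", free_cpus=16, memory_mb=1024, software={"gcc", "root"}),
    ComputingElement("ce-c", free_cpus=8, memory_mb=4096, software={"gcc", "root"}),
]
job = JobRequest(min_memory_mb=2048, required_software={"root"})
best = matchmake(job, elements)
print(best.name if best else "no match")
```

In this example ce-a lacks the required software and ce-b has too little memory, so only ce-c remains as a candidate.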

The components of the Workload Management System (WMS) are shown in the figure below. The WMS consists of a User Interface (UI), a Resource Broker (RB) with an associated Information Index (II) and a Job Submission System (JSS), the Globus Gatekeeper (GK) with its associated Local Resource Management System (LRMS), a Worker Node (WN) and a Logging and Bookkeeping (LB) system. The Resource Broker, the Job Submission System and the Logging and Bookkeeping system are central services in the Grid and do not have to be in the same geographical location as the user or the User Interface machine. The Gatekeeper and the Local Resource Management System are services that every center providing compute and storage resources to the Grid will have. Together, the Gatekeeper, the Local Resource Management System and the Worker Nodes form a logical unit called a Computing Element (CE).

 

Glossary

Workload Management System (WMS)

– comprises the UI, RB, JSS and LB.

User Interface (UI)

– it is an access point for the user of the Grid.

Resource Broker (RB)

– the broker of Grid resources. It finds out which of the available resources is best suited to run the job. For example, if the job needs some special software or a specific platform, the Resource Broker chooses the best site to submit it to.

Job Submission System (JSS)

– responsible for the actual job management operations, in particular job submission and job removal requests, and for interacting with the CE.

Logging and Bookkeeping services (LB)

– a database to store job info.

Information Index (II)

– an LDAP server used by the Broker as a filter to select resources. It provides information about Grid resources and their status.

Globus Gatekeeper (GK)

– receives the job. Its task is to authenticate the job and its owner and to translate the request so that the JobManager can handle it.

Local Resource Management System (LRMS)

– the service responsible for sending the job to one of the free Worker Nodes of the Computing Element.

 

 

 

Legend:

UI - User Interface

JSS - Job Submission System

RB-II - Resource Broker with Information Index

LB - Logging and Bookkeeping

GK - Gatekeeper

WN - Worker Node

LRMS - Local Resource Management System

JDL - Job Description Language

 

The figure above shows the steps to run a job:

1. The user submits the job from the UI to the RB.

The RB performs the matchmaking to find out where the job can be executed. Once a suitable Computing Element has been found, the job is transferred to the Job Submission System. At the JSS a file is created in Resource Specification Language (RSL). The Input Sandbox, which specifies all the files needed by the job, is also created at this stage.

The InputSandbox attribute defines the set of files to be used as inputs for the job.

 

2. This RSL file, together with the Input Sandbox, is then transferred to the Gatekeeper of the Computing Element, and the Gatekeeper submits the job to the Local Resource Management System.

 

3. The LRMS will then send the job to one of the free Worker Nodes of the Computing Element.

 

4. When the job has finished, the files produced by the job are available on the LRMS. The job manager running on the CE notifies the Resource Broker that the job has completed.

 

5. The RB subsequently retrieves the files specified in the Output Sandbox. The OutputSandbox attribute lists the files which have to be retrieved when the job completes execution, for example:

OutputSandbox = {"hostname.err","hostname.out"};

 

6. The RB sends the results (the Output Sandbox) back to the user on the User Interface machine.

 

7. Queries by the user on the status of the job are sent to the Logging and Bookkeeping Service.

 

Users access the Grid through a UI machine which, by means of a set of Python scripts, allows the user to submit a job, monitor its status, and retrieve the output from the worker node back to a local directory on the UI machine. To do so, the user writes a simple JDL file in which all the parameters needed to run the job are specified.

 

The relevant commands for submitting a job, querying its status, retrieving the output, and canceling a job are:

 

edg-job-submit <job.jdl> submits a job whose description is in job.jdl

edg-job-status <jobId> returns the status of the job with job identifier jobId

edg-job-get-output <jobId> returns the place where the output of the job can be found

edg-job-cancel <jobId> cancels the job with job identifier jobId

edg-job-xxx --help shows the usage of command edg-job-xxx
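When these commands are driven from a script, it can help to keep the mapping from action to command in one place. The following Python sketch assumes only what is listed above, namely that each command takes a single argument (the JDL file or the job identifier). The helper names build_command and run_command are hypothetical, and run_command would work only on a UI machine where the edg-job-* commands are installed.

```python
import subprocess

# The four commands named in the text. Nothing beyond
# "<command> <argument>" is assumed about their command-line interface.
COMMANDS = {
    "submit": "edg-job-submit",
    "status": "edg-job-status",
    "get-output": "edg-job-get-output",
    "cancel": "edg-job-cancel",
}


def build_command(action, argument):
    """Return the argv list for one job-management action.

    `argument` is the JDL file for "submit" and the job identifier
    for every other action.
    """
    if action not in COMMANDS:
        raise ValueError(f"unknown action: {action!r}")
    return [COMMANDS[action], argument]


def run_command(action, argument):
    """Run the command (UI machine only) and return its standard output."""
    result = subprocess.run(
        build_command(action, argument),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Building the command line does not require the Grid tools to be installed.
print(build_command("submit", "job.jdl"))
```

The format of the job identifier printed by edg-job-submit is not described here, so the sketch returns the raw output rather than trying to parse it.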

 

More exercises can be found in the EGEE "Grid Tutorial, Grid Middleware" – Handouts for Students.

 

Job Description Language

Job submission requires both a description of the job to be executed and a description of the resources it needs. These descriptions are written in a high-level language called JDL.

Job description files (.jdl files) are used to describe jobs for execution on the Grid. These files are written using a Job Description Language (JDL). The JDL adopted within the gLite middleware is the Classified Advertisement (ClassAd) language defined by the Condor Project, which deals with the management of distributed computing environments. The central construct of this language is the ClassAd, a record-like structure composed of a finite number of distinct attribute names mapped to expressions.

A ClassAd is a highly flexible and extensible data model that can be used to represent arbitrary services and constraints on their allocation. The JDL is used in gLite to specify the desired job characteristics and constraints, which are used by the match-making process to select the resources that the job can use. The fundamentals of the JDL are given below.  

The JDL syntax consists of statements of the form:

 

attribute = value;

 

Note: Values can be of different types: numeric, string, boolean, etc. The JDL is sensitive to blank characters and tabs: no blank characters or tabs should follow the semicolon at the end of a line.

 

The JDL describes both a job and the resources the job requires. Some attributes describe the technical characteristics of the job: job attributes include items such as the job name, the command to execute and its command line options. Other attributes specify requirements for a CE: resource attributes include items such as the required operating system, the amount of memory and the amount of time.

For each type of attribute, some are mandatory while others are optional.

At a minimum, one must specify the name of the executable and the files to which the standard output and the standard error of the job are written (they can even be the same file). For example:

Executable = "test.sh";
StdOutput = "std.out";
StdError = "std.err";

If needed, arguments can be passed to the executable:

Arguments = "hello 10";

Files to be transferred between the UI and the WN before (Input Sandbox) and after (Output Sandbox) the job execution can be specified:

InputSandbox = {"test.sh","std.in"};
OutputSandbox = {"std.out","std.err"};

 

Wildcards are allowed only in the InputSandbox attribute. The files in the Input Sandbox are specified relative to the current working directory. Absolute paths cannot be specified in the OutputSandbox attribute. Neither the Input Sandbox nor the Output Sandbox list can contain two files with the same name (even in different paths), because they would overwrite each other when transferred.
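The sandbox rules above can be checked before a job is submitted. The following Python sketch is a hypothetical pre-submission check, not part of the middleware. The function name check_sandboxes is made up for this illustration, and the set of characters treated as wildcards is an assumption, since the text does not say which wildcard syntax the JDL accepts.

```python
import posixpath

# Assumption: shell-style wildcard characters.
WILDCARD_CHARS = set("*?[")


def check_sandboxes(input_sandbox, output_sandbox):
    """Return a list of problems found; an empty list means the lists pass."""
    problems = []

    # Rule: no two files with the same name within one list,
    # even if they are in different paths.
    for label, files in (("InputSandbox", input_sandbox),
                         ("OutputSandbox", output_sandbox)):
        seen = set()
        for path in files:
            name = posixpath.basename(path)
            if name in seen:
                problems.append(f"{label}: duplicate file name {name!r}")
            seen.add(name)

    # Rules: no absolute paths and no wildcards in the OutputSandbox.
    for path in output_sandbox:
        if posixpath.isabs(path):
            problems.append(f"OutputSandbox: absolute path {path!r}")
        if WILDCARD_CHARS & set(path):
            problems.append(f"OutputSandbox: wildcard in {path!r}")

    return problems


# A valid pair of lists, then a pair that breaks all three rules.
print(check_sandboxes(["test.sh", "data/std.in"], ["std.out", "std.err"]))
print(check_sandboxes(["a/run.sh", "b/run.sh"], ["/tmp/std.out", "*.log"]))
```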

Note: The executable flag is not preserved for the files included in the Input Sandbox when they are transferred to the WN. Therefore, for any file needing execution permissions, a chmod u+x operation should be performed by the initial script specified as the Executable in the JDL file (the chmod u+x operation is done automatically for this script).
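If the initial script happens to be written in Python rather than as a shell script (an assumption; the example in this section uses test.sh), the equivalent of chmod u+x can be sketched as follows. The file name helper.sh is hypothetical, and the temporary directory only simulates a file that arrived on the WN without its executable flag.

```python
import os
import stat
import tempfile


def make_user_executable(path):
    """Equivalent of `chmod u+x path`: add the owner's execute bit."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


with tempfile.TemporaryDirectory() as workdir:
    helper = os.path.join(workdir, "helper.sh")  # hypothetical Input Sandbox file
    with open(helper, "w") as handle:
        handle.write("#!/bin/sh\necho ok\n")
    os.chmod(helper, 0o644)                      # simulate the lost executable flag
    make_user_executable(helper)
    print(bool(os.stat(helper).st_mode & stat.S_IXUSR))
```

Only the owner's execute bit is added; the other permission bits are left as they were.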

The environment of the job can be modified using the Environment attribute. For example:

 

Environment = {"CMS_PATH=$HOME/cms", "CMS_DB=$CMS_PATH/cmdb"};
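The attributes shown in this section can also be assembled into a .jdl file by a script, which makes it easy to guarantee that no blank follows the closing semicolon. The Python sketch below is one possible way to do this, not an official tool. The function names format_value and to_jdl are hypothetical, and the mapping of Python types to JDL values (in particular writing booleans as true and false) is an assumption. Strings containing double quotes are rejected rather than escaped.

```python
def format_value(value):
    """Render one Python value as a JDL value (assumed mapping)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        if '"' in value:
            raise ValueError("embedded double quotes are not handled in this sketch")
        return f'"{value}"'
    if isinstance(value, (list, tuple)):
        return "{" + ",".join(format_value(item) for item in value) + "}"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def to_jdl(attributes):
    """Render a dict as 'attribute = value;' lines with no trailing blanks."""
    lines = [f"{name} = {format_value(value)};" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


job = {
    "Executable": "test.sh",
    "Arguments": "hello 10",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "InputSandbox": ["test.sh", "std.in"],
    "OutputSandbox": ["std.out", "std.err"],
    "Environment": ["CMS_PATH=$HOME/cms", "CMS_DB=$CMS_PATH/cmdb"],
}
print(to_jdl(job), end="")
```

The printed text can be saved as job.jdl and passed to the submission command.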

 

 

More information about the JDL and many example JDL scripts can be found here or here. For more information on using JDL in gLite, please go to this web site.

 

 

Migrating Desktop

 

You can run your application on the Grid using the Migrating Desktop, the user-friendly interface for accessing and using the BG infrastructure.

 

The Migrating Desktop provides scientists with a framework which hides the details of the Grid environment and allows complex systems to be set up and controlled interactively. This tool was designed and developed as the key product of the EU CrossGrid project. The BalticGrid will use the Migrating Desktop (MD) as a common point for accessing and managing the project applications, tools, resources and services.

The BalticGrid applications cover a wide spectrum of scientific disciplines, such as:

  • bioinformatics - which requires sharing data from many sources and a diverse set of tools,
  • material science - including atomic and molecular structures, modeling of advanced technological materials,
  • high-energy physics - including statistical data analysis, production of Monte Carlo samples and distributed data analysis, nuclear and sub-nuclear physics and multi-body problems.

 

Application input 

The user can specify inputs using the Job Submission Wizard. This Wizard is responsible for the proper preparation of the user's job input parameters and consists of several panels. One panel is an application-specific plug-in that can be used for defining parameters specific to a given application and for getting parameters from a database or other application-specific sources. The remaining panels can be used to set common parameters such as job information, resource requirements, files and environment variables.

  • Specific application parameters panel (application plugin (IP1) or XML (IP2)).
  • Job description.
  • Resource requirements (common ones such as job type, memory, CPU, etc.).
  • Input and output files – the user can choose one or more files. The output files are physically created (as empty files) before the job is submitted.
  • Environment variables.
  • Pre-processing tools like OCM-G (IP3).

 

Further info:

Contact person: Bartek Palak (cgrid at man.poznan.pl)

MD Brochure: available here.

 

 

Grid monitoring

The current monitoring structure provides sites with up-to-date information on their status history, as well as the latest results, through GStat, SFT and SAM. That way a site admin can easily monitor the health of the site and take action as appropriate. If the site does not have an active helpdesk, the monitoring tools also provide a means of opening trouble tickets, which are assigned to the person responsible for the site.

 

Trouble ticketing system

User support is provided through a ticketing system, Request Tracker (RT). Support is available by e-mail at support at balticgrid.org. RT is also installed at http://support.balticgrid.org/ and is available only to BG users.

Before submitting a bug report, please:

 
1. Check that all relevant certificates are valid.

2. Check that all Certificate Revocation Lists (CRLs) are updated.

3. Check that your computer clock is synchronized and correct.

 

 


Baltic Grid Second Phase (BalticGrid-II) project is funded by the EU within the framework of the Seventh Framework Programme, in the 'Research infrastructures' activity area, FP7-INFRA-2007-1.2.3: e-Science Grid infrastructures, contract No 223807.
