Calendar

Monday, February 22, 2010

Date
22-02-2010T10:30Z
Body

The DAMA-UPC research group invites you to the following talk:

Title: Many Task Computing in Scientific Workflows
Speaker: Marta Mattoso (Universidade Federal do Rio de Janeiro)
Date: 22-Feb-2010 @ 10:30
Room: Sala d'actes de la FIB

Abstract

One of the main advantages of using a scientific workflow management system (SWfMS) to orchestrate data flows among scientific activities is the ability to control and register the whole workflow execution. Executing activities within a workflow on high performance computing (HPC) resources presents challenges for SWfMS execution control: remote execution control and the provenance registry of the parallel activities are limited from the SWfMS side. This talk presents a middleware solution that bridges SWfMS and HPC, supporting workflow parallelization and provenance combined with Many Task Computing (MTC). It presents Hydra, a set of components to be included in the workflow specification of any SWfMS to control the parallelization of activities as MTC. Hydra works in map/reduce style. Through Hydra's components, an MTC parallelization strategy can be registered and reused, and provenance can be gathered uniformly. Hydra aims at reducing the complexity involved in designing and managing parallel executions of activities and workflows within scientific experiments. The main contributions of this work reside in helping the scientist in: (i) identifying parallel workflow activities at an abstract level, (ii) modeling workflow activities using the MTC paradigm, (iii) submitting activities from the SWfMS to the distributed environment, (iv) steering, by finding failures, detecting performance bottlenecks, and monitoring process status to keep the SWfMS aware of the remote execution, and (v) gathering prospective and remote retrospective provenance data. We have evaluated Hydra in a Computational Fluid Dynamics (CFD) workflow and in a sensitivity analysis workflow for computing model constants in Large Eddy Simulation (LES) of turbulence. Experimental results show that a systematic approach for distributing parallel activities is viable, saving scientists time and reducing operational errors, with the additional benefit of distributed provenance support.
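
As a rough illustration of the map/reduce style mentioned in the abstract, the sketch below splits a parameter sweep into independent tasks, runs them in parallel, and records one provenance entry per task. It is not Hydra's code or API: the names run_activity and map_reduce, the squaring "activity", and the provenance fields are all hypothetical, and a local thread pool stands in for the distributed HPC environment.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_activity(task_id, param):
    """Hypothetical workflow activity: one task of a parameter sweep."""
    start = time.time()
    try:
        if param < 0:
            raise ValueError("negative parameter")
        result, status = param * param, "ok"
    except ValueError as exc:
        result, status = None, f"failed: {exc}"
    # Retrospective provenance: what ran, with which input, and how it ended.
    record = {"task": task_id, "param": param, "status": status,
              "start": start, "end": time.time()}
    return result, record

def map_reduce(params, workers=4):
    # Map: each parameter value becomes an independent task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outcomes = list(pool.map(run_activity, range(len(params)), params))
    # Reduce: merge the successful results; keep provenance for every task.
    results = [r for r, _ in outcomes if r is not None]
    provenance = [rec for _, rec in outcomes]
    return sum(results), provenance

total, provenance = map_reduce([1, 2, -3, 4])
failed = [rec["task"] for rec in provenance if rec["status"] != "ok"]
print(total, failed)  # prints: 21 [2]
```

The failed task is not lost: it still produces a provenance record, which is the kind of information a steering step could use to detect failures and decide whether to resubmit.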
