Operations Department


Sergi Girona, Operations Department Director

The key mission of the Operations Department is to ensure the continued availability and accessibility of RES systems 24 hours a day, 7 days a week, and to provide support to all users of the RES and PRACE. Further core objectives are to manage upgrades to MareNostrum and the other RES nodes; facilitate access to RES facilities, including online electronic applications, remote access, and porting of code; manage the environmental aspects of the BSC-CNS installations; manage the technical aspects of integrating MareNostrum into PRACE, EUDAT and the European HPC network; and ensure that RES staff receive appropriate training and skills development so that they can carry out their duties professionally in an environment of constant technological change and advancement.


Led by Sergi Girona, the Operations Department ensures the continuous daily functioning of RES supercomputers and remote access by users, both within Spain and internationally. In addition to all the routine maintenance and operations tasks, Department staff are also heavily involved in planning and designing new systems and support facilities. Furthermore, the entire Department participates in European projects such as PRACE, EESI-2 and EUDAT.

The constant upgrading and utilisation of cutting edge technology implies that staff within the Department, liaison staff at the RES and PRACE nodes, and scientific users all face continuous change in systems and procedures. The management of these changes and their dissemination to all who may be affected by them are also regular activities of the Department.

The continuous growth of the BSC-CNS team requires a constant remodelling of premises to accommodate the new employees, and to facilitate the development of their research. Until the BSC-CNS headquarters building is ready to be used, maintenance and extension of current computer rooms, office space and meeting rooms requires an intensive effort.

Organisational Structure

The Operations Department is structured in three groups: System Administration, User Support and Facility Management.

System Administration supervises the daily operations of two key resources: the MareNostrum supercomputer and the Spanish Supercomputing Network (RES), bearing responsibility for system administration, security, resource management, networking and helpdesk. This group also takes care of running all the other IT equipment installed at BSC-CNS and related facilities.

User Support is responsible for direct user support, providing detailed knowledge of programming models, libraries, performance tools and applications. It is also responsible for managing the BSC-CNS website and for managing and supporting the 3D visualisation equipment.

Facility Management is responsible for the safe and efficient working conditions of key BSC-CNS facilities, such as the MareNostrum supercomputer and ancillary power, data and environmental systems.

Key Projects

The Operations Department was involved in the following projects during 2014:

  • RES (Red Española de Supercomputación): The Spanish Supercomputing Network offers coordinated HPC services to the Spanish scientific community. The Operations Department is responsible for the coordination of the network, including all support and administration services.
  • PRACE 2IP: In the PRACE Second Implementation Phase, the Operations Department is involved in WP2, WP5, WP6, WP7 and WP10, and coordinates the participation of BSC-CNS in the project.
  • PRACE 3IP: In the PRACE Third Implementation Phase, the Operations Department is involved in WP2, WP5, WP6 and WP7, and coordinates the participation of BSC-CNS in the project.
  • EESI2: The objective of this Support Action, co-funded by the European Commission, is to build a European vision and roadmap to address the challenges of the new generation of massively parallel systems composed of millions of heterogeneous cores which will provide multi-Petaflop performances in the next few years and Exaflop performances in 2020.
  • EUDAT: The EUDAT project aims to contribute to the production of a Collaborative Data Infrastructure (CDI). The project's target is to provide a pan-European solution to the challenge of data proliferation in Europe's scientific and research communities.
  • RDA-EUROPE: RDA-EUROPE is a project that focuses on coordinating a series of cross-infrastructure experiments on global interoperability with a selected group of projects and communities. One of its main functions is to support the RDA initiative.

System Administration

The System Administration Group is responsible for general operation, upgrades and maintenance of the MareNostrum and other BSC-CNS systems, as well as providing technical support to the operators of the other RES nodes. The Group also undertakes numerous special projects for continuous improvement of BSC-CNS systems and services and provides technical support to key research projects.

Gateway hardware updated

During October 2014 the platform used to access BSC-CNS resources from internal and external networks was upgraded to enhance security. The technology used is SSL VPN, which first authenticates the user and then performs host checking to determine the level of reliability of the client machine. Depending on this level, one of two roles is offered to the user:

1) BSC-CNS Basic external access: only web access is permitted.

2) BSC-CNS VPN external access: in addition to the web services, this role allows the establishment of a VPN connection to internal BSC-CNS resources.
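
The two-step decision described above (authenticate the user, then check the host to choose a role) can be sketched as follows. This is an illustration only: the function name, the `HostCheck` fields and the rule that a host must pass every check to be considered reliable are assumptions, not details of the BSC-CNS gateway.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    BASIC = "BSC-CNS Basic external access"  # web access only
    VPN = "BSC-CNS VPN external access"      # web access plus VPN connection


@dataclass
class HostCheck:
    # Hypothetical host-check results; the criteria actually used by the
    # gateway are not described in the report.
    antivirus_ok: bool
    os_patched: bool


def select_role(authenticated: bool, host: HostCheck) -> Optional[Role]:
    """Return the role offered to a client, or None if access is refused."""
    if not authenticated:
        return None  # authentication always comes first
    if host.antivirus_ok and host.os_patched:
        return Role.VPN  # host judged reliable (assumed rule)
    return Role.BASIC    # authenticated user on a less reliable host
```

Under these assumptions, an authenticated user on a host that fails any check is still offered the web-only role, while an unauthenticated client is offered nothing.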

Performance monitoring improvements

A Big Data infrastructure was deployed to receive and store all MareNostrum3 job information.

This SaaS receives a continuous stream of information as jobs are executed on the supercomputer. It provides dynamic accounting and several graphical tools that help the Operations team analyse usage distribution, giving a wider understanding of overall system behaviour.

In addition to the graphical interface, this platform provides back-end information for the Support team's public reporting tools.

New HPC machines

During 2014, three new installations and upgrades were performed on BSC-CNS HPC resources:

First, an IBM x3850X5 shared memory multiprocessor composed of 80 cores and 2 TB of main memory was installed and completely integrated as a node inside MareNostrum3. This server allows researchers to perform specific pre- and post-processing analyses that require large shared memory systems.

Second, 256 MareNostrum nodes were upgraded to accommodate memory-bound parallel applications. Half of those nodes had their memory quadrupled from 32 GB to 128 GB, and the other 128 nodes had their memory doubled from 32 GB to 64 GB. These special nodes can be requested using a specific key in the job description.
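
As a quick check of what this upgrade adds in aggregate, the figures from the paragraph above can be combined as in the sketch below. The variable and function names are illustrative only.

```python
# Figures taken from the paragraph above.
NODES_PER_GROUP = 128  # 256 upgraded nodes, split into two halves
ORIGINAL_GB = 32       # memory per node before the upgrade


def added_memory_gb(nodes: int, new_gb: int, old_gb: int = ORIGINAL_GB) -> int:
    """Memory added across a group of nodes, in GB."""
    return nodes * (new_gb - old_gb)


large = added_memory_gb(NODES_PER_GROUP, 128)  # nodes upgraded to 128 GB
medium = added_memory_gb(NODES_PER_GROUP, 64)  # nodes upgraded to 64 GB
total_added_gb = large + medium

print(f"Added memory: {total_added_gb} GB")
```

With these numbers the 128 GB nodes account for 12288 GB of the added memory and the 64 GB nodes for 4096 GB, which gives 16384 GB in total.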

Finally, two new servers based on the cutting-edge Power8 processor technology were installed. Each server has:

  • 4 Power8 12-core sockets at 3 GHz (8 threads per core)
  • 1 TB of main memory
  • 2 NVIDIA Tesla K40m GPUs

These servers will support research in Big Data, computer architecture and parallel application development based on this new technology.

User Support

The User Support Group provides assistance with all aspects of scientific computing. This assistance includes general user support, code optimisation and parallel model building support, and porting serial and parallel codes for supercomputers such as MareNostrum. The Group is also involved in the creation of scripts for ease of use as well as assistance with software packages and tutorials on specialised topics or programs. The Group is also in charge of the SC virtual reality and 3D visualisation system, the BSC-CNS official website, the Intranet, and the graphical design activities.

Some 7403 support requests were received in 2014 as presented in the figures below, split by support request area and by type.

Highlighted Projects

  • UserPortal

During 2014 the User Support Team put into production the UserPortal interface. This service provides the end user, in a single entry point, with a collection of the most relevant monitoring information gathered by BSC-CNS monitoring tools. This service uses the BSC-CNS central authentication service for security. Users will now be able to better understand the behaviour of their applications and the resources used during their execution.

MareNostrum Performance 2014

MareNostrum was in full production during the entire year with a total of 48128 cores. The observed system utilisation was approximately 88%. In addition to BSC-CNS internal groups, more than 207 external groups accessed the MareNostrum system during 2014.

Utilisation is defined by the formula:


Jobs are accounted for at the end of their execution, so the accounted usage can exceed the total capacity in specific weeks.
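
The effect of end-of-execution accounting can be illustrated with a small sketch. The report's own utilisation formula is not reproduced in this text, so the sketch assumes a common definition: accounted core-hours divided by available core-hours (total cores multiplied by the hours in the period). The 100-core machine and the two jobs are hypothetical values chosen for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

HOURS_PER_WEEK = 7 * 24


class Job(NamedTuple):
    start_hour: float  # hours since the start of week 0
    end_hour: float
    cores: int


def weekly_utilisation(jobs, total_cores):
    """Attribute each job's core-hours to the week in which it ends."""
    accounted = defaultdict(float)
    for job in jobs:
        week = int(job.end_hour // HOURS_PER_WEEK)
        accounted[week] += (job.end_hour - job.start_hour) * job.cores
    capacity = total_cores * HOURS_PER_WEEK  # assumed definition of capacity
    return {week: hours / capacity for week, hours in sorted(accounted.items())}


# Hypothetical workload on a 100-core machine.
jobs = [
    Job(start_hour=84, end_hour=252, cores=100),  # spans weeks 0 and 1
    Job(start_hour=252, end_hour=330, cores=50),  # runs entirely in week 1
]
utilisation = weekly_utilisation(jobs, total_cores=100)
```

Here the first job runs for a full week's worth of hours across the week boundary, but all of its core-hours are booked in week 1, where it ends. Together with the second job, week 1 is accounted at more than 100% of capacity while week 0 shows no usage at all.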

Facility Management

The mission of Facility Management (FM) is to keep BSC-CNS facilities under its purview in a safe and efficient working condition.

The department provides support and recommendations for building and infrastructure expansions and renovations:

  • participates in project, proposal, and project management stages of new investments;
  • maintains major building systems, performs required maintenance, and supervises vendors for all architectural, mechanical, and electrical requirements of its facilities;
  • defines and places purchase requests for fixed asset materials;
  • receives proposals and authorises payments for electrical and other utility needs of its facilities;
  • coordinates and tracks service calls for repairs.

BSC-CNS workspaces facilities distribution 2014

The tasks of the Facility Management group include preparing office space for all departments at BSC-CNS. Since 2004, different buildings and office spaces have been fitted out for those departments, with continuous and significant growth.

MareNostrum3, reducing power consumption

In September 2012 MareNostrum2 was taken out of service. MareNostrum3 started operations in January 2013 and reached full system capacity in July 2013. From that point, intensive work following best practices was initiated to reduce the total power consumption of MareNostrum3 and all cooling facilities. The upgrade to MareNostrum3 increased the computational capacity tenfold, while total power consumption increased by only 10%.
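
Taken together, these two figures imply a gain in computational capacity per unit of power. The short calculation below makes it explicit. It treats the factors as exact, which is an assumption, since the report states them only as round numbers.

```python
# Factors quoted in the paragraph above, treated as exact for illustration.
capacity_factor = 10.0  # computational capacity increased 10 times
power_factor = 1.10     # total power consumption increased by 10%

# Capacity delivered per unit of power, relative to MareNostrum2.
efficiency_gain = capacity_factor / power_factor

print(f"Capacity per unit of power: about {efficiency_gain:.1f}x")
```

On these figures, MareNostrum3 delivers roughly nine times more computational capacity per unit of power than its predecessor.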
