Document 6491480

1. Background
2. Summary of Recommendations
3. xRAC Committee and Meeting Procedures
4. Lowering Barriers to Entry
5. Resources and Resource Providers
6. Communications

1. Background
The policies and procedures for allocating NSF’s cyberinfrastructure resources have evolved over
more than 20 years of serving the national user community under the original centers program,
the PACI partnerships, and now the TeraGrid. A TeraGrid Requirements Analysis Team (RAT)
was formed in March 2008 to review and update the current policies based on community
feedback, an increasing diversity of resources, and the impact of NSF’s Track 2 and Track 1
systems.
One challenge is how best to apply the merit-review xRAC process to maximize scientific impact
in a range of environments where resources are either over-requested or under-requested. The
objectives are to optimize the allocations process to adapt to resource availability and to ensure
we fully capitalize on emerging opportunities as additional resources become available. This may
include efforts within TeraGrid to facilitate increased access to the resources (while maintaining
proposal standards and scientific impact), changes in the evaluation process by the xRAC, and
changes in how TeraGrid sites implement recommendations from the xRAC. There should
emerge a consistent competitive review process that can adapt to a full range of allocation
environments. While the focus is on compute systems, the recommendations should also address
novel resource classes.
In addition, a set of issues has emerged that require policy-level decisions and TeraGrid
consensus, both to adapt policies to the current environment and to permit TeraGrid to move
forward with such activities as the Core Services 2 efforts, which in part implement these
allocation policies.
The RAT is producing two documents to capture its recommendations. A new version of the
“Grant Proposal Guide” for TeraGrid allocations will be published; this is a public document that
describes the allocations policies and process for users, reviewers and funding agencies. The
second deliverable is this document, an internal TeraGrid team document, which focuses on
recommended actions for TeraGrid team members.
2. Summary of Recommendations
The results of the RAT discussions can be categorized into four sets of recommendations
addressing aspects of the allocations process. The specific recommendations include changes to
policy, changes to implementation of the procedures, and TeraGrid-wide planning and
communications.

Formalize, adjust, and document the xRAC processes and procedures. With a growing
user base, increasing numbers of proposals and resource providers, some aspects of the
xRAC process need to be formalized and made more transparent to the TeraGrid and the
user community. While an overhaul is not recommended, some modifications are
important to maintaining the integrity of and confidence in the process by xRAC
members, the user community, and TeraGrid Resource Providers (RPs).

Take steps to reduce the real and perceived barriers to entry for the xRAC process. The
current process is based on the premise of resource scarcity. Policies need to be adjusted
so that the process can adapt to situations of significant resource availability.
Implementation of the proposal submission process must also be streamlined to lower
perceived barriers, reduce confusion, and minimize complexity. The burdens here should
be borne by the TeraGrid, not placed on current and potential users.

Resolve and clarify various policies related to the integration and presentation of
TeraGrid resources. TeraGrid RPs have a great deal of latitude in choosing which
resources to make available to the allocations process—and how to make them available.
Some limits and policies need to be defined to balance RP flexibility with the equally
important goals of simplifying the system for the user community and working within the
limits of the underlying TeraGrid infrastructure.

Improve communications about resources and allocations. A number of opportunities
exist to better communicate with the current and potential user community about the
availability of resources and about the process itself. Communications can help increase
demand for resources and lower perceived barriers to entry.
In general, we are also supportive of the implementation recommendations detailed in the Core
Services 2.0 vision document, where they do not explicitly contradict the RAT recommendations,
as changes that will improve procedures for users and contribute to lowering the real or perceived
barriers to entering the TeraGrid user community.
The remainder of this document is organized according to these broad categories, with numerous
specific recommendations detailed in each section.
3. xRAC Committee and Meeting Procedures
In large part, the xRAC process has served the NSF and the NSF-supported cyberinfrastructure
RPs well for nearly 20 years, and we see no need to significantly change the process. However,
because of the expanding user community, the growing number of RPs and resources, and the
evolution of the TeraGrid, there are some changes needed to formalize the process. In addition
to the specific changes below, we recommend that policies and procedures for the xRAC
committee selection and meetings be documented and posted on the TeraGrid Web site.
1) xRAC procedures when resources are under-requested or under-allocated
No substantive changes to xRAC procedures or policies are required for under-requested or under-allocated resources. We do recommend that the xRAC Coordinator begin each xRAC meeting
with a summary of resource availability and request levels, identifying resources that have been
either significantly over-requested or under-requested, as well as the ratio of total requests to
the total available SUs. The xRAC can use this information as context for its deliberations. We
expect that the xRAC will reduce awards less significantly, and consistently across all proposals,
when total available SUs meet or exceed total requests, and will apply stricter standards
consistently across all proposals when total requested SUs exceed those available.
Recommendations for dealing with under-allocated resources are described in the Resource
section (for discretionary allocations) and in the Communications section.
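
The opening summary of availability and request levels could be prepared with a short script. The following is a minimal sketch, not an existing TeraGrid tool. The resource names and SU figures are made up, and the 0.5 and 1.0 ratio cutoffs are assumptions for illustration only, since this document does not define what counts as "significantly" over- or under-requested.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    name: str
    available_sus: float   # SUs the RP offers for this allocation period
    requested_sus: float   # total SUs requested across all proposals


def summarize(resources, under=0.5, over=1.0):
    """Return per-resource (name, ratio, status) rows and the overall ratio.

    `under` and `over` are assumed cutoffs, not TeraGrid policy.
    """
    rows = []
    for r in resources:
        ratio = r.requested_sus / r.available_sus
        if ratio > over:
            status = "over-requested"
        elif ratio < under:
            status = "under-requested"
        else:
            status = "balanced"
        rows.append((r.name, ratio, status))
    total_requested = sum(r.requested_sus for r in resources)
    total_available = sum(r.available_sus for r in resources)
    return rows, total_requested / total_available


# Hypothetical figures for two resources.
example = [
    ResourceRequest("System A", 10_000_000, 18_000_000),
    ResourceRequest("System B", 20_000_000, 6_000_000),
]
rows, total_ratio = summarize(example)
for name, ratio, status in rows:
    print(f"{name}: requests/available = {ratio:.2f} ({status})")
print(f"All resources: requests/available = {total_ratio:.2f}")
```

In this made-up case one system is over-requested while total requests are below total availability, which is the situation in which the xRAC would be expected to cut awards less severely overall.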
2) Broaden the membership selection process and clarify term limits
We recommend that the xRAC remain autonomous so as to preserve independent merit-review of
TeraGrid allocations proposals. Given the breadth of disciplines permissible for allocations
proposals, the current number of ~40 members seems to be an appropriate size to cover the
domains yet have meaningful discussions.
To ensure sufficient experience and to make the recruitment process manageable, the term limit
of an xRAC member is three years. Members may have their terms extended up to six months if
deemed necessary by the Allocations Coordinator. Previous members can rejoin after a minimum
2-year hiatus.
The Allocations Coordinator should solicit nominations for members from the following groups:
NSF OCI program officers; program areas in NSF domain directorates or other agencies (requests
for nominations made via OCI); the TeraGrid Science Advisory Board; departing xRAC
members; and the TeraGrid Forum. These solicitations should include information about the
domains for which there is the most urgent need. Nominees should be recognized for and/or
actively engaged in computationally oriented research in a discipline. Candidates need not be
xRAC awardees or even eligible to apply for an xRAC award (e.g. FFRDC staff). Candidates
must be able to spend substantial time reviewing proposals in a timely way and participating in
quarterly meetings. In addition to domain coverage, it is desirable to maintain breadth of
geographical and institutional participation, as well as diversity considerations within the
committee.
The TG Allocations Coordinator will manage the process by which candidates are recruited to
serve on the committee within the Allocations Working Group.
3) Adopt formal conflict-of-interest policy for reviewers
We recommend that TeraGrid adopt the attached conflict-of-interest policy for reviewers. In
addition, we recommend changing the proposal format to permit PIs to name individuals who
should not review their proposal.
This policy should be provided to all prospective candidate reviewers and should be incorporated
by reference in the Allocations “Grant Proposal Guide.”
4) Method to select a chair of the xRAC meetings
The chair’s duties are to facilitate the committee’s open discussion, ensure that the committee’s
recommendations for each proposal are clearly summarized, and maintain a schedule to complete
the committee’s work in a timely way. There are no special decision-making privileges of the
chair, and there are no duties beyond the meeting itself.
We recommend that, when possible, an NSF OCI program officer chair the xRAC meetings.
When this is not possible, the Allocations Coordinator will open each meeting with a request for
nominations (up to 3), and the members will elect their own chair. (Nominees should have
attended at least two previous xRAC meetings.)
5) Caucus procedures
Historically, in the evening prior to each LRAC and MRAC meeting, a “caucus” session is held
in which reviewers assigned to each proposal discuss their independent reviews and seek to either
obtain a consensus recommendation or identify the areas of disagreement for discussion with the
full committee the following day. This is intended to facilitate the meeting workload—especially
important as the number of proposals is approaching 100 per day. However, there have been
concerns that this semi-private caucus procedure diminishes the open peer-review process and
makes it difficult to enforce conflict-of-interest policies.
We recommend that the caucus session be maintained, because it expedites the workload and
does not need to diminish the openness of the peer-review process. We recommend that the
purpose of the caucus be clearly explained to new members when they are recruited, and that it be
made clear that there should be no pressure to reach consensus and that significant differences of
opinion should be carried over to open debate with the full group the following day. In addition,
to preserve conflict-of-interest rules, proposals in which an xRAC reviewer in attendance is a PI
or co-PI will not be discussed at the caucus session; discussion will be deferred until the formal
meeting. These guidelines should also be stated to the full group at the beginning of each caucus
session.
6) Role of RP observers in xRAC meetings
TeraGrid RPs are permitted to send, at their own expense, a single representative (“allocation
officer”) to observe the xRAC meetings, provide information when requested by the xRAC
during deliberations, and participate in post-meeting processes. Conflict-of-interest rules for these
observers are detailed in the xRAC COI policy document.
In addition to the RP representatives, the following persons are required to ensure the smooth
functioning of the xRAC meetings: the xRAC Coordinator, the person responsible for meeting
logistics, a TeraGrid GIG representative, and a TeraGrid allocations staff member. The conflict-of-interest rules apply to the xRAC Coordinator and the TeraGrid GIG representative.
7) Improve the transparency and documentation of decisions made by the
TeraGrid allocation officers to balance oversubscribed resources.
If any resources are oversubscribed at the conclusion of the xRAC allocation recommendations,
the xRAC Coordinator will convene all of the attending RP allocation officers and GIG
participants to quickly assess whether alternate resources are available to satisfy the total
allocations. If so, the xRAC will be adjourned, and the allocations officers will make re-allocations with preference given to, in priority order,
(a) assigning allocations to alternate resources specified by the user in their proposal,
(b) re-assigning allocations to alternate resources with similar system architectures,
(c) retaining current users rather than assigning new users to over-subscribed resources, and
(d) allocating to sites where the user has the most previous experience.
Users assigned to resources they did not request will be informed that this was done because the
resource they requested was oversubscribed.
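
The priority order for re-allocations can be read as a pair of sort keys. The sketch below is one possible interpretation, assuming the preferences are applied strictly in order; the `Award` record, the `architecture` mapping, and the use of prior SU usage as a proxy for "previous experience" are all hypothetical and do not describe an existing TeraGrid system.

```python
from dataclasses import dataclass, field


@dataclass
class Award:
    pi: str
    requested_resource: str
    is_current_user: bool                            # already uses the requested resource
    alternates: list = field(default_factory=list)   # alternates named in the proposal
    experience: dict = field(default_factory=dict)   # resource -> prior SUs used (assumed proxy)


def awards_to_move_first(awards):
    """Preference (c): new users are moved before current users."""
    return sorted(awards, key=lambda a: a.is_current_user)


def rank_alternates(award, candidates, architecture):
    """Order candidate resources by preferences (a), (b), then (d)."""
    wanted = architecture[award.requested_resource]

    def key(resource):
        return (
            resource not in award.alternates,     # (a) alternates named by the user first
            architecture[resource] != wanted,     # (b) then similar architectures
            -award.experience.get(resource, 0),   # (d) then most previous experience
        )

    return sorted(candidates, key=key)


# Hypothetical resources and awards.
architecture = {"X": "cluster", "Y": "cluster", "Z": "smp", "W": "cluster"}
new_user = Award("PI-1", "X", is_current_user=False,
                 alternates=["Z"], experience={"W": 500})
current_user = Award("PI-2", "X", is_current_user=True)

move_order = [a.pi for a in awards_to_move_first([current_user, new_user])]
ranking = rank_alternates(new_user, ["Y", "Z", "W"], architecture)
print(move_order)
print(ranking)
```

In practice the allocations officers weigh these preferences with judgment; the strict lexicographic ordering here only illustrates how "in priority order" could be made explicit and documented.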
If it is obvious that further cuts must be made or unclear to the allocations officers whether re-allocations can satisfy the spirit of the xRAC recommendations, the xRAC will reconvene after
the allocations officers make a best effort at re-allocations. The xRAC will be asked to identify
the least meritorious proposals overall whose awards received a disproportionate level of SUs,
with a charge of reducing such allocations until the total recommended awards can be satisfied by
available SUs.
The Allocations Coordinator will ensure that all changes are documented and available to the
RPs. Specifically, formal copies of the allocations spreadsheet should be kept prior to and after
the re-allocations process, and made available to NSF and TeraGrid team members upon request.
4. Lowering Barriers to Entry
Some potential users who do not use TeraGrid resources perceive that it is difficult to enter the
system for various reasons—e.g., the time from “idea-to-account” is too long, the proposal
process subjects them to double jeopardy for accomplishing their scientific objectives, the
proposal submission requirements are too complex, and so on. Particularly in times of resource
availability, it is important to examine possible barriers to entry and recommend improvements to
lower those barriers where appropriate and to better communicate the actual processes that exist
and their rationale.
8) Enable more timely entry into the allocations process
A common complaint amongst non-users of TeraGrid is that it takes too long to enter the
allocations system, and we wish to address this issue while maintaining an equitable and effective
allocations process. At the same time, we believe that the quarterly MRAC and semi-annual
LRAC schedule serves the TeraGrid and the national community well and should be maintained.
It is necessary to aggregate requests in order to set priorities and make zero-sum decisions. Six
months seems reasonable for the large requests, with quarterly MRAC meetings providing a more
frequent entry point for new users entering the system or for smaller users. In addition, more
frequent meetings may place an undue burden on the volunteers who serve on the review
committee.
There are three recommendations to enable more timely entry.
1. Larger start-up awards should be provided, particularly when resources are available (see
below).
2. The Core Services 2.0 efforts should work to shorten the timeframe between proposal
submission and the creation of allocations. This would help minimize perceived
timeliness issues and permit renewal proposals and progress reports to be submitted
closer to the end of an allocation period.
3. Policy should be changed such that, upon submission of a proposal for the next
appropriate MRAC or LRAC review cycle (at least a month prior to the stated deadline
for that cycle), a new PI is eligible to request a Pre-Award allocation equal to the size of
his/her request times the fraction of a year remaining until the start of the next allocation
period. For example, a potential LRAC PI who prepares a proposal in July (for
allocations to begin Oct. 1) with a request of 1 million SUs annually would be eligible to
receive a pre-award allocation of up to 250,000 SUs (25% of the request, since there are 3
months until the Oct. 1 start date), subject to internal TeraGrid review. Depending on the
circumstances (see below), the Allocations Coordinator may decide to grant the Pre-Award or ask xRAC members to review it off-cycle as a supplemental proposal. (As is
true for other PIs, the “early submittal” PI is still permitted to modify his/her proposal up
until the posted xRAC submission deadline.) [Can we expand this to cover the LRAC PI
who has a proposal ready in April (prior to the June MRAC)? This is the biggest possible
gap in the current process for new PIs.]
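
The Pre-Award ceiling in recommendation 3 is simple arithmetic: the annual request multiplied by the fraction of a year remaining before the next allocation period begins. A minimal sketch follows. The function name is hypothetical, and counting the remaining time in whole calendar months is an assumption made to match the July-to-October example above; the policy text does not say how partial months would be treated.

```python
from datetime import date


def pre_award_limit(annual_request_sus, submitted, period_start):
    """Upper bound on a Pre-Award allocation, in SUs.

    Assumes whole calendar months between submission and the start of the
    next allocation period, capped at one year.
    """
    if submitted >= period_start:
        return 0.0
    months_remaining = ((period_start.year - submitted.year) * 12
                        + (period_start.month - submitted.month))
    fraction_of_year = min(months_remaining, 12) / 12
    return annual_request_sus * fraction_of_year


# The example from the text: 1 million SUs requested in July for an Oct. 1 start.
limit = pre_award_limit(1_000_000, date(2008, 7, 1), date(2008, 10, 1))
print(limit)  # 250000.0
```

The result is only a ceiling. Whether a Pre-Award is granted still depends on internal TeraGrid review and on SUs being available on the requested resources.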
The Pre-Award allocation depends on the availability of SUs on the resources requested (e.g. it
would not be granted on a fully allocated system, crowding out previously allocated users). The
Pre-Award allocation is subject to the same rules as other allocations; in particular, unused SUs
are forfeited at the end of the allocation period. Since they are dependent on submission of an
xRAC proposal, Pre-Award allocations cannot have Extensions. Pre-Award allocations are
different from Advances in that the usage on a Pre-Award allocation is not deducted from the
subsequent “New” allocation.
9) Increase available resources granted to new “startup” awards, and rename
“DAC” accounts to “Startup Allocations”
Given the increase in scale of the new Track 2 and Track 1 systems, we recommend that the size
of the startup allocations granted by internal TeraGrid review to new users be increased for
compute resources from the current 30K SUs to 100K SUs. With this change, the thresholds for
DAC/MRAC/LRAC allocations would now be 100K and 500K SUs. We recommend that the
MRAC/LRAC threshold be revisited after the September 2008 LRAC meeting based on analysis
of the distribution of requests.
RPs should, however, have the ability to limit the size of a start-up allocation for resources where
100K SUs would represent a prohibitively large fraction of the resource. The “startup” limit will
be posted in the TeraGrid Resource Catalog for each resource, compute or otherwise. The
TeraGrid reviewers and allocations staff will enforce these limits until Core Services 2.0 provides
some implementation support.
We also recommend that the DAC name be changed to “Startup Allocation” to better
communicate its availability and intent.
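
The recommended thresholds, together with an optional RP-imposed startup limit, can be expressed as a small lookup. This is a sketch under stated assumptions: the function and parameter names are hypothetical, and treating a request exactly at a threshold as belonging to the smaller category is an assumption, since the text does not say which side of the boundary is inclusive.

```python
STARTUP_LIMIT_SUS = 100_000   # recommended startup ceiling for compute resources
MRAC_LIMIT_SUS = 500_000      # recommended MRAC/LRAC threshold


def review_category(requested_sus, resource_startup_limit=None):
    """Return "Startup", "MRAC", or "LRAC" for a compute request in SUs.

    `resource_startup_limit` models an RP posting a smaller startup limit
    for a particular resource in the TeraGrid Resource Catalog.
    """
    startup_limit = STARTUP_LIMIT_SUS
    if resource_startup_limit is not None:
        startup_limit = min(startup_limit, resource_startup_limit)
    if requested_sus <= startup_limit:
        return "Startup"
    if requested_sus <= MRAC_LIMIT_SUS:
        return "MRAC"
    return "LRAC"


print(review_category(80_000))                                 # Startup
print(review_category(80_000, resource_startup_limit=50_000))  # MRAC
print(review_category(750_000))                                # LRAC
```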
10) Reduce confusion related to classification of proposals and proposal formats
With resources ranging in size from several hundred processors to tens of thousands of cores,
non-compute and compute resources that may not be allocated in SUs, and a complex set of
allocations policies, the current entry into the proposal submission process confuses many users.
Minimally, using fixed limits of SUs to categorize proposals is becoming less and less
satisfactory.
In the short term, we recommend that the initial entry screens for the POPS system be revised in
light of the TeraGrid’s dynamic resource situation to better guide submitters to the appropriate
meeting and to eliminate options that lead to dead ends. Because the submitter must identify a
proposal’s category, the categorization should be based on a formula that is simple to understand
and simple to calculate.
In the longer term, it would be possible to hide a more complex formula from a proposal
submitter and have POPS automatically categorize a request. For example, the category could be
derived by a sum of the requests, each inversely scaled by a system’s total available SUs. At that
point, the initial category selection (Start-up, Medium/Large) may only be important in terms of
proposal requirements; namely, a start-up request does not require a formal proposal document,
while larger requests would. As an alternative, the allocations officers could apply subjective
evaluations to distinguish start-up from MRAC/LRAC requests and even to define “medium” and
“large” proposals on the fly to balance proposal traffic and xRAC workload.
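
The formula mentioned above, a sum of requests each inversely scaled by a system's total available SUs, could look like the following sketch. The resource names, SU totals, and the two cutoff values are hypothetical; this document does not specify them, and POPS does not currently implement such a calculation.

```python
def normalized_request(requests, available):
    """Sum of per-resource requests, each divided by that resource's total SUs."""
    return sum(sus / available[resource] for resource, sus in requests.items())


def categorize(requests, available, startup_cutoff=0.001, medium_cutoff=0.01):
    """Map a multi-resource request to a category using assumed cutoffs."""
    score = normalized_request(requests, available)
    if score <= startup_cutoff:
        return "Start-up"
    if score <= medium_cutoff:
        return "Medium"
    return "Large"


# Hypothetical totals: a large system and a much smaller one.
available = {"A": 100_000_000, "B": 10_000_000}
requests = {"A": 200_000, "B": 50_000}

score = normalized_request(requests, available)
category = categorize(requests, available)
print(f"{score:.3f} -> {category}")
```

The point of scaling by system size is that 50,000 SUs on the small system counts for more than 200,000 SUs on the large one, which a fixed SU threshold cannot capture.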
11) Improve, clarify and simplify the POPS interface and proposal formats
With respect to implementation, we recommend that the Core Services 2.0 activity make
improvements to the POPS interface to reduce perceived barriers within the actual proposal
submission process. In particular, we recommend that POPS no longer include the “resource
attributes” that force users to answer a short Q&A about their proposed resource use. In practice,
these attributes do not play a significant role in the xRAC deliberations, do not have enough
consistency across resources to permit any quantitative analysis, and unduly complicate the user
interface, especially with more than two dozen resources listed. Instead, we recommend that
POPS provide easier access to resource descriptions and recommended use guidelines.
We further recommend posting examples of “excellent” proposals for each resource class and clearly documenting within
POPS the availability of these examples and the importance of addressing all requested
information and evaluation criteria. We note that exemplary proposals are already available in
POPS, but we encourage the Allocations WG to improve the POPS documentation pointing to
these examples, expand the set to include newer resource classes, and continuously update
these examples as appropriate (e.g. for very large-scale requests in the Track 2 era).
We also recommend that the xRAC process adopt a more formal proposal document(s) structure
than that currently used to allow more consistent application of proposal length policies, clarify
for users what is expected in proposals, and permit explicit collection of information needed by
reviewers and by TeraGrid.
Proposals should include the following required and optional documents, and POPS should
restrict the different submission types to the appropriate documents:
• Main Proposal (required, page length varies depending on request size). This document
  should have the scientific background and objectives and the justification of the resource
  request. Within the main document, some flexibility can be permitted. However, a
  recommended sample template can be provided and/or recommended sections outlined.
  The focus of reviewers will be on this document.
• Progress Report (required for renewals and supplements, 5-page limit). Report on
  progress made during prior (current) allocation award period. This is the Main document
  for Progress Reports within Multi-Year Awards. Reviewers will consider this document
  for rating the successful use of prior allocation awards.
• Publications Citing TeraGrid Support (required for renewals/progress reports of all kinds,
  including renewed “startups,” no page limit; N/A for new submissions). A list of
  bibliographic citations for publications in preparation, submitted, accepted, or published
  that benefitted from access to TeraGrid resources. Where implementation allows,
  bibliographic document formats (such as EndNote or BibTeX) could be supported for this
  attachment. This document is separated for TeraGrid metrics purposes.
• Special Requests (optional, 1-page limit). For Advanced Support Program justification
  and other special requests.
• Performance and Scaling of Codes (optional, 5-page limit). Information related to code
  performance and scaling, if the information cannot be fit within the main proposal
  document. This document should be used only to support claims made and used in the
  Main Proposal regarding the performance of relevant codes. Reviewers should not be
  required to closely scrutinize this document to evaluate the computational justification.
• References and Figures (optional, no limit).
• CVs (required for most submissions, perhaps optional for supplements and justifications).
  A single attachment that includes all submitted CVs.
No other appendices will be permitted. This set of information corresponds to what is currently
requested by xRAC reviewers, and largely what is provided in current submissions, but it does
expand the total posted page count permissible for many types of submissions. However, the
actual page counts for many current submissions far exceed the posted limits by use of the policy
loophole that permits unlimited appendices. Posted guidance for each section should clearly state
that many (smaller) projects are not expected to use the full page-limit in some sections—
progress report and code scaling, in particular.
With this clarified structure for submissions, we further recommend that submissions that exceed
the posted page limits be returned to the submitters for revision or re-submission at a later date.
With this structure, the xRAC Coordinator can quickly examine submissions for
compliance, because adherence to the page limits for the various sections can be checked without
examining the content. The non-compliance policy should be clearly posted with the submission
guidelines.
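
Because each document type has its own limit, a compliance check needs only page counts, not content. The following is a sketch of such a check, not an existing POPS feature. The function name is hypothetical, the Main Proposal limit is passed in because it varies with request size and its values are not given here, and treating CVs as unlimited is an assumption since no CV limit is stated.

```python
# Page limits per document type; None means no posted limit.
PAGE_LIMITS = {
    "Progress Report": 5,
    "Publications Citing TeraGrid Support": None,
    "Special Requests": 1,
    "Performance and Scaling of Codes": 5,
    "References and Figures": None,
    "CVs": None,  # assumption: no limit is stated for CVs
}


def compliance_problems(page_counts, main_proposal_limit):
    """Return a list of human-readable problems; an empty list means compliant."""
    limits = dict(PAGE_LIMITS)
    limits["Main Proposal"] = main_proposal_limit
    problems = []
    for doc, pages in page_counts.items():
        if doc not in limits:
            # No other appendices are permitted under the proposed structure.
            problems.append(f"{doc}: not a permitted document")
            continue
        limit = limits[doc]
        if limit is not None and pages > limit:
            problems.append(f"{doc}: {pages} pages exceeds limit of {limit}")
    return problems


# Hypothetical submission with an over-long progress report and an extra appendix.
submission = {"Main Proposal": 12, "Progress Report": 6, "Appendix": 3}
problems = compliance_problems(submission, main_proposal_limit=15)
for p in problems:
    print(p)
```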
5. Resources and Resource Providers
As a coordination activity and resource federation, the TeraGrid offers many advantages to its
users and a great deal of flexibility to RPs as far as the management of their resources. RP
flexibility must be balanced against the potential for user confusion and the implementation of the
coordination/federation capabilities. This section lists several recommendations to clarify this
balance.
In addition, RPs are encouraged to create incentive programs that facilitate scaling and more
productive large-scale usage, to streamline the selection process (e.g., combining with other
similar architectures as with the TeraGrid Clusters or Abe/Queen Bee), and to self-eliminate
resources for which demand remains low. Because of the diversity of situations, we do not
recommend a general TeraGrid-wide program or policy, but RPs could offer incentives to
facilitate scaling studies and code improvements which would result in increased code efficiency
and more productive large-scale usage. For example, LBNL has had a program whereby users are
partially reimbursed for computer time spent in scaling studies and large-scale debugging and
code improvements.
12) Simplify the access to Startup Allocations across TeraGrid RPs
The following changes to the DAC process can be implemented almost immediately with little or
no development effort from the Core Services team. We further recommend that the TeraGrid
documentation be improved so that startup instructions aren’t buried—the ‘QuickStart’ guide to
getting a Startup Allocation should be clearly and prominently identified for first-time users.
The current DAC allocation “meetings” exist largely for historical reasons and because prior
policies did not permit sufficient RP flexibility. Site-specific DACs do not serve user needs, force
arbitrary decisions upon users, and create unnecessary proliferation of projects.
We recommend that site-specific DACs be replaced with a set of four startup allocation types:
1. Roaming (Select this for access to most TeraGrid compute resources)
The Roaming start-up option would give requesters only the option of asking for
TeraGrid Roaming (and perhaps multi-site GPFS-WAN).
2. Specific Resources (Select this to request particular compute or storage resource(s))
The Specific Resources option would include the menu of all allocable resources.
3. Storage Only (Select this if you need only storage resources)
Storage-only startups are OPTIONAL; the option would be offered only if storage-providing RPs
feel the need for it, since there is no storage equivalent to TeraGrid Roaming.
4. Training (Select this for class instruction or training activities)
Training start-up requests would present requesters only with the option of TG Roaming,
or potentially, as proposed, a smaller subset of TG resources forming a “Training Grid.”
As long as Training Grid allocations only appeared on training start-up projects and not
in conjunction with Roaming or specific allocations, the current accounting system can
support this option without modification.
Review processes would need to be defined for the Specific and (if presented) Storage Only
startup requests, although the current model for reviewing TeraGrid DACs could be applied to
any or all of these sets of requests, with default reviewers on overlapping terms assigned to each
proposal. The EOT area will appoint reviewers for Training startups.
13) The allocations policies must adapt to novel resource classes, including but not
limited to storage and ASTA to improve adoption by the user community.
No new general policies can be defined for arbitrary resources. However, any RP or TeraGrid
decision to introduce new resource classes must include a discussion of (a) new
proposal requirements, (b) new proposal evaluation criteria related to the new classes, and (c) the
communication plan to inform users.
(a) Defining new proposal requirements should involve representatives of the Core Services
group in order to identify implementation mechanisms and the effort involved.
(b) When new review criteria and associated policy changes are approved, the posted allocation
“Grant Proposal Guide” for users must be updated.
(c) The communication plan should include an analysis of TeraGrid documentation updates
required and the entity responsible (GIG or RP) for announcing the new resource class and
associated proposal changes.
To this end, we recommend a review of the recent policy documents for ASP and storage
allocations, with respect to these three areas. The ASP policies did describe new proposal
requirements and, to a degree, the evaluation criteria for this new resource class. The
Long-term Storage allocation policy needs to be reviewed in this light.
14) TeraGrid Roaming policies should be reviewed, adapted and made explicit in
light of the changes to the TeraGrid resource environment.
The current concept of TeraGrid Roaming (http://www.teragridforum.org/mediawiki/index.php?title=TeraGrid_Roaming) serves several purposes that benefit a large cross-section of users:
1. It can simplify allocation requests for projects that use applications that can run on a
range of TeraGrid resources. Users can then optimize and self-regulate to find the
resource with the shortest wait time, for example.
2. It simplifies allocation requests for projects that plan multi-site, multi-resource runs.
3. It can help provide new users with access to a variety of resources so they can evaluate
the full range of TeraGrid capabilities. Similarly, it provides an easy way to “move”
xRAC awards to alternate, but non-specific architectures. Users have hassle-free access
to many resources.
At the same time, TeraGrid Roaming has several significant flaws:
1. In the current implementation, a roaming allocation causes user logins to be created on all
TeraGrid resources, even though past experience has shown that most will never be used.
This creates significant extra work and a chronic security concern.
2. Since actual usage cannot be assigned to a resource a priori, Roaming makes it difficult
for RPs to plan resource availability for specific allocations.
Because of the potential benefits of Roaming, we recommend that Roaming continue to be
offered as described in the current de facto policies (including the requirement that all compute
resources be offered), with the following modifications.


We support the implementation changes described in the Core Services 2.0 Vision document
in which Roaming allocations require an extra step by PIs to identify those resources upon
which they plan to run. This step should be simple, automatic and changeable over time as
the user gains experience. This step will help reduce the most significant concerns about
security with Roaming accounts.
To ameliorate the challenges in resource planning around Roaming allocations, TeraGrid
Roaming allocations should be limited to the following three situations:
(1) Startup allocations, in which the user does not know where best to run and may need to
evaluate several architectures;
(2) xRAC recommendations and awards made to encourage users to migrate to new or
alternate platforms, especially in the case of overallocated resources; and
(3) xRAC requests in which the proposal describes how the project will use roaming to
minimize job waits by finding the least-busy resource or how the project will conduct
multi-site runs using unique grid capabilities. Failing this, the allocation should be made
to the most appropriate resource(s), based on xRAC recommendation.
RPs should be permitted (but not required) to include storage resources as valid Roaming
resources. The RP must provide a means for users to understand and calculate how storage
charges, if any, will be applied against a roaming allocation.
Permissible Exceptions: (a) Resources for which Service Units cannot be converted to
Normalized Units via some agreed-upon conversion factor, for example, storage resources.
(b) Resources that are not allocated in some variation of core-hours or node-hours. The
best current example would be the Indiana Quarry cluster, which is dedicated to science
gateways and for which the allocated entity is a dedicated virtual host, rather than a share of
compute cycles. Similarly, an interactive visualization resource such as TACC’s Maverick
may also be excluded.
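As an illustration of how a charge against a Roaming allocation might be calculated, the following minimal sketch converts local Service Units (SUs) to Normalized Units (NUs) with a per-resource conversion factor and rejects resources that have no agreed-upon factor. The resource names, the factor values, and the RoamingAllocation class are hypothetical and are not taken from any TeraGrid system.

```python
from dataclasses import dataclass

# Hypothetical conversion factors (NUs per local SU). Real factors would be
# whatever the RPs agree upon; these values are for illustration only.
NU_PER_SU = {"resource_a": 2.0, "resource_b": 0.5}


@dataclass
class RoamingAllocation:
    """Tracks a Roaming award and its usage, both expressed in NUs."""
    awarded_nu: float
    charged_nu: float = 0.0

    @property
    def remaining_nu(self) -> float:
        return self.awarded_nu - self.charged_nu

    def charge(self, resource: str, local_su: float) -> float:
        """Convert local SUs to NUs and debit the allocation."""
        if resource not in NU_PER_SU:
            # No agreed conversion factor: the resource falls under the
            # permissible exceptions and cannot be charged to Roaming.
            raise ValueError(f"{resource} is not a valid Roaming resource")
        nu = local_su * NU_PER_SU[resource]
        if nu > self.remaining_nu:
            raise ValueError("charge exceeds remaining Roaming allocation")
        self.charged_nu += nu
        return nu


alloc = RoamingAllocation(awarded_nu=1000.0)
alloc.charge("resource_a", 100)   # debits 200 NUs
alloc.charge("resource_b", 400)   # debits 200 NUs
print(alloc.remaining_nu)
```

An RP that opts to include a storage resource would need to publish a comparable factor, or some other charging rule, so that users can perform the same calculation.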
15) RPs will distribute access to under-allocated resources via Director’s
Discretionary awards. TeraGrid should centrally support the ability of RPs to
initiate and monitor such discretionary projects and allocations via POPS and
Core Services.
If a resource remains substantially under-allocated after the xRAC procedures, the Director of the
RP is encouraged to make discretionary allocations to other projects until the next LRAC or
MRAC cycle, to bring the total allocation up to the threshold designated for substantial allocation.
A Resource will be deemed to be “substantially under-allocated” when less than 80% of what the
RP declared available at the meeting was allocated. This level improves turnaround for the
allocated projects and retains flexibility for potential new users and opportunities.
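The 80% rule above can be expressed as a short calculation. The sketch below is illustrative only: the function names and the use of whole service units are assumptions, and integer arithmetic is used so that a resource at exactly 80% is not treated as under-allocated.

```python
# Threshold from the recommendation: under 80% of the declared amount.
THRESHOLD_PERCENT = 80


def is_substantially_under_allocated(declared: int, allocated: int) -> bool:
    """True when less than 80% of the declared availability was allocated."""
    if declared <= 0:
        raise ValueError("declared availability must be positive")
    return allocated * 100 < THRESHOLD_PERCENT * declared


def discretionary_headroom(declared: int, allocated: int) -> int:
    """Units the RP Director could award to reach the 80% threshold."""
    if declared <= 0:
        raise ValueError("declared availability must be positive")
    target = declared * THRESHOLD_PERCENT // 100
    return max(0, target - allocated)


# Example with hypothetical figures: 1,000,000 units declared at the
# meeting, 700,000 allocated by the xRAC.
print(is_substantially_under_allocated(1_000_000, 700_000))
print(discretionary_headroom(1_000_000, 700_000))
```

With these hypothetical figures, the resource is 70% allocated, so it qualifies, and discretionary awards of up to 100,000 units would bring it to the threshold.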
The temporary discretionary allocations provided under this policy, and those awarded for other
reasons, will be recorded as such in TGCDB and included in any accounting of allocations and
usage.
Within the current Core Services infrastructure, it is possible to create discretionary projects and
allocations by contacting the central allocations staff at NCSA. Such discretionary awards can be
of arbitrary duration. However, it is not currently possible to “convert” such awards into allocated
awards; when an xRAC-initiated award is made, such users will receive new grant numbers. The
primary limitation on this capability is staffing resources. Priority must be given to completing
requests from allocated users. In addition, the current infrastructure does not support such awards
during “friendly user” (i.e., pre-production) periods of resource operation.
The vision for Core Services 2.0 recognizes that such capabilities are desired and must be
supported. We support the plan for Core Services 2.0 to enable site-initiated discretionary
allocations and to enable monitoring of allocations and users during pre-production phases.
In the current system, sites have the ability to coordinate discretionary projects, if desired. For
example, if PSC and TACC both want to grant discretionary time to the same PI, a joint request
can be submitted and awarded. The implementation of Core Services 2.0 will explore the
possibilities of permitting RPs to manage such coordinated discretionary projects.
6. Communications
Better and more regular communications can serve to improve the level of allocation requests and
to reduce perceived barriers to entry. In addition, RPs should consider mechanisms for
communicating the availability of new and under-requested resources to potential users.
16) Distribute regular press releases and user news on the new allocations ‘portfolio’
after xRAC meetings, as DOE INCITE does.
We recommend that the TeraGrid External Relations group formalize a schedule of press releases
and user news postings associated with xRAC proposal deadlines and award announcements.
Under the current LRAC/MRAC quarterly rotation schedule, it is recommended that the press
releases after each MRAC meeting focus on the upcoming deadline for LRAC/MRAC proposals,
the types and approximate quantities of resources available, the new resources coming online, and
how to get information about applying. The press releases after each LRAC/MRAC meeting
would focus on the overall resources allocated, the science associated with those allocations, and
the scope of the user community; a lesser objective would be to announce the deadline and
process for upcoming MRAC allocations.
To expedite this effort, the TeraGrid Forum should be given the opportunity to comment and contribute
input, but the ER group will have the authority to issue releases and news postings on the
declared schedule. (Note: NSF may have primary responsibility for the press releases, working in
conjunction with the ER group.)
In addition, after each xRAC event, the xRAC Coordinator will post to User News a summary of
the recent meeting, stating whether resources are available for supplementary
requests and the approximate levels (if any) for the various resources.
17) Broadly announce and conduct “proposal” training classes early in each
proposal cycle.
The Allocations and User Support working groups should host a 1-2 hour videoconference
training session for all prospective proposers early in each proposal cycle. This should include
information about the resources available, proposal requirements, tips for writing good proposals,
and information about start-up accounts for scaling studies in support of proposals.
18) Pay special attention to communicating the availability of resources and
associated training to new users, communities and institutions.
Many standard communication methods (e.g. User News) tend to reach existing users and those
already familiar with TeraGrid. It is critical to pay attention to methods that reach new
users, communities and institutions. The Allocations WG should work closely with the EOT WG
to ensure that TeraGrid continues to engage potential new users and bring them into the
allocations process. This includes communications at large conferences (e.g. TG’08, SC’08),
domain-specific conferences (e.g. professional societies, particularly for non-traditional HPC
domains), non-traditional community meetings (e.g. Tapia, Grace Hopper, HASATCC, CHASS,
NSTA), collaborations with MSI-CIEC and with EPSCoR, and campus champions.
19) Ensure that ongoing changes within the TeraGrid are routinely reflected in the
current allocations policies document posted for users at
http://www.teragrid.org/userinfo/access/allocationspolicy.php.
Within the TeraGrid organizational structure, the ultimate responsibility for maintaining a current
allocations policy document falls to the Area Director for User Facing Projects. In practice, the
AD for UFP and the xRAC Allocations Coordinator will monitor TeraGrid policy discussions and
decisions and will accept suggestions and comments from NSF, the Science Advisory Board, the
TeraGrid Forum, GIG management, and TeraGrid users regarding updates and clarifications
needed in this document.
The xRAC Allocations Coordinator will suggest edits as needed and solicit feedback from the
Allocations Working Group. Minor edits will be made upon consensus of the working group.
Significant modifications will be reported to the TeraGrid Forum and feedback accepted. The
presumption is that policies have been approved and that this step is an editing and updating
process; edits will be posted for users’ benefit while feedback continues. Communicating policy
changes to users via regular updates to and review of this document must be performed in a
timely fashion.
Perhaps include a matrix listing the recommendations and those Working
Groups/roles charged with implementing them.