We’re implementing the OpenACC Profiling Interface as defined by the OpenACC 2.6 specification. We’re clarifying some aspects here as implementation-defined behavior, while they’re still under discussion within the OpenACC Technical Committee.
This implementation is tuned to keep the performance impact as low as possible for the (very common) case that the Profiling Interface is not enabled. This is relevant, as the Profiling Interface affects all the hot code paths (in the target code, not in the offloaded code). Users of the OpenACC Profiling Interface can be expected to understand that performance will be impacted to some degree once the Profiling Interface is enabled: for example, because the runtime (libgomp) calls into a third-party library for every event that has been registered.
We’re not yet accounting for the fact that OpenACC events may occur during event processing. We just handle one case specially, as required by CUDA 9.0 nvprof: acc_get_device_type may be called from acc_ev_device_init_start and acc_ev_device_init_end callbacks.
We’re not yet implementing initialization via an acc_register_library function that is either statically linked in, or dynamically loaded via LD_PRELOAD.
Initialization via acc_register_library functions dynamically loaded via the ACC_PROFLIB environment variable does work, as does directly calling acc_prof_register, acc_prof_unregister, acc_prof_lookup.
As there are currently no inquiry functions defined, calls to acc_prof_lookup will always return NULL.
There aren’t separate start and stop events defined for the event types acc_ev_create, acc_ev_delete, acc_ev_alloc, acc_ev_free. It’s not clear if these should be triggered before or after the actual device-specific call is made. We trigger them after.
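The registration path described above can be pictured with a small sketch. It is not libgomp code: every demo_ name below is a hypothetical stand-in, and a real profiling library would include acc_prof.h and use the types declared there. The sketch only shows the general shape, assuming the runtime hands the library a register function, an unregister function, and a lookup function, and the library registers its callbacks through the first of these. The toy lookup always returns NULL, mirroring the remark about acc_prof_lookup.

```c
#include <stddef.h>

/* Stand-ins for illustration only. Real code would include <acc_prof.h>
   and use the types declared there; every demo_ name is hypothetical. */
typedef enum {
    DEMO_EV_DEVICE_INIT_START,
    DEMO_EV_DEVICE_INIT_END,
    DEMO_EV_COUNT
} demo_event_t;

typedef void (*demo_callback)(demo_event_t ev);
typedef void (*demo_reg_fn)(demo_event_t ev, demo_callback cb);
typedef void (*demo_query_fn)(void);
typedef demo_query_fn (*demo_lookup_fn)(const char *name);

int demo_seen[DEMO_EV_COUNT];  /* how often each event reached the callback */
int demo_have_inquiry;         /* 1 if the lookup returned a function */

/* The profiling callback: here it only counts events. */
void demo_on_event(demo_event_t ev)
{
    demo_seen[ev]++;
}

/* Entry point in the style of acc_register_library: it registers
   callbacks through the function pointer the runtime passes in. */
void demo_register_library(demo_reg_fn reg, demo_reg_fn unreg,
                           demo_lookup_fn lookup)
{
    (void)unreg;  /* unused in this sketch */
    reg(DEMO_EV_DEVICE_INIT_START, demo_on_event);
    reg(DEMO_EV_DEVICE_INIT_END, demo_on_event);
    /* A library has to cope with the lookup returning NULL. */
    demo_have_inquiry = (lookup("demo_inquiry") != NULL);
}

/* Toy runtime, standing in for what libgomp does internally. */
demo_callback demo_table[DEMO_EV_COUNT];

void demo_runtime_register(demo_event_t ev, demo_callback cb)
{
    demo_table[ev] = cb;
}

void demo_runtime_unregister(demo_event_t ev, demo_callback cb)
{
    if (demo_table[ev] == cb)
        demo_table[ev] = NULL;
}

demo_query_fn demo_runtime_lookup(const char *name)
{
    (void)name;
    return NULL;  /* no inquiry functions are defined */
}

void demo_runtime_dispatch(demo_event_t ev)
{
    if (demo_table[ev] != NULL)
        demo_table[ev](ev);
}
```

In this toy setup, calling demo_register_library with the three demo_runtime_ functions and then dispatching events increments the matching demo_seen counters, and demo_have_inquiry stays 0.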
Remarks about data provided to callbacks:
acc_prof_info.event_type
It’s not clear if for nested event callbacks (for example, acc_ev_enqueue_launch_start as part of a parent compute construct), this should be set for the nested event (acc_ev_enqueue_launch_start), or if the value of the parent construct should remain (acc_ev_compute_construct_start). In this implementation, the value will generally correspond to the innermost nested event type.
acc_prof_info.device_type
For acc_ev_compute_construct_start, and in presence of an if clause with false argument, this will still refer to the offloading device type. It’s not clear if that’s the expected behavior.
For acc_ev_compute_construct_end, this is set to acc_device_host in presence of an if clause with false argument. It’s not clear if that’s the expected behavior.
acc_prof_info.thread_id
Always -1; not yet implemented.
acc_prof_info.async
Not yet implemented correctly for acc_ev_compute_construct_start.
In a compute construct, for host-fallback execution/acc_device_host it will always be acc_async_sync. It’s not clear if that’s the expected behavior.
For acc_ev_device_init_start and acc_ev_device_init_end, it will always be acc_async_sync. It’s not clear if that’s the expected behavior.
acc_prof_info.async_queue
libgomp does not limit the number of asynchronous queues. This will always have the same value as acc_prof_info.async.
acc_prof_info.src_file
Always NULL; not yet implemented.
acc_prof_info.func_name
Always NULL; not yet implemented.
acc_prof_info.line_no
Always -1; not yet implemented.
acc_prof_info.end_line_no
Always -1; not yet implemented.
acc_prof_info.func_line_no
Always -1; not yet implemented.
acc_prof_info.func_end_line_no
Always -1; not yet implemented.
acc_event_info.event_type, acc_event_info.*.event_type
Relating to acc_prof_info.event_type discussed above, in this implementation, this will always be the same value as acc_prof_info.event_type.
acc_event_info.*.parent_construct
Will be acc_construct_parallel for all OpenACC compute constructs as well as many OpenACC Runtime API calls; should be the one matching the actual construct, or acc_construct_runtime_api, respectively.
Will be acc_construct_enter_data or acc_construct_exit_data when processing variable mappings specified in OpenACC declare directives; should be acc_construct_declare.
For implicit acc_ev_device_init_start, acc_ev_device_init_end, and explicit as well as implicit acc_ev_alloc, acc_ev_free, acc_ev_enqueue_upload_start, acc_ev_enqueue_upload_end, acc_ev_enqueue_download_start, and acc_ev_enqueue_download_end, will be acc_construct_parallel; should reflect the real parent construct.
acc_event_info.*.implicit
For acc_ev_alloc, acc_ev_free, acc_ev_enqueue_upload_start, acc_ev_enqueue_upload_end, acc_ev_enqueue_download_start, and acc_ev_enqueue_download_end, this currently will be 1 also for explicit usage.
acc_event_info.data_event.var_name
Always NULL; not yet implemented.
acc_event_info.data_event.host_ptr
For acc_ev_alloc and acc_ev_free, this is always NULL.
typedef union acc_api_info
… as printed in 5.2.3. Third Argument: API-Specific Information. This should obviously be typedef struct acc_api_info.
acc_api_info.device_api
Possibly not yet implemented correctly for acc_ev_compute_construct_start, acc_ev_device_init_start, acc_ev_device_init_end: will always be acc_device_api_none for these event types.
For acc_ev_enter_data_start, it will be acc_device_api_none in some cases.
acc_api_info.device_type
Always the same as acc_prof_info.device_type.
acc_api_info.vendor
Always -1; not yet implemented.
acc_api_info.device_handle
Always NULL; not yet implemented.
acc_api_info.context_handle
Always NULL; not yet implemented.
acc_api_info.async_handle
Always NULL; not yet implemented.
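Because several of the fields listed above are currently fixed at -1 or NULL, a callback should not pass them straight to formatting or string functions. The following is a minimal sketch of defensive handling. It uses a hypothetical stand-in struct, demo_prof_info, with a few members named like the acc_prof_info fields discussed above; a real callback would receive the structure declared in acc_prof.h instead.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in carrying a few of the fields discussed above.
   A real callback would use the acc_prof_info type from <acc_prof.h>. */
typedef struct {
    const char *src_file;   /* may be NULL */
    const char *func_name;  /* may be NULL */
    int line_no;            /* may be -1 */
    int thread_id;          /* may be -1 */
} demo_prof_info;

/* Format "file:line (function)" into buf, substituting "?" for every
   field that still holds its "not yet implemented" placeholder.
   Returns the value returned by snprintf. */
int demo_format_location(const demo_prof_info *pi, char *buf, size_t len)
{
    const char *file = pi->src_file != NULL ? pi->src_file : "?";
    const char *func = pi->func_name != NULL ? pi->func_name : "?";

    if (pi->line_no < 0)
        return snprintf(buf, len, "%s:? (%s)", file, func);
    return snprintf(buf, len, "%s:%d (%s)", file, pi->line_no, func);
}
```

Checking for the placeholder rather than assuming it keeps the same callback usable if these fields are populated later.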
Remarks about certain event types:
acc_ev_device_init_start, acc_ev_device_init_end
When a compute construct triggers implicit acc_ev_device_init_start and acc_ev_device_init_end events, they currently aren’t nested within the corresponding acc_ev_compute_construct_start and acc_ev_compute_construct_end, but are observed before acc_ev_compute_construct_start. It’s not clear what to do: the standard asks us to provide a lot of details to the acc_ev_compute_construct_start callback, without (implicitly) initializing a device before?
Callbacks for these event types will not be invoked for calls to the acc_set_device_type and acc_set_device_num functions. It’s not clear if they should be.
acc_ev_enter_data_start, acc_ev_enter_data_end, acc_ev_exit_data_start, acc_ev_exit_data_end
Callbacks for the following event types will be invoked, but the dispatch and the information provided therein have not yet been thoroughly reviewed:
acc_ev_alloc
acc_ev_free
acc_ev_update_start, acc_ev_update_end
acc_ev_enqueue_upload_start, acc_ev_enqueue_upload_end
acc_ev_enqueue_download_start, acc_ev_enqueue_download_end
During device initialization, and finalization, respectively, callbacks for the following event types will not yet be invoked:
acc_ev_alloc
acc_ev_free
Callbacks for the following event types have not yet been implemented, so currently won’t be invoked:
acc_ev_device_shutdown_start, acc_ev_device_shutdown_end
acc_ev_runtime_shutdown
acc_ev_create, acc_ev_delete
acc_ev_wait_start, acc_ev_wait_end
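Since callbacks for the shutdown events listed above are not invoked, a profiling library cannot use them as the point at which it writes out its results. One possible workaround, sketched here as an assumption and not as anything libgomp prescribes, is to accumulate data in the callbacks and flush it from a handler installed with the standard C atexit function. All demo_ names are hypothetical, and how such a handler is ordered relative to libgomp's own teardown is not addressed here.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical upper bound on event type values, for this sketch only. */
enum { DEMO_MAX_EVENTS = 32 };

unsigned long demo_counts[DEMO_MAX_EVENTS];

/* Called from a profiling callback: count one occurrence of an event.
   Out-of-range values are ignored. */
void demo_count_event(int event_type)
{
    if (event_type >= 0 && event_type < DEMO_MAX_EVENTS)
        demo_counts[event_type]++;
}

/* Write one line per event type that was seen at least once.
   Returns the number of lines written. */
int demo_write_report(FILE *out)
{
    int lines = 0;
    for (int i = 0; i < DEMO_MAX_EVENTS; i++) {
        if (demo_counts[i] != 0) {
            fprintf(out, "event %d: %lu\n", i, demo_counts[i]);
            lines++;
        }
    }
    return lines;
}

/* Exit handler: flush the report without relying on shutdown events. */
static void demo_flush_at_exit(void)
{
    demo_write_report(stderr);
}

/* Install the exit handler once, for example from the library's
   registration entry point. Returns 0 on success, as atexit does. */
int demo_install_flush(void)
{
    return atexit(demo_flush_at_exit);
}
```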
For the following runtime library functions, not all expected callbacks will be invoked (mostly concerning implicit device initialization):
acc_get_num_devices
acc_set_device_type
acc_get_device_type
acc_set_device_num
acc_get_device_num
acc_init
acc_shutdown
Aside from implicit device initialization, for the following runtime library functions, no callbacks will be invoked for shared-memory offloading devices (it’s not clear if they should be):
acc_malloc
acc_free
acc_copyin, acc_present_or_copyin, acc_copyin_async
acc_create, acc_present_or_create, acc_create_async
acc_copyout, acc_copyout_async, acc_copyout_finalize, acc_copyout_finalize_async
acc_delete, acc_delete_async, acc_delete_finalize, acc_delete_finalize_async
acc_update_device, acc_update_device_async
acc_update_self, acc_update_self_async
acc_map_data, acc_unmap_data
acc_memcpy_to_device, acc_memcpy_to_device_async
acc_memcpy_from_device, acc_memcpy_from_device_async