- Wed 20 December 2023
- Neuron
- #ml-code, #llm, #ml-acceleration, #hpc-concept
Aliasing on XLA
Modern machine learning models demand substantial memory during both training and inference. To address this challenge, compilers like XLA (Accelerated Linear Algebra) implement memory optimizations such as aliasing. Aliasing allows different parts of a computation to share the same underlying memory buffer, thereby reducing memory footprint and improving execution performance.
What is Aliasing?
Aliasing in XLA refers to mapping multiple logical buffers onto the same physical memory region. In practice, this means that input and output tensors in a computation can share the same memory if certain conditions are met. By setting up such controlled aliases, XLA can avoid allocating separate memory for intermediate results, resulting in reduced memory usage and faster data access.
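To make the idea concrete, here is a minimal, self-contained C++ sketch (illustrative only, not XLA code) in which a logical input buffer and a logical output buffer share one physical allocation, so the computation runs in place:

#include <cstddef>
#include <iostream>
#include <vector>

int main() {
  std::vector<float> physical(4, 1.0f);  // one physical buffer

  float* input = physical.data();   // logical input buffer
  float* output = physical.data();  // logical output buffer, aliased to input

  // Compute output = input * 2 entirely in place; no second allocation.
  for (std::size_t i = 0; i < physical.size(); ++i) {
    output[i] = input[i] * 2.0f;
  }

  std::cout << output[0] << "\n";  // prints 2; the original input value is gone
}

Note that the in-place write destroys the input values, which is precisely why the compiler must prove the input is no longer needed before it may establish such an alias.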
However, aliasing must be managed carefully to maintain computational correctness. If two aliased buffers are modified simultaneously without proper sequencing, the results can be corrupted. Consequently, XLA enforces strict constraints based on shape compatibility, computation dependencies, and memory layouts, as outlined in the design of the XLA compiler [1].
How XLA Supports Aliasing
XLA supports aliasing primarily through the OptimizeInputOutputBufferAlias class. This optimization pass attempts to reuse input buffers for output buffers wherever possible while respecting correctness guarantees. The following annotated code illustrates the core flow:
Ref: optimize_input_output_buffer_alias.cc, with annotations
namespace xla {

// Attempts to establish aliases between input and output buffers.
absl::StatusOr<bool> OptimizeInputOutputBufferAlias::Build(
    absl::Span<const Shape> input_shapes, const Shape& output_shape,
    HloInputOutputAliasConfig* alias_config,
    HloBufferDonorConfig* buffer_donor_config) {
  bool changed = false;

  // Disable aliasing if the output has a dynamic shape.
  if (output_shape.is_dynamic()) {
    return false;
  }

  // Structures to collect potential donor input buffers.
  struct DonorEntry {
    int64_t param_number;
    ShapeIndex index;
    int64_t shape_size;
  };
  absl::flat_hash_map<int64_t, std::vector<DonorEntry>> donors;

  // Populate donors: traverse each input's subshapes and collect
  // eligible buffers.
  for (int64_t param_number = 0; param_number < input_shapes.size();
       ++param_number) {
    const Shape& input_shape = input_shapes[param_number];
    TF_RET_CHECK(LayoutUtil::HasLayout(input_shape));
    ShapeUtil::ForEachSubshape(
        input_shape, [&](const Shape& subshape, const ShapeIndex& index) {
          if (!LayoutUtil::IsDenseArray(subshape) || subshape.is_dynamic()) {
            return;
          }
          if (alias_config->ParameterHasAlias(param_number, index)) {
            return;
          }
          if (registered_buffer_donor_only_ &&
              !buffer_donor_config->ParameterIsBufferDonor(param_number,
                                                           index)) {
            return;
          }
          int64_t memory_space = subshape.layout().memory_space();
          donors[memory_space].emplace_back(
              DonorEntry{param_number, index, shape_size_fn_(subshape)});
        });
  }

  // Structures to collect potential donee output buffers.
  struct DoneeEntry {
    ShapeIndex index;
    int64_t shape_size;
  };
  absl::flat_hash_map<int64_t, std::vector<DoneeEntry>> donees;
  TF_RET_CHECK(LayoutUtil::HasLayout(output_shape));
  ShapeUtil::ForEachSubshape(
      output_shape, [&](const Shape& subshape, const ShapeIndex& index) {
        if (!LayoutUtil::IsDenseArray(subshape)) {
          return;
        }
        if (alias_config->OutputHasAlias(index)) {
          return;
        }
        int64_t memory_space = subshape.layout().memory_space();
        donees[memory_space].emplace_back(
            DoneeEntry{index, shape_size_fn_(subshape)});
      });

  // For each memory space, match donors and donees based on shape size.
  for (auto& [memory_space, donor_vector] : donors) {
    auto donee_it = donees.find(memory_space);
    if (donee_it == donees.end()) {
      continue;
    }
    auto& donee_vector = donee_it->second;

    // Sort both donors and donees by decreasing size to maximize reuse
    // of large buffers.
    absl::c_stable_sort(donor_vector,
                        [](const DonorEntry& a, const DonorEntry& b) {
                          return a.shape_size > b.shape_size;
                        });
    absl::c_stable_sort(donee_vector,
                        [](const DoneeEntry& a, const DoneeEntry& b) {
                          return a.shape_size > b.shape_size;
                        });

    // Two-pointer matching: match donor and donee of the same size.
    int64_t donor_vector_index = 0;
    int64_t donee_vector_index = 0;
    while (donor_vector_index < donor_vector.size() &&
           donee_vector_index < donee_vector.size()) {
      const auto& donor = donor_vector[donor_vector_index];
      const auto& donee = donee_vector[donee_vector_index];
      if (donor.shape_size > donee.shape_size) {
        ++donor_vector_index;
      } else if (donor.shape_size < donee.shape_size) {
        ++donee_vector_index;
      } else {
        // Match found: establish alias and remove donor from pool.
        TF_RETURN_IF_ERROR(alias_config->SetUpAlias(
            donee.index, donor.param_number, donor.index));
        TF_RETURN_IF_ERROR(buffer_donor_config->RemoveBufferDonor(
            donor.param_number, donor.index));
        ++donor_vector_index;
        ++donee_vector_index;
        changed = true;
      }
    }
  }

  return changed;
}

// Entry point for running the alias optimization on an HLO module.
absl::StatusOr<bool> OptimizeInputOutputBufferAlias::Run(
    HloModule* module,
    const absl::flat_hash_set<absl::string_view>& execution_threads) {
  // Extract input and output shapes from the module entry computation.
  const auto& entry_computation_layout = module->entry_computation_layout();
  std::vector<Shape> input_shapes;
  for (int64_t i = 0; i < module->entry_computation()->num_parameters(); ++i) {
    input_shapes.push_back(entry_computation_layout.parameter_shape(i));
  }
  const Shape& output_shape = entry_computation_layout.result_shape();

  HloInputOutputAliasConfig* alias_config =
      &module->input_output_alias_config();
  HloBufferDonorConfig* buffer_donor_config = &module->buffer_donor_config();

  // Attempt to build aliasing configuration.
  TF_ASSIGN_OR_RETURN(bool changed, Build(input_shapes, output_shape,
                                          alias_config, buffer_donor_config));

  // Verify that the resulting alias configuration is correct.
  TF_RETURN_IF_ERROR(alias_config->Verify(*module, shape_size_fn_));

  return changed;
}

}  // namespace xla
This code performs input preparation, donor and donee collection, size-based sorting, two-pointer matching of donors and donees, alias setup, and verification of the final configuration.
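To isolate the matching step, here is a minimal, self-contained C++ sketch of the same two-pointer algorithm over plain byte sizes (toy values, standing in for the XLA donor and donee entries):

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <vector>

int main() {
  std::vector<int64_t> donors = {1024, 256, 64};  // input buffer sizes (bytes)
  std::vector<int64_t> donees = {512, 256, 64};   // output buffer sizes (bytes)

  // Sort both lists in non-increasing order, as the pass does.
  std::sort(donors.rbegin(), donors.rend());
  std::sort(donees.rbegin(), donees.rend());

  // Advance two pointers; only exact size matches become aliases.
  std::size_t i = 0, j = 0;
  while (i < donors.size() && j < donees.size()) {
    if (donors[i] > donees[j]) {
      ++i;  // donor larger than the largest remaining donee; skip it
    } else if (donors[i] < donees[j]) {
      ++j;  // no donor can hold this donee; leave it unaliased
    } else {
      std::cout << "alias " << donees[j] << "-byte output to input buffer\n";
      ++i;
      ++j;
    }
  }
}

Running this matches the two 256-byte and 64-byte buffers while the 1024-byte donor and 512-byte donee stay unmatched, mirroring how the pass only aliases exact size pairs within a memory space.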
Dynamic Shape Handling in Aliasing
Dynamic shapes introduce uncertainty in buffer sizes at runtime. If an output tensor's shape is dynamic, the memory it requires cannot be reliably determined at compile time. As a result, XLA restricts aliasing for dynamic shapes by disabling the optimization whenever the output is dynamic. Allowing aliasing for dynamic outputs could lead to incorrect buffer allocations and data corruption if the output grows larger than the input buffer, and verifying correctness would require complex runtime checks that can negate the performance benefits. By disallowing aliasing in such cases, XLA keeps the memory model simple and predictable, though at the cost of missed optimization opportunities for dynamic workloads.
Recent research, such as work on dynamic bufferization in MLIR [2], proposes more sophisticated methods to handle dynamic shapes while minimizing overhead.
Safe Aliasing with Dynamic Shapes
One strategy to safely allow aliasing with dynamic shapes is to introduce runtime shape guards. These guards would validate buffer compatibility at execution time, only enabling aliasing if the actual shapes are compatible. Another strategy involves over-allocating buffers with conservative margins so that minor runtime variations can still be safely accommodated. A third strategy is deferred memory binding, in which buffer reuse decisions are delayed until runtime when precise shape information is available. These approaches trade a small amount of runtime complexity for potentially significant memory savings.
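As a rough illustration of the first two strategies, here is a minimal C++ sketch; the function names and the 25% margin are hypothetical choices for this post, not anything XLA implements:

#include <cstdint>

// Hypothetical runtime shape guard: permit the alias only when the
// concrete output fits inside the donated input buffer.
bool CanAliasAtRuntime(int64_t donor_bytes, int64_t actual_output_bytes) {
  return actual_output_bytes <= donor_bytes;
}

// Hypothetical conservative over-allocation: pad the static size estimate
// so small dynamic growth still fits in the reused buffer (25% margin here).
int64_t OverAllocatedBytes(int64_t static_estimate_bytes) {
  return static_estimate_bytes + static_estimate_bytes / 4;
}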
Extending Aliasing to Cross-Computation Buffer Reuse
To extend aliasing beyond a single computation, XLA would require global buffer lifetime analysis across multiple HLO computations. By analyzing liveness and memory usage across function calls and loops, it would be possible to recycle buffers globally. Techniques like memory pooling and cross-computation liveness tracking, adapted from traditional compiler research [3], could help enable such reuse. These global optimizations would offer further memory reductions for large-scale training workloads.
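The sketch below shows one way such pooling could work, assuming a size-keyed free list; the BufferPool class and its Acquire/Release interface are illustrative inventions, not XLA API:

#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Toy cross-computation buffer pool: when liveness analysis determines
// that no later computation reads a buffer, it is returned to a free
// list keyed by size so a later computation can reuse the allocation.
class BufferPool {
 public:
  // Acquire a buffer of `bytes`, reusing a retired one when available.
  std::vector<char>* Acquire(int64_t bytes) {
    auto it = free_.find(bytes);
    if (it != free_.end() && !it->second.empty()) {
      std::vector<char>* buf = it->second.back();
      it->second.pop_back();
      return buf;
    }
    owned_.push_back(std::make_unique<std::vector<char>>(bytes));
    return owned_.back().get();
  }

  // Called once liveness says no later computation reads `buf`.
  void Release(std::vector<char>* buf) {
    free_[static_cast<int64_t>(buf->size())].push_back(buf);
  }

 private:
  std::map<int64_t, std::vector<std::vector<char>*>> free_;
  std::vector<std::unique_ptr<std::vector<char>>> owned_;
};

A real implementation would likely key the pool on both size and memory space, as the in-pass matching above does, and would drive Release calls from a whole-module liveness analysis rather than manual bookkeeping.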
Trade-offs Between Compile-Time and Runtime Aliasing
Compile-time aliasing offers a simpler, more predictable execution model with no runtime overhead, but it is inherently conservative: it cannot exploit dynamic input properties or adapt to changing workloads. Runtime aliasing decisions, in contrast, allow more aggressive optimization by adapting to actual runtime shapes, but they introduce additional complexity and potential variability in performance. Hybrid models that integrate compile-time planning with lightweight runtime validation, such as those explored in MLIR [4], are promising future directions.
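A hybrid design might look like the following sketch, where the compiler emits candidate alias pairs and the runtime performs a cheap validation before reusing each buffer; the AliasCandidate struct and ValidateAtRuntime are assumed names for illustration, not an existing XLA feature:

#include <cstdint>

// A candidate alias pair recorded by the compile-time plan.
struct AliasCandidate {
  int64_t param_number;   // donor input parameter
  int64_t output_index;   // donee output position
  int64_t planned_bytes;  // buffer size the compile-time plan assumed
};

// Lightweight runtime check: honor the planned alias only if the
// concrete output still fits in the buffer the plan reserved for it;
// otherwise fall back to a fresh allocation.
bool ValidateAtRuntime(const AliasCandidate& c, int64_t actual_output_bytes) {
  return actual_output_bytes <= c.planned_bytes;
}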
References
[1] "XLA: Optimizing Compiler for Machine Learning." OpenXLA Project, https://openxla.org/xla/tf2xla. Accessed 20 Dec. 2023.
[2] "Shape Inference." MLIR, https://mlir.llvm.org/docs/ShapeInference/. Accessed 20 Dec. 2023.
[3] Muchnick, Steven. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.
[4] Levental, Maksim. "Memory Planning for Deep Neural Networks." arXiv preprint arXiv:2203.00448 (2022).
If you found this useful, please cite this post using
Senthilkumar Gopal. (Dec 2023). Aliasing on XLA. sengopal.me. https://sengopal.me/posts/aliasing-on-xla
or
@article{gopal2023aliasingonxla,
  title   = {Aliasing on XLA},
  author  = {Senthilkumar Gopal},
  journal = {sengopal.me},
  year    = {2023},
  month   = {Dec},
  url     = {https://sengopal.me/posts/aliasing-on-xla}
}