PyTorch Custom Operation

PyTorch Custom Operation - Lei Mao's Log Book

05-10-2026 blog 23 minutes read (About 3501 words)

Introduction

PyTorch models commonly rely on custom operations. A PyTorch custom operation can be a custom class or a custom function implemented in C++ and CUDA, and it can be used from both Python and C++ inference programs.

In this blog post, I would like to share how to implement PyTorch custom operations in C++ and CUDA, and how to use them in PyTorch models and AOTInductor compiled inference programs, using a simple identity convolution example.

PyTorch Custom Function

PyTorch custom functions can be implemented in C++ and CUDA and registered using the TORCH_LIBRARY_IMPL macro. Both the CPU and CUDA implementations can be provided, and PyTorch will dispatch to the correct implementation based on the device of the input tensors.

custom_ops.cpp

    // ---------------------------------------------------------------------------
    // CPU implementation: plain element-wise copy via clone().
    // ---------------------------------------------------------------------------
    torch::Tensor identity_conv_cpu_impl(const torch::Tensor& input)
    {
        TORCH_CHECK(!input.is_cuda(),
                    "identity_conv_cpu_impl: input must be a CPU tensor");
        return input.clone();
    }

    // ---------------------------------------------------------------------------
    // Host-side dispatcher.
    // ---------------------------------------------------------------------------
    torch::Tensor identity_conv_cuda_impl(const torch::Tensor& input)
    {
        TORCH_CHECK(input.is_cuda(),
                    "identity_conv_cuda_impl: input must be a CUDA tensor");

        // Output has the same shape, dtype, and strides as input.
        auto output = torch::empty_like(input);
        const int64_t numel = input.numel();

        if (numel == 0)
        {
            return output;
        }

        // Upload shape and strides to the device so the kernel can read them.
        const int ndim = input.dim();
        const auto opts =
            torch::TensorOptions().dtype(torch::kInt64).device(input.device());
        const auto shape_dev = torch::tensor(
            std::vector<int64_t>(input.sizes().begin(), input.sizes().end()),
            opts);
        const auto strides_dev = torch::tensor(
            std::vector<int64_t>(input.strides().begin(), input.strides().end()),
            opts);

        constexpr int kThreads = 256;
        const int blocks = static_cast<int>((numel + kThreads - 1) / kThreads);

        // Launch configuration is assumed from blocks and kThreads above.
        AT_DISPATCH_FLOATING_TYPES_AND2(
            at::ScalarType::Half, at::ScalarType::BFloat16, input.scalar_type(),
            "identity_conv_cuda_impl", [&]() {
                identity_kernel<scalar_t><<<blocks, kThreads>>>(
                    input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(),
                    shape_dev.data_ptr<int64_t>(),
                    strides_dev.data_ptr<int64_t>(), ndim, numel);
            });

        C10_CUDA_KERNEL_LAUNCH_CHECK();
        return output;
    }
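The body of identity_kernel is not shown in the listing above, so the following is only a sketch of the index arithmetic such a kernel might perform with the uploaded shape and strides. It is written as plain host-side C++ so it can be checked without a GPU. The helper names num_blocks and strided_offset are hypothetical and do not come from the original code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of thread blocks needed to cover numel elements with
// threads_per_block threads each (the same ceiling division as the host code).
inline int64_t num_blocks(int64_t numel, int64_t threads_per_block)
{
    assert(threads_per_block > 0);
    return (numel + threads_per_block - 1) / threads_per_block;
}

// Assumed per-thread index math: convert a logical row-major element index
// into a memory offset for a tensor with the given shape and strides.
// Dimensions are peeled off from innermost to outermost.
inline int64_t strided_offset(int64_t linear_idx,
                              const std::vector<int64_t>& shape,
                              const std::vector<int64_t>& strides)
{
    assert(shape.size() == strides.size());
    int64_t offset = 0;
    for (int d = static_cast<int>(shape.size()) - 1; d >= 0; --d)
    {
        offset += (linear_idx % shape[d]) * strides[d];
        linear_idx /= shape[d];
    }
    return offset;
}
```

For a contiguous tensor the offset equals the logical index, so the extra arithmetic only matters for non-contiguous inputs such as transposed views. Because empty_like is described as preserving the input strides, one thread per logical element could read and write at the same offset.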

custom_op_registration.cpp

    // CUDA kernel implementation for my_ops::identity_conv_op.
    TORCH_LIBRARY_IMPL(my_ops, CUDA, m)
    {
        m.impl("identity_conv_op", identity_conv_cuda_impl);
    }

    // CPU fallback.
    TORCH_LIBRARY_IMPL(my_ops, CPU, m)
    {
        m.impl("identity_conv_op", identity_conv_cpu_impl);
    }
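One way to picture what the two registrations above achieve is a table keyed by operator name and device, from which the kernel matching the input's device is selected at call time. The sketch below is a deliberately simplified toy model in plain C++. ToyTensor, DeviceKey, and ToyDispatcher are hypothetical names, not PyTorch APIs, and the real dispatcher is considerably more involved.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy stand-ins; none of these types are PyTorch APIs.
enum class DeviceKey { CPU, CUDA };

struct ToyTensor
{
    DeviceKey device;
    std::vector<float> data;
};

using Kernel = std::function<ToyTensor(const ToyTensor&)>;

class ToyDispatcher
{
public:
    // Plays the role of m.impl(name, fn) inside TORCH_LIBRARY_IMPL(ns, KEY, m).
    void impl(const std::string& op, DeviceKey key, Kernel fn)
    {
        table_[{op, key}] = std::move(fn);
    }

    // Looks up the kernel registered for the input's device and calls it.
    ToyTensor call(const std::string& op, const ToyTensor& input) const
    {
        const auto it = table_.find({op, input.device});
        assert(it != table_.end() && "no kernel registered for this device");
        return it->second(input);
    }

private:
    std::map<std::pair<std::string, DeviceKey>, Kernel> table_;
};
```

In this toy model, calling an operator on a tensor whose device has no registered kernel trips the assertion, which loosely mirrors the error PyTorch raises when no implementation exists for a backend.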

PyTorch Custom Class

PyTorch custom functions are stateless and cannot hold any parameters. To implement a custom class that holds parameters and exposes a forward() method callable from Python, we can derive from torch::CustomClassHolder in C++ and register the class with the TORCH_LIBRARY macro.

custom_class.cpp

    // ---------------------------------------------------------------------------
    // IdentityConvClass
    //
    // A custom class registered with torch.classes so that it can be embedded
    // in a torch.nn.Module, exported with torch.export, and compiled with
    // AOTInductor.
    //
    // The forward() method delegates to the CUDA or CPU identity implementation
    // depending on the device of the input. The `channels_` field is preserved
    // for semantic completeness and is serialised via def_pickle so that the
    // class survives export/import round-trips.
    // ---------------------------------------------------------------------------
    struct IdentityConvClass : torch::CustomClassHolder
    {
        int64_t channels_;

        explicit IdentityConvClass(int64_t channels) : channels_(channels) {}

        torch::Tensor forward(const torch::Tensor& x)
        {
            return x.is_cuda() ? identity_conv_cuda_impl(x)
                               : identity_conv_cpu_impl(x);
        }

        int64_t get_channels() const { return channels_; }
    };
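The comment above mentions that channels_ is serialised via def_pickle. def_pickle takes a pair of functions: one that extracts a serialisable state from an instance, and one that rebuilds an instance from that state. The sketch below mimics that round trip in plain C++ without libtorch. ToyIdentityConv, get_state, and set_state are hypothetical names, std::shared_ptr stands in for c10::intrusive_ptr, and the positivity check on the channel count is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Toy counterpart of IdentityConvClass; not a torch::CustomClassHolder.
struct ToyIdentityConv
{
    int64_t channels;

    explicit ToyIdentityConv(int64_t c) : channels(c) {}

    // Identity forward: returns a copy of the input.
    std::vector<float> forward(const std::vector<float>& x) const { return x; }
};

// Role of the first def_pickle function: instance -> serialisable state.
inline int64_t get_state(const std::shared_ptr<ToyIdentityConv>& self)
{
    return self->channels;
}

// Role of the second def_pickle function: state -> fresh instance.
inline std::shared_ptr<ToyIdentityConv> set_state(int64_t state)
{
    assert(state > 0 && "channels must be positive (assumed constraint)");
    return std::make_shared<ToyIdentityConv>(state);
}
```

Since the only state is a single integer, the restored object is a new instance that behaves identically to the original, which is what lets such a class survive an export and import cycle.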

custom_class_registration.cpp

    // ---------------------------------------------------------------------------
    // Operator / class registration
    //
    // This file has no pybind11 dependency and is compiled into
    // libidentity_conv_ops.so, which can be dlopen'd by a pure C++ binary
    // without needing libpython.
    // ---------------------------------------------------------------------------
    TORCH_LIBRARY(my_ops, m)
    {
        // Register IdentityConvClass so Python can instantiate it as
        //...
