NVIDIA Jetson Xavier - Using CUDA

Build All CUDA Samples

1. Go to the samples path

cd /usr/local/cuda/samples

2. Build the samples using the makefile

sudo make
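Building every sample can take a long time. As a minimal sketch, a single sample can be built by running make inside its own directory, since each sample directory contains its own Makefile. The helper name build_sample and the SAMPLES_DIR variable are hypothetical and used only for illustration; sudo is assumed to be needed because the samples live under /usr/local.

```shell
#!/bin/sh
# Root of the CUDA samples as used on this page; override for other installs.
# SAMPLES_DIR is a hypothetical variable introduced for this sketch.
SAMPLES_DIR="${SAMPLES_DIR:-/usr/local/cuda/samples}"

# build_sample <category/sample>
# Hypothetical helper (not part of the CUDA toolkit): builds one sample only.
build_sample() {
    dir="$SAMPLES_DIR/$1"
    if [ ! -d "$dir" ]; then
        echo "sample directory not found: $dir" >&2
        return 1
    fi
    # -C makes "make" change into the sample directory before building.
    sudo make -C "$dir"
}

# Example call (commented out so that sourcing this file builds nothing):
# build_sample 1_Utilities/deviceQuery
```

After a successful build, the executable is expected in the sample's own directory, for example /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery.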

CUDA Samples

All the samples are in:

/usr/local/cuda/samples
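To see which samples are actually present on a given board, the directory can be scanned for per-sample Makefiles. This is a sketch under the assumption that samples follow the category/sample layout shown in the tables on this page; list_samples is a hypothetical helper name.

```shell
#!/bin/sh
# list_samples [root]
# Hypothetical helper: prints "category/sample" for every sample directory
# that contains a Makefile. The root defaults to the path given above.
list_samples() {
    root="${1:-/usr/local/cuda/samples}"
    for mk in "$root"/*/*/Makefile; do
        # If the glob matched nothing, the literal pattern is returned; skip it.
        [ -f "$mk" ] || continue
        sample_dir="${mk%/Makefile}"
        # Strip the root prefix so the output matches the Path column style.
        echo "${sample_dir#"$root"/}"
    done
}
```

Calling list_samples with no argument scans /usr/local/cuda/samples; passing a directory scans that location instead.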

Simple Samples

Path	Sample	Description
/0_Simple/asyncAPI	asyncAPI	This sample uses CUDA streams and events to overlap execution on CPU and GPU.
/0_Simple/cdpSimplePrint	cdpSimplePrint	This sample demonstrates simple printf implemented using CUDA Dynamic Parallelism. This sample requires devices with compute capability 3.5 or higher.
/0_Simple/cdpSimpleQuicksort	cdpSimpleQuicksort	This sample demonstrates simple quicksort implemented using CUDA Dynamic Parallelism. This sample requires devices with compute capability 3.5 or higher.
/0_Simple/clock	clock	This example shows how to use the clock function to measure the performance of a block of threads of a kernel accurately.
/0_Simple/cppIntegration	cppIntegration	This example demonstrates how to integrate CUDA into an existing C++ application, i.e. the CUDA entry point on the host side is only a function which is called from C++ code, and only the file containing this function is compiled with nvcc. It also demonstrates that vector types can be used from cpp.
/0_Simple/cppOverload	cppOverload	This sample demonstrates how to use C++ function overloading on the GPU.
/0_Simple/cudaOpenMP	cudaOpenMP	This sample demonstrates how to use OpenMP API to write an application for multiple GPUs.
/0_Simple/fp16ScalarProduct	fp16ScalarProduct	Calculates scalar product of two vectors of FP16 numbers.
/0_Simple/inlinePTX	inlinePTX	A simple test application that demonstrates a new CUDA 4.0 ability to embed PTX in a CUDA kernel.
/0_Simple/matrixMul	matrixMul	This sample implements matrix multiplication using shared memory to ensure data reuse; the multiplication is done using the tiling approach.
/0_Simple/matrixMulCUBLAS	matrixMulCUBLAS	This sample implements matrix multiplication. To illustrate GPU performance for matrix multiply, it also shows how to use the new CUDA 4.0 interface for CUBLAS to achieve high-performance matrix multiplication.
/0_Simple/matrixMulDrv	matrixMulDrv	This sample implements matrix multiplication and uses the new CUDA 4.0 kernel launch Driver API.
/0_Simple/simpleAssert	simpleAssert	This CUDA Runtime API sample is a very basic sample that implements how to use the assert function in the device code. Requires Compute Capability 2.0.
/0_Simple/simpleAtomicIntrinsics	simpleAtomicIntrinsics	A simple demonstration of global memory atomic instructions. Requires Compute Capability 2.0 or higher.
/0_Simple/simpleCallback	simpleCallback	This sample implements multi-threaded heterogeneous computing workloads with the new CPU callbacks for CUDA streams and events introduced with CUDA 5.0.
/0_Simple/simpleCooperativeGroups	simpleCooperativeGroups	This sample is a simple code that illustrates the basic usage of cooperative groups within the thread block.
/0_Simple/simpleCubemapTexture	simpleCubemapTexture	Simple example that demonstrates how to use a new CUDA 4.1 feature to support cubemap Textures in CUDA C.
/0_Simple/simpleCudaGraphs	simpleCudaGraphs	A demonstration of CUDA Graphs creation, instantiation, and launch using Graphs APIs and Stream Capture APIs.
/0_Simple/simpleLayeredTexture	simpleLayeredTexture	Simple example that demonstrates how to use a new CUDA 4.0 feature to support layered Textures in CUDA C.
/0_Simple/simpleMPI	simpleMPI	Simple example demonstrating how to use MPI in combination with CUDA.
/0_Simple/simpleMultiCopy	simpleMultiCopy	This sample illustrates the usage of CUDA streams to achieve overlapping of kernel execution with data copies to and from the device.
/0_Simple/simpleMultiGPU	simpleMultiGPU	This application demonstrates how to use the new CUDA 4.0 API for CUDA context management and multi-threaded access to run CUDA kernels on multiple-GPUs.
/0_Simple/simpleOccupancy	simpleOccupancy	This sample demonstrates the basic usage of the CUDA occupancy calculator and occupancy-based launch configurator APIs by launching a kernel with the launch configurator and measures the utilization difference against a manually configured launch.
/0_Simple/simplePitchLinearTexture	simplePitchLinearTexture	Use of Pitch Linear Textures
/0_Simple/simplePrintf	simplePrintf	This CUDA Runtime API sample is a very basic sample that implements how to use the printf function in the device code.
/0_Simple/simpleSeparateCompilation	simpleSeparateCompilation	This sample demonstrates a CUDA 5.0 feature, the ability to create a GPU device static library and use it within another CUDA kernel. This example demonstrates how to pass in a GPU device function (from the GPU device static library) as a function pointer to be called.
/0_Simple/simpleStreams	simpleStreams	This sample uses CUDA streams to overlap kernel executions with memory copies between the host and a GPU device.
/0_Simple/simpleSurfaceWrite	simpleSurfaceWrite	Simple example that demonstrates the use of 2D surface references (Write-to-Texture).
/0_Simple/simpleTemplates	simpleTemplates	This sample is a templatized version of the template project. It also shows how to correctly templatize dynamically allocated shared memory arrays.
/0_Simple/simpleTexture	simpleTexture	Simple example that demonstrates use of Textures in CUDA.
/0_Simple/simpleTextureDrv	simpleTextureDrv	Simple example that demonstrates the use of Textures in CUDA. This sample uses the new CUDA 4.0 kernel launch Driver API.
/0_Simple/simpleVoteIntrinsics	simpleVoteIntrinsics	Simple program which demonstrates how to use the Vote (any, all) intrinsic instruction in a CUDA kernel.
/0_Simple/simpleZeroCopy	simpleZeroCopy	This sample illustrates how to use Zero MemCopy: kernels can read and write directly to pinned system memory.
/0_Simple/template	template	A trivial template project that can be used as a starting point to create new CUDA projects.
/0_Simple/UnifiedMemoryStreams	UnifiedMemoryStreams	This sample demonstrates the use of OpenMP and streams with Unified Memory on a single GPU.
/0_Simple/vectorAdd	vectorAdd	This CUDA Runtime API sample is a very basic sample that implements element by element vector addition.
/0_Simple/vectorAddDrv	vectorAddDrv	This sample is a basic implementation of element by element vector addition using the CUDA Driver API.

Utilities Samples

Path	Sample	Description
/1_Utilities/bandwidthTest	bandwidthTest	This is a simple test program to measure the memcopy bandwidth of the GPU and memcpy bandwidth across PCI-e.
/1_Utilities/deviceQuery	deviceQuery	This sample enumerates the properties of the CUDA devices present in the system.
/1_Utilities/deviceQueryDrv	deviceQueryDrv	This sample enumerates the properties of the CUDA devices present using CUDA Driver API calls.
/1_Utilities/p2pBandwidthLatencyTest	p2pBandwidthLatencyTest	This application demonstrates the CUDA Peer-To-Peer (P2P) data transfers between pairs of GPUs and computes latency and bandwidth.
/1_Utilities/UnifiedMemoryPerf	UnifiedMemoryPerf	This sample uses a matrix multiplication kernel to compare the performance of Unified Memory (with and without hints) against other types of memory such as zero-copy buffers, pageable memory, and page-locked memory, performing synchronous and asynchronous transfers on a single GPU.
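The deviceQuery sample is a convenient first check that the GPU is visible to CUDA. The following is a sketch, assuming the sample has already been built and that its binary sits in the sample directory. The helper name check_device_query is hypothetical, and the "Result = PASS" line is assumed to be what deviceQuery prints on success; verify it against the output on your own board.

```shell
#!/bin/sh
# check_device_query [path-to-binary]
# Hypothetical helper: runs deviceQuery and looks for its success line.
# Returns 0 on PASS, 1 if PASS is not reported, 2 if the binary is missing.
check_device_query() {
    bin="${1:-/usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery}"
    if [ ! -x "$bin" ]; then
        echo "deviceQuery binary not found or not built: $bin" >&2
        return 2
    fi
    # Assumption: a successful run includes the line "Result = PASS".
    if "$bin" | grep -q "Result = PASS"; then
        echo "deviceQuery reported PASS"
    else
        echo "deviceQuery did not report PASS" >&2
        return 1
    fi
}
```

Running check_device_query with no argument uses the default build location under /usr/local/cuda/samples.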

Graphics Samples

Path	Sample	Description
/2_Graphics/bindlessTexture	bindlessTexture	This example demonstrates use of cudaSurfaceObject, cudaTextureObject, and MipMap support in CUDA.
/2_Graphics/Mandelbrot	Mandelbrot	This sample uses CUDA to compute and display the Mandelbrot or Julia sets interactively. It also illustrates the use of "double single" arithmetic to improve precision when zooming a long way into the pattern.
/2_Graphics/marchingCubes	marchingCubes	This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the Thrust library to perform stream compaction.
/2_Graphics/simpleGL	simpleGL	Simple program which demonstrates interoperability between CUDA and OpenGL. The program modifies vertex positions with CUDA and uses OpenGL to render the geometry.
/2_Graphics/simpleGLES	simpleGLES	Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry.
/2_Graphics/simpleGLES_EGLOutput	simpleGLES_EGLOutput	Demonstrates data exchange between CUDA and OpenGL ES (aka Graphics interop). The program modifies vertex positions with CUDA and uses OpenGL ES to render the geometry, and shows how to render directly to the display using the EGLOutput mechanism and the DRM library.
/2_Graphics/simpleTexture3D	simpleTexture3D	Simple example that demonstrates use of 3D Textures in CUDA.
/2_Graphics/volumeFiltering	volumeFiltering	This sample demonstrates 3D Volumetric Filtering using 3D Textures and 3D Surface Writes.
/2_Graphics/volumeRender	volumeRender	This sample demonstrates basic volume rendering using 3D Textures.

Imaging Samples

Path	Sample	Description
/3_Imaging/bicubicTexture	bicubicTexture	This sample demonstrates how to efficiently implement a Bicubic B-spline interpolation filter with CUDA texture.
/3_Imaging/bilateralFilter	bilateralFilter	Bilateral filter is an edge-preserving non-linear smoothing filter that is implemented with CUDA with OpenGL rendering. It can be used in image recovery and denoising. Each pixel is weighted by considering both the spatial distance and color distance between its neighbors.
/3_Imaging/boxFilter	boxFilter	Fast image box filter using CUDA with OpenGL rendering.
/3_Imaging/convolutionFFT2D	convolutionFFT2D	This sample demonstrates how 2D convolutions with very large kernel sizes can be efficiently implemented using FFT transformations.
/3_Imaging/convolutionSeparable	convolutionSeparable	This sample implements a separable convolution filter of a 2D signal with a gaussian kernel.
/3_Imaging/convolutionTexture	convolutionTexture	Texture-based implementation of a separable 2D convolution with a gaussian kernel.
/3_Imaging/dct8x8	dct8x8	This sample demonstrates how Discrete Cosine Transform (DCT) for blocks of 8 by 8 pixels can be performed using CUDA: a naive implementation by definition and a more traditional approach used in many libraries.
/3_Imaging/dwtHaar1D	dwtHaar1D	Discrete Haar wavelet decomposition for 1D signals with a length which is a power of 2.
/3_Imaging/dxtc	dxtc	High-Quality DXT Compression using CUDA. This example shows how to implement an existing computationally-intensive CPU compression algorithm in parallel on the GPU, and obtain an order of magnitude performance improvement.
/3_Imaging/EGLStream_CUDA_CrossGPU	EGLStream_CUDA_CrossGPU	Demonstrates CUDA and EGL Streams interop, where the consumer's EGL Stream is on one GPU and the producer's is on another, and the consumer and producer are different processes.
/3_Imaging/EGLStreams_CUDA_Interop	EGLStreams_CUDA_Interop	Demonstrates data exchange between CUDA and EGL Streams.
/3_Imaging/EGLSync_CUDAEvent_Interop	EGLSync_CUDAEvent_Interop	Demonstrates interoperability between CUDA Event and EGL Sync/EGL Image, which allows GL-EGL-CUDA operations to be synchronized on the GPU itself instead of blocking the CPU for synchronization.
/3_Imaging/histogram	histogram	This sample demonstrates the efficient implementation of 64-bin and 256-bin histograms.
/3_Imaging/HSOpticalFlow	HSOpticalFlow	Variational optical flow estimation example. Uses textures for image operations. Shows how a simple PDE solver can be accelerated with CUDA.
/3_Imaging/imageDenoising	imageDenoising	This sample demonstrates two adaptive image denoising techniques: KNN and NLM, based on the computation of both geometric and color distance between texels.
/3_Imaging/postProcessGL	postProcessGL	This sample shows how to post-process an image rendered in OpenGL using CUDA.
/3_Imaging/recursiveGaussian	recursiveGaussian	This sample implements a Gaussian blur using Deriche's recursive method.
/3_Imaging/simpleCUDA2GL	simpleCUDA2GL	This sample shows how to copy a CUDA image back to OpenGL using the most efficient methods.
/3_Imaging/SobelFilter	SobelFilter	This sample implements the Sobel edge detection filter for 8-bit monochrome images.
/3_Imaging/stereoDisparity	stereoDisparity	A CUDA program that demonstrates how to compute a stereo disparity map using SIMD SAD (Sum of Absolute Difference) intrinsics.

Finance Samples

Path	Sample	Description
/4_Finance/binomialOptions	binomialOptions	This sample evaluates fair call price for a given set of European options under the binomial model.
/4_Finance/BlackScholes	BlackScholes	This sample evaluates fair call and put prices for a given set of European options by Black-Scholes formula.
/4_Finance/MonteCarloMultiGPU	MonteCarloMultiGPU	This sample evaluates fair call price for a given set of European options using the Monte Carlo approach, taking advantage of all CUDA-capable GPUs installed in the system.
/4_Finance/quasirandomGenerator	quasirandomGenerator	This sample implements Niederreiter Quasirandom Sequence Generator and Inverse Cumulative Normal Distribution functions for the generation of Standard Normal Distributions.
/4_Finance/SobolQRNG	SobolQRNG	This sample implements Sobol Quasirandom Sequence Generator.

Simulations Samples

Path	Sample	Description
/5_Simulations/fluidsGL	fluidsGL	An example of fluid simulation using CUDA and CUFFT, with OpenGL rendering.
/5_Simulations/fluidsGLES	fluidsGLES	An example of fluid simulation using CUDA and CUFFT, with OpenGLES rendering.
/5_Simulations/nbody	nbody	This sample demonstrates the efficient all-pairs simulation of a gravitational n-body simulation in CUDA.
/5_Simulations/nbody_opengles	nbody_opengles	This sample demonstrates the efficient all-pairs simulation of a gravitational n-body simulation in CUDA. Unlike the OpenGL nbody sample, there is no user interaction.
/5_Simulations/oceanFFT	oceanFFT	This sample simulates an Ocean height field using CUFFT Library and renders the result using OpenGL.
/5_Simulations/particles	particles	This sample uses CUDA to simulate and visualize a large set of particles and their physical interaction. Adding "-particles=<N>" to the command line sets the number of particles for the simulation.
/5_Simulations/smokeParticles	smokeParticles	Smoke simulation with volumetric shadows using half-angle slicing technique.

Advanced Samples

Path	Sample	Description
/6_Advanced/alignedTypes	alignedTypes	A simple test showing the huge access speed gap between aligned and misaligned structures.
/6_Advanced/cdpAdvancedQuicksort	cdpAdvancedQuicksort	This sample demonstrates an advanced quicksort implemented using CUDA Dynamic Parallelism.
/6_Advanced/cdpBezierTessellation	cdpBezierTessellation	This sample demonstrates bezier tessellation of lines implemented using CUDA Dynamic Parallelism.
/6_Advanced/cdpQuadtree	cdpQuadtree	This sample demonstrates Quad Trees implemented using CUDA Dynamic Parallelism.
/6_Advanced/concurrentKernels	concurrentKernels	This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on devices of compute capability 2.0 or higher. Devices of compute capability 1.x will run the kernels sequentially.
/6_Advanced/eigenvalues	eigenvalues	This sample demonstrates a parallel implementation of a bisection algorithm for the computation of all eigenvalues of a tridiagonal symmetric matrix of arbitrary size with CUDA.
/6_Advanced/fastWalshTransform	fastWalshTransform	Naturally (Hadamard)-ordered Fast Walsh Transform for batched vectors of arbitrary eligible lengths that are a power of two in size.
/6_Advanced/FDTD3d	FDTD3d	This sample applies a finite differences time domain progression stencil on a 3D surface.
/6_Advanced/FunctionPointers	FunctionPointers	This sample illustrates how to use function pointers and implements the Sobel Edge Detection filter for 8-bit monochrome images.
/6_Advanced/interval	interval	Interval arithmetic operators example.
/6_Advanced/lineOfSight	lineOfSight	This sample is an implementation of a simple line-of-sight algorithm: Given a height map and a ray originating at some observation point, it computes all the points along the ray that are visible from the observation point.
/6_Advanced/matrixMulDynlinkJIT	matrixMulDynlinkJIT	This sample revisits matrix multiplication using the CUDA driver API. It demonstrates how to link to CUDA driver at runtime and how to use JIT (just-in-time) compilation from PTX code.
/6_Advanced/mergeSort	mergeSort	This sample implements a merge sort.
/6_Advanced/newdelete	newdelete	This sample demonstrates dynamic global memory allocation through device C++ new and delete operators and virtual function declarations available with CUDA 4.0.
/6_Advanced/ptxjit	ptxjit	This sample uses the Driver API to just-in-time compile (JIT) a Kernel from PTX code. Additionally, this sample demonstrates the seamless interoperability capability of the CUDA Runtime and CUDA Driver API calls.
/6_Advanced/radixSortThrust	radixSortThrust	This sample demonstrates a very fast and efficient parallel radix sort that uses the Thrust library. The included RadixSort class can sort either key-value pairs (with a float or unsigned integer keys) or keys only.
/6_Advanced/reduction	reduction	A parallel sum reduction that computes the sum of a large array of values.
/6_Advanced/scalarProd	scalarProd	This sample calculates scalar products of a given set of input vector pairs.
/6_Advanced/scan	scan	This example demonstrates an efficient CUDA implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.
/6_Advanced/segmentationTreeThrust	segmentationTreeThrust	This sample demonstrates an approach to the image segmentation trees construction. This method is based on Boruvka's MST algorithm.
/6_Advanced/shfl_scan	shfl_scan	This example demonstrates how to use the shuffle intrinsic __shfl_up to perform a scan operation across a thread block.
/6_Advanced/simpleHyperQ	simpleHyperQ	This sample demonstrates the use of CUDA streams for concurrent execution of several kernels on devices that provide HyperQ (SM 3.5). Devices without HyperQ (SM 2.0 and SM 3.0) will run a maximum of two kernels concurrently.
/6_Advanced/sortingNetworks	sortingNetworks	This sample implements bitonic sort and odd-even merge sort (also known as Batcher's sort), algorithms belonging to the class of sorting networks. While generally less efficient on large sequences than algorithms with better asymptotic complexity (e.g. merge sort or radix sort), they may be the algorithms of choice for sorting batches of short to mid-sized (key, value) array pairs.
/6_Advanced/threadFenceReduction	threadFenceReduction	This sample shows how to perform a reduction operation on an array of values using the thread Fence intrinsic to produce a single value in a single kernel.
/6_Advanced/threadMigration	threadMigration	Simple program illustrating how to use the CUDA Context Management API, along with the new CUDA 4.0 parameter passing and CUDA launch API. CUDA contexts can be created separately and attached independently to different threads.
/6_Advanced/transpose	transpose	This sample demonstrates Matrix Transpose.
/6_Advanced/warpAggregatedAtomicsCG	warpAggregatedAtomicsCG	This sample demonstrates how to use Cooperative Groups (CG) to perform warp aggregated atomics, a useful technique to improve performance when many threads atomically add to a single counter.

CUDALibraries Samples

Path	Sample	Description
/7_CUDALibraries/batchCUBLAS	batchCUBLAS	A CUDA Sample that demonstrates how to use batched CUBLAS API calls to improve overall performance.
/7_CUDALibraries/BiCGStab	BiCGStab	A CUDA Sample that demonstrates Bi-Conjugate Gradient Stabilized (BiCGStab) iterative method for nonsymmetric and symmetric positive definite (s.p.d.) linear systems using CUSPARSE and CUBLAS.
/7_CUDALibraries/boundSegmentsNPP	boundSegmentsNPP	An NPP CUDA Sample that demonstrates using nppiLabelMarkers to generate connected region segment labels in an 8-bit grayscale image then compressing the sparse list of generated labels into the minimum number of uniquely labeled regions in the image using nppiCompressMarkerLabels. Finally, a boundary is added surrounding each segmented region in the image using nppiBoundSegments.
/7_CUDALibraries/boxFilterNPP	boxFilterNPP	An NPP CUDA Sample that demonstrates how to use the NPP FilterBox function to perform a Box Filter.
/7_CUDALibraries/cannyEdgeDetectorNPP	cannyEdgeDetectorNPP	An NPP CUDA Sample that demonstrates the recommended parameters to use with the nppiFilterCannyBorder_8u_C1R Canny Edge Detection image filter function.
/7_CUDALibraries/conjugateGradient	conjugateGradient	This sample implements a conjugate gradient solver on GPU using CUBLAS and CUSPARSE library.
/7_CUDALibraries/cuSolverDn_LinearSolver	cuSolverDn_LinearSolver	A CUDA Sample that demonstrates cuSolverDN's LU, QR, and Cholesky factorization.
/7_CUDALibraries/cuSolverRf	cuSolverRf	A CUDA Sample that demonstrates cuSolver's refactorization library - CUSOLVERRF.
/7_CUDALibraries/cuSolverSp_LinearSolver	cuSolverSp_LinearSolver	A CUDA Sample that demonstrates cuSolverSP's LU, QR, and Cholesky factorization.
/7_CUDALibraries/cuSolverSp_LowlevelCholesky	cuSolverSp_LowlevelCholesky	A CUDA Sample that demonstrates Cholesky factorization using cuSolverSP's low-level APIs.
/7_CUDALibraries/cuSolverSp_LowlevelQR	cuSolverSp_LowlevelQR	A CUDA Sample that demonstrates QR factorization using cuSolverSP's low-level APIs.
/7_CUDALibraries/FilterBorderControlNPP	FilterBorderControlNPP	This NPP CUDA Sample demonstrates how any border version of an NPP filtering function can be used in the most common mode (with border control enabled), can be used to duplicate the results of the equivalent non-border version of the NPP function, and can be used to enable and disable border control on various source image edges depending on what portion of the source image is being used as input.
/7_CUDALibraries/freeImageInteropNPP	freeImageInteropNPP	A simple CUDA Sample that demonstrates how to use the FreeImage library with NPP.
/7_CUDALibraries/histEqualizationNPP	histEqualizationNPP	This CUDA Sample demonstrates how to use NPP for histogram equalization for image data.
/7_CUDALibraries/jpegNPP	jpegNPP	This sample demonstrates a simple image processing pipeline. First, a JPEG file is Huffman decoded and inverse DCT transformed and dequantized. Then the different planes are resized. Finally, the resized image is quantized, forward DCT transformed and Huffman encoded.
/7_CUDALibraries/MC_EstimatePiInlineP	MC_EstimatePiInlineP	This sample uses Monte Carlo simulation for Estimation of Pi (using inline PRNG). This sample also uses the NVIDIA CURAND library.
/7_CUDALibraries/MC_EstimatePiInlineQ	MC_EstimatePiInlineQ	This sample uses Monte Carlo simulation for Estimation of Pi (using inline QRNG). This sample also uses the NVIDIA CURAND library.
/7_CUDALibraries/MC_EstimatePiP	MC_EstimatePiP	This sample uses Monte Carlo simulation for Estimation of Pi (using batch PRNG). This sample also uses the NVIDIA CURAND library.
/7_CUDALibraries/MC_EstimatePiQ	MC_EstimatePiQ	This sample uses Monte Carlo simulation for Estimation of Pi (using batch QRNG). This sample also uses the NVIDIA CURAND library.
/7_CUDALibraries/MC_SingleAsianOptionP	MC_SingleAsianOptionP	This sample uses Monte Carlo to simulate Single Asian Options using the NVIDIA CURAND library.
/7_CUDALibraries/MersenneTwisterGP11213	MersenneTwisterGP11213	This sample demonstrates the Mersenne Twister random number generator GP11213 in cuRAND.
/7_CUDALibraries/randomFog	randomFog	This sample illustrates pseudo- and quasi- random numbers produced by CURAND.
/7_CUDALibraries/simpleCUBLAS	simpleCUBLAS	Example of using CUBLAS through the new CUBLAS API available in CUDA 4.0.
/7_CUDALibraries/simpleCUBLASXT	simpleCUBLASXT	Example of using CUBLAS-XT library.
/7_CUDALibraries/simpleCUFFT	simpleCUFFT	Example of using CUFFT. In this example, CUFFT is used to compute the 1D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain.
/7_CUDALibraries/simpleCUFFT_2d_MGPU	simpleCUFFT_2d_MGPU	Example of using CUFFT. In this example, CUFFT is used to compute the 2D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain on multiple GPUs.
/7_CUDALibraries/simpleCUFFT_MGPU	simpleCUFFT_MGPU	Example of using CUFFT. In this example, CUFFT is used to compute the 1D-convolution of some signal with some filter by transforming both into the frequency domain, multiplying them together, and transforming the signal back to the time domain on multiple GPUs.

For more information about CUDA, go to: Xavier/JetPack_4.1/Components/Cuda
