CUDA: When, and for what purpose, is a standalone \ (backslash) character used?

Background / what I want to ask

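Development environment: Windows 8.1, Visual Studio 2015, CUDA 9.2
There is a part of the program sumArraysOnGPU-small-case.cu, on pp. 47-50 of 『CUDA C プロフェッショナルプログラミング』 (Professional CUDA C Programming), that I do not understand, so I am asking about it here. The sample code can be downloaded from https://book.impress.co.jp/books/1115101001 even by people who have not bought the book.
After extracting CUDA_C_J_Samples.zip, sumArraysOnGPU-small-case.cu is located in CUDA_C_J_Samples/02.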

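Now to the main question. Every line of the macro definition ends with a \ character. What meaning (or function) do these \ characters have?
I have never written a program that uses a \ on its own like this, so I do not really understand when, and for what purpose, a standalone \ is used.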

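To compare against the case without the \ characters, I removed all of them and built the project, which produced errors (listed in the error message section below).
Many of these errors concern identifiers that are reported as undefined.
From this I vaguely gather that the macro used here has some kind of error-ignoring function. If that is so, why does the sample code (with the \ characters) run correctly even though it contains identifiers that are supposedly undefined?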
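
As a point of reference, here is a minimal sketch of what a trailing \ does in C and C++. A backslash immediately followed by a newline splices the next physical line onto the current one before preprocessing directives are processed, so a #define spread over several lines is read as a single logical line. The macro REPORT_IF_NONZERO and the functions alwaysOk, alwaysFails, and countFailures are hypothetical names made up for this illustration. They are not part of the book's sample and use no CUDA calls, so the snippet builds as plain host C++.

```cpp
#include <cstdio>

// Hypothetical stand-in for the CHECK macro (not from the book's sample).
// Every line except the last ends with a backslash, so the preprocessor
// sees the whole brace block as the body of one #define.
#define REPORT_IF_NONZERO(call, failures)                           \
{                                                                   \
    const int status = (call);                                      \
    if (status != 0)                                                \
    {                                                               \
        std::printf("Error: %s:%d, code:%d\n", __FILE__, __LINE__,  \
                    status);                                        \
        ++(failures);                                               \
    }                                                               \
}

// Hypothetical functions that return a status code (0 means success).
static int alwaysOk() { return 0; }
static int alwaysFails() { return 7; }

// Runs both calls through the macro and returns how many reported a failure.
int countFailures()
{
    int failures = 0;
    REPORT_IF_NONZERO(alwaysOk(), failures);
    REPORT_IF_NONZERO(alwaysFails(), failures);
    return failures;
}
```

Calling countFailures() prints one error line (for alwaysFails()) and returns 1. If a backslash is removed, the macro definition ends on that line and the remaining lines are compiled as ordinary code outside the macro.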

Problem / error messages

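These are the errors that appeared after I removed all of the \ characters. (I wanted to hide my user name, so I replaced it with "ユーザ名" (user name); the real user name consists only of half-width ASCII letters.)
Of course, when I build the original sample code without removing the \ characters, no errors occur and the program runs correctly.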

Severity  Code  Description  Project  File  Line
Error (active)  expected a declaration  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  12
Error (active)  identifier "abs" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  31
Error (active)  identifier "srand" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  51
Error (active)  identifier "time" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  51
Error (active)  identifier "rand" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  55
Error (active)  identifier "blockIdx" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  73
Error (active)  identifier "blockDim" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  73
Error (active)  identifier "threadIdx" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  73
Error (active)  identifier "malloc" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  95
Error (active)  identifier "memset" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  104
Error (active)  expected an expression  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  122
Error (active)  identifier "free" is undefined  sumArraysOnGPU-small-case  c:\Users\ユーザ名\Desktop\180919\sumArraysOnGPU-small-case\sumArraysOnGPU-small-case\kernel.cu  140
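
The first entry in the list is reported at line 12. If the listing starts at line 1 of kernel.cu, that is the line holding the opening { of the CHECK body. A possible explanation, offered as a sketch rather than a diagnosis of this project: without the trailing \, the #define CHECK(call) directive ends at the end of its own line, the macro expands to nothing, and the brace block that follows is parsed as ordinary code at file scope. The names STR, XSTR, BODY_LOST, BODY_KEPT, lostLength, and keptLength below are hypothetical and chosen only for this illustration.

```cpp
#include <cstddef>
#include <cstring>

// Helpers that turn a macro's expansion into a string literal so it can be
// inspected at run time. XSTR expands its argument first, STR stringifies it.
#define STR(x) #x
#define XSTR(x) STR(x)

// No trailing backslash: the directive ends at the end of this line,
// so this hypothetical macro has an empty replacement list.
#define BODY_LOST(call)

// Trailing backslash: the next physical line is part of the definition.
#define BODY_KEPT(call) \
    { call; }

// Length of the text each macro expands to. The argument f() is only
// stringified, never called, so f does not need to exist.
std::size_t lostLength() { return std::strlen(XSTR(BODY_LOST(f()))); }
std::size_t keptLength() { return std::strlen(XSTR(BODY_KEPT(f()))); }
```

Here lostLength() returns 0 because BODY_LOST expands to nothing, while keptLength() is non-zero because the brace block is still part of BODY_KEPT.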

Relevant source code (the sample as-is, with the \ characters)

#include <cuda_runtime.h>
#include <stdio.h>

/*
* This example demonstrates a simple vector sum on the GPU and on the host.
* sumArraysOnGPU splits the work of the vector sum across CUDA threads on the
* GPU. Only a single thread block is used in this small case, for simplicity.
* sumArraysOnHost sequentially iterates through vector elements on the host.
*/

#define CHECK(call)                                                  \
{                                                                    \
    const cudaError_t error = call;                                  \
    if (error != cudaSuccess)                                        \
    {                                                                \
        printf("Error: %s:%d, ", __FILE__, __LINE__);                \
        printf("code:%d, reason: %s\n", error,                       \
                cudaGetErrorString(error));                          \
        exit(1);                                                     \
    }                                                                \
}


void checkResult(float *hostRef, float *gpuRef, const int N)
{
	double epsilon = 1.0E-8;
	bool match = 1;

	for (int i = 0; i < N; i++)
	{
		if (abs(hostRef[i] - gpuRef[i]) > epsilon)
		{
			match = 0;
			printf("Arrays do not match!\n");
			printf("host %5.2f gpu %5.2f at current %d\n", hostRef[i],
				gpuRef[i], i);
			break;
		}
	}

	if (match) printf("Arrays match.\n\n");

	return;
}


void initialData(float *ip, int size)
{
	// generate different seed for random number
	time_t t;
	srand((unsigned)time(&t));

	for (int i = 0; i < size; i++)
	{
		ip[i] = (float)(rand() & 0xFF) / 10.0f;
	}

	return;
}


void sumArraysOnHost(float *A, float *B, float *C, const int N)
{
	for (int idx = 0; idx < N; idx++)
		C[idx] = A[idx] + B[idx];
}

__global__ void sumArraysOnGPU(float *A, float *B, float *C, const int N)
{
	/*
	int i = threadIdx.x;
	*/
	int i = blockIdx.x * blockDim.x + threadIdx.x;

	C[i] = A[i] + B[i];
}


int main(int argc, char **argv)
{
	printf("%s Starting...\n", argv[0]);

	// set up device
	int dev = 0;
	cudaSetDevice(dev);

	// set up data size of vectors
	int nElem = 32;
	printf("Vector size %d\n", nElem);

	// malloc host memory
	size_t nBytes = nElem * sizeof(float);

	float *h_A, *h_B, *hostRef, *gpuRef;
	h_A = (float *)malloc(nBytes);
	h_B = (float *)malloc(nBytes);
	hostRef = (float *)malloc(nBytes);
	gpuRef = (float *)malloc(nBytes);

	// initialize data at host side
	initialData(h_A, nElem);
	initialData(h_B, nElem);

	memset(hostRef, 0, nBytes);
	memset(gpuRef, 0, nBytes);

	// malloc device global memory
	float *d_A, *d_B, *d_C;
	cudaMalloc((float**)&d_A, nBytes);
	cudaMalloc((float**)&d_B, nBytes);
	cudaMalloc((float**)&d_C, nBytes);

	// transfer data from host to device
	cudaMemcpy(d_A, h_A, nBytes, cudaMemcpyHostToDevice);
	cudaMemcpy(d_B, h_B, nBytes, cudaMemcpyHostToDevice);
	cudaMemcpy(d_C, gpuRef, nBytes, cudaMemcpyHostToDevice);

	// invoke kernel at host side
	dim3 block(nElem);
	dim3 grid(1);

	sumArraysOnGPU<<<grid, block>>>(d_A, d_B, d_C, nElem);
	printf("Execution configure <<<%d, %d>>>\n", grid.x, block.x);

	// copy kernel result back to host side
	cudaMemcpy(gpuRef, d_C, nBytes, cudaMemcpyDeviceToHost);

	// add vector at host side for result checks
	sumArraysOnHost(h_A, h_B, hostRef, nElem);

	// check device results
	checkResult(hostRef, gpuRef, nElem);

	// free device global memory
	CHECK(cudaFree(d_A));
	CHECK(cudaFree(d_B));
	CHECK(cudaFree(d_C));

	// free host memory
	free(h_A);
	free(h_B);
	free(hostRef);
	free(gpuRef);

	cudaDeviceReset();
	return(0);
}