Converting images to anime style with machine learning

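I wanted to convert images to anime style with machine learning, so I tried this repository:

https://github.com/Yijunmaverick/CartoonGAN-Test-Pytorch-Torch

Fortunately, the test command
python test.py --input_dir YourImgDir --style Hosoda --gpu 0
ran without problems, and I was able to convert images to anime style.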
Next, I would like to train a model myself so that I can produce my own anime style.
On that point, the repository above says:
The training code should be similar to the popular GAN-based image-translation frameworks and thus is not included here.
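My understanding of GAN-based training is that you prepare pairs of images, one before and one after the anime conversion, and train on those pairs. Is that what the note means?
If so, does that mean that to build the Hosoda model, the authors prepared before and after images in the Hosoda style?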
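Searching GitHub turns up CartoonGAN implementations that can be trained, but I don't understand how the input images should be prepared for them.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_CartoonGAN_Generative_Adversarial_CVPR_2018_paper.pdf
Reading this paper, though, it looks as if the two image sets (A and B) were unrelated to each other.
For example, this passage: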
The training data contains real-world photos and cartoon images, and the test data only includes real-world photos. All the training images are resized and cropped to 256×256. Photos. 6,153 photos are downloaded from Flickr, in which 5,402 photos are for training and others for testing. Cartoon images. Different artists have different styles when creating cartoon images of real-world scenes. To obtain a set of cartoon images with the same style, we use the key frames of cartoon films drawn and directed by the same artist as the training data. In our experiments, 4,573 and 4,212 cartoon images from several short cartoon videos are used for training the Makoto Shinkai and Mamoru Hosoda style models, and 3,617 and 2,302 images from the cartoon film “Spirited Away” and “Paprika” are used for training the Miyazaki Hayao and “Paprika” style models.
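To check my own reading, here is a minimal sketch of how I picture that data preparation: two independent image sets, each image resized and cropped to 256×256, and batches drawn from each set with no pairing between them. The center crop, the file names, and both helper functions are my own assumptions for illustration; the paper only says the images are "resized and cropped" and does not describe the sampling.

```python
import random


def center_crop_box(width, height, size=256):
    """Geometry for scaling the short side to `size`, then center-cropping.

    Returns (new_width, new_height, (left, top, right, bottom)).
    Center cropping is an assumption; the paper does not say how it crops.
    """
    scale = size / min(width, height)
    new_w = max(size, round(width * scale))
    new_h = max(size, round(height * scale))
    left = (new_w - size) // 2
    top = (new_h - size) // 2
    return new_w, new_h, (left, top, left + size, top + size)


def unpaired_batches(photos, cartoons, batch_size, steps, seed=0):
    """Yield (photo_batch, cartoon_batch), each sampled independently.

    Nothing links a photo to a particular cartoon frame: the two sets
    only need to share a domain (real photos vs. one artist's frames).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        yield rng.sample(photos, batch_size), rng.sample(cartoons, batch_size)


if __name__ == "__main__":
    # Hypothetical file names; the counts follow the passage quoted above.
    photos = [f"flickr_{i:05d}.jpg" for i in range(5402)]
    cartoons = [f"hosoda_{i:05d}.jpg" for i in range(4212)]

    print(center_crop_box(512, 256))
    for photo_batch, cartoon_batch in unpaired_batches(photos, cartoons, 4, 2):
        print(photo_batch, cartoon_batch)
```

If this reading is right, no "before" image ever has a matching "after" image, and that is exactly the part I am unsure about.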
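However I read it, it says they simply gathered a large number of unrelated images at 256×256.
Does that really work?
I don't have an intuition for why it would.
I would be grateful for any explanation.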