Pengguna:KhalilullahAlFaath/Jaringan saraf konvolusional
{{under construction}}
{{short description|artificial neural network}}
{{Other uses|CNN (disambiguasi)}}
{{Pemelajaran mesin|Jaringan saraf konvolusional}}
A '''convolutional neural network''' ('''CNN'''; Indonesian: ''jaringan saraf konvolusional'') is a [[regularisasi (matematika)|regularized]] feed-forward neural network that learns [[ekstraksi fitur|feature extraction]] by itself by optimizing its [[filter (pemrosesan sinyal)|filters]] (or kernels). Earlier neural networks often suffered from vanishing or exploding gradients during [[Algoritma perambatan mundur|backpropagation]]. To prevent this, CNNs use regularized weights over fewer connections.<ref name="auto3">{{cite book |last1=Venkatesan |first1=Ragav |url=https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient |title=Convolutional Neural Networks in Visual Computing: A Concise Guide |last2=Li |first2=Baoxin |date=2017-10-23 |publisher=CRC Press |isbn=978-1-351-65032-8 |language=en |access-date=2020-12-13 |archive-date=2023-10-16 |archive-url=https://web.archive.org/web/20231016190415/https://books.google.com/books?id=bAM7DwAAQBAJ&q=vanishing+gradient#v=snippet&q=vanishing%20gradient&f=false |url-status=live }}</ref><ref name="auto2">{{cite book |last1=Balas |first1=Valentina E. |url=https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient |title=Recent Trends and Advances in Artificial Intelligence and Internet of Things |last2=Kumar |first2=Raghvendra |last3=Srivastava |first3=Rajshree |date=2019-11-19 |publisher=Springer Nature |isbn=978-3-030-32644-9 |language=en |access-date=2020-12-13 |archive-date=2023-10-16 |archive-url=https://web.archive.org/web/20231016190414/https://books.google.com/books?id=XRS_DwAAQBAJ&q=exploding+gradient#v=snippet&q=exploding%20gradient&f=false |url-status=live }}</ref> For example, ''each'' neuron in a fully connected layer needs 10,000 weights to process an image of 100 × 100 pixels.
However, by applying cascaded convolution (or cross-correlation) kernels,<ref>{{Cite journal|last1=Zhang|first1=Yingjie|last2=Soon|first2=Hong Geok|last3=Ye|first3=Dongsen|last4=Fuh|first4=Jerry Ying Hsi|last5=Zhu|first5=Kunpeng|date=September 2020|title=Powder-Bed Fusion Process Monitoring by Machine Vision With Hybrid Convolutional Neural Networks|url=https://ieeexplore.ieee.org/document/8913613|journal=IEEE Transactions on Industrial Informatics|volume=16|issue=9|pages=5769–5779|doi=10.1109/TII.2019.2956078|s2cid=213010088|issn=1941-0050|access-date=2023-08-12|archive-date=2023-07-31|archive-url=https://web.archive.org/web/20230731120013/https://ieeexplore.ieee.org/document/8913613/|url-status=live}}</ref><ref>{{Cite journal|last1=Chervyakov|first1=N.I.|last2=Lyakhov|first2=P.A.|last3=Deryabin|first3=M.A.|last4=Nagornov|first4=N.N.|last5=Valueva|first5=M.V.|last6=Valuev|first6=G.V.|date=September 2020|title=Residue Number System-Based Solution for Reducing the Hardware Cost of a Convolutional Neural Network|url=https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X|journal=Neurocomputing|language=en|volume=407|pages=439–453|doi=10.1016/j.neucom.2020.04.018|s2cid=219470398|quote=Convolutional neural networks represent deep learning architectures that are currently used in a wide range of applications, including computer vision, speech recognition, malware dedection, time series analysis in finance, and many others.|access-date=2023-08-12|archive-date=2023-06-29|archive-url=https://web.archive.org/web/20230629155646/https://linkinghub.elsevier.com/retrieve/pii/S092523122030583X|url-status=live}}</ref> only 25 shared weights are needed to process tiles of size 5 × 5.<ref name="auto1">{{cite book |title=Guide to convolutional neural networks : a practical application to traffic-sign detection and classification |last=Habibi |first=Aghdam, Hamed |others=Heravi, Elnaz Jahani |isbn=9783319575490 |location=Cham, Switzerland |oclc=987790957 
|date=2017-05-30}}</ref><ref>{{Cite journal|last=Atlas, Homma, and Marks|title=An Artificial Neural Network for Spatio-Temporal Bipolar Patterns: Application to Phoneme Classification|url=https://papers.nips.cc/paper/1987/file/98f13708210194c475687be6106a3b84-Paper.pdf |archive-url=https://web.archive.org/web/20210414091306/https://papers.nips.cc/paper/1987/file/98f13708210194c475687be6106a3b84-Paper.pdf |archive-date=2021-04-14 |url-status=live|journal=Neural Information Processing Systems (NIPS 1987)|volume=1}}</ref> Higher-level features are extracted from wider context windows than lower-level features.
CNNs are applied in:
* [[visi komputer|image and video recognition]],<ref name="Valueva Nagornov Lyakhov Valuev 2020 pp. 232–243">{{cite journal |last1=Valueva |first1=M.V. |last2=Nagornov |first2=N.N. |last3=Lyakhov |first3=P.A. |last4=Valuev |first4=G.V. |last5=Chervyakov |first5=N.I. |title=Application of the residue number system to reduce hardware costs of the convolutional neural network implementation |journal=Mathematics and Computers in Simulation |publisher=Elsevier BV |volume=177 |year=2020 |issn=0378-4754 |doi=10.1016/j.matcom.2020.04.031 |pages=232–243 |s2cid=218955622 |quote=Convolutional neural networks are a promising tool for solving the problem of pattern recognition.}}</ref>
* [[sistem rekomendasi|recommender systems]],<ref>{{cite book |url=https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf |title=Deep content-based music recommendation |last1=van den Oord |first1=Aaron |last2=Dieleman |first2=Sander |last3=Schrauwen |first3=Benjamin |date=2013-01-01 |publisher=Curran Associates, Inc. |editor-last=Burges |editor-first=C. J. C. |pages=2643–2651 |editor-last2=Bottou |editor-first2=L. |editor-last3=Welling |editor-first3=M. |editor-last4=Ghahramani |editor-first4=Z. |editor-last5=Weinberger |editor-first5=K. Q. |access-date=2022-03-31 |archive-date=2022-03-07 |archive-url=https://web.archive.org/web/20220307172303/https://proceedings.neurips.cc/paper/2013/file/b3ba8f1bee1238a2f37603d90b58898d-Paper.pdf |url-status=live }}</ref>
* [[klasifikasi citra|image classification]],
* [[segmentasi citra|image segmentation]],
* [[komputasi citra medis|medical image analysis]],
* [[pemrosesan bahasa alami|natural language processing]],<ref>{{cite book |last1=Collobert |first1=Ronan |last2=Weston |first2=Jason |title=Proceedings of the 25th international conference on Machine learning - ICML '08 |chapter=A unified architecture for natural language processing |date=2008-01-01 |location=New York, NY, USA |publisher=ACM |pages=160–167 |doi=10.1145/1390156.1390177 |isbn=978-1-60558-205-4 |s2cid=2617020}}</ref>
* [[antarmuka otak-komputer|brain–computer interfaces]],<ref>{{cite book |last1=Avilov |first1=Oleksii |last2=Rimbert |first2=Sebastien |last3=Popov |first3=Anton |last4=Bougrain |first4=Laurent |title=2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC) |chapter=Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals |date=July 2020 |chapter-url=https://ieeexplore.ieee.org/document/9176228 |volume=2020 |location=Montreal, QC, Canada |publisher=IEEE |pages=142–145 |doi=10.1109/EMBC44109.2020.9176228 |pmid=33017950 |isbn=978-1-7281-1990-8 |s2cid=221386616 |url=https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf |access-date=2023-07-21 |archive-date=2022-05-19 |archive-url=https://web.archive.org/web/20220519135428/https://hal.inria.fr/hal-02920320/file/Avilov_EMBC2020.pdf |url-status=live }}</ref> and
* financial [[deret waktu|time series]].<ref name="Tsantekidis 7–12">{{cite book |last1=Tsantekidis |first1=Avraam |last2=Passalis |first2=Nikolaos |last3=Tefas |first3=Anastasios |last4=Kanniainen |first4=Juho |last5=Gabbouj |first5=Moncef |last6=Iosifidis |first6=Alexandros |title=2017 IEEE 19th Conference on Business Informatics (CBI) |chapter=Forecasting Stock Prices from the Limit Order Book Using Convolutional Neural Networks |date=July 2017 |location=Thessaloniki, Greece |publisher=IEEE |pages=7–12 |doi=10.1109/CBI.2017.23 |isbn=978-1-5386-3035-8 |s2cid=4950757}}</ref>
CNNs are also known as '''shift invariant''' or '''space invariant artificial neural networks''' ('''SIANN'''), based on the shared-weight architecture of the [[Konvolusi|convolution]] kernels (filters), which slide along the input features and produce translation-[[Peta ekuivariasi|equivariant]] responses known as feature maps. Despite the name, most CNNs are not [[invarian translasi|translation invariant]], because of the downsampling operations they apply to the input.
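The equivariance property described above can be illustrated with a minimal sketch in plain Python. The helper names, the signal, and the kernel are hypothetical choices made only for this illustration; the point is that shifting the input shifts the feature map by the same amount, apart from border effects.

```python
def cross_correlate_1d(signal, kernel):
    """Valid 1D cross-correlation: slide the shared kernel over the signal."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def shift_right(signal, steps):
    """Shift a signal right by `steps`, padding with zeros on the left."""
    return [0] * steps + signal[:len(signal) - steps]


kernel = [1, 0, -1]                    # hypothetical filter, chosen for illustration
signal = [0, 0, 1, 2, 3, 0, 0, 0]      # hypothetical input features

out = cross_correlate_1d(signal, kernel)
out_shifted = cross_correlate_1d(shift_right(signal, 2), kernel)

# Away from the borders, the response to the shifted input is the
# shifted response to the original input.
print(out)          # [-1, -2, -2, 2, 3, 0]
print(out_shifted)  # [0, 0, -1, -2, -2, 2]
```

A strided downsampling step placed after this operation would in general break the correspondence for shifts that are not multiples of the stride, which is one way to see why the full network is not translation invariant.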
[[Jaringan saraf umpan maju|Feed-forward neural networks]] are usually fully connected networks: each neuron in one [[Lapisan (pemelajaran dalam)|layer]] is connected to all neurons in the next layer. This "full connectivity" makes such networks prone to [[overfitting|overfitting]]. Typical ways of regularizing a network, or preventing overfitting, include penalizing parameters during training (such as weight decay) or trimming connectivity (such as skip connections and dropout). In addition, robust datasets increase the probability that a CNN learns the generalized principles that characterize a dataset rather than biases of a dataset that does not represent the whole population.<ref>{{Cite journal |last=Kurtzman |first=Thomas |date=August 20, 2019 |title=Hidden bias in the DUD-E dataset leads to misleading performance of deep learning in structure-based virtual screening |journal=PLOS ONE|volume=14 |issue=8 |pages=e0220113 |doi=10.1371/journal.pone.0220113 |pmid=31430292 |pmc=6701836 |bibcode=2019PLoSO..1420113C |doi-access=free }}</ref>
Convolutional networks were [[biologi matematika dan teori|inspired]] by [[biologis|biological]] processes<ref name=fukuneoscholar/><ref name="hubelwiesel1968"/><ref name="intro"/><ref name="robust face detection">{{cite journal |last=Matusugu |first=Masakazu |year=2003 |title=Subject independent facial expression recognition with robust face detection using a convolutional neural network |url=http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf |journal=Neural Networks |volume=16 |issue=5 |pages=555–559 |doi=10.1016/S0893-6080(03)00115-1 |pmid=12850007 |author2=Katsuhiko Mori |author3=Yusuke Mitari |author4=Yuji Kaneda |access-date=17 November 2013 |archive-date=13 December 2013 |archive-url=https://web.archive.org/web/20131213022740/http://www.iro.umontreal.ca/~pift6080/H09/documents/papers/sparse/matsugo_etal_face_expression_conv_nnet.pdf |url-status=live }}</ref> in that the connectivity pattern between [[neuron tiruan|neurons]] resembles the organization of the animal [[korteks visual|visual cortex]]. Each cortical neuron responds to stimuli only in a restricted region of the [[bidang visual|visual field]] known as the [[bidang reseptif|receptive field]]. The receptive fields of different neurons partially overlap, so that together they cover the entire visual field.
CNNs use relatively little pre-processing compared to other [[klasifikasi citra|image classification algorithms]]. This means that the network learns to optimize its [[filter (pemrosesan sinyal)|filters]] (or kernels) automatically, whereas in traditional algorithms these filters have to be [[rekayasa fitur|hand-engineered]].
This independence from prior knowledge and manual feature extraction is a major advantage of CNNs.
{{TOC limit|3}}
== Architecture ==
[[File:Comparison image neural networks.svg|thumb|480px|Comparison of the convolution, pooling, and dense layers of [[LeNet]] and [[AlexNet]]<br>(The AlexNet input image size should be 227×227×3 rather than 224×224×3 for the arithmetic to come out right. The original publication states a different number, but Andrej Karpathy, head of computer vision at Tesla, said that the input size should be 227×227×3 (he said that Alex did not explain why he used 224×224×3). The next convolution should be 11×11 with stride 4, giving 55×55×96 (not 54×54×96). As a worked example: [(input width 227 - kernel width 11) / stride 4] + 1 = [(227 - 11) / 4] + 1 = 55. Because the kernel output has the same height as width, its area is 55×55.)]]
{{Main|Lapisan (pemelajaran dalam)}}
A CNN consists of an input layer, [[jaringan saraf tiruan#organisasi|hidden layers]], and an output layer. The hidden layers include one or more layers that perform convolutions. Such a layer typically computes the [[Produk dot|dot product]] of the convolution kernel with the layer's input matrix. This product is usually the [[produk titik frobenius|Frobenius inner product]], and the activation function is commonly [[rectifier (jaringan saraf)|ReLU]]. As the convolution kernel slides along the input matrix of the layer, the convolution operation generates a feature map, which in turn serves as input to the next layer. The convolutional layers are followed by other layers such as pooling layers, fully connected layers, and normalization layers. Note the similarity between a CNN and a [[matched filter]].<ref>Convolutional Neural Networks Demystified: A Matched Filtering Perspective Based Tutorial https://arxiv.org/abs/2108.11663v3</ref>
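The sliding-kernel computation described above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions: a single-channel input, stride 1, no padding, and a hypothetical 2 × 2 kernel; the function names are not taken from any library.

```python
def frobenius_inner(a, b):
    """Frobenius inner product: sum of element-wise products of two matrices."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def conv2d_relu(image, kernel, bias=0.0):
    """Slide the kernel over the image (stride 1, no padding) and apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Input patch currently covered by the kernel.
            patch = [r[j:j + kw] for r in image[i:i + kh]]
            row.append(max(0.0, frobenius_inner(patch, kernel) + bias))
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 0, 1],
    [0, 1, 3, 1],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [[1, 0], [0, -1]]  # hypothetical kernel, chosen for illustration

feature_map = conv2d_relu(image, kernel)
print(feature_map)  # [[0.0, 0.0, 0.0], [0.0, 1, 3], [2, 0.0, 0.0]]
```

Like most deep learning frameworks, the sketch computes a cross-correlation (the kernel is not flipped), which is the operation the lead section also refers to.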
=== Convolutional layers ===
The input to a CNN is a [[Tensor (penelajaran mesin)|tensor]] with shape:
(number of inputs) × (input height) × (input width) × (input [[saluran (citra digital)|channels]])
After passing through a convolutional layer, the image is abstracted into a feature map, also called an activation map, with shape:
(number of inputs) × (feature map height) × (feature map width) × (feature map [[saluran (citra digital)|channels]]).
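As a rough sketch of how the two shapes relate, the helper below applies the output-size rule used in the AlexNet figure caption, [(input width - kernel width) / stride] + 1, to both spatial dimensions. The function name and the optional `padding` argument are assumptions added for illustration; square kernels are assumed.

```python
def conv_output_shape(input_shape, kernel_size, num_filters, stride=1, padding=0):
    """Return (N, H_out, W_out, C_out) for an (N, H, W, C) input tensor."""
    n, height, width, _channels = input_shape
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    # Each filter produces one channel of the feature map.
    return (n, out_h, out_w, num_filters)


# First AlexNet convolution from the figure caption: 96 kernels of 11x11, stride 4.
print(conv_output_shape((1, 227, 227, 3), 11, 96, stride=4))  # (1, 55, 55, 96)
print(conv_output_shape((1, 224, 224, 3), 11, 96, stride=4))  # (1, 54, 54, 96)
```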
Convolutional layers convolve the input and pass the result to the next layer. This is similar to the response of a neuron in the visual cortex to a specific stimulus.<ref name="deeplearning">{{cite web |title=Convolutional Neural Networks (LeNet) – DeepLearning 0.1 documentation |url=http://deeplearning.net/tutorial/lenet.html |work=DeepLearning 0.1 |publisher=LISA Lab |access-date=31 August 2013 |archive-date=28 December 2017 |archive-url=https://web.archive.org/web/20171228091645/http://deeplearning.net/tutorial/lenet.html |url-status=dead }}</ref> Each convolutional neuron processes data only for its [[bidang reseptif|receptive field]].
[[File:1D Convolutional Neural Network feed forward example.png|thumb|'''1D Convolutional Neural Network feed forward example''']]
Although [[perseptron multi-lapisan|fully connected feed-forward neural networks]] can be used to learn features and classify data, this architecture is generally impractical for larger inputs such as high-resolution images, which would require a very large number of neurons because each pixel is an input feature. A fully connected layer for an image of size 100 × 100 has 10,000 weights for ''each'' neuron in the second layer. Convolution reduces the number of free parameters, allowing the network to be deeper.<ref name="auto1" /> For example, a 5 × 5 tiling region in which every tile uses the same shared weights requires only 25 weights. Using regularized weights over fewer parameters avoids the vanishing gradients and exploding gradients problems seen during [[backpropagation]] in earlier neural networks.<ref name="auto3" /><ref name="auto2" />
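The arithmetic in the paragraph above can be checked with a short sketch. The function names are hypothetical and biases are left out for simplicity.

```python
def weights_per_fully_connected_neuron(height, width, channels=1):
    """A fully connected neuron needs one weight per input value."""
    return height * width * channels


def weights_per_shared_kernel(kernel_height, kernel_width, channels=1):
    """A convolution kernel has one weight per cell, reused at every position."""
    return kernel_height * kernel_width * channels


dense = weights_per_fully_connected_neuron(100, 100)
conv = weights_per_shared_kernel(5, 5)
print(dense, conv, dense // conv)  # 10000 25 400
```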
To speed processing, standard convolutional layers can be replaced by depthwise separable convolutional layers,<ref>{{Cite arXiv |last=Chollet |first=François |date=2017-04-04 |title=Xception: Deep Learning with Depthwise Separable Convolutions |class=cs.CV |eprint=1610.02357 }}</ref> which are based on a depthwise convolution followed by a pointwise convolution. The ''depthwise convolution'' is a spatial convolution applied independently over each channel of the input tensor, while the ''pointwise convolution'' is a standard convolution restricted to the use of <math>1\times1</math> kernels.
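A minimal sketch of the two steps, assuming stride 1, no padding, and channel-first nested lists, is given below. The function names and the tiny example values are illustrative assumptions, not an excerpt from any framework.

```python
def correlate2d(channel, kernel):
    """Valid 2D cross-correlation of one channel with one spatial kernel."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(channel[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(len(channel[0]) - kw + 1)
        ]
        for i in range(len(channel) - kh + 1)
    ]


def depthwise_separable(channels, depthwise_kernels, pointwise_weights):
    # Depthwise step: one spatial kernel per input channel, no channel mixing.
    spatial = [correlate2d(c, k) for c, k in zip(channels, depthwise_kernels)]
    h, w = len(spatial[0]), len(spatial[0][0])
    # Pointwise step: a 1x1 convolution mixes the channels at each position.
    return [
        [[sum(wt * spatial[c][i][j] for c, wt in enumerate(weights))
          for j in range(w)] for i in range(h)]
        for weights in pointwise_weights
    ]


channels = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],   # input channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],   # input channel 1
]
depthwise_kernels = [[[1, 0], [0, 1]], [[1, 1], [1, 1]]]  # one 2x2 kernel per channel
pointwise_weights = [[1, -1]]                             # one output channel

output = depthwise_separable(channels, depthwise_kernels, pointwise_weights)
print(output)  # [[[4, 6], [10, 12]]]
```

For a k × k kernel with C_in input and C_out output channels, this factorization uses k·k·C_in + C_in·C_out weights instead of k·k·C_in·C_out, which is where the savings come from when the channel counts are large.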
=== Pooling layers ===
Convolutional networks may include local and/or global pooling layers along with traditional convolutional layers. Pooling layers reduce the dimensions of data by combining the outputs of neuron clusters at one layer into a single neuron in the next layer. Local pooling combines small clusters; tiling sizes such as 2 × 2 are commonly used. Global pooling acts on all the neurons of the feature map.<ref name="flexible"/><ref>{{cite web |last=[[Alex Krizhevsky|Krizhevsky]] |first=Alex |title=ImageNet Classification with Deep Convolutional Neural Networks |url=https://image-net.org/static_files/files/supervision.pdf |access-date=17 November 2013 |archive-date=25 April 2021 |archive-url=https://web.archive.org/web/20210425025127/http://www.image-net.org/static_files/files/supervision.pdf |url-status=live }}</ref> Two types of pooling are in common use: max and average. ''Max pooling'' uses the maximum value of each local cluster of neurons in the feature map,<ref name=Yamaguchi111990>{{cite conference |title=A Neural Network for Speaker-Independent Isolated Word Recognition |last1=Yamaguchi |first1=Kouichi |last2=Sakamoto |first2=Kenji |last3=Akabane |first3=Toshio |last4=Fujimoto |first4=Yoshiji |date=November 1990 |location=Kobe, Japan |conference=First International Conference on Spoken Language Processing (ICSLP 90) |url=https://www.isca-speech.org/archive/icslp_1990/i90_1077.html |access-date=2019-09-04 |archive-date=2021-03-07 |archive-url=https://web.archive.org/web/20210307233750/https://www.isca-speech.org/archive/icslp_1990/i90_1077.html |url-status=dead }}</ref><ref name="mcdns">{{cite book |last1=Ciresan |first1=Dan |first2=Ueli |last2=Meier |first3=Jürgen |last3=Schmidhuber |title=2012 IEEE Conference on Computer Vision and Pattern Recognition |chapter=Multi-column deep neural networks for image classification |date=June 2012 |pages=3642–3649 |doi=10.1109/CVPR.2012.6248110 |arxiv=1202.2745 |isbn=978-1-4673-1226-4 |oclc=812295155 
|publisher=[[Institute of Electrical and Electronics Engineers]] (IEEE) |location=New York, NY |citeseerx=10.1.1.300.3283 |s2cid=2161592}}</ref> while ''average pooling'' takes the average value.
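The following sketch shows local 2 × 2 max and average pooling and a global average pooling on a small, hypothetical feature map. It assumes non-overlapping tiles (stride equal to the tile size); the helper names are chosen for this example only.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size tile."""
    return [
        [
            reduce_fn([feature_map[i + di][j + dj]
                       for di in range(size) for dj in range(size)])
            for j in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for i in range(0, len(feature_map) - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]

max_pooled = pool2d(feature_map, 2, max)    # [[4, 2], [2, 8]]
avg_pooled = pool2d(feature_map, 2, mean)   # [[2.5, 1.0], [0.75, 6.5]]
# Global pooling reduces the whole feature map to a single value.
global_avg = mean([v for row in feature_map for v in row])  # 2.6875
print(max_pooled, avg_pooled, global_avg)
```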
=== Fully connected layers ===
Fully connected layers connect every neuron in one layer to every neuron in another layer. This is the same arrangement as in a traditional [[multilayer perceptron]] neural network (MLP). The flattened feature maps pass through a fully connected layer to classify the images.
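A minimal sketch of the flatten-then-classify step is shown below. The weights, biases, and the two-class setup are hypothetical values picked for illustration; a trained network would learn them.

```python
def flatten(feature_maps):
    """Concatenate all feature map values into one vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(inputs, weights, biases):
    """Each output neuron is connected to every input value."""
    return [
        sum(w * x for w, x in zip(neuron_weights, inputs)) + b
        for neuron_weights, b in zip(weights, biases)
    ]


feature_maps = [[[1, 2], [3, 4]]]        # one 2x2 feature map
weights = [[1, 0, 0, 1], [0, 1, 1, 0]]   # two output neurons (classes)
biases = [0, -1]

vector = flatten(feature_maps)                        # [1, 2, 3, 4]
scores = fully_connected(vector, weights, biases)     # [5, 4]
predicted_class = scores.index(max(scores))           # 0
print(vector, scores, predicted_class)
```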
=== Receptive field ===
In neural networks, each neuron receives input from some number of locations in the previous layer. In a convolutional layer, each neuron receives input from only a restricted area of the previous layer called the neuron's ''receptive field''. Typically the area is a square (e.g. 5 by 5 neurons). In a fully connected layer, by contrast, the receptive field is the ''entire previous layer''. Thus, in each successive convolutional layer, a neuron takes input from a larger area of the original input than neurons in the previous layers do. This is due to applying the convolution over and over, which takes the value of a pixel into account, as well as its surrounding pixels. When using dilated layers, the number of pixels in the receptive field remains constant, but the field is more sparsely populated as its dimensions grow when combining the effect of several layers.
To manipulate the receptive field size as desired, there are some alternatives to the standard convolutional layer. For example, atrous or dilated convolution<ref>{{Cite arXiv|last1=Yu |first1=Fisher |last2=Koltun |first2=Vladlen |date=2016-04-30 |title=Multi-Scale Context Aggregation by Dilated Convolutions |class=cs.CV |eprint=1511.07122 }}</ref><ref>{{Cite arXiv|last1=Chen |first1=Liang-Chieh |last2=Papandreou |first2=George |last3=Schroff |first3=Florian |last4=Adam |first4=Hartwig |date=2017-12-05 |title=Rethinking Atrous Convolution for Semantic Image Segmentation |class=cs.CV |eprint=1706.05587 }}</ref> expands the receptive field size without increasing the number of parameters by interleaving visible and blind regions. Moreover, a single dilated convolutional layer can comprise filters with multiple dilation ratios,<ref>{{Cite arXiv|last1=Duta |first1=Ionut Cosmin |last2=Georgescu |first2=Mariana Iuliana |last3=Ionescu |first3=Radu Tudor |date=2021-08-16 |title=Contextual Convolutional Neural Networks |class=cs.CV |eprint=2108.07387 }}</ref> thus having a variable receptive field size.
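The growth of the receptive field can be estimated with the usual recurrence for stacked layers, sketched below. The layer configurations are hypothetical, the function names are assumptions, and square kernels are assumed; a kernel of size k with dilation d covers d·(k − 1) + 1 positions per axis while still having only k weights per axis.

```python
def effective_kernel_size(kernel_size, dilation=1):
    """Span covered by a dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """layers: list of (kernel_size, stride, dilation), ordered input to output."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent neurons
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


plain = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]     # three ordinary 3x3 layers
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 4)]   # same weights, dilations 1, 2, 4

print(receptive_field(plain))    # 7
print(receptive_field(dilated))  # 15
```

Both stacks have the same number of weights (nine per kernel per layer), but the dilated stack sees a 15 × 15 region of the input instead of 7 × 7.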
=== Weights ===
Each neuron in a neural network computes an output value by applying a specific function to the input values received from the receptive field in the previous layer. The function that is applied to the input values is determined by a vector of weights and a bias (typically real numbers). Learning consists of iteratively adjusting these biases and weights.
The vectors of weights and biases are called ''filters'' and represent particular [[feature (machine learning)|feature]]s of the input (e.g., a particular shape). A distinguishing feature of CNNs is that many neurons can share the same filter. This reduces the [[memory footprint]] because a single bias and a single vector of weights are used across all receptive fields that share that filter, as opposed to each receptive field having its own bias and weight vector.<ref name="LeCun">{{cite web |url=http://yann.lecun.com/exdb/lenet/ |title=LeNet-5, convolutional neural networks |last=LeCun |first=Yann |access-date=16 November 2013 |archive-date=24 February 2021 |archive-url=https://web.archive.org/web/20210224225707/http://yann.lecun.com/exdb/lenet/ |url-status=live }}</ref>
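To make the effect of filter sharing concrete, the sketch below applies one filter to two different receptive fields and compares parameter counts with and without sharing. The 28 × 28 input, the 5 × 5 kernel, and all names are assumptions chosen for illustration, and no activation function is applied.

```python
def neuron_output(receptive_field, weights, bias):
    """Weighted sum of the receptive field values plus a bias."""
    return sum(w * x for w, x in zip(weights, receptive_field)) + bias


def parameter_counts(input_size, kernel_size):
    """Parameters for one feature map over a square, single-channel input."""
    positions = (input_size - kernel_size + 1) ** 2   # number of receptive fields
    per_filter = kernel_size * kernel_size + 1        # weights plus one bias
    return {"shared": per_filter, "unshared": positions * per_filter}


# The same filter (weights and bias) is reused for every receptive field.
weights, bias = [1, -1], 0.5
print(neuron_output([3, 1], weights, bias))  # 2.5
print(neuron_output([0, 4], weights, bias))  # -3.5

counts = parameter_counts(input_size=28, kernel_size=5)
print(counts)  # {'shared': 26, 'unshared': 14976}
```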