This vulnerability results from using gradient descent to determine classification of inputs via a neural network. As such,it is a vulnerability in the algorithm. In plain terms,this means that the currently-standard usage of this type of machine learning algorithm can always be fooled or manipulated if the adversary can interact with it. What kind or amount of interaction an adversary needs is not always clear,and some attacks can be successful with only minor or indirect interaction. However,in general more access or more interaction options reduce the effort required to fool the machine learning algorithm. If the adversary has information about some part of the machine learning process(training data,training results,model,or operational/testing data),then with sufficient effort the adversary can craft an input that will fool the machine learning tool to yield a result of the adversary’s choosing. In instantiations of this vulnerability that we are currently aware of,”sufficient effort”ranges widely,between seconds and weeks of commodity compute time. Within the taxonomy by Kumar et al.,such misclassifications are either perturbation attacks or adversarial examples in the physical domain. There are other kinds of failures or attacks related to ML systems,and other ML systems besides those trained via gradient descent. However,this note is restricted to this specific algorithm vulnerability. Formally,the vulnerability is defined for the following case of classification. Let x be a feature vector and y be a class label. Let L be a loss function,such as cross entropy loss. We wish to learn a parameterization vectorθfor a given class of functions f such that the expected loss is minimized. Specifically,let In the case where f(θ,x)is a neural network,finding the global minimizerθ*is often computationally intractable. Instead,various methods are used to findθ^,which is a”good enough”approximation. We refer to f(θ^,.)as the fitted neural network. If stochastic gradient descent is used to findθ^for the broadly defined set of f(θ,x)representing neural networks,then the fitted neural network f(θ^,.)is vulnerable to adversarial manipulation. Specifically,it is possible to take f(θ^,.)and find an x’ such that the difference between x and x’ is smaller than some arbitrary and yet f(θ^,x)has the label y and f(θ^,x’)has an arbitrarily different label y’. (Mathematicians,please excuse our abuse of^ashat and*as_star.) The uncertainty of the impact of this vulnerability is compounded because practitioners and vendors do not tend to disclose what machine learning algorithms they use. However,training neural networks by gradient descent is a common technique. See also the examples in the impact section.

# VU#425163: Machine learning classifiers trained via gradient descent are vulnerable to arbitrary misclassification attack

by CERT | Mar 19, 2020 | CERT-Vulnerabilities