Hypercube neuron

Wojciech Fialkiewicz, Wroclaw, Poland.
Emails: WojciechFialkiewicz@fialkiewicz.net, wfialkiewicz@hotmail.com

Abstract. The classic perceptron can classify only data that is linearly separable. This work proposes a neuron that does not have that limitation.

1. Introduction.

A neuron based on a hypercube architecture was introduced in [1]; this paper presents an enhancement of that neuron. Some may argue that using a hypercube to represent a neuron leads to an overcomplicated design, but artificial neural networks with many hidden layers are known to be hard to train with the backpropagation algorithm. More complex neurons may make it possible to build neural networks with fewer hidden layers but the same informational capacity.

2. Logic functions.

A logic function can be described by its pattern, which enumerates the function's value for every possible combination of arguments. The pattern of the logic function XOR is as follows: (false, false; false), (false, true; true), (true, false; true), (true, true; false).

The pattern of a logic function can be represented with a hypercube. Consider the three-argument function described by the pattern:

P = (false, false, false; false), (false, false, true; true), (false, true, false; false), (false, true, true; true), (true, false, false; false), (true, false, true; true), (true, true, false; false), (true, true, true; true).

The hypercube that visualizes this logic function is shown in Graph 2.1.

[Graph 2.1: Three-dimensional hypercube of the logic function described by pattern P. Vertex values: 000 false, 001 true, 010 false, 011 true, 100 false, 101 true, 110 false, 111 true.]

3. Linearly separable patterns of logic functions.

The classic perceptron can only be trained on data sets that are linearly separable. Table 3.1 shows the number of logic functions with linearly separable patterns for a given number of arguments (data from [2]), together with the total number of logic functions of the same number of arguments.

Arguments   Linearly separable    Total functions   Percent
2           14                    16                87.5
3           104                   256               40.625
4           1882                  65536             2.871704102
5           94572                 4294967296        0.002201926
6           15028134              1.84467E+19       8.14677E-11
7           8378070864            3.40282E+38       2.46209E-27
8           17561539552946        1.15792E+77       1.51664E-62
9           144130531453121000    1.3408E+154       1.075E-135

Table 3.1 Number of logic functions for a given number of arguments.

The last column of Table 3.1 shows the percentage of logic functions that the classic perceptron can learn. This paper proposes a neuron that can learn any logic function.

4. Neuron with single input.

The neuron's input value is x, its activation function is f(x), and w is the weight of its single input. The neuron's exit value is equal to f(w*x). The function f(x) is defined as follows:

(4.1) $f(x) = \frac{A_t - A_b}{2}\,\mathrm{bsgm}(x) + \frac{A_t + A_b}{2}$

where bsgm(x) is defined as follows:

(4.2) $\mathrm{bsgm}(x) = \frac{2}{1 + e^{-Bx}} - 1$, where B = 5.625

Because bsgm(x) approaches 1 for large positive x and -1 for large negative x, At and Ab are respectively the top and bottom asymptotes of the function f(x).

[Graph: f(x) for At = 1.0 and Ab = -1.0, plotted for x from -3 to 3.]

A neuron with a single input has three parameters that change during the learning process: At, Ab and w.

5. Neuron with two inputs.

In a neuron with two inputs, the inputs are denoted x and z, the weights on the inputs wx and wz, and the activation function f(x,z) is defined by four asymptote points A00, A01, A10 and A11. Graph 5.1 shows the skeleton of the neuron's activation function for A00 = 0, A01 = 0, A10 = 0, A11 = 1.

[Graph 5.1: Sample skeleton of a two-dimensional function. Labels: axes x, z and y; points A00, A01, A10, A11, B0, B1 and f(x0, z0).]

The function f(x,z) is defined with four f(x) functions (4.1) whose (At, Ab) pairs are respectively (A01, A00), (A11, A01), (A11, A10) and (A10, A00). The value of f(x,z) at the point (x0, z0) is calculated as follows. First, dimension z is reduced by calculating the asymptote points B0 and B1, which define a one-dimensional function f(x) of the argument x. Second, this function is used to obtain the value f(x0), which is equal to f(x0, z0).
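
The two-step reduction described above can be sketched in C++ as follows. This is a minimal illustration, not the author's implementation: the names `Bsgm`, `F1` and `TwoInputExit` are hypothetical, the array order A00, A01, A10, A11 (first index for x, second for z) is an assumption read off Graph 5.1, and the weights are applied to the inputs as in Section 4.

```cpp
#include <cassert>
#include <cmath>

// Steepness constant B from (4.2).
const double kB = 5.625;

// Bipolar sigmoid (4.2): approaches -1 and +1 for large negative and positive x.
double Bsgm(double x) {
    return 2.0 / (1.0 + std::exp(-kB * x)) - 1.0;
}

// One-dimensional activation (4.1) with bottom asymptote ab and top asymptote at.
double F1(double ab, double at, double x) {
    return (at - ab) / 2.0 * Bsgm(x) + (at + ab) / 2.0;
}

// Exit value of a two-input neuron.
// ASSUMPTION: a[0..3] holds A00, A01, A10, A11, with the first index
// belonging to x and the second to z.
double TwoInputExit(const double a[4], double wx, double wz, double x, double z) {
    assert(a != nullptr);
    double b0 = F1(a[0], a[1], wz * z);  // reduce z between A00 and A01
    double b1 = F1(a[2], a[3], wz * z);  // reduce z between A10 and A11
    return F1(b0, b1, wx * x);           // evaluate the remaining 1-D function in x
}
```

With the points of Graph 5.1 (A00 = A01 = A10 = 0, A11 = 1), unit weights and inputs x = z = 0, every sigmoid term is zero, so the sketch returns the mean of the four points, 0.25.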
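
In formulas:

$B_0 = \frac{A_{01} - A_{00}}{2}\,\mathrm{bsgm}(z_0) + \frac{A_{01} + A_{00}}{2}$

$B_1 = \frac{A_{11} - A_{10}}{2}\,\mathrm{bsgm}(z_0) + \frac{A_{11} + A_{10}}{2}$

$f(x_0) = \frac{B_1 - B_0}{2}\,\mathrm{bsgm}(x_0) + \frac{B_1 + B_0}{2}$

$f(x_0, z_0) = f(x_0)$

A neuron with two inputs has six parameters that change during the learning process: A00, A01, A10, A11, wx and wz.

6. Neuron with N inputs.

A neuron with N inputs has N weights w0, w1, .., wN-1 on its inputs. Its activation function is defined by 2^N asymptote points, one for each vertex of the N-dimensional hypercube. Calculating the exit value of a neuron with N inputs involves reducing one dimension of the hypercube in each step of the algorithm, by calculating the asymptote points of a hypercube with one dimension fewer. The following algorithm can be used to calculate the exit value of a neuron with N inputs:

(6.1)
double ExitValue(double *Weights, double *Inputs, double *Points, int N)
{
    // Number of vertex pairs merged in the current step; also the index
    // distance between the two vertices of a pair.
    int iPairsAmountANDStepSize = 1 << (N - 1);   // 2^(N-1); '^' is XOR in C++

    // Reduce one dimension per iteration, from the highest down to 0.
    for (int iDim = N - 1; iDim >= 0; iDim--)
    {
        for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
        {
            // Points[iPair] is the bottom asymptote, Points[iPair + step] the top one.
            Points[iPair] = FuncBSGM(Points[iPair],
                                     Points[iPair + iPairsAmountANDStepSize],
                                     Weights[iDim] * Inputs[iDim]);
        }
        iPairsAmountANDStepSize /= 2;
    }
    return Points[0];   // Points is overwritten, so pass a copy of the asymptote points
}

Here Weights is the array of weights on the neuron's inputs, Inputs is the array of input values that the neuron will process, Points is the array of asymptote points defining the neuron's activation function, and FuncBSGM is a function that calculates function (4.1) and takes Ab, At and the argument as parameters.

The exit value of a neuron with N inputs can also be written as a formula:

(6.2) $f(x_0, x_1, \ldots, x_{N-1}) = A_0 \cdot Fun(0, w_0, x_0, 0) \cdot Fun(1, w_1, x_1, 0) \cdots Fun(N-1, w_{N-1}, x_{N-1}, 0) + A_1 \cdot Fun(0, w_0, x_0, 1) \cdot Fun(1, w_1, x_1, 1) \cdots Fun(N-1, w_{N-1}, x_{N-1}, 1) + \ldots + A_{2^N-1} \cdot Fun(0, w_0, x_0, 2^N-1) \cdot Fun(1, w_1, x_1, 2^N-1) \cdots Fun(N-1, w_{N-1}, x_{N-1}, 2^N-1)$

where x0, x1, .., xN-1 are the neuron's input values, A0, A1, .., A(2^N-1) are the asymptote points defining the neuron's activation function, and Fun(d, w, x, p) is defined as follows:

(6.3) $Fun(d, w, x, p) = \frac{1 + \mathrm{bsgm}(w x)}{2}$ if bit d of p is 1, and $Fun(d, w, x, p) = \frac{1 - \mathrm{bsgm}(w x)}{2}$ if bit d of p is 0.

7. Supervised learning of neuron.

A gradient-based algorithm is used to change the parameters defining the neuron during the learning process.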
In this algorithm the neuron's error function is minimized by changing each parameter of the neuron by its gradient multiplied by the learning parameter. The change of each parameter is defined as follows:

(7.1) $P = P - l \cdot \frac{\partial E}{\partial P}$

where P is the neuron parameter being modified, l is the learning parameter and E() is the error function:

(7.2) $E = (y - d)^2$, where y is the exit value of the neuron and d is the desired exit value of the neuron.

The gradient of any asymptote point is calculated with formula (6.2). The gradient for asymptote point Am is as follows:

$\frac{\partial E}{\partial A_m} = 2 \cdot (y - d) \cdot Fun(0, w_0, x_0, m) \cdot Fun(1, w_1, x_1, m) \cdots Fun(N-1, w_{N-1}, x_{N-1}, m)$

where N is the number of neuron inputs, Fun() is function (6.3), w0, w1, .., wN-1 are the neuron's weights and x0, x1, .., xN-1 are the neuron's inputs.

Calculating the gradients of the weights is more complicated. It requires modifying algorithm (6.1) so that the dimension of the weight whose gradient is being calculated is left as the last one to reduce. The following algorithm shows how this is accomplished:

(7.3)
void LowerDimsToDim(int iDimNr, double *Weights, double *Inputs, double *Points, int N)
{
    int iPairsAmountANDStepSize = 1 << (N - 1);   // 2^(N-1); '^' is XOR in C++
    int iSecondPairsStart = 0;                    // set when iDim reaches iDimNr

    for (int iDim = N - 1; iDim >= 0; iDim--)
    {
        if (iDim != iDimNr)
        {
            // Reduce dimension iDim in the half where bit iDimNr is 0.
            for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
            {
                Points[iPair] = FuncBSGM(Points[iPair],
                                         Points[iPair + iPairsAmountANDStepSize],
                                         Weights[iDim] * Inputs[iDim]);
            }
            if (iDim < iDimNr)
            {
                // Dimension iDimNr was skipped, so also reduce the half
                // where bit iDimNr is 1.
                for (int iPair = 0; iPair < iPairsAmountANDStepSize; iPair++)
                {
                    Points[iSecondPairsStart + iPair] =
                        FuncBSGM(Points[iSecondPairsStart + iPair],
                                 Points[iSecondPairsStart + iPair + iPairsAmountANDStepSize],
                                 Weights[iDim] * Inputs[iDim]);
                }
            }
        }
        else
        {
            // Skip dimension iDimNr and remember where its upper half starts.
            iSecondPairsStart = iPairsAmountANDStepSize;
        }
        iPairsAmountANDStepSize /= 2;
    }
    Points[1] = Points[iSecondPairsStart];
}

After algorithm (7.3) has executed, it leaves two values in the Points array, Points[0] and Points[1], which are respectively the Ab and At asymptotes in the dimension specified by iDimNr.
Using those values, the neuron's exit value y can be obtained with function (4.1):

(7.4) $y = \frac{A_t - A_b}{2}\,\mathrm{bsgm}(w_{iDimNr} \cdot x_{iDimNr}) + \frac{A_t + A_b}{2}$

So the gradient of weight number iDimNr is equal to:

(7.5) $\frac{\partial E}{\partial w_{iDimNr}} = 2 \cdot (y - d) \cdot \frac{A_t - A_b}{2} \cdot \mathrm{bsgm}'(w_{iDimNr} \cdot x_{iDimNr}) \cdot x_{iDimNr}$

where $\mathrm{bsgm}'(x) = \frac{2 B e^{-Bx}}{(1 + e^{-Bx})^2}$ is the derivative of (4.2).

8. Future Work.

The hybrid learning systems based on neural networks presented in [3] and [4] combine imprinting symbolic knowledge into the network structure with training those networks by the backpropagation algorithm on unclassified data. The neurons used in those approaches to remember symbolic rules realize the AND and OR logic functions. The neuron presented in this paper can realize any logic function. Networks built from such neurons can also be trained with the backpropagation algorithm to refine rules or to discover new rules in data.

References.

[1] Fialkiewicz, Wojciech (2002). Reasoning with neural networks. M.Sc. Thesis. Wroclaw University of Technology.
[2] Gruzling, Nicolle (2006). Linear separability of the vertices of an n-dimensional hypercube. M.Sc. Thesis. University of Northern British Columbia.
[3] Towell, Geoffrey G., Shavlik, Jude W. (1994). Knowledge-Based Artificial Neural Networks.
[4] d'Avila Garcez, Artur S., Lamb, Luís C., Gabbay, Dov M. A Connectionist Inductive Learning System for Modal Logic Programming.

Appendix. Implementation of hypercube neuron in C++.

File Math.h

class CMath
{
    static const double dB;        // steepness B of the bipolar sigmoid
    static bool Activate(void);    // seeds the random number generator
    static bool m_bStart;
public:
    static double BipolarSigmoid(double);
    static double DerivatBipolarSigmoid(double);
    static double Random(double, double);                          // min, max
    static int PowerOfTwo(int);
    static double FunctionWithAsymptotes(double, double, double);  // lower asymptote, upper asymptote, x
};

File Math.cpp

#include "Math.h"
#include <stdlib.h>
#include <time.h>
#include <math.h>

bool CMath::m_bStart = CMath::Activate();
const double CMath::dB = 5.625;

bool CMath::Activate()
{
    srand((unsigned)time(NULL));
    return true;
}

double CMath::Random(double dMin, double dMax)
{
    // RAND_MAX + 1.0 keeps the addition in double; RAND_MAX + 1 can overflow int.
    return (double)rand() / (RAND_MAX + 1.0) * (dMax - dMin) + dMin;
}

int CMath::PowerOfTwo(int nPower)
{
    int nResult = 1;
    while (nPower)
    {
        nResult *= 2;
        nPower--;
    }
    return nResult;
}

double CMath::BipolarSigmoid(double dX)
{
    return 2.0 / (1 + exp(-dB * dX)) - 1.0;
}

double CMath::FunctionWithAsymptotes(double dBottom, double dTop, double dX)
{
    return ((dTop - dBottom) / 2.0) * BipolarSigmoid(dX) + ((dTop + dBottom) / 2.0);
}

double CMath::DerivatBipolarSigmoid(double dX)
{
    return (2 * dB * exp(-dB * dX)) / ((1 + exp(-dB * dX)) * (1 + exp(-dB * dX)));
}

File Neuron.h

class CNeuron
{
    double *m_dWeights;
    double *m_dAsympPoints;
    int m_nInputsAmount;
    int m_nPointsAmount;
    double *m_dTmp;
    double *m_dTmpWeights;

    // dimension nr, weights, inputs, asymptote points, inputs amount
    static void LowerDimsToDim(int, double*, double*, double*, int);
    // src, dest, amount
    static void CopyBuffer(double*, double*, int);
    // value, bit nr
    static bool CheckBit(int, int);
    // buffer for gradients, asymptote points, weights, inputs, inputs amount, points amount
    static void CalculateAsympPointsGradients(double*, double*, double*, double*, int, int);
public:
    CNeuron(int);                                    // inputs amount
    ~CNeuron(void);
    double ExitValue(double*);                       // inputs
    double ExitValueFromFormula(double*);            // inputs
    double Lern(double*, double, double);            // inputs, desired output, learning parameter
    void AdaptNeuronByGradient(double*, double, double);  // inputs, gradient for neuron, learning parameter
    void GradientsOnInputs(double*, double*, double);     // inputs, (out) gradients on inputs, gradient for neuron
    // coordinates of hypercube vertex, example: "001" (false, false, true); value on the vertex
    void RememberRule(const char*, double);
    void OnesOnWeights();                            // assign 1.0 to all weights, must be called when using RememberRule
private:
    CNeuron(const CNeuron&);
    CNeuron& operator=(const CNeuron&);
};

File Neuron.cpp

#include "Neuron.h"
#include "Math.h"

CNeuron::CNeuron(int iInputsAm)
{
    m_nPointsAmount = CMath::PowerOfTwo(iInputsAm);
    m_nInputsAmount = iInputsAm;
    m_dAsympPoints = new double[m_nPointsAmount];
    m_dTmp = new double[m_nPointsAmount];
    m_dTmpWeights = new double[iInputsAm];
    m_dWeights = new double[iInputsAm];
    for (int i = 0; i < m_nInputsAmount; i++)
    {
        m_dWeights[i] = CMath::Random(-1.0, 1.0);
    }
    for (int i = 0; i < m_nPointsAmount; i++)
    {
        m_dAsympPoints[i] = CMath::Random(-1.0, 1.0);
    }
}

CNeuron::~CNeuron(void)
{
    if (m_dAsympPoints) { delete [] m_dAsympPoints; }
    if (m_dTmp) { delete [] m_dTmp; }
    if (m_dTmpWeights) { delete [] m_dTmpWeights; }
    if (m_dWeights) { delete [] m_dWeights; }
}

// Reduces every dimension except nDimNr; leaves the bottom and top
// asymptotes of dimension nDimNr in dPoints[0] and dPoints[1].
void CNeuron::LowerDimsToDim(int nDimNr, double *dWeights, double *dInputs, double *dPoints, int dInAm)
{
    int nPairsAmountANDStepSize;
    int nSecondPairsStart = 0;   // set when nDim reaches nDimNr
    nPairsAmountANDStepSize = CMath::PowerOfTwo(dInAm - 1);
    for (int nDim = dInAm - 1; nDim >= 0; nDim--)
    {
        if (nDim != nDimNr)
        {
            for (int nPair = 0; nPair < nPairsAmountANDStepSize; nPair++)
            {
                dPoints[nPair] = CMath::FunctionWithAsymptotes(
                    dPoints[nPair],
                    dPoints[nPair + nPairsAmountANDStepSize],
                    dWeights[nDim] * dInputs[nDim]);
            }
            if (nDim < nDimNr)
            {
                for (int nPair = 0; nPair < nPairsAmountANDStepSize; nPair++)
                {
                    dPoints[nSecondPairsStart + nPair] = CMath::FunctionWithAsymptotes(
                        dPoints[nSecondPairsStart + nPair],
                        dPoints[nSecondPairsStart + nPair + nPairsAmountANDStepSize],
                        dWeights[nDim] * dInputs[nDim]);
                }
            }
        }
        else
        {
            nSecondPairsStart = nPairsAmountANDStepSize;
        }
        nPairsAmountANDStepSize /= 2;
    }
    dPoints[1] = dPoints[nSecondPairsStart];
}

double CNeuron::ExitValue(double *dInputs)
{
    CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
    LowerDimsToDim(0, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
    return CMath::FunctionWithAsymptotes(m_dTmp[0], m_dTmp[1], dInputs[0] * m_dWeights[0]);
}

void CNeuron::CopyBuffer(double *dSrc, double *dDest, int nAmount)
{
    for (int i = 0; i < nAmount; i++)
    {
        dDest[i] = dSrc[i];
    }
}

bool CNeuron::CheckBit(int nValue, int nBitNr)
{
    return (nValue & (1 << nBitNr)) != 0;
}

// For every asymptote point i, stores in dGradients[i] the product of
// Fun(j, w_j, x_j, i) over all inputs j, as in formula (6.2).
void CNeuron::CalculateAsympPointsGradients(double *dGradients, double *dAsympPoints, double *dWeights, double *dInputs, int nInputsAm, int nPointsAm)
{
    double dFraction = 1.0 / CMath::PowerOfTwo(nInputsAm);
    for (int i = 0; i < nPointsAm; i++)
    {
        dGradients[i] = dFraction;
        for (int j = 0; j < nInputsAm; j++)
        {
            if (CheckBit(i, j))
            {
                dGradients[i] *= (1.0 + CMath::BipolarSigmoid(dWeights[j] * dInputs[j]));
            }
            else
            {
                dGradients[i] *= (1.0 - CMath::BipolarSigmoid(dWeights[j] * dInputs[j]));
            }
        }
    }
}

double CNeuron::ExitValueFromFormula(double *dInputs)
{
    double dExitValue = 0.0;
    CalculateAsympPointsGradients(m_dTmp, m_dAsympPoints, m_dWeights, dInputs, m_nInputsAmount, m_nPointsAmount);
    for (int i = 0; i < m_nPointsAmount; i++)
    {
        dExitValue += m_dAsympPoints[i] * m_dTmp[i];
    }
    return dExitValue;
}

void CNeuron::AdaptNeuronByGradient(double *dInputs, double dGradient, double dLerningParam)
{
    // New weights go to a temporary buffer so that every gradient is
    // calculated from the old weights.
    for (int i = 0; i < m_nInputsAmount; i++)
    {
        CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
        LowerDimsToDim(i, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
        m_dTmpWeights[i] = m_dWeights[i]
            - dLerningParam * dGradient * ((m_dTmp[1] - m_dTmp[0]) / 2.0)
              * CMath::DerivatBipolarSigmoid(m_dWeights[i] * dInputs[i]) * dInputs[i];
    }
    CalculateAsympPointsGradients(m_dTmp, m_dAsympPoints, m_dWeights, dInputs, m_nInputsAmount, m_nPointsAmount);
    for (int i = 0; i < m_nPointsAmount; i++)
    {
        m_dAsympPoints[i] -= dLerningParam * dGradient * m_dTmp[i];
    }
    CopyBuffer(m_dTmpWeights, m_dWeights, m_nInputsAmount);
}

void CNeuron::RememberRule(const char *cCoordinates, double dValue)
{
    // The last character of cCoordinates corresponds to bit 0 of the point index.
    int nPointPosition = 0;
    for (int i = m_nInputsAmount - 1; i >= 0; i--)
    {
        if (cCoordinates[i] != '0')
        {
            nPointPosition += CMath::PowerOfTwo(m_nInputsAmount - i - 1);
        }
    }
    m_dAsympPoints[nPointPosition] = dValue;
}

void CNeuron::OnesOnWeights()
{
    for (int i = 0; i < m_nInputsAmount; i++)
    {
        m_dWeights[i] = 1.0;
    }
}

// Performs one learning step and returns the squared error before the step.
double CNeuron::Lern(double *dInputs, double dDesiredOutput, double dLernParam)
{
    double dExit = ExitValue(dInputs);
    AdaptNeuronByGradient(dInputs, 2.0 * (dExit - dDesiredOutput), dLernParam);
    return (dExit - dDesiredOutput) * (dExit - dDesiredOutput);
}

void CNeuron::GradientsOnInputs(double *dInputs, double *dGradients, double dGradient)
{
    for (int i = 0; i < m_nInputsAmount; i++)
    {
        CopyBuffer(m_dAsympPoints, m_dTmp, m_nPointsAmount);
        LowerDimsToDim(i, m_dWeights, dInputs, m_dTmp, m_nInputsAmount);
        dGradients[i] = dGradient * ((m_dTmp[1] - m_dTmp[0]) / 2.0)
            * CMath::DerivatBipolarSigmoid(m_dWeights[i] * dInputs[i]) * m_dWeights[i];
    }
}
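
To illustrate how the asymptote-point update from Section 7 can fit a function that is not linearly separable, the following self-contained sketch trains a two-input neuron on XOR. It is a simplified illustration rather than the appendix implementation, and it rests on several assumptions: false and true are encoded as -1 and +1 for both inputs and desired outputs, the weights are held fixed at 1.0 so that only the asymptote points are updated with (7.1), the exit value is computed from formula (6.2), and all names (`VertexCoeff`, `LearnPoints`, `TrainXor`) are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Steepness constant B from (4.2).
const double kB = 5.625;

// Bipolar sigmoid (4.2).
double Bsgm(double x) {
    return 2.0 / (1.0 + std::exp(-kB * x)) - 1.0;
}

// Coefficient of asymptote point m in formula (6.2): the product of
// Fun(d, w_d, x_d, m) over all inputs d; bit d of m selects the branch of (6.3).
double VertexCoeff(const std::vector<double>& w, const std::vector<double>& x,
                   std::size_t m) {
    double c = 1.0;
    for (std::size_t d = 0; d < w.size(); ++d) {
        double s = Bsgm(w[d] * x[d]);
        c *= ((m >> d) & 1u) ? (1.0 + s) / 2.0 : (1.0 - s) / 2.0;
    }
    return c;
}

// Exit value computed from formula (6.2).
double ExitValue(const std::vector<double>& a, const std::vector<double>& w,
                 const std::vector<double>& x) {
    assert(w.size() == x.size());
    assert(a.size() == (std::size_t(1) << w.size()));
    double y = 0.0;
    for (std::size_t m = 0; m < a.size(); ++m) {
        y += a[m] * VertexCoeff(w, x, m);
    }
    return y;
}

// One update (7.1) of every asymptote point for E = (y - d)^2.
// ASSUMPTION: weights are not updated in this sketch.
// Returns the squared error before the update.
double LearnPoints(std::vector<double>& a, const std::vector<double>& w,
                   const std::vector<double>& x, double d, double lr) {
    double y = ExitValue(a, w, x);
    for (std::size_t m = 0; m < a.size(); ++m) {
        a[m] -= lr * 2.0 * (y - d) * VertexCoeff(w, x, m);
    }
    return (y - d) * (y - d);
}

// Hypothetical helper: trains the asymptote points of a two-input neuron on
// XOR (false = -1, true = +1) and returns them.
std::vector<double> TrainXor(int epochs, double lr) {
    const double in[4][2] = {{-1, -1}, {-1, 1}, {1, -1}, {1, 1}};
    const double out[4] = {-1, 1, 1, -1};
    std::vector<double> a(4, 0.0);   // asymptote points start at zero
    std::vector<double> w(2, 1.0);   // weights fixed at 1.0
    for (int e = 0; e < epochs; ++e) {
        for (int k = 0; k < 4; ++k) {
            LearnPoints(a, w, std::vector<double>(in[k], in[k] + 2), out[k], lr);
        }
    }
    return a;
}
```

Calling `TrainXor(50, 0.5)` and evaluating `ExitValue` on the four input pairs should reproduce the XOR targets closely. With fixed weights the exit value is linear in the asymptote points, so this reduced problem is an ordinary least-squares fit; training the weights as well requires the gradient (7.5).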