For the quadratic case J(x) = 1/2 (Ax,x) - (b,x) with linear constraints Bx = c:
L(x,p) = 1/2 (Ax,x) - (b,x) + (P, Bx - c), with P = (p1,...,pm)^t
Grad_x L = Ax - b + \sum_{i=1}^m p_i B_i (B_i: i-th row of B)
----------------------------------------------------------
Ex. 10, 11 and 12:
-first, find the solution after removing the constraint through variable elimination
-then, find the solution as a zero of Grad L(x,p) = 0 (finding the zero of a vector-valued function)
The resulting system is often nonlinear and a Newton-type method is needed to find its zeros.
With m equality constraints (E_j(x) = 0, j = 1,...,m), the Lagrangian is L(x,p) = J(x) + \sum_{j=1}^m p_j E_j(x) = J(x) + (p, E(x)),
with p in R^m
Check whether Grad(J(x)) and Grad(E(x)) are parallel, and more generally whether:
Grad(J(x)) in Span{Grad(E_i(x)), i=1,...,m}
Difference between Jacobian and Hessian:
J(x) : R^n -> R, Grad J(x) in R^n, Hess(J(x)) in R^{n x n} (symmetric matrix), and Jac(Grad(J(x))) = Hess(J(x))
But in general: x in R^n, F(x) in R^m, Jac(F(x)) in R^{m x n} (rectangular, not necessarily square)
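A minimal numerical check of this identity (a sketch on a quadratic J where the Hessian is known exactly; the test matrix and the step eps are arbitrary choices):

import numpy as np

n = 4
M = np.random.rand(n, n)
A = M + M.T                            # symmetric matrix: Hess J = A
b = np.ones(n)
grad = lambda x: A @ x - b             # Grad J(x) for J(x) = 1/2 (Ax,x) - (b,x)

x0, eps = np.random.rand(n), 1e-6
# finite-difference Jacobian of Grad J: one column per direction e_i
jac_fd = np.column_stack([(grad(x0 + eps*e) - grad(x0))/eps for e in np.eye(n)])
print(np.allclose(jac_fd, A, atol=1e-5))   # True: Jac(Grad(J)) = Hess(J) = A, an n x n symmetric matrix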
-----------------------------------------------------------
Quadratic minimization under linear constraint
Read POLY p. 358, and consider Ex. 44 to 46
---
Simple example:
J(x, y) = x**2 + 2 y**2 under constraint x + y = 1.
First eliminate a variable: y = 1 - x, j(x) = x**2 + 2 (1-x)**2 = 3 x**2 - 4x + 2
j'(x) = 0 => x = 2/3, y = 1 - 2/3 = 1/3
Check the solution with the primal-dual approach (sketched below):
express A, B, b, c in the primal-dual formulation
and find that p = -4/3
Same question with an additional constraint: x + 2y = -1
=> the intersection of the two constraints is the solution (two independent linear constraints in R^2 determine the point).
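A hedged sketch of this primal-dual check (A, B, b, c are read off from J and the constraint; the expected output is the analytic solution above):

import numpy as np

# J(x,y) = x^2 + 2 y^2 = 1/2 (A X, X) with A = diag(2, 4) and b = 0,
# constraint x + y = 1, i.e. B X = c with B = [1, 1], c = [1]
A = np.diag([2.0, 4.0]); b = np.zeros(2)
B = np.array([[1.0, 1.0]]); c = np.array([1.0])

Ainv = np.linalg.inv(A)
p = np.linalg.solve(B @ Ainv @ B.T, B @ Ainv @ b - c)   # dual variable
x = Ainv @ (b - B.T @ p)                                # primal solution
print(x, p)   # expected: x = [2/3, 1/3], p = [-4/3]
# With the additional constraint x + 2y = -1, add a second row to B and c.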
-------------------------------------------------------------
Which problem has been solved here ?
V=1000
S=10
# vector function: gradient of the Lagrangian, unknowns x = (x1, x2, x3, p1, p2)
import numpy as np

def fun(x):
    return [x[1] + x[2] + x[3]*x[2]*x[1] + x[4]*x[1],   # dL/dx1
            x[0] + x[2] + x[3]*x[0]*x[2] + x[4]*x[0],   # dL/dx2
            x[0] + x[1] + x[3]*x[0]*x[1],               # dL/dx3
            x[0]*x[1]*x[2] - V,                         # dL/dp1 : constraint C1 = 0
            x[0]*x[1] - S                               # dL/dp2 : constraint C2 = 0
            ]

# Jacobian of the vector function (second derivatives of the Lagrangian)
def jacobian(x):
    return np.array([[0.0,               1+x[3]*x[2]+x[4], 1+x[3]*x[1], x[2]*x[1], x[1]],
                     [1+x[3]*x[2]+x[4],  0.0,              1+x[3]*x[0], x[2]*x[0], x[0]],
                     [1+x[3]*x[1],       1+x[3]*x[0],      0.0,         x[0]*x[1], 0.0 ],
                     [x[1]*x[2],         x[0]*x[2],        x[0]*x[1],   0.0,       0.0 ],
                     [x[1],              x[0],             0.0,         0.0,       0.0 ]])
The problem solved is: min J(x) = x[0]*x[1] + x[0]*x[2] + x[1]*x[2] under the constraints x[0]*x[1]*x[2] = V and x[0]*x[1] = S.
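A possible driver for this system (assuming the fun/jacobian definitions and the numpy import above); scipy's root stands in for the Newton-type zero finder and the initial guess is an arbitrary choice:

from scipy.optimize import root

x_init = [3.0, 3.0, 100.0, -1.0, 1.0]     # guess for (x1, x2, x3, p1, p2)
sol = root(fun, x_init, jac=jacobian, method='hybr')
print(sol.x[:3])     # expected box sizes: about (sqrt(10), sqrt(10), V/S = 100)
print(fun(sol.x))    # residual of Grad L, should be ~0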
----------------------------------------------------------
read 16.6.3
Ex. 12 with two constraints:
consider an additional constraint: E_2(x)= x_1 * ( x_1 - 4 - x_2) -1 =0
L=J(x) + p1 E1(x) + p2 E2(x)
fun: Grad L
Jac: Jacobian of Grad L
1. Reduce to a single-variable minimization problem (find the solution by eliminating the constraints);
as E2 is nonlinear, this may be hard or impossible to achieve easily.
2. Build zero_vect_fct.py to solve the constrained problem in vector form.
3. Which constraint is more active at the optimum?
Analyze the sensitivity of the functional with respect to each of the constraints (by finite differences).
('solution=', array([-0.20376823, 0.70376823, 0.5, -0.5, -1.]))
('TEST Gradient Lagrangien NUL =', [1.375521918589584e-11, 1.0274003869881199e-12, -1.8640644583456378e-13, 0.0, 7.97961696719085e-12])
('J(x)=', 1.2500000000079796)
('TEST E1(x) constraint=', 0.0)
('TEST E2(x) constraint=', 7.97961696719085e-12)
Sensitivity analysis: J'_E1 = 1/2, J'_E2 = 1 (i.e. -p_1 and -p_2 from the solution above)
Check that at the optimum Grad J \in Span{Grad E1, Grad E2}, without using the information Grad L(x,p) = 0.
QR in Python --> Gram-Schmidt: orthonormalize {Grad E1, Grad E2} into {q1, q2}, then check whether
Grad J - (Grad J, q1) q1 - (Grad J, q2) q2 = 0 ?
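A sketch of this span test using numpy's QR factorization (the gradient vectors below are toy placeholders; in the exercise, use the gradients of J, E1 and E2 evaluated at the computed optimum):

import numpy as np

def grad_in_span(gJ, gE1, gE2, tol=1e-8):
    Q, _ = np.linalg.qr(np.column_stack([gE1, gE2]))   # orthonormal basis q1, q2
    residual = gJ - Q @ (Q.T @ gJ)                     # Grad J - (Grad J, q1) q1 - (Grad J, q2) q2
    return np.linalg.norm(residual) < tol

# toy check: gJ is a combination of gE1 and gE2, so the test must return True
gE1, gE2 = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, -1.0])
print(grad_in_span(2*gE1 - 0.5*gE2, gE1, gE2))   # True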
Compare solution with penalization method. Sample penalization parameters and find best tradeoff.
----------------------------------------------------------
17. q(x)=b(x,x) with b(x,y)=x1 y1 + x2 y2 + x3 y3 +1/2(x1 y2 + x2 y1)
b is symmetric and bilinear, therefore q(x) is a quadratic form.
q is positive since it is a sum of squares, q(x) = (x1 + x2/2)**2 + (3/4) x2**2 + x3**2, and definite since q(x) = 0 <=> x = 0.
A = [[1, 1/2, 0], [1/2, 1, 0], [0, 0, 1]], b(x,y) = x^t A y
[alpha, x] = J(x) for all x => alpha_1 = alpha_2 = 2/3, alpha_3 = 1
A continuous function attains its bounds on a compact set of R^n.
Cauchy-Schwarz:
-sqrt(q(alpha)) sqrt(q(x)) <= [x, alpha] <= sqrt(q(alpha)) sqrt(q(x))
q(x) = 1 => -sqrt(q(alpha)) <= [x, alpha] = J(x) = x1 + x2 + x3 <= sqrt(q(alpha))
q(alpha) = 4/9 + 4/9 + 9/9 + 1/2 (4/9 + 4/9) = 7/3 ~ 2.33
Equality in Cauchy-Schwarz holds iff x* = lambda alpha with lambda in R
x* is such that q(x*) = lambda^2 q(alpha) = 1 => lambda = +/- 1/sqrt(q(alpha)) => x* = +/- alpha/sqrt(q(alpha))
and the extreme values are J(x*) = +/- sqrt(q(alpha)) = +/- sqrt(7/3)
same for 16.
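A quick numerical cross-check of Ex. 17 (a sketch: it only verifies alpha, q(alpha) and the extreme value of J):

import numpy as np

A = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
alpha = np.linalg.solve(A, np.ones(3))     # alpha solves [alpha, x] = x1+x2+x3: expected (2/3, 2/3, 1)
q_alpha = alpha @ A @ alpha                # expected 7/3 ~ 2.33
x_star = alpha / np.sqrt(q_alpha)          # maximizer on {q(x) = 1}
print(alpha, q_alpha, x_star.sum())        # x_star.sum() = J(x*) = sqrt(7/3) ~ 1.53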
----------------------------------------------------------
Uzawa and primal-dual methods
-----------------------------------------------------------
Primal-dual problem (Ex. 44)
Solve min J(x) = 1/2 (Ax,x) - (b,x), with n = 100, A = Hilbert matrix, b = A·1 (the solution of the unconstrained minimization is x = (1,...,1)),
under the constraint ||x||_1 = 1, treated as the linear constraint B x = c (B a row of ones, c = 1)
-By variable elimination
-By Penalty
-By zero_vect_fct.py
-By Primal-Dual : requires A^{-1}; solve B A^{-1} B^T p = B A^{-1} b - c, then x = A^{-1} (b - B^T p)
-By Uzawa : min-max iterative method
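A sketch of the primal-dual route on this problem; n is kept small here because the Hilbert matrix is severely ill-conditioned, and linear solves replace the explicit inverse A^{-1}:

import numpy as np
from scipy.linalg import hilbert

n = 8                                     # increase toward 100 to see the effect of the conditioning
A = hilbert(n)
b = A @ np.ones(n)                        # unconstrained minimizer is x = (1,...,1)
B = np.ones((1, n)); c = np.array([1.0])  # constraint sum_i x_i = 1

AinvBT = np.linalg.solve(A, B.T)
Ainvb = np.linalg.solve(A, b)
p = np.linalg.solve(B @ AinvBT, B @ Ainvb - c)   # dual system B A^-1 B^T p = B A^-1 b - c
x = Ainvb - AinvBT @ p                           # primal solution x = A^-1 (b - B^T p)
print(x.sum(), 0.5*x @ A @ x - b @ x)            # constraint check and value of J(x)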
-----------------------------------------------------------
Solve the portfolio optimization problem in the Markowitz sense (Ex. 46) using the
primal-dual approach (POLY p. 358) for a portfolio with a minimum of 3 assets.
min_x Risk = J(x) = 1/2 (Ax,x) --> report x_opt = (x_1, x_2, x_3), the multipliers p_1, p_2 and J_opt for a given return R, under 2 constraints:
C_1(x)=x0+x1+x2-1=0
C_2(x)=r0*x0+r1*x1+r2*x2-R=0
at optimum, dJ/dC_i = -p_i
+ Box constraints: xmax(i) = 1, xmin(i) = 0
Plot the (return R / risk J) front with N = 20 points.
This requires the solution of N optimization problems with R \in [R_min, R_max],
with R_min = 0.01, R_max = 0.07 --> save the corresponding (R, J)_i, i = 1,...,N, and plot.
r_i and A come from historical data (expertise of the bank).
https://fr.wikipedia.org/wiki/Th%C3%A9orie_moderne_du_portefeuille#:~:text=La%20th%C3%A9orie%20moderne%20du%20portefeuille,au%20risque%20moyen%20du%20march%C3%A9.
Start with
ndim=3
A[i][j]=exp(-0.05*(i-j)**2)
ri=0.01, 0.02, 0.06
R=0.03
B = [[1,...,1], [r1,...,rn]] : 2 rows since there are two constraints
c = [1, R]^t (see the sketch below)
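A sketch of the (R, J) front sweep with these starting values, using the plain equality-constrained primal-dual formula (the box constraints 0 <= x_i <= 1 are ignored here; they require gradproj/Uzawa with projection):

import numpy as np
import matplotlib.pyplot as plt

ndim = 3
A = np.array([[np.exp(-0.05*(i - j)**2) for j in range(ndim)] for i in range(ndim)])
r = np.array([0.01, 0.02, 0.06])
B = np.vstack([np.ones(ndim), r])          # 2 rows = 2 constraints

front = []
for R in np.linspace(0.01, 0.07, 20):
    c = np.array([1.0, R])
    AinvBT = np.linalg.solve(A, B.T)
    p = np.linalg.solve(B @ AinvBT, -c)    # here b = 0 (pure risk term)
    x = -AinvBT @ p
    front.append((R, 0.5 * x @ A @ x))

R_vals, J_vals = zip(*front)
plt.plot(R_vals, J_vals, 'o-'); plt.xlabel('return R'); plt.ylabel('risk J'); plt.show()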
Then consider true values from:
https://www.centralcharts.com/fr/gm/1-apprendre/3-bourse/5-gestion-portefeuille/191-volatilite-d-un-portefeuille-boursier
Eliminate 2 variables, then use the primal-dual approach (with zero_vect_fct.py or gradproj.py).
Uzawa algorithm to avoid the inversion of A (A^{-1}):
Iterative minimization in x and maximization in p.
Adapt gradproj to solve general minimization under equality constraint (min in x / max in p)
and create uzawa.py iterating between primal and dual problems
-----------------------------------------------------------
Consider uzawa.py
-remove inversion of A
-introduce initialization for x and p
-introduce stability condition for rho
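A minimal sketch of such an iteration (an Arrow-Hurwicz variant of Uzawa: the exact minimization in x is replaced by a gradient step, so A is never inverted; the step sizes below are assumptions):

import numpy as np

def uzawa(A, b, B, c, rho_x=None, rho_p=0.1, n_iter=5000):
    x, p = np.zeros(A.shape[0]), np.zeros(B.shape[0])   # initialization for x and p
    if rho_x is None:
        rho_x = 1.0 / np.linalg.norm(A, 2)   # stability condition: rho_x < 2/lambda_max(A)
    for _ in range(n_iter):
        x -= rho_x * (A @ x - b + B.T @ p)   # descent step on Grad_x L
        p += rho_p * (B @ x - c)             # ascent step on Grad_p L
    return x, p

# small test on the simple example: J = x^2 + 2 y^2 under x + y = 1
A = np.diag([2.0, 4.0]); b = np.zeros(2)
B = np.array([[1.0, 1.0]]); c = np.array([1.0])
print(uzawa(A, b, B, c))   # expected x ~ (2/3, 1/3), p ~ -4/3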
-----------------------------------------------------------
----------------------------------------------------------
Optimization under inequality constraint
----------------------------------------------------------
Remark: an equality constraint = 2 inequality constraints: A = B <=> A <= B and A >= B
E(x) = 0 <=> E(x) <= 0 & -E(x) <= 0
Def: for a constraint E(x) <= 0: 1/ inactive if E(x) < 0, 2/ active if E(x) = 0 (on the boundary of the constraint set)
----------------------------------------------------------
Lagrange multipliers for inequality constraints are in R+ and not in R as for equality cstr.
KKT:
L(x,p) = J(x) + \sum_{i=1}^m p_i E_i(x) = J(x) + (P, E(x)), with p_i \in R^+
Grad_x L(x,p) = Grad_x J + \sum_{i=1}^m p_i Grad_x E_i(x) = 0
p_i E_i = 0 for i = 1,...,m (complementarity condition)
Either p_i = 0 when E_i < 0, or p_i > 0 when E_i = 0
with P = (p_1,...,p_m) and E = (E_1,...,E_m)
Before starting, always rewrite constraints given in the form E >= 0 as -E <= 0.
Ex: x >= 0 => -x <= 0
------------------------------
Ex 47 and 48 analytic and numerical solutions
------------------------------
-24. The parallelepiped defined by (X = a/sqrt(3), Y = b/sqrt(3), Z = c/sqrt(3)) has volume XYZ = abc/(3 sqrt(3)).
min -XYZ under the constraints: -X, -Y, -Z <= 0 and X**2/a**2 + Y**2/b**2 + Z**2/c**2 - 1 <= 0
L = -XYZ - p1 X - p2 Y - p3 Z + p4 (X**2/a**2 + Y**2/b**2 + Z**2/c**2 - 1)
Grad_x L = 0 => 3 equations:
-Y Z + p4 2 X/a^2 - p1 = 0   (multiply by X, resp. Y, Z below, and add: -3 XYZ + 2 p4 = 0 => p4 = (3/2) XYZ)
-X Z + p4 2 Y/b^2 - p2 = 0   (*Y)
-X Y + p4 2 Z/c^2 - p3 = 0   (*Z)
p4 (X^2/a^2 + Y^2/b^2 + Z^2/c^2 - 1) = 0   4th eq.
p1 X = 0   5th eq.
p2 Y = 0   6th eq.
p3 Z = 0   7th eq.
-YZ + 3 XYZ X/a^2 = 0 => X^2 = a^2/3 => X = a/sqrt(3), same for Y and Z, and p4 = abc/(2 sqrt(3)),
and p1 = p2 = p3 = 0 since -X, -Y, -Z < 0 (inactive constraints)
https://www.wolframalpha.com/calculators/system-equation-calculator
------------------------------
-25. Maximize x1 x2 (x2 - x1) under the constraints x1 + x2 = 8, x1 > 0, x2 > 0
<=>
minimize -x1 x2 (x2 - x1) under the constraints x1 + x2 = 8, -x1 < 0, -x2 < 0
solution: x_{1,2} = 4 (+/-) 4/sqrt(3)
L=-x1 x2 (x2 - x1) + p1 (x1+x2-8) - p2 x1 -p3 x2
=-x1 x2**2 + x1**2 x2 + p1 (x1+x2-8) - p2 x1 -p3 x2
Grad_x L = 0 (taking p2 = p3 = 0 since x1, x2 > 0 at the optimum, by complementarity):
-x2**2 + 2 x1 x2 + p1 = 0
-2 x1 x2 + x1**2 + p1 = 0
p1 (x1+x2-8)=0
p2 x1=0
p3 x2=0
x1, x2 = 4 (1 +/- 1/sqrt(3)) >= 0 and x1 + x2 = 8 (checked with Wolfram)
otherwise, subtract the two gradient equations to eliminate p1, then use the equality constraint to eliminate x2:
-x2**2 + 2 x1 x2 + 2 x1 x2 - x1**2 = 0
-(8-x1)**2 + 2 x1 (8-x1) + 2 x1 (8-x1) - x1**2 = 0
==> x1 is the solution of a 2nd-order equation, x2 = 8 - x1, p1 = x2**2 - 2 x1 x2
------------------------------
-27. Link between optimization and spectral analysis: eigenvalues and eigenvectors.
L(x,p) = -(Ax, x) + p ((x, x) - 1)
Grad_x L = -2 A x + 2 p x = 0 <=> A x = p x
<=> p is an eigenvalue of A and x an associated normalized eigenvector.
------------------------------
-15. Link between a convex set and the convex function defining its boundary (parabola first, then the general situation with g convex).
Let z=t x + (1-t) y, with x and y in Cg (therefore g(x)<=0 and g(y)<=0), 0 <= t <= 1
g(z) = g(t x + (1-t) y) <= t g(x) + (1-t) g(y) <= 0 as t and (1-t)>=0 and g(x) and g(y)<=0
Therefore z belongs to Cg => Cg is convex.
If g(x1, x2) = x1**2 - x2, g is convex as the eigenvalues of the Hessian matrix
(H = diag(2, 0)) are positive (or zero): g is convex, but not strictly.
The projection-onto-a-convex-set theorem allows us to look for the solution minimizing the distance under the inequality constraint:
given y in R2, we look for x in R2 minimizing J(x) = ||x - y||^2 = (x1 - y1)**2 + (x2 - y2)**2
s.t. x in Cg : x1**2 - x2 <= 0
A single Lagrange multiplier as there is only one constraint.
KKT cdt in each of the two cases give the solution in each case:
Lagrange multiplier zero (inactive constraint, y in C and x=y is solution),
Lagrange multiplier strictly positive (active constraint, projection on the boundary of C)
Geometry recall:
. Normal unit exterior vector to the boundary of C (the parabola x2 = x1**2):
t = (tx, ty)/xno with tx = 1, ty = 2*x1, xno = sqrt(tx**2 + ty**2)
n = (nx, ny) = (ty, -tx)/xno, with (t, n) = (tx*ty - ty*tx)/xno**2 = 0
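A sketch of this projection treating the two KKT cases explicitly; in the active case, stationarity 2(x - y) + p (2 x1, -1) = 0 with x2 = x1**2 reduces to the cubic 2 t^3 + (1 - 2 y2) t - y1 = 0 in t = x1, and p = 2 (x2 - y2) >= 0:

import numpy as np

def project_on_parabola(y):
    y1, y2 = y
    if y1**2 - y2 <= 0:                    # y already in C: inactive constraint, p = 0, x = y
        return np.array(y, dtype=float), 0.0
    roots = np.roots([2.0, 0.0, 1.0 - 2.0*y2, -y1])
    t = min((r.real for r in roots if abs(r.imag) < 1e-10),
            key=lambda t: (t - y1)**2 + (t**2 - y2)**2)
    x = np.array([t, t**2])                # projection on the boundary x2 = x1^2
    return x, 2.0*(x[1] - y2)              # p > 0: active constraint

print(project_on_parabola([0.0, -1.0]))    # expected x = (0, 0), p = 2
print(project_on_parabola([0.5, 2.0]))     # y in C: x = y, p = 0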
-----------------------------------------------------------
Ex. 50
------
Adapt gradproj to solve the constrained minimization problem.
Accounting for the constraint x_i <= x_{i+1}:
1. dynamic box constraints: xmin(i+1) = x^n(i) at each gradient step
2. introduce n-1 artificial variables (as in the simplex method) y_i >= 0 such that x2 = x1 + y2, x3 = x2 + y3, ... with fixed box constraints (see the sketch below).
The final solution is then: (x1, x2 = x1 + y2, x3 = x2 + y3, ...)
Test on the target vector a with a_i = (i+1)**2 for ndim = 50, starting from x0 = 0
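A sketch of option 2; gradproj.py is not reproduced here, so scipy's bounded L-BFGS-B plays the role of the projected gradient with the fixed box constraints on y:

import numpy as np
from scipy.optimize import minimize

ndim = 50
a = np.array([(i + 1)**2 for i in range(ndim)], dtype=float)   # target (already increasing)

# x = (y1, y1+y2, y1+y2+y3, ...) = cumsum(y); y_i >= 0 for i >= 2 enforces x_i <= x_{i+1}
J = lambda y: 0.5*np.sum((np.cumsum(y) - a)**2)
gradJ = lambda y: np.cumsum((np.cumsum(y) - a)[::-1])[::-1]    # chain rule through cumsum

bounds = [(None, None)] + [(0.0, None)]*(ndim - 1)             # y1 free, other y_i >= 0
res = minimize(J, np.zeros(ndim), jac=gradJ, bounds=bounds, method='L-BFGS-B')
x = np.cumsum(res.x)
print(np.abs(x - a).max())   # a is increasing, so x should essentially recover a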
see example of application in scikit-learn:
https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_4_0.html#sphx-glr-auto-examples-release-highlights-plot-release-highlights-1-4-0-py
-----------------------------------------------------------
Linear programming
min J(x) = c^t x under E(x) = A x - b = 0 and 0 <= x
see simplexe_slides_litislab.pdf & simplexe_slides_telecom.pdf (dictionaries)
---------------------------------------------------------------------------------------
code simplex.py and recover the results after putting the problem in standard form (with slack variables, 'variables d'ecart'):
canonical form:
min c^t x
A x <= b : change >= into <= by multiplying by -1
0 <= x
standard form:
min c^t x
A x + e = b : change <= into = by adding the slack variables e
0 <= x, e
which can be rewritten with X = (x, e), A <- [A, I] (illustrated below):
min c^t X
A X = b
0 <= X
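An illustration of this rewriting on hypothetical data (the data of Ex. 54 are not reproduced here), checked with scipy.optimize.linprog:

import numpy as np
from scipy.optimize import linprog

# canonical form: min c^t x  s.t.  A x <= b,  x >= 0
c = np.array([-3.0, -2.0])
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([10.0, 15.0])

# standard form: X = (x, e), [A, I] X = b, X >= 0 (e = slack variables)
A_std = np.hstack([A, np.eye(len(b))])
c_std = np.concatenate([c, np.zeros(len(b))])

res = linprog(c_std, A_eq=A_std, b_eq=b, bounds=[(0, None)]*A_std.shape[1])
x, e = res.x[:2], res.x[2:]
print(x, e)   # a slack e_i = 0 means constraint i is active at the optimum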
---------------------------------------------------------------------------------------
Simplex algorithm with dictionaries
1/ Initialization: x = 0, e = b > 0
IF b >= 0 (all b_i >= 0), the initialization x = 0, e = b is ADMISSIBLE
Basis variables (B) = e, non-basis variables (HB) = x
2/ Iteration/Progression
- Express the basis variables in terms of the non-basis variables: B / HB
- Entering variable xe (from HB) into the basis B: the most sensitive for J / the largest reduced cost / the largest partial derivative
- Exiting variable xs from the basis B: the basis variable that first reaches zero as xe increases (minimum-ratio rule)
3/ Stop when all reduced costs are positive (for a minimization problem) or negative (for a maximization problem)
4/ In case of a tie in 2/ between several candidate variables, choose the one with the smallest index (Bland's rule)
to guarantee that the algorithm terminates.
Ex. 54 (on R^3_+): detail the initialization and the iterations of the simplex algorithm using dictionaries, and use Python
linear programming to solve the problem.
What are the active constraints? (those whose slack variables ('variables d'ecart') equal 0)
---------------------------------------------------------------------------------------
A control problem involving a linear state equation
---------------------------------------------------------------------------------------
56. Optimization using the linearity of the state equation with respect to the controls (optimization variables).
Linear state equation with N source terms (controls); example ADRS: Advection-Diffusion-Reaction-Source equation (see M2 MANU)
-Plot the surface J(x1, x2, ...) by sampling the first two controls (the others being fixed).
-How can the expressions Aij and Bi then be computed numerically?
-Compare the optimal control obtained with the reference solution (on a mesh fine enough to serve as a reference).
/////////////////////////////////////////////////////
0. ADRS(xtarget = (1,2,3,4)) --> udes = u(xtarget) --> J(xtarget) = 1/2 ||u(xtarget) - udes||^2 = 0 by construction
1. ADRS (x) --> u(x) --> J(x)= 1/2 ||u(x)-udes||^2
2. Use a python minimizer to find xopt
3. Use the linearity of the ADRS equation to derive analytical solution
//////////////////////////////////////////////////////
Algorithm
---------
Inverse problem
define Xopt=(1,2,3,4) in R^4
define udes=u(Xopt) for example
min_x J(x) = 1/2 ||u(x) - udes||^2 -> do we recover X* = Xopt ?
X* such that Grad_x J(X*) = 0
-Find U0
-Find U1, U2, U3, U4 <- U(ei)
-Calculate: Grad_x J(u(x)) = Ax-b = 0
-Aij = (u_i, u_j)_L2
-bi = (udes - u0, u_i)_L2
-Invert A X* = b with x=(alpha_1,...,alpha_4)
Check whether u(X*) = udes by calculating J(X*) = 1/2 ||u(X*) - udes||^2 and whether Xopt = X*
////////////////////////////////////////////////////////
Gradient calculation
Grad_x J(u(x)) = Ax-b = 0
J = 1/2 h sum_{i=1}^{NX} (u0(i) + (sum_{j=1}^{nbc} x_j u_j(i)) - udes(i))**2
dJ/dx_k = h sum_{i=1}^{NX} (u0(i) + (sum_{j=1}^{nbc} x_j u_j(i)) - udes(i)) u_k(i), for k = 1,...,nbc
Setting dJ/dx_k = 0:
h sum_{i=1}^{NX} ((sum_{j=1}^{nbc} x_j u_j(i)) u_k(i)) = -h sum_{i=1}^{NX} (u0(i) - udes(i)) u_k(i) for k = 1,...,nbc
sum_{j=1}^{nbc} x_j (h sum_{i=1}^{NX} u_j(i) u_k(i)) = h sum_{i=1}^{NX} (udes(i) - u0(i)) u_k(i)
A x = b with A_kj = (u_j, u_k)_L2 and b_k = (udes - u0, u_k)_L2
see heat2d_pb_inverse.py
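A sketch of the A x = b assembly; a real ADRS solver is not reproduced here, so solve_state below is a stand-in linear model with exactly the structure u(x) = u0 + sum_j x_j u_j that the assembly relies on:

import numpy as np

NX, nbc, h = 101, 4, 1.0/100
rng = np.random.default_rng(0)
U = rng.normal(size=(NX, nbc))      # stand-in influence of each control on the state
u0_field = rng.normal(size=NX)      # stand-in state for x = 0

def solve_state(x):                 # replace by the actual ADRS solver
    return u0_field + U @ np.asarray(x, dtype=float)

Xopt = np.array([1.0, 2.0, 3.0, 4.0])
udes = solve_state(Xopt)            # synthetic target

u0 = solve_state(np.zeros(nbc))
uj = [solve_state(np.eye(nbc)[j]) - u0 for j in range(nbc)]   # u_j = u(e_j) - u0

A = np.array([[h*np.dot(uj[j], uj[k]) for j in range(nbc)] for k in range(nbc)])
b = np.array([h*np.dot(udes - u0, uj[k]) for k in range(nbc)])
Xstar = np.linalg.solve(A, b)
print(Xstar)                        # should recover Xopt = (1, 2, 3, 4)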
-----------------------------------------------------------
Ex. 57
FD1, FD2, CVM (AutoDiff direct and reverse modes)
Make your code parametric wrt epsilon for FD1,2
Plot the errors wrt epsilon, find optimal epsilon in FD
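A sketch of the epsilon study on a test function with a known derivative (first-order forward difference FD1 versus second-order central difference FD2):

import numpy as np
import matplotlib.pyplot as plt

f, df = np.sin, np.cos                   # test function and its exact derivative
x0 = 1.0
eps = np.logspace(-12, -1, 60)

err_fd1 = np.abs((f(x0 + eps) - f(x0))/eps - df(x0))
err_fd2 = np.abs((f(x0 + eps) - f(x0 - eps))/(2*eps) - df(x0))

plt.loglog(eps, err_fd1, label='FD1 (forward)')
plt.loglog(eps, err_fd2, label='FD2 (central)')
plt.xlabel('epsilon'); plt.ylabel('error'); plt.legend(); plt.show()
# the optimal epsilon is near sqrt(machine eps) ~ 1e-8 for FD1 and larger (~1e-5) for FD2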
-----------------------------------------------------------
Ex. 59
Having two functionals to minimize : J1(x1,x2), J2(x1,x2)
Multi-criteria optimization using the following approaches:
-Unconstrained optimization:
Build a composite function by convex combination :
J(x1,x2)= a1 J1(x1,x2) + a2 J2(x1,x2), a1+a2=1
One could initialize the next optimization problem with the solution of the
current one: do not start from scratch =>
reduces the number of functional and gradient evaluations.
-Constrained minimization :
min J1(x1,x2) under constraint J2(x1,x2) < J2(x10,x20)
making sure that one does not degrade the other functional
-Alternate constrained minimization and iterative Nash equilibrium:
x1^n = argmin_{x1} J1(x1, x2^{n-1}) under the constraint J2(x1^n, x2^{n-1}) < J2(x1^{n-1}, x2^{n-1})
x2^n = argmin_{x2} J2(x1^n, x2) under the constraint J1(x1^n, x2^n) < J1(x1^{n-1}, x2^{n-1})
converges toward the iterative Nash equilibrium.
2 KKT systems at each iteration, or penalty
-----------------------------------------------------------
Machine Learning with Python Handbook
https://jakevdp.github.io/PythonDataScienceHandbook/
-----------------------------------------------------------
Scikit-learn platform
-----------------------------------------------------------
We will use this Python toolbox to discover data manipulation and machine learning.
Install scikit-learn modules when necessary.
2 main classes of machine learning:
Supervised Learning: we have a labelled dataset
Unsupervised Learning: clustering
We will see classification (outputs are integers/class labels) and regression (outputs are real variables) problems.
The aim is to be able to reproduce the following chain:
dataset (dirty) --> analysis of the DB --> dimension reduction
--> dataset splitting for learning, validation and inference on the test DB (unlabelled)
--> trials of different ML algorithms and scoring
--> cross-validation (k-fold)
-----------------------------------------------------------
Database
-----------------------------------------------------------
The DB often comes in csv format.
Personal machine learning project:
Consider DB_Iris.csv and create iris_classif.py
What are the sizes of the input and output spaces?
How many scenarios are present in the DB?
Split the data into two subsets for learning and test:
https://machinelearningmastery.com/train-test-split-for-evaluating-machine-learning-algorithms/
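A sketch of the first steps of iris_classif.py; the column layout of DB_Iris.csv (4 feature columns followed by the class label) is an assumption to adapt to the actual file:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("DB_Iris.csv")
print(df.shape)                                      # number of scenarios and of columns
X, y = df.iloc[:, :4].values, df.iloc[:, 4].values   # input space R^4, output = class label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)             # 80% learning / 20% test
print(X_train.shape, X_test.shape)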
-----------------------------------------------------------
Y(150) = X(150,4) L(4), linear model: find L solving the normal equations (X^t X) L = X^t Y
-Linear Models (ex 40) for regression/classification.
-Read and plot csv data
-Learn and test with a linear model and different regularizations (Ridge (L2), Lasso (L1), ElasticNet).
-What are the MAE and R2 scores?
-Use an 80%-20% ratio for the learning/testing databases.
-Plot y_pred vs. y_test.
-How can Lasso be used for feature selection (identifying the most important features by looking at the largest coefficients)?
https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b
https://en.wikipedia.org/wiki/Lasso_(statistics)
-Proceed with cross-validation (present results for 10-fold cross-validation)
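A sketch comparing the regularized linear models with the requested scores and a 10-fold cross-validation; a synthetic regression dataset stands in for the course data (swap in the CSV read as above):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_absolute_error, r2_score

X, y = make_regression(n_samples=150, n_features=4, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

for model in [LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1), ElasticNet(alpha=0.1)]:
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cv = cross_val_score(model, X, y, cv=10, scoring='r2')   # 10-fold cross-validation
    print(type(model).__name__,
          'MAE =', mean_absolute_error(y_test, y_pred),
          'R2 =', r2_score(y_test, y_pred),
          'CV R2 =', cv.mean())
# Lasso (L1) drives small coefficients to zero: inspect model.coef_ for feature selection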
----------------------------------------------------
Adapt the classifier to these medical data (DB_diabetes):
https://www.youtube.com/watch?v=U1JIo8JSuYo
Compare the methods (LM, logistic regression, KNN, RF and SVM) for a medical classification
problem where one would like to estimate the probability for an individual to develop diabetes.
The learning database is pima-indians-diabetes.csv, mapping R^8 to {0,1}.
Discuss the different choices of the parameters in the models and report the best choices.
Present the confusion matrices for each method alongside a table comparing the methods' specificity, sensitivity and error,
and conclude which method performs best on this problem.
see a good discussion site for this problem with a few ML methods:
https://www.andreagrandi.it/2018/04/14/machine-learning-pima-indians-diabetes/
----------------------------------------------------
SVD/SVM/SVR/SVC (39): compression and data structuring of a 1024 x 1024 image into a vector of size n_svd (the 200 largest singular values)
Ex 39.
image -> b&w -> matrix -> SVD -> compression -> reconstructed image: find the right compression level
and compare 2 images using the cosine distance between their vectors of singular values
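A sketch of the SVD compression step; a random matrix stands in for the grayscale image, and n_svd = 200 follows the statement above:

import numpy as np

img = np.random.rand(1024, 1024)          # stand-in for the b&w image matrix
U, s, Vt = np.linalg.svd(img, full_matrices=False)

n_svd = 200                               # keep the 200 largest singular values
img_c = (U[:, :n_svd] * s[:n_svd]) @ Vt[:n_svd, :]
print(np.linalg.norm(img - img_c)/np.linalg.norm(img))   # relative reconstruction error

def cosine_distance(s1, s2):              # compare two images through their singular-value vectors
    return 1.0 - np.dot(s1, s2)/(np.linalg.norm(s1)*np.linalg.norm(s2))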
-SVM (Ex. 51).
https://machinelearningmastery.com/multi-output-regression-models-with-python/
For SVM, do we observe convergence problems?
See which optimization algorithm is used in your ML algorithms; is it possible to use a second-order optimization algorithm?
SVM(SVC/SVR): Detail Lagrangian and KKT conditions for the constrained optimization problem.
min J(w)= 1/2 ||w||^2 s.t. l_k (w^T x_k + b) > 0 for k = 1,...,3
------------------------------------------
Logistic regression: Detail all steps including the maximization of the likelihood (Ex. 41)
To apply logistic regression here we need to transform the data so that the outputs are 0 and 1 (probabilities).
Then one can fit the logit function to the data.
Interpret the beta_i once their values are found.
-Use gradproj to develop a logistic regression algorithm (Ex. 41).
Application to the DB_diabetes.csv database.
Use 50% of the DB for learning and 50% for testing.
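A sketch of logistic regression by likelihood maximization with a plain fixed-step gradient loop standing in for gradproj.py; a synthetic binary dataset replaces DB_diabetes.csv here (plug in the real 8-feature data with the 50/50 split):

import numpy as np

rng = np.random.default_rng(0)
n, d = 400, 8
X = rng.normal(size=(n, d))
beta_true = rng.normal(size=d)
y = (rng.random(n) < 1.0/(1.0 + np.exp(-X @ beta_true))).astype(float)   # labels in {0,1}

sigmoid = lambda t: 1.0/(1.0 + np.exp(-t))

# gradient of the (average) negative log-likelihood: X^T (sigmoid(X beta) - y) / n
beta, rho = np.zeros(d), 0.01
for _ in range(5000):
    beta -= rho * X.T @ (sigmoid(X @ beta) - y) / n
print(np.round(beta, 2))                  # should approach beta_true for enough samples

p_pred = sigmoid(X @ beta)                # predicted probability (e.g. of developing diabetes)
print(np.mean((p_pred > 0.5) == y))       # training accuracy of the fitted logit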
------------------------------------------
-KNN
https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/
----------------------------------------------------
-Modify the steepest descent iterations to implement a stochastic gradient:
from
x^{n+1} = x^n - rho * grad_x J
to
x^{n+1} = x^n - rho * d with d = (partial J / partial x_i) e_i, i randomly chosen in {1,...,n}
Compare the convergence and the angle between grad_x J and d; why is this still a descent method?
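A sketch of this modification on a simple quadratic; the cosine of the angle between grad_x J and d is |dJ/dx_i| / ||grad_x J|| >= 0, which is why -d remains a descent direction:

import numpy as np

A = np.diag(np.arange(1.0, 11.0))     # test problem: J = 1/2 (Ax,x) - (b,x)
b = np.ones(10)
grad = lambda x: A @ x - b

rng = np.random.default_rng(0)
x, rho = np.zeros(10), 0.05
for it in range(20000):
    g = grad(x)
    i = rng.integers(10)              # randomly chosen coordinate
    d = np.zeros(10); d[i] = g[i]     # d = (partial J / partial x_i) e_i
    x -= rho * d
    if it % 5000 == 0:
        print('cos(grad_x J, d) =', abs(g[i]) / (np.linalg.norm(g) + 1e-15))
print(np.linalg.norm(x - np.linalg.solve(A, b)))   # should be ~0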
-----------------------------------------------------------
----------------------------------------------------
Useful youtube videos on ML
----------------------------------------------------
linear and polynomial fits of data
https://www.youtube.com/watch?v=ro5ftxuD6is
reading csv with csv
https://www.youtube.com/watch?v=OBPjFnyxoCc
scattering and plotting
https://www.youtube.com/watch?v=t9BbYPn2nyw
reading csv with pandas
https://www.youtube.com/watch?v=Lh1dxgxk7dw
regression with sklearn
https://www.youtube.com/watch?v=cBCVvND5i9o
linear regression with sklearn
https://www.youtube.com/watch?v=NUXdtN1W1FE
Good page on LM
https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b
KNN regressor
https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/
----------------------------------------------------