6.4 Higher-Order Derivatives and Convexity

If a function \(f: D \to {\mathbb{R}}\) is differentiable, we can of course ask whether \(f':D \to {\mathbb{R}}\) is itself differentiable. This leads to the following (inductive) definition.

Definition 6.3 • Higher-order (continuous) differentiability, higher-order derivatives

Let \(D \subseteq {\mathbb{R}},\) \(f:D \to {\mathbb{R}},\) and \(k \in {\mathbb{N}}\setminus\{0,1\}.\)

  1. We set \(f^{(0)}:=f\) and \(f^{(1)}:=f'.\)

  2. If \(f^{(k-1)}:D \to {\mathbb{R}}\) exists and is differentiable in \(x_0 \in D,\) then \(f\) is called \(k\) times differentiable in \(x_0\) and \(f^{(k)}(x_0):=\left(f^{(k-1)}\right)'(x_0)\) is called the \(k\)-th derivative of \(f\) in \(x_0\).

  3. If \(f^{(k-1)}:D \to {\mathbb{R}}\) exists and is differentiable in all \(x_0 \in D,\) then \(f\) is called \(k\) times differentiable and \(f^{(k)}:=\left(f^{(k-1)}\right)'\) is called the \(k\)-th derivative of \(f\).

  4. If \(f^{(k)}\) exists and is continuous, then \(f\) is called \(k\) times continuously differentiable.

In the special cases \(k=2\) and \(k=3\) we also use the simpler notations \(f''(x_0) := f^{(2)}(x_0)\) and \(f'''(x_0) := f^{(3)}(x_0)\) as well as \(f'' := f^{(2)}\) and \(f''' := f^{(3)}.\)

Remark 6.14

Let \(D \subset {\mathbb{R}}\) and \(f:D \to {\mathbb{R}}.\)

  1. For \(f\) to be twice differentiable in \(x_0,\) one already has to assume that \(f'\) exists not only in \(x_0\) but on all of \(D\) (or at least on an \(\varepsilon\)-neighborhood of \(x_0\)).

  2. If \(f\) is differentiable, it is not guaranteed that \(f'\) is again differentiable or even continuous. For instance, the function \(g:{\mathbb{R}}\to {\mathbb{R}},\) \(x \mapsto x \cdot |x|\) is differentiable with \(g':{\mathbb{R}}\to {\mathbb{R}},\) \(x \mapsto 2|x|\) (exercise!). But \(g'\) is not differentiable in \(x_0=0.\)

  3. Moreover, the function \[h: {\mathbb{R}}\to {\mathbb{R}}, \quad x \mapsto \begin{cases} x^2\cdot \sin \frac{1}{x}, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0 \end{cases}\] is differentiable on all of \({\mathbb{R}}\) with \[h': {\mathbb{R}}\to {\mathbb{R}}, \quad x \mapsto \begin{cases} 2x\cdot \sin \frac{1}{x} -\cos \frac{1}{x}, & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}\] But \(h'\) is not continuous in \(x_0 = 0\) (exercise!); the numerical sketch below also illustrates this.
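The behaviour of \(h'\) near \(0\) can be observed numerically: along the points \(x_m = \frac{1}{m\pi}\) the values \(h'(x_m)\) alternate between values close to \(+1\) and \(-1\) while \(x_m \to 0,\) so they cannot converge to \(h'(0) = 0.\) A minimal Python sketch (the sample points are an arbitrary illustrative choice):

```python
import math

def h_prime(x: float) -> float:
    # Derivative of h(x) = x^2 * sin(1/x) for x != 0; h'(0) = 0 by definition.
    return 2 * x * math.sin(1 / x) - math.cos(1 / x) if x != 0 else 0.0

# Along x_m = 1/(m*pi) the sine term vanishes and -cos(m*pi) = -(-1)^m,
# so h'(x_m) alternates near +1 and -1 while x_m -> 0.
for m in range(1, 6):
    x = 1 / (m * math.pi)
    print(f"x = {x:.5f}, h'(x) = {h_prime(x):+.5f}")
```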

The following notations are standard for the sets of \(k\) times (continuously) differentiable functions.

Definition 6.4 • Spaces of k-times continuously differentiable functions

Let \(D \subseteq {\mathbb{R}}.\) We set \[\begin{aligned} C(D):= C^0(D) &:= \left\{ f:D \to {\mathbb{R}}\; | \; f \text{ is continuous} \right\}, \\ C^k(D) &:= \left\{ f:D \to {\mathbb{R}}\; | \; f \text{ is $k$ times continuously differentiable} \right\}, \\ C^\infty(D) &:= \left\{ f:D \to {\mathbb{R}}\; | \; f \text{ is arbitrarily often differentiable} \right\}. \end{aligned}\]

In the definition of \(C^\infty(D)\) it is not necessary to impose the continuous differentiability since it follows automatically from the differentiability of each derivative.

While the first derivative of a function can be interpreted as the slope of its graph, the second derivative has an interpretation as “curvature”. We will not discuss this in detail here but we will introduce the related term of convexity. Convexity is of utmost importance in optimization since convex functions have very desirable existence and uniqueness properties for their global minimizers and there exist very efficient algorithms to compute these. You will learn more about this in module M16: “Optimization and Machine Learning”.

Definition 6.5 • Convex function, concave function

Let \(I \subseteq {\mathbb{R}}\) be an interval and \(f:I \to {\mathbb{R}}.\)

  1. The function \(f\) is called convex, if for all \(x_1,\,x_2 \in I\) and all \(\lambda \in [0,1]\) it holds that \[f((1-\lambda)x_1 + \lambda x_2) \le (1-\lambda) f(x_1) + \lambda f(x_2).\]

  2. The function \(f\) is called concave, if \(-f\) is convex.

Figure 6.8: Graphs of a convex and nonconvex function

The term “convex” can be explained by the left figure in Figure 6.8. For \(\lambda \in [0,1],\) the points \[\left((1-\lambda) x_1 + \lambda x_2, (1-\lambda) f(x_1) + \lambda f(x_2) \right)\] form the connecting line between the points \((x_1,f(x_1))\) and \((x_2,f(x_2)).\) A function is thus convex if and only if its graph lies on or below the connecting line between any two points on the graph. This results in a “convex shape” of the graph. A nonconvex function is illustrated in the right figure in Figure 6.8, where the graph has no “convex shape”.
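The defining inequality can be tested directly on sample points. The following minimal Python sketch checks it on a finite grid of points \(x_1,\,x_2\) and values \(\lambda\); this is only a necessary check on the chosen samples, not a proof of convexity, and the grid and test functions are arbitrary illustrative choices:

```python
def is_convex_on_samples(f, xs, num_lambdas=11):
    """Check f((1-l)*x1 + l*x2) <= (1-l)*f(x1) + l*f(x2) on a finite grid."""
    lambdas = [k / (num_lambdas - 1) for k in range(num_lambdas)]
    for x1 in xs:
        for x2 in xs:
            for lam in lambdas:
                lhs = f((1 - lam) * x1 + lam * x2)
                rhs = (1 - lam) * f(x1) + lam * f(x2)
                if lhs > rhs + 1e-12:   # small tolerance for rounding errors
                    return False
    return True

xs = [i / 10 for i in range(-30, 31)]                  # grid on [-3, 3]
print(is_convex_on_samples(lambda x: x * x, xs))       # True: x^2 is convex
print(is_convex_on_samples(lambda x: x ** 3, xs))      # False: x^3 is not convex on [-3, 3]
```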

Theorem 6.17

Let \(I \subseteq {\mathbb{R}}\) be an interval and \(f:I \to {\mathbb{R}}\) be differentiable. Then the following statements are satisfied:

  1. The function \(f\) is convex, if and only if \(f'\) is monotonically increasing.

  2. If \(f\) is twice differentiable, then \(f\) is convex, if and only if \(f'' \ge 0,\) i. e., \(f''(x) \ge 0\) for all \(x \in I.\)

Proof.

  1. “\(\Longrightarrow\)”: Let \(x_1,\,x_2 \in I\) with \(x_1 < x_2.\) We have to show that \(f'(x_1) \le f'(x_2).\) Let \(x \in (x_1,\,x_2).\) Then there exists a \(\lambda \in (0,1)\) such that \(x = (1-\lambda)x_1 + \lambda x_2.\) (This follows from an application of the intermediate value theorem (Theorem 4.7) to the function \(t \mapsto (1-t)x_1 + tx_2.\)) Then we have \[\tag{6.7} x-x_1 = \lambda (x_2-x_1) \quad \text{and} \quad x_2-x = (1-\lambda)(x_2-x_1).\] Using the convexity of \(f\) and (6.7) twice we obtain \[\begin{aligned} \frac{f(x)-f(x_1)}{x-x_1} &\le \frac{(1-\lambda)f(x_1) + \lambda f(x_2) -f(x_1)}{\lambda(x_2-x_1)} \\ &= \frac{f(x_2) -f(x_1)}{x_2-x_1} \\ &= \frac{f(x_2) - ((1-\lambda)f(x_1) + \lambda f(x_2))}{(1-\lambda)(x_2-x_1)} \le \frac{f(x_2) - f(x)}{x_2-x}. \end{aligned}\] Since for every \(x \in (x_1,x_2)\) both outer difference quotients are thus compared with the fixed middle quotient \(\frac{f(x_2)-f(x_1)}{x_2-x_1},\) passing to the limits yields \[f'(x_1) = \lim_{x \searrow x_1} \frac{f(x)-f(x_1)}{x-x_1} \le \frac{f(x_2)-f(x_1)}{x_2-x_1} \le \lim_{x \nearrow x_2} \frac{f(x_2)-f(x)}{x_2-x} = f'(x_2).\] Hence, \(f'\) is monotonically increasing.
    “\(\Longleftarrow\)”: Let \(f'\) be monotonically increasing and let \(x_1,\,x_2 \in I\) with \(x_1 < x_2,\) \(\lambda \in [0,1],\) and \(x:=(1-\lambda)x_1+\lambda x_2.\) We have to show that \(f(x) \le (1-\lambda)f(x_1) + \lambda f(x_2).\) For \(\lambda \in \{0,1\}\) this is obviously true. Therefore, assume that \(\lambda \in (0,1).\) By the mean value theorem (Theorem 6.10) there exist \(\xi_1 \in (x_1,x)\) and \(\xi_2 \in (x,x_2)\) such that \[\begin{aligned} f'(\xi_1)& = \frac{f(x)-f(x_1)}{x-x_1} = \frac{f(x)-f(x_1)}{\lambda(x_2-x_1)} \quad \text{and} \\ f'(\xi_2) &= \frac{f(x_2)-f(x)}{x_2-x} = \frac{f(x_2)-f(x)}{(1-\lambda)(x_2-x_1)}, \end{aligned}\] where we have again used (6.7). Since \(\xi_1 < \xi_2\) and \(f'\) is monotonically increasing, we obtain \[\frac{f(x)-f(x_1)}{\lambda(x_2-x_1)} = f'(\xi_1) \le f'(\xi_2) = \frac{f(x_2)-f(x)}{(1-\lambda)(x_2-x_1)}.\] A multiplication of the latter inequality with the positive constant \(\lambda(1-\lambda)(x_2-x_1)\) yields \[(1-\lambda)(f(x)-f(x_1)) \le \lambda (f(x_2)-f(x)),\] or equivalently, \(f(x) \le (1-\lambda)f(x_1) + \lambda f(x_2).\) Therefore, \(f\) is convex.

  2. This statement follows directly from i together with the monotonicity criterion in Theorem 6.14.

Example 6.12

  1. The exponential function \(\exp:{\mathbb{R}}\to {\mathbb{R}}\) is convex since \(\exp''(x)=\exp(x) > 0\) for all \(x \in {\mathbb{R}}.\)

  2. The function \(f:{\mathbb{R}}\to {\mathbb{R}},\) \(x \mapsto x^2\) is convex since \(f''(x) = 2 > 0\) for all \(x \in {\mathbb{R}}.\)
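The criterion \(f'' \ge 0\) from Theorem 6.17 ii used in this example can also be checked numerically by approximating \(f''\) with the central difference quotient \(\frac{f(x+h)-2f(x)+f(x-h)}{h^2}.\) A minimal Python sketch (step size and grid are arbitrary choices, and the check is heuristic, not a proof):

```python
import math

def second_diff(f, x, h=1e-5):
    # Central difference approximation of f''(x).
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

grid = [i / 10 for i in range(-50, 51)]   # sample points in [-5, 5]

print(all(second_diff(math.exp, x) >= 0 for x in grid))        # exp'' = exp > 0
print(all(second_diff(lambda t: t * t, x) >= 0 for x in grid))  # (x^2)'' = 2 > 0
print(all(second_diff(math.sin, x) >= 0 for x in grid))         # sin is not convex on [-5, 5]
```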

The next theorem illustrates the importance of convexity in optimization.

Theorem 6.18

Let \(I \subseteq {\mathbb{R}}\) be an interval and let \(f:I \to {\mathbb{R}}\) be convex. If \(f\) has a local minimum in \(x_0 \in I,\) then \(f\) even has its global minimum in \(x_0.\)

Proof. If \(f\) has a local minimum at \(x_0 \in I,\) then by definition there exists an \(\varepsilon > 0\) such that \[f(x_0) \le f(x) \quad \text{for all } x \in I \cap U_{\varepsilon}(x_0).\] Assume that \(f(x_0)\) is not the global minimum of \(f.\) Then there exists a \(y \in I\) such that \(f(y) < f(x_0).\) Set \(x_\lambda:=(1-\lambda)x_0 + \lambda y\) for \(\lambda \in (0,1).\) Then because of the convexity of \(f\) we have \[f\left(x_\lambda\right) \le (1-\lambda)f(x_0) + \lambda f(y) < f(x_0).\] But this is a contradiction, since for sufficiently small \(\lambda \in (0,1)\) we have \(x_\lambda \in I \cap U_\varepsilon(x_0),\) and so \(x_0\) would not be a local minimizer. Hence, \(f(x_0)\) must already be the global minimum.
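Theorem 6.18 is the reason why simple local search methods are reliable for convex functions: any local minimizer they find is automatically a global one. The following minimal Python sketch of a ternary search on an interval illustrates this; the test function and tolerance are arbitrary illustrative choices:

```python
def ternary_search_min(f, a, b, tol=1e-9):
    """Locate a minimizer of a convex function f on [a, b].
    By Theorem 6.18 the local minimizer found this way is a global one."""
    while b - a > tol:
        m1 = a + (b - a) / 3
        m2 = b - (b - a) / 3
        if f(m1) <= f(m2):
            b = m2          # a minimizer lies in [a, m2]
        else:
            a = m1          # a minimizer lies in [m1, b]
    return (a + b) / 2

# Example: f(x) = (x - 2)^2 + 1 is convex with global minimizer x = 2.
print(ternary_search_min(lambda x: (x - 2) ** 2 + 1, -10, 10))  # approx. 2.0
```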

6.5 Taylor’s Theorem

Let \(D \subseteq {\mathbb{R}}\) and \(f:D \to {\mathbb{R}}\) be sufficiently often differentiable. In Section 6.1 we interpreted the tangent \[t:{\mathbb{R}}\to {\mathbb{R}}, \quad x \mapsto f(x_0) + f'(x_0)(x-x_0)\] as the “best linear approximation” for \(f\) in \(x_0 \in D\) in the sense that for the error \(R(x):=f(x)-t(x)\) it holds that \[\lim_{x \to x_0} \frac{R(x)}{x-x_0} = 0.\] Linear functions are polynomial functions of degree at most \(1.\) Therefore, one may ask whether \(f\) can be approximated in \(x_0\) by polynomial functions of higher degree in order to obtain a better approximation. The question is how we would determine such a polynomial function. If we consider the linear approximation by the tangent \(t\) we see that it has the property \(t(x_0) = f(x_0)\) and \(t'(x_0) = f'(x_0).\) So the idea is to construct an approximating polynomial function \(p\) such that \(p^{(k)}(x_0) = f^{(k)}(x_0)\) for as many \(k \in {\mathbb{N}}\) as possible.

Let \(p:{\mathbb{R}}\to {\mathbb{R}}\) be a polynomial function of degree \(n\) that we can write in the form \[p(x) = \sum_{k=0}^n a_k(x-x_0)^k \quad \text{for all } x \in {\mathbb{R}}.\] In this form, the coefficients \(a_0,\,\ldots,\,a_n \in {\mathbb{R}}\) can be expressed by the derivatives of \(p\) in \(x_0\) in an elegant way. This can be seen by plugging the argument \(x_0\) into \(p\) and its derivatives. We obtain \(p(x_0) = a_0\) and, inductively, for \(j \in {\mathbb{N}}\) with \(1 \le j \le n\) we obtain \[\begin{alignedat}{3} p'(x) &= \sum_{k=1}^n ka_k(x-x_0)^{k-1} && \quad \Longrightarrow \quad p'(x_0) = a_1, \\ p''(x) &= \sum_{k=2}^n k(k-1)a_k(x-x_0)^{k-2} && \quad \Longrightarrow \quad p''(x_0) = 2a_2, \\ & \ldots && \\ p^{(j)}(x) &= \sum_{k=j}^n k(k-1)\ldots(k-j+1)a_k(x-x_0)^{k-j} && \quad \Longrightarrow \quad p^{(j)}(x_0) = j!\cdot a_j. \end{alignedat}\] Summarizing, we have \(p^{(k)}(x_0) = k!\cdot a_k\) for \(k=0,\,1,\,\ldots,\,n\) and \[p(x) = \sum_{k=0}^n \frac{p^{(k)}(x_0)}{k!}(x-x_0)^k.\] In other words, a polynomial function of degree \(n\) is already uniquely determined by its function value and the values of its first \(n\) derivatives in \(x_0.\) Our preliminary considerations thus suggest defining the approximating polynomial function by the requirement that \[p^{(k)}(x_0) = f^{(k)}(x_0), \quad k=0,\,\ldots,\,n.\] This motivates the following definition of the Taylor polynomial, named after the English mathematician Brook Taylor.
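The identity \(p^{(k)}(x_0) = k!\cdot a_k\) can be verified mechanically: represent \(p\) by its coefficients with respect to the powers \((x-x_0)^k,\) differentiate by rescaling and shifting the coefficient list, and read off the constant term. A minimal Python sketch with arbitrarily chosen coefficients:

```python
from math import factorial

def derivative_coeffs(a):
    # If p(x) = sum_k a[k] * (x - x0)^k, then
    # p'(x) = sum_k k * a[k] * (x - x0)^(k-1).
    return [k * a[k] for k in range(1, len(a))]

a = [3.0, -1.0, 0.5, 2.0, -4.0]   # p(x) = 3 - (x-x0) + 0.5(x-x0)^2 + 2(x-x0)^3 - 4(x-x0)^4
coeffs = a
for k in range(len(a)):
    # Evaluating the k-th derivative at x = x0 picks out its constant term.
    print(k, coeffs[0], "==", factorial(k) * a[k])
    coeffs = derivative_coeffs(coeffs)
```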

Definition 6.6 • Taylor polynomial, remainder term

Let \(I \subseteq {\mathbb{R}}\) be an interval, \(x_0 \in I,\) and \(f:I \to {\mathbb{R}}\) an \(n\) times differentiable function.

  1. The polynomial function \[T_n :{\mathbb{R}}\to {\mathbb{R}}, \quad x \mapsto \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x-x_0)^k\] is called the \(n\)-th Taylor polynomial of \(f\) in the expansion point \(x_0\).

  2. The function \(R_n:I \to {\mathbb{R}},\) \(x \mapsto f(x) - T_n(x)\) is the remainder term corresponding to \(T_n.\)

Remark 6.15

With the notations and under the assumptions of Definition 6.6 we have \(T_n^{(k)}(x_0) = f^{(k)}(x_0)\) for all \(k = 0,\,1,\,\ldots,\,n.\)

Example 6.13

We consider the Taylor polynomial of the exponential function \(\exp:{\mathbb{R}}\to {\mathbb{R}}\) in the expansion point \(x_0 = 0.\) For all \(x \in {\mathbb{R}}\) we have \[T_n(x) = \sum_{k=0}^n \frac{\exp^{(k)}(0)}{k!} (x-0)^k = \sum_{k=0}^n \frac{1}{k!}x^k.\] This is exactly the \(n\)-th partial sum of the exponential series. We know that for all \(x \in {\mathbb{R}}\) we have \[\exp(x) = \sum_{k=0}^\infty \frac{x^k}{k!} = \lim_{n \to \infty} T_n(x).\] For this reason we also call the involved series the Taylor series.

Analogously we obtain the \(n\)-th Taylor polynomial of the sine and cosine functions in the expansion point \(x_0 = 0\) as the \(n\)-th partial sums of the respective series derived in Theorem 5.8 (exercise!).
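Numerically, the Taylor polynomials of \(\exp\) at \(x_0 = 0\) converge quickly for moderate arguments. A minimal Python sketch comparing \(T_n(1)\) with \(\exp(1)\):

```python
import math

def taylor_exp(x, n):
    # n-th Taylor polynomial of exp at x0 = 0: sum_{k=0}^{n} x^k / k!
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

for n in (2, 4, 8, 12):
    approx = taylor_exp(1.0, n)
    print(n, approx, abs(approx - math.exp(1.0)))   # error shrinks rapidly with n
```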

Theorem 6.19 • Taylor’s theorem

Let \(I \subseteq {\mathbb{R}}\) be an interval, \(x_0 \in I,\) and \(f:I \to {\mathbb{R}}\) \(n\) times differentiable in \(x_0,\) where \(n \ge 1.\) Then the following statements are satisfied:

  1. For all \(x \in I\) it holds that \[\begin{alignedat}{3} f(x) &= \sum_{k=0}^n \frac{f^{(k)}(x_0)}{k!}(x-x_0)^k + R_n(x) \quad \text{(Taylor's formula)}, \end{alignedat}\] where \(\lim_{x \to x_0} \frac{R_n(x)}{(x-x_0)^n} = 0.\)

  2. If \(f\) is even \(n+1\) times differentiable, then for every \(x \in I\setminus\{x_0\}\) there exists a \(\xi\) between \(x\) and \(x_0\) with \[\begin{alignedat}{3} R_{n}(x) &= \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1} \quad \text{(Lagrange remainder)}. \end{alignedat}\]

Proof.

  1. By \(T_n\) we denote the \(n\)-th Taylor polynomial of \(f\) in the expansion point \(x_0.\) By applying L’Hôpital’s rule (Theorem 6.16) \(n-1\) times we obtain \[\begin{aligned} \lim_{x \to x_0} \frac{R_n(x)}{(x-x_0)^n} &= \lim_{x \to x_0} \frac{f(x) - T_n(x)}{(x-x_0)^n} \\ &= \lim_{x \to x_0} \frac{f'(x) - T_n'(x)}{n(x-x_0)^{n-1}} \\ &= \lim_{x \to x_0} \frac{f''(x) - T_n''(x)}{n(n-1)(x-x_0)^{n-2}} \\ & \ldots \\ &= \lim_{x \to x_0} \frac{f^{(n-1)}(x) - T_n^{(n-1)}(x)}{n!(x-x_0)}, \end{aligned}\] under the assumption that the last limit of this equation sequence exists. (Note that we cannot proceed by applying L’Hôpital’s rule once more: We cannot ensure that the limit \(\lim_{x \to x_0} \frac{f^{(n)}(x) - T_n^{(n)}(x)}{n!}\) exists, since we do not assume that \(f^{(n)}\) is continuous in \(x_0.\)) We will now prove the existence of this limit. Because of \(f^{(n-1)}(x_0) = T_n^{(n-1)}(x_0)\) we obtain \[\begin{aligned} &\lim_{x \to x_0} \frac{f^{(n-1)}(x) - T_n^{(n-1)}(x)}{n!(x-x_0)} \\ =& \frac{1}{n!} \lim_{x \to x_0}\left( \frac{f^{(n-1)}(x) - f^{(n-1)}(x_0)}{x-x_0} - \frac{T_n^{(n-1)}(x) - T_n^{(n-1)}(x_0)}{x-x_0} \right) \\ =& \frac{1}{n!}\left( f^{(n)}(x_0) - T_n^{(n)}(x_0) \right) = 0. \end{aligned}\] The existence of \(T_n^{(n)}\) and \(f^{(n)}\) follows from the fact that \(T_n\) is a polynomial function and since \(f\) is \(n\) times differentiable. Hence, \(\lim_{x \to x_0} \frac{R_n(x)}{(x-x_0)^n} = 0.\)

  2. Now assume that \(f\) is even \(n+1\) times differentiable and let \(x \in I \setminus \{x_0\}\) be arbitrary but fixed. We define the auxiliary function \[h:I \to {\mathbb{R}}, \quad t \mapsto \frac{f(t) - T_n(t)}{(n+1)!}(x-x_0)^{n+1} - R_n(x) \frac{(t-x_0)^{n+1}}{(n+1)!}.\] Since \(T_n\) is a polynomial function of degree at most \(n,\) we have \(T_n^{(n+1)} = 0.\) This implies \[h^{(n+1)}(t) = \frac{f^{(n+1)}(t)}{(n+1)!}(x-x_0)^{n+1} - R_n(x).\] Thus the claim is proven if we can show that \(h^{(n+1)}\) has a zero in \(I(x,x_0) = (x,x_0) \cup (x_0,x).\) It holds that \[\begin{aligned} 0 &= h(x_0) = \ldots = h^{(n)}(x_0) \quad \text{and} \\ 0 &= h(x) = (f(x)-T_n(x) - R_n(x)) \frac{(x-x_0)^{n+1}}{(n+1)!}. \end{aligned}\] Hence, by Rolle’s theorem (Theorem 6.9) we obtain the following chain of implications: \[\begin{aligned} & \text{There exists a $\xi_1 \in I(x,x_0)$ with $h'(\xi_1) = 0.$} \\ \Longrightarrow \quad & \text{There exists a $\xi_2 \in I(\xi_1,x_0)$ with $h''(\xi_2) = 0.$} \\ & \ldots \\ \Longrightarrow \quad & \text{There exists a $\xi_{n+1} \in I(\xi_{n},x_0)$ with $h^{(n+1)}(\xi_{n+1}) = 0.$} \end{aligned}\] If we choose \(\xi = \xi_{n+1},\) then it holds that \(\xi \in I(x,x_0)\) and \(R_n(x) = \frac{f^{(n+1)}(\xi)}{(n+1)!}(x-x_0)^{n+1}.\)

Video 6.13. Taylor’s theorem.

Remark 6.16

With the notations and under the assumptions of Theorem 6.19, for the special case \(n=1\) we obtain for all \(x \in I\) that \(f(x) = T_1(x) + R_1(x),\) where \[T_1(x) = f(x_0) + f'(x_0)(x-x_0)\quad \text{and} \quad \lim_{x \to x_0} \frac{R_1(x)}{x-x_0} = 0.\] This corresponds exactly to our initial considerations on differentiability and \(T_1\) is the tangent to the graph of \(f\) at the point \(x_0.\) Further, for \(n = 0,\) with the Lagrange remainder we can recover the mean value theorem (exercise!).
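This behaviour can be observed numerically: for a sample function such as \(\exp\) with \(x_0 = 0,\) the tangent error divided by \(x - x_0\) tends to \(0\) as \(x \to x_0.\) A minimal Python sketch (the sample function and evaluation points are arbitrary choices):

```python
import math

x0 = 0.0

def tangent(x):
    # T_1 for exp at x0 = 0: exp(x0) + exp(x0) * (x - x0) = 1 + x
    return math.exp(x0) + math.exp(x0) * (x - x0)

for x in (0.1, 0.01, 0.001, 0.0001):
    R1 = math.exp(x) - tangent(x)
    print(x, R1 / (x - x0))   # tends to 0, roughly like x/2 for exp
```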

Example 6.14

We have already seen in Example 6.13 that the Taylor polynomials of the exponential, sine, and cosine functions in the expansion point \(x_0 = 0\) correspond to the partial sums of the respective series representations. So in this example we will try to derive a series representation of the logarithm function \(\ln:(0,\infty) \to {\mathbb{R}}.\) A natural expansion point is \(x_0 = 1,\) since there the function value and the derivatives can be easily expressed. On the other hand, the expansion point \(x_0 = 0\) would give the simpler factors \(x^k\) instead of \((x-x_0)^k\) in the series representation. Since the logarithm is not defined at \(x_0 = 0,\) we consider the function \(f:(-1,\infty) \to {\mathbb{R}},\) \(x \mapsto \ln(x+1)\) at the expansion point \(x_0 = 0\) instead. Inductively we obtain the derivatives of \(f\) as \[f^{(n)}(x) = (-1)^{n-1}\cdot(n-1)!\cdot(1+x)^{-n}\] for all \(x \in (-1,\infty)\) and all \(n \in {\mathbb{N}}\setminus\{0\}\) (exercise!). For \(x_0 = 0\) and \(n \in {\mathbb{N}}\setminus\{0\}\) we obtain \[f^{(n)}(0) = (-1)^{n-1}\cdot(n-1)!.\] Moreover, \(f(0) = \ln 1 = 0\) and hence, the \(n\)-th Taylor polynomial of \(f\) in the expansion point \(x_0 = 0\) takes the form \[T_n(x) = \sum_{k=1}^n \frac{(-1)^{k-1}\cdot(k-1)!}{k!}(x-0)^k = \sum_{k=1}^n (-1)^{k-1} \frac{1}{k}x^k.\] For the remainder term, with Theorem 6.19 we see that for each \(x \in (-1,\infty)\setminus\{0\}\) there exists a \(\xi_{n+1} \in I(x,0)\) such that \[R_n(x) = \ln(1+x) - T_n(x) = \frac{f^{(n+1)}(\xi_{n+1})}{(n+1)!}x^{n+1} = (-1)^n\frac{1}{n+1}\cdot \frac{1}{(1+\xi_{n+1})^{n+1}}x^{n+1}.\] For the case \(x = 1\) this simplifies to \[R_n(1) = \frac{(-1)^n}{n+1}\cdot\frac{1}{(1+\xi_{n+1})^{n+1}}.\] Since \(\xi_{n+1}\) lies between \(0\) and \(1,\) the sequence \(\left( \frac{1}{(1+\xi_{n+1})^{n+1}}\right)_{n \ge 1}\) is bounded (by \(1\)) and hence, \(\lim_{n \to \infty} R_n(1) = 0.\) With this we can finally determine the sum of the alternating harmonic series (cf. Example 3.3 i). We have \[\ln 2 = f(1) = \lim_{n \to \infty} T_n(1) = \sum_{k=1}^\infty (-1)^{k-1} \frac{1}{k}.\]
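The convergence of \(T_n(1)\) to \(\ln 2,\) together with the bound \(|R_n(1)| \le \frac{1}{n+1}\) obtained from \((1+\xi_{n+1})^{n+1} \ge 1,\) can be observed numerically. A minimal Python sketch:

```python
import math

def T(n):
    # n-th Taylor polynomial of ln(1+x) at x0 = 0, evaluated at x = 1:
    # the n-th partial sum of the alternating harmonic series.
    return sum((-1) ** (k - 1) / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    print(n, abs(T(n) - math.log(2)), "<=", 1 / (n + 1))   # error vs. Lagrange bound
```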

Remark 6.17

Let \(I \subseteq {\mathbb{R}}\) be an interval. If \(f:I \to {\mathbb{R}}\) is arbitrarily often differentiable and if \(T_n\) is the \(n\)-th Taylor polynomial in the expansion point \(x_0 \in I,\) then it is not guaranteed that the limit \(\lim_{n \to \infty}T_n(x)\) exists for all \(x \in I.\) In Example 6.14 we clearly see that this limit cannot exist for \(x > 1\) since the necessary condition for convergence of the series is not fulfilled, i. e., \(\left( (-1)^{n+1}\frac{1}{n}x^n\right)_{n \ge 1}\) is not a null sequence. This is not a contradiction to Theorem 6.19 since there only a statement about the limit \(\lim_{x \to x_0} \frac{R_n(x)}{(x-x_0)^n}\) for fixed \(n\) has been made.

Even if \(\left(T_n(x)\right)_{n \in {\mathbb{N}}}\) converges for an \(x \in I,\) it is not guaranteed that \(f(x) = \lim_{n \to \infty}T_n(x).\) To illustrate this, consider the function \[f:{\mathbb{R}}\to {\mathbb{R}}, \quad x \mapsto \begin{cases} \exp\left( -\frac{1}{x^2}\right), & \text{if } x \neq 0, \\ 0, & \text{if } x = 0. \end{cases}\] The function \(f\) is arbitrarily often differentiable and it holds that \(f^{(n)}(0) = 0\) for all \(n \in {\mathbb{N}}\) (exercise!). Thus, the \(n\)-th Taylor polynomial in the expansion point \(x_0 = 0\) is the zero polynomial. Therefore, for \(x \neq 0\) we obtain \[\lim_{n \to \infty} T_n(x) = 0 \neq \exp\left(-\frac{1}{x^2}\right) = f(x).\]
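A quick numerical illustration of this phenomenon (the evaluation points are arbitrary choices):

```python
import math

def f(x):
    # f(x) = exp(-1/x^2) for x != 0 and f(0) = 0.
    return math.exp(-1.0 / (x * x)) if x != 0 else 0.0

# All Taylor polynomials of f at x0 = 0 are identically zero,
# yet f(x) > 0 for every x != 0.
for x in (0.5, 1.0, 2.0):
    print(x, f(x), "vs. lim T_n(x) = 0")
```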

Finally, let us illustrate the usefulness of Taylor’s theorem for the analysis of local extrema of functions.

Theorem 6.20

Let \(D \subseteq {\mathbb{R}}\) and let \(x_0 \in D\) be an inner point of \(D.\) Assume that \(f:D \to {\mathbb{R}}\) is \(n\) times differentiable in \(x_0\) and that \[f'(x_0) = \ldots = f^{(n-1)}(x_0) = 0 \quad \text{and} \quad f^{(n)}(x_0) \neq 0.\] Then the following statements are satisfied:

  1. If \(n\) is odd, then \(f\) has no local extremum in \(x_0.\)

  2. If \(n\) is even, then \(f\) has a strict local extremum in \(x_0,\) namely a strict local minimum, if \(f^{(n)}(x_0) > 0\) and a strict local maximum, if \(f^{(n)}(x_0) < 0.\)

Proof. Since \(x_0\) is an inner point of \(D\) there exists an \(\varepsilon > 0\) such that \(U_\varepsilon(x_0) \subseteq D.\) Since \(U_\varepsilon(x_0) = (x_0-\varepsilon,x_0+\varepsilon)\) is an interval and since \(f'(x_0) = \ldots = f^{(n-1)}(x_0) = 0,\) for all \(x \in U_{\varepsilon}(x_0)\) Theorem 6.19 yields \[f(x) = f(x_0) + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n + R_n(x) \quad \text{with } \lim_{x \to x_0} \frac{R_n(x)}{(x-x_0)^n} = 0.\] In particular, the function \(g:U_{\varepsilon}(x_0)\setminus\{x_0\} \to {\mathbb{R}},\) \(x \mapsto \frac{R_n(x)}{(x-x_0)^n}\) can be continuously extended to \(U_\varepsilon(x_0)\) by defining \(g(x_0) := 0.\) Thus, by the \(\varepsilon\)-\(\delta\) criterion for continuity (Theorem 4.11) there exists a \(\delta > 0\) (with \(\delta \le \varepsilon\)) such that \[\left|\frac{R_n(x)}{(x-x_0)^n}\right| < \left| \frac{f^{(n)}(x_0)}{n!} \right| \quad \text{for all } x \in U_\delta(x_0)\setminus\{x_0\}.\] This implies that the sign of \[f(x) - f(x_0) = \left( \frac{f^{(n)}(x_0)}{n!} + \frac{R_n(x)}{(x-x_0)^n} \right) \cdot (x-x_0)^n\] is equal to the sign of \(\frac{f^{(n)}(x_0)}{n!} \cdot (x-x_0)^n\) for all \(x \in U_\delta(x_0) \setminus \{x_0\}.\) We distinguish two cases:
Case 1: \(n\) is odd. In this case, \((x-x_0)^n\) changes its sign in \(x_0\) and therefore, also \(f(x)-f(x_0)\) changes its sign in \(x_0.\) Hence, \(f\) does not have a local extremum in \(x_0.\)
Case 2: \(n\) is even. In this case, for all \(x \in U_\delta(x_0) \setminus \{x_0\}\) we have \[f(x)-f(x_0) \begin{cases} >0, & \text{if } f^{(n)}(x_0) > 0, \\ <0, & \text{if } f^{(n)}(x_0) < 0. \end{cases}\] Consequently, \(f\) has a strict local minimum in \(x_0,\) if \(f^{(n)}(x_0) > 0\) and a strict local maximum in \(x_0,\) if \(f^{(n)}(x_0) < 0.\)

Example 6.15

Let \(n \in {\mathbb{N}}\setminus\{0,1\}\) and consider the function \(f:{\mathbb{R}}\to {\mathbb{R}},\) \(x \mapsto x^n.\) Then \(f^{(k)}(0) = 0\) for \(k = 1,\,\ldots,\,n-1\) and \(f^{(n)}(0) = n! > 0.\) Thus, \(f\) has a strict local minimum in \(0\) if \(n\) is even, but no local extremum in \(0\) if \(n\) is odd.
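The classification of Theorem 6.20 translates directly into a small symbolic routine: determine the smallest \(n \ge 1\) with \(f^{(n)}(x_0) \neq 0\) and inspect its parity and sign. A minimal sketch using the Python library sympy (the test functions and the search bound max_order are arbitrary illustrative choices, and the routine is inconclusive if no such \(n\) is found within that bound):

```python
import sympy as sp

x = sp.symbols('x')

def classify_critical_point(f, x0, max_order=10):
    """Apply Theorem 6.20: find the smallest n >= 1 with f^(n)(x0) != 0."""
    for n in range(1, max_order + 1):
        val = sp.diff(f, x, n).subs(x, x0)
        if val != 0:
            if n % 2 == 1:
                return "no local extremum"
            return "strict local minimum" if val > 0 else "strict local maximum"
    return "inconclusive up to the given order"

print(classify_critical_point(x**4, 0))    # strict local minimum (n = 4, f^(4)(0) = 24 > 0)
print(classify_critical_point(x**3, 0))    # no local extremum (n = 3 is odd)
print(classify_critical_point(-x**6, 0))   # strict local maximum (f^(6)(0) = -720 < 0)
```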
