A colleague just sent me Xerox copies of a few pages of an 1899 biography of General Bourbaki. Its author, François Bournand, was the private secretary of Édouard Drumont, an antisemitic writer and journalist. The book would probably not be worth mentioning here were it not for its dedication:

À l'abbé Félix Klein

de l'Institut catholique

Hommage respectueux de son dévoué en N.-S.

François Bournand

Professeur d'histoire de l'art à l'École professionnelle catholique

Abbé literally means abbot; in this context, it designates a Catholic priest without a parish. The French initials N.-S. stand for Notre Seigneur, Our Lord. It appears that this Félix Klein (note the accent on the e) also has a Wikipedia page.

A few days ago, The Scotsman published an article about Klaus Roth's legacy, explaining how he donated his fortune (1 million pounds) to various charities. This article was shared by some friends on Facebook. Yuri Bilu added that he knew two important theorems of Roth, and since one of them did not immediately come to my mind, I decided to write this post.

The first theorem was a 1935 conjecture of Erdős and Turán concerning arithmetic progressions of length 3 that Roth proved in 1952. That is, one is given a set $A$ of positive integers and one seeks triples $(a,b,c)$ of distinct elements of $A$ such that $a+c=2b$; Roth proved that infinitely many such triples exist as soon as the upper density of $A$ is positive, that is:
\[ \limsup_{x\to+\infty} \frac{\mathop{\rm Card}(A\cap [0;x])}x >0. \]
In 1975, Endre Szemerédi proved that such sets of integers contain (finite) arithmetic progressions of arbitrarily large length. Other proofs have been given by Hillel Furstenberg (using ergodic theory) and Tim Gowers (by Fourier/combinatorial methods); Roth had used the Hardy–Littlewood circle method.

In 1976, Erdős strengthened his initial conjecture with Turán and predicted that arithmetic progressions of arbitrarily large length exist in $A$ as soon as
\[ \sum_{a\in A} \frac 1a =+\infty.\]
Such a result is still a conjecture, even for arithmetic progressions of length $3$, but a remarkable particular case has been proved by Ben Green and Terry Tao in 2004, when $A$ is the set of all prime numbers.

Outstanding as these results are (Tao was awarded the Fields medal in 2006 and Szemerédi the Abel prize in 2012), the second theorem of Roth, proved in 1955, was certainly the main reason for awarding him the Fields medal in 1958. Indeed, Roth gave a definitive answer to a long-standing question in diophantine approximation that originated from the works of Joseph Liouville (1844). Given a real number $\alpha$, one is interested in rational fractions $p/q$ that are close to $\alpha$, and in the quality of the approximation, namely the exponent $n$ such that $\left| \alpha- \frac pq \right|\leq 1/q^n$. Precisely, the approximation exponent $\kappa(\alpha)$ is the least upper bound of all real numbers $n$ such that the previous inequality has infinitely many solutions in fractions $p/q$, and Roth's theorem asserts that one has $\kappa(\alpha)=2$ when $\alpha$ is an irrational algebraic number.

One part of this result goes back to Dirichlet, showing that for any irrational number $\alpha$, there exist many good approximations with exponent $2$. This can be proved using the theory of continued fractions and is also a classical application of Dirichlet's box principle. Take a positive integer $Q$ and consider the $Q+1$ numbers $q\alpha- \lfloor q\alpha\rfloor$ in $[0,1]$, for $0\leq q\leq Q$; two of them must be less than $1/Q$ apart; this furnishes integers $p',p'',q',q''$, with $0\leq q'<q''\leq Q$ such that $\left| (q''\alpha-p'')-(q'\alpha-p')\right|\leq 1/Q$; then set $p=p''-p'$ and $q=q''-q'$; one has $1\leq q\leq Q$ and $\left| q\alpha -p \right|\leq 1/Q$, hence $\left|\alpha-\frac pq\right|\leq 1/(Qq)\leq 1/q^2$.
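The box principle argument is easy to try numerically. Here is a minimal Python sketch; the function name `dirichlet_approximation` is mine, and floating-point numbers stand in for the real number $\alpha$, so this is only meant for moderate values of $Q$.

```python
from math import floor

def dirichlet_approximation(alpha, Q):
    """Return (p, q) with 1 <= q <= Q and |q*alpha - p| <= 1/Q (box principle)."""
    # Fractional parts of q*alpha for q = 0..Q, stored with q and floor(q*alpha).
    points = sorted((q * alpha - floor(q * alpha), q, floor(q * alpha))
                    for q in range(Q + 1))
    # Q + 1 points in [0, 1): two neighbours in sorted order are at most 1/Q apart.
    best = min(zip(points, points[1:]), key=lambda pair: pair[1][0] - pair[0][0])
    (_, q1, p1), (_, q2, p2) = best
    if q1 > q2:
        # Reorder so that the difference of the two denominators is positive.
        q1, p1, q2, p2 = q2, p2, q1, p1
    return p2 - p1, q2 - q1
```

For instance, `dirichlet_approximation(2 ** 0.5, 100)` returns a fraction $p/q$ with $q\leq 100$ approximating $\sqrt 2$ within $1/q^2$.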

To prove an inequality in the other direction, Liouville's argument was that if $\alpha$ is an irrational root of a nonzero polynomial $P\in\mathbf Z[T]$ of degree $d$, then $\kappa(\alpha)\leq d$. The proof is now standard: given an approximation $p/q$ of $\alpha$, observe that $q^d P(p/q)$ is a non-zero integer (if, say, $P$ is irreducible), so that $\left| q^d P(p/q)\right|\geq 1$. On the other hand, $P(p/q)\approx (p/q-\alpha) P'(\alpha)$, hence an inequality $\left|\alpha-\frac pq\right|\gg q^{-d}$.
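To see Liouville's argument at work on the simplest example, take $\alpha=\sqrt 2$ and $P=T^2-2$, so that $d=2$. The following Python sketch (the function name and the range of denominators are my own choices, and $\sqrt 2$ is only a floating-point approximation) checks that $q^2P(p/q)=p^2-2q^2$ never vanishes and records the smallest value of $q^2\left|\alpha-\frac pq\right|$ over the best numerators $p$.

```python
from math import sqrt

def liouville_check(max_q):
    """Return (smallest q^2 * |sqrt(2) - p/q|, whether p^2 - 2 q^2 is always nonzero)."""
    alpha = sqrt(2)
    smallest = float("inf")
    always_nonzero = True
    for q in range(1, max_q + 1):
        p = round(q * alpha)  # the numerator giving the best approximation for this q
        if p * p - 2 * q * q == 0:
            always_nonzero = False
        smallest = min(smallest, q * q * abs(alpha - p / q))
    return smallest, always_nonzero
```

The first returned value stays bounded away from $0$, in accordance with the inequality $\left|\alpha-\frac pq\right|\gg q^{-2}$.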

This result has been generalized, first by Axel Thue in 1909 (who proved an inequality $\kappa(\alpha)\leq \frac12 d+1$), then by Carl Ludwig Siegel in 1921 and Freeman Dyson in 1947 (showing respectively $\kappa(\alpha)\leq 2\sqrt d$ and $\kappa(\alpha)\leq\sqrt{2d}$). While Liouville's result was based on the minimal polynomial of $\alpha$, these generalisations required polynomials in two variables, and the non-vanishing of a quantity such as $q^dP(p/q)$ above was definitely less trivial. Roth's proof made use of polynomials of arbitrarily large degree, and his remarkable achievement was a proof of the required non-vanishing result.

Roth's proof was “elementary”, making use only of polynomials and wronskians. There are today more geometric proofs, such as the one by Hélène Esnault and Eckart Viehweg (1984) or Michael Nakamaye's subsequent proof (1995) which is based on Faltings's product theorem.

What is still missing, however, is the proof of an effective version of Roth's theorem, that would give, given any real number $n>\kappa(\alpha)$, an actual integer $Q$ such that every rational fraction $p/q$ in lowest terms such that $\left|\alpha-\frac pq\right|\leq 1/q^n$ satisfies $q\leq Q$. It seems that this defect lies at the very heart of almost all of the current approaches in diophantine approximations...

I had to mentor an Agrégation lesson entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs of the Weierstrass density theorem as I can: Every complex-valued continuous function on the closed interval $[-1;1]$ can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and $2\pi$-periodic function on $\mathbf R$ can be uniformly approximated by trigonometric polynomials.

1. Using the Stone theorem.

This 1937–1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact completely regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires establishing that the absolute value function can be uniformly approximated by polynomials on $[-1;1]$:

Stone truncates the power series expansion of the function \[ x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, \] bounding by hand the error term.

Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving that the function $x\mapsto \sqrt x$ can be uniformly approximated by polynomials on $[0;1]$. (The absolute value function is recovered since $\mathopen|x\mathclose|=\sqrt{x^2}$.) To this aim, he introduces the sequence of polynomials given by $p_0=0$ and $p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right)$ and proves by induction the inequalities \[ 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n\] for $x\in[0;1]$ and $n\geq 1$. This implies the desired result.
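The recursion is pleasant to experiment with. Below is a small Python sketch that evaluates $p_n(x)$ and tests the inequalities $0\leq \sqrt x-p_n(x)\leq 2\sqrt x/(2+n\sqrt x)$ on a grid; the names `bourbaki_sqrt` and `check_bounds`, the grid size and the tolerance `tol` (which absorbs floating-point rounding) are my own choices, and such a test is of course no substitute for the induction.

```python
from math import sqrt

def bourbaki_sqrt(x, n):
    """Value at x of p_n, where p_0 = 0 and p_{k+1} = p_k + (x - p_k^2) / 2."""
    p = 0.0
    for _ in range(n):
        p = p + 0.5 * (x - p * p)
    return p

def check_bounds(n, samples=200, tol=1e-12):
    """Test 0 <= sqrt(x) - p_n(x) <= 2 sqrt(x) / (2 + n sqrt(x)) on a grid of [0, 1]."""
    for i in range(samples + 1):
        x = i / samples
        error = sqrt(x) - bourbaki_sqrt(x, n)
        bound = 2 * sqrt(x) / (2 + n * sqrt(x))
        if not (-tol <= error <= bound + tol):
            return False
    return True
```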

The algebra of polynomials separates points on the compact set $[-1;1]$, hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.

2. Convolution.

Consider an approximation $(\rho_n)$ of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on $\mathbf R$ such that $\int\rho_n=1$ and such that for every $\delta>0$, $\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0$. Given a continuous function $f$ on $\mathbf R$, form the convolutions defined by $f*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt$. It is classical that $f*\rho_n$ converges to $f$ uniformly on every compact set.

Now, given a continuous function $f$ on $[-1;1]$, one can extend it to a continuous function with compact support on $\mathbf R$ (defining $f$ to be affine linear on $[-2;-1]$ and on $[1;2]$, and to be zero outside of $[-2;2]$). We want to choose $\rho_n$ so that $f*\rho_n$ is a polynomial on $[-1;1]$. The basic idea is just to choose a parameter $a>0$, and to take $\rho_n(x)= c_n (1-(x/a)^2)^n$ for $\mathopen|x\mathclose|\leq a$ and $\rho_n(x)=0$ otherwise, with $c_n$ adjusted so that $\int\rho_n=1$. Let us write $f*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt$; if $x\in[-1;1]$ and $t\in[-2;2]$, then $x-t\in [-3;3]$ so we just need to be sure that $\rho_n$ is a polynomial on that interval, which we get by taking, say, $a=3$. This shows that the restriction of $f*\rho_n$ to $[-1;1]$ is a polynomial function, and we're done.
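That the kernels $\rho_n(x)=c_n(1-(x/a)^2)^n$ do form an approximation of the Dirac distribution can be checked numerically. The Python sketch below estimates, by a midpoint rule, the proportion of the mass of $\rho_n$ lying in $\mathopen|x\mathclose|>\delta$; the function name, the number of integration steps and the default $a=3$ are assumptions of mine.

```python
def kernel_tail(n, delta, a=3.0, steps=20000):
    """Proportion of the integral of (1 - (x/a)^2)^n over [-a, a] coming from |x| > delta."""
    h = 2 * a / steps
    total = 0.0
    tail = 0.0
    for i in range(steps):
        x = -a + (i + 0.5) * h  # midpoint of the i-th subinterval
        weight = (1 - (x / a) ** 2) ** n
        total += weight
        if abs(x) > delta:
            tail += weight
    # Dividing by the total mass plays the role of the normalizing constant c_n.
    return tail / total
```

For fixed $\delta$, the returned value decreases to $0$ as $n$ grows, which is the concentration property required of $(\rho_n)$.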

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in $\mathopen]0;1\mathclose[$ which he extends linearly to $[0;1]$ so that they vanish at $0$ and $1$; he considers the same convolution, taking the parameter $a=1$.

As shown by Jackson, the same approach works easily (in a sense, more easily) for $2\pi$-periodic functions, considering the kernel defined by $\rho_n(x)=c_n(1+\cos(x))^n$, where $c_n$ is chosen so that $\int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function $f$ on $[0;1]$ and, for $n\geq 1$, set \[ B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk x^k (1-x)^{n-k}.\] It is classical that $B_nf$ converges uniformly to $f$ on $[0;1]$.

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that $B_nf(x)$ is the expected value of $f(S_n/n)$, where $S_n$ is the sum of $n$ i.i.d. Bernoulli random variables with parameter $x\in[0;1]$. Another (generalized as the Korovkin theorem, “On convergence of linear positive operators in the space of continuous functions”, Dokl. Akad. Nauk SSSR (N.S.), vol. 90) consists in showing (i) that for $f=1,x,x^2$, $B_nf$ converges uniformly to $f$ (an explicit calculation), (ii) that if $f\geq 0$, then $B_nf\geq 0$ as well, (iii) for every $x\in[0;1]$, squeezing $f$ in between two quadratic polynomials $f^+$ and $f^-$ such that $f^+(x)-f^-(x)$ is as small as desired.
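Bernstein polynomials are straightforward to compute, so here is a short Python sketch to watch the convergence; the function names `bernstein` and `sup_error` and the grid size are mine, and the supremum is only estimated on a finite grid.

```python
from math import comb

def bernstein(f, n, x):
    """Value at x of the n-th Bernstein polynomial of f on [0, 1] (n >= 1)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1))

def sup_error(f, n, samples=400):
    """Maximum of |B_n f - f| over a uniform grid of [0, 1]."""
    return max(abs(bernstein(f, n, i / samples) - f(i / samples))
               for i in range(samples + 1))
```

With $f(x)=x^2$, the explicit calculation alluded to in (i) gives $B_nf(x)=x^2+x(1-x)/n$, which one can compare with `bernstein(lambda t: t * t, n, x)`.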

A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of the Fourier series of a continuous, $2\pi$-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkin-type argument (replacing $1,x,x^2$ on $[0;1]$ by $1,z,z^2,z^{-1},z^{-2}$ on the unit circle).

4. Using approximation by step functions.

Let us show that for every $\delta\in\mathopen]0,1\mathclose[$ and every $\varepsilon>0$, there exists a polynomial $p$ satisfying the following properties:

$0\leq p(x)\leq \varepsilon$ for $-1\leq x\leq-\delta$;

$0\leq p(x)\leq 1$ for $-\delta\leq x\leq \delta$;

$1-\varepsilon\leq p(x)\leq 1$ for $\delta\leq x\leq 1$.

In other words, these polynomials approximate the (discontinuous) function $f$ on $[-1;1]$ defined by $f(x)=0$ for $x< 0$, $f(x)=1$ for $x> 0$ and $f(0)=1/2$.

A possible formula is $p(x)=(1- ((1-x)/2)^n)^{2^n}$, where $n$ is a large enough integer. First of all, one has $0\leq (1-x)/2\leq 1$ for every $x\in[-1;1]$, so that $0\leq p(x)\leq 1$. Let $x\in[-1;-\delta]$; then one has $(1-x)/2\geq (1+\delta)/2$, hence $p(x)\leq (1-((1+\delta)/2)^n)^{2^n}$, which can be made arbitrarily small when $n\to\infty$. Let finally $x\in[\delta;1]$; then $(1-x)/2\leq (1-\delta)/2$, hence $p(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n$, which can be made arbitrarily close to $1$ when $n\to\infty$.
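One can watch these polynomials approach the step function numerically. In the Python sketch below, I read the formula as $p(x)=(1-((1-x)/2)^n)^{2^n}$; the function names and the grid are my own, and floating-point evaluation is only reliable for moderate $n$ (around $n=30$ in my experiments).

```python
def step_poly(x, n):
    """Floating-point value of p(x) = (1 - ((1 - x) / 2)^n)^(2^n)."""
    return (1 - ((1 - x) / 2) ** n) ** (2 ** n)

def worst_values(delta, n, samples=100):
    """Return (max of p on [-1, -delta], min of p on [delta, 1]) over uniform grids."""
    left = [-1 + (1 - delta) * i / samples for i in range(samples + 1)]
    right = [delta + (1 - delta) * i / samples for i in range(samples + 1)]
    return (max(step_poly(x, n) for x in left),
            min(step_poly(x, n) for x in right))
```

Calling `worst_values(0.2, 30)` gives a first value very close to $0$ and a second value at least $1-(0.8)^{30}$, up to rounding.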

By translation and dilations, the discontinuity can be placed at any element of $[0;1]$. Let now $f$ be an arbitrary step function and let us write it as a linear combination $f=\sum a_i f_i$, where $f_i$ is a $\{0,1\}$-valued step function. For every $i$, let $p_i$ be a polynomial that approximates $f_i$ as given above. The linear combination $\sum a_i p_i$ approximates $f$ with maximal error $\sup(\mathopen|a_i\mathclose|)$.

Using uniform continuity of continuous functions on $[-1;1]$, every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function $x\mapsto \mathopen|x\mathclose|$ is uniformly approximated by a sequence of polynomials on $[-1;1]$. Consequently, so are the functions $x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 $ and $x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2$. By translation and dilation, every continuous piecewise linear function on $[-1;1]$ with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.

6. Moments.

A linear subspace $A$ of a Banach space is dense if and only if every continuous linear form which vanishes on $A$ is identically $0$. In the present case, the dual of $C^0([-1;1],\mathbf C)$ is the space of complex measures on $[-1;1]$ (Riesz theorem, if one wishes, or the definition of a measure). So let $\mu$ be a complex measure on $[-1;1]$ such that $\int_{-1}^1 t^n \,d\mu(t)=0$ for every integer $n\geq 0$; let us show that $\mu=0$. This is the classical problem of showing that a complex measure on $[-1;1]$ is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval $[0;1]$. So, let $F(z)=\int_0^1 t^z \,d\mu(t)$. The function $F$ is holomorphic and bounded on the half-plane $\Re(z)> 0$ and vanishes at the positive integers. At this point, Rudin makes a conformal transformation to the unit disk (setting $w=(z-1)/(z+1)$) and gets a bounded holomorphic function on the unit disk with zeroes at $(n-1)/(n+1)=1-2/(n+1)$, for $n\in\mathbf N$. If this function were not identically zero, its zeroes $w_n$ would satisfy the Blaschke condition $\sum (1-\mathopen|w_n\mathclose|)<\infty$, and this contradicts the fact that the series $\sum 1/(n+1)$ diverges.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family $(t^{\lambda_n})$ generates a dense subset of $C([0;1])$ if and only if $\sum 1/\lambda_n=+\infty$.

Here is another solution, based on the Cauchy transform of $\mu$. For every complex number $a$ such that $\mathopen|a\mathclose|>1$, one can write $1/(t-a)=-\sum_{n\geq 0} t^n/a^{n+1}$ as a power series which converges uniformly for $t\in[-1;1]$. By summation, the vanishing of the moments quickly gives that
\[ F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) = 0. \]
Observe that this formula defines a holomorphic function on $\mathbf C\setminus[-1;1]$; by analytic continuation, one thus has $F(a)=0$ for every $a\not\in[-1;1]$.
Take a $C^2$-function $g$ with compact support on the complex plane. For every $t\in\mathbf C$, one has the following formula
\[ \frac1\pi \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t), \]
which implies, by integration and Fubini, that
\[ \int_{-1}^1 g(t)\,d\mu(t) = \frac1\pi \iint \int \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \frac1\pi \iint \bar\partial g(z) F(z)\,dx\, dy= 0. \]
On the other hand, every $C^2$ function on $[-1;1]$ can be extended to such a function $g$, so that the measure $\mu$ vanishes on every $C^2$ function on $[-1;1]$. Approximating a continuous function by a $C^2$ function (first take a piecewise linear approximation, and round the corners), we get that $\mu$ vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Mathematics, vol. 161, 1995). Let us say that a sequence $(f_n)$ of continuous functions on an interval $I$ is a Markov system (resp. a weak Markov system) if for every integer $n$, every nonzero linear combination of $(f_0,\dots,f_n)$ has at most $n$ zeroes (resp. $n$ sign changes) in $I$.

Given a Markov system $(f_n)$, one defines a sequence $(T_n)$, where $f_n-T_n$ is the element of $\langle f_0,\dots,f_{n-1}\rangle$ which is the closest to $f_n$. The function $T_n$ has $n$ zeroes on the interval $I$; let $M_n$ be the maximum distance between two consecutive zeroes.

Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence $(f_n)$ is a Markov system consisting of $C^1$ functions, then its linear span is dense in $C(I)$ if and only if $M_n\to 0$.

The sequence of monomials $(x^n)$ on $I=[-1;1]$ is of course a Markov system. In this case, the polynomial $T_n$ is, up to a constant factor, the $n$th Chebyshev polynomial, given by $T_n(\cos(x))=\cos(nx)$; its roots are given by $\cos((\pi+2k\pi)/2n)$, for $k=0,\dots,n-1$, and $M_n\leq \pi/n$. This gives yet another proof of Weierstrass's approximation theorem.
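The bound $M_n\leq\pi/n$ can be tested numerically. The Python sketch below assumes the normalization $T_n(\cos\theta)=\cos(n\theta)$ on $[-1;1]$, so that the roots are $\cos((\pi+2k\pi)/2n)$; the function names are mine, and the polynomial is evaluated through the standard three-term recurrence $T_{k+1}=2xT_k-T_{k-1}$.

```python
from math import cos, pi

def chebyshev_roots(n):
    """The numbers cos((pi + 2 k pi) / (2 n)), k = 0..n-1, in increasing order."""
    return sorted(cos((pi + 2 * k * pi) / (2 * n)) for k in range(n))

def chebyshev(n, x):
    """Evaluate T_n(x) with T_0 = 1, T_1 = x and T_{k+1} = 2 x T_k - T_{k-1}."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for _ in range(n - 1):
        previous, current = current, 2 * x * current - previous
    return current

def max_gap(n):
    """Largest distance between two consecutive roots (n >= 2)."""
    roots = chebyshev_roots(n)
    return max(b - a for a, b in zip(roots, roots[1:]))
```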

I was absolutely excited at the prospect of returning to this avant-garde jazz hall (it was my third concert there; the first one was in 2010, with Sylvie Courvoisier, Thomas Morgan and Ben Perowski, and the second, last year, with Wadada Leo Smith and Vijay Iyer) to listen to Gerry Hemingway, and the cold rain falling on New York City did not diminish my enthusiasm. (Although I had to take care on the streets, for one could see almost nothing...) I feared I would arrive late, but Gerry Hemingway was still setting up his tools: various sticks, small cymbals, woodblocks, as well as a cello bow...

I admit, it took me some time to appreciate the music. Of course, it was free jazz (so what?) and I couldn't really follow the stream of music. Both musicians were acting delicately and skillfully (no question) at creating sound, as a painter would spread brush strokes on a canvas; and actually, Hemingway was playing a lot with brushes, those drum sticks made of many (wire or plastic) strings that have a delicate and not very resonating sound... Color after color, something was emerging, sound was being shaped.

There is an eternal discussion about the nature of music (is it rhythm? melody? harmony?) and consequently about the role of each instrument in the shaping of the music. A related question is the way a given instrument should be used to produce sound.

None of the obvious answers was to be heard tonight. Russ Lossing sometimes struck the strings of the grand piano with mallets, something almost classical in avant-garde piano music. I should have been prepared by the concert of Tony Malaby's Tubacello, which I attended with François Loeser at Sons d'hiver a few weeks ago, where John Hollenbeck simultaneously played drums and prepared piano, but the playing of Gerry Hemingway brought me much surprise. He could blow on the heads of the drums, hit them with a woodblock or strange plastic mallets; he could make the cymbals vibrate by pressing the cello bow on them; he could also take the top hi-hat cymbal in his left hand, and then either hit it with a stick, or press it on the snare drum, thereby producing a mixture of snare/cymbal sound; during a long drum roll, he could also vary the pitch of the sound by pressing the drum head with his right foot. Can you imagine the scene?

It is while discussing with him in between the two sets that I gradually understood (some of) his musical conception: how everything is about sound and color. That's why he uses an immense palette of tools, to produce the sounds he feels would best fit the music. He also discussed extended technique, by which he means not the kind of drumistic virtuosity that could allow you (unfortunately, not me...) to play the 26 drum rudiments at 300bpm, but extending the range of sounds he can consistently produce with his “basic Buddy Rich type instrument” (Google a picture of Terry Bozzio's drumkit if you don't see what I mean). He described himself as a colorist, who thinks of his instrument in terms of pitches; he also said how rhythm also exists in negative, when it is not played explicitly. A striking remark, because it exactly depicted how I understand the playing of one of my favorite jazz drummers, Paul Motian, whom I couldn't appreciate until I became able to hear what he did not play.

The second set did not sound as abstract as the first one. Probably the two wind instruments helped give the sound more flesh and more texture. Samuel Blaser, on the trombone, was absolutely exceptional (go listen at once to his Spring Rain album, an alliance of Jimmy Giuffre and contemporary jazz) and Loren Stillman sang very beautiful melodic lines on the alto sax. The four of them could also play in all combinations, and with extremely interesting dynamics, going effortlessly from one to another. And when a wonderful moment of thunder ended abruptly with the first notes of Paul Motian's Etude, music turned into pure emotion.

As was apparently first noticed by Noam Elkies, 2016 is the cardinality of the general linear group over the field with 7 elements, $G=\mathop{\rm GL}(2,\mathbf F_7)$. I was mentoring an agrégation lesson on finite fields this afternoon, and I could not resist having the student check this. Then came the natural question of describing the Sylow subgroups of this finite group. This is what I describe here.

First of all, let's recall the computation of the cardinality of $G$. The first column of a matrix in $G$ must be non-zero, hence there are $7^2-1$ possibilities; for the second column, it only needs to be non-collinear to the first one, and each choice of the first column forbids $7$ second columns, hence $7^2-7$ possibilities. In the end, one has $\mathop{\rm Card}(G)=(7^2-1)(7^2-7)=48\cdot 42=2016$. The same argument shows that the cardinality of the group $\mathop{\rm GL}(n,\mathbf F_q)$ is equal to $(q^n-1)(q^n-q)\cdots (q^n-q^{n-1})=q^{n(n-1)/2}(q-1)(q^2-1)\cdots (q^n-1)$.

Let's go back to our example. The factorization of this cardinality comes easily: $2016=(7^2-1)(7^2-7)=(7-1)(7+1)7(7-1)=6\cdot 8\cdot 7\cdot 6= 2^5\cdot 3^2\cdot 7$. Consequently, there are three Sylow subgroups to find, for the prime numbers $2$, $3$ and $7$.
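The student could also have checked all of this by machine. Here is a small Python sketch: `gl_order` implements the product formula above, `count_invertible` counts by brute force the $2\times 2$ matrices with nonzero determinant, and `factorize` is a naive trial division; all three names are mine.

```python
from itertools import product

def gl_order(n, q):
    """Cardinality of GL(n, F_q), as the product of q^n - q^i for i = 0..n-1."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order

def count_invertible(p):
    """Brute-force count of 2x2 matrices over F_p (p prime) with nonzero determinant."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

def factorize(m):
    """Prime factorization of m by trial division, as a dict {prime: exponent}."""
    factors, d = {}, 2
    while d * d <= m:
        while m % d == 0:
            factors[d] = factors.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors
```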

The case $p=7$ is the most classical one. One needs to find a group of order 7, and one such subgroup is given by the group of upper triangular matrices $\begin{pmatrix} 1 & * \\ 0 & 1\end{pmatrix}$. What makes things work is that $p$ is the characteristic of the chosen finite field. In general, if $q$ is a power of $p$, then the subgroup of upper-triangular matrices in $\mathop{\rm GL}(n,\mathbf F_q)$ with $1$s on the diagonal has cardinality $q\cdot q^2\cdots q^{n-1}=q^{n(n-1)/2}$, which is exactly the highest power of $p$ dividing the cardinality of $\mathop{\rm GL}(n,\mathbf F_q)$.

Let's now study $p=3$. We need to find a group $S$ of order $3^2=9$ inside $G$. There are a priori two possibilities, either $S\simeq (\mathbf Z/3\mathbf Z)^2$, or $S\simeq (\mathbf Z/9\mathbf Z)$. We will find a group of the first sort, which will show that the second case doesn't happen, because all $3$-Sylows are pairwise conjugate, hence isomorphic.

Now, the multiplicative group $\mathbf F_7^\times$ is of order $6$, and is cyclic, hence contains a subgroup of order $3$, namely $C=\{1,2,4\}$. Consequently, the group of diagonal matrices with coefficients in $C$ is isomorphic to $(\mathbf Z/3\mathbf Z)^2$ and is our desired $3$-Sylow.

Another reason why $G$ does not contain a subgroup $S$ isomorphic to $\mathbf Z/9\mathbf Z$ is that it does not contain elements of order $9$. Let's argue by contradiction and consider a matrix $A\in G$ of order $9$, so that $A^9=I$; then its minimal polynomial $P$ divides $T^9-1$. Since $7\nmid 9$, the matrix $A$ is diagonalizable over the algebraic closure of $\mathbf F_7$. The eigenvalues of $A$ are $9$th roots of unity, and have degree at most $2$ over $\mathbf F_7$ since $\deg(P)\leq 2$, so that they belong to $\mathbf F_{49}$. On the other hand, if $\alpha$ is a $9$th root of unity belonging to $\mathbf F_{49}$, one has $\alpha^9=\alpha^{48}=1$, hence $\alpha^3=1$ since $\gcd(9,48)=3$. Consequently, the eigenvalues of $A$ are cubic roots of unity and $A^3=I$, contradicting the assumption that $A$ has order $9$.
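Since $G$ has only $2016$ elements, the absence of elements of order $9$ can also be confirmed by exhaustive search. The Python sketch below stores a matrix as a tuple $(a_{11},a_{12},a_{21},a_{22})$ of residues modulo $7$; this encoding and the names `mul`, `order` and `order_census` are my own conventions.

```python
from itertools import product

P = 7  # we work over the field F_7
IDENTITY = (1, 0, 0, 1)

def mul(a, b):
    """Product of two 2x2 matrices over F_7, each stored as (a11, a12, a21, a22)."""
    return ((a[0] * b[0] + a[1] * b[2]) % P, (a[0] * b[1] + a[1] * b[3]) % P,
            (a[2] * b[0] + a[3] * b[2]) % P, (a[2] * b[1] + a[3] * b[3]) % P)

def order(m):
    """Multiplicative order of an invertible matrix m."""
    k, x = 1, m
    while x != IDENTITY:
        x, k = mul(x, m), k + 1
    return k

def order_census():
    """Dictionary {order: number of elements of GL(2, F_7) with that order}."""
    census = {}
    for m in product(range(P), repeat=4):
        if (m[0] * m[3] - m[1] * m[2]) % P != 0:
            k = order(m)
            census[k] = census.get(k, 0) + 1
    return census
```

The key `9` does not appear in `order_census()`, while the key `3` does.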

It remains to treat the case $p=2$, which I find slightly trickier. Let's try to find elements $A$ in $G$ whose order divides $2^5$. As above, such a matrix is diagonalizable in an algebraic closure, its minimal polynomial divides $T^{32}-1$, and its eigenvalues belong to $\mathbf F_{49}$, hence satisfy $\alpha^{32}=\alpha^{48}=1$, hence $\alpha^{16}=1$. Conversely, $\mathbf F_{49}^\times$ is cyclic of order $48$, hence contains an element $\alpha$ of order $16$, and such an element is quadratic over $\mathbf F_7$, hence its minimal polynomial $P$ has degree $2$. The corresponding companion matrix $A$ in $G$ is an element of order $16$, generating a subgroup $S_1$ of $G$ isomorphic to $\mathbf Z/16\mathbf Z$. We also observe that $\alpha^8=-1$ (because its square is $1$ and $\alpha^8\neq 1$); since $A^8$ is diagonalizable in an algebraic closure with $-1$ as the only eigenvalue, this shows $A^8=-I$.

Now, there exists a $2$-Sylow subgroup $S$ containing $S_1$, and $S_1$ will be a normal subgroup of $S$ (because its index, which is $2$, is the smallest prime number dividing the order of $S$). This suggests introducing the normalizer $N$ of $S_1$ in $G$. One then has $S_1\subset S\subset N$. Let $s\in S$ be such that $s\not\in S_1$; then there exists a unique $k\in\{1,\dots,15\}$ such that $s^{-1}As=A^k$, and $s^{-2}As^2=A^{k^2}=A$ (because $s$ has order $2$ modulo $S_1$), hence $k^2\equiv 1\pmod{16}$; in other words, $k\equiv \pm1\pmod 8$.

There exists a natural choice of $s$: the involution ($s^2=I$) which exchanges the two eigenspaces of $A$. To finish the computation, it's useful to take a specific example of polynomial $P$ of degree $2$ whose roots in $\mathbf F_{49}$ are primitive $16$th roots of unity. In other words, we need to factor the $16$th cyclotomic polynomial $\Phi_{16}=T^8+1$ over $\mathbf F_7$ and find a factor of degree $2$; actually, Galois theory shows that all factors have the same degree, so that there should be 4 factors of degree $2$. To explain the following computation, a remark is useful. Let $\alpha$ be a primitive $16$th root of unity in $\mathbf F_{49}$; we have $(\alpha^8)^2=1$ but $\alpha^8\neq 1$, hence $\alpha^8=-1$. If $P$ is the minimal polynomial of $\alpha$, the other root is $\alpha^7$, hence the constant term of $P$ is equal to $\alpha\cdot\alpha^7=\alpha^8=-1$.

We start from $T^8+1=(T^4+1)^2-2T^4$ and observe that $2\equiv 4^2\pmod 7$, so that $T^8+1=(T^4+1)^2-4^2T^4=(T^4+4T^2+1)(T^4-4T^2+1)$. To find the factors of degree $2$, we remember that their constant terms should be equal to $-1$. We thus go on differently, writing $T^4+4T^2+1=(T^2+aT-1)(T^2-aT-1)$ and solving for $a$: this gives $-2-a^2=4$, hence $a^2=-6=1$ and $a=\pm1$. The other factors are found similarly and we get
\[ T^8+1=(T^2-T-1)(T^2+T-1)(T^2-4T-1)(T^2+4T-1). \]
We thus choose the factor $T^2-T-1$ and set $A=\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$.
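The factorization of $T^8+1$ is easy to double check by expanding the product modulo $7$. In the Python sketch below, polynomials are lists of coefficients, lowest degree first; this representation and the names `polymul` and `has_root` are my own choices.

```python
def polymul(a, b, p=7):
    """Product modulo p of two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out

def has_root(poly, p=7):
    """Whether the polynomial has a root in F_p."""
    return any(sum(c * t**k for k, c in enumerate(poly)) % p == 0 for t in range(p))

# The four quadratic factors T^2 - T - 1, T^2 + T - 1, T^2 - 4T - 1, T^2 + 4T - 1.
FACTORS = [[-1, -1, 1], [-1, 1, 1], [-1, -4, 1], [-1, 4, 1]]

def expand(factors, p=7):
    """Product of all the given polynomials modulo p."""
    result = [1]
    for f in factors:
        result = polymul(result, f, p)
    return result
```

Here `expand(FACTORS)` returns the coefficient list of $T^8+1$, and none of the four factors has a root in $\mathbf F_7$, so that each of them is irreducible.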

Two eigenvectors for $A$ are $v=\begin{pmatrix} 1 \\ \alpha \end{pmatrix}$ and $v'=\begin{pmatrix}1 \\ \alpha'\end{pmatrix}$, where $\alpha'=\alpha^7$ is the other root of $T^2-T-1$. The involution $s$, which we now denote by $B$, is determined by the equations $Bv=v'$ and $Bv'=v$; since $\alpha+\alpha'=1$, this gives $B=\begin{pmatrix} 1 & 0 \\ 1 & - 1\end{pmatrix}$. The subgroup $S=\langle A,B\rangle$ generated by $A$ and $B$ has order $32$ and is a $2$-Sylow subgroup of $G$.
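Given the history of this computation, a machine check is reassuring. The Python sketch below encodes a matrix as the tuple $(a_{11},a_{12},a_{21},a_{22})$ modulo $7$ and computes the subgroup generated by $A$ and $B$ by closing up under multiplication; the encoding and the function names are mine.

```python
P = 7  # we work over the field F_7
IDENTITY = (1, 0, 0, 1)
A = (0, 1, 1, 1)  # companion matrix of T^2 - T - 1
B = (1, 0, 1, 6)  # the involution, with -1 written as 6 modulo 7

def mul(a, b):
    """Product of two 2x2 matrices over F_7, each stored as (a11, a12, a21, a22)."""
    return ((a[0] * b[0] + a[1] * b[2]) % P, (a[0] * b[1] + a[1] * b[3]) % P,
            (a[2] * b[0] + a[3] * b[2]) % P, (a[2] * b[1] + a[3] * b[3]) % P)

def power(m, k):
    """k-th power of the matrix m, for k >= 0."""
    result = IDENTITY
    for _ in range(k):
        result = mul(result, m)
    return result

def generated_subgroup(generators):
    """Closure of the generators under multiplication (this suffices in a finite group)."""
    seen = {IDENTITY}
    frontier = [IDENTITY]
    while frontier:
        x = frontier.pop()
        for g in generators:
            y = mul(x, g)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen
```

One finds that `generated_subgroup([A, B])` has $32$ elements, that $A^8=-I$, and that $BAB=A^7$, in accordance with the congruence $k\equiv\pm 1\pmod 8$.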

Generalizing this method involves finding large commutative $p$-subgroups (such as $S_1$) which belong to appropriate (possibly non-split) tori of $\mathop{\rm GL}(n)$ and combining them with adequate parts of their normalizer, which is close to considering Sylow subgroups of the symmetric group. The paper Sylow $p$-subgroups of the classical groups over finite fields with characteristic prime to $p$ by A.J. Weir gives the general description (as well as for orthogonal and symplectic groups), building on an earlier paper in which he constructed Sylow subgroups of symmetric groups. See also the paper Some remarks on Sylow subgroups of the general linear groups by C. R. Leedham-Green and W. Plesken which says a lot about maximal $p$-subgroups of the general linear group (over not necessarily finite fields). Also, the question was recently the subject of interesting discussions on MathOverflow.

[Edited on Febr. 14 to correct the computation of the 2-Sylow...]

The last ingredient to be discussed is jet spaces.

Differential algebra is seldom used explicitly in algebraic geometry. However, differential techniques have furnished a crucial tool for the study of the Mordell conjecture over function fields (beginning with the proof of this conjecture by Grauert and Manin), and its generalizations in higher dimension (theorem of Bogomolov on surfaces satisfying $c_1^2>3c_2$), or for holomorphic curves (conjecture of Green-Griffiths). They are often reformulated within the language of jet bundles.

Let us assume that $X$ is a smooth variety over a field $k$. Its tangent bundle $T(X)$ is a vector bundle over $X$ whose fiber at a (geometric) point $x$ is the tangent space $T_x(X)$ of $X$ at $x$. By construction, every morphism $f\colon Y\to X$ of algebraic varieties induces a tangent morphism $Tf\colon T(Y)\to T(X)$: it maps a tangent vector $v\in T_y(Y)$ at a (geometric) point $y\in Y$ to the tangent vector $T_yf(v)\in T_{f(y)}(X)$ at $f(y)$. This can be rephrased in the language of differential algebra as follows: for every differential field $(K,\partial)$ whose field of constants contains $k$, one has a derivative map $\nabla_1\colon X(K)\to T(X)(K)$. Here is the relation, where we assume that $K$ is the field of functions of a variety $Y$. A derivation $\partial$ on $K$ can be viewed as a vector field $V$ on $Y$, possibly not defined everywhere; replacing $Y$ by a dense open subset if needed, we assume that it is defined everywhere. Now, a point $x\in X(K)$ can be identified with a rational map $f\colon Y\dashrightarrow X$, defined on an open subset $U$ of $Y$. Then, we simply consider the morphism from $U$ to $T(X)$ given by $p\mapsto T_pf (V_p)$. At the level of function fields, this is our point $\nabla_1(x)\in T(X)(K)$.

If one wants to look at higher derivatives, the construction of the tangent bundle can be iterated and gives rise to jet bundles which are varieties $J_m(X)$, defined for all integers $m\geq 0$, such that $J_0(X)=X$, $J_1(X)=T(X)$, and for $m\geq 1$, $J_m(X)$ is a vector bundle over $J_{m-1}X$ modelled on the $m$th symmetric product of $\Omega^1_X$. For every differential field $(K,\partial)$ whose field of constants contains $k$, there is a canonical $m$th derivative map $\nabla_m\colon X(K) \to J_m(X) (K)$.

The construction of the jet bundles can be given so that the following three requirements are satisfied:

If $X=\mathbf A^1$ is the affine line, then $J_m(X)$ is an affine space of dimension $m+1$, and $\nabla_m$ is just given by $ \nabla_m (x) = (x,\partial(x),\dots,\partial^m(x)) $ for $x\in X(K)=K$;

Products: $J_m(X\times Y)=J_m(X)\times_k J_m(Y)$;

Open immersions: if $U$ is an open subset of $X$, then $J_m(U)$ is the open subset of $J_m(X)$ given by the preimage of $U$ under the projection $J_m(X)\to J_{m-1}(X)\to \dots\to J_0(X)=X$.

When $X$ is an algebraic group, with origin $e$, then $J_m(X) $ is canonically isomorphic to the product of $X$ by the affine space $J_m(X)_e$ of $m$-jets at $e$.

We now describe Scanlon's application.

Let $G$ be a complex algebraic group acting on a complex algebraic variety $X$; let $S\colon X\to Z$ be the corresponding generalized Schwarzian map. Here, $Z$ is a complex algebraic variety, but $S$ is a differential map of some order $m$. In other words, there exists a constructible algebraic map $\tilde S\colon J_m(X)\to Z$ such that $S(x)=\tilde S(\nabla_m(x))$ for every differential field $(K,\partial)$ and every point $x\in X(K)$.

Let $U$ be an open subset of $X(\mathbf C)$, for the complex topology, and let $\Gamma$ be a Zariski dense subgroup of $G(\mathbf C)$ which stabilizes $U$. We assume that there exists a complex algebraic variety $Y$ and a biholomorphic map $p\colon \Gamma\backslash U \to Y(\mathbf C)$.

Locally, every open holomorphic map $\phi\colon\Omega\to Y(\mathbf C)$ can be lifted to a holomorphic map $\tilde\phi\colon \Omega\to U$. Two liftings differ locally by the action of an element of $\Gamma$, so that the composition $S\circ\tilde\phi$ does not depend on the choice of the lifting, by definition of the generalized Schwarzian map $S$. This gives a well-defined differential-analytic map $T\colon Y\to Z$. Let $m$ be the maximal order of derivatives appearing in a formula defining $T$. Then one may write $T\circ\phi =\tilde T\circ \nabla_m\phi$, where $\tilde T$ is a constructible analytic map from $J_m(Y)$ to $Z$.

Theorem (Scanlon). — Assume that there exists a fundamental domain $\mathfrak F\subset U$ such that the map $p|_{\mathfrak F}\colon \mathfrak F\to Y(\mathbf C)$ is definable in an o-minimal structure. Then $T$ is differential-algebraic: there exists a constructible algebraic map $\tilde T\colon J_m(Y)\to Z$ such that $T\circ \phi=\tilde T \circ J_m(\phi)$ for every $\phi$ as above.

For the proof, observe that the map $\tilde T$ is definable in an o-minimal structure: it is obtained, by passing to the quotient, from a definable map on the preimage of $\mathfrak F$ in $J_m(U)$, and o-minimal structures allow elimination of imaginaries. By the theorem of Peterzil and Starchenko, it is constructible algebraic.

Here is a well-but-ought-to-be-better known theorem.

Theorem. — Let $\ell$ be a prime number and let $G$ be a compact subgroup of $\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$. Then there exists a finite extension $E$ of $\mathbf Q_\ell$ such that $G$ is contained in $\mathop{\rm GL}_d(E)$.

Before explaining its proof, let us recall why such a theorem can be of any interest at all. The keyword here is Galois
representations.

It is now a well-established fact that linear representations are an extremely useful tool to study groups. This is standard for finite groups, for which complex linear representations appear at one point or another of graduate studies, and its topological version is even more classical for the abelian groups $\mathbf R/\mathbf Z$ (Fourier series) and $\mathbf R$ (Fourier integrals). On the other hand, some groups are extremely difficult to grasp while their representations are ubiquitous, namely the absolute Galois groups $G_K=\operatorname{Gal}(\overline K/K)$ of fields $K$.

With the notable exceptions of separably closed and real closed fields, these groups are infinite and have a natural (profinite) topology with open subgroups the groups $\operatorname{Gal}(\overline K/L)$, where $L$ is a finite extension of $K$ lying in $\overline K$. It is therefore important to study their continuous linear representations. Complex representations are important but since $G_K$ is totally disconnected, their image is always finite. Therefore, $\ell$-adic representations, namely continuous morphisms from $G_K$ to $\mathop{\rm GL}_d(\mathbf Q_\ell)$, are more important. Here $\mathbf Q_\ell$ is the field of $\ell$-adic numbers.

Their use goes back to Weil's proof of the Riemann hypothesis for
curves over finite fields, via the action on the $\ell^\infty$-division
points of their Jacobian varieties. Here $\ell$ is a prime different from
the characteristic of the ground field. More generally, every Abelian
variety $A$ over a field $K$ of characteristic $\neq\ell$ gives rise to a
Tate module $T_\ell(A)$ which is a free $\mathbf Z_\ell$-module of rank
$d=2\dim(A)$, endowed with a continuous action $\rho_{A,\ell}$ of $G_K$. Taking a
basis of $T_\ell(A)$, one thus has a continuous morphism $G_K\to
\mathop{\rm GL}_d(\mathbf Z_\ell)$, and, embedding $\mathbf Z_\ell$ in
the field of $\ell$-adic numbers, a continuous morphism
$G_K\to\mathop{\rm GL}_d(\mathbf Q_\ell)$. Even more generally, one can consider the $\ell$-adic étale cohomology of algebraic varieties over $K$.
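
For the reader's convenience, here is the standard construction behind this statement (the notation $A[\ell^n]$ for the group of $\ell^n$-torsion points of $A(\overline K)$ is introduced only for this remark):
\[ T_\ell(A)=\varprojlim_n A[\ell^n], \qquad A[\ell^n]\simeq (\mathbf Z/\ell^n\mathbf Z)^{2\dim(A)}, \]
the transition maps being multiplication by $\ell$. The group $G_K$ acts on each finite group $A[\ell^n]$ through a finite quotient, which is why the resulting action on $T_\ell(A)$ is continuous. For instance, if $A$ is an elliptic curve, then $d=2$ and $\rho_{A,\ell}$ takes its values in $\mathop{\rm GL}_2(\mathbf Z_\ell)$.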

For various
reasons, such as the need to diagonalize additional group actions, one can be led to consider similar representations where
$\mathbf Q_\ell$ is replaced by a finite extension of $\mathbf Q_\ell$,
or even by the algebraic closure $\overline{\mathbf Q_\ell}$. Since $G_K$ is a compact topological group, its image by a continuous representation $\rho\colon G_K\to\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$ is a compact subgroup of $\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$ to which the above theorem applies.

This being said for the motivation, one proof (attributed to Warren Sinnott) is given by Keith Conrad in his short note, Compact subgroups of ${\rm GL}_n(\overline{\mathbf Q}_p)$. In fact, while browsing his large collection of excellent expository notes, I came across that one and felt the urge to write this blog post.

The following proof was explained to me by Jean-Benoît Bost almost exactly 20 years ago. I believe that it ought to be much more widely known.

It relies on the Baire category theorem and on Krasner's lemma.

Lemma 1 (essentially Baire). — Let $G$ be a compact topological group and let $(G_n)$ be an increasing sequence of closed subgroups of $G$ such that $\bigcup G_n=G$. There exists an integer $n$ such that $G_n=G$.

Proof. Since $G$ is compact Hausdorff, it satisfies the Baire category theorem and there exists an integer $m$ such that $G_m$ contains a non-empty open subset $V$. For every $g\in V$, the set $V\cdot g^{-1}$ is an open neighborhood of the identity contained in $G_m$. Being a union of translates of this neighborhood, $G_m$ is open in $G$. Since $G$ is compact, it has finitely many cosets $g_iG_m$ modulo $G_m$; there exists an integer $n\geq m$ such that $g_i\in G_n$ for every $i$, hence $G=G_n$. QED.
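
Compactness is really used in the last step of this proof. Here is a quick example, not needed for the sequel: the additive group $G=\mathbf Q_\ell$ is locally compact but not compact, and
\[ \mathbf Q_\ell=\bigcup_{n\geq 0} \ell^{-n}\mathbf Z_\ell \]
is an increasing union of closed (even open) subgroups, none of which is equal to $\mathbf Q_\ell$. Each subgroup $\ell^{-n}\mathbf Z_\ell$ is open, as in the proof, but it has infinitely many cosets in $\mathbf Q_\ell$.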

Lemma 2 (essentially Krasner). — For every integer $d$, the set of all extensions of $\mathbf Q_\ell$ of degree $d$, contained in $\overline{\mathbf Q_\ell}$, is finite.

Proof. Every finite extension of $\mathbf Q_\ell$ has a primitive element whose minimal polynomial can be taken monic and with coefficients in $\mathbf Z_\ell$; the degree of the extension is the degree of this polynomial. On the other hand, Krasner's lemma asserts that for every such irreducible polynomial $P$, there exists a real number $c_P>0$ such that every monic polynomial $Q$ of the same degree for which the coefficients of $Q-P$ have absolute values $<c_P$ has a root in the field $E_P=\mathbf Q_\ell[T]/(P)$. By compactness of $\mathbf Z_\ell$, the set of all finite subextensions of given degree of $\overline{\mathbf Q_\ell}$ is finite. QED.
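
To illustrate the lemma in the simplest case $d=2$ (a standard computation, recalled here without proof): the quadratic extensions of $\mathbf Q_\ell$ contained in $\overline{\mathbf Q_\ell}$ correspond to the nontrivial classes in $\mathbf Q_\ell^\times/(\mathbf Q_\ell^\times)^2$, and
\[ \mathop{\rm Card}\bigl(\mathbf Q_\ell^\times/(\mathbf Q_\ell^\times)^2\bigr)=\begin{cases} 4 & \text{if $\ell\neq 2$,}\\ 8 & \text{if $\ell=2$,}\end{cases} \]
so that there are exactly $3$ quadratic extensions of $\mathbf Q_\ell$ when $\ell$ is odd, and $7$ when $\ell=2$.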

Let us now give the proof of the theorem. Let $(E_n)$ be an increasing sequence of finite subextensions of $\overline{\mathbf Q_\ell}$ such that $\overline{\mathbf Q_\ell}=\bigcup_n E_n$ (lemma 2; take for $E_n$ the subfield generated by $E_{n-1}$ and all the subextensions of degree $n$ of $\overline{\mathbf Q_\ell}$). Since $E_n$ is a finite extension of $\mathbf Q_\ell$, it is complete, hence closed in $\overline{\mathbf Q_\ell}$. Then $G_n=G\cap \mathop{\rm GL}_d(E_n)$ is a closed subgroup of $G$, and $G$ is the increasing union of all $G_n$. By lemma 1, there exists an integer $n$ such that $G_n=G$. QED.