Abhimanyu Pallavi Sudhir
http://www.rssmix.com/
This feed was created by mixing existing feeds from various sources. RSSMix

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando/536075#536075
But back to your answer: the question asks if the observations of an individual observer are probabilistic. You say "No, because the observer himself splits into multiple observers". This is a metaphysical spin on what counts as an observer, which does not change the fundamental fact that all your observations remain probabilistic, i.e. you cannot determine what exactly you will see on your apparatus.
Mon, 16 Mar 2020 10:52:05 GMT

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando/536075#536075
Quantum interpretations themselves are metaphysics. My point is that QM can accommodate any number of observers regardless of your "interpretation".
Mon, 16 Mar 2020 10:48:57 GMT

Thoughts on Roko's basilisk
https://thewindingnumber.blogspot.com/2020/03/thoughts-on-rokos-basilisk.html
For those who don't know, Roko's basilisk is a proposed future AI that will revive everyone in history and condemn everyone who did not invent it to eternal torture. The thesis of the Roko's basilisk problem is that the creation of this basilisk is therefore inevitable, as people work towards its construction out of fear of eternal torture.<br /><br />An immediate thought should be that the specific definition of this creature is rather arbitrary. One could construct, instead:<br /><br /><ul><li>A creature that punishes everyone in inverse proportion to the amount of effort they put into its creation.</li><li>A creature that tortures the close families of everyone who did not invent it.</li><li>A creature that utilizes a slightly different method of torture than the standard Roko's basilisk.</li><li>A creature that tortures those who didn't create it, and also those who helped create the standard Roko's basilisk.</li><li>A creature that tortures those who didn't create it, and also destroys the standard Roko's basilisk.</li><li>A creature that tortures those who didn't create it, and also farms sweet potatoes.</li></ul><div>etc. By the Roko's basilisk argument, every one of these infinitely many different basilisks must come into existence, which is surely impossible.</div><div><br /></div><div>The problem, of course, is that the claim assumes that the creation of a Roko's basilisk is <i>possible</i>. All that it proves is that <i>if</i> Roko's basilisk is possible, then it is inevitable.</div><div><br /></div><div>So the question is: is the creation of any one of these basilisks possible? There are certain logical relationships between these possibilities, and it's also important to discriminate each possible creation by time of creation (i.e. a creature X.2050 created in 2050 is a different basilisk from an identical creature X.2060 created in 2060). For example, consider the following two basilisks:</div><div><ul><li>B1.2115: A creature that punishes non-creators, and destroys all creatures of the form of B2.</li><li>B2.2120: A creature that punishes non-creators, and farms sweet potatoes.</li></ul><div>Then B1.2115 being possible implies B2.2120 being impossible (as it will be destroyed immediately). In general, we have logical relationships of the form $X\implies Y$ and $X\implies \lnot Y$, but never of the form $\lnot X\implies$ anything.</div></div>
Labels: AI, futurism, roko's basilisk, the singularity
Sat, 14 Mar 2020 16:52:00 GMT

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando/536075#536075
QM can have one observer or multiple observers, without changing the physics. This doesn't change when you add a metaphysical interpretation to it.
Sat, 14 Mar 2020 09:56:15 GMT

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando/536075#536075
This is just some metaphysical comment that has nothing to do with physics. As far as physics is concerned, the individual observer's observations are random.
Sat, 14 Mar 2020 00:33:28 GMT

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando
@user250486 Yes. That's why I wrote "yes". What I'm saying is that randomness was never a problem to begin with.
Sat, 14 Mar 2020 00:31:14 GMT

Comment by Abhimanyu Pallavi Sudhir on Doesn't the Many Worlds interpretation of Quantum Mechanics fail to remove randomness?
https://physics.stackexchange.com/questions/536050/doesnt-the-many-worlds-interpretation-of-quantum-mechanics-rail-to-remove-rando
Yes, and randomness isn't a "problem".
Thu, 12 Mar 2020 23:10:19 GMT

Comment by Abhimanyu Pallavi Sudhir on Given a Factorial number, find the next Factorial number
https://math.stackexchange.com/questions/3573442/given-a-factorial-number-find-the-next-factorial-number
This just amounts to finding n if you know n!. See e.g. <a href="https://www.quora.com/Given-the-value-of-n-factorial-what-is-an-algorithm-to-find-n-given-that-n-is-at-most-1-million-digits-long" rel="nofollow noreferrer">quora</a> for some efficient algorithms. Or search for "inverse factorial function".
Sun, 08 Mar 2020 09:03:14 GMT

Comment by Abhimanyu Pallavi Sudhir on How to prove the Laurent series converges to the right thing?
https://math.stackexchange.com/questions/3571615/how-to-prove-the-laurent-series-converges-to-the-right-thing
@MartinR Thanks, that actually answers my question. I guess the best way to think of the proof in the answer you linked, and really even stuff like the uniqueness of Laurent series, is to transform it into a Fourier series.
Sat, 07 Mar 2020 14:21:26 GMT

Comment by Abhimanyu Pallavi Sudhir on Proof of Laurent series co-efficients in Complex Residue
https://math.stackexchange.com/questions/1126321/proof-of-laurent-series-co-efficients-in-complex-residue/1200502#1200502
In the statement of Laurent's theorem, the $z$ in the definition of coefficients should be $z_0$? And in equation (1), the integration should be wrt $w$, not $z$.
Sat, 07 Mar 2020 11:32:00 GMT

Comment by Abhimanyu Pallavi Sudhir on How to prove the Laurent series converges to the right thing?
https://math.stackexchange.com/questions/3571615/how-to-prove-the-laurent-series-converges-to-the-right-thing
@JohnColtraneisJC I certainly believe it's holomorphic. It's just "power series is differentiable within its radius of convergence", isn't it? But I don't see how that helps.
Fri, 06 Mar 2020 15:16:43 GMT

How to prove the Laurent series converges to the right thing?
https://math.stackexchange.com/questions/3571615/how-to-prove-the-laurent-series-converges-to-the-right-thing
<p>From what I understand, the main "point" of the Laurent series is that we should be able to derive it easily (e.g. by stitching together known Taylor series), and then exploit its uniqueness to say that these coefficients are the same as $c_n=\frac{1}{2\pi i}\oint_\circ \frac{f(z)}{(z-a)^{n+1}}\,dz$, from which we can calculate e.g. the residue $c_{-1}$ (and hence the integral $2\pi i\,c_{-1}$).</p>
<p>But to actually do this, we need one crucial fact: the series $\sum_{n\in\mathbb{Z}}c_n(z-a)^n$ given by $c_n=\frac{1}{2\pi i}\oint_\circ \frac{f(z)}{(z-a)^{n+1}}\,dz$ <strong>is actually a valid Laurent series</strong> for $f(z)$, i.e. where it converges, it converges to $f(z)$.</p>
<p>How does one prove this? It feels like it <em>should</em> follow easily from Cauchy's integral formula, but my brain doesn't seem to be working sensibly at the moment.</p>
Tags: complex-analysis, residue-calculus, laurent-series, cauchy-integral-formula, singularity
Fri, 06 Mar 2020 15:09:15 GMT

"In space, there's no up or down."
https://thewindingnumber.blogspot.com/2020/03/in-space-theres-no-up-or-down.html
0"In outer space, there's no up or down."<br />"In physics, there's nothing called <i>decceleration</i>."<br />"In relativity, time is the fourth dimension."<br /><br />Your parents etc. probably taught these things to you as a child, and you might've wondered at the time why that's true. Why <i>can't</i> we just define a direction called "up" in space? Why <i>can't</i> we just define decceleration as negative acceleration (or rather, acceleration in the opposite direction as motion)? Why do we count time as the fourth dimension -- why can't, I don't know, temperature be the fourth dimension?<br /><br />If you're a child and your parents <i>aren't</i> telling you things like this, please call child protective services immediately. These factoids are <i>incredibly important</i> for any human being worthy of the name to internalise -- they are a special case of the general principle of <b>symmetry</b>, or more specifically: <b>stuff should be defined in terms of its behaviour</b>.<br /><br />It's not that you <i>can't</i> define a general up or down in space, it's that you really, really <i>shouldn't</i>. It would serve no purpose, and would break the symmetry of space. There is no reason you should hold a specific property of the Earth as fundamental to your study of some physical phenomena in "space". Any facts that you derive must be <b>abstracted</b> to work in any co-ordinate system.<br /><br />In the other two examples, it's a bit more subtle, as there really are specific physical phenomena associated with decceleration (e.g. harmonic motion), and time does have some special properties distinguishing it from space. Nonetheless, the mental classification is important.<br /><br />This notion is fundamental to any academic discipline. Unfortunately, it seems that there is no push towards abstraction in the social sciences -- e.g. in economics, where you see a dozen different words for "externality", and a lot of definitions seem to be on entirely social terms.abstractioneducationscientific methodsocial sciencessymmetryThu, 05 Mar 2020 23:32:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-2853502468270189255Abhimanyu Pallavi Sudhir2020-03-05T23:32:00ZContour Integration II: everything about singularities; what gives life to Pi
https://thewindingnumber.blogspot.com/2020/03/contour-integration-ii-singularities.html
<b>Stuff we'll cover in this article:</b><br /><br /><ul><li>Residues as "climbing between the values of a multivalued antiderivative"</li><li>Winding numbers and the residue theorem</li><li>All residues are logarithmic residues: the Laurent series</li><li>"Proving Laurent series": Laurent series as Fourier series</li><li>How the residue theorem gives life to $\pi$</li></ul><br /><hr /><br />The story so far: if a function has no screw-up points within a closed contour, its integral on that contour is zero. But if it does, it may not be.<br /><br />And by a screw-up point, we just mean a point at which the function isn't (or rather <i>cannot be</i> -- this is what we call a non-removable singularity) holomorphic.<br /><br />But why? Let's look at the antiderivative of $1/z$ -- $\mathrm{log}(z)$. It's fundamentally a <b>multivalued function</b> -- and here's what's interesting: a loop around the origin isn't actually a loop on this graph -- it brings you to a higher level than you were previously (specifically, $2\pi i$ higher).<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://i.stack.imgur.com/P2Svo.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="380" data-original-width="342" height="320" src="https://i.stack.imgur.com/P2Svo.gif" width="288" /></a></div><br />And you can kinda see why this comes about -- the derivative of this function not being holomorphic at 0 is encapsulated by the fact that the function is all torn up at 0 -- its slope must be different in different directions, because you have multiple planes stuck to that point. This is the idea behind a <b>branch point</b> -- a point such that the function is discontinuous when going about an arbitrarily small circuit about the point.<br /><br />One can now start to see how integrals that do loop around the origin behave -- for starters, the <b>winding number</b> of the contour corresponds to the number of levels you climb during the integral. Also, the distance between levels should depend purely on the <b>local nature of the function around the branch point</b> (the antiderivative has a well-defined derivative -- the function $f$, in this case $1/z$ -- so the spacing must remain constant). By similar reasoning, encircling multiple poles means climbing all their levels (<b>so the total distance climbed adds up</b>).<br /><br />This, above, is the <b>residue theorem</b>. For a function $f$, a simply connected open set $U$ such that $f$ is holomorphic on $U\setminus\{a_1,\dots,a_n\}$, and a closed contour $\gamma$ contained in this punctured set:<br /><br />$$\oint_\gamma f(z) dz = \sum_{k=1}^n \mathrm{W}(\gamma, a_k)\mathrm{Res}(f,a_k)$$<br />where $\mathrm{Res}(f,a)$ is the <b>residue</b> of $f$ at $a$, which is the "local quantity" that equals the "distance between layers" of the multivalued antiderivative.<br /><br />This is, obviously, awesome. But thinking about generally handling and computing these residues $\mathrm{Res}(f,a)=\oint_{\circ}f(z)dz$ leads us to wonder about the general nature of branch cuts.<br /><br /><hr /><br />Here's an idea: we know exactly what the residue for $f(z)=z^{-1}$ at 0 is -- it's $2\pi i$.
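(To check that value directly -- this is just the standard parametrisation computation, nothing new: on the circle $z=re^{i\theta}$ we have $dz=ire^{i\theta}\,d\theta$, so
$$\oint_{|z|=r}\frac{dz}{z}=\int_0^{2\pi}\frac{ire^{i\theta}}{re^{i\theta}}\,d\theta=i\int_0^{2\pi}d\theta=2\pi i,$$
independently of the radius $r$ -- consistent with the residue being a purely local quantity.)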
One also easily sees that the residue of $f(z)=z^n$ for any other integer $n$ is zero (the antiderivative $z^{n+1}/(n+1)$ does not have branch cuts).<br /><br />I wonder if -- like how Taylor series allow us to represent a function near a general point $a$ as a sum of $(z-a)^n$s for nonnegative $n$ -- we could represent a function near a singularity $a$ as a sum of $(z-a)^n$s for all integer $n$s.<br /><br />I wonder if <i>all</i> "serious" singularities are just ultimately $1/z^k$-style singularities.<br /><br />This is, in fact, what is known as the <b>Laurent series</b> -- a representation $f(z)=\sum_{n=-\infty}^\infty c_n (z-a)^n$. Then the distance climbed by $f(z)$ is equal to the distance climbed by the $(z-a)^{-1}$ term, which is just $2\pi i c_{-1}$.<br /><br />So the point of this "Laurent series interpretation" of the residue is this:<br /><br /><ol><li>The key <i>conceptual</i> takeaway from the Laurent series is that <b>all branch cuts of the antiderivatives of holomorphic functions are "logarithmic".</b> (This does not, e.g. apply to the branch cut of $\sqrt{z}$, as its derivative isn't holomorphic.)</li><li>When actually calculating residues, we can find the Laurent series by other means (such as by patching together different Taylor series) and read the residue off its $c_{-1}$ coefficient (as $2\pi i\,c_{-1}$, in our normalisation).</li></ol><br />But to actually "prove" this interpretation means to:<br /><ol><li>Write down what its coefficients <i>should</i> be -- this is our candidate series, which justifies using it to calculate residues.</li><li>Show that this candidate is a valid Laurent series, i.e. that it actually converges to $f(z)$ on some region.</li><li>Show that it is <i>the</i> valid Laurent series, i.e. that the Laurent series is unique. So we can calculate the Laurent series however we want and use its coefficients to calculate residues.</li></ol><br /><div class="twn-pitfall">The Laurent series is <em>not</em> the Taylor series, and the nonnegative coefficients of the Laurent series are not generally the coefficients of the Taylor series. So its coefficients cannot be interpreted as higher derivatives of the function or anything (that doesn't even make sense at a singularity).</div><br />The first one is easy. Obviously we need $c_{-1} = \frac{1}{2\pi i}\oint_\circ f(z)\,dz$. Analogously, by considering this expression for the function $\frac{f(z)}{(z-a)^{n+1}}$ to extract the other coefficients, we see that:<br /><br />$$c_n = \frac{1}{2\pi i}\oint_\circ \frac{f(z)}{(z-a)^{n+1}}\,dz$$<br />The second and third are actually challenging -- but there's a remarkably fortunate observation one can make. The fact that the Laurent series is defined on an annulus (each of the two power series -- in $(z-a)$ and in $(z-a)^{-1}$ -- has a radius of convergence, which bounds $|z-a|$ from above and from below respectively) is incredibly suggestive of describing a periodic function. Indeed, the variable substitution $z-a=\rho e^{i\theta}$ <b>turns the Laurent series into a Fourier series</b> in $\theta$. The properties 2 and 3 then follow from properties of the Fourier series.<br /><br />This "Fourier series" interpretation of the Laurent series also makes it easy to see <b>Cauchy's integral formula</b> and <b>holomorphicity implies analyticity</b>.<br /><br /><hr /><br />By the way, I would argue that the notion of a residue and the Laurent series is the <b>fundamental source of the importance of $\pi$</b>. Perhaps the standard motivation for $\pi$ is that $2\pi i$ is the period of the exponential function. This is equivalent to saying it's the residue of $1/z$, i.e. the spacing between the branches of the logarithm. The existence of Laurent series expansions -- i.e. all branch cuts being logarithmic in nature -- is what gives importance to these residues.<br /><br /><b>Exercise:</b> prove the <b>Cauchy differentiation formula</b> -- for a holomorphic function $f$,<br />$$f^{(n)}(a)=\frac{n!}{2\pi i}\oint_\gamma \frac{f(z)}{(z-a)^{n+1}}dz$$
Labels: branch point, cauchy integral formula, complex analysis, contour integration, fourier, fourier series, laurent series, logarithm, multifunction, pi, residue theorem, residues, singularities, winding number
Wed, 04 Mar 2020 12:49:00 GMT

Contour Integration I: Cauchy and Morera's Integral Theorems
https://thewindingnumber.blogspot.com/2020/03/cauchy-and-moreras-integral-theorems.html
We're interested in finding out whether there exists an <b>integral form of the Cauchy-Riemann equations</b>. On one hand, this sounds absurd -- this is asking if there's an "integral form" of complex differentiability. On the other, the Cauchy-Riemann equations <i>are</i> just partial differential equations.<br /><br />The standard relationship between differential and integral formulations of things is Stokes' theorem -- the theorem that tells you that adding things on a lot of tiny curves gives you a thing on a big curve. So let's see what a complex integral on a tiny square looks like.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-FbI7_is6zng/Xl0WCMZMapI/AAAAAAAAGAs/pY_27jk0sksgeblsto4UedPDntJyCS-WACLcBGAsYHQ/s1600/little%2Bsquare.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="437" data-original-width="907" height="154" src="https://1.bp.blogspot.com/-FbI7_is6zng/Xl0WCMZMapI/AAAAAAAAGAs/pY_27jk0sksgeblsto4UedPDntJyCS-WACLcBGAsYHQ/s320/little%2Bsquare.png" width="320" /></a></div>Observe that the integral on AB (using the midpoint as the partition tag) is $\varepsilon$ times the midpoint of $f(A)f(B)$, while the integral on CD is $-\varepsilon$ times the midpoint of $f(C)f(D)$. The sum of these is $\varepsilon$ times the arrow connecting these midpoints (the red arrow in the diagram below). Similarly, the sum of the other two parts of the integral is $i$ times the blue arrow in the diagram below.<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-D5sahQbHoR0/Xl0cezCirMI/AAAAAAAAGA4/1de9kr-JRSUA3eX72cuRUngucke4rA_sQCLcBGAsYHQ/s1600/little%2Bsquare%2B2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="437" data-original-width="431" height="200" src="https://1.bp.blogspot.com/-D5sahQbHoR0/Xl0cezCirMI/AAAAAAAAGA4/1de9kr-JRSUA3eX72cuRUngucke4rA_sQCLcBGAsYHQ/s200/little%2Bsquare%2B2.png" width="196" /></a></div>Because a holomorphic function preserves squares and their orientation, these cancel out, and the integral gives zero. One can then use Green's theorem to show that the <b>integral of a holomorphic (on $D$) function $f$ on the closed curve $\partial D$ is zero</b>. (If you wanted to be completely formal, the equivalent would be to just apply Green's theorem and note that the local integral is zero, which is what the geometry above shows.)<br /><br />$$\oint_\gamma f(z)dz=0$$<br />Alternatively, one may write, for a simply-connected region $D$: <b>if $f$ is holomorphic on $D$, the integral of $f$ on all closed curves contained in $D$ is zero. </b>This is known as <b>Cauchy's Integral theorem</b> (or the Cauchy-Goursat theorem).<br /><br />One also immediately sees that the converse holds -- if the function <i>weren't</i> holomorphic, the blue arrow would not be a right-angle rotation of the red one, and you could construct closed curves on which this cancellation doesn't occur.
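(As a quick numerical illustration -- a minimal sketch, not part of the argument: integrating the non-holomorphic $f(z)=\bar{z}$ around a small square picks up a value proportional to the enclosed area, namely $2iA$, while a holomorphic function like $f(z)=z^2$ integrates to zero. The helper function below is ours, just for illustration.)

```python
import numpy as np

def contour_integral(f, vertices, n=2000):
    """Numerically integrate f along the closed polygon with the given vertices."""
    total = 0.0
    for p, q in zip(vertices, np.roll(vertices, -1)):
        t = np.linspace(0, 1, n)
        z = p + t * (q - p)           # straight edge from p to q
        total += np.trapz(f(z), z)    # trapezoidal rule works fine with complex values
    return total

square = np.array([0, 0.1, 0.1 + 0.1j, 0.1j])   # a small square of area A = 0.01

print(contour_integral(lambda z: z**2, square))        # ~0: holomorphic, cancellation occurs
print(contour_integral(lambda z: np.conj(z), square))  # ~0.02j = 2iA: cancellation fails
```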
This converse -- <b>if the integrals of a continuous function $f$ on all closed curves contained in an open region $D$ are zero, then $f$ is holomorphic</b> -- is called <b>Morera's theorem</b>.<br /><br />(The "openness" requirement in Morera's theorem is important because we want to ensure the integral is an actual global property -- that it's across some amount of "space".)<br /><br />Think about how surprising this is for a moment.<br /><ul><li>Cauchy's theorem tells us that <b>for a simply-connected region, existence of a derivative implies existence of a primitive</b>.</li><li>Morera's theorem tells us that <b>for a continuous function, existence of a primitive implies existence of a derivative.</b></li></ul><div class="twn-pitfall">Morera's theorem does <b>not</b> show that a holomorphic function is infinitely-differentiable. Do you see why?</div>
Labels: cauchy's integral theorem, cauchy-riemann, complex analysis, contour integration, green's theorem, mathematics, morera's theorem, stokes theorem
Mon, 02 Mar 2020 15:24:00 GMT

Comment by Abhimanyu Pallavi Sudhir on What is contour integration
https://math.stackexchange.com/questions/446724/what-is-contour-integration/1983989#1983989
I'm saying that language doesn't really add anything -- your definition of "circle" is just a function whose integral is a circle, but this is just a trivial restatement of the problem, and you haven't really justified what it is about analytic functions that does this.
Sun, 01 Mar 2020 11:16:34 GMT

Comment by Abhimanyu Pallavi Sudhir on Line integration in complex analysis
https://math.stackexchange.com/questions/110334/line-integration-in-complex-analysis/914273#914273
I don't see how this helps. This just describes in words the standard parametric calculation of the value of the integral, not what it represents.
Sat, 29 Feb 2020 00:06:11 GMT

Comment by Abhimanyu Pallavi Sudhir on What is contour integration
https://math.stackexchange.com/questions/446724/what-is-contour-integration/1983989#1983989
Upvoted for the point about antiderivatives with branch cuts, but "analytic functions take circles to circles" is not at all a relevant statement. All continuous functions keep circles connected -- the point is that the displacement described by the analytic function as its gradient field is a circle, which I don't think you've really intuited in this answer.
Sat, 29 Feb 2020 00:01:41 GMT

Comment by Abhimanyu Pallavi Sudhir on Geometrical Interpretation of the Cauchy-Goursat Theorem?
https://math.stackexchange.com/questions/1026181/geometrical-interpretation-of-the-cauchy-goursat-theorem/1026335#1026335
I think the main question is why the integral is path-independent in the first place, which your intuition does not answer.
Thu, 27 Feb 2020 01:44:41 GMT

Comment by Abhimanyu Pallavi Sudhir on Explanation Of Cauchy's Integral Theorem
https://math.stackexchange.com/questions/2182077/explanation-of-cauchys-integral-theorem/2182235#2182235
This is a bad answer. Having a gradient is not the same as being a gradient. You just gave the intuition for the fundamental theorem of calculus in complex analysis -- iff a function has an antiderivative, its integral on a closed curve is zero. But the essence of the Cauchy Integral Theorem is that if a function has a derivative, its integral on a closed curve is zero, i.e. differentiability implies integrability (at least on a simply-connected domain).
Wed, 26 Feb 2020 18:58:48 GMT

Generative neural networks: introduction
https://thewindingnumber.blogspot.com/2020/02/generative-neural-networks-introduction.html
Equipped with the ability to process data, the obvious next step is to get an AI to <i>produce</i> things -- to be creative, to come up with art, compositions, original thoughts and ideas. We'll now describe the <b>most elementary</b> of such neural networks, which we will call <b>Generative Neural Networks</b>, while more complicated ideas would exploit some sort of transfer learning.<br /><br />It's not at all absurd to expect it to be possible for a neural network to generate images of horses that don't look like any horse it's actually seen -- because humans can do that! If you imagine a horse, it's probably not a horse whose image you've seen before, but it nonetheless possesses the features you've identified as common between horses.<br /><br />The idea behind a generative neural network can be motivated from the following two statistical notions:<br /><br /><ul><li><b>The inverse transform method of generating random variables.</b></li></ul><br />Content generated by a mind can be considered to be a random variable in some fancy space. E.g. if we want to get our neural network to produce (28, 28) digit characters, we're training it to be a random variable on the space of (28, 28) images whose support is the images we identify as valid digit characters.<br /><br />The way that computers typically sample random variables is through the "inverse transform method", which is to start with a uniform random sample and apply $F^{-1}$ to it, where $F$ is the CDF of the random variable $X$ you want to sample. Your result will be a sample of $X$.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-iphOtrsdkJ8/XkajsZbp-pI/AAAAAAAAF-w/Sv6goYK6qWYuQzB3OoQOQZLMAbd55pK_gCLcBGAsYHQ/s1600/cdf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="704" data-original-width="1192" height="188" src="https://1.bp.blogspot.com/-iphOtrsdkJ8/XkajsZbp-pI/AAAAAAAAF-w/Sv6goYK6qWYuQzB3OoQOQZLMAbd55pK_gCLcBGAsYHQ/s320/cdf.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Quick explanation of the inverse transform method: under the uniform distribution, the probability of getting a value under $u$ is $u$, which under the CDF of $X$ is the probability of getting a value under $F^{-1}(u)$. So you map $u\mapsto F^{-1}(u)$.</td></tr></tbody></table>So we once again need a function approximator -- to approximate $F^{-1}$.<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-4fSScoA6qlg/XkaosTJrR-I/AAAAAAAAF-8/9f2JYZIaAwsMtUsyLTLPSaQGnRyXrBZ4wCLcBGAsYHQ/s1600/generative.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="511" data-original-width="763" height="214" src="https://1.bp.blogspot.com/-4fSScoA6qlg/XkaosTJrR-I/AAAAAAAAF-8/9f2JYZIaAwsMtUsyLTLPSaQGnRyXrBZ4wCLcBGAsYHQ/s320/generative.png" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The inputs to the neural network are randomly generated, typically uniformly</td></tr></tbody></table><br /><ul><li><b>A nonlinear generalization of principal component analysis</b></li></ul><div><br /></div><div>Think, e.g. of <a href="https://en.wikipedia.org/wiki/Eigenface">eigenfaces</a>. If you've ever tried to use eigenfaces to generate realistic faces, you'll notice that your results are just terrible. Much of the information in faces is not so linear and nice -- there's no reason to expect it to be. Illumination and angle are pretty much the only properties that can be expected to vary linearly.</div><div><br /></div><div>I.e. suppose you have some data that varies as follows:</div><div><br /></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VoAh0wNHl6o/XkbIR01_E2I/AAAAAAAAF_Q/KxbPuhFySpw9j4WbtlB2q44aiqN9sI5vgCLcBGAsYHQ/s1600/nonlpca.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="511" data-original-width="763" height="214" src="https://1.bp.blogspot.com/-VoAh0wNHl6o/XkbIR01_E2I/AAAAAAAAF_Q/KxbPuhFySpw9j4WbtlB2q44aiqN9sI5vgCLcBGAsYHQ/s320/nonlpca.png" width="320" /></a></div><div class="separator" style="clear: both; text-align: center;"><br /></div><div>Then a PCA might give you the pink line as your first principal component, but sampling from the pink line gives you a lot of unphysical outputs -- those are the areas where your pink line doesn't intersect the data.</div><div><br /></div><div>But using PCA to generate samples from a distribution can be understood as taking some random inputs, corresponding to the values of each principal component you want to use, and feeding them through a function: the principal component change-of-basis matrix.</div><div><br /></div><div>But more generally, replacing this function with something nonlinear allows us to deal with nonlinear models.</div><div><br /></div><hr /><br />OK, so it's clear to us that we need a neural network -- a "generative neural network" -- to construct the inverse CDF. How would one train this network?<br /><br />Given an initial random guess for the network parameters, what we have is some guessed distribution for "images of horses". And what we really want to do is perturb these parameters to <b>match the distribution of our data</b>.<br /><br />Well, such an approach is certainly possible -- one could measure some notion of distance between our generated sample distribution and the real distribution and backpropagate this error with each iteration.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-Q5ol9LPrvxE/XkknQyeEICI/AAAAAAAAF_c/8aeUSWwqVO8jGtlRybg0pVtYXk215IQkACLcBGAsYHQ/s1600/mmd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="511" data-original-width="763" height="267" src="https://1.bp.blogspot.com/-Q5ol9LPrvxE/XkknQyeEICI/AAAAAAAAF_c/8aeUSWwqVO8jGtlRybg0pVtYXk215IQkACLcBGAsYHQ/s400/mmd.png" width="400" /></a></div>Note that we don't actually know the distribution of our data (the red one), so we can't really use something like "the probability of observing this sample given our distribution" as our loss function. Anyway, there are measures of the distance between two distributions that we could use for our error function, such as the <b>maximum mean discrepancy</b> approach, and methods involving moments.<br /><br />(This approach, generally, is called a <b>Generative Moment Matching Network</b>.)<br /><br /><hr /><br />But you can probably see why this approach is not a great one, compared to what the human brain can do.
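(To make the distance-matching idea concrete, here is a minimal sketch of estimating the squared MMD between two samples -- the Gaussian kernel and its bandwidth are illustrative choices on our part, not anything canonical:)

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Pairwise k(x_i, y_j) = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    d2 = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(x, y, sigma=1.0):
    """(Biased) estimate of the squared maximum mean discrepancy between samples."""
    return (gaussian_kernel(x, x, sigma).mean()
            - 2 * gaussian_kernel(x, y, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean())

rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(500, 2))       # stand-in for the real distribution
generated = rng.normal(0.5, 1.0, size=(500, 2))  # stand-in for the generator's output
control = rng.normal(0.0, 1.0, size=(500, 2))    # second sample from the real distribution

print(mmd2(data, generated))  # noticeably larger...
print(mmd2(data, control))    # ...than between two samples of the same distribution
```

This distance is what would be backpropagated through the generator at each iteration.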
For example, these distributions would appear to be quite close to each other:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-kaJnNM0C2PE/XklHnSguB4I/AAAAAAAAF_o/h_-doqlCj3AwP4Bjv37rf8QWiQFyp5CgwCLcBGAsYHQ/s1600/low%2Bdistance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="511" data-original-width="763" height="214" src="https://1.bp.blogspot.com/-kaJnNM0C2PE/XklHnSguB4I/AAAAAAAAF_o/h_-doqlCj3AwP4Bjv37rf8QWiQFyp5CgwCLcBGAsYHQ/s320/low%2Bdistance.png" width="320" /></a></div>OK, the chance of the outlier is pretty small. But I bet that if you asked a human to think of a "horse", he would <i>never</i> imagine this:<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/7/79/Alphonso_mango.jpg" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="552" data-original-width="652" height="270" src="https://upload.wikimedia.org/wikipedia/commons/7/79/Alphonso_mango.jpg" width="320" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Source: <a href="https://commons.wikimedia.org/wiki/File:Alphonso_mango.jpg">commons.wikimedia.org/wiki/File:Alphonso_mango.jpg</a></td></tr></tbody></table>Or more likely this:<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://upload.wikimedia.org/wikipedia/commons/f/f6/White-noise-mv255-240x180.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="180" data-original-width="240" height="300" src="https://upload.wikimedia.org/wikipedia/commons/f/f6/White-noise-mv255-240x180.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">Source: <a href="https://commons.wikimedia.org/wiki/File:White-noise-mv255-240x180.png">commons.wikimedia.org/wiki/File:White-noise-mv255-240x180.png</a></td></tr></tbody></table>OK, maybe we could just use a better distance function, etc. etc. But here's an idea: we could just subjectively <i>tell </i>that the outlier point did not belong to the distribution. We used our <i>human brains</i>. <b>How about rather than defining a discrimination function, we trained a neural network to tell if a given data point could belong to a distribution?</b> Then this neural network would train our generative neural network, and vice versa.<br /><b><br /></b>And this makes sense, right? When we learn to draw an object, we're also simultaneously learning to identify one. <br /><br />And one could imagine showing off a generative network's results and having people guess if they're real or not (alongside actual real images of course) -- and based on whether they thought it was real, we could use it to train the network. 
These "people" are <b>precisely what a discriminator network is</b>.<br /><br />In other words, we have two networks: the <b>generator network</b>, which generates random horse faces from random uniform variables, and the <b>discriminator network</b>, which takes the output of the generator network and some actual images, and figures out if the result is real or not.<br /><br /><b>If the classification is incorrect, the discriminator network is punished, while if it is correct, the generator network is punished.</b><br /><b><br /></b>This is known as a <b>Generative Adverserial Network</b>.ganngenerative neural networksmachine learningmethod of momentsneural networksprincipal component analysisstatisticsSun, 16 Feb 2020 16:00:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-2463937315880094634Abhimanyu Pallavi Sudhir2020-02-16T16:00:00ZComputer vision: convolutional neural networks
https://thewindingnumber.blogspot.com/2020/02/computer-vision-convolutional-neural.html
When processing visual data, it seems that our technique of simply flattening the image and feeding it as a vector is a bit disappointing.<br /><br />I mean, it works -- but <a href="https://thewindingnumber.blogspot.com/2020/02/machine-learning-as-function.html">remember what I said about the Bayesian prior</a>? We should always try to build a network that is more inclined towards more likely models. We should make sure the network understands a priori that two pixels that are close to each other are more likely to interact to form a picture. But in turning images into vectors, all this information about proximity simply disappears.<br /><br />What we want is a <b>"hierarchical" notion of locality</b>, where local features are studied, then local features of this feature map, and so on. In general, multiple features must be simultaneously analysed and combined in service of a larger goal, such as the identification of an image.<br /><br /><div class="twn-furtherinsight">That multiple features must be simultaneously analysed for any non-trivial task should be <em>completely obvious</em> -- think, e.g., of how you would produce any image analysis that depends on the gradient of the image or some other vector-valued quantity.</div><br />The biological motivation for this notion arises from the way that the brain perceives images, where each neuron only processes a portion of the visual data input known as the "receptive field", and these fields partially overlap between neurons.<br /><br />So the idea is this: spread a little "box" of coefficients across your image, dotting the box with the local entries of your image and placing the results of the dot products in a new matrix. Like I said, we have a "vector" of boxes, so you get several matrix layers which can be treated as "<i>feature channels</i>" similar in essence to an RGB channel (that you might've actually had in the input image).<br /><br /><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://static.packt-cdn.com/products/9781789138139/graphics/57b07b69-9550-4cda-a798-0578c8e30c74.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="301" data-original-width="800" height="150" src="https://static.packt-cdn.com/products/9781789138139/graphics/57b07b69-9550-4cda-a798-0578c8e30c74.png" width="400" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">(Source: <a href="https://subscription.packtpub.com/book/game_development/9781789138139/4/ch04lvl1sec31/convolutional-neural-networks">https://subscription.packtpub.com/book/game_development/9781789138139/4/ch04lvl1sec31/convolutional-neural-networks</a>)</td></tr></tbody></table>In order to actually "capture" the local features and move on in our hierarchy, we then <b>pool</b> the feature values in the output map via little grids (making it much smaller). One may think of pooling as testing either for <i>presence</i> (<b>max-pooling</b>) or for <i>sustained presence</i> (<b>mean-pooling</b>) of a feature. As the convolutions are done across the image, it generally isn't of fundamental importance which type of pooling we choose (at least when you have padding -- if you don't pad, max-pooling may be better).<br /><br />To be more precise:<br /><ul><li>Start with an $(m,n)$-dimensional image with $p$ channels (e.g. RGB channels), and pad it with padding $(g, h)$ (to retain edge information).</li><li>Rub $k$ filters on it of dimension $(a,b)$ (with $p$ channels each) with stride lengths $s,t$. These filters have weights as well as biases.</li><li>You get a feature map of dimensions $\left(\left\lfloor\frac{m+2g-a}{s}\right\rfloor+1, \left\lfloor\frac{n+2h-b}{t}\right\rfloor+1\right)$ with $k$ channels.</li><li>Pool with a pooling filter of dimension $(c, d)$ with stride lengths $u, v$ (these should usually be equal to $c, d$). You get a pooled map of dimensions $\left(\left\lfloor\frac{\left\lfloor\frac{m+2g-a}{s}\right\rfloor+1-c}{u}\right\rfloor+1, \left\lfloor\frac{\left\lfloor\frac{n+2h-b}{t}\right\rfloor+1-d}{v}\right\rfloor+1\right)$ with $k$ channels.</li><li>Feed into your activation function.</li></ul><div>For the case $u=v=c=d$, $a=b$, $s=t=1$, $g=h$, the dimensions of the output are:</div><div><br /></div><div>$$\left(\left\lfloor\frac{m + 2h - a +1}{u}\right\rfloor, \left\lfloor\frac{n + 2h - a +1}{u}\right\rfloor\right)$$</div><div><br /></div><div><b>Stuff to think about:</b></div><div><ol><li>What does a larger filter (higher $a, b$) represent?</li><li>CNNs can be done in dimensions other than 2, including in 1 dimension. Think about how this could be used in applications other than image processing.</li></ol></div>
Labels: artificial intelligence, computer vision, convolutional neural networks, machine learning, neural networks
Thu, 13 Feb 2020 14:27:00 GMT

Incredible Duals of category theory
https://thewindingnumber.blogspot.com/2020/02/incredible-duals-aka-category-theory.html
You might have noticed in the past two articles [<a href="https://thewindingnumber.blogspot.com/2020/01/abstracting-some-categorical-definitions.html">e.g.</a>] [<a href="https://thewindingnumber.blogspot.com/2020/01/abstracting-our-abstractions-limits-of.html">e.g.</a>] that the constructions we define so often come in pairs. And there is a very specific relationship that defines these pairs, too -- they arise from <b>reversing the arrows</b> in the construction.<br /><br />I'm aware that this appears bizarre, but there's nothing else it <i>could</i> arise from, right? Duality shows up everywhere in mathematics, but the only "natural" notion of duality we get with arrows is to reverse them. So somehow, if category theory is to formalise all mathematical intuition in some way, then all the good dualities you see must somehow be nicely expressible in terms of this category-theoretic duality.<br /><br />To drive this point home, let's just list out a bunch of "good things appearing in twos" we've seen in mathematics (other than what we've already seen) and see how they can be expressed categorically.<br /><ol><li><b>Dual order</b> (i.e. given a relation $\le$, the dual order is defined by $\ge$) -- this one is just trivial. Every poset is a preordered set and therefore can itself be seen as a category, so the dual order is the dual of this category.</li><li><b>Sup, Inf</b> (i.e. the lower bound of the upper bounds, the upper bound of the lower bounds in a lattice) -- this follows directly from the above.</li><li><b>LCM, GCD</b> -- a special case of the above, as the integers can be organised into a lattice based on divisibility.</li><li><b>de Morgan duality</b> (i.e. the duality induced by the complement/negation operation in Boolean algebra, e.g. between unions and intersections) -- in the case of logic, propositions can be ordered by implication; in the case of set theory, sets can be ordered by inclusion.</li><li><b>Addition and multiplication</b> -- follows from the duality of sums and products of objects by considering the category of finite sets.</li><li><b>A subspace and its orthogonal complement</b> (more generally, a subobject and a quotient by it, i.e. $A$ and $C$ where $0\to A\to B\to C\to 0$ is a short exact sequence).</li></ol><div>To make this more precise:</div><blockquote>We define the <b>opposite category</b> as the category with all arrows reversed. Where an invertible functor between these categories maps a diagram to another, those diagrams are called <b>dual notions</b>.</blockquote>
Labels: category theory, de morgan, duality, opposite category
Mon, 10 Feb 2020 22:36:00 GMT

Comment by Abhimanyu Pallavi Sudhir on Easy examples of dual objects in category theory
https://math.stackexchange.com/questions/1273604/easy-examples-of-dual-objects-in-category-theory/1274125#1274125
@JavierArias You asked for dual objects in your question.
Mon, 10 Feb 2020 15:48:51 GMT

Backpropagation and the chain rule
https://thewindingnumber.blogspot.com/2020/02/backpropagation-and-chain-rule_6.html
In order to train a function approximator, we need to define a <b>loss function</b>, such as the mean-squared error often used in linear regression. To minimise such a (generally highly complicated) loss function, gradient descent is a natural numerical algorithm.<br /><br />OK -- so how does one actually calculate these gradients efficiently? Consider the following neural network (where each box represents a tensor, e.g. the vector $X$s, the matrix $W$s, whatever $y$ is, and the scalar $L$ -- and arrows indicate that something is being fed as input through a function):<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-VZCSgmu5qGg/XjsbnCp4SUI/AAAAAAAAF94/lcOhY9n1QhsGBNv_xuxo5asUs9Vo9dddQCLcBGAsYHQ/s1600/nn.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="384" data-original-width="952" height="161" src="https://1.bp.blogspot.com/-VZCSgmu5qGg/XjsbnCp4SUI/AAAAAAAAF94/lcOhY9n1QhsGBNv_xuxo5asUs9Vo9dddQCLcBGAsYHQ/s400/nn.png" width="400" /></a></div>Well, you might observe that the tensors are in a <b>chain</b>. What do we do when we see a chain and we want to differentiate stuff? We use the <b>chain rule</b>.<br /><br />To be more specific, we're interested in differentiating -- i.e. taking the gradient of -- the <i>function that takes in the weights and outputs the loss</i> (i.e. the loss function). This function can be understood as the composition of several functions -- specifically all the red arrows. But really, descending in the gradient direction is the same as descending in the respective derivative directions of each parameter. I.e. it suffices to talk about:<br /><br />$$\frac{\partial L}{\partial W_2}=\frac{\partial L}{\partial X_2}\frac{\partial X_2}{\partial W_2}$$$$\frac{\partial L}{\partial W_1}=\frac{\partial L}{\partial X_2}\frac{\partial X_2}{\partial X_1}\frac{\partial X_1}{\partial W_1}$$<br />Or in general, for a network with layers $X_0,\dots,X_m$ and $L=:X_{m+1}$:<br /><br />$$\frac{\partial L}{\partial W_i} = \left[\prod_{k=m}^i\frac{\partial X_{k+1}}{\partial X_k}\right]\frac{\partial X_i}{\partial W_i}$$<br />Note that each item in this product is a <i>tensor</i>, i.e. involves taking tensor derivatives. This is why you see ML programming packages labeled stuff like "TensorFlow" -- what they do is keep track of derivatives for you.<br /><br />Keeping track of derivatives and computing products on the spot is better than trying to come up with a general expression for the derivatives, because a generic neural network may be much more complicated than the one we've described, and may have arrows that skip a layer, etc. for which the composition doesn't even yield a matrix multiplication. In general, we may have any sort of operations involved in the network (activation functions are an obvious one), and as long as we can differentiate them, we can keep track of what we need to multiply. This algorithmic use of the chain rule is called <b>backpropagation</b>.<br /><br />Note how the chain rule itself has no problem at all with computing gradients for multiple data points -- we still have a loss function that is a function of the network's parameters. But with our algorithmic use, we'll need to form an expression for the gradient of the batch from each feed-forward's gradient.
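(A minimal sketch of this bookkeeping for the two-layer network above, assuming identity activations and a mean-squared-error loss -- the shapes and learning rate are illustrative, not canonical:)

```python
import numpy as np

rng = np.random.default_rng(0)
X0 = rng.normal(size=(32, 4))    # batch of 32 inputs
y  = rng.normal(size=(32, 1))    # targets
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 1))

# Forward pass: X0 -> X1 -> X2 -> L
X1 = X0 @ W1
X2 = X1 @ W2
L = np.mean((X2 - y) ** 2)

# Backward pass: multiply the local derivatives along the chain
dL_dX2 = 2 * (X2 - y) / X2.size   # dL/dX2
dL_dW2 = X1.T @ dL_dX2            # dL/dW2 = (dL/dX2)(dX2/dW2)
dL_dX1 = dL_dX2 @ W2.T            # dL/dX1 = (dL/dX2)(dX2/dX1)
dL_dW1 = X0.T @ dL_dX1            # dL/dW1 = (dL/dX1)(dX1/dW1)

# One gradient-descent step
W1 -= 0.01 * dL_dW1
W2 -= 0.01 * dL_dW2
```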
At least if the loss function is additive, so are the gradients.
Labels: backpropagation, chain rule, gradient descent, machine learning, neural networks
Thu, 06 Feb 2020 00:02:00 GMT

Machine learning as function approximation; statistical and neural motivations
https://thewindingnumber.blogspot.com/2020/02/machine-learning-as-function.html
There are two -- different -- explanations of what machine learning <i>is</i>, which also lead to two different <i>motivations</i> for machine learning, and two different ways to understand each neural network architecture and many of the ideas in machine learning.<br /><ol><li>The <b>statistical</b> motivation -- machine learning is a <i>non-linear generalisation</i> of "linear" statistical techniques (data mining) like linear regression, PCA and linear decision boundaries. One can always do these linear techniques with some transformation of the data that makes relationships linear (while making sure the transformation is not absurd), but you need a way to "train" what the right such function is. In this sense, machine learning acts as a <b>function approximator</b>.</li><li>The <b>neurobiological</b> motivation -- a computer should be able to do whatever a brain can, but how exactly does a brain do the stuff it does? To take a simple example, the brain can recognise digits -- well, whatever the brain does, it takes an image as input and outputs a digit, i.e. <i>it's a function</i>. So once again, we need a <b>function approximator</b>.</li></ol><div>Great. So machine learning is about making function approximators. The basic idea is that we're looking for a <i>function</i> that minimises the overall error for a population of data -- it's basically a calculus of variations problem, isn't it? Well, except it isn't, because we don't have access to the entire population, so we need to avoid overfitting (i.e. we need to consider a <b>Bayesian prior</b>). This is also what we meant earlier by "making sure the transformation is not absurd".</div><div><br /></div><div>Anyway, we want a <b>universal function approximator</b> -- a system that can generate a function arbitrarily close to any given function given sufficiently many parameters. Well, we have such a system -- it's called <b>polynomial regression</b>. But it doesn't work. Why not? It's the <i>wrong Bayesian prior</i>. Polynomial regression of bounded degree gives zero prior probability to higher-order polynomials, but if you think about it, most machine learning applications necessarily require functions with heavy non-local effects.</div><div><br /></div><div>A function approximator based on the neurobiological analogy is a <b>neural network</b>. That this is a universal function approximator (this is called the <b>universal approximation theorem</b>) is mathematically not immediately obvious, and that it has an appropriate Bayesian prior is certainly not easy to guess. But the fact that our brains work as neural networks can be used as an empirical "proof" of these facts.<br /><br />In fact, a <i>single-layer neural network</i> -- the input, a single layer of processing, then the output -- can be used to approximate, with sufficiently many neurons, any continuous function to arbitrary precision. This basically just means that functions can be written as linear combinations of some scaled and translated sigmoid functions.</div><div><br />(<b>Exercise:</b> explain why the universal approximation theorem is true for the sigmoid function. What other kinds of functions is it true for? It's actually not that hard at all. If you do get stuck, check out the visuals in <a href="http://neuralnetworksanddeeplearning.com/chap4.html">Michael Nielsen's e-book</a>. A rigorous proof can be found <a href="http://mcneela.github.io/machine_learning/2017/03/21/Universal-Approximation-Theorem.html">here</a>.)<br /><br />In fact, the universal approximation theorem is not actually particularly important at all to the success of neural networks -- like we said, plenty of systems are universal approximators, but they don't have the right Bayesian prior (and this matters when you have limited data). In fact, a single-layer neural network, as it turns out, also often leads to a bad Bayesian prior. Actual learning often involves detecting several "<b>features</b>" of the data in several "<b>steps</b>" and integrating them together, e.g. -- first filtering an image, then segmenting it, then recognising edges of a digit, then recognising the shapes themselves. This is a metaphor that a single-layer network doesn't really capture.<br /><br />Instead, we typically use multiple layers, like in the brain, called <i>deep neural networks</i>, or <b>deep learning</b>. The basic idea here comes from looking at the way we process things, which is often in "<b>steps</b>".</div>
Labels: bayes's theorem, bayesian statistics, deep learning, function approximation, machine learning, neural networks, regression, statistics, universal approximation theorem
Tue, 04 Feb 2020 12:28:00 GMT

The "transhuman vision"
https://thewindingnumber.blogspot.com/2020/02/the-transhuman-vision.html
0... (out of context stuff dumped here, move along)<br /><br />Let's talk briefly about the "transhumanist vision":<br /><ol><li>A <b>computer</b> that includes a <i>processor</i>, <i>memory</i> (<i>long-term</i> and <i>short-term</i>) and connections to <i>peripherals</i>. The "mind" or <i>general artificial intelligence</i> is part of the software that can run on the processor, along with other more procedural programs, special AIs and essential basic programs (drivers and stuff).</li><li>A <b>transhuman body</b> -- the peripherals including: sensory devices for visual (eyes) and auditory (ears) observation, substance detection (nose and tongue) and various sensors (skin, ears) among other things, output devices including: display and speech, motion, hands and fingers and 3D printing. Mechanisms to preserve immortality (defense and backup automation mechanisms)</li><li>A <b>communication system</b> -- including: "telepathic" or "telekinetic" communication with other transhumans and internet-connected devices, internet access including access to new skills and data for learning.</li><li>A <b>virtual reality</b> platform where a transhuman can behave as a human, multiple humans or any other persona they can create or download -- in a virtual environment that could be private or connected to the internet. A "persona" is a system that collects sensory information in the virtual environment in a particular way, limits the capabilities of the user's interaction with the virtual environment in a particular way or even temporarily alters the user's mind (memories, emotional responses, decision-making instincts, etc.).</li><li><b>Mind transfer and copying</b> -- mind upload (from biological to electronic and vice versa), mind copying (from biological to electronic and vice versa), mind teleportation (from electronic to electronic), mind cloning/backup (from electronic to electronic)</li></ol>Why did we just do that? Because I want to give an overview of the kind of "data" a general artificial intelligence should be able to collect. OK, obviously the machine learning part is the "mind" or "general artificial intelligence" mentioned in the first point. What can a brain do?<br /><br />Well, I'm not claiming this is complete, but:<br /><ul><li>Observation processing -- from sensory or memory, coming up with questions and experiments, thought experiments too (linked to transfer) -- segmentation, movement, detection and classification, detecting patterns/faces... "Is this guy an older version of this guy? People tend to ask that question if he <i>is</i>... train to understand things like this. But maybe not always -- e.g. "Are you serious?" ... should be able to distinguish between these cases... "bounded" guess meaning from word... but if fails, then no. Or if it says "misleading semantics"</li><li>Original creations -- imagination or to peripherals. Simple things like movement to more complicated things: random thoughts, artworks, ideas, lines of thought, mathematical proofs, ... ... confirmation bias except with statistical data we can avoid that</li><li>Predictions, coming up with explanations... subjective probability</li><li>Decision-making -- using predictions. Or using reasoning. Can be based on random choice. Can be trained based on previous outcomes, advice of others, etc. Can evaluate that advice.</li><li>Reasoning</li><li>In-built loss functions -- emotion (pain, greed, social approval), instinct, co-operation, etc. Controlling instincts. 
</li><li>Learning -- from reading, introspection, whatever... detecting emphasis, reprogramming brain instincts</li><li>Knowing what to recall: 12*11 = ? (132)</li><li>Making connections/transfer of learning extremely important -- examples: symmetries, things fall, being big or small, edges and stuff. You don't need to read a thousand computer programs before you make your first one (list steps involved that are transferred over). You don't need to read a humongous number of mathematical proofs before you write your first one. You don't need feedback on a million mathematical conjectures you generate before you can tell which ones are important. You don't even need to see a thousand examples of a single character to tell what it is or to reproduce it. On a higher level, making connections between ideas, mathematical abstraction. </li><li>Narratives, causative structure, mental classification, clocks and models, and other actually "artificial" things</li></ul><div>...</div>artificial intelligencefuturismtranshumanismMon, 03 Feb 2020 15:13:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-6443177737821377687Abhimanyu Pallavi Sudhir2020-02-03T15:13:00ZIs the image a universal object?
https://math.stackexchange.com/questions/3519737/is-the-image-a-universal-object
1<p>Given a morphism <span class="math-container">$f:X\to Y$</span> in a category <span class="math-container">$\mathcal{C}$</span>, one can construct the image as a factorisation <span class="math-container">$f=(e:I\hookrightarrow Y)\circ(g:X\to I)$</span> that is universal (initial) among all such factorisations.</p>
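<p>(For concreteness: in <span class="math-container">$\mathbf{Set}$</span> this factorisation takes <span class="math-container">$I=f(X)$</span>, with <span class="math-container">$g$</span> the corestriction of <span class="math-container">$f$</span> and <span class="math-container">$e$</span> the inclusion; given any other factorisation <span class="math-container">$f=e'\circ g'$</span> through a monomorphism <span class="math-container">$e':I'\hookrightarrow Y$</span>, the unique comparison map <span class="math-container">$u:I\to I'$</span> sends each <span class="math-container">$f(x)$</span> to its unique <span class="math-container">$e'$</span>-preimage.)</p>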
<p>This does seem like a <a href="https://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property">universal property</a>. But I can't figure out how this can actually be constructed as an initial object in a comma category, because there are morphisms both from and to the object.</p>category-theoryuniversal-propertymonomorphismsThu, 23 Jan 2020 12:49:24 GMThttps://math.stackexchange.com/q/3519737Abhimanyu Pallavi Sudhir2020-01-23T12:49:24ZComment by Abhimanyu Pallavi Sudhir on Trying to understand the definition of a universal property
https://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property/3512051#3512051
Nice! But under "Connection to comma categories", $X$ should be an object of $D$, right?Wed, 22 Jan 2020 11:02:56 GMThttps://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property/3512051?cid=7235816#3512051Abhimanyu Pallavi Sudhir2020-01-22T11:02:56ZComment by Abhimanyu Pallavi Sudhir on Trying to understand the definition of a universal property
https://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property/3512051#3512051
This makes a lot of sense, thanks. By the way, there's a typo in the direction of your arrows for the coproduct injection map where you defined them.Sat, 18 Jan 2020 10:49:50 GMThttps://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property/3512051?cid=7225700#3512051Abhimanyu Pallavi Sudhir2020-01-18T10:49:50ZTrying to understand the definition of a universal property
https://math.stackexchange.com/questions/3511678/trying-to-understand-the-definition-of-a-universal-property
1<p><a href="https://en.wikipedia.org/wiki/Universal_property#Formal_definition" rel="nofollow noreferrer">Here</a>'s the definition of a universal property in Wikipedia:</p>
<blockquote>
<p>(where <span class="math-container">$U:D\to C$</span> is a functor and <span class="math-container">$X$</span> is an object in <span class="math-container">$C$</span>)</p>
<p>A <strong>terminal morphism</strong> from <span class="math-container">$U$</span> to <span class="math-container">$X$</span> is a final object in the category <span class="math-container">$(U\downarrow X)$</span> of morphisms from <span class="math-container">$U$</span> to <span class="math-container">$X$</span>, i.e. consists of a pair <span class="math-container">$(A,\Phi)$</span>
where <span class="math-container">$A$</span> is an object of <span class="math-container">$D$</span> and <span class="math-container">$\Phi: U(A) \to X$</span> is a morphism in <span class="math-container">$C$</span>, such
that the following <strong>terminal property</strong> is satisfied:</p>
<ul>
<li>Whenever <span class="math-container">$Y$</span> is an object of <span class="math-container">$D$</span> and <span class="math-container">$f: U(Y) \to X$</span> is a morphism in <span class="math-container">$C$</span>, then
there exists a unique morphism <span class="math-container">$g: Y \to A$</span> such that the following
diagram commutes:</li>
</ul>
<p><span class="math-container">$\ \ \ \ \ \ \ \ \ $</span><a href="https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/UniversalProperty-04.svg/300px-UniversalProperty-04.svg.png" rel="nofollow noreferrer"><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/4/49/UniversalProperty-04.svg/300px-UniversalProperty-04.svg.png" alt="enter image description here"></a></p>
</blockquote>
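<p>(A concrete instance, to anchor the notation: for the product of <span class="math-container">$B_1,B_2$</span> in a category <span class="math-container">$\mathcal{E}$</span>, take <span class="math-container">$D=\mathcal{E}$</span>, <span class="math-container">$C=\mathcal{E}\times\mathcal{E}$</span>, <span class="math-container">$U=\Delta$</span> with <span class="math-container">$\Delta(A)=(A,A)$</span>, and <span class="math-container">$X=(B_1,B_2)$</span>. A terminal morphism <span class="math-container">$(A,\Phi)$</span> from <span class="math-container">$U$</span> to <span class="math-container">$X$</span> is then exactly a product: <span class="math-container">$A=B_1\times B_2$</span> with <span class="math-container">$\Phi=(\pi_1,\pi_2)$</span>, and the terminal property says any <span class="math-container">$(f_1,f_2):(Y,Y)\to(B_1,B_2)$</span> factors as <span class="math-container">$f_i=\pi_i\circ g$</span> for a unique <span class="math-container">$g:Y\to A$</span>.)</p>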
<p>So I'm trying to "unpack" this definition and figure out what each of the things here "means". E.g. what does it become in the case of a limit, or something.</p>
<ul>
<li>A limit is an example of a terminal morphism, right? And a colimit an initial morphism?</li>
<li>Does <span class="math-container">$U$</span> usually represent a diagram? In the case of a limit, does it represent the diagram we want to take the limit of?</li>
<li>Whatever is <span class="math-container">$X$</span>? I honestly have no clue here. What's the analog in the case of a limit?</li>
<li>What does a morphism from <span class="math-container">$U$</span> to <span class="math-container">$X$</span> even mean? What does it mean in the case of a limit? I've seen morphisms from a diagram to an object in <em>co</em>limits.</li>
<li>In the case of a limit, is <span class="math-container">$(U\downarrow X)$</span> the category of cones? But how can each cone be a morphism from <span class="math-container">$U$</span> to something (I thought it was a morphism from something to <span class="math-container">$U$</span>)?</li>
<li><span class="math-container">$A$</span> (or <span class="math-container">$U(A)$</span>) corresponds to the actual thing we construct, like the source of a limit or the target of a colimit? But what is <span class="math-container">$\Phi$</span>? In the construction of a limit, there's a morphism from the limit to the diagram -- this seems wrong.</li>
</ul>
<p>My guess is that <span class="math-container">$X$</span> represents some sort of "subsetting" of the candidates for the object so you don't have to quantify over everything like you do with cones and the limit. Is that right? </p>
<hr>
<p><strong>Edit:</strong> So as a short summary -- it turns out <span class="math-container">$X$</span> represents (in the case of limits and colimits) the diagram we're trying to take the limit of, while <span class="math-container">$A$</span> represents the actual limit object (with its morphism <span class="math-container">$\Phi$</span>). <span class="math-container">$U$</span> is the diagonal functor, because the limit is constructed here as an object in the category of diagrams of the same shape as <span class="math-container">$X$</span> (the limit object appearing as a constant diagram).</p>category-theorylimits-colimitsuniversal-propertyslice-categoryThu, 16 Jan 2020 20:15:26 GMThttps://math.stackexchange.com/q/3511678Abhimanyu Pallavi Sudhir2020-01-16T20:15:26ZComment by Abhimanyu Pallavi Sudhir on Are these definitions of limits the same?
https://math.stackexchange.com/questions/3504151/are-these-definitions-of-limits-the-same/3504158#3504158
@BenjaminThoburn It would work, but that's the kind of thing that should be a theorem, not a definition (e.g. because it doesn't generalise to spaces where limits aren't unique).Thu, 16 Jan 2020 11:24:05 GMThttps://math.stackexchange.com/questions/3504151/are-these-definitions-of-limits-the-same/3504158?cid=7221247#3504158Abhimanyu Pallavi Sudhir2020-01-16T11:24:05ZAbstracting our abstractions: "limits of cones", universal properties
https://thewindingnumber.blogspot.com/2020/01/abstracting-our-abstractions-limits-of.html
0The last article, <a href="https://thewindingnumber.blogspot.com/2020/01/abstracting-some-categorical-definitions.html">Abstracting some categorical definitions</a>, saw the same kind of construction repeated over and over: given some diagram, we'd ask for an object with morphisms to or from that diagram (demanding that the diagram commute) -- such an object would be a "candidate" for our construction, and we'd then ask for the "maximum" or "minimum" among such constructions.<br /><br />That this notion appears so frequently makes sense. It's really a generalisation of the notions of initial and final topologies, and comes from the notion that an object is defined by its morphisms to or from other objects, and that we're interested in constructions that are unique up to isomorphism.<br /><br />So consider some diagram in $\mathcal{C}$. As we will later see, this is formally a "functor" (morphism between categories) from an indexing category $\mathcal{I}$ to $\mathcal{C}$ -- denote it as $X:\mathcal{I}\to\mathcal{C}$.<br /><blockquote>We define a <b>cone</b> to the diagram as an object $M$ together with morphisms $m_i:M\to X_i$ such that it <b>commutes</b> with the existing diagram (formally, such that for every morphism $f:i\to j$ in $\mathcal{I}$, we have $X(f)\circ m_i=m_j$).</blockquote>Now this necessarily represents an object with "more information than each $X_i$" -- so we're interested in the "infimum" of these cones, the one with the least information, the one to which there exists a morphism from any other cone. The <b>limsup</b> of cones, if you will:<br /><blockquote>We define the <b>limit</b> $(L,\ell_i)$ of the diagram to be a cone such that for any cone $(M,m_i)$ to the diagram, $\exists!\ u:M\to L$ such that the diagram commutes, i.e. $m_i=\ell_i\circ u$ forall $i$.</blockquote><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-9RSnwBFIIkI/Xhz23-n6zqI/AAAAAAAAF8E/GhZ4m-EKOcUsxZTHezjBrOSOAs4JWG4cACLcBGAsYHQ/s1600/limit.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="321" data-original-width="270" height="320" src="https://1.bp.blogspot.com/-9RSnwBFIIkI/Xhz23-n6zqI/AAAAAAAAF8E/GhZ4m-EKOcUsxZTHezjBrOSOAs4JWG4cACLcBGAsYHQ/s320/limit.png" width="269" /></a></td></tr><tr><td class="tr-caption" style="text-align: center;">The above diagram commutes, and the purple morphism is unique.</td></tr></tbody></table>And the dual notion is also observed, a <b>liminf</b>:<br /><blockquote>We define a <b>co-cone</b> from the diagram as an object $\overline{M}$ together with morphisms $\overline{m}_i:X_i\to \overline{M}$ such that it <b>commutes</b> with the existing diagram (formally, such that for every morphism $f:i\to j$ in $\mathcal{I}$, we have $\overline{m}_j\circ X(f)=\overline{m}_i$).</blockquote><blockquote>We define the <b>co-limit</b> $(\overline{L},\overline{\ell}_i)$ of the diagram to be a co-cone such that for any co-cone $(\overline{M},\overline{m}_i)$ from the diagram, $\exists!\ u:\overline{L}\to \overline{M}$ such that the diagram commutes, i.e. 
$\overline{m}_i=u\circ\overline{\ell}_i$ forall $i$.</blockquote><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody><tr><td class="tr-caption" style="text-align: center;">The below diagram commutes, and the purple morphism is unique.</td></tr><tr><td style="text-align: center;"><a href="https://1.bp.blogspot.com/-7kXCxZt1Mx8/Xhz5n0JvDNI/AAAAAAAAF8Q/W_DC4Mi7BEMCZqLSeTpFjgfyDX1u3Y4MACLcBGAsYHQ/s1600/liminf.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" data-original-height="321" data-original-width="270" height="320" src="https://1.bp.blogspot.com/-7kXCxZt1Mx8/Xhz5n0JvDNI/AAAAAAAAF8Q/W_DC4Mi7BEMCZqLSeTpFjgfyDX1u3Y4MACLcBGAsYHQ/s320/liminf.png" width="269" /></a></td></tr></tbody></table>Alternatively, the limit and colimit may be characterised as the <b>final object in the category of </b><b>cones</b> and the <b>initial object in the category of co-cones</b> respectively (check that this makes sense).<br /><br /><b>Examples:</b><br />Diagrams captioned by their limits. <br /><style type="text/css">.tg {border-collapse:collapse;border-spacing:0;margin:0px auto;} .tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;border-color:black;} .tg .tg-cly1{text-align:left;vertical-align:middle} .tg .tg-yla0{font-weight:bold;text-align:left;vertical-align:middle} .tg .tg-0lax{text-align:left;vertical-align:top} </style><br /><table class="tg"><tbody><tr> <th class="tg-cly1"></th> <th class="tg-yla0"><span style="font-weight: normal;">(empty diagram)</span><br />Limit:<span style="font-weight: bold;"> </span><span style="font-style: italic; font-weight: normal;">final object</span><br /><span style="font-style: normal; font-weight: bold;">Co-limit:</span><span style="font-style: normal; font-weight: normal;"> </span><span style="font-style: italic; font-weight: normal;">initial object</span></th> </tr><tr> <td class="tg-cly1"><img src="https://i.imgur.com/OlsEkBt.png" width="200px" /></td> <td class="tg-yla0"><br /><span style="font-weight: normal;">(discrete diagram)</span><br />Limit:<span style="font-weight: bold;"> </span><span style="font-style: italic; font-weight: normal;">product</span><br /><span style="font-weight: bold;">Co-limit:</span><span style="font-weight: normal;"> </span><span style="font-style: italic; font-weight: normal;">co-product</span><br /><br /><span style="font-weight: normal;">This answers the difficult cases of the empty product (it's just the final object) and the power (use the constant functor).</span></td> </tr><tr> <td class="tg-0lax"><img src="https://i.imgur.com/piX9wdx.png" width="200px" /></td> <td class="tg-0lax">(parallel diagram)<br /><span style="font-weight: bold;">Limit:</span><span style="font-weight: normal;"> </span><span style="font-style: italic; font-weight: normal;">equaliser</span><br /><span style="font-weight: bold;">Co-limit:</span><span style="font-weight: normal;"> </span><span style="font-style: italic; font-weight: normal;">co-equaliser</span></td> </tr></tbody></table><b><br /></b> <b>Exercises:</b> Do some examples to convince yourself of the following ideas:<br /><ul><li>Even if there are a bunch of morphisms in the diagram, the limit of 
the diagram talks fundamentally about the product of the "starting" objects of the diagram (think of: $X\rightarrow Y\leftarrow Z$, etc.).</li><li>If your original diagram has non-commuting features, the limit of the diagram talks about equalisers of these features (think of: parallel diagram, reverse-parallel diagram $\leftrightharpoons$, other cycles, a diagram with non-trivial automorphisms).</li><li>Adding commuting stuff doesn't change the limit (i.e. the limit of $X\to Y\to Z$ is the same if you add another morphism $X\to Z$).</li></ul><div><br /></div><hr /><br /><b>Universal objects and comma categories</b><br /><b><br /></b> You may have noticed that images and coimages cannot be written as limits and colimits (do you see why?). We made a fairly specific specialisation when defining limits/colimits that doesn't really have to do with our "limsup/liminf" intuition -- we insisted we had morphisms either from or to the diagram, whereas we could in general have a more complicated property.<br /><br />In general, instead of dealing with the category of cones, we could deal with <i>some other category</i> (called the <b>comma category</b>) and discuss its initial and final objects instead.<br /><br />The key insight regarding this generalisation is as follows: one can see the limit as a construction in the <b>category $\mathcal{C}^{\mathcal{I}}$ of diagrams</b> in $\mathcal{C}$ of a certain shape $\mathcal{I}$. The limit object (which is an object in $\mathcal{C}$) can be "upgraded" to that category as the <b>constant diagram</b> (an element of $\mathcal{C}^{\mathcal{I}}$ that maps every node in the diagram shape to the same object in $\mathcal{C}$) (this "upgrading" is called the <b>diagonal functor</b> $\Delta: \mathcal{C}\to\mathcal{C}^\mathcal{I}:=\lambda M.\ (\lambda i.\ M)$) with a morphism to the object of $\mathcal{C}^\mathcal{I}$ we're actually taking the limit of.<br /><br />So more generally, we can consider some category other than $\mathcal{C}^\mathcal{I}$, and a more general functor than $\Delta$, in order to formalise a more general notion of being a limiting object. We make the following definition:<br /><blockquote>We define the <b>final morphism</b> <i>from</i> a functor $F:\mathcal{C}\to\mathcal{D}$ <i>to</i> an object $D\in\mathcal{D}$ as a morphism $\ell:F(L)\to D$ such that for any morphism $m:F(M)\to D$, there $\exists!\ u: M\to L$ such that the diagram commutes, i.e. $m=\ell\circ F(u)$.</blockquote>You may observe that $F$ generalises $\Delta$, $\mathcal{D}$ is the generalisation of the "category of diagrams", and the final morphism generalises the limit (with $L$ being the limit "object" in $\mathcal{C}$). Analogously we define, generalising the colimit: <br /><blockquote>We define the <b>initial morphism</b> <i>to</i> a functor $F:\mathcal{C}\to\mathcal{D}$ <i>from</i> an object $D\in\mathcal{D}$ as a morphism $\overline{\ell}:D\to F(\overline{L})$ such that for any morphism $\overline{m}:D\to F(\overline{M})$, there $\exists!\ u: \overline{L}\to \overline{M}$ such that the diagram commutes, i.e. $\overline{m}=F(u)\circ\overline{\ell}$.</blockquote><div class="twn-pitfall">These terms "final morphism" and "initial morphism" are not to be confused with the morphisms to and from an initial object or a final object, that we defined previously. 
Typically, these terms are used in <em>neither</em> context -- one simply says "universal morphism" to/from $D$ from/to $F$; and in the previous context, one simply says morphisms to a final object/from an initial object.</div><br />In general, these morphisms are referred to as <b>universal morphisms</b> or <b>universal objects</b>.<br /><br />(By the way: the term "universal property" is just used to refer to the property of being initial or terminal or whatever.)<br /><br />This notion can easily be restated as follows: given an object $D\in\mathcal{D}$ and a functor $F:\mathcal{C}\to\mathcal{D}$, one can construct the following:<br /><blockquote>The <b>comma category</b> $[F\to D]$ is a category whose objects are the morphisms $m:F(M)\to D$, and whose morphisms from $m_1\to m_2$ are given by morphisms $u:M_1\to M_2$ such that the diagram commutes, i.e. such that $m_1=m_2 \circ F(u)$.</blockquote><blockquote>The <b>cocomma category</b> $[D\to F]$ is a category whose objects are the morphisms $m:D\to F(M)$ and whose morphisms from $m_1\to m_2$ are given by morphisms $u:M_1\to M_2$ such that the diagram commutes, i.e. such that $m_2=F(u)\circ m_1$.</blockquote>Then a final morphism is the <i>final object in the comma category</i>, and an initial morphism is the <i>initial object in the cocomma category</i>. If $\mathcal{D}=\mathcal{C}^{\mathcal{I}}$ (i.e. is a diagram category) and $F$ is the diagonal functor, then the comma category is the category of cones, and the cocomma category is the category of cocones.<br /><br />One might dislike the asymmetry between $F$ and $D$ and decide to go a step further, generalising $D$ to another functor. So given two functors $F:\mathcal{A}\to\mathcal{D}$ and $G:\mathcal{B}\to\mathcal{D}$, we can construct:<br /><blockquote>The <b>comma category</b> $[F\to G]$ is a category whose objects are the morphisms $m:F(M)\to G(N)$ and whose morphisms from $m_1\to m_2$ are given by morphisms $u:M_1\to M_2,\ v: N_1\to N_2$ such that the following diagram commutes: <br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-TtX-Nc0Vwfc/XimPBwUg23I/AAAAAAAAF8w/OFaZNAToz6QOXkXVgsXqLoi0h1g-V_NkwCLcBGAsYHQ/s1600/comma_comm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="185" data-original-width="322" height="183" src="https://1.bp.blogspot.com/-TtX-Nc0Vwfc/XimPBwUg23I/AAAAAAAAF8w/OFaZNAToz6QOXkXVgsXqLoi0h1g-V_NkwCLcBGAsYHQ/s320/comma_comm.png" width="320" /></a></div></blockquote>The previous definitions of comma and cocomma categories are then recovered when $\mathcal{B}$ and $\mathcal{A}$ respectively are replaced by a singleton (and $D$ is the only object in their image in $\mathcal{D}$).<br /><br /><hr /><br />Examples: image, free group, etc.
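<br /><br />To flesh out the free group example: take the functor to be the forgetful functor $U:\mathbf{Grp}\to\mathbf{Set}$ and the object to be a set $S$. An initial morphism to $U$ from $S$ is then a function $\overline{\ell}:S\to U(\overline{L})$ such that every function $\overline{m}:S\to U(G)$ factors uniquely as $\overline{m}=U(u)\circ\overline{\ell}$ for a group homomorphism $u:\overline{L}\to G$ -- which is precisely the defining property of the free group $\overline{L}$ on the generating set $S$.<br /><br />abstract mathematicscategory theoryconeslimits and colimitslimsup and liminfmathematicsuniversal propertyTue, 14 Jan 2020 00:26:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-8160367261848065546Abhimanyu Pallavi Sudhir2020-01-14T00:26:00ZAbstracting some categorical definitions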
https://thewindingnumber.blogspot.com/2020/01/abstracting-some-categorical-definitions.html
0Before making any interesting definitions, we need to get something over with: the notions of injective and surjective homomorphisms do not really generalise very nicely to category theory -- the closest definitions are:<br /><blockquote>A morphism $f:X\to Y$ is a <b>monomorphism</b> if for distinct morphisms $g$ to $X$, $f\circ g$ are distinct.</blockquote><blockquote>A morphism $f:X\to Y$ is an <b>epimorphism</b> if for distinct morphisms $g$ from $Y$, $g\circ f$ are distinct.</blockquote>One may check that all injective homomorphisms are monomorphisms and that all surjective homomorphisms are epimorphisms -- but it's not too hard to see that they are not equivalent. Concrete counter-examples exist in familiar categories: in $\mathbf{Ring}$, the inclusion $\mathbb{Z}\hookrightarrow\mathbb{Q}$ is a non-surjective epimorphism, and in the category of divisible abelian groups, the quotient map $\mathbb{Q}\to\mathbb{Q}/\mathbb{Z}$ is a non-injective monomorphism.<br /><br /><div class="twn-beg">If anyone has any motivation or insightful explanation of monomorphisms and epimorphisms, let me know. For example, are the non-injective monomorphisms (e.g. the quotient map $\mathbb{Q}\to\mathbb{Q}/\mathbb{Z}$) actually interesting or just something we need to get used to?</div><br />There are also related notions:<br /><blockquote>A morphism $f:X\to Y$ is a <b>section</b> if it has a left-inverse ("retraction"), i.e. a morphism $g:Y\to X$ such that $g\circ f=1_X$.</blockquote><blockquote>A morphism $f:X\to Y$ is a <b>retraction</b> if it has a right-inverse ("section"), i.e. a morphism $g:Y\to X$ such that $f\circ g=1_Y$.</blockquote>It's clear that all sections are injective morphisms (and thus monomorphisms), and all retractions are surjective morphisms (and thus epimorphisms). Of course, these intermediaries are not category-theoretic.<br /><br />One may define an <b>isomorphism</b>, denoted $f:X\leftrightarrow Y$, as a morphism that is both a section and a retraction. Here are some theorems about it:<br /><blockquote><b>An isomorphism has a two-sided inverse morphism.</b><br />Given $g_1\circ f=1_X$, $f\circ g_2=1_Y$ -- by considering $g_1\circ f\circ g_2$, we see that $g_1=g_2$.</blockquote><blockquote><b>A morphism that is both a monomorphism and a retraction is an isomorphism.</b><br />Since $f$ is a retraction, $f \circ g = {1_Y}$ -- so $f \circ g \circ f = f$. Left-cancelling (since $f$ is a monomorphism), $g\circ f = 1_X$, thus $f$ is a section.</blockquote><blockquote><b>A morphism that is both a section and an epimorphism is an isomorphism.</b><br />Analogous to above.</blockquote><hr /><b><br /></b> <b>Subobjects</b><br /><br />In our minds, a sub-object is a subset that also carries the structure of that category -- in other words, it's isomorphic to another object in that category.<br /><div><br /></div><div>It's natural to identify the <b>subobjects</b> of $X$ then with injections (or rather <b>monomorphisms</b>) into $X$ (not with the domains of the injections, as they may have multiple embeddings into $X$). But the identification is not one-to-one -- multiple injections may have the same image. 
In general, two monomorphisms $g_1:S_1\hookrightarrow X$, $g_2:S_2\hookrightarrow X$ having the same image is an equivalence relation, expressible as:</div><div>$$\exists i:S_1\leftrightarrow S_2,\ g_1=g_2\circ i$$</div><div>So we identify the <b>equivalence classes of monomorphisms into an object</b> with its subobjects.<br /><br /><div class="twn-furtherinsight">An alternate motivation for our definition of a subobject comes from the <a href="https://thewindingnumber.blogspot.com/2019/08/topology-iii-example-topologies.html">subspace topology</a> (and vice versa), which is defined in terms of continuous inclusion maps. Of course, in our definition here, we allow the subspace topology to be any topology that allows an injective continuous map to the space, but the standard definition in topology asks for the coarsest such topology (i.e. the "least continuous" such map). Later, we will study some refined definitions of a lot of ideas here that may apply better to specific categories.</div><br /></div><hr /><br /><b>Quotient objects</b><br /><br />The natural way to think of quotient objects in category theory is in terms of the <b>first isomorphism theorem</b>, which states that the <b>quotient objects are the images of surjections from the object</b> -- kinda "dual" to how <b>subobjects are the images of injections into the object</b> (keep this notion of "<b>duality</b>" in mind).<br /><br /><div class="twn-furtherinsight">You might be afraid that this kind of defeats the point, since we'd like to eventually prove the First Isomorphism Theorem in category theory. Well, we'll do so with some other more category-specific definition of quotients, etc. so the First Isomorphism Theorem would simply be a demonstration that these two definitions are equivalent.</div><br />But once again, just identifying the quotient objects with epimorphisms overcounts them -- a single quotient can map to multiple different isomorphic things. So, as before, we write down an equivalence relation between epimorphisms from $X$: two epimorphisms $g_1:X\twoheadrightarrow Q_1$, $g_2:X\twoheadrightarrow Q_2$ are equivalent if:<br />$$\exists i: Q_1\leftrightarrow Q_2,\ g_2=i\circ g_1$$<br />So we identify these <b>equivalence classes of epimorphisms</b> as the <i>quotient objects</i> of an object $X$.<br /><br /><hr /><br /><b>Products</b><br /><br />One can take inspiration from the <a href="https://thewindingnumber.blogspot.com/2019/08/topology-iii-example-topologies.html">product topology</a>, and think of the product of some objects as the object with the "least information" that still allows morphisms to each object. So we define:<br /><blockquote>Given a collection $X_i$ of objects, we define their <b>product</b> $\prod X_i$ as a collection of morphisms $\pi_i:X\to X_i$ such that:<br /><ol><li>For any other collection of morphisms $\pi'_i:X'\to X_i$, $\exists! u: X' \to X$ such that $\pi'_i=\pi_i\circ u$.</li></ol></blockquote><b>Question:</b> what does the empty product look like? This definition seems a bit bad for this purpose. We'll develop some more general machinery in the next article or so.<br /><br /><hr /><br /><b>Sums (aka "coproducts")</b><br /><b><br /></b>Shockingly enough, the "opposite" or "dual" of the above. <b>Direct sums</b> of vector spaces can be seen as the "smallest possible" (i.e. embedding into most possible things, i.e. having most possible information) vector space permitting morphisms from each vector space. Another example would be the <b>disjoint union</b> of sets. 
Perhaps this is even clearer with the <b>free product of groups</b>, where the free product is the object with the "most information possible" arising from your groups.<br /><blockquote>Given a collection $X_i$ of objects, we define their <b>sum</b> $\coprod X_i$ as a collection of morphisms $\varpi_i:X_i\to X$ such that:<br /><ol><li>For any other collection of morphisms $\varpi'_i:X_i\to X'$, $\exists! u: X \to X'$ such that $\varpi'_i=u\circ \varpi_i$.</li></ol></blockquote>Because of its "dual" appearance to the product (which we will soon see described more generally), the sum is often known as the "coproduct".<br /><br /><hr /><br />We'll now start to list subobjects and quotient objects related to a morphism. Here's a convenient cheat-sheet: the following diagram is commutative. (dotted lines are zero morphisms, which we will define shortly)<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-9MEfbjnEauc/Xhr0pXskqCI/AAAAAAAAF7w/xFtLWpg-_ggWybFRUgZ6Dv1p75zpfUo7QCLcBGAsYHQ/s1600/comm_diagram%2Btikz.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="718" data-original-width="941" height="305" src="https://1.bp.blogspot.com/-9MEfbjnEauc/Xhr0pXskqCI/AAAAAAAAF7w/xFtLWpg-_ggWybFRUgZ6Dv1p75zpfUo7QCLcBGAsYHQ/s400/comm_diagram%2Btikz.png" width="400" /></a></div>(<a href="https://tikzcd.yichuanshen.de/#eyJub2RlcyI6W3sicG9zaXRpb24iOlsxLDJdLCJ2YWx1ZSI6IlgifSx7InBvc2l0aW9uIjpbMywyXSwidmFsdWUiOiJZIn0seyJwb3NpdGlvbiI6WzIsMV0sInZhbHVlIjoiSSJ9LHsicG9zaXRpb24iOlsyLDBdLCJ2YWx1ZSI6IkknIn0seyJwb3NpdGlvbiI6WzIsM10sInZhbHVlIjoiXFxiYXJ7SX0ifSx7InBvc2l0aW9uIjpbMiw0XSwidmFsdWUiOiJcXG92ZXJsaW5le0l9JyJ9LHsicG9zaXRpb24iOlswLDFdLCJ2YWx1ZSI6IksifSx7InBvc2l0aW9uIjpbMCwzXSwidmFsdWUiOiJLJyJ9LHsicG9zaXRpb24iOls0LDFdLCJ2YWx1ZSI6Ilxcb3ZlcmxpbmV7S30ifSx7InBvc2l0aW9uIjpbNCwzXSwidmFsdWUiOiJcXG92ZXJsaW5le0t9JyJ9XSwiZWRnZXMiOlt7ImZyb20iOjAsInRvIjoxLCJ2YWx1ZSI6ImYifSx7ImZyb20iOjAsInRvIjoyLCJ2YWx1ZSI6ImZfSSJ9LHsiZnJvbSI6MiwidG8iOjEsInRhaWwiOiJob29rIiwidmFsdWUiOiJlIn0seyJmcm9tIjowLCJ0byI6MywidmFsdWUiOiJmX3tJJ30ifSx7ImZyb20iOjMsInRvIjoxLCJ2YWx1ZSI6ImUnIiwidGFpbCI6Im5vbmUiLCJsaW5lIjoic29saWQifSx7ImZyb20iOjAsInRvIjo0LCJ2YWx1ZSI6Ilxcb3ZlcmxpbmV7ZX0ifSx7ImZyb20iOjAsInRvIjo1LCJ2YWx1ZSI6Ilxcb3ZlcmxpbmV7ZX0nIn0seyJmcm9tIjo0LCJ0byI6MSwidmFsdWUiOiJmX3tcXG92ZXJsaW5le0l9fSJ9LHsiZnJvbSI6NSwidG8iOjEsInZhbHVlIjoiZl97XFxvdmVybGluZXtJfSd9In0seyJmcm9tIjoyLCJ0byI6MywidmFsdWUiOiJcXGV4aXN0cyFcXCAgdSIsImxhYmVsUG9zaXRpb24iOiJpbnNpZGUifSx7ImZyb20iOjUsInRvIjo0LCJsYWJlbFBvc2l0aW9uIjoiaW5zaWRlIiwidmFsdWUiOiJcXGV4aXN0cyFcXCBcXG92ZXJsaW5le3V9In0seyJmcm9tIjo2LCJ0byI6MCwidGFpbCI6Imhvb2siLCJ2YWx1ZSI6ImsifSx7ImZyb20iOjcsInRvIjowLCJ0YWlsIjoiaG9vayIsInZhbHVlIjoiaycifSx7ImZyb20iOjcsInRvIjo2LCJsYWJlbFBvc2l0aW9uIjoiaW5zaWRlIiwidmFsdWUiOiJcXGV4aXN0cyFcXCB2In0seyJmcm9tIjo2LCJ0byI6MSwibGluZSI6ImRvdHRlZCIsImJlbmQiOjB9LHsiZnJvbSI6NywidG8iOjEsImxpbmUiOiJkb3R0ZWQifSx7ImZyb20iOjgsInRvIjo5LCJsYWJlbFBvc2l0aW9uIjoiaW5zaWRlIiwidmFsdWUiOiJcXGV4aXN0cyFcXCBcXG92ZXJsaW5le3Z9IiwiaGVhZCI6InR3b2hlYWRzIn0seyJmcm9tIjoxLCJ0byI6OCwiaGVhZCI6InR3b2hlYWRzIiwidmFsdWUiOiJcXG92ZXJsaW5le2t9In0seyJmcm9tIjoxLCJ0byI6OSwiaGVhZCI6InR3b2hlYWRzIiwidmFsdWUiOiJcXG92ZXJsaW5le2t9JyIsImxpbmUiOiJkb3R0ZWQifSx7ImZyb20iOjAsInRvIjo4LCJsaW5lIjoiZG90dGVkIn0seyJmcm9tIjowLCJ0byI6OSwibGluZSI6ImRvdHRlZCJ9XX0=">play with it!</a>)<br /><br /><div></div><hr /><br /><b>Images</b><br /><br />Next, let's think about the image of a morphism $f$. 
Once again, we can identify an image with a monomorphism, so we're really looking for a subobject consisting of monomorphisms $g$ with the same image as $f$. So we want an $e:I\to Y$ such that there exists a morphism $f_I: X\to I$ with $f=e\circ f_I$ -- but this is not enough, $I$ may be "too big" (it may contain elements that map to $Y$ not in the image), so we define:<br /><blockquote>A monomorphism $e:I\hookrightarrow Y$ is the <b>image</b> of a morphism $f:X\to Y$ if:<br /><ol><li>$\exists f_I:X\to I, f=e\circ f_I$</li><li>For any monomorphism $e':I'\hookrightarrow Y$ and $f_{I'}:X\to I'$ such that $f=e'\circ f_{I'}$, there $\exists! u:I\to I', e=e'\circ u$.</li></ol></blockquote>This relies on the following key lemma of course: <b>the images of $f$ form a subobject</b>. This follows straightforwardly from the second condition.<br /><br /><hr /><b><br /></b> <b>Zero objects</b><br /><br />Many categories have the notion of a <b>zero element</b> in an object -- groups have identities, vector spaces have zero vectors, and let's not talk about rings and fields. And some, like topological spaces, don't.<br /><br />But we can't talk about elements of an object in category theory. But perhaps the examples above give you an idea -- we <i>have</i> seen trivial groups and trivial vector spaces. Objects comprising only the zero element -- let's call them <b>zero objects</b>. So maybe we can talk about the images of morphisms from these trivial objects to other objects, and they would represent zero elements.<br /><br />The idea behind a zero element is that it is "privileged" in some sense -- a morphism must preserve it. Furthermore, every object contains a zero element, so there must always be a morphism from the zero object to any object. These criteria fix a unique morphism from the zero object to any given object. In fact, this idea is captured by the following definition.<br /><blockquote>A <b>universal object</b> or <b>initial object</b> $I$ is an object such that for any object $X$ in the category, there is a unique morphism $I\to X$.</blockquote>But there is also another idea behind the notion of a zero object, that it carries the "<b>minimum information</b>" compatible with the category's structure. Recall our interpretation of a morphism as something that either retains or discards information -- mapping an object <i>to</i> a zero object means discarding "as much information as possible".<br /><blockquote>A <b>final object</b> $F$ is an object such that for any object $X$ in the category, there is a unique morphism $X\to F$.</blockquote>Finally, a <b>zero object</b> is defined as an object that is both initial and final.<br /><br />Perhaps a bit surprising at first (but trivially easy): we can in fact see that the initial and final objects are respectively <b>unique up to (unique) isomorphism</b> -- just consider the unique morphisms between the two objects. The reason it's kinda incredible that we can do this in full generality, is that e.g. we can immediately see that the trivial ring is not an initial object (because the ring of integers is), because it can't be mapped to anything, something that seems like a technicality at first glance arising from the fact that 1 must be preserved by ring homomorphisms.<br /><br />But if you think about it, the category theoretic argument also comes from the same fact -- it comes from the fact that you can't map the ring of integers to its own zero element, because that doesn't preserve 1. 
But if you didn't mandate that ring homomorphisms preserve the multiplicative identity, then the ring of integers would no longer be an initial object.<br /><br /><hr /><br /><b>Zero morphisms, kernels and equalizers</b><br /><br />When defining a kernel of $f:X\to Y$, we're looking for a subobject of $X$ that maps to the zero element in $Y$. Since in category theory, a subobject is an injection (from an object we'd typically view as isomorphic to the subobject), we're looking for a monomorphism (subobject) $k$ that composes with $f$ to give us a morphism that maps everything to the zero element, whatever that means.<br /><br />OK -- so what does it mean? What's a morphism that maps everything to the zero element? Recall that we're thinking of the "zero element" as the image of a morphism from the zero object (which really means we're identifying it with a subobject, i.e. an equivalence class of monomorphisms). So we can associate with our desired morphism $o:X\to Y$ the morphism from $X$ to the zero object, which then embeds into $Y$ -- so we define the zero morphism as their composition:<br />$$o:=X\to O\to Y$$<br />Where $O$ is the zero object.<br />$$\dots$$<br />There is an alternate, more general definition of a zero morphism for categories without a zero <i>object</i> that nonetheless captures the notion of a zero <i>element</i>.<br /><br />Here's an alternate way to think about morphisms from and to a zero object. An initial morphism is essentially the "<b>least surjective homomorphism possible</b>". A final morphism is essentially the "<b>least injective homomorphism possible</b>". These are in line with our understanding of the zero object as the "smallest possible" object, or the one that contains the least information.<br /><br />In line with the way we've thought of injectivity and surjectivity when defining monomorphisms and epimorphisms, we make the following definitions.<br /><blockquote>A morphism $f:X\to Y$ is an <b>initial morphism</b> (or <i>right-zero morphism</i>, or <i>coconstant morphism</i>) if for any $g,h: Y\to V$, $g\circ f = h \circ f$.</blockquote><blockquote>A morphism $f:X\to Y$ is a <b>final morphism</b> (or <i>left-zero morphism</i>, or <i>constant morphism</i>) if for any $g,h: W\to X$, $f\circ g = f\circ h$.</blockquote>(Exercise: Check that the morphism from an initial object and the morphism to a final object satisfy this property.)<br /><br />Further, one may observe that for $l:X\to Y$ final and $r:Y\to Z$ initial, $r\circ l$ is both an initial and a final morphism. So the right general notion of a "morphism that maps everything to the zero element" is "a morphism that is both initial and final", or a <b>zero morphism</b>.<br /><br />$$\dots$$<br />Anyway, we're obviously interested in monomorphisms $k$ to $X$ such that $f\circ k=o$. But these don't all represent the kernel -- they could represent subobjects smaller than the kernel. So we define the kernel as follows:<br /><blockquote>A monomorphism $k:K\hookrightarrow X$ is the <b>kernel</b> of $f:X\to Y$ if<br /><ol><li>$f\circ k=o_{KY}$</li><li>For any $k':K'\hookrightarrow X$ such that $f\circ k'=o_{K'Y}$, $\exists! u:K'\to K, k\circ u = k'$.</li></ol></blockquote>More generally:<br /><blockquote>Given morphisms $f_i:X\to Y$, their <b>equaliser</b> is a monomorphism $k:K\hookrightarrow X$ such that:<br /><ol><li>$f_i\circ k$ are equal for all $i$.</li><li>For any $k':K'\hookrightarrow X$ such that $f_i\circ k'$ are equal for all $i$, $\exists! 
u:K'\to K, k\circ u = k'$.</li></ol></blockquote>The kernel can then be understood as the equaliser of a morphism with the zero morphism.<br /><br />Of course, these definitions rely on a key lemma: <b>the kernel/equaliser is a subobject</b> -- i.e. that for two kernels $k_1:K_1\to X$ and $k_2:K_2\to X$, there $\exists i:K_1\leftrightarrow K_2, k_1=k_2\circ i$ -- this follows straightforwardly from the second condition in the definition.<br /><br /><hr /><br /><b>Cokernels, coimages and </b><b>quotienting by a subobject</b><br /><br />I once <a href="https://thewindingnumber.blogspot.com/2018/07/why-are-calculus-and-linear-algebra.html">said</a> that one of the points of learning linear algebra was as an introduction to ideas that appear repeatedly throughout algebra. Here's where we really see this in action.<br /><br />Recall the notion of a left null space $\mathrm{ker}(f^T)$ from linear algebra -- it sort-of represented the "space of constraints" on the image of a morphism, in that it was the orthogonal complement of the image $\mathrm{im}(f)$ in the co-domain. It represents the stuff that the image <i>can't</i> fall into -- or tying back into our understanding of morphisms as things that "cannot create information that wasn't already present in the domain", it represents the information the morphism <i>hasn't</i> created (whether or not it could have), a measure of non-surjectivity.<br /><br />Well, "orthogonal complement of the image" is not something restricted to vector spaces -- we can understand it more generally, as a quotient. Interestingly, it would then no longer be a subobject, which I suppose is reasonable -- we don't really have a notion right now of what a transpose morphism is in a general category.<br /><br /><div class="twn-furtherinsight">In general, though, the quotient we're looking for is not $Y/\mathrm{im}(f)$ -- clearly, that wouldn't exist in a lot of important categories, e.g. groups. That's only true in categories that seem "abelian" in some sense. Perhaps this is unsurprising, because we're thinking of the cokernel as "the information that the morphism has not created", and information is more than just elements of the set.</div><br />So the appropriate way to generalise "a quotient by the image" is to look at a quotient object of the codomain $Y$ (which, recall, is an epimorphism from $Y$) that composes with $f$ to produce the zero morphism. But, dually to the kernel case, we must make sure that our quotient is the full, "most universal" one:<br /><blockquote>Given a morphism $f:X\to Y$, we define its <b>cokernel</b> as an epimorphism $\bar{k}:Y\twoheadrightarrow\bar{K}$ such that:<br /><ol><li>$\bar{k}\circ f = o_{X\bar{K}}$</li><li>For any other epimorphism $\bar{k}':Y\twoheadrightarrow\bar{K}'$ such that $\bar{k}'\circ f = o_{X\bar{K}'}$, there $\exists!\ u:\bar{K}\to\bar{K}'$ such that $\bar{k}'=u\circ \bar{k}$.</li></ol></blockquote>Once again, the latter property shows that the equivalence class of these epimorphisms is indeed a quotient object.<br /><br />In fact, this "composition forms the zero morphism" notion is the generalisation of the "<b>quotient by an object</b>" notion, or of the so-called <b>exact sequence</b> below. 
Generally, given a subobject $S$, represented by monomorphisms $s:S\to X$, the epimorphisms $q:X\to Q$ that always compose with these monomorphisms to form the zero morphism $q\circ s = o_{SQ}$ define the quotient $X/S$, if they are a quotient object.<br /><br />$$O \to \ker f \to X \overset f \longrightarrow Y \to \operatorname{coker} f \to O$$<br />Oh, and an exact sequence is exactly the idea behind quotienting by an object.<br /><br />$$\dots$$<br />Now recall the notion of a row space $\mathrm{im}(f^T)$. Once again, the row space is best interpreted as a quotient object of $X$ (in the case of linear algebra, the quotient by $\mathrm{ker}(f)$). But constructing it in terms of the kernel would clearly be quite complicated -- and perhaps not generally so useful. A better, more "general" approach is to think of an <b>element of the row space as an equivalence class of elements that are mapped to the same element</b>. So we define:<br /><blockquote>Given a morphism $f:X\to Y$, we define its <b>coimage</b> as an epimorphism $\bar{e}:X\twoheadrightarrow \bar{I}$ such that:<br /><ol><li>$\exists f_{\bar{I}}:\bar{I}\to Y, f = f_{\bar{I}}\circ \bar{e}$</li><li>For any other epimorphism $\bar{e}':X\twoheadrightarrow \bar{I}'$ and $f_{\bar I'}:\bar{I}'\to Y$ such that $f = f_{\bar I'}\circ \bar{e}'$, there $\exists! u: \bar{I}'\to\bar{I}, \bar{e}=u\circ \bar{e}'$.</li></ol></blockquote>Which is again clearly a quotient object.
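<br /><br />A worked example to tie these together: in $\mathbf{Ab}$, take $f:\mathbb{Z}\to\mathbb{Z}$, $f(n)=2n$. The kernel is $0\hookrightarrow\mathbb{Z}$ (nothing is collapsed), the image is $2\mathbb{Z}\hookrightarrow\mathbb{Z}$, the coimage is $\mathbb{Z}\twoheadrightarrow\mathbb{Z}/\ker f=\mathbb{Z}$ itself, and the cokernel is $\mathbb{Z}\twoheadrightarrow\mathbb{Z}/2\mathbb{Z}$ -- the "information $f$ hasn't created" being precisely the parity that the image $2\mathbb{Z}$ misses.<br /><br />abstract mathematicscategory theoryexact sequencemathematicsSun, 12 Jan 2020 12:18:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-5394180728661295574Abhimanyu Pallavi Sudhir2020-01-12T12:18:00ZComment by Abhimanyu Pallavi Sudhir on Are these definitions of limits the same?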
https://math.stackexchange.com/questions/3504151/are-these-definitions-of-limits-the-same/3504158#3504158
But that's not what your definition yields. Your definition says that every point between -1 and 1 is the limit of sin(1/x).Sun, 12 Jan 2020 09:08:59 GMThttps://math.stackexchange.com/questions/3504151/are-these-definitions-of-limits-the-same/3504158?cid=7210918#3504158Abhimanyu Pallavi Sudhir2020-01-12T09:08:59ZAnswer by Abhimanyu Pallavi Sudhir for Are these definitions of limits the same?
https://math.stackexchange.com/questions/3504151/are-these-definitions-of-limits-the-same/3504158#3504158
0<p>No. Consider <span class="math-container">$f(x)=\sin(1/x)$</span> with the origin added, near <span class="math-container">$x=0$</span>: under that definition, <em>every</em> point of <span class="math-container">$[-1,1]$</span> would count as a limit at <span class="math-container">$0$</span>. </p>Fri, 10 Jan 2020 15:12:45 GMThttps://math.stackexchange.com/questions/3504151/-/3504158#3504158Abhimanyu Pallavi Sudhir2020-01-10T15:12:45ZComment by Abhimanyu Pallavi Sudhir on Cokernels - how to explain or get a good intuition of what they are or might be
https://math.stackexchange.com/questions/192494/cokernels-how-to-explain-or-get-a-good-intuition-of-what-they-are-or-might-be
@Tunococ I actually prefer to think of it as the "information" that is <i>not</i> created by $f$, and that a morphism is something that cannot create new "information". But we're saying the same thing, of course.Thu, 09 Jan 2020 14:23:10 GMThttps://math.stackexchange.com/questions/192494/cokernels-how-to-explain-or-get-a-good-intuition-of-what-they-are-or-might-be?cid=7204325Abhimanyu Pallavi Sudhir2020-01-09T14:23:10ZIntroduction to category theory: a second-abstraction
https://thewindingnumber.blogspot.com/2020/01/introduction-to-category-theory-second.html
0<a href="https://thewindingnumber.blogspot.com/2018/12/intuition-analogies-and-abstraction.html">When we first started talking about abstraction</a>, we did so by observing the <b>analogies between mathematical objects</b>, such as integers and polynomials, the unit circle and remainders, etc. We spent the rest of the <a href="https://thewindingnumber.blogspot.com/p/abstract-algebra-i.html">Abstract Algebra I</a> series figuring out <i>why</i> these seemingly unrelated objects had similar behaviour -- what the <b>fundamental properties</b> were that resulted in this behaviour, and making these properties the "axioms" of various abstract algebraic structures.<br /><br />But you may have later observed that even these various algebraic structures have analogies. For starters, every algebraic structure has the notion of homomorphisms -- things that commute with "structure". Then you have analogous object constructions, like trivial objects, product objects and quotient objects. And then you have the really neat stuff -- stuff like "normal subgroups are the kernels of group homomorphisms" vs. "ideals are the kernels of ring homomorphisms", etc. And perhaps most usefully of all, we've seen that some analogies themselves, such as <a href="https://thewindingnumber.blogspot.com/p/lie-theory.html">between features of Lie groups and features of Lie algebras</a>, might have a simpler and more abstract basis than the specific constructions of the theory.<br /><br />You would be justified to believe that these analogies spring from some shared principles -- and you would be justified to believe that these shared principles ought to be abstracted.<br /><br />That much like how mathematical objects were found to have similar properties, and we'd categorise them as groups and rings and vector spaces and whatever -- these <b>categories of objects</b> too could have similar properties.<br /><br />So this will be our approach: without stating the axioms beforehand, and with only general notions of what a category is and what a homomorphism (or "morphism" in category theory) is, we will try to prove theorems we know about specific categories like groups, for general categories -- and see what axioms we'll need.<br /><br />(This, by the way, is called <a href="https://en.wikipedia.org/wiki/Reverse_mathematics">reverse mathematics</a>. We've done this often here whenever dealing with something we must be rigorous about, e.g. in <a href="https://thewindingnumber.blogspot.com/p/topology.html">Topology</a>.)<br /><br />And the real idea we should have at the back of our heads is that we should stop thinking of groups, etc. as "sets with additional structure". They're really <b><i>generalisations</i> of sets</b>, and homomorphisms are <b><i>generalisations</i> of functions</b>. (I won't go here into exactly what I mean, but a good article to get your head wrapped around this is <a href="https://thewindingnumber.blogspot.com/2019/10/sigma-fields-are-venn-diagrams.html">Sigma fields are Venn diagrams</a>, for an illustration of how measurable functions, the morphisms in the measurable spaces category, are a "generalisation of functions".) So we won't try to force our objects to be sets and give them elements, or force our morphisms to be functions -- they will just be dots and arrows satisfying some axioms. This will require a bit of thinking, e.g. 
defining kernels without talking about identity elements.<br /><br />Let's start.abstract mathematicsanalogiescategory theoryintuitionmathematicsWed, 01 Jan 2020 11:39:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-5708236386545205069Abhimanyu Pallavi Sudhir2020-01-01T11:39:00ZAnswer by Abhimanyu Pallavi Sudhir for Why are inverse images more important than images in mathematics?
https://mathoverflow.net/questions/22658/why-are-inverse-images-more-important-than-images-in-mathematics/349457#349457
0<p>One often thinks of a homomorphism as something that "preserves the structure" of an object, but it is often better to think of it as something that "does not add new information to the object".</p>
<p>The most basic example is in <span class="math-container">$\mathbf{Set}$</span>. The "information" of a set is its cardinality. The defining feature of the morphisms here -- <em>functions</em> -- is that they do not "create new cardinality". A point cannot be mapped to multiple points.</p>
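<p>As a minimal sketch of this (my own illustration -- finite sets, with a function represented as a Python dict, which makes single-valuedness automatic):</p><pre><code># a function on a finite set: single-valued by construction,
# so it may merge points (lose information) but never split them
f = {1: "a", 2: "a", 3: "b"}

image = set(f.values())
assert len(image) <= len(f)   # the image never exceeds the domain in cardinality
</code></pre>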
<p>Similarly in <span class="math-container">$\mathbf{Top}$</span>, the "information" of a topological space is the distinguishability of two points. This notion of distinguishability "includes" those used in the separation axioms (so e.g. a continuous map from an indiscrete space to a discrete one must be constant -- it cannot manufacture distinguishability), but is more general and vague -- the general idea is that two things touching make them "kinda indistinguishable", so you can't tear them apart.</p>
<p>This idea clearly has to do with inverse images -- we're saying that for things in the codomain, the information they carry must have already existed in their preimage. In fact, in the previous example, this notion of being "kinda indistinguishable" is best formalised in the language of open sets, and a continuous map <em>can't create new open sets</em>.</p>
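<p>Here's a minimal sketch of that statement for finite spaces (plain Python; the names and the two-point spaces are my own illustration). Continuity is tested purely through preimages of open sets, and the test confirms that the only continuous maps from the two-point indiscrete space to the two-point discrete space are the constant ones:</p><pre><code>from itertools import product

def preimage(f, subset):
    # f: dict from domain points to codomain points
    return frozenset(x for x, y in f.items() if y in subset)

def is_continuous(f, opens_dom, opens_cod):
    # continuous iff every open set pulls back to an open set:
    # the map cannot "create" distinguishability absent from the domain
    return all(preimage(f, U) in opens_dom for U in opens_cod)

pts = ("a", "b")
indiscrete = {frozenset(), frozenset(pts)}
discrete = {frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset(pts)}

for fa, fb in product(pts, repeat=2):
    f = {"a": fa, "b": fb}
    print(f, is_continuous(f, indiscrete, discrete))  # True only for the two constant maps
</code></pre>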
<p>Perhaps the clearest example comes from the category of measurable spaces <span class="math-container">$\mathbf{Prob}$</span>. Here, the sigma fields really do represent information, and the definition of a measurable function (or <em>random variable</em>) is that it cannot talk about things that can't be measured. I.e. if a piece of apparatus just measures the number of heads in a coin-flip experiment, we can't have a random variable asking if the first toss was a head. Once again, this notion of "not adding new information" directly corresponds to preimages.</p>
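<p>And a matching sketch for the coin-flip example (again my own illustration): the sigma-field generated by "number of heads" cannot answer "was the first toss a head", so that indicator fails the preimage test:</p><pre><code>from itertools import combinations

def sigma_field(partition):
    # all unions of blocks of a partition of a finite sample space
    events = set()
    for r in range(len(partition) + 1):
        for blocks in combinations(partition, r):
            events.add(frozenset().union(*blocks))
    return events

def is_measurable(X, omega, sigma):
    # measurable iff the preimage of every value of X is an event in sigma
    return all(frozenset(w for w in omega if X(w) == v) in sigma
               for v in {X(w) for w in omega})

omega = {"HH", "HT", "TH", "TT"}
# the apparatus only records the number of heads:
sigma = sigma_field([frozenset({"HH"}), frozenset({"HT", "TH"}), frozenset({"TT"})])

print(is_measurable(lambda w: w.count("H"), omega, sigma))  # True
print(is_measurable(lambda w: w[0] == "H", omega, sigma))   # False: {HH, HT} is not an event
</code></pre>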
<p>A bit more detail in my post <a href="https://thewindingnumber.blogspot.com/2019/10/sigma-fields-are-venn-diagrams.html" rel="nofollow noreferrer">here</a>.</p>Tue, 31 Dec 2019 17:50:15 GMThttps://mathoverflow.net/questions/22658/-/349457#349457Abhimanyu Pallavi Sudhir2019-12-31T17:50:15ZMoments as tensors
https://thewindingnumber.blogspot.com/2019/12/moments-as-tensors.html
0We discussed the second multivariate moment a bit haphazardly in the <a href="https://thewindingnumber.blogspot.com/2019/12/covariance-matrix-and-mahalanobis.html">last article</a>. In general, we'd like a nice way of expressing the general moment (i.e. multivariate cross-moment).<br /><br />Let $X=(X^1,\ldots X^n)$ be a vector of random variables, and consider their $p$th order moments ($p\le n$) -- these form a rank-$p$ tensor of dimension $n$, the <b>moment tensor</b>, given by:<br /><br />$$Mp[X]^{j_1\ldots j_p}=\mathrm{E}(X^{j_1}\ldots X^{j_p})$$<br />(e.g. $p=1$ gives you the mean vector, $p=2$ gives you the badly-named auto"correlation" matrix) And the central moments form a similar tensor, the <b>central moment tensor</b>, given by:<br /><br />$$mp[X]^{j_1\ldots j_p}=\mathrm{E}\left((X^{j_1}-EX^{j_1})\ldots (X^{j_p}-EX^{j_p})\right)$$<br />(e.g. $p=1$ gives you zero, annoyingly, but $p=2$ gives you the covariance matrix aka autocovariance matrix) But, well, each random variable $X^i$ can also be understood as a vector, <a href="https://thewindingnumber.blogspot.com/2018/02/random-variables-and-their-properties.html">remember?</a> Let's write $X^i=(X^i_\alpha)$ for $\alpha$ a pseudo-index that represents the idea that $X^i$ is a vector (I guess this is really <a href="https://en.wikipedia.org/wiki/Abstract_index_notation">Penrose (abstract index) notation</a> rather than Einstein notation).<br /><div><br /></div><div>Actually, let's also make the following extension to tensor notation: <b><i>every</i> Greek index is summed over</b>, regardless of whether/how many times it's repeated and where -- and we take the <b>expectation instead of the sum</b> (which is like a normalized sum, or some sort of a trace). So we write:<br /><br />$$Mp[X]^{j_1\ldots j_p}=X^{j_1}_\alpha\ldots X^{j_p}_\alpha$$$$mp[X]^{j_1\ldots j_p}=(X^{j_1}_\alpha-X^{j_1}_{\alpha_1})\ldots (X^{j_p}_\alpha-X^{j_p}_{\alpha_p})$$<br />Where we use different dummy indices $\alpha_1,\ldots\alpha_p$ to indicate that these are summed over earlier (since they're not repeated again in the expression). These changes to index notation are all an artifact of the fact that random variables are not really "fundamentally quadratic", but rather "fundamentally $p$-normed".</div><br /><hr /><br />OK -- so that's the univariate cross-moment -- it can also be considered a multivariate moment, the moment of the random vector $X$ -- its mean is the mean vector, its variance is the covariance matrix, etc. What about cross moments between random vectors? And you can imagine that once we have that, we'll call it a moment of a random rank-2 tensor, and so on.<br /><br />What we're really looking for is the moment of a random tensor. This is a rank $pq$ tensor where $p$ is the degree of the moment and $q$ is the rank of the random tensor. As an example, when $p=2$ and $q=2$, one gets a rank 4 tensor consisting of cross-covariance (and autocovariance) matrices.<br /><br />Note that this is not at all some unnecessary generalisation -- measuring the correlation between random vectors is a thing with <i>very significant practical implications</i>.<br /><br />For example, a time series is a random vector -- its covariance matrix represents its internal correlations (how well its current value predicts a future value), but often we're interested in looking at <b>correlations <i>between</i> time series</b> -- how does the price of gold correlate with the price of S&P 500, etc. 
Then this cross-covariance matrix will be a bivariate function of $(t_1,t_2)$, called the <b>cross-correlation function</b>.
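<br /><br />A minimal numerical sketch of the moment tensor (my own illustration, assuming NumPy; the Greek index $\alpha$ of the text is the sample index, which the einsum averages over):<br /><pre><code>import numpy as np

def central_moment_tensor(samples, p):
    # samples: (N, n) array -- N draws of an n-dimensional random vector.
    # Returns the rank-p tensor E[(X^{j_1}-EX^{j_1})...(X^{j_p}-EX^{j_p})].
    dev = samples - samples.mean(axis=0)          # X^j_alpha - E X^j
    letters = "ijklmnop"[:p]                      # the free indices j_1 ... j_p
    spec = ",".join("a" + c for c in letters) + "->" + letters
    return np.einsum(spec, *([dev] * p)) / len(samples)  # sum over alpha = expectation

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 3)) @ np.array([[1.0, 0.5, 0.0],
                                              [0.0, 1.0, 0.3],
                                              [0.0, 0.0, 1.0]])
print(np.allclose(central_moment_tensor(X, 2),   # p = 2 recovers the covariance matrix
                  np.cov(X, rowvar=False), atol=0.02))
</code></pre><br />covariance matrixcross-correlation functioncross-correlation matrixmathematicsmomentsstatisticstensorstime seriesThu, 26 Dec 2019 10:02:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-1598037878177345535Abhimanyu Pallavi Sudhir2019-12-26T10:02:00ZSample statistics: order statistics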
https://thewindingnumber.blogspot.com/2019/12/sample-statistics-order-statistics.html
0The idea behind sampling is the duality between a tuple of IID random variables and a certain multivariate random variable -- since a sample of a distribution is just some tuple $\mathbf{X}=(X_1,\ldots X_n)$, one can consider this to be a random variable taking values in $\mathbb{X}^n$ where each $X_i$ is a random variable taking values in $\mathbb{X}$.<br /><br />In particular, one can define measurable functions $\mathbb{X}^n\to\mathbb{X}$ such as, e.g. sample moments (the sample mean, etc.). Another set of sample statistics one may define (when $\mathbb{X}$ is a totally ordered set, such as $\mathbb{R}$ but notably not something like $\mathbb{R}^m$) are the order statistics, which we will denote as $\Omega_i$, where $\Omega_i(x_1,\ldots x_n)$ gives the $i$th value in the sorted (in ascending order) list -- in particular, $\Omega_1$ is the $\min$ function and $\Omega_n$ is the $\max$ function.<br /><br />Well, so $\Omega_i(\mathbf{X})$ is a random variable -- we can ask about how it's distributed.<br /><br />For example, to calculate the cumulative of $\Omega_n=\max$, note that $\max(\mathbf{X})\le w \iff \forall i, X_i\le w$. Since the $X_i$s are IID, the CDF is just:<br /><br />$$F_{\Omega_n(\mathbf{X})}(w)=F_{X}(w)^n$$<br /><br /><div class="twn-furtherinsight">It's clear that as $n\to\infty$, this function approaches (non-uniformly) either the Heaviside step function or zero (or really the "Heaviside step function at $+\infty$"). This makes sense -- if your distribution has a finite upper bound, then you'll eventually get that bound and the maximum of an infinite sample (i.e. of the distribution) will be that bound, but if it doesn't, then you're eventually bound to get every value, the maximum of an infinite sample is infinity.</div><br /><iframe frameborder="0" height="500px" src="https://www.desmos.com/calculator/lvqmfrzyhm?embed" style="border: 1px solid #ccc;" width="500px"></iframe><br /><center><i>Illustration of $F(x)^n$ for different distributions as $n\to\infty$</i></center><br />Similarly, $\min(\mathbf{X})\le w\iff \lnot \forall i, \lnot (X_i\le w)$. So the CDF is:<br /><br />$$F_{\Omega_1(\mathbf{X})}(w)=1-(1-F_X(w))^n$$<br />OK, teaser over. Now consider $F_{\Omega_i(\mathbf{X})}(w)$, which is the probability that at least $i$ of the data points are $\le w$. The probability that some specific $r$-selection of the data points is exactly the set of points $\le w$ is $F_X(w)^r(1-F_X(w))^{n-r}$. So:<br /><br />$$F_{\Omega_i(\mathbf{X})}(w)=\sum_{r=i}^{n}{\binom{n}{r}}F_X(w)^r(1-F_X(w))^{n-r}$$<br />Interestingly, the joint PDF of the order statistics (which are not at all uncorrelated) actually has a much simpler form -- the probability that $(\Omega_{1}(\mathbf{X}),\ldots \Omega_{n}(\mathbf{X}))$ takes the value $(x_1,\ldots x_n)$ is zero if the latter is not in ascending order. And if it is, the value can result from $(X_1,\ldots X_n)$ taking a value that is some permutation of $(x_1,\ldots x_n)$ -- and there are $n!$ such permutations. So the joint PDF is:<br /><br />$$f_{\mathbf{\Omega}(\mathbf{X})}(\mathbf{w})=n!\prod_i f_X(w_i)$$<br />So the formulae above are just some fancy special cases of integration by parts on the above.
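<br /><br />A quick simulation (my own sketch, assuming NumPy) checking the two cumulative formulas above on an $\mathrm{Exponential}(1)$ sample:<br /><pre><code>import numpy as np
from math import comb

rng = np.random.default_rng(1)
n, trials, w = 5, 200_000, 1.5
F = 1 - np.exp(-w)                               # Exponential(1) CDF at w

samples = rng.exponential(size=(trials, n))      # each row: one IID sample of size n
sorted_samples = np.sort(samples, axis=1)

# maximum: F_{Omega_n}(w) = F(w)^n
print((sorted_samples[:, -1] <= w).mean(), F**n)

# i-th order statistic: F_{Omega_i}(w) = sum_{r=i}^n C(n,r) F(w)^r (1-F(w))^{n-r}
i = 3
theory = sum(comb(n, r) * F**r * (1 - F)**(n - r) for r in range(i, n + 1))
print((sorted_samples[:, i - 1] <= w).mean(), theory)
</code></pre><br />extreme value theoryorder statisticssamplingstatisticsTue, 24 Dec 2019 11:25:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-4788931004664535689Abhimanyu Pallavi Sudhir2019-12-24T11:25:00ZSpecial cases of Bayesian inference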
https://thewindingnumber.blogspot.com/2019/12/special-cases-of-bayesian-inference.html
0In the last article, we saw that Bayes's theorem allows us to infer theories from data -- or rather infer from data a distribution on the theory-space. This is <i>the</i> general way to make inferences and use these inferences to make predictions. We don't really need to bother about questions of whether we should choose the mean or median or mode as the theory we "choose", because we don't have to choose -- we make predictions with the entire distribution.<br /><br />For example, in the most general setting, we're trying to calculate our optimal action out of a set of choices based on a certain unknown utility function that depends on the right theory of physics (to calculate the consequences of the action). We don't need to "choose" the right theory of physics from the probability distribution we have on it -- we just make predictions for each possible theory in the support and integrate over the distribution to calculate the distribution on possible consequences. It's only at this point that we need to care about whether we want the action with the maximum mean, median, mode or whatever of the utility function.<br /><br /><div class="twn-furtherinsight">Can you show that <b>Bayesian inference can always converge</b>? I.e. that there is always some data collection mechanism that allows your distribution to converge to a Dirac delta at some point? Well, there is, and it's called "sampling". What if we can't sample directly but collect some other indirect data? Comment on the philosophical implications of such a theory-space (hint: think about a <b>probabilistic generalisation of falsifiability</b> (a criterion for a theory to be scientific)).</div><br />But some special cases of Bayesian inference -- with <b>special choices of prior</b> -- exist and are often discussed/used in the literature for various reasons, and in fact, they often come with mechanisms to choose <b>specific estimates</b> based on the posterior distribution, as well as specific <b>projections of our data</b> to use. We will discuss some of these examples below.<br /><br /><hr /><br />Perhaps the most straightforward example is <b>maximum likelihood estimation</b>. Here, the prior distribution is the <b>uniform</b> distribution, i.e.<br /><br />$$\mathrm{Pr}(\theta\mid x)=\frac{\mathrm{Pr}( x\mid\theta)}{\mathrm{Pr}(x)}$$<br />And we always take the <b>mode</b> of this (the value of $\theta$ that maximises $\mathrm{Pr}(\theta\mid x)$) to be the estimate of the parameter. This is just the value of $\theta$ that maximises $\mathrm{Pr}( x\mid\theta)$ (for observed data values $x$), which is called the <b>likelihood function</b>.<br /><br />I.e. we're saying we have no preconceptions about the theory, and the right theory is just the one that gives the observed data the highest probability (or probability density).<br /><br />Note that even in maximum likelihood estimation, you kinda do have a non-uniform prior, in the sense that the prior is only uniform on its support, and you give zero values to things outside the domain of the parameter, or to theories other than the family being tested, etc.
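As a tiny worked illustration -- a sketch in R assuming a binomial model, 7 heads observed in 10 tosses -- the posterior mode under a flat prior is exactly the likelihood maximiser:<br /><br /><pre><code>theta = seq(0.001, 0.999, by = 0.001)       # grid over the parameter
likelihood = dbinom(7, size = 10, prob = theta)
posterior = likelihood / sum(likelihood)    # flat prior: just normalise
theta[which.max(likelihood)]   # 0.7, the maximum likelihood estimate
theta[which.max(posterior)]    # 0.7, the same mode
</code></pre>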
So if you were just taught maximum likelihood estimation, it would be easy for you to come up with the idea of Bayesian inference.<br /><br /><hr /><br />The next one is a hypothesis test -- it actually took me a while to figure out the implied prior of this one, because of the whole weirdness with looking at the probabilities of things like $\mathrm{Pr}(x\ge \mathrm{data})$ rather than just $\mathrm{Pr}(x=\mathrm{data})$.<br /><br />But then I realised that the whole thing is plainly a distraction -- the $x\ge \mathrm{data}$ stuff is just there so we can say smart-sounding things like "the probability of getting a result <i>that extreme</i> is such-and-such if the null hypothesis were true". One can write a bijection $\mathrm{Pr}(x\ge a)\leftrightarrow a$, because it is a decreasing function. So the value of $\mathrm{Pr}(x\ge a)$ allows us to find in particular the PMF or PDF at $a$.<br /><br />So suppose we have the hierarchical prior $\mathrm{P}(\theta)$ given by:<br /><br />$$\mathrm{P}(\theta)=p\delta(\theta-\theta_0)+(1-p)\rho(\theta)$$<br />Then<br /><br />TBC:<br />hypothesis test, confidence interval -- <a href="http://econ.ucsb.edu/~startz/Choosing%20The%20More%20Likely%20Hypothesis.pdf">http://econ.ucsb.edu/~startz/Choosing%20The%20More%20Likely%20Hypothesis.pdf</a><br />lasso, ridge, tikhonov<br /><a href="https://en.wikipedia.org/wiki/Prior_probability#Weakly_informative_priors">https://en.wikipedia.org/wiki/Prior_probability#Weakly_informative_priors</a><br /><a href="https://en.wikipedia.org/wiki/Beta_distribution#Haldane's_prior_probability_(Beta(0,0))">https://en.wikipedia.org/wiki/Beta_distribution#Haldane's_prior_probability_(Beta(0,0))</a><br />decision theory<br />various test statisticsbayes's theoremconfidence intervalsfrequentist probabilityhypothesis testingmaximum likelihood estimationstatistical estimationstatistical inferencestatisticsThu, 19 Dec 2019 13:20:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-5430764464701664666Abhimanyu Pallavi Sudhir2019-12-19T13:20:00ZTwo-envelopes problem: beyond the Bayes explanation
https://thewindingnumber.blogspot.com/2019/12/two-envelopes-problem-beyond-bayes.html
0The two-envelopes problem is phrased as follows:<br /><br /><blockquote>There are two envelopes, one with twice the amount of money as the other -- you pick one of them at random and open it to find $x$ dollars. You don't know if this was the larger or smaller sum.<br /><br />Should you switch?<br /><br />Though symmetry would tell you that it doesn't matter, it seems that by switching you get $x$ if the other envelope is bigger, but only lose $x/2$ if the other envelope is smaller. So you should switch. Right? </blockquote><br />The standard explanation of this problem is that we need to pick a prior to make any actual inference, and the probability of the other envelope being bigger is usually less than the probability of it being smaller, in the sense that probability distributions approach zero towards infinity.<br /><br />But this explanation doesn't really make the problem go away. Consider this prior (on the value of the smaller envelope):<br /><br />$$<br />f(x) = \left\{ {\begin{array}{*{20}{c}}{\frac{1}{4}{{(3/4)}^n}}&{{\rm{if }}\ x = {2^n},\,\,n \in {\mathbb{Z}_{ \ge 0}}}\\0&{{\rm{else}}}\end{array}} \right.<br />$$<br />This is a perfectly valid prior distribution (the improper prior assumed in the original "paradox" would have worked as well, but I don't want to distract you with questions about distribution propriety) for which exactly the same paradox applies -- one may show (and I encourage you to work this out) that whatever value $x=2^n$ (for $n\ge1$) I find in the envelope, the expected value of the other envelope is $8/7\ x$ (and for $x=1$, the other envelope is $2x$ for sure, which is even better).<b><br /></b>Let's try to remember why this is weird -- what exactly is so paradoxical about it?<br /><br />What's weird is that before opening your envelope, the situation is completely symmetrical. Yet you know that whatever happens once you open the envelope, you'll want to switch.<br /><br />Well, that's weird, but the reason it's paradoxical is this: since each expectation $E(X_2\mid X_1=x_1)$ (for varying possible $x_1$) is greater than $x_1$, the expectation of $X_2$ should itself be greater than the expectation of $X_1$, right? It's a tree diagram, you're just summing over each possible value of $X_1$.<br /><br />How can $E(X_2)=8/7\cdot E(X_1)$ but at the same time $X_1$ and $X_2$ be completely symmetric?<br /><br />Hopefully some light bulbs are turning on in your head -- indeed:<br /><br />$$E(X_1)=\sum_{n\in\mathbb{Z}_{\ge0}}\frac14\left(\frac34\right)^n2^n=\infty$$<br />Yep. <i>That's</i> how the situation stays symmetric -- both expectations are infinite.<br /><br />Isn't that incredible?
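If you'd rather check that $8/7$ numerically before working it out -- a sketch in R simulating the prior above (<code>rgeom</code> with $p=1/4$ gives exactly the $\frac14(3/4)^n$ weights):<br /><br /><pre><code>set.seed(1)
N = 1e6
n = rgeom(N, prob = 1/4)        # P(n) = (1/4)(3/4)^n, n = 0, 1, 2, ...
small = 2^n; big = 2^(n + 1)    # the pair of envelopes
pick_big = runif(N) < 0.5       # open one at random
observed = ifelse(pick_big, big, small)
other = ifelse(pick_big, small, big)
mean(other[observed == 4]) / 4  # ~ 8/7 = 1.1428...
mean(other[observed == 1])      # = 2 exactly (the n = 0 edge case)
</code></pre>bayes's theoremmathematicsparadoxprobabilityprobability puzzlesstatistical inferencestatisticstwo-envelopes problemWed, 11 Dec 2019 01:02:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-4679758997639945016Abhimanyu Pallavi Sudhir2019-12-11T01:02:00ZCovariance matrix and Mahalanobis distance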
https://thewindingnumber.blogspot.com/2019/12/covariance-matrix-and-mahalanobis.html
0In the article <a href="https://thewindingnumber.blogspot.com/2018/02/random-variables-and-their-properties.html">Random variables as vectors</a>, we discussed that random variables were vectors, and their covariance was their dot product. <br /><br />The basic motivation for coming up with this idea was from contrasting $\mathrm{Var}(X+X)=4\mathrm{Var}(X)$ to the formula for variables with zero covariance $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)$. These correspond to the geometric cases of adding two parallel and perpendicular vectors -- a more general addition is expressed through the cosine rule. What's the "cosine rule for random variables"?<br /><br />Well, it's $\mathrm{Var}(X+Y)=\mathrm{Var}(X)+\mathrm{Var}(Y)+2\mathrm{Cov}(X,Y)$. To me, this -- like the dot product form of the cosine rule -- is highly suggestive of a bilinear form, specifically the Gram matrix, called the <b>covariance matrix</b>, of the random vector $\mathbf{X}=\left[ {\begin{array}{*{20}{c}}X\\Y\end{array}} \right]$ (which is really to be seen as a "matrix", because the random variables are to be understood as row vectors).<br /><br />$$\Sigma ({X_1}, \ldots {X_n}) = \left[ {{\rm{Cov}}({X_i},{X_j})} \right]$$<br />One may compare this Gram matrix interpretation -- $\Sigma=\mathbf{X}\mathbf{X}^T$ (note: <i>not</i> $\mathbf{X}^T\mathbf{X}$, the way we've defined $X$ -- this is important!) -- to the variance formula $\sigma^2=XX^T$, and realise that the covariance matrix is the "right" measure of variance of a random vector (note how if we made random variables column vectors, this would all become $X^TX$, etc.).<br /><br />(yeah, yeah, you need to subtract the mean, etc.)<br /><br />Analogously, one may define a <b>cross-covariance matrix</b> $K_{\mathbf{X}\mathbf{Y}}=\mathrm{E}((\mathbf{X}-\mu_{\mathbf{X}})(\mathbf{Y}-\mu_{\mathbf{Y}})^T)$ measuring the covariance between two random vectors.<br /><br /><hr /><br />It is rather natural to see this, being a bilinear form, as related to some notion of distance -- the standard deviation, after all, can be seen as a "natural distance unit" in one dimension (in the sense that the "unlikeliness" of a data point depends on its distance from the mean in units of standard deviation). <br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-z_h45jCnvYk/XexRpjECkuI/AAAAAAAAF4Q/NM_6VYYMYTApRmfFDMHZEbNY382dcsGqACLcBGAsYHQ/s1600/covariance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="482" height="316" src="https://1.bp.blogspot.com/-z_h45jCnvYk/XexRpjECkuI/AAAAAAAAF4Q/NM_6VYYMYTApRmfFDMHZEbNY382dcsGqACLcBGAsYHQ/s320/covariance.png" width="320" /></a></div><br />Suppose we wish to find the variance across some direction, i.e. the variance of some random variable $u_1X+u_2Y=\mathbf{X}\hat{u}$ with $|\hat{u}|=1$ -- this is clearly just $\hat{u}^T\Sigma\hat{u}$. 
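To make this concrete -- a quick sketch in R, checking on simulated data that the variance of the projection $\mathbf{X}\hat{u}$ is the quadratic form $\hat{u}^T\Sigma\hat{u}$:<br /><br /><pre><code>set.seed(1)
X = cbind(rnorm(1e5), 2 * rnorm(1e5))   # two variables, different variances
X[, 2] = X[, 2] + 0.5 * X[, 1]          # introduce some covariance
u = c(1, 1) / sqrt(2)                   # a unit direction
var(X %*% u)                            # directional variance, empirically
t(u) %*% cov(X) %*% u                   # the quadratic form -- same number
</code></pre>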
So this defines a natural distance scale in the direction of $\hat{u}$: measure lengths in units of the standard deviation in that direction, so that the squared norm of a vector $\vec{v}$ is:<br /><br />$$\|\vec{v}\|^2=\frac{\vec{v}^T\vec{v}}{\hat{v}^T\Sigma\hat{v}}$$<br />Along the principal axes (the eigenvectors of $\Sigma$), this agrees with the quadratic form:<br /><br />$$\|\vec{v}\|^2=\vec{v}^T\Sigma^{-1}\vec{v}$$<br />and it is the latter -- being an honest bilinear form -- that extends consistently to all directions at once. Another way to interpret this: $\Sigma^{-1/2}$ maps the distribution into a spherical one (one with identity covariance matrix), and this norm is just the Euclidean norm of the data point in that spherical distribution -- i.e. the distance adjusted for variances and covariances. This measure of distance (strictly, its square root) is called the <b>Mahalanobis distance</b>.
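A concrete sketch in R (the numbers are made up; note that the built-in <code>mahalanobis</code> returns the <i>squared</i> distance):<br /><br /><pre><code>set.seed(1)
Sigma = matrix(c(2, 1, 1, 2), 2)                 # an assumed covariance
X = MASS::mvrnorm(1e4, mu = c(0, 0), Sigma = Sigma)
v = c(1, 2)
mahalanobis(v, center = c(0, 0), cov = cov(X))   # squared Mahalanobis norm
t(v) %*% solve(cov(X)) %*% v                     # the same quadratic form
</code></pre>covariancecovariance matrixdot productgram matrixlinear algebramahalanobis distanceprobabilityrandom variablesrandom vectorsstatisticsvarianceSun, 08 Dec 2019 13:23:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-4516995023758440264Abhimanyu Pallavi Sudhir2019-12-08T13:23:00ZIntroduction to Bayesian inference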
https://thewindingnumber.blogspot.com/2019/12/introduction-to-bayesian-inference.html
0When the Higgs boson was discovered by the LHC, you heard comments e.g. <a href="https://blogs.scientificamerican.com/observations/five-sigmawhats-that/">here</a> that "there is no way to definitively talk about the probability of the existence of a Higgs boson" -- more specifically, the quoted physicist in the linked article claimed "there is no way to eliminate the conditional".<br /><br />This is just wrong. There <i>is</i> a way to eliminate the conditional -- it's just Bayes's theorem!<br /><br />Imagine setting up a tree diagram for this experiment. (The LHC made a certain observation which is highly correlated with the existence of the Higgs boson, which is what we're expressing below.)<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-o0Zalo9aNoE/XeuUbhiIK5I/AAAAAAAAF30/sclEfz1xPk06135QkB4phqUztk3qNM6-wCLcBGAsYHQ/s1600/tree.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="476" data-original-width="482" height="316" src="https://1.bp.blogspot.com/-o0Zalo9aNoE/XeuUbhiIK5I/AAAAAAAAF30/sclEfz1xPk06135QkB4phqUztk3qNM6-wCLcBGAsYHQ/s320/tree.png" width="320" /></a></div>As a graphical interpretation of Bayes's theorem:<br /><br />$$\mathrm{Pr}(\mathrm{Theory}\mid\mathrm{Data})=\frac{\mathrm{Pr}(\mathrm{Data}\mid\mathrm{Theory})}{\mathrm{Pr}(\mathrm{Data})}\mathrm{Pr}(\mathrm{Theory})$$<br /><br />What this needs, however, is a <b>prior distribution</b> $\mathrm{Pr}(\mathrm{Theory})$ on the probability of the Higgs boson existing -- before any data is supplied.<br /><br />And this distribution <i>obviously</i> exists. If you've ever heard a physicist say "we were already pretty sure it existed, this experiment just increased our confidence in it", then you know what I'm talking about. This "pretty sure"-ness in its existence is a prior distribution.<br /><br />OK -- but how does this prior distribution exist? Why were physicists pretty sure that the Higgs boson existed?<br /><br />Well, there's a reason they postulated a Higgs boson in the first place -- to explain the mass of certain particles, etc. And they had already observed that those particles have a mass -- i.e. they <b>had already collected some data</b>.<br /><br />This <b>prior distribution is the posterior distribution of a previous experiment</b>.<br /><br /><hr /><br /><div class="twn-furtherinsight">But... you're saying that the prior distribution is inferred as the posterior distribution of a previous experiment, which itself depends on the prior distribution of that experiment... isn't it just turtles all the way down?<br /><br />Well, as you might have guessed, we need a <b>true prior distribution</b> on all physical theories about the universe, before collecting any data. This prior is essentially arbitrary, which might be philosophically troubling for you, but that's just how things are. <b>There is no right prior distribution, it's fundamentally subjective</b>.<br /><br />But I'd say most people would adopt a prior distribution based on Occam's razor, giving higher priors to simpler theories -- e.g. even though all existing data agrees with both "string theory" and "string theory until New Year 2020, and then the world ends", we give a higher prior to the first one, simply because it takes "fewer lines of code" to write. 
If you're interested in this stuff, I recommend reading about:<br /><ul><li><b>Occam's razor</b></li><li><b>Kolmogorov complexity</b></li><li><b>Solomonoff's theory of inductive inference</b></li></ul>I suppose Solomonoff's is not the only way to set priors -- a principle that I have heard of but do not really understand (and don't bother with, because it seems really boring) is the <a href="https://en.wikipedia.org/wiki/Principle_of_indifference">principle of indifference</a>.</div><br /><hr /><br />This perspective on inference, though, requires the following key fact about Bayesian inference, known as <b>the stability of Bayesian inference</b>: feeding some data, then feeding some other independent data, is equivalent to feeding both data together (the independence -- conditional on the theory -- is important, because e.g. if you fed in the same piece of data twice, that should not affect your distribution). We can prove this fairly easily:<br /><br />Given a prior distribution $\mathrm{B}_0(\theta)$ on theory-space values $\theta$, the posterior distribution upon observing some data $\delta_1$ is:<br /><br />$$\mathrm{B}_1(\theta)=\frac{\mathrm{P}(\delta_1\mid\theta)}{\sum_\phi \mathrm{P}(\delta_1\mid\phi)\mathrm{B}_0(\phi)}\mathrm{B}_0(\theta)$$<br />Which then becomes the prior distribution for a subsequent observation of $\delta_2$:<br /><br />$$\mathrm{B}_2(\theta)=\frac{\mathrm{P}(\delta_2\mid\theta)}{\sum_\phi\mathrm{P}(\delta_2\mid\phi)\mathrm{B}_1(\phi)}\mathrm{B}_1(\theta)$$<br />Substituting in $\mathrm{B}_1(\theta)$ and simplifying -- using the conditional independence $\mathrm{P}(\delta_1\land\delta_2\mid\theta)=\mathrm{P}(\delta_1\mid\theta)\mathrm{P}(\delta_2\mid\theta)$ -- we get:<br /><br />$$\mathrm{B_2}(\theta)=\frac{\mathrm{P}(\delta_1\land\delta_2\mid\theta)}{\sum_\phi\mathrm{P}(\delta_1\land\delta_2\mid\phi)\mathrm{B}_0(\phi)}\mathrm{B}_0(\theta)$$<br />Which is precisely what we wanted.<br /><br />Apparently there are systems of probability theory called <b>noncommutative probability</b> in which this is not possible, and statistical inference is not possible -- see <a href="https://www.princeton.edu/~hhalvors/teaching/phi538_f2005/bayesi.pdf">(Redei 1992)</a> (isn't it weird how everything in probability and statistics is so recent?). Obviously, this is not relevant to the physical applications of probability.<br /><br /><hr /><br />If you want to see Bayesian inference in action, have a look at <a href="https://abhimanyups.shinyapps.io/BayesianInference/">this interactive RShiny applet</a> I wrote that demonstrates Bayesian inference from a continuous stream of data (relying on this stability of Bayesian inference). Here's a snapshot from the applet, to pique your interest, of the evolution of this belief distribution while tossing a coin that you gradually learn is pretty unfair:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-S6e3fmJz2to/XevTIWl-PTI/AAAAAAAAF4E/yfSgbzWs5Ycc-q26VMWIHn1PowWUSJPPgCLcBGAsYHQ/s1600/bayes.gif" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="497" data-original-width="300" src="https://1.bp.blogspot.com/-S6e3fmJz2to/XevTIWl-PTI/AAAAAAAAF4E/yfSgbzWs5Ycc-q26VMWIHn1PowWUSJPPgCLcBGAsYHQ/s1600/bayes.gif" /></a></div>
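If you'd rather see it at the command line -- a minimal sketch of the same demo in R, assuming a coin-toss (Bernoulli) likelihood and a discretised uniform prior on the coin's bias; it also checks the stability property proved above (one toss at a time agrees with all tosses at once):<br /><br /><pre><code>theta = seq(0.01, 0.99, by = 0.01)   # discretised theory-space: coin bias
prior = rep(1 / length(theta), length(theta))
tosses = c(1, 1, 0, 1, 1, 1, 0, 1)   # 1 = heads

post1 = prior                        # update one toss at a time
for (x in tosses) {
  post1 = post1 * dbinom(x, 1, theta)
  post1 = post1 / sum(post1)
}

post2 = prior * dbinom(sum(tosses), length(tosses), theta)
post2 = post2 / sum(post2)           # update on all tosses at once

max(abs(post1 - post2))              # ~ 0: the two posteriors agree
</code></pre>bayes's theoremnoncommutative probabilitynoncommutingoccam's razorpriorsprobabilitystatistical inferencestatisticssubjective probabilitySat, 07 Dec 2019 16:10:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-3681621307184124052Abhimanyu Pallavi Sudhir2019-12-07T16:10:00ZComment by Abhimanyu Pallavi Sudhir on Weird R behavior with indexing function arrays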
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays
I see -- the idea is that you separately define a functional that produces the function from the existing function, and call that functional in the loop. Makes sense.
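(For concreteness, a sketch of that fix -- the helper name <code>make_posterior</code> is mine; <code>force</code> makes <code>i</code> get evaluated at call time rather than lazily:)<pre><code>theory = c(function(p) p)
make_posterior = function(i) {   # the functional: builds the function...
  force(i)                       # ...forcing i to be evaluated now
  function(p) theory[[i]](p)
}
i = 1
posterior = make_posterior(i)    # the current value of i (1) is baked in
i = 2
posterior(0)                     # returns 0; no out-of-bounds error
</code></pre>Mon, 02 Dec 2019 18:39:26 GMThttps://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays?cid=104515789Abhimanyu Pallavi Sudhir2019-12-02T18:39:26ZComment by Abhimanyu Pallavi Sudhir on Weird R behavior with indexing function arrays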
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays
@Gregor I'm aware of functionals, but I'm not sure what they'd have to do here.Mon, 02 Dec 2019 16:24:00 GMThttps://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays?cid=104512249Abhimanyu Pallavi Sudhir2019-12-02T16:24:00ZComment by Abhimanyu Pallavi Sudhir on Weird R behavior with indexing function arrays
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays
Ah ok. So I just need to pass <code>i</code> as a variable to the function. Thanks @Gregor.Mon, 02 Dec 2019 16:02:44 GMThttps://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays?cid=104511603Abhimanyu Pallavi Sudhir2019-12-02T16:02:44ZComment by Abhimanyu Pallavi Sudhir on Weird R behavior with indexing function arrays
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays
@manotheshark Right, so I'm asking if there's a way to get R to immediately evaluate <code>posterior</code> rather than do so lazily.Mon, 02 Dec 2019 15:59:50 GMThttps://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays?cid=104511498Abhimanyu Pallavi Sudhir2019-12-02T15:59:50ZComment by Abhimanyu Pallavi Sudhir on Weird R behavior with indexing function arrays
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays/59142503#59142503
I figured that's what's going on, but how can I fix it? Can I "turn off" lazy evaluation?Mon, 02 Dec 2019 15:58:57 GMThttps://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays/59142503?cid=104511470#59142503Abhimanyu Pallavi Sudhir2019-12-02T15:58:57ZWeird R behavior with indexing function arrays
https://stackoverflow.com/questions/59142271/weird-r-behavior-with-indexing-function-arrays
0<p>I'm having some unexpected behaviour in R with function arrays, and I've reduced the problem to a minimal working example:</p>
<pre><code>theory = c(function(p) p)
i = 1
posterior = function(p) theory[[i]](p)
i = 2
posterior(0)
</code></pre>
<p>Which gives me an error saying the subscript <code>i</code> is out of bounds.</p>
<p>So I guess that <code>i</code> is somehow being used as a "free" variable in the definition of <code>posterior</code>, so it gets updated when I redefine <code>i</code>. Oddly enough, this works:</p>
<pre><code>theory = c(function(p) p)
i = 1
posterior = theory[[i]]
i = 2
posterior(0)
</code></pre>
<p>How can I avoid this? Note that not redefining <code>i</code> is not an option, as this stuff is going in a for loop where <code>i</code> is the index.</p>rfunctionlambdaMon, 02 Dec 2019 15:41:54 GMThttps://stackoverflow.com/q/59142271Abhimanyu Pallavi Sudhir2019-12-02T15:41:54ZAnswer by Abhimanyu Pallavi Sudhir for In QM, what causes a particle to have more probability to be somewhere else when it's found in a less probable position?
https://physics.stackexchange.com/questions/516683/in-qm-what-causes-a-particle-to-have-more-probability-to-be-somewhere-else-when/516713#516713
0<p>The state collapses after measurement, and if you measure the precise position, it collapses to a position eigenstate (i.e. a precise location), so you no longer have a "probability somewhere else". The probability somewhere else is prior to the measurement.</p>
<p>If you want to learn the real ideas behind quantum mechanics, not "without the math" whatever that means, you should have a look at my partially-written blog post series here: <a href="https://thewindingnumber.blogspot.com/p/quantum-mechanics-i.html" rel="nofollow noreferrer">Quantum Mechanics I - The Winding Number</a>.</p>Fri, 29 Nov 2019 09:43:46 GMThttps://physics.stackexchange.com/questions/516683/-/516713#516713Abhimanyu Pallavi Sudhir2019-11-29T09:43:46ZWhat even are pure and applied math, anyway?
https://thewindingnumber.blogspot.com/2019/11/what-even-are-pure-and-applied-math.html
0Not really a serious post.<br /><br />I see the words "pure math" and "applied math" used a lot, and there seem to be some completely distinct meanings of the phrases:<br /><ol><li><b>Formal math</b> and <b>informal math</b> -- you can certainly approach things like <a href="https://math.stackexchange.com/questions/3402550/axiomatic-treatment-of-sum-operators-which-work-on-divergent-series/3402796#3402796">summing divergent series completely formally</a> (follow the link for proof!), and I'm sure you could in principle be hand-wavy with category theory. So this is really about the method with which you do mathematics, not the field itself. An example of where you see this is the distinction between analysis and calculus (well, <i>a</i> distinction -- sometimes calculus is defined specifically as having to do with differentials and integrals while analysis is a broader field).</li><li><b>Abstract math</b> and <b>concrete math</b> -- this really has multiple levels: category theory, abstract mathematics, mathematics, science, engineering, specific numerical calculation. The line is often drawn either before or after mathematics.</li><li><b>Theoretical </b>and <b>applied </b>-- closely related to the previous point, differing by the purely social question of the purpose of the study.</li><li><b>Everything else</b> vs <b>statistics</b> -- I think this arises from a conflation between statistics and applied/concrete statistics. Statistics can really be a totally formal field of mathematics or even abstract mathematics, but I guess people often fail to draw the distinction (unlike, say, between "differential equations" and "applied differential equations in engineering").</li><li><b>Algebra</b> vs <b>everything else</b> -- Perhaps a result of the fact that analysis and geometry often restrict to handling special concrete objects like the real and complex numbers.</li></ol><div>I guess the reason these distinctions are often taken as synonymous is that they're quite correlated. As you get more abstract, you may feel a stronger obligation to be more formal to make sure you haven't missed out on some so-called pathological cases (although I think it's perfectly possible to develop intuition for such pathological situations, see e.g. <a href="https://thewindingnumber.blogspot.com/2019/05/whats-with-e-1x-on-smooth-non-analytic.html">my e^(-1/x) article</a>, or the <a href="https://thewindingnumber.blogspot.com/p/topology.html">topology series</a>). When working for an applied purpose, it may not be useful to be too formal, for practical constraints.</div><div><br /></div><div>The correlation really lines up with the fundamental "purpose of mathematics". The point of having axiomatisations is that someone applying abstract ideas in concrete situations can just check if the axioms are satisfied -- and so you really must formally deduce things from them to make sure you're not making some assumptions specific to one concrete situation that you have in mind.</div><div><br /></div><div>(Another example of such ambiguity is the distinction between "theoretical science" and "practical science". 
I've still not figured out if the latter refers to experimental science or applied science, and there isn't even any correlation between the ideas here.)</div>abstract algebraabstract mathematicsmathematics educationphilosophyphysicsscience educationMon, 25 Nov 2019 14:16:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-8563106154385871331Abhimanyu Pallavi Sudhir2019-11-25T14:16:00ZComment by Abhimanyu Pallavi Sudhir on Because things smell, is everything evaporating?
https://physics.stackexchange.com/questions/515304/because-things-smell-is-everything-evaporating
I guess the essence of the question is: why are spontaneous reactions that produce gaseous products so common? Which probably has to do with the high entropy of gases or something.Thu, 21 Nov 2019 16:23:39 GMThttps://physics.stackexchange.com/questions/515304/because-things-smell-is-everything-evaporating?cid=1161380Abhimanyu Pallavi Sudhir2019-11-21T16:23:39ZComment by Abhimanyu Pallavi Sudhir on Because things smell, is everything evaporating?
https://physics.stackexchange.com/questions/515304/because-things-smell-is-everything-evaporating
The answer to the metal question is here: <a href="https://chemistry.stackexchange.com/questions/7916/why-can-we-smell-copper">Why can we smell copper?</a> and <a href="https://www.livescience.com/4233-coins-smell.html" rel="nofollow noreferrer">here</a>. I guess the standard haemoglobin explanation of the metallic smell of blood is false: <a href="https://www.quora.com/Why-does-blood-smell-like-copper/answer/Song-Chencheng" rel="nofollow noreferrer">Why does blood smell like copper?</a>Thu, 21 Nov 2019 16:20:04 GMThttps://physics.stackexchange.com/questions/515304/because-things-smell-is-everything-evaporating?cid=1161377Abhimanyu Pallavi Sudhir2019-11-21T16:20:04ZAnswer by Abhimanyu Pallavi Sudhir for the vector space of Magic Squares
https://math.stackexchange.com/questions/1692624/the-vector-space-of-magic-squares/3445000#3445000
0<p>Here's an easy way to do this for general <span class="math-container">$n$</span>: given a magic number <span class="math-container">$S$</span>, consider the top-left <span class="math-container">$(n-1)$</span> by <span class="math-container">$(n-1)$</span> submatrix of the square. Given these values, one can fill in the margins by subtracting rows and columns of the submatrix from <span class="math-container">$S$</span>, and the bottom-right entry by subtracting the diagonal of the submatrix from <span class="math-container">$S$</span>.</p>
<p>The only equations remaining to satisfy are: (1) the sum of each new margin equals <span class="math-container">$S$</span> and (2) the sum of the non-principal diagonal equals <span class="math-container">$S$</span>. The condition (1) is the same for each margin (because the last column can be determined given all the rows and all the other columns). So the conditions are, where <span class="math-container">$1\le i,j\le n$</span> and <span class="math-container">$1\lt k\lt n$</span>:</p>
<p><span class="math-container">$$\sum_{i}a_{ii}=(n-1)S-\sum_{ij}a_{ij}$$</span></p>
<p><span class="math-container">$$\sum_k a_{k(n-k+1)}+\sum_j a_{1j}+\sum_i a_{i1}=S$$</span></p>
<p>These can be checked to be linearly independent for <span class="math-container">$n>2$</span>. Allowing <span class="math-container">$S$</span> to be free, the dimension of our space is therefore <span class="math-container">$(n-1)^2-2+1$</span>, which equals: </p>
<p><span class="math-container">$$n^2-2n$$</span></p>
<p>Which indeed gives 3 in the case <span class="math-container">$n=3$</span>. Meanwhile, for <span class="math-container">$n=1$</span> and <span class="math-container">$n=2$</span>, the dimension is clearly 1.</p>
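<p>A quick way to corroborate this count -- a sketch in R (<code>magic_dim</code> is a made-up name) that builds the row/column/diagonal constraints, with <span class="math-container">$S$</span> as an extra unknown, and computes the nullity of the system:</p>
<pre><code>magic_dim = function(n) {
  A = NULL
  add = function(cells) {   # append one constraint: sum(cells) - S = 0
    row = numeric(n^2 + 1); row[cells] = 1; row[n^2 + 1] = -1
    rbind(A, row)
  }
  idx = matrix(1:n^2, n, n)
  for (i in 1:n) A = add(idx[i, ])   # row sums
  for (j in 1:n) A = add(idx[, j])   # column sums
  A = add(diag(idx))                 # main diagonal
  A = add(idx[cbind(1:n, n:1)])      # anti-diagonal
  (n^2 + 1) - qr(A)$rank             # nullity = dimension of the space
}
sapply(3:6, magic_dim)
</code></pre>
<p>For <span class="math-container">$n=3,\ldots,6$</span> this should print 3, 8, 15, 24, i.e. <span class="math-container">$n^2-2n$</span>.</p>Thu, 21 Nov 2019 12:04:55 GMThttps://math.stackexchange.com/questions/1692624/-/3445000#3445000Abhimanyu Pallavi Sudhir2019-11-21T12:04:55ZComment by Abhimanyu Pallavi Sudhir on Why are objects at rest in motion through spacetime at the speed of light?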
https://physics.stackexchange.com/questions/33840/why-are-objects-at-rest-in-motion-through-spacetime-at-the-speed-of-light/410575#410575
I did say that it's a convention about normalisation ("yes, you can choose other parameterisations"). I'm saying it's not an arbitrary convention, in the sense that it's perfectly sensible to ask that $\tau$ becomes $t$ for a stationary object.Tue, 12 Nov 2019 11:06:21 GMThttps://physics.stackexchange.com/questions/33840/why-are-objects-at-rest-in-motion-through-spacetime-at-the-speed-of-light/410575?cid=1157162#410575Abhimanyu Pallavi Sudhir2019-11-12T11:06:21ZFinancial derivatives, payoff functions and portfolios: motivation
https://thewindingnumber.blogspot.com/2019/11/financial-derivatives-payoff-functions.html
0<b>Key ideas in this post</b><br /><ul><li>Payoff functions are central, derivatives are just ways to achieve specified payoff functions. Your entire portfolio is also a derivative. We are interested in payoff functions that maximise certain combinations of expectation, risk, and other moments (depending on the investor's preferences).</li><li>Shorting is just "investing in the rest of the market" and is the natural way to get a payoff function of $-x$.</li></ul><hr /><br />When I first saw the definitions of several financial assets, I found them completely arbitrary -- it's not that I didn't get the reason one would have them, but rather that I saw no way to immediately understand them, nor a starting point for reasoning about them mathematically. Other than what was perhaps the most basic asset -- stocks (and also bonds, physical assets, etc.) and their baskets -- all the derivatives (and things that aren't called derivatives) based on them seemed really artificial in their construction.<br /><br />But this isn't exactly unfamiliar territory, is it? You've seen unmotivated definitions in mathematics, and you've seen that you need to put in quite a bit of effort to really motivate them and understand why they make perfect sense -- you've seen that, e.g. <a href="https://thewindingnumber.blogspot.com/">here</a>.<br /><br />So let's do the same thing with finance.<br /><br /><hr /><br />Let's start with a simple one: <b>shorting</b>. <br /><br />There is a certain asymmetry in the definitions of longing and shorting, isn't there? It's the "borrowing a stock" part of the definition of shorting that introduces this asymmetry.<br /><br />But if you've spent any time thinking about economics, the idea of borrowing something you don't have should be familiar -- it's what you do when you don't have any investment capital to start with, but you think you can grow the value of what you've borrowed by e.g. investing it in a stock. Let's phrase this in a slightly different (and by "slightly different", I mean "take the buying-selling dual of") way:<br /><br /><b>How to invest in a stock without money at hand:</b> Borrow some money, immediately "sell" the money for some stocks -- after some time has passed, "buy" back the money by returning the stocks. If the value of the stocks has increased, you'll get more money in return and be able to repay the loan.<br /><br />This is <i>precisely</i> symmetric to the situation of shorting -- <b>longing an asset</b> just means <i>shorting money</i> -- or more precisely, <b>shorting the rest of the market</b>.<br /><br />The apparent asymmetry between longing and shorting comes back from the fact that you are much more likely to already own some of "the rest of the market" than to own a particular stock -- for example, the unbounded losses of shorting arise from the fact that it's much easier for a single stock's value to skyrocket than for money's -- so in longing, there may still be ways for you to earn the money to repay it even if the value of the stock drops, i.e. the value of your other assets (e.g. your labour or property) relative to money would not have dropped.<br /><br />One advantage of this approach is that it is conceptually interesting -- and will hopefully allow us to transfer insights and ideas between stocks and shorts (except when certain approximations may be involved) -- another is that it immediately nullifies "moral" criticism of shorting, from e.g. 
Elon Musk, as it is really just the same as investing in the "rest of the market".<br /><br /><div class="twn-pitfall">Wait a minute -- but what if you actually just invested in "the rest of the market"? That would clearly have a much lower return than shorting the stock directly, right? Except you're thinking about investing in the rest of the market by paying money, not by paying the stock you're betting against -- that's a bet for the rest of the market against money, not against said stock.</div><br /><hr /><br />Well, shorting was an example where we wanted to bet that the price of an asset goes down. But in general, we may have any sort of weird prediction on the price of an asset -- maybe that it will "fluctuate a lot", or that it "won't exceed a certain level", or that it "will go up but only to a point", or that it "will reach a certain range". You may have any sort of elaborate <i>probability distribution</i> $\rho(x)$ on the value $x$ of the asset after a period of time. Given such a distribution, what you'd want to do (<b>ignoring risk</b>) is to maximise your expected return (minus the cost of buying the contract, of course):<br /><br />$$\chi=\int {\rho (x)f(x)dx} $$<br />Where $f(x)$ is the payoff you get if the asset reaches the price $x$ -- this is called the <b>payoff function</b>.<br /><br />Well, why not just take $f(x)$ to be arbitrarily high? Because the contract will be really expensive, of course. How expensive? Predicting that would require:<br /><ul><li>not only the $\rho$ distribution on this asset as believed by each seller and buyer in the market</li><li>but also the amount of capital they have and their beliefs about the future behavior of other assets in the market, contracts on which they could buy instead</li></ul><div>And that is still not to mention the fact that people do not maximise the expected value of profit per se, but have varying levels of risk aversion.<br /><br />But that's alright -- we don't need to predict that. That price is crunched for us by the market -- it is the <b>market price</b> of the contract. What's more important is to estimate $\chi = E_{\rho}[f(x)]$. Well, in fact, if we're concerned with <b>risk</b>, then we'd also be interested in the variance of the distribution -- and in general, an individual may also have a skewness or kurtosis preference (an example of a kurtosis preference would be among gamblers, who want heavy tails for the "big win").<br /><br />In fact, $\chi$ can depend on multiple underlying assets:<br /><br />$$\chi=E_\rho[f(\mathbf{x})]$$<br />Where $\mathbf{x}$ is the vector of prices of each underlying asset. In fact, this multivariate $f$ can represent your entire <b>portfolio</b> of derivatives on assets. If $f(\mathbf{x})$ can be written as a sum of functions of each component, this can be considered as some number of separate univariate derivatives -- the reason such a portfolio is still useful is that of risk management, especially if we use a $\rho$ that has some correlations (even otherwise, one may use a portfolio to mitigate risk but correlations allow us to target specific risks).<br /><br /><div class="twn-pitfall">There is an alternative definition of the payoff function, where it is $f(x)$ minus the contract price, i.e. a <b>profit/loss function</b>. The problem with this is that not every function can be a profit/loss function. But it often does make sense, and in general, a profit/loss function is more versatile than a payoff function (i.e. 
can be defined sensibly for any asset, which may not be possible with the payoff function for assets that have buying/selling at various points in time).<br /><br />(Think about how one may define a payoff function for shorting (shorting traditionally isn't considered a derivative because it isn't a contract, but I think that's an arbitrary distinction) -- the analog of the "contract price" is then the <i>negative</i> price you "buy" it at (i.e. the negative of the price you initially sell the stock you borrowed), and the negative value that you eventually "get" (i.e. the negative of the price you eventually sell it at) is the payoff function. So the payoff function is $-x$, and is indeed the reflection in the asset value axis of the payoff for a long. Check that the profit/loss functions are also reflections, apart from the interest on the stock you borrowed.)</div><br /><hr /><br />It's crucial to get some practice constructing various financial derivatives, i.e. constructing derivatives that have a given payoff function (using the first definition).<br /><br />$$f(x)=(a-x)I(x<a)$$<br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-3s_6o7haZyo/XcN1Ut7NCDI/AAAAAAAAF1w/QZaoPDdnOjEEs1A8fhGATkNot3K8gC9AQCLcBGAsYHQ/s1600/put.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="896" data-original-width="720" height="320" src="https://1.bp.blogspot.com/-3s_6o7haZyo/XcN1Ut7NCDI/AAAAAAAAF1w/QZaoPDdnOjEEs1A8fhGATkNot3K8gC9AQCLcBGAsYHQ/s320/put.png" width="257" /></a></div>Such a function would be a useful alternative to shorting, as it doesn't allow arbitrary losses.<br /><br />The kink in the function really suggests to me a fundamental change in behaviour at the point $x=a$ -- like you just don't make the trade if $x\ge a$. This decision can only be made once the final price is discovered, so you must have bought a contract that gave you the <i>option</i> to make a transaction: that transaction must be <i>selling</i>, it must be executed after the price is realised, but it must be at price $a$, which is initially fixed.<br /><br />This is called a <b>put option</b> -- you buy the <i>option</i> to sell a stock at a pre-decided price. To exercise the option, you instantly buy the stock and sell it at that pre-decided price. Obviously, this price matters -- otherwise, you would be getting a guaranteed nonnegative profit. This is really equivalent to insurance.<br /><br />(Verify that the payoff diagram of the seller of the put option is the negative of the one above.)<br /><br /><hr /><br />There's a natural analog of this notion that reduces risks with longing.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-gS71M3H0XoI/XdQCVmDDAQI/AAAAAAAAF24/IMIC43CrO2MF-kctJqIUtXI_BcL1pNMtgCLcBGAsYHQ/s1600/call.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="320" data-original-width="257" src="https://1.bp.blogspot.com/-gS71M3H0XoI/XdQCVmDDAQI/AAAAAAAAF24/IMIC43CrO2MF-kctJqIUtXI_BcL1pNMtgCLcBGAsYHQ/s1600/call.png" /></a></div>Once again, we see that there's a fundamental change of behavior if the price drops below $x=a$ -- you just don't complete the transaction. So you've bought an <i>option</i> to do something. Well, you need to sell something to make money, but the intercept of the graph suggests that you're also buying the asset, albeit at a fixed price. 
So this is a <b>call option</b> -- you buy the <i>option</i> to buy a stock at a pre-decided price. To profit from the option, you exercise it, then immediately sell the stock you bought.<br /><br />(Once again, the payoff diagram is a bit misleading and suggests that this is strictly worse than just buying a stock -- remember that the cost of a stock is the entire original stock price, while the cost of the call option is much smaller. These costs are not integrated into the payoff diagrams, but are in the profit/loss diagrams.)<br /><br />Essentially, call and put options allow you to work with hindsight.<br /><br />One might wonder whether a call option is perhaps not <i>as</i> useful as a put option -- there's not much to insure with longing (compared to shorting), right? Perhaps, but there are certain other uses of call options that work together with put options in an interesting way, as we will soon see.<br /><br /><hr /><br />
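To make the expected-payoff calculation $\chi=E_\rho[f(x)]$ concrete in the meantime -- a minimal sketch in R, where the lognormal $\rho$, the strikes and the helper names are all assumptions of mine, purely for illustration:<br /><br /><pre><code>put_payoff = function(x, a) pmax(a - x, 0)    # payoff (a - x) I(x < a)
call_payoff = function(x, a) pmax(x - a, 0)
short_payoff = function(x) -x                 # the -x payoff from earlier

set.seed(1)
x = rlnorm(1e6, meanlog = log(100), sdlog = 0.2)  # an assumed rho
mean(put_payoff(x, 90))     # chi for a put struck at 90
mean(call_payoff(x, 110))   # chi for a call struck at 110
</code></pre><br /></div>call optionsderivativesfinanceoptionsportfolioput optionsshortingstocksThu, 07 Nov 2019 02:02:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-1561541293853707792Abhimanyu Pallavi Sudhir2019-11-07T02:02:00ZComment by Abhimanyu Pallavi Sudhir on Physical interpretation of complex numbers, part 2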
https://physics.stackexchange.com/questions/512297/physical-interpretation-of-complex-numbers-part-2
I think you mean to say "if you scale by i, you rotate it 90 degrees". That's correct.Wed, 06 Nov 2019 13:02:59 GMThttps://physics.stackexchange.com/questions/512297/physical-interpretation-of-complex-numbers-part-2?cid=1154461Abhimanyu Pallavi Sudhir2019-11-06T13:02:59Zggplot aes: alpha gets "smoothed out"
https://stackoverflow.com/questions/58524281/ggplot-aes-alpha-gets-smoothed-out
1<p>I'm using <code>ggplot</code> in the <code>ggplot2</code> R package, with the <code>mpg</code> data set. </p>
<pre><code>library(ggplot2)
library(dplyr)  # for %>% and mutate

classify = function(cls){
if (cls == "suv" || cls == "pickup"){result = 1}
else {result = 0}
return(result)
}
mpg = mpg %>% mutate(size = sapply(class, classify))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, alpha = size))
</code></pre>
<p>Now, <code>size</code> can take only two values: 1 when class is <code>suv</code> or <code>pickup</code>, and 0 otherwise. But I get a weird "smooth" range of sizes in the resulting plot:</p>
<p><a href="https://i.stack.imgur.com/Plzzy.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/Plzzy.png" alt="enter image description here"></a></p>
<p>(It's not the legend that surprises me, but the fact that there are actually values plotted with alpha 0.1 or 0.3 or whatever.)</p>
<p>What's going on?</p>rggplot2alphaalpha-transparencyWed, 23 Oct 2019 13:43:29 GMThttps://stackoverflow.com/q/58524281Abhimanyu Pallavi Sudhir2019-10-23T13:43:29ZAnswer by Abhimanyu Pallavi Sudhir for Axiomatic Treatment Of Sum Operators Which Work On Divergent Series
https://math.stackexchange.com/questions/3402550/axiomatic-treatment-of-sum-operators-which-work-on-divergent-series/3402796#3402796
2<p><strong>EDIT:</strong></p>
<p>My original answer actually defined a trivial operator -- the fixed formalisation is credit to <a href="https://math.stackexchange.com/users/328173/kenny-lau">Kenny Lau</a> on <a href="https://leanprover.zulipchat.com/#narrow/stream/116395-maths/topic/Axiomatised.20summations/near/178678162" rel="nofollow noreferrer">Zulip</a> (see the link for discussion regarding non-triviality).</p>
<pre><code>import data.real.basic linear_algebra.basis data.finset
open classical
open finset
local attribute [instance, priority 0] prop_decidable
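-- "is_sum Sum" says the operator Sum (read "Sum s S" as "s sums to S") is
-- well-defined (wd), linear (sum_add, sum_smul), and stable under dropping
-- the first term (sum_shift); "has_sum s S" below then says that every such
-- operator which assigns s a value must assign it S.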
structure is_sum (Sum : (ℕ → ℝ) → ℝ → Prop) : Prop :=
(wd : ∀ {s S₁ S₂}, Sum s S₁ → Sum s S₂ → S₁ = S₂)
(sum_add : ∀ {s t S T}, Sum s S → Sum t T → Sum (λ n, s n + t n) (S + T))
(sum_smul : ∀ {s S} c, Sum s S → Sum (λ n, c * s n) (c * S))
(sum_shift : ∀ {s S}, Sum s S → Sum (λ n, s (n + 1)) (S - s 0))
def has_sum (s : ℕ → ℝ) (S : ℝ) := ∀ Sum, is_sum Sum → ∀ T, Sum s T → T = S
theorem sum_of_has_sum (s : ℕ → ℝ) (S : ℝ) (HS : has_sum s S)
(Sum : (ℕ → ℝ) → ℝ → Prop) (H : is_sum Sum) (T : ℝ) (HT : Sum s T) :
Sum s S :=
by rwa (HS Sum H T HT).symm
theorem has_sum_alt : has_sum (λ n, (-1) ^ n) (1/2) :=
begin
intros Sum HSum T HT,
have H3 := HSum.sum_shift HT,
have H2 := HSum.sum_smul (-1) HT,
have H0 := HSum.wd H2 H3,
change _ = T - 1 at H0,
linarith,
end
theorem has_sum_alt_id : has_sum (λ n, (-1) ^ n * n) (-1/4) :=
begin
intros Sum HSum T HT,
have HC : ∀ n : ℕ, (-1 : ℝ) ^ (n + 1) * (n + 1 : ℕ) + (-1) ^ n * n =
(-1) * (-1) ^ n
:= λ n, by rw [pow_succ, nat.cast_add, mul_add, nat.cast_one, mul_one, add_comm,
←add_assoc, neg_one_mul, neg_mul_eq_neg_mul_symm, add_neg_self, zero_add],
have H3 := HSum.sum_shift HT,
have H1 := HSum.sum_add H3 HT,
have H2 := HSum.sum_smul (-1) H1,
simp only [nat.cast_zero, mul_zero, sub_zero, HC, neg_one_mul, neg_neg] at H2,
have H4 := has_sum_alt Sum HSum _ H2,
linarith,
end
def fib : ℕ → ℝ
| 0 := 0
| 1 := 1
| (n + 2) := fib n + fib (n + 1)
theorem has_sum_fib : has_sum fib (-1) :=
have HC : ∀ n, fib n + fib (n + 1) = fib (n + 2) := λ n, rfl,
begin
intros S HSum T HT,
have H3 := HSum.sum_shift HT,
have H33 := HSum.sum_shift H3,
have H1 := HSum.sum_add HT H3,
have H0 := HSum.wd H1 H33, -- can use linearity instead of wd
simp only [fib, sub_zero] at H0,
linarith,
end
-- if a sequence has two has_sums, everything is its sum
-- (this is the case of not being summable, e.g. 1+1+1+...)
theorem has_sum_unique (s : ℕ → ℝ) (S₁ S₂ : ℝ) (H : S₁ ≠ S₂) :
has_sum s S₁ → has_sum s S₂ → ∀ S', has_sum s S' :=
λ HS₁ HS₂ T₁ Sum HSum T₂ HT₂, false.elim $ H $ HS₂ Sum HSum S₁ $
sum_of_has_sum s S₁ HS₁ Sum HSum T₂ HT₂
open submodule
-- a sum operator that is "forced" to give a the sum s
-- a valid sum operator iff the shifts of a are linearly independent
-- in which case a can have any sum, and thus has_sum nothing
def forced_sum (s : ℕ → ℝ) (H : linear_independent ℝ (λ m n : ℕ, s (n + m))) (S : ℝ) :
(ℕ → ℝ) → ℝ → Prop :=
λ t T, ∃ Ht : t ∈ span ℝ (set.range (λ m n : ℕ, s (n + m))),
T = finsupp.sum (linear_independent.repr H ⟨t, Ht⟩)
(λ n r, r * (S - (finset.range n).sum s))
-- linear algebra lemma
lemma spanning_set_subset_span
{R M : Type} [ring R] [add_comm_group M] [module R M] {s : set M} :
s ⊆ span R s :=
span_le.mp (le_refl _)
-- finsupp lemma
lemma finsupp.mul_sum'
{α : Type} {β : Type} {γ : Type} [_inst_1 : semiring β] [_inst_2 : semiring γ]
(b : γ) (s : α →₀ β) {f : α → β → γ} (Hf0 : ∀ a, f a 0 = 0)
(Hfa : ∀ a b₁ b₂, f a (b₁ + b₂) = f a b₁ + f a b₂) :
b * finsupp.sum s f = finsupp.sum s (λ (a : α) (c : β), b * f a c) :=
begin
apply finsupp.induction s,
{ rw [finsupp.sum_zero_index, finsupp.sum_zero_index, mul_zero] },
intros A B t Ht HB IH,
rw [finsupp.sum_add_index Hf0 _, finsupp.sum_add_index _ _, mul_add, IH,
finsupp.sum_single_index, finsupp.sum_single_index],
rw [Hf0, mul_zero],
exact Hf0 _,
exact λ a, by rw [Hf0, mul_zero],
intros a b₁ b₂, rw [Hfa, mul_add],
exact Hfa
end
-- finsupp lemma (another one)
lemma function_finsupp_sum (a : ℕ →₀ ℝ) (f : ℕ → ℕ → ℝ → ℝ) (k : ℕ)
(H0 : ∀ a b, f a b 0 = 0) (Hl : ∀ a b c₁ c₂, f a b (c₁ + c₂) = f a b c₁ + f a b c₂) :
(finsupp.sum a (λ m am, (λ n, f n m am))) k = finsupp.sum a (λ m am, f k m am) :=
begin
apply finsupp.induction a,
{ simp only [finsupp.sum_zero_index], refl },
intros t v a ht hv H,
rw [finsupp.sum_add_index, finsupp.sum_add_index,
finsupp.sum_single_index, finsupp.sum_single_index],
{ show f k t v + _ = f k t v + _, rw H },
{ exact H0 _ _ },
{ funext, apply H0 },
{ exact λ r, H0 _ _ },
{ exact Hl _ },
{ exact λ t, funext (λ x, by rw H0; refl) },
{ exact λ a b₁ b₂, funext (λ x, Hl _ _ _ _) }
end
-- show that forced_sum_actually does what we want
lemma forced_sum_val (s : ℕ → ℝ) (S : ℝ)
(H : linear_independent ℝ (λ m n : ℕ, s (n + m))) :
forced_sum s H S s S :=
begin
have Hs₁ : s ∈ set.range (λ m n : ℕ, s (n + m)) := set.mem_range.mpr ⟨0, rfl⟩,
have Hs₂ : s ∈ span ℝ (set.range (λ m n : ℕ, s (n + m))) :=
spanning_set_subset_span Hs₁,
have Hs₃ : (linear_independent.repr H) ⟨s, Hs₂⟩ = finsupp.single 0 1 :=
linear_independent.repr_eq_single H 0 ⟨s, Hs₂⟩ rfl,
use Hs₂, simp [Hs₃, finsupp.sum_single_index],
end
-- forced_sum is a sum: some lemmas for the hard part
noncomputable def shift_repr
(s t : ℕ → ℝ) (Ht : t ∈ span ℝ ((λ (m n : ℕ), s (n + m)) '' set.univ)) :
ℕ →₀ ℝ :=
have trep : _ := (finsupp.mem_span_iff_total ℝ).mp Ht,
finsupp.map_domain (λ x, x + 1) (classical.some trep)
def shift_repr_prop
(s t : ℕ → ℝ) (Ht : t ∈ span ℝ ((λ (m n : ℕ), s (n + m)) '' set.univ)) :
finsupp.sum (shift_repr s t Ht) (λ (m : ℕ) (am : ℝ) (n : ℕ), am * s (n + m)) =
λ (n : ℕ), t (n + 1) :=
have trep : _ := (finsupp.mem_span_iff_total ℝ).mp Ht,
let a : _ := classical.some trep in
let b : _ := shift_repr s t Ht in
have Ha : finsupp.sum a (λ (m : ℕ) (am : ℝ) (n : ℕ), am * s (n + m)) = t :=
classical.some_spec (classical.some_spec trep),
begin
have Hn : ∀ n, (finsupp.sum a (λ (m : ℕ) (am : ℝ) (n : ℕ), am * s (n + m))) n =
t n
:= by rw Ha; exact λ n, rfl,
have Hn' : ∀ (n : ℕ), finsupp.sum a (λ (m : ℕ) (am : ℝ), am * s (n + m)) = t n,
intro n,
rw [←(function_finsupp_sum a _ n _ _), Ha],
exact λ m n, zero_mul _,
exact λ m n q r, add_mul _ _ _,
have Hb : ∀ n, finsupp.sum b (λ m bm, bm * s (n + m)) =
finsupp.sum a (λ m am, am * s (n + 1 + m))
:= by
{ intro n,
convert @finsupp.sum_map_domain_index ℕ ℝ _ ℕ _ _ (λ x, x + 1) a _ _ _,
exact funext (λ m, funext (λ am, by rw [add_assoc, add_comm 1 m])),
exact λ a, zero_mul _,
exact λ n r s, add_mul _ _ _ },
have YAY := λ n, Hn' (n + 1),
have YAY' :
∀ (n : ℕ), finsupp.sum b (λ (m : ℕ) (am : ℝ), am * s (n + m)) = t (n + 1)
:= λ n, by rw [Hb, YAY],
have YAY'' : (λ n, finsupp.sum b (λ (m : ℕ) (am : ℝ), am * s (n + m))) =
(λ n, t (n + 1))
:= funext (λ n, YAY' n),
have primr : (λ n, finsupp.sum b (λ m am, am * s (n + m))) =
(finsupp.sum b (λ m am n, am * s (n + m)))
:= by
{ apply funext, intro n, apply (function_finsupp_sum b _ n _ _).symm,
exact λ m n, zero_mul _,
exact λ m n q r, add_mul _ _ _ },
rw primr at YAY'',
exact YAY'',
end
lemma shift_mem_span_shifts
(s t : ℕ → ℝ) (Ht : t ∈ span ℝ (set.range (λ (m n : ℕ), s (n + m)))) :
(λ n, t (n + 1)) ∈ span ℝ (set.range (λ (m n : ℕ), s (n + m))) :=
begin
rw set.image_univ.symm at Ht ⊢,
let b := shift_repr s t Ht,
have Hb := shift_repr_prop s t Ht,
exact (finsupp.mem_span_iff_total _).mpr
⟨b, ⟨(by rw finsupp.supported_univ; exact submodule.mem_top), by rw ←Hb; refl⟩⟩,
end
lemma forced_sum_shift
(s : ℕ → ℝ) (S : ℝ) (H : linear_independent ℝ (λ m n : ℕ, s (n + m))) :
∀ {t T}, (forced_sum s H S) t T → (forced_sum s H S) (λ n, t (n + 1)) (T - t 0) :=
λ t T ⟨Ht, HT⟩,
begin
use shift_mem_span_shifts s t Ht,
end
-- forced_sum is a sum
lemma is_sum_forced_sum (s : ℕ → ℝ) (S : ℝ)
(H : linear_independent ℝ (λ m n : ℕ, s (n + m))) :
is_sum (forced_sum s H S) :=
⟨ λ t T₁ T₂ ⟨Ht₁, HT₁⟩ ⟨Ht₂, HT₂⟩, by rw [HT₁, HT₂],
λ t₁ t₂ T₁ T₂ ⟨Ht₁, HT₁⟩ ⟨Ht₂, HT₂⟩,
begin
use add_mem _ Ht₁ Ht₂,
change _ = finsupp.sum ((linear_independent.repr H) ⟨t₁ + t₂, _⟩) _,
have Hadd
: (linear_independent.repr H) ⟨t₁ + t₂, _⟩ =
(linear_independent.repr H) _ + (linear_independent.repr H) _
:= (linear_independent.repr H).add ⟨t₁, Ht₁⟩ ⟨t₂, Ht₂⟩,
rw [Hadd, HT₁, HT₂, ←finsupp.sum_add_index],
{ intro a, apply zero_mul },
{ intros a b c, apply add_mul }
end,
λ t T c ⟨Ht, HT⟩,
begin
use smul_mem _ c Ht,
have Hsmul
: (linear_independent.repr H) ⟨λ n, c * t n, _⟩ =
c • (linear_independent.repr H) _
:= (linear_independent.repr H).smul c ⟨t, Ht⟩,
rw [Hsmul, finsupp.sum_smul_index], simp only [mul_assoc],
rw [←finsupp.mul_sum', HT],
exact λ i, (zero_mul _),
exact λ a b c, add_mul _ _ _,
exact λ i, (zero_mul _)
end,
-- we've already done the hard part
λ t T, forced_sum_shift s S H ⟩
theorem no_sum_of_lin_ind_shifts
(s : ℕ → ℝ) (H : linear_independent ℝ (λ m n : ℕ, s (n + m))) :
∀ S : ℝ, ¬ has_sum s S :=
λ S HS,
have X : _ := HS (forced_sum s H (S + 1)) (is_sum_forced_sum s (S + 1) H) (S + 1)
(forced_sum_val s (S + 1) H),
by linarith
-- CHALLENGE: formalise the proof here:
-- https://leanprover.zulipchat.com/#narrow/stream/116395-maths/
-- topic/Axiomatised.20summations/near/178884724
-- REQUIRES GENERATING FUNCTIONS, TAYLOR SERIES -- not currently in Lean!
theorem inv_shifts_lin_ind : linear_independent ℝ (λ m n : ℕ, 1 / (n + m + 1)) :=
sorry -- challenge left open; see the linked chat thread for an informal proof
</code></pre>
<p>Feel free to <a href="https://leanprover-community.github.io/lean-web-editor/" rel="nofollow noreferrer">play with it yourself</a>. And check out the challenge: proving that there exists a sequence that does <em>not</em> have a sum (see the informal proof <a href="https://leanprover.zulipchat.com/#narrow/stream/116395-maths/topic/Axiomatised.20summations/near/178884724" rel="nofollow noreferrer">here</a>). Actually providing an example (e.g. <span class="math-container">$1/n$</span>) may be quite hard (the proof in the chat uses generating functions, which should be hard in Lean), but proving that a sequence with linearly independent shifts has no sum is almost done -- you just need to prove that the forced sum is a sum operator.</p>
YAY'%20%3A%20%0A%20%20%20%20%E2%88%80%20%28n%20%3A%20%E2%84%95%29%2C%20finsupp.sum%20b%20%28%CE%BB%20%28m%20%3A%20%E2%84%95%29%20%28am%20%3A%20%E2%84%9D%29%2C%20am%20*%20s%20%28n%20%2B%20m%29%29%20%3D%20t%20%28n%20%2B%201%29%20%0A%20%20%3A%3D%20%CE%BB%20n%2C%20by%20rw%20%5BHb%2C%20YAY%5D%2C%0A%20%20have%20YAY''%20%3A%20%28%CE%BB%20n%2C%20finsupp.sum%20b%20%28%CE%BB%20%28m%20%3A%20%E2%84%95%29%20%28am%20%3A%20%E2%84%9D%29%2C%20am%20*%20s%20%28n%20%2B%20m%29%29%29%20%3D%20%0A%20%20%20%20%28%CE%BB%20n%2C%20t%20%28n%20%2B%201%29%29%0A%20%20%3A%3D%20funext%20%28%CE%BB%20n%2C%20YAY'%20n%29%2C%0A%20%20have%20primr%20%3A%20%28%CE%BB%20n%2C%20finsupp.sum%20b%20%28%CE%BB%20m%20am%2C%20am%20*%20s%20%28n%20%2B%20m%29%29%29%20%3D%20%0A%20%20%20%20%28finsupp.sum%20b%20%28%CE%BB%20m%20am%20n%2C%20am%20*%20s%20%28n%20%2B%20m%29%29%29%0A%20%20%3A%3D%20by%20%0A%20%20%7B%20apply%20funext%2C%20intro%20n%2C%20apply%20%28function_finsupp_sum%20b%20_%20n%20_%20_%29.symm%2C%0A%20%20%20%20exact%20%CE%BB%20m%20n%2C%20zero_mul%20_%2C%0A%20%20%20%20exact%20%CE%BB%20m%20n%20q%20r%2C%20add_mul%20_%20_%20_%20%7D%2C%0A%20%20rw%20primr%20at%20YAY''%2C%0A%20%20exact%20YAY''%2C%0Aend%0A%20%0Alemma%20shift_mem_span_shifts%20%0A%20%20%28s%20t%20%3A%20%E2%84%95%20%E2%86%92%20%E2%84%9D%29%20%28Ht%20%3A%20t%20%E2%88%88%20span%20%E2%84%9D%20%28set.range%20%28%CE%BB%20%28m%20n%20%3A%20%E2%84%95%29%2C%20s%20%28n%20%2B%20m%29%29%29%29%20%3A%0A%20%20%28%CE%BB%20n%2C%20t%20%28n%20%2B%201%29%29%20%E2%88%88%20span%20%E2%84%9D%20%28set.range%20%28%CE%BB%20%28m%20n%20%3A%20%E2%84%95%29%2C%20s%20%28n%20%2B%20m%29%29%29%20%3A%3D%0Abegin%0A%20%20rw%20set.image_univ.symm%20at%20Ht%20%E2%8A%A2%2C%0A%20%20let%20b%20%3A%3D%20shift_repr%20s%20t%20Ht%2C%0A%20%20have%20Hb%20%3A%3D%20shift_repr_prop%20s%20t%20Ht%2C%0A%20%20exact%20%28finsupp.mem_span_iff_total%20_%29.mpr%20%0A%20%20%20%20%E2%9F%A8b%2C%20%E2%9F%A8%28by%20rw%20finsupp.supported_univ%3B%20exact%20submodule.mem_top%29%2C%20by%20rw%20%E2%86%90Hb%3B%20refl%E2%9F%A9%E2%9F%A9%2C%0Aend%0A%0Alemma%20forced_sum_shift%20%0A%20%20%28s%20%3A%20%E2%84%95%20%E2%86%92%20%E2%84%9D%29%20%28S%20%3A%20%E2%84%9D%29%20%28H%20%3A%20linear_independent%20%E2%84%9D%20%28%CE%BB%20m%20n%20%3A%20%E2%84%95%2C%20s%20%28n%20%2B%20m%29%29%29%20%3A%20%0A%20%20%E2%88%80%20%7Bt%20T%7D%2C%20%28forced_sum%20s%20H%20S%29%20t%20T%20%E2%86%92%20%28forced_sum%20s%20H%20S%29%20%28%CE%BB%20n%2C%20t%20%28n%20%2B%201%29%29%20%28T%20-%20t%200%29%20%3A%3D%0A%CE%BB%20t%20T%20%E2%9F%A8Ht%2C%20HT%E2%9F%A9%2C%20%0Abegin%0A%20%20use%20shift_mem_span_shifts%20s%20t%20Ht%2C%0A%0Aend%0A" rel="nofollow noreferrer">Draft</a>)</p>
<hr>
<p><strong>OLD ANSWER:</strong></p>
<p>Here's something to get you started -- I wrote it in Lean, a formal proof-checker, because these things are tricky and I wanted to be completely sure I was being rigorous. I suppose we also need <code>sum_con</code> for convergent sums, but I'm not sure where infinite series are in the Lean math library!</p>
<pre><code>[old code redacted]
</code></pre>Mon, 21 Oct 2019 13:47:08 GMThttps://math.stackexchange.com/questions/3402550/-/3402796#3402796Abhimanyu Pallavi Sudhir2019-10-21T13:47:08ZAnswer by Abhimanyu Pallavi Sudhir for What is the motivation of uniform continuity?
https://math.stackexchange.com/questions/457008/what-is-the-motivation-of-uniform-continuity/3402176#3402176
0<p>One motivation comes from non-standard analysis, i.e. analysis with hyperreal numbers. This view is actually very useful (makes things obvious) when looking at e.g. the uniform limit theorem (the relationship to uniform convergence).</p>
<p>Here, a real function <span class="math-container">$f$</span> is continuous at <span class="math-container">$x$</span> if <span class="math-container">$\hat{f}(x+\varepsilon)-\hat{f}(x)$</span> is infinitesimal for all infinitesimal <span class="math-container">$\varepsilon$</span>, where <span class="math-container">$\hat{f}$</span> is the natural extension of <span class="math-container">$f$</span> to the hyperreals. </p>
<p>A real function is <em>uniformly continuous</em> if it is <strong>continuous for all hyperreal <span class="math-container">$x$</span></strong> -- whereas a continuous function only needs to be continuous at real values of <span class="math-container">$x$</span>. </p>
<p>So it's obvious why <span class="math-container">$x^2$</span> is not uniformly continuous -- at <span class="math-container">$\omega$</span>, it turns the infinitesimal increment <span class="math-container">$1/\omega$</span> into an appreciable increment of about <span class="math-container">$2$</span>. Or why <span class="math-container">$1/x$</span> isn't uniformly continuous on the positive reals -- at <span class="math-container">$\varepsilon$</span>, it turns increments by <span class="math-container">$\varepsilon$</span> into increments on the order of <span class="math-container">$1/\varepsilon$</span>. It also explains why <span class="math-container">$\sqrt{x}$</span> <em>is</em> uniformly continuous on the positive reals -- although it turns <span class="math-container">$\varepsilon$</span> into <span class="math-container">$\sqrt{\varepsilon}$</span>, which has "higher order" -- <em>that's still an infinitesimal</em>.</p>
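<p>(To translate the first of these into standard real-number terms -- a check I'm adding for concreteness: with <span class="math-container">$x_n = n$</span> and <span class="math-container">$y_n = n + 1/n$</span>, the inputs approach each other, but</p>
<p><span class="math-container">$$y_n^2 - x_n^2 = \left(n+\tfrac{1}{n}\right)^2 - n^2 = 2 + \frac{1}{n^2} \to 2 \ne 0,$$</span></p>
<p>which is exactly the failure of the sequential criterion described below.)</p>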
<p>In real number speak, this just says that for any two sequences such that <span class="math-container">$x_n-y_n\to 0$</span>, we have <span class="math-container">$f(x_n)-f(y_n)\to 0$</span> (which is really the "sequential" form of stating uniform continuity). By contrast, for continuity this is only required with constant sequences <span class="math-container">$y_n$</span>.</p>Mon, 21 Oct 2019 00:06:46 GMThttps://math.stackexchange.com/questions/457008/-/3402176#3402176Abhimanyu Pallavi Sudhir2019-10-21T00:06:46ZAnswer by Abhimanyu Pallavi Sudhir for If $S$ is an infinite $\sigma$ algebra on $X$ then $S$ is not countable
https://math.stackexchange.com/questions/320035/if-s-is-an-infinite-sigma-algebra-on-x-then-s-is-not-countable/3396962#3396962
0<p><strong><a href="https://thewindingnumber.blogspot.com/2019/10/sigma-fields-are-venn-diagrams.html" rel="nofollow noreferrer">Sigma algebras are just Venn diagrams.</a></strong> (with some caveats because of all the "<em>countable</em> union" business)</p>
<p>A sigma field <span class="math-container">$\mathcal{F}$</span> on <span class="math-container">$X$</span> defines an equivalence relation on <span class="math-container">$X$</span> where <span class="math-container">$x\sim y$</span> iff <span class="math-container">$\forall E\in \mathcal{F},x\in E\iff y\in E$</span>. This partition is just the partition defined by the Venn diagram -- by the little intersection regions. The important point is that there is a bijection <span class="math-container">$\mathcal{F}\leftrightarrow \mathcal{P}(X/\sim)$</span> -- this should also be obvious with the Venn diagrams. (If you want to be careful: when <span class="math-container">$\mathcal{F}$</span> is countable, each equivalence class is a <em>countable</em> intersection of elements of <span class="math-container">$\mathcal{F}$</span> and is thus itself in <span class="math-container">$\mathcal{F}$</span>, and every union of classes is a countable union -- that is what makes the bijection work, and the countable case is the only one we need here.)</p>
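<p>(A concrete instance of the bijection, added for illustration: for the "even or odd die roll" field <span class="math-container">$X=\{1,\dots,6\}$</span>, <span class="math-container">$\mathcal{F}=\{\varnothing,\{1,3,5\},\{2,4,6\},X\}$</span>, we get <span class="math-container">$X/\sim\,=\{\{1,3,5\},\{2,4,6\}\}$</span> and indeed <span class="math-container">$|\mathcal{F}|=2^{|X/\sim|}=4$</span>.)</p>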
<p>So what are the possible values for the cardinality of a power set?</p>Thu, 17 Oct 2019 01:10:41 GMThttps://math.stackexchange.com/questions/320035/-/3396962#3396962Abhimanyu Pallavi Sudhir2019-10-17T01:10:41ZSigma fields are Venn diagrams
https://thewindingnumber.blogspot.com/2019/10/sigma-fields-are-venn-diagrams.html
0The starting point for probability theory will be to note the difference between <b>outcomes</b> and <b>events</b>.<br /><br />An <b>outcome</b> of an experiment is a fundamentally non-empirical notion, about our theoretical understanding of what states a system may be in -- it is, in a sense, analogous to the "microstates" of statistical physics. The set of all <i>outcomes</i> $x$ is called the <b>sample space</b> $X$, and is the fundamental space to which we will give a probabilistic structure (we will see what this means).<br /><br />Our actual observations, the events, need not be so precise -- for example, our measurement device may not actually measure the exact sequence of heads and tails as the result of an experiment, but only the total number of heads, or something -- analogous to a "macrostate". But these measurements <i>are</i> statements about what microstates we know are possible for our system to be in -- i.e. they correspond to sets of outcomes. These sets of outcomes that we can "talk about" are called <b>events</b> $E$, and the set of all possible events is called a <b>field</b> $\mathcal{F}\subseteq 2^X$.<br /><br />For instance: if our sample space is $\{1,2,3,4,5,6\}$ and our measurement apparatus is a guy who looks at the reading and tells us if it's even or odd, then the field is $\{\varnothing, \{1,3,5\},\{2,4,6\},X\}$. We simply <i>cannot</i> talk about sets like $\{1,3\}$ or $\{1\}$. Our information just doesn't tell us anything about sets like that -- when we're told "odd", we're never hinted if the outcome was 1 or 3 or 5, so we can't even have prior probabilities -- we can't even give probabilities to whether a measurement was a 1 or a 3.<br /><br />Well, what kind of properties characterise a field? There's actually a bit of ambiguity in this -- it's clear that a field should be closed under <b>negation and <i>finite</i> unions</b> (and finite intersections follow via de Morgan) -- if you can talk about whether $P_1$ and $P_2$ are true, you can check each of them to decide if $P_1\lor P_2$ is true (and since a proposition $P$ corresponds to a set $S$ in the sense that $P$ says "one of the outcomes in $S$ is true", $\lor$ translates to $\cup$). But if you have an infinite number of $P_i$'s, can you really check each one of them so that you can say without a doubt that a field is closed under arbitrary union?<br /><br />Well, this is (at this point) really a matter of convention, but we tend to choose the convention where the field is closed under <b>negation and <i>countable</i> unions</b>. Such a field is called a <b>sigma-field</b>. We will actually see where this convention comes from (and why it is actually important) when we define probability -- in fact, it is required for the idea that one may have a uniform probability distribution on a compact set in $\mathbb{R}^n$.<br /><br /><hr /><br />A beautiful way to understand fields and sigma fields is in terms of venn diagrams -- in fact, as you will see, <b>fields are precisely a formalisation of Venn diagrams</b>. 
I was pretty amazed when I discovered this (rather simple) connection for myself, and you should be too.<br /><br />Suppose your experiment is to toss three coins, and make "partial measurements" on the results through three "measurement devices":<br /><ul><li><b>A:</b> Lights up iff the number of heads was at least 2.</li><li><b>B:</b> Lights up iff the first two coins landed heads.</li><li><b>C:</b> Lights up iff the third coin landed heads.</li></ul>What this means is that $A$ gives you the set $\{HHT, HTH, THH, HHH\}$, $B$ gives you the set $\{HHH, HHT\}$, $C$ gives you the set $\{HHH, HTH, THH, TTH\}$. Based on precisely which devices light up, you can decide the truth values of $\lnot$'s and $\lor$'s of these statements, i.e. complements and unions of these sets -- this is the point of fields, of course.<br /><br />Or we could visualise things.<br /><br /><div class="separator" style="clear: both; text-align: center;"></div><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-_GSgU8wZr3o/XaXXGandEiI/AAAAAAAAFxI/DgbNQDmdnMgiiFFaxcM9_x8h37s0F5nVQCEwYBhgL/s1600/venn%2Bsigma.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="588" data-original-width="948" height="247" src="https://1.bp.blogspot.com/-_GSgU8wZr3o/XaXXGandEiI/AAAAAAAAFxI/DgbNQDmdnMgiiFFaxcM9_x8h37s0F5nVQCEwYBhgL/s400/venn%2Bsigma.png" width="400" /></a></div>Well, the Venn diagram produces a partition of $X$ corresponding to the equivalence relation of "indistinguishability", i.e. "every event containing one outcome contains the other". The <i>field</i> consists precisely of any set one can "mark" on the Venn diagram -- i.e. unions of the elements of the partition.<br /><br />A consequence of this becomes immediately obvious:<br /><br /><b>Given a field $\mathcal{F}$ corresponding to the partition $\sim$, the following bijection holds: $\mathcal{F}\leftrightarrow 2^{X/\sim}$.</b><br /><br />Consequences of this include: the cardinalities of finite sigma fields are precisely the powers of two; there is no countably infinite sigma field.<br /><br /><hr /><br />Often, one may want to process some raw data from an experiment to obtain some processed data. For example, let $X=\{HH,HT,TH,TT\}$ and the initial measurement is of the number of heads:<br /><br />$$\begin{align}<br />\mathcal{F}=&\{\varnothing, \{TT\}, \{HT, TH\}, \{HH\},\\<br />& \{TT, HT, TH\}, \{TT, HH\}, \{HT, TH, HH\}, X \}<br />\end{align}$$<br />What kind of properties of the outcome can we talk about with certainty given the number of heads? For example, we can talk about the question "was there at least one heads?"<br /><br />$$\mathcal{G}=\{\varnothing, \{TT\}, \{HT, TH, HH\}, X\}$$<br />There are two ways to understand this "processing" or "re-measuring". One is as a function $f:\frac{X}{\sim_\mathcal{F}}\to \frac{X}{\sim_\mathcal{G}}$. Recall that:<br /><br />$$\begin{align}<br />\frac{X}{\sim_\mathcal{F}}&=\{\{TT\},\{HT,TH\},\{HH\}\}\\<br />\frac{X}{\sim_\mathcal{G}}&=\{\{TT\},\{HT,TH,HH\}\}<br />\end{align}$$<br />Any such $f$ is a permissible "<b>measurable function</b>", as long as $\sim_\mathcal{G}$ is at least as coarse a partition as $\sim_\mathcal{F}$. 
In other words, a function from $X/\sim_1$ to $(X/\sim_1)/\sim_2$ is always measurable.<br /><br />But there's another, more "natural", less weird and mathematical way to think about a re-measurement -- as a function $f:X\to Y$, where in this case $Y=\{0,1\}$ where an outcome maps to 1 if it has at least one heads, and 0 if it does not.<br /><br />But there's a catch: knowing that an event $E_Y$ in $Y$ occurred is equivalent to knowing that <i>an</i> outcome in $X$ mapping to $E_Y$ occurred -- i.e. that the event $\{x\in X\mid f(x)\in E_Y\}$ occurred. Such an event must be in the field on $X$, i.e.<br /><br />$$\forall E\in\mathcal{F}_Y,f^{-1}(E)\in\mathcal{F}_X$$<br />This is the condition for a <b>measurable function</b>, also known as a <b>random variable</b>.<br /><br /><hr /><br />One may observe certain analogies between the measurable spaces outlined above, and topology -- in the case of countable sample spaces, there actually is a correspondence. The similarity between a Venn diagram and casual drawings of a topological space is not completely superficial.<br /><br />The key idea behind fields is mathematically a notion of "distinguishability" -- if all we can measure is the number of heads, $HHTTH$ and $TTHHH$ are identical to us. For all practical purposes, we can view the sample space as the partition by this equivalence relation. They are basically the "same point".<br /><br />It's this notion that a <b>measurable function</b> seeks to encapsulate -- it is, in a sense, a <b>generalisation of a function</b> from set theory. A function <b>cannot distinguish indistinguishable points</b> -- in set theory, "indistinguishability" is just equality, the discrete partition; a measurable function <b>cannot distinguish indistinguishable points</b> -- but in measurable spaces, "indistinguishability" is given by some equivalence relation.<br /><br />Let's see this more precisely.<br /><br />Given sets with equivalence relations $(X,\sim)$, $(Y,\sim)$, we want to ensure that some function $f:X\to Y$ "lifts" to a function $f:\frac{X}{\sim}\to\frac{Y}{\sim}$ such that $f([x])=[f(x)]$. <br /><br /><b>(Exercise:</b> Show that this (i.e. this "definition" being well-defined) is equivalent to the condition $\forall E\in\mathcal{F}_Y, f^{-1}(E)\in \mathcal{F}_X$. It may help to draw out some examples.)<br /><br />Well, this expression of the condition -- as $f([x])=[f(x)]$ -- even if technically misleading (the two $f$'s aren't really the same thing) gives us the interpretation that a measurable function is one that <i>commutes with the partition</i> or <i>preserves the partition</i>.<br /><br />While homomorphisms in other settings than measurable spaces do not precisely follow the "cannot distinguish related points" notion, they do follow a generalisation where equivalence relations are replaced with other relations, operations, etc. -- in topology, a continuous function preserves limits; in group theory, a group homomorphism preserves the group operation; in linear algebra, a linear transformation preserves linear combinations; in order theory, an increasing function preserves order, etc. 
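(An aside before the closing thought: since everything here is finite, the whole story can be checked mechanically. Here is a minimal Python sketch -- my own, not part of the original post -- that builds the field generated by the three measurement devices from the Venn diagram section, and confirms the bijection $\mathcal{F}\leftrightarrow 2^{X/\sim}$.)<br /><br /><pre><code>from itertools import product, combinations

# Sample space: all sequences of three coin tosses.
X = [''.join(p) for p in product('HT', repeat=3)]

# The three measurement devices, as predicates on outcomes.
devices = [
    lambda x: x.count('H') >= 2,  # A: at least two heads
    lambda x: x[:2] == 'HH',      # B: first two coins heads
    lambda x: x[2] == 'H',        # C: third coin heads
]

# Two outcomes are indistinguishable iff every device agrees on them.
def signature(x):
    return tuple(d(x) for d in devices)

# The partition X/~, i.e. the cells of the Venn diagram.
cells = {}
for x in X:
    cells.setdefault(signature(x), set()).add(x)
cells = list(cells.values())

# The field: all possible unions of cells.
field = set()
for r in range(len(cells) + 1):
    for chosen in combinations(cells, r):
        field.add(frozenset().union(*chosen))

# Closure under complements and (finite) unions.
assert all(frozenset(set(X) - E) in field for E in field)
assert all(E | F in field for E in field for F in field)

# The bijection F &lt;-&gt; 2^(X/~): exactly 2^(number of cells) events.
print(len(cells), len(field), 2 ** len(cells))  # prints: 5 32 32
</code></pre><br />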
In any case, a homomorphism is a function that does not "break" relationships by creating a "finer" relationship on the target space.measurable functionprobabilityprobability theoryrandom variablessigma fieldvenn diagramThu, 17 Oct 2019 00:57:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-3521631282555558856Abhimanyu Pallavi Sudhir2019-10-17T00:57:00ZComment by Abhimanyu Pallavi Sudhir on Schrödinger equation derivation and Diffusion equation
https://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217#145217
But in any case, this is going off on a tangent -- my point is that the claim in your answer that "the Schrodinger equation is a wave equation" is not a useful one, especially for this question, which explicitly asks about the formal relation between the diffusion equation and the Schrodinger equation. The observation that the Schrodinger equation admits sinusoidal solutions is not a particularly enlightening one, nor is it very revealing to point out that the classical diffusion equation doesn't.Sun, 13 Oct 2019 08:35:01 GMThttps://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217?cid=1144660#145217Abhimanyu Pallavi Sudhir2019-10-13T08:35:01ZComment by Abhimanyu Pallavi Sudhir on Schrödinger equation derivation and Diffusion equation
https://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217#145217
Sorry, but your definition makes no sense -- e.g. linear combinations of such solutions are also waves. But I don't deny that you <i>can</i> make a definition in your sense -- I just deny that it's very conceptually useful. It may be conceptually useful to classify the "higher-order derivatives in $x$" cases as waves if they are to be understood as "corrections" of an ordinary wave of sorts, I don't know. You can replace my definition with $\partial^\mu\partial_\mu\boldsymbol{\Psi}=0$ if you like.Sun, 13 Oct 2019 08:31:30 GMThttps://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217?cid=1144657#145217Abhimanyu Pallavi Sudhir2019-10-13T08:31:30ZComment by Abhimanyu Pallavi Sudhir on Schrödinger equation derivation and Diffusion equation
https://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217#145217
What are you talking about? What's your definition of a wave? You can invent an obfuscated definition of a "wave" under which the Schrodinger equation is a "wave equation", but it would still be <i>conceptually different</i> from the wave equation $\partial^2\psi/\partial x^2=\partial^2\psi/\partial t^2$. Physically <i>fundamentally different</i> equations ought to be called different names, even if some specific solutions appear similar to you -- this isn't "arbitrary".Sat, 12 Oct 2019 16:00:14 GMThttps://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217?cid=1144458#145217Abhimanyu Pallavi Sudhir2019-10-12T16:00:14ZNon-surjectivity of exponential map: how to understand?
https://math.stackexchange.com/questions/3383612/non-surjectivity-of-exponential-map-how-to-understand
0<p>I'm given to understand the exponential map is not generally surjective -- the standard example is <span class="math-container">$\mathrm{SL}(\mathbb{R}^2)$</span> <a href="https://math.stackexchange.com/questions/643216/non-surjectivity-of-the-exponential-map-to-sl2-mathbbc">[ 1 ]</a>. </p>
<p>I can clearly see why this is so in the non-connected case -- the tangent space is a tangent space to the connected component alone, so its image must be contained in the connected component. <strong>I do not see why the map isn't surjective in the connected case.</strong></p>
<p>I also don't see why the map is then <em>again</em> surjective in the compact case -- <a href="https://en.wikipedia.org/wiki/Maximal_torus#Properties" rel="nofollow noreferrer">wikipedia</a> claims that this is a special case of "the exponential map is surjective if every element is contained in a maximal torus". Is this right? Is there a good way to understand why this is true?</p>
<hr>
<p>Note that I am not looking for counter-examples: I'm aware of them. I'm looking for intuition -- perhaps a clever look at what the image of the exponential map actually looks like in the non-surjective case (how it "misses" some of the points in the group). </p>
<p>As an analogy, if asked to explain smooth non-analytic functions, it would be more instructive (than simply providing the example of <span class="math-container">$e^{-1/x}$</span>) to explain that a function may vanish faster than all polynomials near zero -- and provide the construction as <span class="math-container">$1/f(1/x)$</span> from any function <span class="math-container">$f$</span> that grows faster than all polynomials as <span class="math-container">$x\to\infty$</span>.</p>
<p>(See <a href="https://math.stackexchange.com/questions/3368390/developing-intuition-for-lie-groups-and-lie-algebras">here</a> for more examples of the kind of intuition I'm looking for, within the context of Lie theory.)</p>lie-groupslie-algebrasMon, 07 Oct 2019 00:43:08 GMThttps://math.stackexchange.com/q/3383612Abhimanyu Pallavi Sudhir2019-10-07T00:43:08ZThe Killing form; factorising non-Abelian Lie groups
https://thewindingnumber.blogspot.com/2019/10/the-killing-form-factorising-non.html
0It could be fun to try and define a "dot product" on a Lie algebra.<br /><br />You know, you might've already realised that the cross product is a Lie bracket of sorts -- you know, given its antisymmetry and the whole $a^\mu b^\nu - a^\nu b^\mu$ representation of the wedge product and all that. It's a short exercise to verify that the Lie algebra $\mathfrak{so}(3)$ of $SO(3)$ is the algebra of skew-symmetric matrices, and with the Lie bracket $XY-YX$ is isomorphic to $\mathbb{R}^3$ with the cross product.<br /><br />Well, the dot product on $\mathbb{R}^3$ has an interesting connection to $SO(3)$ -- it is precisely the form that is invariant under the action of $SO(3)$. Well, but that's $SO(3)$ acting on $\mathbb{R}^3$ -- what is that action in the notation of $\mathfrak{so}(3)$? As it turns out (and you can work this out), it is <b>precisely the adjoint map</b> $\mathrm{Ad}_gX:=gXg^{-1}$ which corresponds to this "rotating $X$ by $g$". It's not really that unexpected, if you ask me -- conjugation is always the natural way to transform matrices in linear algebra when vectors are multiplied on the left.<br /><br />So the "dot product" is an $\mathrm{Ad}$-invariant bilinear form. In fact, adding a symmetry requirement allows us to just bother with norms (as a symmetric inner product can be determined from the norm, through the cosine rule). Conjugation basically allows you to determine the "<b>contours</b>" of this norm or inner product. The question is: can we determine the bilinear form -- up to scaling -- just from "<b>$\mathrm{Ad}$-invariant symmetric bilinear form</b>" alone?<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-5hjfT6Pw29w/XYyu1CJnKtI/AAAAAAAAFvQ/Fq0LC1Bgqv0QBcW8I3_5DabJkyLOX254QCLcBGAsYHQ/s1600/conjugation%2Bcontour.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="461" data-original-width="601" height="244" src="https://1.bp.blogspot.com/-5hjfT6Pw29w/XYyu1CJnKtI/AAAAAAAAFvQ/Fq0LC1Bgqv0QBcW8I3_5DabJkyLOX254QCLcBGAsYHQ/s320/conjugation%2Bcontour.png" width="320" /></a></div><br />This is equivalent to asking "<i>does the orbit of an arbitrary non-zero $X$ under conjugation by $G$ span $\mathfrak{g}$?</i>" (so that the norm of that $X$ would suffice to determine all norms -- do you see why?) Well, this is equivalent to asking "<i>is $X$ contained in no non-trivial proper ideal?</i>" (prove that these are equivalent! -- the span of the orbit is exactly the smallest ideal containing $X$), and for this to hold for every non-zero $X$ is to ask "<i>does $\mathfrak{g}$ have no non-trivial proper ideals?</i>" (do you see why?)<br /><br />A Lie algebra without non-trivial proper ideals is called a <b>simple Lie algebra</b>. Our demonstration above shows that a simple Lie algebra has, up to scaling, a unique $\mathrm{Ad}$-invariant symmetric bilinear form, determined by the value of $\langle X, X\rangle$ for any single non-zero $X$.<br /><br /><div class="twn-furtherinsight">Even before we actually derive what this form must look like, we can derive one important consequence of automorphism invariance: $\langle X, [X, Y]\rangle = 0$ (prove it!), i.e. the tangent to an automorphism curve is perpendicular to the position vector at every point. 
The understanding of the group as acting as a "rotation group" on its Lie algebra in the adjoint representation really makes sense!</div><br /><div class="twn-beg">Someone tell me if they know how one may "derive" the trace-form formula from this characterisation rather than pulling it out of the blue and <em>then</em> proving it is the unique $\mathrm{Ad}$-invariant symmetric bilinear form. Here's something I started to write:<br /><br />Here's an idea for the base length (i.e. to define the scaling): $X$ has length 1 iff the length of $[X,Y]$ equals the length of $Y$ for all $Y$ perpendicular to $X$ -- equivalently: $\forall V\in\mathfrak{g}, |[X,[X,V]]|=|[X,V]|$. We need to check that this condition is well-defined, i.e. that:<br /><ol><li>Given an $X$, $|[X,[X,U]]|=|[X,U]|$ for some $U$ not a multiple of $X$ implies that $|[X,[X,V]]|=|[X,V]|$ for all $V$.</li><li>$X$ satisfying $|[X,[X,V]]|=|[X,V]|$ implies that all conjugates $gXg^{-1}$ of it satisfy it too. This is trivial from considering $V=gV'g^{-1}$ (since the identity is true for all $V$).</li></ol>Is the first one even true outside $\mathfrak{so}(3)$ -- for all simple Lie algebras?</div><br />One may come up with the idea of defining a form $\langle X, Y\rangle = \mathrm{tr}[X,[Y,\cdot]]$, i.e. $\mathrm{tr}(\mathrm{ad}_X\,\mathrm{ad}_Y)$ (example of some weak motivation -- the vector triple product $x\times(x\times v)$ has as eigenvectors the vectors $v$ perpendicular to $x$ and the eigenvalues depend on the length of $x$) and check that this is indeed an $\mathrm{Ad}$-invariant symmetric bilinear form, and is thus unique up to scaling for simple Lie algebras. This form is called the <b>Killing form</b>. (As a sanity check on $\mathfrak{so}(3)$, with a basis $L_i$ satisfying $[L_i,L_j]=\epsilon_{ijk}L_k$, this evaluates to $\langle L_i,L_j\rangle=-2\delta_{ij}$ -- a negative multiple of the ordinary dot product, as hoped.)<br /><br /><hr /><br /><b>Factorisation of Lie groups</b><br /><br />We have seen the classification of connected Abelian Lie groups: they are products of circles and lines. We wonder if such a classification is possible for more general Lie groups.<br /><br />The natural way to "factorise" groups is by taking quotients over normal subgroups -- we wonder if this means that all Lie groups can be written as direct products of <b>simple Lie groups</b> (groups that don't have a nontrivial connected normal subgroup -- can you see why "connected" matters?). Well, not really -- the quotients need not be subgroups at all, after all. Instead, the "factorisation" takes the form of what is known as a <b>group extension</b>. A group which <i>is</i> such a direct product is called a <b>reductive Lie group</b> -- and its Lie algebra is the direct sum of simple Lie algebras, or a <b>reductive Lie algebra</b>.<br /><br /><div class="twn-pitfall">It is more conventional in the literature to define a simple Lie algebra excluding the one-dimensional/abelian case. In this definition, direct sums of simple Lie algebras are <b>semisimple Lie algebras</b>, and reductive Lie algebras are direct sums of semisimple and abelian Lie algebras.</div><br />TBC: Cartan's criterion, solvability, nilpotency<br />killing formlie grouplie theorynormal subgroupTue, 01 Oct 2019 23:05:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-7359488371150159165Abhimanyu Pallavi Sudhir2019-10-01T23:05:00ZIntuition for the Killing form as "automorphism-invariant symmetric bilinear form"
https://math.stackexchange.com/questions/3369402/intuition-for-the-killing-form-as-automorphism-invariant-symmetric-bilinear-for
1<p>Here's my idea for motivating the Killing form: the <em>only notion</em> we have of magnitudes and angles in a Lie algebra comes from conjugations, as they can be understood to be the "natural" transformations on the Lie algebra. So it's natural to ask for a norm map that satisfies <span class="math-container">$\forall g\in G$</span>,</p>
<p><span class="math-container">$$\|X\|=\|\mathrm{Ad}_gX\|$$</span>
And hopefully we can then use symmetry to pin down a bilinear form. The idea is that we can already compare two vectors in the same line, and this condition creates <em>contours</em> that are precisely the <em>orbits of conjugation</em>, which allows us to compare vectors in the same ideal. </p>
<p>So in a simple Lie algebra, the bilinear form would then be completely determined up to scaling.</p>
<p>Am I on a sensible track? I guess what I'm asking is:</p>
<ol>
<li>Am I right to believe that "bilinear, symmetric and automorphism-invariant" uniquely determine the Killing form (up to scaling) for simple Lie algebras? </li>
<li>If so, how can I prove the <span class="math-container">$\mathrm{tr}(\mathrm{ad}(x)\mathrm{ad}(y))$</span> formula from this characterisation?</li>
<li>How might I extend this intuition to non-simple Lie algebras? I think I can "see" why the "semisimple equivalent to non-degenerate" property is true, though.</li>
</ol>
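<p>(To spell out a step I'm otherwise taking for granted: the trace form at least <em>has</em> the required invariance, since <span class="math-container">$\mathrm{ad}(\mathrm{Ad}_gx)=\mathrm{Ad}_g\,\mathrm{ad}(x)\,\mathrm{Ad}_g^{-1}$</span> gives</p>
<p><span class="math-container">$$\mathrm{tr}\left(\mathrm{ad}(\mathrm{Ad}_gx)\,\mathrm{ad}(\mathrm{Ad}_gy)\right)=\mathrm{tr}\left(\mathrm{Ad}_g\,\mathrm{ad}(x)\,\mathrm{ad}(y)\,\mathrm{Ad}_g^{-1}\right)=\mathrm{tr}\left(\mathrm{ad}(x)\,\mathrm{ad}(y)\right),$$</span></p>
<p>so the content of question 1 is really the uniqueness direction.)</p>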
<p>(See <a href="https://math.stackexchange.com/questions/3368390/developing-intuition-for-lie-groups-and-lie-algebras">here</a> for examples of the kind of intuition, motivation I'm looking for. Based on advice there, I'm splitting my "intuition for Lie algebras" questions.)</p>lie-groupslie-algebrasautomorphism-groupWed, 25 Sep 2019 13:10:24 GMThttps://math.stackexchange.com/q/3369402Abhimanyu Pallavi Sudhir2019-09-25T13:10:24ZDeveloping intuition for Lie groups and Lie algebras
https://math.stackexchange.com/questions/3368390/developing-intuition-for-lie-groups-and-lie-algebras
6<p><strong>Background:</strong> Until now, I've been able to <em>motivate</em> everything I've learned in mathematics, and develop some solid insights for everything. But I learned some Lie theory this summer, and while I have a good grasp of the elementary aspects and strong intuition for <em>some</em> or even <em>most</em> of what I've learned, there are some "holes" in my understanding of Lie algebras.</p>
<p>To give you an idea of what I'm looking for, I'll list some examples of things in Lie theory I <strong>DO understand</strong> and am able to motivate:</p>
<ul>
<li>The notion of a <strong>Lie group</strong> itself -- the idea comes from wanting to generalise what we know about discrete groups to more complicated contexts where the "manifold" structure of the group allows us to do so. Examples: <strong>compactness</strong> generalises finiteness, <strong>one-parameter groups</strong> generalise cyclic groups, etc.</li>
<li>The <strong>exponential map</strong> -- For one-parameter groups to generalise cyclic groups, we need a "generalisation" of the group power to allow "real-index powers". The general way to define a <strong>real power</strong> is through the exponential map. Well, this real power stuff isn't <em>always</em> defined as it turns out (you need the exponential map to be surjective), but our motivation does explain why it "makes sense" that the <strong>exponential map is surjective in the connected abelian case</strong> (because then, the Lie algebra is basically a co-ordinate system on the Lie group -- I'm aware exponential co-ordinates are defined in more generality, but it's certainly more well-behaved here).</li>
<li>The <strong>Lie algebra</strong>, i.e. "why is the logarithm/parameter space the tangent space?" We'd like to generalise the notion of a generator to a Lie group -- consider e.g. the circle group on the complex plane. An element near the identity generates a cyclic group, and as the element goes nearer to the identity -- as it becomes an <strong>infinitesimal generator</strong>, the cyclic group it approaches the entire group. Well, an element close to the identity is of the form <span class="math-container">$1+\varepsilon t X$</span>, and generates a group element as <span class="math-container">$(1+\varepsilon tX)^{1/\varepsilon}=e^{tX}$</span>. This is also intuition for the compound-interest limit, and for Euler's identity.</li>
<li>The <strong>Lie bracket</strong> is the second-derivative of the commutator curve <span class="math-container">$\gamma(t)=e^{tX}e^{tY}e^{-tX}e^{-tY}$</span>. Well, it's also the derivative of <span class="math-container">$\gamma(\sqrt{t})$</span>, which proves <strong>closure under the Lie bracket</strong> (I've spelled out the second-order expansion just after this list). </li>
<li>The real justification for the Lie bracket, however, comes from the fundamental fact that <span class="math-container">$\mathrm{ad}:\mathfrak{g}\to\mathrm{Der}(\mathfrak{g}):=X\to[X,\cdot]$</span> is the differential of the adjoint map <span class="math-container">$\mathrm{Ad}:G\to\mathrm{Aut}(G):=g\mapsto\lambda x, gxg^{-1}$</span>, which is a group homomorphism. In particular, the preservation of the Lie Bracket by the differential of a group homomorphism is precisely the <strong>Jacobi identity</strong>: <span class="math-container">$\mathrm{ad}([x,y])=[\mathrm{ad}(x),\mathrm{ad}(y)]$</span>. The basic point is that we are trying to reduce Lie group problems to Lie algebra ones as much as possible, and conjugation is an important idea that we'd like to see the map induced by on the Lie algebra -- we are seeing the result of the obvious fact that <span class="math-container">$T\mathrm{Aut}(G)\subseteq\mathrm{Der}(TG)$</span> (and also <span class="math-container">$T\mathrm{Aut}(M)=\mathrm{Der}(M)$</span> -- the fact that the automorphisms of an object form a group is equivalent to the derivations on an object forming a Lie algebra). Some more examples of the "study the Lie algebra approach":
<ul>
<li>The uniqueness of the determinant as a map from <span class="math-container">$G\to \mathbb{R}-\{0\}$</span>.</li>
<li>An <strong>ideal</strong> is a subalgebra "induced" on the Lie algebra by a normal subgroup of the Lie group. This immediately provides the interpretation as "kernels of Lie algebra homomorphisms" as well as the condition <span class="math-container">$[\mathfrak{g},\mathfrak{i}]\subseteq\mathfrak{i}$</span>. </li>
</ul></li>
<li>The idea behind the manifold-structure of a Lie group is that the flows are produced by left-multiplication by group elements, so those must be homeomorphisms. This motivation can be confirmed through various topological consequences, e.g.
<ul>
<li><strong>A neighbourhood of the identity generates the connected component.</strong> The idea behind the proof is this: if an entire open neighbourhood of the identity is contained in the subgroup, it means you can "flow in any direction" from the subgroup -- but to bring these flows to an arbitrary point of the manifold, you need left-multiplication to be a homeomorphism. </li>
<li><strong>The identity component is a (normal) subgroup.</strong> Because left-multiplication and inversion are continuous, they cannot tear the connected component apart (generalised "intermediate value theorem"), so it is closed under multiplication.</li>
<li><strong>Compact Lie groups</strong> -- How can a Lie group possibly "close in on itself"? Surely we keep "extending" an open neighbourhood <span class="math-container">$W$</span> of the identity by observing that <span class="math-container">$xW$</span> must be in the subgroup? The idea is that these translations of <span class="math-container">$W$</span> form an <strong>open cover of the group, if it has a finite subcover</strong>, then it makes sense for the group to close in on itself. By playing around with different open neighbourhoods <span class="math-container">$W$</span> and taking some suitable unions, one can see that this is equivalent to the condition that every open cover has a finite subcover, i.e. the group is compact.</li>
</ul></li>
<li><strong>Characterisation of Abelian Lie groups</strong> -- "Compact Connected Abelian Lie Group is a torus" is a generalisation of "finite Abelian group is a product of cyclic groups" -- the idea is that the exponential map "wraps" the Lie algebra around into the Lie group -- this just gives the quotient of the Lie algebra by the kernel of the exponential map, which is topologically <span class="math-container">$\mathbb{R}^n/\mathbb{Z}^n$</span>. The characterisation of a connected Abelian Lie group as a cylinder <span class="math-container">$\mathbb{R}^{n+k}/\mathbb{Z}^k$</span> follows similarly.</li>
</ul>
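<p>(The expansion promised above -- just the standard second-order Baker-Campbell-Hausdorff bookkeeping: <span class="math-container">$e^{tX}e^{tY}=\exp\left(t(X+Y)+\tfrac{t^2}{2}[X,Y]+O(t^3)\right)$</span>, and multiplying by the corresponding expansion of <span class="math-container">$e^{-tX}e^{-tY}$</span> gives</p>
<p><span class="math-container">$$\gamma(t)=e^{tX}e^{tY}e^{-tX}e^{-tY}=\exp\left(t^2[X,Y]+O(t^3)\right),$$</span></p>
<p>so <span class="math-container">$\gamma(\sqrt{t})$</span> is a curve through the identity with velocity <span class="math-container">$[X,Y]$</span>.)</p>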
<p>With that said, here are some things I <strong>DON'T (completely) understand</strong>, and would like to have a similar level of understanding for:</p>
<ul>
<li>Why is the <strong>structure of a Lie group characterised by its second-order structure</strong>? I know that this follows from the <strong>BCH formula</strong>, the local diffeomorphism nature of the exponential map and the fact that an open neighbourhood of 1 generates the group, but I have no intuition at all why the BCH formula "should" be true.</li>
<li>What's the deal with <strong>simply-connected groups</strong>? I can certainly see why the Lie algebra cannot detect disconnectedness in a group -- I had expected that it could not detect compactness either (whether the group closes in on itself eventually), so the statement of Lie's third theorem would be "every Lie algebra has a corresponding unique connected, compact Lie group". Instead, the statement is "every Lie algebra has a corresponding <em>simply connected</em> Lie group".</li>
<li><strong>Non-surjectivity of the exponential map</strong> even in the connected case -- I'm not asking for counter-examples, I'm asking "what exactly goes wrong in groups like <span class="math-container">$SL_\mathbb{R}(2)$</span>?", perhaps a hint about "what does the image of the exponential map look like?" (as an analogy, I would explain smooth functions failing to be analytic as "they are flatter than every polynomial at 0, and can be constructed as <span class="math-container">$1/f(1/x)$</span> where <span class="math-container">$f$</span> is any function that grows faster than every polynomial")</li>
<li><strong>Surjectivity when every element is contained in a maximal torus</strong> -- I read this <a href="https://en.wikipedia.org/wiki/Maximal_torus#Properties" rel="nofollow noreferrer">here</a> as a generalisation of "the exponential map is surjective in the connected compact case". Even if the generalisation isn't true, is there an intuitive way to understand why compactness makes the problem in the previous point go away?</li>
<li><strong>Characterisation of non-Abelian Lie groups</strong> -- Tell me if my understanding of simple and semisimple Lie algebras makes sense -- we want to classify non-Abelian Lie groups as products like we do Abelian Lie groups, and the only way to do so is as "semidirect products of simple Lie groups and Abelian Lie groups". A <strong>reductive</strong> Lie group is basically when this semidirect product is a direct product, and a <strong>semi-simple</strong> one is a reductive Lie group where there are no Abelian groups in the product. Is this right?</li>
<li><strong>Various abstract algebraic things</strong> -- I have no idea how to interpret things like nilpotent and solvable Lie algebras, radicals and so on in the context of Lie theory. </li>
<li>At first when I heard of the <strong>Killing form</strong>, I presumed it would be some "natural" way to define a dot product on the Lie algebra -- but I honestly don't see how it is natural. Is it the <em>only</em> dot product that is invariant under Lie algebra automorphisms? </li>
</ul>
<p>I've thought very hard about the theory, but I just can't seem to figure out how to fill these "holes". <strong>Am I missing some important central insight into Lie theory that are crucial to some of these questions?</strong></p>lie-groupslie-algebrasintuitionmatrix-exponentialTue, 24 Sep 2019 17:22:34 GMThttps://math.stackexchange.com/q/3368390Abhimanyu Pallavi Sudhir2019-09-24T17:22:34ZLie group topology
https://thewindingnumber.blogspot.com/2019/09/lie-group-topology.html
0I'll assume you have a basic understanding of general topology -- if not, consult the <a href="https://thewindingnumber.blogspot.com/p/2204.html">topology articles here</a>. Most of the abstract stuff and "weird" cases are not really important, because it is easy to see that Lie groups are manifolds.<br /><br />We need to be careful while studying the topology of Lie groups, because we already have an intuitive picture of a Lie group, and we need to be careful to prove all the things we just "believe" to be true.<br /><br />The main point of the topology of a Lie group is that the group elements define the "flows" on the manifold. What this means is that <b>left-multiplication is a homeomorphism</b>, and it's not absurd to say that <b>inversion is a homeomorphism</b>, because it represents a "reflection" of the manifold. That these conditions make sense is confirmed by looking at the proofs of the following "obvious" facts.<br /><br /><b>(1) In a connected group, a neighbourhood of the identity generates the entire group,</b> i.e. $H\le G\land H\in N(1)\implies H=G$ for connected $G$.<br /><br />Let's think about why this is true. Why does $H$ need to be a neighbourhood -- why must it contain an open set containing the identity? Suppose instead we just knew it contained a set $Q$ that looked like this:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-GKetHN8XvRw/XYZe0gFDizI/AAAAAAAAFuo/TNWiIWG-kno_P6VXRJvktYb9_SlRGZjtgCLcBGAsYHQ/s1600/not%2Ba%2Bneighbourhood.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="378" data-original-width="346" height="320" src="https://1.bp.blogspot.com/-GKetHN8XvRw/XYZe0gFDizI/AAAAAAAAFuo/TNWiIWG-kno_P6VXRJvktYb9_SlRGZjtgCLcBGAsYHQ/s320/not%2Ba%2Bneighbourhood.png" width="291" /></a></div><br />Well, $H$ still contains the orange point, but we cannot say it contains the purple point, because it's perfectly happy not containing it -- it's not like we have some vertical element in the Lie group that you could multiply into some point of $Q$ to get the purple point. But instead if $Q$ was an open neighbourhood of the identity:<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-MnEQeC8NSoQ/XYZkr1agIEI/AAAAAAAAFu0/v0TxF35QuT4Ctb1FVdxM4y76Rw4-nzWqgCLcBGAsYHQ/s1600/neighbourhood%2Ba%2Byes.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="393" data-original-width="370" height="320" src="https://1.bp.blogspot.com/-MnEQeC8NSoQ/XYZkr1agIEI/AAAAAAAAFu0/v0TxF35QuT4Ctb1FVdxM4y76Rw4-nzWqgCLcBGAsYHQ/s320/neighbourhood%2Ba%2Byes.png" width="301" /></a></div>Then the purple point has to be in $H$, because $Q$ contains flows in "all directions" on the group. To actually prove that every point will be contained in $H$ -- well, we know that the point is (will eventually be) that $H$ is the connected component of $G$ (and since $G$ is connected, $H=G$) -- let's just show that $H$ is both open and closed, i.e. nothing in $H$ touches its exterior, and nothing in its exterior touches $H$. Here's the proof:<br /><ul><li><b>Nothing in $H$ touches its exterior -- </b>Suppose $\exists x\in H, x\in\mathrm{cl}(H')$. Then the open set $xQ$ contains a point in $H'$ -- but $xQ\subseteq H$, a contradiction.</li><li><b>Nothing outside $H$ touches it</b> -- Suppose $\exists x\in H', x\in\mathrm{cl}(H)$. 
Then $xQ$ contains a point $h\in H$; writing $h=xq$ with $q\in Q\subseteq H$, we get $x=hq^{-1}\in H$ -- a contradiction.</li></ul>We're really just formalising the notion of "translating $Q$ to its edges to extend $H$ further and further". The key fact we've used here is, of course, that left-multiplication is a homeomorphism, so $xQ$ is still an open set.<br /><b><br /></b><b>(2) The connected component of the identity is a subgroup.</b><br /><b><br /></b> The idea is that taking two elements $g,h$ of the connected component, their product should remain in the connected component. Once again, this follows from the <b>continuity of left-multiplication</b> -- considering the action of left-multiplication by $g$ on the connected component, its continuity implies that the image must remain connected.<br /><b><br /></b><b>(3) If a subgroup contains a neighbourhood of the identity, it contains the connected component of the identity.</b><br /><div><b><br /></b> Corollary to (1) and (2).<br /><b><br /></b></div><div><b>(4) The connected component of the identity is a <i>normal</i> subgroup.</b><br /><b><br /></b> Conjugation is a continuous map.<br /><b><br /></b><b>(5) Open subgroups are closed.</b><br /><b><br /></b> Corollary to (3). Alternate proof: the complement is the union of some cosets, which are open sets too. A weaker theorem holds for closed sets -- closed subgroups of finite index are open.<br /><br />What this means: any open subgroup is a union of connected components.<br /><br /><b>(6) Intuition for compact Lie groups</b><br /><b><br /></b>How can a Lie group possibly "close in on itself"? Surely we keep "extending" an open neighbourhood $W$ of the identity by observing that $xW$ must be in the subgroup? The idea is that these translations of $W$ form an <b>open cover of the group; if it has a finite subcover</b>, then it makes sense for the group to close in on itself. By playing around with different open neighbourhoods $W$ and taking some suitable unions, one can see that this is equivalent to the condition that every open cover has a finite subcover, i.e. the group is compact.<br /><br /><div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-cHRhxfTndyo/XYcEqmYku_I/AAAAAAAAFvA/S4UjQafMaX81u0p1emQk8-6kgy8yEWbXgCLcBGAsYHQ/s1600/open%2Bcover%2B-%2Bselect.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="393" data-original-width="408" height="308" src="https://1.bp.blogspot.com/-cHRhxfTndyo/XYcEqmYku_I/AAAAAAAAFvA/S4UjQafMaX81u0p1emQk8-6kgy8yEWbXgCLcBGAsYHQ/s320/open%2Bcover%2B-%2Bselect.png" width="320" /></a></div><br /></div><div><b>(7) A compact, connected Abelian Lie group is a torus.</b></div><div><b><br /></b> This is a generalisation of "a finite Abelian group is the direct product of cyclic groups".<br /><br />The idea behind the proof is that in the Abelian case, the exponential map is a homomorphism from the Lie algebra to the Lie group, but the Lie algebra cannot detect compactness in the Lie group -- the kernel of the exponential map can. 
We know from our study of the exponential map that it has a discrete kernel, and in the Abelian case is surjective -- thus the Lie group is homeomorphic to $\mathbb{R}^n/\mathbb{Z}^n$, which is an $n$-torus.<br /><b><br /></b></div><div><b>(8) A connected Abelian Lie group is a cylinder (direct product of a torus and an affine space)</b><br /><b><br /></b>Analogous to above, except $\mathbb{R}^m/\mathbb{Z}^n$ where $m\ge n$.</div><div></div>compactnessconnected componentconnectednessgroup theorygroupslie group topologylie groupslie theorynormal subgroupopen setstopologyMon, 23 Sep 2019 15:55:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-7539770682824114785Abhimanyu Pallavi Sudhir2019-09-23T15:55:00ZAnswer by Abhimanyu Pallavi Sudhir for How to develop intuition in topology?
https://math.stackexchange.com/questions/576593/how-to-develop-intuition-in-topology/3364031#3364031
0<p>Let's do an example: let's say we want to know when limits are unique in a topological space. Here's the proof of the theorem in a metric space:</p>
<blockquote>
<p>Let <span class="math-container">$(a_n)$</span> be a sequence with limits <span class="math-container">$L_1$</span> and <span class="math-container">$L_2$</span>. Then <span class="math-container">$a_n$</span> is eventually within every neighbourhood of <span class="math-container">$L_1$</span> and every neighbourhood of <span class="math-container">$L_2$</span>. If <span class="math-container">$L_1\ne L_2$</span>, we can choose the neighbourhoods to be disjoint. Contradiction.</p>
</blockquote>
<p>This is completely equivalent to the proof you've probably seen, but I've phrased everything in terms of neighbourhoods, which are fundamentally topological concepts. The only fact we used is the existence of disjoint neighbourhoods of distinct points. Limits being unique is pretty important, so we call a space where distinct points allow disjoint neighbourhoods a <strong>Hausdorff space</strong> or <strong>T2 space</strong>.</p>
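<p>(To see what goes wrong without this axiom -- a standard example I'm adding: in the cofinite topology on an infinite set, any sequence of distinct points is eventually outside every finite set, and therefore converges to <em>every</em> point at once.)</p>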
<p>(It's also worth thinking about why the generalisation goes for limits of <em>nets</em>, rather than limits of <em>sequences</em>)</p>
<p>The trick I'm suggesting is to "work backwards" from theorems you can tell are important (as opposed to some inane statement about open sets): (1) start with an important theorem in analysis, (2) go through its proof, (3) work out what axioms you need and simplify them to a form involving just open sets.</p>
<p>Some more examples of such generalisations:</p>
<ul>
<li>Every open neighbourhood of a limit point of <span class="math-container">$S$</span> contains an infinite number of points in <span class="math-container">$S$</span>. (T1 space)</li>
<li>Finite sets are closed. (T1 space)</li>
<li>Continuous extension theorem. (T4 space)</li>
<li>Bolzano-Weierstrass theorem (compact sets)</li>
<li>Intermediate value theorem (connected sets)</li>
</ul>
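<p>For instance, running the procedure on "finite sets are closed": <span class="math-container">$\{y\}$</span> is closed iff its complement is open iff every <span class="math-container">$x\ne y$</span> has an open neighbourhood omitting <span class="math-container">$y$</span> -- and that last condition, phrased purely in terms of open sets, is precisely the T1 axiom.</p>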
<p>You may find this series of articles I wrote illuminating to this end: <a href="https://thewindingnumber.blogspot.com/p/2204.html" rel="nofollow noreferrer">https://thewindingnumber.blogspot.com/p/2204.html</a></p>Sat, 21 Sep 2019 04:32:49 GMThttps://math.stackexchange.com/questions/576593/-/3364031#3364031Abhimanyu Pallavi Sudhir2019-09-21T04:32:49ZMixed states II: decoherence; important measures of purity and entropy
https://thewindingnumber.blogspot.com/2019/09/mixed-states-ii-decoherence-important.html
0<b>Decoherence</b><br /><br />At the end of this section, you should be able to:<br /><ul><li>appreciate why the density matrix is really a great way of expressing states, even for pure states (they uniquely determine the dynamics of the system, without any "overall phase", etc.)</li><li>develop an intuition for measurement, even "inadvertent" measurement</li><li>understand on a somewhat high level how classical physics arises as a limit of quantum physics</li><li>hang out with Wigner's friend</li><li>admit that complex phases matter in quantum mechanics and link them to interference</li></ul><br />Let's talk about <b>measurement</b>.<br /><br />Suppose we have a system $A$ that we wish to measure under an operator whose eigenvectors are $|0\rangle_A$ and $|1\rangle_A$. The idea is that we have some measurement apparatus $B$, and their original combined state evolves from something like:<br /><br />$$|\psi\rangle_{AB}=(\lambda|0\rangle_A+\mu|1\rangle_A)\otimes|0\rangle_B$$<br />To the entangled state:<br /><br />$$|\psi\rangle_{AB} = \lambda|0\rangle_A\otimes|0\rangle_B+\mu|1\rangle_A\otimes|1\rangle_B$$<br />Then observing the apparatus is sufficient to observe the system. The idea is that ultimately, the observer himself (or his "knowledge") is the apparatus, and he entangles with the system to measure it.<br /><br />Well, we know that often, we end up seeing things we didn't really want to. After all, physics does not care about your wants and preferences. In fact, in pretty much any situation, information about the system <i>will</i> <b>leak</b> out into the surroundings in some specific way. For example, Schrodinger's cat leaks information about the life of the cat by making the environment smelly, i.e. the state evolves from:<br /><br />$$|\psi\rangle_{AB}=(\lambda|\mathrm{alive}\rangle+\mu|\mathrm{dead}\rangle)\otimes|\mathrm{clean}\rangle$$<br />To the entangled state:<br /><br />$$|\psi\rangle_{AB}=\lambda|\mathrm{alive}\rangle\otimes|\mathrm{clean}\rangle+\mu|\mathrm{dead}\rangle\otimes|\mathrm{smelly}\rangle$$<br />What this means is that the density matrix of the cat evolves as:<br /><br />$$\left[ {\begin{array}{*{20}{c}}{{{\left| \lambda \right|}^2}}&{\lambda \bar \mu }\\{\mu \bar \lambda }&{{{\left| \mu \right|}^2}}\end{array}} \right] \mapsto \left[ {\begin{array}{*{20}{c}}{{{\left| \lambda \right|}^2}}&0\\0&{{{\left| \mu \right|}^2}}\end{array}} \right]$$<br />(Check that I got the right transpose -- tracing out the environment, $\mathrm{Tr}_B\left(|\mathrm{alive}\rangle\langle\mathrm{dead}|\otimes|\mathrm{clean}\rangle\langle\mathrm{smelly}|\right)=\langle\mathrm{smelly}|\mathrm{clean}\rangle\,|\mathrm{alive}\rangle\langle\mathrm{dead}|$, so each off-diagonal term picks up a factor of $\langle\mathrm{smelly}|\mathrm{clean}\rangle$, which vanishes for perfectly distinguishable environment states.) OK, what happened here?<br /><br />Recall that the probabilities of collapsing to $|0\rangle$ and $|1\rangle$ are determined purely by the elements on the diagonal -- the off-diagonal elements, or the <b>coherences</b>, are only relevant for collapsing on to some combination of $|0\rangle$ and $|1\rangle$. What's going on here is that when the environment entangles with the system, it has "kinda" already observed it -- like Wigner's friend. It "knows" that the system isn't in $|0\rangle+|1\rangle$, and even though you haven't observed the environment yet (you haven't smelled it), you know how the combined state has evolved, and the probability has become a <b>classical probability</b>, because the quantum stuff has already been observed -- by the environment.<br /><br /><b>The idea behind decoherence is the same idea that ensures that the Wigner's friend scenario is consistent.</b><br /><b><br /></b> "Eventually", "all" the information about the system will leak into the environment -- i.e. 
in principle, we should be able to determine anything about the system from measuring the environment, and our uncertainty about the system arises entirely from our <b>completely classical uncertainty</b> about the environment -- so the density matrix becomes a classical one, i.e. a <b>diagonal one</b> (the off-diagonal terms go to zero).<br /><br />What basis is it diagonal in? In the basis corresponding to the states of the environment -- i.e. if the environment can be in states $|0\rangle_B$ and $|1\rangle_B$, then the states of the system that precisely induce these states of the environment form the preferred basis. This is often called the "<b>environmentally selected basis</b>".<br /><br />This process is called <b>decoherence</b>. You may also hear the terms <b>pointer states</b> (for the preferred basis), <b>einselection</b> (<i>environmentally induced selection</i> of the preferred basis), or <b>Quantum Darwinism</b> (what the heck?) -- but they're really synonymous. We'll just use the fancy words when they're grammatically useful.<br /><br />Well, the following may not be completely clear, but you should at least be able to appreciate that it is true: the off-diagonal terms <i>approach</i> zero, rather than hit it. Why? Although the system leaks information into the surroundings, we aren't really certain about what we're inferring about the system from the environment -- a live cat may be smelly too, etc. So the pointer states are not exactly orthogonal, either.<br /><br />The precise behavior of decoherence depends on the Hamiltonian of the system -- e.g. predicting the generation of the smelliness of the air from the state of the cat based on what's going on microscopically is something that could be done in principle by solving a really complicated Schrodinger equation. You can, given a Hamiltonian, at least make order-of-magnitude estimates of how much time it takes, and at how macroscopic a scale (i.e. with how many degrees of freedom), for the system to begin to behave in a way that can be described as classical.<br /><br /><div class="twn-pitfall">Decoherence does <em>not</em> remove the need for wavefunction collapse -- one still needs the observer to note an observation, collapsing the system.</div><br />TBC: purity, entropy, correlation functionsdecoherencedensity matrixphysicsquantum mechanicstensor productwigner's friendMon, 16 Sep 2019 18:47:00 GMTnoreply@blogger.comtag:blogger.com,1999:blog-3214648607996839529.post-415557332515658087Abhimanyu Pallavi Sudhir2019-09-16T18:47:00Z
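Here's a quick numerical illustration of the cat's density matrix losing its coherences (a sketch of my own, assuming numpy; the amplitudes and the two-state environment are made up for the example):<br /><br /><pre>
import numpy as np

lam, mu = 0.6, 0.8  # made-up amplitudes with lam^2 + mu^2 = 1
alive, dead = np.array([1.0, 0.0]), np.array([0.0, 1.0])
clean, smelly = np.array([1.0, 0.0]), np.array([0.0, 1.0])

# before leaking: (lam|alive> + mu|dead>) (x) |clean>
psi_before = np.kron(lam * alive + mu * dead, clean)
# after leaking: lam|alive>(x)|clean> + mu|dead>(x)|smelly>
psi_after = lam * np.kron(alive, clean) + mu * np.kron(dead, smelly)

def cat_density_matrix(psi):
    # full density matrix, reshaped to (cat, env, cat', env'); trace out env
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
    return np.trace(rho, axis1=1, axis2=3)

print(cat_density_matrix(psi_before))  # coherences lam*mu off the diagonal
print(cat_density_matrix(psi_after))   # diagonal: diag(lam^2, mu^2)
</pre><br />The first matrix has the coherences $\lambda\bar\mu$ off the diagonal; the second is diagonal, i.e. a classical probability distribution.
Answer by Abhimanyu Pallavi Sudhir for Adjoint map is Lie homomorphism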
https://math.stackexchange.com/questions/1339289/adjoint-map-is-lie-homomorphism/3355482#3355482
0<p><span class="math-container">$\mathrm{ad}_X$</span> is not a Lie Homomorphism, but <span class="math-container">$\mathrm{ad}$</span> is. We can define a map <span class="math-container">$\mathrm{Ad}:G\to\mathrm{Aut}(G):=\lambda x.\lambda y.\ xyx^{-1}$</span>, whose differential is then <span class="math-container">$\mathrm{ad}:TG\to T\mathrm{Aut}(G):=\lambda X.\lambda Y.\ [X,Y]$</span>. The homomorphism property on this map is then precisely the Jacobi identity.</p>Fri, 13 Sep 2019 17:49:45 GMThttps://math.stackexchange.com/questions/1339289/-/3355482#3355482Abhimanyu Pallavi Sudhir2019-09-13T17:49:45ZAnswer by Abhimanyu Pallavi Sudhir for A subset of a compact set is compact?
https://math.stackexchange.com/questions/212181/a-subset-of-a-compact-set-is-compact/3346082#3346082
0<p>Here's an alternate proof (for closed subsets, obviously): any net on <span class="math-container">$S$</span> is a net on <span class="math-container">$T$</span> and thus has a convergent subnet whose limit is in <span class="math-container">$T$</span> -- but its limit must also be in <span class="math-container">$S$</span> because <span class="math-container">$S$</span> is closed.</p>
<p>It's a little tricky because the notions of closed sets and compact sets are intuitively very similar.</p>Fri, 06 Sep 2019 09:17:08 GMThttps://math.stackexchange.com/questions/212181/-/3346082#3346082Abhimanyu Pallavi Sudhir2019-09-06T09:17:08ZComment by Abhimanyu Pallavi Sudhir on Why is a topology made up of 'open' sets?
https://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets/19173#19173
Ah wait, no it's fine -- your axiom 4 implies the converse of axiom 3, and preservation of binary unions leads to $A\subseteq B\Rightarrow \mathrm{cl}(A)\subseteq\mathrm{cl}(B)$.Wed, 21 Aug 2019 20:16:16 GMThttps://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets/19173?cid=847795#19173Abhimanyu Pallavi Sudhir2019-08-21T20:16:16ZComment by Abhimanyu Pallavi Sudhir on Why is a topology made up of 'open' sets?
https://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets/19173#19173
Wait -- so in a pre-topology, it's no longer true that "if $x$ touches $A\subset B$, then $x$ touches $B$"?Wed, 21 Aug 2019 07:32:27 GMThttps://mathoverflow.net/questions/19152/why-is-a-topology-made-up-of-open-sets/19173?cid=847627#19173Abhimanyu Pallavi Sudhir2019-08-21T07:32:27ZComment by Abhimanyu Pallavi Sudhir on Schrödinger equation derivation and Diffusion equation
https://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217#145217
What? Under the standard definition of a "wave equation", it must be second-order in time, which the Schrodinger equation is not. It may allow wave-like solutions, but it's fundamentally a (Wick rotated) diffusion equation.Tue, 20 Aug 2019 03:53:25 GMThttps://physics.stackexchange.com/questions/144832/schr%c3%b6dinger-equation-derivation-and-diffusion-equation/145217?cid=1121832#145217Abhimanyu Pallavi Sudhir2019-08-20T03:53:25ZIs a (finite) group determined by its subgroups?
https://math.stackexchange.com/questions/3323761/is-a-finite-group-determined-by-its-subgroups
6<p><strong>Motivation</strong></p>
<p>I think of the "structure" of a topological space <span class="math-container">$X$</span> as being the limit operator on functions <span class="math-container">$I\to X$</span> where <span class="math-container">$I$</span> could be the natural numbers or another topological space -- in this sense, a topological homomorphism (continuous function) <span class="math-container">$f$</span> is a function that commutes with the limit operation <span class="math-container">$f(\lim x)=\lim f(x)$</span>, similar to how a group homomorphism commutes with group multiplication <span class="math-container">$f(\mathrm{mult}(x,y))=\mathrm{mult}(f(x),f(y))$</span> and a linear transformation commutes with linear combination.</p>
<p>Nonetheless, it can be shown that this structure can be determined uniquely by the set of open sets on <span class="math-container">$X$</span>. One may also understand these open sets to be the "sub-(topological spaces)" of <span class="math-container">$X$</span> as the topology of <span class="math-container">$X$</span> is inherited by them exactly (well, the closed sets are also a "dual" kind of sub-topological spaces). </p>
<p>Similarly, given a set <span class="math-container">$V$</span> and a list of subsets that we call "subspaces" (which would have to satisfy some properties), one can determine the vector space up to isomorphism (i.e. we can find its dimension).</p>
<hr>
<p>I wonder if something like this can be done with groups. Given a set <span class="math-container">$G$</span> and a list of subsets we call its "subgroups", can we determine the group up to isomorphism? At least for finite sets?</p>
<p>Example: given the set <span class="math-container">$\{0, 1, 2, 3\}$</span>, we'd be given the following "subgroup structure" on it: <span class="math-container">$\{\{0\},\{0,2\},\{0,1,2,3\}\}$</span>, and the group being described is <span class="math-container">$C_4$</span>. The positions of 1 and 3 aren't determined, but the group is still determined up to isomorphism.</p>group-theoryfinite-groupsThu, 15 Aug 2019 05:43:06 GMThttps://math.stackexchange.com/q/3323761Abhimanyu Pallavi Sudhir2019-08-15T05:43:06ZComment by Abhimanyu Pallavi Sudhir on How is it possible that consciousness-causes-collapse interpretations of QM are not falsified by the Quantum Zeno effect?
https://physics.stackexchange.com/questions/495125/how-is-it-possible-that-consciousness-causes-collapse-interpretations-of-qm-are/495137#495137
@Wolphramjonny Just write down the state vector for the combined system of the (not yet measured) "non-conscious" apparatus and the system being measured. This represents the "knowledge of the system according to an external observer". As you can see, metaphysical questions about the "knowledge of the apparatus" are not involved in the expression.Mon, 05 Aug 2019 07:46:28 GMThttps://physics.stackexchange.com/questions/495125/how-is-it-possible-that-consciousness-causes-collapse-interpretations-of-qm-are/495137?cid=1115461#495137Abhimanyu Pallavi Sudhir2019-08-05T07:46:28ZAnswer by Abhimanyu Pallavi Sudhir for How is it possible that consciousness-causes-collapse interpretations of QM are not falsified by the Quantum Zeno effect?
https://physics.stackexchange.com/questions/495125/how-is-it-possible-that-consciousness-causes-collapse-interpretations-of-qm-are/495137#495137
1<p>If you accept positivism, it becomes obvious that "consciousness causes collapse" cannot possibly be distinguished experimentally from the Copenhagen interpretation as long as you accept that <em>you</em> are conscious. </p>
<p>This "interpretation" makes claims about the knowledge of <em>another</em> (non-conscious) observer, claiming that it does not alter the state of other systems. But this is fundamentally a metaphysical claim -- it's like asking "what if my red is your blue and my blue is your red?" Whatever your metaphysical belief on whether a non-conscious observer "already" caused a wavefunction collapse, your knowledge only changes when you observe the system -- even if that system is the non-conscious observer itself.</p>Sun, 04 Aug 2019 05:52:21 GMThttps://physics.stackexchange.com/questions/495125/-/495137#495137Abhimanyu Pallavi Sudhir2019-08-04T05:52:21ZComment by Abhimanyu Pallavi Sudhir on Was "Crook's algorithm" for Sudoku really only developed in the 21st century?
https://puzzling.stackexchange.com/questions/86805/was-crooks-algorithm-for-sudoku-really-only-developed-in-the-21st-century
@ArnaudMortier How so?Fri, 02 Aug 2019 16:54:03 GMThttps://puzzling.stackexchange.com/questions/86805/was-crooks-algorithm-for-sudoku-really-only-developed-in-the-21st-century?cid=252361Abhimanyu Pallavi Sudhir2019-08-02T16:54:03ZComment by Abhimanyu Pallavi Sudhir on Was "Crook's algorithm" for Sudoku really only developed in the 21st century?
https://puzzling.stackexchange.com/questions/86805/was-crooks-algorithm-for-sudoku-really-only-developed-in-the-21st-century
@GarethMcCaughan Note that (1) and (2) are special cases of (3) for K = 1, K = total number of empty squares - 1.Fri, 02 Aug 2019 16:53:28 GMThttps://puzzling.stackexchange.com/questions/86805/was-crooks-algorithm-for-sudoku-really-only-developed-in-the-21st-century?cid=252360Abhimanyu Pallavi Sudhir2019-08-02T16:53:28ZComment by Abhimanyu Pallavi Sudhir on Why would ReLU work as an activation function at all?
https://stats.stackexchange.com/questions/297947/why-would-relu-work-as-an-activation-function-at-all/298159#298159
Except the standard proof of the universal approximation theorem relies on the boundedness of the activation functions. There are <a href="https://arxiv.org/pdf/1505.03654.pdf" rel="nofollow noreferrer">extensions</a>, but the fact that ReLU works is not obvious to me.Fri, 02 Aug 2019 16:50:58 GMThttps://stats.stackexchange.com/questions/297947/why-would-relu-work-as-an-activation-function-at-all/298159?cid=784263#298159Abhimanyu Pallavi Sudhir2019-08-02T16:50:58ZWas "Crook's algorithm" for Sudoku really only developed in the 21st century?
https://puzzling.stackexchange.com/questions/86805/was-crooks-algorithm-for-sudoku-really-only-developed-in-the-21st-century
4<p>The following algorithm for simplifying (and very often completely solving) Sudoku puzzles:</p>
<ol>
<li>Label each cell with the set of all possible values it could take.</li>
<li>Pick a row/column/block and for a value of <span class="math-container">$K\in[1, 9)$</span>, look for "<span class="math-container">$K$</span>-partnerships" -- <span class="math-container">$K$</span>-tuples of cells that satisfy "the union of labels of each cell in the tuple has cardinality <span class="math-container">$K$</span>". Call the "union of labels of each cell in a partnership" the "banned set" of the partnership.</li>
<li>For each such partnership, for all cells in that row/column/block <em>not</em> in the partnership remove any element in its label that are in the banned set of the partnership.</li>
<li>Repeat Steps 2-3 for all values of <span class="math-container">$K$</span> and all rows, columns and blocks.</li>
</ol>
<p>(i.e. "if you have three cells labeled as (4, 5), (4, 7), (4, 5, 7), no other cell in that row can be 4, 5 or 7" -- a minimal code sketch of this step follows) </p>
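<p>Here is a minimal Python sketch of steps 2-3, for concreteness (my own illustration; representing a unit -- a row, column or block -- as a list of nine candidate sets is just an assumption of the sketch):</p>
<pre>
from itertools import combinations

def eliminate_partnerships(unit):
    # unit: a row/column/block as a list of 9 candidate sets, modified in place.
    # For every K-tuple whose labels' union has exactly K values (a "partnership"),
    # remove that union (the "banned set") from every cell outside the tuple.
    for K in range(1, 9):
        for tup in combinations(range(9), K):
            banned = set().union(*(unit[i] for i in tup))
            if len(banned) == K:
                for i in set(range(9)) - set(tup):
                    unit[i] -= banned

# the example from above: (4, 5), (4, 7), (4, 5, 7) form a 3-partnership
unit = [{4, 5}, {4, 7}, {4, 5, 7}, {1, 4, 5, 9}, {2, 5, 7}] \
       + [set(range(1, 10)) for _ in range(4)]
eliminate_partnerships(unit)
print(unit[3], unit[4])   # {1, 9} {2} -- 4, 5 and 7 eliminated elsewhere
</pre>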
<p>... has always seemed obvious to me, but I'm now informed from some sources that it has a name called "Crook's algorithm":</p>
<ul>
<li><a href="http://pi.math.cornell.edu/~mec/Summer2009/meerkamp/Site/Solving_any_Sudoku_II.html" rel="nofollow noreferrer">http://pi.math.cornell.edu/~mec/Summer2009/meerkamp/Site/Solving_any_Sudoku_II.html</a></li>
<li><a href="https://www.ams.org/notices/200904/tx090400460p.pdf" rel="nofollow noreferrer">https://www.ams.org/notices/200904/tx090400460p.pdf</a></li>
</ul>
<p>The latter (by Crook) attributes the algorithm to texts written in 2005 and 2006. Are these really the earliest references? I'm pretty sure this must have been well-known for decades, but I'm not sure what to search for to find older references.</p>sudokupuzzle-historyFri, 02 Aug 2019 07:02:08 GMThttps://puzzling.stackexchange.com/q/86805Abhimanyu Pallavi Sudhir2019-08-02T07:02:08ZComment by Abhimanyu Pallavi Sudhir on Is there go up line character? (Opposite of \n)
https://stackoverflow.com/questions/11474391/is-there-go-up-line-character-opposite-of-n/11474509#11474509
Doesn't work with Windows/Python 3.Thu, 01 Aug 2019 19:49:57 GMThttps://stackoverflow.com/questions/11474391/is-there-go-up-line-character-opposite-of-n/11474509?cid=101123890#11474509Abhimanyu Pallavi Sudhir2019-08-01T19:49:57ZComment by Abhimanyu Pallavi Sudhir on Output to the same line overwriting previous output?
https://stackoverflow.com/questions/4897359/output-to-the-same-line-overwriting-previous-output/27023394#27023394
Using <code>end = '\r'</code> instead fixes the problem in Python 3.Wed, 31 Jul 2019 19:10:36 GMThttps://stackoverflow.com/questions/4897359/output-to-the-same-line-overwriting-previous-output/27023394?cid=101088967#27023394Abhimanyu Pallavi Sudhir2019-07-31T19:10:36ZAnswer by Abhimanyu Pallavi Sudhir for Neural Networks vs. Polynomial Regression/Other techniques for curve fitting?
https://math.stackexchange.com/questions/2901209/neural-networks-vs-polynomial-regression-other-techniques-for-curve-fitting/3308606#3308606
0<p>Polynomial regression is usually just the wrong Bayesian prior. You need functions with highly "non-local" effects which require high-degree polynomials, but polynomial regression gives zero prior probability to high-degree polynomials. As it turns out, neural networks happen to provide a reasonably good prior (perhaps that's why our brains work that way -- if they even do).</p>Tue, 30 Jul 2019 18:31:31 GMThttps://math.stackexchange.com/questions/2901209/-/3308606#3308606Abhimanyu Pallavi Sudhir2019-07-30T18:31:31ZOrthochronous indefinite orthogonal group $O^+(m, n)$ forms a group
https://physics.stackexchange.com/questions/494260/orthochronous-indefinite-orthogonal-group-om-n-forms-a-group
1<p>My question is based on Qmechanic's answer <a href="https://physics.stackexchange.com/a/36425/23119">here</a> which proves that <span class="math-container">$O^+(m, 1)$</span> forms a group -- that if two Lorentz transformations have positive time-time co-ordinate, so does their product. The key is that with the Lorentz transformation written in the form:</p>
<p><span class="math-container">$$\Lambda = \left[\begin{array}{cc}\Lambda_a & \Lambda_b^t \cr \Lambda_c &\Lambda_R \end{array} \right].$$</span></p>
<p>We can show that <span class="math-container">$|(\Lambda\tilde{\Lambda})_a-\Lambda_a\tilde{\Lambda}_a|\le \sqrt{(\Lambda_a^2-1)(\tilde{\Lambda}_a^2-1)}$</span> which implies that positive <span class="math-container">$\Lambda_a,\tilde{\Lambda_a}$</span> imply positive <span class="math-container">$(\Lambda\tilde{\Lambda})_a$</span>.</p>
<p>Well, the trouble is that this uses the Cauchy-Schwarz inequality in Step 6, and therefore doesn't work for the general case of <span class="math-container">$O^+(m, n)$</span>. How would one generalise the proof to <strong>prove the orthochronous indefinite orthogonal group <span class="math-container">$O^+(m, n)$</span> is a group</strong>?</p>
<p>Here's what I've tried so far: defining <span class="math-container">$O^{+}(m,n)$</span> as the subset of <span class="math-container">$O(m,n)$</span> with elements <span class="math-container">$\Lambda$</span> which satisfy <span class="math-container">$\det(\Lambda_a)>0$</span> (and in fact <span class="math-container">$\ge 1$</span>), </p>
<ol>
<li><p>As before, <span class="math-container">$(\Lambda\tilde{\Lambda})_a=\Lambda_a\tilde{\Lambda}_a+\Lambda_b^T\tilde{\Lambda}_c$</span>. </p></li>
<li><p>From multiplying out <span class="math-container">$\Lambda^T\eta \Lambda=\eta$</span> and <span class="math-container">$\Lambda\eta \Lambda^T=\eta$</span>, we see that <span class="math-container">$\Lambda_a^2-\Lambda_c^T\Lambda_c=\Lambda_a^2-\Lambda_b^T\Lambda_b=I$</span> and analogously for <span class="math-container">$\tilde{\Lambda}$</span>.</p></li>
<li><p>So <span class="math-container">$\det\left((\Lambda\tilde{\Lambda})_a-\Lambda_a\tilde{\Lambda}_a\right)=\det\left(\Lambda_b^T\tilde{\Lambda}_c\right)=\sqrt{\det\left(\Lambda_a^2-I\right)\det\left(\tilde{\Lambda}_a^2-I\right)}$</span>.</p></li>
</ol>
<p>Well, I'm not sure how to proceed at this point. Does <span class="math-container">$\det(X-PQ)=\det((P^2-I)(Q^2-I))^{1/2}$</span> imply that <span class="math-container">$\det P\ge 1\land\det Q\ge 1\Rightarrow \det X>0$</span> <em>in general</em>?</p>
<p>The <a href="https://physics.stackexchange.com/a/36388/23119">"topological proof" from Ron Maimon</a> does not work either, as the orbit of the unit time vector is <a href="https://math.stackexchange.com/questions/2022156/how-many-sheets-can-a-hyperboloid-have-in-n-dimensions">connected when <span class="math-container">$n>1$</span></a>. I suspect that a more powerful technique than looking at the orbit of the unit time vector would be to look at the topology of the Lie group itself -- but I'm not that familiar with this stuff.</p>special-relativitymathematical-physicsgroup-theorylorentz-symmetrytopologyMon, 29 Jul 2019 20:30:31 GMThttps://physics.stackexchange.com/q/494260Abhimanyu Pallavi Sudhir2019-07-29T20:30:31ZAnswer by Abhimanyu Pallavi Sudhir for What is the frequency of white light?
https://physics.stackexchange.com/questions/494081/what-is-the-frequency-of-white-light/494085#494085
1<p>It doesn't have a specific frequency -- it has a frequency distribution.</p>
<p>You don't even need to go as far as white light -- just consider a "camel hump" wave, like <span class="math-container">$\sin ax+\sin bx$</span> -- what's the frequency of a light wave that looks like this? The answer is that its frequency isn't a fixed value, but a distribution, taking values <span class="math-container">$a/2\pi$</span> and <span class="math-container">$b/2\pi$</span> with half probability each. In general, if you have some function <span class="math-container">$f(x)$</span>, the way to obtain this <strong>frequency distribution</strong> is to decompose <span class="math-container">$f(x)$</span> in terms of sinusoids -- this is precisely the <strong>Fourier transform</strong>.</p>
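<p>As a quick sanity check (my own sketch, assuming numpy; not part of the original argument), the discrete Fourier transform of such a camel-hump wave exhibits exactly two peaks:</p>
<pre>
import numpy as np

# a camel-hump wave sin(ax) + sin(bx); a, b chosen so the peaks land on the grid
a, b = 2 * np.pi * 0.8, 2 * np.pi * 1.9
t = np.linspace(0, 200, 20000, endpoint=False)
signal = np.sin(a * t) + np.sin(b * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(t), d=t[1] - t[0])

# the two dominant peaks sit at a/2pi = 0.8 and b/2pi = 1.9
print(sorted(freqs[np.argsort(spectrum)[-2:]]))
</pre>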
<p>In the specific case you mentioned, position and momentum ("frequency") are "Fourier duals" of each other. If you have a sinusoid (by which I mean <span class="math-container">$e^{2\pi i\xi x}$</span>), you have complete uncertainty about the position, but have a precise value for the momentum: <span class="math-container">$h\xi$</span>. On the other hand, if you had localised your position completely (to a Dirac delta function), you would find a sinusoid in momentum-space.</p>
<p>These distributions are called the "wavefunctions" in position and momentum basis respectively, and this duality is the "uncertainty principle" -- read more about this in my <a href="https://thewindingnumber.blogspot.com/p/2103.html" rel="nofollow noreferrer">quantum mechanics articles here</a> (specifically article 4). In the specific case of white light, white light isn't really a well-defined concept in physics -- it has to do with human eyesight and what visible light entails, but nonetheless the frequency of white light is indeed a distribution with non-zero variance.</p>Sun, 28 Jul 2019 18:23:58 GMThttps://physics.stackexchange.com/questions/494081/-/494085#494085Abhimanyu Pallavi Sudhir2019-07-28T18:23:58ZHow does a covariance intensity function measure clustering?
https://stats.stackexchange.com/questions/418046/how-does-a-covariance-intensity-function-measure-clustering
0<p>I was taught in a class on spatial statistics that the covariance intensity function (defined below) measured clustering and inhibition in a point process, but isn't used because good test statistics for it don't exist.</p>
<p><span class="math-container">$$c(\mathbf{x},\mathbf{y})=\lim_{|d\mathbf{x}|\to0,|d\mathbf{y}|\to 0} \frac{\mathrm{Cov}\{N(d\mathbf{x}), N(d\mathbf{y})\}}{|d\mathbf{x}||d\mathbf{y}|}$$</span></p>
<p>Where <span class="math-container">$\mathbf{x}, \mathbf{y}$</span> are positions on the domain and <span class="math-container">$d\mathbf{x}, d\mathbf{y}$</span> are regions around them with areas given by <span class="math-container">$|d\mathbf{x}|, |d\mathbf{y}|$</span>, and <span class="math-container">$N(R)$</span> represents the random variable corresponding to the number of events in a region.</p>
<p>But I can't see how this measures non-homogeneity at all -- if one starts with a process that is described by an intensity function -- any intensity function -- this function should necessarily be zero, as the existence of an intensity function means a point turning up at point <span class="math-container">$\mathbf{x}$</span> is independent of a point turning up at point <span class="math-container">$\mathbf{y}$</span>. And you can certainly have intensity functions that exhibit clustering.</p>
<p>The only way that this function can be non-zero as I see it is if there are correlations within a realisation, e.g. if "everything is clustered to one side" and "everything is clustered to the other side" are the possibilities, or something -- i.e. if you don't have an intensity function at all, but rather some sort of "entangled state".</p>
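<p>Here is a quick simulation of the point (my own sketch, assuming numpy; the intensity <span class="math-container">$200x$</span> is an arbitrary choice): an inhomogeneous Poisson process whose realisations look clustered, but where counts in disjoint regions have zero covariance:</p>
<pre>
import numpy as np
rng = np.random.default_rng(0)

# inhomogeneous Poisson process on [0, 1] with intensity 200x:
# realisations pile up near 1, which certainly looks like clustering
def two_counts():
    n = rng.poisson(100)                 # total mass: integral of 200x dx = 100
    pts = np.sqrt(rng.uniform(size=n))   # density 2x via the inverse CDF
    return np.histogram(pts, bins=[0.8, 0.9, 1.0])[0]

samples = np.array([two_counts() for _ in range(20000)])
print(np.cov(samples.T))   # off-diagonal is approximately 0, despite clustering
</pre>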
<p>What am I missing?</p>correlationcovariancespatialpoint-processThu, 18 Jul 2019 10:50:54 GMThttps://stats.stackexchange.com/q/418046Abhimanyu Pallavi Sudhir2019-07-18T10:50:54ZAnswer by Abhimanyu Pallavi Sudhir for How does non-commutativity lead to uncertainty?
https://physics.stackexchange.com/questions/10362/how-does-non-commutativity-lead-to-uncertainty/491378#491378
0<p>When first learning about wavefunction collapse, I was surprised by the idea that the wavefunction would just <em>become</em> an eigenstate of the observable -- losing all other components of the state vector. Well, it's not as bad as you'd first expect, because the Hilbert space is really big. </p>
<p>But if two operators <em>do not have a common eigenbasis</em> -- i.e. if they don't commute, you do "lose information" about one observable when measuring the other one. This is precisely what the uncertainty principle codifies.</p>Sat, 13 Jul 2019 10:56:00 GMThttps://physics.stackexchange.com/questions/10362/-/491378#491378Abhimanyu Pallavi Sudhir2019-07-13T10:56:00ZAnswer by Abhimanyu Pallavi Sudhir for Should it be obvious that independent quantum states are composed by taking the tensor product?
https://physics.stackexchange.com/questions/54896/should-it-be-obvious-that-independent-quantum-states-are-composed-by-taking-the/489138#489138
0<p>I think it is pretty obvious. Correct me if my argument is wrong somewhere.</p>
<p>In the classical case, if you want to describe e.g. the x-positions of two particles, you have a two-dimensional phase space to show the possible states -- and two is the sum of one and one. But a quantum state space is very different -- every point in the <span class="math-container">$x_1$</span> "axis" is a basis vector of its own, and likewise for <span class="math-container">$x_2$</span> -- the state vectors we speak of are vectors in the Hilbert space, and can be shown as distributions mapped on this <span class="math-container">$(x_1,x_2)$</span> plane, representing them as superpositions of these basis vectors.</p>
<p>So it makes perfect sense that the dimension of the product space is the product of the dimensions and not the sum. The total number of points in the <span class="math-container">$(x_1,x_2)$</span> plane -- which is the dimension of this new Hilbert space -- is the product of the number of points on the <span class="math-container">$x_1$</span> axis and the <span class="math-container">$x_2$</span> axis.</p>
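<p>Numerically, with numpy's Kronecker product standing in for <span class="math-container">$\otimes$</span> (a sketch of my own; the states are made up):</p>
<pre>
import numpy as np

phi = np.array([0.6, 0.8])                    # a state of a 2-state system
psi = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)  # a state of a 3-state system

combined = np.kron(phi, psi)   # the tensor product state
print(combined.shape)          # (6,) -- dimensions multiply: 2 * 3 = 6
# ... and the component magnitudes multiply too:
print(np.allclose(np.abs(combined) ** 2,
                  np.outer(np.abs(phi) ** 2, np.abs(psi) ** 2).ravel()))
</pre>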
<p>It's clear that the <em>probabilities</em> are multiplicative. Given states <span class="math-container">$|\phi\rangle=\sum p(x)|x\rangle$</span> and <span class="math-container">$|\psi\rangle=\sum q(y)|y\rangle$</span> in bases <span class="math-container">$|x\rangle$</span> and <span class="math-container">$|y\rangle$</span>, it is clear that the <em>magnitudes</em> of the components of the combined state <span class="math-container">$$|\phi\rangle\otimes|\psi\rangle=\sum r(x,y)|x\rangle\otimes|y\rangle$$</span> (where <span class="math-container">$\otimes$</span> is the desired product representing composition) are <span class="math-container">$|r(x,y)|^2=|p(x)q(y)|^2$</span>. But -- as you ask in your question -- how do we know that <span class="math-container">$r(x,y)=p(x)q(y)$</span>?</p>
<p>The idea is quite simple, though -- suppose we have a state like </p>
<p><span class="math-container">$$\left( {\frac{1}{{\sqrt 2 }}\left| x \right\rangle + \frac{1}{{\sqrt 2 }}\left| y \right\rangle } \right) \otimes \left| z \right\rangle = \frac{u}{{\sqrt 2 }}\left| x \right\rangle \otimes \left| z \right\rangle + \frac{v}{{\sqrt 2 }}\left| y \right\rangle \otimes \left| z \right\rangle $$</span></p>
<p>Because we're representing two independent systems, we can just observe the first system, collapsing it to <span class="math-container">$|x\rangle$</span>: then the combined state based on the left-hand-side is collapsed to <span class="math-container">$|x\rangle\otimes|z\rangle$</span>. But based on the right-hand-side, this is <span class="math-container">$u|x\rangle\otimes|z\rangle$</span>, and thus <span class="math-container">$u=1$</span> and similarly for <span class="math-container">$v$</span>.</p>Mon, 01 Jul 2019 09:43:29 GMThttps://physics.stackexchange.com/questions/54896/-/489138#489138Abhimanyu Pallavi Sudhir2019-07-01T09:43:29ZAnswer by Abhimanyu Pallavi Sudhir for Closure under Lie Bracket -- how is $c''(0)$ promoted to $(f\circ c)''(0)$
https://math.stackexchange.com/questions/3264841/closure-under-lie-bracket-how-is-c0-promoted-to-f-circ-c0/3268462#3268462
0<p>Ah, never mind, it's obvious -- I just got confused because it's not true for all curves. Given <span class="math-container">$(f\circ c)''(t)$</span>, it's clearly equal to</p>
<p><span class="math-container">$$c''(t)\cdot\nabla f(c(t))+c'(t)\cdot\frac{d}{dt}\nabla f(c(t))$$</span></p>
<p>And since <span class="math-container">$c'(0)=0$</span> for the given curve, this is just equal to the first term.</p>Thu, 20 Jun 2019 09:53:36 GMThttps://math.stackexchange.com/questions/3264841/closure-under-lie-bracket-how-is-c0-promoted-to-f-circ-c0/3268462#3268462Abhimanyu Pallavi Sudhir2019-06-20T09:53:36ZClosure under Lie Bracket -- how is $c''(0)$ promoted to $(f\circ c)''(0)$
https://math.stackexchange.com/questions/3264841/closure-under-lie-bracket-how-is-c0-promoted-to-f-circ-c0
1<p>I've seen numerous different proofs that the tangent space to a Lie group is closed under <span class="math-container">$[\cdot,\cdot]$</span>, i.e. that the Lie Bracket of two derivations is a derivation -- e.g. considering and differentiating the curve <span class="math-container">$e^{\sqrt{t}X}e^{\sqrt{t}Y}e^{-\sqrt{t}X}e^{-\sqrt{t}Y}$</span>, or just showing that <span class="math-container">$[D_1,D_2]$</span> follows the product rule.</p>
<p>But one derivation I don't get comes from Timothy Goldberg's set of lecture notes <em><a href="http://pi.math.cornell.edu/~goldberg/Talks/Flows-Olivetti.pdf" rel="nofollow noreferrer">The Lie Bracket and the Commutator of Flows</a></em>. Here's the process:</p>
<ol>
<li>Define the curve <span class="math-container">$c(t)=\Phi_X^t\Phi_Y^t\Phi_X^{-t}\Phi_Y^{-t}(e)$</span>.</li>
<li>Show that <span class="math-container">$[X,Y]=\frac12c''(0)$</span>.</li>
<li>Define an operation <span class="math-container">$D:f\mapsto (f\circ c)''(0)$</span>.</li>
<li>Show that <span class="math-container">$D$</span> is a derivation.</li>
</ol>
<p>It's Step 3 I don't get. How do we know this operator <span class="math-container">$D$</span> is what "upgrades" <span class="math-container">$[X,Y]$</span> into a vector field? How can we show that <span class="math-container">$[X,Y]$</span> is the direction in which <span class="math-container">$D$</span> differentiates <span class="math-container">$f$</span>?</p>lie-groupslie-algebraslie-derivativeMon, 17 Jun 2019 00:39:06 GMThttps://math.stackexchange.com/q/3264841Abhimanyu Pallavi Sudhir2019-06-17T00:39:06ZAnswer by Abhimanyu Pallavi Sudhir for Is there a mathematical basis for Born rule?
https://physics.stackexchange.com/questions/215602/is-there-a-mathematical-basis-for-born-rule/483618#483618
1<p>One motivation comes from looking at light waves and polarisation -- when light passes through a polarising filter, the energy of the wave is scaled by <span class="math-container">$\cos^2\theta$</span> (Malus's law) -- for a single photon, this means (as you can't have <span class="math-container">$\cos^2\theta$</span> of a photon) there is a probability of <span class="math-container">$\cos^2\theta$</span> that the number of photons passing through is "1". This <span class="math-container">$\cos\theta$</span> is simply the dot product of the "state vector" (polarisation vector) and the eigenvector of the number operator associated with the polarisation filter with eigenvalue 1 -- i.e. the probability of observing "1" is <span class="math-container">$|\langle\psi|1\rangle|^2$</span>, and the probability of observing "0" is <span class="math-container">$|\langle\psi|0\rangle|^2$</span>, which is Born's rule.</p>
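<p>For concreteness, a two-line numerical check of this (my own sketch, assuming numpy; the angle is arbitrary):</p>
<pre>
import numpy as np

theta = 0.7                                      # arbitrary angle
psi = np.array([np.cos(theta), np.sin(theta)])   # polarisation state
one = np.array([1.0, 0.0])   # the filter's "1 photon passes" eigenvector

print(abs(one @ psi) ** 2, np.cos(theta) ** 2)   # both equal cos^2(theta)
</pre>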
<p>So if you're motivating the state vector based on the polarisation vector, you can motivate Born's rule from <span class="math-container">$E=|A|^2$</span>, as above.</p>
<p>More abstractly, if you accept the other axioms of quantum mechanics, Born's rule is sort of the "only way" to encode probabilities, as you want the probability of the union of disjoint events to be additive (equivalent to the Pythagorean theorem) and the total probability to be one (the length of the state vector is one). </p>
<p>But there is no way to "derive" the Born rule, it is an axiom. Quantum mechanics is fundamentally quite different to e.g. relativity, in the sense that it develops a whole new abstract mathematical theory to connect to the real world. So unlike in relativity, you don't have two axioms that are literally the result of observation and everything is derived from it -- instead, you have an axiomatisation of the mathematical theory, and then a way to connect the theory with observation, which is what Born's rule is. Certainly the <em>motivation</em> for quantum mechanics comes from wave-particle duality, but this is not an axiomatisation.</p>Fri, 31 May 2019 23:33:47 GMThttps://physics.stackexchange.com/questions/215602/-/483618#483618Abhimanyu Pallavi Sudhir2019-05-31T23:33:47ZAnswer by Abhimanyu Pallavi Sudhir for Why do we use Hermitian operators in QM?
https://physics.stackexchange.com/questions/39602/why-do-we-use-hermitian-operators-in-qm/482816#482816
0<p>The point of eigenstates and the entire linear algebra of quantum mechanics is that the projections <span class="math-container">$\langle\phi|\psi\rangle$</span> of the state <span class="math-container">$|\psi\rangle$</span> onto each eigenstate <span class="math-container">$|\phi\rangle$</span> represent the probability amplitudes of each eigenstate. In particular, this means:</p>
<p><span class="math-container">$$\sum |\langle\phi|\psi\rangle|^2 = 1=|\langle\psi|\psi\rangle|^2$$</span></p>
<p>Where the summation is taken over all the eigenstates of an operator. As this must be true for all states <span class="math-container">$|\psi\rangle$</span>, the thing on the left must be a Pythagorean sum, so the <span class="math-container">$|\phi\rangle$</span>s must form an orthogonal basis. Alternatively, one may just note that we must have <span class="math-container">$\langle \phi_1|\phi_2\rangle=0$</span> if the corresponding eigenvalues are distinct, as two distinct observations must be mutually exclusive.</p>
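<p>A quick numerical check of this (my own sketch, assuming numpy; the random Hermitian matrix and state are arbitrary):</p>
<pre>
import numpy as np
rng = np.random.default_rng(1)

A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A + A.conj().T            # a Hermitian "observable"
_, phis = np.linalg.eigh(H)   # columns are orthonormal eigenstates

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)    # a unit state vector

probs = np.abs(phis.conj().T @ psi) ** 2
print(probs.sum())            # 1.0 -- the Pythagorean sum works out
</pre>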
<hr>
<p>That shows that the matrices must be normal. That they are chosen to be Hermitian is non-essential, but useful, as has already been discussed.</p>Mon, 27 May 2019 22:20:33 GMThttps://physics.stackexchange.com/questions/39602/-/482816#482816Abhimanyu Pallavi Sudhir2019-05-27T22:20:33ZComment by Abhimanyu Pallavi Sudhir on What are your favorite instructional counterexamples?
https://mathoverflow.net/questions/16829/what-are-your-favorite-instructional-counterexamples/17285#17285
@ManfredWeis Would you recall the title of the post you meant to link to? Your link is an actively updated feed -- is it <a href="https://calculus7.org/2014/12/07/tossing-a-continuous-coin/" rel="nofollow noreferrer">this</a>?Sat, 25 May 2019 13:30:55 GMThttps://mathoverflow.net/questions/16829/what-are-your-favorite-instructional-counterexamples/17285?cid=829577#17285Abhimanyu Pallavi Sudhir2019-05-25T13:30:55ZAnswer by Abhimanyu Pallavi Sudhir for Are there other kinds of bump functions than $e^\frac1{x^2-1}$?
https://math.stackexchange.com/questions/101480/are-there-other-kinds-of-bump-functions-than-e-frac1x2-1/3236066#3236066
3<p>Here's how you can generate as many different kinds of bump functions as you want, for whatever definition of "kind" you may have:</p>
<ol>
<li>Start with any function <span class="math-container">$f(x)$</span> that <strong>grows faster than all polynomials</strong>, i.e. <span class="math-container">$\forall N, \ \lim_{x\to\infty}\frac{x^N}{f(x)}=0$</span>. Example: <span class="math-container">$e^x$</span>.</li>
<li>Then consider the function <span class="math-container">$g(x)=\frac1{f(1/x)}$</span>. This is a function that is flatter than all polynomials near zero, i.e. <span class="math-container">$\forall N,\ \lim_{x\to0}\frac{g(x)}{x^N}=0$</span>. This is a <strong>smooth non-analytic</strong> function. For our example, we get <span class="math-container">$e^{-1/x}$</span>.</li>
<li>Consider the function <span class="math-container">$h(x)=g(1+x)g(1-x)$</span>. This, after zeroing out stuff outside the interval <span class="math-container">$(-1,1)$</span>, is a <strong>bump function</strong>. For our example, <span class="math-container">$e^{2/(x^2-1)}$</span>.</li>
<li>Scale and transform to your liking -- a code sketch of steps 1-3 follows this list.</li>
</ol>
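<p>Here is the recipe as a code sketch (my own, with <span class="math-container">$f=\exp$</span> as the concrete growth function):</p>
<pre>
import numpy as np

def bump_from_growth(f):
    # steps 1-3: a growth function f gives a flat-at-0 g, gives a bump h on (-1, 1)
    g = lambda x: 1 / f(1 / x)
    def h(x):
        x = np.asarray(x, dtype=float)
        inside = np.logical_not(np.abs(x) >= 1)
        out = np.zeros_like(x)
        xi = x[inside]
        out[inside] = g(1 + xi) * g(1 - xi)   # step 3, zeroed outside (-1, 1)
        return out
    return h

h = bump_from_growth(np.exp)   # recovers e^{2/(x^2 - 1)} on (-1, 1)
print(h(np.array([-1.0, 0.0, 0.5, 1.0])))
</pre>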
<p>Just do this with different "kinds" of growth functions <span class="math-container">$f$</span>, and you'll get different "kinds" of bump functions <span class="math-container">$h$</span>. So here are some functions I could generate with this method -- try to guess which functions they're from:</p>
<p><span class="math-container">$$\begin{array}{l}
h(x) = {e^{2/({x^2} - 1)}} \\
h(x) = (1 + x)^{1/(1 + x)}(1 - x)^{1/(1 - x)} \\
h(x) = \frac1{\frac1{1 + x}!\frac1{1-x}!} \\
h(x)=e^{-[\ln^2(1+x)+\ln^2(1-x)]}
\end{array}$$</span></p>
<p>And the more rapidly your <span class="math-container">$f(x)$</span> grows, the nicer your bump function <span class="math-container">$h(x)$</span> looks.</p>
<hr>
<p>Here's a Desmos applet to try this with different functions <span class="math-container">$f$</span>: <a href="https://www.desmos.com/calculator/ccf2goi9bj" rel="nofollow noreferrer"><strong>desmos.com/calculator/ccf2goi9bj</strong></a>. </p>
<p>If you're interested in smooth non-analytic functions, have a look at my post <a href="https://thewindingnumber.blogspot.com/2019/05/whats-with-e-1x-on-smooth-non-analytic.html" rel="nofollow noreferrer"><strong>What's with e^(-1/x)? On smooth non-analytic functions: part I</strong></a>.</p>Wed, 22 May 2019 18:36:36 GMThttps://math.stackexchange.com/questions/101480/are-there-other-kinds-of-bump-functions-than-e-frac1x2-1/3236066#3236066Abhimanyu Pallavi Sudhir2019-05-22T18:36:36ZAnswer by Abhimanyu Pallavi Sudhir for Why does Taylor’s series “work”?
https://physics.stackexchange.com/questions/480163/why-does-taylor-s-series-work/481556#481556
0<p>Adding to <a href="https://physics.stackexchange.com/a/480187/">Sympathiser's answer</a> -- one can see why the existence of functions like <span class="math-container">$e^{-1/x}$</span> is not surprising by rephrasing them as "<strong>functions that approach zero near zero faster than any polynomial</strong>". This is not fundamentally more surprising than e.g. functions that grow faster than every polynomial -- in fact, for any function <span class="math-container">$f(x)$</span> that grows faster than every polynomial, the function <span class="math-container">$\frac1{f(1/x)}$</span> approaches zero near zero faster than any polynomial.</p>
<p>So for rapidly growing <span class="math-container">$f(x)=e^x$</span>, one gets the corresponding smooth non-analytic <span class="math-container">$e^{-1/x}$</span>. For <span class="math-container">$x^x$</span>, one gets <span class="math-container">$x^{1/x}$</span>. For <span class="math-container">$x!$</span>, one gets <span class="math-container">$\frac{1}{(1/x)!}$</span>, and so on.</p>
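<p>A quick numerical check of the "flatter than any polynomial" claim (my own sketch, assuming numpy):</p>
<pre>
import numpy as np

x = 10.0 ** -np.arange(1, 6)            # x = 0.1, 0.01, ..., 1e-5
for N in (1, 5, 20):
    print(N, np.exp(-1 / x) / x ** N)   # each row heads to 0 as x shrinks
</pre>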
<p>See my post <a href="https://thewindingnumber.blogspot.com/2019/05/whats-with-e-1x-on-smooth-non-analytic.html" rel="nofollow noreferrer"><strong>What's with e^(-1/x)? On smooth non-analytic functions: part I</strong></a> for a fuller explanation.</p>Wed, 22 May 2019 00:11:33 GMThttps://physics.stackexchange.com/questions/480163/-/481556#481556Abhimanyu Pallavi Sudhir2019-05-22T00:11:33ZData formats of inputs to arrange function in dplyr
https://stackoverflow.com/questions/56158114/data-formats-of-inputs-to-arrange-function-in-dplyr
0<p>Given a table <code>monkeys</code> with column <code>brain_size</code>, one can write something like <strong><code>arrange(monkeys, brain_size)</code></strong>. </p>
<p>I don't understand how this makes sense -- <strong><code>brain_size</code> isn't a declared variable</strong> (if I refer to it, I get an error). It's just the name of a column -- shouldn't you rather have <code>arrange(monkeys, 'brain_size')</code>? <strong><em>Isn't</em> the column name just a string?</strong></p>
<p>Another related weirdness -- </p>
<pre><code>arrange(monkeys, desc(brain_size))
</code></pre>
<p>Once again, what exactly is the <strong><code>desc</code> function</strong>? How can it take <code>brain_size</code> as an input? Shouldn't you have something like <code>arrange(monkeys, 'brain_size', desc = true)</code>?</p>
<p>Am I missing something? Perhaps <code>brain_size</code> is a variable in some way but can only be accessed when you're unambiguously "inside" <code>monkeys</code>.</p>rfunctiontypesdplyrWed, 15 May 2019 21:51:42 GMThttps://stackoverflow.com/q/56158114Abhimanyu Pallavi Sudhir2019-05-15T21:51:42ZAnswer by Abhimanyu Pallavi Sudhir for Geometrical Interpretation of Cauchy Riemann equations?
https://math.stackexchange.com/questions/1026134/geometrical-interpretation-of-cauchy-riemann-equations/3197879#3197879
1<p>One might think that being differentiable on <span class="math-container">$\mathbb{R}^2$</span> is sufficient for differentiability on <span class="math-container">$\mathbb{C}$</span>. But the Jacobian of an arbitrary such function doesn't have a natural complex number representation.</p>
<p><span class="math-container">$$
\left[ {\begin{array}{*{20}{c}}
{\partial u/\partial x} & {\partial u/\partial y} \\
{\partial v/\partial x} & {\partial v/\partial y}
\end{array}} \right]
$$</span></p>
<p>Another way of putting this is that no complex-valued derivative (see below for an example) you can define for an arbitrary function fully captures the local behaviour of the function that is represented by the Jacobian.</p>
<p><span class="math-container">$$
\frac{df}{dz} = \left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} \right) + i\left(\frac{\partial v}{\partial x}-\frac{\partial u}{\partial y}\right)
$$</span></p>
<p>The idea is that we should be able to define a complex-valued derivative "purely" for the value <span class="math-container">$z$</span>, without considering directions, i.e. we want to consider <span class="math-container">$\mathbb{C}$</span> one-dimensional in some sense (the sense being "as a vector space"). More precisely, the derivative in some direction in <span class="math-container">$\mathbb{C}$</span> should determine the derivative in all other directions in a natural manner -- whereas on <span class="math-container">$\mathbb{R}^2$</span>, the derivatives in <em>two</em> directions (i.e. the gradient) determine the directional derivatives in all directions. </p>
<p>If you think about it, this is quite a reasonable idea -- it's analogous to how not every linear transformation on <span class="math-container">$\mathbb{R}^2$</span> is a linear transformation on <span class="math-container">$\mathbb{C}$</span> -- only spiral transformations are.</p>
<p><span class="math-container">$$
\left[ {\begin{array}{*{20}{c}}
{a} & {-b} \\
{b} & {a}
\end{array}} \right]
$$</span></p>
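<p>One can check this numerically for a holomorphic function (a sketch of my own; <span class="math-container">$f(z)=z^2$</span> is an arbitrary choice): the finite-difference Jacobian comes out in the spiral form.</p>
<pre>
import numpy as np

f = lambda z: z ** 2           # a holomorphic example
z0, h = 1.0 + 2.0j, 1e-6

# partial derivatives of (u, v) w.r.t. (x, y) by central differences
dfdx = (f(z0 + h) - f(z0 - h)) / (2 * h)
dfdy = (f(z0 + 1j * h) - f(z0 - 1j * h)) / (2 * h)

J = np.array([[dfdx.real, dfdy.real],
              [dfdx.imag, dfdy.imag]])
print(J)   # [[a, -b], [b, a]] with a + bi = f'(z0) = 2 + 4i
</pre>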
<p>How would we generalise differentiability to an arbitrary manifold? Here's an idea: <strong>a function is differentiable if it is locally a linear transformation</strong>. So on <span class="math-container">$\mathbb{R}^2$</span>, any Jacobian matrix is a linear transformation. But on <span class="math-container">$\mathbb{C}$</span>, only Jacobians of the above form are linear transformations -- i.e. the only linear transformation on <span class="math-container">$\mathbb{C}$</span> is <strong>multiplication by a complex number</strong>, i.e. a spiral/amplitwist. So a complex differentiable function is one that is locally an amplitwist (geometrically), which can be stated in terms of the components of the Jacobian as:</p>
<p><span class="math-container">$$
\begin{align}
\frac{\partial u}{\partial x} & = \frac{\partial v}{\partial y} \\
\frac{\partial u}{\partial y} & = - \frac{\partial v}{\partial x} \\
\end{align}
$$</span></p>
<p>This is precisely why you shouldn't (and can't) view complex differentiability as some basic first-degree smoothness -- there is a much richer structure to these functions, and it's better to think of them via the transformations they have on grids.</p>Tue, 23 Apr 2019 05:55:35 GMThttps://math.stackexchange.com/questions/1026134/-/3197879#3197879Abhimanyu Pallavi Sudhir2019-04-23T05:55:35ZAnswer by Abhimanyu Pallavi Sudhir for Computing the Lie bracket on the Lie group $GL(n, \mathbb{R})$
https://math.stackexchange.com/questions/1884253/computing-the-lie-bracket-on-the-lie-group-gln-mathbbr/3193887#3193887
1<p>I think the sensible way to get an intuition for this is to just look at the Taylor expansion of the group commutator:</p>
<p><span class="math-container">$$e^{\varepsilon x} e^{\varepsilon y} e^{-\varepsilon x} e^{-\varepsilon y}$$</span></p>
<p>Which to second order is <span class="math-container">$1+\varepsilon^2(xy-yx)$</span>. Presumably you know how to prove that the second derivative of the above expression is equivalent to the derivative-of-the-adjoint definition.</p>Fri, 19 Apr 2019 18:26:17 GMThttps://math.stackexchange.com/questions/1884253/-/3193887#3193887Abhimanyu Pallavi Sudhir2019-04-19T18:26:17ZAnswer by Abhimanyu Pallavi Sudhir for Determinant-like expression for non-square matrices
https://math.stackexchange.com/questions/903028/determinant-like-expression-for-non-square-matrices/3191959#3191959
0<p>See <a href="https://arxiv.org/abs/1904.08097" rel="nofollow noreferrer">1904.08097</a> for a review I authored of generalised determinant functions of tall matrices, and their properties -- this should provide a self-contained introduction to three different generalised determinants. </p>
<p>The function mentioned by Joonas Ilmavirta is the square of the "determinant-like function" that I first wrote about in 2013, albeit with an erroneous factor of <span class="math-container">$\sqrt{|m-n|!}$</span> at the front, which is corrected in the above review. It is also the norm-squared of the vector determinant, and the product of the singular values of the matrix.</p>
<p>If you want a non-trivial determinant for "wide matrices", i.e. flattenings, you will need to be a bit creative in the definition of the determinant, such as by defining it as the scaling of <span class="math-container">$m$</span>-volumes where <span class="math-container">$m$</span> is the dimension of the flattened space.</p>Thu, 18 Apr 2019 03:38:38 GMThttps://math.stackexchange.com/questions/903028/-/3191959#3191959Abhimanyu Pallavi Sudhir2019-04-18T03:38:38ZAnswer by Abhimanyu Pallavi Sudhir for Intuitive explanation of a positive semidefinite matrix
https://math.stackexchange.com/questions/9758/intuitive-explanation-of-a-positive-semidefinite-matrix/3181937#3181937
1<p>Positive-definite matrices are matrices that are <strong>congruent to the identity matrix</strong>, i.e. that can be written as <span class="math-container">$P^HP$</span> for invertible <span class="math-container">$P$</span> (for some reason, a lot of authors define congruence as <span class="math-container">$N=P^TMP$</span>, but here we go by the Hermitian definition <span class="math-container">$N=P^HMP$</span>). </p>
<p>One reason this is useful is that if two forms <span class="math-container">$M$</span> and <span class="math-container">$N$</span> are congruent, their corresponding "generalised unitary groups" <span class="math-container">$\{A^HMA=M\}$</span> and <span class="math-container">$\{B^HNB=N\}$</span> are isomorphic (via conjugation by <span class="math-container">$P$</span>). So positive-definite matrices (as well as negative-definite matrices, because <span class="math-container">$-I$</span> is preserved by the unitary group as well) define a dot product whose geometry is isomorphic to Euclidean geometry.</p>
<p>Similarly, a <strong>positive semidefinite matrix</strong> defines a geometry that Euclidean geometry is <em>homeomorphic</em> to -- to put it slightly imprecisely, such a geometry has all the symmetries of Euclidean geometry, and perhaps then some.</p>
<p>See a fuller treatment <strong><a href="https://thewindingnumber.blogspot.com/2019/04/geometry-positive-definiteness-and.html" rel="nofollow noreferrer">here</a></strong>.</p>Wed, 10 Apr 2019 06:48:08 GMThttps://math.stackexchange.com/questions/9758/-/3181937#3181937Abhimanyu Pallavi Sudhir2019-04-10T06:48:08ZAnswer by Abhimanyu Pallavi Sudhir for Can non-linear transformations be represented as Transformation Matrices?
https://math.stackexchange.com/questions/450/can-non-linear-transformations-be-represented-as-transformation-matrices/3177854#3177854
0<p>The point of transformation matrices is that the images of the <span class="math-container">$n$</span> basis vectors is sufficient to determine the action of the entire transformation -- this is true for linear transformations, but not an arbitrary transformation.</p>
<p>However, nonlinear transformations (the smooth ones, anyway) can be locally approximated as linear transformations. With a bit of calculus, you get the "Jacobian matrix", which acts on the tangent vector space at every point on a manifold. This is a generalisation of transformation matrices in the sense that a linear transformation's Jacobian is equal to its matrix representation, i.e. in the same sense that the derivative generalises the slope (which completely determines a linear function <span class="math-container">$y=mx$</span>).</p>
https://math.stackexchange.com/questions/68119/why-does-ata-i-det-a-1-mean-a-is-a-rotation-matrix/3177807#3177807
2<p>You could just write out the components to confirm that this is so -- a much more interesting way to understand things, however, is to write down the condition as:</p>
<p><span class="math-container">$$A^TIA=I$$</span></p>
<p>The idea is that the matrix <span class="math-container">$A$</span> <em>preserves the identity quadratic form</em> -- note that <span class="math-container">$I$</span> is a quadratic form here and not a linear transformation, as this is the transformation law for quadratic forms (<span class="math-container">$A^TMA$</span> instead of <span class="math-container">$A^{-1}MA$</span>).</p>
<p>The hyperconic section corresponding to the identity quadratic form is the unit sphere -- thus the orthogonal transformations are all those that preserve the unit sphere. Another way of putting this is that <span class="math-container">$(Ax)^TI(Ay)=x^TA^TIAy=x^TIy$</span>, i.e. the Euclidean dot product <span class="math-container">$I$</span> is preserved by <span class="math-container">$A$</span>. This is equivalent to preserving the unit sphere, because the unit sphere is determined by the dot product on the given space.</p>
<p>What sort of transformations preserve the unit sphere? </p>
<hr>
<p>The reason this is a good way of understanding things is that there are plenty of other "dot products" you can define. One elementary one from physics is the Minkowski dot product in special relativity, <span class="math-container">$\mathrm{diag}(-1,1,1,1)$</span> -- the corresponding quadric surface is a hyperboloid, and the transformations that preserve it, forming the Lorentz group, are boosts (skews between time and a spatial dimension), spatial rotations and reflections.</p>
<hr>
<p>As for discriminating between rotations and reflections, suppose we define rotations in a completely geometric way -- for a matrix to be a rotation, all its eigenvalues are either 1 or in pairs of unit complex conjugates. </p>
<p>What do the eigenvalues of orthogonal matrices look like? For each eigenvalue, you need <span class="math-container">$\overline{\lambda}\lambda=1$</span>, i.e. all the eigenvalues are unit complex numbers. If a complex eigenvalue isn't paired with a corresponding conjugate, you will not get a real-valued transformation on <span class="math-container">$\mathbb{R}^n$</span>. Meanwhile if an eigenvalue of -1 isn't paired with another -1 -- i.e. if there are an odd number of reflections -- you get a reflection. The orthogonal (or rather unitary) transformations that do not behave this way are precisely the rotations.</p>
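<p>A numerical illustration (my own sketch, assuming numpy):</p>
<pre>
import numpy as np
rng = np.random.default_rng(2)

Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))   # a random orthogonal matrix
eigs = np.linalg.eigvals(Q)

print(np.abs(eigs))       # all 1: every eigenvalue is a unit complex number
print(np.linalg.det(Q))   # +1 for a rotation, -1 if a reflection is involved
</pre>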
<p>The similarity between unpaired unit complex eigenvalues and unpaired -1's is interesting, by the way -- when thinking about reflections, you might have gotten the idea that reflections are <span class="math-container">$\pi$</span>-angle rotations in a higher-dimensional space -- like the vector was rotated through a higher-dimensional space and then landed on its reflection -- like it was a discrete snapshot of a process as smooth as any rotation. </p>
<p>Well, now you know what this higher-dimensional space is -- precisely <span class="math-container">$\mathbb{C}^n$</span>. And the determinant of a unitary matrix also takes a continuous spectrum -- the entire unit circle. In this sense (among other senses) complex linear algebra is more "complete" than real linear algebra.</p>Sun, 07 Apr 2019 05:55:12 GMThttps://math.stackexchange.com/questions/68119/why-does-ata-i-det-a-1-mean-a-is-a-rotation-matrix/3177807#3177807Abhimanyu Pallavi Sudhir2019-04-07T05:55:12ZAnswer by Abhimanyu Pallavi Sudhir for Reasoning about Lie theory and the Exponential Map
https://math.stackexchange.com/questions/19575/reasoning-about-lie-theory-and-the-exponential-map/3177348#3177348
0<p>The identity element <em>does</em> have significance, in the sense that it is the only natural way to think of the elements of the Lie Algebra as infinitesimal generators.</p>
<p>As I explain <a href="https://thewindingnumber.blogspot.com/2019/04/introduction-to-lie-groups.html" rel="nofollow noreferrer">here</a>, the idea is that with elements of the form <span class="math-container">$1+\varepsilon\vec\theta$</span>, elements of the group are generated as </p>
<p><span class="math-container">$$g(\vec\theta)=(1+\varepsilon\vec\theta)^{1/\varepsilon}=\exp\vec\theta$$</span></p>
<p>This map only exists when elements close to the identity are taken, as every element other than the identity is itself a generator (thus elements of the group can simply be generated via real-powers, not infinitesimally).</p>
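<p>To see the generation concretely (a sketch of my own, for the rotation group of the plane):</p>
<pre>
import numpy as np

theta = np.array([[0.0, -1.0],
                  [1.0,  0.0]])   # the generator of 2D rotations
eps = 1e-5
g = np.linalg.matrix_power(np.eye(2) + eps * theta, round(1 / eps))
print(g)   # approximately exp(theta): rotation by 1 radian
</pre>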
<p><img src="https://i.stack.imgur.com/0AC5rm.png" width="500" /></p>Sat, 06 Apr 2019 19:21:58 GMThttps://math.stackexchange.com/questions/19575/-/3177348#3177348Abhimanyu Pallavi Sudhir2019-04-06T19:21:58ZAnswer by Abhimanyu Pallavi Sudhir for Binomial product expansion
https://math.stackexchange.com/questions/1331401/binomial-product-expansion/3172053#3172053
0<p>It is not a generalisation of the Binomial theorem because the exponent of <span class="math-container">$c$</span> isn't really handled -- they just took it outside. If you were to expand out the right-hand-side, you would have a generalisation of the Binomial theorem.</p>Tue, 02 Apr 2019 16:08:25 GMThttps://math.stackexchange.com/questions/1331401/-/3172053#3172053Abhimanyu Pallavi Sudhir2019-04-02T16:08:25ZAnswer by Abhimanyu Pallavi Sudhir for Intuition for the exponential of a matrix
https://math.stackexchange.com/questions/1213264/intuition-for-the-exponential-of-a-matrix/3165551#3165551
1<p>When I first learned about cyclic groups, the picture that I always had in my head was of the unit circle in the complex plane -- imagine my shock when I realised it wasn't a cyclic group at all! But I really <em>wanted</em> it to be cyclic, because it shared some really interesting properties with cyclic groups (see my post <em><a href="https://thewindingnumber.blogspot.com/2018/12/intuition-analogies-and-abstraction.html" rel="nofollow noreferrer">Intuition, analogies and abstraction</a></em>).</p>
<p>The solution to the problem can be seen directly from the quickest proof that the unit circle isn't cyclic -- the fact that it isn't countable (while the integers are). So here's an idea: let's admit <em>real powers on groups</em>!</p>
<p>Ok, but how? We know the construction of integer powers on an arbitrary group, and we know how real powers work on the unit circle, or the real line (which is also real-power cyclic*, by the way), and it's conventionally equal to <span class="math-container">$x^r=\exp(r\log x)$</span> with <span class="math-container">$\exp$</span> given by its power series expansion.</p>
<p>But sticking just to our intuition for now, it would seem like the natural way to define a real power is to introduce a real-number parameterisation to our group -- for example, the circle group can be parameterised by <span class="math-container">$\theta$</span> and each element of the group is given by some <span class="math-container">$g(\theta)$</span>. Then real powers would look like <span class="math-container">$g(\theta)^r=g(r\theta)$</span>. In the case of a one-parameter group, we also have <span class="math-container">$g(\theta_1+\theta_2)=g(\theta_1)g(\theta_2)$</span>, but don't get too attached to this.</p>
<p>If you think about it, we've now just given some <em>additional structure</em> to our group -- a geometric structure in addition to the group structure.</p>
<p>But frankly, introducing a parameterisation in this way is a bit hand-wavy. We knew what parameterisation to introduce for the circle group because we already have a picture of its geometry in our heads, but in principle, we could've introduced really any kind of ridiculous parameterisation and given it a really ugly structure and an ugly real-power. What we need is a sensible, systematic way to introduce this parameterisation -- i.e. to think about what this parameter space really <em>is</em>.</p>
<p>The answer to the question comes from Euler's formula, which relates addition on the imaginary line to multiplication on the unit circle. </p>
<p><span class="math-container">$$\exp(i\theta)=g(\theta)$$</span></p>
<p>What significance does the imaginary line have to the unit circle? Well, something interesting is that the tangent to the unit circle at 1 is parallel to the imaginary line, i.e. all its elements are of the form <span class="math-container">$1+it$</span>. So an idea for the parameterisation is that the parameter space is the tangent space at the identity of the group -- this is the Lie algebra of the group.</p>
<p>(You still need to prove that this actually works in general -- this has to do with proving that all derivatives of the exponential map at the identity can be recovered as <span class="math-container">$g^{(k)}(0)=(g'(0))^k$</span> -- this is a property of exponential functions of the form <span class="math-container">$g(t)=e^{bt}$</span>, and is part of the "exponential structure" of the Lie Algebra/Lie Group correspondence.)</p>
<p>This is not too bad! It's not completely absurd to think about the "vicinity of the identity" of at least matrix groups, so it's not absurd to think about tangent spaces to these groups. This is where you see arguments like <span class="math-container">$(1+\varepsilon t)^T(1+\varepsilon t)=1+\varepsilon(t+t^T)$</span> implying the tangent space to an orthogonal group is an algebra of antisymmetric matrices, etc. -- if you have some notion of perturbing an element in your group, you can construct a Lie algebra parameterisation of it.</p>
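<p>A quick SymPy sketch of that last point (my own toy example, not from the question): exponentiate an antisymmetric matrix and check that the result is orthogonal.</p>
<pre>
import sympy as sp

theta = sp.Symbol('theta', real=True)
t = sp.Matrix([[0, 1], [-1, 0]])        # tangent vector: t + t.T == 0
g = sp.simplify((theta * t).exp())      # the exponential map
print(g)                                # [[cos(theta), sin(theta)], [-sin(theta), cos(theta)]]
print(sp.simplify(g.T * g))             # the identity -- so g lies in the orthogonal group
</pre>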
<hr>
<p>*To the best of my knowledge, "real-power cyclic" is not a real word -- the conventional term is "one-parameter Lie group".</p>
<p>See my post <a href="https://thewindingnumber.blogspot.com/2019/04/introduction-to-lie-groups.html" rel="nofollow noreferrer">Introduction to Lie groups</a> for a more complete treatment.</p>Thu, 28 Mar 2019 06:29:55 GMThttps://math.stackexchange.com/questions/1213264/-/3165551#3165551Abhimanyu Pallavi Sudhir2019-03-28T06:29:55ZAnswer by Abhimanyu Pallavi Sudhir for What's the generalisation of the quotient rule for higher derivatives?
https://math.stackexchange.com/questions/5357/whats-the-generalisation-of-the-quotient-rule-for-higher-derivatives/3131947#3131947
1<p>I'm checking @Mohammad Al Jamal's formula with SymPy, and I can verify it's true (barring a missing <span class="math-container">$(-1)^k$</span> factor) for up to <span class="math-container">$n = 16$</span>, at least (it gets really slow after that).</p>
<pre>
import sympy as sp

k = sp.Symbol('k'); x = sp.Symbol('x'); f = sp.Function('f'); g = sp.Function('g')
n = 0
while True:
    fgn = sp.diff(f(x) / g(x), x, n)                # the n-th derivative, computed directly
    guess = sp.summation((-1) ** k * sp.binomial(n + 1, k + 1) \
        * sp.diff(f(x) * (g(x)) ** k, x, n) / (g(x) ** (k + 1)), (k, 0, n))
    print("{} for n = {}".format(sp.expand(guess - fgn) == 0, n))
    n += 1
</pre>
<p>This is quite surprising to me -- I didn't expect there to be such a simple and straightforward expression for <span class="math-container">$(f(x)/g(x))^{(n)}$</span>, and haven't seen his formula anywhere before. I tried some inductive proofs, but I haven't succeeded in proving it yet.</p>Sat, 02 Mar 2019 00:15:06 GMThttps://math.stackexchange.com/questions/5357/-/3131947#3131947Abhimanyu Pallavi Sudhir2019-03-02T00:15:06ZAnswer by Abhimanyu Pallavi Sudhir for Why didn't Lorentz conclude that no object can go faster than light?
https://physics.stackexchange.com/questions/461833/why-didnt-lorentz-conclude-that-no-object-can-go-faster-than-light/461863#461863
12<p>Because typically if you find an expression that seems to break down at some value of <span class="math-container">$v$</span>, you would conclude that the expression simply loses its validity for that value of <span class="math-container">$v$</span>, not that the value isn't attainable. Presumably this was the conclusion of Lorentz and others.</p>
<p>The reason Einstein concluded otherwise is that special relativity gives a physical argument for "superluminal speeds are equivalent to time running backwards" -- the argument is "does a superluminal ship hit the iceberg before or after its headlight does?" </p>
<p>This depends on the observer, and because the headlight would melt the iceberg, the consequences of each observation are noticeably different. The only possible conclusions are "superluminal ships don't exist", "time runs backwards for superluminal observers", or "iceberg-melting headlights don't exist".</p>Wed, 20 Feb 2019 10:43:02 GMThttps://physics.stackexchange.com/questions/461833/-/461863#461863Abhimanyu Pallavi Sudhir2019-02-20T10:43:02ZAnswer by Abhimanyu Pallavi Sudhir for What kind of matrices are non-diagonalizable?
https://math.stackexchange.com/questions/472915/what-kind-of-matrices-are-non-diagonalizable/3097881#3097881
4<p><strong>Edit:</strong> The algebra I speak of here is <em>not</em> actually the Grassmann numbers at all -- they are <span class="math-container">$\mathbb{R}[X]/(X^n)$</span>, whose generators <em>don't</em> satisfy the anticommutativity relation even though they satisfy all the nilpotency relations. The dual-number stuff for 2 by 2 is still correct, just ignore my use of the word "Grassmann".</p>
<hr>
<p>Non-diagonalisable 2 by 2 matrices can be diagonalised over the <a href="https://en.wikipedia.org/wiki/Dual_number" rel="nofollow noreferrer">dual numbers</a> -- and the "weird cases" like the Galilean transformation are not fundamentally different from the nilpotent matrices.</p>
<p>The intuition here is that the Galilean transformation is sort of a "boundary case" between real-diagonalisability (skews) and complex-diagonalisability (rotations) (which you can sort of think in terms of discriminants). In the case of the Galilean transformation <span class="math-container">$\left[\begin{array}{*{20}{c}}{1}&{v}\\{0}&{1}\end{array}\right]$</span>, it's a small perturbation away from being diagonalisable, i.e. it sort of has "repeated eigenvectors" (you can visualise this with <a href="https://shadanan.github.io/MatVis/" rel="nofollow noreferrer">MatVis</a>). So one may imagine that the two eigenvectors are only an "epsilon" away, where <span class="math-container">$\varepsilon$</span> is the unit dual satisfying <span class="math-container">$\varepsilon^2=0$</span> (called the "soul"). Indeed, its characteristic polynomial is:</p>
<p><span class="math-container">$$(\lambda-1)^2=0$$</span></p>
<p>Whose solutions among the dual numbers are <span class="math-container">$\lambda=1+k\varepsilon$</span> for real <span class="math-container">$k$</span>. So one may "diagonalise" the Galilean transformation over the dual numbers as e.g.:</p>
<p><span class="math-container">$$\left[\begin{array}{*{20}{c}}{1}&{0}\\{0}&{1+v\varepsilon}\end{array}\right]$$</span></p>
<p>Granted, this is not unique -- this particular diagonalisation is formed from the change-of-basis matrix <span class="math-container">$\left[\begin{array}{*{20}{c}}{1}&{1}\\{0}&{\varepsilon}\end{array}\right]$</span>, but any vector of the form <span class="math-container">$(1,k\varepsilon)$</span> is a valid eigenvector. You could, if you like, consider this a canonical or "principal value" of the diagonalisation, and in general each diagonalisation corresponds to a limit you can take of real/complex-diagonalisable transformations. Another way of thinking about this is that there is an entire eigenspace spanned by <span class="math-container">$(1,0)$</span> and <span class="math-container">$(1,\varepsilon)$</span> in that little gap of multiplicity. In this sense, the geometric multiplicity is forced to be equal to the algebraic multiplicity*.</p>
<p>Then a nilpotent matrix with characteristic polynomial <span class="math-container">$\lambda^2=0$</span> has solutions <span class="math-container">$\lambda=k\varepsilon$</span>, and is simply diagonalised as:</p>
<p><span class="math-container">$$\left[\begin{array}{*{20}{c}}{0}&{0}\\{0}&{\varepsilon}\end{array}\right]$$</span></p>
<p>(Think about this.) Indeed, the resulting matrix has minimal polynomial <span class="math-container">$\lambda^2=0$</span>, and the eigenvectors are as before.</p>
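<p>If you want to play with this, here is a minimal sketch (a throwaway <code>Dual</code> class of my own, not a library API) checking the diagonalisation of the Galilean transformation in the sense <span class="math-container">$AP = PD$</span>:</p>
<pre>
class Dual:
    """a + b*eps, with eps**2 = 0"""
    def __init__(self, a, b=0):
        self.a, self.b = a, b
    def __add__(self, o):
        return Dual(self.a + o.a, self.b + o.b)
    def __mul__(self, o):
        return Dual(self.a * o.a, self.a * o.b + self.b * o.a)  # the eps**2 term dies
    def __repr__(self):
        return "{}+{}e".format(self.a, self.b)

def matmul(A, B):  # 2x2 matrix product over the duals
    return [[A[i][0] * B[0][j] + A[i][1] * B[1][j] for j in range(2)]
            for i in range(2)]

v = 2
A = [[Dual(1), Dual(v)], [Dual(0), Dual(1)]]        # the Galilean transformation
P = [[Dual(1), Dual(1)], [Dual(0), Dual(0, 1)]]     # change-of-basis matrix
D = [[Dual(1), Dual(0)], [Dual(0), Dual(1, v)]]     # diag(1, 1 + v*eps)
print(matmul(A, P))   # [[1+0e, 1+2e], [0+0e, 0+1e]]
print(matmul(P, D))   # the same -- AP = PD
</pre>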
<hr>
<p>What about higher dimensional matrices? Consider:</p>
<p><span class="math-container">$$\left[ {\begin{array}{*{20}{c}}0&v&0\\0&0&w\\0&0&0\end{array}} \right]$$</span></p>
<p>This is a nilpotent matrix <span class="math-container">$A$</span> satisfying <span class="math-container">$A^3=0$</span> (but not <span class="math-container">$A^2=0$</span>). The characteristic polynomial is <span class="math-container">$\lambda^3=0$</span>. Although <span class="math-container">$\varepsilon$</span> might seem like a sensible choice, it doesn't really do the trick -- if you try a diagonalisation of the form <span class="math-container">$\mathrm{diag}(0,v\varepsilon,w\varepsilon)$</span>, it has minimal polynomial <span class="math-container">$A^2=0$</span>, which is wrong. Indeed, you won't be able to find three linearly independent eigenvectors to diagonalise the matrix this way -- they'll all take the form <span class="math-container">$(a+b\varepsilon,0,0)$</span>.</p>
<p>Instead, you need to consider a generalisation of the dual numbers, called the Grassmann numbers, with the soul satisfying <span class="math-container">$\varepsilon^n=0$</span>. Then the diagonalisation takes for instance the form:</p>
<p><span class="math-container">$$\left[ {\begin{array}{*{20}{c}}0&0&0\\0&{v\varepsilon}&0\\0&0&{w\varepsilon}\end{array}} \right]$$</span></p>
<hr>
<p>*Over the reals and complexes, when one defines algebraic multiplicity (as "the multiplicity of the corresponding factor in the characteristic polynomial"), there is a single eigenvalue corresponding to that factor. This is of course no longer true over the Grassmann numbers, because they are not a field, and <span class="math-container">$ab=0$</span> no longer implies "<span class="math-container">$a=0$</span> or <span class="math-container">$b=0$</span>".</p>
<p>In general, if you want to prove things about these numbers, the way to formalise them is by constructing them as the quotient <span class="math-container">$\mathbb{R}[X]/(X^n)$</span>, so you actually have something clear to work with.</p>
<p>(Perhaps relevant: <a href="https://math.stackexchange.com/questions/46078/grassmann-numbers-as-eigenvalues-of-nilpotent-operators">Grassmann numbers as eigenvalues of nilpotent operators?</a> -- discussing the fact that the Grassmann numbers are not a field).</p>
<p>You might wonder if this sort of approach can be applicable to LTI differential equations with repeated roots -- after all, their characteristic matrices are exactly of this Grassmann form. As pointed out in the comments, however, this diagonalisation is still not via an invertible change-of-basis matrix, it's still only of the form <span class="math-container">$PD=AP$</span>, not <span class="math-container">$D=P^{-1}AP$</span>. I don't see any way to bypass this. See my posts <a href="https://thewindingnumber.blogspot.com/2019/02/all-matrices-can-be-diagonalised.html" rel="nofollow noreferrer">All matrices can be diagonalised</a> (a re-post of this answer) and <a href="https://thewindingnumber.blogspot.com/2018/03/repeated-roots-of-differential-equations.html" rel="nofollow noreferrer">Repeated roots of differential equations</a> for ideas, I guess.</p>Sat, 02 Feb 2019 22:17:56 GMThttps://math.stackexchange.com/questions/472915/-/3097881#3097881Abhimanyu Pallavi Sudhir2019-02-02T22:17:56ZAnswer by Abhimanyu Pallavi Sudhir for Relativity from a basic assumption
https://physics.stackexchange.com/questions/455712/relativity-from-a-basic-assumption/455753#455753
1<p>I will give <em>a</em> derivation of the Lorentz boosts requiring (what at least seem to be) minimal assumptions, and we will look at what assumptions we used, and see if some of them can be derived from each other, etc. Note that by "the Lorentz transformations", I mean the Lorentz transformation of spacetime position -- Lorentz transformations of other four-tuples (i.e. proving that they are Lorentz vectors) would require other assumptions, of course. I've given a fuller explanation of the derivation <a href="https://thewindingnumber.blogspot.com/2017/09/introduction-to-special-relativity.html" rel="nofollow noreferrer">here</a>.</p>
<p><strong>(a)</strong> The first important fact you need to prove anything about the Lorentz transformations is that they are linear. Linearity is logically equivalent to the following conditions: (under the transformation),</p>
<ul>
<li><strong>all straight lines remain straight lines</strong> -- the physical interpretation of this is that if an object's velocity is constant in one inertial reference frame, it is constant in all inertial reference frames. This follows from the <em>principle of relativity</em>.</li>
<li><strong>the origin remains fixed</strong> -- this is true by definition of the transformations we are considering -- boosts passing through the same origin.</li>
</ul>
<p>With this, we know that we can use a matrix to write down the Lorentz transformations. Which matrix?</p>
<p><strong>(b)</strong> The tilt/angle of the <span class="math-container">$t'$</span>, <span class="math-container">$x'$</span> axes with respect to the <span class="math-container">$t$</span>, <span class="math-container">$x$</span> axes. The tilt of the <span class="math-container">$t'$</span> axes follows from the definition of velocity as the gradient of the worldline. To prove the tilt of the <span class="math-container">$x'$</span> axis is equal to this tilt, we need to first define the <span class="math-container">$x'$</span> axis within the unprimed co-ordinate system. </p>
<p>This is possible by considering invariant features under a boost, i.e. from the principle of relativity -- the obvious invariant is as follows: if you had emitted a light ray <span class="math-container">$a$</span> seconds in the past, and it reflects off some object and returns to you <span class="math-container">$a$</span> seconds in the future, then it was on your x-axis at time 0.</p>
<p><a href="https://i.stack.imgur.com/zC7TS.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/zC7TS.png" alt="enter image description here"></a></p>
<p>By the principle of relativity, this should apply in the primed reference frame as well. By the invariance of the speed of light, the slope of the light ray is the same in the primed reference frame. Now figuring out the angle of tilt of the <span class="math-container">$x'$</span> axis becomes an exercise in geometry.</p>
<p><a href="https://i.stack.imgur.com/QvRjN.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/QvRjN.png" alt="enter image description here"></a></p>
<p>And it's easy to prove, by drawing an appropriate circle, that the two tilts are equal.</p>
<p><strong>(c)</strong> We now know the lines the column vectors of our matrix land on -- they are multiples of <span class="math-container">$(1, v)$</span> and <span class="math-container">$(v, 1)$</span>, but which vector on that line exactly? In other words, what's the scale on the axes? This requires one extra assumption: if you boost into the frame with velocity <span class="math-container">$v$</span>, then boost <span class="math-container">$-v$</span> back, that's equivalent to not boosting at all, i.e. <span class="math-container">$L(v)L(-v)=I$</span>. Then it's just computation:</p>
<p><span class="math-container">\begin{gathered}
\left[ {\begin{array}{*{20}{c}}
1&0 \\
0&1
\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}
\alpha &{\beta v} \\
{\alpha v}&\beta
\end{array}} \right]\left[ {\begin{array}{*{20}{c}}
\alpha &{ - \beta v} \\
{ - \alpha v}&\beta
\end{array}} \right] = \left[ {\begin{array}{*{20}{c}}
{{\alpha ^2} - \alpha \beta {v^2}}&{{\beta ^2}v - \alpha \beta v} \\
{{\alpha ^2}v - \alpha \beta v}&{{\beta ^2} - \alpha \beta {v^2}}
\end{array}} \right] \hfill \\
{\alpha ^2}v - \alpha \beta v = 0 = {\beta ^2}v - \alpha \beta v \Rightarrow {\alpha ^2} = \alpha \beta = {\beta ^2} \Rightarrow \alpha = \beta \hfill \\
{\alpha ^2} - \alpha \beta {v^2} = 1 = {\beta ^2} - \alpha \beta {v^2} \Rightarrow {\alpha ^2} = 1 + \alpha \beta {v^2} = {\beta ^2} \Rightarrow {\alpha ^2} = 1 + {\alpha ^2}{v^2} \hfill \\
\Rightarrow \alpha = \beta = \frac{1}{{\sqrt {1 - {v^2}} }} \hfill \\
\end{gathered}</span></p>
<p>Then the change of basis matrix is simply the inverse of this matrix, which is:</p>
<p><span class="math-container">$$\Lambda=\gamma \left[ {\begin{array}{*{20}{c}}
1&-v \\
-v&1
\end{array}} \right]$$</span></p>
<p>Or:</p>
<p><span class="math-container">\begin{gathered}
x' = \gamma \left( {x - vt} \right) \\
t' = \gamma \left( {t - vx} \right) \\
\end{gathered}</span></p>
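<p>(As a sanity check, here is a small SymPy sketch of my own, in units where <span class="math-container">$c=1$</span> as above, verifying the assumption <span class="math-container">$L(v)L(-v)=I$</span> that fixed the scale:)</p>
<pre>
import sympy as sp

v = sp.Symbol('v', real=True)
L = lambda u: sp.Matrix([[1, -u], [-u, 1]]) / sp.sqrt(1 - u**2)  # gamma(u) * [[1, -u], [-u, 1]]
print(sp.simplify(L(v) * L(-v)))    # the 2x2 identity matrix
</pre>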
<p><strong>(d)</strong> There's still one final step, however -- we need to verify that <span class="math-container">$y$</span> and <span class="math-container">$z$</span> aren't transformed under the Lorentz boost. To prove this, consider two twins with paintbrushes running towards each other, painting the wall at waist level -- if the orthogonal axis were transformed in any way, each twin would see his paint-streak as above the other's. But the paint-streaks' relative positioning can't depend on the observer -- suppose, e.g., that the two paints cause an explosion when mixed. The fact that the presence of an explosion (or any boolean quantity) is invariant under Lorentz transformations is a consequence of the principle of relativity.</p>
<hr>
<p>We used three physical assumptions:</p>
<ul>
<li>The principle of relativity</li>
<li>The invariance of the speed of light</li>
<li><span class="math-container">$L(v)L(-v)=L(0)$</span>, or "if I see you moving at <span class="math-container">$v$</span>, you see me moving at <span class="math-container">$-v$</span>"</li>
</ul>
<p>The first two are the assumptions you wanted. As far as I can see, the last assumption can't really be proven from the other two -- it requires some sort of symmetry principle. But that's okay.</p>Mon, 21 Jan 2019 22:48:47 GMThttps://physics.stackexchange.com/questions/455712/-/455753#455753Abhimanyu Pallavi Sudhir2019-01-21T22:48:47ZAnswer by Abhimanyu Pallavi Sudhir for Varying constants in special relativity
https://physics.stackexchange.com/questions/455159/varying-constants-in-special-relativity/455176#455176
1<blockquote>
<p>(presumably) everything has mass, there is no such thing as a perfect inertial frame of reference</p>
</blockquote>
<p>This isn't right. "There isn't generally a perfectly flat co-ordinate system" does not imply everything has mass, and being an inertial reference frame has nothing to do with the associated observer's mass (in fact, the Lorentz transformation associated with a photon's "co-ordinate system" is singular, so there isn't really a co-ordinate system/reference frame associated with it).</p>
<p>I guess your concern is with the fact that photons are affected by spacetime curvature -- this is true, but the/a point of general relativity is that this doesn't imply anything about the mass.</p>Fri, 18 Jan 2019 20:30:09 GMThttps://physics.stackexchange.com/questions/455159/-/455176#455176Abhimanyu Pallavi Sudhir2019-01-18T20:30:09ZAnswer by Abhimanyu Pallavi Sudhir for What is really curved, spacetime, or simply the coordinate lines?
https://physics.stackexchange.com/questions/290906/what-is-really-curved-spacetime-or-simply-the-coordinate-lines/452416#452416
0<p>Curved co-ordinates on flat spacetime correspond to accelerating observers, not gravity. </p>
<p>The first physical insight of general relativity is that when you have gravity, you have <em>no</em> globally inertial frames -- contrast this with flat space, where you can always construct a linear co-ordinate system. The second physical insight is that you do have locally inertial frames, specifically the freefalling ones -- this is the "equivalence principle" -- so the manifold you use to model spacetime must necessarily have local flatness. Consequently, (pseudo-)Riemannian manifolds become the right way to model spacetime in general relativity.</p>
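<p>A concrete way to see this (a minimal SymPy sketch of my own, using the flat plane in polar co-ordinates rather than an actual spacetime): the Christoffel symbols come out nonzero, but the Riemann tensor vanishes identically, as the next paragraph explains.</p>
<pre>
import sympy as sp

r, th = sp.symbols('r theta', positive=True)
x = [r, th]
g = sp.Matrix([[1, 0], [0, r**2]])   # flat plane in polar co-ordinates
ginv = g.inv()

# Christoffel symbols Gamma^a_{bc} = (1/2) g^{ad} (d_b g_{dc} + d_c g_{db} - d_d g_{bc})
Gam = [[[sum(ginv[a, d] * (sp.diff(g[d, c], x[b]) + sp.diff(g[d, b], x[c])
             - sp.diff(g[b, c], x[d])) for d in range(2)) / 2
         for c in range(2)] for b in range(2)] for a in range(2)]
print(Gam[0][1][1], Gam[1][0][1])    # -r and 1/r: nonzero, first order in the metric

# Riemann tensor R^a_{bcd} -- second order in the metric -- vanishes identically here
for a in range(2):
    for b in range(2):
        for c in range(2):
            for d in range(2):
                R = sp.diff(Gam[a][d][b], x[c]) - sp.diff(Gam[a][c][b], x[d]) \
                    + sum(Gam[a][c][e] * Gam[e][d][b] - Gam[a][d][e] * Gam[e][c][b]
                          for e in range(2))
                assert sp.simplify(R) == 0
print("Riemann tensor vanishes")
</pre>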
<p>This is why Christoffel symbols exist for accelerating observers on flat spacetime too -- they're first-order in the derivatives of the metric, and so can be eliminated by transforming into a flat co-ordinate system where the metric is constant (this is okay because the Christoffel symbols aren't tensors). The Riemann curvature tensor, on the other hand, is second-order in the derivatives of the metric and cannot be eliminated by a co-ordinate transformation.</p>Sun, 06 Jan 2019 12:13:03 GMThttps://physics.stackexchange.com/questions/290906/-/452416#452416Abhimanyu Pallavi Sudhir2019-01-06T12:13:03ZAnswer by Abhimanyu Pallavi Sudhir for Relative velocity greater than speed of light
https://physics.stackexchange.com/questions/452078/relative-velocity-greater-than-speed-of-light/452100#452100
0<p>Velocity is definitionally the same as "relative velocity". This is the point of the first postulate of relativity.</p>Fri, 04 Jan 2019 15:11:44 GMThttps://physics.stackexchange.com/questions/452078/-/452100#452100Abhimanyu Pallavi Sudhir2019-01-04T15:11:44ZAnswer by Abhimanyu Pallavi Sudhir for Does spacetime position not form a four-vector?
https://physics.stackexchange.com/questions/192886/does-spacetime-position-not-form-a-four-vector/450137#450137
1<p>Right -- vectors in general relativity live in some tangent space. This is the point of differential geometry, and of calculus in general -- you approximate non-linear things, which are <em>not</em> vector spaces (like curvy manifolds) with linear things (like their tangent spaces), which are vector spaces. This is exactly the motivation for defining the basis vectors as <span class="math-container">$\partial_\mu$</span>, as you describe.</p>Mon, 24 Dec 2018 07:04:21 GMThttps://physics.stackexchange.com/questions/192886/-/450137#450137Abhimanyu Pallavi Sudhir2018-12-24T07:04:21ZAnswer by Abhimanyu Pallavi Sudhir for What is an event in Special Relativity?
https://physics.stackexchange.com/questions/389488/what-is-an-event-in-special-relativity/444892#444892
1<p>It is perfectly reasonable to say that an event is a point in spacetime and that spacetime is a collection of events -- it is not "circular" as you claim in the comments. This is just the physics version of "a vector is an element of a vector space" and "a vector space is a set of vectors". You have axioms in math, and you have axioms in physics. The only difference is that in math, the objects are abstract, but in physics, they have a physical interpretation.</p>Mon, 03 Dec 2018 16:51:40 GMThttps://physics.stackexchange.com/questions/389488/-/444892#444892Abhimanyu Pallavi Sudhir2018-12-03T16:51:40ZAnswer by Abhimanyu Pallavi Sudhir for Why is the scalar product of two four-vectors Lorentz-invariant?
https://physics.stackexchange.com/questions/442119/why-is-the-scalar-product-of-two-four-vectors-lorentz-invariant/442164#442164
2<p>Here's the way to think about this -- why is the standard Euclidean dot product, <span class="math-container">$\sum x_iy_i$</span> interesting? Well, it is interesting primarily from the perspective of rotations, due to the fact that rotations leave dot products invariant. The reason this is so is that this dot product can be written as <span class="math-container">$|x||y|\cos\Delta\theta$</span>, and rotations leave magnitudes and relative angles invariant.</p>
<p>Is the standard Euclidean norm <span class="math-container">$|x|$</span> invariant under Lorentz transformations? Of course not -- for instance, <span class="math-container">$\Delta t^2+\Delta x^2$</span> is clearly not invariant, but <span class="math-container">$\Delta t^2-\Delta x^2$</span> is. Similarly, <span class="math-container">$E^2+p^2$</span> is not important, but <span class="math-container">$E^2-p^2$</span> is. The reason this is the case is that Lorentz boosts are fundamentally skew transformations, which means the invariant locus is a hyperbola, not a circle. So you have <span class="math-container">$\cosh^2 \xi - \sinh^2 \xi = 1$</span>, and <span class="math-container">$x_0^2-x_1^2$</span> is the right way to think of the norm on Minkowski space.</p>
<p>Similarly, Lorentz boosts change the rapidity <span class="math-container">$\xi$</span> by a simple displacement, so <span class="math-container">$\Delta \xi$</span> is invariant. From this point, it's a simple exercise to show that </p>
<p><span class="math-container">$$|x||y|\cosh\xi=x_0y_0-x_1y_1$$</span></p>
<p>(as for the remaining dimensions -- remember that the standard Euclidean dot product is still relevant in <em>space</em>, so you just need to write <span class="math-container">$x_0y_0-x\cdot y=x_0y_0-x_1y_1-x_2y_2-x_3y_3$</span>.)</p>Tue, 20 Nov 2018 15:59:18 GMThttps://physics.stackexchange.com/questions/442119/-/442164#442164Abhimanyu Pallavi Sudhir2018-11-20T15:59:18ZComment by Abhimanyu Pallavi Sudhir on Mate in 0 moves
https://puzzling.stackexchange.com/questions/74086/mate-in-0-moves/74093#74093
@FabianRöling Pawns have directions.Mon, 22 Oct 2018 09:27:58 GMThttps://puzzling.stackexchange.com/questions/74086/mate-in-0-moves/74093?cid=221467#74093Abhimanyu Pallavi Sudhir2018-10-22T09:27:58ZAnswer by Abhimanyu Pallavi Sudhir for Newton's Third Law and conservation of momentum
https://physics.stackexchange.com/questions/435941/newtons-third-law-and-conservation-of-momentum/436015#436015
3<p>As far as the actual physics is concerned, it is meaningless to talk of whether conservation of momentum is "more fundamental" than Newton's third law -- you can axiomatise classical physics in any of these ways -- from Newton's laws, from conservation laws, from symmetry laws, from an action principle, whatever. You can prove the resulting theories are equivalent, in the sense that all the alternative axiomatic systems imply each other.</p>
<p>In terms of understanding, it makes sense to have multiple different frameworks in your head -- a symmetry-based framework is really good intuitively, especially once you understand Noether's theorem, while an action principle is the most powerful and also more useful when you leave the realm of classical physics. Treating Newton's laws as axioms isn't a great idea -- it's mostly just historically relevant.</p>
<p>When you learn more advanced physics, conservation of momentum <em>will</em> start "feeling" more fundamental -- this is simply because momentum is an interesting quantity to talk about.</p>Sun, 21 Oct 2018 21:20:57 GMThttps://physics.stackexchange.com/questions/435941/-/436015#436015Abhimanyu Pallavi Sudhir2018-10-21T21:20:57ZAnswer by Abhimanyu Pallavi Sudhir for If force is a vector, then why is pressure a scalar?
https://physics.stackexchange.com/questions/429998/if-force-is-a-vector-then-why-is-pressure-a-scalar/430008#430008
4<p>Pressure is a scalar because it does not behave as a vector -- specifically, you can't take the "components" of pressure and take their Pythagorean sum to obtain its magnitude. Instead, pressure is actually proportional to the <em>sum</em> of the components, <span class="math-container">$(P_x+P_y+P_z)/3$</span>.</p>
<p>The way to understand pressure is in terms of the stress tensor: pressure is equal to one-third the trace of the stress tensor. Once you understand this, the question becomes equivalent to questions like "why is the dot product a scalar?" (trace of the tensor product), "why is the divergence of a vector field a scalar?" (trace of the tensor derivative), etc. </p>
<p>There is no physical significance to taking the diagonal components of a tensor and putting them in a vector -- there <em>is</em> a physical significance to adding them up, and the invariance properties of the result tells you that it is a scalar.</p>
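<p>A small SymPy sketch of that invariance (the entries of the stress tensor below are made-up numbers for illustration):</p>
<pre>
import sympy as sp

sigma = sp.Matrix([[3, 1, 0],
                   [1, 2, 0],
                   [0, 0, 4]])                      # a symmetric stress tensor
print(sigma.trace() / 3)                            # mean pressure: 3
R = sp.rot_axis3(sp.pi / 6)                         # rotate the co-ordinate axes
print(sp.simplify((R * sigma * R.T).trace() / 3))   # still 3 -- the trace is a scalar
</pre>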
<p>See also: <a href="https://physics.stackexchange.com/questions/186045/why-do-we-need-both-dot-product-and-cross-product/419873#419873">Why do we need both dot product and cross product?</a></p>Fri, 21 Sep 2018 08:57:17 GMThttps://physics.stackexchange.com/questions/429998/-/430008#430008Abhimanyu Pallavi Sudhir2018-09-21T08:57:17ZAnswer by Abhimanyu Pallavi Sudhir for How can the solutions to equations of motion be unique if it seems the same state can be arrived at through different histories?
https://physics.stackexchange.com/questions/426445/how-can-the-solutions-to-equations-of-motion-be-unique-if-it-seems-the-same-stat/426453#426453
1<p>"The jar is empty at present" just tells you $f(0)$. You also need $f'(0)$, $f''(0)$, etc.</p>Mon, 03 Sep 2018 09:46:25 GMThttps://physics.stackexchange.com/questions/426445/-/426453#426453Abhimanyu Pallavi Sudhir2018-09-03T09:46:25ZAnswer by Abhimanyu Pallavi Sudhir for From the speed of light being an invariant to being the maximum possible speed
https://physics.stackexchange.com/questions/331119/from-the-speed-of-light-being-an-invariant-to-being-the-maximum-possible-speed/423423#423423
0<p>A simple thought experiment does the trick -- consider a train moving faster than light, and it has headlights (it's a glass train). According to a stationary observer (stationary in a reference frame where the train is faster than light), the train must always be in front of the light, but according to an observer hanging out of the train, the light must be in front of him, since light speed is still $c$.</p>
<p>It might not seem like this relativeness of the order of the two objects is a problem, but it is -- say, for instance, the train is moving towards a high-tech wall which is programmed to do this when switched ON: (1) if hit by a train, make the world explode; (2) if light is incident, switch OFF. The wall is currently switched ON. According to one observer, the world explodes, whereas according to another, it doesn't. This is an inconsistency.</p>
<p>Why wouldn't this argument apply to <em>any</em> speed and prohibit all motion? For example, why can't the wall be programmed to switch off a certain amount of time after the light is incident? Relativity says this is okay, because time can dilate and rescale between reference frames. </p>
<p>But in order to make FTL speeds okay, you need to allow time to flip direction -- this is why the real condition is "to go faster than light, you must forgo causality", or simply, "locality = causality".</p>Sat, 18 Aug 2018 12:28:48 GMThttps://physics.stackexchange.com/questions/331119/-/423423#423423Abhimanyu Pallavi Sudhir2018-08-18T12:28:48ZAnswer by Abhimanyu Pallavi Sudhir for Link between Special relativity and Newtons gravitational law
https://physics.stackexchange.com/questions/123243/link-between-special-relativity-and-newtons-gravitational-law/423379#423379
0<p>Consider three theories:</p>
<p>$$L_A=1$$
$$L_B=1+h$$
$$L_C=1+h+h^2$$</p>
<p>Theory A is a special case of Theory C when $h$ is small, and Theory B is also a special case of C when $h$ is small -- does this mean A and B are the same?</p>
<p>This is not a perfect analogy, but it is an example of why this sort of reasoning breaks down.</p>Sat, 18 Aug 2018 07:13:36 GMThttps://physics.stackexchange.com/questions/123243/-/423379#423379Abhimanyu Pallavi Sudhir2018-08-18T07:13:36ZAnswer by Abhimanyu Pallavi Sudhir for Why is velocity defined as 4-vector in relativity?
https://physics.stackexchange.com/questions/423360/why-is-velocity-defined-as-4-vector-in-relativity/423364#423364
5<p>"It should transform like a four-vector under a Lorentz transformation" is a generalisation of several intuitions you typically have regarding how natural objects/tensors should behave in special relativity -- an obvious one is "no special status to any individual dimension, since space and time are inherently symmetric. That $dx^\mu/dx^0$ doesn't transform like a four-vector is obvious from the fact that it gives special preference to time.</p>
<p>The conventional way to define four-velocity in relativity is as $dx^\mu/ds$. Your 2-tensor idea is cute -- it is similar to the angle tensor generalised to four-dimensions -- but it doesn't satisfy the uses we have of the standard four-velocity (e.g. how would the four-momentum be defined? $m\,dx^\mu/dx^\nu$? That wouldn't be conserved.)</p>Sat, 18 Aug 2018 06:11:14 GMThttps://physics.stackexchange.com/questions/423360/why-is-velocity-defined-as-4-vector-in-relativity/423364#423364Abhimanyu Pallavi Sudhir2018-08-18T06:11:14ZComment by Abhimanyu Pallavi Sudhir on Ubuntu 17.04 Chromium Browser quietly provides full access to Google account
https://askubuntu.com/questions/915556/ubuntu-17-04-chromium-browser-quietly-provides-full-access-to-google-account
Me too. This is weird. Even if it's just the Chrome browser, I don't see why they'd need <i>full</i> access to my Google account. Windows doesn't do this.Sat, 14 Jul 2018 17:11:46 GMThttps://askubuntu.com/questions/915556/ubuntu-17-04-chromium-browser-quietly-provides-full-access-to-google-account?cid=1726608Abhimanyu Pallavi Sudhir2018-07-14T17:11:46ZComment by Abhimanyu Pallavi Sudhir on How to create folder shortcut in Ubuntu 14.04?
https://askubuntu.com/questions/486461/how-to-create-folder-shortcut-in-ubuntu-14-04/691976#691976
@jave.web Yes -- use the application menu (either at the top left of your screen or a colourful icon next to the window controls) to go to your Nautilus preferences, then under "Behavior" enable link creation.Fri, 13 Jul 2018 11:49:02 GMThttps://askubuntu.com/questions/486461/how-to-create-folder-shortcut-in-ubuntu-14-04/691976?cid=1724793#691976Abhimanyu Pallavi Sudhir2018-07-13T11:49:02ZComment by Abhimanyu Pallavi Sudhir on How to customize (add/remove folders/directories) the "Places" menu of Ubuntu 13.04 "Files" application?
https://askubuntu.com/questions/285313/how-to-customize-add-remove-folders-directories-the-places-menu-of-ubuntu-13/292727#292727
This works. If you also want to remove the folders from the home directory, edit user-dirs.defaults, or make a copy of it in .config and edit there (for your local user).Sat, 30 Jun 2018 09:00:13 GMThttps://askubuntu.com/questions/285313/how-to-customize-add-remove-folders-directories-the-places-menu-of-ubuntu-13/292727?cid=1716388#292727Abhimanyu Pallavi Sudhir2018-06-30T09:00:13ZComment by Abhimanyu Pallavi Sudhir on How to safely remove default folders?
https://askubuntu.com/questions/140148/how-to-safely-remove-default-folders/140964#140964
See <a href="https://askubuntu.com/questions/285313/how-to-customize-add-remove-folders-directories-the-places-menu-of-ubuntu-13">here</a> for a working solution. If you also want to remove the folders from the home directory, edit user-dirs.defaults, or make a copy of it in .config and edit there (for your local user).Mon, 25 Jun 2018 06:08:26 GMThttps://askubuntu.com/questions/140148/how-to-safely-remove-default-folders/140964?cid=1713336#140964Abhimanyu Pallavi Sudhir2018-06-25T06:08:26ZComment by Abhimanyu Pallavi Sudhir on How to safely remove default folders?
https://askubuntu.com/questions/140148/how-to-safely-remove-default-folders/140964#140964
Doesn't work -- even if you don't run the update command, it gets updated upon the next reboot. There must be a more fundamental file in which these directory names are kept.Mon, 25 Jun 2018 05:32:16 GMThttps://askubuntu.com/questions/140148/how-to-safely-remove-default-folders/140964?cid=1713326#140964Abhimanyu Pallavi Sudhir2018-06-25T05:32:16ZComment by Abhimanyu Pallavi Sudhir on Explaining the Main Ideas of Proof before Giving Details
https://mathoverflow.net/questions/301085/explaining-the-main-ideas-of-proof-before-giving-details
Because good proofs are just a formalisation of the intuitive understanding -- rather than wasting space explaining the insights, you can just give them the proof, and an even somewhat experienced reader can re-create the details.Sun, 27 May 2018 04:28:36 GMThttps://mathoverflow.net/questions/301085/explaining-the-main-ideas-of-proof-before-giving-details?cid=750004Abhimanyu Pallavi Sudhir2018-05-27T04:28:36ZComment by Abhimanyu Pallavi Sudhir on reference for higher spin - not gravitational nor stringy
https://mathoverflow.net/questions/195125/reference-for-higher-spin-not-gravitational-nor-stringy
On <a href="http://www.physicsoverflow.org/27048/reference-for-higher-spin-not-gravitational-nor-stringy?show=27499#a27499" rel="nofollow noreferrer">PhysicsOverflow</a>, there is a link to <a href="http://inspirehep.net/record/265411" rel="nofollow noreferrer">this paper</a> for the same question.Sun, 01 Mar 2015 02:25:25 GMThttps://mathoverflow.net/questions/195125/reference-for-higher-spin-not-gravitational-nor-stringy?cid=493513Abhimanyu Pallavi Sudhir2015-03-01T02:25:25ZComment by Abhimanyu Pallavi Sudhir on Classical and Quantum Chern-Simons Theory
https://mathoverflow.net/questions/159695/classical-and-quantum-chern-simons-theory
This has received an answer on PhysicsOverflow if you're still interested: <a href="http://www.physicsoverflow.org/22251/classical-and-quantum-chern-simons-theory#c22256" rel="nofollow noreferrer">Classical and Quantum Chern-Simons Theory</a>Thu, 14 Aug 2014 13:14:02 GMThttps://mathoverflow.net/questions/159695/classical-and-quantum-chern-simons-theory?cid=447277Abhimanyu Pallavi Sudhir2014-08-14T13:14:02ZComment by Abhimanyu Pallavi Sudhir on What is convolution intuitively?
https://mathoverflow.net/questions/5892/what-is-convolution-intuitively
<a href="http://en.wikipedia.org/wiki/File:Convolution_of_spiky_function_with_box2.gif" rel="nofollow noreferrer">Wikipedia</a>Fri, 17 Jan 2014 16:20:39 GMThttps://mathoverflow.net/questions/5892/what-is-convolution-intuitively?cid=396721Abhimanyu Pallavi Sudhir2014-01-17T16:20:39ZComment by Abhimanyu Pallavi Sudhir on Embedding of F(4) in OSp(8|4)?
https://mathoverflow.net/questions/111110/embedding-of-f4-in-osp84
Cross-posted to: <a href="http://physics.stackexchange.com/q/41155/23119">physics.stackexchange.com/q/41155/23119</a>Mon, 23 Dec 2013 04:35:50 GMThttps://mathoverflow.net/questions/111110/embedding-of-f4-in-osp84?cid=391443Abhimanyu Pallavi Sudhir2013-12-23T04:35:50ZComment by Abhimanyu Pallavi Sudhir on How to compare Unicode characters that "look alike"?
https://stackoverflow.com/questions/20674577/how-to-compare-unicode-characters-that-look-alike
I compared every single pixel of it, and it looks the same.Thu, 19 Dec 2013 09:26:53 GMThttps://stackoverflow.com/questions/20674577/how-to-compare-unicode-characters-that-look-alike?cid=30963612Abhimanyu Pallavi Sudhir2013-12-19T09:26:53ZComment by Abhimanyu Pallavi Sudhir on What is the definition of picture changing operation?
https://mathoverflow.net/questions/152295/what-is-the-definition-of-picture-changing-operation
Related: <a href="http://physics.stackexchange.com/q/12595/23119">physics.stackexchange.com/q/12595/23119</a>Thu, 19 Dec 2013 07:26:36 GMThttps://mathoverflow.net/questions/152295/what-is-the-definition-of-picture-changing-operation?cid=390438Abhimanyu Pallavi Sudhir2013-12-19T07:26:36ZComment by Abhimanyu Pallavi Sudhir on Understanding the intermediate field method for the $\phi^4$ interaction
https://mathoverflow.net/questions/149564/understanding-the-intermediate-field-method-for-the-phi4-interaction
@DanielSoltész: Nope, high-level questions generally get largely ignored there these days.Tue, 26 Nov 2013 14:40:20 GMThttps://mathoverflow.net/questions/149564/understanding-the-intermediate-field-method-for-the-phi4-interaction?cid=384774Abhimanyu Pallavi Sudhir2013-11-26T14:40:20ZComment by Abhimanyu Pallavi Sudhir on Intuition behind the ricci flow
https://mathoverflow.net/questions/143144/intuition-behind-the-ricci-flow/143146#143146
I was about to post the same thing, I think this is very illustrative.Tue, 19 Nov 2013 16:05:08 GMThttps://mathoverflow.net/questions/143144/intuition-behind-the-ricci-flow/143146?cid=383288#143146Abhimanyu Pallavi Sudhir2013-11-19T16:05:08ZComment by Abhimanyu Pallavi Sudhir on What is the relationship between complex time singularities and UV fixed points?
https://mathoverflow.net/questions/134939/what-is-the-relationship-between-complex-time-singularities-and-uv-fixed-points
This actually got twice the number of views here than on Physics.SE.Sun, 10 Nov 2013 14:50:44 GMThttps://mathoverflow.net/questions/134939/what-is-the-relationship-between-complex-time-singularities-and-uv-fixed-points?cid=381229Abhimanyu Pallavi Sudhir2013-11-10T14:50:44ZAnswer by Abhimanyu Pallavi Sudhir for The Fuchsian monodromy problem
https://mathoverflow.net/questions/146099/the-fuchsian-monodromy-problem/148462#148462
1<p>Equation 6.2 is just the Liouville action, the action principle for the <em>Liouville field</em>, which is well-known from the familiar conformal gauge. </p>
<p>$$S_L=\frac{c}{96\pi}\int_\mathcal{M}\left(\dot\varphi^2-\frac{16\varphi}{\left(1-\lvert t\rvert^2\right)^2}\right)\mathrm{d}^2t$$ </p>
<p>... along with some trivial facts about partition functions. </p>
<p>You could of course think of it as the $Z_\mathcal{M}$'s (partition functions) of the metrics being related by the $S_L$'s in the same way that the metrics are related by the Liouville field. </p>
<p>And yes, I don't know how to spell "Lioivulle" properly. </p>Sun, 10 Nov 2013 06:53:28 GMThttps://mathoverflow.net/questions/146099/-/148462#148462Abhimanyu Pallavi Sudhir2013-11-10T06:53:28ZComment by Abhimanyu Pallavi Sudhir on Modular Arithmetic in LaTeX
https://mathoverflow.net/questions/18813/modular-arithmetic-in-latex
Haha, I thought this question was about typesetting a paper in $\LaTeX$Fri, 08 Nov 2013 11:34:52 GMThttps://mathoverflow.net/questions/18813/modular-arithmetic-in-latex?cid=379817Abhimanyu Pallavi Sudhir2013-11-08T11:34:52Z
https://mathoverflow.net/questions/47770/string-theory-computation-for-math-undergrad-audience/147307#147307
2<p>Derive the Casimir Energy in Bosonic String Theory. </p>
<p>You start with the $\hat L_0$ operator and get rid of the non-vacuum part $\displaystyle\frac{\alpha_0^2}{2}+\sum_{n=1}^\infty\alpha_{-n}\cdot\alpha_n$, then you use Ramanujan summation to do $\zeta$-function renormalisation, from which you find that the vacuum energy, denoted by $\varepsilon_0$, is </p>
<p>$$\varepsilon_0=-\frac{d-2}{24}$$ </p>
<p>However, the most interesting part comes when you go around <a href="https://mathoverflow.net/a/140354/36148">deriving</a> the critical dimension of Bosonic String Theory. </p>
<p>After which, the expression surprisingly simplifies to a $-1$. </p>
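<p>The arithmetic of that last step is easy to reproduce (a trivial SymPy sketch of my own, taking $d=26$ from the linked derivation):</p>
<pre>
import sympy as sp

print(sp.zeta(-1))                  # -1/12: the regularised value of 1 + 2 + 3 + ...
d = 26                              # critical dimension, from the linked derivation
print(sp.Rational(-(d - 2), 24))    # -1: the vacuum energy -(d - 2)/24
</pre>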
<p>For a more detailed derivation of the above stuff, see <a href="http://arxiv.org/pdf/hep-th/0207142v1.pdf" rel="nofollow noreferrer">these</a> lecture notes (Section 4, Equations 4.5-4.10). </p>Fri, 08 Nov 2013 04:33:41 GMThttps://mathoverflow.net/questions/47770/-/147307#147307Abhimanyu Pallavi Sudhir2013-11-08T04:33:41ZComment by Abhimanyu Pallavi Sudhir on Book on mathematical "rigorous" String Theory?
https://mathoverflow.net/questions/71909/book-on-mathematical-rigorous-string-theory/71998#71998
I don't think that BBS falls into the category of "mathematically rigorous". It's a very good, intuitive book.Fri, 08 Nov 2013 04:17:49 GMThttps://mathoverflow.net/questions/71909/book-on-mathematical-rigorous-string-theory/71998?cid=379753#71998Abhimanyu Pallavi Sudhir2013-11-08T04:17:49ZComment by Abhimanyu Pallavi Sudhir on About the massless supermultiplets in $2+1$ dimensional supersymmetry
https://mathoverflow.net/questions/103392/about-the-massless-supermultiplets-in-21-dimensional-supersymmetry
@S.Carnahan: The OP has voluntarily deleted it, which is weird... I have flagged this as unclear what you're asking.Wed, 06 Nov 2013 16:49:00 GMThttps://mathoverflow.net/questions/103392/about-the-massless-supermultiplets-in-21-dimensional-supersymmetry?cid=379331Abhimanyu Pallavi Sudhir2013-11-06T16:49:00ZAnswer by Abhimanyu Pallavi Sudhir for Does $SO(32) \sim_T E_8 \times E_8$ relate to some group theoretical fact?
https://mathoverflow.net/questions/57529/does-so32-sim-t-e-8-times-e-8-relate-to-some-group-theoretical-fact/147129#147129
5<p>The answer to this question can be found in Lubos Motl's answer to <a href="https://physics.stackexchange.com/q/65092/23119">this question of mine on Physics.SE</a>. </p>
<p>The key here is the weight lattices $\Gamma$ of the bosonic representations of these gauge groups.</p>
<p>As I understand it, the weight lattice of $E(8)$ is $\Gamma^8$, whereas the weight lattice of $\frac{\operatorname{Spin}\left(32\right)}{\mathbb{Z}_2}$ is $\Gamma^{16}$. The first fact means that the weight lattice of $E(8)\times E(8)$ is $\Gamma^{8}\oplus\Gamma^8$.</p>
<p>Now, there is an identity: $\Gamma^{8}\oplus\Gamma^8\oplus\Gamma^{1,1}=\Gamma^{16}\oplus\Gamma^{1,1}$, and it is <em>this very identity</em> which allows the T-duality mentioned in the original post. </p>
<p>So, the answer to your question is "<strong>Yes</strong>", there <em>is</em> a group-theoretical fact, and that is that $ \Gamma^{8}\oplus\Gamma^8\oplus\Gamma^{1,1}= \Gamma^{16}\oplus\Gamma^{1,1} $. </p>Wed, 06 Nov 2013 16:46:03 GMThttps://mathoverflow.net/questions/57529/does-so32-sim-t-e-8-times-e-8-relate-to-some-group-theoretical-fact/147129#147129Abhimanyu Pallavi Sudhir2013-11-06T16:46:03ZAnswer by Abhimanyu Pallavi Sudhir for Why does bosonic string theory require 26 spacetime dimensions?
https://mathoverflow.net/questions/99643/why-does-bosonic-string-theory-require-26-spacetime-dimensions/140354#140354
5<p><em>Note that here, the $\hat L_n$ are operators on the state given by the sums of the dots of the mode operators, e.g. $\hat L_0=\frac{1}{2}\sum_{n=-\infty}^\infty\hat\alpha_{-n}\cdot\hat\alpha_n$ (normal-ordered).</em> </p>
<p>Also note that the Virasoro algebra is the central extension of the Witt/conformal algebra -- that explains why a $D$ appears below: it plays the role of the central charge. </p>
<p>I'll expand on Chris Gerig's answer. </p>
<p>Not only do we need $D=26$, we also need the normal ordering constant $a=1$. The normal ordering constant is the eigenvalue of $\hat L_0$ on a physical state. </p>
<p>We want to promote the time-like states to spurious, zero-norm states, right? So, we impose the (level 1) spurious state conditions on the state as follows ($|\chi_1\rangle$ is the state on which the spurious state $|\Phi\rangle = \hat L_{-1}|\chi_1\rangle$ is built): </p>
<p>$$ \begin{gathered}
  0 = {{\hat L}_1}\left| \Phi \right\rangle \\
  {\text{ }} = {{\hat L}_1}{{\hat L}_{ - 1}}\left| {{\chi _1}} \right\rangle \\
  {\text{ }} = \left[ {{{\hat L}_1},{{\hat L}_{ - 1}}} \right]\left| {{\chi _1}} \right\rangle + {{\hat L}_{ - 1}}{{\hat L}_1}\left| {{\chi _1}} \right\rangle \\
  {\text{ }} = \left[ {{{\hat L}_1},{{\hat L}_{ - 1}}} \right]\left| {{\chi _1}} \right\rangle \\
  {\text{ }} = 2{{\hat L}_0}\left| {{\chi _1}} \right\rangle \\
  {\text{ }} = 2\left( {a - 1} \right)\left| {{\chi _1}} \right\rangle \\
\end{gathered} $$</p>
<p>That means that $a=1$. </p>
<p>Now, for a level 2 spurious state, </p>
<p>$$\begin{gathered}
  \left[ {{{\hat L}_1},{{\hat L}_{ - 2}} + k{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right]\left| \psi \right\rangle = \left( {3{{\hat L}_{ - 1}} + 2k{{\hat L}_0}{{\hat L}_{ - 1}} + 2k{{\hat L}_{ - 1}}{{\hat L}_0}} \right)\left| \psi \right\rangle \\
  {\text{ }} = \left( {\left( {3 - 2k} \right){{\hat L}_{ - 1}} + 4k{{\hat L}_0}{{\hat L}_{ - 1}}} \right)\left| \psi \right\rangle \\
  0 = {{\hat L}_1}\left| \Phi \right\rangle = {{\hat L}_1}\left( {{{\hat L}_{ - 2}} + k{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right)\left| {{\chi _2}} \right\rangle = \left( {\left( {3 - 2k} \right){{\hat L}_{ - 1}} + 4k{{\hat L}_0}{{\hat L}_{ - 1}}} \right)\left| {{\chi _2}} \right\rangle \\
  {\text{ }} = \left( {\left( {3 - 2k} \right){{\hat L}_{ - 1}} + 4k{{\hat L}_{ - 1}}\left( {{{\hat L}_0} + 1} \right)} \right)\left| {{\chi _2}} \right\rangle \\
  {\text{ }} = \left( {3 - 2k} \right){{\hat L}_{ - 1}}\left| {{\chi _2}} \right\rangle \\
  \Rightarrow 2k = 3,\ k = \frac{3}{2} \\
\end{gathered} $$ </p>
<p>(Here we used $\hat L_0|\chi_2\rangle = -|\chi_2\rangle$ and $\hat L_1|\chi_2\rangle = 0$, which hold because $|\chi_2\rangle$ sits two levels below a physical state with $a = 1$.)</p>
<p>Since this level 2 spurious state can be written as: </p>
<p>$$ {\left| \Phi \right\rangle = \left( {{{\hat L}_{ - 2}} + k{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right)\left| {{\chi _2}} \right\rangle }$$ </p>
<p>So, then, </p>
<p>$$ \begin{gathered}
  {{\hat L}_2}\left| \Phi \right\rangle = 0 \\
  {{\hat L}_2}\left( {{{\hat L}_{ - 2}} + \frac{3}{2}{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right)\left| {{\chi _2}} \right\rangle = 0 \\
  \left[ {{{\hat L}_2},{{\hat L}_{ - 2}} + \frac{3}{2}{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right]\left| {{\chi _2}} \right\rangle + \left( {{{\hat L}_{ - 2}} + \frac{3}{2}{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right){{\hat L}_2}\left| {{\chi _2}} \right\rangle = 0 \\
  \left[ {{{\hat L}_2},{{\hat L}_{ - 2}} + \frac{3}{2}{{\hat L}_{ - 1}}{{\hat L}_{ - 1}}} \right]\left| {{\chi _2}} \right\rangle = 0 \\
  \left( {13{{\hat L}_0} + 9{{\hat L}_{ - 1}}{{\hat L}_{ + 1}} + \frac{D}{2}} \right)\left| {{\chi _2}} \right\rangle = 0 \\
\end{gathered} $$ </p>
<p>Since $\hat L_0|\chi_2\rangle = -|\chi_2\rangle$ and $\hat L_{+1}|\chi_2\rangle = 0$, this gives $\frac{D}{2} = 13$, i.e. $D = 26$.</p>
<p>And then, finally, Q.E.D. </p>
<p>So, this was done essentially to remove the negative-norm ghost states, using the canonical / Gupta-Bleuler formalism. </p>
<p>It's also possible to use, say, light cone gauge (LCG) quantisation. However, in other quantisation methods, the conformal anomaly is manifest in other forms. E.g., in LCG quantisation, it is manifest as a failure of Lorentz symmetry. A good overview of this method can be found in <strong>Kaku</strong> <em>Strings, Conformal fields, and M-theory</em> (it's the only part of the book that I liked, actually. The rest of the book is too rigorous, without much physical intuition.). </p>Sun, 25 Aug 2013 09:40:17 GMThttps://mathoverflow.net/questions/99643/why-does-bosonic-string-theory-require-26-spacetime-dimensions/140354#140354Abhimanyu Pallavi Sudhir2013-08-25T09:40:17ZAnswer by Abhimanyu Pallavi Sudhir for Coincidence, purposeful definition, or something else in formulas for energy
https://physics.stackexchange.com/questions/71119/coincidence-purposeful-definition-or-something-else-in-formulas-for-energy/71121#71121
4<p>Most of them (all of your examples except <span class="math-container">$E=c^2m$</span>, which is really just <span class="math-container">$E=m$</span> anyway) arise from integrating a linear equation like <span class="math-container">$p=mv$</span> as <span class="math-container">$E=\int v\,dp$</span>, and it is often just a convention that we choose the linear relation to have a constant of proportionality of 1, so the integral has a constant of 1/2 (for example, we could've instead chosen, like we do with areas of circles, to have <span class="math-container">$c=2\pi r$</span> and <span class="math-container">$A=\pi r^2$</span>). </p>Mon, 15 Jul 2013 04:01:14 GMThttps://physics.stackexchange.com/questions/71119/-/71121#71121Abhimanyu Pallavi Sudhir2013-07-15T04:01:14ZAnswer by Abhimanyu Pallavi Sudhir for Is velocity of light constant?
https://physics.stackexchange.com/questions/66856/is-velocity-of-light-constant/68513#68513
1<p>There are two questions here -- is the velocity of light <em>constant</em>, and is it <em>invariant</em>?</p>
<p>The direction/velocity of light changes whenever it interacts with something. This includes gravitational deflection, since things have to change direction in curved spacetime in one sense or another. The velocity isn't constant.</p>
<p>Is it invariant under Lorentz boosts in perpendicular directions? <em>No.</em> The speed is invariant, but the velocity isn't. This should be fairly clear, but you can prove it with brute force --</p>
<p>We need to apply a boost to light's four-velocity, but the four-velocity of light is actually infinite -- it's (infinity, infinity, 0, 0), except the infinities satisfy a certain relation in the sense of being related through a limit. So we consider an object traveling at speed $w$ in the $x$-direction, boost $v$ in the $y$-direction and let $w\to c$. The four-velocity transforms under this boost as:</p>
<p>$$\left[ {\begin{array}{*{20}{c}}{\gamma (w)}\\{w\gamma (w)}\\0\\0\end{array}} \right] \to \left[ {\begin{array}{*{20}{c}}{\gamma (v)\gamma (w)}\\{w\gamma (w)}\\{ - v\gamma (v)\gamma (w)}\\0\end{array}} \right]$$</p>
<p>The conventional 3-velocity can be extracted here by considering $dx/dt$, $dy/dt$:</p>
<p>$$\frac{{dx}}{{dt}} = \frac{{dx/d\tau }}{{dt/d\tau }} = \frac{{w\gamma (w)}}{{\gamma (v)\gamma (w)}} = \frac{w}{{\gamma (v)}}$$
$$\frac{{dy}}{{dt}} = \frac{{dy/d\tau }}{{dt/d\tau }} = \frac{{ - v\gamma (v)\gamma (w)}}{{\gamma (v)\gamma (w)}} = - v$$</p>
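<p>(A short SymPy sketch of my own that takes this limit symbolically; it reproduces the 3-velocity and unit speed quoted next:)</p>
<pre>
import sympy as sp

v, w = sp.symbols('v w', positive=True)
gamma = lambda u: 1 / sp.sqrt(1 - u**2)

# boosted four-velocity components (u^t, u^x, u^y) read off the matrix above
ut, ux, uy = gamma(v)*gamma(w), w*gamma(w), -v*gamma(v)*gamma(w)
vx, vy = ux / ut, uy / ut                          # dx/dt and dy/dt
print(sp.limit(vx, w, 1), sp.limit(vy, w, 1))      # sqrt(1 - v**2) = 1/gamma(v), and -v
print(sp.simplify(sp.limit(vx**2 + vy**2, w, 1)))  # 1 -- the speed is still c
</pre>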
<p>Taking the limit as $w\to 1$, you get a 3-velocity of $(1/\gamma(v),-v, 0)$ -- one may confirm that this is not equivalent to the original three-velocity that was $(1,0,0)$, but nonetheless has the same magnitude (speed is invariant).</p>Wed, 19 Jun 2013 04:17:58 GMThttps://physics.stackexchange.com/questions/66856/-/68513#68513Abhimanyu Pallavi Sudhir2013-06-19T04:17:58ZAnswer by Abhimanyu Pallavi Sudhir for Measuring extra-dimensions
https://physics.stackexchange.com/questions/22542/measuring-extra-dimensions/68414#68414
4<p>The standard way to measure compactified dimensions is to test some inverse-square law (e.g. Newton's, electromagnetic, diffusion) at the scale and see if it breaks down and starts approaching some other (higher power) inverse-power law.</p>
<p>In fact, the inverse-square law has only been verified down to a scale of 0.1mm -- here's a recent experimental paper doing this: <a href="http://arxiv.org/abs/hep-ph/0011014v1" rel="nofollow noreferrer">[1]</a>.</p>
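<p>You can see this crossover in a toy computation (my own sketch, arbitrary units): take the $1/d^2$ potential that gravity would have with one extra spatial dimension, compactify that dimension on a circle of circumference $L$ by summing over images, and watch the potential go like $1/r^2$ for $r \ll L$ but like $1/r$ for $r \gg L$.</p>
<pre>
import numpy as np

L = 1.0                                            # circumference of the compact dimension
def V(r, n_images=100000):
    n = np.arange(-n_images, n_images + 1)
    return np.sum(1.0 / (r**2 + (n * L)**2))       # images of a 1/d^2 potential

for r in [0.01, 0.1, 1.0, 10.0]:
    print(r, V(r))                                 # ~1/r^2 at short range, ~pi/(L*r) at long range
</pre>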
<p>(Yes, you can measure time in metres, by multiplying by the speed of light. This is where "lightseconds" and other such measurements of distance come from. An example motivation for treating this as the unit of the time dimension is from the Minkowski metric, $ds^2=c^2dt^2-dx^2-dy^2-dz^2$, where $ct$ is a dimension analogous to the spatial ones.)</p>Tue, 18 Jun 2013 04:16:35 GMThttps://physics.stackexchange.com/questions/22542/-/68414#68414Abhimanyu Pallavi Sudhir2013-06-18T04:16:35ZAnswer by Abhimanyu Pallavi Sudhir for A change in the gravitational law
https://physics.stackexchange.com/questions/41109/a-change-in-the-gravitational-law/68326#68326
5<p>Such a change requires a 4+1-dimensional spacetime instead of a 3+1-dimensional one -- this would have several serious implications --</p>
<ol>
<li><p>The Riemann curvature tensor gains new "parts" with interesting physical implications with each new spacetime dimension -- 1-dimensional manifolds have no curvature in this sense, 2-dimensional manifolds have a scalar curvature, 3-dimensional manifolds gain the full Ricci tensor, 4-dimensional manifolds get components corresponding to a new Weyl tensor, and 5-dimensional geometry gets even more components. General relativity in this spacetime is capable of explaining electromagnetism, too, so electromagnetism (along with the radion field) starts behaving as a part of gravity.</p></li>
<li><p>Apparently a 5-dimensional spacetime is unstable, according to wikipedia's "privileged character of 3+1-dimensional spacetime"<a href="http://en.wikipedia.org/wiki/Spacetime#Privileged_character_of_3.2B1_spacetime" rel="nofollow noreferrer">[1]</a> (now a transclusion of <a href="https://en.wikipedia.org/wiki/Anthropic_principle#Dimensions_of_spacetime" rel="nofollow noreferrer">[2]</a>).</p></li>
<li><p>The string theory landscape would be a bit smaller, since there are less dimensions to compactify.</p></li>
<li><p>The Ricci curvature in a vacuum on an Einstein Manifold would no longer be exactly $\Lambda g_{ab}$. There will be a coefficient of 2/3.</p></li>
<li><p>The magnetic field, among other things "cross product-ish", could not be written as a vector, unlike the electric field. This is because it would have 6 components whereas the spatial dimension is only 4. So, perhaps humans would become familiar with exterior algebras earlier than us who live in 3+1 dimensions. Either that or we would be trying to find out how magnetism works. Or we would just die out, for all the other reasons.</p></li>
<li><p>In string theory (see e.g. <a href="http://arxiv.org/abs/hep-th/0207249v1" rel="nofollow noreferrer">[3]</a>), the gravitational constants in successively higher dimensions are related by $G_{n+1}=l_sG_n$, where $l_s$ is the string length (the units must differ in order to accommodate the extra factor of $r$ in Newton's gravitational law). At distance scales greater than the string length, this makes gravity much weaker than in our number of dimensions, but stronger at length scales shorter than the string length. It's interesting how gravity's long-range ability peaks at 4 dimensions (below 4 dimensions it behaves as a contact force).</p></li>
</ol>
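<p>To see where the coefficient of 2/3 in point 4 comes from -- a standard trace computation with the vacuum Einstein equation in $n$ spacetime dimensions: taking the trace of the field equation with $g^{ab}$ fixes the Ricci scalar, and substituting back gives the vacuum Ricci tensor,</p>
<p>$$R_{ab} - \tfrac{1}{2}R\,g_{ab} + \Lambda g_{ab} = 0, \qquad R = \frac{2n\Lambda}{n-2}, \qquad R_{ab} = \frac{2\Lambda}{n-2}\,g_{ab},$$</p>
<p>which reduces to $R_{ab} = \Lambda g_{ab}$ for $n=4$ but gives $R_{ab} = \tfrac{2}{3}\Lambda g_{ab}$ for $n=5$.</p>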
<p>See also some recent tests of the inverse-square law at short length scales (to check for compactification): <a href="http://arxiv.org/abs/hep-ph/0011014" rel="nofollow noreferrer">[4]</a>.</p>Mon, 17 Jun 2013 10:12:52 GMThttps://physics.stackexchange.com/questions/41109/-/68326#68326Abhimanyu Pallavi Sudhir2013-06-17T10:12:52ZAnswer by Abhimanyu Pallavi Sudhir for Mass of a superstring between two branes?
https://physics.stackexchange.com/questions/46118/mass-of-a-superstring-between-two-branes/68240#68240
2<p>It's similar -- </p>
<p>$$m^2 = (N - a) + \left(\frac{y}{2\pi}\right)^2$$</p>
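<p>For instance -- a quick illustration, assuming string units $\alpha' = 1$ and the NS-sector ground state ($N = 0$, $a = \tfrac{1}{2}$; for a brane-antibrane pair this state survives the GSO projection) --</p>
<p>$$m^2 = \left(\frac{y}{2\pi}\right)^2 - \frac{1}{2},$$</p>
<p>which is tachyonic for separations $y &lt; \pi\sqrt{2}$ and becomes massless at exactly $y = \pi\sqrt{2}$.</p>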
<p>More generally, the important difference from the bosonic string is that the number operator and the normal-ordering constant change for a superstring, and vary by sector ($a = \tfrac{1}{2}$ in the NS sector, $a = 0$ in the R sector).</p>Sun, 16 Jun 2013 11:12:27 GMThttps://physics.stackexchange.com/questions/46118/-/68240#68240Abhimanyu Pallavi Sudhir2013-06-16T11:12:27ZAnswer by Abhimanyu Pallavi Sudhir for How is it that angular velocities are vectors, while rotations aren't?
https://physics.stackexchange.com/questions/286/how-is-it-that-angular-velocities-are-vectors-while-rotations-arent/65738#65738
6<p>You are mixing up two different things. A rotation transformation is a transformation of vectors in a linear space -- such a transformation need not involve an angular velocity at all, and it need not even have anything to do with a mechanical rotation.</p>
<p>The angular velocity is the rate of a physical rotation, measured as $\vec\omega=d\vec\theta/dt$, where $\vec\theta$ is <em>also</em> a vector, the rotational analog of displacement.</p>
<p>In any case, $\vec\theta$ is not the same as the rotation matrix. The latter is a <em>function</em> of $\vec\theta$, and a matrix can represent many more things than just a rotation. Note that a physical rotation can still be modelled by a time-dependent matrix, as in $\vec{x}(t)=A(t)\vec{x}(0)$, but the matrix is still not the same thing as the angle of rotation.</p>
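<p>As a concrete illustration (a minimal numpy sketch, not part of the original answer): finite rotations fail to commute, which is why they cannot be added like vectors, while infinitesimal rotations commute to first order -- which is why angular velocities can.</p>
<pre><code>import numpy as np

def rot_x(theta):
    # Rotation matrix about the x-axis.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, -s],
                     [0, s,  c]])

def rot_y(theta):
    # Rotation matrix about the y-axis.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

# Finite rotations: order matters, so they cannot add like vectors.
print(np.allclose(rot_x(np.pi/2) @ rot_y(np.pi/2),
                  rot_y(np.pi/2) @ rot_x(np.pi/2)))   # False

# Infinitesimal rotations: the commutator is second order in the angle.
eps = 1e-6
diff = rot_x(eps) @ rot_y(eps) - rot_y(eps) @ rot_x(eps)
print(np.abs(diff).max())   # of order eps**2, i.e. negligible
</code></pre>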
<hr>
<p>Note: I've been a bit sneaky in claiming that $\vec\theta$ is a "vector" -- it really isn't, although it happens to have 3 independent components in 3 dimensions, so it is conventional to write the "xy" component as the "z" component, "zx" as "y", and "yz" as "x". In general, though, it is best to think of angles as antisymmetric (2, 0) tensors $\theta^{\mu\nu}$. Interestingly, the rotation transformation is a (1, 1) tensor $A^{\mu}{}_{\nu}$.</p>Fri, 24 May 2013 12:20:55 GMThttps://physics.stackexchange.com/questions/286/-/65738#65738Abhimanyu Pallavi Sudhir2013-05-24T12:20:55ZAnswer by Abhimanyu Pallavi Sudhir for Can someone please explain magnetic vs electric fields?
https://physics.stackexchange.com/questions/53916/can-someone-please-explain-magnetic-vs-electric-fields/65091#65091
3<p>The electric and magnetic fields are components of a single object: they mix and transform into each other under Lorentz boosts. The full picture of the field comes from the electromagnetic field tensor</p>
<p>$$F_{\mu\nu} = \begin{bmatrix}
0 & E_x/c & E_y/c & E_z/c \\
-E_x/c & 0 & -B_z & B_y \\
-E_y/c & B_z & 0 & -B_x \\
-E_z/c & -B_y & B_x & 0
\end{bmatrix}$$</p>
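<p>For example -- the standard result, obtained by transforming both indices, $F'_{\mu\nu} = \Lambda_\mu{}^\alpha \Lambda_\nu{}^\beta F_{\alpha\beta}$ -- a boost with velocity $v$ along $x$ mixes the components as</p>
<p>$$E'_x = E_x, \quad B'_x = B_x, \quad E'_y = \gamma(E_y - vB_z), \quad E'_z = \gamma(E_z + vB_y),$$</p>
<p>$$B'_y = \gamma\left(B_y + \frac{v}{c^2}E_z\right), \quad B'_z = \gamma\left(B_z - \frac{v}{c^2}E_y\right),$$</p>
<p>so a purely electric field in one frame appears as a mixture of electric and magnetic fields in another.</p>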
<p>This tensor satisfies simple identities (see <a href="https://en.wikipedia.org/wiki/Electromagnetic_tensor#Significance" rel="nofollow noreferrer">[1]</a>) equivalent to Maxwell's equations. The electric and magnetic fields are just different components of it, occupying positions analogous to, e.g., the momentum density and shear stress components in the 4d stress tensor.</p>Sun, 19 May 2013 05:01:31 GMThttps://physics.stackexchange.com/questions/53916/-/65091#65091Abhimanyu Pallavi Sudhir2013-05-19T05:01:31Z