Dinosaur comics discusses entropy and the fate of the universe.
T. Rex muses on the Poincaré recurrence theorem and Boltzmann’s suggested resolution of the arrow of time problem, but Dromiceiomimus seems to have a better understanding of the lessons of modern cosmology. Utahraptor, meanwhile, argues that the universe is not manifestly ergodic, and insists that the entropy problem is not yet resolved.
Shouldn’t this be in the “great errors civilians make” category?
Because given infinite time, we could still skip not only a finite subset of possibilities; we could skip an arbitrary number of infinite subsets of possibilities.
In fact, annoyingly, we could just repeat a finite number of possibilities an infinite number of times and be exceedingly boring, missing out on almost all of the infinite possibilities through sheer stubbornness.
But now I feel like I am channeling Max, so I’d better stop… 😉
Steinn: but doesn’t that violate the spirit of the approach Boltzmann took in formulating stat mech (that all microstates have equal probability)?
All accessible microstates, surely.
There may well be physically possible states, which are not accessible from (some given) initial conditions.
Maybe if it can be shown that the ensemble of universes includes all possible initial conditions, then given infinite time all allowed microstates occur; though I’d like to see a proof by construction (ie I can’t convince myself that it is impossible to exclude some subset of conceivable microstates by any such evolution).
Given some particular set of “initial conditions” on a “small enough” space (which I think may still be infinite), I think a heuristic proof is possible that many possible microstates are never actually reached, even given infinite time.
For a finite initial spatial extent, but infinite time, I think the proof is trivial.
Actually, surely this is trivial: even under Boltzmann, we could have parity constraints – for example an infinite universe existing for an infinite time might still be such as to exclude anyone ever being lefthanded. It would be physically conceivable for people to be lefthanded, but there would be a strong constraint forbidding any actual person from ever achieving left handedness in reality. And this would be purely arbitrary in that it could have been right handedness that was excluded.
Anyway, you can see where I am going with this…
As long as we don’t get into a semantic argument about “possible”.
I’m so damn sick of people not understanding this.
Here is an essay from my wiki (not yet on-line) that explains EXACTLY what is going on here.
(It’s in mediawiki syntax, but even so should be pretty easy to read.
The only thing that’s probably not clear is {{sc|pdf}} means PDF in small caps.)
=The Second Law of Thermodynamics and Boltzmann’s H-Theorem=
==The issue==
When one reads about statistical mechanics, both in textbooks and in popular
works, some misunderstandings on the subject that date from the late 1800s
and early 1900s still remain common. These tend to cluster around three
issues:
* Is the second law of thermodynamics absolute, or can a particular system evolve towards a “less likely” state?
* The reversibility and recurrence “paradoxes”.
* Are the systems of classical mechanics ergodic?
Physicists of the modern world have basically made their peace with the first
point. Although it was a big deal in the late 1800s, and considered a blow
against kinetic/atomic theory, no-one nowadays has a problem with the reality
that ice-cubes in the sun melt, alongside the theoretical possibility that
just once, this time, liquid water placed under a warm sun might freeze.
Even so, the understanding of what this actually means mathematically
is pretty limited and, when pressed for details, people usually get them wrong.
The second point, even more so, is usually completely botched, and the
horrible explanations given for it usually poison the understanding of the
other two points. So with this in mind, let’s examine the issue properly.
==A little history of the H theorem==
In the mid-1800s Clausius came up with the idea of entropy and the second
law of thermodynamics. At this time, recall, the very idea of atoms was
controversial; some felt that the concepts of thermodynamics were primal and
did not need to be justified or derived from models of how the world was
constructed, while at the same time others were pursuing the kinetic theory
of gases and trying to use its successes to prove the existence of atoms.
Against this background, the most significant result yet obtained from the
kinetic theory of gases was Maxwell’s velocity distribution. But some people
were unhappy with various aspects of the proof. The proof then, just like the
proof one usually sees today, assumed that the velocities of interacting
molecules were uncorrelated, something some felt was not justifiable.
(On the other hand, by making this assumption, the proof showed the
generality of the resultant distribution regardless of whatever details
one might assume of the interaction of the molecules.)
Boltzmann, to deal with this, came up with the Boltzmann Transport Equation,
which more explicitly dealt with the interactions. It was fairly easy to show
that a maxwellian distribution was static under this equation, in other words
would not change with time. But Boltzmann wanted to show something more; that
any other distribution would monotonically evolve towards the maxwellian
distribution.
To this end he defined a quantity (which we now call H), a property of a
particular distribution, and proved two things:
* dH/dt = 0 for a maxwellian distribution, and
* dH/dt < 0 for any other distribution, so that H decreases monotonically until the maxwellian distribution is reached.
This stronger claim immediately drew objections, most famously the reversibility and recurrence “paradoxes”, and a long tangle of arguments and counter-arguments followed.
The sad thing is that this same
mishmash of poorly thought-out arguments and counter-arguments still appears
in today’s textbooks. I remember being bugged by the sloppiness of these
arguments back almost twenty years ago when I was an undergraduate.
None of this is necessary — there’s a perfectly good, perfectly simple
explanation for what’s going on that doesn’t require this handwaving.
However to get to that point, we need a slight detour. I’m going to give the detour in
more detail than is needed just to deal with this problem because the ideas are
interesting, worth remembering, and best understood in a non-thermodynamics
context that hasn’t been poisoned with invalid arguments.
==Data compression==
Let’s switch to an apparently very different problem, the problem of data
compression as performed by computers. Data compression consists of two parts.
===modelling===
The first stage, called modeling, transforms fragments of the data in some
way so as to generate a stream of so-called symbols. Modeling varies from
compression scheme to compression scheme — in JPEG, for example, it
involves, among other things, splitting the image into 8×8 blocks and
performing a 2D DCT (something like a Fourier transform) on the data in each
8×8 block.
Modeling is specific to each compression scheme and the details do not
matter to us. What matters is that after modeling the result is a stream
of what we might abstractly call symbols. Suppose that our modeling results
in symbols that can have values 0..255. The simplest way to store these values
would simply be to use 8 bits for each symbol. This, however, would be far
from optimal if some symbols are very much more common than other symbols.
===entropy coding===
What is done in data compression is to encode the symbols using what is called
entropy coding. Entropy coding comes in two main forms: Huffman coding and
arithmetic coding.
{{infobox1|A second part of the theory
is that the bit stream you construct has to be readable, even though there
are no markers between the (variable length) bit strings indicating where
one stops and the next starts. This implies that the collection of bit strings
you use has to possess what is called the prefix property.}}
Huffman coding uses shorter strings of bits for symbols
that are more common, and longer strings of bits for symbols that are less
common. The theory tells you (given the probabilities of different symbols)
the optimal way to map symbols onto bit strings.
Arithmetic coding achieves the same goal as Huffman coding, namely using fewer
bits to encode the more common symbols, in a way that is somewhat more
efficient than Huffman coding, but quite a bit more difficult to understand.
However it’s not relevant to our discussion.
===an example: compressing english text===
So, given what we have said above, suppose we want to compress some data.
To avoid getting bogged down in irrelevant details, let us assume that the
data we want to compress is English language text stored one character per byte
using the LATIN-1 encoding (plain ASCII in the low half, accented characters
and other extras in the high half),
and that we are going to ignore the modelling stage of compression.
So the problem we have given ourselves is that we have symbols which are 8-bit
characters, 0..255. Right away we know that some characters are going
to be far more common than others. The characters with the high bit set (ie
with a value >127) are highly unlikely: these are diphthongs, accented
characters, punctuation symbols rarely used in English, and so on.
Punctuation characters are less likely than many letters, and capital letters
are less frequent than lower case letters.
Certain letters are much more likely than other letters.
====the probability distribution function for english text====
Compression is all about having an accurate mathematical model of the
probability structure of the data.
As a first approximation, we can consider the probability of each individual
ASCII character. This gives us an array of 256 probabilities. In some vague
sense that philosophers can argue over, there is presumably some sort of
“ideal” probability distribution function ({{sc|pdf}}) for English language text
that incorporates all text that has been and can be written, and that’s what
our compression program is targeting. But, of course, we can’t just conjure
up that ideal, so what we do is gather a large body of what we hope is
representative English text, calculate the empirical (as opposed to ideal)
statistics for that text, and treat those (sample) statistics as representative
of all English text and thus equal to our philosophical ideal.
We can then use these empirical probabilities to
construct a Huffman code (or to drive an arithmetic coder), and we have a
way to compress English ASCII text.
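To make the empirical route concrete, here is a minimal sketch of the frequency-counting step (in Python, added purely for illustration; the file name corpus.txt is just a placeholder for whatever large, hopefully representative body of English text you have gathered):
<pre>
from collections import Counter

# Sketch: estimate the per-character pdf of "English text" from a sample corpus.
# "corpus.txt" is a placeholder for your gathered body of representative text.
with open("corpus.txt", "rb") as f:
    data = f.read()

counts = Counter(data)                         # occurrences of each byte value 0..255
pdf = {byte: n / len(data) for byte, n in counts.items()}

# These sample probabilities then stand in for the "ideal" English pdf and can be
# fed to a Huffman-code builder or used to drive an arithmetic coder.
for byte, p in sorted(pdf.items(), key=lambda kv: -kv[1])[:5]:
    print(repr(chr(byte)), round(p, 4))        # the few most common characters
</pre>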
===the mathematical entropy associated with any discrete {{sc|pdf}}===
Now let’s step back a little from this example and consider the general
issue. As '''mathematicians''', we can define a quantity, named
the '''mathematical entropy''', for any {{sc|pdf}}. The entropy is defined as
S=-Sum[ probability(symbol)*lb( probability(symbol) ),
summed over all symbols ]
where lb() is the binary log (ie log to base 2) of a number.
This may seem a bit much to take in, but really it’s not hard.
Let’s assume we have four symbols, A, B, C, D, and that the probabilities are
(A, 1/2) (B, 1/4), (C, 1/8), (D, 1/8)
The entropy associated with this {{sc|pdf}} is 1*.5 + 2*.25 + 3*.125 + 3*.125 =1.75.
Note that perfect entropy
coding of a collection of symbols with some given {{sc|pdf}} means that each symbol
will take, on average, -lb( probability(symbol) ) bits to encode.
(Probabilities are less than one, the log is negative, so we add a minus
sign to make the result positive.)
So perfect entropy coding of our example would utilize
1 bit to encode an A, 2 bits to encode a B, and 3 bits to encode a C or a D.
It should be obvious from the above calculation that the entropy of the {{sc|pdf}}
is nothing more than the average number of bits required per symbol to perfectly
entropy encode a stream of data conforming to this {{sc|pdf}}.
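For concreteness, here is the same worked example as a few lines of Python (a sketch I have added; it contains nothing beyond the calculation above):
<pre>
from math import log2

# The four-symbol example above, checked numerically.
pdf = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

entropy = -sum(p * log2(p) for p in pdf.values())
print(entropy)                                 # 1.75 bits per symbol

# Perfect entropy coding assigns each symbol -lb(probability) bits:
for symbol, p in pdf.items():
    print(symbol, -log2(p), "bits")            # A: 1, B: 2, C: 3, D: 3
</pre>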
{{infobox| arithmetic coding |
In fact arithmetic coding entropy encodes data using a non-integral number
of bits per symbol, so we can actually approach perfect entropy coding in real
computer programs. This is a pretty neat trick, and I’d recommend you read
up on how it is done if you have time.
}}
You may wonder what happens when the symbol probabilities are not nice
power-of-two probabilities as in the example. In that case, Huffman encoding
cannot generate perfect entropy coding results, because the length of a
Huffman code is obviously some integer number of bits, while the perfect
entropy code might be some irrational number of bits, say 3.7569…
In this case the average number of bits required to Huffman encode the symbol
stream will be larger than the entropy; the entropy is a lower bound, the
absolute best we can do.
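Here is a small sketch illustrating that gap (the pdf with probabilities 0.4, 0.3, 0.2, 0.1 is an arbitrary choice of mine, and the Huffman construction is the standard textbook one, written in Python purely for illustration):
<pre>
import heapq
from math import log2

def huffman_code_lengths(probs):
    """Return the Huffman code length (in bits) for each symbol of a pdf."""
    # Heap entries: (probability of subtree, tiebreak, symbols in subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, i2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:     # every symbol in the merged subtree gets one bit deeper
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, i2, syms1 + syms2))
    return lengths

probs = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}      # not powers of two
lengths = huffman_code_lengths(probs)
avg_bits = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * log2(p) for p in probs.values())
print("code lengths:", lengths)                       # {'A': 1, 'B': 2, 'C': 3, 'D': 3}
print("average bits/symbol:", avg_bits)               # ~1.9
print("entropy lower bound:", round(entropy, 3))      # ~1.846
</pre>
For this pdf the Huffman code needs about 1.9 bits per symbol on average, while the entropy bound is roughly 1.85 bits per symbol; an arithmetic coder, not being restricted to whole bits per symbol, can get essentially all the way down to the bound.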
There are, of course, different {{sc|pdf}}s for different
sets of material we may consider compressing, for example the statistics,
and thus the {{sc|pdf}}, associated with the set of all photos (information highly
relevant to the design of a compression scheme like JPEG), are very different
from the statistics for English language text.
===entropy is a property of a {{sc|pdf}}, not a finite sample from that {{sc|pdf}}===
At this stage we need to point out an essential point,
'''the''' essential point to understanding this stuff, both in the context
of data compression and later in the physics context:
'''The {{sc|pdf}} describing the distribution of symbols is a property of some abstract infinite stream of symbols, for example some vague idea of the set of all English text.'''
Now the properties of a {{sc|pdf}} will almost certainly be measured empirically,
using as large a collection as is feasible of the type of material we want
to compress, for example a large collection of English documents.
From the statistics of this sample stream, an estimate of the entropy of
the {{sc|pdf}} governing these symbols is then a simple calculation.
The {{sc|pdf}} is, however, some sort of ideal entity not linked to
the particular sample material we used; the particular symbol stream
used to design a compression algorithm is simply regarded as a
representative sample from an infinite stream of symbols.
===a misleading concept: the “entropy” of a finite sample===
Switch now from the idea of all English text to focus on a particular
piece of English text, a particular file we wish to compress.
For any '''specific''' piece of English text, we can compress the stream of
symbols using an entropy coder and the {{sc|pdf}} for English text, and the
compressed data will have some size, meaning some average number of bits
per symbol.
We can call this, if we want, the entropy of this '''specific''' piece of English
text, but it is conceptually a very different thing from the mathematical
entropy we defined for the English language {{sc|pdf}}. This specific entropy (ie the
average number of bits required per symbol to represent the text) may be
rather larger than the entropy of the English language {{sc|pdf}} (for example the
text may be something written by James Joyce, or an article about words to
use in scrabble), or this specific entropy may be less than that
of the English language {{sc|pdf}} (for example the text may
be written for children, and may utilize only short simple words with
very little punctuation).
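A toy sketch of this distinction (the three-symbol stand-in for the “English pdf” and the two sample texts are invented purely for illustration): coding a specific sample with a perfect entropy coder driven by a reference pdf costs, on average, -lb(reference probability) bits per symbol of the sample, and that average can land on either side of the reference pdf’s own entropy.
<pre>
from math import log2

def entropy(pdf):
    """Entropy of the pdf itself, in bits per symbol."""
    return -sum(p * log2(p) for p in pdf.values() if p > 0)

def bits_per_symbol(text, ref_pdf):
    """Average bits/symbol to code this specific text with a perfect entropy
    coder driven by the reference pdf."""
    return -sum(log2(ref_pdf[c]) for c in text) / len(text)

ref = {"a": 0.5, "b": 0.3, "c": 0.2}           # made-up stand-in for "the English pdf"
print(entropy(ref))                            # ~1.49 bits/symbol for the pdf itself
print(bits_per_symbol("aaaaaaaaab", ref))      # simple, repetitive text: ~1.07, below the pdf entropy
print(bits_per_symbol("cccbbbbccc", ref))      # unusual text: ~2.09, above the pdf entropy
</pre>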
===if you want to learn more about data compression and entropy coding===
{{infobox| correlation between symbols |
The most important subject we have omitted from the discussion above,
interesting but not relevant to where we are going with this, is
exploitation of the correlation between
successive symbols to reduce the number of bits required for compression,
something that gets us into Markov models. (An obvious example is that the
letter q is almost always followed by the letter u, and surely a compression
scheme should be able to exploit that somehow.)
While Markov models are a
theoretically powerful method of doing so, there are severe practical
problems with using them because of a combinatorial explosion in the number
of probabilities one has to keep track of. The major goal of modeling
is to attempt to restructure the data stream from its initial form, where
there are obvious correlations between various pieces of data, to some
intermediate form whose symbols are, as far as is practical, independent of
each other. How best to do this clearly depends on the type of data and the
techniques used for text, still images, video, general audio or speech are
all very different.
The rest of the book recommended below (Taubman and Marcellin) is concerned with
the details of the modelling used by JPEG2000: fascinating but very dense.
}}
If you are interested in the details of entropy coding beyond what I’ve
discussed, IMHO by far the best introduction is Chapter 2 of
[http://www.amazon.com/exec/obidos/tg/detail/-/079237519X/qid=1090272956/sr=8-1/ref=sr_8_xs_ap_i1_xgl14102-5724443-9727327?v=glance&s=books&n=507846 the JPEG2000 book by Taubman and Marcellin].
(This is an expensive book and, unless you are really interested in the
subject, you probably won’t want to read most of it, so I’d suggest borrowing
a copy from a library or a friend rather than buying it.)
==The H theorem refers to {{sc|pdf}}s, not samples==
{| style="float:right; margin-left: 1em; width:50%;" cellpadding=5 cellspacing=1 border=0
|-
|align=left width=100% style="background-color:#f3f3ff; border:1px solid"|
”’physics entropy rather than cs entropy”’
Note that the explanation above utilized logarithms to base 2 to calculate the
entropy for the purposes of computer science. In physics, with a different set of
concerns we calculate entropy using logarithms to base e, but the essential points
remain the same.
Note also that the explanation above dealt with a discrete {{sc|pdf}}.
There are interesting technical mathematical challenges when one goes from a
discrete {{sc|pdf}} to a continuous {{sc|pdf}}, like for example, a gaussian, but
we will ignore those and focus on the important thing which is that, after all
the pain of proving the results, the bottom line is that our ideas from discrete
{{sc|pdf}}s map over to continuous {{sc|pdf}}s pretty much as we’d expect.
|}
With the above detour out the way, let’s return to Boltzmann;
perhaps you can already see what the fundamental issue is.
Boltzmann’s theorem refers to '''{{sc|pdf}}s'''. It says that the time evolution of a
{{sc|pdf}} occurs in a certain way.
Meanwhile the reversibility and recurrence paradoxes refer to
specific instances of a mechanical system, '''not''' to {{sc|pdf}}s. As such, what they
do or don’t say is irrelevant to Boltzmann’s theorem.
===a rigorous mathematical view of the Boltzmann transport equation===
More specifically we can say that, from the point of view of a nicely
manageable mathematical structure, we want to talk about {{sc|pdf}}s.
We can, as mathematicians, define a mathematical structure that is a function
of space and time whose value at each space-time point is a probability density
function over velocity. This is a more careful, more explicit way of defining the
function appearing in Boltzmann’s transport equation.
If we now define a way in which this {{sc|pdf}}-valued function evolves with time
(the Boltzmann transport equation) we have a perfectly consistent well defined
mathematical problem. We can now prove various properties of this
mathematical system, one of which is that (assuming various properties of
the specific transport equation we’re using) the entropy of the {{sc|pdf}} associated
with each spatial point is monotonically non-decreasing.
(This mathematical result holds for any {{sc|pdf}}, but is physically only useful for
situations where a {{sc|pdf}} plausibly suggests itself.
For the most part such situations are either equilibrium [ie the
pdf is the maxwell-boltzmann distribution], or “different equilibrium at
different points of space” eg a gas with some non-uniform temperature
distribution. )
===a real world view of a collection of molecules===
OK, this is a fully consistent mathematical construction.
However, in the real world we mostly don’t deal with {{sc|pdf}}s;
we deal with finite collections of real atoms or molecules.
For example a finite collection of real gas molecules does '''not''' evolve according
to the Boltzmann transport equation. The very idea makes no sense, since
the entities referred to in the two situations (on the one hand a {{sc|pdf}}-valued
function, on the other hand a large collection of positions and velocities)
are completely different.
A collection of real gas molecules evolves according to the laws of mechanics
rather than the Boltzmann transport equation, and therefore is indeed subject to
the issues of reversibility and recurrence, properties that can be proved for
mechanical (hamiltonian) systems.
Now, going back to the transport equation, the pdf that we associate with any
particular point of space-time at equilibrium is, of course, the maxwellian
distribution. With this distribution in mind, note that, just as we did with our
specific piece of English text, we can calculate a '''specific''' entropy for a specific
collection of gas molecules. Such a calculation would first calculate the appropriate
“temperature” parameter for this collection of molecules, perhaps based on the
standard deviation of the distribution of speeds of all the molecules. It would
then loop over all the molecules, for each one calculating, for that molecule’s
velocity, an appropriate probability from the maxwellian pdf, multiplying that
probability by the log of that probability, and summing the results.
Just as in the case of compressing a particular piece of English text, this calculation
might result in a value higher or lower than the entropy of the maxwellian {{sc|pdf}} at
the temperature we calculated for this system.
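Here is one way the calculation just described might look in code. To be explicit about what is mine rather than the text’s: I fit the “temperature” parameter from the mean square speed (rather than the standard deviation of the speeds), I take the per-molecule average of -ln f(v) as the “specific” entropy (natural logs, as is usual in physics), and every number is invented for illustration.
<pre>
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a real collection of molecules: speeds whose velocity components
# are independent Gaussians, i.e. a sample from a Maxwell speed distribution
# with scale a_true = sqrt(kT/m).
a_true = 400.0                                   # arbitrary scale, in m/s
v = np.linalg.norm(rng.normal(0.0, a_true, size=(10**6, 3)), axis=1)

# "Temperature" fit from the sample, using <v^2> = 3*a^2 for the Maxwell distribution.
a_fit = np.sqrt(np.mean(v**2) / 3.0)

def maxwell_pdf(v, a):
    """Maxwell speed distribution f(v) with scale a = sqrt(kT/m)."""
    return np.sqrt(2.0 / np.pi) * v**2 / a**3 * np.exp(-v**2 / (2.0 * a**2))

# "Specific" entropy of this particular collection of molecules.
specific_entropy = -np.mean(np.log(maxwell_pdf(v, a_fit)))

# Entropy of the Maxwell pdf itself at the fitted temperature (its differential
# entropy has the closed form ln(a*sqrt(2*pi)) + Euler's gamma - 1/2).
pdf_entropy = np.log(a_fit * np.sqrt(2.0 * np.pi)) + np.euler_gamma - 0.5

print(specific_entropy, pdf_entropy)             # close, but in general not exactly equal
</pre>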
===connection between the mathematical ideal and the collection of molecules===
The connection between the mathematical ideal and the real world is that
# assuming the mathematical {{sc|pdf}} is chosen correctly, things happen in the real world as frequently or infrequently as the probabilities of the {{sc|pdf}}, ie sampling the properties of a large number of molecules and binning the results will give you values just like what you’d expect from the {{sc|pdf}}
# the {{sc|pdf}} for most physical situations is astonishingly peaked, meaning that physical configurations of molecules that don’t match everyday experience have ridiculously low probabilities. (Compare, for example, the statistics of some randomly chosen piece of English text. We expect it to have statistics much like that of the English language pdf, but would not be surprised to learn that, for example, this piece of text utilizes 1% more “e”s or 5% fewer “w”s than the pdf tells us are the case for the entire universe of English language text. However, when dealing with, say, of order 10^18 molecules that have had a chance to equilibrate, we would expect to wait much longer than the age of the universe before seeing deviations of order 1% between statistics calculated for our collection of molecules and the appropriate value calculated from our {{sc|pdf}}; the sketch below illustrates how quickly such deviations shrink as the sample grows.)
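Here is the promised sketch (the symbol “e”, its probability 0.12, and the sample sizes are all invented): the relative deviation of a sample frequency from the pdf value shrinks roughly like 1/sqrt(N), which is why, at molecular numbers, the pdf value and the sample value are indistinguishable for all practical purposes.
<pre>
import numpy as np

rng = np.random.default_rng(1)
p_e = 0.12                                    # made-up pdf probability of the symbol "e"
for n in (10**3, 10**6, 10**9):
    observed = rng.binomial(n, p_e) / n       # frequency of "e" in a sample of n symbols
    rel_dev = abs(observed - p_e) / p_e
    print(f"N = {n:>10}: relative deviation ~ {rel_dev:.1e}")
</pre>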
==Reconciliation between thermodynamics and Boltzmann==
So in summary what we can say is that
# Boltzmann was right, in that the H-theorem does provide a mathematical proof of the monotonic increase of entropy AS HE DEFINED IT.
# His opponents were right in that real mechanical systems, in theory (though hardly in practice) can reduce their entropy AS THEY DEFINED IT.
# We would all be better off using a different word to distinguish the entropy of a {{sc|pdf}}, a nice, clearly defined mathematical construction, from the “entropy” of a specific mechanical system, a rather less well defined mathematical construction. (You can come up with a consistent mathematical definition for this “specific” entropy, but the result doesn’t quite mean what you probably think it means.)
# The fact that the {{sc|pdf}} entropy is (in practical terms) equal to the (per-instance) entropy is an example of a not-infrequent situation in science: two conceptually very different mathematical ideas, when not well understood, are considered to be the same thing. At first this allows for progress, but once the field is understood, the conflating of the two ideas (which usually occurs through using language inexactly) is inexcusable. Unfortunately it is a rare case indeed where textbook writers are willing to break with the past and modify their language so as to undo this confusion.
Another view of this is to bring classical thermodynamics into the mix.
One mathematically consistent way to look at the world is via statistical mechanics, utilizing
{{sc|pdf}}s and appropriately defined entropy as I have discussed.
Another mathematically consistent viewpoint is axiomatic thermodynamics which takes
concepts like temperature, entropy, and the second law as unprovable starting points.
What is not consistent, and where one gets into trouble (claiming things like “the second
law is only true on average”), is where one attempts to utilize the statistical mechanics
viewpoint but applies it not to the calculation of {{sc|pdf}}s, but to the calculation of
the average properties of some '''specific''' collection of molecules.
If you’re going to do this, you need to be very careful about exactly what you are claiming
is a specific property of your collection of molecules vs what is a property of the set of
all collections of molecules. The astute reader will realize that Gibbs’ ensembles are,
essentially, a way to deal with this issue and that, though not using my language, he is
concerned with calculating {{sc|pdf}}s and their properties.
=Zermelo’s Criticism of the H-Theorem=
Along with the misguided attacks on the H-theorem that we have discussed (those that
mistake the evolution of the pdf for the evolution of the system),
there is a more interesting attack, first presented by Zermelo.
The argument goes thus:
Liouville’s theorem tells us that under evolution via a Hamiltonian, the
measure of a subset of phase space does not change. It’s a short step from
this to showing that the H of a mechanical system cannot
change (for any {{sc|pdf}}). Once one thinks about this for a few seconds, it
actually becomes quite reasonable, especially when thought of in the context
of our description of file compression above. What we have is a system with
a certain amount of uncertainty (the initial {{sc|pdf}}) along with deterministic
evolution in time which is not adding any more uncertainty.
(How can one reconcile this with Boltzmann’s proof of the H theorem?
That proof includes an expression describing the scattering after
interaction of two components, and reduces this to some sort of probabilistic
expression. If the Hamiltonian is taken as gospel, this reduction must be
invalid, and must be ignoring correlations in the components from earlier
interactions that, although apparently small, are actually essential.)
This is something of a kick in the pants, and strikes me as much
more problematic than the earlier attacks on the H theorem.
My take on this matter (and I’d love to be corrected if I am wrong) is that this
can be viewed in two ways.
* One could attempt to argue that H (or the equivalent, entropy) has not really increased because there exist fiendishly complicated correlations between the various components of the system; these correlations are, however, not in any way apparent to our eyes, and so the system appears to have become more disordered. It’s hard to keep this up, however, across all physical phenomena, for all of time. This argument is essentially claiming that the disorder of the world (and its increase) is only in our brains, not in reality.
* Alternatively one could argue that, although these correlations between components grow for some amount of time, every so often something occurs that ruins the coherence, and that it is ultimately this something that is driving the second law. In the pre-quantum past this something was called “molecular disorder”, and now we might call it “collapse of the wave function”. This is the view I espouse and is, I suspect, what most physicists would agree with if pushed. What is interesting is that so important an issue, ”’the”’ driver of entropy increase, is simply not mentioned in the same elementary textbooks that make such a mess of explaining the supposed problems with the second law.
The reader will, I trust, not have missed the remarkable similarity between
this discussion and the general problem of the evolution of quantum systems.
=Ergodicity=
A final related issue that sometimes causes confusion, though more so in the past,
is the issue of ergodicity. Ergodicity is the claim that a '''specific''' mechanical
system, if left for long enough, evolves through all the states of the {{sc|pdf}},
with the amount of time spent in the neighborhood of each state being
proportional to the probability associated by the {{sc|pdf}} with that neighborhood.
The ergodic assumption is, to clarify, not a part of the calculation of a {{sc|pdf}}
or of how a {{sc|pdf}} evolves in time; it is useful when trying to connect the abstract
idea of a {{sc|pdf}} to the concrete reality of a specific physical system. The idea
is something like this: if the ergodic hypothesis is true, then the specific
mechanical system (collection of gas molecules or whatever) evolves through
enough states over a macroscopic period of time that what our senses and our
instruments see is simply an average; moreover that average (over time, for
this specific instance of the mechanical system) is the same as the average
one calculates by averaging over the {{sc|pdf}}.
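As a toy illustration of the time-average versus pdf-average idea (not a mechanical system, just the simplest ergodic example that fits in a few lines): the logistic map x -> 4x(1-x) on [0,1] is known to be ergodic with respect to the invariant pdf 1/(pi*sqrt(x(1-x))), whose average value of x is 1/2, and the time average along a single long orbit lands very close to that pdf average.
<pre>
# Time average along one orbit of the logistic map x -> 4x(1-x), compared with
# the average of x over the map's invariant pdf, which is exactly 1/2.
x = 0.123456789            # arbitrary initial condition (avoid special points like 0, 0.5, 0.75)
total, steps = 0.0, 10**5
for _ in range(steps):
    x = 4.0 * x * (1.0 - x)
    total += x
print(total / steps)       # close to 0.5, the pdf average
</pre>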
Maxwell and Boltzmann on occasion justified what they were doing on the basis
of the ergodic hypothesis.
If one is even slightly familiar with measure theory, the ergodic assumption
appears to have to be false; one is trying to map a trajectory (a single
continuous line) onto a volume, and measure theory tells us that while this
can be done, it cannot be done with a continuous mapping. The bottom line is
that mathematicians fairly easily proved that the ergodic assumption was
false. However it appears that what Maxwell and Boltzmann meant by the ergodic
assumption was not the exact ergodic assumption described above but something
that looks pretty much the same to physicists but not to mathematicians, the quasi-ergodic
assumption, which assumes that while the system will not pass through every
state available, it will pass '''arbitrarily close''' to every available state.
Even this less demanding quasi-ergodic assumption is not
necessarily true for certain specific states of certain specific mechanical
systems. One can imagine, for example, a collection of billiard balls arranged
so carefully (as a lattice perhaps) in such a way that as time goes by they
continue to bounce off each other, while retaining the lattice structure,
forever. But this is clearly a somewhat pathological example.
Mathematicians have delighted in looking at this problem in ever
finer detail, asking if there are conditions one might place on either the
initial state or the collection of forces (ie the hamiltonian/lagrangian)
governing the time-evolution of the system, that will compel the system to
either be or not be quasi-ergodic. Their conclusion seems to be that
many interesting classical-mechanical systems are in fact quasi-ergodic.
This discussion is, to be honest, quite irrelevant to our real-world interest in
statistical mechanics. In the real world, what we want to know is to what
extent the averages we can calculate easily (ie averages over a {{sc|pdf}}) will
match what we measure (ie averages over some finite volume and some finite
timespan of the evolution of a specific instance of a mechanical system);
sometimes we are more ambitious and also want to know the extent of the
deviations we might expect our real world measurements to take from the {{sc|pdf}}
averages.
Ergodicity, as the mathematicians deal with it, is basically useless for this task.
* First of all it is clear that minute perturbations to the mechanical system (for example, gravitational effects of other planets), while presumably having no effect on large scale averages like density and pressure, have a significant effect on ergodicity or the lack thereof; return again to our finely balanced lattice of moving billiard balls. Ergodicity is an astonishingly brittle property.
* Secondly real world systems are, of course, quantum mechanical, and while classical mechanics is frequently a fine approximation to their behavior, it’s not at all obvious that a system proved to be ergodic (or not) as a classical system remains so as a quantum mechanical system.
* Thirdly it is not enough to know that over a ''long enough'' duration of time a time average matches a {{sc|pdf}} average. One wants to know the behavior over a specific duration of time, eg the duration over which one’s experimental sensors are active.
As far as I can tell, as real world physicists, we pretty much simply
make the assumption that for any system we care about the myriad sources
of randomness in the world (minute perturbations, quantum effects, the finite
size and duration of measurements), all blend together in such a way that
{{sc|pdf}} averages are expected to match experimental results. I know of no
mathematical results that come even close to proving that this is actually
expected to be the case for real-world conditions, though it seems like the
sort of thing that could be proved if one were smart enough.
Yeah, I read all that. *rolls eyes*
Oh my, now we’re all, like, serious and stuff.
First, let’s remember that there is a qualitative difference between infinite time and merely a very long time.
Now, let me present a toy model of the universe. Fans of some recent speculation may feel free to treat it as a real model of the universe.
Consider a 2-D Euclidean sheet.
Divide it into unit pixels.
Without loss of generality, let each pixel have two states, 1 and 0.
Let there be some initial time, T0, and let time advance in discrete steps, dT.
Each pixel may change state according to some rules, based only on those nearby pixels in “causal contact” (ie after N steps, only pixels whose distance, s, is less than NdT away affect the pixel), for some metric on the space.
Now, assume that “a priori” if you sample a patch of this space of k pixels, the probability of any microstate is just 1/2^k
Now, there are logically 4 possibilities
a) finite space and finite time
b) finite space and infinite time
c) infinite space and finite time
d) infinite space and infinite time
So, in any of the 4 possibilities above, is it logically possible for all finite substates to be generated?
For a) and b) there are only a finite number of allowed states; in either case it is possible, but not necessary, that all states are accessed (given a long enough time).
For c) clearly you can NOT access all states;
so the only remaining option is d), for which we can ask whether all possible finite states are reached somewhere on the sheet at some time.
The answer to that is “depends” – it depends on the sheet “initial condition” and on the rules.
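For concreteness, here is a minimal sketch of such a sheet (a finite one with wrap-around edges; the majority-vote update rule and the two initial conditions are arbitrary choices of mine, since the rules above are deliberately left open). It simply records which k = 4 pixel microstates ever occur:
<pre>
import numpy as np

def step(sheet):
    """One discrete time step dT: each pixel looks only at its 8 nearest neighbours
    (a majority-vote rule, chosen arbitrarily for this sketch)."""
    neighbours = sum(np.roll(np.roll(sheet, dy, 0), dx, 1)
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return (neighbours > 4).astype(int)

def microstates_seen(sheet, steps=100):
    """Which 2x2 microstates (k = 4 pixels, 2^4 = 16 possibilities) ever occur?"""
    seen = set()
    n = sheet.shape[0]
    for _ in range(steps):
        for y in range(n - 1):
            for x in range(n - 1):
                seen.add((sheet[y, x], sheet[y, x + 1], sheet[y + 1, x], sheet[y + 1, x + 1]))
        sheet = step(sheet)
    return seen

rng = np.random.default_rng(42)
random_init = rng.integers(0, 2, size=(64, 64))   # random 0/1 initial condition
striped_init = np.zeros((64, 64), dtype=int)
striped_init[:, :32] = 1                          # left half 1, right half 0

print(len(microstates_seen(random_init)), "of 16 microstates seen from the random start")
print(len(microstates_seen(striped_init)), "of 16 microstates seen from the striped start")
</pre>
With the random start essentially every 2x2 microstate shows up right away; with the striped start the sheet is frozen under this particular rule, so only 3 of the 16 ever appear no matter how long you run it. Which states get reached really does depend on the initial condition and on the rules.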
Further, I would confidently claim that for such a system, in fact for any infinite system, the answer to whether all states are reached, or whether some finite or infinite subset is never reached, is formally undecidable for most sets of update rules (because for a lot of rules this reduces to the Turing halting problem).
So there. We may have infinite time and either finite or infinite space, but not only is it logically possible that some states are never reached, the actual answer may be unobtainable.
Any resemblance to holography or Wolfram’s speculations is a pure coincidence.
The possibility of continuous states vs discrete states is interesting; in QFT there is an assumption of an asymptotically static and flat background space, which does not hold in reality, and given our actual cosmology combined with the finite speed of light, the question of truly continuous quantum states is somewhat ill determined.
Cox has a stupid face.