Monday, September 26, 2016

On using Taylor expansions in economics

Jo Mitchell put up a tweet about a conversation with a theoretical chemist:

I'm fairly sure that the chemist's response must have been based on little information about macroeconomics, because after immersing myself in the subject this physicist doesn't see anything wrong with keeping just the linear order terms.

One possibility is that the chemist misunderstood "first term" to mean just the zero-order polynomial (i.e. a constant), but I will take this to mean the first non-constant term (which may in fact be the quadratic one for reasons I'll go into below). For those unfamiliar with the idea, a Taylor expansion is a polynomial approximation to a more complex function, and the 'terms' are the pieces proportional to the powers of the variable. Basically, for any smooth enough function f(x) near x = a, we can say

f(x) ≈ f(a) + (df/dx|x=a) (x − a) + (1/2)(d²f/dx²|x=a) (x − a)² + ...

where "F|x=a" means "F evaluated at x = a". This shows the zero order, first order and second order terms. Note that the first and second derivatives are evaluated at the point a that you are approximating the function near, and can therefore be considered constants:

f(x) ≈ c₀ + c₁ (x − a) + c₂ (x − a)² + ...

At a given order, this approximation is usually only good inside a limited region where x ≈ a. Taylor expansions are used in lots of places -- and are typically useful if the variable x stays in some neighborhood of a or x itself is small. In the case where a = 0, it is technically called a Maclaurin series, which I only mention in order to post my picture of Colin Maclaurin's memorial in Greyfriars Kirkyard in Edinburgh (again).


Anyway, a few really useful Taylor (Maclaurin) series expansions (to second order) are

sin(x) ≈ x
cos(x) ≈ 1 − x²/2
log(1+x) ≈ x − x²/2

That last one crops up in economics all the time; if you express your variable as a deviation from 100% (i.e. 1) and keep only the linear term, then the logarithm is approximately equal to that percent difference: log(100% + x%) ≈ x%. This is the basic idea behind log-linearization [pdf]. That also tells us that keeping only the linear terms isn't that big of a problem. For example, a bad recession involves a 10% shock to output or employment. The error in log(1+x) from keeping the linear term only is ~ x²/2, or about 0.1²/2 = 0.005 = 0.5%. Not bad. If you compound growth over many years, this starts to become an issue, though. For example, 2% inflation over 50 years leads to 50% error in the log-linear approximation.
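The error estimate above is easy to check directly. Here is a minimal sketch (the 10% shock value is the one used in the text):

```python
import math

# A quick check of the error estimate above: for a 10% shock, keeping
# only the linear term of log(1 + x) is off by about x^2/2 = 0.5%.
x = 0.10
linear = x                    # log(1 + x) ≈ x
exact = math.log(1 + x)

error = abs(exact - linear)   # roughly 0.005, i.e. 0.5%
```

Note that the linear term overstates the logarithm, consistent with the negative sign of the quadratic correction.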

In addition to there being little numerical benefit in going beyond leading order in macroeconomics, there are a couple of other issues that might give you pause when using Taylor expansions.

We usually choose x = a near an equilibrium in the sciences

In economics, equilibrium isn't necessarily well defined (and worse, often just assumed), and the higher order terms in the Taylor expansion represent parameter space even further from that ill-defined equilibrium. Tread lightly in those dark corners! In physics, chemistry, and other sciences, this equilibrium is well-defined via some maximization or minimization principle (maximum entropy, minimum energy, etc.) with an interior optimum, and you can use that fact to your advantage: being near an optimum means the linear term is c₁ ≈ 0, leaving only the second order term. You may think that the utility maximum in economics is such a local optimum; however, it is usually a maximum over a bounded region (e.g. a budget constraint), meaning the optimum is on the edge, so the linear term doesn't necessarily vanish (which is why I mentioned the interior optimum above).
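The interior-versus-boundary distinction can be made concrete with a toy objective (the function here is purely hypothetical):

```python
import sympy as sp

# A small illustration of the interior vs boundary optimum point above,
# using a hypothetical objective f(x) = -(x - 2)^2.
x = sp.symbols('x')
f = -(x - 2)**2

# At the interior maximum (x = 2), the linear Taylor coefficient vanishes
c1_interior = sp.diff(f, x).subs(x, 2)

# Constrained to x <= 1, the optimum sits on the boundary at x = 1,
# where the linear coefficient does not vanish
c1_boundary = sp.diff(f, x).subs(x, 1)
```

So an expansion around the constrained optimum keeps a nonzero linear term, unlike the unconstrained case.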

Also, in the sciences, your degrees of freedom might change when you move away from the linear zone near x ≈ a. In an ideal gas, the rotational or vibrational modes of your molecules might become important, or the thermal de Broglie wavelength may become comparable to the interparticle spacing (quantum effects become important). I am under the impression that rational agents are similarly only valid in a narrow region near macroeconomic equilibrium.

The function f(x) is usually ad hoc in economics

The function f(x) in macroeconomics is usually some guess (ansatz) like a Cobb-Douglas function or CES function. Taylor expanding an ad hoc function is really just choosing the c₁'s and c₂'s to be arbitrary parameters. This contrasts with the case in physics, chemistry, and other sciences where the function f(x) is usually not ad hoc (e.g. expanding the relativistic energy gives you the classical kinetic energy term at second order in v/c), or you are near an equilibrium in which case c₁ ≈ 0 and adding c₂ doesn't leave your problem with more parameters than the general linear case.
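The relativistic energy example can be checked symbolically; a quick sketch:

```python
import sympy as sp

# The relativistic energy example from the paragraph above: expanding
# E = m c^2 / sqrt(1 - v^2/c^2) around v = 0 recovers the rest energy
# plus the classical kinetic energy at second order in v.
m, c, v = sp.symbols('m c v', positive=True)
E = m * c**2 / sp.sqrt(1 - v**2 / c**2)

expansion = sp.series(E, v, 0, 4).removeO()   # m c^2 + m v^2 / 2
```

Here the coefficients are fixed by the underlying theory, not free parameters -- which is the contrast being drawn with ad hoc functional forms.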

It makes the identification problem even worse in economics

Identification at linear order already involves m² versions of the coefficient c₁ for m macroeconomic observables (an m × m matrix). Going to second order adds another m² parameters (the c₂'s). As Paul Romer noted in the case of adding expectations, adding second order (nonlinear) terms makes the identification problem twice as bad (especially because the functions are ad hoc, as mentioned above). The reason you have so many parameters is that the original m equations don't come from some theoretical framework like you have in physics and chemistry (where symmetries or conservation laws constrain the possible parameters).

And last but not least ...

Macroeconomics doesn't have a working linear theory yet

Some people might say there isn't a working linear theory yet because those second order terms are important. However, given that a major recession is roughly a 10% shock to RGDP, this seems unlikely. In fact, RGDP per capita is a fairly straight line (log-linear). There are some exceptions, but they are not frequent (e.g. France and Japan transitioned from one log-linear path to a different one after WWII). That is to say, unless you are dealing with a WWII-level disruption, the data is pretty (log-)linear. Once we get that down, we can start to try to understand a more complicated nonlinear theory.

...

Anyway, that is why I think it's fine to keep only those first terms. It has nothing to do with the mathematics, but rather the theoretical and empirical state of macroeconomics. The field still needs its linear training wheels, so let's not laugh.

Sunday, September 25, 2016

Krugman's Keynesians and information equilibrium

Keynes' General Theory (image from Bauman Rare Books)


I was reading Robert Waldmann on macroeconomic puzzles (a very good post) and it inspired me to see how one should understand Keynes in the light of information transfer economics (see link for definitions used below). Waldmann sets up an understanding in terms of Krugman's division of Keynes readers (see here [pdf]; I swear there was a recent blog post Krugman wrote about it again, but I can't find it). Anyway, Waldmann says (and quotes Paul Krugman from the linked pdf):
As usual, the challenge to the young macroeconomist goes back to Keynes. The General Theory of Employment Interest and Money begins with [B]ook 1 containing models which are not difficult enough (aside from the fact that they were explained clearly and in detail a year later by Hicks). It also includes Chapter 12 on long term expectations (beauty contests and all that) clearly presenting problems too hard for Keynes. ... Here (as almost always) I am following Krugman 
[begin Krugman] I’d divide Keynes readers into two types: Chapter 12ers and Book 1ers. Chapter 12 is, of course, the wonderful, brilliant chapter on long-term expectations, with its acute observations on investor psychology, its analogies to beauty contests, and more. Its essential message is that investment decisions must be made in the face of radical uncertainty to which there is no rational answer, and that the conventions men use to pretend that they know what they are doing are subject to occasional drastic revisions, giving rise to economic instability. What Chapter 12ers insist is that this is the real message of Keynes, ... [end Krugman] 
The lack of puzzles is due to the fact that the puzzle addressed in book 1 is solved in book 1 and the problems posed in [chapter 12] declare themselves to have "no rational answer".


This kind of encapsulates the information transfer picture. Book 1 in this view is a particular information equilibrium model (I started writing it out explicitly here, but Hicks' version, the IS-LM model, can be seen as information equilibrium as well). For example, Keynes' Postulate I states that:
I. The wage is equal to the marginal product of labour

This is the information equilibrium relationship (W/P) : Y ⇄ L where Y is real output, W is nominal wage, P is the price level and L is the labor supply. The second postulate could be tackled in terms of the information equilibrium take on utility. The other relationships can be understood in terms of the information equilibrium IS-LM model. I will eventually try to put together the whole of Book 1 in terms of information equilibrium.

Chapter 12, on the other hand, is non-ideal information transfer. As Waldmann put it "Keynes wrote that there will be manias, panics, and crashes no matter what policy makers do ...". I find it interesting that what Keynes was discussing in Chapter 12 is the state of long-term expectation and that one way to think about deviations from information equilibrium is as expectations with varying degrees of accuracy. In general, non-ideal information transfer gives us a way to look at various economic shocks. Since it is not information equilibrium (from which we could see rational agents as emergent as long as we stay near equilibrium), non-ideal information transfer represents (as Krugman says above) "radical uncertainty for which there is no rational answer". There may eventually be an answer, but it will come from psychology and sociology.

The information transfer framework allows us to join together Krugman's "Book 1 Keynesians" and his "Chapter 12 Keynesians" in a coherent whole. Both are 'correct' (at least in terms of the IT framework) -- under the right conditions.

Are maximum entropy models the only possible models with rational expectations?

This post is an incomplete speculative synthesis of several previous posts.
The question is whether you can have any expectation of a future model equilibrium besides the model-consistent (i.e. rational) expectation in a maximum entropy model. Or another way: is the information loss required for the time translation (inverse Koopman) Et operator [2] the same as the information loss in reaching the maximum entropy state?

This is true for a normal distribution. The information in the initial distribution besides the mean and variance is exactly the information lost with the (inverse) Et operator.

Consider the KL divergence between an arbitrary distribution Q of given mean and variance and a normal distribution N(μ,σ): DKL(N||Q) ≡ ΔI. Propagation of Q into the future (via the inverse of Et) leads to a normal distribution (central limit theorem), resulting in information loss ΔI. Additionally, the normal distribution is the maximum entropy distribution constrained to have a given mean and variance, so the approach to maximum entropy (disappearance of the initial condition Q information) will also yield a normal distribution, and therefore the same information loss DKL(N||Q) ≡ ΔI.
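The quantity ΔI is easy to estimate numerically. A hedged sketch, using a Laplace distribution as the (purely illustrative) choice of Q with the same mean and variance as the standard normal:

```python
import numpy as np

# Numerical sketch of the paragraph above: D_KL(N || Q) where Q is some
# distribution with the same mean and variance as the normal N. The
# choice of Q (a Laplace distribution) is just an example.
x = np.linspace(-10, 10, 20001)
dx = x[1] - x[0]

# Standard normal N(0, 1)
n = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Laplace with mean 0 and variance 1 (scale b = 1/sqrt(2))
b = 1 / np.sqrt(2)
q = np.exp(-np.abs(x) / b) / (2 * b)

# Delta I = D_KL(N || Q): information in Q beyond its mean and variance
delta_i = np.sum(n * np.log(n / q)) * dx
```

The result is a small positive number, as it must be for any Q that differs from the normal beyond its first two moments.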

Does this always work out? We'd like to ask: is every universal distribution (with constraint C) also a maximum entropy distribution (with that same constraint C)?

One issue is that the uniform distribution is the maximum entropy distribution for a bounded random variable, but not universal in the sense of the central limit theorem (samples from the universal distribution that propagate in time via Et become a normal distribution). However, any distribution can be related to a uniform distribution (probability integral transform), so maybe this issue isn't as problematic as it first appears. I'll see if I can work out a proof**.
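The probability integral transform mentioned above can be sketched in a few lines (the target distribution here, a standard normal, is just an example):

```python
import random
from statistics import NormalDist

# Sketch of the probability integral transform: uniform draws map to
# any target distribution through its inverse CDF (here a standard
# normal, purely as an example).
random.seed(0)
u = [random.random() for _ in range(10_000)]
z = [NormalDist().inv_cdf(ui) for ui in u]

mean = sum(z) / len(z)   # should be near 0 for a standard normal
```

This is the sense in which any distribution can be related to a uniform one.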

For right now, this post is just capturing my half-baked thinking. Here is my intuition in terms of economics. The key point is that agents must expect the information loss (in any model) because otherwise the operator Et is ill-defined (the inverse of a non-invertible operator [2]). Additionally, information loss (about the initial conditions) is exactly what happens when a system drifts toward its entropy maximum. Purchasing goods and services (i.e. transactions that propagate an economy into the future) consumes the information in the prices (one way to think about the efficient markets hypothesis [4]); therefore the economic model must be a maximum entropy model [1] in order for the model to contain Et operators (expectations) and propagate the system toward the future equilibrium given initial conditions (losing information both via entropy maximization and via the Et operator).

This will fail if there is a failure of information equilibrium (non-ideal information transfer) because then you're not in an equilibrium (and therefore not moving towards a maximum entropy state) which breaks the connection between the information loss in the time translation and the entropy maximizing process.

This may seem like a word salad, but I swear it contains some useful intuition.

...

Footnotes:

** I have a nagging feeling that what you end up with is something where the information loss in the entropy maximizing process is proportional to the information loss in the expectations/time translation (into the future) process -- resulting in a condition that is precisely an information equilibrium condition specific to the distributions involved (for uniform distributions, you end up with the basic information equilibrium equation of Fielitz and Borchardt), with the information transfer index as the constant of proportionality. I'd probably have to consider the propagation into the future as an information equilibrium relationship between the future and the present [3].

Friday, September 23, 2016

Basic definitions in information transfer economics (reference post)


I thought it might be a good idea to put a bunch of definitions I use frequently into a single reference post. All of this stuff is discussed in my paper as well. Let's start with two macroeconomic observables $A$ and $B$.

Information

This is information entropy in the Shannon sense, not "meaningful knowledge" like knowing the fundamental theorem of calculus or how to play bridge. See here for an introduction to information theory specific to this blog.

Information equilibrium

The notation $p : A \rightleftarrows B$ represents an information equilibrium (IE) relationship between $A$ and $B$ with price $p$. I also refer to this as a market (it should be thought of as the $A$ and $B$ market, with $p$ being the $B$ price of $A$). It stands for the differential equation

$$
p \equiv \frac{dA}{dB} = k \; \frac{A}{B}
$$

which is derived from assuming the fluctuations in the information entropy of two uniform distributions have equal information content. These fluctuations register as fluctuations in the "price" $p$. The differential equation has the solution

$$
\begin{align}
A & =A_{ref} \; \left( \frac{B}{B_{ref}} \right)^{k}\\
p & = \frac{A_{ref}}{B_{ref}} \; \left( \frac{B}{B_{ref}} \right)^{k-1}
\end{align}
$$

Information transfer index

Frequently shortened to IT index, the parameter $k$ in the information equilibrium relationship above, or the information transfer (IT) relationship below.

Non-ideal information transfer

The notation $p : A \rightarrow B$ represents an information transfer (IT) relationship between $A$ and $B$ with price $p$. It is basically an information equilibrium relationship with information loss such that the information in fluctuations in $A$ is only partially registered in changes in $B$. The differential equation becomes a differential inequality

$$
p \equiv \frac{dA}{dB} \leq k \; \frac{A}{B}
$$

Via Gronwall's inequality (see here), the information equilibrium relationship defined above is a bound on this information transfer relationship. The observed price $p^{*}$ will fall below the information equilibrium price, $p^{*} \leq p$. The same applies to the observable $A$; we will see the observed value fall below the information equilibrium value, $A^{*} \leq A$.

Partition function approach

The information equilibrium partition function approach starts with an ensemble of markets $p_{i} : A_{i} \rightleftarrows B$ with common factor of production $B$ and defines the partition function

$$
Z = \sum_{i} e^{-\beta k_{i}}
$$

where $\beta \equiv \log (b + 1)$ and $b \equiv (B - B_{ref})/B_{ref}$. The normal application of the partition function in e.g. thermodynamics follows. It is derived from assuming a maximum entropy distribution of $B$ among the markets $A$ where the macrostate (the collection of all the markets $\{ A_{i}\}$) has a well defined ensemble average $\langle k \rangle$ (see here for more details).
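A minimal numerical sketch of this partition function, with hypothetical IT indices $k_{i}$ and an illustrative value of $b$:

```python
import numpy as np

# A minimal numerical sketch of the partition function above, with
# hypothetical IT indices k_i and factor-of-production level b.
k = np.array([0.5, 1.0, 1.5, 2.0])   # IT index "states" k_i
b = 0.5                               # b = (B - B_ref)/B_ref
beta = np.log(1 + b)

weights = np.exp(-beta * k)
Z = weights.sum()                     # the partition function

# Gibbs-measure probabilities and the ensemble average <k>
probs = weights / Z
k_avg = (probs * k).sum()
```

The Gibbs weights and $\langle k \rangle$ follow exactly as in the thermodynamic application.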

Entropic force

Entropic forces are essentially the same in information transfer economics as in thermodynamics. They are "emergent" forces that do not have a description in terms of individual agents (e.g. atoms in thermodynamics). They arise from a tendency to maintain or achieve a particular maximum entropy distribution, or to keep two distributions in information equilibrium.

Thursday, September 22, 2016

Balanced growth, maximum entropy, and partition functions



I invariably catch a cold after work trips, so I'm at home today. At least it gives me the opportunity to present something I worked out on the flight after reading Dietrich Vollrath [1] about growth and productivity (a sign of good bloggers to me is that their posts inspire new work or a different way of understanding -- this one did both). Previously I had worked out the partition function approach [2] by what I called an elaborate analogy with thermodynamics. I'd like to present it this time as a more rigorous set of assumptions.

Let's say I have a series of markets with a single common production factor $B$ (don't worry too much about that -- it generalizes to multiple factors): $A_{i} \rightleftarrows B$ with IT indices $k_{i}$. This yields the general behavior:

$$
\begin{align}
A_{1} & \sim B^{k_{1}}\\
A_{2} & \sim B^{k_{2}}\\
A_{3} & \sim B^{k_{3}}\\
& \text{...}
\end{align}
$$

If $B$ grows at some rate $\gamma$, then $A_{i}$ grows at $k_{i} \gamma$. If this continued, the market with the highest $k_{i}$ would eventually dominate the entire economy. The partition function approach was intended to bring this closer to reality by re-imagining the economy as an ensemble of changing $k$-states, where no market stays in a particular $k$-state long enough to dominate the economy. See here [3] for a version of this in terms of the labor market (where $k$ is instead productivity $p$ -- this was what I was thinking about while reading [1] above). In [3], there is a partition function of the form

$$
Z = \sum_{i} e^{- \beta k_{i}}
$$

where $\beta \equiv \log (1+b)$ and $b \equiv (B - B_{ref})/B_{ref}$ or, more generally, some function of our factor of production $B$ (the specific form was worked out in [2] above and here [4]). The way to think about this is that the Gibbs measure used for that partition function is the maximum entropy probability distribution where the ensemble has some fixed (constrained) value of $\langle k \rangle$. In physics, that fixed value is frequently the energy, but can also be particle number or some other thermodynamic variable. The variable $\beta$ represents the Lagrange multiplier of the constrained problem. In physics, this Lagrange multiplier is the inverse temperature.
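The runaway-growth concern mentioned above (the highest-$k$ market taking over if the $k_{i}$ stay fixed) can be illustrated numerically with hypothetical indices:

```python
import numpy as np

# Hedged illustration: with fixed IT indices, the market with the
# highest k eventually dominates the economy as the factor of
# production B grows. The indices here are hypothetical.
k = np.array([0.5, 1.0, 2.0])

shares = []
for B in (10.0, 1e3, 1e6):
    A = B**k                        # A_i ~ B^{k_i}
    shares.append(A[-1] / A.sum())  # share of the highest-k market
```

The highest-$k$ market's share climbs toward 1, which is the behavior the ensemble of changing $k$-states is meant to avoid.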

In economics, we should therefore see this partition function as being built from the maximum entropy distribution of an ensemble of markets where the macroeconomy has some well-defined ensemble average growth rate relative to the growth rate of the factor of production. That is to say, the growth rate of the collection of markets (i.e. the macroeconomy) $\{ A_{i}\}$ is $\gamma \langle k \rangle$ where the angle brackets indicate an ensemble average. This is not to say that it never changes or that GDP is necessarily a good measure of it, just that "the growth rate of the economy" is something we can reasonably talk about. Using the IT model, it also means the price level growth (i.e. inflation) is similarly well-defined -- again, not necessarily our measure, just that it exists -- since it goes as $\gamma \langle k - 1 \rangle$.

Economists have tried to capture this general concept in terms of equilibrium balanced (or steady state) growth (e.g. here, here, or here and links therein). The tendency, however, has been to assert that everything must grow at the same rate, else one piece of the economy dominates in the long run as mentioned above. In the physics analogy, this would be like asserting that every atom in an ideal gas has to have the same energy in order for the system as a whole to have a well-defined energy (that the macro system is in an energy eigenstate). Steve Keen made this invalid argument in a lecture awhile ago (see here, and I discussed here how this definition of equilibrium doesn't represent reality -- sort of like defining swans as "beings from Neptune" and then complaining that ornithologists are full of it because there aren't any swans).

While individual markets might be in a "growth eigenstate" of some factor of production, the macroeconomy as a whole isn't (and doesn't have to be).

[A good analogy here is that previously economists viewed economic growth as a laser (photons in the same energy eigenstate, markets in the same growth state), but the present view is as a flashlight (blackbody thermal radiation, markets in an ensemble of growth states).]

*  *  *

There are a few additional things we can glean from this way of looking at the macroeconomy. First, the Gibbs measure as a probability distribution says that the likelihood of occupying a high $k$ state is lower for higher $k$ and decreases with increasing factors of production. This is a more rigorous way of putting my statement that as economies grow, there are more configurations where it is made up of many low growth markets than a few high growth markets so it is more likely to be found in the former configuration. This could be what is behind e.g. secular stagnation -- i.e. no reason for lower growth, just greater likelihood.
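This first point can be made concrete: with the Gibbs measure above, the ensemble average $\langle k \rangle$ falls as the factor of production grows (the $k$ states and $b$ values here are hypothetical):

```python
import numpy as np

# Hedged sketch: as the factor of production grows (b rises), the Gibbs
# weights shift toward low-k states, so the ensemble average <k> falls.
# The k states here are hypothetical.
k = np.array([0.5, 1.0, 1.5, 2.0])

def k_avg(b):
    beta = np.log(1 + b)
    w = np.exp(-beta * k)
    return (w * k).sum() / w.sum()

# <k> should decline monotonically as b grows
vals = [k_avg(b) for b in (0.1, 1.0, 10.0)]
```

No mechanism pushing growth down is needed; lower $\langle k \rangle$ is just the more likely configuration.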

Second, the (maximum entropy) partition function approach is easily extendable to multiple factors of production (Lagrange multipliers) and constrained macro observables. In physics, you end up with "potentials" like the Gibbs free energy or Helmholtz free energy depending on how you look at the problem. I already started down this path, but another take-away is that just like in physics you might have different macroeconomic models ("economic potentials") depending on how you look at the problem (e.g. what constraints you set, or which macro observables are well-defined).

Third, those potentials are made up of "emergent" concepts in physics like entropy, temperature, and pressure (and entropic forces for each term in the potential) that have no microscopic description. In economics, the various entropic forces -- described in terms of supply and demand -- arising from the terms in the potentials may not have a valid description in terms of individual agents. I've already considered sticky wages (and prices) to be an example (not observed individually). Additionally some of the variables themselves are emergent. Temperature makes no sense for an atom. An atom has various kinds of energy (rotational, center of mass kinetic, molecular bond potential) that contribute to understanding temperature. Likewise, maybe "money" makes no sense for an individual agent. Maybe an agent has various kinds of assets (checking accounts, currency, and treasury bonds) that contribute to understanding "money". This would also apply to inflation (i.e. it doesn't exist at the individual agent level) -- as noted above, in the IT model a well-defined growth rate of the macroeconomy implies a well defined inflation rate.

*  *  *

Previously I had used the partition function approach using money as the factor of production (and information transfer indices as the nominal growth states) in [2] above and using labor as the factor of production (where information transfer indices represent labor productivity) in [3] above. These two models see falling growth and falling productivity over time, respectively. This post makes the specific assumptions going into the model more explicit -- a maximum entropy distribution of the factor of production (money or labor) among the growth states (growth or productivity states) with the assumption that the macroeconomic state has a well-defined growth rate.

Monday, September 19, 2016

Spreadsheets really do make mistakes

Computers are stupid; they do exactly what we tell them to do. Excel spreadsheets don't make mistakes. I do. Like when I accidentally overwrote a number in a cell a few weeks ago. And that's not just in Excel. I have told computers to do stupid things with data in SAS, Visual Basic, Stata, Rats, R, etc. ... See a pattern? It's me. It's us. Not our algorithms.
That's from Claudia Sahm. Actually if you follow the link to Dave Giles blog (and then the first link Giles says to read), you will find out that: no, spreadsheets really do make mistakes. Here's an abstract from a paper Giles cites:
This paper discusses the numerical precision of five spreadsheets (Calc, Excel, Gnumeric, NeoOffice and Oleo) running on two hardware platforms (i386 and amd64) and on three operating systems (Windows Vista, Ubuntu Intrepid and Mac OS Leopard). The methodology consists of checking the number of correct significant digits returned by each spreadsheet when computing the sample mean, standard deviation, first-order autocorrelation, F statistic in ANOVA tests, linear and nonlinear regression and distribution functions. A discussion about the algorithms for pseudorandom number generation provided by these platforms is also conducted. We conclude that there is no safe choice among the spreadsheets here assessed: they all fail in nonlinear regression and they are not suited for Monte Carlo experiments.
This is not input error, but rather actual errors in the software. Now, it is true that the computers were told to do stupid things by the programmers of Excel, etc., and, as a general philosophy, human error is the source of lots of problems.

But really -- Excel computes things incorrectly even if you program it perfectly.

Now it is probably millions of times more likely that a given error you find is an error you yourself made, but every system out there has issues. Even Mathematica, my go-to math software (as you can probably tell from my graphs), has some bugs [pdf].

As always, the solution is to be alert and test results.

Of phlogiston and frameworks

It's always easier to criticize than to praise, so I first posted about the problems I had with Paul Romer's critique of macroeconomics. My criticism so far was twofold:

  1. The string theory analogy isn't appropriate. String theory is a framework that is a natural extension of the wildly successful quantum field theory framework. The DSGE framework is not based on empirical success but rather the wildly inaccurate rational utility maximizing agent framework. This means even the sociological implications of the analogy aren't appropriate. Specifically, citing Smolin's argument that string theorists should abandon their (single-parameter) thus far fruitless approach with no experimental validation and divert resources to Smolin's own (single-parameter) thus far fruitless approach with no experimental validation (loop quantum gravity) as a parable for DSGE macro makes little sense. DSGE hasn't just been fruitless: it is either empirically wrong or suffers from identification problems (too many parameters to be wrong), and it is built of pieces that themselves aren't empirically accurate. Or more simply, the lack of empirical grounding of string theory is different from that of DSGE macro. Neither is empirically successful, but string theory is built from empirically successful pieces and just applies to regimes in physics (the Planck scale) where we can't measure data. [1], [2], [3]
  2. Romer claims macro practitioners Lucas, Sargent, and Prescott aren't being scientific because of what Romer sees as an obvious natural experiment. However, the obviousness of this natural experiment is itself model-dependent (I show two additional, but different, interpretations of the data); the scientific way to have dealt with this is to say that Lucas, Sargent, and Prescott should have been much more skeptical of their own claims because the data isn't terribly informative. [4]

This could be summarized as simply saying there is no framework for macroeconomics. Frameworks, like string theory and quantum field theory, do two things: they tell you how to start looking at a problem, and they represent a shorthand for capturing the empirical successes of the field. My first criticism above says that string theory is a framework, so you can't make an analogy with macro which doesn't have a framework. My second criticism above says that until you have a working framework, you should be skeptical of any "natural experiments" because the interpretation of the natural experiments change with the framework.

In spite of this, for the most part I liked Romer's take on macroeconomics and I think he delivered some powerful arguments.

The first one is that Romer says macroeconomists (or at least Lucas, Sargent, and Prescott) keep reinventing phlogiston -- imaginary fields or forces that produce effects. I am a bit less harsh on this particular point. Since macroeconomics is in a nascent state, it's going to invent lots of phlogiston before it hits on the right one. Energy and momentum in physics started off as phlogiston. They gradually became useful concepts over time.

However, as Romer points out, these concepts have become impervious to measurement, whether through theoretical evasion or simply ignoring the data. This is how phlogiston hypotheses turn into derpy phlogiston priors. Or as Romer puts it: post-real models.

Second, after illustrating that the parameters in macroeconomic models scale as m², where m is the number of equations, he goes on to tell us that expectations make the number of parameters scale as 2m²:
Adding Expectations Makes the Identification Problem Twice as Bad ... So allowing for the possibility that expectations influence behavior makes the identification problem at least twice as bad. This may be part of what Sims (1980) had in mind when he wrote, "It is my view, however, that rational expectations is more deeply subversive of identification than has yet been recognized."
My own view is that expectations create a far more serious problem than too many parameters. However, Romer is illustrating a general principle familiar from physics. Those parameters take the form of an m × m matrix. Because there are no established theoretical economic principles to reduce this number, you have to deal with all m² parameters. In physics, you have principles like rotational symmetry or Lorentz invariance. For example, G = 8πκT without general covariance could have had 16 parameters (a 4 × 4 matrix) even for small perturbations around equilibrium; instead it has one.

Because you have so many parameters (2m²) and so few observables (m), we have a case where many different sets of parameter values are consistent with a given set of observations. This is the identification problem in a nutshell. It's basically a dimensional argument -- a mapping from a 2m²-dimensional space to an m-dimensional space is going to have large subsets of that 2m²-dimensional space mapping to the same point in the m-dimensional space.
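The dimensional argument can be sketched with a toy linear observation map (all the numbers here are illustrative, not taken from any actual macro model):

```python
import numpy as np

# A toy version of the dimensional argument above: an observation map
# from a 2 m^2-dimensional parameter space to m observables has a large
# null space, so many parameter vectors yield identical observations.
rng = np.random.default_rng(0)
m = 3
n_params = 2 * m**2

M = rng.normal(size=(m, n_params))   # toy parameters-to-observables map
theta1 = rng.normal(size=n_params)

# Any direction in the null space of M leaves the observables unchanged
_, _, Vt = np.linalg.svd(M)
null_dir = Vt[-1]                    # orthogonal to all rows of M
theta2 = theta1 + null_dir

y1, y2 = M @ theta1, M @ theta2      # identical observations
```

Two distinct parameter vectors, one observation: the identification problem in miniature.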

In physics, as noted above, this problem is solved by saying those subsets are actually equivalence classes (established by e.g. a symmetry principle, gauge invariance, or general covariance -- theoretical frameworks). Economics has no such theoretical principles (yet), so per Romer it ends up relying on FWUTVs (which make me think of ROUSes): Facts With Unknown Truth Value. These take several forms in Romer's paper: assumption, deduction (from assumptions), and obfuscation.

*  *  *

I think Romer's criticisms are serious, but as Cameron Murray said on Twitter: "It’s not a valid argument in economics [until] a high priest says it." That is to say these criticisms have existed for a long time. The real question is: how will economics deal with them?

My own approach is the information equilibrium framework.

In the same way string theory is based on the successful framework of quantum field theory, information equilibrium is based on the successful framework of information theory. It encodes the "empirical success" of supply and demand, promotes the idea that relative prices are all that matters (the scale invariance of economics) to a general symmetry principle, and is just a minor generalization (based on that symmetry principle) of an equation that appears in Irving Fisher's doctoral thesis.

Information equilibrium is a kind of gauge invariance relating several different parameterizations of the same model to each other. For example, any system of information equilibrium relationships that express a relationship between a set of observables {X, Y, Z ...} and some phlogiston variable U can be rewritten without the U. Originally, that U represented utility, but it can represent anything in Romer's menagerie of phlogiston. (One way to think about this is that information equilibrium builds up models out of pairwise relationships between observables or Cobb-Douglas functions of observables, limiting the possible relationships in that m × m matrix).
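A quick symbolic sketch of how that elimination works (the indices a and b here are illustrative, not taken from any particular model): if X and Y are each in information equilibrium with U -- dX/dU = a X/U and dY/dU = b Y/U -- the power-law solutions let you substitute U out entirely.

```python
import sympy as sp

# Illustrative information equilibrium relationships X <-> U and Y <-> U:
# dX/dU = a X/U and dY/dU = b Y/U have power-law solutions X ~ U**a, Y ~ U**b.
a, b = sp.symbols('a b', positive=True)
U, Y = sp.symbols('U Y', positive=True)

X = U**a                      # solution of dX/dU = a X/U (constant factor dropped)
U_of_Y = Y**(1/b)             # invert Y = U**b

# Substitute the unobservable U out: X becomes a power law in Y alone.
X_of_Y = sp.powsimp(X.subs(U, U_of_Y))
print(X_of_Y)                 # Y**(a/b) -- no U left
```

The dropped constant factors just rescale units; the point is that a variable appearing only through pairwise information equilibrium relationships can be substituted away.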

It's not necessarily completely at odds with existing economic theory either. For example, I was able to build a simple New Keynesian DSGE model out of information equilibrium relationships. Interestingly, it has fewer parameters, though a couple of its components turn out not to be empirically accurate. Lots of other pieces of existing economic theory also fit in the information equilibrium framework.

There is still a kind of phlogiston that exists in the information transfer framework, but it's good phlogiston. Let me explain ...

When information equilibrium relationships fail to describe the data, it could be that information is failing to be transferred completely from source to destination (non-ideal information transfer) -- i.e. there is information loss. This lost information is very much like phlogiston. It is unknown and it explains deviations from observations. However, three principles make it a good kind of phlogiston:

  • Deviations are always in one direction. That is to say an information equilibrium model is a bound on the related information transfer model.
  • Systems that are frequently far from information equilibrium will not in general be good fits to information equilibrium relationships. That is to say, unless the system is close to information equilibrium, you probably wouldn't posit the relationship in the first place. A "mostly phlogiston" (mostly lost information) relationship would be contentless.
  • Information loss is always due to correlations among agents. This makes the study of information loss a subject of sociology, not economics.

Phlogiston can be fine -- as long as there's a framework.

Saturday, September 17, 2016

Paul Romer on the Volcker disinflation

I mentioned I was going to take a look at Paul Romer's new article on macroeconomics without regard to the string theory analogy; this is one of those posts. The focus here will be on Romer's discussion of the Volcker disinflation. Here is Romer's story:
If you want a clean test of the claim that monetary policy does not matter, the Volcker deflation is the episode to consider. Recall that the Federal Reserve has direct control over the monetary base, which is equal to currency plus bank reserves. The Fed can change the base by buying or selling securities. ... When one bank borrows reserves from another, it pays the nominal federal funds rate. If the Fed makes reserves scarce, this rate goes up. The best indicator of monetary policy is the real federal funds rate – the nominal rate minus the inflation rate. This real rate was higher during Volcker’s term as Chairman of the Fed than at any other time in the post-war era. ... [an] increase in the real fed funds rate, from roughly zero to about 5%, that followed soon [after Volcker took office].



... The rate of inflation fell, either because the combination of higher unemployment and a bigger output gap caused it to fall or because the Fed’s actions changed expectations. 
... If the Fed can cause a 500 basis point change in interest rates, it is absurd to wonder if monetary policy is important.
This represents a model of what happened between the inflation of the 1970s and the so-called Great Moderation. First, let me say that the very good information equilibrium (IE) model of interest rates (see my paper) says that making reserves more scarce would generally raise the short-term interest rate toward the long-term interest rate. However, since the monetary base is not related to inflation (see my paper as well as here -- the true clean test of macroeconomics), this would have no impact on the price level (except through non-ideal information transfer, such as expectations based on the wrong mental model of the macroeconomy).

However, we'll ignore that for now, and rather show that one's view of the Volcker disinflation depends on how you see the data. Here's how Romer sees the inflation data:


Adding the uncertainty in the fit shows that the pre-1980s segment is much more uncertain than the post-1980s segment. Let's zoom out and show the monetary information transfer (IT)/information equilibrium model of the post-war seasonally adjusted core CPI inflation:


The IT model (gray) sees the two spikes in the 70s as something outside the monetary model -- in fact, they correspond to the oil shocks in 1973 and 1979. Volcker became Fed chair in August of 1979, so you could imagine confusing monetary policy with real shocks to the economy. Romer's use of year-over-year inflation and low resolution makes his picture of pre-Volcker inflation look more plausible. If we switch to instantaneous inflation (continuously compounded annual rate of change), the oil shock picture (dashed black spikes) looks much sharper:


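Here's a toy demonstration of that resolution effect (entirely synthetic data, not the CPI series): a one-time jump in the log price level shows up as a single large spike in instantaneous inflation, but gets smeared across twelve months of year-over-year readings.

```python
import numpy as np

# Toy monthly price level: a steady 2% annual trend plus a one-time 3% jump,
# to compare instantaneous (continuously compounded, annualized) inflation
# with year-over-year inflation.
months = np.arange(120)
log_p = 0.02 / 12 * months
log_p[60:] += 0.03                      # one-time 3% jump at month 60

inst = 12 * np.diff(log_p)              # annualized month-over-month inflation
yoy = log_p[12:] - log_p[:-12]          # year-over-year log difference

print(inst.max())   # the whole shock lands in one sharp annualized reading (~38%)
print(yoy.max())    # the same shock spread over 12 YoY readings (~5%)
```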
This paints a picture of the disinflation beginning nearly a decade before Volcker, as well as having a different cause: the relative size of the income/inflation effect versus the liquidity effect of monetary expansion:


In the language of the IT model, this is where the information transfer index changes from the high inflation limit (k >> 1) to the low inflation limit (k ~ 1). One of the major effects is that the trend in interest rates changes from generally rising to generally falling.
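A minimal numeric sketch of these two limits (the normalization is schematic; it just assumes the IT-model relations N ~ M^k and P ~ k M^(k-1), so that inflation is (k - 1) times monetary base growth):

```python
# Sketch of the IT-model inflation limits (normalization schematic):
# with N ~ M**k and P ~ k * M**(k - 1), continuously compounded inflation
# is d(log P)/dt = (k - 1) * d(log M)/dt.
def inflation(mu, k):
    """Inflation for monetary base growth mu and information transfer index k."""
    return (k - 1.0) * mu

mu = 0.05                      # 5% monetary base growth
print(inflation(mu, k=10.0))   # high-inflation limit k >> 1: about 0.45
print(inflation(mu, k=1.0))    # low-inflation limit k ~ 1: 0.0 regardless of mu
```

In the k >> 1 limit, inflation tracks money growth (a quantity-theory-like regime); as k approaches 1, money growth decouples from inflation.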

Now what about Romer's claim of a rise in (CPI deflated) real interest rates? We'll draw a representation of Romer's model -- a 500 basis point rise in the real rate after Volcker takes over:


Let's zoom out, increase resolution, and add some error bands on this picture:


This seems like a plausible interpretation, but if we look at the IT model (shown in gray and black) the picture is very different:


There's no sharp transition. In fact, zooming out even more, the relationship between inflation and the real interest rate is actually fairly complicated:


Paul Volcker's term does coincide with the real interest rate becoming approximately equal to the inflation rate (r ≈ π), but this has no particular meaning in the IT model (as far as I know).

Now the monetary IT model isn't the only possible model; in fact, a completely different model (based on the Solow model) also works very well and tells a completely different story from either Romer's picture or the monetary IT model. I called it the "quantity theory of labor and capital" (QTLK) after the IT quantity theory of labor (a reductio ad absurdum model) plus the Solow model. Here's how this model compares to Romer's lines:


And at higher resolution (continuously compounded annual rate of change evaluated monthly):


In the QTLK, the primary reason behind the spikes in core CPI inflation is the growth in the civilian labor force (CLF) relative to the population (the change in the civilian participation rate), possibly due to more women and minorities entering the labor force in the 1970s (and baby boomers; see Steve Randy Waldman, H/T Steve Roth). Of course, causality could be interpreted the other way, where inflation draws people into the labor force to pay their bills. However, the best-fit lag shows that increases in the CLF precede inflation increases by about 4 months. Of course, one could then say "expected inflation" and hold on to one's prior that inflation is always and everywhere a monetary phenomenon ...
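The best-fit-lag procedure can be sketched with fabricated data (the series and the 4-month lag below are invented for illustration): shift one series relative to another, then recover the shift by maximizing the cross-correlation.

```python
import numpy as np

# Hypothetical illustration of a best-fit lag: generate a driver series,
# make a response that follows it 4 steps later, and recover the lag
# by maximizing the cross-correlation.
rng = np.random.default_rng(42)
driver = rng.normal(size=300).cumsum()       # stand-in for CLF growth
true_lag = 4
response = np.roll(driver, true_lag)         # stand-in for inflation
response[:true_lag] = driver[0]              # pad the wrapped-around start

def best_lag(x, y, max_lag=12):
    """Lag L (in samples) maximizing corr(x[t], y[t + L])."""
    scores = [np.corrcoef(x[:len(x) - L], y[L:])[0, 1] for L in range(max_lag + 1)]
    return int(np.argmax(scores))

print(best_lag(driver, response))            # recovers 4
```

With real data you would use growth rates rather than levels and worry about spurious correlation, but the mechanics of picking the lag are the same.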

The QTLK (combined with the ISLM model for the interest rates) shows a similar picture to the monetary IT model for real interest rates:


*  *  *

In summary, the picture Romer paints as a clean test of monetary policy impacting inflation is less clean in other models -- models that encompass the entire post-war period of the US (and other countries) and that see the spike in inflation as either a real supply shock (oil) or a sociological effect (women and minorities suffering from less discrimination, the baby boomer demographic wave). This is not to say any of these models is the "truth", only that they are all plausible and therefore we can't say there is a clean test of monetary policy. (In contrast, this is a clean test of monetary policy, and it shows that monetary base reserves/short-term interest rates have no impact on inflation.)

But the real issue here is that Romer seems to think that one particular approach to macroeconomics is unscientific because he thinks he sees a clean test. However, Romer is suffering from the same problem he claims afflicts Lucas and Prescott -- priors that are too strong in the presence of uninformative data.

...

Update 22 September 2016

In a recent post, Romer gives us some more evidence that he is acting from a strong prior rather than a specific model and comparison with the data. He says (in the course of an open letter to a prospective macro student):
To learn about a department, visit and ask macroeconomists you meet “honestly, what do you think was the cause of the recessions of 1980 and 1982.” If they say anything other than “Paul Volcker caused them to bring inflation down,” treat this as at least a yellow caution flag.

This is a rephrasing of Romer's original model above where Paul Volcker (and the Fed) raised the real interest rate, which caused a recession, which brought inflation down. Romer does not see a difference between:

  • Volcker caused the recession which brought inflation down
  • Volcker raised real interest rates, which caused a recession, which brought inflation down

However, in the information transfer model, this new model from Romer may actually be true, while the original model is not. The story is that the Fed coordinated the economy, causing a fall in economic entropy (proportional to nominal output) when agents took correlated actions.

There is also the possibility that this "avalanche" (recession) might not be able to happen if there is insufficient "snow" (nominal output) built up (see here) -- or, the aircraft cannot be stalled if it isn't going slow enough. Essentially, the Fed can't coordinate a sell-off if there aren't enough assets out there that can be coordinated by the Fed into panicked selling.

In any case, the two statements above are only the same if you accept the prior that monetary policy caused the recession and inflation at the time was due to monetary policy (and not demographic or "real" factors like oil).

In a sense, the priors determine the entire model! I think this is part of a more general problem in economics -- that there are no real frameworks, only collections of priors. And one of the biggest priors is defining what a recession is. Your framework is supposed to help you figure out what recessions are, not define what recessions are. Asking a macroeconomist what caused the recessions of 1980 and 1982 should evoke the response: we do not know what a recession is, and therefore the causes are inherently ambiguous. Romer, in his advice above, is essentially asking the prospective student to believe a different article of faith (which at least is more common, judging from what I've gleaned from surveys) in place of the RBC article of faith.

Friday, September 16, 2016

Macro is not like string theory, part III (Equations!)

I thought of another way to drive home the point that DSGE macro is not like string theory. It's essentially another way of representing the Venn diagram in that post, but this time in terms of equations. Basically, string theory is built up from a bunch of very successful pieces of physics in a natural way. A DSGE model is built up from a bunch of pieces that haven't been empirically validated or, worse, appear to be wrong. I show the path integral with the Polyakov action for the simplest bosonic string theory (that I grabbed from here [pdf]), and compare it to the simple three-equation New Keynesian DSGE model (that I grabbed from here [pdf] because it showed the equations in compact form). I had to stop adding problems to the DSGE model (e.g. intertemporal optimization).


...

Update 18 September 2016

Let me call back to here:

To some degree, DSGE passes these tests [of being a framework]. Real Business Cycle (RBC) models as well as New Keynesian (NK) models can be expressed as DSGE models. But there is one test that DSGE fails (or at least fails to the best of my knowledge):
Theoretical frameworks organize well-established empirical and theoretical results. Using that framework allows your model to be consistent with all of that prior art.
The string theory model encapsulates prior empirical success; the DSGE model does not. Frameworks are a kind of shorthand for testing new theories against the existing empirical data. When the string theory Lagrangian uses special relativity or the path integral, it is going to be consistent with all the experimental tests of quantum mechanics and relativity.

...

PS I put together that NK DSGE model as a series of information equilibrium relationships (and showed that there are better, more empirically accurate versions).

Thursday, September 15, 2016

Macro is not like string theory, part II (my personal take)

I wanted to add a few notes to this post (update 16 Sep 2016: which I followed up with some graphics here) -- some things that fell through the cracks.

In my experience as a grad student, I never felt that string theory was some kind of in-group or religious cult. It is true that the particle theory group at the University of Washington was definitely more QCD-focused (Stephen Ellis was on my committee). Ann Nelson was the center of beyond-the-standard-model physics, and the only string theorist (Andreas Karch) was hired while I was there (I took his string theory class that was being offered for the first time, with some guest lectures by Joseph Polchinski). That is to say, my experience with string theory was mostly through visitors like Polchinski and Edward Witten, reading the literature, and talking to other grad students about it. Maybe it is different at MIT, Harvard, and Princeton.

There was a lot of 'synergy' between the particle theory group and the nuclear theory group regarding QCD (lattice, strings, nonperturbative methods) and the Institute for Nuclear Theory (INT) even put on a great workshop about string theory and QCD (using AdS/CFT and holography to understand non-perturbative QCD) while I was there. I was a grad student in the nuclear theory group (my advisor was Jerry Miller -- who got some press in the NYTimes while I was there), but my thesis touched on lattice QCD and nonperturbative methods like the "large N" expansion of QCD and topological solutions. String theorist Witten wrote the definitive paper on using large N for baryons (which I cited), and it should also be noted that another large N expansion in the SYK model has recently been connected with holography.

I had a lot of respect for string theorists and grad students studying string theory. It is hard stuff. There wasn't a lot of money for particle/high energy theory students, so they ended up with higher teaching/grading loads for a longer time. In nuclear theory, I only had to teach classes for the first couple of years before I got full support as a research assistant (and summer support through the INT). But string theory wasn't the only high-status "hard" subject you could use to show you were smart. String theory, esoteric field theories, and general relativity all involved a lot of similar math skills that would fall under the heading of topology and differential geometry of manifolds. That stuff is hard. And that's why black holes and strings have contributed to pure math in those areas.

To bring this back to the comparison between macro and string theory:
  • Cutting edge physics has always been associated with cutting edge math, from Newton and calculus to Einstein and differential geometry. Even the Ricci flow used to prove the Poincare conjecture shows up in the renormalization of string theory sigma models. Macro math isn't very cutting edge, and it used to be ... more verbose. Therefore anything that references math in an analogy between macro and string theory is way off base.
  • In physics, string theory doesn't have a monopoly on the cutting edge, signalling intelligence, philosophical implications, or being a "rockstar". Most famous physicists are of the "I study black holes" variety (general relativity). You probably had to click the links above to know who Polchinski and Witten were. However, I could reference Hawking without even a first name and you'd know who I'm talking about. Even Brian Greene isn't that famous. However, as Noah Smith points out, macro really is the "glamour league" of economics. Paul Krugman was primarily in international trade, but he's popular for his macro.
  • There isn't a bright line between string theory and non-string theory. The large N expansion for baryons (N quarks) and the SYK model (N Majorana fermions) have connections to string theory. The conformal field theory in the original AdS/CFT correspondence is very similar to QCD (which is approximately conformal). The differential geometry of string theory and general relativity are directly related. Witten's big contribution was connecting 11-dimensional supergravity to M-theory. Is Raphael Bousso a string theorist? Wikipedia says so, but his big contribution (the covariant entropy bound) is more on the general relativity side, and he studied under Hawking. In contrast, in economics you're either dealing with "the economy" as a whole in macro, or you're not.