Economic analysis is about identifying systematic patterns in data that may confirm or deny theory, and that can help decision-makers see the best way forward. The key word is 'systematic'. In the real world (or the real business world) there is variation everywhere. Somebody is seen to raise price and sell more, to cut price and sell less, to increase market share and go bankrupt, to issue loads of debt and get rich. Everything is possible and can be documented on a case-by-case basis. The question is whether this or that case or data set provides generalizable insight to guide our future actions, or is just a circumstantially interesting situation.
The isolation of common, systematic patterns of behavior, and the building of a body of theory to guide thinking, is what the practice of economics is about. The theory of demand is a good example, even though the simple demand curve is built with unrealistic simplifying assumptions. So much of what is at the essence of firm managerial behavior is captured by the simple demand concept: firms invest a lot trying to shift out and to tilt their demand curve. Advertising, loyalty-building investments, product differentiation, and eliminating rivals all tend to shift and tilt demand so that profits can be enhanced by creating the power to raise price and increase revenue.
So, we hear that relations with Cuba will soon be normalized, or that the price of oil continues to fall, or see a report that shows that one of our stores is doing much better than the others. What are these data telling us that we can use?
Seeing VARIATION is a manager's opportunity to learn more about WHY things are happening, and to move beyond the level of introspection at which most people operate: understanding WHAT is happening. Yes, knowing WHAT is happening is important. But learning can be deeper if we can understand WHY it is happening; this allows the manager to generalize, to craft incentives and marketing programs, and to take advantage of knowledge of WHY the variation is occurring.
So, if we see variation across stores in our chain, or variation in the time of day our sales are happening, then we can do economic analysis of various types to see if there are any systematic patterns in the variation. Let's say we are interested in understanding more about why our chain of hospitals has variation in revenues per bed. What is driving this? Using SCATTER PLOTS, we can quickly check out some possibilities of systematic differences among our hospitals. Below, we show the variation across the chain in terms of ADMISSIONS per CAPITA on the vertical axis, against how many beds there are per capita in the places where we have hospitals.
What this shows is that in places with more beds, we have more admissions per capita. This "odd" relationship is something we call "supply-induced demand". Where there are more beds, people get more hospital care. This, of course, suggests that the use of hospitals is not based on "scientific need for care". Rather, hospitalization is quite discretionary for many kinds of problems, and doctors tend to develop practice styles of admitting more patients when more beds are available. Note that the pattern doesn't exist for the red line, showing the admissions for hip fractures: such admissions don't vary at all with bed availability. If you have a fractured hip, admission is necessary. No discretion.
So, we learn a little about why admissions (and revenue) may vary across our sites because of the patterns we see. Looking at patterns across types of patients (age, gender, insurance status), across types of service (maternity, surgery, outpatient, etc.), or across community characteristics (rural, suburban, inner city) may help us understand WHY we are experiencing variation in our performance. This can be done with scatter plots, or with tables of central tendency (averages or medians) across groupings of our hospitals.
Sometimes we may suspect some systematic difference and want to confirm it. We use statistical analysis to do that. Is the "pattern" we see more than just a possibility arising from the "noisy" variation in our data?
"What produces health" is a good illustration of massive variation in results, and of the need to understand what underlying patterns are driving most of that variation. Lots of things. At core, individual people decide how healthy they want to be, and what they might do day-to-day to achieve that aim. Not everyone cares the same about current and future health, and people have different priorities for deploying their scarce money and time toward their health aims. Some people don't care about health at all; others fret daily about it. And, since "good health" can have a large effect on our future (in many ways), it is often a sort of "investment", whereby people commit resources now in order to earn benefits later in life. But, of course, not all people are willing to wait to receive future benefits of spending money and time now to get healthier. So, the situation is that we have massive differences among us in how healthy we aim to be, and consequently, differences in how healthy we are!
As policy analysts, thinking of improving the "bang for the buck" we get for our health spending (and also trying to make our health system fairer), we must understand the systematic patterns of the underlying health drivers. What makes us healthier? We answer that by estimating "health production functions" using regression. We use data on individuals or groups of individuals to estimate equations like:
Health status = a + b (age) + c (gender) + d (educ level) + e (spending on health) + other factors
And, we use theory to guide us as to what factors are potentially important influences on individual health choices.
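To make the mechanics concrete, here is a minimal sketch of how such a health production function could be estimated with ordinary least squares in Python, using the statsmodels package. The variable names and data values are hypothetical, invented only for illustration:

```python
# Minimal sketch: estimating a "health production function" by OLS.
# All variable names and values here are hypothetical.
import pandas as pd
import statsmodels.api as sm

# Hypothetical individual-level data
df = pd.DataFrame({
    "health_status": [72, 65, 80, 58, 90, 77],        # e.g., a 0-100 health index
    "age":           [34, 61, 25, 70, 29, 45],
    "female":        [1, 0, 1, 0, 1, 1],               # gender dummy
    "educ_years":    [16, 10, 18, 8, 14, 12],
    "health_spend":  [2.1, 3.5, 1.2, 4.8, 0.9, 2.6],   # $000s per year
})

X = sm.add_constant(df[["age", "female", "educ_years", "health_spend"]])
result = sm.OLS(df["health_status"], X).fit()
print(result.params)  # the a, b, c, d, e coefficients from the equation above
```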
Testing Theory with Data
The issue in all such theory is developing tests of this behavior using real-world data, where conditions don't match the simplifying assumptions of the theory. How to cope? Most theories are tested using regression approaches, where the assumptions are approximated by 'controlling' for extraneous influences on the simple price-quantity relationship. How do we control for the effect of varying income, of different competitor prices, of areas with more or less competition, etc.? Essentially, regression allows these 'other influences on the key relationship of interest' to be 'held constant' by including them in the regression model, which separately estimates their partial effects on the dependent variable. So, when I am interested in the effect of age group on hospital charges, I can 'control' for severity by including it in the model as well. The regression coefficients are each partial estimates of the change in the dependent variable associated with a one-unit change in the particular independent variable, holding constant the effects of the other independent variables. So the estimated effect of age group on charges holds constant the effect of severity on charges. This technique is how we create measured effects of one variable on another, with simplifying assumptions about other relevant influences. Economic theory guides us on what these other influences might be, so we can be sure to include them in our economic analysis.
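Here is a small sketch of this "holding constant" idea, in Python with statsmodels, using a few rows of the hospital-charges data set that appears later in this section. We fit charges on age alone, then on age plus severity; the change in the age coefficient shows what controlling for severity does. This is a sketch, not the original Excel analysis:

```python
# Sketch: the age coefficient with and without controlling for severity.
import pandas as pd
import statsmodels.api as sm

# First few rows of the hospital-charges table later in this section
# (the row with an apparent data-entry error, age = -61, is skipped)
df = pd.DataFrame({
    "charges":  [8254, 24655, 27234, 21345, 2417, 5420, 20280, 4360],
    "age":      [57, 43, 81, 56, 17, 61, 61, 44],
    "severity": [2, 4, 4, 3, 1, 1, 3, 1],
})

simple = sm.OLS(df["charges"], sm.add_constant(df[["age"]])).fit()
controlled = sm.OLS(df["charges"], sm.add_constant(df[["age", "severity"]])).fit()

# The age coefficient changes once severity is held constant
print(simple.params["age"], controlled.params["age"])
```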
Our purpose is to determine if there are underlying patterns of relationships that confirm (or not) theories about how firms behave, and our interest in these systematic patterns is to expand the body of knowledge that guides our understanding of how firms behave when hiring executives. For example, let's say we are interested in whether hospitals discriminate in their hiring of managers. We would need to study the patterns of hiring managers to discover whether discrimination is occurring. Do firms behave as if they were responding to labor market theories, or to others? What we'd be interested in is whether the place-to-place variations in the propensity to hire minority managers are related to:
1. variations in the hospital marketplace itself: are there systematic differences in hiring when hospitals are larger, when the industry is more or less competitive, or when the missions of hospitals are different (ownership and teaching status)?
2. variations in the supply of workers: are there systematic variations in minority hiring associated with the percentage of the population that is minority, or the relative numbers of minorities with college degrees?
3. variations in region: we know there is more or less willingness to accept minority managers in different regions of the country, and in urban versus rural areas.
We would expect such relationships to explain some portion of the observed variation in minority hiring. But is there anything left unexplained? If yes, what could be the systematic explanation for it?
One theory that is relevant here is the theory of discrimination (Gary Becker, Nobel prize winner). His theory says that businesses may discriminate (by race, gender, ethnicity, etc.) if they have customers or employees who prefer it. Said another way, discrimination is the sacrifice of potential profitability to achieve some non-economic end. While stockholders may not like it, firms sometimes cave to pressure from other employees or customers (or other stakeholders). So, unexplained variations in minority hiring may reflect this or other systematic (but unmeasured) influences.
So, looking at the article, how important are supply forces in explaining place-to-place variations in the percentage of minority managers? How important are hospital market factors? Can you think of other things that would have been good to measure to help understand the variation?
Regression is a way of understanding the empirical relationship between variables, where the relationship is expressed in the form of an estimated linear equation developed from data. A demand curve, for example, would be such a line, relating the quantity bought to the price. Often, regression is a way of testing hypotheses about the relationship between two or more variables (one is called the dependent variable; the others are called independent variables). Multiple regression is a way of testing these two-way relationships while the effects of other variables are held constant. Note: even though we call the variables dependent and independent, regression does not test for causality between them. It tests for association (similar to correlation). The cause-effect relationship (i.e., the direction of causality) between two variables can be inferred from theory, or from patterns of more complex statistical results.
Regression analysis essentially summarizes the scatter of data between two variables with a best-fitting line.
If we were doing a cost analysis and trying to determine fixed costs and marginal costs, we could use regression. If we had data on each year, each month, or each day of operations, we would use those data to estimate the following equation:
Total costs = A + b (Volume of output)
A would be an estimate of what costs would be if volume were 0. This is fixed cost! The coefficient "b" is the estimated change in costs when we add (or subtract) one unit of output; this is marginal cost. Average cost can easily be calculated from the raw data.
If we had some other variable that changes and might help explain why the cost/volume relationship might "shift" during the period we are studying, then we could add it to the regression. For example, with a monthly data set on costs and volumes, we might want to note (and control for) the fact that the last 6 data points came from a time when we were open in the evenings (and the other data points from times when we were not). So our regression is going to be:
Total costs = A + b (V) + c (evening open), where "evening open" = 1 for the last 6 months and = 0 otherwise
So, A and b still mean what we said, though their values may change a bit in the new model. The coefficient "c" tells us how our total costs change (per month, per day, or per week, depending on what our data are) when we are open evenings, compared to our costs when we are not open in the evenings.
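As a sketch of how this cost model could be estimated outside Excel, here it is in Python with statsmodels. The monthly cost and volume figures below are made up for illustration only:

```python
# Sketch: estimating fixed cost (A), marginal cost (b), and the evening-hours
# shift (c) from 12 months of made-up data.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "total_cost": [61000, 64500, 60200, 67800, 70100, 66900,
                   74200, 76800, 73500, 79900, 78100, 81600],
    "volume":     [110, 140, 105, 170, 190, 160,
                   175, 200, 165, 215, 195, 230],
    "evening":    [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],  # open evenings, last 6 months
})

X = sm.add_constant(df[["volume", "evening"]])
fit = sm.OLS(df["total_cost"], X).fit()
print(fit.params)  # const ~ fixed cost A, volume ~ marginal cost b, evening ~ c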
Regression coefficient estimates have important interpretations in economics. In the above example of the cost regression, we might have an estimated equation
Total monthly cost = 50000 + 100(V) + 20 (evening open)
The coefficient estimate "b" is interpreted as the change in total cost when we increase the volume (V) by one unit (this might be the number of clients we saw each day). This is called variable (or marginal) cost. The 50000 is the estimate of fixed costs, which we incur independent of the volume we produce.
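To make the interpretation concrete with a hypothetical month: if volume were 200 units and we were open evenings, the predicted total monthly cost would be 50000 + 100(200) + 20(1) = 70,020.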
In the case where we might have estimated a demand curve, such as
Quantity sold = A + b (price) + c (household income in 000s) + d(competitor’s price)
We could estimate it from data and get:
Quantity sold = 500 - 15 (price) + 20 (income) + 30 (competitor's price)
- What this means is that if our price, income, and the competitor's price were all zero, we would sell 500 units (silly, but it tells us where the demand curve crosses the horizontal axis, i.e., where our price is zero).
- It says that if we increase our price by $1, we'd sell 15 fewer units of the product (other things, like income and competitors' prices, staying constant).
- It says that if household income were $1,000 higher on average, we could expect to sell 20 more units (other things held constant, like our price and competitors' prices). Is this a normal or an inferior good?
- It says that if competitors lowered their prices by $1, we would sell 30 fewer units (other things the same). Are they a substitute or a complement?
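As a quick sketch of how the estimated demand curve can be used, here is the equation turned into a small Python function and evaluated at one hypothetical scenario (the price, income, and competitor-price values below are made up):

```python
# Sketch: predicting sales from the estimated demand curve above.
def quantity_sold(price, income_000s, competitor_price):
    # Coefficients taken from the estimated equation in the text
    return 500 - 15 * price + 20 * income_000s + 30 * competitor_price

# Hypothetical scenario: our price $10, household income $45,000, competitor price $12
print(quantity_sold(10, 45, 12))  # 500 - 150 + 900 + 360 = 1610 units
```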
Excel does regression. Look under tools to see if you can add in the "Data Analysis" add-in. If you have it, find it under the Data tab. You can do descriptive analyses and other things with Data Analysis, but scroll down to Regression. It will ask you to highlight the column of data that represents the dependent variable. Usually it is best to highlight the name of the variable and all the data in the column. Then it will ask you to designate the independent variable, and you do the same thing. Make sure to check the "Labels" box (because you highlighted the data labels too). If you have 2 or more independent variables you can include them in the model; you do this by putting all these variables in adjacent columns and highlighting all of them in one fell swoop. Note: Excel will not do regression if cells are missing data, or if there is a non-numeric value in a cell (a comma, etc.). You will get an error message when you run the regression, and you'll have to locate the problem, and possibly throw away one of the observations.
I did a simple regression in Excel and have attached it below. The data set below was used to do a regression to understand the factors associated with the size of hospital bills across a group of patients (i.e., the variable called "charges"). I ran a regression analysis to test three relationships:
- Does age matter to the size of the bill?
- Does the age category matter?
- Does the severity of the diagnosis/procedure matter?
Basically, what I did in Excel was go to the Data Analysis Regression page and key in the cell locations of the dependent variable (in this case, hospital charges for 39 patients), whose variation I was trying to explain with 3 independent variables: age, age category, and severity. I had to put these variables in adjacent columns and key in their cell locations. In the results they came out as three unnamed variables, since I neglected to select the column headings.
I explain the results on the sheet showing what Excel produced as output, much of which is not important at this stage of the game.
You could use the data set to create a different model, say one that used only patient severity as an independent variable to explain charges.
| charges | age | age category | severity | dr code # | female = 1 | admit | disch |
|---|---|---|---|---|---|---|---|
| 8,254 | 57 | 2 | 2 | 730 | 1 | 1/1/2004 | 1/3/2004 |
| 24,655 | 43 | 1 | 4 | 730 | 1 | 1/1/2004 | 1/9/2004 |
| 27,234 | 81 | 3 | 4 | 730 | 0 | 1/2/2004 | 1/13/2004 |
| 21,345 | 56 | 2 | 3 | 730 | 0 | 1/9/2004 | 1/14/2004 |
| 2,417 | 17 | 1 | 1 | 730 | 1 | 1/3/2004 | 1/4/2004 |
| 5,420 | 61 | 2 | 1 | 730 | 1 | 1/4/2004 | 1/6/2004 |
| 18,823 | -61 | 2 | 2 | 730 | 1 | 1/6/1944 | 1/12/2004 |
| 20,280 | 61 | 2 | 3 | 730 | 1 | 1/6/2004 | 1/11/2004 |
| 4,360 | 44 | 1 | 1 | 730 | 0 | 1/2/2004 | 1/5/2004 |
| 22,382 | 90 | 3 | 3 | 730 | 1 | 1/2/2004 | 1/6/2004 |
| 12,673 | 39 | 1 | 3 | 730 | 1 | 1/4/2004 | 1/10/2004 |
| 22,632 | 70 | 3 | 4 | 730 | 1 | 1/3/2004 | 1/11/2004 |
| 22,642 | 77 | 3 | 4 | 730 | 0 | 1/3/2004 | 1/13/2004 |
| 14,111 | 85 | 3 | 2 | 730 | 0 | 1/5/2004 | 1/11/2004 |
| 9,763 | 52 | 2 | 2 | 730 | 1 | 1/6/2004 | 1/13/2004 |
| 13,343 | 65 | 2 | 2 | 730 | 0 | 1/7/2004 | 1/11/2004 |
| 4,886 | 54 | 2 | 1 | 730 | 1 | 1/4/2004 | 1/7/2004 |
| 22,712 | 87 | 3 | 3 | 730 | 0 | 1/4/2004 | 1/14/2004 |
| 7,194 | 50 | 2 | 2 | 730 | 1 | 1/3/2004 | 1/7/2004 |
| 24,809 | 73 | 3 | 3 | 730 | 0 | 1/3/2004 | 1/15/2004 |
| 9,405 | 62 | 2 | 1 | 730 | 1 | 1/2/2004 | 1/7/2004 |
| 9,990 | 63 | 2 | 1 | 499 | 1 | 1/2/2004 | 1/6/2004 |
| 24,042 | 67 | 3 | 3 | 499 | 1 | 1/1/2004 | 1/20/2004 |
| 17,591 | 68 | 3 | 4 | 499 | 0 | 1/2/2004 | 1/10/2004 |
| 10,864 | 85 | 3 | 2 | 499 | 0 | 1/3/2004 | 1/9/2004 |
| 3,535 | 20 | 1 | 2 | 499 | 1 | 1/2/2004 | 1/3/2003 |
| 6,042 | 61 | 2 | 1 | 499 | 0 | 1/4/2004 | 1/6/2004 |
| 11,908 | 59 | 2 | 1 | 499 | 0 | 1/4/2004 | 1/10/2004 |
| 24,121 | 86 | 3 | 44 | 499 | 0 | 1/5/2004 | 1/21/2004 |
| 15,600 | 72 | 3 | 3 | 499 | 1 | 1/5/2004 | 1/11/2004 |
| 25,561 | 92 | 3 | 4 | 499 | 0 | 1/4/2004 | 1/19/2004 |
| 2,499 | 39 | 1 | 1 | 499 | 0 | 1/6/2004 | 1/7/2004 |
| 12,423 | 69 | 3 | 3 | 499 | 1 | 1/6/2004 | 1/9/2004 |
| 24,980 | 71 | 3 | 4 | 499 | 1 | 1/7/2004 | 1/19/2004 |
| 19,873 | 59 | 2 | 3 | 499 | 0 | 1/8/2004 | 1/22/2004 |
| 21,311 | 92 | 3 | 4 | 499 | 1 | 1/6/2004 | 1/12/2004 |
| 15,969 | 60 | 2 | 3 | 499 | 1 | 1/5/2004 | 1/11/2004 |
| 16,574 | 72 | 3 | 3 | 499 | 0 | 1/7/2004 | 1/13/2004 |
| 24,214 | 89 | 3 | 3 | 499 | 0 | 1/7/2004 | 1/19/2004 |
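For readers who want a cross-check outside Excel, here is a minimal sketch of the same regression in Python with statsmodels. Only the first ten usable rows of the table are keyed in to keep the example short; the full 39 rows would be entered the same way. (The row with the apparent data-entry error, age = -61, is skipped, just as the note above says such problems must be located and handled.)

```python
# Sketch: the charges regression from the table above, run in Python.
import pandas as pd
import statsmodels.api as sm

df = pd.DataFrame({
    "charges":      [8254, 24655, 27234, 21345, 2417,
                     5420, 20280, 4360, 22382, 12673],
    "age":          [57, 43, 81, 56, 17, 61, 61, 44, 90, 39],
    "age_category": [2, 1, 3, 2, 1, 2, 2, 1, 3, 1],
    "severity":     [2, 4, 4, 3, 1, 1, 3, 1, 3, 3],
})

X = sm.add_constant(df[["age", "age_category", "severity"]])
fit = sm.OLS(df["charges"], X).fit()
print(fit.summary())  # coefficients, t-stats, p-values, R-squared
```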
Forecasting
Almost everyone finds themselves in a job situation where they have to make a projection or forecast of sales revenue, cash needs, or something similar. Every business plan requires this sort of thing. While this is an area where deep technical skills exist, it is also an area where MBAs need to be equipped to do a serviceable job (when the organization can't afford to hire an expensive consultant) and to understand the limits of their work.
Finding future values of some measure or indicator is risky business. It is often important for businesses to do this for planning purposes, but it remains a difficult chore under the best of circumstances. How can we predict the future? We can’t. But we often have to try anyway.
There are four basic forecasting methods.
(1) extrapolating from historic or past data to find quantitative estimates of future values. There are a number of ways to do this.
(2) surrogate tracking—finding some metric for which forecasts are available that moves over time in roughly the same pattern as the thing we are trying to forecast.
(3) analytic forecasting, where we find known or logical drivers of what we want to forecast, and then look for evidence and opinions about what those drivers might be doing going forward. This may yield some qualitative notions of what to expect. This is our only option when there is no past data from which to extrapolate.
(4) a system of equations that links together the relationships between the drivers and the target variables. Historic data on all measures are used in such systems. This is the most comprehensive approach, and several universities and consulting organizations maintain large models whose forecasts are sold to corporate and government clients.
Extrapolation
Extrapolation is sometimes called time series analysis. It can be done several ways, but the essence is to calculate the future value of a measure by extrapolating or projecting from past data on that same measure. Two methods are most common. The first is smoothing the historic data with moving averages, so that trends can be separated from the "ups and downs" or "noise" in the raw data, which lets the analyst see the trend better and use it to extrapolate going forward. So if we have annual data, we may see lots of up and down from year to year. We can "smooth" those data by separating them into 3-year or 4-year groups and taking the average for each group. Plotting the 3-year averages tends to smooth out the messy ups and downs, letting the analyst see the underlying trends. Excel supports this moving-average technique.
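Here is a minimal sketch of the moving-average idea in Python with pandas; the annual revenue figures are made up for illustration:

```python
# Sketch: smoothing a noisy annual series with a 3-year moving average.
import pandas as pd

# Made-up annual revenue figures, in $000s
revenue = pd.Series([820, 910, 870, 990, 950, 1080, 1030, 1150],
                    index=range(2007, 2015))

smoothed = revenue.rolling(window=3).mean()  # average of each 3-year window
print(smoothed)  # first two years are NaN; the underlying trend is easier to see
```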
A second approach to extrapolation is simply finding published forecasts (from BLS or other agencies) of growth rates that can be applied to make the projection. If we are trying to forecast revenue for our Hess gas station chain, we might build a projection from the forecasted value of the price of oil going forward. It won't be exact, but it might be a useful estimate.
A third approach to extrapolation is simple regression. Regression passes a straight line through the data points (the scatter) relating the value of the variable to time. The next page describes the approach. Basically, the forecasted variable is the Y (dependent) variable, and time is the independent (X) variable. The regression technique passes the best-fitting straight line through the scatter plot of the data.
That line has two parameters: (1) the slope of the relationship between the variable and time, specifically, the change in the variable associated with a one-unit change in time (a year if the data are annual, a month if the series has monthly data, etc.); and (2) the Y intercept, which is essentially the value of the Y variable when time = 0. The intercept is meaningless here, other than to position the best-fitting straight line.
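A minimal sketch of a trend regression in Python, using numpy's polyfit; the annual sales figures are made up for illustration:

```python
# Sketch: fitting a trend line to annual data and extrapolating it forward.
import numpy as np

years = np.array([2008, 2009, 2010, 2011, 2012])
sales = np.array([410.0, 435.0, 452.0, 480.0, 501.0])  # made-up figures, $M

slope, intercept = np.polyfit(years, sales, 1)  # best-fitting straight line
forecast_2014 = intercept + slope * 2014        # extend the trend to 2014
print(slope, forecast_2014)
```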
Excel does the work. Here below I did a regression on quantity purchased (Y) and price (X). You can see the data, and the results.
The two key parameters are shown as "coefficients". Here, if price were zero, customers would "buy" 44.54 units of the product. And the change in quantity demanded associated with a $1 price change is -3.91 (a fall in price of $1 would cause a 3.91-unit increase in the number of units sold).
The regression output has several other results. The t-stat and associated p-value on each parameter estimate are useful in testing hypotheses about whether the estimates we got could have been the result of chance variation in the data. Usually a t-stat > 2 indicates that the coefficient is far enough from zero (two random, unrelated variables in a regression would have a regression coefficient of zero) that we can believe it couldn't have been generated by chance: there must be a relationship between the independent and dependent variables!
The R-squared statistic (here equal to 0.99) tells us how well the regression line represents the raw data. Here the regression nearly matches the raw data, because the raw data are nearly a perfect line. We would say that the regression line (model) explains over 99% of the variation in the dependent variable (quantity demanded here). That is as good as it gets; most regression models have much lower R-squared values. But, frankly, people using models to forecast and to analyze relationships between variables are much more concerned about the t and p values, and the strength of the relationships between the variables implied by them. R-squared is interesting as a measure of how good the model is overall, but it is rarely a bar to action in the real world.
So, if we had a regression telling us that every year added 23M to our sales revenue, we could make a forecast by (1) taking our most recent year of sales revenue data, and (2) adding 23 million to that number for each year out to the year we are trying to forecast. There is a way to estimate how much statistical confidence we'd have in that forecasted value, but I am going to skip it here.
A variation on time series regression is to add a lagged value of the dependent variable as a second independent variable. The lagged value will cost us one year of historical data (because we don't have a lagged value for the first year). The coefficient on the lagged term then feeds into creating the forecast.
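A minimal sketch of this lagged-value variation in Python with pandas and statsmodels; the annual series is made up for illustration:

```python
# Sketch: time-series regression with a lagged dependent variable.
import pandas as pd
import statsmodels.api as sm

y = pd.Series([100.0, 108.0, 115.0, 124.0, 131.0, 142.0, 150.0])  # made-up series
df = pd.DataFrame({"t": range(len(y)), "y": y})
df["y_lag"] = df["y"].shift(1)  # last period's value
df = df.dropna()                # drop the first year, which has no lagged value

X = sm.add_constant(df[["t", "y_lag"]])
fit = sm.OLS(df["y"], X).fit()

# One-step-ahead forecast: plug in the next time index and the latest value of y
next_x = pd.DataFrame({"const": [1.0], "t": [len(y)], "y_lag": [y.iloc[-1]]})
print(fit.predict(next_x))
```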
In extrapolations, however we do them, we must remember a few things. First, the projection will be less reliable the further we move into the future. Second, recent experience (last year, the year before) should matter more to our forecast than years long ago (1989, 1990). If we believe this, then we can "weight" the more recent years more heavily in our analysis (Excel doesn't allow us to do this, but statistical software like SAS and Stata does).
The third thing to keep in mind is that extrapolations, however done, stand or fall on their underlying premise: we are forecasting the future by extending trends from the past. To the extent that the same underlying forces that made the past trend continue into the future, the forecast will be a good one. To the extent that the underlying drivers of the past data change, the forecast will be lousy. Thus, time series or extrapolation forecasts are notoriously bad at detecting turning points. The example below emphasizes this point. In such cases (really, in all cases) this means we must rely on analytic methods to get information about where the data are going.
Surrogate tracking
Sometimes we want time series data to extrapolate from but have no history to work with (e.g., a business plan for a new product). If we are lucky, we may be able to identify some measure that should be highly correlated with what we are trying to project. The illustrative problem below uses this technique. We are trying to forecast Medicare per capita spending in Massachusetts. We have past data, but no forecast exists. A Google search turned up a forecast of inflation rates for national Medicare spending per capita. While this isn't exactly what we want, it was worth a look to see how well the Massachusetts numbers tracked the national numbers in the past. The chart shows what we found over six past years.
While the Massachusetts numbers are a good bit higher than the national ones, the year-to-year changes seem to be driven by the same things (whatever they are). So, we decided to use the forecasted percentage increases in the national data to proxy the percentage changes in the Massachusetts numbers.
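A minimal sketch of the mechanics in Python; the starting value and growth rates below are placeholders, not the actual CMS projections:

```python
# Sketch: applying forecasted national growth rates to a state-level series.
last_known = 10000.0              # most recent state value (placeholder)
national_growth = [0.009, 0.019]  # projected national % increases (placeholders)

value = last_known
for g in national_growth:
    value = round(value * (1 + g))  # grow the series by the national rate
    print(value)                    # 10090, then 10282
```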
Sometimes surrogates are not this close. We may not have a forecast of sales for our product, but if history shows that it is strongly related to the strength of the economy (income, jobs, etc.) we may be able to use forecasts of the economy as a proxy for projecting sales.
Analytic Forecasting
This can be done in two ways as well. If we are forecasting the price of Disney stock, the first way is to make a list of the things that might influence the price of Disney (not the items in the table, but other things in the economy that might influence Disney's stock price). The next step is to snoop around, talk to experts, visit pundit websites, and see if we can understand the direction of change in these underlying "drivers" of the Disney stock price. This is not a "computational" method of forecasting at all, but it might yield a fairly consistent view that the drivers are going to push the price higher in the future, or lower. Of course, there may be no consistency at all, in which case we can't give management any good idea of what will happen to the stock price using this method.
The Disney case uses analytic forecasting in a computational way. It passes a line through the scatter of stock price (P) and one of the presumed "drivers" of changes in stock price, earnings per share (EPS). The line has a slope of +31.388: a change in EPS of $1 will move the stock price by $31.39 in the same direction, other things the same. This is not extrapolation; it is analytic forecasting. It says that the stock price we need to forecast is systematically related to EPS. If we knew how high EPS was going to be in 2007-9, we could forecast the stock price.
The drivers of the thing we are forecasting are usually not its components. Rather, they are environmental/external factors that will determine the thing we are forecasting. Which factors matter depends on how far into the future we are trying to forecast:
- Short-term forecasts: maybe the next few quarters, or the next year or so. Here, many of the things that drive the demand for our product or health spending (or whatever we are interested in) are simply not going to change. But there are always some things that could alter the situation. What are they? Look to Google and the pundits and experts to see what they are saying about our forecasting problem. Also Google the key drivers (the price or availability of a key raw material, or a government policy that might change overnight) and see if experts are commenting on them. You would be surprised how much is available on the web, but you must organize the problem: What are we forecasting? Is anyone else forecasting the same thing? What are the drivers? What are experts saying about them in the short term?
- Long-term forecasts: maybe 5-10 years. This is hard. So much can change, and the key drivers may be totally different. The longer the historical data series, the more helpful it may be here. Demand, supply, and government policy might be one framework to use; SWOT is another way to frame it. Some forces are more fundamental here, pundit opinion is more needed, and it is essential to find some good thinking about the long term.
- Medium-term forecasts: 2-5 years. Here some things are fairly well fixed (the kinds of competitors we have, the demographics lying behind the demand, the sources of key inputs), but some other things can certainly change. This is the list we want.
Putting it Together
Integrating these methods (extrapolation, analytic) is generally the best way to proceed. Do a simple projection, evaluate the drivers, and put the two together. Management always wants to know whether everything is pointing in the same direction, even if the number isn't exact. Management also wants to know if there is a lack of agreement. The issue isn't getting a number; it's analyzing the consistency of the evidence you have brought to the table.
Sometimes you may find a source that has thought about the problem a lot, made a forecast, and reasoned it through in a very expert fashion. You may want to just "steal" this result and use it. It is likely to be better than anything you can independently come up with. Remember, this is not a research project; it is an attempt to estimate the future value of some measure. Use the best resources you can access.
In many cases a range estimate is going to be better than a point estimate. Maybe you have no basis for distinguishing between two estimates, 2.2M and 2.4M. Then perhaps what you can do is say that your estimate is somewhere in the 2.2-2.4M range, and explain why this is the case and how you reached it. If pressed to pick a number, the midpoint (average) may be the best bet (e.g., 2.3M).
An Illustrative Example
Forecast the 2014 value of Medicare spending per capita in the Boston area. We have data from 2007 to 2012 on this measure. Here it is contrasted with national data, because we want to see how Boston's past compares to the nation's, and because we know that the experts and pundits, if we find them, will be focusing their attention on the national picture rather than the Boston market.
And we see from the chart that pretty much the same trends (changes over time) are evident in Boston as in the nation as a whole. That helps.
A simple regression of Boston per capita spending on year yields a coefficient on the year variable of about 290, which could be used to project 2013 and 2014 values. But this method, using the historic data, yields values for 2013 and 2014 of 11310 and 11600, which are way above the simple extrapolation in the first chart shown above. The regression estimate of a 290-per-year increase in per capita spending is based on the average increase from 2007 to 2012. The increment per year is obviously declining, and this kind of method won't give us believable forecasts if the recent trends are accurate.
To help see what's going on, we made this chart of the Boston data from the numbers above. It shows each year's change in average per capita spending. Obviously, the last 4 years have seen average spending increase by a smaller and smaller amount each year, even becoming a negative increment in the final year (2012).
I also googled to see if there were any suggestions about forecasted values of Medicare spending, or better yet, Medicare spending per capita. Voila, I found something from the federal government at http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/NationalHealthExpendData/downloads/proj2012.pdf
Since the national pattern and the Boston pattern have looked similar (there is a gap between them, but it remains constant over time), one forecast I can make is to apply the projected national percentage increases to the Boston data: 2013 = 2012 × 1.009, and then 2014 = 2013 × 1.019. Applying these formulas, per capita spending increases by less than 1% in 2013 and by 1.9% in 2014. The numbers for these two years are: 2013 = 11022 × 1.009 = 11121, and 2014 = 11121 × 1.019 = 11332.
And the projection chart becomes as follows:
This shows a small uptick for the projection period.
The class in which we used this example was asked to do a projection for 2014 from these data, considering both extrapolation and analytic approaches. The result was 13 forecasts for 2014 averaging 11,309 (and ranging from 10,813 to 11,900). This class average for 2014 is within $24 of the projection based on the government's average expected growth rates in national per capita Medicare spending (shown immediately above).