(Updates below the article; new estimate is closer to 2.5%)
The fatality rate of an emerging outbreak is difficult to determine, especially when the data is inconsistent and/or incomplete. Nonetheless, we can take our best shot.
But first I should disclose that I have no special training or background in medicine or public health. I am using publicly available data and first principles. I may very well be unaware of something that makes a significant difference to this estimate. So buyer beware.
Most sources, including news sources such as the Washington Post, are estimating the fatality rate by dividing the number of fatalities by the number of cases. While this method works fine for diseases that have been with us for some time, it is not an appropriate way to estimate the fatality rate of a newly emergent disease.
Every case is in one of three states: fatality, recovery, or still sick (where the outcome is yet to be determined). For diseases where most of the cases have completed, fatalities divided by cases works fine. But when most people are still sick, we don't know how many of them are going to die and how many of them are going to recover. Fatalities divided by cases presumes they are all going to recover, which is clearly too optimistic.
This optimistic figure for the 2019 Novel Coronavirus is about 2.2%.
Another way to compute the outcome is to look only at cases which have completed. For this we need to know how many people have recovered. According to qq.com, there have been 132 fatalities and only 107 recoveries. That puts the fatality rate at 132 / (132 + 107) = 55.2%. This is much higher than the optimistic figure. However, this figure is pessimistic for at least two reasons. First, early on in an epidemic people are not being treated: they think they have a cold or the flu and convalesce at home. So the early fatality rate is very high compared to what happens after the medical community gets involved to treat patients. Second, during the scary phase of a new epidemic, medical professionals tend to play things very safe, and are very cautious about declaring anyone to have recovered. Thus the number of recoveries is artificially low.
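As a minimal sketch, the completed-cases calculation looks like this in Python. The function name `closed_case_fatality_rate` is just a label I made up for illustration; the counts are the qq.com figures quoted above.

```python
def closed_case_fatality_rate(fatalities: int, recoveries: int) -> float:
    """Fatality rate among cases that have already completed (died or recovered)."""
    completed = fatalities + recoveries
    if completed == 0:
        raise ValueError("no completed cases yet")
    return fatalities / completed


# Counts quoted from qq.com in the text above.
pessimistic = closed_case_fatality_rate(fatalities=132, recoveries=107)
print(f"{pessimistic:.1%}")  # prints 55.2%
```

Cases that are still sick do not appear anywhere in this calculation, which is why it swings so far from the fatalities-divided-by-cases figure.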
This range, from 2.2% to 55.2%, is very wide, but the two figures serve as useful bounds.
Can we do better? We can try to predict the outcomes of existing cases based on their status. Patients who are still sick are either stable, serious, or critical. Most people in critical condition will not make it, so for our purposes we can approximate and assume that they will all die. Most people in stable condition will probably recover, so we can approximate and assume that they all will. And people in serious condition could go either way, at whatever the overall fatality rate turns out to be. Using Hubei province data of 3554 cases, 125 fatalities, 671 serious and 228 critical, we can predict total fatalities to be about 125 + 228 + 671*r, where r is the fatality rate we end up with. This gives us (125 + 228 + 671r)/3554 = r. Rearranging, 353 = 2883r, so we get a fatality rate of 12.2%.
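Here is a rough Python sketch of that calculation, assuming exactly the simplifications above (all critical patients die, all stable patients recover, serious patients die at the overall rate r). The function name `projected_fatality_rate` is hypothetical, chosen only for this example.

```python
def projected_fatality_rate(cases: int, fatalities: int,
                            serious: int, critical: int) -> float:
    """Solve (fatalities + critical + serious * r) / cases = r for r."""
    if cases <= serious:
        raise ValueError("cases must exceed the number of serious cases")
    # Rearranged form: fatalities + critical = r * (cases - serious)
    return (fatalities + critical) / (cases - serious)


# Hubei province figures from the text above.
r = projected_fatality_rate(cases=3554, fatalities=125, serious=671, critical=228)
print(f"{r:.1%}")  # prints 12.2%

# Sanity check: plugging r back into the original equation gives r again.
assert abs((125 + 228 + 671 * r) / 3554 - r) < 1e-12
```

Solving the equation directly avoids iterating, and the final assertion confirms that the result satisfies the original self-referential form.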
12.2% is somewhere in the middle of our prior range calculation, so that checks out as reasonable.
We can also look to SARS and MERS. SARS had a fatality rate of 9.6%. MERS had a fatality rate of 37-41%. Our 12.2% figure is somewhere in between. But we know of so few deadly coronaviruses that presuming the SARS and MERS fatality rates would bound this new one is not reasonable. The fatality rate could really be anything. Nonetheless, being in between SARS and MERS gives just a little bit more confidence.
But consider that Hubei province is overwhelmed, so the fatality rate there might be higher than it would be elsewhere. Given that fact, I'm going to simply fudge this number down a bit to about 10%, which feels like a better guess given all the data.
I will be very interested in coming back in a month or two to see how accurate this was, and even more interested in why I was wrong.
UPDATE: 2020-01-30 16:01 +13:00: One aspect I missed in this analysis is the number of asymptomatic cases. I calculated from only the cases that have been identified and confirmed. Including asymptomatic cases, the death rate would go down. This data is currently unavailable and very hard to estimate.
UPDATE: 2020-01-30 21:14 +13:00: I've read in a sciencenews.org article that the WHO estimates the fatality rate at 4%, but I have been unable to confirm this. I do have this from the WHO: "Most seem to have mild disease, and about 20% appear to progress to severe disease, including pneumonia, respiratory failure and in some cases death."
UPDATE: 2020-02-01 11:56 +13:00: Based on the WHO 20% statement, I've done some thinking. If the 80% of people who don't get severe illness never come in to get tested (in Wuhan, due to overwhelmed hospitals), then the fatality rate coming out of Wuhan would be heavily biased towards those who are more likely to die. In that case I should divide my 10% figure by 5 to get a 2% figure. I don't know where their 20% figure comes from, but I trust it as the best figure we have at the moment. And I don't know whether only severely sick people are getting tested. But I will presume that most people without severe illness in Wuhan are not being tested, and I will then presume that the fatality rate is probably closer to 3.5%. Keep in mind also that the fatality rate goes up as hospitals become overwhelmed, so if this spreads rapidly and hospitals become overwhelmed, the fatality rate will probably go up to the 5-8% range.
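The divide-by-5 step can be sketched as follows. This assumes the extreme case where only severe cases get tested and mild cases never die, which is a simplification of mine and not something the WHO statement says. The function name `adjust_for_untested_mild_cases` is made up for illustration.

```python
def adjust_for_untested_mild_cases(observed_rate: float,
                                   severe_fraction: float) -> float:
    """Rescale a fatality rate measured only on severe cases to all cases.

    Assumption: only severe cases are tested, and untested mild cases
    all recover, so deaths are unchanged while the case count grows.
    """
    if not 0 < severe_fraction <= 1:
        raise ValueError("severe_fraction must be in (0, 1]")
    return observed_rate * severe_fraction


# My 10% figure and the WHO's "about 20% progress to severe disease".
adjusted = adjust_for_untested_mild_cases(observed_rate=0.10, severe_fraction=0.20)
print(f"{adjusted:.1%}")  # prints 2.0%
```

Since some mild cases presumably are being tested, the real correction would be smaller than this, which is why I land above 2%.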
UPDATE: 2020-02-11 14:00 +13:00: The top bound has dropped to 20% (1016 fatalities divided by (3996 recoveries + 1016 fatalities)). The naive calculation is now 2.38% (1016 fatalities from 42638 cases). The WHO has adjusted their data based on details of 17,000 cases, which show that 82% are mild, 15% are severe, 3% are critical, and less than 2% have died. I'm adjusting my estimate down to 2.5%.
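For anyone who wants to check the arithmetic in this update, here is a small Python sketch that recomputes both bounds from the counts quoted above. The variable names are my own.

```python
# Counts quoted in the 2020-02-11 update.
fatalities = 1016
recoveries = 3996
cases = 42638

# Lower bound: treats everyone who is still sick as a future recovery.
naive_rate = fatalities / cases

# Upper bound: looks only at cases that have completed.
closed_case_rate = fatalities / (fatalities + recoveries)

print(f"naive: {naive_rate:.2%}, completed cases only: {closed_case_rate:.1%}")
# prints naive: 2.38%, completed cases only: 20.3%
```

The gap between the two bounds narrows as more cases complete, because fewer outcomes are left undetermined.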