Prescription drug data riddled with drugmakers' errors

Drugmakers misspelled drug names and listed the same drug under multiple names, which made analysis of prescription data difficult, according to an article by ProPublica that was co-published with The New York Times.

As part of its ongoing “Dollars for Docs” series on payments from pharmaceutical companies to doctors, ProPublica reported there are widespread problems with the data that is posted to the “Open Payments” section of the Centers for Medicare and Medicaid Services (CMS) website.

Forest Laboratories misspelled the name of its depression drug, Fetzima, as "Fetziima" 953 times, in more than one-third of all the reports on the drug.

CMS doesn’t double-check the data, correct spelling errors or alter the data in any way, according to the story.

For people who work with data on a daily basis, much of the job is still janitorial, as The New York Times reported in August. It’s the mundane work of cleaning and standardizing terms in a database. Cleaning data means finding inaccuracies and resolving inconsistencies, such as different spellings of the same term, to a single standard.

Names of people, drugs, or companies can be spelled or entered differently depending on the person or people entering or collecting the data.
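
As a rough illustration of how that standardizing can work, the Python sketch below uses the standard library's difflib module to snap a misspelled name to the closest entry on a reference list. The reference list, the function name standardize and the 0.8 similarity cutoff are assumptions made for this example, not anything ProPublica or CMS describes using.

```python
import difflib

# Hypothetical reference list of correctly spelled drug names.
# A real project would load a vetted list instead.
CANONICAL_NAMES = ["Fetzima", "H.P. Acthar Gel"]


def standardize(name, canonical=CANONICAL_NAMES, cutoff=0.8):
    """Return the closest canonical name, or the cleaned input if none is close."""
    cleaned = name.strip()
    # Compare in lowercase so capitalization differences don't matter.
    lookup = {c.lower(): c for c in canonical}
    matches = difflib.get_close_matches(
        cleaned.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else cleaned


for raw in ["Fetziima", " fetzima ", "Acthar-MS"]:
    print(repr(raw), "->", repr(standardize(raw)))
```

With these settings, "Fetziima" is mapped to "Fetzima", while a name with no close match is returned unchanged. Similarity matching catches typos, but it does not catch intentional variants such as "Acthar-MS", which are too different from the full product name and have to be matched some other way.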

From the ProPublica story:

Take H.P. Acthar Gel, an expensive injectable drug used to treat multiple sclerosis, kidney disease, lupus and other conditions. The drug's maker, Questcor Pharmaceuticals, logged payments related to the drug under eight names, including Acthar, Acthar-Pulm, Acthar-IS, Acthar-Rheum and Acthar-MS. The payments associated with each name didn't stand out much. But when they were all added together, the drug ranked in the top 20 for spending on doctors.

The reporters didn’t find any evidence that errors in the data were deliberately made by companies trying to be evasive.

With so many errors in the original data, it would be very difficult for the average person to access the data and derive any meaning from it. This is often the case with large government databases that haven’t been cleaned. A term entered with a different spelling or in a different format appears as a separate entry in the database, even though it refers to the same thing.
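
The effect of split entries on totals can be sketched in a few lines of Python. The alias table below is hand-built for this example from the names quoted in the ProPublica story, and the dollar amounts are invented placeholders, not real Open Payments figures. The function name total_by_drug is likewise hypothetical.

```python
from collections import defaultdict

# Hand-built alias table (an assumption for illustration):
# each variant as entered -> one standard name.
ALIASES = {
    "Acthar": "H.P. Acthar Gel",
    "Acthar-Pulm": "H.P. Acthar Gel",
    "Acthar-IS": "H.P. Acthar Gel",
    "Acthar-Rheum": "H.P. Acthar Gel",
    "Acthar-MS": "H.P. Acthar Gel",
    "Fetziima": "Fetzima",
}

# Made-up payment records: (drug name as entered, amount in dollars).
records = [
    ("Acthar", 100.0),
    ("Acthar-MS", 250.0),
    ("Acthar-Rheum", 50.0),
    ("Fetziima", 75.0),
    ("Fetzima", 25.0),
]


def total_by_drug(records, aliases=ALIASES):
    """Sum payments per drug after mapping each variant to its standard name."""
    totals = defaultdict(float)
    for name, amount in records:
        # Names missing from the alias table are kept as entered.
        totals[aliases.get(name, name)] += amount
    return dict(totals)


print(total_by_drug(records))
```

Without the alias table, the same records would produce five separate totals, none of which reflects the full spending on either drug.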

This type of data cleaning makes a government database ready to be analyzed and used to draw conclusions for a news story.

Reach Eric Holmberg at 412-315-0266 or at eholmberg@publicsource.org. Follow him on Twitter @holmberges.