EagleRidge Technologies, Inc.: Resources : Legacy Data and Software Planning, by Crystal Sloan

Legacy Data and Software Project Planning

by Crystal (Bliss) Sloan, Microsoft MVP

Averting Disaster:
Legacy Data and Software Project Planning

"As soon as phrases such as "mapping the data," "data transfer," "data migration," or "convert the legacy data" appear in the context of a project, a wise systems engineer or project manager should see red flags waving..."

Students in Professor McCumber's "Systems Development and Project Control" graduate classes are asked each term to share their tales of project disasters. Each term, a significant number of the disasters they relate revolve around the use or conversion of legacy data. For example:

"This six-month small project was an interface between [a legacy system at our company and a new system from] an outside vendor.... The project itself was basically successful. We hit our completion date and the interface works. What wasn't successful was the way we got there and the negative impact this had on my "career." According to my manager, development on this project took 40 percent longer than it should have and he had to pay 40 percent more for the $$$ contractor programmer to complete the project than he should have. We sacrificed testing time to compensate for the development time and I spent six months in a state of frenzied anxiety....We did not have certain data in our company's system or did not have data in the correct format to meet the vendor's requirements. At the beginning of this project, all people involved were describing it as "just a simple interface." I believed that and thought that requirements would just be a matter of mapping the data, gathering the data, and sending it. It turned out that the vendor needed more information than was present in my company's system of record. We had to add many new codes to give the vendor specific data and in some cases had to clarify and/or create new business rules and data entry rules."

"Vendor could not deliver database in type desired therefore historical data is already 6 months late in conversion."

"The [government agency] invested in a new computerized personnel security database in 1996. The system... was intended to replace the older tracking system and enable immediate access to personnel security files during all stages of the investigations conducted to grant a government security clearance to an individual. The older system kept track of who had clearances and would indicate if an investigation was open, but not any other details. The newer system would provide up-to-date information as to when an investigation was opened, what the current phase of the investigation.

Although the plan sounded good, it didn't go quite the way it was expected. When the new computer system and software was completely installed, the plug was pulled on the old system. After just a few days, the [agency] knew it made a grave error. The data from the old system did not totally correlate to the new database. As changes or new investigations required accessing a record file, it had to be completely re-entered and corrected. New records that had to be created for required investigations were delayed while operators tried to correct the existing database. The investigations began to back up. The [agency] was processing an average of approximately 300,000 investigative cases a year for industry and government agencies. In 1998, the investigations backlog had reached an astronomical 800,000 cases! The [agency's] Director... was replaced by [a new manager] who immediately set out to find a solution to the problem. After an assessment of the situation, [the new manager] conceded that the new computer database system was too far along to replace, yet it would take years to catch up to meet the needs of industry and government. His solution included hiring more investigators and tasking other government investigative agencies to help. Meanwhile, [agency] database operators would continue to change and input data for the millions of records in the system. In 2001, the backlog hasn't been reduced and the length of time to process an investigation has more than tripled.

The latest figures for the first quarter of FY01 indicate that the investigative cases that in 1995 took an average of 120 days was now taking 477 days. It was now very possible that a contractor could hire a person for an [important] contract, yet not be able to put them to work for almost a year and a half!

The system design... was basically good, but its implementation was disastrous! Perhaps if the changeover from the old data base to the new was not so abrupt, bugs could have been worked out. If the database records were tested before shutting down the older system, errors could have been corrected. Sample databases, test cases, and trials using real data could have prevented this whole situation." It is estimated that extra costs and delays from this mistake will continue for 5 to 10 more years.

"...The organization was in the process of converting their [ancient legacy] system to [a new system]. The [legacy] database was plagued with many duplicate files... [and] was very large. There was no consideration taken to see if data from [the legacy system] could be imported into the [new] database.... After taking days to find and delete the right duplicate files, the manager decided to convert the databases. Everything went wrong. Fields were not the same so data was not converted. The organization's entire... database was down for two days. The decision was to create the appropriate fields in [the new system] and have the entry coordinators input the previous two fiscal years of [legacy] data [into the new system]."

"My company recently migrated to our current financial system.... Basically, the conversion process involved the purchase of new hardware, client-side configurations, application server installation, and data migration. The new package... had no software conversion utilities for the [legacy] version... we were running. Therefore, the data migration was to be a "brute force" effort. To accomplish this, we set up a test bed of the new system, mapped the data, spied the table updates of the new system, and then migrated the old data to the current structure. Long story short, hardware procurement was coordinated with software procurement/installation and the system was put in place. Everything progressed along the proposed timeline and remained in budget with the exception of the data migration.... The data migration involved the mapping of data by the finance/purchasing personnel, the partial population of data into the new system, the verification of data by the finance/purchasing office, and the movement of all the data by functional area (i.e. Accounts Payable, Accounts Receivable). The two bottleneck areas turned out to be the data mapping and verification. We had a number of highly paid technical resources waiting on the finance office. The finance personnel were terribly overworked, performing two functions: their original jobs and the data migration phase. The problem was definitely a mismanagement of personnel. This problem directly affected the cost of the project. To offset the cost overruns, we made drastic cuts in the guaranteed bandwidth of the network connections. To compound the problem, some of the data was too old to be migrated. The data could not be mapped or traced in the old system. So the old system remains in place in case we have to produce financial data older than 5 years. This process would involve rebooting the HP 9000, faking the date to pre-2000 and producing data reports.
Lessons Learned: 1) Personnel - steps that seem insignificant are not. The data mapping and data verification was considered to be a side note of the project. Obviously, this was one of the most important aspects of the migration."

"My war story included working on a government project as a contractor to develop a data warehouse. The project truly lacked requirements and totally lost its focus. We had no idea the amount of data this particular government agency had in its legacy systems. The data warehouse continued to grow, which is normal and the purpose of a data warehouse, but the quality of the data was poor. We tried to capture the metadata [data about the data], but no one really knew why certain fields were created. I feel we should have created a good requirements and design document from the beginning. At that point, we would have realized the best way to migrate the legacy data together is to create a smaller repository, such as a data store. The data store would have given us the opportunity to view the data in smaller quantities and to cleanse the data from anomalies. Then the data store could have provided us with a baseline to build our enterprise data warehouse, which eventually lost its funding."

In a recent discussion, one student said something that made me think back to some of the project disasters students posted earlier in this and previous terms:

"... programmers cannot properly script what they don't understand."

Many computer-related projects involve transferring legacy data to a new system, or interfacing to data on legacy systems, or both. I lump all the sub-projects relating to these topics under the term "data transfer projects," or simply "data transfers" or "data migrations."

Quite a few of the project "disasters" reported in this and past terms involved such data transfers. This is completely understandable when one examines exactly what is involved in such "data transfers."

To transfer data from one model/format to another, or even just to map the old data to the new model, in the case of connecting a legacy system to a new system, one must know the exact model and format actually used by both sides. One must know not only the current "official" formats and meanings and relationships of the legacy data, but also the real-world formats and meanings and relationships the data has had over time, not only generally but literally right down to the bit level. This means that the poor programmer stuck with the data transfer is really being given the belated task of finally and precisely documenting the entire history of the existing legacy data, a task that almost certainly no one else has done before.

I'm no data transfer "expert," but I have probably participated in several hundred of these data transfer projects (I stopped counting at 100 fifteen or twenty years ago, but kept right on doing them) and have seen many times the kinds of things that can happen.

Here is a typical example. A legacy medical database had a field indicating how much tobacco a person smoked. According to the customer who owned the data, 0 meant none, 1 meant some, 2 meant 1/2 pack a day, and 3 meant 1 or more packs a day. Simple, right? But it turned out that the field *used* to indicate not how much a person smoked, but whether they had ever smoked: 0 meant never, 1 meant the person was a current smoker, and 2 meant the person used to smoke but had quit.

Years before, they had changed the meaning of the field but left the old data in place, since the people using the data at the time knew which patients were which: patients with (otherwise meaningless) IDs below a certain number were in the old format, and those with IDs above it were in the new. However, this relationship between the meaning of the smoking field and the value of the ID was never formally documented.

The proposed new system had an entirely different, meaningful ID system, requested by the customers to enhance their operations.

Enter the hapless programmer, who is told how to generate the new IDs from each legacy patient's Social Security Number and birth date, and who accurately translates all the legacy data according to the stated specs. Under those specs, an old-format record coded 2 for "used to smoke but quit" comes out as a 1/2-pack-a-day smoker. The next thing you know, the programmer is faced with angry doctors who say their patient data is wrong! For example: "This patient is not a smoker! But the system now says he smokes!" Multiply this problem with the smoking field by similar stories for many of the other thousands of fields in that medical research database... Needless to say, the data transfer part of the project took much longer than the customers had wanted. Incidentally, although the programmer knew in advance this would happen, neither management nor the customer would listen, which is a common scenario.
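The trap in the smoking-field story can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cutoff value `LEGACY_ID_CUTOFF` and the function name `decode_smoking` are hypothetical (the story says only that the meaning changed at "a certain number"), and treating an ID exactly at the cutoff as new-format is likewise an assumption. The two code tables are the ones described in the example.

```python
# Hypothetical cutoff: the real value was never formally documented.
LEGACY_ID_CUTOFF = 50_000

# Original meaning of the field: has the person ever smoked?
OLD_MEANING = {0: "never smoked", 1: "current smoker", 2: "former smoker"}

# Later ("official") meaning of the same field: how much does the person smoke?
NEW_MEANING = {0: "none", 1: "some", 2: "1/2 pack a day", 3: "1 or more packs a day"}


def decode_smoking(patient_id: int, code: int) -> str:
    """Interpret a smoking code using the meaning in force for that legacy ID."""
    table = OLD_MEANING if patient_id < LEGACY_ID_CUTOFF else NEW_MEANING
    if code not in table:
        raise ValueError(f"unexpected smoking code {code} for patient {patient_id}")
    return table[code]


# The same stored value means two different things depending on the legacy ID.
print(decode_smoking(120, 2))     # old-format patient
print(decode_smoking(70_000, 2))  # new-format patient
```

Note that the decoding depends on the legacy ID, so it would have to run before the old IDs are replaced by the new ID scheme; once they are gone, nothing in the record says which meaning applies.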

Often the mismatch between the "official" legacy format and its reality is so great that even perfectly programmed transfer programs written to the "official" specification will not successfully complete a run, even after thousands of "programming fixes" and restarts, or will dump so many errors to a log on each run that the programmer must add fixes for yet another round of newly discovered differences between the specifications and the reality, and start testing yet again.
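One way to shorten that fix-and-restart cycle is to profile the legacy data against the "official" specification before writing any transfer code, tallying every mismatch in a single pass instead of discovering them one failed run at a time. The following is a minimal sketch, not a description of any particular project: the spec `OFFICIAL_SPEC`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical "official" spec: field name -> set of allowed codes.
OFFICIAL_SPEC = {
    "smoking": {0, 1, 2, 3},
    "sex": {"M", "F"},
}


def profile(records, spec=OFFICIAL_SPEC):
    """Count every (field, value) pair that the spec does not allow."""
    violations = Counter()
    for record in records:
        for field, allowed in spec.items():
            value = record.get(field)  # a missing field shows up as None
            if value not in allowed:
                violations[(field, value)] += 1
    return violations


# Hypothetical sample of legacy records.
sample = [
    {"smoking": 2, "sex": "M"},
    {"smoking": 9, "sex": "f"},
    {"smoking": 9, "sex": "F"},
    {"sex": "M"},
]

report = profile(sample)
for (field, value), count in report.most_common():
    print(f"{field}={value!r}: {count} record(s)")
```

A profile like this only surfaces values that fall outside the spec. It cannot detect a field whose values are all "legal" but whose meaning changed over time; that still has to be dug out of the people who used the data.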

It is an extremely rare project indeed in which even the current owner of the legacy data knows exactly what all the fields and codes and records and tables have meant over time and their relationships to each other. Unfortunately, it is also an extremely rare project in which the powers that be do not insist that they really do know the exact formats and consequently insist that the data transfer is a small part of the larger project, more of a detail or an afterthought than the main project itself.

This means that nearly always, the data transfer portion of a larger project is allocated far too little calendar time and resources (labor) by the project planners.

A good rule of thumb for data transfer projects is to figure the time it "ought" to take to do the transfer (roughly proportionate to the number of fields in the model), then multiply by McCumber's 2.2, or Sloan's 2.2 or a more conservative 2.5, to get what would be a "safe" estimate for any other type of project, and then multiply by ... ten. I wish this were a joke, but it's not. Ten is actually a pretty good number for this.
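As a worked illustration of the rule of thumb, here is the arithmetic as a small Python function. The factors 2.2, 2.5, and 10 are the ones given above; the function name, the choice of weeks as the unit, and the two-week starting figure are hypothetical.

```python
def estimate_transfer_weeks(naive_weeks: float,
                            safety_factor: float = 2.2,
                            legacy_factor: float = 10.0) -> float:
    """Apply the rule of thumb: naive estimate x safety factor x ten."""
    return naive_weeks * safety_factor * legacy_factor


naive = 2.0  # hypothetical: the transfer "ought" to take two weeks

print(estimate_transfer_weeks(naive))                     # 2.0 * 2.2 * 10
print(estimate_transfer_weeks(naive, safety_factor=2.5))  # 2.0 * 2.5 * 10
```

On these assumptions, a transfer that "ought" to take two weeks should be planned at roughly 44 to 50 weeks.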

The student who posted earlier in the term about being 40% over budget for a data transfer project may have been doing very well, if her manager would only realize it.

Managers and programmers who have been through some of these projects can save themselves a lot of trouble in future endeavors by treating data transfers for what they truly are--long-overdue precise documentation of the legacy data model and data.

As soon as phrases such as "mapping the data," "data transfer," "data migration," or "convert the legacy data," appear in the context of a project, a wise systems engineer or project manager should have red flags waving in his or her mind.

It is not that doing the programming for these conversions is so difficult or requires super-expert capability on the part of the programmers; it is that the following two major drawbacks accompany nearly every data transfer project: 1) the project will have specifications for the legacy data that do not mirror reality--that is, the project will be mis-specified; and 2) the project will be seen by the people who are paying for it as the exception to the rule--they will nearly always say that they do know the data formats exactly, so their project is different. One cannot do much about the almost-certain starting condition of flawed documentation of legacy data; however, one can try to communicate the potential for trouble to the customer in hopes of heading off disaster.

It may be that the IT professional can get the customer to see that a data transfer of legacy data is mostly a long-overdue precise documentation of the legacy data, with only a relatively small part of the project being routine programming. If the customer can be brought to understand this, then he or she may decide that the legacy data is not so important after all, or that it might cost less to keep the old legacy system around just to check on the legacy data in those few cases they might need to, rather than pay for a probably-expensive data transfer project.

It is indeed true that "... programmers cannot properly script what they don't understand."