Scientific and e-poster sessions were conducted at the eighth national conference on Advancing Practice, Instruction, and Innovation through Informatics (APIII 2003) on October 8–10, 2003, in Pittsburgh, Pa. The course directors were Michael J. Becich, MD, PhD, professor of Pathology and Information Sciences & Telecommunications, director of the Center for Pathology Informatics, and director of the Benedum Oncology Informatics Center at the University of Pittsburgh, Pittsburgh, Pa; and Rebecca Crowley, MD, assistant professor, Pathology and Intelligent Systems, University of Pittsburgh School of Medicine.

Comparison of Stain Quantification for Histological Specimens Using Spectrometer and Multi-Band Image Data

Othman Abdul-Karim1 ([email protected]); Tokiya Abe, MS2; Masahiro Yamaguchi2,3; Yukako Yagi.1 1Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa; 2Tokyo Institute of Technology, Imaging Science & Engineering Laboratory, Tokyo, Japan; 3Telecommunication Advancement Organization, Akasaka Natural Vision Research Center, Tokyo, Japan.

Context: Accurate analysis of histopathologic tissue samples is important in pathologic diagnosis. This experiment is part of a multispectral research project for a decision support system. Sample tissue is normally observed under a bright-field microscope after the tissue has been stained; in histologic stains, color signals the biochemical makeup of the various tissue regions. One of the most popular staining methods is hematoxylin-eosin (H&E), which selectively stains the nucleus and cytoplasm. In this study, the amount of stain is calculated in small regions of a tissue slide using 2 methods: in the first, the amount of stain is estimated using only spectrometer data, and in the second, it is estimated from 16-band image data.

Technology: To calculate stain amount, estimation of transmittance spectra is needed. Normally a spectral measurement device called a spectrometer is used with a microscope to measure transmittance spectra. However, such a device is expensive and requires a highly experienced operator. It is preferred to estimate transmittance spectra without using a spectrometer. The transmittance spectra will be estimated from 16-band image data. Estimation of transmittance spectra is important in areas such as tissue stain standardization in digital color images, stain detection in tissue color images, and also in tissue image segmentation.

Design: The purpose of this study is to compare stain amount obtained using spectrometer data to stain amount estimated using a multiband imaging system. In the first method, the transmittance spectra at selected small sample regions in the stained tissue are measured using a microscope spectrometer, bright-field microscope, and xyz stage, and the H&E amounts are estimated in the sample regions using the Beer-Lambert Law. In the second method, an imaging system is used to take multiband images using a bright-field microscope, 1 charge coupled device (CCD; 2K × 2K) digital camera, and 16 filters. The transmittance spectrum in each pixel in the image is estimated first using camera sensitivity data, the 16-filter transmittance data, and illumination spectra. The H&E stain amounts are estimated using the Beer-Lambert Law and a priori information about H&E spectral transmittance data.
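
The Beer-Lambert step common to both methods can be sketched briefly. The following is a minimal Python illustration, not the authors' code: it assumes unit absorbance spectra for hematoxylin and eosin are known, and the dye spectra below are synthetic stand-ins used only to demonstrate the estimation.

```python
import numpy as np

def estimate_stain_amounts(transmittance, ref_h, ref_e):
    """Estimate H&E amounts from one transmittance spectrum via the
    Beer-Lambert Law: -log10(T) = c_H * a_H(lambda) + c_E * a_E(lambda).
    ref_h and ref_e are unit absorbance spectra of each dye, assumed known."""
    absorbance = -np.log10(np.clip(transmittance, 1e-6, 1.0))
    A = np.column_stack([ref_h, ref_e])           # wavelengths x 2 dyes
    amounts, *_ = np.linalg.lstsq(A, absorbance, rcond=None)
    return amounts                                # [c_H, c_E]

# Toy check with synthetic dye spectra: synthesize a spectrum from known
# amounts and recover them by least squares.
wl = np.linspace(400, 700, 16)                    # 16 bands, as in the study
ref_h = np.exp(-((wl - 560) / 60) ** 2)           # hypothetical hematoxylin
ref_e = np.exp(-((wl - 520) / 40) ** 2)           # hypothetical eosin
T = 10 ** (-(0.8 * ref_h + 0.3 * ref_e))
print(estimate_stain_amounts(T, ref_h, ref_e))    # ~[0.8, 0.3]
```

In method 1 the input spectrum comes from the spectrometer for a sample region; in method 2 the same computation is applied per pixel to spectra estimated from the 16-band image.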

Results: Using a spectrometer, the transmittance spectrum is measured in a number of small regions of a stained tissue slide, and the amount of H&E stain is estimated using the Beer-Lambert Law. Using method 2, the transmittance spectrum is calculated for every pixel in the image. The average transmittance spectrum over each region used in method 1 is then calculated, and the root mean square error between the transmittance spectra from methods 1 and 2 is computed. The amounts of H&E stain estimated from method 1 are compared with the average stain amounts estimated using method 2 for each region.

Conclusions: This study compares 2 methods to estimate transmittance spectra and stain amounts in a stained tissue slide. Experiments conducted show normalized root mean square error for transmittance spectra within 6%. Stain amounts estimated using both measurement methods are in close agreement.

Standardization of Stain Condition

Tokiya Abe1 ([email protected]); Nagaaki Ohyama2; Masahiro Yamaguchi1; Yuri Murakami2; Yukako Yagi.3 1Tokyo Institute of Technology, Imaging Science & Engineering Laboratory, Tokyo, Japan; 2Tokyo Institute of Technology, Frontier Collaborative Research Center, Tokyo, Japan; 3Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: Stain conditions are not always constant, because staining technique varies strongly among hospitals and individual histotechnologists. Attempts to reduce stain variation by standardizing staining procedures have been made, but they have not yielded good results because of the complexity of the staining procedure. To develop a decision support system, standardization of the stain conditions of tissue samples is required. We worked first to standardize the hematoxylin-eosin stain, using multispectral imaging techniques.

Technology: We have developed and studied the methods of quantification and standardization of the hematoxylin-eosin stain condition using digital image–processing techniques.

Design: In the proposed method, the transmittance spectra are estimated from multispectral images, and the amount of stain color pigment is calculated based on the Beer-Lambert Law. To standardize the stain condition of test slides that are not optimally stained, that is, understained or overstained, a reference tissue slide that is optimally stained is used. The ratio of dye pigments between the reference and a particular test slide is referred to as the weighting factor and is introduced to the Beer-Lambert equation to implement the standardization of stain conditions. The standardization results are transformed to red-green-blue color images to view.
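
The weighting-factor step can be sketched as follows. This is an illustrative Python fragment, not the authors' implementation: it assumes per-dye unit absorbance spectra are available, and the dye spectra and amounts are synthetic stand-ins.

```python
import numpy as np

def standardize(transmittance, weights, ref_spectra):
    """Digitally restain one pixel: decompose its absorbance into per-dye
    amounts (Beer-Lambert), multiply each by its weighting factor
    (reference amount / test amount), and recompose the transmittance."""
    absorbance = -np.log10(np.clip(transmittance, 1e-6, 1.0))
    c, *_ = np.linalg.lstsq(ref_spectra, absorbance, rcond=None)
    return 10 ** (-(ref_spectra @ (weights * c)))

# Synthetic demo: an understained pixel brought to the reference condition.
wl = np.linspace(400, 700, 16)
ref_spectra = np.column_stack([np.exp(-((wl - 560) / 60) ** 2),   # "H" dye
                               np.exp(-((wl - 520) / 40) ** 2)])  # "E" dye
under = 10 ** (-(ref_spectra @ np.array([0.4, 0.15])))    # half-strength stain
fixed = standardize(under, np.array([2.0, 2.0]), ref_spectra)
```

The resulting spectra are then rendered to red-green-blue for viewing, as described above.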

Results: The method has been applied to various images of tissue slides with differing staining conditions. The resulting images were evaluated with respect to an optimally stained slide. Pathologists confirmed our evaluation that the color impression of the resulting images hardly differs from that of the optimally stained tissue slide.

Conclusions: Liver tissue was used for this research, and the method was confirmed to work for liver tissue samples. Further study to adapt the current method to other types of tissue samples is being considered.

Coagulation Service Web Site and Handheld Application Provide Point-of-Care Access to Decision Support Information

Peter Anderson, DVM, PhD ([email protected]); Kristopher N. Jones, MD, MPH; Marisa B. Marques, MD; George A. Fritsma, MS, MT; Kristina T. C. Panizzi, MED. Department of Pathology, University of Alabama at Birmingham, Birmingham.

Context: Coordinating the diagnosis and treatment of bleeding or thrombotic disorders is a difficult task. Clinicians, pathologists, and laboratory staff must work closely together to keep each health care team member informed and educated. In addition to quality of care issues, costs must also be considered. Often, medical staff are unaware or unsure of which assays to order among the myriad of similar-sounding tests. Wrong assays are ordered or "shotgunning" occurs, which drives up costs. The Departments of Pathology and Medicine at the University of Alabama at Birmingham have established an integrated coagulation consultation service to help provide an efficient, cost-effective approach to the complicated field of diagnosis and therapy of coagulation disorders. One way in which this service hopes to improve efficiency and reduce costs is through education of clinicians regarding laboratory assays. Toward this end, the service has developed an educational Web site concerning coagulation disorders, assays for evaluating these disorders, and tips on how to decipher which tests to order and how to interpret them. In addition, parts of the contents of the site are available for download to Palm and PocketPC (Microsoft Corporation, Redmond, Wash) handheld computers to increase usage of the site's information at the point of care.

Technology and Design: The Coagulation Service Web site was developed using the Article Manager product (Vancouver, British Columbia), which allows word processor documents to be quickly published in HTML format while retaining their original formatting. This enables Coagulation Service staff to update the Web site's information quickly and easily as the need arises. Protocol manuals are already stored in word processor files, and as these are updated, they can be ported to the Web site. In the same fashion, Mobipocket (Paris, France) was chosen as the format for publishing the site's content to handheld computers. The company's Mobipocket Office Companion product allows word processor documents to be published to both Palm and PocketPC devices while retaining the entire original formatting. The reader is free for both platforms and is frequently already installed by users for other medical knowledge bases. The existing textual information from the University of Alabama Coagulation Service was quickly converted to HTML and Mobipocket formats from the original word processor files and published on the Web.

Results: During the first 6 months of the site's existence (January through June 2003)—and without extensive publication of the URL while the site was being finalized—there were 8261 unique visits to the site with 33 732 total page views. On average, there were 46 unique visits to the Web site each day, viewing a total average of 189 pages. The top 5 pages viewed out of more than 300 pages (each with more than 1000 views during this period) dealt with Fibrinogen/Fibrin Degradation Products Assay (with more than 5000 views alone), Lupus Anticoagulant Profile Assay, Anticoagulant Therapy Monitoring Guidelines, Anticardiolipin IgG Antibody Assay, and Thrombophilia Guidelines.

Conclusions: In the short time the Coagulation Service Web site has been operational, it is difficult to determine how laboratory test usage has changed and whether this has resulted in more precise and more cost-effective utilization of laboratory services. However, the large number of "hits" does demonstrate that this information is useful to clinicians, and this increase in knowledge should lead to better patient care and cost savings.

Linear Transformation of Spectral Transmittance in Application to Digital Staining of Unstained Tissue Sample Image

Pinky A. Bautista, MS1 ([email protected]); Tokiya Abe, MS1; Masahiro Yamaguchi, PhD1; Yukako Yagi2; Nagaaki Ohyama, PhD.3 1Tokyo Institute of Technology, Imaging Science and Engineering Laboratory, Yokohama, Kanagawa, Japan; 2Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa; 3Telecommunication Advancement Organization, Akasaka Natural Vision Research Center, Tokyo, Japan.

Context: The development of multispectral imaging systems has provided a novel framework for pathologic diagnosis. We examine the viability of this technology in the context of digital staining.

Design: The ultimate goal of this work is to digitally transform an unstained tissue sample image into its hematoxylin-eosin–stained counterpart. Toward this goal, we used the unstained and stained spectral transmittances of different tissue components, namely, blood, nucleus, and cytoplasm, estimated from a 16-band multispectral image acquired at the multispectral imaging system facility of the Telecommunication Advancement Organization, Akasaka Natural Vision Research Center, Japan. Initially, a linear process (singular value decomposition and pseudoinverse) was used to extract the underlying relationship between the unstained and stained transmittances. The procedure yielded a 16 × 16 transformation matrix, which was then used, together with information on the device characteristics at image capture, to transform the unstained tissue image to its stained counterpart viewed in the red-green-blue color space.
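
The pseudoinverse step can be illustrated compactly. The following Python sketch (not the authors' code) uses synthetic random stand-ins for the measured spectra, purely to show how the 16 × 16 matrix is derived by least squares.

```python
import numpy as np

# Columns of U hold unstained transmittance spectra of sampled tissue
# components; columns of S hold their stained counterparts. Both are
# synthetic stand-ins here, not real measurements.
rng = np.random.default_rng(0)
U = rng.uniform(0.2, 1.0, (16, 50))       # 16 bands x 50 unstained samples
M_true = rng.uniform(0.0, 0.2, (16, 16))  # hypothetical "staining" operator
S = M_true @ U                            # corresponding stained spectra

# Least-squares estimate of the 16 x 16 transformation: S ~ M @ U.
M = S @ np.linalg.pinv(U)
print(np.allclose(M @ U, S))              # True: the fit reproduces the data
```

Applied per pixel, M maps an estimated unstained spectrum to a predicted stained spectrum, which is then rendered in red-green-blue.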

Results: Multiband images of unstained and stained slices taken from continuous sections of kidney tissue were acquired, and samples of tissue components, such as nucleus, cytoplasm, and blood, were located and their spectral transmittances were estimated. Preliminary results demonstrated the possibility of digital staining, although more work has yet to be done to attain the ultimate result.

Conclusions: The information drawn from the spectral transmittances of a multiband image affords the possibility of realizing a system that can transform an unstained tissue image to its stained counterpart. It is believed that once this system is fully implemented, it can serve as a helpful aid to pathologists in performing diagnostic procedures in the least possible time.

The authors acknowledge Yoshifumi Kanno (NTT Data Corp) and Hiroyuki Fukuda (TAO) for their technical support.

Automated De-identification of Pathology Reports

Bruce Beckwith, MD1 ([email protected]); Frank Kuo, MD, PhD2; Ulysses Balis, MD3; Rajeshwarri Mahaadevan, BS.3 1Department of Pathology, Beth Israel Deaconess Medical Center and Harvard Medical School; 2Brigham & Women's Hospital and Harvard Medical School; 3Massachusetts General Hospital and Harvard Medical School, Boston, Mass.

Context: Pathology case reports are often searched to identify tissue suitable for inclusion in research protocols. We are participating in a National Cancer Institute cooperative project, the Shared Pathology Informatics Network (grant UO1 CA91429-02), which aims to make de-identified pathology reports from many different institutions available for searching. At present, there is no publicly available software to de-identify pathology reports, while keeping the remainder of the report text intact. Our goal was to produce an open-source, Health Insurance Portability and Accountability Act (HIPAA)–compliant, “scrubber” tailored for pathology reports.

Technology: We decided on a 2-pronged algorithm for finding and removing potential identifiers. The first was a series of regular expression clauses that looked for predictable patterns likely to represent identifying data, such as dates, accession numbers, and addresses, as well as patient, institution, and physician names. This was augmented by a second-pass comparison with a database of names and places to recognize potential identifiers not removed by the pattern-matching heuristics. The code was written in Java with a MySQL database. Publicly available lists of names and US locations were obtained from the Internet. The regular expressions and lists are easily modified, so that institution-specific names (eg, patients and employees) can be used. Our source code is available at the Shared Pathology Informatics Network Web site.

Design: Pathology reports from 3 institutions were examined for commonly encountered identifiers. Regular expressions were created to find and remove these identifiers. The scrubber was run iteratively on a training set until it exhibited good scrubbing performance. One thousand eight hundred new pathology reports (600 from each institution, encompassing 3 different time frames) were then processed, and each report was reviewed manually to look for identifiers that were missed (underscrubbing). The listing of removed text was also examined to find nonidentifying text that was removed (overscrubbing).

Results: Approximately 33% of the pathology cases contained identifiers in the body of the report. Ninety-six percent of identifiers present in the test set were removed. The identifiers that were missed were largely institution names and foreign addresses. Of the scrubbed cases, 1.3% contained HIPAA-specified identifiers (names, accession numbers, and dates) that were missed. Outside consultation case reports typically contained numerous identifiers and were the most challenging to de-identify comprehensively. There was variation in performance among the test sets, highlighting the need for site-specific customization. Overscrubbing was more prevalent than underscrubbing, and most instances of overscrubbing were due to the extensive list of personal and location names used.

Conclusions: Our first test of this software confirms the initial hypothesis that robust de-identification software can be created using open-source tools. The application is currently capable of removing the vast majority of identifying information from pathology reports while leaving the nonidentifying text intact. Although the software does not yet perform perfectly, we expect that fine-tuning of the regular expressions and expansion of the database will remove the remaining identifiers. The major sources of underscrubbing are misspellings, accession numbers with unusual formats, and unexpected or unusual proper names.

Predicting Tumor Marker Outcomes With Monte Carlo Simulations

Jules J. Berman, MD, PhD ([email protected]). National Cancer Institute, National Institutes of Health, Bethesda, Md.

Context: Genome and proteome research have promised a revolution in tumor diagnosis. The revolution has not arrived. In fact, only a handful of new markers have appeared in the past several years. A simple thought experiment demonstrates the problem.

In a retrospective study, Dr X demonstrated a "perfect" tumor marker that never failed to distinguish between 2 tumor variants (aggressive and indolent) with identical morphology. In this example, the aggressive variant grows 10 times as fast and metastasizes at 10 times the rate of the indolent variant. In a prospective trial of the same marker, 200 tumors are excised at the time of clinical detection (tumor size, 2 cm). Dr X finds that 100 of the tumors stain as "indolent variants" and 100 stain as "aggressive variants." The trial monitors all 200 patients, determining survival at 5 years. At the end of the trial, there is no survival difference between patients with indolent variants and patients with aggressive variants. The marker is considered a total failure, with millions of dollars wasted on the prospective trial.

Technology: How is this possible? In the prospective study, all tumors were excised at 2 cm. Survival after excision was determined entirely by the presence of metastases, as patients with aggressive or indolent tumors without metastasis “prior to excision” were cured by the procedure. Since the aggressive tumors have a growth rate 10 times that of the indolent tumors, they reached the 2-cm size in one tenth the time required for the indolent tumors. The rate of metastasis in the aggressive tumor is 10 times that of the indolent tumor, but since aggressive tumors had one tenth the growth history in which to metastasize, both the aggressive and indolent tumors had the same number of metastatic cases when the tumors were excised. Hence, there was no difference in the survival outcome between the tumor variants. Dr X may have benefited from a simulation model designed to predict outcomes from a set of biological conditions and restraints.

The purpose of this project is to provide general scripts for predicting tumor marker outcomes using calculation-intensive Monte Carlo algorithms that model tumor growth and metastasis.

Design: PERL scripts written by the author made use of a random number generator to create Monte Carlo simulations of tumor growth and metastasis. Scripts were written with 2 isomorphic simulations, probabilistic prediction (fast) and brute-force per cell random number generation (slow).
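
The original scripts are PERL; the fast probabilistic mode can be sketched in Python. This sketch assumes metastasis risk accrues per cell per unit time of growth history (the assumption that reproduces the thought experiment's equal-metastasis outcome); all parameter values are illustrative.

```python
import math
import random

def simulate_tumor(doubling_time, met_rate_per_cell_day, detect_size=2e9, seed=1):
    """Fast probabilistic mode: compute cumulative cell-days of growth
    history up to clinical detection, then sample a metastasis count.
    detect_size roughly approximates the cells in a 2-cm tumor."""
    r = math.log(2) / doubling_time           # exponential growth rate (per day)
    days_to_detect = math.log(detect_size) / r
    cell_days = (detect_size - 1) / r         # integral of cell count over time
    mean_mets = cell_days * met_rate_per_cell_day
    # Sample a Poisson count by CDF inversion (fine for small means).
    rng = random.Random(seed)
    u, k, p = rng.random(), 0, math.exp(-mean_mets)
    cdf = p
    while u > cdf:
        k += 1
        p *= mean_mets / k
        cdf += p
    return days_to_detect, k

# The thought experiment: the aggressive variant grows 10x as fast and
# metastasizes at 10x the per-cell rate, so it reaches detection size in
# one tenth the time, with the same expected number of metastases.
t_indolent, m_indolent = simulate_tumor(100, 1e-12)
t_aggressive, m_aggressive = simulate_tumor(10, 1e-11)
print(round(t_indolent / t_aggressive))       # 10
```

The slow brute-force mode described above would instead draw a random number per cell division; the two are isomorphic in expectation.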

Results: Simulations predicted differences in growth and metastatic occurrences from preset potential probabilities. Monte Carlo algorithms using per cell calculations required seconds to minutes for each tumor growth simulation on a 2.79-GHz desktop computer.

Conclusions: Computer simulations may be helpful when they model plausible outcomes unanticipated by human thought. PERL is a free, open-source, cross-platform language. All PERL scripts, along with explanatory text, are placed in the public domain and are available for download.

Zero-Check, a Zero-Knowledge Protocol for Reconciling Patient Identities Across Institutions

Jules J. Berman, PhD, MD ([email protected]). National Cancer Institute, National Institutes of Health, Bethesda, Md.

Context: Large clinical studies collect patient records from multiple institutions. Unless patient identities can be reconciled across institutions, individuals with records held in different institutions will be falsely “counted” as multiple persons when databases are merged.

Technology: The purpose of this study is to create a safe, zero-knowledge protocol that can reconcile individuals with records in multiple institutions without exchanging or comparing confidential or identifying patient information.

Design: The steps of the protocol are as follows:

1. Institution A and Institution B each create a random character string and send it to the other institution.

2. Each institution sums the received random character string with its own random character string, producing a random character string common to both institutions (RandA+B).

3. Each institution takes a patient identifier (a name, a Social Security number, a birth date, or some combination of identifiers) and sums it with RandA+B. The result is a patient random character string that is identical across institutions when the patient is identical in both institutions.

Optional Implementation: At this point, RandA+B can be destroyed at both institutions and RandA and RandB can be destroyed by institutions A and B, respectively, leaving only the patient random character string at each institution. The destruction of these random numbers makes it impossible to recompute the original identifier from the patient random character string.

Optional Implementation: At this point, institutions may provide the patient random character string to a data broker. Having only the patient random character strings, the broker has zero patient-related information.

4. Institutions A and B compare their patient random character strings.

Optional Implementation: Institution A sends the first character of the patient random character string to Institution B. If the first character is not identical in both institutions, the protocol ends. The 2 patients are not the same person. If the first character is identical in both institutions, Institution B sends the second character. The process is repeated until a sufficient number of transactions have occurred to convince the institutions that they have the same patient random character string. This strategy ensures that the patient random character strings are never actually exchanged between institutions.
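
The steps above can be sketched in Python. The abstract does not fix the "summing" operation; byte-wise XOR is one concrete choice used here for illustration, and the hashing of the identifier (to fix its length) is likewise an assumption of this sketch, not part of the protocol as stated.

```python
import hashlib
import secrets

def xor_bytes(a, b):
    """Byte-wise XOR: one way to realize the protocol's 'summing'."""
    return bytes(x ^ y for x, y in zip(a, b))

rand_a = secrets.token_bytes(32)      # step 1: Institution A's random string
rand_b = secrets.token_bytes(32)      # step 1: Institution B's random string
rand_ab = xor_bytes(rand_a, rand_b)   # step 2: common string Rand(A+B)

def patient_string(identifier, rand_ab):
    """Step 3: combine a patient identifier with Rand(A+B). Hashing first
    gives a fixed-length input (illustrative choice)."""
    digest = hashlib.sha256(identifier.encode()).digest()
    return xor_bytes(digest, rand_ab)

# Step 4: identical patients yield identical strings at both institutions,
# which can then be compared character by character without ever
# exchanging the underlying identifier.
s_a = patient_string("DOE,JOHN|1950-01-01", rand_ab)
s_b = patient_string("DOE,JOHN|1950-01-01", rand_ab)
print(s_a == s_b)                     # True
```

Once rand_a, rand_b, and rand_ab are destroyed, the original identifier cannot be recomputed from the patient random character string, matching the optional implementation described above.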

Results: The protocol can be implemented so that no information about the patient is transmitted across institutions, and it can be executed at high computational speed. A PERL script compared the speed of creating de-identified information using the zero-check protocol with that using the MD5 one-way hash; there was no significant difference in computational speed. The PERL script is available for download.

Conclusions: A zero-knowledge protocol for reconciling patients across institutions is described. This protocol is an example of a computational approach to data sharing, designed to help medical researchers comply with newly enacted US medical privacy regulations.

Extracting Genetic Conditions That Predispose to Cancer, Using the Online Mendelian Inheritance in Man

Jules J. Berman, MD, PhD ([email protected]). National Cancer Institute, National Institutes of Health, Bethesda, Md.

Context: A complete terminology of lesions/conditions related to cancer would contain (1) a comprehensive nomenclature of tumors, (2) a comprehensive nomenclature of precancers (morphologically identifiable lesions that precede the development of cancer), (3) a comprehensive nomenclature of acquired conditions that increase the risk of cancer (eg, acquired immunodeficiency syndrome and radiation exposure), and (4) a comprehensive nomenclature of genetic conditions that predispose to cancer (such as Li-Fraumeni syndrome and xeroderma pigmentosum). A complete cancer terminology is currently unavailable to researchers and pathologists. The purpose of such a nomenclature would be to facilitate the integration of biomedical data with lesions of interest to cancer researchers. Data integration enables researchers to discover the medical relevance of heterogeneous data elements. The author has published informatics techniques used to compile nomenclatures 1 and 2. This abstract describes a way of compiling nomenclature 4, using the Online Mendelian Inheritance in Man (OMIM).

Technology: The OMIM is a publicly available, comprehensive, and curated collection of the inherited conditions of man. It can be downloaded through anonymous FTP. The June 23, 2003, OMIM file was used; this file is 87 722 918 bytes in length and contains descriptions of 15 113 different inherited conditions of man.

Design: The PERL script collects OMIM conditions that predispose to neoplastic development. It extracts the following information from OMIM records: (1) the OMIM number of the condition, (2) the name of the condition and its synonymous or closely related terms, and (3) the names of tumors associated with the condition. The script requires an external file (look-up list) containing a comprehensive listing of neoplastic terms; instructions for obtaining such a file are available online. The extracted information is collected into an extensible markup language (XML) file, and a version of the raw XML output file can be downloaded.
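
The extraction step can be sketched in Python (the original is PERL). The record markers below follow the classic omim.txt flat-file layout, and the tiny term set is a stand-in for the comprehensive neoplasm look-up list the script requires.

```python
import re
from xml.sax.saxutils import escape

# Stand-in for the comprehensive neoplasm look-up list.
NEOPLASM_TERMS = {"carcinoma", "sarcoma", "lymphoma", "adenoma"}

def extract(omim_text):
    """Scan each OMIM record for look-up-list terms and emit an XML
    fragment with the OMIM number, condition name, and associated tumors."""
    out = ["<conditions>"]
    for rec in omim_text.split("*RECORD*"):
        no = re.search(r"\*FIELD\* NO\n(\d+)", rec)
        ti = re.search(r"\*FIELD\* TI\n(.+)", rec)
        if not (no and ti):
            continue
        tumors = sorted(t for t in NEOPLASM_TERMS if t in rec.lower())
        if tumors:
            out.append('  <condition omim="%s" name="%s">'
                       % (no.group(1), escape(ti.group(1).strip(), {'"': "&quot;"})))
            out.extend("    <tumor>%s</tumor>" % t for t in tumors)
            out.append("  </condition>")
    out.append("</conditions>")
    return "\n".join(out)

sample = """*RECORD*
*FIELD* NO
151623
*FIELD* TI
#151623 LI-FRAUMENI SYNDROME 1
*FIELD* TX
Affected family members develop sarcoma and breast carcinoma.
"""
print(extract(sample))
```

Emitting XML keeps the OMIM identifier and tumor names as discrete data elements, which is what enables the cross-database integration described in the conclusions.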

Results: The PERL script produces an output file in about 10 seconds using a 1.6-GHz computer. The output contains 518 conditions. Lynch cancer family syndrome, hereditary nonpolyposis colorectal cancer, cheilitis glandularis, Pasini-type epidermolysis bullosa dystrophica, hereditary desmoid disease, Aase-Smith syndrome, familial type thyroid carcinoma, Michelin tire baby syndrome, Oslam syndrome, and Maffucci syndrome are a small sampling of extracted conditions.

Conclusions: A PERL script that extracts from OMIM the inherited conditions predisposing man to cancer has been entered into the public domain and is available for download. The output file is XML, supporting the facile integration of data elements (such as the OMIM identifier and the names of tumors) with other biological databases. The output file can be easily updated with newer versions of OMIM.

Generation of Isolation Notifications for Patients With Positive Cultures of Epidemiological Significance

Gary Blank ([email protected]); Mary Blank; Carlene Muto; Sally Sappington. Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: Notifying a unit of the need to institute patient isolation has depended on manual review of reports, with either microbiology or infection control personnel initiating a call to the unit. This is time-consuming and can lead to potential exposures of staff, visitors, and other patients prior to isolation.

Technology: A real-time program has been developed to survey current and historical laboratory information system data for cultures of epidemiological significance. It is designed to accommodate a health system in which multiple institutions exchange patients, use common standards of care, and share a common laboratory information system. The decision support directives are dictated by infection control and clinical management requirements, with an effort to enhance the synergy needed for timely, accurate clinical care.

Design: To optimize inpatient management, decision criteria can accommodate inpatient and outpatient incidents, resulting status, and time of last culture positivity. Cultures judged to be repeats are ignored for isolation notification if an alert has already been issued during the visit. Infection control establishes criteria for synonymous specimen types and assigns the severity and protocols used in the notifications for isolation activities.
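
The repeat-suppression rule can be illustrated with a minimal Python sketch. Field names and values here are hypothetical, not the production system's schema.

```python
from datetime import date

# issued: (visit_id, organism) -> date the isolation alert went out.
issued = {}

def should_notify(visit_id, organism, collected):
    """Return True only for the first positive culture of an organism in a
    visit; later positives are treated as repeats and suppressed."""
    key = (visit_id, organism)
    if key in issued:
        return False
    issued[key] = collected
    return True

print(should_notify("V1", "MRSA", date(2003, 10, 8)))   # True: first positive
print(should_notify("V1", "MRSA", date(2003, 10, 9)))   # False: repeat, same visit
print(should_notify("V2", "MRSA", date(2003, 10, 9)))   # True: new visit
```

In the real system the key would also fold in the synonymous-specimen-type criteria that infection control defines.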

Results: Any combination of 4 notification mechanisms can be deployed, including fax, e-mail, paper reports, and text pager messaging. While paging is useful as an alert tool, the volume of information required for notification requires reliance on one of the report techniques to best provide the content and audit trail to support the application.

Conclusions: Experiences from a 6-month deployment in a university hospital setting are reviewed. The process has been nondisruptive, flexible, and reliable. It provides additional information not previously available from affiliate hospitals and has created a real-time notification system.

A Linguistic Approach to the Identification of Motifs and Pharmaceutical Classification of G-Protein–Coupled Receptors (GPCRs)

Betty Yee Man Cheng ([email protected]); Judith Klein-Seetharaman; Jaime G. Carbonell. Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pa.

Context: The superfamily of G-protein–coupled receptors (GPCRs) is the target of approximately 60% of drugs currently on the market. Disruption in their regulation can cause diseases such as cancer, cardiovascular disease, Alzheimer and Parkinson diseases, stroke, diabetes, and inflammatory and respiratory diseases. G-protein–coupled receptors are classified into subfamilies by their pharmaceutical properties. Because of their low sequence similarity, traditional alignment-based approaches to classification have had limited success on GPCRs at the subfamily levels. A recent study tested Basic Local Alignment Search Tool (BLAST), k-nearest neighbors, hidden Markov models (HMMs), and support vector machines (SVMs) with alignment-based features on subfamily classification of GPCRs and concluded that the highly complex SVMs performed best.

Technology: In this study, we applied a popular approach to document classification in language technologies research to GPCR subfamily classification. Here, we viewed each protein as a “document” where the “words” are all contiguous sequences of 1 to 4 amino acids. As in document classification, we used a feature selection algorithm to select the most important “words” for our task and applied a classifier on the counts of those selected “words.”

Design: For our task, we chose 2 very simple classifiers, the decision tree and the Naïve Bayes classifier, because they allow easy interpretation of the reasons behind their predictions. Moreover, to the best of our knowledge, these classifiers are simpler than any that have been attempted on protein classification. We used χ2 (chi-square) feature selection, which has been shown to be the most successful feature selection method in document classification.
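
The feature pipeline can be sketched briefly. This is an illustrative Python fragment, not the authors' implementation: it shows the extraction of 1- to 4-mer "words" and the chi-square score for a single word/class contingency table.

```python
from collections import Counter

def ngram_counts(seq, nmax=4):
    """Count all contiguous 1- to nmax-mers of a protein sequence:
    the "words" of the document-classification analogy."""
    counts = Counter()
    for n in range(1, nmax + 1):
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts

def chi2(n11, n10, n01, n00):
    """Chi-square score of one word/class 2 x 2 table: n11 = class
    sequences containing the word, n10 = other sequences containing it,
    n01 = class sequences lacking it, n00 = the rest."""
    n = n11 + n10 + n01 + n00
    num = n * (n11 * n00 - n10 * n01) ** 2
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return num / den if den else 0.0

print(ngram_counts("MKV")["MK"])   # 1
```

Words with the highest chi-square scores per subfamily are kept as features; the classifier then operates on their counts, as described above.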

Results: Using the same data set and training and testing protocol as the study on SVMs, we found that our Naïve Bayes classifier surpassed the reported accuracy of the SVM by 4.8% in level I subfamily classification and by 6.1% in level II subfamily classification. Our decision tree classifier, while inferior to the SVM, still outperforms the reported accuracy of the HMM in both level I and level II subfamily classification. More importantly, the "words" chosen by our feature selection method correlate with motifs previously identified in wet laboratory experiments.

Conclusions: Using a language technologies approach, we have developed a classifier that is much simpler but more accurate than the traditional protein classifiers in the pharmacology-based GPCR subfamily classification. In addition, our method identifies motifs that have been conserved at the subfamily level and may be potential target sites for future drugs.

Essential Elements for Semi-Automating Biological and Clinical Reasoning in Oncology

Roger S. Day, ScD ([email protected]); William E. Shirey; Michele Morris. Department of Biostatistics, University of Pittsburgh, Pittsburgh, Pa.

Context: Biomedical scientists reason with published and unpublished data to generate or assess hypotheses. This reasoning involves gathering information, forming a conceptual model, imagining how the model system would behave, and drawing conclusions. Each of these tasks grows more challenging with advances in scientific knowledge, and they are especially demanding on the “great generalists,” for example, cancer clinical trialists, who must attempt the synthesis of several or many biological and clinical processes and many scientific disciplines. A software assistant could support this synthesis. We identify essential architectural elements of such an assistant, examined in the context of cancer research, and demonstrate their implementation in the Oncology Thinking Cap (OncoTCap).

Technology: OncoTCap is a Java-based program that integrates knowledge capture support with model building and model validation for cancer. Knowledge representation is based on Stanford's Protégé system.

Design: Three interconnected work processes are supported by OncoTCap: (1) knowledge capture, (2) code mapping, and (3) application building. The lynchpin is the “Encoding” class, holding free-text instructions for running a simulation.

The knowledge-capture work process has the Encoding as its end product; the Encoding is the starting point of the 2 other work processes. The code-mapping work process involves searching a catalog of “Statement Templates,” each of which represents a simple or composite idea as a sentence with “blanks” or parameters, and then representing the Encoding by selecting a preexisting Statement Template and “filling in the blanks.” Each Statement Template has previously been tied to Java code. The application building work process involves grouping selected Encodings to create “Submodels,” “Submodel Groups,” and finally “Model Controllers.”
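The fill-in-the-blanks pairing of a sentence with executable code can be sketched as follows; the class, method names, and Java snippet are illustrative inventions, not OncoTCap's actual API.

```python
from string import Template

# Hypothetical sketch of a Statement Template: a parameterized sentence tied
# to a parameterized fragment of simulation code, so that "filling in the
# blanks" instantiates both at once.
class StatementTemplate:
    def __init__(self, sentence, java_code):
        self.sentence = Template(sentence)    # human-readable, with $blanks
        self.java_code = Template(java_code)  # simulation code, same $blanks

    def fill(self, **blanks):
        """Return (encoding text, code) with every blank substituted."""
        return (self.sentence.substitute(blanks),
                self.java_code.substitute(blanks))

doubling = StatementTemplate(
    "The $cell_type population doubles every $hours hours.",
    "population.setDoublingTime(CellType.$cell_type, $hours);")

text, code = doubling.fill(cell_type="tumor", hours=24)
print(text)  # → The tumor population doubles every 24 hours.
```

Because each template is tied to code ahead of time, assembling a runnable model reduces to substituting parameters and concatenating the resulting fragments.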

The resulting Model Controllers contain specifications for automatically generating simulation-based applications of various kinds, such as validation suites, treatment optimization routines, and patient simulators for professional training. A complete runnable model results from assembling all Java code from statement templates, after substitution. The model can simultaneously contain behaviors of individual cancer cells; microenvironment conditions, including other cell populations; patient physiologic functions, such as metabolism; adverse event targets; patients; and oncologists and their patient management plans; protocol implementation; and even Institutional Review Board review.

Results: The 3 work processes are demonstrated together with a resulting interactive cancer model.

Conclusions: The synthesis of cancer research information from many sources and use for comprehensive multipurpose cancer modeling is feasible. In the near term, this should be useful for clinical researchers and basic scientists in planning new studies and exercising their imaginations. In the distant future, the architecture should support the construction of patient-specific models for adaptive biologically based treatment decisions.

This work is supported in part by CA-63548.

TMAJ: Open Source Software to Manage a Tissue Microarray Database

Angelo M. De Marzo, MD, PhD2 ([email protected]); James D. Morgan1; Christi Iacobuzio-Donahue2; Brian Razzaque1; Dennis Faith.2 1Vision Multimedia Technologies; 2Department of Pathology, Johns Hopkins University, Baltimore, Md.

Context: TMAJ is a database and set of software tools to manage tissue microarray (TMA) information. TMAJ is presently implemented at The Johns Hopkins TMA Laboratory and is being made available as free, open-source software for academic use.

Technology: TMAJ is written in Java, which allows the client to run it on most commonly used platforms. Java Web Start, a free component from Sun Microsystems (Palo Alto, Calif) handles application distribution and ensures the client is running the latest version of the software. Apache Tomcat handles requests to the Sybase Adaptive Server Anywhere Database Engine. Images are served across the Internet using Apache HTTP Server.

Design: The Data Entry client application facilitates automated and manual entry of data related to patients, specimens, tissue blocks, and tissue subblocks (individual pathologic diagnoses). Array Builder allows users to design their own TMAs using data that were input either from Data Entry or from its own limited data entry tools. TMAJImage allows digital images of TMA spots to be viewed and diagnosed on-line. It also facilitates entry of immunohistochemical or in situ hybridization scoring data. Digital images from a TMA slide are imported into the database using TMAJImageImport. A microscope slide-scanning device, such as the Bacus Labs Inc (Lombard, Ill) Slide Scanner (BLISS), is used to generate images of tissue cores. The TMAJAdministrator controls access to the database. Security is enforced through permissions protected by passwords assigned to users. TMAJ allows users to either share or separate their data, based on study and sample permissions. These permissions can, in turn, be used to securely limit access to specific specimens, blocks, array blocks, and sessions.

TMAJ allows the storage of a wide variety of information related to TMA samples, including patient clinical data and specimen, donor block, core, and recipient block information. TMAJ supports multiple discrete organ systems and different TMA scoring strategies, and the data structure is compatible with the Tissue Microarray Data Exchange Specification.

Once completed, a user may decide to publish a session, which makes the session's images, diagnoses, and scoring data publicly available. The Health Insurance Portability and Accountability Act regulations are addressed in the software. Information about the specimen that is considered identifying health information (such as surgical pathology numbers) is available only to users with specific permission. For increased protection, personal patient information is sent to the database through a separate import program and is encrypted using the Blowfish encryption algorithm.

Results: TMAJ contains data from more than 13 500 specimens, 7000 blocks, and 235 TMAs containing more than 35 000 tissue cores. There are currently 42 users from 8 different institutions. TMAJ has been adapted to house information from several organ systems, including the prostate, ovaries, bladder, lymph nodes, colon, and pancreas.

Conclusions: Pathologists or other researchers who analyze data from TMAs may wish to implement this free software package.

Organizational Simulation of a Clinical Trial Error Identifies Error-Prone Processes

Douglas B. Fridsma, MD, PhD ([email protected]). Center for Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: The Institute of Medicine estimates that up to 98 000 patients die each year as a result of patient care errors. Although the sentinel error event can be obvious and dramatic, it is often a series of smaller, undetected errors that culminates in a serious error and a poor outcome. Researchers in error analysis suggest that most errors result from faulty systems, not faulty people.

Because modern medical care is a collaborative activity among health care providers who work in complex, interconnected organizations, if we wish to reduce patient care errors, we must study the complex processes by which patient care is delivered. Designing effective, efficient, and safe work processes and organizations requires understanding and analyzing the complex relationships among work processes, organizational structure, organizational participants, and the unexpected events of medical care that can lead to poor outcomes.

Technology: In other industries in which the costs of errors can be high, researchers have used organizational simulations to evaluate how well a particular organization might respond to unexpected events and to identify efficiencies and bottlenecks in current and proposed work processes. Simulation systems such as the Virtual Design Team have been shown to make accurate predictions about the effectiveness and efficiency of alternative engineering project organizations. For example, the Virtual Design Team was used in the aerospace industry to successfully identify error-prone processes prior to the development and manufacture of satellite launch vehicles. These simulation experiments identified the work activities and organizational participants that were most susceptible to errors and would have permitted managers to intervene before significant errors occurred.

Design: We developed Occam, a simulation tool based on Galbraith's information-processing theory and Simon's notion of bounded-rationality, and derived from the Virtual Design Team simulation tool. We then retrospectively modeled a chemotherapy administration error that occurred in a hospital setting using structured interviews and expert analysis.
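The information-processing view behind this kind of simulation can be illustrated with a toy discrete-time model; the roles, capacities, and exception rates below are invented stand-ins, not Occam's actual model of the chemotherapy scenario.

```python
import random

# Toy sketch (not Occam itself): each role receives routine work every tick,
# occasionally suffers an "unexpected event" that adds rework, and clears
# what its processing capacity allows. A single under-resourced role can
# accumulate a disproportionate backlog.
def simulate(capacity, arrival, exception_rate, ticks=1000, seed=0):
    rng = random.Random(seed)
    backlog = {role: 0 for role in capacity}
    for _ in range(ticks):
        for role in backlog:
            backlog[role] += arrival[role]            # routine work arrives
            if rng.random() < exception_rate[role]:   # unexpected event
                backlog[role] += 2                    # rework + coordination
            backlog[role] = max(0, backlog[role] - capacity[role])
    return backlog

roles = ["attending", "fellow", "pharmacist"]
final = simulate(
    capacity={r: 3 for r in roles},
    arrival={"attending": 2, "fellow": 3, "pharmacist": 2},
    exception_rate={"attending": 0.1, "fellow": 0.5, "pharmacist": 0.1})
# The overloaded "fellow" role accumulates far more backlog than the others.
```

Even this crude sketch reproduces the qualitative finding: when unexpected events are frequent, backlog concentrates on the participant whose workload already saturates capacity.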

Results: The traditional root-cause analysis suggested that the filling of a vacant position in the organization would have prevented the prescribing error. However, our organizational simulation suggested that when there was a high rate of unexpected events, the oncology fellow was differentially backlogged with work when compared with other organizational members. Alternative scenarios suggested that providing more knowledge resources to the oncology fellow improved her performance more effectively than adding additional staff to the organization.

Conclusions: Although it is not possible to know whether this organizational simulation might have prevented the error, it may be an effective tool to augment traditional analysis tools and prospectively evaluate organizational “weak links,” explore alternative scenarios to correct potential organizational problems before they lead to medical errors, and identify where resources can be used to improve both the efficiency and the safety of oncology care.

Microscreen: A Virtual Slide Program Used for Computer-Based Proficiency Testing

MariBeth Gagnon, MS, CT(ASCP)HTL ([email protected]); Edward Kujawski, MS, BME. Public Health Practice Program Office, Centers for Disease Control and Prevention, Atlanta, Ga.

Context: Laboratory procedures such as Papanicolaou tests are invaluable in detecting early cancer and reducing morbidity. Procedures such as this rely on the ability of laboratory scientists to detect and identify cellular abnormalities on glass-slide specimens. One way of measuring the skills and abilities of these scientists is to have them screen glass slides, render a diagnosis, and score their answers—proficiency testing. Because of the difficulties in obtaining glass slides, many scientists are not routinely tested. Consequently, an alternative method—a computer-based model—has been developed. This model uses MicroScreen, a virtual slide capture program developed by the Centers for Disease Control and Prevention (CDC). This computer-based model was developed to comply with the specific requirements used in glass slide proficiency testing. The need to meet these requirements and keep the navigation comparable to screening a glass slide using a microscope controlled the decisions of this model's inventors.

Technology: A collection of software programs was developed to integrate off-the-shelf hardware components from a variety of vendors to capture, archive, and display images from a microscope slide. The outcome is an automated process to capture 8000+ images, which are stitched together to create a virtual slide. The virtual slide may be viewed on any computer running Microsoft Windows (Redmond, Wash). Each virtual slide contains high and low magnifications, with the ability to change magnification or view multiple focal planes at any location on the slide, not just preselected areas. Best resolution is achieved when the computer has at least 512 MB of RAM and 32 MB of video RAM.

Design: A laptop computer provides portability and easy transport of the model for on-site testing. A laptop's 40-GB hard drive is capable of storing 40 to 50 images used to create multiple tests. The testing format, called CytoView, displays thumbnails of 10 virtual slides for selection and loading in a display window. The individual taking the test can choose the order of examining the 10 virtual slides, view the slide at ×10 and ×40 magnifications, focus up and down through multiple focal planes, and change diagnosis at any time prior to exiting the test. CytoView, like the glass slide test, has the ability to test a pathologist with slides that have been examined and premarked by a cytotechnologist.

Results: Two studies conducted by the CDC demonstrated user-friendliness of the system and technologic comparability between computer-based tests and glass slide tests.

Conclusions: Virtual slides may be used to conduct computer-based testing, and computer-based testing may be a suitable alternative to glass slide testing.

Comparative Analysis of Statistical Learning Techniques for Classification of Proteomic Profiles

Milos Hauskrecht, PhD1 ([email protected]); Richard C. Pelikan, BS1; James Lyons-Weiler, PhD.2 1Department of Computer Science, University of Pittsburgh, Pittsburgh, Pa; 2Center for Pathology Informatics, Benedum Oncology Informatics Center, University of Pittsburgh Cancer Institute, Pittsburgh, Pa.

Context: Proteomic profiling is widely regarded as a keystone for developing techniques that improve detection and diagnosis of many diseases. In particular, there is intense interest in searching for proteomic patterns that allow the successful discovery of cancer biomarkers. Interpretative analysis, such as accounting for mass inaccuracy, can be approached algorithmically or statistically. Our goal is to advance and contribute to the development of automated identification tools that aid in the discovery of such biomarkers, which in turn will improve the performance of existing early-detection and diagnosis techniques.

Technology: We implemented several algorithms using statistical machine learning and signal preprocessing toolboxes, available through MATLAB (copyright 1984–2002, The Mathworks, Inc, Natick, Mass). The data used in this study were published proteomic profiles from ovarian and prostate cancers generated through SELDI-TOF (Surface Enhanced Laser Desorption Ionization Time of Flight) mass spectrometry.

Design: We have developed tools for preprocessing and analysis of SELDI-TOF data, which examine the resulting spectra in their entirety. Our approach is divided into multiple modular phases, which interact and affect one another's performance. The phases include peak detection, peak alignment, feature selection through principal component analysis, and modules that locate differentially expressed regions using multiple selection criteria, including, but not limited to, the Fisher score, receiver operating characteristic area under the curve, t test, and mutual information criteria. Finally, a spectrum of classification algorithms, including linear discriminant analysis, Naïve Bayes, support vector machines, and others, is evaluated using cross-validation across standardized data sets.
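One of the selection criteria named above, the Fisher score, can be sketched on synthetic stand-in data; the single informative "peak" position and all intensities below are fabricated for illustration.

```python
import numpy as np

# Fisher-score ranking: score each m/z position by squared between-class
# mean difference over summed within-class variance, then rank features.
def fisher_score(X, y):
    """X: (samples, features) intensity matrix; y: 0/1 class labels."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0)
    return num / np.where(den == 0, np.finfo(float).eps, den)

rng = np.random.default_rng(0)
y = np.array([0] * 30 + [1] * 30)
X = rng.normal(size=(60, 200))     # 200 noise "m/z positions"
X[y == 1, 10] += 2.0               # shift feature 10 upward in the cases

ranking = np.argsort(fisher_score(X, y))[::-1]
print(ranking[0])  # → 10: the planted informative feature ranks first
```

In the actual pipeline such a ranking is one interchangeable module among several, and the selected regions are then handed to the classification phase.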

Results: We have compared our current work with previous studies in the field and have provided a comparison of our varied feature modules' performance on the classification of cancer mass spectrometry profiles. Our results demonstrate that certain combinations of data processing and feature extraction techniques prove to be more effective in the classification of cancer cases than others.

Conclusions: Although proteomic profiling promises many breakthroughs in the improvement of cancer detection, diagnosis, and discovery, robust feature selection remains the key issue. We believe that development and use of more advanced techniques or combinations of existing techniques for feature selection will spur the progress of current detection and diagnosis methods for many types of cancer.

A 5-Dimensional, Mean-Shift Approach for Improved Segmentation of Pathologic Images

Wei He, PhD1 ([email protected]); David J. Foran, PhD2; Peter Meer, DSc.3 1Department of Biomedical Engineering; 2Department of Pathology and Radiology, Center for Biomedical Imaging and Informatics, University of Medicine and Dentistry of New Jersey; 3Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ.

Context: The University of Medicine and Dentistry of New Jersey, Rutgers University, and the University of Pennsylvania School of Medicine are collaborating to develop PathMiner, a suite of Web-based technologies and computational tools for reliable decision support in pathology.

Technology: PathMiner uses state-of-the-art computer vision technologies to automatically locate and retrieve those digitized pathology specimens from within a set of “ground-truth” databases that exhibit spectral and spatial profiles that are consistent with a given query image. Based on the majority logic of the ranked retrievals, the system provides the statistically most probable diagnosis.
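The majority-logic step can be sketched as a vote over the diagnoses attached to the top-ranked retrievals; the labels and cutoff below are illustrative only.

```python
from collections import Counter

# Take the diagnoses attached to the top-k retrieved "ground-truth" images
# and report the most frequent one, with its vote fraction as a rough
# confidence estimate.
def majority_diagnosis(ranked_labels, k=7):
    top = ranked_labels[:k]
    label, votes = Counter(top).most_common(1)[0]
    return label, votes / len(top)

ranked = ["CLL", "CLL", "MCL", "CLL", "FCC", "CLL", "CLL"]
label, confidence = majority_diagnosis(ranked)
print(label, round(confidence, 2))  # → CLL 0.71
```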

Design: The system has been designed to perform unsupervised processing of incoming queries. The first crucial step of the analysis is the delineation of input images into their constituent visual components. All subsequent, higher order abstractions related to the image rely on the speed and accuracy with which this segmentation operation is carried out. While a multitude of segmentation strategies currently exists, most lack sufficient sensitivity for discriminating among the subtle differences in visual classes contained within a typical pathology specimen. To address this challenge, we present a 2-phase segmentation algorithm based on a 5-dimensional mean-shift strategy.
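A minimal sketch of mean shift in a joint 5-dimensional feature space (2 spatial + 3 color coordinates per pixel) follows, assuming a flat kernel, a single bandwidth, and toy data; the published algorithm's kernels, bandwidths, and 2-phase structure are more elaborate.

```python
import numpy as np

# Each pixel is a point (x, y, c1, c2, c3). Every point migrates to the mean
# of its neighbors until it settles on a density mode; points sharing a mode
# form one segment.
def mean_shift(points, bandwidth, iters=20):
    modes = points.astype(float)
    for _ in range(iters):
        for i, p in enumerate(modes):
            dist = np.linalg.norm(points - p, axis=1)
            modes[i] = points[dist < bandwidth].mean(axis=0)  # flat kernel
    return modes

# Toy stand-ins for two tissue regions: same spatial scale, distinct colors
rng = np.random.default_rng(1)
region_a = rng.normal([0, 0, 10, 10, 10], 0.5, size=(20, 5))
region_b = rng.normal([0, 0, 40, 40, 40], 0.5, size=(20, 5))
modes = mean_shift(np.vstack([region_a, region_b]), bandwidth=5.0)
labels = (modes[:, 2] > 25).astype(int)  # 2 converged modes → 2 segments
```

Because the mode-seeking happens jointly over position and color, nearby pixels with similar stain appearance converge together, which is what gives mean shift its robustness to gradual staining variation.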

Results: The algorithm was shown to be stable in spite of variations in staining characteristics across specimens. To test the stability of this approach under controllable conditions, experiments were conducted using 4 types (gaussian, Poisson, salt and pepper, speckle) of additive noise. The result of these studies revealed a ranked tolerance of the algorithm toward noise. While the algorithm is virtually unaffected by the salt-and-pepper noise, the introduction of speckle noise produced the most significant impact on performance in terms of slower processing speed and coarser segmentation results. With regard to both gaussian and Poisson noise studies, the algorithm was shown to withstand perturbations even when the energy of the noise exceeded the energy of the original image.

Conclusions: Results from preliminary feasibility experiments showed that the system was effective for a series of pathology specimens, even under controllable noisy conditions. A more comprehensive study, including man-machine experiments, is currently underway to better gauge the performance of the system and to evaluate its usage in a broader range of pathologic studies.

A Bioinformatics Tool for Verification of Gene-Gene Interactions

Venkatesh Jitender, BE1 ([email protected]); Gulsum Anderson, PhD2; Judith Klein Seetharaman, PhD2; Vanathi Gopalakrishnan, PhD.1 1Center for Biomedical Informatics, University of Pittsburgh Medical Center, Pittsburgh, Pa; 2Department of Pharmacology, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: The problem of learning gene regulatory networks from DNA microarray data sets has received much attention in systems biology. Such networks are typically represented as gene-gene interactions and are learned by applying advanced statistical and machine learning techniques to such gene expression data sets as the “Stanford yeast cell cycle.” The results from such analysis techniques depend on critical scoring measures and need to be validated from actual experimental data.

Technology: The goal of this project was to develop a computational tool to verify and validate the outcomes of computational analysis tools. Various biological databases available on the World Wide Web provide users with useful results from experimental procedures. By checking a combination of such databases for the presence of predicted interactions, a critical evaluation of such prediction tools is possible.

Design: For this research, we used the Gene Ontology (GO), Saccharomyces Genome Database (SGD), CURAGEN–PATHCALLING, TRANSFAC, and KEGG databases. Each of these information sources provides both complementary and overlapping information for validating gene-gene interactions. The SGD provides information on individual genes of interest, such as GO annotations and available literature. The PATHCALLING tool within CURAGEN reports gene interactions from validated yeast two-hybrid experiments. The pathway database from KEGG provides a graphical representation of biological pathways within various organisms. TRANSFAC is organized as a set of tables that maintain cofactor information for various genomes.

Our verification tool is designed as a Web-crawling spider that queries the above databases and checks for the presence of relevant information to validate each predicted interaction. A predicted interaction between a gene pair is validated if the 2 genes share similar GO annotations or if the interaction is found within CURAGEN or TRANSFAC.
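The validation rule itself can be sketched as follows; the annotations and curated interaction set are hypothetical stand-ins for the records the spider would fetch from the databases named above.

```python
# Accept a predicted interaction if the gene pair shares a GO annotation or
# appears in a curated interaction source. All entries below are invented
# for illustration, not actual SGD/GO, CURAGEN, or TRANSFAC records.
go_annotations = {
    "CLN1": {"GO:0000082", "GO:0004693"},
    "CLN2": {"GO:0000082"},
    "HSP104": {"GO:0042026"},
}
curated_interactions = {frozenset({"CLN1", "CLN2"})}

def validate(gene_a, gene_b):
    shared_go = bool(go_annotations.get(gene_a, set())
                     & go_annotations.get(gene_b, set()))
    curated = frozenset({gene_a, gene_b}) in curated_interactions
    return shared_go or curated

print(validate("CLN1", "CLN2"))    # → True (shared GO term and curated hit)
print(validate("CLN1", "HSP104"))  # → False (no supporting evidence found)
```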

Results: With our tool, we were able to verify and validate a larger number of interactions than with any of the single sources alone. Of the information stored in the above databases, CURAGEN, TRANSFAC, and most importantly KEGG provided the most relevant information.

Conclusions: We have demonstrated that our tool is a powerful method to validate results obtained from prediction algorithms for gene-gene interaction within yeast cell cycle gene expression data using information from 4 different databases. Statistical analysis of the results should yield biological information on what type of information can be gained from gene expression data beyond co-regulation.

Macro-Driven Report Entry: Next Step in Report Generation

Drazen M. Jukic, MD, PhD1 ([email protected]); James R. Davie, MD, PhD.2 1Department of Pathology, University of Pittsburgh School of Medicine, Pittsburgh, Pa; 2Marshfield Clinic, Marshfield, Wis.

Context: In recent years, the utilization of anatomic pathology laboratory information systems (APLIS) has grown exponentially. One of the more popular systems is CoPath (owned by Cerner, North Kansas City, Mo). The University of Pittsburgh Medical Center had utilized a host-based Massachusetts General Hospital Utility Multi-Programming System (MUMPS) application since 1991 and migrated to CoPath Plus, a Microsoft Windows–based client-server graphic user interface (GUI) application, in 1998.

The GUI has allowed for increased flexibility in handling steadily increasing demands. Data content has grown partly due to advances in pathology science, with new tests and data requirements, and partly due to regulatory and billing requirements. The increasing complexity of reports and of the report-generating process has been dealt with in 2 ways: (1) increased dependence on resident and/or ancillary staff and (2) increased involvement of the attending pathologist in report review and production.

Failure to invest in either of these ways results in increased turnaround time, constant complaints from our clinical colleagues, and a steady stream of reports with misspelled words and nonsensical statements.

Implementation of a GUI sometimes increases (rather than decreases) the total time a pathologist needs to generate a report. The increase in “customer” demands through the years is addressed in each new version of GUI-driven APLIS and is reflected in the growing complexity of those systems. Therefore, an upgrade to a GUI is often accompanied by an expansion of the APLIS feature set and an increase in the data content of each report.

Such complexity often hinders the efficient entry of so-called “routine” cases. These are characterized by a high volume and a high proportion of “canned” or repetitious diagnoses or stain-ordering requirements. Data entry steps take up the majority of the total time required to handle these cases. Therefore, a simple dictation becomes a time-limiting step, with additional need for corrections.

Design: By using MacroExpress and a feature of CoPath that is a combination of Microsoft Word and the Microsoft Windows interface—“quick text”—we have developed a simple and yet very valuable system that saves time for pathologists, eliminates transcription errors, and significantly enhances turnaround time of reports.

We used Dell (Round Rock, Tex) and Compaq (Palo Alto, Calif) workstations (either Windows NT or Windows 2000 based), equipped with wireless keyboards and mice (Logitech) and large-screen monitors (Dell and Monivision). Necessary software included CoPath (client-server application) with the quick-text and “quick sign-out” features, MacroExpress software from Insight Software Solutions, and Microsoft Word 97 or 2000 (bundled with CoPath).

We have compiled quick text that initially included the 80 most common diagnoses in dermatopathology and approximately 30 entities in genitourinary pathology. This has allowed for a seamless integration into the quick-text database in CoPath.
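At its core, the quick-text mechanism amounts to a lookup that expands a few keystrokes into a full canned diagnosis; the abbreviations and texts below are invented examples, not entries from the actual quick-text library.

```python
# A short keystroke sequence expands into a complete canned diagnosis, so
# routine cases need only 4 to 8 keys instead of a full dictation, and the
# expanded text needs no spell checking.
quick_text = {
    ";bcc": "Basal cell carcinoma, nodular type; see comment.",
    ";sk": "Seborrheic keratosis.",
}

def expand(keystrokes):
    """Return the canned text for a known abbreviation, else the input."""
    return quick_text.get(keystrokes, keystrokes)

print(expand(";sk"))  # → Seborrheic keratosis.
```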

Cooperation from the anatomic pathology informatics support team was essential, especially in the creation of complex order menus in histology, which allowed for ordering of immunohistochemical panels with several keystrokes.

Results: As a consequence of macro-driven reporting and histology ordering, the turnaround time in dermatopathology has been reduced from 4 days to 2 days. Both the dermatopathology and genitourinary pathology services produced structured reports with reproducible data that were praised by clinical services and the cancer registry alike. All pending cases were clearly identified for pathologists and placed at the end of the sign-out queue, providing them with an at-a-glance review of overdue work.

We counted the number of keystrokes needed to dictate a report in our environment. One needs to press at least 16 keys on the numeric (telephone) pad to initiate and complete a dictation, while for most cases, one needs to press only 4 to 8 keys on the computer keyboard to achieve the same result with macro-driven entries. Additionally, input by quick text eliminates the need for spell checking.

Conclusions: We have developed a unique timesaving system that allows for fast and structured reporting. Although applied to the CoPath Plus and CoPath PicsPlus APLIS in our environment, it could be tailored to any GUI-driven APLIS. In the future, we will develop tight integration with Dragon Medically Speaking (ScanSoft Incorporated, Peabody, Mass), which will allow for dictation of cases that necessitate longer descriptions and comments, as well as voice-driven macros. The University of Pittsburgh Medical Center has deployed the CoPath PicsPlus (Cerner) application alongside the CoPath Plus client-server application, and we are working on macros that will allow for either keystroke-driven or voice-driven image capture. Once this is in place, image-rich reports will become a reality.

Expert Opinions Just a Mouse Click Away: The Army Telepathology Program

Keith J. Kaplan, MD1 ([email protected]); Thomas R. Bigott, BS2; Rod Herring3; Daniel R. Butler, HT(ASCP)3; Bruce Williams, DVM.3 Departments of 1Pathology and 2Telemedicine, Walter Reed Army Medical Center, Washington, DC; 3Armed Forces Institute of Pathology, Department of Telemedicine, Washington, DC.

Context: Telepathology is the practice of digitizing histological or macroscopic images for transmission along telecommunication pathways for diagnosis, consultation, or continuing medical education. In dynamic (real-time) telepathology, the consultant examines slides remotely with a robotic microscope that allows the viewer to select different fields and magnification powers. The use of real-time remote telepathology is attractive because it provides an opportunity for pathologists to obtain immediate consultation and allows for complete control by the consulting pathologist. The Army Medical Department (AMEDD) is the ideal setting for remote telepathology, as it has several pathologists located throughout the world who are isolated from regional medical centers.

Technology: Telepathology units consisting of standard microscopes with robotic stages controlled with standard personal computers (PCs) using MedMicroscopy (Trestle Corporation, Newport Beach, Calif) were deployed and installed at 11 US Army hospitals throughout the world. The units were connected over the Internet to the standard PCs at the Armed Forces Institute of Pathology using MedMicroscopy Viewer software. At the microscope end, an automated microscope is attached to a standard PC running MedMicroscopy. Once connected to the Internet, users can log onto and control the microscope from anywhere using the MedMicroscopy Viewer. Images appear on screen in real time, and the Viewer allows full navigation of the slide, including control of objective, focus, and illumination. An initial validation study of this technology was performed with 120 consecutive frozen sections reviewed retrospectively from across the AMEDD with 100% diagnostic concordance between the glass slide and telepathology diagnosis.

Design: During a 1-year period, the number of consults sent to Armed Forces Institute of Pathology using real-time, remote telepathology was 170, compared with 43 consults sent using static image-based telepathology alone.

Results: Eighty percent of cases sent using dynamic telepathology have resulted in definitive diagnoses, negating the need for slides and/or paraffin blocks, compared with only 20% of cases historically using standard static image-based telepathology. This represents a nearly 4-fold increase in both the number of consults sent via telepathology and resulting definitive diagnoses.

Conclusions: The use of dynamic telepathology provides live, interactive expert consultation to remote sites without access to local specialists, while helping to prevent medical errors, reduce costs, and increase access to specialty care for military personnel and their families. The use of telepathology may even enable support to remote sites without direct pathology support. This is the first time a worldwide remote telepathology program has been accomplished in the AMEDD.

Diagnosis of Cutaneous T-Cell Lymphoma Patients With High or Low Tumor Burdens by Discriminant Analysis of Quantitative Polymerase Chain Reaction Data

Laszlo Kari, MD1 ([email protected]); Andrey Loboda, PhD1; Michael Nebozhyn, PhD1; Alain H. Rook, MD2; Eric C. Vonderheid, MD3; Maria Wysocka, PhD2; Michael K. Showe, PhD1; Louise C. Showe, PhD.1 1The Wistar Institute, Philadelphia, Pa; 2Department of Dermatology, The University of Pennsylvania School of Medicine, Philadelphia; 3Department of Dermatology, Johns Hopkins Medical Institutes, Baltimore, Md.

Context: The cutaneous T-cell lymphomas are non-Hodgkin lymphomas of epidermotropic lymphocytes, of which the most common forms are mycosis fungoides and its leukemic variant, Sezary syndrome. In a previous investigation, we analyzed gene expression profiles of peripheral blood mononuclear cells from patients with Sezary syndrome, using cDNA arrays. Our analysis of peripheral blood mononuclear cells from patients with high numbers of circulating Sezary cells identified a large number of genes that were differentially expressed between patient and control peripheral blood mononuclear cells. By using penalized discriminant analysis to analyze these data, we were able to identify small groups of genes whose expression patterns distinguished patients with high tumor burden (60%–99% of circulating lymphocytes) from normal controls with 100% accuracy. When array data from patients with lower tumor burdens were tested, a subset of 8 genes was identified whose patterns of expression could still distinguish, with 100% accuracy, patients with as few as 5% tumor cells from the normal controls.

Technology: We used real-time polymerase chain reaction to determine the expression value for the biomarkers. These expression values were analyzed by penalized discriminant analysis. The penalized discriminant analysis proceeds in 2 steps. The gene expression data for 2 known classes of samples, in this case patient and control samples, are first analyzed in a training session. In the second step, the “discriminator” developed in the training set is used to analyze new samples in an independent test set to validate the ability of the selected genes to correctly identify which are patients and which are controls.
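The two-step train-then-test procedure can be sketched as follows. This toy implementation uses a ridge-penalized linear discriminant as a stand-in for the penalized discriminant analysis actually used; the penalty value, data shapes, and function names are illustrative, not the authors' implementation.

```python
import numpy as np

def train_penalized_lda(X, y, lam=1.0):
    """Step 1 (training session): fit a two-class linear discriminant with a
    ridge penalty on the pooled within-class covariance, which stabilizes the
    fit when samples are few relative to the number of genes."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S = np.cov(X0, rowvar=False) * (len(X0) - 1) + np.cov(X1, rowvar=False) * (len(X1) - 1)
    S = S / (len(X) - 2) + lam * np.eye(X.shape[1])  # penalized pooled covariance
    w = np.linalg.solve(S, m1 - m0)                  # discriminant direction
    b = -0.5 * w @ (m0 + m1)                         # midpoint decision threshold
    return w, b

def classify(X, w, b):
    """Step 2 (independent test set): apply the trained discriminator to
    new samples and predict patient (1) vs control (0)."""
    return (X @ w + b > 0).astype(int)
```

In use, the discriminator fit on the training samples is frozen before the independent test set is scored, so test accuracy estimates generalization rather than fit.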

Design: The patient and control samples were divided into several groups with an appropriate balance of each class of samples in both the training and test sets. Training sets included patients with high numbers of circulating Sezary cells to provide samples with the strongest cutaneous T-cell lymphoma gene signatures for the training. Test sets included primarily patients with low Sezary cell counts in order to test the accuracy of the 5-gene discriminator. The classification method was first validated on samples composed of amplified RNA, and then on samples of unamplified total RNA. As a final validation, we tested our classification procedure on a blinded set of 23 patient and control samples, also using total RNA.

Results: We applied penalized discriminant analysis to gene expression values measured using quantitative real-time polymerase chain reaction, for just 5 diagnostic genes. The 5-gene panel can identify patient samples with 97% accuracy, including those with as few as 5% circulating tumor cells. Classification accuracy was 96% for the independent blinded set.

Conclusions: Our results suggest that any disease with a unique gene expression signature, identified by differential gene expression patterns, can also be accurately identified with supervised classification of quantitative polymerase chain reaction data. In our case, this signature could be reduced to a small number of genes, which may not always be possible. As new molecular signatures that define the various tumor characteristics are identified, this simple and inexpensive method can be equally effective for tumor staging, monitoring prognosis, and defining responsiveness to therapy.

An E-Mobile Solution for Wireless Oncology Clinical Trial Reporting

Stephanie Karsten, CMC ([email protected]); Virginia Riehl; Jordan Hirsch; Gavin Asdorian. Technology Consultants, Humanitas, Inc, Silver Spring, Md.

Context: Clinical trial data collection occurs in many locations. Much of this data collection is done on paper forms and later transferred to computer systems. The use of a mobile data-collection tool could increase the accuracy and timeliness of clinical trial data collection. We present eMobile, a prototype mobile clinical trial data-collection tool developed for the National Cancer Institute. The project is currently funded to develop a complete mobile data-collection tool.

Technology: The eMobile prototype was developed to run on both a laptop and tablet platform. Both devices weigh less than 4.4 pounds. A wireless Internet connection is used to download protocol logic and forms, and to upload completed forms. A Web-based data repository is used to manage data that are uploaded from the mobile devices.

Design: The prototype design was based on extensive interviews with staff involved in cancer clinical trial data collection. Detailed flow charts were developed and areas where automation could improve current processes were identified. A key focus of the design was to create a user interface that matched the current data collection process. The prototype also incorporated functions to support the management of clinical trial data collection, such as reminders, error flagging, and personal follow-up notes.

Results: Academic health center, community hospital, and National Cancer Institute clinical trial data collection experts reviewed the prototype. All reviewers said the prototype enabled them to improve the efficiency and quality of clinical trial data collection. They did not have a clear preference for either tablet or laptop devices. Based on the prototype, the National Cancer Institute has awarded funding for development of a complete application.

Conclusions: A mobile data collection device has the potential to improve efficiency, timeliness, and quality of data collection for cancer clinical trials. Its special features that support the management of clinical trial data collection may increase the likelihood of adoption by cancer researchers in a diverse array of clinical settings.

Novel Approaches for Sensitive Detection of Gene Expression Changes Using Oligonucleotide Microarrays

Stephen R. Master, MD, PhD1,2,3 ([email protected]); Alexander J. Stoddard, MS2,3; L. Charles Bailey, Jr, MD, PhD2,3,4; Katherine D. Dugan, MS2,3; Lewis A. Chodosh, MD, PhD.2,3 1Department of Pathology and Laboratory Medicine, Hospital of the University of Pennsylvania, Philadelphia; 2Department of Cancer Biology, University of Pennsylvania School of Medicine, Philadelphia; 3Abramson Family Cancer Research Institute, Philadelphia, Pa; 4Department of Pediatrics, Children's Hospital of Philadelphia, Philadelphia, Pa.

Context: The use of gene expression profiling for pathologic diagnosis within complex tissues relies on the sensitive, reliable detection of changes in transcript abundance. Microarray estimates of gene expression by hybridization to multiple oligonucleotides have been widely used in an investigational setting. However, current analytical methods (such as SAM or dChip) do not directly use the information contained in these separate hybridizations to improve the sensitivity of comparisons across multiple replicates. Additionally, the statistical significance of these methods has not been rigorously validated using independent, biologically identical samples.

Technology: We have developed 2 novel algorithms for reliably detecting gene expression changes. These algorithms exploit the information contained in individual oligonucleotide hybridizations on Affymetrix (Santa Clara, Calif) microarrays. One (Intersector) is implemented in Perl and uses data derived from the Affymetrix Microarray Analysis Suite. The second (ChipStat) is implemented in C and directly analyzes CEL file data.
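The abstract does not give either algorithm's internals. As a rough sketch of how the individual oligonucleotide hybridizations within a probe set can be pooled, one can count how many probes move in the same direction between two conditions and score that count against a binomial null. This is an illustrative sign test, not the published Intersector or ChipStat method.

```python
from math import comb

def probe_sign_test(probes_a, probes_b):
    """For one probe set, count probes whose intensity increases from
    condition A to condition B, and compute a one-sided binomial P value
    under the null that each probe is equally likely to move up or down."""
    n = len(probes_a)
    k = sum(1 for a, b in zip(probes_a, probes_b) if b > a)
    # P(X >= k) for X ~ Binomial(n, 0.5)
    p_value = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return k, p_value
```

Pooling at the probe level in this way uses replicate information that summary-value methods discard, which is the sensitivity argument the abstract makes.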

Design: In order to empirically determine the false-positive rate of these algorithms as a function of stringency parameters and to compare them with previously described approaches, microarray analysis was performed on a series of independent, biologically identical RNA samples from the murine mammary gland. Additional triplicate samples representing the transition of the neonatal gland into puberty were also examined to determine the relative sensitivity of these novel algorithms when compared with previous approaches. As this developmental transition involves both changes in intracellular genetic networks as well as shifts in the relative abundance of various cell types, this comparison was expected to reflect the types of changes characteristic of a variety of pathologic conditions. In order to independently assess the false-positive rates of our approaches, a subset of transcripts was prospectively selected in a random fashion for confirmation by Northern blot hybridization.

Results: Using independent, biologically identical samples to empirically calibrate false-positive rates, the algorithms we developed have been shown to outperform approaches that are currently in widespread use for the detection of gene expression changes. Further, at a given false-positive rate, a unique subset of genes was found by each algorithm, suggesting that each approach identifies different features. Analysis of the lists of genes shown to change significantly during early murine mammary gland development is consistent with changes in cell subtypes and functions previously described.

Conclusions: The algorithms we developed demonstrate greater sensitivity at a given false-positive rate than methods currently in widespread use. We have demonstrated the utility of our approaches by examining developmental changes in the murine mammary gland during the transition to puberty. These methods can provide a basis for identifying pathologic changes of gene expression in subsets of cells within a complex tissue. As such, we anticipate that they will contribute to the application of high-throughput microarray technologies to pathologic diagnosis in neoplasia.

Development of a Real-Time, Intraoperative Teleneuropathology System: The University of Pittsburgh Medical Center Experience

Rafael Medina-Flores1 ([email protected]); Yukako Yagi2; Michael J. Becich, MD, PhD2; Clayton A. Wiley, MD.1 1Division of Neuropathology, University of Pittsburgh, Pittsburgh, Pa; 2Division of Pathology Informatics, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: In the year 2002, we identified an increasing demand for intraoperative neuropathological consults at University of Pittsburgh Medical Center Shadyside Hospital. A decision was made to use the combined expertise of 5 consultants from the nearby Division of Neuropathology, based at University of Pittsburgh Medical Center Presbyterian Hospital. This work is intended as a narrative of our efforts during the last 19 months to develop a flexible, on-demand, real-time telepathology system for intraoperative neurosurgical consultation at the University of Pittsburgh Medical Center.

Technology: Our initial choice for a telepathology system was an integrated video teleconferencing system (VTEL Products Corporation, Austin, Tex) with real-time microscopy capabilities; this system was used sparingly for 6 months and then substituted with a Nikon DN100 Digital Network Camera (Nikon USA, Melville, NY). The DN100 system features 1.3-megapixel color resolution imaging, easy networking, and full capability to broadcast images and streaming video to a remote pathologist through a Web browser. All microscopic images were generated on-site through an Olympus BX2 series microscope.

Design: When a neurosurgical consult was called at Shadyside Hospital, the remote neuropathologist was paged and a simultaneous telephone conference was established while the consultant started the video teleconferencing system or pointed his or her Web browser to the Shadyside Web site. We agreed on a common protocol for handling intraoperative specimens with emphasis on intraoperative smear cytology; frozen sections were performed only when a diagnosis was not readily apparent. The consultant received a real-time stream of histologic images from the referring pathologist at University of Pittsburgh Medical Center Shadyside. After discussion of the findings, an intraoperative diagnosis was rendered. All neurosurgical specimens were then sent to University of Pittsburgh Medical Center Presbyterian, where further immunohistochemical and molecular workup were done, and a final diagnosis was provided by the same intraoperative consultant.

Results: From January 2002 to August 2003, we performed a total of 73 remote intraoperative consults, with an average of 3.7 cases per month (range, 0–13). The caseload was distributed as follows: consultant 1, 22 cases; consultant 2, 10 cases; consultant 3, 22 cases; consultant 4, 18 cases; and consultant 5, 1 case. The mix of neuropathologic diagnoses was similar to what is reported in other institutions, with 30% glial neoplasms, 21% meningiomas, 20% metastatic disease, 5% each of pituitary adenomas and cerebral infarcts, and 16% of other neoplasms. Our intraoperative diagnostic accuracy rate for distinguishing between benign and malignant conditions was 100%; further histologic classification was correct 83% of the time. Intraoperative diagnoses were deferred in 12% of cases. Discordance between intraoperative and final diagnoses was seen in 3 cases, owing to 1 or more of the following: inadequate sampling of the specimen or the microscopic preparation, poor quality of imaging, or extensive necrosis.

Conclusions: This preliminary study proves the feasibility of real-time, intraoperative teleneuropathology given adequate technological and consultation resources. Our future aim is to fully digitize intraoperative slides and transmit them via electronic mail to reduce broadcasting time and sampling bias.

Improved Software for Quantitative Analysis of Fluorescence Microscopy Images

Alan K. Meeker, PhD1 ([email protected]); Joe Zimmerman2; Jessica Beckman-Hicks1; Angelo M. DeMarzo.1 1Department of Pathology, Johns Hopkins Hospital, Baltimore, Md; 2Vision Multimedia Technologies, Baltimore, Md.

Context: Telomeres shorten in cancer and precancerous lesions; therefore, the measurement of telomere lengths in situ will facilitate studies on cancer pathogenesis. To overcome the limitations posed by traditional Southern-based methods, we recently developed a technique for quantitative telomere length analysis; however, the image-processing steps are cumbersome, requiring extensive manual intervention. Here, we describe new image analysis software for extracting telomere length data from fluorescence microscopy images. This software streamlines the process of quantitative telomere length analysis and allows for direct export of results to an established Microsoft Access (Microsoft Corporation, Redmond, Wash) database.

Technology: The protocol for combined fluorescence in situ hybridization (FISH) staining of telomeric DNA and immunostaining, as well as image acquisition, was performed as previously published. A Zeiss Axioskop epifluorescence microscope (Carl Zeiss Inc, Thornwood, NY) equipped with appropriate fluorescence filter sets (Omega Optical, Brattleboro, Vt) was used, and grayscale images were captured with a cooled charge coupled device (CCD) camera (Micro MAX digital camera, Princeton Instruments, Trenton, NJ). The custom-designed plug-in was written for the image analysis software package ImageJ using a desktop Pentium 4–based computer.

Design: Quantification of the digitized fluorescent signals is accomplished through the image analysis software package ImageJ and the custom-designed plug-in as follows: matched telomeric and nuclear DNA image files are normalized with a simple background subtraction, and the resulting telomere image is run through a sharpening filter, followed by enhancement using a rolling ball algorithm for contouring of telomeric spots, followed by manual thresholding to remove any remaining background noise. Binarized masks are then created and applied to the original fluorescence images. For a given cell, telomeric signals identified by the segment mask that are contained within the area inscribed by the nuclear DNA signal are then measured, and the data for each telomeric spot are tabulated. The total DAPI (DNA) fluorescence signal for each nucleus is likewise quantified. For each cell, the individual telomere intensities are summed, and this total is divided by the total DAPI fluorescence signal for that nucleus, thus correcting for differences in nuclear cutting planes and ploidy. Tabulated data are then stored in a MySQL database and viewed through Microsoft Access.
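The core normalization step can be sketched in a few lines of NumPy. This sketch omits the sharpening filter and rolling-ball enhancement the plug-in applies; the flat background level and threshold are illustrative placeholders, not the plug-in's actual parameters.

```python
import numpy as np

def telomere_dapi_ratio(telomere_img, dapi_img, bg=10.0, thresh=50.0):
    """Subtract a flat background from the telomere image, build a binary
    telomere mask by thresholding, restrict it to the nucleus (the
    DAPI-positive area), and report total telomere signal normalized by
    total DAPI signal for that nucleus."""
    tel = np.clip(telomere_img - bg, 0, None)   # simple background subtraction
    nucleus = dapi_img > 0                      # area inscribed by nuclear DNA signal
    spots = (tel > thresh) & nucleus            # binarized mask applied within nucleus
    return tel[spots].sum() / dapi_img[nucleus].sum()
```

Dividing by total DAPI signal is what corrects for cutting-plane and ploidy differences: a nucleus cut thinner contributes proportionally less of both signals.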

Results: The newly written ImageJ plug-in performed equivalently to the prior, more labor-intensive method, when assessed on standard curve validation image sets. We find that use of the rolling ball algorithm provides rapid and superior delineation of telomeric signals. Overall, the new plug-in largely eliminates the need for manual intervention and should be applicable to other situations requiring quantification of fluorescence signals, such as with FISH images.

Conclusions: A newly developed plug-in for ImageJ that accelerates the process of extracting quantitative data from fluorescence microscopy images is presented. The new program features superior discrimination of fluorescent signals and substantially reduces the need for user intervention during image processing. This software has proven useful in telomere length analysis and should be broadly applicable to other FISH studies where quantification is desired.

GNegEx: Implementation and Evaluation of a Negation Tagger for the Shared Pathology Informatics Network

Kevin J. Mitchell, MS ([email protected]); Rebecca S. Crowley, MD, MS. Benedum Oncology Informatics Center and Center for Pathology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: Use of negation is a ubiquitous problem in the extraction of information from free-text clinical reports. Automated coding of free text typically results in negated concepts being coded as present. This diminishes the precision and recall of information-retrieval systems based on autocoded terms.

Technology: We report on implementation and evaluation of the NegEx algorithm for detecting and tagging negated concepts in free-text pathology reports. Our GNegEx negation tagger is part of a larger, pipeline-based information extraction system that we are developing as part of the Shared Pathology Informatics Network (SPIN), which also includes modules for chunking reports into component sections and tagging Unified Medical Language System (UMLS) concepts. A total of 20 000 reports have now been automatically coded and are available for use by the Shared Pathology Informatics Network.

NegEx is a public domain negation algorithm developed by Bridewell and Chapman (∼chapman/NegEx.html) that matches common prenegation phrases (eg, “No evidence of carcinoma”) as well as common postnegation phrases (eg, “Perineural invasion is not found”) in text. A 6-word window is created around each UMLS term. If a prenegation phrase or postnegation phrase is detected within the window, the UMLS term is negated. Importantly, the algorithm excludes pseudonegations—phrases containing negations that must be ignored (eg, “impossible to rule out”). Our implementation of NegEx is created as a processing resource using the JAPE regular expression system within the GATE framework for language engineering. GATE is an open-source distribution available from the University of Sheffield.
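The 6-word-window check can be sketched as follows. The phrase lists here are a small illustrative subset of the NegEx lexicon, and the pseudonegation filter (eg, “impossible to rule out”) is omitted for brevity.

```python
PRE_NEG = ("no evidence of", "without", "absence of", "negative for")
POST_NEG = ("is not found", "not identified", "not seen")

def _contains(phrase, text):
    # Whole-word phrase match ("no" should not match inside "nothing").
    return f" {phrase} " in f" {text} "

def negated(term, sentence, window=6):
    """Mark a tagged term negated if a prenegation phrase occurs within the
    `window` words before it, or a postnegation phrase within the `window`
    words after it."""
    words = sentence.lower().split()
    t = term.lower().split()
    for i in range(len(words) - len(t) + 1):
        if words[i:i + len(t)] != t:
            continue
        before = " ".join(words[max(0, i - window):i])
        after = " ".join(words[i + len(t):i + len(t) + window])
        if any(_contains(p, before) for p in PRE_NEG):
            return True
        if any(_contains(p, after) for p in POST_NEG):
            return True
    return False
```

For example, `negated("carcinoma", "No evidence of carcinoma")` is true, while the same term in an affirmative sentence is left unnegated.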

Design: To test the accuracy of our negation tagger, we evaluated our system against a gold-standard, human-annotated corpus of 250 reports. The corpus was developed using a modified Delphi technique: 4 pathologists coded negated concepts in each report and then revisited discrepant codes until only unresolvable discrepancies remained. These remaining discrepancies were resolved by committee. The corpus included 250 documents with a total of 11 449 nonblank lines, 65 858 words, and 310 human-coded negations.
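Precision and recall against the gold standard can be computed directly from the two sets of negation codes. The function below treats system and gold annotations as sets of hashable identifiers (eg, report and concept pairs), a simplifying assumption about how the corpus codes are keyed.

```python
def precision_recall(system, gold):
    """Precision: fraction of system-tagged negations present in the gold
    standard. Recall: fraction of gold-standard negations the system found."""
    tp = len(system & gold)  # true positives: agreed-upon negations
    precision = tp / len(system) if system else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall
```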

Results: Performance of GNegEx was best within the key final diagnosis section, achieving 84% precision and 80% recall compared to the gold standard. Performance was worst in the microscopic section, achieving only 19% precision and 37% recall, under equivalent conditions. Poor performance in the microscopic section may be explained by the greater variability of negation phrases and the complex linguistic constructions found in this portion of text.

Conclusions: The GNegEx Negation Tagger provides a sharable, open-source method for accurate tagging of negated concepts in final diagnosis fields of pathology reports. The software for the entire pipeline, including the negation tagger, is available under the terms of the GNU General Public License, along with instructions for implementing it within the GATE framework.

Integrated Framework for Digital Pathology Image Analysis

Tony C. Pan, MS1 ([email protected]); Umit V. Catalyurek, PhD1; Raghu Machiraju, PhD2; Daniel J. Cowden, MD1; Stephen J. Qualmann, MD3; Joel H. Saltz, MD, PhD.1 1Department of Biomedical Informatics, Ohio State University, Columbus; 2Department of Computer and Information Sciences, Ohio State University, Columbus; 3Department of Pathology, Children's Hospital, Columbus, Ohio.

Context: Continual advances in scanner speed and resolution have made digitization of microscopy slides a feasible part of pathology workflow. Coupled with virtual microscope software, this capability allows an institution to acquire and archive digital slides for reviews and education. While commercial systems exist for data acquisition, storage, and browsing, they are insufficient for computer-aided decision support, diagnosis, and prognosis, where advanced image analysis and clinical data fusion are required. Challenges also exist in storage, retrieval, and computation, as high-resolution scanners can generate uncompressed images up to 25 GB in size.

In an earlier work, we developed a virtual microscope system that used a cluster of computers to serve large images on demand to client software across a network. Currently, we are extending the system to streamline image storage from the slide scanner, support query and retrieval for the virtual microscope, provide connectivity to image archival and hospital information systems for data fusion, and enable analyses of large images via advanced image-processing methods. The development of image analysis capabilities is motivated by the need for automated computation of features in neuroblastoma, osteosarcoma, and Wilms tumor pathology reviews.

Technology: Clusters are easily scalable in disk capacity and computing power, and allow parallel and distributed execution of tasks to improve overall performance. Central to the implementation of the storage and computer system is a distributed execution framework, called DataCutter, developed at the Department of Biomedical Informatics at Ohio State University, which allows application data-processing components to be executed on different nodes in distributed collections of clusters. This framework enables storage, query, resample, and retrieval of images on clusters, using clustering and declustering algorithms to improve file access performance.
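One concrete declustering policy (an illustration of the idea, not DataCutter's actual algorithms, which the abstract does not specify) assigns image tiles to storage nodes so that neighboring tiles land on different nodes, letting a rectangular region-of-interest query issue I/O to many nodes in parallel:

```python
def assign_node(row, col, n_nodes):
    """Latin-square style declustering: tile (row, col) of a large image is
    stored on node (row + col) mod n_nodes, so adjacent tiles live on
    different nodes."""
    return (row + col) % n_nodes

def nodes_for_region(r0, r1, c0, c1, n_nodes):
    """Set of nodes holding tiles of the rectangular region
    [r0, r1) x [c0, c1); the query can fan out to all of them at once."""
    return {assign_node(r, c, n_nodes)
            for r in range(r0, r1) for c in range(c0, c1)}
```

Even a small 2 × 2 region touches 3 of 4 nodes under this scheme, which is the access-performance benefit declustering aims for.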

For image processing and analysis, we use a middleware, referred to as Image Processing for the Grid (IP4G), also developed at the Department of Biomedical Informatics at Ohio State University, which integrates the Visualization Toolkit and Insight Segmentation and Registration Toolkit with the distributed execution support of DataCutter. This toolkit makes it possible to construct image-processing and analysis pipelines using the Insight Segmentation and Registration Toolkit, Visualization Toolkit, and custom functions and to execute them on 1 or more slides in a distributed environment.

Results: In this work, we demonstrate the image-processing and analysis capabilities of the cluster with example applications in 3 pediatric diseases: neuroblastoma, osteosarcoma, and Wilms tumor. For neuroblastoma, mitosis and karyorrhexis count, as well as differentiation level, are significant. Osteosarcoma analysis depends on percent necrosis in tumor after resection. In Wilms tumor, the distribution of anaplastic nuclei is important. All these characteristics are diagnostic and/or prognostic, but are traditionally estimated visually. We have developed preliminary algorithms to use the cluster for automated computation of these disease-specific features.

Conclusions: The inclusion of a scalable cluster system in a digital pathology system allows for fast storage, query, and retrieval of very large images. Such a system enables data fusion via interfaces with clinical information and image archive systems. It also supports general image processing and advanced image analysis tasks for specific diseases in a distributed and parallel computing environment.

The Pennsylvania Cancer Alliance Bioinformatics Consortium: A Common Web-Based Data Entry and Query Tool for a Statewide Virtual Biorepository

Ashokkumar A. Patel, MD1 ([email protected]); Monica E. de Baca, MD2; Rajnish Gupta, MS1; Yimin Nie, MD1; John Gilbertson, MD.1 1Center for Pathology Informatics and Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, Pa; 2Department of Pathology, Thomas Jefferson University, Philadelphia, Pa.

Context: The Pennsylvania Cancer Alliance Bioinformatics Consortium (PCABC) is developing a bioinformatics-driven system for the identification of biomarkers important in the diagnosis, treatment, or prognosis of cancer. In its early stages, the consortium has focused its efforts on the development of a bioinformatics infrastructure and tools needed to identify tissue available for collaborative biomarker research. To create an initial functional tool, the scope of the project has been limited initially to prostate cancer, breast cancer, and melanoma. Collaborative efforts with committees staffed by experts from each of the participating cancer centers have centered on: (1) creation of a common data element set; (2) design of data entry, data-mining, image storage, and analysis tools, and of the main biorepository; and (3) design of a data warehouse for biomarkers and analysis tools.

Technology: This is a Web-based data entry and query tool designed as part of the Organ-Specific Data Warehouse. The annotation warehouse system is supported in a 3-tiered architecture. The mid-tier has implemented Oracle's (Oracle Corporation, Redwood Shores, Calif) Application Server (version 9.0.2) on a Compaq DL360 server running Windows 2000 with Service Pack 2. The application uses the Oracle http server and mod_plsql extensions to generate dynamic pages from the database to the users. The database is Oracle Enterprise Edition implemented on a Sun Fire V880 server (Sun Microsystems, Santa Clara, Calif) running Solaris 2.8.

Design: (1) Common data element (CDE) identification and design: Common data element sets were defined by consensus of the participating consortium members. The element set provides a strong backbone of information that is transferable among tissue groups, as well as tissue-specific elements. (2) Data entry tool: The Oracle-based data entry tool is a flexible, easily mastered Web-based tool. It is portable and flexible, so that it may transcend specific organ sites. Data entry is done at each participating center. De-identification of data begins on-site; each center is responsible for link-back capabilities for the specimens entered. Data are automatically “site-tagged” by the tool, which allows for subsequent link back to the cancer center of origin. Data include cancer center and case identifiers, research consent elements, patient demographics including family history and exposures, therapy, progressions and outcomes, and follow-up history. The data elements are exhaustively documented in an “Entry Help” portion of the program. (3) Data query tool: Researchers can query de-identified data from the warehouse through a point-and-click query environment based on a data-mart model. The query tool is designed so that 2 important things occur: (a) the data elements selected for a given query environment are selected from the warehouse and essentially copied into a data mart, and (b) the selected data are then transformed from a relational structure to a dimensionally modeled structure in the data mart. This allows us to create very efficient and secure specialized query environments for our users. The entire system runs in Oracle; the query front ends are rendered in extensible markup language (XML) from Java servlets.
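The entry-side handling described above — strip direct identifiers, tag the record with its cancer center of origin, and keep a link-back key usable only by the originating site — might look like the following sketch. All field names and the hashing choice are illustrative, not the PCABC implementation.

```python
import hashlib

def site_tag(record, site_id):
    """Remove direct identifiers from a specimen record, "site-tag" it with
    the originating cancer center, and derive a one-way link-back key that
    only the originating site can map back to the local specimen."""
    linkback = hashlib.sha256(f"{site_id}:{record['mrn']}".encode()).hexdigest()[:16]
    deidentified = {k: v for k, v in record.items() if k not in ("mrn", "name")}
    deidentified["site"] = site_id
    deidentified["linkback_key"] = linkback
    return deidentified
```

The warehouse then stores only the de-identified, site-tagged record; re-identification requires the originating center's own key table.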

Results: To date, 1748 cases have been entered into the system by 6 of the participating cancer centers. By organ system, this breaks down as follows: prostate, n = 1543; breast, n = 97; and melanoma, n = 108. Internal quality control results display good precision. Feedback from data entry managers has been used to implement change for added ease of use. The query tool will now allow researchers within the state of Pennsylvania to tap into a very rich resource of biorepository material for biomarker research.

Conclusions: For its bioinformatics-driven system for the identification of biomarkers important in the diagnosis, treatment, or prognosis of cancer, the PCABC has developed common data element sets and a data entry and query tool. These, in conjunction with a “virtual tissue bank” biorepository, a data-mining tool, and imaging storage and analysis software, make up a bioinformatics platform that aims to facilitate the identification of cancer-specific biomarkers and collaborative research efforts among the participating centers. The common data element sets described form the core of a Web-based data entry tool, which has been successfully launched in 6 cancer centers across the state of Pennsylvania. Future developments will focus on automated data entry tools using existing tissue bank databases at the collaborating institutes of the PCABC.

This work is supported by a grant from the Pennsylvania Department of Health (ME 01-740, Pennsylvania Cancer Alliance Bioinformatics Consortium [PCABC] for Cancer Research, PA Commonwealth Department of Health Tobacco Settlement to the PCABC).

GEDA: A Gene Expression Data Analysis Suite

Satish Patel, MS ([email protected]); Soumyaroop Bhattacharya, MS; James Lyons-Weiler, PhD. Benedum Oncology Informatics Center, University of Pittsburgh, Pittsburgh, Pa.

Context: There is a growing demand for easy-to-use tools for the analysis and interpretation of global gene expression patterns from microarray experiments. Although commercial packages are available, most are designed specifically for data from a single array platform. Most array analysis software available from academia must be downloaded and updated locally, which often limits the operating systems on which it can run and often requires plug-ins or additional software downloads. In contrast, Web applications allow users to upload their data securely to a server bank, where the options for analysis are selected through the graphical interface in the users' browsers.

Technology: Our on-line Gene Expression Data Analysis (GEDA) Tool is a platform-independent Web-based application written in Java. The Web application runs on a cluster of Intel servers running the Microsoft (Redmond, Wash) Windows 2000 Advanced Server operating system. It was developed using open source, freely available tools, including the Apache Tomcat Web server.

Design: Our on-line GEDA tool is a high-end Web application that follows JavaServer Pages and servlet best practices with a “write once, run anywhere” paradigm. This is in contrast to embedded applets and scriptlets, which are less efficient and highly sensitive to changes in browser technology. Methods of analysis, from normalizations to tests to classification, can be added as modules to this already developed, sophisticated Web application.

Results: GEDA is designed to educate the user as well as to provide a diversity of options for analysis. GEDA is the only microarray analysis application with a companion Gene Expression Data Simulator that allows developers and users to test out their ideas on analysis ahead of study design. Recommendations for analysis listed at a companion Web site were derived from what we have learned using simulations and from the reanalysis of more than 20 published cancer microarray data sets.

Conclusions: GEDA is growing in popularity among researchers at the University of Pittsburgh and around the country. Its purpose is to provide researchers with up-to-the-minute tools for the secure and robust analysis of global gene expression patterns from microarray experiments accessible from anywhere with no extra downloads.

Web-Based Implementation of a Modular, General-Purpose Temporal Abstraction Framework for Pattern Identification in Clinical Laboratory Data

Andrew Post, MD1 ([email protected]); Jonhan Ho, MD2; Gary Blank, MD1; James Harrison, MD, PhD.1 1Center for Pathology Informatics, University of Pittsburgh Medical Center, Pittsburgh, Pa; 2Department of Pathology, Forum Health, Youngstown, Ohio.

Context: The clinical laboratory data repository represents a comprehensive source of patient information in which clinical contexts are represented by temporal sequences of results. Identification of particular contexts and patients is useful for a variety of purposes, including laboratory utilization review, quality assurance, medical process improvement, and outcomes research. Our previously reported LabScanner program used a simple temporal abstraction strategy to identify data patterns in a limited subset of the laboratory repository exported as a flat-text file. We have now extended LabScanner to create a general-purpose temporal abstraction framework, and we have implemented this framework in an environment that supports general access to recent laboratory data.

Technology: The laboratory data are contained within a production Misys (Misys, Inc, Woodstock, Vt) Flexilab laboratory information system (LIS). Data for pattern analysis are transferred to a MySQL database. The pattern analysis framework is implemented using the JBoss Java 2 Enterprise Edition (JBoss Inc, Atlanta, Ga) environment. A Web-based user interface is implemented in JavaServer Pages using the Apache Tomcat servlet container. The software runs on a dedicated Windows 2000 (Microsoft, Redmond, Wash) server.

Design: A MUMPS script written within the LIS supports extraction of all LIS transactions to a daily log file, which is supplied to the pattern detection system periodically via File Transfer Protocol (FTP). A script written in Jython (a hybrid of Python and Java) manages data import from the extraction file into the MySQL database. The database is designed to retain data for a rolling 14-day period. The pattern analysis engine queries the database via Java Database Connectivity (JDBC) (Sun Microsystems, Inc, Santa Clara, Calif) and identifies temporal patterns based on a defined rule set. Found patterns and graphs created on-the-fly are displayed to users via a Web browser interface.
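
As a sketch of the rule-driven pattern detection described above, the following Python fragment applies one hypothetical rule (a rising creatinine exceeding a threshold within 48 hours) to a list of timestamped results. The rule name, threshold, and data layout are illustrative assumptions, not the system's actual rule configuration:

```python
from datetime import datetime, timedelta

# Hypothetical rule: flag a "rising creatinine" pattern when two results
# within 48 hours differ by more than 0.5 mg/dL. Illustrative only; the
# real system reads its rule set from a configuration file.
def find_rising_pattern(results, test_name="CREAT",
                        window=timedelta(hours=48), delta=0.5):
    """results: list of (datetime, test_name, float_value) tuples."""
    hits = []
    # Keep only the test of interest, sorted by time.
    series = sorted((t, v) for t, name, v in results if name == test_name)
    for i, (t1, v1) in enumerate(series):
        for t2, v2 in series[i + 1:]:
            if t2 - t1 > window:
                break  # series is time-ordered; later results are out of window
            if v2 - v1 > delta:
                hits.append((t1, t2, v1, v2))
    return hits

results = [
    (datetime(2003, 10, 8, 6), "CREAT", 1.0),
    (datetime(2003, 10, 9, 6), "CREAT", 1.8),
    (datetime(2003, 10, 8, 6), "K", 4.1),
]
print(find_rising_pattern(results))  # one hit: 1.0 -> 1.8 within 24 hours
```

In the deployed system such rules would run against the rolling 14-day MySQL extract rather than an in-memory list.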

Results: A prototype system has been implemented using an initial data extract. The prototype supports pattern detection based on rules stored in a configuration file. Users log in through a standard Web browser, and found patterns are displayed by name, along with a graph of the data elements contributing to the pattern. The daily extract files from the LIS each contain approximately 50 000 transactions and are 5 MB in size.

Conclusions: We have implemented a prototype of a general-purpose temporal pattern detection system. The system has access to all laboratory transactions and is capable of accumulating specified data patterns for periodic display via the Web. We expect that this capability will allow clinical laboratories to more closely monitor service utilization, clinical process problems, and specific clinical contexts of interest to improve patient follow-up and laboratory planning.

Unsupervised Color Analysis of Histologically Stained Tissue Microarrays

Andrew Rabinovich, BS1 ([email protected]); Sameer Agarwal, MS2; Serge J. Belongie, PhD2; Jeffrey H. Price, MD, PhD.1 Departments of 1Bioengineering and 2Computer Science, University of California, San Diego.

Context: Histologically stained tissue sections and microarrays are primarily analyzed and/or scored manually for cancer diagnoses or expression. Spectral information is critically important whether tissue is analyzed manually by a pathologist or automatically by machine. Both research and clinical studies could be enhanced by automated densitometry of immunohistochemistry stains via spectral analysis. The current state-of-the-art spectral analysis requires human interaction with the digitized images to select characteristic colors of each stain. But background staining ensures that each pixel contains some color from each stain, and the subjective color choices make the process laborious and error prone. In this work, we present 2 completely automated approaches for spectral analysis of histologically stained tissue samples. We also quantitatively evaluate the performance of these techniques via a novel ground truth study.

Technology: Tissues used in this study were derived from human biopsies and stained automatically with diaminobenzidine (DAB) using the Envision-Plus horseradish peroxidase system. Spectral images (a stack of 10 color bands from 413 nm to 668 nm) were then collected on an adapted Q3DM Inc (San Diego, Calif) Eidaq 100 high-throughput microscope. Given a spectral stack of n images of a tissue sample stained with m dyes, where n > m, we wish to recover the proportion of staining contributed by each dye, pixel by pixel. Because the spectral profiles of the dyes are spread over the entire visible range, recovering the m components representing the dye concentrations at each pixel is challenging. In solving this problem, we assume that staining is an additive process but make no assumptions about the spectra of the dyes or their affinity for the various tissue components.
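
The additive-staining assumption can be illustrated with a toy linear unmixing, sketched here in Python with NumPy. Unlike the unsupervised methods evaluated in this study, this sketch assumes the dye spectra are known; the spectra, band count, and mixing proportions are synthetic:

```python
import numpy as np

# Illustrative linear unmixing under the additivity assumption: the
# absorbance at each pixel is a nonnegative mix of m dye spectra.
# NOTE: the study's methods are unsupervised and do NOT assume known
# spectra; here S (n bands x m dyes) is given for illustration.
def unmix(absorbance, S):
    """absorbance: (n_bands,) vector for one pixel; S: (n_bands, m) matrix.
    Returns m nonnegative dye concentrations (least squares, clipped)."""
    c, *_ = np.linalg.lstsq(S, absorbance, rcond=None)
    return np.clip(c, 0.0, None)

# Two synthetic dye spectra over 4 bands, and a pixel mixing them 2:1.
S = np.array([[1.0, 0.1],
              [0.8, 0.3],
              [0.3, 0.8],
              [0.1, 1.0]])
pixel = S @ np.array([2.0, 1.0])
print(unmix(pixel, S))  # approximately [2.0, 1.0]
```

The unsupervised methods must, in effect, estimate both S and the concentrations from the image stack alone, which is what makes the problem hard.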

Design: The accuracy of the unsupervised decomposition techniques was measured directly in a ground truth study. Diaminobenzidine was first applied alone, followed by multispectral image stack acquisition. The hematoxylin stain was then added to the same sample, and a second image stack was acquired. The second stack was the input to the 2 algorithms, and DAB densitometry from the hematoxylin and DAB combination was compared with densitometry derived from DAB alone for direct measurement of the error.

Results: Experimental results indicate that both of the unsupervised methods are capable of performing color decomposition of tissue samples stained with multiple histologic dyes. The range of error across both methods was 4.47% to 19.52% for different tissue samples.

Conclusions: Two automated techniques for spectral decomposition and densitometry of histologically stained tissue samples were tested. Possible instrumentation error sources include light source instability, variations in focus, misregistration of the spectral images, and errors in spectral decomposition. Specimen sources of error include chemical interaction between the dye components used for staining and limited stoichiometry. According to the literature, measurement error due to dye interaction can be as high as 15%. Thus, compared to the expected chemical interaction errors, both methods provide excellent fully automated spectral separation for densitometry.

Web-Based Management Improves User Access Security Practices for Laboratory Information Systems

Kavous Roumina, PhD ([email protected]); Thomas Shirk; Walter H. Henricks, MD. Department of Laboratory Information Services, Cleveland Clinic Foundation, Cleveland, Ohio.

Context: Pathology departments require mechanisms to manage access to laboratory information systems (LISs) and other systems that hold confidential patient data and other sensitive information. As the laboratories' activities and administrative control become geographically decentralized, balancing the management of requests for system access privileges (add, delete, modify) with the need for sound system security practices becomes a challenge. Data security practices are under greater scrutiny with the Health Insurance Portability and Accountability Act (HIPAA) and are an integral part of regulatory and accreditation processes. Organized and accurate documentation of system access privileges also facilitates system administration tasks. To meet these needs, we developed a security-conscious Web-based User Access Request Protocol for laboratory managerial staff requesting user access to multiple LISs.

Design: Departmental supervisors first enter identifying data about the employee for whom privileges are requested and about the requester into an intranet form. Access requests can be made for any of 5 systems in use at our institution. The requester enters LIS-specific requests, such as privilege levels and/or membership in predefined security groups. Submitted requests are automatically routed via e-mail to the managers for the requested systems. For authentication purposes, a unique workstation identification number is automatically attached before the request is electronically submitted to the system manager(s) and stored in a relational database. After a system manager has completed the requested change in his or her system (eg, add a new user), he or she updates the database via a different intranet page. Requesters receive automatic notification e-mails at 2 points in the process, namely, a confirmation that the request has been received, which also includes a request number for tracking, and a notification that the request has been completed. The system is based on Active Server Pages (ASP; Microsoft Corporation, Redmond, Wash) technology and a relational database (SQL 7.0; Microsoft).

Results: In the paper-based system, potential existed for user request forms (with user names, passwords, and other sensitive data) to be misplaced or compromised. With the new system, only the requester and the system managers have access to such information. Additionally, the system has improved security by eliminating potential delays in the termination of system access for departed employees (due to mishandling of paper forms). During the initial 7-month period, other benefits observed were (1) a 23.3% improvement in turnaround time for request completion, (2) a 2.7% reduction in errors, (3) elimination of lost requests, (4) improved reporting and auditing, (5) immediate notification of receipt and of completion of requests via separate e-mails, and (6) a less paper-intensive environment.

Conclusions: The Web-based User Access Request Protocol has resulted in improved security practices regarding assignment of user access privileges to pathology departmental systems that are used in a geographically dispersed user environment. The system has also improved the accuracy and efficiency with which user privileges and access are administered.

College of American Pathologists Cancer Templates in Practice: A User-Friendly Software System for Cancer Report Entry, Transmission, and Automated SNOMED Encoding

Mark Routbort, MD, PhD ([email protected]); John Madden, MD, PhD. Department of Pathology, Duke University, Durham, NC.

Context: Structured pathology report templates with embedded Systematized Nomenclature of Medicine (SNOMED) terminology codes are a powerful means for supporting uniformity in reporting, as well as subsequent data extraction and aggregation. The publication of a set of Cancer Protocols by the College of American Pathologists (CAP) with designation of required clinical elements for specific cancer resection specimens provides a rich conceptual framework and highlights the need for developing widely available software implementations in support of structured anatomic pathology reports. We discuss needs for such a template system and introduce our Web-based, database-driven model implementation.

Technology: Our template system consists of 3 interrelated components: (1) a Web server application front-end written using Microsoft ASP.NET (Microsoft Corporation, Redmond, Wash), (2) a relational database implementing a hierarchical model of structured templates (Microsoft SQL Server 2000), and (3) a graphical template editor application (Microsoft Visual Basic/Access).

Design: Templates are conceptualized in this system as hierarchical structures composed of simple elements, such as ordinal choice values, free-text fields, and numeric fields, as well as groups of related simple elements. Important design considerations in implementation of the data model and front-end logic include (1) support for interdependencies between template “branches,” by which user input guides the set of contextually relevant available choices; (2) mechanisms for reusing complex elements in multiple templates; (3) output as formatted text or extensible markup language (XML); (4) active user assistance by way of links and pop-ups; (5) support for “computed” outputs, which use parsed JavaScript syntax to automate such tasks as staging; and (6) element level SNOMED CT encoding of templates.
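
A minimal sketch of element-level encoding, assuming hypothetical element names and a placeholder code (the actual templates carry SNOMED CT codes assigned per the protocols):

```python
from xml.etree.ElementTree import Element, tostring

# Hypothetical template element definition; "SCTID-PLACEHOLDER" stands in
# for a real SNOMED CT concept identifier, which we do not invent here.
template = {
    "name": "tumor_size",
    "snomed": "SCTID-PLACEHOLDER",
    "type": "numeric",
    "units": "cm",
}

def to_xml(element_def, value):
    """Render one completed template element as XML with its code attached."""
    e = Element(element_def["name"], snomed=element_def["snomed"])
    e.text = f'{value} {element_def["units"]}'
    return e

print(tostring(to_xml(template, 2.5)).decode())
```

Because each element carries its code, a completed report can be serialized to XML and aggregated without any post hoc coding step, which is the source of the accuracy and granularity noted below.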

Results: SNOMED codes from 56 cancer checklists in Microsoft Word format have been imported into the database structure of this system, with between 22 and 141 codes in each checklist. Implementation of “precoded” templates has enabled a high degree of accuracy and granularity in the semantic representation of completed cancer reports.

Conclusions: Successful adoption of the CAP cancer protocols would be facilitated by the availability of public tools to generate compliant, coded reports. A rich, open model of templates that includes support for interactive documentation, simple observable-based data entry, and automatic validation is presented.

A Lightweight Extensible Markup Language (XML) Editor for Clinical Laboratory Procedure Manuals

Gilan M. Saadawi, MD, PhD ([email protected]); James H. Harrison, MD, PhD. Center for Pathology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: Formal clinical laboratory procedures are standardized documents that support the uniform performance of testing, specimen handling, and equipment maintenance by laboratory personnel. Procedure manuals are typically maintained as collections of word processor files (eg, Microsoft [MS] Word; Microsoft Corporation, Redmond, Wash) and are printed on paper for daily use and periodic review. Because large laboratories may support thousands of procedures at multiple sites, there are clear advantages to moving procedures from a paper-based to a centrally managed electronic system. To that end, we have previously developed an extensible markup language (XML) document type definition, which supports the creation of electronic procedure files in an NCCLS-compatible format. One barrier to moving procedures to this form is the requirement for manual translation of files from word processor documents to XML documents, a process that requires the use of an XML editor and expertise in XML. To allow laboratory personnel to manage the transition to XML procedures, we have created a simple graphical editor, which may be used by personnel without expertise in XML, specialized for translation of MS Word documents to XML-based laboratory procedures.

Technology: Because our current procedures are managed as MS Word documents, we chose Visual Basic and the Microsoft .NET Framework for development of our editor.

Design: Our design goals included the ability to import MS Word documents in their native format, the ability to select sections of the document and assign them as components of our procedure DTD tree using a simple graphical interface, and the ability to create a new valid XML text file reflecting these section assignments as XML elements. The editor should allow transformation of word processor files to XML without requiring any knowledge of XML or specific document markup.

Results: The Procedure Editor is designed so that laboratory procedures can be opened directly as MS Word documents. Document text, including text styles, is displayed in a scrolling text area. Document components corresponding to XML elements and attributes from our procedure DTD are displayed as a list of options, and only choices that are appropriate for the current document section are displayed. The user progressively assigns sections and subsections from the list until the document is complete. These assignments are saved in a valid XML file, compliant with our procedure DTD, which can be opened and re-edited later if desired. The document component list can be modified to support specific DTD features or changes using a small built-in graphical editor, which manages the XML tree.
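
The section-assignment output can be sketched as follows; the element names ("procedure", "title", "principle") are illustrative stand-ins for the actual procedure DTD:

```python
from xml.etree.ElementTree import Element, SubElement, tostring

# Sketch of the editor's output: user-selected text spans become elements
# of a procedure document, in document order.
def build_procedure(assignments):
    """assignments: list of (element_name, text) pairs in document order."""
    root = Element("procedure")
    for name, text in assignments:
        child = SubElement(root, name)
        child.text = text
    return root

doc = build_procedure([
    ("title", "Serum Potassium by ISE"),
    ("principle", "Ion-selective electrode measurement of K+."),
])
print(tostring(doc).decode())
```

The resulting file is well-formed XML and, with the correct element names, can be validated against the procedure DTD and re-opened for further editing.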

Conclusions: We have developed a simple Procedure Editor, which supports user-managed conversion of word processor files to well-formed XML documents that can be validated against our DTD. It includes a facility for simple modifications in the DTD that might be needed in the future. We expect this editor to be useful in allowing regular laboratory personnel without expertise in XML to create and manage XML-based laboratory procedures within standard XML publishing systems.

Optimal Management of Immunohistochemistry Results Requires a Different Data Model and Functionality Than for Anatomic Pathology and Clinical Pathology Results

Rodney Schmidt, MD, PhD ([email protected]); Kevin Fleming. Department of Pathology, University of Washington, Seattle, Wash.

Context: Immunohistochemistry (IHC) results are often written directly into surgical pathology reports as unconstrained text. This method of reporting is inefficient and saves data in a form that is poorly structured for retrieval. Moreover, when pathologists from multiple specialties concurrently enter results into the same report, there is a risk that some results will be released prematurely if one pathologist verifies the entire report before all studies are complete.

Technology: Visual Basic, Word, SQL Server, and ADO (Microsoft Corporation, Redmond, Wash); PowerPath (Impac Medical Systems, Mountain View, Calif).

Design: We considered general design goals, workflow, and data relationships. Goals included integration with our current workflow and laboratory information system (PowerPath), facilitated data entry, semiautomated reporting using Microsoft Word, a high degree of flexibility and configurability, usability for both clinical and research purposes, and the future ability to identify cases by combinations of IHC results. Workflow analysis revealed that IHC data entry and reporting events do not bear a consistent relationship to the steps involved in completing the rest of the pathology report. In addition, multiple specialty laboratories may issue IHC reports for a single accession. In contrast to traditional clinical pathology tests, which tend to have a 1:1 relationship between orders and results, the relationship between IHC orders and IHC results is complex: a single IHC test may yield many sets of results pertaining to different cell and stromal populations. In contrast to free-text anatomic pathology results, IHC results need to be stored as discrete data elements for optimal data retrieval.
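
The 1-to-many relationship between IHC orders and result sets might be modeled as in this Python sketch; the class and field names are illustrative, not the production schema:

```python
from dataclasses import dataclass, field
from typing import List

# One IHC order can yield several result sets, one per cell or
# stromal population, each with a constrained value plus free text.
@dataclass
class IHCResultSet:
    population: str          # eg, "tumor cells", "stroma"
    value: str               # constrained pick-list value
    comment: str = ""        # optional free-text comment

@dataclass
class IHCOrder:
    antibody: str
    results: List[IHCResultSet] = field(default_factory=list)

order = IHCOrder("CD20")
order.results.append(IHCResultSet("tumor cells", "positive"))
order.results.append(IHCResultSet("background lymphocytes", "positive"))
print(len(order.results))  # 2 result sets for one order
```

Storing results this way, rather than as free text, is what makes retrieval by combinations of IHC results possible.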

Results: Based on these considerations, we developed a Word add-in that facilitates data entry, saves IHC results to an SQL Server table, and inserts results into Word on demand. Result values are selected from a constrained pick list, but free-text comments are also permitted. When reactivity for more than 1 cell or stromal population needs to be recorded, multiple result sets can be entered. On command, the add-in automatically creates Word tables listing all IHC orders and results for each population. A subset of results can be reported at any time. The add-in recognizes studies performed using different methodologies and automatically groups results by methodology with appropriate boilerplate text. The laboratory-specific boilerplate text is implemented through Word documents that require no programming to modify. Research results may be entered into the database without being reported. All IHC results are linked to diagnostic text, so that cases can be identified by a combination of IHC results and vice versa.

Conclusions: The described system accomplishes the major design goals. It is used to report IHC, fluorescence in situ hybridization, immunofluorescence, and in situ hybridization results and can be extended further. Multiple pathologists may enter sets of specialty results with reduced risk of having a colleague prematurely release them. The add-in integrates with PowerPath and should be easy to modify to integrate with other systems using Word and permitting ADO connectivity to a modern database management system. Design elements critical to success include storage of IHC results as discrete data elements, decoupling IHC result entry and verification from creation of the free-text pathology report, and the concept that reactivity on multiple populations may need to be reported for each antibody order.

TMAtrix: A Web-Based Relational Database for Complete Tissue Microarray Data Administration

David B. Seligson, MD1 ([email protected]); Stefan Deusch, PhD2; Khy Huang, BS2; Sheila Tze, BS1; Robert Dennis, PhD.3 1University of California Los Angeles (UCLA) Tissue Array Core Facility, Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, UCLA, Los Angeles; Interactive Media Group, Departments of 2Medical and Molecular Pharmacology and 3Biological Chemistry, David Geffen School of Medicine, UCLA, Los Angeles.

Context: Tissue microarray (TMA)-based studies involve the collection and utilization of abundant tissue resources, the administration of voluminous and varied data sources, and the interaction of several layers of personnel. This presents a critical bioinformatics challenge, driven by the need to manage and manipulate ever-increasing data pools in an intelligent and streamlined fashion. Only a database system able to collect, contain, and present multiple data sets of TMA information will take us from limited “snapshot” investigations to a better understanding of the fluid and interactive nature of gene marker expression relative to disease. Further maturation, standardization, and growth of the platform rely on the development of foundational data management tools.

Design: The University of California Los Angeles (UCLA) “TMAtrix” tissue microarray relational data model, linking more than 700 data fields, was developed on a Linux development database server, with a stable version later transferred to a Sun 6500 Enterprise server (Sun Microsystems, Palo Alto, Calif) hosted and maintained by the UCLA Advanced Research Computing Cooperative. Using an open-source application framework, Expresso (Jcorporate, Freeport, Bahamas), user interfaces and server-side classes were developed in-house to manage, query, and maintain all TMA data. This data management system runs on an Apache Tomcat server (Apache Software Foundation), which consists of a Web server and the Catalina servlet engine. Data are managed in an Oracle (Redwood Shores, Calif) 8i Relational Database Management System. The system is interfaced with a ScanScope high-resolution, whole-slide image–scanning workstation (Aperio Technologies, Inc, Vista, Calif), from which images are captured and served on query from the database application.

Results: The Web-based system described handles all of the operating functionality required to manage tissue microarray studies and provides a foundation for interaction, growth, and data standardization. It has removed our previous need to oversee disparate data files, increasing efficiency, reducing error, and allowing multiple users access to the data.

Conclusions: The growth and utility of TMA research platforms present a challenge in the integration and organization of the resultant supporting data files. We present here an open-source, Web-based relational database solution in response to that need, recognizing both current and potential future functionalities.

A Flexible, Scalable Solution for Acquisition and Storage of Clinical Digital Images

John Sinard, MD, PhD ([email protected]); Mark Mattie, MD, PhD. Department of Pathology, Yale Medical School, New Haven, Conn.

Context: The many advantages of digital imaging over conventional photography have enticed many pathology departments to adopt this technology. Anatomic pathology laboratory information system (LIS) vendors are increasingly integrating this functionality into their products, which provides the added advantage of allowing image incorporation into pathology reports. However, these integrated imaging solutions often place significant restrictions on the acquisition hardware that can be used, the workflow of the acquisition process, and the availability of the acquired images for other uses, such as teaching and digital case presentations.

Technology: Our solution involves multiple custom software elements, including Macintosh (Apple Computer Inc, Cupertino, Calif) programs written in REALbasic (REAL Software, Austin, Tex), Windows NT (Microsoft Corporation, Redmond, Wash) software written in PowerBuilder (Sybase, Dublin, Calif), and PHP- and JavaScript-enhanced Web documents. Our laboratory information system is CoPath Plus (Cerner DHT, Inc, Waltham, Mass) with a Sybase Adaptive Server database (Sybase).

Design: The solution to the integrated imaging problem is to separate the image acquisition step from the image filing and storage step, and to create 2 parallel image repositories, one consisting of flat image files and the second integrated into the LIS. We accomplish this by using an intermediate image-holding area (an image “drop box”). Images are “addressed” to the proper clinical case by means of a file-naming convention, which contains the case accession number and an encoded image type. Optional image descriptions may be placed in the comment field of the image file header. Images can be addressed manually or, more routinely, by an image selection program that automates the naming process. Once in the image drop box, images are filed by an Image Filing Engine. This program both places the flat image file into a dynamically generated directory tree and encodes a second copy of the image into the LIS. The LIS copy can be used for incorporation into clinical reports. The flat image files are available for all other uses and are open for browsing via a password- and firewall-protected Web-based interface, which dynamically scans both the image repository directory structure and the LIS database to display images and specimen information.
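
The abstract does not publish the exact file-naming convention, so the sketch below assumes a hypothetical pattern (accession number, image-type code, sequence number) to illustrate how an addressed filename could drive the dynamically generated directory tree:

```python
import re

# Hypothetical naming convention for illustration only, eg
# "S03-12345_G_01.jpg": accession, image-type code (G = gross), sequence.
NAME = re.compile(
    r"^(?P<acc>[A-Z]+\d{2}-\d+)_(?P<type>[A-Z])_(?P<seq>\d+)\.(?P<ext>\w+)$"
)

def file_path(filename):
    """Map an addressed image file to a dynamically generated directory."""
    m = NAME.match(filename)
    if m is None:
        raise ValueError(f"not addressed to a case: {filename}")
    acc, img_type = m.group("acc"), m.group("type")
    year = acc.split("-")[0][1:]        # "S03" -> "03"
    return f"images/{year}/{acc}/{img_type}/{filename}"

print(file_path("S03-12345_G_01.jpg"))
```

A filing engine built on such a convention needs no per-device logic, which is what allows acquisition stations on different operating systems to share one drop box.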

Results: Using this approach, we have successfully enabled image acquisition from multiple devices running on 3 different operating systems. Current acquisition stations include gross images in autopsy, surgical pathology, and frozen section; histology images with 3 different camera systems; and electron microscopy and fluorescence micrographs. Future plans include additional photomicroscopy stations, as well as image capture in molecular diagnosis and document imaging. Images are available from the image repository and in the LIS within 1 hour of being placed in the image drop folder and are now routinely referenced during case sign-out.

Conclusions: Separating the image acquisition step from the image storage and archiving step provides much greater flexibility in the selection of acquisition devices, both at the camera and operating system level. Automatic dual-image storage also provides greater flexibility for image browsing and utilization, offering the benefits of both flat image files and LIS-integrated images.

Generic Development Environment for Rapid Deployment and Maintenance of Multiple Data Collection Applications

Harpreet Singh ([email protected]); Rajnish Gupta; Yimin Nie; John Gilbertson, MD. Center for Pathology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: The requirement for structured data collection from Web clients into back-end relational databases is a common problem in medical informatics. In fact, demand for such systems is growing so fast that it is no longer feasible to develop and maintain custom data entry and database applications for each project. Instead, it is more efficient to create a single set of data collection, maintenance, and database tools that can be easily configured, through changes in meta data, to support a wide range of pathology and oncology domains.

Technology: This Web-based data collection application is supported on the Oracle9i (Redwood Shores, Calif) database server, Oracle Application server, and mod_plsql extensions to generate dynamic pages from the database to build the application tools, including a Web-based meta-data dictionary updating tool and the application to map the dictionary elements to the actual schema tables. There is also a Web-based data entry tool and a Web-based query application that externalize the data and data elements for the users. Simply by applying different sets of data elements, the system can be used in a wide range of applications without changing the underlying code. Significantly, all data elements and data values in the system are linked to a unique numeric identifier. This identifier, not the actual data or data description, is used for all data references. This results in a much faster and more flexible system.

Design: The overall design is divided into 3 layers. The first is the data tables schema layer, where the actual data and data relations (as numeric unique identifiers) are maintained. The second is the meta-data dictionary and mapping layer, where all the data elements, their values, element descriptions, validation rules, layout details, etc, are stored. The third layer is a set of dynamic procedures for controlling the front end of the application. These dynamic procedures are flexible enough to accommodate any changes in the meta-data dictionary and to immediately reflect changes in the user interface. The Web-based interface is supported by Oracle Apache on the server.
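
The identifier-only storage idea behind the first two layers can be sketched in a few lines of Python; the dictionary contents and element names here are invented for illustration:

```python
# The meta-data dictionary maps numeric identifiers to element names,
# permitted values, and (in the real system) validation and layout rules.
# Contents are illustrative only.
dictionary = {
    101: {"name": "tumor_grade",
          "values": {1: "low", 2: "intermediate", 3: "high"}},
    102: {"name": "site",
          "values": {1: "prostate", 2: "breast", 3: "skin"}},
}

# A stored record references elements and values only by identifier,
# which keeps the data tables compact and dictionary-independent.
record = {101: 3, 102: 1}

def render(record, dictionary):
    """Resolve numeric identifiers to human-readable labels for display."""
    return {dictionary[e]["name"]: dictionary[e]["values"][v]
            for e, v in record.items()}

print(render(record, dictionary))
```

Because only the dictionary changes when a data manager renames an element or adds a value, existing records need no migration, which is why dictionary edits appear immediately in the front end.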

Results: These tools allow the data managers to update their own dictionary elements, element values, and the layout of the end-user interface; define element definitions and validation rules; and update help documentation by changing meta data through the Web-based dictionary application tool. Any additions, deletions, and changes in the dictionary are reflected immediately in the data entry and query front ends. This application is currently being used to support the collection and display of prostate, melanoma, and breast tissue banking data; studies of medical errors and patient safety; and cancer registry data. It is being used to aggregate data from approximately 10 sites nationally.

Conclusions: Implementing and maintaining static data collection and query applications for medical research is very expensive, both in initial software development and in coping with changing requirements. In contrast, our solution gives project data managers the ability to build their own data dictionary by defining data elements, data values, data layout, data validation rules, and data definitions, and to see the results immediately in the Web-based front end. This significantly improves turnaround times and frees developers for core development tasks.

How Do Low-Literacy Adults Access Internet Health Information?

Richard Steinman, MD, PhD1 ([email protected]); Mehret Birru1; Valerie Monaco, PhD, MHCI2; Lonelyss Charles, MA3; Hadiya Drew, MA.3 1University of Pittsburgh Cancer Institute, Pittsburgh, Pa; 2University of Pittsburgh School of Medicine, Pittsburgh, Pa; 3University of Pittsburgh Library and Information Sciences, Pittsburgh, Pa.

Context: The Internet is a powerful vehicle for augmenting consumer health knowledge. However, health resources on the Internet are generally written at a high school reading level. Little is known about how low-literacy adults, who are at great risk for poor health outcomes, search for and understand on-line health information. This study examined searching strategies used by low-literacy adults seeking health information on the Internet. Our study also characterized the ability of participants to locate accurate answers to health-specific questions.

Technology: Camtasia Studio (TechSmith, Okemos, Mich) screen-recording software was used. This program recorded subjects' keystrokes, the Web sites and URLs they accessed, and the amount of time they spent navigating the Web.

Design: We enrolled 9 adult literacy students (average age, 41.5 years; third to eighth grade reading levels) from Bidwell Training Center, a vocational school in Pittsburgh, Pa. All subjects participated in a computer skills workshop. An investigator met separately with each subject for independent observation. Subjects were (1) administered the REALM test (Rapid Estimate of Adult Literacy in Medicine) to measure health literacy; (2) taught how to “think aloud,” or continually express their thoughts while on the computer; and (3) asked to use the Internet to find answers to 1 health question of their choosing and 2 other simple health-related questions developed by the research team. Subjects used the Google (Mountain View, Calif) search engine. A fourth question required participants to navigate through materials on the American Cancer Society Web site. Camtasia Studio software recorded keystrokes and think-aloud recordings. Investigators did not answer navigational questions while subjects searched. Questionnaires and think-aloud methods were used to ascertain subjects' criteria in evaluating health Web sites.

Results: Subjects retained many Web skills from the computer skills workshop. Many subjects were initially unsure whether to place spaces between words used as search terms. Spelling mistakes were largely self-corrected. Because many subjects lacked extensive Internet experience, most reported discomfort thinking aloud while searching for information. Most participants used sponsored-site links retrieved by Google to answer questions. Information accessed was written at an 11th grade average level (Microsoft Word Flesch-Kincaid algorithm). No participant completely answered all 4 questions. Subjects spent an average of 8.5 minutes and used 1.5 links to answer questions. Performance was best for the question posed directly on the American Cancer Society site (eighth or ninth grade level content). There was discordance between subjects' relatively positive self-evaluations of their performance and the assessment of their performance by the observer. Overall, subjects reported that it was moderately easy to find readable information on the Internet. REALM test results for this population were inconclusive; most subjects scored significantly higher than their institutionally reported reading levels.

Conclusions: Subjects generally used broad search terms and suboptimal navigation techniques, which resulted in poor overall success in answering questions. Most subjects were able to answer the American Cancer Society question, but navigating to the information was generally laborious and indirect. The preferred use of sponsored sites by this group is intriguing. Further study should address what factors underlie this choice. Of concern, when subjects arrived at sponsored sites offering alternative treatments of cancer, most viewed these as nonbiased treatment options.

Flexible Framework for Microarray Applications

David Tuck, MD ([email protected]); James Cowan; Peter Gershkovich. Department of Pathology, Yale University, New Haven, Conn.

Context: Microarray and other high-throughput technologies are increasingly important in cancer and pathology for both research and clinical applications. Software systems for the management of data and analysis tools must have the capacity to grow and evolve with these relatively new technologies. They must be capable of integrating current and future tools, as well as novel and evolving external data sources, into a comprehensive, user-friendly system, and of incorporating evolving standards and techniques from the information sciences.

Design: We have developed an open-source application framework written in the Java programming language, based on stable, public license programming libraries and designed specifically to meet the needs of microarray research projects. By using a flexible database design coupled with an extensible, customizable, persistence class generation mechanism, the framework is adaptable to meet many system requirements that are common to microarray technologies, including the management of heterogeneous data from multiple sources. Developers are able to customize and implement the framework as a foundation for applications supporting microarray research, eliminating the duplication of effort and minimizing the development life cycle for these projects.

Conclusions: A distinctive, cross-development, multiple-application project was undertaken to meet the needs of 2 separate but related research groups at Yale. One application, developed for the Core Tissue Microarray Facility, was required to handle the merging of clinical data from disparate sources and to link the clinical data to the tissue microarray experiment data. The other, for the Molecular Oncology and Development group, was required to manage gene and protein expression data and to provide continuously updated links to external data sources, such as LocusLink and Gene Ontology. The framework was used successfully to develop these applications from quite different domains.

Multispectral Imaging Applications in Pathology

Yukako Yagi1 ([email protected]); Masahiro Yamaguchi2,3; Nagaaki Ohyama.2,3 1Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa; 2Tokyo Institute of Technology, Imaging Science and Engineering Laboratory, Tokyo, Japan; 3Telecommunications Advancement Organization, Akasaka Natural Vision Research Center, Tokyo, Japan.

Context: During the past 4 years, we have demonstrated the application of a multispectral imaging system in a variety of pathology domains, including the identification of fibrosis in hematoxylin-eosin–stained histology sections and initial research in digital image standardization and decision support systems. Based on recent results, this year we have developed modules that demonstrate that digital image standardization and decision support systems based on multispectral imaging are realistic. Five abstracts were submitted to APIII this year.

Design: Digital imaging standardization is basic to all image decision support systems. Parallel research is ongoing for pathology, and all research will be related and combined. Imaging formats, including compression and databases, are being developed by other groups. We are using a 16-band multispectral imaging system developed by the Telecommunications Advancement Organization, Akasaka Natural Vision Center, Tokyo, Japan. However, we are also looking at the limitations of conventional 3-band systems and how many bands are needed for multispectral imaging to be effective in pathology. The major components of the standardization research include (1) hematoxylin-eosin stain standardization by a 16-band imaging system, (2) stain standardization by a 6-band imaging system applied in a conventional system, (3) optimization of multiple bands, and (4) design of calibration slides. The major components of the decision support system are (5) to identify each tissue component without special staining (digital stain), (6) to identify each tissue component without any stain (digital stain from unstained slide), and (7) morphologic analysis across multiple magnifications, from low power to high power.

Results: Detailed results will be presented in 4 specific abstracts (see “Context” section above).

Conclusions: Multispectral imaging generates a large amount of information. Current results indicate that pathology imaging standardization and decision support systems could help pathologists in many ways. In the future, the technologies developed in this line of research will be merged with whole-slide imaging to create a better system.

Individual Estimation of Hematoxylin-Eosin Image Distribution Using a Double-Band Microscope

Satoshi Arai ([email protected]); Susumu Kikuchi. Advanced Core Technology Department, Olympus Optical Co, Ltd, Tokyo, Japan.

Context: In the field of pathology, digital imaging technology has been widely expected to support medical staff by providing efficient procedures, such as telediagnosis, digital image archiving, and diagnostic assistance via automation. In order to use digital images in pathology, we need to standardize their color information, because a conventional set of color signals composed of the 3 primary colors (red, green, and blue) easily varies in color balance through the processes of staining and image capture.

Technology: To address this problem, the Natural Vision Project, sponsored by the Japanese ministries, has proposed multispectral imaging methods that individually estimate the distribution of each pigment, particularly for pathologic tissue samples stained with hematoxylin-eosin (H&E), using a multiband microscope and digital processing software. With this method, we can represent color images quantitatively and robustly against variation in imaging conditions. Because the method needs more than 6 band images, however, the multiband images would normally be acquired sequentially by changing an optical filter using a rotational filter turret mounted on the microscope system. This method therefore cannot be widely applied in practical systems, owing to the inconvenience of such an imaging process. To lessen this difficulty, we propose an advanced method using the fewest possible filters.

Design: Assuming the spectra of the hematoxylin-eosin pigments are spatially invariant within tissue samples, we need only 2 filters for the estimation. We designed these filters to optimize both the estimation accuracy and the signal-to-noise ratio of an image-capturing device. Using a double-band image taken through these filters, we first estimate the spectrum and then the optical density of each pigment for each pixel of the image. For this 2-step estimation, we use statistical spectrum data of sample images and hematoxylin-eosin pigments as a priori information.
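Under a Beer-Lambert assumption, the 2-step estimation reduces, per pixel, to converting 2-band transmittance into optical density and solving a 2 × 2 linear system for the pigment amounts. The sketch below illustrates only this arithmetic; the band absorbances are invented for illustration and stand in for the statistically derived a priori spectra the authors actually use.

```python
import math

# Hypothetical per-band absorbances [a_H, a_E] of hematoxylin and eosin;
# real values would come from measured pigment spectra (a priori data).
A = [[0.60, 0.15],   # band 1
     [0.20, 0.70]]   # band 2

def unmix(t1, t2):
    """Estimate per-pixel H and E amounts from 2-band transmittance.

    Optical density OD_b = -log10(T_b) is modeled as a linear mix
    OD_b = c_H * a_H(b) + c_E * a_E(b), giving a 2 x 2 solve.
    """
    od1, od2 = -math.log10(t1), -math.log10(t2)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    c_h = (od1 * A[1][1] - od2 * A[0][1]) / det
    c_e = (A[0][0] * od2 - A[1][0] * od1) / det
    return c_h, c_e
```

In this noise-free sketch, amounts simulated forward through the same model round-trip exactly; in practice, measurement noise and spectral variability produce the estimation errors reported in the Results.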

Results: Experimentally, we compared the quantitative distribution of each pigment estimated by our proposed method with the distribution measured using a spectrometer. The estimation error was approximately 10% and did not exceed 20%.

Conclusions: We believe our method can be practically applied to pathology imaging systems, because it standardizes the color information of tissue samples stained by hematoxylin-eosin using a double-band microscope, which requires fewer bands than even a conventional imaging device based on a red-green-blue color system.

We have been studying this theme as a part of Natural Vision Project sponsored by the Japan Ministry of Education, Culture, Sports, Science, and Technology. We especially thank Dr Mukai and Dr Izumi of Tokyo Medical University for providing pathologic tissue samples and medical advice.

Managing User-Identified Gaps in Controlled Vocabularies

David B. Aronow, MD, MPH ([email protected]). Clinical Informatics, Ardais Corporation, Lexington, Mass.

Context: One metric of good controlled-vocabulary practice is completeness of domain concept coverage. Ardais has devised a clinical ontology based on pre-coordinated controlled vocabularies to ensure that collected data are well structured and tailored to genomic research. Abstractors review unstructured medical documents and enter information into structured data fields using field-specific pick lists. When a required term is not evident in the pick list, a gap in domain coverage might exist.

Technology: Ardais information systems are a 3-tier Web application deployed on IBM's WebSphere (White Plains, NY), using Java Server Pages for the presentation layer, Java 2 Enterprise Edition beans for the business-logic layer, and Oracle 9i (Redwood Shores, Calif) for the database, all hosted on Sun Solaris (Santa Clara, Calif) hardware.

Design: Ongoing discovery and remedy of gaps in Ardais controlled vocabularies is managed through the OCE (Other Code Edits) process. The foundation of OCE is inclusion of a specific “Other” concept in each controlled vocabulary domain. Users select Other when the value desired is not in the pick list for the data field of interest. When Other is selected, a text-entry window activates, into which the user types the concept name they would have picked. The code for Other and the desired text are posted to the appropriate data table.

The code for Other, the desired text, and appropriate identifiers are also posted to a special Others table. This table is accessed by an OCE editing application, which displays the desired text, source documents (eg, the ASCII pathology report), and that data element's controlled vocabulary. The editor reviews the text and, if the correct concept is already available, the OCE application updates the Other code in the data table with the code for the concept, and the Other status flag is set to “Fixed.”

If the desired concept is not already available, the editor sets the status flag for the Other text to denote that a new concept definition is required, that the Other needs off-line review, or that the text entered is inappropriate. New concepts are queued for inclusion during periodic controlled vocabulary and code updates. “Others” needing review are printed with their source documents for multidisciplinary terminology reviews. Inappropriate text is referred to product managers. On a monthly basis, product managers also receive a list of Other usage along with feedback to users concerning how to find correct, existing concept names in pick lists.
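The editor's resolution step described above can be modeled as a small state machine. The class names, status values, and dictionary lookup in this sketch are hypothetical simplifications of the OCE application, not its actual schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class OtherEntry:
    field_name: str        # data field in which "Other" was selected
    desired_text: str      # concept name the user typed
    status: str = "open"   # open -> fixed | new_concept | review | inappropriate
    resolved_code: Optional[str] = None

def resolve(entry: OtherEntry, vocabulary: dict) -> OtherEntry:
    """If the desired concept already exists in the controlled vocabulary,
    update the entry with its code and mark it Fixed; otherwise queue it
    as a candidate new concept for the next vocabulary update."""
    code = vocabulary.get(entry.desired_text.strip().lower())
    if code is not None:
        entry.resolved_code = code
        entry.status = "fixed"
    else:
        entry.status = "new_concept"
    return entry
```

In the real workflow, an editor may also route an entry to off-line review or flag its text as inappropriate; those branches are omitted here.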

Results: Since initiation of OCE for Diagnosis, Tissue, and Procedure in 2001, users have entered approximately 10 000 Others in these domains. Half of these Others contribute to critical product identifiers, of which 80% are marked as fixed. Half of the Others unrelated to product identifiers and from other clinical vocabulary domains have also been fixed.

Conclusions: Rapid identification and closure of controlled vocabulary gaps is essential to successful ontology implementation. Providing a real-time means to signal potential vocabulary deficiencies, coupled with ongoing training or vocabulary creation, gives users confidence that the language they need for their work will be available, current, and respected.

Extracting Cancer Terms From Publicly Available Nomenclatures

Jules J. Berman, MD, PhD ([email protected]). National Cancer Institute, National Institutes of Health, Bethesda, Md.

Context: A comprehensive nomenclature of cancer terms can be used to annotate and integrate tumor-specific data from heterogeneous sources (eg, hospital records and biological databases). There is only 1 publicly available nomenclature that consists exclusively of neoplastic disorders, namely, the World Health Organization International Classification of Diseases for Oncology. The current version is the third revision (ICD-O-3) and contains about 2500 cancer terms.

Technology: The National Library of Medicine's Unified Medical Language System (UMLS) is a compilation of approximately 100 medical nomenclatures and contains many neoplastic terms missing from ICD-O-3, interspersed in the many UMLS source vocabularies.

A PERL script was created that draws from ICD-O-3 and the January 2003 version of the UMLS metathesaurus, automatically compiling a coded listing of cancer terms. The metathesaurus files used are MRCON, a 151-MB file containing more than 2 million medical terms, and MRCXT, a 1.7-GB UMLS file with more than 27 million records expressing the relationships for terms contained in MRCON.

Design: The PERL script collects all UMLS terms with a “neoplasms” relationship and all ICD-O terms not included in UMLS, preserving UMLS and ICD-O codes. It then executes 3 transformations on terms: (1) expanding the number of terms by including grammatically equivalent expressions (eg, adenocarcinoma of colon → colon adenocarcinoma → colonic adenocarcinoma), (2) normalizing terms by converting every term to lowercase and obliterating most plural forms by truncating the trailing “s” character, and (3) removing duplicate terms.
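The 3 transformations can be sketched as follows. The published script is written in PERL; this Python sketch is illustrative only, and the single “X of Y” rewrite rule stands in for the script's fuller set of grammatical-equivalence rules.

```python
import re

def variants(term):
    """Expand a term with grammatically equivalent expressions.
    Only the 'X of Y' -> 'Y X' pattern is shown here (illustrative)."""
    forms = {term}
    m = re.match(r"^(.+) of (?:the )?(.+)$", term)
    if m:
        forms.add(f"{m.group(2)} {m.group(1)}")  # adenocarcinoma of colon -> colon adenocarcinoma
    return forms

def normalize(term):
    """Lowercase and crudely strip plurals by truncating a trailing 's',
    as the abstract describes (this also clips some singular terms)."""
    term = term.strip().lower()
    return term[:-1] if term.endswith("s") else term

def compile_terms(raw_terms):
    """Expand, normalize, and deduplicate a list of raw terms."""
    seen, out = set(), []
    for raw in raw_terms:
        for v in variants(raw):
            n = normalize(v)
            if n not in seen:
                seen.add(n)
                out.append(n)
    return out
```

Applied to “Adenocarcinoma of Colon” and “colon adenocarcinomas”, this yields the two normalized forms “adenocarcinoma of colon” and “colon adenocarcinoma”, with the plural collapsed into its singular duplicate.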

Results: The PERL script produces an output file in about 7 minutes on a 1.6-GHz computer. The output contains approximately 29 500 cancer terms, of which 23 600 are English. Thirteen foreign languages are included in the terms. An example of a single concept entry is hepatocellular carcinoma, which encompasses 78 terms under the UMLS identifier C0019204, including adenocarcinoma of liver, cancer of liver, carcinoma of liver, hcc, hepatic adenocarcinoma, hepatic cancer, hepatic carcinoma, hepatocarcinoma, hepatocellular adenocarcinoma, hepatocellular cancer, hepatocellular carcinoma, hepatoma, lcc, liver adenocarcinoma, liver cancer, liver carcinoma, liver cell carcinoma, and malignant hepatoma.

Conclusions: A PERL script released into the public domain extracts more than 29 000 codified cancer terms from 2 publicly available medical nomenclatures (UMLS and ICD-O), more than 10 times the number of cancer terms contained in the most recent version of ICD-O.

Computational Discovery of Cancer Biomarkers From Microarray Expression Measurements

Soumyaroop Bhattacharya, MS ([email protected]); Satish Patel, MS; James Lyons-Weiler, PhD. Center for Biomedical Informatics, Center for Pathology Informatics, Benedum Center for Oncology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: Class discrimination between cases and controls, such as cancer and normal specimens, based on gene expression values has been the primary focus of microarray analysis. Despite advancements in technology over the years, however, high variability in the data due to the presence of outliers, background noise, and flawed experimental designs has made it difficult for researchers to derive meaningful and useful findings. Proper analysis methods that either adjust for such artifacts or are resistant to them are the most effective at both uncovering the problems and identifying gene expression patterns of interest.

Design: We present the results of our Maximum Difference SubSet (MDSS) algorithm (Lyons-Weiler et al, 2003) for classification of cancer samples from normal specimens. The earlier proposed MDSS algorithm was applied to 4 gene expression cancer data sets: (1) a human melanoma data set with melanoma (n1 = 31) and normal controls (n2 = 7), (2) a colon cancer data set with tumors (n1 = 40) and epithelial normal specimens (n2 = 5), (3) an acute myeloid leukemia (AML) data set with AML chemotherapy success (n1 = 7) and AML chemotherapy failure (n2 = 8), and (4) an epithelial ovarian carcinoma data set with tumors (n1 = 35) and normal specimens (n2 = 4).
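The MDSS selection criterion itself is defined in Lyons-Weiler et al (2003) and is not reproduced here. As a loose illustration of the general idea behind difference-based gene selection (not the published algorithm), genes can be ranked by the absolute difference of their group means and the top-scoring subset retained:

```python
def rank_by_mean_difference(expr, labels):
    """Rank genes by |mean(group 1) - mean(group 0)|, descending.

    expr: dict mapping gene name -> list of expression values per sample.
    labels: list of 0/1 group labels, one per sample.
    Illustrative stand-in only, not the published MDSS algorithm.
    """
    scores = {}
    for gene, values in expr.items():
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        scores[gene] = abs(sum(g1) / len(g1) - sum(g0) / len(g0))
    return sorted(scores, key=scores.get, reverse=True)
```

A gene whose means differ sharply between tumor and normal groups ranks ahead of one with identical means, which is the intuition behind treating the top-ranked genes as candidate biomarkers.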

Results: The MDSS algorithm successfully distinguished between the 2 specified groups with 100% success in all the data sets. We also present a list of potential biomarkers for each of the cancers studied, consisting of the differentially expressed genes identified by the MDSS algorithm.

Conclusions: Our results show that our simple machine-learning approach, MDSS, performs on par with, if not better than, the more complex algorithms published earlier. The MDSS algorithm can be applied to any empirical data set via the Gene Expression Data Analysis (GEDA) Web tool, part of the University of Pittsburgh Bioinformatics Web Tool Collection, which runs on a Java Web server and is available for public use. Researchers can upload their data and run a wide range of analysis options available in the Web tool.

Image Capture, Storage, Retrieval, and Conversion in an Academic Medical Center Pathology Department: Considerations for Transition From Film to Digital Technology

Phillip J. Boyer1 ([email protected]); Reade A. Quinton1; Peter Millward2; Loren E. Clarke2; Charles L. White.1 1Department of Pathology, Division of Neuropathology, University of Texas Southwestern, Dallas, Tex; 2Department of Pathology, Pennsylvania State University, Hershey, Pa.

Context: The rapid synchronous evolution of digital image capture and storage technologies has enabled the acquisition of high-quality images that rival the data obtained using film-based image capture technology. This evolution of technology coincides with a growing demand on pathologists for digital images for use in reports, tumor board presentations, lectures, Web-based instructional modules, poster presentations, and publications. Easily accessible digital images also allow for image analysis, telemedicine applications, and other opportunities. Digital conversion of previously acquired film-based images offers access to archival image data. However, transition to digital technologies is a time- and resource-intensive endeavor, and the complexity of the transition and the numerous options available present multiple potential pitfalls. Ideally, numerous issues and implications would be contemplated and informed decisions would be made before the transition from a film-based image system to a digital image system is undertaken. Presently, limited pathology-related literature exists describing the interests, issues, and options related to and guiding the implementation of digital imaging.

Technology: Digital image capture, conversion, database, and storage devices and software.

Design: The principal objective of this work is to describe the progress toward implementation and evolution of macroscopic and microscopic digital image capture and conversion systems in 2 academic medical center pathology departments. Specific aims of the project include (1) systematic consideration of the numerous interests and issues that confront implementation of digital image capture and conversion systems; (2) documentation of options for and choice of hardware and software; and (3) gathering and summarization of data regarding (a) preimplementation digital imaging skills, (b) training requirements, (c) preimplementation and postimplementation usage patterns as a measure of success, (d) cost considerations, and (e) lessons learned.

Results: This project has thoroughly evaluated multiple interests, issues, and options inherent to digital image capture and conversion, including benefits and risks of transition from a film-based system; choice of camera and implementation of imaging workstations; image capture standards (eg, file size and format); database software; file databasing considerations (eg, file nomenclature and indexing); patient confidentiality (eg, Health Insurance Portability and Accountability Act [HIPAA] compliance considerations); data and hardware security; image archiving and distribution (eg, networked server vs CD or DVD); data backup; user considerations, including training; conversion of digital images to hard copy (film or prints); and system support and maintenance.

Conclusions: Digital image capture offers a large number of advantages compared to film-based image capture. In addition, conversion of film-based images to digital files allows for access to archival image data. However, successful transition from a film-based image system to a digital image system that meets the needs of multiple users of various skill levels in an academic pathology setting requires thorough consideration of multiple interests, issues, and options. It is anticipated that publication of our experience will be of use to pathologists at various phases of transition from film-based to digital image capture systems.

Virtual Reality (VR) for Telepathology: The Use of Personal Computer–Based VR Technology for Telepathology

Ho-young Byun1 ([email protected]); Jeongwook Choi1; Joonhoe Eom1; Jaeyoun Hwang1; Peom Park.2 1HuminTec Co, Ltd, South Korea; 2Ajou University, Seoul, South Korea.

Context: The presentation and control systems of telepathology will undergo continual change in the future. In particular, virtual reality (VR) technology will become effective for the presentation of pathologic images (including gross and microscopic images) and for the control of the virtual microscope. In this E-poster, we present and demonstrate some of the possibilities of VR technology.

Technology: We use 2 types of VR equipment, the first of which is the Data Glove (5DT, Irvine, Calif). The Data Glove is put on the hand and can then be seen as a floating hand in the virtual environment. It can be used to initiate commands and operate the virtual microscope. For example, in virtual telepathology, pointing the glove upward raises the intensity of the microscope light. The virtual hand is like a cursor, able to execute commands by pointing at a particular icon and clicking. We use the 5DT Data Glove 5 with the following specifications:

  • Bend Sensing Method: fully-enclosed fiber optic bend sensors;

  • Number of Sensors: 5 (1 per finger);

  • Resolution: 8 bits (256 positions per finger), ±60° pitch & roll;

  • Interface: serial port, RS-232 (3 wire) 19.2 K baud (full duplex);

  • Sample Rate: 200 Hz per finger;

  • Tracking: integrated pitch and roll sensor; and

  • Calibration Routine: open and close hand, each user.

The second type of equipment is the Head-Mounted Display (HMD) (5DT). As you move through the virtual telepathology world made by VR, images of objects that surround you are displayed on the screen, based on your position. When you view virtual telepathology images with an HMD (the best-known tool for data output in VR), the virtual image corresponding to the current head position is shown. We use the 5DT Head-Mounted Display with the following specifications:

  • Display Resolution: 800 × 600 pixels, full SVGA;

  • Optics Field of View: 32° (diagonal) (4:3 aspect ratio); and

  • Weight: approximately 600 g.

Design: Our system consists of 4 modules using an OpenGL library, Visual C++, and Visual Basic (Microsoft Corporation, Redmond, Wash): (1) telepathology system connection module, (2) data glove control and command transfer module, (3) image transfer and presentation module, and (4) database module.

Results: We have implemented a virtual microscope using VR technology, which can extend the teaching and learning ability of participants. Moreover, the virtual telepathology system allows pathologists to operate a microscope remotely.

Conclusions: Our initial studies have shown VR technology to be effective for a virtual telepathology system. However, there are some issues that pathologists should consider, including cost, user sickness, and the image resolution of HMDs.

Prototype for the Development of Secure Wireless Solutions Deployed on Handheld Devices

Samuel Caughron, MD ([email protected]). Department of Pathology, Creighton University, Omaha, Neb.

Context: Wireless networks and handheld devices, that is, personal digital assistants (PDAs), are 2 of the more powerful and exciting recent advances in information technology. Wireless networks have become increasingly available in hospitals, clinics, and areas of general public use. Handheld devices with built-in wireless networking capability are now widely available from multiple vendors. We believe both technologies will evolve into common instruments for the collection and delivery of health-related information. Their use has already been demonstrated in the delivery of laboratory information, billing charge collection, research data gathering, and physician education. However, the documented cracking of the native wireless encryption technology has raised concerns about the security of wireless solutions. Furthermore, the novelty of handheld software development and the heterogeneity of devices present difficulties in delivering effective, low-cost solutions. To address these issues, we sought to implement a prototype for the delivery of wireless handheld solutions at relatively low cost with acceptable security, performance, and utility.

Design: A spare desktop computer available within the department (333-MHz iMac; Apple Computer, Cupertino, Calif) was selected as the server because of its availability and its ability to run a Berkeley Software Distribution (BSD)-based operating system (Mac OS X 10.2.6). Mac OS X was also selected for its networking flexibility and reputation for excellent security. Hypertext transfer protocol (HTTP) was chosen for the bidirectional transfer of information because of its maturity and ubiquity among clients with established security standards. The open-source Apache HTTP server (1.3.27; Forest Hills, Md) was compiled with support for OpenSSL (0.9.6i), an open-source toolkit implementing secure sockets layer (SSL). Support for PHP (4.3.2), an open-source hypertext preprocessor, was also compiled into Apache and selected for the core software development because of its performance and flexibility in connecting to multiple database formats. MySQL (4.0) was used for local database storage. The Oracle (Redwood Shores, Calif) client for Mac OS X was installed on the server, and support was compiled into PHP to allow access to the trial data in our hospital's information system and to demonstrate Oracle access capability within the prototype.

Results: Using Apache, OpenSSL, PHP, and MySQL—all freely available open-source technologies—we were able to assemble a prototype system for the secure delivery and collection of information using wireless-enabled handheld devices. The cost of the system was limited to server hardware, suggesting the cost for development of a complete solution could be limited to necessary hardware and actual application development. Application-level security superseded the need for robust wireless encryption with the transfer of all critical data occurring over a 128-bit encrypted SSL stream using a self-signed certificate. Deployment using an HTML-based browser interface facilitated securing data transfer and provided support for a wide range of handheld devices.

Conclusions: Wireless handheld solutions, which are secure and perform well on a wide variety of handheld devices, can be effectively delivered using freely available open-source tools. Within environments in which a wireless infrastructure and wireless-enabled handheld devices are already available, the cost of a solution can be limited to hardware and actual application development.

A Survey of Statistical Software for Surgical Pathology: 2003

Gilbert Edward Corrigan, MD, PhD ([email protected]). St Louis, Mo.

Context: Statistical analysis software available in 2003 was surveyed with respect to its application to the data typically derived from surgical pathology studies.

Technology: Informational sources for the survey included the Internet, local and national libraries, the periodic literature, and available promotional materials.

Design: Statistical programs were cataloged, indexed, priced, evaluated, and compared on a worldwide basis.

Results: (Final results to follow.) More than 204 separate companies or universities were discovered. The complexity of technology was immense, with many different derivatives of “classic” statistical theory. A “growth-phase” elaboration is present.

Conclusions: Computer-based statistical analysis for the surgical pathologist is readily available but must be understood as occurring in a growth-phase environment; periodic reanalysis may yield additional statistical insight.

Statistical Software Complementary to Pathologic Analysis

Gilbert Edward Corrigan, MD, PhD ([email protected]). St Louis, Mo.

Context: Data from a recently completed study of human lipomas, collected during 2 years of surgical excision in which lipomas comprised 2.2% of the total surgical specimens, were applied to freely downloadable statistical graphing programs available on the Internet.

Technology: Software was downloaded from Web-page sites maintained by various statistical software providers. The data from the lipoma study were applied and the graphs were presented at a scientific meeting.

Design: Initially, the data from the lipoma study were used to demonstrate the graphing and statistical capabilities of the downloaded program, and then an intensive search was made to find other available statistical software. These sites were collected, evaluated, and listed to form a complete description of the current status of statistical software programs.

Results: The data from the lipoma study were graphed with ease, and the graphs were presented at a scientific meeting (Missouri Academy of Science, 2003). The software inquiry demonstrated an immense worldwide enterprise, represented by more than 204 available statistical software programs in English. Data analysis software URLs were tested for activity and the sites were evaluated; a user's guide was prepared for educational purposes and presented on the author's Web page.

Conclusions: Statistical software with graphing capabilities is readily available and easy to use. The statistical software enterprise is a worldwide activity that has grown from isolated commercial software programs in the 1960s to active worldwide enterprises that support research from the initial concept to the completed, distributed journal article, including elaborate educational and support services.

Intralaboratory Quality Control Based on Large-Scale Average of Normals (AON) Values and Operational Line Characteristics

Daniel J. Cowden, MD1 ([email protected]); Catalin C. Barbacioru1; Joel Saltz, MD, PhD1; Michael Bissell.2 Departments of 1Biomedical Informatics and 2Pathology, Ohio State University, Columbus.

Context: The Clinical Laboratory Improvement Amendments of 1988, NCCLS, and the Food and Drug Administration are a few of the major governing bodies that set federal regulations and guidelines for quality assurance in laboratory medicine. All laboratories implement a quality assurance program to account for analytical variation and to control accuracy. To maintain their credentials, laboratories must assure the accuracy of their results. Strict adherence is necessary to meet these guidelines. The most commonly practiced quality assurance is based on control analytes and is referred to as control-based quality assurance. In most laboratories, this provides an assurance mechanism that facilitates a strong quality assurance program; however, it has 1 major weakness, in that it is entirely dependent on the quality of the controls. Because of this, other methods have been investigated to guarantee quality assurance, especially patient-based controls. Patient-based quality assurance rests on the concept that by redirecting the way we control for analytes, using aggregate normal patient data (termed average of normals [AON]), we might provide a supplementary assurance method that controls a broader range of variables, extending beyond control-based methods to include the quality of the controls themselves.

Technology: This study uses stored procedures within Microsoft SQL Server 7 (Microsoft Corporation, Redmond, Wash) to automate (OLE automation) the production of Levey-Jennings charts in Microsoft Excel. The data extraction was done with Structured Query Language (data transported from an Oracle [Redwood Shores, Calif] database), and the automation was performed with OLE automation within SQL modules and Excel macros.

Design: The AON was determined for patients in the year 2002, based on the normal ranges designated by the commercial companies. Using Oracle 8 and the Ohio State University (OSU) Medical Center's Information Warehouse, we isolated all normal values in the data warehouse. Averages, standard deviations, variances, and counts were computed for each of the 3 time intervals and grouped by the specific laboratory test name. A Levey-Jennings chart was then made for each time interval. This amounted to 756 tests for each time interval and more than 2000 charts to review for “shifts,” “trends,” or “dispersions,” which represent classic ways of reviewing operational lines on Levey-Jennings charts.

Results: Five million six hundred thousand normal laboratory results were extracted from the OSU Information Warehouse. These were partitioned by daily, weekly, and monthly intervals, and then stratified across 756 distinctly named tests to determine performance measures of the different intervals. Classic distributions of each test, characterized by means, variances, and standard deviations, were then evaluated by a 2-sample t test for independent samples with unequal variances (Satterthwaite method), from which 153 significant t statistics at the 95% confidence level (P = .05) resulted. “Operational line” characteristics were observed in this cohort that provided a needed measure for quality control programs.
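
The 2-sample comparison described here can be sketched in Python. The implementation below illustrates the Satterthwaite-adjusted (Welch) t test in general terms; it is not the authors' SQL-based code, and the sample values are hypothetical:

```python
import math

def welch_t(sample_a, sample_b):
    """Two-sample t test for independent samples with unequal
    variances, with the Satterthwaite approximation for the
    degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma = sum(sample_a) / na
    mb = sum(sample_b) / nb
    va = sum((x - ma) ** 2 for x in sample_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in sample_b) / (nb - 1)
    se2 = va / na + vb / nb           # squared standard error of the difference
    t = (ma - mb) / math.sqrt(se2)
    # Satterthwaite degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df
```

The resulting t statistic would then be compared with the critical value for the computed degrees of freedom at P = .05.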

Conclusions: Our study modifies the traditional Levey-Jennings Chart, adding regression analysis and multicolored indicators of Westgard rules, and in so doing adds considerable utility to the clinical pathologist. Not only do we provide a deeper analysis of aggregate test values to enable a more accurate assessment of quality and controls in a laboratory quality assurance program, we suggest the fittest time interval for this type of analysis. The daily time intervals of AON values are perhaps the best method for patient-based controls and can be used on large-scale aggregate data.

Ordering Patterns to Evaluate Order Sets in a Clinical Order Entry System

Daniel J. Cowden, MD ([email protected]); Catalin C. Barbacioru; Joel Saltz, MD, PhD. Department of Biomedical Informatics, Ohio State University, Columbus.

Context: Protocols—like their electronic counterparts, order sets—provide an “indication” identifying the clinical scenario of the patient's condition when the ordering event occurred. This indication is rarely captured by individual orders and poses difficult challenges for developers of information systems. While mandating that an indication be entered for every medication or laboratory order makes the clinician's job much more taxing, it is appealing to researchers and accountants. We have attempted to bypass that consideration by identifying ordering patterns that predict diagnosis-related groups (DRGs) and diagnostic codes, which would greatly facilitate the information-gathering process and still provide a flexible and user-friendly physician interface.

Technology: Statistical procedures embedded in Microsoft SQL Server (Microsoft Corporation, Redmond, Wash) and Structured Query Language were used to calculate and record the findings.

Design: Initial categorization grouped orders written within an order set, as well as orders written without an order set. This grouping was done to establish the final cohorts that would permit comparison and allow determination of the utilization of order sets. Probabilities of finding an individual order in each order set were initially determined with a Yates corrected χ2 model. Each order set was then reanalyzed for patterns of orders nested within each individual order set. This was accomplished by coupling the subset of individual orders with the order that had the highest χ2 from the previous round. By aggregating all additional orders placed within that order set, given an already significant χ2 order, the second round of analysis extracted distinct ordering patterns within an individual order set. These results were again measured with a Yates corrected χ2 statistic. Then, by comparing similar frequencies of DRGs, distinct ordering patterns were able to assign ambiguous orders (ie, those placed without the aid of an order set) into the most likely order set. Based on this prediction, utilization of order sets was assessed.
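
As an illustration of the statistic applied in both rounds of analysis, a Yates-corrected χ2 for a 2 × 2 contingency table can be computed in a few lines of Python. This is a sketch of the general formula, not the authors' implementation, and the cell layout is an assumption of this example:

```python
def yates_chi2(a, b, c, d):
    """Yates continuity-corrected chi-square for a 2x2 table,
    e.g. order present/absent crossed with order-set used/not used.
    Cells: a, b form the first row; c, d the second."""
    n = a + b + c + d
    num = n * (abs(a * d - b * c) - n / 2.0) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den
```

A value exceeding 3.84 (1 degree of freedom, P = .05) would be counted as statistically significant, matching the threshold reported in the Results.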

Results: Three million six hundred thousand orders were initially extracted from the Ohio State University Medical Center Information Warehouse. The first-round analysis resulted in a Yates corrected χ2 on 9762 orders, of which 904 were statistically significant (χ2 > 3.84). The second round resulted in more specific patterns and identified an ordering pattern in most order sets. Once these patterns were determined, they were correlated with known DRGs, as well as with the detailed diagnostic codes. More than 2 million records were grouped by DRGs and diagnostic codes, resulting in 516 DRG categories and 4729 diagnostic categories.

Conclusions: The pattern of orders was modeled to profile an order set by combining 2 orders and evaluating their combined χ2s to predict what order set would correspond to a seemingly uncharacteristic ordering pattern. By combining DRGs, we were able to characterize a population of these orders into their corresponding order sets and determine utilization of these order sets. Diagnostic codes were not successful with this statistical model, perhaps because of their narrow focus. The profile provides a powerful model not only to predict DRGs based on ordering patterns, but also to assess utilization and quality of the order sets themselves.

The Virtual Slide Set: A Flexible Authoring and Presentation System for Virtual Slides

Rebecca Crowley, MD, MS ([email protected]); Katsura Fujita. Centers for Pathology and Oncology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: We describe the development of a Virtual Slide system for creating and viewing clinicopathologic cases with embedded interactive digital microscopy. The system supports rich text-to-image annotation, including (1) hotlinks of text descriptions that move the student to the correct part of the slide and (2) annotations, such as arrows and circles, that appear on the Virtual Slide on request. The interface can be configured by the student to alter the degree of guidance the system provides. The authoring layer provides a graphical user interface to authors for creating new case sets, cases, questions, and annotated virtual slides, which are saved to a database and automatically added to the Virtual Slide home page. The system has been used in 2 pilot studies at the University of Pittsburgh.

Technology: The Virtual Slide Set consists of 5 tiers: client, presentation, business, integration, and resource. The system includes a graphical user interface for authoring, implemented as a Java applet, that allows instructors to create their own Virtual Slide resources. Virtual Slides are whole-slide images captured as multiple partial digital images at high magnification and then joined to create a single large (∼1 GB) image file. Virtual Slides allow panning, zooming, and magnification without loss of resolution, creating an interface that reproduces the main functions of a microscope. Virtual Slides are just starting to be used in Web-based teaching. Our system represents the first Virtual Slide curriculum development system that we are aware of.

Clinical Trials Management: An Interactive Solution to Paper-Based Treatment Schedules for Oncology Clinical Trials

Michael Davis, MS ([email protected]); Brenda Crocker; Bob Rubin. Benedum Oncology Informatics/Pathology Informatics, University of Pittsburgh, Pittsburgh, Pa.

Context: Throughout the country, a number of research-based clinical trials are being conducted within various universities, medical centers, and pharmaceutical companies, particularly in the area of oncology. With these trials, extensive amounts of data must be captured and managed for analysis. These data consist of a vast range of information, from clinical to regulatory to financial. However, one area that has not been fully exploited in the oncology sector is the use of automated technology to manage a cycle-based treatment schedule for temporal patient activities/milestones. This would include managing activities such as experimental drug administration, treatment administration, or specialized testing time points. Traditionally, these schedules or milestones are created manually by the investigator on a word processor and distributed in a hard-copy format to the associates of the trial.

Technology: The system was developed in Java using a 3-tiered object-oriented approach. It uses a standard Web browser on a Microsoft Windows (Microsoft Corporation, Redmond, Wash) platform for the front-end interactions. The middle tier is a Remote Method Invocation server running on an Intranet machine using Windows servers connecting to back-end Oracle (Redwood Shores, Calif) databases residing on Sun Unix (Palo Alto, Calif) hardware.

Design: It is in this context that we have developed the Clinical Trials Management Application, a Web-based application to provide the clinical researchers, nurse coordinators, and supporting offices with an integrated set of tools for managing the administrative and clinical functions for both trial- and patient-based activities. One particular goal was to design software that could allow the user to interactively tie several aspects of managing clinical trial data (ie, event-based activities, treatment, and financial activity), as it relates to defining a cycle-based treatment schedule. This functionality uses an event-based strategy for generating an automated treatment calendar. Using a multidimensional matrix approach, the coordinator defines a set of treatment cycle and day occurrence points. From the representation of these temporal data, the system constructs a matrix-like structure to allow the coordinator to designate events and/or activities (complete blood count draw, study drug, fiscal activities, etc) to the schema. With these event definitions, the user can then assign them to specific cycle-day coordinates, thus completing the treatment matrix. When a patient is registered for the clinical trial, he or she will get a virtual list of scheduled treatments and activities, based on the matrix attributes, which are calculated from the start of treatment date.
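
The cycle-day matrix expansion described above can be sketched in a few lines of Python. The 21-day cycle length, the activity names, and the function interface below are hypothetical; the sketch only illustrates how (cycle, day) coordinates might map to calendar dates from the start-of-treatment date:

```python
from datetime import date, timedelta

def build_schedule(start, cycle_length_days, events):
    """Expand a (cycle, day) -> activities matrix into calendar dates.
    `events` maps (cycle_number, day_within_cycle), both 1-based,
    to a list of activities; dates are counted from `start`."""
    schedule = []
    for (cycle, day), activities in sorted(events.items()):
        offset = (cycle - 1) * cycle_length_days + (day - 1)
        schedule.append((start + timedelta(days=offset), activities))
    return schedule
```

For a patient registered on a given date, each cell of the treatment matrix thus becomes a dated entry on the virtual treatment calendar.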

Results: The creation of this interactive schedule tool is by no means the definitive or final answer to solving the complexity of clinical trial treatment schedules. However, preliminary user acceptance has suggested that with minor refinements, we will be able to deliver an even more robust and comprehensive treatment schedule for patient care.

Conclusions: The advantage of this practical approach is that it provides a comprehensive and unified method of managing the evolution of patient, trial, and fiscal event activities. Initial usability studies have proven this methodology to be an effective means of capturing treatment-related data, using Web-based technology. This has provided an innovative approach over the traditional paper-based method. Also, this basis can lead to new pathways for integrating interactive scheduling into the clinical trials workflow. This real-time accessibility into capturing temporal clinical data will allow for enhanced analysis and decision support.

KaryoReader: A Tool for Computational Analysis of Cytogenetic Data

George Deeb, MD ([email protected]); Jianxin Wang, PhD; AnneMarie W. Block, PhD; Ping Liang, PhD. Department of Pathology, Roswell Park Cancer Institute, Buffalo, NY.

Context: Conventional cytogenetics is a validated technique to study the acquired gross chromosomal aberrations associated with human cancer. Decades of reported numerical and structural chromosomal abnormalities have now been centralized into an extensive database (Mitelman Database). Analysis of these data using a systematic and computational approach presents ample opportunities for gene discovery and for improving our understanding of human cancer. However, systematic analysis of cytogenetic data has not been possible due to the high level of complexity and encryption of the data.

Technology: KaryoReader (KR) is a Web-based program that was designed by 2 of the authors (J.W. and P.L.) to decode karyotypic data and to calculate all implied chromosomal aberrations.

Design: The current version of KR handles karyotypic data written using the International System of Cytogenetic Nomenclature (ISCN) 1995. KaryoReader takes flat files available for download from the Mitelman Database as its default input format, with options allowing users to customize their own data formats. KaryoReader outputs calculated chromosomal aberrations in a tabulated binary format that is analyzable by statistical software.
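
KaryoReader's decoding is far more complete than anything shown here, but the flavor of the task can be illustrated with a minimal Python sketch that extracts only whole-chromosome gains and losses from a simplified ISCN karyotype string (structural rearrangements, clones, and uncertainty notation are deliberately ignored):

```python
import re

def numeric_aberrations(karyotype):
    """Extract whole-chromosome gains (+N) and losses (-N) from a
    simplified ISCN karyotype string, e.g. '47,XY,+8'. Returns
    (gains, losses) as lists of chromosome labels."""
    gains = re.findall(r"\+(\d+|[XY])", karyotype)
    losses = re.findall(r"-(\d+|[XY])", karyotype)
    return gains, losses
```

Tabulating such per-chromosome calls as 0/1 columns across many karyotypes yields the kind of binary matrix that downstream statistical software can analyze.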

Results: By enabling the analysis of cytogenetic data on a large scale, KR can be used for several applications, including (1) calculating hidden chromosomal abnormalities to reveal novel recurrent aberrations; (2) characterizing cytogenetic signatures/patterns for individual cancer types, especially solid tumors; (3) exploring the evolution pathways of chromosome aberrations by comparing different evolution stages of individual cancer types; and (4) guiding the design of array comparative genomic hybridization for discovery studies and for diagnostic purposes.

Conclusions: KaryoReader is a useful tool for systematic analysis of large-scale cytogenetic data to identify novel cytogenetic signatures for human cancer and to study chromosomal aberration pathways.

This work was presented in part at the 94th Annual Meeting of the Association of American Cancer Research, Washington, DC, 2003, and was supported by National Cancer Institute grant CA101501 to P.L.

Spectral Transmittance Measurement Method From Pathologic Image

Hiroyuki Fukuda, MS1 ([email protected]); Yuri Murakami, PhD1,4; Tokiya Abe, MS2; Masahiro Yamaguchi, PhD1,3; Hideaki Haneishi, PhD3; Nagaaki Ohyama, PhD1,2; Yukako Yagi.5 1Telecommunications Advancement Organization, Akasaka Natural Vision Research Center, Tokyo, Japan; 2Tokyo Institute of Technology, Imaging Science & Engineering Laboratory; 3Chiba University, Research Center for Frontier Medical Engineering, Tokyo, Japan; 4Tokyo Institute of Technology, Frontier Collaborative Research Center, Tokyo, Japan; 5University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: Microscopic images captured with digital cameras for pathologic studies have been analyzed at many research institutes. Although the quality of digital images has dramatically improved, there are factors affecting their use in determining pathologic diagnosis, such as the variability of staining conditions and the characteristics of imaging devices. To solve these problems, a method for quantification of stain conditions from microscopic images taken by multispectral microscope systems has been proposed. In the proposed method, the transmittance spectra are estimated and the amount of stain color pigments is obtained from the transmittance by means of the Beer-Lambert law. The authors have designed a 16-band, prototype multispectral microscope imaging system to collect raw fundamental data. In this prototype, the application of the multispectral imaging concept to the conventional digital cameras that are universally used with microscopes is explored.
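
The Beer-Lambert step, by which stain amounts are recovered from transmittance, can be illustrated with a small Python sketch. The 3-wavelength reference absorbance spectra and the two-stain least-squares model below are hypothetical stand-ins for measured hematoxylin and eosin calibration data:

```python
import math

def stain_amounts(transmittance, ref_h, ref_e):
    """Estimate hematoxylin and eosin amounts from a transmittance
    spectrum via the Beer-Lambert law: absorbance = -log10(T) is
    modeled as a_h * ref_h + a_e * ref_e and solved by least squares."""
    absorb = [-math.log10(t) for t in transmittance]
    # normal equations for the two-coefficient least-squares fit
    shh = sum(h * h for h in ref_h)
    see = sum(e * e for e in ref_e)
    she = sum(h * e for h, e in zip(ref_h, ref_e))
    sha = sum(h * a for h, a in zip(ref_h, absorb))
    sea = sum(e * a for e, a in zip(ref_e, absorb))
    det = shh * see - she * she
    a_h = (sha * see - she * sea) / det
    a_e = (shh * sea - she * sha) / det
    return a_h, a_e
```

With 16-band spectra the same fit would simply run over 16 wavelengths instead of 3.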

Technology: This spectral transmittance measurement method analyzes microscopic images obtained as transmittance spectra, which are estimated from the signal of a consumer-based digital camera. A system is characterized by taking a set of pictures of color charts in place of specimens. Both the characteristics of the illumination and of the camera sensitivity are estimated as a linear system matrix. The principal steps in obtaining the spectral transmittance of specimens from these pictures are as follows:

  • Take pictures of charts from which the spectral transmittance was previously measured.

  • Calculate the system matrix (H) based on the signals output by the camera and on the spectral transmittance of the charts: H = RT(RRT + ɛIn)−1D (where R indicates transmittance of charts; RT, transpose of R; D, picture signals of charts; and ɛIn, noise level).

  • Take pictures of specimen.

  • Estimate spectral transmittance of specimen (f̂) using Wiener estimation: f̂ = RfHT(HRfHT + Rn)−1g (where f̂ indicates the estimated transmittance; Rf, autocorrelation of samples; g, signals of pictures; and Rn, autocorrelation of noise).
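
The steps above can be sketched in Python with NumPy. The matrix orientations assumed here (chart spectra as wavelengths × charts, signals as bands × charts, H as bands × wavelengths) are one consistent convention chosen for this illustration, not a specification of the authors' system:

```python
import numpy as np

def system_matrix(R, D, eps=1e-6):
    """Estimate the system matrix from chart data: R holds chart
    transmittance spectra (wavelengths x charts), D the camera
    signals for the same charts (bands x charts). This is the
    ridge-regularized solution of D = H R."""
    n = R.shape[0]
    return D @ R.T @ np.linalg.inv(R @ R.T + eps * np.eye(n))

def wiener_estimate(H, g, Rf, Rn):
    """Wiener estimate of a specimen's transmittance spectrum from
    its camera signal g: f = Rf H^T (H Rf H^T + Rn)^(-1) g, with
    Rf the sample autocorrelation and Rn the noise autocorrelation."""
    return Rf @ H.T @ np.linalg.inv(H @ Rf @ H.T + Rn) @ g
```

In practice Rf would be estimated from representative stained-tissue spectra and Rn from the camera's noise characteristics.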

Design: A multispectral imaging system using 9 kinds of charts made from plastic film, applied to an Olympus (Melville, NY) CAMEDIA E-10 digital camera and BX-51 microscope, has been developed. In addition to taking 3-band (red-green-blue) images, the camera is able to take 6-band images by using previously arranged filters. A graphical user interface enables easy measurement of the spectral transmittance at any point on an image.

Results: This system was tested using an Olympus CAMEDIA E-10 digital camera, BX-51 microscope, and hematoxylin-eosin–stained tissue specimens. The results demonstrated that the spectrum information of both hematoxylin and eosin pigments can be estimated using the proposed system.

Conclusions: Testing confirmed the effectiveness of using plastic film charts to measure illumination and camera sensitivity characteristics. Implementation of this method should contribute to the standardization of microscopic pathology images.

Mining Patient Data to Perform Analyte Comparison for Multiple Analyzers

Dennis Jay, PhD ([email protected]). Department of Pathology, St Jude Children's Research Hospital, Memphis, Tenn.

Context: Laboratories reporting the same analyte from 2 or more analyzers are required to verify analyzer agreement as part of their quality control program. The College of American Pathologists requires comparative analyses at least every 6 months. This is typically performed by assaying the same patient samples on each analyzer with subsequent statistical comparison by regression and/or bias analyses. An alternate method is presented, whereby data are mined from the laboratory information system (LIS) for each analyte/analyzer combination and data are analyzed through comparison of population percentiles.

Technology: Data are extracted from our Cerner Classic LIS (Cerner Corp, Kansas City, Mo) using a query constructed in Cerner Command Language (CCL). The resulting ASCII file is then transferred via file transfer protocol (FTP) and imported into Microsoft (Redmond, Wash) Access, where a Visual Basic for Applications (VBA) module performs calculations.

Design: Comparison of population percentiles requires an adequate amount of data for statistical significance. At our institution, data are collected covering the previous month, which yields about 1000 to 1500 data points per analyzer for each routine chemistry analyte. The 10th, 20th, 25th, 50th, 75th, 80th, and 90th population percentiles are then calculated for each analyte/analyzer combination. The difference between each respective analyzer percentile is then compared with allowable limits to determine acceptance. Assumptions inherent in the method are that the populations assayed on each analyzer are not statistically different and that the range of values is adequately representative of the patient population.
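
The percentile-comparison step might be sketched as follows in Python. The percentile list matches the one described above, while the linear-interpolation rule and the single allowable-limit convention are assumptions of this illustration:

```python
def percentile(values, p):
    """Linear-interpolation percentile of a sorted copy of values."""
    s = sorted(values)
    k = (len(s) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def compare_analyzers(results_a, results_b, limit):
    """For each monitored percentile, flag whether the
    between-analyzer difference is within the allowable limit
    (expressed in analyte units)."""
    flags = {}
    for p in (10, 20, 25, 50, 75, 80, 90):
        diff = abs(percentile(results_a, p) - percentile(results_b, p))
        flags[p] = diff <= limit
    return flags
```

A percentile failing its limit would prompt the traditional follow-up of reassaying fresh patient samples on both analyzers.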

Results: Data are presented from routine chemistry analytes performed on 2 Vitros 950 analyzers (Ortho Clinical Diagnostics, Rochester, NY), including the comprehensive metabolic profile, lactate dehydrogenase, magnesium, phosphorus, and uric acid. All analytes demonstrate acceptable agreement across percentiles on a monthly basis except for alkaline phosphatase, alanine aminotransferase, aspartate aminotransferase, lactate dehydrogenase, and uric acid.

Conclusions: This method has significantly reduced the labor and reagent costs associated with verifying analyzer agreement by eliminating the need to reassay patient samples for most routine chemistry analytes. Discrepancies have been largely due to differences in patient populations assayed on each analyzer. For these analytes, analysis of fresh patient samples is necessary.

A Web-Based Tool That Continues to Evolve in Response to System and Departmental Needs

Fauzia Khan, MD ([email protected]); L. Michael Snyder, MD. Department of Pathology, University of Massachusetts Memorial Health Care, Worcester.

Context: Last year, we presented a clinical decision support tool that provided information to clinicians to assist them in making the correct diagnosis. The value of these tools is further underscored by new studies and data indicating that about half the time a correct diagnostic approach is not used by clinicians. A team of more than 70 clinicians and laboratorians, subspecialists in their respective fields, developed the original content. Many of these contributors continue to add to our original database. Lippincott Williams & Wilkins also published a selection of our content as a textbook, which has received very positive reviews in academic journals. The primary aim of this content development was to bridge the gap between laboratory data and clinical practice, so that clinicians can better use laboratory tests.

Technology: Important features of our Web site include ease of navigation and Microsoft Word–like content management capabilities, which require no training for content entry and formatting. Entered content is immediately available at the front end on a real-time basis.

Design: Based on the needs of University of Massachusetts Memorial Health Care and the Department of Hospital Laboratories, we decided to add a new subsection titled “Resources” to our front page. Three new data modules were created under this subsection. Because our departmental site, which is distinct from this site, is accessible only from the intranet, about one third of our clinicians and support staff, who comprise the Outreach Program, cannot access it. Thus, a new module was created to report all departmental updates listed by the date of announcement. This Web site is globally available from the Internet as well as the intranet. A second module, consisting of cases indexed by presenting symptomatology, was also added. The house staff in pathology will add new cases. The third module, consisting of statistical information pertaining to the interpretation and integration of laboratory tests in clinical care, was added after feedback was received from students and residents.

Results: Our Web-based system has received very positive feedback about ease of navigation and user-friendliness. In addition, physicians who are outside the premises but a part of our extensive system are able to ask laboratory-related questions through this site.

Conclusions: Our efficient, scalable application with extensive content management capabilities can be used in various ways to enhance our mission of providing excellent patient care. We continue to align our Web site to our mission and to add modules accordingly.

Clustering Biomedical Researchers at University of Pittsburgh Medical Center

Sujin Kim, PhD ([email protected]). Pathology and Oncology Informatics Center, Department of Pathology, University of Pittsburgh, Pittsburgh, Pa.

Context: The objective of the study was to understand the information requirements of tissue-based researchers at the University of Pittsburgh Medical Center, a multicomplex organization consisting of various health-related disciplines.

Technology: Data warehouses exist to support user queries, so understanding the information needs of users is essential to designing them. A critical step in this process is to investigate the types and characteristics of users. Since tissue-related resources are designed to support health care professionals in various work roles, a review of user modeling in the context of information-seeking behavior is essential. Therefore, in the user grouping section, this study reviews the literature on professional information-seeking behavior, health care professional information-seeking behavior, complex user group information-seeking behavior, and methodologies used to group users.

Design: Biomedical information required by cancer center researchers at the University of Pittsburgh Medical Center was identified through interviews and publication analysis of academic papers in relation to designing dimensional modeling. The information requirements were placed in the form of “queries,” and the collected queries were then analyzed against existing and proposed research databases supported by the Centers for Oncology and Pathology Informatics at the University of Pittsburgh Medical Center.

Results: Frequent types of queries required were clinical (50.18% in type I), prognostic (17.87% in type II), diagnosis/disorder-based (50.72% in type III), and research oriented (51.9% in type IV). Specimen diagnosis information was requested most frequently (n = 7, 100%), followed by patient demographic information, patient progression or outcomes data, and patient treatment data (n = 6, 85.71%). The great majority of research queries could be represented in a fairly discrete set of information spaces that mapped well to existing information sources in the medical center.

Conclusions: The results indicate that it should be possible to create a research data warehouse—containing de-identified clinical information pulled from existing operational systems—that would support the great majority of research requirements in academic cancer centers.

A Study of User Requirements in Relation to Designing a Tissue-Centric Data Warehouse

Sujin Kim, PhD ([email protected]). Pathology and Oncology Informatics Center, Department of Pathology, University of Pittsburgh, Pa.

Context: The objective of this study is to characterize the types of tissue-centric users based on tissue use and requirements and their job- or work-related variables at the University of Pittsburgh Cancer Institute (UPCI), Pittsburgh, Pa.

Technology: Cluster analysis is one of the multivariate methods used to create a numerically meaningful categorization that fits a set of observations. The study used agglomerative hierarchical clustering, a method in which the 2 closest cases form a cluster at every step until all cases are combined. To measure the distance among a total of 62 subjects represented by 204 binary data results, binary euclidean distance was used. In addition, Ward's linkage method (increase in sum of squares in ClustanGraphics) was used to combine clusters; it recalculates the dissimilarities computed from the binary squared euclidean distance after the 2 newly clustered groups are created. The binary euclidean distance measure is given by (B + C)/(A + B + C + D), where B + C is the number of discordant cases and A + B + C + D is the total number of cases.
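
The dissimilarity measure can be stated compactly in Python. This sketch assumes each subject is encoded as a 0/1 vector of equal length, with concordant positions contributing to A + D and discordant positions to B + C:

```python
def binary_dissimilarity(u, v):
    """Binary dissimilarity between two 0/1 vectors:
    (B + C) / (A + B + C + D), where B + C counts discordant
    positions and A + B + C + D is the vector length."""
    assert len(u) == len(v)
    discordant = sum(1 for a, b in zip(u, v) if a != b)
    return discordant / len(u)
```

The resulting pairwise distance matrix would then be fed to the agglomerative (Ward's linkage) clustering step described above.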

Design: A self-reporting survey questionnaire was distributed to biomedical researchers at UPCI. Descriptive analysis and cluster analysis were performed to identify and characterize the complex types of tissue-based researchers.

Results: Sixty-two respondents completed the survey, and 2 cluster solutions for all data sets, 2 cluster solutions for the tissue-centric data set, and 4 cluster solutions for the research- or work-related data set were formed. The study found that there are 2 distinct groups of tissue-centric users who directly use tissue samples for their research, as well as its associated information, while indirect users only require the associated information.

Conclusion: Tissue-centric users are of various types, and these types can be distinguished in terms of tissue use and data requirements, as well as work- or research-related activities. This result could be used to characterize not only the users, but also the types of information requests (user queries) and the types of data elements required by different user groups to benefit the design of tissue-centric data warehouses.

Open-Source and Modularized Natural Language Processing Tools for Medical Text Processing

Soo I. Kim1 ([email protected]); Qing Zeng2; Aziz Boxwala2; Frank Chuan Kuo.3 1Department of Electrical and Computer Engineering & DSG, 2Decision Systems Group, and 3Department of Pathology, Brigham and Women's Hospital, Boston, Mass.

Context: Medical free-text processing is an important area of medical informatics research. As part of the Shared Pathology Information Network (SPIN) project, we have been developing natural language processing (NLP) tools for parsing pathology reports. Most of the existing NLP tools for medical text processing are domain-specific or institutional data–specific. They are also rarely available in open-source or modularized format for adaptation or incremental development, and thus result in duplicate or overlapping efforts.

Design: Although medical texts vary from specialty to specialty and institute to institute, their processing shares some common tasks, just as the NLP in general has some generic tasks. Inspired by an open-source NLP software development environment (GATE), we have created a set of open-source medical NLP modules that can serve as building blocks for different NLP applications.

Our general approach is to separate specialty-specific functionalities from non–specialty-specific functionalities and to use parameters to isolate common functionalities from application specifics. This approach facilitates incremental development, and sharing and reuse of software.

Results: We have implemented a total of 7 modules. (In processing a report, 2 basic modules, Tokenizer and Sentence Splitter, which are provided with GATE, are also needed.) Five of the 7 modules are not pathology-specific, namely, Concept Mapper, Concept Classifier, Regular Expression Detector, Negation Detector, and Modifier Finder. These modules address some of the common needs in medical text processing: identify concepts from free text; extract certain types of concepts, such as main findings; extract certain string patterns, such as size or date; and identify modifiers for findings. Negation is a special type of modifier that was deemed especially important in some contexts, thus a Negation Detector was implemented to better capture negations. We have also created 2 pathology-specific modules, namely, Gleason Score Recognizer and TNM Value Recognizer. Gleason score and TNM can be recognized using the general Regular Expression Detector. These 2 modules, however, were created to not only detect the occurrence of Gleason score and TNM, but also to further interpret their components.
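
As an illustration of what a pathology-specific recognizer must handle, a minimal Python sketch for Gleason scores is shown below. The authors' actual module is more capable; the regular expression and function interface here are hypothetical and cover only simple "a + b = total" phrasings:

```python
import re

GLEASON = re.compile(
    r"[Gg]leason\s+(?:score\s+)?(\d)\s*\+\s*(\d)(?:\s*=\s*(\d+))?")

def parse_gleason(text):
    """Return (primary, secondary, total) Gleason components, or
    None if no Gleason score is found in the text."""
    m = GLEASON.search(text)
    if not m:
        return None
    primary, secondary = int(m.group(1)), int(m.group(2))
    total = int(m.group(3)) if m.group(3) else primary + secondary
    return primary, secondary, total
```

This shows why a dedicated module is worthwhile: beyond detecting the pattern, it interprets the components, supplying the total even when the report omits it.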

Conclusions: The design and implementation of this approach will be refined and tested as we further develop the modules for SPIN's needs and adapt them for use by other domains, such as radiology and surgery. If successful, we should be able to reuse some of the existing modules, minimize and localize modification to the existing modules, and limit the need to develop new modules.

Some Lack-of-Fit Tests Based on Martingale Residuals for the Censored 2-Sample Lifetime Model

Seung-Hwan Lee, MS1 ([email protected]); Song Yang.2 1Department of Mathematics and Statistics, Texas Tech University, Lubbock, Tex; 2Office of Biostatistics, National Institutes of Health, Bethesda, Md.

Context: We explore some lack-of-fit tests for the censored 2-sample accelerated life model. In statistical terminology, this belongs to survival analysis, the area concerned with the statistical analysis of time-to-event data. The event may be death, the appearance of a tumor, or the development of some disease. The times to the occurrences of such events are termed lifetimes. Major areas of application, therefore, are medical studies on chronic disease. Lifetime data often come with a feature, known as censoring, that creates special problems as the data are analyzed. For this reason, conventional statistical methods cannot be applied directly.

Technology: We consider 2 processes, an observed process and a simulated process obtained from approximation. We can then plot the observed process along with a number of simulated processes for comparison. These comparisons enable us to assess objectively how unusual the pattern of the observed process is. A standard bootstrap method applied to the approximation of the observed process leads to the construction of the lack-of-fit tests.
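The comparison of an observed process with simulated ones can be sketched generically; the supremum statistic and the random-walk stand-in below are illustrative assumptions, not the authors' actual martingale-residual construction.

```python
import random

def sup_norm(path):
    """Supremum of |process| over the observation grid."""
    return max(abs(x) for x in path)

def lack_of_fit_pvalue(observed, simulate_path, n_sim=500, seed=0):
    """Fraction of simulated processes whose supremum is at least the
    observed one; a small fraction suggests the observed pattern is
    unusual under the hypothesized model."""
    rng = random.Random(seed)
    obs = sup_norm(observed)
    exceed = sum(1 for _ in range(n_sim)
                 if sup_norm(simulate_path(rng)) >= obs)
    return exceed / n_sim

def random_walk(rng, n=50):
    """Toy stand-in for a simulated residual process."""
    path, s = [], 0.0
    for _ in range(n):
        s += rng.choice((-1.0, 1.0)) / n ** 0.5
        path.append(s)
    return path
```

The same exceedance count that drives the plot-based impression then doubles as a numerical test statistic.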

Design: In survival analysis, one of the important problems is the comparison of the lifetimes of 2 groups. This can occur, for example, when 2 different treatments are given to the 2 groups, or when a new treatment is given to one group and not the other. When studying possibly censored lifetimes from 2 groups, the proportional hazards model has been most widely used. The 2-sample accelerated life model provides an alternative to the proportional hazards model. It has a simple interpretation: treatment accelerates or decelerates the lifetime by a scale factor.
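In notation of our choosing (the abstract gives none), with $S_0$ and $S_1$ the survival functions of the 2 groups, the 2-sample accelerated life (scale) model can be written as:

```latex
% Scale model: treatment multiplies lifetime by a constant factor.
% S_1(t) = P(T_1 > t), S_0(t) = P(T_0 > t); \theta > 0 is the scale factor.
S_1(t) = S_0(\theta t), \qquad t \ge 0,\ \theta > 0 ,
```

so that $T_1$ has the same distribution as $T_0/\theta$: a value of $\theta > 1$ shortens (accelerates) the lifetimes in group 1, and $\theta < 1$ lengthens them.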

Results: The new inference procedures and lack-of-fit tests are applied to 2 data sets from the literature. The first example concerns times to death from vaginal cancer, or censoring, following treatment of rats with the carcinogen DMBA. The second example is from a study of patients with limited stage II or IIIA ovarian carcinoma at the Mayo Clinic. Using the proposed method, the vaginal cancer data are reasonably fit by a 2-sample scale model, but the ovarian cancer data are not.

Conclusions: Comparison of the processes allows us to assess visually, from the graph, how the model is misspecified. For further development of this method, we can apply the bootstrap method to our problem to obtain a numerical counterpart of the graphical comparison. We can also extend our 2-sample problem to a more general situation, the k-sample problem, which could be handled by multiple comparisons.

Comparison of Applications for Remote Control of Personal Computers: Differences in Cost, Performance, and Security

Mark Luquette, MD ([email protected]). Department of Pathology, Children's Hospital and Ohio State University, Columbus.

Context: Remote control of a personal computer (PC) is used by individuals who need to access their PC from a variety of locations, as well as by administrators who need to remotely control a group of computers to service and maintain them. Considerations for the user include cost, security, performance, and functionality, such as the ability to do file transfer and to provide data about the remote machine.

Design: Twelve different programs were evaluated. Four are described here to illustrate the major differences in the product classes: VNC (Real VNC Ltd), PCAnywhere (Symantec), DameWare (DameWare Development LLC), and NetSupport Manager (NetSupport Ltd). The other 8 included Remote PC Access, TightVNC, Radmin (Famatech LLC), Remotely Anywhere (3am Labs Ltd), NetOP (CrossTec Corp), Huey (GID Software), Remote Control PRO (Alchemy Lab), and Cool! Remote Control. For each product, we tested freeware, fully functional demoware, or a fully licensed version. Features were tabulated with respect to cost, ability to do file transfer, ease of the file transfer interface, security compromises required to use the application, strength of security in a remote session, and application support.

Results: Costs range from free to approximately $200 for a single user. Most use or allow encryption and do file transfers. Most were easy to install and intuitive to use. Three general classes became apparent: (1) control only, remote computer can be controlled but few other functions are available; (2) standard function, does file transfer and uses encryption; and (3) expanded function, provides enterprise solutions with many administrative controls, including inventory of a remote computer's hardware, software, services, and applications. Specific comments follow.

  • VNC: Advantages are that it is free, platform-independent (Web browser as client), and a very small application (150 KB). Disadvantages are that it is not secure (except via SSH on Unix), that text transfer is possible via a clipboard but there is no file transfer, and that it supports only a single user/password.

  • PCAnywhere: Advantages include encryption, corporate support, easy file transfer, multiuser platform, and administrator options. Its disadvantage is cost.

  • Dameware: Its advantage is that it installs and uninstalls on the fly on the target PC (no prior need to install a server on the target). Disadvantages are that it requires file and print sharing to be enabled for remote installation and that file transfer is awkward.

  • NetSupport Manager: Advantages include a very facile front end (once configured); a wealth of administrative tools, including global distribution and complete inventory of a target's hardware, software, and services; reasonable cost; selectable encryption (56 to 256 bit) using shared keys; view multiple PCs; multiple administrators view same PC; audio interface; and good support. Its disadvantage is that the initial configuration requires more technical skill.

Conclusions: Testing of 12 different applications for remote control of a PC shows that each has strengths and weaknesses that tailor it to a particular environment, user type, and mission. Once the class of application is chosen, cost and personal preference become factors. All offer demoware to assist in a final choice.

The Official Web Site for the Reference Laboratory at Children's Hospital, Columbus, Ohio

Mark Luquette, MD ([email protected]); Stephen Qualman, MD; David Thornton, PhD; Cheryl Hamon, MT(ASCP), MS; Lisa Littleton, MT(ASCP), BSCS; Lorence Sing, MT(ASCP), BS; Stephanie Cannon, BS, MS. Department of Pathology, Children's Hospital, Columbus, Ohio.

Context: The Department of Laboratory Medicine at Columbus Children's Hospital performs more than 500 different clinical laboratory and surgical pathology pediatric tests for health care institutions across the country. Since pediatric testing is our main focus, our tests and our processes are designed for the special needs of children, such as using capillary punctures instead of venipuncture and carefully reporting results with age-adjusted controls for appropriate interpretation.

Design: The laboratory's Web site is designed as a user-friendly reference site to provide information about the clinical and anatomic laboratory tests and services offered, as well as information describing the capabilities of the departments within the laboratory and its staff. Links to client services, education, and research are also included. Key features of the site include a searchable test directory that specifies the details of specimen collection and preparation, as well as the availability of each test; the ability to order collection supplies; the ability to set up new accounts; and newsletter articles describing clinical activities within the department.

Conclusions: Reflecting the expertise and cutting-edge diagnostics of a doctoral staff of 15 pathologists and doctoral scientists, as well as American Society of Clinical Pathology–certified medical technologists, the site is the most comprehensive Web site representing a department of pediatric laboratory medicine.

Senex: A Bridge From Clinical Medicine to Basic Science

Vei H. Mah, MD1 ([email protected]); Sheldon S. Ball2; Ashalatha V. Nhalil.1 1Senex, Los Angeles, Calif; 2University of California Los Angeles Multicenter Program in Geriatrics and Gerontology, VA Greater Los Angeles Health Care System, Los Angeles.

Context: Clinical and molecular information is needed from laboratory bench to patient bedside, during contact between physician and patient, educator and student, scientist and technician. The information needs to be structured so that specific and real-time retrieval is achievable, and it needs to be presented in a manner that facilitates a conceptual understanding of the details presented. Additionally, an information system should display a certain degree of intelligence, including flexibility in accepting input from the user, the capacity to reason with structured information, and the esthetic display of context-specific information. This information tool should allow us to become better physicians, scientists, educators, and informed citizens.

Technology: Senex is a functional prototype computer system that addresses: (1) representation of medical and biological information, (2) presentation of data, and (3) reasoning with this information. Since the fields of medicine and molecular pathology are dynamic, changing fields, the architecture of Senex must be open to change with the advent of new technologies.

Design: Senex is written in a robust, portable, object-oriented programming environment supported by Common Lisp and the Common Lisp Object System (CLOS). Senex uses the Common Lisp Interface Manager (CLIM) for diagrammatic representation of an object linked directly to its semantics, thus facilitating the separation of the internal representation of objects from the presentation of information to users. This provides a single development environment, from prototyping to application delivery. Senex runs on Macintosh and Windows operating systems.

Results: Senex is an electronic reference integrating clinical medicine with molecular pathology. It is a functional prototype, 14 years in development and currently in use (a) as a teaching tool and electronic reference at the West Los Angeles Geriatric Research, Education and Clinical Center and (b) in a research laboratory at University of California Los Angeles investigating the role of the amyloid precursor protein in the pathology of Alzheimer disease.

Senex is an integrated system that includes information (and tools for interpretation of data) on major components of internal medicine, molecular pathology (biochemistry, molecular biology, genetics, pathology), laboratory medicine, radiology, pharmacology, anatomy, and statistics. It functions independently as a stand-alone application (thus fast, reliable, and mobile), but additionally provides direct links into the World Wide Web, to molecular and clinical databases and to original literature (journal articles, World Health Organization, Centers for Disease Control and Prevention, etc). Tools for data analysis of microarray experiments are integrated into Senex, and methods of data clustering, expression correlation, and data display have been implemented and are in further development.

Conclusions: Senex is a computer application/electronic reference that serves as a “bridge” between clinical medicine and basic science (from bench to bedside). It is a tool useful for physicians in the practice of medicine, as well as a tool useful for research investigators in the interpretation and design of experiments. It is also useful for students and teachers of medicine and the biomedical sciences.

Database Gateway Deployment for Distributed Database Systems

John Milnes, BA ([email protected]). Benedum Oncology Informatics Center and Center for Pathology Informatics, University of Pittsburgh Cancer Institute, Pittsburgh, Pa.

Context: Across the University of Pittsburgh Medical Center Cancer Center sites, a very wide variety of packaged applications and in-house–developed software has been deployed. Today, there is a strong focus on providing integration between these systems to improve clinical processes and enhance research opportunities. Building integration links between systems greatly improves workflow and streamlines data-entry tasks across the organization. At the Oncology-Pathology Informatics Centers, we are using database gateway products as our primary integration tool.

Technology: Three of the primary applications in use here are the in-house–developed Clinical Trials Management System (CTMA), The CoPathPlus System (Cerner Corporation, Kansas City, Mo), and the Impath Cancer Registry System (IMPATH Information Services, Hackensack, NJ). All 3 of these systems reside on different database types on separate hardware platforms. CTMA resides on Oracle's Enterprise Database (Oracle Corporation, Redwood Shores, Calif) on a Sunfire V880 server (Sun Microsystems Inc, Santa Clara, Calif). CoPathPlus resides on a Sybase Adaptive server on a Windows 2000 server (Microsoft Corporation, Redmond, Wash), and the Impath Cancer Registry System resides in a Sybase SQL Anywhere server on a Windows 2000 platform (Microsoft).

Design: Our current gateway architecture involves 3 tiers: a source tier, a gateway tier, and a database tier. The source tier holds the data for CoPathPlus and Impath, which run on a Sybase Adaptive server and SQL Anywhere server, respectively. No changes were required on these systems for the integration. A dedicated, read-only user account was given for access to each system. Because CTMA is already in an Oracle database, the decision was made to integrate the other sources into the CTMA database.

The gateway tier consists of a Windows 2000 server with the following software products installed: Oracle's Enterprise edition, Oracle's Advanced Security Package, Oracle's Networking Server Package, and Oracle's Sybase Adaptive Server Gateway product. Also installed on the gateway tier were the Sybase client software, version 11.9.4, for the Sybase Adaptive server and an Impath client installation for SQL Anywhere connectivity, with a matching Open Database Connectivity (ODBC) driver.

The database tier consists of Sun Solaris servers (Sun Microsystems) running Oracle's Database. Connectivity is achieved via database links inside existing databases. These gateways were configured to run with all network traffic encrypted, and Internet Protocol packet-level access to the hosts was restricted.

For gateway connectivity, the source database is configured to appear as a Transparent Network Substrate–compliant database to Oracle's networking tools. Each data source on the gateway must have a dedicated listening endpoint created that routes all incoming requests to a particular location for handling. In the case of the Adaptive Server gateway, the listener routes connections to a configuration file, which calls the Sybase gateway software and identifies the location of the Sybase client software. In the case of the SQL Anywhere data source, a different configuration file directs the connection to a predefined ODBC connection on the gateway server.
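As a minimal sketch of what this configuration buys at the SQL level (the link name, account, TNS alias, and table below are hypothetical, not the Centers' actual objects), a remote Sybase table becomes addressable through Oracle's table@link syntax:

```python
# Hypothetical names throughout: copath_link, readonly_user, copath_gw,
# and specimen are illustrations only.
CREATE_LINK = (
    "CREATE DATABASE LINK copath_link "
    "CONNECT TO readonly_user IDENTIFIED BY secret "
    "USING 'copath_gw'"  # TNS alias that resolves to the gateway listener
)

def remote_query(table, link, columns="*"):
    """Build a SELECT against a gateway-backed remote table; Oracle
    forwards it through the listener endpoint to the non-Oracle source."""
    return f"SELECT {columns} FROM {table}@{link}"
```

A view defined over `remote_query("specimen", "copath_link")` then looks to applications like any local Oracle table, which is what makes the integration transparent.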

Results: We can now access any piece of data from across our primary applications seamlessly from inside an Oracle database. From there, we create views and procedures to display data and drive workflow processes. We are presenting data from Impath to users of our intranet site, and we are updating data in our CTMA system. We have been able to eliminate double data entry and manage workflow more effectively than in the past. We have also used this system to read in Excel spreadsheets and migrate data from MySQL-based databases.

Conclusions: At the University of Pittsburgh Medical Center Cancer Centers, we are now able to provide functional data flow across the distributed systems in our environment, unlocking the potential power of these data for our employees, our management, our research partners, and, above all, for the improved care of our customers.

Laptop for Pathology Residents

Deepak Mohan, MD ([email protected]). Department of Pathology, University of Pittsburgh Medical Center, Pittsburgh, Pa.

Context: At APIII 2000, we explored the possibility of providing personal digital assistants to pathology residents. Our group proposed to provide software and hardware optimized to improve the efficiency of the pathology residents. The primary goal was to provide a portable way to access, organize, and modify patient-related information. The secondary goal was to use the available medical and pathology electronic reference material. As more of these resources became electronically available, we felt the handheld devices would be inadequate. Therefore, the laptop computer, thought to be superior and a potentially more powerful tool, was chosen.

Technology: We selected a Compaq EVO N1020V (Palo Alto, Calif) with 256-MB DDR SDRAM, a 40-GB hard drive, a fixed 1.44-MB floppy drive, a 15-in XGA display, a DVD/CD-RW combo drive, a 56K V.92 modem, a 10/100 Ethernet NIC, an Intel Pentium 4 processor at 2.40 GHz (Santa Clara, Calif), and the Microsoft Windows XP Professional operating system (Microsoft Corporation, Redmond, Wash). These devices were to have access to laboratory information systems, including CoPath (Cerner, Kansas City, Mo), Sunquest (Raleigh, NC), and Stentor (Brisbane, Calif). Access was provided to the following Internet-based utilities: events schedule, autopsy schedule, call schedule, anatomic pathology/clinical pathology call log, paging directory, hospital contact list, and task list.

Design: Twenty-six residents in our program were issued laptop computers with the above specifications. A questionnaire was distributed to the pathology residents after proper institutional review board approval. Twenty-one questionnaires were returned by the specified date.

Results: Residents used the laptops for education-related purposes (18/20, 90%); clinical work (17/20, 85%); editing and completing pathology reports (15/19, 79%); conference presentations (20/21, 95%); preparing academic publications (17/20, 85%); and reading medical journals and electronic text (18/20, 90%). In addition, the residents used laptops to e-mail (90%), obtain information regarding their assignments (80%), write research papers or notes (84%), prepare presentations (85%), look for clinical information in the hospital information system (75%), browse the Internet (84%), search medical literature (81%), read e-books and e-journals (86%), and check e-mail (90.5%). They used e-mail to communicate with friends (90%), other residents (76%), attending physicians (19%), or others (85%).

On the Internet, the residents were visiting clinical information portals (90.5%); professional societies, such as the College of American Pathologists (14%); government organizations (38%); Internet search engines (90.5%); and newsgroups or listservs (14%). The frequency of use at home was in the range of daily (52%) or 3 to 4 times a week (33%).

The residents were also using other personal productivity software, such as a word processor (90.5%), presentation software (42.9%), statistical package (76.2%), spreadsheet (71.4%), scheduling/calendar program (9.5%), and database management (71.4%). Eighteen (90%) of 20 residents felt that the laptop made them more efficient and productive. Nineteen (91%) of 21 residents replied that they will use the laptop for board examination review. Eighteen (84%) residents agreed that other residency programs should provide laptops. Nineteen residents (90%) felt that the residency program should continue to provide laptops in the future. One resident suggested that residents should be allowed to keep the laptops after the end of the training. Only 2 residents (9%) felt that laptops did not make any impact on their training.

Conclusions: Laptop computers are good tools for residency programs to enhance the productivity and efficiency of pathology residents. A larger study is necessary to evaluate the increase in efficiency after a certain interval, and a cost analysis is needed for further clarification.

Use of Web-Based Surveys for Gathering and Distribution of Immunohistochemistry Protocols

Federico A. Monzon, MD1 ([email protected]); Monica E. de Baca2; Neal Goldstein3; Richard N. Eisen.4 1Department of Pathology, University of Pittsburgh, Pittsburgh, Pa; 2Department of Pathology, Thomas Jefferson University Hospital, Philadelphia, Pa; 3Department of Pathology, William Beaumont Hospital, Royal Oak, Mich; 4Department of Pathology, Greenwich Hospital, Yale New Haven Health System, Greenwich, Conn.

Context: The Society for Applied Immunohistochemistry (SFAI) initiated a clinical laboratory immunohistochemistry (IHC) question survey program for hospital-based or independent clinical laboratories. The survey was designed to address predominantly procedural and technical issues. The goal is to obtain and disseminate IHC technical information of “most common practices” to help develop standardization guidelines and improve IHC stain quality. A major goal of this program is to establish a background information pool of current IHC laboratory practices, which will be available to various societies.

Technology: A Web-based survey tool was developed using csPoller, version 2 (, LLC, Indianapolis, Ind), and displayed on the home page of the SFAI Web site.

Design: A survey to gather information on a specific antibody was designed. The survey allows a single user per laboratory to enter information regarding procedures in his or her laboratory. The topics include section preparation, antibody information (clone, dilution, and incubation time), antigen retrieval type and duration, automation, chromogen, IHC stain laboratory volume, IHC stain workload per technologist, average turnaround time, and laboratory stain volume. Preferred antibody panels for specific diagnostic situations, quality control, and reporting issues are also to be surveyed. The survey results are tallied and made available on-line at the end of each month, after each survey closes.

Results: In our first survey, we collected information on the protocols used for pan-cytokeratin staining. The survey is easy to use, and there were no reported problems with its use. Results show that most institutions use different protocols for the same antibody. It was also noted that 1 commercial vendor of antibodies, detection systems, and instrumentation dominates the market.

Conclusions: Web-based surveys are a useful tool for polling laboratories with the objective of protocol standardization and information dissemination. We will continue to gather information on different antibodies and display results on the SFAI Web site.

Modal Logic Theory for Pathology Inference

G. William Moore, MD, PhD1 ([email protected]); Lawrence A. Brown, MD1; Robert H. Burger, MD, MPA1; Grover M. Hutchins, MD2; Robert E. Miller, MD.2 1Department of Pathology, Baltimore VA Medical Center, Baltimore Veterans Affairs Maryland Health Care System, Baltimore; 2Department of Pathology, University of Maryland Medical System, The Johns Hopkins Medical Institutions, Baltimore.

Context: An automated data-mining program for surveying a clinicopathologic database should incorporate the fundamental constraints on data acquisition in routine medical practice. It may be variously unnecessary, uneconomic, technically unfeasible, or unethical to fill in all possible data items in a rectangular database. Existing clinical databases should include formal considerations for missing values, patient consent, patient risk, and provider alerts. This report proposes a mathematically consistent theory of clinicopathologic inference with the modal concepts of know-whether, need-to-know-whether, and try-to-know-whether.

Technology: Zermelo-Fraenkel set theory and 3 modal operators.

Design: The theory employs a set of distinct, atomic statements (atoms, A), each of which has a definite true-false status (no self-reference paradoxes). Quantitative, interval, ranked, and categorical data are reformulated as true-false statements. Each atom is either a datum (complaints, history, physical findings, statements of consent, etc) or a medical entity (cancer, inflammation, necrosis, etc). No datum is an entity and no entity is a datum. For each atom, a, belonging to set A, there exists known-to-the-k-a, for integer k up to a maximum, M; and additionally for each datum, d, there exists need-to-know-d and try-to-know-d. A datum is d-Hippocratic (do-no-harm) if and only if not-need-d implies not-try-d, and d-conative (try) if and only if (not-know-d and need-d) implies try-d. An entity may be k-vexative (worrisome) or k-ontologic (exists), based on previously collected data.
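As a check of these definitions, the 2 datum properties can be encoded directly as boolean implications; the encoding below is a sketch in notation of our choosing, not part of the authors' formalism.

```python
def implies(p, q):
    """Material implication: p -> q."""
    return (not p) or q

def d_hippocratic(need_d, try_d):
    """do-no-harm: not-need-d implies not-try-d."""
    return implies(not need_d, not try_d)

def d_conative(know_d, need_d, try_d):
    """try: (not-know-d and need-d) implies try-d."""
    return implies((not know_d) and need_d, try_d)
```

Trying to collect a datum that is not needed violates the Hippocratic property, while failing to try for a needed, unknown datum violates the conative one.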

Results: The theory is mathematically consistent and satisfies Occam's Razor, namely, that no entities are known without data. The d-Hippocratic, d-conative, k-vexative, and k-ontologic properties are consistent if data are entered consensually, consecutively, and consistently, that is, if no datum is entered after its negation has been entered. The computer algorithm terminates in polynomial time.

Conclusions: This report introduces a consistent mathematical system for managing medical concepts and data. Modal logic operators expand the purview of classic symbolic logic to accommodate constraints on clinicopathologic data collection. The theory supports such medical concepts as do-no-harm, try-if-you-need-to, worrisome findings, disease ontologies, and levels of certainty. Ontologies are organized in ascending certainty versus possible harm to the patient. The theory is completely general and permits definitions of patient injury that include mortality, morbidity, inconvenience, financial constraints, or loss of privacy, as well as definitions of need-to-know-datum, where the need may differ among observers (patient, physician, insurer, national health policy, or research protocol). Mathematical theories can serve to organize medical knowledge and patient data, and can improve the scheduling and effectiveness of data collection and surveillance in large clinicopathologic data systems.

An Open-Source Clinical Laboratory Management System Developed for the Kidwai Institute of Oncology

Hareesh Nagarajan, BE ([email protected]). Department of Computer Science and Engineering, R.V. College of Engineering, Bangalore, India; project executed at the Kidwai Institute of Oncology, Bangalore, India.

Context: Federal cancer hospitals in India, such as the Kidwai Institute of Oncology, refrain from computerizing integral business processes primarily because of the high software procurement costs involved in creating comprehensive hospital management systems. By using open-source software components and breaking away from the mindset that “computerization must encompass all the business processes to be deemed useful,” the Kidwai Clinical Laboratory Management System concentrates only on automating the manual and grossly inefficient procedure for patient laboratory test requisitions.

Technology: MySQL-3.26 (MySQL AB, Uppsala, Sweden) with InnoDB tables (Innobase Oy Inc, Helsinki, Finland) was used as the database. The front end was created using PHP4 (The PHP Group) as the scripting language, running on an Apache HTTP server (Apache Software Foundation). With this selection of open-source components (for Windows), the computers on the local area network at the Institute were in a position to access the system using any standard Web browser.

Design: The prime stakeholders in the system were identified as:

  • Doctors,

  • Central laboratory technicians (CLTs),

  • Individual laboratory technicians (ILTs: Microbiology, Cytopathology, Histopathology, etc), and

  • Administrators.

The administrator is responsible for adding new tests, as well as for adding new users with appropriate privileges. The CLT is responsible for registering every patient's laboratory test (to be performed) in a central repository, which in turn is used at the individual laboratories for verification. The ILTs are responsible for writing the test reports using basic HTML. Because of the lack of an e-mail–based communication framework, a simple mechanism for communication among the users of the system was devised.

Results: The system has been tested alongside the manual system during the last 2 months. An increase in the level of participation of doctors has been observed, in spite of the inherent difficulties in accessing the system from nonpersonal workstations. The laboratory technicians involved with report generation have experienced some trouble preparing reports with HTML tags, but the reports have matched the written ones in appearance. Doctors inquiring about the status of pending tests have used the built-in messaging facility extensively. The canned transactions provided at the CLT's terminal have been used effectively to gauge the efficiency of individual laboratories. This was done by observing the number of pending tests at any point in time at these laboratories, while taking into consideration tests that have historically been more time-consuming than others at the Institute's laboratories.

Conclusions: Federal hospitals in India have a different set of priorities concerning the implementation of informatics than hospitals in the West. Implementation may begin with seemingly trivial issues, such as the automation of common laboratory procedures, but these systems can have far-reaching consequences for improving a hospital's existing framework. These improvements could mean increased efficiency, a greater level of accountability, and greater visibility in tasks carried out by the various users of the system. Systems built from open-source components seem to be the only practical way to implement high-quality, cost-effective solutions for such hospitals.

Web-Based Virtual Tissue Microarray Slides With Clinical Data for Researchers in Hypertext Markup Language (HTML), Excel, and Application Program Interface (API) Standard Extensible Markup Language (XML) Data Exchange Specification Formats Produced With Microsoft Office Applications

David G. Nohle, MS ([email protected]); Barbara A. Hackman, MS, MPH; Leona W. Ayers, MD. The Mid-Region AIDS and Cancer Specimen Resource and the Department of Pathology, College of Medicine and Public Health, The Ohio State University, Columbus.

Context: The Mid-Region AIDS and Cancer Specimen Resource (ACSR) is a tissue bank consortium sponsored by the National Cancer Institute's Division of Cancer Treatment and Diagnosis. The ACSR offers tissue microarray (TMA) sections with de-identified clinical data to approved researchers. Researchers interested in the type and quality of TMA tissues and the associated clinical data need an efficient method for review. Because each of the hundreds of tissue cores has separate data, an organized approach to producing, navigating, and publishing such data is necessary. The April 2003 Association for Pathology Informatics extensible markup language (XML) TMA Data Exchange Standard (TMA DES) proposed format offers an opportunity to evaluate the standard in this setting. Using basic software, we created a Web site that organizes and cross-references TMA lists, virtual slide images, TMA DES export data, linked legends, and clinical details for researchers.

Technology: Microsoft Excel 2000, Microsoft Word 2000, Microsoft Access 2000, Microsoft FrontPage 2000, the TMA DES Perl script verifier, and a virtual microscope (Olympus America Inc, Melville, NY; BX51 microscope with ×4, ×10, ×20, ×40, and ×60 U Pan Apochromatic objectives; 640 × 480 image acquisition hardware; Prior H101A stage with 0.01-μm scales; Prior H29V4 XYZ controller; Media Cybernetics, Inc, Silver Spring, MD; ImagePro Plus software with the ScopePro module).

Design: For each TMA block, a virtual microscopic image of a stained TMA section is stored in Access. An Excel book for each TMA block contains a details sheet and a legend sheet. The details sheet has core-specific demographic data copied from the database. A link column automatically links each tissue core to its place in the legend sheet. The legend sheet, a “roadmap” of the slide, automatically fills itself in and links back to the details sheet. A single TMA block list Excel book contains a list of the TMA blocks (linked to each TMA Excel block file and each image) and a patient-to-block cross-reference sheet. To produce the TMA DES file for each block, the table in the details sheet of that block's Excel file is selected and exported from Excel to a tab-delimited text file. A Word mail merge main document is hand edited to indicate the block name and description and is merged with the text file, producing the TMA DES file for that block. Excel files are saved as HTML files with minor editing. The TMA data are represented in Microsoft Excel files with hyperlinks automatically added to allow rapid navigation and custom sorting.
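The export step described above (details sheet → tab-delimited text → TMA DES XML) could equally be scripted. The following is a minimal sketch in Python; the element names (`tma_block`, `core`, and the per-core fields) are illustrative placeholders, not the actual Common Data Elements defined by the TMA DES standard.

```python
# Hedged sketch: wrap each row of a tab-delimited "details sheet" export
# as a <core> element inside a <tma_block> element.
import csv
import io
import xml.etree.ElementTree as ET

def details_to_xml(tab_text, block_name, description):
    """Convert tab-delimited details rows into a TMA DES-style XML string."""
    root = ET.Element("tma_block", name=block_name)
    ET.SubElement(root, "description").text = description
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    for row in reader:
        core = ET.SubElement(root, "core")
        for field, value in row.items():
            # Each column of the details sheet becomes a child element.
            ET.SubElement(core, field).text = value
    return ET.tostring(root, encoding="unicode")

sample = "core_id\tdiagnosis\nA1\tlymphoma\nA2\tnormal\n"
xml_out = details_to_xml(sample, "Block-01", "Demo block")
```

A script of this kind would replace the hand-edited Word mail merge step, at the cost of maintaining the code alongside the spreadsheets.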

Results: Block lists, cross-references, virtual slide images, legends, clinical details, and exports have been placed on a Web site for 12 blocks with 586 cores of 2.0-, 1.0-, and 0.6-mm diameter. The virtual microscope can be used to view and annotate these images. Researchers can readily navigate from TMA legends to see clinical details for a selected core. The TMA DES Perl script verifier was used to verify the accuracy of the exports. Eleven Common Data Elements from the TMA DES standard were used, and 6 more were created for site-specific data.

Conclusions: Virtual TMA tissue sections with clinical data can be viewed on the Internet by interested researchers. An organized approach to producing, sorting, navigating, and publishing TMA information has been created to facilitate such review. An anticipated pure HTML version will allow simplified data access using a browser.

Aligned Rank Statistics for Repeated Measurement Models With Orthonormal Design

Bernard Omolo ([email protected]); Frits Ruymgaart. Department of Mathematics and Statistics, Texas Tech University, Lubbock.

Context: In 1958, Chernoff and Savage published their landmark article on asymptotic normality for a large class of rank statistics for 2-sample problems. They established asymptotic normality under fixed alternatives (including the null hypothesis) and proved this convergence to be uniform over a large class of alternatives, so that asymptotic normality under local alternatives could be derived as a corollary. Rank tests are distribution-free in only a limited number of linear models. In general linear models, however, alignment can be applied to obtain asymptotically distribution-free procedures. This kind of result was obtained by Adichie (1978), among other workers, in the late 1970s, based on contiguity techniques.

Design: We apply the Chernoff-Savage approach to derive the asymptotics of an aligned rank statistic in the special case in which a linear analysis of variance model has an orthonormal design and repeated measurements are given. Formally, we cannot apply Adichie's results here, since our centered design matrices are not of full rank. This model is particularly useful in testing for linearity in partly linear models, even when repetitions are not present but there are enough data to do some grouping.

Results: The statistic in this repeated measurement model turns out to be the difference of 2 quadratic forms of a vector with Wilcoxon-type components, with an asymptotic χ2 distribution regardless of the choice of the aligner. Simulation results on the rate of rejection of the null hypothesis when testing for linearity in nonparametric regression with standard Cauchy random errors are discussed.
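As a schematic rendering only (not the authors' exact construction), a statistic of this type, built from a vector of Wilcoxon-type components and two projection matrices induced by the centered orthonormal design, might be written as:

```latex
% Schematic: W_n collects the Wilcoxon-type components; A and B are
% projection matrices determined by the orthonormal design. The limiting
% law does not depend on the aligner, as noted in the Conclusions.
T_n \;=\; W_n^{\top} A\, W_n \;-\; W_n^{\top} B\, W_n
\;\xrightarrow{\;d\;}\; \chi^2_q ,
\qquad q = \operatorname{rank}(A - B)
```

When $B \le A$ are both projections, the difference $A - B$ is itself a projection, which is what yields the chi-square limit with degrees of freedom equal to its rank.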

Conclusions: Investigation into why the asymptotics of the test statistic do not depend on the aligner shows this to be due to cancellation in the location case and also choice of score function in the scale case. Our approach can be easily extended to the multivariate case (multivariate analysis of variance) using a projection method.

Medmicroscopy Telepathology Implementation Between a Regional Care Teaching Facility and a Community Hospital in New Jersey: Initial Hiccups and Recovery

Maneesha Pandey, MD1 ([email protected]); Cyril D'Cruz, MD2; M. Pessin-Minsley, MD2; Robert Borrello, BS.2 Departments of 1Pathology and 2Laboratory Medicine and Pathology, Newark Beth Israel Medical Center, Livingston, NJ (affiliate of Saint Barnabas Medical Center).

Context: Ten pathologists are affiliated with and rotate between a regional care teaching hospital, Newark Beth Israel Medical Center (669 beds), and its satellite community hospital, Clara Maass Medical Center (465 beds). Only 1 pathologist is present at the satellite hospital, which is a half-hour drive from the medical center. The medmicroscopy system and software for this setup were installed with the objective of functioning as a telepathology instrument for intraoperative consultations, case discussions, and continuing medical education between and within the 2 hospitals. The equipment was operational within 2 weeks of installation. Initial technical and software problems, as well as the user friendliness of the system for the pathologists, were assessed to evaluate the degree of success of the system in this setup.

Technology: The installed medmicroscopy system by Trestle Corporation (Irvine, Calif) consists of 2 microscope sites, one for each hospital, and multiple viewer sites for the users. The medmicroscopy 2.9v software was installed in computers at both microscope and viewer sites. The microscope sites have an automated microscope (Olympus BX51 [Melville, NY] with ×2, ×4, ×10, ×20, ×40, and ×60 lens objectives) with robotic arm, digital camera, and a control panel attached to a standard personal computer. The viewer sites have only a standard personal computer. The system transmits 24-bit, true-color, high-fidelity images in real time via Internet/intranet protocol and local area network simultaneously to multiple viewer sites.

Design: Technical and software problems were monitored and documented from July 14, 2003, to September 9, 2003. A laboratory computer technician worked to solve these problems. The 10 pathologists were interviewed to assess how comfortable they were in using the system.

Results: The technical and software problems consisted of difficulty scanning slides (including poor digital image quality at low power), shutdown of the entire system every 30 minutes, an impaired ×4 lens objective, and failure of the system to connect to the local host. All of these problems were addressed and fixed in a reasonable time. The pathologists' level of comfort using the software depended on previous exposure to computers. At the end of 2 months, 6 of the 10 pathologists were comfortable using it, 3 were planning to use it, and 1 was still learning. A total of 5 cases (1 frozen section, 2 rapid cytology assessments, and 2 controversial cases) were discussed between the 2 hospitals. The major use of the system has been continuing medical education through intradepartmental conferences and research. Pathology presentations improved and were commended at these conferences.

Conclusions: Two months into implementation, the technical and software glitches ranged from minor to moderate and all were solved. The pathologists are continuing to learn and apply the system with greater ease. The medmicroscopy system implementation appears to have been successful in this setup and will continue to aid in efficient functioning of pathology departments of both hospitals.

Pathology Laboratory Information System Using a Structured Document Report and Image-Enhanced Report System in a Commercial Laboratory Environment

Rae Woong Park, MD1 ([email protected]); Sang Yong Song2; Wooyoung Jang3; Sang-yeop Yi4; Hee-Jae Joo.1 1Division of Medical Informatics, Department of Pathology, Ajou University School of Medicine, Suwon, Republic of Korea; 2Department of Pathology, Sungkyunkwan University School of Medicine, Samsung Medical Center, Seoul, Republic of Korea; 3Department of Pathology, Hallym University College of Medicine, Chunchon, Republic of Korea; 4Department of Pathology, Kwandong University College of Medicine, Seoul, Republic of Korea.

Context: We sought to introduce an image-enhanced report and structured document report system in gynecologic cytology in a commercial laboratory to enhance the quality of pathology diagnostic service and to reduce turnaround time.

Technology: The program was implemented in Microsoft Visual Basic 6.0 (Microsoft Corporation, Redmond, Wash). Microsoft SQL server software was used running under a Windows 2000 operating system.

Design: The Bethesda System 2001 cytology report guideline can be easily converted into the structured document report system. Every gynecologic cytology data-processing document is computerized to create a paperless laboratory environment. Cytologists perform the screening process and make screening diagnoses by clicking on the computer monitor. A photograph taken by the cytologist during screening is incorporated into the report. The user interface for photograph taking is kept simple and convenient to reduce the workload. Screened cases for which an image has been taken are sent to pathologists. The pathologist on duty makes diagnoses and generates reports by clicking on the computer monitor. A pathologist can add to or modify the preliminary diagnoses and, if the image is not satisfactory, can replace it directly. Various quality control requirements, such as specimen adequacy, slide-staining quality, and statistics on the frequency of intraepithelial lesions, are incorporated into the laboratory information system. After the content of each report is confirmed, the report is generated as a portable document format (PDF) file. The generated PDF files are transferred directly and quickly to local agencies using the Internet file transfer protocol (FTP) service. The local agency then prints the PDF files on a color laser printer and delivers them to client hospitals.
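The PDF delivery step could be scripted along these lines. This is a hedged sketch: the host, credentials, agency folder layout, and file-naming scheme are all hypothetical assumptions; only the `ftplib` standard-library calls are real.

```python
# Sketch of the report-delivery step: each confirmed report, already
# rendered as PDF bytes, is pushed to a local agency's FTP drop folder.
# The path layout and naming scheme below are assumptions for illustration.
from ftplib import FTP
from io import BytesIO

def remote_pdf_path(case_id: str, agency: str) -> str:
    """Destination path for one report PDF (hypothetical layout)."""
    return f"/incoming/{agency}/{case_id}.pdf"

def deliver_report(pdf_bytes: bytes, case_id: str, agency: str,
                   host: str, user: str, password: str) -> str:
    """Upload one generated PDF over FTP and return the remote path."""
    path = remote_pdf_path(case_id, agency)
    with FTP(host) as ftp:        # FTP(host) connects immediately
        ftp.login(user, password)
        ftp.storbinary(f"STOR {path}", BytesIO(pdf_bytes))
    return path
```

Keeping the path-building logic separate from the transfer itself makes the naming scheme easy to test without a live FTP server.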

Results: With this laboratory information system, based on a structured document report and image-enhanced report system, we were able to maintain a specimen turnaround time of less than 48 hours. The images in the image-enhanced report helped in explaining the report to the patient. More comments and textual information can be provided to clinicians, because for experienced pathologists typing on a computer keyboard is more convenient than handwriting. Real-time analysis of diagnoses was helpful in improving laboratory quality control.

Conclusions: The structured document report and image-enhanced report system can be easily constructed, especially in gynecologic cytology. This system helps reduce turnaround time in a laboratory. Clinicians and their patients willingly accept the image-enhanced report.

Implementation of a Central Autopsy Database for a Multihospital Medical Center

Reade A. Quinton, MD1 ([email protected]); William K. Lawrence, UT1; Philip J. Boyer, MD, PhD1; David Dolinak, MD2; Jeffrey J. Barnard, MD.2 1Department of Pathology, University of Texas Southwestern Medical Center, Dallas; 2Southwestern Institute of Forensic Sciences, Dallas, Tex.

Context: Hospital autopsies play an invaluable role in the correlation of clinical and pathologic findings, teaching, and research. Despite the importance and value of this service, autopsy data are rarely easy to search, even when the reports are incorporated into a hospital laboratory information system (LIS). In July 2002, the office of the Dallas County Medical Examiner assumed supervision of the University of Texas Southwestern Medical Center autopsy service. This transition provided a unique opportunity to address problems with the service. Prior to 2003, autopsy reports were generated as Microsoft Word (Microsoft Corporation, Redmond, Wash) documents and were not included in the medical center LIS. These files were stored on a local desktop, with no systematic mechanism for backup, data analysis, or data retrieval. In addition, the autopsy service covers multiple hospitals, which are not all linked to the same LIS. Although there were plans to update the aging LIS, the medical center was still a few years away from making significant changes. A large number of issues needed consideration before a successful transition from a Word-based system to a database system could be made. Limited literature exists to guide the implementation of an autopsy database and reporting system.

Technology: Given the current state of the medical center LIS, emerging patient-confidentiality considerations, and the limited budget that the autopsy service commands, an autopsy database and reporting tool was designed using FileMaker Pro 5.5 software (FileMaker Inc, Santa Clara, Calif). The database is contained on a central server, accessible from both Mac and PC platforms.

Design: Primary interests in the design of the client-server autopsy database included ease of use and data entry, ease of accessibility of reports across a large campus, security of the data, and potential for data mining, quality assurance, and research efforts. In addition, the database needed to generate printed reports similar to those previously issued, and it needed to be compatible with any future LIS, which will be completed by 2005. Most importantly, by building a personalized system we could specifically fill the needs of our service and continue to modify the system as new ideas or complications arose.

Results: We provide a discussion and demonstration of our autopsy database, highlighting (1) systematic consideration of the numerous issues that confront implementation of an autopsy database, (2) documentation of choice of hardware and software, (3) training considerations, (4) preimplementation and postimplementation usage patterns (eg, turnaround time) as a measure of success, (5) pitfalls, and (6) cost considerations.

Conclusions: This project resulted in an autopsy database and reporting system that met our diverse interests and remains flexible for future additions and projects. Future goals include incorporation of the autopsy digital image archive and fusion with the future medical center LIS.

Development and Deployment of a Biomaterial Tracking and Management (BTM) System Into Anatomic Pathology Practice to Support Tissue Banking Initiatives

Mark Tuthill, MD ([email protected]); Azadeh Stark, PhD1; David Chen, MD1; Rebbeca Robinette1; Azita Sharif, MS, MBA2; Richard Zarbo, MD, DMD.1 1Department of Pathology and Laboratory Medicine, Henry Ford Health System, Detroit, Mich; 2Daedalus Software, Cambridge, Mass.

Context: To support our tissue procurement and tissue banking initiatives, we have developed and implemented an Internet-accessible database application. This application enables pathology personnel to rapidly accession and track tissue samples and supports real-time, Health Insurance Portability and Accountability Act (HIPAA)-compliant queries from researchers looking for samples. By using thin client technology, we can easily deploy workstations at the point of service in the anatomic pathology laboratory or tissue bank. Furthermore, because the application is Internet based, researchers are able to access the system and run queries from nearly any locality.

Technology: The system is currently deployed on a Dell PowerEdge server (Dell Inc, Round Rock, Tex) running the Windows 2000 Server operating system (Microsoft Corporation, Redmond, Wash). Internet services are provided through Microsoft Internet Information Server. The Biomaterial Tracking and Management (BTM) system is capable of working with any relational database management system; our implementation uses MS-SQL. Application coding was managed with a variety of tools but ultimately generates JavaScript and Microsoft active server pages (ASP). Windows client workstations require only Microsoft Internet Explorer, along with barcode printer and scanner drivers (USB or serial).

Design: Design, development, and implementation have occurred in 3 discrete phases: pilot systems development; operational systems deployment and integration; and enterprise interfacing. Currently, we have completed pilot testing of most system components. The second phase of the project is now under way, with integration of the software into the clinical workflow allowing specimens to be added to the repository as they are received. Steps include installation of workstations in the surgical pathology gross dissection areas. Implementation of the BTM is occurring in conjunction with implementation of an anatomic pathology laboratory information system (AP-LIS) that will use the same workstation hardware. Each workstation will have access to the BTM, Internet e-mail, the anatomic and clinical pathology information systems, and the Henry Ford Health System clinical information system (CarePlus). Each workstation will also be equipped with a barcode label reader and a barcode printer.

The enterprise interface phase of the project focuses on interfaces between the BTM and the AP-LIS, eliminating effort and error due to duplicate entry of data at sample accession. This final step will require a highly coordinated team approach on the part of Pathology Informatics, Henry Ford Health System information technology, and our commercial partners.

Results: Demographic and pathologic data have been entered from a total of 2160 individuals who consented to donate surgically excised tissue and 928 patients who consented to donate blood samples.

Conclusions: Completion of this project during the next 2 years will link our clinical information systems to the BTM, allowing for an efficient and reliable system for tracking and managing tissue specimens. Simultaneously, the system will tightly secure clinical data, ensure patient confidentiality, and support researchers' needs to mine tissue banks for appropriate sample selection.

Corresponding author: Michael J. Becich, MD, PhD, University of Pittsburgh Medical Center, Department of Pathology, 5230 Centre Ave, 3rd Floor, Suite 301, Pittsburgh, PA 15232 ([email protected]).

Reprints not available from the authors.