1. The Xerox DocuShare CPX Extensible Database—Real Time Connection of XML Content
      1. White Paper
      2. April 2007
      3. Intake and Indexing—with a Twist
      4. Find and Repurpose Specific Content Strings
      5. Would You Like to Learn More?

    The Xerox DocuShare CPX Extensible Database—
    Real Time Connection of XML Content
    White Paper
    April 2007

    Xerox DocuShare - DocuShare CPX XDB White Paper -
    page
    With the majority of today’s business content passing through Internet-
    based networks, XML has become the de facto language for transferring
    structured information among many business applications and processes.
    XML is now the informational fluid for managing the movement of data
    across an organization in ways not previously possible. Its inherent
    flexibility enables myriad options to format and structure that information.
    But as a consequence, the structural variety of the files passed makes them
    impossible to connect without either a precise matching of the XML
    structure (schema) or the use of some intermediary translation. Effective
    integration requires advance knowledge of the detailed application or
    process schema, so that connection points can be predetermined and
    accommodated in the XML code. Without that “prior alignment,” conversion
    steps are often needed, involving costly and time-consuming human
    intervention.
    Harnessing the true potential of XML as the conduit of informational
    connectivity requires a seamless mapping of structured content regardless of
    the source file’s XML construction. Previously, this capability did not exist.
    Now, Xerox DocuShare CPX offers an extensible database (XDB) that
    enables simple, direct XML-to-XML connection, quickly and automatically
    linking diverse organizational content to accelerate business processes
    and productivity.
    DocuShare CPX Takes XML to a New Level
    Unlike many XML information-passing systems, the DocuShare CPX
    extensible database retains an original document (such as a Microsoft
    Word, Microsoft Excel, Adobe PDF, or Adobe FrameMaker file) while
    also providing direct access to the information contained within the
    document. The XDB summarizes that information into XML and then
    uses the converted XML to extract and share relevant data for other
    organizational needs, such as quickly creating reports that pull from
    multiple source documents.
    This capability not only applies to new documents created after business
    processes are defined, but also extends to archived or legacy documents
    that are already associated with a process. CPX XDB spans all structured
    information to identify common process touch points, eliminating manual
    intervention in mapping source document structures. To ensure adherence
    to established security policies, once information is brought into
    DocuShare CPX, its access permissions are enforced, whether the
    information is accessed in XML or in the original source format.
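    One way to picture permissions that hold across renditions is a single
    access check made on the document rather than on each rendition. The
    sketch below is illustrative only: the `StoredDocument` class, the
    `readers` set, and `read_rendition` are hypothetical names, not part of
    any DocuShare API, and real access control is considerably richer.

```python
from dataclasses import dataclass, field


@dataclass
class StoredDocument:
    """Hypothetical stand-in for a repository entry with several renditions."""
    doc_id: str
    readers: set  # users allowed to read this document (assumed ACL model)
    renditions: dict = field(default_factory=dict)  # e.g. "original", "xml"


def read_rendition(doc, user, kind):
    """Return a rendition only if the user may read the document."""
    # The check is made on the document, not on the rendition, so the same
    # permission applies to the source file and to its XML form.
    if user not in doc.readers:
        raise PermissionError(f"{user} may not read {doc.doc_id}")
    return doc.renditions[kind]


doc = StoredDocument(
    doc_id="contract-001",
    readers={"alice"},
    renditions={"original": b"...binary Word file...", "xml": "<contract/>"},
)
```

    Because both renditions pass through the same check, a user who cannot
    open the Word file cannot read its XML form either.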
    Over 80% of data within enterprises is estimated to be in unstructured
    formats like Microsoft Word and Excel as well as Adobe PDF file formats.
    There are 300 million Excel installations worldwide, 200 million PDF
    documents on the Web, and 100 million new Microsoft Office documents
    created every day.
    —Informatica, Inc., November 2006

    Figure: Extensible Database (XDB) Technology, XDB Intake and Indexing Process.
    Elements shown: Source Content Files (.xls, .pdf, .doc), Content Parsers,
    XML Data, XDB Indexer, and the DocuShare Repository, comprising the
    DocuShare File Store (Document Renditions) and the Relational Database
    (Metadata + XML Index Data).
    Files submitted to DocuShare are added to the DocuShare File Store and associated metadata is indexed into a relational database.
    When XDB processing is enabled, incoming files are also processed through a specific file parser that creates an XML file that represents
    the structure and hierarchy of information components. Each new XML file is then processed using a patented schema-less mapping
    algorithm that indexes the hierarchical content structure into a relational database. Once the XML representation is indexed, it is added
    to the DocuShare repository as an alternative rendition of the original content.
    Intake and Indexing—with a Twist
    DocuShare’s extensible database accomplishes these connectivity goals
    through its unique intake and indexing process.
    The process begins with the source content files, including today’s most
    common formats. With the standard DocuShare CPX content management
    system, source files are added to the DocuShare repository where they are
    stored and where metadata is added to facilitate content management.
    However, when the XDB is enabled, an additional process on the incoming
    content is performed in tandem. The original source file is passed through
    a content parser that creates an associated XML file. The XML file is stored
    in the DocuShare repository as a second rendition of the original document.
    The XML rendition is then passed through the CPX XDB Indexer,
    a technology used by DocuShare that indexes the content into a relational
    database management system (either Oracle or Microsoft SQL Server).
    The resulting XDB index in the database co-exists along with the metadata
    attached in the standard DocuShare CPX process, becoming part of
    a flexible DocuShare knowledge network through which users can easily
    search for and retrieve stored content.
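    The indexing step can be sketched as follows. This is a minimal
    illustration under stated assumptions, not the XDB Indexer itself: it uses
    a generic path-based index rather than the patented schema-less mapping
    algorithm, SQLite stands in for Oracle or Microsoft SQL Server, and the
    table `xdb_index`, the function `index_rendition`, and the sample XML are
    all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_index():
    """Create an in-memory table holding one row per indexed text value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xdb_index (doc_id TEXT, path TEXT, value TEXT)")
    return conn


def index_rendition(conn, doc_id, xml_text):
    """Walk the XML rendition and store (doc_id, element path, text) rows."""
    root = ET.fromstring(xml_text.strip())

    def walk(elem, path):
        text = (elem.text or "").strip()
        if text:
            conn.execute(
                "INSERT INTO xdb_index (doc_id, path, value) VALUES (?, ?, ?)",
                (doc_id, path, text),
            )
        for child in elem:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    conn.commit()


# Hypothetical XML rendition, as a content parser might emit for a contract.
SAMPLE = """
<contract>
  <party>Acme Corp</party>
  <clause><heading>Termination</heading><body>30 days notice.</body></clause>
</contract>
"""

conn = make_index()
index_rendition(conn, "contract-001.doc", SAMPLE)
rows = conn.execute(
    "SELECT path, value FROM xdb_index WHERE doc_id = ? ORDER BY rowid",
    ("contract-001.doc",),
).fetchall()
```

    No schema is declared for the incoming XML: the element path recorded with
    each value carries the structural context, so documents with different
    layouts can share one table.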

    Figure: XDB In Action. Pathways shown: Excel, Word, and E-form. Components
    shown: DocuShare Repository, XML Indexer, Query, Updated Information.
    Process manager uses the Excel Summarization Template to design an
    automated summarization spreadsheet in Excel. This spreadsheet can then
    be retained on the desktop or uploaded to DocuShare.
    Process manager creates a form in Excel utilizing the Excel Submission
    Template and uploads it to DocuShare.
    Process participants download and complete the forms, then upload them to
    DocuShare by clicking the Submit button.
    XDB intakes, processes, and indexes the information from each submission.
    Process manager views an aggregated report of all inputs in the Excel
    summarization spreadsheet. Information in this spreadsheet is automatically
    updated based on dynamic queries to the XDB.
    Because incoming content is converted to XML and indexed based on its
    structural organization, XDB reports can access information that is
    embedded in virtually any type of document, including Microsoft Word and
    E-form responses.
    Because the information is indexed based on contextual identifiers, the
    XDB can easily access information represented in the content and summarize
    it across documents whenever needed. These optional processes are
    easily enabled by the DocuShare CPX administrator, who specifies what
    types of content should be subject to XML conversion and XDB indexing.
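    As a rough illustration of such an administrator setting, the sketch below
    filters incoming files by extension. The names `XDB_ENABLED_TYPES` and
    `needs_xdb_processing` are hypothetical; the actual DocuShare CPX
    administration interface is not described in this paper.

```python
from pathlib import PurePath

# Hypothetical administrator setting: file types subject to XML conversion
# and XDB indexing. Other files are stored without an XML rendition.
XDB_ENABLED_TYPES = {".doc", ".xls", ".pdf"}


def needs_xdb_processing(filename, enabled=XDB_ENABLED_TYPES):
    """Return True if the file's extension is enabled for XDB processing."""
    return PurePath(filename).suffix.lower() in enabled


incoming = ["Q3-forecast.XLS", "logo.png", "contract-001.doc"]
to_index = [name for name in incoming if needs_xdb_processing(name)]
```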
    Find and Repurpose Specific Content Strings
    One of the greatest strengths of the XDB technology is its ability to find
    and assimilate specific content components from similarly structured
    business documents like contracts, presentations, or spreadsheets.
    The individual components can be retrieved and re-assembled by
    the XDB to create concise summaries of relevant information across
    multiple source documents.
    The components are found based on the XML context that is associated
    with them. For instance, a company may use standard contracts created
    in Microsoft Word as part of a specific business process. Each contract
    has a unique termination clause as part of its content, which is identified
    based on textual markers (headlines, bolding, underlining, page position,
    etc.) and tagged as context during the XML conversion. The XDB has a
    simple-to-use search function that queries the context information to find
    all occurrences of the termination heading. It then returns the text in the
    “Enterprises must recognize that the thousands of uncontrolled
    spreadsheets their employees use every day represent a significant
    risk. Poorly managed spreadsheets may—through negligence, incompetence
    or deliberate criminal conduct—result in significant business losses,
    exposure to legal liability, damage to reputation and unwelcome
    regulatory attention.”
    —From Gartner, Symposium/ITXpo 2006,
    “The Information Explosion and What
    to Do About It,” Toby Bell, October 2006

    paragraphs of just that clause for each contract file within a designated
    DocuShare collection. The retrieved content summary can be either saved
    to a report format or repurposed, wholly or in part, into another document
    through a simple cut and paste.
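    The termination-clause search can be sketched as a lookup by heading
    context across every XML rendition in a collection. The element names
    `section`, `heading`, and `para`, the function `find_clauses`, and the
    sample contracts are assumptions made for illustration; the paper does not
    specify the XML vocabulary the content parsers emit.

```python
import xml.etree.ElementTree as ET


def find_clauses(collection, heading):
    """Map each document name to the paragraphs under the matching heading."""
    wanted = heading.strip().lower()
    results = {}
    for name, xml_text in collection.items():
        root = ET.fromstring(xml_text)
        for section in root.iter("section"):
            title = (section.findtext("heading") or "").strip().lower()
            if title == wanted:
                paras = [(p.text or "").strip() for p in section.findall("para")]
                results.setdefault(name, []).extend(paras)
    return results


# Hypothetical collection: document name -> XML rendition of a Word contract.
collection = {
    "contract-a.doc": (
        "<contract>"
        "<section><heading>Payment</heading><para>Net 30.</para></section>"
        "<section><heading>Termination</heading>"
        "<para>Either party may terminate with 60 days notice.</para></section>"
        "</contract>"
    ),
    "contract-b.doc": (
        "<contract>"
        "<section><heading>TERMINATION</heading>"
        "<para>Terminates on breach.</para><para>Fees remain due.</para></section>"
        "</contract>"
    ),
}

summary = find_clauses(collection, "Termination")
```

    The result holds only the termination paragraphs for each contract, which
    could then be written to a report or pasted into another document.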
    This capability is especially useful for highly complex content structures
    such as those found in Excel spreadsheets. Excel content can vary
    from basic names of columns or rows to detailed cell ranges. The XDB
    content parser identifies spreadsheet content based on range names,
    attaches XML context data, and then passes it on to the XDB Indexing
    process and into the RDBMS. The content is then readily retrieved
    and shared on demand. Because Excel information so frequently drives
    corporate business processes, the XDB can be a particularly powerful tool
    for integrating systems around Excel spreadsheets or quickly accessing
    summarizations of Excel data from disparate sources.
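    A minimal sketch of attaching XML context to named ranges follows. It
    assumes the range names and cell values have already been read from the
    workbook into a plain dictionary; the function `ranges_to_xml` and the
    `workbook`, `range`, and `cell` element names are hypothetical, not the
    format produced by the XDB content parser.

```python
import xml.etree.ElementTree as ET


def ranges_to_xml(workbook_name, named_ranges):
    """Build an XML rendition with one <range> element per named range."""
    root = ET.Element("workbook", {"source": workbook_name})
    for name, cells in named_ranges.items():
        rng = ET.SubElement(root, "range", {"name": name})
        for value in cells:
            ET.SubElement(rng, "cell").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical named ranges taken from an expense spreadsheet.
xml_out = ranges_to_xml(
    "expenses.xls",
    {"location_of_travel": ["Rochester"], "total_cost": [1250.5]},
)
```

    Keying each value to its range name, rather than to a cell address, keeps
    the context stable when rows or columns move within the sheet.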
    Even further, the extensible database is impartial to the original source
    format of stored data. Once it is passed through the XDB Indexing
    process, identically named data from varying source documents and
    formats, such as from Word and Excel, can be retrieved to the same
    report. For example, a column labeled ‘location of travel’ from Excel-
    based expense reports can be combined with ‘location of travel’
    information contained in standard Word-based sales trip reports.
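    Once values are indexed under their labels, combining them across formats
    reduces to one query by label. The sketch below assumes a simplified index
    table with a `label` column and uses SQLite in place of the production
    RDBMS; the table layout, the `report` function, and the sample rows are
    illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xdb_index (doc_id TEXT, source_format TEXT, label TEXT, value TEXT)"
)
# Hypothetical index rows from an Excel expense report and a Word trip report.
conn.executemany(
    "INSERT INTO xdb_index VALUES (?, ?, ?, ?)",
    [
        ("expense-17.xls", "excel", "location of travel", "Chicago"),
        ("expense-17.xls", "excel", "total", "812.40"),
        ("trip-report-04.doc", "word", "location of travel", "Denver"),
        ("trip-report-04.doc", "word", "customer", "Example Co."),
    ],
)


def report(conn, label):
    """Return (doc_id, value) for every indexed item carrying the label."""
    cur = conn.execute(
        "SELECT doc_id, value FROM xdb_index WHERE label = ? ORDER BY doc_id",
        (label,),
    )
    return cur.fetchall()


travel = report(conn, "location of travel")
```

    The query never inspects `source_format`, which is what lets Word and
    Excel values land in the same report.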
    Accumulated studies by audit firms since 1998 show that as many as 94%
    of corporate spreadsheets may have some form of error, ranging from
    negligible to extremely serious.
    —Results of research by R. Panko, “What
    We Know About Spreadsheet Errors,”
    University of Hawaii, January 2005
    Faster, More Accurate Business Intelligence with XML Submission and Summary:
    Universities Space Research Association
    The Universities Space Research Association (USRA), a non-profit
    research organization chartered to foster cooperative research,
    development, and education associated with space science and technology,
    helps the National Aeronautics and Space Administration (NASA) manage its
    business intelligently. With billions of dollars worth of research and
    development projects currently underway, certain centers within NASA
    were facing efforts required to manage information resources for its
    financial and project performance reports. Managers were required to
    manually copy and paste detailed financial and project information from
    many disparate sources into numerous reports. This resulted in valuable
    time being spent consolidating data rather than analyzing it. For example,
    one report alone took up to 60 person hours each month to capture and
    collate the necessary information into useful reports. This manual process
    also generated a high number of transcription errors: an audit of one $700M
    program with over 00 milestones revealed a 0% discrepancy rate.
    USRA addressed this growing problem by creating a performance management
    tool with NASA that leveraged the XML submission and rendering capabilities
    built into the DocuShare CPX extensible database. USRA uses XDB as an XML
    hub for managing, storing, and synchronizing project data among source
    documents, including integration with the organization’s core systems from
    Oracle and SAP. The solution enables project managers to automate
    submission of content through XDB-enabled source documents, such as Excel
    spreadsheets. XDB then reassembles the XML content as required by each
    manager into accurate summary documents. $ .B of internal activity is now
    managed using the tool.
    The resulting time and labor efficiencies have made project performance
    information available to managers in a much more timely, accurate, and
    effective manner. By automating the process, report creation time was
    significantly reduced, from 60 person hours down to for example, and
    discrepancies were virtually eliminated. Now managers and analysts can
    spend time actually analyzing and using data rather than consolidating it.
    For more information, contact USRA’s Research Institute for Advanced
    Computer Science (www.riacs.edu), info@riacs.edu.

    Would You Like to Learn More?
    For more information on DocuShare CPX XDB, please call 1.800.735.7749
    or visit docushare.xerox.com.
    About DocuShare CPX
    Xerox DocuShare CPX, a highly intuitive and secure Enterprise Content
    Management (ECM) application, enables document-intensive organizations
    to dynamically capture, manage, retrieve, and distribute information
    easily, regardless of skill level or location. Part of the Xerox DocuShare
    family of ECM products, DocuShare CPX helps customers significantly
    improve productivity, streamline business processes, and reduce the
    time and cost of managing routine business documents and information.
    Leading the industry in speed of deployment and ease of administration
    and use, DocuShare CPX significantly reduces installation and complexity,
    and flexibly extends into an existing infrastructure, resulting in lower total
    cost of ownership and faster return on investments. Tightly integrated
    with Xerox Document Centre and WorkCentre Pro, DocuShare CPX can
    manage both hard copy and electronic content with unsurpassed ease
    and convenience.
    Xerox DocuShare Business Unit
    A Division of Xerox Global Services
    00 Hillview Avenue
    Palo Alto, California 9 0
    U.S.A.
    1.800.735.7749
    © 2007 Xerox Corporation. All rights reserved. Copyright protection claimed includes all forms and matters of copyrightable material and
    information now allowed by statutory or judicial law or hereinafter granted. Xerox, DocuShare, and WorkCentre are registered trademarks
    of Xerox Corporation. All other trademarks are the property of their respective companies and are recognized as such.
