DESCRIPTION
1. Technical Field
This invention relates to the field of computer application programs and more specifically to a method and means for combining the data contained in a number of tables having dissimilar formats.
2. Background of the Invention
Computer spreadsheet applications are a popular means of relating a large amount of numerical data with predetermined mathematical relationships. Specifically, a spreadsheet typically comprises a plurality of cells, arranged in rows and columnswherein each cell is identified by its X-Y position in the spreadsheet and further wherein data within the cell may be manipulated or processed according to its relative position with respect to other cells. In addition to two dimensional arrays, somespreadsheets may include other planes for associating cells, thus providing a spreadsheet structure having three or more dimensions.
Groups of cells within a spreadsheet which may be logically grouped may be thought of as a table wherein a single spreadsheet may contain multiple tables. In other words, a table may comprise a portion of a spreadsheet. In addition, a table mayexist in a static form such as a table which comprises a portion of a word processing document.
Occasionally, it is desirable to combine the data contained in multiple tables into a single master table. Prior spreadsheet application programs have provided only limited capability for table aggregation wherein only source tables havingidentical formats, in terms of size and category designations, could be combined. No technique is known which allows any number of tables having virtually any configuration to be automatically combined.
SUMMARY AND OBJECTS OF THE INVENTION
Accordingly, it is an object of the present invention to provide a method and system for aggregating tables having dissimilar formats.
It is another object of the present invention to provide a method and system for aggregating tables based on specific categories of information.
It is still another object of the present invention to provide a method and system for aggregating tables based on generic categories of information.
In summary, the present invention contemplates a method and system for automatically aggregating tables having a variety of configurations or layouts. Specifically, tables having a variety of categories with multiple divisions within thosecategories may be combined wherein rows and columns are automatically created in a destination table based on the categories and divisions of one or more source tables. In accordance with the teachings of the present invention, a plurality of sourcetables are selected as input to the system. Categories for aggregation are then specified by a user or are automatically generated based on the categories contained within the source tables. Once the desired categories are specified, mapping tables forrows and columns are created wherein each mapping table comprises an array of pairs of values wherein each pair comprises a first value for identifying a source table location and a second value for identifying a destination table location. The systemthen conducts a binary search based on each pair in the mapping table to find the correct location in the destination table and to apply the desired table mapping by performing the desired mathematical function on the values in the source and destinationtables (e.g., summing the values in the appropriate locations in the source and destination tables).
BRIEF DESCRIPTION OF THE DRAWINGS
These and other objects may be completely understood through the detailed description of the invention below and the accompanying drawings in which:
FIG. 1 is a diagram showing a prior aggregation technique.
FIG. 2 is a diagram showing the aggregation of spreadsheets having dissimilar formats.
FIG. 3 is a diagram showing the automatic aggregation of multiple spreadsheets based on specific categories of information.
FIG. 4 is a diagram showing the aggregation of spreadsheets based on generic categories of information in accordance with the teachings of the present invention.
FIG. 5 is a diagram of a table demonstrating the cell location identifying convention used in the operation of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention contemplates a method and system which allows the consolidation or aggregation of data in disparate tables into a single aggregate table which summarizes that data. While the present invention is adapted for aggregatingvirtually any type of table, in the discussion below, spreadsheets are shown as an example of one application of this invention. As shown in FIG. 1, in prior systems, only spreadsheets having identical configurations could be combined. The example ofFIG. 1 demonstrates the aggregation capabilities of the popular Lotus 1-2-3 spreadsheet application program available from Lotus Development Corp. In the system 100, source spreadsheets 102, 104 represent typical spreadsheets wherein various categoriesare arranged along the vertical axis and various divisions are aligned along the horizontal axis of the spreadsheet, thus forming a plurality of cells arranged in rows and columns. In the example of system 100, financial categories are aligned along thevertical axis of the spreadsheets 102, 104 and yearly divisions are aligned along the horizontal axis of the spreadsheet.
In the system 100, source spreadsheets 102, 104 may be aggregated into an aggregate spreadsheet 106 which must be the same configuration as source spreadsheets 102, 104 wherein an identical number of rows and columns must be used with identicalcategories and divisions aligned along each vertical and horizontal axis, respectively.
In contrast, the present invention provides an improved method and system wherein source tables of virtually any size and configuration may be combined in an aggregate table based on user-defined, or automatically generated criteria. Referringnow to FIG. 2, in a first aspect of the present invention, the source spreadsheets 202, 204 may be combined into a destination spreadsheet 206 wherein the source spreadsheets 202, 204 may contain a different number of rows and columns having differentcategory and division designations. For example, spreadsheet 202 may contain the financial categories "Cost", "Profit", "Sales" and "Inventory" aligned along the vertical axis with divisions for the years 1986-1990 aligned along the horizontal axis inascending order; and spreadsheet 204 may contain the financial categories "Sales", "Profit" and "Losses" aligned along the vertical axis with divisions for the years 1989-1986 aligned along the horizontal axis in descending order.
The present invention provides a method and means for automatically aggregating source spreadsheets 202, 204 to generate a destination spreadsheet 206 having all of the categories and divisions of both spreadsheets 202, 204. This aspect of thepresent invention is referred to as dynamic aggregation. In the system 200, the respective fields of the source spreadsheets 202, 204 may be combined according to predefined relationships such as summing, subtracting, etc., as represented by fields 208,210 which are summed to generate field 212. In cases wherein there are no common categories between source spreadsheets, new fields are created in the destination spreadsheet 206 as represented by fields 214, 216 and fields 218, 220, respectively.
In another aspect of the present invention, referred to as static aggregation and shown in FIG. 3, specific fields may be specified by the user for aggregating in the destination spreadsheet 302. For example, in the system 300, sourcespreadsheets 202, 204 described above are used to generate aggregate spreadsheet 302 which contains aggregates of the "Cost" and "Profit" fields of spreadsheet 202 and the "Profit" fields of spreadsheet 204. In this example, exemplary source fields 208and 210 of spreadsheets 202, 204, respectively, may be summed to create destination field 306 and the cost fields, e.g., cost field 214, are directly copied into aggregate spreadsheet 302 to generate new cost fields, e.g. cost field 304.
In the above examples, the respective aggregate spreadsheets are generated from source spreadsheets having nonvariant or specific categories, or categories for only one type of item. In another aspect of the present invention, the presentinvention contemplates a method and system for aggregating spreadsheets having generic categories of information. Referring now to FIG. 4, the spreadsheets 202' and 204' are similar to the spreadsheets 202, 204 with the exception that the "Sales" fieldshave been modified. For example, sales field 404 has been modified to represent sales of "widgets" for 1986 and sales field 406 has been modified to represent sales of "cogs" for 1986. Accordingly, in addition to all of the features described above,the present invention contemplates a method and system wherein a generic aggregate field may be defined, such as field 408, which represents total sales for both widgets and cogs for 1986.
According to the principles of the present invention, inputs to the inventive aggregation system comprise a list of input tables and an optional output table template which may be specified by a user. If no output table template is specified,one is generated by scanning all of the input tables and including each unique category and division found in the input tables. This is performed for each axis.
Referring now to FIG. 5, according to the principles of the present invention, input data is generated in the form of a list of tables designated:
where N is the total number of input tables given.
The value of a particular element (or entry) of a table is represented in the form:
where T.sub.i is the i.sup.th table in the input list, j is the row number and k is the column number of the table entry. The number of data rows and columns contained in table T.sub.i is indicated by r.sub.i and c.sub.i, respectively. Thecategory and division elements are treated as special table entries with 0-value indices. A table incorporating the above convention is shown in FIG. 5.
An output table template (either automatically generated or specified by a user), is referred to as O, each element being listed as O[i,j]. In the preferred practice of the present invention, the output table template is initialized wherein alldata values are set to zero and all unique categories and divisions are stored in the elements O[i,0], and O[0,j], respectively. The maximum data row and column of the output table is designated r.sub.O and c.sub.O.
The GenerateTable function is used to automatically generate an output table template:
______________________________________ GenerateTable({T.sub.1, T.sub.2, . . ., T.sub.N }, O) for each table, T.sub.i .epsilon.{T.sub.1, T.sub.2, . . ., T.sub.N } { for r.sub.in = 1 to r.sub.i { for r.sub.out = 1 to r.sub.o if T[r.sub.in, 0]== O[r.sub.out, 0] goto RowFound; r.sub.o ++; O[r.sub.o, 0] = T[r.sub.in, 0]; RowFound: } for c.sub.in = 1 to c.sub.i { for c.sub.out = 1 to c.sub.o if T[0, c.sub.in ] == O[0, c.sub.out ] goto ColFound; c.sub.o ++; O[0, c.sub.o ] = T[0,c.sub.in ]; ColFound: } } } ______________________________________
A subtable is referred to with index ranges instead of scalar constants. For example, while an entire output table may be referred to by O, a data area may be designated as follows:
The vector of row categories is specified as follows:
In the preferred practice of the present invention, a mapping list is used to map source table row and columns to corresponding destination output (or aggregate) table row and columns. A mapping list is written as a list of pairs M={(s.sub.1,d.sub.1), (s.sub.2, d.sub.2), ... (s.sub.i,d.sub.i)} where each s.sub.i represents a source row or column and each d.sub.i represents a destination row or column (a mapping list contains either all source and destination row numbers or all source anddestination column numbers, but not both).
The mapping process of the present invention is defined by the pseudo code DataConsolidate listing below which is set forth in the syntax of the well known "C" programming language. The following algorithm receives as input two arguments: aninput table list and an output table template (which may be automatically generated by the GenerateTable subroutine or specified by a user), specified with the element identification convention set forth above.
______________________________________ DataConsolidate({T.sub.1, T.sub.2, . . ., T.sub.N }, O) for each table, T.sub.i .epsilon.{T.sub.1, T.sub.2, . . ., T.sub.N } { M.sub.r = Map(O[1 . . . r.sub.o, 0], T.sub.i [1 . . . r.sub.i, 0]); M.sub.c =Map(O[0, 1 . . . c.sub.o ], T.sub.i [0, 1 . . . c.sub.i ]); for each pair, (s.sub.r, d.sub.r), in M.sub.r for each pair, (s.sub.c, d.sub.c), in M.sub.c Apply(O, T.sub.i [s.sub.r, s.sub.c ], d.sub.r, d.sub.c); } ComputeStats(0); }______________________________________
The map subroutine called by DataConsolidate is used to create a mapping list from a source and a destination category vector and from a source and destination division vector. Instead of performing an equality comparison to locate matchingcategories and matching divisions, the output table uses categories and divisions as a pattern to match against all categories and divisions in a source table. A pseudo code listing for the map subroutine is set forth below:
______________________________________ Map(V.sub.o, V.sub.i) NewMap(M); //Create a new empty mapping list for o = 1 to .vertline.V.sub.o .vertline. for i = 1 to .vertline.V.sub.i .vertline. if(PatternMatch(V.sub.o [o], V.sub.i [i])) AddToMap((i, o), M); return(M); } ______________________________________
A PatternMatch subroutine receives as input arguments:
(1) a pattern string
(2) an object string.
If the pattern string contains no wild-card characters (used for generic aggregation), PatternMatch returns true if the pattern and object string are identical. Otherwise, PatternMatch returns true if the pattern string can be made to beidentical to the object string after substituting a `?` character in the pattern for any other single character, and an `*` character in the pattern for any zero or more characters. For example, the pattern string "c?t" would match an object string of"cat", and "*Sales" would match "Regional Sales". A specific implementation for the PatternMatch routine is not set forth herein. However, a PatternMatch routine suitable for use with the present invention is set forth in Chapter 9, Aho, Hopcroft, andUllman, "The Design and Analysis of Computer Algorithms," 1974 and published by Addison-Wesley.
The AddToMap subroutine called by the Map subroutine adds the first argument (a pair of numbers) to the second argument mapping list. It ensures that each pair is added only once to M (i.e., only unique pairs are allowed with no duplicates).
__________________________________________________________________________ AddToMap is coded as follows: NewMap(M) /* For simplicity, assume a large fixed allocation, k. In practice, we use dynamic re-allocation of the array as it grows. */ M.Array = AllocateArray(k); M.size = 0; } AddToMap((i, o), M) { /* Binary search */ iMin = 0; iMid = 0; iMac = M.size; while (iMin - = iMac) { iMid = iMin + (iMac-iMin)/2; sgn = ComparePair(M.Array[iMid], (i, o)); /* Pair already in array,return */ if (sgn == 0) return; /* Too low, look in upper interval */ if (sgn < 0) iMin = iMid+1; /* Too high, look in lower interval */ else iMac = iMid; } /* Not found, insert new pair at iMid position in array */ /* Move all pairs abovethe new pair up by one */ M.size++; for i = M.size to iMid+1 step-1 M.Array[i] = M.Array[i-1] M.Array[iMid] = (i, o); } ComparePair((i.sub.1, o.sub.1), (i.sub.2, o.sub.2)) { if(i.sub.1 -=i.sub.2) return(i.sub.1 -i.sub.2) else return(o.sub.1-o.sub.2); } An apply function adds the applied value to the corresponding element of the output table. A pseudo code listing for the apply function is set forth below: Apply(O, val, r, c) { O[r, c] + = val; } ComputeStats(0) /* This function needdo no work to compute a sum */ { } __________________________________________________________________________
While the above description was set forth in terms of additive mapping, the present invention may be used for aggregate functions other than summing with a small modification. For example, rather than the Apply subroutine adding entries directlyto the output table O, the routine enters intermediate values into a set of temporary tables of the same dimension as O. For example, to average values, two tables can be used, one for holding a cumulative sum, and one to contain the count of elementscontributing to the sum. In this embodiment, the DataConsolidate routine is modified to include a step which combines the intermediate values from the temporary tables such that each element in O is assigned to the quotient of the sum table divided bythe value in the count table. Similarly, a statistical variance function may be implemented through the use of three tables, a sum table, a sum of squares table and a count table. These features may be implemented according to the routine set forthbelow:
__________________________________________________________________________ Apply(O, val, r, c) /* Version of Apply to compute Average */ Count[r, c]++; O[r, c] + = val; } ComputeStats(O) { for r = 1 to r.sub.o for c = 1 to c.sub.o =0)ount[r, c] O[r, c]/ = Count[r, c]; } Similarly, a statistical VARIANCE function can be computed by using three temporary tables, a sum, and sum of squares, and a count. Apply(O, val, r, c) { Count[r, c]++; O[r, c] + = val; SumSq[r, c] + =val.sup.2 ; } ComputeStats(O) { for r = 1 to r.sub.o for c = 1 to c.sub.o = 0)ount[r, c] O[r, c].sup.2)/Count[r,umSq[r, c] c]; __________________________________________________________________________
In summary, an improved method for aggregating tables having dissimilar formats has been described. Accordingly, all of such uses and modifications will be apparent to persons of ordinary skill without departing from the spirit and scope of theappended claims.
* * * * *