An enterprise can enhance its information delivery and analytical
capabilities through three major development initiatives:
- Data Warehouse
- Data Mart
- Data Mining
First: Organize the information currently scattered among different
sources and store it in a data warehouse.
Scattered and incompatible data from different sources is gathered
together (extraction) and then cleansed and made consistent
(transformation).
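The extraction and transformation steps can be sketched as follows. This is a minimal illustration only: the two source layouts, the field names, and the `transform_policy` / `transform_claim` helpers are hypothetical and stand in for whatever operational systems and cleansing rules an actual project would involve.

```python
from datetime import datetime

# Hypothetical extracts from two operational sources with incompatible layouts.
POLICY_SYSTEM = [
    {"cust_id": "0017", "state": "ny ", "premium": "1,200.50", "start": "03/15/2001"},
    {"cust_id": "0042", "state": "NJ", "premium": "980.00", "start": "11/02/2000"},
]
CLAIMS_SYSTEM = [
    {"CustomerNo": 17, "ClaimAmt": 350.0, "ClaimDate": "2001-06-30"},
]


def transform_policy(row):
    """Cleanse one policy record into a single, consistent layout."""
    return {
        "customer_id": int(row["cust_id"]),                  # "0017" -> 17
        "state": row["state"].strip().upper(),               # "ny " -> "NY"
        "premium": float(row["premium"].replace(",", "")),   # "1,200.50" -> 1200.5
        "policy_start": datetime.strptime(row["start"], "%m/%d/%Y").date(),
    }


def transform_claim(row):
    """Cleanse one claim record so its key and date match the policy layout."""
    return {
        "customer_id": int(row["CustomerNo"]),
        "claim_amount": float(row["ClaimAmt"]),
        "claim_date": datetime.strptime(row["ClaimDate"], "%Y-%m-%d").date(),
    }


# Extraction gathers the rows; transformation makes them consistent.
warehouse = {
    "policies": [transform_policy(r) for r in POLICY_SYSTEM],
    "claims": [transform_claim(r) for r in CLAIMS_SYSTEM],
}
```

After this step both tables share the same `customer_id` key and date type, so records from the two sources can be joined.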
Second: Subset and process the relevant datasets in
a data warehouse to create one or more specialized data warehouses
or data marts. The data mart may be derived from a
data warehouse containing transactional records. Each data mart
contains the data needed for a related group of analyses. The
data is summarized at appropriate levels. The first and second
steps can be combined so that only the relevant data is transformed.
The summaries in the data warehouse or data mart are now available
for ad hoc queries and for integration by SAS into multidimensional
database (MDDB) cubes. These cubes can be accessed through the Internet
or an intranet and analyzed using appropriate middleware, e.g.,
SAS IntrNet or ASP. SAS IntrNet also makes it easy to create
a web-based user interface to enter parameters and run any SAS
program, and to view HTML-formatted output on the desktop.
Users can query summarized datasets through their browsers via HTML
forms or more active front ends employing
CGI, ASP, Visual Basic, or Java. MDDB cubes can be viewed through
the browser using products such as SAS OLAP Viewer or Microsoft
Office Web Components, or as Microsoft Excel pivot tables.
Example: Multiple records for each customer in a large auto
insurance data warehouse are summarized to create a data mart
consisting of (1) one record for each customer with just that
subset of fields needed for rating agency reporting and (2)
more highly summarized data needed to create multidimensional
database cubes for on-line analytical processing (OLAP).
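The two levels of summarization in this example can be sketched as below. This is an illustrative sketch, not the SAS-based process described above: the transaction records, the field names, and the `customer_mart` and `cube_cells` helpers are all hypothetical.

```python
from collections import defaultdict

# Hypothetical transactional records: several rows per customer.
TRANSACTIONS = [
    {"customer_id": 17, "state": "NY", "vehicle_class": "sedan", "premium": 600.0, "claims": 0.0},
    {"customer_id": 17, "state": "NY", "vehicle_class": "sedan", "premium": 600.0, "claims": 350.0},
    {"customer_id": 42, "state": "NJ", "vehicle_class": "truck", "premium": 980.0, "claims": 0.0},
    {"customer_id": 58, "state": "NY", "vehicle_class": "truck", "premium": 1100.0, "claims": 200.0},
]


def customer_mart(rows):
    """(1) Collapse the rows to one record per customer, keeping only the needed fields."""
    mart = {}
    for r in rows:
        rec = mart.setdefault(
            r["customer_id"],
            {"customer_id": r["customer_id"], "state": r["state"], "premium": 0.0, "claims": 0.0},
        )
        rec["premium"] += r["premium"]
        rec["claims"] += r["claims"]
    return list(mart.values())


def cube_cells(rows, dimensions, measures):
    """(2) Summarize further: one cell per combination of dimension values."""
    cells = defaultdict(lambda: dict.fromkeys(measures, 0.0))
    for r in rows:
        key = tuple(r[d] for d in dimensions)
        for m in measures:
            cells[key][m] += r[m]
    return dict(cells)


mart = customer_mart(TRANSACTIONS)
cube = cube_cells(TRANSACTIONS, ["state", "vehicle_class"], ["premium", "claims"])
```

Here `mart` holds one row per customer, while `cube` holds totals keyed by `(state, vehicle_class)`, the kind of pre-aggregated cell an OLAP front end would drill into.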
We develop data warehouses and data marts efficiently by utilizing
state-of-the-art middleware tools that allow us to reach diverse
operational databases and integrate this enterprise data into
a single end-to-end system. Since business users can interact
with the data warehouse or data mart directly through their Web
browsers (e.g., performing OLAP on SAS multidimensional
database cubes), they gain immediate access to critical management
decision information.
Third: Enhance decision support by adding data mining
tools to access and analyze the contents of the data warehouse.
Data mining is the process of selecting, exploring, and modeling large
amounts of data to uncover previously unknown patterns to achieve
business advantage. This technology integrates different statistical
and pattern recognition techniques such as neural networks, tree-based
models, churn analysis, and traditional statistics, allowing the
discovery of patterns, trends, exceptions, relationships, and
anomalies that might otherwise stay hidden. We apply data mining
methodology to a data warehouse or data mart to address the following
business problems:
- Demographic profiling of high-value customers using decision
tree models.
- Response modeling using a combination of regression methods,
selecting those that produce the best gain chart.
- Determining the hierarchy structure of the MDDB and defining
the drill-down path in OLAP applications. We exploit decision
tree models using the dimensions of the data cube as independent
variables and a high-value customer flag as a response variable.
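
The idea behind the last item can be sketched with a single-level simplification of a decision tree: rank each cube dimension by how much splitting on it reduces the Gini impurity of the high-value flag, and use that ranking as a candidate drill-down order. This is an assumption-laden illustration rather than the modeling procedure actually used; the customer records, the dimension names, and the helper functions are hypothetical, and a full tree would re-evaluate the remaining dimensions within each branch.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_reduction(rows, dimension, target):
    """Impurity drop obtained by splitting the rows on one dimension."""
    parent = gini([r[target] for r in rows])
    groups = defaultdict(list)
    for r in rows:
        groups[r[dimension]].append(r[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted


def rank_dimensions(rows, dimensions, target):
    """Order dimensions from most to least informative about the target flag."""
    return sorted(dimensions, key=lambda d: gini_reduction(rows, d, target), reverse=True)


# Hypothetical customer-level records with a high-value flag as the response.
CUSTOMERS = [
    {"state": "NY", "vehicle_class": "sedan", "age_band": "25-40", "high_value": 1},
    {"state": "NY", "vehicle_class": "truck", "age_band": "25-40", "high_value": 1},
    {"state": "NJ", "vehicle_class": "sedan", "age_band": "41-60", "high_value": 0},
    {"state": "NJ", "vehicle_class": "truck", "age_band": "41-60", "high_value": 0},
    {"state": "NY", "vehicle_class": "sedan", "age_band": "41-60", "high_value": 1},
    {"state": "NJ", "vehicle_class": "truck", "age_band": "25-40", "high_value": 0},
]

order = rank_dimensions(CUSTOMERS, ["vehicle_class", "age_band", "state"], "high_value")
```

In this toy data `state` separates high-value customers completely, so it ranks first and would be a natural top level for the drill-down hierarchy.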