Business Intelligece Data Warehouse Consultants Atlanta

Knowledgy Consulting, llc

Atlanta's Premier Business Intelligence and Data Warehouse Consultants


Predictive Analysis with SQL Server 2008

ER Data ModelsI've always found predictive analysis and data mining to be interesting and fun. I feel like a child walking through a cave amazed at the disoveries they are finding. So I couldn't wait to play around with the new predictive analysis tools in SQL Server 2008 Analysis Services. SSAS provides a complete data mining platform that is easy and fun to use. Organizations can use SSAS to infuse insight and prediction into everyday business decisions. The 2007 Office system uses Pervasive delivery through the Data Mining Add-Ins. These tools are a blast to use and will help you uncover knowledge hidden in your data. The comprehensive development environment and extensible range of innovative data mining algorithms combined with the enterprise-level scalability and manageability of SQL Server Analysis Services makes SQL Server 2008 an ideal and easy way to bring the benefits of predictive analysis to your enterprise. The predictive analysis capabilities of MSSQL 2008 makes it easy to incorporate intelligence into reporting, data integration, OLAP analysis, and business performance monitoring. It will help your business drive increased business agility, create a tangible competitive advantage and bottom line, grow revenues. The ability to extend the data mining technologies of SQL Server through custom algorithms and visualizations, together with the ability to embed predictive functionality into line-of-business applications makes SQL Server 2008 a powerful platform for introducing predictive analysis into existing business processes to add insight and recommendations into every operation.

Linked Server Execution

Let's say you run the following query on a linked server:

Select *

From LinkedServerName.DatabaseName.SchemaName.TableName 
Where SomeField = 'SomeFilter'

Do the records get filtered at the target database, or is the whole table sent to the requesting server?

Linked servers are an excellent way to get data in real time from one server to another. Incorrectly written linked server queries can quickly decrease system performance on one or both servers. The query optimizer doesn't always work as you would expect. I often see queries that join a local table to two remote tables and the queries take hours to run. That's because the
local optimizer doesn't know which records to request from the remote table.

It therefore requests that the remote server transmit the entire table, and all that data is then loaded into a temporary table and the join is done locally. Unfortunately, because the local table is a temporary table -- and not a physical table on the source system -- the indexes on the remote table do not get created on the temporary table. Because of the lack of indexes,
expected query execution time skyrockets.

There are a couple of techniques you can use to improve query response time. The first is to create a stored procedure on the remote database and have it return a record set, being a subset of the remote tables, which is then loaded into a local temporary table. It can then be indexed as needed. The trick with this method is to provide an input variable to the remote
procedure where input values can be passed to. Thus, you will reduce the number of returned records by as much as possible. Fewer records will reduce the run time of that stored procedure as well as the network latency on transferring those records from the remote system to the local system.

The second technique you can use is a variation of the first method. You create local temporary tables for each of the remote tables and transfer over the columns and records needed from each of the remote tables. Next, index the tables as needed and join the temp tables locally.

While the second technique is easier and faster to set up and implement, the first method gives you a greater performance savings, as typically less data needs to be transferred between servers.

A new feature in Sql Server 2008 allows linked servers to run INSERT...EXEC statements. Now you'll be able to insert data into a table locally by executing a stored procedure on a seperate server or vica versa. Pretty handy!

Understanding *.mdmp / sqldump?.txt files

The .mddmp and .txt files are created by access violations. There are several things you can get from them. Here's what you may see inside the file:

SqlDumpExceptionHandler: Process 20 generated fatal exception c0000005
EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process.

SQL Access Violations are caused by "something" causing SQL Server itself to encouter an error it doesn't know how to respond to. The.txt file will have the input buffer with the query that caused the problem. You can make sure a developer didn't write some wacky code. It also has the memory address. If you see something like "Access Violation occurred reading address 0000000000000000 " it is most likely a null pointer. Unless you have a 3rd party component, it is probably a SQL bug. Make sure you have the latest build and call support. Also, Check the ERRORLOG's for extra info.

What is a "data sublanguage"?

In relational database theory, the term sublanguage, coined by E. F. Codd in 1970, refers to a computer language used to define or manipulate the structure and contents of a relational database management system (RDBMS). Typical sublanguages associated with modern RDBMS's are QBE (Query by Example) and SQL (Structured Query Language). In 1985, Codd encapsulated his thinking in twelve rules which every database must satisfy in order to be truly relational. The fifth rule is known as the Comprehensive data sublanguage rule, and states:

A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following items:
- Data definition
- View definition
- Data manipulation (interactive and by program)
- Integrity constraints
- Authorization
- Transaction boundaries (begin, commit, and rollback)

A sublanguage is simply a proper subset of a language devoted to some identified task. SQL has DML and DDL sublanguages. Some languages support embedding SQL as a sublanguage through a pre-processor etc.

Types of Database Management Systems

Here's a brief overview of the majority of the different DBMS's:

Hierarchical - The hierarchical data model organizes data in a tree Structure. There is a hierarchy of parent and child data segments.

Network - the network model permits the modeling of many-to-many relationships in data.

The above dbms are pre-relational dbms's.

Relational - A database based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures,
storage and retrieval operations and integrity constraints.

Object/relational - database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems.

Object-Oriented Model - Object DBMSs add database functionality to object programming languages.

Semistructured - the information that is normally associated with a schema is contained within the data, which is sometimes called ``self-describing''. In such database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application.

Associative - divides the real-world things about which data is to be recorded into two sorts:
Entities are things that have discrete, independent existence. An entity's existence does not depend on any other thing. Associations are things whose existence depends on one or more other things, such that if any of those things ceases to exist, then the thing itself ceases to exist or becomes meaningless.