This is the third post of my blog for the MIS 587 Business Intelligence course in the Eller College of Management. In this post, I am going to talk about the differences between unstructured and structured data. I will discuss the present state and volume of the various data types available to organizations. Then, there will be a small section on the limitations of data warehousing in analyzing different types of data. Finally, I will conclude by discussing where the role of the data warehouse is headed in the near future.
Differences between unstructured and structured data
Let us first look at an image which clearly illustrates the major difference between unstructured and structured data.
Furthermore, Wikipedia defines unstructured data as information that either does not have a pre-defined data model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. Structured data refers to any data that resides in a fixed field within a record or file. This includes data contained in relational databases and spreadsheets.
Unstructured data does not fit neatly into a schema or table. Forcing it into one results in irregularities and ambiguities that make it difficult to understand using traditional programs, as compared to data stored in fielded form in databases or annotated (semantically tagged) in documents. Structured data has the advantage of being easily entered, stored, queried and analyzed. At one time, because of the high cost and performance limitations of storage, memory and processing, relational databases and spreadsheets using structured data were the only way to effectively manage data. Anything that couldn't fit into a tightly organized structure would have to be stored on paper in a filing cabinet.
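To make the contrast concrete, here is a minimal Python sketch. The order records, the customer note, and the regular expression are all made up for illustration; they are assumptions, not data from any real system. It shows that a question about structured records is a simple filter on named fields, while the same kind of fact inside free text has to be dug out with a hand-written pattern.

```python
import re

# Structured: every record has the same named fields (hypothetical sample data),
# so answering a question is a simple filter on those fields.
orders = [
    {"customer": "Ana", "city": "Tucson", "total": 42.50},
    {"customer": "Ben", "city": "Phoenix", "total": 17.25},
    {"customer": "Cy", "city": "Tucson", "total": 8.00},
]
tucson_total = sum(o["total"] for o in orders if o["city"] == "Tucson")

# Unstructured: a similar fact is buried in free text (hypothetical note).
note = "Ana from Tucson called on 3 March and said her $42.50 order arrived late."

# A hand-written pattern can recover the dollar amount, but only for this phrasing.
match = re.search(r"\$(\d+(?:\.\d+)?)", note)
amount = float(match.group(1)) if match else None

print(tucson_total)
print(amount)
```

The pattern breaks as soon as someone writes "42 dollars" instead of "$42.50", which is the kind of irregularity that makes unstructured data hard for traditional programs.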
How Data Warehouse fits into analyzing this data
In today's world, much real-time data is unstructured. Many organizations believe that their unstructured data stores include information that could help them make better business decisions. Unfortunately, it's often very difficult to analyze unstructured data. There are four basic properties of such data which make it very difficult to mine for useful information. They are:
Volume: The size of the data is beyond the ability of customary databases and software tools. It can include more data points, longer periods of time and more variables, which can reveal more subtle patterns.
Variety: Such data has far greater variety than traditional data. The data available to organizations is structured, unstructured and semi-structured.
Velocity: The rate at which the data is transmitted and received is higher than usual. It is often the result of new applications like Facebook, Twitter, etc.
Veracity: The credibility of the data differs from older models. The data is not generated by the people who use it and hence is not always trustworthy.
In computing, a data warehouse (DW) is used for reporting and data analysis. DWs are central repositories of integrated data from one or more disparate sources. They store current and historical data and are used for creating analytical reports for knowledge workers throughout the enterprise. Data warehousing can help in transforming unstructured data into a structured form. It incorporates data stores and conceptual, logical, and physical models to support business goals and end-user information needs. A DW is the foundation for a successful BI program. Creating a DW requires mapping data between sources and targets, then capturing the details of the transformation in a metadata repository. The result is a single, comprehensive source of current and historical information.
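The idea of mapping sources to targets and recording the transformation details can be sketched in a few lines of Python. Everything named here (the `crm` source, the field names, the `MAPPINGS` table, the `apply_mappings` helper, and the in-memory `metadata_log` standing in for a metadata repository) is hypothetical and chosen only for illustration; real DW tools handle this very differently and at much larger scale.

```python
from datetime import datetime, timezone

# Hypothetical source-to-target mappings: which source field feeds which
# warehouse column, and which named transform is applied on the way.
MAPPINGS = [
    {"source": "crm", "source_field": "cust_nm",
     "target": "customer_name", "transform": "trim_and_title"},
    {"source": "crm", "source_field": "rev",
     "target": "revenue_usd", "transform": "to_float"},
]

# The transforms themselves, looked up by name.
TRANSFORMS = {
    "trim_and_title": lambda v: v.strip().title(),
    "to_float": float,
}

def apply_mappings(source, record, metadata_log):
    """Map one source record to a target row and log each transformation applied."""
    row = {}
    for m in MAPPINGS:
        if m["source"] != source or m["source_field"] not in record:
            continue
        row[m["target"]] = TRANSFORMS[m["transform"]](record[m["source_field"]])
        # Stand-in for a metadata repository: record what was mapped and when.
        metadata_log.append({
            "source": source,
            "source_field": m["source_field"],
            "target": m["target"],
            "transform": m["transform"],
            "loaded_at": datetime.now(timezone.utc).isoformat(),
        })
    return row

metadata_log = []
row = apply_mappings("crm", {"cust_nm": "  ana lopez ", "rev": "1200.50"}, metadata_log)
print(row)
print(len(metadata_log), "metadata entries recorded")
```

Keeping the mappings as data rather than burying them in code is one simple way to make the transformation details inspectable later, which is the purpose the metadata repository serves.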
Data warehouses tend to have a high query success rate, as they have complete control over the four main areas of data management systems:
- Clean data
- Indexes: multiple types
- Query processing: multiple options
- Security: data and access
Limitations of Data Warehousing
However, moving data from multiple, often highly disparate, data sources to one data warehouse has considerable disadvantages, which translate into long implementation time, high cost, lack of flexibility, dated information and limited capabilities:
- Major data schema transforms from each of the data sources to one schema in the data warehouse, which can represent more than 50% of the total data warehouse effort
- Data owners lose control over their data, raising ownership (responsibility and accountability), security and privacy issues
- Long initial implementation time and associated high cost
- Adding new data sources takes time and carries a high cost
- Limited flexibility of use and types of users - requires multiple separate data marts for multiple uses and types of users
- Typically, data is static and dated
- Typically, no data drill-down capabilities
- Difficult to accommodate changes in data types and ranges, data source schema, indexes and queries
- Typically, cannot actively monitor changes in data
Role of Data Warehouse in the near future
With data no longer being called simply data, but rather big data because of its volume and variety, it is of utmost importance that organizations use data warehousing to get information from these large unstructured bodies of data. The data is being generated at incredible velocity and in great variety, but not many are tapping into its immense potential. The answers to many management problems in organizations lie within this data, but it is incredibly hard to extract sensible information from it. For example, a fast food chain wants to know which location will be most suitable for opening a new franchise shop. It knows people have been tweeting about its restaurants, and some have been complaining that the nearest one is very far from their home. The chain can see these tweets but does not know where they originate. It can use ETL to extract the relevant details from these tweets and load them into a structured data warehouse, then query that data to come to a decision.
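The fast food scenario can be sketched as a tiny extract-transform-load pipeline in Python, using the standard library's sqlite3 module as a stand-in for a warehouse. The tweets, the area names in `KNOWN_AREAS`, and the crude "far" keyword check are all invented for illustration; a real pipeline would need a proper tweet source and far better text analysis.

```python
import sqlite3

# Hypothetical sample tweets; a real pipeline would read these from an export or API.
tweets = [
    "Love the burgers but the nearest shop is so far from Northside",
    "Why is there no location near Eastgate? Too far to drive",
    "Drove 40 minutes from Northside for fries, worth it but too far",
    "Great service at the Downtown shop today!",
]

# Assumed lookup list of areas the chain cares about (made-up names).
KNOWN_AREAS = ["Northside", "Eastgate", "Downtown"]

def extract(tweet):
    """Pull out the first known area mentioned and a crude 'too far' complaint flag."""
    text = tweet.lower()
    area = next((a for a in KNOWN_AREAS if a.lower() in text), None)
    too_far = "far" in text  # very rough keyword check, for illustration only
    return area, too_far

# Load: put the extracted facts into a structured table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweet_facts (area TEXT, too_far INTEGER)")
facts = [extract(t) for t in tweets]
conn.executemany(
    "INSERT INTO tweet_facts VALUES (?, ?)",
    [(area, int(flag)) for area, flag in facts if area is not None],
)

# Query: which areas generate the most distance complaints?
ranking = conn.execute(
    "SELECT area, COUNT(*) AS complaints FROM tweet_facts "
    "WHERE too_far = 1 GROUP BY area ORDER BY complaints DESC, area"
).fetchall()
print(ranking)
```

Once the free text has been reduced to rows with an area and a flag, the decision question becomes an ordinary SQL query, which is the point of moving unstructured data into a structured store.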
As the days pass, organizations are realizing the importance of data warehousing and business intelligence. They are realizing that without these tools they are losing business to competitors. It is somewhat funny to notice how all the answers are around you, yet nothing points exactly to them. Data warehousing can do that magic for organizations and help them make management decisions based on the outputs of the BI tool. In the near future, the success of organizations will be decided not only by whether they incorporate DW/BI, but by how good their DW/BI tool is.
References
- Wikipedia
- Webopedia
- http://www.smartdatacollective.com/michelenemschoff/206391/quick-guide-structured-and-unstructured-data
- http://www.webopedia.com/TERM/U/unstructured_data.html
- https://tdwi.org/portals/data-warehousing.aspx
- http://www.whamtech.com/adv_disadv_dw.htm