SelfInformed

July 2015


What Is Your Data Worth and Where Should You Store It?

Monday, July 27, 2015

Introduction

We live in an information age! The manner that your data is managed provides an essential competitive edge to your business. The right data can help you unlock the potential of your business. A robust data management plan will provide insights growing or starting your business by focusing on building revenue and reducing costs. It is therefore imperative that due diligence is applied to the gathering, processing, storage and interpretation of your data to maximize its productive impact.

What do these terms mean?

GATHERING accumulating information from applications, physical documents, purchased data or scraping data from the internet, to meet the needs of your business.

PROCESSING ensuring that your data is in the right place, at the right time, in the right form. This is the process of associating disassociated data, often from other application data sources. Various software applications are available to assist in organizing your data and streamlining its accessibility. Many of these applications may be individually tailored to suit your business and its specific requirements.

STORAGE once data is organized it needs to be stored for when it is needed. There are various options available. Refer to the discussion below regarding storage mechanisms and formats.

INTERPRETING this is arguably the most important aspect of effective data management. It provides you with business intelligence in the form of reports, dashboards, schedules, graphs, charts, exception reports, variance analyses, statutory reporting to name but a few.

In order to ensure that your data is maximized, a database administrator is a valuable asset. However, for most small business owners and self-employed entrepreneurs, this is a luxury. Therefore, this is a role that you would need to assume. But not all business owners have the time or the aptitude to take on this challenge. Thankfully, there are ways to outsource various aspects of data management. This article introduces some ideas on how to accomplish this.

Data Storage

This is broken down into 2 fundamental parts:

Storage format – What application stores the data?

Storage mechanism – where is the data stored?

Data storage is a key component of data management. Essentially, data is stored in binary, a series of 0’s and 1’s. In order to make data most effective, we need to identify the target user. This means that we are faced with the challenge, on the one hand, of sharing data; and, on the other hand, ensuring that it is secure, both in storage and confidentiality. This plays a major role in determining the storage mechanism for your data.

The other consideration introduced earlier is the interpretation of the data. Often, the outputs and storages utilize different applications. For example, data may be stored in a database (SQL Server, Oracle, Microsoft Access, MySQL,, etc.) and interpreted and presented through a reporting tool (Sharepoint, Crystal, SQL Server Reporting Services, Excel, etc)

The nature of the data and available resources are the best way to determine the most appropriate storage for your data. For example, if your data is client contact information and you have an Exchange server, you can store your clients in a global address book, making the data available to all users via Outlook. Outlook has a journal function for recording events, tasks for follow up, tags for categories, email for communication and a wide range of Outlook compatible add-ins to automate the process to leverage your Exchange server if you have one.

When it comes to transactional data, such as orders, sales, purchases, inventory, you typically need an application that stores the volume of your data, and thus the application is the interpreter. ERP styled applications, like Quickbooks, typically allow you to store most of this data and meet your statutory requirements.

It is very common for there to be various smaller solutions used within your overall data management plan. This however, increases the amount of redundancy and disconnection between data. For example, you may be using Outlook, or a third party such as Sales Logistics for your Customer Relationship Manager (CRM) and Quickbooks (QB) for your Chief Financial Officer. This results in the data in the CRM Clients table being isolated from the QB Customer table. This means that in order to answers such questions as, “How many calls did I make to each client and how many sales did I generate?” data will need to be joined from the various sources listed above.

STORAGE FORMAT

Data can be stored in simple text files using either a comma or a tab delimiter. This is good for transmitting data, extracting from the internet and applying data across multiple applications. Once accumulated, these files may be used to create tables, a collection of which forms a database. Another way to understand a database, is to view it as a big box full of information that is organized in such a way to provide valuable outputs. Databases allow for security to be implemented to protect your data and permit multi-user access.

While there is an array of options available for setting up databases, “Excel Databases” appear to be common, largely due to the ease with which users are able to create, design and maintain them. Another option is Google Docs which are different than Excel in that they are cloud hosted documents and allow for limited simultaneous multi-user access.

If you are tempted to use a simple programming platform like Excel as a database, there are a few caveats to bear in mind:

Excel’s primary function is to be a data analysis tool and should not be used for large amounts of raw data.

Excel is very dynamic (this is also a disadvantage) and easy to set up. Excel is a true programming environment. It is one of the few programs outside of “Calculator”, where an input of “=1+1” actually returns “2”. Word, Powerpoint, Email, etc. simply return your input.

Sheets can be used like tables in a database.

Data may be joined using using vlookups (although an index/match combo does allow for reverse lookups).

Data from “subtables” (i.e. sheets) can be made available via the DataValidation tool to ensure integrity and allow for quicker entry of data.

There are, however, problems with this approach:

Databases allow for multi-user access to a single file source. In laymen’s terms this means we all get to write to a common knowledge. Excel does not naturally facilitate this and requires a complex set of macros to achieve it and requires a programmer.

When two people open an Excel spreadsheet, the second user is locked into read-only mode. They are not designed to allow multi-user input. Excel spreadsheets can be set up as a shared document, however they are subject to conflict resolution in which case one user’s changes may not be saved. Google Sheets do allow simultaneous access to a shared sheet, but still have the same limitations for changes to the same cell.

If your business needs require a single user-input, then a database and spreadsheets work the same way. If your needs change and multi-user inputs are required, then Google Sheets may work for a few users. Past a few users, a database is essential.

An article on spreadsheets written for the BBC noted that one of the downsides is that they are not written by programmers and are not subject to version control. This means that there is a large potential for error in the data. An error in one formula could completely change the picture the data interpretation displays.

Spreadsheets in general are not good at handling large volumes of data. If you will never have more than 50,000 rows in a spreadsheet and never more than a few spreadsheets that need to reference each other, you might be ok. However, if your needs grow to millions of records that need to be related in many spreadsheets (tables), then the data can become too large or complex to allow you to pull information from them quickly.

Spreadsheets differ from databases in crucial ways: They store formulas (“= A1 + A2 + A3”, “= sum(B1:B200)”,etc.) and they store positional references (column A, row 1 is directly above column A, row 2) to other data. In a database, if you need to add all the values of column A, you use a tool to pull that data from the database and sum the values or write code in a language like SQL (Structured Query Language) to do it.

The issue of formulas, positional references and volume is critically important. If you decide at a later point that spreadsheets are no longer working or are just too big to function, converting formulas and positional references into a database will be very difficult and most likely require an expensive consultant to do the conversion.

STORAGE MECHANISM

The options surrounding where to save your data include removable media, hard-drive, network, web based, cloud-based storage and servers.

Storage covers where those precious 0’s & 1’s get saved. A very important consideration here is who has access to it. If the information is sensitive, such as salaries and SSN’s, a network that is not strictly controlled is a breach of confidentiality. Web based solutions must implement very strict access controls to ensure hackers are unable to get in, and consulting an internet security expert is highly recommended because just having antivirus software is not going to be enough.

When determining the storage, the targeted users are an essential driver. If there is a need for clients to interact with your data then a web based solution is the only option. Alternatively, the following are other options:

Removable media a memory stick, flash drive, removable hard-drive, CDs or DVDs are a bad idea for storing data, but are great for short-term backups or distributing read-only data. However, remember small items like a memory stick or flash drive are easily lost and therefore, easily compromised.

Hard-drive – local to a particular machine. Data on that hard-drive is unavailable to other users unless sharing is enabled. File sharing enables one computer, (for example, your laptop) to access another computer’s files (for example, your desktop files). File sharing is a free way of providing a few of a fileserver’s functionalities. One drawback is that it requires the “host computer” to be available. File sharing can be restricted to specific users or available to all users and the security of every computer accessing the data is critical.. The sensitivity of the data will determine which approach to adopt.

Network requires a company to have a file server setup. Data is stored on the server, and a user can access it by mapping a drive to the data source. Administrators can configure user access rights to limit the ability to read or write to files. This is a common approach for most businesses, but the drawbacks include the cost of setting it up and maintaining a file server as well as the requirement of being connected to the network to access the data. Working remotely requires a VPN connection to your network. A SQL server is required if a SQL database is going to be used. The costs associated with a file server can range from the cost of a PC (if free software is used) to tens of thousands of dollars. The great advantage of a true database server like SQL Server or Oracle is they are designed for large volumes of data and can correlate that data very quickly if designed correctly and they provide for up to a point-in-time restoration of your data. With something like Excel or Access which are prone to data corruptions, if the hard drive fails or the data gets corrupted, you have to revert back to the point where you last backed up the file which could be many hours, days, or months ago. SQL and Oracle can be setup to take backups automatically at night and take incremental backups every few minutes, so you can get back to almost the exact point when the failure occurred. Which choice you make depends on how much work/data your business can afford to lose and how long you can afford to be down while you restore your systems.

An important consideration with all storage mechanisms is backing up your data as they are a physical storage that can be corrupted and will eventually fail. Best practice is to use an offsite location. Cloud solutions can be a very effective way of achieving that goal.

Cloud storage examples include DropBox, Googledrive, OneDrive, etc. offer a free (with limitation on the amount of storage) way to store your data that can be accessed via a web connection. Cloud storage provides the ability to access files via a web connection, as well as previous versions and peace of mind that your valuable data is constantly backed up. Cloud storage is great for single-user files or backups, but is not well-suited for multi-user data.

Some applications can make use of the “File Sharing” feature in Windows used to share data between computers via LAN or Wi-Fi network, but the “main” file can only be moved to the Cloud by the owner of that file. Using Google Docs does allow for some multi-user access, but is still subject to the problem of two users updating the same piece of data.

Cloud servers (Virtual Machines or VMs) – examples are Azure (Microsoft), Google Cloud Platform, and Amazon Web Services and they function in a similar manner to network servers, however they are web-based. For small to medium sized companies, they are significantly cheaper and easier to maintain, however they do require a web-connection. They offer various products and services, including backups of entire servers and data. They typically offer pricing in batches of users, thus 2 users would cost the same as 5 users, etc.

The big difference between cloud storage and a cloud server is that the cloud server runs applications and thus has limited processing capacities, but being virtual, they can be given limited additional capacity (more storage space, more memory, more processors, etc.) when needed. Most cloud servers are priced based on performance specifications. There are two types of cloud servers, shared and dedicated. Dedicated means that the service provider sets up and provides your business with virtual servers that do not share resources with other companies. Dedicated servers are more expensive than servers using shared resources. A shared server uses a percentage of the resources of a much larger “Host” computer to create the virtual server. You still have a server that only you can access, but the Host computer manages the resources behind the scenes between all the virtual machines it is hosting. The biggest issue with sharing is the performance can be affected by other businesses usage. However, that issue is becoming less and less an issue as providers actively monitor their systems and improve the technology to allow them to move the resource hogs to a separate set of resources.

Web servers – can either be internally hosted (similar to a file server), or outsourced to a hosting company. The database on most hosted sites is provided through MySQL for free as it is open source, however connecting to your company’s internal systems is not a given and a programmer will most likely be needed to bridge the gap between the external (host) server and your internal server. A web server can provide public access to your database via your website which is advantageous when you want to share data with clients, suppliers, etc. who do not have access to your internal network. Web pages are written using web based server-side scripting languages such as PHP or ASP along with HTML and languages like C#.NET to communicate between the user and the database. An Intranet web server is restricted to internal network users and thus is significantly less susceptible to attacks from hackers and viruses as long as the network is secure.

If your website is eCommerce enabled there are regulatory PCI compliance requirements to ensure credit card information is stored securely. Various industries have additional regulatory compliance requirements imposed, for example HIPAA if you are in the Medical industry.

Conclusion

A data management plan is a “must have” to ensure that you are maximizing the effectiveness of your data. Significant loss of data or breach of security could severely impact your business.

To be able to rely on your data, you need to ensure it is accurate and secure. In order to create reliable business intelligence from your data, your tools and code must be strictly implemented and version controlled to avoid errors and loss of intellectual property.

Bringing in a consulting firm to assist you in designing and implementing your data management plan to work according to your business needs and resources is a very good place to start. A few hours invested may result in providing you with powerful business intelligence and ensure the safety of your data.

 

SIDEBAR

Know Your Data

Do you know what data you have, how much you have and where it is? Your data could be in spreadsheets, text files, databases, online applications, cloud storage. You need to know all these things about your data because if you don't, you might be missing out on opportunities and might be susceptible to losing important data.

Backup Your Data

Backups and off-site storage of those backups are your friend in a disaster or for research projects. If you have small files, periodically backing up to secure cloud storage might be a good option. If you have virtualized or cloud-based servers, make sure what the provider’s restore options are or backup your data
periodically to another virtual server.

Plan for the Future

Generally speaking and especially in today’s business climate, your data needs are never going to shrink, so plan ahead. Putting data in a spreadsheet or flat file (text, csv, etc.) might seem like a good idea, but if you can foresee needing millions
of records, it will get unwieldy fast. Look for technologies that can grow (SQL) or are at least easier to convert (MySQL).

Keep Your Data Safe

On top of backups, you also need to prevent anyone from taking or destroying your data. This can involve network security and/or encryption methods. Consulting an expert to work through what data you have and how to protect it is a good way to go.

Don’t Let Your Data go to Waste

Research your data. Collecting the data to protect yourself or to use for auditing purposes is great, but there might be much more valuable information in there. Pulling all the data together into a report or in a spreadsheet can provide the first insights into relationships in your data and insights into your business you never realized were there. You might be able to save money buying supplies at a different time or you might be able to see when to stock up on certain kinds of items. The big companies already do this and the tools to do at least the first stage of data mining are either already readily available or becoming
less expensive every day.

Comment

  1. RadEditor - HTML WYSIWYG Editor. MS Word-like content editing experience thanks to a rich set of formatting tools, dropdowns, dialogs, system modules and built-in spell-check.
    RadEditor's components - toolbar, content area, modes and modules
       
    Toolbar's wrapper 
     
    Content area wrapper
    RadEditor's bottom area: Design, Html and Preview modes, Statistics module and resize handle.
    It contains RadEditor's Modes/views (HTML, Design and Preview), Statistics and Resizer
    Editor Mode buttonsStatistics moduleEditor resizer
      
    RadEditor's Modules - special tools used to provide extra information such as Tag Inspector, Real Time HTML Viewer, Tag Properties and other.
       

Business Management Blog

Your Connection to Small Business Specialists

Read the Blog!