Data

Data Anomalies

Background

An anomaly is something that is unusual or unexpected; an abnormality
In technology, an anomaly can be seen as something that strays from common practice
There are three types of data anomalies: insert, delete and update

Insert Anomaly

An insertion anomaly occurs when data cannot be inserted into a database due to other missing data
This is most common where a foreign key field must not be NULL, but the data it should reference does not yet exist
An example of this anomaly can be explained with a simple user database
- A user must have a group ID as a foreign key
- No groups have yet been created
- Thus, a user cannot be inserted into the database, as the group ID must not be NULL
This can result in incomplete data, as valid information (the user) cannot be recorded until other information (a group) exists
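The user and group example above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names and the try_insert_user helper are assumptions, not part of any real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
    CREATE TABLE user_groups (
        group_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL,
        group_id INTEGER NOT NULL REFERENCES user_groups(group_id)
    );
""")

def try_insert_user(username, group_id):
    """Hypothetical helper: True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, group_id) VALUES (?, ?)",
                (username, group_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# No groups exist yet, so John cannot be stored with or without a group ID
no_group = try_insert_user("John", None)   # rejected: group_id must not be NULL
missing_group = try_insert_user("John", 1) # rejected: group 1 does not exist

# Once a group exists, the same insert succeeds
with conn:
    conn.execute("INSERT INTO user_groups (group_id, name) VALUES (1, 'Contributors')")
with_group = try_insert_user("John", 1)
```

The first two attempts fail even though John's own details are perfectly valid, which is the insertion anomaly.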

Delete Anomaly

A deletion anomaly occurs when data is unintentionally lost due to the deletion of other data
For example, suppose each database row contains a "Username" and a "User Group"
- "John" and "Fred" are in the user group "Contributors"
- If John and Fred are removed from the database, our Contributors group will also disappear
- This is because we haven't normalised our data, meaning the only mention of the Contributors user group lies within those same database rows (or records)
- Hence, removing the only two rows that mention our user group loses the group itself, damaging data accuracy and integrity
This also goes to show why it's important for us to normalise our data and how combining unlike information can be problematic
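A minimal sketch of the John and Fred example, using Python's sqlite3 module. The members table and the known_groups helper are made-up names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Un-normalised design: the group name only exists inside the user rows
conn.execute("CREATE TABLE members (username TEXT, user_group TEXT)")
conn.executemany(
    "INSERT INTO members VALUES (?, ?)",
    [("John", "Contributors"), ("Fred", "Contributors")],
)

def known_groups():
    """Hypothetical helper: list every group the database still knows about."""
    rows = conn.execute("SELECT DISTINCT user_group FROM members ORDER BY user_group")
    return [row[0] for row in rows]

groups_before = known_groups()
conn.execute("DELETE FROM members WHERE username IN ('John', 'Fred')")
groups_after = known_groups()  # the Contributors group has vanished with its users
```

Storing groups in their own table, and referring to them by a key, would let the group survive when its last member is deleted.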

Update Anomaly

An update anomaly occurs when data is only partially updated in a database
A database that hasn't undergone normalisation may reference the same data element in more than one location
As these locations haven't been consolidated and referenced, we have to make sure each location is manually updated
This can cause problems as we then need to spend time searching for and updating each reference to the data element
An example of this is a database containing two tables: Users and Mailing List
- John has an email address of john@mail.com in the Users table
- John has the same email address in the Mailing List table
- John decides to change his email address, which in turn updates the Users table entry for John
- However, the system did not automatically update the Mailing List table, leaving John with two different associated emails and thus creating inconsistencies within our database
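The email example can be reproduced in a few lines with Python's sqlite3 module. The table layout and the new address john@newmail.com are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The same email address is stored in two unrelated tables
conn.executescript("""
    CREATE TABLE users        (name TEXT, email TEXT);
    CREATE TABLE mailing_list (name TEXT, email TEXT);
    INSERT INTO users        VALUES ('John', 'john@mail.com');
    INSERT INTO mailing_list VALUES ('John', 'john@mail.com');
""")

# Only the Users table is updated; nothing links it to the mailing list
conn.execute("UPDATE users SET email = 'john@newmail.com' WHERE name = 'John'")

user_email = conn.execute(
    "SELECT email FROM users WHERE name = 'John'"
).fetchone()[0]
list_email = conn.execute(
    "SELECT email FROM mailing_list WHERE name = 'John'"
).fetchone()[0]

inconsistent = user_email != list_email  # John now has two different emails
```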

Further Research

Read more about Data Anomalies from Wikia here
Read more about Data Anomalies from Johnstone High School here

Data Duplication & Redundancy

Background

Data duplication and data redundancy are similar concepts, but are not the same
Understanding both concepts helps us keep databases and storage efficient and consistent

Data Redundancy

Data redundancy occurs when the same data is stored in two or more places in a database
For example, "Joe" is entered into the Name field of a table called Customers
"Joe" is also entered into the Customer field of a table called Purchases
Although we are referring to the same Joe in both fields, the database treats each entry as a separate piece of data
This means that to update "Joe", we need to manually edit each copy
- This can cause problems in database systems, such as data anomalies
This differs from data duplication, as it is often unintentional and takes up storage space that may be needed for other data

Data Duplication

Data duplication occurs when an exact copy of a piece of data is created
For example, copying and pasting an item called "MyPicture.jpg"
- The new pasted item contains the exact same data as the original picture
- On different Operating Systems, the naming convention for copies will change (e.g. "MyPicture 2.jpg" or "MyPicture copy.jpg")
Data duplication lets us back up copies of files and create multiple versions of a file (which may be required for progress reporting or other information)
The duplication of data is often intentional and used primarily for creating backups
Data duplication on a database may result in data redundancy, and thus an inefficient and inconsistent database
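As a small sketch of file duplication, the Python below copies a file and confirms the copy holds exactly the same data. The file contents are placeholder bytes rather than a real picture, and sha256_of is a helper invented for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Hypothetical helper: fingerprint a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

with tempfile.TemporaryDirectory() as folder:
    original = Path(folder) / "MyPicture.jpg"
    original.write_bytes(b"placeholder bytes standing in for image data")

    # Duplicate the file under a new name, as a copy and paste would
    duplicate = Path(folder) / "MyPicture copy.jpg"
    shutil.copy2(original, duplicate)

    copies_match = sha256_of(original) == sha256_of(duplicate)
    names_differ = original.name != duplicate.name
```

Matching fingerprints show the two files contain identical data even though their names differ.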

Why are these important?

The act of normalising a database (organizing data to prevent redundancy) is critical to maintaining an efficient and clean database
Ensuring unnecessary data doesn't reside on the database allows for a consistent and accurate database
The ability to duplicate data intentionally allows us to create backups and maintain our data

Further Research

Read more about Data Redundancy from DatabaseDev here
Read more about Data Duplication from GeekInterview here

Data Integrity

What is Data Integrity?

Data integrity refers to the accuracy and consistency of data over its entire lifecycle
This is a fundamental concept in Computer Science, as it can make or break an IT system
The integrity of data is dependent on many factors, including the influence of data duplication and redundancy
For a database, three kinds of integrity are assessed: referential, domain and entity

Referential Integrity

Referential integrity states that every foreign key must reference a valid existing value in another table
This means that for every row in a normalised database, the value of the linking element (the foreign key) must exist as a primary key in the table it refers to
The primary key and the foreign key that references it must have the same data type and length
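One way to see referential integrity at work is to try deleting a row that other rows still point to. The sketch below uses Python's sqlite3 module; the customers and purchases tables and their contents are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce foreign keys
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE purchases (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Joe');
    INSERT INTO purchases VALUES (10, 1, 'Keyboard');
""")

# Deleting Joe would leave purchase 10 pointing at nothing, so it is refused
try:
    conn.execute("DELETE FROM customers WHERE customer_id = 1")
    parent_deleted = True
except sqlite3.IntegrityError:
    parent_deleted = False

# Count purchases whose customer no longer exists (should stay at zero)
orphans = conn.execute("""
    SELECT COUNT(*)
    FROM purchases p
    LEFT JOIN customers c ON c.customer_id = p.customer_id
    WHERE c.customer_id IS NULL
""").fetchone()[0]
```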

Domain Integrity

Domain integrity refers to the boundaries that shape the data entered into a database
This can be as simple as placing a limit on the length of the data item and enforcing a specific data type
Domain integrity ensures organisation and validity in a database structure
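Length limits and data types can be enforced by the database itself. The sketch below uses CHECK constraints in SQLite through Python's sqlite3 module; the products table, the 8 character limit and the accepts helper are all assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        code  TEXT NOT NULL CHECK (length(code) <= 8),
        price REAL NOT NULL CHECK (typeof(price) = 'real' AND price >= 0)
    )
""")

def accepts(code, price):
    """Hypothetical helper: True if the database accepts the row."""
    try:
        conn.execute("INSERT INTO products (code, price) VALUES (?, ?)", (code, price))
        return True
    except sqlite3.IntegrityError:
        return False

valid_row = accepts("KB-001", 49.95)
too_long = accepts("KEYBOARD-0001", 49.95)  # code is longer than 8 characters
wrong_type = accepts("KB-002", "cheap")     # price is text, not a number
out_of_range = accepts("KB-003", -1)        # price is below zero
```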

Entity Integrity

Entity integrity is a simple concept that ensures the validity of primary keys
The concept states that each primary key must not be NULL (meaning it must contain a value of some sort)
It also states that each primary key must be unique, meaning no primary key value may be the same as another primary key value in the same table
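Both rules can be tried out with Python's sqlite3 module. The students table and the accepts helper are invented for the sketch. Note that SQLite, unlike most database systems, only rejects NULL in a text primary key when NOT NULL is spelled out, so it is written explicitly here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE students (
        student_id TEXT PRIMARY KEY NOT NULL,
        name       TEXT NOT NULL
    )
""")

def accepts(student_id, name):
    """Hypothetical helper: True if the database accepts the row."""
    try:
        conn.execute("INSERT INTO students VALUES (?, ?)", (student_id, name))
        return True
    except sqlite3.IntegrityError:
        return False

first = accepts("S001", "John")
duplicate_key = accepts("S001", "Fred")  # rejected: S001 is already in use
null_key = accepts(None, "Fred")         # rejected: a primary key must not be NULL
```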

Further Research

Read more about Data Integrity from Microsoft here

Data Mining

Structure of Data Warehouse

A data warehouse is a database, or collection of databases, that is updated over time.
This data is stored for many years.
The word 'warehouse' is chosen to give the impression of a large area.
The database or databases can reside on one server for a company, or be spread across several servers for that company.
The overall goal of a data warehouse is to store data over a long period of time so that it can be used in data mining.

The Role of Data Mining

Data mining is used by businesses.
They use it to find trends which can assist sales, promotions and marketing.
They also use it to inform future planning for the company.

Structure of a Data Mart

A data mart is a small data warehouse usually with data for just one area.
It is still a database on a server.
It is queried by Data Mining software to get valuable trends in data for a company.
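Real data mining software is far more sophisticated, but the idea of querying a data mart for a trend can be sketched in a few lines of Python and SQL. Everything here is made up for illustration: the sales table, the products, the figures and the trend helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A tiny stand-in for a sales data mart
conn.executescript("""
    CREATE TABLE sales (sale_month TEXT, product TEXT, units INTEGER);
    INSERT INTO sales VALUES
        ('2017-01', 'Umbrella', 40), ('2017-01', 'Sunscreen', 5),
        ('2017-02', 'Umbrella', 35), ('2017-02', 'Sunscreen', 12),
        ('2017-03', 'Umbrella', 20), ('2017-03', 'Sunscreen', 30);
""")

# Total units per product per month, oldest month first
monthly = conn.execute("""
    SELECT product, sale_month, SUM(units)
    FROM sales
    GROUP BY product, sale_month
    ORDER BY product, sale_month
""").fetchall()

def trend(product):
    """Hypothetical helper: compare the first and last month for one product."""
    units = [total for name, _, total in monthly if name == product]
    if units[-1] > units[0]:
        return "rising"
    if units[-1] < units[0]:
        return "falling"
    return "flat"

sunscreen_trend = trend("Sunscreen")
umbrella_trend = trend("Umbrella")
```

A company could use a result like this to decide which products to promote in the coming months.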

As Methods of Storage and Distribution

(please note this section needs confirmation for accuracy)
Data warehouses and data marts are stored on servers.
Distribution is handled by software that analyses the information in the databases.
There are hundreds of these programs, which provide reports offering recommendations for the company based on the query.
In 2017, these were the top 34 data analysis software packages (courtesy of Predictive Analytics Today).

Further Research

See this page for more details on Data Warehouses and Data Marts.

Data Management

Background

The ability to manage and sort files and folders is a crucial aspect of computer systems
Good file and folder management allows for organisation and ease of use
Files and folders are essential elements of the IT world
To read about files and file systems in more depth, check out The Computing Teacher's article on File System here

What's the difference between Files and Folders?

A computer file is used to store data and information
- Every file has a type and a corresponding extension
- For example, Microsoft Office Word Documents use the extension .doc or .docx
- These file extensions tell the computer what program to open each file with
A computer folder is used to store files and other folders
- Folders are used in an organizational manner
- They can be sorted by various attributes and are used to store larger numbers of files and other folders
- Think of folders as the filing cabinets of a computer: they're used to tidy up the office (or in this case, directory) by taking all papers (or files) and acting as a container for them

How can I manage my data?

Organization is key to utilising the power of computers; make sure you stick to a naming convention and folder structure throughout your file system
Delete any old or unnecessary files
Sort out files and downloads as you receive them - don't wait until they're just another icon lost amidst your Downloads folder
Create a few manageable folders (such as "Work", "Personal" and "Media")
Always make sure to create backups of your data in case of accidental loss or corruption
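Sorting files as they arrive can even be automated. The Python sketch below moves files into a few folders based on their extension. The mapping of extensions to folders and the sort_downloads function are assumptions for the example, and it runs against a temporary folder rather than a real Downloads folder.

```python
import shutil
import tempfile
from pathlib import Path

# Assumed mapping from file extension to folder; anything else goes to "Personal"
FOLDER_FOR_EXTENSION = {".docx": "Work", ".jpg": "Media", ".mp3": "Media"}

def sort_downloads(downloads):
    """Move each file in `downloads` into a sub-folder chosen by its extension."""
    moved = {}
    for item in sorted(downloads.iterdir()):
        if not item.is_file():
            continue  # leave existing folders alone
        folder_name = FOLDER_FOR_EXTENSION.get(item.suffix.lower(), "Personal")
        target = downloads / folder_name
        target.mkdir(exist_ok=True)
        shutil.move(str(item), str(target / item.name))
        moved[item.name] = folder_name
    return moved

with tempfile.TemporaryDirectory() as tmp:
    downloads = Path(tmp)
    for name in ("report.docx", "holiday.jpg", "notes.txt"):
        (downloads / name).write_text("example")
    result = sort_downloads(downloads)
    work_file_exists = (downloads / "Work" / "report.docx").exists()
```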

Further Research

View HowToGeek's article on file and folder organization here
View Lifehacker's guide on data organization here

Data Manipulation

Background

To manipulate data means to change or process information
Data can be manipulated in order to make it easier to read, to organize certain elements or to change the information completely
The manipulation of data is common in applications such as websites, where data is concurrently being selected and updated, or even inserted or deleted
Data manipulation languages (DMLs) such as SQL are utilised to manipulate data from sources such as databases

Common DML Commands

SELECT: This command is used to output a list of rows (or records) from a database.
- Syntax: SELECT [x] FROM [y] WHERE [z]
UPDATE: This command is used to alter existing data in a table.
- Syntax: UPDATE [x] SET [y] WHERE [z]
INSERT: This command is used to add one or more entries to a database table.
- Syntax: INSERT INTO [x] ([y]) VALUES ([z])
DELETE: This command removes one or more entries from a database table depending on the conditions.
- Syntax: DELETE FROM [x] WHERE [y]
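The four commands can be tried together using Python's built-in sqlite3 module. The users table and its contents are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, user_group TEXT)")

# INSERT: add two rows
conn.execute("INSERT INTO users (username, user_group) VALUES ('John', 'Contributors')")
conn.execute("INSERT INTO users (username, user_group) VALUES ('Fred', 'Contributors')")

# SELECT: list the rows that match a condition
contributors = conn.execute(
    "SELECT username FROM users WHERE user_group = 'Contributors' ORDER BY username"
).fetchall()

# UPDATE: change data in the rows that match a condition
conn.execute("UPDATE users SET user_group = 'Editors' WHERE username = 'John'")

# DELETE: remove the rows that match a condition
conn.execute("DELETE FROM users WHERE username = 'Fred'")

remaining = conn.execute("SELECT username, user_group FROM users").fetchall()
```

Leaving the WHERE clause off an UPDATE or DELETE applies it to every row in the table, so it is worth double checking before running one.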

Further Research

Read more about Data Manipulation Languages here
View common DML commands from Microsoft here

Data Protection

What is data protection?

Data protection refers to keeping personal and potentially sensitive information safe from unauthorised access
Protecting our data is essential to ensuring safety and privacy in the IT world
Authentication and encryption are two methods which allow us to protect and secure data
For more on the benefits of protecting data and methods of keeping it secure, read The Computing Teacher's article on Data Security

What is Authentication?

Authentication is the process of confirming that a user is who they claim to be; it helps to protect our computer systems and data by blocking unauthorized access and control

passwords
- typed in as characters from the keyboard
- more complex passwords are more secure
- longer passwords are more secure
biometrics
- using part of the body to gain access
- this may replace password access (Disneyland example at Wikipedia)
- e.g. fingerprint access to a device (fingerprint recognition from Wikipedia)
- has two phases: 1. Digitise the information into a database (e.g. your scanned fingerprint) 2. Access the device with a new scan (your fingerprint), which must match the version in the database
- body parts are digitised by scanners: finger scanner, hand scanner, face scanner, retina scanner and voice scanner
- the finger scanner is the main one being experimented with and used today
digital signatures
- making one in Adobe Reader (the first part and last part of this video are worthwhile)
- this is NOT just copying an image of your signature onto a letter or document
- this is a mathematical method of proving that a digital document or message is authentic and has not been altered
- commonly used in banking, or by any organisation that needs to ensure security
- computerhope.com gives the following description
  - "A digital signature can be broken down into three parts: A key generation algorithm, a signing algorithm, and a signature verifying algorithm. The key generation algorithm selects a random private key from a set of possibilities and sends the private key with a related public key. The signing algorithm produces a signature based on the message and the private key. Finally, the signature verifying algorithm accepts or rejects the authenticity of the message when provided with the message, signature, and public key."
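The three parts in that description (key generation, signing and verifying) can be sketched with a toy version of RSA in Python. This is for illustration only: the primes are tiny and fixed rather than random, so it offers no real security, and the function names are assumptions rather than any library's API.

```python
import hashlib

def generate_keys():
    """Key generation: build a matching public and private key (toy sized)."""
    p, q = 61, 53                 # real systems use huge random primes
    n = p * q                     # 3233, shared by both keys
    phi = (p - 1) * (q - 1)       # 3120
    e = 17                        # public exponent
    d = pow(e, -1, phi)           # private exponent (needs Python 3.8+)
    return (n, e), (n, d)

def digest(message, n):
    """Reduce the message to a number smaller than n."""
    full = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(full, "big") % n

def sign(message, private_key):
    """Signing: combine the message digest with the private key."""
    n, d = private_key
    return pow(digest(message, n), d, n)

def verify(message, signature, public_key):
    """Verifying: accept only if the signature matches the message."""
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)

public_key, private_key = generate_keys()
message = "Pay Fred 10 dollars"
signature = sign(message, private_key)

genuine = verify(message, signature, public_key)
forged = verify(message, (signature + 1) % public_key[0], public_key)
```

Anyone holding the public key can check the signature, but only the holder of the private key could have produced it.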

What is Encryption?

Encryption hides the true meaning of some text or numbers so that only someone with the right key can read it. Go to Computer Hope here and type in some text to see some hidden versions of your text.
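A very old and very weak example is the Caesar cipher, which shifts every letter along the alphabet by a fixed amount. It is sketched here in Python purely to show the idea of hiding text; the caesar function is written for this example and should never be used to protect real data.

```python
import string

def caesar(text, shift):
    """Shift each letter by `shift` places, leaving other characters unchanged."""
    alphabet = string.ascii_lowercase
    offset = shift % 26
    shifted = alphabet[offset:] + alphabet[:offset]
    table = str.maketrans(alphabet + alphabet.upper(), shifted + shifted.upper())
    return text.translate(table)

secret = caesar("Hello", 3)   # "Khoor"
plain = caesar(secret, -3)    # shifting back recovers "Hello"
```

Here the shift amount acts as the key: anyone who knows it can reverse the process.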

There are now very advanced methods of data encryption that use a pair of keys: one public and one private.
The public key is used to encrypt the data, and the matching private key is used to decrypt it
public key - is available for anyone to use
private key - is kept secret by its owner, who uses it to decrypt the data so it can be read
you can increase security with a longer key, e.g. going from a 32-bit key to a 256-bit key gives a far larger number of possible keys for an attacker to try
this YouTube clip is worthwhile.
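To get a feel for why key length matters, the short Python calculation below counts how many different keys are possible at each length. The guessing rate of one billion keys per second is an assumption picked only to make the comparison concrete.

```python
def possible_keys(bits):
    """Each extra bit doubles the number of possible keys."""
    return 2 ** bits

keys_32 = possible_keys(32)      # 4294967296
keys_256 = possible_keys(256)    # a number with 78 digits
digits_256 = len(str(keys_256))

GUESSES_PER_SECOND = 1_000_000_000  # assumed attacker speed
seconds_for_32 = keys_32 / GUESSES_PER_SECOND    # a little over 4 seconds
seconds_for_256 = keys_256 / GUESSES_PER_SECOND  # vastly longer than a human lifetime
```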

BONUS: A tip for passwords

Further Research

Read Hexistor's guide to protecting your data here
Read HowToGeek's guide on Encryption here

Data Security

Background

Data security refers to protecting data (or information) from unwanted or unauthorized actions from potentially harmful users
This applies to computer access, databases and all types of personal information
Data security is an integral part of the IT industry as it helps to ensure the security and privacy of sensitive information

Why do we need to secure our data?

Data security is essential to protect the privacy of individuals
It is also essential to protect the intellectual property of individuals, businesses or governments
Data is very attractive to criminals, as data theft can lead to bank fraud or other illegal financial gain
Data security can also help to prevent malicious attacks on a computer system
Security practices such as keeping backups allow data to be restored when corruption occurs

Tips for securing data

Place data servers in locked rooms to prevent physical access and theft
Use strong passwords on network devices to prevent access to data
Physically lock away any sensitive information on an external drive
Enable a firewall and restrictions on Internet access
Use an anti-malware program such as MalwareBytes and ensure it is up to date
Use an anti-virus program and ensure it is up to date
Secure your WiFi network with a passphrase to ensure no unauthorized access to your local network
Activate password protection for devices to lock out unwanted guests
Regularly back up data to separate devices, and store password-protected copies off site

Further Research

Read some more tips on data security by SpamLaws here
Read 10 ways to secure your data on Computer World here

Data Types

Background

A data type lets a computer system know how to deal with each piece of given information
The system applies certain rules and functions depending on the data type it is dealing with
Each data type allows a different range of values and boundaries
Data types allow us to keep clean, organised and effective databases

Common Data Types

Each data type is listed below, followed by its description

Character
- Contains a single number, letter or symbol
- Multiple characters are used to form strings
- Characters have a fixed length of one

Number
- Stores fixed and floating point numbers
- Range: ±1 x 10^-130 to ±9.99...9 x 10^125 with up to 38 significant digits

Date / Time
- Date: stores the year, month and day values
- Time: stores the hours, minutes and seconds values
- Timestamps are generally counted from 1 January 1970. See if you can find out why!

Currency
- Used to store data that is used for monetary values or financial calculations
- Contains the numeric value plus the localised currency symbol

Text / String
- Used to store an ordered set of symbols
- A string contains multiple characters

Boolean (True / False)
- Used to define a true / false statement
- This can also be thought of as on / off
- Often used as flags by developers
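Programming languages have their own versions of these data types. The sketch below shows rough Python equivalents; the variable names and values are made up, and Python's types do not share the exact ranges of any particular database.

```python
from datetime import date, datetime, time, timezone
from decimal import Decimal

character = "A"            # Python has no separate character type: a one-letter string
text = "Contributors"      # a string is an ordered set of characters

whole_number = 42          # fixed point (integer)
floating_point = 3.14      # floating point

joined = date(2017, 6, 5)  # year, month, day
opens_at = time(9, 30, 0)  # hours, minutes, seconds

# Timestamp zero is the start of 1 January 1970 (UTC)
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Decimal avoids the rounding surprises of floating point when handling money
price = Decimal("19.99") * 3
float_is_exact = (0.1 + 0.2 == 0.3)                                  # False
decimal_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

is_active = True           # Boolean flag: on / off
```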

Further Research

Read more about data types across different SQL programs here
Read more about data types in programming here
Read more from IGCSE ICT here

Data Validation

The purpose of Data Validation

A validation rule verifies that data is entered correctly.
Validation rules ensure that data entered in a record matches a specified standard, structure or range of values
Data validation provides consistent formatting and presentation of information in a database

Examples

A web form that asks for an email address won't accept the input unless it contains @
A password may be rejected because it is too short. Validation states it must be a minimum of x characters.
Check digit - the final digit of a number is calculated from the other digits, so it can be used to confirm the number was read correctly. Barcode readers use this.
Range check - the numbers entered must fall within an expected range, e.g. a month must fall between 01 and 12.
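Each of these checks can be written as a small Python function. The function names and the minimum password length of 8 are assumptions for the sketch, and the check digit rule shown is the one used by 13-digit EAN barcodes.

```python
def has_at_sign(email):
    """Format check: an email address must contain @."""
    return "@" in email

def long_enough(password, minimum=8):
    """Length check: the minimum of 8 is an assumed value."""
    return len(password) >= minimum

def month_in_range(month):
    """Range check: a month must fall between 1 and 12."""
    return 1 <= month <= 12

def ean13_check_digit_ok(code):
    """Check digit: recompute the 13th digit of an EAN-13 barcode from the first 12."""
    if len(code) != 13 or not code.isdigit():
        return False
    digits = [int(char) for char in code]
    # Digits are weighted 1, 3, 1, 3, ... from the left
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    expected = (10 - total % 10) % 10
    return expected == digits[12]

email_ok = has_at_sign("john@mail.com")
password_ok = long_enough("abc")                        # too short
month_ok = month_in_range(13)                           # out of range
barcode_ok = ean13_check_digit_ok("4006381333931")      # valid barcode number
barcode_typo = ean13_check_digit_ok("4006381333932")    # last digit mistyped
```

Passing these checks only shows that data is in an acceptable form; it does not prove the data is true.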
