Wednesday, December 11, 2024

Data Processing

The Steps Of Data Processing

Data processing is a critical task that must be done in the right way. The data processing cycle consists of a series of steps in which raw data (input) is fed into a system to produce actionable insights (output). Each step is taken in a specific order, and the entire process repeats in a cyclic manner: the output of one cycle can be stored and fed in as the input for the next.

Generally, there are 9 main steps in the data processing cycle:

Step 1: Data Collection

This is the collection of raw data, which should be gathered from well-defined and accurate sources. Raw data can include monetary figures, website cookies, a company's profit/loss statements, user behaviour, etc.
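
A minimal collection sketch in Python, assuming two hypothetical sources (the CSV file name and API URL below are placeholders invented for illustration):

```python
import csv
import json
from urllib.request import urlopen

def collect_from_csv(path):
    """Read raw rows from a local CSV export (e.g. a profit/loss report)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def collect_from_api(url):
    """Fetch raw JSON records from a web API (e.g. user-behaviour events)."""
    with urlopen(url) as response:
        return json.loads(response.read())

# Hypothetical sources, for illustration only.
raw_data = collect_from_csv("sales_export.csv")
raw_data += collect_from_api("https://example.com/api/events")
```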

Step 2: Storing the data

Once you have the required data, you need to store the data in a safe and secure digital environment. This is to ensure it remains clean and unblemished, so it can be accurately analysed and presented.

You can store your data in one of the following places (a small storage sketch follows this list):

  • Data Lake: This is a centralised repository that stores large amounts of structured, semi-structured, and unstructured data.
  • Data Warehouse (DW): In this storage facility, data flows into a warehouse from relational databases or transactional systems. It may also be known as an enterprise data warehouse and can be from single or multiple sources.
  • Data Vault: This is a data modelling design pattern that's used to create a warehouse for enterprise-level analytics. There are three different entities in a data vault: hubs, links, and satellites.
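
As a small sketch of the simplest case, the code below writes raw records into a date-partitioned folder in the spirit of a data lake; the raw/YYYY/MM/DD layout and file name are assumptions for illustration, not a standard:

```python
import json
import pathlib
from datetime import date

def store_raw(records, lake_root="datalake"):
    """Write raw records into a date-partitioned folder, data-lake style."""
    partition = pathlib.Path(lake_root, "raw", date.today().strftime("%Y/%m/%d"))
    partition.mkdir(parents=True, exist_ok=True)  # create the partition folders
    out_file = partition / "events.json"
    out_file.write_text(json.dumps(records, indent=2))
    return out_file

print(store_raw([{"user": "u1", "action": "click"}]))
```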

Step 3: Data Preparation Or Data Cleaning

The process of sorting and filtering the raw data to remove unnecessary and inaccurate data.
Raw data is checked for errors, duplication, miscalculations or missing data, and transformed into a suitable form for further analysis and processing.

The purpose of this step is to remove bad data (redundant, incomplete, or incorrect records) and begin assembling high-quality information (accurate, efficient, ready to analyse) so that it can be used in the best possible way for business intelligence.
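
A minimal cleaning sketch, assuming tabular records held as Python dicts with hypothetical "user" and "amount" fields (a library such as pandas would be more typical in practice):

```python
def clean(records, required=("user", "amount")):
    """Remove exact duplicates, drop rows with missing required fields,
    and coerce 'amount' to a number so the data is ready for analysis."""
    seen, cleaned = set(), []
    for row in records:
        key = tuple(sorted(row.items()))   # hashable fingerprint of the row
        if key in seen:
            continue                       # drop exact duplicates
        seen.add(key)
        if any(row.get(field) in (None, "") for field in required):
            continue                       # drop incomplete rows
        try:
            row["amount"] = float(row["amount"])
        except (ValueError, TypeError):
            continue                       # drop rows with bad numbers
        cleaned.append(row)
    return cleaned

dirty = [{"user": "u1", "amount": "10"}, {"user": "u1", "amount": "10"},
         {"user": "", "amount": "5"}, {"user": "u2", "amount": "oops"}]
print(clean(dirty))  # only the first row survives, with amount coerced to 10.0
```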

Step 4: Input

The raw data is converted into machine-readable form and fed into the processing unit. This can take the form of data entry through a keyboard, a scanner, or any other input source.
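
A tiny sketch of this conversion; the "name,age,city" field layout is assumed purely for illustration:

```python
def parse_entry(line):
    """Convert one raw text entry (as it might arrive from keyboard entry
    or a scanner's OCR output) into a machine-readable record."""
    name, age, city = (part.strip() for part in line.split(","))
    return {"name": name, "age": int(age), "city": city}

print(parse_entry("Alice, 30, Jakarta"))
# {'name': 'Alice', 'age': 30, 'city': 'Jakarta'}
```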

Step 5: Data Processing

How the data is processed depends on the following (a short processing sketch follows this list):

  • The source of the data being processed: Whether it has come from connected devices, data lakes, online databases, site cookies, or somewhere else.
  • The intended use of the data: whether it is for streamlining your operations, establishing patterns in user behaviour, or another purpose.
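
One illustrative processing pass, assuming cleaned user-behaviour events with hypothetical field names: counting page views to establish a usage pattern.

```python
from collections import Counter

# Given cleaned user-behaviour events, count views per page.
events = [
    {"user": "u1", "page": "/home"},
    {"user": "u2", "page": "/home"},
    {"user": "u1", "page": "/pricing"},
]
page_views = Counter(e["page"] for e in events)
print(page_views.most_common())  # [('/home', 2), ('/pricing', 1)]
```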

Step 6: Analysing the data

This is the part of the process where you extract value from the data. It is achieved by using analytical and logical reasoning to systematically evaluate the data, delivering results and conclusions that you can present to your stakeholders.

There are four types of data analytics:

  • Descriptive Analytics: This concerns describing things that have occurred over time. It covers things such as whether one month's revenue is higher than its predecessor's, or whether the number of visitors to a website has changed from one day to another (a minimal descriptive-analytics sketch follows this list).
  • Diagnostic Analytics: The focus here is on understanding the reason an event has occurred. It needs a much broader set of data and it needs a hypothesis (such as “does the Olympic games increase sales of running shoes?”) that you seek to prove or disprove.
  • Predictive Analytics: This type of analysis addresses events that are expected to occur in the near future. It seeks to answer questions concerning things like the weather, for example: “how much hotter will this year’s summer be than last year’s?”
  • Prescriptive Analytics: The distinguishing factor in this type of analysis is that there is a plan of action. For instance, a company may seek a plan for how to deal with the impact an increase of 5 degrees in temperature may have on its operations. By considering all the factors relevant to this, the data analysis determines the optimal approach to take in the event of this occurring.
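
Descriptive analytics in miniature, with invented revenue figures: compare each month's revenue with its predecessor's.

```python
# Month-over-month revenue change; the figures are made up for illustration.
revenue = {"Jan": 120_000, "Feb": 135_000, "Mar": 128_000}

months = list(revenue)
for prev, cur in zip(months, months[1:]):
    change = (revenue[cur] - revenue[prev]) / revenue[prev] * 100
    print(f"{cur} vs {prev}: {change:+.1f}%")
# Feb vs Jan: +12.5%
# Mar vs Feb: -5.2%
```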

Step 7: Output

The data is finally transmitted and displayed to the user in a readable form like graphs, tables, vector files, audio, video, documents, etc. This output can be stored and further processed in the next data processing cycle. 

Step 8: Presenting the data

The final part of data processing is to present your findings. To make the presentation clear and intelligible (accurate and reliable), your data will be represented in one or more of the following ways:
  • Plain Text Files: This is the simplest way of representing data, with the information being presented as Word, or notepad files.
  • Spreadsheets And Tables: A multifunctional way of presenting data, this displays the information in columns and rows. The data can be interpreted in a range of ways, with sorting, filtering, and ordering all possible.
  • Charts And Graphs: Using this approach makes it easy for your viewers to make sense of complex data, as numbers can be visualised (a minimal charting sketch follows this list).
  • Images, Maps, Or Vectors: If you’re displaying spatial data or geographical information then you may decide to choose this method of presentation. It’s ideal for data that’s regional, national, continental, or international.
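
A minimal charting sketch, assuming matplotlib is installed; the revenue figures are invented for illustration.

```python
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar"]
revenue = [120_000, 135_000, 128_000]

plt.bar(months, revenue)          # visualise the numbers as a bar chart
plt.title("Monthly revenue")
plt.ylabel("Revenue")
plt.savefig("revenue.png")        # write the chart to an image file
```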

Step 9: Storage

The last step of the data processing cycle is storage, where data and metadata are stored (in well-formed condition, through a well-defined process) for further use. This allows for quick access and retrieval of information whenever needed, and also allows it to be used directly as input in the next data processing cycle.
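
A small sketch of storing processed output together with descriptive metadata so the next cycle can consume it directly; the metadata fields shown are an illustrative convention, not a fixed standard.

```python
import json
from datetime import datetime, timezone

# Bundle the processed output with metadata that describes it.
output = {"page_views": {"/home": 2, "/pricing": 1}}
record = {
    "data": output,
    "metadata": {
        "produced_at": datetime.now(timezone.utc).isoformat(),
        "source": "web-events pipeline",   # hypothetical pipeline name
        "schema_version": 1,
    },
}
with open("cycle_output.json", "w") as f:
    json.dump(record, f, indent=2)
```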

Notes:

Well-defined data is data that is described, recorded, and shared using common data standards. These standards are based on industry best practices and ensure that data is consistent, accurate, and easy to understand.

Well-formed data is syntactically correct. Here, syntactic correctness means the data satisfies the SMART goal criteria: Specific (detailed, simple, sensible, significant), Measurable (countable, meaningful, motivating), Achievable (reasonably accomplishable, agreed, attainable), Relevant (reachable, realistic, resourced, results-based), and Time-bound (schedulable, with a time estimate).

Well-accessible data is data that is easy to find, understand, and use by users within an organization. It is a crucial concept in the digital age, where data is used for decision-making, strategic planning, and operational efficiency.

Well-structured data is data that is organized in a standardized format, making it easy to access, analyze, and process (a minimal sketch follows this list):

  • Structure: Structured data has a clear structure that conforms to a data model. 
  • Format: Structured data is presented in a tabular format with rows and columns that define data attributes. 
  • Meaning: The meaning, format, and definition of the data is explicitly understood. 
  • Access: Data is easy to access and query for humans and other programs. 
  • Analysis: Elements can be addressed, enabling efficient analysis and processing.
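
As a minimal sketch, a typed record makes this structure explicit; the Customer type and its fields are invented for illustration.

```python
from dataclasses import dataclass

# Every row has the same named, typed attributes, so both humans and
# programs can address and query elements reliably.
@dataclass
class Customer:
    customer_id: int
    name: str
    city: str

rows = [Customer(1, "Alice", "Jakarta"), Customer(2, "Budi", "Bandung")]
print([c.name for c in rows if c.city == "Jakarta"])  # ['Alice']
```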

Types of Data Processing

There are different types of data processing based on the source of data and the steps taken by the processing unit to generate an output. There is no one-size-fits-all method that can be used for processing raw data.

1. Manual Data Processing: Data entry specialists record and process data by hand. Though it is one of the earliest data processing methods, manual data entry is costly, time-consuming, error-prone, and labor-intensive. E.g. ledgers and paper record systems.
2. Mechanical Data Processing: Data is processed through mechanical devices such as typewriters, mechanical printers, and similar machines.
3. Electronic Data Processing: In EDP, the computer processes the data automatically, following pre-defined instructions from the data specialists. E.g. using spreadsheets to record student marks.
4. Real-time Data Processing: Real-time processing came into existence with the advent of the internet. This processing method receives and processes data at the same time: it captures data in real time and generates quick or automatic reports. Used for large amounts of data. Though the process saves time and labor, it is expensive and requires heavy maintenance. E.g. GPS tracking systems that take input on a real-time basis, or withdrawing money from an ATM.
5. Automatic Data Processing: Data is processed with no human intervention, with data entry on a real-time basis, error-free and secure; in these respects it improves on the other processing methods. Though the process saves time and labor, it is expensive and requires heavy maintenance. E.g. the billions of invoices handled in the logistics sector.
6. Online Data Processing: Data is automatically fed into the CPU as soon as it becomes available. Used for continuous processing of data (data is received and processed simultaneously). The user can feed in and extract data anytime, anywhere. E.g. barcode scanning, access cards.
7. Batch Processing: Data is collected and processed in batches (actions are applied to multiple data sets through a single command). Used for large amounts of data. E.g. payroll systems, spreadsheet data.
8. Multiprocessing/Parallel Processing: Data is broken down into frames and processed using two or more CPUs within a single computer system (a minimal parallel-processing sketch follows this list). E.g. weather forecasting.
9. Time-sharing: Allocates computer resources and data in time slots to several users simultaneously. Though the process saves time and labor, it is expensive and requires heavy maintenance. E.g. airport network flight scheduling, dock network shipping scheduling.
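
A minimal parallel-processing sketch using Python's multiprocessing module; the per-cell computation is a stand-in for the far heavier models used in, say, weather forecasting.

```python
from multiprocessing import Pool

def grid_cell_forecast(cell):
    """Stand-in for an expensive per-cell computation."""
    return cell, sum(i * i for i in range(10_000))

if __name__ == "__main__":
    cells = range(8)
    with Pool() as pool:               # one worker per CPU core by default
        results = pool.map(grid_cell_forecast, cells)
    print(len(results), "cells processed in parallel")
```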

From data processing to data analysis

Businesses and public institutions / organizations use software to collect information that reveals associations, patterns, and trends. To arrive at this outcome, they’ll follow five steps:
  • Determining the questions and goals
  • Collecting the data they require
  • Wrangling the data
  • Establishing the data analysis approach
  • Interpreting their results

Your organisation then analyses the data it has collected. The goal of this is to deliver valuable information, provide and support conclusions, and aid decision-making for a variety of purposes.

There are many different examples of data analysis, both in professional and personal environments.

  • In the first instance, a private or public organisation may analyse data it holds about its users in order to deliver a more personalised service. For example, a customer’s past purchases may be assessed and this information could be used by companies to create bespoke offers for them.
  • In the second instance, you might review a range of different companies that offer the same product and make a data-driven decision on which one to choose by assessing the features against the cost.

To learn about data processing in databases, see Data Process In Database.

