Data Acquisition and Wrangling

Hana M | April 28, 2023 | 10:55 AM | Technology

Data acquisition and wrangling are the processes of obtaining and cleaning data, respectively. These two steps are critical in preparing data for analysis, as data that is incomplete, inconsistent, or incorrect can lead to inaccurate or misleading results.

Data acquisition means obtaining data from sources such as databases, files, or APIs. The sources may be internal or external, and the data may also be collected through scraping. Acquisition can include data integration, in which data from multiple sources is combined into a single dataset.
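As a minimal sketch of acquisition and integration, the Python example below reads customer rows from CSV text and order rows from a SQLite database, then combines them into one dataset. The table, column names, and sample values are all hypothetical; the CSV string stands in for a file on disk and the in-memory database stands in for a real one.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; in practice this would be a file on disk.
CSV_TEXT = """customer_id,name
1,Ada
2,Grace
3,Linus
"""


def load_customers(csv_text):
    """Read customer rows from CSV text into a list of dicts."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"customer_id": int(row["customer_id"]), "name": row["name"]}
        for row in reader
    ]


def load_orders(conn):
    """Read order rows from a database connection, in insertion order."""
    cursor = conn.execute(
        "SELECT customer_id, amount FROM orders ORDER BY rowid"
    )
    return [{"customer_id": cid, "amount": amount} for cid, amount in cursor]


def integrate(customers, orders):
    """Combine both sources into one dataset, matching on customer_id."""
    names = {c["customer_id"]: c["name"] for c in customers}
    return [
        {
            "customer_id": o["customer_id"],
            "name": names.get(o["customer_id"]),  # None if no match
            "amount": o["amount"],
        }
        for o in orders
    ]


# An in-memory SQLite database stands in for a real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 35.5), (1, 4.5)],
)

dataset = integrate(load_customers(CSV_TEXT), load_orders(conn))
conn.close()
```

Each entry in `dataset` carries fields from both sources, which is the single combined dataset that integration aims for.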

Data wrangling, on the other hand, involves cleaning, transforming, and organizing the data so that it is ready for analysis. Typical tasks include handling missing values (by removing or imputing them), handling outliers, removing duplicate records, and converting data into a more suitable format. Data wrangling may also involve merging or joining datasets, creating new variables or features, or aggregating data to a coarser level of granularity.
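The wrangling tasks described above might look like the following Python sketch, which removes duplicates, drops rows with missing values, filters outliers, and aggregates by region. The sample records are invented for illustration, and the outlier rule (anything above ten times the median) is an arbitrary assumption, not a recommended threshold.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records with typical quality problems.
raw = [
    {"region": "north", "sales": 100.0},
    {"region": "north", "sales": 100.0},   # exact duplicate
    {"region": "north", "sales": None},    # missing value
    {"region": "south", "sales": 90.0},
    {"region": "south", "sales": 110.0},
    {"region": "south", "sales": 5000.0},  # likely outlier
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def drop_missing(rows, field):
    """Remove records where the given field is missing."""
    return [row for row in rows if row[field] is not None]


def drop_outliers(rows, field, factor=10):
    """Remove records above factor * median (an assumed, simplistic rule)."""
    limit = factor * median(row[field] for row in rows)
    return [row for row in rows if row[field] <= limit]


def total_by(rows, group_field, value_field):
    """Aggregate to one total per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


clean = drop_outliers(drop_missing(drop_duplicates(raw), "sales"), "sales")
sales_by_region = total_by(clean, "region", "sales")
```

Dropping rows is only one way to treat missing values and outliers; depending on the analysis, imputing or capping them may be more appropriate.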

Figure 1. Data Acquisition and Wrangling [1]

Figure 1 shows data acquisition and wrangling. The process of data acquisition and wrangling can be time-consuming and labor-intensive, but it is critical for ensuring the quality and integrity of the data. Proper data acquisition and wrangling can improve the accuracy of analyses and enable more informed decision-making.

The Goals of Data Wrangling

  • Reveal a "deeper intelligence" by gathering data from multiple sources [2]
  • Put accurate, actionable data in the hands of business analysts in a timely manner [2]
  • Reduce the time spent collecting and organizing unruly data before it can be utilized [2]
  • Enable data scientists and analysts to focus on the analysis of data, rather than the wrangling [2]
  • Drive better decision-making by senior leaders in an organization [2]

Key Steps to Data Wrangling

Data Acquisition: Identify and obtain access to the data within your sources. [2]

Joining Data: Combine the edited data for further use and analysis. [2]

Data Cleansing: Redesign the data into a usable and functional format and correct/remove any bad data. [2]

Some common tools for data acquisition and wrangling include SQL for data extraction and manipulation, Python or R for data cleaning and transformation, and tools like Excel or Google Sheets for basic data cleaning and manipulation. It's also important to keep in mind data ethics and privacy considerations when acquiring and wrangling data, such as ensuring data is de-identified, consent is obtained where necessary, and that data is used in an ethical manner.
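To illustrate the de-identification point, here is a minimal Python sketch that drops direct identifiers from a record and replaces the customer ID with a keyed hash. The field names, the key, and the choice of which fields count as identifiers are all assumptions made for the example. Note that keyed hashing is pseudonymization rather than full anonymization, so it does not by itself guarantee privacy.

```python
import hashlib
import hmac

# Hypothetical secret key; a real key should be stored outside the code.
SECRET_KEY = b"replace-with-a-secret-key"

# Assumed set of fields treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value, key=SECRET_KEY):
    """Map a value to a stable token using HMAC-SHA256."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def de_identify(record):
    """Drop direct identifiers and replace the ID with a pseudonym."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in DIRECT_IDENTIFIERS
    }
    cleaned["customer_id"] = pseudonymize(str(record["customer_id"]))
    return cleaned


record = {
    "customer_id": 42,
    "name": "Ada",
    "email": "ada@example.com",
    "amount": 20.0,
}
safe = de_identify(record)
```

Because the same ID always maps to the same token, records can still be joined across datasets after de-identification, as long as the same key is used.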

References:

  1. https://en.wikipedia.org/wiki/Data_wrangling
  2. https://altairengineering.fr/what-is-data-wrangling/

Cite this article:

Hana M (2023), Data Acquisition and Wrangling, AnaTechmaz, pp.60
