Data Science

A guide to library resources about Data Analytics

Data Analytics and Resources

  • "Data analytics is the computer-supported analysis of data. Its purpose is to derive information from the data and then use that knowledge to make better decisions or to effect change. That is, people want to reach an optimal decision quickly, and apply it, based on the analysis of data. By analyzing large amounts of data, people can discover hidden patterns and gain insights. Data may be structured or unstructured and may range up to a zettabyte in size." (Malliaris, 2018).
  • Qualitative data analysis aims at "the generation of taxonomy, themes, and theory germane" to a research project, and it "can improve the description and explanation of complex, real-world phenomena" pertinent to the research (Bradley, Curry, & Devers, 2007).

Notes:

Malliaris, M. (2018). Data analytics. In R. Kolb (Ed.), The SAGE encyclopedia of business ethics and society (Vol. 1, pp. 822-826). Thousand Oaks, CA: SAGE Publications, Inc. doi: 10.4135/9781483381503.n291

Bradley, E., Curry, L., & Devers, K. (2007). Qualitative data analysis for health services research: Developing taxonomy, themes, and theory. Health Services Research, 42(4), 1758–1772. https://doi.org/10.1111/j.1475-6773.2006.00684.x

There are two types of statistics:

Descriptive statistics

Descriptive statistics are used to summarize and describe data (information that has been collected).

  • Data are usually organized and presented in tables or in graphs that summarize information, such as histograms, pie charts, bar charts, or scatter plots.
  • Descriptive statistics only describe; they do not involve generalizing beyond the data that have been collected.

Examples:

  • The average age of university students
  • The number of female and male students in a college
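
The two examples above can be sketched in a few lines of Python. This is only an illustration: the student records are made up, and the field names `age` and `sex` are assumptions chosen for the example. It uses nothing beyond the Python standard library.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for a handful of students (made-up data).
students = [
    {"age": 19, "sex": "female"},
    {"age": 21, "sex": "male"},
    {"age": 20, "sex": "female"},
    {"age": 23, "sex": "male"},
    {"age": 22, "sex": "female"},
]

# Descriptive statistic 1: the average age of the students we collected.
average_age = mean(s["age"] for s in students)

# Descriptive statistic 2: how many female and male students there are.
counts_by_sex = Counter(s["sex"] for s in students)

print(f"Average age: {average_age}")
print(f"Counts by sex: {dict(counts_by_sex)}")
```

Both results describe only these five records; nothing here claims anything about students who were not measured.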

Inferential statistics

With inferential statistics, data are usually collected from a sample; that is, a smaller, representative subset of the larger population we wish to investigate.

  • Inferential statistics use probability theory to investigate whether patterns found in the study sample can be generalized to the wider population from which the sample was drawn.
  • Inferential statistics aim to test hypotheses and explore relationships between variables, and can be used to make predictions about the population.
  • Inferential statistics are used to draw conclusions and inferences; that is, to make valid generalizations from samples.

Examples:

  • Statistical techniques that explore the relationship between variables (e.g. correlation coefficients). These techniques show us whether two variables are related.
  • Testing whether there is a relationship between stress levels and academic results.
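
As a rough sketch of the stress and academic results example, the code below computes a Pearson correlation coefficient for a small sample and then uses a permutation test to ask how likely a correlation at least that strong would be if the two variables were unrelated. The data are invented for illustration, the function names `pearson_r` and `permutation_p_value` are our own, and the permutation test is just one of several ways to do this kind of inference.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Two-sided p-value: share of random shuffles of ys whose |r| is
    at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up sample: self-reported stress level and exam grade for 8 students.
stress = [2, 4, 5, 6, 7, 8, 9, 10]
grades = [88, 85, 80, 78, 74, 70, 65, 60]

r = pearson_r(stress, grades)
p = permutation_p_value(stress, grades)
print(f"r = {r:.3f}, permutation p-value = {p:.4f}")
```

A small p-value suggests that the relationship seen in the sample is unlikely to be due to chance alone, which is what supports generalizing from the sample to the wider population. It does not show that stress causes lower grades.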

What type of statistic (descriptive or inferential) would you employ to answer the research questions below?  

Question / Type of statistic

  • How many college students experience high stress levels?
  • Do female students experience higher stress levels than male students?
  • What study strategies are used by first year college students?
  • Is there a relationship between college students' study strategies and their academic results?


The following open resources and books can help you learn data mining:

  • Websites 

            Coursera Data Mining Specialization
            Social Media Mining
            Twitter Data Analytics
            Social Media Mining With R
            Text Mining with R -- an Analysis of Twitter Data
            RDatamining.com: R and Data Mining
            Using the R Twitter Package

  • E-books

            Data Mining Applications with R
            Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

Data Analytics Tools

Bookshelf