Data Science

A guide to library resources about Data Analytics

Data Analytics and Resources

  • "Data analytics is the computer-supported analysis of data. Its purpose is to derive information from the data and then use that knowledge to make better decisions or to effect change. That is, people want to reach an optimal decision quickly, and apply it, based on the analysis of data. By analyzing large amounts of data, people can discover hidden patterns and gain insights. Data may be structured or unstructured and may range up to a zettabyte in size." (Malliaris, 2018).
  • Qualitative data analysis aims at "the generation of taxonomy, themes, and theory germane" to a research project, and it "can improve the description and explanation of complex, real-world phenomena" pertinent to the research (Bradley, Curry, & Devers, 2007).

Notes:

Malliaris, M. (2018). Data analytics. In R. Kolb (Ed.), The SAGE encyclopedia of business ethics and society (Vol. 1, pp. 822-826). Thousand Oaks, CA: SAGE Publications, Inc. doi: 10.4135/9781483381503.n291

Bradley, E., Curry, L., & Devers, K. (2007). Qualitative Data Analysis for Health Services Research: Developing Taxonomy, Themes, and Theory. Health Services Research, 42(4), 1758–1772. https://doi.org/10.1111/j.1475-6773.2006.00684.x

There are two types of statistics:

Descriptive statistics

Descriptive statistics are used to summarize and describe data (information that has been collected).

  • Data are usually organized and presented in tables or graphs that summarize information, such as histograms, pie charts, bar charts, or scatter plots.
  • Descriptive statistics are only descriptive and, thus, do not involve generalizing beyond the data that have been collected.

Examples:

  • The average age of university students
  • The number of female and male students in a college
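
The two examples above can be sketched in a few lines of Python. This is only an illustration: the student records are made-up values, not real data, and the variable names are hypothetical. It uses only the Python standard library.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for six students: (age, gender).
# These values are invented purely for illustration.
students = [
    (19, "female"),
    (21, "male"),
    (20, "female"),
    (22, "female"),
    (18, "male"),
    (20, "male"),
]

# Descriptive statistic 1: the average age of the students we collected.
ages = [age for age, _ in students]
average_age = mean(ages)

# Descriptive statistic 2: the number of female and male students.
gender_counts = Counter(gender for _, gender in students)

print(f"Average age: {average_age}")
print(f"Female students: {gender_counts['female']}")
print(f"Male students: {gender_counts['male']}")
```

Both results describe only these six students; nothing here generalizes to students who were not in the list.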

Inferential statistics

With inferential statistics, data are usually collected from a sample; that is, a smaller representative subset of the larger population we wish to investigate.

  • Inferential statistics use the theory of probability to investigate whether patterns found in the sample of study can be generalized to the wider population where the sample comes from.
  • Inferential statistics aim to test hypotheses and explore relationships between variables, and can be used to make predictions about the population.
  • Inferential statistics are used to draw conclusions and inferences; that is, to make valid generalizations from samples.

Examples:

  • Statistical techniques that explore the relationship between variables (e.g., correlation coefficients).
  • These techniques show us whether two variables are related; for example, whether there is a relationship between stress levels and academic results.
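
As a rough sketch of the stress and academic results example, the Python code below computes a Pearson correlation coefficient for a small sample and then estimates how likely a correlation that strong would be by chance. The chance estimate uses a permutation test, which is one of several possible approaches and is chosen here only because it needs no extra libraries. The stress scores, grades, function names, and parameter values are all hypothetical.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value by shuffling ys to break any real link."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up sample of eight students: stress score (1-10) and course grade.
stress = [2, 4, 5, 7, 8, 9, 3, 6]
grades = [88, 80, 78, 70, 65, 60, 85, 74]

r = pearson_r(stress, grades)
p = permutation_p_value(stress, grades)
print(f"Correlation coefficient r = {r:.3f}")
print(f"Permutation p-value = {p:.4f}")
```

A small p-value suggests the relationship seen in the sample is unlikely to be due to chance alone, which is the kind of generalization from sample to population that inferential statistics aim for. In practice, a statistics package such as R, SPSS, or Stata would typically be used for this.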

What type of statistic (descriptive or inferential) would you employ to answer the research questions below?  

Question / Type of Statistic

  • How many college students experience high stress levels?
  • Do female students experience higher stress levels than male students?
  • What study strategies are used by first year college students?
  • Is there a relationship between college students' study strategies and their academic results?


The following open resources and books can help you learn data mining:

  • Websites
      • Coursera Data Mining Specialization
      • Social Media Mining
      • Twitter Data Analytics
      • Social Media Mining With R
      • Text Mining with R -- an Analysis of Twitter Data
      • RDatamining.com: R and Data Mining
      • Using the R Twitter Package

  • E-books
      • Data Mining Applications with R
      • Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

Data Analytics Tools

Summary of Software Access

Software  Windows/Mac  Developer         Open source  Access
ArcGIS    Windows      Esri              No           CUNY-wide licensed
Excel     Both         Microsoft         No           CUNY-wide licensed
Minitab   Windows      Minitab Inc.      No           Free trial download
SAS       Windows      SAS Institute     No           CUNY-wide licensed
SPSS      Both         IBM               No           CUNY-wide licensed
Stata     Both         StataCorp LLC     No           Purchase
R         Both         R Foundation      Yes          Free download
Tableau   Both         Tableau Software  No           Free trial download; free one-year license for students; free one-year license for instructors

Use CUNY Virtual Desktop to access CUNY licensed software packages remotely.

The server name for CUNY Virtual Desktop (VMware Horizon Client): https://virtualdesktop.cuny.edu

To remotely access software using CUNY Virtual Desktop:

  1. Install the VMware Horizon Client on your computer, tablet, or smartphone.
  2. Access your applications by clicking one of the application icons at https://www.cuny.edu/about/administration/offices/cis/virtual-desktop/.
  3. Log in using your CUNYfirst ID followed by @login.cuny.edu, and your CUNYfirst password.
  4. Save your data to a flash drive or local drive. Be sure to save your data before exiting the CUNY Virtual Desktop or your work will be lost. Print to any printer connected to your local device.
  5. For additional help, please see the FAQs and usage instructions, or contact QC ITS (Information Technology Services).

You can reach QC ITS via:

  • Email: helpdesk@qc.cuny.edu
  • Live online support
  • Create your ticket - log in to Self Service (use your QC Username if available; otherwise, log in as a guest)
  • Phone: 718-997-4444
  • In person: Dining Hall, Room 151 (currently no on-campus support)

To get more information, please visit QC ITS Service Desk.

Bookshelf