
Data Science

A guide to library resources about Data Analytics

Data Sources: QC Databases

Open Data - Economics, Finance & Tax

Open Data Sources, General

Research Data Journals

        -- Policies and Guidelines
        -- Public repositories to store and find data


Research Data Repositories

  • The Harvard Dataverse Repository is open to all researchers, both inside and outside of the Harvard community.
  • It contains more than 100,000 multidisciplinary datasets from published studies by researchers within and outside the Harvard community.

  • Dryad is a non-profit initiative organized by several academic institutions, research societies, and publishers to archive and share data.
  • Covers a broad range of research topics in natural and life sciences.

  • IEEE DataPort provides datasets from studies published in their many scholarly journals.


Directory of Data Repositories

Open data directories curate and organize lists of high-quality scholarly/research data repositories by subject area.

  • Indexes over 2,000 research data repositories.
  • Provides multidisciplinary data on Humanities & Social Sciences, Natural Sciences, Life Sciences, and Engineering Sciences.

  • Open Access Directory (OAD) is maintained by a community of researchers and scholars.
  • Data repositories are categorized by research area.



  • Contains more than 200,000 datasets collected from many federal agencies and departments.

  • From the U.S. Census Bureau.
  • Provides quality demographic data about the U.S. population and economy.
  • From the US Department of Health & Human Services.
  • Provides data on environmental health, medical devices, Medicare & Medicaid, social services, community health, mental health, and substance abuse.

  • From the US Department of Transportation.
  • Provides data on modes of travel, from driving to public transit, bicycling, and walking.


Major Organizations/Associations

  • From the Federal Reserve.
  • Provides over 500,000 financial and economic data series, including datasets on economic indicators, banking & finance, labor markets, employment, and national and international accounts.
  • From the Pew Research Center.
  • Provides data on politics, media, culture, religion, and internet/tech.


  • From the World Bank.
  • Provides data on major world development indicators (e.g., GDP, population, life expectancy, and education levels).

  • From WHO's Global Health Observatory (GHO) project.
  • Provides data on major global health indicators (e.g., disease spread, vaccinations, and rates of other illnesses/conditions).

Dataset Aggregators

  • Google's dataset search engine.
  • It can be used to search for many of the government and association-provided datasets.

  • The AwesomeData community publishes open data resources on GitHub.
  • Data resources are categorized neatly by research area.

Data Science Communities
Data science communities upload and share interesting datasets for the purposes of model building and analysis.

  • A popular data science competition site.
  • Contains datasets for building predictive models.
  • Data quality may vary, as any user in the Kaggle community can submit datasets.

  • The UCI Machine Learning Repository, from the University of California, Irvine.
  • Provides cleaned and preprocessed datasets.
  • Provides data on politics, economics, and sports.


The City of New York's open data is collected by the Open Data team, which consists of the Mayor’s Office of Data Analytics (MODA) and the Department of Information Technology and Telecommunications (DoITT).

It provides datasets on almost all aspects of New Yorkers' lives.

  • Business
  • City government
  • Education
  • Environment
  • Health

  • New York State's Open Data Portal is a tool that provides centralized access to high-value government data that users can search, explore, download, and share.
  • Users can track visualized data on coronavirus testing and infection rates in NYS.



• is a platform where the world’s problem solvers can find and use a vast array of high-quality open data.

• World Bank Open Data

• Open data site finder (Tableau Public viz)

• Kaggle Open Datasets: find open datasets on everything from government, health, and science to popular games and dating trends.

• Global Health Data Exchange: the world’s most comprehensive catalog of surveys, censuses, vital statistics, and other health-related data.

• UNICEF Statistics

• World Health Organization Data

• The Guardian Data Blog

United States

• is the Federal Government’s open data initiative. Hundreds of data sets.

• The Pew Research Center has a number of data sets on different social and technology topics.

• Socrata is an open data repository with data, mostly from government sources.

• US Dept. of Agriculture

• US Census Bureau


• Centers for Disease Control and Prevention

• US Dept. of Education College Scorecard Data

• NASA's Open Data Portal


• Canada Open Data

United Kingdom


Data Preparation, Presentation, and Citation

The data cleaning, transformation, and integration stage can take 60-80% of your time. Data visualization can be part of either the analysis stage or the presentation stage.
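
To make the cleaning, transformation, and integration steps concrete, here is a minimal sketch using only the Python standard library. The two small tables, the region names, and all function names are hypothetical and invented for illustration; real datasets from the sources listed in this guide will need their own cleaning rules.

```python
import csv
import io

# Hypothetical raw data: inconsistent casing, stray whitespace,
# a missing value, and a number written with a thousands separator.
RAW_POPULATION = """region,population
 north ,1200
South,
EAST,"3,400"
"""

RAW_AREA = """region,area_km2
north,10
south,25
east,20
"""


def clean_region(name):
    """Normalize a region label so both tables share one key format."""
    return name.strip().lower()


def load_population(text):
    """Clean: normalize keys, drop rows with no value, parse numbers."""
    cleaned = {}
    for row in csv.DictReader(io.StringIO(text)):
        value = (row["population"] or "").replace(",", "").strip()
        if not value:  # skip rows with a missing population
            continue
        cleaned[clean_region(row["region"])] = int(value)
    return cleaned


def load_area(text):
    """Clean the second table with the same key normalization."""
    return {
        clean_region(row["region"]): float(row["area_km2"])
        for row in csv.DictReader(io.StringIO(text))
    }


def integrate(population, area):
    """Integrate: inner-join on region. Transform: derive a density field."""
    merged = []
    for region in sorted(population.keys() & area.keys()):
        merged.append({
            "region": region,
            "population": population[region],
            "area_km2": area[region],
            "density": population[region] / area[region],
        })
    return merged


records = integrate(load_population(RAW_POPULATION), load_area(RAW_AREA))
for record in records:
    print(record)
```

The row with the missing population is dropped during cleaning, so only regions present in both cleaned tables survive the join. For larger datasets, a library such as pandas is the more common choice for the same steps.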

Data finding & extraction example:

The Censuses of Religious Bodies, 1906 - 1936

Some key objectives to think about when presenting data analysis:

  • Visual communication
  • Audience and context
  • Charts, graphs, and images
  • Focus on important points
  • Design principles
  • Storytelling
  • Persuasiveness
  • Dashboards


Some resources to help properly cite data for research: