
Open-Source and Public Datasets for Machine Learning Studies and Research



Machine learning (ML) techniques are applied in many fields, from academia to industry, and have started to influence our daily lives through applications such as social media and online shopping. As a result, many new ML algorithms have been developed to improve the performance of these techniques.


Whether you are learning the basics of machine learning or developing new algorithms, you need large, reliable datasets in which the data members are consistently labelled and logically related. In academia especially, well-known and extensively examined datasets are necessary to evaluate the performance of newly developed machine learning algorithms and to compare them with existing ones.

There is a large number of publicly available datasets that can be used with various machine learning techniques, such as deep learning, classification, reinforcement learning, and clustering. Here are the dataset collections I most like to use:

1. UC Irvine Machine Learning Repository

Nearly all datasets in this repository have been published by university researchers. It covers a wide range of areas, from marketing to wireless communication systems, and most of the datasets are well documented. Link

2. Deeplearning.net Datasets

These datasets are mainly intended for benchmarking deep learning algorithms. Link

3. Wikipedia: List of Datasets


This Wikipedia page lists many datasets along with comprehensive details about each one, including its format, creators, reference study, and description. Link
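Once you have downloaded a file from a repository such as the UC Irvine collection, the first step is usually to separate the features from the labels. The sketch below assumes the layout used by many (but not all) UCI files, such as `iris.data`: no header row, numeric feature columns, and the class label in the last column. The function name `load_uci_csv` is hypothetical, and a few real Iris rows are embedded as a string so the example runs offline; in practice you would read the text of the downloaded file instead.

```python
import csv
import io
from typing import List, Tuple


def load_uci_csv(text: str) -> Tuple[List[List[float]], List[str]]:
    """Split comma-separated rows into numeric features and a string label.

    Assumption: no header row, every column except the last is numeric,
    and the last column holds the class label (as in UCI's iris.data).
    """
    features: List[List[float]] = []
    labels: List[str] = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, which some UCI files end with
            continue
        *values, label = row
        features.append([float(v) for v in values])
        labels.append(label.strip())
    return features, labels


# Sample rows in the iris.data format: sepal length, sepal width,
# petal length, petal width (all in cm), then the class label.
SAMPLE = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica

"""

X, y = load_uci_csv(SAMPLE)
print(len(X), sorted(set(y)))
```

Keeping features and labels in separate lists matches how most ML libraries expect training data, so the same split can be passed on to whichever algorithm you want to benchmark.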




Extra: MIT Lectures on Machine Learning and Deep Learning







