
Opensource or Public Datasets for Machine Learning Studies and Research



Machine learning (ML) techniques are now used in a wide range of applications, from academia to industry, and have started to influence our daily lives, for example through social media and online shopping. As a result, many ML algorithms have been developed to improve the performance of these techniques.


Whether you are learning machine learning basics or developing new algorithms, it is essential to have large, reliable datasets that include logical connections between data members and labels for them. Especially in academia, well-known and extensively examined datasets are necessary in order to investigate the performance of newly developed machine learning algorithms and compare them to existing ones.
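To make the comparison idea concrete, here is a minimal sketch of evaluating two classifiers on the same labelled data. The dataset is a hypothetical toy example invented for illustration, and the two methods (a majority-class baseline and a 1-nearest-neighbour classifier) are only stand-ins for "existing" and "new" algorithms.

```python
from collections import Counter

# Hypothetical toy dataset: (feature vector, label) pairs made up for illustration.
train = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"),
]
test = [((0.9, 1.1), "a"), ((3.2, 3.0), "b"), ((2.9, 3.1), "b")]


def majority_baseline(train_set):
    """Return a classifier that always predicts the most common training label."""
    most_common = Counter(label for _, label in train_set).most_common(1)[0][0]
    return lambda x: most_common


def nearest_neighbour(train_set):
    """Return a 1-nearest-neighbour classifier (squared Euclidean distance)."""
    def predict(x):
        closest = min(
            train_set,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return closest[1]
    return predict


def accuracy(classifier, data):
    """Fraction of samples in `data` whose label the classifier predicts correctly."""
    correct = sum(1 for x, y in data if classifier(x) == y)
    return correct / len(data)


# Both methods see the same training and test data, so the scores are comparable.
results = {
    "majority baseline": accuracy(majority_baseline(train), test),
    "1-nearest neighbour": accuracy(nearest_neighbour(train), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Because both classifiers are scored on the same test samples, the difference in accuracy reflects the methods rather than the data. That is the main reason shared, well-examined datasets are useful for benchmarking.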

There is a large number of publicly available datasets that can be used with various machine learning techniques such as deep learning, classification, reinforcement learning, and clustering. Here are the dataset sources that I like to use most:

1. UC Irvine Machine Learning Repository

Nearly all datasets in this repository have been published by university researchers. It covers a wide range of areas, from marketing to wireless communication systems, and most of the datasets are well documented. Link

2. Deeplearning.net Datasets

These datasets are mainly intended for benchmarking deep learning algorithms. Link

3. Wikipedia: List of Datasets


This Wikipedia page lists plenty of datasets with comprehensive details, including their format, creators, reference study, and a description. Link
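Once a dataset has been downloaded from one of these sources, the first step is usually to separate features from labels. The sketch below assumes a header-less, comma-separated file with the class label in the last column, in the style of the Iris data hosted on the UCI repository. The sample rows are embedded inline so that the example runs offline, and the helper name `load_labelled_csv` is hypothetical.

```python
import csv
import io
from collections import Counter

# Sample rows embedded inline (assumed format: numeric features, label last).
# For a real file, use open(path, newline="") instead of io.StringIO.
SAMPLE = """\
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""


def load_labelled_csv(stream):
    """Split each row into a list of float features and a string label."""
    features, labels = [], []
    for row in csv.reader(stream):
        if not row:  # skip blank lines, which some data files end with
            continue
        features.append([float(value) for value in row[:-1]])
        labels.append(row[-1])
    return features, labels


features, labels = load_labelled_csv(io.StringIO(SAMPLE))
print(len(features), "samples,", len(features[0]), "features each")
print(Counter(labels))  # class distribution, worth checking before training
```

Other datasets may use a header row, a different delimiter, or a label in the first column, so it is worth reading the dataset's documentation before adapting the parsing logic.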




Extra: MIT Lectures on Machine Learning and Deep Learning







