With digitalization, we are living in an exciting time in which data flows in from everywhere (mobile devices, computers, etc.). This information is collected, stored and processed in order to give it meaning: it is used to make marketing decisions, solve complex problems, react more quickly to change and understand the world around us. However, making sense of this information can be difficult when you don’t know where to start or how to go about it, and this is where Data Science comes into play.
Data science is the use of methods to analyze massive amounts of data and extract the knowledge they contain. It combines analytical knowledge of mathematics and statistics, the computer programming knowledge needed to work with data, and an area of expertise. The area of expertise is the fundamental element, because without it one is simply a mathematician, statistician or programmer. The data scientist is responsible for processing, analyzing and modeling the data, and then interpreting the results to support decision-making.
The number of sectors using Data Science in decision-making keeps growing. Among them, we can cite:
These few examples allow us to conclude that data science is an integral part of our future.
Data are important, and to learn from them we must know the different types of data that exist.
In data science there are many different types of data:
Structured data is data that is formatted according to a predefined model. It is easy to process and can be accessed by both humans and computers. It is generally stored in a database, and Structured Query Language (SQL) is the preferred way to manage it.
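To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers table, its columns and its rows are invented for illustration; the point is only that a predefined model lets SQL filter the data by column.

```python
import sqlite3

# In-memory database; the table, columns and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Paris"), ("Bob", "Lyon"), ("Chloe", "Paris")],
)

# Every row follows the same predefined model, so SQL can filter by column.
paris_customers = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Paris",)
    )
]
conn.close()

print(paris_customers)
```

Any relational database would work the same way here; SQLite is used only because it ships with Python.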
Unstructured data is data that doesn’t depend on any model. It isn’t easy to fit into a data model because the content is context-specific or varying. It has no fixed rules or format, so it cannot be easily used by programs.
Natural language is a special type of unstructured data. It is the data used in NLP (Natural Language Processing), a branch of data science that allows computers to analyze and understand human language and to generate interactions, transforming raw data into intelligent conversation.
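As a very small illustration of a first step in processing natural language, the sketch below splits a sentence into lowercase words and counts them. The tokenize helper and the sample sentence are hypothetical, and real NLP pipelines are far more sophisticated than a regular expression.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


# Made-up sentence, used only for illustration.
sentence = "Data science turns raw data into knowledge."
tokens = tokenize(sentence)
counts = Counter(tokens)

print(tokens)
print(counts.most_common(1))
```

Counting words is about the simplest structure one can extract from free text, which is why it is a common starting point.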
Machine-generated data is information that’s automatically created by a computer, process, application, or other machine without human intervention. Machine-generated data is becoming a major data resource and will continue to do so. Examples of machine data are web server logs and call detail records.
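To show what working with machine data can look like, here is a rough sketch that parses one web server log line. It assumes the widely used Common Log Format; the sample line and the parse_log_line helper are illustrative, and a real server may be configured to log in a different format.

```python
import re

# Pattern for the Common Log Format (an assumption about the log layout).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return the fields of a log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    return record


# Made-up sample line in Common Log Format.
sample = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
record = parse_log_line(sample)

print(record["host"], record["path"], record["status"])
```

Once each line is turned into a record like this, the log behaves much more like structured data and can be filtered or aggregated.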
Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. Graph structures use nodes, edges, and properties to represent and store graphical data. Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people.
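Here is a small sketch of those two metrics on a tiny social network. The people and friendships are invented, the shortest path is found with a breadth-first search, and counting direct friends is only one crude stand-in for "influence" (many other measures exist).

```python
from collections import deque

# Made-up friendship network: each person maps to their direct friends.
friends = {
    "Ana": ["Ben", "Cara"],
    "Ben": ["Ana", "Dan"],
    "Cara": ["Ana", "Dan"],
    "Dan": ["Ben", "Cara", "Eve"],
    "Eve": ["Dan"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search: return a shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for neighbor in graph.get(person, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


def degree(graph, person):
    """Number of direct connections, used here as a crude influence score."""
    return len(graph.get(person, []))


path = shortest_path(friends, "Ana", "Eve")
most_connected = max(friends, key=lambda person: degree(friends, person))

print(path)
print(most_connected)
```

Breadth-first search fits here because every friendship counts as one step; weighted relationships would call for a different algorithm.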
Audio, image, and video are data types that pose specific challenges to a data scientist. Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for computers.
While streaming data can take almost any of the previous forms, it has an extra property. The data flows into the system when an event happens instead of being loaded into a data store in a batch. Although this isn’t really a different type of data, we treat it here as such because you need to adapt your process to deal with this type of information.
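The difference in process can be sketched as follows: a batch computation needs all values up front, while a streaming computation updates its result as each event arrives. The sensor_events generator and its readings are hypothetical stand-ins for a real event source.

```python
def batch_mean(values):
    """Batch style: all the data is loaded before computing."""
    return sum(values) / len(values)


def running_mean(stream):
    """Streaming style: yield an updated mean after every incoming value."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


def sensor_events():
    """Hypothetical event source; a real one would wait for events to occur."""
    for reading in [10.0, 20.0, 30.0]:
        yield reading


means = list(running_mean(sensor_events()))

print(means)
print(batch_mean([10.0, 20.0, 30.0]))
```

The streaming version never stores the full history, only a count and a running total, which is what makes it usable on data that never stops arriving.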
Now that we know the different types of data, let’s discover what skills we need to be data scientists.
Nowadays, some schools offer specialized programs tailored to the educational requirements of a career in data science, giving students the option to focus on the field of study they are most interested in, and in a shorter period of time. Some skills are needed to become a data scientist:
That’s all for now, so let’s become data scientists 😉