**How to become a data scientist**

In today’s age, the importance of computers is increasing day by day.

So, a young person who has just finished an engineering degree and wants to take on responsibility in the family starts looking for a job that can meet both personal and family needs.

Now you may be wondering whether someone who is already proficient in another technology, such as Android, PHP, or ASP.NET MVC, can get a higher-paying job by acquiring the skills of a **“Data Scientist”**. All your questions will be answered here.

The demand for **“data scientists”** is very high these days.

If you have in-depth knowledge, you can earn a very attractive salary. Let’s discuss Machine Learning, Data Science, Python, etc. in detail.

Whenever you start your journey as a data scientist, the first step is to move towards “data pre-processing”.

To understand why, read the following paragraphs carefully. They describe how machine learning is leaving its mark on human society and how it will be useful in the future.

When you look up information about a topic or want to buy a product online, the computer shows you many other related options.

You then have many options to choose from, and you can get the information from the website you choose or buy the product.

Let’s understand it as a beginner: this process is a kind of **algorithm**, or we can say a **kind of data processing**, that works to deliver items according to your needs.

**Data Pre-processing** will be your first step when you choose data science as your way of earning a living.

So first we need to see how we can import libraries, upload CSV files, etc. using **Google Colaboratory**.

**Import library**

Step 1) “numpy” will allow us to work with arrays.

Step 2) “matplotlib” will allow us to plot charts or graphs (for data visualizations).

Step 3) “pandas” will allow us not only to import the data set but also to create the matrix of features and the dependent variable vector.
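The three steps above can be sketched as follows. This is only a minimal sketch, not a full pre-processing pipeline: the tiny data set, its column names (`Country`, `Age`, `Purchased`), and the variable names `X` and `y` are assumptions made up for illustration, and the CSV is read from an in-memory string so the snippet runs without uploading a file. In Google Colaboratory you would pass the name of your uploaded CSV file to `pd.read_csv` instead.

```python
import io

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data set, kept in a string so no file upload is needed.
csv_text = """Country,Age,Purchased
France,44,No
Spain,27,Yes
Germany,30,No
Spain,38,Yes
"""

# pandas imports the data set into a DataFrame.
dataset = pd.read_csv(io.StringIO(csv_text))

# Matrix of features: every column except the last one.
X = dataset.iloc[:, :-1].to_numpy()
# Dependent variable vector: the last column.
y = dataset.iloc[:, -1].to_numpy()

# numpy lets us work with the values as an array.
ages = np.array(dataset["Age"])
mean_age = ages.mean()

# matplotlib plots a simple bar chart of the ages.
fig, ax = plt.subplots()
ax.bar(range(len(ages)), ages)
ax.set_xlabel("Row")
ax.set_ylabel("Age")
plt.close(fig)  # in a notebook, call plt.show() instead
```

Here the last column is assumed to be the dependent variable, which is a common layout but not a rule; adjust the slicing if your data set is arranged differently.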

There are many more libraries, such as Seaborn, Plotly (for interactive visualizations), Scikit-Learn (for machine learning tasks), TensorFlow, and more. I am going to cover the ones that are necessary for you.

In my first blog post, I mentioned that you should know one of two languages (R or Python). If you want to read that post, I have put the link here; you can go and get the basic information from it.

**Let’s just understand the basic library “NumPy” in general.**

**WHAT IS NUMPY?**

- It is open-source and distributed under a liberal BSD license, and it is widely used in almost every field of data science.

- Both freshers and experienced people can learn it.

- It is a fast and versatile linear algebra library for the Python programming language.

- It works as the main building block for almost all libraries in the “PyData” ecosystem.

- It is used for a wide variety of mathematical operations on arrays, basic statistical operations, shape manipulation, and much more.

- Its two most important parts are vectors (**strictly** 1-D arrays) and matrices (2-D arrays), which I am going to discuss in the upcoming blog.
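To make the points above a little more concrete, here is a small sketch of a vector, a matrix, and a few of the operations mentioned in the list. The numbers and variable names are made up for illustration only.

```python
import numpy as np

# A vector is a strictly 1-D array.
vector = np.array([1, 2, 3, 4, 5, 6])

# A matrix is a 2-D array.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

# Mathematical operations apply element by element.
doubled = vector * 2

# Basic statistical operations.
mean_value = vector.mean()
max_value = vector.max()

# Shape manipulation: turn the 6-element vector into a 3x2 matrix.
reshaped = vector.reshape(3, 2)

# Linear algebra: matrix product of a 2x3 and a 3x2 matrix.
product = matrix @ reshaped

print(vector.ndim, matrix.ndim)  # number of dimensions of each array
print(reshaped.shape)
print(product)
```

The `ndim` attribute is an easy way to check whether you are holding a vector (1) or a matrix (2).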

Now, if you want to learn and play with the code and Python is already installed on your PC, install NumPy with either of these commands:

conda install numpy

or

pip install numpy
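Once the installation has finished, a quick check like the one below can confirm that NumPy imports correctly. This is just a suggested sanity check; the variable name `check` is arbitrary.

```python
import numpy as np

# Print the installed NumPy version to confirm the import works.
print(np.__version__)

# Create a small array as a quick sanity check.
check = np.arange(5)
print(check)
```

If the import fails with a `ModuleNotFoundError`, the package was most likely installed into a different Python environment than the one you are running.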