Microsoft Fabric and Data Science

One of Microsoft’s key selling points for Microsoft Fabric is its support for data science: individuals can complete an end-to-end data science project within a single service.

What is Data Science?

Data Science combines mathematics, computer engineering, and statistics.

Data science helps organizations make informed decisions on key business problems by leveraging their data assets. Data scientists analyze that data and identify patterns in it.

The data science process

What is Machine Learning?

The objective of machine learning is to train models that have the ability to identify patterns in vast amounts of data. With knowledge of these patterns, we can make predictions that provide new insights that can drive action and decision-making.

There are four common types of machine learning models:

  1. Classification: Predict a categorical value like whether a customer may churn.

  2. Regression: Predict a numerical value like the price of a product.

  3. Clustering: Group similar data points into clusters or groups.

  4. Forecasting: Predict future numerical values based on time-series data like the expected sales for the coming month.
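
To make the four model types above more concrete, here is a minimal sketch using scikit-learn on made-up data. The datasets, variable names, and the toy rules that generate the labels are all assumptions for illustration. The forecasting example fits a simple linear trend, which is only one of many ways to forecast a time series.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)  # fixed seed so the toy data is reproducible

# 1. Classification: predict a category (will the customer churn?).
usage_hours = rng.uniform(0, 40, size=(200, 1))
churned = (usage_hours[:, 0] < 10).astype(int)  # toy rule: low usage means churn
classifier = LogisticRegression(max_iter=1000).fit(usage_hours, churned)
churn_prediction = classifier.predict(np.array([[2.0], [35.0]]))

# 2. Regression: predict a number (the price of a product).
size = rng.uniform(1, 10, size=(200, 1))
price = 3.0 * size[:, 0] + 5.0 + rng.normal(0, 0.5, size=200)
regressor = LinearRegression().fit(size, price)
price_prediction = regressor.predict(np.array([[4.0]]))[0]

# 3. Clustering: group similar points without using any labels.
group_a = rng.normal(0.0, 1.0, size=(50, 2))
group_b = rng.normal(10.0, 1.0, size=(50, 2))
points = np.vstack([group_a, group_b])
cluster_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)

# 4. Forecasting: predict a future value from time-ordered data.
months = np.arange(24).reshape(-1, 1)
sales = 100.0 + 5.0 * months[:, 0] + rng.normal(0, 2.0, size=24)
trend = LinearRegression().fit(months, sales)
next_month_sales = trend.predict(np.array([[24]]))[0]

print("Churn predictions:", churn_prediction)
print("Predicted price:", round(price_prediction, 2))
print("Cluster sizes:", np.bincount(cluster_labels))
print("Forecast for next month:", round(next_month_sales, 1))
```

Classification and regression learn from labeled examples, clustering works without labels, and forecasting depends on the order of the observations in time.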

What is the Data Science process?

Data scientists use machine learning models to identify patterns and generate insights that can help businesses answer important questions. The process for creating these models can be broken down generally into the following steps:

  1. Define the problem: Working with business representatives and analysts, decide on what the model should predict and what defines success.

  2. Get the data: Explore, access, and store data from key data sources.

  3. Prepare the data: Explore, clean, and transform the data based on the model's requirements.

  4. Train the model: Choose an algorithm and hyperparameter values based on trial and error by tracking your experiments (e.g., with MLflow).

  5. Generate insights: Leverage model batch scoring to generate the requested predictions.

What do data scientists do with models? Why are they important?

How do I train the model?

You can use open-source libraries in your language of choice. For example, if you work with Python, you can use NumPy and/or pandas to prepare the data and libraries like scikit-learn, PyTorch, or SynapseML to train the model.
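
As a rough illustration of that workflow, the sketch below prepares a small table with pandas, trains a scikit-learn model, and then scores a batch of records. The customer data, column names, and churn rule are invented for the example, and the random forest is just one reasonable choice of algorithm.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical customer data; in practice you would read this from your data source.
rng = np.random.default_rng(42)
n = 300
raw = pd.DataFrame({
    "tenure_months": rng.integers(1, 60, size=n).astype(float),
    "monthly_spend": rng.uniform(10, 120, size=n),
    "plan": rng.choice(["basic", "premium"], size=n),
})
# Toy label: new customers on the basic plan churn.
raw["churned"] = ((raw["tenure_months"] < 12) & (raw["plan"] == "basic")).astype(int)
raw.loc[::25, "monthly_spend"] = np.nan  # simulate missing values

# Prepare the data: fill missing values and encode the categorical column.
prepared = raw.copy()
prepared["monthly_spend"] = prepared["monthly_spend"].fillna(prepared["monthly_spend"].median())
prepared = pd.get_dummies(prepared, columns=["plan"], drop_first=True)

# Train the model on one part of the data and hold out the rest for evaluation.
X = prepared.drop(columns="churned")
y = prepared["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
model = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))

# Generate insights: score a whole batch of records in one call.
scored = X_test.copy()
scored["churn_prediction"] = model.predict(X_test)

print("Hold-out accuracy:", round(accuracy, 3))
print(scored.head())
```

Holding out a test set gives a fairer estimate of how the model behaves on data it has not seen during training.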

Can I use different models?

Yes, and you should. Experimenting with different models helps you understand how your choices affect the model’s success, and you can use MLflow in Microsoft Fabric to manage the models you’ve selected and deployed.
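
One way to run such an experiment is sketched below: three scikit-learn models are compared with cross-validation, and each result is logged as an MLflow run. The synthetic dataset, the experiment name "churn-model-comparison", and the choice of candidate models are assumptions for illustration. The MLflow import is guarded so the comparison still runs where MLflow is not installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

try:
    import mlflow
except ImportError:  # the comparison itself does not depend on MLflow
    mlflow = None

# Synthetic stand-in for a prepared training dataset.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

if mlflow is not None:
    mlflow.set_experiment("churn-model-comparison")  # hypothetical experiment name

results = {}
for name, model in candidates.items():
    # Mean accuracy over 5 cross-validation folds.
    score = float(cross_val_score(model, X, y, cv=5).mean())
    results[name] = score
    if mlflow is not None:
        # Record each candidate as its own run so the runs can be compared later.
        with mlflow.start_run(run_name=name):
            mlflow.log_param("model_type", name)
            mlflow.log_metric("mean_cv_accuracy", score)

best_model_name = max(results, key=results.get)
print(results)
print("Best model:", best_model_name)
```

Logging every candidate, not only the winner, keeps a record of what was tried and makes it easier to justify the final choice.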

How does Microsoft Fabric support the spectrum of data science activities?

