SQLBits 2017

The purpose of this post is to summarize my thoughts and points I came across the SQLBits (web|twitter) conference. The points are not explained into detail as they serve as reminder of topics that are important and have to be looked into. I use them to possible key words for deeper research on the topic covered during the presentation. This is just a tip of iceberg I gained during the four days, so if you are interested in any of these topics in more detail feel free to reach out to me. I’ve included a link to each session, which should be available as recording for free, each speaker’s blog and twitter account where you can find as well a lot more on each topic. Enjoy!

Day 1

From 0 to DAX

Speaker: Alberto Ferrari (blog|twitter)

Notes

Mind shift between Multidimensional and Tabular
- Use more calculated measures (CPU) rather than calculated columns (RAM)
- CPU is much, much faster than RAM
- Smaller models are faster
Filter context
- Automatically on table filtered by selections
  - Row
  - Column
  - Filter
  - Slicer
- Not changed by FILTER function
- FILTER changes the table, but not the filter context
Row context
- Automatically for each row
- Iterates through rows in table
- Does not filter anything
FILTER function
- Contexts and Relationships
- Go by default from many to one not from one to many
- Cross-filter direction filters as well from fact table to another dimension
- Alberto: “Result is correct but nobody understands it. It’s a strange definition of correct.”
Context Transition
- Performed by CALCULATE
- Row context transforms into filter context
- Calling measure is automatically surrounded by CALCULATE
- Never add a table name in front of a measure – Alberto: “If you do it, you go to hell!”
Troubleshooting issues
- Stop thinking
- Look at filter context and how it is applied
Use only one Date table in model

Day 2

Columnstore Indexes-the practical solutions

Speaker: Niko Neugebauer (blog|twitter)

Notes

Columnstore Archive Compression
- Better
- For data not used frequently
- Can separate on partitions
Dictionaries
- Do not use strings on fact tables
- Dictionary pressure
  - If dictionary is over 7MB it cuts down your row groups and slows down the performance
  - If you have too large dictionaries (more than 100k rows) they are not used to use predicate pushdown for strings
- Global dictionary
  - Each partition has its own dictionary
Niko: “Strings are evil!” 🙂
Use bulk load to load data directly to CI
- Use magic number to avoid Deltastore
Do not use CI with In-memory optimized tables
- not production ready, more problems than solutions
Hammer DB
- Generate test databases with lots of data
- http://www.hammerdb.com/
Batch mode is the gamechanger not compression
Partitioning
- Make sure how big the row-groups are in your partitions
- Each partition = min. 1 rowgroup
- Be wise with partitioning
- It’s not a performance feature, but a data management feature
In case of large strings and data dictionary pressure
- Create a NCI and avoid the large column
- Look at trim_reason_desc = NO_TRIM => this is ideal

Day 3

Columnstore killed the OLAP Star-Did it really?

Speaker: Gabi Munster (blog|twitter) and Niko Neugebauer (blog|twitter)

Notes

Hard to find people skilled in both
Hardware is not as easy to get as everybody says
If going to Azure check your requirements
Beware of dictionary pressure with Columnstore and page trimming
CI can perform on complex queries, but requires lot of knowledge writing it
Three factors for evaluation
- Performance
- Hardware cost
- Expert person cost and availability
In some cases, Tabular can run circles around columnstore. #performance
Semantic model is the main aspect for SSAS
Modern Datawarehouse requires combination of various data sources
- Tabular is comfortable here

T-SQL : Bad Habits and Best Practices

Speaker: Aaron Bertrand (blog|twitter)

Notes

Check out SQL Plan Explorer
Presentation for this on SentryOne blog
Great to review your T-SQL habits
Act 1 Stay consistent
Act 2 Use dates properly
Act 3 Get predictable behavior
Act 4 Query efficiently

Introduction to Azure Analysis Services

Speaker: Chris Webb (blog|twitter)

Notes

SSAS Tabular on cloud
Be aware which subscription you use
- Might happen that you deploy with wrong cached account
Using standard development tools
REST API for automation
Main drawback is the cost
- It’s in preview
Opens new usage scenarios of maturing Power BI

Optimizing multi-billion row tables in Tabular

Speaker: Alberto Ferrari (blog|twitter)

Notes

Do not store primary keys of fact tables
Care much more about cardinality rather than data type
Segments not smaller than partition
Parallelism is driven by segments not partitions
Create benchmark for optimizing columns
- Time of Sum
- Time of DistinctCount
Hardware is super-important
- Have the fastest cores you can have
Be aware of NUMA mode of your server
Be aware of relationships
- 1M on dimension

In-database Predictive Analytics with SQL Server vNext

Speaker: Sacha Tomey (blog|twitter)

Notes

Be a careful to select direct query vs. import
If you do not have gateway you can have some options with OneDrive
Carefully name the data sources in gateway according to actual server and database names

Power BI Gateway

Speaker: Adam Saxton (blog|twitter)

Notes

Map users between AAD a AD

Day 4

Database Design Throwdown

Speaker: Karen López (blog|twitter) and Thomas LaRock (blog|twitter)

Notes

Karen “Your main job is to keep your boss out of jail!”
Karen “Just in case is a lousy design pattern.”
Make the design decision on Cost, Benefit, Risk
Use sequences instead of identity
Start from lowest number not 1
Script for identity analysis from Tom’s blog
- https://thomaslarock.com/2016/12/sql-server-identity-values-check-github/

Building Blocks of Cortana Intelligence Suite in Azure

Speaker: Melissa Coates (blog|twitter)

Notes

Overview of Azure services forming Cortana Intelligence Suite
Presentation on Melissa’s site
Azure Data Analytics
- Currently not usable for ad-hoc queries
Azure Machine Learning
- Look at it as Import/Export Wizard in SSIS
- Helps you to start quickly and learn advanced analytics before you advance to own customized algorithms

Using M for data access in Analysis Services Tabular v.next

Speaker: Chris Webb (blog|twitter)

Notes

Azure adventure works from Redgate
Query folding for load performance
- SQL server
- SSAS
- Odata
Lot of new small SSAS model
- Will grow up and will need to scale
Direct query
- SSAS as new semantic layer on Big Data / Cloud sources
Partitioning with M in SSAS

SQLBits 2017

Day 1

From 0 to DAX

Speaker: Alberto Ferrari (blog|twitter)

Notes

Day 2

Columnstore Indexes-the practical solutions

Speaker: Niko Neugebauer (blog|twitter)

Notes

Day 3

Columnstore killed the OLAP Star-Did it really?

Speaker: Gabi Munster (blog|twitter) and Niko Neugebauer (blog|twitter)

Notes

T-SQL : Bad Habits and Best Practices

Speaker: Aaron Bertrand (blog|twitter)

Notes

Introduction to Azure Analysis Services

Speaker: Chris Webb (blog|twitter)

Notes

Optimizing multi-billion row tables in Tabular

Speaker: Alberto Ferrari (blog|twitter)

Notes

In-database Predictive Analytics with SQL Server vNext

Speaker: Sacha Tomey (blog|twitter)

Notes

Power BI Gateway

Speaker: Adam Saxton (blog|twitter)

Notes

Day 4

Database Design Throwdown

Speaker: Karen López (blog|twitter) and Thomas LaRock (blog|twitter)

Notes

Building Blocks of Cortana Intelligence Suite in Azure

Speaker: Melissa Coates (blog|twitter)

Notes

Using M for data access in Analysis Services Tabular v.next

Speaker: Chris Webb (blog|twitter)

Notes

Leave a Reply Cancel reply

Tech Lunch News

Data Engineer Diary: How to make Power BI Gateway work in an environment with a proxy

Self-hosted integration runtime

PowerShell kvíz – a modří už vědí?

Data Engineer Diary: How to take over Power BI Gateway

Power BI – unpivotování více než jednoho řádku