Microsoft Python Driver for SQL Server (github.com)
68 points by kermatt 7 hours ago | 26 comments
denis_dolya 6 hours ago [-]
I’ve been working with SQL Server from Python on various platforms for several years. The new Microsoft driver looks promising, particularly for constrained environments where configuring ODBC has historically been a source of friction.

For large data transfers (for example, Pandas or Polars DataFrames with millions of rows), performance and reliability are critical. In my experience, fast_executemany in combination with SQLAlchemy helps, but bulk operations via OPENROWSET or BCP are still the most predictable in production, provided the proper permissions are set.
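
For reference, a minimal sketch of the fast_executemany route (the connection string, table name, and chunk size here are placeholders, not a recommendation for any specific setup):

    # Sketch: pandas -> SQL Server via SQLAlchemy + pyodbc fast_executemany.
    # Connection string and table name are placeholders.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine(
        "mssql+pyodbc://user:pass@myserver/mydb"
        "?driver=ODBC+Driver+18+for+SQL+Server",
        fast_executemany=True,  # batch INSERTs at the ODBC layer
    )

    df = pd.DataFrame({"id": range(1_000_000), "val": "x"})
    df.to_sql("staging_table", engine, if_exists="append",
              index=False, chunksize=50_000)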

It’s worth noting that even with a new driver, integration complexity often comes from platform differences, TLS/SSL requirements, and corporate IT policies rather than the library itself. For teams looking to simplify workflows, a driver that abstracts these nuances while maintaining control over memory usage and transaction safety would be a strong improvement over rolling your own ODBC setup.

th0ma5 6 hours ago [-]
This is the correct perspective. Driver issues often transcend technical and political boundaries. My old team dropped a vendor who changed the features of a driver, then spent several years trying to find another while also making that vendor reapply and make a new case, which didn't work out for them.
zurfer 7 hours ago [-]
This is really timely. I just needed to build a connector to Azure Fabric, and it requires ODBC 18, which in turn requires OpenSSL to allow deprecated and old TLS versions. Now I can revert all of that and make it clean :)
zurfer 5 hours ago [-]
Actually, bad luck: it seems this doesn't support Microsoft Fabric with the data warehouse engine... it fails because Fabric doesn't support the DECLARE CURSOR operation, which this driver relies on.
__mharrison__ 6 hours ago [-]
Very cool. It used to be a huge pain to connect to SQL Server from Python (especially on non-Windows platforms).
qsort 6 hours ago [-]
I do expect this package to make connecting easier, but it was okay even before. ODBC connectivity via pyodbc has always worked quite well, and it wasn't really any different from any other ODBC source. I'm more on the data engineering side and very picky about this kind of stuff; I don't expect the average user would even notice, beyond the initial pain of configuring ODBC from scratch.
tracker1 5 hours ago [-]
IIRC, I had trouble when I installed the MS ODBC driver and some of the updates for Ubuntu (WSL) out of order. I generally prefer a language-native driver package where available.

Would be nice if MS and Deno could figure things out to get SQL working in Deno.

brewmarche 5 hours ago [-]
I have very limited Python experience but I remember using pymssql in the past. Are there any problems with it?
mrweasel 4 hours ago [-]
We used it heavily 10 years ago. It was okay, but it had a rocky history. For a long time it seemed abandoned; then Python 3 happened and we had to patch our own version for a while. Then a new maintainer took over, and stuff would just break or APIs would be rewritten between versions. We ran our own PyPI instance to deal with pymssql specifically.

In later years it became rather good, though, and with Python 3 many of the character encoding issues went away.

yread 2 hours ago [-]
IIRC it had problems with named instances, but only on some newer versions of SQL Server.
abirch 7 hours ago [-]
What my work self would love is to easily dump Pandas or Polars DataFrames to SQL tables in SQL Server as fast as possible. I see the bcp support, but I don't see an example of uploading a large Pandas DataFrame to SQL Server.
RaftPeople 5 hours ago [-]
> What my work self would love is to easily dump Pandas or Polars DataFrames to SQL tables in SQL Server as fast as possible

We run into this issue too: we want to upload large volumes of data but don't want to assume access to BCP on every DB server.

We wrote a little utility that is actually pretty fast compared to other methods we've tested (fast = about 1,000,000 rows per minute for a table with 10 columns of random data). Here's the approach:

1-Convert rows into fixed-length strings so each row is uploaded as a single varchar column (which makes parsing and execution of the SQL statement during upload much quicker).

2-Repeatedly upload groups of fixed-length rows into a temp table until all are uploaded.

Details:

Multiple fixed-length rows are combined into one fixed-length varchar column that is uploaded as one single raw buffer row. We found a buffer size of 15,000 to be the sweet spot.

Multiple threads will each process a subset of source data rows. We found 5 threads to be generally pretty good.

At the end of this step, the destination temp table will have X buffer rows (the buffer column is just a varchar(15000)), and inside each of those buffers are Y source data rows, each with Z columns in fixed format.

3-Once the buffer rows are all uploaded, split out the source data rows and columns using a temp sproc generated for the exact schema (e.g. substring(Buffer_Data, x, y) as Cust_Name). A rough sketch of the packing step is below.
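
This is only an illustration of the idea, not our actual utility; column names, widths, and the unpack SQL are made up:

    # Sketch of the pack-into-buffers idea; names and widths are illustrative.
    BUF_SIZE = 15000
    WIDTHS = [("Cust_Id", 10), ("Cust_Name", 40)]   # fixed column widths
    ROW_LEN = sum(w for _, w in WIDTHS)
    ROWS_PER_BUF = BUF_SIZE // ROW_LEN

    def pack(rows):
        """Yield varchar buffers, each holding ROWS_PER_BUF fixed-width rows."""
        buf = []
        for row in rows:
            fixed = "".join(str(v).ljust(w)[:w]
                            for v, (_, w) in zip(row, WIDTHS))
            buf.append(fixed)
            if len(buf) == ROWS_PER_BUF:
                yield "".join(buf)
                buf = []
        if buf:
            yield "".join(buf)

    # Each buffer is INSERTed as a single varchar(15000) row into a temp
    # table (optionally from several threads), then split server-side:
    #   SELECT SUBSTRING(Buffer_Data, 1, 10)  AS Cust_Id,
    #          SUBSTRING(Buffer_Data, 11, 40) AS Cust_Name
    #   FROM #Buffers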

bob1029 4 hours ago [-]
On BULK INSERT, the data source doesn't have to live on the actual MSSQL box. It can be something on a UNC path or even in Azure:

https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-...

> Beginning with SQL Server 2017 (14.x), the data_file can be in Azure Blob Storage.

You don't need to have access to the MSSQL box to invoke high performance batch loads. UNC should give you what you need if your IT department is willing to work with it.
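
A minimal sketch of firing that off from Python, assuming the external data source and blob path already exist on the server (both are placeholders here):

    # Sketch: server-side BULK INSERT from Azure Blob Storage, issued from
    # Python. 'MyAzureBlobStorage' is a placeholder EXTERNAL DATA SOURCE.
    import pyodbc

    conn = pyodbc.connect("DSN=mydsn")  # placeholder connection
    conn.execute("""
        BULK INSERT dbo.MyTable
        FROM 'exports/data.csv'
        WITH (DATA_SOURCE = 'MyAzureBlobStorage',
              FORMAT = 'CSV', FIRSTROW = 2);
    """)
    conn.commit()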

A4ET8a8uTh0_v2 7 hours ago [-]
Honestly, what I find myself doing more often than not lately is not fighting the actual data/code/schema, but instead fighting layers of bureaucracy, restrictions, data leakage prevention systems, and specific file limitations imposed by the previously listed items...

There are times I miss being a kid and just doing things.

kermatt 5 hours ago [-]
While bcp lacks some features that would make this as straightforward as it is in PostgreSQL (e.g. piping data into bcp), it is a fast ingest option for MSSQL.

We wound up staging a local tab delimited file, and importing via bcp:

    bcp "$DESTINATION_TABLE" in "$STAGE_FILE.dat" -u -F 2 -c -t'\t'
Not elegant, but it works.
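
From Python the whole staging dance looks roughly like this (server name, table, and auth flags are placeholders; swap in whatever your environment needs):

    # Sketch: stage a DataFrame as a tab-delimited file, then shell out
    # to bcp. Server name and auth flags are placeholders.
    import subprocess
    import pandas as pd

    df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
    df.to_csv("stage.dat", sep="\t", index=False)
    subprocess.run(
        ["bcp", "dbo.MyTable", "in", "stage.dat",
         "-S", "myserver", "-T",   # -T: trusted connection (placeholder)
         "-F", "2", "-c", "-t", "\t"],
        check=True,
    )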
qsort 6 hours ago [-]
How large? In many cases, dumping to a file and bulk loading is good enough. SQL Server in particular has OPENROWSET, which supports bulk operations; that's especially handy if you're transferring data over the network.
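
Something like this, where the UNC path and format file are placeholders:

    # Sketch: bulk load from a network share via OPENROWSET(BULK ...).
    # The share path and format file are placeholders.
    import pyodbc

    conn = pyodbc.connect("DSN=mydsn")  # placeholder connection
    conn.execute(r"""
        INSERT INTO dbo.MyTable
        SELECT t.*
        FROM OPENROWSET(BULK '\\fileshare\exports\dump.dat',
                        FORMATFILE = '\\fileshare\exports\dump.fmt') AS t;
    """)
    conn.commit()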
abirch 6 hours ago [-]
Millions of rows large. I tried the OPENROWSET route but encountered permission issues with the shared directory. Using fast_executemany with SQLAlchemy has helped, but it can still take a few minutes. I tried bcp locally as well, but IT has not wanted to deploy it to production.
sceadu 6 hours ago [-]
You might be able to do it with ibis. I don't know about the performance, though.
abirch 6 hours ago [-]
Thank you, I'll look into this. Yes, performance is the main driver when some data frames have millions of rows.
wiseowise 2 hours ago [-]
https://github.com/microsoft/mssql-python/blob/main/mssql_py...

Is this generated by an LLM? The comments read like generic LLM slop.

Wojtkie 2 hours ago [-]
How is this different from pymssql?
jollyllama 4 hours ago [-]
Oooh, has it got sql_variant support? And how far back does SQL Server compatibility go?
gigatexal 5 hours ago [-]
If Microsoft really wanted MSSQL to become more mainstream, they'd release a properly free version to compete with MySQL/MariaDB and PostgreSQL.

I've not used MSSQL since 2015/2016 and haven't missed much.

Now I live in the OLAP space so I think of it far, far less.

stackskipton 4 hours ago [-]
Money here means this won’t happen.

Sure, greenfield projects don't use MSSQL, but there are a ton of companies stuck with MSSQL that will continue to have to fork over big licensing money.

gigatexal 3 hours ago [-]
Which will mean it’ll stay irrelevant still.
stackskipton 1 hour ago [-]
Which Microsoft is fine with; they bought a Postgres sharding company, and Azure supports all the popular DBs, so they still get their money.