Azure Confidential Computing updates with Mark Russinovich | Best of Microsoft Ignite 2018


Coming up, we’re joined by none other than Mark Russinovich to take an early look at Microsoft’s leadership role within confidential computing, which opens up new opportunities for the processing of sensitive data in the cloud. We’ll explain the core concepts and demonstrate the underlying tech, including the latest silicon-based approaches with Intel SGX, the new Azure confidential computing DC-series VMs, and a new software development kit that makes it easier to create apps that keep your important data and algorithms confidential during computation. So I’m joined today by the CTO of Azure, Mark Russinovich. Welcome back to Microsoft Mechanics.

Thanks, it’s good to be back.

Yeah, come on. So Mark, you’ve been a leader in the security space for a while, and the concept of confidential computing in the cloud is something that you’ve personally been spearheading. Can you explain what it is and why it’s so important for people?

Sure. Well, in Azure
we’ve adopted a mindset of “assume breach,” which means that we assume that hackers are going to get into the infrastructure, and that potentially there might be malicious insiders, even in Azure. Our ultimate goal is to protect customer data. The ways that we’ve done that in Azure are by implementing technologies like encryption of data at rest, including with server-side managed keys as well as bring-your-own-keys, for a number of our services. We also encrypt data in transit, and customers can also encrypt their own data in transit using protocols like TLS. But the missing piece for really protecting data is encryption while in use. What that means is that while you’re processing the data, it’s typically out in the clear: in the processor, in the memory, and in the caches. That makes it subject to attacks. It makes it subject to logical attacks from host administrative processes that can access that memory, and it makes it susceptible to physical attacks.

Okay, so how do we
deliver on confidential computing in Azure?

So this is a project that we’ve been working on for about five years now, and it’s been a collaboration between Microsoft Research, Windows, and Azure, looking at how we can make confidential computing practical: how we can deliver this encryption of data while in use.

Can you help us understand a little bit more about some of the mechanics behind the trusted execution environment, so the audience can understand it in a little more depth?

Yeah. The fundamental concept behind confidential computing is this concept of a trusted execution environment, or TEE. You can think of it as a black box. In that black box you can put some code and some data, and from the outside of the black box you can’t see anything inside of it. You can’t see the memory, which is encrypted, and you can’t mess with the code. But you can get an attestation from the TEE provider as to what code is running inside that black box. So if there’s a TEE executing and you trust whatever provider has created that TEE (and I’ll talk about those in a minute), you can gain trust in the code that’s running inside it. You know that if you release data to it, by means of giving it a key that will allow it to decrypt data into that black box and then process it...

Right.

...then you can be assured that your data is going to be protected while it’s in use.

But the data is not exposed to the cloud provider, specifically?

That’s right. These TEEs, or enclaves as the industry calls them, are protected, depending on whether they’re implemented in software or hardware (which I’ll talk about), from everything including physical attacks, or from everything including administrators on the box.

Okay, so how do
we create these trusted execution environments?

So first I’ll talk a little bit about the hardware-based version.

Right.

With a hardware-based version, it’s a TEE that’s implemented by the CPU. The CPU creates this enclave, and that enclave is running in a special mode of the CPU that’s inaccessible even from all other privileged modes of the CPU. There’s memory that’s encrypted, which is where the data of the enclave is stored, and the only code that’s allowed to access that data is code inside of the enclave that that memory is attached to.

Right.

An example of this hardware technology is Intel SGX.

Okay, that’s one of the first of its kind in the industry.

That’s right.
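The attest-then-release pattern described above can be illustrated with a minimal Python sketch. It is a toy model under loud assumptions and is not how SGX works internally. The enclave “measurement” is just a SHA-256 hash of the code loaded into it, loosely analogous to the measurement an SGX enclave reports. An HMAC under a hypothetical VENDOR_KEY stands in for the hardware-rooted signature a real TEE provider attaches to its attestation. All function names are illustrative.

```python
import hashlib
import hmac
import secrets

# ASSUMPTION: a shared HMAC key stands in for the TEE provider's hardware-rooted
# signing key. Real attestation uses asymmetric signatures, not a shared secret.
VENDOR_KEY = secrets.token_bytes(32)


def measure(enclave_code: bytes) -> str:
    """Toy 'measurement': a hash identifying exactly which code was loaded."""
    return hashlib.sha256(enclave_code).hexdigest()


def make_quote(enclave_code: bytes) -> dict:
    """What the TEE reports about itself: its measurement, vouched for by the vendor."""
    m = measure(enclave_code)
    sig = hmac.new(VENDOR_KEY, m.encode(), hashlib.sha256).hexdigest()
    return {"measurement": m, "signature": sig}


def release_key_if_trusted(quote: dict, expected_measurement: str, data_key: bytes):
    """Data owner's side: hand over the decryption key only to trusted code."""
    expected_sig = hmac.new(
        VENDOR_KEY, quote["measurement"].encode(), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(quote["signature"], expected_sig):
        return None  # quote was not produced by the (simulated) TEE provider
    if quote["measurement"] != expected_measurement:
        return None  # different code is running inside the black box
    return data_key


approved_code = b"def process(records): return summarize(records)"
data_key = secrets.token_bytes(32)
expected = measure(approved_code)

granted = release_key_if_trusted(make_quote(approved_code), expected, data_key)
denied = release_key_if_trusted(
    make_quote(b"def process(records): leak(records)"), expected, data_key
)
print(granted == data_key, denied is None)  # True True
```

The data owner never inspects the enclave’s memory. It decides whether to hand over the key based only on the quote. A real verifier would check the quote against the TEE provider’s public verification key or an attestation service. The sketch uses a shared secret only to stay self-contained.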
Okay, so what about the next level, the software layer?

So I mentioned that there are two kinds. I talked about the hardware kind; there’s also the software kind. Software enclaves can be implemented, for example, by a hypervisor. We introduced something called Virtual Secure Mode, which is now called virtualization-based security, in Windows Server 2016. That’s where the hypervisor itself creates essentially a virtual memory partition that it protects from everything, including the host partition and including somebody who owns the box.

Okay.

And you can use the hypervisor to put code and data into that enclave and then get an attestation from the hypervisor as to what code is actually running in that box. But in both cases, the data and the code...

Okay, it’s protected from everything except for the hypervisor, this very thin layer?

Yeah, it’s protected from...

Okay, great. So what are we doing to make it easier to harness
this environment as apps are built and run in Azure?

So we’re starting at the hardware level with the release of specialized VMs, the first in the public cloud to offer Intel SGX servers. This has been through a strong collaboration with Intel, getting this hardware early and making it available, and in fact we announced the public preview this week here at Ignite of these new DC-series virtual machines that have SGX enabled in them. We’re also building tooling that allows developers to create applications that can take advantage of these TEEs, because what you need to do is split your application into the part that runs outside of the enclave and the part that runs inside. The part outside, of course, is less trusted and can be messed with; an administrator can get access to it. So the sensitive data processing, that portion of it, you run inside the enclave.
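The host/enclave split described above can be sketched in Python. This is an illustration under stated assumptions. ToyEnclave is an ordinary object standing in for enclave memory, and Python provides no real isolation. The cipher is a toy SHA-256 keystream with an HMAC tag and is not secure. The key is assumed to have been released to the enclave after attestation. The point is the shape of the interface: the untrusted host stores and forwards only ciphertext, and the enclave exposes a narrow entry point that returns a result rather than plaintext.

```python
import hashlib
import hmac
import json
import secrets


# --- Toy authenticated cipher (NOT secure; illustration only) ---
def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    nonce = secrets.token_bytes(16)
    ks = _keystream(key, nonce, len(plaintext))
    ct = bytes(p ^ k for p, k in zip(plaintext, ks))
    tag = hmac.new(key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("ciphertext was tampered with")
    return bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))


# --- Trusted part: runs "inside the enclave" and holds the key ---
class ToyEnclave:
    def __init__(self, key: bytes):
        self._key = key  # assumed released by the data owner after attestation

    def count_above(self, sealed_rows: list, threshold: int) -> int:
        """Narrow entry point: decrypts internally, returns only an aggregate."""
        count = 0
        for blob in sealed_rows:
            row = json.loads(toy_decrypt(self._key, blob))
            if row["salary"] > threshold:
                count += 1
        return count


# --- Untrusted part: the host only ever handles ciphertext ---
key = secrets.token_bytes(32)
rows = [{"id": 1, "salary": 50_000}, {"id": 2, "salary": 120_000}, {"id": 3, "salary": 95_000}]
stored = [toy_encrypt(key, json.dumps(r).encode()) for r in rows]  # encrypted by the data owner
enclave = ToyEnclave(key)
print(enclave.count_above(stored, 90_000))  # 2
```

Keeping the trusted entry point this small reflects both reasons given for the split. The enclave code stays small, and the only code that ever sees plaintext is the code that needs to.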
Today enclaves are very limited in size and capability, which is the reason why you want to split your application up that way. Plus, it reduces the attack surface that the data is exposed to, by putting only the code that needs to manipulate that data inside the TEE.

Right. So to clarify, beyond the specialized virtual machines at the compute layer, there’s lots of great innovation happening in the hardware layer as well.

Yep.

Cool. Okay, so when we think about the chipset hardware, that’s something that Microsoft hasn’t typically been involved in.

Historically it hasn’t, but in this case Azure is getting more and more involved in hardware design. In the case of TEEs, we’ve been working closely with Intel, like I mentioned, influencing the capabilities that go into SGX to make sure that they can be leveraged easily by applications. But we also work with AMD and Arm on their TEE implementations as well.

Okay. So just to revisit, going back to the virtual machine side as well, let’s talk about use cases to help make it real for people. What kind of applicable use cases are you seeing out there that can really fit with this?

Yeah, so there are a couple of very clear use cases. The obvious one is
“I’ve got sensitive data that I want to put in the cloud, and I want it protected,” whether it’s sensitive IP or customer data. In some cases, the customers we’re hearing from that want to take advantage of confidential computing are protecting their own data, data that they’ve collected on behalf of their customers, for example geo-tracking of their users, and they want to make sure that not even their own administrators can have access to that data. So they want it protected in these confidential enclaves. The other case, which I’m going to talk about and actually demonstrate, is the ability to perform computation on datasets from multiple parties in a way that doesn’t expose the data to the other party. So if you’ve got two companies, for example, each with their own datasets, they want to reason over those datasets and perform analytics or machine learning training on them, but they don’t want their dataset exposed to the other company. Confidential computing allows them to trust the code that’s in the enclave not to leak secrets and not to leak their code, and then they can combine that data.

Why don’t we take a look at an example? Can you show us some of this in action?

Yep. So what
you’re seeing here is that I’m sitting on the SQL Server itself, and this SQL Server is outside the enclave where the SQL query processor is executing. Somebody has uploaded a bunch of encrypted data into that SQL Server: Social Security numbers and salaries. You can see here there’s a web portal. Even the SQL administrator sitting right there on the box has no access to what’s inside of the enclave, just encrypted data. But somebody outside is running that web application, and that web application has established trust with SQL Server, verified that it is on an SGX processor, and then released the key to it. At that point the web browser can start executing SQL queries whose results come back in the clear for the client, because they’ve been encrypted so that only the client can see them. The SQL administrator on the box can’t see what’s going on inside of it, like I mentioned. You can see the magic sauce here is this encrypted search pattern, which has the filters for what salaries the client is looking for, and then what comes back is that encrypted result data that is shown in the web console. You can see that if you do that locally on the box without the right key, you just get encrypted garbage.

Awesome. So we’ve mentioned SQL Always Encrypted before; we’ve had that for a while, but now it does...

Yeah, we call this SQL Always Encrypted vNext. There’s actually been a session about it here. It’s going to be available in Azure SQL Database in the cloud as well as SQL Server on-premises, so if you’ve got an SGX server, you can take advantage of this as well.

Excellent. Confidential computing isn’t necessarily
for everyone’s everyday workloads, so what scenarios make sense for confidential computing?

Yeah, well, one of the things I mentioned is storing data, like you just saw in the SQL case: sticking the data in the enclave and then doing processing on it from the outside. That’s sensitive data, Social Security numbers and salaries, that you don’t want exposed to anybody. And then, like I mentioned, the other one is the multi-party case. One of the scenarios that we’ve seen resonate a lot is multi-party machine learning, where two organizations put their data together and get better results than if they just trained over their own data. One of the great scenarios we see interest in is from the healthcare industry, which typically has patient data that it can’t share with other hospitals or agencies, for regulatory reasons or just out of concern for the privacy of their patients. Two hospitals might each have collected a bunch of data from their patients individually, resulting in machine learning models that are effective, but combined they can be even more effective, because machine learning is fed by data: the more data you get, the better the results. In this case, both hospitals would establish trust in the machine learning model running inside the enclave, then establish their secure connections and deploy their encrypted data into the enclave. The enclave would have the keys to decrypt the data, handed to it by each of the hospitals, and then the enclave can decrypt both datasets and run machine learning training over them. Oh, sorry, and that’s actually what I can show you a demonstration of.

Excellent.

So here we’ve got
hospital A. You can think of this as hospital A’s workstation, where they’re going to be encrypting their dataset. This is a breast cancer training dataset. They’re going to encrypt it and go to the UX, the portal for the confidential computing machine learning service, which is running in an Azure confidential computing virtual machine...

Right.

...sign in, and then upload that encrypted data.

So it’s a relatively streamlined process for an organization to harness this.

That’s right. And then we’re going to go to hospital B, whose workstation I’m simulating as a different desktop, and do the same thing: encrypt the data, sign in as hospital B, and upload their dataset. Then the training happens. If we go back to hospital A, you can see the dramatic difference between computing over one of these datasets and computing over both. If I go to the portal, download the model here, and evaluate the model (copy this here), this is the evaluation of just hospital A’s data. If we trained over that alone, you can see it’s an 84 percent accuracy rate over that data, predicting breast cancer. Now, over A’s and B’s data, we get 97 percent accuracy, and neither hospital A nor hospital B exposed their data to the other. It was only processed inside an enclave. In fact, it wasn’t even exposed to the cloud provider, the administrator of the server, or the hardware itself.

So this could be groundbreaking in the field of medical and other research. You can leverage your sensitive data without exposing it, and as a developer you can build proprietary and protected algorithms that are not exposed to other parties.

Oh, that’s right.
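The two-hospital flow described above can be modeled in a few lines of Python. Everything here is a hypothetical sketch. TrainingEnclave is a plain object standing in for the enclave, and the cipher is a toy SHA-256 keystream that is not secure. Each key handoff is assumed to happen only after that party has verified the enclave’s attestation. The transcript does not describe the real training job, so a nearest-centroid classifier on made-up two-feature rows stands in for it.

```python
import hashlib
import hmac
import json
import secrets


# --- Toy authenticated cipher (NOT secure; illustration only) ---
def _keystream(key, nonce, length):
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return nonce + ct + hmac.new(key, nonce + ct, hashlib.sha256).digest()


def toy_decrypt(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("ciphertext was tampered with")
    return bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))


class TrainingEnclave:
    """Stand-in for the enclave: holds every party's key; raw rows never leave it."""

    def __init__(self):
        self._keys = {}     # party -> data key (assumed released after attestation)
        self._uploads = []  # (party, ciphertext); opaque to the host that stores them

    def receive_key(self, party, key):
        self._keys[party] = key

    def upload(self, party, blob):
        self._uploads.append((party, blob))

    def train_nearest_centroid(self):
        """Decrypt all parties' rows and return per-label mean feature vectors."""
        sums, counts = {}, {}
        for party, blob in self._uploads:
            for features, label in json.loads(toy_decrypt(self._keys[party], blob)):
                acc = sums.setdefault(label, [0.0] * len(features))
                for i, x in enumerate(features):
                    acc[i] += x
                counts[label] = counts.get(label, 0) + 1
        # Only the trained model (the centroids) is returned, not the data.
        return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Pick the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((c - x) ** 2 for c, x in zip(model[lab], features)))


# Made-up rows: ([feature1, feature2], label). Not real patient data.
hospital_a = [([1.0, 1.2], 0), ([0.9, 1.0], 0), ([3.0, 3.1], 1)]
hospital_b = [([1.1, 0.8], 0), ([3.2, 2.9], 1), ([2.8, 3.0], 1)]

enclave = TrainingEnclave()
for party, rows in (("hospital_a", hospital_a), ("hospital_b", hospital_b)):
    key = secrets.token_bytes(32)  # each party uses its own key
    enclave.receive_key(party, key)
    enclave.upload(party, toy_encrypt(key, json.dumps(rows).encode()))

model = enclave.train_nearest_centroid()
print(predict(model, [1.0, 0.9]), predict(model, [2.9, 3.0]))  # 0 1
```

In this model, neither party ever receives the other’s key or rows. What leaves the enclave is only the fitted model, which matches the demo’s claim that each hospital’s data is processed only inside the enclave.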
Yeah. Now, you mentioned a few times that you’re not necessarily locking down data processing for the whole app. So how do you even begin to construct an application to take advantage of confidential computing?

Yes. What we’re looking at here is what’s called the host code: the code that’s sitting outside the enclave. This is one of our samples from the SDK that we’re releasing. This SDK, by the way, is completely open source and works with SGX enclaves in both debug and production mode. You can see code here that is creating enclaves. In this particular example it’s creating two enclaves. It’s having one of the enclaves generate some data that’s going to be visible only to the other enclave. Then it’s getting a key from enclave two, getting an attestation that enclave two is a trusted SGX enclave, and handing that key to enclave one. So this is enclave two’s public key, handed to enclave one, which encrypts the data so that only enclave two can see it, and that transmission is proxied through the host. That’s what this code is doing.

Excellent.

And so you can see it storing the public key of enclave two inside of enclave one: getting the public key of enclave two, then asking enclave one to encrypt the data, and then processing the data. You can see that these are called ECALLs, which are enclave calls. These particular calls, like transferring the data into enclave one, are done through one of these ECALLs, as is the call to have enclave one encrypt the data. That’s what this is: an ECALL.
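The two-enclave flow walked through above can be sketched as a hypothetical Python model. It is not the SDK’s API. The ecall_* method names are made up to mirror the ECALL boundary, and the attestation check of enclave two is marked only with a comment. The transcript does not say which cryptography the sample uses, so this sketch swaps in two toy components: a Diffie-Hellman hybrid scheme over the prime 2**127 - 1, which is far too weak for real use, and a toy SHA-256 keystream cipher. What it does show is the data flow. Enclave two’s public key goes to enclave one, enclave one encrypts so that only enclave two can decrypt, and the host merely proxies values it cannot read.

```python
import hashlib
import hmac
import secrets

# Toy Diffie-Hellman group (NOT secure): 2**127 - 1 is prime; generator 3.
P = 2**127 - 1
G = 3


# --- Toy authenticated cipher (NOT secure; illustration only) ---
def _keystream(key, nonce, length):
    out, counter = bytearray(), 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_encrypt(key, plaintext):
    nonce = secrets.token_bytes(16)
    ct = bytes(p ^ k for p, k in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return nonce + ct + hmac.new(key, nonce + ct, hashlib.sha256).digest()


def toy_decrypt(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("wrong key or tampered ciphertext")
    return bytes(c ^ k for c, k in zip(ct, _keystream(key, nonce, len(ct))))


def _derive_key(shared_secret):
    """Hash the DH shared secret (< 2**127, so it fits in 16 bytes) into a key."""
    return hashlib.sha256(shared_secret.to_bytes(16, "big")).digest()


class PeerEnclave:
    """Stand-in for an enclave; its ecall_* methods model the host-callable boundary."""

    def __init__(self):
        self._private = secrets.randbelow(P - 2) + 1
        self.public_key = pow(G, self._private, P)
        self._peer_public = None
        self._data = secrets.token_bytes(16)  # random buffer generated inside
        self._received = None

    def ecall_store_peer_key(self, peer_public):
        self._peer_public = peer_public

    def ecall_generate_encrypted_data(self):
        """Encrypt this enclave's data so only the peer's private key can recover it."""
        eph = secrets.randbelow(P - 2) + 1
        key = _derive_key(pow(self._peer_public, eph, P))
        return pow(G, eph, P), toy_encrypt(key, self._data)

    def ecall_process_encrypted_data(self, eph_public, blob):
        key = _derive_key(pow(eph_public, self._private, P))
        self._received = toy_decrypt(key, blob)
        return True  # the host learns only that decryption succeeded


# --- Host code: creates both enclaves and proxies everything between them ---
enclave1, enclave2 = PeerEnclave(), PeerEnclave()
# A real host would verify enclave 2's attestation before trusting this key.
enclave1.ecall_store_peer_key(enclave2.public_key)
eph_public, blob = enclave1.ecall_generate_encrypted_data()
print(enclave2.ecall_process_encrypted_data(eph_public, blob))  # True
```

The host handles public keys and ciphertext only. Any third enclave, or the host itself, lacks enclave two’s private key, so it cannot derive the symmetric key, and decryption fails.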
On the enclave side, if we go over to this right here, you can see this is the code that you would write for the trusted portion. This is the enclave side of that generate-encrypted-data function, which you can see is just creating a random buffer of data, encrypting it, and then returning that encrypted data back out. It’s encrypting it with its own key that it’s generated. That is Visual Studio Code. The SDK is eventually going to support the creation of VSM enclaves, as well as enclaves for other hardware providers, and it’s going to support other runtimes on top of C++.

Okay, cool. So, for both Linux and Windows?

I want to make that clear, too: the initial release is just Linux, which shows you Microsoft loves Linux. We’re releasing the Linux version first; we’ll release the Windows version in a few months.

Awesome, good stuff. So we’ve covered a lot of the core concepts of confidential computing, and we’ve seen how we can refactor apps to take advantage of it, but what’s next?

So we’ve
been working with MSR. One of the goals here, like I mentioned, is to protect data, and we want to protect the data against side-channel leaks. We also want to protect the data from leaking outside because of bugs in the code. So Microsoft Research has been working on compiler tooling to help prevent those bugs from getting into the code, and to prevent side channels from leaking data outside the code. And like I mentioned, we’re also working with additional hardware providers. We’re working on attestation services that make it easier to verify what’s in there, and maybe get extra claims about what’s in there, like: is this SGX machine running in the US? You might say the data needs to stay within the US or Europe or Asia, and those could be additional claims.
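The idea of attestation services returning extra claims, such as where an SGX machine is running, could feed a data-release policy like the one below. This is purely a hypothetical sketch. The claim names tee_type, measurement, and region and the plain-dict token are assumptions for illustration, not the format of any Azure attestation service. The sketch also assumes the token’s signature has already been verified.

```python
# ASSUMPTION: hypothetical claim names in a plain dict. A real attestation token
# has its own schema and must have its signature verified before it is trusted.
ALLOWED_REGIONS = {"US", "Europe"}


def satisfies_policy(claims: dict, expected_measurement: str, allowed_regions: set) -> bool:
    """Release data only to an SGX enclave running approved code in an allowed region."""
    return (
        claims.get("tee_type") == "sgx"
        and claims.get("measurement") == expected_measurement
        and claims.get("region") in allowed_regions
    )


claims_us = {"tee_type": "sgx", "measurement": "abc123", "region": "US"}
claims_asia = {"tee_type": "sgx", "measurement": "abc123", "region": "Asia"}
print(satisfies_policy(claims_us, "abc123", ALLOWED_REGIONS))    # True
print(satisfies_policy(claims_asia, "abc123", ALLOWED_REGIONS))  # False
```

A missing claim fails closed here, because claims.get returns None, which is never in the allowed set.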
Excellent. So thank you so much for joining us, Mark, and of course keep watching Microsoft Mechanics for the latest in tech updates, and follow us on Twitter. Thanks for watching, bye for now.


Copyright © 2019 Explore Mellieha. All rights reserved.