TL;DR - sign into the Azure portal, create a new resource, choose the Data Science Virtual Machine 18.04 - set a resource group, name the VM then choose a spot VM and look through the regions for the best price. Set your ssh, skip through the other pages and click create on the review page - wait 3 minutes! Then add the fastai stuff

  1. Step by Step
  2. Getting Started with your VM
  3. Basics
  4. Disks
  5. Networking
  6. Management
  7. Advanced
  8. Tags
  9. Review and Create
  10. Connect
  11. Course Configuration
  12. References

Step by Step

If anyone was thinking of using Azure VMs for this fastai course then hopefully this post will help! Please be aware that VMs with GPU are not available for some of the ‘free’ Azure offers. There may also be restrictions on the availability for certain regions too. I cover the three pricing options that you might find for the low end GPU base VMs – the NC6 – and suggest the Data Science Virtual Machine for Linux (Ubuntu) as a great starting point,

  • Full price – most flexibility but highest price
  • Promo price – special offers, usually on older hardware – same VM as the full price one
  • Spot price – lowest price, but not guaranteed availability – your VM can be taken away at a moment’s notice. Not great for long running processes, but usually ok for short runs. Data is persisted even if the VM is taken away – but obviously anything in memory in a current run is lost. Subscription credit options – like a Visual Studio account cannot usually be used for Spot prices.
  • Billing and Cost Management
Be sure to STOP your VM through the portal, and not just shut down - to ensure you are not charged. Unless it shows deallocated you are being charged!

Getting Started with your VM

Go to https://portal.azure.com and log in to (or sign up for) your Azure account.

Basics

To create a Data Science VM click Create a resource and search for data science – and choose the Data Science Virtual Machine- Ubuntu 18.04 (The one that doesn’t say 18.04 is 16.04 but does have a few kore items added such as SQL Server)

Click Create rather than the pre-set options The next screen gives the various details needed to configure the VM – and choices such as location, spot instance or not – and the authentication options. I’ll step though the important options one at a time.

If you have more than one Azure subscription you can choose from the drop down. The 2nd option is choosing or creating a resource group. Think of this as a way to group all the resources you may be using – VM, additional disks, virtual networks… It is convenient to have a group when you want to delete everything at the end – you can just delete the resource group. I’m going to use rg-brismithfastai The virtual machine name needs to be unique – and will also be used as the host name for the server. Mine will be called vmbrismithfastai. For the region there is an advantage obviously in choosing a region near you – for lower latency – but you may need to look at other regions to get the class of server you need. I’ll start with West US. Click here to get an idea which regions have the NC series VMs. To keep the price down I’ll not choose any availability options – but for production use these options give redundancy either within or across regions as well as scaling options. The image is pre-populated as the DSVM. I’ll be choosing a spot instance – but I’ll first show the other priced alternatives. The default shown is non-GPU – DS3 v2 - $213.89/month in the region I’m looking at. Leaving the spot instance as No and clicking Change size allow a view across all options. Adding NC into the search box limits my result set to the NC range I’m interested in.

It will vary by region, but I am seeing the full price as over $650 per month (There is also a promo price of less than 300). If I didn’t want the go for the spot price the promo one would be my next choice (and the idea is not to leave it running 24/7 – so the real price should be well under the 300 anyway). But I’m going to take a look at spot option next.

The options for spot allow either the lowest cost option – capacity only – or you can name your (higher) price to improve your chances of not being evicted! I’ll stick with the first option. As you can see there is only a Stop/Deallocate option – which means your VM will persist data when you are evicted. The delete option is fine for compute only nodes – but doesn’t make sense when we need to keep our data. Click Change size again and we see our spot pricing – and for our NC6 it is under $0.12 an hour! Almost tempted to step up to the NC12, but for now I’ll click NC6 and Select at the foot of the page.

The next option is the administrator account and the authentication type. SSH is the preferred option here, and there is a,so a link on the page to help if you are not familiar with SSH. If you are just doing the course work you may be comfortable with using a username/password instead. At this point my VM configuration looks like this – next we go to Disks.

Disks

For disks I’m going to stick with the default Standard SSD. This should give us enough space to work through the course work – if you were planning on doing much larger datasets then you can add additional data disks here – then you’d need to configure them once the VM was provisioned. Be aware that for disks you will be paying even when the VM is not running – so just get what you need. Or keep your data somewhere else in the cloud and pull in as needed. If you already use Azure you can also attach an existing disk.

Networking

For Networking I’ll keep the defaults – you can see it bases the resource names on my vm name name. It also tells you that Network Security Group (NSG) rules are pre-configured for this VM. These are rules that govern the ports that can be accessed - usually the initial ones for ssh and Jupyter.

Management

The Management section has some very useful options. Again, I’ll take the defaults – which gives me boot diagnostics but not the detailed and guest diagnostics. I’ll also forego the managed identity and Azure Active Directory – but useful to know they exist. A very cool option is the auto-shutdown – where I can set a time that the VM will be shut down if it still happens to be running – a great cost saver (and it can be set to notify). I’ve set mine to 10pm PST for now - but I’m guessing I might have some later nights!

Advanced

Under Advanced there aren’t any options I need – but for production use it is good to see I could set proximity placement if required – to make sure latency between servers wasn’t an issue.

Tags

I don’t need any tags set – these can be useful if you have hundreds of Azure resources and you want another grouping mechanism above resource groups.

Review and Create

The final page validates all the entries and checks that your subscription doesn’t have any limits or quotas (mine failed first time around as I’d set a Visual Studio subscription that does not look like it supported GPUs…). You can also see at the foot of the page an option to save a template so you can easily create another similar VM. These are often referred to as ARM templates – Azure Resource Manager.

Click Create – and wait (for me usually just 2 or 3 minutes) – and you will soon have a VM – ready to load the fastai stuff!

Connect

Once the deployment is complete then you can go to the resource, where you will find the details of the resource such as the public IP address. I usually use X2Go client configured to use XFCE and then just need to set the ssh private key – but for now I’ll just connect via ssh from my Window 10 PowerShell terminal prompt.

Click Connect – set a path to your ssh private key (or set your user name and password if you took that option) – and you are done! I copied the ssh command into my PowerShell terminal window and connected!

Course Configuration

See the fastai forums for setup details - I’ll update here when the course is public

The using the default JupyterHub (https://:8000) should work and you will see nbs amongst the other samples in the DSVM if you ensure they are under notebooks from your home directory

See the revised CLI configuration settings too in the fastai forum article Setup Help - in the Azure section.

References

  • Visual Studio plans - https://azure.microsoft.com/en-us/pricing/member-offers/credit-for-visual-studio-subscribers/#subscription-pricing
  • Spot VMs - https://docs.microsoft.com/en-us/azure/virtual-machines/linux/spot-vms
  • SSH - https://docs.microsoft.com/en-us/azure/virtual-machines/linux/mac-create-ssh-keys
  • Disks - https://docs.microsoft.com/en-us/azure/virtual-machines/linux/managed-disks-overview
  • Azure Billing - https://aka.ms/billingfaq