What if you could access 60+ AI models without paying a single subscription fee, all directly inside VSCode? That is exactly what Microsoft's AI Toolkit for VSCode promises, which raises three questions:
- Is it really free to use?
- What can this extension actually do for you?
- And how can you get started using it?
If you're more of a visual learner or simply want to see how it's done, I recommend checking out my YouTube tutorial. It covers everything step-by-step.
The Evolution
The toolkit is the renamed and improved version of what used to be called 'Windows AI Studio'. Microsoft dropped the 'Windows' part because it works across all operating systems. The toolkit comes with five key features:
- Run small language models like Phi-3 completely offline
- Connect to larger models remotely
- Fine-tune models for specific tasks
- Chat interface with file attachment support
- Built-in evaluation system for AI model performance verification
Getting Started
Getting started requires only two things: VSCode and a GitHub account. The extension can be found in the VSCode Extensions marketplace by searching for "AI Toolkit", or by clicking this link directly: AI Toolkit for Visual Studio Code. You can confirm you have the official extension by checking that the publisher is Microsoft.
Model Catalog
The toolkit provides access to 60 different AI models, including offerings from OpenAI, Google, Anthropic, and various open-source options, although you will need to provide your own API key for commercial models such as Claude from Anthropic and Gemini from Google.
While these models are accessible through GitHub, they come with certain limitations. There are token limits and query restrictions, but the free tier is sufficient for small to medium-sized projects. For extensive usage, like multiple bulk evaluations or large dataset processing, a paid plan might be necessary.
The Playground
The Playground serves as the primary chat interface for model interactions. The interface supports extensive customization through the “Context Instructions” field, where you can define specific personas or styles for the AI’s responses. For example, you could instruct the model to answer as a patient teacher who explains every step, and every reply in that chat will follow that persona.
The chat interface also accepts file attachments, including text files and images when working with multimodal models. Generated code can be either copied to clipboard or directly saved as a new file through the “New File” button.
Bulk Processing
The Bulk Run feature requires a JSON Lines file format for processing multiple queries. Here’s how to use it:
- Click ‘Import Dataset’ and select your JSON Lines file containing the queries
- Choose your preferred model from the dropdown menu
- Click the ‘Run All’ button to begin processing
- Export results using the “Export” button, which saves the output as a JSON Lines file
Here is an example of the content I have uploaded inside a .jsonl file, where each line is a standalone JSON object containing a "query" field:
{"query":"Write a detailed recipe for a traditional dish, including ingredients, preparation steps, and cooking tips."}
{"query":"Create a comprehensive workout routine for a specific fitness goal, including exercises, sets, reps, and progression plan."}
{"query":"Design a step-by-step home organization project for a specific room, including decluttering, sorting, and storage solutions."}
{"query":"Develop a weekly meal prep plan including shopping list, preparation instructions, and storage guidelines.”"}
{"query":"Draft a detailed garden planning guide for a specific season, including plant selection, layout, and maintenance schedule."}
Evaluation Features
The evaluation tool requires specific setup steps for accurate model assessment:
- Start by clicking “New Evaluation” and naming your evaluation project
- Select evaluation criteria carefully - ‘Coherence’ and ‘Relevance’ criteria require specific columns (“query” and “response”) in your JSON Lines file, as shown in the sample dataset after this list
- Choose an evaluation model from the dropdown menu (e.g., “GPT 4o mini”)
- Upload your dataset from previous bulk runs
- Run the evaluation to receive numerical scores
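For reference, here is a rough sketch of what such a dataset could look like. The two lines below are purely illustrative (the answers are truncated placeholders I made up); in practice you would simply reuse the .jsonl file exported from a bulk run, which should already contain both the “query” and “response” columns:
{"query":"Write a detailed recipe for a traditional dish, including ingredients, preparation steps, and cooking tips.","response":"Here is a classic margherita pizza recipe. Ingredients: pizza dough, tomato sauce, fresh mozzarella, basil..."}
{"query":"Develop a weekly meal prep plan including shopping list, preparation instructions, and storage guidelines.","response":"Day 1: overnight oats for breakfast and grilled chicken bowls for lunch. Shopping list: rolled oats, chicken breast, brown rice..."}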
The evaluation system provides a score for each criterion, though the documentation of the scoring scale needs improvement. My run returned scores of 4.6 for coherence and 5.0 for relevance, but whether these are out of 5 or out of 10 is not clearly stated.
After doing some digging in Azure OpenAI's evaluation documentation, I'm making an educated guess that these are scores out of 5, which would mean our results are actually quite impressive.
But I want to be transparent - I'm not entirely certain about this interpretation. It would be really helpful if Microsoft could clarify this in their documentation.
Final Assessment
The Microsoft AI Toolkit for VSCode represents a significant advancement in AI accessibility. It uniquely combines local and remote AI model access in one tool, making it particularly valuable for developers. While the bulk run and evaluation features may feel experimental, the playground functionality excels.
For coding assistance, GitHub Copilot remains superior due to better contextual awareness. However, the Microsoft AI Toolkit stands out as a user-friendly option for AI model exploration. One notable consideration is privacy when using remote GitHub-hosted models. For enhanced data confidentiality, the extension can be combined with local models using Ollama.
While not perfect, Microsoft's AI Toolkit marks a significant step toward making AI tools more accessible and user-friendly for the developer community.