Last updated: 2025-08-05
Describe features of computer vision workloads on Azure
Note
The questions and answers provided in this study guide are for practice purposes only and are not official practice questions. They are intended to help you prepare for the AI-900 Microsoft certification exam. For additional preparation materials and the most up-to-date information, please refer to the official Microsoft documentation.
List of questions/answers
- Q1: Identifying AI Services for Sentiment Analysis
- Q2: Extracting Key Phrases from Text
- Q3: Translating Text Between Languages
- Q4: Detecting Language of Text
- Q5: Identifying Objects in Images
- Q6: Extracting Text from Images
- Q7: Analyzing Speech for Keywords
- Q8: Detecting Anomalies in Data
- Q9: Building Conversational Agents
- Q10: Translating Speech in Real-Time
- Q11: Choosing the Right Azure Cognitive Service
- Q12: Identifying AI Services for Document Processing
- Q13: Extracting Data from Business Cards
- Q14: Extracting Data from Medical Forms
- Q15: Extracting Information from Scanned Documents
- Q16: Analyzing Images for Specific Criteria
- Q17: Identifying Computer Vision Tasks
Tip
Azure Computer Vision Services Overview:
| Service | Primary Purpose | Key Capabilities | Use Cases |
|---|---|---|---|
| Computer Vision | Pre-built image analysis | Object detection, OCR, image description | Content moderation, accessibility |
| Custom Vision | Custom image classification | Train custom models, object detection | Brand recognition, quality control |
| Face API | Face detection and recognition | Face detection, verification, identification | Security systems, photo organization |
| Form Recognizer | Document data extraction | OCR, key-value pairs, table extraction | Invoice processing, form automation |
Tip
Computer Vision Capabilities:
| Capability | Description | Output Example |
|---|---|---|
| Image Analysis | Analyze visual content | Tags, categories, descriptions |
| Object Detection | Locate and identify objects | Bounding boxes with labels |
| OCR (Read API) | Extract text from images | Structured text data |
| Spatial Analysis | Analyze people and spaces | People counting, social distancing |
| Image Classification | Categorize entire images | Single category per image |
| Face Detection | Find faces in images | Face coordinates and attributes |
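The capabilities above are exposed through the Computer Vision REST API. The sketch below assembles an Analyze Image request without sending it; the endpoint and key are placeholders for your own Azure resource values, and the v3.2 path is one of several published API versions.

```python
# A minimal sketch of assembling a Computer Vision "Analyze Image" REST call.
# ENDPOINT and KEY are placeholders -- substitute your own resource values.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
KEY = "<your-subscription-key>"

def build_analyze_request(image_url, features=("Tags", "Description", "Objects")):
    """Return the URL, query parameters, headers, and JSON body for an
    Analyze Image request (send with any HTTP client, e.g. requests.post)."""
    url = f"{ENDPOINT}/vision/v3.2/analyze"
    params = {"visualFeatures": ",".join(features)}
    headers = {
        "Ocp-Apim-Subscription-Key": KEY,
        "Content-Type": "application/json",
    }
    body = {"url": image_url}
    return url, params, headers, body

url, params, headers, body = build_analyze_request("https://example.com/photo.jpg")
print(params["visualFeatures"])  # Tags,Description,Objects
```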
Tip
Custom Vision Training Process:
| Step | Description | Key Considerations |
|---|---|---|
| 1. Create Project | Set up classification or detection project | Choose domain (General, Food, etc.) |
| 2. Upload Images | Provide training images | Minimum 15 images per class |
| 3. Tag Images | Label objects or classify images | Consistent and accurate tagging |
| 4. Train Model | Build custom model | Multiple iterations for improvement |
| 5. Evaluate Performance | Review precision and recall | Threshold adjustment |
| 6. Publish Model | Deploy for prediction | API endpoint creation |
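Step 2 above notes Custom Vision's minimum of 15 images per class. A small pre-upload check like the following (a plain-Python sketch, not part of any Azure SDK) can catch underfilled tags before training:

```python
MIN_IMAGES_PER_TAG = 15  # Custom Vision's documented minimum per tag

def validate_training_set(tag_counts):
    """Return the tags whose image count is below the training minimum."""
    return {tag: n for tag, n in tag_counts.items() if n < MIN_IMAGES_PER_TAG}

underfilled = validate_training_set({"apple": 30, "banana": 9})
print(underfilled)  # {'banana': 9}
```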
Tip
OCR and Text Extraction:
| Feature | Computer Vision OCR | Form Recognizer |
|---|---|---|
| Best For | Simple text extraction | Structured document processing |
| Supported Formats | Images (JPG, PNG, BMP) | PDFs, Images, Office documents |
| Output | Raw text | Structured data with key-value pairs |
| Languages | 70+ languages | 70+ languages |
| Use Cases | Street signs, handwritten notes | Invoices, receipts, forms |
Tip
Image Classification vs Object Detection:
| Aspect | Image Classification | Object Detection |
|---|---|---|
| Purpose | Categorize entire image | Find and locate specific objects |
| Output | Single label per image | Multiple bounding boxes with labels |
| Training Data | Images with single labels | Images with object annotations |
| Use Cases | Content filtering, photo organization | Security monitoring, inventory tracking |
| Azure Service | Custom Vision (Classification) | Custom Vision (Object Detection) |
Tip
Computer Vision Tasks and Techniques:
| Task | Description | Technical Approach |
|---|---|---|
| Image Classification | Assign category to entire image | Convolutional Neural Networks |
| Object Detection | Locate objects with bounding boxes | YOLO, R-CNN algorithms |
| Semantic Segmentation | Label each pixel | Pixel-level classification |
| Instance Segmentation | Separate individual objects | Object boundaries and classes |
| Face Recognition | Identify specific individuals | Feature extraction and matching |
| Optical Character Recognition | Convert images to text | Text detection and recognition |
Tip
Azure Form Recognizer Models:
| Model Type | Purpose | Key Features |
|---|---|---|
| Prebuilt Invoice | Process invoices | Extract vendor, amount, date |
| Prebuilt Receipt | Process receipts | Extract merchant, total, items |
| Prebuilt Business Card | Process business cards | Extract contact information |
| Prebuilt ID Document | Process ID cards/passports | Extract personal information |
| Custom Model | Domain-specific forms | Train on your specific documents |
| Layout Model | Extract structure | Tables, text, selection marks |
Tip
Face API Capabilities:
| Operation | Purpose | Use Case |
|---|---|---|
| Detect | Find faces in images | People counting, demographics |
| Verify | Compare two faces | Identity verification |
| Identify | Match face to person | Access control, photo tagging |
| Group | Cluster similar faces | Photo organization |
| Find Similar | Find matching faces | Duplicate detection |
Tip
Computer Vision Performance Metrics:
| Metric | Description | When to Use |
|---|---|---|
| Precision | Correct positive predictions / Total positive predictions | When false positives are costly |
| Recall | Correct positive predictions / Total actual positives | When false negatives are costly |
| F1-Score | Harmonic mean of precision and recall | Balanced performance measure |
| Accuracy | Correct predictions / Total predictions | Overall performance |
| mAP | Mean Average Precision | Object detection evaluation |
| IoU | Intersection over Union | Object localization accuracy |
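The formulas in this table are straightforward to compute from raw counts and box coordinates. A minimal sketch, using (x1, y1, x2, y2) box coordinates:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true/false positive and
    false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1

def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)  # 0.8, 0.8, 0.8
overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))     # 25 / 175, about 0.143
```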
Tip
Image Preprocessing Best Practices:
| Technique | Purpose | Implementation |
|---|---|---|
| Normalization | Standardize pixel values | Scale to 0-1 range |
| Resizing | Consistent input dimensions | Maintain aspect ratio |
| Data Augmentation | Increase training variety | Rotation, flipping, cropping |
| Noise Reduction | Improve image quality | Gaussian blur, median filter |
| Contrast Enhancement | Improve visibility | Histogram equalization |
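The first two techniques reduce to simple arithmetic. A sketch of 0-1 normalization and aspect-ratio-preserving resize dimensions (the target long side of 224 is just a common example value):

```python
def normalize_pixels(pixels):
    """Scale 8-bit pixel values (0-255) into the 0-1 range."""
    return [p / 255.0 for p in pixels]

def resize_dims(width, height, target_long_side=224):
    """Compute new dimensions whose longer side equals target_long_side
    while preserving the original aspect ratio."""
    scale = target_long_side / max(width, height)
    return round(width * scale), round(height * scale)

print(normalize_pixels([0, 128, 255]))  # [0.0, ~0.502, 1.0]
print(resize_dims(640, 480))            # (224, 168)
```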
Tip
Azure Computer Vision Pricing Tiers:
| Tier | Features | Best For |
|---|---|---|
| Free (F0) | Limited transactions | Development and testing |
| Standard (S1) | Full features, pay-per-use | Production workloads |
| Custom Vision Free | Limited projects and images | Learning and prototyping |
| Custom Vision Standard | Unlimited projects | Production custom models |
Tip
Common Computer Vision Challenges:
| Challenge | Description | Solution |
|---|---|---|
| Poor Image Quality | Blurry, dark, or low-resolution images | Image preprocessing, better lighting |
| Occlusion | Objects partially hidden | Multiple viewpoints, data augmentation |
| Scale Variation | Objects at different sizes | Multi-scale training data |
| Lighting Conditions | Varying illumination | Normalize lighting, diverse training data |
| Background Clutter | Busy backgrounds | Focus on object features, segmentation |
Tip
Real-World Computer Vision Applications:
| Industry | Application | Azure Service |
|---|---|---|
| Retail | Product recognition, inventory | Custom Vision |
| Healthcare | Medical image analysis | Computer Vision + Custom models |
| Manufacturing | Quality control, defect detection | Custom Vision |
| Security | Surveillance, access control | Face API, Computer Vision |
| Automotive | Autonomous driving, parking | Computer Vision, Custom Vision |
| Agriculture | Crop monitoring, pest detection | Custom Vision |
Tip
API Response Formats:
| Service | Response Type | Key Fields |
|---|---|---|
| Computer Vision | JSON | categories, tags, description, objects |
| Custom Vision | JSON | predictions, probability, boundingBox |
| Face API | JSON | faceId, faceRectangle, faceAttributes |
| Form Recognizer | JSON | pages, tables, keyValuePairs |
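Since all four services return JSON, post-processing is ordinary JSON parsing. The payload below is illustrative only, shaped like the Computer Vision fields in the table; the exact structure varies by API version and requested features:

```python
import json

# Illustrative response resembling a Computer Vision analyze result.
sample = """
{
  "tags": [
    {"name": "tree", "confidence": 0.98},
    {"name": "outdoor", "confidence": 0.95}
  ],
  "description": {
    "captions": [{"text": "a tree in a park", "confidence": 0.87}]
  }
}
"""

data = json.loads(sample)
# Keep only tags the service is at least 90% confident about.
high_confidence = [t["name"] for t in data["tags"] if t["confidence"] >= 0.9]
caption = data["description"]["captions"][0]["text"]
print(high_confidence, "-", caption)  # ['tree', 'outdoor'] - a tree in a park
```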
Tip
Integration Patterns:
| Pattern | Description | Benefits |
|---|---|---|
| REST API | Direct HTTP calls | Simple integration, language agnostic |
| SDK Integration | Use official SDKs | Strongly typed, error handling |
| Logic Apps | No-code integration | Visual workflow design |
| Power Platform | Citizen developer tools | Business user friendly |
| Azure Functions | Serverless processing | Event-driven, scalable |
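One concrete benefit the SDK row alludes to is built-in retry and error handling, which raw REST callers must implement themselves. A generic sketch of that pattern (not Azure-specific):

```python
import time

def call_with_retry(func, retries=3, backoff_seconds=0.0):
    """Invoke func(), retrying on exception with exponential backoff.
    Official SDKs typically bundle this kind of logic for you."""
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except Exception as err:  # in practice, catch specific transient errors
            last_error = err
            time.sleep(backoff_seconds * (2 ** attempt))
    raise last_error

# Simulated flaky call that succeeds on the third attempt.
attempts = {"count": 0}
def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = call_with_retry(flaky, retries=5)
print(result, attempts["count"])  # ok 3
```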
Tip
Security and Compliance:
| Aspect | Implementation | Benefit |
|---|---|---|
| API Keys | Secure key management | Access control |
| VNet Integration | Private network access | Enhanced security |
| Managed Identity | Azure AD authentication | Passwordless authentication |
| Data Encryption | At rest and in transit | Data protection |
| Compliance | GDPR, HIPAA, SOC | Regulatory compliance |
Q1: Identifying AI Services for Sentiment Analysis
Which service should you use to analyze customer reviews and determine whether they are positive, negative, or neutral?
Options:
- A. Text Analytics
  This is the correct answer because Text Analytics is used for analyzing unstructured text data, such as sentiment analysis.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for sentiment analysis.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for analyzing text data.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for sentiment analysis.
Q2: Extracting Key Phrases from Text
Which service should you use to extract important phrases and keywords from a large collection of documents?
Options:
- A. Text Analytics
  This is the correct answer because Text Analytics is used for extracting key phrases and keywords from unstructured text data.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting key phrases.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for analyzing text data.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for extracting key phrases.
Q3: Translating Text Between Languages
Which service should you use to translate text from one language to another?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for translating text.
- B. Translator
  This is the correct answer because Translator is used for translating text between different languages.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for translating text.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for translating text.
Q4: Detecting Language of Text
Which service should you use to identify the language of a given text?
Options:
- A. Text Analytics
  This is the correct answer because Text Analytics includes language detection capabilities.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for detecting the language of text.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for detecting the language of text.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for detecting the language of text.
Q5: Identifying Objects in Images
Which service should you use to identify and label objects in images, such as cars, trees, and buildings?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for identifying objects in images.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for identifying objects in images.
- C. Custom Vision
  This is the correct answer because Custom Vision allows you to train a custom model to identify and label specific objects in images.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for identifying objects in images.
Q6: Extracting Text from Images
Which service should you use to extract text from images, such as scanned documents or photos of signs?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for extracting text from images.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting text from images.
- C. Optical Character Recognition (OCR)
  This is the correct answer because OCR is used for recognizing and extracting text from images.
- D. Form Recognizer
  This is incorrect because Form Recognizer is specialized for extracting structured data from forms and documents, not for general-purpose text extraction from arbitrary images.
Q7: Analyzing Speech for Keywords
Which service should you use to analyze spoken language and extract keywords from audio recordings?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for analyzing spoken language.
- B. Speech to Text
  This is the correct answer because Speech to Text is used for converting spoken language into text, which can then be analyzed for keywords.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for analyzing spoken language.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for analyzing spoken language.
Q8: Detecting Anomalies in Data
Which service should you use to detect unusual patterns or anomalies in time-series data?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for detecting anomalies in time-series data.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for detecting anomalies in time-series data.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for detecting anomalies in time-series data.
- D. Anomaly Detector
  This is the correct answer because Anomaly Detector is used for identifying unusual patterns or anomalies in time-series data.
Q9: Building Conversational Agents
Which service should you use to create a chatbot that can understand and respond to user queries in natural language?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for building conversational agents.
- B. Language Understanding
  This is the correct answer because Language Understanding (LUIS) is used for building natural language understanding models that can be integrated into chatbots.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for building conversational agents.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for building conversational agents.
Q10: Translating Speech in Real-Time
Which service should you use to translate spoken language in real-time during a conversation?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, not for translating spoken language.
- B. Speech Translation
  This is the correct answer because Speech Translation is used for translating spoken language in real-time during conversations.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for translating spoken language.
- D. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for translating spoken language.
Q11: Choosing the Right Azure Cognitive Service
You are developing a solution to analyze images from grocery stores and identify various fruits and vegetables. The solution will use a custom model. Which Azure Cognitive Services service should you use?
Options:
- A. Form Recognizer
  This is incorrect because Form Recognizer is used for extracting text and data from forms and documents, not for identifying objects in images.
- B. Face
  This is incorrect because the Face service is used for detecting and recognizing human faces, not for identifying objects like fruits and vegetables.
- C. Computer Vision
  This is incorrect because while Computer Vision can analyze images, it is not designed for training custom object identification models.
- D. Custom Vision
  This is the correct answer because Custom Vision allows you to train a custom model to identify specific objects, such as fruits and vegetables, in images.
Tip
Custom Model = Custom Vision
Q12: Identifying AI Services for Document Processing
Which service should you use to extract key information such as dates, amounts, and vendor names from scanned invoices?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, such as sentiment analysis and key phrase extraction.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting structured data from scanned documents.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for extracting structured data from scanned documents.
- D. Form Recognizer
  This is the correct answer because Form Recognizer is designed to extract key information such as dates, amounts, and vendor names from scanned documents like invoices.
Q13: Extracting Data from Business Cards
Which service should you use to automatically extract contact information such as names, phone numbers, and email addresses from scanned business cards?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, such as sentiment analysis and key phrase extraction.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting structured data from scanned documents.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for extracting structured data from scanned documents.
- D. Form Recognizer
  This is the correct answer because Form Recognizer is designed to extract contact information such as names, phone numbers, and email addresses from scanned documents like business cards.
Q14: Extracting Data from Medical Forms
Which service should you use to extract patient information such as names, dates of birth, and medical history from scanned medical forms?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, such as sentiment analysis and key phrase extraction.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting structured data from scanned documents.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for extracting structured data from scanned documents.
- D. Form Recognizer
  This is the correct answer because Form Recognizer is designed to extract patient information such as names, dates of birth, and medical history from scanned documents like medical forms.
Q15: Extracting Information from Scanned Documents
Which service should you use to analyze and extract structured data, such as text, key/value pairs, and tables, from scanned receipts?
Options:
- A. Text Analytics
  This is incorrect because Text Analytics is used for analyzing unstructured text data, such as sentiment analysis and key phrase extraction.
- B. Language Understanding
  This is incorrect because Language Understanding (LUIS) is used for building natural language understanding models, not for extracting structured data from scanned documents.
- C. Custom Vision
  This is incorrect because Custom Vision is used for image classification and object detection, not for extracting structured data from scanned documents.
- D. Form Recognizer
  This is the correct answer because Form Recognizer is designed to extract text, key/value pairs, and table data from scanned documents, such as receipts and invoices.
Q16: Analyzing Images for Specific Criteria
You are organizing a photo contest where participants submit images of landscapes. The task is to ensure that only photos with at least one tree and a visible horizon are accepted. Which operation should you use to analyze the images?
Options:
- A. The Verify operation in the Face service
  This is incorrect because the Verify operation is used to compare faces, not analyze landscape images.
- B. The Detect operation in the Face service
  This is incorrect because the Detect operation is used to identify faces in images, not landscape features.
- C. The Analyze Image operation in the Computer Vision service
  This is incorrect because while the Analyze Image operation provides general information about the image, it is not as specific as the Describe Image operation for identifying detailed features like trees and horizons.
- D. The Describe Image operation in the Computer Vision service
  This is the correct answer because the Describe Image operation provides a detailed description of the image content, including objects like trees and horizons.
Q17: Identifying Computer Vision Tasks
Identifying and labeling each pixel in an image to determine whether it belongs to a car, road, or pedestrian is an example of:
Options:
- A. Image classification
  This is incorrect because image classification involves categorizing the entire image into a single category, not labeling individual pixels.
- B. Object detection
  This is incorrect because object detection involves identifying and locating objects within an image, often using bounding boxes, not labeling individual pixels.
- C. Optical character recognition (OCR)
  This is incorrect because OCR involves recognizing and extracting text from images, not labeling individual pixels.
- D. Semantic segmentation
  This is the correct answer because semantic segmentation involves classifying each pixel in an image into a category, such as car, road, or pedestrian.