● Complete Learning Series Python & AI Engineering for .NET Developers

Python Control Flow & Collections

Tech Buddy June 12, 2026 3 min read

LLMs return JSON. Your RAG pipeline returns lists of retrieved chunks. Your evaluation harness processes arrays of model responses. Python's control flow and collection types are the primary tools you'll use to work with this data, and for these tasks they are often more compact than their C# equivalents.

This lesson uses realistic LLM response structures to show how Python's loops, conditionals, and comprehensions handle the data manipulation patterns you'll encounter daily.

Python's Core Collection Types

Python has four built-in collection types. Each maps to a C# equivalent, but with important behavioral differences:

Python	C# Equivalent	Mutable?	Ordered?	Unique Elements?
`list`	`List<T>`	Yes	Yes	No
`tuple`	`ValueTuple` / `record`	No	Yes	No
`dict`	`Dictionary<K,V>`	Yes	Yes (insertion order, 3.7+)	Keys only
`set`	`HashSet<T>`	Yes	No	Yes
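
The differences in the table are easiest to see by running them. The snippet below is a minimal sketch with made-up values; the variable names are illustrative and do not come from any API.

```python
# Illustrative values only; none of these names come from a real API.
chunks = ["chunk_a", "chunk_b", "chunk_a"]   # list: ordered, mutable, duplicates allowed
chunks.append("chunk_c")

point = ("doc_001", 0.92)                    # tuple: ordered, immutable
try:
    point[1] = 0.5                           # item assignment is not supported
except TypeError:
    tuple_is_immutable = True

usage = {"input_tokens": 100}                # dict: unique keys, insertion order kept
usage["output_tokens"] = 250
usage["input_tokens"] = 120                  # same key: value is overwritten, no new entry

unique_chunks = set(chunks)                  # set: duplicates collapse, no guaranteed order

print(len(chunks), len(unique_chunks))       # 4 3
print(list(usage))                           # ['input_tokens', 'output_tokens']
```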

Working with Lists

Lists are the bread and butter of AI pipelines — storing retrieved chunks, model responses, message histories, and evaluation results.

# Simulated LLM batch response (realistic structure)
responses = [
    {"id": "msg_1", "text": "Embeddings are dense vector representations...", "tokens": 42, "finish_reason": "end_turn"},
    {"id": "msg_2", "text": "RAG stands for Retrieval-Augmented Generation...", "tokens": 38, "finish_reason": "end_turn"},
    {"id": "msg_3", "text": None, "tokens": 0, "finish_reason": "max_tokens"},
    {"id": "msg_4", "text": "Vector databases store high-dimensional...", "tokens": 55, "finish_reason": "end_turn"},
]

# List operations
print(len(responses))         # 4
print(responses[0])           # first item
print(responses[-1])          # last item
print(responses[1:3])         # slice: items at index 1 and 2

# Mutating (on a shallow copy, so later examples still see all four responses)
batch = responses.copy()
batch.append({"id": "msg_5", "text": "New response", "tokens": 10, "finish_reason": "end_turn"})
batch.pop(2)  # remove by index (removes msg_3)

Control Flow: Conditionals and Loops

if / elif / else

Python's conditionals are close to C# but use indentation instead of braces, and elif instead of else if:

def classify_finish_reason(response: dict) -> str:
    reason = response.get("finish_reason", "unknown")

    if reason == "end_turn":
        return "complete"
    elif reason == "max_tokens":
        return "truncated"
    elif reason == "stop_sequence":
        return "stopped"
    else:
        return f"unknown: {reason}"


for r in responses:
    status = classify_finish_reason(r)
    print(f"{r['id']}: {status}")
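
When every branch simply maps one value to another, a dictionary lookup is a common alternative to the if/elif chain above. The sketch below assumes the same three finish reasons; `FINISH_STATUS` and `classify_with_lookup` are hypothetical names introduced here for illustration.

```python
# Hypothetical lookup table mirroring the if/elif chain.
FINISH_STATUS = {
    "end_turn": "complete",
    "max_tokens": "truncated",
    "stop_sequence": "stopped",
}


def classify_with_lookup(response: dict) -> str:
    reason = response.get("finish_reason", "unknown")
    # dict.get falls back to the second argument when the key is absent
    return FINISH_STATUS.get(reason, f"unknown: {reason}")


print(classify_with_lookup({"id": "msg_3", "finish_reason": "max_tokens"}))  # truncated
print(classify_with_lookup({"id": "msg_9"}))                                 # unknown: unknown
```

The if/elif form remains the better fit once branches need different logic rather than different return values.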

for Loops and range()

C#

// Index-based
for (int i = 0; i < responses.Count; i++)
{
    Console.WriteLine(responses[i]["id"]);
}

// For-each
foreach (var r in responses)
{
    Console.WriteLine(r["id"]);
}

Python

# Index-based (use range)
for i in range(len(responses)):
    print(responses[i]["id"])

# For-each (preferred)
for r in responses:
    print(r["id"])

# With index (preferred over range+len)
for i, r in enumerate(responses):
    print(f"{i}: {r['id']}")

Useful Iteration Patterns for AI Code

# zip: pair two lists together
prompts = ["What is RAG?", "Explain embeddings", "What is a token?"]
model_outputs = ["RAG is...", "Embeddings are...", "A token is..."]

for prompt, output in zip(prompts, model_outputs):
    print(f"Q: {prompt}\nA: {output}\n")


# Iterating dict items (extremely common with JSON responses)
response_meta = {"model": "claude-3-5-sonnet", "usage": {"input_tokens": 100, "output_tokens": 250}}
for key, value in response_meta.items():
    print(f"{key}: {value}")


# Early exit with break
for r in responses:
    if r.get("finish_reason") == "max_tokens":
        print(f"Warning: {r['id']} was truncated!")
        break  # stop after first truncated response
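
One caveat for the zip pattern above: zip() stops at the shortest input, so a missing model output silently drops the matching prompt. The sketch below uses made-up data with one output removed to show the effect, plus a simple length check as a guard. On Python 3.10 and later, `zip(prompts, model_outputs, strict=True)` raises a ValueError instead.

```python
prompts = ["What is RAG?", "Explain embeddings", "What is a token?"]
model_outputs = ["RAG is...", "Embeddings are..."]  # one output is missing

pairs = list(zip(prompts, model_outputs))
print(len(pairs))  # 2 (the third prompt is silently dropped)

# A simple guard before pairing
lengths_match = len(prompts) == len(model_outputs)
if not lengths_match:
    print(f"Length mismatch: {len(prompts)} prompts, {len(model_outputs)} outputs")
```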

List Comprehensions: Python's Superpower

List comprehensions are a concise, readable way to filter and transform collections. They fill the role LINQ fills in C#, and they appear throughout production AI code: transformations, data cleaning, and result post-processing. Once you internalize them, you'll wonder how you lived without them.

C# LINQ

// Filter complete responses
// (cast first: == on object compares references, not string contents)
var complete = responses
    .Where(r => (string)r["finish_reason"] == "end_turn")
    .ToList();

// Extract texts
var texts = responses
    .Where(r => r["text"] != null)
    .Select(r => (string)r["text"])
    .ToList();

Python Comprehension

# Filter complete responses
complete = [
    r for r in responses
    if r["finish_reason"] == "end_turn"
]

# Extract texts (non-null only)
texts = [
    r["text"] for r in responses
    if r["text"] is not None
]

Practical Comprehension Patterns

# 1. Transform: extract and clean text from responses
cleaned_texts = [r["text"].strip() for r in responses if r["text"]]

# 2. Filter + transform: keep responses over 40 tokens, reduced to id and word count
long_responses = [
    {"id": r["id"], "word_count": len(r["text"].split())}
    for r in responses
    if r["text"] and r["tokens"] > 40
]

# 3. Nested comprehension: flatten a list of chunk lists
retrieved_chunks = [["chunk1a", "chunk1b"], ["chunk2a"], ["chunk3a", "chunk3b", "chunk3c"]]
all_chunks = [chunk for chunk_list in retrieved_chunks for chunk in chunk_list]
# Result: ["chunk1a", "chunk1b", "chunk2a", "chunk3a", "chunk3b", "chunk3c"]

# 4. Dict comprehension: build a lookup map
response_map = {r["id"]: r["text"] for r in responses if r["text"]}
# Use: response_map["msg_1"] → "Embeddings are dense vector representations..."
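
The comprehension examples filter with `is not None` in one place and a bare truthiness test (`if r["text"]`) in others. The two are not equivalent: an empty string passes the first and fails the second. The sketch below uses made-up sample data to show the difference.

```python
# Made-up sample: one missing text (None) and one empty string
samples = [
    {"id": "a", "text": "Hello"},
    {"id": "b", "text": None},
    {"id": "c", "text": ""},
]

not_none = [r["id"] for r in samples if r["text"] is not None]
truthy = [r["id"] for r in samples if r["text"]]

print(not_none)  # ['a', 'c']  (the empty string is kept)
print(truthy)    # ['a']       (None and "" are both dropped)
```

Which one is right depends on whether an empty completion counts as a valid response in your pipeline.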

Dictionaries: Working with LLM JSON

LLM APIs return JSON objects. In Python, these become dictionaries. Knowing the right dict access patterns prevents common bugs:

# Simulated Anthropic API response dict
api_response = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Here's the analysis..."}],
    "model": "claude-3-5-sonnet-20241022",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 150, "output_tokens": 320},
}

# Strict vs. safe access
text = api_response["content"][0]["text"]          # raises KeyError if a key is missing, IndexError if content is empty
stop = api_response.get("stop_reason", "unknown")  # returns default if missing

# Nested safe access
input_tokens = api_response.get("usage", {}).get("input_tokens", 0)

# Check key exists
if "usage" in api_response:
    total = api_response["usage"]["input_tokens"] + api_response["usage"]["output_tokens"]

# Destructuring-style unpacking
model = api_response.get("model")
usage = api_response.get("usage", {})
input_t, output_t = usage.get("input_tokens", 0), usage.get("output_tokens", 0)
print(f"Model: {model} | Tokens: {input_t} in, {output_t} out")
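
The chained lookup `api_response["content"][0]["text"]` fails if the content list is empty or missing. One way to make it tolerant is a small helper like the one below. This is a sketch under the assumption that responses follow the dict shape shown above; `first_text` is a hypothetical helper, not part of any SDK.

```python
def first_text(api_response: dict, default: str = "") -> str:
    """Return the text of the first text block, or `default` if there is none."""
    # `or []` also covers the case where "content" is present but None
    for block in api_response.get("content") or []:
        if isinstance(block, dict) and block.get("type") == "text":
            return block.get("text", default)
    return default


print(first_text({"content": [{"type": "text", "text": "Here's the analysis..."}]}))
print(first_text({}, default="<no text>"))  # <no text>
```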

⚠️ d["key"] vs get() vs setdefault()

Use d["key"] when the key must exist (loud failure is good). Use d.get("key", default) when the key is optional. Use d.setdefault("key", []) to initialize a key only if missing — useful for accumulating results into a dict of lists.
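
As a minimal sketch of the setdefault() accumulation pattern, the snippet below groups made-up (query, document) pairs into a dict of lists. The data and variable names are illustrative.

```python
# Made-up retrieval hits as (query_id, doc_id) pairs
hits = [
    ("q1", "doc_001"),
    ("q2", "doc_003"),
    ("q1", "doc_002"),
]

docs_by_query = {}
for query_id, doc_id in hits:
    # Creates the empty list on first sight of query_id, then appends to it
    docs_by_query.setdefault(query_id, []).append(doc_id)

print(docs_by_query)  # {'q1': ['doc_001', 'doc_002'], 'q2': ['doc_003']}
```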

Sets for Deduplication

Sets are underused but invaluable in AI pipelines — deduplicating retrieved document IDs, tracking which chunks have been processed, and enforcing uniqueness in evaluation datasets.

# Deduplicating retrieved document IDs across multiple queries
query_results = [
    ["doc_001", "doc_002", "doc_005"],
    ["doc_002", "doc_003", "doc_005"],
    ["doc_001", "doc_004"],
]

# Flatten and deduplicate in one step
all_doc_ids = {doc_id for results in query_results for doc_id in results}
# Sets have no guaranteed order, so sort before printing for stable output
print(sorted(all_doc_ids))  # ['doc_001', 'doc_002', 'doc_003', 'doc_004', 'doc_005']

# Set operations (useful for evaluation)
expected = {"doc_001", "doc_002", "doc_003"}
retrieved = {"doc_001", "doc_003", "doc_005"}

precision_hits = expected & retrieved   # intersection: {'doc_001', 'doc_003'}
missed = expected - retrieved           # difference: {'doc_002'}
extra = retrieved - expected            # false positives: {'doc_005'}

recall = len(precision_hits) / len(expected)
print(f"Recall: {recall:.1%}")  # 66.7%
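
The same intersection yields precision when divided by the number of retrieved documents instead of the number expected. The helper below is a sketch; `precision_recall` is a hypothetical name, and returning 0.0 for an empty set is an assumption made here to avoid division by zero, not an established convention.

```python
def precision_recall(expected: set, retrieved: set) -> tuple:
    """Return (precision, recall) for one query's retrieved document IDs."""
    hits = expected & retrieved
    # Assumption: an empty denominator is scored as 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(expected) if expected else 0.0
    return precision, recall


precision, recall = precision_recall(
    {"doc_001", "doc_002", "doc_003"},
    {"doc_001", "doc_003", "doc_005"},
)
print(f"Precision: {precision:.1%} | Recall: {recall:.1%}")  # Precision: 66.7% | Recall: 66.7%
```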

Sorting, Filtering, and Aggregating

# Sort responses by token count (descending)
sorted_responses = sorted(
    [r for r in responses if r["text"]],
    key=lambda r: r["tokens"],
    reverse=True,
)

# Group responses by finish reason using a dict
from collections import defaultdict

grouped: dict[str, list] = defaultdict(list)
for r in responses:
    grouped[r["finish_reason"]].append(r)

print(dict(grouped))
# {'end_turn': [...], 'max_tokens': [...]}

# Aggregate stats
total_tokens = sum(r["tokens"] for r in responses)
avg_tokens = total_tokens / len(responses)
max_tokens_response = max(responses, key=lambda r: r["tokens"])
print(f"Total: {total_tokens} | Avg: {avg_tokens:.1f} | Max: {max_tokens_response['id']}")
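
The aggregate calculations above assume a non-empty list: with no responses, the average raises ZeroDivisionError and max() raises ValueError. A minimal sketch of guarding both, using a hypothetical `empty_batch` variable to stand in for a batch where every request failed:

```python
empty_batch = []  # hypothetical: every request in the batch failed

total_tokens = sum(r["tokens"] for r in empty_batch)  # sum() of nothing is 0
avg_tokens = total_tokens / len(empty_batch) if empty_batch else 0.0
# default= is returned instead of raising when the iterable is empty
largest = max(empty_batch, key=lambda r: r["tokens"], default=None)

print(total_tokens, avg_tokens, largest)  # 0 0.0 None
```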

📚 collections.defaultdict and Counter

The collections module has two must-know types for AI data work: defaultdict(list) for grouping without KeyError checks, and Counter for frequency counting (e.g., counting finish reasons, token distributions, or label occurrences in evaluation sets).
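
A short sketch of Counter on a made-up list of finish reasons:

```python
from collections import Counter

# Made-up finish reasons from a batch of four responses
finish_reasons = ["end_turn", "end_turn", "max_tokens", "end_turn"]
counts = Counter(finish_reasons)

print(counts["end_turn"])       # 3
print(counts["stop_sequence"])  # 0 (missing keys count as zero, no KeyError)
print(counts.most_common(1))    # [('end_turn', 3)]
```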

Key Takeaways

Python's four core collection types map to C# generics: list → List, dict → Dictionary, tuple → ValueTuple, set → HashSet
List comprehensions replace most LINQ chains — master them early, they're everywhere in AI code
Use dict.get("key", default) for safe access to JSON API responses; use d["key"] when the field is required
Sets are your deduplication tool — use them for document ID tracking, evaluation recall/precision, and deduplicating retrieved chunks
enumerate() replaces index-based for i in range(len(...)); zip() pairs multiple iterables cleanly
collections.defaultdict and Counter are the workhorses of grouping and frequency analysis in AI evaluation pipelines