Python Control Flow & Collections

Tech Buddy June 12, 2026 3 min read

LLM APIs return JSON. Your RAG pipeline returns lists of retrieved chunks. Your evaluation harness processes arrays of model responses. Python's control flow and collection types are the primary tools you'll use to work with this data, and for these data-wrangling tasks they are usually more compact than the equivalent C#.

This lesson uses realistic LLM response structures to show how Python's loops, conditionals, and comprehensions handle the data manipulation patterns you'll encounter daily.

Python's Core Collection Types

Python has four core built-in collection types. Each maps roughly to a C# type, but with important behavioral differences:

| Python | C# Equivalent | Mutable? | Ordered? | Unique elements? |
|---|---|---|---|---|
| list | List<T> | Yes | Yes | No |
| tuple | ValueTuple / record | No | Yes | No |
| dict | Dictionary<K,V> | Yes | Yes (insertion order, 3.7+) | Keys only |
| set | HashSet<T> | Yes | No | Yes |
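A quick way to see the table's Mutable/Ordered/Unique columns in action is to poke at one value of each type. This is a small sketch; the chunk IDs, span, and token counts are illustrative values, not real pipeline data.

```python
# Illustrative stand-ins for pipeline data
chunk_ids = ["c1", "c2", "c2"]                       # list: ordered, duplicates allowed
span = (0, 128)                                      # tuple: ordered, immutable
usage = {"input_tokens": 150, "output_tokens": 320}  # dict: keys keep insertion order
unique_ids = set(chunk_ids)                          # set: duplicates collapse

chunk_ids.append("c3")         # lists grow in place
usage["output_tokens"] += 10   # dict values can be updated in place

try:
    span[0] = 64               # tuples reject item assignment
except TypeError as exc:
    tuple_error = str(exc)

print(chunk_ids)           # ['c1', 'c2', 'c2', 'c3']
print(list(usage))         # ['input_tokens', 'output_tokens']
print(sorted(unique_ids))  # ['c1', 'c2']
print(tuple_error)         # 'tuple' object does not support item assignment
```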

Working with Lists

Lists are the bread and butter of AI pipelines — storing retrieved chunks, model responses, message histories, and evaluation results.

```python
# Simulated LLM batch response (realistic structure)
responses = [
    {"id": "msg_1", "text": "Embeddings are dense vector representations...", "tokens": 42, "finish_reason": "end_turn"},
    {"id": "msg_2", "text": "RAG stands for Retrieval-Augmented Generation...", "tokens": 38, "finish_reason": "end_turn"},
    {"id": "msg_3", "text": None, "tokens": 0, "finish_reason": "max_tokens"},
    {"id": "msg_4", "text": "Vector databases store high-dimensional...", "tokens": 55, "finish_reason": "end_turn"},
]

# List operations
print(len(responses))         # 4
print(responses[0])           # first item
print(responses[-1])          # last item
print(responses[1:3])         # slice: items at index 1 and 2 (the stop index is exclusive)

# Mutating (on a copy, so the later examples still see all four responses)
working = responses.copy()
working.append({"id": "msg_5", "text": "New response", "tokens": 10, "finish_reason": "end_turn"})
working.pop(2)  # remove by index and return the item (removes msg_3)
```
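A few list behaviors tend to surprise developers coming from C#: assignment shares the list rather than copying it, and `append` and `extend` do different things with an iterable argument. A minimal sketch; the message history and chunk names are illustrative.

```python
history = [{"role": "user", "content": "What is RAG?"}]

# Assignment copies the reference, not the list (same as C# reference types)
alias = history
snapshot = history.copy()  # shallow copy: new list, same dict objects inside

history.append({"role": "assistant", "content": "RAG stands for..."})
print(len(alias), len(snapshot))  # 2 1

# append adds ONE element; extend adds every element of an iterable
batch = ["chunk_a"]
batch.append(["chunk_b", "chunk_c"])  # nested: ['chunk_a', ['chunk_b', 'chunk_c']]
flat = ["chunk_a"]
flat.extend(["chunk_b", "chunk_c"])   # flat: ['chunk_a', 'chunk_b', 'chunk_c']

# Membership test and positional insert
print("chunk_b" in flat)        # True
flat.insert(0, "system_chunk")  # insert before index 0
print(flat)  # ['system_chunk', 'chunk_a', 'chunk_b', 'chunk_c']
```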

Control Flow: Conditionals and Loops

if / elif / else

Python's conditionals are close to C#'s, but they use indentation instead of braces, a colon after each condition (no parentheses required), and elif instead of else if:

```python
def classify_finish_reason(response: dict) -> str:
    reason = response.get("finish_reason", "unknown")

    if reason == "end_turn":
        return "complete"
    elif reason == "max_tokens":
        return "truncated"
    elif reason == "stop_sequence":
        return "stopped"
    else:
        return f"unknown: {reason}"


for r in responses:
    status = classify_finish_reason(r)
    print(f"{r['id']}: {status}")
```

for Loops and range()

C#

```csharp
// Index-based
for (int i = 0; i < responses.Count; i++)
{
    Console.WriteLine(responses[i]["id"]);
}

// For-each
foreach (var r in responses)
{
    Console.WriteLine(r["id"]);
}
```

Python

```python
# Index-based (use range)
for i in range(len(responses)):
    print(responses[i]["id"])

# For-each (preferred)
for r in responses:
    print(r["id"])

# With index (preferred over range + len)
for i, r in enumerate(responses):
    print(f"{i}: {r['id']}")
```
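`range()` also takes start and step arguments, and `enumerate()` takes a start value. The sketch below shows both; the chunk names and the batch size of 2 are illustrative choices.

```python
chunks = ["intro", "method", "results", "discussion", "appendix"]

# range(start, stop, step): stop is exclusive, like the `i < n` bound in a C# for loop
print(list(range(0, 10, 3)))  # [0, 3, 6, 9]

# Stepped range + slicing: split chunks into batches of 2
batches = []
for i in range(0, len(chunks), 2):
    batches.append(chunks[i:i + 2])
print(batches)  # [['intro', 'method'], ['results', 'discussion'], ['appendix']]

# enumerate(start=1) numbers items from 1 without i + 1 arithmetic
numbered = []
for rank, chunk in enumerate(chunks, start=1):
    numbered.append(f"{rank}. {chunk}")
print(numbered[0])  # 1. intro

# A negative step counts down
print(list(range(len(chunks) - 1, -1, -1)))  # [4, 3, 2, 1, 0]
```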

Useful Iteration Patterns for AI Code

```python
# zip: pair two lists together
prompts = ["What is RAG?", "Explain embeddings", "What is a token?"]
model_outputs = ["RAG is...", "Embeddings are...", "A token is..."]

for prompt, output in zip(prompts, model_outputs):
    print(f"Q: {prompt}\nA: {output}\n")


# Iterating dict items (extremely common with JSON responses)
response_meta = {"model": "claude-3-5-sonnet", "usage": {"input_tokens": 100, "output_tokens": 250}}
for key, value in response_meta.items():
    print(f"{key}: {value}")


# Early exit with break
for r in responses:
    if r.get("finish_reason") == "max_tokens":
        print(f"Warning: {r['id']} was truncated!")
        break  # stop after first truncated response
```

List Comprehensions: Python's Superpower

List comprehensions are a concise, readable way to filter and transform collections. They cover most of what you would use LINQ's Where and Select for, and they show up constantly in production AI code for data cleaning and result post-processing. Once you internalize them, you'll wonder how you lived without them.

C# LINQ

```csharp
// Assumes responses is a List<Dictionary<string, object>>; requires using System.Linq;
// Filter complete responses (cast first, otherwise == compares object references)
var complete = responses
    .Where(r => (string)r["finish_reason"] == "end_turn")
    .ToList();

// Extract texts
var texts = responses
    .Where(r => r["text"] != null)
    .Select(r => (string)r["text"])
    .ToList();
```

Python Comprehension

```python
# Filter complete responses
complete = [
    r for r in responses
    if r["finish_reason"] == "end_turn"
]

# Extract texts (non-null only)
texts = [
    r["text"] for r in responses
    if r["text"] is not None
]
```
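Wrapping the same syntax in parentheses gives a generator expression, which is evaluated lazily, closer to an un-materialized `IEnumerable<T>` than to `.ToList()`. Combined with `sum()`, `any()`, and `all()` it covers LINQ's `.Sum()`, `.Any()`, and `.All()`. A sketch with a smaller illustrative response list:

```python
responses = [
    {"id": "msg_1", "text": "Embeddings are...", "tokens": 42, "finish_reason": "end_turn"},
    {"id": "msg_2", "text": None, "tokens": 0, "finish_reason": "max_tokens"},
    {"id": "msg_3", "text": "Vector databases...", "tokens": 55, "finish_reason": "end_turn"},
]

# Generator expression: nothing is computed until something iterates it
token_stream = (r["tokens"] for r in responses)
total = sum(token_stream)  # like .Sum(); consumes the generator
print(total)  # 97

# any()/all() short-circuit, like .Any() and .All(); inner parentheses are optional
has_truncation = any(r["finish_reason"] == "max_tokens" for r in responses)
all_have_text = all(r["text"] is not None for r in responses)
print(has_truncation, all_have_text)  # True False

# A generator can only be iterated once
second_pass = sum(token_stream)
print(second_pass)  # 0, because the generator is already exhausted
```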

Practical Comprehension Patterns

```python
# 1. Transform: extract and clean text from responses
cleaned_texts = [r["text"].strip() for r in responses if r["text"]]

# 2. Filter + transform: get high-token responses, normalized
long_responses = [
    {"id": r["id"], "word_count": len(r["text"].split())}
    for r in responses
    if r["text"] and r["tokens"] > 40
]

# 3. Nested comprehension: flatten a list of chunk lists
retrieved_chunks = [["chunk1a", "chunk1b"], ["chunk2a"], ["chunk3a", "chunk3b", "chunk3c"]]
all_chunks = [chunk for chunk_list in retrieved_chunks for chunk in chunk_list]
# Result: ["chunk1a", "chunk1b", "chunk2a", "chunk3a", "chunk3b", "chunk3c"]

# 4. Dict comprehension: build a lookup map
response_map = {r["id"]: r["text"] for r in responses if r["text"]}
# Use: response_map["msg_1"] → "Embeddings are dense vector representations..."
```
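The position of `if` in a comprehension changes its meaning: a trailing `if` filters items out, while a leading `x if cond else y` keeps every item and picks a value for each. A sketch contrasting the two; the `"<no text>"` placeholder is an arbitrary illustrative choice.

```python
responses = [
    {"id": "msg_1", "text": "  Embeddings are dense vectors.  ", "tokens": 42},
    {"id": "msg_2", "text": None, "tokens": 0},
]

# Trailing `if` FILTERS: items that fail the test are dropped
kept = [r["text"].strip() for r in responses if r["text"]]
print(kept)  # ['Embeddings are dense vectors.']

# Leading `x if cond else y` TRANSFORMS: every item stays, with a fallback value
labeled = [r["text"].strip() if r["text"] else "<no text>" for r in responses]
print(labeled)  # ['Embeddings are dense vectors.', '<no text>']
```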

Dictionaries: Working with LLM JSON

LLM APIs return JSON objects; once parsed (for example with json.loads), they become Python dictionaries and lists. Knowing the right dict access patterns prevents common bugs:

```python
# Simulated Anthropic API response dict
api_response = {
    "id": "msg_abc123",
    "type": "message",
    "role": "assistant",
    "content": [{"type": "text", "text": "Here's the analysis..."}],
    "model": "claude-3-5-sonnet-20241022",
    "stop_reason": "end_turn",
    "usage": {"input_tokens": 150, "output_tokens": 320},
}

# Required vs. optional access
text = api_response["content"][0]["text"]          # required: KeyError (or IndexError if content is empty)
stop = api_response.get("stop_reason", "unknown")  # optional: returns the default if missing

# Nested safe access (the {} default keeps the second .get() from failing)
input_tokens = api_response.get("usage", {}).get("input_tokens", 0)

# Check that a key exists before indexing into it
if "usage" in api_response:
    total = api_response["usage"]["input_tokens"] + api_response["usage"]["output_tokens"]

# Pull fields into locals; tuple unpacking assigns both token counts at once
model = api_response.get("model")
usage = api_response.get("usage", {})
input_t, output_t = usage.get("input_tokens", 0), usage.get("output_tokens", 0)
print(f"Model: {model} | Tokens: {input_t} in, {output_t} out")
```
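Because `content` is a list of typed blocks, `["content"][0]["text"]` quietly assumes the first block is text. A more defensive sketch joins every text block and skips the rest. The mixed text/`tool_use` content below is an illustrative response shape, not captured API output.

```python
api_response = {
    "content": [
        {"type": "text", "text": "Here's the analysis"},
        {"type": "tool_use", "id": "toolu_1", "name": "search", "input": {"q": "RAG"}},
        {"type": "text", "text": " and a summary."},
    ],
    "stop_reason": "tool_use",
}

# Join all text blocks, ignoring other block types such as tool_use
text = "".join(
    block.get("text", "")
    for block in api_response.get("content", [])
    if block.get("type") == "text"
)
print(text)  # Here's the analysis and a summary.

# Alternative: try the required path and handle the failure explicitly
try:
    first_text = api_response["content"][0]["text"]
except (KeyError, IndexError, TypeError):
    first_text = ""
print(first_text)  # Here's the analysis
```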

Sets for Deduplication

Sets are underused but invaluable in AI pipelines — deduplicating retrieved document IDs, tracking which chunks have been processed, and enforcing uniqueness in evaluation datasets.

```python
# Deduplicating retrieved document IDs across multiple queries
query_results = [
    ["doc_001", "doc_002", "doc_005"],
    ["doc_002", "doc_003", "doc_005"],
    ["doc_001", "doc_004"],
]

# Flatten and deduplicate in one step with a set comprehension
all_doc_ids = {doc_id for results in query_results for doc_id in results}
print(sorted(all_doc_ids))  # ['doc_001', 'doc_002', 'doc_003', 'doc_004', 'doc_005'] (sets are unordered, so sort for display)

# Set operations for retrieval evaluation
expected = {"doc_001", "doc_002", "doc_003"}
retrieved = {"doc_001", "doc_003", "doc_005"}

hits = expected & retrieved      # intersection: {'doc_001', 'doc_003'}
missed = expected - retrieved    # difference: {'doc_002'}
extra = retrieved - expected     # false positives: {'doc_005'}

recall = len(hits) / len(expected)      # share of expected docs that were retrieved
precision = len(hits) / len(retrieved)  # share of retrieved docs that were expected
print(f"Recall: {recall:.1%} | Precision: {precision:.1%}")  # Recall: 66.7% | Precision: 66.7%
```
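A set comprehension throws away retrieval order, which matters when chunks are ranked. The sketch below shows an order-preserving alternative using `dict.fromkeys` (dict keys are unique and insertion-ordered) and a `processed` set for tracking work already done. The chunk IDs are illustrative, and the `append` stands in for real embedding work.

```python
retrieved_chunks = ["chunk_07", "chunk_02", "chunk_07", "chunk_11", "chunk_02"]

# Order-preserving dedup: keeps the first occurrence of each ID
ranked_unique = list(dict.fromkeys(retrieved_chunks))
print(ranked_unique)  # ['chunk_07', 'chunk_02', 'chunk_11']

# Track processed chunks; `in` on a set is O(1) on average
processed: set[str] = set()
embedded_order = []
for chunk_id in retrieved_chunks:
    if chunk_id in processed:
        continue                     # already handled, skip the duplicate
    embedded_order.append(chunk_id)  # stand-in for real embedding work
    processed.add(chunk_id)

print(embedded_order)                      # ['chunk_07', 'chunk_02', 'chunk_11']
print(processed <= set(retrieved_chunks))  # True (subset check)
```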

Sorting, Filtering, and Aggregating

```python
from collections import defaultdict

# Sort responses that have text by token count (descending)
sorted_responses = sorted(
    [r for r in responses if r["text"]],
    key=lambda r: r["tokens"],
    reverse=True,
)

# Group responses by finish reason; a missing key starts as an empty list
grouped: dict[str, list] = defaultdict(list)
for r in responses:
    grouped[r["finish_reason"]].append(r)

print(dict(grouped))
# {'end_turn': [...], 'max_tokens': [...]}

# Aggregate stats
total_tokens = sum(r["tokens"] for r in responses)
avg_tokens = total_tokens / len(responses)
largest = max(responses, key=lambda r: r["tokens"])
print(f"Total: {total_tokens} | Avg: {avg_tokens:.1f} | Max: {largest['id']}")
```
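For frequency analysis, `collections.Counter` does the counting that a `defaultdict(int)` loop would otherwise do by hand, roughly like `GroupBy(...).Select(g => g.Count())` in LINQ. A sketch using a four-response list shaped like the lesson's sample data:

```python
from collections import Counter

responses = [
    {"id": "msg_1", "tokens": 42, "finish_reason": "end_turn"},
    {"id": "msg_2", "tokens": 38, "finish_reason": "end_turn"},
    {"id": "msg_3", "tokens": 0, "finish_reason": "max_tokens"},
    {"id": "msg_4", "tokens": 55, "finish_reason": "end_turn"},
]

# Count how often each finish reason occurs
reason_counts = Counter(r["finish_reason"] for r in responses)
print(reason_counts)                 # Counter({'end_turn': 3, 'max_tokens': 1})
print(reason_counts.most_common(1))  # [('end_turn', 3)]
print(reason_counts["tool_use"])     # 0: missing keys count as zero, no KeyError

# Truncation rate for an evaluation report
truncation_rate = reason_counts["max_tokens"] / len(responses)
print(f"Truncated: {truncation_rate:.0%}")  # Truncated: 25%
```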

Key Takeaways

  • Python's four core collection types map to C# generics: list → List, dict → Dictionary, tuple → ValueTuple, set → HashSet
  • List comprehensions replace most LINQ chains — master them early, they're everywhere in AI code
  • Use dict.get("key", default) for safe access to JSON API responses; use d["key"] when the field is required
  • Sets are your deduplication tool — use them for document ID tracking, evaluation recall/precision, and deduplicating retrieved chunks
  • enumerate() replaces index-based for i in range(len(...)); zip() pairs multiple iterables cleanly
  • collections.defaultdict and Counter are the workhorses of grouping and frequency analysis in AI evaluation pipelines