
Elifnaz Kancan

About Machine Semiosis

Space is no longer apprehended as a purely physical or experiential entity; it is often conceived as a domain systematically transformed into distinct types of information. As Adriaans (2021) outlines, information can be categorized into three forms: Information-A, referring to meaningful, logically structured content comprehensible to human cognition; Information-B, derived from Shannon’s model, which reduces signals to statistical units measured by probability and entropy; and Information-C, which concerns algorithmic compressibility and computational complexity. Through digital representation, spatial environments are increasingly reconstituted within these informational frameworks. Visual elements such as images may be converted into tensors, colors into numerical matrices, and sensory experience into RGB channels. In this process, space does not necessarily persist in its full phenomenological form but is often reduced, at least in operational terms, to the mode of its representation.
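The conversion described above can be made concrete with a toy sketch. The 2×2 image and its pixel values below are invented for illustration; the point is only that a "scene" becomes an array of numbers, a matrix per color channel, in the Information-B sense.

```python
import numpy as np

# A hypothetical 2x2 pixel image: each pixel is an (R, G, B) triple.
# The values are invented purely for illustration.
image = np.array([
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
], dtype=np.uint8)

print(image.shape)             # (2, 2, 3): height x width x RGB channels

# The "scene" reduced to a single numerical matrix: one color channel.
red_channel = image[:, :, 0]
print(red_channel)
```

Nothing phenomenological survives this step: the tensor records intensities, not experience.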

Neural networks may exemplify this statistical abstraction. They operate by relating features not through meaning but through statistical proximity: attention mechanisms identify which signals co-occur frequently, but not why they matter. The internal architecture of a neural network is built from prior data associations; space becomes a matrix of values in which each encoded layer corresponds not to lived depth but to algorithmic salience. Generative Adversarial Networks (GANs), while capable of producing photorealistic imagery, function only within the logic of singular frames and fixed perspectives. Their outputs simulate realism but never reconstruct the multiplicity of a lived spatial encounter: they reflect only one sensory modality, vision, and fail to approximate the richness of a space that is also touched, heard, smelled, and embodied.
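A minimal sketch of scaled dot-product attention (Vaswani et al., 2017) makes this point visible: the weights are derived entirely from numerical similarity between vectors. The three 2-dimensional "token" vectors below are invented for illustration, not drawn from any real model.

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: weights come from vector similarity."""
    d = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                  # pairwise dot products
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax
    return weights @ V, weights                    # outputs are weighted averages

# Two numerically close vectors and one distinct vector (invented values).
X = np.array([[1.0, 0.0],
              [0.9, 0.1],
              [0.0, 1.0]])
out, w = attention(X, X, X)

# The first token attends mostly to itself and the second token, purely
# because their vectors are close in value space; no "meaning" is consulted.
print(np.round(w, 2))
```

The mechanism decides what co-occurs, never why it matters.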

What results from the spatial reduction is not spatial knowledge in the epistemic sense, but a relational encoding, a form of mapped cognition, where numerical proximity substitutes for semantic interpretation. The process entails loss at every stage: what is encoded is always less than what is lived. Each translation involves abstraction, omission, and flattening. Even the RGB values used to express spatial textures participate in a hierarchy of attention: low-level features like color gradients are transformed into high-level features like object labels, yet the structure of these mappings is governed entirely by pre-trained associations (Newman, 2024).
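The substitution of numerical proximity for semantic interpretation can be sketched in a few lines. The "pre-trained" embeddings and labels below are invented stand-ins: a new feature vector simply inherits the label of whichever stored association it lies closest to.

```python
import numpy as np

# Invented 2-D "pre-trained" embeddings standing in for learned associations.
pretrained = {
    "window":     np.array([0.9, 0.1]),
    "graffiti":   np.array([0.1, 0.9]),
    "vacant lot": np.array([0.5, 0.5]),
}

def label_by_proximity(vec):
    """Assign the label of the nearest stored vector (cosine similarity)."""
    sims = {name: vec @ ref / (np.linalg.norm(vec) * np.linalg.norm(ref))
            for name, ref in pretrained.items()}
    return max(sims, key=sims.get)

new_feature = np.array([0.8, 0.2])      # whatever the encoder produced
print(label_by_proximity(new_feature))  # → "window", by distance alone
```

The mapping is entirely governed by the stored associations; nothing in it interprets what a window is.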

The reduction brings into focus the epistemological opacity of such processes. As Harley (1989) contends, mapping has become a black box: a system whose operations, assumptions, and exclusions are inaccessible to the very actors who rely on its outputs. The same may apply to artificial intelligence models. Although their results can be interpreted, the processes by which those results are produced are rarely understood. Transformer-based attention mechanisms are often praised for their accuracy, but what is produced is not meaning. It is a statistically optimized correlation. In this sense, attention itself becomes a form of mapping, yet one that encodes without disclosing. The ontology of space is thereby restructured: it is no longer lived, but parsed, no longer embodied but calculated. In doing so, the model establishes what is important according to its internal logic, thereby also determining what qualifies as “information”. This transformation is grounded in the principle of relationality. As Acar (2019) argues, mapping is essentially an act of linking, comparing, and organizing; it often constructs the structural and semantic relationships through which knowledge becomes legible. No informational unit stands alone; it acquires meaning only in context, through association. Patterns are reinforced not because they hold intrinsic meaning, but because they appear with sufficient frequency to be recognized as significant by the model. What is prioritized is not semantic relevance, but statistical consistency. These systems do not evaluate what matters in a human sense; rather, they are calibrated to detect and amplify recurrence. Accordingly, AI outputs are not epistemically neutral, as they are shaped by the assumptions embedded in the training data and the design choices underlying the model (Newman, 2024; Vaswani et al., 2017).
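Frequency-driven salience of this kind can be illustrated with a toy corpus. The four invented "scene" descriptions below stand in for training data: a pattern counts as significant only because it recurs.

```python
from collections import Counter
from itertools import combinations

# Invented scene descriptions standing in for a training corpus.
scenes = [
    {"window", "facade", "door"},
    {"window", "facade"},
    {"window", "facade", "graffiti"},
    {"graffiti", "vacant lot"},
]

# Count how often each pair of features co-occurs across scenes.
pair_counts = Counter()
for scene in scenes:
    pair_counts.update(combinations(sorted(scene), 2))

# The most frequent pair is "recognized as significant" by sheer recurrence.
print(pair_counts.most_common(1))  # [(('facade', 'window'), 3)]
```

Nothing in the count asks whether the pairing matters; recurrence alone elevates it.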

Computer vision refers to a subfield of artificial intelligence that enables machines to interpret and extract information from visual data, such as images or videos. It typically involves detecting patterns, segmenting objects, labeling features, and generating representations based on pixel-level input. However, beyond its technical aims, the process of visual interpretation performed by such systems invites broader reflection on questions of signification and meaning. In this context, what might be termed “machine semiosis” offers a useful conceptual lens. Machine semiosis refers to the idea that algorithmic systems may not merely record visual input but instead participate in the active production of signs. When a convolutional neural network (CNN) processes an urban streetscape, it begins by extracting numeric features—edges, gradients, and histograms. These raw values are quickly inscribed with semantic labels such as “graffiti,” “window,” or “vacant lot.” This two-stage translation mirrors classical semiotics: a denotative layer of measurable form is overlaid by a connotative layer of meaning, itself conditioned by prior datasets. As a result, the algorithm’s gaze is never neutral; it foregrounds the visible where examples are abundant and renders invisible the different, the unstructured, or the undocumented, as Figure 2.9 illustrates. The dashboards and maps generated downstream inherit these visual hierarchies, reinforcing them through the illusion of neutrality. Recognizing this dynamic expands critical cartography into the domain of AI. It requires us to ask not only what is being represented, but how and why such representations are constructed. Auditing datasets, making architectural assumptions explicit, and integrating plural epistemologies become essential to producing accountable spatial analytics.
The task is not to abandon automation, but to situate it within the politics of representation—transforming machine vision from an inscrutable oracle into a reflexive partner in understanding the complexities of urban life.
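The "denotative layer" of the two-stage translation can be sketched with a hand-coded edge filter. Real CNNs learn their kernels from data, but the arithmetic of the first stage is the same convolution shown below; the 5×5 grayscale patch is invented for illustration.

```python
import numpy as np

# An invented 5x5 grayscale patch: a dark region meeting a bright region.
patch = np.array([
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
], dtype=float)

# A Sobel-like vertical-edge kernel; learned CNN kernels play the same role.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

# Valid convolution: slide the kernel over every 3x3 window of the patch.
out = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        out[i, j] = np.sum(patch[i:i+3, j:j+3] * kernel)

# High responses mark the boundary between dark and bright regions.
# Nothing here is a "window" yet: only a measurable gradient, onto which
# the connotative label is imposed later, by pre-trained associations.
print(out)
```

The semantic step, calling that gradient a "window frame" or "graffiti edge", happens only in the second, dataset-conditioned stage.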


Figure 2.9: Heatmap comparison between two urban scenes as interpreted by DINOv2. In the
first row, the model clearly attends to familiar architectural features such as window bars and
facade symmetry. In the second row, although there is a visible opening similar to a window,
the model’s attention is less focused and more dispersed.

Machine vision in contemporary AI does not simply capture spatial reality; it may actively construct it through mechanisms of attention, statistical association, and inherited classifications. Rather than producing spatial knowledge in a phenomenological sense, it generates computational abstractions shaped by prior data. This critical view of machine vision may invite a reconsideration of how algorithmic systems render space intelligible, and of the extent to which such renderings can be treated as meaningful reflections of urban reality. What becomes visible, what remains obscure, and how these distinctions are constructed will remain open questions throughout the analytical process.

Street View imagery has transitioned from a centralized visualization tool to a distributed infrastructure for spatial data production. Its expansion through diverse capture methods and participatory platforms reflects a broader shift toward operational and algorithmic modes of mapping. Yet this apparent democratization raises critical concerns regarding authorship, labor, and representational politics. As visual geographies become increasingly shaped by platform-mediated systems, ensuring transparency, equity, and accountability in their design and governance remains essential.

From my thesis - Mapping urban change using street view imagery and computer vision: case of Karaköy

© 2026 by Elifnaz Kancan
