Research Note

Identifying the Digital Self

[1]When individuals, organisations and other entities are represented within a digital system, the design and emulation of this representation is called a digital identity. Digital identity is a multifaceted socio-technical[2] construct that facilitates online interactions and transactions, serving as a virtual representation of an entity. It plays a critical role in enabling a wide range of activities in the digital environment, from personal communication to professional engagements, and is subject to concerns related to privacy, security, resilience and authenticity. Despite the development of sophisticated cryptographic systems and security practices, and widespread multi-decade efforts to deploy these defence mechanisms, digital identity remains the weakest link in systems design. What does it take to assemble a digital identity? What do different implementations of digital identity share?

Drawing on historical conceptualisations of governance and identity, the roots of digital identity trace back to the 19th century. One significant early application of digital identity was its use in the 1890 United States Census[3], representing an early systemic embedding of the punch card system as a tool for governance. This would develop into the modern computing field in the following century. As societies have digitised and computerised, the use and influence of digital identities have expanded dramatically, with different context-sensitive implementations of digital identity touching nearly all parts of modern life.

Anticipating and examining the effects of digital identity on governance, commercial activity and the social interactions of digitised societies requires a more concrete definition of the first principle[4] of digital identity. Despite the universal and multi-faceted deployment of digital identity in modern life, there is no agreed-upon standard definition of what exactly digital identity encompasses[5]. This ambiguity stems from the broad range of datasets used to define and structure a digital identity system, the equally broad objectives and applications of the identity within a wider digital system, and the competing interests held by bodies that define and deploy digital identity systems.

Depending on their intended use case, internal or external constraints, legislative requirements or other factors, digital identities are derived from the data they are designed to hold. Through the act of serialisation, in which aspects of a person are converted into a data set, a digital identity begins to form once it is capable of storing one or more entity-representing data sets for later lookup. Common data that make up a digital identity system include:

  • Unique identifiers specific to the identity system, including usernames or other system-assigned addresses;

  • Personal identifying information within the context of the identity (attributes), such as an individual’s full name, address, gender, date of birth, etc;

  • User curated data, such as a profile photo, account name or self-assigned categorisations;

  • Passwords, passkeys or other security primitives, such as cryptographic key pairs;

  • User-generated behavioural or transactional data, such as network activity, financial records, location histories, or other unique information logs;

  • Online or offline social graphs, including self-declared, observed or inferred real-world or digital relationships, and other associations;

  • Data assigned by a third party, such as classification by an identity vendor, social credit scores, credit histories or criminal records;

  • Network association, such as domain instance[6] or choice of protocol;

  • Additional non-human data, such as MAC addresses or other hardware information when identities represent devices, or corporate branding and other legal information where identities represent organisations and entities.
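
The categories listed above can be sketched as a single record type. This is a minimal illustration rather than a description of any real identity system: the class names, field names and the in-memory store are all hypothetical, chosen only to mirror the list and to show what "storing for later lookup" might look like in practice.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalIdentity:
    """Hypothetical record; each field mirrors one category from the list above."""
    identifier: str                                   # system-assigned unique identifier
    attributes: dict = field(default_factory=dict)    # personal identifying information
    curated: dict = field(default_factory=dict)       # user-curated data (photo, display name)
    credentials: list = field(default_factory=list)   # passwords, passkeys, key pairs
    activity: list = field(default_factory=list)      # behavioural or transactional logs
    relationships: set = field(default_factory=set)   # social graph (other identifiers)
    assigned: dict = field(default_factory=dict)      # third-party data (scores, records)
    network: str = ""                                 # network association (domain, protocol)
    device: dict = field(default_factory=dict)        # non-human data (hardware information)


class IdentityStore:
    """Hypothetical in-memory store: identities are kept for later lookup."""

    def __init__(self):
        self._records = {}

    def serialise(self, identifier, **attributes):
        # Only the attributes the caller chooses to pass in are captured.
        record = DigitalIdentity(identifier=identifier, attributes=dict(attributes))
        self._records[identifier] = record
        return record

    def lookup(self, identifier):
        # Returns None when no identity has been stored under this identifier.
        return self._records.get(identifier)


store = IdentityStore()
store.serialise("user-0001", full_name="Example Person", date_of_birth="1990-01-01")
found = store.lookup("user-0001")
```

Even in this toy form, the record only holds what its designer gave it fields for; anything outside those fields cannot be represented at all.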

Each of these data types has a profound effect on the shape and potential application of a digital identity in areas of trust, privacy, capability, accuracy, governance, social dynamics, power, integrity and colonialism. Across the landscape of the digital identity first principle, the assembly and structure of this data is both highly contextualised within its own boundaries, and diffuse and amorphous in the aggregate. The digital identity first principle is often well understood within its immediate application, but not regulated or standardised[7] due to the wider complexity of the multitudes of implementations, the diverse motivations of differing implementations, and the influence of marketing or lobbying. For example, the IP address of a residential customer of an Internet Service Provider can represent either a digital identity or a data point in a digital identity. The discrepancy between these two perspectives has significant consequences: the treatment of IP addresses as a legally sound representation of a digital identity was a core strategy of corporate litigation against private citizens during the file-sharing lawsuits of the 2000s[8]. So long as a data set is used to identify an entity of some kind, it can be classified as an example of a digital identity.

The application of digital identity is equally broad and varied, as digital identities are deployed in the pursuit of institutional, legislative or ideological objectives. The objectives can be radical: privacy-focused projects such as Signal[9] or the Tor Project[10] deploy a kind of digital identity designed to defend against the de-anonymisation of users. At the same time, these identities ensure that users are able to identify each other reliably (user to user, in the case of Signal), attempt to offer a degree of reliability when looking up the identity of Onion-based web services (user to device, in the case of Tor), or provide cryptographic verification using device-based identities, a practice common to both tools.

At the other extreme, immigration and border control objectives rely on detailed digital identities derived from a mix of data — criminal records, observational profiles, social histories and other datapoints held by state or private actors. As an individual enters or exits a country, border agents use e-passports and biometric scans as a kind of namespace lookup[11], first comparing the real-world individual to their documentation, before retrieving additional data designed to assess and record the transiting individual’s history and character.

Between the examples of network privacy and border control, the application of digital identity has countless forms: self-curated profiles on social media platforms, digital currency wallets, advertising profiles, credit histories, webs of trust, social credit scores, virtual reality or VTuber[12] avatars, digital banking profiles, government-to-citizen services, computer operating systems and text-based chat systems are just a handful of examples of products and services that rely on the application of digital identities. The objectives of these examples determine not just what is contained within an identity, but also what the identity is capable of representing, and the validity of the claims of what is represented. Social media platforms such as Facebook, TikTok or Bluesky, and developer services such as GitHub offer identity variations to represent organisations and companies alongside individual users, and sometimes allow users to use their identity to authenticate and associate themselves with an organisation. IoT or device-first systems use identities to represent and authenticate machines, be it during interactions with other machines or with users.

The range of potential definitions and configurations of a digital identity paradigm, combined with the multitudes of potential applications of these paradigms, plays a significant role in the inability of the technology and policy communities to communicate and build consensus around the design and use of this critical concept. As members of the ID2020 Web of Trust workshop assert in their paper Identity Crisis: Clearer Identity through Correlation:

When we think about “identity” in terms of “who we are”, we get caught up in the consequences and ramifications of policy and privacy and human rights. These are important debates, but they often slip into abstractions, miscommunication, and political disagreements that undermine our efforts to build functioning identity systems. On the other hand, when we think about “identity” as a mere collection of attributes or identifiers, we ignore and sometimes dismiss the deeper meanings others interpret in the word.[13]

In examining digital identity over the course of this research, we have identified a set of common properties that we propose as a universal definition of digital identity for the purposes of the case studies, landscape review and qualitative interviews contained in this research. This is a multi-faceted, multi-perspective working definition for the first principle of digital identity that includes six core properties:

  1. Serialisation, in which part of an individual is read and converted into a digital form by a software or hardware sensory apparatus and defined at the discretion of a systems designer;

  2. Custodianship, in which the serialised self from which the identity is derived is stored and maintained in some form, be it via software automation or via manual means by the user[14] or a third party[15];

  3. Presentation, where the serialised data is reassembled and made legible to machines or humans through an interface of some kind;

  4. Authentication, where the digital identity becomes a central mechanism in which an individual invokes some form of cryptography and/or relational trust to gain access to digital or real-world resources, services or opportunities, or is granted movement in a place[16];

  5. Authorisation, where the authentication and presentation layers of a digital identity act as a vessel for an individual that allows gatekeepers to give and maintain access to a system or resource, and;

  6. Assetisation, where the digital identity is employed as the support for a wider financial speculation goal and/or other commercial ventures.
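
As a rough illustration of how the six properties relate to one another, the sketch below walks one hypothetical person through each of them in turn. Every function name, the in-memory dictionaries and the example data are assumptions made for this illustration; the password hashing uses Python's standard `hashlib` and `hmac` modules and is not a recommendation for any particular deployment.

```python
import hashlib
import hmac
import os

# Hypothetical in-memory system; every name here is illustrative.
RECORDS = {}       # custodianship: where serialised selves are kept
PERMISSIONS = {}   # authorisation: what each identity may access


def _hash(secret: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", secret.encode(), salt, 100_000)


def serialise(person: dict, fields: tuple) -> dict:
    # 1. Serialisation: keep only the fields the designer chose; the rest is lost.
    return {name: person[name] for name in fields if name in person}


def store(identifier: str, record: dict, secret: str) -> None:
    # 2. Custodianship: the system holds the record and a salted hash of the secret.
    salt = os.urandom(16)
    RECORDS[identifier] = {"data": record, "salt": salt, "hash": _hash(secret, salt)}


def present(identifier: str) -> str:
    # 3. Presentation: reassemble the stored data into a legible form.
    data = RECORDS[identifier]["data"]
    return ", ".join(f"{key}={value}" for key, value in sorted(data.items()))


def authenticate(identifier: str, secret: str) -> bool:
    # 4. Authentication: prove control of the identity with a secret.
    entry = RECORDS.get(identifier)
    if entry is None:
        return False
    return hmac.compare_digest(entry["hash"], _hash(secret, entry["salt"]))


def authorise(identifier: str, secret: str, resource: str) -> bool:
    # 5. Authorisation: a gatekeeper grants access to an authenticated identity.
    allowed = PERMISSIONS.get(identifier, set())
    return authenticate(identifier, secret) and resource in allowed


def assetise(field_name: str) -> dict:
    # 6. Assetisation: aggregate identities into countable, tradeable segments.
    counts = {}
    for entry in RECORDS.values():
        value = entry["data"].get(field_name)
        counts[value] = counts.get(value, 0) + 1
    return counts


person = {"name": "Example Person", "city": "Berlin", "favourite_song": "unrecorded"}
store("user-0001", serialise(person, ("name", "city")), secret="correct horse")
PERMISSIONS["user-0001"] = {"library"}
```

Note that `favourite_song` never reaches the stored record: the designer's choice of fields at the serialisation step bounds everything the later steps can present, authenticate, authorise or aggregate.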

Despite the popular concepts of digital identity being tied to user self-expression or data-politics, the expression or representation of self is not the intent of the digital identity first principle. Instead, the overarching goal shared by all implementations of digital identity is that of the broader intent of cybernetics: to govern a population in aggregate. This is accomplished by standardising the properties of entities and actors as they appear within the digital system, eliminating edge cases where possible, and then designing socio-technical touchpoints within the system that allow for the management of these subjects. Such an array of techniques obeys a diffuse rationality in the act of governmentality[17], a concept that is inseparable from the material and systemic tension points of society. The term governing here is broadly agnostic, applying equally to digital identity that serves to manage users on a Discord server, to assign and disperse essential provisions to a population in crisis, or anything in between.

At the same time, the digital identity first principle is an individualistic paradigm. Although identities often represent companies, organisations, devices or other non-human actors, these implementations are nevertheless derived from a Libertarian-inspired ‘one user one identity’[18] design popularised by technology advocates in the 2000s[19]. Aside from a few tightly controlled exceptions, non-individual identities become temporarily individual or retain an individual identity developed over time. In an example of the former, users will frequently express themselves as an individual through the group identity, such as employees including their initials on messages posted from corporate social media accounts. For the latter, a device identity in an anonymous cryptocurrency system becomes an individual identity as it is subjected to forensic on-chain analysis and scrutiny over time.

All digital identities are simulacra[20], in the sense that the processes of capture and reproduction of the identity are inherently flattening and imitative. The representation of self held within a digital identity is modified by the system and the hardware apparatus that supports the system itself. Pressing the organic world into silicon for the purposes of assembling a sort of diorama representation is achieved through standardisation and serialisation.

Even at the smallest scales, these digital systems are fraught with compounding complexity, not just within their own design, but in the design of systems that support them: network topographies, sensor capabilities, storage considerations, etc. The same is true for the reassembly and presentation of the identity at a later stage. Entropy, edge cases and nuance create significant challenges for the conceptualisation and operation of digital systems, and the offline sources for digital identity are rich in all three. Both the drive to reduce complexity and the tools used in the conversion contribute to tremendous data loss. As the Hong Kong philosopher Yuk Hui writes, “Generally speaking, technological diversity is disappearing and becoming homogenized due to cybernetic hegemony. Technological development throughout the world now consists of nothing more than a vast process of “translation”: exactly as with linguistic translation, we seek equivalences between different cultures for each element of the system – but that never really works.”[21]

Finally, all digital identities are eventually human readable. Regardless of the humanity of the intended counterparty (or lack thereof), all digital identity can and will eventually take a human readable form. This human legibility can be inherent to the system for which the digital identity was designed, such as a user profile interface in a social media platform, or a human-readable email address. Legibility can also be derived from the digital identity and its interactions within a digital system by a third party, for example through the forensic analysis of a cryptocurrency wallet address and its social graph.

To assess a digital identity, one might consider the legibility of each of the six core properties of a digital identity (serialisation, custodianship, presentation, authentication, authorisation and assetisation) as the paradigm interfaces with the real world. It is through this tactile and quasi-Para-Real[22] that the strengths and flaws, the risks and opportunities, are often considered. Questions of universal access, digital literacy, disability, colonialism, privacy, security, discrimination and other issues driven by digital identity are at their most visceral here: in the interfacing layer between the electronic world and those who gaze into it. It is precisely the complexity and intensity of this surface that encourages the deeper entrenchment of the flawed first principles of the electronic self.

Cade Diehm
January 2024

Edited by Benjamin Royer, with assistance from Roel Roscam Abbing.
A full list of acknowledgements for this Research Note will be included in a forthcoming Digital Identity research report, due late Q2, 2024.

  1. This Research Note is an abridged opening chapter of our forthcoming Digital Identity research report.

  2. Socio-technical refers to the 'emergent interplay between tools and behaviors of users,' and is especially useful in emerging digital security practice. See also: Entanglements and Exploits: Sociotechnical Security as an Analytic Framework
    Matt Goerzen, Elizabeth Anne Watkins & Gabrielle Lim
    2019 ↩︎

  3. 'Do Not Fold, Spindle or Mutilate': A Cultural History of the Punch Card
    Steven Lubar, Journal of American Culture
    Winter 1992 ↩︎

  4. A first principle is a fundamental, foundational concept or assumption that serves as the bedrock for a system's design, operation, and understanding. A first principle is not derived from other principles or assumptions but stands as an axiom. It coalesces from a complex set of political, material and ideological constraints, and guides the development and function of any cybernetic system that supports digital identity. ↩︎

  5. Identity Crisis: Clearer Identity through Correlation
    Joe Andrieu (Editor), Kevin Gannon, Igor Kruiper, Ajit Tripathi & Gary Zimmerman
    Web of Trust II: ID2020 Design Workshop
    2020 ↩︎

  6. A domain instance refers to a top level domain associated with a user account, particularly in federated networks. For example, in the case of the mastodon account, the domain instance is ↩︎

  7. Identity Crisis: Clearer Identity through Correlation
    Joe Andrieu (Editor), Kevin Gannon, Igor Kruiper, Ajit Tripathi & Gary Zimmerman
    Web of Trust II: ID2020 Design Workshop
    2020 ↩︎

  8. RIAA v. The People: Five Years Later
    Electronic Frontier Foundation
    30 September 2008 ↩︎

  9. Signal is a cryptographically secure open-source messaging service. ↩︎

  10. The Tor Project is primarily responsible for maintaining software for the Tor anonymity network, a decentralised anti-censorship web browsing network. ↩︎

  11. A name lookup is the act in which a supplied name, when encountered in a program, is associated with the declaration that introduced it. ↩︎

  12. 'Chester the otter,' a VTuber character by streamer Kris Yim. VTubers, or Virtual YouTubers, are livestreamers who perform using virtual avatars puppeteered by motion capture hardware and software. ↩︎

  13. Identity Crisis: Clearer Identity through Correlation
    Joe Andrieu (Editor), Kevin Gannon, Igor Kruiper, Ajit Tripathi & Gary Zimmerman
    Web of Trust II: ID2020 Design Workshop
    2020 ↩︎

  14. Local first storage includes devices and services that opt to store user data on said user's local device, rather than a remote or centralised location, such as biometric data stored in Apple's Secure Enclave or a user's Steam game library. See also: Local-first software: You own your data, in spite of the cloud
    Martin Kleppmann, Adam Wiggins, Peter van Hardenberg & Mark McGranaghan
    Ink & Switch
    2019 ↩︎

  15. For example, an IT department responsible for maintaining the user identities and data of a company. ↩︎

  16. See for example the National Institute of Standards and Technology's approach to identity and authentication. ↩︎

  17. For Michel Foucault, governmentality refers to the rationality of the act of governing, a practice he locates during the long birth of the liberal Nation State. ↩︎

  18. Challenges of Identity Management Systems and Mechanisms: A Review of Mobile Identity
    Raphael Bandaa & Jackson Phiri
    ICTSZ International Conference
    2018 ↩︎

  19. What's in a Name: Facebook's Real Name Policy and User Privacy
    Shun-Ling Chen, Kansas Journal of Law & Public Policy
    2018 ↩︎

  20. The simulacrum is, for Jean Baudrillard, the end result of the process by which the sign discards any relationship with what it is supposed to represent or signify: a copy of a copy of reality, with no original, where signs simply call upon other signs.
    Simulacres et Simulation
    Jean Baudrillard
    Éditions Galilée
    1981 ↩︎

  21. Yuk Hui : « Produire des technologies alternatives »
    Interview by Michael Crevoisier
    9 July 2020 ↩︎

  22. The Para Real: A manifesto
    10 December 2022 ↩︎