27 Years and 81 Million Opportunities Later:
Investigating the Use of Email Encryption for an
Entire University
Christian Stransky∗, Oliver Wiese†, Volker Roth†, Yasemin Acar‡, and Sascha Fahl∗§
∗ Leibniz University Hannover, stransky@sec.uni-hannover.de
† Freie Universität Berlin, {oliver.wiese, volker.roth}@fu-berlin.de
‡ Max Planck Institute for Security and Privacy, yasemin.acar@mpi-sp.org
§ CISPA Helmholtz-Center for Information Security, sascha.fahl@cispa.de
Abstract—Email is one of the main communication tools and
has seen significant adoption in the past decades. However, emails
are sent in plain text by default and allow attackers easy access.
Users can protect their emails by end-to-end encrypting them
using tools such as S/MIME or PGP.
Although PGP had already been introduced in 1991, it is
a commonly held belief that email encryption is a niche tool
that has not seen widespread adoption to date. Previous user
studies identified ample usability issues with email encryption
such as key management and user interface challenges, which
likely contribute to the limited success of email encryption.
However, so far ground truth based on longitudinal field data
is missing in the literature. Towards filling this gap, we measure
the use of email encryption based on 27 years of data for 37,089
users at a large university. While attending to ethical and data
privacy concerns, we were able to analyze the use of S/MIME
and PGP in 81,612,595 emails.
We found that only 5.46% of all users ever used S/MIME or
PGP. This led to 0.06% encrypted and 2.8% signed emails. Users
were more likely to use S/MIME than PGP by a factor of six.
We saw that using multiple email clients had a negative impact
on signing as well as encrypting emails and that only 3.36% of
all emails between S/MIME users who had previously exchanged
certificates were encrypted on average.
Our results imply that the adoption of email encryption is
indeed very low and that key management challenges negatively
impact even users who have set up S/MIME or PGP previously.
I. INTRODUCTION
Email is one of the major online communication tools. As
of February 2021, there are more than 4 billion email users
worldwide sending and receiving over 300 billion emails per
day [41]. While email is used for all kinds of information including the most sensitive kinds such as trade secrets, account
credentials, and health data, regular email is not encrypted and
allows network attackers and service providers unauthorized
access. This is not for a lack of tools. Both S/MIME [14]
and PGP [46] were introduced almost 30 years ago with the
goal to provide end-to-end encryption for email. However,
in contrast to modern messaging tools such as Signal [37]
or WhatsApp [43] that implement end-to-end encryption by
default, S/MIME and PGP require a complex manual setup
by users. Consequently, previous work has shown that using
email encryption correctly and securely is challenging for
many users [20], [32], [34]–[36], [45]. They struggle with
setting up and configuring encryption keys, distributing them,
managing keys on multiple devices, and revoking them. These
findings, already anticipated by Davis [12], are corroborated
by public reports of failed PGP use. For example, it took
Edward Snowden and the journalist Glenn Greenwald a few
months and serious effort to set up PGP for email in order to
communicate securely [28]. Hence, it is commonly believed
in the security community that end-to-end encrypted email
is not widely used, mostly because of the usability and
awareness issues identified in a multitude of user studies in
the past 22 years (cf. [11], [20]–[22], [30], [34], [36]). To the
best of our knowledge, our work is the first scientific collection
and evaluation of ground truth on the adoption of end-to-end
email encryption. Our work is mainly motivated as follows:
Ground Truth. We aim to confirm the security community’s
anecdotal knowledge about the low adoption of end-to-end
email encryption and provide ground truth based on field data.
Our longitudinal field data can help motivate future work to
improve the adoption of end-to-end encryption for email.
Method Extension. We extend the toolbox of the past 22
years of email encryption research that was initiated with the
seminal paper “Why Johnny Can’t Encrypt” [45] at USENIX
Security’99 that is mostly based on laboratory experiments
and self-reporting studies: In this work, we investigate a large
dataset including millions of data points of thousands of users
and years of their email data.
Validate Results from Previous Work and Investigate Underexplored Challenges. We confirm findings from previous
work (e.g. [1], [26], [27], [30]) obtained by other methods
including smaller-scale interviews, surveys, and controlled experiments. Additionally, we also investigate further challenges
that require large scale field data.
Motivated by the above, we make the following contributions in the course of this work:
Data Collection Pipeline. In collaboration with our data
protection officer, university staff council, and the technical
staff of the university IT department, we developed and tested
a reproducible and privacy friendly data collection pipeline
that allows analyzing large amounts of email data with a
focus on S/MIME and PGP usage (cf. Section IV-A). The
data collection pipeline is part of our replication package.
We aim to encourage other institutions to investigate their
adoption of S/MIME and PGP to contribute to an even better
understanding of the email encryption ecosystem.
Adoption of Email Encryption at a Large University.
We provide a detailed evaluation of the adoption of email
encryption at our university in the past 27 years. In our
evaluation, we focus on the use of S/MIME and PGP for
37,089 total email accounts. Our investigation of 81,612,595
emails found that 2.8% of them were digitally signed and
0.06% were encrypted. We found that only 5.46% of our users
ever used S/MIME or PGP and that S/MIME was more widely
used than PGP. However, PGP was the more popular email
encryption tool amongst researchers.
Use of S/MIME and PGP. We provide a detailed overview of
S/MIME certificates and PGP keys in our dataset and find that
RSA is the most widely used key algorithm, employing 2048
bits keys most often for S/MIME. PGP keys used 4096 bits
most often, although newer PGP keys used less secure 2048
bits. We find that more than one third of all PGP keys did not
have an expiration date set, making revocation unnecessarily
complicated, and that Deutsche Telekom was the root CA for
64.95% of all S/MIME certificates.
User Interaction Challenges including Key Management.
We report on an investigation of user interaction challenges
that previous work identified in user studies. Most interestingly, we focus on key management issues during key
distribution, multi device use, and key rollover. We find that
even after exchanging public keys, only 3.36% of all emails
between S/MIME users were encrypted on average. The use
of multiple email clients had a negative impact on the amount
of signed and encrypted emails. While most S/MIME and
PGP users renewed their keys in time, 11.5% of S/MIME key
rollovers occurred after the keys’ expiration.
Overall, our results confirm the pessimistic assessment of
the security community: Although our university provides all
researchers, staff, and students with free access to S/MIME
certificates, only very few make use of them and only a negligible amount of emails was encrypted or signed. Our findings
also support results from previous user studies and illustrate
additional challenges. Management of email encryption keys
is hard and distributing keys, using multiple email clients, or
having to renew keys complicates matters.
The rest of the paper1 is organized as follows: In Section II
we provide information on S/MIME, PGP, and our university’s
S/MIME certificate authority. We provide an overview of
related work and contextualize our contributions in Section III.
In Section IV, we describe our methodology by providing
details for our data collection pipeline, discussing ethical and
data privacy implications of our work, illustrating limitations,
and summarizing the replication package. In Section V, we
provide detailed results of our evaluation, discuss their implications in Section VI, and conclude the paper in Section VII.
1 Find our companion website at: https://publications.teamusec.de/2022oakland-email/
II. BACKGROUND
In this section, we provide background information on
OpenPGP, S/MIME, and the email ecosystem of our university
including its S/MIME certificate authority.
A. OpenPGP
OpenPGP2 is an encryption standard (cf. [10], [15]) which
is used for email encryption and digital signatures. PGP is
an open source project and was first standardized in 1996.
In the first standardization, PGP messages were added to the
text body (named: PGP Inline) of an email. Later versions
introduced a separate MIME type for PGP messages (named:
PGP MIME). Over time, new algorithms have been added,
including the Camellia and ECDSA cryptography algorithms.
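The difference between the two formats can be made concrete with a small classifier. The following is a minimal sketch using Python's standard `email` package; the function name `pgp_format` and the returned labels are illustrative assumptions and are not taken from the paper's pipeline. It assumes that PGP Inline can be recognized by the ASCII-armor header lines in a text body and that PGP MIME uses the `multipart/encrypted` and `multipart/signed` content types with the matching `protocol` parameter.

```python
from email import message_from_string

ARMOR_ENCRYPTED = b"-----BEGIN PGP MESSAGE-----"
ARMOR_SIGNED = b"-----BEGIN PGP SIGNED MESSAGE-----"


def pgp_format(raw_email: str) -> set:
    """Return labels describing how PGP is embedded in an email (hypothetical helper)."""
    message = message_from_string(raw_email)
    labels = set()
    for part in message.walk():
        content_type = part.get_content_type()
        protocol = part.get_param("protocol")
        protocol = protocol.lower() if isinstance(protocol, str) else ""
        if content_type == "multipart/encrypted" and protocol == "application/pgp-encrypted":
            labels.add("mime-encrypted")
        elif content_type == "multipart/signed" and protocol == "application/pgp-signature":
            labels.add("mime-signed")
        elif content_type == "text/plain":
            # PGP Inline places the ASCII-armored block directly in the text body.
            body = part.get_payload(decode=True) or b""
            if ARMOR_ENCRYPTED in body:
                labels.add("inline-encrypted")
            if ARMOR_SIGNED in body:
                labels.add("inline-signed")
    return labels
```

A message without any PGP content yields an empty set, so the same function can serve as a coarse filter before any further key or signature parsing.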
PGP supports the use of key servers for public key exchange. Users can search these servers for keys for given
email addresses. However, keys may also be exchanged by
attaching public keys to emails. Additionally, several email
clients, like K9 on Android or Thunderbird using the Enigmail
plugin, support hidden key exchange by adding public keys to
email headers. This feature has been standardized and further
developed by the open source project Autocrypt3 since 2016.
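To illustrate what such a header-based key exchange looks like on the receiving side, the sketch below parses the value of an `Autocrypt` header. It assumes the attribute layout of the Autocrypt Level 1 specification (semicolon-separated `addr`, optional `prefer-encrypt`, and base64-encoded `keydata`); the function name `parse_autocrypt` is hypothetical, and the example does not validate the key material itself.

```python
import base64


def parse_autocrypt(header_value: str) -> dict:
    """Split an Autocrypt header value into its attributes (illustrative sketch)."""
    attributes = {}
    for item in header_value.split(";"):
        item = item.strip()
        if not item:
            continue
        # Split on the first "=" only: base64 padding may contain further "=".
        name, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"malformed attribute: {item!r}")
        attributes[name.strip()] = value.strip()
    if "addr" not in attributes or "keydata" not in attributes:
        raise ValueError("addr and keydata are required")
    # Folded headers may contain whitespace inside the base64 data.
    compact = "".join(attributes["keydata"].split())
    attributes["keydata"] = base64.b64decode(compact)
    return attributes
```

The decoded `keydata` bytes would then be handed to an OpenPGP library for actual key parsing, which is outside the scope of this sketch.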
In contrast to centralised trust infrastructures known from
the web PKI or S/MIME, PGP relies on the Web of Trust
to verify identities. In the web of trust approach, users sign
each other’s key when meeting other PGP users in person.
Therefore, users can trust a new key if another trusted key
previously signed the new key, relying on a decentralized trust
chain.
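The idea of a decentralized trust chain can be sketched as a graph search. The example below is a deliberate simplification: it treats every signature as fully trusted and only limits the chain length, whereas real PGP implementations distinguish several trust levels. The function `is_trusted`, the `signatures` mapping, and the `max_depth` parameter are assumptions made for illustration.

```python
from collections import deque


def is_trusted(key, trusted_roots, signatures, max_depth=3):
    """Return True if `key` is reachable from a trusted key via signatures.

    `signatures` maps a signer's key ID to the set of key IDs it has signed.
    """
    frontier = deque((root, 0) for root in trusted_roots)
    seen = set(trusted_roots)
    while frontier:
        current, depth = frontier.popleft()
        if current == key:
            return True
        if depth == max_depth:
            continue  # do not follow chains longer than max_depth
        for signed in signatures.get(current, ()):
            if signed not in seen:
                seen.add(signed)
                frontier.append((signed, depth + 1))
    return False
```

With `signatures = {"alice": {"bob"}, "bob": {"carol"}}` and `alice` as the only trusted key, `carol` is accepted through the chain alice, bob, carol, while an unsigned key is rejected.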
B. S/MIME
Secure/Multipurpose Internet Mail Extensions (S/MIME) is
a standard to encrypt and sign emails. It was first introduced in
1998 (RFC2311 [15]) and has constantly been improved since
then. S/MIME utilizes a Public Key Infrastructure to verify
certificates and as such has mostly been used in corporate
environments, where a certificate authority (CA) is deployed or
third party CAs are utilized to issue certificates to employees.
It has been widely supported out-of-the box without the need
for third party plugins in commercially used email clients like
Outlook 98 and higher or Thunderbird.
C. Email Ecosystem at our University
Email at our university is a centralized service. The university’s computing center provides email accounts for all
administrative staff members, for all students as well as faculties, departments, and research groups. Overall, our university
offers 90 different study subjects reaching from engineering
to humanities, and has about 30,000 students and 5,000
staff members. Faculties, departments and research groups are
organized in decentralized units, i.e. each faculty, and most
departments and research groups have their own subdomain
(e.g., sec.uni-hannover.de for the information security research
group) for their email. Users can access their email accounts
either through a web interface or dedicated email clients using
the university’s POP3, or IMAP and SMTP servers.
2 Abbreviated as PGP in the paper
3 cf. https://autocrypt.org
D. University Certificate Authority / Registration Authority
Our university is part of the public key infrastructure of the
communications network for science and research in Germany
(DFN).4 The university’s computing center provides a registration authority for the DFN CA to issue certificates for email
end-to-end encryption and signing, server authentication for
TLS, and document signing for its scientific and administrative
staff and all students (DFN-PKI).5 Certificate signing and
revocation is processed through the DFN CA.
Certificate Policies. All university employees and students
are eligible to obtain S/MIME certificates. However, certificate
use is neither officially endorsed, nor are issuances automatically triggered. Individual work groups may informally
encourage certificate use. While the CA also provides server
certificates (e.g., for TLS), our work focuses on user certificates for email encryption/signing.
User Certificates. To apply for a certificate, eligible staff and
students can apply online, receive a certificate signing request,
make an appointment with the registration authority, show
up in person, present a proof of identity, and then receive
a valid certificate. This process is comparably complicated. In
contrast, creation of a student ID that can be used to access
free transport and student discounts, and is used as proof of
identity in exams, does not require in-person interaction. The
process is also not embedded in any other existing onboarding
process at the university. New certificates for the same user can
be issued without another identification if the last identification
is not older than 39 months.
Certificate Signing Request Process. Users can generate a
certificate signing request (CSR) for email certificates using
a web application by entering their personal details and a revocation PIN. The web application generates a CSR, saves it
in the browser certificate store, and asks the user to enter a
passphrase which protects the CSR’s private key.
Certificate Revocation Process. The CA provides a web
interface for certificate revocation. To revoke certificates, users
have to enter their certificate’s serial number and a revocation
PIN.
Certificate Expiration. Once 30 days and a second time 15
days before the user certificate expires, the DFN sends users
an expiration warning via email and encourages users to renew
the certificate.
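The warning schedule described above is simple date arithmetic. A minimal sketch, assuming the warnings are computed from the certificate's expiration date; the names `WARNING_OFFSETS_DAYS` and `expiration_warning_dates` are hypothetical and do not describe the DFN's actual implementation:

```python
from datetime import date, timedelta

# Offsets taken from the text: 30 and 15 days before expiration.
WARNING_OFFSETS_DAYS = (30, 15)


def expiration_warning_dates(not_after: date) -> list:
    """Return the dates on which expiration warnings would be sent."""
    return [not_after - timedelta(days=offset) for offset in WARNING_OFFSETS_DAYS]
```

For a certificate expiring on 31 July 2021, this yields 1 July 2021 and 16 July 2021.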
History and Milestones. Our university began to issue
S/MIME certificates in 2004. The first root certificate used to
sign the original CA (G1) through the DFN expired in 2019.
Starting from 2017, certificates were issued using a new root
CA (G2). Additionally, 479 certificates issued using the G1
CA had to be replaced with new certificates issued by the
G2 root CA. Figure 10 in Appendix C illustrates the issued
certificates per year. In the first years, the amount of server
and user certificates is roughly equal. However, the number of
S/MIME certificates is only slowly increasing compared to the
number of server certificates. We can see a drop in requested
certificates in 2020, probably due to the COVID-19 pandemic.
III. RELATED WORK
We discuss related work in two key areas and put our work
into context: User studies for end-to-end encryption with end
users and email field studies.
A. End-to-End Encryption Studies for End Users
The usability of end-to-end encryption has been a research
focus for decades. With their seminal work, Whitten and
Tygar [45] set this line of research off in 1999 when they
evaluated the usability of PGP and identified challenges for
end-users in a lab study. In their qualitative user study,
participants had trouble with key management, specifically key
exchange with other participants. One participant forgot the
password for their key pair and thus had to generate a new
one. Another participant was unable to encrypt an email and
others were unable to decrypt emails. Following Whitten and
Tygar, Garfinkel and Miller [20] and Roth [29] hypothesized
that some of the challenges could be overcome by simplifying
the key verification process. Their solutions were based on
omitting the verification of keys by third parties and instead
using a trust on first use (TOFU) approach. Garfinkel and
Miller tested their approach on lay users and found that color-coding messages depending on their signature status makes
users significantly less susceptible to social engineering attacks
overall. Garfinkel et al. surveyed 470 merchants who received
digitally-signed VAT invoices from Amazon and found that
merchants should send signed emails by default as the passive
exposure seems to increase acceptance and trust [19], [21].
Ruoti et al. studied different self-designed and publicly available encryption tools [31]–[33], [35] to improve the usability
of email encryption in their laboratory. Their research interests
were key management, key distribution, and automatically vs.
manually enabled encryption. In several laboratory studies by
Ruoti et al. [31], [32], [35], participants rated at least one tool
as usable and indicated interest in secure email, but did not
know when or how they wanted to use it. In another recent
journal article, Ruoti et al. came to the conclusion that secure
and usable systems so far had only been tested in short-term
studies and future research should investigate long term usability and adoptability of secure email systems [30]. Atwater
et al. and Lerner et al. proposed clients for PGP similar to
Keybase6 to study how to simplify key distribution [5], [25].
4 https://www.dfn.de/en/
5 https://www.pki.dfn.de/ueberblick-dfn-pki/ (german)
6 https://keybase.io/
They proposed to upload users’ public keys to a website and
confirm ownership such that they can retrieve emails from
the corresponding email address. In a lab study, Atwater et
al. found that such a key distribution mechanism enabled
participants to send more encrypted emails and had improved
usability for (webmail) users. Lerner et al. compared their tool
called Confidante with Mailvelope7 and showed that their tool
reduces the error potential [25]. In a lab study, all participants
in the Mailvelope group struggled to import public keys or to
share their own public key. In the Confidante group, three out
of nine participants struggled to import public keys. However,
all of them managed to share their public keys successfully.
Bai et al. proposed encryption prototypes to study user flows
of different key management approaches [8], [9]. In an interview study, participants preferred to register their public
keys on a webpage and automatically retrieve public keys of
communication partners from the webpage over manual key
management. Fahl et al. examined different usability aspects
for Facebook message encryption mechanisms and found that
automatic key management and key recovery capabilities are
important for adoption [16]. McGregor et al. reported that
cooperating journalists used PGP to encrypt their emails when
investigating the Panama Papers [27]. However, journalists
also identified obstacles when using encryption with multiple
devices. Consequently, they used secure messengers such as
Signal instead of PGP on their smartphones. Gaw et al.
interviewed nine employees from the same company and found
that users flagged encrypted mails as urgent and found those
to be annoying when used for all messages [22]. They argued
that understanding of social factors is important for adoption.
In a combined lab and field study by Mauriés et al., participants
struggled with Enigmail for Thunderbird and the Mailvelope
browser plugin [26]. Enigmail users needed help to set up the
tool on their computer. The setup process in Mailvelope was
unclear. One participant struggled to import a public key and
send an encrypted email.
In addition to email, mobile messaging apps including
Signal, Threema and WhatsApp, made end-to-end encryption
available for the masses. WhatsApp, for example, introduced
end-to-end encrypted messages for all its users by default in
2016 [43]. In an interview study, Abu-Salma et al. identified
blockers and barriers for the adoption of end-to-end encryption
including incompatible tooling and misconceptions of end-to-end encryption features [2]. They argue that usability is not
the primary obstacle and that fragmented userbases or a lack
of multi-device support significantly contribute to the non-adoption of end-to-end encryption. In a different study, Abu-Salma et al. further explored users’ mental models and found
misconceptions about security properties of messengers [1].
They argue that adoption is no longer the main challenge
for end-to-end encryption tools but that people instead switch
to non secure communication tools and need assistance in
choosing the right one for sensitive information. Stransky et
al. confirmed these findings in an online study with WhatsApp
users. They found that security perceptions of end-to-end
encryption in mobile messaging apps heavily depended on
the reputation and expectations of an app provider, while
visualizing encryption has only limited impact on perceived
security [38]. Similarly, Akgul et al. found that participants
noticed educational messages and that they improved understanding of security concepts when they are used in isolation.
However, when those messages were implemented in a realistic environment, they could not find significant improvements
in the mental models of end-to-end encryption of users [3].
7 Mailvelope is a browser plugin for PGP: https://www.mailvelope.com/
Overall, previous work primarily focused on identifying
blockers and barriers to adopting end-to-end encryption for
email or mobile messaging apps and studied alternatives to
or extensions of existing approaches using laboratory and
interview studies. In contrast, we aim to extend the toolbox
of end-to-end encryption research and provide ground truth
based on longitudinal field data by evaluating a large dataset
including millions of data points of thousands of users and
years of their email data.
B. Email Field Studies
In addition to the user studies discussed above, researchers
performed multiple field studies on the use of email. In 1996,
Whittaker and Sidner analyzed the mailboxes of 18 NotesMail
users containing 2,482 emails. They postulated email overload
and studied how users handle a mass of emails [44]. In 2006,
Fisher et al. revisited this analysis with a sample of 600
mailboxes containing 28,660 emails [17]. They analyzed users’
email sorting strategies, especially with respect to dealing with
the increased volume of emails, and postulated that large folders would make email retrieval hard. Alrashed et al. studied a
sample of anonymized email logs from Outlook Web Access
over a four month period, containing about 800 million actions
[4]. They aimed to understand how users handle incoming
emails. They find that most emails have a short lifetime and
that deleting email is the most common action on messages
users interacted with once. Avigdor-Elgrabli et al. analyzed
a sample of donated mailboxes of a major email service
provider, containing about 5 million emails [7]. They used
machine learning techniques to identify relationships between
emails. Roth et al. performed a study with anonymized mailboxes from 17 voluntary users that contained approximately
139,000 mails to investigate which security mechanisms would
be most appropriate for their communication patterns [29].
They argue that for individual non-commercial users, out-of-band verification of keys would be more feasible than relying
on public key certificates issued by third parties.
In addition to the above field studies that focused on email
usability more broadly, researchers investigated the adoption
of security protocols for email. Foster et al. scanned the
Simple Mail Transfer Protocol (SMTP) configurations of about
300,000 major email providers and email generators in March
2014 and February 2015, and investigated the behavior of
known email providers [18]. They found that TLS is widely
used and discovered a dramatically low adoption of effective
TLS certificate validation. Durumeric et al. examined log data
for SMTP handshakes of Google’s Gmail service from January
2014 to April 2015 and compared it with a snapshot of SMTP
configs of Alexa top million domains as of April 2015 [13].
The authors examined the distribution and use of TLS and
other server-side security mechanisms. They found that the top
mail providers proactively encrypted and authenticated messages. However, these practices had yet to reach widespread
adoption in a long tail of over 700,000 SMTP servers with
less secure configurations. Ulrich et al. evaluated the PGP key
database (SKS-Keyserver) in December 2009, examining 2.7
million keys of which 400,000 were expired and 100,000 were
revoked [42].
Overall, the above field studies investigate the adoption and
usability of email in a broader context, measure email servers’
security configurations, and conduct small-scale security analyses of email encryption. We extend previous field studies by
focusing on adopting email encryption using a longitudinal
large field dataset.
IV. ANALYZING 27 YEARS OF EMAIL DATA
Below, we provide detailed information on the data collection and analysis process in our work.
We performed our longitudinal analysis of a large email
dataset in coordination with the technical staff of the university’s IT department, the data protection officer, and the university’s staff council (see Section IV-C for more details). We implemented a data collection pipeline to collect pseudonymized8
metadata for all email accounts, including the use of S/MIME
and PGP (cf. Figure 1) at our institution in the last 27 years. At
no point did we collect email subject or body information to
avoid the disclosure of personally identifiable information (PII)
to the researchers. We also ensured that metadata included
neither email account names nor the departments’ names or
subdomains. We aimed to keep the number of processing
errors low and consistently tested the pipeline with our own
mailboxes until no further processing errors occurred.
The IT department’s technical staff reviewed the pipeline for
functionality and data protection aspects, and then executed it
on the university’s standby backup email server. The backup
server is a hot-standby of the primary mail server, and automatically takes over in case of a failure. Data between both
servers is constantly synchronized and as a result, the backup
is identical to the live data. The backup server retains all email
data, dating back to early 1994.
Figure 1 provides an overview of the nine-step processing
pipeline: We initially started with a local testing environment
on a small sample emailbox created specifically for our
study (1); The technical staff reviewed the initial pipeline
and iteratively tested it against the full set of emailboxes of
the researchers and their own emailboxes (2); We exported
the emailboxes to JSON-formatted files (3); We parsed and
pseudonymized all emails (4); We performed assertion checks
on every email to ensure that neither the email address nor
the domain was present in any result fields to account for
email clients writing private data to unexpected places (5);
In the case of a succeeding assertion check, we stored the
resulting email metadata for further analysis on a secure server
in the university’s computing center (6a); In the event of a
failure, we dropped all email metadata to avoid the leakage of
private information (6b); The IT department’s technical staff
transferred pseudonymized results to the authors’ secure cloud
storage (7); We analyzed the pseudonymized results (8).
8 The pseudonymization process is described in Sections IV-A and IV-C.
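Steps (5), (6a), and (6b) amount to a guard that either stores a metadata record or discards it entirely. The following is a minimal sketch of such a guard, assuming each email's metadata is available as a nested dictionary; the names `contains_identifier` and `store_or_drop` are hypothetical, and the checks in the actual replication package may differ.

```python
def contains_identifier(value, forbidden):
    """Recursively check whether any forbidden string occurs in a metadata value."""
    if isinstance(value, str):
        lowered = value.lower()
        return any(item in lowered for item in forbidden)
    if isinstance(value, dict):
        return any(
            contains_identifier(key, forbidden) or contains_identifier(item, forbidden)
            for key, item in value.items()
        )
    if isinstance(value, (list, tuple, set)):
        return any(contains_identifier(item, forbidden) for item in value)
    return False


def store_or_drop(record, email_address, domain, store):
    """Append the record to `store` only if it leaks neither address nor domain."""
    forbidden = [text.lower() for text in (email_address, domain) if text]
    if contains_identifier(record, forbidden):
        return False  # corresponds to step (6b): drop the whole record
    store.append(record)  # corresponds to step (6a): keep for analysis
    return True
```

Dropping the whole record on a failed check trades a small loss of data for the guarantee that a partially leaked field is never written out.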
A. Data Collection Pipeline
The university uses Dovecot9 for their email servers. Dovecot offers an export feature to extract all emails as a JSON-formatted file. We implemented our processing pipeline using
a large JSON file per mailbox containing all exported emails as
input. For parallel processing, we used the GNU Parallel [40]
tool. On behalf of the researchers, the university’s IT staff
executed the pipeline on the backup email server to make
sure raw emails were not exposed to the researchers. Below,
we describe the metadata we collected. In the cases where
we applied pseudonymization, we provide a description of the
pseudonymization procedure. Table IV in Appendix B gives
an overview of both general and S/MIME and PGP specific
information.
General Information. For each email we collected the local
user account, message ID, the sender, and the list of receiver
email addresses. If present, we also collected the lists of carbon
copy and blind carbon copy addresses for outgoing emails.
For pseudonymization, we hashed all of these values using
a secret salt10 and the SHA-256 hash function. We grouped
email users into: Student, Staff, Faculty, NX Unknown11 and
External. For data protection reasons, we did not collect the
exact send and receive dates as well as times of an email but
only the corresponding week. If set, we collected the raw user
agent string to identify email client software, operating system, and compute platform if possible. We grouped mailbox
folders into: Inbox, Subfolder of Inbox, Outbox, Subfolder
of Outbox, Junk, Trash, and Spam. For further cryptographic
metadata analysis, we stored whether an email was signed
and encrypted or contained Autocrypt headers. Below, we use
the term cryptographic emails for all emails that contained
cryptographic metadata.
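The pseudonymization and date coarsening described above can be sketched in a few lines. The text specifies a secret salt and SHA-256 but not how the two are combined, so the sketch assumes the salt is simply prepended to the value; lowercasing the address before hashing and the use of ISO week numbers are further assumptions, and the names `pseudonymize` and `week_bucket` are hypothetical.

```python
import hashlib
from datetime import date


def pseudonymize(value: str, salt: bytes) -> str:
    """Return a salted SHA-256 digest of an identifier (assumed construction)."""
    normalized = value.strip().lower().encode("utf-8")
    return hashlib.sha256(salt + normalized).hexdigest()


def week_bucket(day: date) -> str:
    """Reduce a date to its ISO year and week, discarding day and time."""
    iso = day.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"
```

Because the same salt maps the same address to the same digest, communication patterns between pseudonymous accounts remain analyzable, while anyone without the salt cannot confirm a guessed address by hashing it.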
Cryptographic Metadata. For all cryptographic emails, we
collected attached cryptographic metadata for S/MIME, PGP
and Autocrypt. Table IV gives an overview of the collected
information and storage formats including pseudonymization.
For S/MIME, we collected the pseudonymized serial number of the leaf certificate, validity start and end date (granularity by week), the signing hash algorithm, the key size
and key type (e.g., 2048 and RSA). We collected the key
usage and extended key usage options (e.g., email signing,
certificate signing, code signing), and the number of valid
email addresses for each certificate. Finally, we collected the
complete certificate chain, including all metadata for all signing certificates. We did not pseudonymize the serial numbers
for non-leaf certificates.
9 https://www.dovecot.org/
10 The secret salt was only accessible to the IT staff and not to the
researchers
11 Used if the email subdomain did not exist anymore, and the original
purpose was unclear when we performed our experiments.
[Figure 1: pipeline diagram with two lanes (Researchers; University IT Services) and numbered steps 1 to 8, including 6a and 6b.]
Fig. 1. Illustration of our data collection and evaluation pipeline.
For PGP, we collected the key type (e.g., public key or
a sub-key), the signature algorithm, and key length. For
elliptic curve keys, we collected curve information as well
as pseudonymized key IDs, and creation and expiration dates.
For extended PGP keys, we collected update dates. If a key
included subkeys we also stored the above data for the sub
keys.
The pipeline dropped all data (like email subject, email
content, non-key attachments) that we excluded from our
analysis.
B. Data Cleaning
We included all 81,647,559 emails and 37,463 email user
accounts at our university from January 1994 to July 2021 in
the initial analysis (cf. Section V-A). However, we excluded
some emails and email user accounts based on the following
procedure:
Processing Errors. Parsing a dataset that spans millions of
emails from over two decades that were generated and sent by
many different email clients and versions poses a significant
challenge. Hence, we were not able to successfully parse
all emails. Our parser failed to parse 0.09% of the emails in the
dataset. Due to privacy restrictions, we were not allowed to
further investigate root causes of parser failures. However,
Appendix A provides more details on S/MIME and PGP
related parsing errors.
Inactive Email Accounts. Our initial data set included 37,463
total user accounts. However, we identified 18,302 inactive
email accounts for which we did not find any sent emails.
Of them, 17,928 email accounts received but did not send
emails and 374 neither received nor sent emails. Many
students prefer to use their private email accounts instead of
their automatically created university email accounts, leaving
them inactive.
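The inactive-account breakdown above can be expressed as a simple classification over per-account counts. The sketch below is illustrative only: the function names, the category labels, and the toy data are assumptions, and only the three reported figures come from the text.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_account(sent: int, received: int) -> str:
    """Label an account by whether it sent and/or received any email."""
    if sent > 0:
        return "active"
    if received > 0:
        return "receive_only"   # inactive: received but never sent
    return "empty"              # inactive: neither sent nor received


def summarize(accounts: Dict[str, Tuple[int, int]]) -> Counter:
    """Count accounts per category; values are (sent, received) pairs."""
    return Counter(classify_account(s, r) for s, r in accounts.values())


# Toy data with made-up pseudonyms and counts.
toy_accounts = {"u1": (12, 40), "u2": (0, 7), "u3": (0, 0), "u4": (3, 0)}
toy_summary = summarize(toy_accounts)

# Consistency check on the figures reported in the text.
REPORTED_RECEIVE_ONLY = 17_928
REPORTED_EMPTY = 374
REPORTED_INACTIVE = 18_302
```

The two reported inactive subgroups add up to the reported inactive total (17,928 + 374 = 18,302).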
Invalid Dates. We excluded 307,680 emails (0.38% of our dataset)
with obviously forged header dates, e.g., dates after the end of data collection in July 2021, and emails for which the date
parser failed.
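A date filter of this kind might look like the sketch below. It assumes the `Date` header is parsed with Python's standard `email.utils` module and that the plausible window runs from January 1994 to the end of July 2021; the lower bound, the function name, and the constants are assumptions for illustration, not the authors' implementation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumed bounds, derived from the collection period stated in the text.
WINDOW_START = datetime(1994, 1, 1, tzinfo=timezone.utc)
WINDOW_END = datetime(2021, 8, 1, tzinfo=timezone.utc)  # exclusive


def has_plausible_date(date_header: Optional[str]) -> bool:
    """Return True if the Date header parses and lies inside the window."""
    if not date_header:
        return False
    try:
        parsed = parsedate_to_datetime(date_header)
    except (TypeError, ValueError, OverflowError):
        return False  # the date parser failed
    if parsed.tzinfo is None:
        # Some headers yield naive datetimes; treat them as UTC here.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return WINDOW_START <= parsed < WINDOW_END
```

Emails for which this check returns False would fall into the excluded group, covering both out-of-range dates and parser failures.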
C. Ethical Concerns and Data Privacy
Our institutions, and specifically the institution where
the data was collected and evaluated, did not require formal
ethical review for a large-scale measurement study on email
data of this type. Therefore, we did not
involve an ethics review committee. However, we followed
our institutions’ guidelines for good scientific practice, which
includes ethical guidelines. Here, the institutions specifically
place the burden of determination of whether research is
ethical on the respective researchers. We held intensive discussions
within and outside our research team to identify possible
concerns with this research project and to determine whether this project
would be feasible. We concluded that in addition to following
laws and our institutions’ ethics requirements, we should also
follow the de facto ethics standards of the S&P community.
The data used in this study can be described as pseudonymized
data derived from human subjects, as mentioned in the Call
for Papers.
In addition to ethics, we made sure to address all legal
aspects of our research to adhere to strict German privacy
protection laws and the European General Data Protection
Regulation (GDPR). Therefore, we involved the data protection
officer and the works committee of the institution
where the data was collected and evaluated, as required by
the German data protection regulations. We developed the
data collection plan jointly with the data protection officer,
with the goal to protect users’ privacy and adhere to the
strict German data protection regulations and the regulations
in the state of Lower Saxony. After more than a year of
multiple discussions and hearings, we agreed on the presented
data collection plan (cf. details in Section IV-A). After the
involved authorities had rigorously assessed the legal situation
based on the GDPR, German data protection laws, and state
law of the involved authorities, we were allowed to analyze
pseudonymized metadata of all users at our institution without
requiring user consent. Additionally, our legal counsel decreed
that the benefit of our research to society outweighed the
risks to individuals. We concur with the assessment that
answering our research questions is beneficial for future end-to-end encryption research, which ultimately benefits society,
and that there was no harm done to any participants based
on possibly re-identifiable metadata. Importantly, we will not
publish the metadata we collected and only an encrypted copy
will be stored at the university data center for ten years without
access by researchers to follow good scientific practice. As is
common in research, we only publish aggregate data, and no
email accounts can be re-identified through the publication
of this paper. As part of the joint development of our data
collection plan, we decided to take the following measures
to protect users’ (metadata) privacy and adhere to the GDPR,
German, and state laws:
• The involved researchers never had access to raw data.
The data collection pipeline was executed by the university’s IT staff, who operate the email servers and
have access to the backup data. They transferred the
pseudonymized data to a secure server for the researchers.
• We reduced the amount of data to the absolute minimum
we required to investigate our research questions.
• We used cryptographically secure hash functions with
salts unavailable to the researchers to pseudonymize user
data.
• At all times pseudonymized data (cf. Table IV in the
appendix) was only stored on secured university servers.
• We did not and will not share any data other than the
aggregate numbers in the paper with anyone outside the
team of involved researchers.
• We assured the data protection officer and the works
committee that we would not take any actions to re-identify users.
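The salted pseudonymization measure in the list above can be illustrated as follows. The text only states that cryptographically secure hash functions with salts were used; using HMAC-SHA256 as the keyed construction, the function name `pseudonymize`, and the lower-casing of input are assumptions made for this sketch.

```python
import hashlib
import hmac


def pseudonymize(value: str, secret_salt: bytes) -> str:
    """Map an identifier (e.g. an email address) to a stable pseudonym.

    The salt is assumed to stay with the IT staff, so someone holding
    only the pseudonyms cannot recompute them from guessed addresses.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(secret_salt, normalized, hashlib.sha256).hexdigest()


# Example with a made-up salt and address.
example_salt = b"known-only-to-it-staff"
example_pseudonym = pseudonymize("Alice@Example.org", example_salt)
```

Because the mapping is deterministic for a fixed salt, the same user still appears under one pseudonym across all emails, which keeps per-user analyses possible without exposing addresses.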

The above data protection precautions ensured the necessary
compromise between user privacy and data utility in order to
perform the analyses on which we report in Section V.
D. Limitations
Like every measurement study, our work comes with several
limitations:
Data Set. Our dataset includes email data from January 1994
to July 2021, covering 81,647,559 emails of 37,463 users from
a large German university with more than 30,000 students
and 5,000 employees. However, the dataset might not be
complete. We could not include emails that users deleted from
their accounts; similarly, we did not include emails sent to
deleted accounts. Some research groups and departments have
their own email servers; their emails were also not included
in our dataset. Therefore, the dataset should not be assumed
to include all emails that were sent or received from January
1994 to July 2021 at our university. Additionally, we do not
assume that our dataset is representative of all email data
in Germany or globally. Instead, we think our dataset might
overreport the use of email encryption since most email users
in the dataset are highly educated and our university offers
free S/MIME certificates to all email users.
However, we think our dataset is one of the largest and most
valuable for the type of analysis we perform and our results
are a valuable contribution to the security research community.
Data Analysis. We could not analyze all cryptographic details. For example, we could not verify digital signatures since we were
not allowed to parse the body content of emails, and we could
not extract details for certificates or signatures that were used
in encrypted emails, since this data is also encrypted.
We tested our pipeline thoroughly but still missed some
edge cases that inevitably arise in mail software that evolves
over a long time span. While our pipeline was able to process
99.91% of all emails, processing failed for the remaining
0.09%. Errors during S/MIME and PGP parsing were logged
separately. We encountered 1,199 S/MIME and 23,168 PGP
emails where parsing failed (cf. Appendix A for more details).
We deemed this margin of error tolerable compared to the high
organizational costs of refining and repeating the entire process
once more.
Distinction Between Send and Receive. The dataset we
evaluated did not contain information on whether an email was
sent or received. To still group emails into sent and received
emails, one approach would be to group emails based on the
folder they were stored in. However, this would introduce
challenges such as email clients using different names for sent
folders or users using their own folder names. Therefore, we
decided to identify sent emails based on multiple parameters:
they were not allowed to contain a return_path header,
since it is added by outgoing mail servers, and they were
only allowed to go through at most one mail relay12. Most
incoming emails have more than five email relays.
12 One mail relay header is added when the webmail client of the university
is used for sending mails. Regular mail clients do not add this header.
Mail Client Behaviour. The macOS and iOS clients AppleMail, iPhoneMail and iPadMail generally identify themselves
using the X-Mailer header in mails, but the copy placed in
the sent folder by these clients does not contain this header.
As a result, these clients could only be correctly detected on
received emails; their sent mails are included in the “No User
Agent” group. The ticket system used by the university data
center automatically deletes mails in its inbox and does not
place a copy in the sent folder, and as a result only the answered
tickets are available in our dataset.
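The send/receive heuristic described under Distinction Between Send and Receive can be sketched with Python's standard `email` package. This is a sketch under stated assumptions, not the authors' code: it treats each `Received` header as one mail relay and maps the return_path criterion to the `Return-Path` header; the function names and sample messages are hypothetical.

```python
from email import message_from_string
from email.message import Message

MAX_RELAYS_FOR_SENT = 1  # at most one relay, e.g. the university webmail


def count_relays(msg: Message) -> int:
    """Assumption: every Received header corresponds to one mail relay."""
    return len(msg.get_all("Received", []))


def looks_sent(msg: Message) -> bool:
    """Apply the two criteria from the text: no Return-Path, <= 1 relay."""
    has_return_path = msg["Return-Path"] is not None
    return (not has_return_path) and count_relays(msg) <= MAX_RELAYS_FOR_SENT


# Made-up sample messages for illustration.
sent_copy = message_from_string(
    "From: alice@example.org\nTo: bob@example.org\nSubject: hi\n\nbody\n"
)
incoming = message_from_string(
    "Return-Path: <bob@example.org>\n"
    "Received: from mx1.example.org by mx2.example.org\n"
    "Received: from client.example.org by mx1.example.org\n"
    "From: bob@example.org\nTo: alice@example.org\n\nbody\n"
)
```

Relying on headers rather than folder names avoids the problem of clients and users naming their sent folders differently.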
E. Replication Package
To improve the replicability of our work, we provide a
replication package including the following material: (a) the
complete processing pipeline consisting of multiple Python
scripts to process and pseudonymize emails from Dovecot
mail servers, (b) the analysis scripts to replicate our results
on different datasets, and (c) the agreement with our data protection officer. Due to the sensitive nature of our measurement
study, we cannot make raw email data available. We hope this
replication package helps future studies to better compare and
position themselves relative to our work, and hope others replicate our
work on different email datasets to improve our community’s
understanding of the use of email encryption. The replication
package is available on our website at [39].
V. RESULTS
We provide a detailed analysis of the email corpus we
collected and the adoption of S/MIME and PGP from 1994 to
July 2021 below.
A. Dataset Summary
In total, we analyzed metadata for 81,612,595 emails from
37,089 email accounts. Overall, the university’s email users
exchanged 40,540,140 (49.67%) emails internally.
Figure 2 illustrates the use of email at our university
in the past 27 years. While we found only 350 emails in
1994, we detected an almost exponential growth and found
17,190,472 emails in 2020. This development reflects the
enormous relevance of email as a communication tool and
is in line with reports on the global use of email13.
Use of Email Encryption and Signatures. We found
2,334,042 (2.86%) emails that were either encrypted or signed
using S/MIME or PGP. We identified a huge difference
between the use of email encryption and signatures.
46,973 (0.06%) emails were encrypted. 26,105 (55.57%)
emails were encrypted using S/MIME and 20,868 (44.43%)
of them were encrypted using PGP. In contrast, 2,287,922
(2.8%) emails were signed. 2,040,794 (89.2%) of them were
signed using S/MIME and only 247,128 (10.8%) were signed
using PGP.
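The shares reported above can be recomputed from the stated counts. The short check below assumes that the overall percentages use the 81,612,595 analyzed emails given in the abstract as the denominator; the helper name `share` is introduced here for illustration.

```python
def share(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


TOTAL_EMAILS = 81_612_595       # analyzed emails, as stated in the abstract
INTERNAL = 40_540_140
ENCRYPTED_OR_SIGNED = 2_334_042

ENCRYPTED = 46_973
ENCRYPTED_SMIME = 26_105
ENCRYPTED_PGP = 20_868

SIGNED = 2_287_922
SIGNED_SMIME = 2_040_794
SIGNED_PGP = 247_128

overall = {
    "internal": share(INTERNAL, TOTAL_EMAILS),
    "encrypted_or_signed": share(ENCRYPTED_OR_SIGNED, TOTAL_EMAILS),
    "encrypted": share(ENCRYPTED, TOTAL_EMAILS),
    "signed": share(SIGNED, TOTAL_EMAILS),
}
by_tool = {
    "encrypted_smime": share(ENCRYPTED_SMIME, ENCRYPTED),
    "encrypted_pgp": share(ENCRYPTED_PGP, ENCRYPTED),
    "signed_smime": share(SIGNED_SMIME, SIGNED),
    "signed_pgp": share(SIGNED_PGP, SIGNED),
}
```

The S/MIME and PGP counts add up to the encrypted and signed totals, while the combined figure of 2,334,042 is smaller than the sum of both totals, which is consistent with some emails being both signed and encrypted.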
Figure 2 illustrates the use of S/MIME and PGP between
1994 and 2020. We found the first S/MIME signed email in
1998 and the first S/MIME encrypted email in 1999. The
first PGP signed email appeared in 1994 and the first PGP
encrypted email in 1997.
Key Insights: Dataset.
• We saw an exponential growth of the use of email between 1994
and 2020.
• Only 0.06% of emails were encrypted.
• 2.8% of emails were signed.
• S/MIME was more widely used than PGP.
13 cf. https://www.emailisnotdead.com/
Fig. 2. Rise of email, S/MIME and PGP over time at our university.
B. S/MIME Certificates and PGP Keys
Below, we give an overview of the S/MIME certificates and
PGP keys we found. This includes certificates and keys from
internal and external senders. Overall, we were able to collect
9,765 S/MIME certificates, 3,741 primary PGP keys and 3,840
sub keys (cf. Table II for details).
S/MIME Certificates. All certificates but one, which used an
elliptic curve algorithm, supported the RSA encryption algorithm.
2048 bits was the most widely used RSA key size (91.58%);
5.54% of the RSA keys had 4096 bits. In 237 cases we saw
1024-bit RSA keys; 6 RSA keys had 512 bits. While the
last 512-bit RSA key we found was created in 2010, we saw
two 1024-bit RSA keys created in 2020.
7,472 (76.52%) certificates supported the SHA-256 signature algorithm. However, we also found outdated signature
algorithms including SHA-1 (2,028; 20.77%) or MD5 (148;
1.52%). Surprisingly, we found 11 certificates issued in 2020
using SHA-1. The last certificate using MD5 was generated
in 2017.
5,194 (53.19%) of all certificates expired in 2020 or earlier.
The mean validity period for S/MIME certificates was 3.13
years (sd = 2.70), ranging overall from a minimum of 4.00
weeks to a maximum of 99.99 years. 6,953 (71.20%) certificates were created between 2015 and 2020 with a peak of
1,654 certificates (16.94% of all S/MIME certificates) in 2019.
Overall, we found 671 different issuer names. However,
1,150 (11.78%) certificates had no issuer. The most prominent issuer was the DFN, which issued 3,209 (32.86%) certificates.
622 (6.37%) were issued by our university. Another German
university issued 563 (5.77%) of all S/MIME certificates we
found. In 332 (3.40%) cases, a distinct issuer only signed a
single certificate. 89 (0.91%) issuers signed only two certificates. 137 of them (32.54%) had no root CA. In total, we
                       S/MIME      S/MIME      S/MIME      S/MIME         PGP         PGP         PGP         PGP       Total       Total
                         Sent        Sent    Received    Received        Sent        Sent    Received    Received        Sent    Received
                       Signed   Encrypted      Signed   Encrypted      Signed   Encrypted      Signed   Encrypted
Total                 356,330       9,358   1,684,464      16,747      69,950       8,197     176,983      12,633  16,660,280  64,952,315

Client
Thunderbird           231,475       6,311     508,423       7,103      46,407       5,775      85,904       6,825   8,206,215  11,875,969
Outlook                78,736       1,675     258,013       3,728         325          22       5,528          52   3,102,760   7,785,073
Ticketsystem                0           0     311,743           1           0           0         223           0           4   1,259,208
AppleMail                  ?1          ?1      58,752       2,157          ?1          ?1      20,722       1,408          ?1   2,268,757
Evolution              11,479         190      16,366         341         232          48       1,387          74      26,224      70,482
Mutt/NeoMutt                0          29          50          31         564           1      11,506           1       6,066     141,263
Outlook-Express             7           0       7,355          31           0           0       2,452          16      29,065     447,775
Claws Mail                  0           0         145           0       1,654         207       2,918         161       2,573      21,640
iPhone/iPad-Mail           ?1          ?1       5,275         104          ?1          ?1           7           8          ?1     723,482
MailMate                2,361           8       3,526           3          63           2         131           3       5,296       8,364
Other                   1,270           3      13,397         128       2,804         221       8,658       2,167   2,572,631   9,715,649
No Useragent           31,002       1,142     501,419       3,120      17,901       1,921      37,547       1,918   2,708,758  30,634,653

Operating System2
Windows               167,350       2,183     326,747       2,555      38,751       2,531      66,786       2,589   7,150,676  10,679,520
Linux                  60,827       3,985     164,926       4,379       8,797       3,372      23,069       4,191     799,179   1,457,578
Mac                     5,909         170      88,144       2,397       2,126         172      24,849       1,841     340,599   3,057,579
iOS                         1           0       5,294         104           0           0           7           8         695     726,209
Android                    81           1          66           0          34         101         167         110      84,744     120,115
Webmail                   135           3         831           8          10           2         322           2   2,378,940   2,425,629
Unknown                91,025       1,874     597,037       4,184       2,331          98      24,236       1,974   3,196,689  15,851,032
No Useragent           31,002       1,142     501,419       3,120      17,901       1,921      37,547       1,918   2,708,758  30,634,653

Usergroup
Scientific            237,579       3,992     808,539       8,376      66,483       6,708     115,388       7,796  11,776,198  39,504,464
Staff                 106,434       5,318     684,791       8,155       2,122       1,134      49,111       3,864   3,265,602  17,560,401
NX Internal             9,903          20     121,243         129          77          39       7,153         436     970,160   4,960,051
Student                 1,700          28      67,813          71       1,242         311       4,903         536     393,685   2,703,240
External                  714           0       2,078          16          26           5         428           1      72,256     224,159

Legend (affected emails): >= 5%, > 4%, > 3.5%, > 3%, > 2.5%, > 2%, > 1.5%, > 1%, > 0.5%
