Authors:
(1) Simon Kafader, University of Bern, Bern, Switzerland ([email protected]);
(2) Mohammad Ghafari, University of Auckland, Auckland, New Zealand ([email protected]).
We could not conduct an on-site experiment due to the COVID-19 pandemic. The participants themselves had to record the time they spent on each task, so the reported times may deviate from the actual time spent. We mitigated this issue by explaining how to measure the time: each participant had to read a task, comprehend the basic concepts, and only then start the task.
The experience of participants and their familiarity with cryptography could impact the results. We tried to mitigate this threat by recruiting participants with different levels of experience and expertise in this domain, and by briefing them on the basics of cryptography.
Participants self-reported their cryptography knowledge and rated the difficulty of each task based on their personal perception of their experience. Each subject may have perceived his or her experience differently, and such perceptions are not necessarily accurate. Likewise, a perceived “level of knowledge” in cryptography may not precisely reflect how developers perform in practice. We mitigated this threat by defining what each level means.
Every participant used both tools, i.e., the Node.js API and FluentCrypto, for each task. We did not specify which tool to use first, but the tool used second may be subject to learning bias. A study with two groups of participants, each exclusively using one of these tools, could therefore mitigate this threat. Nevertheless, this bias has no effect on the security of a provided solution.
We assessed the security of each solution manually, which may be prone to observer expectancy and subjectivity. We mitigated this threat by reviewing each solution against a checklist of risks provided by a team of external security experts.
There is a threat to external validity due to the small number of subjects in this study. We reduced this threat by recruiting only participants who were employed in the software industry. Nevertheless, future studies with a more representative number of participants are necessary. Moreover, we cannot rule out the possibility that the results would have differed had the tasks been more complex and set in a real-world software development context.