I currently have some code that looks through various datasets and models the electronic relationships between them, e.g., a shared JSESSIONID.
I would like to model each user's interactions with an application where they have to submit a unique identifier, e.g., an email address.
While processing the application's logs, I see emailA@host.com use the application with JSESSIONID asdfghjkl. I then see emailB@host.com also use the application with JSESSIONID asdfghjkl. Finally, I see emailB@host.com use JSESSIONID qwertyuiop.
In my Go code, it's easy for me to process the logs, write out both email addresses as nodes, and then write the JSESSIONID relationship between them:
MERGE (a:EMAIL {label:'userA@host.com'}) MERGE (b:EMAIL {label:'userB@host.com'}) MERGE (a)-[:asdfghjkl]-(b)
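As a rough illustration of that per-pair write (a sketch, not the actual code), the statement could be built in Go roughly as follows. The function name pairStatement and the parameter names $a and $b are hypothetical. The sketch assumes the email addresses are passed as query parameters and that the session ID is interpolated as a backtick-quoted relationship type, since as far as I know relationship types cannot be supplied as ordinary parameters.

```go
package main

import (
	"fmt"
	"strings"
)

// pairStatement (hypothetical helper) builds the Cypher statement for one
// pair of email addresses that shared a session ID. The emails are returned
// as parameters; the session ID becomes the relationship type and is
// backtick-quoted, with any embedded backtick doubled.
func pairStatement(emailA, emailB, sessionID string) (string, map[string]any) {
	relType := "`" + strings.ReplaceAll(sessionID, "`", "``") + "`"
	query := "MERGE (a:EMAIL {label:$a}) " +
		"MERGE (b:EMAIL {label:$b}) " +
		"MERGE (a)-[:" + relType + "]-(b)"
	params := map[string]any{"a": emailA, "b": emailB}
	return query, params
}

func main() {
	query, params := pairStatement("userA@host.com", "userB@host.com", "asdfghjkl")
	fmt.Println(query)
	fmt.Println(params)
}
```

Note that this only works when both email addresses are already known at the time the statement is written.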
However, I don't know the best way to do this at scale (e.g., when the application logs are 1 TB in size). The limitation is memory: I can't find all the email addresses that used asdfghjkl as a session ID without processing all the data, so memory constraints prevent me from writing out the relationships between them.
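To make the memory problem concrete, here is a minimal sketch of the grouping step that pairing requires. It assumes a hypothetical log format of one "email sessionID" pair per line; the names groupBySession and groupText are made up for illustration. The map has to hold every distinct (session, email) pair until the whole input has been read, which is what does not fit in memory at 1 TB.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// groupBySession reads "email sessionID" lines (an assumed format) and
// returns, for each session ID, the distinct emails seen with it in order of
// first appearance. Everything stays in memory until the input is exhausted.
func groupBySession(r io.Reader) (map[string][]string, error) {
	groups := make(map[string][]string)
	seen := make(map[[2]string]bool) // (session, email) pairs already recorded
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) != 2 {
			continue // skip lines that do not match the assumed format
		}
		email, session := fields[0], fields[1]
		key := [2]string{session, email}
		if seen[key] {
			continue
		}
		seen[key] = true
		groups[session] = append(groups[session], email)
	}
	return groups, sc.Err()
}

// groupText is a convenience wrapper for in-memory log text.
func groupText(text string) (map[string][]string, error) {
	return groupBySession(strings.NewReader(text))
}

func main() {
	logs := "emailA@host.com asdfghjkl\n" +
		"emailB@host.com asdfghjkl\n" +
		"emailB@host.com qwertyuiop\n"
	groups, err := groupText(logs)
	if err != nil {
		panic(err)
	}
	for session, emails := range groups {
		fmt.Println(session, emails)
	}
}
```

Only once this map is complete can every pair of emails within a session be written out as a relationship.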
What I would really like to do is write out something as follows, but this obviously fails:
MERGE (a:EMAIL {label:'userA@host.com'}) (a)-[:asdfghjkl]
Then later: MERGE (b:EMAIL {label:'userB@host.com'}) (b)-[:asdfghjkl]
Can I create these relationships with a query after the fact?