I've been looking for a simple Java algorithm to generate a pseudo-random alphanumeric string. In my situation it would be used as a unique session/key identifier that would "likely" be unique over 500K+ generations (my needs don't really require anything much more sophisticated). Ideally, I would be able to specify a length depending on my uniqueness needs. For example, a generated string of length 12 might look something like `"AEYGF7K0DM1X"`.

csdnceshi62 I know this is old, but... the "50% chance" in the birthday paradox is NOT "per try", it's "50% chance that, out of (in this case) 34 billion strings, there exists at least one pair of duplicates". You'd need 1.6 sextillion (1.6e21) entries in your database in order for there to be a 50% chance per try.

larry*wei Length is irrelevant as it depends on encoding - what you are interested in is entropy. Entropy of 128bit should be fine for most use cases (e.g. 32 hex digits)

elliott.david Even taking the birthday paradox in consideration, if you use 12 alphanumeric characters (62 total), you would still need well over 34 billion strings to reach the paradox. And the birthday paradox doesn't guarantee a collision anyways, it just says it's over 50% chance.

19 answers

## Algorithm

To generate a random string, concatenate characters drawn randomly from the set of acceptable symbols until the string reaches the desired length.

## Implementation

Here's some fairly simple and very flexible code for generating random identifiers. Read the information that follows for important application notes.

``````
import java.security.SecureRandom;
import java.util.Locale;
import java.util.Objects;
import java.util.Random;

public class RandomString {

    /**
     * Generate a random string.
     */
    public String nextString() {
        for (int idx = 0; idx < buf.length; ++idx)
            buf[idx] = symbols[random.nextInt(symbols.length)];
        return new String(buf);
    }

    public static final String upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    public static final String lower = upper.toLowerCase(Locale.ROOT);

    public static final String digits = "0123456789";

    public static final String alphanum = upper + lower + digits;

    private final Random random;

    private final char[] symbols;

    private final char[] buf;

    public RandomString(int length, Random random, String symbols) {
        if (length < 1) throw new IllegalArgumentException();
        if (symbols.length() < 2) throw new IllegalArgumentException();
        this.random = Objects.requireNonNull(random);
        this.symbols = symbols.toCharArray();
        this.buf = new char[length];
    }

    /**
     * Create an alphanumeric string generator.
     */
    public RandomString(int length, Random random) {
        this(length, random, alphanum);
    }

    /**
     * Create alphanumeric strings from a secure generator.
     */
    public RandomString(int length) {
        this(length, new SecureRandom());
    }

    /**
     * Create session identifiers.
     */
    public RandomString() {
        this(21);
    }

}
``````

## Usage examples

Create an insecure generator for 8-character identifiers:

``````RandomString gen = new RandomString(8, ThreadLocalRandom.current());
``````

Create a secure generator for session identifiers:

``````RandomString session = new RandomString();
``````

Create a generator with easy-to-read codes for printing. The strings are longer than full alphanumeric strings to compensate for using fewer symbols:

``````String easy = RandomString.digits + "ACEFGHJKLMNPQRUVWXYabcdefhijkprstuvwx";
RandomString tickets = new RandomString(23, new SecureRandom(), easy);
``````

## Use as session identifiers

Generating session identifiers that are likely to be unique is not good enough, or you could just use a simple counter. Attackers hijack sessions when predictable identifiers are used.

There is tension between length and security. Shorter identifiers are easier to guess, because there are fewer possibilities. But longer identifiers consume more storage and bandwidth. A larger set of symbols helps, but might cause encoding problems if identifiers are included in URLs or re-entered by hand.

The underlying source of randomness, or entropy, for session identifiers should come from a random number generator designed for cryptography. However, initializing these generators can sometimes be computationally expensive or slow, so effort should be made to re-use them when possible.
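As a rough illustration of re-using a cryptographic generator, the sketch below keeps one `SecureRandom` in a static field and draws every identifier from it. The class name `SessionIds` and the method `next` are hypothetical names chosen for this example; they are not part of the answer's `RandomString` class.

```java
import java.security.SecureRandom;

// Hypothetical helper for illustration only.
final class SessionIds {
    // A single shared instance: SecureRandom is safe for concurrent use,
    // so the potentially slow initialization happens only once.
    private static final SecureRandom RANDOM = new SecureRandom();

    private static final char[] SYMBOLS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789".toCharArray();

    // Build an identifier of the requested length from the shared generator.
    static String next(int length) {
        char[] buf = new char[length];
        for (int i = 0; i < length; i++) {
            buf[i] = SYMBOLS[RANDOM.nextInt(SYMBOLS.length)];
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        System.out.println(next(21));
    }
}
```

Whether a shared instance is fast enough under heavy contention depends on the platform, so treat this as a starting point rather than a tuned design.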

## Use as object identifiers

Not every application requires security. Random assignment can be an efficient way for multiple entities to generate identifiers in a shared space without any coordination or partitioning. Coordination can be slow, especially in a clustered or distributed environment, and splitting up a space causes problems when entities end up with shares that are too small or too big.

Identifiers generated without taking measures to make them unpredictable should be protected by other means if an attacker might be able to view and manipulate them, as happens in most web applications. There should be a separate authorization system that protects objects whose identifier can be guessed by an attacker without access permission.

Care must also be taken to use identifiers that are long enough to make collisions unlikely given the anticipated total number of identifiers. This is referred to as "the birthday paradox." The probability of a collision, $p$, is approximately $n^2/(2q^x)$, where $n$ is the number of identifiers actually generated, $q$ is the number of distinct symbols in the alphabet, and $x$ is the length of the identifiers. This should be a very small number, like $2^{-50}$ or less.

Working this out shows that the chance of collision among 500k 15-character identifiers is about $2^{-52}$, which is probably less likely than undetected errors from cosmic rays, etc.
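The arithmetic can be checked with a few lines of code. This sketch evaluates the base-2 logarithm of the approximation (n squared over twice q to the power x) for n = 500,000, q = 62, and x = 15; the class and method names are made up for the example, and the calculation is done in logarithms so that q to the power x never has to be represented directly.

```java
// Hypothetical helper for illustration only.
final class CollisionEstimate {
    // Returns log2(p) for p ≈ n^2 / (2 * q^x):
    // log2(p) = 2*log2(n) - 1 - x*log2(q)
    static double log2Collision(long n, int q, int x) {
        double ln2 = Math.log(2);
        return 2 * Math.log(n) / ln2 - 1 - x * Math.log(q) / ln2;
    }

    public static void main(String[] args) {
        // 500k identifiers, 62 symbols, 15 characters: the exponent is roughly -52
        System.out.printf("log2(p) = %.2f%n", log2Collision(500_000L, 62, 15));
    }
}
```

Changing `x` shows how quickly the exponent falls: each extra alphanumeric character lowers it by about six.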

## Comparison with UUIDs

According to their specification, UUIDs are not designed to be unpredictable, and should not be used as session identifiers.

UUIDs in their standard format take a lot of space: 36 characters for only 122 bits of entropy. (Not all bits of a "random" UUID are selected randomly.) A randomly chosen alphanumeric string packs more entropy in just 21 characters.
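To make the comparison concrete, the entropy of a uniformly random string is its length times the base-2 logarithm of the alphabet size. The sketch below (with hypothetical names `EntropyBits`, `bits`, and `minLength`) computes that figure and the shortest length that reaches a target number of bits.

```java
// Hypothetical helper for illustration only.
final class EntropyBits {
    // Entropy in bits of a string of `length` symbols drawn uniformly
    // from an alphabet of `alphabetSize` symbols.
    static double bits(int alphabetSize, int length) {
        return length * Math.log(alphabetSize) / Math.log(2);
    }

    // Smallest length whose entropy is at least `targetBits`.
    static int minLength(int alphabetSize, double targetBits) {
        return (int) Math.ceil(targetBits * Math.log(2) / Math.log(alphabetSize));
    }

    public static void main(String[] args) {
        // 21 alphanumeric characters carry a little over 125 bits
        System.out.printf("62 symbols x 21 chars: %.1f bits%n", bits(62, 21));
        System.out.println("chars needed for 122 bits: " + minLength(62, 122));
    }
}
```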

UUIDs are not flexible; they have a standardized structure and layout. This is their chief virtue as well as their main weakness. When collaborating with an outside party, the standardization offered by UUIDs may be helpful. For purely internal use, they can be inefficient.

``````static final String AB = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
static SecureRandom rnd = new SecureRandom();

String randomString( int len ){
StringBuilder sb = new StringBuilder( len );
for( int i = 0; i < len; i++ )
sb.append( AB.charAt( rnd.nextInt(AB.length()) ) );
return sb.toString();
}
``````

elliott.david Is there a good reason to create the Random object in each method invocation? I don't think so.
python小菜 Why not put static Random rnd = new Random(); inside the method?

YaoRaoLov Consider using SecureRandom instead of the Random class. If passwords are generated on a server, it might be vulnerable to timing attacks.

If you're happy to use Apache classes, you could use `org.apache.commons.text.RandomStringGenerator` (commons-text).

Example:

``````RandomStringGenerator randomStringGenerator =
new RandomStringGenerator.Builder()
.withinRange('0', 'z')
.filteredBy(CharacterPredicates.LETTERS, CharacterPredicates.DIGITS)
.build();
randomStringGenerator.generate(12); // toUpperCase() if you want
``````

Since commons-lang 3.6, `RandomStringUtils` is deprecated.


Java supplies a way of doing this directly. If you don't want the dashes, they are easy to strip out. Just use `uuid.replace("-", "")`

``````import java.util.UUID;

public class randomStringGenerator {
public static void main(String[] args) {
System.out.println(generateString());
}

public static String generateString() {
String uuid = UUID.randomUUID().toString();
return "uuid = " + uuid;
}
}
``````

Output:

``````uuid = 2d7428a6-b58c-4008-8575-f05549f16316
``````
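The dash-stripping step mentioned above is not shown in the code, so here is a minimal sketch of it. The class name `CompactUuid` is made up for the example. The result is 32 lowercase hex digits, and it is meant for uniqueness rather than secrecy.

```java
import java.util.UUID;

// Hypothetical helper for illustration only.
final class CompactUuid {
    // Random (version 4) UUID with the four dashes removed: 32 hex digits.
    static String next() {
        return UUID.randomUUID().toString().replace("-", "");
    }

    public static void main(String[] args) {
        System.out.println(next());
    }
}
```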
ℙℕℤℝ This will generate a 36-character string (32 hex digits + 4 dashes), no more.

elliott.david Just use base 64 if you want it to be hashed and alpha-numeric.
~Onlooker What about MD5 on this output? It should be more difficult to guess.
from.. I suppose one of the other high-scoring answers
Lotus＠ - So what should you use instead of UUIDs?
from.. According to RFC4122 using UUID's as tokens is a bad idea: Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example. A predictable random number source will exacerbate the situation. ietf.org/rfc/rfc4122.txt

You can use an Apache library for this: RandomStringUtils

``````RandomStringUtils.randomAlphanumeric(20).toUpperCase();
``````
?yb? Since commons-lang 3.6, RandomStringUtils is deprecated in favor of RandomStringGenerator of commons-text

10.24 Nothing to do with over engineering. If you want to create session ids, you need a cryptographic pseudo random generator. Every prng using time as seed is predictable and very insecure for data that should be unpredictable. Just use SecureRandom and you are good.


MAO-EYE : You are (unnecessarily) overengineering the system. While I agree that it uses time as seed, the attacker has to have access to the following data to actually get what he wants: 1. Time to the exact millisecond, when the code was seeded 2. Number of calls that have occurred so far 3. Atomicity for his own call (so that the number of calls-so-far remains the same). If your attacker has all three of these things, then you have a much bigger issue at hand...
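For readers weighing the two comments above, this small sketch shows the property being argued about: two `java.util.Random` instances built from the same seed produce the same string. The seed value and the names `SeedDemo` and `generate` are hypothetical. How easily an attacker could learn the seed in practice is the open question in the thread, and the sketch does not settle it.

```java
import java.util.Random;

// Hypothetical demo for illustration only.
final class SeedDemo {
    private static final String SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

    // Draw `length` symbols from the supplied generator.
    static String generate(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(SYMBOLS.charAt(random.nextInt(SYMBOLS.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 1234567890123L; // stands in for a guessed millisecond timestamp
        String first = generate(new Random(seed), 12);
        String second = generate(new Random(seed), 12);
        System.out.println(first.equals(second)); // prints true
    }
}
```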

I found this solution that generates a random hex encoded string. The provided unit test seems to hold up to my primary use case. Although, it is slightly more complex than some of the other answers provided.

``````
/**
 * Generate a random hex-encoded string token.
 *
 * @param length number of random bytes; the token has 2 * length hex digits
 * @return random hex string
 */
public static synchronized String generateUniqueToken(Integer length) {
    byte[] random = new byte[length];
    // Note: java.util.Random is not cryptographically secure.
    Random randomGenerator = new Random();
    StringBuilder buffer = new StringBuilder();

    randomGenerator.nextBytes(random);

    for (int j = 0; j < random.length; j++) {
        // Split each byte into its high and low nibble and emit one hex digit for each
        byte b1 = (byte) ((random[j] & 0xf0) >> 4);
        byte b2 = (byte) (random[j] & 0x0f);
        if (b1 < 10)
            buffer.append((char) ('0' + b1));
        else
            buffer.append((char) ('A' + (b1 - 10)));
        if (b2 < 10)
            buffer.append((char) ('0' + b2));
        else
            buffer.append((char) ('A' + (b2 - 10)));
    }
    return buffer.toString();
}

@Test
public void testGenerateUniqueToken() {
    Set<String> set = new HashSet<>();
    int size = 16;

    /* Seems like we should be able to generate 500K tokens
     * without a duplicate
     */
    for (int i = 0; i < 500000; i++) {
        String token = Utility.generateUniqueToken(size);

        if (token.length() != size * 2) {
            fail("Incorrect length");
        } else if (!set.add(token)) {
            // add() returns false when the token was already in the set
            fail("Duplicate token generated");
        }
    }
}
``````
derek5. I don't think it is fair to fail for duplicate tokens which is purely based on probability.

Using Dollar, it should be as simple as:

``````
// "0123456789" + "ABCDE...Z"
String validCharacters = $('0', '9').join() + $('A', 'Z').join();

String randomString(int length) {
    return $(validCharacters).shuffle().slice(length).toString();
}

@Test
public void buildFiveRandomStrings() {
    for (int i : $(5)) {
        System.out.println(randomString(12));
    }
}
``````

It outputs something like this:

``````DKL1SBH9UJWC
JH7P0IT21EA5
5DTI72EO6SFU
HQUMJTEBNF7Y
1HCR6SKYWGT7
``````

``````
import java.util.*;
import javax.swing.*;

public class alphanumeric {
    public static void main(String args[]) {
        String nval, lenval;
        int n, len;

        nval = JOptionPane.showInputDialog("Enter number of codes you require : ");
        n = Integer.parseInt(nval);

        lenval = JOptionPane.showInputDialog("Enter code length you require : ");
        len = Integer.parseInt(lenval);

        find(n, len);
    }

    public static void find(int n, int length) {
        String str1 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        StringBuilder sb = new StringBuilder(length);
        Random r = new Random();

        System.out.println("\n\t Unique codes are \n\n");
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < length; j++) {
                sb.append(str1.charAt(r.nextInt(str1.length())));
            }
            System.out.println("  " + sb.toString());
            // Clear the builder so the next code starts empty
            sb.delete(0, length);
        }
    }
}
``````

Here it is in Java:

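``````
import static java.lang.Math.random;
import static java.lang.Math.pow;
import static java.lang.Math.abs;
import static java.lang.Math.min;

public class RandomAlphaNum {
    public static String gen(int length) {
        StringBuilder sb = new StringBuilder();
        // Build the string in chunks of at most 12 base-36 digits (36^12 fits in a long)
        for (int i = length; i > 0; i -= 12) {
            int n = min(12, abs(i));
            // Truncating instead of rounding keeps the value below 36^n, so it fits in n digits
            long value = (long) (random() * pow(36, n));
            sb.append(leftPad(Long.toString(value, 36), n, '0'));
        }
        return sb.toString();
    }

    // Helper that the original snippet assumed: left-pads s with pad up to the given width
    private static String leftPad(String s, int width, char pad) {
        StringBuilder sb = new StringBuilder(width);
        for (int i = s.length(); i < width; i++) sb.append(pad);
        return sb.append(s).toString();
    }
}
``````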

Here's a sample run:

``````scala> RandomAlphaNum.gen(42)
res3: java.lang.String = uja6snx21bswf9t89s00bxssu8g6qlu16ffzqaxxoy
``````
Lotus＠ All this double-infested random int generation is broken by design, slow and unreadable. Use Random#nextInt or nextLong. Switch to SecureRandom if needed.

Memor.の This will produce insecure sequences i.e. sequences which can be easily guessed.
``````
import java.util.Random;

public class passGen {
    // Version 1.0
    private static final String dCase = "abcdefghijklmnopqrstuvwxyz";
    private static final String uCase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String sChar = "!@#$%^&*";
    private static final String intChar = "0123456789";
    private static final Random r = new Random();

    public static void main(String[] args) {
        System.out.println("Generating pass...");
        StringBuilder pass = new StringBuilder();
        while (pass.length() != 16) {
            // Pick a character class, then a character from it. Using length() as the
            // bound keeps the last character of each class ('z', 'Z', '*', '9') reachable.
            int rPick = r.nextInt(4);
            if (rPick == 0) {
                pass.append(dCase.charAt(r.nextInt(dCase.length())));
            } else if (rPick == 1) {
                pass.append(uCase.charAt(r.nextInt(uCase.length())));
            } else if (rPick == 2) {
                pass.append(sChar.charAt(r.nextInt(sChar.length())));
            } else {
                pass.append(intChar.charAt(r.nextInt(intChar.length())));
            }
        }
        System.out.println("Generated Pass: " + pass);
    }
}
``````

This just appends randomly chosen characters to the string until it is 16 characters long. It works well and is very simple; I wrote it myself.

hurriedly% I allowed myself to make some minor modifications. Why do you add + 0 that often? Why do you split declaration of spot and initialisation? What is the advantage of indexes 1,2,3,4 instead of 0,1,2,3? Most importantly: you took a random value, and compared with if-else 4 times a new value, which could always mismatch, without gaining more randomness. But feel free to rollback.