How to find nearest divisor to given value with modulo zero

Question:

I’m trying to preprocess a dataset for a neural network. To do so, I need to reshape an array with the shape (2040906, 1) into an array of batches.

I need a batch size of around 1440 rows, but 2040906 is obviously not divisible (with a remainder of zero) by that number.

I tried computing the remainder of the division and dropping that many rows so that the division leaves a remainder of zero. But I don't want to drop rows from my dataset.

Here is an example snippet that reproduces the problem:

import numpy as np

x = np.ones((2040906, 1))

np.split(x, 1440)  # raises ValueError: 2040906 rows cannot be split into 1440 equal sections

The ideal solution for me would be a function that returns the divisor of a given value (i.e. one that leaves a remainder of 0) that is nearest to a target number.

Asked By: Steven

||

Answers:

Looking for the largest divisor is not a good approach, for two reasons:

  1. The size of the array might be a prime number.
  2. The divisor may be too large or too small, resulting in ineffective learning.

A better idea is to pad the dataset with samples randomly selected from the whole dataset, so that its size becomes divisible by the optimal batch size. Here is a simple trick to compute the size of the padded array divisible by 1440:

(-x.shape[0] % 1440) + x.shape[0]
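A minimal sketch of that padding idea, assuming NumPy and a 2-D array of samples. The helper name pad_to_multiple and its seed parameter are hypothetical, and drawing the padding rows uniformly with replacement is just one possible reading of "randomly selected":

```python
import numpy as np

def pad_to_multiple(x, batch_size, seed=None):
    # Number of extra rows needed to reach the next multiple of batch_size
    # (0 if the length is already divisible)
    pad = -x.shape[0] % batch_size
    rng = np.random.default_rng(seed)
    # Hypothetical choice: pick padding rows uniformly at random, with replacement
    idx = rng.integers(0, x.shape[0], size=pad)
    return np.concatenate([x, x[idx]], axis=0)

x = np.ones((2040906, 1))
padded = pad_to_multiple(x, 1440, seed=0)
batches = padded.reshape(-1, 1440, padded.shape[1])
print(padded.shape, batches.shape)  # (2041920, 1) (1418, 1440, 1)
```

Because only 1014 rows are added to roughly two million, the random duplicates should have little influence on training, but that is a judgement call for your data.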

However, when the data is ordered (e.g. a time series), padding cannot be used, because there is no way to construct representative padding data.

An alternative is to minimize the amount of truncated data. One can search through a range of acceptable batch sizes to find the one that requires minimal truncation.

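def find_best_divisor(size, low, high, step=1):
    # Try every batch size in [low, high) and keep the one that leaves the
    # fewest leftover rows; ties go to the smaller batch size
    minimal_truncation, best_divisor = min(
        (size % divisor, divisor) for divisor in range(low, high, step))
    return best_divisor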
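For ordered data, the chosen batch size can then be used to drop only the few trailing rows that do not fill a whole batch. A minimal sketch assuming NumPy; truncate_to_batches is a hypothetical helper, and find_best_divisor is repeated so the block runs on its own:

```python
import numpy as np

def find_best_divisor(size, low, high, step=1):
    # Smallest remainder wins; ties go to the smaller batch size
    _, best = min((size % d, d) for d in range(low, high, step))
    return best

def truncate_to_batches(x, low, high):
    # Choose the batch size in [low, high) that discards the fewest rows
    batch = find_best_divisor(x.shape[0], low, high)
    usable = x.shape[0] - x.shape[0] % batch
    # Drop only the tail, so the order of the remaining rows is preserved
    return x[:usable].reshape(-1, batch, x.shape[1])

x = np.arange(2040906, dtype=float).reshape(-1, 1)
batches = truncate_to_batches(x, 1300, 1600)
dropped = x.shape[0] - batches.shape[0] * batches.shape[1]
print(batches.shape, dropped, "rows dropped")
```

The search range (1300 to 1600 here) is an assumption; widen or narrow it depending on which batch sizes still train well for your model.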

This approach is nice because it makes good use of the data while keeping a batch size suitable for training.

Answered By: tstanisl

Not sure this is the most elegant solution, but you can do the following:

  1. Get all divisors of the number in question
def getDivisors(n, res=None):
    # Trial division over every integer 1..n (simple, but O(n))
    res = res or []
    i = 1
    while i <= n:
        if n % i == 0:
            res.append(i)
        i = i + 1
    return res

getDivisors(2040906)
Out[4]: 
[1, 2, 3, 6, 7, 14, 21, 42, 48593, 97186,
 145779, 291558, 340151, 680302, 1020453, 2040906]
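  2. Return the closest divisor
def get_closest_split(n, close_to=1440):
    all_divisors = getDivisors(n)
    for ix, val in enumerate(all_divisors):
        if close_to < val:
            if ix == 0: return val
            # Compare the divisors just above and just below close_to
            # (on a tie, the larger one is returned)
            if (val - close_to) > (close_to - all_divisors[ix-1]):
                return all_divisors[ix-1]
            return val
    # close_to >= n, so n itself is the closest divisor
    return all_divisors[-1]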

get_closest_split(2040906, close_to=1440)
Out[6]: 42

In your case, this returns 42 as the divisor closest to 1440, so np.split(x, 42) should work.
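Keep in mind that np.split with an integer N cuts the array into N equal sections, so np.split(x, 42) gives 42 sections of 48593 rows each rather than batches of 42 rows. If batches of 42 rows are what you want, a reshape is one way to get them. A minimal sketch, assuming the same x as in the question:

```python
import numpy as np

x = np.ones((2040906, 1))

# An integer second argument means "number of equal sections"
sections = np.split(x, 42)
print(len(sections), sections[0].shape)  # 42 (48593, 1)

# reshape groups consecutive rows into batches of a fixed size instead
batches = x.reshape(-1, 42, x.shape[1])
print(batches.shape)  # (48593, 42, 1)
```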

Answered By: realr

Another solution, for finding either the closest larger divisor or the closest smaller divisor:

import numpy as np

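def get_whole_ceil(n, near):
    # Smallest divisor of n that is >= near: try every k in 1..floor(n/near)
    # and keep the last (i.e. smallest) quotient n/k that is a whole number
    nn = np.divide(n, np.arange(1, n // near + 1))
    return nn[nn % 1 == 0][-1]

def get_whole_floor(n, near):
    # Largest divisor of n that is <= near: try every k in ceil(n/near)..n
    # and keep the first (i.e. largest) quotient n/k that is a whole number
    nn = np.divide(n, np.arange(-(-n // near), n + 1))
    return nn[nn % 1 == 0][0]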

get_whole_ceil(2040906,1440)

Out[1]: 48593.0

get_whole_floor(2040906,1440)

Out[1]: 42.0
Answered By: Rythn

Sometimes it is easier to solve a more general problem than the problem at hand. So I look for the prime factors and then calculate all possible products between them. In this case, it is also about 40x faster. I also took a cue from @tstanisl and let you limit the amount of work done.

You can use divisors() for a sorted list of divisors, then look for the nearest one.

import itertools
from itertools import chain, combinations
from functools import reduce # Valid in Python 2.6+, required in Python 3
import operator

def prime_factors(n, up_to=None):
  """
  Returns prime factors for 'n' up to 'up_to', excluding 1 (unless n == 1)
  as a sequence of tuples '(b, e)', 'b' being the factor and 'e' being
  the exponent of that factor.
  """
  if up_to is None:
    up_to = n
  for i in range(2, min(n, up_to)):
    if n % i == 0:
      factors = prime_factors(n//i, up_to=up_to)
      if factors:
        # we always get the smallest factor last, so if it is
        # the same as the current number we're looking at,
        # add up the exponent
        last_factor, last_exp = factors[-1]
        if last_factor == i:
          return factors[:-1] + ((i, last_exp+1),)
      return factors + ((i,1),)
  if up_to is not None and up_to < n:
    return tuple()
  return ((n,1),)

# thanks to https://docs.python.org/dev/library/itertools.html#itertools-recipes
def powerset(iterable):
  """
  Generates the powerset of a given iterable.
  >>> list(powerset([1,2,3]))
  [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
  """
  s = list(iterable)
  return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

# thanks to https://stackoverflow.com/questions/595374/whats-the-function-like-sum-but-for-multiplication-product
def prod(t):
  return reduce(operator.mul, t, 1)

def divisors(n, up_to=None):
  """
  Returns a sorted list of divisors of 'n'. If 'up_to' is specified,
  only prime factors up to 'up_to' will be considered when calculating
  the list of divisors.
  """
  return [1] + sorted([
      prod(fs)
      for comb in powerset(prime_factors(n, up_to))
      if comb
      for fs in itertools.product(*(
          tuple(b**ei for ei in range(1,e+1))
          for b,e in comb))
  ])


# >>> divisors(2040906)
# [1, 2, 3, 6, 7, 14, 21, 42, 48593, 97186,
#  145779, 291558, 340151, 680302, 1020453, 2040906]
# >>> divisors(2040906, 48592)
# [1, 2, 3, 6, 7, 14, 21, 42]
# >>> %timeit divisors(2040906)
# 100 loops, best of 5: 3.93 ms per loop
# >>> %timeit getDivisors(2040906)  # from answer by @calestini
# 10 loops, best of 5: 170 ms per loop
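The "look for the nearest one" step can be done with a binary search over the sorted divisor list, using the standard-library bisect module. To keep this sketch self-contained it uses a hypothetical sorted_divisors helper based on trial division up to sqrt(n), rather than the prime-factor code above; nearest_divisor is likewise a hypothetical name:

```python
from bisect import bisect_left

def sorted_divisors(n):
    # Trial division up to sqrt(n); each hit i also yields its partner n // i
    small, large = [], []
    i = 1
    while i * i <= n:
        if n % i == 0:
            small.append(i)
            if i != n // i:
                large.append(n // i)
        i += 1
    # small is ascending, large is descending, so reverse it before joining
    return small + large[::-1]

def nearest_divisor(n, target):
    divs = sorted_divisors(n)
    # Insertion point of target; the nearest divisor is one of its neighbours
    pos = bisect_left(divs, target)
    candidates = divs[max(pos - 1, 0):pos + 1]
    # On a tie, min keeps the first (smaller) candidate
    return min(candidates, key=lambda d: abs(d - target))

print(nearest_divisor(2040906, 1440))  # 42
```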
Answered By: Yuval

I wrote a simple function for this, and it works well for me.

def get_closest_divisor(num, divisor):
    # Increase num until it is a multiple of divisor, i.e. return the
    # smallest multiple of divisor that is >= num
    while num % divisor > 0:
        num = num + 1
    return num

Running it returns the smallest multiple of 512 that is at least 33756 (the padded size, rather than a divisor):

get_closest_divisor(33756, 512)
[Out]: 33792
Answered By: zinger